Support Neo4j Connector #500

Open
liuzongliang0202 opened this issue May 10, 2023 · 8 comments

@liuzongliang0202

The issue tracker is for bug reports ONLY; feature requests need to follow the RIP process. To avoid duplicates, please check whether a similar issue already exists before filing a new one.

For discussions about deployment plans, API clarifications, and other issues that are not bug reports, please start a thread on the mailing list.
We welcome any friendly suggestions, bug fixes, collaboration, and other improvements.

Please ensure that your bug report is clear and self-contained. Otherwise, it would take additional rounds of communication, thus more time, to understand the problem itself.

Generally, fixing an issue goes through the following steps:

  1. Understand the issue reported;
  2. Reproduce the unexpected behavior locally;
  3. Perform root cause analysis to identify the underlying problem;
  4. Create test cases to cover the identified problem;
  5. Work out a solution to rectify the behavior and make the newly created test cases pass;
  6. Make a pull request and go through peer review.

It is therefore very helpful, though admittedly challenging, to provide an isolated project that reproduces the reported issue. In any case, please make sure your report is informative enough for the community to pick up. At a minimum, include the following:

BUG REPORT

  1. Please describe the issue you observed:
  • What did you do (steps to reproduce)?

  • What did you expect to see?

  • What did you see instead?

  2. Please tell us about your environment:

  3. Other information (e.g. detailed explanation, logs, related issues, suggestions on how to fix, etc.):

FEATURE REQUEST

  1. Please describe the feature you are requesting.

  2. Provide any additional detail on your proposed use case for this feature.

  3. Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have). Are you currently using any workarounds to address this issue?

  4. If sub-tasks are involved, list each one with - [ ] and create a corresponding issue for each sub-task:

@leehom

leehom commented May 12, 2023

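I implemented a Neo4j DataX reader/writer, and it has been running in production for more than a year. Implementation-wise it is similar to a Connect connector. The design has two phases: a node read/write phase and a relationship read/write phase. It needs a custom record, and because of that it doesn't quite fit DataX's design philosophy of freely swapping readers and writers.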

@liuzongliang0202
Author

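Could we discuss the approach together? Lately I have been thinking about how to convert Neo4j's unstructured data into a key-value model, and also into a document model, and how to make sure that, when importing from other sources into Neo4j, the nodes already exist (a relationship can only be created once its nodes exist).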

@leehom

leehom commented May 12, 2023

https://www.toutiao.com/article/7109709746328060431/
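You can take a look at my blog post. The main idea is two phases: convert the nodes first, then the relationships.
What do you mean by "document model"?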
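The two-phase idea (nodes first, then relationships) also answers the question of making sure nodes exist before any relationship is created. Below is a minimal Python sketch under stated assumptions: the row shapes, the `Person`/`Company` labels, the `WORKS_AT` type and the `start`/`end` keys are hypothetical and not taken from the blog post. It only builds parameterized Cypher statements; each `(query, params)` pair could then be passed to a driver call such as `session.run(query, params)` in the official Neo4j Python driver.

```python
from typing import Any

# A statement is a Cypher query plus its parameters (hypothetical shape).
Statement = tuple[str, dict[str, Any]]
Row = dict[str, Any]


def node_statements(label: str, key: str, rows: list[Row]) -> list[Statement]:
    # Phase 1: MERGE each node on its key so re-running the import does not duplicate it.
    # Labels and keys cannot be Cypher parameters, so they must come from trusted config.
    query = f"MERGE (n:{label} {{{key}: $key}}) SET n += $props"
    return [(query, {"key": row[key], "props": row}) for row in rows]


def relationship_statements(
    rel_type: str, start: tuple[str, str], end: tuple[str, str], rows: list[Row]
) -> list[Statement]:
    # Phase 2: MATCH both endpoints (created in phase 1), then MERGE the relationship.
    (start_label, start_key), (end_label, end_key) = start, end
    query = (
        f"MATCH (a:{start_label} {{{start_key}: $start}}) "
        f"MATCH (b:{end_label} {{{end_key}: $end}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return [(query, {"start": row["start"], "end": row["end"]}) for row in rows]


def two_phase_plan(people: list[Row], companies: list[Row], employment: list[Row]) -> list[Statement]:
    # Every node statement is ordered before any relationship statement.
    plan = node_statements("Person", "id", people)
    plan += node_statements("Company", "id", companies)
    plan += relationship_statements("WORKS_AT", ("Person", "id"), ("Company", "id"), employment)
    return plan


if __name__ == "__main__":
    for query, params in two_phase_plan(
        people=[{"id": 1, "name": "Alice"}],
        companies=[{"id": 10, "name": "Acme"}],
        employment=[{"start": 1, "end": 10}],
    ):
        print(query, params)
```

Using MERGE rather than CREATE is a design choice that keeps reruns idempotent; it assumes each node has a unique key property to merge on.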

@liuzongliang0202
Author

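@odbozhou When Neo4j is the source, suppose all queried records are nodes. Since Neo4j has no concept of tables, if I query nodes with several labels (one label corresponding to one table), a single poll would contain records from several tables. Can mysql-sink-connector handle this case correctly?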

@leehom

leehom commented May 17, 2023

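This should be left to the user. My current approach is target-oriented: for each target table, the user specifies which source data (CQL) it comes from. That way, whether several labels map to one table or to several tables is decided by the CQL the user writes.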
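One way to picture this target-oriented mapping is a config that pairs each target table with the CQL that feeds it. In this hypothetical Python sketch (the table names, queries and `Record` type are illustrative, not the connector's actual API), every polled record carries the name of its target table, so a single poll can mix tables while a sink can still route each row. Whether the existing mysql-sink-connector accepts such mixed batches is not established here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Record:
    table: str           # target table this row belongs to
    row: dict[str, Any]  # column name -> value


# One entry per target table; the user's CQL decides which labels feed which table.
TABLE_QUERIES: dict[str, str] = {
    "person": "MATCH (n:Person) RETURN n.name AS name, n.age AS age",
    # Two labels mapped into a single target table:
    "contact": "MATCH (n) WHERE n:Person OR n:Company RETURN n.name AS name, n.phone AS phone",
}


def poll(run_query: Callable[[str], Iterable[dict[str, Any]]]) -> list[Record]:
    # run_query stands in for executing CQL against Neo4j and yielding each row as a dict.
    records: list[Record] = []
    for table, cql in TABLE_QUERIES.items():
        records.extend(Record(table, row) for row in run_query(cql))
    return records
```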

@liuzongliang0202
Author

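But if the user writes the CQL, the query results can take many forms. For example, both of the following CQL queries fetch nodes with the Person label: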
"MATCH (n:Person) RETURN n limit 100"
"MATCH (n:Person) RETURN n.name as name , n.age as age, n.mobile as mobile limit 100 "
Would I need to adapt and parse both kinds of results into the project's row record, ConnectRecord?
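If free-form CQL is allowed, both result shapes can be reduced to the same flat row. A minimal sketch, assuming a hypothetical `NodeValue` stand-in for whatever node object the driver returns (a real driver has its own node type; this sketch does not depend on one):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class NodeValue:
    # Hypothetical stand-in for a driver's node object (not a real driver type).
    labels: frozenset[str]
    properties: dict[str, Any] = field(default_factory=dict)


def flatten(record: dict[str, Any]) -> dict[str, Any]:
    """Turn one query result record into a flat column -> value row.

    - "RETURN n" yields {"n": NodeValue(...)}: the node's properties become columns.
    - "RETURN n.name AS name, ..." yields scalar columns, which are kept as-is.
    If one record returns several nodes with overlapping keys, later ones win.
    """
    row: dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, NodeValue):
            row.update(value.properties)
        else:
            row[key] = value
    return row
```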

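Or should we impose a constraint: let the user choose to poll either nodes with a given label or relationships of a given type, and have our side assemble the complete CQL?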
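The constrained alternative can be sketched as the connector assembling the CQL itself from a user-chosen label or relationship type. The function names and the returned column aliases below are hypothetical. The identifier check exists because Cypher does not accept labels, relationship types or property keys as query parameters, and `elementId()` assumes Neo4j 5 or later.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    # Cypher cannot take labels, relationship types or property keys as parameters,
    # so only plain identifiers are accepted before they are spliced into the query.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def _columns(var: str, props: list[str]) -> str:
    # Project the requested properties, or all of them as one map if none are listed.
    if not props:
        return f"properties({var}) AS properties"
    return ", ".join(f"{var}.{_check(p)} AS {p}" for p in props)


def node_query(label: str, props: list[str]) -> str:
    return f"MATCH (n:{_check(label)}) RETURN {_columns('n', props)} LIMIT $limit"


def relationship_query(rel_type: str, props: list[str]) -> str:
    # elementId() (Neo4j 5+) identifies the two endpoints of each relationship.
    return (
        f"MATCH (a)-[r:{_check(rel_type)}]->(b) "
        f"RETURN elementId(a) AS start_id, elementId(b) AS end_id, {_columns('r', props)} "
        "LIMIT $limit"
    )
```

Keeping `LIMIT` as a parameter (`$limit`) lets the batch size come from configuration without rebuilding the query string.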

@leehom

leehom commented May 17, 2023

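When the user writes the CQL, first generate the schema of the target database. The ConnectRecord populated from the CQL results keeps its columns in the same order as the schema, and the sink reads the columns according to that schema.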
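The schema-first approach can be illustrated with a small Python sketch: the target schema fixes the column order, the source side writes values in that order, and the sink side reads them back by position. `Column`, `PERSON_SCHEMA` and the SQL type names are hypothetical, and plain lists stand in for the project's ConnectRecord, whose API is not shown here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str  # illustrative type names; the real mapping depends on the sink database


# Hypothetical schema for the target table, defined before any rows are read.
PERSON_SCHEMA = [Column("name", "VARCHAR(255)"), Column("age", "INT"), Column("mobile", "VARCHAR(32)")]


def create_table_ddl(table: str, schema: list[Column]) -> str:
    # Generate the target table from the schema.
    cols = ", ".join(f"{c.name} {c.sql_type}" for c in schema)
    return f"CREATE TABLE {table} ({cols})"


def to_ordered_values(schema: list[Column], row: dict[str, Any]) -> list[Any]:
    # Source side: place each CQL result column at its schema position; missing ones become None.
    return [row.get(col.name) for col in schema]


def from_ordered_values(schema: list[Column], values: list[Any]) -> dict[str, Any]:
    # Sink side: read values back by position using the same schema.
    if len(values) != len(schema):
        raise ValueError("value count does not match schema")
    return {col.name: v for col, v in zip(schema, values)}
```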

@liuzongliang0202
Author

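@leehom OK. I looked at how DataX implements GDB: the user defines the node labels and node properties to pull in the configuration file, and the source task assembles the DSL query itself and writes the data into a Record. I have a rough idea of how to do it now. Thanks a lot!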
