Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Using apoc.export.csv.graph with bulkImport:true produces wrong result for "id" property #1335

Closed
p0macs opened this issue Nov 4, 2019 · 6 comments · Fixed by #1359
Closed

Comments

@p0macs
Copy link

p0macs commented Nov 4, 2019

When the node has a property named "id" and the node will be exported with the bulkImport:true option, then the value of the "id" property will be replaced with the value of the node's internal ID.
The code is probably wrong in the CsvFormat.java file, in the writeNodesBulkImport procedure.

@conker84
Copy link
Collaborator

conker84 commented Nov 25, 2019

Hi @p0macs can you try with the following jar?
apoc-3.5.0.6-all.jar.zip

I look forward to your feedback!

conker84 added a commit to conker84/neo4j-apoc-procedures that referenced this issue Nov 25, 2019
…:true produces wrong result for id property
conker84 added a commit to conker84/neo4j-apoc-procedures that referenced this issue Nov 25, 2019
…:true produces wrong result for id property
conker84 added a commit to conker84/neo4j-apoc-procedures that referenced this issue Nov 25, 2019
…:true produces wrong result for id property
@p0macs
Copy link
Author

p0macs commented Nov 26, 2019

Hi @conker84 ,
Thank you for the new version, I have tested it - but I think you probably misunderstood the issue.
Now the bulkImport-mode creates an export file where the first column is "id:ID(NodeLabel)" and the value for that column is the value from my "id" property.
You must understand, that the "id" property must not be a unique identifier of the node. It can be only a coincidence that this property is called "id". I can have a lot of nodes with the same "id" value.
Now the export file handling this "id" as it would be the real ID of the node - but it is not.
The proper way in my opinion would be:

  • having one "id:ID(NodeLabel)" column for the internal ID
  • having one additional "id" column for my "id" property

When you export without the bulkImport option, then the CSV file is created with the following header:
"_id","_labels","id","_start","_end","_type"
the "_id" is here the internal ID of the node, and the "id" is my own "id" property
The proper header for the bulkImport option would be (when no unique constraint is defined on the node):
_id:ID(NodeLabel),id,:LABEL

I think the bulk import is designed to work with any ":ID" column which is unique. But please note: alone the fact the the property name is "id" does not mean that this is the unique identifier for the node.

@conker84
Copy link
Collaborator

conker84 commented Jan 9, 2020

Sorry @p0macs I lost you comment.
Can you check the test that I added in the PR:

String expectedNodesLarus = String.format(":ID,name,id:long,:LABEL%n"
+ "%s,Andrea,1,User;Larus%n", map.get("sourceId"));
String expectedNodesNeo4j = String.format(":ID,name,id:long,:LABEL%n"
+"%s,Michael,2,User;Neo4j%n", map.get("targetId"));
String expectedRelsNeo4j = String.format(":START_ID,:END_ID,:TYPE,id:long%n"
+ "%s,%s,KNOWS,10%n", map.get("sourceId"), map.get("targetId"));

The new header should be:

:ID,name,id:long,:LABEL
<internalID>,<your_prop_name>,<your_prop_id>,<labels>

Before my PR the id prop was always overwritten by the internal id, while now we have it as a separate field.
I think that we should always rely on internal id because you can have also composite unique keys in your domain (this simplifies the whole export process) do you think that could be a problem for your use-case?

@p0macs
Copy link
Author

p0macs commented Jan 9, 2020

Hi @conker84 ,
Exactly, that is what we need. using the internal ID for the export (as before) - but exporting also the user property "id" when it exists. In our case it has not the long datatype (it is a string), but I think that will be handled right anyway.

@conker84
Copy link
Collaborator

@p0macs if I provide a build with the fix can you test it?

@p0macs
Copy link
Author

p0macs commented Jan 10, 2020

Hi @conker84 , yes I could test it.

JMHReif added a commit that referenced this issue Jan 10, 2020
fixes #1335: Using apoc.export.csv.graph with bulkImport:true produces wrong result for id property
sarmbruster pushed a commit that referenced this issue Jan 11, 2020
sarmbruster added a commit that referenced this issue Jan 13, 2020
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
2 participants