
Bulk update #53


Merged
5 commits merged into master on Feb 19, 2021

Conversation

jeffreylovitz (Contributor)

This PR adds a redisgraph-bulk-loader entrypoint and logic for batch updating existing graphs with CSV files and a Cypher query.

@jeffreylovitz jeffreylovitz added the enhancement New feature or request label Feb 16, 2021
@jeffreylovitz jeffreylovitz self-assigned this Feb 16, 2021
swilly22 (Contributor) left a comment:


Nice work! Please see my comments and suggestions.

README.md (outdated)
| -p | --port INTEGER | Redis server port (default: 6379) |
| -a | --password TEXT | Redis server password (default: none) |
| -u | --unix-socket-path TEXT | Redis unix socket path (default: none) |
| -e | --query TEXT | Query to run on server |
swilly22 (Contributor):

Is `-q` taken?

jeffreylovitz (Author):

👍 Good catch!

runner = CliRunner()

csv_path = os.path.dirname(os.path.abspath(__file__)) + '/../example/'
person_file = csv_path + 'Person.csv'
swilly22 (Contributor):

Use `os.path.join`.
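A sketch of the suggested change (directory names taken from the test snippet above; the `__file__`-based anchoring is omitted for brevity):

```python
import os

# os.path.join composes the path portably instead of concatenating
# strings with a hard-coded '/' separator.
csv_dir = os.path.join(os.pardir, "example")
person_file = os.path.join(csv_dir, "Person.csv")
```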

graphname])

self.assertNotEqual(res.exit_code, 0)
self.assertIn("Cannot merge node", str(res.exception))
swilly22 (Contributor):

Consider adding tests for misuse, e.g. a path to a nonexistent CSV file, accessing an invalid row index, and an invalid CSV file.

In addition, please add a test introducing each of our supported types, i.e. a CSV with a column for each supported data type.

jeffreylovitz (Author):

Accessing an invalid row index implicitly returns a NULL value, which is the same behavior as Neo's LOAD CSV.

I think some fringe use cases could rely on this: since creating or setting null values is a no-op, you could introduce attributes for only some entities by using non-rectangular CSVs.
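A small illustration of that non-rectangular case (hypothetical data; Python mimics the Cypher semantics, where a missing index reads as NULL and setting NULL is a no-op):

```python
# Rows of unequal width: only the first row carries a third column.
rows = [["a", 1, "extra"], ["b", 2]]

def get(row, i):
    # Cypher: row[i] on a too-short row implicitly returns NULL.
    return row[i] if i < len(row) else None

attrs = {}
for row in rows:
    val = get(row, 2)
    if val is not None:  # setting a NULL value is a no-op
        attrs[row[0]] = val
# attrs == {"a": "extra"} — only the wide row introduced the attribute
```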

def __init__(self, graph, max_token_size, separator, no_header, filename, query, variable_name, client):
self.separator = separator
self.no_header = no_header
self.query = " UNWIND $rows AS " + variable_name + " " + query
swilly22 (Contributor):

Consider adding the whitespace when introducing the query parameter.
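A minimal sketch of that suggestion (`build_query` is a hypothetical helper, not part of the PR): pad the pieces explicitly so the user-supplied query can never fuse with the `UNWIND` prefix, instead of relying on a leading space in the template.

```python
def build_query(variable_name, query):
    # Strip the user query, then join all tokens with explicit single
    # spaces so the result is well-formed regardless of input spacing.
    return "UNWIND $rows AS " + variable_name + " " + query.strip()
```

For example, `build_query("row", "  RETURN row  ")` yields `"UNWIND $rows AS row RETURN row"`.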

Comment on lines 49 to 53
except ResponseError as e:
raise e
# If we encountered a run-time error, the last response element will be an exception.
if isinstance(result[-1], ResponseError):
raise result[-1]
swilly22 (Contributor):

If you consider using RedisGraph-py, you'll get error detection for free.

next_line = "[" + row.strip() + "]"

# Emit buffer now if the max token size would be exceeded by this addition.
if utf8len(rows_str) + utf8len(next_line) > self.max_token_size:
swilly22 (Contributor):

Computing utf8len(rows_str) on each iteration is expensive; instead, maintain the current length and add utf8len(next_line) to it.

first = False

# Concatenate the string into the rows string representation.
rows_str += next_line
swilly22 (Contributor):

Expensive, see here

jeffreylovitz (Author):

Oof, good to know! Thanks!
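A sketch addressing both performance comments at once (`batch_rows` is a hypothetical stand-in for the loader's buffering loop, not the PR's actual code): keep a running byte count and collect lines in a list, joining once per emitted buffer, so neither the length check nor the concatenation is O(buffer size) per row.

```python
def utf8len(s):
    return len(s.encode("utf-8"))

def batch_rows(lines, max_token_size):
    buffers = []
    parts, length = [], 2  # 2 bytes for the enclosing brackets
    for line in lines:
        # +1 accounts for the comma separator when the buffer is non-empty.
        extra = utf8len(line) + (1 if parts else 0)
        if parts and length + extra > max_token_size:
            # Join once per emitted buffer instead of repeated +=.
            buffers.append("[" + ",".join(parts) + "]")
            parts, length = [], 2
            extra = utf8len(line)
        parts.append(line)
        length += extra
    if parts:
        buffers.append("[" + ",".join(parts) + "]")
    return buffers
```

With a 9-byte budget, `batch_rows(['[1]', '[2]', '[3]'], 9)` emits `["[[1],[2]]", "[[3]]"]`.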

# Add a closing bracket
rows_str += "]"
self.emit_buffer(rows_str)
self.infile.close()
swilly22 (Contributor):

Consider using `with open` to scope access to the file; you can open and close the file multiple times, whenever needed.
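A sketch of that pattern (`read_rows` and `process_row` are hypothetical names, not the PR's API): the `with` block guarantees the handle is closed even if processing a row raises.

```python
def read_rows(filename, process_row):
    # The context manager closes the file on normal exit and on error,
    # removing the need for an explicit infile.close().
    with open(filename) as infile:
        for row in infile:
            process_row(row.strip())
```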

@click.option('--password', '-a', default=None, help='Redis server password')
@click.option('--unix-socket-path', '-u', default=None, help='Redis server unix socket path')
# Cypher query options
@click.option('--query', '-e', help='Query to run on server')
swilly22 (Contributor):

We can execute GRAPH.EXPLAIN with the specified query to quickly detect malformed queries

jeffreylovitz (Author):

👍 Good idea!
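A sketch of what that pre-flight check might look like (`validate_query` is a hypothetical helper; `client` is assumed to be a redis-py connection): GRAPH.EXPLAIN compiles the query server-side without executing it, so a malformed query fails fast before any CSV data is read.

```python
def validate_query(client, graphname, query):
    # GRAPH.EXPLAIN returns an execution plan for a valid query and
    # raises (redis.exceptions.ResponseError in practice) for a
    # malformed one.
    try:
        client.execute_command("GRAPH.EXPLAIN", graphname, query)
    except Exception as exc:
        raise SystemExit("Query failed validation: %s" % exc)
```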

@swilly22 swilly22 merged commit 2a5a641 into master Feb 19, 2021
@swilly22 swilly22 deleted the bulk-update branch February 19, 2021 20:38