Bulk update #53
Conversation
Nice work! Please see comments and suggestions.
README.md
Outdated
| -p | --port INTEGER | Redis server port (default: 6379) |
| -a | --password TEXT | Redis server password (default: none) |
| -u | --unix-socket-path TEXT | Redis unix socket path (default: none) |
| -e | --query TEXT | Query to run on server |
Is `-q` taken?
👍 Good catch!
test/test_bulk_update.py
Outdated
runner = CliRunner()

csv_path = os.path.dirname(os.path.abspath(__file__)) + '/../example/'
person_file = csv_path + 'Person.csv'
Use `os.path.join`.
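For example, the paths above could be built like this (a minimal sketch based on the snippet under review):

```python
import os

# Build paths with os.path.join so separators are handled portably
# instead of concatenating strings with hard-coded slashes.
csv_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '..', 'example')
person_file = os.path.join(csv_path, 'Person.csv')
```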
graphname])

self.assertNotEqual(res.exit_code, 0)
self.assertIn("Cannot merge node", str(res.exception))
Consider adding tests for misuse, e.g. a path to a non-existent CSV file, accessing an invalid row index, or an invalid CSV file. In addition, please add a test that introduces each of our supported types, i.e. a CSV with a column for each supported data type.
Accessing an invalid row index implicitly returns a NULL value, which is the same behavior as Neo4j's LOAD CSV.
I think there could be some fringe use cases that rely on this, since creating or setting null values is a no-op: you could introduce attributes for only some entities by using non-rectangular CSVs.
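A minimal sketch of one such misuse test (a non-existent CSV file), assuming the click command is importable as `bulk_update`; the import path, the `--csv` option, and the trailing positional graph name are all assumptions for illustration and may not match the actual CLI:

```python
import unittest
from click.testing import CliRunner

# Assumed import path and command name; adjust to the real entrypoint.
from redisgraph_bulk_loader.bulk_update import bulk_update


class TestBulkUpdateMisuse(unittest.TestCase):
    def test_nonexistent_csv(self):
        runner = CliRunner()
        # '--csv' and the trailing graph name are illustrative assumptions.
        res = runner.invoke(bulk_update, ['--csv', '/no/such/file.csv',
                                          '--query', 'MERGE (:Person {name: row[0]})',
                                          'testgraph'])
        # The loader should fail rather than silently succeed.
        self.assertNotEqual(res.exit_code, 0)
```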
def __init__(self, graph, max_token_size, separator, no_header, filename, query, variable_name, client):
    self.separator = separator
    self.no_header = no_header
    self.query = " UNWIND $rows AS " + variable_name + " " + query
Consider adding the whitespace when introducing the query parameter.
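If I read the suggestion correctly (an assumption on my part about the intent), the idea is to make the separating whitespace explicit at the point where the user-supplied query is appended, e.g.:

```python
# Separating whitespace added explicitly where the user-supplied query is
# introduced (illustrative; behaviorally equivalent to the concatenation above).
self.query = f"UNWIND $rows AS {variable_name} {query}"
```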
except ResponseError as e:
    raise e
# If we encountered a run-time error, the last response element will be an exception.
if isinstance(result[-1], ResponseError):
    raise result[-1]
If you're open to using RedisGraph-py, you'll get error detection for free.
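For reference, a minimal sketch with the redisgraph-py client (assuming a local Redis with the RedisGraph module loaded; `testgraph` is just an example key). `Graph.query` surfaces server-side errors as exceptions, so no manual inspection of the raw response array is needed:

```python
import redis
from redisgraph import Graph

client = redis.Redis(host='localhost', port=6379)
graph = Graph('testgraph', client)

try:
    # Syntax and run-time errors are raised as exceptions by query().
    result = graph.query("MERGE (p:Person {name: 'Alice'}) RETURN p.name")
    print(result.result_set)
except redis.exceptions.ResponseError as e:
    print("Query failed:", e)
```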
next_line = "[" + row.strip() + "]"

# Emit buffer now if the max token size would be exceeded by this addition.
if utf8len(rows_str) + utf8len(next_line) > self.max_token_size:
Computing `utf8len(rows_str)` on each iteration is expensive; compute it by maintaining the current length and adding `utf8len(next_line)` to it.
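A rough sketch of that approach, reusing the names from the snippets above (`utf8len`, `self.max_token_size`, `self.emit_buffer`) and omitting the first-row/comma bookkeeping and the final flush for brevity:

```python
# Track the buffer's byte length incrementally instead of re-measuring
# the whole rows_str on every iteration.
rows_str = "["
rows_len = utf8len(rows_str)

for row in infile:
    next_line = "[" + row.strip() + "]"
    next_len = utf8len(next_line)

    # Emit the buffer if adding this row would exceed the max token size.
    if rows_len + next_len > self.max_token_size:
        self.emit_buffer(rows_str + "]")
        rows_str = "["
        rows_len = utf8len(rows_str)

    rows_str += next_line
    rows_len += next_len
```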
first = False

# Concatenate the string into the rows string representation.
rows_str += next_line
Expensive as well, see the comment above.
Oof, good to know! Thanks!
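If the repeated `+=` on `rows_str` also shows up as a hotspot, one common alternative (a sketch of a general technique, not necessarily what the reviewer linked to) is to collect the row fragments in a list and join once per emitted batch:

```python
# Collect row fragments in a list and join once per batch; repeated '+=' on a
# long string copies the whole buffer each time, while list appends are cheap.
rows = []
rows_len = utf8len("[]")  # brackets that wrap the emitted batch

for row in infile:
    next_line = "[" + row.strip() + "]"
    next_len = utf8len(next_line) + 1  # +1 for the separating comma

    if rows and rows_len + next_len > self.max_token_size:
        self.emit_buffer("[" + ",".join(rows) + "]")
        rows, rows_len = [], utf8len("[]")

    rows.append(next_line)
    rows_len += next_len

if rows:
    self.emit_buffer("[" + ",".join(rows) + "]")
```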
# Add a closing bracket
rows_str += "]"
self.emit_buffer(rows_str)
self.infile.close()
Consider using `with open` to scope access to the file; you can open/close the file multiple times (whenever needed).
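A minimal sketch of that pattern (names are illustrative; `process_row` is a placeholder for the existing per-row logic):

```python
# The context manager closes the file even if an exception is raised mid-loop;
# the file can simply be reopened later if another pass is needed.
with open(self.filename, 'rt') as infile:
    for row in infile:
        process_row(row)  # placeholder for the existing row handling
```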
@click.option('--password', '-a', default=None, help='Redis server password')
@click.option('--unix-socket-path', '-u', default=None, help='Redis server unix socket path')
# Cypher query options
@click.option('--query', '-e', help='Query to run on server')
We can execute `GRAPH.EXPLAIN` with the specified query to quickly detect malformed queries.
👍 Good idea!
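A rough sketch of that check using redis-py's generic command interface (assuming `client` is the Redis connection, `graphname` the target graph key, and `full_query` the query to validate; all names are illustrative):

```python
from redis.exceptions import ResponseError

# GRAPH.EXPLAIN builds an execution plan without running the query, so a
# malformed query is rejected here before any rows are sent to the server.
try:
    client.execute_command("GRAPH.EXPLAIN", graphname, full_query)
except ResponseError as e:
    raise SystemExit("Invalid query: {}".format(e))
```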
b918e9b to bfee4c8
This PR adds a `redisgraph-bulk-loader` entrypoint and logic for batch updating existing graphs with CSV files and a Cypher query.