
Bulk update #53


Merged
5 commits merged into master on Feb 19, 2021

Conversation

jeffreylovitz (Contributor)

This PR adds a redisgraph-bulk-loader entrypoint and logic for batch updating existing graphs with CSV files and a Cypher query.

@jeffreylovitz jeffreylovitz added the enhancement New feature or request label Feb 16, 2021
@jeffreylovitz jeffreylovitz self-assigned this Feb 16, 2021
swilly22 (Contributor) left a comment:


Nice work! Please see my comments and suggestions.

README.md (outdated)
| -p | --port INTEGER | Redis server port (default: 6379) |
| -a | --password TEXT | Redis server password (default: none) |
| -u | --unix-socket-path TEXT | Redis unix socket path (default: none) |
| -e | --query TEXT | Query to run on server |
swilly22 (Contributor):

Is `-q` taken?

jeffreylovitz (Author):

👍 Good catch!

runner = CliRunner()

csv_path = os.path.dirname(os.path.abspath(__file__)) + '/../example/'
person_file = csv_path + 'Person.csv'
swilly22 (Contributor):

Use `os.path.join`.
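A sketch of the suggested change (directory names taken from the test snippet above; the `__file__`-based anchoring is omitted for brevity):

```python
import os

# os.path.join composes the path portably instead of concatenating
# strings with a hard-coded '/' separator.
csv_dir = os.path.join(os.pardir, "example")
person_file = os.path.join(csv_dir, "Person.csv")
```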

graphname])

self.assertNotEqual(res.exit_code, 0)
self.assertIn("Cannot merge node", str(res.exception))
swilly22 (Contributor):

Consider adding tests for misuse, e.g. a path to a nonexistent CSV file, accessing an invalid row index, and an invalid CSV file.

In addition, please add a test introducing each of our supported types, i.e. a CSV with a column for each supported data type.

jeffreylovitz (Author):

Accessing an invalid row index implicitly returns a NULL value, which is the same behavior as Neo's LOAD CSV.

I think some fringe use cases could rely on this: since creating or setting null values is a no-op, you could introduce attributes for only some entities by using non-rectangular CSVs.
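A small illustration of that non-rectangular case (hypothetical data; Python mimics the Cypher semantics, where a missing index reads as NULL and setting NULL is a no-op):

```python
# Rows of unequal width: only the first row carries a third column.
rows = [["a", 1, "extra"], ["b", 2]]

def get(row, i):
    # Cypher: row[i] on a too-short row implicitly returns NULL.
    return row[i] if i < len(row) else None

attrs = {}
for row in rows:
    val = get(row, 2)
    if val is not None:  # setting a NULL value is a no-op
        attrs[row[0]] = val
# attrs == {"a": "extra"} — only the wide row introduced the attribute
```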

def __init__(self, graph, max_token_size, separator, no_header, filename, query, variable_name, client):
self.separator = separator
self.no_header = no_header
self.query = " UNWIND $rows AS " + variable_name + " " + query
swilly22 (Contributor):

Consider adding the whitespace when introducing the query parameter.
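A minimal sketch of that suggestion (`build_query` is a hypothetical helper, not part of the PR): pad the pieces explicitly so the user-supplied query can never fuse with the `UNWIND` prefix, instead of relying on a leading space in the template.

```python
def build_query(variable_name, query):
    # Strip the user query, then join all tokens with explicit single
    # spaces so the result is well-formed regardless of input spacing.
    return "UNWIND $rows AS " + variable_name + " " + query.strip()
```

For example, `build_query("row", "  RETURN row  ")` yields `"UNWIND $rows AS row RETURN row"`.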

Comment on lines 49 to 53
except ResponseError as e:
raise e
# If we encountered a run-time error, the last response element will be an exception.
if isinstance(result[-1], ResponseError):
raise result[-1]
swilly22 (Contributor):

If you consider using RedisGraph-py, you'll get error detection for free.

next_line = "[" + row.strip() + "]"

# Emit buffer now if the max token size would be exceeded by this addition.
if utf8len(rows_str) + utf8len(next_line) > self.max_token_size:
swilly22 (Contributor):

Computing utf8len(rows_str) on each iteration is expensive; instead, maintain the current length and add utf8len(next_line) to it.

first = False

# Concatenate the string into the rows string representation.
rows_str += next_line
swilly22 (Contributor):

Expensive, see here

jeffreylovitz (Author):

Oof, good to know! Thanks!
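A sketch addressing both performance comments at once (`batch_rows` is a hypothetical stand-in for the loader's buffering loop, not the PR's actual code): keep a running byte count and collect lines in a list, joining once per emitted buffer, so neither the length check nor the concatenation is O(buffer size) per row.

```python
def utf8len(s):
    return len(s.encode("utf-8"))

def batch_rows(lines, max_token_size):
    buffers = []
    parts, length = [], 2  # 2 bytes for the enclosing brackets
    for line in lines:
        # +1 accounts for the comma separator when the buffer is non-empty.
        extra = utf8len(line) + (1 if parts else 0)
        if parts and length + extra > max_token_size:
            # Join once per emitted buffer instead of repeated +=.
            buffers.append("[" + ",".join(parts) + "]")
            parts, length = [], 2
            extra = utf8len(line)
        parts.append(line)
        length += extra
    if parts:
        buffers.append("[" + ",".join(parts) + "]")
    return buffers
```

With a 9-byte budget, `batch_rows(['[1]', '[2]', '[3]'], 9)` emits `["[[1],[2]]", "[[3]]"]`.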

# Add a closing bracket
rows_str += "]"
self.emit_buffer(rows_str)
self.infile.close()
swilly22 (Contributor):

Consider using `with open` to scope access to the file; you can open and close the file multiple times, whenever needed.
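A sketch of that pattern (`read_rows` and `process_row` are hypothetical names, not the PR's API): the `with` block guarantees the handle is closed even if processing a row raises.

```python
def read_rows(filename, process_row):
    # The context manager closes the file on normal exit and on error,
    # removing the need for an explicit infile.close().
    with open(filename) as infile:
        for row in infile:
            process_row(row.strip())
```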

@click.option('--password', '-a', default=None, help='Redis server password')
@click.option('--unix-socket-path', '-u', default=None, help='Redis server unix socket path')
# Cypher query options
@click.option('--query', '-e', help='Query to run on server')
swilly22 (Contributor):

We can execute GRAPH.EXPLAIN with the specified query to quickly detect malformed queries

jeffreylovitz (Author):

👍 Good idea!
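A sketch of what that pre-flight check might look like (`validate_query` is a hypothetical helper; `client` is assumed to be a redis-py connection): GRAPH.EXPLAIN compiles the query server-side without executing it, so a malformed query fails fast before any CSV data is read.

```python
def validate_query(client, graphname, query):
    # GRAPH.EXPLAIN returns an execution plan for a valid query and
    # raises (redis.exceptions.ResponseError in practice) for a
    # malformed one.
    try:
        client.execute_command("GRAPH.EXPLAIN", graphname, query)
    except Exception as exc:
        raise SystemExit("Query failed validation: %s" % exc)
```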

@swilly22 swilly22 merged commit 2a5a641 into master Feb 19, 2021
@swilly22 swilly22 deleted the bulk-update branch February 19, 2021 20:38