-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Es item #111
Es item #111
Conversation
…equest.force_datastore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, mostly just small things I've commented on
return None | ||
|
||
return find_it(our_dict) | ||
used_datastore = 'elasticsearch' # datastore used by this model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this is necessary? May be a little confusing since used_datastore
is an attribute on Item
, and while CachedModel
is related it isn't exactly the same thing. Don't you always know in this case implicitly that the data comes from elasticsearch?
document = existing_doc.source | ||
|
||
index_name = get_namespaced_index(self.registry, document['item_type']) | ||
# use `refresh='waitfor'` so that the ES model is immediately available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this mean under heavy load this could block? May want to make this configurable
src/snovault/resources.py
Outdated
@@ -255,6 +258,9 @@ class Item(Resource): | |||
embedded_list = [] | |||
filtered_rev_statuses = () | |||
schema = None | |||
# `used_datastore` determines where the properties of the item are store. | |||
# None for traditional Postgres setup or "elasticsearch" to use ES document | |||
used_datastore = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this is set yet I suspect in some case it must not be given the errors I saw when running this branch on CGAP. To fix it I had to put a try-except where this field is accessed if I recall for static sections?
for operations like getting current sid/max_sid, rev_links, and | ||
updating. Leverage `model.used_datastore` to determine source | ||
""" | ||
if self.model.used_datastore != 'database': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the convention that used_datastore='elasticsearch'
and anything else implies database? In which case if the default None were here you would grab the model twice when you didn't need to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bit confusing because the models themselves (storage.Resource
or esstorage.CachedModel
) had used_datstore
parameters which are always set, as well as the used_datastore
parameters on Item. I've refactored it a bit.
I think this is correct though; if you added a third type of model, say used_datastore='redis'
, then we would still want to use the write model here
src/snovault/storage.py
Outdated
# properties and sheets, as those are exclusively stored in ES | ||
self.write.update(model, {}, None, unique_keys, links) | ||
# update contents of the ES documents | ||
return storage.update(model, properties, sheets, unique_keys, links) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this return is necessary since the one below is identical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True!
src/snovault/tests/test_indexing.py
Outdated
@@ -1260,3 +1259,151 @@ def test_validators_on_indexing(app, testapp, indexer_testapp): | |||
val_err_view = testapp.get(ppp_id + '@@validation-errors', status=200).json | |||
assert val_err_view['@id'] == ppp_id | |||
assert val_err_view['validation_errors'] == es_res['_source']['validation_errors'] | |||
|
|||
|
|||
def test_elasticsearch_item(app, testapp, indexer_testapp): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would maybe consider splitting into 2 (even 3) separate tests if you can logically split the behavior. Your call though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a good call. This test grew organically, hence it's massive size. I'll split it up
src/snovault/tests/test_link.py
Outdated
|
||
url = '/testing-link-sources-sno/' | ||
for item in sources: | ||
testapp.post_json(url, item, status=201) | ||
res = testapp.post_json(url, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the above change was for testing, right? Should probably be changed back since I believe this one will fail silently (and you don't need res?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, definitely. Reverted.
… properties_datastore
…idation in Connection.__getitem__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to read every line of this code, although reviewing the actual algorithmic choices is beyond the scope of what I had time to do. I did reason through things that are locally understandable and didn't see flaws in most of that, but I'm glad Will had eyes on it, too. Some of the bigger sections of code look like they were just moving code around. I pulled up sources and tried to do side-by-side of blocks of code that seemed to have moved a lot, though that's error-prone in its own ways. In general, for big changes like this, reading the commit-by-commit changes can be helpful, and maybe I should do that as well, though it has issues with changes getting superseded. Anyway, reviewing the things I actually remarked on, I see they were largely cosmetic. I'm going to ponder whether the level of work I've done on it constitutes an approval. I'll likely talk to Will before signing off. But I'll just say in words here that I don't have any reason not to approve this, presuming it's passing tests. It's good not to let a change this big linger. And we have other big things like PR #68 that may be affected and also need to get in.
if not hasattr(context.model, 'source'): | ||
if context.model.used_datastore != 'elasticsearch': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume we can rely on every model to have a .used_datastore
so we don't have to be so timid about the access?
docs/source/local_installation.rst
Outdated
|
||
Step 1: Verify that homebrew is working properly:: | ||
|
||
$ sudo brew doctor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did
~$ sudo brew doctor
Error: Running Homebrew as root is extremely dangerous and no longer supported.
As Homebrew does not drop privileges on installation you would be giving all
build scripts full access to your system.
~$ brew doctor
Your system is ready to brew.
I think we should change this line to say just
$ brew doctor
docs/source/local_installation.rst
Outdated
|
||
$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz | ||
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow | ||
$ brew tap homebrew/versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~$ brew tap homebrew/versions
Error: homebrew/versions was deprecated. This tap is now empty as all its formulae were migrated.
I recommend that we just remove this line. I don't actually think it's needed.
@@ -0,0 +1,45 @@ | |||
Local Installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet most of these instructions would be better if we borrowed the instructions I just created for forefront. (We could do that in a separate PR sometime. Not a blocker here, though.
registry[STORAGE] = storage | ||
# expect existing values to be used | ||
register_storage(registry) | ||
assert registry[STORAGE].write == 'dummy_db' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It hurts my head that these are called simply read and write, since I keep assuming their value will be used like you'd use
sys.stdout.write("Foo")
Definitely not something to change on this PR, which is already very sprawling and hard to review. But one day I hope we fix this to have a more intuitive name like .storage_for_read
, etc.
return [request.resource_path(conn[uuid]) for uuid in | ||
self.get_filtered_rev_links(request, rev_name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather see the line break before the for
so that the fact of an iteration is more prominent. And this is a place I deviate from PEP. I tend to like to use a space after the bracket/brace when the text inside is computed, whether a literal or a comprehension. The space tells you something interesting is afoot and draws your attention. The line break in the proper place makes it like an upside-down for loop, and so a more familiar notation the eye can more easily scan. e.g.,
return [ request.resource_path(conn[uuid])
for uuid in self.get_filtered_rev_links(request, rev_name) ]
@@ -1046,7 +1065,7 @@ def test_indexing_esstorage(app, testapp, indexer_testapp): | |||
# test the following methods: | |||
es_res_by_uuid = esstorage.get_by_uuid(test_uuid) | |||
es_res_by_json = esstorage.get_by_json('required', 'some_value', TEST_TYPE) | |||
es_res_direct = esstorage.get_by_uuid_direct(test_uuid, namespaced_test_type, TEST_TYPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use keyword calling here, please.
src/snovault/crud_views.py
Outdated
# purge_uuid fxn ensures that all links to the item are removed | ||
namespaced_index = get_namespaced_index(request, item_type) | ||
request.registry[STORAGE].purge_uuid(item_uuid, namespaced_index, item_type) | ||
request.registry[STORAGE].purge_uuid(item_uuid, item_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought I'd committed this comment, but apparently hadn't: Please whenever changing argument conventions like this, use keyword calling so that you're sure it's lining up consistently. In fact, I use keyword calling in nearly all cases when there are arbitrarily ordered arguments like these. (By arbitrarily, I mean that there's something natural about putting the uuid first in a function named purge_uuid
, so I'm OK with that not being a keyword, though I also don't mind a keyword even then. But in this PR there is a def purge_uuid(self, rid, item_type=None):' in
src/snovault/storage.pythat takes one set of argument conventions, and there's also been a
max_sidargument that's been used. I recommend writing at least
purge_uuid(item_uuid, item_type=item_type)and optionally even
purge_uuid(rid=item_uuid, item_type=item_type)`. This make sure the right method with the right args is being called while in transition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of my other comments, which intended to allude back to that one, may have been confusing in its absence. :)
@@ -1262,6 +1280,207 @@ def test_validators_on_indexing(app, testapp, indexer_testapp): | |||
assert val_err_view['validation_errors'] == es_res['_source']['validation_errors'] | |||
|
|||
|
|||
def test_elasticsearch_item_basic(app, testapp, indexer_testapp, es_based_target): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests look plausible, though I'm still not up on enough detail to review some aspects of them. I wouldn't mind sitting with Will and going over them in detail to make sure I get the things that are being tested. No action item for the code due to that, though. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I can ask here, though: These tests are longish. Is that because there is state that runs through the whole thing and changes? Would they not work if carved up into more and smaller tests? Would that require repeating a constant bit of boilerplate setup, or would there be a (pardon pun) pyramidal cost in setup as the series of tests progressed? Or is the reason for lumping them together that the fixture setup is costly and we just don't want too many of such tests? I feel like this is not just a matter of ease of review, but of more concise noticing of particular problems in case of error, of having one error not mask another, and of modularity in test setup so that tests are easier to change or less state-dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good.
Large change that refactors a lot of storage-related code and adds a new feature: creating items with ES-based properties. See this document for an overview. There are some other misc. changes as well.
Version is currently set to 1.3.8 but may need to change if subsequent releases are made before this is merged.
Summary
linkFrom
code (unused)PickStorage
anddatastore
(on request) slightly and moved to storage.pyICachedItem
(for ES-based items)ESStorage
and addedcreate
+update
methods (for ES-based items)properties_datastore='elasticsearch'
Please note that links between documents will not work until the RTD build runs.