-
-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Add FSStore #546
Add FSStore #546
Conversation
Thanks Martin, great to see this. |
Thanks a lot @martindurant for getting this started! Let me know when you're ready for a review or some feedback. It would be really nice to implement |
|
@martindurant I was just reloading the underlying issue into my brain's cache this morning, and excited to see this. What is the overall plan, with regard to how this would interact with things like N5Store and NestedDirectoryStore? How can I help? |
The idea is, that nested and consolidated, at least, would be options to this store, no need for extra classes. I don't know enough about the N5 case to know if that is as simple. As far as I know, the little code that is here "works", but needs a good deal of testing and doubtless some massage to get totally right. At the moment, the PR just hijacks some existing nested tests, which all then pass locally, and almost all on CI. |
If this is to fix 540, then I would like to see how I could help update N5Store to use this. The N5Store is a NestedDirectoryStore (also a MutableMapping like FSStore) which lays out the data according the N5 spec. I do not think N5Store can just inherit from FSStore, since it will lose the Directory/NestedDirectory functionality. It's possible that N5Store could take a FSStore in it's c'tor, but that may not be the cleanest way to combine the N5, FS, and NestedDirectory functionality. Is FSStore a possible replacement for a DirectoryStore? Regarding the comment above about the |
In this POC, FSStore is tested as if it were a nested store, but it can do both, depending on the arguments passed. That's what I meant bu not needing to have separate classes.
OK, so simple enough |
Just to say FWIW I think the best way forward for N5Store is to modify it so it becomes a transformation layer over another store, as suggested here and in zarr-developers/n5py#9. That way N5Store can be used with any type of underlying storage. I.e., work on N5Store can be orthogonal to this PR. |
@martindurant I did a quick swap on N5Store to have it inherit from FSStore (and use the FSStore listdir within N5Store.listdir), and it does seem to correctly read/write a single array to GCS. I didn't test other functionality, like groups or anything related to listdir explicitly. @alimanfoo OK. I'll pick up the discussion about the correct interaction of N5Store and FSStore on 540. |
Right, there will be problems - so far I only ran the nestedstore tests, and see that they pass. You must of course pass
I haven't got very far yet, but the code here should show that it isn't too complicated. |
This demonstrates the purpose of this PR! Now allows, for exmple `zarr.open('http://localhost:8000')`, although there is no way to pass on further args with this method yet.
Hello @martindurant! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-09-11 14:08:37 UTC |
zarr/storage.py
Outdated
@@ -949,16 +949,17 @@ def atexit_rmglob(path, | |||
class FSStore(MutableMapping): | |||
|
|||
def __init__(self, url, normalize_keys=True, key_separator='.', | |||
**storage_options): | |||
mode='r', **storage_options): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, the TestNestedDirectoryStore failure becomes:
_____________________________________________ TestNestedDirectoryStore.test_chunk_nesting _____________________________________________
self = <zarr.tests.test_storage.TestNestedDirectoryStore testMethod=test_chunk_nesting>
def test_chunk_nesting(self):
store = self.create_store()
# any path where last segment looks like a chunk key gets special handling
> store['0.0'] = b'xxx'
zarr/tests/test_storage.py:773:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <zarr.storage.FSStore object at 0x7fb7501c2c18>, key = '0.0', value = b'xxx'
def __setitem__(self, key, value):
if self.mode == 'r':
> raise PermissionError
E PermissionError
zarr/storage.py:974: PermissionError
We realized recently that the default mode for a regular DirectoryStore is w
meaning that directories are automatically created:
In [18]: z=zarr.open('this/file/does/not/exist')
In [19]: list(z.groups()), list(z.arrays())
Out[19]: ([], [])
I was debating suggesting a change to "r" on a separate issue, but we'll probably need to keep it in mind for this PR as well.
If I force a "w" mode:
@@ -764,7 +764,7 @@ class TestNestedDirectoryStore(TestDirectoryStore, unittest.TestCase):
path = tempfile.mkdtemp()
atexit.register(atexit_rmtree, path)
store = NestedDirectoryStore(path, normalize_keys=normalize_keys,
- key_separator='/', auto_mkdir=True)
+ key_separator='/', auto_mkdir=True, mode="w")
return store
then I again see the previous error:
zarr/tests/test_storage.py:230:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
zarr/storage.py:970: in __getitem__
return self.map[key]
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/mapping.py:76: in __getitem__
result = self.fs.cat(k)
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/spec.py:587: in cat
return self.open(path, "rb").read()
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/spec.py:775: in open
**kwargs
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/implementations/local.py:108: in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/implementations/local.py:175: in __init__
self._open()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <fsspec.implementations.local.LocalFileOpener object at 0x7faa10221908>
def _open(self):
if self.f is None or self.f.closed:
if self.autocommit or "w" not in self.mode:
> self.f = open(self.path, mode=self.mode)
E IsADirectoryError: [Errno 21] Is a directory: '/var/folders/z5/txc_jj6x5l5cm81r56ck1n9c0000gn/T/tmpwebfv41i/c'
/opt/anaconda/envs/py36/lib/python3.6/site-packages/fsspec/implementations/local.py:180: IsADirectoryError
which seems related, but I'm unclear how to get started with it, since it seems to be saying that fsspec
expects one of the intermediate keys to be a file rather than a directory. I'll keep poking around but suggestions welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably should have started with the non-nested stores for testing...
Exactly when directory trees should be removed and when files should be removed to make way for directories, I tries to reverse-engineer from the existing tests.
The actual error here, is that zarr expects a KeyError
when trying to read "c", but fsspec
as of fsspec/filesystem_spec#259 only converts FileNotFound
into a KeyError
. I suppose IsADirectory
should be caught too, or the code should call fs.isfile()
?
This appears to pass all tests with fsspec master, where I have substituted FSStore for DirectoryStore and NestedDirectoryStore in test_storage (not the final solution!). Note that FSStore now directly implements consolidation, also in write mode (which is controversial, since it makes each metadata write take two remote calls, and introduces possible sync problems). |
Would it be useful to have fsspec master in this POC to show it will pass? |
I appreciate all the work on this from @martindurant. At the same time, I do find it confusing to hear things like "FSStore now directly implements consolidation". We should have a clear separation between a storage layer and the zarr layer. It makes me nervous to see more features seeping into the storage layer. |
In my mind, fsspec is the storage layer, and everything in zarr.storage can be specific/specialised (such as the original conslolidated). You could use the previous consolidated with FSStore if you wanted, but this new version seems to me to remove duplication, as well as adding write-mode. On of the issues I am trying to solve is "which store do I subclass": anyone coming to zarr and wanting to implement something new now would have to accommodate regular keys, nested keys or consolidated, not any combination. |
zarr/storage.py
Outdated
self.fs = self.map.fs # for direct operations | ||
self.mode = mode | ||
self.exceptions = exceptions | ||
# TODO: should warn if consolidated and write mode? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO ?
zarr/storage.py
Outdated
self.consolidated = consolidated | ||
self.metadata_key = metadata_key | ||
if consolidated: | ||
self.meta = json.loads(self.map.get(metadata_key, b"{}").decode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a json_loads
utils function used in other place of this file:
def json_loads(s):
"""Read JSON in a consistent way."""
return json.loads(ensure_text(s, 'ascii'))
Do you want to use it ?
Dropping in comments from today's discussion on the community call. I have no objection to this PR, would be great to see it go in. Two points that come up are:
@martindurant happy for you to make the call on how to round this one off. |
Has anyone any idea why numcodecs failed to build during the docs stage here ? |
Due to the now yanked https://pypi.org/project/numcodecs/0.7.0/ as we try to get the wheels deployed. (Sorry 'bout that) |
Should the problem go away then? Also, I have coverage failure even though "169 of 169 new or added lines in 5 files covered. (100.0%)" (because fsspec is not tested on py35?). That's annoying... |
(never mind on the latter, coverage just updated itself) |
It should already be gone! Oddly travis is still finding it, perhaps because of some caching??
All the more reason to get 0.7.1 out ASAP. |
@joshmoore : is that S3 test OK for you? |
(I plan to merge this as is, unless there are further comments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass, thanks @martindurant. I also tried adding a ```@pytest.mark.skipif(have_fsspec is False, reason="needs fsspec")
class TestNestedFSStore2(TestFSStore):
def create_store(self, normalize_keys=False):
path = tempfile.mkdtemp()
atexit.register(atexit_rmtree, path)
store = FSStore(path, normalize_keys=normalize_keys,
key_separator='/', auto_mkdir=True)
return store
locally which equally passed. :+1:
Quick question -- is there a good reason why FSStore defaults to |
I had the same question. I've need to hard-code the argument in all of my uses. cc: @martindurant |
I certainly don't mind the default changing, if that is the consensus. I would like to claim there was a good reason for the current , but ... |
Bubbling this up into a new issue: #739 |
Fixes #540
Ref #373 (comment) (@rabernat )
Introduces a short Store implementation for generic fsspec url+options. Allows both lowercasing of keys and choice between '.' and '/'-based ("nested") keys.
For testing, have hijacked the TestNestedDirectoryStore tests just as an example - this is not how things will remain.