Tree option to omit array metadata (shape, dtype) #224
Comments
cc @jakirkham - I was playing around with S3 storage and noticed that |
What if this metadata is also cached? |
Caching would help the second time you want to view the tree, but it would still be slow to build the tree first time round. |
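It’s true. Though tree isn’t the only thing that may use shape and dtype. Any thoughts on making meta=False by default? Just thinking about simplifying what sounds like the common case (particularly for cloud storage). |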
For caching metadata, the new
If you have a lot of data and want to prevent chunk data from evicting metadata from the cache, then you could do this:
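The idea of keeping metadata in its own cache, separate from chunk data, can be illustrated without zarr at all. The following is only a sketch in plain Python: `MetadataCache` is a hypothetical name, the wrapped store is assumed to be any mapping from keys to bytes, and the set of metadata key names is an assumption based on the '.zarray' and '.zgroup' keys mentioned in this thread (plus '.zattrs').

```python
class MetadataCache:
    """Read-through wrapper that caches only metadata keys (hypothetical helper).

    Chunk reads pass straight through to the wrapped store, so they can
    never evict metadata from the cache.
    """

    # Assumed metadata key names; adjust for the store layout in use.
    METADATA_NAMES = (".zarray", ".zgroup", ".zattrs")

    def __init__(self, store):
        self.store = store
        self._cache = {}

    def __getitem__(self, key):
        # The last path segment decides whether the key is metadata.
        if key.rsplit("/", 1)[-1] in self.METADATA_NAMES:
            if key not in self._cache:
                self._cache[key] = self.store[key]
            return self._cache[key]
        # Chunk data is never cached here.
        return self.store[key]
```

With this arrangement, repeated reads of `a/.zarray` hit the wrapped store once, while every read of a chunk key such as `a/0` still goes to the store.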
Yes tempted to make |
I'm coming around to having the default be as simple as possible. I think
it is a good principle for the default to be the cheapest/fastest thing to
compute. I started trying to code this up but then realised that even if we
don't show any array properties, the current implementation of the tree
traversal still retrieves the metadata for each array and group in the
hierarchy, because it relies on instantiating Array and Group objects,
which always retrieve their metadata. It would be possible to implement the
initial construction of the tree purely by scanning the store keys and
looking for keys ending in '.zarray' and '.zgroup', but that is a bit more of a
change than I was originally thinking. Will mull on it for a bit.
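As a rough illustration of the key-scanning approach described above (not zarr's actual implementation), the hierarchy can be recovered from key names alone. The function names `tree_from_keys` and `render` and the nested-dict node layout are hypothetical, and keys are assumed to be '/'-separated paths.

```python
def tree_from_keys(keys):
    """Build a nested hierarchy from store key names only; no values are read."""
    root = {"type": None, "children": {}}
    for key in keys:
        prefix, _, name = key.rpartition("/")
        if name not in (".zarray", ".zgroup"):
            continue  # chunk or attribute key, irrelevant for the tree shape
        node = root
        if prefix:
            # Walk (and create as needed) one node per path segment.
            for part in prefix.split("/"):
                node = node["children"].setdefault(
                    part, {"type": None, "children": {}}
                )
        node["type"] = "array" if name == ".zarray" else "group"
    return root


def render(node, name="/", indent=0):
    """Return the tree as a list of indented lines, children sorted by name."""
    lines = [" " * indent + name]
    for child_name in sorted(node["children"]):
        lines.extend(render(node["children"][child_name], child_name, indent + 4))
    return lines
```

Calling `render(tree_from_keys(store.keys()))` would then need only a key listing from the store, with no per-array metadata reads.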
On Wednesday, January 3, 2018, jakirkham wrote:
It’s true. Though tree isn’t the only thing that may use shape and dtype.
Any thoughts on making meta=False by default? Just thinking about
simplifying what sounds like the common case (particularly for cloud
storage).
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Big Data Institute Building
Old Road Campus
Roosevelt Drive
Oxford
OX3 7LF
United Kingdom
Phone: +44 (0)1865 743596
Email: alimanfoo@googlemail.com
Web: http://alimanfoo.github.io/ <http://purl.org/net/aliman>
Twitter: https://twitter.com/alimanfoo
|
Should we make this a release blocker and/or should we add a note in tree’s docs that its behavior is still evolving? |
I'm inclined to push this to the next release, I'd really like to get 2.2
out. I'll add a note to the docstring if you're agreeable.
On Wednesday, January 3, 2018, jakirkham wrote:
Should we make this a release blocker and/or should we add a note in tree’s
docs that its behavior is still evolving?
|
Totally agree. |
Has there been any progress on this? I am noticing very large wall times (currently at ~6 min) with data stored on GCP. I am new to zarr in general, so any advice to reduce this would be great too!
|
This may be a different issue. Would suggest looking into consolidated metadata |
When using the tree() function/method, currently arrays are printed with shape and dtype. This is useful diagnostic information but requires that the .zarray resource is retrieved and read for every array in the tree. This is not an issue for data stored locally, but can be an issue for remote storage as retrieving each .zarray resource will require a network round-trip.

Proposed to add an option meta=True to the tree() function/method, which if set to meta=False will omit the array metadata in the output, and thus building the tree representation will require only retrieving the list of keys from the store.
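To make the cost difference concrete, here is a small sketch that counts value reads with and without array metadata. It does not use zarr: `CountingStore` and `describe_arrays` are hypothetical names, a dict stands in for the remote store, and each value read is treated as one network round-trip. The `meta` argument mirrors the proposed option.

```python
import json


class CountingStore(dict):
    """Dict-backed store that counts value reads (a stand-in for round-trips)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


def describe_arrays(store, meta=True):
    """List array paths; with meta=True also read each .zarray for shape and dtype."""
    lines = []
    for key in sorted(store.keys()):  # key listing only, no value reads
        if not key.endswith(".zarray"):
            continue
        path = key[: -len(".zarray")].rstrip("/") or "/"
        if meta:
            info = json.loads(store[key])  # one read per array
            lines.append(f"{path} {tuple(info['shape'])} {info['dtype']}")
        else:
            lines.append(path)
    return lines
```

With `meta=False` the read counter stays at zero however many arrays the store holds, whereas `meta=True` adds one read per array.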