Is Item start_datetime to end_datetime an inclusive or exclusive range? #1255
Comments
Hi @lossyrob, I'm with @impactobservatory and Dan asked me to share my thoughts with you on this topic. This has been a source of internal debate for us so we'd appreciate clarity/guidance. We have generally operated under the assumption that I look forward to hearing what the community thinks! |
Can you provide more information about your query and your system? E.g.
That way we can dig in a bit more. Thanks! |
(FYI, I made a correction to the final sentence of my earlier comment.) We're running a fork of
from pystac_client import Client
client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = client.search(collections=["io-lulc-9-class"], datetime="2022")
print(len(search.item_collection()))
# 1482
# check start dates
print({item.properties["start_datetime"] for item in search.item_collection()})
# {'2021-01-01T00:00:00Z', '2022-01-01T00:00:00Z'}
# 2021 Items are included because their end date is 2022-01-01T00:...
# using a query we can get exactly what we want
query = {"start_datetime": {"eq": "2022-01-01T00:00:00Z"}}
search = client.search(collections=["io-lulc-9-class"], query=query)
print(len(search.item_collection()))
# 756
# should only have 2022 Items
print({item.properties["start_datetime"] for item in search.item_collection()})
# {'2022-01-01T00:00:00Z'} |
The suggestion is that it may be worth being clear in the spec. Since it is not specified, it is up to those implementing the spec to decide how it should behave, which could lead to confusion. I would second @lossyrob's suggestion that date ranges be interpreted as |
Ah ok thanks, that makes more sense -- the
Agreed. From the implementation side of things, most code I've seen assumes inclusive -- e.g. pystac-client makes an "inclusive" range, and pgstac uses inclusive search: https://github.com/stac-utils/pgstac/blob/e3ae32d5e4c4b29731026ed9133add0d2a04eb73/src/pgstac/sql/004_search.sql#L158. That's not to say that's correct, or how it will be specified in the spec; it's just to explain current behavior. |
I hadn't really thought about this before @lossyrob brought it up, and just always thought that inclusive makes the most sense, as I (and probably most people) think in terms of dates only, e.g., 2022-01-01/2022-12-31, rather than times. The behavior of pystac-client when you specify dates and not times is to fill the first date with the earliest time and the second date with the latest time, e.g., 2022-01-01T00:00:00Z/2022-12-31T23:59:59.9Z. I think for a human this makes the most intuitive sense, and although the spec may not be clear, I think that was the intention. However, what gives me pause now is that, theoretically, the "latest" time is never going to be the latest time, no matter how many 9's we include, so I'm inclined to move toward an exclusive end since it's the most correct. From a practical standpoint I'm not sure it matters one way or the other, as long as users know what the behavior is. We've got two options:
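To make the "how many 9's" concern concrete, here is a minimal sketch of the two ways a date-only range could be expanded into datetimes. The helper names `inclusive_bounds` and `half_open_bounds` are hypothetical and this is not pystac-client's actual implementation; it only uses the Python standard library.

```python
from datetime import date, datetime, time, timedelta, timezone


def inclusive_bounds(start: date, end: date):
    """Closed range: pad the end date to the last representable microsecond."""
    lo = datetime.combine(start, time.min, tzinfo=timezone.utc)
    hi = datetime.combine(end, time.max, tzinfo=timezone.utc)
    return lo, hi


def half_open_bounds(start: date, end: date):
    """Half-open range: the end bound is midnight after `end` and is excluded."""
    lo = datetime.combine(start, time.min, tzinfo=timezone.utc)
    hi = datetime.combine(end + timedelta(days=1), time.min, tzinfo=timezone.utc)
    return lo, hi


lo_inc, hi_inc = inclusive_bounds(date(2022, 1, 1), date(2022, 12, 31))
lo_exc, hi_exc = half_open_bounds(date(2022, 1, 1), date(2022, 12, 31))

print(hi_inc.isoformat())  # 2022-12-31T23:59:59.999999+00:00
print(hi_exc.isoformat())  # 2023-01-01T00:00:00+00:00
```

The closed form depends on the clock resolution (microseconds here), while the half-open form does not need any padding.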
|
As we are describing data here, it's not directly related to search. Search is a different story and defined in another spec. Let's say I have a capture that takes two seconds: 2022-01-01T00:00:00Z - 2022-01-01T00:00:02Z (that's what I get from the source metadata).
This also happens here, but the other way around. I'd need to append an infinite number of 0's and a 1 at the end. Also, datetime is pretty much our equivalent of start_datetime = end_datetime. If we make end_datetime exclusive, this is no longer true. From work in openEO (same discussion), I know that ISO is also not 100% certain about it and changed its definition over time. We ended up making it inclusive, which I think is also the latest status in ISO 8601. I think I slightly tend towards an inclusive end_datetime, but there will always be pros and cons for both sides. |
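The point about datetime being equivalent to start_datetime = end_datetime can be illustrated with a small sketch. The functions `covers_closed` and `covers_half_open` are hypothetical helpers written for this example, not part of any STAC library.

```python
from datetime import datetime, timezone


def covers_closed(start, end, t):
    """Inclusive end: start <= t <= end."""
    return start <= t <= end


def covers_half_open(start, end, t):
    """Exclusive end: start <= t < end."""
    return start <= t < end


# An instantaneous Item where start_datetime == end_datetime
instant = datetime(2022, 1, 1, tzinfo=timezone.utc)

print(covers_closed(instant, instant, instant))     # True
print(covers_half_open(instant, instant, instant))  # False
```

With an exclusive end, a range whose start and end coincide contains no instant at all, so it can no longer stand in for a single datetime.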
Hi all, thanks for all your work. Just wanted to voice my support for the end date being exclusive. I find it easier to work with intervals that have an exclusive end date because it means a collection of items can have complete temporal coverage without needing to tweak the end date by an undefined amount of time (e.g., a second or a microsecond). |
@LiamBindle Isn't that an argument for having the end date inclusive? If it's exclusive and you create data (i.e. this is not the search use case), then you need to tweak the end date by an undefined amount of time. |
EDIT: Reopened in #1283 @m-mohr Sorry I missed your response and question. My bad, and I see this has already gone ahead. I'm going to reopen this because I think an inclusive end date introduces a logical flaw, so I'd like to advocate for making
I don't think so. Say you have an item that represents an average for the year 2018. When start is inclusive and end is exclusive you have More importantly, to respond to your Q higher in this thread:
In this case, the provided end date in the source's metadata is already exclusive, isn't it? The period [2022-01-01T00:00:00Z, 2022-01-01T00:00:02Z) has a duration of exactly 2 seconds with an exclusive end date. But what happens if there is another capture for the next 2 seconds? If the end date is inclusive, then two items claim to cover the "2022-01-01T00:00:02Z" instant in time, whereas an exclusive end date handles it cleanly. If the end date is inclusive, then there is no way to represent a time series where every instant in time is covered by exactly one item. |
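The back-to-back capture argument can be sketched as follows. The two captures and the helper `count_covering` are made up for illustration; only the 2-second example from the thread is assumed.

```python
from datetime import datetime, timedelta, timezone

t0 = datetime(2022, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
two_s = timedelta(seconds=2)

# Two consecutive 2-second captures that share the boundary 00:00:02Z
captures = [(t0, t0 + two_s), (t0 + two_s, t0 + 2 * two_s)]
boundary = t0 + two_s


def count_covering(intervals, t, end_inclusive):
    """Count how many (start, end) intervals contain the instant t."""
    if end_inclusive:
        return sum(1 for start, end in intervals if start <= t <= end)
    return sum(1 for start, end in intervals if start <= t < end)


closed_hits = count_covering(captures, boundary, end_inclusive=True)
half_open_hits = count_covering(captures, boundary, end_inclusive=False)

print(closed_hits)     # 2
print(half_open_hits)  # 1
```

Under the inclusive reading the shared boundary instant belongs to both captures; under the half-open reading it belongs only to the second one.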
A start_datetime and end_datetime can be added to Item properties as per the Common Metadata spec.
The end_datetime is defined as "The last or end date and time for the Item, in UTC.".
From this description, it is not clear whether the start_datetime -> end_datetime is an inclusive or exclusive range.
For instance, if there's an annual dataset where the Item's date range is 2022-01-01T00:00:00 - 2023-01-01T00:00:00, does this represent only the year of 2022, or all of 2022 and also the very first second of 2023?
Based on feedback I've heard, I would suggest that a start time inclusive, end time exclusive range would make the most sense in practical terms.