Is Item start_datetime to end_datetime an inclusive or exclusive range? #1255

Closed
lossyrob opened this issue Sep 27, 2023 · 10 comments · Fixed by #1280

@lossyrob
Collaborator

A start_datetime and end_datetime can be added to Item properties as per the Common Metadata spec.

The end_datetime is defined as "The last or end date and time for the Item, in UTC.".

From this description, it is not clear whether the start_datetime -> end_datetime is an inclusive or exclusive range.

For instance, if there's an annual dataset where the Item's date range is 2022-01-01T00:00:00 - 2023-01-01T00:00:00, does this represent only the year of 2022, or all of 2022 and also the very first second of 2023?

Based on feedback I've heard, I would suggest that a start time inclusive, end time exclusive range would make the most sense in practical terms.
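To make the two readings concrete, here is a minimal sketch using plain Python datetimes. The helper names `covers_inclusive` and `covers_exclusive` are hypothetical and only illustrate the question; they are not part of any STAC library.

```python
from datetime import datetime, timezone

# Item range from the annual-dataset example above.
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 1, tzinfo=timezone.utc)


def covers_inclusive(t):
    """Closed interval [start, end]: the end instant belongs to the Item."""
    return start <= t <= end


def covers_exclusive(t):
    """Half-open interval [start, end): the end instant does not belong."""
    return start <= t < end


new_year_2023 = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(covers_inclusive(new_year_2023))  # True
print(covers_exclusive(new_year_2023))  # False
```

Under the inclusive reading the Item claims the first instant of 2023; under the exclusive reading it covers 2022 only.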

@lossyrob lossyrob added this to the 1.1 milestone Sep 27, 2023
@bmcandr

bmcandr commented Sep 27, 2023

Hi @lossyrob,

I'm with @impactobservatory and Dan asked me to share my thoughts with you on this topic. This has been a source of internal debate for us, so we'd appreciate clarity/guidance. We have generally operated under the assumption that end_datetime is exclusive, as you described, and therefore set the start/end_datetime fields on our annual LULC map Items to 2022-01-01T00:00:00 and 2023-01-01T00:00:00 respectively. This produces somewhat unexpected results when performing searches using pystac_client, for example. I might naively expect that performing a search with the argument datetime="2022" would return only the Items representing our 2022 map, but pystac_client's date -> datetime expansion returns Items from both 2021 and 2022. (I originally wrote that 0 Items were returned; that was incorrect.)

I look forward to hearing what the community thinks!

@gadomski
Collaborator

Can you provide more information about your query and your system? E.g.

  • What backend+server combination are you using?
  • Can you share the code you're running, including (if possible) the endpoint you're hitting?

That way we can dig in a bit more. Thanks!

@bmcandr

bmcandr commented Sep 27, 2023

(FYI, I made a correction to the final sentence of my earlier comment.)

We're running a fork of stac-fastapi w/ pgstac backend (we haven't upgraded to stac-fastapi-pgstac yet). Our internal STAC server is private, but the behavior I described is reproducible with our io-lulc-9-class STAC Collection on PCH and pystac_client:

```python
from pystac_client import Client

client = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# search by year only
search = client.search(collections=["io-lulc-9-class"], datetime="2022")
items = search.item_collection()
print(len(items))
# 1482

# check start dates
print({item.properties["start_datetime"] for item in items})
# {'2021-01-01T00:00:00Z', '2022-01-01T00:00:00Z'}
# 2021 Items are included because their end date is 2022-01-01T00:...

# using a query we can get exactly what we want
query = {"start_datetime": {"eq": "2022-01-01T00:00:00Z"}}
search = client.search(collections=["io-lulc-9-class"], query=query)
items = search.item_collection()
print(len(items))
# 756

# should only have 2022 Items
print({item.properties["start_datetime"] for item in items})
# {'2022-01-01T00:00:00Z'}
```
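
The overlap can be reproduced offline. The sketch below is an illustration under two assumptions: that a year-only value such as "2022" is expanded to the closed range 2022-01-01T00:00:00Z/2022-12-31T23:59:59Z, and that the server matches any Item whose range intersects the search range. The function names are hypothetical and are not pystac_client or pgstac API.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def expand_year(year):
    """Hypothetical expansion of a year-only string into a closed range."""
    y = int(year)
    return datetime(y, 1, 1, tzinfo=UTC), datetime(y, 12, 31, 23, 59, 59, tzinfo=UTC)


def closed_overlap(item, search):
    """Both ranges closed: touching at a single instant counts as a match."""
    return item[0] <= search[1] and item[1] >= search[0]


def half_open_item_overlap(item, search):
    """Item read as [start, end): touching at the Item's end is not a match."""
    return item[0] <= search[1] and item[1] > search[0]


# Two annual Items stored with end_datetime equal to the next year's start.
items = {
    "2021": (datetime(2021, 1, 1, tzinfo=UTC), datetime(2022, 1, 1, tzinfo=UTC)),
    "2022": (datetime(2022, 1, 1, tzinfo=UTC), datetime(2023, 1, 1, tzinfo=UTC)),
}
search = expand_year("2022")

closed_hits = [name for name, rng in items.items() if closed_overlap(rng, search)]
half_open_hits = [name for name, rng in items.items() if half_open_item_overlap(rng, search)]
print(closed_hits)     # ['2021', '2022']
print(half_open_hits)  # ['2022']
```

The 2021 Item matches the closed test only because its end instant equals the first instant of the search range.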

@aliasmrchips

The suggestion is that it is just maybe worth being clear in the spec. Since it is not specified, it is up to those implementing the spec to decide how it should behave, which could lead to confusion. I would second @lossyrob's suggestion that date ranges be interpreted as [start_datetime, end_datetime).

@gadomski
Collaborator

> (FYI, I made a correction to the final sentence of my earlier comment.)

Ah ok thanks, that makes more sense -- the 0 returns was surprising to me.

> The suggestion is that it is just maybe worth being clear in the spec.

Agreed.

From the implementation side of things, most code I've seen assumes inclusive -- e.g. pystac-client makes an "inclusive" range, and pgstac uses inclusive search: https://github.com/stac-utils/pgstac/blob/e3ae32d5e4c4b29731026ed9133add0d2a04eb73/src/pgstac/sql/004_search.sql#L158. That's not to say that's correct, or how it will be specified in the spec; that's just to explain behavior.

@matthewhanson
Collaborator

I hadn't really thought about this before @lossyrob brought it up, and just always thought that inclusive makes the most sense, as I (and probably most people) think in terms of date only e.g., 2022-01-01/2022-12-31, rather than time.

The behavior of pystac-client when you specify dates and not time is to fill the first date with the earliest time, and the second date with the latest time, e.g., 2022-01-01T00:00:00Z/2022-12-31T23:59:59.9Z

I think for a human this makes the most intuitive sense, and although the spec may not be clear I think that was the intention.

However, what gives me pause now is theoretically the "latest" time is never going to be the latest time, no matter how many 9's we include, so I'm inclined to move toward an exclusive end since it's the most correct.
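
A small sketch of that concern, using plain Python datetimes and the one-decimal padding from the example above (the variable names are illustrative only):

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Closed range whose end is padded to one decimal place (23:59:59.9).
start = datetime(2022, 1, 1, tzinfo=UTC)
padded_end = datetime(2022, 12, 31, 23, 59, 59, 900000, tzinfo=UTC)

# An instant that is still in 2022 but later than the padded end.
late_instant = datetime(2022, 12, 31, 23, 59, 59, 950000, tzinfo=UTC)

in_padded_closed = start <= late_instant <= padded_end
print(in_padded_closed)  # False: the padded closed range misses it

# An exclusive end needs no padding at all.
exclusive_end = datetime(2023, 1, 1, tzinfo=UTC)
in_half_open = start <= late_instant < exclusive_end
print(in_half_open)  # True
```

Adding more 9's only shrinks the missed window; it never closes it.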

From a practical standpoint I'm not sure it matters one way or the other, as long as users know what the behavior is.

We've got two options:

  • Inclusive end date: Make this clear in documentation. The core tools out there implement it this way, although that should be checked. But this is just a docs change.
  • Exclusive end date: Because the implementations suggest otherwise, if we want to change this we should consider it a change and get it into the 1.1 release. I wouldn't consider this simply a doc change, and tools should be brought up to date when updated for 1.1.

@m-mohr
Collaborator

m-mohr commented Jan 30, 2024

As we are describing data here, it's not directly related to search. Search is a different story and defined in another spec.

Let's say I have a capture that takes two seconds: 2022-01-01T00:00:00Z - 2022-01-01T00:00:02Z (that's what I get from the source metadata).
How am I supposed to make this exclusive? It's the same issue that Matt describes:

> However, what gives me pause now is theoretically the "latest" time is never going to be the latest time, no matter how many 9's we include, so I'm inclined to move toward an exclusive end since it's the most correct.

This also happens here, but the other way around. I'd need to append an infinite number of 0's and a 1 at the end.

Also, datetime is pretty much our equivalent for start_datetime = end_datetime. If we make end_datetime exclusive, this is not true any longer.
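
A minimal sketch of that equivalence in plain Python (not a STAC library call): when start and end are the same instant, only the inclusive reading still contains that instant.

```python
from datetime import datetime, timezone

# An Item whose start_datetime and end_datetime are the same instant.
instant = datetime(2022, 1, 1, tzinfo=timezone.utc)
start = end = instant

closed_contains = start <= instant <= end    # inclusive end
half_open_contains = start <= instant < end  # exclusive end
print(closed_contains)     # True
print(half_open_contains)  # False: the half-open interval is empty
```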

From work in openEO (same discussion), I know that ISO is also not 100% certain about it and changed their definition over time. We ended up making it inclusive, which I think is also the latest status in ISO 8601.

I think I slightly tend towards an inclusive end_datetime, but there will always be pros and cons for both sides.

@LiamBindle

Hi all, thanks for all your work. Just wanted to voice my support for the end date being exclusive. I find it easier to work with intervals that have an exclusive end date because it means a collection of items can have complete temporal coverage without needing to tweak the end date by an undefined amount of time (e.g., a second or a microsecond).

@m-mohr
Collaborator

m-mohr commented Feb 8, 2024

@LiamBindle Isn't that an argument for having the end date inclusive? If it's exclusive and you create data (i.e. this is not the search use case), then you need to tweak the end date by an undefined amount of time.

@m-mohr m-mohr linked a pull request May 14, 2024 that will close this issue
@m-mohr m-mohr closed this as completed May 15, 2024
@LiamBindle

LiamBindle commented May 15, 2024

EDIT: Reopened in #1283

@m-mohr Sorry I missed your response and question. My bad, and I see this has already gone ahead. I'm going to reopen this because I think an inclusive end date introduces a logical flaw, so I'd like to advocate for making end_datetime exclusive one more time. Feel free to reclose if you don't think this needs any more discussion; I'm not trying to make a mountain out of a molehill.

> Isn't that an argument for having the end date inclusive?

I don't think so. Say you have an item that represents an average for the year 2018. When start is inclusive and end is exclusive you have start_datetime="2018-01-01T00:00:00.000000000Z" and end_datetime="2019-01-01T00:00:00.000000000Z". If the end date is inclusive then you need to subtract an undefined amount of time (1ns?) from the ending date. I.e., should it be end_datetime="2018-12-31T23:59:59.999999999Z"?
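
As a rough illustration with plain Python datetimes: Python's datetime only resolves microseconds, so 1 µs stands in for the 1 ns above, which is itself an instance of the "undefined amount" problem.

```python
from datetime import datetime, timezone

UTC = timezone.utc
start = datetime(2018, 1, 1, tzinfo=UTC)

# Exclusive end: the first instant of the next year.
exclusive_end = datetime(2019, 1, 1, tzinfo=UTC)
# Inclusive end: the last instant this clock can represent (assumed 1 µs resolution).
inclusive_end = datetime(2018, 12, 31, 23, 59, 59, 999999, tzinfo=UTC)

print(exclusive_end - start)          # 365 days, 0:00:00
print(inclusive_end - start)          # 364 days, 23:59:59.999999
print(exclusive_end - inclusive_end)  # 0:00:00.000001
```

The size of that last difference depends entirely on the resolution the producer happens to pick.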

More importantly, to respond to your Q higher in this thread:

> Let's say I have a capture that takes two seconds: 2022-01-01T00:00:00Z - 2022-01-01T00:00:02Z (that's what I get from the source metadata).
> How am I supposed to make this exclusive? It's the same issue that Matt describes:

In this case, the end date provided in the source's metadata is already exclusive, isn't it? The period [2022-01-01T00:00:00Z, 2022-01-01T00:00:02Z) has a duration of exactly 2 seconds with an exclusive end date.

But what happens if there is another capture for the next 2 seconds? If the end date is inclusive then two items claim to cover the "2022-01-01T00:00:02Z" instant in time whereas an exclusive end date handles it cleanly. If the end date is inclusive then there is no way to represent a time series where every instant in time is covered by exactly one item.
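
A short sketch of the two back-to-back captures, counting how many Items claim the shared instant under each reading (plain Python, illustrative names only):

```python
from datetime import datetime, timedelta, timezone

t0 = datetime(2022, 1, 1, tzinfo=timezone.utc)

# Two consecutive 2-second captures; the second starts where the first ends.
captures = [
    (t0, t0 + timedelta(seconds=2)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=4)),
]
shared = t0 + timedelta(seconds=2)  # 2022-01-01T00:00:02Z

# Number of captures that contain the shared instant.
closed_count = sum(s <= shared <= e for s, e in captures)
half_open_count = sum(s <= shared < e for s, e in captures)
print(closed_count)     # 2: both Items claim the instant
print(half_open_count)  # 1: exactly one Item claims it
```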
