[C++] compute::LocalTimestamp() Performs incorrect conversion #45751

Open
gowerc opened this issue Mar 11, 2025 · 3 comments

gowerc commented Mar 11, 2025

Describe the bug, including details regarding any error messages, version, and platform.

Apologies in advance if I've made a mistake here; I am relatively new to the Arrow C++ API and to handling timestamps. That being said, I think there might be a bug in the compute::LocalTimestamp() function (at least, it appears to be producing results I wouldn't have expected):

For example, take a timestamp (in seconds) of:

2222997212 = Monday, June 11, 2040 03:13:32 UTC
           = Sunday, June 10, 2040 23:13:32 America/New_York (EDT)
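To double-check the calendar arithmetic without relying on any timezone database, the raw epoch-second count can be decoded as a UTC date. This is a minimal, stdlib-only sketch (not part of Arrow) built on Howard Hinnant's public-domain civil_from_days algorithm; CivilDate, CivilFromDays and DescribeEpochSeconds are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

struct CivilDate {
  int64_t year;
  unsigned month;  // [1, 12]
  unsigned day;    // [1, 31]
};

// Days since 1970-01-01 -> proleptic Gregorian date
// (Howard Hinnant's civil_from_days algorithm).
CivilDate CivilFromDays(int64_t z) {
  z += 719468;
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  const unsigned doe = static_cast<unsigned>(z - era * 146097);                // [0, 146096]
  const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               // [0, 365]
  const unsigned mp = (5 * doy + 2) / 153;                                     // [0, 11]
  const unsigned d = doy - (153 * mp + 2) / 5 + 1;                             // [1, 31]
  const unsigned m = mp < 10 ? mp + 3 : mp - 9;                                // [1, 12]
  const int64_t y = static_cast<int64_t>(yoe) + era * 400 + (m <= 2 ? 1 : 0);
  return {y, m, d};
}

// Renders epoch seconds as "YYYY-MM-DD HH:MM:SS Weekday", reading the count as UTC.
std::string DescribeEpochSeconds(int64_t seconds) {
  static const char* const kWeekdays[7] = {"Sunday",   "Monday", "Tuesday", "Wednesday",
                                           "Thursday", "Friday", "Saturday"};
  // Floor-divide so that negative counts map to the previous day.
  int64_t days = seconds / 86400;
  int64_t secs_of_day = seconds % 86400;
  if (secs_of_day < 0) {
    secs_of_day += 86400;
    --days;
  }
  // 1970-01-01 was a Thursday (index 4 when Sunday = 0).
  int64_t wd = (days + 4) % 7;
  if (wd < 0) wd += 7;
  const CivilDate c = CivilFromDays(days);
  char buf[96];
  std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u %02lld:%02lld:%02lld %s",
                static_cast<long long>(c.year), c.month, c.day,
                static_cast<long long>(secs_of_day / 3600),
                static_cast<long long>(secs_of_day / 60 % 60),
                static_cast<long long>(secs_of_day % 60), kWeekdays[wd]);
  return buf;
}
```

For example, DescribeEpochSeconds(2222997212) returns "2040-06-11 03:13:32 Monday", agreeing with the breakdown above.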

Assuming the value is stored in a timestamp array with the America/New_York timezone (EDT at that date), I would have expected compute::LocalTimestamp() to produce:

2222982812 = Sunday, June 10, 2040 23:13:32 UTC

However, in practice I observe:

2222979212 = Sunday, June 10, 2040 22:13:32 UTC
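Subtracting the input from each result makes the discrepancy explicit. The expected value applies the EDT offset, while the observed value is exactly one hour earlier, which is what applying the standard-time (EST) offset would give, even though under the current US rules this instant falls within daylight saving time:

$$
\begin{aligned}
2222982812 - 2222997212 &= -14400\ \text{s} = -4\ \text{h} \quad \text{(EDT, UTC}-4\text{)} \\
2222979212 - 2222997212 &= -18000\ \text{s} = -5\ \text{h} \quad \text{(EST, UTC}-5\text{)}
\end{aligned}
$$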

I tried searching but I couldn't see any other issues (open or closed) related to this.


I am running on Fedora 41 using libarrow-16.1.0-12.fc41.x86_64 (the latest available from the Fedora package manager).

--- EDIT - Just tested against arrow-19.0.1 and am still getting the same behavior ---

Code I am running to reproduce this:

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/compute/api.h>
#include <iostream>
#include <memory>


arrow::Status RunMain() {
    // Create timestamp array with the target value
    arrow::TimestampBuilder builder(
        arrow::timestamp(arrow::TimeUnit::SECOND, "America/New_York"),
        arrow::default_memory_pool()
    );
    ARROW_RETURN_NOT_OK(builder.Append(2222997212));
    ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::Array> array_raw, builder.Finish());
    auto array = std::static_pointer_cast<arrow::TimestampArray>(array_raw);


    // Display the stored (UTC-relative) value
    std::cout << "Value = " << array->Value(0) << std::endl; // 2222997212

    // Convert to a timezone-naive local timestamp, then display the value again
    ARROW_ASSIGN_OR_RAISE(
        auto array_converted_raw,
        arrow::compute::LocalTimestamp(array)
    );
    auto array_converted = std::static_pointer_cast<arrow::TimestampArray>(array_converted_raw.make_array());
    std::cout << "Value = " << array_converted->Value(0) << std::endl; // observed 2222979212, expected 2222982812

    return arrow::Status::OK();
}


int main (int argc, char** argv) {
    arrow::Status st = RunMain();
    if (!st.ok()) {
        std::cerr << st << std::endl;
        return 1;
    }
    return 0;
}

Component(s)

C++

gowerc changed the title "compute::LocalTimestamp() Resulting in incorrect conversion" to "compute::LocalTimestamp() Performs incorrect conversion" on Mar 11, 2025
gowerc (Author) commented Mar 11, 2025

Just to add: experimenting with a different timezone library (link) gives the expected value of 2222982812:

#include <iostream>
#include <chrono>
#include <date/date.h>
#include <date/tz.h>


int main() {
    // The stored UTC-relative instant
    date::sys_seconds utc_time{std::chrono::seconds(2222997212)};
    // The same instant viewed in America/New_York
    date::zoned_time ny_time{"America/New_York", utc_time};
    std::cout << "Epoch seconds:  " << ny_time.get_sys_time().time_since_epoch().count() << std::endl;
    std::cout << "UTC time:       " << date::format("%F %T %Z", utc_time) << '\n';
    std::cout << "NY time:        " << date::format("%F %T %Z", ny_time) << '\n';
    // Wall-clock (timezone-naive) local time, counted in seconds from the local epoch
    date::local_seconds naive_local = ny_time.get_local_time();
    std::cout << "NY time naive:  " << naive_local.time_since_epoch().count() << "\n";
}

Output:

Epoch seconds:  2222997212
UTC time:       2040-06-11 03:13:32 UTC
NY time:        2040-06-10 23:13:32 EDT
NY time naive:  2222982812

kou changed the title "compute::LocalTimestamp() Performs incorrect conversion" to "[C++] compute::LocalTimestamp() Performs incorrect conversion" on Mar 11, 2025
kou (Member) commented Mar 12, 2025

TimestampArray values don't depend on the timezone; TimestampArray::type() holds the timezone information instead. If you want to get seconds offset by the timezone, you need to convert it by yourself or we may want to add a new compute kernel for it.

BTW, why do you want to get offset-ed seconds?

FYI: the documentation for local_timestamp(): https://arrow.apache.org/docs/cpp/compute.html#timezone-handling

gowerc (Author) commented Mar 12, 2025

Hi @kou, thank you for your time and reply.

You need to convert it by yourself or we may want to add a new compute kernel for it.

Apologies, I am confused: from the documentation I thought that was the exact purpose of the local_timestamp() function. In particular, from the docs:

local_timestamp function converts UTC-relative timestamps to local “timezone-naive” timestamps. The timezone is taken from the timezone metadata of the input timestamps.

At least, the way it is written implies that it performs the following calculation (which is what I am looking for):

$$ t_{\text{local}} = t_{\text{UTC}} + \text{offset}(\text{timezone}) $$
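The formula above can be made concrete for America/New_York. Note that the offset depends on the instant as well as on the zone (EDT vs EST), so in full it reads t_local = t_utc + offset(timezone, t_utc). The following is a minimal, stdlib-only sketch, not Arrow's implementation: NewYorkUtcOffsetSeconds and NewYorkNaiveLocalSeconds are hypothetical helpers that hard-code the current US rule (POSIX TZ string EST5EDT,M3.2.0,M11.1.0, i.e. DST from the second Sunday of March to the first Sunday of November) and ASSUME it applies to every year, including 2040. A real tool should consult a timezone database instead.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
int64_t DaysFromCivil(int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Floor division of epoch seconds into whole days.
int64_t FloorDays(int64_t seconds) {
  int64_t days = seconds / 86400;
  if (seconds % 86400 < 0) --days;
  return days;
}

// Day number of the n-th Sunday (n >= 1) of month m in year y.
int64_t NthSunday(int64_t y, unsigned m, int n) {
  const int64_t first = DaysFromCivil(y, m, 1);
  int64_t wd = (first + 4) % 7;  // 1970-01-01 was a Thursday; 0 = Sunday
  if (wd < 0) wd += 7;
  return first + (7 - wd) % 7 + 7 * (n - 1);
}

// Hypothetical helper: UTC offset for America/New_York, ASSUMING the rule
// "EST5EDT,M3.2.0,M11.1.0" holds for every year.
int32_t NewYorkUtcOffsetSeconds(int64_t utc_seconds) {
  const int64_t day = FloorDays(utc_seconds);
  // Find the UTC calendar year containing `day`.
  int64_t y = 1970 + day / 366;
  while (DaysFromCivil(y + 1, 1, 1) <= day) ++y;
  while (DaysFromCivil(y, 1, 1) > day) --y;
  // DST starts at 02:00 EST (07:00 UTC) on the second Sunday of March and
  // ends at 02:00 EDT (06:00 UTC) on the first Sunday of November.
  const int64_t dst_start = NthSunday(y, 3, 2) * 86400 + 7 * 3600;
  const int64_t dst_end = NthSunday(y, 11, 1) * 86400 + 6 * 3600;
  const bool in_dst = utc_seconds >= dst_start && utc_seconds < dst_end;
  return in_dst ? -4 * 3600 : -5 * 3600;
}

// t_local = t_utc + offset(timezone, t_utc)
int64_t NewYorkNaiveLocalSeconds(int64_t utc_seconds) {
  return utc_seconds + NewYorkUtcOffsetSeconds(utc_seconds);
}
```

Under these assumptions, NewYorkNaiveLocalSeconds(2222997212) evaluates to 2222982812, matching the date-library output above. Hard-coding one zone's rule keeps the sketch self-contained; it is only meant to show why the offset must be looked up per instant, not to replace a tz database.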

I should also note that for ~99% of the cases I've tested so far, local_timestamp appears to work as I expected; I just found this one example where it does not.

BTW, why do you want to get offset-ed seconds?

I am trying to write a small CLI tool that converts Parquet data to XPT format. XPT, however, has no support for timezones, so the correct way to store timestamp data depends on the use case: some users prefer to store timezone-naive values, whilst others (myself included) prefer to store the UTC-relative timestamps. To this end, I am providing an option for the user to choose.
