Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 22 additions & 29 deletions parquet/src/record/api.rs
Original file line number Diff line number Diff line change
Expand Up @@ -928,29 +928,22 @@ fn convert_date_to_string(value: i32) -> String {
format!("{}", dt.format("%Y-%m-%d"))
}

/// Helper method to convert Parquet timestamp into a string.
/// Input `value` is a number of seconds since the epoch in UTC.
/// Datetime is displayed in local timezone.
#[inline]
fn convert_timestamp_secs_to_string(value: i64) -> String {
let dt = Utc.timestamp_opt(value, 0).unwrap();
format!("{}", dt.format("%Y-%m-%d %H:%M:%S %:z"))
}

/// Helper method to convert Parquet timestamp into a string.
/// Input `value` is a number of milliseconds since the epoch in UTC.
/// Datetime is displayed in local timezone.
#[inline]
fn convert_timestamp_millis_to_string(value: i64) -> String {
convert_timestamp_secs_to_string(value / 1000)
let dt = Utc.timestamp_millis_opt(value).unwrap();
format!("{}", dt.format("%Y-%m-%d %H:%M:%S%.3f %:z"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input is i64, so it will never attach a time zone, and the output will always be' +00:00'. Do we need to add "%:z" in the output?

From Utc.timestamp_millis_opt, it seems that it will always return a UTC timezone value, so maybe we need to change the comment for this function(the comment said the result will be in local timezone).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question not related to the current PR: this function convert_timestamp_millis_to_string will be used in Field, and Field contains an i64 for TimestampMillis/TimestampMicros(the milliseconds/microseconds from unix epoch), does output the raw timestamp(the i64 value) here is enough? and translate the timestamp to dt.format value when displaying something like LogicalType::Timestamp

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think saying that "Datetime is displayed in local timezone." is misleading in this case.
Maybe we indeed can remove "%:z" and fix comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue as I see it is that the record api only uses ConvertedType, so the timestamps are normalized to UTC. I would suggest keeping the %:z and simply update the documentation.

I also think this may be a breaking change since it will change the JSON output value. What do you think @alamb?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much insight into this particular API and if anyone would treat this as an API change

Since the existing code uses almost the same format string it seems like not a big change to me, but I don't really know

}

/// Helper method to convert Parquet timestamp into a string.
/// Input `value` is a number of microseconds since the epoch in UTC.
/// Datetime is displayed in local timezone.
#[inline]
fn convert_timestamp_micros_to_string(value: i64) -> String {
convert_timestamp_secs_to_string(value / 1000000)
let dt = Utc.timestamp_micros(value).unwrap();
format!("{}", dt.format("%Y-%m-%d %H:%M:%S%.6f %:z"))
}

/// Helper method to convert Parquet time (milliseconds since midnight) into a string.
Expand Down Expand Up @@ -1278,44 +1271,44 @@ mod tests {

#[test]
fn test_convert_timestamp_millis_to_string() {
fn check_datetime_conversion(y: u32, m: u32, d: u32, h: u32, mi: u32, s: u32) {
fn check_datetime_conversion(y: u32, m: u32, d: u32, h: u32, mi: u32, s: u32, milli: u32) {
let datetime = chrono::NaiveDate::from_ymd_opt(y as i32, m, d)
.unwrap()
.and_hms_opt(h, mi, s)
.and_hms_milli_opt(h, mi, s, milli)
.unwrap();
let dt = Utc.from_utc_datetime(&datetime);
let res = convert_timestamp_millis_to_string(dt.timestamp_millis());
let exp = format!("{}", dt.format("%Y-%m-%d %H:%M:%S %:z"));
let exp = format!("{}", dt.format("%Y-%m-%d %H:%M:%S%.3f %:z"));
assert_eq!(res, exp);
}

check_datetime_conversion(1969, 9, 10, 1, 2, 3);
check_datetime_conversion(2010, 1, 2, 13, 12, 54);
check_datetime_conversion(2011, 1, 3, 8, 23, 1);
check_datetime_conversion(2012, 4, 5, 11, 6, 32);
check_datetime_conversion(2013, 5, 12, 16, 38, 0);
check_datetime_conversion(2014, 11, 28, 21, 15, 12);
check_datetime_conversion(1969, 9, 10, 1, 2, 3, 4);
check_datetime_conversion(2010, 1, 2, 13, 12, 54, 42);
check_datetime_conversion(2011, 1, 3, 8, 23, 1, 27);
check_datetime_conversion(2012, 4, 5, 11, 6, 32, 0);
check_datetime_conversion(2013, 5, 12, 16, 38, 0, 15);
check_datetime_conversion(2014, 11, 28, 21, 15, 12, 59);
}

#[test]
fn test_convert_timestamp_micros_to_string() {
fn check_datetime_conversion(y: u32, m: u32, d: u32, h: u32, mi: u32, s: u32) {
fn check_datetime_conversion(y: u32, m: u32, d: u32, h: u32, mi: u32, s: u32, micro: u32) {
let datetime = chrono::NaiveDate::from_ymd_opt(y as i32, m, d)
.unwrap()
.and_hms_opt(h, mi, s)
.and_hms_micro_opt(h, mi, s, micro)
.unwrap();
let dt = Utc.from_utc_datetime(&datetime);
let res = convert_timestamp_micros_to_string(dt.timestamp_micros());
let exp = format!("{}", dt.format("%Y-%m-%d %H:%M:%S %:z"));
let exp = format!("{}", dt.format("%Y-%m-%d %H:%M:%S%.6f %:z"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if we could change these tests to have hard coded values rather than just programatically checking consistency -- with just these changes, I don't have any sense of how different the output is after this change.

assert_eq!(res, exp);
}

check_datetime_conversion(1969, 9, 10, 1, 2, 3);
check_datetime_conversion(2010, 1, 2, 13, 12, 54);
check_datetime_conversion(2011, 1, 3, 8, 23, 1);
check_datetime_conversion(2012, 4, 5, 11, 6, 32);
check_datetime_conversion(2013, 5, 12, 16, 38, 0);
check_datetime_conversion(2014, 11, 28, 21, 15, 12);
check_datetime_conversion(1969, 9, 10, 1, 2, 3, 4);
check_datetime_conversion(2010, 1, 2, 13, 12, 54, 42);
check_datetime_conversion(2011, 1, 3, 8, 23, 1, 27);
check_datetime_conversion(2012, 4, 5, 11, 6, 32, 0);
check_datetime_conversion(2013, 5, 12, 16, 38, 0, 15);
check_datetime_conversion(2014, 11, 28, 21, 15, 12, 59);
}

#[test]
Expand Down
Loading