Support writing date/time values (#36)
junyuan-chen authored Apr 1, 2024
1 parent da191d3 commit d6f9f1c
Showing 17 changed files with 341 additions and 252 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CI-stable.yml
@@ -17,7 +17,7 @@ jobs:
 fail-fast: false
 matrix:
   version:
-    - '1.6'
+    - '1.7'
     - '1'
   os:
     - 'ubuntu-latest'
6 changes: 4 additions & 2 deletions Project.toml
@@ -8,6 +8,7 @@ CEnum = "fa961155-64e5-5f13-b03f-caf6b980ea82"
 DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 InlineStrings = "842dd82b-1e85-43dc-bf29-5d0ee9dffc48"
+MappedArrays = "dbb5928d-eab1-5f90-85c2-b9b0edb7c900"
 PooledArrays = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
 PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
 PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
@@ -22,14 +23,15 @@ CategoricalArrays = "0.10"
 DataAPI = "1.13"
 DataFrames = "1"
 InlineStrings = "1.1"
+MappedArrays = "0.4"
 PooledArrays = "1"
 PrecompileTools = "1"
 PrettyTables = "1, 2"
-ReadStat_jll = "1.1.5"
+ReadStat_jll = "1.1.9"
 SentinelArrays = "1.2"
 StructArrays = "0.6"
 Tables = "1.2"
-julia = "1.3"
+julia = "1.7"

 [extras]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
4 changes: 4 additions & 0 deletions data/alltypes.do
@@ -10,6 +10,10 @@ gen float vfloat = 1 if _n == 1
 gen double vdouble = 1 if _n == 1
 gen str2 vstr = "ab" if _n == 1
 gen strL vstrL = "This is a long string! This is a long string! This is a long string! This is a long string! This is a long string!" if _n == 1
+gen int vdate = 1 if _n == 1
+format vdate %td
+gen double vtime = 1 if _n == 1
+format vtime %tc

 replace vbyte = .a if _n == 2
 replace vint = .a if _n == 2
Binary file modified data/alltypes.dta
Binary file not shown.
57 changes: 45 additions & 12 deletions docs/src/man/date-and-time-values.md
@@ -2,7 +2,7 @@

 Date and time values in the data files are recognized based on
 the format of each variable.
-Most data/time formats can be recognized without user intervention.[^1]
+Many data/time formats can be recognized without user intervention.[^1]
 In case certain date/time formats are not recognized,
 they can be added easily.

@@ -14,13 +14,52 @@ since a reference date or time point (epoch) chosen by the software.
 Therefore, knowing the reference data/time and the length of a single period
 is sufficient for uncovering the represented date/time values for a given format.

+If a variable is in a date/time format that can be recognized,
+the values will be displayed as Julia `Date` or `DateTime`
+when printing a `ReadStatTable`.
+Notice that the underlying numerical values are preserved
+and the conversion to the Julia `Date` or `DateTime` happens only lazily
+via a [`MappedArray`](https://github.com/JuliaArrays/MappedArrays.jl)
+when working with a `ReadStatTable`.
+
+```@repl date
+using ReadStatTables, DataFrames
+tb = readstat("data/sample.dta")
+tb.mydate
+tb.mydate.data
+colmetadata(tb, :mydate, "format")
+```
+
+The variable-level metadata key named `format` informs
+`ReadStatTable` whether the variable represents date/time
+and how the numerical values should be interpreted.
+Changing the `format` directly affects how the values are displayed,
+although the numerical values remain unchanged.
+
+```@repl date
+colmetadata!(tb, :mydate, "format", "%tm")
+tb.mydate
+colmetadata!(tb, :mydate, "format", "%8.0f")
+tb.mydate
+```
+
+Copying a `ReadStatTable` (e.g., converting to a `DataFrame`)
+may drop the underlying numerical values.
+Hence, users who wish to directly work with the underlying numerical values
+may want to preserve the `ReadStatTable` generated from the data file.
+
+```@repl date
+df = DataFrame(tb)
+df.mydate
+```
+
+In the above example, `df.mydate` only contains the `Date` values
+and the underlying numerical values are lost when constructing the `DataFrame`.
+
 The full lists of recognized date/time formats for the statistical software
 are stored as dictionary keys;
 while the associated values are tuples of reference date/time and period length.[^2]
-If a variable is in a date/time format that can be found in the dictionary,
-[`readstat`](@ref) will handle the conversion to a Julia time type
-(unless the `convert_datetime` option prevents it).
-Otherwise, if a date/time format is not found in the dictionary,
+If a date/time format is not found in the dictionary,
 no type conversion will be attempted.
 Additional formats may be added by inserting key-value pairs to the relevant dictionaries.

@@ -34,13 +73,6 @@ ReadStatTables.sas_dt_formats["MMDDYY"]
 ReadStatTables.spss_dt_formats["TIME"]
 ```

-Translation of the date/time values into a Julia time type is handled by
-`parse_datetime`, which is not exported.
-
-```@docs
-ReadStatTables.parse_datetime
-```
-
 [^1]:

 For Stata, all date/time formats except `%tC` and `%d` are supported.
@@ -52,6 +84,7 @@ ReadStatTables.parse_datetime
 only the `%tc` format is supported.
 The `%d` format that appears in earlier versions of Stata
 is no longer documented in recent versions.
+For SAS and SPSS, the coverage of date/time formats might be less comprehensive.

 [^2]:

4 changes: 4 additions & 0 deletions docs/src/man/table-interface.md
@@ -69,7 +69,11 @@ tb[1,1]
 tb[1,:mylabl]
 tb[1,:mylabl] = 2
 tb[1,:mylabl]
+tb[1,:mydate]
+tb[1,:dtime]
 ```

 Notice that for data columns with value labels,
 these methods only deal with the underlying values and disregard the value labels.
+Similarly, for data columns with a date/time format,
+the numerical values instead of the converted `Date`/`DateTime` values are returned.
1 change: 1 addition & 0 deletions src/ReadStatTables.jl
@@ -5,6 +5,7 @@ using DataAPI: refpool
 using Dates
 using Dates: unix2datetime
 using InlineStrings
+using MappedArrays: MappedArray, mappedarray
 using PooledArrays: PooledArray, PooledVector, RefArray
 using PrettyTables: pretty_table
 using ReadStat_jll
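
Note on the conversion rule described in docs/src/man/date-and-time-values.md: the documentation states that a reference date/time plus a period length is enough to recover the represented values. A minimal sketch of that rule for the two Stata formats added to data/alltypes.do, using only the `Dates` standard library, might look like the following. The helper names `stata_td` and `stata_tc` are hypothetical and are not part of ReadStatTables; the sketch assumes Stata's documented epoch of 1960-01-01, with `%td` counting days and `%tc` counting milliseconds.

```julia
using Dates

# Hypothetical helpers for illustration only; not the package's implementation.
# Assumption: Stata's %td and %tc formats both use 1960-01-01 as the epoch.

# %td stores the number of days since the epoch.
stata_td(v::Integer) = Date(1960, 1, 1) + Day(v)

# %tc stores the number of milliseconds since the epoch (kept as a double in Stata).
stata_tc(v::Real) = DateTime(1960, 1, 1) + Millisecond(round(Int64, v))

stata_td(1)     # 1960-01-02, matching `gen int vdate = 1` with format %td
stata_tc(1.0)   # 1960-01-01T00:00:00.001, matching `gen double vtime = 1` with format %tc
```

The numeric inputs are left untouched by these functions, which mirrors the documented behavior that the underlying values are preserved and only their interpretation depends on the format.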