NA's from date fields when reading JSON's #121
I've replicated the error with the data set above using the most recent version. However, I was not able to reproduce the problem on a similar data set with timestamps. The issue may be related to #68, but I will investigate a bit further to determine the cause. @jjhall77 - In the meantime, the following commands will work by not specifying the JSON file; the dates appear properly formatted in R:

```r
url <- "https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/xx67-kt59"
df <- read.socrata(url)
```
|
The underlying problem is the date format:

```r
> posixify("2017-02-21T00:00:00")
[1] NA
```

However, the date format in my other example parses correctly:

```r
> posixify("2016-02-18T00:00:00.000")
[1] "2016-02-18 CST"
```

This type of format is not used in the CSV files, so those will successfully work. When the file format is not specified in the request, it defaults to CSV. We will patch this bug and release it on GitHub, which can be installed using
|
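A minimal sketch of the kind of fallback that avoids the NA. This is hedged: `parse_iso8601` is a made-up helper name, and this is not necessarily how the eventual RSocrata patch works; it only illustrates parsing ISO 8601 timestamps with or without fractional seconds.

```r
# Hypothetical helper (not RSocrata's actual patch): parse ISO 8601
# timestamps whether or not they carry fractional seconds.
parse_iso8601 <- function(x, tz = "") {
  # %OS accepts seconds with or without a decimal fraction
  out <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = tz)
  # Fall back to plain %S for any values that still failed to parse
  bad <- is.na(out) & !is.na(x)
  out[bad] <- as.POSIXct(x[bad], format = "%Y-%m-%dT%H:%M:%S", tz = tz)
  out
}

# Both of the formats from this issue now parse without NA
parse_iso8601(c("2017-02-21T00:00:00", "2016-02-18T00:00:00.000"))
```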
After looking at the code and realizing we've previously had similar errors with #68 and #106, I'm going to try to improve the error handling for this with the following steps:
|
In my experience it's tricky to get R to print out subseconds. You might find this code useful: it lets you see three decimal places which are otherwise hidden. For example:
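The snippet itself is not preserved in the thread; presumably it was something along these lines, using the standard `digits.secs` option (this reconstruction is an assumption):

```r
# Print up to three decimal places of seconds (hidden by default)
options(digits.secs = 3)

# .125 is exactly representable in binary, so the printed value is reliable
t <- as.POSIXct("2012-09-14 22:14:21.125", format = "%Y-%m-%d %H:%M:%OS")
format(t, "%Y-%m-%d %H:%M:%OS3")
```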
|
@geneorama - displaying the decimal seconds was not the issue in this case. It was anything in ISO 8601 without decimal seconds that was causing the error. |
I intended that as an FYI, not a fix. I thought it might be useful for debugging. Sorry if that wasn't clear.
|
@tomschenkjr Commenting as of b0ce826. For the most part the function's doing what I would expect, but I noticed something that I didn't expect: the fractional part of the second doesn't seem to be captured, and it doesn't round the result. For example:

```r
> posixify("2012-09-14T22:14:21.000")
[1] "2012-09-14 22:14:21 CDT"
> posixify("2012-09-14T22:14:21.999")
[1] "2012-09-14 22:14:21 CDT"
```

Is this expected behavior? |
Yes, since I think that is how R is deciding to handle it. What are your `options()` set at?
|
The options didn't matter; I set the digits to 3 and it still didn't show the fractional part of the second. My point in the example above is that even .999 gets rounded / shown as 0.
|
The example above shows that the fractional time isn't coming through, even though it could be there. |
To clarify, R does not appear to round decimal times; it just truncates them, which is similar to what RSocrata does, so I think that would be expected behavior.

```r
z <- Sys.time()
z
# [1] "2017-03-03 17:14:45 CST"
options(digits.secs = 3)
z
# [1] "2017-03-03 17:14:45.517 CST"
```
|
Even with `options(digits.secs = 3)` set, the time coming off `posixify` has no fractional part:

```r
> options(digits.secs = 3)
> posixify("2012-09-14T22:14:21.999")
[1] "2012-09-14 22:14:21 CDT"
```

The help implies that if you have your options set, the format function will use the decimal portion of the time given; however, this doesn't seem to be the case:

```r
> options(digits.secs = 3)
> as.POSIXct("2012-09-14T22:14:21.999", format = "%Y-%m-%dT%H:%M:%S")
[1] "2012-09-14 22:14:21 CDT"
```

Also, the help implies that you shouldn't need to specify the format, but the default gives:

```r
> as.POSIXct("2012-09-14T22:14:21.999")
[1] "2012-09-14 CDT"
```

It bothers me that there would be a loss of data for fractional seconds, but there doesn't seem to be a good way around it, so I'm accepting the pull request. |
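An aside not raised in the thread: the conversion specification `%OS` (capital O), unlike `%S`, does parse the decimal fraction, so the data loss above can be avoided when the format string is chosen accordingly. A sketch:

```r
options(digits.secs = 3)

# %OS parses the decimal fraction that %S drops
x <- as.POSIXct("2012-09-14T22:14:21.999",
                format = "%Y-%m-%dT%H:%M:%OS", tz = "America/Chicago")

# The fraction survives in the underlying numeric value
as.numeric(x) %% 60
```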
As far as I can tell, the cleanest way to fix this is via the `digits.secs` option:

```r
## Conversion without setting digits.secs (NULL is the default)
> options(digits.secs = NULL)
> as.POSIXct("2012-09-14 22:14:21.999")
[1] "2012-09-14 22:14:21 CDT"
# Fractional second is lost, not hidden (you can check with unclass)

## Conversion WITH digits.secs set
> options(digits.secs = 3)
> as.POSIXct("2012-09-14 22:14:21.999")
[1] "2012-09-14 22:14:21.999 CDT"

## Note that even with digits.secs set, using a "format" argument
## will lop off the fractional second
> as.POSIXct("2012-09-14 22:14:21.999", format = "%Y-%m-%d %H:%M:%S")
[1] "2012-09-14 22:14:21 CDT"
```

What I would do if we have fractional seconds present in the data:

As discussed, I'm going to accept the current pull request as is so that we can move forward with other issues. |
Also note, I don't know of a data set that has subseconds in it; I didn't see any in the NY food inspections above. This made me wonder whether some of the time values had subsecond values and some didn't. If this is possible, then we should add a test for mixes of subsecond and non-subsecond JSON vectors, e.g.: |
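A sketch of what such a check might look like. This is an assumption: it parses with `%OS` (which accepts both forms) rather than calling RSocrata's internal `posixify`, and in the package it would live inside a `testthat` unit test:

```r
# Mixed vector: one timestamp with fractional seconds, one without
mixed <- c("2012-09-14T22:14:21.999", "2012-09-14T22:14:21")
parsed <- as.POSIXct(mixed, format = "%Y-%m-%dT%H:%M:%OS",
                     tz = "America/Chicago")

# Expect no NAs even though only the first element has a fraction
stopifnot(!anyNA(parsed))
```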
* Added testing for issue #121 -- will fail
* Changed to camel case to accommodate future variable renaming
* Added handling of ISO 8601 timestamps without fractional seconds. See issue #121
* Increased minimum version of R because of `anyNA` function.
* Unformatted dates kept as chr; added warning; added tests
* Added tests for #124 -- will fail
* Handles downloadURL from ls.socrata. Fixes #124

* Added unit tests for issue #118
* Add unit test for #120
* Bug fix - fixes #120
* Update version number
* Fixes #124 (#125)
* Added testing for issue #121 -- will fail
* Changed to camel case to accommodate future variable renaming
* Added handling of ISO 8601 timestamps without fractional seconds. See issue #121
* Increased minimum version of R because of `anyNA` function.
* Unformatted dates kept as chr; added warning; added tests
* Added tests for #124 -- will fail
* Handles downloadURL from ls.socrata. Fixes #124
* Closes #129
* Iteration version
* startsWith() requires R >= 3.3.0
* Resolves #133. Also documents per #132
* Updated to reflect consequence of #133
* resolved merge conflict from rebasing
* fixing test - dataset is live so rows can increase, and fixed data types.
* add unit test for issue #118 - will fail
* fixes #118
* Add .gitattributes, ignores it during R build. Closes #135
* Fixes #134
* Clarified language on installing from GitHub
* Removed penultimate release from build testing. Closes #136
* Updated formatting for sections
* Update NEWS.md for 1.7.2 release.
* Included remaining changes to NEWS
When I use read.socrata on a link with a JSON, all of the date fields are returned as NAs.

```r
url <- "https://data.cityofnewyork.us/resource/xx67-kt59.json"
df <- read.socrata(url)
```

The dates are populated in the actual JSON file.