bind -> rbind.fill; getMetadata; GeoJSON method #56

Merged 2 commits on Oct 26, 2015
5 changes: 4 additions & 1 deletion .Rbuildignore
@@ -10,4 +10,7 @@ RSocrata.Rcheck
.DS_Store
^\.travis\.yml$
appveyor.yml
CONTRIBUTING.md
CONTRIBUTING.md
vignettes/rsconnect
vignettes/bench.rmd
^.*\.o$
1 change: 1 addition & 0 deletions .gitignore
@@ -14,3 +14,4 @@
*.Rhistory
.Rproj.user
inst/doc
/vignettes/rsconnect
1 change: 1 addition & 0 deletions .travis.yml
@@ -19,6 +19,7 @@ r_github_packages:
- jeroenooms/curl
- klutometis/roxygen
- jimhester/covr
- yihui/mime
- ropensci/geojsonio
Comment from the PR author:

We now also require mime version >= 0.4, which has been published to CRAN:
https://cran.rstudio.com/web/packages/mime/index.html


after_success:
31 changes: 16 additions & 15 deletions DESCRIPTION
@@ -5,27 +5,28 @@ Description: Provides easier interaction with
Socrata open data portals http://dev.socrata.com.
Users can provide a 'Socrata' data set resource URL,
or a 'Socrata' Open Data API (SoDA) web query,
or a 'Socrata' "human-friendly" URL,
returns an R data frame.
Converts dates to 'POSIX' format.
Manages throttling by 'Socrata'.
Version: 1.6.3
Date: 2015-07-23
Author: Hugh Devlin, Ph. D., Tom Schenk, Jr., and John Malc
or a 'Socrata' "human-friendly" URL, all of which
return an R data frame.
Additionally, it converts dates to 'POSIX' format,
manages throttling by 'Socrata' and supports geospatial data.
Version: 1.7.5
Date: 2015-10-10
Author: Hugh Devlin, Ph. D., Tom Schenk, Jr., David A Springate (@DASpringate) and John Malc (@dmpe)
Maintainer: "Tom Schenk Jr." <developers@cityofchicago.org>
Depends:
R (>= 3.0.0)
R (>= 3.2.2)
Imports:
httr (>= 1.0.0),
jsonlite (>= 0.9.16),
mime (>= 0.3)
jsonlite (>= 0.9.17),
mime (>= 0.4),
geojsonio (>= 0.1.4),
plyr (>= 1.8.3)
Suggests:
testthat (>= 0.10.0),
roxygen2 (>= 4.1.0),
knitr (>= 1.10.5),
leaflet (>= 1.0.0),
geojsonio (>= 0.1.0)
roxygen2 (>= 4.1.1),
knitr (>= 1.11),
leaflet (>= 1.0.0)
Comment from the PR author:

leaflet is now used in the vignette with examples

License: MIT + file LICENSE
URL: https://github.com/Chicago/RSocrata
BugReports: https://github.com/Chicago/RSocrata/issues
VignetteBuilder: knitr
VignetteBuilder: knitr
7 changes: 5 additions & 2 deletions NAMESPACE
@@ -1,15 +1,18 @@
# Generated by roxygen2 (4.1.1): do not edit by hand

export(fieldName)
export(getMetadata)
export(isFourByFour)
export(ls.socrata)
export(posixify)
export(read.socrata)
export(read.socrataGEO)
export(validateUrl)
importFrom(geojsonio,geojson_read)
importFrom(httr,GET)
importFrom(httr,add_headers)
importFrom(httr,build_url)
importFrom(httr,content)
importFrom(httr,parse_url)
importFrom(httr,stop_for_status)
importFrom(jsonlite,fromJSON)
importFrom(mime,guess_type)
importFrom(plyr,rbind.fill)
17 changes: 10 additions & 7 deletions NEWS.md → NEWS
@@ -38,10 +38,13 @@ Deprecated ```httr::guess_media()``` and implemented ```mime::guess_type()```
* Migrate Travis-CI to "proper" R YAML ([#46](https://github.com/Chicago/RSocrata/issues/46))


### 1.6.3 / 1.6.2 (see roadmap)

* Add a small vignette with existing examples
* Mostly internal changes which should not influence the current behaviour ([#53](https://github.com/Chicago/RSocrata/pull/53))
* Add support of a `floating timestamp`
* New error handling function

### 1.7 Several changes, bug fixes and new features

* Create a small vignette with existing examples and add new ones with the ```leaflet``` map package.
* Some internal changes ([#53](https://github.com/Chicago/RSocrata/pull/53))
* Add support of a `floating timestamp`
* New error handling function
* New functions returning metadata and GeoJSON data (similar to `read.socrata`)
* Should fix [#27](https://github.com/Chicago/RSocrata/issues/27) + [#24](https://github.com/Chicago/RSocrata/pull/25)
* Many thanks to [David A Springate](https://github.com/DASpringate), who wrote a function in 2014 that converts files with some missing columns to a data frame. This should fix the long-standing issue with rbind [https://github.com/Chicago/RSocrata/issues/19](https://github.com/Chicago/RSocrata/issues/19).
* ```rbind``` from base R has been replaced with plyr's `rbind.fill`, which can be faster [in some cases](https://github.com/Chicago/RSocrata/pull/56).
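
To illustrate why the switch to `rbind.fill` matters, here is a minimal sketch, assuming the `plyr` package is installed. The data frames `page1` and `page2` are made-up stand-ins for two chunks of a response in which one chunk lacks a column; they are not taken from RSocrata.

```r
library(plyr)

# Hypothetical chunks: the second one is missing the "name" column.
page1 <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
page2 <- data.frame(id = 3L, stringsAsFactors = FALSE)

# base::rbind() refuses to combine data frames whose columns differ,
# so we catch the error and record that it failed.
baseResult <- tryCatch(rbind(page1, page2), error = function(e) "rbind failed")

# plyr::rbind.fill() keeps the union of columns and pads the gaps with NA.
combined <- plyr::rbind.fill(page1, page2)
print(combined)
```

The row that came from `page2` ends up with `NA` in the `name` column instead of aborting the whole call.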
43 changes: 22 additions & 21 deletions R/errorHandling.R
@@ -1,44 +1,45 @@
#' Provides error handling functionality
#'
#' @description Based on \url{http://dev.socrata.com/docs/response-codes.html}
#'
#' @section TODO: Add messages that alert the user on the URL being valid,
#' but one that is not compatible with RSocrata.
#' See \url{https://github.com/Chicago/RSocrata/issues/16}
#'
#' @param rsp - \code{\link{httr::response}} response
#' @importFrom httr stop_for_status
#'
#' @noRd
errorHandling <- function(rsp = NULL) {
# Provides error handling functionality
#
# @description Based on \url{http://dev.socrata.com/docs/response-codes.html}
#
# @section TODO: Add messages that alert the user on the URL being valid,
# but one that is not compatible with RSocrata.
# See \url{https://github.com/Chicago/RSocrata/issues/16}
#
# @param url - SODA URL
#' @importFrom httr stop_for_status GET add_headers
errorHandling <- function(url = "", app_token = NULL) {
rsp <- httr::GET(url, httr::add_headers("X-App-Token" = app_token))

if (rsp$status_code == 200) {
invisible("OK. Your request was successful.")

} else if(rsp$status_code == 202) {
} else if (rsp$status_code == 202) {
warning("202 Request processing. You can retry your request, and when it's complete, you'll get a 200 instead.")

} else if(rsp$status_code == 400) {
} else if (rsp$status_code == 400) {
stop("400 Bad request. Your request was most likely malformed (e.g. URL with ?)")

} else if(rsp$status_code == 401) {
} else if (rsp$status_code == 401) {
# only necessary when accessing datasets that have been marked as private or when making write requests (PUT, POST, and DELETE)
stop("Unauthorized. You attempted to authenticate but something went wrong.")
stop("Unauthorized. You attempted to authenticate but something went wrong.")

} else if(rsp$status_code == 403) {
} else if (rsp$status_code == 403) {
stop("Forbidden. You're not authorized to access this resource. Make sure you authenticate to access private datasets.")

} else if(rsp$status_code == 404) {
} else if (rsp$status_code == 404) {
stop("Not found. The resource requested doesn't exist.")

} else if(rsp$status_code == 429) {
} else if (rsp$status_code == 429) {
stop("Too Many Requests. Your client is currently being rate limited. Make sure you're using an app token.")

} else if(rsp$status_code == 500) {
} else if (rsp$status_code == 500) {
stop("Server error. Try later.")

} else {
httr::stop_for_status(rsp)
}

return(rsp)

}
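
The branching in `errorHandling` can be summarised offline, without issuing a request. The following is only a sketch: `classifyStatus` is a hypothetical helper that is not part of this PR, and it mirrors the status codes listed in the diff rather than the full Socrata response-code table.

```r
# Hypothetical helper (not in RSocrata): report which branch of
# errorHandling() a given HTTP status code would take.
classifyStatus <- function(statusCode) {
  switch(as.character(statusCode),
    "200" = "ok",        # returned invisibly, request succeeded
    "202" = "warning",   # still processing, caller may retry
    # Empty arguments fall through to the next non-empty value.
    "400" = , "401" = , "403" = , "404" = , "429" = , "500" = "error",
    # Unnamed last argument is the default branch.
    "defer to httr::stop_for_status"
  )
}

# Classify a handful of codes, including one the function does not list.
outcomes <- vapply(c(200, 202, 404, 418), classifyStatus, character(1))
print(outcomes)
```

Keeping the mapping in one place like this makes it easier to see that only 202 produces a warning, while every other listed non-200 code stops execution.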
97 changes: 97 additions & 0 deletions R/metadata.R
@@ -0,0 +1,97 @@
#' Return metadata about a Socrata dataset
#'
#' This function returns metadata about a dataset. Generally, such metadata can be accessed
#' in a browser at \code{http://DOMAIN/api/views/FOUR-FOUR/rows.json} or
#' \code{http://DOMAIN/api/views/FOUR-FOUR/columns.json}; the latter is used here.
#'
#' @param url - A Socrata resource URL, or a Socrata "human-friendly" URL
#'
#' @source \url{http://stackoverflow.com/a/29782941}
#'
#' @examples
#' \dontrun{
#' gM1 <- getMetadata(url = "http://data.cityofchicago.org/resource/y93d-d9e3.json")
#' gM3 <- getMetadata(url = "https://data.cityofchicago.org/resource/6zsd-86xi.json")
#' gM2 <- getMetadata(url = "https://data.cityofboston.gov/resource/awu8-dc52")
#' }
#'
#' @return a list containing the number of rows, the number of columns, and a data frame of column metadata
#'
#' @importFrom jsonlite fromJSON
#' @importFrom httr parse_url build_url
#' @importFrom mime guess_type
#'
#' @author John Malc \email{cincenko@@outlook.com}
#'
#' @export
getMetadata <- function(url = "") {

urlParsedBase <- httr::parse_url(url)
mimeType <- mime::guess_type(urlParsedBase$path)

# use the function below to get the row count using a $SELECT=COUNT(*) SODA query
gQRC <- getQueryRowCount(urlParsedBase, mimeType)

# create URL for metadata data frame
fourByFour <- substr(basename(urlParsedBase$path), 1, 9)
urlParsed <- urlParsedBase
urlParsed$path <- paste0("api/views/", fourByFour, "/columns.json")

# execute it
URL <- httr::build_url(urlParsed)
df <- jsonlite::fromJSON(URL)

# The number of rows is sometimes "cached". If so, we take the maximum of the
# (non-null + null) counts across all columns.
# If not, we fall back to "getQueryRowCount", which issues a $SELECT=COUNT(*) SODA query.
rows <- if (suppressWarnings(max(df$cachedContents$non_null + df$cachedContents$null)) > 0) {
suppressWarnings(max(df$cachedContents$non_null + df$cachedContents$null))
} else {
# as.numeric(ifelse(is.null(gQRC$count), gQRC$COUNT, gQRC$count)) # the reason
as.numeric(tolower(gQRC$COUNT))
}

columns <- as.numeric(nrow(df))

return(list(rows = rows, cols = columns, df))
}
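
The URL rewriting step in `getMetadata` can be tried on its own, without contacting a server. This is a minimal sketch, assuming the `httr` package is installed; it reuses one of the example URLs from the roxygen block above, and the variable names are illustrative only.

```r
library(httr)

# One of the resource URLs from the @examples section.
resourceUrl <- "https://data.cityofchicago.org/resource/6zsd-86xi.json"

# Split the URL into its components (scheme, hostname, path, ...).
parsed <- httr::parse_url(resourceUrl)

# The dataset identifier is the first nine characters of the file name.
fourByFour <- substr(basename(parsed$path), 1, 9)

# Point the path at the columns metadata endpoint and rebuild the URL.
parsed$path <- paste0("api/views/", fourByFour, "/columns.json")
metadataUrl <- httr::build_url(parsed)
print(metadataUrl)
```

Note that taking a fixed nine characters assumes the last path segment starts with a well-formed four-by-four identifier; a URL that ends differently would need validating first, for example with the package's `isFourByFour`.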

# Return (always & only) number of rows as specified in the metadata of the data set
#
# @source Taken from \url{https://github.com/Chicago/RSocrata/blob/sprint7/R/getQueryRowCount.R}
# @author Gene Leynes \email{gleynes@@gmail.com}
#
#' @importFrom httr GET build_url content
getQueryRowCount <- function(urlParsed, mimeType) {
## Construct the count query based on the URL,
if (is.null(urlParsed[['query']])) {
## If there is no query at all, create a simple count

cntQueryText <- "?$SELECT=COUNT(*)"
} else {
## Otherwise, construct the query text with a COUNT command at the beginning of any other
## limiting commands. Reconstitute the httr url into a string
cntQueryText <- httr::build_url(structure(list(query = urlParsed[['query']]), class = "url"))
## Add the COUNT command to the beginning of the query
cntQueryText <- gsub(pattern = ".+\\?", replacement = "?$SELECT=COUNT(*)&", cntQueryText)
}

## Combine the count query with the rest of the URL
cntUrl <- paste0(urlParsed[[c('scheme')]], "://", urlParsed[[c('hostname')]], "/",
urlParsed[[c('path')]], cntQueryText)

## Execute the query to count the rows
totalRowsResult <- errorHandling(cntUrl, app_token = NULL)

## Parsing the result depends on the mime type
if (mimeType == "application/json") {
totalRows <- httr::content(totalRowsResult)[[1]]
} else {
totalRows <- httr::content(totalRowsResult)
}

## Limit the row count to $limit (if the $limit existed).
# totalRows <- min(totalRows, as.numeric(rowLimit))

return(totalRows)
}
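
The way `getQueryRowCount` prepends the count clause can be sketched with base R string handling alone. `buildCountUrl` below is a hypothetical simplification, not the function from this PR: it works on a plain URL string instead of a parsed `httr` URL object, and the `example.org` URLs are placeholders.

```r
# Hypothetical helper: put $SELECT=COUNT(*) in front of any existing query.
buildCountUrl <- function(url) {
  # Split once on the literal "?" that separates the path from the query.
  parts <- strsplit(url, "?", fixed = TRUE)[[1]]
  base <- parts[1]
  if (length(parts) < 2) {
    # No query at all: a simple count is enough.
    paste0(base, "?$SELECT=COUNT(*)")
  } else {
    # Keep the existing clauses after the COUNT command.
    paste0(base, "?$SELECT=COUNT(*)&", paste(parts[-1], collapse = "?"))
  }
}

plainUrl <- buildCountUrl("https://example.org/resource/abcd-1234.json")
filteredUrl <- buildCountUrl("https://example.org/resource/abcd-1234.json?$where=id>5")
print(plainUrl)
print(filteredUrl)
```

As in the original function, an existing query that already carries its own `$select` clause is not handled specially here.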