
remove argument "return_xts" in influx_query? Add tsibble? #49

Open

dleutnant opened this issue Feb 5, 2019 · 10 comments

Comments

@dleutnant
Owner

dleutnant commented Feb 5, 2019

Currently, the influx_query result is either a data.frame (tibble) or an xts object. The latter is returned only if the call explicitly sets return_xts = TRUE. However, this approach is strictly limited to the xts time series class. With the recent development of the tsibble package, a powerful time series class is available that could be worth incorporating into influxdbr. Therefore, one idea is

  1. to remove the "return_xts" argument and always return a data.frame (tibble),
  2. add an unambiguous class (e.g. "influxdb_result") to the resulting data.frame and
  3. add conversion functions as_xts.influxdb_result and as_tsibble.influxdb_result.

This would allow chains like the following:

influx_connection() %>% 
  influx_query() %>% 
  as_xts()

influx_connection() %>% 
  influx_query() %>% 
  as_tsibble()
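
A minimal sketch of what the conversion methods could look like, assuming the result is a tibble carrying class "influxdb_result", a POSIXct time column, and numeric field columns (hypothetical bodies, not a concrete implementation):

as_xts <- function(x, ...) UseMethod("as_xts")

as_xts.influxdb_result <- function(x, ...) {
  stopifnot(requireNamespace("xts", quietly = TRUE))
  # assumes all non-time columns are numeric fields
  xts::xts(as.matrix(x[setdiff(names(x), "time")]), order.by = x$time)
}

as_tsibble.influxdb_result <- function(x, ...) {
  stopifnot(requireNamespace("tsibble", quietly = TRUE))
  # drop the custom class first so dispatch reaches the data.frame method;
  # "time" becomes the tsibble index, tags could later serve as keys
  tsibble::as_tsibble(as.data.frame(x), index = time, ...)
}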

This change would potentially break existing code. What is the best practice for deprecating function arguments?
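
One common base-R pattern is a sentinel default plus a one-time warning (a sketch only; the signature shown here is an assumption, not the package's actual one):

influx_query <- function(con, db = NULL, query = "SELECT * FROM measurement",
                         return_xts = NULL, ...) {
  if (!is.null(return_xts)) {
    warning("Argument 'return_xts' is deprecated; ",
            "call as_xts() on the result instead.", call. = FALSE)
  }
  # ... perform the query and return the classed tibble ...
}

For tidyverse-style packages, the lifecycle package provides deprecate_warn() for the same purpose.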

dleutnant changed the title from 'remove argument "return_xts" in influx_query?' to 'remove argument "return_xts" in influx_query? Add tsibble?' on Feb 5, 2019
@vspinu
Contributor

vspinu commented Jun 23, 2019

What you propose makes sense to me.

I would add to this that, internally, there is no need to use a JSON parser. Just request CSV and parse it with read.csv, readr::read_csv, or data.table::fread, falling back automatically to whichever library is present on the user's computer, without adding hard dependencies (see the sketch below). It's probably much faster this way and would be another step towards #54. If it's OK with you, I can have a look.
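
The fallback idea might look like this (hypothetical helper, not actual influxdbr code; assumes the response body is already available as a string):

read_csv_fallback <- function(text) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    as.data.frame(data.table::fread(text = text))
  } else if (requireNamespace("readr", quietly = TRUE)) {
    readr::read_csv(I(text))  # I() marks literal data (readr >= 2.0)
  } else {
    utils::read.csv(text = text)  # base fallback, no extra dependency
  }
}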

Regarding the breaking change, I think it's best to just increment the major version of the package and announce that it's a breaking change. I am sure all users would appreciate this as the current return value is far from elegant.

@dleutnant
Owner Author

... If it's OK with you, I can have a look.

Cool! That’d be great! Thank you!
In this case I’ll postpone the CRAN submission.

@dleutnant
Owner Author

The proposed change most likely would also address #48.

@vspinu
Contributor

vspinu commented Jun 24, 2019

I have looked at it, and there is a hiccup with older versions of InfluxDB (1.1.1; the current stable is 1.7.6). When the accepted output is application/csv and a query error generates a 400 status, the body of the error response is empty, so the user won't get an informative message.

For example, both of the following produce an empty response body:

 curl -v -H "Accept: application/csv" --data-urlencode "q=some-query" -G "http://localhost:8086/query"
 curl -v -H "Accept: application/csv" --data-urlencode "qq=some-query" -G "http://localhost:8086/query"

I don't know which version of InfluxDB fixed this issue, but 1.1.1 is still the default on Ubuntu 18.04, which is a relatively recent OS. So I believe this is a showstopper for the CSV parser.

The good news is that most of the time is spent by InfluxDB building the response, so parsing CSV instead of JSON on the R side brings only a small speed benefit, relatively speaking.
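
In R terms, the failure mode looks roughly like this (a sketch assuming an httr-based client; host and query are placeholders):

library(httr)
resp <- GET("http://localhost:8086/query",
            query = list(q = "some-query"),
            add_headers(Accept = "application/csv"))
status_code(resp)                                # 400 on a query error
content(resp, as = "text", encoding = "UTF-8")   # "" on affected versions,
                                                 # so no message to relay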

@dleutnant
Owner Author

OK, let's stick to the JSON parser. (In any case, nice idea to reduce package dependencies!)

@vspinu
Contributor

vspinu commented Jun 26, 2019

A heads up: I am very close to finishing this and reducing dependencies to zero on both the write and read paths. I'll be back with a PR, hopefully later today.

@dleutnant
Owner Author

Wow! Looking forward to it!

@vspinu
Contributor

vspinu commented Jun 26, 2019

Actually, there is a small hiccup with multiple queries. Do you really want to merge multiple queries into one big data.frame? I think it would make sense to return a list of data.frames, one data.frame per query, and let the user decide what to do with it.

I am currently using a custom rbind_list which falls back to data.table::rbindlist, then dplyr::rbind_list, and finally base::rbind, whichever is available on the user's system, without an explicit dependency (sketched below). Both data.table::rbindlist and dplyr::rbind_list can handle heterogeneous components, but base::rbind cannot.

If you really want binding for multiple queries, then one possibility would be to fail with a message to install data.table or dplyr in case the user sends multiple queries in one request (which I think is pretty rare). What do you think?
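
Such a fallback might be sketched as follows (illustrative helper; names are assumptions, not the final API):

rbind_list <- function(dfs) {
  if (requireNamespace("data.table", quietly = TRUE)) {
    # fill = TRUE pads columns missing from some elements with NA
    as.data.frame(data.table::rbindlist(dfs, fill = TRUE))
  } else if (requireNamespace("dplyr", quietly = TRUE)) {
    # bind_rows() supersedes the long-deprecated dplyr::rbind_list()
    dplyr::bind_rows(dfs)
  } else {
    # base::rbind needs identical column sets in every element
    do.call(rbind, dfs)
  }
}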

@dleutnant
Owner Author

... I think it would make sense to return a list of data.frames...

Agreed, as each single result might contain a different set of tags.

@dleutnant
Owner Author

... and fields, of course.
