Conflicting which for compressed formats with multiple sheets #412

Closed
chainsawriot opened this issue May 14, 2024 · 1 comment

chainsawriot commented May 14, 2024

Of course, one can ask why anyone would use a compressed format with multiple sheets in the first place, e.g. xlsx.zip. But a bug is a bug.

The issue is that the which parameter of import() is used twice: first for selecting a file in the archive, and second for selecting a sheet. A single value therefore has to serve as both the file selector and the sheet selector.

rio/R/import.R, line 131 in c86db70:

file <- parse_archive(file, which = which, file_type = "zip")

rio/R/import.R, line 156 in c86db70:

x <- .import(file = file, which = which, ...)

In order not to make things more complicated (such as introducing new parameters for such an edge case), my suggestion is simply to define some precedence rules for which.
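To make the idea concrete, here is a minimal sketch of one possible precedence rule. It is only an illustration: `resolve_which()` is a hypothetical helper that does not exist in rio, and the rule itself (an archive with a single file passes `which` through to the sheet reader, an archive with several files uses `which` to pick the file) is an assumption, not a decision.

```r
# Hypothetical helper, not part of rio: split one `which` value into an
# archive-level selector and a selector for the format-specific reader.
resolve_which <- function(which = NULL, archive_files) {
  if (length(archive_files) == 1L) {
    # Only one file in the archive, so there is nothing to choose at the
    # archive level; `which` is handed to the reader (e.g. as a sheet).
    return(list(archive_which = 1L, inner_which = which))
  }
  # Several files in the archive: `which` selects the file and the reader
  # falls back to its own default (e.g. the first sheet).
  if (is.null(which)) {
    which <- 1L
  }
  list(archive_which = which, inner_which = NULL)
}

# Single-file archive: `which = 2` is treated as a sheet index.
single <- resolve_which(2, archive_files = "data.xlsx")

# Multi-file archive: `which` is treated as the file to extract.
multi <- resolve_which("b.csv", archive_files = c("a.csv", "b.csv"))
```

Under this rule the multi-sheet xlsx.zip case would work without a new parameter, at the cost of not being able to pick both a file and a sheet in a multi-file archive.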

zip_file <- tempfile(fileext = ".xlsx.zip")

rio::export(head(iris), zip_file)

raw_file <- utils::unzip(zip_file, list = TRUE)$Name[1]

rio::import(zip_file)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

## this is fine-ish, I guess?
rio::import(zip_file, which = "aaaa.xlsx")
#> Warning in extract_func(file, files = file_list[grep(which2, file_list)[1]], :
#> requested file not found in the zip file
#> Error: `path` does not exist: '/tmp/RtmpH9K6ta/file831fb50f53589/aaaa.xlsx'

rio::import(zip_file, which = raw_file)
#> Error: Sheet 'file831fb5a3e85e.xlsx' not found

## a more illustrative example

zip_file2 <- tempfile(fileext = ".xlsx.zip")

rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), zip_file2)

xlsx_file <- tempfile(fileext = ".xlsx")

rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), xlsx_file)

rio::import(zip_file2, which = 1)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
rio::import(zip_file2, which = 2)
#> Warning in extract_func(file, files = file_list[which], exdir = d): requested
#> file not found in the zip file
#> Error: 'file' has no extension

rio::import(xlsx_file, which = 1)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
rio::import(xlsx_file, which = 2)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 1          6.7         3.3          5.7         2.5 virginica
#> 2          6.7         3.0          5.2         2.3 virginica
#> 3          6.3         2.5          5.0         1.9 virginica
#> 4          6.5         3.0          5.2         2.0 virginica
#> 5          6.2         3.4          5.4         2.3 virginica
#> 6          5.9         3.0          5.1         1.8 virginica

Created on 2024-05-14 with reprex v2.1.0
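Until the conflict is resolved, a possible workaround is to unpack the archive manually and import the extracted file, so that which is only interpreted as a sheet selector. This is a sketch that assumes the archive contains a single xlsx file, as in the reprex above; `exdir` and `extracted` are names chosen for illustration.

```r
# Recreate the two-sheet archive from the reprex.
zip_file2 <- tempfile(fileext = ".xlsx.zip")
rio::export(list(first_sheet = head(iris), second_sheet = tail(iris)), zip_file2)

# Unpack into a fresh directory; utils::unzip() returns the extracted paths.
exdir <- tempfile()
dir.create(exdir)
extracted <- utils::unzip(zip_file2, exdir = exdir)

# Import the plain xlsx file, so `which` now only selects the sheet.
second_sheet <- rio::import(extracted[1], which = 2)
```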

@chainsawriot

ref #400
