Having trouble scraping historical data using rvest on CSS html pages #318

Closed
adityabaskaran opened this issue May 22, 2021 · 10 comments

@adityabaskaran

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/ or https://community.rstudio.com/.

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

library(rvest)

test_url <- "https://www.moneycontrol.com/stocks/hist_stock_result.php?ex=B&sc_id=ITC&mycomp=ITC" %>% read_html()

test.table <- test_url %>% html_nodes("table") %>% html_table()

Error in matrix(unlist(values), ncol = width, byrow = TRUE) :
'data' must be of a vector type, was 'NULL'

Brief description of the problem

# insert reprex here
@epiben
Contributor

epiben commented Aug 3, 2021

I believe this works with v1.0.1.

@hadley
Member

hadley commented Aug 3, 2021

I still see a failure:

library(rvest)

test_url <- "https://www.moneycontrol.com/stocks/hist_stock_result.php?ex=B&sc_id=ITC&mycomp=ITC" %>% read_html()

test.table <- test_url %>%
  html_nodes("table") %>%
  html_table()
#> Error in matrix(unlist(values), ncol = width, byrow = TRUE): 'data' must be of a vector type, was 'NULL'

Created on 2021-08-03 by the reprex package (v2.0.0)

@epiben
Contributor

epiben commented Aug 3, 2021

Sorry, my bad. This is solved in v1.0.1.9000 (the current master on GitHub, not yet on CRAN):

remotes::install_github("tidyverse/rvest", force = TRUE)
library(rvest)

"https://www.moneycontrol.com/stocks/hist_stock_result.php?ex=B&sc_id=ITC&mycomp=ITC" %>% 
	read_html() %>%
	html_elements("table") %>% 
	html_table()

yields

[[1]]
# A tibble: 1 x 1
  X1   
  <lgl>
1 NA   

[[2]]
# A tibble: 1 x 3
  X1            X2           X3                       
  <chr>         <chr>        <chr>                    
1 Period High : Period Low : Change in market-cap : 0%

[[3]]
# A tibble: 0 x 0

[[4]]
# A tibble: 1 x 2
  X1      X2       
  <chr>   <chr>    
1 AT (Rs) GAIN (Rs)

[[5]]
# A tibble: 1 x 2
  X1         X2        
  <chr>      <chr>     
1 RECO PRICE PEAK PRICE

@epiben
Contributor

epiben commented Aug 3, 2021

Ah, our replies crossed 😀

@hadley
Member

hadley commented Aug 3, 2021

@epiben nice, I can close this then 😄

@hadley hadley closed this as completed Aug 3, 2021
@gunawebs

gunawebs commented Sep 26, 2022

I still see the error, FYI:

k <- read_html("https://www.geos.ed.ac.uk/sccs/project-info/1182")

k %>% html_table(".table", header = FALSE)

Error in matrix(unlist(values), ncol = width, byrow = TRUE) :
'data' must be of a vector type, was 'NULL'

packageVersion("rvest")

[1] ‘1.0.3’

@epiben
Contributor

epiben commented Sep 26, 2022

I'll look into it ASAP.

@epiben
Contributor

epiben commented Sep 26, 2022

Turns out that the offending table (the eleventh) was empty in the sense of having no cells, but it did have a row. I proposed a fix to this in #360.
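To illustrate the edge case described above, here is a minimal sketch that counts rows and cells per table and flags tables that have a row but no cells. The inline HTML is a made-up stand-in for the real page, and the variable names (`n_rows`, `n_cells`, `suspicious`) are only illustrative; this is not how rvest handles the case internally.

```r
library(rvest)

# Hypothetical stand-in for the page: one ordinary table, and one table
# that has a row (<tr>) but no cells (<td>/<th>).
page <- minimal_html('
  <table><tr><td>a</td><td>b</td></tr></table>
  <table><tr></tr></table>
')

tables <- html_elements(page, "table")

# Count rows and cells inside each table.
n_rows <- vapply(tables, function(tbl) length(html_elements(tbl, "tr")), integer(1))
n_cells <- vapply(tables, function(tbl) length(html_elements(tbl, "td, th")), integer(1))

# Indices of tables that have at least one row but no cells at all.
suspicious <- which(n_rows > 0 & n_cells == 0)
suspicious
```

Running the same counts against a real page should point at the table that needs special handling, assuming the failure is caused by this kind of empty table.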

@gunawebs

Thanks @epiben.
Yes, the offending table was empty. I worked around it by using lapply() over the tables and checking each one for emptiness, so all good for now.

I just wanted to let you know, in case you want to handle this case too. Thanks again!
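The workaround mentioned above was not posted in the thread, so the following is only a sketch of what it might look like. `safe_html_tables()` is a hypothetical helper, not part of rvest, and the inline HTML is a made-up stand-in for the real page. It parses each table separately, skips tables without cells, and treats any remaining parse error as "no table".

```r
library(rvest)

# Hypothetical helper (not an rvest function): parse tables one at a time,
# returning NULL for tables with no cells or tables that fail to parse.
safe_html_tables <- function(page, ...) {
  tables <- html_elements(page, "table")
  lapply(tables, function(tbl) {
    if (length(html_elements(tbl, "td, th")) == 0) {
      return(NULL)
    }
    tryCatch(html_table(tbl, ...), error = function(e) NULL)
  })
}

# Made-up example page: one ordinary table and one table without cells.
page <- minimal_html('
  <table><tr><td>a</td><td>b</td></tr></table>
  <table><tr></tr></table>
')

parsed <- safe_html_tables(page, header = FALSE)

# Drop the skipped tables and keep only the successfully parsed ones.
parsed <- Filter(Negate(is.null), parsed)
```

Keeping the `NULL` placeholders until the final `Filter()` step preserves each table's position on the page, which can help when only a specific table (say, the eleventh) is of interest.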

@epiben
Contributor

epiben commented Sep 27, 2022

No worries, and thank you for raising it here so we can also handle this edge case; it's better to handle it up front. I never knew empty tables came in so many different forms.


4 participants