Problem getting the data inside a noscript tag #221

Closed
miguelcrespo opened this issue Sep 15, 2018 · 2 comments

@miguelcrespo

Hi, I am having a problem getting the data inside a noscript tag. I am testing with this URL: https://stackoverflow.com/questions/20170275/how-to-find-a-type-of-an-object-in-go. When I use the following code:

c.OnHTML(`body`, func(e *colly.HTMLElement) {
	// ChildAttr returns the src of the first <img> found under <body>
	name := e.ChildAttr(`img`, "src")
	log.Printf("Name: %s", name)
})

it prints a value, but when I scope the selector to the noscript tag:

c.OnHTML(`body noscript`, func(e *colly.HTMLElement) {
	// Expected: the src of the <img> nested inside <noscript>
	name := e.ChildAttr(`img`, "src")
	log.Printf("Name: %s", name)
})

or simply noscript as the selector,

nothing is printed, even though the page does contain an image inside a noscript tag:

<noscript>
  <div>
     <img src="/posts/20170275/ivc/06b4" class="dno" alt="" width="0" height="0">
  </div>
</noscript>

Any idea why this is happening?

@vosmith
Collaborator

vosmith commented Sep 17, 2018

@miguelcrespo I looked into this a bit and found a trail that leads all the way back to Go's html package.

Colly relies on goquery for DOM parsing, and in the goquery tracker I found an issue describing the same problem you are having. That issue references an issue in Cascadia, the library used to compile CSS selectors. The behavior comes from the net/html parser itself: it converts the contents of a noscript tag into a text node, so nothing inside a noscript element can be matched by CSS selectors.
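Since the noscript content survives only as text, one possible workaround (not confirmed in this thread) is to take that text, which still holds the original markup as a string, and parse it a second time. The sketch below sticks to the Go standard library: it uses encoding/xml in non-strict mode with xml.HTMLAutoClose as a rough stand-in for a real HTML parser, and imgSrcsFromRawHTML is a hypothetical helper name. In a colly callback the input would presumably be e.Text from an OnHTML handler on noscript; passing that string to goquery.NewDocumentFromReader would likely be the more robust choice in a real project.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// imgSrcsFromRawHTML is a hypothetical helper: it re-parses markup that
// net/html left as plain text (such as the body of a <noscript> element)
// and returns the src attribute of every <img> tag it finds.
func imgSrcsFromRawHTML(raw string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(raw))
	d.Strict = false                // tolerate markup that is not well-formed XML
	d.AutoClose = xml.HTMLAutoClose // treat <img>, <br>, ... as void elements
	d.Entity = xml.HTMLEntity       // resolve HTML entities such as &nbsp;

	var srcs []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return srcs, nil
		}
		if err != nil {
			// Return whatever was collected before the parse error.
			return srcs, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(start.Name.Local, "img") {
			continue
		}
		for _, attr := range start.Attr {
			if strings.EqualFold(attr.Name.Local, "src") {
				srcs = append(srcs, attr.Value)
			}
		}
	}
}

func main() {
	// The markup from the issue, as it would appear inside the text node.
	raw := `
  <div>
     <img src="/posts/20170275/ivc/06b4" class="dno" alt="" width="0" height="0">
  </div>
`
	srcs, err := imgSrcsFromRawHTML(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(srcs) // prints [/posts/20170275/ivc/06b4]
}
```

encoding/xml is not a full HTML parser, so heavily malformed markup may still produce an error; the sketch returns the src values collected up to that point along with the error.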

vosmith closed this as completed Oct 2, 2018
@miguelcrespo
Author

Hi @vosmith, thank you for your answer. It pointed me in the right direction.
