CSS selector cannot select a p tag under h3 #203

Open
xujiang1 opened this issue Oct 26, 2020 · 3 comments

xujiang1 commented Oct 26, 2020

from parsel import Selector

html = "<h3>吉林大学社会科学学报<p>Jilin University Journal Social Sciences Edition</p></h3>"

sel = Selector(html)

print(sel.css("h3"))

print(sel.css("h3 > p::text").getall())

When I use a CSS selector, I cannot get the p tag under h3. The result is as follows:


[<Selector xpath='descendant-or-self::h3' data='<h3>吉林大学社会科学学报</h3>'>]

[]

When I replace the p tag with a different tag, the selection works as expected:

from parsel import Selector

html = "<h3>吉林大学社会科学学报<em>Jilin University Journal Social Sciences Edition</em></h3>"

sel = Selector(html)

print(sel.css("h3"))

print(sel.css("h3 > em::text").getall())

Result:

[<Selector xpath='descendant-or-self::h3' data='<h3>吉林大学社会科学学报<em>Jilin University Jo...'>]

['Jilin University Journal Social Sciences Edition']

Gallaecio commented Oct 30, 2020

Interesting. I’m marking it as a bug, although I am not 100% sure it is one, and even if it is, it is probably an upstream issue in lxml.

Gallaecio added the bug label Oct 30, 2020

felipeboffnunes commented Nov 1, 2022

Gallaecio commented

I don’t think Parsel intends to require that input HTML is standard-compliant. Ideally, anything that a browser accepts we should accept as well, because HTML documents in the wild care about browser support more than they care about standard compliance.

Browsers seem to accept this syntax.
