[WIP] Extract opengraph from body as well #129

Open
wants to merge 1 commit into
base: master

Conversation

lopuhin
Member

@lopuhin lopuhin commented Apr 8, 2020

Normally Open Graph <meta property=".." content=".."> tags are in the head, but having them in the body is also surprisingly common: in our internal article dataset they are present in the body on 5% of pages (out of all pages with such tags anywhere on the page), and the figure is 12% for products.

One such example is https://www.reuters.com/article/us-health-coronavirus-apple/coronavirus-case-at-apples-irish-hq-trinity-college-goes-online-idUSKBN20X1QT - so it's even on a popular website.
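To illustrate the idea, here is a minimal sketch that collects Open Graph meta tags from the whole document and records whether each one was found in the head or in the body. It is not the code from this PR and not extruct's API: it uses only the standard library `html.parser`, and the names `OpenGraphMetaCollector`, `extract_opengraph` and `SAMPLE_HTML` are hypothetical. It also assumes the literal `og:` prefix and ignores namespace/prefix declarations, which is exactly the open question in the TODO below.

```python
from html.parser import HTMLParser


class OpenGraphMetaCollector(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags from the whole page.

    Hypothetical helper for illustration only; assumes the plain "og:" prefix.
    """

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.found = []  # list of (property, content, "head" or "body")

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            # Everything after the opening <body> tag is counted as body.
            self.in_body = True
        elif tag == "meta":
            attrs = dict(attrs)
            prop = attrs.get("property")
            content = attrs.get("content")
            if prop and prop.startswith("og:") and content is not None:
                location = "body" if self.in_body else "head"
                self.found.append((prop, content, location))


def extract_opengraph(html):
    """Return Open Graph properties found anywhere in the document."""
    parser = OpenGraphMetaCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# Made-up sample page with one tag in the head and one in the body.
SAMPLE_HTML = """
<html>
  <head>
    <meta property="og:title" content="Example title">
  </head>
  <body>
    <meta property="og:type" content="article">
    <p>Text</p>
  </body>
</html>
"""

if __name__ == "__main__":
    for prop, content, location in extract_opengraph(SAMPLE_HTML):
        print(prop, content, location)
```

An extractor that only looks under the head would miss the second tag in this sample; scanning the whole document picks up both.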

TODO:

  • add tests
  • double-check what is happening with namespaces

@codecov

codecov bot commented Apr 8, 2020

Codecov Report

Merging #129 into master will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #129   +/-   ##
=======================================
  Coverage   87.78%   87.78%           
=======================================
  Files          11       11           
  Lines         475      475           
  Branches      103      103           
=======================================
  Hits          417      417           
  Misses         52       52           
  Partials        6        6           

| Impacted Files | Coverage Δ |
|---|---|
| extruct/opengraph.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a365dc0...1187d9d.
