Create ScrapedPage object #32

Open
jeremybmerrill opened this issue Jan 15, 2014 · 1 comment

@jeremybmerrill

A ScrapedPage object is what Scraper#scrape would yield, instead of separately yielding the HTML, the URL, the instance page's index, etc.

This ScrapedPage object -- which might inherit from Nokogiri::HTML -- would contain the raw HTML, the parsed HTML, the URL, the index page from which the instance page was linked (if present), a reference to the index page's ScrapedPage object, and the instance page's index (i.e. its ordinal position among the pages linked from the index page).
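To make the list of fields above concrete, here is a minimal sketch of such an object as a plain Struct. It is not Upton's actual implementation: the field names are hypothetical, `index_url` is only one reading of "the index page from which the instance page was linked", and it uses composition rather than inheriting from Nokogiri so that it runs without any gems.

```ruby
# Hypothetical sketch: field names are illustrative, not Upton's API.
ScrapedPage = Struct.new(
  :raw_html,        # unparsed HTML string as fetched
  :parsed_html,     # parsed document (would come from Nokogiri in real use)
  :url,             # URL of this page
  :index_url,       # URL of the index page that linked here, if any (assumed field)
  :index_page,      # ScrapedPage for that index page, if any
  :instance_index,  # ordinal position among the index page's links
  keyword_init: true
)

# An index page has no parent, so the index-related fields stay nil.
index = ScrapedPage.new(
  raw_html: "<a href='/a'>A</a><a href='/b'>B</a>",
  url: "http://example.com/"
)

# An instance page points back at the index page it was linked from.
instance = ScrapedPage.new(
  raw_html: "<h1>B</h1>",
  url: "http://example.com/b",
  index_url: index.url,
  index_page: index,
  instance_index: 1
)

puts instance.index_page.url
puts instance.instance_index
```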

This would be a breaking change, so it is further from being implemented in stable Upton.

@jeremybmerrill jeremybmerrill added this to the 1.0.0 milestone Feb 15, 2014
@jeremybmerrill

Implemented in future (for 1.0.0) in 31cbf41

Will be minimally breaking, since missing methods on Page are passed through to Nokogiri::HTML.
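The pass-through described here is commonly done with `method_missing`. The sketch below shows that general Ruby technique under stated assumptions, not the code in 31cbf41: the `Page` constructor and its keyword arguments are hypothetical, and a String stands in for the parsed Nokogiri document so the example needs no gems.

```ruby
# Hypothetical sketch: Page wraps a parsed document and forwards unknown calls to it.
class Page
  attr_reader :url, :instance_index

  def initialize(document, url:, instance_index: nil)
    @document = document
    @url = url
    @instance_index = instance_index
  end

  # Forward any method Page does not define to the wrapped document.
  def method_missing(name, *args, &block)
    if @document.respond_to?(name)
      @document.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with what method_missing will forward.
  def respond_to_missing?(name, include_private = false)
    @document.respond_to?(name) || super
  end
end

# A String stands in for the parsed document here (assumption for illustration).
page = Page.new("<h1>Hello</h1>", url: "http://example.com/a", instance_index: 0)

puts page.url                # defined on Page itself
puts page.include?("Hello")  # not defined on Page, so forwarded to the String
```

One caveat with this approach: methods every object already has (`to_s`, `inspect`, `==`) never reach `method_missing`, so they would need explicit delegation if old blocks rely on them.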

Maybe I should implement this even-less-breakingly in 0.4.0 by still passing the instance_index, instance_url, etc. attrs through to blk.call?
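A rough sketch of what that less-breaking variant could look like, assuming the page object goes first and the old attributes follow positionally. `PageStub`, the `scrape` signature, and the argument order are all hypothetical. It relies on the fact that non-lambda Ruby blocks silently drop extra arguments.

```ruby
# Hypothetical stand-in for the page object; not Upton's real class.
PageStub = Struct.new(:html, :instance_url, :instance_index)

# Hypothetical scrape loop: yields the page plus the older positional attributes.
def scrape(pages, &blk)
  pages.map do |page|
    blk.call(page, page.instance_url, page.instance_index)
  end
end

pages = [PageStub.new("<p>a</p>", "http://example.com/a", 0)]

# New-style block: takes only the page; the extra arguments are dropped.
new_style = scrape(pages) { |page| page.instance_url }

# Old-style block: still receives the URL and index positionally.
old_style = scrape(pages) { |_page, url, index| "#{index}: #{url}" }

puts new_style.inspect
puts old_style.inspect
```

This tolerance applies to ordinary blocks and procs only. A lambda passed with `&` keeps strict arity and would raise ArgumentError if it declared fewer parameters than `blk.call` supplies.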
