Create ScrapedPage object #32

jeremybmerrill · 2014-01-15T01:13:15Z

This is the object that Scraper#scrape would yield, instead of yielding the HTML, the URL, the instance page's index, etc. as separate arguments.

This ScrapedPage object -- which might inherit from Nokogiri::HTML -- would contain the raw HTML, the parsed HTML, the URL, the index page from which the instance page was linked (if present), a reference to the index page's ScrapedPage object, and the instance page's index (i.e. ordinal count) of pages linked to from the index page.
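To make the proposed shape concrete, here is a minimal sketch of such an object as a plain value object. The field names (`raw_html`, `parsed_html`, `url`, `index_page`, `instance_index`) and the `linked_from_index?` helper are assumptions for illustration, not Upton's actual API, and the sketch uses a Struct rather than inheriting from anything in Nokogiri.

```ruby
# Hypothetical sketch of the proposed ScrapedPage; all names are illustrative.
ScrapedPage = Struct.new(
  :raw_html,        # the HTML string as fetched
  :parsed_html,     # the parsed document (e.g. from Nokogiri), nil in this sketch
  :url,             # the URL the page was fetched from
  :index_page,      # the index page's ScrapedPage, or nil if there is none
  :instance_index,  # ordinal position among the pages linked from the index
  keyword_init: true
) do
  # True when this page was reached by following a link on an index page.
  def linked_from_index?
    !index_page.nil?
  end
end

index = ScrapedPage.new(raw_html: "<a href='/a'>A</a>", url: "http://example.com/")
page  = ScrapedPage.new(
  raw_html: "<h1>A</h1>",
  url: "http://example.com/a",
  index_page: index,
  instance_index: 0
)
```

With this shape, a block passed to the scraper would take a single argument and read `page.url` or `page.instance_index` from it as needed.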

This would be a breaking change, so it is further from being implemented in stable Upton.

jeremybmerrill · 2014-02-16T22:45:50Z

Implemented in future (for 1.0.0) in 31cbf41

Will be minimally breaking, since missing methods on Page are passed through to Nokogiri::HTML.
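The pass-through described here can be sketched with Ruby's `method_missing`. This is an illustration of the general technique, not the code in the commit above: the `Page` constructor signature is assumed, and `FakeDocument` is a stand-in for the parsed Nokogiri document so the example runs without Nokogiri installed.

```ruby
# Hypothetical sketch: a Page that forwards unknown methods to the wrapped document.
class Page
  attr_reader :url, :instance_index

  def initialize(document, url, instance_index = nil)
    @document = document
    @url = url
    @instance_index = instance_index
  end

  # Forward any method Page does not define to the wrapped document,
  # so code written against the parsed document keeps working.
  def method_missing(name, *args, &block)
    if @document.respond_to?(name)
      @document.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @document.respond_to?(name) || super
  end
end

# Stand-in for a parsed document (assumption; not part of Nokogiri or Upton).
FakeDocument = Struct.new(:text)

page = Page.new(FakeDocument.new("hello"), "http://example.com/a", 0)
page.text  # forwarded to the wrapped document
page.url   # answered by Page itself
```

Existing blocks that call document methods on the yielded value would keep working, while new code could also read the URL and index from the same object.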

Maybe I should implement this even-less-breakingly in 0.4.0 by still passing the instance_index, instance_url, etc. attrs through to blk.call?
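One way this less-breaking variant could look is sketched below. It relies on the fact that a Ruby block silently ignores extra arguments, so the page object can be passed first and the old attributes after it. The names `YieldedPage` and `scrape_each` and the argument order are assumptions for illustration only.

```ruby
# Hypothetical sketch: yield the page object plus the legacy positional arguments.
YieldedPage = Struct.new(:html, :url, :instance_index)

def scrape_each(pages, &blk)
  # Blocks ignore surplus arguments, so one-argument and
  # three-argument blocks can both be passed here.
  pages.map { |page| blk.call(page, page.url, page.instance_index) }
end

pages = [YieldedPage.new("<h1>A</h1>", "http://example.com/a", 0)]

# New style: the block takes only the page object.
new_style = scrape_each(pages) { |page| page.url }

# Old style: the block still receives the URL and index positionally.
old_style = scrape_each(pages) { |_page, url, index| [url, index] }
```

In the old-style block the first argument is now the page object rather than an HTML string, which is where passing missing methods through to the parsed document would matter.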

jeremybmerrill added this to the 1.0.0 milestone Feb 15, 2014
