Advanced Topics

This tutorial covers further options, such as searching the `PublisherCollection` for specific publishers and dealing with deprecated ones.

How to search for publishers

Using `search()`

Publishers differ in several ways, especially in the attributes their underlying parsers support. To get only the publishers that fit your use case, search the collection with the `search()` method.

Let's get the publishers based in the US that support an attribute called `topics` and offer `NewsMap` as a source, and then use them to initialize a crawler.

from fundus import Crawler, PublisherCollection, NewsMap

fitting_publishers = PublisherCollection.us.search(attributes=["topics"], source_types=[NewsMap])
crawler = Crawler(fitting_publishers)
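
To make the filtering idea more concrete, here is a minimal, standard-library-only sketch of what such a search could look like. It is a toy model, not Fundus code: `ToyPublisher` and `toy_search` are hypothetical names, source types are represented as plain strings, and it assumes that a publisher matches only if it supports every requested attribute and every requested source type.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ToyPublisher:
    """Hypothetical stand-in for a publisher entry; not a Fundus class."""

    name: str
    attributes: FrozenSet[str]    # attribute names the parser supports
    source_types: FrozenSet[str]  # kinds of URL sources the publisher offers


def toy_search(
    publishers: Iterable[ToyPublisher],
    attributes: Iterable[str] = (),
    source_types: Iterable[str] = (),
) -> List[ToyPublisher]:
    """Keep publishers that support all requested attributes and source types.

    Assumption: every requested item must be present for a publisher to match.
    """
    wanted_attributes = set(attributes)
    wanted_sources = set(source_types)
    return [
        publisher
        for publisher in publishers
        if wanted_attributes <= publisher.attributes
        and wanted_sources <= publisher.source_types
    ]


publishers = [
    ToyPublisher("A", frozenset({"title", "body", "topics"}), frozenset({"NewsMap", "RSSFeed"})),
    ToyPublisher("B", frozenset({"title", "body"}), frozenset({"NewsMap"})),
    ToyPublisher("C", frozenset({"title", "topics"}), frozenset({"RSSFeed"})),
]

matches = toy_search(publishers, attributes=["topics"], source_types=["NewsMap"])
print([publisher.name for publisher in matches])  # prints ['A']
```

Only publisher `A` is kept here, because `B` lacks the `topics` attribute and `C` does not offer a `NewsMap` source.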

Working with deprecated publishers

When we notice that a publisher can no longer be crawled, for whatever reason, we mark it with a `deprecated` flag. This flag is mostly used internally, since the `ignore_deprecated` parameter of the `Crawler` defaults to `False`. You can change this behaviour by setting `ignore_deprecated` when initializing the `Crawler`.
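
The effect of such a flag can be sketched with a small toy model. This is not how Fundus implements it: `ToyPublisher` and `select_publishers` are hypothetical names, and the sketch assumes that `ignore_deprecated=True` means deprecated publishers are skipped, while the default `False` leaves the selection untouched.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToyPublisher:
    """Hypothetical stand-in for a publisher entry; not a Fundus class."""

    name: str
    deprecated: bool = False


def select_publishers(
    publishers: Iterable[ToyPublisher],
    ignore_deprecated: bool = False,
) -> List[ToyPublisher]:
    """Return the publishers a crawler would work on.

    Assumption: with ignore_deprecated=True, deprecated publishers are
    dropped; with the default False, every publisher is kept.
    """
    if not ignore_deprecated:
        return list(publishers)
    return [publisher for publisher in publishers if not publisher.deprecated]


publishers = [
    ToyPublisher("Active News"),
    ToyPublisher("Old Gazette", deprecated=True),
]

default_selection = select_publishers(publishers)
filtered_selection = select_publishers(publishers, ignore_deprecated=True)

print([publisher.name for publisher in default_selection])   # prints ['Active News', 'Old Gazette']
print([publisher.name for publisher in filtered_selection])  # prints ['Active News']
```

With the real library, the corresponding step would be to pass `ignore_deprecated` as a keyword argument when creating the `Crawler`, as described above.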

In the next section, we introduce you to Fundus' logging mechanics.
