Skip to content

microformats/mf2py

Repository files navigation

mf2py banner

version downloads license python-version

Welcome 👋

mf2py is a Python microformats parser with full support for microformats2, backwards-compatible support for microformats1 and experimental support for metaformats.

Installation 💻

To install mf2py run the following command:

$ pip install mf2py

Quickstart 🚀

Import the library:

>>> import mf2py

Parse an HTML Document from a file or string

>>> with open("test/examples/eras.html") as fp:
...     mf2json = mf2py.parse(doc=fp)
>>> mf2json
{'items': [{'type': ['h-entry'],
            'properties': {'name': ['Excited for the Taylor Swift Eras Tour'],
                           'author': [{'type': ['h-card'],
                                       'properties': {'name': ['James'],
                                                      'url': ['https://example.com/']},
                                       'value': 'James',
                                       'lang': 'en-us'}],
                           'published': ['2023-11-30T19:08:09'],
                           'featured': [{'value': 'https://example.com/eras.jpg',
                                         'alt': 'Eras tour poster'}],
                           'content': [{'value': "I can't decide which era is my favorite.",
                                        'lang': 'en-us',
                                        'html': "<p>I can't decide which era is my favorite.</p>"}],
                           'category': ['music', 'Taylor Swift']},
            'lang': 'en-us'}],
 'rels': {'webmention': ['https://example.com/mentions']},
 'rel-urls': {'https://example.com/mentions': {'text': '',
                                               'rels': ['webmention']}},
 'debug': {'description': 'mf2py - microformats2 parser for python',
           'source': 'https://github.com/microformats/mf2py',
           'version': '2.0.1',
           'markup parser': 'html5lib'}}
>>> mf2json = mf2py.parse(doc="<a class=h-card href=https://example.com>James</a>")
>>> mf2json["items"]
[{'type': ['h-card'],
  'properties': {'name': ['James'],
                 'url': ['https://example.com']}}]

Parse an HTML Document from a URL

>>> mf2json = mf2py.parse(url="https://events.indieweb.org")
>>> mf2json["items"][0]["type"]
['h-feed']
>>> mf2json["items"][0]["children"][0]["type"]
['h-event']

Experimental Options

The following options can be invoked via keyword arguments to parse() and Parser().

expose_dom

Use expose_dom=True to expose the DOM of embedded properties.

metaformats

Use metaformats=True to include any metaformats found.

filter_roots

Use filter_roots=True to filter known conflicting user names (e.g. Tailwind). Otherwise provide a custom list to filter instead.

Advanced Usage

parse is a convenience function for Parser. More sophisticated behaviors are available by invoking the parser object directly.

>>> with open("test/examples/festivus.html") as fp:
...     mf2parser = mf2py.Parser(doc=fp)

Filter by Microformat Type

>>> mf2json = mf2parser.to_dict()
>>> len(mf2json["items"])
7
>>> len(mf2parser.to_dict(filter_by_type="h-card"))
3
>>> len(mf2parser.to_dict(filter_by_type="h-entry"))
4

JSON Output

>>> json = mf2parser.to_json()
>>> json_cards = mf2parser.to_json(filter_by_type="h-card")

Breaking Changes in mf2py 2.0

  • Image alt support is now on by default.

Notes 📝

  • If you pass a BeautifulSoup document it may be modified.
  • A hosted version of mf2py is available at python.microformats.io.

Contributing 🛠️

We welcome contributions and bug reports via GitHub.

This project follows the IndieWeb code of conduct. Please be respectful of other contributors and forge a spirit of positive co-operation without discrimination or disrespect.

License 🧑‍⚖️

mf2py is licensed under an MIT License.