Beautiful Soup (installed as beautifulsoup4) is a Python library for pulling data out of HTML and XML files, available from www.crummy.com
Beautiful Soup parses anything you give it and handles the tree traversal for you.
Features:
- Find all the links in a page
- Find elements with a specific class
- Find URLs that match a pattern such as abc.com
- Find table headings
- Find specific text
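Three of the tasks in the features list (matching URLs, table headings, specific text) are not shown in the How To steps. The sketch below covers them on a small made-up page. The regular expression and the sample HTML are illustrative assumptions, not part of any real site.

```python
import re
from bs4 import BeautifulSoup

# A small, made-up page used only for illustration
html_doc = """
<html><body>
  <a href="https://abc.com/home">ABC home</a>
  <a href="https://example.org/">Example</a>
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td>Apple</td><td>1.20</td></tr>
  </table>
  <p>Contact us for details.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# URLs that match abc.com: an attribute filter can be a compiled regular expression
abc_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"abc\.com"))]

# Table headings: every <th> cell, with surrounding whitespace stripped
headings = [th.get_text(strip=True) for th in soup.find_all("th")]

# Specific text: string= is matched against the tag's text; a regex allows partial matches
contact = soup.find("p", string=re.compile("Contact"))

print(abc_links)           # ['https://abc.com/home']
print(headings)            # ['Name', 'Price']
print(contact.get_text())  # Contact us for details.
```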
Installation
pip install beautifulsoup4
Make sure pip is installed on your system. You can check by running:
pip --version
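After installing, a quick smoke test like the following can confirm that the package imports and parses with the built-in parser. The one-tag fragment is made up purely for the check.

```python
import bs4
from bs4 import BeautifulSoup

# Report which beautifulsoup4 release is installed
print("beautifulsoup4 version:", bs4.__version__)

# Parse a tiny fragment with Python's built-in parser as a sanity check
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)  # ok
```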
How To
Basic Setup
from bs4 import BeautifulSoup
# html_doc is a string (or open file) containing the HTML to parse
soup = BeautifulSoup(html_doc, 'html.parser')
# 'html.parser' is Python's built-in parser; 'lxml' and 'html5lib' also work if installed
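The setup above leaves html_doc undefined. A minimal self-contained version, assuming a made-up HTML string, looks like this and shows a few basic ways to navigate the parsed tree.

```python
from bs4 import BeautifulSoup

# html_doc can be any HTML string; this one is made up for illustration
html_doc = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

soup = BeautifulSoup(html_doc, "html.parser")

# Attribute access returns the first matching tag; [] reads an attribute
print(soup.title.string)   # Demo page
print(soup.p["class"])     # ['intro']  (class is multi-valued, so it is a list)
print(soup.p.get_text())   # Hello
```

To parse a saved page instead, pass an open file object, for example `with open("page.html", encoding="utf-8") as f: soup = BeautifulSoup(f, "html.parser")`, where page.html is a hypothetical filename.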
Print the Formatted Source Code
print(soup.prettify())
# prettify() returns the whole document as a string, indented with each tag and text node on its own line
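prettify() keeps the markup. When only the readable text is wanted, get_text() is the usual counterpart. The sketch below, on a made-up snippet, shows both side by side.

```python
from bs4 import BeautifulSoup

html_doc = "<html><body><h1>Title</h1><p>First <b>bold</b> line</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# prettify() returns a str; nested tags are indented one level deeper than their parent
pretty = soup.prettify()
print(pretty)

# get_text() drops the markup: join text pieces with a space and strip each one
print(soup.get_text(" ", strip=True))  # Title First bold line
```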
Find all links
for a in soup.find_all('a', href=True):
    # href=True keeps only <a> tags that actually have an href attribute
    print(a['href'], a.text)
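Links on a real page are often relative. One common approach, sketched here with the standard library's urljoin and a hypothetical base URL, is to resolve each href against the page's own address while collecting the links.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical page address and HTML, used only to show relative-link handling
base_url = "https://www.example.com/docs/"
html_doc = """
<a href="intro.html">Intro</a>
<a href="/about">About</a>
<a name="top">No href, skipped</a>
<a href="https://abc.com/">ABC</a>
"""
soup = BeautifulSoup(html_doc, "html.parser")

links = []
for a in soup.find_all("a", href=True):      # skips <a> tags without an href
    absolute = urljoin(base_url, a["href"])  # resolve relative paths against base_url
    links.append((absolute, a.get_text(strip=True)))

for url, text in links:
    print(url, text)
```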
Find div elements with a specific class
mydivs = soup.find_all("div", class_="stylelistrow")
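Because class is a multi-valued attribute, a class filter matches an element that has that class among others. The sketch below, on made-up markup, shows this and the equivalent CSS selector query via select().

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="stylelistrow">Row 1</div>
<div class="stylelistrow highlighted">Row 2</div>
<div class="other">Not a row</div>
<span class="stylelistrow">Not a div</span>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# class_ (trailing underscore, since class is a Python keyword) matches any one
# of an element's classes, so the div with two classes is included
by_keyword = soup.find_all("div", class_="stylelistrow")

# The same query written as a CSS selector
by_css = soup.select("div.stylelistrow")

print([d.get_text() for d in by_keyword])  # ['Row 1', 'Row 2']
print([d.get_text() for d in by_css])      # ['Row 1', 'Row 2']
```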
Find all top-level tables (skipping tables nested inside another table)
tables = soup.find_all("table")
for table in tables:
    # keep only tables that are not nested inside another table
    if table.find_parent("table") is None:
        print(str(table))
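Printing a table gives raw markup. To turn a top-level table into rows of cell text, one possible approach (a sketch; table_rows is a hypothetical helper name) is to skip any row whose nearest enclosing table is not the one being read, since find_all also descends into nested tables.

```python
from bs4 import BeautifulSoup

# Made-up markup with a table nested inside a cell
html_doc = """
<table>
  <tr><th>Item</th><th>Details</th></tr>
  <tr><td>A</td><td><table><tr><td>nested</td></tr></table></td></tr>
</table>
"""
soup = BeautifulSoup(html_doc, "html.parser")

def table_rows(table):
    """Return one table's rows as lists of cell text, ignoring rows of inner tables."""
    rows = []
    for tr in table.find_all("tr"):
        # find_all searches all descendants, so skip rows that belong to an inner table
        if tr.find_parent("table") is not table:
            continue
        cells = tr.find_all(["th", "td"], recursive=False)  # direct child cells only
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows

top_level = [t for t in soup.find_all("table") if t.find_parent("table") is None]
for table in top_level:
    print(table_rows(table))  # [['Item', 'Details'], ['A', 'nested']]
```

The first row holds the <th> headings, so it can serve as column names if needed.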
Further Reading: