What is BeautifulSoup?

Beautiful Soup is a Python library for pulling data out of HTML and XML documents. It is published at www.crummy.com.

Features:

Beautiful Soup parses whatever markup you give it, including malformed HTML, and handles the tree traversal for you.

Find all the links in a page
Find elements with a specific class
Find URLs that match abc.com
Find table headings
Find specific text
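
Three of the tasks listed above (URLs that match abc.com, table headings, and specific text) can be sketched as follows. The sample HTML and the variable names are invented for illustration, and using regular expressions for the matching is one possible approach, not the only one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this example.
html_doc = """
<html><body>
  <a href="http://abc.com/home">Home</a>
  <a href="http://example.org/about">About</a>
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td>Soup</td><td>3.50</td></tr>
  </table>
  <p>Soup of the day</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# URLs that match abc.com: the href filter accepts a compiled regular expression.
abc_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"abc\.com"))]

# Table headings: collect the text of every <th> cell.
headings = [th.get_text(strip=True) for th in soup.find_all("th")]

# Specific text: string= matches text nodes; here, any that contain "day".
matches = soup.find_all(string=re.compile("day"))

print(abc_links)
print(headings)
print(matches)
```

The `string=` argument assumes a reasonably recent Beautiful Soup 4 release; older releases spell the same filter `text=`.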

Installation

pip install beautifulsoup4

Make sure pip is installed on your system. You can check by running:

pip --version

How To

Basic Setup

from bs4 import BeautifulSoup

# imports the BeautifulSoup class from the beautifulsoup4 package

soup = BeautifulSoup(html_doc, 'html.parser')

# parses html_doc (a string of HTML) with Python's built-in html.parser; other parsers, such as lxml and html5lib, are also available
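
The setup lines assume that `html_doc` already exists. A minimal self-contained sketch, with a made-up document standing in for `html_doc`, might look like this; in practice the string would usually come from a file or an HTTP response.

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for this example.
html_doc = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

soup = BeautifulSoup(html_doc, "html.parser")

title_text = soup.title.string   # text inside the <title> tag
first_p = soup.find("p")         # first <p> tag in the document

print(title_text)
print(first_p["class"], first_p.get_text())
```

Note that `first_p["class"]` is a list, because an HTML element can carry several classes.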

The Source Code

print(soup.prettify())

# prints the whole document source, indented for readability

Find all links

# href=True keeps only <a> tags that have an href attribute
for a in soup.find_all('a', href=True):
    print(a['href'], a.text)
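
Links in a page are often relative, so a common follow-up is to resolve each `href` against the page address. The sketch below assumes a hypothetical `base_url` and sample markup, and uses `urljoin` from the standard library; it is one way to do it, not something the original snippet prescribes.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "http://example.com/docs/"  # hypothetical address of the page

# Hypothetical markup: two real links and one anchor without an href.
html_doc = """
<a href="/about">About</a>
<a href="guide.html">Guide</a>
<a name="top">No href here</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect (link text, absolute URL) pairs; the anchor without href is skipped.
links = []
for a in soup.find_all("a", href=True):
    links.append((a.get_text(strip=True), urljoin(base_url, a["href"])))

for text, url in links:
    print(text, url)
```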

Find div elements with a specific class

mydivs = soup.find_all("div", class_="stylelistrow")
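
A self-contained sketch of the class lookup, shown with both the `class_` keyword argument and a CSS selector. The sample markup is invented; only the class name `stylelistrow` comes from the snippet above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second div carries two classes.
html_doc = """
<div class="stylelistrow">first</div>
<div class="stylelistrow highlighted">second</div>
<div class="other">third</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ matches any element whose class list contains "stylelistrow".
by_keyword = soup.find_all("div", class_="stylelistrow")

# The same lookup written as a CSS selector.
by_selector = soup.select("div.stylelistrow")

keyword_texts = [d.get_text() for d in by_keyword]
selector_texts = [d.get_text() for d in by_selector]
print(keyword_texts)
print(selector_texts)
```

Both forms match an element that has additional classes, which is usually what you want when scraping styled pages.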

Find all tables

tables = soup.find_all("table")

# print only top-level tables, skipping any table nested inside another table
for table in tables:
    if table.find_parent("table") is None:
        print(str(table))
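
The parent check matters when tables are nested, since a plain search returns the inner tables too. A small sketch with invented markup (the `id` values are hypothetical, added only so the tables can be told apart):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one table nested inside another, plus a flat table.
html_doc = """
<table id="outer">
  <tr><td>
    <table id="inner"><tr><td>nested</td></tr></table>
  </td></tr>
</table>
<table id="second"><tr><td>flat</td></tr></table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Every table in document order, nested ones included.
all_ids = [t["id"] for t in soup.find_all("table")]

# Only tables that have no <table> ancestor.
top_level_ids = [
    t["id"] for t in soup.find_all("table") if t.find_parent("table") is None
]

print(all_ids)
print(top_level_ids)
```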

Further Reading:

Python Web Scraping Tutorial using BeautifulSoup
Web scraping and parsing with Beautiful Soup 4 Introduction
Beautiful Soup Documentation
Stack Overflow Questions
