Create a training dataset by annotating the HTML contents of a web page with a Chrome extension backed by a Flask web application.
The annotator consists of two components:
- A Chrome extension: used to annotate HTML tags on a given web page
- A Flask app: stores the annotated HTML tags in SQLite
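To make the storage side more concrete, here is a minimal sketch of how an annotated tag could be written to and read back from SQLite using Python's standard sqlite3 module. The table name annotations, its columns, and the helper function names are assumptions made for illustration; they are not taken from the repository, whose actual schema may differ.

```python
import sqlite3


def create_table(conn):
    # Hypothetical schema (an assumption): one row per annotated tag.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS annotations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT NOT NULL,
            label TEXT NOT NULL,
            tag_html TEXT NOT NULL
        )
        """
    )
    conn.commit()


def save_annotation(conn, url, label, tag_html):
    # Parameterized query, so the HTML is stored verbatim and safely.
    cur = conn.execute(
        "INSERT INTO annotations (url, label, tag_html) VALUES (?, ?, ?)",
        (url, label, tag_html),
    )
    conn.commit()
    return cur.lastrowid


def load_annotations(conn, label):
    # Return (url, tag_html) pairs for one label, in insertion order.
    rows = conn.execute(
        "SELECT url, tag_html FROM annotations WHERE label = ? ORDER BY id",
        (label,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory DB, for the demo only
    create_table(conn)
    save_annotation(
        conn,
        "https://example.com/post",
        "publication_date",
        '<time datetime="2020-01-01">Jan 1, 2020</time>',
    )
    print(load_annotations(conn, "publication_date"))
```

Storing the raw tag HTML next to the page URL and a label keeps each row self-contained, so a training set can be built later from the database alone.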
Requires Python 3.6 or above
Running Flask app
- Clone the GitHub repo: git clone https://github.com/sachinkalsi/html_tag_annotator.git
- Install the dependencies: pip3 install -r flask_app/requirements.txt
- Run python3 flask_app/app.py to start the server.
- The Flask server should now be running on port 5000. Open http://localhost:5000/ to verify.
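The verification step can also be scripted. The following is a small sketch that uses only the Python standard library; the function name server_is_up is hypothetical, and the only values taken from this README are the host and port. It treats any HTTP response, including an error status, as proof that a server is listening.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if an HTTP server answers at url, False if it is unreachable."""
    # An empty ProxyHandler bypasses any system proxy for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    print("Flask server reachable:", server_is_up())
```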
Installing Chrome Extension
- Go to chrome://extensions/ in the address bar
- Click the Load unpacked button (it is shown only when Developer mode is enabled) and choose the chrome_extension folder
- Make sure the Flask server is running on port 5000
- Create the DB file if it has not been created already (python3 utils/create_db_file.py)
- In Chrome, go to the URL of the page you want to annotate
- Press capital S to start annotation
- Once started, hover the mouse over the web page and click the tag that needs annotation (in the following demo, it is the publication date)
- Once the tag is selected, click the Save button
- Press capital S again to stop annotation
- See the how_to_use.ipynb notebook to learn how to read the stored annotated data
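The how_to_use.ipynb notebook remains the reference for reading the data. As a rough complement, the sketch below inspects any SQLite file without assuming its schema: it lists the tables and returns the first few rows of each. The function name dump_sqlite and the file name annotations.db are placeholders; the real DB path is whatever utils/create_db_file.py creates.

```python
import os
import sqlite3


def dump_sqlite(db_path, limit=5):
    """Return {table_name: (column_names, first `limit` rows)} for a SQLite file."""
    result = {}
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for table in names:
            if table.startswith("sqlite_"):
                continue  # skip SQLite's internal bookkeeping tables
            # Quote the identifier; table names cannot be bound as parameters.
            quoted = '"' + table.replace('"', '""') + '"'
            cur = conn.execute("SELECT * FROM " + quoted + " LIMIT ?", (limit,))
            columns = [d[0] for d in cur.description]
            result[table] = (columns, cur.fetchall())
    finally:
        conn.close()
    return result


if __name__ == "__main__":
    db_path = "annotations.db"  # placeholder path (an assumption)
    if os.path.exists(db_path):
        for table, (columns, rows) in dump_sqlite(db_path).items():
            print(table, columns)
            for row in rows:
                print("  ", row)
    else:
        print("DB file not found:", db_path)
```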
Watch the following YouTube videos to learn more about installation and usage:
Playlist link: https://www.youtube.com/playlist?list=PLfSv7CK7EjD2XmStXvZthQjGn1DAhfOaK
Installation link: https://youtu.be/MtQ1glIuzZ8