This repository contains some of the scripts developed during the class "Content Management, Such- und Texttechnologien" (4th semester) at HTW Berlin, along with other scripts on natural language processing.
The tasks include:
- Text preprocessing for German language
- Text similarity computation
- Text summarization
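
The three tasks above can be illustrated with a minimal, dependency-free sketch. It is not the code from this repository: the scripts here rely on the libraries listed below, whereas this example uses only the Python standard library. The stopword list, the function names (`preprocess`, `cosine_similarity`, `summarize`) and the frequency-based sentence scoring are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption). A real script would load a
# complete German list, for example from nltk or spacy.
GERMAN_STOPWORDS = {
    "der", "die", "das", "und", "ist", "ein", "eine", "in", "im",
    "auf", "mit", "den", "dem", "von", "zu",
}


def preprocess(text):
    """Lowercase, keep letter sequences (including umlauts and ß), drop stopwords."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in GERMAN_STOPWORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def summarize(text, n_sentences=1):
    """Extractive summary: keep the n sentences with the highest average
    token frequency, returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(preprocess(text))

    def score(sentence):
        tokens = preprocess(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]
```

Calling `cosine_similarity(preprocess(a), preprocess(b))` on two article texts gives a score between 0 and 1, and `summarize(article, n_sentences=2)` returns the two highest-scoring sentences. Raw counts are used here for brevity; TF-IDF weighting or lemmatization would be natural refinements.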
Dataset(s):
- Newspaper articles
- ....
Libraries used:
- nltk
- spacy
- sklearn
- pandas
- numpy
- networkx
- gensim
Literature/Inspiration:
- Sarkar, Bali, Sharma - "Practical Machine Learning with Python", Apress
- Sarkar - "Text Analytics with Python", Apress
- Cucci - "A tu per tu col Machine Learning", thedotcompany
- Thanaki - "Python Natural Language Processing: Advanced machine learning and deep learning techniques for natural language processing", Packt Publishing (Kindle edition)