The presentation describes how Bulgarian text is preprocessed so that different machine learning algorithms can be trained on it and tuned, with the aim of finding the most suitable settings for each and achieving satisfactory results in binary classification. Special attention is paid to data preparation, since most libraries are optimized for English, and using them for other languages (especially highly inflected ones such as Bulgarian) is a serious challenge. The notebook compares and analyzes the results of three algorithms: logistic regression, Naïve Bayes, and SVM.
The notebook and the project files can be downloaded from classification_models_for_bulgarian.rar
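
To give a sense of the kind of data preparation described above, here is a minimal sketch that uses only the Python standard library. It is not taken from the notebook. The stop-word list, the suffix list, and the names `BG_STOP_WORDS`, `ARTICLE_SUFFIXES`, `strip_article`, and `preprocess` are all assumptions made for illustration. The suffix stripping is deliberately crude: it only removes a few definite-article endings and will mangle some words, so it should not be read as a real Bulgarian stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
BG_STOP_WORDS = {"и", "в", "на", "за", "се", "е", "да", "от", "с", "не"}

# Matches runs of characters from the basic Cyrillic block (U+0400 to U+04FF).
TOKEN_RE = re.compile(r"[\u0400-\u04FF]+")

# Illustrative definite-article endings only; this is not a full stemmer.
ARTICLE_SUFFIXES = ("ът", "ят", "та", "то", "те")


def strip_article(token, min_stem=3):
    """Remove one definite-article ending if at least min_stem letters remain."""
    for suffix in ARTICLE_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep Cyrillic tokens, drop stop words, strip article endings."""
    tokens = TOKEN_RE.findall(text.lower())
    return [strip_article(t) for t in tokens if t not in BG_STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Книгата е на масата."))
```

The resulting token lists could then be passed to a vectorizer and to each of the three classifiers; which normalization steps actually help is something to check against held-out data for each model.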