This repo contains notebooks that perform clustering and classification on documents from the FUNSD dataset.
The first notebook applies K-means and agglomerative clustering to the FUNSD documents using visual and textual features. It also runs Principal Component Analysis (PCA) on the tokenized document content to visualize the resulting clusters.
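
As a rough illustration of that pipeline, here is a minimal scikit-learn sketch. It is not the notebook's code. It assumes TF-IDF vectors as the textual features, uses a handful of hypothetical document strings in place of real FUNSD text, omits the visual features entirely, and picks two clusters arbitrarily. `cluster_documents` is a hypothetical helper name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.decomposition import PCA

# Hypothetical stand-ins for OCR'd FUNSD form text; the notebook's real inputs differ.
docs = [
    "invoice total amount due date payment",
    "invoice amount payment remit total",
    "fax cover sheet to from pages",
    "fax transmission pages to from date",
    "invoice payment due total balance",
    "fax cover pages recipient sender",
]


def cluster_documents(texts, n_clusters=2, seed=0):
    """Cluster texts with K-means and agglomerative clustering; return both label sets and 2-D PCA coordinates."""
    # Tokenize and weight terms (TF-IDF is an assumed choice of textual feature).
    X = TfidfVectorizer().fit_transform(texts).toarray()
    kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Ward linkage (scikit-learn's default) needs a dense array, hence .toarray() above.
    agglo_labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
    # Project onto 2 principal components purely so the clusters can be plotted.
    coords = PCA(n_components=2, random_state=seed).fit_transform(X)
    return kmeans_labels, agglo_labels, coords


km, ag, xy = cluster_documents(docs)
print(km, ag, xy.shape)
```

The `xy` coordinates can be passed to a scatter plot colored by `km` or `ag` to compare the two clusterings side by side.
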
The second notebook implements supervised classification through transfer learning on the VGG architecture, using the cluster assignments learned in the first notebook as class labels.
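
The transfer-learning step can be sketched in Keras roughly as follows. This is an assumed setup, not the notebook's implementation. It picks VGG16 as the VGG variant, freezes its convolutional base, and trains only a new softmax head. `weights=None` keeps the sketch offline; actual transfer learning would pass `"imagenet"` (and typically apply `keras.applications.vgg16.preprocess_input`). The 32×32 inputs, the random images, the three cluster ids, and the helper `build_transfer_model` are all hypothetical.

```python
import numpy as np
from tensorflow import keras


def build_transfer_model(n_classes, input_shape=(32, 32, 3), weights=None):
    """Frozen VGG16 convolutional base plus a new softmax head over the cluster labels."""
    # weights=None avoids a download here; use "imagenet" to reuse pretrained features.
    base = keras.applications.VGG16(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # freeze the convolutional layers; only the new head is trained

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(n_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model


# Hypothetical data: random "page images" and integer cluster ids standing in for clustering output.
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype("float32")
cluster_labels = rng.integers(0, 3, size=8)

model = build_transfer_model(n_classes=3)
model.fit(images, cluster_labels, epochs=1, batch_size=4, verbose=0)
probs = model.predict(images, verbose=0)
print(probs.shape)
```

Freezing the base keeps the number of trainable parameters small, which suits the modest size of FUNSD. Unfreezing the top VGG blocks for a later fine-tuning pass is a common follow-up.
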
The notebooks use the scikit-learn and Keras libraries.