Author: Kyriafinis Vasilis
Winter Semester 2022 - 2023
The goal of this project is to implement the KNN and K-means classification algorithms and compare their performance. The project is implemented in C++, and the results of the comparison can be found in the report PDF file.
To set up this repository on your local machine, run the following command in the terminal:
$ git clone git@github.com:Billkyriaf/Neural_Networks_1.git
Alternatively, download and extract the zip file of the repository.
This project uses make utilities to build and run the executables.
This project uses C++14 features and requires a compiler that supports C++14.
This project uses the progressbar library by gipert to display a progress bar during the execution of the program. The library is included in the repository and no additional setup is required. For more information about the library, visit its GitHub page.
The dataset used in this project is the MNIST dataset. The dataset is included in the repository and no additional setup is required.
To build the executables, first change into the knn_classifier directory from the root directory of the repository:
$ cd knn_classifier
In the knn_classifier directory you can find the Makefile that is used to build the executables.
IMPORTANT!
Before building the executables, make sure that the MNIST dataset is in the data directory. If the dataset is not in the data directory, you can download it from the official website and extract it into the data directory. The data directory should contain the following files:
train-images.idx3-ubyte
train-labels.idx1-ubyte
t10k-images.idx3-ubyte
t10k-labels.idx1-ubyte
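A quick way to confirm that the files were extracted correctly is to inspect their headers. The sketch below is not taken from this repository: `decode_be32` and `check_idx_file` are hypothetical helper names. It relies only on the published IDX layout, in which each file starts with a big-endian 32-bit magic number (0x00000803 for image files, 0x00000801 for label files) followed by the item count.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Magic numbers defined by the IDX format used for the MNIST files.
constexpr std::uint32_t kImageMagic = 0x00000803;  // idx3-ubyte (images)
constexpr std::uint32_t kLabelMagic = 0x00000801;  // idx1-ubyte (labels)

// Decodes a 32-bit big-endian integer from 4 bytes (IDX headers are big-endian).
std::uint32_t decode_be32(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
           (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
}

// Hypothetical helper: returns true if `path` can be read and starts with
// `expected_magic`. On success `count` holds the item count from the header.
bool check_idx_file(const std::string& path, std::uint32_t expected_magic,
                    std::uint32_t& count) {
    std::ifstream file(path, std::ios::binary);
    unsigned char header[8];
    // A file that failed to open also fails this read.
    if (!file.read(reinterpret_cast<char*>(header), sizeof(header))) return false;
    if (decode_be32(header) != expected_magic) return false;
    count = decode_be32(header + 4);
    return true;
}
```

For example, `check_idx_file("data/t10k-images.idx3-ubyte", kImageMagic, count)` would be expected to succeed and report 10000 items for an intact test set, assuming the path is given relative to the directory that contains data.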
The Makefile contains the following targets:
# knn classifier
$ make run_knn
Arguments:
# The knn executable requires 2 mandatory arguments and 3 optional arguments
# Mandatory arguments:
# -d <dataset> : The directory of the dataset
# -k <int> : The number of neighbors to use
#
# Optional arguments:
# -t <int> : The number of threads to use (default: 16)
# -n <int> : The number of images to use for testing (default: 10000)
# -s <int> : The starting index of the testing images (default: 0)
To change the arguments, edit the Makefile.
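To illustrate what the -k argument controls, here is a minimal single-threaded sketch of k-nearest-neighbor classification. It is not the repository's implementation: the names `Image`, `squared_distance` and `knn_predict` are assumptions made for this example, labels are assumed to be digits 0 to 9, and the distance is assumed to be squared Euclidean.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Image = std::vector<std::uint8_t>;  // flattened pixels, e.g. 28 * 28 for MNIST

// Squared Euclidean distance between two images of equal size.
std::uint64_t squared_distance(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::int64_t diff = std::int64_t(a[i]) - std::int64_t(b[i]);
        sum += std::uint64_t(diff * diff);
    }
    return sum;
}

// Predicts a label (assumed to be in 0..9) for `query` by majority vote
// among its k nearest training images.
int knn_predict(const std::vector<Image>& train, const std::vector<int>& labels,
                const Image& query, std::size_t k) {
    assert(!train.empty() && train.size() == labels.size() && k > 0);
    k = std::min(k, train.size());

    std::vector<std::pair<std::uint64_t, int>> dist;  // (distance, label)
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i)
        dist.emplace_back(squared_distance(train[i], query), labels[i]);

    // Move the k smallest distances to the front; their relative order is irrelevant.
    std::nth_element(dist.begin(), dist.begin() + (k - 1), dist.end());

    int votes[10] = {0};
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    return int(std::max_element(votes, votes + 10) - votes);  // ties go to the smallest label
}
```

Using `std::nth_element` instead of a full sort keeps the selection step linear on average in the number of training images. Since each test image is classified independently, the test range selected by -n and -s could be split across the threads requested with -t.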
# k-means classifier
$ make run_ncc
Arguments:
# The ncc executable requires 1 mandatory argument and 2 optional arguments
# Mandatory arguments:
# -d <dataset> : The directory of the dataset
#
# Optional arguments:
# -n <int> : The number of images to use for testing (default: 10000)
# -s <int> : The starting index of the testing images (default: 0)
To change the arguments, edit the Makefile.
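Assuming that ncc stands for a nearest class centroid classifier, the idea can be sketched as follows: compute one mean image per digit class from the training set, then assign each test image to the class whose mean is closest. The function names `class_centroids` and `nearest_centroid` are hypothetical and the code is an illustration, not the repository's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Image = std::vector<std::uint8_t>;
using Centroid = std::vector<double>;

// Computes one mean image per class. Classes without samples keep an all-zero centroid.
std::vector<Centroid> class_centroids(const std::vector<Image>& train,
                                      const std::vector<int>& labels, int num_classes) {
    assert(!train.empty() && train.size() == labels.size() && num_classes > 0);
    const std::size_t dim = train[0].size();
    std::vector<Centroid> centroids(num_classes, Centroid(dim, 0.0));
    std::vector<std::size_t> counts(num_classes, 0);
    for (std::size_t i = 0; i < train.size(); ++i) {
        assert(labels[i] >= 0 && labels[i] < num_classes && train[i].size() == dim);
        for (std::size_t d = 0; d < dim; ++d) centroids[labels[i]][d] += train[i][d];
        ++counts[labels[i]];
    }
    for (int c = 0; c < num_classes; ++c)
        if (counts[c] > 0)
            for (double& v : centroids[c]) v /= double(counts[c]);
    return centroids;
}

// Returns the index of the centroid closest to `query` (squared Euclidean distance).
int nearest_centroid(const std::vector<Centroid>& centroids, const Image& query) {
    assert(!centroids.empty());
    int best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        assert(centroids[c].size() == query.size());
        double dist = 0.0;
        for (std::size_t d = 0; d < query.size(); ++d) {
            const double diff = centroids[c][d] - query[d];
            dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = int(c); }
    }
    return best;
}
```

Compared with k-nearest neighbors, this approach needs only one distance computation per class at prediction time, at the cost of representing each class by a single prototype.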
# k-means clustering
$ make run_ncc_clustering
Arguments:
# The ncc_clustering executable requires 2 mandatory arguments and 1 optional flag
# Mandatory arguments:
# -d <dataset> : The directory of the dataset
# -c <int> : The number of clusters to use
#
# Optional arguments:
# -fit : If the fit flag is set, the program fits the clusters from scratch; otherwise it reuses
# the clusters from previous runs. The first time the program is run, the fit flag must be
# set. Every time the number of clusters is increased, the fit flag must be set again.
To change the arguments, edit the Makefile.
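As a rough picture of what fitting the clusters from scratch involves, the sketch below runs Lloyd's algorithm, the standard k-means iteration, on caller-supplied starting centroids. It is an assumption that the repository uses this variant. The names `nearest` and `kmeans_fit` are hypothetical, and neither centroid initialization nor the saving and reloading of clusters between runs is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Index of the centroid nearest to `p` (squared Euclidean distance).
std::size_t nearest(const std::vector<Point>& centroids, const Point& p) {
    assert(!centroids.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        double dist = 0.0;
        for (std::size_t d = 0; d < p.size(); ++d) {
            const double diff = centroids[c][d] - p[d];
            dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = c; }
    }
    return best;
}

// Runs Lloyd's algorithm starting from `centroids` until assignments stop changing
// or `max_iters` is reached. Updates `centroids` in place and returns the final
// cluster index of each point.
std::vector<std::size_t> kmeans_fit(const std::vector<Point>& points,
                                    std::vector<Point>& centroids, int max_iters) {
    assert(!points.empty() && !centroids.empty() && max_iters > 0);
    const std::size_t dim = points[0].size();
    // Start with an invalid cluster index so the first pass always counts as a change.
    std::vector<std::size_t> assignment(points.size(), centroids.size());
    for (int iter = 0; iter < max_iters; ++iter) {
        bool changed = false;
        for (std::size_t i = 0; i < points.size(); ++i) {  // assignment step
            const std::size_t c = nearest(centroids, points[i]);
            if (c != assignment[i]) { assignment[i] = c; changed = true; }
        }
        if (!changed) break;  // converged
        std::vector<Point> sums(centroids.size(), Point(dim, 0.0));  // update step
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < dim; ++d) sums[assignment[i]][d] += points[i][d];
            ++counts[assignment[i]];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] == 0) continue;  // leave an empty cluster's centroid unchanged
            for (std::size_t d = 0; d < dim; ++d)
                centroids[c][d] = sums[c][d] / double(counts[c]);
        }
    }
    return assignment;
}
```

Because the fitted centroids depend on the number of clusters passed with -c, centroids stored for one cluster count cannot simply be reused for a different count, which is consistent with the requirement to set the fit flag again in that case.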