The k-means NUMA Optimized Routine library or knor is a highly optimized and fast library for computing k-means in parallel with accelerations for Non-Uniform Memory Access (NUMA) architectures.
knor can perform at 10-100 times the speed of popular packages like Spark's MLlib, Dato (GraphLab) and H2O.
knor can operate:
- On a single machine with all data in-memory -- knori.
- In a cluster in distributed memory -- knord.
- On a single machine with some data in-memory and the rest on disk (Semi-External Memory) -- knors.
These modules are incarnations of algorithms in our publication which will appear in the annuls of HPDC 2017.
By default knor can use a scalable adaption of Elkan's algorithm that can drastically reduce the number of distance computations required for k-means without the memory bloat traditionally associated with Elkan's algorithm.
We expose the knori to users through Python
. We support Linux and Mac OSX.
For instructions on how to install the Python
package please see the
Python repository.
In most cases all that is necessary is:
pip install knor
We expose the knori to users through R
. We support Linux, Mac OSX, and Windows.
For instructions on how to install the R
package please see the
R repository.
In most cases all that is necessary is:
install.packages("knor")
knor relies on the following:
- p-threads for multithreading.
- numa for NUMA optimization.
- FlashGraph for a semi-external memory vertex-centric interface.
- Linux or Mac OS 10.*
- At least 4 (GB) of RAM
- Administrative privileges
The following is Tested on Ubuntu 14.04 - 16.04. We require a fairly modern compiler to take advantage of compile time optimizations:
We assume users have git
already installed.
git clone --recursive https://github.com/flashxio/knor.git
cd knor
./bootstrap.sh
Users can also choose to use a Docker container to run knor.
Mac Docker Users NOTE:
- You must use the
-O
flag for knori and knord to use OpenMP due to a lack of hardware support for our optimizations. - knors for Mac is unsupported due to the lack of support for low-level interfaces.
Users can get a version that is stable, but possibly not up to date as follows:
docker run -i -t flashxio/knor
If you already have the knor docker image then updating to the newest stable release can be done as follows:
docker pull flashxio/knor
Users may also obtain a bleeding-edge up to date stable docker image by
building the image themselves. We assume your
OS has curl
and
docker
already installed.
curl -O https://raw.githubusercontent.com/flashxio/knor/master/Dockerfile
docker build -t knor .
docker run -i -t knor
Assume the following:
k
is the number of clustersdim
is the dimensionality of the featuresnsamples
is the number of samples$KNOR_HOME
is the directory created when knor was cloned from Github.nproc
is the number of MPI processes to use for knord.nthread
is the number of threads per process. For knori this is the total number of threads of execution. NOTE: For knord the total number of threads of executions isnproc * nthread
.
To run modules from the $KNOR_HOME
directory, you may do the following:
We maintain a very small dataset for testing:
$KNOR_HOME/test-data/matrix_r50_c5_rrw.bin
.
This dataset has:
nsamples
=50
dim
=5
NOTE: knor will fail if you provide more units of parallelism than the
nsamples
in the dataset. So use small -t
and -T
values like 2
For a help message and to see valid flags:
exec/knori
An example of how to process with file datafile.bin
:
exec/knori datafile.bin nsamples dim k \
-t random \
-T 2 \
-i 10 \
-o outdir
For comparison (slower speed) run our algorithm using OpenMP as follows:
exec/knori datafile.bin nsamples dim k \
-t random \
-T nthread \
-i 10 \
-o outdirOMP -O
It is also possible to disable computataion pruning i.e., using Minimal
triangle inequality algorithm by using the -P
flag.
For a help message and to see valid flags:
exec/knord
An example of how to process with file datafile.bin
:
mpirun.mpich -n nproc exec/knord datafile nsamples dim k \
-t random -i 10 -T nthread -o outdir
See the mpirun list of flags like to allow your processes to distribute correctly across a cluster. Flags of note are:
--map-by ppr:n:node
, determining how many processes are mapped to a node.--map-by ppr:n:socket
, determining how processes per socket.
For semi-external memory use, users must configure the underlying user-space filesystem SAFS as described here
For a help message:
exec/knors
An example of how to process with file
matrix_r50_c5_rrw.adj
:
exec/knors libsem/FlashX/flash-graph/conf/run_graph.txt test-data/matrix_r50_c5_rrw.adj 50 5 8
The following flags can further aid to accelerate knors performance.
-r: size of the row cache in gb
Specifies another layer within the memory hierarchy to allow for caching at the granularity of rows of data rather than a page like traditional caches. In practice a small cache of 0.5GB-1GB can accelerate the performance of 10 Million+ samples.
-I: row cache update interval
The row cache lazily updates users must specify how often to refresh the cache in iterations of the algorithm. Note that as iterations proceed the cache will automatically update less frequently as computation stabilizes.
The output file obtained by using the -o
flag within knor produces a
YAML file that is easily readable by any YAML reader.
For instance, within Python we can use
PyYAML which is installable via
by pip install pyyaml
to read the file as follows:
from yaml import load
with open("kmeans_t.yml", "r") as f:
kms = load(f) # Returns a python dictionary
kms.keys()
> ['niter', 'dim', 'nsamples', 'cluster', 'k', 'centroids', 'size'
The fields are as follows:
niter
: The number of iterations performed.dim
: The number of attributes within the dataset.nsamples
: The number of samples within the dataset.cluster
: To which cluster, [0-k), each sample is assigned.k
: The number of clusters requested.centroids
: Thek x dim
centroids matrix with each row representing a cluster's centroid.size
: The number of samples placed within each cluster.
We use two different formats dependent upon if you operate fully in-memory or use semi-external memory.
In-memory and distributed memory routines (knori, knord) use row-major
binary matrices of type double
i.e., 16 Bytes per entry. For example the
matrix:
1 2 3 4
5 6 7 8
Must be written out in row-major form i.e., the bytes would be organized such that on disk they would look like:
1 2 3 4 5 6 7 8
NOTE: The row-major data on disk will be non-human readable since it is in binary format.
Semi-external memory (knors) data is stored in row-major format with
a leading 4KB FlashGraph header. You can convert binary or text data to knors format using the convert_matrix
utility.
We provide a single threaded data conversion script to convert data from one format to the other.
We consider a file named example.txt
in our knor root directory,
$KNOR_HOME
with the following content:
1 2 3 4
5 6 7 8
To convert this file from text readable (tsv) format one would do the following:
cd $KNOR_HOME
utils/convert_matrix # gives help on how to use the converter
> usage: ./convert_matrix in_filename in_format [text/knori/knord/knors]\
out_filename out_format[text/knori/knord/knors] nsamples dim
utils/convert_matrix example.txt \
text \
example.dat\
knori\
2 4
For knori and knord the format can
be produced by any binary writing utility in any language. **NOTE: ** knors
is a custom format that should be generated using the convert_matrix
utility.
Below are a couple examples of writing knori and knord format from some
higher-level languages.
The following is an example of how to take an R data frame and write out to raw binary row-major format:
# `mtcars` is a data frame loaded by default into R
mtcars # Take a look at the data
nbytes.per.elem = 8 # NOTE: Always true
mtvec <- as.vector(t(mtcars)) # Turn dataframe into matrix (`t`) then vector
writeBin(mtvec, "yourfilename.bin", size=nbytes.per.elem)
The following is an example of how to take a numpy ndarray and write it out to raw binary row-major format:
import numpy as np
np.random.seed(1) # For consistency
dat = np.random.rand(16, 8)
dat.shape # We have a 16 x 8 matrix
type(dat[0][0]) # Check type is double i.e. 8 bytes = 64 bits
# dat = dat.astype(np.float64) # if data type is wrong
dat.tofile("yourfilename.bin")
To update knor simply run the update script which pulls directly from Github:
cd $KNOR_HOME
./update.sh
- Different knor routines may not produce the same result dependent upon which initialization routine is used, but every routine will give the same answer over multiple runs.
- Found a bug? Please create an issue on Github.
Mhembere, D., Zheng, D., Vogelstein, J. T., Priebe, C. E., & Burns, R. (2017). knor: A NUMA-optimized In-memory, Distributed and Semi-external-memory k-means Library. arXiv preprint arXiv:1606.08905.