diff --git a/README.MD b/README.MD
index d409a49..e489c57 100644
--- a/README.MD
+++ b/README.MD
@@ -11,45 +11,27 @@ a framework that, beyond offering various tools that can be used to operate un
 s of web graphs (locality and similarity), as well as some other ideas tailored for the context, to compress them in
 an efficient format called BVGraph.
 This project aims to improve the records of the mentioned frameworks, which have been standing for almost
-two decades, by adding another layer of compression by means of
-[Asymmetrical Numeral Systems](https://en.wikipedia.org/wiki/Asymmetric_numeral_systems) (ANS) over the first layer of compression performed by WebGraph.
-
+two decades, by switching from instantaneous codes to [Asymmetrical Numeral Systems](https://en.wikipedia.org/wiki/Asymmetric_numeral_systems) (ANS) when compressing
+the graph in the BvGraph format.
 
 ### Compressor
-Since the BVGraph format is composed of 9 models (Outdegree, ReferenceOffset, BlockCount, Blocks, IntervalCount,
-IntervalStart, IntervalLen, FirstResidual, Residual), the model used by the compressor is going to be switched on the
-fly among the nine built for each specific component of the compression format. Moreover, to overcome the problem of
-dealing with enormous alphabets (we set as maximum symbol $2^{48} - 1$), the symbol folding technique introduced
-by Moffat and Petri in their [work](https://dl.acm.org/doi/10.1145/3397175) is
-implemented.
-
-In general, the coder uses a 32-bit state and 16-bit renormalization step with some other constraints discussed below.
-
-The compressor's interval for each model is defined as:
-```math
- I = [M * K, M * K * B)
-```
-where:
-1. $M$ is the sum (power of two) of all approximated symbols frequencies for a specific model.
-2. $K = 2^{16} - M$
-3. $B = 2^{16}$
-
-All this is done to guarantee that the shared interval between all models is $[2^{16}, 2^{32})$, a guarantee needed to make the compressor
-able to switch models on the fly.
-
-PS: This implementation assumes that the most frequent symbols are the smallest positive integers.
+WIP
 
 ### Binaries
 The binary that can be used to recompress a .graph is bvcomp:
 ```
 $ cargo build --release --bin bvcomp
-$ ./target/release/bvcomp
+$ ./target/release/bvcomp []
 ```
 For example:
 ```
 $ ./target/release/bvcomp tests/data/cnr-2000/cnr-2000 ans-cnr-2000
 ```
-recompresses with standard compression parameters the cnr-2000.graph file in the tests/data/cnr-2000/ directory and save
-the new compressed graph in the current directory with the name ans-cnr-2000.graph.
+This command recompresses the cnr-2000.graph file located in the tests/data/cnr-2000/ directory using the default compression parameters and saves the new compressed graph in the current directory as ans-cnr-2000.graph.
+
+Note: is optional. When not specified, default compression values are utilized.
+
+### Benches
+WIP
 
 PS: graphs can be found [here](http://law.di.unimi.it/datasets.php).
\ No newline at end of file