diff --git a/README.md b/README.md
index 18bd7bf..8a90237 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,9 @@
 # webgraph-ans
 
-Web graphs are fundamental structures that represent the very complex nature of a specific portion of the World Wide Web
-used in a variety of applications such as search engines. In practical terms, a web graph is a mathematical
-abstraction in which there is a node for each web page and an arc from node 𝑥 to node 𝑦 if the page associated with the first node contains a hyperlink to the page associated with
+In practical terms, a web graph is a mathematical abstraction in which there is a node for each web page
+and an arc from node 𝑥 to node 𝑦 if the page associated to the first node contains a hyperlink to the page associated to
 the second. It's easy to guess that the size of these enormous structures makes the traditional way of storing them
-obsolete.
+not efficient.
 
 One of the greatest frameworks built with the goal of compressing web graphs is [WebGraph](https://github.com/vigna/webgraph-rs), a
 framework that, beyond offering various tools that can be used to operate un such structures, exploits the properties
@@ -18,54 +17,73 @@ the graph in the BvGraph format.
 This crate supplies two different unit structs exposing methods to load a previously ANS-encoded BvGraph or recompress
 a BvGraph using the ANS-based approach.
 
-Two unit structures, `ANSBvGraph` and `ANSBvGraphSeq`, are available to load respectively a [`BvGraph`] or
-a [`BvGraphSeq`].
+Since this crate is based on [webgraph](https://github.com/vigna/webgraph-rs), please refer to its
+docs to learn more about the files and structs cited next.
 
-### ANSBvGraph
-Can be used to load a [`BvGraph`], that is a graph that can be visited both randomly and iteratively. To
-correctly load an ANS-encoded graph and retrieve a [`BvGraph`], the user needs to supply the following
-files: `BASENAME.ans`, `BASENAME.pointers` and `BASENAME.states`.
+### Loading a BVGraphSeq with ANSBvGraphSeq
+To load a [`BvGraphSeq`], you only need the `BASENAME.ans`file. Then, you can use `ANSBvGraphSeq`:
+
+```ignore
+    let graph = ANSBvGraphSeq::load("BASENAME")?;
+```
+
+### Loading a BVGraph with ANSBvGraph
+To load a [`BvGraph`] the user needs to supply the following files: `BASENAME.ans`, `BASENAME.pointers` and
+`BASENAME.states`.
 
 ```ignore
     let graph = ANSBvGraph::load("BASENAME")?;
 ```
 
-This struct can be even used to recompress a BvGraph using the ANS-based approach. You just need
-to use the method `ANSBvGraph::store()` to indicate where the BvGraph is located and where
-the output of the encoding must be located, together with customized compression parameters if
-needed.
+### Recompressing a BvGraph using bvcomp
+No matter which approach you use, recompressing a BvGraph will produce the `.ans`,
+`.pointers` and`.states` files in the output directory.
+
+The first approach is using the bvcomp binary:
 
-We can achieve the same goal by using the binary `bvcomp`:
+1. Compile the bvcomp binary:
 
 ```ignore
 $ cargo build --release --bin bvcomp
+```
+
+2. Run bvcomp to recompress the graph
+
+```ignore
 $ ./target/release/bvcomp []
 ```
+For example
 
-For example:
 ```ignore
 $ ./target/release/bvcomp tests/data/cnr-2000/cnr-2000 ans-cnr-2000
 ```
 
-This command recompresses the cnr-2000.graph file located in the tests/data/cnr-2000/ directory using the default
-compression parameters and stores in the output directory the following files: `ans-cnr-2000.pointers`,
-`ans-cnr-2000.states`and `ans-cnr-2000.ans`.
-
-**Note** is optional. When not specified, default compression values indicated [`here`] are utilized.
+This command recompresses the cnr-2000.graph file located in tests/data/cnr-2000/ and saves the output in
+current directory with the name ans-cnr-2000.
+Note: [compression_params] is optional. If omitted, [`default`] values are used.
-
-### ANSBvGraphSeq
-Can be used to load a [`BvGraphSeq`], that is a graph that can be visited iteratively. To
-correctly load an ANS-encoded graph and retrieve a [`BvGraphSeq`], the user needs to supply the following
-files: `BASENAME.ans`.
+### Recompressing a BvGraph using ANSBvGraph::store()
+ANSBvGraph can be even used to recompress a BvGraph using the ANS-based approach. You just need
+to use the method `ANSBvGraph::store()` to indicate where the BvGraph is located and where
+the output of the encoding must be located, together with customized compression parameters if
+needed.
 
 ```ignore
-    let graph = ANSBvGraphSeq::load("BASENAME")?;
+    ANSBvGraph::store(
+        basename,
+        new_basename,
+        compression_window,
+        max_ref_count as usize,
+        min_interval_length,
+    )?;
 ```
+
+### Results
+
 
 PS: BV graphs can be found [here](http://law.di.unimi.it/datasets.php).
 
 [`BvGraph`]:
 [`BvGraphSeq`]:
-[`here`]:
\ No newline at end of file
+[`default`]:
\ No newline at end of file
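
The README text in this diff states which files each loader expects: `BASENAME.ans` for `ANSBvGraphSeq`, and `BASENAME.ans`, `BASENAME.pointers` and `BASENAME.states` for `ANSBvGraph`. A minimal sketch of a pre-flight check for those files follows. It uses only the Rust standard library; the helper `missing_files` and the two constants are hypothetical names introduced for illustration and are not part of webgraph-ans.

```rust
use std::path::PathBuf;

/// File extensions listed in the README for each loader.
/// The constant names are illustrative, not part of the crate.
const SEQ_EXTENSIONS: [&str; 1] = ["ans"];
const RANDOM_ACCESS_EXTENSIONS: [&str; 3] = ["ans", "pointers", "states"];

/// Hypothetical helper: returns every `BASENAME.<ext>` path that does not
/// exist on disk. The extension is appended with `format!` rather than
/// `Path::with_extension`, so a basename that already contains a dot is
/// not truncated.
fn missing_files(basename: &str, extensions: &[&str]) -> Vec<PathBuf> {
    extensions
        .iter()
        .map(|ext| PathBuf::from(format!("{basename}.{ext}")))
        .filter(|path| !path.exists())
        .collect()
}

fn main() {
    // Basename taken from the bvcomp example in the README.
    let basename = "ans-cnr-2000";

    for (loader, extensions) in [
        ("ANSBvGraphSeq", &SEQ_EXTENSIONS[..]),
        ("ANSBvGraph", &RANDOM_ACCESS_EXTENSIONS[..]),
    ] {
        let missing = missing_files(basename, extensions);
        if missing.is_empty() {
            println!("{loader}: all required files are present");
        } else {
            for path in &missing {
                println!("{loader}: missing {}", path.display());
            }
        }
    }
}
```

Running such a check before calling a loader would turn a missing `.pointers` or `.states` file into an explicit message naming the file, whatever error the crate itself reports in that case.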