readme updated
ciminilorenzo committed Nov 13, 2024
1 parent 6575e40 commit 8855fc9
Showing 1 changed file with 46 additions and 28 deletions.
74 changes: 46 additions & 28 deletions README.md
@@ -1,10 +1,9 @@
# webgraph-ans

Web graphs are fundamental structures that represent the very complex nature of a specific portion of the World Wide Web
used in a variety of applications such as search engines. In practical terms, a web graph is a mathematical
abstraction in which there is a node for each web page and an arc from node 𝑥 to node 𝑦 if the page associated with the first node contains a hyperlink to the page associated with
In practical terms, a web graph is a mathematical abstraction in which there is a node for each web page
and an arc from node 𝑥 to node 𝑦 if the page associated with the first node contains a hyperlink to the page associated with
the second. It's easy to guess that the size of these enormous structures makes the traditional way of storing them
obsolete.
not efficient.
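
To make the abstraction concrete, here is a minimal sketch of an uncompressed web graph stored as adjacency lists. `ToyWebGraph` and its methods are hypothetical names introduced only for illustration; they are not part of this crate or of webgraph. The last method estimates the cost of storing every arc as a plain pair of 64-bit identifiers, which is the kind of naive representation the paragraph above calls inefficient.

```rust
/// A toy web graph: nodes are page indices, and `successors[x]` lists every
/// `y` such that page `x` contains a hyperlink to page `y`.
/// (Hypothetical type, for illustration only.)
struct ToyWebGraph {
    successors: Vec<Vec<usize>>,
}

impl ToyWebGraph {
    /// Creates a graph with `num_nodes` pages and no hyperlinks.
    fn new(num_nodes: usize) -> Self {
        Self {
            successors: vec![Vec::new(); num_nodes],
        }
    }

    /// Records a hyperlink from page `x` to page `y`.
    fn add_arc(&mut self, x: usize, y: usize) {
        self.successors[x].push(y);
    }

    /// Total number of arcs (hyperlinks) in the graph.
    fn num_arcs(&self) -> usize {
        self.successors.iter().map(Vec::len).sum()
    }

    /// Bytes needed to store each arc as an uncompressed (source, target)
    /// pair of 64-bit identifiers.
    fn naive_size_bytes(&self) -> usize {
        self.num_arcs() * 2 * std::mem::size_of::<u64>()
    }
}
```

At 16 bytes per arc, a graph with billions of hyperlinks needs tens of gigabytes in this form, which is why compressed formats are used instead.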

One of the greatest frameworks built with the goal of compressing web graphs is [WebGraph](https://github.com/vigna/webgraph-rs),
a framework that, beyond offering various tools that can be used to operate on such structures, exploits the properties
@@ -18,54 +17,73 @@ the graph in the BvGraph format.
This crate supplies two different unit structs exposing methods to load a previously ANS-encoded BvGraph or recompress
a BvGraph using the ANS-based approach.

Two unit structures, `ANSBvGraph` and `ANSBvGraphSeq`, are available to load respectively a [`BvGraph`] or
a [`BvGraphSeq`].
Since this crate is based on [webgraph](https://github.com/vigna/webgraph-rs), please refer to its
docs to learn more about the files and structs cited next.

### ANSBvGraph
Can be used to load a [`BvGraph`], that is a graph that can be visited both randomly and iteratively. To
correctly load an ANS-encoded graph and retrieve a [`BvGraph`], the user needs to supply the following
files: `BASENAME.ans`, `BASENAME.pointers` and `BASENAME.states`.
### Loading a BvGraphSeq with ANSBvGraphSeq
To load a [`BvGraphSeq`], you only need the `BASENAME.ans` file. Then, you can use `ANSBvGraphSeq`:

```ignore
let graph = ANSBvGraphSeq::load("BASENAME")?;
```

### Loading a BvGraph with ANSBvGraph
To load a [`BvGraph`], you need the following files: `BASENAME.ans`, `BASENAME.pointers` and
`BASENAME.states`.

```ignore
let graph = ANSBvGraph::load("BASENAME")?;
```
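
Before calling either loader, it can help to check that the files listed in this README are actually present. The following is a minimal sketch using only the standard library; `Loader`, `required_extensions`, `with_extension_appended` and `missing_files` are hypothetical helpers written for this example and are not provided by the crate.

```rust
use std::path::{Path, PathBuf};

/// Which kind of graph is about to be loaded (hypothetical helper type).
#[derive(Clone, Copy)]
enum Loader {
    /// `ANSBvGraphSeq`: sequential access only.
    Sequential,
    /// `ANSBvGraph`: random and sequential access.
    RandomAccess,
}

/// File extensions this README lists as required for each loader.
fn required_extensions(loader: Loader) -> &'static [&'static str] {
    match loader {
        Loader::Sequential => &["ans"],
        Loader::RandomAccess => &["ans", "pointers", "states"],
    }
}

/// Builds `BASENAME.<ext>` by appending to the basename, so that dots
/// already present in the basename are preserved.
fn with_extension_appended(basename: &Path, ext: &str) -> PathBuf {
    let mut name = basename.as_os_str().to_os_string();
    name.push(".");
    name.push(ext);
    PathBuf::from(name)
}

/// Returns the required files that do not exist on disk.
fn missing_files(basename: &Path, loader: Loader) -> Vec<PathBuf> {
    required_extensions(loader)
        .iter()
        .map(|ext| with_extension_appended(basename, ext))
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result from `missing_files` means every file named above exists; it says nothing about whether the files are valid ANS-encoded data.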

This struct can also be used to recompress a BvGraph using the ANS-based approach: call
`ANSBvGraph::store()` with the location of the BvGraph, the location where the encoded output
must be written and, if needed, custom compression parameters.
### Recompressing a BvGraph using bvcomp
No matter which approach you use, recompressing a BvGraph will produce the `<graph_name>.ans`,
`<graph_name>.pointers` and `<graph_name>.states` files in the output directory.

The first approach is using the bvcomp binary:

We can achieve the same goal by using the binary `bvcomp`:
1. Compile the bvcomp binary:

```ignore
$ cargo build --release --bin bvcomp
```

2. Run bvcomp to recompress the graph:

```ignore
$ ./target/release/bvcomp <path_to_graph> <output_dir> <new_graph_name> [<compression_params>]
```
For example

For example:
```ignore
$ ./target/release/bvcomp tests/data/cnr-2000/cnr-2000 ans-cnr-2000
```

This command recompresses the `cnr-2000.graph` file located in the `tests/data/cnr-2000/` directory using the default
compression parameters and stores the following files in the output directory: `ans-cnr-2000.pointers`,
`ans-cnr-2000.states` and `ans-cnr-2000.ans`.

**Note:** `<compression_params>` is optional. When not specified, the default compression values indicated [`here`] are used.
This command recompresses the `cnr-2000.graph` file located in `tests/data/cnr-2000/` and saves the output in
the current directory with the name `ans-cnr-2000`.

Note: `<compression_params>` is optional. If omitted, [`default`] values are used.
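
If the recompression has to be scripted from Rust (for instance to process several graphs in a batch), the invocation can be assembled with `std::process::Command`. This is only a sketch: `bvcomp_command` is a hypothetical helper, and the argument order is an assumption taken from the usage line shown above (`<path_to_graph> <output_dir> <new_graph_name> [<compression_params>]`), so check it against the binary's own help output before relying on it.

```rust
use std::process::Command;

/// Assembles (but does not run) a `bvcomp` invocation.
/// Assumption: positional arguments follow the usage line in this README;
/// `extra` carries any optional compression parameters verbatim.
fn bvcomp_command(
    path_to_graph: &str,
    output_dir: &str,
    new_graph_name: &str,
    extra: &[&str],
) -> Command {
    // Path produced by `cargo build --release --bin bvcomp`.
    let mut cmd = Command::new("./target/release/bvcomp");
    cmd.arg(path_to_graph)
        .arg(output_dir)
        .arg(new_graph_name)
        .args(extra);
    cmd
}
```

Calling `.status()` on the returned `Command` runs the binary and reports its exit status; keeping construction separate from execution makes the argument list easy to inspect or log first.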


### ANSBvGraphSeq
Can be used to load a [`BvGraphSeq`], that is a graph that can be visited iteratively. To
correctly load an ANS-encoded graph and retrieve a [`BvGraphSeq`], the user needs to supply the following
files: `BASENAME.ans`.
### Recompressing a BvGraph using ANSBvGraph::store()
`ANSBvGraph` can also be used to recompress a BvGraph using the ANS-based approach: call
`ANSBvGraph::store()` with the location of the BvGraph, the location where the encoded output
must be written and, if needed, custom compression parameters.

```ignore
let graph = ANSBvGraphSeq::load("BASENAME")?;
ANSBvGraph::store(
    basename,
    new_basename,
    compression_window,
    max_ref_count as usize,
    min_interval_length,
)?;
```


### Results

PS: BV graphs can be found [here](http://law.di.unimi.it/datasets.php).

[`BvGraph`]: <https://docs.rs/webgraph/0.1.4/webgraph/graphs/bvgraph/random_access/struct.BvGraph.html>
[`BvGraphSeq`]: <https://docs.rs/webgraph/0.1.4/webgraph/graphs/bvgraph/sequential/struct.BvGraphSeq.html>
[`here`]: <https://docs.rs/webgraph/0.1.4/src/webgraph/cli/mod.rs.html#172-206>
[`default`]: <https://docs.rs/webgraph/0.1.4/src/webgraph/cli/mod.rs.html#172-206>
