Skip to content

Latest commit

 

History

History
83 lines (62 loc) · 5.38 KB

README.md

File metadata and controls

83 lines (62 loc) · 5.38 KB

erdospy

Fast sampling of the G(n, m)-model of Erdős–Rényi random graphs.

The time and space complexity of the algorithm is in O(m*samples). The result can be returned either as an integer np.ndarray of size (2, m, samples) or a listof integer sp.sparse.coo_matrix.

Installation

erdospy can be installed directly from source with:

pip install git+https://github.com/NiMlr/erdospy

Test your installation with:

import erdospy
erdospy.testall()

Generate five samples of the G(n, m)-model for n, m = 4, 3 with:

n = 4
m = 3
samples = 5
erdospy.sample_erdos_renyi_gnm(n, m, samples, random_state=42)
# array([[[1, 3, 3, 1, 2],
#         [2, 1, 2, 3, 3],
#         [3, 2, 1, 3, 3]],

#        [[0, 0, 1, 0, 1],
#         [0, 0, 0, 2, 2],
#         [2, 0, 0, 0, 1]]])

Empirical performance

Thanks to the powerful implementation of sklearn.utils.random.sample_without_replacement, it turns out that the method in many cases is not only faster than that of other packages

from erdospy import sample_erdos_renyi_gnm
import networkx as nx

n = 50
m = 600
samples = 2000

%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="edge_array")
# 116 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 359 ms ± 15.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 4.66 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

but it opens up the use-case n >> m

n = 1_000_000
m = 600
samples = 1

%timeit sample_erdos_renyi_gnm(n, m, samples, return_as="adjacency_matrix")
# 3.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit [nx.gnm_random_graph(n, m) for _ in range(samples)]
# 931 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# running nx.gnm_random_graph for even larger n will soon break down

Implementation details

The simple algorithm is based on the fast sampling of m edge indices in 0, 1, ..., n*(n-1)//2 using sklearn.utils.random.sample_without_replacement and the subsequent sample-wise constant time mapping onto the indices of the adjacency matrix A_{n, ij} -> (i, j), where the edge indices are associated with the corresponding matrix entry, due to undirectedness the cases j < i are sufficient, and


The mapping can be done without explicit representation of A_n by solving a vectorized quadratic equation. Per sample of the random graphs, this results in m edge tuples of form (i, j) and represented as a np.ndarray.