Use BTreeMap with u128 values for sparse bit sets/"vectors" (in dataflow etc.). #47575
Is the idea that the key would be …
@nikomatsakis That's pretty much it, yes. Here's what I've been playing with so far:

// FIXME(eddyb) move to rustc_data_structures.
use std::collections::BTreeMap;
use std::marker::PhantomData;
// `Idx` is the compiler's index-newtype trait (at the time, in
// `rustc_data_structures::indexed_vec`).
use rustc_data_structures::indexed_vec::Idx;
#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    map: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}
/// Splits a bit index into the `BTreeMap` key (`index / 128`) and the mask
/// selecting that bit within the 128-bit value stored under the key.
fn key_and_mask<I: Idx>(index: I) -> (u32, u128) {
    let index = index.index();
    let key = index / 128;
    let key_u32 = key as u32;
    assert_eq!(key_u32 as usize, key);
    (key_u32, 1 << (index % 128))
}
impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            map: BTreeMap::new(),
            _marker: PhantomData
        }
    }
    pub fn capacity(&self) -> usize {
        self.map.len() * 128
    }
    pub fn contains(&self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        self.map.get(&key).map_or(false, |bits| (bits & mask) != 0)
    }
    pub fn insert(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        let bits = self.map.entry(key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | mask;
        *bits = new_bits;
        new_bits != old_bits
    }
    pub fn remove(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        if let Some(bits) = self.map.get_mut(&key) {
            let old_bits = *bits;
            let new_bits = old_bits & !mask;
            *bits = new_bits;
            // FIXME(eddyb) maybe remove entry if now `0`.
            new_bits != old_bits
        } else {
            false
        }
    }
    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.map.iter().flat_map(|(&key, &bits)| {
            let base = key as usize * 128;
            (0..128).filter_map(move |i| {
                if (bits & (1 << i)) != 0 {
                    Some(I::new(base + i))
                } else {
                    None
                }
            })
        })
    }
}
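For illustration, a hedged usage sketch of the API above (not from the original comment). It is written generically over I: Idx, e.g. a MIR Local, to avoid assuming the trait's exact bounds:

// Hedged usage sketch, not part of the code above: exercises the API for any
// index type `I: Idx`. Call with two distinct indices, ideally more than 128
// apart so they land in different chunks.
fn exercise<I: Idx>(a: I, b: I) {
    let mut set = SparseBitSet::new();
    assert!(set.insert(a));            // newly set, so `insert` returns true
    assert!(!set.insert(a));           // already present, so false
    assert!(set.contains(a));
    assert!(set.insert(b));            // a second BTreeMap entry if far enough from `a`
    assert_eq!(set.iter().count(), 2); // iteration walks chunks in key order
    assert!(set.remove(a));            // was present, so true
    assert!(!set.contains(a));
}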
Makes sense. I wonder if it would be useful in NLL too.
One thing that might be worth considering is using binmaps.
We should also benchmark "sparse bit matrices" with … EDIT: This makes iteration within a "row" harder, especially with the …
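For readers following along, a hedged sketch of the "sparse bit matrix" idea; the exact key type in the comment above was lost in extraction, so the (row, chunk) key shape below is an assumption:

// Hedged sketch, not the code referenced above: key the BTreeMap by
// (row, chunk-within-row), so rows with no set bits cost nothing.
use std::collections::BTreeMap;

pub struct SparseBitMatrix<R: Copy + Ord> {
    chunk_bits: BTreeMap<(R, u32), u128>,
}

impl<R: Copy + Ord> SparseBitMatrix<R> {
    pub fn new() -> Self {
        SparseBitMatrix { chunk_bits: BTreeMap::new() }
    }

    fn key_and_mask(row: R, column: usize) -> ((R, u32), u128) {
        ((row, (column / 128) as u32), 1u128 << (column % 128))
    }

    pub fn insert(&mut self, row: R, column: usize) -> bool {
        let (key, mask) = Self::key_and_mask(row, column);
        let bits = self.chunk_bits.entry(key).or_insert(0);
        let old = *bits;
        *bits |= mask;
        *bits != old
    }

    pub fn contains(&self, row: R, column: usize) -> bool {
        let (key, mask) = Self::key_and_mask(row, column);
        self.chunk_bits.get(&key).map_or(false, |bits| (bits & mask) != 0)
    }

    // Iterating a single row is what the EDIT above worries about: it needs a
    // range query over every key sharing the row component.
    pub fn row_chunks(&self, row: R) -> impl Iterator<Item = (u32, u128)> + '_ {
        self.chunk_bits
            .range((row, 0)..=(row, u32::MAX))
            .map(|(&(_, chunk), &bits)| (chunk, bits))
    }
}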
I've added a "chunked" API to my SparseBitSet:

// FIXME(eddyb) move to rustc_data_structures.
use std::collections::BTreeMap;
use std::collections::btree_map::Entry;
use std::marker::PhantomData;
use rustc_data_structures::indexed_vec::Idx;
#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    chunk_bits: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}

#[derive(Copy, Clone)]
pub struct SparseChunk<I> {
    key: u32,
    bits: u128,
    _marker: PhantomData<I>,
}
impl<I: Idx> SparseChunk<I> {
    /// A chunk containing just the bit for `index`.
    pub fn one(index: I) -> Self {
        let index = index.index();
        let key_usize = index / 128;
        let key = key_usize as u32;
        assert_eq!(key as usize, key_usize);
        SparseChunk {
            key,
            bits: 1 << (index % 128),
            _marker: PhantomData
        }
    }
    pub fn any(&self) -> bool {
        self.bits != 0
    }
    /// Iterates the indices of the set bits, stopping early once the
    /// remaining (not yet shifted-out) bits are all zero.
    pub fn iter(&self) -> impl Iterator<Item = I> {
        let base = self.key as usize * 128;
        let mut bits = self.bits;
        (0..128).map(move |i| {
            let current_bits = bits;
            bits >>= 1;
            (i, current_bits)
        }).take_while(|&(_, bits)| bits != 0)
          .filter_map(move |(i, bits)| {
              if (bits & 1) != 0 {
                  Some(I::new(base + i))
              } else {
                  None
              }
          })
    }
}
impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            chunk_bits: BTreeMap::new(),
            _marker: PhantomData
        }
    }
    pub fn capacity(&self) -> usize {
        self.chunk_bits.len() * 128
    }
    /// Returns the sub-chunk of `chunk` whose bits are present in the set.
    pub fn contains_chunk(&self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        SparseChunk {
            bits: self.chunk_bits.get(&chunk.key).map_or(0, |bits| bits & chunk.bits),
            ..chunk
        }
    }
    /// Sets the bits of `chunk`, returning the sub-chunk of bits that were
    /// newly set (i.e. not already present).
    pub fn insert_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let bits = self.chunk_bits.entry(chunk.key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | chunk.bits;
        *bits = new_bits;
        let changed = new_bits ^ old_bits;
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }
    /// Clears the bits of `chunk`, returning the sub-chunk of bits that were
    /// actually removed; the map entry is dropped if it becomes empty.
    pub fn remove_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let changed = match self.chunk_bits.entry(chunk.key) {
            Entry::Occupied(mut bits) => {
                let old_bits = *bits.get();
                let new_bits = old_bits & !chunk.bits;
                if new_bits == 0 {
                    bits.remove();
                } else {
                    bits.insert(new_bits);
                }
                new_bits ^ old_bits
            }
            Entry::Vacant(_) => 0
        };
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }
    pub fn clear(&mut self) {
        self.chunk_bits.clear();
    }
    pub fn chunks<'a>(&'a self) -> impl Iterator<Item = SparseChunk<I>> + 'a {
        self.chunk_bits.iter().map(|(&key, &bits)| {
            SparseChunk {
                key,
                bits,
                _marker: PhantomData
            }
        })
    }
    pub fn contains(&self, index: I) -> bool {
        self.contains_chunk(SparseChunk::one(index)).any()
    }
    pub fn insert(&mut self, index: I) -> bool {
        self.insert_chunk(SparseChunk::one(index)).any()
    }
    pub fn remove(&mut self, index: I) -> bool {
        self.remove_chunk(SparseChunk::one(index)).any()
    }
    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.chunks().flat_map(|chunk| chunk.iter())
    }
}

cc @spastorino
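To make the intent of the chunked API concrete, here is a hedged sketch of a bulk union as a dataflow join might perform it; the union_into helper is illustrative and not part of the code above:

// Hedged sketch: union `from` into `to`, returning true if `to` changed.
// Working chunk-by-chunk touches each BTreeMap entry once per 128 bits
// instead of once per bit, and the returned "changed" chunks are exactly
// what a fixpoint loop needs to decide whether to keep iterating.
fn union_into<I: Idx>(to: &mut SparseBitSet<I>, from: &SparseBitSet<I>) -> bool {
    let mut changed = false;
    for chunk in from.chunks() {
        // `insert_chunk` returns the sub-chunk of bits that were newly set.
        changed |= to.insert_chunk(chunk).any();
    }
    changed
}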
Done here: #48245
Should this be closed now?
Maybe? There are still some places where we could use sparse sets but are not doing so today, I suppose.
@rustbot release-assignment
I'm going to go ahead and close. There have been a bunch of iterations on various sparse vs. dense tradeoffs throughout the compiler (particularly in the MIR-related code), so I don't think a generic tracking issue remains useful.
According to our current implementation of B-trees (rust/src/liballoc/btree/node.rs, lines 53 to 55 at 5965b79), it would appear that up to 11 key-value pairs can be stored in each node. For u128 values representing 128 set elements each, 1408 set elements can be stored in a single allocation, with an overhead of around 50% compared to Vec<u128> in the dense case.

Such sparse bitsets would be really useful for (e.g. dataflow) analysis algorithms, in situations where the bitset elements tend to be localized, with multi-"word" gaps in between local groups.
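A rough back-of-envelope for the capacity and overhead figures above (the per-node metadata size is an assumption, not read off node.rs):

    11 key-value pairs per node x 128 bits per u128 value = 1408 set elements per node
    values: 11 x 16 bytes = 176 bytes of bit payload per node
    keys:   11 x 4 bytes  =  44 bytes
    plus a parent pointer and a few length/index fields, on the order of 16 bytes

So the keys and node metadata add roughly a third on top of what a dense Vec<u128> would need for the same bits when nodes are full, and more when nodes are only partially full, which is roughly where the "around 50%" estimate lands.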
cc @nikomatsakis @pnkfelix