Never Flush Entities #18577
Replies: 3 comments 16 replies
-
It may not be useful now, but I think querying for entities with no components won't be so strange once Bevy's editor workflow matures and defining queries at runtime becomes a common thing. (I can imagine users wanting to know if entities without components exist when they're not expected to.)
Hmm, I don't remember why "Too many intermediate archetype/table moves when applying commands" is a known problem. I think that problem is better solved by command "batching" (#10154) where we'd "preview" all commands queued on the same entity to first figure out its destination archetype and then move/place it only once.
All of the downsides you've listed sound quite alarming to me. More coupling between the entity metadata and archetype structures, special handling requirements for the empty archetype, and the inherent complexity of the proposed allocation process seem like a combo that'll have future contributors (and even maintainers) very anxious about messing with this code. So even if this is faster, I'm reluctant to support it because of its inherent complexity. Bevy hasn't even presented …
-
If I understand this correctly, freeing an entity will write it back into the shared free list. What prevents an additional "free" from happening between the decrement and the read during the recycle? That would overwrite the entity in that slot. I believe this is why the concurrent queue implementations use a "stamp" value, although I don't see a straightforward way to make that work with a stack instead of a queue. More generally, it looks like you're describing a custom concurrent queue implementation, but I'm still not clear how it differs from other queues or how you expect to do better than them.
This is clever! It reminds me a bit of the data structure used in …
-
@maniwani Your criticism of this design is really helpful. I've been thinking more about it, and I think I now agree with more of what you've been saying. Here are some improvements that should make this less complex and easier to justify:
The only downside is that we would be releasing entities to public APIs that still have an invalid archetype. But that doesn't worry me very much. We already do this now (it's just that `flush` acts as a catch-all that runs soon after). In practice, as long as we queue a move from the invalid archetype to the empty archetype with each command that just spawns an empty entity, I think this should be a non-issue. Do you have any thoughts here? This fixes a lot of the valid issues you've brought up, but it could have problems of its own.
-
## Background
This is heavily motivated by remote entity reservation, but much of it can be applied without that context.
## How `Entities` works now

Right now, there are 2 parts of `Entities`. First, we keep a `Vec<EntityMeta>` that is the source of truth for where an entity is, who freed it, etc. Second, we keep a `pending` list that tracks which entities no longer exist and can be reused.

Allocating and freeing entities are simple. To allocate, try to pull from the `pending` list, and if it's empty, extend the `meta` list, and return a brand new entity. Once allocated, `set` the `EntityLocation` of the entity before releasing it through any public API. To free, make sure the entity exists, increase its generation, and add it to the `pending` list.

We also need to reserve entities, currently via `&Entities` but soon via an `Arc`-like structure that can be shared between threads and operate fully concurrently with `&mut Entities`. Right now (with `&Entities`), we keep a `free_cursor: AtomicIsize` that tracks the target length of `pending`. When we reserve, we decrement the `free_cursor` as if we were popping from the list (just like an allocation). If `free_cursor` goes negative, we just need to extend the `meta` list by its absolute value. In other words, if we do `n = free_cursor.fetch_sub(1) - 1` (so `n` is the cursor's new value): if `n >= 0`, we reserved the `pending[n]` entity; otherwise we reserved the brand new entity at index `meta.len() - n - 1`.
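
For concreteness, here is a minimal sketch of that reservation path under the scheme just described. The types are simplified stand-ins rather than Bevy's real definitions (the actual `EntityMeta` also stores an `EntityLocation`, and the real cursor code differs in details), so read it as an illustration of the cursor arithmetic, not the implementation:

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

// Simplified stand-ins for the real types; fields are trimmed to what the sketch needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Entity {
    index: u32,
    generation: u32,
}

#[derive(Clone, Copy)]
struct EntityMeta {
    generation: u32,
    // The real EntityMeta also stores the EntityLocation; omitted here.
}

struct Entities {
    meta: Vec<EntityMeta>,    // source of truth for every entity
    pending: Vec<u32>,        // freed indices that can be reused
    free_cursor: AtomicIsize, // "target length" of `pending`
}

impl Entities {
    /// Reserve an entity through `&self`: decrement the cursor as if popping
    /// from `pending`; a result below zero means a brand new index past the
    /// end of `meta` that `flush` will later materialize.
    fn reserve_entity(&self) -> Entity {
        // `fetch_sub` returns the value before the decrement, so subtract 1
        // to get the cursor's new value `n` used in the prose above.
        let n = self.free_cursor.fetch_sub(1, Ordering::Relaxed) - 1;
        if n >= 0 {
            // Reuse a pending index; its generation was bumped when it was freed.
            let index = self.pending[n as usize];
            Entity { index, generation: self.meta[index as usize].generation }
        } else {
            // Brand new entity, not present in `meta` until `flush` runs.
            Entity { index: (self.meta.len() as isize - n - 1) as u32, generation: 0 }
        }
    }
}
```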
Now the reservation has given us an `Entity`. But right now, it's invalid; it has no `EntityLocation`. That's where `flush` comes in. `flush` `set`s the `EntityLocation` of newly reserved entities to point to the empty archetype.

## What's wrong with this
Even aside from remote entity reservation, a lot could be improved. First, let's clarify exactly what `flush` does. It removes reserved entities from the `pending` list, creates new, reserved entities on the `meta` list, and performs an `init` call on all of them. Technically, `init` is kept generic, but in practice, it always `set`s the entity location to the empty archetype. So what does that look like? We push the entity onto the `Archetype::entities: Vec<ArchetypeEntity>`. `ArchetypeEntity` is just an `Entity` and its `TableRow`.
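
As a rough illustration of that description, here is a stripped-down `flush` over hypothetical simplified types (the real `flush` takes a generic `init` closure with different signatures; only the behavior described above is mirrored):

```rust
use std::sync::atomic::AtomicIsize;

#[derive(Clone, Copy)]
struct EntityLocation {
    archetype_id: u32,  // always the empty archetype in practice
    archetype_row: u32,
}

#[derive(Clone, Copy)]
struct EntityMeta {
    generation: u32,
    location: Option<EntityLocation>, // None = reserved but not yet flushed
}

struct Entities {
    meta: Vec<EntityMeta>,
    pending: Vec<u32>,
    free_cursor: AtomicIsize, // target length of `pending`; negative = brand new entities reserved
}

impl Entities {
    /// Simplified `flush`: give every reserved entity (recycled or brand new)
    /// an `init` call, which in practice points it at the empty archetype.
    fn flush(&mut self, mut init: impl FnMut(u32, &mut EntityMeta)) {
        let cursor = *self.free_cursor.get_mut();
        let new_pending_len = if cursor >= 0 {
            cursor as usize
        } else {
            // The cursor went negative: that many brand-new entities were
            // reserved, so create their `EntityMeta` entries now.
            for _ in 0..(-cursor) as usize {
                let index = self.meta.len() as u32;
                self.meta.push(EntityMeta { generation: 0, location: None });
                init(index, self.meta.last_mut().unwrap());
            }
            *self.free_cursor.get_mut() = 0;
            0
        };
        // Reserved entities recycled from `pending` are drained and initialized.
        for index in self.pending.drain(new_pending_len..) {
            init(index, &mut self.meta[index as usize]);
        }
    }
}
```

In the real code, `init` is the closure that ends up pushing each entity into the empty archetype's `entities` list and recording the resulting row, which is exactly the work the rest of this section argues is wasted.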
So let me get this straight. Let's say we use commands to spawn 100 entities with `MyComponent`. And let's say there are some `pending` entities to reuse (which is expected after the app has been running for a bit). For each of those 100 entities: 1) We remove them from the `pending` list. 2) We re-arrange them into an `ArchetypeEntity` by adding `TableRow::INVALID` (the empty archetype has no table). 3) We `set` their `EntityLocation`. Then, now that `flush` is over, we 4) swap remove them from the empty archetype, 5) insert them into the `MyComponent` archetype, and 6) `set` their new `EntityLocation`.

What the heck? Steps 1-3 are completely irrelevant after steps 4-6. It's wasted work! And it's not just "oh we need one more mem-copy call" since we have to change `Entity` to `ArchetypeEntity` at both steps. Plus, we're swap removing front to back, so that takes even longer too! Again, what the heck?
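
For reference, the scenario above is just an ordinary command-based spawn loop like the following (a minimal Bevy snippet; `MyComponent` and `spawn_many` are placeholder names):

```rust
use bevy::prelude::*;

#[derive(Component)]
struct MyComponent;

/// Spawning 100 entities through commands: each one is reserved immediately,
/// flushed into the empty archetype later, and only then moved into the
/// `MyComponent` archetype when the queued insert applies.
fn spawn_many(mut commands: Commands) {
    for _ in 0..100 {
        commands.spawn(MyComponent);
    }
}
```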
## Solution (WIP that needs discussion)

I'll cut to the chase: let's remove the empty archetype and consider "invalid" (not yet flushed) entities as valid but empty entities. In other words: `ArchetypeId::INVALID` becomes `ArchetypeId::EMPTY`. As soon as an entity is reserved or allocated, it is fully valid and usable; it's just empty.

That means we never need to `flush`. We save 50% of the work for most commands. We ditch expensive `flush` calls that really just existed to let us `alloc` or `free` an entity. Etc. This could be a really big win for performance!

There are some downsides too:
- `Entities` needs to keep archetype `Edges` and have some extra moving parts.
- `Query<Entity>` can't just loop through every `Archetype` anymore. But in practice, who does `Query<Entity>` completely unqualified? The minute you add any component into the query at all, this goes away.

## Details
How does this work with remote entity reservation, and what does the new `Entities` look like in practice?

First, let's get some obvious things out of the way. 1) The `meta` list needs to continue to be a normal `Vec` directly on `Entities` for performance. 2) `reserve_entity` and `reserve_entities` are now exactly the same as `alloc` mechanically, so we can get rid of them. 3) As a result, `alloc` and a new `alloc_many` need to be created with only `&Entities`. 4) Because of remote reservation, ideally `alloc` at least only needs some shared, `Arc`'d state, not full `&Entities`. This prevents remote reservation from being blocking or async waiting. This is bonus points, but it ends up not costing anything extra.

With that out of the way, the next big question is "how does reservation work?" The answer is complex, so let me break this down. At the end of the day, we need something like a `Vec<Entity>` that stores a slice of empty-archetype entities followed by a slice of pending/free entities. Let's call this `owned` since these entities are owned by `Entities`. We also need an atomic int to separate them, `free_cursor`, which holds the index into `owned` of the first free/pending entity. And we need an atomic int to track the length of the `meta` list; let's call it `meta_len`.

Let's explore that model a little bit:
- To `get` the `EntityLocation` of an `Entity`, we need a bit of work. If the index is valid in `meta`, we can just get it like normal. If the index is greater than the atomic `meta_len` or the entity's generation is higher than that of a brand new entity, the entity is invalid/from a different world, so we return `None`. If neither of those conditions are met, the entity must be brand new, so it is not in `owned` and its `ArchetypeRow` is `ArchetypeRow::INVALID`. (Originally I thought using `ArchetypeRow::INVALID` here was dangerous, but it ends up being completely safe.) This takes 1 relaxed atomic operation for brand new entities.
- To get the `EntityMeta` of an entity index (internally), we just access `meta` at the proper index. If that index is currently out of bounds, we just extend it with the `EntityMeta` for a brand new entity. A brand new entity has `ArchetypeId::EMPTY` and `ArchetypeRow::INVALID`, so we just fill that in. (No atomic operations.)
- To `set` the `EntityLocation` of an `Entity`, there's a bit more work too. First, we update its `EntityMeta` in `meta`. If its previous location is `ArchetypeId::EMPTY` and it is not `ArchetypeRow::INVALID`, then we need to remove it from our `owned` list. This is the hardest part since a remote thread might be doing `alloc` at the same time as us. First, if the `free_cursor` index is not valid in `owned`, we have the `owned` list all to ourselves, so we can just swap remove and be done. Otherwise, we start by incrementing `free_cursor` to get an index that we own. Then we swap the entity to remove with that index, and swap their `ArchetypeRow`s accordingly. Now, the entity leaving the empty archetype is right before the current `free_cursor`. So, we do an atomic compare exchange with the `free_cursor` to try to decrement it. If that works, we're done. Otherwise, another one must have been reserved, so we need to do that swap again to put the entity leaving the empty archetype right before `free_cursor`. We try this again until the compare exchange works. Note that the compare exchange will pretty much always work the first time and that none of this will happen unless we're moving a freshly allocated reused entity. In practice, when the current archetype is empty, this has between 1 and 3 atomic operations, all of which can be relaxed I think.
- To free an entity, we push it onto `owned`, and update its `EntityMeta` in `meta`. We also need to `set` its new location. Importantly, it is now in the empty archetype (though not yet spawned), so we set its `EntityLocation::archetype_id` to `ArchetypeId::EMPTY` and we set its `EntityLocation::archetype_row` to its index in `owned`. If `free_cursor` is greater than this index, that means this entity is now the only pending/free entity, and we need to set `free_cursor` to its index in `owned`. This takes 2 relaxed atomic operations.
- To `alloc`/`reserve` an entity, we just increment the `free_cursor`. If the previous `free_cursor` was a valid index in `owned`, we look it up in `owned`, and return it. Otherwise, we just return a generation 0 entity at index `meta_len.fetch_add(1)`. This takes either 1 or 2 relaxed atomic operations, which we can amortize some in the future. (See the sketch below.)
- To iterate every entity (ex: a completely unqualified `Query<Entity>`), we also need to visit the empty entities in `owned[0..free_cursor]`, the new entities with indices `meta.len()..meta_len`, and any new index ranges we missed because of recent `meta` resizes. We'll have to do a tiny bit of bookkeeping for this, but not much. (Or we can just not iterate this because what would be the point?)

If these atomic operations are bothering you, remember that all of them are relaxed, which is pretty much free.
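
Here is a minimal sketch of the `get`-location and `alloc`/`reserve` paths from the list above, under the assumptions of this proposal. `SharedEntityState` and the helper names are made up for illustration, and `owned` is shown as a plain pre-grown slice instead of the concurrent chunked structure described next:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Entity {
    index: u32,
    generation: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EntityLocation {
    archetype_id: u32,  // ArchetypeId::EMPTY is assumed to be 0 here
    archetype_row: u32, // u32::MAX stands in for ArchetypeRow::INVALID
}

#[derive(Clone, Copy)]
struct EntityMeta {
    generation: u32,
    location: EntityLocation,
}

const EMPTY_ARCHETYPE: u32 = 0;
const INVALID_ROW: u32 = u32::MAX;

struct SharedEntityState {
    /// [0..free_cursor) = allocated-but-empty entities, [free_cursor..) = pending/free.
    owned: Vec<Entity>,
    /// Index of the first pending/free entity in `owned` (may run past its end).
    free_cursor: AtomicUsize,
    /// Tracked length of the `meta` list, so brand new indices can be handed out remotely.
    meta_len: AtomicUsize,
}

impl SharedEntityState {
    /// `alloc`/`reserve`: bump the cursor; if it still lands inside `owned`,
    /// recycle that pending entity, otherwise hand out a brand new index.
    fn alloc(&self) -> Entity {
        let n = self.free_cursor.fetch_add(1, Ordering::Relaxed);
        if n < self.owned.len() {
            // Recycled entity: it already has a bumped generation and an
            // `ArchetypeId::EMPTY` location, so it is immediately usable.
            self.owned[n]
        } else {
            // Brand new entity: its `meta` entry is filled in lazily later.
            let index = self.meta_len.fetch_add(1, Ordering::Relaxed) as u32;
            Entity { index, generation: 0 }
        }
    }
}

/// `get`: entities past `meta` are either brand new (valid and empty) or
/// invalid/from another world entirely.
fn get_location(
    meta: &[EntityMeta],
    shared: &SharedEntityState,
    entity: Entity,
) -> Option<EntityLocation> {
    if let Some(meta) = meta.get(entity.index as usize) {
        return (meta.generation == entity.generation).then_some(meta.location);
    }
    // Not in `meta` yet: brand new if its index was actually handed out.
    let handed_out = (entity.index as usize) < shared.meta_len.load(Ordering::Relaxed);
    (handed_out && entity.generation == 0).then_some(EntityLocation {
        archetype_id: EMPTY_ARCHETYPE,
        archetype_row: INVALID_ROW,
    })
}
```

The `set` and `free` paths (including the compare-exchange dance for pulling an entity out of `owned`) are omitted here because they need the concurrent `owned` structure described next.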
To make this model work, what we need is a `Vec<Entity>`-like data structure for `owned` that we can `push`, `pop`, and index without mutable access. Thankfully, this is a much simpler problem that's already solved. We need some form of linked list, but to make things more compact, we need a linked list of `Box<[UnsafeCell<Entity>]>`s where the length of each slice is doubling. We can also skip the first few powers of 2, and we can flatten the linked list into a `Vec<OwnedChunk>`, where each chunk is the `Box<[UnsafeCell<Entity>]>` described, and the first one has length 256. We could even store a pointer to the biggest chunk inline since that's the most likely to be used. We also need an atomic integer for the length, and in reality, those `Box`es will need to be `AtomicPtr`, but we can still use `Relaxed` ordering, so there's no practical difference.
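
A small sketch of the index math this layout allows, assuming a 256-entry first chunk and doubling sizes (`chunk_coords` is a made-up helper; the real structure would also carry the atomic length and `AtomicPtr` chunks mentioned above):

```rust
/// First chunk holds 256 entries; each subsequent chunk doubles in size
/// (256, 512, 1024, ...), so capacity grows like a `Vec`, but chunks never
/// move once allocated, which is what lets indexing work without `&mut`.
const FIRST_CHUNK_LEN: usize = 256;

/// Map a flat index into (chunk number, index within that chunk).
fn chunk_coords(index: usize) -> (usize, usize) {
    // The first `c` chunks hold FIRST_CHUNK_LEN * (2^c - 1) entries in total,
    // so the chunk number falls out of a log2 on the scaled index.
    let slot = index / FIRST_CHUNK_LEN + 1;
    let chunk = usize::BITS as usize - 1 - slot.leading_zeros() as usize;
    let chunk_start = FIRST_CHUNK_LEN * ((1usize << chunk) - 1);
    (chunk, index - chunk_start)
}

fn main() {
    // Chunk 0 covers indices 0..256, chunk 1 covers 256..768, chunk 2 covers 768..1792, ...
    assert_eq!(chunk_coords(0), (0, 0));
    assert_eq!(chunk_coords(255), (0, 255));
    assert_eq!(chunk_coords(256), (1, 0));
    assert_eq!(chunk_coords(767), (1, 511));
    assert_eq!(chunk_coords(768), (2, 0));
}
```

Because the mapping is pure arithmetic, indexing never searches, and a lookup touches only the one chunk that holds the index, which is what the cache-line argument below relies on.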
Note that this structure lets us index easily (we don't have to do any searches), has the same memory footprint as a normal `Vec`, has the same growing strategy as a normal `Vec`, etc. The only downside here is that it uses more cache lines if the chunks are not adjacent in memory. But remember that unless we're finally `set`ing the `EntityLocation` of an `Entity` we `alloc`ated ages ago, we're only going to be using the biggest chunk (1 cache line). And worst case, we just need 2 for swapping between chunks. Just like the current `flush`ing system uses 2 (the `pending` list and the `entities` in the empty archetype).

## Final notes
This could maybe be simplified a little bit without the remote reservations requirement. For example, we could maybe re-introduce `flush` for fulfilling remote reservations that are waiting in async. But this doesn't give us many wins for performance over this implementation.

Part of me thinks I've missed something (otherwise we'd be doing this already), but I can't think of anything. Note that this would use a few atomic operations even during `alloc` calls, etc. But they could all be `Relaxed` (no more costly than a normal operation, just not as optimizable by the compiler). So I don't think this is a big deal.

In any case, I think that this would be much, much faster than the current `flush` system, since it's doing double the work it needs to! Plus this solves a lot of design problems with remote reservation, etc.