Skip to content

[track-state] Completely redesign track-state middleware #931

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

alexander-yakushev
Copy link
Member

@alexander-yakushev alexander-yakushev commented Apr 11, 2025

I can't believe it took me this long to get to this. The current implementation of track-state is very far from ideal and I had problems with it for a long time (see #806). In the current large project I'm testing this, every eval op takes ~300-400ms to recalculate the updated state (even if there were no changes to relevant metadata) and this produces ~250-300 MB of garbage. The state recalculation doesn't delay the evaluation result (as the state update happens on a separate thread), but it delays font lock updates which can be annoying. Besides, the allocated garbage produces extra strain if a REPL process is already low on memory. Finally, the cache used by this middleware occupies again ~200 MB of heap, and there is a counterpart cache on the Emacs side which eats memory and causes GC pauses too.

This is a complete redesign. Here are the main points:

  • Drop metadata that is not relevant to state tracking. Namely, I removed :doc and :arglists. This instantly reduced the cache size by a lot.
  • The most common relevant metadata for functions is {:fn "true"} (not private, not deprecated, not a macro). Reusing this as a literal brings down retained cached size further. This is very significant.
  • New cache design.
    • For each namespace, we keep a map of var-symbol -> real-metadata-map. Because the "relevant metadata map" is always a derivative of a real metadata map, we can skip recomputing the relevant map if the real one hasn't changed, and take the computed map from the cache (from previous run). This map is kept as WeakHashMap to avoid holding onto metadata maps for vars that could have been undefined. The cache WHM is updated if the real metadata of the var has changed since the previous run.
    • For each loaded namespace, we go over its vars and recompute relevant metadata maps. If none of the vars changed its metadata, it means we took all relevant meta maps from cache, and the whole namespace state can remain as is (speeds up the comparison/diff operations later).

The remaining changes are mostly about refactoring and prettyfication. I introduced a couple of dynamic vars to simplify the internal API. Dynvars usually have worse performance, but here it is negligible, and I cache them into local vars where necessary.

Results on that large project:

  • ~15ms middleware execution time (it becomes ~20-40ms end to end – time between "done" and "state" messages, but it still feels instant).
  • ~3-4 MB allocated.

The public API hasn't changed. There are minor tweaks to be done on CIDER side later, but overall this is backwards-compatible.


  • You've updated the README

@alexander-yakushev
Copy link
Member Author

I need to fix the tests, but this can be already reviewed.

Copy link
Member

@bbatsov bbatsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance everything seems solid to me. Amazing work!!! 🔥

;; A "real metadata cache" is a map {ns-symbol WHM{var-symbol meta}} of
;; actual (not filtered) var metadata. We keep this to know when to recompute
;; the filtered metadata for a var (if the real meta hasn't changed, no need to
;; recompute). WHM is used tp avoid unnecessarily holding onto the metadata if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tp -> to

current-meta)))

;; Note that we aggressively cut down unnecessary keys in this mw as it triggers
;; on EACH evaluation, and the middleware retains (caches) this data for ALL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALL loaded project namespaces :-)

@alexander-yakushev alexander-yakushev force-pushed the track-state branch 2 times, most recently from 0f13052 to c1e7d94 Compare April 14, 2025 14:36
Optimizes execution speed and allocation rate by ~100x.
@alexander-yakushev alexander-yakushev merged commit 9153326 into master Apr 14, 2025
16 checks passed
@alexander-yakushev alexander-yakushev deleted the track-state branch April 14, 2025 15:13
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants