alexander-yakushev
Member

@alexander-yakushev alexander-yakushev commented Apr 3, 2025

Now this is a feature I've had on my mind for quite some time, and one I'm very excited about. The idea is to have all kinds of analytics that the inspector can compute automatically (on demand) when the user inspects a large collection of data. It happens to me very often that I inspect something big and can't really grasp some characteristics of that data just by looking at it in the inspector, so I start running all sorts of things on it, like frequencies, (map type coll), etc. Having the inspector do that without my leaving the inspector window is crazily convenient.
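
To illustrate the manual workflow described above, here is a minimal REPL sketch. The sample collection `coll` is made up for illustration; `frequencies`, `map`, and `type` are standard `clojure.core` functions.

```clojure
;; Hypothetical sample data; any large collection would do.
(def coll [1 2 2 "a" nil 3.5 2 "a"])

;; How often each value occurs.
(frequencies coll)
;; => {1 1, 2 3, "a" 2, nil 1, 3.5 1}

;; How often each type occurs (the type of nil is nil).
(frequencies (map type coll))
;; => {java.lang.Long 4, java.lang.String 2, nil 1, java.lang.Double 1}
```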

Bonus: we can present the analytics data using the inspector itself. I'm almost crying as I'm writing this. It is a thing of beauty; Father McCarthy smiles upon us from heaven.

Some screenshots of how it will look in the end:

[two screenshots]

Things it can do so far:

  • For any list: frequencies of values, frequencies of value types
  • Basic numerical stats on lists with numbers (min, max, mean, number of zeros)
  • Very basic string stats on lists with strings (number of blanks, min, max, avg length)
  • Same but on lists of lists
  • For key-value collections (maps k -> v): separate list analysis of keys and of values
  • For lists of tuples: separate list analysis of each "vertical slice" (treat all first values as list 1, all second values as list 2, etc.)
  • For lists of "records" (maps of the same structure): separate list analysis on values under each key.
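
As a rough sketch of what these analyses amount to (not the actual Orchard code), the functions below compute comparable numbers with plain `clojure.core`. All function names and result keys here are hypothetical and chosen only for illustration.

```clojure
(require '[clojure.string :as str])

(defn numeric-stats
  "Min, max, mean, and zero count over the numbers in coll; nil if there are none."
  [coll]
  (let [nums (filter number? coll)]
    (when (seq nums)
      {:min   (apply min nums)
       :max   (apply max nums)
       :mean  (double (/ (reduce + nums) (count nums)))
       :zeros (count (filter zero? nums))})))

(defn string-stats
  "Blank count and min/max/average length over the strings in coll; nil if there are none."
  [coll]
  (let [strs (filter string? coll)
        lens (map count strs)]
    (when (seq strs)
      {:blank   (count (filter str/blank? strs))
       :min-len (apply min lens)
       :max-len (apply max lens)
       :avg-len (double (/ (reduce + lens) (count lens)))})))

(defn tuple-slices
  "Turns a list of equal-length tuples into one vector per position."
  [tuples]
  (when (seq tuples)
    (apply map vector tuples)))

(defn record-columns
  "Turns a list of same-shaped maps into a map of key -> vector of values.
  Assumes the first record has the same keys as all the others."
  [records]
  (into {} (for [k (keys (first records))]
             [k (mapv #(get % k) records)])))

(numeric-stats [3 0 7 0 10])
;; => {:min 0, :max 10, :mean 4.0, :zeros 2}

(string-stats ["ab" "" "  " "hello"])
;; => {:blank 2, :min-len 0, :max-len 5, :avg-len 2.25}

(tuple-slices [[1 "a"] [2 "b"] [3 "c"]])
;; => ([1 2 3] ["a" "b" "c"])

(record-columns [{:id 1 :name "x"} {:id 2 :name "y"}])
;; => {:id [1 2], :name ["x" "y"]}
```

Each slice or column is then just another list, so the same list analysis can be applied to it.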

Extras:

  • Configurable cutoff limit (100,000 by default). Only that many values from the head of the list will be analyzed.
  • A hint about this feature that is displayed to the user until they trigger analytics for the first time. This helps with discoverability.
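
The cutoff idea can be sketched as follows. This is only an illustration under assumed names (`default-limit` and `bounded-frequencies` are hypothetical, not Orchard's API): taking a bounded prefix keeps the work predictable even for huge or infinite sequences.

```clojure
;; Hypothetical default, mirroring the 100,000 mentioned above.
(def default-limit 100000)

(defn bounded-frequencies
  "Frequencies over at most `limit` items from the head of coll."
  ([coll] (bounded-frequencies coll default-limit))
  ([coll limit] (frequencies (take limit coll))))

;; Terminates even on an infinite sequence, because only the head is realized.
(bounded-frequencies (cycle [:a :b]) 5)
;; => {:a 3, :b 2}
```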

I have a dream that as people start using it, together we can come up with more ideas for what to compute and make this feature even awesomer.


  • You've added tests to cover your change(s)
  • All tests are passing
  • You've updated the changelog (if adding/changing user-visible functionality)

@bbatsov
Member

bbatsov commented Apr 5, 2025

> Bonus: we can present the analytics data using the inspector itself. I'm almost crying as I'm writing this. It is a thing of beauty; Father McCarthy smiles upon us from heaven.

❤️

The proposal looks pretty good to me. I'm just wondering about one thing: would it be nice to have some mechanism for clients to add or remove some of the analytics? Adding, because I assume some people will have specific needs; removing, in case someone runs into issues (e.g. performance) with some of them.

But overall I love the feature and the implementation looks pretty good to me.

@alexander-yakushev
Member Author

alexander-yakushev commented Apr 5, 2025

> The proposal looks pretty good to me. I'm just wondering about one thing - would it be nice to have some mechanism to add/remove some of the analytics as a client. Adding, because I assume some people would have specific needs, and removing - in case someone runs into issues (e.g. performance) with some of them.

I've thought about this overall, and this is how I see it:

  • The current calculations are cheap, and you can count on me to measure and optimize them. The whole thing comes back in ~1ms for 100k non-trivial items.
  • When deciding what to calculate, cost is one of my primary criteria. For example, I haven't put in median/percentile calculations yet, as they would be more expensive. Probably still affordable, but they didn't make it into the first batch.
  • If people ever want to add something else, I'd say they'll have to do it by contributing to Orchard directly, where we can vet it. It's too early to make this thing extensible.
  • Given all that, I don't think there is value in disabling some of the calculations, at least for now. Configuration fatigue is a real thing, and I'd like to keep this config-free for as long as possible.
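
One way to see the cost difference mentioned above, as a sketch rather than the actual implementation: min, max, and mean can all be accumulated in a single pass, while the straightforward way to get a median sorts the data first. Both function names below are hypothetical.

```clojure
(defn single-pass-stats
  "Count, min, max, and mean accumulated in one pass over nums; nil if empty."
  [nums]
  (let [{:keys [n sum] :as acc}
        (reduce (fn [acc x]
                  (-> acc
                      (update :n inc)
                      (update :sum + x)
                      (update :min #(if % (min % x) x))
                      (update :max #(if % (max % x) x))))
                {:n 0 :sum 0 :min nil :max nil}
                nums)]
    (when (pos? n)
      (assoc (dissoc acc :sum) :mean (double (/ sum n))))))

(defn median
  "Median via a full sort, which costs O(n log n) instead of one linear pass."
  [nums]
  (let [sorted (vec (sort nums))
        n      (count sorted)
        mid    (quot n 2)]
    (when (pos? n)
      (if (odd? n)
        (sorted mid)
        (/ (+ (sorted (dec mid)) (sorted mid)) 2.0)))))

(single-pass-stats [4 1 3 2])
;; => {:n 4, :min 1, :max 4, :mean 2.5}

(median [4 1 3 2])
;; => 2.5

(median [5 1 3])
;; => 3
```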

@bbatsov
Member

bbatsov commented Apr 5, 2025

No argument from me. Just to be clear: I wasn't thinking of configuration options, but rather of some mechanism where users could contribute extra analytics just by defining simple functions in their own codebase. But that was just a general remark and not something I'm pushing to do right now. Premature optimization is the root of all evil. :-)

@alexander-yakushev
Member Author

I agree. I think there can be room for user-extensible analytics in the future.

@alexander-yakushev alexander-yakushev merged commit 515486a into master Apr 5, 2025
20 checks passed
@alexander-yakushev alexander-yakushev deleted the analytics branch April 5, 2025 12:33