
Improvement: Extend gstat statistics by indexes (null values) #8404

Open
sim1984 opened this issue Jan 20, 2025 · 2 comments

Comments

sim1984 commented Jan 20, 2025

Currently, gstat outputs the following statistics for the index:

    Index HORSE_IDX_BIRTHDAY (0) 
	Root page: 150238, depth: 2, leaf buckets: 167, nodes: 545113 
	Average node length: 4.94, total dup: 520604, max dup: 27865 
	Average key length: 2.00, compression ratio: 1.90 
	Average prefix length: 3.75, average data length: 0.05 
	Clustering factor: 436641, ratio: 0.80 
	Fill distribution: 
	     0 - 19% = 0 
	    20 - 39% = 1 
	    40 - 59% = 0 
	    60 - 79% = 0 
	    80 - 99% = 166 

It is proposed to extend these statistics with the number of NULL values in the index keys. This value matters whenever an index can contain NULLs, because the real selectivity of operations that never match NULL (primarily equality) differs from the current selectivity. Ideally, the share of NULL values should be kept in the stored statistics, just as selectivity is now. Even so, showing the number of NULLs in the gstat output would already help in assessing the real selectivity.

    Index HORSE_IDX_BIRTHDAY (0) 
	Root page: 150238, depth: 2, leaf buckets: 167, nodes: 545113 
	Average node length: 4.94, total dup: 520604, max dup: 27865 
	Segments: 1, Nulls: 27866
	Average key length: 2.00, compression ratio: 1.90 
	Average prefix length: 3.75, average data length: 0.05 
	Clustering factor: 436641, ratio: 0.80 
	Fill distribution: 
	     0 - 19% = 0 
	    20 - 39% = 1 
	    40 - 59% = 0 
	    60 - 79% = 0 
	    80 - 99% = 166 
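To make the selectivity point concrete, here is a rough Python sketch using the figures from the proposed output above. It rests on assumptions, not on gstat documentation: selectivity is taken as 1 / (number of distinct keys), distinct keys as nodes - total dup, and all NULL entries are assumed to form a single key group. Since an equality predicate never matches NULL, that group is dropped when estimating how many entries one lookup matches. The function `equality_estimates` and the hypothetical NULL-heavy index are illustrative only.

```python
# Hypothetical sketch: rough equality estimates with and without the NULL group.
# Assumptions (not from gstat docs): selectivity = 1 / distinct keys,
# distinct keys = nodes - total dup, and all NULLs share one key group.


def equality_estimates(nodes: int, total_dup: int, nulls: int) -> dict:
    distinct = nodes - total_dup
    # Selectivity as if NULL were an ordinary key value.
    selectivity_with_nulls = 1.0 / distinct if distinct else 0.0
    # "col = value" never matches NULL, so drop the NULL rows and their key group.
    non_null_rows = nodes - nulls
    non_null_distinct = distinct - (1 if nulls > 0 else 0)
    selectivity_without_nulls = 1.0 / non_null_distinct if non_null_distinct else 0.0
    return {
        "distinct": distinct,
        "non_null_distinct": non_null_distinct,
        "selectivity_with_nulls": selectivity_with_nulls,
        "selectivity_without_nulls": selectivity_without_nulls,
        # Expected number of index entries matched by one equality lookup.
        "rows_per_value_with_nulls": nodes * selectivity_with_nulls,
        "rows_per_value_without_nulls": non_null_rows * selectivity_without_nulls,
    }


if __name__ == "__main__":
    # Figures from the proposed HORSE_IDX_BIRTHDAY output.
    est = equality_estimates(nodes=545113, total_dup=520604, nulls=27866)
    print(f"{est['rows_per_value_with_nulls']:.2f}")     # 22.24
    print(f"{est['rows_per_value_without_nulls']:.2f}")  # 21.11

    # Hypothetical NULL-heavy index: 1000 nodes, 900 NULLs, 50 distinct non-NULL values.
    skewed = equality_estimates(nodes=1000, total_dup=949, nulls=900)
    print(f"{skewed['rows_per_value_with_nulls']:.2f}")     # 19.61
    print(f"{skewed['rows_per_value_without_nulls']:.2f}")  # 2.00
```

With the issue's numbers the correction is modest, because NULLs are only about 5% of the nodes. In the hypothetical NULL-heavy case, counting NULL as an ordinary key overestimates an equality match by roughly ten times, which is exactly the information the proposed Nulls figure would expose.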

sim1984 changed the title from "ImprovementЖ Extend gstat statistics by indexes (null values)" to "Improvement: Extend gstat statistics by indexes (null values)" on Jan 20, 2025
dyemanov (Member) commented

How is it going to work for compound indices? Count only NULLs in all segments?

sim1984 (Author) commented Jan 20, 2025

I think it's worth also outputting the number of segments in these statistics, and counting NULLs only for single-segment indexes. For compound indexes, either don't output the NULL count at all, or count only the keys in which all segments are NULL.
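A toy Python model of the two options for compound indexes, purely for illustration: keys are in-memory tuples with `None` standing in for SQL NULL, which is not how gstat actually walks index pages. The function `count_nulls` and the policy names `single_only` and `all_segments` are hypothetical. Under `single_only` the count is suppressed (returned as `None`) for compound indexes; under `all_segments` only keys whose every segment is NULL are counted.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical model: one tuple per index key, None standing in for SQL NULL.
Key = Tuple[Optional[object], ...]


def count_nulls(keys: Sequence[Key], segments: int, policy: str) -> Optional[int]:
    """Count NULL keys under one of the two policies discussed for compound indexes.

    single_only  -- count only for single-segment indexes; None means "don't output".
    all_segments -- count keys in which every segment is NULL.
    """
    if policy == "single_only":
        if segments != 1:
            return None
        return sum(1 for key in keys if key[0] is None)
    if policy == "all_segments":
        return sum(1 for key in keys if all(value is None for value in key))
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    compound = [(1, "a"), (None, "b"), (None, None), (2, None)]
    print(count_nulls(compound, 2, "single_only"))   # None
    print(count_nulls(compound, 2, "all_segments"))  # 1

    single = [(5,), (None,), (None,)]
    print(count_nulls(single, 1, "single_only"))   # 2
    print(count_nulls(single, 1, "all_segments"))  # 2
```

For a single-segment index both policies give the same count, so they differ only in whether compound indexes report anything. Keys that are only partly NULL, such as `(None, "b")`, are not counted under `all_segments`.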

Labels: None yet
Projects: None yet
Development: No branches or pull requests
Participants: 2