
Suggest to use hash index, instead of btree index in migration. #94

Closed
wants to merge 4 commits into from

Conversation


@skatkov skatkov commented Oct 9, 2023

Good day. Thanks for the amazing gem.

I've experimented with storing a cache in a relational database as well. Based on my findings, it seems worth suggesting a hash index instead of a btree index as a possible alternative.

A hash index doesn't have to descend through a btree to find the right entry; instead it locates a cache entry in O(1). This small change can yield a 40-60% lookup speed improvement, depending on the size of the cache.

This optimization has only one drawback: the #delete_matched method can't be used. So I raise an error if a hash index is detected. Frankly, not many people use that method anyway, so it seems like a manageable trade-off.

To add a bit more context to this PR, here is an interesting benchmark comparing btree and hash indexes:
https://evgeniydemin.medium.com/postgresql-indexes-hash-vs-b-tree-84b4f6aa6d61
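For illustration, a minimal sketch of the kind of migration this PR proposes (hypothetical, not the gem's shipped migration; table and column names follow the snippets later in this thread):

```ruby
# Hypothetical sketch: a Postgres-oriented variant of the cache-entries
# migration that uses a hash index on the key column for O(1) lookups.
# Note that Postgres hash indexes cannot be declared unique
# ("Currently, only B-tree indexes can be declared unique"), a
# limitation discussed further down in this thread.
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
  def change
    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: { using: :hash }
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false
    end
  end
end
```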


skatkov commented Oct 10, 2023

Oops, let me fix this :)


rafaelsales commented Oct 17, 2023

IMO the added complexity in the installation command and keeping two migration files is a bit too much here.

How about making the hash index the default, and adding a commented-out btree index line with advice on when to uncomment it? E.g.:

    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: { unique: true, using: :hash }
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false

      # Uncomment the btree index below if you want to use `SolidCache::Entry.delete_matched`
      # t.index :key, unique: true, using: :btree
    end


skatkov commented Oct 17, 2023

Thanks, @rafaelsales. I agree with your suggestion.

Just for visibility's sake, I'll post @djmb's response from Slack:

It might be that we could just decide not to support delete_matched at all

- In MySQL with InnoDB, there are no hash indexes - it's allowed in the create index syntax but it just creates btree indexes. But we could reduce the key column size instead.
- I'd like to get to somewhere where there are no options but it's as fast as it can be - so for Postgres that would mean probably your approach.

I’m away for a week or so, so I’ll leave your PR for now, but I’ve not forgotten it!

I'll wait for a final decision from @djmb and then fix this PR.

But it seems reasonable to completely remove support for #delete_matched and have only one migration that uses a hash index in PG but a btree index in MySQL.
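A hedged sketch of that single-migration idea, choosing the index type per adapter (hypothetical; the adapter check and option names are assumptions, not the gem's code):

```ruby
# Hypothetical sketch: one migration that uses a hash index on
# PostgreSQL and falls back to a unique btree index elsewhere.
# InnoDB silently creates btree indexes even when HASH is requested,
# so requesting :hash there would gain nothing.
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
  def change
    index_options =
      if connection.adapter_name.match?(/postgresql/i)
        { using: :hash }   # fast lookups; cannot be declared unique
      else
        { unique: true }   # btree; keeps uniqueness enforcement
      end

    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: index_options
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false
    end
  end
end
```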


jmonteiro commented Oct 24, 2023

Small note on this, per Postgres' documentation:

Currently, only B-tree indexes can be declared unique.

So a hash index cannot have uniqueness enforcement.


skatkov commented Dec 30, 2023

@djmb it would be great to hear which direction you want to take with the hash index for PG.

I'll be happy to adjust this PR accordingly. I'm not sure if there is a point in fixing the current implementation.


djmb commented Jan 4, 2024

Hi @skatkov!

Sorry for the delay on this!

I'm planning to add a separate key_hash column, which will be a 64-bit integer. This will allow a much smaller index for looking up records that works on all databases. The query for records would then be `where key_hash = <hash> and key = <key>`.

Maybe we could still make it a hash index on postgresql, but that would depend on whether we need it to also be a unique index.

We could keep the unique index on key which we can use for delete_matched, or have a unique index on the new key_hash column instead and make the index on key optional for delete_matched support.

Also, the performance difference between btree and hash indexes may not be as significant for a 64-bit integer column compared to a 1K blob.

Another consideration is the changes needed to allow us to estimate the cache size. If those work, then key_hash could potentially be used as the bucket column - you could do something like `select sum(size) from solid_cache_entries where key_hash between x and y`.

This would require an index on key_hash and size to be efficient, so that adds to the mix.

I'll be working on this over the next few weeks, but it needs some investigation. Once we have the columns and indexes in place for everything we can see where a postgresql hash index fits into the mix.
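The key_hash idea above could be sketched in plain Ruby like this (hypothetical: the hashing scheme and method name are assumptions for illustration, not the gem's implementation):

```ruby
require "digest"

# Derive a signed 64-bit integer from a cache key by taking the first
# 8 bytes of a SHA-256 digest. The value fits a BIGINT column on any
# database, giving a small, index-friendly lookup column; the full key
# column is still compared in the query to rule out hash collisions.
def key_hash_for(key)
  Digest::SHA256.digest(key.to_s).unpack1("q>")
end

# A lookup along the lines described above would then be something like:
#   Entry.where(key_hash: key_hash_for(key), key: key)
```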


@PikachuEXE PikachuEXE left a comment


Minor indent-related issue

@@ -0,0 +1,11 @@
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
    def change


Why 4 spaces instead of 2 spaces

@@ -0,0 +1,11 @@
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
    def change


same

t.binary :value, null: false, limit: 512.megabytes
t.datetime :created_at, null: false

t.index :key, unique: true, using: :hash


Error encountered with message:
`PG::FeatureNotSupported: ERROR: access method "hash" does not support unique indexes`

From doc
https://www.postgresql.org/docs/16/indexes-unique.html
Currently, only B-tree indexes can be declared unique.


@PikachuEXE PikachuEXE Jan 15, 2024


Though I attempted to work around it by removing `unique: true`, I then encountered another error:
`ArgumentError - No unique index found for key`, from
https://github.com/rails/rails/blob/v7.1.2/activerecord/lib/active_record/insert_all.rb#L161

Edit: And then I read comments above and it's already reported sorry~


skatkov commented Jan 25, 2024

I'll drop this for now. Happy to take another stab at this once @djmb decides on the approach he wants to take here.

@skatkov skatkov closed this Jan 25, 2024