
Suggest to use hash index, instead of btree index in migration. #94

Closed
wants to merge 4 commits into from

Conversation


@skatkov skatkov commented Oct 9, 2023

Good day. Thanks for the amazing gem.

I've experimented with storing a cache in a relational database as well. Based on my findings, it seems worth suggesting a hash index instead of a btree index as a possible alternative.

A hash index doesn't have to descend through a btree to find the right entry; instead it locates a cache entry in O(1). This small change can yield a 40-60% lookup speed improvement, depending on the size of the cache.

This optimization has only one drawback: the #delete_matched method can't be used. So I raise an error if a hash index is detected. Frankly, not many people use that method anyway, so it seems like a manageable trade-off.

To add a bit more context to this PR, here is an interesting benchmark comparing btree and hash indexes:
https://evgeniydemin.medium.com/postgresql-indexes-hash-vs-b-tree-84b4f6aa6d61
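For illustration, a minimal sketch of the kind of migration this PR proposes (hypothetical, not the gem's shipped migration; table and column names follow the snippets later in this thread):

```ruby
# Hypothetical sketch: a Postgres-oriented variant of the cache-entries
# migration that uses a hash index on the key column for O(1) lookups.
# Note that Postgres hash indexes cannot be declared unique
# ("Currently, only B-tree indexes can be declared unique"), a
# limitation discussed further down in this thread.
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
  def change
    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: { using: :hash }
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false
    end
  end
end
```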


skatkov commented Oct 10, 2023

Oops, let me fix this :)


rafaelsales commented Oct 17, 2023

IMO the added complexity in the installation command and keeping two migration files is a bit too much here.

How about making the hash index the default, and adding a commented-out btree index line with advice on when to uncomment it? E.g.:

    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: { unique: true, using: :hash }
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false

      # Uncomment the btree index below if you want to use `SolidCache::Entry.delete_matched`
      # t.index :key, unique: true, using: :btree
    end


skatkov commented Oct 17, 2023

Thanks, @rafaelsales. I agree with your suggestion.

Just for visibility's sake, I'll post @djmb's response from Slack:

It might be that we could just decide not to support delete_matched at all

- In MySQL with InnoDB, there are no hash indexes - it's allowed in the create index syntax but it just creates btree indexes. But we could reduce the key column size instead.
- I'd like to get to somewhere where there are no options but it's as fast as it can be - so for Postgres that would mean probably your approach.

I’m away for a week or so, so I’ll leave your PR for now, but I’ve not forgotten it!

I'll wait for a final decision from @djmb and then fix this PR.

But it seems reasonable to completely remove support for #delete_matched and have only one migration that uses a hash index in PG but a btree index in MySQL.
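A hedged sketch of that single-migration idea, choosing the index type per adapter (hypothetical; the adapter check and option names are assumptions, not the gem's code):

```ruby
# Hypothetical sketch: one migration that uses a hash index on
# PostgreSQL and falls back to a unique btree index elsewhere.
# InnoDB silently creates btree indexes even when HASH is requested,
# so requesting :hash there would gain nothing.
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
  def change
    index_options =
      if connection.adapter_name.match?(/postgresql/i)
        { using: :hash }   # fast lookups; cannot be declared unique
      else
        { unique: true }   # btree; keeps uniqueness enforcement
      end

    create_table :solid_cache_entries do |t|
      t.binary :key, null: false, limit: 1024, index: index_options
      t.binary :value, null: false, limit: 512.megabytes
      t.datetime :created_at, null: false
    end
  end
end
```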


jmonteiro commented Oct 24, 2023

Small note on this, per Postgres' documentation:

Currently, only B-tree indexes can be declared unique.

So a hash index cannot have uniqueness enforcement.


skatkov commented Dec 30, 2023

@djmb it would be great to hear which direction you want to take with the hash index for PG.

I'll be happy to adjust this PR accordingly. I'm not sure if there is a point in fixing the current implementation.


djmb commented Jan 4, 2024

Hi @skatkov!

Sorry for the delay on this!

I'm planning to add a separate key_hash column, which will be a 64-bit integer. This will allow a much smaller index for looking up records that works on all databases. The query for records would then be `where key_hash = <hash> and key = <key>`.

Maybe we could still make it a hash index on postgresql, but that would depend on whether we need it to also be a unique index.

We could keep the unique index on key which we can use for delete_matched, or have a unique index on the new key_hash column instead and make the index on key optional for delete_matched support.

Also, the performance difference between btree and hash indexes may not be as significant for a 64-bit integer column compared to a 1K blob.

Another consideration is the changes needed to allow us to estimate the cache size. If those work, then key_hash could potentially be used as the bucket column - you could do something like `select sum(size) from solid_cache_entries where key_hash between x and y`.

This would require an index on key_hash and size to be efficient, so that adds to the mix.

I'll be working on this over the next few weeks, but it needs some investigation. Once we have the columns and indexes in place for everything we can see where a postgresql hash index fits into the mix.
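The key_hash idea above could be sketched in plain Ruby like this (hypothetical: the hashing scheme and method name are assumptions for illustration, not the gem's implementation):

```ruby
require "digest"

# Derive a signed 64-bit integer from a cache key by taking the first
# 8 bytes of a SHA-256 digest. The value fits a BIGINT column on any
# database, giving a small, index-friendly lookup column; the full key
# column is still compared in the query to rule out hash collisions.
def key_hash_for(key)
  Digest::SHA256.digest(key.to_s).unpack1("q>")
end

# A lookup along the lines described above would then be something like:
#   Entry.where(key_hash: key_hash_for(key), key: key)
```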


@PikachuEXE PikachuEXE left a comment


Minor indent-related issue

@@ -0,0 +1,11 @@
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
    def change


Why 4 spaces instead of 2 spaces

@@ -0,0 +1,11 @@
class CreateSolidCacheEntries < ActiveRecord::Migration[7.0]
    def change


same

t.binary :value, null: false, limit: 512.megabytes
t.datetime :created_at, null: false

t.index :key, unique: true, using: :hash


Error encountered with message:
`PG::FeatureNotSupported: ERROR: access method "hash" does not support unique indexes`

From doc
https://www.postgresql.org/docs/16/indexes-unique.html
Currently, only B-tree indexes can be declared unique.


@PikachuEXE PikachuEXE Jan 15, 2024


Though I attempted to work around it by removing `unique: true`, I then encountered another error:
`ArgumentError - No unique index found for key`, from
https://github.com/rails/rails/blob/v7.1.2/activerecord/lib/active_record/insert_all.rb#L161

Edit: And then I read comments above and it's already reported sorry~


skatkov commented Jan 25, 2024

I'll drop this for now. Happy to take another stab at this once @djmb decides on the approach he wants to take here.

@skatkov skatkov closed this Jan 25, 2024