
Large MaxTableSize and AndreasBriese/bbloom #745

Closed

zorino opened this issue Mar 20, 2019 · 6 comments

Comments

@zorino

zorino commented Mar 20, 2019

When setting a large MaxTableSize of 1024 << 20, I'm getting the following error:

panic: runtime error: makeslice: len out of range

goroutine 95948 [running]:
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.(*Bloom).Size(...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:203
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.New(0xc4a914eb08, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3f, 0x7fffffffffffffff, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:73 +0x12e
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.NewWithBoolset(0xc4a914eb88, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:81 +0xb2
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.JSONUnmarshal(0xcd23e30af2, 0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:105 +0xea
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table.(*Table).readIndex(0xc000254ff0, 0x0, 0x0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table/table.go:233 +0xca
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table.OpenTable(0xc62e1f4db8, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table/table.go:156 +0x17d
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x6, 0xc053666000, 0xc4e53ced80, 0xc5b94479e0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/levels.go:581 +0x240
created by github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger.(*levelsController).compactBuildTables
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/levels.go:567 +0x107b
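
For reference, this is roughly how I set the option (a minimal sketch against badger v1.x, where DefaultOptions is an Options value; the paths are placeholders):

package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions  // v1.x: DefaultOptions is a struct value
	opts.Dir = "/tmp/badger"       // placeholder path
	opts.ValueDir = "/tmp/badger"
	opts.MaxTableSize = 1024 << 20 // 1GiB tables; the panic fires later, during compaction
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
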
@jarifibrahim
Contributor

@zorino What does go env show?

@zorino
Author

zorino commented Mar 20, 2019

@jarifibrahim

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/deraspem/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/deraspem/go"
GOPROXY=""
GORACE=""
GOROOT="/home/deraspem/packages/go"
GOTMPDIR=""
GOTOOLDIR="/home/deraspem/packages/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build291546033=/tmp/go-build"

@jarifibrahim
Contributor

It looks like bbloom.Size (https://github.com/AndreasBriese/bbloom/blob/master/bbloom.go#L203) is trying to create a slice bigger than what your system supports.
From https://github.com/golang/go/blob/master/src/runtime/slice.go#L37:

func makeslice(et *_type, len, cap int) unsafe.Pointer {
	mem, overflow := math.MulUintptr(et.size, uintptr(cap))
	if overflow || mem > maxAlloc || len < 0 || len > cap {
		// NOTE: Produce a 'len out of range' error instead of a
		// 'cap out of range' error when someone does make([]T, bignumber).
		// 'cap out of range' is true too, but since the cap is only being
		// supplied implicitly, saying len is clearer.
		// See golang.org/issue/4085.
		mem, overflow := math.MulUintptr(et.size, uintptr(len))
		if overflow || mem > maxAlloc || len < 0 {
			panicmakeslicelen()
		}
		panicmakeslicecap()
	}

	return mallocgc(mem, et, true)
}
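
As a standalone illustration (not badger code), asking make for more memory than the runtime's maxAlloc allows reproduces the same panic message:

package main

import "fmt"

func main() {
	defer func() {
		// prints: recovered: runtime error: makeslice: len out of range
		fmt.Println("recovered:", recover())
	}()
	n := 1 << 62 // 8*n bytes of uint64s, far beyond maxAlloc on linux/amd64
	b := make([]uint64, n)
	fmt.Println(len(b)) // never reached
}
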

@zorino You might want to try a smaller MaxTableSize. That should work.

@zorino
Author

zorino commented Mar 21, 2019

@jarifibrahim

Yes, it works, but I was trying to get around the max open files limit of 1024 (ulimit -n), which I have no control over.

Thank you anyway.

I was able to run it with 768 << 20, and I will then try backup/restore/flatten plus aggressive GC (as proposed in #718) to limit the number of files.
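
The compaction step I have in mind looks roughly like this (a sketch against badger v1.x; the worker count and the 0.5 discard ratio are arbitrary choices):

// flattenAndGC compacts the LSM tree, then runs value log GC until
// there is nothing left to rewrite.
func flattenAndGC(db *badger.DB) error {
	if err := db.Flatten(8); err != nil { // 8 concurrent compaction workers
		return err
	}
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			// stop on the first error, typically badger.ErrNoRewrite
			// once no vlog file has enough discardable data
			return nil
		}
	}
}
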

I have already set ValueLogMaxEntries to 100000000, which gives vlog files of 1.1GB.

Any other recommendations would be welcome.

zorino closed this as completed Mar 21, 2019
@manishrjain
Contributor

manishrjain commented Mar 21, 2019

You can increase your ulimit. 1024 files are not that many. Also, if you have larger SSTables, you'd have issues later during compactions. Each compaction could iterate over 10+1 SSTables (assuming each level is 10x the size of the previous level), which would take a long time if your SSTable is 768MB (instead of 64MB).
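
Back-of-the-envelope (one upper-level table overlapping ~10 tables on the next level):

package main

import "fmt"

func main() {
	const tables = 1 + 10 // one upper-level table + ~10 overlapping lower-level tables
	for _, size := range []int64{64 << 20, 768 << 20} {
		fmt.Printf("MaxTableSize=%dMB -> ~%dMB read and rewritten per compaction\n",
			size>>20, int64(tables)*size>>20)
	}
	// MaxTableSize=64MB -> ~704MB read and rewritten per compaction
	// MaxTableSize=768MB -> ~8448MB read and rewritten per compaction
}
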

@zorino
Author

zorino commented Mar 21, 2019

Yeah, but that's the problem: I cannot set the ulimit to more than 2048, since I'm running the creation of the KV stores on grid computers.

If I decrease the LevelSizeMultiplier to 3, would that mean fewer iterations for the compaction?

I'm trying to build a rather big KV store, and my bulk creation is time-consuming.

I had to split my data entries (TSV files here) into 10 files of ~10M lines each.

Each line creates on average ~300 KVs for my main KV store. There are a lot of duplicates in there, but that is still ~3 billion keys to process.

In the first iteration of my program, I was updating the values in place, but that was too slow... now I insert everything in one run, then iterate over all KVs (stream API) and merge the keys that have several versions, discarding the earlier versions.
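
The merge pass looks roughly like this (a sketch against badger's Stream framework; merge is a hypothetical stand-in for my application-specific logic):

// mergeAllVersions scans every key and collapses its versions
// (newest first) into a single merged value.
func mergeAllVersions(db *badger.DB, merge func(newer, older []byte) []byte) error {
	stream := db.NewStream()
	stream.NumGo = 8
	stream.KeyToList = func(key []byte, itr *badger.Iterator) (*pb.KVList, error) {
		var merged []byte
		for ; itr.Valid(); itr.Next() {
			item := itr.Item()
			if !bytes.Equal(item.Key(), key) {
				break // moved past this key's versions
			}
			val, err := item.ValueCopy(nil)
			if err != nil {
				return nil, err
			}
			merged = merge(merged, val)
		}
		kv := &pb.KV{Key: append([]byte(nil), key...), Value: merged}
		return &pb.KVList{Kv: []*pb.KV{kv}}, nil
	}
	stream.Send = func(list *pb.KVList) error {
		return nil // write the merged entries to the destination store here
	}
	return stream.Orchestrate(context.Background())
}
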

Also, to improve throughput, I have several badger KV stores, which store the features as well as the combinations of features (sha1sum) as KVs. The combination KVs are useful since a lot of keys share the same list of values for different features.

E.g.

K_store : k_key -> kk_store_hash_key
KK_store : kk_store_hash_key -> [ff_store_hash_key, gg_store_hash_key, ..]
FF_Store: ff_store_hash_key -> [f_store_key_1, f_store_key_2]
F_Store: f_store_key_1 -> f_value
...
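
In code, the dedup-by-hash write looks something like this (a sketch; the key prefix and the list encoding are hypothetical):

// putDeduped stores the (possibly shared) value list once under its
// SHA-1 and points the key at it, mirroring K_store -> KK_store.
func putDeduped(db *badger.DB, kKey, encodedList []byte) error {
	sum := sha1.Sum(encodedList) // crypto/sha1 content hash of the list
	hashKey := append([]byte("kk/"), sum[:]...) // hypothetical prefix
	return db.Update(func(txn *badger.Txn) error {
		if err := txn.Set(hashKey, encodedList); err != nil { // idempotent: same hash, same list
			return err
		}
		return txn.Set(kKey, hashKey) // K_store entry points at the shared list
	})
}
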

Hardware:
32 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
132GB of RAM
878GB local SSD

Right now it takes me ~400 minutes to build a store, and almost 3x that time to merge all the KVs.

I would also like to improve the CPU load: during the first batch insert it averages around 15/32 (except for the first few minutes, where the load is great), but during the stream/merge afterward it's not much more than 3-4/32. My merging routines are inherently slow, though, since I need to query several other KV stores, so I can live with that.

Anyway, that was just to give you some context. Thank you again for your help.
