
Large MaxTableSize and AndreasBriese/bbloom #745

Closed

zorino opened this issue Mar 20, 2019 · 6 comments

Comments

@zorino

zorino commented Mar 20, 2019

When setting a large MaxTableSize of 1024 << 20, I'm getting the following error:

panic: runtime error: makeslice: len out of range

goroutine 95948 [running]:
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.(*Bloom).Size(...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:203
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.New(0xc4a914eb08, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3f, 0x7fffffffffffffff, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:73 +0x12e
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.NewWithBoolset(0xc4a914eb88, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:81 +0xb2
github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom.JSONUnmarshal(0xcd23e30af2, 0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/AndreasBriese/bbloom/bbloom.go:105 +0xea
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table.(*Table).readIndex(0xc000254ff0, 0x0, 0x0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table/table.go:233 +0xca
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table.OpenTable(0xc62e1f4db8, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/table/table.go:156 +0x17d
github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x6, 0xc053666000, 0xc4e53ced80, 0xc5b94479e0)
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/levels.go:581 +0x240
created by github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger.(*levelsController).compactBuildTables
        /home/deraspem/go/src/github.com/zorino/metaprot/vendor/github.com/dgraph-io/badger/levels.go:567 +0x107b
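
For reference, this is roughly how I set the option (a minimal sketch against badger v1.x, where DefaultOptions is an Options value; the paths are placeholders):

package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions  // v1.x: DefaultOptions is a struct value
	opts.Dir = "/tmp/badger"       // placeholder path
	opts.ValueDir = "/tmp/badger"
	opts.MaxTableSize = 1024 << 20 // 1GiB tables; the panic fires later, during compaction
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
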
@jarifibrahim
Contributor

@zorino What does go env show?

@zorino
Author

zorino commented Mar 20, 2019

@jarifibrahim

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/deraspem/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/deraspem/go"
GOPROXY=""
GORACE=""
GOROOT="/home/deraspem/packages/go"
GOTMPDIR=""
GOTOOLDIR="/home/deraspem/packages/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build291546033=/tmp/go-build"

@jarifibrahim
Contributor

It looks like bbloom.Size (https://github.com/AndreasBriese/bbloom/blob/master/bbloom.go#L203) is trying to create a slice bigger than what your system supports.
From https://github.com/golang/go/blob/master/src/runtime/slice.go#L37:

func makeslice(et *_type, len, cap int) unsafe.Pointer {
	mem, overflow := math.MulUintptr(et.size, uintptr(cap))
	if overflow || mem > maxAlloc || len < 0 || len > cap {
		// NOTE: Produce a 'len out of range' error instead of a
		// 'cap out of range' error when someone does make([]T, bignumber).
		// 'cap out of range' is true too, but since the cap is only being
		// supplied implicitly, saying len is clearer.
		// See golang.org/issue/4085.
		mem, overflow := math.MulUintptr(et.size, uintptr(len))
		if overflow || mem > maxAlloc || len < 0 {
			panicmakeslicelen()
		}
		panicmakeslicecap()
	}

	return mallocgc(mem, et, true)
}
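
As a standalone illustration (not badger code), asking make for more memory than the runtime's maxAlloc allows reproduces the same panic message:

package main

import "fmt"

func main() {
	defer func() {
		// prints: recovered: runtime error: makeslice: len out of range
		fmt.Println("recovered:", recover())
	}()
	n := 1 << 62 // 8*n bytes of uint64s, far beyond maxAlloc on linux/amd64
	b := make([]uint64, n)
	fmt.Println(len(b)) // never reached
}
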

@zorino You might want to try a smaller MaxTableSize. That should work.

@zorino
Author

zorino commented Mar 21, 2019

@jarifibrahim

Yes, it works, but I was trying to get around the max open files limit of 1024 (ulimit -n), which I have no control over.

Thank you anyway.

I was able to run it with 768 << 20, and I will then try backup/restore/flatten plus aggressive GC (as proposed in #718) to limit the number of files.
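
The compaction step I have in mind looks roughly like this (a sketch against badger v1.x; the worker count and the 0.5 discard ratio are arbitrary choices):

// flattenAndGC compacts the LSM tree, then runs value log GC until
// there is nothing left to rewrite.
func flattenAndGC(db *badger.DB) error {
	if err := db.Flatten(8); err != nil { // 8 concurrent compaction workers
		return err
	}
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			// stop on the first error, typically badger.ErrNoRewrite
			// once no vlog file has enough discardable data
			return nil
		}
	}
}
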

I have already set ValueLogMaxEntries to 100000000, which gives vlog files of 1.1GB.

Any other recommendations would be welcome.

zorino closed this as completed Mar 21, 2019
@manishrjain
Contributor

manishrjain commented Mar 21, 2019

You can increase your ulimit. 1024 files are not that many. Also, if you have larger SSTables, you'd have issues later during compactions. Each compaction could iterate over 10+1 SSTables (assuming each level is 10x the size of the previous level), which would take a long time if your SSTable is 768MB (instead of 64MB).
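
Back-of-the-envelope (one upper-level table overlapping ~10 tables on the next level):

package main

import "fmt"

func main() {
	const tables = 1 + 10 // one upper-level table + ~10 overlapping lower-level tables
	for _, size := range []int64{64 << 20, 768 << 20} {
		fmt.Printf("MaxTableSize=%dMB -> ~%dMB read and rewritten per compaction\n",
			size>>20, int64(tables)*size>>20)
	}
	// MaxTableSize=64MB -> ~704MB read and rewritten per compaction
	// MaxTableSize=768MB -> ~8448MB read and rewritten per compaction
}
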

@zorino
Author

zorino commented Mar 21, 2019

Yeah, but that's the problem: I cannot set the ulimit to more than 2048, since I'm running the creation of the KV stores on grid computers.

If I decrease the LevelSizeMultiplier to 3, would that mean fewer iterations for the compaction?

I'm trying to build a rather big KV store, and my bulk creation is time-consuming.

I had to split my data entries (TSV files here) into 10 files of ~10M lines each.

Each line creates on average ~300 KVs for my main KV store. There are a lot of duplicates in there, but that is still ~3 billion keys to process.

In the first iteration of my program, I was updating the values in place, but that was too slow... now I insert everything in one run, then iterate over all KVs (stream API) and merge the keys that have several versions, discarding the earlier versions.
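
The merge pass looks roughly like this (a sketch against badger's Stream framework; merge is a hypothetical stand-in for my application-specific logic):

// mergeAllVersions scans every key and collapses its versions
// (newest first) into a single merged value.
func mergeAllVersions(db *badger.DB, merge func(newer, older []byte) []byte) error {
	stream := db.NewStream()
	stream.NumGo = 8
	stream.KeyToList = func(key []byte, itr *badger.Iterator) (*pb.KVList, error) {
		var merged []byte
		for ; itr.Valid(); itr.Next() {
			item := itr.Item()
			if !bytes.Equal(item.Key(), key) {
				break // moved past this key's versions
			}
			val, err := item.ValueCopy(nil)
			if err != nil {
				return nil, err
			}
			merged = merge(merged, val)
		}
		kv := &pb.KV{Key: append([]byte(nil), key...), Value: merged}
		return &pb.KVList{Kv: []*pb.KV{kv}}, nil
	}
	stream.Send = func(list *pb.KVList) error {
		return nil // write the merged entries to the destination store here
	}
	return stream.Orchestrate(context.Background())
}
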

Also, to improve throughput, I have several badger KV stores, which store the features as well as the combinations of features (sha1sum) as KVs. The combination KVs are useful since a lot of keys share the same list of values for different features.

E.g.

K_store : k_key -> kk_store_hash_key
KK_store : kk_store_hash_key -> [ff_store_hash_key, gg_store_hash_key, ..]
FF_Store: ff_store_hash_key -> [f_store_key_1, f_store_key_2]
F_Store: f_store_key_1 -> f_value
...
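
In code, the dedup-by-hash write looks something like this (a sketch; the key prefix and the list encoding are hypothetical):

// putDeduped stores the (possibly shared) value list once under its
// SHA-1 and points the key at it, mirroring K_store -> KK_store.
func putDeduped(db *badger.DB, kKey, encodedList []byte) error {
	sum := sha1.Sum(encodedList) // crypto/sha1 content hash of the list
	hashKey := append([]byte("kk/"), sum[:]...) // hypothetical prefix
	return db.Update(func(txn *badger.Txn) error {
		if err := txn.Set(hashKey, encodedList); err != nil { // idempotent: same hash, same list
			return err
		}
		return txn.Set(kKey, hashKey) // K_store entry points at the shared list
	})
}
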

Hardware:
32 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
132GB of RAM
878GB local SSD

Right now it takes me ~400 minutes to build a store, and almost 3x that time to merge all the KVs.

I would also like to improve the CPU load: during the first batch insert it averages around 15/32 (except for the first few minutes, where the load is great), but during the stream/merge afterward it's not much more than 3-4/32. My merging routines are inherently slow, though, since I need to query several other KV stores, so I can live with that.

Anyway, that was just to give you some context. Thank you again for your help.
