Fastest and most efficient goroutine pool (experimental)

alphadose/itogami

Itogami

An experimental goroutine pool implemented using a lock-free stack

By capping concurrency at a fixed pool size and recycling goroutines through a stack, itogami uses far less memory than spawning unlimited goroutines while remaining nearly as fast.
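
To make the "lock-free stack" idea concrete, here is a minimal sketch of a Treiber-style stack built on `atomic.Pointer` (available since Go 1.19). It is an illustration only, not itogami's actual implementation; the names `Stack`, `node`, `Push`, and `Pop` are hypothetical. A pool could keep its idle workers on a structure like this, so that handing out and returning a worker needs no mutex.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// node is a single stack element (hypothetical type for this sketch).
type node[T any] struct {
	value T
	next  *node[T]
}

// Stack is a Treiber-style lock-free LIFO stack.
// The zero value is an empty stack ready for use.
type Stack[T any] struct {
	head atomic.Pointer[node[T]]
}

// Push adds v on top of the stack, retrying the CAS until it succeeds.
func (s *Stack[T]) Push(v T) {
	n := &node[T]{value: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// Pop removes and returns the top element; ok is false if the stack is empty.
func (s *Stack[T]) Pop() (v T, ok bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return v, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.value, true
		}
	}
}

func main() {
	var s Stack[int]
	var wg sync.WaitGroup

	// Push 0..999 from 1000 goroutines concurrently.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			s.Push(v)
		}(i)
	}
	wg.Wait()

	// Drain the stack and check that nothing was lost.
	count, sum := 0, 0
	for {
		v, ok := s.Pop()
		if !ok {
			break
		}
		count++
		sum += v
	}
	fmt.Printf("popped %d items, sum %d\n", count, sum) // popped 1000 items, sum 499500
}
```

Because every `Push` allocates a fresh node and Go's garbage collector keeps a node alive while any goroutine still holds a pointer to it, this sketch avoids the classic ABA problem that affects CAS-based stacks which reuse nodes.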

Benchmarks supporting these claims are given in the Benchmarks section below.

Note: This work is experimental and should not be used in production.

Installation

Requires Go 1.19 or later.

$ go get github.com/alphadose/itogami

Usage

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	"github.com/alphadose/itogami"
)

const runTimes uint32 = 1000

var sum uint32

func myFunc(i uint32) {
	atomic.AddUint32(&sum, i)
	fmt.Printf("run with %d\n", i)
}

func demoFunc() {
	time.Sleep(10 * time.Millisecond)
	println("Hello World")
}

func examplePool() {
	var wg sync.WaitGroup
	// Create a general-purpose pool of size 10
	pool := itogami.NewPool(10)

	// Wrap the task so the WaitGroup is notified when it completes
	task := func() {
		demoFunc()
		wg.Done()
	}
	for i := uint32(0); i < runTimes; i++ {
		wg.Add(1)
		// Submit task to the pool
		pool.Submit(task)
	}
	wg.Wait()
	println("finished all tasks")
}

func examplePoolWithFunc() {
	var wg sync.WaitGroup
	// Use the pool with a pre-defined function
	pool := itogami.NewPoolWithFunc(10, func(i uint32) {
		myFunc(i)
		wg.Done()
	})
	for i := uint32(0); i < runTimes; i++ {
		wg.Add(1)
		// Invoke the function with a value
		pool.Invoke(i)
	}
	wg.Wait()
	fmt.Printf("finished all tasks, result is %d\n", sum)
}

func main() {
	examplePool()
	examplePoolWithFunc()
}

Benchmarks

Benchmarks were run against:

  1. Unlimited goroutines
  2. Ants
  3. Gamma-Zero-Worker-Pool
  4. golang.org/x/sync/errgroup
  5. Bytedance GoPool

Pool size -> 50k

CPU -> M1, arm64, 8 cores, 3.2 GHz

OS -> darwin

Results were computed with benchstat from 30 samples

name                   time/op
UnlimitedGoroutines-8   331ms ± 4%
ErrGroup-8              515ms ± 9%
AntsPool-8              582ms ± 9%
GammaZeroPool-8         740ms ±13%
BytedanceGoPool-8       572ms ±18%
ItogamiPool-8           337ms ± 1%

name                   alloc/op
UnlimitedGoroutines-8  96.3MB ± 0%
ErrGroup-8              120MB ± 0%
AntsPool-8             22.4MB ± 6%
GammaZeroPool-8        18.8MB ± 1%
BytedanceGoPool-8      82.2MB ± 2%
ItogamiPool-8          25.6MB ± 2%

name                   allocs/op
UnlimitedGoroutines-8   2.00M ± 0%
ErrGroup-8              3.00M ± 0%
AntsPool-8              1.10M ± 2%
GammaZeroPool-8         1.08M ± 0%
BytedanceGoPool-8       2.59M ± 1%
ItogamiPool-8           1.08M ± 0%

The following conclusions can be drawn from the above results:

  1. Itogami is the fastest of the goroutine pool implementations (337ms) and only slightly slower than unlimited goroutines (331ms)
  2. Itogami ties with Gamma-Zero-Worker-Pool for the fewest allocs/op (1.08M), so allocation pressure stays low under high load
  3. Itogami's memory per operation (25.6MB) is comparable to Ants (22.4MB) and Gamma-Zero-Worker-Pool (18.8MB), and roughly a quarter of what unlimited goroutines use (96.3MB)
  4. The variation (± %) for Itogami is low across all 3 metrics (at most 2%), indicating that its performance is stable from run to run

Benchmarking code available here
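
As a rough illustration of how a baseline such as `UnlimitedGoroutines` can be measured, here is a minimal sketch using the standard `testing` package. It is not the benchmark code referenced above: the workload (`demoTask`), the task count (`tasksPerOp`), and the function names are assumptions chosen to keep the example small.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
	"time"
)

// tasksPerOp is a hypothetical number of tasks per benchmark iteration.
const tasksPerOp = 1000

// demoTask is a hypothetical stand-in for the benchmarked work.
func demoTask() {
	time.Sleep(time.Millisecond)
}

// runUnlimited starts one goroutine per task and waits for all of them.
func runUnlimited(n int) {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			demoTask()
		}()
	}
	wg.Wait()
}

// benchUnlimited has the shape of a regular Go benchmark function.
func benchUnlimited(b *testing.B) {
	b.ReportAllocs() // also record alloc/op and allocs/op
	for i := 0; i < b.N; i++ {
		runUnlimited(tasksPerOp)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchUnlimited)
	fmt.Println("UnlimitedGoroutines", res.String(), res.MemString())
}
```

When the same function lives in a `_test.go` file as `BenchmarkUnlimitedGoroutines`, one way to collect 30 samples is to run `go test -bench=. -benchmem -count=30` and feed the saved output to benchstat. Whether the published numbers were produced exactly this way is an assumption.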