Pool fails to manage number of open connection? #520

Closed
ukai opened this issue Sep 30, 2020 · 2 comments · Fixed by #527

Comments

ukai commented Sep 30, 2020

I've been getting many "context deadline exceeded" errors from Pool.GetContext.

I suspect the pool fails to manage the number of open connections.
It acquires a slot from p.ch here:

case <-p.ch:

but shouldn't we release it back to p.ch here?

return 0, ctx.Err()

Otherwise the number of available items in p.ch keeps decreasing, until eventually no new connection can be acquired.
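
For illustration, here is a minimal, self-contained sketch of the leak pattern being described (simplified assumptions, not the actual redigo pool code: the semaphore channel and the acquireLeaky helper are hypothetical):

package main

import (
	"context"
	"fmt"
	"time"
)

// acquireLeaky mimics the pattern described above: it takes a token from the
// semaphore channel ch, but if the context turns out to be done it returns
// without putting the token back, permanently shrinking the pool's capacity.
func acquireLeaky(ctx context.Context, ch chan struct{}) error {
	select {
	case <-ch:
		// When both cases are ready, select picks one at random; if it picked
		// <-ch while ctx is already done, the check below leaks the token.
		select {
		case <-ctx.Done():
			return ctx.Err() // BUG: the token taken from ch is never released
		default:
		}
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	const maxActive = 3
	ch := make(chan struct{}, maxActive)
	for i := 0; i < maxActive; i++ {
		ch <- struct{}{}
	}

	// Repeatedly acquire with an already-expired context. Each time select
	// happens to pick the <-ch case, one token is lost for good.
	for i := 0; i < 20; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Nanosecond)
		time.Sleep(time.Millisecond) // make sure the deadline has passed
		if err := acquireLeaky(ctx, ch); err == nil {
			ch <- struct{}{} // a successful caller would release after use
		}
		cancel()
	}
	fmt.Printf("tokens left: %d of %d\n", len(ch), maxActive) // typically 0
}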

rhnvrm commented Oct 6, 2020

With the following diff, I was able to reproduce this: roughly 2/3 of the time when running go test -run='TestWaitPoolGetContextIssue520', the test hangs at "waiting for c2". With a sufficiently long timeout there is no hang.

redis/pool.go      |  2 ++
redis/pool_test.go | 28 ++++++++++++++++++++++++++++

modified   redis/pool.go
@@ -21,6 +21,7 @@ import (
 	"crypto/sha1"
 	"errors"
 	"io"
+	"log"
 	"strconv"
 	"sync"
 	"sync/atomic"
@@ -387,6 +388,7 @@ func (p *Pool) waitVacantConn(ctx context.Context) (waited time.Duration, err er
 		// because `select` picks a random `case` if several of them are "ready".
 		select {
 		case <-ctx.Done():
+			log.Println("here in ctx done inside read p ch")
 			return 0, ctx.Err()
 		default:
 		}
modified   redis/pool_test.go
@@ -18,6 +18,7 @@ import (
 	"context"
 	"errors"
 	"io"
+	"log"
 	"net"
 	"reflect"
 	"sync"
@@ -825,6 +826,33 @@ func TestWaitPoolGetContext(t *testing.T) {
 	defer c.Close()
 }
 
+func TestWaitPoolGetContextIssue520(t *testing.T) {
+	d := poolDialer{t: t}
+	p := &redis.Pool{
+		MaxIdle:   1,
+		MaxActive: 1,
+		Dial:      d.dial,
+		Wait:      true,
+	}
+	defer p.Close()
+	ctx1, _ := context.WithTimeout(context.Background(), 1*time.Nanosecond)
+	c, err := p.GetContext(ctx1)
+	if err != context.DeadlineExceeded {
+		t.Fatalf("GetContext returned %v", err)
+	}
+	defer c.Close()
+
+	log.Println("waiting for c2")
+	ctx2, _ := context.WithCancel(context.Background())
+	c2, err := p.GetContext(ctx2)
+	if err != nil {
+		t.Fatalf("Get context returned %v", err)
+	}
+	defer c2.Close()
+	log.Println("got c2")
+	t.Fatalf("%#v", p.Stats())
+}
+
 func TestWaitPoolGetContextWithDialContext(t *testing.T) {
 	d := poolDialer{t: t}
 	p := &redis.Pool{

The fix as suggested by @ukai:

	select {
	case <-p.ch:
		// Additionally check that context hasn't expired while we were waiting,
		// because `select` picks a random `case` if several of them are "ready".
		select {
		case <-ctx.Done():
			// release back
			p.ch <- struct{}{}
			return 0, ctx.Err()
		default:
		}
	case <-ctx.Done():
		return 0, ctx.Err()
	}

This seems to fix the issue.

If someone can give this a second look, I'd be happy to send a PR with the fix.
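
For comparison, a standalone sketch of the proposed fix in isolation (again with simplified, hypothetical names rather than the actual pool code): returning the token to the channel on the post-acquire context check keeps the pool's capacity intact.

package main

import (
	"context"
	"fmt"
	"time"
)

// acquireFixed mirrors the proposed fix: if the context is found to be done
// after a token was taken from ch, the token is pushed back before returning
// the error, so the pool's capacity is preserved.
func acquireFixed(ctx context.Context, ch chan struct{}) error {
	select {
	case <-ch:
		select {
		case <-ctx.Done():
			ch <- struct{}{} // release the token back before giving up
			return ctx.Err()
		default:
		}
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	ch := make(chan struct{}, 1)
	ch <- struct{}{}

	// Fail a number of acquires with an expired context; capacity is unaffected.
	for i := 0; i < 10; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), time.Nanosecond)
		time.Sleep(time.Millisecond)
		if err := acquireFixed(ctx, ch); err == nil {
			ch <- struct{}{} // a successful caller releases after use
		}
		cancel()
	}
	fmt.Println("tokens left:", len(ch)) // always 1
}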

@vividvilla

@rhnvrm The test case seems fine 👍 If enough contexts fail, the pool just never releases the connections back and everything gets locked up.
