
Locking issue? #293

Closed
Joey92 opened this issue Apr 6, 2018 · 7 comments

Joey92 commented Apr 6, 2018

We've been testing Tile38 as a proof of concept lately and issue a couple hundred SET commands every minute. It's running on an 8-core machine with 32 GB of RAM, but it's a shared node in Kubernetes. We're trying to set up 2600 static geofences, all single coordinates with a radius of 500 meters. It feels like Tile38 gets stuck while setting them up. We've put a sleep between each command that sets up a geofence, which seems to work, with some pauses here and there. Running the SERVER command while that is going on takes a while to execute:

{"ok":true,"stats":{"aof_size":15409953,"avg_item_size":2461,"cpus":8,"heap_released":0,"heap_size":22886360,"http_transport":true,"id":"ecc385f1f50d7f3c35d3e2bcc38dcb57","in_memory_size":334790,"max_heap_size":0,"mem_alloc":22886360,"num_collections":6,"num_hooks":1328,"num_objects":6665,"num_points":9297,"num_strings":0,"pid":1,"pointer_size":8,"read_only":false,"threads":8},"elapsed":"98.742µs"}
real	0m 17.76s
user	0m 0.00s
sys	0m 0.00s

It's at close to 0% CPU and around 60 MB of RAM after an AOFSHRINK. Could there be a locking issue?
Maybe we should consider using a roaming fence instead?

tidwall (Owner) commented Apr 6, 2018

Roaming geofences automatically track moving points. If you're having to continually update the 2600 static geofences to match a position, then roaming geofences are probably what you want.

That said, 17 seconds seems pretty slow for the server command. I'm interested in reproducing the issue on my side. What steps do you recommend I take? Or maybe just provide me with your appendonly.aof file and I can diagnose directly.

Joey92 (Author) commented Apr 9, 2018

Okay, I guess we found the issue. All the webhooks had the same endpoint, but we forgot to provide the specific port for each one. So it seems to have run out of handles and only continued when one request timed out. Removing all webhooks and recreating them correctly freed it immediately.
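For anyone hitting the same misconfiguration, a quick check for endpoints that lack an explicit port could look like the sketch below. It only uses Go's standard `net/url` package and says nothing about how Tile38 itself parses endpoints; the `missingPort` helper and the `broker.example` host are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// missingPort reports whether endpoint parses as a URL that has no explicit port.
// Hypothetical helper for illustration, not part of Tile38.
func missingPort(endpoint string) (bool, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false, err
	}
	return u.Port() == "", nil
}

func main() {
	// Example endpoints; the host and topic names are assumptions.
	endpoints := []string{
		"kafka://broker.example/fences",
		"kafka://broker.example:9092/fences",
	}
	for _, ep := range endpoints {
		missing, err := missingPort(ep)
		fmt.Println(ep, "missing port:", missing, "error:", err)
	}
}
```

Running something like this over the hook endpoints before creating them would flag the ones that need a port added.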

Joey92 closed this as completed Apr 9, 2018

Joey92 (Author) commented Apr 12, 2018

@tidwall So, to continue on this: if a webhook can't send out its message because the endpoint is not reachable, it bricks the entire application. During that time, not even a command to remove the faulty webhooks can be run. In this case it's the Kafka endpoint.

Joey92 reopened this Apr 12, 2018

tidwall (Owner) commented Apr 12, 2018

Perhaps there's something up with the kafka webhook driver.

It's designed so that if an endpoint isn't reachable, the message is queued, the system makes subsequent attempts, and it finally gives up after a reasonable amount of time. All of this happens in a background thread, so it shouldn't slow down other operations.

I'm gonna try to reproduce and I'll let you know what I run into.

tidwall (Owner) commented Apr 13, 2018

I've been unable to reproduce locally.
If there is a data race or deadlock it should be detectable by compiling Tile38 using the -race flag.

From the tile38 root directory:

go build -race -o tile38-server cmd/tile38-server/main.go

This creates a tile38-server binary that prints data race issues to the terminal.

I'll keep poking around.

Joey92 (Author) commented Apr 17, 2018

The easiest way I got it to happen locally was to create a couple of Kafka webhooks that use an unreachable broker, always the same one for each hook, and then trigger them a couple of times.
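A small Go sketch of those reproduction steps is below. It only prints Tile38 commands (SETHOOK with a `kafka://` endpoint, then SET) that could be fed to a client. The hook names, the `fleet` key, the `fences` topic, the coordinates, and the broker address are all placeholders, not the values from the original setup.

```go
package main

import "fmt"

// reproCommands builds Tile38 commands that register `hooks` Kafka webhooks,
// all pointing at the same broker, followed by `triggers` SET commands that
// place a point inside the fence so the hooks fire.
// All names and coordinates are hypothetical placeholders.
func reproCommands(hooks, triggers int, broker string) []string {
	cmds := make([]string, 0, hooks+triggers)
	for i := 0; i < hooks; i++ {
		cmds = append(cmds, fmt.Sprintf(
			"SETHOOK hook%d kafka://%s/fences NEARBY fleet FENCE POINT 33.5123 -112.2693 500",
			i, broker))
	}
	for i := 0; i < triggers; i++ {
		cmds = append(cmds, "SET fleet truck1 POINT 33.5123 -112.2693")
	}
	return cmds
}

func main() {
	// 10.255.255.1:9092 stands in for a broker address that is not reachable.
	for _, c := range reproCommands(3, 2, "10.255.255.1:9092") {
		fmt.Println(c)
	}
}
```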

tidwall (Owner) commented Apr 17, 2018

I found the issue and pushed an update to the master branch.
If possible, please verify from your side and let me know if you run into any problems.
