Async driver keeping connections from pool open? #796
Comments
Hi and thanks for contacting us. The fact that the error looks off (with code and message being None) is indeed a bug that I thought I had already fixed. It turns out I did (it's not in 4.4 anymore), but I accidentally reverted the fix along with other things that were meant to be reverted. Expect a fix for that soon. Your assessment of the error cause, however, is correct in that it is thrown in
|
A shot in the dark: are you closing all sessions you open? https://stackoverflow.com/questions/72022910/neo4j-python-api-crashing-on-multiple-queries/72031241#72031241 |
We are closing the sessions with the "with session" option, which is the second way shown in the answer on Stack Overflow. |
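To illustrate why the "with session" form matters here, the following is a minimal sketch using a stand-in pool, not the neo4j driver. `ToyPool`, its `session()` method, and the `in_use` counter are hypothetical names invented for this example. It only shows that an `async with` block hands the slot back even when the body raises.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyPool:
    """Hypothetical stand-in for a connection pool (not the neo4j driver)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_use = 0  # number of slots currently handed out

    @asynccontextmanager
    async def session(self):
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1
        try:
            yield self
        finally:
            # Runs on normal exit and on exceptions alike.
            self.in_use -= 1


async def demo() -> int:
    pool = ToyPool(size=1)
    for attempt in range(3):
        try:
            async with pool.session():
                if attempt == 1:
                    raise ValueError("query failed")
        except ValueError:
            pass  # the slot was still released by the context manager
    return pool.in_use


leaked = asyncio.run(demo())
```

With a pool of size 1, all three sequential sessions can be opened because each one is released on exit, including the one whose body raised.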
So I've been trying to reproduce the issue with a synthetic workload using
    import asyncio
    import datetime
    import random
    import time
    import typing as t

    import neo4j
    # from neo4j.debug import watch


    NUM_WORKERS = 4
    POOL_SIZE = 5


    async def work(
        tx: neo4j.AsyncManagedTransaction,
        query: str,
        **query_params: t.Any
    ) -> list[neo4j.Record]:
        result = await tx.run(query, **query_params)
        records = [rec async for rec in result]
        await result.consume()  # just because I can
        return records


    async def test_program(driver: neo4j.AsyncDriver) -> None:
        while True:
            node_id = random.randint(1, 1000)
            clear = random.random() < 0.001  # 0.1% chance
            query = "MERGE (n:Node {id: $id})"
            query_params = {"id": node_id}
            if clear:
                query = "MATCH (n:Node) DETACH DELETE n"
                query_params = {}
            t0 = time.time()
            async with driver.session() as session:
                await session.execute_write(work, query, **query_params)
            t1 = time.time()
            took = t1 - t0
            if took >= 3:
                print(f"took {took} seconds. too long!")
                raise Exception("took too long")


    async def main() -> None:
        async with neo4j.AsyncGraphDatabase.driver(
            "neo4j://localhost:7687", auth=("neo4j", "pass"),
            max_connection_pool_size=POOL_SIZE
        ) as driver:
            await driver.verify_connectivity()
            await asyncio.gather(*(
                test_program(driver)
                for _ in range(NUM_WORKERS)
            ))


    if __name__ == '__main__':
        print("starting", datetime.datetime.now())
        try:
            # with watch("neo4j"):  # enable debug logging
            asyncio.run(main())
        finally:
            print("finished", datetime.datetime.now())

After over an hour, no issues arose for me. Can you try running the above example on your machine and see if it causes the issue? If not, can you alter it (as little as possible) towards your real-world application until the error occurs and then share it with me? As you might know, it's quite hard to debug a problem without being able to reproduce it. Also, make sure your environment matches mine, or tell me in which environment you are able to reproduce the error. |
Would it be possible for you to change your code to run with Python 3.7.10? That is the version that we will be using for production. As for the rest of the requirements, I will share them here: jsonschema==4.4.0 Using the above requirements and Python 3.7.10, I am able to get the error locally using our own code that is run in production. |
Thanks. I'm pretty positive the other packages will not affect this error, but I'll try with them installed should the error not occur without them. Out of pure curiosity: why do you have both py2neo and the official driver in your project? |
We are using py2neo to run queries in Neo4j that update/create our database. We don't need it in production as something else is handling that, but I guess we still have it locally to update our local Neo4j. |
So I ran the above code with Python 3.7.10 for 2 hours (after replacing I also switched the server to match yours:

    docker run \
        --name testneo4j \
        -p7474:7474 -p7688:7687 \
        --rm -it \
        -v $HOME/data/neo4j/data:/data \
        -v $HOME/data/neo4j/logs:/logs \
        -v $HOME/data/neo4j/import:/var/lib/neo4j/import \
        -v $HOME/data/neo4j/plugins:/plugins \
        --env-file bin/neo4j.env \
        neo4j:4.4.10-community

(both I had to downgrade So I think that pretty much matches your environment. (I'm running on Linux, not sure about you, but I'd be hugely surprised if the OS made a difference here). Ideas what to try/investigate next:
|
I have tried running the code as well, but I also tried to lower the pool size to 2 or 3 and then I got the error code fairly quickly. If you can try that as well, I think it should work. There was no other application accessing the database other than our API. |
When the error is not received from the DBMS, but instead originates from somewhere in the driver, it might not have a code and a message. In that case, we fall back to the default Exception representation. Related: * neo4j#796
The reservation count was also increased if the pool was full and the reservation would never be turned into a new connection. This led to the pool clogging up after a while.
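The clogging described above can be pictured with a small toy model. This is a sketch under stated assumptions, not the driver's actual pool code: `ToyPool`, `try_acquire`, `release`, the `leak_on_full` flag, and `simulate` are all hypothetical names made up for illustration. The buggy path bumps the reservation counter before checking capacity and never undoes it when the pool is full. The fixed path only reserves when there is room.

```python
class ToyPool:
    """Hypothetical model of a pool that counts connections and reservations."""

    def __init__(self, max_size: int, leak_on_full: bool) -> None:
        self.max_size = max_size
        self.leak_on_full = leak_on_full
        self.connections = 0   # connections currently checked out
        self.reservations = 0  # slots promised to connections being opened

    def try_acquire(self) -> bool:
        """Return True if a connection could be handed out."""
        if self.leak_on_full:
            # Buggy order: reserve first, check capacity afterwards.
            self.reservations += 1
            if self.connections + self.reservations > self.max_size:
                return False  # the reservation is never given back
        else:
            # Fixed order: only reserve when there is room.
            if self.connections + self.reservations >= self.max_size:
                return False
            self.reservations += 1
        # The reservation turns into a real connection.
        self.reservations -= 1
        self.connections += 1
        return True

    def release(self) -> None:
        self.connections -= 1


def simulate(leak_on_full: bool) -> bool:
    pool = ToyPool(max_size=2, leak_on_full=leak_on_full)
    pool.try_acquire()
    pool.try_acquire()   # pool is now full
    pool.try_acquire()   # fails; leaks a reservation in the buggy variant
    pool.try_acquire()   # fails again
    pool.release()
    pool.release()       # every connection is back in the pool
    return pool.try_acquire()  # should succeed on an idle pool
```

In the buggy variant, the two failed attempts leave two stale reservations behind, so a pool of size 2 refuses new connections even though none are in use. That would also be consistent with a smaller pool size hitting the problem sooner, as reported earlier in the thread.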
I finally managed to reproduce it and hunt down the bug. Configuring the driver with a small Thanks again for reaching out and helping me to get to the core of the issue 👏 The fix should be included in the (soon to be released) patch versions 5.0.1 and 4.4.8. |
Perfect, glad I could help :) |
* Improve Neo4jError representation When the error is not received from the DBMS, but instead originates from somewhere in the driver, it might not have a code and a message. In that case, we fall back to the default Exception representation. Related: * #796 * Make Neo4jError representation more robust Even though all errors that come from the server go through `Neo4jError.hydrate`, which (currently) makes sure that `code` *and* `message` are set, this might change in the future.
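The fallback described in this commit message could look roughly like the sketch below. It is an illustration only: `ToyDriverError` is a hypothetical class, not the driver's `Neo4jError`, and the output format string is an assumption chosen for the example.

```python
class ToyDriverError(Exception):
    """Hypothetical error type with optional server-side code and message."""

    def __init__(self, *args, code=None, message=None):
        super().__init__(*args)
        self.code = code
        self.message = message

    def __str__(self) -> str:
        if self.code is not None or self.message is not None:
            # Error came with server details: show them explicitly.
            return "{code: %s} {message: %s}" % (self.code, self.message)
        # Error originated locally: use the default Exception text
        # instead of printing "None" for both fields.
        return super().__str__()


server_err = ToyDriverError(
    code="Neo.ClientError.Statement.SyntaxError", message="Invalid input"
)
local_err = ToyDriverError("failed to obtain a connection from the pool")
```

Here `str(local_err)` yields the plain message passed to the constructor, while `str(server_err)` includes both fields.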
I encountered a new issue with the Async driver version 5.0.0a2.
The error states:
I suspected it is the issue with connections being kept alive or being alive for a prolonged period of time, as when I checked
"/usr/local/lib/python3.7/site-packages/neo4j/_async/io/_pool.py"
I noticed the error is thrown when the pool is full and there are no available connections. Also, the code and message are shown as None, so I am not sure whether that is a bug as well. We are running Neo4j version 4.4.10 Community Edition as a single instance.