Fix memory leak. #787

Open
wants to merge 2 commits into
base: main

jvaracarbonell

This PR fixes #786.

@RasmusOrsoe
Collaborator

Hi @jvaracarbonell! Could you provide a few more details about the issue you opened in #786 and how this pull request solves it?

@jvaracarbonell
Author

Hello @RasmusOrsoe!

I encountered a memory leak when using the GraphNeTDataModule with a list of SQLite files as input. For example:

    import glob

    # truth, truth_table, features, graph_definition and config are
    # defined elsewhere in the training script.
    files = glob.glob("/scratch/tmp/fvaracar/graphnet_training_data/NuMu/quality_cut/*/*/*db")

    dm = GraphNeTDataModule(
        dataset_reference=SQLiteDataset,
        selection=None,
        dataset_args={
            "truth": truth,
            "truth_table": truth_table,
            "features": features,
            "graph_definition": graph_definition,
            "pulsemaps": [config["pulsemap"]],
            "path": files,  # a list of SQLite files rather than a single path
            "index_column": "event_no",
            "labels": {
                "direction": Direction(azimuth_key="azimuth", zenith_key="zenith")
            },
        },
        train_dataloader_kwargs={
            "batch_size": config["batch_size"],
            "num_workers": config["num_workers"],
            "shuffle": True,
        },
        # One (empty) selection entry per database file.
        test_selection=[None for _ in range(len(files))],
        test_dataloader_kwargs={
            "batch_size": config["batch_size"],
            "num_workers": config["num_workers"],
            "shuffle": True,
        },
    )

    training_dataloader = dm.train_dataloader
    validation_dataloader = dm.val_dataloader

My CPU RAM was steadily filling up with each iteration until my jobs were eventually killed. I noticed that closing the connection in the query_table method of the SQLiteDataset module prevents this memory leak.
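To illustrate the idea, here is a minimal sketch of the close-after-query pattern using only the standard-library sqlite3 module. It is not the actual GraphNeT code: the function below only borrows the name query_table, and its signature (db_path, table, columns, event_no) is an assumption made for the example.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple


def query_table(
    db_path: str, table: str, columns: List[str], event_no: int
) -> List[Tuple]:
    """Fetch the rows of one event and close the connection before returning.

    Hypothetical stand-in for a dataset's query method; table and column
    names are interpolated directly, so they must come from trusted config.
    """
    column_list = ", ".join(columns)
    query = f"SELECT {column_list} FROM {table} WHERE event_no = ?"
    # closing() calls conn.close() on exit, even if the query raises.
    # Note that `with sqlite3.connect(...)` on its own only manages the
    # transaction (commit/rollback) and leaves the connection open.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query, (event_no,)).fetchall()
```

With many database files and several dataloader workers, a connection that is opened per file and never closed can keep its caches alive, so releasing it after each query keeps the number of live connections bounded. The trade-off is the cost of reopening the file on every call.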

I'm not entirely sure if this is the optimal solution, so any suggestions would be welcome.

By the way, I apologize for not following the GraphNeT contribution guidelines: I opened this PR from my fork's main branch instead of a separate branch, because the guidelines link did not seem to work when I submitted it.

Successfully merging this pull request may close these issues.

Memory leak when using a list of SQLite files in GraphNeTDataModule
2 participants