/rest/v1/project_directory performance #4725

Closed
MichaelAkvo opened this issue Dec 8, 2021 · 1 comment
Labels: Priority: Low, python (Pull requests that update Python code)

MichaelAkvo commented Dec 8, 2021

The endpoint is unbelievably slow and incredibly large.

$ time curl 'http://localhost/rest/v1/project-directory' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 11.6M    0 52749    0     0    113      0 29:56:54  0:07:46 29:49:08 13131

real    7m46.442s
user    0m0.021s

In production, roughly 9k projects are selected, and each of them triggers a bunch of additional requests.

MichaelAkvo changed the title from "/rest/v1/project_directory" to "/rest/v1/project_directory performance" Dec 8, 2021
MichaelAkvo self-assigned this Dec 8, 2021
MichaelAkvo added the Priority: High and python labels Dec 8, 2021
MichaelAkvo added a commit that referenced this issue Dec 9, 2021
The projects don't change often, so we should be able to cache them for a while.
Additionally, every hour we will regenerate their cached values (takes ~7m right now).

So the quick fix is two-fold:

 - cache the entire result and keep for an hour
 - regenerate per project cache every hour

Once the page cache expires, the next request should take ~10-20 seconds; after that, requests are served from the cache until the next hour.

#4725: /rest/v1/project_directory performance
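
The two-level caching described in this commit message could look roughly like the sketch below. It is not the code from the commit: it uses a small in-memory TTL cache from the standard library instead of Django's cache framework, and `TTLCache`, `build_project_entry`, `regenerate_project_cache`, and `project_directory` are hypothetical names chosen for illustration. The two-hour lifetime of the per-project entries is also an assumption, picked so that entries outlive the hourly regeneration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


ONE_HOUR = 60 * 60
page_cache = TTLCache(ttl=ONE_HOUR)         # whole response, kept for an hour
project_cache = TTLCache(ttl=2 * ONE_HOUR)  # assumption: outlives the hourly refresh


def build_project_entry(project_id: int) -> dict:
    # Hypothetical stand-in for the expensive per-project work.
    return {"id": project_id, "title": f"Project {project_id}"}


def regenerate_project_cache(project_ids) -> None:
    """Hourly job: rebuild every per-project entry."""
    for pid in project_ids:
        project_cache.set(f"project:{pid}", build_project_entry(pid))


def project_directory(project_ids) -> list:
    """Serve the full result from the page cache, else assemble it from per-project entries."""
    cached = page_cache.get("project_directory")
    if cached is not None:
        return cached
    result = []
    for pid in project_ids:
        entry = project_cache.get(f"project:{pid}")
        if entry is None:
            # Per-project miss: build the entry on demand and remember it.
            entry = build_project_entry(pid)
            project_cache.set(f"project:{pid}", entry)
        result.append(entry)
    page_cache.set("project_directory", result)
    return result
```

With this layout, a request that misses the page cache only pays for assembling already-cached project entries, which matches the "slow once, then fast until the next hour" behaviour the commit message describes.
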
MichaelAkvo added a commit that referenced this issue Dec 9, 2021
For some reason, reversing a URL takes a whole lot of time.
Even the example for `cached_property` uses `absolute_url`.

#4725: /rest/v1/project_directory performance
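
As an illustration of the idea in this commit message, the sketch below memoizes an expensive URL lookup per object. It uses `functools.cached_property` from the standard library, which behaves much like Django's `django.utils.functional.cached_property`; `slow_reverse` and the `Project` class are hypothetical stand-ins, not the project's actual model or Django's `reverse`.

```python
from functools import cached_property


def slow_reverse(name: str, pk: int) -> str:
    # Hypothetical stand-in for an expensive URL reversal.
    slow_reverse.calls += 1
    return f"/{name}/{pk}/"


slow_reverse.calls = 0  # call counter, only here to show the caching effect


class Project:
    def __init__(self, pk: int):
        self.pk = pk

    @cached_property
    def absolute_url(self) -> str:
        # Computed on first access, then stored on the instance.
        return slow_reverse("project", self.pk)
```

Note that the value is cached per instance, so this only helps when the same object's URL is read more than once.
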
MichaelAkvo added a commit that referenced this issue Dec 9, 2021
Instead of relying on Django to make deferred queries, we make all queries ourselves,
 build the cache, and use it.
The cache is keyed by project ID for faster access.

Additionally, `url` is no longer required, as the UI now builds the URL to the project.

#4725: /rest/v1/project_directory performance
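
The approach in this commit message, running a fixed number of bulk queries and grouping the rows by project ID instead of issuing follow-up queries per project, might be sketched as follows. The example uses an in-memory SQLite database rather than the Django ORM, and the `project` and `location` tables, their columns, and `build_directory` are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE location (project_id INTEGER, country TEXT);
    INSERT INTO project VALUES (1, 'Water'), (2, 'Health');
    INSERT INTO location VALUES (1, 'KE'), (1, 'UG'), (2, 'NL');
    """
)


def build_directory(conn: sqlite3.Connection) -> list:
    # One query for all related rows, grouped in memory by project ID.
    countries_by_project = defaultdict(list)
    rows = conn.execute(
        "SELECT project_id, country FROM location ORDER BY project_id, country"
    )
    for project_id, country in rows:
        countries_by_project[project_id].append(country)

    # One query for the projects; related data comes from the dict, not the database.
    return [
        {"id": pid, "title": title, "countries": countries_by_project.get(pid, [])}
        for pid, title in conn.execute("SELECT id, title FROM project ORDER BY id")
    ]


directory = build_directory(conn)
```

The number of queries stays constant (two here) regardless of how many projects exist, whereas per-project lookups grow linearly with the project count.
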
MichaelAkvo added a commit that referenced this issue Dec 9, 2021
The request takes way too long when the cache is broken.
In production, it is never actually able to fill the cache.

To give us time to work on a solution, this limits the number of projects in the directory from

#4725: /rest/v1/project_directory performance
MichaelAkvo added a commit that referenced this issue Dec 9, 2021
…_dir-to-10

[#4725] workaround: Ultra hacky limitation to speed up project_dir
zuhdil added a commit that referenced this issue Dec 10, 2021
- Add fields that should not be deferred
zuhdil added a commit that referenced this issue Dec 10, 2021
[#4725] Improve project directory performance
zuhdil added a commit that referenced this issue Dec 11, 2021
zuhdil added a commit that referenced this issue Dec 11, 2021
zuhdil added a commit that referenced this issue Dec 11, 2021
[#4725] Project directory performance improvement part 2
@MichaelAkvo

The problem in production seems to have come from `get_thumbnail` never finishing and blocking the entire process.

It might still be possible to improve performance further by using `cache_page` on the endpoint and running the `populate_project_directory_cache` Django command from cron.

A rework of the landing page is coming, so this might not be relevant anymore.

MichaelAkvo moved this to Done in RSR Dec 7, 2022
MichaelAkvo added this to RSR Dec 7, 2022