Reduce the number of fetches when harvesting one package #464

Closed
qtomlinson opened this issue Feb 17, 2022 · 3 comments

qtomlinson commented Feb 17, 2022

While harvesting a single package, the fetch from the package repository is triggered multiple times. The fetch is currently dispatched once per processor (e.g. licensee or scancode). A package is currently processed by 3 processors, resulting in 3 round trips to the repository. Presumably, only one fetch from the repository is necessary.

Sample log from processing a maven package: 3 maven fetches were dispatched.

[V] Finished getRequest (6ms) package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started fetching package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"14","root":"self","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished fetching (2ms) package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"14","root":"self","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started processing package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"14","root":"self","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished processing (3ms) package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"14","root":"self","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[I] Traversed package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7  {"loopName":"0","cid":"14","root":"self","outcome":"Traversed","time":12,"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started getRequest  {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished getRequest (0ms) maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started fetching maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"15","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---Start Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:43.920Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started getRequest  {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished getRequest (1ms) _blank@null:mM {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[I] Drained   _blank@null Waiting 5000 milliseconds {"loopName":"1","cid":"16","root":"self","outcome":"Drained","time":0,"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Crawler: 1 waiting for 5000ms {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---End Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:45.433Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished fetching (1515ms) maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"15","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started processing maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"15","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished processing (363ms) maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"15","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[I] Processed maven@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7  {"loopName":"0","cid":"15","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","k":38,"count":5,"write":2,"outcome":"Processed","time":1881,"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started getRequest  {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished getRequest (0ms) licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started fetching licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"17","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---Start Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:45.802Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---End Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:46.569Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished fetching (768ms) licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"17","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started processing licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"17","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished processing (224ms) licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"17","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[I] Processed licensee@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7  {"loopName":"0","cid":"17","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","write":1,"outcome":"Processed","time":994,"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started getRequest  {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished getRequest (0ms) scancode@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started fetching scancode@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"18","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---Start Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:46.798Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] ---End Maven Fetch for cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7, 2022-02-17T18:06:47.559Z {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Finished fetching (762ms) scancode@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"18","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started processing scancode@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7:mM {"loopName":"0","cid":"18","root":"package@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7","crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[I] Analyzing scancode@cd:/maven/gradleplugin/io.github.lognet/grpc-spring-boot-starter-gradle-plugin/4.5.7 using ScanCode. input: /tmp/cd-ZQw04n output: /tmp/cd-OGA0F7 {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
[V] Started getRequest  {"crawlerId":"979f84c6-3adc-4cd8-97e6-cb5692a96377","buildNumber":"0"}
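
To check how many repository round trips a harvest incurred, the `---Start Maven Fetch for` markers in a log like the one above can be counted per coordinates. The following TypeScript sketch assumes only the marker format visible in the sample log; the function name `countFetchStarts` and its return shape are hypothetical and not part of the crawler.

```typescript
// Counts "---Start Maven Fetch for <coordinates>," markers per coordinates.
// The marker format is copied from the sample log; the function itself is
// an illustrative helper, not part of the crawler.
function countFetchStarts(log: string): Map<string, number> {
  const counts = new Map<string, number>();
  const marker = /---Start Maven Fetch for (\S+),/;
  for (const line of log.split("\n")) {
    const coordinates = marker.exec(line)?.[1];
    if (coordinates === undefined) continue; // not a fetch-start line
    counts.set(coordinates, (counts.get(coordinates) ?? 0) + 1);
  }
  return counts;
}
```

Applied to the sample log, this yields a count of 3 for the single set of coordinates, matching the three dispatched Maven fetches.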

qtomlinson changed the title from "Fetch triggered multiple times when harvesting one package" to "Reduce the number of round trips to repository when harvesting one package" on Feb 22, 2022
qtomlinson changed the title from "Reduce the number of round trips to repository when harvesting one package" to "Reduce the number of fetches when harvesting one package" on Mar 1, 2022

qtomlinson commented Mar 1, 2022

For example, when harvesting a maven central component, the following fetches were incurred:

  • 3 fetches to the maven central repository for the specific artifact (jar file)
  • 3 fetches to the corresponding git repository for source files (or to the maven central repository for the source archive)

Of the six fetches, only two are necessary: one to the maven central repository and one to the git source repository.

qtomlinson commented Mar 1, 2022

Work involved:

  • Add FetchResult and caching mechanism in Dispatcher
  • Adapt MavenBasedFetch for maven google, maven central and gradle plugin
  • Adapt GitCloner for gitLab and gitHub
  • Adapt PypiFetch
  • Adapt NpmjsFetch
  • Adapt RubyGemFetch
  • Adapt PackagistFetch (php/composer)
  • Adapt DebianFetch
  • Adapt CratesioFetch (rust)
  • Adapt GoFetch
  • Adapt NugetFetch
  • Adapt PodFetch
  • Add unit tests
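
As a rough illustration of the first work item, the TypeScript sketch below shows one way a fetch result could be cached so that several processors share a single fetch per location. It is not the crawler's implementation: the `FetchResult` fields, the `Fetcher` type, and `withFetchCache` are all hypothetical names chosen for this example, and cache eviction and cleanup of fetched files are deliberately left out.

```typescript
// Hypothetical shape of a fetch result; the field names are illustrative only.
interface FetchResult {
  url: string;       // location that was fetched
  localPath: string; // where the fetched content was stored
}

// Hypothetical fetcher signature: one call means one round trip.
type Fetcher = (url: string) => Promise<FetchResult>;

// Wraps a fetcher so that repeated requests for the same URL share one
// underlying fetch. The promise is cached (not the resolved value), so a
// request arriving while the first fetch is still in flight also reuses it.
function withFetchCache(fetcher: Fetcher): Fetcher {
  const cache = new Map<string, Promise<FetchResult>>();
  return (url) => {
    let pending = cache.get(url);
    if (pending === undefined) {
      pending = fetcher(url).catch((error) => {
        cache.delete(url); // failures are not cached; a later request may retry
        throw error;
      });
      cache.set(url, pending);
    }
    return pending;
  };
}
```

With such a wrapper, three processors each requesting the artifact and the source location would issue six requests but trigger only two underlying fetches, which is the reduction described in this issue.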
