RFC: Introduce pipeline templates #996

pombredanne · 2023-11-02T22:44:04Z

Context and problem
The recent experiment with the d2d pipeline shows that we face pipeline growth issues:

- The pipeline grew big, with many interdependent steps.
- It will be hard to add support for new tech stacks beyond Java and JS in such a long pipeline.

Separately, some users are also confused about which of the many pipelines to use (for instance, should they use "scan codebase", "scan package", or "scan codebase package"?), while others run all the pipelines just to be "safe".

Therefore, we need to think about how to make the architecture more modular and easier to extend, and also how to present the pipelines in a way that is easier for users to understand.

Solution elements

- To make it easier for end users, we should provide fewer pipelines with better documentation and guidance.
- To make pipelines more modular, we should provide more, smaller pipelines that we can easily compose.

The two approaches seem to conflict.

A simple initial implementation, which could be called "analysis templates", may reconcile both:

- Each template would have a name, a description/help text, and a simple list of pipelines.
- The UI would let users select a template. Selecting a template would copy its pipelines to the project, and nothing else.
- The rest of the processing would be unchanged.
- This could start as a simple hardcoded list and later evolve into a list that is configurable in the Web UI, stored in the DB, and provided with defaults.
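A minimal sketch of the hardcoded first version, assuming a template is a plain record and that applying it only appends pipeline names to a project, using the "Java web app" example from this RFC. `AnalysisTemplate`, `TEMPLATES`, `Project`, `get_template`, and `apply_template` are hypothetical names for illustration, not existing ScanCode.io APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnalysisTemplate:
    """Hypothetical template record: a name, help text, and ordered pipelines."""

    name: str
    description: str
    pipelines: tuple[str, ...]


@dataclass
class Project:
    """Hypothetical stand-in for a project; only tracks the pipelines to run."""

    name: str
    pipelines: list[str] = field(default_factory=list)


# Hardcoded for a first iteration; this could later move to the DB with defaults.
TEMPLATES: tuple[AnalysisTemplate, ...] = (
    AnalysisTemplate(
        name="Java web app",
        description=(
            "Analyze a Java/JS web application source and binaries "
            "for package origins and licenses"
        ),
        pipelines=(
            "Extract inputs and create resources, apply ignores and ABOUT files",
            "Map deployed Java binary files to their sources, scan deployed sources",
            "Map deployed JavaScript files to their sources, scan deployed sources",
            "Match code to the PurlDB",
            "Lookup curated package data in reference DB",
            "Perform housekeeping and tag problematic resources and packages",
        ),
    ),
)


def get_template(name: str) -> AnalysisTemplate:
    """Return the template with this name, or raise KeyError."""
    for template in TEMPLATES:
        if template.name == name:
            return template
    raise KeyError(f"Unknown analysis template: {name!r}")


def apply_template(project: Project, template_name: str) -> None:
    """Copy the template's pipelines to the project, in order. Nothing else."""
    project.pipelines.extend(get_template(template_name).pipelines)


project = Project(name="my-web-app")
apply_template(project, "Java web app")
print(len(project.pipelines))  # 6
```

Because the template only copies names, the existing per-project pipeline execution stays untouched, which matches the "nothing else" constraint above.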

An example of such an analysis template could be:

"Java web app": analyze the sources and binaries of a Java/JS web application for package origins and licenses
With these pipelines:
- "Extract inputs and create resources, apply ignores and ABOUT files"
- "Map deployed Java binary files to their sources, scan deployed sources"
- "Map deployed JavaScript files to their sources, scan deployed sources"
- "Match code to the PurlDB"
- "Lookup curated package data in reference DB"
- "Perform housekeeping and tag problematic resources and packages"

Here the pipelines would be new, smaller, modular, and composable pipelines, either extracted from existing ones or created from scratch.

Note that in the future, some pipelines could run in parallel, for example the map and match pipelines here, which are independent of each other.
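A sketch of how independent pipelines could run concurrently, assuming a template is organized into ordered stages where the pipelines inside one stage do not depend on each other. Pipelines are modeled as plain callables sharing a context dict; `run_stages` and the stub pipelines are hypothetical, and the concurrency comes from the standard library `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical model: a pipeline is a callable taking a shared context dict.
Pipeline = Callable[[dict], None]


def run_stages(stages: Sequence[Sequence[Pipeline]], context: dict) -> None:
    """Run stages one after another; pipelines within a stage run concurrently.

    Assumes the pipelines inside a stage do not depend on each other.
    """
    for stage in stages:
        with ThreadPoolExecutor(max_workers=max(1, len(stage))) as executor:
            futures = [executor.submit(pipeline, context) for pipeline in stage]
            for future in futures:
                # result() re-raises a pipeline's exception, so later stages never start.
                future.result()


# Hypothetical stubs standing in for a reduced "Java web app" template.
def extract_inputs(ctx: dict) -> None:
    ctx["resources"] = ["app.jar", "app.js"]


def map_java(ctx: dict) -> None:
    ctx["java_mapped"] = [r for r in ctx["resources"] if r.endswith(".jar")]


def map_javascript(ctx: dict) -> None:
    ctx["js_mapped"] = [r for r in ctx["resources"] if r.endswith(".js")]


def match_purldb(ctx: dict) -> None:
    ctx["match_candidates"] = len(ctx["resources"])


def housekeeping(ctx: dict) -> None:
    ctx["done"] = True


stages = [
    [extract_inputs],
    [map_java, map_javascript, match_purldb],  # independent: may run in parallel
    [housekeeping],
]
context: dict = {}
run_stages(stages, context)
```

Threads fit I/O-bound work such as DB or network lookups; CPU-heavy scanning would likely call for processes or separate workers instead.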

Another approach could be to provide ways to compose steps in the UI, rather than only in code as we do today, and leave pipelines as they are, just making them easier to create. But this looks much more involved, as it would amount to some form of metaprogramming.
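To give a sense of what UI-level composition might involve, here is a minimal sketch assuming steps are registered in code by name and the UI stores only an ordered list of step names. `STEP_REGISTRY`, `step`, `build_pipeline`, and the two example steps are hypothetical. Even this simple version needs name validation, and it does not handle dependencies between steps, which hints at why this approach looks more involved.

```python
from typing import Callable

Step = Callable[[dict], None]

# Hypothetical registry mapping a step name (as stored by the UI) to code.
STEP_REGISTRY: dict[str, Step] = {}


def step(name: str) -> Callable[[Step], Step]:
    """Decorator registering a function as a composable step under a name."""

    def register(func: Step) -> Step:
        if name in STEP_REGISTRY:
            raise ValueError(f"Duplicate step name: {name!r}")
        STEP_REGISTRY[name] = func
        return func

    return register


@step("collect_resources")
def collect_resources(ctx: dict) -> None:
    ctx.setdefault("log", []).append("collect_resources")


@step("scan_for_packages")
def scan_for_packages(ctx: dict) -> None:
    ctx.setdefault("log", []).append("scan_for_packages")


def build_pipeline(step_names: list[str]) -> Step:
    """Resolve UI-provided step names into one callable running them in order."""
    unknown = [name for name in step_names if name not in STEP_REGISTRY]
    if unknown:
        raise ValueError(f"Unknown steps: {unknown}")
    steps = [STEP_REGISTRY[name] for name in step_names]

    def run(ctx: dict) -> None:
        for current_step in steps:
            current_step(ctx)

    return run


pipeline = build_pipeline(["collect_resources", "scan_for_packages"])
context: dict = {}
pipeline(context)
print(context["log"])  # ['collect_resources', 'scan_for_packages']
```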

Another example of such an analysis template could be:

"Package vulnerabilities": scan code for packages and look up their vulnerabilities
With these pipelines:
- "Extract inputs and create resources, apply ignores and ABOUT files"
- "Scan code for Package-URLs (ignoring licensing)"
- "Lookup Package-URLs in VulnerableCode for vulnerabilities"
- "Perform housekeeping and tag problematic packages"

Here some of the pipelines are reused, and some are new.
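Since templates share pipelines, a small helper could report which pipelines of a template already exist and which would still need to be written. This is a sketch with hypothetical names; the pipeline names are copied from the two example templates in this RFC, and treating the "Java web app" pipelines as already implemented is an assumption for illustration only.

```python
# Pipeline names copied from the two example templates in this RFC.
JAVA_WEB_APP = [
    "Extract inputs and create resources, apply ignores and ABOUT files",
    "Map deployed Java binary files to their sources, scan deployed sources",
    "Map deployed JavaScript files to their sources, scan deployed sources",
    "Match code to the PurlDB",
    "Lookup curated package data in reference DB",
    "Perform housekeeping and tag problematic resources and packages",
]
PACKAGE_VULNERABILITIES = [
    "Extract inputs and create resources, apply ignores and ABOUT files",
    "Scan code for Package-URLs (ignoring licensing)",
    "Lookup Package-URLs in VulnerableCode for vulnerabilities",
    "Perform housekeeping and tag problematic packages",
]


def split_reused_and_new(
    pipelines: list[str], existing: set[str]
) -> tuple[list[str], list[str]]:
    """Partition a template's pipelines into already-available and still-to-write."""
    reused = [p for p in pipelines if p in existing]
    new = [p for p in pipelines if p not in existing]
    return reused, new


# Assumption for illustration: the "Java web app" pipelines already exist.
reused, new = split_reused_and_new(PACKAGE_VULNERABILITIES, set(JAVA_WEB_APP))
print(len(reused), len(new))  # 1 3
```

Exact-name matching counts the two housekeeping variants as distinct pipelines; whether they should instead be one configurable pipeline is left open here.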

DennisClark · 2023-11-03T16:24:10Z

Thanks for providing these useful details, @pombredanne.
I think that your "simple initial implementation of analysis templates" makes a lot of sense and is a good idea.

pombredanne mentioned this issue on Jan 4, 2024 in "RFC: The future of pipelines #1040" (closed)

pombredanne added this to the v34.0.0 milestone Jan 11, 2024

tdruez removed this from the v34.0.0 milestone Mar 18, 2024
