fix: improvements to known CPE index construction #2801

westonsteimel · 2024-04-23T09:59:47Z

Previously, when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name. As a result, a significant number of known CPEs were never indexed: the first entry with a given name did not have a supported collection URL, but a later one did, and the later one was skipped. This code does not execute at runtime in syft, so de-duplicating the processing for performance is not really necessary, and removing it adds little to the total runtime of the index build.
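To illustrate the failure mode described above, here is a minimal sketch. It is not the actual index generator: `item`, `normalize`, `supported`, and `buildIndex` are hypothetical names, the npm URL rule is an assumed stand-in for "supported collection URL", and the index shape is simplified to a plain map.

```go
package main

import (
	"fmt"
	"strings"
)

// item is a hypothetical, simplified stand-in for one CPE dictionary entry.
type item struct {
	Name string   // full CPE name, including the version
	URLs []string // reference URLs attached to this entry
}

// normalize is a hypothetical normalizer that keeps only vendor and product,
// e.g. "cpe:2.3:a:acme:widget:1.0" -> "acme:widget".
func normalize(name string) string {
	parts := strings.Split(name, ":")
	if len(parts) < 5 {
		return name
	}
	return parts[3] + ":" + parts[4]
}

// npmPrefix is an assumed example of a supported collection URL prefix.
const npmPrefix = "https://www.npmjs.com/package/"

// supported reports whether a URL points at a collection we can index.
func supported(url string) bool {
	return strings.HasPrefix(url, npmPrefix)
}

// buildIndex maps package names (taken from supported URLs) to normalized
// CPE names. With dedupe set, it mimics the old behavior and skips every
// entry whose normalized name has already been seen.
func buildIndex(items []item, dedupe bool) map[string]string {
	index := map[string]string{}
	seen := map[string]bool{}
	for _, it := range items {
		key := normalize(it.Name)
		if dedupe && seen[key] {
			continue // later entries are dropped, even if they carry useful URLs
		}
		seen[key] = true
		for _, u := range it.URLs {
			if supported(u) {
				index[strings.TrimPrefix(u, npmPrefix)] = key
			}
		}
	}
	return index
}

func main() {
	items := []item{
		{Name: "cpe:2.3:a:acme:widget:1.0", URLs: []string{"https://example.com/widget"}},
		{Name: "cpe:2.3:a:acme:widget:2.0", URLs: []string{npmPrefix + "widget"}},
	}
	fmt.Println(len(buildIndex(items, true)))  // old behavior: second entry skipped
	fmt.Println(len(buildIndex(items, false))) // new behavior: second entry indexed
}
```

In this sketch the first `acme:widget` entry has no supported URL, so the de-duplicating variant marks the name as seen and never looks at the second entry, which is the one that would have produced an index record.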

There was also a bug in the struct definition that caused only the final reference URL in the list to be unmarshaled and considered when constructing the index.

Previously, when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name. As a result, a significant number of known CPEs were never indexed: the first entry with a given name did not have a supported collection URL, but a later one did. This code does not execute at runtime in syft, so de-duplicating the processing for performance is not really necessary, and removing it adds little to the total runtime.

Signed-off-by: Weston Steimel <commits@weston.slmail.me>

willmurphyscode · 2024-04-23T13:06:32Z

syft/pkg/cataloger/internal/cpegenerate/dictionary/index-generator/nvd.go

@@ -3,8 +3,8 @@ package main
 type CpeItem struct {
 	Name       string `xml:"name,attr"`
 	Title      string `xml:"title"`
-	References []struct {
-		Reference struct {
+	References struct {


@westonsteimel can you help me understand this change?

Why is it more correct to model references as a struct holding a slice of structs than just as a slice of structs?

westonsteimel · 2024-04-23

With the previous struct definition, we only got a single URL (the final one in the list) after unmarshalling. If there is a better way to do this, I'm happy to update it, but the Go XML unmarshalling examples I found all seemed to show that this is the way to make it work.
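The difference between the two shapes can be reproduced with a small standalone program. This is a sketch under assumptions: the sample XML, the `ref` type, and its `href` and chardata tags are illustrative and not copied from `nvd.go`; only the `Name`, `Title`, and `References` outlines follow the diff above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sample is an assumed, minimal cpe-item with two references.
const sample = `<cpe-item name="cpe:/a:acme:widget:1.0">
  <title>Acme Widget 1.0</title>
  <references>
    <reference href="https://example.com/first">First</reference>
    <reference href="https://example.com/second">Second</reference>
  </references>
</cpe-item>`

// ref is a hypothetical shape for a single <reference> element.
type ref struct {
	Href string `xml:"href,attr"`
	Body string `xml:",chardata"`
}

// oldItem models <references> as a slice holding a single Reference.
// There is only one <references> element, so the slice has one entry,
// and each <reference> child is decoded into the same struct field.
type oldItem struct {
	Name       string `xml:"name,attr"`
	Title      string `xml:"title"`
	References []struct {
		Reference ref `xml:"reference"`
	} `xml:"references"`
}

// newItem models <references> as one struct holding a slice of Reference,
// so every <reference> child is appended.
type newItem struct {
	Name       string `xml:"name,attr"`
	Title      string `xml:"title"`
	References struct {
		Reference []ref `xml:"reference"`
	} `xml:"references"`
}

// oldHrefs decodes with the old shape and collects the hrefs it retained.
func oldHrefs(data string) ([]string, error) {
	var it oldItem
	if err := xml.Unmarshal([]byte(data), &it); err != nil {
		return nil, err
	}
	var out []string
	for _, r := range it.References {
		out = append(out, r.Reference.Href)
	}
	return out, nil
}

// newHrefs decodes with the new shape and collects every href.
func newHrefs(data string) ([]string, error) {
	var it newItem
	if err := xml.Unmarshal([]byte(data), &it); err != nil {
		return nil, err
	}
	var out []string
	for _, r := range it.References.Reference {
		out = append(out, r.Href)
	}
	return out, nil
}

func main() {
	old, _ := oldHrefs(sample)
	cur, _ := newHrefs(sample)
	fmt.Println("old shape:", old)
	fmt.Println("new shape:", cur)
}
```

The slice has to sit on the element that repeats. In this sample `<references>` occurs once and `<reference>` occurs many times, so a slice on the outer field yields one entry whose inner field is overwritten by each successive `<reference>`.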

westonsteimel added the bug label Apr 23, 2024

westonsteimel enabled auto-merge (squash) April 23, 2024 10:11

westonsteimel force-pushed the fix-cpe-indexing branch from c135885 to 508c07d Compare April 23, 2024 10:14

westonsteimel changed the title ~~fix: stop pre-filtering potential known CPE URLs~~ fix: improvements to known CPE index construction Apr 23, 2024

fix: CPE index builder should extract and consider all reference urls

0cf0f56

Previously the struct definition for CpeItem caused only the last URL reference in the list to be kept and processed for inclusion in the index

Signed-off-by: Weston Steimel <commits@weston.slmail.me>

westonsteimel force-pushed the fix-cpe-indexing branch from 064627c to 0cf0f56 Compare April 23, 2024 11:03

westonsteimel disabled auto-merge April 23, 2024 11:27

westonsteimel enabled auto-merge (squash) April 23, 2024 11:27

willmurphyscode reviewed Apr 23, 2024

willmurphyscode approved these changes Apr 23, 2024

westonsteimel merged commit 891e61a into main Apr 23, 2024
11 checks passed

westonsteimel deleted the fix-cpe-indexing branch April 23, 2024 13:28

BrewTestBot mentioned this pull request Apr 26, 2024

syft 1.3.0 Homebrew/homebrew-core#170167

Merged
