fix: improvements to known CPE index construction #2801
Previously, when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name. However, this meant that a significant number of known CPEs were never indexed, because the first instance of a name had no supported collection URL while a later one did. This code does not execute at runtime in syft, so de-duplicating the processing for performance is not really necessary here, and processing every entry adds little to the total runtime. Signed-off-by: Weston Steimel <commits@weston.slmail.me>
Previously, the struct definition for CpeItem caused only the last URL reference in the list to be kept and processed for inclusion in the index. Signed-off-by: Weston Steimel <commits@weston.slmail.me>
@@ -3,8 +3,8 @@ package main
 type CpeItem struct {
 	Name string `xml:"name,attr"`
 	Title string `xml:"title"`
-	References []struct {
-		Reference struct {
+	References struct {
@westonsteimel can you help me understand this change?
Why is it more correct to model references as a struct holding a slice of structs than just as a slice of structs?
With the previous struct definition, we only got a single URL (the final one from the list) after unmarshalling. If there is a better way to do this, I'm happy to update it, but the Go XML unmarshalling examples I found all seemed to show that this was the way to make it work.
Previously, when building the known CPE index, there was logic to de-duplicate processing based on the normalized CPE name. However, this meant that a significant number of known CPEs were never indexed, because the first instance of a name had no supported collection URL while a later one did. This code does not execute at runtime in syft, so de-duplicating the processing for performance is not really necessary here, and processing every entry adds little to the total runtime.
There was also a bug in the struct definition that caused only the final reference URL in the list to be unmarshalled and considered when constructing the index.