
fingerprinted artifact from multiple builds causes exception #430


Closed
grayaii opened this issue Apr 1, 2016 · 5 comments
Labels
feature request stale Issues not active for more than 30 days

Comments

@grayaii

grayaii commented Apr 1, 2016

I have a build that saves an artifact. If I run that build multiple times, the artifact gets saved multiple times. The artifact is identical across all the builds (it has the same md5sum).

If I download the artifact from a build, I get an exception:

Build Done: http://jenkins.clearcare.it:8080/job/unit-test-worker/739 status: FAILURE
Downloading artifacts...
Saving artifact to: /mnt/ope/ws/workspace/unit-test-driver/results_739.txt
Saving artifact to: /mnt/ope/ws/workspace/unit-test-driver/nosetests_739.tar
Traceback (most recent call last):
  File "/mnt/ope/ws/workspace/unit-test-driver/misc-devops/scripts/cco-unit-tester/cco-unit-test-driver.py", line 221, in <module>
    ret_code = driver(test_file=args.test_file, num_of_workers=args.workers, historical_stats=historical_stats)
  File "/mnt/ope/ws/workspace/unit-test-driver/misc-devops/scripts/cco-unit-tester/cco-unit-test-driver.py", line 206, in driver
    v.save(os.path.join(args.artifact_dir, artifact_name))
  File "/usr/local/lib/python2.7/site-packages/jenkinsapi/artifact.py", line 62, in save
    self._verify_download(filepath, strict_validation)
  File "/usr/local/lib/python2.7/site-packages/jenkinsapi/artifact.py", line 97, in _verify_download
    raise ArtifactBroken("Artifact %s seems to be broken, check %s" % (local_md5, baseurl))
jenkinsapi.custom_exceptions.ArtifactBroken: Artifact 1276481102f218c981e0324180bafd9f seems to be broken, check http://jenkins.clearcare.it:8080

import os

from jenkinsapi.jenkins import Jenkins

# Download every artifact of build 739 into the current directory.
J = Jenkins('http://localhos:8080/', username='alex.gray', password='mypassword')
job = J['unit-test-worker']
r = job.get_build(739)
for artifact_name, v in r.get_artifact_dict().items():
    print('Saving artifact to: {0}'.format(os.path.join('.', artifact_name)))
    v.save(os.path.join('.', artifact_name), strict_validation=False)
print('all done')

As you can see, this artifact comes from multiple builds (the above code is downloading from build 739, but notice build 500 is in this list too):
[Screenshot, 2016-03-31 8:54 PM: list of builds using this artifact's fingerprint]

It fails because validate_for_build() in fingerprint.py thinks this artifact comes from build 500, not build 739:

if self._data["fileName"] != filename:
    log.info(
        msg="Filename from jenkins (%s) did not match provided (%s)" %
        (self._data["fileName"], filename))

I think this function needs to be changed to look at ALL builds whose fingerprint matches the artifact.
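One way to read that suggestion, as a minimal sketch: expand the fingerprint's `usage` ranges into the full list of builds that recorded this content, and check membership there instead of comparing `fileName`. The helper `builds_using` is hypothetical (not part of jenkinsapi). The record is an excerpt of the `fingerprint` entry from the depth=2 JSON posted later in this thread. Treating each range as half-open `[start, end)` is an assumption, consistent with results_739.txt (used only by build 739) being recorded as `{"end":740,"start":739}`.

```python
from typing import Any, Dict, List


def builds_using(fp_data: Dict[str, Any], job: str) -> List[int]:
    """List every build of `job` that Jenkins records as using this fingerprint.

    Hypothetical helper. Assumption: usage ranges are half-open, so
    {"start": 739, "end": 740} covers build 739 only.
    """
    builds = set()
    for usage in fp_data.get("usage", []):
        if usage["name"] == job:
            for rng in usage["ranges"]["ranges"]:
                builds.update(range(rng["start"], rng["end"]))
    return sorted(builds)


# Excerpt of the nosetests fingerprint record posted later in this thread.
fingerprint = {
    "fileName": "nosetests_500.tar",
    "original": {"name": "unit-test-worker", "number": 500},
    "usage": [{"name": "unit-test-worker", "ranges": {"ranges": [
        {"end": 501, "start": 500}, {"end": 540, "start": 506},
        {"end": 599, "start": 579}, {"end": 740, "start": 739},
        {"end": 752, "start": 751}, {"end": 769, "start": 768},
    ]}}],
}

builds = builds_using(fingerprint, "unit-test-worker")
print(739 in builds)            # True, even though fileName names build 500
print(builds[:3], builds[-3:])  # [500, 506, 507] [739, 751, 768]
```

With this view, build 739 is accepted because it appears in the usage ranges, regardless of which build's name was recorded first.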

@grayaii grayaii changed the title fingerprinted artifacted from multiple builds causes exception fingerprinted artifact from multiple builds causes exception Apr 1, 2016
@lechat
Collaborator

lechat commented Apr 1, 2016

Could you post the JSON from build 739 (http://jenkins.clearcare.it:8080/unit-test-worker/739/api) and from http://jenkins.clearcare.it:8080/unit-test-worker/739/fingerprint/ here?

I guess the problem here is caused by this code:

        if not self._data["original"] is None:
            if self._data["original"]["name"] == job:
                if self._data["original"]["number"] == build:
                    return True

It looks like Jenkins stores only one version of the artefact per MD5 fingerprint, and artefacts with the same fingerprint generated by later builds are not stored again. So the file name of the artefact will be nosetests_500.tar in all builds for as long as the fingerprint stays the same.

I think the easiest way to solve this is to not put the build number into the artefact's name...
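To illustrate the keying lechat describes: the fingerprint record is looked up purely by the file's MD5 digest (jenkinsapi requests `/fingerprint/<md5>/api/python`, as the log later in this thread shows), so files with identical bytes but different names map to the same record. This sketch uses only the standard library; `md5_of` and `fingerprint_url` are hypothetical helpers, the Jenkins base URL is a placeholder, and the `api/json` suffix is assumed to be the JSON counterpart of that endpoint.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_url(base_url: str, path: str) -> str:
    """Hypothetical helper: the URL depends on file content, not file name."""
    return "%s/fingerprint/%s/api/json" % (base_url.rstrip("/"), md5_of(path))


base = "http://jenkins.example:8080/"  # placeholder Jenkins URL
with tempfile.TemporaryDirectory() as tmp:
    # The same bytes archived under two build-specific names.
    for name in ("nosetests_500.tar", "nosetests_739.tar"):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(b"identical test results")
    url_500 = fingerprint_url(base, os.path.join(tmp, "nosetests_500.tar"))
    url_739 = fingerprint_url(base, os.path.join(tmp, "nosetests_739.tar"))

print(url_500 == url_739)  # True: both names resolve to one fingerprint record
```

Because both names resolve to the same record, whatever `fileName` that record holds can only match one of them.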

@grayaii
Author

grayaii commented Apr 1, 2016

Here is the JSON from unit-test-worker/739/api/json:

{"actions":[{"parameters":[{"name":"GIT_HASH","value":"e6410e96be176bda6a012fd5cdab18b1771f9eec"},{"name":"UPSTREAM_BUILD_NUMBER","value":"114"},{"name":"TEST_LIST","value":"python manage.py test receivables.tests.view_tests,python manage.py test twilio_app,python manage.py test geocode,python manage.py test logit,python manage.py test accounting_exports.tests.helpers"}]},{"causes":[{"shortDescription":"Started by user sre","userId":"sre@clearcareonline.com","userName":"sre"}]},{},{"buildsByBranchName":{"detached":{"buildNumber":739,"buildResult":null,"marked":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]},"revision":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]}}},"lastBuiltRevision":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]},"remoteUrls":["git@github.com:clearcare/clearcare.git"],"scmName":""},{},{},{},{"parameters":[{"name":"DESCRIPTION_SETTER_DESCRIPTION","value":"From Driver: 114"}]},{},{},{}],"artifacts":[{"displayPath":"nosetests_739.tar","fileName":"nosetests_739.tar","relativePath":"nosetests_739.tar"},{"displayPath":"results_739.txt","fileName":"results_739.txt","relativePath":"results_739.txt"}],"building":false,"description":"From Driver: 114","displayName":"#739","duration":598461,"estimatedDuration":572127,"executor":null,"fullDisplayName":"unit-test-worker #739","id":"739","keepLog":false,"number":739,"queueId":29640,"result":"FAILURE","timestamp":1459465315675,"url":"http://jenkins.clearcare.it:8080/job/unit-test-worker/739/","builtOn":"eod-us-west-2_m3.large-172.31.3.106-29b0adb6","changeSet":{"items":[],"kind":"git"},"culprits":[]}

And here is the FULL JSON from unit-test-worker/739/api/xml?depth=2:

{"actions":[{"parameters":[{"name":"GIT_HASH","value":"e6410e96be176bda6a012fd5cdab18b1771f9eec"},{"name":"UPSTREAM_BUILD_NUMBER","value":"114"},{"name":"TEST_LIST","value":"python manage.py test receivables.tests.view_tests,python manage.py test twilio_app,python manage.py test geocode,python manage.py test logit,python manage.py test accounting_exports.tests.helpers"}]},{"causes":[{"shortDescription":"Started by user sre","userId":"sre@clearcareonline.com","userName":"sre"}]},{},{"buildsByBranchName":{"detached":{"buildNumber":739,"buildResult":null,"marked":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]},"revision":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]}}},"lastBuiltRevision":{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","branch":[{"SHA1":"e6410e96be176bda6a012fd5cdab18b1771f9eec","name":"detached"}]},"remoteUrls":["git@github.com:clearcare/clearcare.git"],"scmName":""},{"tags":[]},{},{},{"parameters":[{"name":"DESCRIPTION_SETTER_DESCRIPTION","value":"From Driver: 114"}]},{},{},{}],"artifacts":[{"displayPath":"nosetests_739.tar","fileName":"nosetests_739.tar","relativePath":"nosetests_739.tar"},{"displayPath":"results_739.txt","fileName":"results_739.txt","relativePath":"results_739.txt"}],"building":false,"description":"From Driver: 114","displayName":"#739","duration":598461,"estimatedDuration":572127,"executor":null,"fullDisplayName":"unit-test-worker #739","id":"739","keepLog":false,"number":739,"queueId":29640,"result":"FAILURE","timestamp":1459465315675,"url":"http://jenkins.clearcare.it:8080/job/unit-test-worker/739/","builtOn":"eod-us-west-2_m3.large-172.31.3.106-29b0adb6","changeSet":{"items":[],"kind":"git"},"culprits":[],"fingerprint":[{"fileName":"nosetests_500.tar","hash":"1276481102f218c981e0324180bafd9f","original":{"name":"unit-test-worker","number":500},"timestamp":1459348516686,"usage":[{"name":"unit-test-worker","ranges":{"ranges":[{"end":501,"start":500},{"end":540,"start":506},{"end":599,"start":579},{"end":740,"start":739},{"end":752,"start":751},{"end":769,"start":768}]}}]},{"fileName":"results_739.txt","hash":"b22363c4cae1f2a88fb9ae24521cb2a5","original":{"name":"unit-test-worker","number":739},"timestamp":1459465913782,"usage":[{"name":"unit-test-worker","ranges":{"ranges":[{"end":740,"start":739}]}}]}]}

Correct: if the artifacts are given the same name in every build, the problem goes away, but it would be nice not to have this requirement.

Hope this helps, and thanks for looking into this!

@stale

stale bot commented Oct 18, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Issues not active for more than 30 days label Oct 18, 2019
@stale

stale bot commented Oct 25, 2019

Closed due to inactivity

@stale stale bot closed this as completed Oct 25, 2019
@plastikos

plastikos commented Dec 12, 2023

This is still a problem:

INFO:jenkinsapi.artifact:Saving artifact @ https://example.org:8443/job/i40e-dkms/job/br-update_build/13/artifact/i40e-dkms/artifacts/ubuntu_16_04/i40e_2.19.3%2Blex.3784.orig.tar.gz to /home/plastikos/dev/gio/distro_support/i40e-dkms-bin.git/fetch/i40e-dkms/ubuntu_16_04/i40e_2.19.3+lex.3784.orig.tar.gz
DEBUG:urllib3.connectionpool:https://example.org:8443 "GET /fingerprint/3bad63620e58d3d692220726597f4a9f/api/python HTTP/1.1" 200 226
INFO:jenkinsapi.fingerprint:Filename from jenkins (i40e_2.19.3.dev.1.orig.tar.gz) did not match provided (i40e_2.19.3+lex.3784.orig.tar.gz)
WARNING:jenkinsapi.artifact:Jenkins artifact could not be identified.
DEBUG:urllib3.connectionpool:https://example.org:8443 "GET /job/i40e-dkms/job/br-update_build/13/artifact/i40e-dkms/artifacts/ubuntu_16_04/i40e_2.19.3%2Blex.3784.orig.tar.gz HTTP/1.1" 200 686306
DEBUG:urllib3.connectionpool:https://example.org:8443 "GET /fingerprint/3bad63620e58d3d692220726597f4a9f/api/python HTTP/1.1" 200 226
INFO:jenkinsapi.fingerprint:Filename from jenkins (i40e_2.19.3.dev.1.orig.tar.gz) did not match provided (i40e_2.19.3+lex.3784.orig.tar.gz)
Traceback (most recent call last):
  File "/home/plastikos/bin/fetch", line 1825, in <module>
    sys.exit(main(sys.argv))
             ^^^^^^^^^^^^^^
  File "/home/plastikos/bin/fetch", line 1817, in main
    if not fetch_items(fetch_info, opts):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/plastikos/bin/fetch", line 1786, in fetch_items
    fetch_item(finfo, opts)
  File "/home/plastikos/bin/fetch", line 1735, in fetch_item
    ENGINES[finfo["engine"]](finfo, opts)
  File "/home/plastikos/bin/fetch", line 1669, in fetch_jenkins
    art.save(write_path, finfo.get("strict_validation", True))
  File "/usr/lib/python3/dist-packages/jenkinsapi/artifact.py", line 67, in save
    self._verify_download(filepath, strict_validation)
  File "/usr/lib/python3/dist-packages/jenkinsapi/artifact.py", line 106, in _verify_download
    raise ArtifactBroken(
jenkinsapi.custom_exceptions.ArtifactBroken: Artifact 3bad63620e58d3d692220726597f4a9f seems to be broken, check https://example.org:8443

If Jenkins stores files by hash, and an artifact's fileName is the name it was given when first stored (so it will not match later stores of the same content under a different filename, which Jenkins treats as valid), then I don't see why comparing the fileName in Fingerprint.validate_for_build() is an appropriate way to validate an artifact. Jenkins may not store any additional artifacts with the same hash, but it does record the alternate names used by each build. Telling users not to store the same artifact under alternate names, just to avoid the bug in Fingerprint.validate_for_build(), ignores the way Jenkins stores artifacts while still tracking the alternate names for the same artifact.

It seems it would be better if Fingerprint.validate_for_build() compared the hash, which is already computed for the local file and can be compared to the Jenkins hash ID:

--- jenkinsapi/fingerprint.py.orig      2023-12-12 01:57:10.094022986 -0700
+++ jenkinsapi/fingerprint.py   2023-12-12 02:09:52.028085027 -0700
@@ -79,10 +79,10 @@
             if self._data["original"]["name"] == job:
                 if self._data["original"]["number"] == build:
                     return True
-        if self._data["fileName"] != filename:
+        if self._data["hash"] != self.id_:
             log.info(
-                msg="Filename from jenkins (%s) did not match provided (%s)"
-                % (self._data["fileName"], filename)
+                msg="File hash from Jenkins (%s) did not match local hash (%s)"
+                % (self._data["hash"], self.id_)
             )
             return False
         for usage_item in self._data["usage"]:

With this change the filename parameter to Fingerprint.validate_for_build() is unused and could be removed (although that would change the interface and should be avoided).
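A standalone sketch of the validation order this patch implies: first compare the MD5 of the downloaded bytes with the `hash` recorded by Jenkins, then check `original` and `usage` to confirm the requested build actually used that content, never consulting `fileName`. `validate_download` is a hypothetical function (not jenkinsapi's API), the fingerprint record is passed in as a dict rather than fetched from Jenkins, the record itself is made up for illustration, and half-open usage ranges are an assumption.

```python
import hashlib
from typing import Any, Dict


def validate_download(data: bytes, fp_data: Dict[str, Any], job: str, build: int) -> bool:
    """Hypothetical hash-based check; the recorded fileName is never consulted."""
    # 1. Content: the MD5 of the downloaded bytes must equal Jenkins' hash.
    if hashlib.md5(data).hexdigest() != fp_data["hash"]:
        return False
    # 2. Provenance: the build must be the original producer or a recorded user.
    original = fp_data.get("original")
    if original is not None and original["name"] == job and original["number"] == build:
        return True
    for usage in fp_data.get("usage", []):
        if usage["name"] != job:
            continue
        for rng in usage["ranges"]["ranges"]:
            # Assumption: usage ranges are half-open, [start, end).
            if rng["start"] <= build < rng["end"]:
                return True
    return False


# Made-up record: content first archived by build 1 under one name,
# archived again (same bytes, different name) by build 13.
payload = b"same bytes, different file name"
record = {
    "fileName": "first_name.orig.tar.gz",
    "hash": hashlib.md5(payload).hexdigest(),
    "original": {"name": "example-job", "number": 1},
    "usage": [{"name": "example-job", "ranges": {"ranges": [
        {"start": 1, "end": 2}, {"start": 13, "end": 14},
    ]}}],
}

print(validate_download(payload, record, "example-job", 13))       # True
print(validate_download(b"truncated", record, "example-job", 13))  # False
print(validate_download(payload, record, "example-job", 7))        # False
```

A corrupted download fails on the hash, and a build that never used the content fails on provenance, while a second filename for the same bytes no longer matters.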

plastikos added a commit to plastikos/jenkinsapi that referenced this issue Dec 12, 2023
…an "fileName".

Jenkins only stores one version of an artifact by its hash and by the
original filename.  Subsequent stores of an artifact with an identical
hash but different filename will point to the original artifact.  If a
duplicate artifact (identical hash) has a different filename than the
original filename then that new filename will be stored as a name in
the build artifacts but will not change the fileName of the original
artifact.  This makes it problematic to compare against the original
fileName when validating a build artifact that has been saved.

Since the md5sum hash is computed for the local, saved artifact it can
be compared against the Jenkins artifact ID (viz hash) for validation.
This avoids the problem of identical artifacts having additional
filenames.  Using the hash is also a better way of validating data
integrity rather than using the fileName even when the filename
matches.
plastikos added a commit to plastikos/jenkinsapi that referenced this issue Dec 12, 2023
…fileName"
