🐛 [TestRemoval/TestRepair] - 211, 215- include status code in mock response #33

Closed
dmelcer9 opened this issue Jul 26, 2024 · 3 comments · Fixed by #49
Labels
bug Something isn't working

Comments

@dmelcer9

EvalPlus version

v0_1_0_hf

Output of running ls ~/.cache/bigcodebench

BigCodeBench-v0.1.0_hf.jsonl

Task ID of the programming task

BigCodeBench/211, BigCodeBench/215, probably some others as well

The original test

(All tests)
mock_response = MagicMock() 
mock_response.content = MOCK_CONTENT 
mock_requests_get.return_value = mock_response

Your proposed new test

mock_response = MagicMock() 
mock_response.content = MOCK_CONTENT 
mock_response.status_code = 200
mock_requests_get.return_value = mock_response

Description

The LLM sometimes (reasonably!) generates code like:

    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None

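(The rest of the code solves the task correctly.)

But this fails the test: status_code is never set on the mock, so it is an auto-created MagicMock that does not equal 200, and the function returns early.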
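A minimal sketch of the failure, using only unittest.mock. The helper fetch_content, the placeholder MOCK_CONTENT value, and the example URL are hypothetical stand-ins for illustration and are not taken from the benchmark tasks; the helper simply mirrors the status-code check quoted above.

```python
from unittest.mock import MagicMock

MOCK_CONTENT = b"mock content"  # placeholder payload, assumed for illustration


def fetch_content(http_get, url):
    """Hypothetical model-generated helper that checks the status code."""
    r = http_get(url)
    if r.status_code != 200:
        print("Error: Failed to download file from URL.")
        return None
    return r.content


# Original mock: status_code is never set, so accessing it yields
# an auto-created MagicMock, and the comparison `!= 200` is true.
original = MagicMock()
original.content = MOCK_CONTENT
result_original = fetch_content(MagicMock(return_value=original),
                                "https://example.com/file")

# Proposed mock: status_code is set explicitly, so the check passes.
proposed = MagicMock()
proposed.content = MOCK_CONTENT
proposed.status_code = 200
result_proposed = fetch_content(MagicMock(return_value=proposed),
                                "https://example.com/file")
```

With the original mock the helper returns None before reaching the task logic; with the proposed mock it returns the mocked content.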

Other context

No response

@dmelcer9 dmelcer9 added the bug Something isn't working label Jul 26, 2024
@terryyz
Collaborator

terryyz commented Jul 26, 2024

Thanks @dmelcer9! It makes sense :) We didn't think about this when developing the initial tasks. We will incorporate this change in the next dataset release.

@hvaara
Contributor

hvaara commented Sep 14, 2024

@dmelcer9 which model did you use? I'd like to verify resolution in #49.

@dmelcer9
Author

Not 100% sure, but I believe this was with Starcoder2-15b; the temperature was somewhere between 0.7 and 1.
