
[Bug]: LLMRerank Default Prompt will fail and cause an error if an unrelated line containing a comma is included in the answer #17352

Closed
okirmis opened this issue Dec 22, 2024 · 1 comment · Fixed by #17353
Labels: bug (Something isn't working), triage (Issue needs to be triaged/prioritized)

Comments


okirmis (Contributor) commented Dec 22, 2024

Bug Description

Similar to #11092, an IndexError is still thrown when the LLM's answer in LLMRerank contains at least one unrelated line that also contains a comma. In my case the answer was:

Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract. 

While the first two lines and the third (empty) line are processed correctly, the fourth line throws an error in default_parse_choice_select_answer_fn in the following code, because line_tokens[1] does not contain a colon:

        _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
        answer_relevances.append(float(_answer_relevance))
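For illustration, the following minimal snippet (my own, not taken from the library) reconstructs how that fourth line tokenizes and why the index lookup fails:

line = "The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."
line_tokens = line.split(",")      # exactly two tokens, so the len(line_tokens) != 2 check passes
answer_num = int(line_tokens[0].split(":")[1].strip())  # parses fine: 5
parts = line_tokens[1].split(":")  # [' as it directly answers the question about the contract.']
parts[1]                           # IndexError: list index out of range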

Version

0.12.8

Steps to Reproduce

The issue can be reproduced using the example code at https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/LLMReranker-Gatsby/, but I cannot share the exact input data.

However, manually calling default_parse_choice_select_answer_fn with the LLM answer mentioned above throws an IndexError, e.g.:

answer = """Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."""
default_parse_choice_select_answer_fn(answer, 5)

Relevant Logs/Tracebacks

Traceback (most recent call last):
  File "<REDACTED>/app.py", line 67, in <module>
    response = query_engine.query(query)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/base/base_query_engine.py", line 52, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 178, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 134, in retrieve
    return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 127, in _apply_node_postprocessors
    nodes = node_postprocessor.postprocess_nodes(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/postprocessor/types.py", line 54, in postprocess_nodes
    return self._postprocess_nodes(nodes, query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 321, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/postprocessor/llm_rerank.py", line 96, in _postprocess_nodes
    raw_choices, relevances = self._parse_choice_select_answer_fn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.miniconda3/lib/python3.12/site-packages/llama_index/core/indices/utils.py", line 119, in default_parse_choice_select_answer_fn
    _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
                                           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
okirmis added the bug (Something isn't working) and triage (Issue needs to be triaged/prioritized) labels on Dec 22, 2024
dosubot bot commented Dec 22, 2024

Hey there, @okirmis! I'm here to help you with any bugs, questions, or contributions you have in mind. Let's work together to solve the issue you're facing.

To resolve the IndexError in the default_parse_choice_select_answer_fn function when the LLM's answer contains an unrelated line with a comma, you can modify the function to handle cases where the line does not contain the expected format. Specifically, ensure that the code checks for the presence of a colon (:) before attempting to split and access elements. Here's a possible solution:

import re
from typing import List, Tuple


def default_parse_choice_select_answer_fn(
    answer: str, num_choices: int, raise_error: bool = False
) -> Tuple[List[int], List[float]]:
    """Default parse choice select answer function."""
    answer_lines = answer.split("\n")
    answer_nums = []
    answer_relevances = []
    for answer_line in answer_lines:
        line_tokens = answer_line.split(",")
        if len(line_tokens) != 2:
            if not raise_error:
                continue
            else:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
        try:
            # Check if the token contains a colon before splitting
            if ":" not in line_tokens[0]:
                if raise_error:
                    raise ValueError(
                        f"Invalid answer line: {answer_line}. "
                        "Answer line must be of the form: "
                        "answer_num: <int>, answer_relevance: <float>"
                    )
                continue
            answer_num = int(line_tokens[0].split(":")[1].strip())
        except (IndexError, ValueError):
            if not raise_error:
                continue
            else:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
        if answer_num > num_choices:
            continue
        # Check if the token contains a colon before splitting
        if ":" not in line_tokens[1]:
            if raise_error:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
            continue
        _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
        # Append both values together so the two result lists stay the same length
        answer_nums.append(answer_num)
        answer_relevances.append(float(_answer_relevance))
    return answer_nums, answer_relevances

This modification ensures that the function checks for the presence of a colon before attempting to split the string, which helps prevent IndexError when the line format is incorrect [1].
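As a quick sanity check (illustrative only, assuming the patched function above is defined in the current scope), calling it on the answer from the bug report no longer raises and simply skips the unrelated line:

answer = """Doc: 5, Relevance: 9
Doc: 1, Relevance: 8

The document with the highest relevance score is Doc: 5, as it directly answers the question about the contract."""

nums, relevances = default_parse_choice_select_answer_fn(answer, 5)
print(nums)        # expected: [5, 1]
print(relevances)  # expected: [9.0, 8.0]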


okirmis added a commit to okirmis/llama_index that referenced this issue Dec 22, 2024
logan-markewich pushed a commit that referenced this issue Dec 26, 2024 (…#17353):
* Fix bug #17352 by catching errors for malformed LLM responses

* Fix formatting