Please support phi-1_5 and phi-2 #34

Closed
RadixSeven opened this issue Apr 13, 2024 · 1 comment
@RadixSeven (Contributor) commented Apr 13, 2024

The README asks users to report unsupported models, and supported_models.yaml lists microsoft/phi-1_5 as supported.

However, when I run:

python cfg_generate.py -m "microsoft/phi-1_5" 'You would represent "My dog Sparky is 7 years old and weighs 21 kg" in JSON as '

where cfg_generate.py is only a minor modification of the code in the README:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import argparse
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import (
    GrammarConstrainedLogitsProcessor,
)

DEFAULT_GRAMMAR = "examples/grammars/json.ebnf"

DEFAULT_MODEL = "openai-community/gpt2-xl"


def main():
    parser = argparse.ArgumentParser(
        description="Generate text using a specified model and grammar."
    )
    parser.add_argument(
        "-m",
        "--model_id",
        default=DEFAULT_MODEL,
        help=f"The ID of the model to use. (default: {DEFAULT_MODEL})",
    )
    parser.add_argument(
        "-g",
        "--grammar_file",
        default=DEFAULT_GRAMMAR,
        help=f"The path to the grammar file. (default: {DEFAULT_GRAMMAR})",
    )
    parser.add_argument(
        "prompts", nargs="+", help="The prompts to use for generation."
    )
    args = parser.parse_args()

    device = torch.device("cpu")
    print(f"Using device: {device}")

    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(args.model_id).to(device)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id

    # Load json grammar
    with open(args.grammar_file, "r") as file:
        grammar_str = file.read()
    grammar = IncrementalGrammarConstraint(grammar_str, "root", tokenizer)
    grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

    # Generate
    input_ids = tokenizer(
        args.prompts,
        add_special_tokens=False,
        return_tensors="pt",
        padding=True,
    )["input_ids"]
    output = model.generate(
        input_ids,
        max_length=50,
        logits_processor=[grammar_processor],
        repetition_penalty=1.1,
        num_return_sequences=1,
    )

    # decode output
    generations = tokenizer.batch_decode(output, skip_special_tokens=True)
    print(generations)


if __name__ == "__main__":
    main()

I get:

Using device: cpu
tokenizer_config.json: 100%|███████████████████| 237/237 [00:00<00:00, 4.06MB/s]
vocab.json: 100%|████████████████████████████| 798k/798k [00:00<00:00, 10.9MB/s]
merges.txt: 100%|████████████████████████████| 456k/456k [00:00<00:00, 9.77MB/s]
tokenizer.json: 100%|██████████████████████| 2.11M/2.11M [00:00<00:00, 15.0MB/s]
added_tokens.json: 100%|███████████████████| 1.08k/1.08k [00:00<00:00, 21.9MB/s]
special_tokens_map.json: 100%|███████████████| 99.0/99.0 [00:00<00:00, 1.64MB/s]
config.json: 100%|█████████████████████████████| 864/864 [00:00<00:00, 15.9MB/s]
pytorch_model.bin: 100%|███████████████████| 2.84G/2.84G [03:33<00:00, 13.3MB/s]
generation_config.json: 100%|████████████████| 74.0/74.0 [00:00<00:00, 1.07MB/s]
WARNING:transformers_cfg.vocab_struct:Warning: unrecognized tokenizer: using default token formatting
Traceback (most recent call last):
  File "/home/eric/Prj/cfg_llm_security/cfg_generate.py", line 73, in <module>
    main()
  File "/home/eric/Prj/cfg_llm_security/cfg_generate.py", line 59, in main
    output = model.generate(
             ^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers/generation/utils.py", line 1479, in generate
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers/generation/utils.py", line 2353, in greedy_search
    next_tokens_scores = logits_processor(input_ids, next_token_logits)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers/generation/logits_process.py", line 97, in __call__
    scores = processor(input_ids, scores)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers_cfg/generation/logits_process.py", line 102, in __call__
    return self.process_logits(input_ids, scores)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers_cfg/generation/logits_process.py", line 95, in process_logits
    self.mask_logits(scores, scores.device)
  File "/home/eric/venv/cfg_llm_security/lib/python3.11/site-packages/transformers_cfg/generation/logits_process.py", line 57, in mask_logits
    logits[~acceptance] = -math.inf
    ~~~~~~^^^^^^^^^^^^^
IndexError: The shape of the mask [1, 50295] at index 1 does not match the shape of the indexed tensor [1, 51200] at index 1

Because of the following line in the error output:

WARNING:transformers_cfg.vocab_struct:Warning: unrecognized tokenizer: using default token formatting

I suspect the problem is that the tokenizer selected by AutoTokenizer differs from what your code expects. The same thing happens with phi-2.
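For anyone reproducing this, the concrete tokenizer class that AutoTokenizer resolves to (which is what the "unrecognized tokenizer" warning is about) can be printed directly. This is a small diagnostic, not part of the script above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
# Shows which concrete tokenizer class AutoTokenizer picked for this checkpoint.
print(type(tokenizer).__name__)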

I am working on a time-sensitive project right now, so I won't be able to help beyond reporting the bug. Feel free to close this as "won't do"; I won't feel bad, since I've already received the rest of your work for free. (And if I can't complete the project without fixing it, I might submit a pull request.)

@Saibo-creator self-assigned this Apr 16, 2024
@Saibo-creator (Collaborator) commented Apr 16, 2024

Hello @RadixSeven,
Thank you for reporting this! I understand the cause of the error and will provide a solution and explanation below. Another user has reported the same issue with T5.

Reason

The problem stems from a deliberate design decision in Phi (and in similar models such as T5): the tokenizer vocabulary size (50295) does not match the model's embedding size (51200). The extra embedding rows leave room for special tokens to be added later.
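For illustration, the mismatch is easy to observe by comparing the tokenizer length with the number of rows in the embedding matrix (a minimal sketch, assuming the same checkpoint as above; the numbers in the comments are the ones reported in the traceback):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")

# Tokenizer vocabulary vs. number of rows in the input embedding matrix.
print(len(tokenizer))                                # 50295 in the traceback above
print(model.get_input_embeddings().weight.shape[0])  # 51200 in the traceback above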

Fix

Adding model.resize_token_embeddings(len(tokenizer)) before running inference resolves the issue:

    model = AutoModelForCausalLM.from_pretrained(args.model_id).to(device)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id
    # Shrink the embedding matrix to match the tokenizer's vocabulary size.
    model.resize_token_embeddings(len(tokenizer))

    # Load json grammar
    with open(args.grammar_file, "r") as file:
        ...  # rest of the script unchanged
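resize_token_embeddings(len(tokenizer)) resizes the model's token embedding matrix (and the output projection, when present) to the tokenizer's vocabulary size, so the logits at each step have the same width as the grammar's acceptance mask. A quick sanity check after the resize, assuming the model and tokenizer variables from the snippet above:

assert model.get_input_embeddings().weight.shape[0] == len(tokenizer)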

With your example prompt, I get the following output: 'You would represent "My dog Sparky is 7 years old and weighs 21 kg" in JSON as {"name":["Dog","Sparky"],"age":[7,21],"weight":[21]}'

We will soon update our code to handle this automatically, so users won't have to manage it on their own.
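Until that update lands, one possible workaround sketch is to pad the grammar's acceptance mask out to the model's logits width before masking, marking the extra embedding slots as never acceptable. This only illustrates the idea and is not how transformers_cfg itself implements the masking:

import math
import torch

def mask_logits_with_padding(scores, acceptance):
    # scores: [batch, model_vocab]; acceptance: [batch, tokenizer_vocab] (boolean).
    extra = scores.shape[-1] - acceptance.shape[-1]
    if extra > 0:
        # Tokens beyond the tokenizer vocabulary can never be produced legally,
        # so mark them as not accepted by the grammar.
        pad = torch.zeros(acceptance.shape[0], extra, dtype=torch.bool, device=acceptance.device)
        acceptance = torch.cat([acceptance, pad], dim=-1)
    scores = scores.clone()
    scores[~acceptance] = -math.inf
    return scores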

More details:

There are two related discussions in the Hugging Face community:

A similar practice is used by other models such as T5.
