Question in reproducing experimental results. #2

Open

wenhaoli-xmu opened this issue Dec 4, 2024 · 4 comments

@wenhaoli-xmu commented Dec 4, 2024

Hi 😊, we are reproducing your experimental results as a baseline for our method.

We are confused by the following two points. 🤔

First, why does running the following code take more than a minute? Since the prompt is short, we expected it to finish very quickly.

In [8]: prompt = "TESLA company is found by"
In [9]: output = model(prompt=prompt)
...(a long time)...
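For reference, the latency can be measured with something like the following minimal sketch (assuming the same `model` callable as above):

import time

prompt = "TESLA company is found by"
start = time.perf_counter()
output = model(prompt=prompt)
print(f"generation took {time.perf_counter() - start:.1f} s")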

Second, after waiting for over a minute, we finally got a result that looks like this:

In [10]: output
Out[10]: {'text': ['Nik']}

We think this output is not reasonable and want to know whether there are any improper configurations in the following script:

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MagicpigConfig:
    # Server settings
    server_type: str = 'hf'
    server_host: str = '127.0.0.1'
    server_port: str = '5000'
    ssh_server: Optional[str] = None
    ssh_key_path: Optional[str] = None
    model_name_or_path: str = 'meta-llama/Llama-2-7b-chat-hf'

    # Sampling settings
    temperature: float = 0.0
    top_k: int = 32
    top_p: float = 1.0
    random_seed: int = 0
    stop_words: list = field(default_factory=list)
    sliding_window_size: Optional[int] = None
    threads: int = 1

    # MagicPIG-specific settings
    K: int = 10
    L: int = 150
    S: float = 4.0
    W: int = 64
    Q: int = 0
    QR: float = 0.0
    max_seq_length: int = 4096
    max_new_tokens: int = 128

If there are improper configurations for short-prompt generation, we would also like to know the most suitable configuration for different prompt lengths, e.g. 1K, 2K, 4K, and 8K.

@dreaming-panda (Contributor)

I am unsure whether you can directly input a sentence (without tokenization) into a model. Can you run the RULER experiments?
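For reference, a minimal sketch of the standard Hugging Face flow, where the prompt is tokenized before being passed to the model (the model name and generation settings here are illustrative assumptions, not the MagicPIG wrapper from the issue):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the prompt first, then generate from token IDs.
inputs = tokenizer("TESLA company is found by", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))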

@wenhaoli-xmu (Author) commented Dec 14, 2024

☺️ Thanks a lot for the answer 🙏. We are indeed using the code you tested on RULER, and we have figured out why it seemed slow.

One more question: how can I measure the concrete pruning rate used in the decoding phase 🤔? Since MagicPIG uses dynamic retrieval, it is not like Quest, which uses a fixed token budget.

By the way, if I obtain the concrete pruning rate, can I use the following formula to calculate the overall equivalent token budget 🤔?

sink_budget = 4
local_budget = 64
equivalent_budget = pruning_rate * prefill_context_length + sink_budget + local_budget
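A minimal sketch of how such a measurement could be accumulated across decoding steps, assuming you can hook the attention layers to record how many keys each step actually retrieves (the `retrieved_per_step` list is hypothetical; MagicPIG's internals may expose this differently):

# Hypothetical sketch: average the per-step retrieval ratio, then
# apply the equivalent-budget formula proposed above.

SINK_BUDGET = 4    # attention-sink tokens always kept
LOCAL_BUDGET = 64  # local-window tokens always kept (W in the config)

def equivalent_budget(retrieved_per_step, prefill_context_length):
    # Fraction of the prefilled context retrieved at each decoding step.
    ratios = [n / prefill_context_length for n in retrieved_per_step]
    pruning_rate = sum(ratios) / len(ratios)
    return pruning_rate * prefill_context_length + SINK_BUDGET + LOCAL_BUDGET

# Example with made-up numbers: 4096-token context, three decode steps.
print(equivalent_budget([300, 280, 320], prefill_context_length=4096))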

@dreaming-panda (Contributor)

I think your understanding is correct. BTW, we will release v0.2 next week; it may make evaluation easier for you.

@wenhaoli-xmu (Author)

Thanks a lot!☺️ Looking forward to your new release.
