
[Text Generation][V2] NonKVCachePipeline #1417

Closed
dbogunowicz wants to merge 7 commits

Conversation

@dbogunowicz (Contributor) commented Nov 17, 2023

Feature Description

Added the TextGenerationPipelineNoCache. This pipeline processes the prompt and returns the single newly generated token; that's it. Its main functionality is mapping prompt tokens to logits, which is instrumental for computing the perplexity of the model on a given dataset.
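
Since the score arrays line up logits with prompt tokens, perplexity falls out directly. Below is a minimal sketch of that computation; the names logits (the (sequence_length, vocab_size) score array for one prompt) and token_ids (that prompt's token ids) are illustrative, not part of this PR.

import numpy as np

def perplexity(logits: np.ndarray, token_ids: np.ndarray) -> float:
    # The logits at position i are the model's prediction for token i + 1,
    # so shift predictions and targets to align them.
    preds, targets = logits[:-1], token_ids[1:]
    # Numerically stable log-softmax over the vocabulary dimension.
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Perplexity is exp of the mean negative log-likelihood of the targets.
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.exp(nll.mean()))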

Testing

Updated the integration tests to cover the case of non-kv-cache inference.
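
For reference, a hedged sketch of the shape such a test can take, reusing the arguments from the Example Use section below; this is not the PR's actual test code.

import numpy as np

def test_no_kv_cache_prompt_logits():
    from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

    pipeline = TextGenerationPipelineNoCache(
        model_path="hf:mgoin/TinyStories-1M-ds",
        onnx_model_name="model-orig.onnx",
        sequence_length=20,
    )
    out = pipeline(
        prompt=["Why are you so"],
        include_prompt_logits=True,
        generation_kwargs=dict(output_scores=True),
    )
    # With include_prompt_logits=True every generation should carry a
    # score array: one row per prompt token plus one for the new token.
    for gen in out.generations:
        assert isinstance(gen.score, np.ndarray)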

Example Use

from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

prompt = ["Some funny prompt", "Why are you so"]

# The non-kv-cache pipeline runs the full prompt through the model in a
# single forward pass and emits exactly one new token per prompt.
pipeline = TextGenerationPipelineNoCache(
    model_path="hf:mgoin/TinyStories-1M-ds",
    onnx_model_name="model-orig.onnx",
    sequence_length=20,
)

# Request the prompt logits as well, so that every generation carries the
# full score array needed for e.g. perplexity computation.
out = pipeline(
    prompt=prompt,
    include_prompt_logits=True,
    generation_kwargs=dict(output_scores=True),
)

for gen in out.generations:
    print(gen)
text='.' score=array([[ 2.9344807 , -0.03345669, -4.11256   , ..., -6.9316325 ,
        -4.6005425 ,  1.1827914 ],
       [ 7.008805  , -0.11603884, -7.1837015 , ..., -7.0405912 ,
        -2.386351  , -2.2007818 ],
       [ 6.348213  , -2.2960157 , -6.433192  , ..., -6.5930486 ,
        -5.8315077 , -0.58804405],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32) finished=True finished_reason='length'
text=' sad' score=array([[ 2.560934 ,  1.1993233, -6.670935 , ..., -7.3002615, -3.823823 ,
         1.8125833],
       [-1.1050931, -2.4256568, -7.3015127, ..., -6.1500154, -4.074909 ,
         1.8155754],
       [ 6.172593 , -2.2252593, -9.146653 , ..., -7.70834  , -4.810748 ,
         0.3985293],
       [ 1.4988875,  1.0973434, -4.4714937, ..., -4.8026247, -1.1791464,
         1.6924176]], dtype=float32) finished=True finished_reason='length'

Note: the first score array is zero-padded at the end. All score arrays in a batch must share the same shape, namely (length of the longest prompt in the input + 1) rows, so shorter prompts get trailing rows of zeros.
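
Before feeding these scores into a perplexity computation, the padding rows should be dropped. A small sketch, assuming (per the note above) that trailing all-zero rows are always padding:

import numpy as np

def strip_padding(score: np.ndarray) -> np.ndarray:
    # Keep only the rows that contain real logits; padding rows are all zeros.
    return score[~np.all(score == 0, axis=-1)]

trimmed = [strip_padding(gen.score) for gen in out.generations]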

Next steps

  • Create a parent TextGenerationPipeline operator that can choose either the kv-cache or the non-kv-cache version of the pipeline, depending on the topology of the ONNX model (see the sketch after this list)
  • Move the overwriting of the transformer inputs to some high-level function
  • Use the V2 pipeline for the Perplexity calculation
  • Swap the GraphRouter for a LinearRouter in TextGenerationPipelineNoCache
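
A rough sketch of how the first next step could work, assuming the kv-cache variant is exposed as TextGenerationPipeline alongside the no-cache one, and that cache-enabled ONNX graphs expose past_key_values inputs; both assumptions would need checking against the actual V2 code.

import onnx

def has_kv_cache_inputs(onnx_path: str) -> bool:
    # A cache-enabled decoder graph exposes past_key_values.* inputs
    # (an assumed naming convention) alongside input_ids.
    model = onnx.load(onnx_path, load_external_data=False)
    return any("past_key_values" in inp.name for inp in model.graph.input)

def create_text_generation_pipeline(onnx_path: str, **kwargs):
    # Dispatch on the topology of the ONNX model, as proposed above.
    if has_kv_cache_inputs(onnx_path):
        from deepsparse.v2.text_generation import TextGenerationPipeline
        return TextGenerationPipeline(model_path=onnx_path, **kwargs)
    from deepsparse.v2.text_generation import TextGenerationPipelineNoCache
    return TextGenerationPipelineNoCache(model_path=onnx_path, **kwargs)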

@dbogunowicz dbogunowicz changed the base branch from main to v2 November 17, 2023 15:46
@dbogunowicz dbogunowicz changed the base branch from v2 to feature/damian/v2/factor_out_transformation_utils November 20, 2023 13:31
@dbogunowicz dbogunowicz changed the base branch from feature/damian/v2/factor_out_transformation_utils to v2 November 20, 2023 13:33
@dbogunowicz dbogunowicz force-pushed the feature/damian/no_kv_cache branch from a95d55a to fa96efb on November 20, 2023 13:58
@dbogunowicz dbogunowicz changed the title from [Text Generation][V2] NonKVCachePipeline to [WiP][Text Generation][V2] NonKVCachePipeline on Nov 27, 2023
@dbogunowicz dbogunowicz marked this pull request as ready for review November 27, 2023 14:11
@dbogunowicz dbogunowicz changed the title from [WiP][Text Generation][V2] NonKVCachePipeline to [Text Generation][V2] NonKVCachePipeline on Nov 28, 2023
@dbogunowicz dbogunowicz force-pushed the feature/damian/no_kv_cache branch from 83d6cf1 to dcab3f9 on December 6, 2023 12:38
Base automatically changed from v2 to main December 6, 2023 15:37
bfineran and others added 6 commits December 18, 2023 16:08
… router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

…ity (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operator

* fix docstrings
…generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity setting again

* typo fixes
@dbogunowicz dbogunowicz force-pushed the feature/damian/no_kv_cache branch from e0a9dee to 7f3eb12 on December 18, 2023 16:10