[Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution #1453

Merged: 2 commits into v2 on Dec 5, 2023

Conversation

@dsikka (Contributor) commented on Dec 4, 2023:

Summary

  • While profiling the refactored pipeline with the new timer, engine operator execution was significantly slower than v1 in all cases (ORT, deepsparse with internal kv_cache, and deepsparse without internal kv_cache)
  • After some digging, this was traced to the conditions being checked while executing each operator within the subgraphs
  • Fixing this issue allows all three cases to run at the expected speed (time comparisons were made with the sample scripts shown below); a minimal sketch of the fix pattern follows this list
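
A minimal sketch of the fix pattern referenced above, assuming only what the commit message states (a new completed attribute on Subgraph replacing the instance-type check on its output); the helper and the executor demo below are hypothetical and use only the standard library, not the actual pipeline code:

from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any


@dataclass
class Subgraph:
    output: Any = None
    completed: bool = False  # explicit flag, instead of inferring state from isinstance(output, Future)


def resolve_finished(sub_graphs):
    # Resolve any finished futures and mark those subgraphs complete (hypothetical helper).
    for sg in sub_graphs:
        if isinstance(sg.output, Future) and sg.output.done():
            sg.output = sg.output.result()
            sg.completed = True  # recorded once, not re-derived on every scheduling pass


with ThreadPoolExecutor() as pool:
    graphs = [Subgraph(output=pool.submit(lambda i=i: i * i)) for i in range(3)]
    while not all(sg.completed for sg in graphs):  # loop exit no longer depends on instance checks
        resolve_finished(graphs)
    print([sg.output for sg in graphs])  # [0, 1, 4]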

Testing

Time Comparisons

Multiple Prompts (3) with Multiple Generations (4 per prompt):

Total time in seconds (x = not measured):

|    | ORT   | Continuous Batching, ORT | Deepsparse with External KV Cache | Continuous Batching, Deepsparse with External KV Cache | Deepsparse with Internal KV Cache |
|----|-------|--------------------------|-----------------------------------|--------------------------------------------------------|-----------------------------------|
| v1 | 58.44 | x                        | 46.41                             | x                                                      | 15.44                             |
| v2 | 60.33 | 47.34                    | 46.24                             | 32.00                                                  | 14.23                             |
  • Tested locally using internal kv_cache, external kv_cache, and ORT. Continuous batching was also tested for v2, using batch sizes 2 and 4

Example test script:

import time

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline

model_path = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"  # same model as the v1 script below
pipeline = TextGenerationPipeline(
    model_path=model_path,
    engine_kwargs={"engine_type": "deepsparse"},
    internal_kv_cache=False,
)

prompts = [["Hello there!", "The sun shined bright", "The dog barked"]]

input_value = TextGenerationInput(
    prompt=prompts[0],
    generation_kwargs={
        "num_return_sequences": 4,
        "max_new_tokens": 20,
        "do_sample": True,
    },
)
s = time.time()
output = pipeline(input_value)
e = time.time()
print("Total Time", e-s)
for i in output.generations:
    print(i)
    print("\n")

Sample Output:

Total Time 46.2393000125885
[GeneratedText(text=" I'm happy to announce that we've launched a new app for the Android device. Have you heard about", score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' Let me know if you have any questions that I can help with.\nHi and welcome to the website', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' Let me tell you about a website I stumbled upon that I think you will enjoy, http://www.', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' I am writing this blog post to help you make the most out of your next camping trip. You will', score=None, finished=True, finished_reason='max_new_tokens')]


[GeneratedText(text=' and beautiful this past Sunday afternoon, illuminating the world around me.\nI sat down on my porch,', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=', illuminating the dusty dirt roads that led to the small village. The people there had never seen such a', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' upon our faces, the light caressing every inch of our bodies. We walked barefooted through the', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' through the clouds, making everything seem lighter.\nThe sky was a vibrant shade of blue, with wis', score=None, finished=True, finished_reason='max_new_tokens')]


[GeneratedText(text=', and the children laughed and giggled.\nShe laughed and giggled throughout the whole conversation.\n', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' so loudly and repeatedly that the neighbors called 911 to report a disturbance at the home.\n\n“Someone', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=" loud.\nBarking is a common sound in the house, and it's often associated with joy", score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' wildly and the woman cried; "I don\'t know how to live!"\nSomewhere in the wilderness', score=None, finished=True, finished_reason='max_new_tokens')]

Comparing with v1:

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse import Pipeline
import time


model_path = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"
pipeline = Pipeline.create(
    task="text_generation",
    model_path=model_path,
    engine_type="deepsparse",
    internal_kv_cache=False,
)

prompts = [["Hello there!", "The sun shined bright", "The dog barked"]]
input_value = TextGenerationInput(
    prompt=prompts[0],
    generation_kwargs={
        "num_return_sequences": 4,
        "max_new_tokens": 20,
        "do_sample": True,
    },
)
s = time.time()
output = pipeline(input_value)
e = time.time()
print("Total Time", e-s)
for i in output.generations:
    print(i)
    print("\n")

Sample Output:

Total Time 46.41077995300293
[GeneratedText(text=' If you’re looking for some help with your marketing efforts, I’m here to help! Let', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' Welcome to my homepage!\nHi there! Thanks for dropping by my page, and apologies for the initial', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=" Welcome to the Rhythm Collective's website. We are an independent artist collective based in New York City", score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' We welcome you to visit our web site!\nWe have some exciting news for you, we are going', score=None, finished=True, finished_reason='max_new_tokens')]


[GeneratedText(text=' upon the earth, casting shadows across the landscape. The wind whistled through the trees and carried away leaves', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' on the ocean, and casting its light all over me.\nI thought about life and what it was', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' at the park this morning.\nI went for my first run in the morning and then I met with', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' today. The flowers blossomed, and the trees were painted with an array of colors. The birds chir', score=None, finished=True, finished_reason='max_new_tokens')]


[GeneratedText(text=' at her fiercely.\n“She couldn’t help but be mesmerized by the sight of him', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' at the mailman, which startled him and made him drop the package.\nI was surprised when I', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' at them.\nMeaning #1: The dog gave a loud bark.\nMeaning #2', score=None, finished=True, finished_reason='max_new_tokens'), GeneratedText(text=' for attention and ran for his life. The squirrel leapt away from the fox that came too close', score=None, finished=True, finished_reason='max_new_tokens')]

@dbogunowicz (Contributor) left a comment:

Great job getting to the resolution of this pesky situation

break

# keep running until all sub graphs have completed.
if not any(isinstance(x.output, Future) for x in sub_graphs):

So what was the core issue? We were waiting idly for all the subgraphs?

@dsikka (Contributor, Author) replied:

From what I understand, I was using the concurrent Future done() method incorrectly.
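
For background on the Future semantics referenced here (a general standard-library illustration, not the pipeline code): done() is a non-blocking snapshot of the future's state, so a loop keyed off it must either block elsewhere or keep re-polling; result() and concurrent.futures.wait() are the blocking alternatives.

from concurrent.futures import ThreadPoolExecutor, wait
import time

with ThreadPoolExecutor() as pool:
    fut = pool.submit(time.sleep, 0.5)

    # done() returns immediately with the state at this instant; it does not wait.
    print(fut.done())  # almost certainly False here

    # To actually wait for completion, block on result() or concurrent.futures.wait().
    wait([fut])
    print(fut.done())  # True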

@dsikka merged commit e15a24b into v2 on Dec 5, 2023
@dsikka deleted the fix_operator_scheduling branch on December 5, 2023 at 15:17
bfineran added a commit that referenced this pull request Dec 6, 2023
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

---------

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
dbogunowicz added a commit that referenced this pull request Jan 2, 2024
* Pipelines Refactor - Initial Impl (#1287)

* [Pipeline Refactor] Additional functionality, engine operator, linear router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix

* [v2] EngineOperator updates to make continuous batching easier (#1371)

* [v2] EngineOperator updates to make continuous batching easier

* test fixes

* [Pipeline Refactor] Update routes, text generation initial functionality (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* [Pipeline Refactor] Additional Operators, Route update and completed generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* [Pipeline Refactor] Split/Join Functionality for multiple prompts (#1384)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* [Pipeline Refactor] Unit Testing for Text Generation Operators (#1392)

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit

* [Continuous Batching] Executor thread for running continuous batching (#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack

* [ContinuousBatching] ContinuousBatchingScheduler Implementation (#1375)

* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition

* [continuous batching] singleton pattern for scheduler (#1391)

* [continuous batching] singleton pattern for scheduler

* catch from review

* [Pipeline Refactor][Text-Generation] Create a helper function for creating engine_inputs (#1364)

* rebasing off my initial commit

* cleanups

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* pipeline runs, but incorrectly

* it works for a single sequence

* cleanup. now lets figure out how to run multiple sequences

* [Pipeline Refactor][Text-Generation] Refactor `transformers` helpers functions (#1394)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* initial commit

* fix error

* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* Delete tests/deepsparse/v2/unit/text_generation/test_msic.py

* pipeline runs, but incorrectly

* Revert "pipeline runs, but incorrectly"

This reverts commit 51c4ee6.

* PR review comments

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Text Generation][V2] End-to-end tests (#1402)

* initial commit

* initial commit

* its working now

* beautification

* thank you Dipika <3

* ready to review

* integration tests pass

* [Pipeline Refactor][Text Generation][Continuous Batching] Integration (#1409)

* update split/join

* use map

* update

* run end-to-end

* clean-up

* fix bug with batch size, introduce SplitRoute dataclass

* update tests to use new inputs/outputs

* use the normal scheduler for internal kv_cache

* add pipeline inpuits

* clean-up

* change engine type, update docstrings, update override function to be more generic

* move subgraph functionality to its own function; clean-up cont batching in text gen pipeline

* update linear pathway to also use subgraph execution

* rebase fix

* fix tests

* [Pipeline Refactor] Operator Registry (#1420)

* initial registry functionality

* use sparsezoo mixin

* fix tricky rebase

* one more cleanup

* got tests to work after rebase. implementing SPLIT and JOIN in linearouter now

* pipeline working, with GraphRouter. Needs some more testing

* ready for review

* cleanup

* simplify after PR review round

* [Pipeline Refactor] Fix Operator scheduling to fix issue with slow execution  (#1453)

* fix scheduling to fix issue with engine running very slowly; introduce new completed attribute for Subgraph instead of checking instance type

* fix warning message

* [Pipeline Refactor] Add `Pipeline.create` method to initialize pipelines (#1457)

* add pipeline create method for pipeline creation using the operator registry

* add instance check

* [Pipeline Refactor] async (#1380)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* async initial functionality

* fix capacity settting again

* add blocking

* more testing

* update to use split/join

* fix

* rebase fix

* remove index

* change event loop

* rebase fix

* update async run to use new operator scheduling properly

* rebase fixes (#1458)

* more fixes (#1459)

* bring back functionalities that were lost in v2 during rebasing

* Update src/deepsparse/transformers/helpers.py

* ready for review

* bring tests back"

* quality

* original readme

* addressing Dipikas comments

* Update src/deepsparse/transformers/pipelines/text_generation/pipeline_no_kv_cache.py

* addressing PR review

---------

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>