`transcribe_timestamped` cutoff script and starts again #203

CorrM · 2024-08-09T12:21:18Z

["Reddit, You've had to apologize for something ridiculous.", 'What was it?', "I once apologized for accidentally convincing my best friend's grandma that I was a professional cage fighter and getting her to attend one of my (fake) matches.", 'She brought homemade chicken soup as a "good luck charm" and sat in the front row, cheering me on with her cane while yelling "You show \'em, young man!"', 'The whole crowd thought it was hilarious, but I had to apologize when she found out it was all an elaborate prank just for kicks.', '(ends abruptly)']

Here is my code:

    from typing import Any, Optional

    # Module-level cache so the model is loaded only once.
    WHISPER_MODEL: Optional[Any] = None


    def audio_to_text(filename: str, model_size: str = "base") -> dict[str, Any]:
        """
        Transcribes an audio file with whisper_timestamped.

        :param filename: The path to the audio file.
        :param model_size: The size of the Whisper model to load (default is "base").
        :return: The transcription result dict, containing the full "text" and a list
                 of "segments" with their timestamps.
        """
        from whisper_timestamped import load_model, transcribe_timestamped

        global WHISPER_MODEL
        if WHISPER_MODEL is None:
            # Loaded on the first call; later calls reuse it even if model_size differs.
            WHISPER_MODEL = load_model(model_size)

        result = transcribe_timestamped(WHISPER_MODEL, filename, verbose=False, fp16=False)
        return result

Here is what `transcribe_timestamped(...)["text"]` returns:

Reddit? You've had to apologize for something ridiculous. What was it? I once apologized for accidentally convincing my best friend's grandma that I was a professional cage fighter and getting her to attend one of my fake matches. She brought home a chicken soup as a good luck charm and sat in the front row, cheering me on with her came while yelling you show him young man. The whole crowd thought it was hilarious, but I had to apologize when she found out it was all 
Reddit? You've had to apologize for something ridiculous. What was it? I once apologized for accidentally convincing my best friend's grandma that I was a professional cage fighter and getting her to attend one of my fake matches. She brought home a chicken soup as a good luck charm and sat in the front row, cheering me on with her came while yelling you show him young man. The whole crowd thought it was hilarious, but I had to apologize when she found out it was all an elaborate prank just for kicks. Ends abruptly.

I added a line break to make it more visible where the transcript cuts off and starts again.
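To pinpoint where the top-level text restarts, a small hypothetical helper such as `find_restart` (not part of whisper_timestamped) can search for the opening of the transcript reappearing later in the string. This is a minimal sketch that assumes the restart repeats at least the first `probe_len` characters verbatim, as in the output above.

```python
def find_restart(text: str, probe_len: int = 40) -> int:
    """Return the index where the opening of `text` reappears, or -1.

    Hypothetical diagnostic helper (not a whisper_timestamped API): it only
    checks whether the first `probe_len` non-leading-whitespace characters
    occur a second time, which is enough to flag the
    "cuts off and starts again" symptom.
    """
    stripped = text.lstrip()
    offset = len(text) - len(stripped)  # account for leading whitespace
    probe = stripped[:probe_len]
    if len(probe) < probe_len:
        return -1  # too short to judge reliably
    pos = stripped.find(probe, 1)  # skip the match at position 0
    return -1 if pos == -1 else pos + offset
```

Applied to the `["text"]` value above, a non-negative result is the position where the repeated copy begins, so `text[:index]` is the truncated first attempt.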

When I rebuild the text from the segments instead, like this:

    text: str = "".join(x["text"] for x in whisper_analysis["segments"])

I get:

 Reddit? You've had to apologize for something ridiculous. What was it? I once apologized for accidentally convincing my best friend's grandma that I was a professional cage fighter and getting her to attend one of my fake matches. She brought home a chicken soup as a good luck charm and sat in the front row, cheering me on with her came while yelling you show him young man. The whole crowd thought it was hilarious, but I had to apologize when she found out it was all an elaborate prank just for kicks. Ends abruptly.

which is what `transcribe_timestamped(...)["text"]` should return.
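The segment-joining workaround can be wrapped in a pair of hypothetical helpers (`text_from_segments` and `top_level_text_is_consistent`; neither is part of whisper_timestamped) that rebuild the transcript from `result["segments"]` and flag results where `result["text"]` disagrees with it. This sketch assumes the result shape used above (a `"text"` string and a `"segments"` list whose items each have a `"text"` field) and compares after collapsing whitespace, since segment texts start with a space.

```python
from typing import Any


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and trim the ends so spacing differences are ignored.
    return " ".join(text.split())


def text_from_segments(result: dict[str, Any]) -> str:
    """Rebuild the transcript by concatenating each segment's "text" field."""
    return _normalize("".join(seg["text"] for seg in result["segments"]))


def top_level_text_is_consistent(result: dict[str, Any]) -> bool:
    """Heuristic check: does result["text"] match the joined segment texts?"""
    return _normalize(result["text"]) == text_from_segments(result)
```

A caller could fall back to `text_from_segments(result)` whenever the check returns `False`, which is the same workaround as the one-liner above, just with the mismatch made explicit.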


CorrM changed the title ~~transcribe_timestamped duplicate text~~ transcribe_timestamped cutoff script and starts again Aug 9, 2024
