Provide guidance on creating a header for `streaming_synthesize` in streaming_tts_quickstart.py #13080

parthea · 2025-01-21T01:33:52Z

According to googleapis/google-cloud-python#13405, the response to streaming_synthesize is headerless LINEAR16 audio with a sample rate of 24000 Hz. The code sample below prints the size of the audio content but does not include the header needed to actually play the audio.

python-docs-samples/texttospeech/snippets/streaming_tts_quickstart.py

Lines 46 to 48 in 5e8e178

    
    streaming_responses = client.streaming_synthesize(itertools.chain([config_request], request_generator()))
    for response in streaming_responses:
        print(f"Audio content size in bytes is: {len(response.audio_content)}")

This may not be the purpose of the code sample; however, including this extra information would help with debugging customer issues such as googleapis/google-cloud-python#13405.

I added code that writes a raw audio header, but there is likely an easier way to achieve this. We should provide guidance on how to create the audio header.

# This is a raw header based on the spec at https://docs.fileformat.com/audio/wav/
header = b'RIFF\x00\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\xc0]\x00\x00\x80\xbb\x00\x00\x02\x00\x10\x00data\x00\x00\x00\x00'

total_length = 0

with open("output.wav", "wb") as out:
    out.write(header)
    for response in streaming_responses:
        # Track the total number of audio bytes written
        total_length += len(response.audio_content)
        out.write(response.audio_content)
    # Positions 40-43: size of the data section (little-endian 32-bit integer)
    out.seek(40)
    out.write(total_length.to_bytes(4, "little"))
    # Positions 4-7: size of the overall file minus 8 bytes (little-endian 32-bit integer).
    # The header is 44 bytes, so this is 36 + total_length. Typically, you’d fill this in after creation.
    out.seek(4)
    out.write((36 + total_length).to_bytes(4, "little"))
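One candidate for the "easier way" is Python's standard-library `wave` module, which writes the RIFF/WAVE header and patches the size fields when the file is closed. The following is a minimal sketch, not a tested change to the sample. It assumes mono, 16-bit audio at 24000 Hz to match the raw header above. The helper name `write_wav` and the `fake_chunks` stand-in data are hypothetical; in the quickstart the chunks would come from `response.audio_content`.

```python
import io
import wave

SAMPLE_RATE_HZ = 24000  # sample rate reported for streaming_synthesize output
SAMPLE_WIDTH_BYTES = 2  # LINEAR16 means 16-bit samples
NUM_CHANNELS = 1        # assumption: mono, as in the raw header above


def write_wav(audio_chunks, destination):
    """Write headerless LINEAR16 chunks to `destination` as a WAV file.

    `audio_chunks` is any iterable of bytes objects. `destination` is a
    file path or a seekable binary file object. Returns the number of
    audio bytes written.
    """
    total = 0
    with wave.open(destination, "wb") as wav_file:
        wav_file.setnchannels(NUM_CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        for chunk in audio_chunks:
            # writeframes keeps the header's size fields up to date
            # when the destination is seekable.
            wav_file.writeframes(chunk)
            total += len(chunk)
    return total


# Hypothetical stand-in for:
#   (response.audio_content for response in streaming_responses)
fake_chunks = [b"\x00\x00" * 1200, b"\x01\x00" * 1200]
buffer = io.BytesIO()
bytes_written = write_wav(fake_chunks, buffer)
```

In the sample, the call would look like `write_wav((r.audio_content for r in streaming_responses), "output.wav")`, which removes the need to seek back and patch positions 4-7 and 40-43 by hand.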


glasnt · 2025-01-21T01:41:55Z

This part of this WIP PR might be similar to what you need here (possibly)

https://github.com/GoogleCloudPlatform/python-docs-samples/pull/13053/files#diff-5d664c635b2f6262b57f11d8b4d2016da17a18a41a8f57efd60d69b39c37365dR254-R272

parthea added the priority: p2, triage me, and type: bug labels Jan 21, 2025

product-auto-label bot added the samples label Jan 21, 2025

blunderbuss-gcf bot assigned glasnt Jan 21, 2025

parthea mentioned this issue Jan 27, 2025

Voice Experience Issue in Google Text to Speech (streaming_synthesize) googleapis/google-cloud-python#13405

Closed
