
Split long audio to chunks before sending it to cloud speech recognition services like OpenAI or Groq #93

Open
MinmoTech opened this issue Feb 12, 2025 · 1 comment
Labels: feature (Issue proposes a new feature), recognition (Issue related to speech recognition)

Comments

@MinmoTech

With a large file (for example, https://www.youtube.com/watch?v=xX4mBbJjdYM downloaded with yt-dlp) I get errors.
I have seen these two errors, the first being the more common:

  • Error: Connection error.
  • Error: 413 Request Entity Too Large

Here is a stack trace from the debug option:
npx echogarden transcribe --debug --openAICloud.model=whisper-large-v3-turbo --openAICloud.apiKey='<api_key>' --engine=openai-cloud --openAICloud.baseURL=https://api.groq.com/openai/v1 --language=en "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.mkv" "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.srt"

Send request to https://api.groq.com/openai/v1.. APIConnectionError: Connection error.
    at OpenAI.makeRequest (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:316:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/recognition/OpenAICloudSTT.js:31:20)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/api/Recognition.js:163:41)
    at async transcribe (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:451:115)
    at async startWithArgs (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:222:13)
    at async start (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:149:9) {
  status: undefined,
  headers: undefined,
  request_id: undefined,
  error: undefined,
  code: undefined,
  param: undefined,
  type: undefined,
  cause: FetchError: request to https://api.groq.com/openai/v1/audio/transcriptions failed, reason: read ECONNRESET
      at ClientRequest.<anonymous> (/home/user/.npm/_npx/e9225f775dff863b/node_modules/node-fetch/lib/index.js:1501:11)
      at ClientRequest.emit (node:events:524:28)
      at emitErrorEvent (node:_http_client:104:11)
      at TLSSocket.socketErrorListener (node:_http_client:518:5)
      at TLSSocket.emit (node:events:536:35)
      at emitErrorNT (node:internal/streams/destroy:170:8)
      at emitErrorCloseNT (node:internal/streams/destroy:129:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
    type: 'system',
    errno: 'ECONNRESET',
    code: 'ECONNRESET'
  }
}

When using the official OpenAI API, I get a similar error:
npx echogarden transcribe --debug --openAICloud.model=whisper-1 --openAICloud.apiKey='<api_key>' --engine=openai-cloud --language=en "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.mkv" "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.srt"

APIError: 413 413: Maximum content size limit (26214400) exceeded (26362076 bytes read)
    at APIError.generate (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/error.mjs:64:16)
    at OpenAI.makeStatusError (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:286:25)
    at OpenAI.makeRequest (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:330:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/recognition/OpenAICloudSTT.js:31:20)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/api/Recognition.js:163:41)
    at async transcribe (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:451:115)
    at async startWithArgs (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:222:13)
    at async start (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:149:9) {
  status: 413,
  headers: {
    'access-control-expose-headers': 'X-Request-ID',
    'alt-svc': 'h3=":443"; ma=86400',
    'cf-cache-status': 'DYNAMIC',
    'cf-ray': '910a50bb9c4c9288-MUC',
    connection: 'keep-alive',
    'content-length': '176',
    'content-type': 'application/json',
    date: 'Wed, 12 Feb 2025 05:53:33 GMT',
    'openai-organization': 'org',
    'openai-processing-ms': '6365',
    'openai-version': '2020-10-01',
    server: 'cloudflare',
    'set-cookie': '<redacted>',
    'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
    via: 'envoy-router-56b7c7f47-2pjgd',
    'x-content-type-options': 'nosniff',
    'x-envoy-upstream-service-time': '545',
    'x-ratelimit-limit-requests': '10000',
    'x-ratelimit-remaining-requests': '9999',
    'x-ratelimit-reset-requests': '6ms',
    'x-request-id': '<redacted>'
  },
  request_id: '<redacted>',
  error: {
    message: '413: Maximum content size limit (26214400) exceeded (26362076 bytes read)',
    type: 'server_error',
    param: null,
    code: null
  },
  code: null,
  param: null,
  type: 'server_error'
}

@rotemdan rotemdan changed the title from "Large files connection error with groq/openai api" to "Split large audio files before sending them to cloud speech recognition services like OpenAI or Groq" Feb 15, 2025
@rotemdan rotemdan added the recognition and feature labels Feb 15, 2025
@rotemdan
Member

Thanks for the report. I've tagged this as a feature suggestion.

It's possible to pre-split the audio into chunks before sending it to the cloud provider.

This would require incorporating some voice activity detection to find good split points.
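
As a rough illustration only (this is not an existing Echogarden implementation; the function names, sample rate, and thresholds below are invented for the example), split points could be chosen near low-energy regions so that each chunk stays well under the provider's request size limit (the 413 error above reports a 26214400-byte maximum):

// Minimal sketch: pick chunk boundaries in decoded mono audio using a
// simple RMS-energy silence heuristic. A real solution would use a proper
// VAD model rather than a fixed threshold.

const sampleRate = 16000

// Find split points near silence so each chunk stays under maxChunkSeconds
// (10 minutes of 16 kHz mono 16-bit audio is roughly 19 MB as WAV, comfortably
// below the ~25 MB limit reported in the 413 error).
function findSplitPoints(samples: Float32Array, maxChunkSeconds = 600): number[] {
	const frameSize = Math.round(sampleRate * 0.03) // 30 ms analysis frames
	const silenceThreshold = 0.01                   // RMS below this counts as silence
	const maxChunkSamples = maxChunkSeconds * sampleRate

	const splitPoints: number[] = []
	let chunkStart = 0
	let lastSilentFrameStart = -1

	for (let frameStart = 0; frameStart + frameSize <= samples.length; frameStart += frameSize) {
		// Compute the RMS energy of the current frame
		let sumSquares = 0
		for (let i = frameStart; i < frameStart + frameSize; i++) {
			sumSquares += samples[i] * samples[i]
		}
		const rms = Math.sqrt(sumSquares / frameSize)

		if (rms < silenceThreshold) {
			lastSilentFrameStart = frameStart
		}

		// When the current chunk reaches its maximum length, split at the most
		// recently seen silent frame, or right here if no silence was found.
		if (frameStart - chunkStart >= maxChunkSamples) {
			const splitPoint = lastSilentFrameStart > chunkStart ? lastSilentFrameStart : frameStart
			splitPoints.push(splitPoint)
			chunkStart = splitPoint
			lastSilentFrameStart = -1
		}
	}

	return splitPoints
}

// Cut the samples at the computed split points
function splitToChunks(samples: Float32Array, splitPoints: number[]): Float32Array[] {
	const boundaries = [0, ...splitPoints, samples.length]
	const chunks: Float32Array[] = []

	for (let i = 0; i < boundaries.length - 1; i++) {
		chunks.push(samples.subarray(boundaries[i], boundaries[i + 1]))
	}

	return chunks
}

Each chunk would then be encoded and sent to the recognition endpoint separately, and the timestamps in each chunk's results offset by that chunk's start time before the transcripts are merged.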

@rotemdan rotemdan changed the title from "Split large audio files before sending them to cloud speech recognition services like OpenAI or Groq" to "Split long audio to chunks before sending it to cloud speech recognition services like OpenAI or Groq" Feb 15, 2025