
Split long audio to chunks before sending it to cloud speech recognition services like OpenAI or Groq #93

Open
MinmoTech opened this issue Feb 12, 2025 · 1 comment
Labels: feature (Issue proposes a new feature), recognition (Issue related to speech recognition)

Comments

@MinmoTech

With a large file (for example, https://www.youtube.com/watch?v=xX4mBbJjdYM downloaded with yt-dlp) I get errors.
I have seen these two errors, the first being the more common:

  • Error: Connection error.
  • Error: 413 Request Entity Too Large

Here is a stack trace from the debug option:
npx echogarden transcribe --debug --openAICloud.model=whisper-large-v3-turbo --openAICloud.apiKey='<api_key>' --engine=openai-cloud --openAICloud.baseURL=https://api.groq.com/openai/v1 --language=en "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.mkv" "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.srt"

Send request to https://api.groq.com/openai/v1.. APIConnectionError: Connection error.
    at OpenAI.makeRequest (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:316:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/recognition/OpenAICloudSTT.js:31:20)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/api/Recognition.js:163:41)
    at async transcribe (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:451:115)
    at async startWithArgs (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:222:13)
    at async start (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:149:9) {
  status: undefined,
  headers: undefined,
  request_id: undefined,
  error: undefined,
  code: undefined,
  param: undefined,
  type: undefined,
  cause: FetchError: request to https://api.groq.com/openai/v1/audio/transcriptions failed, reason: read ECONNRESET
      at ClientRequest.<anonymous> (/home/user/.npm/_npx/e9225f775dff863b/node_modules/node-fetch/lib/index.js:1501:11)
      at ClientRequest.emit (node:events:524:28)
      at emitErrorEvent (node:_http_client:104:11)
      at TLSSocket.socketErrorListener (node:_http_client:518:5)
      at TLSSocket.emit (node:events:536:35)
      at emitErrorNT (node:internal/streams/destroy:170:8)
      at emitErrorCloseNT (node:internal/streams/destroy:129:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
    type: 'system',
    errno: 'ECONNRESET',
    code: 'ECONNRESET'
  }
}

When using the official OpenAI API, I get a similar error:
npx echogarden transcribe --debug --openAICloud.model=whisper-1 --openAICloud.apiKey='<api_key>' --engine=openai-cloud --language=en "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.mkv" "My Business Is In Danger - WAN Show February 7, 2025-xX4mBbJjdYM.srt"

APIError: 413 413: Maximum content size limit (26214400) exceeded (26362076 bytes read)
    at APIError.generate (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/error.mjs:64:16)
    at OpenAI.makeStatusError (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:286:25)
    at OpenAI.makeRequest (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/openai/core.mjs:330:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/recognition/OpenAICloudSTT.js:31:20)
    at async Module.recognize (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/api/Recognition.js:163:41)
    at async transcribe (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:451:115)
    at async startWithArgs (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:222:13)
    at async start (file:///home/user/.npm/_npx/e9225f775dff863b/node_modules/echogarden/dist/cli/CLI.js:149:9) {
  status: 413,
  headers: {
    'access-control-expose-headers': 'X-Request-ID',
    'alt-svc': 'h3=":443"; ma=86400',
    'cf-cache-status': 'DYNAMIC',
    'cf-ray': '910a50bb9c4c9288-MUC',
    connection: 'keep-alive',
    'content-length': '176',
    'content-type': 'application/json',
    date: 'Wed, 12 Feb 2025 05:53:33 GMT',
    'openai-organization': 'org',
    'openai-processing-ms': '6365',
    'openai-version': '2020-10-01',
    server: 'cloudflare',
    'set-cookie': '<redacted>',
    'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
    via: 'envoy-router-56b7c7f47-2pjgd',
    'x-content-type-options': 'nosniff',
    'x-envoy-upstream-service-time': '545',
    'x-ratelimit-limit-requests': '10000',
    'x-ratelimit-remaining-requests': '9999',
    'x-ratelimit-reset-requests': '6ms',
    'x-request-id': '<redacted>'
  },
  request_id: '<redacted>',
  error: {
    message: '413: Maximum content size limit (26214400) exceeded (26362076 bytes read)',
    type: 'server_error',
    param: null,
    code: null
  },
  code: null,
  param: null,
  type: 'server_error'
}

@rotemdan rotemdan changed the title from "Large files connection error with groq/openai api" to "Split large audio files before sending them to cloud speech recognition services like OpenAI or Groq" Feb 15, 2025
@rotemdan rotemdan added the recognition and feature labels Feb 15, 2025
@rotemdan
Member

Thanks for the report. I've tagged this as a feature suggestion.

It's possible to pre-split the audio into chunks before sending it to the cloud provider.

This would require incorporating some voice activity detection to find good split points.
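
As a rough illustration only (this is not an existing Echogarden implementation; the function names, sample rate, and thresholds below are invented for the example), split points could be chosen near low-energy regions so that each chunk stays well under the provider's request size limit (the 413 error above reports a 26214400-byte maximum):

// Minimal sketch: pick chunk boundaries in decoded mono audio using a
// simple RMS-energy silence heuristic. A real solution would use a proper
// VAD model rather than a fixed threshold.

const sampleRate = 16000

// Find split points near silence so each chunk stays under maxChunkSeconds
// (10 minutes of 16 kHz mono 16-bit audio is roughly 19 MB as WAV, comfortably
// below the ~25 MB limit reported in the 413 error).
function findSplitPoints(samples: Float32Array, maxChunkSeconds = 600): number[] {
	const frameSize = Math.round(sampleRate * 0.03) // 30 ms analysis frames
	const silenceThreshold = 0.01                   // RMS below this counts as silence
	const maxChunkSamples = maxChunkSeconds * sampleRate

	const splitPoints: number[] = []
	let chunkStart = 0
	let lastSilentFrameStart = -1

	for (let frameStart = 0; frameStart + frameSize <= samples.length; frameStart += frameSize) {
		// Compute the RMS energy of the current frame
		let sumSquares = 0
		for (let i = frameStart; i < frameStart + frameSize; i++) {
			sumSquares += samples[i] * samples[i]
		}
		const rms = Math.sqrt(sumSquares / frameSize)

		if (rms < silenceThreshold) {
			lastSilentFrameStart = frameStart
		}

		// When the current chunk reaches its maximum length, split at the most
		// recently seen silent frame, or right here if no silence was found.
		if (frameStart - chunkStart >= maxChunkSamples) {
			const splitPoint = lastSilentFrameStart > chunkStart ? lastSilentFrameStart : frameStart
			splitPoints.push(splitPoint)
			chunkStart = splitPoint
			lastSilentFrameStart = -1
		}
	}

	return splitPoints
}

// Cut the samples at the computed split points
function splitToChunks(samples: Float32Array, splitPoints: number[]): Float32Array[] {
	const boundaries = [0, ...splitPoints, samples.length]
	const chunks: Float32Array[] = []

	for (let i = 0; i < boundaries.length - 1; i++) {
		chunks.push(samples.subarray(boundaries[i], boundaries[i + 1]))
	}

	return chunks
}

Each chunk would then be encoded and sent to the recognition endpoint separately, and the timestamps in each chunk's results offset by that chunk's start time before the transcripts are merged.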

@rotemdan rotemdan changed the title from "Split large audio files before sending them to cloud speech recognition services like OpenAI or Groq" to "Split long audio to chunks before sending it to cloud speech recognition services like OpenAI or Groq" Feb 15, 2025