Updating to latest #1

Open
wants to merge 20 commits into
base: main

saamerm
Owner

@saamerm saamerm commented Sep 28, 2024

No description provided.

ZachNagengast and others added 20 commits July 11, 2024 11:55
* Update resampling logic to handle chunking properly

* Cleanup logging

* Optimize memory usage when resampling

* Add filter to input prompt text

* Correct timestamp filter logic for #170

* Filter out zero length segments

- when calculating word timestamps
- resolves #170

* Add method for async audio loading

* Fix async load audio function

* Fix tests

* Fix tests

* Fix tests

* Revert timestamp filter changes

* Temporarily remove xcpretty for tests

* Check suspected test crash

* Remove errant test case for japanese options

* Add bigger range for early stopping test

* Reset progress between runs

* Fix progress resetting and improve example app transcription handling

* Update tests

* Minimize crash risk for early stop checks

* Fix finalize text

* Add source text to language label
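
The "Filter out zero length segments" commit above gives no implementation detail. A minimal sketch of the idea, assuming word timings are plain start/end pairs (`WordTimingSketch` and `dropZeroLength` are hypothetical names, not WhisperKit API):

```swift
// Hypothetical word-timing record; the real WhisperKit type may differ.
struct WordTimingSketch: Equatable {
    let word: String
    let start: Float
    let end: Float
}

// Keep only words whose end time is strictly after their start time.
func dropZeroLength(_ words: [WordTimingSketch]) -> [WordTimingSketch] {
    words.filter { $0.end > $0.start }
}

let timings = [
    WordTimingSketch(word: "hello", start: 0.0, end: 0.4),
    WordTimingSketch(word: "", start: 0.4, end: 0.4), // zero length, dropped
    WordTimingSketch(word: "world", start: 0.4, end: 0.9),
]
let kept = dropZeroLength(timings)
```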
…lity (#192)

* Make additional initializers, functions, members public, for WKPro

* Allows use of default internal functions & member accesses which have
  increased protections when imported

* Initializers were Xcode generated: right click class name -> refactor
  -> generate memberwise initializers
   * memberwise initializer defaults to internal, mark as public.

* Formatting

---------

Co-authored-by: ZachNagengast <znagengast@gmail.com>
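
The access-level point in the commit above can be illustrated with a small sketch. `DecodingSettingsSketch` and its fields are hypothetical and not taken from the WhisperKit source; the only fact relied on is that Swift's synthesized memberwise initializer is at most `internal`.

```swift
// Hypothetical type for illustration only.
public struct DecodingSettingsSketch {
    public var temperature: Float
    public var sampleLength: Int

    // Without this explicit declaration, the synthesized memberwise
    // initializer would be internal, so code importing the module
    // could not construct the type.
    public init(temperature: Float = 0.0, sampleLength: Int = 224) {
        self.temperature = temperature
        self.sampleLength = sampleLength
    }
}

let settings = DecodingSettingsSketch(temperature: 0.5)
```

Within a single file the `public` modifier changes nothing observable; the difference only appears when the type is used from another module.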
… models (#193)

* Add initial mlpackage loading (if .mlmodelc not present)

-- Does not modify model loading in OS WK.  This is a hook to modify
load path URLs.

* Always load audio encoder last

* Adjust timings to account for decoder<>encoder order swap

* Add helper for mlpackage detection

---------

Co-authored-by: ZachNagengast <znagengast@gmail.com>
* Fix start time logic for file loading and resampling

* Add test file
As far as I can tell, these stored properties are not meant to be changed. Therefore, change them to be immutable. This change also makes these static properties concurrency-safe.
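
A minimal sketch of the kind of change described above, assuming the properties are simple constants (the enum name and values here are illustrative, not the actual WhisperKit declarations):

```swift
// Hypothetical constants namespace.
enum AudioConstantsSketch {
    // `static let` is initialized once, lazily, in a thread-safe way and
    // cannot be reassigned. A `static var` would be shared mutable state.
    static let sampleRate = 16_000
    static let windowSeconds = 30

    // Derived values can stay computed; they hold no mutable storage.
    static var windowSamples: Int { sampleRate * windowSeconds }
}

let windowSamples = AudioConstantsSketch.windowSamples
```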
* Add VoiceActivityDetector base class

Add base class to allow different VAD implementations

* fix spaces
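
To show how a base class allows different VAD implementations, here is a hedged sketch. The class names, the `voiceActivity(in:)` method, and the energy threshold are all assumptions made for illustration; the real `VoiceActivityDetector` API may look different.

```swift
// Hypothetical base class; subclasses supply the detection strategy.
class VoiceActivityDetectorSketch {
    let frameLength: Int

    init(frameLength: Int) {
        precondition(frameLength > 0, "frameLength must be positive")
        self.frameLength = frameLength
    }

    // Returns one flag per frame; must be overridden.
    func voiceActivity(in samples: [Float]) -> [Bool] {
        fatalError("Subclasses must override voiceActivity(in:)")
    }
}

// One possible implementation: mean squared energy per frame.
final class EnergyDetectorSketch: VoiceActivityDetectorSketch {
    let threshold: Float

    init(frameLength: Int, threshold: Float) {
        self.threshold = threshold
        super.init(frameLength: frameLength)
    }

    override func voiceActivity(in samples: [Float]) -> [Bool] {
        stride(from: 0, to: samples.count, by: frameLength).map { start in
            let frame = samples[start..<min(start + frameLength, samples.count)]
            let energy = frame.reduce(0) { $0 + $1 * $1 } / Float(frame.count)
            return energy > threshold
        }
    }
}

// Callers depend only on the base type, so implementations can be swapped.
let detector: VoiceActivityDetectorSketch = EnergyDetectorSketch(frameLength: 4, threshold: 0.01)
let flags = detector.voiceActivity(in: [0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5])
```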
* CI fetch depth 0

* VAD refactoring

* Update logo

* Add WhisperKitConfig

* Open whisperkit methods

* add missing @available

---------

Co-authored-by: BlaiseMuhirwa <blaisemuhirwa3@gmail.com>
Co-authored-by: ZachNagengast <znagengast@gmail.com>
* Add model support config fetching from model repo

* Fix audio start index error handling

Co-authored-by: 1amageek <1amageek@users.noreply.github.com>

* Formatting

* Fix CI + watchOS build

- New github runner image does not include visionOS, so to prevent downloading for all platforms this will specify the platform from the test matrix

* Fix typo

* Use dispatch group for sync recommendedModels

* Remove sync remote model fetching

* Formatting and cleanup from review

---------

Co-authored-by: 1amageek <1amageek@users.noreply.github.com>
* Release memory when transcribing single files

Co-authored-by: keleftheriou <keleftheriou@users.noreply.github.com>

* Add method to load from file into float array iteratively

- Reduces peak memory by doing the array conversion while loading in chunks so the array copy size is lower
- Previously copied the entire buffer which spiked the memory 2x

* Fix leak

* Use vad by default in examples

* Fix vad thread issue

* Fix unused warning

* Revert change to early stop callback

* Fix warnings

- Optional cli commands are deprecated
- @_disfavoredOverload required @available to prevent infinite loop

* PR review - simplify early stop test logic

Co-authored-by: Andrey Leonov <aleonov@gmail.com>

* Cleanup from review

---------

Co-authored-by: keleftheriou <keleftheriou@users.noreply.github.com>
Co-authored-by: Andrey Leonov <aleonov@gmail.com>
Co-authored-by: Arda Atahan Ibis <ardaibis@gmail.com>
Co-authored-by: ZachNagengast <znagengast@gmail.com>
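
The iterative loading idea from the commit above can be sketched without AVFoundation. In this sketch `readChunk` is a hypothetical stand-in for reading a range of frames from an audio file; the function name and signature are assumptions, not the WhisperKit method.

```swift
// Hypothetical helper: converts a source to [Float] chunk by chunk, so only
// one chunk-sized temporary exists at a time instead of a full second copy.
func loadSamplesSketch(
    totalFrames: Int,
    chunkSize: Int,
    readChunk: (Range<Int>) -> [Float]
) -> [Float] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var samples: [Float] = []
    samples.reserveCapacity(totalFrames) // allocate the output once up front
    var start = 0
    while start < totalFrames {
        let end = min(start + chunkSize, totalFrames)
        samples.append(contentsOf: readChunk(start..<end))
        start = end
    }
    return samples
}

// Usage with an in-memory array standing in for the file.
let source = (0..<10).map { Float($0) }
var chunkCount = 0
let loaded = loadSamplesSketch(totalFrames: source.count, chunkSize: 4) { range in
    chunkCount += 1
    return Array(source[range])
}
```

With ten frames and a chunk size of four, the reader is called three times, and the last chunk is shorter than the others.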
* Fix xcconfig tracking

* Add package.swift docs to readme

* Fix edge case where framePosition does not align with actual frame count of AVAudioFile

* Upgrade github runner macos version

* Update remaining github runner versions

* Use WERUtils to check vad accuracy

* Reduce calls to frameposition

* Fix xcode version for runner
* SegmentDiscovery callback

* ModelState callback

* FractionCompleted callback

* TranscriptionPhaseCallback callback

* Updates for review

* Formatting

* Remove remaining callback from init

---------

Co-authored-by: ZachNagengast <znagengast@gmail.com>
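
The callbacks added above are named but not described. One plausible shape, given that the last commit removes callbacks from `init`, is optional closure properties set after construction. The class, property names, and closure signatures below are hypothetical.

```swift
// Hypothetical transcriber that reports progress through optional closures.
final class TranscriberSketch {
    typealias SegmentCallback = ([String]) -> Void
    typealias ProgressCallback = (Double) -> Void

    // Set after construction rather than passed to init.
    var onSegmentsDiscovered: SegmentCallback?
    var onFractionCompleted: ProgressCallback?

    func run(segments: [String]) {
        for (index, segment) in segments.enumerated() {
            onSegmentsDiscovered?([segment])
            onFractionCompleted?(Double(index + 1) / Double(segments.count))
        }
    }
}

let transcriber = TranscriberSketch()
var discovered: [String] = []
var progress: [Double] = []
transcriber.onSegmentsDiscovered = { discovered.append(contentsOf: $0) }
transcriber.onFractionCompleted = { progress.append($0) }
transcriber.run(segments: ["first", "second"])
```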
* Use @frozen for loglevel enum

* Add public accessibility for loggingcallback and loglevel
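
A sketch of what a frozen, public log level plus a public logging callback type might look like. The names, cases, and raw values are assumptions for illustration and are not copied from WhisperKit.

```swift
// Hypothetical log level. `@frozen` promises that no cases will be added
// later, which lets clients switch over it exhaustively. It only affects
// modules built with library evolution enabled.
@frozen
public enum LogLevelSketch: Int, Comparable {
    case debug = 1
    case info
    case error
    case none

    public static func < (lhs: LogLevelSketch, rhs: LogLevelSketch) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

// Hypothetical public callback type so client code can supply its own sink.
public typealias LoggingCallbackSketch = (String) -> Void

var captured: [String] = []
let callback: LoggingCallbackSketch = { captured.append($0) }
let minimumLevel = LogLevelSketch.info

// Only messages at or above the minimum level reach the callback.
for (level, message) in [(LogLevelSketch.debug, "verbose"), (LogLevelSketch.error, "failed")] {
    if level >= minimumLevel { callback(message) }
}
```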

8 participants