Update WhisperPlus version and fix example URL #9

Merged · 1 commit · Nov 22, 2023
40 changes: 37 additions & 3 deletions README.md
@@ -20,6 +20,10 @@
pip install whisperplus
```

## 🤗 Model Hub

You can find the models on [HuggingFace Spaces](https://huggingface.co/spaces/ArtGAN/WhisperPlus) or on the [HuggingFace Model Hub](https://huggingface.co/models?search=whisper).

## 🎙️ Usage

To use the whisperplus library, follow the steps below for different tasks:
@@ -29,12 +33,42 @@ To use the whisperplus library, follow the steps below for different tasks:
```python
from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

url = "https://www.youtube.com/watch?v=6Dh-RL__uN4"
url = "https://www.youtube.com/watch?v=di3rHkEZuUw"
video_path = download_and_convert_to_mp3(url)
pipeline = SpeechToTextPipeline()
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")
transcript = pipeline(
audio_path=video_path, model_id="openai/whisper-large-v3", language="turkish"
audio_path=video_path, model_id="openai/whisper-large-v3", language="english"
)

print(transcript)
```
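If you already have a local audio file, the same pipeline can be pointed at it directly. The sketch below is illustrative only: it assumes `audio_path` also accepts a local MP3 path and that any Whisper checkpoint from the Model Hub (here `openai/whisper-medium`, chosen purely as an example) can be passed as `model_id`.

```python
from whisperplus import SpeechToTextPipeline

# Hypothetical local-file usage; the path and checkpoint are placeholders.
audio_path = "recording.mp3"
pipeline = SpeechToTextPipeline(model_id="openai/whisper-medium")
transcript = pipeline(
    audio_path=audio_path, model_id="openai/whisper-medium", language="english"
)
print(transcript)
```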

### Contributing

```bash
pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files
```

## 📜 License

This project is licensed under the terms of the Apache License 2.0.

## 🤗 Acknowledgments

This project is based on the [HuggingFace Transformers](https://github.com/huggingface/transformers) library.

## 🤗 Citation

```bibtex
@misc{radford2022whisper,
doi = {10.48550/ARXIV.2212.04356},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
```
2 changes: 1 addition & 1 deletion whisperplus/__init__.py
@@ -1,7 +1,7 @@
from whisperplus.pipelines.whisper import SpeechToTextPipeline
from whisperplus.utils.download_utils import download_and_convert_to_mp3

__version__ = '0.0.3'
__version__ = '0.0.4'
__author__ = 'kadirnar'
__license__ = 'Apache License 2.0'
__all__ = ['']
15 changes: 14 additions & 1 deletion whisperplus/app.py
@@ -5,6 +5,19 @@


def main(url, model_id, language_choice):
"""
Main function that downloads and converts a video to MP3 format, performs speech-to-text conversion using
a specified model, and returns the transcript along with the video path.

Args:
url (str): The URL of the video to download and convert.
model_id (str): The ID of the speech-to-text model to use.
language_choice (str): The language choice for the speech-to-text conversion.

Returns:
transcript (str): The transcript of the speech-to-text conversion.
video_path (str): The path of the downloaded video.
"""
video_path = download_and_convert_to_mp3(url)
pipeline = SpeechToTextPipeline(model_id)
transcript = pipeline(audio_path=video_path, model_id=model_id, language=language_choice)
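For reference, a minimal standalone call to this function (outside the Gradio UI) might look like the sketch below; the URL and model choice are placeholders, and the unpacking assumes the function returns the `(transcript, video_path)` pair described in the docstring.

```python
# Hypothetical direct call to main(); all arguments are placeholders.
transcript, video_path = main(
    url="https://www.youtube.com/watch?v=di3rHkEZuUw",
    model_id="openai/whisper-large-v3",
    language_choice="English",
)
print(video_path)
print(transcript)
```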
@@ -61,7 +74,7 @@ def app():
gr.Examples(
examples=[
[
"https://www.youtube.com/watch?v=HDX8BE2Pje8",
"https://www.youtube.com/watch?v=di3rHkEZuUw",
"openai/whisper-large-v3",
"English",
],