Support Audio Datatypes #178

Open
HAKSOAT opened this issue Jan 12, 2025 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

HAKSOAT (Collaborator) commented Jan 12, 2025

This task is not fully formed yet, but the idea is to take one step towards supporting audio data.

For text data, we tokenize the input and pass the tokens into the model.
For image data, we resize the image, convert it to an ndarray, etc., and then pass it into the model.

What should happen in the case of audio data?

That is the question this issue looks to answer.

The user should be able to pass in the bytes of an audio file, which we then read and process minimally (generic processing is fine; it does not need to be model-specific) to produce the input type the model supports.

This would let us add an extra input type here:

```rust
pub enum ModelInput {
```

For example:

```rust
#[derive(Debug)]
pub enum ModelInput {
    Texts(Vec<Encoding>),
    Images(Array<f32, Ix4>),
    Audios(...)
}
```
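One possible shape for the `Audios(...)` payload, sketched under assumptions: each clip is already decoded to a mono `f32` waveform, and a batch is zero-padded to its longest clip, much as images are stacked into a single `Array<f32, Ix4>`. `AudioBatch`, `from_clips` and `clip` are hypothetical names, and the sketch stores a flat row-major `Vec<f32>` so it needs no crates; with `ndarray`, the same buffer could become an `Array<f32, Ix2>` via `Array2::from_shape_vec((batch_size, num_samples), data)`.

```rust
/// Hypothetical batch of mono waveforms, zero-padded to a common length.
/// Stored row-major with shape (batch_size, num_samples).
#[derive(Debug, Clone, PartialEq)]
pub struct AudioBatch {
    /// Number of clips in the batch.
    pub batch_size: usize,
    /// Samples per clip after padding (length of the longest clip).
    pub num_samples: usize,
    /// Flattened samples; clip `i` occupies `data[i * num_samples..(i + 1) * num_samples]`.
    pub data: Vec<f32>,
}

impl AudioBatch {
    /// Copies each clip into its row and leaves the remainder as zeros (silence).
    pub fn from_clips(clips: &[Vec<f32>]) -> Self {
        let batch_size = clips.len();
        let num_samples = clips.iter().map(Vec::len).max().unwrap_or(0);
        let mut data = vec![0.0_f32; batch_size * num_samples];
        for (i, clip) in clips.iter().enumerate() {
            let start = i * num_samples;
            data[start..start + clip.len()].copy_from_slice(clip);
        }
        Self { batch_size, num_samples, data }
    }

    /// Returns the (padded) samples of clip `i`.
    pub fn clip(&self, i: usize) -> &[f32] {
        &self.data[i * self.num_samples..(i + 1) * self.num_samples]
    }
}
```

The variant could then be something like `Audios(AudioBatch)`; whether padding belongs here or in model-specific code is still an open question.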

Other places where this new data type would need to be reflected include:

```rust
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
}
```
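A hypothetical `Audio` variant for `ModelType`, mirroring how `Image` records the dimensions the model expects. The `Text` and `Image` variants are copied from the snippet above; the field names `expected_sample_rate` and `max_input_samples`, and the `is_audio` helper, are assumptions for illustration only, not existing code.

```rust
use std::num::NonZeroUsize;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
    // Hypothetical: what an audio model might need to declare.
    Audio {
        // Sample rate (Hz) the model expects its waveform input in.
        expected_sample_rate: NonZeroUsize,
        // Optional upper bound on samples per clip; `None` means no limit.
        max_input_samples: Option<NonZeroUsize>,
    },
}

impl ModelType {
    /// Returns true if this model type accepts audio input.
    pub fn is_audio(&self) -> bool {
        matches!(self, ModelType::Audio { .. })
    }
}
```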

Other useful pointers include:

How we currently read bytes for images:

```rust
pub fn try_new(bytes: Vec<u8>) -> Result<Self, AIProxyError> {
```
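An audio counterpart to `try_new` might look like the minimal sketch below. It assumes uncompressed 16-bit PCM WAV input only, averages all channels down to mono, and scales samples to `[-1.0, 1.0)`. `AudioInput` and `AudioError` are hypothetical (the real code would presumably return `AIProxyError`); supporting compressed formats such as MP3 or FLAC would likely mean using a decoding crate such as `symphonia` instead of hand-parsing bytes.

```rust
/// Hypothetical error type; the real code would presumably use `AIProxyError`.
#[derive(Debug, PartialEq, Eq)]
pub enum AudioError {
    /// The bytes do not start with a RIFF/WAVE header.
    NotWav,
    /// A WAV variant this sketch does not handle (e.g. 24-bit or float PCM).
    Unsupported(String),
    /// A chunk is truncated or the `data` chunk is missing.
    Malformed,
}

/// Hypothetical decoded audio: a mono waveform plus its sample rate in Hz.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioInput {
    pub sample_rate: u32,
    pub samples: Vec<f32>,
}

/// Reads a little-endian u16 at `at`, failing if the buffer is too short.
fn u16_le(b: &[u8], at: usize) -> Result<u16, AudioError> {
    b.get(at..at + 2)
        .map(|s| u16::from_le_bytes([s[0], s[1]]))
        .ok_or(AudioError::Malformed)
}

/// Reads a little-endian u32 at `at`, failing if the buffer is too short.
fn u32_le(b: &[u8], at: usize) -> Result<u32, AudioError> {
    b.get(at..at + 4)
        .map(|s| u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
        .ok_or(AudioError::Malformed)
}

impl AudioInput {
    /// Parses 16-bit PCM WAV bytes and averages all channels to mono.
    pub fn try_new(bytes: Vec<u8>) -> Result<Self, AudioError> {
        if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
            return Err(AudioError::NotWav);
        }
        let mut format_info: Option<(usize, u32)> = None; // (channels, sample_rate)
        let mut pos = 12;
        // Walk the RIFF chunks: 4-byte id, 4-byte little-endian size, then the body.
        while pos + 8 <= bytes.len() {
            let id = &bytes[pos..pos + 4];
            let size = u32_le(&bytes, pos + 4)? as usize;
            let body = pos + 8;
            let end = body
                .checked_add(size)
                .filter(|&e| e <= bytes.len())
                .ok_or(AudioError::Malformed)?;
            if id == b"fmt " {
                if size < 16 {
                    return Err(AudioError::Malformed);
                }
                let audio_format = u16_le(&bytes, body)?; // 1 = integer PCM
                let channels = u16_le(&bytes, body + 2)?;
                let sample_rate = u32_le(&bytes, body + 4)?;
                let bits = u16_le(&bytes, body + 14)?;
                if audio_format != 1 || bits != 16 || channels == 0 {
                    return Err(AudioError::Unsupported(format!(
                        "format={audio_format}, bits={bits}, channels={channels}"
                    )));
                }
                format_info = Some((channels as usize, sample_rate));
            } else if id == b"data" {
                let (channels, sample_rate) = format_info
                    .ok_or_else(|| AudioError::Unsupported("data chunk before fmt chunk".into()))?;
                let samples = bytes[body..end]
                    .chunks_exact(2 * channels) // one frame = one i16 per channel
                    .map(|frame| {
                        let sum: f32 = frame
                            .chunks_exact(2)
                            .map(|s| f32::from(i16::from_le_bytes([s[0], s[1]])) / 32768.0)
                            .sum();
                        sum / channels as f32
                    })
                    .collect();
                return Ok(Self { sample_rate, samples });
            }
            // Chunk bodies are padded to an even number of bytes.
            pos = end + (size & 1);
        }
        Err(AudioError::Malformed)
    }
}
```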

The pointers above do not cover every part of the code that needs to change to introduce audio support, so feel free to modify whatever else is needed.

Out of scope: model support and model-specific preprocessing.

The focus should be on getting audio into a format that can be sent to models, with as little processing as possible. For images, by analogy, this means simply resizing the image and converting it to an ndarray.
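For audio, the closest analogue to resizing an image is probably resampling the waveform to the sample rate the model expects. A minimal sketch using linear interpolation follows; `resample_linear` is a hypothetical helper, and because it applies no anti-aliasing filter, downsampling (e.g. 48 kHz to 16 kHz) can introduce aliasing, so treat it as a starting point rather than production-quality resampling.

```rust
/// Resamples a mono waveform from `from_rate` Hz to `to_rate` Hz by linearly
/// interpolating between neighbouring input samples.
/// No anti-aliasing filter is applied, so downsampling can alias.
pub fn resample_linear(samples: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    assert!(from_rate > 0 && to_rate > 0, "sample rates must be non-zero");
    if samples.is_empty() || from_rate == to_rate {
        return samples.to_vec();
    }
    // Output length scales with the rate ratio, rounded to the nearest sample.
    let (from, to) = (from_rate as u64, to_rate as u64);
    let out_len = ((samples.len() as u64 * to + from / 2) / from) as usize;
    // Distance, in input samples, between consecutive output samples.
    let step = from as f64 / to as f64;
    let last = samples.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so trailing output samples reuse the last input sample.
            let a = samples[idx.min(last)];
            let b = samples[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

Combined with the mono downmix, this would keep the generic step to "decode, downmix, resample", leaving anything else (normalization, spectrograms) to model-specific code.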

HAKSOAT added the enhancement and help wanted labels on Jan 12, 2025.