Support Audio Datatypes #178

HAKSOAT · 2025-01-12T22:54:30Z

This task is not fully formed yet, but the idea is to take one step towards supporting audio data.

In the case of text data, we tokenize the input and pass the resulting tokens into the model.
In the case of image data, we resize, convert to ndarray, etc., then pass the result into the model.

In the case of audio data, what is the equivalent step?

That is the question this issue looks to answer.

The user should be able to pass in the bytes of an audio file. We then read those bytes and apply basic processing (it is fine if this is generic rather than model-specific) to produce the input type supported by the model.

This would let us add an extra input type here:

ahnlich/ahnlich/ai/src/engine/ai/models.rs

Line 247 in 4ed8654

pub enum ModelInput {

Such as:

#[derive(Debug)]
pub enum ModelInput {
    Texts(Vec<Encoding>),
    Images(Array<f32, Ix4>),
    Audios(...)
}

Some other places where this new data type would reflect could be:

ahnlich/ahnlich/ai/src/engine/ai/models.rs

Lines 25 to 33 in 4ed8654

    
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
}
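As a rough illustration of why `ModelType` is one of the places affected, here is a minimal sketch of the enum with an extra variant. The `Audio` variant, its `expected_sample_rate_hz` field, and the `input_kind` helper are all hypothetical and do not exist in the codebase; the actual shape of an audio variant is still to be decided.

```rust
use std::num::NonZeroUsize;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelType {
    Text {
        max_input_tokens: NonZeroUsize,
    },
    Image {
        // width, height
        expected_image_dimensions: (NonZeroUsize, NonZeroUsize),
    },
    // Hypothetical variant: the field name and type are assumptions,
    // chosen only to mirror the Image variant above.
    Audio {
        expected_sample_rate_hz: NonZeroUsize,
    },
}

/// Hypothetical helper. The match has no wildcard arm, so adding a new
/// variant makes the compiler point at every match that must be updated.
pub fn input_kind(model_type: &ModelType) -> &'static str {
    match model_type {
        ModelType::Text { .. } => "text",
        ModelType::Image { .. } => "image",
        ModelType::Audio { .. } => "audio",
    }
}
```

Because matches like this are exhaustive, the compiler should surface most of the other places that need to learn about the new data type.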

Other useful pointers include:

How we currently read bytes for images:

ahnlich/ahnlich/ai/src/engine/ai/models.rs

Line 261 in 4ed8654

pub fn try_new(bytes: Vec<u8>) -> Result<Self, AIProxyError> {

The pointers above don't cover every piece of code that needs to change to introduce the new data type, so feel free to do whatever is needed.

Out of scope: Model Support, Specific Preprocessing

The focus should be on getting audio into a format that can be sent into models with as little processing as possible. For example, in the case of image processing, this would mean simply resizing the image and converting it to NdArray.
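To make "minimal processing" concrete for audio, here is a sketch that reads the bytes of a 16-bit PCM WAV file, averages the channels to mono, and scales samples to `f32` in the range [-1.0, 1.0). Everything here is an assumption for illustration: `DecodedAudio`, `AudioDecodeError`, and `decode_wav_pcm16` are hypothetical names, WAV is only one possible container, and real code would likely return `AIProxyError` and use a decoding crate rather than a hand-written parser.

```rust
/// Hypothetical error type for this sketch (the real code has AIProxyError).
#[derive(Debug, PartialEq)]
pub enum AudioDecodeError {
    NotWav,
    Unsupported,
    Truncated,
}

/// Hypothetical output: mono samples in [-1.0, 1.0) plus the source sample rate.
#[derive(Debug)]
pub struct DecodedAudio {
    pub sample_rate: u32,
    pub samples: Vec<f32>,
}

fn read_u16(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn read_u32(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Decode the bytes of a 16-bit PCM WAV file into mono f32 samples.
pub fn decode_wav_pcm16(bytes: &[u8]) -> Result<DecodedAudio, AudioDecodeError> {
    use AudioDecodeError::*;
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return Err(NotWav);
    }
    let mut pos = 12;
    let mut fmt: Option<(u16, u32)> = None; // (channels, sample_rate)
    // Walk the RIFF chunks: 4-byte id, 4-byte little-endian size, then the body.
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = read_u32(bytes, pos + 4).ok_or(Truncated)? as usize;
        let body = pos + 8;
        let end = body.checked_add(size).ok_or(Truncated)?;
        if end > bytes.len() {
            return Err(Truncated);
        }
        if id == b"fmt " {
            if size < 16 {
                return Err(Truncated);
            }
            let format = read_u16(bytes, body).ok_or(Truncated)?;
            let channels = read_u16(bytes, body + 2).ok_or(Truncated)?;
            let rate = read_u32(bytes, body + 4).ok_or(Truncated)?;
            let bits = read_u16(bytes, body + 14).ok_or(Truncated)?;
            // Only uncompressed (format tag 1) 16-bit PCM is handled here.
            if format != 1 || bits != 16 || channels == 0 {
                return Err(Unsupported);
            }
            fmt = Some((channels, rate));
        } else if id == b"data" {
            let (channels, sample_rate) = fmt.ok_or(Unsupported)?;
            let frame_len = 2 * channels as usize;
            // One frame holds one i16 sample per channel; average them to mono.
            let samples = bytes[body..end]
                .chunks_exact(frame_len)
                .map(|frame| {
                    let sum: f32 = frame
                        .chunks_exact(2)
                        .map(|s| i16::from_le_bytes([s[0], s[1]]) as f32 / 32768.0)
                        .sum();
                    sum / channels as f32
                })
                .collect();
            return Ok(DecodedAudio { sample_rate, samples });
        }
        pos = end + (size & 1); // chunk bodies are padded to an even length
    }
    Err(Truncated)
}
```

The sketch deliberately does no resampling, trimming, or feature extraction such as spectrograms, since those depend on the model and are out of scope for this issue.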


HAKSOAT added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels on Jan 12, 2025
