PCA behavior #367

Open
chasemc opened this issue Nov 12, 2024 · 3 comments

@chasemc (Member) commented Nov 12, 2024

Would it be okay to switch:

    if n_components > pca_dimensions and pca_dimensions != 0:
        logger.debug(
            f"Performing decomposition with PCA (seed {seed}): {n_components} to {pca_dimensions} dims"
        )
        X = PCA(n_components=pca_dimensions, random_state=random_state).fit_transform(X)
        # X = PCA(n_components='mle').fit_transform(X)
        n_samples, n_components = X.shape

to something that adapts to a lower PCA dimension when there aren't enough contigs/k-mers:

    if n_components > pca_dimensions and pca_dimensions != 0:
        if n_samples < pca_dimensions:
            # Cap the target dimensionality so PCA doesn't fail when there
            # are fewer samples (contigs) than requested components.
            logger.warning(
                f"n_samples ({n_samples}) is less than pca_dimensions ({pca_dimensions}), lowering pca_dimensions to {min(n_samples, n_components)}."
            )
            pca_dimensions = min(n_samples, n_components)
        logger.debug(
            f"Performing decomposition with PCA (seed {seed}): {n_components} to {pca_dimensions} dims"
        )
        X = PCA(n_components=pca_dimensions, random_state=random_state).fit_transform(X)
        n_samples, n_components = X.shape
@chasemc (Member, Author) commented Nov 12, 2024

To be clear: as written, this would only kick in when there are fewer "samples" (contigs) than PCA dimensions.
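
For reference, a minimal sketch of the failure this guards against (assuming scikit-learn and NumPy; the 10 × 5000 shape is made up to mimic a tiny test dataset of 10 contigs with 5000 k-mer features). scikit-learn's PCA raises a ValueError whenever n_components exceeds min(n_samples, n_features), and the proposed cap avoids that:

    # Hypothetical tiny dataset: 10 "contigs" x 5000 k-mer features.
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.default_rng(42).random((10, 5000))
    n_samples, n_components = X.shape
    pca_dimensions = 50

    try:
        # Fails: n_components must be <= min(n_samples, n_features) == 10.
        PCA(n_components=pca_dimensions).fit_transform(X)
    except ValueError as err:
        print(err)

    # With the proposed cap, the same call succeeds.
    pca_dimensions = min(n_samples, n_components)
    X_reduced = PCA(n_components=pca_dimensions).fit_transform(X)
    print(X_reduced.shape)  # (10, 10)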

@jason-c-kwan (Collaborator) commented

What would the point be of doing PCA on a dataset with fewer than 50 contigs before some other dimension-reduction technique? I think, before making this change, some data should be gathered on whether it is useful or makes a difference.

@chasemc (Member, Author) commented Nov 12, 2024

The main reason is so that a minimal dataset, one that doesn't take forever to run, doesn't fail when testing the workflows.
