Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Incorrect/Inconsistent Fold Change Calculation #6654

Closed
nathanhaigh opened this issue Nov 8, 2022 · 6 comments
Closed

Incorrect/Inconsistent Fold Change Calculation #6654

nathanhaigh opened this issue Nov 8, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@nathanhaigh
Copy link

I think the fold changes are being calculated incorrectly in most places but not all. In essence, the log of the means is calculated instead of the mean of the logs. Or at least there is inconsistency in the way it is calculated.

In several places in the FoldChange() methods you have things like this:

return(log(x = rowMeans(x = expm1(x = x)) + pseudocount.use, base = base))

This shows the means are calculated on the non log-transformed data and then the mean values are subsequently logged. The means should be taken on the log-transformed data. The code in FoldChange.default() seems to do this correctly since the fold change is calculated by subtracting the mean expression of one group from another. That would only make sense if those values were already log-transformed otherwise I'd expect a division:

  # Calculate fold change
  data.1 <- mean.fxn(object[features, cells.1, drop = FALSE])
  data.2 <- mean.fxn(object[features, cells.2, drop = FALSE])
  fc <- (data.1 - data.2)

The consequence of calculating means before the log-transformation is that fold changes will be systematically over-estimated. For example, see: https://rpubs.com/hrlai/meanlog_logmean

@nathanhaigh nathanhaigh added the bug Something isn't working label Nov 8, 2022
@coralzhang
Copy link

Is this the reason for the discrepancy between the fold change calculated in FoldChange and FindMarkers? Is there a recommendation on how to proceed?
https://github.com/satijalab/seurat/issues/6773
https://github.com/satijalab/seurat/issues/6701

@lucygarner
Copy link

lucygarner commented Mar 11, 2023

@nathanhaigh, thank you for the explanation.

default.mean.fxn <- function(x) {
  return(log(x = rowMeans(x = x) + pseudocount.use, base = base))
}
mean.fxn <- mean.fxn %||% switch(
  EXPR = slot,
  'data' = switch(
    EXPR = norm.method %||% '',
    'LogNormalize' = function(x) {
      return(log(x = rowMeans(x = expm1(x = x)) + pseudocount.use, base = base))
    },
    default.mean.fxn
  ),
  'scale.data' = rowMeans,
  default.mean.fxn
)

I found the code above within the FoldChange.Assay function, which changes mean.fxn depending on the data slot.

@nathanhaigh
Copy link
Author

@coralzhang just tagging these other issues...in case they are related: #6701, #6773 and #6976

@caodudu
Copy link

caodudu commented Mar 13, 2023

@nathanhaigh @coralzhang @lucygarner @longmanz
I have tried to dissect the issue in #6701. I hope this will be helpful to you.

@saketkc
Copy link
Collaborator

saketkc commented Jul 6, 2023

Thanks for the bug report @nathanhaigh and thanks everyone for the patience. The fix for this and related issues is in the develop branch. You should be able to pull latest changes from the develop branch:

remotes::install_github("satijalab/seurat", ref="develop")

Please feel free to create a new issue if you still notice issues.

@saketkc saketkc closed this as completed Jul 6, 2023
@nathanhaigh
Copy link
Author

nathanhaigh commented Jul 7, 2023

@saketkc I don't think the recent commits on the develop branch fix the issue I reported here. As things stand on develop:

  • FoldChange.default(); FoldChange.DimReduc() calculate fold change correctly by assuming/doing the following:
    1. Assumes expression values are already logged (see iii below)
    2. Calculate the means of the two groups (on logged data)
    3. Calculate the fold change by subtracting data.2 from data.1 (this implies the values were already log-transformed, otherwise subtraction makes no sense)
  • FoldChange.Assay() (lines 1107 and 1111) and FoldChange.SCTAssay() (lines 1177 and 1184) calculates fold change incorrectly by doing the following:
    1. Calculates means on non-logged data
    2. Calculates the log of the rowMeans

For FoldChange.Assay() and FoldChange.SCTAssay(), lines similar to these that log transform after taking the means will over-estimate the fold change (https://rpubs.com/hrlai/meanlog_logmean):

return(log(x = rowMeans(x = expm1(x = x)) + pseudocount.use, base = base))

They should be changed so the log transform is done before taking the means:

return(rowMeans(log(expm1(x = x) + pseudocount.use, base = base)))

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants