vis_compare expansion #109

Open
Maschette opened this issue Jan 30, 2019 · 7 comments

Comments

@Maschette

Suggestion for a new function that would essentially be an extension of vis_compare, but with colors to specify what each 'not same' change is. Admittedly this is a limited use case, as with too many variables it would be super messy. The main purpose I was thinking of is tracking changes in genetics technical replicates, where the possible values are 0, 1, 2, NA and you want to keep track of what these values change to between replicates.

Halfway through writing this I thought I may as well have a crack. This works, although the 'new bit' could be better; I'm primarily a base kid, so that is what I did it in rather than mutate. Also, I didn't use your colors and went with viridis. Finally, the only thing I didn't do, out of laziness, is convert NAs used in from/to scenarios to "NA" so that ggplot doesn't remove them and they get a color.

You could also not implement this if you think it is weird and that would be super fine.

vis_compare_new <- function(df1,
                        df2, type="same"){

 
  # throw error if df1 not data.frame
  visdat:::test_if_dataframe(df1)

  # throw error if df2 not data.frame
  visdat:::test_if_dataframe(df2)

  if (!identical(dim(df1), dim(df2))) {
    stop("vis_compare requires identical dimensions of df1 and df2")
  }

    v_identical <- Vectorize(identical)
    df_diff <- purrr::map2_df(df1,df2, v_identical)
    head(df_diff)
    d <- df_diff %>% as.data.frame() %>% purrr::map_df(visdat:::compare_print) %>% 
        visdat:::vis_gather_() %>% dplyr::mutate(value_df1 = visdat:::vis_extract_value_(df1), 
        value_df2 = visdat:::vis_extract_value_(df2))
#The new bit
    if (type!="same"){
    cols<-c('value_df1','value_df2' )
    d$fctr <- apply( d[ , cols ] , 1 , paste , collapse = "-" )
    d$fctr[d$valueType=="same"]<-"same"
    d$value_df1<-as.character(d$value_df1)
    d$value_df2<-as.character(d$value_df2)
    d[,cols][d$valueType=="same",]<-"same"
    }
    fillType<-dplyr::case_when(
      type == "same"~"valueType", 
      type == "from"~"value_df1", 
      type == "to"~ "value_df2",
      type == "both"~"fctr")
    
ggplot2::ggplot(data = d, ggplot2::aes_string(x = "variable", y = "rows")) + 
  ggplot2::geom_raster(ggplot2::aes_string(fill = fillType)) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, vjust = 1, hjust = 1)) + 
  ggplot2::labs(x = "", y = "Observations", fill = "Cell Type") + 
 # ggplot2::scale_fill_manual(limits = c("same", "different"), breaks = c("same", "different"), values = c("#fc8d59", 
  #      "#91bfdb"), na.value = "grey") + 
  ggplot2::scale_y_reverse() + 
  ggplot2::theme(axis.text.x = ggplot2::element_text(hjust = 0.25)) + 
  ggplot2::scale_x_discrete(position = "top", limits = names(df_diff))+viridis::scale_fill_viridis(discrete = TRUE)
}

vis_compare_new(df1, df2, type = "both")
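
The labelling step at the heart of the function above can be sketched in base R on its own. This is only an illustration, not visdat code: `label_changes()`, `rep1` and `rep2` are made-up names, and it compares values with `==` rather than `identical()`, so it assumes both vectors have the same type. It also applies the NA-to-"NA" conversion that the post mentions as not yet done.

```r
# Hypothetical helper (not part of visdat): label each cell as "same" or
# "<from>-<to>", keeping NA as the string "NA" so it can get its own colour.
label_changes <- function(from, to) {
  stopifnot(length(from) == length(to))
  # Two NAs count as "same"; otherwise compare the values directly.
  same <- (is.na(from) & is.na(to)) |
    (!is.na(from) & !is.na(to) & from == to)
  from_chr <- ifelse(is.na(from), "NA", as.character(from))
  to_chr <- ifelse(is.na(to), "NA", as.character(to))
  ifelse(same, "same", paste(from_chr, to_chr, sep = "-"))
}

# Made-up replicate calls using the 0/1/2/NA coding described above.
rep1 <- c(NA, 2, 1, 2, 0)
rep2 <- c(NA, 2, 2, NA, 0)
changes <- label_changes(rep1, rep2)
print(changes)
```

Each distinct label then becomes one level of the fill scale, which is why the number of colours grows quickly as the number of possible values increases.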
@njtierney
Collaborator

Heya @Maschette ! :)

Thanks for putting the time into this :)

Do you think you could provide an example of the kinds of data you were imagining being compared here? I think that the idea is worthwhile exploring!

@Maschette
Author

Hey @njtierney,
No worries, it was surprisingly quick. I have been thinking on this and it may be worth making it a new function, vis_compare_dif maybe? The idea would then be to add a rm.same option for when you want to filter out the cells that are the same and just display the differences.
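
One way the rm.same idea could look, sketched in base R under assumptions: the long-format data frame `cells`, its `change` column, and the helper `drop_same()` with its `rm_same` argument are all hypothetical stand-ins, not anything that exists in visdat.

```r
# Hypothetical long-format data, one row per cell, as a plotting step might use.
cells <- data.frame(
  rows = c(1, 2, 3, 4),
  variable = "x",
  change = c("same", "1-2", "same", "2-NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `rm_same` mirrors the rm.same option suggested above.
drop_same <- function(d, rm_same = FALSE) {
  if (rm_same) d[d$change != "same", , drop = FALSE] else d
}

only_changes <- drop_same(cells, rm_same = TRUE)
print(only_changes)
```

Dropping rows outright leaves gaps in the row positions of a raster plot, so an alternative design would be to keep the rows and blank out the "same" cells instead.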

Anyway, use case: this is a subset of genetics data from a technical replicate.

x1<-data.frame(x = c(NA, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 
NA, 2, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, 2, 2, 2, 2, 2, 0, 2, NA, 
0, 2, 2, 2, 2, 2, 2, 2, NA, 0, 2, NA, NA, 0, 2, 2, NA, 2, 2, 
NA, 1, NA, 2, NA, 2, NA, 2, 0, NA, 2, 2, 0, NA, 2, NA, 2, 2, 
NA, 2, 0, NA, 2, 2, 2, 2, NA, 2, 2, 2, NA, NA, NA, NA, 2, NA, 
2, NA, NA, 2, 2, NA, NA, NA, 2, NA, 2, 2, NA, NA), y = c(NA, 
2, 1, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 
NA, 2, 2, 2, 2, NA, 2, 0, 2, NA, 0, 2, 2, 2, 2, 2, 2, 2, 0, NA, 
2, NA, NA, 0, 2, 2, NA, 2, 2, 2, NA, NA, 2, 2, 2, NA, 2, 0, 2, 
2, 2, 0, 0, 2, NA, NA, 2, NA, 2, 0, NA, 2, 2, 2, 2, 2, 2, NA, 
2, 2, 2, 2, NA, 2, NA, 2, NA, NA, 2, 2, NA, 2, NA, 2, NA, 2, 
2, NA, NA))

vis_compare_new(x1[1], x1[2], type = "both")

image

The other option would be to display both columns of data and show side by side how they are different.
vis_compare_new(x1, rev(x1), type = "from")
image

@njtierney
Collaborator

I like this a lot!

I would like to include this in visdat!

Two things to think about:

  1. The name - something to indicate that it is visualising the change/state/shift? Would vis_shift make sense to you?

  2. The documentation about expected use - here it seems that comparison of columns is the focus, is that correct?

Thanks again for taking the time to do this, I really like it!

@Maschette
Author

Hi Nick,
On your two points:
1: I like vis_change, or maybe vis_diff?
2: Yeah column comparison is the main use case I was thinking of but you should be able to use it for other things.

@njtierney
Collaborator

Hi Dale,

  1. How about plural: vis_changes() or do you think singular vis_change() makes more sense for you? vis_diff() makes me think of git diff, but maybe that evokes more of what you think this would be used for?

  2. Sounds good!

  3. I'm not sure about the option type = "both"/"from" - perhaps something more verbose like show_both?

  4. Would this work for two data.frames?

@njtierney njtierney added this to the V0.6.0 milestone Feb 4, 2019
@Maschette
Author

  1. vis_change() sounds good
  2. There is also a type = "to", so if you know all your data should be, for example, 0, you can see what it changes to. So maybe show = "both" as the default?
  3. yes it does.
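
The `show` option with "both" as its default could be handled with base R's `match.arg()`, as in the minimal sketch below. `fill_column()` is a hypothetical helper; the column names it returns are the ones used in the function posted in this thread. Unlike a `case_when()` with no matching branch, which returns NA, this stops with an error when an unsupported value is passed.

```r
# Hypothetical helper: map the `show` option to the column used for fill.
fill_column <- function(show = c("both", "from", "to")) {
  show <- match.arg(show)  # errors on anything outside the three choices
  switch(show,
    both = "fctr",
    from = "value_df1",
    to = "value_df2"
  )
}

fill_column()      # uses the default, "both"
fill_column("to")
```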

This is where I got to. Since it would be a new function, I removed the "same" option from case_when(). The thing that would be cool would be to work out, if you are comparing data frames with different names, how to have the names of both columns on the x-axis.

vis_change <- function(df1,
                        df2, show="both"){

 
  # throw error if df1 not data.frame
  visdat:::test_if_dataframe(df1)

  # throw error if df2 not data.frame
  visdat:::test_if_dataframe(df2)

  if (!identical(dim(df1), dim(df2))) {
    stop("vis_change requires identical dimensions of df1 and df2")
  }

    v_identical <- Vectorize(identical)
    df_diff <- purrr::map2_df(df1,df2, v_identical)
    head(df_diff)
    d <- df_diff %>% as.data.frame() %>% purrr::map_df(visdat:::compare_print) %>% 
        visdat:::vis_gather_() %>% dplyr::mutate(value_df1 = visdat:::vis_extract_value_(df1), 
        value_df2 = visdat:::vis_extract_value_(df2))
#The new bit
    # `type` was renamed to `show` in this version, so test `show` here
    if (show!="same"){
    cols<-c('value_df1','value_df2' )
    d$fctr <- apply( d[ , cols ] , 1 , paste , collapse = "-" )
    d$fctr[d$valueType=="same"]<-"same"
    d$value_df1<-as.character(d$value_df1)
    d$value_df2<-as.character(d$value_df2)
    d$value_df1[is.na(d$value_df1)]<-"NA"
    d$value_df2[is.na(d$value_df2)]<-"NA"
    
    d[,cols][d$valueType=="same",]<-"same"
    }
    fillType<-dplyr::case_when(
      show== "from"~"value_df1", 
      show== "to"~ "value_df2",
      show== "both"~"fctr")
    
ggplot2::ggplot(data = d, ggplot2::aes_string(x = "variable", y = "rows")) + 
  ggplot2::geom_raster(ggplot2::aes_string(fill = fillType)) + 
  ggplot2::theme_minimal() + 
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, vjust = 1, hjust = 1)) + 
  ggplot2::labs(x = "", y = "Observations", fill = "Cell Type") + 
 # ggplot2::scale_fill_manual(limits = c("same", "different"), breaks = c("same", "different"), values = c("#fc8d59", 
  #      "#91bfdb"), na.value = "grey") + 
  ggplot2::scale_y_reverse() + 
  ggplot2::theme(axis.text.x = ggplot2::element_text(hjust = 0.25)) + 
  ggplot2::scale_x_discrete(position = "top", limits = names(df_diff))+viridis::scale_fill_viridis(discrete = TRUE)
}
x1<-data.frame(x = c(NA, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, 2, 2, 2, 2, 2, 0, 2, NA,  2,NA, 2, 0, NA, 2, 2, 2, 2, NA, 2, 2, 2, NA, NA, NA, NA, 2, NA, 2, NA, NA, 2, 2, NA, NA, NA, 2, NA, 2, 2, NA), 
               y = c(NA, 2, 1, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, 2, 2, 2, NA, 2, 0, 2, NA, NA, 2, 0, NA, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, NA, 2, NA, 2, NA, NA, 2, 2, NA, 2, NA, 2, NA, 2, 2, NA, NA),  
               z = c( 0, 2, NA, 0, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, NA, NA, 0, 2, 2, NA, 2, 2, 2, NA, NA, 2, 2, 2, NA, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, 2, 2, 2, NA, 2, 2, NA, 2, NA, NA, 2, 2, NA, 2, NA, 2, NA, 2, 2, NA, NA))

y1<-data.frame(w=c(2, 2, 0, NA, 2, NA, 2, 2, NA, 2, 0, NA, 2, 2, 2, 2, NA, 2, 2, 2, NA, NA, NA, NA, 2, NA, 2, NA, NA,  2, 2, 0, 2, NA, 0, 2, 2, 2, 2, 2, 2, 2, NA, 0, 2, NA, NA, 0, 2, 2, NA, 2, 2, NA, 1, NA, 2, NA, 2, NA, 2, 0, NA), q = c(NA,2, 1, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, NA, 2, 2, 2, 2, NA, 2, 0, 2, NA,  NA, 2, 0, NA, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, NA, 2, NA, 2, NA, NA, 2, 2, NA, 2, NA, 2, NA, 2, 2, NA, NA), d = c( NA, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, NA,NA, 2, 1, 2, 2, NA, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, 2,  0, NA,2, NA, NA, 0, 2, 2, NA, 2, 2, 2, NA, NA, 2, 2, 2, NA, 2, 0, 2, 2, 2, 0, 0, 2, NA, NA, 2, NA, 2, 0))

vis_change(x1, y1, show= "to")
vis_change(x1, y1, show= "both")

image
image
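
For the idea of showing both sets of column names on the x-axis, one possible approach is to build a combined label per column pair. The sketch below is an assumption about how that might be done, not existing visdat behaviour: `paired_names()` and the small data frames `a` and `b` are made up for illustration.

```r
# Hypothetical helper: build one x-axis label per column pair.
paired_names <- function(df1, df2, sep = " / ") {
  stopifnot(ncol(df1) == ncol(df2))
  n1 <- names(df1)
  n2 <- names(df2)
  # Use the single name when both frames agree, otherwise show both.
  ifelse(n1 == n2, n1, paste(n1, n2, sep = sep))
}

# Made-up data frames with partly different column names.
a <- data.frame(x = 1:2, y = 3:4, z = 5:6)
b <- data.frame(w = 1:2, y = 3:4, d = 5:6)
axis_labels <- paired_names(a, b)
print(axis_labels)
```

The resulting vector could presumably be supplied through the `labels` argument of `ggplot2::scale_x_discrete()`, leaving `limits` set to the column names of the first data frame.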

@Maschette
Author

Oh, it just occurred to me: by removing 'same' from the type option you would also remove the if statement, as that block will always run.

@njtierney njtierney modified the milestones: 0.7.0, 0.8.0 Apr 24, 2023