
[FEATURE] Update Implicit Function APIs to Accept and/or Mutate a DataFrame #53

Closed
TheCedarPrince opened this issue Jun 25, 2023 · 2 comments
Labels
enhancement New feature or request moderate Issue of moderate difficulty

Comments


TheCedarPrince commented Jun 25, 2023

This is a feature I have wanted for a while. I think every kind of function in this package should be able to accept a DataFrame and, depending on the function, know which columns of that DataFrame to index to retrieve the information it needs. Additionally, functions should perhaps automatically join their results onto the DataFrame that was passed in.

The reason for these changes is that I often want to use the pattern:

using Chain
using DataFrames
using OMOPCDMCohortCreator

@chain patient_df begin
  GetPatientGender
  GetPatientRace
  GetPatientAgeGroup
  _[:, Not(:person_id)]
  groupby(_, names(_))
end

or even

Characterize(x) = (GetPatientGender ∘ GetPatientRace ∘ GetPatientAgeGroup)(x)

This would let me run quick analyses and reuse them over and over again, clearly and explicitly. I'm not sure how much of the API would need to change as a result, but this design would lend itself much better to function composition.
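To illustrate why DataFrame-in, DataFrame-out functions compose well, here is a minimal, database-free sketch. `add_gender` and `add_race` are hypothetical stand-ins for `GetPatientGender` and `GetPatientRace`: they join toy lookup tables instead of querying the OMOP CDM through a connection, and the concept ids are illustrative values only.

```julia
using DataFrames

# Toy lookup tables standing in for OMOP CDM queries (illustrative values only)
const GENDER = DataFrame(person_id = [1, 2, 3], gender_concept_id = [8507, 8532, 8532])
const RACE = DataFrame(person_id = [1, 2, 3], race_concept_id = [8527, 8516, 8516])

# Hypothetical stand-ins for GetPatientGender / GetPatientRace: each takes a
# DataFrame with a :person_id column and returns it with one more column joined on
add_gender(df::DataFrame) = leftjoin(df, GENDER, on = :person_id)
add_race(df::DataFrame) = leftjoin(df, RACE, on = :person_id)

# Every step is DataFrame -> DataFrame, so the steps compose with ∘
# (the rightmost function, add_gender, runs first)
characterize = add_race ∘ add_gender

patients = DataFrame(person_id = [1, 2, 3])
result = characterize(patients)

# Drop the ids and count patients in each gender/race stratum
strata = result[:, Not(:person_id)]
counts = combine(groupby(strata, names(strata)), nrow => :count)
```

Because each step only needs a `:person_id` column and returns a DataFrame, adding another characteristic means writing one more function and adding it to the composition.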

TheCedarPrince added the enhancement (New feature or request) and moderate (Issue of moderate difficulty) labels Jun 25, 2023

TheCedarPrince commented Jul 3, 2023

Here is what we discussed in our call:

# Existing dispatch: query by a vector of person ids
GetPatientGender([1, 2, 3], conn)

# Issue idea
using DataFrames
df = DataFrame(person_id = [1, 2, 3])

# The DataFrame dispatch "knows" which column it expects to find in the DataFrame
function GetPatientGender(df::DataFrame, conn; kwargs...)
  ids = df.person_id

  # Call the existing vector dispatch; this returns a new DataFrame
  # with two columns: person_id, gender_concept_id
  new_df = GetPatientGender(ids, conn; kwargs...)

  # Try this join step out for one or two functions first
  df = outerjoin(df, new_df, on = :person_id)

  # The columns of the DataFrame that was passed in, plus gender_concept_id.
  # Note that outerjoin returns a new DataFrame; it does not mutate the
  # caller's DataFrame in place
  return df
end

# Call the new dispatch once it is defined
GetPatientGender(df, conn)

Let me know if you have any questions -- thanks!
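The dispatch pattern above could be generalized so it does not have to be written out by hand for every function. The sketch below is an assumption, not the package's API: `join_person_info` and `join_person_info!` are hypothetical helpers, and `mock_gender` is a database-free stand-in for the existing vector method `GetPatientGender(ids, conn)`. The non-mutating helper returns a new DataFrame, while the `!` variant uses DataFrames.jl's `leftjoin!` (available since DataFrames 1.3) to add the column to the caller's DataFrame in place, which covers the "mutate" half of this issue.

```julia
using DataFrames

# Hypothetical stand-in for the existing vector method GetPatientGender(ids, conn);
# returns one row per person_id. A real version would query through `conn`.
function mock_gender(ids::AbstractVector, conn)
    lookup = Dict(1 => 8507, 2 => 8532, 3 => 8532)  # illustrative values only
    return DataFrame(person_id = ids, gender_concept_id = [lookup[i] for i in ids])
end

# Hypothetical helper, non-mutating: returns a new DataFrame with f's columns joined on.
# `unique` avoids querying the same person_id more than once.
function join_person_info(f, df::DataFrame, conn; kwargs...)
    new_df = f(unique(df.person_id), conn; kwargs...)
    return leftjoin(df, new_df, on = :person_id)
end

# Hypothetical helper, mutating: adds f's columns to `df` itself and returns it.
function join_person_info!(f, df::DataFrame, conn; kwargs...)
    new_df = f(unique(df.person_id), conn; kwargs...)
    return leftjoin!(df, new_df, on = :person_id)
end

conn = nothing  # placeholder; the mock ignores it
df = DataFrame(person_id = [1, 2, 3])

out = join_person_info(mock_gender, df, conn)  # df is still unchanged here
join_person_info!(mock_gender, df, conn)       # df now has a gender_concept_id column
```

A left join is used instead of the `outerjoin` from the call notes because it keeps exactly the rows that were passed in. Since `new_df` is built from those same ids, both joins yield the same rows here, possibly in a different order.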


Closed by #54
