Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

added function to track active users #23

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ArnavS59
Copy link

Track active users in DB

♻️ Current situation & Problem

PR for #17

⚙️ Release Notes

  • Added function extract_latest_user_interaction to data_processor.py to track active users in the DB.

📚 Documentation

Please ensure that you properly document any additions in conformance to Spezi Documentation Guide.
You can use this section to describe your solution, but we encourage contributors to document your reasoning and changes using in-line documentation.

✅ Testing

Please ensure that the PR meets the testing requirements set by CodeCov and that new functionality is appropriately tested.
This section describes important information about the tests and why some elements might not be testable.

📝 Code of Conduct & Contributing Guidelines

By submitting creating this pull request, you agree to follow our Code of Conduct and Contributing Guidelines:

@PSchmiedmayer PSchmiedmayer requested a review from Vicbi January 29, 2025 01:19
@PSchmiedmayer PSchmiedmayer added the enhancement New feature or request label Jan 29, 2025
Copy link

codecov bot commented Feb 7, 2025

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.65%. Comparing base (e6cf0e6) to head (490c25a).

Files with missing lines Patch % Lines
...zi_data_pipeline/data_processing/data_processor.py 11.12% 8 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
- Coverage   82.04%   81.65%   -0.39%     
==========================================
  Files          16       16              
  Lines        1609     1618       +9     
==========================================
+ Hits         1320     1321       +1     
- Misses        289      297       +8     
Flag Coverage Δ
unittests 81.65% <11.12%> (-0.39%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...zi_data_pipeline/data_processing/data_processor.py 60.44% <11.12%> (-5.41%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6cf0e6...490c25a. Read the comment docs.

Copy link
Collaborator

@Vicbi Vicbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ArnavS59,
Thank you for your contribution! I have a few suggestions:

  • Consider using string constants for the column names within your function. This can help with readability and future code maintenance.
  • Please take a moment to address the linting errors.
  • It would be helpful to add a unit test for your function.

return

#First convert column to datetime for comparison
flattend_fhir_df['EffectiveDateTime'] = pd.to_datetime(flattend_fhir_df['EffectiveDateTime'], format='%d.%m.%y')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider replacing the string column name with a constant already declared for all column names in FHIRDataframe

flattend_fhir_df['EffectiveDateTime'] = pd.to_datetime(flattend_fhir_df['EffectiveDateTime'], format='%d.%m.%y')
#Filter the most recent entry for each userid
most_recent_df=flattend_fhir_df.loc[flattend_fhir_df.groupby('UserId')['EffectiveDateTime'].idxmax()]
most_recent_df=most_recent_df[['UserId','EffectiveDateTime']]#select the relevant cols
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here for "UserID" and "EffectiveDateTime".

most_recent_df=flattend_fhir_df.loc[flattend_fhir_df.groupby('UserId')['EffectiveDateTime'].idxmax()]
most_recent_df=most_recent_df[['UserId','EffectiveDateTime']]#select the relevant cols
most_recent_df.rename(columns={'EffectiveDateTime': 'LastUserInteraction'}, inplace=True)
most_recent_df.to_csv('output.csv')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion would be to allow the user of the spezi-data-pipeline package to set the name of the output file by themselves.

@@ -313,3 +313,24 @@ def select_data_by_dates( # pylint: disable=unused-variable
filtered_df.reset_index(drop=True),
resource_type=flattened_fhir_dataframe.resource_type,
)

def extract_latest_user_interaction( # pylint: disable=unused-variable
flattend_fhir_df: pd.DataFrame
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spezi-data-pipeline package, the functionalities are built around FHIRDataFrame objects. You might want to adjust your function to receive FHIRDataFrame as an input argument.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants