REF: Israel pipeline #518
@lucasrodes Thank you! I will update the scripts to use appropriate variable names instead of input, since it is a Python built-in. I will also extract common functions into utils.
Awesome @covid19owid! Thanks for your work! I can work on that with you. I'd suggest making small, incremental changes in PRs that affect only a few files (e.g. changes to a few pipelines, a few methods in utils.pipeline, etc.). I think this makes code review simpler (let's make @edomt's life easier 😄) and makes code collisions (i.e. duplicate work) less likely.
I have asked many times to be unsubscribed from this list, but I'm still receiving dozens of these every day!
I unsubscribed using my GitHub account, but it doesn't work. I'm still receiving them.
Hey!
I tried to make those reusable steps emerge from working with the pipelines to:
        "vaccinated_cum": "people_vaccinated",
        "vaccinated_seconde_dose_cum": "people_fully_vaccinated"
    })

def format_date(df: pd.DataFrame) -> pd.DataFrame:
Perhaps those should be decoupled into:

    def format_date(df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(date=df.date.str.slice(0, 10))

    def filter_date(df: pd.DataFrame) -> pd.DataFrame:
        return df[df.date < str(datetime.date.today())]

This decouples the logic (cognitive load, reusability), and we don’t mutate inputs.
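For illustration, here is a minimal runnable sketch of how the two decoupled steps could be chained with `DataFrame.pipe`. The sample rows, the timestamp format, and the `raw`/`result` names are assumptions made up for this example; they are not taken from the Israel data source or from the actual script.

```python
import datetime

import pandas as pd


def format_date(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only the YYYY-MM-DD prefix of each timestamp string.
    # assign() returns a new frame, so the input is left untouched.
    return df.assign(date=df.date.str.slice(0, 10))


def filter_date(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows dated today or later. ISO-formatted dates sort
    # correctly when compared as plain strings.
    return df[df.date < str(datetime.date.today())]


# Hypothetical sample data (assumed timestamp format).
raw = pd.DataFrame({
    "date": ["2021-02-19T00:00:00.000Z", "2999-01-01T00:00:00.000Z"],
    "people_vaccinated": [100, 200],
})

# Each step takes a DataFrame and returns a new one, so they compose.
result = raw.pipe(format_date).pipe(filter_date)
```

Because each function does one thing, either step can be reused or tested on its own, and `raw` keeps its original values after the pipeline runs.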
Done! Thanks
def select_distinct(df: pd.DataFrame) -> pd.DataFrame:
Let’s return directly without a temporary variable?
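As a rough sketch of what this suggestion means: the body of `select_distinct` is not shown in this thread, so the `drop_duplicates` call on a `date` column below is purely an assumption used to illustrate the before and after shapes. The name `select_distinct_with_temp` is hypothetical as well.

```python
import pandas as pd


def select_distinct_with_temp(df: pd.DataFrame) -> pd.DataFrame:
    # Before: the result is bound to a temporary name only to be returned.
    # (Assumed body; the real function is not shown in the thread.)
    deduplicated = df.drop_duplicates(subset=["date"])
    return deduplicated


def select_distinct(df: pd.DataFrame) -> pd.DataFrame:
    # After: return the expression directly, same behavior.
    return df.drop_duplicates(subset=["date"])


# Hypothetical sample data with one repeated date.
sample = pd.DataFrame({
    "date": ["2021-02-18", "2021-02-18", "2021-02-19"],
    "people_vaccinated": [90, 90, 100],
})

before = select_distinct_with_temp(sample)
after = select_distinct(sample)
```

Both versions return the same frame; the second simply has one less name to read.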
Done
Thanks for your feedback, @ValentinMouret. Very much appreciated.
Same 😄
I think it is good practice not to shadow built-in names such as input, just to avoid confusion if other people were to use the code. But like you, no strong opinion either.
Agree. I think for now it is totally fine to place the re-usable code in utils.pipeline. If in the future we think that further packaging the project makes sense, then the work of "pre-packaging" will be extremely valuable. If not, well, we did reduce code redundancy.
I really liked your proposal. I often tend to go the "too much abstraction" way, so please keep me in check 😄
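On the naming point discussed in this comment: strictly speaking, `input` is a built-in function rather than a reserved keyword, so Python accepts it as a parameter name but hides the built-in inside that scope. The sketch below illustrates this; the function names `double_shadowing` and `double` are hypothetical and do not come from the scripts under review.

```python
import builtins
import keyword

# `input` is not a keyword, so it is legal as an identifier...
is_keyword = keyword.iskeyword("input")
# ...but it is a name defined in the builtins module.
is_builtin = hasattr(builtins, "input")


def double_shadowing(input):
    # Legal, but inside this function `input` no longer refers to
    # builtins.input, which can confuse readers.
    return [x * 2 for x in input]


def double(values):
    # Same behavior with a name that shadows nothing.
    return [x * 2 for x in values]


shadowed = double_shadowing([1, 2, 3])
plain = double([1, 2, 3])
```

Both functions behave identically; the rename only removes the risk of confusion with the built-in.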
Looks ok
Thank you @lucasrodes & @ValentinMouret :)
Refactored the Israel vaccination script according to the pipeline approach proposed in #465.
@covid19owid @ValentinMouret @edomt
Some open questions:
df instead of input (as the latter is a built-in name)?