Comparison with dplyr and Stata #2329
Thank you for working on it. I have left some comments.
Thanks for this! It's super helpful.
I think we could make this a bit more biased towards DataFrames. For instance, working with variables rather than literals in a comparison between DataFrames and dplyr. Similarly, we should show off `describe` more.
I've answered the comments and added two examples of functions that return dataframes rather than vectors.
Thank you for the fixes. Let us wait for @nalimilan to have a look at the PR.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thanks!
In general the HTML output is a bit awkward. Lots of lines are cut off. But I'm not sure what we can do about that.
Co-authored-by: pdeffebach <23196228+pdeffebach@users.noreply.github.com>
Maybe we should switch from a table to a list of lists? Not sure what would look better though.
We could split the comparison into a dplyr one and a Stata one, which would make each table less wide. Moreover, in reality people tend to be interested in one comparison or the other, not both at the same time.
Good point - agreed. It will create some duplication but I do not think it is a problem (we can make separate sections for them).
Done. This led me to add a Stata-specific operation: transform certain rows. Makes me think that it would be nice to define some kind of replace signature to make it easier.
Indeed, especially as
I think
Unfortunately it is incorrect. You have to write Using
This is the reason that I am on board with efforts to make handling or in the original code:
which feels more idiomatic.
I think it’s #2211
@nalimilan - ok to merge?
@nalimilan - I would merge it if you do not have additional comments.
So - is this good to merge?
Just waiting for Travis... :-p
Thanks!
The new page doesn't show up in the manual. I think you need to add it to the list too. Also there are a few issues with some tables, see https://juliadata.github.io/DataFrames.jl/latest/man/comparisons/
| Operation | DataFrames.jl | Stata |
|---|---|---|
| | `combine(df, names(df, r"^x") .=> mean)` | `collapse (mean) x*` |
| | `combine(df, ([:x, :y] .=> [maximum minimum])...)` | `collapse (max) x y (min) x y` |
| Multivariate function | `transform!(df, [:x, :y] => cor => :z)` | `egen z = corr(x y)` |
| Row-wise | `transform!(df, [:x, :y] => ByRow(min) => :z)` | `egen z = rowmin(x y)` |
Maybe it would be useful to add an example of a simple row-wise function using `gen` in Stata? For example it's not obvious that `ByRow` is also useful for things like `:x => ByRow(x -> x^2) => :x²`.
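For illustration, here is a minimal sketch of the row-wise case suggested in this comment. It assumes a toy data frame with a single numeric column `x`. The data and the `:x_sq` column name are made up for the example and are not part of the PR. The Stata counterpart would presumably be something like `gen x2 = x^2`.

```julia
using DataFrames

# Hypothetical toy data; only the column name :x comes from the comment above.
df = DataFrame(x = [1, 2, 3])

# ByRow wraps a scalar function so that it is applied to each element of :x
# instead of to the whole column vector.
transform!(df, :x => ByRow(x -> x^2) => :x²)

# The same result without ByRow: the function receives the whole column,
# so the squaring has to be broadcast explicitly.
transform!(df, :x => (x -> x .^ 2) => :x_sq)
```

Both calls add a column holding the squares of `x`. `ByRow` mainly saves writing the broadcast by hand, which is closer to how `gen` reads in Stata.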
This adds a very simple table to compare the main functions in DataFrames with dplyr and Stata.
I focused on the main functions to make it as simple as possible.