Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Resolve type instability issues in aggregation #2424

Closed
bkamins opened this issue Sep 11, 2020 · 3 comments · Fixed by #2682
Closed

Resolve type instability issues in aggregation #2424

bkamins opened this issue Sep 11, 2020 · 3 comments · Fixed by #2682
Labels
bug non-breaking The proposed change is not breaking performance
Milestone

Comments

@bkamins bkamins added non-breaking The proposed change is not breaking performance labels Sep 11, 2020
@bkamins bkamins added this to the 1.0 milestone Sep 11, 2020
@nalimilan
Copy link
Member

In the following functions, the first two allocate, but not the last one:

function fill_row2!(row, col::AbstractVector,
                    i::Integer,
                    colnames::NTuple{N, Symbol}) where N
    @inbounds for j in 1:length(colnames)
        cn = colnames[j]
        val = row[cn]
        col[i] = val
    end
    return nothing
end

function fill_row2b!(row, col::AbstractVector,
                     i::Integer,
                     colnames::NTuple{N, Symbol}) where N
    @inbounds for cn in colnames
        val = row[cn]
        col[i] = val
    end
    return nothing
end

function fill_row3!(row, col::AbstractVector,
                    i::Integer,
                    colname::Symbol) where N
    val = row[colname]
    col[i] = val
    return nothing
end

A possible workaround is to require column names to be always in the same order: then we could index using an integer, which I just checked doesn't allocate.

@bkamins bkamins mentioned this issue Mar 4, 2021
19 tasks
@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2021

This is a bug. We have to fix it and insist that column names always have the same order. I will fix it. Otherwise we have:

julia> df = DataFrame(x = 1:4)
4×1 DataFrame
 Row │ x     
     │ Int64 
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4

julia> gdf = groupby(df, :x)
GroupedDataFrame with 4 groups based on key: x
First Group (1 row): x = 1
 Row │ x     
     │ Int64 
─────┼───────
   1 │     1
⋮
Last Group (1 row): x = 4
 Row │ x     
     │ Int64 
─────┼───────
   1 │     4

julia> combine(gdf, :x => f => AsTable)
4×3 DataFrame
 Row │ x      b      a     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     11
   2 │     2     12      2
   3 │     3      3     13
   4 │     4     14      4

julia> function f2(x)
           if isodd(x[1])
                  return (a=x[1], b=x[1]+10)
              else
                  return (b=x[1], a=x[1]+10)
              end
       end
f2 (generic function with 1 method)

julia> combine(gdf, :x => f2 => AsTable)
4×3 DataFrame
 Row │ x      a      b     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     11
   2 │     2     12      2
   3 │     3      3     13
   4 │     4     14      4

so it is hard to predict the column order (it will depend on the first row processed, which in threading context is problematic)

@bkamins bkamins added the bug label Mar 27, 2021
@bkamins
Copy link
Member Author

bkamins commented Mar 27, 2021

Also this is what we require in other places:

julia> combine(df, :x => ByRow(x -> iseven(x) ? (a=x, b=x+10) : (b=x, a=x+10)))
4×1 DataFrame
 Row │ x_function      
     │ NamedTupl…      
─────┼─────────────────
   1 │ (b = 1, a = 11)
   2 │ (a = 2, b = 12)
   3 │ (b = 3, a = 13)
   4 │ (a = 4, b = 14)

julia> combine(df, :x => ByRow(x -> iseven(x) ? (a=x, b=x+10) : (b=x, a=x+10)) => AsTable)
ERROR: ArgumentError: keys of the returned elements must be identical

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug non-breaking The proposed change is not breaking performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants