implemented dataframe.cov #2142

Open
wants to merge 6 commits into
base: master

Conversation

LSturtew
Contributor

@LSturtew LSturtew commented Apr 8, 2021

ref #1929

Implement DataFrame.cov

>>> kdf = ks.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
...                   columns=['dogs', 'cats'])
>>> kdf.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667

@codecov-io

Codecov Report

Merging #2142 (7987192) into master (d7f6e88) will decrease coverage by 2.23%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #2142      +/-   ##
==========================================
- Coverage   95.37%   93.14%   -2.24%     
==========================================
  Files          60       60              
  Lines       13694    13601      -93     
==========================================
- Hits        13060    12668     -392     
- Misses        634      933     +299     
Impacted Files                                    Coverage Δ
databricks/koalas/missing/frame.py                74.57% <ø> (-25.43%) ⬇️
databricks/koalas/frame.py                        95.65% <85.71%> (-0.84%) ⬇️
databricks/koalas/usage_logging/__init__.py       28.20% <0.00%> (-64.36%) ⬇️
databricks/koalas/usage_logging/usage_logger.py   47.82% <0.00%> (-52.18%) ⬇️
databricks/koalas/missing/series.py               60.56% <0.00%> (-39.44%) ⬇️
databricks/koalas/__init__.py                     80.26% <0.00%> (-11.85%) ⬇️
databricks/koalas/missing/indexes.py              88.63% <0.00%> (-11.37%) ⬇️
databricks/conftest.py                            89.09% <0.00%> (-10.91%) ⬇️
databricks/koalas/typedef/typehints.py            86.22% <0.00%> (-9.19%) ⬇️
... and 30 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d7f6e88...7987192.

]
kdf = self[num_cols]
names = [name for t in num_cols for name in t]
mat = kdf.to_pandas().to_numpy(dtype=float, copy=False)
Collaborator

I'm afraid using to_pandas() without any restriction is not a good idea. It will cause an OOM error if the data doesn't fit in the driver's memory.
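
To illustrate the concern, here is a minimal sketch of one possible alternative, not the approach taken in this PR: compute the covariance from per-chunk summary statistics (row count, column sums, sums of pairwise products) that are merged afterwards, so no single process has to hold every row. The function names `partial_stats`, `merge_stats`, and `cov_from_stats` are hypothetical, and plain Python lists stand in for distributed partitions. The sum-of-products formula can lose precision on data with large means, so treat this as a sketch of the idea only.

```python
from functools import reduce


def partial_stats(rows, width):
    """Summarise one chunk as (count, column sums, sums of pairwise products)."""
    n = 0
    sums = [0.0] * width
    prods = [[0.0] * width for _ in range(width)]
    for row in rows:
        n += 1
        for i in range(width):
            sums[i] += row[i]
            for j in range(width):
                prods[i][j] += row[i] * row[j]
    return n, sums, prods


def merge_stats(a, b):
    """Combine the summaries of two chunks; only small fixed-size data moves."""
    n_a, s_a, p_a = a
    n_b, s_b, p_b = b
    sums = [x + y for x, y in zip(s_a, s_b)]
    prods = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(p_a, p_b)]
    return n_a + n_b, sums, prods


def cov_from_stats(stats, ddof=1):
    """Sample covariance matrix from merged summaries (ddof=1, as in the PR example)."""
    n, sums, prods = stats
    width = len(sums)
    return [
        [(prods[i][j] - sums[i] * sums[j] / n) / (n - ddof) for j in range(width)]
        for i in range(width)
    ]


# The dogs/cats rows from the PR description, split into two hypothetical chunks.
chunks = [[(1, 2), (0, 3)], [(2, 0), (1, 1)]]
total = reduce(merge_stats, (partial_stats(chunk, 2) for chunk in chunks))
cov = cov_from_stats(total)
```

With the example data from the description, `cov` matches the matrix shown there. The merged summary is a width-by-width matrix regardless of row count, which is what keeps driver memory bounded in this sketch.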

@xinrong-meng
Contributor

xinrong-meng commented Aug 3, 2021

Hi @LSturtew, since Koalas has been ported to Spark as pandas API on Spark, would you like to migrate this PR to the Spark repository? Here is the ticket: https://issues.apache.org/jira/browse/SPARK-36396. Otherwise, I may do that for you next week.
