pyLDAvis "__num_dist_rows__" cannot assure that all rows' sum is 1 #173

Open
umusa opened this issue Sep 8, 2020 · 0 comments

Comments


umusa commented Sep 8, 2020

I am trying to use pyLDAvis to visualize LDA results on Databricks.

The environment:

 Spark NLP version:  2.5.5
 Apache Spark version:  2.4.5

I got this error:

 ValidationError: 
 * Not all rows (distributions) in topic_term_dists sum to 1.

from this code:

pyLDAvis.prepare(**data)

The data has two arrays:

   data['doc_topic_dists'], data['doc_lengths']

I used the same function as in the pyLDAvis source code at https://github.com/bmabey/pyLDAvis/blob/master/pyLDAvis/_prepare.py:

    import pandas as pd

    def __num_dist_rows__(array, ndigits=2):
        # number of rows whose sum reaches the 0.999 threshold
        return array.shape[0] - int((pd.DataFrame(array).sum(axis=1) < 0.999).sum())

to check that all rows sum to 1.

But I still got the error.
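
To narrow it down, here is a minimal check (a sketch, assuming the dists convert to NumPy arrays; failing_rows is just an illustrative helper, not part of pyLDAvis) that locates the rows falling below the 0.999 threshold:

    import numpy as np

    # Illustrative helper: find rows whose sum falls below pyLDAvis's 0.999 threshold.
    def failing_rows(dist, tol=0.999):
        sums = np.asarray(dist, dtype=np.float64).sum(axis=1)
        return np.where(sums < tol)[0], sums

    bad_idx, sums = failing_rows(data['topic_term_dists'])
    print(bad_idx, sums[bad_idx])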

I found that the error only popped up when the data is large; currently it has 900+ rows.

With 300+ rows, there is no error.
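
A possible workaround (a sketch, assuming the inputs convert cleanly to float64 NumPy arrays with positive row sums) would be to re-normalize each row right before calling prepare, so the sums are 1 up to float64 precision:

    import numpy as np

    # Re-normalize each row so it sums to ~1.0 in float64.
    for key in ('topic_term_dists', 'doc_topic_dists'):
        arr = np.asarray(data[key], dtype=np.float64)
        data[key] = arr / arr.sum(axis=1, keepdims=True)

    vis = pyLDAvis.prepare(**data)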

Could anybody help me with this?

thanks
