I am trying to use pyLDAvis to visualize LDA results on Databricks.

The environment:

- Spark NLP version: 2.5.5
- Apache Spark version: 2.4.5
I got this error:

```
ValidationError: * Not all rows (distributions) in topic_term_dists sum to 1.
```
from this code:

```python
pyLDAvis.prepare(**data)
```
The data has two arrays: `data['doc_topic_dists']` and `data['doc_lengths']`.
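(For context, since `prepare(**data)` unpacks a full argument set, `data` holds more than those two arrays. Below is a toy sketch of the kind of dict `pyLDAvis.prepare()` expects, with key names following its documented signature; the values are illustrative placeholders, not my actual data.)

```python
import numpy as np

n_topics, n_terms, n_docs = 2, 4, 3

data = {
    # Each row is a probability distribution and must sum to 1.
    'topic_term_dists': np.full((n_topics, n_terms), 1.0 / n_terms),
    'doc_topic_dists': np.full((n_docs, n_topics), 1.0 / n_topics),
    'doc_lengths': [10, 8, 12],       # tokens per document
    'vocab': ['a', 'b', 'c', 'd'],    # one entry per term
    'term_frequency': [5, 10, 7, 8],  # corpus-wide term counts
}
```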
I used the same function as in the pyLDAvis source code at https://github.com/bmabey/pyLDAvis/blob/master/pyLDAvis/_prepare.py

```python
import pandas as pd

def __num_dist_rows__(array, ndigits=2):
    return array.shape[0] - int((pd.DataFrame(array).sum(axis=1) < 0.999).sum())
```

to check that every row sums to 1.
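Here is a minimal diagnostic sketch for reporting which rows fail that same check; the function name and the `dists` argument are placeholders of mine, not part of pyLDAvis:

```python
import pandas as pd

# Placeholder diagnostic (not from the pyLDAvis source): report which rows
# of a distribution matrix fall below the same 0.999 row-sum tolerance.
def failing_rows(dists, tol=0.999):
    sums = pd.DataFrame(dists).sum(axis=1)
    bad = sums[sums < tol]
    print(f"{len(bad)} of {len(sums)} rows sum to less than {tol}")
    return list(bad.index)
```

If the failing sums come out as something like 0.9985, the distributions themselves are probably fine and the mismatch is just accumulated floating-point error.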
But I still get the error.

I found that the error only pops up when the input is large: currently it has 900+ rows and fails, whereas with 300+ rows there is no error.
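If the row sums are only off by floating-point drift, which seems plausible since more rows means more chances for at least one sum to fall below the 0.999 tolerance, then renormalizing each row in float64 before calling `prepare` might be a workaround. This is a sketch under that assumption, not a confirmed fix:

```python
import numpy as np
import pyLDAvis

# Assumption: the distributions are correct up to floating-point error, so
# renormalizing each row in float64 brings every row sum back within the
# validator's tolerance.
def normalize_rows(dists):
    arr = np.asarray(dists, dtype=np.float64)
    return arr / arr.sum(axis=1, keepdims=True)

data['topic_term_dists'] = normalize_rows(data['topic_term_dists'])
data['doc_topic_dists'] = normalize_rows(data['doc_topic_dists'])

vis = pyLDAvis.prepare(**data)
```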
Could anybody help me with this?

Thanks!