TypeError: unhashable type: 'Int64Index' #202
Comments
I had the same error and solved it by reducing the number of topics. |
What version are you using? I suggest using version 3.3.1 and upgrading all |
FWIW, I ran into a similar issue with Python 3.8 and version 3.3.1 in a situation where the original K of my model is greater than the resulting number of clusters. I've been driving myself insane trying to find a workaround, as my underlying data is pretty noisy: if I reduce K to avoid empty clusters, a lot of junk ends up being spread around instead of being dumped into just a few clusters. I wish I could just go without the viz, but our communications team finds it really useful as a top-line scan of recent Twitter chatter. I tried implementing the following, borrowing from here, but that just got me to a different error (
Happy to post/share my full code if it's helpful. Thanks! |
I'm having the same error. Reducing the number of topics to fewer than 10 solves the issue, but this is definitely not optimal. |
I am also running into this issue. The steps to reproduce it are below; I am happy to provide more details if necessary. I am using pyLDAvis 3.3.1. I used the following 5 lines as documents to train a topic model with 5 topics.
import gensim  # I am using gensim version 3.8.3
from gensim import corpora
import pyLDAvis
import pyLDAvis.gensim_models
data = [['I', 'ate', 'dinner'],
        ['We', 'had', 'a', 'three', 'course', 'meal'],
        ['In', 'the', 'end', 'we', 'all', 'felt', 'like', 'we', 'ate', 'too', 'much'],
        ['We', 'all', 'agreed', 'it', 'was', 'a', 'magnificent', 'evening'],
        ['He', 'loves', 'fish', 'tacos']]
id2word = corpora.Dictionary(data)  # gensim.corpora
texts = data
corpus = [id2word.doc2bow(text1) for text1 in texts]
# mallet_path must point to a local MALLET executable; its value is not given in this post
lda_mallet_model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=5, id2word=id2word, random_seed=41)
gensim_model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(lda_mallet_model)
pyldaVis_prepared_model = pyLDAvis.gensim_models.prepare(gensim_model, corpus, id2word)  # this line gives the error
The error is:
|
I can reproduce the error now...work in progress TBC
for a working example (pyLDAvis_overview.ipynb), we get
your model produces
|
Hi Mark, I was wondering - did you manage to find some time to look into the above? Many thanks & best regards, Mike |
Hi, I believe the problem is that
On lines 258-259 in _prepare.py,
log_lift = np.log(pd.eval("topic_term_dists / term_proportion")).astype("float64")
log_ttd = np.log(pd.eval("topic_term_dists")).astype("float64")
when pyLDAvis calculates
Then, on lines 217-219 in _prepare.py,
def _find_relevance(log_ttd, log_lift, R, lambda_):
    relevance = lambda_ * log_ttd + (1 - lambda_) * log_lift
    return relevance.T.apply(lambda topic: topic.nlargest(R).index)
when it calculates relevance for different
Finally, when we call
Given that this problem is not specific to LDA Mallet (since any model which has 0 in
I suggest modifying lines 258-259 in _prepare.py to
# to avoid -inf when calculating log_lift and log_ttd
topic_term_dists_non_zero = topic_term_dists.replace(0, 1e-10)
log_lift = np.log(pd.eval("topic_term_dists_non_zero / term_proportion")).astype("float64")
log_ttd = np.log(pd.eval("topic_term_dists_non_zero")).astype("float64") |
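The suggestion in the comment above can be tried on toy data. The following is only a minimal sketch, not pyLDAvis code: the 2x3 `topic_term_dists`, the `term_proportion` values, and the helper `log_terms` are hypothetical, and plain division stands in for `pd.eval`. It shows that a zero probability yields `-inf` in the logs (and `NaN` in the relevance once `lambda_` is 0), while replacing zeros with `1e-10` keeps everything finite so that `_find_relevance`, copied from the quoted snippet, can rank terms.

```python
import numpy as np
import pandas as pd

# Hypothetical toy model: 2 topics x 3 terms; topic 0 gives term 2 probability 0.
topic_term_dists = pd.DataFrame([[0.6, 0.4, 0.0],
                                 [0.2, 0.3, 0.5]])
# Hypothetical overall term proportions (index matches the term columns).
term_proportion = pd.Series([0.4, 0.35, 0.25])


def log_terms(dists):
    """Return (log_ttd, log_lift); plain division stands in for pd.eval."""
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        log_lift = np.log(dists / term_proportion).astype("float64")
        log_ttd = np.log(dists).astype("float64")
    return log_ttd, log_lift


def _find_relevance(log_ttd, log_lift, R, lambda_):
    # Same body as the snippet quoted in the thread.
    relevance = lambda_ * log_ttd + (1 - lambda_) * log_lift
    return relevance.T.apply(lambda topic: topic.nlargest(R).index)


# Unpatched: log(0) is -inf, and 0 * -inf is NaN when lambda_ is 0.
log_ttd, log_lift = log_terms(topic_term_dists)
raw_relevance = 0.0 * log_ttd + (1 - 0.0) * log_lift
has_neg_inf = bool(np.isneginf(log_ttd.to_numpy()).any())
has_nan = bool(raw_relevance.isna().to_numpy().any())

# Patched as suggested: zeros become 1e-10, so every log stays finite.
non_zero = topic_term_dists.replace(0, 1e-10)
log_ttd_nz, log_lift_nz = log_terms(non_zero)
all_finite = bool(np.isfinite(log_ttd_nz.to_numpy()).all()
                  and np.isfinite(log_lift_nz.to_numpy()).all())
top_terms = _find_relevance(log_ttd_nz, log_lift_nz, R=2, lambda_=0.5)

print(has_neg_inf, has_nan, all_finite)
print(top_terms)
```

The replacement value `1e-10` is taken from the comment; whether it is small enough for a given vocabulary is a judgment call, since the patched rows no longer sum exactly to 1.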
Hi Ben,
Thank you for your great work!
I generated topic models with 5 different topic numbers on the same corpus and dictionary. I can use pyLDAvis to visualize
four of them, but one gives an error. Could you please help me with this error? I get it on both the new and old versions of pyLDAvis.
Best,
Zhijun
ERROR information
TypeError Traceback (most recent call last)
in
----> 1 vismallet = gensimvis.prepare(models[3], corpus, dictionary=id2word, sort_topics=False)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\gensim_models.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
121 """
122 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
--> 123 return pyLDAvis.prepare(**opts)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts, sort_topics, start_index)
437 term_frequency = np.sum(term_topic_freq, axis=0)
438
--> 439 topic_info = _topic_info(topic_term_dists, topic_proportion,
440 term_frequency, term_topic_freq, vocab, lambda_step, R,
441 n_jobs, start_index)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in _topic_info(topic_term_dists, topic_proportion, term_frequency, term_topic_freq, vocab, lambda_step, R, n_jobs, start_index)
278 for ls in _job_chunks(lambda_seq, n_jobs)))
279 topic_dfs = map(topic_top_term_df, enumerate(top_terms.T.iterrows(), start_index))
--> 280 return pd.concat([default_term_info] + list(topic_dfs))
281
282
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in topic_top_term_df(tup)
262 def topic_top_term_df(tup):
263 new_topic_id, (original_topic_id, topic_terms) = tup
--> 264 term_ix = topic_terms.unique()
265 df = pd.DataFrame({'Term': vocab[term_ix],
266 'Freq': term_topic_freq.loc[original_topic_id, term_ix],
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\series.py in unique(self)
1870 Categories (3, object): ['a' < 'b' < 'c']
1871 """
-> 1872 result = super().unique()
1873 return result
1874
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\base.py in unique(self)
1045 result = np.asarray(result)
1046 else:
-> 1047 result = unique1d(values)
1048
1049 return result
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\algorithms.py in unique(values)
405
406 table = htable(len(values))
--> 407 uniques = table.unique(values)
408 uniques = _reconstruct_data(uniques, original.dtype, original)
409 return uniques
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.unique()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\indexes\base.py in __hash__(self)
4271 @final
4272 def __hash__(self):
-> 4273 raise TypeError(f"unhashable type: {repr(type(self).__name__)}")
4274
4275 @final
TypeError: unhashable type: 'Int64Index'
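The last frames of the traceback suggest that `topic_terms` was an object-dtype Series whose cells were themselves pandas Index objects, so `unique()` tried to hash them. A minimal sketch of that failure in isolation, assuming nothing about pyLDAvis internals (the cell values here are made up for illustration, and the class name in the message depends on the pandas version):

```python
import numpy as np
import pandas as pd

# Build an object-dtype Series whose cells are Index objects of unequal length.
# The values are hypothetical; only the shape of the data matters here.
cells = np.empty(2, dtype=object)
cells[0] = pd.Index([3, 1, 4])
cells[1] = pd.Index([1, 5])
topic_terms = pd.Series(cells)

try:
    topic_terms.unique()  # hashing each cell fails because Index is unhashable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If a Series like this shows up where a column of term ids was expected, inspecting `topic_terms.dtype` and `type(topic_terms.iloc[0])` before the call to `unique()` is a quick way to confirm it.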