
Plotting a condensed tree result using the hdbscan.plots.CondensedTree class #25

Open
u3ks opened this issue Oct 11, 2024 · 3 comments


@u3ks

u3ks commented Oct 11, 2024

Hi all,

I'm trying to plot the output of fast_hdbscan.cluster_trees.condense_tree using the hdbscan.plots.CondensedTree class.
I tried converting the result like so:

ct_raw = np.rec.fromarrays((ct[0], ct[1], ct[2], ct[3]), dtype=[('parent', np.intp), ('child', np.intp), ('lambda_val', float), ('child_size', np.intp)])

Then passing it to the constructor - CondensedTree(ct_raw) - but I get an error that there are some parent nodes without children in the ct_raw array.
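For reference, here is a minimal, self-contained sketch of that conversion with made-up toy arrays. It assumes the field names hdbscan's plotting code looks up (notably 'parent' with no leading space, since get_plot_data selects rows via self._raw_tree['parent'] == c):

```python
import numpy as np

# Toy condensed-tree arrays, invented for illustration: one cluster (id 4)
# with four leaf children.
parents = np.array([4, 4, 4, 4], dtype=np.intp)
children = np.array([0, 1, 2, 3], dtype=np.intp)
lambdas = np.array([0.5, 0.5, 1.2, 1.2])
sizes = np.array([1, 1, 1, 1], dtype=np.intp)

# Structured record array in the layout hdbscan's CondensedTree expects.
ct_raw = np.rec.fromarrays(
    (parents, children, lambdas, sizes),
    dtype=[('parent', np.intp), ('child', np.intp),
           ('lambda_val', float), ('child_size', np.intp)],
)

# Field-based row selection now works the way the plotting code assumes.
rows = ct_raw[ct_raw['parent'] == 4]
```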

Specifically, the .max() call below (from hdbscan.plots.CondensedTree.get_plot_data) throws an exception because it's being called on an empty array:

```python
for c in range(last_leaf, root - 1, -1):
    cluster_bounds[c] = [0, 0, 0, 0]

    c_children = self._raw_tree[self._raw_tree['parent'] == c]
    current_size = np.sum(c_children['child_size'])
    current_lambda = cluster_y_coords[c]
    cluster_max_size = current_size
    cluster_max_lambda = c_children['lambda_val'].max()
```

Do you have any pointers on how to convert between the two representations, or how to change the get_plot_data function?

@lmcinnes
Contributor

You may have ended up with a condensed forest instead of a condensed tree. That shouldn't really be possible, but perhaps there is a bug that makes it possible? I would need to see the actual tree data to diagnose...
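One way to check for this, sketched here with invented toy data: collect the parent ids that never appear as a child anywhere in the tree. A single condensed tree has exactly one such root; more than one suggests a condensed forest, which is also what leaves some cluster ids with no child rows and makes the .max() call above fail.

```python
import numpy as np

# Toy data, invented for illustration: two disjoint clusters (5 and 6)
# that are never joined under a common root, i.e. a "forest".
tree = np.rec.fromarrays(
    (np.array([5, 5, 6, 6], dtype=np.intp),   # parent
     np.array([0, 1, 2, 3], dtype=np.intp),   # child
     np.array([1.0, 1.0, 2.0, 2.0]),          # lambda_val
     np.array([1, 1, 1, 1], dtype=np.intp)),  # child_size
    dtype=[('parent', np.intp), ('child', np.intp),
           ('lambda_val', float), ('child_size', np.intp)],
)

# Roots are parent ids that never occur as a child of anything.
parent_ids = np.unique(tree['parent'])
child_ids = np.unique(tree['child'])
roots = np.setdiff1d(parent_ids, child_ids)

is_forest = roots.size > 1  # True here: both 5 and 6 are roots
```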

@u3ks
Author

u3ks commented Oct 15, 2024

Actually, I think I found the issue - I was testing out the new sample weights functionality and had a single sample weight that was larger than the specified min_cluster_size.

Maybe throwing a warning for this during the initial tree construction would be beneficial?
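The suggested guard could look something like the sketch below. The function name and placement are hypothetical, not fast_hdbscan's actual API: it just flags any single sample weight that already meets min_cluster_size, since such a point can form a cluster on its own and produce a degenerate condensed tree.

```python
import warnings
import numpy as np

def check_sample_weights(sample_weights, min_cluster_size):
    """Hypothetical pre-flight check: warn on individually 'heavy' samples.

    Returns the indices of samples whose weight alone meets
    min_cluster_size.
    """
    weights = np.asarray(sample_weights)
    heavy = np.flatnonzero(weights >= min_cluster_size)
    if heavy.size:
        warnings.warn(
            f"{heavy.size} sample weight(s) >= min_cluster_size "
            f"({min_cluster_size}); the condensed tree may be degenerate."
        )
    return heavy

heavy = check_sample_weights([0.5, 3.0, 1.0], min_cluster_size=2)
```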

@lmcinnes
Contributor

Yes, that might be something that would be sensible. The sample weight stuff is pretty new so it isn't well tested yet.
