Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Allow clustering of a nested path #1007

Merged
merged 9 commits into from
Dec 29, 2023
Merged

Allow clustering of a nested path #1007

merged 9 commits into from
Dec 29, 2023

Conversation

dsmilkov
Copy link
Collaborator

Clustering of something.*.* is not yet supported. The path must end in a field name for now.

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet!

'texts': [
{
'text': 'Can you provide a short summary of the following text',
'text_cluster': {'cluster_id': 0, 'membership_prob': 1.0, 'topic': 'summarization'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for my understanding where does "_cluster" come from? maybe this should be text.cluster if auto-gen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid having periods because that makes it unclear if cluster is a child of text, or a column name with a dot in it. If we do a "dot", we will now forever have to escape it with double quotes in order to select that field in text form
select_rows(['texts.*."text.cluster"']) vs simply select_rows(['texts.*.text_cluster']). Obviously a tuple will work, but the correct tuple will be tricky to write when using public api in notebooks.

@dsmilkov dsmilkov merged commit bb95217 into main Dec 29, 2023
4 checks passed
@dsmilkov dsmilkov deleted the ds-cluster-nested branch December 29, 2023 00:55
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants