Make it possible to find the closest cluster/node for new data #208

Open
pulquero opened this issue Feb 3, 2021 · 5 comments · May be fixed by #214

Comments

@pulquero

pulquero commented Feb 3, 2021

Given a new data point and its associated lens values, make it possible to find where it should appear on the graph, i.e. which cluster it is closest to. The lens values can be used to look up the hypercube(s) that contain the point, and then to find the nearest cluster(s), e.g. by getting the N nearest points and returning the clusters they belong to.
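A minimal, numpy-only sketch of the lookup proposed above. It assumes the fitted cover is described by an array of hypercube `centers` and a per-dimension half-width `radius` (the same roles as `cover.centers_` and `cover.radius_` in the snippet further down), and that graph node ids follow the `"cube{i}_cluster{j}"` pattern. The function name `closest_nodes` and its parameters are hypothetical, and plain Euclidean distance stands in for whatever metric suits the data.

```python
import numpy as np

def closest_nodes(point, lens_point, data, centers, radius, nodes, k=3):
    """Return the graph nodes that own the k nearest members of `point`.

    Hypothetical helper (not a library API):
    point      -- the new sample in data space, shape (d,)
    lens_point -- its lens values, shape (m,)
    data       -- the original data, shape (n, d)
    centers    -- hypercube centers, shape (c, m)
    radius     -- hypercube half-widths per lens dimension, shape (m,)
    nodes      -- {node_id: [member row indices]}, ids like "cube3_cluster0"
    """
    centers = np.asarray(centers)
    # 1. Hypercubes whose closed box [center - radius, center + radius]
    #    contains the lens point.
    inside = np.all(np.abs(centers - lens_point) <= radius, axis=1)
    cube_ids = set(np.flatnonzero(inside).tolist())
    if not cube_ids:
        return []  # the lens point falls outside the cover

    # 2. Pool (row, owning node) pairs for every node built from those cubes.
    rows, owners = [], []
    for node_id, members in nodes.items():
        cube = int(node_id.split("_")[0][len("cube"):])  # assumed id format
        if cube in cube_ids:
            rows.extend(members)
            owners.extend([node_id] * len(members))
    if not rows:
        return []

    # 3. Take the k nearest pooled rows (Euclidean) and report their nodes.
    dists = np.linalg.norm(data[rows] - point, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return sorted({owners[i] for i in nearest})
```

A point in the overlap of two hypercubes can come back with nodes from both, which matches the "find the nearest cluster(s)" wording above; callers that want a single node can use `k=1`.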

@sauln
Member

sauln commented Feb 3, 2021

I could see that being very useful. Thanks for the suggestion. If you would like to take a stab at the implementation, that would be great for everyone.

@pulquero
Author

pulquero commented Feb 3, 2021

Here is something functional. With some extra internal data structures to track cubes->nodes, this could probably be made a bit cleaner.

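import numpy as np

# maybe this should be something like cover.contains() to handle Cover subclasses
def find_hypercubes(lens_values, cover):
    """Return the indices of the fitted cover's hypercubes that contain lens_values."""
    cube_ids = []
    for i, center in enumerate(cover.centers_):
        lower_bounds, upper_bounds = center - cover.radius_, center + cover.radius_
        if np.all(lens_values >= lower_bounds) and np.all(lens_values <= upper_bounds):
            cube_ids.append(i)
    return cube_ids

def find_nodes(cube_ids, kmgraph):
    """Return {node_id: member_ids} for graph nodes built from the given hypercubes."""
    # Node ids look like "cube{i}_cluster{j}"; including the "_" stops "cube1"
    # from also matching "cube10_cluster0". An empty tuple matches nothing.
    prefixes = tuple('cube{}_'.format(i) for i in cube_ids)
    nodes = {}
    for node, data_ids in kmgraph['nodes'].items():
        if node.startswith(prefixes):
            nodes[node] = data_ids
    return nodes

def assign_nodes(data_values, data, nodes, knn):
    """Return the nodes that own the nearest neighbours of data_values.

    knn is an unfitted estimator with fit/kneighbors, e.g.
    sklearn.neighbors.NearestNeighbors(n_neighbors=5); its n_neighbors
    must not exceed the number of pooled member points.
    """
    if not nodes:
        return np.array([])  # the lens values fell outside every hypercube
    knn_data = []
    knn_node_ids = []
    for node, data_ids in nodes.items():
        node_data = data[data_ids]
        knn_data.append(node_data)
        knn_node_ids.append([node] * len(node_data))
    knn_data = np.vstack(knn_data)
    knn_node_ids = np.concatenate(knn_node_ids)
    knn.fit(knn_data)
    nn_ids = knn.kneighbors([data_values], return_distance=False)
    return np.unique(knn_node_ids[nn_ids])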
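The cube->nodes tracking mentioned above could be a dictionary built once from `kmgraph['nodes']`, so each lookup no longer scans every node. A rough sketch, assuming node ids follow the `"cube{i}_cluster{j}"` pattern; `build_cube_index` and `nodes_for_cubes` are hypothetical helpers, not part of the library.

```python
from collections import defaultdict

def build_cube_index(node_ids):
    """Map each hypercube id to the node ids built from it.

    Hypothetical helper; assumes node ids look like "cube{i}_cluster{j}".
    """
    index = defaultdict(list)
    for node_id in node_ids:
        cube_part = node_id.split("_", 1)[0]  # e.g. "cube12"
        index[int(cube_part[len("cube"):])].append(node_id)
    return dict(index)

def nodes_for_cubes(cube_ids, cube_index, graph_nodes):
    """Return {node_id: member_ids} for the given cubes (same shape as find_nodes)."""
    return {
        node_id: graph_nodes[node_id]
        for i in cube_ids
        for node_id in cube_index.get(i, [])
    }
```

The index would be built once per graph, e.g. `build_cube_index(kmgraph['nodes'])`, and the result of `nodes_for_cubes` has the same shape as the `nodes` argument of `assign_nodes`. Parsing the integer id, rather than prefix-matching the string, also keeps `cube1` from matching `cube10_cluster0`.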

@sauln
Member

sauln commented Feb 3, 2021

Looks great! Would you like to submit a PR?

@ahassaine

This would indeed be very useful. I was wondering whether this has been implemented, or whether I should just use the code snippet shared above?
Many thanks!

@pulquero
Author

pulquero commented Nov 16, 2021 via email

Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants