Make it possible to find the closest cluster/node for new data #208

pulquero · 2021-02-03T12:47:16Z

Given a new data point and its associated lens values, make it possible to find where it should appear on the graph, i.e. which cluster is closest. The lens values can be used to look up the containing hypercube(s); the nearest cluster(s) can then be found within them, for example by taking the N nearest points and returning the clusters they belong to.

sauln · 2021-02-03T18:24:07Z

I could see that being very useful. Thanks for the suggestion. If you would like to take a stab at the implementation, that would be great for everyone.

pulquero · 2021-02-03T19:17:03Z

Here is something functional. With the addition of some extra internal data structures to track cubes->nodes this could probably be made a bit cleaner.

import numpy as np

# maybe this should be something like cover.contains() to handle Cover subclasses
def find_hypercubes(lens_values, cover):
  """Return the indices of all hypercubes whose bounds contain lens_values."""
  cube_ids = []
  for i, center in enumerate(cover.centers_):
    # each cube spans center +/- radius along every lens dimension
    lower_bounds, upper_bounds = center - cover.radius_, center + cover.radius_
    if np.all(lens_values >= lower_bounds) and np.all(lens_values <= upper_bounds):
      cube_ids.append(i)
  return cube_ids

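def find_nodes(cube_ids, kmgraph):
  """Return {node_id: member_indices} for every node built from the given cubes."""
  # assumes node IDs look like 'cube3_cluster0'; the trailing underscore
  # stops 'cube1' from also matching 'cube10_cluster0'
  prefixes = tuple('cube' + str(i) + '_' for i in cube_ids)
  nodes = {}
  for node, data_ids in kmgraph['nodes'].items():
    if node.startswith(prefixes):
      nodes[node] = data_ids
  return nodes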

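def assign_nodes(data_values, data, nodes, knn):
  """Return the IDs of the candidate nodes that hold the nearest neighbors of data_values.

  knn is a neighbors estimator such as sklearn.neighbors.NearestNeighbors; it is refit here.
  """
  knn_data = []
  knn_node_ids = []
  for node, data_ids in nodes.items():
    node_data = data[data_ids]
    knn_data.append(node_data)
    # one label per row, so each neighbor can be traced back to its node
    knn_node_ids.append([node]*len(node_data))
  # note: np.vstack raises ValueError if nodes is empty (no candidate nodes found)
  knn_data = np.vstack(knn_data)
  knn_node_ids = np.concatenate(knn_node_ids)
  knn.fit(knn_data)
  nn_ids = knn.kneighbors([data_values], return_distance=False)
  return np.unique(knn_node_ids[nn_ids])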
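
The "extra internal data structures to track cubes->nodes" mentioned above could look roughly like the following. This is only a sketch, not part of kmapper: build_cube_index and find_nodes_indexed are hypothetical helper names, and the code assumes node IDs follow the 'cube{i}_cluster{j}' pattern. Building the index once replaces the prefix scan over every node with plain dictionary lookups.

```python
import re
from collections import defaultdict

# Assumption: node IDs follow the "cube{i}_cluster{j}" pattern.
NODE_ID = re.compile(r"cube(\d+)_cluster(-?\d+)$")


def build_cube_index(kmgraph):
    """Map each cube index to the list of node IDs created from that cube."""
    cube_to_nodes = defaultdict(list)
    for node in kmgraph["nodes"]:
        match = NODE_ID.match(node)
        if match is None:
            continue  # skip IDs that do not follow the assumed pattern
        cube_to_nodes[int(match.group(1))].append(node)
    return dict(cube_to_nodes)


def find_nodes_indexed(cube_ids, kmgraph, cube_index):
    """Return {node_id: member_indices} for the given cubes via the index."""
    return {
        node: kmgraph["nodes"][node]
        for i in cube_ids
        for node in cube_index.get(i, [])
    }


# Toy graph: cube 1 has two clusters, cube 10 has one.
toy_graph = {"nodes": {
    "cube1_cluster0": [0, 1],
    "cube1_cluster1": [2],
    "cube10_cluster0": [3, 4],
}}
index = build_cube_index(toy_graph)
print(find_nodes_indexed([1], toy_graph, index))
```

Because the cube index is parsed as an integer, asking for cube 1 does not pick up the node from cube 10, and a cube that produced no clusters simply yields an empty dict.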

sauln · 2021-02-03T20:22:55Z

Looks great! Would you like to submit a PR?

ahassaine · 2021-11-16T11:15:44Z

This would indeed be very useful. I was wondering whether this has been implemented, or should I just use the code snippet shared above?
Many thanks!

pulquero · 2021-11-16T13:37:50Z

Most of it has been merged already except for one outstanding PR; use the code from that branch.

pulquero mentioned this issue Feb 4, 2021

added contains() method to Cover class. #209

Merged

This was referenced Feb 13, 2021

Added clusters_from_cover to kmapper. #213

Closed

Locate nearest clusters for given data #214

Open
