Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Impl Automaton for {u8, str} #5

Closed
wants to merge 1 commit into from

Conversation

gereeter
Copy link
Contributor

For better or worse, this makes fst.search("foo") work as expected - this will be less efficient than the specialized methods. However, this also allows things like fst.search("foo".starts_with()) and so on.

The performance issue might be fixed by adding another method to Automaton:

fn suggest_next(&self, state: &Self::State) -> Option<u8> {
    None
}

This would have the semantics that if Some(byte) is returned, then !self.can_match(self.accept(state, other_byte)) for other_byte != byte. The FST searcher could then use this suggested byte to not have to explore all possible branches.

@gereeter
Copy link
Contributor Author

I believe the change in Stream would be

  self.stack.push(StreamState {
      node: next_node,
-     trans: 0,
+     trans: match self.aut.suggest_next(&next_state) {
+         None => 0,
+         Some(byte) => next_node.find_input(byte)
+     }
      out: out,
      aut_state: next_state,
  });

but I haven't delved too much into how the actual FST parts of this library work.

@BurntSushi
Copy link
Owner

Hmm, I don't think that change to Stream is actually required. As soon as accept returns StringState(usize::MAX), can_match should return false. That should be enough to prune the search space.

Oh, I see, I think you're saying that suggest_next would allow one to only examine a single transition if there are many. I'm not sure that's really worth it, since the vast majority of states in a typical FST have exactly 1 transition.

With all that said, I actually am not sure that I want these impls at this time. Notably, Regex::new("foo.*").unwrap() works well for "find keys with this prefix." It's certainly not as ergonomic, but these impls introduce multiple ways of accomplishing the same thing. For example, fst.search("foo") is a little strange, since I don't know when you would want to use that in lieu of fst.get("foo") or fst.contains_key("foo"). (I guess maybe in a generic context?)

I think this falls into the "maybe" pile for me personally. I'd like to get more experience using the library before adding too many conveniences. :-)

@gereeter
Copy link
Contributor Author

Yeah, I wasn't completely sure that this would be a good change - you probably wouldn't ever want fst.search("foo"), which is why I wrote "for better or worse". This is also why I didn't include it in the other PR. Incidentally, I also had a whole host of other changes that I judged even less useful than this one and so left out.

The use cases I was primarily thinking of were things like excluding a certain keyword, as in fst.search(aut.intersection("foo".complement())). However, regexes can definitely cover any necessary use cases.

@gereeter gereeter closed this Nov 16, 2015
@BurntSushi
Copy link
Owner

Yeah, that is pretty cool, I must admit. But I think being a little conservative for now is good. We may want to add these in the future. :-)

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants