
TFLite perf regression with in-graph feature computation #2139

Closed
reuben opened this issue May 29, 2019 · 7 comments
@reuben
Contributor

reuben commented May 29, 2019

The TFLite interpreter API runs the entire graph when you call Invoke(), which means every time we try to compute features we run the acoustic model as well.

The Interpreter API has experimental support for subgraphs, but InterpreterBuilder doesn't support that yet, so we would have to build the graphs manually (by copy-pasting and modifying the InterpreterBuilder code), which is hacky and makes upgrading TF versions difficult, as well as relying on an experimental API.

Alternatively, we can use two different models/interpreters. One with the acoustic model graph and one with the MFCC feature computation graph. The question is then how do we package this secondary graph:

  1. Separate .tflite file that has to be specified in the API (terrible for users)
  2. Separate .tflite file that we then convert into a C source file and embed into libdeepspeech.so, then load with tflite::FlatBufferModel::BuildFromBuffer (hacky, especially process-wise: we'd need to check in generated code, remember to update it when the source changes, etc.)
  3. Separate .tflite file that we then append to the main .tflite file, then mmap the whole thing ourselves, and use tflite::FlatBufferModel::BuildFromBuffer to create the two separate FlatBuffer models and interpreters (cleaner than other options IMO, but would require implementing or importing some cross-platform mmap abstraction)
  4. Something else that I'm missing.
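For illustration, the append-and-split packaging in option 3 could look something like the Python sketch below. The `pack`/`unpack` names and the trailer format are hypothetical, invented for this sketch, not anything DeepSpeech ships:

```python
import struct

MAGIC = b"DSPK"  # hypothetical marker identifying a combined file

def pack(main_model: bytes, feature_model: bytes) -> bytes:
    """Concatenate two .tflite buffers, followed by a fixed-size trailer
    recording the length of the first, so both can be recovered later."""
    return main_model + feature_model + struct.pack("<4sQ", MAGIC, len(main_model))

def unpack(blob: bytes):
    """Split a packed blob back into (main_model, feature_model)."""
    magic, main_len = struct.unpack("<4sQ", blob[-12:])
    if magic != MAGIC:
        raise ValueError("not a packed model file")
    return blob[:main_len], blob[main_len:-12]
```

In practice the combined file would be mmapped and each recovered slice handed to tflite::FlatBufferModel::BuildFromBuffer; note that FlatBuffers expects suitably aligned buffers, so a real implementation would also need to pad the first model to an alignment boundary.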

I've implemented option 2 as a proof of concept to validate that it fixes the perf regression, it's here: 0c8017a
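For reference, the "convert into a C source file" step in option 2 amounts to an `xxd -i`-style conversion. A minimal Python sketch (the symbol name is hypothetical; this is not the actual script used in the commit):

```python
def to_c_source(data: bytes, symbol: str = "g_model_data") -> str:
    """Render a binary blob as a C array plus a length constant,
    similar to what `xxd -i` produces."""
    body = ",".join(str(b) for b in data)
    return (
        f"const unsigned char {symbol}[] = {{{body}}};\n"
        f"const unsigned int {symbol}_len = {len(data)};\n"
    )
```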

@lissyx thoughts?

@reuben reuben added the bug label May 29, 2019
@reuben reuben self-assigned this May 29, 2019
@lissyx
Collaborator

lissyx commented May 30, 2019

The Interpreter API has experimental support for subgraphs, but InterpreterBuilder doesn't support that yet, so we would have to build the graphs manually (by copy-pasting and modifying the InterpreterBuilder code), which is hacky and makes upgrading TF versions difficult, as well as relying on an experimental API.

Right. Something I was wondering but could not check in time: is the Mfcc node connected to the rest of the graph? If it weren't connected, why would the TensorFlow runtime run anything beyond the Mfcc node?
Could it be a bug, either in toco or in our export graph, that makes the Mfcc node connected to the rest of the graph?

@reuben
Contributor Author

reuben commented May 30, 2019

The two graphs are not connected. This behavior is documented in tflite::Interpreter::Invoke: the whole graph is run in topological order.
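For intuition, this can be simulated with a toy topological sort: even when two subgraphs share no edges, a single execution plan over all nodes still visits both. The node names below are illustrative, not actual TFLite ops:

```python
from graphlib import TopologicalSorter

# Two disconnected subgraphs in one model: MFCC feature computation
# and the acoustic model. No edge links the two components.
deps = {
    "mfcc_out": {"audio_in"},
    "am_lstm": {"am_input"},
    "am_logits": {"am_lstm"},
}
plan = list(TopologicalSorter(deps).static_order())
# A single Invoke() walks the entire plan, so running the graph just to
# compute mfcc_out still executes every acoustic-model node as well.
assert {"am_input", "am_lstm", "am_logits"} <= set(plan)
```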

@reuben
Contributor Author

reuben commented May 30, 2019

This could be fixed cleanly if TF had better support for subgraphs. Right now the .tflite produced by toco has all nodes in the primary subgraph, because other APIs like InterpreterBuilder don't support multiple subgraphs yet.

@lissyx
Collaborator

lissyx commented May 30, 2019

TF had better support for subgraphs.

It might be the case, I saw some commits from February toying with that. Not sure which release it will land in, though, and we may want to have something ready now anyway.

@reuben
Contributor Author

reuben commented May 30, 2019

Yeah, it'd be better to release 0.5 without this regression. But if better subgraph support is coming in, say, TF 1.14, we can always replace the solution we use now with a better one.

@lissyx
Collaborator

lissyx commented May 30, 2019

So, as discussed on IRC, we might have a way forward by playing with the execution plan: by limiting it to nodes 1 and 2 for Mfcc, I could get the expected behavior of not traversing the whole graph. The only limitation so far is that this overly naive approach produces an incorrect final result. It may just need more refinement.

@reuben reuben closed this as completed in 4fcf47c May 31, 2019
@lock

lock bot commented Jun 30, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jun 30, 2019