BodyPix tflite model support #11

Closed
charliesantos opened this issue May 18, 2021 · 13 comments

@charliesantos

@Volcomix, your work is really impressive! Thank you so much for it.

I saw this comment from @PINTO0309 in issue #2 (comment):

I don't know if it will be useful for you, but I have converted and quantized it for various frameworks and committed it to my repository.

TFLite Float32/Float16/INT8, TFJS, TF-TRT, ONNX, CoreML, OpenVINO IR FP32/FP16, Myriad Inference Blob

https://github.com/PINTO0309/PINTO_model_zoo
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/109_Selfie_Segmentation

And I'm able to easily play with the different models. Thank you both for your hard work.

I noticed @PINTO0309 also has BodyPix tflite models here, but they don't seem to work with @Volcomix's pipeline. Logging the model's input and output shapes gives the following:

console.log({
  inputHeight: this._tflite._getInputHeight(),
  inputWidth: this._tflite._getInputWidth(),
  inputChannelCount: this._tflite._getInputChannelCount(),
  outputHeight: this._tflite._getOutputHeight(),
  outputWidth: this._tflite._getOutputWidth(),
  outputChannelCount: this._tflite._getOutputChannelCount(),
});
// Outputs an image with 10x8 resolution, 17 channels

As you can see, the output seems wrong. Any advice on what to adjust? I want to run the BodyPix model in wasm, hoping to get better performance than with the tfjs one. Is this something you can help with? Thank you in advance!

@Volcomix
Owner

Hey @charliesantos, thanks for reaching out. Indeed, @PINTO0309 is doing awesome work on these model conversions!
BodyPix models won't work with the pipelines of this demo without some tweaks, especially because the output needs to be decoded.

Not sure how easy it would be, I'll try to investigate a little bit.

@charliesantos
Author

Thank you @Volcomix. It would be nice to at least use BodyPix in wasm for the virtual background use case, like the ones you have in this repo for Google Meet, ML Kit, and BodyPix tfjs. There is no need to detect different poses.

@Volcomix
Owner

I created a branch to experiment with BodyPix tflite models. I understand that the 4th output is supposed to hold the segmentation, but I don't really understand how to work with its 20x15 resolution, even after looking into the TF.js source code.
Any hints from @PINTO0309, maybe? Did you get a working example in Python with the float16 quantized model?

image

The inference time is good, but I guess this model is the one with the worst accuracy. Not sure how promising it is. The output is unusable for now:
image

I'm not sure I will invest much more time in it, so any help would be welcome.

@PINTO0309

I have not written any sample code, but the following repository may be helpful.
https://github.com/hegman12/body_pix_tflite

The last time I converted the model was a long time ago (about a year ago), so I will try to convert it again with the latest knowledge.

@Volcomix
Owner

Awesome! Thanks for the update, I'm gonna take a look.

@PINTO0309

PINTO0309 commented May 24, 2021

@Volcomix
The conversion took only five minutes. But what resolution do you want for the model? I can change it freely. The figure below shows the MobileNetV2 240x320 resolution model. [n, h, w, c] = [1, 240, 320, 3]
Screenshot 2021-05-24 11:39:32

I think we need to scale the height and width of the output according to the stride.
Screenshot 2021-05-24 11:46:50
Screenshot 2021-05-24 11:47:03
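
To make the stride relationship concrete, here is a minimal sketch of how the output grid size could be derived from the input size. The helper names `outputGridSize` and `upscaleFactors` are hypothetical, and the rounding rule (input divided by stride, rounded up) is an assumption that happens to match the 240x320, stride 16 model discussed in this thread; other exports may round differently.

```javascript
// Hypothetical helper: grid size of a stride-based model output.
// Assumption: the output is the input size divided by the stride, rounded up.
function outputGridSize(inputHeight, inputWidth, stride) {
  return {
    height: Math.ceil(inputHeight / stride),
    width: Math.ceil(inputWidth / stride),
  };
}

// Scale factors needed to bring the grid back to the input size.
function upscaleFactors(inputHeight, inputWidth, grid) {
  return { y: inputHeight / grid.height, x: inputWidth / grid.width };
}

const grid = outputGridSize(240, 320, 16);
console.log(grid); // { height: 15, width: 20 }
console.log(upscaleFactors(240, 320, grid)); // { y: 16, x: 16 }
```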

@PINTO0309

All the models have been re-converted and replaced; they now have the same model structure as the TensorFlow.js ones.
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/035_BodyPix

@charliesantos
Author

Thank you @PINTO0309 @Volcomix you both are amazing!

@PINTO0309

I found a better sample.
https://github.com/google-coral/project-bodypix
segmentation

@PINTO0309

Here is the one that was easiest to understand.
https://github.com/de-code/python-tf-bodypix

@Volcomix
Owner

Thank you so much @PINTO0309. I can see that the new model output still has a resolution of 15x20, so I need to figure out how to map it onto a 240x320 image. Hopefully I'll find the answer in the samples you found for us 🤞.
I'm gonna be pretty busy this week, so please don't worry if I answer with a bit of delay.
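
One possible way to map a 15x20 output onto a 240x320 image is to upsample the raw scores bilinearly, apply a sigmoid, and threshold the result. The sketch below is not taken from any of the linked projects. It assumes the segmentation output is a single-channel, row-major `Float32Array` of raw scores (logits), which this thread does not confirm, and `upsampleMask` is a hypothetical name.

```javascript
// Sketch only. Assumption: `logits` holds raw segmentation scores in
// row-major order with one channel (srcH x srcW).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Bilinear upsampling with half-pixel centers, followed by a threshold.
function upsampleMask(logits, srcH, srcW, dstH, dstW, threshold = 0.5) {
  const mask = new Uint8Array(dstH * dstW);
  const clamp = (v, max) => Math.min(Math.max(v, 0), max);
  for (let y = 0; y < dstH; y++) {
    // Map the destination pixel center back into source coordinates.
    const sy = clamp(((y + 0.5) * srcH) / dstH - 0.5, srcH - 1);
    const y0 = Math.floor(sy);
    const y1 = Math.min(y0 + 1, srcH - 1);
    const wy = sy - y0;
    for (let x = 0; x < dstW; x++) {
      const sx = clamp(((x + 0.5) * srcW) / dstW - 0.5, srcW - 1);
      const x0 = Math.floor(sx);
      const x1 = Math.min(x0 + 1, srcW - 1);
      const wx = sx - x0;
      // Interpolate along x on the two neighboring rows, then along y.
      const top = logits[y0 * srcW + x0] * (1 - wx) + logits[y0 * srcW + x1] * wx;
      const bottom = logits[y1 * srcW + x0] * (1 - wx) + logits[y1 * srcW + x1] * wx;
      const score = sigmoid(top * (1 - wy) + bottom * wy);
      mask[y * dstW + x] = score >= threshold ? 1 : 0;
    }
  }
  return mask;
}
```

With the model discussed here, the call would look like `upsampleMask(logits, 15, 20, 240, 320)`. Using half-pixel centers rather than a plain `x * srcW / dstW` mapping avoids shifting the mask by a fraction of a stride, which is one possible source of masks that look slightly offset.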

@Volcomix
Owner

Volcomix commented May 30, 2021

After loading the new tflite models and trying the fastest one (mobilenet050, stride16, 240x320, float16 quantization), the performance seems equivalent to or worse with tflite on wasm than with tfjs on webgl. Moreover, the tfjs one works at a higher resolution:

image

image

After checking the code of other projects and the tfjs implementation of BodyPix more closely, I see that I would have to handle the padding and scaling implied by the strides. All the reference projects implement this with TensorFlow, but it would require a lot of extra work in this demo. As the performance results are not that good with the tflite model of BodyPix, I don't wish to invest more time in this specific experiment.

I'm very sorry, PINTO, for making you spend time on it without reaching a result.

P.S.: There is work in progress in TF.js to handle tflite models. Maybe there is a chance that BodyPix will work with it once it is ready: https://github.com/tensorflow/tfjs/tree/master/tfjs-tflite
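
For readers wondering what handling padding and scaling could look like without TensorFlow, here is a rough sketch in plain JavaScript. It assumes the frame is letterboxed (scaled with its aspect ratio preserved, then centered and padded) into the model input, which is one common approach and not necessarily what the reference projects do. `computeLetterbox` and `cropMask` are hypothetical helpers.

```javascript
// Hypothetical helper: fit a source frame into the model input while keeping
// the aspect ratio, centering it and padding the remaining area.
function computeLetterbox(srcH, srcW, dstH, dstW) {
  const scale = Math.min(dstH / srcH, dstW / srcW);
  const height = Math.round(srcH * scale);
  const width = Math.round(srcW * scale);
  return {
    scale,
    height,
    width,
    top: Math.floor((dstH - height) / 2),
    left: Math.floor((dstW - width) / 2),
  };
}

// Remove the padded border from a model-sized mask (typed array, row-major,
// one channel), so only the region covered by the frame remains.
function cropMask(mask, maskW, box) {
  const out = new mask.constructor(box.height * box.width);
  for (let y = 0; y < box.height; y++) {
    const start = (box.top + y) * maskW + box.left;
    out.set(mask.subarray(start, start + box.width), y * box.width);
  }
  return out;
}

const box = computeLetterbox(720, 1280, 240, 320);
console.log(box); // { scale: 0.25, height: 180, width: 320, top: 30, left: 0 }
```

The cropped mask can then be resized to the original frame size by dividing by `box.scale`. Skipping the crop step and resizing the padded mask directly would stretch and shift the result.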

@cansik

cansik commented Jan 16, 2024

@Volcomix I am currently playing around with the OpenVINO model provided by @PINTO0309 and I was able to detect the keypoints of a person. However, the float_segments seem to be off by a few pixels (or in scale) when I resize them to the actual image size. How did you convert the mask from the segments to match the actual image size?

image
