About Model Conversion! #3
I tried to convert the mge model with the following code and got an error: Segmentation fault (core dumped)
@sunmooncode So far I was able to convert to traced module using the following code:
Running it will give you an error, but you can fix it by commenting this line: Line 45 in 924d9f2
My idea is to see if the
For some reason, loading the model does not work on that machine, but on another machine I have, I am able to do it.
@ibaiGorordo thanks for your help!
The reason is that many ops of CREStereo lack corresponding operator support in mgeconverter. In addition:
I get a segmentation fault even if I build MegEngine and mgeconvert for my environment. By the way, I have already confirmed that CREStereo works in the CPU environment on which it was built. I have given up on exporting to ONNX because MegEngine will segfault no matter what workaround I try.

${HOME}/.local/bin/convert mge_to_onnx \
-i crestereo_eth3d.mge \
-o crestereo_eth3d.onnx \
--opset 11
/home/user/.local/lib/python3.8/site-packages/megengine/core/tensor/megbrain_graph.py:508: ResourceWarning: unclosed file <_io.BufferedReader name='crestereo_eth3d.mge'>
buf = open(fpath, "rb").read()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):
File "/home/user/.local/bin/convert", line 525, in <module>
main()
File "/home/user/.local/bin/convert", line 518, in main
args.func(args)
File "/home/user/.local/bin/convert", line 283, in convert_func
converter_map[target](
File "/home/user/.local/lib/python3.8/site-packages/mgeconvert/converters/mge_to_onnx.py", line 50, in mge_to_onnx
irgraph = MGE_FrontEnd(mge_fpath, outspec=outspec).resolve()
File "/home/user/.local/lib/python3.8/site-packages/mgeconvert/frontend/mge_to_ir/mge_frontend.py", line 21, in __init__
_, outputs = load_comp_graph_from_file(model_path)
File "/home/user/.local/lib/python3.8/site-packages/mgeconvert/frontend/mge_to_ir/mge_utils.py", line 106, in load_comp_graph_from_file
ret = G.load_graph(path)
File "/home/user/.local/lib/python3.8/site-packages/megengine/core/tensor/megbrain_graph.py", line 511, in load_graph
cg, metadata = _imperative_rt.load_graph(buf, output_vars_map, output_vars_list)
RuntimeError: access invalid Maybe value
backtrace:
/home/user/.local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN3mgb13MegBrainErrorC1ERKSs+0x4a) [0x7f3b39dfe1fa]
/home/user/.local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN3mgb17metahelper_detail27on_maybe_invalid_val_accessEv+0x34) [0x7f3b39f060f4]
/home/user/.local/lib/python3.8/site-packages/megengine/core/_imperative_rt.cpython-38-x86_64-linux-gnu.so(+0x14c605) [0x7f3b94873605]
/home/user/.local/lib/python3.8/site-packages/megengine/core/_imperative_rt.cpython-38-x86_64-linux-gnu.so(+0x14c823) [0x7f3b94873823]
/home/user/.local/lib/python3.8/site-packages/megengine/core/_imperative_rt.cpython-38-x86_64-linux-gnu.so(+0x11d62e) [0x7f3b9484462e]
/usr/bin/python3(PyCFunction_Call+0x59) [0x5f5e79]
/usr/bin/python3(_PyObject_MakeTpCall+0x296) [0x5f6a46]
/usr/bin/python3(_PyEval_EvalFrameDefault+0x5d3f) [0x570a1f]
/usr/bin/python3(_PyFunction_Vectorcall+0x1b6) [0x5f6226]
/usr/bin/python3(_PyEval_EvalFrameDefault+0x5706) [0x5703e6]
import pickle5
return pickle5.load(f)
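As an aside, the "unclosed file" ResourceWarning in the log above comes from the bare `open(fpath, "rb").read()` in megbrain_graph.py. The warning itself is harmless and unrelated to the segfault, but a context manager would avoid it. A minimal sketch (the helper name `read_model_bytes` is made up for illustration):

```python
import os
import tempfile

def read_model_bytes(fpath):
    # A context manager closes the handle deterministically, which avoids
    # the "unclosed file" ResourceWarning that a bare open().read() emits.
    with open(fpath, "rb") as f:
        return f.read()

# tiny self-check with a throwaway file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00model-bytes")
    path = tmp.name
try:
    assert read_model_bytes(path) == b"\x00model-bytes"
finally:
    os.remove(path)
```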
Hi @PINTO0309, thanks for the tip. @sunmooncode I got the same error after fixing my issue. Also, I have created a Google Colab notebook to reproduce the error. Because of the CUDA version, it crashes if I load the model with a GPU runtime, so run it without GPU.
Commenting these two lines fixes the rmul error: Lines 98 to 99 in 924d9f2
However, next I get the following error:
That is because mgeconvert does not support this operation. I have seen that many operations in the network are unsupported, so it cannot be converted.
Yeah... I tried converting the model to .mge (using
I do not understand the framework enough, but it seems to be hard to fix the issues.
Since it seems to be hard to convert, I have tried to implement the model in PyTorch. The model seems to run normally, but since I don't have the weights I cannot fully test it. My hope is that somehow we can translate the weights from this model there.
@ibaiGorordo Good job
@sunmooncode I was able to convert the weights directly; however, it seems that some parameter in my PyTorch implementation is probably not correct. But overall the conversion seems to work.
I was able to fix the implementation issues and convert the model to ONNX: https://github.com/ibaiGorordo/ONNX-CREStereo-Depth-Estimation. From there it should be easier to convert to other platforms. Here is a video with the output in ONNX: https://youtu.be/ciX7ILgpJtw @sunmooncode regarding the low speed, if you use a low resolution (320x240) and only do one pass without flow_init, you can get decent speed with good quality.
I have committed a large number of ONNX models of various resolution and ITER combinations. I imagine the ITER10 version is twice as fast. https://github.com/PINTO0309/PINTO_model_zoo/tree/main/284_CREStereo
@PINTO0309 @ibaiGorordo Thanks for your help!
@sunmooncode It is important to note that it takes 30 minutes to an hour to output a single onnx file.

device = 'cpu'
model = Model(max_disp=256, mixed_precision=False, test_mode=True)
model.load_state_dict(torch.load(model_path), strict=True)
model.to(device)
model.eval()

import onnx
from onnxsim import simplify

RESOLUTION = [
    [240//2, 320//2],
    [320//2, 480//2],
    [360//2, 640//2],
    [480//2, 640//2],
    [720//2, 1280//2],
    # [240, 320],
    # [320, 480],
    # [360, 640],
    # [480, 640],
    # [720, 1280],
]
ITER = 20
MODE = 'init'
MODEL = f'crestereo_{MODE}_iter{ITER}'

for H, W in RESOLUTION:
    if MODE == 'init':
        onnx_file = f"{MODEL}_{H}x{W}.onnx"
        x1 = torch.randn(1, 3, H, W).cpu()
        x2 = torch.randn(1, 3, H, W).cpu()
        torch.onnx.export(
            model,
            args=(x1, x2),
            f=onnx_file,
            opset_version=12,
            input_names=['left', 'right'],
            output_names=['output'],
        )
        model_onnx1 = onnx.load(onnx_file)
        model_onnx1 = onnx.shape_inference.infer_shapes(model_onnx1)
        onnx.save(model_onnx1, onnx_file)
        model_onnx2 = onnx.load(onnx_file)
        model_simp, check = simplify(model_onnx2)
        onnx.save(model_simp, onnx_file)
    elif MODE == 'next':
        onnx_file = f"{MODEL}_{H}x{W}.onnx"
        x1 = torch.randn(1, 3, H, W).cpu()
        x2 = torch.randn(1, 3, H, W).cpu()
        x3 = torch.randn(1, 2, H//2, W//2).cpu()
        torch.onnx.export(
            model,
            args=(x1, x2, x3),
            f=onnx_file,
            opset_version=12,
            input_names=['left', 'right', 'flow_init'],
            output_names=['output'],
        )
        model_onnx1 = onnx.load(onnx_file)
        model_onnx1 = onnx.shape_inference.infer_shapes(model_onnx1)
        onnx.save(model_onnx1, onnx_file)
        model_onnx2 = onnx.load(onnx_file)
        model_simp, check = simplify(model_onnx2)
        onnx.save(model_simp, onnx_file)

import sys
sys.exit(0)

Next, this script. Alternatively, merging the onnx files into a single graph using this tool is optional and feasible.
@PINTO0309 I have a problem: ONNX has a hard time handling logical operators, but there are a lot of `if` constructs in the model. Does this have any effect on the conversion of the model?
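One relevant detail here: torch.onnx.export traces the model with the example inputs, so a Python-level `if` is not exported as a graph branch; only the path actually taken at trace time ends up in the ONNX graph. A stdlib-only sketch of why tracing loses the untaken branch (the `Tracer` class and op names are invented for illustration, not part of any library):

```python
class Tracer:
    """Minimal stand-in for a tracing exporter: it records ops, not control flow."""
    def __init__(self, value):
        self.value = value
        self.ops = []

    def op(self, name, fn):
        # every tensor op that runs gets appended to the "graph"
        self.ops.append(name)
        self.value = fn(self.value)
        return self

def forward(x, use_flow_init):
    # Python-level `if`: the tracer only ever sees the branch that ran
    if use_flow_init:
        x.op('upsample_flow', lambda v: v * 2)
    else:
        x.op('zero_flow', lambda v: 0)
    return x.op('refine', lambda v: v + 1)

traced = forward(Tracer(3), use_flow_init=True)
# the recorded "graph" contains only the taken branch
assert traced.ops == ['upsample_flow', 'refine']
```

So the `if` constructs do not necessarily block export; they are simply frozen to one branch per exported file, which is consistent with exporting separate 'init' and 'next' models above.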
@sunmooncode |
@PINTO0309 hi, is this image from the Holopix50K dataset? I cannot get such good results with the same image. How did you do the pre-rectification?
I think you can use any image you like. |
@PINTO0309 Hi, the result I get with test.py is much worse than the image you provided. Is there anything I need to do before inputting the image to test.py?
@ibaiGorordo any suggestions?
@Tord-Zhang Which model are you using?
The model provided by the authors. The problem is not with the model: I used raw Holopix50K data, which has not been strictly epipolar-rectified. The two images provided by the authors are rectified, which is why my results are worse. However, the authors have not responded with details about the epipolar rectification.
For epipolar rectification you can use OpenCV or MATLAB. Have you tried your own rectified data, or Middlebury, to check whether the results are correct?
@PINTO0309 Hello, have you tried deploying on other platforms? The combined onnx reports an error when converting to MNN: Can't convert Einsum for input size=3
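On the "make it 2D" suggestion that follows: a 3-D (batched) Einsum can be decomposed into plain 2-D matmuls per batch element, which converters lacking Einsum support usually handle. A NumPy sketch of the equivalence (the shapes are illustrative, not CREStereo's actual ones):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4))   # e.g. (batch, rows, inner)
b = rng.standard_normal((2, 4, 5))   # e.g. (batch, inner, cols)

# 3-D Einsum of the kind the MNN converter rejects ("input size=3")
ein = np.einsum('bij,bjk->bik', a, b)

# Equivalent: unroll the batch dimension into 2-D matmuls, then stack.
two_d = np.stack([a[i] @ b[i] for i in range(a.shape[0])])

assert ein.shape == (2, 3, 5)
assert np.allclose(ein, two_d)
```

In a real graph surgery you would replace the Einsum node with Reshape/MatMul ops expressing the same contraction; the arithmetic above shows the two forms are interchangeable.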
If 3D doesn't work, just make it 2D. Try it yourself. |
#3 (comment) |
The topic is too old and I no longer have the resources at hand from that time. However, a comparison will immediately reveal the difference. You will soon see what errors you get when you run it on TensorRT.
Got it. Thanks a lot for the prompt reply! |
Thank you for open-sourcing this!
Your project is very interesting. I would like to ask whether mgeconvert can be used to convert this model to ONNX, or whether there is another way.
Looking forward to your reply!