Problem with @jit.trace(symbolic=True) in the train.py of the cascade_emd model #5

Open
stoneMo opened this issue Jul 17, 2020 · 1 comment

stoneMo commented Jul 17, 2020

When I run the cascade_emd model, I get the following error. I would appreciate it if you could help me out. Thank you in advance.

Traceback (most recent call last):
  File "train.py", line 167, in <module>
    run_train()
  File "train.py", line 164, in run_train
    train(args)
  File "train.py", line 156, in train
    worker(0, 1, args)
  File "train.py", line 119, in worker
    train_one_epoch(model, train_loader, opt, max_steps, rank, epoch_id, gpu_num)
  File "train.py", line 58, in train_one_epoch
    losses = propagate()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/jit/__init__.py", line 424, in __call__
    self._compiled_func()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1208, in __call__
    self._execute()
  File "/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/mgb.py", line 1092, in _execute
    return _mgb.AsyncExec__execute(self)
megengine._internal.exc.MegBrainError: MegBrain core throws exception: mgb::AssertionError
assertion `begin >= 0 && end >= begin && end <= size_ax' failed at /home/code/src/core/impl/tensor.cpp:151: mgb::SubTensorSpec mgb::Slice::apply(megdnn::TensorLayout, int) const
extra message: index out of bound: layout={511(1),1(1)}; request begin=None end=2 step=None axis=1

  • bt:/home/xinmiao/anaconda3/envs/CRDET/lib/python3.6/site-packages/megengine/_internal/_mgb.cpython-36m-x86_64-linux-gnu.so{1e36052,1edec06,1fc6782,1fc6fd0}
    | Associated operator: id=160315 name=subtensor(argsort[160305]:o0)[160315] type=mgb::opr::Subtensor
    | input variables:
    | 0: {id:160306, shape:{511,1}, Float32, owner:argsort(MUL[160303])[160305]{ArgsortForward}, name:argsort(MUL[160303])[160305]:o0, slot:0, gpu0:0, d, 8, 1}
    | 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
    | output variables:
    | 0: {id:160316, shape:{553,2}, Float32, owner:subtensor(argsort[160305]:o0)[160315]{Subtensor}, name:subtensor(argsort[160305]:o0)[160315], slot:0, gpu0:0, d, 8, 8}
    |
    | Unoptimized equivalent of associated operator: id=10623 name=subtensor(argsort[10615]:o0)[10623] type=mgb::opr::Subtensor
    | input variables:
    | 0: {id:10616, shape:{}, Float32, owner:argsort(MUL[10611])[10615]{ArgsortForward}, name:argsort(MUL[10611])[10615]:o0, slot:0, gpu0:0, d, 8, 1}
    | 1: {id:21, shape:{1}, Int32, owner:2[20]{ImmutableTensor}, name:2[20], slot:0, gpu0:0, s, 2, 2}
    | output variables:
    | 0: {id:10624, shape:{}, Float32, owner:subtensor(argsort[10615]:o0)[10623]{Subtensor}, name:subtensor(argsort[10615]:o0)[10623], slot:0, gpu0:0, d, 8, 8}
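
For readers following the trace: the extra message reports a layout of {511, 1} and a slice request of end=2 on axis 1, so the axis has size 1 and the `end <= size_ax` part of the assertion fails. The sketch below only mirrors that condition in plain Python; `check_slice` is a hypothetical helper written for illustration, not part of MegEngine, and it assumes that a missing bound (None) defaults to the start or end of the axis and that negative indices are not in play.

```python
def check_slice(size_ax, begin=None, end=None):
    """Mirror the condition `begin >= 0 && end >= begin && end <= size_ax`.

    Hypothetical helper for illustration only, not a MegEngine API.
    Assumption: a bound of None defaults to 0 (begin) or size_ax (end).
    """
    b = 0 if begin is None else begin
    e = size_ax if end is None else end
    return b >= 0 and e >= b and e <= size_ax


layout = (511, 1)  # shape reported in the error message
axis = 1           # axis reported in the error message

# The failing request from the log: begin=None, end=2 on an axis of size 1.
print(check_slice(layout[axis], begin=None, end=2))  # False

# The same request passes once the axis has at least 2 elements.
print(check_slice(2, begin=None, end=2))  # True
```

Note that plain Python slicing would silently clamp such a request (`[0][:2]` returns `[0]`), whereas the check above rejects it, which matches the behaviour reported in the log.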
Collaborator

xg-chu commented Aug 3, 2020

Does this error also occur with fpn_baseline or emd_simple?
