First of all, I would like to thank you for sharing this great work.
I trained the model successfully for articulatory attribute detection on TIMIT, following the instructions in the README.
However, I ran into a problem in `infer.py`: the function `segs_phones_to_frame_binf` fails with the error "operands could not be broadcast together with shapes (62,) (42,)". There are 62 attributes but only 42 phonemes (39 plus UNK, SOS, EOS), and `logits_binf` has length 42, so it cannot be broadcast against the 62-dimensional attribute vectors.
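For clarity, here is a minimal NumPy sketch of what I am seeing. Apart from `logits_binf`, all names (`frame_targets`, `phone_to_binf`) are my own labels for illustration, not identifiers from this repo:

```python
import numpy as np

n_phones = 42  # 39 TIMIT phonemes + UNK, SOS, EOS
n_attrs = 62   # articulatory attributes

logits_binf = np.random.randn(n_phones)  # per-frame phoneme scores, length 42
frame_targets = np.zeros(n_attrs)        # per-frame binary attribute targets, length 62

# This reproduces the error: the two vectors live in different spaces.
try:
    _ = frame_targets * logits_binf
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (62,) (42,)

# What I would have expected instead: a (n_phones, n_attrs) phone-to-attribute
# indicator matrix that projects phoneme scores into attribute space first.
phone_to_binf = np.zeros((n_phones, n_attrs))  # hypothetical mapping matrix
attr_scores = logits_binf @ phone_to_binf      # shape (62,), now comparable
```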
Also, could you share the notebook for computing frame-level attribute detection accuracy?
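In case it helps clarify what I am asking for, this is the kind of computation I have in mind (my own sketch, not code from this repo; the 0.5 threshold is an assumption on my part):

```python
import numpy as np

def frame_attr_accuracy(scores, targets, threshold=0.5):
    """scores, targets: (n_frames, n_attrs) arrays; targets are binary 0/1."""
    preds = (scores >= threshold).astype(targets.dtype)
    return (preds == targets).mean()  # fraction of correct frame-attribute cells

# toy example with random data
scores = np.random.rand(100, 62)
targets = (np.random.rand(100, 62) > 0.5).astype(float)
print(frame_attr_accuracy(scores, targets))
```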
Finally, the paper reports results for 28 place and manner attributes. Which mapping file did you use to compute detection accuracy for those 28 attributes only?
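To make the question concrete, this is how I would restrict the evaluation once I know which of the 62 columns the mapping file assigns to place and manner (the indices below are placeholders, not the real mapping):

```python
import numpy as np

place_manner_idx = np.arange(28)  # hypothetical: the real indices should come from the mapping file

scores = np.random.rand(100, 62)
targets = (np.random.rand(100, 62) > 0.5).astype(float)

scores_pm = scores[:, place_manner_idx]    # (100, 28)
targets_pm = targets[:, place_manner_idx]  # (100, 28)
print(((scores_pm >= 0.5) == targets_pm.astype(bool)).mean())
```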
Thanks a lot.