
about detr code problem (transformer part) #44

Open
darewolf007 opened this issue Mar 27, 2024 · 1 comment
darewolf007 commented Mar 27, 2024

Hello, in your detr code you use the transformer to get an output of shape [bs, hidden_dim, feature_dim]; the code is

self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]

the transformer code is

hs = self.decoder(tgt, memory, memory_key_padding_mask=mask, pos=pos_embed, query_pos=query_embed)
hs = hs.transpose(1, 2)
return hs

Based on my understanding, your code only selects the first decoder layer's output as the feature for predicting the action. However, in the original detr code the transformer output is:

hs = self.decoder(tgt, memory, memory_key_padding_mask=mask, pos=pos_embed, query_pos=query_embed)
return hs.transpose(1, 2), memory.permute(1, 2, 0).view(bs, c, h, w)

The original detr code uses the same feature-processing code:

hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
outputs_class = self.class_embed(hs)

I would like to ask why only the first-layer output is chosen as the feature. Would selecting the seventh layer's output be a better choice? Thank you!
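To make the indexing difference concrete, here is a minimal shape-only sketch (numpy stands in for torch tensors; the layer/batch/query/hidden sizes are illustrative assumptions, not values from the repo). In upstream DETR the forward returns a tuple, so `[0]` picks the whole stack of decoder-layer outputs; in a variant that returns `hs` alone, the same `[0]` instead indexes the first decoder layer:

```python
import numpy as np

# Hypothetical shapes: 7 decoder layers, batch 2, 100 queries, hidden dim 256
num_layers, bs, num_queries, hidden = 7, 2, 100, 256
hs = np.zeros((num_layers, bs, num_queries, hidden))  # stacked intermediate decoder outputs
memory = np.zeros((bs, hidden, 15, 20))               # encoder memory (illustrative shape)

# Upstream DETR: forward returns a tuple, so [0] selects the FULL hs stack
detr_out = (hs, memory)
print(detr_out[0].shape)    # (7, 2, 100, 256) -> all decoder layers

# Variant returning hs alone: the same [0] now selects only decoder layer 0
variant_out = hs
print(variant_out[0].shape)  # (2, 100, 256) -> first layer only
```

So the identical call-site expression `self.transformer(...)[0]` means two different things depending on what the transformer's forward returns.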

@ka2hyeon

I think the authors made a mistake when they cherry-picked the original DETR code.
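If it is indeed a cherry-picking slip, one possible fix (my sketch under that assumption, not an official patch) is to restore upstream DETR's tuple return, so that `self.transformer(...)[0]` again yields the full stack of decoder-layer outputs, from which the deepest layer can be taken with `[-1]`:

```python
import numpy as np

# Illustrative shapes only; none of these values come from the repo
num_layers, bs, num_queries, hidden = 7, 2, 100, 256
hs = np.zeros((num_layers, bs, num_queries, hidden))
memory = np.zeros((bs, hidden, 15, 20))

def forward_fixed(hs, memory):
    # Mirror upstream DETR's return signature: (decoder outputs, encoder memory)
    return hs, memory

out = forward_fixed(hs, memory)[0]
print(out.shape)      # (7, 2, 100, 256): all decoder layers, as in upstream DETR
print(out[-1].shape)  # (2, 100, 256): deepest (7th) layer, per the questioner's suggestion
```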
