
How to use MaskZero with LSTM and nn.ClassNLLCriterion for variable length sequences #75

Closed
jundengdeng opened this issue Dec 10, 2015 · 23 comments

@jundengdeng

Hi Guys,
I tried to use an LSTM to deal with variable-length sequences, but I failed to do so using the MaskZero module. Could you please help me out? Thanks a lot!

Here is a minimal code example of what I mean:

require 'rnn'
require 'optim'

inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses = 5
numSeq = 30

x, y1 = {}, {}

-- build numSeq sequences of random length, left-padded with zeros
-- so that every input tensor is seqLengthMax x inSize
for i = 1, numSeq do
   local seqLength = torch.random(1, seqLengthMax)
   local temp = torch.zeros(seqLengthMax, inSize)
   local targets
   if seqLength == seqLengthMax then
      targets = (torch.rand(seqLength) * numTargetClasses):ceil()
   else
      -- left-pad the targets with zeros as well
      targets = torch.cat(torch.zeros(seqLengthMax - seqLength), (torch.rand(seqLength) * numTargetClasses):ceil())
   end
   temp[{{seqLengthMax - seqLength + 1, seqLengthMax}}] = torch.randn(seqLength, inSize)
   table.insert(x, temp)
   table.insert(y1, targets)
end

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.MaskZero(nn.FastLSTM(inSize, hiddenSize), 1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)

criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(), 1))

output = model:forward(x)
print(output[1])

err = criterion:forward(output, y1)
print(err)
@jfsantos

I am currently facing the same issue while implementing the char-rnn example using rnn.

@ghost

ghost commented Dec 14, 2015

Are you sure you need MaskZero? The Sequencer handles sequence lengths dynamically.

I adapted Karpathy's char-rnn to consume variable-length word sequences and it works fine without any padding or masking.

(Sanity-checked with a Reber grammar, lengths 5 to 50.)

@jfsantos

@kmnns Using Sequencer works if you train on one sequence at a time, but a mini-batch can contain multiple sequences of different lengths, hence the need for masking.
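
To make the problem concrete, here is a minimal sketch (with hypothetical sizes) of such a mini-batch: two sequences of different lengths, zero-padded on the left into a single seqLen x batchSize x inSize tensor.

require 'rnn'

local inSize = 3
local batch = torch.zeros(4, 2, inSize)      -- seqLen = 4, batchSize = 2
batch[{{3, 4}, 1}] = torch.randn(2, inSize)  -- sequence 1: real length 2, left-padded
batch[{{}, 2}] = torch.randn(4, inSize)      -- sequence 2: real length 4, no padding
-- without masking, the zero rows of sequence 1 would be processed as real input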

@ghost

ghost commented Dec 14, 2015

@jfsantos I see. Then I second that question.

The test script seems to wrap the whole Sequential in a MaskZero, not its individual parts (rnn/test/test.lua, line 2370 at 442276f):

local module = nn.MaskZero(recurrent, 1)

Maybe give that a try.

@nicholas-leonard
Member

@jundeng86 The model seems OK; it masks the zeros correctly AFAIK.

As for the criterion, MaskZero cannot be used to decorate a criterion. We would need a MaskZeroCriterion for that. Working on it.

@nicholas-leonard
Member

@jundeng86 So for the criterion, use MaskZeroCriterion(criterion).
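
Applied to the code from the question, only the criterion line changes (the model stays as it is):

criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1))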

@jundengdeng
Author

Thanks! You helped me out!

@jnhwkim
Contributor

jnhwkim commented Dec 15, 2015

FYI, MaskZero can be applied conveniently by calling recurrent:maskZero(nInputDim).
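
For example, a minimal sketch of the model from the question using that method (assuming, as in the library's README examples, that maskZero returns the decorated module so it can be chained):

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.FastLSTM(inSize, hiddenSize):maskZero(1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)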

@jfsantos

@nicholas-leonard just a heads-up: MaskZeroCriterion currently does not work if the entire batch has to be masked/ignored. I can try to fix that or at least add a test case to help with the fix.

@nicholas-leonard
Member

@jfsantos Yes please! Also, nice meeting you at NIPS!

@rotmanmi

@nicholas-leonard Are you sure nn.MaskZeroCriterion works when the mask covers everything (i.e., an input whose entire sequence is zeros)?
I'm encountering problems with that.

@emergix

emergix commented Dec 24, 2015

Hello,

I have built this augmented reality model with MaskZero and it seems to work. Maybe I am wrong, but I don't need to use MaskZeroCriterion:

function newModelBuild(dictionarySize, nbfeatures, embeddingSize, rhoInput, rhoOutput, lktype, logsoftFlag)
   local model = nn.Sequential()
   local p = nn.ParallelTable()
   p:add(nn.Identity())  -- carries the tensor of features
   local lkt = nn.LookupTable(dictionarySize, embeddingSize)
   local weightmatrix
   if lktype == 0 then
      weightmatrix = torch.Tensor(dictionarySize, embeddingSize)
      for i = 1, dictionarySize do
         for j = 1, embeddingSize do
            weightmatrix[i][j] = torch.uniform(0, 1)
         end
      end
      lkt.weight:copy(weightmatrix)
   else
      lkt.weight:fill(1.0 / embeddingSize)
   end
   p:add(nn.Sequencer(lkt))  -- -> list of tensors (batchSize x embeddingSize)
   model:add(p)
   local SliceList = nn.ConcatTable()  -- purpose: create a list of tensors built by joining tensors
   for i = 1, rhoInput do
      local Slice = nn.Sequential()
      SliceList:add(Slice)
      local cc = nn.ConcatTable()  -- contains the 2 tensors to join
      Slice:add(cc)
      local a = nn.Sequential()
      cc:add(a)
      a:add(nn.SelectTable(2))  -- select the list of tensors
      a:add(nn.SelectTable(i))  -- select tensor(i)
      local b = nn.Sequential()
      cc:add(b)
      b:add(nn.SelectTable(1))  -- select tensorF
      Slice:add(nn.JoinTable(2))  -- create a single tensor = tensorF & tensor(i)
   end
   for i = rhoInput + 1, rhoOutput do
      local Slice = nn.Sequential()
      SliceList:add(Slice)
      local cc = nn.ConcatTable()  -- contains the 2 tensors to join
      Slice:add(cc)
      local a = nn.Sequential()
      cc:add(a)
      a:add(nn.SelectTable(2))  -- select the list of tensors
      a:add(nn.MaskZero(nn.SelectTable(i), 1))  -- select tensor(i), zero-masked
      local b = nn.Sequential()
      cc:add(b)
      b:add(nn.SelectTable(1))  -- select tensorF
      Slice:add(nn.JoinTable(2))  -- create a single tensor = tensorF & tensor(i)
   end
   model:add(SliceList)
   model:add(nn.Sequencer(nn.FastLSTM(embeddingSize + nbfeatures, embeddingSize, rhoOutput)))
   model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
   if logsoftFlag then model:add(nn.Sequencer(nn.LogSoftMax())) end
   return model
end

@jfsantos

@emergix You are only using MaskZero on a small part of the model, not on the whole model. That part of the model probably has no issue with having its output set to zeros. MaskZeroCriterion is designed to deal with masked outputs of the model (and does so by making the criterion ignore those outputs).

@emergix

emergix commented Dec 30, 2015

I see.

The case you want to address is the one where you have inputs of different lengths and you want a consistent evaluation of the criterion, so you need to set the rho of the LSTM to the maximum possible input length.

The criterion then has to compute something that is comparable between inputs of different lengths, something like:

(1 / number of non-zero inputs) * sum of squareDistance(input, target) over the non-zero inputs

and you should make sure that an all-zero input can never be a meaningful input.

Am I right?
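
A minimal sketch of that idea (a hypothetical maskedLoss helper, not the library's actual MaskZeroCriterion code):

-- average the per-step loss over the non-zero (unpadded) steps only
function maskedLoss(inputs, targets, stepCriterion)
   local err, n = 0, 0
   for t = 1, #inputs do
      if inputs[t]:ne(0):sum() > 0 then  -- skip all-zero (padded) steps
         err = err + stepCriterion:forward(inputs[t], targets[t])
         n = n + 1
      end
   end
   return n > 0 and err / n or 0
end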

@shuzi

shuzi commented May 16, 2016

@nicholas-leonard
Will the zero mask handle BPTT accordingly? E.g., if the input is 1 2 3 4 5 0 0 0, BPTT should start from 5.

@jnhwkim
Contributor

jnhwkim commented May 16, 2016

@shuzi maskZero and trimZero handle BPTT correctly. However, the input should be right-aligned, with the zero padding on the left (e.g. 0 0 0 1 2 3 4 5; meaningful forwarding starts at 1, and backwarding ends at 1). When there is a lot of zero padding, trimZero performs more efficiently and produces the same outputs.
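
For reference, a minimal sketch of switching between the two (trimZero takes the same nInputDim argument as maskZero):

lstm = nn.FastLSTM(inSize, hiddenSize)
lstm:trimZero(1)  -- drop-in alternative to lstm:maskZero(1); skips computation on the padding
model = nn.Sequencer(lstm)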

@nicholas-leonard
Member

@shuzi As @jnhwkim mentioned, trimZero will work better than maskZero if you have lots of zeros. In any case, you can put zeros anywhere in a tensor. So you could have 1,2,3,0,4,5,6,0,2,3 and each 0 would reset both the forward and the backward (BPTT) passes between the sub-sequences 1,2,3; 4,5,6; and 2,3.

@shuzi

shuzi commented May 16, 2016

@nicholas-leonard @jnhwkim I see: right-aligned, with zero padding on the left. Thanks a lot!

@jnhwkim
Contributor

jnhwkim commented May 16, 2016

@shuzi you're welcome.


@apsvvfb

apsvvfb commented Aug 1, 2016

I changed the criterion to MaskZeroCriterion as suggested, then tried to run the example code given by @jundeng86. However, I got a warning:

"Warning: you are most likely using MaskZero the wrong way. You should probably use AbstractRecurrent:maskZero() so that it wraps the internal AbstractRecurrent.recurrentModule instead of wrapping the AbstractRecurrent module itself."

So is it OK to use MaskZero in the example?

require 'rnn'
require 'optim'

inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses = 5
numSeq = 30

x, y1 = {}, {}

for i = 1, numSeq do
   local seqLength = torch.random(1, seqLengthMax)
   local temp = torch.zeros(seqLengthMax, inSize)
   local targets
   if seqLength == seqLengthMax then
      targets = (torch.rand(seqLength) * numTargetClasses):ceil()
   else
      targets = torch.cat(torch.zeros(seqLengthMax - seqLength), (torch.rand(seqLength) * numTargetClasses):ceil())
   end
   temp[{{seqLengthMax - seqLength + 1, seqLengthMax}}] = torch.randn(seqLength, inSize)
   table.insert(x, temp)
   table.insert(y1, targets)
end

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.MaskZero(nn.FastLSTM(inSize, hiddenSize), 1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)

--criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(), 1))
criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1))

output = model:forward(x)
print(output[1])

err = criterion:forward(output, y1)
print(err)

@mingstupid

Hello, I am trying to use SequencerCriterion with MaskZeroCriterion on variable-length batch input (e.g., sentences with different numbers of words). However, I found that if the predictions fed to the criterion are all zeros, the error is zero even though the targets are not all zeros. If so, the global minimum would be for the model to produce all zeros regardless of the input. I hope I am wrong somewhere. Thank you for your help in advance!

For example:

c = nn.ClassNLLCriterion()
criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(c, 1))
outputs = torch.zeros(2, 3, 4)  -- batchSize = 2, maxSeqLength = 3, featSize = 4
-- the target of the first example is 1, 2; the target of the second is just 2;
-- zeros pad the right of each sequence
targets = torch.Tensor({{1, 2, 0}, {2, 0, 0}})
err = criterion:forward(outputs, targets)

This gives err = 0. I get the same result if I pad zeros to the left of the targets:

targets = torch.Tensor({{0, 1, 2}, {0, 0, 2}})

@JoostvDoorn
Contributor

This may be a bit counter-intuitive, but MaskZeroCriterion masks based on the zeros of the input (in this case, outputs), not the zeros of the target. I have a module that does it the other way around; you can find it here: https://gist.github.com/JoostvDoorn/d5e2787a0a307fcc126acf41c9f749bf
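
To illustrate with the example above (a hedged sketch, not library code): if only the prediction rows corresponding to the padded targets are zeroed, the remaining steps are scored and the error is no longer zero.

outputs = torch.rand(2, 3, 4)
outputs[1][3]:zero()  -- zero the predictions exactly where the targets are 0...
outputs[2][2]:zero()  -- ...so only the padded steps are masked
outputs[2][3]:zero()
targets = torch.Tensor({{1, 2, 0}, {2, 0, 0}})
err = criterion:forward(outputs, targets)
print(err)  -- non-zero: only the all-zero rows are ignored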
