
How to use MaskZero with LSTM and nn.ClassNLLCriterion for variable length sequences #75

Closed
jundengdeng opened this issue Dec 10, 2015 · 23 comments

@jundengdeng

Hi Guys,
I tried to use an LSTM to deal with variable-length sequences, but I failed to do so using the MaskZero module. Could you please help me out? Thanks a lot!

Here is a minimal code example of what I mean:

require 'rnn'
require 'optim'

inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses = 5
numSeq = 30

x, y1 = {}, {}

-- build numSeq sequences of random length, left-padded with zeros
-- so that every input tensor is seqLengthMax x inSize
for i = 1, numSeq do
   local seqLength = torch.random(1, seqLengthMax)
   local temp = torch.zeros(seqLengthMax, inSize)
   local targets
   if seqLength == seqLengthMax then
      targets = (torch.rand(seqLength) * numTargetClasses):ceil()
   else
      -- left-pad the targets with zeros as well
      targets = torch.cat(torch.zeros(seqLengthMax - seqLength), (torch.rand(seqLength) * numTargetClasses):ceil())
   end
   temp[{{seqLengthMax - seqLength + 1, seqLengthMax}}] = torch.randn(seqLength, inSize)
   table.insert(x, temp)
   table.insert(y1, targets)
end

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.MaskZero(nn.FastLSTM(inSize, hiddenSize), 1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)

criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(), 1))

output = model:forward(x)
print(output[1])

err = criterion:forward(output, y1)
print(err)
@jfsantos

I am currently facing the same issue while implementing the char-rnn example using rnn.

@ghost

ghost commented Dec 14, 2015

Are you sure you need MaskZero? The Sequencer handles sequence lengths dynamically.

I adapted Karpathy's char-rnn to consume variable-length word sequences and it works fine without any padding or masking.

(Sanity-checked with a Reber grammar, lengths 5 to 50.)

@jfsantos

@kmnns Using Sequencer works if you train on one sequence at a time, but a mini-batch can contain multiple sequences of different lengths, hence the need for masking.
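
To make the problem concrete, here is a minimal sketch (with hypothetical sizes) of such a mini-batch: two sequences of different lengths, zero-padded on the left into a single seqLen x batchSize x inSize tensor.

require 'rnn'

local inSize = 3
local batch = torch.zeros(4, 2, inSize)      -- seqLen = 4, batchSize = 2
batch[{{3, 4}, 1}] = torch.randn(2, inSize)  -- sequence 1: real length 2, left-padded
batch[{{}, 2}] = torch.randn(4, inSize)      -- sequence 2: real length 4, no padding
-- without masking, the zero rows of sequence 1 would be processed as real input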

@ghost

ghost commented Dec 14, 2015

@jfsantos I see. Then I second that question.

The test script seems to wrap the whole Sequential in a MaskZero, not its individual parts (rnn/test/test.lua, line 2370 at 442276f):

local module = nn.MaskZero(recurrent, 1)

Maybe give that a try.

@nicholas-leonard
Member

@jundeng86 The model seems OK; it masks the zeros correctly AFAIK.

As for the criterion, MaskZero cannot be used to decorate a criterion. We would need a MaskZeroCriterion for that. Working on it.

@nicholas-leonard
Member

@jundeng86 So for the criterion, use MaskZeroCriterion(criterion).
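
Applied to the code from the question, only the criterion line changes (the model stays as it is):

criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1))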

@jundengdeng
Author

Thanks! You helped me out!

@jnhwkim
Contributor

jnhwkim commented Dec 15, 2015

FYI, MaskZero can be applied conveniently by calling recurrent:maskZero(nInputDim).
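
For example, a minimal sketch of the model from the question using that method (assuming, as in the library's README examples, that maskZero returns the decorated module so it can be chained):

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.FastLSTM(inSize, hiddenSize):maskZero(1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)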

@jfsantos

@nicholas-leonard just a heads-up: MaskZeroCriterion currently does not work if the entire batch has to be masked/ignored. I can try to fix that or at least add a test case to help with the fix.

@nicholas-leonard
Member

@jfsantos Yes please! Also, nice meeting you at NIPS!

@rotmanmi

@nicholas-leonard Are you sure nn.MaskZeroCriterion works when the mask covers everything (i.e., an input whose entire sequence is zeros)?
I'm encountering problems with that.

@emergix

emergix commented Dec 24, 2015

Hello,

I have built this augmented reality model with MaskZero and it seems to work. Maybe I am wrong, but I don't need to use MaskZeroCriterion:

function newModelBuild(dictionarySize, nbfeatures, embeddingSize, rhoInput, rhoOutput, lktype, logsoftFlag)
   local model = nn.Sequential()
   local p = nn.ParallelTable()
   p:add(nn.Identity())  -- carries the tensor of features
   local lkt = nn.LookupTable(dictionarySize, embeddingSize)
   local weightmatrix
   if lktype == 0 then
      weightmatrix = torch.Tensor(dictionarySize, embeddingSize)
      for i = 1, dictionarySize do
         for j = 1, embeddingSize do
            weightmatrix[i][j] = torch.uniform(0, 1)
         end
      end
      lkt.weight:copy(weightmatrix)
   else
      lkt.weight:fill(1.0 / embeddingSize)
   end
   p:add(nn.Sequencer(lkt))  -- -> list of tensors (batchSize x embeddingSize)
   model:add(p)
   local SliceList = nn.ConcatTable()  -- purpose: create a list of tensors built by joining tensors
   for i = 1, rhoInput do
      local Slice = nn.Sequential()
      SliceList:add(Slice)
      local cc = nn.ConcatTable()  -- contains the 2 tensors to join
      Slice:add(cc)
      local a = nn.Sequential()
      cc:add(a)
      a:add(nn.SelectTable(2))  -- select the list of tensors
      a:add(nn.SelectTable(i))  -- select tensor(i)
      local b = nn.Sequential()
      cc:add(b)
      b:add(nn.SelectTable(1))  -- select tensorF
      Slice:add(nn.JoinTable(2))  -- create a single tensor = tensorF & tensor(i)
   end
   for i = rhoInput + 1, rhoOutput do
      local Slice = nn.Sequential()
      SliceList:add(Slice)
      local cc = nn.ConcatTable()  -- contains the 2 tensors to join
      Slice:add(cc)
      local a = nn.Sequential()
      cc:add(a)
      a:add(nn.SelectTable(2))  -- select the list of tensors
      a:add(nn.MaskZero(nn.SelectTable(i), 1))  -- select tensor(i), zero-masked
      local b = nn.Sequential()
      cc:add(b)
      b:add(nn.SelectTable(1))  -- select tensorF
      Slice:add(nn.JoinTable(2))  -- create a single tensor = tensorF & tensor(i)
   end
   model:add(SliceList)
   model:add(nn.Sequencer(nn.FastLSTM(embeddingSize + nbfeatures, embeddingSize, rhoOutput)))
   model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
   if logsoftFlag then model:add(nn.Sequencer(nn.LogSoftMax())) end
   return model
end

@jfsantos

@emergix You are only using MaskZero on a small part of the model, not on the whole model. That part of the model probably has no issue with having its output set to zeros. MaskZeroCriterion is designed to deal with masked outputs of the model (and does so by making the criterion ignore those outputs).

@emergix

emergix commented Dec 30, 2015

I see.

The case you want to address is the one where you have inputs of different lengths and you want a consistent evaluation of the criterion, so you need to set the rho of the LSTM to the maximum possible input length.

The criterion then has to compute something that is comparable between inputs of different lengths, something like:

(1 / number of non-zero inputs) * sum of squareDistance(input, target) over the non-zero inputs

and you should make sure that an all-zero input can never be a meaningful input.

Am I right?
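
A minimal sketch of that idea (a hypothetical maskedLoss helper, not the library's actual MaskZeroCriterion code):

-- average the per-step loss over the non-zero (unpadded) steps only
function maskedLoss(inputs, targets, stepCriterion)
   local err, n = 0, 0
   for t = 1, #inputs do
      if inputs[t]:ne(0):sum() > 0 then  -- skip all-zero (padded) steps
         err = err + stepCriterion:forward(inputs[t], targets[t])
         n = n + 1
      end
   end
   return n > 0 and err / n or 0
end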

@shuzi

shuzi commented May 16, 2016

@nicholas-leonard
Will the zero mask handle BPTT accordingly? E.g., if the input is 1 2 3 4 5 0 0 0, BPTT should start from 5.

@jnhwkim
Contributor

jnhwkim commented May 16, 2016

@shuzi maskZero and trimZero handle BPTT correctly. However, the input should be right-aligned, with the zero padding on the left (e.g. 0 0 0 1 2 3 4 5; meaningful forwarding starts at 1, and backwarding ends at 1). When there is a lot of zero padding, trimZero performs more efficiently and produces the same outputs.
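
For reference, a minimal sketch of switching between the two (trimZero takes the same nInputDim argument as maskZero):

lstm = nn.FastLSTM(inSize, hiddenSize)
lstm:trimZero(1)  -- drop-in alternative to lstm:maskZero(1); skips computation on the padding
model = nn.Sequencer(lstm)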

@nicholas-leonard
Member

@shuzi As @jnhwkim mentioned, trimZero will work better than maskZero if you have lots of zeros. In any case, you can put zeros anywhere in a tensor. So you could have 1,2,3,0,4,5,6,0,2,3 and each 0 would reset both the forward and the backward (BPTT) passes between the sub-sequences 1,2,3; 4,5,6; and 2,3.

@shuzi

shuzi commented May 16, 2016

@nicholas-leonard @jnhwkim I see: right-aligned, with zero padding on the left. Thanks a lot!

@jnhwkim
Contributor

jnhwkim commented May 16, 2016

@shuzi you're welcome.


@apsvvfb

apsvvfb commented Aug 1, 2016

I changed the criterion to MaskZeroCriterion as suggested, then tried to run the example code given by @jundeng86. However, I got a warning:

"Warning: you are most likely using MaskZero the wrong way. You should probably use AbstractRecurrent:maskZero() so that it wraps the internal AbstractRecurrent.recurrentModule instead of wrapping the AbstractRecurrent module itself."

So is it OK to use MaskZero in the example?

require 'rnn'
require 'optim'

inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses = 5
numSeq = 30

x, y1 = {}, {}

for i = 1, numSeq do
   local seqLength = torch.random(1, seqLengthMax)
   local temp = torch.zeros(seqLengthMax, inSize)
   local targets
   if seqLength == seqLengthMax then
      targets = (torch.rand(seqLength) * numTargetClasses):ceil()
   else
      targets = torch.cat(torch.zeros(seqLengthMax - seqLength), (torch.rand(seqLength) * numTargetClasses):ceil())
   end
   temp[{{seqLengthMax - seqLength + 1, seqLengthMax}}] = torch.randn(seqLength, inSize)
   table.insert(x, temp)
   table.insert(y1, targets)
end

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.MaskZero(nn.FastLSTM(inSize, hiddenSize), 1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses), 1))
      :add(nn.MaskZero(nn.LogSoftMax(), 1))
)

--criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(), 1))
criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(nn.ClassNLLCriterion(), 1))

output = model:forward(x)
print(output[1])

err = criterion:forward(output, y1)
print(err)

@mingstupid

Hello, I am trying to use SequencerCriterion with MaskZeroCriterion on variable-length batch input (e.g., sentences with different numbers of words). However, I found that if the predictions fed to the criterion are all zeros, the error is zero even though the targets are not all zeros. If so, the global minimum would be for the model to produce all zeros regardless of the input. I hope I am wrong somewhere. Thank you for your help in advance!

For example:

c = nn.ClassNLLCriterion()
criterion = nn.SequencerCriterion(nn.MaskZeroCriterion(c, 1))
outputs = torch.zeros(2, 3, 4)  -- batchSize = 2, maxSeqLength = 3, featSize = 4
-- the target of the first example is 1, 2; the target of the second is just 2;
-- zeros pad the right of each sequence
targets = torch.Tensor({{1, 2, 0}, {2, 0, 0}})
err = criterion:forward(outputs, targets)

This gives err = 0. I get the same result if I pad zeros to the left of the targets:

targets = torch.Tensor({{0, 1, 2}, {0, 0, 2}})

@JoostvDoorn
Contributor

This may be a bit counter-intuitive, but MaskZeroCriterion masks based on the zeros of the input (in this case, outputs), not the zeros of the target. I have a module that does it the other way around; you can find it here: https://gist.github.com/JoostvDoorn/d5e2787a0a307fcc126acf41c9f749bf
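
To illustrate with the example above (a hedged sketch, not library code): if only the prediction rows corresponding to the padded targets are zeroed, the remaining steps are scored and the error is no longer zero.

outputs = torch.rand(2, 3, 4)
outputs[1][3]:zero()  -- zero the predictions exactly where the targets are 0...
outputs[2][2]:zero()  -- ...so only the padded steps are masked
outputs[2][3]:zero()
targets = torch.Tensor({{1, 2, 0}, {2, 0, 0}})
err = criterion:forward(outputs, targets)
print(err)  -- non-zero: only the all-zero rows are ignored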
