Rethink temporal pooling - discrete transitions at higher levels. #32
Conversation
This rewrite comes from taking seriously the need for sequence learning at higher levels. At higher levels we have temporal slowness: cells stay active for longer, i.e. several time steps. If there is a uniform cortical algorithm then we need the usual sequence learning to work under temporal slowness. I think that means discrete transitions.

I define a threshold fraction of stable inputs to a layer for it to be "effectively stable". Only when a layer begins a period of stability does it replace its active columns based on the stable input. At other times, these active columns and cells continue to stay active. The exception is columns which have a definite (high) match to input bits; the relative influence of these over continuing columns is controlled by a parameter (`temporal-pooling-max-exc`).

Active columns learn on proximal dendrites as long as the layer is effectively stable. For learning on distal dendrites, define:

* learning cells are the new winning cells, excluding any continuing.
* learnable cells are the previous winning cells, excluding any continuing.

Obviously, this is all experimental. Actually I haven't even run it yet.
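To make the "effectively stable" rule concrete, here is a rough sketch in Python (the actual implementation is in Clojure; the function names and the threshold value here are invented for illustration, not the real parameters):

```python
# Hypothetical sketch of the rule described above: a layer replaces its
# active columns only when it *begins* a period of stability; otherwise
# the previous active columns carry over.

STABILITY_THRESHOLD = 0.5  # assumed threshold fraction of stable input bits


def effectively_stable(stable_bits, all_bits):
    """A layer is effectively stable when at least a threshold fraction
    of its feed-forward input bits are stable (predicted)."""
    if not all_bits:
        return False
    return len(stable_bits) / len(all_bits) >= STABILITY_THRESHOLD


def next_active_columns(prev_columns, stable_now, stable_before,
                        columns_from_stable_input):
    # Replace active columns only at the onset of a stable period.
    if stable_now and not stable_before:
        return columns_from_stable_input
    # Otherwise the previously active columns continue.
    return prev_columns
```

This leaves out the exception for high-match columns, whose influence the `temporal-pooling-max-exc` parameter would control.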
…igher level layer is confused. also fix a couple of bugs.
With this change we grow distal synapses from source A to target B if their activation lines up in series, like
Maybe we should also grow when they overlap but are clearly ordered, like
and maybe
This could be implemented by defining learning cells to be newly active winners (as with this change), but allowing source learnable cells to be all winners, not just the ones turning off. However that would allow this connection to be learned which is questionable:
…ells are learnable.
OK, I think we need to handle continuous sequences, so I'm going with what I wrote above: allowing distal learning between cells overlapping in time, by making all winner cells learnable (but winners are still only learning when they first become active).

Just looking at the coordinate encoder demo, there is a lot of non-sequential learning going on because, while a column stays active, the winner cell in that column often switches under the influence of distal (predictive) excitation. Since these are "new" winner cells they do distal learning, which ends up producing a lot of noise.

An obvious solution would be to force the winner cell in a column to be fixed until the column turns off. However, I don't want to do that because the initial context of a column might be wrong, and e.g. top-down feedback should be able to resolve the context to the correct cell in a column even while the column stays active. I think instead we could allow the winner cell in a column to switch according to total excitation (as now), but if it is in a continuing active column it should not be learning.
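The proposed rule can be sketched as follows (a simplified Python stand-in for the Clojure code; the data shapes here are invented for illustration):

```python
# Hypothetical sketch: the winner cell in a column may switch with total
# excitation, but winners in *continuing* active columns are excluded
# from the distal learning set.

def learning_cells(winners_by_column, prev_active_columns):
    """winners_by_column: {column_id: winning_cell_id} for this step.
    Only winners in newly active columns do distal learning."""
    return {cell for col, cell in winners_by_column.items()
            if col not in prev_active_columns}
```

So a winner that merely switched within a continuing column never enters the learning set, while every winner (switched or not) can still be learnable.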
Hi Felix, I don't mean to chime in where I'm not invited - but it seems you are doing… Just a thought... Cheers,

With kind regards, David Ray, Cortical.io http://cortical.io/
@cogmission Speaking for myself, I think of…
@marcus I see. I just didn't know who was reading this, and I don't see any…
@cogmission a good point. I will ask for help from the nupic-theory list as you suggest, but I will see if I can consolidate my thoughts a bit first to avoid wasting everyone's time.
…ences. skip distal learning on continuing temporal pooling cells.
There is a problem with this whole approach which is obvious in retrospect. (In fact I now remember that I realised this before, in my first attempt at temporal pooling, but forgot about it.)

Recall that as soon as the first level becomes predictable, the higher (temporal pooling) layer "engages" and fixes its active columns; they then keep growing new dendrites to encompass the following predictable sequence. The problem is, just because a sequence is recognised as predictable does not mean it is resolved into a unique identity, and of course it cannot in general be resolved uniquely until the whole sequence has been seen. For example, seeing the letters "t,h,e" vs "t,h,r,e,e": the sequence is predicted at "h" but not uniquely. If we freeze the pooled representation at that point it will be identical for "the" and "three".

One way to go is Numenta's "Union Pooler" approach - I only have a vague and possibly incorrect understanding: throughout a predicted sequence, more and more cells get added to the temporal pooling representation, so the final representation should have some unique component. The nice part is that the union representation should include bits from all steps of a sequence, so you get semantic overlap with similar sequences. I'm not sure how you get this to be stable enough to model higher level sequences.

Another way might be to use an attention-like mechanism to "engage" the temporal pooling once the predictions have been resolved down to a single path.
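My (possibly incorrect) understanding of the union idea can be sketched like this (Python for illustration; not Numenta's actual Union Pooler implementation):

```python
# Hypothetical sketch: throughout a predicted sequence, newly active
# cells are unioned into the pooled representation, so "the" and
# "three" diverge once their shared prefix "t,h" ends.

def union_pool(pooled, active_cells, sequence_predicted):
    """Accumulate active cells into the pooled set while the sequence
    remains predicted; start afresh on a novel (unpredicted) input."""
    if not sequence_predicted:
        return set(active_cells)
    return pooled | set(active_cells)
```

Under this scheme the frozen-prefix problem goes away because the representation keeps changing until the sequence ends, at the cost of the stability question raised above.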
Correct me if I'm wrong, but shouldn't the behavior be either a prediction… Also the choice of "the" or "three" seems like it would be dependent on the… Just some random thoughts... Cheers,
…ulate during predicted sequences. just a rough sketch, untested and not fully thought out.
So I did a kind of implementation of a union pooler.
A puzzle that comes up when we think about sequence learning at higher levels: how do we maintain a "bursting" column state in a higher level layer? (If a transition was not predicted, the newly activated columns should burst, activating all their cells / contexts.) I'm assuming that the same mechanism should apply at all levels.

Under temporal pooling, cells in a column may stay active for several time steps; for sequence learning this could be either a single predicted cell, or many bursting cells. I make this work by setting a level of persistent temporal pooling excitation on all newly active cells.

Apart from keeping multiple predictions open, the other role of bursting is in defining feed-forward outputs from the layer as being "stable" or not, which is used in temporal pooling at still higher levels. This seems to suggest we should define bursting simply by whether all cells in a column are (continuing to be) active. However, that definition can't apply in the first level if we have one cell per column, and it seems to lose the essence of "bursting" as being defined by a (lack of) predictive depolarising potential. In practice we seem to be left with a composite definition of bursting: by predictive potential on newly-active / first-level steps, and by all-cells-per-column during temporal pooling phases.
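The composite definition arrived at above could be sketched like this (Python for illustration; the predicate name and arguments are invented, not the real Clojure API):

```python
# Hypothetical sketch of the composite bursting definition: predictive
# potential decides on newly active steps, all-cells-active decides
# during continuing temporal pooling phases.

def bursting(column_cells, active_cells, predictive_cells, newly_active):
    if newly_active:
        # A newly activated column bursts when none of its cells were
        # predictively depolarised beforehand.
        return not (column_cells & predictive_cells)
    # During a temporal pooling phase, a column counts as bursting when
    # all of its cells are (continuing to be) active.
    return column_cells <= active_cells
```

Note this still inherits the one-cell-per-column problem: with a single cell, the all-cells clause is trivially true whenever the column is active.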
@mrcslws please review. Um, sorry about the 13 commits... do you think I should squash them?
I'm taking some time to study this change and become opinionated about it. I should have something coherent to say tomorrow (Sunday my time). The 13 commits are fine by me. Feel free to merge without me, I can always comment on commits.
Sure, there's no urgency about it. And thanks.

Felix Andrews / 安福立
@@ -434,8 +442,14 @@

    (if good? (conj good-ids id) good-ids)
    (if (and good? (< exc min-good-exc)) exc min-good-exc)))
This isn't specific to this commit. But the winner dominates all other cells if it dominates... a single cell that's above the threshold? Shouldn't we instead just use filter/remove to get rid of the dominated cells case-by-case?
Alternately, if dominance is all-or-none, it should keep track of the second best excitation, not the lowest excitation above the threshold.
I might be looking at this wrong.
You're right, we should just use filter/remove.
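The filter/remove approach could look something like this (a Python stand-in for Clojure's `filter`/`remove`; the margin constant is invented for illustration, not a real parameter):

```python
# Hypothetical sketch: instead of a single winner dominating everything,
# drop only the cells that are individually dominated by the best
# excitation, case by case.

DOMINANCE_MARGIN = 10.0  # assumed margin below the best excitation


def surviving_cells(exc_by_cell):
    """Keep each cell unless it is dominated, i.e. its excitation falls
    DOMINANCE_MARGIN or more below the best excitation."""
    if not exc_by_cell:
        return {}
    best = max(exc_by_cell.values())
    return {cell: exc for cell, exc in exc_by_cell.items()
            if exc > best - DOMINANCE_MARGIN}
```

This also sidesteps the second-best-vs-lowest-above-threshold bookkeeping, since each cell is compared directly against the maximum.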
    (let [new-ac (if newly-engaged?
                   ac
                   (set/difference ac (:active-cells state)))]
      (into tp-exc
Trying to piece this together...

- If the layer is not engaged, then all newly active cells get their tp-exc set to the max. Those that were already active don't, so they decay in each timestep.
- If the layer is newly engaged, all active cells get their tp-exc set to the max.
- If the layer is engaged (but not newly), then all newly active cells get their tp-exc set to the max. The others decay in each timestep (because of the commented-out `true ;(not engaged?)`).

Am I reading it right? I haven't grokked the change yet; I'm expecting it to all crystallize after I sleep on it.
Yes, that's right. But I think it makes more sense if you consider the proximal excitation together with the temporal pooling excitation.
- if the layer is newly engaged, any existing TP is cleared, columns are selected using all proximal input, and the activated cells get the full TP amount.
- if the layer is continuing engaged, columns are selected using all proximal input together with the existing TP. Because we select a larger number of columns each step, they will be the existing TP ones plus some new ones. The new ones (cells) get the full TP amount.
- if the layer is not engaged, no proximal input comes through, with the exception of any columns having a high match. So TP columns will just continue. The idea was to carry forward context (including transition predictions) rather than forgetting it. But I just realised that it resets the activation level so it won't carry forward unchanged, d'oh.
A thought, inspired by this, "Novel input should appear stable": when a layer is newly not engaged, i.e. something novel appears, we could reset TP and start pooling again! That way we build up a representation of the novel thing, until we start to recognise something and reset (transition) again. But if not engaged then we don't learn proximally.
Whether TP amounts should always decay, or only when the layer is not engaged, I'm not sure. The former would bias the representation towards more recent steps in the pooled sequence. Since we allocate more columns for each step this seems like it should not be necessary, but it would be once we fill up to the maximum density/sparsity.
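The bookkeeping in those three cases can be sketched like this (Python for illustration; the constants and names are invented, not the repo's real parameters):

```python
# Hypothetical sketch of the tp-exc update rules discussed above.

TP_MAX_EXC = 50.0  # full temporal pooling amount (matches the "starts at 50" above)
TP_DECAY = 5.0     # assumed per-step decay


def update_tp_exc(tp_exc, active_cells, prev_active_cells, newly_engaged):
    if newly_engaged:
        # Newly engaged: clear any existing TP; every active cell gets
        # the full amount.
        return {c: TP_MAX_EXC for c in active_cells}
    new_exc = {}
    for c in active_cells:
        if c not in prev_active_cells:
            # Newly active cells get the full TP amount.
            new_exc[c] = TP_MAX_EXC
        else:
            # Continuing cells decay each timestep.
            new_exc[c] = max(0.0, tp_exc.get(c, 0.0) - TP_DECAY)
    return new_exc
```

Whether the decay branch should apply always, or only when the layer is not engaged, is exactly the open question above; here it applies always, which biases the representation towards recent steps.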
> If the layer is not engaged, no proximal input comes through

Ok, yeah, because of the `(select-keys (keys ff-good-paths))`, and because of inhibition from TP excitation. Given the current numbers, the inhibition will have a much larger effect than this threshold, since the threshold for "good" is just 12 (`:ff-seg-new-synapse-count`) while the TP excitation starts at 50.

I'm just narrating in case it uncovers a flaw in my understanding.

So really the `:ff-stimulus-threshold` is just for engaged layers. Non-engaged layers use the higher "good" threshold.
👍
Rethink temporal pooling - discrete transitions at higher levels.
Just merging to carry on with experiments, not because this is finished by any stretch of the imagination.
This rewrite comes from taking seriously the need for sequence learning at higher levels. At higher levels we have temporal slowness: cells stay active for longer, i.e. several time steps. If there is a uniform cortical algorithm then we need the usual sequence learning to work under temporal slowness. I think that means discrete transitions.
I define a threshold fraction of stable inputs to a layer for it to be engaged. Only when a layer is newly engaged does it replace its active columns based on the stable input. At other times, these active columns and cells continue to stay active. The exception is any columns which have a definite (high) match to input bits, and the relative influence of these over continuing columns is controlled by a parameter (`temporal-pooling-max-exc`).

Active columns learn on proximal dendrites as long as the layer is engaged, meaning it has continuing stable input.
For learning on distal dendrites, define:

* learning cells are the winner cells when they first become active (i.e. newly active winners).
* learnable cells are all winner cells, including continuing ones.
And at each time step, the learning cells can grow synapses to the learnable cells. This applies in all layers, not just higher temporal pooling layers. Notably it has a big effect on gradual continuous sequence learning, such as with the coordinate encoder. That will probably cause problems because there might not be enough coincidences of some cells starting while others are stopping. Maybe the learnable cells should remain learnable for a few time steps.
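The "remain learnable for a few time steps" idea could be sketched like this (Python for illustration; the window size and names are invented):

```python
# Hypothetical sketch: cells stay learnable for a few steps after being
# winners, so slowly drifting sequences (coordinate encoder style) still
# produce coincidences between learning and learnable cells.

LEARNABLE_STEPS = 3  # assumed window, made up for illustration


def update_learnable(learnable_ages, winner_cells):
    """learnable_ages: {cell: steps since it was last a winner}.
    Winners reset to age 0; other cells age out after LEARNABLE_STEPS."""
    ages = {c: a + 1 for c, a in learnable_ages.items()
            if a + 1 < LEARNABLE_STEPS}
    for c in winner_cells:
        ages[c] = 0
    return ages
```

The learnable set at any step is then just the keys of this map, while the learning set is still only the newly active winners.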
I'm not sure the old way was much better because it would end up with a lot of cells connecting to the other ones representing the same coordinate, which is not useful sequence information.
Obviously, this is all experimental. I haven't really experimented with it to see how the temporal pooling properties hold up. But what we had didn't work anyway, so might as well replace it.