Smarter SP parameters #536
base: master
Conversation
Please help me discuss and brainstorm the proposed changes, the relations between variables, candidates for removal, ...
@@ -99,7 +107,7 @@ class SpatialPooler : public Serializable
     columns use 2000, or [2000]. For a three dimensional
     topology of 32x64x16 use [32, 64, 16].

-    @param potentialRadius This parameter determines the extent of the
+    @param potentialRadius This parameter determines the extent of the //TODO change this to potentialRadiusPct 0.0..1.0
Replace with a similar-meaning but relative [0.0, 1.0] percentage of the input dimensions.
The current "receptive field radius is 16 [input bits]" is not transferable, while "receptive field is 10% [of the input field]" will work well with any input size.
Overall, move everywhere from absolute units to relative percentages.
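A minimal sketch of the relative-percentage idea (names like `potentialRadiusPct` and `absoluteRadius` are this comment's proposal, not the existing htm.core API): a 10% radius reproduces "16 input bits" on a 160-bit input and transfers unchanged to any input size.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: convert a relative potentialRadiusPct in [0.0, 1.0]
// into an absolute radius per input dimension.
std::vector<int> absoluteRadius(const std::vector<int> &inputDims, double pct) {
  std::vector<int> radii;
  for (int dim : inputDims) {
    // round to the nearest input bit; the percentage scales with any topology
    radii.push_back(static_cast<int>(std::lround(pct * dim)));
  }
  return radii;
}
```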
-    @param numActiveColumnsPerInhArea An alternate way to control the sparsity of
+    @param numActiveColumnsPerInhArea An alternate way to control the sparsity of //TODO remove this method of operation?!
I propose to completely remove this param and switch to using localAreaDensity only.
All optimized models (mnist, hotgym) use localAreaDensity.

> When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will decrease. This is in contrast to the [...]

I especially dislike this part; the density of the SP should remain constant.
This would also rid us of a mutex, making param optimization easier.
Are there any use cases where this mode of operation would be favorable?
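A toy 1-D illustration (not htm.core code) of the objection above: in the numActiveColumnsPerInhArea mode the resulting column density depends on the inhibitionRadius, so density falls as the radius grows, whereas localAreaDensity keeps it constant by definition.

```cpp
#include <cassert>

// Density implied by a fixed number of active columns per (1-D) inhibition
// neighbourhood of size 2*radius + 1. As inhibitionRadius grows during
// learning, this density shrinks.
double densityFromNumActive(int numActivePerInhArea, int inhibitionRadius) {
  int areaSize = 2 * inhibitionRadius + 1;
  return static_cast<double>(numActivePerInhArea) / areaSize;
}
```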
     active synapse is incremented in each round.

-    @param synPermConnected The default connected threshold. Any synapse
+    @param synPermConnected The default connected threshold. Any synapse //TODO remove, hard-coded in Connections, raise to 0.5 from 0.2?
Definitely make this hard-coded.
I propose changing it to 0.5 (or the middle of minPermanence and maxPermanence). Is there a reason why this is everywhere set unevenly, closer to the minimum? (0.2 and 0.1 are common defaults; it performs well with 0.5 on MNIST.)
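The proposal above, as a one-liner (hypothetical helper, not the Connections API): hard-code the connected threshold to the midpoint of the permanence range instead of the conventional 0.1-0.2 defaults.

```cpp
#include <cassert>

// Midpoint of the permanence range, as the proposed hard-coded
// connected threshold.
double connectedThreshold(double minPermanence, double maxPermanence) {
  return (minPermanence + maxPermanence) / 2.0;
}
```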
-    @param boostStrength A number greater than or equal to 0, used to
+    @param boostStrength A number greater than or equal to 0, used to //TODO no biological background(?), remove
- verify the biological background for boosting, and remove it altogether if there is none (boosting does help somewhat on MNIST; see if this can be mitigated with a new param config?)
- if not removed, make it fixed (2.0), or automated on robustness (boost = 2.0 * <inverse ratio of robustness>)
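If boosting is kept, the "automated on robustness" variant could look like this sketch. Reading "inverse ratio of robustness" as (1 - robustness) is my assumption; the exact formula is open for discussion.

```cpp
#include <cassert>

// Hypothetical: derive boostStrength from a robustness parameter in
// [0.0, 1.0] instead of hand-tuning it. Fully robust input (1.0) would
// disable boosting; no robustness (0.0) gives the fixed default 2.0.
double boostStrength(double robustness) {
  return 2.0 * (1.0 - robustness); // assumption: "inverse ratio" = (1 - r)
}
```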
     likely to oscillate. //TODO do not allow too small
     //TODO make this to dutyCyclePeriodPct 0..1.0, which uses
     //TODO new `samplesPerEpoch`, if known. For MNIST (image dataset) this would be #image samples,
     //for stream with a weekly period this would be #samples per week.
- check for too-small values
- make it a relative % of a new "epochSize" (samplesPerEpoch, period)
- epochSize is an estimate of the periodicity of the data, e.g.:
  - weekly recurring timeseries: number of samples per week
  - mnist dataset: number of samples in the training set
  - unknown (timeseries): 0 = infinity
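The bullets above could be sketched as follows (`dutyCyclePeriodPct`, `epochSize`, and the floor value are this proposal's hypothetical names, with an arbitrary minimum of 100):

```cpp
#include <cassert>

// Hypothetical: absolute duty-cycle period as a fraction of the estimated
// epochSize, with a floor so the period cannot become uselessly small.
int dutyCyclePeriod(double dutyCyclePeriodPct, int epochSize,
                    int minPeriod = 100) {
  int period = static_cast<int>(dutyCyclePeriodPct * epochSize);
  return period < minPeriod ? minPeriod : period;
}
```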
@@ -202,6 +214,7 @@ class SpatialPooler : public Serializable
     @param wrapAround boolean value that determines whether or not inputs
         at the beginning and end of an input dimension are considered
         neighbors for the purpose of mapping inputs to columns.
+        //TODO does it hurt to set this to always true? We could rm NonWrappingNeighbourhood
> whether or not inputs at the beginning and end of an input dimension are considered neighbors for the purpose of mapping inputs to columns

Biologically, if we assume a hierarchy, the Region we model with an SP is a portion ("rectangle") of a 2D sheet. Its input field is another 2D sheet (or a retina, ...), so inputs on one side are not close to those on the other. So we should leave this OFF?
-    @param stimulusThreshold This is a number specifying the minimum
+    @param stimulusThreshold This is a number specifying the minimum //TODO replace with `robustness` 0..1.0, which will affect this & synPermInc/Dec
stimulusThreshold well represents the "robustness" (to noise) of a segment.
- bump the default so it is not too small (to 2, 3, 4, ...?)
- it must not be too high, or no segment will be able to satisfy it and no learning will occur -> automatically check that the number of potential synapses on a segment is x times (2 times?) bigger than the threshold
- in the "smart" variant, replace it with `robustness` [0.0..1.0]
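The proposed auto-check could be as simple as this sketch (names hypothetical, factor of 2 taken from the comment above): a stimulusThreshold is only usable if segments have comfortably more potential synapses than the threshold.

```cpp
#include <cassert>

// Hypothetical sanity check: require at least 2x more potential synapses
// per segment than the stimulusThreshold, so some segment can always
// reach the threshold and learning can occur.
bool stimulusThresholdOk(int stimulusThreshold,
                         int potentialSynapsesPerSegment) {
  return potentialSynapsesPerSegment >= 2 * stimulusThreshold;
}
```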
-    @param potentialPct The percent of the inputs, within a column's
+    @param potentialPct The percent of the inputs, within a column's //TODO make this "automated" depending on #potentialRadius & numColumns.
         potential radius, that a column can be connected to. If set to
         1, the column will be connected to every input within its
- rename to columnInputOverlapPct
- or remove it and make it a function Fn(#columns, input area, potential radius, local area pct, prefer-local-vs-global), where:
  - #columns up -> Fn down
  - input area up -> Fn down
  - potential radius up -> Fn up
  - local area pct up -> Fn up
  - prefer local up -> Fn up
- the Fn represents "prefer local, details" (over global, holistic)
- new smart param "prefer local" 0..1.0
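One possible shape for such an Fn, purely illustrative: it only demonstrates the monotonic directions listed above (the actual formula, scaling, and names would need to be worked out in this PR).

```cpp
#include <cassert>

// Hypothetical Fn for an automated potentialPct: increases with
// potentialRadiusPct, localAreaPct and preferLocal; decreases with
// the number of columns and the input area. Clamped to [0, 1].
double potentialPctFn(double numColumns, double inputArea,
                      double potentialRadiusPct, double localAreaPct,
                      double preferLocal) {
  double v = (potentialRadiusPct * localAreaPct * preferLocal) /
             (numColumns * inputArea);
  return v > 1.0 ? 1.0 : v;
}
```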
This works well; see #533 for a successful demonstration.
     inactive synapse is decremented in each learning step.

-    @param synPermActiveInc The amount by which the permanence of an
+    @param synPermActiveInc The amount by which the permanence of an //TODO ditto
TODO: how are ActiveInc and InactiveDec related? Something like "prefer forgetting vs. learning new", which is about the relative ratio of the two.
     number of synapses that must be active in order for a column to
     turn ON. The purpose of this is to prevent noisy input from
     activating columns.

-    @param synPermInactiveDec The amount by which the permanence of an
+    @param synPermInactiveDec The amount by which the permanence of an //TODO make fixed and only depend on robustness?
Make it fixed? And depend only on the robustness modifier (robustness up -> both changes down).
synPermActiveInc and synPermInactiveDec can be reformulated as learningRate and coincidenceThreshold, where:

coincidenceThreshold = inc / dec
learningRate = 1 / inc

which is the maximum number of cycles it takes for a synapse's permanence to saturate.
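The reformulation above is invertible, so the SP could expose either pair. A sketch recovering (inc, dec) from the proposed pair (struct and function names are hypothetical):

```cpp
#include <cassert>

struct PermanenceDeltas {
  double inc; // synPermActiveInc
  double dec; // synPermInactiveDec
};

// learningRate = 1/inc (cycles to saturate), coincidenceThreshold = inc/dec.
PermanenceDeltas fromReformulation(double learningRate,
                                   double coincidenceThreshold) {
  double inc = 1.0 / learningRate;
  double dec = inc / coincidenceThreshold;
  return {inc, dec};
}
```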
That might be better; learningRate for sure.
What is the meaning of coincidenceThreshold? I take it to mean whether it is easier to learn new patterns (and "fill" memory faster), or to forget those that are not repeated (so stable vs. "one-shot" patterns?), with balance when the two are equal (is this a golden middle?).
So for long timeseries I'd choose more forgetting, and for short, relatively rare new events, more learning?
The SP does not really unlearn (it could, but the capacity is just huge)?
I think a better approach for this PR would be to make a parameter structure.

That's a good idea! So a struct.
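A sketch of what such a parameter structure could look like (all member names and defaults are hypothetical, drawn from the proposals in this thread, not from htm.core): grouping the constructor arguments into one struct with defaults lets new "smart" parameters be added without breaking every call site.

```cpp
#include <cassert>

// Hypothetical SP parameter struct with the "smart", mostly relative
// parameters proposed in this PR.
struct SPParameters {
  double potentialRadiusPct = 0.1;  // relative to input dims, per this PR
  double localAreaDensity   = 0.02; // the only sparsity control kept
  double robustness         = 0.5;  // would drive stimulusThreshold, inc/dec
  double dutyCyclePeriodPct = 0.1;  // relative to an estimated epochSize
  bool   wrapAround         = true;
};
```

A caller would then write `SpatialPooler sp(dims, SPParameters{});` and override only the fields it cares about.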
In this PR my aim is to improve the constructor parameters of the SpatialPooler.
Goals
Make the SP (params) more:
Implementation
In this PR I'd like to discuss the feasibility and usefulness of the proposed changes, and then implement them one by one, with tests, in separate PRs.