
SP compute() const, thread safe #560

Draft · wants to merge 10 commits into master
Conversation

@breznak (Member) commented Jul 12, 2019

This PR will benefit from the boosting cleanup in #544, is part of the parallelization effort #214, and will make the parallel MNIST PR #559 usable.

The concept is to make SP thread-safe, suitable for parallel execution. This is done by making SP.compute() const. Ultimately that is not feasible, as SP.adaptSegments() calls Conn.updateSynapsePermanence(), which cannot be const. But it serves as a proof of concept and shows what can be const and what needs to be synchronized.

Note that parallelizing SP.compute() at this granularity is only useful for workloads like MNIST #559. In the end we want to parallelize the inside of SP.compute(), most likely the inhibition step.

Takeaways:

  • focus more on functional programming: functions are const and take what they need as arguments, with little or no object state (e.g. this.numColumns_). Connections already works like this.
  • SP.adaptSegment will not be const -> needs a lock
  • research atomic, thread-safe data structures. E.g. Folly?
  • efficient parallelism is likely incompatible with deterministic builds; single-threaded runs keep determinism, parallel runs trade it for speed.
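The const-compute idea in the takeaways above can be sketched as follows. MiniSP and its members are hypothetical stand-ins, not htm.core's real SpatialPooler API: the point is that a const compute() reading only immutable members and returning its result by value cannot race with concurrent callers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mini Spatial Pooler (not htm.core's real API): compute()
// is const, reads only immutable members, and returns its result by
// value, so concurrent calls never race on object state.
class MiniSP {
public:
  explicit MiniSP(std::vector<int> weights) : weights_(std::move(weights)) {}

  // Thread-safe: mutates nothing reachable through `this`.
  std::vector<int> compute(const std::vector<int> &input) const {
    std::vector<int> overlaps(weights_.size(), 0);
    for (std::size_t c = 0; c < weights_.size(); ++c)
      for (int bit : input)
        overlaps[c] += weights_[c] * bit; // toy overlap score
    return overlaps; // caller owns the result
  }

private:
  const std::vector<int> weights_; // fixed after construction
};
```

The learning path (adaptSegments) does not fit this shape, which is exactly why it needs synchronization.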

@breznak (Member, Author) commented Jul 12, 2019

@marty1885 FYI, this is my take on the C++17 Parallelism TS. Do you have any advice on atomic, thread-safe data structures (I've worked a bit with facebook/folly) to avoid the need for locks?

@marty1885 commented Jul 12, 2019

@breznak The main reason I made all compute() functions in Etaler const is that parallelizing them becomes trivial. The DoD/functional approach also helps in a huge way.

Some guidelines:

  • const functions are awesome to parallelize
  • avoid atomics and mutexes unless needed; they are costly
  • parallel programming is easy as long as you don't care about speed
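A minimal illustration of the first guideline: a pure ("const") function can be applied from several threads with no locks, because each thread writes only its own output slots. squareAll is a made-up example, not Etaler or htm.core code:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// A pure ("const") function: output depends only on the argument.
int square(int x) { return x * x; }

// Apply square() over a vector from two threads with no locks: each
// thread writes a disjoint half of the output, so nothing is shared mutably.
std::vector<int> squareAll(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  auto worker = [&](std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) out[i] = square(in[i]);
  };
  const std::size_t mid = in.size() / 2;
  std::thread t(worker, std::size_t{0}, mid); // first half on another thread
  worker(mid, in.size());                     // second half on this thread
  t.join();
  return out;
}
```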

@breznak (Member, Author) left a comment:

Please review if you are interested.

  • don't take too seriously the SP.compute() const
  • but the other changes I'd like to implement, as this is nicer, safer, faster.


-  void computeActivity(std::vector<SynapseIdx> &numActiveConnectedSynapsesForSegment,
-                       const std::vector<CellIdx> &activePresynapticCells);
+  void computeActivity(std::vector<SynapseIdx> &numActiveConnectedSynapsesForSegment,
+                       const std::vector<CellIdx> &activePresynapticCells) const;
@breznak (Member, Author) commented:

Connections::computeActivity can be made const, with the caveat of a "fake" const (mutable): for timeseries_, the current/previousUpdates_ are mutable and not locked (I may use a mutex, or just leave it for now).
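The "fake const" pattern described here could look like the following sketch if a mutex were added. Conn, previousSum_ and the field names are illustrative, not the real Connections members:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Sketch of "fake const": a logically-const computeActivity() whose
// bookkeeping cache is mutable and guarded by a mutable mutex.
// Conn and previousSum_ are illustrative, not real Connections fields.
class Conn {
public:
  int computeActivity(const std::vector<int> &active) const {
    int sum = 0;
    for (int a : active) sum += a;        // the genuinely read-only part
    std::lock_guard<std::mutex> g(mtx_);  // guard the mutable cache
    previousSum_ = sum;                   // bookkeeping write from a const method
    return sum;
  }
  int previousSum() const {
    std::lock_guard<std::mutex> g(mtx_);
    return previousSum_;
  }

private:
  mutable std::mutex mtx_;      // mutable so const methods can lock it
  mutable int previousSum_ = 0; // the "fake const" state
};
```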

@@ -176,18 +176,6 @@ void SpatialPooler::setBoostStrength(Real boostStrength) {
   boostStrength_ = boostStrength;
 }
 
-UInt SpatialPooler::getIterationNum() const { return iterationNum_; }
-
-void SpatialPooler::setIterationNum(UInt iterationNum) {
@breznak (Member, Author) commented:

SP: removed the iteration bookkeeping; it's not really needed, and Connections will have the iteration count from #537.

@@ -462,16 +441,18 @@ void SpatialPooler::initialize(
 }
 
 
-void SpatialPooler::compute(const SDR &input, const bool learn, SDR &active) {
+vector<SynapseIdx> SpatialPooler::compute(const SDR &input, const bool learn, SDR &active) const {
@breznak (Member, Author) commented:

WIP: compute() is const.
But that will not be feasible (adaptSegment -> conn.updateSynapsePermanence).

@@ -934,7 +901,7 @@ void SpatialPooler::inhibitColumnsLocal_(const vector<Real> &overlaps,
 
 
 bool SpatialPooler::isUpdateRound_() const {
-  return (iterationNum_ % updatePeriod_) == 0;
+  return (rng_.getReal64() < 1.0/updatePeriod_); //approx every updatePeriod steps
@breznak (Member, Author) commented:

This is one place where iterationNum_ was needed; I have replaced it with a probability, which I think is a nicer approach.
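The probabilistic check can be simulated to confirm it fires roughly once per updatePeriod steps on average. This standalone sketch mirrors the diff above, using std::mt19937 in place of htm.core's Random:

```cpp
#include <cassert>
#include <random>

// True with probability 1/updatePeriod, i.e. on average once every
// updatePeriod calls, with no iteration counter to mutate.
bool isUpdateRound(std::mt19937 &gen, unsigned updatePeriod) {
  std::uniform_real_distribution<double> uni(0.0, 1.0);
  return uni(gen) < 1.0 / updatePeriod;
}

// Measure the empirical firing rate over many trials.
int countUpdates(unsigned seed, unsigned updatePeriod, int trials) {
  std::mt19937 gen(seed);
  int hits = 0;
  for (int i = 0; i < trials; ++i)
    if (isUpdateRound(gen, updatePeriod)) ++hits;
  return hits;
}
```

The trade-off versus the modulo counter is that updates no longer land on exact multiples of updatePeriod, only at the right average rate.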

   // calls to RNG
-  std::mt19937 gen; //Standard mersenne_twister_engine 64bit seeded with seed_
+  mutable std::mt19937 gen; //Standard mersenne_twister_engine 64bit seeded with seed_
@breznak (Member, Author) commented:

All Random methods can be made const (thread-safe), with the exception of the mutable member here. The result will work OK, but for an effective parallel algorithm we likely won't have deterministic builds. So when finished, I imagine we run parallel by default, and when things break, we switch to single-threaded, deterministic operation.

@breznak (Member, Author) commented:

@marty1885 thank you for your advice. What do you think of the mutable in Random here? It shouldn't cause too much damage if a collision messes up the state.

But it results in non-deterministic builds. (Currently we support a set seed and deterministic operation for the whole HTM, which is cool for some use cases.)

@marty1885 replied:

I'd say that using a shared RNG across threads is not a good idea. Besides making things nondeterministic, it also leads to false sharing and a tremendous amount of cache invalidation. (Even though I do share an RNG across threads in Etaler.)

One solution is to have a master RNG, and then use the master RNG to seed lightweight RNGs local to each thread. This will make your results deterministic as long as you have the same number of threads.

@marty1885 replied:

Update: you can have a master RNG, then use the master RNG to seed a secondary RNG local to each function, and then use the secondary to seed thread-local RNGs. This way you can have deterministic results no matter the number of threads.
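A sketch of this seeding scheme: a master RNG seeds one lightweight RNG per work chunk, so each chunk's random stream is fixed by the master seed alone, regardless of which thread runs it. drawPerChunk is an illustrative name, not an Etaler/htm.core function:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hierarchical seeding: the master RNG deterministically seeds one
// per-chunk RNG, so results are reproducible no matter how chunks are
// later distributed across threads.
std::vector<int> drawPerChunk(unsigned masterSeed, int numChunks) {
  std::mt19937 master(masterSeed);
  std::vector<int> firstDraws;
  for (int c = 0; c < numChunks; ++c) {
    std::mt19937 local(master());  // deterministic per-chunk seed
    firstDraws.push_back(static_cast<int>(local() % 1000));
  }
  return firstDraws;
}
```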

@breznak (Member, Author) commented:

In PRs that prepare for this functionality, like #552, I'm removing dependency on local variables/state, which is a good thing overall. With the suggested approach above, i.e. "passing everything needed as parameters", I need to balance convenience and usability of the public API for single-threaded use: I don't want users who just want to call SDR result = SP.compute(input) to have to pass tens of parameters (Random, ...).

So I'll need to find a compromise. Maybe function overloads for the user vs. parallel case? compute(simple, params) vs. compute(many, needed, for, parallel).

Also, the whole of SP.compute is quite a bad candidate to parallelize; rather some inner methods: inhibition, overlaps, ...
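The overload compromise could look like the following sketch; compute, its parameters, and the toy column rule are all hypothetical stand-ins for the real SP API:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Full signature: every dependency is an explicit parameter, which
// suits parallel callers that manage their own RNGs.
std::vector<int> compute(const std::vector<int> &input, std::mt19937 &rng,
                         int numColumns) {
  (void)rng; // the real thing would use it for tie-breaking etc.
  std::vector<int> active;
  for (int c = 0; c < numColumns; ++c)
    if (!input.empty() && input[c % input.size()] > 0)
      active.push_back(c); // toy "active column" rule
  return active;
}

// Convenience overload for single-threaded callers: one argument only.
// (The static default RNG is exactly what parallel callers must avoid.)
std::vector<int> compute(const std::vector<int> &input) {
  static std::mt19937 defaultRng(0);
  return compute(input, defaultRng, 4);
}
```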

@breznak (Member, Author) commented Jul 12, 2019

> avoid atomics and mutexes unless needed; they are costly

will keep in mind

> parallel programming is easy as long as you don't care about speed

this makes the code more complex; I'm doing it only for the speed.

@breznak (Member, Author) commented Jul 13, 2019

Follow-up:

  • should I be consistent in the meaning const = 100% thread-safe? I.e. not abuse mutable, and so revert the const changes to Random?

Thread-count management

How do you deal with heuristics on when to parallelize?
There are 2 cases:

  • too small loops: for > 10,000 columns it'd make sense, but for hundreds a single thread would be faster.
  • total number of threads used by HTM: on CPUs I'm much more limited by available parallel computation resources. Say, on a 2-thread system: if the task is MNIST with only an SP, I can parallelize some computation. If I use a Network (SP+TM+...), then I'm better off running each of them in a single thread.

In the real world it'll be something in between, say the newest AMD Zen CPUs with 32 threads. If I have an HTM with SP-TM-SP-TM, I want to give 32/4 threads to each object.
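The size heuristic above can be sketched as a helper that picks a thread count from the problem size, staying single-threaded for small loops so thread start-up cost never dominates. threadsFor and the 10,000 cutoff are illustrative, not measured constants:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Derive a thread count from the column count: small loops stay
// sequential; large loops get up to hardwareThreads threads, with at
// least minColumnsPerThread columns of work per thread.
unsigned threadsFor(std::size_t numColumns, unsigned hardwareThreads) {
  const std::size_t minColumnsPerThread = 10000; // illustrative cutoff
  const std::size_t wanted = numColumns / minColumnsPerThread;
  if (wanted <= 1) return 1; // small loop: sequential is faster
  return static_cast<unsigned>(
      std::min<std::size_t>(wanted, hardwareThreads));
}
```

In a real system hardwareThreads would come from std::thread::hardware_concurrency(), possibly divided among the SP/TM objects as described above.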

  • now I realize the C++17 Parallelism TS / TBB is not able to specify the number of threads a loop should use. Is this already solved in some smart way?

    • I will want to implement some control over threading, so that for debugging we can disable it; probably a header define mode = std::execution::par, redefined to ::seq if parallelization should be disabled
  • parallelizing SP.compute is probably pointless (as it'd only be used in MNIST and the like, not for sequential use cases) (but this PR is an exercise for me to get going with TBB)

    • should I focus on parallelizing stuff inside SP.compute, i.e. the inhibition?
      • or focus solely on making Connections (which is the workhorse) work seamlessly with threads?
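The compile-time toggle mentioned in the list could be sketched as below. Since libstdc++'s std::execution policies need TBB at link time, this version demonstrates the same switch over a plain std::thread split; HTM_PARALLEL and forEachIndex are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical toggle: set to 0 to force single-threaded, deterministic runs.
#define HTM_PARALLEL 1

// Apply f(i) for i in [0, n). In parallel mode the range is split across
// two threads; f must only touch disjoint data per index.
template <typename F>
void forEachIndex(std::size_t n, F f) {
#if HTM_PARALLEL
  const std::size_t mid = n / 2;
  std::thread t([&] { for (std::size_t i = 0; i < mid; ++i) f(i); });
  for (std::size_t i = mid; i < n; ++i) f(i);
  t.join();
#else
  for (std::size_t i = 0; i < n; ++i) f(i);
#endif
}
```

With the standard parallel algorithms, the same idea is a single define that expands to std::execution::par or std::execution::seq.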

@marty1885 commented Jul 13, 2019

Regarding managing the number of threads:
TBB can do so. Please refer to: https://stackoverflow.com/questions/3786408/number-of-threads-used-by-intel-tbb

As for the problem of too many threads: my solution in Etaler (not implemented yet) is to have a NUMA-aware backend. For example, a 32-thread ThreadRipper is made of 4 NUMA nodes, and the Infinity Fabric link is the main bottleneck. So just make your backend run on one of the NUMA nodes, and spawn multiple of them.
Ref: etaler/Etaler#14

@breznak (Member, Author) commented Jul 13, 2019

> Regarding managing the number of threads:
> TBB can do so. Please refer to: https://stackoverflow.com/questions/3786408/number-of-threads-used-by-intel-tbb

Thank you for this link! I assume the Parallelism TS is TBB, so I can use this.
For the hack I described, (re)defining mode to switch par/seq, I guess the scheduler is smart enough to use seq when the thread count is 1.

@marty1885 replied:

No, the Parallelism TS is not TBB. They do look similar, but they are not the same thing.

@breznak (Member, Author) commented Jul 13, 2019

> No, the Parallelism TS is not TBB.

oh, I'm aiming for the C++17 standard. It seems the whole parallel_for is TBB-specific; I should use std::transform.
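Expressing the loop as std::transform keeps the door open for a standard execution policy later. boostOverlaps is a made-up example, not htm.core code, and the commented line shows where std::execution::par would go once PSTL/TBB is linked:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The loop written as a standard algorithm rather than TBB's parallel_for.
std::vector<int> boostOverlaps(const std::vector<int> &overlaps, int boost) {
  std::vector<int> out(overlaps.size());
  // With C++17 parallel algorithms linked in, this would become:
  //   std::transform(std::execution::par, overlaps.begin(), overlaps.end(), ...)
  std::transform(overlaps.begin(), overlaps.end(), out.begin(),
                 [boost](int o) { return o * boost; });
  return out;
}
```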

@breznak (Member, Author) commented Jul 13, 2019

OT: It looks like 3+ years out, but I like where C++ is heading with concurrency, parallelism and heterogeneous computing:
https://www.codeplay.com/portal/10-06-17-whats-in-cpp-20-and-cpp17-final-score-card

It looks like eventually we'll be able to run on GPUs and even FPGAs.

@marty1885 commented Jul 13, 2019

My experience with FPGAs tells me not to expect that to work well. FPGAs are difficult to optimize and take forever to compile (> 6 hr compilation time for a mid-range FPGA, plus a $10K compiler license fee).

The GPU part sounds interesting!
