
A new Region for plots in C++ (for C++ and C# apps) #748

Closed
dkeeney opened this issue Nov 8, 2019 · 39 comments · Fixed by #860

@dkeeney

dkeeney commented Nov 8, 2019

The objective is to have a Region that is accessible by C++ (and eventually C#) users that can perform plots, much like Matplotlib which we have for Python.

@dkeeney
Author

dkeeney commented Nov 8, 2019

My first impression was to search for a plot library package written entirely in standard C++11 that can be linked into the htm.core library. However, the downside (in addition to license issues) is that it bloats the library.

We are not looking for a super fast package, just something that can provide the capability enjoyed by Python users with Matplotlib. So, why not just call Matplotlib from C++?

  • It uses commands that we are familiar with.
  • We already know it's stable.
  • We already have instructions on how to download and install matplotlib.
  • It can be set up as an optional package. If it is not there, it will just not plot.
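The "optional package" idea can be sketched as a small plotting interface with a do-nothing fallback, chosen once at startup, so an app keeps running when matplotlib is not installed. This is a minimal sketch under assumptions: `Plotter`, `NullPlotter`, `RecordingPlotter` and `makePlotter` are hypothetical names (not htm.core API), and whether a backend is available is passed in rather than detected.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical plotting interface (not htm.core API).
class Plotter {
public:
  virtual ~Plotter() = default;
  // Append one (x, y) point to the named series.
  virtual void plot(const std::string& series, double x, double y) = 0;
  // True if points actually end up in a plot.
  virtual bool active() const = 0;
};

// Fallback when no backend (e.g. matplotlib) is available:
// every call is silently ignored, so the app still runs.
class NullPlotter : public Plotter {
public:
  void plot(const std::string&, double, double) override {}
  bool active() const override { return false; }
};

// Stand-in for a real backend: it only records the points it receives.
// A matplotlib-backed version would forward them to Python instead.
class RecordingPlotter : public Plotter {
public:
  void plot(const std::string& series, double x, double y) override {
    points_.push_back(Point{series, x, y});
  }
  bool active() const override { return true; }
  std::size_t size() const { return points_.size(); }

private:
  struct Point {
    std::string series;
    double x;
    double y;
  };
  std::vector<Point> points_;
};

// Choose the backend once at startup; callers never need to check again.
std::unique_ptr<Plotter> makePlotter(bool backendAvailable) {
  if (backendAvailable) {
    return std::make_unique<RecordingPlotter>();
  }
  return std::make_unique<NullPlotter>();
}
```

With this split, code that produces data only ever talks to `Plotter`, and a missing backend degrades to "no plot" instead of a build or runtime failure.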

For inspiration I looked at this package: https://github.com/lava/matplotlib-cpp
It is a simple wrapper around matplotlib to implement the calls from C++.
This could do the job but I would rather that it were implemented using the pybind11 library.
After a quick look through, this does not look too difficult to re-implement with pybind11.

Before I start in on this... does anyone have a better idea?

@breznak
Member

breznak commented Nov 8, 2019

Python users with Matplotlib. So, why not just call Matplotlib from C++?

I agree on the matplotlib based package.

Do we even need this in c++? PyRegion would be much simpler, and yes it would require python to be installed on the machine (but for machines that will do graphical outputs, I think it's an OK requirement for an optional functionality)

@dkeeney
Author

dkeeney commented Nov 8, 2019

Do we even need this in c++?

Good point. That would be much easier...

I got to thinking about how we would link up a .py Region for an app whose main() is in C++. In order to do embedded python we need the python library to be linked in. The C++ app would also need PyBindRegion and RegisteredRegionImplPy classes. And to be general it would also need all of the python bindings so that the .py classes it calls can call back into htm.core; so basically it would need the 5 extension libraries.

The extension libraries are basically shared libraries. I wonder if a C++ app could link with those directly. I guess that is something to investigate.

@breznak
Member

breznak commented Nov 8, 2019

I got to thinking about how we would link up a .py Region for an app whose main() is in C++.

I think what you're trying to figure out is whether a C++-only app could use our code written in Python (with Python statically linked, etc.).
What I meant was simpler: just offer the functionality if the system provides Python and has htm.core installed with the Python extensions on.

@Thanh-Binh

@dkeeney What kind of plot/visualization should this region have? @marty1995 and I usually use the SFML library for some important visualization tasks related to HTM (plotting curves, cell/layer visualization). I think it is good for everything we need...

@breznak
Member

breznak commented Nov 8, 2019

Another way would be to get in touch with @Zbysekz and HTMpandaVis, if it would make sense to just output the data to a file with some readable format for the visualizer, and let the graphics be on them.

But I think the 2 projects are about something else: HTMpandaVis plots the connections etc., while we want simple graphs here and maybe a visualization of an SDR.

@dkeeney
Author

dkeeney commented Nov 8, 2019

for an app whose main() is in C++.

I was trying to keep it all within one C++ process...as a plugin for NetworkAPI. Spawning out to the shell to call python would work I guess. Is that what you are suggesting?

@Zbysekz

Zbysekz commented Jun 3, 2020

Another way would be to get in touch with @Zbysekz and HTMpandaVis, if it would make sense to just output the data to a file with some readable format for the visualizer, and let the graphics be on them.

Currently, I am working on a revamp of the HTMpanda interface. Right now it works through TCP sockets, but we got stuck with some execution-order problems, and execution speed is also not so good for larger projects. So I decided to change this.
Basically, any script will feed data into an SQLite3 database with a defined (very simple) table structure.
I call this "baking".
Then HTMpandaVis can read this afterwards and do whatever it wants with it, taking advantage of having the whole history at hand.

But I think the 2 projects are about something else: HTMpandaVis plots the connections etc., while we want simple graphs here and maybe a visualization of an SDR.

Not really. I would also like to incorporate a "universal data plot window": a user-configurable window where you choose which variables from the database you want to plot, up to, let's say, 10 or so graphs in one window, so the state of the simulation can be seen. I am imagining just using matplotlib.
There could be a startup window where you can choose whether you want to visualize the network, just plot these values, or both...

I could create a C++ "data feeder" which would simply contain a method like

InsertVariableDataStream( name, datastream);

What do you think? Is this what is wanted?
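A minimal in-memory sketch of that feeder method, assuming `datastream` means one value per call for the current iteration. `DataFeeder`, `insertVariableDataStream` (a camelCase take on the proposed `InsertVariableDataStream`) and `nextIteration` are hypothetical; the proposal writes rows into the SQLite3 database, whereas this version only buffers them to show the shape of the data: one (iteration, value) series per variable name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical feeder: buffers one (iteration, value) series per variable name.
class DataFeeder {
public:
  using Row = std::pair<std::int64_t, double>;  // (iteration, value)

  // Record `value` for variable `name` at the current iteration.
  void insertVariableDataStream(const std::string& name, double value) {
    series_[name].emplace_back(iteration_, value);
  }

  // Advance once all variables for this step have been recorded.
  void nextIteration() { ++iteration_; }

  // All rows recorded so far for `name` (empty if the name is unknown).
  const std::vector<Row>& rows(const std::string& name) const {
    static const std::vector<Row> empty;
    auto it = series_.find(name);
    return it == series_.end() ? empty : it->second;
  }

private:
  std::int64_t iteration_ = 0;
  std::map<std::string, std::vector<Row>> series_;
};
```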

@breznak
Member

breznak commented Jun 3, 2020

so there are 2 stages?

  1. record (baking)
  2. replay offline? (for that you don't need an HTM, so one could just share the recorded "image", and others can play and analyze it? 👍 👍 )

We have serialization for Connections, can you just use that? Or it stores too much data and you need to be storage-savvy?

  • bonus would be that you don't need to serialize the internals manually, just call connections.save() (and each TM,SP,.. has Connections).
  • do you need to store some other members that are specific to the SP,TM - or Conn is enough?

Not really. I would also like to incorporate a "universal data plot window": a user-configurable window where you choose which variables from the database you want to plot.

so this will include the raw data? (sequence of floats, or an image, ...)?
This is the approach I'd like to take: if possible, not reinventing the wheel in htm.core and leaving the visualization to a dedicated project.

I could create a C++ "data feeder" which would simply contain a method like
InsertVariableDataStream( name, datastream);

sounds exactly like how I imagined things for this 👏

@breznak
Member

breznak commented Jun 3, 2020

Repost the API draft:
htm-community/NAB#36 (comment)

@Zbysekz

Zbysekz commented Jun 3, 2020

so there are 2 stages?

  1. record (baking)
  2. replay offline? (for that you don't need an HTM, so one could just share the recorded "image", and others can play and analyze it? 👍 👍 )

Yes, exactly. It is completely separate from the htm.core code. The SQLite3 database is a single file with the *.db extension.

We have serialization for Connections, can you just use that? Or it stores too much data and you need to be storage-savvy?

  • bonus would be that you don't need to serialize the internals manually, just call connections.save() (and each TM,SP,.. has Connections).
  • do you need to store some other members that are specific to the SP,TM - or Conn is enough?

If I use connections.save(), it will save all permanences for every cell, right? If I calculate correctly, for Hotgym with these parameters:
'columnCount': 1638, 'cellsPerColumn': 13,'maxSegmentsPerCell': 128, 'maxSynapsesPerSegment': 64
it is about 172M permanence values -> 5.5 GiB for the float32 datatype, assuming every cell has the maxSegmentsPerCell count. Am I calculating that correctly?
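The estimate can be reproduced as a worst case in which every cell holds `maxSegmentsPerCell` segments with `maxSynapsesPerSegment` synapses each. By this arithmetic the count is about 174 million permanences, or roughly 0.65 GiB per snapshot at 4 bytes per float32 value; the 5.5 figure matches the size in bits (about 5.6 × 10^9), so bits and bytes appear to have been mixed up. Even 0.65 GiB per snapshot is far too much to store for thousands of iterations. `densePermanenceCount` and the constants are names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case (dense) number of permanence values: every cell is assumed to
// have the maximum number of segments, each with the maximum number of synapses.
constexpr std::uint64_t densePermanenceCount(std::uint64_t columnCount,
                                             std::uint64_t cellsPerColumn,
                                             std::uint64_t maxSegmentsPerCell,
                                             std::uint64_t maxSynapsesPerSegment) {
  return columnCount * cellsPerColumn * maxSegmentsPerCell * maxSynapsesPerSegment;
}

constexpr std::uint64_t kBytesPerFloat32 = 4;

// Hotgym parameters quoted above.
constexpr std::uint64_t kPermanences = densePermanenceCount(1638, 13, 128, 64);
constexpr std::uint64_t kBytes = kPermanences * kBytesPerFloat32;
constexpr double kGiB = static_cast<double>(kBytes) / (1024.0 * 1024.0 * 1024.0);

static_assert(kPermanences == 174440448ULL, "about 174 million values");
static_assert(kBytes == 697761792ULL, "about 665 MiB (~0.65 GiB) per snapshot");
```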

If yes, then that's too much, as I need to store thousands of iterations...
In its current state, the database contains everything except distal connections, and it is a few GB for 4k iterations, which is fine. The problem is with distal synapses only.
A solution could maybe be storing data just for interesting columns? Like columns where at least one cell is predictive or active?
Do you have any suggestions please?

so this will include the raw data? (sequence of floats, or an image, ...)?

Yes, it will create the table with the given name if it does not exist and put the data there. Two columns: the first is the iteration number, the second is the value. An image could also be stored, because SQLite supports the "blob" datatype (I am using that for storing whole numpy arrays).
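Under the layout described here (one table per variable, created only if missing, with an iteration column and a value column), the SQL a feeder would issue can be sketched as below. The column names `iteration` and `value`, the `PRIMARY KEY` choice (one value per iteration) and the helper names are assumptions for illustration. The insert uses `?` placeholders so that values are bound per row with `sqlite3_bind_int64`/`sqlite3_bind_double` after `sqlite3_prepare_v2`, rather than being formatted into the SQL text.

```cpp
#include <cassert>
#include <string>

// Quote an SQL identifier: wrap it in double quotes and double any embedded quote.
std::string quoteIdentifier(const std::string& name) {
  std::string out = "\"";
  for (char c : name) {
    if (c == '"') out += '"';
    out += c;
  }
  out += '"';
  return out;
}

// One table per variable: (iteration, value), created only if it does not exist yet.
std::string createTableSql(const std::string& variable) {
  return "CREATE TABLE IF NOT EXISTS " + quoteIdentifier(variable) +
         " (iteration INTEGER PRIMARY KEY, value REAL);";
}

// Parameterised insert; the two '?' placeholders are bound per row.
std::string insertRowSql(const std::string& variable) {
  return "INSERT INTO " + quoteIdentifier(variable) +
         " (iteration, value) VALUES (?, ?);";
}
```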

(P.S. fun fact: sqlite3 has a limit of 140 TB for one database file 👍)

@breznak
Member

breznak commented Jun 3, 2020

If I use connections.save(), it will save all permanences for every cell, right? If I calculate correctly, for Hotgym with these parameters:
'columnCount': 1638, 'cellsPerColumn': 13,[...]
it is about 172M permanence values -> 5.5 GiB for the float32 datatype, assuming

Yes, but it's a bit smarter: it stores only the existing (sparse) synapses, in a binary format (possibly it could even be compressed?).

So, a quick test: TM(2048 cols, 32 cells per col), ran a couple of iterations of random SDRs. tm.saveToFile("/tmp/tm.dump") results in ~1 MB, which would be much more reasonable. Could you try this on your workload?

The problem is with distal synapses only.
A solution could maybe be storing data just for interesting columns? Like columns where at least one cell is predictive or active?

I think that is a good idea 👍 We're storing the "raw inputs" anyway, right? So, given that encoders are deterministic, we could recompute the distal data if needed.

sqlite3 has a limit of 140 TB for one database file 👍

we're good for a while :)
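Keeping distal data only for "interesting" columns needs the set of columns that contain at least one active or predictive cell. A minimal sketch, assuming cells are numbered column by column (`cell = column * cellsPerColumn + indexInColumn`) and that active and predictive cells are available as flat index lists; `interestingColumns` and `CellIdx` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using CellIdx = std::uint32_t;

// Columns that contain at least one active or predictive cell.
// Assumes cell = column * cellsPerColumn + indexInColumn.
std::set<CellIdx> interestingColumns(const std::vector<CellIdx>& activeCells,
                                     const std::vector<CellIdx>& predictiveCells,
                                     CellIdx cellsPerColumn) {
  std::set<CellIdx> columns;  // sorted, without duplicates
  for (CellIdx cell : activeCells) columns.insert(cell / cellsPerColumn);
  for (CellIdx cell : predictiveCells) columns.insert(cell / cellsPerColumn);
  return columns;
}
```

Only the distal segments of cells in these columns would then be written for that iteration.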

@Zbysekz

Zbysekz commented Jun 16, 2020

I've ended up with these plots as a result (there are 13 plots on one page in this case):
[Screenshot: 13 interactive line plots on one page]

This uses the Python Plotly Dash library: very nice interactive graphs, in the web browser.
My script, "dashVis.py", reads from the SQLite database and shows the table data in the browser (localhost, port 8050).

All plots have the iteration on the x axis. There is a small cfg text file where you can specify what you want to plot from the database and set labels.

Having just line plots with the iteration on the x axis is enough for pandaVis for now.
This will be used by pandaVis as a sub-program run from the main app.

Should I write a C++ feeder to be able to fill data into the database? Or is it of no use for this PR?

@breznak
Member

breznak commented Jun 16, 2020

Python Plotly Dash library: very nice interactive graphs, in the web browser

wow! The image looks really great! I'm looking forward to the interactive plots.

Should I write a C++ feeder to be able to fill data into the database? Or is it of no use for this PR?

I'd like to hear from @dkeeney . But for me, if it's not too complicated, it'll surely be nice way to achieve the visualizations.
How would the 'feeder' work? As a NetworkAPI region? A wrapper (or better, an interface) over the existing TM, SP classes? I wouldn't like the TM, SP code to become complicated for viz purposes, but we can surely add some data-export function to provide that.

@dkeeney
Author

dkeeney commented Jun 16, 2020

@breznak 's response is the same as mine... wow!

We would love to have such a feeder for C++. Such a module was on my 'back burner' to do and this is wonderful that you have taken on this project.

What I envisioned as a feeder from NetworkAPI was a new built-in Region Implementation which accepts a link and displays the data interactively. Much like FileOutputRegion except rather than sending the data to a file it sends it to the plot. To add a plot into a project, a user would connect a link from any output to this new PlotRegion and like magic there is a plot available in a browser.

This would work with the REST api as well.

@Zbysekz

Zbysekz commented Jun 17, 2020

To be honest, I don't know the Network API, nor how to use it. I can't find API docs for FileOutputRegion, but it shouldn't be so hard to use.

@dkeeney do we have a minimal example where I can see how FileOutputRegion is used, and run it?

I found this, but it uses just a file output stream.
napi_hello example

@breznak
Member

breznak commented Jun 17, 2020

I can't find API docs for FileOutputRegion

There's at least
https://github.com/htm-community/htm.core/blob/master/src/htm/regions/FileInputRegion.hpp

A minimal example should be in the region's test: src/test/unit/regions/VectorFileTest.cpp

I found this, but it uses just a file output stream. napi_hello example

@dkeeney could you please look at these 3 things?

  • rename the VectorFileTest to match the FileInputOutput region?
  • it would be really nice if the napi_hello used a region to read in data, as well as to write out results. Good point, Zbysekz 👍
  • I think we had related issue on merging the Input,Output regions into a FileIORegion..?

@dkeeney
Author

dkeeney commented Jun 17, 2020

rename the VectorFileTest to match the FileInputOutput region?

Yes, we can do that, although it does not sound like a high-priority issue.

it would be really nice if the napi_hello used a region to read in data, as well as to write out results. Good point, Zbysekz 👍

I was trying to strip down the example to simplify it as much as possible. But sure, that would be reasonable.

I think we had related issue on merging the Input,Output regions into a FileIORegion..?

I have that on my back burner as a low priority project that I never got to. Not sure it is really important.

@dkeeney
Author

dkeeney commented Jun 17, 2020

I would look at the FileOutputRegion object as an example. Make a copy of it and edit it from there.

Any configuration you need would be passed in params in the constructor.
The data you will want to plot would come in via the compute( ) function which is called on each cycle. Within that function you can get the data passed in.

dataIn_ = region_->getInput("dataIn")->getData();

This will fetch the incoming data for that cycle. This is an Array object, which is a container for array data. On that container, dataIn_.getCount( ) returns the number of elements in the array, and dataIn_.getBuffer( ) gets a C-style pointer to the beginning of the array.

The data type of each element is set in the spec. The value NTA_BasicType_Real32 means it is a float. The link object will have already converted the data to this type if necessary. So you can cast the pointer to Real32 and use it for the plot.
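The steps above can be illustrated with a self-contained stand-in. `FakeArray` only mimics the two calls described here (`getCount()` and `getBuffer()`); it is not htm.core's `Array`, `Real32` is assumed to be a 32-bit float, and `appendToPlot` is a hypothetical helper standing in for the body of a PlotRegion's `compute()`. In a real region the array would come from `region_->getInput("dataIn")->getData()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Real32 = float;  // assumption: matches NTA_BasicType_Real32

// Stand-in for htm.core's Array: an element count plus an untyped buffer.
struct FakeArray {
  std::vector<Real32> storage;
  std::size_t getCount() const { return storage.size(); }
  const void* getBuffer() const { return storage.data(); }
};

// Sketch of a compute() body: the spec declared the input as Real32 and the
// link already converted the data, so the untyped pointer can be cast directly.
void appendToPlot(const FakeArray& dataIn, std::vector<Real32>& plotSeries) {
  const Real32* values = static_cast<const Real32*>(dataIn.getBuffer());
  for (std::size_t i = 0; i < dataIn.getCount(); ++i) {
    plotSeries.push_back(values[i]);
  }
}
```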

@Zbysekz

Zbysekz commented Jun 26, 2020

Thanks for advice @dkeeney.

I made a simple few-line modification of napi_hello to use the FileOutputRegion: https://github.com/Zbysekz/htm.core/tree/RegionForPlots
If I run it, it creates an output file containing a big table. Each row is one iteration.
Is each column a cell state?
Where is it defined what is given to the output region? I mean, for example, where do I define that I want to send the raw anomaly (or some metrics) to the output region?
I tried to dig into what the "uniformLink" type means, but it seems it's not used?

@Zbysekz Zbysekz self-assigned this Jun 26, 2020
@dkeeney
Author

dkeeney commented Jun 26, 2020

Where is it defined what is given to the output region?

When the net.link( ... ) function is called it sets up the connections of output to input. This can be configured by calling the net.link( ) function or by setting up links in the JSON string that is passed to net.configure( ... ).

I want to send the raw anomaly

The 'anomaly' output on the TM should be connected to the input of your PlotRegion with a link.

I tried to dig into what the "uniformLink" type means, but it seems it's not used?

That is a leftover from Numenta's implementation. The link has no parameters other than the delay so those two parameters are not used. I left them there because I did not want to break the API.

I will write up something that describes how to build a new region module for NetworkAPI. This information exists in Numenta's old documentation, but it's a little hard to extract.

@dkeeney
Author

dkeeney commented Jun 27, 2020

See PR #853 which describes how to create a region for NetworkAPI.

What I was thinking was that your PlotRegion would control one plot. An app could configure multiple PlotRegion instances to handle different data. Let us know how it is going.

@Zbysekz

Zbysekz commented Jun 30, 2020

I got stuck a little on compiling with the sqlite3 library.
From the start, I picked the amalgamation version (the whole SQL library is just sqlite3.h and sqlite3.c), with the benefit that it is natively multiplatform. (Note: the .c file has approx. 220K LOC.)

I added the new region into CMakeLists.txt in the src folder and also put sqlite3.c & sqlite3.h into the utils folder.
It compiles, but when it gets to linking the dynamic_hello example:

/usr/bin/c++  -O3 -DNDEBUG   CMakeFiles/dynamic_hello.dir/examples/hello/hello.cpp.o CMakeFiles/dynamic_hello.dir/examples/hello/HelloSPTP.cpp.o  -o dynamic_hello -Wl,-rpath,/home/osboxes/HTM/htm.core/build/scripts/src: -m64 -Wl,--no-undefined -O3 -flto -fno-fat-lto-objects libhtm_core.so -lpthread -ldl 
/usr/bin/ld: libhtm_core.so: undefined reference to `sqlite3_close'
/usr/bin/ld: libhtm_core.so: undefined reference to `sqlite3_open'
collect2: error: ld returned 1 exit status
make[2]: Leaving directory '/home/osboxes/HTM/htm.core/build/scripts'
make[2]: *** [src/CMakeFiles/dynamic_hello.dir/build.make:100: src/dynamic_hello] Error 1
make[1]: *** [CMakeFiles/Makefile2:333: src/CMakeFiles/dynamic_hello.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

It seems that it is because sqlite3.c is missing in build/scripts/src/Makefile
My suspicion is that it is because it has *.c extension and not *.cpp, but how to tell cmake that it should take both?

@breznak
Member

breznak commented Jun 30, 2020

My suspicion is that it is because it has *.c extension and not *.cpp, but how to tell cmake that it should take both?

I don't think we use a regexp now; we explicitly name all the files.
Did you list the new source (& .h) files in the "utils section" in src/CMakeLists?

set(utils_files
htm/utils/GroupBy.hpp
htm/utils/Log.hpp
htm/utils/MovingAverage.cpp
htm/utils/MovingAverage.hpp

@dkeeney
Author

dkeeney commented Jun 30, 2020

sqlite3 is a third party package.
I should move that into the external stuff so that it can be automatically downloaded.
After this PR is merged I will make another PR to do that move so don't worry about doing that now.

But for now you do need to explicitly name the .h and .c files for sqlite3 in src/CMakeLists.txt.

@Zbysekz

Zbysekz commented Jul 1, 2020

Yes, I already put them in src/CMakeLists.txt.
I also removed the build folder and rebuilt with cmake ../.. inside the build/scripts folder.

Today, just as a test, I tried renaming sqlite3.c to sqlite3.cpp and sqlite3.h to sqlite3.hpp, and it showed up in build/scripts/src/Makefile! So it really seems to be because of the extension.

Here is my branch https://github.com/Zbysekz/htm.core/tree/RegionForPlots

@breznak
Member

breznak commented Jul 1, 2020

Oh, good find. It's probably due to our project being specified as CXX in said CMake file.
Then

@Zbysekz

Zbysekz commented Jul 1, 2020

Oh great, I added enable_language( C ) to CommonCompilerConfig.cmake and it puts the .c & .h files into the makefile.
The compile gets further, but there are warnings (of two types, it seems):
error: this statement may fall through [-Werror=implicit-fallthrough=]
and
error: cast between incompatible function types from ‘void (*)(void *, const char *)’ to ‘int (*)(u32, void *, void *, void *)’ {aka ‘int (*)(unsigned int, void *, void *, void *)’} [-Werror=cast-function-type]

@breznak
Member

breznak commented Jul 1, 2020

is there a c++ implementation of the db?

I don't know if a non-original C++ SQLite implementation would work for you (is the db still readable in Python, etc.?) or if that's a good idea (bugs fixed in the db), but here's a list:

https://srombauts.github.io/SQLiteCpp/#see-also---some-other-simple-c-sqlite-wrappers

the sqlite_orm looks active.

@breznak
Member

breznak commented Jul 1, 2020

The compile gets further, but there are warnings (of two types, it seems)

We have a strict -Werror policy for our code; for external code this is turned off. Can you try to set some -Wno-error=xxx in your cmake unit to avoid (too strictly) failing on this? Or use compiler directives (#pragma) in the code to silence those warnings (possibly for the whole file):
https://stackoverflow.com/questions/3378560/how-to-disable-gcc-warnings-for-a-few-lines-of-code
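For the pragma route, a warning can be silenced for just one stretch of a file by wrapping it in GCC's diagnostic pragmas (Clang understands the same `#pragma GCC diagnostic` form). The sketch uses a deliberate switch fall-through, the pattern `-Wimplicit-fallthrough` flags in sqlite3.c; `sumDownFrom` is a made-up example function. In code we own, C++17's `[[fallthrough]];` statement is the cleaner way to mark intentional fall-through.

```cpp
#include <cassert>

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wimplicit-fallthrough"

// Intentional fall-through: sumDownFrom(3) == 3 + 2 + 1.
// Without the pragmas, -Werror=implicit-fallthrough would reject this.
int sumDownFrom(int n) {
  int sum = 0;
  switch (n) {
    case 3: sum += 3;
    case 2: sum += 2;
    case 1: sum += 1;
    default: break;
  }
  return sum;
}

#pragma GCC diagnostic pop  // restore the project's normal warning settings
```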

@breznak
Member

breznak commented Jul 1, 2020

I added enable_language( C ) to CommonCompilerConfig.cmake and it puts the .c

nit, once it works, can you try doing that only for the cmake unit, not globally?

@Zbysekz

Zbysekz commented Jul 1, 2020

This is getting over my head; the CMake language is all Greek to me.

I have put sqlite in ThirdParty and created sqlite3.cmake to automatically download the tarball, but I don't know how to build it into a .a library and link it to the project.
I also have problems including the header...

Can one of you help me set this up? This branch is what I have now. My goal is just to compile it.

Note: I think compiling sqlite3.c & sqlite3.h into a static library or an object file will be enough, but I have no idea which is better.

@dkeeney
Author

dkeeney commented Jul 1, 2020

Yes, I can help you with CMake. I was expecting that you could get it to compile "any way it works" for this PR, and then I would fold sqlite3 into the ThirdParty build with a second PR.

Would it be easier if I just did sqlite3 first and then you can add your project using sqlite3 library which would already be there?

@Zbysekz

Zbysekz commented Jul 1, 2020

Would it be easier if I just did sqlite3 first and then you can add your project using sqlite3 library which would already be there?

Yes that would be wonderful, thanks.

@dkeeney
Author

dkeeney commented Jul 1, 2020

I just checked in PR #857 which folds sqlite3 into the htm.core library.

In your code, reference the .h as sqlite3/sqlite3.h.
So the only change to src/CMakeLists.txt you should need to make is to add your source modules for PlotRegion into the Region section. You will also need to add your PlotRegion to the list of pre-registered modules in RegionImplFactory.cpp if you want it to be a built-in region.

@dkeeney
Author

dkeeney commented Jul 1, 2020

hmmm, sqlite3 package did not build on Linux so I have more work to do.

@Zbysekz

Zbysekz commented Jul 2, 2020

hmmm, sqlite3 package did not build on Linux so I have more work to do.

I've tried building PR #857 on my Linux machine and it passed (the .a file was created).
But when I try to include sqlite3/sqlite3.h, it cannot find it (I also tried variants like sqlite/sqlite.h, etc.).

@dkeeney
Author

dkeeney commented Jul 2, 2020

Ok, that should have worked. Let me look into it further. #857 is now merged into master, but I want to be sure this is right.

@breznak
Member

breznak commented Jul 9, 2020

Now we can continue in NAB, where it all started
htm-community/NAB#40

EDIT:
hmm :) is there a way (can we create a dummy region?) to bridge "core" code (TM, SP classes) with NetworkAPI (which has the plot DatabaseRegion now)?
CC @dkeeney any idea? (do you get my problem?)
We could rewrite the NAB detector (where this originates) to use NAPI (but there's about a 30% performance penalty). But in general, I'd like to know if we're able to bridge the 2 programming styles.


4 participants