Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

es_objects plugin #500

Merged
merged 14 commits into from
Mar 20, 2018
Merged

Conversation

oxarbitrage
Copy link
Member

Plugin will store certain objects from the blockchain to ES database. Can be easy extended to support more objects. Initial idea was to store proposal objects for #115

Instead of loading the database with more data i decided to store the proposal_objects to ES where no object is deleted while we leave the core to keep with the reduced active proposal list.

I noticed then that we could store account data so we can go over #452 without loading more the blockchain database with a working referral index. By saving the account data we can index by referrer with ease(ill post some samples). Another advantage is in the case we want to get the proxies, currently we have to loop throw all the accounts and checking the options.vote_as field. Very intensive.

Asset and balance objects were also added to the plugin to solve similar issues and make the searching inside them easier for the client apps.

Runs good among the original elasticsearch plugin for operations.A starting command with the 2 plugins can be something as:

programs/witness_node/witness_node --data-dir data/my-blockprod --rpc-endpoint "127.0.0.1:8090" --plugins "elasticsearch es_objects" --es-objects-proposals true --es-objects-accounts true --es-objects-assets true --es-objects-balances true

Please let me know what do you guys think.

Copy link
Member

@abitmore abitmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly tend to merge this plugin into the earlier ES plugin.

acct.lifetime_referrer_fee_percentage = account_object->lifetime_referrer_fee_percentage;
acct.referrer_rewards_percentage = account_object->referrer_rewards_percentage;
acct.name = account_object->name;
acct.owner = fc::json::to_string(account_object->owner.get_addresses());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use the whole account_object->owner here. Addresses are only used for backward compatibility. Usually people use keys and account ids.
This also applies to active authorities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in this commit 3545c22
i had to add the whole flat_map in text form to make it work. this will make the indexing by this fields hard but i think its ok, the real intention is to have something available to relate the accounts index with the balances that uses an owner key.

acct.name = account_object->name;
acct.owner = fc::json::to_string(account_object->owner.get_addresses());
acct.active = fc::json::to_string(account_object->active.get_addresses());
acct.voting_account = account_object->options.voting_account;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to use a constructor function to initialize all these fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls explain a bit more this because i don't get it. constructor will be made in the hpp while i don't have this values available there to initialize this way, i am probably missing something, an example may be good. thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in hpp file, have

struct account_struct {
   string name;
   ...
   account_struct(const account_object* obj) {
      name = obj->name;
      ...
   }
}

In cpp file (here), only one line:

   account_struct a(pointer_to_the_account_obj);

}

void es_objects_plugin_impl::SendBulk()
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like duplicate code. Better abstract to dedicated utility class, so can be used in all ES related plugins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, working on it.

prop.id = proposal_object->id;
prop.expiration_time = proposal_object->expiration_time;
prop.review_period_time = proposal_object->review_period_time;
prop.proposed_transaction = fc::json::to_string(proposal_object->proposed_transaction);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO we should store approved_by fields for proposal objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be done by adding the proposal objects fields required_active_approvals, available_active_approvals, required_owner_approvals, available_owner_approvals and available_key_approvals ?
let me know if this will do it so i can add these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It worth a try, although I'm not sure whether the final state can be correctly stored with current implementation, because if the proposal has no review period, the object will be removed in the same block when the proposal is approved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added on fe4e87b

}

for(auto const& value: ids) {
if(value.space() == 1 && value.type() == 10 && _es_objects_proposals) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use magic numbers. We can check like this:

if( value.is<proposal_object>() && _es_objects_proposals )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry, i was supposed to change that before sending the pull. changed now here 26c57fd


for(auto const& value: ids) {
if(value.space() == 1 && value.type() == 10 && _es_objects_proposals) {
auto obj = db.find_object(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be nullptr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added check for nullptr at 7b18456

bool _es_objects_assets = true;
bool _es_objects_balances = true;
bool _es_objects_logs = true;
CURL *curl; // curl handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use a new curl handler for each plugin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting question, this is debatable at least to me. using the same handler for all requests that goes to the same url is recommended for performance and doing otherwise will be impossible for our purposes in the original plugin(connect and disconnect with each request will be just too slow in a replay).

i made a new handler here because i am not sure how much is too much when they run at the same time. the performance of sending bulks in 5k operation batches were proven to be good enough in a replay scenario while if we add more queries to the same handler we could run into performance issues with more data coming in by new plugins trying to store more data in the database. for example if first plugin is sending a bulk of 5k operations(10000 lines of text ) and this new plugin is trying to send 1k lines more it will need to wait until the handler is available to do it ending in some delay that i am not sure if it will be a problem in practice.

basically for that reason a new handler was added and answering the question; yes. the intention was to use a new handler with each plugin so they can be sending data in "parallel".

i was able to get in sync a node with the 2 plugins running at the same time and it was ok this way, i can make some tests by using the same handler and check that if you think it worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure, so was asking.
If the 2 plugins are merged into one, there will be only one handler by nature.

auto obj = db.find_object(value);
auto b = static_cast<const balance_object*>(obj);
PrepareBalance(b);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best if we can also store account statistics, dynamic asset data, bitasset data and etc along with the main objects.

// time.
//
#ifndef PROPOSAL_SPACE_ID
#define PROPOSAL_SPACE_ID 6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use a better name here.
Also I'm not sure if the number 6 is appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is something i dont understand fully so maybe you can explain me a bit. the comment states that plugins with the same ID will be added to the same binary so i think maybe is appropriate to have the same ID in the 2 ES plugins as: https://github.com/bitshares/bitshares-core/blob/develop/libraries/plugins/elasticsearch/include/graphene/elasticsearch/elasticsearch_plugin.hpp#L47

but maybe i am wrong, the snapshot plugin from @pmconrad actually dont define a number(https://github.com/bitshares/bitshares-core/blob/develop/libraries/plugins/snapshot/include/graphene/snapshot/snapshot.hpp) please let me know if we should change this pulgin to number 7.

in regards to the better name i am not sure what do you mean, there is no name defined here but only the ID. let me know. thanks @abitmore !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring that you use es_object for the plugin name but setting PROPOSAL here, it's not good practice. Not about the number.
According to the comment above, there should be a script to change the ID's at building time, but I'm not sure if there is one.
If I understood correctly, the space ID's are used to define a "space" for data storing in the object database. So, if no new object type is defined and need to be stored into object database, no need to set a XX_SPACE_ID, as have done with the snapshot plugin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry about that name, didn't saw it. it must be there because in a first version i was only going to store proposals. anyways, with your clarification i now removed all that section from the hpp file in commit 7b453c6 and code stills compiles and works fine.
thanks.

@oxarbitrage oxarbitrage mentioned this pull request Nov 28, 2017
@oxarbitrage
Copy link
Member Author

I slightly tend to merge this plugin into the earlier ES plugin.

After thinking about this i tend to agree and think it could be a good idea to have a single plugin with all the elastic stuff on it. I can close this pull and try to add this to the original if you think it will be better, i don't have any valid argument to don't do it that way neither against multiple plugins so i am not sure.

@pnomolos
Copy link

Hi @oxarbitrage What's the time frame on this do you think? Thanks!

@oxarbitrage
Copy link
Member Author

last commit (2bf302d) moves SendBulk and createBulk common elasticsearch calls to the utilities part of the project. es_objects plugin now make uses of them and also original elasticsearch_plugin can use them but i will make that in another pull.

it also add some console output for errors(#681), add elasticsearch 6 support and remove unnecessary includes.

@oxarbitrage
Copy link
Member Author

Hi @oxarbitrage What's the time frame on this do you think? Thanks!

I have resumed to work on it to have it included ASAP.


graphene::chain::database& database()
{
return _self.database();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need, simplified database call here 207eb07 thanks!


// check if we are in replay or in sync and change number of bulk documents accordingly
uint32_t limit_documents = 0;
if((fc::time_point::now() - block_time) < fc::seconds(30))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only way to figure out if we are in replay or normal mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that i know, this one seems to be effective as it is used in already implemented elasticsearch for account history plugin: https://github.com/bitshares/bitshares-core/blob/master/libraries/plugins/elasticsearch/elasticsearch_plugin.cpp#L198

of course open to ideas to change both.

}

void es_objects_plugin::plugin_startup()
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a log entry here so people have feedback about es plugin being enabled when starting the backend, please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added at 109b19f thx.

@xeroc
Copy link
Member

xeroc commented Mar 2, 2018

Looks great!

@oxarbitrage
Copy link
Member Author

when saving object bitasset i realized it hits the database hard, even if feed and current_feed don't change the object is updated with a new date all the time, plus updated when any of the feeds inside feed change and finally also updates when current_feed change. this last and only this is what we want to track.

so i created a map of bitassets to keep track of the last current_feed string in order to compare when object change and insert records in the database only in that situation.

i didn't synchronized the chain full but it seems to be reasonable. maintaining the map of bitassets will increase RAM but it is better than massively querying the ES database to check if a change in current feed was done.

Copy link
Member

@abitmore abitmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance it's fine. We can merge it.


void es_objects_plugin_impl::PrepareAsset(const asset_object* asset_object)
{
asset_struct asset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend not use "asset" as a local variable name, since there is a class or struct has same name. I'm not sure if it will cause trouble.

@oxarbitrage
Copy link
Member Author

don't merge yet please, need a few more changes.

@abitmore
Copy link
Member

This discussion is interesting: steemit/steem#2191

@oxarbitrage
Copy link
Member Author

oxarbitrage commented Mar 20, 2018

i am having a problem with the limit orders saving. it seems that when limit and fill are in the same block order 1.7.X is not created. the market history plugin subscribes to applied_block, maybe i that one at least for limit orders.

This discussion is interesting: steemit/steem#2191

will check this.

@abitmore
Copy link
Member

When a new limit order is filled in the same block, it will be created then removed, so you can't get the limit order with db.get(...) after the block is applied. The observer mechanism used in #732 may help.

Copy link
Member

@abitmore abitmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this now, and fix the same-block-removal issue with another PR.

@oxarbitrage oxarbitrage merged commit 51e6c79 into bitshares:develop Mar 20, 2018
@pnomolos
Copy link

@oxarbitrage I should now be able to go ahead with bitshares/bitshares-ui#661, right?

@oxarbitrage
Copy link
Member Author

@pnomolos close to it but not yet i think. i have a documentation for the plugin i will post in this next days that will make it easier to understand, there is going to probably be a hotfix before the next bitshares-core release for a bug we already identified in the limit orders tracking. i will wait until that to get started.

@pnomolos
Copy link

@oxarbitrage Thanks!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants