Persisting into DB becoming very slow #16

Closed
FotiosBistas opened this issue Jan 14, 2023 · 1 comment · Fixed by #18
FotiosBistas commented Jan 14, 2023

Problem

I encountered this during the development of the HTTP server. I have inserted debug messages inside the hoarder/discoverer.go file:

addProviderRecordsHttp(...) {
	...
	defer addProviderWG.Done()
	ctx := discoverer.ctx
	counter := 0
	for {
		select {
		case trackableCids, ok := <-trackableCidsChannel:
			log.Debugf(
				"New trackable CID array received from http channel. Cid:%s,ProvideTime:%s,PublicationTime:%s,Creator:%s. It's number is %d",
				cidStr, tr.ProvideTime, tr.PublicationTime, tr.Creator, counter,
			)
			counter++
			...
			// the starting values for the discoverer
			cidIn, err := cid.Parse(cidStr)

			if err != nil {
				log.Errorf("couldnt parse cid")
			}

			cidInfo := models.NewCidInfo(cidIn, discoverer.ReqInterval, discoverer.StudyDuration, config.JsonFileSource,
				discoverer.CidSource.Type(), "")

			cidInfo.AddPublicationTime(tr.PublicationTime)
			cidInfo.AddProvideTime(tr.ProvideTime)
			cidInfo.AddCreator(tr.Creator)

			fetchRes := models.NewCidFetchResults(cidIn, 0)
			...
			discoverer.DBCli.AddCidInfo(cidInfo)

		}
	}
}

The channel delivers everything correctly: if 3000 CIDs are provided, I see 3000 log messages, one per counter value. After reproducing this many times, I think the select statement in the db/client.go file is causing the problem:

go func(wg *sync.WaitGroup, persisterID int) {
	defer wg.Done()
	logEntry := log.WithField("persister", persisterID)
	for {
		// give priority to the dbDone channel if it closed
		select {
		case <-db.doneC:
			logEntry.Info("finish detected, closing persister")
			return
		default:
		}

		select {
		case p := <-db.persistC:
			switch p.(type) {
			case (*models.CidInfo):
				....
			case (*models.PeerInfo):
				....
			case (*models.CidFetchResults):
				....
			}
		case <-db.ctx.Done():
			logEntry.Info("shutdown detected, closing persister")
			return
		}
	}
}(&persisterWG, persister)

In my experience, select statements can become slow. In this particular instance, some CIDs can take up to roughly 30 minutes to be inserted into the database (from what I have observed, sometimes more, sometimes less). In the end, everything is inserted into the DB as expected.
That said, I am not 100% certain the select statement is at fault; the logic that inserts into the database may simply be slow in and of itself.
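
One way to tell the two suspects apart would be to time how long the loop spends blocked on the channel versus how long it spends handling each item. The following is only a minimal sketch, not code from the repository: `loopStats` and `measureLoop` are hypothetical names, the `int` items stand in for the real persistable types, and the `time.Sleep` in `main` stands in for a slow DB insert.

```go
package main

import (
	"fmt"
	"time"
)

// loopStats (hypothetical) accumulates how long a persister-style loop
// spent blocked waiting for work versus actually handling it.
type loopStats struct {
	Items    int
	Waiting  time.Duration
	Handling time.Duration
}

// measureLoop (hypothetical) drains in until it is closed, calling handle
// on every item and timing the receive and the handling separately.
func measureLoop(in <-chan int, handle func(int)) loopStats {
	var s loopStats
	for {
		waitStart := time.Now()
		item, ok := <-in
		s.Waiting += time.Since(waitStart)
		if !ok {
			return s
		}

		handleStart := time.Now()
		handle(item)
		s.Handling += time.Since(handleStart)
		s.Items++
	}
}

func main() {
	in := make(chan int, 100)
	for i := 0; i < 100; i++ {
		in <- i
	}
	close(in)

	// The sleep is an assumption standing in for a slow DB insert.
	stats := measureLoop(in, func(int) { time.Sleep(time.Millisecond) })
	fmt.Printf("items=%d waiting=%s handling=%s\n", stats.Items, stats.Waiting, stats.Handling)
}
```

If the handling time dominates, the insert logic is the likelier culprit; if the waiting time dominates while the channel is known to be full, the receive side deserves a closer look.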

@FotiosBistas FotiosBistas changed the title Persisting into DB Persisting into DB becoming very slow Jan 14, 2023

cortze commented Jan 16, 2023

You are right - the process could go much faster if we were writing into the DB more efficiently.
I'll try to implement a SQL query batching system, which should remove a huge part of the select bottleneck.
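
As a rough illustration of the batching idea (not necessarily what the eventual fix does), a persister could collect items from the channel and hand them to the DB in groups, flushing when the batch is full, when a timer fires, or when the channel closes. In this sketch `runBatcher` and `collectBatches` are hypothetical names, the `string` items stand in for the real persistable types, and the `flush` callback is where a single multi-row insert or transaction would go.

```go
package main

import (
	"fmt"
	"time"
)

// runBatcher (hypothetical) reads items from in and passes them to flush in
// groups of at most maxSize. A partial batch is flushed whenever the ticker
// fires and once more when in is closed. flushEvery must be positive.
func runBatcher(in <-chan string, maxSize int, flushEvery time.Duration, flush func([]string)) {
	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()

	batch := make([]string, 0, maxSize)
	emit := func() {
		if len(batch) == 0 {
			return
		}
		flush(batch)
		// Start a fresh slice so the flushed batch is not overwritten.
		batch = make([]string, 0, maxSize)
	}

	for {
		select {
		case item, ok := <-in:
			if !ok {
				emit() // flush whatever is left before exiting
				return
			}
			batch = append(batch, item)
			if len(batch) >= maxSize {
				emit()
			}
		case <-ticker.C:
			emit()
		}
	}
}

// collectBatches is a demo helper: it feeds items through runBatcher with a
// flush interval long enough that only size and close trigger a flush.
func collectBatches(items []string, maxSize int) [][]string {
	in := make(chan string, len(items))
	for _, it := range items {
		in <- it
	}
	close(in)

	var out [][]string
	// In a real persister, flush would issue one batched DB write (assumption).
	runBatcher(in, maxSize, time.Hour, func(b []string) { out = append(out, b) })
	return out
}

func main() {
	batches := collectBatches([]string{"a", "b", "c", "d", "e", "f", "g"}, 3)
	for i, b := range batches {
		fmt.Printf("batch %d: %d items\n", i, len(b))
	}
}
```

With seven items and a maximum batch size of three, this yields batches of 3, 3 and 1 items, so the DB would see three writes instead of seven.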
