Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[FEATURE] Get raw result from cursor and add a Next method to a cursor to iterate results batch by batch #542

Open
charlyjna opened this issue Aug 10, 2023 · 4 comments

Comments

@charlyjna
Copy link

Hi,

I just realize that querying ArangoDB is limited to 1,000 records. How may I change it to 'unlimited'?

Thanks.

Charly

@jwierzbo
Copy link
Collaborator

Please use Cursor to iterate over batches: https://www.arangodb.com/docs/3.11/appendix-references-cursor-object.html

Here is an example from our tests in V2:

t.Run("Test retry if batch size equals more than 1", func(t *testing.T) {
opts := arangodb.QueryOptions{
Count: true,
BatchSize: 2,
Options: arangodb.QuerySubOptions{
AllowRetry: true,
},
}
var result []UserDoc
cursor, err := db.QueryBatch(ctx, query, &opts, &result)
require.NoError(t, err)
require.Len(t, result, 2)
require.Equal(t, docs[0].Name, result[0].Name)
for {
if !cursor.HasMoreBatches() {
break
}
require.NoError(t, cursor.ReadNextBatch(ctx, &result))
var resultRetry []UserDoc
require.NoError(t, cursor.RetryReadBatch(ctx, &resultRetry))
if cursor.HasMoreBatches() {
require.Len(t, result, 2)
require.Len(t, resultRetry, 2)
} else {
require.Len(t, result, 1)
require.Len(t, resultRetry, 1)
}
require.Equal(t, result[0].Name, resultRetry[0].Name)
}
err = cursor.Close()
require.NoError(t, err)

@charlyjna
Copy link
Author

I am using cursor but, in this case, I want to get raw JSON. I created a generic query execution method that return directly the result:

func (a *ArangoDB) ExecuteQuery(query string) (string, error) {
	errMsg := ""
	if cur, err := a.printCollection(query); err != nil {
		errMsg = fmt.Sprintf("Error while executing query 'query': %v", err)
		logger.Errorf("ArangoDB - %s", errMsg)
	} else {
		if cur != nil {
			defer (*cur).Close()
			if res, err := json.Marshal(cur); err != nil {
				errMsg = fmt.Sprintf("Cannot convert result to JSON: %v", err)
				logger.Errorf("ArangoDB - %s", errMsg)
			} else {
				result := make(map[string]json.RawMessage)
				if err := json.Unmarshal(res, &result); err != nil {
					errMsg = fmt.Sprintf("Cannot extract from JSON: %v", err)
					logger.Errorf("ArangoDB - %s", errMsg)
				} else {
					if raw, ok := result["result"]; ok && raw != nil {
						return string(raw), nil
					} else {
						return "", nil
					}
				}
			}
		}
	}
	return "", fmt.Errorf(errMsg)
} ```

In your example I understand that I should iterate the cursor, but in this case I should know which struct to use as a result recipient, isn't it?

@charlyjna
Copy link
Author

Ok, I read deeper the cursor readDocument method to understand how it works.
What I did is adding 2 methods:

// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
    return c.cursorData.Result
}
// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context) error {
   ...
}

Now, I can use a generic method to execute a query and return a raw JSON file instead of iterating record by record.

I updated the title because it could be nice to have these methods in standard.

@charlyjna charlyjna changed the title [HELP] Limit 1000 records [FEATURE] Get raw result from cursor and add a Next method to a cursor to iterate results batch by batch Aug 10, 2023
@charlyjna
Copy link
Author

To explain more what I did:

First, I added these 2 methods in cursor_impl.go:

// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
	return c.cursorData.Result
}
// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context, retryBatchID string) error {
	if c.resultIndex < 0 {
		return WithStack(NoMoreDocumentsError{})
	}

	// Force use of initial endpoint
	ctx = WithEndpoint(ctx, c.endpoint)
	if len(c.cursorData.NextBatchID) > 0 {
		c.NextBatchID = c.cursorData.NextBatchID
	}

	// This is required since we are interested if this was a dirty read,
	// but we do not want to trash the users bool reference.
	var wasDirtyRead bool
	fetchCtx := ctx
	if c.allowDirtyReads {
		fetchCtx = WithAllowDirtyReads(ctx, &wasDirtyRead)
	}

	p := path.Join(c.relPath(), c.cursorData.ID)

	// If we have a NextBatchID, use it
	if c.NextBatchID != "" {
		p = path.Join(c.relPath(), c.cursorData.ID, c.NextBatchID)
	}

	// We have to retry the batch instead of fetching the next one
	if retryBatchID != "" {
		p = path.Join(c.relPath(), c.retryData.cursorID, retryBatchID)
	}

	// Update currentBatchID before fetching the next batch (no retry case)
	if c.NextBatchID != "" && retryBatchID == "" {
		c.retryData.currentBatchID = c.NextBatchID
	}

	// Fetch the next batch
	req, err := c.conn.NewRequest("POST", p)
	if err != nil {
		return WithStack(err)
	}

	cs := applyContextSettings(fetchCtx, req)
	resp, err := c.conn.Do(fetchCtx, req)
	if err != nil {
		return WithStack(err)
	}
	if err := resp.CheckStatus(200); err != nil {
		return WithStack(err)
	}
	loadContextResponseValues(cs, resp)
	var data cursorData
	if err := resp.ParseBody("", &data); err != nil {
		return WithStack(err)
	}
	c.cursorData = data
	c.resultIndex = 0
	c.lastReadWasDirty = wasDirtyRead

	if !c.cursorData.HasMore {
		// Out of data
		c.resultIndex = -1
	}

	return nil
}

Then I use this code to query the database and get a result in JSON format:

...
if cur != nil {
	defer (*cur).Close()
	var raw []*driver.RawObject
	for {
		raw = append(raw, (*cur).GetRawData()...)
		if !(*cur).HasMore() {
			break
		}
		if err := (*cursor).Next(context.Background(), ""); err != nil {
			logger.Errorf("Cannot fetch next batch: %v", err)
			break
		}
	}
}
...

Do you think you could add these methods (GetRawData() and Next()) in a next release?

I wrote a small web server to be able to query ArangoDB from any external application using a POST query and get a JSON result. It is very helpful and very fast this way for example with Grafana or PowerBI.

Thank you.

Charly

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants