
[FEATURE] Get raw result from cursor and add a Next method to a cursor to iterate results batch by batch #542

Open
charlyjna opened this issue Aug 10, 2023 · 4 comments

Comments

@charlyjna

Hi,

I just realized that querying ArangoDB is limited to 1,000 records. How can I change it to 'unlimited'?

Thanks.

Charly

@jwierzbo
Collaborator

Please use a Cursor to iterate over the batches: https://www.arangodb.com/docs/3.11/appendix-references-cursor-object.html

Here is an example from our tests in V2:

t.Run("Test retry if batch size equals more than 1", func(t *testing.T) {
opts := arangodb.QueryOptions{
Count: true,
BatchSize: 2,
Options: arangodb.QuerySubOptions{
AllowRetry: true,
},
}
var result []UserDoc
cursor, err := db.QueryBatch(ctx, query, &opts, &result)
require.NoError(t, err)
require.Len(t, result, 2)
require.Equal(t, docs[0].Name, result[0].Name)
for {
if !cursor.HasMoreBatches() {
break
}
require.NoError(t, cursor.ReadNextBatch(ctx, &result))
var resultRetry []UserDoc
require.NoError(t, cursor.RetryReadBatch(ctx, &resultRetry))
if cursor.HasMoreBatches() {
require.Len(t, result, 2)
require.Len(t, resultRetry, 2)
} else {
require.Len(t, result, 1)
require.Len(t, resultRetry, 1)
}
require.Equal(t, result[0].Name, resultRetry[0].Name)
}
err = cursor.Close()
require.NoError(t, err)
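
For reference, plain document-by-document iteration also goes past the initial batch: the cursor fetches the following batches transparently, so the result set is not capped at 1,000 records. A rough sketch with the v1 driver (`driver "github.com/arangodb/go-driver"`), assuming `db` is an open `driver.Database` and `UserDoc` matches the query result; the `readAll` helper name is only for illustration:

```go
// readAll drains a query cursor document by document; the driver fetches
// further batches from the server transparently while HasMore is true.
func readAll(ctx context.Context, db driver.Database) ([]UserDoc, error) {
	cursor, err := db.Query(ctx, "FOR d IN users RETURN d", nil)
	if err != nil {
		return nil, err
	}
	defer cursor.Close()

	var all []UserDoc
	for {
		var doc UserDoc
		_, err := cursor.ReadDocument(ctx, &doc)
		if driver.IsNoMoreDocuments(err) {
			break // cursor exhausted; all batches consumed
		} else if err != nil {
			return nil, err
		}
		all = append(all, doc)
	}
	return all, nil
}
```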

@charlyjna
Author

I am using a cursor but, in this case, I want to get the raw JSON. I created a generic query-execution method that returns the result directly:

```go
func (a *ArangoDB) ExecuteQuery(query string) (string, error) {
	errMsg := ""
	if cur, err := a.printCollection(query); err != nil {
		errMsg = fmt.Sprintf("Error while executing query '%s': %v", query, err)
		logger.Errorf("ArangoDB - %s", errMsg)
	} else {
		if cur != nil {
			defer (*cur).Close()
			// Marshal the cursor itself, then extract its "result" field.
			if res, err := json.Marshal(cur); err != nil {
				errMsg = fmt.Sprintf("Cannot convert result to JSON: %v", err)
				logger.Errorf("ArangoDB - %s", errMsg)
			} else {
				result := make(map[string]json.RawMessage)
				if err := json.Unmarshal(res, &result); err != nil {
					errMsg = fmt.Sprintf("Cannot extract from JSON: %v", err)
					logger.Errorf("ArangoDB - %s", errMsg)
				} else {
					if raw, ok := result["result"]; ok && raw != nil {
						return string(raw), nil
					} else {
						return "", nil
					}
				}
			}
		}
	}
	return "", fmt.Errorf("%s", errMsg)
}
```

From your example I understand that I should iterate over the cursor, but in that case I would need to know which struct to use as the result recipient, wouldn't I?
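
One way around that is to read each document into a `json.RawMessage` instead of a typed struct; `encoding/json` accepts a `RawMessage` as an unmarshal target, so this should work with the v1 driver over a JSON connection. A rough sketch (imports `encoding/json` and `driver "github.com/arangodb/go-driver"`; the `readRawJSON` helper name is only for illustration):

```go
// readRawJSON drains a cursor into raw JSON documents and re-assembles
// them into a single JSON array string, without needing a concrete struct.
func readRawJSON(ctx context.Context, cur driver.Cursor) (string, error) {
	var rows []json.RawMessage
	for {
		var doc json.RawMessage
		_, err := cur.ReadDocument(ctx, &doc)
		if driver.IsNoMoreDocuments(err) {
			break // all batches consumed
		} else if err != nil {
			return "", err
		}
		rows = append(rows, doc)
	}
	out, err := json.Marshal(rows)
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```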

@charlyjna
Author

OK, I read the cursor's readDocument method more closely to understand how it works.
What I did was add two methods:

```go
// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
	return c.cursorData.Result
}

// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context) error {
	...
}
```

Now I can use a generic method to execute a query and return the raw JSON result instead of iterating record by record.

I updated the title because it would be nice to have these methods as part of the standard driver.

@charlyjna changed the title from [HELP] Limit 1000 records to [FEATURE] Get raw result from cursor and add a Next method to a cursor to iterate results batch by batch on Aug 10, 2023
@charlyjna
Author

To explain in more detail what I did:

First, I added these 2 methods in cursor_impl.go:

```go
// Return raw batch result
func (c *cursor) GetRawData() []*RawObject {
	return c.cursorData.Result
}

// A method based on readDocument but only to fetch the next batch
func (c *cursor) Next(ctx context.Context, retryBatchID string) error {
	if c.resultIndex < 0 {
		return WithStack(NoMoreDocumentsError{})
	}

	// Force use of initial endpoint
	ctx = WithEndpoint(ctx, c.endpoint)
	if len(c.cursorData.NextBatchID) > 0 {
		c.NextBatchID = c.cursorData.NextBatchID
	}

	// This is required since we are interested if this was a dirty read,
	// but we do not want to trash the users bool reference.
	var wasDirtyRead bool
	fetchCtx := ctx
	if c.allowDirtyReads {
		fetchCtx = WithAllowDirtyReads(ctx, &wasDirtyRead)
	}

	p := path.Join(c.relPath(), c.cursorData.ID)

	// If we have a NextBatchID, use it
	if c.NextBatchID != "" {
		p = path.Join(c.relPath(), c.cursorData.ID, c.NextBatchID)
	}

	// We have to retry the batch instead of fetching the next one
	if retryBatchID != "" {
		p = path.Join(c.relPath(), c.retryData.cursorID, retryBatchID)
	}

	// Update currentBatchID before fetching the next batch (no retry case)
	if c.NextBatchID != "" && retryBatchID == "" {
		c.retryData.currentBatchID = c.NextBatchID
	}

	// Fetch the next batch
	req, err := c.conn.NewRequest("POST", p)
	if err != nil {
		return WithStack(err)
	}

	cs := applyContextSettings(fetchCtx, req)
	resp, err := c.conn.Do(fetchCtx, req)
	if err != nil {
		return WithStack(err)
	}
	if err := resp.CheckStatus(200); err != nil {
		return WithStack(err)
	}
	loadContextResponseValues(cs, resp)
	var data cursorData
	if err := resp.ParseBody("", &data); err != nil {
		return WithStack(err)
	}
	c.cursorData = data
	c.resultIndex = 0
	c.lastReadWasDirty = wasDirtyRead

	if !c.cursorData.HasMore {
		// Out of data
		c.resultIndex = -1
	}

	return nil
}
```

Then I use this code to query the database and get a result in JSON format:

```go
...
if cur != nil {
	defer (*cur).Close()
	var raw []*driver.RawObject
	for {
		raw = append(raw, (*cur).GetRawData()...)
		if !(*cur).HasMore() {
			break
		}
		if err := (*cur).Next(context.Background(), ""); err != nil {
			logger.Errorf("Cannot fetch next batch: %v", err)
			break
		}
	}
}
...
```

Do you think you could add these methods (GetRawData() and Next()) in an upcoming release?

I wrote a small web server so that any external application can query ArangoDB with a POST request and get a JSON result. This approach is very helpful and very fast, for example with Grafana or Power BI.
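
For illustration, such a wrapper could look roughly like this (the /query route, queryHandler, and port are made up, not part of the driver), reusing a query-execution function like ExecuteQuery above:

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// queryHandler wraps a query-execution function (e.g. the ExecuteQuery method
// above) so external tools can POST an AQL query and receive the raw JSON result.
func queryHandler(exec func(query string) (string, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body) // the AQL query is sent as the request body
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		result, err := exec(string(body))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, result)
	}
}

func main() {
	// Wire in a real query executor here, e.g. a.ExecuteQuery from the snippet above.
	exec := func(query string) (string, error) { return "[]", nil } // stub for illustration
	http.Handle("/query", queryHandler(exec))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```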

Thank you.

Charly
