
Add high throughput integration test #5655

Open · rdettai wants to merge 2 commits into main from test-client-retries
Conversation

@rdettai (Collaborator) commented Jan 28, 2025

Description

This PR reuses the tests and docs proposed in #5644, which itself is no longer necessary now that the status code was fixed to be 429 when shards need scaling up (#5651).

It also adds to the CLI ingest command a small indication of the number of retries that occurred. This is handy for troubleshooting and shows users concretely that retries are often necessary.
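
As a rough sketch of the kind of indication this could be (purely illustrative; the actual CLI wording and plumbing may differ, and the assumption that `num_too_many_requests` from the merged response backs the retry count is mine):

```rust
// Hypothetical sketch only, not the actual CLI code.
// `num_too_many_requests` is the field visible in the response `merge`
// excerpt quoted further down in this conversation.
fn print_retry_hint(num_too_many_requests: u64) {
    if num_too_many_requests > 0 {
        println!("{num_too_many_requests} batch(es) were retried after a 429 Too Many Requests response");
    }
}

fn main() {
    // e.g. three batches hit a 429 and had to be retried
    print_retry_hint(3);
}
```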

How was this PR tested?

Integration tests and running the CLI ingest command on the HDFS dataset.

@rdettai rdettai changed the base branch from retry-no-shard to main on January 28, 2025 11:10
Comment on lines +93 to +123
pub fn merge(self, other: RestIngestResponse) -> Self {
Self {
num_docs_for_processing: self.num_docs_for_processing + other.num_docs_for_processing,
num_ingested_docs: apply_op(self.num_ingested_docs, other.num_ingested_docs, |a, b| {
a + b
}),
num_rejected_docs: apply_op(self.num_rejected_docs, other.num_rejected_docs, |a, b| {
a + b
}),
parse_failures: apply_op(self.parse_failures, other.parse_failures, |a, b| {
a.into_iter().chain(b).collect()
}),
num_too_many_requests: self.num_too_many_requests,
}
}
@rdettai (Collaborator, Author) commented:

I moved this back here: it makes more sense than in the API model, because accumulating responses is quite specific to the REST client.
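
For context, the `apply_op` helper is not part of this excerpt; a minimal sketch of what such a helper presumably does (the signature in the actual codebase may differ) could be:

```rust
/// Combines two optional values: applies `f` when both sides are present,
/// otherwise keeps whichever side is `Some` (or stays `None`).
fn apply_op<T>(a: Option<T>, b: Option<T>, f: impl FnOnce(T, T) -> T) -> Option<T> {
    match (a, b) {
        (Some(a), Some(b)) => Some(f(a, b)),
        (Some(a), None) => Some(a),
        (None, Some(b)) => Some(b),
        (None, None) => None,
    }
}
```

With `merge` defined on the client side, accumulating the per-batch responses of a chunked ingest is then just a reduction, e.g. something like `responses.into_iter().reduce(|acc, r| acc.merge(r))`.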

@rdettai rdettai force-pushed the test-client-retries branch 4 times, most recently from de8f2a1 to aa98399 on January 30, 2025 11:05
@rdettai rdettai force-pushed the test-client-retries branch 2 times, most recently from aa77d7e to c2069e4 on February 6, 2025 20:03
@rdettai rdettai force-pushed the test-client-retries branch from c2069e4 to ce4501f on February 6, 2025 20:08
// TODO: when using the default 10MiB batch size, we get persist
// timeouts with code 500 on some lower performance machines (e.g.
// GitHub runners). We should investigate why this happens exactly.
Some(5_000_000),
@rdettai (Collaborator, Author) commented Feb 6, 2025

@guilload I didn't find a good explanation for why this timeout occurs here in the persist:

let persist_result = tokio::time::timeout(
PERSIST_REQUEST_TIMEOUT,
ingester.persist(persist_request),
)
.await
.unwrap_or_else(|_| {
let message = format!(
"persist request timed out after {} seconds",
PERSIST_REQUEST_TIMEOUT.as_secs()
);
Err(IngestV2Error::Timeout(message))
});

Persisting 10MB should not take 6 sec, even on a slow system and in debug mode, should it?
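
One way to start looking at it, as a self-contained sketch (a simulated slow persist stands in for `ingester.persist`, and the names and durations are illustrative, not the actual Quickwit code; assumes a tokio runtime):

```rust
use std::time::{Duration, Instant};

// Stand-in for `ingester.persist(...)`: just sleeps for a configurable time.
async fn simulated_persist(busy_for: Duration) -> Result<(), String> {
    tokio::time::sleep(busy_for).await;
    Ok(())
}

#[tokio::main]
async fn main() {
    // Stand-in for PERSIST_REQUEST_TIMEOUT.
    let persist_request_timeout = Duration::from_secs(6);
    let start = Instant::now();
    let persist_result = tokio::time::timeout(
        persist_request_timeout,
        simulated_persist(Duration::from_secs(7)),
    )
    .await
    .unwrap_or_else(|_| {
        Err(format!(
            "persist request timed out after {} seconds",
            persist_request_timeout.as_secs()
        ))
    });
    // Logging the elapsed time next to the result helps tell a genuinely
    // slow persist apart from one that is stuck waiting on something else.
    println!("{persist_result:?} after {:?}", start.elapsed());
}
```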
