Skip to content
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

SearchScroll now uses POSTs when ScrollID exceeds 1900 characters #126

Merged
merged 2 commits into from
Jun 10, 2014

Conversation

srapp
Copy link
Contributor

@srapp srapp commented Jun 10, 2014

I stumbled upon this when trying to scroll through all of the indices of a large data set. If the scrollId of a SearchScroll is sufficiently long enough such that the generated URI for the GET request is longer than 2000 characters, the ElasticSearch servers refuse to process the request, causing an error. This change inspects the length of the provided scrollId, and if it is excessively long (over 1900 characters), the Action acts as a POST instead of a GET, providing the scrollId in the body of the request.

It would've been simpler to make every SearchScroll a POST by default, but I figured that most users would not encounter this bug, and a GET request does offer a bit more ease when it comes to debugging request logs and such.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) when pulling 5720002 on srapp:master into 48eeefe on searchbox-io:master.

@@ -24,19 +30,34 @@ protected String buildURI() {

@Override
public String getRestMethodName() {
if (scrollId.length() > MAX_SCROLL_ID_LENGTH) {
return "POST";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single exit point (single return with a local variable) would be nicer imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally would agree, but in a case like this, we're simply returning one value or another based on a boolean expression. Perhaps a ternary expression would be a better alternative?

return scrollId.length() > MAX_SCROLL_ID_LENGTH ? "POST" : "GET";

That said, if a local variable fits the code style of the project better, that's reason enough for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it was just my personal preference to have single exit points wherever as I believe it makes it easier to understand and extend later on. "Wherever" may not be always practical, so yes I'll be happy with the ternary alternative hehe.

@kramer
Copy link
Member

kramer commented Jun 10, 2014

Thanks for pointing that case out and the contribution!
It's pretty clean as it is though adding an integration test case wouldn't hurt 😄 .

@srapp
Copy link
Contributor Author

srapp commented Jun 10, 2014

I appreciate the desire to test our interactions with the ElasticSearch API contract, but I don't think an integration test around this change would be tenable. That would require the test to understand the internals of ElasticSearch and how scrollIds are generated (which may change in the future), and in my experience it has taken a significant amount of data to expose this corner case.

Ease of integration testing would be an argument for either uniformly submitting POSTs instead of GETs or allowing the user to provide some sort of override to SearchScroll's builder.

kramer pushed a commit that referenced this pull request Jun 10, 2014
SearchScroll now uses POSTs when ScrollID exceeds 1900 characters
@kramer kramer merged commit 030d500 into searchbox-io:master Jun 10, 2014
@kramer kramer added this to the 0.1.2 milestone Jul 10, 2014
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants