Add basic HTTP compression #451

Merged 2 commits into opensearch-project:main on May 15, 2024

Conversation

mliarakos
Contributor

Description

Add basic support for HTTP compression when writing to OpenSearch.

This PR adapts s4ch1n/elasticsearch-hadoop@d932775 to add basic HTTP compression support without upgrading the Apache HttpComponents dependency.
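As a rough illustration of what request compression involves, here is a minimal sketch that gzip-compresses a request body using only the JDK. It is not the code from this PR or from the referenced fork: the class name `GzipRequestBodySketch`, the helper methods, and the sample bulk payload are all hypothetical. A client doing this would also send the `Content-Encoding: gzip` header so the server knows to decompress the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration only; not part of opensearch-hadoop.
final class GzipRequestBodySketch {

    /** Gzip-compresses a request body. The result would be sent with "Content-Encoding: gzip". */
    static byte[] gzip(byte[] body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream flushes the gzip trailer into the buffer.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(body);
        }
        return buffer.toByteArray();
    }

    /** Reverses gzip(), as the receiving side would do before parsing the body. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up, repetitive bulk-style payload; repetitive JSON tends to compress well.
        StringBuilder bulk = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            bulk.append("{\"index\":{}}\n{\"field\":\"value ").append(i).append("\"}\n");
        }
        byte[] raw = bulk.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);

        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip ok:    " + java.util.Arrays.equals(raw, gunzip(compressed)));
    }
}
```

The body is buffered fully in memory here for simplicity; a real transport might stream the compressed bytes instead.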

The elasticsearch-hadoop library and its HTTP compression fork use version 2.0 of the Apache License.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Michael Liarakos <mliarakos@gmail.com>
Collaborator

@Xtansia Xtansia left a comment

@mliarakos Thank you for contributing this, could you please add an entry to the CHANGELOG.md?

Signed-off-by: Michael Liarakos <mliarakos@gmail.com>
@mliarakos
Contributor Author

> @mliarakos Thank you for contributing this, could you please add an entry to the CHANGELOG.md?

Updated.

@mliarakos mliarakos requested a review from Xtansia May 15, 2024 01:46
@Xtansia Xtansia merged commit 33f7f4e into opensearch-project:main May 15, 2024
15 checks passed
@pavelnemirovsky

@mliarakos, thank you for contributing. Is there a reason you added compression support only for HTTP requests and not for responses?

@mliarakos
Contributor Author

@pavelnemirovsky, response compression is already supported by configuring the library to add the Accept-Encoding: gzip header to requests, like in this test.

@pavelnemirovsky

Thank you. I was unable to make it work. I will double-check it.

@pavelnemirovsky

@mliarakos I double-checked, and it doesn't work. The library sends the Accept-Encoding header, but the HTTP client that receives the gzip or deflate stream doesn't decode it, so JSON parsing fails. Here is the log:

2024-12-06T18:53:22,763 [DEBUG] HeaderProcessor - Added HTTP Headers to method: [X-Opaque-ID: [spark] [user] [localrun] [local-1733503936577]
, User-Agent: opensearch-hadoop/1.3.0 spark/3.1.3
, Accept-Encoding: gzip
, Content-Type: application/json
, Accept: application/json
]
2024-12-06T18:53:22,763 [DEBUG] CommonsHttpTransport - Using regular user provider to wrap rest request
2024-12-06T18:53:22,767 [DEBUG] HttpMethodDirector - Preemptively sending default basic credentials
2024-12-06T18:53:22,770 [DEBUG] HttpMethodDirector - Authenticating with BASIC <any realm>@es-vip.domain.internal:9200
2024-12-06T18:53:22,770 [DEBUG] HttpConnection - Open connection to es-vip.domain.internal:9200
2024-12-06T18:53:22,924 [DEBUG] HttpMethodBase - Adding Host request header
2024-12-06T18:53:23,103 [DEBUG] HttpMethodBase - Resorting to protocol version default close connection policy
2024-12-06T18:53:23,103 [DEBUG] HttpMethodBase - Should NOT close connection, using HTTP/1.1
2024-12-06T18:53:23,103 [DEBUG] HttpConnection - Releasing connection back to connection manager.
Exception in thread "main" org.opensearch.hadoop.OpenSearchHadoopIllegalArgumentException: Cannot detect OpenSearch version - typically this happens if the network/OpenSearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'opensearch.nodes.wan.only'
	at org.opensearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:423)
	at org.opensearch.spark.sql.OpenSearchRelation.cfg$lzycompute(DefaultSource.scala:233)
	at org.opensearch.spark.sql.OpenSearchRelation.cfg(DefaultSource.scala:230)
	at org.opensearch.spark.sql.OpenSearchRelation.lazySchema$lzycompute(DefaultSource.scala:237)
	at org.opensearch.spark.sql.OpenSearchRelation.lazySchema(DefaultSource.scala:237)
	at org.opensearch.spark.sql.OpenSearchRelation.$anonfun$schema$1(DefaultSource.scala:241)
	at scala.Option.getOrElse(Option.scala:189)
	at org.opensearch.spark.sql.OpenSearchRelation.schema(DefaultSource.scala:241)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at com.dmetrics.analytics.stores.Elasticsearch.getDS(Elasticsearch.scala:89)
	at com.dmetrics.analytics.ArticlesETL$.$anonfun$main$4(ArticlesETL.scala:288)
	at scala.util.Try$.apply(Try.scala:213)
	at com.dmetrics.analytics.ArticlesETL$.$anonfun$main$2(ArticlesETL.scala:286)
	at com.dmetrics.analytics.ArticlesETL$.$anonfun$main$2$adapted(ArticlesETL.scala:244)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at com.dmetrics.analytics.ArticlesETL$.main(ArticlesETL.scala:244)
	at com.dmetrics.analytics.ArticlesETL.main(ArticlesETL.scala)
Caused by: org.opensearch.hadoop.rest.OpenSearchHadoopParsingException: org.opensearch.hadoop.thirdparty.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: org.opensearch.hadoop.thirdparty.apache.commons.httpclient.AutoCloseInputStream@20d1737; line: 1, column: 2]
	at org.opensearch.hadoop.rest.RestClient.parseContent(RestClient.java:195)
	at org.opensearch.hadoop.rest.RestClient.mainInfo(RestClient.java:721)
	at org.opensearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:413)
	... 23 more
Caused by: org.opensearch.hadoop.thirdparty.codehaus.jackson.JsonParseException: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens
 at [Source: org.opensearch.hadoop.thirdparty.apache.commons.httpclient.AutoCloseInputStream@20d1737; line: 1, column: 2]
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.impl.JsonParserMinimalBase._throwInvalidSpace(JsonParserMinimalBase.java:331)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.impl.Utf8StreamParser._skipWSOrEnd(Utf8StreamParser.java:1836)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:275)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2439)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2377)
	at org.opensearch.hadoop.thirdparty.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1094)
	at org.opensearch.hadoop.rest.RestClient.parseContent(RestClient.java:190)
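
A note on reading the trace above: character code 31 is 0x1f, which is the first byte of the gzip magic number (0x1f 0x8b). That is consistent with the JSON parser being handed a still-compressed response body. The sketch below reproduces the symptom's cause with the JDK alone and shows one way a client could decode the body based on the `Content-Encoding` response header. It is an assumption-laden illustration, not the library's code and not the change in #543; the class `GzipResponseSketch`, its methods, and the sample JSON are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration only; not part of opensearch-hadoop.
final class GzipResponseSketch {

    /** Wraps the body in a GZIPInputStream when the server declared "Content-Encoding: gzip". */
    static InputStream decodeBody(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body; // identity or unknown encoding: pass through unchanged
    }

    /** Stand-in for a server that honors "Accept-Encoding: gzip". */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        }
        return buffer.toByteArray();
    }

    /** Reads a stream to the end. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "{\"ok\":true}".getBytes(StandardCharsets.UTF_8); // made-up response body
        byte[] wire = gzip(json);

        // First byte of a gzip stream: prints 31, the CTRL-CHAR code in the parser error.
        System.out.println(wire[0] & 0xff);

        // Decoding by Content-Encoding recovers the JSON text.
        InputStream decoded = decodeBody(new ByteArrayInputStream(wire), "gzip");
        System.out.println(new String(readAll(decoded), StandardCharsets.UTF_8));
    }
}
```

This sketch handles only `gzip`; a `deflate` response would need `java.util.zip.InflaterInputStream` or similar.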

@pavelnemirovsky

@mliarakos any chance for your feedback?

@mliarakos
Contributor Author

@pavelnemirovsky, basic compression support for HTTP responses is a work in progress in #543.
