[BUG] Async Query API Unable to handle date fields in results #3298

Open
normanj-bitquill opened this issue Feb 4, 2025 · 0 comments
Labels
bug Something isn't working untriaged

What is the bug?
When an Async Query API query returns a date field in its results (including a computed one), retrieving the results fails with an exception.

The results stored in the query_execution_result_[DATASOURCE] index are correct. OpenSearch fails while parsing the date values and reports that they are not in the timestamp format.

{
  "status": 500,
  "error": {
    "type": "SemanticCheckException",
    "reason": "There was internal problem at backend",
    "details": "timestamp:2024-06-16 in unsupported format, please use \u0027yyyy-MM-dd HH:mm:ss[.SSSSSSSSS]\u0027"
  }
}

This was observed with the integ-test docker cluster. It most likely also applies to Glue/S3.
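The mismatch in the error message can be illustrated outside OpenSearch. The sketch below is a Python analogue, not the plugin's code: `TIMESTAMP_FORMAT` is an assumed equivalent of the `yyyy-MM-dd HH:mm:ss` pattern from the error (the optional fraction is omitted), and both helper names are hypothetical.

```python
from datetime import date, datetime

# Assumed Python equivalent of the 'yyyy-MM-dd HH:mm:ss' pattern in the error
# message; the optional fractional-second part is left out of this sketch.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_as_timestamp(value: str) -> datetime:
    """Strict timestamp parse; raises ValueError for a date-only string."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def parse_as_date(value: str) -> date:
    """ISO date parse, which is what a 'date' column value needs."""
    return date.fromisoformat(value)


stored_value = "2024-06-16"  # value from the result index in this report

try:
    parse_as_timestamp(stored_value)
    timestamp_parse_failed = False
except ValueError:
    timestamp_parse_failed = True

print(timestamp_parse_failed)       # True
print(parse_as_date(stored_value))  # 2024-06-16
```

The same string is a valid date and an invalid timestamp, which is consistent with the value being routed to a timestamp parser despite its declared type.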

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Start an OpenSearch environment with an S3/Glue datasource. The integ-test cluster can be used for this.
  2. Create the S3 table named people using this file https://github.com/Bit-Quill/opensearch-spark/blob/e2e-spark-ppl-tests/e2e-test/src/test/resources/spark/tables/people.parquet
  3. Execute this PPL query:
    source=mys3.default.people  | eval c1 = adddate(@timestamp, 1) | fields c1 | head 10
    
  4. The query will fail.
  5. Take a look at the contents of the query_execution_result_[DATASOURCE] index. The results for the query from (3) are present and look correct.
    {
      "_index" : "query_execution_result_mys3",
      "_id" : "tUJg0pQBWtm3wdS357WV",
      "_score" : 1.0,
      "_source" : {
        "result" : [
          "{'c1':'2024-06-16'}",
          "{'c1':'2024-06-16'}",
          "{'c1':'2024-06-16'}",
          "{'c1':'2024-06-16'}",
          "{'c1':'2024-06-16'}",
          "{'c1':'2024-06-16'}"
        ],
        "schema" : [
          "{'column_name':'c1','data_type':'date'}"
        ],
        ...
      }
    }
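Steps 3 to 5 can be scripted roughly as follows. This is a sketch under assumptions, not a verified client: the base URL (an unsecured local cluster), the `/_plugins/_async_query` paths, and the request body fields reflect my understanding of the Async Query API and should be checked against the plugin documentation. The function names are hypothetical. The requests are only built here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security
DATASOURCE = "mys3"                 # datasource name used in this report

PPL_QUERY = (
    "source=mys3.default.people | eval c1 = adddate(@timestamp, 1) "
    "| fields c1 | head 10"
)


def build_async_query_request(query: str) -> Request:
    """Step 3: submit the PPL query (assumed endpoint and body shape)."""
    body = {"datasource": DATASOURCE, "lang": "ppl", "query": query}
    return Request(
        f"{BASE_URL}/_plugins/_async_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_result_request(query_id: str) -> Request:
    """Step 4: fetch the results; this is the call that returns the 500."""
    return Request(f"{BASE_URL}/_plugins/_async_query/{query_id}", method="GET")


def build_result_index_search_request() -> Request:
    """Step 5: read the stored results directly from the result index."""
    return Request(
        f"{BASE_URL}/query_execution_result_{DATASOURCE}/_search", method="GET"
    )
```

Each request can be sent with `urllib.request.urlopen`. The query ID passed to `build_result_request` would come from the response to the first call.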
    

What is the expected behavior?
The results should be returned without an error. The field type in the example is date, so the value 2024-06-16 should be parsed as a date rather than as a timestamp.
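One way to picture the expected behavior is a reader that picks the parser from the stored schema. The sketch below is illustrative Python and not the plugin's implementation: `CONVERTERS` and `decode_rows` are hypothetical names, and `ast.literal_eval` is used only because the sample entries happen to be valid Python literals. Real entries containing values such as `null` would need a proper parser.

```python
import ast
from datetime import date, datetime

# Abbreviated copy of the "_source" document shown in this report.
stored_source = {
    "result": ["{'c1':'2024-06-16'}", "{'c1':'2024-06-16'}"],
    "schema": ["{'column_name':'c1','data_type':'date'}"],
}

# Hypothetical mapping from declared data_type to a parser for that type.
CONVERTERS = {
    "date": date.fromisoformat,
    "timestamp": lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S"),
}


def decode_rows(source: dict) -> list:
    """Convert each stored row using the data_type declared in the schema."""
    columns = [ast.literal_eval(entry) for entry in source["schema"]]
    types = {col["column_name"]: col["data_type"] for col in columns}
    rows = []
    for entry in source["result"]:
        raw = ast.literal_eval(entry)
        row = {}
        for name, value in raw.items():
            # Fall back to the raw value for types this sketch does not cover.
            convert = CONVERTERS.get(types.get(name), lambda v: v)
            row[name] = convert(value)
        rows.append(row)
    return rows


print(decode_rows(stored_source)[0])  # {'c1': datetime.date(2024, 6, 16)}
```

With the type taken from the schema, the date value never reaches the timestamp parser, so the failure described above does not occur.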

What is your host/environment?

  • integ-test docker cluster from this repository

Do you have any screenshots?
No

Do you have any additional context?
No

normanj-bitquill added the bug and untriaged labels on Feb 4, 2025