Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Adding support for null values in Elastic-like bulk requests #2363

Closed
5 tasks done
Nick-S-2018 opened this issue Jul 2, 2024 · 10 comments
Closed
5 tasks done

Adding support for null values in Elastic-like bulk requests #2363

Nick-S-2018 opened this issue Jul 2, 2024 · 10 comments
Assignees

Comments

@Nick-S-2018
Copy link
Collaborator

Nick-S-2018 commented Jul 2, 2024

Bug Description:

As of now, Manticore is unable to process fields with null values that can be passed in _bulk requests. E.g.:

curl -sX POST http://localhost:9308/_bulk 
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "test1" } }
{ "title" : "123"}
'
{"items":[{"index":{"_index":"test1","_type":"doc","_id":"0","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}

curl localhost:9308/_bulk -H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "test2" } }
{ "title" : null}
'
{"error":"unsupported value type 'null' in field 'title'"}

We need to add support for them since it's necessary for integration with different Elastic tools, like Logstash.

Manticore Search Version:

Manticore 6.3.3 1de3cb364@24062617 dev

Operating System Version:

Ununtu 22.04 LTS

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@tomatolog
Copy link
Contributor

fixed at 8193943 now HTTP JSON POST requests that populate data accepts null as full-text or string value

@zhangsanhuo
Copy link

fixed at 8193943 now HTTP JSON POST requests that populate data accepts null as full-text or string value

Could you please add support for manticore fields with bigint attributes and timestamp with Null value?

@tomatolog
Copy link
Contributor

could be better to write your suggestions there along with requests examples #2312

@sanikolaev
Copy link
Collaborator

support for manticore fields with bigint attributes

Manticore supports bigint attributes - https://manual.manticoresearch.com/Creating_a_table/Data_types#Big-Integer

Is there any problem with it?

timestamp with Null value

The related issue seems to be #2236

@zhangsanhuo
Copy link

zhangsanhuo commented Jul 3, 2024

could be better to write your suggestions there along with requests examples #2312

Yes, my test data is as follows, create a RT table:

drop table t1;CREATE TABLE t1 (
id bigint,
up_date text,
name text,
@version text,
@timestamp timestamp,
unix_ts_in_secs integer,
number bigint,
pub_date timestamp
) charset_table='non_cjk,cjk' morphology='icu_chinese,stem_en';

logstash synchronization file:

input {
  jdbc {
    type => "t1"
    jdbc_driver_library => "/root/mysql-connector-j-8.0.32/mysql-connector-j-8.0.32.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_paging_enabled => true
    jdbc_page_size => "10"
    tracking_column => "unix_ts_in_secs"
    #record_last_run => "true"
    record_last_run => "false"
    last_run_metadata_path => "/root/logstash_jdbc_last_run"
    clean_run => "false"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "SELECT *, UNIX_TIMESTAMP(up_date) AS unix_ts_in_secs FROM t1 WHERE (UNIX_TIMESTAMP(up_date) > :sql_last_value AND up_date < NOW()) ORDER BY up_date ASC"
  }
}

output {
  stdout { codec =>  "rubydebug"}
    elasticsearch {
      index => "t1"
      hosts => ["http://localhost:9308"]
    }
}

MySQL table structure and data:

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT '',
  `number` varchar(11) DEFAULT NULL,
  `pub_date` datetime DEFAULT current_timestamp(),
  `up_date` datetime DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT INTO `test`.`t1` (`id`, `name`, `number`, `pub_date`, `up_date`) VALUES (1, 'kevin1', '1000001', NULL, '2024-07-03 16:47:48');
INSERT INTO `test`.`t1` (`id`, `name`, `number`, `pub_date`, `up_date`) VALUES (2, 'kevin2', NULL, '2024-07-03 16:51:11', '2024-07-03 16:53:18');

I created two examples, one with number(bigint in RT table) as Null value and the other with pub_date(timestamp in RT table) as Null value.

When using logstash to synchronize data to manticore, an error is reported:

[2024-07-03T08:47:50,171][ERROR][logstash.outputs.elasticsearch][main][30a15ecf5e6cbe42f11b334f8da41fd6cfa5267d46f7b11973efecaf11a13318] Encountered a retryable error (will retry with exponential backoff) {:code=>409, :url=>"http://localhost:9308/_bulk", :content_length=>233}
[2024-07-03T08:53:34,394][ERROR][logstash.outputs.elasticsearch][main][30a15ecf5e6cbe42f11b334f8da41fd6cfa5267d46f7b11973efecaf11a13318] Encountered a retryable error (will retry with exponential backoff) {:code=>409, :url=>"http://localhost:9308/_bulk", :content_length=>250}
Data synchronization cannot be performed.

@zhangsanhuo
Copy link

Because in MySQL, the content of a datetime type field cannot be set to an empty string, but it can be set to Null. Therefore, when I try to synchronize the datetime field (part of the content is Null) in the MySQL database to manticore through logstash, an error occurs and the data cannot be synchronized.

@sanikolaev
Copy link
Collaborator

@zhangsanhuo did you close this issue as a duplicate of #2236?

@zhangsanhuo
Copy link

@zhangsanhuo did you close this issue as a duplicate of #2236?

I'm sorry, I don't understand what this means. I don't know much about Githup. Could you explain what it means? Thank you for your patience.

@sanikolaev
Copy link
Collaborator

Nevermind, @zhangsanhuo. My apologies. I thought you had closed this issue and was wondering why, since there seem to be unresolved matters. However, I now see it was initially created for a different topic and was closed after the initial request was addressed. If there's another known problem left can you please create a separate issue about it, so we don't get confused about different topics in the same issue?

@sanikolaev sanikolaev added the rel::upcoming Upcoming release label Jul 5, 2024
@zhangsanhuo
Copy link

Nevermind, @zhangsanhuo. My apologies. I thought you had closed this issue and was wondering why, since there seem to be unresolved matters. However, I now see it was initially created for a different topic and was closed after the initial request was addressed. If there's another known problem left can you please create a separate issue about it, so we don't get confused about different topics in the same issue?

Thanks for your help, I created a new issue of #2380

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

4 participants