Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Use query stream and get some broken symbols #171

Closed
AlekseyVerba opened this issue Jul 5, 2023 · 10 comments · Fixed by #204
Closed

Use query stream and get some broken symbols #171

AlekseyVerba opened this issue Jul 5, 2023 · 10 comments · Fixed by #204
Labels
bug Something isn't working

Comments

@AlekseyVerba
Copy link

Describe the bug

I use query stream and get broken symbols. In database data is normal, without any defects. When i get data from stream, i get something strange

i get these rows:
"["ᴺᵁᴸᴸ", "45049", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "Сайт", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺ���ᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "256", "0", "spb.evraz.market", "Каталог товаров", "Пользователь", "\/metalloprokat\/", "0.256", "Время ответа сайта", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ"]"

"["A89648DF371528B911EB7CC47A32031D", "45027", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "Проведе��ие заказа", "A8D548DF371528B911ED9BDB5E11ABCD", "ᴺᵁᴸᴸ", "ИН00-047642", "ИН00-040371 от 21.02.2022 16:43:08", "ᴺᵁᴸᴸ", "A8B948DF371528B911EC7D1294FB667E", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "1925", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "3.154"]"

i tried to use another library and i didn't get these mistakes

Expected behaviour

"["ᴺᵁᴸᴸ", "45049", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "Сайт", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "256", "0", "spb.evraz.market", "Каталог товаров", "Пользователь", "\/metalloprokat\/", "0.256", "Время ответа сайта", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ"]"

"["A89648DF371528B911EB7CC47A32031D", "45027", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "Проведение заказа", "A8D548DF371528B911ED9BDB5E11ABCD", "ᴺᵁᴸᴸ", "ИН00-047642", "ИН00-040371 от 21.02.2022 16:43:08", "ᴺᵁᴸᴸ", "A8B948DF371528B911EC7D1294FB667E", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "1925", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "ᴺᵁᴸᴸ", "3.154"]"

Code example

`
const ch_query = await ch.query({
query: queryText,
format: 'JSONCompactStringsEachRow',
});

    const ch_stream = await ch_query.stream();

    const transformStream = new Transform({
        transform(data, encoding, callback) {
            data.forEach((element) => {
                this.push(element.json());
            });
            callback();
        },
        objectMode: true,
    });
    
    transformStream.on('data', data => {
        appendFileSync('text.txt', JSON.stringify(data) + '\n');
    })

`

Configuration

Environment

  • Client version: 0.0.16
  • Language version: v16.18.1
  • OS: Ubuntu 22.04

ClickHouse server

  • ClickHouse Server version: 23
@AlekseyVerba AlekseyVerba added the bug Something isn't working label Jul 5, 2023
@slvrtrn
Copy link
Contributor

slvrtrn commented Jul 5, 2023

Can you please provide table DDL + sample dataset (insert statements for the affected rows)?

@slvrtrn
Copy link
Contributor

slvrtrn commented Jul 5, 2023

Additionally, at a glance, Transform might be unnecessary here.

const ch_query = await ch.query({
  query: queryText,
  format: 'JSONCompactStringsEachRow',
})

ch_query.stream().on('data', (rows: Row[]) => {
  let content = ''
  rows.forEach((row) => {
    content += row.text + os.EOL
  })
  fs.appendFileSync('text.txt', content)
})

@slvrtrn slvrtrn added question Further information is requested and removed bug Something isn't working labels Jul 5, 2023
@AlekseyVerba
Copy link
Author

AlekseyVerba commented Jul 6, 2023

Кроме того, на первый взгляд, Transform здесь может быть не нужен.

const ch_query = await ch.query({
  query: queryText,
  format: 'JSONCompactStringsEachRow',
})

ch_query.stream().on('data', (rows: Row[]) => {
  let content = ''
  rows.forEach((row) => {
    content += row.text + os.EOL
  })
  fs.appendFileSync('text.txt', content)
})

yes, i agree. but i need to pass readable stream to another function which accept stream where passing row by row
i provided an example)

@AlekseyVerba
Copy link
Author

Can you please provide table DDL + sample dataset (insert statements for the affected rows)?

CREATE TABLE IF NOT EXISTS "Факт" ("Документ.Ссылка" Nullable(String) , "Дата" Nullable(String) , "СП.Ссылка" Nullable(String) , "Дата_и_время" Nullable(String) , "Время_выполнения" Nullable(Float32) , "Документ" Nullable(String) , "Операция" Nullable(String) , "Тип_документа" Nullable(String) , "Сущность" Nullable(String) , "ИсходящееПисьмо.Ссылка" Nullable(String) , "Время_отправления_счета" Nullable(String) , "Номер_документа" Nullable(String) , "Номер_документа.Детализированный" Nullable(String) , "Пользователь" Nullable(String) , "РТиУ.Ссылка" Nullable(String) , "Номер_РТиУ" Nullable(String) , "Пользователь_ДТ" Nullable(String) , "Время_ответа_сервера" Nullable(String) , "Время_подключения_к_серверу" Nullable(String) , "Домен_сайта_(регион)" Nullable(String) , "Раздел_сайта" Nullable(String) , "Оператор/Пользователь" Nullable(String) , "Адрес_страницы" Nullable(String) , "Время_ответа_сервера_(сек)" Nullable(Float32) , "Сущность_сайта" Nullable(String) , "Время_отрисовки_страницы" Nullable(String) , "Время_отрисовки_страницы_(сек)" Nullable(Float32) ) ENGINE = MergeTree ORDER BY tuple()

[
"('A89F48DF371528B911EC0A42EFAEC1F8', '44442', 'A5220025906A947311E3A36F1F074EE0', '44442.64730324074', 1.624, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null)",

"('A89F48DF371528B911EC0A42EFAEC1F8', '44446', 'A5220025906A947311E3A36F1F074EE0', '44446.73451388889', 2.079, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null)",

"('A89F48DF371528B911EC0A42EFAEC1F8', '44441', 'A5220025906A947311E3A36F1F074EE0', '44441.4734375', 17.915, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null)",
]

@slvrtrn
Copy link
Contributor

slvrtrn commented Jul 6, 2023

@AlekseyVerba

CREATE TABLE IF NOT EXISTS "Факт" ("Документ.Ссылка" Nullable(String) , "Дата" Nullable(String) , "СП.Ссылка" Nullable(String) , "Дата_и_время" Nullable(String) , "Время_выполнения" Nullable(Float32) , "Документ" Nullable(String) , "Операция" Nullable(String) , "Тип_документа" Nullable(String) , "Сущность" Nullable(String) , "ИсходящееПисьмо.Ссылка" Nullable(String) , "Время_отправления_счета" Nullable(String) , "Номер_документа" Nullable(String) , "Номер_документа.Детализированный" Nullable(String) , "Пользователь" Nullable(String) , "РТиУ.Ссылка" Nullable(String) , "Номер_РТиУ" Nullable(String) , "Пользователь_ДТ" Nullable(String) , "Время_ответа_сервера" Nullable(String) , "Время_подключения_к_серверу" Nullable(String) , "Домен_сайта_(регион)" Nullable(String) , "Раздел_сайта" Nullable(String) , "Оператор/Пользователь" Nullable(String) , "Адрес_страницы" Nullable(String) , "Время_ответа_сервера_(сек)" Nullable(Float32) , "Сущность_сайта" Nullable(String) , "Время_отрисовки_страницы" Nullable(String) , "Время_отрисовки_страницы_(сек)" Nullable(Float32) ) ENGINE = MergeTree ORDER BY tuple()

INSERT INTO "Факт" VALUES ('A89F48DF371528B911EC0A42EFAEC1F8', '44442', 'A5220025906A947311E3A36F1F074EE0', '44442.64730324074', 1.624, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null),
                          ('A89F48DF371528B911EC0A42EFAEC1F8', '44446', 'A5220025906A947311E3A36F1F074EE0', '44446.73451388889', 2.079, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null),
                          ('A89F48DF371528B911EC0A42EFAEC1F8', '44441', 'A5220025906A947311E3A36F1F074EE0', '44441.4734375', 17.915, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null);
    const ch = createClient()
    const ch_query = await ch.query({
      query: 'SELECT * FROM "Факт"',
      format: 'JSONCompactStringsEachRow',
    })
    const ch_stream = await ch_query.stream()

    const transformStream = new Transform({
      transform(data, encoding, callback) {
        data.forEach((element: any) => {
          this.push(element.json())
        })
        callback()
      },
      objectMode: true,
    })

    Stream.pipeline(ch_stream, transformStream, () => {}).on('data', (data) => {
      require('node:fs').appendFileSync('text.txt', JSON.stringify(data) + '\n')
    })

produces

["A89F48DF371528B911EC0A42EFAEC1F8","44442","A5220025906A947311E3A36F1F074EE0","44442.64730324074","1.6240001","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]
["A89F48DF371528B911EC0A42EFAEC1F8","44446","A5220025906A947311E3A36F1F074EE0","44446.73451388889","2.079","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]
["A89F48DF371528B911EC0A42EFAEC1F8","44441","A5220025906A947311E3A36F1F074EE0","44441.4734375","17.915","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]

as text.txt contents on Linux (no broken symbols).

OS: Fedora 38
Node.js: 16.17.0

@AlekseyVerba
Copy link
Author

@AlekseyVerba

CREATE TABLE IF NOT EXISTS "Факт" ("Документ.Ссылка" Nullable(String) , "Дата" Nullable(String) , "СП.Ссылка" Nullable(String) , "Дата_и_время" Nullable(String) , "Время_выполнения" Nullable(Float32) , "Документ" Nullable(String) , "Операция" Nullable(String) , "Тип_документа" Nullable(String) , "Сущность" Nullable(String) , "ИсходящееПисьмо.Ссылка" Nullable(String) , "Время_отправления_счета" Nullable(String) , "Номер_документа" Nullable(String) , "Номер_документа.Детализированный" Nullable(String) , "Пользователь" Nullable(String) , "РТиУ.Ссылка" Nullable(String) , "Номер_РТиУ" Nullable(String) , "Пользователь_ДТ" Nullable(String) , "Время_ответа_сервера" Nullable(String) , "Время_подключения_к_серверу" Nullable(String) , "Домен_сайта_(регион)" Nullable(String) , "Раздел_сайта" Nullable(String) , "Оператор/Пользователь" Nullable(String) , "Адрес_страницы" Nullable(String) , "Время_ответа_сервера_(сек)" Nullable(Float32) , "Сущность_сайта" Nullable(String) , "Время_отрисовки_страницы" Nullable(String) , "Время_отрисовки_страницы_(сек)" Nullable(Float32) ) ENGINE = MergeTree ORDER BY tuple()

INSERT INTO "Факт" VALUES ('A89F48DF371528B911EC0A42EFAEC1F8', '44442', 'A5220025906A947311E3A36F1F074EE0', '44442.64730324074', 1.624, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null),
                          ('A89F48DF371528B911EC0A42EFAEC1F8', '44446', 'A5220025906A947311E3A36F1F074EE0', '44446.73451388889', 2.079, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null),
                          ('A89F48DF371528B911EC0A42EFAEC1F8', '44441', 'A5220025906A947311E3A36F1F074EE0', '44441.4734375', 17.915, 'Заказ клиента ИН00-230541 от 31.08.2021 17:05:19', 'время проведения заказа', 'Заказ клиента', 'Проведение заказа', null, null, 'ИН00-230541', 'ИН00-230541 от 31.08.2021 17:05:19', null, null, null, null, null, null, null, null, null, null, null, null, null, null);
    const ch = createClient()
    const ch_query = await ch.query({
      query: 'SELECT * FROM "Факт"',
      format: 'JSONCompactStringsEachRow',
    })
    const ch_stream = await ch_query.stream()

    const transformStream = new Transform({
      transform(data, encoding, callback) {
        data.forEach((element: any) => {
          this.push(element.json())
        })
        callback()
      },
      objectMode: true,
    })

    Stream.pipeline(ch_stream, transformStream, () => {}).on('data', (data) => {
      require('node:fs').appendFileSync('text.txt', JSON.stringify(data) + '\n')
    })

produces

["A89F48DF371528B911EC0A42EFAEC1F8","44442","A5220025906A947311E3A36F1F074EE0","44442.64730324074","1.6240001","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]
["A89F48DF371528B911EC0A42EFAEC1F8","44446","A5220025906A947311E3A36F1F074EE0","44446.73451388889","2.079","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]
["A89F48DF371528B911EC0A42EFAEC1F8","44441","A5220025906A947311E3A36F1F074EE0","44441.4734375","17.915","Заказ клиента ИН00-230541 от 31.08.2021 17:05:19","время проведения заказа","Заказ клиента","Проведение заказа","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ИН00-230541","ИН00-230541 от 31.08.2021 17:05:19","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ","ᴺᵁᴸᴸ"]

as text.txt contents on Linux (no broken symbols).

OS: Fedora 38 Node.js: 16.17.0

it happens by chance. i get 13.000.000 rows.
for example:
for the first time i get row with id 1 good data, another time i get bad row

@slvrtrn
Copy link
Contributor

slvrtrn commented Jul 7, 2023

@AlekseyVerba, can you please contact me in the community Slack? I think we will need the entire dataset to debug this then.

@AlekseyVerba
Copy link
Author

@AlekseyVerba, can you please contact me in the community Slack? I think we will need the entire dataset to debug this then.

How can i get in touch with you there?

@slvrtrn
Copy link
Contributor

slvrtrn commented Oct 9, 2023

@AlekseyVerba, join the community Slack (link) and find me (Serge Klochkov) there.

@slvrtrn slvrtrn added bug Something isn't working and removed question Further information is requested labels Oct 16, 2023
@slvrtrn
Copy link
Contributor

slvrtrn commented Oct 17, 2023

@AlekseyVerba, fixed in 0.2.4.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants