Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

DM failed to start an incremental task #2210

Open
together-wang opened this issue Oct 12, 2021 · 3 comments
Open

DM failed to start an incremental task #2210

together-wang opened this issue Oct 12, 2021 · 3 comments

Comments

@together-wang
Copy link

together-wang commented Oct 12, 2021

What did you do?

The incremental task gets stuck when it starts, and then fails over time., but it is normal to start full tasks.
Configure incremental tasks and start tasks.

[tidb@node127 dm]$ cat incr.yaml 
---
name: incr
task-mode: incremental #all   # all 即先进行全量同步,全量同步完成后自动进行增量数据同步
target-database:
  host: "172.16.5.215"
  port: 4200
  user: "root"
  password: ""

mysql-instances:       
  -
    source-id: "mysql-replica-01"
    meta:
      binlog-name: mysql-bin.000028
      binlog-pos: 1365415
      binlog-gtid: "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-70819"
    block-allow-list:  "instance"
    loader-thread: 16
    syncer-thread: 16000

block-allow-list:
  instance:
    do-dbs: ["test2"]
[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

What did you expect to see?

Incremental task started successfully

What did you see instead?

[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

Versions of the cluster

[tidb@node127 dm]$ tiup dm display dm1
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.6.0/tiup-dm display dm1
Cluster type:       dm
Cluster name:       dm1
Cluster version:    v2.0.7
Deploy user:        tidb
SSH type:           builtin
ID                 Role          Host          Ports      OS/Arch       Status     Data Dir                          Deploy Dir
--                 ----          ----          -----      -------       ------     --------                          ----------
172.16.5.162:9093  alertmanager  172.16.5.162  9093/9094  linux/x86_64  Up         /home/dm1-data/alertmanager-9093  /home/dm1-deploy/alertmanager-9093
172.16.5.162:8561  dm-master     172.16.5.162  8561/8591  linux/x86_64  Healthy|L  /home/dm1-data/dm-master-8561     /home/dm1-deploy/dm-master-8561
172.16.5.162:8262  dm-worker     172.16.5.162  8262       linux/x86_64  Bound      /home/dm1-data/dm-worker-8262     /home/dm1-deploy/dm-worker-8262
172.16.5.162:3330  grafana       172.16.5.162  3330       linux/x86_64  Up         -                                 /home/dm1-deploy/grafana-3330
172.16.5.162:9390  prometheus    172.16.5.162  9390       linux/x86_64  Up         /home/dm1-data/prometheus-9390    /home/dm1-deploy/prometheus-9390
Total nodes: 5

current status of DM cluster (execute query-status in dmctl)

[tidb@node127 dm]$ tiup dmctl  --master-addr 172.16.5.162:8561 query-status full1
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 query-status full1
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "dm-172.16.5.162-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "full1",
                    "stage": "Finished",
                    "unit": "Load",
                    "result": {
                        "isCanceled": false,
                        "errors": [
                        ],
                        "detail": null
                    },
                    "unresolvedDDLLockID": "",
                    "load": {
                        "finishedBytes": "79",
                        "totalBytes": "79",
                        "progress": "100.00 %",
                        "metaBinlog": "(mysql-bin.000028, 1366178)",
                        "metaBinlogGTID": "6ddb3c98-5637-11eb-baa7-1adaa2de7319:1-70823"
                    }
                }
            ]
        }
    ]
}

[tidb@node127 dm]$ time tiup dmctl  --master-addr 172.16.5.162:8561 query-status incr
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 query-status incr
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38008:class=dm-master:scope=internal:level=high], Message: grpc request error, RawCause: rpc error: code = DeadlineExceeded desc = context deadline exceeded",
            "sourceStatus": {
                "source": "mysql-replica-01",
                "worker": "",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
            ]
        }
    ]
}

real    0m30.181s
user    0m0.161s
sys     0m0.085s
@lance6716
Copy link
Collaborator

don't know why schema tracker is stuck. could you provide a goroutine stack of DM worker?

@lance6716 lance6716 added the type/bug This issue is a bug report label Oct 12, 2021
@together-wang
Copy link
Author

归档.zip
dm-worker and dm-master profile

[tidb@node127 dm]$ time  tiup dmctl  --master-addr 172.16.5.162:8561 start-task ./incr.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 172.16.5.162:8561 start-task ./incr.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": false,
            "msg": "[code=38032:class=dm-master:scope=internal:level=high], Message: some error occurs in dm-worker: ErrCode:44012 ErrClass:\"schema-tracker\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"fail to initial unit Sync of subtask incr1 : failed to create schema tracker\" RawCause:\"context deadline exceeded\" , Workaround: Please execute `query-status` to check status.",
            "source": "mysql-replica-01",
            "worker": "dm-172.16.5.162-8262"
        }
    ]
}

real    2m10.356s
user    0m0.185s
sys     0m0.082s

@lichunzhu
Copy link
Contributor

Could you please help grab a dm goroutine text using curl http://127.0.0.1:<worker-port>/debug/pprof/goroutine\?debug\=2?

@lance6716 lance6716 removed the type/bug This issue is a bug report label Oct 14, 2021
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants