[master] orchagent exits in a multi-asic linecard in Chassis #11064

Closed
judyjoseph opened this issue Jun 7, 2022 · 5 comments
Labels
Chassis 🤖 Modular chassis support Issue for 202205 P0 Priority of the issue

Comments

@judyjoseph
Contributor

Description

When a multi-asic linecard boots in a chassis, the following errors are logged and orchagent exits:

Jun  3 19:20:24.075068 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.075068 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.075076 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.075295 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.075295 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c0 member 0: 0
Jun  3 19:20:24.075793 str2--lc1-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0
Jun  3 19:20:24.076073 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076083 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076083 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c3 member 0: 0
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.076230 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c3 member 0: 0
Jun  3 19:20:24.076707 str2--lc1-1 NOTICE swss0#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0
Jun  3 19:20:24.076971 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076971 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2
Jun  3 19:20:24.076984 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun  3 19:20:24.077132 str2--lc1-1 ERR swss0#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun  3 19:20:24.077132 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 50000000011c6 member 0: 0
Jun  3 19:20:24.077434 str2--lc1-1 ERR syncd0#syncd: [06:00.0] SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group:3028 Unable to reserve ECMP fec block failed with error -4.
Jun  3 19:20:24.077434 str2--lc1-1 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.077487 str2--lc1-1 ERR syncd0#syncd: :- processQuadEvent: attr: SAI_NEXT_HOP_GROUP_ATTR_TYPE: SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_UNORDERED_ECMP
Jun  3 19:20:24.077576 str2--lc1-1 ERR swss0#orchagent: :- create: create status: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.077589 str2--lc1-1 ERR swss0#orchagent: :- addNextHopGroup: Failed to create next hop group 10.0.0.7@Ethernet-IB0,10.0.0.11@Ethernet-IB0, rv:-4
Jun  3 19:20:24.077589 str2--lc1-1 ERR swss0#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_NEXT_HOP_GROUP, status: SAI_STATUS_INSUFFICIENT_RESOURCES
Jun  3 19:20:24.425794 str2--lc1-1 INFO swss1#supervisord 2022-06-03 19:20:24,425 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Jun  3 19:20:24.463014 str2--lc1-1 INFO swss0#supervisord 2022-06-03 19:20:24,462 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
Jun  3 19:20:24.998927 str2--lc1-1 NOTICE coredump_gen_handler.py[14245]: Another instance of techsupport running, aborting this. stderr: Accquiring lock failed, PID 14436 is active
Jun  3 19:20:25.431147 str2--lc1-1 INFO swss1#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
Jun  3 19:20:25.431662 str2--lc1-1 INFO swss1#supervisord 2022-06-03 19:20:25,431 WARN received SIGTERM indicating exit request

Steps to reproduce the issue:

  1. Use a master image built after May 28th.
  2. Boot the multi-asic linecard.

Describe the results you received:

The errors above, followed by an orchagent (OA) exit.

Describe the results you expected:

Orchagent should not exit.
These errors are also not expected: "SAI_API_NEXT_HOP_GROUP:brcm_sai_dnx_create_next_hop_group_member:3300 Next-hop member weight should be '1'. passed value:2"

Output of show version:

Build based on master/29043ff026a815e1fea338759ff05491c48e2f03

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@judyjoseph judyjoseph added the Chassis 🤖 Modular chassis support label Jun 7, 2022
@mlok-nokia
Contributor

mlok-nokia commented Jun 9, 2022

The commit below upgraded FRR, which introduced a new "weight" attribute with value 2 in the ROUTE_TABLE entry, but syncd/BCM expects the weight to be 1. Before the upgrade, the ROUTE_TABLE entry had no "weight" attribute.

commit a477dbb
Author: Hasan Naqvi <56742004+hasan-brcm@users.noreply.github.com>
Date: Tue May 24 14:47:09 2022 -0700

    Frr 8.2 upgrade (#10691)

  "ROUTE_TABLE:192.170.96.0/25": {
    "expireat": 1654787366.084417,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "ifname": "PortChannel101,Ethernet6",
      "nexthop": "10.0.0.13,10.0.0.17",
      "weight": "2,2"
    }
  },
  "ROUTE_TABLE:192.170.96.128/25": {
    "expireat": 1654787366.082857,
    "ttl": -0.001,
    "type": "hash",
    "value": {
      "ifname": "PortChannel101,Ethernet6",
      "nexthop": "10.0.0.13,10.0.0.17",
      "weight": "2,2"
    }
  },
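The mismatch can be checked directly against the comma-separated "weight" field shown in these ROUTE_TABLE entries. The following is a minimal sketch, not code from sonic-swss: `parseWeights` and `allWeightsAreOne` are hypothetical helper names, and the "every member weight must be 1" rule is taken only from the syncd error message in this issue.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated APPL_DB "weight" field
// such as "2,2" into integers.
std::vector<int> parseWeights(const std::string& field)
{
    std::vector<int> weights;
    std::stringstream ss(field);
    std::string token;
    while (std::getline(ss, token, ','))
    {
        if (!token.empty())
        {
            weights.push_back(std::stoi(token));
        }
    }
    return weights;
}

// Hypothetical check mirroring the syncd error text
// "Next-hop member weight should be '1'".
// An empty field (no "weight" attribute at all) passes trivially.
bool allWeightsAreOne(const std::string& field)
{
    for (int w : parseWeights(field))
    {
        if (w != 1)
        {
            return false;
        }
    }
    return true;
}
```

With the entries above, `allWeightsAreOne("2,2")` is false, while an entry carrying `"1,1"` or no weight field at all would pass.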

@rlhui rlhui added the P0 Priority of the issue label Jun 9, 2022
@prsunny
Contributor

prsunny commented Jun 9, 2022

Tracking with Broadcom

@hasan-brcm
Contributor

hasan-brcm commented Jun 9, 2022

I see the default weight in frr is 1:

root@sonic:/home/admin# vtysh -c"show ip route 10.10.10.10/32"
Routing entry for 10.10.10.10/32
  Known via "bgp", distance 20, metric 0, best
  Last update 00:03:14 ago
  * 30.0.0.2, via Ethernet4, weight 1
  * 30.1.0.2, via Ethernet12, weight 1

But in APPL_DB it is reflected as 2:

root@sonic:/home/admin# sonic-db-cli APPL_DB "hgetall ROUTE_TABLE:10.10.10.10"
nexthop
30.0.0.2,30.1.0.2
ifname
Ethernet4,Ethernet12
weight
2,2
root@sonic:/home/admin#

The issue seems to be caused by the code below from PR1853, at routesync.cpp L#1211:

    uint8_t weight = rtnl_route_nh_get_weight(nexthop);
    if (weight)
    {
        result += to_string(weight + 1);
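To make the effect of the `weight + 1` concrete, here is a small standalone sketch that applies the same arithmetic to a list of per-nexthop values. It is an illustration only: `buildWeightField` is a hypothetical name, the input vector stands in for the values returned by `rtnl_route_nh_get_weight()`, and the assumption that those values are 1 for the two nexthops is inferred from the vtysh and APPL_DB outputs above, not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the loop around the quoted snippet.
// nlWeights models the per-nexthop values that
// rtnl_route_nh_get_weight() would return (an assumption).
std::string buildWeightField(const std::vector<uint8_t>& nlWeights)
{
    std::string result;
    for (std::size_t i = 0; i < nlWeights.size(); ++i)
    {
        if (i > 0)
        {
            result += ",";
        }
        uint8_t weight = nlWeights[i];
        if (weight)
        {
            // Same arithmetic as the quoted code: the value written
            // to APPL_DB is one higher than the value read.
            result += std::to_string(weight + 1);
        }
    }
    return result;
}
```

Under that assumption, `buildWeightField({1, 1})` yields `"2,2"`, which matches the APPL_DB output above and the `passed value:2` in the syncd errors.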

@prsunny
Contributor

prsunny commented Jun 9, 2022

Fix - sonic-net/sonic-swss#2320

@lguohan
Collaborator

lguohan commented Jun 10, 2022

fix in #11094

@lguohan lguohan closed this as completed Jun 10, 2022