[sflow] system crashed once sflow is enabled and switch has 200G+ interfaces #6793

Open
Hedgehog-Guru opened this issue Feb 16, 2021 · 8 comments
Labels
Triaged: this issue has been triaged

Comments

@Hedgehog-Guru

Description

If the switch has interfaces running at 200G or above, the system crashes after sflow is enabled.

Steps to reproduce the issue:

  1. Enable the sflow feature
config feature state sflow enabled
  2. Make sure at least one interface is oper-up and has a speed of 200G or above
config interface speed Ethernet24 200000
show interfaces status Ethernet24
  Interface        Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  -----------  -------  -----  -----  -------  ------  ------  -------  ---------------  ----------
 Ethernet24  24,25,26,27     200G   9100    N/A     etp7  routed    down       up  QSFP28 or later         N/A
  3. Enable sflow
config sflow enable
  4. Check system health, for example with "pgrep orchagent"
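
The health check in the last step can be scripted. The following is a minimal Python sketch, assuming `pgrep` is available on the switch and that a missing `orchagent` process is the symptom being checked. The helper name `process_running` and its injectable `run` parameter are illustrative and not part of SONiC.

```python
import subprocess


def process_running(name, run=subprocess.run):
    """Return True if pgrep finds at least one process matching `name`."""
    # pgrep exits with 0 when at least one process matches, non-zero otherwise.
    result = run(
        ["pgrep", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Calling `process_running("orchagent")` before and after enabling sflow would show whether the process disappeared in between.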

Describe the results you received:

System crashed

Describe the results you expected:

Stable run

Output of show version:

SONiC Software Version: SONiC.SONIC.202012.10-d26a4af_Internal
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: d26a4aff
Build date: Thu Feb  4 15:28:36 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02

Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1852X03965
Uptime: 17:46:35 up 12 min,  1 user,  load average: 0.07, 0.84, 0.72
[sonic_dump_qa-anconda-test10_20210216_173758.tar.gz](https://github.com/Azure/sonic-buildimage/files/5989890/sonic_dump_qa-anconda-test10_20210216_173758.tar.gz)

Additional information you deem important (e.g. issue happens only occasionally):

sonic_dump_qa-anconda-test10_20210216_173758.tar.gz

@prsunny
Contributor

prsunny commented Feb 16, 2021

@padmanarayana , @dgsudharsan , could you please take a look and suggest next steps?

@anshuv-mfst

Issue Triage 2/17: Dell team to provide input on the issue, thanks!

anshuv-mfst added the Triaged label on Feb 17, 2021
@liat-grozovik
Collaborator

@padmanarayana kind reminder

@padmanarayana
Contributor

@liat-grozovik: the dump is from an internal build. Nevertheless, it is very likely that 200G is failing because there is no 200G entry in either https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.cpp#L13 or https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.h#L14. We'll check and get back.
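
The failure mode suggested in this comment can be illustrated outside of sonic-swss. The Python sketch below is a model built on assumptions, not the actual `sflowmgr` code: `SPEED_RATE_MAP`, `default_rate`, and `parse_rate` are hypothetical names, and the single table entry is a placeholder. It shows how a lookup for a speed with no entry can yield an empty string that a downstream consumer then fails to parse, which matches the empty-string problem a later commit message in this thread describes.

```python
# Hypothetical speed (Mb/s) -> default sampling rate table.
# The entry is a placeholder; the real table lives in sflowmgr.cpp / sflowmgr.h.
SPEED_RATE_MAP = {"100000": "10000"}


def default_rate(speed):
    """Look up the default rate; a missing speed silently yields ""."""
    return SPEED_RATE_MAP.get(speed, "")


def parse_rate(value):
    """Stand-in for a consumer that expects a numeric string."""
    return int(value)  # raises ValueError when given ""
```

In this model, `default_rate("200000")` returns an empty string, and passing that to `parse_rate` raises `ValueError`.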

@GarrickHe
Contributor

@Hedgehog-Guru - We don't have a 200G interface. Can we provide a patch for you to build and re-test on your end?

Thanks,
Garrick

@liat-grozovik
Collaborator

liat-grozovik commented Mar 8, 2021 via email

@vadymhlushko-mlnx
Contributor

@GarrickHe kind reminder, are there any updates?

liat-grozovik pushed a commit to sonic-net/sonic-swss that referenced this issue Mar 29, 2021
- What I did
Added a 200G entry to the speed-rate map. Also handled the case in which an empty string is programmed into APP-DB, which leads to an Orchagent failure.
The default sampling rate for 200G is set to 20000.

- Why I did it
Fix for Issue: sonic-net/sonic-buildimage#6793

- How I verified it
Ran the sflow community test under sonic-mgmt.

Co-authored-by: Vivek Reddy Karri <vkarri@nvidia.com>
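
The two changes described in this commit message can be sketched as follows. This is a Python illustration under assumptions, not the sonic-swss patch: only the 200G value (20000) comes from the message above, while `SPEED_RATE_MAP`, `build_fields`, and the `sample_rate` field name are hypothetical.

```python
# Only the 200G entry is taken from the commit message; other speeds are omitted.
SPEED_RATE_MAP = {"200000": "20000"}


def build_fields(speed):
    """Build the fields to write for a port, skipping an empty rate."""
    fields = {}
    rate = SPEED_RATE_MAP.get(speed, "")
    if rate:
        # Write the rate only when the lookup produced a value, so an
        # empty string is never programmed downstream.
        fields["sample_rate"] = rate
    return fields
```

With this guard, a speed that is still missing from the table produces no rate field at all instead of an empty one.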
daall pushed a commit to sonic-net/sonic-swss that referenced this issue Apr 1, 2021
@vivekrnv
Contributor

This issue can be closed.

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this issue Oct 5, 2021