[Nvidia] Fix issue: watchdogutil command does not work #186

Closed
wants to merge 1 commit into from

Conversation

Junchao-Mellanox
Owner

@Junchao-Mellanox Junchao-Mellanox commented Aug 9, 2023

Why I did it

watchdogutil uses the platform API watchdog instance to control and query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure, because each CLI invocation creates a new watchdog instance, so the status cached in the previous instance is lost. Consider the following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog
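
The failure mode in the transcript above can be reproduced with a simplified sketch. This is not the actual WatchdogImplBase code: the class name `CachedWatchdog` and its method bodies are hypothetical, and the only thing carried over from the description is the idea of an `armed` flag stored on the instance.

```python
class CachedWatchdog:
    """Hypothetical stand-in for a watchdog that caches 'armed' on the object."""

    def __init__(self):
        # Every new instance starts with the cached flag cleared,
        # regardless of what the hardware watchdog is actually doing.
        self.armed = False

    def arm(self, seconds):
        self.armed = True
        return seconds

    def is_armed(self):
        return self.armed

    def disarm(self):
        # Refuses to disarm when the cached flag says "not armed".
        if not self.armed:
            return False
        self.armed = False
        return True


# Each CLI invocation runs in its own process and builds its own instance.
instance1 = CachedWatchdog()
instance1.arm(100)               # "watchdogutil arm -s 100"

instance2 = CachedWatchdog()     # "watchdogutil status"
status_seen_by_cli = instance2.is_armed()

instance3 = CachedWatchdog()     # "watchdogutil disarm"
disarm_result = instance3.disarm()
```

Here `status_seen_by_cli` and `disarm_result` are both `False` even though `instance1` was armed, which matches the "Unarmed" and "Failed to disarm Watchdog" output shown above.
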
Work item tracking
  • Microsoft ADO (number only):

How I did it

Use sysfs to query watchdog status
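
As a minimal sketch of the idea, the armed state can be read from the filesystem on every call instead of from an object member. The class name `SysfsWatchdogStatus`, the helper names, the default path `/sys/class/watchdog/watchdog0`, and the `state` attribute reading `active` are all assumptions for illustration; the path and attribute the actual change uses may differ.

```python
import os
import tempfile


def read_sysfs_attr(path, default=None):
    """Return the stripped contents of a sysfs attribute, or default if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default


class SysfsWatchdogStatus:
    """Hypothetical helper: keeps no cached 'armed' flag, only the sysfs path."""

    def __init__(self, sysfs_dir="/sys/class/watchdog/watchdog0"):  # assumed path
        self.sysfs_dir = sysfs_dir

    def is_armed(self):
        # Assumes a 'state' attribute that reads "active" while the watchdog runs.
        state = read_sysfs_attr(os.path.join(self.sysfs_dir, "state"))
        return state == "active"


def make_fake_sysfs(state):
    """Create a temporary directory that mimics the assumed sysfs layout."""
    fake_dir = tempfile.mkdtemp()
    with open(os.path.join(fake_dir, "state"), "w") as f:
        f.write(state + "\n")
    return fake_dir


if __name__ == "__main__":
    fake_dir = make_fake_sysfs("active")
    # Two independent instances, as two separate CLI invocations would create.
    print(SysfsWatchdogStatus(fake_dir).is_armed())
    print(SysfsWatchdogStatus(fake_dir).is_armed())
```

Because the status is derived from shared system state on each query, a freshly created instance reports the same answer as the instance that armed the watchdog.
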

How to verify it

Manual test
Unit test

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox Junchao-Mellanox changed the title from "Fix issue: watchdogutil command does not work Conflicts: platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py platform/mellanox/mlnx-platform-api/tests/test_watchdog.py Conflicts: platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py src/dhcpmon" to "[Nvidia] Fix issue: watchdogutil command does not work" on Aug 9, 2023
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py

Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	src/dhcpmon
@Junchao-Mellanox
Owner Author

Passed sonic_main sonic-net#395

Junchao-Mellanox pushed a commit that referenced this pull request Mar 18, 2024
…lly (sonic-net#18076)

#### Why I did it
src/sonic-gnmi
```
* d56712a - (HEAD -> master, origin/master, origin/HEAD) Update GNMI path schema (#197) (4 days ago) [ganglv]
* 758ec18 - Call flag.Parse() to parse global flags like -logtostderr (#198) (5 days ago) [Zain Budhwani]
* 736e3b4 - Add signal handler to stop gnmi server for when sigterm or sigquit is called (#189) (3 weeks ago) [Zain Budhwani]
* 5b59c57 - Fix sonic string in osversion/build (#190) (4 weeks ago) [Zain Budhwani]
* d8d15c7 - Enable unit tests and code coverage for telemetry.go (#186) (5 weeks ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog