[pull] master from Azure:master #1943

Open
wants to merge 4,267 commits into
base: master

Conversation

pull[bot]

@pull pull bot commented Dec 23, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

mssonicbld and others added 28 commits December 15, 2024 16:01
…atically (#21161)

#### Why I did it
src/sonic-utilities
```
* 200ef363 - (HEAD -> master, origin/master, origin/HEAD) Speed up route_check script (#3678) (32 hours ago) [Deepak Singhal]
* 7dc40ac3 - Fixed the issues with sonic-clear queuecounter for egress queue and voq (#3671) (2 days ago) [saksarav-nokia]
* 72ee4fc1 - [config db] Trim garbage charactor in "DEVICE_METADATA" of config db (#3345) (2 days ago) [wenyiz2021]
* b2ba0825 - [show][interfaces] Add proposal for show interface errors {port} (#3623) (3 days ago) [vdahiya12]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Due to excess whitespace, the table of contents does not render as links in github, leading to navigation issues.

This might cause some minor "pain" for other merge requests due to the formatting differences in the ToC, but it should really only take a second to correct. Better to fix it once and make everyone's life easier than just living with it.

Work item tracking
How I did it
Updated whitespace and made a couple of other very minor correctness changes to the Markdown. There are still other changes I'd like to make, but they are much lower priority and possibly not worth the disturbance.

How to verify it
View the corrected table of contents here:
https://github.com/bradh352/sonic-buildimage/blob/configuration.md/src/sonic-yang-models/doc/Configuration.md

Compared to original unrendered ToC here:
https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md

Which release branch to backport (provide reason below if selected)
Not needed

Tested branch (Please provide the tested image version)
master as of 20241206

Description for the changelog
YANG Configuration.md fix table of contents links
Why I did it
Unified BGP for VXLAN EVPN cannot be configured without specifying advertise-all-vni. The setting appears to have been introduced as part of PR #5142, and it is already honored as an option here:

sonic-buildimage/src/sonic-frr-mgmt-framework/templates/bgpd/bgpd.conf.db.addr_family.evpn.j2

Lines 1 to 3 in 8e0f1c6

 {% if 'advertise-all-vni' in af_val and af_val['advertise-all-vni'] == 'true' %} 
   advertise-all-vni 
 {% endif %} 
Work item tracking
How I did it
Added basic yang rule

How to verify it
Configure

"BGP_GLOBALS_AF": {
        "default|l2vpn_evpn": {
            "advertise-all-vni": "true"
        }
    }
and run config replace.
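
The condition in the template quoted above can be mirrored in plain Python to sanity-check a CONFIG_DB snippet before running config replace. This is only a minimal sketch: the helper name evpn_af_lines is hypothetical and not part of SONiC, and it reproduces just the single condition shown in the template.

```python
import json


def evpn_af_lines(af_val):
    """Mirror the template condition: emit the line only when the key
    is present and its value is the string 'true'."""
    lines = []
    if "advertise-all-vni" in af_val and af_val["advertise-all-vni"] == "true":
        lines.append("advertise-all-vni")
    return lines


# Same snippet as in the verification step above.
config = json.loads("""
{
    "BGP_GLOBALS_AF": {
        "default|l2vpn_evpn": {
            "advertise-all-vni": "true"
        }
    }
}
""")

af_val = config["BGP_GLOBALS_AF"]["default|l2vpn_evpn"]
print(evpn_af_lines(af_val))
```

Note that the value is compared as the string "true", not as a boolean, which matches how the template reads it.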

Tested branch (Please provide the tested image version)
master as of 20241205

Description for the changelog
[yang] bgp address family l2vpn advertise-all-vni
This commit brings PR FRRouting/frr#14810 from FRR mainline to SONiC

SRv6 BGP SID reachability
FRRouting/frr#14810

Signed-off-by: cscarpitta <cscarpit@cisco.com>
This commit brings PR FRRouting/frr#16151 from FRR mainline to SONiC

zebra: display srv6 encapsulation source-address when configured
FRRouting/frr#16151

Signed-off-by: cscarpitta <cscarpit@cisco.com>
This commit brings PR FRRouting/frr#15673 from FRR mainline to SONiC

lib: fix srv6 locator flags propagated to isis
FRRouting/frr#15673

Signed-off-by: cscarpitta <cscarpit@cisco.com>
This commit brings PR FRRouting/frr#15604 from FRR mainline to SONiC

Add support for SRv6 SID Manager
FRRouting/frr#15604

Signed-off-by: cscarpitta <cscarpit@cisco.com>
This commit brings PR FRRouting/frr#15676 from FRR mainline to SONiC

bgpd: Extend BGP to communicate with the SRv6 SID Manager to allocate/release SRv6 SIDs
FRRouting/frr#15676

Signed-off-by: cscarpitta <cscarpit@cisco.com>
Signed-off-by: Anand Mehra anamehra@cisco.com

Release for Cisco 8800 Chassis, 8101

Chassis 8800
Test case fix test_thermal_global_state_db for FC in slot 7
XR to SONiC migration process broken in two steps
Apply SONiC config as part of XR to SONiC migration via minigraph.xml
Fixed BFD staying up on portchannel member down issue

8101
Fix for MIGSMSFT-771 - Link flap with disabled bad link detection
iccpd: fix a stack overflow in iccpd (update_peerlink_isolate_from_all_csm_lif) when the number of MCLAG members exceeds 32

Change the array length of 'mlag_po_buf' to 2048.

Add a length check before using 'snprintf'. Note: snprintf returns the length the fully formatted string would have had, not the number of bytes actually written.

Signed-off-by: ccyyrr92 cuiyingru@asterfusion.com
…rm (#21149)

The pipeline build links point to the wrong folder for the marvell-teralynx platform after the renaming PR (#19829).
…#21146)

Why I did it
Reduce high CPU usage on zebra after performing port toggle on all interfaces simultaneously

How I did it
Apply zebra fpm backpressure patches from FRR mainline to dplane_fpm_sonic:

zebra: Use built in data structure counter (FRRouting/frr#16221)
Zebra fpm backpressure (FRRouting/frr#16220)

Signed-off-by: cscarpitta <cscarpit@cisco.com>
Why I did it
Set the nexthop-group keep parameter to 1. This instructs zebra not to keep a nexthop group for more than 1 second after removal. Without this, zebra keeps the nexthop group in the system for 180 seconds.
In scaled scenarios with link-flap tests, leaving this parameter unset let the queue grow so large that zebra crashed due to OOM.

How I did it
Update the zebra template and initialize nexthop-group keep as 1.

How to verify it
Run the scale test with link flapping and ensure there is no memory increase in zebra.
In our testing, we found that unplugging SFP1 alone resulted in a failure at port 66, while unplugging SFP2 alone resulted in a failure at port 65, which did not match our expectations.

Signed-off-by: philo <philo@micasnetworks.com>
Why I did it
The reboot cause is not properly determined after each reboot, so the reboot history is not maintained either.

How I did it
The failure to determine the reboot cause is due to the pcisysfs.py script failing to read the registers.
The pcisysfs.py script in use was written in the old Python 2 format, which was failing.
Modified the install scripts to use the latest pcisysfs.py for register reads and writes.
Supervisord emits warnings due to the use of `stdout_logfile=syslog`
and `stderr_logfile=syslog`.  Replace with the modern configuration
options of `stdout_syslog=true` and `stderr_syslog=true` and set
the log file itself to `NONE` so it doesn't generate a file-based
log.

Warnings corrected look like:
```
2024 Dec  1 15:31:06.467218 sw2 INFO pmon#supervisord 2024-12-01 15:31:04,033 WARN For [program:xcvrd], stderr_logfile=syslog but this is deprecated and will be removed.  Use stderr_syslog=true to enable syslog instead.
```
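
The replacement described above can be sketched as a small text rewrite. This is a hedged illustration only: the function name modernize_supervisord_conf and the sample command path are assumptions, and the actual change edits the supervisord configuration templates directly rather than running a script.

```python
import re

# Matches a whole line such as "stdout_logfile=syslog" (either stream).
_DEPRECATED = re.compile(r"^(stdout|stderr)_logfile[ \t]*=[ \t]*syslog[ \t]*$",
                         re.MULTILINE)


def modernize_supervisord_conf(text):
    """Replace the deprecated `<stream>_logfile=syslog` form with
    `<stream>_logfile=NONE` followed by `<stream>_syslog=true`."""
    return _DEPRECATED.sub(r"\1_logfile=NONE\n\1_syslog=true", text)


# Illustrative program section; the command path is a placeholder.
old = """[program:xcvrd]
command=/usr/local/bin/xcvrd
stdout_logfile=syslog
stderr_logfile=syslog
"""

print(modernize_supervisord_conf(old))
```

Setting the log file to NONE alongside the syslog flag keeps supervisord from also writing a file-based log, as the commit message describes.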

Signed-off-by: Brad House (@bradh352)
…n with bookworm libasan (#21134)

syncd links to libasan v8 during the build after the bookworm upgrade (#18651), but libasan v6 is installed in the syncd container for the Mellanox platform, which causes runtime errors.

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Fixes: #20730

Why I did it
The generated t1 config fails YANG validation, which leads to config setup failure since we enforce YANG validation in config reload.

How I did it
Update config to align with YANG

How to verify it
Run YANG validate on generated config.
…D automatically (#21178)

#### Why I did it
src/sonic-platform-daemons
```
* 3fe8841 - (HEAD -> master, origin/master, origin/HEAD) Added SmartSwitch support in chassisd and enabling chassisd (#467) (9 hours ago) [rameshraghupathy]
* 88d0dd7 - Take non-CMIS xcvrs out of lpmode in SFF Manager (#565) (3 days ago) [Peter Bailey]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (#21193)

#### Why I did it
src/sonic-sairedis
```
* 9fe90f6b - (HEAD -> master, origin/master, origin/HEAD) syncd init: rename marvell to marvell-prestera (#1465) (7 hours ago) [krismarvell]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…ly (#21202)

#### Why I did it
src/sonic-bmp
```
* 1971625 - (HEAD -> master, origin/master, origin/HEAD) Update README.md (18 hours ago) [Feng-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
Extend platform dump to include MST devices information.

- How I did it
Extend the platform-dump.sh script.

- How to verify it
Run show techsupport command
…NEL_ATTR_DECAP_QOS_DSCP_TO_TC_MAP (#20650)

[QoS] Add tunnel pipe mode support for IPIP Decap mode to use SAI_TUNNEL_ATTR_DECAP_QOS_DSCP_TO_TC_MAP
… automatically (#21217)

#### Why I did it
src/sonic-platform-common
```
* 9ca0f69 - (HEAD -> master, origin/master, origin/HEAD) Update azure pipeline to use Bookworm (#523) (13 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Add Yang Models support for ASIC_SENSORS configuration.

Work item tracking
Microsoft ADO (number only):
How I did it
Add Yang Models support for the following ASIC_SENSORs configuration

{
   "ASIC_SENSORS": {
        "ASIC_SENSORS_POLLER_INTERVAL": {
            "interval": "10"
        },
        "ASIC_SENSORS_POLLER_STATUS": {
            "admin_status": "enable"
        }
    }
}
How to verify it
The image build should pass.

Signed-off-by: mlok <marty.lok@nokia.com>
…atically (#21223)

#### Why I did it
src/sonic-snmpagent
```
* 6a5c96d - (HEAD -> master, origin/master, origin/HEAD) Fix redis memory leak issue in PhysicalEntityCacheUpdater (#343) (8 hours ago) [Jianquan Ye]
```
#### How I did it
#### How to verify it
#### Description for the changelog
liuh-80 and others added 30 commits February 19, 2025 15:59
Install python3 FIPS packages

Why I did it
Python3 FIPS package not installed:

admin@vlab-01:~$ sudo apt list --installed | grep libpython
libpython3-stdlib/now 3.11.2-1+b1 amd64 [installed,local]
libpython3.11-minimal/now 3.11.2-6+deb12u5 amd64 [installed,local]
libpython3.11-stdlib/now 3.11.2-6+deb12u5 amd64 [installed,local]
libpython3.11/now 3.11.2-6+deb12u5 amd64 [installed,local]

How I did it
Add python3 FIPS package to install list.
Purge all python3 dev packages before installing the python3 FIPS package, because these dev packages break the dependency.

How to verify it
Pass all UT.

Manually confirm the package installed
…handle link notification faster. (#21771)

What I did:
Change the Orchagent redis pop batch size to 128 to handle link notification faster.

Why I did:
As part of the Ixia BGP convergence test https://github.com/sonic-net/sonic-mgmt/blob/master/tests/snappi_tests/multidut/bgp/test_bgp_outbound_uplink_multi_po_flap.py we found that SAI programming is slow, at around 1500 routes/sec (it takes approximately 40 sec to program 60K routes across multiple iterations).
Because of this slowness, Orchagent does not process a link notification immediately even when one is available: the current OA processes 1024 entries (route entries in our case) before it can pick up a link notification. Processing 1K entries can take about 2 sec, and if link notifications are slightly spread out (not back to back), each one can wait behind a 1K-entry batch that accumulates about 2 sec of SAI delay.

To optimize link processing and give OA more chances to pick up link notifications, we reduce the OA batch to 128 entries, which helped reduce the overall convergence time to about 2 sec.

Changing the OA batch from 1K to 128 entries has no impact on route programming, as the SAI slowness seems to be tied to sequential processing at 1500 routes/sec. However, it helps process link notifications more quickly. In our test it reduced convergence time from about 12-15 sec to 2 sec.
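
A rough back-of-the-envelope version of the reasoning above, as a sketch. It assumes a perfectly steady programming rate derived from the quoted 60K routes in 40 sec; the delay observed per 1K batch in the test (about 2 sec) is larger than this idealized figure, so the useful takeaway is the ratio between the two batch sizes, not the absolute numbers. The helper name batch_drain_seconds is hypothetical.

```python
def batch_drain_seconds(batch_size, routes_per_sec):
    """Idealized time to finish one batch at a steady programming rate."""
    return batch_size / routes_per_sec


# Rate derived from the figures quoted above: 60K routes in about 40 sec.
RATE = 60_000 / 40

for batch in (1024, 128):
    wait = batch_drain_seconds(batch, RATE)
    print(f"batch={batch}: ~{wait:.3f} s before a link notification can be picked up")
```

Under this assumption a 128-entry batch bounds the wait at one eighth of the 1024-entry case, while the total route programming time is unchanged because the rate itself does not depend on the batch size.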

How I verify:

Ran the above Ixia test for multiple iterations with both 1024 and 128 pop batch sizes and compared the performance.
---------

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
… LC (#21775)

What/Why I did:
Initially I made the change that fallback routes [routes from AH on the Upstream LC] are not programmed [they are marked as deny] on Downstream LCs, reasoning that Downstream LCs can forward them based on the default route. However, that assumption is incorrect when, for example, we have a topology like this:

RH is connected to ASIC0 of Upstream LC
AH is connected to ASIC1 of Upstream LC

The Downstream LC will learn routes from RH [including the default route] and will only forward to ASIC0 of the Upstream LC. The above assumption is fine only if AH and RH are always connected to the same ASIC of the Upstream LC.

How I verify:
UT updated
Manual Verification.
Modify the sonic-mgmt Dockerfile to install the conserver package.
This is needed for running dut_console tests through a conserver
connection.
Why I did it
Previously, the sonic-mgmt image encountered an issue where the SSH configuration was overly permissive, preventing the Docker container from starting successfully. The error message is provided below. This PR addresses and resolves the issue.

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/etc/ssh/ssh_host_rsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/etc/ssh/ssh_host_ecdsa_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0644 for '/etc/ssh/ssh_host_ed25519_key' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
sshd: no hostkeys available -- exiting.

ERROR: failed to start SSH service

How I did it
Add a step to reset permissions under the specific folder, as in #20346, which faced the same issue.
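
For illustration, the permission reset can be sketched in Python. This is not the actual change (which lives in the image build); the function name tighten_host_key_permissions is hypothetical, and the sketch assumes the private host keys follow the ssh_host_*_key naming seen in the error output.

```python
import glob
import os
import stat


def tighten_host_key_permissions(ssh_dir):
    """Set every private host key in ssh_dir to mode 0600.

    The pattern matches names like ssh_host_rsa_key but not the
    public halves (ssh_host_rsa_key.pub), which may stay readable.
    """
    fixed = []
    for path in sorted(glob.glob(os.path.join(ssh_dir, "ssh_host_*_key"))):
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # owner read/write only
        fixed.append(path)
    return fixed


# Typical call inside the container (requires sufficient privileges):
# tighten_host_key_permissions("/etc/ssh")
```

sshd refuses private keys that are accessible by others, so restricting them to the owner is what lets the service load its host keys and start.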

How to verify it
Same as #21184
There is an indentation issue in PR #21757. This PR addresses this issue.
- Why I did it
dmidecode is a package required for arm platforms including Nvidia DPUs. Platform dump in the tech support bundle was missing the dmidecode output. This occurred because dmidecode was not installed on the DPUs.

- How I did it
Added dmidecode to the list of required packages for installation and removed it from the installation path for the amd64 architecture only.

- How to verify it
Verified tech support bundle after the fix, and it now includes the dmidecode data as expected.

Signed-off-by: ram25794 <ssingamala@nvidia.com>
…installation. (#21646)

- Why I did it
To ensure the Linux kernel and MFT drivers' signatures are aligned, load the MFT drivers in the BFB installer from the SONiC image root filesystem

- How I did it
Mount the SONiC image rootfs and load the drivers from the rootfs.

- How to verify it
Compile and install BFB image. DPU NIC FW upgrade should finish successfully during the image installation
Count the number of undeliverable IPinIP packets
* Fixing LC's pmon_daemon_control.json symlink

In #21512 we created
separate directories for the different Wolverine SKUs but the
pmon_daemon_control.json symlink was moved to the non-LC file in
x86_64-arista_common

* Fixing a few more LC symlinks that were pointing to the wrong files
Enable gnmi/telemetry user authorization by config_db.

Why I did it
The gnmi/telemetry user authorization flag is not used in SONiC and cannot be configured through config_db.

How I did it
Update Yang model and update gnmi/telemetry start script.

How to verify it
Pass all UT.
Why I did it
To optimize the installer for marvell arm64 to use the create and mount partition logic from default_platform.conf.
The current default_platform.conf only allows installation on the same disk as ONIE, so add support for install block device selection.
Add support for SCSI block devices.

How I did it
Use the implementation from default_platform.conf to create and mount partitions, and remove the implementation in platform_arm64.conf.
Add an option to override the install block device in default_platform.conf. Also added logic to select the install block device from platform_arm64.conf.
Added support for selecting a SCSI disk as the install device in platform_arm64.conf. This also needed changes to the u-boot env variable.
Changed to UUID-based 'root' disk selection to have a generic implementation.
For backward compatibility, the 7215_A1 platform keeps using the existing functions.

How to verify it
Verified ONIE and SONIC to SONIC install using sonic-marvell.bin and sonic-marvell-arm64.bin.

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Changed the clock frequency as per BCM's recommendation

Work item tracking
Microsoft ADO (number only):
How I did it
Changed the clock frequency to 1.6 GHz for both Ramons and J2C+

How to verify it
Verified in our Nokia system testbed that the system is up and passing the traffic without any issues.
Why I did it
Update HWSKU for DB and RD boards to work with latest SAI SDK for Marvell platforms.
Also remove HWSKU's for unsupported platforms.

How to verify it
Loaded sonic image on respective RD and DB boards and verified show interfaces status.

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
Adding support for RADIAN feature for SONiC T2

Work item tracking
Microsoft ADO (number only):30112967
How I did it
Cli commands to add/remove ANCHOR prefix to a PREFIX_LIST table in CONFIG_DB
yang model changes for the new table
PrefixListMgr to handle add/remove of configuration
Templates
add_radian/del_radian : to add or remove an anchor prefix list and aggregate address
How to verify it
Unit tests : config gen, manager and yang model

---------

Signed-off-by: Mukul Chodhary <70460358+Muckthebuck@users.noreply.github.com>
…omatically (#21822)

#### Why I did it
src/sonic-swss-common
```
* 599b0a6 - (HEAD -> master, origin/master, origin/HEAD) c-api: README.md (#974) (29 hours ago) [erer1243]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Reverting part of #21345, since Chassis DB connection from remote host (linecard) cannot use unix socket

How I did it
Continue to use TCP socket for chassis DB connection

How to verify it
Bring up BGP on chassis, which currently fails after change in #21345
Why I did it
Fix preset config to align with YANG

How I did it
Fix config generation based on YANG and add UT to avoid future issue.

How to verify it
UT
Set the STATE_DB ALL_SERVICE_STATUS|tsa_tsb_service flag first as part of the startup_tsa_tsb service, then configure TSA. In the case where tsa_ena is False (genuinely or due to a race condition), we explicitly call TSA again to ensure all ASICs go to the TSA state.
Related to PR#16187

Why I did it
When you configure an SNMP agent address with a VRF, the snmpd service will fail to properly start after the configuration has been applied. The snmpd service shows a status of "FATAL."

sudo config snmpagentaddress add 1.1.1.1 -p 161 -v mgmt

docker exec -it snmp supervisorctl status | awk '{print $1, $2}'
dependent-startup RUNNING
rsyslogd RUNNING
snmp-subagent STOPPED
snmpd FATAL
start EXITED
supervisor-proc-exit-listener RUNNING
Work item tracking
Microsoft ADO (number only):
How I did it
Corrected syntax in the jinja template used to generate the snmpd.conf within the SNMP docker container.

How to verify it
sudo config snmpagentaddress add 1.1.1.1 -p 161 -v mgmt

docker exec -it snmp supervisorctl status | awk '{print $1, $2}'
dependent-startup EXITED
rsyslogd RUNNING
snmp-subagent RUNNING
snmpd RUNNING
start EXITED
supervisor-proc-exit-listener RUNNING
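
The check above can also be automated. The following is a minimal sketch that parses the two-column text produced by the awk filter; the helper name parse_supervisor_status is hypothetical, and the sample text is the post-fix output quoted above.

```python
def parse_supervisor_status(output):
    """Map each program name to its state from two-column status text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


after_fix = """\
dependent-startup EXITED
rsyslogd RUNNING
snmp-subagent RUNNING
snmpd RUNNING
start EXITED
supervisor-proc-exit-listener RUNNING
"""

states = parse_supervisor_status(after_fix)
# The fix is verified when both SNMP processes report RUNNING.
print(states["snmpd"], states["snmp-subagent"])
```

Before the fix, the same lookup would return FATAL for snmpd and STOPPED for snmp-subagent.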
* platform support M2-W6520-48C8QC

* trigger compilation

* rebuild

* trigger rebuild

* trigger rebuild

* Update wb_fpga_i2c_bus_device.c
Why I did it
Enabled the docker inram feature for slim image.
It extracts the docker image to RAM during boot, so boot takes extra time.

Work item tracking
Microsoft ADO (number only):
31323281

How I did it
Use pzstd, a more efficient tool, to compress and decompress the docker file and reduce the boot time.
Currently, we do not modify "FILESYSTEM_DOCKERFS=dockerfs.tar.gz" in onie-image.conf, so the slim image still uses dockerfs.tar.gz as the file name even though the file is actually zstd-compressed.
We tried to find a way to adjust the file name in onie-image.conf, but that does not seem easy to do.
So we use the file command to determine the compression type in union-mount.j2, then use the matching command to extract the docker file.
We plan to support zstd for all image types in the future to unify the docker file image.
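
Because the file name no longer reflects the compression, the type has to be detected from the content. The change itself relies on the file command in union-mount.j2; the sketch below swaps in a different technique, checking the leading magic bytes directly, purely to illustrate the idea. The function name detect_compression is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame magic number


def detect_compression(path):
    """Return 'zstd', 'gzip', or 'unknown' based on the first bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


# Example: a file named dockerfs.tar.gz may still report "zstd".
# print(detect_compression("dockerfs.tar.gz"))
```

The result can then select the matching extraction command, which is the role the file command plays in the actual script.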

How to verify it
Boot swi slim image and normal image, boot mellanox bin image, all boot successfully.
Why I did it
Update for NOKIA 7220 H4-32D and NOKIA 7220 H4 to support breakout mode
Update for NOKIA 7220 H5-64D to achieve 100% pass in OC for T1 topology

How I did it
Added files under ../device/nokia/x86_64-nokia_ixr7220_h4_32d-r0 directory.
Added files under ../device/nokia/x86_64-nokia_ixr7220_h4-r0 directory.
Added files under ../device/nokia/x86_64-nokia_ixr7220_h5_64d-r0 directory.

How to verify it
Make sure the sonic-buildimage build is successful
Run this image on x86_64-nokia_ixr7220_h4_32d-r0, x86_64-nokia_ixr7220_h4-r0 and x86_64-nokia_ixr7220_h5_64d-r0, verify all dockers are up and test basic commands like:
show version
show platform summary
show platform syseeprom
show platform fan
show platform psustatus
show platform firmware status
show platform temperature
sudo show system-health detail
show interface status
Run OC test, for T0/T1 topology and have 100% pass on all 3 platforms.
…ation. (#21856)

- Why I did it
Fix the issue with the position calculation of the sensors with discrete indexes.

- How I did it
Pass the base position into create_discrete_thermal function instead of initializing the position with 1.

- How to verify it
Run snmp/test_snmp_phy_entity.py sonic-mgmt test on the platform with discrete sensors.
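
One way to read the fix above, sketched in Python under loud assumptions: the helper name discrete_positions, the sample indexes, and the base value are all hypothetical and do not come from the platform code. The sketch only illustrates why starting from a caller-supplied base, rather than always from 1, avoids colliding with positions already taken by other sensors.

```python
def discrete_positions(indexes, base_position):
    """Assign consecutive positions to sensors whose indexes are not
    contiguous, starting from base_position instead of a fixed 1."""
    return {index: base_position + offset
            for offset, index in enumerate(indexes)}


# Hypothetical example: three sensors with discrete indexes, placed
# after nine positions that are already in use.
print(discrete_positions([1, 3, 7], base_position=10))
```

With a hard-coded start of 1, the same three sensors would have been given positions 1 to 3 regardless of what came before them.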
- Why I did it
Every ACS-.....SKU is expected to have an ingress_lossy pool, but SKU ACS-SN4280 does not contain it. So the ACS SKU is removed to make it compliant.

- How I did it
Removed the symlinks in Mellanox-SN4280-O28 folder and added the required files.

- How to verify it
Generated config using Mellanox-SN4280-O28 SKU and tested on the device

Signed-off-by: ram25794 <ssingamala@nvidia.com>
…#21793)

Why I did it
To update the Broadcom DNX SAI version to version 12.3 and update the kernel modules

Work item tracking
Microsoft ADO (number only): 31039883

How I did it
Build 12.3 SAI debian and update SAI and kernel modules accordingly

How to verify it
Build and run basic sonic-mgmt tests on DNX platform
…in STATE_DB (#21813)

Currently Management interface related data is present in CONFIG_DB and 'oper_status' is present in STATE_DB MGMT_PORT_TABLE.
This PR ensures that all telemetry data related to the management interface is present in STATE_DB.

Signed-off-by: Suvarna Meenakshi <sumeenak@microsoft.com>
Remove PG 3 and 4 for Internal Ports
[docker-orchagent] add -R flag for ring mode in orchagent.sh
Why I did it
In a subsequent PR in the sonic-mgmt repo, I will add RADIUS tests, and there needs to be a consumable RADIUS server in the infrastructure.

How I did it
Added the freeradius package to the ptf docker j2 file.

How to verify it
I built the new docker image locally and then ran a few successful app-topo tasks inside sonic-mgmt. Also verified that freeradius is installed.