Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Reduce the timeout (GET_RESPONSE_TIMEOUT) from 6 minutes to 1 minute. #472

Merged
merged 1 commit into from
Jun 19, 2019

Conversation

renukamanavalan
Copy link
Contributor

No description provided.

@renukamanavalan renukamanavalan self-assigned this Jun 19, 2019
@qiluo-msft qiluo-msft requested a review from kcudnik June 19, 2019 20:56
@@ -43,7 +43,7 @@ void check_notifications_pointers(

// if we don't receive response from syncd in 60 seconds
// there is something wrong and we should fail
#define GET_RESPONSE_TIMEOUT (6*60*1000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fyi i remember that this was changed on purpose but i cant remember why ;/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let us face it again. If we face, I would prefer to fix the other end.

@yxieca
Copy link
Contributor

yxieca commented Jul 11, 2019

Removed from 201811 branch until further validation is done.

lguohan pushed a commit that referenced this pull request May 5, 2021
What I did:
Increase SAI Sync operation timeout from 1 min to 6 min by reverting #472

Why I did:
Issue as mention sonic-net/sonic-buildimage#3832 is seen when there is link/port-channel flaps towards T2  which results in sequence of this operation on T1 topology:-

Link Going down
- Next Hop Member removal from Nexthop Group
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Link Going up
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Above Sequence with flaps create many SAI operation of CREATE/SET and if SAI is slow to process them if there in any intermediate Get operation (common GET case being CRM polling at default 5 mins) will result timeout if there is no response in 1 min which will cause next Get operation to fail because of attribute mismatch as it make Request and Reponse are out of sync. 

To fix this:-

- Have increase the timeout to 6 min.  Even 201811 image/branch is having same timeout value. 

- Hard to predict what is good timeout value but based on  below experiment and since 201811 is having 6 min using that value.

How I verify:

- Ran the below script for 800 iteration and orchagent did not got timeout error and no crash. Without fix OA used to crash consistently with most of the time less than 100 iteration of flaps.
- Based on sairedis.rec of the first CRM `GET` operation between `Create/Set` operation in these 800 iteration worst time for GET response was see 4+ min. 
```
#!/bin/bash

j=0
while [ True ]; do
        echo "iteration $j"
        config interface shutdown PortChannel0002
        sleep 2
        config interface startup PortChannel0002
        sleep 2
        j=$((j+1))
done
```


```
admin@str-xxxx-06:/var/log/swss$ sudo zgrep -A 1 "|g|.*AVAILABLE_IPV4_ROUTE" sairedis.rec*
sairedis.rec:2021-05-04.17:40:13.431643|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:18.132115|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec:--
sairedis.rec:2021-05-04.17:46:21.128261|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:31.258628|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec.1:2021-05-04.17:30:13.900135|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:34:47.448092|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.1:--
sairedis.rec.1:2021-05-04.17:35:12.827247|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:35:32.939798|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52533
sairedis.rec.2.gz:2021-05-04.17:20:13.103322|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:24:47.796720|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.2.gz:--
sairedis.rec.2.gz:2021-05-04.17:25:15.073981|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:26:08.525235|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52532
```


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
kktheballer pushed a commit to kktheballer/sonic-sairedis that referenced this pull request Jul 21, 2021
What I did:
Increase SAI Sync operation timeout from 1 min to 6 min by reverting sonic-net#472

Why I did:
Issue as mention sonic-net/sonic-buildimage#3832 is seen when there is link/port-channel flaps towards T2  which results in sequence of this operation on T1 topology:-

Link Going down
- Next Hop Member removal from Nexthop Group
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Link Going up
- Creation of New Nexthop Group
- All the Routes towards T2 are given SET Operation with new Nexthop Group (6000+ routes)

Above Sequence with flaps create many SAI operation of CREATE/SET and if SAI is slow to process them if there in any intermediate Get operation (common GET case being CRM polling at default 5 mins) will result timeout if there is no response in 1 min which will cause next Get operation to fail because of attribute mismatch as it make Request and Reponse are out of sync. 

To fix this:-

- Have increase the timeout to 6 min.  Even 201811 image/branch is having same timeout value. 

- Hard to predict what is good timeout value but based on  below experiment and since 201811 is having 6 min using that value.

How I verify:

- Ran the below script for 800 iteration and orchagent did not got timeout error and no crash. Without fix OA used to crash consistently with most of the time less than 100 iteration of flaps.
- Based on sairedis.rec of the first CRM `GET` operation between `Create/Set` operation in these 800 iteration worst time for GET response was see 4+ min. 
```
#!/bin/bash

j=0
while [ True ]; do
        echo "iteration $j"
        config interface shutdown PortChannel0002
        sleep 2
        config interface startup PortChannel0002
        sleep 2
        j=$((j+1))
done
```


```
admin@str-xxxx-06:/var/log/swss$ sudo zgrep -A 1 "|g|.*AVAILABLE_IPV4_ROUTE" sairedis.rec*
sairedis.rec:2021-05-04.17:40:13.431643|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:18.132115|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec:--
sairedis.rec:2021-05-04.17:46:21.128261|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec:2021-05-04.17:46:31.258628|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52530
sairedis.rec.1:2021-05-04.17:30:13.900135|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:34:47.448092|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.1:--
sairedis.rec.1:2021-05-04.17:35:12.827247|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.1:2021-05-04.17:35:32.939798|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52533
sairedis.rec.2.gz:2021-05-04.17:20:13.103322|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:24:47.796720|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52531
sairedis.rec.2.gz:--
sairedis.rec.2.gz:2021-05-04.17:25:15.073981|g|SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=1538995696
sairedis.rec.2.gz:2021-05-04.17:26:08.525235|G|SAI_STATUS_SUCCESS|SAI_SWITCH_ATTR_AVAILABLE_IPV4_ROUTE_ENTRY=52532
```


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 18, 2022
jianyuewu pushed a commit to jianyuewu/sonic-sairedis that referenced this pull request Feb 7, 2025
* [sonic-utilities] managementVRF cli support(l3mdev)

This commit adds CLI support for management VRF using l3dev. mVRF can
be enabled using config vrf add mgmt and deleted using config vrf del mgmt.
Show commands for management VRF are added which displays the linux command
output, will update show command display after concluding what would be the
output for the show commands.
Added cli to configure management interface(eth0), config interface ip eth0
add can be used to configure eth0 ip and config ip eth0 remove is used to
remove eth0 ip.

New cli config/show commands:

config vrf add mgmt
config vrf del mgmt
config interface eth0 ip add ip/mask gatewayIP
config interface eth0 ip remove ip/mask

show mgmt-vrf
show mgmt-vrf route
show mgmt-vrf addresses
show mgmt-vrf interfaces

Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants