
Extra RAM being used by BGP process ? #465

Open
netixx opened this issue Apr 17, 2024 · 2 comments

Comments


netixx commented Apr 17, 2024

Describe the bug
I am running the BGP server with around 20 peers, each with around 1M routes.

I am seeing high RAM usage. Running a pprof heap dump, I get the following flamegraph:
[flamegraph image from pprof heap dump]

It looks to me like some resources are not released when the routes are processed by the filters?

Steps to Reproduce
Run the router and check for RAM allocation.

Expected behavior
Only the RIB component should use a lot of RAM.

Configuration used

```go
b.AddPeer(server.PeerConfig{
	LocalAS:                    16276,
	PeerAS:                     16276,
	RouterID:                   addr.ToUint32(),
	PeerAddress:                ip.Ptr(),
	LocalAddress:               locAddr.Ptr(),
	AdminEnabled:               true,
	VRF:                        defaultVRF,
	Passive:                    true,
	AdvertiseIPv4MultiProtocol: true,
	IPv4: &server.AddressFamilyConfig{
		AddPathRecv:       true,
		ImportFilterChain: filter.NewAcceptAllFilterChain(),
		ExportFilterChain: filter.NewDrainFilterChain(),
	},
	IPv6: &server.AddressFamilyConfig{
		AddPathRecv:       true,
		ImportFilterChain: filter.NewAcceptAllFilterChain(),
		ExportFilterChain: filter.NewDrainFilterChain(),
	},
})
```

**Additional context**

We are running add-path with both IPv4 and IPv6 AFIs and unicast SAFI.


taktv6 commented Apr 18, 2024

Hi, thanks for reaching out.
I'm very curious now: how many prefixes/routes are you sending over to the process?
We're well aware that our BGP implementation's memory footprint is far from efficient at the moment. We had plans to improve it but haven't found the time yet.


netixx commented Apr 18, 2024

At the time of the heap dump, I had around 29M paths in the BGP table (from bio_bgp_route_received_count).
I was receiving between 500 and 700 updates per second across 21 peers (around 30 updates/second per peer), according to bio_bgp_update_received_count.
Each peer accounts for 1.35M to 1.4M routes.

What troubles me is this: on the right side we can see the RIB with a lot of small objects, which is expected.

But on the left side, there seems to be even more RAM held by github.com/bio-routing/bio-rd/protocols/bgp/server.(*fsmAddressFamily).updates.

I don't understand why this function holds that much memory, since it should mostly just push the route into the routing table (either adjRibIn or the locRIB).

The next step to optimise RAM, I guess, would be a copy-on-write scheme for paths between adjRibIn and locRIB, storing only modified attributes instead of always copying the whole path.

Let me know if I can be of some help in that regard :)

Side note: I also see a lot of CPU spent in the garbage collector, which could mean there are more allocations going on than we want:
[CPU profile flamegraph image]

For additional reference, here is the "alloc" graph:
[pprof alloc graph image]

In another project I am looking at, they are using https://github.com/kentik/patricia (in particular https://github.com/kentik/patricia/tree/main/generics_tree) for RIB storage, which seems really efficient (example here: https://github.com/akvorado/akvorado/blob/main/inlet/routing/provider/bmp/rib.go), especially in terms of garbage collection.
