introduce XLAT, native IPv4 for clients #1808

Closed
wants to merge 6 commits into from

Conversation

christf
Member

@christf christf commented Sep 4, 2019

This introduces native IPv4 support for clients in a Babel-based, IPv6-only network. A Jool instance is expected somewhere on the network for the PLAT part. This setup is used in Magdeburg and is working well.

I do not think the IPv4 connections survive roaming with the current implementation, but at least Android clients no longer disconnect from the network due to missing IPv4 connectivity.

@CodeFetch I do hope you do not disapprove of my raising a PR that partly contains your work. Let me know if you do and we can work something out.

  • provide usage documentation including the backend
  • In our tests we are still seeing performance issues that are not fully understood. Those have to be resolved.
  • ensure nextnode.ipv4 address is defined

@christf christf added 0. type: enhancement The changeset is an enhancement 3. topic: babel Topic: Babel Layer 3 Routing labels Sep 4, 2019
package/gluon-ddhcpd/Makefile (outdated, resolved)
package/gluon-l3roamd/files/etc/init.d/gluon-l3roamd (outdated, resolved)
package/gluon-ddhcpd/files/etc/init.d/gluon-ddhcpd (outdated, resolved)
package/gluon-xlat464-clat/Makefile (outdated, resolved)
package/gluon-xlat464-plat/Makefile (outdated, resolved)
package/gluon-xlat464-plat/check_site.lua (outdated, resolved)
package/gluon-xlat464-plat/files/xlat464plat.sh (outdated, resolved)
package/gluon-xlat464-plat/files/xlat464plat.sh (outdated, resolved)
@christf christf added the 2. status: waiting-on-author Waiting on some action from the author label Sep 7, 2019
@christf christf changed the title introduce XLAT, native IPv4 for clients WIP: introduce XLAT, native IPv4 for clients Sep 7, 2019
@CodeFetch
Contributor

@christf Thank you for raising this PR. I think PLAT support should be removed, because it is not usable without N2N/S2N support. Rotanid's remarks regarding indentation and double newlines are correct. I didn't bother at the time of writing it.

@christf
Member Author

christf commented Sep 10, 2019

Yeah, I will rework it to make it mergeable. For now I just wanted to share that such a set of packages exists while I am still running the tests. I am still experiencing performance issues in my test network related to MTU problems. This PR will remain WIP until those are resolved.

@CodeFetch
Contributor

@christf Indeed, the MTU issue is not handled correctly. According to the comments, the NAT46 device is intended to be used with a big MTU (as large as the WLAN interface's plus IPv6 overhead in our case). But that contradicts the RFC, which says a very small MTU should be set for the IPv4 network (IPv6 MTU minus the IP header size difference, e.g. 1280 - 60 = 1220). The problem is that the resulting IPv6 packets won't get fragmented, as the default for IPv6 is "don't fragment". So fragmentation needs to happen on the IPv4 side, with MSS clamping enabled. The NAT46 module does not check whether packets are too big at the moment: https://github.com/ayourtch/nat46/blob/master/nat46/modules/nat46-core.c#L1843.

Performance-wise the NAT46 module might be heavy as it processes each skb directly instead of using a separate worker which is correct in general, but it can take up much memory on high traffic flows, because it reallocates a bigger skb as far as I remember and it takes time until the old one gets freed. This could be optimized. I think the newer kernels have a function to put in a page at the beginning for the IPv6 header and leave the rest of the data untouched and for unlinking references to the resulting skb. So we could possibly save most of the copying process.
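
The MTU arithmetic from the first paragraph can be sketched as a small shell helper. This is only an illustration: the function names `clat_ipv4_mtu` and `clat_tcp_mss` are hypothetical and not part of this PR, and the overhead is left as a parameter. The value 60 is the figure used in the comment; 28 is shown for comparison, assuming a 20-byte difference between a 40-byte IPv6 header and an option-less 20-byte IPv4 header plus an 8-byte IPv6 fragment header.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): IPv4-side MTU for a CLAT interface.
# $1 = IPv6 MTU on the mesh side, $2 = per-packet overhead to reserve.
clat_ipv4_mtu() {
	echo $(( $1 - $2 ))
}

# Hypothetical helper: TCP MSS to clamp to for a given IPv4 MTU,
# assuming a 20-byte IPv4 header and a 20-byte TCP header without options.
clat_tcp_mss() {
	echo $(( $1 - 40 ))
}

clat_ipv4_mtu 1280 60   # figure from the comment above
clat_ipv4_mtu 1280 28   # assumed: 20-byte header difference + 8-byte fragment header
clat_tcp_mss 1220
```

Keeping the overhead as an argument avoids hard-coding either figure until the correct value for the NAT46 device has been settled.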

@T-X
Contributor

T-X commented Sep 10, 2019

@CodeFetch:

Performance-wise the NAT46 module might be heavy as it processes each skb directly instead of using a separate worker which is correct in general

Is there an accidental mixup in this sentence? It's the other way round: queuing something onto a kworker with queue_work() is a performance killer. You don't want to use that on the fast path for your payload packet flow. The only reason you might want to queue something on a kworker is if you have some packet that takes longer to process and therefore should be processed in a sleepable/interruptible handler.

But lifting onto such a handler is heavy.

@CodeFetch
Contributor

CodeFetch commented Sep 10, 2019

@T-X Yes, sorry I changed the sentence and didn't read it again. Performance-wise it's ordinarily better, but not memory-performance wise. The module uses skb_copy/reallocates the skbs instead of skb_cloning them and adding/removing the headers with push-pull and removing the references, but this would require a worker which checks on the references on the original one and only starts changing the headers when the old skb has been released by everything else. Do you know what I mean?

Edit:
Maybe one just needs to check the reference count... I don't know what could still hold a reference when the skb is handed over to the netdev... Thus a worker might not be needed at all. At least for IPv6-to-IPv4 translation, copying the whole skb seems to be overkill. I ran out of memory (OOM) with this module when using iperf on a 64 MB device.

Edit2: @T-X I assume pskb_expand_head won't give us a real improvement for linear skb's translation from IPv4 to IPv6 as the 60 additional bytes won't be within the margin of the IPv4 max headroom (do you remember where this value is being defined?).
Another option could be to set dev->needed_headroom = 60;, but this will likely also result in a call to pskb_expand_head, but I'm not sure...

@christf
Member Author

christf commented Sep 20, 2019

Actually, outgoing traffic is OK but inbound is problematic. This is not meant to be a high-performance solution. It just has to work barely well enough to allow the few remaining clients that refuse to work without IPv4 to use the network. When larger-scale applications are required, the best performance optimization is to stop the DNS server from handing out A records...

@CodeFetch
Contributor

@christf I agree. I'd still be happy to hear T-X's point, because I don't want to have the same problems with NAT426, but it's OT.
In the worst case (if nodes reboot often) we can have a second look (maybe there is just a bug we haven't found yet) or do throttling to keep the nodes stable.

@christf
Member Author

christf commented Nov 24, 2019

I now have this integrated into our Gluon builds. PMTU discovery works with the additional CLAT rule.
My TL-WR740, with all of its 400 MHz, was able to process 17 Mbit/s through WireGuard and the CLAT module. Hence I am upgrading this PR.

@christf christf changed the title WIP: introduce XLAT, native IPv4 for clients introduce XLAT, native IPv4 for clients Nov 24, 2019
Member

@rotanid rotanid left a comment

I noted a few additional things.
Also, I noticed that your PR uses different indentation styles, even inside the same shell script.
Please use at most two, one for Lua and one for shell, and do not mix them. Maybe look at the gluon-core code to see which one is preferred.

Documentation is still missing.

package/gluon-ddhcpd/Makefile (outdated, resolved)
package/gluon-xlat464-clat/Makefile (outdated, resolved)
package/gluon-xlat464-clat/Makefile (outdated, resolved)
@rotanid rotanid added this to the 2020.1 milestone Jan 20, 2020
This provides native IPv4 to clients on a network with an IPv6-only
backend, relying on an external PLAT. This allows clients that would
otherwise refuse to connect without a valid IPv4 route to be used in
IPv6-only networks.

Limitations:
* An external PLAT must be provided, for example by Jool, e.g. via
  https://github.com/FreifunkMD/jool-docker.git
* This implementation will break IPv4 connections when clients roam.
@christf christf removed the 2. status: waiting-on-author Waiting on some action from the author label Jan 22, 2020
@christf
Member Author

christf commented Jan 22, 2020

The following changes were performed:

  • rename of the package, update title
  • fixing indentation, using tabs now as mentioned in the developers guide
  • added docs
  • removal of plat references that are not actually used anywhere
  • instead of removing ddhcp init script, ddhcpd is disabled

@lemoer
Member

lemoer commented Feb 3, 2020

I have been using the Magdeburg firmware, and therefore this PR, for a couple of weeks and have not seen any issues in that time.

@@ -0,0 +1,3 @@
#!/bin/sh

uci set ddhcpd.settings.enabled="0"
Member

This can be merged into 316-ddhcp

Member

Done, please check whether it was done correctly.

docs/package/gluon-464xlat-clat.rst (outdated, resolved)
docs/package/gluon-464xlat-clat.rst (outdated, resolved)
docs/package/gluon-464xlat-clat.rst (outdated, resolved)
@rotanid rotanid added 2. status: waiting-on-author Waiting on some action from the author and removed 2. status: waiting-on-author Waiting on some action from the author labels Mar 14, 2020
Contributor

@mweinelt mweinelt left a comment

Looks mostly okay, a few nitpicks and then we can get this merged in time for v2020.2.

Example::

{
clat_range = 'fdff:ffff:ffff::/48',
Contributor

Indentation


clat_range : mandatory
- infrastructure net (ULA) from which a /96 CLAT prefix will be generated.
- This must be a /48 prefix.
Contributor

@mweinelt mweinelt Mar 24, 2020

A short subclause, explaining why a /48, e.g. how it is partitioned, would go a long way.
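
One way to picture the /48-to-/96 relationship asked about here: a /48 leaves 48 bits before the /96 boundary, i.e. three 16-bit groups. How the package actually fills those bits is not stated in this thread, so the three groups passed in below are purely hypothetical placeholders, as is the helper name `clat_prefix96`. The sketch also assumes the range is written in the form `xxxx:xxxx:xxxx::/48`, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): append three 16-bit groups to a /48
# written as "xxxx:xxxx:xxxx::/48" to obtain a /96 prefix.
# $1 = clat_range, $2..$4 = the three groups (placeholders for illustration).
clat_prefix96() {
	base="${1%::/48}"                  # strip the "::/48" suffix
	[ "$base" != "$1" ] || return 1    # input was not in the expected form
	printf '%s:%s:%s:%s::/96\n' "$base" "$2" "$3" "$4"
}

clat_prefix96 'fdff:ffff:ffff::/48' 0001 0002 0003
```

With the placeholder groups shown, this prints `fdff:ffff:ffff:0001:0002:0003::/96`, which leaves the last 32 bits free for the embedded IPv4 address.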

end
end
f:close()
end
Contributor

clat_range : mandatory
- infrastructure net (ULA) from which a /96 CLAT prefix will be generated.
- This must be a /48 prefix.
- This can be the same for each site and is pre-registered at https://wiki.freifunk.net/IP-Netze#IPv6 as part of fdff:ffff:ff00::/40
Contributor

I don't think we should reference such a badly maintained and outdated wiki page. And if it's all the same we might also default to some prefix, no?

@@ -0,0 +1,2 @@

Contributor

empty line?


define Package/gluon-ddhcpd
TITLE:=Distributed DHCP Daemon for Gluon
DEPENDS:=+gluon-core +ddhcpd
Contributor

Where does ddhcpd come from?


define Package/gluon-ddhcpd
TITLE:=Distributed DHCP Daemon for Gluon
DEPENDS:=+gluon-core +ddhcpd
Contributor

If gluon-ddhcpd depends on mmfd that dependency is missing here.

Member Author

ddhcpd depends on a means to distribute multicast throughout the network. In Babel networks this capability is provided by mmfd. In batman networks it is provided by the L2 property of batman. It could also be provided by BIER or any other means. There is currently no way to model that, and depending on mmfd is not the right thing to do for batman, so no such dependency was added.

@@ -0,0 +1,22 @@
#!/usr/bin/lua
Contributor

With the mmfd rule it's probably appropriate to call the file ddhcpd-firewall.

@@ -0,0 +1,10 @@
#!/usr/bin/lua
Contributor

There's no need to sort this behind the firewall rules, both can be put at position 315.

@@ -1,6 +1,8 @@
need_string_match(in_domain({'node_prefix6'}), '^[%x:]+/64$')
need_string_match(in_domain({'node_client_prefix6'}), '^[%x:]+/64$')

need_string_match(in_domain({'clat_range'}), '^[%x:]+/48$', false)
Contributor

Why does babel require this? And why is this check more specific than the one in gluon-464xlat-clat? Shouldn't gluon-mesh-babel just depend on gluon-464xlat-clat instead?

Contributor

Babel needs to know which routes (from which IP ranges) should be redistributed.

Contributor

But the clat functionality is only there when the clat package is actually included, no?

Contributor

If a node does not use the CLAT (for whatever reason), it should still redistribute the route. I agree that the site check fits better in the clat package itself. There is no clear solution for this, but I guess nobody will be mixing firmwares with and without clat support. Based on that, it is fine to move it to the clat package. I don't think gluon-mesh-babel should depend on the clat package.
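
For readers unfamiliar with Lua patterns, the `clat_range` site check discussed in this thread, `'^[%x:]+/48$'`, can be approximated in shell with an extended regular expression, since `%x` matches a hexadecimal digit. The function name `is_clat_range` is hypothetical and only serves to illustrate what the pattern accepts.

```shell
#!/bin/sh
# Hypothetical shell equivalent of the Lua pattern '^[%x:]+/48$':
# one or more hex digits or colons, followed by the literal "/48".
is_clat_range() {
	printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f:]+/48$'
}

is_clat_range 'fdff:ffff:ffff::/48' && echo "accepted"
is_clat_range 'fdff:ffff:ffff::/64' || echo "rejected"
```

Note that the check is purely syntactic: a string such as `:::/48` would also pass, so it guards against a wrong prefix length rather than against a malformed address.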

@rotanid rotanid added the 2. status: waiting-on-author Waiting on some action from the author label Apr 5, 2020
@mweinelt mweinelt mentioned this pull request Apr 11, 2020
2 tasks
@mweinelt
Contributor

shellcheck:

+ make lint-sh
Checking contrib/depdot.sh
Checking contrib/lsupgrade.sh
Checking contrib/sign.sh
Checking contrib/sigtest.sh
Checking contrib/actions/install-dependencies.sh
Checking contrib/actions/run-build.sh
Checking package/gluon-464xlat-clat/files/lib/netifd/proto/xlat464clat.sh
package/gluon-464xlat-clat/files/lib/netifd/proto/xlat464clat.sh:82:5: warning: debug is referenced but not assigned. [SC2154]
package/gluon-464xlat-clat/files/lib/netifd/proto/xlat464clat.sh:157:36: warning: local_ip is referenced but not assigned (did you mean 'local_v4'?). [SC2154]
make: *** [Makefile:136: lint-sh] Error 1
script returned exit code 2
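
SC2154 is raised when shellcheck cannot find an assignment for a variable that is read. A common way to address it is to give the variable an explicit default before use; the snippet below is a generic sketch of that pattern and not the actual code from `xlat464clat.sh`. The function `log_debug` is hypothetical. For the second warning, shellcheck's own hint suggests that `local_ip` may simply be a misspelling of `local_v4`, which would be fixed by renaming rather than by adding a default.

```shell
#!/bin/sh
# Generic sketch (not the PR's code): assign a default so that "debug"
# is always defined before it is read, which satisfies SC2154.
debug="${debug:-0}"

# Hypothetical helper that prints only when debugging is enabled.
log_debug() {
	if [ "$debug" = "1" ]; then
		echo "debug: $*"
	fi
}

log_debug "prints nothing unless debug=1 was set beforehand"
```

If the variable is in fact populated at runtime by a helper that shellcheck cannot follow, a targeted `# shellcheck disable=SC2154` comment on the affected line is the other usual option.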

@rotanid
Member

rotanid commented Apr 19, 2020

Hey @christf, your PR got a review 4 weeks ago and is waiting to be updated :-)

@mweinelt mweinelt modified the milestones: 2020.2, 2020.3 May 2, 2020
@rotanid rotanid removed this from the 2020.3 milestone Aug 8, 2020
@T-X
Contributor

T-X commented Jul 10, 2022

The Linux kernel and Babel got support for routing IPv4 packets via IPv6 routes:

Which sounds like a neat idea and overall a lot simpler than XLAT?
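
To give a rough idea of what an IPv4 route via an IPv6 next hop looks like on the command line, here is a dry-run sketch. It assumes an iproute2 and kernel version that accept the `via inet6` next-hop syntax; the prefix, link-local address, and interface name are placeholders, and the helper `v4_via_v6_cmd` is hypothetical. It only prints the command instead of changing the routing table.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the iproute2 command that would install
# an IPv4 route ($1) with an IPv6 next hop ($2) on interface ($3).
v4_via_v6_cmd() {
	printf 'ip route add %s via inet6 %s dev %s\n' "$1" "$2" "$3"
}

# Placeholder values for illustration only.
v4_via_v6_cmd 192.0.2.0/24 fe80::1 eth0
```

In a Babel network the routing daemon would install such routes itself; the manual command is only meant to show the shape of the resulting route.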

@CodeFetch
Contributor

@T-X Yes, indeed it looks promising. As a sidenote: I've kind of lost interest in Babel, because there is more future potential in Batman as it is much easier to implement technologies like multipath bonding. Maybe we can have a discussion about it at the end of the year. Btw rhashtables was something to keep in mind for improving Batman performance and possibly fixing the scaling issues.

@AiyionPrime AiyionPrime added the 2. status: merge conflict The merge has a conflict and needs rebasing label Jan 21, 2023
@blocktrron
Member

Closed because of #3105

@blocktrron blocktrron closed this Jan 7, 2024
@T-X
Contributor

T-X commented Sep 20, 2024

Just in case someone stumbles over this PR again, in case some renewed interest might come up:

Someone at Freifunk Franken has recently implemented SIIT in eBPF, which seems to include code for handling many of the annoying parts, like checksums and fragmentation handling, too: https://git.freifunk-franken.de/jkimmel/siit-bpf
And secondly, there is also work on upstreaming some SIIT implementation into the Linux kernel: https://nlnet.nl/project/IPv6-monostack/
