Provide more flexibility for initial device configuration #93

Mierdin · 2019-03-20T18:33:27Z

So, the new full vqfx image from @mwiget works by copying the IP address assigned to the container into the bootstrap configuration. This is cool, but is incompatible with the way we're using the NAPALM "replace" strategy to provide final configurations for each stage. This will overwrite the full configuration, which will wipe out this addressing configuration, and render the vqfx unreachable, immediately upon initial configuration.

I've opened this PR to modify Syringe to provide an optional field in each stage definition that allows the lesson author to specify a "merge" strategy if they wish. However, I am not quite happy with this approach. The main reason I went with a static "replace" strategy for all, is to ensure atomicity between stages. No matter what the user does to a configuration, I know that the configuration for that device for that stage, will be applied as-is. What you see is what you get. If we allow merges, this becomes a lot more complicated.

@mwiget I remembered in previous images you've worked on, you put a static IP address (something like 10.0.0.5) on em0, and this makes it possible to use the "replace" strategy, as the em0 configuration will never change anyways. I understand that the current approach laid out in your latest PRs to introduce container-vqfx uses the dynamic IP approach because you're mapping the vnic directly to the container's eth0, so you need to have the same IP address, but I'm wondering if it would be reasonable to go with an approach that allows us to continue to use the replace strategy as before. Thoughts on that? We can continue to use the current approach for all other interfaces, but having a constant em0 configuration would mean we could preserve inter-stage configuration atomicity.

Also FYI I have been pushing changes to both the vqfx image as well as the JSNAPy lesson in https://github.com/nre-learning/antidote/tree/vqfx-full-testing to test this image as well as this PR. I have PTR pointed at that branch so you can see this spun up at https://ptr.labs.networkreliability.engineering/labs/?lessonId=12&lessonStage=1

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Mierdin · 2019-03-20T18:40:40Z

@nre-learning/technical-reviewers

mwiget · 2019-03-20T19:44:12Z

@Mierdin the reason for passing the dynamically assigned IP to em0 is to allow it to work in a generic Docker and Kubernetes environment as if it were any other container.

I can certainly assign a static IP to em0, but how would a config change be pushed over the network to the vQFX? I assume the container itself still gets a dynamic IP on eth0.
In fact, I‘d only need to remove the apply-group openjnpr-container-vqfx statement from the initial config and package a static config instead that enables ssh, netconf and configures em0.

In other words, what if a lab needs to launch multiple vQFX? How will the provisioning work correctly for every vqfx?

For another project, where I needed to overload Junos access with services in the container directly, I ended up doing some iptables trickery to get ssh and netconf connections mapped to Junos, while using the container IP for services running next to Junos. But this has severe limitations when it comes to streaming from Junos or any other service offered via em0.

Re-reading #93, I think I start to understand what you have in mind ;-). If you push an initial config to the vqfx container at runtime, just make sure you don‘t include „apply-group openjnpr-container-vqfx“ and you can configure every aspect (em0, hostname, ssh/netconf etc) to your liking. Powerful thingie, config-groups. @Mierdin would that work? Obviously the assigned IP must be in the same subnet as the one originally assigned to eth0 of the vqfx container. Thoughts?

Mierdin · 2019-03-20T20:13:51Z

I can certainly assign a static IP to em0, but how would a config change be pushed over the network to the vQFX? I assume the container itself still gets a dynamic IP on eth0.
In fact, I‘d only need to remove the apply-group openjnpr-container-vqfx statement from the initial config and package a static config instead that enables ssh, netconf and configures em0. In other words, what if a lab needs to launch multiple vQFX? How will the provisioning work correctly for every vqfx?

This is done today, using Kubernetes jobs to spin up a container running NAPALM to apply a configuration when moving into a stage. What this PR proposes (again, would not be ideal) is to adjust one parameter for that approach to use merge strategy instead of replace, but again I'd like to avoid that if possible. The ideal end state would be a vqfx image that can survive having it's entire config rewritten after initial configuration, without napalm having to be made aware of the "correct" IP address dynamically - ideally it's always the same.

If you push an initial config to the vqfx container at runtime, just make sure you don‘t include „apply-group openjnpr-container-vqfx“ and you can configure every aspect (em0, hostname, ssh/netconf etc) to your liking. Powerful thingie, config-groups. @Mierdin would that work? Obviously the assigned IP must be in the same subnet as the one originally assigned to eth0 of the vqfx container. Thoughts?

I was already not including the apply-group statement, so it sounds like the only thing I was missing was that the em0 address needs to be in the subnet that is in use by the primary CNI plugin (in our case, weave).

However, this raises an interesting problem - if we try to statically assign addresses in the pushed configs, we'd have to coordinate with weave somehow which seems best-case to be pretty complicated. I'd almost not like to even go down that road. One of the cool things about the previous NAT approach is that the vqfx's configured em0 address could be consistently the same, and it didn't matter what the outer container's eth0 was, it would just always map to that dynamically. So I could always use the same em0 address in the pushed configs, and not care about the actual container's IP address.

So it seems to come down to a tradeoff between having simplified networking (eth0 more directly being mapped to em0, and thereby sharing a lot of the configuration details like addressing) which seems to also be more compatible with other things we haven't gotten into yet like streaming telemetry (though we could also build lessons to do that from other interfaces I'd imagine) and the ability to strictly ensure inter-stage atomicity by using the replace strategy.

It seems those two approaches are mutually exclusive.

mwiget · 2019-03-21T08:43:55Z

Yep, sounds like trade offs are unavoidable. I can certainly add the described workaround using a static IP address for em0 with port forwarding from the containers IP, but we still need to ensure the em0 IP address remains unchanged between config updates. To me, it is confusing to “see” a different em0 IP address than what I use to communicate with via automation tools.

Another way to deal with it is making the assigned IP addresses predictable. I always do that with docker-compose ipam settings. @Mierdin have you considered that?

Mierdin · 2019-03-21T22:09:22Z

Actually I had a kind of cool thought - rather than allow merge configuration changes, we could modify the configuration container antidotelabs/napalm to first retrieve the configured address on em0, and then we could replace the static address in all of the configurations in the curriculum with a little jinja snippet that we could replace with that retrieved address at configure-time.

Best of both worlds, since we're able to let kubernetes do the addressing as it does now, but we also continue to enjoy the safety of using the replace strategy.

Agree? Thanks for gaming this out with me.

mwiget · 2019-03-22T06:54:43Z

Sure, makes sense to me. Glad we keep the assigned IP address via kubernetes.

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

codecov · 2019-03-23T19:03:15Z

Codecov Report

Merging #93 into master will increase coverage by 0.16%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master      #93      +/-   ##
==========================================
+ Coverage   23.46%   23.63%   +0.16%     
==========================================
  Files          18       18              
  Lines        1918     1917       -1     
==========================================
+ Hits          450      453       +3     
+ Misses       1414     1411       -3     
+ Partials       54       53       -1

Impacted Files	Coverage Δ
scheduler/jobs.go	`0% <0%> (ø)`	⬆️
scheduler/namespaces.go	`58.03% <0%> (+2.67%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da4a81b...c24c81c. Read the comment docs.

Mierdin · 2019-03-23T19:15:47Z

Alright, this is working really well. I made changes to the curriculum and introduced a new image for making configurations using the model described above, in https://github.com/nre-learning/antidote/pull/207.

I've also pushed commits to this PR to ensure Syringe is equipped to use the new configurator image.
Thanks for the discussion - looks like we didn't have to make a hard trade-off after all.

mwiget · 2019-03-23T19:46:11Z

very cool! i really have to get my selfmedicate setup working to try this out.

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Provide merge config strategy option

d1b5b4b

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Mierdin changed the title ~~Provide merge config strategy option~~ WIP - need discussion: Provide merge config strategy option Mar 20, 2019

This was referenced Mar 20, 2019

New Lesson - Automating with Openconfig nre-learning/nrelabs-curriculum#191

Merged

New Lesson - Automating with JET nre-learning/nrelabs-curriculum#173

Merged

mwiget mentioned this pull request Mar 21, 2019

Vqfx-full issues nre-learning/nrelabs-curriculum#206

Open

Mierdin added 2 commits March 23, 2019 11:50

Remove experimental strategy field

fa6a653

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Switch jobs function over to use the new configurator image

0f120be

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Mierdin changed the title ~~WIP - need discussion: Provide merge config strategy option~~ Provide more flexibility for initial device configuration Mar 23, 2019

Mierdin mentioned this pull request Mar 23, 2019

Introduce new "configurator" for configuring devices, and templatize mgmt interface configuration nre-learning/nrelabs-curriculum#207

Merged

Mierdin requested a review from cloudtoad March 23, 2019 19:16

update changelog

c24c81c

Signed-off-by: Matt Oswalt <matt@keepingitclassless.net>

Mierdin merged commit b394dd7 into master Mar 24, 2019

Mierdin deleted the merge-stage-option branch March 24, 2019 07:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide more flexibility for initial device configuration #93

Provide more flexibility for initial device configuration #93

Mierdin commented Mar 20, 2019 •

edited

Loading

Mierdin commented Mar 20, 2019

mwiget commented Mar 20, 2019

Mierdin commented Mar 20, 2019

mwiget commented Mar 21, 2019

Mierdin commented Mar 21, 2019

mwiget commented Mar 22, 2019

codecov bot commented Mar 23, 2019 •

edited

Loading

Mierdin commented Mar 23, 2019

mwiget commented Mar 23, 2019

Provide more flexibility for initial device configuration #93

Provide more flexibility for initial device configuration #93

Conversation

Mierdin commented Mar 20, 2019 • edited Loading

Mierdin commented Mar 20, 2019

mwiget commented Mar 20, 2019

Mierdin commented Mar 20, 2019

mwiget commented Mar 21, 2019

Mierdin commented Mar 21, 2019

mwiget commented Mar 22, 2019

codecov bot commented Mar 23, 2019 • edited Loading

Codecov Report

Mierdin commented Mar 23, 2019

mwiget commented Mar 23, 2019

Mierdin commented Mar 20, 2019 •

edited

Loading

codecov bot commented Mar 23, 2019 •

edited

Loading