
Devise lab provisioner layout data structure for new provisioner. #198

Open
tima opened this issue Sep 27, 2017 · 21 comments

Comments

@tima
Collaborator

tima commented Sep 27, 2017

This is part of the refactor to create a more flexible lab provisioner that can handle the expanding scope of lightbulb content. (See #141)

NOTE: This is a WIP still subject to change. It is also a dependency of #196 and ultimately #197.

Lab Layouts

Issue here: #83

TBD: How to support CentOS AND RHEL here? Different layouts?

  • standard_linux: controller + 3 RHEL nodes
  • standard_linux_ha: controller + 3 RHEL nodes + haproxy
  • standard_linux_mixed: controller + 3 RHEL nodes + 1 Debian node
lightbulb_amis:
  us-east-1:
    some-os-appbundle-vm: ami12312d

lightbulb_lab:
  standard_linux:
    lightbulb_node:
      instance_size: micro
      ami_type: foo
      disk_space: 20
      group: web
@tima
Collaborator Author

tima commented Sep 27, 2017

In the above layout, instead of embedding AMIs for each region, should we use an AMI find search string instead? I'm thinking yes -- what needs to be defined in order for that to work?
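
Roughly what I have in mind -- just a sketch using the existing ec2_ami_find module; the name/owner values are the placeholders from the structure above and ec2_region is assumed to come from the run:

- name: Find the AMI for this profile
  ec2_ami_find:
    name: "foo*"                  # the profile's search value
    owner: blah                   # the profile's owner value
    region: "{{ ec2_region }}"    # assumed to be supplied at run time
    sort: creationDate
    sort_order: descending
    sort_end: 1
  register: ami_find

- name: Use the newest matching AMI
  debug:
    msg: "{{ ami_find.results[0].ami_id }}"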

@tima
Collaborator Author

tima commented Sep 27, 2017

I should also mention that some part of this needs to be usable by #197 and also #180/#177.

@tima
Collaborator Author

tima commented Oct 3, 2017

I went back over my original thoughts, gave some thought to roughly what #197 will need, and came up with this. Feedback appreciated:

lab_layouts:
  {{ lab_layout_name }}:
      {{ inventory_group }}:
          {{ node_name }}: {{ profile }}

aws:
    {{ profile_name }}:
        aws_ami_find: 
            search: foo*
            owner: blah
        aws_size: t2.micro
        aws_storage_gb: 5
azure:
    {{ profile_name }}:
        azure_image:
          offer: centos
          publisher: centos
        azure_size: Standard_A1
        azure_data_disks:
          disk_size_gb: 64
          managed_disk_type: Standard_LRS        

I still have a couple of things I'm not sure about here:

  • Should the cloud providers be nested under a key? If so, what? "cloud" seems most obvious, but we need to consider other potential platforms such as OpenStack, VMware, or OpenShift being introduced. I'm thinking "provider". Open to suggestions.
  • This structure only allows for one group per node. We don't need more than that now, and allowing for it would make basic layouts messy, so I didn't try to support it. Am I missing something?
  • The Azure section is a bit of a guess -- I may be a bit off.

To validate and demo the above structure, I tried to recreate the lab layouts we have in lightbulb to see if this comes together. It's a bit long, but necessary to see whether this structure holds up and can be maintained.

lab_layouts:
    aws_standard_linux_centos:
        control:
          - ansible: aws_tower_medium
        web:
          - node1: aws_centos_micro
          - node2: aws_centos_micro
          - node3: aws_centos_micro
    aws_standard_linux_ha_centos:
        control:
          - ansible: aws_tower_medium
        web:
          - node1: aws_centos_micro
          - node2: aws_centos_micro
          - node3: aws_centos_micro
        haproxy:
          - haproxy: aws_centos_micro
    aws_standard_linux_mixed:
        control:
          - ansible: aws_tower_medium
        web:
          - node1: aws_centos_micro
          - node2: aws_centos_micro
          - node3: aws_centos_micro
          - node4: aws_debian_micro

aws:
  aws_centos_micro:
    aws_ami_find: 
        search: foo*
        owner: blah
    aws_size: t2.micro
    aws_storage_gb: 5
  aws_centos_medium:
    aws_ami_find: 
        search: foo*
        owner: blah
    aws_size: t2.medium
    aws_storage_gb: 20
  aws_tower_medium:
    aws_ami_find: 
        search: foo*
        owner: blah
    aws_size: t2.medium
    aws_storage_gb: 20
  aws_rhel_micro:
    aws_ami_find: 
        search: foo*
        owner: blah
    aws_size: t2.micro
    aws_storage_gb: 5
  aws_rhel_medium:
    aws_ami_find: 
        search: foo*
        owner: blah
    aws_size: t2.medium
    aws_storage_gb: 20
azure:
  azure_centos_A1:
    azure_image:
      offer: centos
      publisher: centos
    azure_size: Standard_A1
    azure_data_disks:
      disk_size_gb: 64
      managed_disk_type: Standard_LRS
  azure_centos_A2:
    azure_image:
      offer: centos
      publisher: centos
    azure_size: Standard_A2
    azure_data_disks:
      disk_size_gb: 64
      managed_disk_type: Standard_LRS
    # Ansible Doesn't Publish Tower images to Azure yet
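
To sanity-check that this can actually be consumed, here is a rough sketch (hypothetical variable names; lab_layout is assumed to hold the chosen layout name):

- name: Show each group and its nodes for the selected layout
  debug:
    msg: "group {{ item.key }} -> {{ item.value }}"
  with_dict: "{{ lab_layouts[lab_layout] }}"

# The actual provisioning task would loop over each node entry and look up its
# machine settings via the profile name, e.g. aws['aws_centos_micro'].aws_size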

Thoughts?

@samdoran
Contributor

samdoran commented Oct 4, 2017

Yes, we should use ec2_ami_find rather than hard coding IDs.

The idea I had for lab topologies is you would specify a cloud platform and a lab topology name. Those two keys would provide everything needed to provision the environments.

In its current state, we have two layers of abstraction: one allows you to list out the nodes in a topology and their name, and those values are used to look up the specific configuration for that node type.

To simplify things, we could just combine these two things into one list of dicts, unless we need to create arbitrary lab topologies, which is how it works today. I'm more in favor of defining the lab topologies explicitly, then just selecting them.

In vars you can have a file for each cloud platform. In tasks you can have a file for each cloud platform.

So tasks/main.yml looks like:

- name: Include cloud variables
  include_vars: "{{ cloud_platform }}.yml"

- name: Provision lab in {{ cloud_platform | capitalize }}
  include_tasks: "{{ cloud_platform }}.yml"

Inside each vars file is a dictionary that keys off the topology.

# vars/aws.yml

lab_topologies:
  web:
    - name: tower
      search: 'some/search/string*'
      size: t2.medium
      disk_space: 20
   # More hosts here

  multi_os:
    - name: ubuntu
      search: 'some/search/string*'
      size: t2.micro
    # More hosts here

  networking:  
    - name: ios-r1
      search: 'some/search/string*'
      size: t2.micro
    # More hosts here
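
A tasks/aws.yml consuming this could then be little more than a loop over the selected topology. A sketch -- lab_topology, ec2_region, and lab_ami_id are assumed variables, and resolving each host's search string to an AMI is glossed over here:

- name: Provision {{ lab_topology }} hosts on AWS
  ec2:
    image: "{{ lab_ami_id }}"        # assumed: resolved earlier from item.search
    instance_type: "{{ item.size }}"
    volumes:
      - device_name: /dev/sda1
        volume_size: "{{ item.disk_space | default(10) }}"
        delete_on_termination: true
    region: "{{ ec2_region }}"
    instance_tags:
      Name: "{{ item.name }}"
    wait: yes
  with_items: "{{ lab_topologies[lab_topology] }}"
  register: ec2_results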

This would allow each cloud platform to have its own set of variables and tasks, which will most likely be quite different between platforms.

If we end up having a bunch of redundant data between the variables, we could look at consolidating some of them, but I think it'd be best to optimize later and just start with separate vars and tasks for each cloud provider.

@tima
Collaborator Author

tima commented Oct 4, 2017

Not sure @samdoran. I agree with separating things and even optimizing later, but I don't want to repeat this exercise again when there's a pretty good indication of what the needs are now and in the near future. The needs for the refactor are being partially driven by the cloud provisioning tasks.

We definitely need to be somewhat flexible about other layouts/topologies given some of the other content I'm seeing planned -- networks, windows, containers etc. We definitely need to support labs on different platforms/clouds -- AWS, Vagrant and Azure. I've heard OpenStack and VMware already floated. Who knows what else someone will want to provision labs on next. Putting all of that in one role starts to become an unwieldy and inflexible monolith and an anti-pattern as these grow. Some of that is unavoidable, but I'd prefer that happen in the playbooks so these roles can be more easily mixed, matched and reused. That is why I'm proposing to move away from having all provisioners handled by one role, and allow where/how the lab is provisioned to be reasonably independent of the deploy and config.

Wouldn't a file like vars/aws.yml above require a lot of repetition for each node? If I'm following you and built out your example, "web" would have the same definition except for the names of nodes 1-3. You'd repeat that for the multi_os layout plus add one for Debian. Then if you did an HA one, you'd have the same as web plus a proxy node. You'd also need another one with Tower preinstalled and another without. Perhaps I've misunderstood though.

@petebowden

@tima do you have this work in a branch somewhere? I'd like to pull it and poke around. Could you push it up?

@tima
Collaborator Author

tima commented Oct 7, 2017

@petebowden: I made a public gist in my account here for you to poke at.

@petebowden

@tima @samdoran

Thinking about this from a roles perspective, I think we need to have a role per provider. Let each cloud provider be responsible for knowing how to create the requested resource. This makes maintenance of each provider independent of the rest of the project, while providing a common interface that each provider must adhere to.

I like the idea of being able to call a single playbook to create a lab, while providing easy customization through variables; i.e. the provisioning playbook includes other roles based upon the cloud provider selected.

We will include a set of defaults for each provider so that, ideally, a user only needs to provide credentials for the provider and the lab can be provisioned, while still allowing details such as VPC, storage, CPUs or memory to be overridden. Documentation will be needed for each provisioner module.

---
# tasks file for /lightbulb/tools/aws_lab_setup/roles/provisioner
- name: Provision with AWS
  include_role:
    name: provision_aws
  when: cloud_environment == "AWS"

- name: Provision with Azure
  include_role:
    name: provision_azure
  when: cloud_environment == "Azure"
---
# tasks file for /lightbulb/tools/aws_lab_setup/roles/provision_aws
- name: Provision basic lab on AWS
  # Do whatever we need to do to provision AWS, e.g.:
  # basic_lab:
  #   control:
  #     - ansible: tower_medium
  #   web:
  #     - node1: centos_micro
  #     - node2: centos_micro
  #     - node3: centos_micro
  #   etc...

- name: Provision networking lab
  # provision a networking lab

For Azure, GCE, RHV you would create a role similar to above. Each provider provides the implementation and we specify the minimum details needed. We do have the opportunity to decide what variables can be shared across different providers, but thinking about the labs, I'm not sure what they would be. Need help here...

What we need to do is enforce what data is returned from the provisioning run, so the rest of the lab can be placed on top. In other words, we need a set of hosts we can then work with to layer on the remainder of the lab components.
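
To make that concrete, I'd expect every provider role to end with something like this (a sketch -- provisioned_hosts and its fields are made-up names for whatever the provider returns):

- name: Register provisioned hosts with the in-memory inventory
  add_host:
    name: "{{ item.name }}"
    groups: "{{ item.group }}"          # e.g. control, web, haproxy
    ansible_host: "{{ item.public_ip }}"
  with_items: "{{ provisioned_hosts }}"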

Sorry for grammar/typos #friday_night

Thoughts?

@tima
Collaborator Author

tima commented Oct 18, 2017

Thanks for your thoughts on this @petebowden. A few comments...

Thinking about this from a roles perspective, I think we need to have a role per provider. Let each cloud provider be responsible for knowing how to create the requested resource.

I agree with one role per cloud provider. That is exactly what I'm refactoring the provisioner for. I was too locked into the way the provisioner is arranged now and should have split the cloud provider info (vars) into separate files that would live in the roles. Good catch.

I like the idea of being able to call a single playbook to create a lab, while providing easy customization through variables; i.e. the provisioning playbook includes other roles based upon the cloud provider selected.

I like the idea in theory, though I'm not sure it will be reality. I see some communities wanting to create provisioners that the main lightbulb repo won't support, which will make it hard to hardcode conditional logic. So instead of ...

- name: Provision with AWS
  include_role:
    name: provision_aws
  when: cloud_environment == "AWS"

- name: Provision with Azure
  include_role:
    name: provision_azure
  when: cloud_environment == "Azure"

...we probably should do...

- name: environment is provisioned
  include_role:
    name: "{{ cloud_provisioner_role }}"

We could optionally default to one of the providers.
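
For example (a sketch -- provision_aws is just the role name used above):

- name: environment is provisioned
  include_role:
    name: "{{ cloud_provisioner_role | default('provision_aws') }}"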

If I'm understanding the second part of your suggestions, the issue I see is flexibility. For instance I'm almost 100% sure the network automation and windows workshops will need different lab layouts. Who knows what else will come next? A lot of other workshop ideas are floating around. I've been in spots where I needed to bring up a slightly different lab for a one-off demo I was doing and really had to hack the tool. That shouldn't be the case. I should have been able to just define my groups and the node name and machine type along with my cloud credentials and go.

@IPvSean: Any thoughts to add with what you're seeing putting together the network automation program?

@tima
Collaborator Author

tima commented Oct 18, 2017

@petebowden I revised my gist of an example cloud provider role layout.

https://gist.github.com/tima/6937100f2e4d069145f5b8a805eb3f1b

Taking a step back from that and thinking out loud here, I have a few questions and concerns:

  1. Should "lab layouts" (open to changing that term) live outside of a cloud provider in a "common" role?
  2. If yes to 1, how would a user add their own layout?
  3. I took out the prefixing (i.e. aws_) -- I'm second guessing that move. That would force the layouts to be tied to specific cloud providers making the previous points moot.

Perhaps I'm overthinking things. ¯\_(ツ)_/¯

Thoughts anyone?

@tima
Collaborator Author

tima commented Oct 18, 2017

I had a chat offline on this one. (I swear I'm not talking to myself.)

I'm thinking the embedded machines and layouts should be specific to each provider and prefixed (best practices!) accordingly. That way a user needing something different can override just the lightbulb_lab_layouts dictionary and reuse the embedded machine definitions, or vice versa, or override them both via extra vars for something totally custom to them.
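
For example, a user could drop in an extra vars file along these lines (file and layout names are made up; the machine definitions are the ones from earlier in this issue) and pass it with -e @my_custom_layout.yml:

# my_custom_layout.yml (hypothetical)
lightbulb_lab_layouts:
  aws_one_off_demo:
    control:
      - ansible: aws_tower_medium
    web:
      - node1: aws_centos_micro
      - node2: aws_centos_micro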

I'm going to update my gist now.

@tima
Collaborator Author

tima commented Oct 18, 2017

The gist has been updated based on the above: https://gist.github.com/tima/6937100f2e4d069145f5b8a805eb3f1b

@tima
Collaborator Author

tima commented Oct 23, 2017

Last call... anything else to add @petebowden or @IPvSean?

@petebowden

@tima Still thinking about this one. Traveling this week.

How are we including the different providers? I'd like the ability to allow a user to ask for hosts to be provisioned or to provide their own hosts, should they already have them built.

How do we define different labs? e.g. networking would require different setups compared to the normal intro to tower.

@tima
Collaborator Author

tima commented Oct 24, 2017

Understood @petebowden.

How are we including the different providers?

Each provider would be its own role doing approximately the same thing for its respective environment. The other roles we create for the provisioner tool should work regardless of provider. OS is a different matter, which is why I want to limit them as much as possible -- i.e. CentOS 7 and RHEL 7 for Linux servers.

I'd like the ability to allow a user to ask for hosts to be provisioned or to provide their own hosts, should they already have them built.

I'm not sure we should concern ourselves with the latter case; however, those users could pretty easily write their own playbook that doesn't use a provisioner role but does make use of the other roles from lightbulb.

How do we define different labs? e.g. networking would require different setups compared to the normal intro to tower.

Those would be defined under lightbulb_lab_layouts. We'd have some common/known layouts in the role itself but those layouts could be overridden with a moderate amount of effort by a user with a var file when they run the provisioner.

Does that clarify?

@ismc

ismc commented Oct 27, 2017

You are encoding the names of the providers into the labels in the data structure. That makes it hard to be agnostic. The network provisioner does more of what @samdoran suggests, with a spec file that specifies the provider and a template. You can deploy the same template across multiple clouds. Your data structure also lacks networks, subnets, routers, VPNs, etc. It seems to assume that everything is in the same network. As an example, here is the network workshop provisioner:

https://github.com/network-automation/ansible-network-workshop

The provisioner passes the template (essentially a common data structure) to a cloud-agnostic role that creates the things (e.g. instances, subnets, routers, etc.) in the cloud/regions specified by the spec file. It seems that we are pretty close here. You are saying that you want a role per provider. I think that is hard because you end up hard-coding the roles in your playbooks. I use a generic 'cloud-networks' role, for example, to create the networks in the template. I just call the appropriate include file. If you want a role per provider, we could have, for example, the cloud-network role read the template (aka common data structure) and then call an actual role (e.g. aws-network). If you are amenable to these things, I could contribute the work we did on the network provisioner to the main provisioner.
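
Roughly, the hand-off I have in mind looks like this (a sketch; the role and variable names are illustrative, not final):

- name: Create the networks described in the template
  include_role:
    name: "{{ cloud_provider }}-network"      # e.g. aws-network
  vars:
    network_spec: "{{ template.networks }}"   # assumed shape of the spec file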

@petebowden

@ismc I like what you're suggesting - it does indeed make it agnostic without needing to do a mess of includes in a role. All we need to do is update a variable file and then add in an additional role for a given provider.

Users could still specify components they've already created, e.g. a VPC, and then let the provider create anything else they need.

@tima - I like this idea. What are your thoughts?

@tima
Collaborator Author

tima commented Oct 30, 2017

I have a lot of concerns @petebowden. The idea of making something cloud agnostic sounds good on paper but in practice will be a fool's errand. If it were reasonably possible, it would have been built into the core Ansible engine.

Looking at this over time and all the use cases, I realized that we aren't going to be able to make a totally agnostic provisioner. We could go mad trying to create and maintain a generic structure for all providers (multiple clouds) and uses. This type of monolith (presumably implemented as a role) is a slippery slope towards a lot of complexity that is not reflective of Ansible's core principles.

So let's set aside the idea of a single lab provisioner for everything like we sort of have now -- we still need something that will let a facilitator spin up labs of any size easily and repeatably. At the same time, having playbooks and roles for every workshop type is bound to create a lot of repetition and overlap, and it wouldn't promote collaboration or reuse. That doesn't seem very scalable.

This is how I came to the "middle ground" idea of having a kit of various roles that would be assembled into playbooks, with docs for common needs and uses. At the same time those roles should be usable for something new or custom without too much effort.

@ismc said "I think that is hard because you end up hard-coding the roles in your playbooks." Really? A role declaration is that much of a hassle? I don't see it that way at all. The idea of roles was to make automation portable and composable. The introduction of the include_role and import_role modules in recent releases pushes that notion further.

This idea of the provisioner being a kit is consistent with the philosophy of lightbulb content, where it can be adapted for various uses rather than being a specific "finished product."

@ismc's observation about supporting multiple networks, subnets, routers etc. is a good one. I see this as being a unique requirement of network automation labs -- an exception rather than a rule.

I'm open to suggestions on how that could be worked in so it serves network automation while still supporting the other use cases and the general design philosophy. A truly cloud-agnostic provisioner is a non-starter to me though.

@ismc

ismc commented Oct 30, 2017

@tima, you misunderstood (or I miscommunicated). Your data model was too specific. I am all for roles; I use them heavily in the network workshop provisioner. As soon as you put all of the data in a data model that is opinionated to a cloud, however, you have hard-coded that cloud. I took the route of a cloud-agnostic data model and a set of roles that translate that data model to the appropriate cloud. I have proven that this works with the network workshop provisioner, which provides a superset of functionality compared to the current lightbulb provisioner. Also, abstracted data models are a pretty standard idea (e.g. YANG).

Since you are saying that there will not be a single lightbulb provisioner and you are declaring my working, cloud-agnostic roles to be a non-starter, I guess that I will not start on the current lightbulb provisioner :) I will continue to work on the lightbulb network workshop provisioner since it is working and a part of a larger overall project of mine.

@IPvSean
Collaborator

IPvSean commented Nov 21, 2017

@tima

This is the layout I came up with to match lightbulb while adding the networking mode
(here is an example of creating a VPC):

- name: Create AWS VPC {{ ec2_name_prefix }}-vpc
  ec2_vpc_net:
    name: "{{ ec2_name_prefix }}-vpc"
    cidr_block: "{{ec2_subnet}}"
    region: "{{ ec2_region }}"
  register: create_vpc
  when: ec2_vpc_id == ""

- name: Create AWS VPC {{ ec2_name_prefix }}-vpc2 (NETWORKING MODE)
  ec2_vpc_net:
    name: "{{ ec2_name_prefix }}-vpc2"
    cidr_block: "{{ ec2_subnet2 }}"
    region: "{{ ec2_region }}"
  register: create_vpc2
  when:
    - ec2_vpc_id2 == ""
    - networking

This is just one task. The alternate way I was thinking of is that there would be a block for essentials (non-networking) and a block for networking, with a when statement for each -- roughly sketched below. I think that might be more confusing for someone reading it... what do you think?
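
Roughly, the block version would be (same tasks as above, just grouped -- a sketch):

- name: Essentials (non-networking)
  block:
    - name: Create AWS VPC {{ ec2_name_prefix }}-vpc
      ec2_vpc_net:
        name: "{{ ec2_name_prefix }}-vpc"
        cidr_block: "{{ ec2_subnet }}"
        region: "{{ ec2_region }}"
      register: create_vpc
  when: ec2_vpc_id == ""

- name: Networking mode only
  block:
    - name: Create AWS VPC {{ ec2_name_prefix }}-vpc2
      ec2_vpc_net:
        name: "{{ ec2_name_prefix }}-vpc2"
        cidr_block: "{{ ec2_subnet2 }}"
        region: "{{ ec2_region }}"
      register: create_vpc2
  when:
    - ec2_vpc_id2 == ""
    - networking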

@IPvSean
Collaborator

IPvSean commented Nov 21, 2017

I guess a third way would be to include_tasks....
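
i.e. something like (the file name is made up):

- name: Provision networking-mode resources
  include_tasks: networking.yml
  when: networking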
