author: cunnie
categories: BOSH, Concourse
date: 2015-10-24 13:52:48 -0700
draft: true
short: How to create Concourse CI workers manually using VMware Fusion VMs
title: The World's Smallest Concourse CI Server

We create a VM with VMware Fusion. We use a configuration similar to Amazon's m3.large (7.5 GiB RAM, 2 vCPUs), for that is the canonical size of a Concourse worker.

  • Ubuntu 14.04.3
  • 2 CPUs
  • 8192 MB RAM
  • 30 GB Disk

(In our case we set the VM's network interface to bridge on the Ethernet interface, hard-code the MAC address (02:00:da:da:b0:b0), and add an entry for it in our DHCP server.)

0.0 Configure VMware Fusion VM

VMware Fusion:

  • Add → New...
  • Select Create a custom virtual machine, click Continue
  • Choose Operating System: Linux → Fedora 64-bit; click Continue
  • click Continue
  • click Customize Settings
    • click Save (location)
    • Processors & Memory
      • 2 processor cores
      • 8192 MB RAM
      • Advanced Options
        • checked: Enable hypervisor applications in this virtual machine
    • click Show All
    • Hard Disk (SCSI)
      • Disk size: 30.00 GB
      • click Apply
      • click Show All
    • CD/DVD (IDE)
      • click the drop down
      • select Choose a disc or disc image...
      • browse to the Fedora ISO (e.g. Fedora-Server-DVD-x86_64-22.iso)
      • click Open
      • close the window
  • click the "▶" (play) button

0.1 Install Ubuntu Server 15.10

  • English
  • Install Ubuntu Server
  • English
  • United States
  • No (detect keyboard layout)
  • English (US)
  • English (US) (keyboard layout)
  • hostname: ci.nono.com
  • new user: cunnie
  • username: cunnie
  • password choose-a-good-password
  • re-enter password
  • No (encrypt homedir)
  • Yes (time zone is correct)
  • Guided - use entire disk (not fond of LVM for single-disk systems)
  • SCSI33 (0,0,0) (sda)...
  • Yes (write changes to disk)
  • press Enter (no HTTP proxy)
  • press Enter (no automatic updates)
  • press Spacebar to select OpenSSH server
  • press Enter (install the GRUB boot loader)
  • press Enter (to reboot)
ssh cunnie@ci.nono.com
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install -y open-vm-tools golang
sudo tee /etc/profile.d/gopath.sh <<-EOF
export GOPATH=~/go
EOF
# log out & log back in again
exit
ssh cunnie@ci.nono.com

We're following the Garden Linux instructions, but modifying them slightly because we believe we know better:

sudo apt-get install -y btrfs-tools

We follow the instructions for setting up the backing store:

sudo su -
backing_store=/opt/garden/btrfs_backing_store
mkdir -p $(dirname $backing_store)
loopback_device=/dev/btrfs_loop
mount_point=/opt/garden/graph

if [ ! -d $mount_point ]
then
    dd if=/dev/zero of=$backing_store bs=1M count=3000  # Here we are writing 3GB. You can adjust this value accordingly.
    mknod $loopback_device b 7 200
    losetup $loopback_device $backing_store
    mkfs.btrfs $backing_store
fi

if cat /proc/mounts | grep $mount_point
then
    echo "btrfs volume already mounted"
else
    echo "mounting btrfs volume"
    mkdir -p $mount_point
    mount -t btrfs $loopback_device $mount_point
fi
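
The mount check above matches the mount point anywhere in a line of /proc/mounts, so a path such as /opt/garden/graph2 would also match. A slightly stricter variant can be sketched as follows; the function name `is_mounted` and its optional second argument are our own, introduced only for illustration, and are not part of the Garden Linux instructions.

```shell
#!/bin/sh
# is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds (exit 0) only if MOUNT_POINT is exactly the second field of some
# line in MOUNTS_FILE (default: /proc/mounts).
# Hypothetical helper; the optional file argument exists to make testing easy.
is_mounted() {
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage, mirroring the check above:
#   if is_mounted /opt/garden/graph; then echo "btrfs volume already mounted"; fi
```

Comparing the whole field avoids false positives from mount points that merely share a prefix.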

Some more modifications:

sudo apt-get -y install docker.io
mkdir -p $GOPATH/src
go get github.com/cloudfoundry-incubator/garden-linux
cd $GOPATH/src/github.com/cloudfoundry-incubator/garden-linux # assuming your $GOPATH has only one entry
make
go build -a -tags daemon -o out/garden-linux

uh-oh

The program 'make' can be found in the following packages:
 * make
 * make-guile
Try: apt-get install <selected package>
# and...
package github.com/docker/docker/autogen/dockerversion: cannot find package "github.com/docker/docker/autogen/dockerversion" in any of:
	/usr/lib/go/src/github.com/docker/docker/autogen/dockerversion (from $GOROOT)
	/root/go/src/github.com/docker/docker/autogen/dockerversion (from $GOPATH)
package github.com/docker/docker/pkg/transport: cannot find package "github.com/docker/docker/pkg/transport" in any of:
	/usr/lib/go/src/github.com/docker/docker/pkg/transport (from $GOROOT)
	/root/go/src/github.com/docker/docker/pkg/transport (from $GOPATH)
apt-get install -y make docker.io
mkdir $GOPATH
go get -d github.com/docker/docker
cd $GOPATH/src/github.com/docker/docker
git checkout v1.7.1
#time make -j 2 binary # ~10 minutes
#time make -j 4 all # 742m21.107s
mkdir -p $GOPATH/src
go get github.com/cloudfoundry-incubator/garden-linux
cd $GOPATH/src/github.com/cloudfoundry-incubator/garden-linux # assuming your $GOPATH has only one entry
make
go build -a -tags daemon -o out/garden-linux

Docker fails on AppArmor, which is stupid & I hate it anyway

OOPS: 991 passed, 16 skipped, 12 FAILED
--- FAIL: Test (3637.98s)
FAIL
coverage: 75.9% of statements
exit status 1
FAIL	github.com/docker/docker/integration-cli	3638.052s
---> Making bundle: .integration-daemon-stop (in bundles/1.9.0-dev/test-integration-cli)
+++++ cat bundles/1.9.0-dev/test-integration-cli/docker.pid
++++ kill 8945
++++ /etc/init.d/apparmor stop
 * Clearing AppArmor profiles cache
   ...done.
All profile caches have been cleared, but no profiles have been unloaded.
Unloading profiles will leave already running processes permanently
unconfined, which can lead to unexpected situations.

To set a process to complain mode, use the command line tool
'aa-complain'. To really tear down all profiles, run the init script
with the 'teardown' option."
Makefile:42: recipe for target 'all' failed
make: *** [all] Error 1

Let's try installing garden-linux:

mkdir -p $GOPATH/src
go get github.com/cloudfoundry-incubator/garden-linux
cd $GOPATH/src/github.com/cloudfoundry-incubator/garden-linux # assuming your $GOPATH has only one entry
make
go build -a -tags daemon -o out/garden-linux

0.1 Install Fedora Server 22

  • select Install Fedora 22 and press Enter

We were stopped in our tracks because of this issue.

Per Glyn Normington: "The problem is that we haven't yet started supporting centos or systemd."

Install Ubuntu 14.04.3

I ditched the Ubuntu install when I realized it had golang 1.2.1, which is really ancient.
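
A quick version check before starting the build would have caught this earlier. The sketch below is one way to do it; the helper name `version_ge` is ours, it relies on `sort -V` from GNU coreutils, and the threshold of 1.4 is purely illustrative (we have not confirmed a minimum Go version for garden-linux).

```shell
#!/bin/sh
# version_ge HAVE WANT
# Succeeds if dotted version HAVE is greater than or equal to WANT.
# Hypothetical helper; depends on `sort -V` (GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example guard. "1.4" is an assumed threshold, not a documented requirement.
if command -v go >/dev/null 2>&1; then
    # `go version` prints e.g. "go version go1.2.1 linux/amd64"
    have=$(go version | awk '{ sub(/^go/, "", $3); print $3 }')
    version_ge "$have" 1.4 || echo "Go $have may be too old for this build" >&2
fi
```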

  • Connect the VM's CD drive to the Ubuntu ISO image (ubuntu-14.04.3-desktop-amd64.iso)
  • boot the VM
  • go through the Ubuntu installation process
  • we log in via the console as the user we created in the
  • bring up a terminal window inside the VM and run the following commands to install pre-requisite software:
  • log in via the console (or via ssh)
sudo apt-get -y install git automake autoconf
sudo apt-get -y install e2fsprogs e2fslibs e2fslibs-dev libblkid-dev
sudo apt-get -y install zlib1g-dev gcc liblzo2-dev
# fixes
# root@ci:~/go/src/github.com/docker/docker# make
#   mkdir bundles
#   docker build -t "docker-dev:master" .
#   /bin/sh: 1: docker: not found
sudo apt-get -y install docker.io
sudo su -
# fixes `dd: failed to open ‘/opt/garden/btrfs_backing_store’: No such file or directory`
sudo mkdir -p /opt/garden
  • Gearbox → System Settings → Displays
  • Resolution: 1024 × 768 (4:3)
  • click Apply, click Keep this Configuration

We need to get garden-linux's dependencies [Docker]:

mkdir $GOPATH
go get -d github.com/docker/docker
cd $GOPATH/src/github.com/docker/docker
make all
#bash project/make/.go-autogen
#hack/make.sh ubuntu
go get github.com/cloudfoundry-incubator/garden-linux

cd $GOPATH
...
imports github.com/docker/docker/autogen/dockerversion: cannot find package "github.com/docker/docker/autogen/dockerversion" in any of:
	/usr/lib/go/src/pkg/github.com/docker/docker/autogen/dockerversion (from $GOROOT)
	/root/go/src/github.com/docker/docker/autogen/dockerversion (from $GOPATH)
...

imports github.com/docker/docker/pkg/transport: cannot find package "github.com/docker/docker/pkg/transport" in any of:
	/usr/lib/go/src/pkg/github.com/docker/docker/pkg/transport (from $GOROOT)
	/root/go/src/github.com/docker/docker/pkg/transport (from $GOPATH)

The pkg/transport package seems to have been removed from Docker around v1.8.0:

git show 276c640be4b4335e3b8d684cb3562a56d3337b39
commit 276c640be4b4335e3b8d684cb3562a56d3337b39
Author: Tibor Vass <tibor@docker.com>
Date:   Sun May 17 05:07:48 2015 -0400

    remove pkg/transport and use the one from distribution

    Signed-off-by: Tibor Vass <tibor@docker.com>
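
Before running `go get` on garden-linux, it can save time to check whether the Docker checkout actually contains the two packages named in the errors above. The following is a minimal sketch; the function name `missing_docker_pkgs` is our own invention. Our guess, based on the commented-out `.go-autogen` line earlier, is that autogen/dockerversion is generated during the Docker build, so it may be absent from a fresh checkout even at an older tag.

```shell
#!/bin/sh
# missing_docker_pkgs DOCKER_SRC_DIR
# Prints each package directory (of the two named in the errors above) that
# is absent from the Docker source tree. Returns non-zero if any is missing.
# Hypothetical helper, not part of Docker or Garden Linux.
missing_docker_pkgs() {
    status=0
    for pkg in autogen/dockerversion pkg/transport; do
        if [ ! -d "$1/$pkg" ]; then
            echo "$pkg"
            status=1
        fi
    done
    return $status
}

# Usage:
#   missing_docker_pkgs "$GOPATH/src/github.com/docker/docker"
```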

We install Garden Linux using the README:

sudo su -
apt-get update
apt-get install -y asciidoc xmlto --no-install-recommends
apt-get install -y pkg-config autoconf
apt-get build-dep -y btrfs-tools

mkdir -p /tmp/btrfs
cd /tmp/btrfs
git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
./autogen.sh
./configure
make && make install

Caveats

Note that our decision to manually create a VM and install garden-linux by hand (an admittedly unnatural act) was born of two requirements:

  1. a desire to expose hardware virtualization to the worker VMs/Containers
  2. financial (we wouldn't mind spending $80 for a VMware Fusion license; however, we balked at spending $3,495 for a VMware vSphere Enterprise Plus license)

We wanted to expose hardware virtualization to the containers to enable us to run Intel's Hardware Accelerated Execution Manager (HAXM), which "uses Intel Virtualization Technology (VT) to speed up Android app emulation on a host machine" (https://software.intel.com/en-us/blogs/2012/03/12/how-to-start-intel-hardware-assisted-virtualization-hypervisor-on-linux-to-speed-up-intel-android-x86-emulator).

Had we not needed to expose hardware virtualization, we would have opted for a BOSH Lite deployment (sample manifest here). Unfortunately, BOSH Lite only supports the VirtualBox and AWS Vagrant providers, not the VMware Fusion provider, and VirtualBox does not support nested virtualization [VirtualBox].

We discovered that by migrating our CI to Concourse and Houdini, the "World's Worst Containerizer", we were able to decrease the duration of our tests 400% (from 9 minutes 22 seconds to something).

, who would like to reduce their feedback cycle, or who would like to test against a variety of ABIs (currently Travis doesn't support x86-based or x86_64-based emulators, only ARM emulators). We discovered that by migrating our Android application's Continuous Integration (CI) from Travis CI (a CI service provider) to a custom solution using Concourse CI, a CI server developed internally at Pivotal Software, we were able to assert greater control over our CI environment: we could connect to our build containers to troubleshoot failed builds, and we had more flexibility in choosing our target SDK (e.g. we were able to target Android-23, Marshmallow).

This blog post may be of interest to Android developers who use continuous integration and who need greater control over their CI environment. It describes setting up a Concourse Server on Amazon AWS. Subsequent posts will discuss configuring and provisioning the Concourse workers and containers.

1. Configure Concourse Worker, Pipeline, and Job

1.0 Verify There Are No Concourse Workers

We check Concourse to make sure no workers are registered: https://ci.blabbertabber.com/api/v1/workers (substitute your server's URL as appropriate; you will need to authenticate). We should see an empty JSON array (i.e. "[ ]").

1.1 Download, Install, and Start Houdini

The Concourse worker needs Houdini, "The World's Worst Containerizer", to implement the Garden Linux container API so that the Concourse remote worker can spin up containers.

curl -L https://github.com/vito/houdini/releases/download/2015-10-09/houdini_darwin_amd64 -o ~/Downloads/houdini
install ~/Downloads/houdini /usr/local/bin
mkdir -p ~/workspace/houdini/containers
cd ~/workspace/houdini/
houdini

We see the following output:

{"timestamp":"1445517108.557929754","source":"houdini","message":"houdini.started","log_level":1,"data":{"addr":"0.0.0.0:7777","network":"tcp"}}

1.2 Manually Provision Worker

We follow the instructions [workers] to manually provision our remote worker:

[FIXME: why override UserKnownHostsFile? Why not opt for the default or "StrictHostKeyChecking no"?]

mkdir -p ~/workspace/houdini
cd ~/workspace/houdini
cat > worker.json <<EOF
{
    "platform": "darwin",
    "tags": [],
    "resource_types": []
}
EOF
TSA_HOST=ci.blabbertabber.com
GARDEN_ADDR=0.0.0.0:7777
ssh -p 2222 $TSA_HOST \
      -i ~/.ssh/worker_key \
      -o UserKnownHostsFile=host_key.pub \
      -R0.0.0.0:0:$GARDEN_ADDR \
      forward-worker \
      < worker.json

We see the following output:

Warning: Permanently added '[ci.blabbertabber.com]:2222,[52.0.76.229]:2222' (RSA) to the list of known hosts.
Allocated port 35509 for remote forward to 0.0.0.0:7777
2015/10/22 13:11:58 heartbeat took 177.775696ms
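
Because a stray comma in worker.json causes registration to fail (see the [tsa] troubleshooting note), it can be worth checking the file before piping it to the tsa. The sketch below is a deliberately naive check, not a JSON validator: it only looks for a comma followed by a closing brace or bracket, and it would misfire on a string value that happens to contain such a sequence. The function name `has_trailing_comma` is ours.

```shell
#!/bin/sh
# has_trailing_comma FILE
# Succeeds (exit 0) if FILE contains a comma followed, after optional
# whitespace or newlines, by '}' or ']'.
# Hypothetical helper; a heuristic, not a full JSON parser.
has_trailing_comma() {
    tr -d '\n' < "$1" | grep -Eq ',[[:space:]]*[]}]'
}

# Usage:
#   if has_trailing_comma worker.json; then
#       echo "worker.json has a trailing comma" >&2
#   fi
```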

1.3 Verify There Is One Concourse Worker

We check Concourse to make sure our worker is registered: https://ci.blabbertabber.com/api/v1/workers (substitute your server's URL as appropriate; you will need to authenticate). We should see the following JSON:

[{"addr":"127.0.0.1:47274","baggageclaim_url":"","active_containers":0,"resource_types":null,"platform":"darwin","tags":[],"name":"127.0.0.1:47274"}]

If instead you see [ ], then you'll need to troubleshoot the tsa [tsa] daemon.
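
The same check can be made from the command line. The sketch below counts the worker entries in the API response by counting "name" keys, which is a crude heuristic that works for responses shaped like the one above; a real JSON parser would be more robust. The function name `count_workers` is our own, and the use of HTTP basic authentication in the usage comment is an assumption about how the server is configured.

```shell
#!/bin/sh
# count_workers
# Reads the JSON from /api/v1/workers on stdin and prints the number of
# worker entries, counted naively as the number of "name" keys.
# Hypothetical helper, not part of Concourse.
count_workers() {
    grep -o '"name":' | wc -l | tr -d ' '
}

# Usage (USER and PASSWORD are placeholders; basic auth is assumed):
#   curl -s -u USER:PASSWORD https://ci.blabbertabber.com/api/v1/workers | count_workers
```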

3. Conclusion

We were pleased with our switch to Concourse; however, switching CI platforms is not a trivial decision, and we switched because Travis CI no longer met our requirements. Don't switch to Concourse if Travis CI meets your needs.

Travis CI has several advantages:

  • free (at least for Open Source projects)
  • relatively easy to configure (a single .travis.yml file in repo)
  • tight GitHub integration, e.g. Travis CI runs pull requests and updates the pull request's status page.
  • badges (e.g. Build Status)

We have serious concerns over the security implications of using our OS X workstation as a remote Concourse worker. Should the Concourse server be compromised, our workstation will be compromised, too. Hosting a Concourse worker on one's personal workstation should be viewed as a proof-of-concept, not as a production-ready solution.

Alternative, more secure solutions would include the following:

  • using a more orthodox Concourse deployment (with an m3.large Concourse worker VM) (disadvantage: would be restricted to the ARM ABI for the Android emulators)
  • using a Linux VM on a firewalled network (with hardware virtualization enabled to allow ABIs other than ARM).

[workers] The instructions for manually provisioning Concourse workers can be found in the Concourse documentation. Additional information can be found in the GitHub repo for Concourse's atc.

[tsa] The Concourse server's file /var/vcap/sys/log/tsa/tsa.stdout.log often contains important troubleshooting information. For example, when troubleshooting our server, we do the following:

ssh -i ~/.ssh/aws_nono.pem vcap@ci.blabbertabber.com # BOSH account is always 'vcap'
# Last login: Fri Oct 23 11:17:21 2015 from 24.23.190.188
sudo su - # password is 'c1oudc0w'
tail -f /var/vcap/sys/log/tsa/tsa.stdout.log

When we see the following message in the log, it indicates our worker.json is malformed (in this case, an unexpected comma):

{"timestamp":"1445598975.989170074","source":"tsa","message":"tsa.connection.forward-worker.failed-to-register","log_level":2,"data":{"error":"invalid character '}' looking for beginning of object key string
","session":"18.2"}}

When we see the following message in the log, it indicates that the tsa is failing to authenticate against the atc. Check the BOSH manifest to make sure that jobs.*.properties.atc.basic_auth_username matches jobs.*.properties.tsa.atc.username and that jobs.*.properties.atc.basic_auth_password matches jobs.*.properties.tsa.atc.password.

This may also be caused by a mis-set jobs.*.properties.tsa.atc.address.

(The HTTP/1.1 Status Code 401, "Unauthorized", is an important clue).

{"timestamp":"1445599186.677824974","source":"tsa","message":"tsa.connection.forward-worker.register.start","log_level":1,"data":{"session":"20.2.31","worker-address":"127.0.0.1:52502","worker-platform":"darwin","worker-tags":""}}
{"timestamp":"1445599186.868321419","source":"tsa","message":"tsa.connection.forward-worker.register.bad-response","log_level":2,"data":{"session":"20.2.31","status-code":401}}
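
When the log is busy, it can help to pull out only the status codes of failed registrations. This is a minimal sketch that assumes the log lines look like the two shown above; the function name `tsa_bad_responses` is ours.

```shell
#!/bin/sh
# tsa_bad_responses
# Reads tsa log lines on stdin and prints the HTTP status code of every
# forward-worker.register.bad-response entry, one per line.
# Hypothetical helper; assumes the log format shown above.
tsa_bad_responses() {
    grep 'forward-worker.register.bad-response' |
        sed -n 's/.*"status-code":\([0-9][0-9]*\).*/\1/p'
}

# Usage, on the Concourse server:
#   tsa_bad_responses < /var/vcap/sys/log/tsa/tsa.stdout.log
```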

Although Travis CI is free for Open Source projects, the price climbs to $1,548/year ($129/month) for closed source projects.

Footnotes

[Docker] Installing Garden Linux's Docker dependencies can be problematic; blindly typing go get github.com/cloudfoundry-incubator/garden-linux without prepping the Docker build will result in the following errors:

imports github.com/docker/docker/autogen/dockerversion: cannot find package "github.com/docker/docker/autogen/dockerversion" in any of:
	/usr/lib/go/src/pkg/github.com/docker/docker/autogen/dockerversion (from $GOROOT)
	/root/go/src/github.com/docker/docker/autogen/dockerversion (from $GOPATH)

and

imports github.com/docker/docker/pkg/transport: cannot find package "github.com/docker/docker/pkg/transport" in any of:
	/usr/lib/go/src/pkg/github.com/docker/docker/pkg/transport (from $GOROOT)
	/root/go/src/github.com/docker/docker/pkg/transport (from $GOPATH)

[VirtualBox] VirtualBox as of this writing does not support nested virtualization (VT-in-VT), although interestingly the VirtualBox support ticket has many comments from those wishing to use it for the Android emulator.

[android-23] We discovered a bug when we upgraded our Travis CI to use the latest Android emulator (API 23, Marshmallow). Specifically our builds would fail with com.android.ddmlib.ShellCommandUnresponsiveException. The problem was posted to StackOverflow, but no solution was offered (at the time of this writing). The problem may lie with the image, not with Travis-CI: according to one developer, "something is up with the API 23 Google API emulator image".

[Travis] Travis CI does not permit ssh'ing into the container to troubleshoot the build. That, coupled with long feedback times, leads to a frustrating cycle of making small changes, pushing them, waiting 6 minutes to determine if they fixed the problem, and starting again.
