Allow the creation of an archive tarball for airgapped environments #12

Merged
raskchanky merged 1 commit into master from jb/download-not-install on Mar 24, 2018

Conversation

raskchanky (Contributor):

I wrote this script a while ago but shelved it because we didn't have a way to download transitive dependencies for hart files. That was necessary because when we go to upload everything to a new depot, the only way to guarantee that everything will upload properly is if we downloaded all the deps when the tarball was created. Simply getting the latest stable of everything isn't enough; some packages have deps that are older than the latest stable.

I've successfully tested this script on an EC2 instance with Builder running in local dev env mode and everything seemed to work.

It's worth noting, however, that in order to upload packages to a new depot, you need to have HAB_AUTH_TOKEN set. This will necessitate either logging into the UI and generating a token there, or using a GH auth token (which is what I did when testing on EC2).

I'm sure some of this could be optimized a bit. I'm not a bash wizard.

Signed-off-by: Josh Black <raskchanky@gmail.com>

key_count=$((key_count+1))
echo ""
echo "[$key_count/$key_total] Importing $key"
cat $key | hab origin key import
Contributor:

Do we also need to do a 'hab origin key upload' to the on-prem depot?

raskchanky (author):

No, because once we start uploading packages, hab will auto-upload any keys it needs.

Contributor:

Ah yes.. cool 👍

Reviewer:

I find

hab origin key import < $key

a bit simpler

Reviewer:

It's too bad that hab origin key import doesn't take a file argument; otherwise you could do this whole thing with just

find . -type f -name "*.pub" | xargs -n1 hab origin key import

chefsalim (Contributor):

Looks good! Just one comment inline.

raskchanky requested review from fnichol and baumanj on March 22, 2018 at 00:48
# Setting this variable to any value will cause the cleanup to be skipped. By
# default, the script will clean up after itself.

set -eo pipefail
baumanj (Mar 22, 2018):

Any reason not to have u here as well (i.e., set -euo pipefail)? I tend to find that depending on the behavior of unset variables makes code harder for me to understand.

Adding it will require a few changes, but I'll try to call them out below.

raskchanky (author):

I wanted to explicitly check for the presence of the env vars that I care about, as a way to provide a better UX to the person running this script, instead of letting the script error out with whatever bash's default error is for unbound variables.

Reviewer:

Totally. But you can get both, I think. Let me know if what I'm suggesting below fails to achieve that.

exit 1
fi
done
}
Reviewer:

This can be simplified to

check_tools() {
  for tool
  do
    if ! exists "$tool"; then
      echo "Please install $tool and run this script again."
      exit 1
    fi
  done
}

And then instead of needing to

    required_tools=( aws git curl jq xzcat )
    required_vars=( AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY )

    check_tools required_tools

you can just

    check_tools aws git curl jq xzcat

raskchanky (author):

Wow that's crazy. So the for loop at the beginning of check_tools() implicitly loops through whatever arguments were passed to the function?

Reviewer:

Yup, that's the default behavior of for when it has no list. I think it's a worthwhile idiom since that's such a common thing to do with functions. You can change the positional parameters with an explicit set command too, but that strikes me as a bit weird:

$ set -- 1 2 3
$ for x; do echo $x; done
1
2
3
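
(For illustration, the same implicit iteration inside a function; this is just a sketch of the idiom, not code from the script:)

    print_args() {
      for arg        # no "in ..." list, so this iterates over the function's "$@"
      do
        echo "$arg"
      done
    }

    print_args foo bar baz    # prints foo, bar, and baz on separate lines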

exit 1
fi
done
}
Reviewer:

Similar to check_tools:

check_vars() {
  for var
  do
    if [ -z "${!var:-}" ]; then
      echo "Please ensure that $var is exported in your environment and run this script again."
      exit 1
    fi
  done
}

And then:

check_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY

Note that the change from "${!var}" to "${!var:-}" is necessary to support set -u.

raskchanky (author):

Ah, I didn't know about the :- syntax.
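
(For reference, a quick sketch of how the :- default and indirect expansion behave under set -u; none of this is from the script itself:)

    set -u
    unset MAYBE
    echo "${MAYBE:-}"          # prints an empty line instead of an "unbound variable" error
    var=HOME
    echo "${!var}"             # indirect expansion: the value of $HOME
    echo "${!var:-not set}"    # both combined; safe even if the named variable were unset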

bucket="${HAB_ON_PREM_BOOTSTRAP_BUCKET_NAME:-habitat-on-prem-builder-bootstrap}"
marker="LATEST.tar.gz"

case "$1" in
Reviewer:

case "${1:-}" in

for set -u

mkdir -p "$tmp_dir/keys"

# download keys first
keys=$(curl -s -H "Accept: application/json" "$upstream_depot/v1/depot/origins/core/keys" | jq ".[] | .location")
Reviewer:

Kind of tangential, but it would be nice if we could get jq (and any other required utils) from a hab package. I don't see that jq is currently provided via hab pkg provides jq.

raskchanky (author):

Yeah, that's a great point, and one that @fnichol brought up to me too. It'd ultimately be really nice to write a plan for this script and include all the deps there. That would make for a much more streamlined UX and we could skip all the dep checking in this script as well. I'll loop back around and try to tackle that at the end if I have some time.

Reviewer:

Also, I just realized that hab pkg provides only searches locally installed packages. We do have a package: core/jq-static
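
(For reference, installing and running it through hab would look roughly like this; core/jq-static is the package name mentioned above, and the rest is a sketch:)

    hab pkg install core/jq-static
    hab pkg exec core/jq-static jq --version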

for k in $keys
do
key=$(echo $k | tr -d '"')
release=$(echo $key | awk -F '/' '{ print $5 }')
baumanj (Mar 22, 2018):

I find

grep -Eo '[0-9]+$'

or perhaps

cut -d '/' -f 5

a bit easier to understand as "pull the digits off the end", than

awk -F '/' '{ print $5 }'

But it may be down to personal preference

raskchanky (author):

I consider it a personal victory any time I can work awk into a shell script, as it's one of my favorite utilities. However, I do agree that the cut command you suggest is a bit clearer. I'll change it.
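
(For illustration, assuming a key location of the form /origins/core/keys/<release>; the exact path layout is an assumption here, not taken from the API response:)

    key="/origins/core/keys/20180119235913"
    cut -d '/' -f 5 <<< "$key"              # 20180119235913
    grep -Eo '[0-9]+$' <<< "$key"           # 20180119235913
    awk -F '/' '{ print $5 }' <<< "$key"    # 20180119235913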

done
popd

cd /tmp
Reviewer:

Since you're doing this cd immediately afterward, I'd suggest replacing the pushd/popd pair with just a cd "$core".

Alternatively, there's an idiom that I quite like that makes it more visually apparent that a certain range of commands runs in a different directory and removes the need to remember the popd:

    ( cd "$core"
      dir_list=$(find . -type f -name "plan.sh" -printf "%h\n" | sed -r "s|^\.\/||" | sort -u)
      …
    )
    cd /tmp

The main benefit here is that if the cd /tmp is ever moved or removed, you'll maintain the functionality. The main caveat is that if you want to exit the entire script from within the subshell, you'll need to add

    ) || exit

to the end.


git clone https://github.com/habitat-sh/core-plans.git "$core"
pushd "$core"
dir_list=$(find . -type f -name "plan.sh" -printf "%h\n" | sed -r "s|^\.\/||" | sort -u)
baumanj (Mar 22, 2018):

sed -r "s|^\.\/||"

could be changed to

xargs basename -a

Which I find somewhat more readable

raskchanky (author):

Agreed

keys=$(curl -s -H "Accept: application/json" "$upstream_depot/v1/depot/origins/core/keys" | jq ".[] | .location")
for k in $keys
do
key=$(echo $k | tr -d '"')
Reviewer:

In general the

echo $foo | cmd

idiom can be replaced with

cmd <<< $foo

Which I find a bit more evocative. So, in this case, you'd get

      key=$(tr -d '"' <<< $k)

continue
fi

slash_ident=$(echo "$raw_ident" | jq "\"\(.origin)/\(.name)/\(.version)/\(.release)\"" | tr -d '"')
Reviewer:

Single-quoting the jq argument might make this a bit more readable:

jq '"\(.origin)/\(.name)/\(.version)/\(.release)"'

raskchanky (author):

Good call

# now extract the target and tdeps and download those too
if [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TARGET') ]] && [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TDEPS') ]]; then
target=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TARGET)
tdeps=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TDEPS)
baumanj (Mar 22, 2018):

Since this ends up decompressing the entire hart potentially several times, I think we could improve performance as well as cut down on code duplication:

          local_tar=$(basename "$local_file" .hart).tar
          tail -n +6 "$local_file" | unxz > "$local_tar"

Also, we don't need to use grep multiple times to check for TARGET and TDEPS; we can ask tar directly:

          if tar tf "$local_tar" --no-anchored TARGET TDEPS > /dev/null 2>&1; then

Reviewer:

In my not very rigorous testing this seemed to speed up the hart download phase from ~45 minutes to ~30.

raskchanky (author):

Nice, I'll give this a shot

if [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TARGET') ]] && [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TDEPS') ]]; then
target=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TARGET)
tdeps=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TDEPS)
dep_total=$(echo "$tdeps" | wc -l)
Reviewer:

I'd similarly suggest using an array, as before:

            readarray -t tdep_array <<< "$tdeps"
            dep_total=${#tdep_array[*]}

target=$(echo "$latest" | jq ".target" | tr -d '"')

# check to see if we have this file before fetching it again
local_file="$tmp_dir/harts/$dash_ident-$target.hart"
Reviewer:

Since $dash_ident is only used once, I think it makes sense to apply the same approach you did later on:

      local_file="$tmp_dir/harts/$(echo $slash_ident | tr '/' '-')-$target.hart"

or if you prefer this style:

      local_file="$tmp_dir/harts/$(tr '/' '-' <<< $slash_ident)-$target.hart"

This has the added benefit of reducing duplication of the somewhat tricky jq call.

raskchanky (author):

Good idea

else
echo "[$pkg_count/$pkg_total] [$dep_count/$dep_total] Downloading $dep"
curl -s -H "Accept: application/json" -o $file_to_check "$upstream_depot/v1/depot/pkgs/$dep/download"
fi
Reviewer:

There's a decent bit of duplicated logic here, and it's somewhat far from the other instance at line 123. I think that increases the risk of a fix made in one place not being applied to both, which increases the value of deduplication. One approach would be to make a function that does the downloading if necessary and communicates whether a new file was downloaded via its return status:

download_hart_if_missing() {
  local local_file=$1
  local slash_ident=$2
  local status_line=$3

  if [ -f "$local_file" ]; then
    echo "$status_line $slash_ident is already present in our local directory. Skipping download."
    return 1
  else
    echo "$status_line Downloading $slash_ident"
    curl -s -H "Accept: application/json" -o "$local_file" "$upstream_depot/v1/depot/pkgs/$slash_ident/download"
  fi
}

Then the two instances could be replaced with:

        if download_hart_if_missing "$local_file" "$slash_ident" "[$pkg_count/$pkg_total]"; then
          # now extract the target and tdeps and download those too
              download_hart_if_missing "$file_to_check" "$dep" "[$pkg_count/$pkg_total] [$dep_count/$dep_total]" || true

The || true is necessary in the second case because there's nothing to do if $file_to_check is already downloaded, but we don't want the script to exit on account of set -e.

raskchanky (author):

Thanks for this. I knew when I was writing this that this was duplicated, and I knew a function was the answer, but I couldn't put it all together in a way that made sense. I'll try this out.

echo "Done."
;;
populate-depot)
if [ -z "$2" ]; then
Reviewer:

"${2:-}" for set -u

Reviewer:

Also, I'd suggest assigning $2 to something like depot_url to improve readability.
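
(Combining the two suggestions, that branch of the case statement might start roughly like this; the help/exit handling is assumed, so treat this as a sketch rather than the script's final code:)

    populate-depot)
      depot_url="${2:-}"
      if [ -z "$depot_url" ]; then
        help
        exit 1
      fi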


set -eo pipefail

help() {
Reviewer:

help is a bash builtin, so it might be better to call this usage. Also, since all the callers immediately exit afterwards, it would save a few lines to add the exit 1 to this function.

raskchanky (author):

Ah, I had no idea. I'll change it.
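
(A minimal sketch of the rename with the exit folded in; the usage text itself is illustrative, not taken from the script:)

    usage() {
      echo "Usage: $0 <subcommand> [args], e.g. populate-depot <depot_url>" >&2
      exit 1
    }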

s3_cp $tar_file s3://$bucket/
s3_cp s3://$bucket/$bootstrap_file s3://$bucket/$marker

if [ -z "$HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP" ]; then
Reviewer:

Would it make sense to put this in a function and add it to an EXIT trap?

Also, "${HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP:-}" and I think it's easier to reason with fewer negated conditions so perhaps:

    if [ "${HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP:-}" ]; then
      echo "Cleanup skipped."
    else
      echo "Cleaning up."

or

    if [ "${HAB_ON_PREM_BOOTSTRAP_CLEAN_UP:-true}" ]; then
      echo "Cleaning up."
    else
      echo "Cleanup skipped."

for p in $dir_list
do
pkg_count=$((pkg_count+1))
echo ""
Reviewer:

The "" is unnecessary. echo with no args will produce a newline.

raskchanky (author):

I didn't know that, thanks. Will change.

echo ""
echo "Importing keys"
keys=$(find . -type f -name "*.pub")
key_total=$(echo "$keys" | wc -l)
Reviewer:

As above, I'd suggest arrays for the keys and harts below
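
(Roughly, for the keys; the harts list would follow the same readarray pattern. The import line mirrors the form suggested earlier, and the rest is a sketch rather than the script's code:)

    readarray -t key_array < <(find . -type f -name "*.pub")
    key_total=${#key_array[*]}
    for key in "${key_array[@]}"
    do
      hab origin key import < "$key"
    done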

n=$((n+1))
echo "Upload failed. Sleeping 5 seconds and retrying."
sleep 5
done
baumanj (Mar 23, 2018):

You can save yourself some math with

      for _ in {1..5}
      do
        hab pkg upload --url "$2" --channel stable "$hart" && break
        echo "Upload failed. Sleeping 5 seconds and retrying."
        sleep 5
      done


echo "Package uploads finished."

if [ -z "$HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP" ]; then
Reviewer:

Per my other comment, using a trap here would allow code sharing.

baumanj commented Mar 23, 2018:

Overall, I think this looks great. It's well-organized and easy to understand.

In addition to the comments I made, there are a number of places that trigger shellcheck lints for double quoting. I'd suggest running shellcheck yourself (we even have a hab package) to fix those up.
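
(For reference, running shellcheck through hab might look like this; core/shellcheck is assumed to be the package name, and the script filename is just a placeholder:)

    hab pkg install core/shellcheck
    hab pkg exec core/shellcheck shellcheck ./your-script.sh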

raskchanky force-pushed the jb/download-not-install branch from 3406f7c to a2a1b94 on March 23, 2018 at 18:19
if download_hart_if_missing "$local_file" "$slash_ident" "[$pkg_count/$pkg_total]"; then
# now extract the target and tdeps and download those too
local_tar=$(basename "$local_file" .hart).tar
tail -n +6 "$local_file" | unxz > "$local_tar"
Reviewer:

It occurs to me that this 6 assumes the structure of the hart file, so it might be good to do some future-proofing with a check like:

          if [ "$(head -n1 "$local_file")" != HART-1 ]; then
            echo "Unsupported hart file version"
            exit 1
          fi

raskchanky force-pushed the jb/download-not-install branch from a2a1b94 to 406d87d on March 23, 2018 at 18:54
raskchanky force-pushed the jb/download-not-install branch 3 times, most recently from 86ee58b to 8155714 on March 23, 2018 at 22:43
Signed-off-by: Josh Black <raskchanky@gmail.com>
raskchanky force-pushed the jb/download-not-install branch from 8155714 to e2415c4 on March 23, 2018 at 22:44
chefsalim (Contributor):

One more quick comment (not a blocker for this PR): I just realized that we may need an option to populate the depot with an already downloaded tar file. That way, if the install point is truly airgapped, we won't need to reach out to S3 for the tar file. We can tweak that after this PR.

raskchanky (author):

I implemented many of the suggestions that were left, though not all of them, and verified that everything still works.

raskchanky merged commit 06a44a6 into master on Mar 24, 2018
raskchanky deleted the jb/download-not-install branch on March 24, 2018 at 04:13
raskchanky (author):

@chefsalim Good point. I made an issue to track this.
