Allow the creation of an archive tarball for airgapped environments #12
Conversation
scripts/on-prem-archive.sh (outdated)

```sh
key_count=$((key_count+1))
echo ""
echo "[$key_count/$key_total] Importing $key"
cat $key | hab origin key import
```
Do we also need to do a 'hab origin key upload' to the on-prem depot?
No, because once we start uploading packages, hab will auto-upload any keys it needs.
Ah yes.. cool 👍
I find `hab origin key import < $key` a bit simpler.
It's too bad that `hab origin key import` doesn't take a file argument, otherwise you could do this whole thing with just:

```sh
find . -type f -name "*.pub" | xargs -n1 hab origin key import
```
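To illustrate the idiom with something runnable: below, `cat` stands in for a hypothetical file-accepting command, and a `while read` loop shows the workaround for a stdin-only tool like `hab origin key import`. The file names and contents are made up for the demo.

```shell
# Illustrative only: `cat` plays the role of a command that accepts a file
# argument; `wc -c < "$f"` plays the role of one that only reads stdin.
tmp=$(mktemp -d)
printf 'AAA' > "$tmp/one.pub"
printf 'BB'  > "$tmp/two.pub"

# If the tool took a file argument, xargs -n1 runs it once per file:
concatenated=$(find "$tmp" -type f -name "*.pub" | sort | xargs -n1 cat)

# For a stdin-only tool, a read loop redirects each file instead:
total=0
while read -r f; do
  total=$((total + $(wc -c < "$f")))
done < <(find "$tmp" -type f -name "*.pub")
```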
Looks good! Just one comment inline.
scripts/on-prem-archive.sh (outdated)

```sh
# Setting this variable to any value will cause the cleanup to be skipped. By
# default, the script will clean up after itself.

set -eo pipefail
```
Any reason not to have `u` here as well? I tend to find that depending on the behavior of unset variables makes code harder for me to understand. Adding it will require a few changes, but I'll try to call them out below.
I wanted to explicitly check for the presence of the env vars that I care about, as a way to provide a better UX to the person running this script, instead of letting the script error out with whatever bash's default error is for unbound variables.
Totally. But you can get both, I think. Let me know if what I'm suggesting below fails to achieve that.
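As a quick runnable sketch of what `set -u` buys you (the exact "unbound variable" wording varies by bash version, but the substring is stable):

```shell
# Without -u, a typo silently expands to empty; with -u, bash aborts.
silent=$(bash -c 'set -eo pipefail; echo "value=$not_defined_demo_var"' 2>&1)
strict=$(bash -c 'set -euo pipefail; echo "value=$not_defined_demo_var"' 2>&1 || true)
```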
```sh
      exit 1
    fi
  done
}
```
This can be simplified to:

```sh
check_tools() {
  for tool
  do
    if ! exists "$tool"; then
      echo "Please install $tool and run this script again."
      exit 1
    fi
  done
}
```

And then instead of needing to

```sh
required_tools=( aws git curl jq xzcat )
required_vars=( AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY )
check_tools required_tools
```

you can just:

```sh
check_tools aws git curl jq xzcat
```
Wow, that's crazy. So the `for` loop at the beginning of `check_tools()` implicitly loops through whatever arguments were passed to the function?
Yup, that's the default `for` behavior. I think it's a worthwhile idiom, since that's such a common thing to do with functions. You can change the positional parameters with an explicit `set` command too, but that strikes me as a bit weird:

```sh
$ set -- 1 2 3
$ for x; do echo $x; done
1
2
3
```
```sh
      exit 1
    fi
  done
}
```
Similar to `check_tools`:

```sh
check_vars() {
  for var
  do
    if [ -z "${!var:-}" ]; then
      echo "Please ensure that $var is exported in your environment and run this script again."
      exit 1
    fi
  done
}
```

And then:

```sh
check_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```

Note the change from `"${!var}"` to `"${!var:-}"` is necessary to support `set -u`.
Ah, I didn't know about the `:-` syntax.
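A small runnable sketch of the indirect `${!var:-}` expansion under `set -u` (the variable names are made up for the demo):

```shell
set -u
check_vars() {
  # Iterates over the function's arguments; ${!var:-} expands the variable
  # *named* by $var, defaulting to empty instead of tripping set -u.
  for var; do
    if [ -z "${!var:-}" ]; then
      missing=$var
      return 1
    fi
  done
  return 0
}

DEMO_TOKEN="abc123"
check_vars DEMO_TOKEN && first=ok || first=fail
check_vars DEMO_TOKEN DEMO_NOT_SET && second=ok || second=fail
set +u
```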
scripts/on-prem-archive.sh (outdated)

```sh
bucket="${HAB_ON_PREM_BOOTSTRAP_BUCKET_NAME:-habitat-on-prem-builder-bootstrap}"
marker="LATEST.tar.gz"

case "$1" in
```
`case "${1:-}" in` for `set -u`.
```sh
mkdir -p "$tmp_dir/keys"

# download keys first
keys=$(curl -s -H "Accept: application/json" "$upstream_depot/v1/depot/origins/core/keys" | jq ".[] | .location")
```
Kind of tangential, but it would be nice if we could get `jq` (and any other required utils) from a hab package. I don't see that `jq` is currently provided via `hab pkg provides jq`.
Yeah, that's a great point, and one that @fnichol brought up to me too. It'd ultimately be really nice to write a plan for this script and include all the deps there. That would make for a much more streamlined UX and we could skip all the dep checking in this script as well. I'll loop back around and try to tackle that at the end if I have some time.
Also, I just realized that `hab pkg provides` only searches locally installed packages. We do have a package: `core/jq-static`.
scripts/on-prem-archive.sh (outdated)

```sh
for k in $keys
do
  key=$(echo $k | tr -d '"')
  release=$(echo $key | awk -F '/' '{ print $5 }')
```
I find

```sh
grep -Eo '[0-9]+$'
```

or perhaps

```sh
cut -d '/' -f 5
```

a bit easier to understand as "pull the digits off the end" than

```sh
awk -F '/' '{ print $5 }'
```

But it may come down to personal preference.
I consider it a personal victory any time I can work `awk` into a shell script, as it's one of my favorite utilities. However, I do agree that the `cut` command you suggest is a bit clearer. I'll change it.
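For comparison, all three forms pull the same release out of a key location. The path shape here is an assumption based on the keys endpoint, with a fabricated timestamp:

```shell
key="/origins/core/keys/20180119235057"   # assumed location format

via_awk=$(echo "$key"  | awk -F '/' '{ print $5 }')
via_cut=$(echo "$key"  | cut -d '/' -f 5)
via_grep=$(echo "$key" | grep -Eo '[0-9]+$')
```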
scripts/on-prem-archive.sh (outdated)

```sh
done
popd

cd /tmp
```
Since you're doing this `cd` immediately afterward, I'd suggest replacing the `pushd`/`popd` pair with just a `cd "$core"`.

Alternatively, there's an idiom that I quite like that makes it more visually apparent that a certain range of commands runs in a different directory, and removes the need to remember the `popd`:

```sh
( cd "$core"
  dir_list=$(find . -type f -name "plan.sh" -printf "%h\n" | sed -r "s|^\.\/||" | sort -u)
  …
)
cd /tmp
```

The main benefit here is that if the `cd /tmp` is ever moved or removed, you'll maintain the functionality. The main caveat is that if you want to exit the entire script from within the subshell, you'll need to add `) || exit` to the end.
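A runnable sketch of the subshell idiom: the directory change is confined to the parentheses, so the parent shell's working directory is untouched and no `popd` is needed:

```shell
work=$(mktemp -d)
before=$(pwd)

( cd "$work"
  touch built-here   # commands in here run inside $work
)

after=$(pwd)   # unchanged; the cd only affected the subshell
```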
scripts/on-prem-archive.sh (outdated)

```sh
git clone https://github.com/habitat-sh/core-plans.git "$core"
pushd "$core"
dir_list=$(find . -type f -name "plan.sh" -printf "%h\n" | sed -r "s|^\.\/||" | sort -u)
```
`sed -r "s|^\.\/||"` could be changed to `xargs basename -a`, which I find somewhat more readable.
Agreed
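Side by side with fabricated plan directories (note the two are only equivalent for single-level paths like these; `basename` would also strip any parent components):

```shell
dirs='./redis
./nginx'

via_sed=$(echo "$dirs" | sed -r "s|^\.\/||")
via_basename=$(echo "$dirs" | xargs basename -a)
```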
scripts/on-prem-archive.sh (outdated)

```sh
keys=$(curl -s -H "Accept: application/json" "$upstream_depot/v1/depot/origins/core/keys" | jq ".[] | .location")
for k in $keys
do
  key=$(echo $k | tr -d '"')
```
In general, the `echo $foo | cmd` idiom can be replaced with `cmd <<< $foo`, which I find a bit more evocative. So, in this case, you'd get:

```sh
key=$(tr -d '"' <<< $k)
```
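Both forms produce the same result; the ident below is made up for the demo, with the literal quotes `jq` would emit:

```shell
k='"core/redis/4.0.14/20190319155852"'   # fabricated ident, jq-style quoting

via_pipe=$(echo $k | tr -d '"')
via_herestring=$(tr -d '"' <<< $k)   # here-strings aren't word-split, so this is safe
```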
scripts/on-prem-archive.sh (outdated)

```sh
  continue
fi

slash_ident=$(echo "$raw_ident" | jq "\"\(.origin)/\(.name)/\(.version)/\(.release)\"" | tr -d '"')
```
Single-quoting the `jq` argument might make this a bit more readable:

```sh
jq '"\(.origin)/\(.name)/\(.version)/\(.release)"'
```
Good call
scripts/on-prem-archive.sh (outdated)

```sh
# now extract the target and tdeps and download those too
if [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TARGET') ]] && [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TDEPS') ]]; then
  target=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TARGET)
  tdeps=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TDEPS)
```
Since this ends up decompressing the entire hart potentially several times, I think we could improve performance as well as cut down on code duplication:

```sh
local_tar=$(basename "$local_file" .hart).tar
tail -n +6 "$local_file" | unxz > "$local_tar"
```

Also, we don't need to use `grep` multiple times to check for `TARGET` and `TDEPS`; we can ask `tar` directly:

```sh
if tar tf "$local_tar" --no-anchored TARGET TDEPS > /dev/null 2>&1; then
```
In my not very rigorous testing this seemed to speed up the hart download phase from ~45 minutes to ~30.
Nice, I'll give this a shot
scripts/on-prem-archive.sh (outdated)

```sh
if [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TARGET') ]] && [[ $(tail -n +6 $local_file | xzcat | tar tf - | grep '/TDEPS') ]]; then
  target=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TARGET)
  tdeps=$(tail -n +6 $local_file | xzcat | tar xfO - --no-anchored TDEPS)
  dep_total=$(echo "$tdeps" | wc -l)
```
I'd similarly suggest using an array, as before:

```sh
readarray -t tdep_array <<< "$tdeps"
dep_total=${#tdep_array[*]}
```
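A quick sketch with a fabricated dep list, comparing the array count against the `wc -l` approach:

```shell
tdeps='core/glibc/2.27/20180608041157
core/gcc-libs/7.3.0/20180608091701
core/zlib/1.2.11/20180608050617'   # made-up TDEPS contents

readarray -t tdep_array <<< "$tdeps"   # -t strips trailing newlines from each element
dep_total=${#tdep_array[*]}

# vs. the wc -l approach:
via_wc=$(echo "$tdeps" | wc -l)
```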
scripts/on-prem-archive.sh (outdated)

```sh
target=$(echo "$latest" | jq ".target" | tr -d '"')

# check to see if we have this file before fetching it again
local_file="$tmp_dir/harts/$dash_ident-$target.hart"
```
Since `$dash_ident` is only used once, I think it makes sense to apply the same approach you did later on:

```sh
local_file="$tmp_dir/harts/$(echo $slash_ident | tr '/' '-')-$target.hart"
```

or, if you prefer this style:

```sh
local_file="$tmp_dir/harts/$(tr '/' '-' <<< $slash_ident)-$target.hart"
```

This has the added benefit of reducing duplication of the somewhat tricky `jq` call.
Good idea
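A mini demo of the combined form (the ident and target values are fabricated):

```shell
tmp_dir=$(mktemp -d)
slash_ident="core/redis/4.0.14/20190319155852"   # fabricated
target="x86_64-linux"

local_file="$tmp_dir/harts/$(tr '/' '-' <<< $slash_ident)-$target.hart"
```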
scripts/on-prem-archive.sh (outdated)

```sh
else
  echo "[$pkg_count/$pkg_total] [$dep_count/$dep_total] Downloading $dep"
  curl -s -H "Accept: application/json" -o $file_to_check "$upstream_depot/v1/depot/pkgs/$dep/download"
fi
```
There's a decent bit of duplicated logic here, and it's somewhat far from the other instance at line 123. I think that increases the risk of a fix made in one place not being applied to the other, and increases the value of deduplication. One approach would be to make a function that does the downloading if necessary and communicates whether a new file was downloaded via its return value:

```sh
download_hart_if_missing() {
  local local_file=$1
  local slash_ident=$2
  local status_line=$3

  if [ -f "$local_file" ]; then
    echo "$status_line $slash_ident is already present in our local directory. Skipping download."
    return 1
  else
    echo "$status_line Downloading $slash_ident"
    curl -s -H "Accept: application/json" -o "$local_file" "$upstream_depot/v1/depot/pkgs/$slash_ident/download"
  fi
}
```

Then the two instances could be replaced with:

```sh
if download_hart_if_missing "$local_file" "$slash_ident" "[$pkg_count/$pkg_total]"; then
  # now extract the target and tdeps and download those too
```

and

```sh
download_hart_if_missing "$file_to_check" "$dep" "[$pkg_count/$pkg_total] [$dep_count/$dep_total]" || true
```

The `|| true` is necessary in the second case because there's nothing to do if `$file_to_check` is already downloaded, but we don't want the script to exit on account of `set -e`.
Thanks for this. I knew when I was writing this that this was duplicated, and I knew a function was the answer, but I couldn't put it all together in a way that made sense. I'll try this out.
scripts/on-prem-archive.sh (outdated)

```sh
    echo "Done."
    ;;
  populate-depot)
    if [ -z "$2" ]; then
```
`"${2:-}"` for `set -u`.
Also, I'd suggest assigning `$2` to something like `depot_url` to improve readability.
scripts/on-prem-archive.sh (outdated)

```sh
set -eo pipefail

help() {
```
`help` is a bash builtin, so it might be better to call this `usage`. Also, since all the callers immediately exit afterwards, it would save a few lines to add the `exit 1` to this function.
Ah, I had no idea. I'll change it.
scripts/on-prem-archive.sh (outdated)

```sh
s3_cp $tar_file s3://$bucket/
s3_cp s3://$bucket/$bootstrap_file s3://$bucket/$marker

if [ -z "$HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP" ]; then
```
Would it make sense to put this in a function and add it to an `EXIT` trap?

Also, this should be `"${HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP:-}"` for `set -u`, and I think it's easier to reason about fewer negated conditions, so perhaps:

```sh
if [ "${HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP:-}" ]; then
  echo "Cleanup skipped."
else
  echo "Cleaning up."
```

or

```sh
if [ "${HAB_ON_PREM_BOOTSTRAP_CLEAN_UP:-true}" ]; then
  echo "Cleaning up."
else
  echo "Cleanup skipped."
```
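A sketch of the EXIT-trap approach, run in a child bash so we can simulate a mid-script failure and confirm cleanup still fires. The variable names are illustrative, not the script's actual ones:

```shell
outfile=$(mktemp)

bash -s "$outfile" <<'EOF' || true
set -euo pipefail
cleanup() {
  if [ -z "${DEMO_DONT_CLEAN_UP:-}" ]; then
    rm -rf "$scratch"
  fi
}
scratch=$(mktemp -d)
trap cleanup EXIT
echo "$scratch" > "$1"   # report the scratch path to the parent
false                    # simulated failure; the trap still fires
EOF

scratch_dir=$(cat "$outfile")
```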
scripts/on-prem-archive.sh (outdated)

```sh
for p in $dir_list
do
  pkg_count=$((pkg_count+1))
  echo ""
```
The `""` is unnecessary; `echo` with no args will produce a newline.
I didn't know that, thanks. Will change.
scripts/on-prem-archive.sh (outdated)

```sh
echo ""
echo "Importing keys"
keys=$(find . -type f -name "*.pub")
key_total=$(echo "$keys" | wc -l)
```
As above, I'd suggest arrays for the keys and harts below
scripts/on-prem-archive.sh (outdated)

```sh
  n=$((n+1))
  echo "Upload failed. Sleeping 5 seconds and retrying."
  sleep 5
done
```
You can save yourself some math with:

```sh
for _ in {1..5}
do
  hab pkg upload --url "$2" --channel stable "$hart" && break
  echo "Upload failed. Sleeping 5 seconds and retrying."
  sleep 5
done
```
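A runnable sketch of the same retry shape, with a stubbed upload command that fails twice and then succeeds (the sleep is shortened to keep the demo fast):

```shell
log=$(mktemp)
fake_upload() {               # stand-in for `hab pkg upload`
  echo try >> "$log"
  [ "$(wc -l < "$log")" -ge 3 ]   # succeeds on the third attempt
}

result=failed
for _ in {1..5}
do
  fake_upload && { result=uploaded; break; }
  echo "Upload failed. Retrying."
  sleep 0.1
done

attempts=$(wc -l < "$log")
```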
scripts/on-prem-archive.sh (outdated)

```sh
echo "Package uploads finished."

if [ -z "$HAB_ON_PREM_BOOTSTRAP_DONT_CLEAN_UP" ]; then
```
Per my other comment using a trap here would allow code sharing.
Overall, I think this looks great. It's well-organized and easy to understand. In addition to the comments I made, there are a number of places that trigger shellcheck lints for double quoting. I'd suggest running shellcheck yourself (we even have a hab package) to fix those up.
Force-pushed from 3406f7c to a2a1b94.
```sh
if download_hart_if_missing "$local_file" "$slash_ident" "[$pkg_count/$pkg_total]"; then
  # now extract the target and tdeps and download those too
  local_tar=$(basename "$local_file" .hart).tar
  tail -n +6 "$local_file" | unxz > "$local_tar"
```
It occurs to me that this `6` assumes the structure of the hart file, so it might be good to do some future-proofing with a check like:

```sh
if [ "$(head -n1 "$local_file")" != HART-1 ]; then
  echo "Unsupported hart file version"
  exit 1
fi
```
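Sketch of that header check against two synthetic files; the `HART-1` first line is the only part of the format the check relies on, and the remaining header content here is a placeholder:

```shell
good=$(mktemp); bad=$(mktemp)
printf 'HART-1\nplaceholder-header\n' > "$good"
printf 'HART-2\nplaceholder-header\n' > "$bad"

check_hart() {
  if [ "$(head -n1 "$1")" != HART-1 ]; then
    echo "Unsupported hart file version"
    return 1
  fi
}

check_hart "$good" && good_result=ok || good_result=rejected
check_hart "$bad"  && bad_result=ok  || bad_result=rejected
```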
Force-pushed from a2a1b94 to 406d87d.
Force-pushed from 86ee58b to 8155714.
Signed-off-by: Josh Black <raskchanky@gmail.com>
Force-pushed from 8155714 to e2415c4.
One more quick comment, not a blocker for this PR, but I just realized that we may need an option to populate the depot with an already-downloaded tar file. So if the install point is truly airgapped, we won't need to reach out to S3 for the tar file. We can tweak that after this PR.
I implemented many of the suggestions that were left, though not all of them, and verified that everything still works.
@chefsalim Good point. I made an issue to track this.
I wrote this script a while ago but shelved it because we didn't have a way to download transitive dependencies for hart files. That was necessary because when we go to upload everything to a new depot, the only way to guarantee that everything uploads properly is if we downloaded all the deps when the tarball was created. Simply getting the latest stable of everything isn't enough; some packages have deps that are older than the latest stable.
I've successfully tested this script on an EC2 instance with Builder running in local dev env mode and everything seemed to work.
It's worth noting, however, that in order to upload packages to a new depot, you need to have `HAB_AUTH_TOKEN` set. This will necessitate either logging into the UI and generating a token there, or using a GH auth token (which is what I did when testing on EC2).

I'm sure some of this could be optimized a bit. I'm not a bash wizard.
Signed-off-by: Josh Black <raskchanky@gmail.com>