Matt.jl edited this page Jul 12, 2020 · 5 revisions

Current state of AWS in JuliaCloud

Contact Info

@mattBrzezinski on GitHub and the Julia Slack

Tenets

  • Make using AWS easy for the average Julia user
  • Use automation and code generation as much as possible
  • Create a simple, straightforward systems design

Summary

Using AWS Services in Julia is currently more difficult than it needs to be. Users are limited either to the low-level API wrappers, which require knowing the service, request type, and URI as outlined by Amazon, or to a high-level wrapper package, which may or may not exist for the service they want to use. Updating these API packages is a manual and undocumented process.

This document proposes a system that automatically updates the low- and high-level API wrappers for AWS Services, and that uses one of Julia's key features, multiple dispatch, to dispatch on the request type rather than defining an individual function for each AWS Service.

These changes will allow JuliaCloud to always have an up-to-date package with the latest Amazon Service APIs.

Current State

There are two categories of packages currently supporting AWS usage in JuliaCloud.

Low-Level Wrapper

AWSCore.jl is the most popular low-level package. The package consists of five major files:

  • AWSAPI.jl: Generates the Services.jl file which contains the low-level API wrappers for each AWS Service
  • AWSCore.jl: Processes the JSON, Query, REST-XML, and REST-JSON request protocols
  • AWSCredentials.jl: Handles retrieving AWS Credentials from locations such as environment variables, credential / configuration files, etc.
  • Services.jl: Contains a function for every AWS Service
  • signaturev4.jl: Creates the AWS4AuthLayer to be inserted into the HTTP stack and signs the requests with AWS authentication
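As an illustration of the first lookup AWSCredentials.jl performs, here is a minimal sketch of reading credentials from the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and the optional `AWS_SESSION_TOKEN`). The `SimpleCredentials` struct and `env_credentials` function are hypothetical names, not AWSCore.jl's API; the real package also falls back to credential/configuration files and other sources.

```julia
# Hypothetical credentials container; not the struct AWSCore.jl actually defines.
struct SimpleCredentials
    access_key_id::String
    secret_key::String
    token::String          # empty string when no session token is set
end

"""
    env_credentials(env=ENV)

Build `SimpleCredentials` from the standard AWS environment variables, or return
`nothing` if either the access key ID or the secret key is missing.
"""
function env_credentials(env::AbstractDict=ENV)
    id = get(env, "AWS_ACCESS_KEY_ID", "")
    secret = get(env, "AWS_SECRET_ACCESS_KEY", "")
    (isempty(id) || isempty(secret)) && return nothing
    return SimpleCredentials(id, secret, get(env, "AWS_SESSION_TOKEN", ""))
end
```

Returning `nothing` lets a caller chain lookups, trying the environment first and then the next source.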

Inside Services.jl, each AWS Service has its own function, which is used to call it:

function s3(aws::AWSConfig, verb, resource, args=[])
    AWSCore.service_rest_xml(
        aws;
        service      = get(aws, :service_name, "s3"),
        version      = "2006-03-01",
        verb         = verb,
        resource     = resource,
        args         = args)
end

AWSCore.jl builds these definitions by running a Node.js server that parses the AWS SDK for JavaScript to create a definition for each AWS Service. To use the package, it is then up to the end user to know how to call the appropriate operation, which can be done by referencing the AWS documentation.

e.g. the ListBuckets operation on AWS S3:

using AWSCore
using AWSCore.Services

Services.s3(aws_config(), "GET", "/")

Having a function defined for each service in this form does not take advantage of multiple dispatch. There are also currently no documented steps for updating the Services.jl file: if Amazon releases a new service or updates the API of an existing one, Services.jl has to be updated manually.

High-Level Wrapper(s)

These packages are much simpler to use, as the end user only needs to know the operation they wish to perform. However, these high-level packages are currently hand-written, limited to certain AWS Services, prone to errors, and/or limited in functionality.

To use a package such as AWSS3.jl, the end user only needs to know how to call the operation.

e.g. the ListBuckets operation on AWS S3:

using AWSS3

s3_list_buckets()

AWSSDK.jl is a package which contains high-level API definitions for all Amazon Services. However, because it contains every service as its own module, loading this package is quite cumbersome.

Proposed Solution

I propose that we tag the current version of AWSCore.jl@1.0, and begin working on AWSCore.jl@2.0. AWSCore.jl@2.0 would consist of:

  • Taking advantage of Julia's multiple dispatch for making AWS service requests
  • Automating the creation and updating of service definitions using GitHub Actions
  • Low- and high-level API wrappers

After the release of AWSCore.jl@2.0, the other low- and high-level wrapper packages can be deprecated and archived.

The proposed architecture would look like:

AWSCore Architecture Diagram

AWSCore.jl

AWSCore.jl would hold the structs for each type of request we will dispatch on. It will also keep its current functionality of making the requests themselves (JSON, REST-XML, etc.). These request functions can be used as an entry point; however, they are not the recommended route.
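To make "dispatch on the request type" concrete, here is a minimal sketch assuming each protocol gets its own callable struct. The field names, and the choice to return a NamedTuple describing the request instead of signing and sending it, are illustrative assumptions rather than the planned AWSCore.jl API.

```julia
# Hypothetical request-type structs: one concrete type per AWS protocol.
# Field names are assumptions for illustration only.
abstract type AbstractService end

struct RestXMLService <: AbstractService
    endpoint_prefix::String
    api_version::String
end

struct JSONService <: AbstractService
    endpoint_prefix::String
    api_version::String
    json_version::String
    target_prefix::String
end

# REST-XML services are called with an HTTP verb and a resource path.
# This sketch only describes the request; it does not sign or send it.
function (s::RestXMLService)(verb::AbstractString, resource::AbstractString)
    return (protocol = "rest-xml", service = s.endpoint_prefix,
            method = verb, path = resource, version = s.api_version)
end

# JSON services POST to "/" and name the operation in the X-Amz-Target header.
function (s::JSONService)(operation::AbstractString)
    headers = Dict(
        "X-Amz-Target" => "$(s.target_prefix).$(operation)",
        "Content-Type" => "application/x-amz-json-$(s.json_version)",
    )
    return (protocol = "json", service = s.endpoint_prefix,
            method = "POST", path = "/", headers = headers)
end
```

Because the behaviour lives in methods on the types, supporting another protocol means adding one struct and one method, while each generated service object stays a one-line `const`.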

AWSCoreServices.jl

This file will be auto-generated by AWSMetadata.jl. It will contain the low-level API wrapper objects for each service. This will be the entry point for the low-level API wrapper.

e.g.

module AWSCoreServices
# Generated by AWSMetadata.jl: one service object per AWS Service,
# typed by the request protocol that service uses.
# ...
const sagemaker_runtime = AWSCore.RestJSONService("runtime.sagemaker", "2017-05-13")
const s3 = AWSCore.RestXMLService("s3", "2006-03-01")
const s3_control = AWSCore.RestXMLService("s3-control", "2018-08-20")
const sagemaker = AWSCore.JSONService("api.sagemaker", "2017-07-24", "1.1", "SageMaker")
# ...
end
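AWSMetadata.jl has to turn each service definition into one of the `const` lines above. A minimal sketch, assuming the definition's `metadata` section has already been parsed into a `Dict` (for instance with a JSON package) and using the field names that appear in the AWS SDK for JavaScript definitions (`protocol`, `endpointPrefix`, `apiVersion`, `jsonVersion`, `targetPrefix`). `PROTOCOL_TYPES` and `service_line` are hypothetical names, and `RestJSONService` and `QueryService` are assumed struct names.

```julia
# Hypothetical mapping from the SDK's "protocol" field to the struct handling it.
const PROTOCOL_TYPES = Dict(
    "rest-xml"  => "RestXMLService",
    "rest-json" => "RestJSONService",
    "query"     => "QueryService",
    "json"      => "JSONService",
)

"""
    service_line(name, metadata)

Render one `const` line of AWSCoreServices.jl from the `metadata` section of a
service definition that has already been parsed into a Dict.
"""
function service_line(name::AbstractString, metadata::AbstractDict)
    protocol = metadata["protocol"]
    haskey(PROTOCOL_TYPES, protocol) || error("unsupported protocol: $protocol")
    T = PROTOCOL_TYPES[protocol]
    args = [repr(metadata["endpointPrefix"]), repr(metadata["apiVersion"])]
    if protocol == "json"
        # JSON services additionally need the JSON version and the target prefix
        push!(args, repr(metadata["jsonVersion"]), repr(metadata["targetPrefix"]))
    end
    return "const $name = AWSCore.$T($(join(args, ", ")))"
end
```

Generating source text (rather than evaluating objects directly) keeps AWSCoreServices.jl a plain, reviewable file that the GitHub Action can commit.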

services/${Service}.jl

These files will be auto-generated by AWSMetadata.jl. Each file will be a sub-module for an AWS Service and contain a high-level wrapper for each of that service's operations. These will be the entry points for the high-level API wrappers.

Because these files contain a large number of functions, including all of them in the AWSCore module would make loading the package take a substantial amount of time. Instead, a macro will generate the module and include the service file only when it is needed.

e.g.

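module S3
# `s3` is the low-level service object from AWSCoreServices; naming this
# module `s3` as well would shadow it, so the module is `S3`.
using AWSCore.AWSCoreServices: s3
# ...
ListObjects(Bucket) = s3("GET", "/$Bucket")
ListObjectVersions(Bucket) = s3("GET", "/$Bucket?versions")
HeadObject(Bucket, Key) = s3("HEAD", "/$Bucket/$Key")
PutBucketAcl(Bucket) = s3("PUT", "/$Bucket?acl")
# ...
end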
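Each of these one-line wrappers can be derived from the operation's `http` section in the SDK definition (its `method` and `requestUri`). A minimal sketch with a hypothetical function name, `wrapper_line`: every URI label such as `{Bucket}`, or a greedy label such as `{Key+}`, becomes a function argument interpolated into the path. Query-string, header, and body parameters, as well as URL-escaping of the interpolated values, are deliberately left out.

```julia
const LABEL = r"\{(\w+)\+?\}"   # matches {Name} and greedy {Name+} URI labels

"""
    wrapper_line(service, op_name, http)

Render a one-line high-level wrapper from an operation's `http` section
(`method` and `requestUri`), turning each URI label into an argument.
"""
function wrapper_line(service::AbstractString, op_name::AbstractString, http::AbstractDict)
    uri = http["requestUri"]
    labels = [m.captures[1] for m in eachmatch(LABEL, uri)]
    # "{Bucket}" and "{Key+}" become "$Bucket" and "$Key" in the generated source
    path = replace(uri, LABEL => m -> "\$" * strip(m, ['{', '}', '+']))
    return "$op_name($(join(labels, ", "))) = $service($(repr(http["method"])), \"$path\")"
end
```

For example, `wrapper_line("s3", "ListObjectVersions", Dict("method" => "GET", "requestUri" => "/{Bucket}?versions"))` produces the `ListObjectVersions` line shown in the module above.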

AWSMetadata.jl and metadata.json

AWSMetadata.jl contains all the functions for updating both the low- and high-level API wrappers. metadata.json is used alongside it to record the SHA hash of each service definition, as well as its API version, so that changed definitions can be detected.
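One way AWSMetadata.jl could use metadata.json is to decide which definitions need regenerating. A minimal sketch, assuming metadata.json maps each definition file name to its SHA (here already loaded into a `Dict`) and that the latest SHAs have been fetched from the SDK repository; `changed_services` is a hypothetical name.

```julia
"""
    changed_services(stored, latest)

Compare the file-name => SHA map recorded in metadata.json (`stored`) with the
map just fetched from the SDK repository (`latest`). Return the names of
definition files that are new or whose SHA changed, sorted for stable output.
"""
function changed_services(stored::AbstractDict, latest::AbstractDict)
    # A missing entry yields `nothing`, which never equals a SHA, so new files count as changed
    return sort([name for (name, sha) in latest if get(stored, name, nothing) != sha])
end
```

Only the returned files would be regenerated, after which their new SHAs are written back to metadata.json.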

GitHub Actions

We can use GitHub Actions to automatically create or update AWS Service APIs on a daily basis. We can also use GitHub Actions to trigger alarms and gather metrics.

Use Cases

Creating or Updating an API Wrapper

Create Update API Workflow

Low-Level Wrapper Usage

Low-level wrapper usage would look similar to the current AWSCore.jl.

using AWSCorePrototype.Services: s3

buckets = s3("GET", "/")
println(buckets)

High-Level Wrapper Usage

using .AWSCorePrototype: @service
@service S3
using .S3

buckets = S3.ListBuckets()
println(buckets)
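A minimal sketch of how a `@service` macro like the one used above might work. It assumes the generated files live in a `services/` directory with lower-case names (`services/s3.jl` for `@service S3`) and contain only the wrapper definitions, with the macro supplying the `module` block; `SERVICES_DIR` is a hypothetical setting, and the prototype may locate and load files differently.

```julia
# Hypothetical location of the generated services/${Service}.jl files.
const SERVICES_DIR = Ref(joinpath(@__DIR__, "services"))

"""
    @service Name

Define `module Name` in the calling module and include `services/name.jl`
into it, so a service's wrappers are only loaded when requested.
"""
macro service(name::Symbol)
    path = joinpath(SERVICES_DIR[], lowercase(string(name)) * ".jl")
    # Equivalent to writing `module Name; include(path); end` at top level.
    # A non-bare module gets its own `include`, so the file is evaluated inside it.
    return esc(Expr(:toplevel, Expr(:module, true, name, Expr(:block, :(include($path))))))
end
```

Usage then matches the example above: `@service S3` followed by `using .S3`.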

In Scope

  • Code-generate the low- and high-level wrappers
  • Use multiple dispatch on the request type
  • Archive single-service high-level wrappers and other low-level wrapper packages
  • Increase unit test code coverage for each AWS Service

Out of Scope

  • Handling other cloud service providers such as Azure, or Google Cloud Platform

Measures of Success

  • Decrease the size of the code base
  • Improve the performance of requests made to AWS
  • Get code coverage for unit tests to 100%

Dependencies

To automate the creation of high- and low-level wrappers in Julia, we must pull AWS Service definitions from an external source. The JavaScript SDK is the simplest to parse, as all of its service definitions are JSON files, while other SDKs define them on a per-language basis.

We also need a service to run the code that automates creating or updating a service, such as GitHub Actions. Some existing actions can simplify this process, such as:

We will need to depend on other Julia packages. A short list of them would be:

Functional Requirements

  • Users should be able to easily use the package, and only load the necessary modules for their code.
  • The package should have a well defined API and design.
  • Making calls to AWS should be performant.
  • Decrease the time between Amazon launching a service and the API being available in Julia.

Non-Functional Requirements

  • The only manual steps in updating API wrapper definitions should be writing new unit tests and reviewing the generated changes.
  • The system should be well documented, such that anyone in the Julia community can have a good understanding of the system components and is able to contribute to the repository.
  • The system should be extensible, so that wrappers for new AWS Services are created automatically.

Metrics

  • How often are services being updated / created?
    • Is checking daily a good time frame?
    • How do we handle metrics, GitHub Actions?

Alarms

  • If a Service uses a new protocol (not REST-XML, JSON, REST-JSON, or Query), we should trigger an alarm. This means Amazon has introduced a new protocol, and the auto-generation code needs to accommodate it.
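The alarm check itself is small. A minimal sketch, assuming each service's `protocol` field has already been collected into a `Dict` of service name to protocol; `KNOWN_PROTOCOLS` and `unsupported_protocols` are hypothetical names. Note that the EC2 definition in the JavaScript SDK declares its own `ec2` protocol (a variant of Query), so a check against only these four protocols would flag it immediately.

```julia
# Protocols the generator currently understands (the four listed above).
const KNOWN_PROTOCOLS = Set(["rest-xml", "json", "rest-json", "query"])

"""
    unsupported_protocols(protocols)

Given a service-name => protocol map, return the services whose protocol the
generator cannot handle, sorted by name. A non-empty result should raise the alarm.
"""
function unsupported_protocols(protocols::AbstractDict)
    return sort([svc for (svc, p) in protocols if !(p in KNOWN_PROTOCOLS)])
end
```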

Open Issues

  • Should we attempt to automate the creation of unit tests as well?
    • I would argue no, and that these should be created by hand.
    • Writing them by hand would be tedious; however, it would give us confidence that the generated code is correct.
    • It would also be complex to know the prerequisites for certain operations
      • e.g. to add a SecurityGroup to an EC2 Instance, you'd first need an Instance in a VPC
  • Do we go all in on GitHub Actions?
    • We can replace TravisCI (what is currently used) to run tests
    • How do we deal with bors?
    • A quick comparison of the two that I found: link
  • Which Julia JSON package should we be using?
    • Prototype reading JSON for an AWS Service API and compare the performance
    • What are the pros/cons of each of them?
    • List of Julia JSON packages
  • Should we move away from using XMLDict.jl?
    • It's very convenient to use
  • How should optional parameters be passed in?
    • As a Dictionary? Raw XML for REST-XML calls? Both?
    • LittleDict?
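To make the "as a Dictionary" option concrete, here is a minimal sketch of one possibility: required arguments stay positional, and optional parameters arrive as a dictionary keyword argument that becomes the query string. `list_objects_request` is hypothetical, only builds the request path, and does not URL-escape values. Keys are sorted so the output is deterministic; an ordered container such as `LittleDict` from OrderedCollections.jl would let the caller's order be kept instead.

```julia
"""
    list_objects_request(bucket; params=Dict{String,Any}())

Hypothetical high-level wrapper: the required bucket builds the path and any
optional parameters are appended as a query string (no URL-escaping here).
"""
function list_objects_request(bucket::AbstractString; params::AbstractDict=Dict{String,Any}())
    # Sort by key so the same parameters always produce the same request path
    query = join(["$(k)=$(v)" for (k, v) in sort(collect(params); by=first)], "&")
    path = "/$bucket"
    return isempty(query) ? path : "$path?$query"
end
```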

Appendix

Number of Amazon Services

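import re

# api_files.txt: one service definition file name per line
# (e.g. "s3-2006-03-01.min.json"); the service name is everything before "-<digit>".
with open("api_files.txt") as f:
  services = set()

  for line in f:
    services.add(re.split(r"-\d", line.strip())[0])

print(len(services))  # 220 (as of 2019-12-17)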

List of low-level wrappers

List of high-level wrappers

List of available AWS SDKs