Architecting to Scale Module
- Concepts
- Architectural Patterns
- Loosely Coupled Architecture (LCA)
- Components can stand independently and require little or no knowledge of the inner workings of the other components.
- Benefits of LCA
- Layers of abstraction
- Permits more flexibility
- Interchangeable components
- More atomic or isolated functional units
- Can scale components independently
- Types of Scaling
- Horizontal Scaling (Scale Out)
- Add more instances as demand increases
- No downtime required to Scale Out or Scale In
- Automatic using Auto-Scaling groups
- Theoretically Unlimited on AWS
- Cost Effective
- Vertical Scaling (Scale Up)
- Add more CPU and/or RAM to existing instance as demand increases
- Requires restart to scale up or down
- Would require scripting to automate
- Limited by instance sizes
- Auto-Scaling Groups
- Automatically provides horizontal scaling (scale-out) as required by the demand
- Triggered by an event or scaling action to either launch or terminate instances
- Availability, Cost and System metrics can all factor into scaling.
- Four Scaling Options
- Maintain
- Keep a specific or minimum number of instances running
- What: Hands-off way to maintain X number of instances
- When: I need 3 instances always
- Manual
- Use maximum, minimum or specific number of instances
- What: Manually change desired capacity via console or CLI
- When: My needs change so rarely that I can just manually add and remove
- Schedule
- Increase or Decrease instances based on Schedule
- What: Adjust min/max instances based on specific times
- When: Every Monday morning, we get a rush on our website
- Dynamic
- Scale based on real-time metrics of the systems
- What: Scale in response to behavior of elements in the environment
- When: When CPU Utilisation gets to 70% on current instances, scale up
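As an illustration of the Schedule option above, here is a minimal boto3 sketch of a recurring scheduled action; the group name web-asg and the capacity values are assumptions, not part of the notes.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Every Monday at 07:00 UTC, raise capacity ahead of the weekly rush.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",            # assumed existing Auto Scaling group
    ScheduledActionName="monday-morning-rush",
    Recurrence="0 7 * * MON",                  # cron expression, evaluated in UTC
    MinSize=3,
    MaxSize=10,
    DesiredCapacity=6,
)
```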
- Launch Configurations
- Template used in an Auto Scaling group which defines
- type of EC2 instance
- Specify VPC and Subnets for scaled instances
- Attach to a ELB
- Define a Health Check Grace Period
- Define size of the group to stay at an initial size
- Or use a scaling policy which can be based on metrics (CPU, ALB Request Count Per Target, Network Traffic In/Out)
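A minimal sketch of a launch configuration plus the Auto Scaling group that references it; the AMI, subnet and target group identifiers are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# The template: which AMI and instance type scaled-out instances will use.
autoscaling.create_launch_configuration(
    LaunchConfigurationName="web-lc",
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
)

# The group: which VPC subnets to launch into, initial size, ELB attachment and health checks.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchConfigurationName="web-lc",
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,                                  # initial size the group maintains
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",    # placeholder subnets
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/abc123"],
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,
)
```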
- Scaling Policies
- Target Tracking Policy
- What: Scale based on a predefined or custom metric in relation to a target value
- When: When CPU Utilisation gets to 70% on current instances, scale up
- Simple Scaling Policy
- What: Waits until the health check and cooldown period expire before evaluating new need
- When: Let's add new instances slow and steady
- Step Scaling Policy
- What: Responds to scaling needs with more sophistication and logic
- When: AGG! Add ALL instances
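The target tracking example above ("scale when CPU reaches 70%") maps to a policy like the following sketch; web-asg is an assumed group name.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep average CPU across the group near 70%; scaling out and in is handled automatically.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-70",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
    },
)
```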
- Scaling Cooldowns
- Configurable duration that gives scaling a chance to "come up to speed" and absorb load
- Default cooldown period is 300 seconds
- Automatically applies to dynamic scaling and optionally to manual scaling but not supported for scheduled scaling
- Can override the default cooldown via a scaling-specific cooldown
- Review examples
- Kinesis
- Collection of services for processing streams of various data
- Data is processed in "shards" - with each shard able to ingest 1000 records per second
- A default limit of 500 shards, but you can request an increase to unlimited shards
- Shards allow parallel processing of incoming data.
- Think of shards like lanes on a highway: more lanes, more traffic can pass through.
- The recommended method to read data from a shard is via the Kinesis Client Library (KCL)
- Record consists of Partition Key, Sequence Number and Data Blob (up to 1MB)
- Transient Data Store - Default Retention of 24 hours, but can be configured for up to 7 days
- Flavors for Kinesis
- Kinesis Video Streams
- Stream the data coming through from various sources, optimised to handle video data.
- Kinesis Data Streams
- Stream any data coming through from various sources
- Kinesis Firehose
- Receive data coming through from various sources and land it in destinations such as S3, Redshift or Elasticsearch, optionally transforming it with Lambda along the way
- Kinesis Data Analytics
- Apply analytics on incoming data immediately.
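A minimal producer sketch for Kinesis Data Streams, showing the record structure described above (partition key plus data blob); the stream name and key scheme are assumptions.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

response = kinesis.put_record(
    StreamName="clickstream",                                            # assumed stream
    Data=json.dumps({"user": "u-42", "action": "page_view"}).encode(),   # data blob, up to 1MB
    PartitionKey="u-42",   # records with the same key hash to the same shard
)

# Kinesis reports which shard accepted the record and its sequence number.
print(response["ShardId"], response["SequenceNumber"])
```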
- DynamoDB Scaling
- Divided into two dimensions
- Throughput
- Read Capacity Units
- Write Capacity Units
- Size
- Max Item Size 400KB
- Terminology
- Partition: A physical space where DynamoDB data is stored
- Partition Key: A unique identifier for each record; sometimes called a Hash Key
- Sort Key: In combination with the Partition Key, a composite key can be created that defines storage order; sometimes called a Range Key
- Under the Hood
- DynamoDB scales by adding partitions.
- Partition Calculation
- By Capacity: (Total RCU / 3000) + (Total WCU / 1000)
- By Size: Total Size / 10 GB
- Total Partitions: Round Up for MAX (By Capacity, By Size)
- Example
- By Capacity: (2000 RCU / 3000) + (2000 WCU / 1000) = 2.66
- By Size: 10 GB / 10 GB = 1
- Total Partitions: MAX (2.66,1) = 2.66 Round Up = 3 Partitions
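The same calculation expressed as a small Python helper, reproducing the worked example above:

```python
import math

def dynamodb_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    by_capacity = rcu / 3000 + wcu / 1000   # throughput one partition can serve
    by_size = size_gb / 10                  # each partition holds up to 10 GB
    return math.ceil(max(by_capacity, by_size))

# 2000 RCU, 2000 WCU, 10 GB -> max(2.66, 1) rounded up = 3 partitions
print(dynamodb_partitions(2000, 2000, 10))  # 3
```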
- Using Target Tracking method to try and stay close to Target Utilisation
- Currently does not scale down if the table's consumption drops to zero
- Workaround #1: Send requests to the table until it scales down
- Workaround #2: Manually reduce the max capacity to be the same as min capacity
- Also supports Global Secondary Indexes - think of them like a copy of the table just indexed or keyed differently
- CloudFront
- Can deliver content to users faster by caching static and dynamic content at edge locations
- Dynamic Content delivery is achieved using HTTP cookies forwarded from your origin
- Supports Adobe Flash Media Server's RTMP Protocol but have to choose RTMP delivery method
- Web distributions also support media streaming and live streaming but use HTTP or HTTPS
- Origins can be S3, EC2, ELB or another web server
- Multiple origins can be configured
- Use Behaviors to configure serving up origin content based on URL paths
- Example: A CloudFront CDN can be configured to serve static content from an S3 bucket and dynamic content via an ELB backed by an EC2 fleet.
- Invalidation Requests
- There are several ways to invalidate a CloudFront cache
- Simply delete the file from the origin and wait for the TTL to expire. TTL is configurable
- Use the AWS console to request invalidation for all content or a specific path such as /images/*
- Use the CloudFront API to submit an invalidation request
- Use Third-party tools to perform CloudFront invalidation (CloudBerry, YLastic, CDN Planet, CloudFront Purge Tool)
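A sketch of the API-based invalidation option using boto3; the distribution ID is a placeholder.

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate every cached object under /images/* on the distribution.
cloudfront.create_invalidation(
    DistributionId="E1EXAMPLE12345",   # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),   # must be unique per request
    },
)
```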
- CloudFront Supports
- Zone Apex DNS entries (that is without www or any subdomains in front of the URL)
- Geo Restriction: Whitelist or blacklist countries that can access the CloudFront content
- Amazon Simple Notification Service (SNS)
- Enables a Publish/Subscribe design pattern
- Topics: A channel for publishing a notification, consider this as an outbox
- Subscription: Configuring an endpoint to receive messages published on the topic
- Endpoint protocols include HTTP(S), email, SMS, SQS, Amazon Device Messaging (push notifications) and Lambda
- Fan Out Architecture
- User Uploads an item ->
- S3 bucket ->
- SNS ->
- Image Upload Topic ->
- SES -> Thank you email
- SQS -> Image Resize Queue
- Lambda -> Amazon Rekognition
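A sketch of the fan-out pattern above with boto3: one topic, multiple subscription endpoints, one publish. The ARNs and email address are placeholders, and a real SQS subscription also needs a queue policy allowing SNS to deliver (omitted here).

```python
import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="image-upload")["TopicArn"]

# Every subscriber receives its own copy of each message published to the topic.
sns.subscribe(TopicArn=topic_arn, Protocol="sqs",
              Endpoint="arn:aws:sqs:us-east-1:111122223333:image-resize-queue")
sns.subscribe(TopicArn=topic_arn, Protocol="lambda",
              Endpoint="arn:aws:lambda:us-east-1:111122223333:function:rekognition-tagger")
sns.subscribe(TopicArn=topic_arn, Protocol="email",
              Endpoint="uploads@example.com")

sns.publish(TopicArn=topic_arn, Message='{"bucket": "uploads", "key": "cat.jpg"}')
```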
- Amazon Simple Queue Service (SQS)
- Reliable, highly-scalable, hosted message queuing service.
- Available integration with KMS for encrypted messaging.
- Transient Storage - default 4 days, max 14 days
- Optionally supports First-in First-out queueing order.
- Maximum message size of 256KB, but by using the SQS Extended Client Library for Java, messages can be as large as 2GB
- Stores the message on S3 and creates an SQS message as a pointer to the message
- A significant benefit of SQS is that it helps create loosely coupled architectures.
- Standard Queue vs FIFO queue
- Standard Queue: Order is not guaranteed.
- FIFO Queue: Order is guaranteed, first in, first out
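A producer/consumer sketch showing the store-and-forward pattern; the queue name and message body are assumptions.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="image-resize-queue")["QueueUrl"]

# Producer side: drop a message on the queue and move on.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"key": "cat.jpg"}')

# Consumer side: long-poll for work, process it, then delete the message.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=10).get("Messages", [])
for msg in messages:
    print("processing", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```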
- Amazon MQ
- Managed implementation of Apache's ActiveMQ
- Fully Managed and highly available within a region
- Fully supports ActiveMQ API, JMS, NMS, MQTT, WebSocket etc
- Designed as a drop-in replacement for on-premises message brokers
- Use SQS if a new application is created from scratch.
- Use MQ if migrating existing message brokers to AWS
- Amazon Lambda
- Allows users to run code on demand without provisioning or managing infrastructure
- Supports Node.js, Python, Java, Go and C#
- Code is stateless and executed on an event basis (SNS, SQS, S3, DynamoDB Streams etc)
- No fundamental limits to scaling a function since AWS dynamically allocates capacity in relation to events
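A minimal sketch of a stateless handler triggered by an S3 event notification; the processing itself is just a placeholder print.

```python
def lambda_handler(event, context):
    """Invoked by AWS on each S3 event; no servers or capacity to manage."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object: s3://{bucket}/{key}")   # placeholder for real processing
    return {"status": "ok"}
```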
- Simple Workflow Service (SWF)
- Create distributed asynchronous systems as workflows
- Supports both sequential and parallel processing.
- Tracks the state of the workflow, which you interact with and update via the API
- Best suited for human-enabled workflows like order fulfilment or procedural requests
- AWS recommends that new applications look at Step Functions over SWF
- Components for SWF
- Activity Worker
- A program that interacts with the AWS SWF service to get tasks, process tasks and return results
- Decider
- A program that controls coordination of tasks such as their ordering, concurrency and scheduling
- AWS Step Functions
- Managed workflow and orchestration platform
- Scalable and highly available
- Define app as a state machine
- Create tasks, sequential steps, parallel steps, branching paths or timers.
- Amazon States Language: declarative JSON used to configure and document the steps of the state machine
- Apps can interact with and update the state machine via the Step Functions API
- Visual Interface that helps describe the flow and real time status
- Detailed logs captured for all the steps
- Finite State Machine
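A sketch of a two-task state machine expressed in Amazon States Language and registered via the Step Functions API; the Lambda ARNs and IAM role are placeholders.

```python
import json
import boto3

definition = {
    "StartAt": "ResizeImage",
    "States": {
        "ResizeImage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:resize",
            "Next": "NotifyUser",
        },
        "NotifyUser": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:notify",
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="image-pipeline",
    definition=json.dumps(definition),   # declarative JSON state machine
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsExecutionRole",
)
```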
- AWS Batch
- Management tool for creating, managing and executing batch-oriented tasks using EC2 instances
- Create a Compute Environment: Managed or Unmanaged, Spot or On-Demand, vCPUs
- Create a Job Queue with priority assigned to Compute Environment
- Create a Job Definition: Script or JSON, environment variables, mount points, IAM role, container image etc.
- Schedule the Job
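Once the compute environment, job queue and job definition above exist, scheduling a run reduces to submitting a job; the names below are placeholders.

```python
import boto3

batch = boto3.client("batch")

batch.submit_job(
    jobName="rotate-logs-2024-01-01",
    jobQueue="nightly-queue",          # queue attached to the compute environment
    jobDefinition="rotate-logs:1",     # job definition name:revision
    containerOverrides={
        "environment": [{"name": "LOG_DIR", "value": "/var/log/firewall"}],
    },
)
```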
- Comparisons
- Step Functions
- When: Out-of-the-box coordination of AWS service components
- Use Case: Order processing flow
- Simple Work Flow service
- When: Need to support external processes or specialised execution logic.
- Use Case: Loan Application Process with manual review steps
- Note: This is being phased out and Amazon is pushing towards adopting Step Functions.
- Simple Queue Service
- When: Message Queue; Store and Forward Patterns
- Use Case: Image resize process
- AWS Batch
- When: Scheduled or recurring tasks that don't require heavy logic
- Use Case: Rotate Logs daily on Firewall Appliance
- Elastic MapReduce
- This isn't one single product; rather, it is a collection of open-source products.
- EMR Core
- Hadoop HDFS: The filesystem that the data gets stored in; conducive to data analytics and data manipulation.
- Hadoop MapReduce: Actual Framework to do the processing of the data
- EMR Management
- ZooKeeper: Involved in resource coordination.
- Oozie: Workflow Framework
- Apache Pig: Scripting Framework
- Hive: SQL Interface to Hadoop Landscape
- Mahout: Machine Learning Component
- Apache HBase: Columnar Database for storing Hadoop data
- Flume: Very helpful for ingesting Application and System logs
- Sqoop: Facilitates the import of data into Hadoop from other databases or data sources
- Ambari: Management and Monitoring console
- Enterprise Support, Professional Support and Project Contributions
- Hortonworks
- Cloudera
- AWS Elastic MapReduce (EMR)
- Managed Hadoop Framework for processing huge amounts of data
- Also supports Apache Spark, HBase, Presto and Flink
- Most commonly used for log analysis, financial analysis or extract, transform and load (ETL) activities.
- A Step is a programmatic task for performing some process on the data (eg. count words)
- A cluster is a collection of EC2 instances provisioned by EMR to run the Steps
- Components of EMR
- Master Node - Manages the cluster and coordinates the distribution of data and Steps to the other nodes
- Core Nodes - Hold the data for processing
- Task Nodes - Worker nodes with ephemeral storage (no persistent storage of data); work on the Steps
- AWS EMR Process Example
- Step 1: Hive Script
- Step 2: Custom Jar
- Step 3: Pig Script
- Step 4: Output to S3 bucket
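A sketch of adding a Step to a running EMR cluster; a Spark job is used for illustration (EMR also supports Spark per the notes above), and the cluster ID and S3 path are placeholders.

```python
import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLE1234567",          # placeholder cluster ID
    Steps=[{
        "Name": "word-count",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",   # runs an arbitrary command on the cluster
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/jobs/word_count.py"],
        },
    }],
)
```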
- Exam Tips
- Auto Scaling Groups
- Know the different scaling options and policies
- Understand the difference and limitations between horizontal and vertical scaling
- Know what the cooldown period (not the same as the health check grace period) is and how it impacts the responsiveness to demand
- Kinesis
- Exam is likely to be restricted to the data streaming use cases for Kinesis, such as Data Streams and Firehose
- Understand the shard concept and how partition keys and sequence numbers enable shards to manage data flow
- DynamoDB Autoscaling
- Know the new and old terminology and concept of a partition, partition key and sort key in the context of DynamoDB
- Understand how DynamoDB calculates total partitions and allocates RCU and WCU across available partitions
- Conceptually know how data is stored across partitions
- CloudFront
- Know that both static and dynamic content is supported
- Understand possible origins and how multiple origins can be used together with Behaviors
- Know invalidation methods, zone apex and geo-restriction options
- SNS
- Understand a loosely coupled architecture and benefits it brings
- Know the different types of subscription end points supported
- SQS
- Know the difference between
- Standard and FIFO Queues
- Pub/Sub (SNS) and Message Queueing (SQS) architecture
- Lambda
- Know what "serverless" is in concept and how Lambda can facilitate such an architecture
- Know the languages supported by Lambda
- SWF
- Understand the difference and functions of a Worker and Decider
- Best suited for human-enabled workflows like order fulfillment or procedural requests
- Elastic MapReduce
- Understand the parts of a Hadoop landscape at a high-level
- Know what a Cluster is and what steps are
- Understand the roles of a Master Node, Core Nodes and Task Nodes.
- Whitepapers
- Web Application Hosting in the AWS Cloud
- Introduction to Scalable Gaming Patterns on AWS
- Performance at Scale with Amazon ElastiCache
- Automating Elasticity
- Videos
- Scaling up to your First 10 Million Users
- Learning to build a cloud-scale WordPress site that can keep up with Unpredictable Changes and Capacity Demands
- Elastic Load Balancing Deep Dive and Best Practices
- Pro Tips
- Elasticity drives most of the benefit from the cloud, such as cost savings and agility
- Think Cloud-First designs if you're in a Green Field scenario even if you are deploying on-prem
- Most modern data centres are moving towards modular workloads to support things like Docker and Cloud Foundry
- These things allow us to scale horizontally just like in the cloud
- If in "Brown Field" situation, create roadmaps for cloud-first enablers like distributed applications, federated data and SOA
- Be careful to not let elasticity cover for poor development methods
- Microservice concepts help achieve scalability via decoupling, simplification and separation of concerns