Architecting to Scale Module

  • Concepts
    • Architectural Patterns
      • Loosely Coupled Architecture (LCA)
        • Components can stand independently and require little or no knowledge of the inner workings of the other components.
        • Benefits of LCA
          • Layers of abstraction
          • Permits more flexibility
          • Interchangeable components
          • More atomic or isolated functional units
          • Can scale components independently
    • Types of Scaling
      • Horizontal Scaling (Scale Out)
        • Add more instances as demand increases
        • No downtime required to Scale Out or Scale In
        • Automatic using Auto-Scaling groups
        • Theoretically Unlimited on AWS
        • Cost Effective
      • Vertical Scaling (Scale Up)
        • Add more CPU and/or RAM to existing instance as demand increases
        • Requires restart to scale up or down
        • Would require scripting to automate
        • Limited by instance sizes
    • Auto-Scaling Groups
      • Automatically provides horizontal scaling (scale-out) as required by the demand
      • Triggered by an event or scaling action to either launch or terminate instances
      • Availability, Cost and System metrics can all factor into scaling.
      • Four Scaling Options
        • Maintain
          • Keep a specific or minimum number of instances running
          • What: Hands-off way to maintain X number of instances
          • When: I need 3 instances always
        • Manual
          • Use maximum, minimum or specific number of instances
          • What: Manually change desired capacity via console or CLI
          • When: My needs change so rarely that I can just manually add and remove
        • Schedule
          • Increase or Decrease instances based on Schedule
          • What: Adjust min/max instances based on specific times
          • When: Every Monday morning, we get a rush on our website
        • Dynamic
          • Scale based on real-time metrics of the systems
          • What: Scale in response to behavior of elements in the environment
          • When: When CPU Utilisation gets to 70% on current instances, scale up
      • Launch Configurations
        • Template used by an Auto Scaling group which defines
          • the type of EC2 instance
          • the VPC and subnets for scaled instances
        • Attach to an ELB
        • Define a Health Check Grace Period
        • Define the size of the group to stay at an initial size
        • Or use a scaling policy, which can be based on metrics (CPU, ALB Request Count Per Target, Network Traffic In/Out)
      • Scaling Policies
        • Target Tracking Policy
          • What: Scale based on a predefined or custom metric in relation to a target value
          • When: When CPU Utilisation gets to 70% on current instances, scale up
        • Simple Scaling Policy
          • What: Waits until health check and cool down period expires before evaluating new need
          • When: Let's add new instances slow and steady
        • Step Scaling Policy
          • What: Responds to scaling needs with more sophistication and logic
          • When: AGG! Add ALL instances
      • Scaling Cooldowns
        • Configurable duration that gives a scaling action a chance to "come up to speed" and absorb load
        • Default cooldown period is 300 seconds
        • Automatically applies to dynamic scaling and optionally to manual scaling but not supported for scheduled scaling
        • Can override default cooldown via scaling-specific cool down
        • Review examples (a boto3 sketch follows below)
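As a concrete illustration of the Target Tracking option above, here is a minimal boto3 sketch that attaches a CPU-based target tracking policy to an Auto Scaling group; the group name "my-asg" and the 70% target are hypothetical values for illustration only.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Minimal sketch: keep average CPU near 70% for a hypothetical group "my-asg".
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",           # hypothetical group name
    PolicyName="cpu-target-70",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0,
    },
    # Seconds a new instance gets to "come up to speed" before its metrics count
    EstimatedInstanceWarmup=300,
)
```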
    • Kinesis
      • Collection of services for processing streams of various data
      • Data is processed in "shards" - with each shard able to ingest 1000 records per second
      • A default limit of 500 shards, but you can request an increase to unlimited shards
        • Shards allow parallel processing of incoming data.
        • Like lanes on a highway: more lanes, more traffic can pass through
        • The recommended method to read data from a shard is via the Kinesis Client Library (KCL)
      • A record consists of a Partition Key, a Sequence Number and a Data Blob (up to 1 MB); see the put_record sketch after this section
      • Transient Data Store - Default Retention of 24 hours, but can be configured for up to 7 days
      • Flavors for Kinesis
        • Kinesis Video Streams
          • Stream the data coming through from various sources, optimised to handle video data.
        • Kinesis Data Streams
          • Stream any data coming through from various sources
        • Kinesis Firehose
          • Receive data coming through from various sources and land it in destinations such as S3, Redshift, Elasticsearch or Splunk
        • Kinesis Data Analytics
          • Apply analytics on incoming data immediately.
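To make the shard and record concepts concrete, here is a minimal boto3 sketch that writes one record to a Kinesis Data Stream; the stream name and payload are hypothetical. Records that share a partition key land in the same shard and keep their relative order.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Minimal sketch: put one record onto a hypothetical stream named "clickstream".
response = kinesis.put_record(
    StreamName="clickstream",                                        # hypothetical
    Data=json.dumps({"user": "u123", "action": "page_view"}).encode("utf-8"),
    PartitionKey="u123",   # determines which shard receives the record
)

# Kinesis returns the shard and the sequence number assigned to the record.
print(response["ShardId"], response["SequenceNumber"])
```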
    • DynamoDB Scaling
      • Divided into two dimensions
        • Throughput
          • Read Capacity Units
          • Write Capacity Units
        • Size
          • Max Item Size 400KB
      • Terminology
        • Partition: A physical space where DynamoDB data is stored
        • Partition Key: A unique identifier for each record; sometimes called a Hash Key
        • Sort Key: In combination with the Partition Key, forms a composite key that defines storage order; sometimes called a Range Key
      • Under the Hood
        • DynamoDB scales by adding partitions.
        • Partition Calculation
          • By Capacity: (Total RCU / 3000) + (Total WCU / 1000)
          • By Size: Total Size / 10 GB
          • Total Partitions: Round up MAX(By Capacity, By Size) (see the Python sketch after this section)
          • Example
            • By Capacity: (2000 RCU / 3000) + (2000 WCU / 1000) = 2.66
            • By Size: 10 GB / 10 GB = 1
            • Total Partitions: MAX(2.66, 1) = 2.66, rounded up = 3 Partitions
        • Autoscaling uses the Target Tracking method to try to stay close to the Target Utilisation
        • Currently does not scale down if the table's consumption drops to zero
          • Workaround #1: Send requests to the table until it scales down
          • Workaround #2: Manually reduce the max capacity to be the same as min capacity
        • Also supports Global Secondary Indexes - think of them like a copy of the table just indexed or keyed differently
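The partition arithmetic above translates directly into a short helper; this is a sketch of the rule-of-thumb formula from these notes, not an AWS API.

```python
import math

def dynamodb_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Estimate partitions: round up MAX(capacity-based, size-based)."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# Matches the worked example: 2000 RCU, 2000 WCU and 10 GB -> 3 partitions
print(dynamodb_partitions(2000, 2000, 10))  # 3
```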
    • CloudFront
      • Can deliver content to users faster by caching static and dynamic content at edge locations
      • Dynamic content delivery is achieved by forwarding HTTP cookies to your origin
      • Supports Adobe Flash Media Server's RTMP Protocol but have to choose RTMP delivery method
      • Web distributions also support media streaming and live streaming but use HTTP or HTTPS
      • Origins can be S3, EC2, ELB or another web server
      • Multiple origins can be configured
      • Use Behaviors to configure which origin serves content based on URL paths
      • Example: A CloudFront CDN can be configured to serve static content from an S3 bucket and dynamic content via an ELB backed by an EC2 fleet.
      • Invalidation Requests
        • There are several ways to invalidate a CloudFront cache
          • Simply delete the file from the origin and wait for the TTL to expire. TTL is configurable
          • Use the AWS console to request invalidation for all content or a specific path such as /images/*
          • Use the CloudFront API to submit an invalidation request (see the boto3 sketch after this section)
          • Use Third-party tools to perform CloudFront invalidation (CloudBerry, YLastic, CDN Planet, CloudFront Purge Tool)
      • CloudFront Supports
        • Zone Apex DNS entries (that is without www or any subdomains in front of the URL)
        • Geo Restriction: whitelist or blacklist the countries that can access the CloudFront content
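For the API-based invalidation option, here is a minimal boto3 sketch; the distribution ID is a hypothetical placeholder, and CallerReference just needs to be unique per request.

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Minimal sketch: invalidate everything under /images/* for a hypothetical distribution.
cloudfront.create_invalidation(
    DistributionId="E1A2B3C4D5E6F7",   # hypothetical distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),   # must be unique per request
    },
)
```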
    • Amazon Simple Notification Service (SNS)
      • Enables a Publish/Subscribe design pattern
      • Topics: A channel for publishing a notification, consider this as an outbox
      • Subscription: Configuring an endpoint to receive messages published on the topic
      • Endpoint protocols include HTTP(S), email, SMS, SQS, Amazon Device Messaging (push notifications) and Lambda
      • Fan Out Architecture (see the boto3 sketch after this list)
        • User Uploads an item ->
        • S3 bucket ->
        • SNS ->
        • Image Upload Topic ->
          • SES -> Thank you email
          • SQS -> Image Resize Queue
          • Lambda -> Amazon Rekognition
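A minimal boto3 sketch of the fan-out pattern: one topic, several subscriptions, one publish. The queue and Lambda ARNs are hypothetical, and a real setup also needs an SQS queue policy and a Lambda resource permission that are omitted here for brevity.

```python
import boto3

sns = boto3.client("sns")

# One topic acts as the "outbox" that fans out to every subscriber.
topic_arn = sns.create_topic(Name="image-upload")["TopicArn"]

# Hypothetical subscriber endpoints.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:image-resize-queue",
)
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="lambda",
    Endpoint="arn:aws:lambda:us-east-1:123456789012:function:rekognition-tagger",
)

# A single publish delivers a copy of the message to every subscription.
sns.publish(TopicArn=topic_arn, Message="s3://uploads/cat.jpg was uploaded")
```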
    • Amazon Simple Queue Service (SQS)
      • Reliable, highly scalable, hosted message queuing service (see the send/receive sketch after this section)
      • Available integration with KMS for encrypted messaging.
      • Transient Storage - default 4 days, max 14 days
      • Optionally supports First-in First-out queueing order.
      • Maximum message size of 256 KB, but by using the SQS Extended Client Library for Java, messages can be as large as 2 GB
        • Stores the message body in S3 and sends an SQS message containing a pointer to it
      • A significant benefit of SQS is that it helps create loosely coupled architectures.
      • Standard Queue vs FIFO queue
        • Standard Queue: Order is not guaranteed.
        • FIFO Queue: Order is guaranteed, first in, first out
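A minimal boto3 sketch of the store-and-forward flow: a producer enqueues a work item and a consumer long-polls, processes it, then deletes it so it is not redelivered. The queue name is hypothetical.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="image-resize-queue")["QueueUrl"]  # hypothetical

# Producer side: store the work item in the queue.
sqs.send_message(QueueUrl=queue_url, MessageBody="resize s3://uploads/cat.jpg")

# Consumer side: long-poll for up to 10 seconds, process, then delete the message
# so it is not redelivered once the visibility timeout expires.
result = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for msg in result.get("Messages", []):
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```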
    • Amazon MQ
      • Managed implementation of Apache's ActiveMQ
      • Fully Managed and highly available within a region
      • Fully supports ActiveMQ API, JMS, NMS, MQTT, WebSocket etc
      • Designed as a drop-in replacement for on-premises message brokers
      • Use SQS if creating a new application from scratch
      • Use Amazon MQ if migrating an existing message broker to AWS
    • Amazon Lambda
      • Allows users to run code on-demand without the need for Infrastructure
      • Supports Node.js, Python, Java, Go and C#
      • Code is stateless and executed on an event basis (SNS, SQS, S3, DynamoDB Streams etc); see the handler sketch after this section
      • No fundamental limits to scaling a function since AWS dynamically allocates capacity in relation to events
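A minimal sketch of a stateless Python handler reacting to an S3 upload event; the function and bucket names are hypothetical, and the event shape shown is the standard S3 notification structure.

```python
# Minimal sketch of a stateless Lambda handler triggered by an S3 event notification.
def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # In a real function this is where the object would be processed.
        print(f"New object uploaded: s3://{bucket}/{key}")
    return {"status": "ok"}
```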
    • Simple Workflow Service (SWF)
      • Create distributed asynchronous systems as workflows
      • Supports both sequential and parallel processing.
      • Tracks the state of the workflow, which you interact with and update via the API
      • Best suited for human-enabled workflows like order fulfilment or procedural requests
      • For new applications, AWS recommends looking at Step Functions over SWF
      • Components for SWF
        • Activity Worker
          • A program that interacts with the AWS SWF service to get tasks, process tasks and return results
        • Decider
          • A program that controls coordination of tasks such as their ordering, concurrency and scheduling
    • AWS Step Functions
      • Managed workflow and orchestration platform
      • Scalable and highly available
      • Define app as a state machine
      • Create tasks, sequential steps, parallel steps, branching paths or timers.
      • Amazon States Language: declarative JSON used to define and document the steps of a state machine (see the boto3 sketch after this section)
      • Apps can interact with and update the state machine via the Step Functions API
      • Visual interface that describes the flow and its real-time status
      • Detailed logs captured for all the steps
      • Finite State Machine
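A minimal boto3 sketch that defines a two-state machine in Amazon States Language and registers it; the Lambda and IAM role ARNs are hypothetical placeholders.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# A tiny state machine expressed in Amazon States Language (declarative JSON):
# one Task state that calls a (hypothetical) Lambda, then a Succeed state.
definition = {
    "Comment": "Order processing flow",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

sfn.create_state_machine(
    name="order-processing",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/step-functions-role",  # hypothetical role
)
```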
    • AWS Batch
      • Management tool for creating, managing and executing batch-oriented tasks using EC2 instances
      • Create a Compute Environment: Managed or Unmanaged, Spot or On-Demand, vCPUs
      • Create a Job Queue with a priority, associated with one or more Compute Environments
      • Create a Job Definition: Script or JSON, environment variables, mount points, IAM role, container image etc.
      • Schedule the Job (see the submit_job sketch after the comparisons below)
      • Comparisons
        • Step Functions
          • When: Out-of-the-box coordination of AWS service components
          • Use Case: Order processing flow
        • Simple Workflow Service (SWF)
          • When: Need to support external processes or specialised execution logic.
          • Use Case: Loan Application Process with manual review steps
          • Note: This is being phased out and AWS is pushing customers towards adopting Step Functions.
        • Simple Queue Service
          • When: Message Queue; Store and Forward Patterns
          • Use Case: Image resize process
        • AWS Batch
          • When: Scheduled or recurring tasks that don't require heavy logic
          • Use Case: Rotate Logs daily on Firewall Appliance
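A minimal boto3 sketch of submitting a Batch job once the compute environment, job queue and job definition from the steps above exist; all names here are hypothetical.

```python
import boto3

batch = boto3.client("batch")

# Minimal sketch: submit a job against a hypothetical queue and job definition.
batch.submit_job(
    jobName="rotate-firewall-logs",
    jobQueue="nightly-maintenance",     # hypothetical job queue
    jobDefinition="log-rotator:1",      # hypothetical job definition (name:revision)
    containerOverrides={
        "environment": [{"name": "RETENTION_DAYS", "value": "30"}],
    },
)
```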
      • Elastic MapReduce
        • This isn't one single product; rather, it is a collection of open-source products.
          • EMR Core
            • Hadoop HDFS: Is just the filesystem that the data gets stored in. Conducive to data analytics and data manipulation.
            • Hadoop MapReduce: Actual Framework to do the processing of the data
          • EMR Management
            • ZooKeeper: Involved in resource coordination
            • Oozie: Workflow Framework
            • Apache Pig: Scripting Framework
            • Hive: SQL Interface to Hadoop Landscape
            • Mahout: Machine Learning Component
            • Apache HBase: Columnar Database for storing Hadoop data
            • Flume: Very helpful for ingesting Application and System logs
            • Sqoop: Facilitates the import of data into Hadoop from other databases or data sources
            • Ambari: Management and Monitoring console
          • Enterprise Support, Professional Support and Project Contributions
            • Hortonworks
            • Cloudera
        • AWS Elastic MapReduce (EMR)
          • Managed Hadoop Framework for processing huge amounts of data
          • Also supports Apache Spark, HBase, Presto and Flink
          • Most commonly used for log analysis, financial analysis or extract, transform and load (ETL) activities.
          • A Step is a programmatic task for performing some process on the data (e.g. count words)
          • A cluster is a collection of EC2 instances provisioned by EMR to run the Steps
          • Components of EMR
            • Master Node - manages the cluster, tracks step status and distributes work to the other nodes
            • Core Nodes - hold the data (HDFS) for processing and also run tasks
            • Task Nodes - worker nodes with ephemeral storage only (no HDFS), work on the steps
          • AWS EMR Process Example (see the add_job_flow_steps sketch after this list)
            • Step 1: Hive Script
            • Step 2: Custom Jar
            • Step 3: Pig Script
            • Step 4: Output to S3 bucket
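A minimal boto3 sketch of adding a Hive step to a running cluster, roughly matching Step 1 of the process example above; the cluster ID and S3 script path are hypothetical, and the command-runner.jar argument layout may vary by EMR release.

```python
import boto3

emr = boto3.client("emr")

# Minimal sketch: add a Hive script as a step on a hypothetical running cluster.
emr.add_job_flow_steps(
    JobFlowId="j-1ABC2DEF3GHIJ",        # hypothetical cluster ID
    Steps=[
        {
            "Name": "Run Hive script",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "hive-script", "--run-hive-script",
                    "--args", "-f", "s3://my-bucket/scripts/query.hql",  # hypothetical
                ],
            },
        }
    ],
)
```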
    • Exam Tips
      • Auto Scaling Groups
        • Know the different scaling options and policies
        • Understand the difference and limitations between horizontal and vertical scaling
        • Know what the cooldown period is (not the same as the health check grace period) and how it impacts responsiveness to demand
      • Kinesis
        • The exam is likely to be restricted to the data streaming use cases for Kinesis, such as Data Streams and Firehose
        • Understand the shard concept and how partition keys and sequence numbers enable shards to manage data flow
      • DynamoDB Autoscaling
        • Know the new and old terminology and concept of a partition, partition key and sort key in the context of DynamoDB
        • Understand how DynamoDB calculates total partitions and allocates RCU and WCU across available partitions
        • Conceptually know how data is stored across partitions
      • CloudFront
        • Know that both static and dynamic content is supported
        • Understand possible origins and how multiple origins can be used together with Behaviors
        • Know invalidation methods, zone apex and geo-restriction options
      • SNS
        • Understand a loosely coupled architecture and benefits it brings
        • Know the different types of subscription end points supported
      • SQS
        • Know the difference between
          • Standard and FIFO Queues
          • Pub/Sub (SNS) and Message Queueing (SQS) architecture
      • Lambda
        • Know what "serverless" is in concept and how Lambda can facilitate such an architecture
        • Know the languages supported by Lambda
      • SWF
        • Understand the difference and functions of a Worker and Decider
        • Best suited for human-enabled workflows like order fulfillment or procedural requests
      • Elastic MapReduce
        • Understand the parts of a Hadoop landscape at a high-level
        • Know what a Cluster is and what steps are
        • Understand the roles of a Master Node, Core Nodes and Task Nodes.
    • Whitepapers
    • Videos
    • Pro Tips
      • Elasticity will drive most benefit from the cloud such as cost and agility
      • Think Cloud-First designs if you're in a Green Field scenario even if you are deploying on-prem
        • Most modern data centres are moving towards modular workloads to support things like Docker and Cloud Foundry
        • These things allow us to scale horizontally just like in the cloud
      • If in "Brown Field" situation, create roadmaps for cloud-first enablers like distributed applications, federated data and SOA
      • Be careful to not let elasticity cover for poor development methods
      • Microservice concepts help achieve scalability via decoupling, simplification and separation of concerns