Skip to content

Latest commit

 

History

History
228 lines (191 loc) · 12.3 KB

File metadata and controls

228 lines (191 loc) · 12.3 KB

Monitoring

  • Act of collecting & analyzing data to determine the performance, health, and availability of applications and their depended resources.
  • 💡📝 Understanding the operation of applications and alerts helps fixing problems before they occur.

Azure monitoring services

  • Application Monitoring: • Application Insights
  • Deep Infrastructure Monitoring: • Log Analytics • Management Solutions • Network Monitoring • Service Map
  • Core Monitoring: • Azure Monitor • Advisor • Service Health • Activity Log
  • Shared Capabilities: • Alerts • Dashboards • Metrics Explorer

Azure monitoring costs

  • Use # Calculator for a specific resource.
  • Review estimated costs when creating a resource
  • See spent costs through Subscription blade
    • You can task resources Cost analysis with e.g. costCenter: marketing to filter in cost analysis view.
      • 💡 Recommended to tag & export.
    • You can see your bill in "Invoices", opt-in for PDF invoices & download (more options available in Azure Account Center)
    • External services billed separately
  • Get notifications through Azure Advisor or Azure Cost Management for anomalies and overspending risks.

Azure Monitor

  • PaaS (Platform as a Service)
  • Pipeline for metric log data coming from any Azure resource provider.
  • Fastest telemetry pipeline: Faster than Log Analytics.
  • Collected data is saved in Log Analytics to analyze, monitor, visualize metrics and query and analyze logs.
    • Visualize: • Dashboard • Views • Power BI • Workbooks
    • Analyze: • Metrics Explorer • Log Analytics
    • Respond: • Alerts • Autoscale
    • Integrate: • Event Hubs • Logic Apps • Ingest & Export APIs
  • You can use alerts to proactively notify you based on rules with Azure Alerts.
  • You can access through
    • Azure Insight & Analytics (OMS portal that's now legacy)
    • On Azure portal there's an existing resource called Azure Monitor.
    • Rest API, PowerShell & CLI

Data types

Metrics

  • On resource level
  • All numerical values including
    • All Application Insights data / telemetry
    • System health from VMs (uses HyperV metrics without any configuration)
  • Can be visualized & queried
  • 30 days free to store & query
  • E.g. requests and errors, average response time, infrastructure metrics

Activity Log

Activity log categories
  • Administrative
    • Every API call to Azure Resource Manager
    • E.g. create virtual machine or delete network security group
  • Service health
    • Any service health incidents occurred in Azure
    • Come in 5 varieties: • Action Required • Assisted Recovery • Incident • Maintenance • Information • Security.
    • E.g. SQL Azure in East US is experiencing downtime.
  • Resource health
    • Status of resources: • Available • Unavailable • Degraded • Unknown
    • E.g. Virtual Machine health status changed to unavailable
  • Alert
    • All activations of Azure alerts.
    • E.g. CPU % on a VM has been over 80 for the past 5 minutes
  • Autoscale
    • Events in autoscale engine defined by your settings.
    • E.g. Scale up action failed
  • Recommendation
    • How to better utilize your resources.
  • Security
    • Events by Azure Security Center
    • E.g. Suspicious double extension file executed.
  • Policy
    • All effect action operations performed by Azure Policy.
    • Types include audit and deny
  • 💡 Security monitoring should be set-up for Administrative, Security, Policy and ServiceHealth

Resource log

  • On platform level.
  • "Inside operations" within the resource.
  • Resource-specific with common scheme where resources include their own properties.
  • E.g. IIS logs, web server logging, failed request tracking.
  • Can be streamed (to Power BI, Azure Functions etc.), added in blob storage, and sent directly to Log Analytics.
  • 🤗 Formerly known as diagnostic logs 1

Diagnostic settings

  • Diagnostic setting is resource deployed to send logs different destinations.
  • Each Azure resource requires its own diagnostic setting
  • A diagnostic setting consists of
    • Sources (depends on resource type)
    • Destinations (destinations to send)

Diagnostic settings destinations

  • Logs can be sent to resources across subscriptions 4
  • Using Azure Lighthouse, it can be sent across tenants 4
  • Resources that can be sent include:
    • A storage account (storageAccountId 1)
      • Allows archival
      • Allows you to configure retention in days5 (retentionPolicy 1)
      • 💡 Can enable immutable on blob storage to protect against tampering 2
    • Log analytics workspace (workspaceId)
      • Metrics are converted to forms and sent to Azure Monitor Logs
      • Helps to query, visualize, set-up alerts and integrate with Azure Sentinel
    • Azure marketplace resource (marketplacePartnerId)
      • Also known as partner integrations 3
      • E.g. Datadog, Elastic, Logz.io.
    • Event Hub (eventHubName)

Azure Alerts

  • All alert creation for metrics, logs and activity log across Azure Monitor, Log Analytics, and Application Insights
  • Separation of operational and configuration views:
    • Alert Rules: Definition of the condition that triggers an alert
    • Fired Alerts: An instance of the alert rule firing
  • Flow of alerts
    • Set-up alert rule
      • Target resource (e.g. storage account)
        • Signal:
          • Types are Metric, Activity log, Application Insights and Log
          • You can have multi-dimensional metrics & monitor multiple metrics with a single rule (currently up to two)
      • Criteria
        • Logic test: e.g. six-hour period when capacity is over 10 MB
    • Action group (= actions to do)
      • Grouping of different actions to take when the alert is triggered
      • Each action has name & action type e.g. email/sms/push/voice/webhook/automation runbook
        • ❗ Applied rate limiting:
          • SMS: No more than 1 SMS every 5 minutes.
          • Voice: No more than 1 Voice call every 5 minutes.
          • Email: No more than 100 emails in an hour.
          • Other actions are not rate limited.
    • Set alert rule name, description, severity (can be informational, warning, error, critical)
  • Log alerts
    • Defined by Log Query (by Log Analytics), Time period, frequency, threshold.
    • Number of results alert rules always creates a single alert, while Metric measurement alert rule creates an alert for each object that exceeds the threshold

Azure Advisor

  • Uses telemetry & application configurations to give personalize recommendations and guidance for
    • high availability, security, performance, cost effectiveness (monitors unused resources and spent).
  • Common resource, free for all users
  • You can download, filter, postpone, dismiss recommendations.
  • You can customize by excluding subscription/resource groups, configuring utilization rules (e.g. you can as a subscription owner set CPU to lower threshold)

Log Analytics

  • PaaS (Platform as a Service)
  • It's also referred as (newer names)
    • Azure Monitor Logs
      • Azure Monitor log data is stored in Log Analytics
      • The term is changed from "Log Analytics" to "Azure Monitor Logs" (see)
    • Log Search
      • "Log Search" interface in Azure Monitor (Monitor -> Logs)
      • Or Log Analytics Workspace -> Logs
  • Two main features
    • Log Analytics Workspace where Azure Monitor stores its data
    • Log search feature; collect, correlate, search, and act on data with any schema.
  • Business value:
    • Assessing updates: From logs can guess average patching time
    • Change tracking: Abnormal behavior from a specific account by tracking changes throughout the environment
  • Free to store logs for 90 days of charge
    • You can export to Excel, PowerBI, or use API to send data or get.

Sending data to Log Analytics

  • Log analytics uses push-model i.e. data is pushed to it.
  • Data can be pushed from integrated sources including
    • Connected Azure sources e.g. IIS logs, custom text logs with custom fields, error level etc.
    • Office 365, Azure Automation, Back-ups
    • ❗️ You cannot change schema on ingestion time for integrated resources, only on query time.
  • Data can also be pushed Linux & Windows systems with an agent
    • Agents include
      • Log analytics agent
      • Operations Manager
        • Allows using existing investments with SCOM (System Center Operations Manager)
        • SCOM agents communicate with SCOM Server over TLS 1.2 which forward events and performance data to Log Analytics.
    • You can install agents using a script via Azure Automation DSC (desired state configuration)
      • Agents are already installed in cloud Windows VMs
  • You can also use Log Analytics REST APIs to send custom data with any schema you want

Collect data across subscriptions

  • Log analytics applies if in same subscription, or in same Azure Active Directory tenant.
  • So single Log Analytics workspace can monitor across under same tenant.
  • To collect data across subscriptions and tenants:
    • In customer subscription
      • Activity log (export button) => Event Hub
    • In Service provider subscription
      • Logic App (When events are available in Event Hub -> Parse JSON (Body) -> Compose (Select Body in inputs) -> Send Data (Azure Log Analytics Data Collector))=> Log Analytics
  • Azure Monitor Logs support query across multiple Log Analytics workspaces across subscriptions
    • E.g. app("/subscriptions/b459b4f6-912x-46d5-9cb1-b43069212ab4/resourcegroups/Fabrikam/providers/microsoft.insights/components/fabrikamapp").requests | count
    • Read more on Microsoft Docs

Querying Log Analytics

  • Each data source has documentation (description & name) of its properties.
  • 📝 Main query tables: Heartbeat, Perf, Usage, Event, Syslog, Alert.
  • You can connect to Activity Logs in Activity Logs section and query with AzureActivity

Kusto Query Language (KQL)

  • A pipe-through language to query log analytics data
  • E.g.: Event | where (EventLevelName == "Error") | where (TimeGenerated > ago(1days)) | summarize ErrorCount = count() by Computer | top 10 by ErrorCount desc
  • Can generate charts | render timechart that can be pinned to dashboard.