Monday, 21 March 2022

Amazon EMR

 

  • A managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data.
  • You can process data for analytics purposes and business intelligence workloads using EMR together with Apache Hive and Apache Pig.
  • You can use EMR to transform and move large amounts of data into and out of other AWS data stores and databases.

Features

  • EMR Notebooks provide a managed environment, based on Jupyter Notebooks, to help users prepare and visualize data, collaborate with peers, build applications, and perform interactive analysis using EMR clusters.
  • EMR enables you to quickly and easily provision as much capacity as you need, and automatically or manually add and remove capacity.
  • You can leverage multiple data stores, including S3, the Hadoop Distributed File System (HDFS), and DynamoDB.


  • EMR supports powerful and proven Hadoop tools such as Hive, Pig, HBase, and Impala. Additionally, it can run distributed computing frameworks besides Hadoop MapReduce such as Spark or Presto using bootstrap actions. You can also use Hue and Zeppelin as GUIs for interacting with applications on your cluster.

Components

  • Clusters – A collection of EC2 instances. You can create two types of clusters:
    • A transient cluster that auto-terminates after all steps complete.
    • A long-running cluster that continues to run until you deliberately terminate it.
  • Nodes – Each EC2 instance in a cluster is called a node.
  • Node Type – Each node has a role within the cluster, referred to as the node type. The node types are:
    • Master node: A node that manages the cluster by running software components to coordinate the distribution of data and tasks among the other nodes for processing. The master node tracks the status of tasks and monitors the health of the cluster. Every cluster has a master node, and it’s possible to create a single-node cluster with only the master node. A single master node does not support automatic failover.
    • Core node: A node with software components that run tasks and store data in the Hadoop Distributed File System (HDFS) on your cluster. Multi-node clusters have at least one core node. EMR is fault tolerant for slave failures and continues job execution if a slave node goes down.
    • Task node: A node with software components that only runs tasks and does not store data in HDFS. Task nodes are optional.
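
To make these node types concrete, a cluster can be launched with explicit master, core, and task instance groups through the EMR API. The boto3 sketch below is illustrative only: the instance types, counts, bucket, and role names are assumptions, not prescribed values.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a transient cluster: one master, two core, two task nodes.
# All names, types, and the log bucket below are illustrative assumptions.
cluster = emr.run_job_flow(
    Name="demo-cluster",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"InstanceRole": "TASK",   "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # False => transient cluster that auto-terminates after steps complete
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    LogUri="s3://my-bucket/emr-logs/",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(cluster["JobFlowId"])
```

Setting KeepJobFlowAliveWhenNoSteps to True instead yields a long-running cluster.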

EMR Architecture

  • Storage – this layer includes the different file systems that are used with your cluster.
    • Hadoop Distributed File System (HDFS) – a distributed, scalable file system for Hadoop.
      • HDFS distributes the data it stores across instances in the cluster, storing multiple copies of data on different instances to ensure that no data is lost if an individual instance fails.
      • HDFS is ephemeral storage.
      • HDFS is useful for caching intermediate results during MapReduce processing or for workloads that have significant random I/O.
    • EMR File System (EMRFS) – With EMRFS, EMR extends Hadoop to directly be able to access data stored in S3 as if it were a file system.
    • Local File System – refers to a locally connected disk.
      • Each EC2 node in your cluster comes with a pre-configured instance store, which persists only for the lifetime of the EC2 instance.
  • Cluster Resource Management – this layer is responsible for managing cluster resources and scheduling the jobs for processing data.
    • By default, Amazon EMR uses YARN, which is a component introduced in Apache Hadoop 2.0 to centrally manage cluster resources for multiple data-processing frameworks.
    • EMR has an agent on each node that administers YARN components, keeps the cluster healthy, and communicates with EMR.
  • Data Processing Frameworks – this layer is the engine used to process and analyze data.
    • Hadoop MapReduce – an open-source programming model for distributed computing.
    • Apache Spark – a cluster framework and programming model for processing big data workloads.
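
To make the storage layers concrete, here is a short PySpark sketch (run on the cluster itself) that reads input through EMRFS and parks intermediate results on ephemeral HDFS; the bucket and paths are assumptions.

```python
# PySpark sketch run on an EMR cluster; bucket and paths are assumptions
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("emrfs-demo").getOrCreate()

# s3:// paths go through EMRFS, so S3 data can be read as if it were a file system
df = spark.read.json("s3://my-bucket/input/")

# hdfs:// paths hit the cluster's ephemeral HDFS, useful for intermediate results
df.write.mode("overwrite").parquet("hdfs:///tmp/intermediate/")
```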

Data Processing

  • Ways to process data in your EMR cluster:
    • Submitting Jobs Directly to Applications – Submit jobs and interact directly with the software that is installed in your EMR cluster. To do this, you connect to the master node over a secure connection and access the interfaces and tools that are available for the software that runs directly on your cluster.
    • Running Steps to Process Data – Submit one or more ordered steps to an EMR cluster. Each step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster.
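
For the second approach, steps can be submitted to a running cluster through the API. A minimal boto3 sketch, where the cluster ID and script location are placeholders:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Submit one ordered step to an existing cluster (ID and paths are assumptions)
response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[{
        "Name": "spark-etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # runs the given command on the cluster
            "Args": ["spark-submit", "s3://my-bucket/scripts/etl.py"],
        },
    }],
)
print(response["StepIds"])
```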

Scaling

  • There are two main options for adding or removing capacity:
    • Deploy multiple clusters: If you need more capacity, you can easily launch a new cluster and terminate it when you no longer need it. There is no limit to how many clusters you can have.
    • Resize a running cluster: You may want to scale out a cluster to temporarily add more processing power, or scale in your cluster to save on costs when you have idle capacity. When adding instances to your cluster, EMR can start utilizing provisioned capacity as soon as it becomes available. When scaling in, EMR proactively chooses idle nodes to reduce impact on running jobs.
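
A minimal sketch of a manual resize via boto3, assuming hypothetical cluster and instance group IDs:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Grow the core instance group of a running cluster to 4 nodes
# (cluster and instance group IDs are assumptions)
emr.modify_instance_groups(
    ClusterId="j-XXXXXXXXXXXXX",
    InstanceGroups=[{
        "InstanceGroupId": "ig-XXXXXXXXXXXX",
        "InstanceCount": 4,
    }],
)
```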

Deployment

  • Choose the instance size and type that best suits the processing needs for your cluster
    • Batch processing
    • Low-latency queries
    • Streaming data
    • Large data storage
  • In addition to the standard software and applications that are available for installation on your cluster, you can use bootstrap actions to install custom software.

EMR Notebooks

  • A serverless Jupyter notebook.
  • An EMR cluster is required to execute the code and queries within an EMR notebook, but the notebook is not locked to the cluster.
  • Runs Apache Spark.

Managing Clusters

  • A cluster step is a user-defined unit of processing, mapping roughly to one algorithm that manipulates the data.
  • When creating a cluster, typically you should select the Region where your data is located.
  • You can connect to the master node only while the cluster is running. When the cluster terminates, the EC2 instance acting as the master node is terminated and is no longer available.
  • By default, the ElasticMapReduce-master security group does not permit inbound SSH access.
  • You can set termination protection on a cluster.
  • You can adjust the number of EC2 instances available to an EMR cluster automatically or manually in response to workloads that have varying demands.
  • The EMR File System (EMRFS) is an implementation of HDFS that all EMR clusters use for reading and writing regular files from EMR directly to S3. It provides the convenience of storing persistent data in S3 for use with Hadoop while also providing features like consistent view and data encryption.
  • You can add tags to your clusters.
  • You can launch an EMR cluster with three master nodes to enable high availability for EMR applications. Amazon EMR automatically fails over to a standby master node if the primary master node fails or if critical processes crash (see High Availability below).

High Availability

  • You can launch an EMR cluster with three master nodes and support high availability for HBase clusters on EMR. Amazon EMR automatically fails over to a standby master node if the primary master node fails or if critical processes such as Resource Manager or Name Node crash. 
  • In the event of a failover, Amazon EMR automatically replaces the failed master node with a new master node that has the same configuration and bootstrap actions.

Monitoring

  • EMR integrates with CloudTrail to log information about requests made by or on behalf of your AWS account.
  • You can monitor and interact with your cluster by forming a secure connection between your remote computer and the master node by using SSH.
  • EMR provides the ability to archive log files in S3 so you can store logs and troubleshoot issues even after your cluster terminates.
  • EMR also provides an optional debugging tool.
  • EMR integrates with CloudWatch to track performance metrics for the cluster and jobs within the cluster.

Security

  • EMR integrates with IAM to manage permissions. You define permissions using IAM policies, which you attach to IAM users or IAM groups. The permissions that you define in the policy determine the actions that those users or members of the group can perform and the resources that they can access.
  • EMR uses IAM roles for the EMR service itself and the EC2 instance profile for the instances. These roles grant permissions for the service and instances to access other AWS services on your behalf. There is a default role for the EMR service and a default role for the EC2 instance profile.
  • EMR uses security groups to control inbound and outbound traffic to your EC2 instances. When you launch your cluster, EMR uses a security group for your master instance and a security group to be shared by your core/task instances.
  • EMR supports optional S3 server-side and client-side encryption with EMRFS to help protect the data that you store in S3.
  • EMR supports launching clusters in a VPC.
  • EMR release version 5.10.0 and later supports Kerberos, which is a network authentication protocol.

Pricing

  • You pay a per-second rate for each node you use, with a one-minute minimum.
  • The EMR price is in addition to the EC2 price (the price for the underlying servers) and EBS price (if attaching EBS volumes).

Amazon Elasticsearch (Amazon ES)

 

  • Amazon ES lets you search, analyze, and visualize your data in real-time. This service manages the capacity, scaling, patching, and administration of your Elasticsearch clusters for you, while still giving you direct access to the Elasticsearch APIs.
  • The service offers open-source Elasticsearch APIs, managed Kibana, and integrations with Logstash and other AWS services. This combination is often referred to as the ELK Stack.

Concepts

  • An Amazon ES domain is synonymous with an Elasticsearch cluster. Domains are clusters with the settings, instance types, instance counts, and storage resources that you specify.
  • You can create multiple Elasticsearch indices within the same domain. Elasticsearch automatically distributes the indices and any associated replicas between the instances allocated to the domain.
  • Amazon ES uses a blue/green deployment process when updating domains. Blue/green typically refers to the practice of running two production environments, one live and one idle, and switching the two as you make software changes.
  • When new service software becomes available, you can request an update to your domain and benefit from new features immediately. Some updates are required, and others are optional.
  • If you take no action on required updates, AWS updates the service software automatically after a certain timeframe (typically two weeks).
  • The alerting feature notifies you when data from one or more Elasticsearch indices meet certain conditions, such as receiving HTTP 503 errors.
  • SQL support enables you to query your domain using familiar SQL syntax without compromising on Elasticsearch’s full-text search and scoring capabilities. With SQL support, you can query your data using aggregations, group by, and where clauses to investigate your data.
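
Creating a domain is a single API call. The sketch below assumes a hypothetical domain name, a small instance type, and EBS storage; it is illustrative, not a recommended configuration.

```python
import boto3

es = boto3.client("es", region_name="us-east-1")

# Create a small two-node domain (all values below are assumptions)
domain = es.create_elasticsearch_domain(
    DomainName="my-domain",
    ElasticsearchVersion="7.10",
    ElasticsearchClusterConfig={
        "InstanceType": "t3.small.elasticsearch",
        "InstanceCount": 2,
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 10},
)
print(domain["DomainStatus"]["ARN"])
```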

Storage

  • You can choose between local on-instance storage (up to 3 PB) or Amazon EBS volumes to store your Elasticsearch indices.
  • You can build data durability for your Amazon Elasticsearch domain through automated and manual snapshots. By default, the service will automatically create daily snapshots of each domain and retain them for 14 days. The automated snapshots are stored free of charge in Amazon S3, while the manual snapshots will incur standard Amazon S3 usage charges.
  • Snapshots back up a cluster’s data and state, including cluster settings, node information, index settings, and shard allocation.
  • You cannot use automated snapshots to migrate to new domains. For migrations, you must use manual snapshots stored in your S3 bucket.

Data Ingestion

  • Easily ingest structured and unstructured data into your Amazon Elasticsearch domain with Logstash, an open-source data pipeline that helps you process logs and other event data.
  • You can also ingest data into your Amazon Elasticsearch domain using Amazon Kinesis Firehose, AWS IoT, or Amazon CloudWatch Logs.
  • You can get faster and better insights into your data using Kibana, an open-source analytics and visualization platform. Kibana is automatically deployed with your Amazon Elasticsearch Service domain.
  • You can load streaming data from the following sources using AWS Lambda event handlers:
    • Amazon S3
    • Amazon Kinesis Data Streams and Data Firehose
    • Amazon DynamoDB
    • Amazon CloudWatch
    • AWS IoT
  • Amazon ES exposes three Elasticsearch logs through CloudWatch Logs:
    • error logs
    • search slow logs – These logs help fine tune the performance of any kind of search operation on Elasticsearch.
    • index slow logs – These logs provide insights into the indexing process and can be used to fine-tune the index setup.
  • Indexing
    • Before you can search data, you must index it. Indexing is the method by which search engines organize data for fast retrieval.
    • In Elasticsearch, the basic unit of data is a JSON document.
    • Within an index, Elasticsearch organizes documents into types (arbitrary data categories that you define) and identifies them using a unique ID.
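
As a sketch of indexing and then searching a JSON document over the domain’s REST API, signed with SigV4: the endpoint, index name, and document below are assumptions, and the requests and requests-aws4auth packages must be installed.

```python
import boto3
import requests
from requests_aws4auth import AWS4Auth

# Hypothetical domain endpoint and index
host = "https://search-my-domain-abc123.us-east-1.es.amazonaws.com"
creds = boto3.Session().get_credentials()
auth = AWS4Auth(creds.access_key, creds.secret_key, "us-east-1", "es",
                session_token=creds.token)

# Index a JSON document (the basic unit of data) under a unique ID
doc = {"title": "Moneyball", "year": 2011}
r = requests.put(host + "/movies/_doc/1", auth=auth, json=doc)
print(r.status_code)

# Simple full-text search against the same index
r = requests.get(host + "/movies/_search", auth=auth, params={"q": "moneyball"})
print(r.json()["hits"])
```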

Kibana and Logstash

  • Kibana is a popular open source visualization tool designed to work with Elasticsearch.
  • The URL is elasticsearch-domain-endpoint/_plugin/kibana/.
  • You can configure your own Kibana instance in place of the default one provided by the service.
  • Amazon ES uses Amazon Cognito to offer username and password protection for Kibana. (Optional feature)
  • Logstash provides a convenient way to use the bulk API to upload data into your Amazon ES domain with the S3 plugin. The service also supports all other standard Logstash input plugins that are provided by Elasticsearch.
  • Amazon ES also supports two Logstash output plugins:
    • standard Elasticsearch plugin
    • logstash-output-amazon-es plugin, which signs and exports Logstash events to Amazon ES.

Security

  • Amazon ES is HIPAA eligible and compliant with PCI DSS, SOC, and ISO standards.
  • You can securely connect your applications to your managed Elasticsearch environment from your VPC or via the public Internet, configuring network access using VPC security groups or IP-based access policies.
  • Securely authenticate your users and control access using Amazon Cognito and AWS IAM.
  • Has built-in encryption of data-at-rest and in-transit to protect your data both when it is stored in your domain or in automated snapshots, and when it is transferred between nodes in your domain.
  • You can create three types of policies to control access to domains:
    • Resource-based Policies – attached to domains. These policies specify which actions a principal can perform on the domain’s subresources, which include Elasticsearch indices and APIs.
    • Identity-based policies – attached to IAM users or roles.
    • IP-based Policies – restrict access to a domain to one or more IP addresses or CIDR blocks.
  • Placing an Amazon ES domain within a VPC enables secure communication between Amazon ES and other services within the VPC without the need for an internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud.

High Availability

  • You can deploy your Amazon ES instances across multiple AZs (up to three). If you enable replicas for your indexes, the shards will automatically be distributed such that you have cross-zone replication.
  • If one or more instances in an AZ are unreachable or not functional, Amazon ES automatically tries to bring up new instances in the same AZ to replace the affected instances.
  • When enabling Multi-AZ, you should create at least one replica for each index in your cluster. Without replicas, Amazon ES can’t distribute copies of your data to other Availability Zones.
  • Even if you select two Availability Zones when configuring your domain, Amazon ES automatically distributes dedicated master nodes across three Availability Zones. This distribution helps prevent cluster downtime if a zone experiences a service disruption. It also assists in electing a new master node through a quorum between the two remaining nodes.
  • Amazon Elasticsearch Service has increased its automated snapshot frequency from daily to hourly; snapshots are retained for 14 days at no extra charge.

Limitations

  • You can either launch your domain within a VPC or use a public endpoint, but not both.
  • You cannot switch your domain from a VPC to a public endpoint and vice versa.
  • You can’t launch your domain within a VPC that uses dedicated tenancy.
  • You cannot move your domain to a different VPC from the one in which it was initially launched.
  • To access the default installation of Kibana for a domain that resides within a VPC, users must have access to the VPC.

Pricing

  • You pay for each hour of use of an EC2 instance and for the cumulative size of any EBS storage volumes attached to your instances.
  • You can use Reserved Instances to reduce long term cost on your EC2 instances.

Use Cases

  • Log Analytics
  • Real-Time Application Monitoring
  • Security Analytics
  • Full Text Search
  • Clickstream Analytics

Amazon CloudSearch

 

  • A fully-managed service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution for your website or application.

Features

  • You can use CloudSearch to index and search both structured data and plain text.
  • Full text search with language-specific text processing
  • Boolean search
  • Prefix searches
  • Range searches
  • Term boosting
  • Faceting
  • Highlighting
  • Autocomplete Suggestions
  • You can get search results in JSON or XML, sort and filter results based on field values, and sort results alphabetically, numerically, or according to custom expressions.
  • CloudSearch can scale to accommodate the amount of data uploaded to the domain and the volume and complexity of search requests.
  • You can integrate CloudSearch with API Gateway.

To access the CloudSearch search and document services, you use separate domain-specific endpoints:

  • http://doc-domainname-domainid.us-east-1.cloudsearch.amazonaws.com – a domain’s document service endpoint, used to upload documents.
  • http://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com – a domain’s search endpoint, used to submit search requests.
  • CloudSearch supports authentication using AWS Signature Version 4.
  • To search your data with CloudSearch, the first thing you need to do is create a search domain. If you have multiple collections of data that you want to make searchable, you can create multiple search domains.
  • A search partition is the portion of your data that fits on a single search instance. A search domain can have one or more search partitions, and the number of search partitions can change as your documents are indexed.
  • A facet is an index field that represents a category that you want to use to refine and filter search results. When you submit search requests to CloudSearch, you can request facet information to find out how many hits share the same value in a facet.
  • During indexing, CloudSearch processes the contents of text and text-array fields according to the language-specific analysis scheme configured for the field. An analysis scheme controls how the text is normalized, tokenized, and stemmed, and specifies any stopwords or synonyms to take into account during indexing. CloudSearch provides default analysis schemes for each supported language.
  • You can customize how search results are ranked by defining expressions that calculate custom values for every document that matches your search criteria.
  • To make your data searchable, you need to format your data in JSON or XML.  Each item that you want to be able to receive as a search result is represented as a document. Every document has a unique document ID and one or more fields that contain the data that you want to search and return in results.
  • You can specify a variety of options to constrain your search, request facet information, control ranking, and specify what you want to be returned in the results. You can get search results in either JSON or XML. By default, CloudSearch returns results in JSON.
  • By default, CloudSearch returns search results ranked according to the hits’ relevance scores (_score). A document’s _score indicates how relevant a particular search hit is to the search request.
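
A boto3 sketch showing both domain-specific endpoints in action: uploading a document batch to the document endpoint and querying the search endpoint. The endpoint URLs, domain, and fields are assumptions.

```python
import json
import boto3

# Document service endpoint: upload a batch of documents
# (endpoint URLs and fields below are assumptions)
docs = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://doc-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com",
)
batch = [{"type": "add", "id": "1",
          "fields": {"title": "Moneyball", "year": 2011}}]
docs.upload_documents(documents=json.dumps(batch),
                      contentType="application/json")

# Search endpoint: submit a search request
search = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://search-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com",
)
results = search.search(query="moneyball")
print(results["hits"]["found"])
```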

Scaling

  • Scaling for Traffic (domain depth) – When a search instance nears its maximum load, CloudSearch deploys a duplicate search instance to provide additional processing power. When traffic drops, it removes search instances to minimize costs.
  • Scaling for Data (domain width) – When the amount of data you add to your domain exceeds the capacity of the initial search instance type, CloudSearch scales your search domain to a larger search instance type. After a domain exceeds the capacity of the largest search instance type, CloudSearch partitions the search index across multiple search instances. When the volume of data in your domain shrinks, it scales down your domain to fewer search instances or a smaller search instance type to minimize costs.

Fault Tolerance

  • You can expand a CloudSearch domain to an additional AZ in the same region to increase fault tolerance in the event of a service disruption.
  • When you turn on the Multi-AZ option, CloudSearch provisions and maintains extra instances for your search domain in a second AZ to ensure high availability. The maximum number of AZs a domain can be deployed in is two.

Monitoring

  • You can retrieve information about each of your search domains through CloudSearch Console, AWS CLI or SDK.
  • Monitor a domain using CloudWatch.
  • Log API calls made to CloudSearch using CloudTrail.

Pricing

  • Customers are billed according to their monthly usage across the following dimensions:
    • Search instances
    • Document batch uploads
    • IndexDocuments requests
    • Data transfer

Amazon Athena

 

  • An interactive query service that makes it easy to analyze data directly in S3 using standard SQL.

Features

  • Athena is serverless.
  • Has a built-in query editor.
  • Uses Presto, an open source, distributed SQL query engine optimized for low latency, ad hoc analysis of data.
  • Athena supports a wide variety of data formats such as CSV, JSON, ORC, Avro, or Parquet.
  • Athena automatically executes queries in parallel, so that you get query results in seconds, even on large datasets.
  • Athena uses Amazon S3 as its underlying data store, making your data highly available and durable.
  • Athena integrates with Amazon QuickSight for easy data visualization.
  • Athena integrates out-of-the-box with AWS Glue.

Athena uses a managed Data Catalog to store information and schemas about the databases and tables that you create for your data stored in S3.
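
A minimal boto3 sketch of running a query against a catalog table and reading the results; the database, table, and results bucket are assumptions.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run a SQL query against a catalog table (names and bucket are assumptions)
qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS n FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "logs"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state
while True:
    state = athena.get_query_execution(
        QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```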

Partitioning

  • By partitioning your data, you can restrict the amount of data scanned by each query, thus improving performance and reducing cost.
  • Athena leverages Hive for partitioning data.
  • You can partition your data by any key.
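
For example, a Hive-style partitioned table can be declared, and its partition metadata loaded, with ordinary Athena statements. A sketch under assumed names (a logs database, an access_logs table, a dt partition key, and my-bucket):

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")
out = {"OutputLocation": "s3://my-bucket/athena-results/"}

# Declare a table partitioned by date (all names are assumptions)
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS logs.access_logs (
    request_url string,
    status int
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://my-bucket/access-logs/'
"""
athena.start_query_execution(QueryString=ddl, ResultConfiguration=out)

# After writing data under .../dt=2022-03-21/, load the partition metadata
athena.start_query_execution(QueryString="MSCK REPAIR TABLE logs.access_logs",
                             ResultConfiguration=out)
```

Queries that filter on dt then scan only the matching partitions, which is where the performance and cost savings come from.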

Queries

  • You can query geospatial data.
  • You can query different kinds of logs as your datasets.
  • Athena stores query results in S3.
  • Athena retains query history for 45 days.
  • Athena does not support user-defined functions, INSERT INTO statements, or stored procedures.
  • Athena supports both simple data types such as INTEGER, DOUBLE, VARCHAR and complex data types such as MAPS, ARRAY and STRUCT.
  • Athena supports querying data in Amazon S3 Requester Pays buckets.

Security

  • Control access to your data by using IAM policies, access control lists, and S3 bucket policies.
  • If the files in the target S3 bucket are encrypted, you can perform queries on the encrypted data itself.

Pricing

  • You pay only for the queries that you run. You are charged based on the amount of data scanned by each query.
  • You are not charged for failed queries.
  • You can get significant cost savings and performance gains by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query.

AWS Trusted Advisor

 

  • Trusted Advisor analyzes your AWS environment and provides best practice recommendations in five categories:
    • Cost Optimization
    • Performance
    • Security
    • Fault Tolerance
    • Service Limits
  • Access to the seven core Trusted Advisor checks is available to all AWS users.
  • Access to the full set of Trusted Advisor checks is available with Business and Enterprise Support plans.
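
Checks can also be listed programmatically through the AWS Support API, which backs Trusted Advisor. A minimal sketch:

```python
import boto3

# The Support API is served from us-east-1 and requires a
# Business or Enterprise Support plan
support = boto3.client("support", region_name="us-east-1")

checks = support.describe_trusted_advisor_checks(language="en")["checks"]
for check in checks:
    print(f'{check["category"]}: {check["name"]}')
```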


AWS Systems Manager

 

  • Allows you to centralize operational data from multiple AWS services and automate tasks across your AWS resources.

Features

  • Create logical groups of resources such as applications, different layers of an application stack, or production versus development environments.
  • You can select a resource group and view its recent API activity, resource configuration changes, related notifications, operational alerts, software inventory, and patch compliance status.
  • Collects information about your instances and the software installed on them.
  • Allows you to safely automate common and repetitive IT operations and management tasks across AWS resources.
  • Provides a browser-based interactive shell and CLI for managing Windows and Linux EC2 instances, without the need to open inbound ports, manage SSH keys, or use bastion hosts. Administrators can grant and revoke access to instances through a central location by using IAM policies.
  • Helps ensure that your software is up-to-date and meets your compliance policies.
  • Lets you schedule windows of time to run administrative and maintenance tasks across your instances.

SSM Agent is the tool that processes Systems Manager requests and configures your machine as specified in the request. SSM Agent must be installed on each instance you want to use with Systems Manager. On newer AMIs and instance types, SSM Agent is installed by default. On older versions, you must install it manually.
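
Once SSM Agent is running, instances can be managed without SSH. As a minimal sketch of the Run Command capability described below (the instance ID is an assumption):

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Run a shell command on a managed instance (instance ID is an assumption)
resp = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],
    DocumentName="AWS-RunShellScript",   # AWS-provided command document
    Parameters={"commands": ["uptime"]},
)
command_id = resp["Command"]["CommandId"]

# Fetch the per-instance result once the command completes
result = ssm.get_command_invocation(
    CommandId=command_id, InstanceId="i-0123456789abcdef0")
print(result["Status"], result.get("StandardOutputContent", ""))
```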


Capabilities

  • Automation
    • Allows you to safely automate common and repetitive IT operations and management tasks across AWS resources.
    • A step is defined as an initiated action performed in the Automation execution on a per-target basis. You can execute the entire Systems Manager automation document in one action or choose to execute one step at a time.
    • Concepts
        • Automation document – defines the Automation workflow.
        • Automation action – the Automation workflow includes one or more steps. Each step is associated with a particular action or plugin. The action determines the inputs, behavior, and outputs of the step.
        • Automation queue – if you attempt to run more than 25 Automations simultaneously, Systems Manager adds the additional executions to a queue and displays a status of Pending. When an Automation reaches a terminal state, the first execution in the queue starts.
    • You can schedule Systems Manager automation document execution.
  • Resource Groups
    • A collection of AWS resources that are all in the same AWS region, and that match criteria provided in a query.
    • Use Systems Manager tools such as Automation to simplify management tasks on your groups of resources. You can also use groups as the basis for viewing monitoring and configuration insights in Systems Manager.
  • Built-in Insights
    • Show detailed information about a single, selected resource group.
    • Includes recent API calls through CloudTrail, recent configuration changes through Config, instance software inventory listings, instance patch compliance views, and instance configuration compliance views.
  • Systems Manager Activation
    • Enable hybrid and cross-cloud management. You can register any server, whether physical or virtual, to be managed by Systems Manager.
  • Inventory Manager
    • Automates the process of collecting software inventory from managed instances.
    • You specify the type of metadata to collect, the instances from where the metadata should be collected, and a schedule for metadata collection.
  • Configuration Compliance
    • Scans your fleet of managed instances for patch compliance and configuration inconsistencies.
    • View compliance history and change tracking for Patch Manager patching data and State Manager associations by using AWS Config.
    • Customize Systems Manager Compliance to create your own compliance types.
  • Run Command
    • Remotely and securely manage the configuration of your managed instances at scale.
    • Managed Instances – any EC2 instance or on-premises server or virtual machine in your hybrid environment that is configured for Systems Manager.
  • Session Manager
    • Manage your EC2 instances through an interactive one-click browser-based shell or through the AWS CLI.
    • Makes it easy to comply with corporate policies that require controlled access to instances, strict security practices, and fully auditable logs with instance access details, while still providing end users with simple one-click cross-platform access to your Amazon EC2 instances.
    • You can use AWS Systems Manager Session Manager to tunnel SSH (Secure Shell) and SCP (Secure Copy) traffic between a client and a server.
  • Distributor
    • Lets you package your own software or find AWS-provided agent software packages to install on Systems Manager managed instances.
    • After you create a package in Distributor, which creates a Systems Manager document, you can install the package in one of the following ways.
        • One time by using Systems Manager Run Command.
        • On a schedule by using Systems Manager State Manager.
  • Patch Manager
    • Automate the process of patching your managed instances.
    • Enables you to scan instances for missing patches and apply missing patches individually or to large groups of instances by using EC2 instance tags.
    • For security patches, Patch Manager uses patch baselines that include rules for auto-approving patches within days of their release, as well as a list of approved and rejected patches.
    • You can use AWS Systems Manager Patch Manager to select and apply Microsoft application patches automatically across your Amazon EC2 or on-premises instances.
    • AWS Systems Manager Patch Manager includes common vulnerability identifiers (CVE ID). CVE IDs can help you identify security vulnerabilities within your fleet and recommend patches.
    • You can configure actions to be performed on a managed instance before and after installing patches.
  • Maintenance Window
    • Set up recurring schedules for managed instances to execute administrative tasks like installing patches and updates without interrupting business-critical operations.
    • Supports running four types of tasks:
        • Systems Manager Run Command commands
        • Systems Manager Automation workflows
        • AWS Lambda functions
        • AWS Step Functions tasks
  • Systems Manager Document (SSM Document)
    • Defines the actions that Systems Manager performs.
    • Types of SSM Documents

| Type | Use with | Details |
| --- | --- | --- |
| Command document | Run Command, State Manager | Run Command uses command documents to execute commands. State Manager uses command documents to apply a configuration. These actions can be run on one or more targets at any point during the lifecycle of an instance. |
| Policy document | State Manager | Policy documents enforce a policy on your targets. If the policy document is removed, the policy action no longer happens. |
| Automation document | Automation | Use automation documents when performing common maintenance and deployment tasks such as creating or updating an AMI. |
| Package document | Distributor | In Distributor, a package is represented by a Systems Manager document. A package document includes attached ZIP archive files that contain software or assets to install on managed instances. Creating a package in Distributor creates the package document. |

    • Can be in JSON or YAML.
    • You can create and save different versions of documents. You can then specify a default version for each document.
    • If you want to customize the steps and actions in a document, you can create your own.
    • You can tag your documents to help you quickly identify one or more documents based on the tags you’ve assigned to them.
  • State Manager
    • A service that automates the process of keeping your EC2 and hybrid infrastructure in a state that you define.
    • A State Manager association is a configuration that is assigned to your managed instances. The configuration defines the state that you want to maintain on your instances. The association also specifies actions to take when applying the configuration.
  • Parameter Store
    • Provides secure, hierarchical storage for configuration data and secrets management.
    • You can store values as plain text or encrypted data with SecureString.
    • Parameters work with Systems Manager capabilities such as Run Command, State Manager, and Automation (see the sketch after this list).
  • OpsCenter
    • OpsCenter helps you view, investigate, and resolve operational issues related to your environment from a central location.
    • OpsCenter complements existing case management systems by enabling integrations via Amazon Simple Notification Service (SNS) and public AWS SDKs. By aggregating information from AWS Config, AWS CloudTrail logs, resource descriptions, and Amazon CloudWatch Events, OpsCenter helps you reduce the mean time to resolution (MTTR) of incidents, alarms, and operational tasks.
  • Change Manager
    • An enterprise change management framework for requesting, approving, implementing, and reporting on operational changes to your application configuration and infrastructure.
    • From a single delegated administrator account, if you use AWS Organizations, you can manage changes across multiple AWS accounts and across AWS Regions. Alternatively, using a local account, you can manage changes for a single AWS account.
    • Can be used for both AWS and on-premises resources.
    • For each change template, you can add up to five levels of approvers. When it’s time to implement an approved change, Change Manager runs the Automation runbook that is specified in the associated change request.
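
As referenced under Parameter Store above, a minimal sketch of storing and reading back a SecureString parameter (the name and value are assumptions):

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Store a secret as an encrypted SecureString (name and value are assumptions)
ssm.put_parameter(Name="/app/db/password", Value="s3cret",
                  Type="SecureString", Overwrite=True)

# Read it back decrypted
param = ssm.get_parameter(Name="/app/db/password", WithDecryption=True)
print(param["Parameter"]["Value"])
```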

Monitoring

  • SSM Agent writes information about executions, scheduled actions, errors, and health statuses to log files on each instance. For more efficient instance monitoring, you can configure either SSM Agent itself or the CloudWatch Agent to send this log data to CloudWatch Logs.
  • Using CloudWatch Logs, you can monitor log data in real-time, search and filter log data by creating one or more metric filters, and archive and retrieve historical data when you need it.
  • Log Systems Manager API calls with CloudTrail.

Security

  • Systems Manager is linked directly to IAM for access controls.

Pricing

  • For your own packages, you pay only for what you use. Upon transferring a package into Distributor, you will be charged based on the size and duration of storage for that package, the number of Get and Describe API calls made, and the amount of out-of-Region and on-premises data transfer out of Distributor for those packages.
  • You are charged based on the number and type of Automation steps.