- Amazon ES lets you search, analyze, and visualize your data in real-time. This service manages the capacity, scaling, patching, and administration of your Elasticsearch clusters for you, while still giving you direct access to the Elasticsearch APIs.
- The service offers open-source Elasticsearch APIs, managed Kibana, and integrations with Logstash and other AWS services. The combination of Elasticsearch, Logstash, and Kibana is often referred to as the ELK Stack.
Concepts
- An Amazon ES domain is synonymous with an Elasticsearch cluster. Domains are clusters with the settings, instance types, instance counts, and storage resources that you specify.
- You can create multiple Elasticsearch indices within the same domain. Elasticsearch automatically distributes the indices and any associated replicas between the instances allocated to the domain.
- Amazon ES uses a blue/green deployment process when updating domains. Blue/green typically refers to the practice of running two production environments, one live and one idle, and switching the two as you make software changes.
- When new service software becomes available, you can request an update to your domain and benefit from new features immediately. Some updates are required, and others are optional.
- If you take no action on required updates, AWS updates the service software automatically after a certain timeframe (typically two weeks).
- The alerting feature notifies you when data from one or more Elasticsearch indices meet certain conditions, such as receiving HTTP 503 errors.
- New SQL support enables you to query your domain using the familiar SQL syntax without compromising on Elasticsearch’s full-text search and scoring capabilities. With SQL support, you can query your data using aggregations, group by, and where clauses to investigate your data.
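As a rough illustration of the SQL support described above, the sketch below builds the URL and JSON body for a SQL query. The `_opendistro/_sql` path is assumed from the Open Distro SQL plugin and may differ by service version. The endpoint, the `logs` index, and the `status` field are hypothetical. The request is only constructed here, not sent, since a real call would also need to be authorized for your domain.

```python
import json


def build_sql_request(endpoint, sql):
    """Return (url, body) for a SQL query against a domain endpoint."""
    # Assumption: the SQL plugin is exposed at _opendistro/_sql.
    url = "https://" + endpoint.rstrip("/") + "/_opendistro/_sql"
    # The SQL statement is wrapped in a small JSON document.
    body = json.dumps({"query": sql}).encode("utf-8")
    return url, body


# Hypothetical endpoint, index name, and field name.
url, body = build_sql_request(
    "search-example.us-east-1.es.amazonaws.com",
    "SELECT status, COUNT(*) FROM logs WHERE status >= 500 GROUP BY status",
)
print(url)
print(body.decode("utf-8"))
```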
Storage
- You can choose between local on-instance storage (up to 3 PB) or Amazon EBS volumes to store your Elasticsearch indices.
- You can build data durability for your Amazon Elasticsearch domain through automated and manual snapshots. By default, the service will automatically create daily snapshots of each domain and retain them for 14 days. The automated snapshots are stored free of charge in Amazon S3, while the manual snapshots will incur standard Amazon S3 usage charges.
- Snapshots back up a cluster’s data and state, including cluster settings, node information, index settings, and shard allocation.
- You cannot use automated snapshots to migrate to new domains. For migrations, you must use manual snapshots stored in your S3 bucket.
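A manual snapshot workflow like the one above usually starts by registering an S3 bucket as a snapshot repository. The sketch below only builds the request paths and body for the standard Elasticsearch `_snapshot` API. The bucket, region, role ARN, repository name, and snapshot name are all hypothetical placeholders, and on Amazon ES the registration request typically has to be signed with IAM credentials, which is not shown.

```python
import json


def repository_request(repo_name, bucket, region, role_arn):
    """Return (path, body) for registering an S3 snapshot repository."""
    path = "_snapshot/" + repo_name
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # S3 bucket that will hold the snapshots
            "region": region,      # region of that bucket
            "role_arn": role_arn,  # IAM role the service assumes to write to S3
        },
    }
    return path, json.dumps(body)


def snapshot_path(repo_name, snapshot_name):
    """Path used to take (PUT) a manual snapshot in the repository."""
    return "_snapshot/" + repo_name + "/" + snapshot_name


# Hypothetical names and ARN, for illustration only.
path, body = repository_request(
    "my-manual-repo",
    "my-snapshot-bucket",
    "us-east-1",
    "arn:aws:iam::123456789012:role/MySnapshotRole",
)
print("PUT", path)
print(body)
print("PUT", snapshot_path("my-manual-repo", "snapshot-1"))
```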
Data Ingestion
- Easily ingest structured and unstructured data into your Amazon Elasticsearch domain with Logstash, an open-source data pipeline that helps you process logs and other event data.
- You can also ingest data into your Amazon Elasticsearch domain using Amazon Kinesis Firehose, AWS IoT, or Amazon CloudWatch Logs.
- You can get faster and better insights into your data using Kibana, an open-source analytics and visualization platform. Kibana is automatically deployed with your Amazon Elasticsearch Service domain.
- You can load streaming data from the following sources using AWS Lambda event handlers:
- Amazon S3
- Amazon Kinesis Data Streams and Data Firehose
- Amazon DynamoDB
- Amazon CloudWatch
- AWS IoT
- Amazon ES exposes three Elasticsearch logs through CloudWatch Logs:
- error logs
- search slow logs – These logs help fine tune the performance of any kind of search operation on Elasticsearch.
- index slow logs – These logs provide insights into the indexing process and can be used to fine-tune the index setup.
- Indexing
- Before you can search data, you must index it. Indexing is the method by which search engines organize data for fast retrieval.
- In Elasticsearch, the basic unit of data is a JSON document.
- Within an index, Elasticsearch organizes documents into types (arbitrary data categories that you define) and identifies them using a unique ID.
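To make the indexing notes above more concrete, here is a minimal sketch that turns a few JSON documents, each with its own ID, into a request body for the Elasticsearch bulk API. The index name and documents are made up. The `_type` field is left out, which matches newer Elasticsearch versions; older versions that still use types would expect it in the action line.

```python
import json


def to_bulk_body(index, docs):
    """Build a newline-delimited bulk body from a {doc_id: document} mapping."""
    lines = []
    for doc_id, doc in docs.items():
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the JSON document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index name and documents.
bulk_body = to_bulk_body(
    "movies",
    {
        "1": {"title": "Example One", "year": 2001},
        "2": {"title": "Example Two", "year": 2002},
    },
)
print(bulk_body)
```

The resulting string would be sent as the body of a `POST _bulk` request.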
Kibana and Logstash
- Kibana is a popular open-source visualization tool designed to work with Elasticsearch.
- The URL is elasticsearch-domain-endpoint/_plugin/kibana/.
- Instead of using the default Kibana installation, you can configure your own Kibana instance.
- Amazon ES can optionally use Amazon Cognito to offer username and password protection for Kibana.
- Logstash provides a convenient way to use the bulk API to upload data into your Amazon ES domain with the S3 plugin. The service also supports all other standard Logstash input plugins that are provided by Elasticsearch.
- Amazon ES also supports two Logstash output plugins:
- standard Elasticsearch plugin
- logstash-output-amazon-es plugin, which signs and exports Logstash events to Amazon ES.
Security
- Amazon ES is HIPAA eligible and compliant with PCI DSS, SOC, and ISO standards.
- You can securely connect your applications to your managed Elasticsearch environment from your VPC or via the public Internet, configuring network access using VPC security groups or IP-based access policies.
- Securely authenticate your users and control access using Amazon Cognito and AWS IAM.
- The service has built-in encryption of data at rest and in transit, protecting your data both when it is stored in your domain or in automated snapshots and when it is transferred between nodes in your domain.
- You can create three types of policies to control access to domains:
- Resource-based Policies – attached to domains. These policies specify which actions a principal can perform on the domain’s subresources, which include Elasticsearch indices and APIs.
- Identity-based policies – attached to IAM users or roles.
- IP-based Policies – restrict access to a domain to one or more IP addresses or CIDR blocks.
- Placing an Amazon ES domain within a VPC enables secure communication between Amazon ES and other services within the VPC without the need for an internet gateway, NAT device, or VPN connection. All traffic remains securely within the AWS Cloud.
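As an illustration of the IP-based policies mentioned above, the sketch below builds a policy document that allows HTTP access to a domain only from given CIDR blocks. The domain ARN and the CIDR range are placeholders, and the statement shape (an `IpAddress` condition on `aws:SourceIp`) is an assumption based on the usual IAM policy format, so check it against the current AWS documentation before use.

```python
import json


def ip_based_policy(domain_arn, cidrs):
    """Build an access policy limiting a domain to the given IPs or CIDR blocks."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                # All HTTP methods against the domain's subresources.
                "Action": "es:ESHttp*",
                "Resource": domain_arn + "/*",
                # Requests are allowed only from these source addresses.
                "Condition": {"IpAddress": {"aws:SourceIp": list(cidrs)}},
            }
        ],
    }


# Placeholder account ID, domain name, and documentation-only IP range.
policy = ip_based_policy(
    "arn:aws:es:us-east-1:123456789012:domain/my-domain",
    ["192.0.2.0/24"],
)
print(json.dumps(policy, indent=2))
```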
High Availability
- You can deploy your Amazon ES instances across multiple AZs (up to three). If you enable replicas for your indices, the shards are automatically distributed so that you have cross-zone replication.
- If one or more instances in an AZ are unreachable or not functional, Amazon ES automatically tries to bring up new instances in the same AZ to replace the affected instances.
- When enabling Multi-AZ, you should create at least one replica for each index in your cluster. Without replicas, Amazon ES can’t distribute copies of your data to other Availability Zones.
- Even if you select two Availability Zones when configuring your domain, Amazon ES automatically distributes dedicated master nodes across three Availability Zones. This distribution helps prevent cluster downtime if a zone experiences a service disruption. It also assists in electing a new master node through a quorum between the two remaining nodes.
- Amazon Elasticsearch Service has increased the frequency of automated snapshots from daily to hourly. These snapshots are retained for 14 days at no extra charge.
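The point above about dedicated master nodes and quorum can be checked with a little arithmetic. The sketch below assumes the common majority rule (more than half of the master-eligible nodes must be reachable) and is not taken from the service's implementation.

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(total_masters, reachable_masters):
    """True if the reachable master nodes still form a majority."""
    return reachable_masters >= quorum(total_masters)


# Three dedicated master nodes, one per Availability Zone.
total = 3
# Losing one zone leaves two nodes, which is still a majority.
print(can_elect_master(total, 2))
# Losing two zones leaves one node, which is not.
print(can_elect_master(total, 1))
```

This is why spreading three dedicated master nodes across three zones tolerates the loss of any single zone, whereas placing two of the three in one zone would not.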
Limitations
- You can either launch your domain within a VPC or use a public endpoint, but not both.
- You cannot switch your domain from a VPC to a public endpoint and vice versa.
- You can’t launch your domain within a VPC that uses dedicated tenancy.
- You cannot move your domain to a VPC other than the one in which it was initially launched.
- To access the default installation of Kibana for a domain that resides within a VPC, users must have access to the VPC.
Pricing
- You pay for each hour of use of an EC2 instance and for the cumulative size of any EBS storage volumes attached to your instances.
- You can use Reserved Instances to reduce long term cost on your EC2 instances.
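The pricing model above (instance hours plus attached EBS storage) can be sketched as a simple estimate. All rates below are hypothetical placeholders, not actual AWS prices, and the sketch assumes EBS storage is billed per GB-month. Other charges, such as data transfer, are ignored.

```python
def monthly_cost(instance_count, hourly_rate, hours,
                 ebs_gb_per_instance, ebs_rate_per_gb_month):
    """Rough monthly estimate: instance hours plus attached EBS storage."""
    instance_cost = instance_count * hourly_rate * hours
    storage_cost = instance_count * ebs_gb_per_instance * ebs_rate_per_gb_month
    return instance_cost + storage_cost


# Hypothetical figures: 3 instances at 0.10 per hour for 720 hours,
# each with 100 GB of EBS at 0.10 per GB-month.
estimate = monthly_cost(3, 0.10, 720, 100, 0.10)
print(round(estimate, 2))
```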
Use Cases
- Log Analytics
- Real-Time Application Monitoring
- Security Analytics
- Full Text Search
- Clickstream Analytics