Thursday, 4 July 2019

ELK Stack Tutorial: Learn Elasticsearch, Logstash, and Kibana

What is the ELK Stack?

The ELK Stack is a collection of three open-source products: Elasticsearch, Logstash, and Kibana. They are all developed, managed, and maintained by the company Elastic.
  • E stands for Elasticsearch: used for storing logs
  • L stands for Logstash: used for shipping and processing logs before they are stored
  • K stands for Kibana: a visualization tool (a web interface) which can be hosted through Nginx or Apache
The ELK Stack is designed to allow users to take data from any source, in any format, and to search, analyze, and visualize that data in real time.
ELK provides centralized logging that can be useful when attempting to identify problems with servers or applications. It allows you to search all your logs in a single place. It also helps to find issues that occur across multiple servers by correlating their logs during a specific time frame.
In this tutorial, you will learn:
  • What is the ELK Stack?
  • ELK Stack Architecture
  • What is Elasticsearch?
  • What is Logstash?
  • What is Kibana?
  • ELK vs. Splunk
  • Case Studies
  • Advantages and Disadvantages of ELK stack

ELK Stack Architecture

Here is the simple architecture of the ELK stack:
  • Logs: Server logs that need to be analyzed are identified
  • Logstash: Collects log and event data. It also parses and transforms the data
  • Elasticsearch: The transformed data from Logstash is stored, indexed, and searched
  • Kibana: Kibana uses the data in Elasticsearch to explore, visualize, and share it
However, one more component is needed for data collection, called Beats. This led Elastic to rename ELK as the Elastic Stack.
When dealing with very large amounts of data, you may need Kafka or RabbitMQ for buffering and resilience. For security, Nginx can be used.
Let's take a deep dive into each of these open-source products:

What is Elasticsearch?

Elasticsearch is a NoSQL database. It is based on the Lucene search engine, and it is built with RESTful APIs. It offers simple deployment, maximum reliability, and easy management. It also offers advanced queries to perform detailed analysis and stores all the data centrally. It is helpful for executing a quick search of the documents.
Elasticsearch also allows you to store, search, and analyze large volumes of data. It is mostly used as the underlying engine to power applications that have complex search requirements. It has been adopted in search engine platforms for modern web and mobile applications. Apart from quick search, the tool also offers complex analytics and many advanced features.
Features of Elasticsearch:
  • Open-source search server written in Java
  • Used to index any kind of heterogeneous data
  • Has REST API web-interface with JSON output
  • Full-Text Search
  • Near Real Time (NRT) search
  • Sharded, replicated, searchable JSON document store
  • Schema-free, REST & JSON based distributed document store
  • Multi-language & Geolocation support
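Full-text search in Lucene-based engines rests on an inverted index, which maps each term to the documents that contain it. The following is a minimal Python sketch of that idea, not Elasticsearch's actual implementation. The sample documents and the helper names `build_inverted_index` and `search` are invented for illustration, and real engines add analysis steps such as stemming and relevance scoring.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word in the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, keyed by id.
docs = {
    1: "disk full on server alpha",
    2: "login failed on server beta",
    3: "disk check passed",
}
index = build_inverted_index(docs)
print(sorted(search(index, "disk")))         # [1, 3]
print(sorted(search(index, "server disk")))  # [1]
```

Because the lookup goes from term to documents, a query only touches the entries for the words it contains rather than scanning every document.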
Advantages of Elasticsearch
  • Stores schema-less data and can also create a schema for your data
  • Manipulate your data record by record, or in bulk with the help of multi-document APIs
  • Filter and query your data for insights
  • Based on Apache Lucene and provides a RESTful API
  • Provides horizontal scalability, reliability, and multitenant capability for real-time indexing and faster search
  • Helps you to scale vertically and horizontally
Important terms used in Elasticsearch
Term | Usage
Cluster | A cluster is a collection of nodes that together hold data and provide joint indexing and search capabilities.
Node | A node is an Elasticsearch instance. It is created when an Elasticsearch instance starts.
Index | An index is a collection of documents that have similar characteristics, e.g., customer data or a product catalog. It is used when performing indexing, search, update, and delete operations. You can define as many indexes as you want in a single cluster.
Document | A document is the basic unit of information that can be indexed. It is expressed as JSON key-value pairs, e.g. '{"user": "nullcon"}'. Every document is associated with a type and a unique id.
Shard | Every index can be split into several shards to distribute its data. The shard is the atomic part of an index, which can be distributed over the cluster as you add more nodes.
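To make the relationship between documents and shards concrete, here is a small Python sketch of hash-based routing: each document id is hashed and the result, taken modulo the number of shards, selects a shard. This is an illustration under stated assumptions, not Elasticsearch's code. `zlib.crc32` is a stand-in hash chosen only because it is in the standard library, and the function name `shard_for`, the shard count, and the document ids are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Hypothetical index with 3 shards and a handful of document ids.
NUM_SHARDS = 3
shards = {n: [] for n in range(NUM_SHARDS)}
for doc_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    shards[shard_for(doc_id, NUM_SHARDS)].append(doc_id)

for shard, ids in shards.items():
    print(shard, ids)
```

The same id always maps to the same shard, which is what lets a lookup by id go straight to one shard. It also suggests why changing the shard count of an existing index would move documents around.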

What is Logstash?

Logstash is the data collection pipeline tool. It collects data inputs and feeds them into Elasticsearch. It gathers all types of data from different sources and makes it available for further use.
Logstash can unify data from disparate sources and normalize it before sending it to your desired destinations. It allows you to cleanse and democratize all your data for analytics and visualization use cases.
It consists of three components:
  • Input: receives logs and converts them into a machine-readable format
  • Filters: a set of conditions that perform a particular action on an event
  • Output: decides where a processed event or log is sent
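The three stages above can be sketched in plain Python as chained generators. This is only an illustration of the input, filter, and output flow, not Logstash configuration or its plugin API. The log format, the function names, the `parse_failure` tag, and the in-memory list used as the destination are all assumptions made for the example.

```python
import re

# Assumed log format for this example: "<timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def read_input(lines):
    """Input stage: yield raw lines, skipping blank ones."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def apply_filter(raw_lines):
    """Filter stage: parse each line into a structured event; drop DEBUG events."""
    for line in raw_lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            # Keep unparsed lines, but tag them so they can be found later.
            yield {"message": line, "tags": ["parse_failure"]}
            continue
        event = match.groupdict()
        if event["level"] != "DEBUG":
            yield event


def write_output(events, sink):
    """Output stage: send each event to a destination (here, a plain list)."""
    for event in events:
        sink.append(event)


raw = [
    "2019-07-04T10:00:00 INFO service started",
    "2019-07-04T10:00:01 DEBUG cache warmed",
    "",
    "2019-07-04T10:00:02 ERROR disk full",
]
stored = []
write_output(apply_filter(read_input(raw)), stored)
for event in stored:
    print(event)
```

In this run the blank line is skipped by the input stage and the DEBUG line is dropped by the filter, so two structured events reach the output.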
Features of Logstash
  • Events are passed through each phase using internal queues
  • Allows different inputs for your logs
  • Filtering/parsing for your logs
Advantages of Logstash
  • Centralizes data processing
  • It analyzes a large variety of structured/unstructured data and events
  • Offers plugins to connect with various types of input sources and platforms

What is Kibana?

Kibana is a data visualization tool which completes the ELK stack. It is used for visualizing Elasticsearch documents and helps developers gain quick insight into them. The Kibana dashboard offers various interactive diagrams, geospatial data, and graphs to visualize complex queries.
It can be used to search, view, and interact with data stored in Elasticsearch indices. Kibana helps you to perform advanced data analysis and visualize your data in a variety of tables, charts, and maps.
In Kibana there are different methods for performing searches on your data.
Here are the most common search types:
Search Type | Usage
Free text searches | Used for searching for a specific string
Field-level searches | Used for searching for a string within a specific field
Logical statements | Used to combine searches into a logical statement
Proximity searches | Used for searching for terms that appear within a specified distance of each other
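The four search types can be illustrated with a toy Python sketch over a few in-memory records. It does not use Kibana or Elasticsearch. The records, function names, and the choice to measure proximity in word positions are assumptions for the example. The comments show roughly how each search would be written in Lucene query syntax.

```python
def free_text(docs, term):
    """Free text: match the term in any field (roughly: server)."""
    term = term.lower()
    return [d["id"] for d in docs if any(term in str(v).lower() for v in d.values())]


def field_level(docs, field, term):
    """Field-level: match the term in one named field (roughly: status:500)."""
    term = term.lower()
    return [d["id"] for d in docs if term in str(d.get(field, "")).lower()]


def logical_and(ids_a, ids_b):
    """Logical statement: keep ids present in both results (roughly: a AND b)."""
    return [i for i in ids_a if i in ids_b]


def proximity(docs, field, first, second, max_distance):
    """Proximity: both words occur at most max_distance word positions apart."""
    hits = []
    for d in docs:
        words = str(d.get(field, "")).lower().split()
        pos_a = [i for i, w in enumerate(words) if w == first.lower()]
        pos_b = [i for i, w in enumerate(words) if w == second.lower()]
        if any(abs(a - b) <= max_distance for a in pos_a for b in pos_b):
            hits.append(d["id"])
    return hits


# Hypothetical log records.
docs = [
    {"id": 1, "status": "404", "message": "page not found on server alpha"},
    {"id": 2, "status": "500", "message": "server error while loading page"},
    {"id": 3, "status": "200", "message": "request completed"},
]

print(free_text(docs, "server"))                          # [1, 2]
print(field_level(docs, "status", "500"))                 # [2]
print(logical_and(field_level(docs, "status", "404"),
                  free_text(docs, "server")))             # [1]
print(proximity(docs, "message", "page", "found", 2))     # [1]
```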
Features of Kibana:
  • Powerful front-end dashboard which is capable of visualizing indexed information from the Elasticsearch cluster
  • Enables real-time search of indexed information
  • You can search, view, and interact with data stored in Elasticsearch
  • Execute queries on data & visualize results in charts, tables, and maps
  • Configurable dashboard to slice and dice Logstash logs in Elasticsearch
  • Capable of providing historical data in the form of graphs, charts, etc.
  • Real-time dashboards that are easily configurable
Advantages of Kibana
  • Easy visualizing
  • Fully integrated with Elasticsearch
  • Visualization tool
  • Offers real-time analysis, charting, summarization, and debugging capabilities
  • Provides an intuitive and user-friendly interface
  • Allows sharing of snapshots of the logs searched through
  • Permits saving the dashboard and managing multiple dashboards

Why Log Analysis?

In cloud-based infrastructures, performance and isolation are very important. The performance of virtual machines in the cloud may vary based on the specific loads, environments, and number of active users in the system. Therefore, reliability and node failure can become significant issues.
A log management platform can monitor all of the above issues. It can also process operating system logs, NGINX and IIS server logs for web traffic analysis, application logs, and logs on AWS (Amazon Web Services).
Log management helps DevOps engineers and system administrators make better business decisions. Hence, log analysis via the Elastic Stack or similar tools is important.

ELK vs. Splunk

ELK | Splunk
ELK is an open-source tool. | Splunk is a commercial tool.
The ELK stack does not offer Solaris portability because of Kibana. | Splunk offers Solaris portability.
Processing speed is limited. | Offers accurate and speedy processing.
ELK is a technology stack created by combining Elasticsearch, Logstash, and Kibana. | Splunk is a proprietary tool. It provides both on-premise and cloud solutions.
In ELK, searching, analysis, and visualization are possible only after the ELK stack is set up. | Splunk is a complete data management package at your disposal.
ELK integrates with other tools through plugins that must be set up separately. | Splunk is a useful tool for setting up integrations with other tools.

Case Studies

Netflix
Netflix relies heavily on the ELK stack. The company uses it to monitor and analyze customer service operations and security logs. It allows them to index, store, and search documents from more than fifteen clusters, which comprise almost 800 nodes.
LinkedIn
The social media site LinkedIn uses the ELK stack to monitor performance and security. The IT team integrated ELK with Kafka to support their load in real time. Their ELK operation includes more than 100 clusters across six different data centers.
Tripwire
Tripwire is a worldwide Security Information Event Management system. The company uses ELK to support information packet log analysis.
Medium
Medium is a famous blog-publishing platform. They use the ELK stack to debug their production issues. The company also uses ELK to detect DynamoDB hotspots. Moreover, using this stack, the company can support 25 million unique readers as well as thousands of published posts each week.

Advantages and Disadvantages of ELK stack

Advantages
  • ELK works best when logs from various apps of an enterprise converge into a single ELK instance
  • It provides insights from this single instance and also eliminates the need to log into a hundred different log data sources
  • Rapid on-premise installation
  • Easy to deploy; scales vertically and horizontally
  • Elastic offers a host of language clients, including Ruby, Python, PHP, Perl, .NET, Java, JavaScript, and more
  • Availability of libraries for different programming and scripting languages
Disadvantages
  • Different components in the stack can become difficult to handle when you move on to a complex setup
  • There is a learning curve: much of the setup is learned through trial and error, so the more you do, the more you learn along the way
Summary
  • Centralized logging can be useful when attempting to identify problems with servers or applications
  • The ELK stack is useful for resolving issues related to a centralized logging system
  • The ELK stack is a collection of three open-source tools: Elasticsearch, Logstash, and Kibana
  • Elasticsearch is a NoSQL database
  • Logstash is the data collection pipeline tool
  • Kibana is a data visualization tool which completes the ELK stack
  • In cloud-based infrastructures, performance and isolation are very important
  • In the ELK stack, processing speed is limited, whereas Splunk offers accurate and speedy processing
  • Netflix, LinkedIn, Tripwire, and Medium all use the ELK stack for their business
  • ELK works best when logs from various apps of an enterprise converge into a single ELK instance
  • Different components in the stack can become difficult to handle when you move on to a complex setup

30 Best DevOps Tools & Technologies (2019 List)

DevOps is a software development and delivery process. It emphasizes communication and collaboration between product management, software development, and operations professionals.
Following is a curated list of the top DevOps tools, along with their features and latest download links.

1) QuerySurge

QuerySurge is the smart data testing solution that is the first-of-its-kind full DevOps solution for continuous data testing.
Key Features
  • Robust API with 60+ calls
  • Seamlessly integrates into the DevOps pipeline for continuous testing
  • Verifies large amounts of data quickly
  • Validates difficult transformation rules between multiple source and target systems
  • Detects requirements and code changes, updates tests accordingly and alerts team members of said changes
  • Provides detailed data intelligence & data analytics

2) Jenkins

Jenkins is a DevOps tool for monitoring the execution of repeated tasks. It helps to integrate project changes more easily by quickly finding issues.
Features:
  • It increases the scale of automation
  • Jenkins requires little maintenance and has built-in GUI tool for easy updates.
  • It offers 400 plugins to support building and testing virtually any project.
  • It is a Java-based program ready to run on operating systems like Windows, Mac OS X, and UNIX
  • It supports continuous integration and continuous delivery
  • It can be easily set up and configured via its web interface
  • It can distribute tasks across multiple machines thereby increasing concurrency.

3) Vagrant

Vagrant is a DevOps tool. It allows building and managing virtual machine environments in a single workflow. It offers an easy-to-use workflow and focuses on automation. Vagrant lowers development environment setup time and increases production parity.
Features:
  • Vagrant integrates with existing configuration management tools like Chef, Puppet, Ansible, and Salt
  • Vagrant works seamlessly on Mac, Linux, and Windows
  • Create a single file for projects to describe the type of machine and software users want to install
  • It helps DevOps team members to have an ideal development environment

4) PagerDuty:

PagerDuty is a DevOps tool that helps businesses to enhance their brand reputation. It is an incident management solution supporting continuous delivery strategy. It also allows DevOps teams to deliver high-performing apps.
Key Features:
  • Provide Real-time alerts
  • Reliable & Rich Alerting facility
  • Event Grouping & Enrichment
  • Gain visibility into critical systems and applications
  • Easily detect and resolve incidents from development through production
  • It offers Real-Time Collaboration System & User Reporting
  • It supports Platform Extensibility
  • It allows scheduling & automated Escalations
  • Full-stack visibility across development and production environments
  • Event intelligence for actionable insights

5) Prometheus:

Prometheus is a 100% open-source, free-to-use service monitoring system. It offers support for more than ten languages.
Key Features:
  • Flexible query language for slicing collected time series data to generate tables, graphs, and alerts
  • Stores time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions
  • Stores time series in memory and also on local disk
  • It has easy-to-implement custom libraries
  • Alertmanager handles notifications and silencing
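The time series idea in the list above, where a series is identified by a metric name plus a set of labels, can be sketched with a toy in-memory store. This is an illustrative Python model only, not Prometheus code or its client library. The class name `TimeSeriesStore`, its methods, and the sample metric and labels are assumptions made for the example.

```python
from collections import defaultdict


def series_key(metric, labels):
    """Identify a series by its metric name plus its sorted label pairs."""
    return (metric, tuple(sorted(labels.items())))


class TimeSeriesStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, metric, labels, timestamp, value):
        self._series[series_key(metric, labels)].append((timestamp, value))

    def samples(self, metric, labels):
        return list(self._series.get(series_key(metric, labels), []))

    def series_count(self):
        return len(self._series)


store = TimeSeriesStore()
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1000, 5.0)
# Same labels in a different order: still the same series.
store.append("http_requests_total", {"code": "200", "method": "GET"}, 1015, 9.0)
# A different label set creates a separate series for the same metric.
store.append("http_requests_total", {"method": "POST", "code": "500"}, 1015, 1.0)

print(store.series_count())  # 2
print(store.samples("http_requests_total", {"method": "GET", "code": "200"}))
```

Sorting the label pairs before building the key is what makes label order irrelevant, so the first two samples land in one series.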

6) Ganglia:

Ganglia provides teams with cluster and grid monitoring capabilities. This tool is designed for high-performance computing systems like clusters and grids.
Key Features:
  • Free and open source tool
  • Scalable monitoring system based on a hierarchical design
  • Achieves low per-node overheads for high concurrency
  • It can handle clusters with 2,000 nodes

7) Snort:

Snort is a very powerful open-source DevOps tool that helps in the detection of intruders. It also highlights malicious attacks against the system. It allows real-time traffic analysis and packet logging.
Key Features:
  • Performs protocol analysis and content searching
  • It allows signature-based detection of attacks by analyzing packets
  • It offers real-time traffic analysis and packet logging
  • Detects buffer overflows, stealth port scans, and OS fingerprinting attempts, etc.
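Signature-based detection, as mentioned in the list above, boils down to checking traffic against known-bad patterns. The Python sketch below shows only that core idea on raw byte payloads. It is not Snort's rule language or engine, and the signature names, byte patterns, and sample payloads are hypothetical.

```python
# Hypothetical signatures: a name mapped to a byte pattern to look for.
SIGNATURES = {
    "path-traversal": b"../..",
    "shell-invocation": b"/bin/sh",
}


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)


# Made-up packet payloads for illustration.
packets = [
    b"GET /index.html HTTP/1.1",
    b"GET /../../etc/passwd HTTP/1.1",
    b"POST /run?cmd=/bin/sh HTTP/1.1",
]
for payload in packets:
    print(payload, match_signatures(payload))
```

Real intrusion detection rules also inspect protocol headers, ports, and packet state. This sketch matches payload content only.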

8) Splunk:

Splunk is a tool to make machine data accessible, usable, and valuable to everyone. It delivers operational intelligence to DevOps teams. It helps companies to be more productive, competitive, and secure.
Key Features:
  • Data-driven analytics with actionable insights
  • Next-generation monitoring and analytics solution
  • Delivers a single, unified view of different IT services
  • Extend the Splunk platform with purpose-built solutions for security
Download link: https://www.splunk.com/

9) Nagios

Nagios is another useful tool for DevOps. It helps DevOps teams to find and correct problems with network & infrastructure.
Key Features:
  • Nagios XI helps to monitor components like applications, services, OS, and network protocols
  • It provides complete monitoring of desktop and server operating systems
  • It provides complete monitoring of Java Management Extensions
  • It allows monitoring of all mission-critical infrastructure components on any operating system
  • Its log management tool is industry leading.
  • Network Analyzer helps identify bottlenecks and optimize bandwidth utilization.
  • This tool simplifies the process of searching log data
Download link: https://www.nagios.com/

10) Chef:

Chef is a useful DevOps tool for achieving speed, scale, and consistency. It is a cloud-based system. It can be used to simplify complex tasks and perform automation.
Features:
  • Accelerate cloud adoption
  • Effectively manage data centers
  • It can manage multiple cloud environments
  • It maintains high availability

11) Sumo Logic:

Sumo Logic helps organizations to analyze and make sense of log data. It combines security analytics with integrated threat intelligence for advanced security analytics.
Key Features:
  • Build, run, and secure Azure Hybrid applications
  • Cloud-native, machine data analytics service for log management and time series metrics
  • Monitor, secure, troubleshoot cloud applications, and infrastructures
  • It has the power of an elastic cloud to scale infinitely
  • Drive business value, growth and competitive advantage
  • One platform for continuous real-time integration
  • Remove friction from the application lifecycle

12) OverOps:

OverOps is a DevOps tool that identifies the root cause of a bug and informs the team about server crashes. It quickly identifies when and why code breaks in production.
Key Features:
  • Detects production code breaks and delivers the source code
  • Improve staff efficiency by reducing time wasted sifting through logs
  • Offers the complete source code and variable state to fix any error
  • Proactively detects when deployment processes face errors
  • It helps DevOps team to spend more time in delivering great features
Download link: https://www.overops.com/

13) Consul:

Consul is a DevOps tool. It is widely used for discovering and configuring services in any infrastructure. It is a perfect tool for modern, elastic infrastructures as it is useful for the DevOps community.
Key Features:
  • It provides a robust API
  • Applications can easily find the services they should depend upon using DNS or HTTP
  • Make use of the hierarchical key/value store for dynamic configuration
  • Provides support for multiple data centers

14) Docker:

Docker is a DevOps technology suite. It allows DevOps teams to build, ship, and run distributed applications. This tool allows users to assemble apps from components and work collaboratively.
Key Features:
  • CaaS-ready platform running with built-in orchestration
  • Flexible image management with a private registry to store and manage images and configure image caches
  • Isolates apps in containers to eliminate conflicts and enhance security
  • Integrated container management with Docker Datacenter of all app resources and users in a unified web admin UI
  • It provides secure access and configures image caches
  • Secure multi-tenancy with granular Role Based Access Control
  • Complete security with automatic TLS, integrated secrets management, security scanning, and deployment policy
  • Docker Certified Plugins and Containers provide tested, certified, and supported solutions

15) Stackify Retrace:

Stackify is a lightweight DevOps tool. It shows real-time logs, errors, queries, and more directly on the workstation. It is an ideal solution for intelligent orchestration for the software-defined data center.
Key Features:
  • Detailed trace of all types of web request
  • Eliminate messy configuration or code changes
  • Provides instant feedback to check what .NET or Java web apps are doing
  • Allows you to find and fix bugs before production

16) CFEngine:

CFEngine is a DevOps tool for IT automation. It is an ideal tool for configuration management. It helps teams to automate large-scale complex infrastructure.
Key Features:
  • Provides rapid execution, with run times of less than one second
  • An open-source configuration solution with an unmatched security record
  • It has conducted billions of compliance checks in large-scale production environments
  • It allows deploying a model-based configuration change across 50,000 servers in a few minutes

17) Artifactory:

Artifactory is an enterprise-ready repository manager. It provides an end-to-end, automated solution for tracking artifacts from development to production.
Features:
  • It supports software packages created using any technology or language
  • Supports secure, clustered, high-availability Docker registries
  • Remote artifacts are cached locally for reuse, which eliminates the need to download them repeatedly
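The local caching of remote artifacts described in the last point can be sketched as a simple read-through cache. This Python example is a generic illustration of the pattern, not Artifactory's implementation or API. `CachingRepository`, `fake_remote`, and the artifact name are hypothetical.

```python
class CachingRepository:
    """Serve artifacts from a local cache, fetching from the remote only on a miss."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote
        self._cache = {}
        self.remote_fetches = 0

    def get(self, name):
        if name not in self._cache:
            # Cache miss: download once and remember the result.
            self._cache[name] = self._fetch_remote(name)
            self.remote_fetches += 1
        return self._cache[name]


def fake_remote(name):
    """Stand-in for a download from a remote repository."""
    return f"contents of {name}".encode("utf-8")


repo = CachingRepository(fake_remote)
first = repo.get("lib-1.0.jar")
second = repo.get("lib-1.0.jar")
print(repo.remote_fetches)  # 1
print(first == second)      # True
```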

18) Capistrano:

Capistrano is another useful remote server automation tool for DevOps teams. This tool supports scripting and executing arbitrary tasks.
Features:
  • Allows you to deploy a web application to any number of machines
  • Helps to automate common tasks in software teams
  • Interchangeable output formatters
  • Allows you to script arbitrary workflows over SSH
  • Easy to add support for many source control management systems
  • Host and Role filters for partial deploys or cluster maintenance
  • Recipes for the database integration and Rails asset pipelines
Download link: http://capistranorb.com/

19) Monit:

Monit is an Open Source DevOps tool. It is designed for managing and monitoring UNIX systems. It conducts automatic maintenance, repair, and executes meaningful actions in error situations.
Features:
  • Executes meaningful causal actions in error situations
  • Monit helps to monitor daemon processes or similar programs running on localhost
  • It helps to monitor files, directories, and file systems on localhost
  • This DevOps tool can monitor network connections to various servers

20) Supervisor:

Supervisor is a useful DevOps tool. It allows teams to monitor and control processes on UNIX operating systems. It provides users a single place to start, stop, and monitor all the processes.
Features:
  • Supervisor is configured using a simple INI-style config file which is easy to learn
  • This tool provides users a single place to start, stop, and monitor all the processes
  • It uses simple event notification to monitor programs written in any language
  • It is tested and supported on Linux, Mac OS X, FreeBSD, Solaris, etc.
  • It does not need a compiler because it is written entirely in Python
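To give a feel for the INI-style configuration mentioned in the list above, the Python sketch below parses a small sample with the standard `configparser` module and lists the programs it defines. The `[program:name]` sections and the `command`, `autostart`, and `autorestart` keys follow Supervisor's configuration format. The program names, paths, and the `list_programs` helper are made up for the example, and Supervisor itself reads this file with its own loader.

```python
import configparser

# Example configuration in the INI style; the program names and paths are made up.
SAMPLE_CONFIG = """
[program:webapp]
command=/usr/bin/python3 /opt/webapp/app.py
autostart=true
autorestart=true

[program:worker]
command=/usr/bin/python3 /opt/webapp/worker.py
autostart=false
"""


def list_programs(config_text):
    """Return {program name: command} for every [program:...] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    programs = {}
    for section in parser.sections():
        if section.startswith("program:"):
            programs[section.split(":", 1)[1]] = parser.get(section, "command")
    return programs


print(list_programs(SAMPLE_CONFIG))
```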

21) Ansible:

Ansible is a leading DevOps tool. It offers a simple way to automate IT across the entire application lifecycle. It makes it easier for DevOps teams to scale automation and speed up productivity.
Key Features:
  • It is an easy-to-use, open-source tool for deploying apps
  • It helps to avoid complexity in the software development process
  • IT automation eliminates repetitive tasks, which allows teams to do more strategic work
  • It is an ideal tool to manage complex deployments and speed up development process

22) Code Climate:

Code Climate is a DevOps tool that monitors the health of the code, from the command line to the cloud. It helps users to fix issues easily and allows the team to produce better code.
Features:
  • It can easily integrate into any workflow
  • It helps to identify fixes and improve the team's skills to produce maintainable code
  • With Code Climate, it is easy to increase code quality
  • Allows tracking progress instantly
Download link: https://codeclimate.com/

23) Icinga

Icinga is a DevOps tool that consists of two branches in parallel: Icinga and Icinga 2. It allows DevOps engineers to select the one that best suits their project.
Key Features:
  • Monitor network services, host resources, and server components
  • Notify through email, SMS, or phone call
  • With the RESTful API of Icinga 2, it is certainly easy to update configurations
  • When any issue occurs, the user is notified by e-mail, text message, or mobile messaging applications
  • Apply rules to hosts and services to create a continuous monitoring environment
  • Reports with charts and graphs, measures SLAs, and helps to identify trends

24) New Relic APM:

New Relic APM is a useful DevOps tool. It provides end-to-end visibility across customer experience and dynamic infrastructure. It allows DevOps teams to reduce the time spent monitoring applications.
Features:
  • Monitor performance of External Services
  • It allows full-stack alerting
  • Organize, visualize, evaluate with in-depth analytics
  • Provide a precise picture of dynamically changing systems.
  • The external service's dashboard offers charts with response time
  • Create customized queries on metric data and names
  • Key Transactions monitor feature to manage and track all the important business transactions

25) Juju:

Juju is an open-source application modeling DevOps tool. It deploys, configures, scales, and operates software on public & private clouds. With Juju, it is possible to automate cloud infrastructure and deploy application architectures.
Key Features:
  • DevOps engineers can easily handle configuration, management, maintenance, deployment, and scalability.
  • It offers powerful GUI and command-line interface
  • Deploy services to targeted cloud in seconds
  • Provide detailed logs to resolve issues quickly

26) ProductionMap:

ProductionMap is an integrated visual platform for DevOps engineers. It helps to make automation development fast and easy. This orchestration platform is dedicated to IT professionals.
Features:
  • Allows users to plan the automation process
  • JavaScript editor backed by a full Object Model
  • Each execution is automatically documented
  • The Admin can control map execution
  • User can trigger an execution of a map from remote events

27) Scalyr:

Scalyr is a DevOps platform for high-speed server monitoring and log management. Its log aggregator module collects all application, web, process, and system logs.
Features:
  • Start monitoring and collecting data without needing to worry about infrastructure
  • Drop the Scalyr Agent on any server
  • It allows you to import logs from Heroku, Amazon RDS, Amazon CloudWatch, etc.
  • Graphs allow visualizing log data and metrics to show breakdowns and percentiles
  • Centralized log management and server monitoring
  • Watch all the new events arrive in near real-time
  • Search hundreds of GBs/sec across all the servers
  • Switch between logs and graphs with a single click
  • Turn complex log data into simple, clear, and highly interactive reports

28) Rudder:

Rudder is a DevOps solution for continuous configuration and auditing. It is an easy-to-use, web-driven solution for IT automation.
Key Features:
  • Workflows offer options for different kinds of users, such as non-expert users, expert users, and managers
  • Automate common system administration tasks such as installation and configuration
  • Enforce configuration over time
  • Provide Inventory of all managed nodes
  • Web interface for configuring and managing nodes
  • Compliance reporting by configuration or by node

29) Puppet Enterprise:

Puppet Enterprise is a DevOps tool. It allows managing the entire infrastructure as code without expanding the size of the team.
Features:
  • Puppet Enterprise eliminates manual work in the software delivery process. It helps developers to deliver great software rapidly
  • Model and manage the entire environment
  • Intelligent orchestration and visual workflows
  • Real-time context-aware reporting
  • Define and continually enforce infrastructure
  • It inspects and reports on packages running across infrastructure
  • Desired state conflict detection and remediation

30) Graylog:

Graylog is a powerful log management and DevOps tool. It has many use cases, such as monitoring SSH logins and unusual activities. Its basic version is free and open source.
Features:
  • Automatically archives the data so that users don't need to do it manually
  • Graylog Enterprise also offers Audit Log capabilities
  • It records and stores actions taken by a user or administrator that make changes in the system
  • Receive enterprise-grade support, with support requests handled directly by the engineers

31) UpGuard:

UpGuard helps DevOps teams around the world to gain visibility into their technology. It integrates seamlessly with popular automation platforms such as Puppet, Chef, and Ansible.
Features:
  • UpGuard helps businesses around the world to gain visibility into their technology
  • This DevOps tool helps increase the speed of software delivery by automating a number of processes and technologies
  • It allows users to trust a third-party with sensitive data
  • The procedures used to govern assets are as important as the configurations themselves