Thursday, 24 March 2022

Amazon Transcribe

 

  • A fully managed automatic speech recognition (ASR) service.
  • Converts speech into text.
  • It supports a wide variety of audio formats such as WAV, MP3, MP4, FLAC, AMR, AMR-WB, Ogg, and WebM.
  • It can process batch and streaming transcriptions.

Common Use Cases

  • Transcribing customer calls
  • Meeting transcription
  • Closed captioning
  • Generating metadata to create a searchable archive

Concepts

  • A confidence score is a value between 0 and 1 that indicates the probability that a given prediction is correct.
  • Low-fidelity (lo-fi) is a term used to describe audio recordings that exhibit poor sound quality. The term high-fidelity refers to high-quality audio recordings.
  • Automatic content redaction
    • A process that censors sensitive information within the transcript output.
    • Replaces redacted information with the [PII] tag.
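
A minimal sketch of how confidence scores can guide editing, assuming a batch transcript JSON in which each word item carries its confidence as a decimal string (for example "0.62") together with start and end times. The sample data, the 0.80 threshold, and the function name are illustrative only.

```python
def low_confidence_words(transcript, threshold=0.80):
    """Return (word, start_time, confidence) for words scored below threshold."""
    flagged = []
    for item in transcript["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items have no timestamps, so skip them
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append((best["content"], float(item["start_time"]), confidence))
    return flagged


# Hand-written sample that mimics the shape of a transcript file.
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.04", "end_time": "0.41",
             "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": ","}]},
            {"type": "pronunciation", "start_time": "0.52", "end_time": "1.10",
             "alternatives": [{"confidence": "0.62", "content": "Textract"}]},
        ]
    }
}

print(low_confidence_words(sample))
```

The words returned are the ones a human editor would likely review first.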

Features

  • Custom Vocabulary
    • Helps improve accuracy for content that has business-specific terms such as medical or legal terms.
  • Vocabulary Filtering 
    • Allows you to create a list of words to filter from the transcript.
    • Useful for blocking profanities.
  • Multiple speaker recognition
    • Identifies up to ten speakers in a recording.
  • Capable of transcribing low-fidelity and high-fidelity audio files.
  • Uses machine learning to provide punctuation and grammatical formatting, making the transcription output immediately usable.
  • It adds timestamps to every word, making it easier to use for movie subtitles.
  • Includes a confidence score for each result so you can easily pinpoint the sections where further editing is required.
  • Transcribe Medical can automatically identify Protected Health Information (PHI). Amazon Transcribe Medical is a HIPAA-eligible automatic speech recognition service.
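
The features above correspond to request settings when a batch job is started. The sketch below only assembles the request dictionary and does not call AWS. The job name, S3 URI, vocabulary name, and language code are hypothetical, and the parameter names are intended to follow the StartTranscriptionJob API, so they should be checked against the current API reference before use.

```python
def build_transcription_request(job_name, media_uri, vocabulary=None,
                                vocabulary_filter=None, max_speakers=None,
                                redact_pii=False):
    """Assemble keyword arguments for a batch transcription job."""
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # assumption: US English audio
        "Media": {"MediaFileUri": media_uri},
    }
    settings = {}
    if vocabulary:
        settings["VocabularyName"] = vocabulary  # custom vocabulary
    if vocabulary_filter:
        settings["VocabularyFilterName"] = vocabulary_filter
        settings["VocabularyFilterMethod"] = "mask"  # mask filtered words
    if max_speakers:
        settings["ShowSpeakerLabels"] = True  # multiple speaker recognition
        settings["MaxSpeakerLabels"] = max_speakers
    if settings:
        request["Settings"] = settings
    if redact_pii:
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return request


request = build_transcription_request(
    "demo-call-001",                      # hypothetical job name
    "s3://example-bucket/calls/001.wav",  # hypothetical S3 object
    vocabulary="legal-terms",             # hypothetical custom vocabulary
    max_speakers=4,
    redact_pii=True,
)
print(request)
# With credentials configured, the request could be submitted with:
# boto3.client("transcribe").start_transcription_job(**request)
```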

Pricing

  • Batch and streaming transcription are billed monthly at a rate of $0.0004 per second of audio.
  • Billed in 1-second increments
  • Minimum request charge of 15 seconds.
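
A worked example of the pricing rules above. It assumes that a partial second is rounded up to the next whole second, which the list does not state explicitly; the function name is illustrative.

```python
import math
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0004")  # rate quoted above, in USD
MINIMUM_BILLED_SECONDS = 15          # minimum charge per request


def transcription_cost(duration_seconds):
    """Estimated charge in USD for a single transcription request."""
    # Assumption: partial seconds are rounded up to a whole second.
    billed_seconds = max(math.ceil(duration_seconds), MINIMUM_BILLED_SECONDS)
    return billed_seconds * RATE_PER_SECOND


for seconds in (8, 90, 3600):
    print(seconds, "seconds ->", "$" + str(transcription_cost(seconds)))
```

Under these rules an 8-second clip is billed as 15 seconds ($0.006), while a 90-second recording costs $0.036.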

Amazon Textract

 

  • A fully managed document analysis service for detecting and extracting information from scanned documents.
  • Returns extracted data as key-value pairs (e.g., Name: John Doe)
  • Supports virtually any type of document.
  • Can detect text written in the Standard English alphabet and ASCII symbols.

Common Use Cases:

  • Building search indexes
  • Importing documents into a business application
  • Building automated document processing solutions
  • Text extraction for Natural Language Processing (NLP) Applications
  • Maintaining document compliance

Concepts

  • Amazon Textract returns a confidence score for each identified element, which indicates the probability that a given prediction is correct.
  • Predictions with a low confidence score can be routed to Amazon Augmented AI (A2I) for human review.
  • The asynchronous operation allows you to process multipage PDF documents.
  • Detect Document Text API
    • Uses optical character recognition (OCR) technology to extract printed text and handwriting from a document.
  • Analyze Document API
    • Extracts printed text and handwriting, plus structured data such as tables and key-value pairs from forms.
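
A small sketch of the confidence-based review idea described above, assuming a response whose blocks each carry a BlockType, a Text value, and a Confidence between 0 and 100. The 90.0 threshold and the sample blocks are made up for illustration; the actual hand-off to Amazon A2I is not shown.

```python
def split_by_confidence(blocks, threshold=90.0):
    """Separate detected lines into accepted text and text needing review."""
    accepted, needs_review = [], []
    for block in blocks:
        if block["BlockType"] != "LINE":
            continue  # ignore PAGE and WORD blocks in this sketch
        target = accepted if block["Confidence"] >= threshold else needs_review
        target.append(block["Text"])
    return accepted, needs_review


# Hand-written blocks that mimic the shape of a text detection response.
blocks = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Name: John Doe", "Confidence": 99.1},
    {"BlockType": "LINE", "Text": "Date: 0l/02/2022", "Confidence": 71.4},
    {"BlockType": "WORD", "Text": "Name:", "Confidence": 99.3},
]

accepted, needs_review = split_by_confidence(blocks)
print("accepted:", accepted)
print("needs review:", needs_review)
```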

Pricing

  • You only pay for what you use.
  • Charges vary for Detect Document Text API and Analyze Document API, with the latter being the more expensive.

Amazon SageMaker

 

  • A fully managed service that allows data scientists and developers to easily build, train, and deploy machine learning models at scale.
  • Provides built-in algorithms that you can immediately use for model training.
  • Also supports custom algorithms through Docker containers.
  • One-click model deployment.

Concepts

  • Hyperparameters
    • A set of variables that control how a model is trained.
    • You can think of them as “volume knobs” that you can tune to achieve your model’s objective.
  • Automatic Model Tuning
    • Finds the best version of a model by running multiple training jobs across the hyperparameter ranges that you specify.
  • Training
    • The process where you create a machine learning model.
  • Inference
    • The process of using the trained model to make predictions.
  • Local Mode
    • Allows you to create and deploy estimators on your local machine for testing.
    • You must install the Amazon SageMaker Python SDK in your local environment to use local mode.
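
To make the idea of tuning within hyperparameter ranges concrete, here is a toy sketch that uses plain random search as a stand-in. It does not reproduce the search strategy that Automatic Model Tuning itself uses, and `toy_objective` is a hypothetical replacement for a real training job that returns a validation score.

```python
import random


def random_search(train_and_score, ranges, trials=20, seed=0):
    """Try random hyperparameter values within ranges and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        # Sample each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in ranges.items()}
        score = train_and_score(**params)  # one "training job"
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(learning_rate, dropout):
    # Stand-in for training: the score peaks at learning_rate=0.1, dropout=0.3.
    return -((learning_rate - 0.1) ** 2) - ((dropout - 0.3) ** 2)


ranges = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.6)}
best_params, best_score = random_search(toy_objective, ranges)
print(best_params, best_score)
```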

Common Training Data Formats For Built-in Algorithms

  • CSV
  • Protobuf RecordIO
  • JSON
  • LibSVM
  • JPEG
  • PNG

Input modes for transferring training data

  • File mode
    • Downloads data into the SageMaker instance volume before model training commences.
    • Slower than Pipe mode.
    • Used for incremental training.
  • Pipe mode
    • Directly stream data from Amazon S3 into the training algorithm container.
    • There’s no need to procure large volumes to store large datasets.
    • Provides shorter startup and training times.
    • Higher I/O throughputs
    • Faster than File mode.
    • For most built-in algorithms, your training data should be in the protobuf RecordIO format before you can take advantage of Pipe mode.
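
The input mode is chosen per data channel when a training job is defined. The sketch below only builds the channel dictionary in the shape expected by a CreateTrainingJob request; the bucket and prefix are hypothetical, and the field names should be verified against the API reference.

```python
def training_channel(name, s3_uri, input_mode="File", content_type=None):
    """Build one input channel entry for a training job definition."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    channel = {
        "ChannelName": name,
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
    if content_type:
        channel["ContentType"] = content_type
    return channel


# Pipe mode channel that streams protobuf RecordIO data from S3.
pipe_channel = training_channel(
    "train",
    "s3://example-bucket/train/",  # hypothetical location
    input_mode="Pipe",
    content_type="application/x-recordio-protobuf",
)
print(pipe_channel)
```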

Two methods of deploying a model for inference

  • Amazon SageMaker Hosting Services
    • Provides a persistent HTTPS endpoint for getting predictions one at a time.
    • Suited for web applications that need sub-second latency response.
  • Amazon SageMaker Batch Transform
    • Doesn’t need a persistent endpoint
    • Get inferences for an entire dataset
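
A sketch of requesting a single prediction from a hosted endpoint. It assumes an endpoint that accepts CSV input; the endpoint name is hypothetical. The client is passed in so that the example can run without AWS access: `FakeRuntimeClient` is a stand-in written for this sketch, whereas in practice the client would come from `boto3.client("sagemaker-runtime")`.

```python
import io


def predict_one(runtime_client, endpoint_name, features):
    """Send one CSV row to a hosted endpoint and return the raw response text."""
    payload = ",".join(str(value) for value in features)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return response["Body"].read().decode("utf-8")


class FakeRuntimeClient:
    """Stand-in for the real runtime client; it just echoes the payload."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        return {"Body": io.BytesIO(("echo:" + Body).encode("utf-8"))}


print(predict_one(FakeRuntimeClient(), "demo-endpoint", [5.1, 3.5, 1.4]))
```

Batch Transform has no equivalent per-request call, because it reads an entire dataset from S3 and writes the results back to S3.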

Optimization

  • Convert training data into a protobuf RecordIO format to make use of Pipe mode.
  • Use Amazon FSx for Lustre to accelerate File mode training jobs.

Monitoring

  • You can publish SageMaker instance metrics to the CloudWatch dashboard to gain a unified view of its CPU utilization, memory utilization, and latency.
  • You can also send training metrics to the CloudWatch dashboard to monitor model performance in real-time.
  • AWS CloudTrail helps you detect unauthorized SageMaker API calls.

Pricing

  • The building, training, and deploying of ML models are billed by the second, with no minimum fees and no upfront commitments.

Amazon Rekognition

 

  • A service that makes it easy to add powerful visual analysis to your applications.
  • There are two services under Amazon Rekognition:
    • Rekognition Image lets you easily build powerful applications to search, verify, and organize millions of images.
    • Rekognition Video lets you extract motion-based context from stored or live stream videos and helps you analyze them.
  • Rekognition Image
    • An image recognition service that detects objects, scenes, and faces; extracts text; and more.
    • It also allows you to search and compare faces.
    • The service uses deep neural network models to detect and label thousands of objects and scenes in your images.
    • Common use cases
      • Searchable Image Library
      • Face-Based User Verification
      • Sentiment Analysis
      • Facial Recognition
      • Image Moderation
    • Rekognition Image currently supports the JPEG and PNG image formats. You can submit images either as an S3 object (up to 15MB) or as a byte array (up to 5MB).
    • Rekognition Image returns the bounding box for each face detected in an image along with its attributes such as sex, accessories, facial features, etc.
    • Using the CompareFaces API, Rekognition Image lets you measure the likelihood that faces in two images are of the same person.
  • Rekognition Video
    • A video recognition service that detects activities; understands the movement of people in frame; and recognizes objects, celebrities, text, scenes, and more in a video.
    • Rekognition Video also lets you index metadata such as objects, text, activities, scenes, celebrities, and faces, which makes video search easy.
    • Common use cases
      • Search Index for video archives
      • Easy filtering of video for explicit and suggestive content
    • Rekognition Video operations can analyze videos (up to 8GB) stored in Amazon S3 buckets. The video must be encoded using the H.264 codec. The supported file formats are MPEG-4 and MOV.
    • With Rekognition Video, you can locate faces across a video and analyze face attributes.
    • With the Person Tracking feature, you can also track each person within a shot and through the video across shots.
    • To process a streaming video, Rekognition Video takes a Kinesis video stream as input. The analysis results are output to a Kinesis data stream and then read by your client application.
  • Concepts
    • A label is an object, scene, or concept found in an image based on its contents.
    • Each label comes with a confidence score. A confidence score is a number between 0 and 100 that indicates the probability that a given prediction is correct.
    • Object and Scene Detection is the process of analyzing an image or video to assign labels based on its visual content. Rekognition Image does this through the DetectLabels API.
    • For every label found, Amazon Rekognition returns the parent labels if they exist. This indicates whether two labels are related to one another under a common category. Parents are returned in hierarchical order (from left to right).
    • Unsafe Content Detection is a deep-learning-based API for detecting explicit, rude, and suggestive adult content in images. It is very useful for filtering inappropriate content.
    • Facial Recognition is the process of identifying or verifying a person’s identity by searching for their face in a collection of faces. You can create a face collection as your dataset for comparison.
    • Amazon Rekognition can also perform sentiment and demographic analysis.
    • Text in Image allows you to detect and recognize text within an image, and is specifically built to work with real-world images rather than document images.
    • Celebrity Recognition is Amazon Rekognition’s feature for recognizing celebrities within supplied images and in videos.
  • Pricing
    • With Rekognition Image, you only pay for the images you analyze and the face metadata you store.
    • Amazon Rekognition Video charges you based on the amount of video time analyzed and the amount of face metadata stored per month.
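
A sketch of reading labels, confidence scores, and parent labels from a DetectLabels-style response. The response below is hand-written to mimic the expected shape (a Labels list whose entries have Name, Confidence, and Parents); the 80.0 cutoff is an arbitrary choice for illustration.

```python
def confident_labels(response, min_confidence=80.0):
    """Return (label name, parent names) for labels at or above the cutoff."""
    results = []
    for label in response["Labels"]:
        if label["Confidence"] >= min_confidence:
            parents = [parent["Name"] for parent in label.get("Parents", [])]
            results.append((label["Name"], parents))
    return results


# Hand-written sample response, not real service output.
response = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.9,
         "Parents": [{"Name": "Vehicle"}, {"Name": "Transportation"}]},
        {"Name": "Vehicle", "Confidence": 98.9,
         "Parents": [{"Name": "Transportation"}]},
        {"Name": "Bicycle", "Confidence": 61.2,
         "Parents": [{"Name": "Vehicle"}, {"Name": "Transportation"}]},
    ]
}

for name, parents in confident_labels(response):
    print(name, "<-", parents)
```

Filtering on the client side like this keeps the full response available for later use. The DetectLabels request also accepts a MinConfidence parameter if low-scoring labels are never needed.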

Amazon Polly

 

  • A text-to-speech (TTS) service
  • Uses advanced deep learning technologies to convert text into natural, lifelike speech
  • It can save synthesized speech in MP3, Ogg Vorbis, and PCM formats.
  • Offers Standard and Neural TTS (NTTS)

Common use cases

  • Increase customer engagement
  • Language learning applications
  • Helping visually impaired individuals consume digital content
  • Testing in-game dialogs
  • Voice response

Concepts

  • Speech Synthesis Markup Language (SSML)
    • Uses XML-based tags to modify different aspects of the text-to-speech output.
    • Can control pitch, speaking style, speech rate, and volume.
  • Standard TTS
    • Concatenates short speech snippets together.
    • Limited in terms of producing different speaking styles.
  • Neural TTS
    • Produces higher quality speech output than Standard TTS.
    • Neural TTS supports two speaking styles:
      • Conversational
      • Newscaster
  • Speech Mark
    • Refers to the metadata that describes the synthesized speech
    • Speech Mark has four types:
      • Sentence
      • Word
      • Viseme
      • SSML

Features

  • Amazon Polly accepts plain text and SSML as input, encoded in UTF-8.
  • Pronounces abbreviations and acronyms.
  • Interprets dates, times, and units of measurement.
  • Homograph disambiguation
    • For example, “St.” can be read as “saint” or “street.” Amazon Polly can tell which one is meant from the surrounding context.
  • Custom lexicon
    • Supports customizing the pronunciation of words uncommon to the selected language.

Pricing

  • Standard TTS
    • $4.00 per 1 million characters
  • Neural TTS
    • $16.00 per 1 million characters

Amazon Personalize

 

  • A fully managed machine learning service for building recommendation systems.
  • Amazon Personalize allows you to build, train, and deploy recommendation models without extensive machine learning experience.
  • Offers batch and real-time recommendations.

Common Use Cases:

  • Personalized product and content recommendations.
  • Product rankings.
  • Improved marketing communication through individualized push notifications and emails.

Concepts

  • Amazon Personalize can provide recommendations based on real-time data, historical data, or a mix of both.
  • Event trackers
    • Records user interactions in real time.
  • Recipe
    • Refers to the algorithm to be used in training a solution for a given use case.
    • Available Recipes
      • USER_PERSONALIZATION – optimized for personalized recommendation systems
      • PERSONALIZED_RANKING – a hierarchical recurrent neural network (HRNN) for providing a list of best recommendations (e.g., ranking search results)
      • RELATED_ITEMS – predicts items similar to a given item
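
A sketch of the kind of record an event tracker receives when a user interacts with an item. It only builds the request dictionary in the shape of the PutEvents API; the tracking ID, user, session, and item identifiers are hypothetical, and the field names should be checked against the API reference.

```python
from datetime import datetime, timezone


def build_put_events_request(tracking_id, user_id, session_id, item_id,
                             event_type="click", sent_at=None):
    """Assemble keyword arguments describing one real-time interaction."""
    return {
        "trackingId": tracking_id,
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                # Default to the current time when no timestamp is supplied.
                "sentAt": sent_at or datetime.now(timezone.utc),
            }
        ],
    }


event_request = build_put_events_request(
    "example-tracking-id",  # hypothetical tracker ID
    "user-42", "session-7", "item-1001",
    sent_at=datetime(2022, 3, 24, 12, 0, tzinfo=timezone.utc),
)
print(event_request)
# With credentials configured, this could be sent with:
# boto3.client("personalize-events").put_events(**event_request)
```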

Pricing

  • Pay only for what you use.
  • You are billed for data ingestion, training, and inference (recommendation)

Amazon Lex

 

  • A service that can help you build conversational interfaces using voice and text.
  • Uses automatic speech recognition (ASR) to convert speech to text.
  • Uses natural language understanding (NLU) for recognizing the intent of the text.
  • Provides highly engaging user experiences and lifelike conversational interactions.
  • Gets more intelligent over time by using deep learning.

Common Use Cases

  • AI Chatbots
  • Informational bots
  • Enterprise Productivity bots
  • Voice Assistants

Concepts

  • Bots
    • Performs automated tasks such as ordering food or booking flights.
    • Supports multiple intents; for example, one intent can book a reservation while another cancels it.
  • Intent
    • An action that the user wants the bot to perform.
    • Intent name
      • A descriptive name for the intent (e.g., ‘BookFlights’, ‘OrderFood’).
    • Sample utterances
      • Example phrases that a user might say to convey the intent.
    • How to fulfill the intent
      • Describes the method of fulfilling an intent.
      • Deeply integrates with AWS Lambda to fulfill the intent.
  • Slot
    • A parameter that the bot may need in order to fulfill the intent, defined as part of the intent configuration. An intent can have zero or more slots.
    • Slot type
      • You can create a custom or built-in slot type.
      • Each slot type should have a unique name within an AWS account.
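
A sketch of a Lambda function that fulfills an intent, assuming the Lex V2 event and response shapes (the older V1 format differs). The intent name ‘OrderFood’ comes from the example above, while the `FoodItem` slot and the reply text are hypothetical.

```python
def slot_value(slots, name):
    """Return the interpreted value of a slot, or None if it is not filled."""
    slot = slots.get(name)
    return slot["value"]["interpretedValue"] if slot else None


def lambda_handler(event, context):
    """Fulfill the OrderFood intent and close the conversation."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}
    food = slot_value(slots, "FoodItem")  # hypothetical slot name

    if intent["name"] == "OrderFood" and food:
        state, text = "Fulfilled", "Your {} is on its way.".format(food)
    else:
        state, text = "Failed", "Sorry, I could not complete that request."

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "slots": slots, "state": state},
        },
        "messages": [{"contentType": "PlainText", "content": text}],
    }


# Hand-written event that mimics what the bot would send to the function.
sample_event = {
    "sessionState": {
        "intent": {
            "name": "OrderFood",
            "slots": {"FoodItem": {"value": {"originalValue": "pizza",
                                             "interpretedValue": "pizza",
                                             "resolvedValues": ["pizza"]}}},
        }
    }
}
print(lambda_handler(sample_event, None))
```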

Pricing

  • Pay-as-you-go payment model
  • $0.004 per voice request
  • $0.00075 per text request