Linux Training in Coimbatore & Best Linux Server Administration Training Institute: Amazon Polly

A text-to-speech (TTS) service
Uses advanced deep learning technologies to convert text into natural, lifelike speech
Returns the synthesized speech as MP3, Ogg Vorbis, or PCM audio.
Offers Standard and Neural TTS (NTTS)
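
As a rough sketch of how the points above map to an API call, the following assumes the boto3 SDK for Python with AWS credentials and a region already configured. The helper names build_request and synthesize_to_file, the voice Joanna, and the output path are illustrative assumptions, not part of the original notes.

```python
from typing import Any, Dict

# Audio formats and engines named above, spelled as the SynthesizeSpeech API expects.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
ENGINES = {"standard", "neural"}


def build_request(text: str, output_format: str = "mp3",
                  engine: str = "standard", voice_id: str = "Joanna") -> Dict[str, Any]:
    """Validate the options and return keyword arguments for synthesize_speech."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    return {
        "Text": text,
        "OutputFormat": output_format,
        "Engine": engine,
        "VoiceId": voice_id,  # assumed voice; any voice that supports the engine works
    }


def synthesize_to_file(text: str, path: str, **options: Any) -> None:
    """Call Amazon Polly and write the returned audio stream to `path`."""
    import boto3  # imported lazily so build_request can be used without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

With credentials in place, a call such as synthesize_to_file("Hello from Polly", "hello.mp3", engine="neural") would write an MP3 file. Keeping the request building separate from the network call makes the option handling easy to check offline.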

Common use cases

Concepts

Speech Synthesis Markup Language (SSML)
- Uses XML-based tags to modify different aspects of the text-to-speech output.
- Can control pitch, speaking style, speech rate, and volume.
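
The SSML controls above can be sketched with a small helper that wraps text in speak and prosody tags. The function name prosody_ssml and the sample sentence are assumptions made for illustration; the attribute values shown (slow, +10%, loud) are examples of the kinds of values SSML prosody accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text: str, rate: str = "medium", pitch: str = "medium",
                 volume: str = "medium") -> str:
    """Wrap plain text in <speak> and <prosody> tags, escaping XML special characters."""
    attrs = (f"rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
             f"volume={quoteattr(volume)}")
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


# Illustrative input: the ampersand must be escaped to keep the SSML well-formed.
ssml = prosody_ssml("Tom & Jerry start at 9.", rate="slow", pitch="+10%", volume="loud")
```

When sending a string like this to Amazon Polly, the request also needs TextType set to "ssml" so the tags are interpreted rather than read aloud.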
Standard TTS
- Concatenates short speech snippets together.
- Limited in terms of producing different speaking styles.
Neural TTS
- Produces higher quality speech output than Standard TTS.
- Neural TTS supports two speaking styles:
  - Conversational
  - Newscaster
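
A speaking style is selected through SSML rather than a separate request parameter. The sketch below assumes the amazon:domain tag with the names "news" and "conversational", a neural engine, and a voice that supports the chosen style; the helper name styled_ssml is hypothetical.

```python
from xml.sax.saxutils import escape

# Style names from the list above mapped to the assumed amazon:domain name values.
STYLE_DOMAINS = {"newscaster": "news", "conversational": "conversational"}


def styled_ssml(text: str, style: str) -> str:
    """Wrap text in an amazon:domain tag for the requested speaking style."""
    try:
        domain = STYLE_DOMAINS[style.lower()]
    except KeyError:
        raise ValueError(f"unknown speaking style: {style}") from None
    return (f'<speak><amazon:domain name="{domain}">'
            f"{escape(text)}</amazon:domain></speak>")
```

The resulting string would be sent with TextType set to "ssml" and Engine set to "neural"; not every voice supports every style, so the voice choice should be checked against the Polly voice list.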
Speech Mark
- Metadata that describes the synthesized speech, such as where each sentence or word starts and ends in the audio stream.
- There are four speech mark types:
  - Sentence
  - Word
  - Viseme
  - SSML
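
Speech marks are returned as line-delimited JSON when a request uses OutputFormat "json" together with a SpeechMarkTypes list. The sketch below parses such a payload; the sample bytes are hand-written to show the shape of the data and are not real service output, and the function names are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_speech_marks(payload: bytes) -> List[Dict[str, Any]]:
    """Decode a speech mark stream: one JSON object per non-empty line."""
    lines = payload.decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def word_timings(marks: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (time offset in milliseconds, word) pairs for the word marks."""
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]


# Hand-written sample for illustration only; the timings are made up.
SAMPLE = b"\n".join([
    b'{"time":0,"type":"sentence","start":0,"end":11,"value":"Hello world"}',
    b'{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}',
    b'{"time":370,"type":"word","start":6,"end":11,"value":"world"}',
])

timings = word_timings(parse_speech_marks(SAMPLE))
```

In a real call the payload would come from reading the AudioStream of a synthesize_speech response. Word timings like these are typically used to highlight text in step with the audio, and viseme marks to drive lip movement in an animation.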

Features

Amazon Polly accepts plain text or SSML as input, encoded in UTF-8.
Expands and pronounces abbreviations and acronyms.
Interprets dates, times, and units of measurement.
Homograph disambiguation
- For example, “St.” can be read as “saint” or “street.” Amazon Polly picks the correct reading from the surrounding context.
Custom lexicon
- Supports customizing the pronunciation of words uncommon to the selected language.
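
A custom lexicon is an XML document in the W3C Pronunciation Lexicon Specification (PLS) format. As a hedged sketch, the code below builds a minimal alias-only lexicon and uploads it with boto3; the helper names, the lexicon name, and the W3C entry are assumptions chosen for illustration.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

PLS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang={lang}>
{lexemes}
</lexicon>"""


def build_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Return a PLS document that maps each written form to a spoken alias."""
    lexemes = "\n".join(
        f"  <lexeme><grapheme>{escape(grapheme)}</grapheme>"
        f"<alias>{escape(alias)}</alias></lexeme>"
        for grapheme, alias in aliases.items()
    )
    return PLS_TEMPLATE.format(lang=quoteattr(lang), lexemes=lexemes)


def upload_lexicon(name: str, content: str) -> None:
    """Store the lexicon in the current region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_lexicon works without the SDK

    # The lexicon name should be short and alphanumeric.
    boto3.client("polly").put_lexicon(Name=name, Content=content)


lexicon_xml = build_lexicon({"W3C": "World Wide Web Consortium"})
```

After an upload such as upload_lexicon("acronyms", lexicon_xml), the lexicon is applied by passing its name in the LexiconNames list of a synthesize_speech request. A lexicon only takes effect for voices whose language matches its xml:lang value.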

Pricing

Thursday, 24 March 2022