- A text-to-speech (TTS) service
- Uses advanced deep learning technologies to convert text into natural, lifelike speech
- Supports saving the synthesized speech as MP3, OGG (Vorbis), or PCM audio.
- Offers Standard and Neural TTS (NTTS)
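As a rough sketch of how the options above map onto an API request, the snippet below builds the keyword arguments for boto3's `synthesize_speech` call. The helper names `build_request` and `synthesize_to_file`, the sample voice `Joanna`, and the output path are illustrative assumptions. The actual service call needs boto3 installed and AWS credentials configured, so it is left commented out.

```python
# Audio formats and engines mentioned in this list.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
ENGINES = {"standard", "neural"}


def build_request(text, voice_id="Joanna", output_format="mp3", engine="standard"):
    """Build keyword arguments for synthesize_speech (hypothetical helper)."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


def synthesize_to_file(path, **request):
    """Send the request to Polly and write the audio stream to disk.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so build_request works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


request = build_request("Hello from Amazon Polly.", engine="neural")
# synthesize_to_file("hello.mp3", **request)  # uncomment to call the service
```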
- Increase customer engagement
- Language learning applications
- Helps visually impaired users consume digital content
- Testing in-game dialogs
- Voice response
- Speech Synthesis Markup Language (SSML)
- Uses XML-based tags to modify different aspects of the text-to-speech output.
- Can control pitch, speaking style, speech rate, and volume.
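To make the SSML controls concrete, here is a minimal sketch that wraps text in a `<prosody>` tag. The helper name `prosody_ssml` and the sample sentence are assumptions for illustration. The resulting string would be sent with `TextType="ssml"` rather than as plain text, and which prosody attributes are honored may depend on the voice and engine.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in <speak><prosody ...> tags (hypothetical helper)."""
    attrs = (
        f"rate={quoteattr(rate)} "
        f"pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}"
    )
    # escape() protects characters such as & and < that would break the XML.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


ssml = prosody_ssml("Tom & Jerry start at 9.", rate="slow", volume="loud")
print(ssml)
# <speak><prosody rate="slow" pitch="medium" volume="loud">Tom &amp; Jerry start at 9.</prosody></speak>
```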
- Standard TTS
- Concatenates short speech snippets together.
- Limited in terms of producing different speaking styles.
- Neural TTS
- Produces higher quality speech output than Standard TTS.
- Neural TTS supports two speaking styles:
- Conversational
- Newscaster
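The two speaking styles are selected in SSML with the `amazon:domain` tag. The sketch below only builds the markup. The helper name `styled_ssml` is an assumption, and style availability depends on the chosen voice and region, so treat it as illustrative rather than a guaranteed recipe.

```python
from xml.sax.saxutils import escape

# Maps the style names used in this list to amazon:domain values.
STYLE_DOMAINS = {"newscaster": "news", "conversational": "conversational"}


def styled_ssml(text, style):
    """Wrap text in an amazon:domain tag for the given style (hypothetical helper)."""
    domain = STYLE_DOMAINS[style]
    return (
        f'<speak><amazon:domain name="{domain}">'
        f"{escape(text)}</amazon:domain></speak>"
    )


news = styled_ssml("Markets closed higher today.", "newscaster")
print(news)
```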
- Speech Mark
- Metadata that describes the synthesized speech, such as where a sentence or word starts and ends in the audio stream
- Speech Mark has four types:
- Sentence
- Word
- Viseme
- SSML
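Speech marks are returned as line-delimited JSON when the request uses `OutputFormat="json"` together with `SpeechMarkTypes`. The sketch below groups such output by mark type. The `SAMPLE` data is made up for illustration (its field layout follows the documented format), and `group_speech_marks` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Illustrative sample only: one JSON object per line, times in milliseconds.
SAMPLE = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":6,"type":"viseme","value":"p"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
"""


def group_speech_marks(raw):
    """Parse line-delimited speech marks and group them by their type."""
    grouped = defaultdict(list)
    for line in raw.splitlines():
        if line.strip():
            mark = json.loads(line)
            grouped[mark["type"]].append(mark)
    return dict(grouped)


marks = group_speech_marks(SAMPLE)
word_timings = [(m["value"], m["time"]) for m in marks["word"]]
print(word_timings)  # [('Mary', 6), ('had', 373)]
```

Word marks like these can drive text highlighting, while viseme marks can drive lip-sync animation.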
- Amazon Polly accepts plain text or SSML as input, encoded in UTF-8.
- Expands and pronounces abbreviations and acronyms
- Interprets dates, times, and units of measurement.
- Homograph disambiguation
- For example, “St.” can be read as “saint” or “street”; Amazon Polly picks the right reading from the surrounding context.
- Custom lexicon
- Supports customizing the pronunciation of words uncommon to the selected language.
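Custom lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format. As a minimal sketch, the helper below (the name `build_lexicon` and the sample entry are assumptions) generates a lexicon that replaces a written form with a spoken alias. The XML would then be uploaded with the `put_lexicon` API and referenced by name in the synthesis request.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(aliases, lang="en-US"):
    """Build a PLS lexicon mapping graphemes to aliases (hypothetical helper)."""
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, alias in aliases.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    return ET.tostring(root, encoding="unicode")


lexicon_xml = build_lexicon({"W3C": "World Wide Web Consortium"})
print(lexicon_xml)
```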
- Standard TTS
- $4.00 per 1 million characters
- Neural TTS
- $16.00 per 1 million characters
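A quick worked example of the pricing arithmetic, using only the two rates listed above. The function name `estimate_cost` is illustrative, and the sketch ignores any free tier, so check the current pricing page before relying on the numbers.

```python
# Rates in USD per 1 million characters, taken from the list above.
RATES_PER_MILLION = {"standard": 4.00, "neural": 16.00}


def estimate_cost(characters, engine):
    """Estimate the synthesis cost in USD for a number of characters."""
    return characters / 1_000_000 * RATES_PER_MILLION[engine]


for engine in RATES_PER_MILLION:
    print(engine, estimate_cost(250_000, engine))
# standard 1.0
# neural 4.0
```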