Thursday, 24 March 2022

Amazon Polly

 

  • A text-to-speech (TTS) service
  • Uses advanced deep learning technologies to convert text into natural, lifelike speech
  • Can save the synthesized speech as MP3, Ogg Vorbis, or PCM audio.
  • Offers Standard and Neural TTS (NTTS)
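
As a rough illustration of the points above, the sketch below requests audio from Polly and writes it to a file using boto3 (the AWS SDK for Python). The helper names `synthesize_to_file` and `make_polly_client`, the default voice "Joanna", and the output path are assumptions made for illustration. Calling the real service also requires AWS credentials and a configured region.

```python
from typing import Any

# Output formats and engines named in the notes above, spelled the way
# the SynthesizeSpeech API expects them.
AUDIO_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
ENGINES = {"standard", "neural"}


def make_polly_client() -> Any:
    """Create a real Polly client (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it
    return boto3.client("polly")


def synthesize_to_file(client: Any, text: str, path: str,
                       voice_id: str = "Joanna",      # assumed example voice
                       output_format: str = "mp3",
                       engine: str = "standard") -> int:
    """Synthesize `text`, write the audio to `path`, return bytes written."""
    if output_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {output_format}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")

    response = client.synthesize_speech(
        Text=text,
        TextType="text",          # use "ssml" when sending SSML markup
        VoiceId=voice_id,
        OutputFormat=output_format,
        Engine=engine,
    )
    # AudioStream is a file-like streaming body; read it fully.
    audio = response["AudioStream"].read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

A call such as `synthesize_to_file(make_polly_client(), "Hello from Polly", "hello.mp3")` would then produce an MP3 file, provided the account has access to the service.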

Common use cases

  • Increase customer engagement
  • Language learning applications
  • Helping visually impaired individuals consume digital content
  • Testing in-game dialog
  • Voice response

Concepts

  • Speech Synthesis Markup Language (SSML)
    • Uses XML-based tags to modify different aspects of the text-to-speech output.
    • Can control pitch, speaking style, speech rate, and volume.
  • Standard TTS
    • Concatenates short snippets of recorded speech (concatenative synthesis).
    • Limited in terms of producing different speaking styles.
  • Neural TTS
    • Produces higher quality speech output than Standard TTS.
    • Neural TTS supports two speaking styles:
      • Conversational
      • Newscaster
  • Speech marks
    • Metadata that describes the synthesized speech, such as where each sentence or word starts and ends.
    • There are four speech mark types:
      • Sentence
      • Word
      • Viseme
      • SSML
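
To make the SSML and speech mark concepts more concrete, here is a minimal sketch that wraps text in a `<prosody>` tag and parses speech mark output. The helper names `build_ssml` and `parse_speech_marks` are hypothetical. The sketch assumes a Standard voice (not every SSML tag is available on every engine) and assumes speech marks arrive as one JSON object per line, which is how Polly returns them when `OutputFormat` is `json`.

```python
import json
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium",
               volume: str = "medium") -> str:
    """Wrap plain text in <speak><prosody ...> to control rate, pitch, volume."""
    # escape() protects characters such as & and < that would break the XML.
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody></speak>"
    )


def parse_speech_marks(payload: str) -> List[Dict]:
    """Parse speech marks delivered as newline-separated JSON objects."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def marks_of_type(marks: List[Dict], mark_type: str) -> List[Dict]:
    """Keep only marks of one type: sentence, word, viseme, or ssml."""
    return [m for m in marks if m.get("type") == mark_type]
```

The SSML string would be sent with `TextType="ssml"`, and speech marks would be requested with `OutputFormat="json"` plus a `SpeechMarkTypes` list such as `["sentence", "word"]`.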

Features

  • Accepts plain text or SSML as input, encoded as UTF-8.
  • Expands and pronounces abbreviations and acronyms
  • Interprets dates, times, and units of measurement.
  • Homograph disambiguation
    • For example, “St.” can be read as “saint” or “street.” Amazon Polly chooses between the two readings based on the surrounding context.
  • Custom lexicon
    • Supports customizing the pronunciation of words uncommon to the selected language.
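
A custom lexicon is uploaded as a Pronunciation Lexicon Specification (PLS) XML document. The sketch below builds a small alias-only lexicon; the helper name `build_lexicon` and the sample entry are assumptions for illustration. The document is deliberately minimal, so check it against the Polly lexicon documentation before uploading.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases: Dict[str, str], lang: str = "en-US") -> str:
    """Build a PLS document mapping written forms to spoken aliases."""
    lexemes = []
    for grapheme, alias in aliases.items():
        # Each lexeme pairs the written form with the words to speak instead.
        lexemes.append(
            "  <lexeme>\n"
            f"    <grapheme>{escape(grapheme)}</grapheme>\n"
            f"    <alias>{escape(alias)}</alias>\n"
            "  </lexeme>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}" '
        f'alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        + "\n".join(lexemes)
        + "\n</lexicon>\n"
    )
```

With boto3, the resulting string could be stored with `put_lexicon(Name=..., Content=...)` and then applied by passing the same name in `LexiconNames` to `synthesize_speech`.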

Pricing

  • Standard TTS
    • $4.00 per 1 million characters
  • Neural TTS
    • $16.00 per 1 million characters
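
The per-character rates above translate into a simple cost estimate. The sketch below hard-codes the two rates listed in this post; the function name `estimate_cost` is hypothetical, and the calculation ignores any free tier and any price changes since the post was written.

```python
from decimal import Decimal

# Rates as listed in this post, in US dollars per 1 million characters.
PRICE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
}


def estimate_cost(characters: int, engine: str = "standard") -> Decimal:
    """Estimate the synthesis cost in dollars, rounded to the nearest cent."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    if engine not in PRICE_PER_MILLION:
        raise ValueError(f"unknown engine: {engine}")
    cost = PRICE_PER_MILLION[engine] * characters / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))
```

For example, 250,000 characters would cost $1.00 with Standard TTS and $4.00 with Neural TTS at these rates.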
