Amazon Polly
An AI service that uses advanced deep learning technologies to synthesize natural-sounding human speech.
Free Tier
5 million characters per month (Standard voices)
Voice Types
Neural and Standard
Languages
30+
About Amazon Polly
Amazon Polly is a cloud service that converts text into lifelike speech, allowing you to create applications that talk. Its text-to-speech (TTS) engine is built on deep learning models and offers dozens of voices across many languages. Alongside the Standard voices, it includes Neural Text-to-Speech (NTTS) voices, which sound noticeably more natural. The service supports Speech Synthesis Markup Language (SSML) for fine-grained control over aspects of speech such as pronunciation, volume, and speaking rate. Developers can stream the generated speech in real time or save it as standard audio files, which makes Polly suitable for interactive voice systems, audio content creation, and accessibility applications.
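As a rough illustration of the request flow described above, the following Python sketch calls the SynthesizeSpeech operation through the AWS SDK for Python (boto3) and writes the result to a file. It assumes boto3 is installed and that AWS credentials and a region are already configured. The helper names build_request and synthesize_to_file are hypothetical, and the "Joanna" voice and "neural" engine are only example defaults.

```python
from typing import Any, Dict


def build_request(text: str, voice_id: str = "Joanna", engine: str = "neural",
                  output_format: str = "mp3", is_ssml: bool = False) -> Dict[str, Any]:
    """Assemble keyword arguments for Polly's SynthesizeSpeech operation."""
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,          # example voice; pick one available to you
        "Engine": engine,             # "standard" or "neural"
        "OutputFormat": output_format,  # "mp3", "ogg_vorbis", or "pcm"
    }


def synthesize_to_file(text: str, path: str, **options: Any) -> int:
    """Call Polly and write the returned audio to `path`; returns bytes written."""
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **options))
    audio = response["AudioStream"].read()  # the audio arrives as a streaming body
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With credentials in place, a call such as synthesize_to_file("Hello from Polly.", "hello.mp3") should produce a playable MP3. Keeping the request assembly separate from the network call makes the parameters easy to inspect before anything is sent.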
Core Capabilities
Speech Synthesis
Converts text input into high-quality speech audio.
Neural Voices
Provides natural and expressive speech using Neural Text-to-Speech (NTTS) technology.
Customization
Controls pronunciation, speaking rate, pitch, and volume through SSML tags.
Custom Lexicons
Customizes the pronunciation of specific words and phrases.
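The customization options above can be sketched in a few lines of Python. The first helper wraps text in SSML speak and prosody tags; the second builds a small pronunciation lexicon in the W3C Pronunciation Lexicon Specification (PLS) format, which is the format Polly lexicons use. Both function names are hypothetical, and the sketch covers only rate, volume, and alias-style lexicon entries.

```python
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Wrap plain text in <speak> and <prosody>, escaping XML-reserved characters."""
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody>"
        "</speak>"
    )


def build_lexicon(entries: Dict[str, str], language: str = "en-US") -> str:
    """Build a PLS document mapping written forms (graphemes) to spoken aliases."""
    lexemes = "".join(
        f"<lexeme><grapheme>{escape(written)}</grapheme>"
        f"<alias>{escape(spoken)}</alias></lexeme>"
        for written, spoken in entries.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<lexicon version="1.0" '
        'xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" '
        f'alphabet="ipa" xml:lang={quoteattr(language)}>'
        f"{lexemes}</lexicon>"
    )


print(build_ssml("Q&A starts now", rate="slow", volume="loud"))
```

The SSML string would be sent with the text type set to "ssml". With boto3, a lexicon document like the one produced here can be uploaded through put_lexicon and then referenced by name in the LexiconNames parameter of a synthesis request.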
Technical Features
Real-Time Streaming
Streams audio back as it is synthesized, so playback can begin right away.
Audio Formats
Supports MP3, Ogg Vorbis, and raw PCM audio streams.
Speech Marks
Provides timing metadata (sentence, word, viseme, and SSML marks) for synchronizing speech with facial animation or highlighting text as it is spoken.
Integration
Accessible via the AWS Management Console, the AWS CLI, and SDKs for various programming languages.
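To make the speech marks feature more concrete, here is a minimal Python sketch of how a client might use them for text highlighting. Polly returns speech marks as newline-delimited JSON objects, each with a time in milliseconds, a type, and a value. The helpers parse_speech_marks and word_at are hypothetical names, and SAMPLE is hand-written illustrative data, not captured service output.

```python
import json
from bisect import bisect_right
from typing import Dict, List, Optional


def parse_speech_marks(payload: str) -> List[Dict]:
    """Parse newline-delimited JSON speech marks into a list sorted by time."""
    marks = [json.loads(line) for line in payload.splitlines() if line.strip()]
    return sorted(marks, key=lambda mark: mark["time"])


def word_at(marks: List[Dict], position_ms: int) -> Optional[str]:
    """Return the most recent word mark at or before `position_ms`, if any."""
    words = [mark for mark in marks if mark["type"] == "word"]
    times = [mark["time"] for mark in words]
    index = bisect_right(times, position_ms) - 1
    return words[index]["value"] if index >= 0 else None


# Illustrative data only: the timings below are made up for the example.
SAMPLE = "\n".join([
    '{"time": 0, "type": "sentence", "start": 0, "end": 11, "value": "Hello world"}',
    '{"time": 6, "type": "word", "start": 0, "end": 5, "value": "Hello"}',
    '{"time": 370, "type": "word", "start": 6, "end": 11, "value": "world"}',
])

marks = parse_speech_marks(SAMPLE)
print(word_at(marks, 100))
print(word_at(marks, 400))
```

In a real application the payload would come from a synthesis request with the output format set to "json" and the desired speech mark types listed, and position_ms would come from the audio player's current playback position.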