
playht / play-dialog

End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

  • Public
  • 6.1K runs

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by how long the model runs, you're billed by input and output, which makes costs more predictable.

This model is priced by how many seconds of audio are generated.

Check out our docs for more information on how per-second-of-audio pricing works on Replicate.
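Because billing scales with the length of the generated audio, the cost of a request is the audio duration multiplied by the per-second rate. The sketch below assumes the output is a WAV file and uses a placeholder rate (`HYPOTHETICAL_PRICE_PER_SECOND`), not Replicate's actual price. The helper names are illustrative only, and any rounding or minimum-charge rules are not modeled.

```python
import io
import wave

# Placeholder rate for illustration only; NOT Replicate's actual price.
HYPOTHETICAL_PRICE_PER_SECOND = 0.001


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length, in seconds, of a WAV file given its raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost(duration_s: float, price_per_second: float) -> float:
    """Per-second-of-audio billing: cost grows linearly with output length."""
    return duration_s * price_per_second


def make_silent_wav(seconds: float, rate: int = 24_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (stand-in for real model output)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    audio = make_silent_wav(12.5)
    duration = wav_duration_seconds(audio)
    cost = estimate_cost(duration, HYPOTHETICAL_PRICE_PER_SECOND)
    print(f"{duration:.1f} s -> ${cost:.4f}")
```

With real output, replace `make_silent_wav` with the bytes of the downloaded file and substitute the rate listed on the model page.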

Readme

PlayDialog


Features

Core Speech Model

  • Context-aware speech generation using Adaptive Speech Contextualizer (ASC) architecture
  • Prosody and intonation control based on conversation history
  • Emotion-aware speech synthesis
  • Support for streaming responses from LLMs via WebSocket
  • Trained on hundreds of millions of real conversation samples
  • Complementary to Play 3.0 mini (which supports 30+ languages with low latency)
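A minimal sketch of calling the model from Python with the `replicate` client library, assuming the package is installed and a `REPLICATE_API_TOKEN` environment variable is set. The input key `text` and the `Host 1:`/`Host 2:` turn-prefix convention for multi-speaker dialogue are hypothetical placeholders chosen for illustration; check the model's API schema on Replicate for the real parameter names and script format before relying on them.

```python
MODEL = "playht/play-dialog"


def format_dialogue(turns, prefixes=("Host 1:", "Host 2:")):
    """Join (speaker_index, line) pairs into one script string.

    The turn-prefix convention here is a hypothetical assumption,
    not a confirmed part of the model's input format.
    """
    return "\n".join(f"{prefixes[speaker]} {line}" for speaker, line in turns)


def build_input(script: str) -> dict:
    # "text" is a hypothetical input name; see the model's API schema.
    return {"text": script}


def synthesize(script: str):
    # Imported lazily so the helpers above work without the client installed.
    import replicate  # pip install replicate; reads REPLICATE_API_TOKEN

    return replicate.run(MODEL, input=build_input(script))


if __name__ == "__main__":
    script = format_dialogue([
        (0, "Did you catch the launch this morning?"),
        (1, "I did. The voices sounded surprisingly natural."),
    ])
    print(script)
```

Calling `synthesize(script)` sends the request; the helpers are kept separate so the script can be inspected, and its cost estimated, before any billable call is made.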

Performance

  • Preferred 2:1 over leading competitors in blind listening tests (n=600)
  • Strong performance in expressiveness metrics
  • Optimized for conversation flow and natural speech patterns

Support

For technical support or sales inquiries, contact our support team.