Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio | string | | Audio file |
| model | None | large-v3 | Whisper model size (currently only large-v3 is supported). |
| transcription | None | plain text | Choose the format for the transcription. |
| translate | boolean | False | Translate the text to English when set to True. |
| language | None | auto | Language spoken in the audio; specify 'auto' for automatic language detection. |
| temperature | number | 0 | Temperature to use for sampling. |
| patience | number | | Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search. |
| suppress_tokens | string | -1 | Comma-separated list of token ids to suppress during sampling; '-1' will suppress most special characters except common punctuation. |
| initial_prompt | string | | Optional text to provide as a prompt for the first window. |
| condition_on_previous_text | boolean | True | If True, provide the previous output of the model as a prompt for the next window; disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop. |
| temperature_increment_on_fallback | number | 0.2 | Temperature to increase by when falling back because the decoding fails to meet either of the thresholds below. |
| compression_ratio_threshold | number | 2.4 | If the gzip compression ratio is higher than this value, treat the decoding as failed. |
| logprob_threshold | number | -1 | If the average log probability is lower than this value, treat the decoding as failed. |
| no_speech_threshold | number | 0.6 | If the probability of the <\|nospeech\|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. |
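As a rough sketch of how these inputs map onto an API call, the snippet below uses the Replicate Python client (`pip install replicate`, with an API token in `REPLICATE_API_TOKEN`). The `owner/whisper:VERSION_HASH` identifier and the `audio.mp3` filename are placeholders, not values from this page; substitute the model version you are viewing. Fields left out fall back to the defaults in the table above.

```python
import replicate

# Placeholder model reference -- replace with the owner/name:version shown on this page.
output = replicate.run(
    "owner/whisper:VERSION_HASH",
    input={
        "audio": open("audio.mp3", "rb"),         # audio file to transcribe
        "model": "large-v3",                      # only large-v3 is supported
        "transcription": "plain text",            # format for the transcription
        "translate": False,                       # set True to translate to English
        "language": "auto",                       # or an explicit language code
        "temperature": 0,                         # sampling temperature
        "suppress_tokens": "-1",                  # suppress most special tokens
        "condition_on_previous_text": True,
        "temperature_increment_on_fallback": 0.2,
        "compression_ratio_threshold": 2.4,
        "logprob_threshold": -1,
        "no_speech_threshold": 0.6,
    },
)
```

The returned object should follow the output schema described in the next section.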
Output schema
The shape of the response you'll get when you run this model with an API.
```json
{
  "properties": {
    "detected_language": { "title": "Detected Language", "type": "string" },
    "segments": { "title": "Segments" },
    "srt_file": { "format": "uri", "title": "Srt File", "type": "string" },
    "transcription": { "title": "Transcription", "type": "string" },
    "translation": { "title": "Translation", "type": "string" },
    "txt_file": { "format": "uri", "title": "Txt File", "type": "string" }
  },
  "required": ["detected_language", "transcription"],
  "title": "ModelOutput",
  "type": "object"
}
```
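A minimal sketch of consuming this output, assuming `output` is the dictionary returned by the `replicate.run(...)` call shown earlier. Only `detected_language` and `transcription` are required by the schema, so the other fields are checked before use; `srt_file` and `txt_file` are URIs.

```python
import urllib.request

# Required fields per the output schema.
print("Detected language:", output["detected_language"])
print(output["transcription"])

# Optional fields: only present in some runs, so guard before using them.
if output.get("translation"):
    print("Translation:", output["translation"])

if output.get("srt_file"):
    # srt_file is a URI pointing at the subtitle file; download it locally.
    urllib.request.urlretrieve(output["srt_file"], "transcript.srt")
```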