Input schema

These are the fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
|---|---|---|---|
| image | string | | Input image (e.g. PNG/JPG). |
| audio | string | | Input audio (e.g. WAV/MP3). |
| seed | integer | 0 | Set a random seed (0 for random) |
| resolution | integer | 512 | Resolution for generation (square). Default: 512 |
| fps | integer | 30 | Frames per second of output video. Default: 30 |
| num_generated_frames_per_clip | integer | 16 | Frames per video clip chunk. Default: 16 |
| inference_steps | integer | 20 | Diffusion inference steps. Default: 20 |
| cfg_scale | number | 3.5 | Classifier-free guidance scale. Default: 3.5 |
| max_audio_seconds | integer | 8 | Max audio duration (in seconds). Default: 8 |
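
As a rough illustration of how the defaults in the table behave, the sketch below merges caller-supplied overrides over the documented default values and checks each value against its listed type. `build_input` and `DEFAULTS` are hypothetical names introduced here, not part of any client library. The sketch also assumes that `image` and `audio` must always be supplied (the table lists no default for them) and that they are passed as URL strings; the `example.com` URLs are placeholders.

```python
from typing import Any

# Default values copied from the input schema table.
DEFAULTS: dict[str, Any] = {
    "seed": 0,
    "resolution": 512,
    "fps": 30,
    "num_generated_frames_per_clip": 16,
    "inference_steps": 20,
    "cfg_scale": 3.5,
    "max_audio_seconds": 8,
}


def build_input(image: str, audio: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: apply overrides on top of the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")

    payload = {"image": image, "audio": audio, **DEFAULTS, **overrides}

    for name, value in payload.items():
        if name in ("image", "audio"):
            ok = isinstance(value, str)
        elif name == "cfg_scale":
            # "number" in the schema: accept int or float, but not bool.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            # "integer" in the schema: bool is a subclass of int, so exclude it.
            ok = isinstance(value, int) and not isinstance(value, bool)
        if not ok:
            raise TypeError(f"Field {name!r} has an unexpected type: {type(value).__name__}")
    return payload


# Only fps is overridden; every other field keeps its default.
payload = build_input(
    "https://example.com/face.png",    # placeholder URL (assumption)
    "https://example.com/speech.wav",  # placeholder URL (assumption)
    fps=25,
)
print(payload)
```

The resulting dictionary is the kind of input mapping an API client would send. The exact client call depends on the library you use and is not shown here.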
      
            
              
                
              
            
Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'format': 'uri', 'title': 'Output', 'type': 'string'}
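
Since the schema describes the output as a single string in `uri` format, a caller may want a quick sanity check before trying to download the result. The sketch below is one loose way to do that with the standard library. `is_uri_string` is a hypothetical helper, and the check (a non-empty scheme plus a host or path) is an approximation of "looks like a URI", not a full validation of the `uri` format.

```python
from urllib.parse import urlparse


def is_uri_string(value: object) -> bool:
    """Hypothetical check: True if value is a string with a scheme and a host or path."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


# Placeholder output value (assumption); a real run would return its own URI.
output = "https://example.com/output.mp4"
print(is_uri_string(output))
```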