sakemin / pytsmod

PyTSMod is an open-source library for Time-Scale Modification (e.g., time stretching) algorithms, by Sangeon Yong at MAC Lab, KAIST.

Run time and cost

This model costs approximately $0.0099 to run on Replicate, or 101 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A40 GPU hardware. Predictions typically complete within 18 seconds. The predict time for this model varies significantly based on the inputs.

Readme

PyTSMod

PyTSMod is an open-source library for Time-Scale Modification algorithms in Python 3. PyTSMod contains basic TSM algorithms such as Overlap-Add (OLA), Waveform-Similarity Overlap-Add (WSOLA), Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA), and Phase Vocoder (PV-TSM). We are also planning to add more TSM algorithms and pitch shifting algorithms.
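
To give a feel for the simplest of these algorithms, the following is a minimal pure-Python sketch of the plain OLA idea: windowed frames are read from the input at one hop size and overlap-added into the output at another. It is not PyTSMod's implementation. The function name ola_stretch, the Hann window, and the default window and hop sizes are assumptions chosen only for illustration.

```python
import math


def ola_stretch(x, s, win_size=64, syn_hop=32):
    """Naive overlap-add time stretch of a mono signal x (list of floats) by factor s."""
    if s <= 0:
        raise ValueError("stretch factor must be positive")
    # Input frames are spaced syn_hop / s apart; output frames are spaced syn_hop apart.
    ana_hop = syn_hop / s
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / win_size) for i in range(win_size)]
    out_len = int(math.ceil(len(x) * s))
    y = [0.0] * (out_len + win_size)      # overlap-added output
    norm = [0.0] * (out_len + win_size)   # summed window weights, used for normalization
    k = 0
    while k * syn_hop < out_len:
        ana_pos = int(round(k * ana_hop))  # where this frame is read from
        syn_pos = k * syn_hop              # where this frame is written to
        for i in range(win_size):
            src = ana_pos + i
            sample = x[src] if src < len(x) else 0.0  # zero-pad past the end of the input
            y[syn_pos + i] += window[i] * sample
            norm[syn_pos + i] += window[i]
        k += 1
    # Divide by the accumulated window weight so overlapping frames do not change the level.
    return [y[i] / norm[i] if norm[i] > 1e-8 else 0.0 for i in range(out_len)]


stretched = ola_stretch([1.0] * 1000, 1.5)
print(len(stretched))
```

Plain OLA ignores waveform alignment between frames, which is the limitation that WSOLA and the phase vocoder are designed to address.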

Time Stretching

  • With the OLA, WSOLA, and PV-TSM methods, constant-ratio time stretching is available through s_fixed, and dynamic time stretching is available through s_ap.
  • s_ap takes anchor-point pairs in dict format.
  • Anchor-point values are relative, on a 0~1 scale: 0 is the start of the audio and 1 is the end. (e.g., 0:0, 0.5:1, 1:1.7 means the first half (0~50%) of the audio is stretched 2x, and the second half (50~100%) is stretched to 140%.)
  • With absolute_second set to True, anchor-point values are interpreted in absolute seconds.
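
The arithmetic behind the anchor-point example can be checked with a small sketch. It assumes that each dict key is a position in the source audio and each value is the matching position in the output, both relative to the original length. The helper segment_stretch_ratios is hypothetical and is not part of PyTSMod or of this model's inputs.

```python
def segment_stretch_ratios(anchors):
    """Return (src_start, src_end, ratio) for each pair of neighbouring anchor points.

    anchors maps a source position to an output position (assumed interpretation).
    """
    points = sorted(anchors.items())
    segments = []
    for (src0, out0), (src1, out1) in zip(points, points[1:]):
        ratio = (out1 - out0) / (src1 - src0)  # output span divided by source span
        if ratio <= 0:
            raise ValueError("output positions must increase with source positions")
        segments.append((src0, src1, ratio))
    return segments


example = {0: 0, 0.5: 1, 1: 1.7}
for start, end, ratio in segment_stretch_ratios(example):
    print(f"{start:.0%} to {end:.0%} of the audio: stretched x{ratio:.2f}")
```

Under this reading, the first segment maps a source span of 0.5 onto an output span of 1 (ratio 2), and the second maps 0.5 onto 0.7 (ratio of about 1.4), which matches the 2x and 140% figures in the example.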

Key/Pitch Shifting

  • The TD-PSOLA method offers both constant and dynamic key/pitch shifting.
  • Key shifting is enabled by setting td_psola_pitch_shift to key.
  • td_psola_key_updown sets a fixed (constant) key shift.
  • td_psola_dynamic_key sets dynamic key shifting. It must be formatted as a dict of [relative frame ratio(0.0~1.0):key_shift_amount]. (e.g., [0.3:0, 0.6:1, 1:-2] means the first 0 ~ 30% of the audio keeps the original key, 30 ~ 60% is shifted by +1, and 60 ~ 100% is shifted by -2.)
  • Pitch shifting is enabled by setting td_psola_pitch_shift to pitch.
  • td_psola_pitch_ratio sets a fixed (constant) pitch shift.
  • td_psola_dynamic_pitch sets dynamic pitch shifting. It must be formatted as a dict of [relative frame ratio(0.0~1.0):pitch_shift_amount]. (e.g., [0.5:1, 0.8:2, 1:1.3] means the first 0 ~ 50% of the audio keeps the original pitch, 50 ~ 80% is shifted up by one octave (ratio 2), and 80 ~ 100% is shifted to 130% of the original pitch.)
  • With absolute_second set to True, anchor-point values are interpreted in absolute seconds.
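
The dynamic dicts read as step functions: each key marks the end of a segment, and its value applies up to that point. The sketch below illustrates that reading. The helpers value_at and semitones_to_ratio are hypothetical, the handling of a position that falls exactly on a boundary is an assumption, and treating one key step as one equal-tempered semitone is also an assumption rather than something stated here.

```python
def value_at(position, schedule):
    """Look up the shift that applies at a relative position in [0, 1].

    Each key in schedule is taken as the END of its segment (assumed interpretation).
    """
    for end in sorted(schedule):
        if position <= end:
            return schedule[end]
    raise ValueError("position lies beyond the last anchor")


def semitones_to_ratio(semitones):
    """Convert a shift in semitones to a frequency ratio (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


dynamic_key = {0.3: 0, 0.6: 1, 1: -2}      # key shift per segment
dynamic_pitch = {0.5: 1, 0.8: 2, 1: 1.3}   # pitch ratio per segment

for pos in (0.1, 0.45, 0.9):
    key = value_at(pos, dynamic_key)
    print(f"at {pos:.0%}: key shift {key:+d}, frequency ratio {semitones_to_ratio(key):.4f}")

print(value_at(0.7, dynamic_pitch))  # pitch ratio in the 50 ~ 80% segment
```

A pitch ratio of 2 corresponds to 12 semitones, which is why the 50 ~ 80% segment in the pitch example is described as one octave up.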

Take a look at the Examples for use cases.

Full documentation is available at https://pytsmod.readthedocs.io