1. 01 Sep, 2023 7 commits
    • Sanchit Gandhi's avatar
      [VITS] Add to TTA pipeline (#25906) · b439129e
      Sanchit Gandhi authored
      
      * [VITS] Add to TTA pipeline
      
      * Update tests/pipelines/test_pipelines_text_to_audio.py
      
      Co-authored-by: default avatarYoach Lacombe <52246514+ylacombe@users.noreply.github.com>
      
      * remove extra spaces
      
      ---------
      
      Co-authored-by: default avatarYoach Lacombe <52246514+ylacombe@users.noreply.github.com>
      b439129e
    • Zach Mueller's avatar
      Revert frozen training arguments (#25903) · be0e189b
      Zach Mueller authored
      * Revert frozen training arguments
      
      * TODO
      be0e189b
    • Omar Sanseviero's avatar
      Remove broken docs for MusicGen (#25905) · 69c5b8f1
      Omar Sanseviero authored
      Update musicgen.md
      69c5b8f1
    • Yih-Dar's avatar
      Better error message for pipeline loading (#25912) · 16d6e308
      Yih-Dar authored
      
      * update
      
      * update
      
      * update
      
      * update
      
      ---------
      
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      16d6e308
    • Joao Gante's avatar
      Falcon: Add RoPE scaling (#25878) · 53e2fd78
      Joao Gante authored
      53e2fd78
    • pkumc's avatar
      fix FSDP model resume optimizer & scheduler (#25852) · 024acd27
      pkumc authored
      
      * fix FSDP resume optimizer & scheduler
      
      * improve trainer code quality
      
      ---------
      
      Co-authored-by: default avatarmachi04 <machi04@meituan.com>
      024acd27
    • Matthijs Hollemans's avatar
      add VITS model (#24085) · 4ece3b94
      Matthijs Hollemans authored
      
      * add VITS model
      
      * let's vits
      
      * finish TextEncoder (mostly)
      
      * rename VITS to Vits
      
      * add StochasticDurationPredictor
      
      * ads flow model
      
      * add generator
      
      * correctly set vocab size
      
      * add tokenizer
      
      * remove processor & feature extractor
      
      * add PosteriorEncoder
      
      * add missing weights to SDP
      
      * also convert LJSpeech and VCTK checkpoints
      
      * add training stuff in forward
      
      * add placeholder tests for tokenizer
      
      * add placeholder tests for model
      
      * starting cleanup
      
      * let the great renaming begin!
      
      * use config
      
      * global_conditioning
      
      * more cleaning
      
      * renaming variables
      
      * more renaming
      
      * more renaming
      
      * it never ends
      
      * reticulating the splines
      
      * more renaming
      
      * HiFi-GAN
      
      * doc strings for main model
      
      * fixup
      
      * fix-copies
      
      * don't make it a PreTrainedModel
      
      * fixup
      
      * rename config options
      
      * remove training logic from forward pass
      
      * simplify relative position
      
      * use actual checkpoint
      
      * style
      
      * PR review fixes
      
      * more review changes
      
      * fixup
      
      * more unit tests
      
      * fixup
      
      * fix doc test
      
      * add integration test
      
      * improve tokenizer tests
      
      * add tokenizer integration test
      
      * fix tests on GPU (gave OOM)
      
      * conversion script can handle repos from hub
      
      * add conversion script for all MMS-TTS checkpoints
      
      * automatically create a README for the converted checkpoint
      
      * small changes to config
      
      * push README to hub
      
      * only show uroman note for checkpoints that need it
      
      * remove conversion script because code formatting breaks the readme
      
      * make WaveNet layers configurable
      
      * rename variables
      
      * simplifying the math
      
      * output attentions and hidden states
      
      * remove VitsFlip in flow model
      
      * also got rid of the other flip
      
      * fix tests
      
      * rename more variables
      
      * rename tokenizer, add phonemization
      
      * raise error when phonemizer missing
      
      * re-order config docstrings to match method
      
      * change config naming
      
      * remove redundant str -> list
      
      * fix copyright: vits authors -> kakao enterprise
      
      * (mean, log_variances) -> (prior_mean, prior_log_variances)
      
      * if return dict -> if not return dict
      
      * speed -> speaking rate
      
      * Apply suggestions from code review
      
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * update fused tanh sigmoid
      
      * reduce dims in tester
      
      * audio -> output_values
      
      * audio -> output_values in tuple out
      
      * fix return type
      
      * fix return type
      
      * make _unconstrained_rational_quadratic_spline a function
      
      * all nn's to accept a config
      
      * add spectro to output
      
      * move {speaking rate, noise scale, noise scale duration} to config
      
      * path -> attn_path
      
      * idxs -> valid idxs -> padded idxs
      
      * output values -> waveform
      
      * use config for attention
      
      * make generation work
      
      * harden integration test
      
      * add spectrogram to dict output
      
      * tokenizer refactor
      
      * make style
      
      * remove 'fake' padding token
      
      * harden tokenizer tests
      
      * ron norm test
      
      * fprop / save tests deterministic
      
      * move uroman to tokenizer as much as possible
      
      * better logger message
      
      * fix vivit imports
      
      * add uroman integration test
      
      * make style
      
      * up
      
      * matthijs -> sanchit-gandhi
      
      * fix tokenizer test
      
      * make fix-copies
      
      * fix dict comprehension
      
      * fix config tests
      
      * fix model tests
      
      * make outputs consistent with reverse/not reverse
      
      * fix key concat
      
      * more model details
      
      * add author
      
      * return dict
      
      * speaker error
      
      * labels error
      
      * Apply suggestions from code review
      
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/vits/convert_original_checkpoint.py
      
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove uromanize
      
      * add docstrings
      
      * add docstrings for tokenizer
      
      * upper-case skip messages
      
      * fix return dict
      
      * style
      
      * finish tests
      
      * update checkpoints
      
      * make style
      
      * remove doctest file
      
      * revert
      
      * fix docstring
      
      * fix tokenizer
      
      * remove uroman integration test
      
      * add sampling rate
      
      * fix docs / docstrings
      
      * style
      
      * add sr to model output
      
      * fix outputs
      
      * style / copies
      
      * fix docstring
      
      * fix copies
      
      * remove sr from model outputs
      
      * Update utils/documentation_tests.txt
      
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add sr as allowed attr
      
      ---------
      
      Co-authored-by: default avatarsanchit-gandhi <sanchit@huggingface.co>
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      4ece3b94
  2. 31 Aug, 2023 11 commits
  3. 30 Aug, 2023 11 commits
  4. 29 Aug, 2023 11 commits