    [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
    Thomas Wolf authored
    * splitting fast and slow tokenizers [WIP]
    
    * [WIP] splitting sentencepiece and tokenizers dependencies
    
    * update dummy objects
    
    * add name_or_path to models and tokenizers
    
    * prefix added to file names
    
    * prefix
    
    * styling + quality
    
    * splitting all the tokenizer files - sorting sentencepiece based ones
    
    * update tokenizer version up to 0.9.0
    
    * remove hard dependency on sentencepiece 🎉
    
    * and removed hard dependency on tokenizers 🎉
    
    
    
    * update conversion script
    
    * update missing models
    
    * fixing tests
    
    * move test_tokenization_fast to main tokenization tests - fix bugs
    
    * bump up tokenizers
    
    * fix bert_generation
    
    * update and fix several tokenizers
    
    * keep sentencepiece in deps for now
    
    * fix funnel and deberta tests
    
    * fix fsmt
    
    * fix marian tests
    
    * fix layoutlm
    
    * fix squeezebert and gpt2
    
    * fix T5 tokenization
    
    * fix xlnet tests
    
    * style
    
    * fix mbart
    
    * bump up tokenizers to 0.9.2
    
    * fix model tests
    
    * fix tf models
    
    * fix seq2seq examples
    
    * fix tests without sentencepiece
    
    * fix slow => fast conversion without sentencepiece
    
    * update auto and bert generation tests
    
    * fix mbart tests
    
    * fix auto and common test without tokenizers
    
    * fix tests without tokenizers
    
    * clean up tests - lighten them when tokenizers + sentencepiece are both off
    
    * style quality and tests fixing
    
    * add sentencepiece to doc/examples reqs
    
    * leave sentencepiece on for now
    
    * style quality - split herbert and fix pegasus
    
    * WIP Herbert fast
    
    * add sample_text_no_unicode and fix herbert tokenization
    
    * skip FSMT example test for now
    
    * fix style
    
    * fix fsmt in example tests
    
    * update following Lysandre and Sylvain's comments
    
    * Update src/transformers/testing_utils.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/testing_utils.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/tokenization_utils_base.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/tokenization_utils_base.py
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>