Commits · pipelines_signatures · 某某某 / transformers-new

17 Apr, 2023 15 commits
- Make the signature of pipelines more explicit · 18622b29
  Sylvain Gugger authored 2 years ago
  
- Mark auto models as important (#22815) · dacd3456
  Sylvain Gugger authored 2 years ago
```
* Mark auto models as important

* Annoying file with bad line endings
```
- Introduce `PartialState` as the device handler in the `Trainer` (#22752) · 03462875
  Zachary Mueller authored 2 years ago
```
* Use accelerate for device management

* Add accelerate to setup


Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
```
- Revert "Use code on the Hub from another repo" (#22813) · 50caa206
  Sylvain Gugger authored 2 years ago
```
Revert "Use code on the Hub from another repo (#22698)"

This reverts commit ea7b0a53.
```
- Simplify update metadata job (#22811) · e13d6ef7
  Sylvain Gugger authored 2 years ago
```
* Simplify update metadata job

* Match more branch names

* Install all what is necessary

* Install all what is necessary

* Forgot the dev

* Install less stuff

* This syntax?
```
- Remove accelerate from tf test reqs (#22777) · cd3e0211
  Zachary Mueller authored 2 years ago
```
Remove accelerate from tf
```
- Fix squeeze into torch 1.x compatible form in llama model (#22808) · f8c43c94
  Kunhao ZHENG authored 2 years ago
```
fix-squeeze-tuple
```
- Don't use `LayoutLMv2` and `LayoutLMv3` in some pipeline tests (#22774) · 5269718c
  Yih-Dar authored 2 years ago
```
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Use code on the Hub from another repo (#22698) · ea7b0a53
  Sylvain Gugger authored 2 years ago
```
* initial work

* Add other classes

* Refactor code

* Move warning and fix dynamic pipeline

* Issue warning when necessary

* Add test
```
- [i18n-KO] Translated `tasks/translation.mdx` to Korean (#22805) · 4d2c52e8
  Wonhyeong Seo authored 2 years ago
```
docs: ko: tasks/translation.mdx
```
- Fix sneaky torch dependency in TF example (#22804) · 2237127a
  Matt authored 2 years ago
  
- improve(llama): Faster apply_rotary_pos_emb (#22785) · 626c1b8a
  fpgaminer authored 2 years ago
  
- [i18n-KO] fix: docs: ko: sagemaker anchors and `_toctree.yml` (#22549) · abbc96a2
  Jungnerd authored 2 years ago
```
fix: docs: ko: sagemaker anchors and  `_toctree.yml`

Co-authored-by: Hyeonseo Yun <0525_hhgus@naver.com>
Co-authored-by: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Na Yeon Han <nayeon2.han@gmail.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
```
- [i18n-KO] Translated `custom_models.mdx` to Korean (#22534) · 18c89481
  Na Yeon Han authored 2 years ago
```
docs: ko: translated `custom_models.mdx`

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
```
- Fix `test_word_time_stamp_integration` for `Wav2Vec2ProcessorWithLMTest` (#22800) · 76d24f1a
  Yih-Dar authored 2 years ago
```
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
15 Apr, 2023 1 commit
- Generate: add CJK support to TextStreamer (#22664) · 28f26c10
  bcol authored 2 years ago
  
14 Apr, 2023 12 commits

- Move labels to the same device as logits for Whisper (#22779) · fb3aa06c
  oscar-garzon authored 2 years ago
- Indexing fix - CLIP checkpoint conversion (#22776) · 20e54e49
  amyeroberts authored 2 years ago
```
* Indexing fix - CLIP checkpoint conversion

* Fix up
```
- Seq2SeqTrainer: Evict decoder_input_ids only when it is created from labels (#22772) · 895ae3b5
  Joao Gante authored 2 years ago
- Fix word_ids hyperlink (#22765) · daf53241
  Mayank Agarwal authored 2 years ago
```
* Fix word_ids hyperlink

* Add suggested fix
```
- Tweak ESM tokenizer for Nucleotide Transformer (#22770) · 06e737fb
  Matt authored 2 years ago
```
* If EOS is None, don't add it to sequences

* If EOS is None, don't add it to sequences
```
- [i18n-KO] Translated `tutorial/proprecssing.mdx` to Korean (#22578) · c8df3900
  Sohyun Sim authored 2 years ago


* add ko preprocessing

* translate preprocessing.mdx to korean

* translate preprocessing.mdx

* Update preprocessing.mdx

Fixed the line 273 as below:
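또한, 특징 추출기에 `sampling_rate` 인자를 추가하여 발생할 수 있는 조용한 오류(silent errors)를 더 잘 디버깅하는 것을 권장합니다.
(English: "We also recommend adding the `sampling_rate` argument to the feature extractor to better debug silent errors that may occur.")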

* translate Image part

* translated preprocess.mdx

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/preprocessing.mdx

* Update docs/source/ko/preprocessing.mdx

* Update docs/source/ko/preprocessing.mdx

* Update docs/source/ko/preprocessing.mdx

* Update docs/source/ko/preprocessing.mdx

* Update docs/source/ko/preprocessing.mdx

* fixed translation

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>


- Fix failing torchscript tests for `CpmAnt` model (#22766) · 53c710d1
  Yih-Dar authored 2 years ago
```
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Fix a mistake in Llama weight converter log output. (#22764) · d2ffc3fc
  Alexander Ljungberg authored 2 years ago

Fixed string format; better tokenizer message.

Before: `Saving a {tokenizer_class} to {tokenizer_path}`
After: `Saving a LlamaTokenizerFast to outdir.`
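
The Before/After pair in this commit message looks like a log string that was never interpolated. The following is a minimal Python sketch of that kind of bug, assuming the cause was a missing `f` prefix (the commit message does not state this). The variable values are hypothetical and were chosen only to reproduce the two messages; this is not the converter's actual code.

```python
# Hypothetical values, picked to match the "After" message quoted in the commit.
tokenizer_class = "LlamaTokenizerFast"
tokenizer_path = "outdir"

# Plain string literal: the braces are kept verbatim, so the placeholders
# show up in the log exactly as written (the "Before" output).
before = "Saving a {tokenizer_class} to {tokenizer_path}"

# f-string: the names inside the braces are evaluated and substituted
# (the "After" output).
after = f"Saving a {tokenizer_class} to {tokenizer_path}."

print(before)
print(after)
```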


- Generate: pin number of beams in BART test (#22763) · 9af845af
  Joao Gante authored 2 years ago
- Pix2struct: doctest fix (#22761) · 66b15efb
  Joao Gante authored 2 years ago
- [Examples] TPU-based training of a language model using TensorFlow (#21657) · 390e121f
  Sayak Paul authored 2 years ago


* add: tokenizer training script for TF TPU LM training.

* add: script for preparing the TFRecord shards.

* add: sequence of execution to readme.

* remove limit from the tfrecord shard name.

* Add initial train_model.py

* Add basic training arguments and model init

* Get up to the point of writing the data collator

* Pushing progress so far!

* Complete first draft of model training code

* feat: grouping of texts efficiently.

Co-authored-by: Matt <rocketknight1@gmail.com>

* Add proper masking collator and get training loop working

* fix: things.

* Read sample counts from filenames

* Read sample counts from filenames

* Draft README

* Improve TPU warning

* Use distribute instead of distribute.experimental

* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Modularize loading and add MLM probability as arg

* minor refactoring to better use the cli args.

* readme fillup.

* include tpu and inference sections in the readme.

* table of contents.

* parallelize maps.

* polish readme.

* change script name to run_mlm.py

* address PR feedback (round I).

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>


- [i18n-KO] Translated `sequence_classification.mdx` to Korean (#22655) · bfb3925f
  Hyeonseo Yun authored 2 years ago


* docs: ko: init: tasks/sequence_classification.mdx

* docs: ko: revised: change voca in tasks/sequence_classification.mdx

* docs: ko: revised: [RE] change voca in tasks/sequence_classification.mdx

* docs: ko: revised: spell check and sentence naturally in tasks/sequence_classification.mdx

* docs: ko: revised: spell check and consistent vocabulary in tasks/sequence_classification.mdx

* docs: ko: revised: Add full stop and change voca in tasks/sequence_classification.mdx

* docs: ko: revised: sync first section templates in tasks/sequence_classification.mdx

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* fix: revert use of full-stops to colons

* colons are used to emphasize the code block that follows

* @0525hhgus @wonhyeongseo docs: ko: revised: sync second section templates in tasks/sequence_classification.mdx

Co-Authored-By: Wonhyeong Seo <wonhseo@kakao.com>

* docs: ko: revised: change 'train', 'finetuning' in tasks/sequence_classification.mdx

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>


13 Apr, 2023 12 commits
- Fix `serving_output` for TF composite models (encoder-decoder like models) (#22743) · a6752a7d
  Yih-Dar authored 2 years ago
```
* fix

* style

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Revert (for now) the change on `Deta` in #22437 (#22750) · 410b61ad
  Yih-Dar authored 2 years ago
```
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- Generate: handle text conditioning with multimodal encoder-decoder models (#22748) · 9dfd6a4b
  Joao Gante authored 2 years ago
  
- fix(llama): fix LlamaTokenzier (#22746) · 90ce374d
  Ruiyang Sun authored 2 years ago
```
Bug in LlamaTokenizer when  #22742
```
- [trainer] update url (#22747) · d85bf954
  Stas Bekman authored 2 years ago
```
* [trainer] update url

* style
```
- Remove `DS_BUILD_AIO=1` (#22741) · 656d41ab
  Yih-Dar authored 2 years ago
```
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- `DocumentQuestionAnsweringPipeline` only for fast tokenizers (#22745) · 32b08742
  Yih-Dar authored 2 years ago
```
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- [i18n-KO] Translated `training.mdx` to Korean (#22670) · 4def2fe9
  Gabriel Yang authored 2 years ago
```
translate training doc to Korean
```
- Change `torch_dtype` to `str` when `saved_model=True` in `save_pretrained` for TF models (#22740) · 7df13432
  Yih-Dar authored 2 years ago
```
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
- [Pix2struct] Simplify generation (#22527) · 8eb38f63
  NielsRogge authored 2 years ago
```
* Add model to doc tests

* Remove generate and replace by prepare_inputs_for_generation

* More fixes

* Remove print statements

* Update integration tests

* Fix generate

* Remove model from auto mapping

* Use auto processor

* Fix integration tests

* Fix test

* Add inference code snippet

* Remove is_encoder_decoder

* Update docs

* Remove notebook link
```
- Make vilt, switch_transformers compatible with model parallelism (#22703) · 95e70575
  Rinat authored 2 years ago
```
* Update modeling_vilt.py

Vilt compatible with model parallelism

* Update modeling_switch_transformers.py

switch_transformers compatible with model parallelism
```
- Indexing fix for gpt_bigcode (#22737) · 89087597
  Joel Lamy-Poirier authored 2 years ago
```
Fix indexing
```