- 23 Aug, 2024 3 commits
-
-
Jason (Siyu) Zhu authored
* add liger integration
* fix syntax
* fix import issue
* add trainer.md
* Use _apply_liger_kernel()
* Fixed log message
* Update docs/source/en/trainer.md
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/trainer.md
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update src/transformers/trainer.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update docs/source/en/trainer.md
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Fixed checkstyle and updated readme
* Added test
* Fixed checkstyle
* fix docstring
* rename use_liger to use_liger_kernel
* Trigger Build
* Added test
* add fix-copies
* Fixed copy inconsistencies
---------
Co-authored-by: shimizust <sshimizu@linkedin.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
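A minimal sketch of how the renamed flag is meant to be used, assuming a `transformers` release containing this change and the `liger-kernel` package installed; the output directory is illustrative:

```python
from transformers import TrainingArguments

# `use_liger_kernel=True` asks Trainer to patch supported model
# architectures with Liger's fused Triton kernels before training.
# Requires the `liger-kernel` package to be installed.
args = TrainingArguments(
    output_dir="out",  # illustrative
    use_liger_kernel=True,
)
```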
-
Joao Gante authored
Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
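With the model config no longer carrying `generate` parameters, the supported home for them is `GenerationConfig`; a small sketch (paths illustrative):

```python
from transformers import GenerationConfig

# Sampling/length controls live on GenerationConfig, not on the model config.
gen_config = GenerationConfig(max_new_tokens=64, do_sample=True, temperature=0.7)
gen_config.save_pretrained("my-model")  # writes my-model/generation_config.json

# At generation time:
# outputs = model.generate(**inputs, generation_config=gen_config)
```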
-
Cyril Vallez authored
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
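A sketch of what the new argument buys, assuming a Llama-family checkpoint (name illustrative): instead of materializing the full `[batch, seq_len, vocab]` logits tensor, only the last positions are kept.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative
input_ids = torch.randint(0, model.config.vocab_size, (1, 128))

# Only compute logits for the final position; during generation this is
# all the next-token step actually needs.
out = model(input_ids, num_logits_to_keep=1)
print(out.logits.shape)  # torch.Size([1, 1, vocab_size])
```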
-
- 22 Aug, 2024 17 commits
-
-
Stefano Fiorucci authored
fix outdated link
-
Joao Gante authored
-
Jinuk authored
* docs: ko: tasks/knowledge_distillation_for_image_classification.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review (twelve rounds)
  Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
  Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
---------
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
-
Franz Louis Cesista authored
fix save_pretrained
-
Andrés Marafioti authored
-
Joao Gante authored
-
Shaopeng Fu authored
fix: (issue #32689) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook (#32849)
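For reference, the option the fix concerns, in a minimal sketch (other arguments illustrative):

```python
from transformers import TrainingArguments

# `eval_on_start=True` runs one evaluation pass before the first training
# step; this fix makes it work inside Jupyter notebooks as well.
args = TrainingArguments(
    output_dir="out",  # illustrative
    eval_strategy="steps",
    eval_on_start=True,
)
```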
-
Isotr0py authored
* add chat_template to gguf tokenizer
* add template through tokenizer config
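A sketch of the behavior this enables, assuming a GGUF repo whose metadata carries a chat template (repo and file names illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",          # illustrative
    gguf_file="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # illustrative
)

# With this change, the chat template stored in the GGUF metadata (or the
# tokenizer config) is attached to the loaded tokenizer.
messages = [{"role": "user", "content": "Hello!"}]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```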
-
regisss authored
Do not call torch.repeat_interleave if expand_size is 1
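A sketch of the guard, in the spirit of `generate`'s input-expansion helper: `repeat_interleave` always copies, so it is skipped when there is nothing to expand.

```python
import torch

def expand_inputs(tensor: torch.Tensor, expand_size: int) -> torch.Tensor:
    # expand_size == 1 (e.g. num_beams=1, num_return_sequences=1) would be
    # an identity op, so avoid launching the copying kernel at all.
    if expand_size == 1:
        return tensor
    return tensor.repeat_interleave(expand_size, dim=0)
```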
-
Yih-Dar authored
* fix
* >= 0.3.0
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Shubham Ugare authored
-
Younes Belkada authored
* Update hub.py
* Update errors
* Apply suggestions from code review
  Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
-
Joao Gante authored
* separate step to download nltk files
* duplicated
* rm comma
-
Marc Sun authored
* add 4bit optimizer
* style
* fix msg
* style
* add qgalore
* Revert "add qgalore"
  This reverts commit 25278e805f24d5d48eaa0638abb48de1b783a3fb.
* style
* version check
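Assuming the new optimizer is exposed through `TrainingArguments.optim` under the torchao-backed name below (the string is an assumption, as is the torchao requirement implied by the commit's version check):

```python
from transformers import TrainingArguments

# Assumption: this string selects torchao's 4-bit AdamW and requires a
# compatible `torchao` version (hence the version check in this commit).
args = TrainingArguments(output_dir="out", optim="adamw_torch_4bit")
```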
-
Gal Cohen (galco) authored
Co-authored-by: Gal Cohen <galc@ai21.com>
-
Sai-Suraj-27 authored
Added missing huggingface_hub installation to workflows.
-
Joao Gante authored
* try test updates
* a few more changes
* a few more changes
* a few more changes
* [run slow] jamba
* skip logits checks on older gpus
* [run slow] jamba
* oops
* [run slow] jamba
* Update tests/models/jamba/test_modeling_jamba.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/jamba/test_modeling_jamba.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 21 Aug, 2024 2 commits
-
-
Arthur authored
commit
-
Ruilin Huang authored
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296)
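A sketch of the fixed behavior with a small Whisper checkpoint (checkpoint and dummy features illustrative): a `return_timestamps` stored on the generation config now survives a `generate` call that omits the kwarg.

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.generation_config.return_timestamps = True

input_features = torch.zeros(1, 80, 3000)  # dummy log-mel features

ids = model.generate(input_features)                           # keeps True from config
ids = model.generate(input_features, return_timestamps=False)  # explicit override
```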
-
- 20 Aug, 2024 9 commits
-
-
Ahmed Almaghz authored
* Update README.md
* Update README.md
* Add README_ar.md to i18n/README_de.md
* Add README_ar.md to i18n/README_es.md
* Add README_ar.md to i18n/README_fr.md
* Add README_ar.md to i18n/README_hd.md
* Add README_ar.md to i18n/README_ja.md
* Add README_ar.md to i18n/README_ko.md
* Add README_ar.md to i18n/README_pt-br.md
* Add README_ar.md to i18n/README_ru.md
* Add README_ar.md to i18n/README_te.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_zh-hans.md
* Add README_ar.md to i18n/README_zh-hant.md
* Create README_ar.md
-
Nicholas Broad authored
* link for optimizer names
  Add a note and link to where the user can find more optimizer names easily, because there are many more optimizers than are mentioned in the docstring.
* make fixup
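The full set of accepted `optim` strings can be listed from the enum the docstring now points toward; a quick sketch:

```python
from transformers.training_args import OptimizerNames

# Every value here is a valid `TrainingArguments(optim=...)` string.
print(sorted(o.value for o in OptimizerNames))
```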
-
Pavel Iakubovskii authored
* Replace .norm() with decomposed version for executorch export
* [run_slow] clip
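The decomposition is presumably along these lines: the fused `.norm()` op isn't exportable to ExecuTorch, but its square-sum-sqrt expansion is numerically equivalent.

```python
import torch

x = torch.randn(4, 512)

fused = x.norm(p=2, dim=-1, keepdim=True)
decomposed = torch.sqrt((x * x).sum(dim=-1, keepdim=True))

assert torch.allclose(fused, decomposed, atol=1e-6)
```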
-
dependabot[bot] authored
Bump nltk in /examples/research_projects/decision_transformer
Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.7...3.9)
---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Anton Vlasjuk authored
* mamba2 uses norm_before_gate=False
* small nit
* remove norm_before_gate flag and follow False path only
-
Gal Cohen (galco) authored
Co-authored-by: Gal Cohen <galc@ai21.com>
-
Arthur authored
add nx
-
Marc Sun authored
* Update min version of accelerate to 0.26.0
* dev-ci
* update min version in import
* remove useless check
* dev-ci
* style
* dev-ci
* dev-ci
-
Arthur authored
* support head dim
* fix the doc
* fixup
* add oproj
  Co-authored-by: Suhara <suhara@users.noreply.github.com>
* update
  Co-authored-by: bzantium <bzantium@users.noreply.github.com>
* Co-authored-by: suhara <suhara@users.noreply.github.com>
* Update
  Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
---------
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
-
- 19 Aug, 2024 9 commits
-
-
Matt authored
-
Sai-Suraj-27 authored
Fixed whisper-large-v2 model link in docs.
-
Anton Vlasjuk authored
* fix cache when using input embeddings
* simplify check; we can always add the input ids seq len since it's 0 in the first pass
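A sketch of the path the fix concerns, using a small checkpoint for illustration: generation seeded with `inputs_embeds` has no `input_ids`, so the cache-length bookkeeping must treat their length as 0 on the first pass.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello there", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs.input_ids)

# No input_ids passed: the cache starts empty and is filled from the embeddings.
out = model.generate(inputs_embeds=embeds, max_new_tokens=8)
print(tok.decode(out[0]))
```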
-
Younes Belkada authored
* fix mamba left padding
* Apply suggestions from code review
  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix copies
* test with `inputs_embeds`
* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copies
* clarify
* fix last comments
* remove
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
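For context, the batching pattern the fix targets (checkpoint name illustrative): decoder-only models are padded on the left so every prompt ends at the same position.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b", padding_side="left")  # illustrative
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

batch = tok(["Hi", "A much longer prompt"], padding=True, return_tensors="pt")

# The attention_mask marks the left-pad positions that the fixed
# FalconMamba forward now handles correctly:
# out = model.generate(**batch, max_new_tokens=10)
```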
-
Isotr0py authored
* fix gguf config vocab size
* minor fix
* link issue
-
Alan-Blanchet authored
* fix: Parameterized norm freezing
  For the R18 model, the authors don't freeze norms in the backbone.
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
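If the knob added here is surfaced on the config (the parameter name below is an assumption, not confirmed by the commit text), R18-style training would keep backbone norms trainable like so:

```python
from transformers import RTDetrConfig

# Assumption: the norm-freezing behavior is parameterized on the config.
config = RTDetrConfig(freeze_backbone_batch_norms=False)
```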
-
Yitong Huang authored
* Support save/load ckpt for XLA FSDP
* Fix bug for save
* Fix style
* reserve sharded ckpt and better file naming
* minor fix
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* add is_fsdp_xla_v1_enabled
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
-
Aaron Chung authored
* Add representation for Conv1D, for better output info.
* code format for Conv1D
* Add a __repr__ for Conv1D so that printing a model gives a more descriptive line for Conv1D layers.
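A minimal sketch of what such a `__repr__` looks like on transformers' Conv1D (the class stores `nf`/`nx` for output/input features; this is a simplified stand-in, not the library source):

```python
import torch
from torch import nn

class Conv1D(nn.Module):
    """Simplified stand-in for transformers.pytorch_utils.Conv1D:
    a linear layer with transposed weights, as used by GPT-2."""

    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf, self.nx = nf, nx
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))

    def __repr__(self) -> str:
        # The added representation: print(model) now shows the layer sizes.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"

print(Conv1D(2304, 768))  # Conv1D(nf=2304, nx=768)
```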
-
Fanli Lin authored
* enable
* fix
-