- 21 Dec, 2023 6 commits
-
-
Arthur authored
* some nits
* update test
* add support d\sd[a
* remove some dummy inputs
* all good
* style
* nits
* fixes
* fix more copies
* nits
* styling
* fix
* Update src/transformers/models/mistral/modeling_mistral.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* add a slow test just to be sure
* fixup
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
Sanchit Gandhi authored
* [Whisper] Use torch for stft if available
* update docstring
* mock patch decorator
* fit on one line
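The fallback this commit introduces can be sketched as a dispatch on whether torch is importable: prefer `torch.stft`, otherwise fall back to a slow pure-Python path. The function names and the naive DFT below are illustrative stand-ins, not the actual transformers feature-extractor code:

```python
import importlib.util
import math

def _dft_frame(frame):
    """Naive one-sided DFT magnitudes for one frame (slow fallback path)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
        out.append(math.hypot(re, im))
    return out

def stft_magnitudes(signal, frame_length, hop_length):
    """Use torch.stft when torch is installed, else the pure-Python fallback."""
    if importlib.util.find_spec("torch") is not None:
        import torch  # fast path, mirroring the commit's intent
        spec = torch.stft(
            torch.tensor(signal, dtype=torch.float32),
            n_fft=frame_length, hop_length=hop_length,
            return_complex=True, center=False,
        )
        return spec.abs().T.tolist()  # (frames, freq_bins)
    frames = [signal[i:i + frame_length]
              for i in range(0, len(signal) - frame_length + 1, hop_length)]
    return [_dft_frame(f) for f in frames]
```

Both paths agree on framing (no centering, rectangular window), so swapping in the torch path changes speed, not results.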
-
Joao Gante authored
-
Poedator authored
* updated bitsandbytes.py
* rm test_raise_* from test_4bit.py
* add test_4bit_serialization.py
* modeling_utils bulk edits
* bnb_ver 0.41.3 in integrations/bitsandbytes.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* @slow reinstated
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* bnb ver 0.41.3 in src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* rm bnb version todo in integrations/bitsandbytes.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* moved 4b serialization tests to test_4bit
* tests upd for opt
* to torch_device
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* ruff fixes to tests
* rm redundant bnb version check in mod_utils
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* restore _hf_peft_config_loaded modeling_utils.py::2188
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* restore _hf_peft_config_loaded test in modeling_utils.py::2199
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* fixed NOT getattr(self, "is_8bit_serializable")
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* setting model.is_4bit_serializable
* rm separate fp16_statistics arg from set_module...
* rm else branch in integrations::bnb::set_module
* bnb 4bit dtype check
* upd comment on 4bit weights
* upd tests for FP4 safe
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
-
Dean Wyatte authored
disable retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest
- 20 Dec, 2023 11 commits
-
-
amyeroberts authored
* Fix yolos resizing
* Update tests
* Add a test
-
Joao Gante authored
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
-
Steven Liu authored
* fsdp, debugging, gpu selection
* fix hfoption
* fix
-
amyeroberts authored
* Iterate over out_features instead of stage_names
* Update for all backbones
* Add tests
* Fix
* Align timm backbone behaviour with other backbones
* Fix tests
* Stricter checks on set out_features and out_indices
* Revert back stage selection logic
* Remove out-of-order logic
* Document restriction in docstrings
-
amyeroberts authored
* Update FA2 exception msg to point to hub discussions
* Use path for hub url
-
Yih-Dar authored
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
peter-sk authored
* move code to Trainer.evaluate to enable use of that function with multiple datasets
* test
* update doc string
* and a tip
* forgot the type
---------
Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
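The behaviour this commit enables — `evaluate` accepting either one dataset or a dict of named datasets, with per-split metric prefixes — can be sketched as follows. This is a toy stand-in where a "dataset" is just a list of losses, not the actual Trainer code:

```python
def evaluate(eval_dataset, metric_key_prefix="eval"):
    """Sketch: a dict of named datasets is evaluated split by split,
    each split's metrics prefixed with its name (e.g. "eval_val1_loss")."""
    if isinstance(eval_dataset, dict):
        metrics = {}
        for name, dataset in eval_dataset.items():
            # Recurse per split with a name-qualified prefix.
            metrics.update(
                evaluate(dataset, metric_key_prefix=f"{metric_key_prefix}_{name}")
            )
        return metrics
    # Single-dataset path: the stand-in "loss" is just the mean of the values.
    loss = sum(eval_dataset) / len(eval_dataset)
    return {f"{metric_key_prefix}_loss": loss}
```

The prefixing keeps metrics from different evaluation splits from colliding in one flat metrics dict.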
-
Jong-hun Shin authored
* add attention_bias hparam for a model trained without attention biases
* fix argument documentation error
-
Sourab Mangrulkar authored
* fix fa2
* fix FA2 for popular models
* improve warning and add Younes as co-author
Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix the warning
* Add Tip
* typo fix
* nit
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Abolfazl Shahbazi authored
Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
-
- 19 Dec, 2023 6 commits
-
-
Aaron Jimenez authored
Fix mistral link in mixtral.md
-
Mike Zellinger authored
In the docstring for PreTrainedModel.resize_token_embeddings, correct the definition of the new_num_tokens parameter to read "the new number of tokens" (the new total vocabulary size) rather than "the number of new tokens" (only the count of newly added tokens).
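The corrected semantics can be illustrated with a toy version of the method, where `new_num_tokens` is the resulting total vocabulary size. The names, the zero-initialized padding rows, and the list-of-rows representation are illustrative, not the library implementation:

```python
def resize_token_embeddings(embeddings, new_num_tokens, dim=4):
    """Toy model of the semantics: `new_num_tokens` is the NEW TOTAL vocabulary
    size, not the number of newly added tokens. `embeddings` is a list of rows."""
    old = len(embeddings)
    if new_num_tokens <= old:
        return embeddings[:new_num_tokens]        # shrink: drop trailing rows
    added = new_num_tokens - old                  # grow: append fresh rows
    return embeddings + [[0.0] * dim for _ in range(added)]
```

In transformers the typical call is `model.resize_token_embeddings(len(tokenizer))` after adding tokens to the tokenizer — i.e. you pass the new total, exactly as the corrected docstring says.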
-
Arthur authored
* default config should not use sliding window
* update the doc
* nits
* add a proper test
* update
* update
* update expected value
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* convert to float
* average then N**2
* comment
* revert nit
* good to go
* fixup
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert unrelated change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
-
Joao Gante authored
* speculative decoding
* fix test
* space
* better comments
* remove redundant test
* test nit
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR comments
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
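Speculative decoding itself can be sketched in a few lines. This greedy exact-match variant — a cheap draft model proposes tokens, the target model verifies them, and the first disagreement is replaced by the target's own token — is a simplification: the real algorithm verifies the whole draft in one batched target pass and, when sampling, uses probabilistic acceptance. `draft` and `target` here are stand-in deterministic next-token functions:

```python
def speculative_decode(draft, target, prompt, num_draft=4, max_new=8):
    """Greedy speculative decoding sketch. `draft`/`target` map a token
    sequence to the next token id."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft proposes a short continuation.
        proposal = []
        for _ in range(num_draft):
            proposal.append(draft(seq + proposal))
        # 2. Target verifies token by token (one batched pass in practice).
        for tok in proposal:
            expected = target(seq)
            if tok == expected:
                seq.append(tok)          # accepted: keep the draft token
            else:
                seq.append(expected)     # rejected: take the target's token
                break                    # and discard the rest of the draft
            if len(seq) - len(prompt) >= max_new:
                break
    return seq
```

When the draft agrees with the target, each round advances by several tokens at the cost of one (batched) target call, which is where the speedup comes from.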
-
amyeroberts authored
-
qihqi authored
* When saving a model, make a copy to be moved to CPU; don't move the original model
* make deepcopy inside of _save_tpu
* Move to tpu without copy
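The fix's pattern — copy first, then move only the copy off the accelerator — can be sketched with stand-in "tensors" (plain dicts carrying a `device` field; not actual XLA/TPU code, and the names are illustrative):

```python
import copy

def save_checkpoint(state_dict, save_fn):
    """Deep-copy the state before 'moving' it to CPU, so the live training
    state stays on its device and training can continue unaffected."""
    cpu_state = copy.deepcopy(state_dict)
    for tensor in cpu_state.values():
        tensor["device"] = "cpu"       # the copy moves; the original does not
    save_fn(cpu_state)
    return state_dict                  # original still on its device
```

Without the copy, moving tensors to CPU in-place would leave the model off-device mid-training — the bug the commit addresses.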
-
- 18 Dec, 2023 11 commits
-
-
Aaron Jimenez authored
Fix token link
-
Mike Salvatore authored
-
Steven Liu authored
* doc fix friday
* deprecated objects
* update not_doctested
* update toctree
-
Rockerz authored
Update semantic_segmentation.md
-
Matt authored
* More build_in_name_scope()
* Make sure we set the save spec now we don't do it with dummies anymore
* make fixup
-
Lucain authored
remove warning if DISABLE_TELEMETRY is used
-
Daize Dong authored
* Disable jitter noise during evaluation
* Update outdated configuration information
* Formatting
* Add new line
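Router jitter of this kind (as in Switch Transformers-style MoE routing) multiplies the router inputs by uniform noise in `[1 - eps, 1 + eps]` during training only, so evaluation stays deterministic. A minimal sketch with illustrative names:

```python
import random

def apply_jitter(values, jitter_noise, training, rng=None):
    """Multiplicative jitter on router inputs: applied only in training mode."""
    if not training or jitter_noise == 0.0:
        return list(values)            # eval path: no noise, deterministic
    rng = rng or random.Random()
    lo, hi = 1.0 - jitter_noise, 1.0 + jitter_noise
    return [v * rng.uniform(lo, hi) for v in values]
```

Gating on the training flag is exactly the change described: before the fix, noise leaked into evaluation and made eval metrics non-deterministic.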
-
lain authored
-
Wang, Yi authored
Reduce checkpoint storage size and save time during checkpoint saving when using DeepSpeed for training
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
-
Aeneas Stankowski authored
Update mixtral.md
correct minor typo in overview
-
Younes Belkada authored
add SDPA into llava
-
- 17 Dec, 2023 2 commits
-
-
cyyever authored
-
Poedator authored
* edits to _prepare_4d_causal_attention_mask()
* initial tests for 4d mask
* attention_mask_for_sdpa support
* added test for inner model hidden
* added autotest decorators
* test mask dtype to torch.int64
* torch.testing.assert_close
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* torch_device and @torch_gpu in tests
* upd tests
* +torch decorators
* torch decorators fixed
* more decorators!
* even more decorators
* fewer decorators
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
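What `_prepare_4d_causal_attention_mask` produces can be sketched in pure Python: a `(batch, key_length)` padding mask is expanded to an additive `(batch, 1, query_length, key_length)` mask, `0.0` where attention is allowed and a large negative value where it is not. This is a simplified illustration of the shape and semantics, not the library code:

```python
def prepare_4d_causal_attention_mask(attention_mask_2d, query_length,
                                     min_value=float("-inf")):
    """Expand a (batch, key_length) 0/1 padding mask into an additive
    (batch, 1, query_length, key_length) causal mask (head dim broadcasts)."""
    batch = len(attention_mask_2d)
    key_length = len(attention_mask_2d[0])
    offset = key_length - query_length     # past (cached) keys are all visible
    mask = []
    for b in range(batch):
        rows = []
        for q in range(query_length):
            row = []
            for k in range(key_length):
                causal_ok = k <= q + offset              # no attending ahead
                not_padded = attention_mask_2d[b][k] == 1
                row.append(0.0 if causal_ok and not_padded else min_value)
            rows.append(row)
        mask.append([rows])                # head-broadcast dim of size 1
    return mask
```

Because the mask is additive, it is simply summed onto the attention scores before softmax, which is also the form `scaled_dot_product_attention` accepts.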
-
- 16 Dec, 2023 1 commit
-
-
Sourab Mangrulkar authored
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT
* update tests
* fix tests
-
- 15 Dec, 2023 3 commits
-
-
Steven Liu authored
* mps docs
* toctree
-
Steven Liu authored
* first draft
* add to toctree
* edits
* feedback
-
Younes Belkada authored
* Update vipllava.md
* Update modeling_vipllava.py
-