Commits · 23f8e4db7779a81a24f35debd34c56f39e5807a4 · 某某某 / transformers-new

19 Dec, 2023 5 commits

Update modeling_utils.py (#28127) · 23f8e4db

Mike Zellinger authored 1 year ago

In the docstring for `PreTrainedModel.resize_token_embeddings`, correct the definition of the `new_num_tokens` parameter to read "the new number of tokens" (that is, the new size of the vocabulary) rather than "the number of new tokens" (only the number of newly added tokens).
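
The corrected wording matters because `new_num_tokens` sets the total vocabulary size after resizing; the usual call is `model.resize_token_embeddings(len(tokenizer))` after new tokens have been added to the tokenizer. The pure-Python sketch below illustrates those semantics with a hypothetical `resize_embedding_rows` helper that pads or truncates a list-of-rows matrix. It is not the library's implementation, and it fills new rows with a constant where the library applies its own initialization scheme.

```python
from typing import List

Row = List[float]


def resize_embedding_rows(matrix: List[Row], new_num_tokens: int, init_value: float = 0.0) -> List[Row]:
    """Hypothetical helper: return a matrix with exactly `new_num_tokens` rows.

    `new_num_tokens` is the NEW TOTAL vocabulary size, not the number of tokens
    being added. Existing rows are kept, missing rows are filled with
    `init_value`, and surplus rows are dropped when shrinking.
    """
    if new_num_tokens < 0:
        raise ValueError("new_num_tokens must be non-negative")
    hidden_size = len(matrix[0]) if matrix else 0
    kept = [list(row) for row in matrix[:new_num_tokens]]
    missing = new_num_tokens - len(kept)
    return kept + [[init_value] * hidden_size for _ in range(missing)]


# A vocabulary of 5 tokens with hidden size 2; the tokenizer then gains 3 tokens.
old_vocab_size, added_tokens = 5, 3
embeddings = [[float(i), float(i)] for i in range(old_vocab_size)]

# Correct reading: pass the new total size (5 + 3), as len(tokenizer) would report.
resized = resize_embedding_rows(embeddings, old_vocab_size + added_tokens)

# Misreading the parameter as "number of new tokens" silently shrinks the matrix.
shrunk = resize_embedding_rows(embeddings, added_tokens)
print(len(resized), len(shrunk))
```

Passing the count of added tokens instead of the total is exactly the misreading the old docstring invited: here it would truncate the five-row matrix to three rows.
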

[`Mixtral`] Fix loss + nits (#28115) · 4a04b4cc

Arthur authored 1 year ago


* default config should not use sliding window

* update the doc

* nits

* add a proper test

* update

* update

* update expected value

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* convert to float

* average then N**2

* comment

* revert nit

* good to go

* fixup

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert unrelated change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
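
The bullets "convert to float" and "average then N**2" most likely concern the router's auxiliary load-balancing loss. In the Switch Transformers formulation, with N experts, f_i the fraction of tokens dispatched to expert i, and P_i the mean router probability for expert i, the loss is N * sum_i(f_i * P_i), which is the same as averaging f_i * P_i over the experts and multiplying by N**2. The sketch below computes it in pure Python for top-1 routing; `load_balancing_loss` is a hypothetical name, and Mixtral itself routes each token to two experts, which this simplification does not model.

```python
from typing import Sequence


def load_balancing_loss(router_probs: Sequence[Sequence[float]]) -> float:
    """Hypothetical top-1 load-balancing loss (Switch Transformers style).

    router_probs[t][i] is the router probability that token t assigns to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])

    # f_i: fraction of tokens whose top-1 choice is expert i (integer counts -> float).
    counts = [0] * num_experts
    for probs in router_probs:
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
    f = [c / num_tokens for c in counts]

    # P_i: mean router probability assigned to expert i.
    p = [sum(probs[i] for probs in router_probs) / num_tokens for i in range(num_experts)]

    # "Average then N**2": mean of f_i * P_i over experts, scaled by N**2 (== N * sum).
    mean_fp = sum(fi * pi for fi, pi in zip(f, p)) / num_experts
    return mean_fp * num_experts ** 2


# Four tokens, four experts: one-hot routing spread evenly vs. all on expert 0.
balanced = [[1.0 if i == t else 0.0 for i in range(4)] for t in range(4)]
collapsed = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(load_balancing_loss(balanced), load_balancing_loss(collapsed))
```

With one-hot routing the loss is 1.0 when tokens are spread evenly and N when every token lands on the same expert, which is the imbalance the term is meant to penalize.
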

Generate: speculative decoding (#27979) · ac974199

Joao Gante authored 1 year ago


* speculative decoding

* fix test

* space

* better comments

* remove redundant test

* test nit

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* PR comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
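
Speculative decoding pairs a small draft (assistant) model, which proposes several tokens, with the larger target model, which verifies them in one forward pass. In the standard sampling variant, each draft token x is accepted with probability min(1, p(x) / q(x)), where p is the target's distribution and q the draft's; at the first rejection a replacement token is drawn from the normalized residual max(0, p - q), and if every draft token survives, the target's next distribution supplies one extra token. The pure-Python sketch below implements only this acceptance step over toy {token: probability} dictionaries; the function names and data layout are illustrative assumptions, not the code added in #27979.

```python
import random
from typing import Dict, List, Optional, Sequence

Dist = Dict[str, float]  # token -> probability


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a {token: probability} mapping (sorted for determinism)."""
    r = rng.random()
    cumulative = 0.0
    last: Optional[str] = None
    for token in sorted(dist):
        prob = dist[token]
        if prob <= 0.0:
            continue
        last = token
        cumulative += prob
        if r < cumulative:
            return token
    if last is None:
        raise ValueError("distribution has no positive mass")
    return last  # guards against floating-point round-off in the cumulative sum


def speculative_accept(
    draft_tokens: Sequence[str],
    draft_dists: Sequence[Dist],
    target_dists: Sequence[Dist],
    rng: random.Random,
) -> List[str]:
    """Verify draft tokens against the target model's distributions.

    draft_dists[i] is the draft distribution q that produced draft_tokens[i];
    target_dists[i] is the target distribution p at the same position, plus
    one extra entry used when every draft token is accepted.
    """
    if len(draft_dists) != len(draft_tokens) or len(target_dists) != len(draft_tokens) + 1:
        raise ValueError("expected k draft distributions and k + 1 target distributions")
    output: List[str] = []
    for i, token in enumerate(draft_tokens):
        p, q = target_dists[i], draft_dists[i]
        p_x, q_x = p.get(token, 0.0), q.get(token, 0.0)
        # Accept with probability min(1, p(x) / q(x)).
        if q_x > 0.0 and rng.random() < min(1.0, p_x / q_x):
            output.append(token)
            continue
        # First rejection: sample a replacement from normalize(max(0, p - q)) and stop.
        residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q)}
        total = sum(residual.values())
        fallback = p if total <= 0.0 else {t: v / total for t, v in residual.items()}
        output.append(sample(fallback, rng))
        return output
    # All draft tokens accepted: the target's extra distribution yields one more token.
    output.append(sample(target_dists[-1], rng))
    return output
```

This rule keeps the generated tokens distributed exactly as if the target model had sampled on its own, so the speed-up does not change the output distribution. In transformers, assisted generation is requested by passing `assistant_model=` to `generate()`.
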

Update split string in doctest to reflect #28087 (#28135) · bd7a3561
amyeroberts authored 1 year ago

When saving a model on TPU, make a copy to be moved to CPU (#27993) · 5aec50ec

qihqi authored 1 year ago

* When saving a model, make a copy to move to CPU instead of moving the original model

* Make the deepcopy inside `_save_tpu`

* Move to TPU without copy

18 Dec, 2023 11 commits
- [Doc] Fix token link in What Transformers can do (#28123) · 4edffda6
  Aaron Jimenez authored 1 year ago
```
Fix token link
```
- Fix a typo in tokenizer documentation (#28118) · c52b515e
  Mike Salvatore authored 1 year ago
- [docs] General doc fixes (#28087) · a52e180a
  Steven Liu authored 1 year ago
```
* doc fix friday

* deprecated objects

* update not_doctested

* update toctree
```
- Fix indentation error - semantic_segmentation.md (#28117) · 08a6e7a7
  Rockerz authored 1 year ago
```
Update semantic_segmentation.md
```
- More TF fixes (#28081) · 71d47f0a
  Matt authored 1 year ago
```
* More build_in_name_scope()

* Make sure we set the save spec, now that we no longer do it with dummies

* make fixup
```
- Remove warning if `DISABLE_TELEMETRY` is used (#28113) · 0695b242
  Lucain authored 1 year ago
```
remove warning if DISABLE_TELEMETRY is used
```
- Disable jitter noise during evaluation in SwitchTransformers (#28077) · 7c5408da
  Daize Dong authored 1 year ago
```
* Disable jitter noise during evaluation

* Update outdated configuration information

* Formatting

* Add new line
```
- fix ConversationalPipeline docstring (#28091) · a0522de4
  lain authored 1 year ago
- In PEFT fine-tuning, only the trainable parameters need to be saved (#27825) · e6cb8e05
  Wang, Yi authored 1 year ago
```
This reduces the storage size and the time spent saving checkpoints when training with DeepSpeed.

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
```
- Spelling correction (#28110) · 7f2a8f92
  Aeneas Stankowski authored 1 year ago
```
Update mixtral.md

correct minor typo in overview
```
- [`Llava` / `Vip-Llava`] Add SDPA into llava (#28107) · b8378b65
  Younes Belkada authored 1 year ago
```
add SDPA into llava
```
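
Regarding the SwitchTransformers change (#28077) in the list above: jitter noise is a training-time regularizer for the router, and applying it during evaluation makes expert routing nondeterministic. Assuming the usual formulation, in which router inputs are multiplied elementwise by factors drawn uniformly from [1 - jitter_noise, 1 + jitter_noise], the gated behavior looks roughly like the sketch below; `apply_jitter` and its arguments are hypothetical names, not the model's API.

```python
import random
from typing import List


def apply_jitter(hidden: List[float], jitter_noise: float, training: bool, rng: random.Random) -> List[float]:
    """Hypothetical multiplicative jitter on router inputs, applied only while training."""
    if not training or jitter_noise <= 0.0:
        # Evaluation (or jitter disabled): return the inputs unchanged so that
        # routing decisions are deterministic.
        return list(hidden)
    # Training: scale each value by a factor in [1 - jitter_noise, 1 + jitter_noise].
    return [x * rng.uniform(1.0 - jitter_noise, 1.0 + jitter_noise) for x in hidden]


rng = random.Random(0)
hidden = [1.0, 2.0, -3.0]
print(apply_jitter(hidden, 0.01, training=False, rng=rng))  # unchanged in eval mode
print(apply_jitter(hidden, 0.01, training=True, rng=rng))   # slightly perturbed
```
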
17 Dec, 2023 2 commits

Fix the deprecation warning of _torch_pytree._register_pytree_node (#27803) · e6dcf8ab
cyyever authored 1 year ago

4D `attention_mask` support (#27539) · f85a1e82

Poedator authored 1 year ago


* edits to _prepare_4d_causal_attention_mask()

* initial tests for 4d mask

* attention_mask_for_sdpa support

* added test for inner model hidden

* added autotest decorators

* test mask dtype to torch.int64

* torch.testing.assert_close

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* torch_device and @torch_gpu in tests

* upd tests

* +torch decorators

* torch decorators fixed

* more decorators!

* even more decorators

* fewer decorators

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
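
As background for #27539: models typically turn a 2D padding mask of shape (batch, seq_len) into a 4D additive mask of shape (batch, 1, query_len, key_len), holding 0 where attention is allowed and a very large negative number where it is blocked; accepting a 4D mask directly lets callers supply patterns a 2D mask cannot express. The pure-Python sketch below builds the causal-plus-padding case. `make_4d_causal_mask` is a hypothetical helper rather than `_prepare_4d_causal_attention_mask`, and it uses `-inf` where the library uses the minimum value of the tensor dtype.

```python
from typing import List

NEG_INF = float("-inf")


def make_4d_causal_mask(padding_mask: List[List[int]]) -> List[List[List[List[float]]]]:
    """Build an additive mask of shape (batch, 1, seq_len, seq_len).

    padding_mask[b][j] is 1 for a real token and 0 for padding. Entry
    [b][0][i][j] is 0.0 if query position i may attend to key position j
    (causal: j <= i, and j is not padding), otherwise -inf.
    """
    mask = []
    for row in padding_mask:
        seq_len = len(row)
        grid = [
            [0.0 if (j <= i and row[j] == 1) else NEG_INF for j in range(seq_len)]
            for i in range(seq_len)
        ]
        mask.append([grid])  # singleton head dimension, broadcast over attention heads
    return mask


# Batch of two sequences; the second is left-padded by one position.
example = make_4d_causal_mask([[1, 1, 1], [0, 1, 1]])
for query_row in example[1][0]:
    print(query_row)
```

In the left-padded example, the padded query position attends to nothing, so its row is all `-inf` and a softmax over it would produce NaN. Using the dtype's finite minimum instead of `-inf` is one way to keep such rows numerically safe.
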

16 Dec, 2023 1 commit
- fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891) · 238d2e3c
  Sourab Mangrulkar authored 1 year ago
```
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT

* update tests

* fix tests
```
15 Dec, 2023 18 commits
- [docs] MPS (#28016) · ebfdb9ca
  Steven Liu authored 1 year ago
```
* mps docs

* toctree
```
- [docs] Trainer (#27986) · 0d63d177
  Steven Liu authored 1 year ago
```
* first draft

* add to toctree

* edits

* feedback
```
- Fix Vip-llava docs (#28085) · 1faeff85
  Younes Belkada authored 1 year ago
```
* Update vipllava.md

* Update modeling_vipllava.py
```
- Fix wrong examples in llava usage. (#28020) · ffa04def
  Ligeng Zhu authored 1 year ago
```
* Fix wrong examples in llava usage.

* Update modeling_llava.py
```
- Fix `low_cpu_mem_usage` Flag Conflict with DeepSpeed Zero 3 in... · 29a1c1b4
  Kotaro Tanahashi authored 1 year ago
```
Fix `low_cpu_mem_usage` Flag Conflict with DeepSpeed Zero 3 in `from_pretrained` for Models with `keep_in_fp32_modules` (#27762)

Fix `from_pretrained` logic for `low_cpu_mem_usage` with DeepSpeed Zero3
```
- Update fixtures-image-utils (#28080) · 26ea725b
  Quentin Lhoest authored 1 year ago
```
* fix hf-internal-testing/fixtures_image_utils

* fix test

* comments
```
- Fix bug for checkpoint saving on multi node training setting (#28078) · 1c286be5
  dumpmemory authored 1 year ago
```
* add multi-node training setting

* fix style
```
- make torch.load a bit safer (#27282) · dec84b32
  Julien Chaumond authored 1 year ago
```
* make torch.load a bit safer

* Fixes

---------

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
```
- Make GPT2 traceable in meta state (#28054) · 74cae670
  Ke Wen authored 1 year ago
```
* Put device in tensor constructor instead of to()

* Fix copy
```
- [LLaVa] Add past_key_values to _skip_keys_device_placement to fix multi-GPU dispatch (#28051) · e2b6df79
  Adilzhan Ismailov authored 1 year ago
```
Add past_key_values to _skip_keys_device_placement for LLaVa
```
- Skip M4T `test_retain_grad_hidden_states_attentions` (#28060) · deb72cb6
  Yoach Lacombe authored 1 year ago
```
* skip test from SpeechInput

* refine description of skip
```
- [`Mixtral`] update conversion script to reflect new changes (#28068) · d269c4b2
  Younes Belkada authored 1 year ago
```
* Update convert_mixtral_weights_to_hf.py

* forward contrib credits from original fix

---------

Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
```
- doc: Correct spelling mistake (#28064) · 70a127a3
  Cylis authored 1 year ago
- Remove SpeechT5 deprecated argument (#28062) · c817c17d
  Yoach Lacombe authored 1 year ago
- [Flax LLaMA] Fix attn dropout (#28059) · 6af3ce77
  Sanchit Gandhi authored 1 year ago
- [Flax BERT] Update deprecated 'split' method (#28012) · 7e876dca
  Sanchit Gandhi authored 1 year ago
```
* [Flax BERT] Update deprecated 'split' method

* fix copies
```
- [`Modeling` / `Mixtral`] Fix GC + PEFT issues with Mixtral (#28061) · e737446e
  Younes Belkada authored 1 year ago
```
fix for mistral
```
- [`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` (#28043) · 1e209317
  Younes Belkada authored 1 year ago
```
* fix fa-2 issue

* fix test

* Update src/transformers/modeling_utils.py

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* cleaner fix

* up

* add more robust tests

* Update src/transformers/modeling_utils.py

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* fixup

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pop

* add test

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
```
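
On "make torch.load a bit safer" (#27282) in the list above: `torch.load` relies on `pickle`, and unpickling untrusted data can execute arbitrary code. The stdlib-only sketch below shows the general defence described in the Python `pickle` documentation under "Restricting Globals": subclass `pickle.Unpickler` and override `find_class` so that only allowlisted globals can be reconstructed. `ALLOWED_GLOBALS` and `restricted_loads` are illustrative assumptions, not PyTorch or transformers code.

```python
import datetime
import io
import pickle
from collections import OrderedDict
from typing import Any, FrozenSet, Tuple

# Illustrative allowlist (an assumption for this sketch): only these globals may be rebuilt.
ALLOWED_GLOBALS: FrozenSet[Tuple[str, str]] = frozenset({("collections", "OrderedDict")})


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global outside ALLOWED_GLOBALS."""

    def find_class(self, module: str, name: str) -> Any:
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes) -> Any:
    """Drop-in for pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# A toy "state dict" of plain lists round-trips fine...
state = OrderedDict([("weight", [0.1, 0.2]), ("bias", [0.0])])
assert restricted_loads(pickle.dumps(state)) == state

# ...but a payload that references any other global is refused.
try:
    restricted_loads(pickle.dumps(datetime.date(2023, 12, 15)))
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

Recent PyTorch releases provide a built-in restriction of this kind through `torch.load(path, map_location="cpu", weights_only=True)`.
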
14 Dec, 2023 3 commits

Remove warning when Annotion enum is created (#28048) · 1a585c12
amyeroberts authored 1 year ago
```
Remove warning when enum is created
```
Replace build() with build_in_name_scope() for some TF tests (#28046) · 3060899b
Matt authored 1 year ago
```
Replace build() with build_in_name_scope() for some tests
```

Proper build() methods for TF (#27794) · 050e0b44

Matt authored 1 year ago

* Add a convenience method for building in your own name scope

* Second attempt at auto layer building

* Revert "Second attempt at auto layer building"

This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.

* Attempt #3

* Revert "Attempt #3"

This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.

* Add missing attributes that we're going to need later

* Add some attributes we're going to need later

* A fourth attempt! Feel the power flow through you!

* Revert "A fourth attempt! Feel the power flow through you!"

This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.

* Add more values we'll need later

* TF refactor that we'll need later

* Revert "TF refactor that we'll need later"

This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.

* Revert "Revert "TF refactor that we'll need later""

This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.

* make fixup

* Attempt five!

* Revert "Attempt five!"

This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.

* Attempt six - this time don't add empty methods

* Revert "Attempt six - this time don't add empty methods"

This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.

* Attempt seven - better base model class detection!

* Revert "Attempt seven - better base model class detection!"

This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.

* Another attribute we'll need later

* Try again with the missing attribute!

* Revert "Try again with the missing attribute!"

This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.

* This is the attempt that will pierce the heavens!

* Revert "This is the attempt that will pierce the heavens!"

This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.

* Attempt seven - snag list is steadily decreasing

* Revert "Attempt seven - snag list is steadily decreasing"

This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.

* Attempt eight - will an empty snag list do it?

* Revert "Attempt eight - will an empty snag list do it?"

This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.

* Fixes to Hubert issues that cause problems later

* Trying again with Conv1D/SeparableConv fixes

* Revert "Trying again with Conv1D/SeparableConv fixes"

This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.

* Apply the build shape fixes to Wav2Vec2 as well

* One more attempt!

* Revert "One more attempt!"

This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.

* Another attempt!

* Revert "Another attempt!"

This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.

* Let's see how many failures we get without the internal build method

* Fix OpenAI

* Fix MobileBERT

* (Mostly) fix GroupVIT

* Fix BLIP

* One more BLIP fix

* One more BLIP fix!

* Fix Regnet

* Finally fully fix GroupViT

* Fix Data2Vec and add the new AdaptivePool

* Fix Segformer

* Fix Albert

* Fix Deberta/DebertaV2

* Fix XLM

* Actually fix XLM

* Fix Flaubert

* Fix lxmert

* Fix Resnet

* Fix ConvBERT

* Fix ESM

* Fix Convnext / ConvnextV2

* Fix SAM

* Fix Efficientformer

* Fix LayoutLMv3

* Fix speech_to_text

* Fix mpnet and mobilevit

* Fix Swin

* Fix CTRL

* Fix CVT

* Fix DPR

* Fix Wav2Vec2

* Fix T5

* Fix Hubert

* Fix GPT2

* Fix Whisper

* Fix DeiT

* Fix the encoder-decoder / dual-encoder classes

* make fix-copies

* build in name scope

* Fix summarization test

* Fix tied weight names for BART + Blenderbot

* Fix tied weight name building

* Fix to TFESM weight building

* Update TF SAM

* Expand all the shapes out into Big Boy Shapes