- 31 Mar, 2025 3 commits
-
efsotr authored
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
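The kwargs-through-checkpointing fix can be sketched in plain Python: reentrant activation checkpointing only forwards positional arguments, so keyword arguments such as the flash-attention ones have to be bound onto the layer call up front. The `checkpoint` stand-in, `layer_forward`, and the `cu_seq_lens_q` value below are illustrative, not the library code:

```python
from functools import partial

def checkpoint(fn, *args):
    # Stand-in for torch.utils.checkpoint.checkpoint, which in its
    # reentrant form only forwards positional arguments.
    return fn(*args)

def layer_forward(hidden_states, attention_mask=None, **flash_attn_kwargs):
    # Hypothetical decoder-layer forward that consumes flash-attn kwargs.
    return {"hidden": hidden_states, "mask": attention_mask, "kwargs": flash_attn_kwargs}

# Binding the keyword arguments first lets them survive the positional-only call.
flash_attn_kwargs = {"cu_seq_lens_q": [0, 4, 9]}
out = checkpoint(partial(layer_forward, **flash_attn_kwargs), [1, 2, 3], None)
```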
-
Yuan Wu authored
* Gaudi: fix the issue of is_torch_hpu_available() returning false
* Fix make fixup
* Add comments for the implicit behavior of import
* Update src/transformers/utils/import_utils.py
* Update src/transformers/utils/import_utils.py

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
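An availability check of this shape typically probes for the device package before importing it, because the import itself carries the side effect the commit comments on. A minimal sketch, assuming the Habana package name and not reproducing the actual transformers implementation:

```python
import importlib.util

def is_torch_hpu_available() -> bool:
    # Probe for the package first: importing habana_frameworks.torch has
    # the side effect of registering the "hpu" device with torch, so we
    # only attempt it when the distribution is actually installed.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    return importlib.util.find_spec("habana_frameworks.torch") is not None
```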
-
Bo Zheng authored
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
-
- 30 Mar, 2025 1 commit
-
MinJu-Ha authored
* fix: manual edits
* fix: resolve suggestions
* Update toctree.yml
-
- 28 Mar, 2025 15 commits
-
Yih-Dar authored
* kenlm
* kenlm
* kenlm
* kenlm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* yoink
* same pattern in all cache
-
Joao Gante authored
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
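"Jagged beams" refers to beam hypotheses that finish at different lengths and so cannot be stacked directly. The usual handling, shown here as a pure-Python sketch (the function name is illustrative), pads each hypothesis to the longest one:

```python
def pad_jagged_beams(beams, pad_token_id):
    # Beams finish at different lengths; pad each hypothesis on the right
    # so they can be stacked into one rectangular batch of token ids.
    max_len = max(len(b) for b in beams)
    return [b + [pad_token_id] * (max_len - len(b)) for b in beams]
```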
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Cyril Vallez authored
* up
* typo
* update doc
* Update attention_interface.md
-
Cyril Vallez authored
* Update modeling_utils.py
* Update modeling_utils.py
-
Zach Mueller authored
* Update w/ new account
* DS
-
Yih-Dar authored
* fix
* comment

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Minho Ryu authored
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits" (reverts commit f264f800)
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
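The "use rope_interleave instead of load_hook" bullet concerns how rotary dimensions are laid out in the checkpoint. A pure-Python sketch of the kind of permutation involved, under the assumption (not confirmed by this log) that interleaved weights store pairs as `[x0, y0, x1, y1, ...]` while a rotate-half implementation expects `[x0, x1, ..., y0, y1, ...]`; the function name is hypothetical:

```python
def deinterleave_rope_weights(row):
    # Split the interleaved layout [x0, y0, x1, y1, ...] into the
    # half-split layout [x0, x1, ..., y0, y1, ...] by taking the even
    # positions first and the odd positions second.
    return row[0::2] + row[1::2]
```

A config flag like `rope_interleave` lets the model apply this choice at runtime rather than permuting weights in a state-dict load hook.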
-
Raushan Turganbay authored
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
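The `Noneness` guard is the standard pattern for optional dtype casting: skip the cast entirely when no target dtype was requested. A minimal Python sketch with an illustrative helper name (the real code operates on torch tensors):

```python
def maybe_cast(values, dtype=None):
    # When no dtype is requested, return the input untouched; casting
    # unconditionally would fail (or silently change types) on None.
    if dtype is None:
        return values
    return [dtype(v) for v in values]
```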
-
cyyever authored
Change deprecated functions
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Yih-Dar authored
* fix
* fix
* fix
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
jp authored
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
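Processors of this kind expand each media placeholder token in the prompt so the text sequence has one slot per visual feature before embeddings are merged. A hedged sketch of that expansion (the function name, the token id 32000, and the patch count are illustrative, not Llava's actual values):

```python
def expand_media_tokens(input_ids, media_token_id, num_features):
    # Replace each single placeholder with num_features copies so the
    # token sequence lines up with the projected image/video features.
    out = []
    for tok in input_ids:
        out.extend([tok] * num_features if tok == media_token_id else [tok])
    return out
```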
-
Manuel Faysse authored
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
-
- 27 Mar, 2025 20 commits
-
Perry Gibson authored
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply check_modular_conversion.py fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
-
Finn-Ole Höner authored
-
cyyever authored
-
Prem Kumar M authored
Replace split with jnp's split function for flax models (#36854)
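`jnp.split` follows the `numpy.split` contract: an integer asks for that many equal sections, a sequence of indices asks for cuts at those positions. A pure-Python sketch of those semantics on lists (a stand-in for the array behavior, not the jax implementation):

```python
def split(seq, indices_or_sections):
    # Integer: divide into that many equal sections (error if not even).
    if isinstance(indices_or_sections, int):
        n = indices_or_sections
        size, rem = divmod(len(seq), n)
        if rem:
            raise ValueError("array split does not result in an equal division")
        cuts = [i * size for i in range(1, n)]
    else:
        # Sequence: cut at the given indices.
        cuts = list(indices_or_sections)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```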
-
cyyever authored
-
cyyever authored
Fix typing for None-able variables
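The typing fix in question is the usual one: a parameter that may be `None` must be annotated `Optional[int]` rather than `int` so static checkers flag callers that forget the `None` branch. A sketch with hypothetical names (not the variables touched by this commit):

```python
from typing import Optional

def resolve_head_dim(head_dim: Optional[int], hidden_size: int, num_heads: int) -> int:
    # Optional[int] documents that None is a legal input here; the body
    # must then handle it explicitly.
    return head_dim if head_dim is not None else hidden_size // num_heads
```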
-
cyyever authored
* Avoid unnecessary tensor copy in loss computing
* Add type
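The copy-avoidance idea can be shown in miniature: only materialize a converted copy when the input is not already in the target form, and otherwise return the input object itself. A list-based sketch (the real change operates on tensors on the loss path):

```python
def as_float_list(values):
    # Fast path: if every element is already a float, return the same
    # object instead of building a converted copy.
    if all(type(v) is float for v in values):
        return values
    return [float(v) for v in values]
```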
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Joao Gante authored
-
eustlb authored
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
-
Raushan Turganbay authored
* add audio from video
* typos
* delete print
* comments
-
Pavel Iakubovskii authored
* Fixup
* trigger
-
Sungyoon Jeong authored
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
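The optimization and the tuple fix can both be sketched together: python-native scalars take a fast path with no conversion, tuples are normalized to lists, and anything tensor-like falls back to `tolist()`. This is a simplified illustration, not the library's implementation:

```python
def to_py_obj(obj):
    # Fast path: python-native scalars need no conversion at all.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Tuples are converted to lists (the bug fixed in this commit),
    # and containers are converted recursively.
    if isinstance(obj, (list, tuple)):
        return [to_py_obj(o) for o in obj]
    # Fallback: framework tensors/arrays expose tolist().
    tolist = getattr(obj, "tolist", None)
    if tolist is not None:
        return to_py_obj(tolist())
    return obj
```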
-
jiqing-feng authored
* fix pegasus init weights
* fix the rest of models
* fix test
* fix informer init
* init weight before checking
* fix roformer tests
* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
-
Parteek authored
* Added conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
* Updated Conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
-
Mohamed Mekkouri authored
* skip fp8 linear
* add capability check
* format
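A GPU capability check of the kind this commit adds usually compares the `(major, minor)` compute-capability tuple against the first generation with FP8 tensor cores. A hedged sketch (the function name and threshold are illustrative; the real code queries the device via torch):

```python
def supports_fp8(compute_capability):
    # FP8 tensor cores ship with Ada (8.9) and Hopper (9.0); on older
    # GPUs the FP8 linear path is skipped. Tuple comparison orders
    # (major, minor) pairs correctly.
    return compute_capability >= (8, 9)
```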
-
hoshi-hiyouga authored
* Update optimization.py
* Update optimization.py
-
Yih-Dar authored
* fix
* fix
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Kyle Sayers authored
support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Michael Goin authored
-
- 26 Mar, 2025 1 commit
-
Abu Bakr Soliman authored
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss (by @Cyrilvallez)
* Update modeling_modernbert.py to make a comparable commit to even it out
* Match whitespace
* whitespace

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
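Extractive question-answering heads like the one added here conventionally emit two scores per token, which are then separated into start and end logits whose argmaxes bound the predicted answer span. A pure-Python sketch of that split (illustrative only; the model operates on tensors and splits the last dimension):

```python
def qa_logits(token_logits):
    # token_logits: one [start_score, end_score] pair per token.
    # Separate them into two per-token score lists; the argmax of each
    # list gives the predicted start and end positions of the answer.
    start = [pair[0] for pair in token_logits]
    end = [pair[1] for pair in token_logits]
    return start, end
```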
-