Commits · feature/#35425 · zhusg / transformers-new

28 Mar, 2025 11 commits
- add generated output · 3d401976
  Arthur Zucker authored 2 months ago
  
- merge with main and fix copies · f2bb6f98
  Arthur Zucker authored 2 months ago
  
- ruff was updated on main · ee33cf7b
  Arthur Zucker authored 2 months ago
  
- Merge branch 'main' of github.com:huggingface/transformers into feature/#35425 · c198b4b6
  Arthur Zucker authored 2 months ago
  
- update · 186e32b9
  Arthur Zucker authored 2 months ago
  
- nits · d7da38bc
  Arthur Zucker authored 2 months ago
  
- update readme · 24557c37
  Arthur Zucker authored 2 months ago
  
- reorder functions + call for contributions! · a50c3512
  Arthur Zucker authored 2 months ago
  
- fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id' (#37026) · 3af425d4
  jp authored 2 months ago
```
* Add image_token_id and video_token_id handling in Llava processors

* fix: image to video

* fix: correct image and video token ID handling in Llava processors

* fix: improve image and video token ID handling in Llava processors
```
- push most of the changes · 7350a5d4
  Arthur Zucker authored 2 months ago
  
- Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0) (#36891) · 064cd7cd
  Manuel Faysse authored 2 months ago
```
* fix sdpa implementation

* ruff

* also modify 2_5 for consistency
```
27 Mar, 2025 21 commits

fix: Fully remove legacy cache from Llama (#36958) · 348f3285

Perry Gibson authored 2 months ago

* bug: fully remove legacy cache from Llama

* bug: fix CI issues

* bug: update jetmoe model

* bug: apply `check_modular_conversion.py` fix

* bug: apply make fix-copies

* bug: fix ruff

* PR suggestions

* Remove trailing commas in auto-gen files

* Trivial new line removal

updates to have generation work! · 3fb9bea5
Arthur Zucker authored 2 months ago

fixed typo (#37036) · d6b3c748
Finn-Ole Höner authored 2 months ago

Remove deprecated batch_size parameter (#37007) · 6cc9c8d7
cyyever authored 2 months ago

Replace default split function with jnp.split() in flax models (#37001) · 4cc65e99
Prem Kumar M authored 2 months ago
```
Replace split with jnp's split function for flax models (#36854)
```
Set weights_only in torch.load (#36991) · 41a0e58e
cyyever authored 2 months ago

Fix typing for None valued variables (#37004) · de77f5b1
cyyever authored 2 months ago
```
Fix typing for None-able variables
```
Avoid unnecessary device operations in loss computing (#36950) · 8c5e29ba
cyyever authored 2 months ago
```
* Avoid unnecessary tensor copy in loss computing

* Add type
```
clean pipeline question_answering. (#36986) · 471cf1de
湛露先生 authored 2 months ago
```
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
```
[generate, cache] handle more complex device maps (#37014) · 29f322d0
Joao Gante authored 2 months ago

[audio utils] fix fft_bin_width computation (#36603) · fb8e6c50

eustlb authored 2 months ago

* fix fft_bin_width computation

* update docstring + enforce correct params

* update test with correct value

* update test

* update feature extractors for concerned models

* update

* make

* update docstring

* update docstring

[chat templates] support loading audio from video (#36955) · e97c7600
Raushan Turganbay authored 2 months ago
```
* add audio from video

* typos

* delete print

* comments
```
Fixup for distill_any_depth conversion script (#37043) · c7bc79bd
Pavel Iakubovskii authored 2 months ago
```
* Fixup

* trigger
```

Optimize `to_py_obj` for python-native numeric lists and scalars (#36885) · d1eafe8d

Sungyoon Jeong authored 2 months ago

* Optimize to_py_obj for python-native numeric lists and scalars

* Fix bug that tuple is not converted to list

* Try np.array for more robust type checking

* Apply review and add tests for to_py_obj

fix pegasus init weights and other copied models (#36844) · 0e56fb69

jiqing-feng authored 2 months ago


* fix pegasus init weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix the rest of models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix informer init

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* init weight before checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

Add Distill Any Depth (#36614) · 7e813f9c

Parteek authored 2 months ago


* Added conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Updated Conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

Skip FP8 linear tests For device capability < 9.0 (#37008) · 92429057
Mohamed Mekkouri authored 2 months ago
```
* skip fp8 linear

* add capability check

* format
```
remove redundant code in trainer (#36994) · 279c2e30
hoshi-hiyouga authored 2 months ago
```
* Update optimization.py

* Update optimization.py
```

Mark 2 tests as flaky for now (#37038) · d13c390d

Yih-Dar authored 2 months ago


* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

[Modeling] Load FP8 safetensors such as DeepSeek (#36828) · d6d930a6

Kyle Sayers authored 2 months ago


support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Fix PixtralProcessor patch_size when spatial_merge_size is used (#37019) · 927ce1d3
Michael Goin authored 2 months ago

26 Mar, 2025 8 commits

Support QuestionAnswering Module for ModernBert based models. (#35566) · 49b5ab6a

Abu Bakr Soliman authored 2 months ago


* push ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* update __init__ loading

* set imports for ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* remove debugging logs

* update init_weights method

* remove custom initialization for ModernBertForQuestionAnswering

* apply make fix-copies

* apply make style

* apply make fix-copies

* append ModernBertForQuestionAnswering to the pipeline supported models

* remove unused file

* remove invalid autoload value

* update en/model_doc/modernbert.md

* apply make fixup command

* make fixup

* Update dummies

* update usage tips for ModernBertForQuestionAnswering

* update usage tips for ModernBertForQuestionAnswering

* add init

* add lint

* add consistency

* update init test

* change text to trigger stuck text

* use self.loss_function instead of custom loss

By @Cyrilvallez

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update modeling_modernbert.py

make comparable commit to even it out

* Match whitespace

* whitespace

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

fix transformers_cli import relative path issue (#36989) · 5b08db88

Yao Matrix authored 2 months ago


* fix transformers_cli relative import path issue

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

[docs] Attention mask image (#36970) · 3a8ec8c4
Steven Liu authored 2 months ago
```
add image
```

temp fix for TP : some attention layers's FP8 scales are too small + shared is... · 409f3412

Arthur Zucker authored 2 months ago

temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used

Remove deprecated training arguments (#36946) · 2b550c47
cyyever authored 2 months ago
```
* Remove deprecated training arguments

* More fixes

* More fixes

* More fixes
```

fix typos in the code comments and error messages (#36993) · 44715225

Afanti authored 2 months ago

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

Log the correct learning rate (#36973) · 79d6f9fd
Marc Sun authored 2 months ago
```
* fix learning rate log

* fix lr log

* add lr
```
Fix device_map check for ggml files (#37003) · 13d36e89
Mohamed Mekkouri authored 2 months ago
```
fix
```