- 31 Mar, 2025 13 commits
-
-
ydshieh authored
-
ydshieh authored
-
ydshieh authored
-
cyyever authored
* Remove deprecated code
* fix get_loading_attributes
* fix error
* skip test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
-
Robin Kahlow authored
rwkv: fix mask warning typo
-
Thien Tran authored
fix gemma3 embedding
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices: checks if `mlu` is available via a `cndev`-based check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. Add mlu tests support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator
-
jiqing-feng authored
* fix whisper re-compile
* fix copy
* fix comment
* fix copies
* revert useless changes
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
-
jiqing-feng authored
* enable tp on CPU
* get rank from cpu
* update
* enable TP tests
* fix comment
* em print
* fix model id
* fix conflict
* fix index and add doc
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
-
Qubitium-ModelCloud authored
fix 4090/ada not detected as having FP8 support
Signed-off-by: Qubitium <qubitium@modelcloud.ai>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
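For context on the kind of bug this fix addresses: FP8 (E4M3/E5M2) kernels require CUDA compute capability 8.9 or newer, and Ada cards such as the RTX 4090 report capability (8, 9), so a naive `major >= 9` test wrongly rejects them. The sketch below is a hypothetical illustration of a correct check (the function name and shape are assumptions, not the actual patch):

```python
def supports_fp8(major: int, minor: int) -> bool:
    """Hypothetical capability check: FP8 requires compute capability >= 8.9.

    Comparing the (major, minor) tuple handles Ada (sm_89, e.g. RTX 4090)
    correctly, whereas a majors-only comparison would exclude it.
    """
    return (major, minor) >= (8, 9)

# torch.cuda.get_device_capability() reports (8, 9) on an RTX 4090
# and (9, 0) on an H100; both should pass, while (8, 6) (RTX 3090) should not.
print(supports_fp8(8, 9), supports_fp8(9, 0), supports_fp8(8, 6))
```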
-
efsotr authored
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
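The first bullet touches a common pitfall: checkpointing wrappers (such as `torch.utils.checkpoint.checkpoint` in its reentrant form) forward positional arguments only, so keyword arguments like `flash_attn_kwargs` must be bound to the callable before it is handed to the wrapper. A minimal stdlib-only sketch of that binding pattern, with `checkpoint_like` and `layer_forward` as hypothetical stand-ins rather than transformers APIs:

```python
from functools import partial


def checkpoint_like(fn, *args):
    # Stand-in for a gradient-checkpointing wrapper that only forwards
    # positional arguments; keyword arguments passed here would be lost.
    return fn(*args)


def layer_forward(hidden_states, flash_attn_kwargs=None):
    # Hypothetical decoder-layer forward; echoes its inputs so the
    # keyword binding is observable.
    return hidden_states, flash_attn_kwargs


kwargs = {"sliding_window": 4096}
# Bind the keyword arguments *before* handing the callable to the
# positional-only wrapper, so they survive the call.
out, seen = checkpoint_like(partial(layer_forward, flash_attn_kwargs=kwargs), "h")
```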
-
Yuan Wu authored
* Gaudi: fix the issue of `is_torch_hpu_available()` returning false
* Fix make fixup
* Add comments for the implicit behavior of import
* Update src/transformers/utils/import_utils.py
* Update src/transformers/utils/import_utils.py
---------
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
-
Bo Zheng authored
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
-
- 30 Mar, 2025 1 commit
-
-
MinJu-Ha authored
* fix: manual edits
* fix: resolve suggestions
* Update toctree.yml
-
- 28 Mar, 2025 15 commits
-
-
Yih-Dar authored
* kenlm
* kenlm
* kenlm
* kenlm
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* yoink
* same pattern in all cache
-
Joao Gante authored
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Cyril Vallez authored
* up
* typo
* update doc
* Update attention_interface.md
-
Cyril Vallez authored
* Update modeling_utils.py
* Update modeling_utils.py
-
Zach Mueller authored
* Update w/ new account
* DS
-
Yih-Dar authored
* fix
* comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Minho Ryu authored
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
  This reverts commit f264f800.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no-gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
-
Raushan Turganbay authored
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
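The `Noneness` check in the last bullet reflects a common defensive pattern: only cast a tensor when a dtype was explicitly requested, since casting against `None` (or assuming a default) can silently change precision. A minimal stdlib-only sketch of the pattern, using a hypothetical `FakeTensor` stand-in rather than a real torch tensor:

```python
class FakeTensor:
    """Minimal stand-in for a tensor exposing `.dtype` and `.to(dtype)`."""

    def __init__(self, dtype="float32"):
        self.dtype = dtype

    def to(self, dtype):
        # Returns a new "tensor" with the requested dtype, like torch's .to()
        return FakeTensor(dtype)


def maybe_cast(tensor, dtype=None):
    # Guard against dtype being None before casting, so a caller that did
    # not request a cast keeps the original precision.
    if dtype is not None:
        return tensor.to(dtype)
    return tensor
```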
-
cyyever authored
Change deprecated functions
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Yih-Dar authored
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
jp authored
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
-
Manuel Faysse authored
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
-
- 27 Mar, 2025 11 commits
-
-
Perry Gibson authored
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply `check_modular_conversion.py` fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
-
Finn-Ole Höner authored
-
cyyever authored
-
Prem Kumar M authored
Replace split with jnp's split function for flax models (#36854)
-
cyyever authored
-
cyyever authored
Fix typing for None-able variables
-
cyyever authored
* Avoid unnecessary tensor copy in loss computing
* Add type
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Joao Gante authored
-
eustlb authored
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
-
Raushan Turganbay authored
* add audio from video
* typos
* delete print
* comments
-