1. 11 Apr, 2025 12 commits
  2. 10 Apr, 2025 24 commits
  3. 09 Apr, 2025 4 commits
    • handle torch version edge cases (#37399) · 9cda4265
      Wing Lian authored
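      A minimal sketch of the version-gating pattern this commit title suggests (the cutoff version and backend names below are illustrative assumptions, not taken from the patch):

          from packaging import version
          import torch

          # Illustrative only: gate a code path on the installed torch version.
          # The 2.5.0 cutoff is a placeholder, not the edge case handled in #37399.
          IS_TORCH_2_5_PLUS = version.parse(torch.__version__) >= version.parse("2.5.0")

          def pick_attention_backend() -> str:
              # Fall back to an older path on torch versions lacking the feature.
              return "flex_attention" if IS_TORCH_2_5_PLUS else "sdpa"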
    • the fix that did not get in (#37370) · e032d12e
      Arthur authored
      
      * debugging improvements
      
      * add debugging details
      
      * add more debugging details
      
      * debug more
      
      * the fix that did not get in
      
      * First fix flex
      
      * fix query offset
      
      * fix flex first
      
      * fix device mask creation for speed
      
      * small mask creation sdpa
      
      * Update flex_attention.py
      
      * remove chunked prefill from HybridChunkedCache
      
      * never seen such a messed-up merge
      
      * clean up layers + output
      
      * add summary json file
      
      * Efficient general cache
      
      * Update cache_utils.py
      
      * cleanup
      
      * fix?
      
      * fix!
      
      * oops, typo
      
      * not everywhere
      
      * more fixes
      
      * revert unrelated changes
      
      * Fix but ugly for now -> should use pad instead
      
      * oops
      
      * re-initialize the cache
      
      * Use pad to simplify (see the sketch after this entry)
      
      * style
      
      * correct slicing
      
      ---------
      
      Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
      Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
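      The "Use pad to simplify" step points at replacing ad-hoc cache slicing with padding so shapes stay regular. A hedged sketch of that idea in plain torch (the function name and tensor layout are assumptions, not the actual cache_utils.py code):

          import torch
          import torch.nn.functional as F

          # Hypothetical sketch, not the real HybridChunkedCache code: keep a
          # sliding-window KV tensor at a fixed length by slicing or padding.
          def pad_to_window(key_states: torch.Tensor, window: int) -> torch.Tensor:
              # key_states: (batch, num_heads, seq_len, head_dim)
              seq_len = key_states.shape[-2]
              if seq_len >= window:
                  # keep only the most recent `window` positions
                  return key_states[..., -window:, :]
              # F.pad orders pads from the last dim backwards:
              # (dim -1 left, dim -1 right, dim -2 left, dim -2 right)
              return F.pad(key_states, (0, 0, window - seq_len, 0))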
    • Attention Quantization with FBGemm & TP (#37384) · f834ca2c
      Mohamed Mekkouri authored
      * fix
      
      * keep fused
      
      * contiguous (see the note after this entry)
      
      * rm print
      
      * update
      
      * update
      
      * rm print
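      The "contiguous" bullet matters because a column-wise tensor-parallel shard of a weight is a strided view, and fused or quantized kernels generally require dense memory. A hedged illustration in plain torch (names are made up, and no FBGemm call is shown since the exact kernel API is not in this log):

          import torch

          # Illustrative only: slicing along dim 1 yields a non-contiguous view;
          # call .contiguous() before handing the shard to a fused kernel.
          def column_shard(weight: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
              cols = weight.shape[1] // world_size
              local = weight[:, rank * cols : (rank + 1) * cols]  # strided view
              return local.contiguous()  # dense copy the kernel can accept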
    • Fix some failing AWQ tests (#37383) · c5c648dd
      DerekLiu35 authored
      * update AwqQuantizer
      
      * fix style
      
      * add an arg to get_modules_to_not_convert so it can also include the result of get_keys_to_not_convert(model)
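      A hedged sketch of what that last bullet describes: letting get_modules_to_not_convert merge caller-supplied keys (such as the output of get_keys_to_not_convert(model)) into its result. The signature below is an assumption based on the message, not the merged code:

          from typing import List, Optional

          # Assumed shape of the helper named in the commit message; the real
          # signature in the AWQ quantizer may differ.
          def get_modules_to_not_convert(model, extra_keys: Optional[List[str]] = None) -> List[str]:
              # e.g. keep lm_head (plus any caller-supplied modules) unquantized
              keys = {name for name, _ in model.named_modules() if "lm_head" in name}
              if extra_keys is not None:
                  keys.update(extra_keys)
              return sorted(keys)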