- 10 Apr, 2025 2 commits
- 09 Apr, 2025 16 commits
-
Matej Sirovatka authored
-
Wing Lian authored
-
Arthur authored
* debugging improvements
* add debugging details
* add more debugging details
* debug more
* the fix that did not get in
* First fix flex
* fix query offset
* fix flex first
* fix device mask creation for speed
* small mask creation sdpa
* Update flex_attention.py
* remove chunked prefill from HybridChunkedCache
* never seen such a messed-up merge
* clean up layers + output
* add summary json file
* Efficient general cache
* Update cache_utils.py
* cleanup
* fix?
* fix!
* oops, typo
* not everywhere
* more fixes
* revert unrelated changes
* Fix, but ugly for now -> should use pad instead
* oops
* re-initialize the cache
* Use pad to simplify
* style
* correct slicing

Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
-
Mohamed Mekkouri authored
* fix
* keep fused
* contiguous
* rm print
* update
* update
* rm print
-
DerekLiu35 authored
* update AwqQuantizer
* fix style
* add an arg to `get_modules_to_not_convert` so it can include `get_keys_to_not_convert(model)`
-
Marc Sun authored
-
Brayden Zhong authored
Apply torchfix to replace deprecated functions: `_pytree._register_pytree_node` and `torch.cpu.amp.autocast` (#37372)
* fix: apply torchfix
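For readers unfamiliar with these deprecations, the rewrites can be sketched as below. The mapping is an assumption based on the function names in the commit message, not the actual diff: `_register_pytree_node` was superseded by the public `register_pytree_node`, and `torch.cpu.amp.autocast()` by the device-agnostic `torch.amp.autocast`.

```python
# Hypothetical sketch of the rewrites torchfix performs for these two
# deprecations; the exact replacements in the PR are assumed, not quoted.
TORCHFIX_REPLACEMENTS = {
    # private pytree registration -> public API
    "torch.utils._pytree._register_pytree_node": "torch.utils._pytree.register_pytree_node",
    # device-specific autocast -> device-agnostic torch.amp.autocast
    "torch.cpu.amp.autocast()": 'torch.amp.autocast("cpu")',
}

def apply_replacements(source: str) -> str:
    """Naive textual rewrite of the deprecated calls (illustration only;
    the real tool rewrites the AST, not raw strings)."""
    for old, new in TORCHFIX_REPLACEMENTS.items():
        source = source.replace(old, new)
    return source
```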
-
Sangyun_LEE (이상윤) authored
* add peft model in constant
* add test
* fix formatting
* make fixup execute
* change code
* check by self.task
* add test
* fixup test code
* fix minor typo
* fix pipeline test
* apply maintainers' requests
-
DerekLiu35 authored
* initial draft
* make documentation simpler
* Update docs/source/en/quantization/selecting.md (ten repeated review-pass commits, Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* turn pros and cons into tables
* Apply suggestions from code review
* add links to each quant method page
* separate calibration vs no calibration methods
* add calibration time estimates

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Mehant Kammakomati authored
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
-
Mehant Kammakomati authored
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
-
Mehant Kammakomati authored
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
-
Marc Sun authored
* update
* create docker image
* 03
* uninstall pytest as it conflicts with transformers
* wrong one
* better
* see which package depends on pytest
* up
* reinstall
* fix
* deepspeed (eight repeated retry commits)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Arthur authored
* add changed
* Revert "add changed" (reverts commit 0a0166a1fe80556115a49fbf0c2132de0f4f85c9)
* update with NEW MODEL class called GLM4
* update
* Update glm4.md
* Name
* style
* fix copies
* fixup test

Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
-
Jonas M. Kübler authored
Fix conversion script `no_rope_layers`: it should be either a list of NoPE layers or None, so that it is created in the config from the `no_rope_layer_interval`.
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
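The intended fallback could look roughly like this. It is a sketch assuming the common convention that every `no_rope_layer_interval`-th layer is a NoPE layer (marked 0, RoPE layers marked 1); the names follow the commit message, but the exact semantics in the conversion script are assumed.

```python
def build_no_rope_layers(no_rope_layers, num_hidden_layers, no_rope_layer_interval=4):
    """If no_rope_layers is None, derive it from the interval: every
    `no_rope_layer_interval`-th layer gets 0 (NoPE), all others 1 (RoPE).
    Illustrative sketch only; the real config logic may differ."""
    if no_rope_layers is not None:
        return no_rope_layers  # an explicit list wins over the interval
    return [
        int((layer_idx + 1) % no_rope_layer_interval != 0)
        for layer_idx in range(num_hidden_layers)
    ]
```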
-
Raushan Turganbay authored
* update composition flag usage
* remove print
* fix tests
* actually fix
* oh c'mon
* now should be fixed, right?
* fix copies
-
- 08 Apr, 2025 14 commits
-
Jerry Zhang authored
* Preserve requires_grad in pre-quantized model

  Summary: discovered this when running lm-eval for some models; the current code always sets requires_grad to True.
  Test Plan: lm_eval --model hf --model_args pretrained=jerryzh168/phi4-torchao-gguf-q4_k --tasks hellaswag --device cuda:0 --batch_size 8
* ruff format

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
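The bug class is easy to illustrate with a toy parameter model. This is not the torchao code, just a hypothetical sketch of the fix: copy the flag from the original parameter instead of hardcoding `True`.

```python
from dataclasses import dataclass

@dataclass
class Param:
    """Toy stand-in for a framework parameter (hypothetical, not torch.nn.Parameter)."""
    data: list
    requires_grad: bool = True

def quantize_param(p: Param) -> Param:
    """Toy 'quantization' that rounds values. The point is the flag:
    preserve p.requires_grad rather than resetting it to True (the old bug)."""
    qdata = [round(x) for x in p.data]
    return Param(qdata, requires_grad=p.requires_grad)
```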
-
Matt authored
* More limited setup -> setUpClass conversion
* make fixup
* Trigger tests
* Fixup UDOP
* Missed a spot
* tearDown -> tearDownClass where appropriate
* Couple more class fixes
* Fixups for UDOP and VisionTextDualEncoder
* Ignore errors when removing the tmpdir, in case it already got cleaned up somewhere
* CLIP fixes
* More correct classmethods
* Wav2Vec2Bert fixes
* More methods become static
* More class methods
* More class methods
* Revert changes for integration tests / modeling files
* Use a different tempdir for tests that actually write to it
* Remove addClassCleanup and just use tearDownClass
* Remove changes in modeling files
* Cleanup get_processor_dict() for got_ocr2
* Fix regression on Wav2Vec2BERT test that was masked by this before
* Rework tests that modify the tmpdir
* make fix-copies
* revert clvp modeling test changes
* Fix CLIP processor test
* make fix-copies
-
KimmiShi authored
* fix(qwen): fix shape error when using tp
* Update modeling_qwen2_vl.py

Co-authored-by: shidongxing <shidongxing@pjlab.org.cn>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Jonathan Mamou authored
* initial commit
* fix
* fix style
* set default to prune
* add tests
* comment
* remove prune flag from generate
* address Joao's comments
* deprecate_kwarg
* add doc
* fix target_vocab_size
* Update src/transformers/generation/candidate_generator.py (four repeated review-pass commits, Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>)
* fix deprecated argument assistant_model_device

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Joao Gante authored
-
Kerry authored
* Skip non-selected experts for mixtral and qwen2_moe
* Fix: tensor tolist()
* WIP: tokenization test
* fix modular source of truth
* nits

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
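The optimization in the first bullet can be sketched in plain Python. This is a hypothetical illustration, not the Mixtral/Qwen2-MoE code: group tokens by their routed expert and invoke only the experts that actually received tokens, so unselected experts do no work at all.

```python
from collections import defaultdict

def run_selected_experts(assignments, experts):
    """assignments[i] is the expert index routed for token i.
    Experts that receive no tokens are never called (the optimization)."""
    buckets = defaultdict(list)
    for token, expert_idx in enumerate(assignments):
        buckets[expert_idx].append(token)
    outputs = {}
    for expert_idx, tokens in buckets.items():  # iterate selected experts only
        expert_fn = experts[expert_idx]
        for t in tokens:
            outputs[t] = expert_fn(t)
    # restore original token order
    return [outputs[t] for t in range(len(assignments))]
```

In the real models each expert is an MLP applied to a batch of hidden states gathered by index, but the control flow (bucket, run only hit experts, scatter back) is the same idea.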
-
Joao Gante authored
l4 + dynamic rope decorator
-
Ryan Mullins authored
* Set vision config to None for Gemma 1B conversion
* Trigger tests

Co-authored-by: Matt <rocketknight1@gmail.com>
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Cyril Vallez authored
* cleaning * CIs
-
cyyever authored
Signed-off-by: cyy <cyyever@outlook.com>
-
Minho Ryu authored
* convert yarn-related arguments in rope_scaling to float
* sort keys alphabetically

Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>
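The described normalization might look like this. The key names (`factor`, `beta_fast`, etc.) are assumptions about which YaRN arguments are meant; this is a sketch, not the merged code.

```python
def normalize_rope_scaling(rope_scaling: dict) -> dict:
    """Cast YaRN-related numeric arguments to float and return the dict
    with keys sorted alphabetically, as the commit describes."""
    yarn_keys = {"factor", "attention_factor", "beta_fast", "beta_slow"}  # assumed set
    normalized = {
        k: float(v) if k in yarn_keys and v is not None else v
        for k, v in rope_scaling.items()
    }
    return dict(sorted(normalized.items()))
```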
-
Alex Brooks authored
* Expose blip2qformer * Add missing args to blip2 config
-
Arthur authored
* update for fixes
* more fixes
* fix dynamic cache?
* style
* fix both training and generating; eager seems alright
* dynamic does not work
* fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states)
* should be final fixes
* fix more stuff, no cat
* style
* fix
* style
* final style
* quality
* fix
* revert
-
- 07 Apr, 2025 8 commits
-
salman authored
* adding compile kwarg for torch 2.6
* fixing dynamic
* addressing comment
* typo
* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Wing Lian authored
* more fixes for post-training llama4 * use target_length instead of guarded past_key_values
-
Tugsbayasgalan Manlaibaatar authored
Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com>
-
logesh R authored
* Updated documentation for Donut model
* Update docs/source/en/model_doc/donut.md (repeated review-pass commits, Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* Updated code suggestions
* Updated code suggestion to align with the AutoModel example
* Updated notes section, included code examples
* close hfoption block and indent

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Mohamed Mekkouri authored
* add bnb * style * update * add pre_quantized check
-
Parag Ekbote authored
* Update model card for jamba
* Apply the suggestions from code review
* Apply suggestions from code review-2
* update model page
* Apply suggestions from code review
* Update as per code review
* Update docs/source/en/model_doc/jamba.md as per code review (twice)
* update as per code review
* fixes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Devesh Rahatekar authored
* Improved Model card for Gemma2
* Made changes in gemma2 as suggested
* Made more changes in the doc (adding image, notes, closing hfoptions)
* minor fixes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Mohamed Mekkouri authored
clean up
-