1. 06 Feb, 2025 2 commits
  2. 05 Feb, 2025 10 commits
  3. 04 Feb, 2025 16 commits
    • Refactoring of ImageProcessorFast (#35069) · fa56dcc2
      Yoni Gozlan authored
      * add init and base image processing functions
      
      * add add_fast_image_processor to transformers-cli
      
      * add working fast image processor clip
      
      * add fast image processor to doc, working tests
      
      * remove "to be implemented" SigLip
      
      * fix unprotected import
      
      * fix unprotected vision import
      
      * update ViTImageProcessorFast
      
      * increase threshold for slow/fast equivalence
      
      * add fast image processor blip
      
      * add fast class in tests with cli
      
      * improve cli
      
      * add fast image processor convnext
      
      * add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision
      
      * add device kwarg to ImagesKwargs for fast processing on cuda
      
      * cleanup
      
      * fix unprotected import
      
      * group images by sizes and add batch processing
      
      * Add batch equivalence tests, skip when center_crop is used
      
      * cleanup
      
      * update init and cli
      
      * fix-copies
      
      * refactor convnext, cleanup base
      
      * fix
      
      * remove patching mixins, add piped torchvision transforms for ViT
      
      * fix unbatched processing
      
      * fix f strings
      
      * protect imports
      
      * change llava onevision to class transforms (test)
      
      * fix convnext
      
      * improve formatting (following Pavel review)
      
      * fix handling device arg
      
      * improve cli
      
      * fix
      
      * fix inits
      
      * Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs
      
      * uniformize qwen2_vl fast
      
      * fix docstrings
      
      * add fast image processor llava
      
      * remove min_pixels max_pixels from accepted size
      
      * nit
      
      * nit
      
      * refactor fast image processors docstrings
      
      * cleanup and remove fast class transforms
      
      * update add fast image processor transformers cli
      
      * cleanup docstring
      
      * uniformize pixtral fast and make _process_image explicit
      
      * fix prepare image structure llava next/onevision
      
      * Use typed kwargs instead of explicit args
      
      * nit fix import Unpack
      
      * clearly separate pops and gets in base preprocess. Use explicit typed kwargs
      
      * make qwen2_vl preprocess arguments hashable
      fa56dcc2
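      One bullet above, "group images by sizes and add batch processing", carries the main speedup idea: images of the same shape can be stacked and transformed in a single tensor operation (optionally on CUDA via the new device kwarg). A minimal sketch of that grouping idea, with illustrative names that are not the library's internals:

          from collections import defaultdict

          import torch

          def group_images_by_shape(images):
              """Map (height, width) -> list of (original_index, image)."""
              grouped = defaultdict(list)
              for idx, image in enumerate(images):
                  grouped[image.shape[-2:]].append((idx, image))
              return grouped

          def batched_rescale(images, scale=1 / 255):
              # Process each same-shape group as one stacked batch, then restore
              # the original ordering of the outputs.
              out = [None] * len(images)
              for group in group_images_by_shape(images).values():
                  indices, imgs = zip(*group)
                  batch = torch.stack(list(imgs)) * scale  # one vectorized op per shape group
                  for i, processed in zip(indices, batch):
                      out[i] = processed
              return out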
    • Add DAB-DETR for object detection (#30803) · 8d73a386
      David authored
      
      * initial commit
      
      * encoder+decoder layer changes WIP
      
      * architecture checks
      
      * working version of detection + segmentation
      
      * fix modeling outputs
      
      * fix return dict + output att/hs
      
      * found the position embedding masking bug
      
      * pre-training version
      
      * added image processors
      
      * typo in init.py
      
      * iterupdate set to false
      
      * fixed num_labels in class_output linear layer bias init
      
      * multihead attention shape fixes
      
      * test improvements
      
      * test update
      
      * dab-detr model_doc update
      
      * dab-detr model_doc update2
      
      * test fix:test_retain_grad_hidden_states_attentions
      
      * config file clean and renaming variables
      
      * config file clean and renaming variables fix
      
      * updated convert_to_hf file
      
      * small fixes
      
      * style and quality checks
      
      * return_dict fix
      
      * Merge branch main into add_dab_detr
      
      * small comment fix
      
      * skip test_inputs_embeds test
      
      * image processor updates + image processor test updates
      
      * check copies test fix update
      
      * updates for check_copies.py test
      
      * updates for check_copies.py test2
      
      * tied weights fix
      
      * fixed image processing tests and fixed shared weights issues
      
      * added numpy ndarray option to get_expected_values method in test_image_processing_dab_detr.py
      
      * delete prints from test file
      
      * SafeTensor modification to solve HF Trainer issue
      
      * removing the safetensor modifications
      
      * make fix-copies and hf upload have been added.
      
      * fixed index.md
      
      * fixed repo consistency
      
      * style fix and DabDetrImageProcessor docstring update
      
      * requested modifications after the first review
      
      * Update src/transformers/models/dab_detr/image_processing_dab_detr.py
      
      Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
      
      * repo consistency has been fixed
      
      * update copied NestedTensor function after main merge
      
      * Update src/transformers/models/dab_detr/modeling_dab_detr.py
      
      Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
      
      * temp commit
      
      * temp commit2
      
      * temp commit 3
      
      * unit tests are fixed
      
      * fixed repo consistency
      
      * updated expected_boxes variable values based on related notebook results in DABDETRIntegrationTests file.
      
      * temporary config modifications and repo consistency fixes
      
      * Put dilation parameter back to config
      
      * pattern embeddings have been added to the rename_keys method
      
      * add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES
      
      * delete FeatureExtractor part from docs.md
      
      * requested modifications in modeling_dab_detr.py
      
      * [run_slow] dab_detr
      
      * deleted last segmentation code part, updated conversion script and changed the hf path in test files
      
      * temp commit of requested modifications
      
      * temp commit of requested modifications 2
      
      * updated config file, resolved codepaths and refactored conversion script
      
      * updated decoder layer block types and refactored conversion script
      
      * style and quality update
      
      * small modifications based on the request
      
      * attentions are refactored
      
      * removed loss functions from modeling file, added loss function to loss_utils, tried to move the MLP layer generation to config but it failed
      
      * deleted imageprocessor
      
      * fixed conversion script + quality and style
      
      * fixed config_att
      
      * [run_slow] dab_detr
      
      * changing model path in conversion file and in test file
      
      * fix Decoder variable naming
      
      * testing the old loss function
      
      * switched back to the new loss function and testing with the old attention functions
      
      * switched back to the new last good result modeling file
      
      * moved back to the version when I asked the review
      
      * missing new line at the end of the file
      
      * old version test
      
      * turn back to newest model version but change image processor
      
      * style fix
      
      * style fix after merge main
      
      * [run_slow] dab_detr
      
      * [run_slow] dab_detr
      
      * added device and type for head bias data part
      
      * [run_slow] dab_detr
      
      * fixed model head bias data fill
      
      * changed test_inference_object_detection_head assertTrues to torch.testing.assert_close
      
      * fixes part 1
      
      * quality update
      
      * self.bbox_embed in decoder has been restored
      
      * changed assertTrue torch allclose methods to torch.testing.assert_close
      
      * modelcard markdown file has been updated
      
      * deleted intermediate list from decoder module
      
      ---------
      
      Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
      8d73a386
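      For context, a minimal inference sketch for the new model; the checkpoint id below is an assumption, and the usual transformers object-detection post-processing API is shown:

          import torch
          from PIL import Image
          from transformers import AutoImageProcessor, AutoModelForObjectDetection

          checkpoint = "IDEA-Research/dab-detr-resnet-50"  # assumed checkpoint id
          processor = AutoImageProcessor.from_pretrained(checkpoint)
          model = AutoModelForObjectDetection.from_pretrained(checkpoint)

          image = Image.open("street.jpg")  # any local test image
          inputs = processor(images=image, return_tensors="pt")
          with torch.no_grad():
              outputs = model(**inputs)

          # Convert raw logits/boxes into thresholded detections in pixel coordinates.
          results = processor.post_process_object_detection(
              outputs, target_sizes=[image.size[::-1]], threshold=0.5
          )[0]
          for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
              print(model.config.id2label[label.item()], round(score.item(), 2), box.tolist())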
    • Update tests regarding attention types after #35235 (#36024) · fe52679e
      Yih-Dar authored
      
      * update
      
      * update
      
      * update
      
      * dev-ci
      
      * more changes
      
      * fix
      
      * fix
      
      * fix
      
      ---------
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      fe52679e
    • CircleCI with python 3.9 (#36027) · 014a1fa2
      Yih-Dar authored
      
      update docker files
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      014a1fa2
    • Luc Georges · c98b4679
    • Hotfix for `self-comment-ci.yml` (#36030) · 9855acb9
      Yih-Dar authored
      
      fix
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      9855acb9
    • Display warning for unknown quants config instead of an error (#35963) · 9f486bad
      Marc Sun authored
      
      * add supports_quant_method check
      
      * fix
      
      * add test and fix suggestions
      
      * change logic slightly
      
      ---------
      
      Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
      9f486bad
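      The behaviour change in plain terms: loading a checkpoint whose quantization_config names a method this transformers version doesn't know now warns and continues instead of raising. A rough sketch of the check (illustrative, not the actual implementation):

          import logging

          logger = logging.getLogger(__name__)
          KNOWN_QUANT_METHODS = {"bitsandbytes_4bit", "gptq", "awq"}  # illustrative subset

          def supports_quant_method(quantization_config: dict) -> bool:
              method = quantization_config.get("quant_method")
              if method not in KNOWN_QUANT_METHODS:
                  logger.warning(
                      f"Unknown quantization method: {method}. The model is loaded anyway; "
                      "install or upgrade the matching backend to actually use these weights."
                  )
                  return False
              return True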
    • Comment bot CI for other jobs (`generation` / `quantization`) (#35341) · f19bfa50
      Yih-Dar authored
      
      * quantization CI on PRs
      
      * fix
      
      * fix
      
      * add 2 members
      
      ---------
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      f19bfa50
    • Fix RMSNormGated in Zamba2 (#35943) · a93b8058
      pglorio authored
      
      * First commit
      
      * Finish model implementation
      
      * First commit
      
      * Finish model implementation
      
      * Register zamba2
      
      * generated modeling and configuration
      
      * generated modeling and configuration
      
      * added hybrid cache
      
      * fix attention_mask in mamba
      
      * dropped unused loras
      
      * fix flash2
      
      * config docstrings
      
      * fix config and fwd pass
      
      * make fixup fixes
      
      * test_modeling_zamba2
      
      * small fixes
      
      * make fixup fixes
      
      * Fix modular model converter
      
      * added inheritances in modular, renamed zamba cache
      
      * modular rebase
      
      * new modular conversion
      
      * fix generated modeling file
      
      * fixed import for Zamba2RMSNormGated
      
      * modular file cleanup
      
      * make fixup and model tests
      
      * dropped inheritance for Zamba2PreTrainedModel
      
      * make fixup and unit tests
      
      * Add inheritance of rope from GemmaRotaryEmbedding
      
      * moved rope to model init
      
      * drop del self.self_attn and del self.feed_forward
      
      * fix tests
      
      * renamed lora -> adapter
      
      * rewrote adapter implementation
      
      * fixed tests
      
      * Fix torch_forward in mamba2 layer
      
      * Fix torch_forward in mamba2 layer
      
      * Fix torch_forward in mamba2 layer
      
      * Dropped adapter in-place sum
      
      * removed rope from attention init
      
      * updated rope
      
      * created get_layers method
      
      * make fixup fix
      
      * make fixup fixes
      
      * make fixup fixes
      
      * update to new attention standard
      
      * update to new attention standard
      
      * make fixup fixes
      
      * minor fixes
      
      * cache_position
      
      * removed cache_position postion_ids use_cache
      
      * remove config from modular
      
      * removed config from modular (2)
      
      * import apply_rotary_pos_emb from llama
      
      * fixed rope_kwargs
      
      * Instantiate cache in Zamba2Model
      
      * fix cache
      
      * fix @slow decorator
      
      * small fix in modular file
      
      * Update docs/source/en/model_doc/zamba2.md
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * several minor fixes
      
      * inherit mamba2decoder fwd and drop position_ids in mamba
      
      * removed docstrings from modular
      
      * reinstate zamba2 attention decoder fwd
      
      * use regex for tied keys
      
      * Revert "use regex for tied keys"
      
      This reverts commit 9007a522b1f831df6d516a281c0d3fdd20a118f5.
      
      * use regex for tied keys
      
      * add cpu to slow forward tests
      
      * dropped config.use_shared_mlp_adapter
      
      * Update docs/source/en/model_doc/zamba2.md
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * re-convert from modular
      
      * extended Zamba2RMSNormGated to n_groups>1
      
      * removed einops import
      
      * set _supports_sdpa = True
      
      * add use_mem_eff_path flag for fused mamba2 fwd
      
      * added docstring for use_mem_eff_path flag
      
      ---------
      
      Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      a93b8058
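      For reference, gated RMSNorm in the Mamba2 family normalizes the hidden states and modulates them with a SiLU-activated gate; this fix extends the gated norm to n_groups > 1. A minimal single-group sketch (an assumed formulation, not the exact Zamba2RMSNormGated code):

          import torch
          from torch import nn

          class RMSNormGated(nn.Module):
              def __init__(self, hidden_size, eps=1e-6):
                  super().__init__()
                  self.weight = nn.Parameter(torch.ones(hidden_size))
                  self.eps = eps

              def forward(self, hidden_states, gate):
                  # Modulate by the gate first, then RMS-normalize in float32 for stability.
                  hidden_states = hidden_states * nn.functional.silu(gate)
                  variance = hidden_states.float().pow(2).mean(-1, keepdim=True)
                  hidden_states = hidden_states.float() * torch.rsqrt(variance + self.eps)
                  return self.weight * hidden_states.to(gate.dtype)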
    • Fix device mismatch error in Whisper model during feature extraction (#35866) · bc9a6d83
      Sumit Vij authored
      
      * Fix device mismatch error in whisper feature extraction
      
      * Set default device
      
      * Address code review feedback
      
      ---------
      
      Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
      bc9a6d83
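      The class of bug here is generic: when feature extraction runs on GPU, precomputed tensors (the STFT window, the mel filter bank) must be moved to the input's device. A hedged illustration of the fix pattern (shapes and constants are Whisper-like assumptions, not the library's code):

          import torch

          def log_mel_spectrogram(audio, window, mel_filters):
              # The fix: make the helper tensors follow the audio tensor's device.
              window = window.to(audio.device)
              mel_filters = mel_filters.to(audio.device)
              stft = torch.stft(
                  audio, n_fft=400, hop_length=160, window=window, return_complex=True
              )
              magnitudes = stft.abs() ** 2  # (n_freq, n_frames)
              # mel_filters assumed (n_freq, n_mels); project and clamp before log.
              return torch.clamp(mel_filters.T @ magnitudes, min=1e-10).log10()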
    • Refactor (and fix) gpt_neox (#35610) · 9afb904b
      Cyril Vallez authored
      * start a nice modular
      
      * Update modular_gpt_neox.py
      
      * Update modular_gpt_neox.py
      
      * Update modular_gpt_neox.py
      
      * Update modular_gpt_neox.py
      
      * update
      
      * Update modular_gpt_neox.py
      
      * convert
      
      * fix attribute
      
      * fix attrs
      
      * oups
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix order to pass test (see with accelerate team)
      
      * trigger CIs
      
      * modular
      
      * update
      
      * up
      
      * Update test_modeling_gpt_neox.py
      
      * Update test_modeling_gpt_neox.py
      
      * trigger CIs
      
      * correctly pass arg
      
      * simplify
      
      * remove key warning
      
      * update tp -> it's compatible since the view is before
      
      * trigger CIs
      9afb904b
    • Update Mistral converter (#35967) · ad305989
      Cyril Vallez authored
      * Update convert_mistral_weights_to_hf.py
      
      * Update convert_mistral_weights_to_hf.py
      
      * update
      
      * style
      
      * move it to integrations
      
      * style
      
      * trigger CIs
      
      * trigger CIs
      ad305989
    • layernorm_decay_fix (#35927) · b1954fd6
      Ryoo Kwangrok authored
      * layernorm_decay_fix
      
      * W293 fix
      
      * ruff format fix
      
      * black format
      
      * ruff format
      
      * erase last layer
      
      * add test_get_parameter_names_rmsnorm
      
      * rmsnorm fix
      b1954fd6
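      The gist of the fix: parameters of any normalization layer, RMSNorm included, should be excluded from weight decay, not just those of nn.LayerNorm. A sketch of the idea (matching on the class name is an illustrative heuristic, not the exact Trainer code):

          from torch import nn

          def get_decay_parameter_names(model):
              no_decay = set()
              for module_name, module in model.named_modules():
                  # Catches LayerNorm, LlamaRMSNorm, T5LayerNorm, ... by class name.
                  if "norm" in type(module).__name__.lower():
                      no_decay.update(
                          f"{module_name}.{n}" for n, _ in module.named_parameters()
                      )
              return [
                  name
                  for name, _ in model.named_parameters()
                  if name not in no_decay and not name.endswith("bias")
              ]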
    • apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True (#35582) · 2ba040a7
      Dmitry Tarasov authored
      
      * apply_chat_template: consistent return_tensors behaviour with return_assistant_tokens_mask flag
      
      * test_chat_template_return_assistant_tokens_mask: support tokenizers with no attention mask
      
      * test_chat_template_return_assistant_tokens_mask: skip tokenizers with no padding token
      
      * test_chat_template_return_assistant_tokens_mask: force tokenizer padding_side=right
      
      ---------
      
      Co-authored-by: Eduard Allakhverdov <goncharova@airi.net>
      Co-authored-by: d.tarasov <d.tarasov@airi.net>
      2ba040a7
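      With the fix, the two flags compose: requesting tensors no longer drops the assistant tokens mask. A hedged usage sketch, assuming a tokenizer whose chat template marks assistant turns with {% generation %} blocks (required for the mask); the model id is hypothetical:

          from transformers import AutoTokenizer

          tokenizer = AutoTokenizer.from_pretrained("some-org/some-chat-model")  # hypothetical id
          chats = [[
              {"role": "user", "content": "Hi"},
              {"role": "assistant", "content": "Hello!"},
          ]]
          encoded = tokenizer.apply_chat_template(
              chats,
              return_dict=True,
              return_tensors="pt",
              return_assistant_tokens_mask=True,
          )
          # assistant_masks is aligned with input_ids and marks assistant-generated tokens.
          print(encoded["input_ids"].shape, encoded["assistant_masks"])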
    • Fix custom kernel for DeformableDetr, RT-Detr, GroundingDINO, OmDet-Turbo in Pytorch 2.6.0 (#35979) · 9c02cb62
      Pavel Iakubovskii authored
      Updates type().is_cuda() -> .is_cuda(); .data<> -> .data_ptr<>
      9c02cb62
    • Qwen2-VL: fix rope delta calculation (#36013) · 5d75a25b
      Raushan Turganbay authored
      * fix rope deltas calculation
      
      * add test
      
      * style
      5d75a25b
  4. 03 Feb, 2025 3 commits
  5. 31 Jan, 2025 4 commits
    • use torch 2.6 for daily CI (#35985) · 62db3e6e
      Yih-Dar authored
      
      use torch 2.6 for CI
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      62db3e6e
    • Add GOT-OCR 2.0 to Transformers (#34721) · 2b469431
      Yoni Gozlan authored
      * init modular got_ocr2
      
      * Get correct got_ocr architecture
      
      * add processing
      
      * run modular with processing
      
      * add working inference
      
      * apply modular
      
      * Refactor and fix style
      
      * Refactor, cleanup, fix style
      
      * fix init order
      
      * Fix docs
      
      * add base modeling tests
      
      * fix style and consistency
      
      * rename doc file
      
      * fix repo consistency
      
      * fix inference with box
      
      * add image processing and support for crop_to_multi_page
      
      * Fix batch inference
      
      * add tests
      
      * fixup
      
      * fix slow test
      
      * fix docstrings
      
      * Add model doc
      
      * update to new init
      
      * fix input autocast pixel_values dtype
      
      * update doc
      
      * move doc to multimodal
      
      * Reformat crop_image_to_patches and add docstrings
      
      * Fix example in forward docstring
      
      * Address Pablo review
      
      * [run slow] got_ocr2
      
      * remove defaults defined twice
      
      * apply modular
      
      * add torch_device to integration tests
      
      * update modular
      
      * follow-up Pavel review
      
      * add device variable in doc
      
      * fix doc multi-page
      
      * Force eager attention for vision encoder to avoid attn implementation conflict
      
      * revert qwen2vl doc changes
      
      * use Qwen2ForCausalLM instead of Qwen2Model
      
      * make fixup
      
      * refactor gotocr2 to llava style
      
      * uniformize function names and reduce checks
      
      * final nits
      
      * fix pixel_values dtype error
      
      * change checkpoint names
      
      * fix modular
      2b469431
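      A minimal inference sketch for the new model; the checkpoint id is an assumption based on the PR and may differ from the one actually published:

          import torch
          from transformers import AutoProcessor, AutoModelForImageTextToText

          checkpoint = "stepfun-ai/GOT-OCR-2.0-hf"  # assumed checkpoint id
          device = "cuda" if torch.cuda.is_available() else "cpu"
          processor = AutoProcessor.from_pretrained(checkpoint)
          model = AutoModelForImageTextToText.from_pretrained(checkpoint).to(device)

          inputs = processor("document.png", return_tensors="pt").to(device)  # local image path
          generated = model.generate(**inputs, do_sample=False, max_new_tokens=256)
          new_tokens = generated[0, inputs["input_ids"].shape[1]:]
          print(processor.decode(new_tokens, skip_special_tokens=True))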
    • [Moshi] disable automatic compilation if the model can't compile (#35992) · 5bbee12a
      Joao Gante authored
      moshi cant compile
      5bbee12a
    • [Moonshine] compute head_dim_padding at init (#35984) · e6f4a4eb
      eustlb authored
      compute head_dim_padding at init
      e6f4a4eb
  6. 30 Jan, 2025 5 commits
    • Add support for nested images to LLava and VipLLava (#35558) · d7188ba6
      Yoni Gozlan authored
      * move make_flat_list_of_images and make_batched_videos to image_utils
      
      * remove unnecessary is_vision_available
      
      * move make_nested_list_of_images to image_utils
      
      * fix fast pixtral image processor
      
      * fix import mllama
      
      * fix make_nested_list_of_images
      
      * add tests
      
      * convert 4d arrays/tensors to list
      
      * add test_make_batched_videos
      
      * add support nested batch of videos
      
      * fix image processing qwen2vl
      d7188ba6
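      What "nested images" buys you: each sample in a batch may now carry a different number of images. A hedged sketch with a LLaVA checkpoint (the prompt format is an assumption):

          import numpy as np
          from PIL import Image
          from transformers import AutoProcessor

          processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
          blank = lambda: Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8))

          images = [[blank()], [blank(), blank()]]  # 1 image for sample 0, 2 for sample 1
          prompts = [
              "USER: <image>\nDescribe the image. ASSISTANT:",
              "USER: <image><image>\nCompare the images. ASSISTANT:",
          ]
          inputs = processor(images=images, text=prompts, padding=True, return_tensors="pt")
          print(inputs["pixel_values"].shape)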
    • Handle empty change indices in SAM's mask to rle conversion (#35665) · e4227eb4
      Marcel authored
      * Handle empty change indices in RLE conversion for masks
      
      * [test] Add unit tests for RLE encoding of masks in SamProcessor
      
      * [test] Update RLE conversion tests to use TensorFlow implementation
      
      * [test] Fix formatting in SamProcessorTest according to check_code_quality action
      
      * [test] Fix formatting in SamProcessorTest according to check_code_quality
      
      * [test] Refactored rle test cases into one test and used tf tensors in tf test cases
      
      * [test] Fix: removed self parameter from refactored methods
      
      * [test] Removed nested methods in run-length encoding tests for PyTorch and TensorFlow
      
      * [test] Added description to individual to run-length encoding tests for PyTorch and TensorFlow.
      e4227eb4
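      The bug being fixed is the degenerate mask: when a binary mask is all zeros or all ones there are no value changes, so the array of change indices is empty and naive indexing breaks. A standalone sketch of COCO-style column-major RLE that handles this case (illustrative, not the library's code):

          import numpy as np

          def mask_to_rle(mask):
              h, w = mask.shape
              flat = mask.flatten(order="F")  # column-major, as in COCO RLE
              changes = np.nonzero(flat[1:] != flat[:-1])[0] + 1
              points = np.concatenate([[0], changes, [flat.size]])
              counts = np.diff(points).tolist()
              if flat[0] == 1:  # counts must start with the run of leading zeros
                  counts = [0] + counts
              return {"size": [h, w], "counts": counts}

          # Empty change indices no longer crash:
          print(mask_to_rle(np.zeros((2, 2), dtype=np.uint8)))  # {'size': [2, 2], 'counts': [4]}
          print(mask_to_rle(np.ones((2, 2), dtype=np.uint8)))   # {'size': [2, 2], 'counts': [0, 4]}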
    • not to use A100 for `benchmark.yml` (#35974) · 47bd4296
      Yih-Dar authored
      
      fix
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      47bd4296
    • Support batching for UsefulSensors Moonshine (#35922) · 693328f2
      Nat Jeffries authored
      
      * Add support for attention masking in moonshine.
      
      Tested against Open ASR Leaderboard with batch size 256.
      
      * Update comments and ensure attention masks are passed everywhere.
      
      Perform attention mask downsampling inside of moonshine forward call.
      
      * Hide padding behind conditional. Fix encoder/decoder masking.
      
      - Correctly pipe encoder attention mask into decoder
      - Add correct scaling factor if one is not already provided.
      - Fix formatting with ruff
      
      * Add auto generated modeling_moonshine file.
      
      * Update formatting in generated model file.
      
      * Address review comments.
      
      * Fix typo.
      
      * Add `pad_head_dim_to_multiple_of` to moonshine config.
      
      * Correct args order for MoonshineConfig.
      
      * Update configuration moonshine too.
      
      * Update src/transformers/models/moonshine/modular_moonshine.py
      
      * Update src/transformers/models/moonshine/configuration_moonshine.py
      
      ---------
      
      Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
      693328f2
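      With attention masking in place, padded batches work end to end. A hedged usage sketch (the checkpoint id and the usual speech-seq2seq API are assumptions):

          import numpy as np
          from transformers import AutoProcessor, MoonshineForConditionalGeneration

          checkpoint = "UsefulSensors/moonshine-tiny"  # assumed checkpoint id
          processor = AutoProcessor.from_pretrained(checkpoint)
          model = MoonshineForConditionalGeneration.from_pretrained(checkpoint)

          clips = [np.random.randn(16000), np.random.randn(8000)]  # 1 s and 0.5 s at 16 kHz
          inputs = processor(clips, sampling_rate=16000, padding=True, return_tensors="pt")
          generated = model.generate(**inputs)  # attention_mask keeps padding out of attention
          print(processor.batch_decode(generated, skip_special_tokens=True))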
    • Less flaky for `TimmBackboneModelTest::test_batching_equivalence` (#35971) · 57576818
      Yih-Dar authored
      
      * fix
      
      * remove is_flaky
      
      * fix
      
      ---------
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      57576818