Commits · ee38fc31fb8b50f6a1903c11622d5e1ba01d463b · zhusg / transformers-new

21 Mar, 2024 13 commits

Llama: always convert the causal mask in the SDPA code path (#29663) · ee38fc31
Joao Gante authored 1 year ago
```
* always convert the mask

* rebase and fix copies
```
ee38fc31
Generate: remove legacy generation mixin imports (#29782) · 5ffef2a9
Joao Gante authored 1 year ago

5ffef2a9
Add support for `torch_dtype` in the run_mlm example (#29776) · ef6e371d
Jacky Lee authored 1 year ago
```
feat: add support for torch_dtype

Co-authored-by: Jacky Lee <jackylee328@gmail.com>
```
ef6e371d
Add deterministic config to `set_seed` (#29778) · 10d232e8
Zach Mueller authored 1 year ago
```
* Add deterministic config

* Add note on slowdown

* English fails me again
```
10d232e8
Silence deprecations and use the DataLoaderConfig (#29779) · f0bfb150
Zach Mueller authored 1 year ago
```
* Remove deprecations

* Clean
```
f0bfb150
Cast bfloat16 to float32 for Numpy conversions (#29755) · de627f5a
Matt authored 1 year ago
```
* Cast bfloat16 to float32 for Numpy conversions

* Add test
```
de627f5a
[`LlavaNext`] Fix llava next unsafe imports (#29773) · 73a73b41
Arthur authored 1 year ago
```
* path llava-next

* styling

* styling
```
73a73b41
Fix docker image build for `Latest PyTorch + TensorFlow [dev]` (#29764) · 2ddceef9
Yih-Dar authored 1 year ago
```
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
2ddceef9
fix issue with logit processor during beam search in Flax (#29636) · fd734be1
théo gigant authored 1 year ago
```
fix issue with logit processor in beam search in Flax
```
fd734be1

Allow `-OO` mode for `docstring_decorator` (#29689) · 691c3d73

Fixes
```
  File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 987, in <module>
    class AutoConfig:
  File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1011, in AutoConfig
    @replace_list_option_in_docstrings()
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/rv8xdwghdad9jv2w86b8g08kan9l6ksm-python3.11-transformers-4.38.2/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 966, in docstring_decorator
    lines = docstrings.split("\n")
            ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
```

691c3d73

OWL-ViT box_predictor inefficiency issue (#29712) · 9556054f

Rahul Vinod Vishwakarma authored 1 year ago


* Calculating box_bias at the start once, then reusing it at inference

* Updating the compute_box_bias function for backwards compatibility

* Caching compute_box_bias function

* Bux fix

* Update owlv2 accordingly to ensure repo consistency

* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>

* Fixup changes

* Made copied code consistent

* Co-authored by: nvbinh15 <binh.pdc01@gmail.com>

---------

Co-authored-by: Nguyen Van Binh <>
Co-authored-by: Nguyen Van Binh <binh.pdc01@gmail.com>

9556054f

Fixed typo in quantization_config.py (#29766) · 0639034a

Ash Kuroki authored 1 year ago

Update quantization_config.py

Fixed typo for clarity and correctness.

previous: input time
current: input type
// changed time to type to fix the typo

0639034a

[docs] Remove redundant `-` and `the` from custom_tools.md (#29767) · 5d1a58a6
Michael authored 1 year ago
```
[docs] Remove redundant  and  from custom_tools.md
```
5d1a58a6

20 Mar, 2024 16 commits

[`BC 4.37 -> 4.38`] for Llama family, memory and speed (#29753) · ff841900

Arthur authored 1 year ago

* attempt to fix

* the actual fix that works with compilation!

* this?

* temporary update

* nit?

* dispatcg to memory efficient?

* update both models that have static cache support

* fix copies fix compile

* make sure fix

* fix cohere and gemma

* fix beams?

* nit

* slipped through the cracks

* nit

* nits

* update

* fix-copies

* skip failing tests

* nits

ff841900

[`BitsAndBytesConfig`] Warning for unused `kwargs` & safety checkers for... · 8dd4ce6f

Benjamin Ye authored 1 year ago

[`BitsAndBytesConfig`] Warning for unused `kwargs` & safety checkers for `load_in_4bit` and `load_in_8bit` (#29761)

* added safety checkers for load_in_4bit and load_in_8bit on init, as well as their setters

* Update src/transformers/utils/quantization_config.py

typo correction for load_in_8bit setter checks

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

8dd4ce6f

Fix docker image build (#29762) · 17e4467f
Yih-Dar authored 1 year ago
```
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
17e4467f
Update test reqs to include sentencepiece (#29756) · c78f5772
Zach Mueller authored 1 year ago
```
* Update test reqs

* Clean
```
c78f5772

Add LLaVa-1.6, bis (#29586) · d91fd7f9

NielsRogge authored 1 year ago


* First draft

* Fix tests, add docs

* Improve docstrings

* Fix test

* Address comments

* Address comments

* Remove vocab_size attribute

* Remove batch_size

* Address comment

* Add image processor tests

* Support fx

* Update docstring

* Add support for 34b

* Convert 34b model

* Add integration tests

* Update checkpoints

* Convert vicuna-13b, remove doc tests

* Remove script

* Remove file

* Address comments

* Improve docstrings

* Deprecate vocab_size

* Remove aspect_ratio_setting

* Address comments

* Update READMEs

* Add tips about chat templates

* Fix tests

* Deprecate vocab_size safely

* Update tests

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>

d91fd7f9

Add correct batched handling for apply_chat_template (#29222) · 9d999481

Matt authored 1 year ago


* Add correct batched handling for apply_chat_template

* Fix warning method

* Add error for incompatible options

* expand tests

* Add a skip for markuplm

* Add skips for other layout models

* Skip for LayoutLMv2

* Slightly update the warning message

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* typo fix

* Update docstring for conversation kwarg

* Update return docstring

* Remove the warning, improve error message

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/test_tokenization_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/test_tokenization_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove return_dict=None

* Fix up some merge cruft

* More merge cruft

* Add another skip

* Add another skip

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

9d999481

SuperPointModel -> SuperPointForKeypointDetection (#29757) · 3c17c529
amyeroberts authored 1 year ago

3c17c529
v4.40.0.dev.0 · 1248f092
Arthur Zucker authored 1 year ago

1248f092

Support sharded safetensors in TF (#29350) · 11ef35e8

Matt authored 1 year ago


* Initial commit (still lots of unfinished bits)

* (Still untested) add safetensors sharding to save_pretrained

* Fix savetensors saving, update default shard size to match PT

* Add proper loading of TF-format safetensors

* Revert default size in case that changes things

* Fix incorrect index name

* Update loading priority

* Update tests

* Make the tests a little more stringent

* Expand tests

* Add sharded cross-test

* Fix argument name

* One more test fix

* Adding mlx to the list of allowed formats

* Remove irrelevant block for safetensors

* Refactor warning logging into a separate function

* Remove unused skip_logger_warnings arg

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Move function def

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

11ef35e8

fix jinja2 package version check (#29754) · 870bbb4c
Ricardo authored 1 year ago

870bbb4c

Update Mamba types and pass through use_cache attr to MambaModel (#29605) · 76b3b20f

Kola authored 1 year ago


* Update docstring for RMSNorm

* Update cache_params object to correct MambaCache type

* Update docstrings and type info

* Pass through use_cache

* ruff

* Reformat with 119 char limit per line (thanks Arthur)

* Pass through use_cache specifically to the backbone rather than all keyword arguments

* Update src/transformers/models/mamba/modeling_mamba.py

* Update src/transformers/models/mamba/modeling_mamba.py

* Update src/transformers/models/mamba/modeling_mamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/mamba/modeling_mamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tab

* Update src/transformers/models/mamba/modeling_mamba.py

* Update src/transformers/models/mamba/modeling_mamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

76b3b20f

[Tests] Remove unused code (#29737) · 776c9d3a
NielsRogge authored 1 year ago
```
Remove unused code
```
776c9d3a
fix galore layerwise with frozen params (#29743) · a1a74541
peterjc123 authored 1 year ago

a1a74541

fixed the issue of DPO trainer that using one node and mutiple GPUs and set... · 8692aa88

Peng Wei authored 1 year ago

fixed the issue of DPO trainer that using one node and mutiple GPUs and set the device_map='auto' (#29695)

* fixed the issue of DPO trainer that using one node and mutiple GPUs

* before update, add the assert

* run the ruff formatter

* Update src/transformers/trainer.py

Thank you.

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* remember to do make style and make quality before commit

* Update src/transformers/trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

8692aa88

Larger runner on CircleCI (#29750) · 243d0de9
Yih-Dar authored 1 year ago
```
larger runner

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
243d0de9
Tests: Musicgen tests + `make fix-copies` (#29734) · 1a5c500f
Joao Gante authored 1 year ago
```
* make fix-copies

* some tests fixed

* tests fixed
```
1a5c500f

19 Mar, 2024 9 commits

Fix `check_copies` not capturing the diff in model/paper title and link (#29724) · 66ce9593
Yih-Dar authored 1 year ago
```
* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
66ce9593

Llama: partial 4d masks (#29731) · 4294f0c3

Joao Gante authored 1 year ago


* partial 4d masks

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

4294f0c3

Clean-up generation tests after moving methods to private (#29582) · 425ba56c

Raushan Turganbay authored 1 year ago

* clean-up tests

* refine comments

* fix musicgen tests

* make style

* remove slow decorator from a test

* more clean-up

* fix other failing tests

425ba56c

Implementation of SuperPoint and AutoModelForKeypointDetection (#28966) · 56baa033

StevenBucaille authored 1 year ago


* Added SuperPoint docs

* Added tests

* Removed commented part

* Commit to create and fix add_superpoint branch with a new branch

* Fixed dummy_pt_objects

* Committed missing files

* Fixed README.md

* Apply suggestions from code review

Fixed small changes

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved ImagePointDescriptionOutput from modeling_outputs.py to modeling_superpoint.py

* Removed AutoModelForKeypointDetection and related stuff

* Fixed inconsistencies in image_processing_superpoint.py

* Moved infer_on_model logic simply in test_inference

* Fixed bugs, added labels to forward method with checks whether it is properly a None value, also added tests about this logic in test_modeling_superpoint.py

* Added tests to SuperPointImageProcessor to ensure that images are properly converted to grayscale

* Removed remaining mentions of MODEL_FOR_KEYPOINT_DETECTION_MAPPING

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixed from (w, h) to (h, w) as input for tests

* Removed unnecessary condition

* Moved last_hidden_state to be the first returned

* Moved last_hidden_state to be the first returned (bis)

* Moved last_hidden_state to be the first returned (ter)

* Switched image_width and image_height in tests to match recent changes

* Added config as first SuperPointConvBlock init argument

* Reordered README's after merge

* Added missing first config argument to SuperPointConvBlock instantiations

* Removed formatting error

* Added SuperPoint to README's de, pt-br, ru, te and vi

* Checked out README_fr.md

* Fixed README_fr.md

* Test fix README_fr.md

* Test fix README_fr.md

* Last make fix-copies !

* Updated checkpoint path

* Removed unused SuperPoint doc

* Added missing image

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Removed unnecessary import

* Update src/transformers/models/superpoint/modeling_superpoint.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added SuperPoint to _toctree.yml

---------

Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>

56baa033

[`GemmaConverter`] use user_defined_symbols (#29473) · 2f9a3edb

Arthur authored 1 year ago

* use user_defined_symbols

* fixup

* nit

* add a very robust test

* make sure all models are tested with the `pretrained_tokenizer_to_test`

* should we make sure we test all of them?

* merge

* remove the id

* fix test

* update

* ousies

* oups

* fixup

* fix copies check

* remove `pretrained_tokenizer_to_test`

2f9a3edb

[`Gemma`] final fixes to the modeling (#29729) · 8e2fc52e

Arthur authored 1 year ago


* gelu_pytorch_tanh

* Force config.hidden_act to be approx gelu

* Gemma bug fixes

* force_use_exact_gelu

* Update configuration_gemma.py

* Update modeling_gemma.py

* update

* update for simpler handling

* nit

* nit

* fixpup

* update

* also update the jax modeling!

* add `"gelu_pytorch_tanh": partial(nn.gelu, approximate=True),`

* fixup

* fix order

* act vs act_fn

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

8e2fc52e

[tests] add more tests to `NOT_DEVICE_TESTS` (#29670) · 229ac72b
Fanli Lin authored 1 year ago
```
* add more tests

* remove 2 tests

* add more tests
```
229ac72b

FEAT / Optim: Add GaLore optimizer (#29588) · f6261d7d

Younes Belkada authored 1 year ago


* add galore v1

* add import

* add tests and doc

* fix doctest

* forward contrib credits from discussions

* forward contrib credits from discussions

* Apply suggestions from code review

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix failing tests'

* switch to `optim_target_modules` and clarify docs

* more clarification

* enhance lookup logic

* update a test to add peak memory

* add regex, all-linear and single string support

* add layer-wise optimization through DummyOptimizers and LRSchedulers

* forward contrib credits from discussions and original idea

* add a section about DDP not supported in layerwise

* Update src/transformers/trainer.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix self

* check only if layer_wise

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* oops

* make use of intervals

* clarify comment

* add matching tests

* GaLoRe -> GaLore

* move to `get_scheduler`

* add note on docs

* add a warning

* adapt a bit the docs

* update docstring

* support original API

* Update docs/source/en/trainer.md

* slightly refactor

* Update docs/source/en/trainer.md

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix args parsing and add tests

* remove warning for regex

* fix type hint

* add note about extra args

* make `is_regex` return optional

---------

Co-authored-by: Maxime <maximegmd @users.noreply.github.com>
Co-authored-by: Wing Lian <winglian @users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

f6261d7d

Use logging.warning instead of warnings.warn in pipeline.__call__ (#29717) · 484e10f7
Motoki Wu authored 1 year ago
```
* Use logging.warning instead of warnings.warn in pipeline.__call__

* Update src/transformers/pipelines/base.py
```
484e10f7

18 Mar, 2024 2 commits

Update the pipeline tutorial to include `gradio.Interface.from_pipeline` (#29684) · 838b87ab

Abubakar Abid authored 1 year ago


* Update pipeline_tutorial.md to include gradio

* Update pipeline_tutorial.md

* Update docs/source/en/pipeline_tutorial.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_tutorial.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update pipeline_tutorial.md

* Update docs/source/en/pipeline_tutorial.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

838b87ab

FIX [`bnb`] Make `unexpected_keys` optional (#29420) · c852d4fb

Younes Belkada authored 1 year ago


* make `unexpected_keys` optional

* push

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

c852d4fb