- 12 Feb, 2025 17 commits
-
Zach Mueller authored
* Add more rigorous non-slow grad accum tests * Further nits * Re-add space * Readability * Use tinystories instead * Revert transformer diff * tweak thresholds
-
Ke Wen authored
Update doc about models' TP support
-
hsilva664 authored
* Adding option to save/reload scaler * Removing duplicate variable * Adding save/reload test * Small fixes on deterministic algorithm call * Moving LLM test to another file to isolate its environment * Moving back to old file and using subprocess to run test isolated * Reverting back accidental change * Reverting back accidental change
-
kang sheng authored
* Fix multi gpu loss sync condition, add doc and test * rename function and class * loss should not scale during inference * fix typo
-
zhuHQ authored
* Added APOLLO optimizer integration * fix comment * Remove redundancy: Modularize low-rank optimizer construction * Remove redundancy: Remove useless comment * Fix comment: Add typing * Fix comment: Rewrite apollo desc
-
Dmitry Rogozhkin authored
* multi-gpu: fix inputs_embeds + position_embeds

  Fixing the following error in a few models:
  ```
  > hidden_states = inputs_embeds + pos_embeds
  E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:2 and xpu:3!
  ```
  Fixes: #35762
* multi-gpu: fix tensor device placements for various models

  Fixes: #35762
* Apply make fix-copies
---------
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
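The pattern behind this fix is to move the position embeddings onto the inputs' device before the add. A toy stand-in (`FakeTensor` is not a real torch type) reproduces the failure mode and the one-line fix, which in the real models is `pos_embeds.to(inputs_embeds.device)`:

```python
# Toy model of the device-mismatch bug and its fix. FakeTensor is a
# stand-in so the pattern can be shown without torch; it only tracks a
# `device` string and refuses to add tensors on different devices.
from dataclasses import dataclass

@dataclass
class FakeTensor:
    data: list
    device: str

    def to(self, device: str) -> "FakeTensor":
        # returns a copy "moved" to the target device
        return FakeTensor(list(self.data), device)

    def __add__(self, other: "FakeTensor") -> "FakeTensor":
        if self.device != other.device:
            raise RuntimeError(
                "Expected all tensors to be on the same device, but found "
                f"at least two devices, {self.device} and {other.device}!"
            )
        return FakeTensor([a + b for a, b in zip(self.data, other.data)],
                          self.device)

inputs_embeds = FakeTensor([1.0, 2.0], "xpu:2")
pos_embeds = FakeTensor([0.5, 0.5], "xpu:3")

# the fix: align devices before adding
hidden_states = inputs_embeds + pos_embeds.to(inputs_embeds.device)
```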
-
Lucain authored
* Remove cache migration script * remove dummy move_cache
-
dependabot[bot] authored
Bump cryptography from 43.0.1 to 44.0.1 in /examples/research_projects/decision_transformer (#36142) Bump cryptography in /examples/research_projects/decision_transformer Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.1 to 44.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/43.0.1...44.0.1 ) --- updated-dependencies: - dependency-name: cryptography dependency-type: direct:production ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bump transformers in /examples/research_projects/vqgan-clip Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0 ) --- updated-dependencies: - dependency-name: transformers dependency-type: direct:production ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Leon Engländer authored
Replace In-Place Operations for Deberta and Deberta-V2
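The motivation for removing in-place operations can be shown with a list-based toy (this is not the Deberta code): an in-place update silently mutates every alias of the buffer, which is what breaks tracing and autograd in some settings, while the out-of-place form leaves its inputs untouched.

```python
# Toy contrast (lists stand in for tensors): in-place addition mutates
# the caller's buffer, out-of-place returns a fresh one. Illustrative
# only; the actual change replaces torch in-place ops in Deberta(-V2).

def add_in_place(x, y):
    for i, v in enumerate(y):
        x[i] += v  # mutates x, and every other reference to it
    return x

def add_out_of_place(x, y):
    return [a + b for a, b in zip(x, y)]  # inputs stay untouched

hidden = [1, 2, 3]
residual = hidden  # a second reference to the same buffer
out = add_out_of_place(hidden, [10, 10, 10])
# residual still sees [1, 2, 3]; add_in_place would have clobbered it
```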
-
Joao Gante authored
rm deprecated/inoperative commands
-
Raushan Turganbay authored
* fix cached tests * fix some tests * fix pix2struct * fix
-
Sambhav Dixit authored
* Reload transformers fix from cache * add imports * add test fn for clearing import cache * ruff fix to core import logic * ruff fix to test file * fixup for imports * fixup for test * lru restore * test check * fix style changes * added documentation for use case * fixing --------- Co-authored-by:
sambhavnoobcoder <indosambahv@gmail.com>
-
Joao Gante authored
* remove redundant test * delete another test * revert default max_length * (wrong place, moving)
-
MilkClouds authored
* feat: added warning to Trainer when label_names is not specified for PeftModel * Update trainer.py * feat: peft detect with `_is_peft_model` * Update src/transformers/trainer.py Co-authored-by:
Benjamin Bossan <BenjaminBossan@users.noreply.github.com> * Applied formatting in trainer.py --------- Co-authored-by:
Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
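The warning this commit adds can be sketched as follows. The function below is illustrative, not the Trainer code: when a PEFT-wrapped model (detected in transformers via `_is_peft_model`) is trained without explicit `label_names`, the Trainer cannot infer the label columns reliably, so it warns.

```python
# Hedged sketch of the new warning. `warn_if_missing_label_names` is a
# hypothetical helper; in transformers the check happens inside Trainer
# using `_is_peft_model(model)`.
import warnings

def warn_if_missing_label_names(is_peft_model: bool, label_names) -> bool:
    """Return True (and emit a UserWarning) when the warning applies."""
    if is_peft_model and not label_names:
        warnings.warn(
            "No `label_names` provided for a PeftModel; "
            "pass `label_names` explicitly so labels are detected.",
            UserWarning,
        )
        return True
    return False
```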
-
nhamanasu authored
* add RAdamScheduleFree optimizer * revert schedulefree version to the minimum requirement * refine is_schedulefree_available so that it can take min_version * refine documents --------- Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Harry Mellor authored
* Add `base_model_pp_plan` to `PretrainedConfig`
* Add `_pp_plan` to `PreTrainedModel`
* Add both to Llama for testing
* Fix type error
* Update to suggested schema
* `_pp_plan` keys are not patterns
* Simplify schema
* Fix typing error
* Update input name for Llama
* Add pp plan to Aria
* Add pp plan to Bamba
* Add pp plan to Cohere 1 & 2
* Add pp plan to diffllama and emu3
* Add pp plan to Gemma 1 & 2
* Add pp plan to GLM and GPT NeoX
* Add pp plan to Granite and Helium
* Add pp plan to Mistral and Mixtral
* Add pp plan to OLMo 1 & 2
* Add pp plan to Phi and Phi 3
* Add pp plan for Qwen 2, 2 MoE, 2 VL and 2.5 VL
* Add pp plan for Starcoder 2
* Add enum for accessing inputs and outputs
* Update type hints to use tuples
* Change outer list to tuple
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
- 11 Feb, 2025 10 commits
-
Fanli Lin authored
* update awq doc
* Update docs/source/en/quantization/awq.md
* Update docs/source/en/quantization/awq.md
* Update docs/source/en/quantization/awq.md
* Update docs/source/en/quantization/awq.md
* add note for inference
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Fanli Lin authored
fix
-
Sambhav Dixit authored
* make output_dir optional * initiated a basic testing module to validate and verify the changes * Test output_dir default to 'tmp_trainer' when unspecified. * test existing functionality of output_dir. * test that output dir only created when needed * final check * added doc string and changed the tmp_trainer to trainer_output * make style fixes to test file. * another round of fixup --------- Co-authored-by:
sambhavnoobcoder <indosambahv@gmail.com>
-
Arthur authored
-
Pablo Montalvo authored
* make explicit gpu dep * [run-slow] bamba
-
Hicham Tala authored
* Remove unused `max_size` variable in processor which was always `None` and triggered an unnecessary deprecation warning * Remove unused `max_size` variable in processor which was always `None` and triggered an unnecessary deprecation warning * Remove deprecated warnings and eliminate `max_size` usage * Test use `int` as argument for `size` Add a test to ensure the case passes and backward compatibility is preserved * The test pipelines still use `max_size` Remove `max_size` from test pipelines and replace it with a `size` `Dict` with `'shortest_edge'` `'longest_edge'` as keys * Reformatting * Reformatting * Revert "Reformatting" This reverts commit c3040acee75440357cffd1f60c9d29ff5b2744b8. * Revert "Reformatting" This reverts commit ac4522e5c9a02d2d0c298295026db68ea26453df. * Revert "The test pipelines still use `max_size`" This reverts commit eaed96f041ffc32459536e1524d87f7a12ddee29. * Revert "Test use `int` as argument for `size`" This reverts commit 1925ee38c7c5eabb11832316712df1d4ba8043d0. * Revert "Remove deprecated warnings and eliminate `max_size` usage" This reverts commit d8e7e6ff9025931468fc1f3827cda1fa391003d5. * Change version `4.26` to "a future version" * Reformatting * Revert "Change version `4.26` to "a future version"" This reverts commit 2b53f9e4
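The migration this commit performs can be sketched in a few lines: the deprecated integer `max_size` argument is folded into a `size` dict keyed by `'shortest_edge'` / `'longest_edge'`. `normalize_size` is a hypothetical helper for illustration, not the transformers image-processor API.

```python
# Hedged sketch of the max_size -> size-dict migration. Illustrative
# helper; the real processors validate and normalize `size` internally.

def normalize_size(size, max_size=None):
    if isinstance(size, int):
        # legacy int shorthand means "resize shortest edge to this value"
        size = {"shortest_edge": size}
    if max_size is not None:
        # fold the deprecated max_size into the dict form
        size = {**size, "longest_edge": max_size}
    return size
```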
-
湛露先生 authored
Signed-off-by:
zhanluxianshen <zhanluxianshen@163.com>
-
Maxim Evtush authored
* Update tools.py * Update text_generation.py * Update question_answering.py
-
Pavel Iakubovskii authored
* Add is_torch_greater_or_equal test decorator * Add common test for torch.export * Fix bit * Fix focalnet * Fix imagegpt * Fix seggpt * Fix swin2sr * Enable torch.export test for vision models * Enable test for video models * Remove json * Enable for hiera * Enable for ijepa * Fix detr * Fix conditional_detr * Fix maskformer * Enable test maskformer * Fix test for deformable detr * Fix custom kernels for export in rt-detr and deformable-detr * Enable test for all DPT * Remove custom test for deformable detr * Simplify test to use only kwargs for export * Add comment * Move compile_compatible_method_lru_cache to utils * Fix beit export * Fix deformable detr * Fix copies data2vec<->beit * Fix typos, update test to work with dict * Add seed to the test * Enable test for vit_mae * Fix beit tests * [run-slow] beit, bit, conditional_detr, data2vec, def...
-
Arthur authored
fix some missing atols
-
- 10 Feb, 2025 13 commits
-
ivarflakstad authored
-
Joao Gante authored
* shape checks compatible with static cache * add test * tmp * manually turn on eager attn when we want to output attn * typo * generalize to encoder-decoder models * force compilation on cpu * tmp commit * fix static cache shape checks * models with odd caches * fix copies * shorter cache search loop * use decoder_past_key_values everywhere * better test variable names and comments * signature * rename _check_outputs into _check_generate_outputs * add comments * HybridCache future test note
-
Marc Sun authored
fix
-
kkscilife authored
fix file name Co-authored-by:
kkscilife <qa-caif-cicd@pjlab.org.cn>
-
jiqing-feng authored
* remove cross attention
* remove is_decoder
* fix pkv
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
-
Yoni Gozlan authored
remove multithreaded image conversion Co-authored-by:
Marc Sun <57196510+SunMarc@users.noreply.github.com>
-
Yih-Dar authored
* fix * remove * fix --------- Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
Jingze Shi authored
* Add support for constant learning rate with cooldown * Add more warmup and cooldown methods to `get_wsd_schedule` * Add more warmup and decay methods to `get_wsd_schedule` * support num_training_steps and num_stable_steps for get_wsd_schedule * get wsd scheduler before the `num_training_steps` decision * fix code_quality * Update stable branch logic * Move stable stage decide to `get_wsd_schedule` * Update docstring of `get_wsd_schedule` * Update `num_train_steps` to optional * Update src/transformers/optimization.py Co-authored-by:
Marc Sun <57196510+SunMarc@users.noreply.github.com> --------- Co-authored-by:
Marc Sun <57196510+SunMarc@users.noreply.github.com>
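The shape of a warmup-stable-decay (WSD) schedule like the one `get_wsd_schedule` builds can be sketched as a standalone multiplier function: linear warmup, a constant "stable" stage, then a cooldown to zero (linear here for simplicity; the commit adds further warmup/decay variants). The real API wraps this in a `LambdaLR`; this pure-Python function only computes the per-step learning-rate multiplier.

```python
# Standalone WSD multiplier sketch: warmup -> stable -> decay.
# Illustrative only; transformers' get_wsd_schedule returns a scheduler
# and supports additional warmup/decay shapes.

def wsd_multiplier(step: int, num_warmup_steps: int,
                   num_stable_steps: int, num_decay_steps: int) -> float:
    if step < num_warmup_steps:
        # linear warmup from 0 to 1
        return step / max(1, num_warmup_steps)
    if step < num_warmup_steps + num_stable_steps:
        # constant "stable" stage at the peak learning rate
        return 1.0
    # linear cooldown toward zero, clamped at 0
    decay_step = step - num_warmup_steps - num_stable_steps
    return max(0.0, 1.0 - decay_step / max(1, num_decay_steps))
```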
-
Armaghan Shakir authored
* implement config and model building blocks * refactor model architecture * update model outputs * update init param to include use_fov_model * update param name in config * fix hidden_states and attentions outputs for fov * sort config * complete minor todos * update patching * update config for encoder * fix config * use correct defaults in config * update merge for compatibility with different image size * restructure encoder for custom configuration * make fov model compatible with custom config * replace word "decoder" with "fusion" * weight conversion script * fix fov squeeze * update conversion script (without test) * upload ruff image processing * create fast image processing * use torch interpolation for image processing * complete post_process_depth_estimation * config: fix imports and sort args * apply inference in weight conversion * use mllama script instead for weight conversion * clean weight conversion script * add depth-pro status in other files * fill docstring in config * formatting * more formatting * formatting with ruff * formatting with style * fix copied classes * add examples; update weight convert script * fix using check_table.py and isort * fix config docstring * add depth pro to sdpa docs * undo unintentional changes in configuration_gemma.py * minor fixes * test image processing * fixes and tests * more fixes * use output states from image_encoder instead * Revert "use output states from image_encoder instead" This reverts commit 2408ec54e4f27d2abbecdb8374e58f34d91d8e96.
* make embeddings dynamic * reshape output hidden states and attentions as part of computation graph * fix ruff formatting * fix docstring failure * use num_fov_head_layers in tests * update doc * check consistency with config * ruff formatting * update test case * fix ruff formatting * add tests for fov * use interpolation in postprocess * run and fix slow tests locally * use scaled_images_features for image and fov encoder * return fused_hidden_states in fusion stage * fix example * fix ruff * fix copyright license for all files * add __all__ for each file * minor fixes - fix download spell - add push_to_hub option - fix Optional type hinting - apply single loop for DepthProImageProcessor.preprocess * return list in post_process_depth_estimation * minor fixes - capitalize start of docstring - use ignore copy - fix examples - move docstring templates and custom output classes to top - remove "-> None" typehinting from __init__ - type hinting for forward passes - fix docstrings for custom output classes * fix "ruff check" * update upsample and projection * major changes: (image size and merge optimization) - add support for images of any size - optimize merge operation - remove image_size from config - use full names instead of B, C, H, W - remove interpolation from fusion stage - add interpolation after merge - move validations to config - update integration test - add type hints for functions * fix push_to_hub option in weights conversion * remove image_size in weights conversion * major changes in the architecture - remove all DepthProViT modules and support different backbones using the AutoModel API - set default use_fov_model to False - validate parameters in configuration - update interpolate function: use "nearest" for faster computation - update reshape_feature function: remove all special tokens, possible from different backbones - update merge function: use padding from config instead of merge_out_size - remove patch_to_batch and batch_to_patch conversions for now - calculate out_size dynamically in the encoder - leave head_mask calculation to the backbone - fix bugs with merge - add more comments - update tests * placeholder for unused config attributes * improve docs amid review * minor change in docs * further optimize merge * fix formatting * remove unused patch/batch conversion functions * use original F.interpolate * improve function naming * minor changes - use torch_int instead of int - use proper for newly initialized tensors - use user provided return_dict for patch_encoder - use if-else block instead in self.use_fov_model * rearchitect upsample block for improved modularity * update upsample keys in weight conversion * improve padding in merge_patches * use double-loop for merge * update comments * create feature_extractor, reduce some forward code * introduce config.use_mask_token in dinov2 * minor fixes * minor fixes for onnx * update __init__ to latest format * remove DepthProConfig.to_dict() * major changes in backbone * update config in weight conversion * formatting * converted model is fp32 * improve naming and docs for feature_extractor->reconstruct_feature_maps * minor fixes; amid review * create intermediate vars in func call * use torch.testing.assert_close * use ModuleList instead of Sequential and ModuleDict * update docs * include fov in integration tests * update docs * improve initialization of convolution layers * fix unused fov keys * update tests * ruff format * fix test, amid Kaiming initialization * add depthpro to toctree * add residual layer to _no_split_modules * architecture rework * Update src/transformers/models/depth_pro/image_processing_depth_pro.py Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * update docs * improve merge_patches * use flatten with fov_output * ruff formatting * update resources section in docs Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * fix typo "final_kernal_size" Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * fix output typehint for DepthProDepthEstimator Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * residual operation in 2 steps Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * use image_size instead of global patch_size in interpolation * replace all Sequential with ModuleList * update fov * update heads * fix and update conversion script for heads * ruff formatting * remove float32 conversion * use "Fov" instead of "FOV" in class names * use "Fov" instead of "FOV" in config docs * remove prune_heads * update fusion stage * use device in examples * update processor * ruff fixes * add do_rescale in image_processor_dict * skip test: test_fast_is_faster_than_slow * ruff formatting * DepthProImageProcessorFast in other files * revert antialias removal * add antialias in BaseImageProcessorFast * Revert "revert antialias removal" This reverts commit 5caa0bd8f9f7463b98410c04e6cfe8fef3adee18. * Revert "add antialias in BaseImageProcessorFast" This reverts commit 3ae1134780ae236872985523d9c0a444eabcc179. * update processor for grouping and antialias * try test_fast_is_faster_than_slow without "skip" or "flaky" * update checkpoint * update checkpoint * use @is_flaky for processor test * update checkpoint to "apple/DepthPro-hf" --------- Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com>
-
Raushan Turganbay authored
* revert * type check
-
Raushan Turganbay authored
* update * we need batched nested input to always process correctly * update a bit * fix copies
-
Raushan Turganbay authored
allow tuples of images
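"allow tuples of images" boils down to input normalization that accepts a single item, a list, or a tuple, and always returns a plain list. Transformers has a helper of this kind (`make_list_of_images`); this standalone version is simplified and illustrative, with strings standing in for image objects.

```python
# Simplified sketch of tuple-tolerant image input normalization.
# Illustrative only; the real helper also handles nested/batched inputs.

def make_list_of_images(images):
    if isinstance(images, (list, tuple)):
        return list(images)  # accept tuples as well as lists
    return [images]          # wrap a single image
```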
-