- 31 Mar, 2025 3 commits
-
efsotr authored
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
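The kwargs-through-checkpointing fix can be sketched in plain Python: reentrant activation checkpointing only forwards positional arguments, so keyword arguments such as the flash-attention ones have to be bound onto the layer call up front. The `checkpoint` stand-in, `layer_forward`, and the `cu_seq_lens_q` value below are illustrative, not the library code:

```python
from functools import partial

def checkpoint(fn, *args):
    # Stand-in for torch.utils.checkpoint.checkpoint, which in its
    # reentrant form only forwards positional arguments.
    return fn(*args)

def layer_forward(hidden_states, attention_mask=None, **flash_attn_kwargs):
    # Hypothetical decoder-layer forward that consumes flash-attn kwargs.
    return {"hidden": hidden_states, "mask": attention_mask, "kwargs": flash_attn_kwargs}

# Binding the keyword arguments first lets them survive the positional-only call.
flash_attn_kwargs = {"cu_seq_lens_q": [0, 4, 9]}
out = checkpoint(partial(layer_forward, **flash_attn_kwargs), [1, 2, 3], None)
```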
-
Yuan Wu authored
* Gaudi: fix the issue of is_torch_hpu_available() returning false
* Fix make fixup
* Add comments for the implicit behavior of import
* Update src/transformers/utils/import_utils.py
* Update src/transformers/utils/import_utils.py

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
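An availability check of this shape typically probes for the device package before importing it, because the import itself carries the side effect the commit comments on. A minimal sketch, assuming the Habana package name and not reproducing the actual transformers implementation:

```python
import importlib.util

def is_torch_hpu_available() -> bool:
    # Probe for the package first: importing habana_frameworks.torch has
    # the side effect of registering the "hpu" device with torch, so we
    # only attempt it when the distribution is actually installed.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    return importlib.util.find_spec("habana_frameworks.torch") is not None
```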
-
Bo Zheng authored
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
-
- 30 Mar, 2025 1 commit
-
MinJu-Ha authored
* fix: manual edits
* fix: resolve suggestions
* Update toctree.yml
-
- 28 Mar, 2025 15 commits
-
Yih-Dar authored
* kenlm
* kenlm
* kenlm
* kenlm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* yoink
* same pattern in all cache
-
Joao Gante authored
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
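"Jagged beams" refers to beam hypotheses that finish at different lengths and so cannot be stacked directly. The usual handling, shown here as a pure-Python sketch (the function name is illustrative), pads each hypothesis to the longest one:

```python
def pad_jagged_beams(beams, pad_token_id):
    # Beams finish at different lengths; pad each hypothesis on the right
    # so they can be stacked into one rectangular batch of token ids.
    max_len = max(len(b) for b in beams)
    return [b + [pad_token_id] * (max_len - len(b)) for b in beams]
```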
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Cyril Vallez authored
* up
* typo
* update doc
* Update attention_interface.md
-
Cyril Vallez authored
* Update modeling_utils.py
* Update modeling_utils.py
-
Zach Mueller authored
* Update w/ new account
* DS
-
Yih-Dar authored
* fix
* comment

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Minho Ryu authored
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits" (reverts commit f264f800)
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
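The "use rope_interleave instead of load_hook" bullet concerns how rotary dimensions are laid out in the checkpoint. A pure-Python sketch of the kind of permutation involved, under the assumption (not confirmed by this log) that interleaved weights store pairs as `[x0, y0, x1, y1, ...]` while a rotate-half implementation expects `[x0, x1, ..., y0, y1, ...]`; the function name is hypothetical:

```python
def deinterleave_rope_weights(row):
    # Split the interleaved layout [x0, y0, x1, y1, ...] into the
    # half-split layout [x0, x1, ..., y0, y1, ...] by taking the even
    # positions first and the odd positions second.
    return row[0::2] + row[1::2]
```

A config flag like `rope_interleave` lets the model apply this choice at runtime rather than permuting weights in a state-dict load hook.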
-
Raushan Turganbay authored
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
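The `Noneness` guard is the standard pattern for optional dtype casting: skip the cast entirely when no target dtype was requested. A minimal Python sketch with an illustrative helper name (the real code operates on torch tensors):

```python
def maybe_cast(values, dtype=None):
    # When no dtype is requested, return the input untouched; casting
    # unconditionally would fail (or silently change types) on None.
    if dtype is None:
        return values
    return [dtype(v) for v in values]
```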
-
cyyever authored
Change deprecated functions
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Yih-Dar authored
* fix
* fix
* fix
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
jp authored
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
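Processors of this kind expand each media placeholder token in the prompt so the text sequence has one slot per visual feature before embeddings are merged. A hedged sketch of that expansion (the function name, the token id 32000, and the patch count are illustrative, not Llava's actual values):

```python
def expand_media_tokens(input_ids, media_token_id, num_features):
    # Replace each single placeholder with num_features copies so the
    # token sequence lines up with the projected image/video features.
    out = []
    for tok in input_ids:
        out.extend([tok] * num_features if tok == media_token_id else [tok])
    return out
```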
-
Manuel Faysse authored
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
-
- 27 Mar, 2025 20 commits
-
Perry Gibson authored
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply check_modular_conversion.py fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
-
Finn-Ole Höner authored
-
cyyever authored
-
Prem Kumar M authored
Replace split with jnp's split function for flax models (#36854)
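`jnp.split` follows the `numpy.split` contract: an integer asks for that many equal sections, a sequence of indices asks for cuts at those positions. A pure-Python sketch of those semantics on lists (a stand-in for the array behavior, not the jax implementation):

```python
def split(seq, indices_or_sections):
    # Integer: divide into that many equal sections (error if not even).
    if isinstance(indices_or_sections, int):
        n = indices_or_sections
        size, rem = divmod(len(seq), n)
        if rem:
            raise ValueError("array split does not result in an equal division")
        cuts = [i * size for i in range(1, n)]
    else:
        # Sequence: cut at the given indices.
        cuts = list(indices_or_sections)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```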
-
cyyever authored
-
cyyever authored
Fix typing for None-able variables
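The typing fix in question is the usual one: a parameter that may be `None` must be annotated `Optional[int]` rather than `int` so static checkers flag callers that forget the `None` branch. A sketch with hypothetical names (not the variables touched by this commit):

```python
from typing import Optional

def resolve_head_dim(head_dim: Optional[int], hidden_size: int, num_heads: int) -> int:
    # Optional[int] documents that None is a legal input here; the body
    # must then handle it explicitly.
    return head_dim if head_dim is not None else hidden_size // num_heads
```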
-
cyyever authored
* Avoid unnecessary tensor copy in loss computing
* Add type
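The copy-avoidance idea can be shown in miniature: only materialize a converted copy when the input is not already in the target form, and otherwise return the input object itself. A list-based sketch (the real change operates on tensors on the loss path):

```python
def as_float_list(values):
    # Fast path: if every element is already a float, return the same
    # object instead of building a converted copy.
    if all(type(v) is float for v in values):
        return values
    return [float(v) for v in values]
```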
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Joao Gante authored
-
eustlb authored
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
-
Raushan Turganbay authored
* add audio from video
* typos
* delete print
* comments
-
Pavel Iakubovskii authored
* Fixup
* trigger
-
Sungyoon Jeong authored
* Optimize to_py_obj for python-native numeric lists and scalars
* Fix bug that tuple is not converted to list
* Try np.array for more robust type checking
* Apply review and add tests for to_py_obj
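The optimization and the tuple fix can both be sketched together: python-native scalars take a fast path with no conversion, tuples are normalized to lists, and anything tensor-like falls back to `tolist()`. This is a simplified illustration, not the library's implementation:

```python
def to_py_obj(obj):
    # Fast path: python-native scalars need no conversion at all.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # Tuples are converted to lists (the bug fixed in this commit),
    # and containers are converted recursively.
    if isinstance(obj, (list, tuple)):
        return [to_py_obj(o) for o in obj]
    # Fallback: framework tensors/arrays expose tolist().
    tolist = getattr(obj, "tolist", None)
    if tolist is not None:
        return to_py_obj(tolist())
    return obj
```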
-
jiqing-feng authored
* fix pegasus init weights
* fix the rest of models
* fix test
* fix informer init
* init weight before checking
* fix roformer tests
* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
-
Parteek authored
* Added conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py
* Updated Conversion Script
* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
-
Mohamed Mekkouri authored
* skip fp8 linear
* add capability check
* format
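A GPU capability check of the kind this commit adds usually compares the `(major, minor)` compute-capability tuple against the first generation with FP8 tensor cores. A hedged sketch (the function name and threshold are illustrative; the real code queries the device via torch):

```python
def supports_fp8(compute_capability):
    # FP8 tensor cores ship with Ada (8.9) and Hopper (9.0); on older
    # GPUs the FP8 linear path is skipped. Tuple comparison orders
    # (major, minor) pairs correctly.
    return compute_capability >= (8, 9)
```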
-
hoshi-hiyouga authored
* Update optimization.py
* Update optimization.py
-
Yih-Dar authored
* fix
* fix
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Kyle Sayers authored
support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
-
Michael Goin authored
-
- 26 Mar, 2025 1 commit
-
Abu Bakr Soliman authored
* push ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* update __init__ loading
* set imports for ModernBertForQuestionAnswering
* update ModernBertForQuestionAnswering
* remove debugging logs
* update init_weights method
* remove custom initialization for ModernBertForQuestionAnswering
* apply make fix-copies
* apply make style
* apply make fix-copies
* append ModernBertForQuestionAnswering to the pipeline supported models
* remove unused file
* remove invalid autoload value
* update en/model_doc/modernbert.md
* apply make fixup command
* make fixup
* Update dummies
* update usage tips for ModernBertForQuestionAnswering
* update usage tips for ModernBertForQuestionAnswering
* add init
* add lint
* add consistency
* update init test
* change text to trigger stuck text
* use self.loss_function instead of custom loss (by @Cyrilvallez)
* Update modeling_modernbert.py to make a comparable commit to even it out
* Match whitespace
* whitespace

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
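Extractive question-answering heads like the one added here conventionally emit two scores per token, which are then separated into start and end logits whose argmaxes bound the predicted answer span. A pure-Python sketch of that split (illustrative only; the model operates on tensors and splits the last dimension):

```python
def qa_logits(token_logits):
    # token_logits: one [start_score, end_score] pair per token.
    # Separate them into two per-token score lists; the argmax of each
    # list gives the predicted start and end positions of the answer.
    start = [pair[0] for pair in token_logits]
    end = [pair[1] for pair in token_logits]
    return start, end
```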
-