- 31 Mar, 2025 13 commits
-
-
ydshieh authored
-
ydshieh authored
-
ydshieh authored
-
cyyever authored
* Remove deprecated code
* fix get_loading_attributes
* fix error
* skip test
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
-
Robin Kahlow authored
rwkv: fix mask warning typo
-
Thien Tran authored
fix gemma3 embedding
-
huismiling authored
* add Cambricon MLUs support
* fix mlu device rng state
* up for quality check
* up mlu to support fp16
* fix mlu device dependency error
* fix mlu device dependency error
* enable mlu device for bf16
* fix mlu device memory tracker
* Cambricon support SDPA and flash_attn
* MLU devices: checks if `mlu` is available via a `cndev`-based check which won't trigger the drivers and leave mlu
* Fix mlu FA2 check. Remove deepspeed-mlu check. Add mlu tests support.
* fix testing errors.
* Merge branch 'hf/main' into main
* fix get_device_count error.
* fix mlu testing utils.
* fix code quality and style.
* switch to @require_torch_multi_accelerator
-
jiqing-feng authored
* fix whisper re-compile
* fix copy
* fix comment
* fix copies
* revert useless changes
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
-
jiqing-feng authored
* enable tp on CPU
* get rank from cpu
* update
* enable TP tests
* fix comment
* em print
* fix model id
* fix conflict
* fix index and add doc
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
-
Qubitium-ModelCloud authored
fix 4090/ada not detected as having FP8 support
Signed-off-by: Qubitium <qubitium@modelcloud.ai>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
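For context on the kind of bug this fix addresses: FP8 (E4M3/E5M2) kernels require CUDA compute capability 8.9 or newer, and Ada cards such as the RTX 4090 report capability (8, 9), so a naive `major >= 9` test wrongly rejects them. The sketch below is a hypothetical illustration of a correct check (the function name and shape are assumptions, not the actual patch):

```python
def supports_fp8(major: int, minor: int) -> bool:
    """Hypothetical capability check: FP8 requires compute capability >= 8.9.

    Comparing the (major, minor) tuple handles Ada (sm_89, e.g. RTX 4090)
    correctly, whereas a majors-only comparison would exclude it.
    """
    return (major, minor) >= (8, 9)

# torch.cuda.get_device_capability() reports (8, 9) on an RTX 4090
# and (9, 0) on an H100; both should pass, while (8, 6) (RTX 3090) should not.
print(supports_fp8(8, 9), supports_fp8(9, 0), supports_fp8(8, 6))
```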
-
efsotr authored
* support passing flash_attn_kwargs when gradient_checkpointing is enabled
* make modeling_deepseek_v3.py consistent with modular_deepseek_v3.py
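The first bullet touches a common pitfall: checkpointing wrappers (such as `torch.utils.checkpoint.checkpoint` in its reentrant form) forward positional arguments only, so keyword arguments like `flash_attn_kwargs` must be bound to the callable before it is handed to the wrapper. A minimal stdlib-only sketch of that binding pattern, with `checkpoint_like` and `layer_forward` as hypothetical stand-ins rather than transformers APIs:

```python
from functools import partial


def checkpoint_like(fn, *args):
    # Stand-in for a gradient-checkpointing wrapper that only forwards
    # positional arguments; keyword arguments passed here would be lost.
    return fn(*args)


def layer_forward(hidden_states, flash_attn_kwargs=None):
    # Hypothetical decoder-layer forward; echoes its inputs so the
    # keyword binding is observable.
    return hidden_states, flash_attn_kwargs


kwargs = {"sliding_window": 4096}
# Bind the keyword arguments *before* handing the callable to the
# positional-only wrapper, so they survive the call.
out, seen = checkpoint_like(partial(layer_forward, flash_attn_kwargs=kwargs), "h")
```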
-
Yuan Wu authored
* Gaudi: fix the issue of `is_torch_hpu_available()` returning false
* Fix make fixup
* Add comments for the implicit behavior of import
* Update src/transformers/utils/import_utils.py
* Update src/transformers/utils/import_utils.py
---------
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
-
Bo Zheng authored
* Initial commit for Qwen3
* fix and add tests for qwen3 & qwen3_moe
* rename models for tests.
* fix
* fix
* fix and add docs.
* fix model name in docs.
* simplify modular and fix configuration issues
* Fix the red CI: ruff was updated
* revert ruff, version was wrong
* fix qwen3moe.
* fix
* make sure MOE can load
* fix copies
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
-
- 30 Mar, 2025 1 commit
-
-
MinJu-Ha authored
* fix: manual edits
* fix: resolve suggestions
* Update toctree.yml
-
- 28 Mar, 2025 15 commits
-
-
Yih-Dar authored
* kenlm
* kenlm
* kenlm
* kenlm
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Joao Gante authored
* yoink
* same pattern in all cache
-
Joao Gante authored
* handle jagged beams
* better comment
* bart -- beam search tests print special tokens
* more bart test updates
* more tests!
* better comment
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
-
Cyril Vallez authored
* up
* typo
* update doc
* Update attention_interface.md
-
Cyril Vallez authored
* Update modeling_utils.py
* Update modeling_utils.py
-
Zach Mueller authored
* Update w/ new account
* DS
-
Yih-Dar authored
* fix
* comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Minho Ryu authored
* init commit
* style
* take comments into account
* add deepseekv3 modeling
* remove redundant code
* apply make style
* apply fix-copies
* make format
* add init files
* rename deepseekv3 into deepseek_v3 based on its model_type
* rename deepseekv3 into deepseek_v3 based on its model_type
* deepseek-v3 not deepseek_v3
* set model_type as deepseek_v3
* use default docs
* apply make
* fill type and docstring
* add rope_config_validation
* use custom DeepseekV3MLP
* hold code only for checkpoints configuration; remove redundant
* revise rope yarn for DeepSeek variation
* rename DeepSeek-V3
* some refactoring
* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral
* fix attention forward
* use -1 for not-changing dim when to use expand
* refactor DeepseekV3TopkRouter
* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim with qk_head_dim
* register pre_hook and hook both
* make style
* use n_shared_experts
* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add test file
* update modeling_file according to modular file
* make style
* add mapping for DeepseekV3ForSequenceClassification
* remove aux_loss_alpha
* add deepseek_v3 for perf
* add deepseek_v3
* rename test as deepseekv3
* use tiny-deepseek-v3
* remove DeepseekV3ForSequenceClassification
* cache before padding
* remote output_router_logits
* Revert "remote output_router_logits"
  This reverts commit f264f800.
* remove output_router_logits
* make e_score_correction_bias as buffer
* skip tests not compatible
* make style
* make e_score_correction_bias as buffer
* use rope_interleave instead of load_hook
* skip tests not compatible with MLA
* add doc for rope_interleave
* fix typo
* remove torch.no_grad for selecting topk
* fix post merge issue
* merge with main and simplify
* nits
* final
* small fixes
* fix
* support TP better
* stash
* changes currently requires
* remove synch
* more fixes for TP
* temp fix for TP: some attention layers' FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used
* updates to have generation work!
* push most of the changes
* reorder functions + call for contributions!
* update readme
* nits
* update
* ruff was updated on main
* merge with main and fix copies
* revert unrelated changes
* route all tokens to all experts when testing to avoid no-gradient issues
* finish fixing all tests
* fixup
* nit
* clean config
* last readme changes
* nit
* do cnit
* typo
* last nit
* one more one more
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
-
Raushan Turganbay authored
* fix fp32 BLIP2
* no need to reorder that
* check for `Noneness` as well before casting dtype
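The `Noneness` check in the last bullet reflects a common defensive pattern: only cast a tensor when a dtype was explicitly requested, since casting against `None` (or assuming a default) can silently change precision. A minimal stdlib-only sketch of the pattern, using a hypothetical `FakeTensor` stand-in rather than a real torch tensor:

```python
class FakeTensor:
    """Minimal stand-in for a tensor exposing `.dtype` and `.to(dtype)`."""

    def __init__(self, dtype="float32"):
        self.dtype = dtype

    def to(self, dtype):
        # Returns a new "tensor" with the requested dtype, like torch's .to()
        return FakeTensor(dtype)


def maybe_cast(tensor, dtype=None):
    # Guard against dtype being None before casting, so a caller that did
    # not request a cast keeps the original precision.
    if dtype is not None:
        return tensor.to(dtype)
    return tensor
```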
-
cyyever authored
Change deprecated functions
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Yih-Dar authored
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
jp authored
* Add image_token_id and video_token_id handling in Llava processors
* fix: image to video
* fix: correct image and video token ID handling in Llava processors
* fix: improve image and video token ID handling in Llava processors
-
Manuel Faysse authored
* fix sdpa implementation
* ruff
* also modify 2_5 for consistency
-
- 27 Mar, 2025 11 commits
-
-
Perry Gibson authored
* bug: fully remove legacy cache from Llama
* bug: fix CI issues
* bug: update jetmoe model
* bug: apply `check_modular_conversion.py` fix
* bug: apply make fix-copies
* bug: fix ruff
* PR suggestions
* Remove trailing commas in auto-gen files
* Trivial new line removal
-
Finn-Ole Höner authored
-
cyyever authored
-
Prem Kumar M authored
Replace split with jnp's split function for flax models (#36854)
-
cyyever authored
-
cyyever authored
Fix typing for None-able variables
-
cyyever authored
* Avoid unnecessary tensor copy in loss computing
* Add type
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
Joao Gante authored
-
eustlb authored
* fix fft_bin_width computation
* update docstring + enforce correct params
* update test with correct value
* update test
* update feature extractors for concerned models
* update
* make
* update docstring
* update docstring
-
Raushan Turganbay authored
* add audio from video
* typos
* delete print
* comments
-