1. 08 Apr, 2025 10 commits
  2. 07 Apr, 2025 24 commits
  3. 05 Apr, 2025 5 commits
    • v4.52.0.dev0 · d1b92369
      Lysandre authored
    • Add llama4 (#37307) · 25b7f272
      Arthur authored
      * remove one of the last deps
      
      * update fast image processor after refactor
      
      * styling
      
      * more quality of life improvements
      
      * nit
      
      * update
      
      * cleanups
      
      * some cleanups
      
      * vllm updates
      
      * update fake image token
      
      * [convert] Fix typo
      
      * [convert] Strip extraneous bytes from shards
      
      * [convert] Minor fixes
      
      * [convert] Use num_experts
      
      * multi-image fixes in modeling + processor
      
      * fixup size
      
      * 128 experts
      
      * Use default rope
      
      * Unfuse mlp
      
      * simplify inputs_embeds merging a lot
      
      * remove .item() 👀

      * fix from review
      
      * Address feedback
      
      * Use None "default" for rope_scaling. Add eot.
      
      * set seed
      
      * return aspect ratios and bug fixes
      
      * Moe 128 rebased (#8)
      
      * 128 experts
      
      * Use default rope
      
      * Unfuse mlp
      
      * Address feedback
      
      * Use None "default" for rope_scaling. Add eot.
      
      * Meta/llama quant compat (#7)
      
      * add quant compatible model & conversion code for llama4
      
      * fix a few issues
      
      * fix a few issues
      
      * minor type mapping fix
      
      ---------
      
      Co-authored-by: Lu Fang <fanglu@fb.com>
      
      * use a new config parameter to determine which model definition to use for MoE
      
      ---------
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      Co-authored-by: Lu Fang <fanglu@fb.com>
      
      * un-comment write_tokenizer in the conversion script
      
      * remove unused imports
      
      * [llama4] Pop aspect_ratios from image processor output in Llama4Processor
      
      Signed-off-by: Jon Swenson <jmswen@gmail.com>
      
      * Fix parameter_count name
      
      * Update src/transformers/models/llama4/configuration_llama4.py
      
      * nit
      
      * Add changes for no_rope, moe_layers, chunked attention. Just need to test all
      
      * Update src/transformers/models/llama4/image_processing_llama4_fast.py
      
      * nit
      
      * fix post merge with main
      
      * support flex attention
      
      * fixes
      
      * fix
      
      * add layer
      
      * small updates
      
      * rebase and delete llm_compressor
      
      * nit
      
      * [llama4/mm] Add back <|image|> token that delimits global tile
      
      * [llama4/mm] Fix Llama 4 image processing unit tests
      
      * add explicit dtype
      
      Signed-off-by: Jon Swenson <jmswen@gmail.com>
      
      * sdpa works
      
      * comment todo small
      
      * fix model loading
      
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      
      * revert
      
      * nits
      
      * small fix for TP on 1 node
      
      * Read new params from config
      
      * Add <|eom|>
      
      * lol don't know how this got here
      
      * adding fp8
      
      * Save processor, fix chat template
      
      * style
      
      * Add boi/eoi tokens
      
      We don't use them.
      
      * fixes for now flex seems to work :)
      
      * updates
      
      * nits
      
      * updates
      
      * missing keys
      
      * add context parallel
      
      * update
      
      * update
      
      * fix
      
      * nits
      
      * add worldsize and make eager attn work for vision
      
      * Ignore new key present in base models
      
      * add tp_plan
      
      * fix nope
      
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      
      * minor fix
      
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      
      * Clean up Llama4 vision model
      
      * current updates
      
      * add support for `attn_temperature_tuning`
      
      * add floor scale
      
      * add missing attn scales
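      The floor-scale and attention-scale bullets above refer to Llama 4's position-dependent query scaling (`attn_temperature_tuning`) on layers without rotary embeddings. A minimal sketch of the log-floor form; the parameter names and default values here are assumptions for illustration, not values read from the merged config:

      ```python
      import math

      def attn_temperature_scale(position: int,
                                 floor_scale: float = 8192.0,
                                 attn_scale: float = 0.1) -> float:
          """Position-dependent multiplier applied to query states.

          Sketch only: scale is 1.0 at early positions and grows
          logarithmically with position, in steps of `floor_scale`.
          """
          return math.log(math.floor((position + 1) / floor_scale) + 1.0) * attn_scale + 1.0
      ```

      The intent is that attention logits are sharpened at long context, where softmax entropy would otherwise grow with sequence length.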
      
      * push what works, dirty trick for the device sync
      
      * oups
      
      * Fix pad_token_id
      
      See
      https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
      Confirmed in the original codebase.
      
      * fix CausalLM loading
      
      * rm
      
      * fix tied-weights
      
      * fix sdpa
      
      * push current version
      
      * should work with both short and long
      
      * add compressed_tensors & fix fbgemm tp
      
      * Fix flex impl
      
      * style
      
      * chunking
      
      * try to revert the potentially breaking change
      
      * fix auto factory
      
      * fix shapes in general
      
      * rm processing
      
      * commit cache utils cleanup
      
      * Fix context length
      
      * fix
      
      * allocate
      
      * update tp_plan
      
      * fix SDPA!
      
      * Add support for sparse `Llama4TextMoe` layer from the kernel hub
      
      * cleanup
      
      * better merge
      
      * update
      
      * still broken fixing now
      
      * nits
      
      * revert print
      
      * Write max_position_embeddings and max_model_length
      
      * Update modeling_llama4.py
      
      * Save attention_chunk_size
      
      * Sync eos terminators
      
      * Read initializer_range
      
      * style
      
      * remove `dict`
      
      * fix
      
      * eager should use `chunked_attention_mask`
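      For reference, the `chunked_attention_mask` mentioned here restricts causal attention to fixed-size chunks. A minimal boolean-mask sketch, assuming chunk-local causal attention (token i may attend to token j only when j ≤ i and both fall in the same chunk); the real implementation builds this as a tensor and combines it with padding masks:

      ```python
      def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
          """Entry [i][j] is True iff token i may attend to token j:
          causal (j <= i) and within the same attention chunk."""
          return [
              [j <= i and i // chunk_size == j // chunk_size for j in range(seq_len)]
              for i in range(seq_len)
          ]
      ```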
      
      * revert
      
      * fixup
      
      * fix config
      
      * Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"
      
      This reverts commit ccda19f0, reversing
      changes made to a515579a.
      
      * Fix typo and remove warning with compiled flex and chunked prefill
      
      * Fix MoE vs FF (#41)
      
      * fix
      
      * Use correct no_rope_layers if provided one is empty list
      
      * update tests
      
      * fix
      
      * skipping some tests
      
      * fix fp8 loading
      
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      
      * fix text generation pipeline
      
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      
      * eager needs 4D mask
      
      * fix
      
      * Some cleanup
      
      * fix
      
      * update
      
      * fix
      
      * replace correctly module
      
      * patch
      
      * modulelist
      
      * update
      
      * update
      
      * clean up
      
      * Don't move to `cuda:0` in distributed mode
      
      * restrict to compressed tensors for now
      
      * rm print
      
      * Docs!
      
      * Fixes
      
      * Update docs/source/en/model_doc/llama4.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Fixes
      
      * cuda graph fix
      
      * revert some stuff
      
      * fixup
      
      * styling
      
      * Update src/transformers/models/llama4/modeling_llama4.py
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixup
      
      * commit licence, cleanup here and there and style
      
      * more styling changes
      
      * fix dummies
      
      * fix and clean docstrings
      
      * remove comment
      
      * remove warning
      
      * Only fast image processor is supported
      
      * nit
      
      * trigger CI
      
      * fix issue with flex encoder
      
      * fix dynamic cache
      
      * Code quality
      
      * Code quality
      
      * fix more tests for now
      
      * Code quality
      
      * Code quality
      
      * Nuke bunch of failing stuff
      
      * Code quality
      
      * Code quality
      
      * cleanup removal of slow image processor
      
      * ruff fix fast image processor
      
      * fix
      
      * fix styling
      
      * Docs
      
      * Repo consistency
      
      * Repo consistency
      
      * fix sliding window issue
      
      * separate llama cache
      
      * styling
      
      * Repo consistency
      
      * Repo consistency
      
      * push what works
      
      * L4 Repo consistency
      
      * Docs
      
      * fix last remaining issues
      
      ---------
      
      Signed-off-by: Jon Swenson <jmswen@gmail.com>
      Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
      Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
      Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
      Co-authored-by: Lu Fang <fanglu@fb.com>
      Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
      Co-authored-by: Jon Swenson <jmswen@gmail.com>
      Co-authored-by: jmswen <jmswen@users.noreply.github.com>
      Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
      Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
      Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
      Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
      Co-authored-by: Marc Sun <marc@huggingface.co>
      Co-authored-by: drisspg <drisspguessous@gmail.com>
      Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
      Co-authored-by: Lysandre <hi@lysand.re>
      Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Hf Xet extra (#37305) · aa40fda3
      Lysandre Debut authored
      * Hf Xet extra
      
      * Hf Xet extra
    • Fix deepspeed loading (part 2) (#37306) · e9457158
      Cyril Vallez authored
      * fix
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * oops, remove print
    • Fix deepspeed loading (#37281) · 84aa13dd
      Cyril Vallez authored
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * fix and remove all imports
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * style
      
      * Update modeling_utils.py
  4. 04 Apr, 2025 1 commit