1. 12 Mar, 2025 8 commits
    • 90a3468f
      Nicolas Patry authored
    • [core] Large/full refactor of `from_pretrained` (#36033) · 071a161d
      Cyril Vallez authored
      * squash everything together
      start to simplify inner logic
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      continue refactor
      
      fix
      
      small fixes
      
      add type hints/docstring
      
      Update modeling_utils.py
      
      remove _fast_init
      
      keep improving
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      new first tp loading version
      
      style
      
      fix weird in-place op
      
      trigger CIs
      
      Update modeling_utils.py
      
      much clearer renaming of keys
      
      fix
      
      update
      
      Update test_modeling_common.py
      
      trigger CIs
      
      update
      
      update
      
      style
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      Update modeling_utils.py
      
      fix
      
      fast download first prototype
      
      remove old function
      
      remove old functions
      
      Remove unused function and move back _get_tp_registry
      
      fix tp plan registry
      
      simplify
      
      CIs
      
      Update hub.py
      
      Update modeling_utils.py
      
      simplify
      
      simplify renaming logic
      
      remove unused check
      
      add sanity check back (a test depends on it)
      
      Update modeling_utils.py
      
      finalize sound renaming logic
      
      style
      
      add forgotten check
      
      Update modeling_utils.py
      
      add key_mapping keyword (see the sketch after this entry)
      
      style
      
      Update modeling_utils.py
      
      add comment
      
      minor updates
      
      minor change for clarity
      
      fix small prefix issue and simplify
      
      style
      
      trigger CIs
      
      typo fix
      
      Post rebase fix
      
      post rebase cleanup
      
      simplify tp
      
      typo
      
      oops
      
      typo
      
      correctly escape
      
      improvements based on Marc's review
      
      finalize Marc's review comments
      
       squash everything
      
      * improve
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * fix
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * style
      
      * Update modeling_utils.py
      
      * simplify
      
      * style
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * fix dtype issue
      
      * Update modeling_utils.py
      
      * style
      
      * remove test that does not make sense
      
      * style
      
      * small fixes
      
      * style
      
      * fix
      
      * cleanup after rebase
      
      * style
      
      * typo
      
      * escape
      
      * tp for task specific top modules
      
      * Update modeling_utils.py
      
      * Update modeling_utils.py
      
      * fix allocation
      
      * CIs
      
      * CIs
      
      * CIs
      
      * improve docstring
      
      * CIs
      
      * Update modeling_utils.py
      
      * fix
      071a161d
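      Two user-facing pieces of this refactor are the new `key_mapping` keyword and the first tensor-parallel loading path mentioned in the messages above. A minimal sketch of both, assuming `key_mapping` takes regex-pattern-to-replacement pairs and that `tp_plan="auto"` selects the model's registered TP plan; the repo id is a placeholder:

      ```python
      from transformers import AutoModelForCausalLM

      # Rename checkpoint keys on the fly while loading; useful when a
      # checkpoint's state-dict prefixes differ from the Transformers
      # implementation. Regex semantics are assumed from the commit messages.
      model = AutoModelForCausalLM.from_pretrained(
          "org/some-model",                        # placeholder repo id
          key_mapping={r"^backbone\.": "model."},  # assumed pattern -> replacement
      )

      # First tensor-parallel loading version: "auto" picks the plan from the
      # TP plan registry mentioned above. Run under torchrun with >1 process.
      tp_model = AutoModelForCausalLM.from_pretrained(
          "org/some-model",
          tp_plan="auto",
      )
      ```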
    • Fix bnb regression due to empty state dict (#36663) · 7652804d
      Marc Sun authored
      fix
      7652804d
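      For context, the regression surfaced when loading bitsandbytes-quantized checkpoints through the refactored loading path above (the link to the refactor is inferred from the commit title). A typical call that exercises it, with a placeholder repo id:

      ```python
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig

      # 4-bit bitsandbytes loading goes through from_pretrained, where an
      # empty state dict handed to the quantizer caused the regression.
      quant_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_compute_dtype=torch.bfloat16,
      )
      model = AutoModelForCausalLM.from_pretrained(
          "org/some-model",  # placeholder repo id
          quantization_config=quant_config,
      )
      ```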
    • [CI] gemma 3 `make fix-copies` (#36664) · 994cad27
      Joao Gante authored
      * make fixup
      
      * trigger ci
      994cad27
    • fix block mask typing (#36661) · 2829013d
      Arthur authored
      
      * fix block mask typing
      
      * updated
      
      Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
      
      * gemma
      
      * fix
      
      ---------
      
      Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
      2829013d
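      The typing in question concerns attention-mask arguments that may be either a dense tensor or a flex-attention `BlockMask`. A hedged sketch of the pattern, not the exact diff; the guarded import is an assumption for older torch versions:

      ```python
      from typing import Optional, Union

      import torch

      try:
          # BlockMask ships with torch.nn.attention.flex_attention (torch >= 2.5)
          from torch.nn.attention.flex_attention import BlockMask
      except ImportError:
          BlockMask = None  # older torch: keep the name so annotations resolve

      def attn_forward(
          hidden_states: torch.Tensor,
          # may be a dense additive mask or a flex-attention BlockMask
          attention_mask: Optional[Union[torch.Tensor, "BlockMask"]] = None,
      ) -> torch.Tensor:
          ...
      ```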
    • 135db69f
      Nicolas Patry authored
    • HPU support (#36424) · 89f69560
      Ilyas Moutawwakil authored
      * test
      
      * fix
      
      * fix
      
      * skip some and run some first
      
      * test fsdp
      
      * fix
      
      * patches for generate
      
      * test distributed
      
      * copy
      
      * don't test distributed loss for hpu
      
      * require fp16 and run first
      
      * changes from marc's PR fixing zero3
      
      * better alternative
      
      * return True when fp16 support on gaudi without creating bridge
      
      * fix
      
      * fix tested dtype in deepspeed inference test
      
      * test
      
      * fix
      
      * test
      
      * fix
      
      * skip
      
      * require fp16
      
      * run first fsdp
      
      * Apply suggestions from code review
      
      * address comments
      
      * address comments and refactor test
      
      * reduce precision
      
      * avoid doing gaudi1-specific stuff in the generation loop
      
      * document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
      89f69560
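      A minimal sketch of what native HPU support looks like from the user side, assuming a Gaudi machine with Intel's Gaudi PyTorch bridge installed; the `habana_frameworks` import and the "hpu" device string follow Intel's documentation, not this diff:

      ```python
      import torch
      import habana_frameworks.torch.core  # registers the "hpu" device  # noqa: F401

      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      # fp16 is reported as supported on Gaudi without explicitly creating
      # the bridge (per the commit messages above).
      model = AutoModelForCausalLM.from_pretrained(
          "gpt2", torch_dtype=torch.float16
      ).to("hpu")

      inputs = tokenizer("Hello", return_tensors="pt").to("hpu")
      outputs = model.generate(**inputs, max_new_tokens=8)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```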
    • Gemma3 (#36658) · 50d3530a
      Ryan Mullins authored
      
      * Fix converter
      
      * [Broken] Adds Gemma 3 to Hugging Face Transformers
      
      * Consolidating Config and Processor params across impls
      
      * Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.
      
      * Additional plumbing for CausalLM and ConditionalGeneration variants
      
      * incomplete draft of Orbax conversion script
      
      * More complete checkpoint conversion
      
      * Supporting Gemma 3 1B checkpoints
      
      * Updating RoPE for multiple frequencies
      
      * Adjustments to rotary embedder
      
      * Proof of life for text-only operation
      
      * Updating the conversion script to handle multimodal projection weights
      
      * Fixing text-only conversions
      
      * Cleaner conversion script with multimodal support and a simpler processor
      
      * Additional refactors to the Gemma3Processor
      
      * Simplified Processor to work over text representations
      
      * Updated conversion script to join text and vision embeddings at conversion time
      
      * Logging for debugging
      
      * Update src/transformers/models/gemma2/modeling_gemma2.py
      
      Co-authored-by: Joshua Lochner <admin@xenova.com>
      
      * Removed extraneous Config params
      
      * Switching to fast tokenizer for checkpoint conversions
      
      * isolating siglip for performance testing
      
      * Minor changes for debugging tests against baselines
      
      * Adding average pooling for soft tokens
      
      * Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts
      
      * Updating conversion script for ShieldGemma 2 conversion compatibility
      
      * Allow disable_compile to be provided as a kwarg
      
      * Refresh from modular
      
      * Updated conversion script and corrected sliding window
      
      * Fix type mismatch in cache_position (#4)
      
      * Fix dtype (#5)
      
      * Fix type mismatch in cache_position
      
      * Actually fix in the modular file
      
      Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
      
      ---------
      
      Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
      
      * fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor
      
      * Adding 2D pooling for image embeddings
      
      * Revert "Adding 2D pooling for image embeddings"
      
      This reverts commit 65350cf531296f050b2078a5b8e46f61642b2648.
      
      * Gemma3 average pooling changed from 1D to 2D
      
      * Major refactor to Gemma3MultimodalInputProjection
      
      * Updating Gemma 3 Auto* registrations
      
      * Add option to save Gemma 3 chat template with tokenizer during weights conversion
      
      * Removing unused imports
      
      * Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration
      
      * Removing duplicate config property
      
      * Removing final logit softcapping and 1-indexing of position ids
      
      * Fixing image processor config and none --> None typo
      
      * Fixing sliding window size for 1B
      
      * Updating image_mean and image_std in Image Processor
      
      * Attention masking changed to lower triangular
      
      * Moving image special tokens to conversion script
      
      * Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs
      
      * Remove special token variables from symbol space
      
      * Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration
      
      * tie lm_head and embedding weights
      
      Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
      
      * Correct tied weights in Gemma3CausalLM
      
      * iterative bidirectional attention
      
      * resolving merge conflicts
      
      * Reverting to Gemma 2 HybridCache with sliding window support and a sliding_window_pattern of 6
      
      * Correcting RoPE scaling
      
      * clean up first pass, dummy model generation works
      
      * final clean up before fixing tests
      
      * causal lm test works, so fine
      
      * Fix conversion
      
      * Update src/transformers/models/gemma3/processing_gemma3.py
      
      * model tests are happy
      
      * processor tests are happy
      
      * image processing tests added
      
      * fixup
      
      * Fix pre-processing in conversion
      
      * Inputs merging
      
      * Do not normalize vision embeddings
      
      * Apply Ryan's (and team) changes to attention
      
      * token type ids + mask
      
      * template
      
      * move embed scale, add rope scale, fix tests
      
      * Add chat template to tokenizer
      
      * Use prefix for causal model loading
      
      * use existing code for sliding mask from gemma2
      
      * self.embed_tokens already normalizes
      
      * Correcting Gemma3TextConfig parameters in conversion script
      
      * typo, modular overwrites my fixes
      
      * enable device map for text model
      
      * Conversion updates
      
      * ultra nit: no einsums
      
      * update image token
      
      * copy deepcopy config + some docs
      
      * add some test, still WIP
      
      * Refactoring --include_chat_template logic in converter
      
      * Update src/transformers/models/gemma3/modular_gemma3.py
      
      Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
      
      * Add eos tokens for instruct models
      
      * dump so i can work on dgx
      
      * Removing add_bos by default
      
      * dump
      
      * add fast im proc
      
      * docs for PaS + fixup
      
      * another fixup
      
      * one more fixup
      
      * fix tests
      
      * Inverting prior BOS change
      
      * ultra nit
      
      * Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS
      
      * resize embeds, remove sqrt, add slow test outputs
      
      * FA2 but quality is meh
      
      * nit
      
      * skip FA2, no idea what happened
      
      * last bit for green CI
      
      * please, green CI for docs
      
      * T_T
      
      * Fix for Gemma3 logits
      
      * Support both options for system prompt
      
      * Update src/transformers/models/gemma3/image_processing_gemma3_fast.py
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/model_doc/gemma3.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/model_doc/gemma3.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/model_doc/gemma3.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/model_doc/gemma3.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Update docs/source/en/model_doc/gemma3.md
      
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      
      * Docs updates now that assets are live
      
      * Style fixes
      
      ---------
      
      Co-authored-by: Joshua Lochner <admin@xenova.com>
      Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
      Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
      Co-authored-by: Mayank Chaturvedi <imayank@google.com>
      Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
      Co-authored-by: raushan <raushan@huggingface.co>
      Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
      Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
      Co-authored-by: Lysandre <hi@lysand.re>
      50d3530a
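      The end result of this PR is loadable through the usual Auto classes. A short, hedged sketch of multimodal usage; the checkpoint id and image URL are illustrative, and the message format follows the processor conventions described in the messages above:

      ```python
      from transformers import AutoProcessor, Gemma3ForConditionalGeneration

      model_id = "google/gemma-3-4b-it"  # illustrative checkpoint id
      processor = AutoProcessor.from_pretrained(model_id)
      model = Gemma3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

      messages = [
          {"role": "user", "content": [
              {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
              {"type": "text", "text": "Describe this image."},
          ]},
      ]
      inputs = processor.apply_chat_template(
          messages, add_generation_prompt=True, tokenize=True,
          return_dict=True, return_tensors="pt",
      ).to(model.device)

      outputs = model.generate(**inputs, max_new_tokens=32)
      print(processor.decode(outputs[0], skip_special_tokens=True))
      ```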
  2. 11 Mar, 2025 10 commits
  3. 10 Mar, 2025 4 commits
  4. 07 Mar, 2025 8 commits
  5. 06 Mar, 2025 10 commits