1. 15 Apr, 2025 7 commits
    • Fix missing return type for MLCD docs (#37527) · 356b3cd7
      Pavel Iakubovskii authored
      * Fix missing return type for docs
      
      * trigger
    • fix: Restore explicit error surfacing for unexpected hub exceptions (#37525) · 0ad3710d
      Manuel de Prada Corral authored
      
      * fix: Restore explicit error surfacing for unexpected hub exceptions
      
      Prior to PR #36033, unexpected exceptions (e.g., ModuleNotFoundError) during hub model loading were not swallowed silently. They either matched specific except blocks or were raised.
      
      After #36033, a catch-all except Exception block was introduced without a fallback else, causing unknown errors to be silently ignored and leading to misleading downstream behavior.
      
      This commit adds an `else: raise e` to ensure only explicitly handled exceptions are suppressed. All others are surfaced, restoring pre-4.50 behavior and aiding in debugging and dependency visibility.
      
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
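A minimal sketch of the control flow this fix restores; the helper names are illustrative, not the actual `transformers` internals:

```python
def _fetch_and_import(repo_id: str):
    # Stand-in for the real hub download/import logic.
    raise ModuleNotFoundError(f"optional dependency missing for {repo_id}")

def load_remote_model(repo_id: str):
    try:
        return _fetch_and_import(repo_id)
    except Exception as e:
        if isinstance(e, OSError):
            # Explicitly handled case, e.g. fall back to cached files.
            return None
        else:
            # The fix: re-raise anything not explicitly handled (such as
            # ModuleNotFoundError) instead of silently swallowing it.
            raise e

load_remote_model("some/repo")  # now surfaces the ModuleNotFoundError
```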
    • Add Fast Yolos Processor (#37292) · f6c79f76
      Parteek authored
      
      * Add Fast Yolos Processor
      
      * Update modular file
      
      * Fix copies
      
      ---------
      
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
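A brief usage sketch: fast image processors are opted into with `use_fast=True`, which resolves to the new torchvision-backed class (the checkpoint id is a real YOLOS checkpoint; the exact resolution behavior is an assumption):

```python
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True selects the torchvision-backed fast variant
# (YolosImageProcessorFast) when one is available.
processor = AutoImageProcessor.from_pretrained("hustvl/yolos-tiny", use_fast=True)

image = Image.new("RGB", (640, 480))  # dummy image for illustration
inputs = processor(images=image, return_tensors="pt")
print(inputs["pixel_values"].shape)
```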
    • Llama4: remove redundant transpose of router_logits (#37468) · ecaeee66
      Pavel Belevich authored
      * Llama4: remove redundant transpose of router_logits
      
      * Fix formatting
    • Add MLCD model (#36182) · 6f7ea1cf
      Huajie Tan authored
      * Add MLCD model
      
      * Update codes for auto-mapping
      
      * Add test scripts for MLCD
      
      * Update doc for MLCD model
      
      * Fix import error
      
      * Fix import error
      
      * Fix CI error for attention_outputs
      
      * Fix code style for CI
      
      * Fix code style for CI
      
      * Fix code style for CI
      
      * Fix code style for CI
      
      * Fix code style for CI
      
      * Fix CI error for initialization
      
      * Fix code style for CI
      
      * Fix code style for CI
      
      * Reformat codes and docs for CI test
      
      * Reformat codes and docs for CI test
      
      * Remove unused attributes for CI test
      
      * Fix style for CI test
      
      * List MLCD in flash_attn doc
      
      * Fix: typos, modulars, refactors from suggestions
      
      * Refactoring convert_mlcd_weights_to_hf.py from suggestions
      
      * Fix: docs conflicts
      
      * Fix error for CI test
      
      * Fix style for CI test
      
      * Add integration test for MLCD
      
      * Refactoring by class inheritance
      
      * Fix: refactor attention interface, adjust codes
      
      * Fix: merging conflicts
      
      * Fix: merging conflicts
      
      * Fix: style for CI test
      
      * Fix: style for CI test
      
      * Fix: set test_resize_embeddings to be False
      
      * Fix: initializer for CI test
      
      * Fix: conflicts, CI test, warning and refactoring
      
      * Fix: merging conflicts
      
      * Refactor
      
      * Update docs
      
      * Fix mistakes
      
      * Remove unused args and fix multi-gpu error
      
      * Revert position_embeddings
      
      * Solve conflicts
      
      * Solve conflicts
      
      * Remove dummy
      
      * Update _init_weights
      
      * Update _init_weights
      
      * Update _init_weights for CI test
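A hedged usage sketch for the new vision model; the checkpoint id is an assumption, and the class name follows the PR's naming:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, MLCDVisionModel

ckpt = "DeepGlint-AI/mlcd-vit-large-patch14-336"  # assumed checkpoint id
processor = AutoImageProcessor.from_pretrained(ckpt)
model = MLCDVisionModel.from_pretrained(ckpt)

image = Image.new("RGB", (336, 336))  # dummy image for illustration
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, num_patches + 1, hidden_size)
```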
    • Change default value of `attn_temperature_tuning` (#37501) · d6ac923a
      AinL authored
      fix: change default value of `attn_temperature_tuning`
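Code that depended on the old default may want to pin the flag explicitly; a sketch, assuming the flag lives on the Llama4 text config and using an illustrative checkpoint id:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")
# Pin the flag explicitly rather than relying on the changed default;
# see #37501 for the new value.
config.text_config.attn_temperature_tuning = True
```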
    • Detect and use device context manager or global device in `from_pretrained` (#37216) · c8e0e603
      Cyril Vallez authored
      * Update modeling_utils.py
      
      * improve
      
      * Update modeling_utils.py
      
      * Update test_modeling_common.py
      
      * Update test_modeling_timm_backbone.py
      
      * Update test_modeling_common.py
      
      * Update test_modeling_common.py
      
      * Update test_modeling_common.py
      
      * Update test_modeling_common.py
      
      * CIs
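In practice this means `from_pretrained` materializes weights on an ambient device instead of always loading to CPU; a brief sketch (the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# An enclosing device context manager is now detected and used.
with torch.device("cuda"):
    model = AutoModelForCausalLM.from_pretrained("gpt2")
print(next(model.parameters()).device)  # cuda:0

# The global default device is honored as well.
torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("gpt2")
```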
  2. 14 Apr, 2025 24 commits
  3. 11 Apr, 2025 9 commits
    • Fix typing issues with SigLip2 (#37356) · 953196a4
      Eric Wiener authored
      
      * Fix issues
      
      * Fix comment
      
      ---------
      
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
    • [agents] remove agents 🧹 (#37368) · aaf129cd
      Joao Gante authored
    • Delete hubconf.py (#37455) · 69e6ddf2
      Matt authored
      * Delete hubconf.py
      
      * Trigger tests
    • Add Granite Speech Support (#36801) · 623d395a
      Alex Brooks authored
      
      * First pass at speech granite
      
      Add encoder / projector, rename things
      
      * Combine into one model file with causal lm outputs for forward
      
      * Add loss calc
      
      * Fix config loading
      
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
      
      * Split new / old loading logic
      
      * Use transformers integration for loading peft adapters
      
      * Add generation wrapper for selective lora enablement
      
      * Add note for qformer encoder automodel
      
      * Guard torch/audio imports in feature extractor
      
      * Handle granite speech autoclasses
      
      * Handle optional deps in package structure for granite speech
      
      * Add granite pretrained model def for init
      
      * Add dummy objects for torch/torchaudio
      
      * Add tests for granite speech processor
      
      * Minor formatting fixes and refactoring
      
      * Add options for falling back to config in forward
      
      * Tentative model docstrings for granite speech
      
      * Fix config type
      
      * Remove legacy load
      
      * Allow non-lora variants for granite speech
      
      * Override weight tying for llm
      
      * Use text config instead of llm config
      
      * Add output embeddings getter to fix weight tying
      
      * Fix relative imports
      
* computing the number of audio features based on the raw audio sequence.
      
      * collating audio inputs, and keeping the original lengths.
      
* asserted we have text; otherwise we can't specify the audio special token.
      
* asserting the number of audio symbols/audios match correctly;
running _get_validated_audios only when audio is present
      
      * indentation bugfix + supporting different feature lengths when expanding audio.
      
      * redundant, done in _get_validated_text
      
      * adapting the tests:
      - we must have text (not either audio or text)
- _get_num_audio_features takes a list of raw lengths, provided it instead.
      
      * Minor cleanup, remove unused import
      
      * Add more tests for batch feature processing
      
      * Allow setting offset in rel position embeddings
      
      * Add config option for warning if peft is not installed w/ lora
      
      * Port blip2 qformer code into granite speech
      
      * Add sad test for numpy arr processing
      
      * Allow numpy arrays / tuples in granite speech processor
      
      * Fix config type for projector
      
      * - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16)
      - cast input_features to the model dtype (support bfloat16)
      
      * merge Blip2QFormerConfig to GraniteSpeechProjectorConfig
      
      * prevent a crash when re-saving/loading the model (line 109)
      
      * consider additional edge cases during preprocessing.
      
      * consider additional edge cases during preprocessing.
      
      * add features mask for batched inference (bugfix)
      
      * Minor refactor, remove multiaudio processor tests
      
      * Add set input/output embeddings for granite speech
      
      * Fix feature dim check in processor test
      
      * Pop input features in embed test for granite speech
      
      * Small fixes for test edge cases
      
      Add granite speech to seq2seq causal lm mapping names
      
      * Add small tests for granite speech model
      
      * Fix data parallelism test
      
      * Standardize model class names
      
      * Fix check for copies
      
      * Fix misaligned init check
      
      * Skip granite speech in checkpoint check
      
      * Use default for tie_word_embeddings in granite speech
      
      * Fix non documentation granite speech repo issues
      
      * Fix comments and docstring checks
      
      * Add placeholder docs for granite speech
      
      * Fix test naming collision
      
      * Code formatting
      
      * Rerun torch dummy obj regen
      
      * Fix save pretrained for granite speech
      
      * Import sorting
      
      * Fix tests typo
      
      * Remove offset hack
      
      * Pass args through encoder config
      
      * Remove unused prune heads from blip2
      
      * removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention.
      
      * remove Sequential from ConformerFeedForward and ConformerConvModule. + fix for sdpa attention
      
      * remove GraniteSpeechConformerScale
      
      * rename to hidden_states
      
* rename conformer layers to self.layers, remove the first linear from the list to keep the list homogeneous.
      
      * move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)
      
      * adding pre_norm into forward
      
      * feature extractor refactoring to resemble how it's done in phi4multimodal.
      
      * rename feature_extractor to audio_processor
      
      * bugfix: input_feature_mask fix to get the exact number tokens.
      
      * Fix pytest decorator in processor test
      
      * Add (disabled) integration tests for granite speech
      
      * Fix handling of optional feature masking
      
* Loosen validation in processing for vLLM compatibility
      
      * Formatting fixes
      
      * Update init structure to mirror llama
      
      * Make granite speech projector generic
      
      * Update test config to reflect generic projector
      
      * Formatting fixes
      
      * Fix typos, add license
      
      * Fix undefined var in input processing
      
      * Cleanup and expose ctc encoder
      
      * Add missing config docstrings
      
      * Better var names, type hints, etc
      
      * Set attn context size in init
      
      * Add max pos emb to encoder config
      
      * Cleanup feature extractor
      
      * Add granite speech architecture details
      
      * Remove granite speech qformer ref
      
      * Add paper link, explicit calc for qkv
      
      * Calculate padding directly in depthwise conv1d init
      
      * Raise value error instead of asserting
      
      * Reorder class defs (classes used at top)
      
      * Precompute relpos distances
      
      * Run formatting
      
      * Pass attention distances through forward
      
      * Apply suggestions from code review
      
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
      
      * Add todo for using common batch feature extraction
      
      * Rename audios/features
      
      * Ensure chat template may be provided to processor
      
      * Move granite speech docs to audio models
      
      * Add todos for input proc refactoring
      
      * Fix import order
      
      * Guard torch import
      
      * Use relative imports
      
      * Require torch backend for processor in granite speech
      
      * Add backend guards in feature extractor
      
      ---------
      
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
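A hedged end-to-end sketch; the checkpoint id, audio token, and processor argument names are assumptions:

```python
import torch
from transformers import AutoProcessor, GraniteSpeechForConditionalGeneration

ckpt = "ibm-granite/granite-speech-3.3-2b"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(ckpt)
model = GraniteSpeechForConditionalGeneration.from_pretrained(ckpt)

wav = torch.zeros(1, 16000)  # one second of silence at 16 kHz as stand-in audio
prompt = "<|audio|>can you transcribe the speech into written format?"
inputs = processor(text=prompt, audio=wav, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True))
```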
    • 435f88f1
      Mehant Kammakomati authored
    • Add XPU case to is_torch_bf16_gpu_available (#37132) · 954f31cd
      cyyever authored
      
      * Add xpu case to is_torch_bf16_gpu_available
      
Signed-off-by: cyy <cyyever@outlook.com>
      
      * Refine error messages
      
Signed-off-by: cyy <cyyever@outlook.com>
      
      ---------
      
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
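The shape of the check, as a hedged sketch (the real helper lives in `transformers`; the assumption that available XPU devices support bf16 mirrors the PR's intent):

```python
import torch

def bf16_gpu_available() -> bool:
    # CUDA path: PyTorch exposes an explicit capability check.
    if torch.cuda.is_available():
        return torch.cuda.is_bf16_supported()
    # XPU path (the addition sketched here): assume bf16 support on
    # available Intel XPU devices.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return True
    return False

print(bf16_gpu_available())
```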
    • Add weights_only=True to torch.load (#37062) · 28eae8b4
      cyyever authored
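For context, `weights_only=True` restricts `torch.load` to a safe set of types; a one-line sketch with an illustrative path:

```python
import torch

# weights_only=True limits unpickling to tensors and primitive containers,
# preventing arbitrary code execution from an untrusted checkpoint file.
state_dict = torch.load("checkpoint.pth", weights_only=True)
```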
    • 🚨🚨 Allow saving and loading multiple "raw" chat template files (#36588) · bf46e448
      Matt authored
      
      * Add saving in the new format (but no loading yet!)
      
      * Add saving in the new format (but no loading yet!)
      
      * A new approach to template files!
      
      * make fixup
      
      * make fixup, set correct dir
      
      * Some progress but need to rework for cached_file
      
      * Rework loading handling again
      
      * Small fixes
      
      * Looks like it's working now!
      
      * make fixup
      
      * Working!
      
      * make fixup
      
      * make fixup
      
      * Add TODO so I don't miss it
      
      * Cleaner control flow with one less indent
      
      * Copy the new logic to processing_utils as well
      
      * Proper support for dicts of templates
      
      * make fixup
      
      * define the file/dir names in a single place
      
      * Update the processor chat template reload test as well
      
      * Add processor loading of multiple templates
      
      * Flatten correctly to match tokenizers
      
      * Better support when files are empty sometimes
      
      * Stop creating those empty templates
      
      * Revert changes now we don't have empty templates
      
      * Revert changes now we don't have empty templates
      
      * Don't support separate template files on the legacy path
      
      * Rework/simplify loading code
      
      * Make sure it's always a chat_template key in chat_template.json
      
      * Update processor handling of multiple templates
      
      * Add a full save-loading test to the tokenizer tests as well
      
      * Correct un-flattening
      
      * New test was incorrect
      
      * Correct error/offline handling
      
      * Better exception handling
      
      * More error handling cleanup
      
      * Add skips for test failing on main
      
      * Reorder to fix errors
      
      * make fixup
      
      * clarify legacy processor file docs and location
      
      * Update src/transformers/processing_utils.py
      
Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Rename to _jinja and _legacy
      
      * Stop saving multiple templates in the legacy format
      
      * Cleanup the processing code
      
      * Cleanup the processing code more
      
      * make fixup
      
      * make fixup
      
      * correct reformatting
      
      * Use correct dir name
      
      * Fix import location
      
      * Use save_jinja_files instead of save_raw_chat_template_files
      
      * Correct the test for saving multiple processor templates
      
      * Fix type hint
      
      * Update src/transformers/utils/hub.py
      
Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Patch llava_onevision test
      
      * Update src/transformers/processing_utils.py
      
Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Update src/transformers/tokenization_utils_base.py
      
Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Refactor chat template saving out into a separate function
      
      * Update tests for the new default
      
      * Don't do chat template saving logic when chat template isn't there
      
      * Ensure save_jinja_files is propagated to tokenizer correctly
      
      * Trigger tests
      
      * Update more tests to new default
      
      * Trigger tests
      
      ---------
      
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
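A usage sketch of the new saving path; `save_jinja_files` is the kwarg named in the commits above, and the exact default behavior is per the PR:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.chat_template = (
    "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
)

# Writes the template as a standalone chat_template.jinja file next to
# tokenizer_config.json instead of embedding it as a JSON string.
tok.save_pretrained("./my-tokenizer", save_jinja_files=True)
```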
    • Disable kernels for quantization (#37446) · 89787474
      Mohamed Mekkouri authored
      fix