  1. 14 Apr, 2025 18 commits
  2. 11 Apr, 2025 19 commits
    • Fix typing issues with SigLip2 (#37356) · 953196a4
      Eric Wiener authored
      
      * Fix issues
      
      * Fix comment
      
      ---------
      
      Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
    • [agents] remove agents 🧹 (#37368) · aaf129cd
      Joao Gante authored
    • Delete hubconf.py (#37455) · 69e6ddf2
      Matt authored
      * Delete hubconf.py
      
      * Trigger tests
    • Add Granite Speech Support (#36801) · 623d395a
      Alex Brooks authored
      
      * First pass at speech granite
      
      Add encoder / projector, rename things
      
      * Combine into one model file with causal lm outputs for forward
      
      * Add loss calc
      
      * Fix config loading
      
      Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
      
      * Split new / old loading logic
      
      * Use transformers integration for loading peft adapters
      
      * Add generation wrapper for selective lora enablement
      
      * Add note for qformer encoder automodel
      
      * Guard torch/audio imports in feature extractor
      
      * Handle granite speech autoclasses
      
      * Handle optional deps in package structure for granite speech
      
      * Add granite pretrained model def for init
      
      * Add dummy objects for torch/torchaudio
      
      * Add tests for granite speech processor
      
      * Minor formatting fixes and refactoring
      
      * Add options for falling back to config in forward
      
      * Tentative model docstrings for granite speech
      
      * Fix config type
      
      * Remove legacy load
      
      * Allow non-lora variants for granite speech
      
      * Override weight tying for llm
      
      * Use text config instead of llm config
      
      * Add output embeddings getter to fix weight tying
      
      * Fix relative imports
      
      * computing the number of audio features, based on the raw audio sequence.
      
      * collating audio inputs, and keeping the original lengths.
      
      * asserted we have text; otherwise we can't specify the audio special token.
      
      * asserting the number of audio symbols/audios matches correctly;
      running _get_validated_audios only when audio is present
      
      * indentation bugfix + supporting different feature lengths when expanding audio.
      
      * redundant, done in _get_validated_text
      
      * adapting the tests:
      - we must have text (not either audio or text)
      - _get_num_audio_features takes a list of raw lengths, provided it instead.
      
      * Minor cleanup, remove unused import
      
      * Add more tests for batch feature processing
      
      * Allow setting offset in rel position embeddings
      
      * Add config option for warning if peft is not installed w/ lora
      
      * Port blip2 qformer code into granite speech
      
      * Add sad test for numpy arr processing
      
      * Allow numpy arrays / tuples in granite speech processor
      
      * Fix config type for projector
      
      * - pad instead of creating a zeros tensor, to keep the original dtype/device (support bfloat16)
      - cast input_features to the model dtype (support bfloat16)
      
      * merge Blip2QFormerConfig to GraniteSpeechProjectorConfig
      
      * prevent a crash when re-saving/loading the model (line 109)
      
      * consider additional edge cases during preprocessing.
      
      * consider additional edge cases during preprocessing.
      
      * add features mask for batched inference (bugfix)
      
      * Minor refactor, remove multiaudio processor tests
      
      * Add set input/output embeddings for granite speech
      
      * Fix feature dim check in processor test
      
      * Pop input features in embed test for granite speech
      
      * Small fixes for test edge cases
      
      Add granite speech to seq2seq causal lm mapping names
      
      * Add small tests for granite speech model
      
      * Fix data parallelism test
      
      * Standardize model class names
      
      * Fix check for copies
      
      * Fix misaligned init check
      
      * Skip granite speech in checkpoint check
      
      * Use default for tie_word_embeddings in granite speech
      
      * Fix non documentation granite speech repo issues
      
      * Fix comments and docstring checks
      
      * Add placeholder docs for granite speech
      
      * Fix test naming collision
      
      * Code formatting
      
      * Rerun torch dummy obj regen
      
      * Fix save pretrained for granite speech
      
      * Import sorting
      
      * Fix tests typo
      
      * Remove offset hack
      
      * Pass args through encoder config
      
      * Remove unused prune heads from blip2
      
      * removing einsum. replaced with explicit multiplication (relative positional encodings) and sdpa attention.
      
      * remove Sequential from ConformerFeedForward and ConformerConvModule; fix for sdpa attention
      
      * remove GraniteSpeechConformerScale
      
      * rename to hidden_states
      
      * rename conformer layers to self.layers, remove the first linear from the list to keep the list homogeneous.
      
      * move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)
      
      * adding pre_norm into forward
      
      * feature extractor refactoring to resemble how it's done in phi4multimodal.
      
      * rename feature_extractor to audio_processor
      
      * bugfix: input_feature_mask fix to get the exact number of tokens.
      
      * Fix pytest decorator in processor test
      
      * Add (disabled) integration tests for granite speech
      
      * Fix handling of optional feature masking
      
      * Loosen validation in processing for vLLM compatibility
      
      * Formatting fixes
      
      * Update init structure to mirror llama
      
      * Make granite speech projector generic
      
      * Update test config to reflect generic projector
      
      * Formatting fixes
      
      * Fix typos, add license
      
      * Fix undefined var in input processing
      
      * Cleanup and expose ctc encoder
      
      * Add missing config docstrings
      
      * Better var names, type hints, etc
      
      * Set attn context size in init
      
      * Add max pos emb to encoder config
      
      * Cleanup feature extractor
      
      * Add granite speech architecture details
      
      * Remove granite speech qformer ref
      
      * Add paper link, explicit calc for qkv
      
      * Calculate padding directly in depthwise conv1d init
      
      * Raise value error instead of asserting
      
      * Reorder class defs (classes used at top)
      
      * Precompute relpos distances
      
      * Run formatting
      
      * Pass attention distances through forward
      
      * Apply suggestions from code review
      
      Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
      
      * Add todo for using common batch feature extraction
      
      * Rename audios/features
      
      * Ensure chat template may be provided to processor
      
      * Move granite speech docs to audio models
      
      * Add todos for input proc refactoring
      
      * Fix import order
      
      * Guard torch import
      
      * Use relative imports
      
      * Require torch backend for processor in granite speech
      
      * Add backend guards in feature extractor
      
      ---------
      
      Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
      Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
      Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
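      For context on the "guard torch/audio imports" items above: transformers gates optional backends behind availability checks so the package imports cleanly when they are absent. A minimal sketch of that pattern, assuming the is_torch_available/is_torchaudio_available helpers from transformers.utils; the extract_features function is illustrative, not the actual GraniteSpeech code:

        from transformers.utils import is_torch_available, is_torchaudio_available

        if is_torch_available():
            import torch

        if is_torchaudio_available():
            import torchaudio


        def extract_features(raw_audio):
            # Illustrative entry point: fail with an actionable message instead
            # of an opaque NameError when the optional backends are missing.
            if not (is_torch_available() and is_torchaudio_available()):
                raise ImportError(
                    "This feature extractor requires torch and torchaudio: "
                    "pip install torch torchaudio"
                )
            waveform = torch.as_tensor(raw_audio, dtype=torch.float32)
            return torchaudio.transforms.MelSpectrogram()(waveform)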
    • Mehant Kammakomati · 435f88f1
    • Add XPU case to is_torch_bf16_gpu_available (#37132) · 954f31cd
      cyyever authored
      
      * Add xpu case to is_torch_bf16_gpu_available
      
      Signed-off-by: cyy <cyyever@outlook.com>
      
      * Refine error messages
      
      Signed-off-by: cyy <cyyever@outlook.com>
      
      ---------
      
      Signed-off-by: cyy <cyyever@outlook.com>
      Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
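      For reference, a device-aware bf16 capability check of this kind has roughly the following shape. This is a sketch under the assumption that CUDA and XPU are the two accelerator backends involved, with a hypothetical function name, not the exact transformers implementation:

        import torch


        def bf16_gpu_available() -> bool:
            # CUDA path: defer to PyTorch's own capability probe.
            if torch.cuda.is_available():
                return torch.cuda.is_bf16_supported()
            # XPU path (Intel GPUs): recent PyTorch XPU builds support bfloat16.
            if hasattr(torch, "xpu") and torch.xpu.is_available():
                return True
            return False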
    • Add weights_only=True to torch.load (#37062) · 28eae8b4
      cyyever authored
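      For context, weights_only=True restricts torch.load's unpickler to tensors and primitive containers, so a malicious checkpoint cannot execute arbitrary code during deserialization. Usage (the checkpoint path is illustrative):

        import torch

        # Safe default for untrusted checkpoints: no arbitrary unpickling.
        state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)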
    • 🚨 🚨 Allow saving and loading multiple "raw" chat template files (#36588) · bf46e448
      Matt authored
      
      * Add saving in the new format (but no loading yet!)
      
      * Add saving in the new format (but no loading yet!)
      
      * A new approach to template files!
      
      * make fixup
      
      * make fixup, set correct dir
      
      * Some progress but need to rework for cached_file
      
      * Rework loading handling again
      
      * Small fixes
      
      * Looks like it's working now!
      
      * make fixup
      
      * Working!
      
      * make fixup
      
      * make fixup
      
      * Add TODO so I don't miss it
      
      * Cleaner control flow with one less indent
      
      * Copy the new logic to processing_utils as well
      
      * Proper support for dicts of templates
      
      * make fixup
      
      * define the file/dir names in a single place
      
      * Update the processor chat template reload test as well
      
      * Add processor loading of multiple templates
      
      * Flatten correctly to match tokenizers
      
      * Better support when files are empty sometimes
      
      * Stop creating those empty templates
      
      * Revert changes now we don't have empty templates
      
      * Revert changes now we don't have empty templates
      
      * Don't support separate template files on the legacy path
      
      * Rework/simplify loading code
      
      * Make sure it's always a chat_template key in chat_template.json
      
      * Update processor handling of multiple templates
      
      * Add a full save-loading test to the tokenizer tests as well
      
      * Correct un-flattening
      
      * New test was incorrect
      
      * Correct error/offline handling
      
      * Better exception handling
      
      * More error handling cleanup
      
      * Add skips for test failing on main
      
      * Reorder to fix errors
      
      * make fixup
      
      * clarify legacy processor file docs and location
      
      * Update src/transformers/processing_utils.py
      
      Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
      Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
      Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Update src/transformers/processing_utils.py
      
      Co-authored-by: Lucain <lucainp@gmail.com>
      
      * Rename to _jinja and _legacy
      
      * Stop saving multiple templates in the legacy format
      
      * Cleanup the processing code
      
      * Cleanup the processing code more
      
      * make fixup
      
      * make fixup
      
      * correct reformatting
      
      * Use correct dir name
      
      * Fix import location
      
      * Use save_jinja_files instead of save_raw_chat_template_files
      
      * Correct the test for saving multiple processor templates
      
      * Fix type hint
      
      * Update src/transformers/utils/hub.py
      
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Patch llava_onevision test
      
      * Update src/transformers/processing_utils.py
      
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Update src/transformers/tokenization_utils_base.py
      
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
      
      * Refactor chat template saving out into a separate function
      
      * Update tests for the new default
      
      * Don't do chat template saving logic when chat template isn't there
      
      * Ensure save_jinja_files is propagated to tokenizer correctly
      
      * Trigger tests
      
      * Update more tests to new default
      
      * Trigger tests
      
      ---------
      
      Co-authored-by: Lucain <lucainp@gmail.com>
      Co-authored-by: Julien Chaumond <julien@huggingface.co>
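      The save_jinja_files flag mentioned above selects between the new raw-template format and the legacy embedded one. A minimal sketch of how the option is passed, assuming it is accepted by save_pretrained as the commits describe (model id and output directories are illustrative):

        from transformers import AutoTokenizer

        tok = AutoTokenizer.from_pretrained("some-org/some-model")

        # New format: the chat template is written out as a raw Jinja file.
        tok.save_pretrained("with_jinja_files", save_jinja_files=True)

        # Legacy format: the template stays embedded in the tokenizer config.
        tok.save_pretrained("legacy_format", save_jinja_files=False)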
    • Disable kernels for quantization (#37446) · 89787474
      Mohamed Mekkouri authored
      fix
    • prevent creating a view/leaf param for low rank optimizers w FSDP (#37379) · 6a75528c
      Wing Lian authored
      prevent creating a view/leaf param for low rank optimizers
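      For background on why this matters: a slice of a parameter is a view, not a leaf tensor, and wrapping frameworks like FSDP expect true leaf parameters they can shard and register hooks on. A standalone illustration of the distinction:

        import torch

        w = torch.nn.Parameter(torch.randn(4, 4))
        v = w[:2]                                    # a view: shares storage with w, not a leaf
        p = torch.nn.Parameter(v.detach().clone())   # a genuine leaf parameter

        print(v.is_leaf, p.is_leaf)  # False True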
    • Bowen Bao
    • [processor] clean up multimodal tests (#37362) · a563999a
      Raushan Turganbay authored
      * clean up multimodal processor tests
      
      * fixup
      
      * fix tests
      
      * fix one last test
      
      * forgot
    • Remove triton mlp kernel, not compiling for some models (#37449) · 3c39c079
      Mohamed Mekkouri authored
      * remove mlp for now
      
      * disable on docker
    • Fix the test fetcher (#37452) · f797e3d9
      Lysandre Debut authored
      Test fetcher
    • Add moe kernels (#37376) · 442d356a
      Arthur authored
      
      * the fix that did not get in
      
      * add kernels
      
      * full graph does not work
      
      * simpler is better
      
      * Update src/transformers/integrations/hub_kernels.py
      
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
      
      * Update src/transformers/integrations/fbgemm_fp8.py
      
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
      
      * Update src/transformers/integrations/hub_kernels.py
      
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
      
      * fixup
      
      ---------
      
      Co-authored-by: Daniël de Kok <me@danieldk.eu>
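      For context, the hub_kernels integration pulls prebuilt compiled kernels from the Hugging Face Hub via the kernels package. A minimal usage sketch; the repo id below is the community example from the kernels documentation, not necessarily the one these MoE kernels use:

        from kernels import get_kernel

        # Downloads a compiled kernel from the Hub and loads it for the current device.
        activation = get_kernel("kernels-community/activation")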
    • Update-kernel-pin (#37448) · 7e9b57ce
      Arthur authored
      * update `kernels`
      
      * oups
      
      * new pinned version
    • Simplify soft dependencies and update the dummy-creation process (#36827) · 54a123f0
      Lysandre Debut authored
      
      * Reverse dependency map shouldn't be created when test_all is set
      
      * [test_all] Remove dummies
      
      * Modular fixes
      
      * Update utils/check_repo.py
      
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      
      * [test_all] Better docs
      
      * [test_all] Update src/transformers/commands/chat.py
      
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * [test_all] Remove deprecated AdaptiveEmbeddings from the tests
      
      * [test_all] Doc builder
      
      * [test_all] is_dummy
      
      * [test_all] Import utils
      
      * [test_all] Doc building should not require all deps
      
      ---------
      
      Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    • Fixes: Corrects file path for CUDA kernels (#37438) · 931126b9
      Donggeun Yu authored
      Corrects the file path used to locate the CUDA kernels
      for the Deformable Attention module. This ensures that
      the kernels are loaded correctly, resolving potential
      errors during module initialization and usage.
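      A fix like this usually boils down to resolving kernel sources relative to the module's own location rather than the current working directory. A hypothetical sketch of that pattern; the directory and file names are illustrative, not the actual Deformable Attention layout:

        from pathlib import Path

        # Anchor the search at this module's directory so JIT compilation finds
        # the .cpp/.cu sources regardless of where Python was launched from.
        kernel_root = Path(__file__).resolve().parent / "kernels"
        sources = [str(kernel_root / name) for name in ("ms_deform_attn.cpp", "ms_deform_attn_cuda.cu")]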
    • enhance require_deterministic_for_xpu (#37437) · c7064cdb
      Yao Matrix authored
      
      * enhance require_deterministic_for_xpu
      
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      
      * fix style
      
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      
      * fix style
      
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
      
      ---------
      
      Signed-off-by: YAO Matrix <matrix.yao@intel.com>
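      A plausible shape for a test decorator like this one, as an illustrative sketch rather than the exact transformers.testing_utils code: on XPU, skip the test unless PyTorch's deterministic mode is enabled.

        import unittest

        import torch


        def require_deterministic_for_xpu(test_case):
            # Illustrative sketch: XPU results can be nondeterministic, so only
            # run the decorated test when deterministic algorithms are enabled.
            if hasattr(torch, "xpu") and torch.xpu.is_available():
                if not torch.are_deterministic_algorithms_enabled():
                    return unittest.skip("requires deterministic algorithms on XPU")(test_case)
            return test_case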
  3. 10 Apr, 2025 3 commits