- 12 Mar, 2025 8 commits
-
Nicolas Patry authored
-
Cyril Vallez authored
* squash everything together start to simplify inner logic Update modeling_utils.py Update modeling_utils.py Update modeling_utils.py Update modeling_utils.py continue refactor fix small fixes add type hints/docstring Update modeling_utils.py remove _fast_init keep improving Update modeling_utils.py Update modeling_utils.py new first tp loading version style fix weird in-place op trigger CIs Update modeling_utils.py much clearer renaming of keys fix update Update test_modeling_common.py trigger CIs update update style Update modeling_utils.py Update modeling_utils.py Update modeling_utils.py fix fast download first prototype remove old function remove old functions Remove unused function and move back _get_tp_registry fix tp plan registry simplify CIs Update hub.py Update modeling_utils.py simplify simplify renaming logic remove unused check add sanity check back (a test depends on it) Update modeling_utils.py finalize sound renaming logic style add forgotten check Update modeling_utils.py add key_mapping keyword style Update modeling_utils.py add comment minor updates minor change for clarity fix small prefix issue and simplify style trigger CIs typo fix Post rebase fix post rebase cleanup simplify tp typo oupsi typo correctly escape improvements based on Marc's review finalize Marc's review comments squash everything * improve * Update modeling_utils.py * Update modeling_utils.py * fix * Update modeling_utils.py * Update modeling_utils.py * style * Update modeling_utils.py * simplify * style * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * fix dtype issue * Update modeling_utils.py * style * remove test that does not make sense * style * small fixes * style * fix * cleanup after rebase * style * typo * escape * tp for task specific top modules * Update modeling_utils.py * Update modeling_utils.py * fix allocation * CIs * CIs * CIs * improve docstring * CIs * Update modeling_utils.py * fix
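Much of the refactor above reworks how `from_pretrained` loads a checkpoint under a tensor-parallel plan. A minimal sketch of that loading path, assuming a multi-GPU host launched with `torchrun --nproc-per-node 4 tp_demo.py` (the model id and script name are illustrative, not taken from this commit):

```python
# Hedged sketch of tensor-parallel loading via from_pretrained; not the PR's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative model id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # shard weights across the processes started by torchrun
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Tensor parallel check:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```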
-
Marc Sun authored
fix
-
Joao Gante authored
* make fixup * trigger ci
-
Arthur authored
* fix block mask typing * updated Co-authored-by:
Cyril Vallez <cyril.vallez@gmail.com> * gemma * fix --------- Co-authored-by:
Cyril Vallez <cyril.vallez@gmail.com>
-
Nicolas Patry authored
-
Ilyas Moutawwakil authored
* test * fix * fix * skip some and run some first * test fsdp * fix * patches for generate * test distributed * copy * don't test distributed loss for hpu * require fp16 and run first * changes from marc's PR fixing zero3 * better alternative * return True when fp16 support on gaudi without creating bridge * fix * fix tested dtype in deepspeed inference test * test * fix * test * fix * skip * require fp16 * run first fsdp * Apply suggestions from code review * address comments * address comments and refactor test * reduce precision * avoid doing gaudi1-specific stuff in the generation loop * document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
-
Ryan Mullins authored
* Fix converter * [Broken] Adds Gemma 3 to Hugging Face Transformers * Consolidating Config and Processor params across impls * Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right. * Additional plumbing for CausalLM and ConditionalGeneration variants * incomplete draft of Orbax conversion script * More complete checkpoint conversion * Supporting Gemma 3 1B checkpoints * Updating RoPE for multiple frequencies * Adjustments to rotary embedder * Proof of life for text-only operation * Updating the conversion script to handle multimodal projection weights * Fixing text-only conversions * Cleaner conversion script with multimodal support and a simpler processor * Additional refactors to the Gemma3Processor * Simplified Processor to work over text representations * Updated conversion script to join text and vision embeddings at conversion time * Logging for debugging * Update src/transformers/models/gemma2/modeling_gemma2.py Co-authored-by:
Joshua Lochner <admin@xenova.com> * Removed extraneous Config params * Switching to fast tokenizer for checkpoint conversions * isolating siglip for performance testing * Minor changes for debugging tests against baselines * Adding average pooling for soft tokens * Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts * Updating conversion script for ShieldGemma 2 conversion compatibility * Allow disable_compile to be provided as a kwarg * Refresh from modular * Updated conversion script and corrected sliding window * Fix type mismatch in cache_position (#4) * Fix dtype (#5) * Fix type mismatch in cache_position * Actually fix in the modular file Co-authored-by:
Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> --------- Co-authored-by:
Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> * fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor * Adding 2D pooling for image embeddings * Revert "Adding 2D pooling for image embeddings" This reverts commit 65350cf531296f050b2078a5b8e46f61642b2648. * Gemma3 average pooling changed from 1D to 2D * Major refactor to Gemma3MultimodalInputProjection * Updating Gemma 3 Auto* registrations * Add option to save Gemma 3 chat template with tokenizer during weights conversion * Removing unused imports * Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration * Removing duplicate config property * Removing final logit softcapping and 1-indexing of position ids * Fixing image processor config and none --> None typo * Fixing sliding window size for 1B * Updating image_mean and image_std in Image Processor * Attention masking changed to lower triangular * Moving image special tokens to conversion script * Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs * Remove special token variables from symbol space * Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration * tie lm_head and embedding weights Co-authored-by:
Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> * Correct tied weights in Gemma3CausalLM * iterative bidirectional attention * resolving merge conflicts * Reverting to Gemma 2 HybridCache with sliding window support and a sliding_window_pattern of 6 * Correcting RoPE scaling * clean up first pass, dummy model generation works * final clean up before fixing tests * causal lm test works, so fine * Fix conversion * Update src/transformers/models/gemma3/processing_gemma3.py * model tests are happy * processor tests are happy * image processing tests added * fixup * Fix pre-processing in conversion * Inputs merging * Do not normalize vision embeddings * Apply Ryan's (and team) changes to attention * token type ids + mask * template * move embed scale, add rope scale, fix tests * Add chat template to tokenizer * Use prefix for causal model loading * use existing code for sliding mask from gemma2 * self.embed_tokens already normalizes * Correcting Gemma3TextConfig parameters in conversion script * typo, modular overwrites my fixes * enable device map for text model * Conversion updates * ultra nit: no einsums * update image token * copy deepcopy config + some docs * add some test, still WIP * Refactoring --include_chat_template logic in converter * Update src/transformers/models/gemma3/modular_gemma3.py Co-authored-by:
Xuan-Son Nguyen <thichthat@gmail.com> * Add eos tokens for instruct models * dump so i can work on dgx * Removing add_bos by default * dump * add fast im proc * docs for PaS + fixup * another fixup * one more fixup * fix tests * Inverting prior BOS change * ultra nit * Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS * resize embeds, remove sqrt, add slow test outputs * FA2 but quality is meh * nit * skip FA2, no idea what happened * last bit for green CI * please, green CI for docs * T_T * Fix for Gemma3 logits * Support both options for system prompt * Update src/transformers/models/gemma3/image_processing_gemma3_fast.py Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Update docs/source/en/model_doc/gemma3.md Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> * Docs updates now that assets are live * Style fixes --------- Co-authored-by:
Joshua Lochner <admin@xenova.com> Co-authored-by:
Pedro Cuenca <pedro@huggingface.co> Co-authored-by:
Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Co-authored-by:
Mayank Chaturvedi <imayank@google.com> Co-authored-by:
Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com> Co-authored-by:
raushan <raushan@huggingface.co> Co-authored-by:
Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz> Co-authored-by:
Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by:
Lysandre <hi@lysand.re>
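A hedged usage sketch for the Gemma 3 classes introduced above (`Gemma3ForConditionalGeneration` together with `AutoProcessor`); the checkpoint id, image URL and chat-message layout are assumptions rather than details from this commit:

```python
# Illustrative Gemma 3 multimodal usage; not code from this PR.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it"  # assumed checkpoint id
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "https://example.com/cat.png"},  # placeholder URL
        {"type": "text", "text": "Describe this image."},
    ]}
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```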
-
- 11 Mar, 2025 10 commits
-
Afanti authored
* chore: fix typos in the docs directory * chore: fix typos in the docs directory * chore: fix typos in the docs directory
-
Marc Sun authored
* update * doc * update * Update docs/source/en/gguf.md Co-authored-by:
Steven Liu <59462357+stevhliu@users.noreply.github.com> * fix --------- Co-authored-by:
Steven Liu <59462357+stevhliu@users.noreply.github.com>
-
Matt authored
* Remove research projects * Add new README to explain where the projects went * Trigger tests * Cleanup all references to research_projects
-
Steven Liu authored
update
-
Matt authored
-
Matt authored
* Remove redundant pipeline warning * Remove redundant pipeline warning
-
ivarflakstad authored
AriaForConditionalGeneration depends on idefics3 vision transformer which does not support flex attn
-
Arthur authored
* proper performant flex attention implementation * wrapper for flex attention to compile only when triggered * wrapper for flex attention to compile only when triggered * attention mask type detection * Update src/transformers/integrations/flex_attention.py Co-authored-by:
Anton Vlasjuk <73884904+vasqu@users.noreply.github.com> * nit * nit * nit * nit * gemma2 support * add citation for torchtune * Update src/transformers/models/llama/modeling_llama.py Co-authored-by:
Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update flex_attention.py * nit * nit * nit * reset gemma2 modifications * nit * nit * nit * licensing * apply changes to other models * safe import --------- Co-authored-by:
Sung Ching Liu <sunny19981005@outlook.com> Co-authored-by:
Sung Ching Liu <22844540+bursteratom@users.noreply.github.com> Co-authored-by:
Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
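The "compile only when triggered" wrapper mentioned above can be pictured roughly as follows; this is a sketch of the idea rather than the PR's code, and assumes PyTorch >= 2.5 for `torch.nn.attention.flex_attention`:

```python
# Sketch: compile flex_attention lazily, only the first time it is actually used.
import torch
from torch.nn.attention.flex_attention import flex_attention

_compiled_flex_attention = None

def lazy_flex_attention(query, key, value, score_mod=None, block_mask=None):
    """Compile flex_attention on first call, then reuse the compiled kernel."""
    global _compiled_flex_attention
    if _compiled_flex_attention is None:
        _compiled_flex_attention = torch.compile(flex_attention, dynamic=False)
    return _compiled_flex_attention(
        query, key, value, score_mod=score_mod, block_mask=block_mask
    )
```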
-
Travis Johnson authored
* fix: handle input_channel_dim == channels_last Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> * fix: default PIL images to channels_last Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> * Apply suggestions from code review Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com> * fixup from review batch Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> * test: add 1x1 PIL image to ambiguous channel test Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> * fix(mllama): avoid 0 dimension for image with impractical aspect ratio Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> --------- Signed-off-by:
Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by:
Pavel Iakubovskii <qubvel@gmail.com>
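The ambiguity these fixes deal with comes from inferring the channel axis of small images from their shape alone; a short illustration, assuming `infer_channel_dimension_format` from `transformers.image_utils` and invented example arrays:

```python
import numpy as np
from transformers.image_utils import ChannelDimension, infer_channel_dimension_format

# Unambiguous: only the last axis can be the 3 channels, so channels_last is inferred.
hwc = np.zeros((224, 224, 3), dtype=np.uint8)
assert infer_channel_dimension_format(hwc) == ChannelDimension.LAST

# Ambiguous: a 3x3 (or 1x1) RGB image could be CHW or HWC from shape alone;
# inference picks the first axis whose size matches a known channel count,
# which is why callers sometimes need to state the layout explicitly
# (e.g. via the `input_data_format` argument of the image processors).
tiny = np.zeros((3, 3, 3), dtype=np.uint8)
guessed = infer_channel_dimension_format(tiny)
```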
-
Arthur authored
* some config changes * update * current state * update * update * updates and cleanup * something that works * fixup * fixes * nits * nit * nits and fix * Update src/transformers/integrations/tensor_parallel.py Co-authored-by:
Lysandre Debut <hi@lysand.re> * Update src/transformers/integrations/tensor_parallel.py Co-authored-by:
Lysandre Debut <hi@lysand.re> * cleanup * style * safe import * fix * updates * rename stuff and clean * style * small updates * ups * oups * nit * protect imports * update tp * rodfl * arf * turbo nit on init * fix import error * frumble gumbgle * try to fix the import error * should fix the non model test * update keep in float32 * update * fix * nits * fix subconfigs * test was weird * nit * fix failing test * fix instruct blip * fixes * style * x.com * fix overwrite * ok last bit of failing test --------- Co-authored-by:
Lysandre Debut <hi@lysand.re>
-
- 10 Mar, 2025 4 commits
-
Steven Liu authored
* initial * fix * model-impl
-
Afanti authored
* chore: fix typos in language models * chore: fix typos in mistral model * chore: fix model copy from issue * chore: fix model copy from issue * chore: fix model copy from issue * chore: fix model copy from issue * chore: fix model copy from issue
-
Matt authored
* Fix auto-assign reviewers * Clean up endanchor a bit * We don't actually need the end anchor at all
-
Joao Gante authored
-
- 07 Mar, 2025 8 commits
-
Kevron Rees authored
-
gautham authored
Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`: 1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because `mask_replacement_prob=1` is an integer, and `torch.bernoulli` doesn't accept a probability with a `torch.long` dtype. I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`. 2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the type of the `inputs` variable is `tf.int32`. I solved this by manually casting it to `tf.int64` in the test, as the expected return type of `batch["input_ids"]` is `tf.int64`.
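A small reproduction sketch of issue (1), with illustrative values: the probability tensor must be floating point before `torch.bernoulli` will sample from it.

```python
import torch

prob = 1  # integer probability, as with mask_replacement_prob=1
shape = (2, 8)

# torch.bernoulli(torch.full(shape, prob)) raises
#   RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'
# because torch.full with an int fill value produces a torch.long tensor.
mask = torch.bernoulli(torch.full(shape, float(prob)))  # casting fixes the dtype
print(mask.dtype, mask.shape)
```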
-
dependabot[bot] authored
Bump jinja2 in /examples/research_projects/decision_transformer Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Joao Gante authored
update who to tag
-
Krishnakumar Kannan authored
Update chat_extras.md - content Fixed a typo in the content that may confuse readers.
-
Matt authored
* First draft of github action on PR opening for auto-assigning reviewers * fix missing import * Don't reassign reviewers if we already have them * Temporarily comment out the opened line so we can test the script * Correct path for codeowners file * Update workflow permissions * Update workflow permissions * Update debug logs * Strip inline comments * Remove prefix * Request reviews instead of assigning * Request reviews instead of assigning * Add TODO * Use pull-request-target instead * Update the script * Set back to pull_request for testing * Set to pull_request_target, testing works! * Add licence * Tighten up one of the globs * Refactor things to be a bit less convoluted * Only assign reviewers when marked ready for review
-
Andreas Abdi authored
* Export base streamer. Previously, the base streamer class was not exported, so the set of available streamers was fixed to the 3 existing streamer classes. This change allows users to extend the default base streamer class. * make fixup --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by:
Joao Gante <joao@huggingface.co>
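A hedged sketch of what exporting the base class enables: a user-defined streamer that subclasses `BaseStreamer` (the class body is illustrative, not code from this PR).

```python
from transformers.generation.streamers import BaseStreamer

class TokenCountingStreamer(BaseStreamer):
    """Illustrative custom streamer: counts generated tokens instead of printing them."""

    def __init__(self):
        self.num_tokens = 0

    def put(self, value):
        # `value` is a tensor of token ids pushed by generate()
        # (the first call typically contains the prompt).
        self.num_tokens += value.numel()

    def end(self):
        print(f"generation finished after {self.num_tokens} tokens")

# Usage: model.generate(**inputs, streamer=TokenCountingStreamer())
```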
-
Dolen authored
avoid errors when the size of `input_ids` passed to `PrefixConstrainedLogitsProcessor` is zero (#36489) * avoid errors when the size of `input_ids` passed to PrefixConstrainedLogitsProcessor is zero * use more reasonable process * avoid early return --------- Co-authored-by:
Joao Gante <joaofranciscocardosogante@gmail.com>
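For context, a sketch of how this processor is usually reached: `generate` builds a `PrefixConstrainedLogitsProcessor` when `prefix_allowed_tokens_fn` is supplied. The model id and the constraint below are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def allowed_tokens(batch_id, input_ids):
    # Constrain the next token to an arbitrary slice of the vocabulary
    # (purely illustrative; a real constraint would depend on input_ids).
    return list(range(100))

inputs = tokenizer("The answer is", return_tensors="pt")
out = model.generate(
    **inputs,
    prefix_allowed_tokens_fn=allowed_tokens,
    num_beams=2,
    max_new_tokens=5,
)
print(tokenizer.decode(out[0]))
```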
-
- 06 Mar, 2025 10 commits
-
Nouamane Tazi authored
-
Joao Gante authored
these tests should be slow
-
Joao Gante authored
-
Shaohon Chen authored
* add swanlab integration * feat(integrate): add SwanLab as an optional experiment tracking tool in transformers - Integrated SwanLab into the transformers library as an alternative for experiment tracking. - Users can now log training metrics, hyperparameters, and other experiment details to SwanLab by setting `report_to="swanlab"` in the `TrainingArguments`. - Added necessary dependencies and documentation for SwanLab integration. * Fix the spelling error of SwanLabCallback in callback.md * Apply suggestions from code review Co-authored-by:
Marc Sun <57196510+SunMarc@users.noreply.github.com> * Fix typo in comment * Fix typo in comment * Fix typos and update comments * fix annotation * chore: opt some comments --------- Co-authored-by:
Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by:
AAssets <20010618@qq.com> Co-authored-by:
ZeYi Lin <944270057@qq.com> Co-authored-by:
KAAANG <79990647+SAKURA-CAT@users.noreply.github.com>
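A minimal sketch of the usage described above, assuming `swanlab` is installed; the output directory and logging values are placeholders.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    report_to="swanlab",  # route Trainer logs to SwanLab (requires `pip install swanlab`)
    logging_steps=10,
    num_train_epochs=1,
)
# Pass `args` to a Trainer as usual; the SwanLab callback is attached automatically.
```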
-
hlky authored
* Modular Conversion --fix_and_overwrite on Windows * -newline on read
-
湛露先生 authored
Signed-off-by:
zhanluxianshen <zhanluxianshen@163.com>
-
dependabot[bot] authored
Bump transformers in /examples/research_projects/pplm Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0. - [Release notes](https://github.com/huggingface/transformers/releases) - [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0) --- updated-dependencies: - dependency-name: transformers dependency-type: direct:production ... Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Afanti authored
* chore: enhance message descriptions in parameters, comments, logs and docstrings * chore: enhance message descriptions in parameters, comments, logs and docstrings * Update src/transformers/hf_argparser.py * Update src/transformers/keras_callbacks.py --------- Co-authored-by:
Matt <Rocketknight1@users.noreply.github.com>
-
湛露先生 authored
Signed-off-by:
zhanluxianshen <zhanluxianshen@163.com>
-