- 11 Mar, 2025 4 commits
-
ivarflakstad authored
AriaForConditionalGeneration depends on the idefics3 vision transformer, which does not support flex attention.
-
Arthur authored
* proper, performant flex attention implementation
* wrapper for flex attention to compile only when triggered
* attention mask type detection
* Update src/transformers/integrations/flex_attention.py
* gemma2 support
* add citation for torchtune
* Update src/transformers/models/llama/modeling_llama.py
* reset gemma2 modifications
* licensing
* apply changes to other models
* safe import
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sung Ching Liu <sunny19981005@outlook.com>
Co-authored-by: Sung Ching Liu <22844540+bursteratom@users.noreply.github.com>
-
Travis Johnson authored
* fix: handle input_channel_dim == channels_last
* fix: default PIL images to channels_last
* Apply suggestions from code review
* fixup from review batch
* test: add 1x1 PIL image to ambiguous channel test
* fix(mllama): avoid 0 dimension for image with impractical aspect ratio
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
-
Arthur authored
* some config changes
* updates and cleanup
* something that works
* fixup, fixes, nits
* Update src/transformers/integrations/tensor_parallel.py
* cleanup, style, safe import
* rename stuff and clean
* protect imports
* update tp
* fix import error
* should fix the non-model test
* update keep in float32
* fix subconfigs
* fix failing test
* fix instruct blip
* fix overwrite
* last bit of failing test
Co-authored-by: Lysandre Debut <hi@lysand.re>
-
- 10 Mar, 2025 4 commits
-
Steven Liu authored
* initial
* fix
* model-impl
-
Afanti authored
* chore: fix typos in language models
* chore: fix typos in mistral model
* chore: fix model "copied from" issues
-
Matt authored
* Fix auto-assign reviewers
* Clean up the end anchor a bit
* We don't actually need the end anchor at all
-
Joao Gante authored
-
- 07 Mar, 2025 8 commits
-
Kevron Rees authored
-
gautham authored
Fixed 2 issues in `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`:
1. The error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'` occurred because `mask_replacement_prob=1` is an integer, and `torch.bernoulli` does not accept a `torch.long` dtype. Fixed by manually casting the probability arguments to float in the `__post_init__` method of `DataCollatorForLanguageModeling`.
2. The error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1 (zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` came from the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`, because `inputs` is `tf.int32` while `batch["input_ids"]` is `tf.int64`. Solved by manually casting `inputs` to `tf.int64` in the test.
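A minimal sketch of the casting fix described above, using a hypothetical stand-in class for `DataCollatorForLanguageModeling` (field names are assumed from the commit message, not taken from the library):

```python
from dataclasses import dataclass


@dataclass
class CollatorConfig:
    # Hypothetical stand-in for the collator's probability fields; an
    # integer value like mask_replacement_prob=1 is what tripped
    # torch.bernoulli with a Long dtype.
    mlm_probability: float = 0.15
    mask_replacement_prob: float = 1

    def __post_init__(self):
        # Cast eagerly so downstream samplers (e.g. torch.bernoulli)
        # always receive floating-point probabilities.
        self.mlm_probability = float(self.mlm_probability)
        self.mask_replacement_prob = float(self.mask_replacement_prob)
```

With the cast in `__post_init__`, callers can pass `1` and still get a float downstream.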
-
dependabot[bot] authored
Bump jinja2 in /examples/research_projects/decision_transformer
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Joao Gante authored
update who to tag
-
Krishnakumar Kannan authored
Update chat_extras.md: fixed a typo in the content that may confuse readers.
-
Matt authored
* First draft of GitHub action on PR opening for auto-assigning reviewers
* fix missing import
* Don't reassign reviewers if we already have them
* Temporarily comment out the opened line so we can test the script
* Correct path for CODEOWNERS file
* Update workflow permissions
* Update debug logs
* Strip inline comments, remove prefix
* Request reviews instead of assigning; add TODO
* Use pull_request_target instead of pull_request; testing works!
* Add licence
* Tighten up one of the globs
* Refactor things to be a bit less convoluted
* Only assign reviewers when the PR is marked ready for review
-
Andreas Abdi authored
* Export base streamer. Previously the base streamer class was not exported, so the set of available streamers was fixed to three built-in classes. This change exports it so that users may extend the default base streamer class.
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
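With the base class exported, a custom streamer only needs to implement `put()` and `end()`. A self-contained sketch of the pattern — the local `BaseStreamer` stub below mirrors the `put()`/`end()` contract of the class exported by transformers, and `CollectingStreamer` is a hypothetical subclass:

```python
from abc import ABC, abstractmethod


class BaseStreamer(ABC):
    """Local stub mirroring the put()/end() contract of the exported class."""

    @abstractmethod
    def put(self, value):
        ...

    @abstractmethod
    def end(self):
        ...


class CollectingStreamer(BaseStreamer):
    """Hypothetical custom streamer that accumulates generated token chunks."""

    def __init__(self):
        self.chunks = []
        self.finished = False

    def put(self, value):
        # Called with each new chunk of generated token ids.
        self.chunks.append(value)

    def end(self):
        # Called once generation is complete.
        self.finished = True
```

In practice the subclass would inherit from the now-exported library class and be passed via `generate(..., streamer=...)` instead of one of the three built-in streamers.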
-
Dolen authored
avoid errors when the size of `input_ids` passed to `PrefixConstrainedLogitsProcessor` is zero (#36489)
* avoid errors when the size of `input_ids` passed to `PrefixConstrainedLogitsProcessor` is zero
* use a more reasonable process
* avoid early return
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
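The failure mode above can be illustrated with a pure-Python sketch of prefix-constrained masking (the helper name and signature are hypothetical, not the library's): a zero-size `input_ids` batch should simply yield an empty mask instead of raising.

```python
import math


def prefix_allowed_tokens_mask(input_ids, prefix_allowed_tokens_fn, vocab_size):
    # Build one score-mask row per sequence; with an empty batch the loop
    # body never runs, so we return an empty mask rather than erroring out.
    scores_mask = []
    for batch_id, seq in enumerate(input_ids):
        allowed = set(prefix_allowed_tokens_fn(batch_id, seq))
        scores_mask.append(
            [0.0 if tok in allowed else -math.inf for tok in range(vocab_size)]
        )
    return scores_mask
```

The real processor adds such a mask to the logits; the key point is that the empty-batch case falls through cleanly.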
-
- 06 Mar, 2025 10 commits
-
Nouamane Tazi authored
-
Joao Gante authored
these tests should be slow
-
Joao Gante authored
-
Shaohon Chen authored
* add swanlab integration
* feat(integrate): add SwanLab as an optional experiment tracking tool in transformers
  - Integrated SwanLab into the transformers library as an alternative for experiment tracking.
  - Users can now log training metrics, hyperparameters, and other experiment details to SwanLab by setting `report_to="swanlab"` in the `TrainingArguments`.
  - Added necessary dependencies and documentation for SwanLab integration.
* Fix the spelling of SwanLabCallback in callback.md
* Apply suggestions from code review
* Fix typos and update comments
* fix annotation
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: AAssets <20010618@qq.com>
Co-authored-by: ZeYi Lin <944270057@qq.com>
Co-authored-by: KAAANG <79990647+SAKURA-CAT@users.noreply.github.com>
-
hlky authored
* Modular conversion --fix_and_overwrite on Windows
* -newline on read
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
dependabot[bot] authored
Bump transformers in /examples/research_projects/pplm
Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Afanti authored
* chore: enhance message descriptions in parameters, comments, logs, and docstrings
* Update src/transformers/hf_argparser.py
* Update src/transformers/keras_callbacks.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
-
湛露先生 authored
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
-
- 05 Mar, 2025 2 commits
-
co63oc authored
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
-
Marc Sun authored
* u16
* style
* fix
-
- 04 Mar, 2025 5 commits
-
Afanti authored
chore: enhance the message in docstrings
-
Mohamed Mekkouri authored
fix quantization doc
-
ivarflakstad authored
-
co63oc authored
Fix typos in docs and examples
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
-
Arthur authored
* initial commit
* move stuff to image processing file
* remove stuff in validate turn and fix return tensor
* changes to get the right tokenization
* new __init__ works
* fix default std and mean
* remove redundant code
* fix inits, add docs templates
* refactor processor, switch to gotocr image processor
* refactor to working llava-style architecture
* Change AyaVisionModel to AyaVisionForConditionalGeneration
* add tests, fixups, update doc
* Add logits_to_keep explicitly in ayavision forward to enable compatibility with the cohere model
* better variable names + remove unused code paths
* Updates to aya_vision.md
* add "copied from" comments
* make style and remove unused projector_hidden_act from config
* include usage of fast image processor and processing on CUDA in doc
* update checkpoint in test processor
* remove test_model and update docstring
* skip failing tests
Co-authored-by: Saurabh Dash <saurabh@cohere.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
-
- 03 Mar, 2025 7 commits
-
Steven Liu authored
* rewrite get started sections; loading, customizing, and sharing models
* contribute part 1 and 2
* tokenization pt. 1 and 2
* Add new model (#32615): FalconMamba v1, quantization + accelerate fixes, `torch.compile` support, copies and slow tests
* "to be not" -> "not to be" (#32636): sam.md, trainer.py, modeling_utils.py, test_modeling_utils.py
* image processor, backbones, feature extractor, processor
* pipeline, pipeline gradio, pipeline web server
* prompting, llm optims, cache, text generation, chat pipeline
* xla, torch.compile, cpu inference, gpu inference
* agents and tools, gguf/tiktoken
* finetune, trainer pt. 1 and 2, optims, optimizers, accelerate
* parallelism, fsdp, distributed cpu, hardware and gpu training
* peft, distributed debug, deepspeed pt. 1 and 2
* quant pt. 1-4, serialization, torchscript, scripts, tpu
* model addition timeline, modular transformers, zamba2
* supported model frameworks: pytorch, flashattention
* rm check_table, rm check_support_list.py
* toctree, not-doctested.txt, and hfoption tag fixes throughout; style, review feedback, and updates
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
-
Matt authored
-
Kashif Rasul authored
* fix E721 warnings
* config.hidden_size is not a tuple
* fix copies
* fix-copies
* not a tuple
* undo
-
Matt authored
* Fix edge case for continue_final_message
* lstrip() correctly
* Add regression test
* Add a clearer error message when the final message is not present
* Fix massive bug!
-
Matt authored
* Fix pipeline-peft interaction
* once again you have committed a debug breakpoint
* Remove extra testing line
* Add a test to check adapter loading
* Correct adapter path
* make fixup
* Remove unnecessary check
* Make check a little more stringent
-
Afanti authored
chore: fix message descriptions in arguments and comments
-
co63oc authored
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
-