- 23 Aug, 2024 3 commits
-
-
Jason (Siyu) Zhu authored
* add liger integration
* fix syntax
* fix import issue
* add trainer.md
* Use _apply_liger_kernel()
* Fixed log message
* Update docs/source/en/trainer.md
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/trainer.md
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update src/transformers/trainer.py
  Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Update docs/source/en/trainer.md
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
* Fixed checkstyle and updated readme
* Added test
* Fixed checkstyle
* fix docstring
* rename use_liger to use_liger_kernel
* Trigger Build
* Added test
* add fix-copies
* Fixed copy inconsistencies
---------
Co-authored-by: shimizust <sshimizu@linkedin.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
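A minimal sketch of how the renamed flag is meant to be used, assuming a `transformers` release containing this change and the `liger-kernel` package installed; the output directory is illustrative:

```python
from transformers import TrainingArguments

# `use_liger_kernel=True` asks Trainer to patch supported model
# architectures with Liger's fused Triton kernels before training.
# Requires the `liger-kernel` package to be installed.
args = TrainingArguments(
    output_dir="out",  # illustrative
    use_liger_kernel=True,
)
```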
-
Joao Gante authored
Forbid `PretrainedConfig` from saving `generate` parameters; Update deprecations in `generate`-related code 🧹 (#32659)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
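With the model config no longer carrying `generate` parameters, the supported home for them is `GenerationConfig`; a small sketch (paths illustrative):

```python
from transformers import GenerationConfig

# Sampling/length controls live on GenerationConfig, not on the model config.
gen_config = GenerationConfig(max_new_tokens=64, do_sample=True, temperature=0.7)
gen_config.save_pretrained("my-model")  # writes my-model/generation_config.json

# At generation time:
# outputs = model.generate(**inputs, generation_config=gen_config)
```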
-
Cyril Vallez authored
* Add .float() in all generation methods logit outputs
* Switch float-casting of logits to training only for main models
* Add `num_logits_to_keep` in Llama and add it by default in generate
* Apply style
* Add num_logits_to_keep as arg in prepare_input_for_generation
* Add support for Mistral
* Revert models except llama and mistral
* Fix default None value in _supports_num_logits_to_keep()
* Fix dimension of dummy input
* Add exception for prophetnet in _supports_num_logits_to_keep()
* Update _supports_num_logits_to_keep() to use inspect.signature()
* Add deprecation cycle + remove modification with pretraining_tp
* Apply style
* Add most used models
* Apply style
* Make `num_logits_to_keep` an int in all cases to remove if-else clause
* Add compile check for the warning
* Fix torch versions
* style
* Add gemma2
* Update warning version
* Add comment about .float operations in generation utils
* Add tests in GenerationTesterMixin and ModelTesterMixin
* Fix batch size for assisted decoding in tests
* fix small issues in test
* refactor test
* fix slicing removing dim issue
* Add nemotron support (should fix check-copy issue in CIs)
* Trigger new CIs
* Trigger new CIs
* Bump version
* Bump version in TODO
* Trigger CIs
* remove blank space
* Trigger CIs
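A sketch of what the new argument buys, assuming a Llama-family checkpoint (name illustrative): instead of materializing the full `[batch, seq_len, vocab]` logits tensor, only the last positions are kept.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative
input_ids = torch.randint(0, model.config.vocab_size, (1, 128))

# Only compute logits for the final position; during generation this is
# all the next-token step actually needs.
out = model(input_ids, num_logits_to_keep=1)
print(out.logits.shape)  # torch.Size([1, 1, vocab_size])
```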
-
- 22 Aug, 2024 17 commits
-
-
Stefano Fiorucci authored
fix outdated link
-
Joao Gante authored
-
Jinuk authored
* docs: ko: tasks/knowledge_distillation_for_image_classification.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review (twelve rounds)
  Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
  Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
---------
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
-
Franz Louis Cesista authored
fix save_pretrained
-
Andrés Marafioti authored
-
Joao Gante authored
-
Shaopeng Fu authored
fix: (issue #32689) `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook (#32849)
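For reference, the option the fix concerns, in a minimal sketch (other arguments illustrative):

```python
from transformers import TrainingArguments

# `eval_on_start=True` runs one evaluation pass before the first training
# step; this fix makes it work inside Jupyter notebooks as well.
args = TrainingArguments(
    output_dir="out",  # illustrative
    eval_strategy="steps",
    eval_on_start=True,
)
```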
-
Isotr0py authored
* add chat_template to gguf tokenizer
* add template through tokenizer config
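A sketch of the behavior this enables, assuming a GGUF repo whose metadata carries a chat template (repo and file names illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",          # illustrative
    gguf_file="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # illustrative
)

# With this change, the chat template stored in the GGUF metadata (or the
# tokenizer config) is attached to the loaded tokenizer.
messages = [{"role": "user", "content": "Hello!"}]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```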
-
regisss authored
Do not call torch.repeat_interleave if expand_size is 1
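A sketch of the guard, in the spirit of `generate`'s input-expansion helper: `repeat_interleave` always copies, so it is skipped when there is nothing to expand.

```python
import torch

def expand_inputs(tensor: torch.Tensor, expand_size: int) -> torch.Tensor:
    # expand_size == 1 (e.g. num_beams=1, num_return_sequences=1) would be
    # an identity op, so avoid launching the copying kernel at all.
    if expand_size == 1:
        return tensor
    return tensor.repeat_interleave(expand_size, dim=0)
```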
-
Yih-Dar authored
* fix
* >= 0.3.0
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Shubham Ugare authored
-
Younes Belkada authored
* Update hub.py
* Update errors
* Apply suggestions from code review
  Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
-
Joao Gante authored
* separate step to download nltk files
* duplicated
* rm comma
-
Marc Sun authored
* add 4bit optimizer
* style
* fix msg
* style
* add qgalore
* Revert "add qgalore"
  This reverts commit 25278e805f24d5d48eaa0638abb48de1b783a3fb.
* style
* version check
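Assuming the new optimizer is exposed through `TrainingArguments.optim` under the torchao-backed name below (the string is an assumption, as is the torchao requirement implied by the commit's version check):

```python
from transformers import TrainingArguments

# Assumption: this string selects torchao's 4-bit AdamW and requires a
# compatible `torchao` version (hence the version check in this commit).
args = TrainingArguments(output_dir="out", optim="adamw_torch_4bit")
```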
-
Gal Cohen (galco) authored
Co-authored-by: Gal Cohen <galc@ai21.com>
-
Sai-Suraj-27 authored
Added missing huggingface_hub installation to workflows.
-
Joao Gante authored
* try test updates
* a few more changes
* a few more changes
* a few more changes
* [run slow] jamba
* skip logits checks on older gpus
* [run slow] jamba
* oops
* [run slow] jamba
* Update tests/models/jamba/test_modeling_jamba.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/jamba/test_modeling_jamba.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
-
- 21 Aug, 2024 2 commits
-
-
Arthur authored
commit
-
Ruilin Huang authored
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function (#31296)
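A sketch of the fixed behavior with a small Whisper checkpoint (checkpoint and dummy features illustrative): a `return_timestamps` stored on the generation config now survives a `generate` call that omits the kwarg.

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.generation_config.return_timestamps = True

input_features = torch.zeros(1, 80, 3000)  # dummy log-mel features

ids = model.generate(input_features)                           # keeps True from config
ids = model.generate(input_features, return_timestamps=False)  # explicit override
```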
-
- 20 Aug, 2024 9 commits
-
-
Ahmed Almaghz authored
* Update README.md
* Update README.md
* Add README_ar.md to i18n/README_de.md
* Add README_ar.md to i18n/README_es.md
* Add README_ar.md to i18n/README_fr.md
* Add README_ar.md to i18n/README_hd.md
* Add README_ar.md to i18n/README_ja.md
* Add README_ar.md to i18n/README_ko.md
* Add README_ar.md to i18n/README_pt-br.md
* Add README_ar.md to i18n/README_ru.md
* Add README_ar.md to i18n/README_te.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_vi.md
* Add README_ar.md to i18n/README_zh-hans.md
* Add README_ar.md to i18n/README_zh-hant.md
* Create README_ar.md
-
Nicholas Broad authored
* link for optimizer names
  Add a note and link to where the user can find more optimizer names easily, because there are many more optimizers than are mentioned in the docstring.
* make fixup
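The full set of accepted `optim` strings can be listed from the enum the docstring now points toward; a quick sketch:

```python
from transformers.training_args import OptimizerNames

# Every value here is a valid `TrainingArguments(optim=...)` string.
print(sorted(o.value for o in OptimizerNames))
```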
-
Pavel Iakubovskii authored
* Replace .norm() with decomposed version for executorch export
* [run_slow] clip
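The decomposition is presumably along these lines: the fused `.norm()` op isn't exportable to ExecuTorch, but its square-sum-sqrt expansion is numerically equivalent.

```python
import torch

x = torch.randn(4, 512)

fused = x.norm(p=2, dim=-1, keepdim=True)
decomposed = torch.sqrt((x * x).sum(dim=-1, keepdim=True))

assert torch.allclose(fused, decomposed, atol=1e-6)
```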
-
dependabot[bot] authored
Bump nltk in /examples/research_projects/decision_transformer
Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](https://github.com/nltk/nltk/compare/3.7...3.9)
---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Anton Vlasjuk authored
* mamba2 uses norm_before_gate=False
* small nit
* remove norm_before_gate flag and follow False path only
-
Gal Cohen (galco) authored
Co-authored-by: Gal Cohen <galc@ai21.com>
-
Arthur authored
add nx
-
Marc Sun authored
* Update min version of accelerate to 0.26.0
* dev-ci
* update min version in import
* remove useless check
* dev-ci
* style
* dev-ci
* dev-ci
-
Arthur authored
* support head dim
* fix the doc
* fixup
* add oproj
  Co-authored-by: Suhara <suhara@users.noreply.github.com>
* update
  Co-authored-by: bzantium <bzantium@users.noreply.github.com>
* Co-authored-by: suhara <suhara@users.noreply.github.com>
* Update
  Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
---------
Co-authored-by: bzantium <bzantium@users.noreply.github.com>
Co-authored-by: Yoshi Suhara <suhara@users.noreply.github.com>
-
- 19 Aug, 2024 9 commits
-
-
Matt authored
-
Sai-Suraj-27 authored
Fixed whisper-large-v2 model link in docs.
-
Anton Vlasjuk authored
* fix cache when using input embeddings
* simplify check; we can always add the input ids seq len since it's 0 in the first pass
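A sketch of the path the fix concerns, using a small checkpoint for illustration: generation seeded with `inputs_embeds` has no `input_ids`, so the cache-length bookkeeping must treat their length as 0 on the first pass.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello there", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs.input_ids)

# No input_ids passed: the cache starts empty and is filled from the embeddings.
out = model.generate(inputs_embeds=embeds, max_new_tokens=8)
print(tok.decode(out[0]))
```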
-
Younes Belkada authored
* fix mamba left padding
* Apply suggestions from code review
  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* fix copies
* test with `inputs_embeds`
* Update src/transformers/models/falcon_mamba/modeling_falcon_mamba.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* copies
* clarify
* fix last comments
* remove
---------
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
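For context, the batching pattern the fix targets (checkpoint name illustrative): decoder-only models are padded on the left so every prompt ends at the same position.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b", padding_side="left")  # illustrative
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

batch = tok(["Hi", "A much longer prompt"], padding=True, return_tensors="pt")

# The attention_mask marks the left-pad positions that the fixed
# FalconMamba forward now handles correctly:
# out = model.generate(**batch, max_new_tokens=10)
```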
-
Isotr0py authored
* fix gguf config vocab size
* minor fix
* link issue
-
Alan-Blanchet authored
* fix: Parameterized norm freezing
  For the R18 model, the authors don't freeze norms in the backbone.
* Update src/transformers/models/rt_detr/configuration_rt_detr.py
  Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
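If the knob added here is surfaced on the config (the parameter name below is an assumption, not confirmed by the commit text), R18-style training would keep backbone norms trainable like so:

```python
from transformers import RTDetrConfig

# Assumption: the norm-freezing behavior is parameterized on the config.
config = RTDetrConfig(freeze_backbone_batch_norms=False)
```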
-
Yitong Huang authored
* Support save/load ckpt for XLA FSDP
* Fix bug for save
* Fix style
* reserve sharded ckpt and better file naming
* minor fix
  Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* add is_fsdp_xla_v1_enabled
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
-
Aaron Chung authored
* Add representation for Conv1D, for better output info.
* code format for Conv1D
* Add a __repr__ for Conv1D so that printing a model gives a more descriptive line for Conv1D layers.
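A minimal sketch of what such a `__repr__` looks like on transformers' Conv1D (the class stores `nf`/`nx` for output/input features; this is a simplified stand-in, not the library source):

```python
import torch
from torch import nn

class Conv1D(nn.Module):
    """Simplified stand-in for transformers.pytorch_utils.Conv1D:
    a linear layer with transposed weights, as used by GPT-2."""

    def __init__(self, nf: int, nx: int):
        super().__init__()
        self.nf, self.nx = nf, nx
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))

    def __repr__(self) -> str:
        # The added representation: print(model) now shows the layer sizes.
        return f"Conv1D(nf={self.nf}, nx={self.nx})"

print(Conv1D(2304, 768))  # Conv1D(nf=2304, nx=768)
```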
-
Fanli Lin authored
* enable
* fix
-