Commits · 854260ca44080a13bbf1937c3c6ce3a2d17aba07 · 某某某 / transformers-new

31 Aug, 2021 12 commits

TF/Numpy variants for all DataCollator classes (#13105) · 854260ca

Matt authored 3 years ago


* Adding a TF variant of the DataCollatorForTokenClassification to get feedback

* Added a Numpy variant and a post_init check to fail early if a missing import is found

* Fixed call to Numpy variant

* Added a couple more of the collators

* Update src/transformers/data/data_collator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fixes, style pass, finished DataCollatorForSeqToSeq

* Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling

* Adding DataCollatorForPermutationLanguageModeling

* Style pass

* Add missing `__call__` for PLM

* Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks

* Remove unused imports

* First attempt at some TF tests

* A second attempt to make any of those tests actually work

* TF tests, round three

* TF tests, round four

* TF tests, round five

* TF tests, all enabled!

* Style pass

* Merging tests into `test_data_collator.py`

* Merging tests into `test_data_collator.py`

* Fixing up test imports

* Fixing up test imports

* Trying shuffling the conditionals around

* Commenting out non-functional old tests

* Completed all tests for all three frameworks

* Style pass

* Fixed test typo

* Style pass

* Move standard `__call__` method to mixin

* Rearranged imports for `test_data_collator`

* Fix data collator typo "torch" -> "pt"

* Fixed the most embarrassingly obvious bug

* Update src/transformers/data/data_collator.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Renaming mixin

* Updating docs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dalton Walker <dalton_walker@icloud.com>
Co-authored-by: Andrew Romans <andrew.romans@hotmail.com>

854260ca

Clean up test file · 74b3344f
Sylvain Gugger authored 3 years ago

74b3344f

Set missing seq_length variable when using inputs_embeds with ALBERT & Remove... · ef8d6f2b

Jongheon Kim authored 3 years ago

Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication (#13152)

* Set seq_length variable when using inputs_embeds

* remove code duplication

ef8d6f2b

docs: fix minor typo (#13289) · 180c6de6
Jake Tae authored 3 years ago
```
`at` should be `a1`
```
180c6de6
correct TP implementation resources (#13248) · 066fd047
Stas Bekman authored 3 years ago
```
fix a few implementation links
```
066fd047
Handle nested dict/lists of tensors as inputs in the Trainer (#13338) · 4d10474f
Sylvain Gugger authored 3 years ago

4d10474f

Deberta_v2 tf (#13120) · 3efcfeab

Kamal Raj authored 3 years ago

* Deberta_v2 tf

* added new line at the end of file, make style

* +V2, typo

* remove never executed branch of code

* rm cmnt and fixed typo in url filter

* cleanup according to review comments

* added #Copied from

3efcfeab

doc mismatch fixed (#13345) · 286ccefb
Apoorv Garg authored 3 years ago

286ccefb

Add GPT2ForTokenClassification (#13290) · 41c55941

tucan9389 authored 3 years ago


* Add GPT2ForTokenClassification

* Fix dropout exception for GPT2 NER

* Remove sequence label in test

* Change TokenClassifierOutput to TokenClassifierOutputWithPast

* Fix for black formatter

* Remove dummy

* Update docs for GPT2ForTokenClassification

* Fix check_inits ci fail

* Update dummy_pt_objects after make fix-copies

* Remove TokenClassifierOutputWithPast

* Fix tuple input issue

Co-authored-by: danielsejong55@gmail.com <danielsejong55@gmail.com>

41c55941

Fixing a typo in the data_collator documentation (#13309) · 11fbc32e
Serhiy-Shekhovtsov authored 3 years ago

11fbc32e

[Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests (#13313) · 062300ba

Patrick von Platen authored 3 years ago

* up

* finish

* Apply suggestions from code review

* apply Lysandre's suggestions

* adapt circle ci as well

* finish

* Update setup.py

062300ba

Tests fetcher tests (#13340) · 8b2de0e4

Sylvain Gugger authored 3 years ago

* Incorporate tests dependencies in tests_fetcher

* Harder modif

* Debug

* Loop through all files

* Last modules

* Remove debug statement

8b2de0e4

30 Aug, 2021 22 commits

Use DS callable API to allow hf_scheduler + ds_optimizer (#13216) · 42f359d0

Olatunji Ruwase authored 3 years ago


* Use DS callable API to allow hf_scheduler + ds_optimizer

* Preserve backward-compatibility

* Restore backward compatibility

* Tweak arg positioning

* Tweak arg positioning

* bump the required version

* Undo indent

* Update src/transformers/trainer.py

* style

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

42f359d0

Add missing module __spec__ (#13321) · 35236b87

Laura Hanu authored 3 years ago

* added missing __spec__ to _LazyModule

* test __spec__ is not None after module import

* changed module_spec arg to be optional in _LazyModule

* fix style issue

* added module spec test to test_file_utils

35236b87

Fix release utils (#13337) · 4ebe798f

Sylvain Gugger authored 3 years ago


* Fix release utils

* Update docs/source/conf.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

4ebe798f

Fix AutoTokenizer when no fast tokenizer is available (#13336) · c4ecd234
Sylvain Gugger authored 3 years ago
```
* Fix AutoTokenizer when a tokenizer has no fast version

* Add test
```
c4ecd234

Correct wrong function signatures on the docs website (#13198) · ffecfea9

Li-Huai (Allan) Lin authored 3 years ago

* Correct outdated function signatures on website.

* Upgrade sphinx to 3.5.4 (latest 3.x)

* Test

* Test

* Test

* Test

* Test

* Test

* Revert unnecessary changes.

* Change sphinx version to 3.5.4

* Test python 3.7.11

ffecfea9

albert flax (#13294) · 98e409ab

Kamal Raj authored 3 years ago

* albert flax

* year -> 2021

* docstring updated for flax

* removed head_mask

* removed from_pt

* removed passing attention_mask to embedding layer

98e409ab

the use_auth_token has not been set up early enough in the model_kwargs. Fixes #12941 (#13205) · ee5b2457
Ben Nimmo authored 3 years ago

ee5b2457
Fall back to `observed_batch_size` when the `dataloader` does not know the `batch_size`. (#13188) · 03056730
Maxwell Forbes authored 3 years ago

03056730
🐛 fix small model card bugs (#13310) · ce6add8e
Nathan Raw authored 3 years ago
```
* 🐛 fix small model card bugs

* 💄 style
```
ce6add8e
Update label2id in the model config for run_glue (#13334) · 139e8301
Sylvain Gugger authored 3 years ago

139e8301

add ability to connect a neptune.ai run (#13319) · 6f3c99ac

fcakyon authored 3 years ago

When the `NEPTUNE_RUN_ID` environment variable is set, neptune will log into the previous run with id `NEPTUNE_RUN_ID`.
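
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `resolve_run_id` and `describe_run` are hypothetical helpers, and the only name taken from the commit message is the `NEPTUNE_RUN_ID` environment variable. No Neptune client call is made here.

```python
import os
from typing import Mapping, Optional


def resolve_run_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the run id to reconnect to, or None to start a fresh run.

    Hypothetical helper: only the variable name comes from the commit message.
    """
    # Treat an unset or blank variable the same way: no previous run to reuse.
    run_id = env.get("NEPTUNE_RUN_ID", "").strip()
    return run_id or None


def describe_run(env: Mapping[str, str] = os.environ) -> str:
    """Report which branch would be taken (assumed behaviour, for illustration)."""
    run_id = resolve_run_id(env)
    if run_id is None:
        return "starting a new run"
    # A real callback would pass the id to the Neptune client at this point.
    return f"reconnecting to run {run_id}"


if __name__ == "__main__":
    print(describe_run())
```

Passing the environment as a mapping keeps the sketch testable without mutating `os.environ`.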

6f3c99ac

Use existing functionality for #13251 (#13333) · f4f4e6b2
Sylvain Gugger authored 3 years ago

f4f4e6b2
Check None before going through iteration (#13250) · d5064953
Li-Huai (Allan) Lin authored 3 years ago
```
* Check None before going through iteration

* Format
```
d5064953

distilbert-flax (#13324) · 774760e6

Kamal Raj authored 3 years ago

* distilbert-flax

* added missing self

* docs fix

* removed tied kernel extra init

* updated docs

* x -> hidden states

* removed head_mask

* removed from_pt, +FLAX

* updated year

774760e6

fix: typo spelling grammar (#13212) · 01977466
arfy slowy authored 3 years ago
```
* fix: typo spelling grammar

* fix: make fixup
```
01977466

Improve documentation of pooler_output in ModelOutput (#13228) · ef83dc4f

Navjot authored 3 years ago


* update documentation of pooler_output in modeling_outputs, making it more clear and available for generic usage

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* run make style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ef83dc4f

✨ add citation file (#13214) · 7828194e
Falk Puschner authored 3 years ago

7828194e

Add LayoutLMv2 + LayoutXLM (#12604) · b6ddb08a

NielsRogge authored 3 years ago


* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simplify tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* First draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Add padding to fast tokenizer

* Make more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

b6ddb08a

use float 16 in causal mask and masked bias (#13194) · 439e7abd
Hwijeen Ahn authored 3 years ago

439e7abd
Announcing the default model used by the pipeline (with a link). (#13276) · 8be921f9
Nicolas Patry authored 3 years ago

8be921f9

[Slow tests] Disable Wav2Vec2 pretraining test for now (#13303) · a75db353

Patrick von Platen authored 3 years ago


* fix_torch_device_generate_test

* remove @

* wav2vec2 pretraining

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

a75db353

correct (#13304) · 4362ee29
Patrick von Platen authored 3 years ago

4362ee29

28 Aug, 2021 1 commit

examples: only use keep_linebreaks when reading TXT files (#13320) · 4046e66e

Stefan Schweter authored 3 years ago

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

4046e66e

27 Aug, 2021 5 commits

Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) · b6f332ec

Anton Lozhkov authored 3 years ago

* Add hubert classifier + tests

* Add hubert classifier + tests

* Dummies for all classification tests

* Wav2Vec2 classifier + ER test

* Fix hubert integration tests

* Add hubert IC

* Pass tests for all classification tasks on Hubert

* Pass all tests + copies

* Move models to the SUPERB org

b6f332ec

[Flax] Correct all return tensors to numpy (#13307) · 2bef3433
Patrick von Platen authored 3 years ago
```
* fix_torch_device_generate_test

* remove @

* finish find and replace
```
2bef3433
Fixing mbart50 with `return_tensors` argument too. (#13301) · 8aa67fc1
Nicolas Patry authored 3 years ago
```
* Fixing mbart50 with `return_tensors` argument too.

* Adding mbart50 tokenization tests.
```
8aa67fc1

Moving `zero-shot-classification` pipeline to new testing. (#13299) · b89a964d

Nicolas Patry authored 3 years ago

* Moving `zero-shot-classification` pipeline to new testing.

* Cleaning up old mixins.

* Fixing tests
`sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is
corrupted in PT.

* Adding warning.

b89a964d

Fix BeitForMaskedImageModeling (#13275) · cc27ac1a

NielsRogge authored 3 years ago

* First pass

* Fix docs of bool_masked_pos

* Add integration script

* Fix docstring

* Add integration test for BeitForMaskedImageModeling

* Remove file

* Fix docs

cc27ac1a