Commits · f4f4e6b2d3f8ff6a5d418c3cb617caebf37e3893 · 某某某 / transformers-new

30 Aug, 2021 11 commits

Use existing functionality for #13251 (#13333) · f4f4e6b2
Sylvain Gugger authored 3 years ago

f4f4e6b2
Check None before going through iteration (#13250) · d5064953
Li-Huai (Allan) Lin authored 3 years ago
```
* Check None before going through iteration

* Format
```
d5064953

Kamal Raj authored 3 years ago

* distilbert-flax

* added missing self

* docs fix

* removed tied kernel extra init

* updated docs

* x -> hidden states

* removed head_mask

* removed from_pt, +FLAX

* updated year

774760e6

fix: typo spelling grammar (#13212) · 01977466
arfy slowy authored 3 years ago
```
* fix: typo spelling grammar

* fix: make fixup
```
01977466

Improve documentation of pooler_output in ModelOutput (#13228) · ef83dc4f

Navjot authored 3 years ago


* update documentation of pooler_output in modeling_outputs, making it more clear and available for generic usage

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_outputs.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* run make style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ef83dc4f

✨ add citation file (#13214) · 7828194e
Falk Puschner authored 3 years ago

7828194e

Add LayoutLMv2 + LayoutXLM (#12604) · b6ddb08a

NielsRogge authored 3 years ago


* First commit

* Make style

* Fix dummy objects

* Add Detectron2 config

* Add LayoutLMv2 pooler

* More improvements, add documentation

* More improvements

* Add model tests

* Add clarification regarding image input

* Improve integration test

* Fix bug

* Fix another bug

* Fix another bug

* Fix another bug

* More improvements

* Make more tests pass

* Make more tests pass

* Improve integration test

* Remove gradient checkpointing and add head masking

* Add integration test

* Add LayoutLMv2ForSequenceClassification to the tests

* Add LayoutLMv2ForQuestionAnswering

* More improvements

* More improvements

* Small improvements

* Fix _LazyModule

* Fix fast tokenizer

* Move sync_batch_norm to a separate method

* Replace dummies by requires_backends

* Move calculation of visual bounding boxes to separate method + update README

* Add models to main init

* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* More improvements

* Remove is_split_into_words

* More improvements

* Simplify tesseract - no use of pandas anymore

* Add LayoutLMv2Processor

* Update is_pytesseract_available

* Fix bugs

* Improve feature extractor

* Fix bug

* Add print statement

* Add truncation of bounding boxes

* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer

* Improve tokenizer tests

* Make more tokenizer tests pass

* Make more tests pass, add integration tests

* Finish integration tests

* More improvements

* More improvements - update API of the tokenizer

* More improvements

* Remove support for VQA training

* Remove some files

* Improve feature extractor

* Improve documentation and one more tokenizer test

* Make quality and small docs improvements

* Add batched tests for LayoutLMv2Processor, remove fast tokenizer

* Add truncation of labels

* Apply suggestions from code review

* Improve processor tests

* Fix failing tests and add suggestion from code review

* Fix tokenizer test

* Add detectron2 CI job

* Simplify CI job

* Comment out non-detectron2 jobs and specify number of processes

* Add pip install torchvision

* Add durations to see which tests are slow

* Fix tokenizer test and make model tests smaller

* First draft

* Use setattr

* Possible fix

* Proposal with configuration

* First draft of fast tokenizer

* More improvements

* Enable fast tokenizer tests

* Make more tests pass

* Make more tests pass

* More improvements

* Add padding to fast tokenizer

* Make more tests pass

* Make more tests pass

* Make all tests pass for fast tokenizer

* Make fast tokenizer support overflowing boxes and labels

* Add support for overflowing_labels to slow tokenizer

* Add support for fast tokenizer to the processor

* Update processor tests for both slow and fast tokenizers

* Add head models to model mappings

* Make style & quality

* Remove Detectron2 config file

* Add configurable option to label all subwords

* Fix test

* Skip visual segment embeddings in test

* Use ResNet-18 backbone in tests instead of ResNet-101

* Proposal

* Re-enable all jobs on CI

* Fix installation of tesseract

* Fix failing test

* Fix index table

* Add LayoutXLM doc page, first draft of code examples

* Improve documentation a lot

* Update expected boxes for Tesseract 4.0.0 beta

* Use offsets to create labels instead of checking if they start with ##

* Update expected boxes for Tesseract 4.1.1

* Fix conflict

* Make variable names cleaner, add docstring, add link to notebooks

* Revert "Fix conflict"

This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.

* Revert to make integration test pass

* Apply suggestions from @LysandreJik's review

* Address @patrickvonplaten's comments

* Remove fixtures DocVQA in favor of dataset on the hub

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

b6ddb08a

use float 16 in causal mask and masked bias (#13194) · 439e7abd
Hwijeen Ahn authored 3 years ago

439e7abd
Announcing the default model used by the pipeline (with a link). (#13276) · 8be921f9
Nicolas Patry authored 3 years ago

8be921f9

[Slow tests] Disable Wav2Vec2 pretraining test for now (#13303) · a75db353

Patrick von Platen authored 3 years ago


* fix_torch_device_generate_test

* remove @

* wav2vec2 pretraining

Co-authored-by: Patrick von Platen <patrick@huggingface.co>

a75db353

correct (#13304) · 4362ee29
Patrick von Platen authored 3 years ago

4362ee29

28 Aug, 2021 1 commit

examples: only use keep_linebreaks when reading TXT files (#13320) · 4046e66e

Stefan Schweter authored 3 years ago

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

* examples: only use keep_linebreaks when reading TXT files for all CLM examples

4046e66e

27 Aug, 2021 8 commits

Add Wav2Vec2 & Hubert ForSequenceClassification (#13153) · b6f332ec

Anton Lozhkov authored 3 years ago

* Add hubert classifier + tests

* Add hubert classifier + tests

* Dummies for all classification tests

* Wav2Vec2 classifier + ER test

* Fix hubert integration tests

* Add hubert IC

* Pass tests for all classification tasks on Hubert

* Pass all tests + copies

* Move models to the SUPERB org

b6f332ec

[Flax] Correct all return tensors to numpy (#13307) · 2bef3433
Patrick von Platen authored 3 years ago
```
* fix_torch_device_generate_test

* remove @

* finish find and replace
```
2bef3433
Fixing mbart50 with `return_tensors` argument too. (#13301) · 8aa67fc1
Nicolas Patry authored 3 years ago
```
* Fixing mbart50 with `return_tensors` argument too.

* Adding mbart50 tokenization tests.
```
8aa67fc1

Moving `zero-shot-classification` pipeline to new testing. (#13299) · b89a964d

Nicolas Patry authored 3 years ago

* Moving `zero-shot-classification` pipeline to new testing.

* Cleaning up old mixins.

* Fixing tests: `sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is corrupted in PT.

* Adding warning.

b89a964d

Fix BeitForMaskedImageModeling (#13275) · cc27ac1a

NielsRogge authored 3 years ago

* First pass

* Fix docs of bool_masked_pos

* Add integration script

* Fix docstring

* Add integration test for BeitForMaskedImageModeling

* Remove file

* Fix docs

cc27ac1a

Moving `translation` pipeline to new testing scheme. (#13297) · a3f96f36
Nicolas Patry authored 3 years ago
```
* Moving `translation` pipeline to new testing scheme.

* Update tokenization mbart tests.
```
a3f96f36

examples: add keep_linebreaks option to CLM examples (#13150) · 319d840b

Stefan Schweter authored 3 years ago

* examples: add keep_linebreaks option to text dataset loader for all CLM examples

* examples: introduce new keep_linebreaks option as data argument in CLM examples

319d840b

Moving `token-classification` pipeline to new testing. (#13286) · 45a8eb66
Nicolas Patry authored 3 years ago
```
* Moving `token-classification` pipeline to new testing.

* Fix tests.
```
45a8eb66

26 Aug, 2021 13 commits

Moving `text-generation` pipeline to new testing framework. (#13285) · a6e36558

Nicolas Patry authored 3 years ago

* Moving `text-generation` pipeline to new testing framework.

* Keep check_model_type but log instead of raise Exception.

* warning -> error.

a6e36558

Add DINO conversion script (#13265) · 0759f251

NielsRogge authored 3 years ago

* First commit

* Add interpolation of patch embeddings

* Comment out code

* Fix bug

* Fix another bug

* Fix bug

* Fix another bug

* Remove print statements

* Update conversion script

* Use the official vit implementation

* Add support for converting dino_vits8

* Add DINO to docs of ViT

* Remove assertion

* Add interpolation of position encodings

* Fix bug

* Add align_corners

* Add interpolate_pos_encoding option to forward pass of ViTModel

* Improve interpolate_pos_encoding method

* Add docstring

0759f251

Moving `text2text-generation` to new pipeline testing mechanism. (#13283) · 14e52783
Nicolas Patry authored 3 years ago

14e52783
Hotfixing master tests. (#13282) · 662b143b
Nicolas Patry authored 3 years ago

662b143b
Moving `text2text-generation` to new pipeline testing mechanism. (#13281) · 59c378d0
Nicolas Patry authored 3 years ago

59c378d0
Moving `table-question-answering` pipeline to new testing. (#13280) · 0ebda538
Nicolas Patry authored 3 years ago

0ebda538
Moving `summarization` pipeline to new testing format. (#13279) · 879fe8fa
Nicolas Patry authored 3 years ago
```
* Moving `summarization` pipeline to new testing format.

* Remove generate_kwargs from __init__ args.
```
879fe8fa

Moving question_answering tests to the new testing scheme. Had to tweak a... · 55fb88d3

Nicolas Patry authored 3 years ago

Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. (#13277)

* Moving question_answering tests to the new testing scheme. Had to tweak
a little some ModelTesterConfig for pipelines.

* Removing commented code.

55fb88d3

Fixing the test (warnings was incorrect.) (#13278) · 4fa1cd99
Nicolas Patry authored 3 years ago

4fa1cd99

Move `image-classification` pipeline to new testing (#13272) · 6b586ed1

Nicolas Patry authored 3 years ago

- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking actual values in small tests)
- Add support for non-RGB images in the pipeline.

6b586ed1

Add error message concerning revision (#13266) · 401377e6

Bram Vanroy authored 3 years ago


* add error message concerning revision

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* re-add double line endings

* is not None instead of implicit bool casting

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

401377e6

fix `tokenizer_class_from_name` for models with `-` in the name (#13251) · 40d60e15

Stas Bekman authored 3 years ago


* fix tokenizer_class_from_name

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

40d60e15

Migrating conversational pipeline tests to new testing format (#13114) · 83bfdbdd

Nicolas Patry authored 3 years ago

* New test format for conversational.

* Putting back old mixin.

* Re-enabling auto tests with LazyLoading.

* Feature extraction tests.

* Remove feature-extraction.

* Feature extraction with feature_extractor (No pun intended).

* Update check_model_type for fill-mask.

83bfdbdd

25 Aug, 2021 7 commits

Add require flax to test (#13260) · 72eefb34
Lysandre Debut authored 3 years ago

72eefb34

Some `model_type`s cannot be in the mapping (#13259) · 5af8df5a
Lysandre Debut authored 3 years ago
```
* Some tokenizers cannot be in the mapping

* Style
```
5af8df5a

Add CLIP tokenizer to AutoTokenizer (#13258) · 68b69072
Lysandre Debut authored 3 years ago

68b69072

Hubert test fix (#13261) · 3bbe68f8
Lysandre Debut authored 3 years ago

3bbe68f8

Better notification service (#13267) · 3bb44662
Lysandre Debut authored 3 years ago

3bb44662

Replace assert statement with if condition and ValueError (#13263) · 225de5cc
Nishant Prabhu authored 3 years ago

225de5cc

Grad enabled typo · 46554fc1
Lysandre authored 3 years ago

46554fc1