    Add BROS (#23190) · 17fdd354
    Jinho Park authored
    
    
    * add Bros boilerplate
    
    * copy and paste modeling_bros.py from official Bros repo
    
    * update copyright of bros files
    
    * copy tokenization_bros.py from official repo and update import path
    
    * copy tokenization_bros_fast.py from official repo and update import path
    
    * copy configuration_bros.py from official repo and update import path
    
    * remove trailing period in copyright line
    
    * copy and paste bros/__init__.py from official repo
    
    * save formatting
    
    * remove unused pe_type argument - using only crel type
    
    * resolve import issue
    
    * remove unused model classes
    
    * remove unnecessary tests
    
    * remove unused classes
    
    * fix original code's bug - layer_module's argument order
    
    * clean up modeling auto
    
    * add bbox to prepare_config_and_inputs
    
    * set temporary value to hidden_size (32 is too low because of the
    Bros' positional embedding)
    
    * remove decoder test, update create_and_check* input arguments
    
    * add missing variable to model tests
    
    * do make fixup
    
    * update bros.mdx
    
    * add boilerplate for no_head inference test
    
    * update BROS_PRETRAINED_MODEL_ARCHIVE_LIST (add naver-clova-ocr prefix)
    
    * add prepare_bros_batch_inputs function
    
    * update modeling_common to add bbox inputs in Bros Model Test
    
    * remove unnecessary model inference
    
    * add test case
    
    * add model_doc
    
    * add test case for token_classification
    
    * apply fixup
    
    * update modeling code
    
    * update BrosForTokenClassification loss calculation logic
    
    * revert logits preprocessing logic to make sure logits have original shape
    
    * - update class name
    
    * - add BrosSpadeOutput
    - update BrosConfig arguments
    
    * add boilerplate for no_head inference test
    
    * add prepare_bros_batch_inputs function
    
    * add test case
    
    * add test case for token_classification
    
    * update modeling code
    
    * update BrosForTokenClassification loss calculation logic
    
    * revert logits preprocessing logic to make sure logits have original shape
    
    * apply masking on the fly
    
    * add BrosSpadeForTokenLinking
    
    * update class name
    put docstring to the beginning of the file
    
    * separate the logits calculation logic and loss calculation logic
    
    * update logic for loss calculation so that logits shape doesn't change
    when returned
    
    * update typo
    
    * update prepare_config_and_inputs
    
    * update dummy node initialization
    
    * update last_hidden_states getting logic to consider when return_dict is False
    
    * update box first token mask param
    
    * bugfix: remove random attention mask generation
    
    * update keys to ignore on load missing
    
    * run make style and quality
    
    * apply make style and quality of other codes
    
    * update box_first_token_mask to bool type
    
    * update index.md
    
    * apply make style and quality
    
    * apply make fix-copies
    
    * pass check_repo
    
    * update bros model doc
    
    * docstring bugfix
    
    * add checkpoint for doc, tokenizer for doc
    
    * Update README.md
    
    * Update docs/source/en/model_doc/bros.md
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update bros.md
    
    * Update src/transformers/__init__.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update docs/source/en/model_doc/bros.md
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * apply suggestions from code review
    
    * apply suggestions from code review
    
    * revert test_processor_markuplm.py
    
    * Update test_processor_markuplm.py
    
    * apply suggestions from code review
    
    * apply suggestions from code review
    
    * apply suggestions from code review
    
    * update BrosSpadeELForTokenClassification head name to entity linker
    
    * add doc string for config params
    
    * update class and variable names to be more explicit and apply suggestions from code review
    
    * remove unnecessary keys to ignore
    
    * update relation extractor to be initialized with config
    
    * add bros processor
    
    * apply make style and quality
    
    * update bros.md
    
    * remove bros tokenizer, add bros processor that wraps bert tokenizer
    
    * revert change
    
    * apply make fix-copies
    
    * update processor code, update itc -> initial token, stc -> subsequent token
    
    * add type hint
    
    * remove unnecessary condition branches in embedding forward
    
    * fix auto tokenizer failure
    
    * update docstring for each class
    
    * update bbox input dimension as standard 2 points and convert them to 4
    points in forward pass
    
    * update bros docs
    
    * apply suggestions from code review : update Bros -> BROS in bros.md
    
    * 1. box prefix var -> bbox
    2. update variable names to be more explicit
    
    * replace einsum with torch matmul
    
    * apply style and quality
    
    * remove unused argument
    
    * remove unused arguments
    
    * update docstrings
    
    * apply suggestions from code review: add BrosBboxEmbeddings, replace
    einsum with classical matrix operations
    
    * revert einsum update
    
    * update bros processor
    
    * apply suggestions from code review
    
    * add conversion script for bros
    
    * Apply suggestions from code review
    
    * fix readme
    
    * apply fix-copies
    
    ---------
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>