    Adding grounding dino (#26087) · b752ad30
    Eduardo Pacheco authored
    
    
    * Fixed typo when converting weights to GroundingDINO vision backbone
    
    * Final modifications on modeling
    
    * Removed unnecessary class
    
    * Fixed convert structure
    
    * Added image processing
    
    * make fixup partially completed
    
    * Now text_backbone_config has its own class
    
    * Modified convert script
    
    * Removed unnecessary config attribute
    
    * Added new function to generate sub sentence mask
    
    * Renamed parameters with gamma in the name as it's currently not allowed
    
    * Removed tokenization and image_processing scripts since we'll map from existing models
    
    * Fixed some issues with configuration
    
    * Just some modifications on conversion script
    
    * Other modifications
    
    * Copied deformable detr
    
    * First commit
    
    * Added bert to model
    
    * Bert validated
    
    * Created Text and Fusion layers for Encoder
    
    * Adapted Encoder layer
    
    * Fixed typos
    
    * Adjusted Encoder
    
    * Converted encoder to hf
    
    * Modified Decoder Layer
    
    * Modified main decoder class
    
    * Removed copy comments
    
    * Fixed forward from GroundingDINOModel and GroundingDINODecoder
    
    * Added all necessary layers, configurations and forward logic up to GroundingDINOModel
    
    * Added all layers to conversion
    
    * Fixed outputs for GroundingDINOModel and GroundingDINOForObjectDetection
    
    * Fixed mask input to encoders and fixed nn.MultiheadAttention batch first and attn output
    
    * Fixed forward from GroundingDINOTextEnhancerLayer
    
    * Fixed output bug with GroundingDINODeformableLayer
    
    * Fixed bugs that prevented GroundingDINOForObjectDetection from running its forward method
    
    * Fixed attentions to be passed correctly
    
    * Passing temperature arg when creating Sine position embedding
    
    * Removed copy comments
    
    * Added temperature argument for position embedding
    
    * Fixed typo when converting weights to GroundingDINO vision backbone
    
    * Final modifications on modeling
    
    * Removed unnecessary class
    
    * Fixed convert structure
    
    * Added image processing
    
    * make fixup partially completed
    
    * Now text_backbone_config has its own class
    
    * Modified convert script
    
    * Removed unnecessary config attribute
    
    * Added new function to generate sub sentence mask
    
    * Renamed parameters with gamma in the name as it's currently not allowed
    
    * Removed tokenization and image_processing scripts since we'll map from existing models
    
    * Fixed some issues with configuration
    
    * Just some modifications on conversion script
    
    * Other modifications
    
    * Fix style
    
    * Improve fixup
    
    * Improve conversion script
    
    * Improve conversion script
    
    * Add GroundingDINOProcessor
    
    * More improvements
    
    * Return token type ids
    
    * something
    
    * Fix more tests
    
    * More improvements
    
    * More cleanup
    
    * More improvements
    
    * Fixed tests, improved modeling and config
    
    * More improvements and fixing tests
    
    * Improved tests and modeling
    
    * Improved tests and added image processor
    
    * Improved tests inference
    
    * More improvements
    
    * More test improvements
    
    * Fixed last test
    
    * Improved docstrings and comments
    
    * Fix style
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Better naming
    
    * Better naming
    
    * Added Copied statement
    
    * Added Copied statement
    
    * Moved param init from GroundingDINOBiMultiHeadAttention
    
    * Better naming
    
    * Fixing clamp style
    
    * Better naming
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Improving conversion script
    
    * Improved config
    
    * Improved naming
    
    * Improved naming again
    
    * Improved grounding-dino.md
    
    * Moved grounding dino to multimodal
    
    * Update src/transformers/models/grounding_dino/convert_grounding_dino_to_hf.py
    
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    
    * Fixed docstrings and style
    
    * Fix docstrings
    
    * Remove timm attributes
    
    * Reorder imports
    
    * More improvements
    
    * Add Grounding DINO to pipeline
    
    * Remove model from check_repo
    
    * Added grounded post_process to GroundingDINOProcessor
    
    * Fixed style
    
    * Fixed GroundingDINOTextPrenetConfig docstrings
    
    * Aligned inputs.keys() when both image and text are passed with model_input_names
    
    * Added tests for GroundingDINOImageProcessor and GroundingDINOProcessor
    
    * Testing post_process_grounded_object_detection from GroundingDINOProcessor at test_inference_object_detection_head
    
    * Fixed order
    
    * Marked test with require_torch
    
    * Temporarily changed repo_id
    
    * More improvements
    
    * Fix style
    
    * Final improvements
    
    * Improve annotators
    
    * Fix style
    
    * Add is_torch_available
    
    * Remove type hints
    
    * vocab_tokens as one liner
    
    * Removed print statements
    
    * Renamed GroundingDINOTextPrenetConfig to GroundingDINOTextConfig
    
    * remove unnecessary comments
    
    * Removed unnecessary tests on conversion script
    
    * Renamed GroundingDINO to camel case GroundingDino
    
    * Fixed GroundingDinoProcessor docstrings
    
    * loading MSDA kernels in the modeling file
    
    * Fix copies
    
    * Replace nn.multiheadattention
    
    * Replace nn.multiheadattention
    
    * Fixed inputs for GroundingDinoMultiheadAttention & order of modules
    
    * Fixed processing to avoid messing with inputs
    
    * Added more tips for GroundingDino
    
    * Make style
    
    * Changing name to align with SAM
    
    * Replace final nn.multiheadattention
    
    * Fix model tests
    
    * Update year, remove GenerationTesterMixin
    
    * Address comments
    
    * Address more comments
    
    * Rename TextPrenet to TextModel
    
    * Rename hidden_states
    
    * Address more comments
    
    * Address more comments
    
    * Address comment
    
    * Address more comments
    
    * Address merge
    
    * Address comment
    
    * Address comment
    
    * Address comment
    
    * Make style
    
    * Added layer norm eps to layer norms
    
    * Address more comments
    
    * More fixes
    
    * Fixed equivalence
    
    * Make fixup
    
    * Remove print statements
    
    * Address comments
    
    * Address comments
    
    * Address comments
    
    * Address comments
    
    * Address comments
    
    * Address comments
    
    * Add comment
    
    * Address comment
    
    * Remove overwriting of test
    
    * Fix bbox_embed
    
    * Improve decoder_bbox_embed_share
    
    * Simplify outputs
    
    * Updated post_process_grounded_object_detection
    
    * Renamed sources to feature_maps
    
    * Improved tests for Grounding Dino ImageProcessor and Processor
    
    * Fixed test requirements and imports
    
    * Fixed image_processing
    
    * Fixed processor tests
    
    * Fixed imports for image processing tests
    
    * Fix copies
    
    * Updated modeling
    
    * Fix style
    
    * Moved functions to correct position
    
    * Fixed copy issues
    
    * Update src/transformers/models/deformable_detr/modeling_deformable_detr.py
    
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    
    * Keeping consistency custom cuda kernels for MSDA
    
    * Make GroundingDinoProcessor logic clearer
    
    * Updated Grounding DINO checkpoints
    
    * Changed tests to correct structure
    
    * Updated gpu-cpu equivalence test
    
    * fix copies
    
    * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/modeling_grounding_dino.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update src/transformers/models/grounding_dino/configuration_grounding_dino.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Fixed errors and style
    
    * Fix copies
    
    * Removed inheritance from PreTrainedModel from GroundingDinoTextModel
    
    * Fixed GroundingDinoTextModel
    
    * Fixed type of default backbone config
    
    * Fixed missing methods for GroundingDinoTextModel and Added timm support for GroundingDinoConvEncoder
    
    * Addressed comments
    
    * Addressed batched image processing tests
    
    * Addressed zero shot test comment
    
    * Addressed tip comment
    
    * Removed GroundingDinoTextModel from check_repo
    
    * Removed inplace masking
    
    * Addressed comments
    
    * Addressed comments
    
    * Addressed comments
    
    * Fix copies
    
    * Fixing timm test
    
    * Fixed batching equivalence test
    
    * Update docs/source/en/model_doc/grounding-dino.md
    
    Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
    
    * Update docs/source/en/model_doc/grounding-dino.md
    
    Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
    
    * Update docs/source/en/model_doc/grounding-dino.md
    
    Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>
    
    * Addressed more comments
    
    * Added a new comment
    
    * Reduced image size
    
    * Addressed more comments
    
    * Nits
    
    * Nits
    
    * Changed the way text_config is initialized
    
    * Update src/transformers/models/grounding_dino/processing_grounding_dino.py
    
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Niels <niels.rogge1@gmail.com>
    Co-authored-by: Rafael Padilla <31217453+rafaelpadilla@users.noreply.github.com>
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    Co-authored-by: Tianqi Xu <40522713+dandansamax@users.noreply.github.com>