    Add Nougat (#25942) · ace74d16
    NielsRogge authored
    
    
    * Add conversion script
    
    * Add NougatImageProcessor
    
    * Add crop margin
    
    * More improvements
    
    * Add docs, READMEs
    
    * Remove print statements
    
    * Include model_max_length
    
    * Add NougatTokenizerFast
    
    * Fix imports
    
    * Improve postprocessing
    
    * Improve image processor
    
    * Fix image processor
    
    * Improve normalize method
    
    * More improvements
    
    * More improvements
    
    * Add processor, improve docs
    
    * Simplify fast tokenizer
    
    * Remove test file
    
    * Fix docstrings
    
    * Use NougatProcessor in conversion script
    
    * Add is_levenshtein_available
    
    * Add tokenizer tests
    
    * More improvements
    
    * Use numpy instead of opencv
    
    * Add is_cv2_available
    
    * Fix cv2_available
    
    * Add is_nltk_available
    
    * Add image processor tests, improve crop_margin
    
    * Add integration tests
    
    * Improve integration test
    
    * Use do_rescale instead of hacks, thanks Amy
    
    * Remove random_padding
    
    * Address comments
    
    * Address more comments
    
    * Add import
    
    * Address more comments
    
    * Address more comments
    
    * Address comment
    
    * Address comment
    
    * Set max_model_input_sizes
    
    * Add tests
    
    * Add requires_backends
    
    * Add Nougat to exotic tests
    
    * Use to_pil_image
    
    * Address comment regarding nltk
    
    * Add NLTK
    
    * Improve variable names, integration test
    
    * Add test
    
    * refactor, document, and test regexes
    
    * remove named capture groups, add comments
    
    * format
    
    * add non-markdown fixed tokenization
    
    * format
    
    * correct flakiness of args parse
    
    * add regex comments
    
    * test functionalities for crop_image, align long axis and expected output
    
    * add regex tests
    
    * remove cv2 dependency
    
    * test crop_margin equality between cv2 and python
    
    * refactor table regexes to markdown
    
    * add newline
    
    * change print to log, improve doc
    
    * fix high count tables correction
    
    * address PR comments: naming, linting, asserts
    
    * Address comments
    
    * Add copied from
    
    * Update conversion script
    
    * Update conversion script to convert both small and base versions
    
    * Add inference example
    
    * Add more info
    
    * Fix style
    
    * Add require annotators to test
    
    * Define all keyword arguments explicitly
    
    * Move cv2 annotator
    
    * Add tokenizer init method
    
    * Transfer checkpoints
    
    * Add reference to Donut
    
    * Address comments
    
    * Skip test
    
    * Remove cv2 method
    
    * Add copied from statements
    
    * Use cached_property
    
    * Fix docstring
    
    * Add file to not doctested
    
    ---------
    
    Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>