• RAG (#6813) · c754c41c
    Ola Piktus authored
    * added rag WIP
    
    * path fix
    
    * Formatting / renaming prior to actual work
    
    * added rag WIP
    
    * path fix
    
    * Formatting / renaming prior to actual work
    
    * added rag WIP
    
    * path fix
    
    * Formatting / renaming prior to actual work
    
    * added rag WIP
    
    * Formatting / renaming prior to actual work
    
    * First commit
    
    * improve comments
    
    * Retrieval evaluation scripts
    
    * refactor to include modeling outputs + MPI retriever
    
    * Fix rag-token model + refactor
    
    * Various fixes + finetuning logic
    
    * use_bos fix
    
    * Retrieval refactor
    
    * Finetuning refactoring and cleanup
    
    * Add documentation and cleanup
    
    * Remove set_up_rag_env.sh file
    
    * Fix retrieval with HF index
    
    * Fix import errors
    
    * Fix quality errors
    
    * Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
    
    
    
    * fix quality
    
    * Fix RAG Sequence generation
    
    * minor cleanup plus initial tests
    
    * fix test
    
    * fix tests 2
    
    * Comments fix
    
    * post-merge fixes
    
    * Improve readme + post-rebase refactor
    
    * Extra dependencies for tests
    
    * Fix tests
    
    * Fix tests 2
    
    * Refactor test requirements
    
    * Fix tests 3
    
    * Post-rebase refactor
    
    * rename nlp->datasets
    
    * RAG integration tests
    
    * add tokenizer to slow integration test and allow retriever to run on cpu
    
    * add tests; fix position ids warning
    
    * change structure
    
    * change structure
    
    * add from encoder generator
    
    * save working solution
    
    * make all integration tests pass
    
    * add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
    
    * don't save paths
    
    * delete unnecessary imports
    
    * pass config to AutoTokenizer.from_pretrained for Rag tokenizers
    
    * init wiki_dpr only once
    
    * hardcode legacy index and passages paths (todo: add the right urls)
    
    * finalize config
    
    * finalize retriever api and config api
    
    * LegacyIndex index download refactor
    
    * add dpr to autotokenizer
    
    * make from pretrained more flexible
    
    * fix ragfortokengeneration
    
    * small name changes in tokenizer
    
    * add labels to models
    
    * change default index name
    
    * add retrieval tests
    
    * finish token generate
    
    * align test with previous version and make all tests pass
    
    * add tests
    
    * finalize tests
    
    * implement Thom's suggestions
    
    * add first version of test
    
    * make first tests work
    
    * make retriever platform agnostic
    
    * naming
    
    * style
    
    * add legacy index URL
    
    * docstrings + simple retrieval test for distributed
    
    * clean model api
    
    * add doc_ids to retriever's outputs
    
    * fix retrieval tests
    
    * finish model outputs
    
    * finalize model api
    
    * fix generate problem for rag
    
    * fix generate for other models
    
    * fix some tests
    
    * save intermediate
    
    * set generate to default
    
    * big refactor generate
    
    * delete rag_api
    
    * correct pip faiss install
    
    * fix auto tokenization test
    
    * fix faiss install
    
    * fix test
    
    * move the distributed logic to examples
    
    * model page
    
    * docs
    
    * finish tests
    
    * fix dependencies
    
    * fix import in __init__
    
    * Refactor eval_rag and finetune scripts
    
    * start docstring
    
    * add psutil to test
    
    * fix tf test
    
    * move require torch to top
    
    * fix retrieval test
    
    * align naming
    
    * finish automodel
    
    * fix repo consistency
    
    * test ragtokenizer save/load
    
    * add rag model output docs
    
    * fix ragtokenizer save/load from pretrained
    
    * fix tokenizer dir
    
    * remove torch in retrieval
    
    * fix docs
    
    * fix finetune scripts
    
    * finish model docs
    
    * finish docs
    
    * remove auto model for now
    
    * add require torch
    
    * remove solved todos
    
    * integrate Sylvain's suggestions
    
    * Sam's comments
    
    * correct mistake on purpose
    
    * improve README
    
    * Add generation test cases
    
    * fix rag token
    
    * clean token generate
    
    * fix test
    
    * add note to test
    
    * fix attention mask
    
    * add t5 test for rag
    
    * Fix handling prefix in finetune.py
    
    * don't overwrite index_name
    
    Co-authored-by: Patrick Lewis <plewis@fb.com>
    Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
    Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
    Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>