    [`Add Mixtral`] Adds support for the Mixtral MoE (#27942) · accccdd0
    Arthur authored
    
    
    * up
    
    * up
    
    * test
    
    * logits ok
    
    * up
    
    * up
    
    * few fixes
    
    * conversion script
    
    * up
    
    * nits
    
    * nits
    
    * update
    
    * nuke
    
    * more updates
    
    * nits
    
    * fix many issues
    
    * nit
    
    * scatter
    
    * nit
    
    * nuke megablocks
    
    * nits
    
    * fix conversion script
    
    * nit
    
    * remove
    
    * nits
    
    * nit
    
    * update
    
    * oupsssss
    
    * change
    
    * nits device
    
    * nits
    
    * fixup
    
    * update
    
    * merge
    
    * add copied from
    
    * fix the copy mentions
    
    * update tests
    
    * more fixes
    
    * nits
    
    * conversion script
    
    * add parts of the readme
    
    * Update tests/models/mixtral/test_modeling_mixtral.py
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * new test + conversion script
    
    * Apply suggestions from code review
    
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Apply suggestions from code review
    
    * fix
    
    * fix copies
    
    * fix copies
    
    * ooops
    
    * fix config
    
    * Apply suggestions from code review
    
    * fix nits
    
    * nit
    
    * add copies
    
    * add batched tests
    
    * docs
    
    * fix flash attention
    
    * let's add more verbose
    
    * add correct outputs
    
    * support router outputs
    
    * ignore copies where needed
    
    * fix
    
    * cat list if list is given for now
    
    * nits
    
    * Update docs/source/en/model_doc/mixtral.md
    
    * finish router refactoring
    
    * fix forward
    
    * fix expected values
    
    * nits
    
    * fixup
    
    * fix
    
    * fix bug
    
    * fix
    
    * fix dtype mismatch
    
    * fix
    
    * grrr grrr I support item assignment
    
    * fix CI
    
    * docs
    
    * fixup
    
    * remove some copied from
    
    * fix weird diff
    
    * skip doctest fast on the config and modeling
    
    * mark that it supports flash attention in the doc
    
    * update
    
    * Update src/transformers/models/mixtral/modeling_mixtral.py
    
    Co-authored-by: Lysandre Debut <hi@lysand.re>
    
    * Update docs/source/en/model_doc/mixtral.md
    
    Co-authored-by: Lysandre Debut <hi@lysand.re>
    
    * revert router logits config issue
    
    * update doc accordingly
    
    * Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
    
    * nits
    
    * use torch testing assert close
    
    * fixup
    
    * doc nits
    
    ---------
    
    Co-authored-by: younesbelkada <younesbelkada@gmail.com>
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    Co-authored-by: Lysandre Debut <hi@lysand.re>