1. 11 Apr, 2025 12 commits
  2. 10 Apr, 2025 24 commits
  3. 09 Apr, 2025 4 commits
    • handle torch version edge cases (#37399) · 9cda4265
      Wing Lian authored
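      A minimal sketch of the version-gating pattern this commit title suggests (the cutoff version and backend names below are illustrative assumptions, not taken from the patch):

          from packaging import version
          import torch

          # Illustrative only: gate a code path on the installed torch version.
          # The 2.5.0 cutoff is a placeholder, not the edge case handled in #37399.
          IS_TORCH_2_5_PLUS = version.parse(torch.__version__) >= version.parse("2.5.0")

          def pick_attention_backend() -> str:
              # Fall back to an older path on torch versions lacking the feature.
              return "flex_attention" if IS_TORCH_2_5_PLUS else "sdpa"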
    • the fix that did not get in (#37370) · e032d12e
      Arthur authored
      
      * debugging improvements
      
      * add debugging details
      
      * add more debugging details
      
      * debug more
      
      * the fix that did not get in
      
      * First fix flex
      
      * fix query offset
      
      * fix flex first
      
      * fix device mask creation for speed
      
      * small mask creation sdpa
      
      * Update flex_attention.py
      
      * remove chunked prefill from HybridChunkedCache
      
      * never seen such a messed-up merge
      
      * clean up layers + output
      
      * add summary json file
      
      * Efficient general cache
      
      * Update cache_utils.py
      
      * cleanup
      
      * fix?
      
      * fix!
      
      * oops, typo
      
      * not everywhere
      
      * more fixes
      
      * revert unrelated changes
      
      * Fix but ugly for now -> should use pad instead
      
      * oops
      
      * re-initialize the cache
      
      * Use pad to simplify (see the sketch after this entry)
      
      * style
      
      * correct slicing
      
      ---------
      
      Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
      Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
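      The "Use pad to simplify" step points at replacing ad-hoc cache slicing with padding so shapes stay regular. A hedged sketch of that idea in plain torch (the function name and tensor layout are assumptions, not the actual cache_utils.py code):

          import torch
          import torch.nn.functional as F

          # Hypothetical sketch, not the real HybridChunkedCache code: keep a
          # sliding-window KV tensor at a fixed length by slicing or padding.
          def pad_to_window(key_states: torch.Tensor, window: int) -> torch.Tensor:
              # key_states: (batch, num_heads, seq_len, head_dim)
              seq_len = key_states.shape[-2]
              if seq_len >= window:
                  # keep only the most recent `window` positions
                  return key_states[..., -window:, :]
              # F.pad orders pads from the last dim backwards:
              # (dim -1 left, dim -1 right, dim -2 left, dim -2 right)
              return F.pad(key_states, (0, 0, window - seq_len, 0))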
    • Attention Quantization with FBGemm & TP (#37384) · f834ca2c
      Mohamed Mekkouri authored
      * fix
      
      * keep fused
      
      * contiguous (see the note after this entry)
      
      * rm print
      
      * update
      
      * update
      
      * rm print
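      The "contiguous" bullet matters because a column-wise tensor-parallel shard of a weight is a strided view, and fused or quantized kernels generally require dense memory. A hedged illustration in plain torch (names are made up, and no FBGemm call is shown since the exact kernel API is not in this log):

          import torch

          # Illustrative only: slicing along dim 1 yields a non-contiguous view;
          # call .contiguous() before handing the shard to a fused kernel.
          def column_shard(weight: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
              cols = weight.shape[1] // world_size
              local = weight[:, rank * cols : (rank + 1) * cols]  # strided view
              return local.contiguous()  # dense copy the kernel can accept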
    • Fix some failing AWQ tests (#37383) · c5c648dd
      DerekLiu35 authored
      * update AwqQuantizer
      
      * fix style
      
      * add an arg to get_modules_to_not_convert so it can also include the result of get_keys_to_not_convert(model)
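      A hedged sketch of what that last bullet describes: letting get_modules_to_not_convert merge caller-supplied keys (such as the output of get_keys_to_not_convert(model)) into its result. The signature below is an assumption based on the message, not the merged code:

          from typing import List, Optional

          # Assumed shape of the helper named in the commit message; the real
          # signature in the AWQ quantizer may differ.
          def get_modules_to_not_convert(model, extra_keys: Optional[List[str]] = None) -> List[str]:
              # e.g. keep lm_head (plus any caller-supplied modules) unquantized
              keys = {name for name, _ in model.named_modules() if "lm_head" in name}
              if extra_keys is not None:
                  keys.update(extra_keys)
              return sorted(keys)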