1. 18 Apr, 2024 7 commits
    • Revert "Re-enable SDPA's FA2 path (#30070)" (#30314) · acab997b
      Arthur authored
      * Revert "Re-enable SDPA's FA2 path (#30070)"
      
      This reverts commit 05bdef16.
      
      * Revert "Fix quality Olmo + SDPA (#30302)"
      
      This reverts commit ec92f983.
      acab997b
    • Fix RecurrentGemma device_map (#30273) · 7509a0ad
      Marc Sun authored
      * Switch to non-persistent buffer
      
      * fix device mismatch issue due to cache
      
      * style
      7509a0ad
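      A minimal sketch of the non-persistent-buffer idea from the commit above, using a made-up module (not the actual RecurrentGemma code): a buffer registered with persistent=False is excluded from the state_dict, so device_map dispatch re-creates it on the module's device instead of loading it, which avoids the device mismatch.

      ```python
      import torch
      from torch import nn

      class CacheHolder(nn.Module):
          def __init__(self, window: int = 16, dim: int = 8):
              super().__init__()
              # persistent=False: the buffer is not saved to the state_dict, so it is
              # simply rebuilt on whatever device the module ends up on
              self.register_buffer("cache", torch.zeros(window, dim), persistent=False)

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              # the cache lives on the module's device, so no device-mismatch error here
              return x + self.cache[: x.shape[0]]

      module = CacheHolder()
      print("cache" in module.state_dict())  # False: non-persistent buffers are not serialized
      ```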
    • Add atol for sliding window test (#30303) · 9459efb8
      fxmarty authored
      atol for sliding window test
      9459efb8
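      For context, the test change amounts to comparing sliding-window attention outputs with an explicit absolute tolerance instead of exact equality. A generic sketch with torch.testing.assert_close (the tensor values are made up):

      ```python
      import torch

      expected = torch.tensor([0.1234, 0.5678, 0.9011])
      actual = expected + 2e-4  # small numerical drift, e.g. between attention backends

      # passes with an explicit atol/rtol, where a strict equality check would fail
      torch.testing.assert_close(actual, expected, atol=1e-3, rtol=1e-3)
      ```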
    • Add jamba (#29943) · 3f20877d
      tomeras91 authored
      * Add jamba arch
      
      * apply "make fix-copies" changes
      
      * fix link to model in JambaConfig docstring
      
      * Add n_ctx in modeling file because repo-consistency wants that
      
      * Add jamba to flash attention and sdpa documentation
      
      * mamba dt_proj quant fix now works for LoRA as well
      
      * override test_left_padding_compatibility and use a more permissive tolerance. Left padding numerical differences are accentuated by mamba layers
      
      * add jamba to tokenization auto
      
      * fix shape comments (PR #24 on the model page: https://huggingface.co/ai21labs/Jamba-v0.1/discussions/24)
      
      * simple PR fixes
      
      * remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMambaDecoderLayer
      
      * remove the LoRA hack for the mamba dt_proj bias. It was solved in huggingface/peft#1530 (https://github.com/huggingface/peft/pull/1530)
      
      * Add copied comment on JambaMLP (it's the same as MixtralMLP)
      
      * remove padding_mask warnings. It's not supported anymore
      
      * fix docstring. Float instead of int
      
      * A few more minor PR fixes
      
      * (1) lowercase names for mamba layernorms (2) remove _apply_inner_layernorms and do it directly in the forward pass
      
      * Return None attention weights from mamba layers. Append to all attentions only if not None.
      
      * remove some leftover jamba archive lists
      
      * Better separation between expert vs non-expert layers. non-expert layers return None as router_logits, and it is not concatenated to all_router_logits returned from JambaModel
      
      * no need to take router_logits at config.expert_layer_offset anymore. result.router_logits now holds results only for expert layers
      
      * Add Jamba paper on READMEs
      
      * (1) rename n_ctx -> max_position_embeddings (2) don't use it in the modeling file since it's not needed (set it as an exception to check_config_attributes)
      
      * Add copied from comment
      
      * remove the code path for apply_inner_layernorms=False. Jamba always has the inner mamba layernorms
      
      * clearer docstring for _convert_to_standard_cache
      
      * style fixes
      
      * Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (int). Adapt assisted decoding code to use it. Also a small change in the low-memory beam search decoding path to support this new int value in model_inputs
      
      * rename test so it still overrides what it's meant to override
      
      * draft
      
      * oops
      
      * nit
      
      * remove more complex logic
      
      * fix names used in config
      
      * fix fix fix
      
      * style
      
      * fix some more failing tests
      
      * generate did not init the cache :upside_down:
      
      * more small nits
      
      * typo
      
      * config.mamba_expand * config.hidden_size for the intermediate size of the mamba shapes
      
      * fix init of pkv with torch.tensor()
      
      * empty tensor
      
      * fix some init issues
      
      * stupid changes required by generate because it does not even support its own DynamicCache class
      
      * more fixes
      
      * fix general assisted gen cache_position bug
      
      * tests passing
      
      * Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_attributes.py
      
      * fix reorder_cache to reorder mamba states and override some more functions in HybridMambaAttentionDynamicCache
      
      * no need to override test_past_key_values_format() and _check_past_key_values_for_generate() in tests anymore
      
      * fix docstrings and typehints for past_key_values
      
      * style fixes
      
      * fix docs
      
      * change typehint due to copy from Mixtral
      
      * forgot import
      
      * import order
      
      * Add configuration_jamba and modeling_jamba to not_doctested because the model is too big to download (in docstring of JambaForCausalLM.forward)
      
      * Add integration test with tiny random Jamba model on hub
      
      * fix flash attention cache shapes
      
      * bring back forgotten hidden states
      
      * rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous_state (and make bool) and bugfix - it should be set to True after a finished forward pass of the entire model
      
      * align integration test after modeling fixes
      
      * bugfix - mamba can use precomputed states only if the forward pass is on a single token
      
      * bugfix - mamba can use precomputed states only if they match the batch size
      
      * typo
      
      * remove making _prepare_4d_causal_attention_mask a leaf function
      
      * stop using past_seq_len.get_seq_length(). Use cache positions instead. Adjust test (test_decoder_model_past_with_large_inputs) accordingly
      
      ---------
      
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: Joao Gante <joao@huggingface.co>
      3f20877d
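      With this commit Jamba is wired into the auto classes, so it loads like any other causal LM. A usage sketch follows; ai21labs/Jamba-v0.1 is the checkpoint referenced in the PR discussion and is very large, so this assumes a multi-GPU setup with accelerate installed and is a sketch rather than a quick local test:

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "ai21labs/Jamba-v0.1"  # checkpoint referenced in the PR
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id, torch_dtype=torch.bfloat16, device_map="auto"
      )

      inputs = tokenizer("In the recent Super Bowl,", return_tensors="pt").to(model.device)
      outputs = model.generate(**inputs, max_new_tokens=32)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```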
    • Fix all torch pipeline failures except one (#30290) · 28a22834
      Yih-Dar authored
      
      * fix
      
      * fix
      
      ---------
      
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      28a22834
    • Fix donut token2json multiline (#30300) · 7915a259
      Pavel Iakubovskii authored
      * Fix multiline processing
      
      * Update test for token2json
      7915a259
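      The fix concerns DonutProcessor.token2json, which turns Donut's tag-based output into a dict and now handles values that span multiple lines. A small sketch; the checkpoint name and tag sequence are only examples:

      ```python
      from transformers import DonutProcessor

      # assumed checkpoint; any Donut processor exposes token2json
      processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

      sequence = "<s_menu><s_nm>Iced\nLatte</s_nm><s_price>4.50</s_price></s_menu>"
      print(processor.token2json(sequence))
      # expected shape of the result: {"menu": {"nm": "Iced\nLatte", "price": "4.50"}}
      ```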
    • Add Flash Attention 2 to M2M100 model (#30256) · b65df514
      Alexander Visheratin authored
      
      * Added flash attention 2.
      
      * Fixes.
      
      * Fix inheritance.
      
      * Fixed init.
      
      * Remove stuff.
      
      * Added documentation.
      
      * Add FA2 to M2M100 documentation.
      
      * Add test.
      
      * Fixed documentation.
      
      * Update src/transformers/models/m2m_100/modeling_m2m_100.py
      
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/nllb.md
      
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Fixed variable name.
      
      ---------
      
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b65df514
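      After this commit M2M100 accepts attn_implementation="flash_attention_2". A sketch of how that would be used, assuming a CUDA device, the flash-attn package, and the facebook/m2m100_418M checkpoint:

      ```python
      import torch
      from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

      model_id = "facebook/m2m100_418M"  # assumed public checkpoint
      tokenizer = M2M100Tokenizer.from_pretrained(model_id)
      model = M2M100ForConditionalGeneration.from_pretrained(
          model_id, torch_dtype=torch.float16, attn_implementation="flash_attention_2"
      ).to("cuda")

      tokenizer.src_lang = "en"
      inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt").to("cuda")
      generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"))
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))
      ```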
  2. 17 Apr, 2024 14 commits
    • Fix quality Olmo + SDPA (#30302) · ec92f983
      fxmarty authored
      fix olmo
      ec92f983
    • Re-enable SDPA's FA2 path (#30070) · 05bdef16
      fxmarty authored
      
      * tentatively re-enable FA2 + SDPA
      
      * better comment
      
      * _ignore_causal_mask_sdpa as staticmethod
      
      * type hints
      
      * use past_seen_tokens instead
      
      * enable copied from for sdpa
      
      * ruff
      
      * llama simplifications on review
      
      * remove unnecessary self.is_causal check
      
      * fix copies
      
      * cleaning
      
      * precise message
      
      * better doc
      
      * add test
      
      * simplify
      
      * Update src/transformers/models/llama/modeling_llama.py
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_llama.py
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_llama.py
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * style
      
      ---------
      
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      05bdef16
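      The change above is internal to the SDPA attention path; from the user side it is selected with attn_implementation="sdpa", which routes attention through torch.nn.functional.scaled_dot_product_attention and lets PyTorch dispatch to its flash kernel when the inputs allow it. A sketch with an assumed (gated) Llama checkpoint; any model with SDPA support works the same way:

      ```python
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id, torch_dtype=torch.float16, attn_implementation="sdpa", device_map="auto"
      )

      inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
      print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
      ```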
    • Add OLMo model family (#29890) · e4ea19b9
      Shane A authored
      * Add OLMo using add-new-model-like with Llama
      
      * Fix incorrect tokenizer for OLMo
      
      * Copy-paste relevant OLMo methods and their imports
      
      * Add OLMo config
      
      * Modify OLMo config to follow HF conventions
      
      * Remove unneeded Llama code from OLMo model
      
      * Add ability for OLMo model to output attentions
      
      * Add OLMoPreTrainedModel and OLMoModel
      
      * Add OLMoForCausalLM
      
      * Minor fixes to OLMo model for style and missing functions
      
      * Implement OLMo tokenizer
      
      * Implement OLMo to HF conversion script
      
      * Add tests for OLMo model
      
      * Add tests for OLMo fast tokenizer
      
      * Add auto-generated dummy objects
      
      * Remove unimplemented OLMo classes from auto and init classes and re-format
      
      * Add README and associated auto-generated files
      
      * Use OLMo names for common properties
      
      * Run make fixup
      
      * Remove `|` from OLMo typing
      
      * Remove unneeded tokenization_olmo.py
      
      * Revert model, config and converter to add-new-model-like Llama
      
      * Move logic for adding bos/eos token into GPTNeoxTokenizerFast
      
      * Change OLMoConfig defaults to match OLMo-7B
      
      * Use GPTNeoXTokenizerFast in OLMo tokenizer tests
      
      * Modify auto-generated OLMoModelTests to work for OLMo
      
      * Add non-parametric layer norm OLMoLayerNorm
      
      * Update weight conversion script for OLMo
      
      * Fix __init__ and auto structure for OLMo
      
      * Fix errors from make fixup
      
      * Remove OLMoTokenizerFast from documentation
      
      * Add missing 'Copied from' for OLMoModel._update_causal_mask
      
      * Run make fix-copies
      
      * Rearrange string replacements in OLMoForCausalLM Copied from
      
      * Move OLMo and Llama CausalLM.forward example into global constants
      
      * Fix OLMO_GENERATION_EXAMPLE doc string typo
      
      * Add option for qkv clipping to OLMo
      
      * Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
      
      * Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
      
      * Fix OLMo tokenization bug using conversion script
      
      * Keep model in full precision after conversion
      
      * Do not add eos token automatically
      
      * Update references to OLMo model in HF Hub
      
      * Do not add eos token during encoding by default
      
      * Fix Llama generation example
      
      * Run make fixup
      
      * OLMo 7B integration test fix
      
      * Remove unneeded special case for OLMoConfig
      
      * OLMo 7B Twin 2T integration test fix
      
      * Fix test_model_7b_greedy_generation
      
      * Remove test_compile_static_cache
      
      * Fix OLMo and Llama generation example
      
      * Run make fixup
      
      * Revert "OLMo 7B integration test fix"
      
      This reverts commit 4df56a4b150681bfa559846f40e9b7b7f97d7908.
      
      * Revert "OLMo 7B Twin 2T integration test fix"
      
      This reverts commit 9ff65a4a294ace89ab047b793ca55e623a9ceefc.
      
      * Ungate 7B integration tests and fix greedy generation test
      
      * Add retries for flaky test_eager_matches_sdpa_generate
      
      * Fix output of doc example for OLMoForCausalLM.forward
      
      * Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test
      
      * Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
      
      * Remove pretraining_tp from OLMo config and model
      
      * Add missing 'Copied from' instances
      
      * Remove unneeded causal_mask from OLMoModel
      
      * Revert Llama changes
      
      * Ignore copy for OLMoForCausalLM.forward
      
      * Change 'OLMo' to 'Olmo' in classes
      
      * Move minimal OLMo tokenization tests to model tests
      
      * Add missed 'Copied from' for repeat_kv
      e4ea19b9
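      A short usage sketch for the newly added model family; "allenai/OLMo-1B-hf" is assumed here as the name of a converted checkpoint on the Hub:

      ```python
      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "allenai/OLMo-1B-hf"  # assumed converted checkpoint
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(model_id)

      inputs = tokenizer("Language modeling is", return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```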
    • Upgrading to tokenizers 0.19.0 (#30289) · 8e5f76f5
      Nicolas Patry authored
      * [DO NOT MERGE] Testing tokenizers 0.19.0rc0
      
      * Accounting for the breaking change.
      
      * Ruff.
      
      * Upgrading to tokenizers `0.19` (new release with prepend_scheme fixed
      and a new surface for the BPE tiktoken bug).
      8e5f76f5
    • Add strategy to store results in evaluation loop (#30267) · c15aad09
      Pavel Iakubovskii authored
      * Add evaluation loop container for interm. results
      
      * Add tests for EvalLoopContainer
      
      * Formatting
      
      * Fix padding_index in test and typo
      
      * Move EvalLoopContainer to pr_utils to avoid additional imports
      
      * Fix `eval_do_concat_batches` arg description
      
      * Fix EvalLoopContainer import
      c15aad09
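      The idea behind the commit is to collect per-batch predictions during evaluation and only optionally concatenate them at the end, exposed to users through the new eval_do_concat_batches argument. A minimal sketch of that accumulate-then-concatenate pattern; this is not the actual EvalLoopContainer implementation:

      ```python
      import numpy as np

      class TinyEvalContainer:
          """Sketch only: gather per-batch arrays, optionally concatenate at the end."""

          def __init__(self, do_concat_batches: bool = True):
              self.do_concat_batches = do_concat_batches
              self.batches = []

          def add(self, batch_preds) -> None:
              self.batches.append(np.asarray(batch_preds))

          def get_arrays(self):
              if not self.batches:
                  return None
              if self.do_concat_batches:
                  return np.concatenate(self.batches, axis=0)
              return self.batches  # keep a list of per-batch arrays instead

      container = TinyEvalContainer(do_concat_batches=True)
      for batch in (np.zeros((4, 2)), np.ones((3, 2))):
          container.add(batch)
      print(container.get_arrays().shape)  # (7, 2)
      ```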
    • Add token type ids to CodeGenTokenizer (#29265) · 8d6b5096
      st81 authored
      * Add create token type ids to CodeGenTokenizer
      
      * Fix inconsistent length of token type ids
      
      * Format source codes
      
      * Fix inconsistent order of methods
      
      * Update docstring
      
      * add test_tokenizer_integration test
      
      * Format source codes
      
      * Add `copied from` comment to CodeGenTokenizerFast
      
      * Add doc of create_token_type_ids_from_sequences
      
      * Make return_token_type_ids False by default
      
      * Make test_tokenizer_integration as slow test
      
      * Add return_token_type_ids to tokenizer init arg
      
      * Add test for tokenizer's init return_token_type_ids
      
      * Format source codes
      8d6b5096
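      After this change the CodeGen tokenizer can emit token_type_ids, either via the new return_token_type_ids init argument or per call, and it stays off by default. A sketch; the checkpoint name is an assumption:

      ```python
      from transformers import CodeGenTokenizer

      tokenizer = CodeGenTokenizer.from_pretrained("Salesforce/codegen-350M-mono")  # assumed checkpoint

      # request token type ids for a sequence pair on this call only
      encoded = tokenizer("def add(a, b):", "return a + b", return_token_type_ids=True)
      print(encoded["token_type_ids"])  # segment ids distinguishing the two sequences
      ```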
    • FIX: Fix push important models CI (#30291) · 812a5de2
      Younes Belkada authored
      Update push-important-models.yml
      812a5de2
    • Yih-Dar authored
    • Yih-Dar authored
      05dab4e5
    • Enable fx tracing for Mistral (#30209) · 304c6a1e
      Raushan Turganbay authored
      * tracing for mistral
      
      * typo
      
      * fix copies
      304c6a1e
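      With tracing enabled, Mistral can go through transformers' torch.fx helper. A sketch; the checkpoint is assumed and large, and any Mistral-architecture model would trace the same way:

      ```python
      from transformers import AutoModelForCausalLM
      from transformers.utils.fx import symbolic_trace

      model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")  # assumed checkpoint
      traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])
      print(type(traced))  # a torch.fx GraphModule that can be inspected or transformed
      ```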
    • Configuring Translation Pipelines documents update #27753 (#29986) · 98717cb3
      Utkarsha Gupte authored
      * Configuring Translation Pipelines documents update #27753
      
      Configuring Translation Pipelines documents update
      
      * Language Format Addition
      
      * add list of supported languages
      98717cb3
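      The documentation update covers configuring translation pipelines for multilingual models, i.e. passing src_lang and tgt_lang. A sketch with an assumed NLLB checkpoint:

      ```python
      from transformers import pipeline

      translator = pipeline(
          "translation",
          model="facebook/nllb-200-distilled-600M",  # assumed multilingual checkpoint
          src_lang="eng_Latn",
          tgt_lang="fra_Latn",
      )
      print(translator("Hello, how are you?"))
      ```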
    • FIX / AWQ: Fix failing exllama test (#30288) · 080b7008
      Younes Belkada authored
      fix failing exllama test
      080b7008
    • Yoach Lacombe authored
      41145247
    • Fix SDPA sliding window compatibility (#30127) · 40eb6d6c
      fxmarty authored
      
      * fix sdpa + sliding window
      
      * give credit
      
      Co-authored-by: ehuaa <ehuamail@163.com>
      
      * remove unnecessary warning
      
      * fix typo
      
      * add test
      
      ---------
      
      Co-authored-by: ehuaa <ehuamail@163.com>
      40eb6d6c
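      For reference, sliding-window attention is driven by the model config. A quick way to inspect (or override) the window that the SDPA path uses, with an assumed Mistral checkpoint:

      ```python
      from transformers import AutoConfig

      config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")  # assumed checkpoint
      print(config.sliding_window)  # window size in tokens

      config.sliding_window = None  # example override: fall back to full causal attention
      ```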
  3. 16 Apr, 2024 10 commits
  4. 15 Apr, 2024 9 commits