[`StableLm`] Add QK normalization and Parallel Residual Support (#29745)
Jonathan Tow authored
* init: add StableLm 2 support

* add integration test for parallel residual and qk layernorm

* update(modeling): match qk norm naming for consistency with phi/persimmon

* fix(tests): run a forward/backward pass on the randomly initialized test model to jitter norm weights off identity

* `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`

* refactor: rename head states var in `StableLmLayerNormPerHead`

* tests: update test model and add generate check
2f12e408