lxy
PRA24-Convolution
Branches
main
default
protected
c5d79788
·
fix: delete conf.txt
·
Jun 08, 2025
dev/matrix-core-fp16fp32-nonfused-input-transform
0cd870c5
·
add the template of output transform
·
Nov 09, 2024
dev/matrix-core-fp16fp32-Template-Akxm_Bnxk_Cnxm
df7d5c47
·
Feat: multi-warp gemm_batched_kernel.
·
Nov 08, 2024
dev/32x32x16-fused-VUY-as-Union
6c092e3c
·
Feat: shrink lds size from 32KiB to 16KiB
·
Nov 08, 2024
dev/matrix-core-fp16fp32-32x32x16-fuse-winograd2x3
cc78ca3a
·
Feat: eliminated lds operation of image tile
·
Nov 07, 2024
dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm-winograd-2x3
71929978
·
WIP: Use vertor type for read GMEM
·
Nov 07, 2024
dev/matrix-core-fp16fp32-Template-Amxk_Bnxk_Cnxm
1f9b7b07
·
WIP: finished gemm, not validated
·
Nov 06, 2024
dev/matrix-core-fp16fp32-Template-Akxm_Bnxk_Cnxm-double-buffering
fe6a2356
·
Feat: double buffering template function
·
Nov 05, 2024
dev/matrix-core-fp16fp32-256x64x16-Akxm_Bnxk_Cnxm-double-buffering
096bc939
·
not finished yet
·
Nov 05, 2024
dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm-double-buffering
61f6be3d
·
Feat: double buffering
·
Nov 05, 2024
dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm
daeedbed
·
Revert " merge input and filter transform of 4x3"
·
Nov 04, 2024
implicit_gemm
88c39510
·
Mixed up the code, can pass the final case, but error when c is not the multiple of 16
·
Nov 04, 2024
commit/11-04-580.4203
f0dbd026
·
Merge Optmization on tranforms
·
Nov 04, 2024
dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm-8-wavefront
a62b962a
·
merge input and filter transform of 4x3
·
Nov 03, 2024
dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm-winograd-2x3
2fd761f4
·
Merge Input and filter transform
·
Nov 03, 2024
dev/matrix-core-fp16fp32-64x64x16-fuse-winograd2x3
a6f7e65a
·
Can't pass validation and slow
·
Nov 02, 2024
dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm
25e7abf0
·
add the batch kernel of 64x64x16
·
Nov 02, 2024
dev/no-matrix-core-fp16fp32-32x32x16-fuse-winograd2x3
f7a817f7
·
can only pass 16 16 12 12 64 3 3 1 1 1 1
·
Nov 01, 2024
dev/matrix-core-fp16fp32-16x16x16-fuse-winograd2x3
def6b7a1
·
Delete __syncthreads in if() at filter&img trans
·
Nov 01, 2024
dev/matrix-core-fp16fp32-16x16x16-fuse
5871a2b9
·
Feat: now can handle different m, n, k
·
Oct 31, 2024