lxy
PRA24-Convolution
Branches
xry/sgemm
0f973a42
·
Feat: templated precision
·
Sep 18, 2024
dev/fp32
merged
70ffe8ad
·
fix the wrong code ; change the blk_k back to 8
·
Oct 09, 2024
dev/matrix-core
3dd83314
·
Passed: gemm with fp32 tensor core (32x32x8 blocking)
🎉
·
Oct 11, 2024
dev/matrix-core-fp16fp32
merged
96c25d83
·
Passed && improved : coalesced read matrix A and B
·
Oct 12, 2024
dev/matrix-core-fp32-16x16x8
feaee3cf
·
Coalescing read from global memory on matrices A, B
·
Oct 12, 2024
dev/matrix-core-fp32-32x32x8
3e881a85
·
Coalescing read from global memory on A, B
·
Oct 12, 2024
dev/matrix-core-fp16fp32-16x16x16
a10b5a65
·
Eliminate lds bank conflict
·
Oct 23, 2024
dev/matrix-core-fp16fp32-32x32x16-double-buffer
9dd3aa08
·
tried double buffer
·
Oct 24, 2024
dev/matrix-core-fp16fp32-32x32x16
merged
b38c9677
·
Add: profile_final.sh
·
Oct 27, 2024
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_gemm_grid_m_n_k_batch
d1e415b6
·
Slow , and diy atomicAdd is wrong
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16+16x16x16-Akxm_Bnxk_Cnxm
34d6fffe
·
Added gemm_batched_kernel_tensorcore_16x16x16_fp16fp32_Akxm_Bnxk_Cnxm(), bad performance
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm
merged
19d36546
·
put fp16 registerUnion define in common.h
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_img_trans_per_img
16facd88
·
Bad Performance
·
Oct 30, 2024
dev/matrix-core-fp16fp32-16x16x16-fuse
5871a2b9
·
Feat: now can handle different m, n, k
·
Oct 31, 2024
dev/matrix-core-fp16fp32-16x16x16-fuse-winograd2x3
def6b7a1
·
Delete __syncthreads in if() at filter&img trans
·
Nov 01, 2024
dev/no-matrix-core-fp16fp32-32x32x16-fuse-winograd2x3
f7a817f7
·
can only pass 16 16 12 12 64 3 3 1 1 1 1
·
Nov 01, 2024
dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm
25e7abf0
·
add the batch kernel of 64x64x16
·
Nov 02, 2024
dev/matrix-core-fp16fp32-64x64x16-fuse-winograd2x3
a6f7e65a
·
Can't pass validation and slow
·
Nov 02, 2024
dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm-winograd-2x3
2fd761f4
·
Merge Input and filter transform
·
Nov 03, 2024
dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm-8-wavefront
a62b962a
·
merge input and filter transform of 4x3
·
Nov 03, 2024