lxy
PRA24-Convolution
Branches
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_img_trans_per_img
16facd88
·
Bad Performance
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm
merged
19d36546
·
put fp16 registerUnion define in common.h
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16+16x16x16-Akxm_Bnxk_Cnxm
34d6fffe
·
Added gemm_batched_kernel_tensorcore_16x16x16_fp16fp32_Akxm_Bnxk_Cnxm(), bad performance
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_gemm_grid_m_n_k_batch
d1e415b6
·
Slow , and diy atomicAdd is wrong
·
Oct 30, 2024
dev/matrix-core-fp16fp32-32x32x16
merged
b38c9677
·
Add: profile_final.sh
·
Oct 27, 2024
dev/matrix-core-fp16fp32-32x32x16-double-buffer
9dd3aa08
·
tried double buffer
·
Oct 24, 2024
dev/matrix-core-fp16fp32-16x16x16
a10b5a65
·
Eliminate lds bank conflict
·
Oct 23, 2024
dev/matrix-core-fp32-32x32x8
3e881a85
·
Coalescing read from global memory on A, B
·
Oct 12, 2024
dev/matrix-core-fp32-16x16x8
feaee3cf
·
Coalescing read from global memory on matices on A,B
·
Oct 12, 2024
dev/matrix-core-fp16fp32
merged
96c25d83
·
Passed && improved : coalesced read matrix A and B
·
Oct 12, 2024
dev/matrix-core
3dd83314
·
Passed: gemm with fp32 tensor core (32x32x8 blocking) 🎉
·
Oct 11, 2024
dev/fp32
merged
70ffe8ad
·
fix the wrong code ; change the blk_k back to 8
·
Oct 09, 2024
xry/sgemm
0f973a42
·
Feat: templated precision
·
Sep 18, 2024