lxy / PRA24-Convolution / Repository / Branches
  • xry/sgemm
    0f973a42 · Feat: templated precision · Sep 18, 2024
  • dev/fp32 (merged)
    70ffe8ad · fix the wrong code ; change the blk_k back to 8 · Oct 09, 2024
  • dev/matrix-core
    3dd83314 · Passed: gemm with fp32 tensor core (32x32x8 blocking) 🎉 · Oct 11, 2024
  • dev/matrix-core-fp16fp32 (merged)
    96c25d83 · Passed && improved : coalesced read matrix A and B · Oct 12, 2024
  • dev/matrix-core-fp32-16x16x8
    feaee3cf · Coalescing read from global memory on matices on A,B · Oct 12, 2024
  • dev/matrix-core-fp32-32x32x8
    3e881a85 · Coalescing read from global memory on A, B · Oct 12, 2024
  • dev/matrix-core-fp16fp32-16x16x16
    a10b5a65 · Eliminate lds bank conflict · Oct 23, 2024
  • dev/matrix-core-fp16fp32-32x32x16-double-buffer
    9dd3aa08 · tried double buffer · Oct 24, 2024
  • dev/matrix-core-fp16fp32-32x32x16 (merged)
    b38c9677 · Add: profile_final.sh · Oct 27, 2024
  • dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_gemm_grid_m_n_k_batch
    d1e415b6 · Slow , and diy atomicAdd is wrong · Oct 30, 2024
  • dev/matrix-core-fp16fp32-32x32x16+16x16x16-Akxm_Bnxk_Cnxm
    34d6fffe · Added gemm_batched_kernel_tensorcore_16x16x16_fp16fp32_Akxm_Bnxk_Cnxm(), bad performance · Oct 30, 2024
  • dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm (merged)
    19d36546 · put fp16 registerUnion define in common.h · Oct 30, 2024
  • dev/matrix-core-fp16fp32-32x32x16-Akxm_Bnxk_Cnxm_img_trans_per_img
    16facd88 · Bad Performance · Oct 30, 2024
  • dev/matrix-core-fp16fp32-16x16x16-fuse
    5871a2b9 · Feat: now can handle different m, n, k · Oct 31, 2024
  • dev/matrix-core-fp16fp32-16x16x16-fuse-winograd2x3
    def6b7a1 · Delete __syncthreads in if() at filter&img trans · Nov 01, 2024
  • dev/no-matrix-core-fp16fp32-32x32x16-fuse-winograd2x3
    f7a817f7 · can only pass 16 16 12 12 64 3 3 1 1 1 1 · Nov 01, 2024
  • dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm
    25e7abf0 · add the batch kernel of 64x64x16 · Nov 02, 2024
  • dev/matrix-core-fp16fp32-64x64x16-fuse-winograd2x3
    a6f7e65a · Can't pass validation and slow · Nov 02, 2024
  • dev/matrix-core-fp16fp32-64x64x16-Akxm_Bnxk_Cnxm-winograd-2x3
    2fd761f4 · Merge Input and filter transform · Nov 03, 2024
  • dev/matrix-core-fp16fp32-128x64x16-Akxm_Bnxk_Cnxm-8-wavefront
    a62b962a · merge input and filter transform of 4x3 · Nov 03, 2024