赵宁 / proj59-NKMILO / Merge requests / !2

feat: Team B balanced memory locking implementation

Open 巩岱松 requested to merge gds+20260608-130525 into main Jun 08, 2026
  • Overview 0
  • Commits 3
  • Pipelines 0
  • Changes 20

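Content

  • Complete the documentation for Team B's balanced memory locking.
  • Integrate FlexInfer's partial mlock / madvise and the balanced lock plan.
  • Add CLI/bench parameters, algorithm tests, and a model-loading smoke test.
  • Clarify that Team A's actual asynchronous prefetch implementation lives in the Qwen2 graph constructor in src/models/qwen2.cpp.
  • Add the A/B linkage: Team A's prefetch path in qwen2.cpp calls flexinfer_runtime_should_prefetch_tensor(), so tensors already locked by Team B are not prefetched again.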
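
The A/B linkage described above can be illustrated with a minimal sketch. It is not the code in this merge request: the signature of flexinfer_runtime_should_prefetch_tensor() is not shown here, so `TensorInfo`, `LockPlan`, `build_balanced_plan`, and `should_prefetch` are hypothetical names, and the round-robin "one tensor per layer per round" policy is only an assumed reading of "balanced".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical tensor descriptor; the project's real types are not shown in this MR.
struct TensorInfo {
    std::string name;
    int         layer;
    std::size_t bytes;
};

// Hypothetical lock plan: the tensor names chosen for locking and their total size.
struct LockPlan {
    std::unordered_set<std::string> locked;
    std::size_t                     planned_bytes = 0;
};

// Assumed "balanced" policy: walk the layers in rounds and take at most one
// tensor per layer per round, so the budget is spread across layers instead
// of being spent entirely on the first few layers.
LockPlan build_balanced_plan(const std::vector<TensorInfo>& tensors,
                             int n_layer, std::size_t budget) {
    assert(n_layer > 0);
    std::vector<std::vector<const TensorInfo*>> by_layer(n_layer);
    for (const auto& t : tensors) {
        if (t.layer >= 0 && t.layer < n_layer) by_layer[t.layer].push_back(&t);
    }

    std::size_t max_round = 0;
    for (const auto& v : by_layer) max_round = std::max(max_round, v.size());

    LockPlan plan;
    for (std::size_t r = 0; r < max_round; ++r) {
        for (const auto& v : by_layer) {
            if (r >= v.size()) continue;
            const TensorInfo* t = v[r];
            if (plan.planned_bytes + t->bytes > budget) continue;  // does not fit
            plan.locked.insert(t->name);
            plan.planned_bytes += t->bytes;
        }
    }
    return plan;
}

// Prefetch side: a tensor that is already in the lock plan is skipped.
bool should_prefetch(const LockPlan& plan, const std::string& name) {
    return plan.locked.count(name) == 0;
}
```

With two layers, each holding a 100-byte and a 300-byte tensor, and a 250-byte budget, this sketch locks the two 100-byte tensors and leaves both 300-byte tensors to the prefetch path.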

Validation

  • cmake --build build --target test-flexinfer-model-load --config Release
  • WSL/g++ lightweight build and run of test-flexinfer-balanced-lock passed.

2026-06-08 validation update

  • flexinfer_runtime_before_decode() now launches unlocked-weight prefetch with std::async, while still filtering by Team B's actual lock plan.
  • Windows build passed: test-flexinfer-balanced-lock, real GGUF test-flexinfer-model-load.exe qwen2-1_5b-instruct-q4_k_m.gguf, and llama-bench pp1+tg8 with FlexInfer on/off.
  • Qwen2 Team A prefetch remains active: decode logs show n_layer = 28 and 28 [PREFETCH] Layer N entries.
  • Windows VirtualLock quota limits actual locked pages, so logs distinguish planned locked bytes from actual locked bytes. This validates implementation/linkage, not a full Linux memory-pressure reproduction of the paper's throughput curves.
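
As a rough illustration of the two runtime points above (planned versus actual locked bytes, and an std::async prefetch that skips locked weights), here is a minimal sketch. The names `Weight`, `LockStats`, `lock_with_accounting`, and `launch_prefetch` are hypothetical, the `try_lock` callback merely stands in for mlock / VirtualLock, and filtering by the set of weights that were actually locked is an assumption about how flexinfer_runtime_before_decode() applies the lock plan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical weight descriptor.
struct Weight {
    std::string name;
    std::size_t bytes;
};

// Planned bytes come from the lock plan; actual bytes count only successful locks.
struct LockStats {
    std::size_t planned_bytes = 0;
    std::size_t actual_bytes  = 0;
};

// try_lock stands in for mlock/VirtualLock and returns true on success.
// A failed lock (for example, a quota limit) still counts toward planned_bytes.
LockStats lock_with_accounting(const std::vector<Weight>& plan,
                               const std::function<bool(const Weight&)>& try_lock,
                               std::unordered_set<std::string>& locked) {
    LockStats stats;
    for (const auto& w : plan) {
        stats.planned_bytes += w.bytes;
        if (try_lock(w)) {
            stats.actual_bytes += w.bytes;
            locked.insert(w.name);
        }
    }
    assert(stats.actual_bytes <= stats.planned_bytes);
    return stats;
}

// Launch the prefetch on another thread and return the names it would touch.
// Real code would issue madvise or touch pages where the name is recorded.
std::future<std::vector<std::string>> launch_prefetch(
        std::vector<Weight> weights, std::unordered_set<std::string> locked) {
    return std::async(std::launch::async,
        [weights = std::move(weights), locked = std::move(locked)] {
            std::vector<std::string> touched;
            for (const auto& w : weights) {
                if (locked.count(w.name) != 0) continue;  // already resident
                touched.push_back(w.name);
            }
            return touched;
        });
}
```

Copying the weight list and the locked set into the task keeps the background thread independent of the caller's data; the caller collects the result with `.get()` on the returned future.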
Edited Jun 08, 2026 by 巩岱松
Source branch: gds+20260608-130525