feat: Team B balanced memory locking implementation (!2) · Merge requests · 赵宁 / proj59-NKMILO

Open 巩岱松 requested to merge gds+20260608-130525 into main Jun 08, 2026

Description

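Completed the documentation for Team B's balanced memory locking.
Integrated FlexInfer's partial mlock / madvise and the balanced lock plan.
Added CLI/bench parameters, algorithm tests, and a model-loading smoke test.
Clarified that Team A's actual asynchronous prefetch implementation lives in the Qwen2 graph constructor in src/models/qwen2.cpp.
Added the A/B linkage: Team A's prefetch path in qwen2.cpp calls flexinfer_runtime_should_prefetch_tensor(), so tensors already locked by Team B are not prefetched again.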
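
The A/B linkage in the last item can be illustrated with a minimal sketch. Everything below is hypothetical: `LockPlanSketch`, `should_prefetch_sketch`, and `tensors_to_prefetch` are illustrative names, and the real signature of flexinfer_runtime_should_prefetch_tensor() is not shown in this merge request. The sketch only assumes that the lock plan can answer "is this tensor already locked?" by tensor name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for Team B's lock plan: the names of tensors
// that are already pinned in memory.
struct LockPlanSketch {
    std::unordered_set<std::string> locked;
};

// Hypothetical analogue of flexinfer_runtime_should_prefetch_tensor().
// A tensor is worth prefetching only if Team B has not locked it.
bool should_prefetch_sketch(const LockPlanSketch & plan, const std::string & name) {
    return plan.locked.count(name) == 0;
}

// Returns the tensors of one layer that the prefetch path would still touch.
std::vector<std::string> tensors_to_prefetch(const LockPlanSketch & plan,
                                             const std::vector<std::string> & layer_tensors) {
    std::vector<std::string> out;
    for (const auto & name : layer_tensors) {
        if (should_prefetch_sketch(plan, name)) {
            out.push_back(name);
        }
    }
    // Filtering can only shrink the list, never grow it.
    assert(out.size() <= layer_tensors.size());
    return out;
}
```

In this sketch the prefetch side never needs to know why a tensor is locked; it asks one yes/no question per tensor, which keeps the two teams' code paths loosely coupled.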

flexinfer_runtime_before_decode() now launches unlocked-weight prefetch with std::async, while still filtering by Team B's actual lock plan.
The Windows build and the following checks passed: test-flexinfer-balanced-lock; test-flexinfer-model-load.exe against a real GGUF model (qwen2-1_5b-instruct-q4_k_m.gguf); and llama-bench pp1+tg8 with FlexInfer on and off.
Qwen2 Team A prefetch remains active: decode logs show n_layer = 28 and 28 [PREFETCH] Layer N entries.
Windows VirtualLock quota limits actual locked pages, so logs distinguish planned locked bytes from actual locked bytes. This validates implementation/linkage, not a full Linux memory-pressure reproduction of the paper's throughput curves.
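
The distinction between planned and actual locked bytes can be sketched as follows. This is a portable simulation under stated assumptions, not the code in this merge request: `TensorSketch`, `LockReport`, and `lock_with_quota` are hypothetical names, the quota is modeled as a plain byte budget, and no real VirtualLock or mlock call is made.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one tensor in the lock plan.
struct TensorSketch {
    std::string name;
    std::size_t bytes;
    bool        locked;  // set to true only if the simulated lock succeeds
};

// Hypothetical report mirroring the two numbers the logs distinguish.
struct LockReport {
    std::size_t planned_bytes;  // what the plan asked for
    std::size_t actual_bytes;   // what fit under the quota
};

// Walks the plan in order and "locks" each tensor while the quota allows it.
// A real build would call VirtualLock (Windows) or mlock (POSIX) at the
// marked line and treat a failed call the same way as an exceeded budget.
LockReport lock_with_quota(std::vector<TensorSketch> & plan, std::size_t quota_bytes) {
    LockReport report = {0, 0};
    for (auto & t : plan) {
        report.planned_bytes += t.bytes;
        if (report.actual_bytes + t.bytes <= quota_bytes) {
            t.locked = true;  // placeholder for the OS lock call
            report.actual_bytes += t.bytes;
        }
    }
    // Actual locked bytes can never exceed what was planned.
    assert(report.actual_bytes <= report.planned_bytes);
    return report;
}
```

Reporting both numbers makes a quota shortfall visible in the logs instead of silently treating every planned tensor as resident.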

Edited Jun 08, 2026 by 巩岱松