flexinfer_runtime_before_decode() now launches unlocked-weight prefetch with std::async, while still filtering by Team B's actual lock plan.
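The launch pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual FlexInfer code: `LockPlan`, `prefetch_layer`, and `launch_unlocked_prefetch` are hypothetical names, and the page-touching loop is an assumed stand-in for whatever the real prefetch does.

```cpp
#include <cstddef>
#include <future>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the lock plan: the layer indices whose weights
// are planned to be locked in memory and therefore need no prefetch.
struct LockPlan {
    std::unordered_set<int> locked_layers;
    bool is_locked(int layer) const { return locked_layers.count(layer) != 0; }
};

// Assumed prefetch body: read one byte per page so the OS pages the data in.
// Returns the number of pages touched.
std::size_t prefetch_layer(const std::vector<unsigned char>& weights,
                           std::size_t page_size = 4096) {
    volatile unsigned char sink = 0;
    std::size_t pages = 0;
    for (std::size_t off = 0; off < weights.size(); off += page_size) {
        sink = weights[off];  // the read is the point; the value is discarded
        ++pages;
    }
    (void)sink;
    return pages;
}

// Launch one asynchronous prefetch task per layer that the plan leaves unlocked.
std::vector<std::future<std::size_t>> launch_unlocked_prefetch(
        const std::vector<std::vector<unsigned char>>& layers,
        const LockPlan& plan) {
    std::vector<std::future<std::size_t>> tasks;
    for (std::size_t i = 0; i < layers.size(); ++i) {
        if (plan.is_locked(static_cast<int>(i))) {
            continue;  // locked layers are skipped
        }
        const std::vector<unsigned char>* w = &layers[i];
        tasks.push_back(std::async(std::launch::async,
                                   [w] { return prefetch_layer(*w); }));
    }
    return tasks;
}
```

The caller should keep the returned futures alive and call get() on them before the weights are read. A future obtained from std::async with std::launch::async blocks in its destructor, so discarding it immediately would make the prefetch effectively synchronous.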
Windows build passed: test-flexinfer-balanced-lock, real GGUF test-flexinfer-model-load.exe qwen2-1_5b-instruct-q4_k_m.gguf, and llama-bench pp1+tg8 with FlexInfer on/off.
Qwen2 Team A prefetch remains active: decode logs show n_layer = 28 and 28 [PREFETCH] Layer N entries.
The Windows VirtualLock quota limits how many pages are actually locked, so logs distinguish planned locked bytes from actual locked bytes. This validates implementation and linkage; it is not a full Linux memory-pressure reproduction of the paper's throughput curves.
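The planned-versus-actual accounting can be sketched as below. This is a portable illustration under stated assumptions, not the FlexInfer implementation: `LockStats`, `apply_lock_plan`, and `make_quota_lock` are hypothetical names, and the quota is simulated with a simple byte counter. On Windows the callback would wrap a real VirtualLock call, which fails once the process's lockable-page limit is reached.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical counters mirroring the two figures reported in the logs.
struct LockStats {
    std::size_t planned_bytes = 0;  // what the lock plan asked for
    std::size_t actual_bytes = 0;   // what the OS actually allowed
};

// Callback that attempts to lock a region of the given size and reports
// success. Assumption: a real build would call VirtualLock (or mlock) here.
using TryLock = std::function<bool(std::size_t)>;

// Walk the planned regions, count every planned byte, and count a region as
// actually locked only when the lock attempt succeeds.
LockStats apply_lock_plan(const std::vector<std::size_t>& planned_region_sizes,
                          const TryLock& try_lock) {
    LockStats stats;
    for (std::size_t bytes : planned_region_sizes) {
        stats.planned_bytes += bytes;
        if (try_lock(bytes)) {
            stats.actual_bytes += bytes;
        }
    }
    return stats;
}

// Simulated OS quota: lock attempts succeed until the byte budget is used up.
TryLock make_quota_lock(std::size_t quota_bytes) {
    auto used = std::make_shared<std::size_t>(0);
    return [used, quota_bytes](std::size_t bytes) {
        if (*used + bytes > quota_bytes) {
            return false;  // over quota: this region stays unlocked
        }
        *used += bytes;
        return true;
    };
}
```

Logging both counters makes a quota shortfall visible: whenever actual_bytes is below planned_bytes, some planned regions were not locked.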