
vllm.model_executor.warmup.deepseek_v4_mhc_warmup

Warm up DeepSeek V4 mHC TileLang kernels before serving requests.

Ported from lucifer1004/vllm-jasl, with its two environment-variable knobs (VLLM_ENABLE_DEEPSEEK_V4_MHC_WARMUP and VLLM_DEEPSEEK_V4_MHC_WARMUP_TOKEN_SIZES) removed. Gating is intrinsic: the warmup returns early for non-DSv4 models and for layers without hc_* attributes, so it is a no-op except where it is needed.
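
The intrinsic gating described above can be illustrated with a small, dependency-free sketch. This is not the module's actual code: the function names (`has_hc_attrs`, `warmup_mhc`), the `"deepseek_v4"` model identifier, and the `run_kernel` callback are all hypothetical stand-ins. The sketch only shows the pattern of checking the model and each layer rather than consulting an environment variable.

```python
from types import SimpleNamespace
from typing import Callable, Iterable


def has_hc_attrs(layer: object) -> bool:
    """Return True if the layer exposes at least one attribute named hc_*."""
    return any(name.startswith("hc_") for name in dir(layer))


def warmup_mhc(
    model_type: str,
    layers: Iterable[object],
    run_kernel: Callable[[object], None],
) -> int:
    """Warm up only the layers that need it; return how many were warmed.

    `model_type` and `run_kernel` are hypothetical placeholders for whatever
    the real module uses to identify the model and launch a kernel.
    """
    # Gate 1: non-DSv4 models return early, so the call is a no-op for them.
    if model_type != "deepseek_v4":  # assumed identifier, for illustration
        return 0

    warmed = 0
    for layer in layers:
        # Gate 2: layers without hc_* attributes are skipped.
        if not has_hc_attrs(layer):
            continue
        run_kernel(layer)
        warmed += 1
    return warmed


# Stand-in layers: one with an hc_* attribute, one without.
example_layers = [SimpleNamespace(hc_scale=1.0), SimpleNamespace(weight=2.0)]
```

Because the checks live inside the warmup itself, callers can invoke it unconditionally for every model, which is what makes an opt-in flag unnecessary in this pattern.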