vllm.compilation.matcher_utils ¶
QUANT_OPS module-attribute ¶
QUANT_OPS: dict[QuantKey, OpOverload] = {
    kFp8StaticTensorSym: default,
    kFp8DynamicTensorSym: default,
    kFp8DynamicTokenSym: default,
}
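The three keys above suggest three symmetric FP8 schemes: a static per-tensor scale, a dynamic per-tensor scale, and a dynamic per-token scale. The sketch below is not vLLM code. It assumes that "static" means the scale is supplied by the caller, that "dynamic" means it is computed from the input at run time, and that "token" means one scale per row. The helper names `per_tensor_scale`, `per_token_scales`, and `quantize` are hypothetical, and rounding onto the FP8 grid is omitted.

```python
# Illustrative only: plain-Python model of symmetric FP8 scaling granularity.
FP8_E4M3_MAX = 448.0  # largest finite value of the float8 e4m3fn format


def per_tensor_scale(rows):
    """Dynamic per-tensor: one scale derived from the global absolute maximum."""
    amax = max(abs(v) for row in rows for v in row)
    return max(amax, 1e-12) / FP8_E4M3_MAX  # guard against an all-zero input


def per_token_scales(rows):
    """Dynamic per-token: one scale per row, derived from that row's absolute maximum."""
    return [max(max(abs(v) for v in row), 1e-12) / FP8_E4M3_MAX for row in rows]


def quantize(rows, scales):
    """Divide each row by its scale and clamp to the FP8 range (no rounding here)."""
    out = []
    for row, scale in zip(rows, scales):
        out.append([max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row])
    return out


x = [[1.0, -2.0], [0.5, 224.0]]

# Static per-tensor: the scale is a calibration constant chosen ahead of time.
static_scale = 1.0
q_static = quantize(x, [static_scale] * len(x))

# Dynamic per-tensor: a single scale computed from the whole input.
tensor_scale = per_tensor_scale(x)
q_tensor = quantize(x, [tensor_scale] * len(x))

# Dynamic per-token: each row gets its own scale.
token_scales = per_token_scales(x)
q_token = quantize(x, token_scales)
```

With a per-token scale, the small-magnitude first row uses the full FP8 range instead of being compressed by the large value in the second row, which is the usual motivation for the finer granularity.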
MatcherCustomOp ¶
Bases: ABC
Source code in vllm/compilation/matcher_utils.py
__call__ ¶
__init__ ¶
__init__(enabled: bool)
empty ¶
empty_f32 ¶
empty_int64 ¶
forward_custom abstractmethod ¶
forward_native abstractmethod ¶
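The members listed above (an `enabled` flag, two abstract `forward_*` methods, and `__call__`) are consistent with a dispatch pattern in which calling the object routes to one of the two implementations. The following is a minimal sketch of that pattern under that assumption, not the actual class. `SketchMatcherOp` and `SketchDouble` are hypothetical names, and the tensor-allocation helpers (`empty`, `empty_f32`, `empty_int64`) are left out.

```python
from abc import ABC, abstractmethod


class SketchMatcherOp(ABC):
    """Hypothetical stand-in for a matcher base class with two code paths."""

    def __init__(self, enabled: bool):
        self.enabled = enabled

    @abstractmethod
    def forward_custom(self, *args):
        """Path that would call the fused/custom kernel."""

    @abstractmethod
    def forward_native(self, *args):
        """Path that would use plain framework ops."""

    def __call__(self, *args):
        # Assumption: the flag chosen at construction time selects the path.
        impl = self.forward_custom if self.enabled else self.forward_native
        return impl(*args)


class SketchDouble(SketchMatcherOp):
    """Toy subclass: both paths compute the same result and tag which one ran."""

    def forward_custom(self, x):
        return ("custom", [2 * v for v in x])

    def forward_native(self, x):
        return ("native", [v + v for v in x])


custom_result = SketchDouble(enabled=True)([1, 2, 3])
native_result = SketchDouble(enabled=False)([1, 2, 3])
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`, so every concrete matcher has to provide both paths.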
MatcherFusedAddRMSNorm ¶
Bases: MatcherCustomOp
MatcherQuantFP8 ¶
Bases: MatcherCustomOp
__init__ ¶
forward_custom ¶
forward_native ¶
inputs ¶
make_scale ¶
make_scale(input: Tensor)
MatcherRMSNorm ¶
Bases: MatcherCustomOp
__init__ ¶
forward_custom ¶
forward_native ¶
MatcherRotaryEmbedding ¶
Bases: MatcherCustomOp
__init__ ¶
__init__(
    is_neox: bool,
    head_size: int,
    num_heads: int,
    num_kv_heads: int,
    use_flashinfer: bool = False,
    enabled: bool | None = None,
) -> None
forward_custom ¶
forward_custom(
    positions: Tensor,
    query: Tensor,
    key: Tensor | None,
    cos_sin_cache: Tensor,
) -> tuple[Tensor, Tensor | None]
forward_native ¶
forward_native(
    positions: Tensor,
    query: Tensor,
    key: Tensor | None,
    cos_sin_cache: Tensor,
) -> tuple[Tensor, Tensor | None]
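To make the signature concrete, here is a plain-Python sketch of what a rotary-embedding forward pass of this shape typically computes. It is an illustration under stated assumptions, not the vLLM implementation: each `cos_sin_cache` row is assumed to hold the cosines followed by the sines, the rotation is assumed to cover the whole head, and nested lists stand in for tensors. The names `build_cos_sin_row`, `rotate_head`, and `rope_sketch` are hypothetical.

```python
import math


def build_cos_sin_row(position, rot_dim, base=10000.0):
    """One cache row: rot_dim // 2 cosines followed by rot_dim // 2 sines (assumed layout)."""
    half = rot_dim // 2
    angles = [position * base ** (-2.0 * i / rot_dim) for i in range(half)]
    return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]


def rotate_head(head, cos_sin_row, is_neox):
    """Rotate one head vector; NeoX style pairs the two halves, GPT-J style pairs neighbours."""
    half = len(cos_sin_row) // 2
    cos, sin = cos_sin_row[:half], cos_sin_row[half:]
    if is_neox:
        x1, x2 = head[:half], head[half:]
    else:
        x1, x2 = head[0::2], head[1::2]
    o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    if is_neox:
        return o1 + o2
    interleaved = []
    for a, b in zip(o1, o2):
        interleaved += [a, b]
    return interleaved


def rope_sketch(positions, query, key, cos_sin_cache, is_neox=True):
    """query/key are per-token lists of heads; key may be None, mirroring the signature."""

    def apply(x):
        return [
            [rotate_head(h, cos_sin_cache[p], is_neox) for h in heads]
            for p, heads in zip(positions, x)
        ]

    return apply(query), (apply(key) if key is not None else None)


cache = [build_cos_sin_row(p, rot_dim=4) for p in range(3)]
positions = [0, 2]
query = [[[1.0, 2.0, 3.0, 4.0]], [[1.0, 0.0, 0.0, 0.0]]]
q_out, k_out = rope_sketch(positions, query, None, cache)
```

Position 0 leaves a vector unchanged because every angle is zero, and each rotation preserves the vector's norm. Passing `key=None` yields `None` in the second slot, matching the `Tensor | None` return type.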
inputs ¶
MatcherSiluAndMul ¶
Bases: MatcherCustomOp