vllm.transformers_utils.configs.deepseek_vl2 ¶
DeepseekVLV2Config ¶
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/deepseek_vl2.py
candidate_resolutions class-attribute instance-attribute ¶
candidate_resolutions: tuple[tuple[int, int]] = (
    (384, 384),
)
projector_config instance-attribute ¶
projector_config: MlpProjectorConfig = MlpProjectorConfig(
**projector_config
)
vision_config instance-attribute ¶
vision_config: VisionEncoderConfig = VisionEncoderConfig(
**vision_config
)
__init__ ¶
__init__(
tile_tag: str = "tile_tag",
global_view_pos: str = "head",
candidate_resolutions: tuple[tuple[int, int]] = (
(384, 384),
),
**kwargs,
)
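Only the tiling options appear explicitly in the signature; everything else, including the nested sub-config dicts shown above, is forwarded through **kwargs. A hedged sketch of overriding the tiling defaults (the values are illustrative, not taken from a released checkpoint):

from vllm.transformers_utils.configs.deepseek_vl2 import DeepseekVLV2Config

# Illustrative overrides: keep the global view first and allow a second,
# wider tiling resolution in addition to the 384x384 default.
config = DeepseekVLV2Config(
    global_view_pos="head",
    candidate_resolutions=((384, 384), (384, 768)),
)
print(config.candidate_resolutions)  # ((384, 384), (384, 768))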
Source code in vllm/transformers_utils/configs/deepseek_vl2.py
MlpProjectorConfig ¶
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/deepseek_vl2.py
__init__ ¶
__init__(
projector_type: str = "downsample_mlp_gelu",
input_dim: int = 1152,
n_embed: int = 2048,
depth: int = 2,
mlp_ratio: int = 1,
downsample_ratio: int = 2,
**kwargs,
)
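The defaults describe a depth-2 GELU MLP that projects vision features (input_dim=1152) to the language embedding width (n_embed=2048) after spatially downsampling the vision token grid. A small sketch of the token arithmetic implied by downsample_ratio, assuming the projector merges downsample_ratio × downsample_ratio neighborhoods of vision tokens (the grid size is illustrative):

from vllm.transformers_utils.configs.deepseek_vl2 import MlpProjectorConfig

proj_cfg = MlpProjectorConfig()  # defaults as documented above

# Assuming a 24x24 grid of vision tokens per tile, a downsample_ratio of 2
# merges each 2x2 neighborhood, leaving 12x12 = 144 projected tokens.
tokens_per_side = 24
merged_tokens = (tokens_per_side // proj_cfg.downsample_ratio) ** 2
print(merged_tokens)  # 144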
Source code in vllm/transformers_utils/configs/deepseek_vl2.py
VisionEncoderConfig ¶
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/deepseek_vl2.py
__init__ ¶
__init__(
model_name: str = "vit_so400m_patch14_siglip_384.webli",
image_size: int = 384,
patch_size: int = 16,
width: int = 1024,
layers: int = 24,
heads: int = 16,
mlp_ratio: int = 4,
global_pool: str = "map",
ignore_head: bool = True,
class_token: bool = False,
num_classes: int = 0,
use_checkpoint: bool = False,
**kwargs,
)
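The defaults correspond to a SigLIP-style ViT running on 384×384 tiles. A short sketch of the patch arithmetic these fields imply; the config only stores the values, the encoder itself consumes them elsewhere:

from vllm.transformers_utils.configs.deepseek_vl2 import VisionEncoderConfig

vision_cfg = VisionEncoderConfig()  # defaults as documented above

# 384 // 16 = 24 patches per side, i.e. 24 * 24 = 576 vision tokens per tile.
patches_per_side = vision_cfg.image_size // vision_cfg.patch_size
print(patches_per_side ** 2)  # 576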