LLM Configuration Types

BlackCat supports multiple types of large model capabilities and will automatically call the appropriate model type depending on the feature. The plugin currently recognizes and uses the following four types of models:

Chat Model (Chat / LLM)

Used for everyday conversation, webpage content understanding, text generation, and reasoning—the core interactive capabilities.

Document Understanding Model (Document / File Understanding)

Used to parse and understand long texts on webpages, structured documents, or user-uploaded files.

Image Generation Model

Generate images from text instructions for creative, design, and content enhancement scenarios.

Vision / Multimodal Model

Used to understand images, screenshots, or video frames on webpages to enable true multimodal analysis.

The plugin will automatically match and call the appropriate model type based on the feature scenario.

For the most complete and powerful AI browser experience, we recommend configuring all four model types.

Important: BlackCat only supports model interfaces compatible with the OpenAI SDK. Only models that follow this standard can be correctly recognized, scheduled, and run stably by the plugin.