AI Vision

Analyze Images and Extract Structured Insights

The AI Vision node in Fluidfit analyzes images with AI vision models and extracts insights such as detected text, brands, descriptions, keywords, or product categories.

It converts visual content into structured data that automated workflows can use. Whether you are analyzing product photos, packaging, retail shelves, or campaign visuals, the AI Vision node turns images into actionable information.

How AI Vision Works

The AI Vision node takes an image as input and processes it using the selected AI model along with a custom prompt that guides the analysis.

The node then returns a response in the selected output format (structured JSON or plain text), which subsequent workflow steps can use.

Typical workflow:

  1. Provide an image input

  2. Add analysis instructions (prompt)

  3. Select the AI Vision model

  4. Choose the output format

  5. Generate structured insights

The output is sent to a connected Text node, from which it can be copied or passed to downstream workflow components.
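The five steps above can be summarized as a small configuration object. The sketch below is a hypothetical model of the node's settings, written in Python for illustration only: `VisionNodeConfig`, its field names, and its validation rules are assumptions, not Fluidfit's internal or API schema. The model labels are copied from the Supported AI Models list in this article.

```python
from dataclasses import dataclass

# Model labels as listed in this article; the identifiers Fluidfit uses
# internally or in its API may differ (assumption).
SUPPORTED_MODELS = (
    "Gemini 2.0 Flash",
    "Gemini 2.5 Flash Lite",
    "Gemini 2.5 Flash",
    "Gemini 3.1 Flash Lite Preview",
    "Claude Sonnet 4.6",
    "OpenAI GPT-4o",
    "LLaVA-13B",
)
OUTPUT_FORMATS = ("JSON", "Raw Text")


@dataclass(frozen=True)
class VisionNodeConfig:
    """Hypothetical mirror of the AI Vision node settings (not a Fluidfit API)."""
    image_path: str      # step 1: image input
    prompt: str          # step 2: analysis instructions
    model: str           # step 3: selected AI Vision model
    output_format: str   # step 4: "JSON" or "Raw Text"

    def __post_init__(self) -> None:
        # Reject incomplete or unsupported settings before any image is processed.
        if not self.image_path:
            raise ValueError("an image input is required")
        if not self.prompt.strip():
            raise ValueError("the prompt must not be empty")
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {self.model!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")


config = VisionNodeConfig(
    image_path="shelf_photo.jpg",
    prompt=(
        "Analyze the provided image and return a JSON object containing detected "
        "text, brands, product description, keywords, and category."
    ),
    model="Gemini 2.5 Flash",
    output_format="JSON",
)
print(config.model)
```

Validating settings up front like this surfaces a mistyped model name or output format before step 5 (generation) runs.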

Supported AI Models

Fluidfit lets you choose from several AI vision models, depending on the complexity and performance requirements of your task.

Available models include:

  • Gemini 2.0 Flash

  • Gemini 2.5 Flash Lite

  • Gemini 2.5 Flash

  • Gemini 3.1 Flash Lite Preview

  • Claude Sonnet 4.6

  • OpenAI GPT-4o

  • LLaVA-13B

Each model offers different trade-offs between speed, cost, and analytical capability.

Configuring the AI Vision Node

1. Provide Analysis Instructions

Users can provide a prompt describing how the image should be analyzed.

Example prompt:

Analyze the provided image and return a JSON object containing detected text, brands, product description, keywords, and category.

The prompt tells the model which information to extract from the image.
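When the output will be consumed programmatically, it helps to name every expected key in the prompt. The following is a minimal sketch of a prompt builder; the function name `build_vision_prompt` and the snake_case key names in `DEFAULT_FIELDS` are assumptions chosen for illustration, not fields Fluidfit requires.

```python
# Hypothetical key names mirroring the example prompt; not a Fluidfit schema.
DEFAULT_FIELDS = ["detected_text", "brands", "description", "keywords", "category"]


def build_vision_prompt(fields: list[str], extra_instructions: str = "") -> str:
    """Return a prompt asking for a single JSON object with exactly these keys."""
    if not fields:
        raise ValueError("at least one field is required")
    keys = ", ".join(f'"{name}"' for name in fields)
    prompt = (
        "Analyze the provided image and return only a JSON object "
        f"with these keys: {keys}. "
        "Use an empty string or empty list when a value is not present in the image."
    )
    # Optional task-specific guidance is appended after the format instructions.
    if extra_instructions:
        prompt += " " + extra_instructions.strip()
    return prompt


print(build_vision_prompt(DEFAULT_FIELDS))
```

Spelling out the keys and what to do with missing values makes the response easier to parse consistently across images.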

2. Choose the Output Format

Fluidfit allows users to select the desired format for the generated output.

Available options:

  • JSON – Returns structured machine-readable data

  • Raw Text – Returns a plain text description

Selecting JSON is recommended when the output will be used in automated workflows.
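If you copy the Text node output into your own code, a small parser can handle both formats. This sketch assumes the JSON option yields a single JSON object as text; because language models may wrap JSON in a Markdown code fence, it strips such a fence before parsing. The function name `parse_vision_output` is hypothetical.

```python
import json
import re

# Matches a response wrapped entirely in a fence of three backticks,
# optionally tagged "json", and captures the content inside it.
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_vision_output(text: str, output_format: str):
    """Return a dict for JSON output, or the stripped string for Raw Text."""
    cleaned = text.strip()
    if output_format == "Raw Text":
        return cleaned
    if output_format != "JSON":
        raise ValueError(f"unknown output format: {output_format!r}")
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Failing loudly on invalid JSON is deliberate: in an automated pipeline it is usually better to stop or retry than to pass malformed data downstream.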

3. Generate Results

After configuring the model and prompt:

  1. Click Generate

  2. The AI Vision node processes the image

  3. The output appears in the connected Text node

The generated text can then be copied or used in further workflow steps.

Example Output

Example JSON output from image analysis:

This structured output can be easily used for:

  • cataloging products

  • tagging images

  • automating data pipelines

  • feeding downstream AI workflows
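As an illustration of cataloging and tagging, the sketch below turns one parsed JSON result into a catalog record. The `CatalogEntry` type, the `to_catalog_entry` function, and the keys it reads (`keywords`, `brands`, `category`, `description`) are assumptions based on the example prompt; match them to whatever keys your own prompt requests.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical downstream record built from one AI Vision JSON result."""
    image_id: str
    category: str
    description: str
    tags: list[str] = field(default_factory=list)


def to_catalog_entry(image_id: str, result: dict) -> CatalogEntry:
    # Key names are assumptions matching the example prompt, not a fixed schema.
    raw_tags = list(result.get("keywords", [])) + list(result.get("brands", []))
    # Normalize to lowercase and de-duplicate while keeping first-seen order.
    tags = list(dict.fromkeys(t.strip().lower() for t in raw_tags if t.strip()))
    return CatalogEntry(
        image_id=image_id,
        category=str(result.get("category", "uncategorized")),
        description=str(result.get("description", "")),
        tags=tags,
    )
```

Merging brands into tags is one design choice among many; a catalog that filters by brand might keep them in a separate field instead.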

Common Use Cases

The AI Vision node can support many real-world scenarios, including:

  • Product Recognition
    Identify products, brands, and packaging information from images.

  • Retail Shelf Analysis
    Detect items present on shelves and categorize them.

  • Visual Content Tagging
    Automatically generate keywords or descriptions for images.

  • Text Extraction
    Identify text elements present within images.

  • Image Metadata Generation
    Generate structured metadata for digital asset management.
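Each use case above mostly comes down to a different prompt. The templates below are hypothetical starting points, not built-in Fluidfit presets; the dictionary keys, the `prompt_for` helper, and the JSON field names inside each prompt are illustrative choices.

```python
# Hypothetical prompt templates, one per use case; adjust wording and keys freely.
USE_CASE_PROMPTS = {
    "product_recognition": (
        "Identify the product in the image. Return a JSON object with keys "
        '"product_name", "brands", and "packaging_text".'
    ),
    "retail_shelf_analysis": (
        "List every distinct item visible on the shelf. Return a JSON object with "
        'key "items", an array of objects with "name" and "category".'
    ),
    "visual_content_tagging": (
        'Return a JSON object with keys "description" (one sentence) and '
        '"keywords" (up to 10 short tags).'
    ),
    "text_extraction": (
        "Transcribe all legible text in the image. Return a JSON object with key "
        '"detected_text", an array of strings in reading order.'
    ),
    "image_metadata_generation": (
        'Return a JSON object with keys "title", "description", "keywords", and '
        '"category" suitable for a digital asset management system.'
    ),
}


def prompt_for(use_case: str) -> str:
    """Look up a template, listing the valid names if the use case is unknown."""
    try:
        return USE_CASE_PROMPTS[use_case]
    except KeyError:
        known = ", ".join(sorted(USE_CASE_PROMPTS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None
```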

Best Practices

To achieve the best results:

  • Use clear prompts that specify the expected output format

  • Select JSON output when building automated workflows
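Even with a clear prompt, a model can occasionally return text that is not valid JSON. One defensive pattern, sketched here as an assumption rather than a documented Fluidfit feature, is to validate each response and retry, falling back to another model if needed. `run_vision` stands in for whatever actually calls the model in your setup (for example, the Fluidfit API); its signature is hypothetical.

```python
import json
from typing import Callable, Optional

# Hypothetical callable: (model, prompt, image) -> raw response text.
RunVision = Callable[[str, str, str], str]


def generate_json(run_vision: RunVision, models: list[str], prompt: str,
                  image: str, attempts_per_model: int = 2) -> tuple[str, dict]:
    """Try each model in order until one returns a parseable JSON object."""
    last_error: Optional[Exception] = None
    for model in models:
        for _ in range(attempts_per_model):
            text = run_vision(model, prompt, image)
            try:
                data = json.loads(text)
            except json.JSONDecodeError as exc:
                last_error = exc
                continue  # retry the same model
            if isinstance(data, dict):
                return model, data  # report which model succeeded
            last_error = ValueError("response was JSON but not an object")
    raise RuntimeError(f"no model returned a valid JSON object: {last_error}")
```

Order the `models` list by your own preference, for example starting with a faster model and falling back to a more capable one.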

Summary

The AI Vision node enables Fluidfit users to transform images into structured insights using powerful AI models.

By combining image inputs, targeted prompts, and flexible output formats, this feature lets visual data flow directly into automated workflows.

This makes it possible to turn images into usable business intelligence inside Fluidfit pipelines.

This feature is also available via the API. Refer to the API documentation for implementation details.

