perceptron

Perceptron: Perceptron Mk1

Name: Perceptron: Perceptron Mk1
Brand: perceptron
Price: 0.1800 USD
Availability: InStock

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural-language queries and returns detailed visual-understanding responses as either structured annotations or natural-language text.

On video, it excels at understanding tasks such as video QA, summarization, and event detection. On images, it advances point-by-example grounding from multimodal prompts, OCR and document parsing on messy real-world inputs, open-vocabulary object detection and counting, and hand pose estimation. Reasoning can be enabled per request to trade latency for deeper analysis on harder tasks.

Structured annotations are emitted inline with the text only when explicitly requested via the annotationformat parameter: pass "point", "box", or "polygon" for spatial localization on images, or "clip" (start/end timestamps) for temporal segments in video. Without annotationformat, the model returns natural-language text only.
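The annotationformat behavior described above can be sketched as a small request builder. This is a minimal illustration, not the documented API: only the annotationformat parameter name and its four values come from the description. The function name build_request, the model identifier, the "media" and "reasoning" field names, and the client-side check that pairs each format with an input type are all assumptions made for the example. No network call is made.

```python
import json

# Values named in the model description; everything else here is an assumption.
IMAGE_FORMATS = {"point", "box", "polygon"}  # spatial localization on images
VIDEO_FORMATS = {"clip"}                     # start/end timestamps in video


def build_request(prompt, media_url, media_type, annotationformat=None, reasoning=False):
    """Build a hypothetical request body as a dict; nothing is sent."""
    if media_type not in ("image", "video"):
        raise ValueError("media_type must be 'image' or 'video'")

    # Assumed client-side check: spatial formats for images, "clip" for video.
    allowed = IMAGE_FORMATS if media_type == "image" else VIDEO_FORMATS
    if annotationformat is not None and annotationformat not in allowed:
        raise ValueError(
            f"{annotationformat!r} is not valid for {media_type} input; "
            f"expected one of {sorted(allowed)}"
        )

    body = {
        "model": "perceptron-mk1",  # hypothetical model identifier
        "prompt": prompt,
        "media": {"type": media_type, "url": media_url},  # hypothetical layout
        "reasoning": reasoning,  # hypothetical flag for per-request reasoning
    }
    # Leaving the parameter out corresponds to natural-language text only.
    if annotationformat is not None:
        body["annotationformat"] = annotationformat
    return body


if __name__ == "__main__":
    request = build_request(
        "Count the mugs on the desk.",
        "https://example.com/desk.jpg",
        "image",
        annotationformat="point",
    )
    print(json.dumps(request, indent=2))
```

Omitting annotationformat from the call leaves the key out of the body entirely, which mirrors the text-only default described above. Consult the API reference for the actual request schema before adapting this.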


32,768 context

Modalities: text, image, video -> text

Released: 5/12/2026

Weekly tokens: 394.7M (tokens generated this week, network-wide)
