
Written by Andrei Negrau
Introducing Video: Siena now understands customer videos
January 13, 2026
5 min read
Something changed in how customers communicate with brands.
A few years ago, customers would write detailed paragraphs describing their problems. They'd painstakingly explain that the handle on their pan was cracked, or that the zipper on their jacket was stuck, or that the product they received looked nothing like the photos on the website.
Today, they just send a video.
A 10-second clip of the cracked handle. A quick recording showing the zipper jamming. A video panning across the damaged packaging. No explanation needed. The visual evidence speaks for itself.
This shift mirrors how we communicate everywhere else. We send voice notes instead of typing. We screen-share instead of describing errors. We record videos instead of writing essays. The modern customer doesn't want to describe their problem. They want to show it.
And until now, AI agents couldn't see.
The missing modality
Here's a scene that plays out thousands of times a day across consumer brands: a customer submits a warranty claim with a video attachment. A human agent receives the ticket. They open the video. They watch it. They pause at the right moment. They make a mental note of what they see. They describe it internally: "crack visible on left side of handle, approximately 2 inches long." They make a judgment call. They process the claim or ask for more details.
This entire process - watching, interpreting, deciding - happens manually.
This is a bottleneck for brands with high warranty volume. We work with brands where more than 50% of all customer interactions include an image or video attachment. Some categories are even higher. Cookware. Electronics. Furniture. Apparel with fit issues. Anything where "show me" is faster than "tell me."
These brands have automated nearly everything else. Order lookups. Tracking updates. Return processing. Subscription changes. But the moment a video enters the conversation, a human has to step in. That changes today.
From images to video
We launched Siena Vision 1.0 over a year ago to give Siena the ability to understand images. We've since processed millions of interactions with image comprehension. We learned that visual communication is becoming the default. And images were the first step.
Video is fundamentally different. It captures movement, demonstrates problems in action, and shows context that static images miss. A customer can show a product working incorrectly. They can demonstrate the exact moment something fails. They can walk through a setup that isn't going as expected.
With Siena Vision 2.0, Siena can now process and understand customer videos the same way a human would. When a customer sends a video showing their issue, Siena watches it, extracts the relevant context, and takes action - whether that's processing a warranty claim, troubleshooting the problem, or answering a product question.
Scaling video to production-grade
Multimodal models exist. Video comprehension exists. But production-grade video support for customer experiences is a different beast.
First, customer videos come in every format imaginable. Different compression standards. Different resolutions. Files that are too large, corrupted, or recorded in poor lighting. Videos from email, from live chat, from social channels, from third-party integrations. Each source has its own quirks.
Then there's the understanding layer. Extracting useful information from a video isn't the same as describing what's in it. Siena needs to understand conversation context, identify which product is shown, assess the type and severity of damage, understand the functional issue being demonstrated, and connect all of that to the customer's order history and the brand's policies.
And it all has to happen reliably and at scale.
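To make the intake problem concrete, here is a minimal sketch of the kind of pre-check a video pipeline might run before any comprehension happens. It is an illustration only, not a description of how Siena works: the `VideoAttachment` type, the `intake_problems` function, the size limit, and the list of supported extensions are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import List

# Hypothetical limits, chosen only for illustration.
MAX_BYTES = 100 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".webm"}


@dataclass
class VideoAttachment:
    filename: str
    size_bytes: int
    channel: str  # e.g. "email", "chat", "social"


def intake_problems(video: VideoAttachment) -> List[str]:
    """Return the reasons a video cannot be processed (empty list if none)."""
    problems = []
    # Compare extensions case-insensitively: "clip.MOV" and "clip.mov" are the same format.
    ext = os.path.splitext(video.filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if video.size_bytes == 0:
        problems.append("empty file")
    elif video.size_bytes > MAX_BYTES:
        problems.append("file too large")
    return problems
```

A real pipeline would also have to probe the file contents (codec, duration, corruption), which a filename and size check cannot catch. The point of the sketch is only that every channel's attachments need to pass through the same gate before the understanding layer sees them.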
What video makes possible today

Warranty claims become fully automated. A customer sends a video showing product damage. Siena watches the video, identifies the product, assesses the issue, cross-references the purchase date and warranty terms, and processes the claim. No agent intervention required for straightforward cases. For edge cases, Siena escalates with full context - including its interpretation of the video - so agents don't start from scratch.
Product troubleshooting gets visual. Instead of asking customers to describe what's happening, Siena can see it. A product not working as expected? The customer records a quick video demonstrating the issue. Siena watches, diagnoses, and responds with targeted solutions. If it's user error, Siena can walk them through the fix. If it's a defect, Siena can process a return or replacement immediately.
Product quality assessment scales without headcount. Shipping damage, manufacturing defects, condition issues on returns - all of these require visual verification. Previously, that meant human review. Now Siena can handle the assessment automatically, routing only the genuinely ambiguous cases to agents.
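The warranty flow above boils down to a routing decision: process the straightforward cases, escalate the rest with context. Here is a minimal sketch of that decision under stated assumptions. The `VideoFinding` fields, the `route_claim` function, the confidence threshold, and the action names are all hypothetical and are not Siena's actual API or logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class VideoFinding:
    """Hypothetical summary of what was extracted from a customer video."""
    product_id: Optional[str]  # None if the product could not be identified
    issue: str                 # e.g. "crack on left side of handle"
    confidence: float          # 0.0 to 1.0, how sure the interpretation is


def route_claim(
    finding: VideoFinding,
    purchase_date: date,
    warranty_days: int,
    today: date,
    min_confidence: float = 0.8,  # assumed threshold, for illustration only
) -> Tuple[str, str]:
    """Return (action, reason). Only clear-cut cases are processed automatically."""
    if finding.product_id is None:
        return ("escalate", "product not identified in video")
    if finding.confidence < min_confidence:
        return ("escalate", f"ambiguous video: {finding.issue}")
    if today > purchase_date + timedelta(days=warranty_days):
        return ("escalate", "purchase is outside the warranty window")
    return ("process_claim", finding.issue)
```

For example, a confident finding on a pan bought 100 days ago under a 365-day warranty would return `("process_claim", ...)`, while a blurry video of the same pan would be escalated with the interpretation attached, so the agent doesn't start from scratch.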
Agentic shopping: new use cases we're exploring
We think video will open up entirely new experiences.
Think about shade matching. A customer records a quick video of their skin tone in natural lighting. Siena analyzes the video and recommends the right foundation shade. Not based on a quiz or self-reported skin type - based on what it actually sees.
Think about product recommendations. A customer shows their living room and asks what couch would work in the space. Siena looks at the dimensions, the lighting, the existing style, and suggests products that would actually fit.

Think about setup assistance. A customer shows how they've installed a product. Siena spots that the mounting bracket is upside down, or that a cable is connected to the wrong port, and provides step-by-step guidance to fix it.
Video transforms customer service from reactive problem-solving into a collaborative, visual experience. Customers can just show their situation, and AI can respond to what it sees.
What's next
Customers don't think in modalities. They just communicate. They send whatever makes it easiest to explain their situation - text, images, video, voice. From day one, our vision was to meet them there: one agent that understands it all, across every channel.
Siena Vision 2.0 brings us closer to that vision. Text, images, video, voice - Siena now processes whatever format customers use to communicate, across email, chat, social, and SMS.
But this is bigger than support conversations. Brands create video content, run video ads, receive video responses and comments. There's a largely untapped space of multimedia content shaping how customers perceive and interact with brands. Understanding that content at scale, in real time, opens possibilities we're only beginning to explore.
Available now
If you're already a Siena customer, video support is live at no additional cost.
If you're not yet working with us, reach out to see Siena Vision 2.0 in action and learn how we can help transform your customer experience.






