EnCodex: How AI is Revolutionizing Video Streaming Quality

How Google Gemini Pro 2.5 makes smart video optimization simple and accessible

Why Your Action Scenes Look Like Minecraft: The Hidden Cost of Lazy Encoding, image by author

“Ever cringed watching your favorite action scene disintegrate into pixelated chaos? Welcome to the broken promise of streaming quality.”

Sound familiar?

If you work with streaming video, you’ve hit this frustration many times. It gets to the heart of streaming’s biggest challenge: encoding video intelligently.

Here’s the thing about raw video — it’s huge. We’re talking terabytes for a few hours of footage. That’s why streaming services need to shrink the video before sending it to your device.

The goal? Make the file smaller while keeping the quality high. Easy in theory, hard in practice.

Think of it like packing for different trips. A weekend getaway needs a small carry-on, while a month-long adventure needs a huge suitcase.

Videos work the same way — a simple cartoon needs less data than an action scene with rapid camera cuts.

This is where most streaming providers get stuck. Nearly 38% still use the same encoding settings for everything. It’s like forcing every package into the same size box, regardless of what’s inside.

EnCodex changes the game by letting AI do what human video experts would do — analyze the content and make smart decisions about how to encode it.

Imagine having an expert who:

  • Analyzes each scene in detail
  • Knows exactly what makes your content unique
  • Creates a custom compression strategy for that video

The result? Better quality, smaller file sizes, and happier viewers — without the massive technical overhead of traditional approaches.

Ready to see for yourself? Check out the EnCodex project on GitHub

Let me show you how we’re doing this.


The video encoding challenge

Ever streamed your favorite show only to watch it turn into a blocky mess just as the action kicks in? Meanwhile, the quiet conversations look crystal clear?

There’s a fascinating reason why this happens.

The streaming industry is at a crossroads right now. On one side, you’ve got the “keep it simple” crowd — about 38% of providers — who use identical encoding settings for everything.

On the other, you have innovative companies getting smarter about how they handle different types of content.

The numbers tell the story

Here’s something that might surprise you: A high-energy Marvel movie needs around 8,000 kbps (that’s tech-speak for “lots of data”) to look good, while a talk show can shine at just 2,000 kbps.

When streaming services use identical settings for both, it’s like using a moving truck to deliver a postcard — or trying to cram a couch into a compact car. Neither makes sense.
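To put those bitrates in perspective, here’s a quick back-of-the-envelope sketch of what they mean in raw storage. The figures are the illustrative numbers from above, not measurements:

```python
def stream_size_gb(bitrate_kbps: float, duration_hours: float) -> float:
    """Approximate size in gigabytes of a constant-bitrate stream."""
    bits = bitrate_kbps * 1000 * duration_hours * 3600
    return bits / 8 / 1e9

# A two-hour action movie at 8,000 kbps comes to roughly 7.2 GB...
print(stream_size_gb(8000, 2))
# ...while two hours of talk show at 2,000 kbps is about 1.8 GB.
print(stream_size_gb(2000, 2))
```

Multiply that four-fold gap across an entire catalog and the cost of one-size-fits-all encoding becomes obvious.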

More than just quality

This isn’t just about making videos look pretty — it’s about staying competitive. Streaming services juggle three massive costs:

  • Storage (every version of every video needs server space)
  • Bandwidth (moving all that data isn’t cheap)
  • User satisfaction (when quality drops, subscribers think about canceling)

It’s a delicate balance: compress too much to save money, and viewers get frustrated. Don’t compress enough, and your costs spiral out of control. The old one-size-fits-all approach is technically possible, but wildly inefficient.

Smart encoding makes the difference

This is where things get interesting. Think of content-aware encoding like having a master chef instead of a vending machine. Rather than serving the same pre-packaged meal to everyone, it examines what’s on each plate and adjusts accordingly.

The major players (Netflix, Amazon Prime, Disney+) are already all-in on this approach. They analyze each video and create custom “recipes” for compression.

The results? Better quality, lower costs, and happier viewers.

Mid-size streaming services are quickly catching up, while smaller players are looking for ways to join the party without breaking the bank.

The evolution of video encoding

The industry is moving through clear stages:

1. Basic static encoding (the old way)
2. Per-title optimization (each show gets its own recipe)
3. Per-scene adaptation (the recipe changes during the show)
4. AI-powered optimization (letting smart algorithms make the calls)

YouTube is already using machine learning to make these decisions automatically.

They show us where the entire industry is headed: intelligent systems that give every viewer the best possible experience, no matter what they’re watching or how they’re watching it.

EnCodex is at the forefront of this evolution, bringing AI-powered encoding to companies of all sizes.


Understanding the magic behind smart encoding

Let’s peek behind the curtain and see what’s actually happening when AI makes your videos look better. You know that frustrating moment when your favorite show suddenly turns blocky and pixelated?

There’s some fascinating tech working to make sure that doesn’t happen.

The secret ingredients

When a smart encoding system analyzes your video, it examines four critical aspects of your content:

  • Motion intensity — Are we watching UFC fighters bounce around the ring, or a peaceful nature scene?
  • Spatial complexity — Are we dealing with a simple cartoon or a detailed cityscape?
  • Texture patterns — Are there solid blocks of color or intricate details?
  • Scene changes — Does the video cut rapidly between shots or linger on steady views?

Each of these factors helps the system determine exactly how much data your video needs.
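To make that concrete, here’s a toy sketch of how those four scores could feed a bitrate decision. The dataclass fields and the linear heuristic are our own illustration, not the scoring EnCodex actually uses:

```python
from dataclasses import dataclass

@dataclass
class ContentProfile:
    """Per-video scores from 0 (trivial) to 100 (very demanding)."""
    motion_intensity: int
    spatial_complexity: int
    texture_detail: int
    scene_change_frequency: int

def suggest_bitrate_kbps(profile: ContentProfile,
                         floor: int = 1500, ceiling: int = 8000) -> int:
    """Toy heuristic: scale bitrate linearly with average complexity."""
    avg = (profile.motion_intensity + profile.spatial_complexity +
           profile.texture_detail + profile.scene_change_frequency) / 4
    return int(floor + (ceiling - floor) * avg / 100)

cartoon = ContentProfile(20, 15, 10, 25)  # clean lines, few cuts
sports = ContentProfile(90, 70, 80, 85)   # fast motion, busy crowds

print(suggest_bitrate_kbps(cartoon))  # well under half the sports bitrate
print(suggest_bitrate_kbps(sports))
```

A real system weighs these factors nonlinearly, but even this crude version captures the core idea: measure first, then spend data where it matters.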

Breaking it down

Ever noticed how sports look different from cartoons on your TV? There’s a reason for that.

When the system handles a high-energy football game, it sees challenges everywhere: players darting across the field, complex crowd scenes in the background, detailed grass textures, and quick camera cuts following the action.

The system knows it needs to crank up the data to keep everything looking sharp.

But when that same system encounters an animated show, it spots an opportunity. Clean, simple lines and solid color blocks fill the frame. Transitions flow more smoothly, and there are fewer random details to manage.

The system realizes it can deliver perfect quality while using far less data — sometimes less than half!

Measuring success

Here’s where it gets cool. Enter Video Multi-method Assessment Fusion (VMAF), a quality metric created by Netflix that scores videos based on how they appear to human eyes.

Instead of just giving thumbs up or down, it provides a score from 0 to 100, with 100 being the best quality, helping find that sweet spot between visual excellence and bandwidth efficiency.

Picture a graph where better quality reaches upward and file size extends to the right. The perfect balance creates a curve at the top, showing you exactly where to find that ideal mix of crystal-clear video and efficient file size.
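For the curious, that curve can be computed directly. Here’s a small sketch that takes (bitrate, VMAF) measurements and keeps only the upper convex hull, the points where no cheaper encode delivers better quality:

```python
def upper_hull(points):
    """Upper convex hull of (bitrate, vmaf) points, scanned left to right."""
    pts = sorted(points)
    hull = []
    for x, y in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop the last hull point if it lies on or below the chord
            # from hull[-2] to the new point (i.e. it is dominated).
            if (x2 - x1) * (y - y1) >= (x - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

measurements = [(1000, 70), (2500, 75), (2000, 85), (3000, 90), (4000, 92)]
print(upper_hull(measurements))  # (2500, 75) is dominated and drops out
```

Everything below the hull is wasted bitrate: same file size, worse picture.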

Real results that matter

When streaming services embrace this smart approach, the impact is dramatic:

  • Animation suddenly needs 20–50% less storage
  • Sports and action scenes get the extra attention they deserve
  • Viewers enjoy smooth playback without frustrating buffer wheels
  • Everyone saves money on bandwidth

This isn’t just theory — it’s already transforming how we stream. Modern systems can adjust on the fly as scenes change, learn from past successes, and even adapt to whether you’re watching on a phone or a 4K TV.

As AI gets even better at understanding what’s happening in your videos, these systems will only get smarter. Soon, every video you watch will get its own perfect encoding settings, automatically tuned.

EnCodex is working to make this technology more widely accessible beyond just the streaming giants, though it’s still in the prototype phase with promising early results.


A tale of three approaches

Remember clicking “Low,” “Medium,” or “High” quality on videos? We’ve come a long way. Let me show you how the biggest names in streaming handle video encoding today, and why it matters for what you see on screen.

The Netflix way

Netflix follows a methodical, data-driven approach. For each video, they encode it with one set of parameters, then calculate the VMAF score by comparing the encoded result against the original.

They repeat this process hundreds of times with different encoding settings to create a comprehensive overview of which parameters deliver the optimal balance between file size and visual quality.

This approach makes sense when you’re serving millions of viewers the same content. Your favorite Netflix show looks great because they’ve methodically identified the most efficient encoding parameters for it.

The catch? This process requires significant computing resources and time — producing excellent results, but neither quickly nor inexpensively.

And Netflix hasn’t stopped there. They’ve already advanced to their next innovation: scene-based encoding. Instead of using consistent settings throughout an entire show, they adapt the parameters as scenes change.

Quiet conversation? Less data needed. Explosive action sequence? More data allocated. It’s an effective approach, though extremely computationally intensive.

YouTube’s clever shortcut

YouTube faced a different challenge: hundreds of hours of new videos uploaded every minute. They needed something faster and smarter, so they taught computers to do what human video experts do.

Instead of testing every possible setting, YouTube’s system takes a quick look at your video and makes an educated guess about the best way to handle it.

It’s not always perfect, but it’s remarkably good and lightning fast — essential when processing the sheer volume of content they receive.

The EnCodex innovation

This is where we spotted an opportunity. When Google announced that Gemini Pro 2.5 could actually “see” and understand videos, we had that lightbulb moment.

What if we could use this breakthrough to combine the best parts of both approaches?

Here’s what EnCodex does: it uses Gemini to analyze the complete movie, noticing things like motion speed, scene detail, and camera cuts.

It also identifies crucial segments to use for test encodings and VMAF calculations.

Instead of making quick guesses or running endless tests on the entire content, it makes smart, targeted decisions by focusing on representative parts of your specific video.

Bringing it all together

Each approach teaches us something valuable:

  • Netflix showed what’s possible when you pursue perfect quality
  • YouTube proved custom-trained AI can make smart decisions at incredible speed
  • EnCodex combines both by leveraging modern multimodal LLMs like Google Gemini Pro 2.5

How EnCodex works its magic

Let me take you on a behind-the-scenes tour of EnCodex. Here’s what our system looks like:

Flowchart illustrating the EnCodex video optimization system. An input video goes through an input processor, low res encoder, and video splitter. A content analyser, powered by Google Gemini 2.5 Pro, examines the video. Following this are a test encoding generator and a quality metrics calculator. All intermediate stages feed data into a central ‘Graph State’. Finally, a recommendation engine uses this information to output a Resolution and Bitrate ladder in JSON format.
How EnCodex Works: Process Flowchart, image by author

Let me walk you through each step of how it works.

1. Input processor

EnCodex starts by examining your video file. It runs quick validation checks and gathers essential metadata — the basic information needed to understand what it’s working with.
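In practice, this step can be as simple as a single ffprobe call. A minimal sketch (the exact checks EnCodex runs are more involved):

```python
import json
import subprocess

def build_probe_cmd(video_path: str) -> list:
    """ffprobe invocation that dumps format and stream info as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_format", "-show_streams", video_path]

def probe_metadata(video_path: str) -> dict:
    """Return duration, resolution, codecs, etc. for the input file."""
    out = subprocess.run(build_probe_cmd(video_path),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)
```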

2. Low resolution encoder

Next, we create a low-resolution version of your video at 240p. While this preview might not look impressive, it gives our AI system, powered by Google’s Gemini 2.5 Pro, all the data it needs to understand your content without getting bogged down by massive file sizes. For a two-hour video, that works out to roughly 400MB of data.
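A hypothetical ffmpeg command for that proxy might look like this (the exact flags are our illustration, not EnCodex’s production settings):

```python
def build_preview_cmd(src: str, dst: str) -> list:
    """ffmpeg command for a small 240p analysis proxy."""
    return [
        "ffmpeg", "-i", src,
        "-vf", "scale=-2:240",                # 240 px tall, even width
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
        "-c:a", "aac", "-b:a", "64k",
        dst,
    ]
```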

3. Video splitter

Instead of processing your entire video at once, we break it into manageable 50MB pieces. This approach makes the entire process faster and more reliable, letting us handle videos of any length efficiently.
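One simple way to plan those pieces is to estimate each chunk’s duration from the file’s average bitrate, then cut at those timestamps with ffmpeg’s -ss/-t options. A sketch of the planning half, assuming a roughly constant bitrate:

```python
def chunk_plan(file_size_bytes: int, duration_s: float, target_mb: int = 50):
    """Yield (start, length) pairs so each piece is roughly target_mb."""
    bytes_per_sec = file_size_bytes / duration_s
    chunk_len = target_mb * 1024 * 1024 / bytes_per_sec
    start = 0.0
    while start < duration_s:
        yield start, min(chunk_len, duration_s - start)
        start += chunk_len

# A 400MB, two-hour proxy splits into eight 900-second pieces.
print(list(chunk_plan(400 * 1024 * 1024, 7200)))
```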

4. Content analyser

We select three 50MB parts from the beginning, middle, and end of your video and send them to Gemini for analysis with our specialized prompt.

We also ask it to identify various representative scenes from these 50MB video segments for testing. This strategic sampling approach gives us a comprehensive understanding of your content while keeping processing time minimal.

5. Test encoding generator

Using the result of our AI’s analysis, we create test versions of specific video segments. For each type of content — whether it’s fast action or quiet dialogue — we try different quality settings to find the optimal balance. The system adapts its approach based on what it learned about your video’s complexity.

6. Quality metrics calculator

Now we measure the results using VMAF, an advanced tool that scores video quality based on human perception. Here’s what this analysis looks like in practice.

Video Encoding Quality vs. Bitrate with Convex Hull, image by author

Each point on this graph represents a different combination of resolution and bitrate. The red line shows the optimal choices — what we call the “convex hull.” Those green X marks? They’re the sweet spots we choose for our encoding ladder.
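If you want to reproduce a measurement like this yourself, ffmpeg’s libvmaf filter does the scoring (it requires an ffmpeg build with libvmaf enabled). A minimal sketch:

```python
def build_vmaf_cmd(distorted: str, reference: str) -> list:
    """ffmpeg command that scores `distorted` against `reference`
    and writes the per-frame results to vmaf.json."""
    return [
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", "libvmaf=log_fmt=json:log_path=vmaf.json",
        "-f", "null", "-",
    ]
```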

7. Recommendation engine

Using these results, we create your video’s optimal encoding settings. We pick the points that give you the best quality for different connection speeds, ensuring smooth playback for all your viewers.

The power of graph state

You’ll notice in our flowchart those arrows pointing to “Graph State.” This is EnCodex’s central memory system — it tracks and stores everything we learn about your video at each step. This information helps coordinate the entire process and ensures we’re making informed decisions throughout the optimization.
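Conceptually, you can picture the graph state as a shared dictionary that each pipeline step reads from and writes to. The field names below are our guesses at what gets tracked, not EnCodex’s actual schema:

```python
from typing import TypedDict

class GraphState(TypedDict, total=False):
    """Shared state passed between pipeline steps (hypothetical fields)."""
    video_path: str         # original input
    metadata: dict          # from the input processor
    preview_path: str       # 240p proxy
    chunk_paths: list       # 50MB segments
    content_analysis: dict  # Gemini's JSON assessment
    test_results: list      # (resolution, bitrate, vmaf) triples
    encoding_ladder: list   # final recommendation

state: GraphState = {"video_path": "movie.mkv"}
state["metadata"] = {"duration_s": 7200}  # each node adds what it learned
```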

What this means

Behind all this technology is a simple goal: making your videos look great while keeping file sizes manageable. You upload your video, and EnCodex handles all the complexity. No more guessing about quality settings or wasting storage space.

And because we use smart sampling and efficient processing, you get premium results without lengthy processing times or expensive computing costs.

It’s expert-level video optimization made practical for real-world use.


Under the hood of EnCodex

Want to know exactly how we built EnCodex? Let’s dive into the technical stuff.

Our technology choices

We built EnCodex with five key technologies:

  • Python drives everything — it’s fast to develop with and perfect for AI work.
  • LangGraph orchestrates all the moving parts, making sure every step happens in the right order.
  • FFmpeg handles all the video processing (everything from creating previews to measuring quality).
  • UV manages our software packages.
  • Google’s Gemini API brings the AI magic.

How Gemini watches your videos

The truly groundbreaking aspect is Gemini’s ability to ‘watch’ and comprehensively analyze your videos.

When we submit video content to Gemini, we direct it to evaluate eight critical parameters and identify key representative segments.

Analyze this video sample and provide a structured assessment of the following 
content characteristics that impact video compression efficiency. 
 
For each numerical characteristic, provide a score from 0-100 and a  
brief justification: 
 
1. Motion intensity: [Score] - [Justification] 
2. Temporal complexity: [Score] - [Justification] 
3. Spatial complexity: [Score] - [Justification] 
4. Scene change frequency: [Score] - [Justification] 
5. Texture detail prevalence: [Score] - [Justification] 
6. Contrast levels: [Score] - [Justification] 
7. Animation type: [Type] - [Justification] 
8. Grain/noise levels: [Score] - [Justification] 
 
Also identify 3-5 representative segments (with timestamp ranges) that would be 
useful for encoding tests, including high-complexity, medium-complexity, and 
low-complexity sections. 
 
Provide the output in JSON format.

Gemini responds with detailed insights about your video in JSON — everything from how much movement it contains to how often scenes change.

This is the kind of analysis that used to require a human expert watching every frame.
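One practical wrinkle: models sometimes wrap their JSON in markdown code fences, so it pays to strip those before parsing. A defensive sketch:

```python
import json

FENCE = "`" * 3  # a literal triple-backtick fence

def parse_gemini_json(response_text: str) -> dict:
    """Parse the model's response, tolerating markdown fence wrappers."""
    text = response_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with its "json" tag) and the
        # trailing fence.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    return json.loads(text)

raw = FENCE + 'json\n{"motion_intensity": {"score": 85}}\n' + FENCE
print(parse_gemini_json(raw)["motion_intensity"]["score"])  # prints 85
```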

The code that makes it happen

Here’s a streamlined version of our code that handles video analysis. First, we upload your video to Gemini:

import time

def _get_or_upload_video(client, video_path, existing_uri=None):
    """Upload a video file to the Gemini API, or reuse an existing upload."""
    if existing_uri:
        try:
            video_file = client.files.get(name=existing_uri)
            if video_file.state.name == "ACTIVE":
                return video_file  # cached upload is still available
        except Exception as e:
            print(f"Failed to retrieve existing file: {e}")

    # Fresh upload; poll until Gemini finishes processing the file.
    video_file = client.files.upload(file=video_path)
    while video_file.state.name == "PROCESSING":
        time.sleep(5)
        video_file = client.files.get(name=video_file.name)

    return video_file if video_file.state.name == "ACTIVE" else None

Then, we ask Gemini to analyze it:

def _analyze_with_gemini(client, video_file): 
    """Use Gemini API to analyze the video content.""" 
    response = client.models.generate_content( 
        model="gemini-2.5-pro-preview", 
        contents=[video_file, ANALYSIS_PROMPT], 
    ) 
    return response.text

Smart optimization through caching

We’ve also built in some clever optimizations. Gemini lets us store up to 20GB of files for 48 hours.

This means we can analyze the same video multiple times without re-uploading it — a huge time-saver when we’re testing different approaches.

This caching system really shows its value when the API gets busy. Instead of waiting to upload the same video chunks again and again, we can reuse what we’ve already uploaded.

It’s these small optimizations that help make EnCodex both reliable and fast.

The real beauty of this system is how it makes complex video analysis feel simple. Behind the scenes, Gemini is doing incredibly sophisticated work, but our code just needs to ask the right questions and interpret the answers.


Results that speak for themselves

Our early tests have revealed something fascinating about how different content responds to EnCodex.

Here’s what we’ve discovered:

Content Type       File Size Reduction   Quality Improvement 
------------       ------------------    ------------------- 
Animation          40-45%                Minimal (already high) 
Drama/Dialogue     25-35%                10-15% higher VMAF 
Action/Sports      15-20%                20-25% higher VMAF

Look at what happens with animation — we’re cutting file sizes nearly in half while maintaining the already-high quality. For drama and dialogue scenes, we’re not just saving space, we’re actually making them look better, with VMAF scores jumping up 10–15%.

But here’s where it gets really interesting: those challenging action and sports scenes that typically give encoders headaches? EnCodex handles them beautifully, delivering 20–25% better quality while still shrinking file sizes by 15–20%.

For a streaming service with 10,000 hours of content, these improvements could mean millions in savings while actually delivering a better viewing experience. The best part? Each type of content gets exactly what it needs — no more one-size-fits-all approach.

Our next steps forward

We want to be upfront with you: EnCodex is still a proof of concept. It works beautifully, but we’re not quite ready for mission-critical production systems yet.

One challenge we’re facing is that Gemini Pro 2.5 itself is still stabilizing. We regularly encounter write timeouts when uploading videos and “model overloaded” errors.

These stability issues should resolve as Google continues to scale up their infrastructure.

Right now, we’re focusing on three key areas:

  1. Making sure EnCodex performs just as well on your ten thousandth video as it did on your first
  2. Speeding up the analysis and recipe generation process
  3. Making it easy to plug EnCodex into popular encoding workflows and platforms

We’re especially excited about scene-based encoding, building on the groundbreaking work Netflix has already done in this area.

While Netflix showed that adapting to different scenes improves quality, we want to see if we can integrate this into EnCodex using AI rather than their computational-heavy approach.

It’s an exciting challenge that could take our quality optimization to the next level.

Why you should care

If you’re running a streaming platform, this technology means more than just saving money on storage and bandwidth. It means your subscribers get consistently great quality, whether they’re watching a nature documentary or an action blockbuster.

No more frustrating moments where the quality suddenly drops just as the scene gets interesting.

Join us on this journey

EnCodex is growing and evolving, and we’d love to have you be part of its development. The code is available on GitHub — whether you want to contribute, offer feedback, or just look under the hood.

The future of video streaming is content-aware, and with AI in the mix, that future is looking sharper than ever — at half the file size.

Don’t settle for yesterday’s encoding. Help us build the smarter, leaner future of streaming — your viewers deserve better.