New Year, New Side Project?

I love working on side projects, do you? They let me experiment with new technologies and investigate new frameworks. During my workday, I am always busy getting things done and finding ways to be productive, so there is rarely time to expand my skill set or learn about new technology.

At the start of this year, I decided to start a new side project. While implementing it, I will also write about the struggles and triumphs. As with any new project, I will begin by defining the requirements and describing a global architecture that includes the technologies I would like to investigate.

Mini Video Encoder

Meet my new side project, the “Mini Video Encoder (MVE).” MVE will be a platform to convert and encode videos so that they can be delivered to multiple devices via streaming. This project can be used, for example, when you want to start building your own Netflix ;-).


MVE Requirements

Must-haves

  • Scalable and portable
  • Open-source
  • Optimized for streaming
  • Support for modern video and audio encodings
  • Support for modern packagers
  • Support for major DRMs

Could-haves

  • Per-title encoding based on an automatic quality determination using VMAF
  • A higher degree of scalability by splitting titles into small segments

The rationale for the Must-haves

Scalable and portable

I want the product to be scalable and portable, not tied to any cloud vendor or commercial product. Video encoding is exceptionally resource-intensive, and therefore, it should be possible to distribute the load onto multiple machines. These machines could run on-premise or with any of the existing cloud vendors, such as Microsoft Azure, Amazon AWS, or Google Cloud.

Open-Source

MVE will be open-source and hosted on GitHub so that other programmers can learn and contribute. The project will use several open-source frameworks and components, so it makes sense to also open-source it.

Optimized for streaming

MVE should be optimized for encoding videos so that the content can be delivered via streaming to multiple devices. It should support various authoring specifications, preferably extensible through configuration or by adding records to a database.

Support modern video and audio encodings

MVE should support advanced video encodings such as x264/AVC, x265/HEVC, VP9, and AV1. Also, it should support the most common audio codecs, such as Opus, AAC, and AC3.

x264/AVC is the most commonly used format for the recording, compression, and distribution of video content. x265, VP9, and AV1 are newer encodings that offer the same or better quality than x264 at the same resolution and a lower bitrate. Their encoders are much slower than x264 encoders but provide better compression.

Support modern packagers

MVE should be able to package the encoded content so that it can deliver content via Apple HLS, MPEG-DASH, and Smooth Streaming. These three are the most commonly used streaming delivery protocols.

Support major DRMs

MVE should be able to encrypt content so that it can be delivered securely using DRM such as Google Widevine, Apple FairPlay, or Microsoft PlayReady. DRM support makes it possible to deliver content from major distributors such as Sony and Warner Bros. securely to all sorts of devices.

The rationale for the Could-haves

Per-title encoding

Instead of using a fixed authoring specification, it should be possible to adjust the encoding ladder based on an automatic quality determination using VMAF. Video Multi-method Assessment Fusion, or VMAF for short, is a video quality metric that combines human vision modeling with machine learning. Netflix developed VMAF and open-sourced it on GitHub in June 2016.
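To make the idea concrete, here is a minimal sketch of how a per-title decision could work: encode a few trial bitrates, measure VMAF for each, and pick the cheapest one that meets a quality target. The scores, bitrates, and threshold below are made-up example values; a real pipeline would measure VMAF per encode with libvmaf.

```typescript
// Hypothetical per-title selection: pick the lowest-bitrate trial encode
// whose measured VMAF score meets the target quality.
interface TrialEncode {
  bitrateKbps: number;
  vmafScore: number; // 0-100, higher is better
}

function pickBitrate(trials: TrialEncode[], targetVmaf: number): number | null {
  const good = trials
    .filter((t) => t.vmafScore >= targetVmaf)
    .sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  return good.length > 0 ? good[0].bitrateKbps : null;
}

// Example trial encodes (illustrative numbers only).
const trials: TrialEncode[] = [
  { bitrateKbps: 1500, vmafScore: 88 },
  { bitrateKbps: 3000, vmafScore: 94 },
  { bitrateKbps: 6000, vmafScore: 97 },
];
console.log(pickBitrate(trials, 93)); // 3000
```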

Even better scaling

Video encoding is very resource-intensive, and you can improve the speed of encoding by scaling your hardware. Vertical scaling means increasing the CPU processing power of a machine. Vertical scaling seems to be the easiest way to decrease encoding time.

Another way to scale is horizontally, by using multiple machines to encode. Horizontal scaling works if you are encoding multiple renditions of an encoding ladder or numerous videos, but I want it to work for a single bitrate encoding as well. The way to make this work is to split the input video into small segments, for example of 10 minutes each, and encode the segments in parallel on multiple machines. After the encoders finish, a separate step combines the encoded segments into the final encoded video.
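The splitting step is simple arithmetic. This sketch computes the segment boundaries for a title; the 10-minute (600-second) segment length follows the example above, and the shape of the `Segment` record is my own assumption.

```typescript
// Split a title of a given duration into fixed-length segments that can be
// encoded in parallel on separate machines. The last segment may be shorter.
interface Segment {
  index: number;
  startSec: number;
  durationSec: number;
}

function splitIntoSegments(totalSec: number, segmentSec = 600): Segment[] {
  const segments: Segment[] = [];
  for (let start = 0, i = 0; start < totalSec; start += segmentSec, i++) {
    segments.push({
      index: i,
      startSec: start,
      durationSec: Math.min(segmentSec, totalSec - start),
    });
  }
  return segments;
}

// A 25-minute (1500 s) video becomes two full 10-minute segments
// plus a 5-minute tail.
console.log(splitIntoSegments(1500).length); // 3
```

After encoding, the combine step simply concatenates the encoded segments back in `index` order.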

Video Encoding

Before we dive into the architecture of the project, it is necessary to explain the process of video encoding and packaging. Video encoding is the process of compressing and changing the format of video content. Encoding is essential because it makes it easier to transmit video over the Internet: compression reduces the required bandwidth while still delivering a quality experience. Another reason for encoding is compatibility: different devices require different video or codec formats.

The schematic below shows a simplified video encoding workflow.

Simplified video encoding workflow

All video encoding starts with uploading the input video, here “Tears of Steel.mov,” so that the encoding workflow has access to it. The input video is often of very high quality. The Encoder encodes the input video three times to produce three different encodings, each at a different resolution and bitrate (SD, HD, UHD).

The Packager converts the three different encodings to two different packages, MPEG-DASH and HLS. The encoding workflow deploys the HLS and MPEG-DASH files to the webserver that streams the video content.
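The fan-out in this workflow can be sketched as data: a ladder of renditions that each package format must include. The specific resolutions and bitrates below are illustrative values I picked, not numbers from this article.

```typescript
// Sketch of the workflow's fan-out: three renditions, two package formats,
// and every rendition appears in every package's manifest.
interface Rendition {
  name: string;
  width: number;
  height: number;
  videoBitrateKbps: number; // illustrative values, not from the article
}

const ladder: Rendition[] = [
  { name: "SD", width: 854, height: 480, videoBitrateKbps: 1500 },
  { name: "HD", width: 1920, height: 1080, videoBitrateKbps: 4500 },
  { name: "UHD", width: 3840, height: 2160, videoBitrateKbps: 12000 },
];

const packageFormats = ["MPEG-DASH", "HLS"];

const manifests = packageFormats.map((format) => ({
  format,
  renditions: ladder.map((r) => r.name),
}));
console.log(manifests.length); // 2
```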

Global architecture

The diagram below shows my original architecture of the MVE project. It consists of six separate services, each of which should be able to run inside a Docker container.

Initial architecture Mini Video Encoder

Workflow admin

The Workflow admin is responsible for visualizing and managing the encoding workflow. It shows which encodings are running and allows inserting new encoding jobs and editing existing ones. The Workflow admin will be presented via a web interface. I am not sure which technology to use just yet, but it will probably be one of the current JavaScript UI frameworks (Angular, React, Vue).

Workflow database

The workflow engine will store the state of the encoding workflow in a MongoDB database. I chose a document database to have the flexibility to change the schema during implementation. I think this will be easier than working with a SQL database. Also, I have never worked with MongoDB, so it is an opportunity to learn something new.

Workflow engine

The Workflow engine service is the heart of the encoding workflow. It offers an API to the rest of the workflow services and holds the state of all running encodings. I will implement the Workflow engine using Node.js. The REST API will be implemented using Fastify; I have worked with Express in the past but want to try something different. I use Mongoose because I like the way it allows me to define objects with a strongly-typed schema mapped to a MongoDB document.
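Since the engine holds the state of all running encodings, one way to keep that state consistent is to validate job state transitions before persisting them. The state names and transitions below are my assumption; the article does not define them. A Fastify route handler could call `advance()` before writing the job back to MongoDB via Mongoose.

```typescript
// Hypothetical job lifecycle for the Workflow engine. Only the listed
// transitions are legal; anything else is rejected before persistence.
type JobState = "queued" | "encoding" | "packaging" | "done" | "failed";

const transitions: Record<JobState, JobState[]> = {
  queued: ["encoding", "failed"],
  encoding: ["packaging", "failed"],
  packaging: ["done", "failed"],
  done: [],    // terminal
  failed: [],  // terminal
};

function advance(current: JobState, next: JobState): JobState {
  if (!transitions[current].includes(next)) {
    throw new Error(`illegal transition ${current} -> ${next}`);
  }
  return next;
}

console.log(advance("queued", "encoding")); // "encoding"
```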

Encoder

The Encoder service is responsible for encoding video content. The Encoder will be implemented using Node.js. It will make use of the open-source video encoder FFmpeg and the NPM package fluent-ffmpeg to make it easier to interact with FFmpeg. FFmpeg supports the requested video (x264, x265, VP9, and AV1) and audio encoders (Opus, AAC, and AC3). The Encoder service needs to be highly scalable, so multiple instances of it can run simultaneously.
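As a small sketch of how the Encoder might translate the requirements into FFmpeg settings, the map below pairs each requested codec with the FFmpeg encoder name fluent-ffmpeg would pass on the command line. The identifiers (`libx264`, `libx265`, `libvpx-vp9`, `libaom-av1`, `libopus`) are standard FFmpeg encoder names, but availability depends on how your FFmpeg binary was built; check with `ffmpeg -encoders`.

```typescript
// Map codec names from the requirements to FFmpeg encoder identifiers.
const videoEncoders: Record<string, string> = {
  h264: "libx264",
  hevc: "libx265",
  vp9: "libvpx-vp9",
  av1: "libaom-av1",
};

const audioEncoders: Record<string, string> = {
  opus: "libopus",
  aac: "aac", // FFmpeg's native AAC encoder
  ac3: "ac3",
};

function encoderFor(codec: string): string {
  const name = videoEncoders[codec] ?? audioEncoders[codec];
  if (!name) throw new Error(`unsupported codec: ${codec}`);
  return name;
}

console.log(encoderFor("vp9")); // "libvpx-vp9"
```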

Packager

The Packager service is responsible for packaging and protecting multiple encoded video streams according to a specific packaging protocol such as MPEG-DASH, HLS, or Smooth Streaming. The Packager service will be implemented using Node.js. The Packager uses Bento4, Shaka Packager, MP4Box, and FFmpeg, various open-source applications for packaging the encoded content. These packager applications also support encrypting the content to implement the required DRMs.
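Each protocol has its own manifest convention, which the Packager would need to know when writing its output. The filenames below are common defaults I chose for illustration, not something this article or any of the packager tools mandates.

```typescript
// Sketch: typical manifest filename per streaming protocol.
type Protocol = "MPEG-DASH" | "HLS" | "SmoothStreaming";

function manifestName(protocol: Protocol): string {
  switch (protocol) {
    case "MPEG-DASH":
      return "manifest.mpd"; // DASH Media Presentation Description
    case "HLS":
      return "master.m3u8"; // HLS master playlist
    case "SmoothStreaming":
      return "Manifest.ism"; // Smooth Streaming server manifest
  }
}

console.log(manifestName("HLS")); // "master.m3u8"
```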

Asset Storage

Most video files, especially the master video files, are huge. The Asset Storage service is responsible for storing the master video files, the encoded video content, and the packaged video content. It also provides access to these files so that the other services can use them. Currently, I have no idea which technology or external service to use for storage; during development, I will use local hard disk space.

The project on GitHub

If you want to follow my progress or wish to contribute, message me on Medium, email, or GitHub. In the next post, I will start implementing the workflow engine using Fastify and the workflow database using MongoDB.