You should use a parser to parse!

I recently took over a project that needed to parse the HTTP Live Streaming (HLS) protocol (RFC 8216). The project used string manipulation and regular expressions to parse the stream, and it had become a mess. Before adding new functionality, I decided to clean it up by building a parser with Sprache, a parser combinator library for .NET. The code became much more readable.

The source code of the C# HLS parser is available on GitHub.

Parsing Playlists from HTTP Live Streams

In Sprache you build a parser by combining small sub-parsers, which lets you build it up step by step. I started with the parser for HLS playlists, which are defined in the specification. The first test checks that the playlist tag #EXTM3U is parsed, because the specification states that every playlist must start with this tag.
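As a minimal sketch of that first step, assuming the Sprache NuGet package is referenced, a header parser could accept the literal #EXTM3U followed by a line break or the end of the input. The names PlaylistHeaderGrammar, Header and HasValidHeader are hypothetical and not taken from the project's repository.

```csharp
using Sprache;

// Hypothetical names: PlaylistHeaderGrammar, Header and HasValidHeader are
// illustrative and not taken from the project's repository.
public static class PlaylistHeaderGrammar
{
    // The literal "#EXTM3U", followed by a line break or the end of the input.
    // RFC 8216 requires this tag on the first line of every playlist.
    public static readonly Parser<string> Header =
        from tag in Parse.String("#EXTM3U").Text()
        from end in Parse.LineTerminator
        select tag;

    // True when the playlist text starts with a valid header line.
    public static bool HasValidHeader(string playlist) =>
        Header.TryParse(playlist).WasSuccessful;
}
```

For example, HasValidHeader("#EXTM3U\n#EXT-X-VERSION:3\n") returns true, while "#EXTM3UX" is rejected because the tag must be followed by a line terminator.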

The PlaylistGrammar class contains all the sub-parsers. The TagIdStringParser is responsible for parsing playlist tags.

Sprache provides its functionality through the static Parse class. The tag-id parser is written like a LINQ query, which is how you specify a sequence in Sprache. With Parse.Char('#').Once() you specify that the first character of the input must be the hash symbol (#) and that it occurs exactly once.

The second statement, Parse.AnyChar.Until(Parse.Char(':')).Text(), keeps consuming characters until it reaches a colon; the colon itself is consumed but left out of the result. The .Text() at the end tells Sprache to return the parsed part as a string; by default it returns a sequence of characters (IEnumerable<char>).

The last part, Parse.AnyChar.Until(Parse.LineTerminator).Text(), covers the other way a tag can end: at the end of the line. Parse.LineTerminator matches a line break or the end of the input.
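Put together, the three parts described above could look roughly like the following. This is a reconstruction under assumptions, not the project's actual code: the class name TagIdGrammar is hypothetical, and joining the two endings with Or is a guess at how the alternatives are combined. Sprache's Or retries the second parser from the same position when the first one fails, so a tag without a colon falls through to the line-terminator branch.

```csharp
using Sprache;

// Hypothetical reconstruction of the tag-id parser described in the text.
// TagIdGrammar is an illustrative name; joining the endings with Or is an assumption.
public static class TagIdGrammar
{
    // Everything up to a colon; Until consumes the colon but leaves it out of the result.
    static readonly Parser<string> IdBeforeColon =
        Parse.AnyChar.Until(Parse.Char(':')).Text();

    // Everything up to a line break or the end of the input.
    static readonly Parser<string> IdBeforeLineEnd =
        Parse.AnyChar.Until(Parse.LineTerminator).Text();

    // '#' exactly once, then the tag id ending at ':' or at the end of the line.
    // Or retries the second branch from the same position if the first one fails.
    public static readonly Parser<string> TagId =
        from hash in Parse.Char('#').Once()
        from id in IdBeforeColon.Or(IdBeforeLineEnd)
        select id;
}
```

With this sketch, TagIdGrammar.TagId.Parse("#EXT-X-VERSION:3") returns "EXT-X-VERSION" and "#EXTM3U" returns "EXTM3U". Because Parse.AnyChar also matches line breaks, the colon branch can run past the end of a line on multi-line input, so this version is meant to be fed one line at a time.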

Parsing key-value pairs after the Tag

After the tag there can be either a single value or a list of key-value attributes, for example:
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=1280000,AVERAGE-BANDWIDTH=1000000
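For the single-value case, one option is to read everything after the colon up to the end of the line. In this sketch, SingleValueTag and SingleValueTagGrammar are hypothetical names, and Parse.CharExcept is used so that neither the tag name nor the value can run past the current line.

```csharp
using Sprache;

// Hypothetical model and grammar for a tag that carries one plain value,
// such as "#EXT-X-VERSION:3"; the names are illustrative only.
public class SingleValueTag
{
    public SingleValueTag(string name, string value)
    {
        Name = name;
        Value = value;
    }

    public string Name { get; }
    public string Value { get; }
}

public static class SingleValueTagGrammar
{
    public static readonly Parser<SingleValueTag> Tag =
        from hash in Parse.Char('#')
        // Tag name: at least one character, stopping at the colon or a line break.
        from name in Parse.CharExcept(":\r\n").AtLeastOnce().Text()
        from colon in Parse.Char(':')
        // Value: the rest of the current line.
        from tagValue in Parse.CharExcept("\r\n").AtLeastOnce().Text()
        // A line break or the end of the input must follow.
        from end in Parse.LineTerminator
        select new SingleValueTag(name, tagValue);
}
```

Because the name parser stops at a line break, input such as "#EXTM3U\n#EXT-X-VERSION:3" fails on the first line instead of silently reading into the second.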

This is where Sprache shows its power: the parser is built by combining multiple parsers from the PlaylistGrammar class.

First, the test: we feed the parser a string of comma-separated key-value pairs and check that all attributes are parsed and stored in a collection of TagAttributes.

The MultipleTagAttributeParser in the PlaylistGrammar class uses the TagAttributeParser to parse a single key-value pair. Appending .Many() to the TagAttributeParser lets it parse multiple key-value pairs.
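A sketch of the two parsers, under some assumptions: TagAttribute is a hypothetical model class, attribute names are limited to A-Z, 0-9 and '-' as RFC 8216 defines them, and a value is either a double-quoted string or a run of characters up to the next comma or line break. To keep .Many() as described above, each attribute consumes its own trailing comma.

```csharp
using System.Collections.Generic;
using Sprache;

// Hypothetical model type: the article only says that parsed attributes are
// stored in a collection of TagAttributes.
public class TagAttribute
{
    public TagAttribute(string key, string value)
    {
        Key = key;
        Value = value;
    }

    public string Key { get; }
    public string Value { get; }
}

public static class AttributeGrammar
{
    // Attribute names may contain only A-Z, 0-9 and '-' (RFC 8216, section 4.2).
    static readonly Parser<string> AttributeName =
        Parse.Char(c => (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-',
                   "attribute name character")
             .AtLeastOnce().Text();

    // A double-quoted value; the quotes are dropped from the result.
    static readonly Parser<string> QuotedValue =
        from open in Parse.Char('"')
        from content in Parse.CharExcept('"').Many().Text()
        from close in Parse.Char('"')
        select content;

    // An unquoted value runs until the next comma or line break.
    static readonly Parser<string> UnquotedValue =
        Parse.CharExcept(",\r\n").AtLeastOnce().Text();

    // One KEY=VALUE pair, including the comma that may follow it.
    public static readonly Parser<TagAttribute> TagAttributeParser =
        from key in AttributeName
        from assign in Parse.Char('=')
        from attributeValue in QuotedValue.Or(UnquotedValue)
        from separator in Parse.Char(',').Optional()
        select new TagAttribute(key, attributeValue);

    // .Many() repeats the single-pair parser for as long as it succeeds.
    public static readonly Parser<IEnumerable<TagAttribute>> MultipleTagAttributeParser =
        TagAttributeParser.Many();
}
```

Parsing BANDWIDTH=1280000,AVERAGE-BANDWIDTH=1000000 with this sketch yields two attributes. The quoted branch matters for attributes such as CODECS="avc1.4d401f,mp4a.40.2", whose value itself contains a comma. An alternative design would drop the optional comma from TagAttributeParser and use TagAttributeParser.DelimitedBy(Parse.Char(',')) instead of .Many().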

I hope this shows that using the Sprache parser library makes parsing code easier to read and extend. Sprache itself is open source and available on GitHub.

For more details about parsing HLS streams, the complete source code is available on GitHub. The parser is written in C# on .NET Core 2.2 using Visual Studio Code, and the unit tests are written with NUnit.