
System Design Lessons from Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery

The paper offers insight into Netflix’s architecture around 2011, when it was moving more of its infrastructure to AWS. It shows how Netflix combined many cloud services to deliver Internet video at scale, using three CDN providers to optimize movie delivery to its users. The manifest file is a useful design pattern: it lets backend servers send custom configuration to clients without a new release or update. Hosting content on multiple CDNs lets clients fall back to another CDN when one is performing poorly. In addition, Netflix adopted the DASH protocol for adaptive streaming over HTTP, which breaks video and audio content into chunks so that the client can choose the quality of the next chunk based on network conditions. Overall, the paper gives a good overview of Netflix’s infrastructure, cloud architecture, and user flow for streaming content at scale on third-party cloud infrastructure.

System Design Components

  • Content Delivery Networks (CDN)

  • Manifest File

  • DNS

  • Adaptive Streaming using the DASH protocol

  • Frontend server

  • Backend server

  • Client with video playback

  • Logging

System Design Principles

  • Leverage AWS and cloud infrastructure as much as possible to benefit from economies of scale, so you don’t need to provision your own infrastructure and hardware. Netflix uses many cloud services to achieve Internet video delivery at scale.

    • Netflix hosted its frontend website in its own data center but offloaded most of the other work to AWS, using storage such as S3 (alongside Cassandra) for video and other files, and cloud compute instances for data processing.

    • Source video files were stored in AWS and then copied to the CDNs.

  • Host your content on multiple CDNs and give clients the ability to select an alternate CDN if one is not performing well. Each CDN provider may offer a different SLA depending on its infrastructure in a given location, its load, and other factors. When choosing a CDN, prefer the one with the highest availability and a presence close to your users that can also support a high number of concurrent clients.

    • Not all CDNs are created equal, and their quality varies with time and location. Network conditions can change drastically over the course of a day.

  • Using a manifest file in either JSON or XML is a key design decision that allows the backend to dynamically send metadata to the frontend clients without needing to issue a new build or release. The backend can send a file customized for each client, and the file can be updated independently of the frontend clients. This is especially important for Netflix because it has to support a wide variety of clients, ranging from desktops and laptops to mobile phones and smart TVs, all with different network and hardware capabilities.

    • Hard-coding config values on the clients requires client updates and is difficult to manage at Netflix’s scale. Instead, have the client send the backend its capabilities, so the server can generate a custom set of configurations such as CDN selection, video format, etc.
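The capability-driven manifest idea above can be sketched in a few lines. This is only an illustration, not Netflix's actual schema: the `ClientCapabilities` fields, the encoding catalogue, the `cdn-*.example.com` names, and the rank/weight/heartbeat values are all made up for the example.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Everything below is hypothetical: the paper does not publish the real
# manifest schema, encoding ladder, or CDN weights.

@dataclass
class ClientCapabilities:
    device: str          # e.g. "desktop" or "smart_tv"
    codecs: List[str]    # codecs the client says it can decode
    max_height: int      # tallest frame the client can display

# Encodings the backend could offer (illustrative values only).
ENCODINGS = [
    {"codec": "h264", "height": 480, "bitrate_kbps": 1050},
    {"codec": "h264", "height": 720, "bitrate_kbps": 2350},
    {"codec": "h264", "height": 1080, "bitrate_kbps": 4300},
    {"codec": "wmv", "height": 480, "bitrate_kbps": 1000},
]

# CDN preference list the backend attaches to every manifest.
CDNS = [
    {"name": "cdn-b.example.com", "rank": 2, "weight": 120},
    {"name": "cdn-a.example.com", "rank": 1, "weight": 140},
    {"name": "cdn-c.example.com", "rank": 3, "weight": 100},
]

def build_manifest(caps: ClientCapabilities) -> Dict:
    """Keep only the streams this client can play and rank the CDNs."""
    streams = [
        e for e in ENCODINGS
        if e["codec"] in caps.codecs and e["height"] <= caps.max_height
    ]
    return {
        "device": caps.device,
        "cdns": sorted(CDNS, key=lambda c: c["rank"]),
        "streams": sorted(streams, key=lambda s: s["bitrate_kbps"]),
        "heartbeat_interval_s": 60,  # assumed timing parameter
    }

def manifest_json(caps: ClientCapabilities) -> str:
    """Serialize the manifest; the post mentions JSON or XML as options."""
    return json.dumps(build_manifest(caps), indent=2)
```

Because the filtering happens on the server, changing the CDN order or adding a new encoding only requires a backend change; clients pick it up the next time they request a manifest.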

System Architecture

  • www.netflix.com

    • Frontend hosted on a server in the Netflix data center.

    • Handles two functions: 1) registering new user accounts and capturing payment information, and 2) redirecting users to movies.netflix.com or signup.netflix.com depending on whether they are logged in.

    • Clients do not interact with this server during video playback.

  • agmoviecontrol.netflix.com and movies.netflix.com

    • Hosted in AWS cloud using a variety of services such as EC2, S3, and VPC.

    • This AWS infrastructure is responsible for content ingestion, log recording/analysis, DRM, CDN routing, user sign-in, and mobile device support.

  • 3 CDN providers (Level 3, Limelight, Akamai)

    • Encoded, DRM-protected videos are stored in AWS and copied to the CDNs.

  • Frontend Video Player

    • Microsoft Silverlight (now deprecated) was used in desktop web browsers, and the paper focused on desktop streaming. The key point is that the player used DASH (Dynamic Adaptive Streaming over HTTP).

      • In DASH, video is encoded at several different quality levels and divided into chunks of no more than a few seconds each. Clients request one chunk at a time via HTTP. Players can freely switch between quality levels to adapt to network conditions.

  • Logging

    • After playback starts, the Netflix player periodically sends logs to the /heartbeat and /logblob endpoints on agmoviecontrol.netflix.com.

  • CDN selection strategy

    • Netflix's CDN ranking did not change over a period of a few days and was tied to the user account, regardless of the movie type, computer, or location.
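The DASH behaviour described under Frontend Video Player (measure the network, then pick the quality of the next chunk) can be sketched as a simple throughput-based rule. This is one common heuristic, not necessarily what the Silverlight player did; the bitrate ladder and the 0.8 safety factor are assumptions made for the example.

```python
from typing import List

def measure_throughput_kbps(chunk_bytes: int, download_seconds: float) -> float:
    """Estimate throughput from the last chunk: bits transferred per second, in kbps."""
    return chunk_bytes * 8 / 1000 / download_seconds

def pick_bitrate(available_kbps: List[int], throughput_kbps: float,
                 safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a fraction of measured throughput.

    Falls back to the lowest available bitrate when nothing fits.
    """
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(available_kbps) if b <= budget]
    return candidates[-1] if candidates else min(available_kbps)

# Hypothetical quality ladder, in kbps.
LADDER = [235, 560, 1050, 1750, 3000]

# A 500,000-byte chunk that took 2 seconds implies 2000 kbps of throughput.
last_throughput = measure_throughput_kbps(500_000, 2.0)
next_bitrate = pick_bitrate(LADDER, last_throughput)
```

The safety factor leaves headroom so that a small dip in throughput does not immediately stall playback; real players typically also take the buffer level into account.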

Video Streaming Flow

  • A user who wants to stream video opens up Netflix and is redirected to a page to stream the video

  • The user clicks the “Play Now” button and the browser downloads the Silverlight player.

  • The client sends the Netflix backend server its capabilities, for example that it can only render H.264-encoded video or can only play back .wmv files.

  • Netflix returns an XML manifest file custom-tailored to that client, with all the information needed to start streaming from the CDNs. The manifest is delivered over SSL.

    • Contains a list of 3 CDNs, each with a rank and weight for priority. The client can fall back to another CDN if one is not performant enough.

    • The manifest file also contains the location of the trickplay data, video/audio chunk URLs for multiple quality levels, and timing parameters such as time-out interval, polling interval, etc.

  • The client supports trickplay operations such as pause, rewind, fast-forward, and random seek in 10-second intervals.

  • The manifest file also contains trickplay information such as thumbnail resolution, pixel aspect, trickplay interval, and the CDN from which to download the trickplay file.

  • The client begins downloading audio and video chunk data from the preferred CDN.

  • The client uses DASH protocol to perform adaptive streaming, which adjusts the quality of the movie being delivered based on the user's network conditions.

  • The client periodically sends keep-alive messages and logs to the /heartbeat and /logblob endpoints on agmoviecontrol.netflix.com.
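The CDN fallback step in the flow above can be sketched as follows. The `probe` callback and the `min_kbps` threshold are assumptions for the example (the post only says the client falls back when a CDN is "not performant enough"), and the CDN names are placeholders.

```python
from typing import Callable, Dict, List, Optional

def choose_cdn(cdns: List[Dict],
               probe: Callable[[str], Optional[float]],
               min_kbps: float) -> Optional[str]:
    """Return the first CDN, in rank order, whose throughput meets min_kbps.

    probe(name) is a hypothetical callback that returns the measured
    throughput in kbps, or None if the CDN could not be reached.
    """
    for cdn in sorted(cdns, key=lambda c: c["rank"]):
        kbps = probe(cdn["name"])
        if kbps is not None and kbps >= min_kbps:
            return cdn["name"]
    return None  # no CDN in the manifest is good enough

# CDN list as it might appear in a manifest (placeholder names).
manifest_cdns = [
    {"name": "cdn-b", "rank": 2},
    {"name": "cdn-a", "rank": 1},
    {"name": "cdn-c", "rank": 3},
]

# Fake measurements standing in for real downloads: cdn-a is slow, cdn-c is down.
measured = {"cdn-a": 100.0, "cdn-b": 900.0, "cdn-c": None}
selected = choose_cdn(manifest_cdns, measured.get, min_kbps=235)
```

Passing the probe in as a function keeps the selection logic testable without touching the network; a real client would measure throughput from the chunks it is already downloading.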

Key terms

Control Server

A control server is a type of server that is used to manage and monitor the activity of other servers or devices in a network. It acts as a central point of control for the network, and is responsible for tasks such as allocating resources, monitoring performance, and providing access to data and services. Control servers can also be used to implement security policies, manage backups, and perform other administrative functions. It is a key element in the network infrastructure and can be used to manage and control several other servers and devices that form the network.

SSL

SSL works by using secret codes, called "keys", to scramble the information that is sent between your computer and the website you're visiting. When your computer wants to talk to the website, it first asks the website for its public key. The website sends that key, and your computer uses it to scramble the information it's sending. Once the website receives the scrambled information, it uses its own private key, which it never shares, to unscramble it and read it. This way, if someone tries to listen in on the conversation, they won't be able to understand the information because it's scrambled. It also helps to ensure that you are connected to the real website and that no one else is pretending to be it.

Bitrate

A bitrate is like a measure of how fast the information is moving when you watch videos or listen to music on the internet. Just like when you're running, you can go faster or slower, videos and music can also go faster or slower. Bitrate is the number of "bits" of information that are moving every second. So if a video has a high bitrate, that means it has a lot of information moving quickly and it will look and sound better. If it has a low bitrate, it will look and sound more choppy and low quality.
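To make the numbers concrete, here is a small sketch of how bitrate relates to chunk size and download time. The 1750 kbps stream, 4-second chunk, and 5000 kbps link are made-up example values.

```python
def chunk_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Bytes needed for duration_s seconds of media encoded at bitrate_kbps."""
    return bitrate_kbps * 1000 * duration_s / 8

def download_time_s(size_bytes: float, link_kbps: float) -> float:
    """Seconds needed to move size_bytes over a link running at link_kbps."""
    return size_bytes * 8 / (link_kbps * 1000)

# A 4-second chunk at 1750 kbps is 875,000 bytes.
size = chunk_size_bytes(1750, 4)

# Over a 5000 kbps connection, that chunk downloads in about 1.4 seconds,
# which is faster than the 4 seconds it takes to play.
seconds = download_time_s(size, 5000)
```

As long as each chunk downloads faster than it plays, the player's buffer grows; when it doesn't, the player has to drop to a lower bitrate or pause to rebuffer.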

DNS

The Domain Name System (DNS) translates domain names, such as www.example.com, into IP addresses, such as 192.0.2.1. When a user types a domain name into their web browser, the browser sends a request to a DNS server to resolve the domain name to an IP address. The DNS server looks up the IP address associated with the domain name and returns it to the browser. The browser then uses the IP address to connect to the web server hosting the website and requests the web page. The server responds by sending the requested page to the browser, which then displays it to the user.
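As a quick illustration, the lookup step can be reproduced with Python's standard `socket` module. The `resolve` helper is just a name chosen for this example, and `localhost` is used so the snippet works without Internet access.

```python
import socket
from typing import List

def resolve(hostname: str) -> List[str]:
    """Return the distinct IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({info[4][0] for info in infos})

addresses = resolve("localhost")
```

Swapping in a public hostname shows why DNS matters for CDNs: the same name can resolve to different addresses depending on where and when you ask.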

Sender Window in TCP Session

In TCP, the receiver tells the sender how much buffer space it has available using the window size field in the TCP header. The sender keeps track of the amount of unacknowledged data (i.e. data that has been sent but not yet acknowledged) and keeps it within that advertised window. This way, the sender can ensure that it is not sending more data than the receiver can handle.
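The bookkeeping described above can be sketched as a toy model. It only covers the receiver's advertised window and leaves out congestion control, retransmission, and sequence numbers; the class and method names are made up for the example.

```python
class WindowedSender:
    """Toy model of TCP flow control on the sending side."""

    def __init__(self, advertised_window: int) -> None:
        self.advertised_window = advertised_window  # bytes the receiver can buffer
        self.in_flight = 0                          # bytes sent but not yet ACKed

    def can_send(self) -> int:
        """Bytes that may still be sent without overrunning the receiver."""
        return max(0, self.advertised_window - self.in_flight)

    def send(self, nbytes: int) -> int:
        """Send as much of nbytes as the window allows; return bytes sent."""
        sent = min(nbytes, self.can_send())
        self.in_flight += sent
        return sent

    def on_ack(self, acked_bytes: int, new_window: int) -> None:
        """Process an ACK: free the acknowledged bytes and update the window."""
        self.in_flight = max(0, self.in_flight - acked_bytes)
        self.advertised_window = new_window

sender = WindowedSender(advertised_window=1000)
first = sender.send(700)    # all 700 bytes fit in the window
second = sender.send(700)   # only 300 more bytes fit
sender.on_ack(500, 1000)    # receiver ACKs 500 bytes, window stays at 1000
```

After the ACK, 500 bytes are still in flight, so the sender may put another 500 bytes on the wire.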

Trick play

Trick play, or trick mode as it is sometimes called, is a feature that gives viewers visual feedback while they are rewinding or fast-forwarding a stream (i.e., 'scrubbing' through it).

Manifest File

Using a manifest file in either JSON or XML is a key design decision that allows the backend to dynamically send metadata to the frontend clients without needing to issue a new build or release. The backend can send a file customized for each client, and the file can be updated independently of the frontend clients.