                              ==Phrack Inc.==

               Volume 0x10, Issue 0x47, Phile #0x06 of 0x11

|=-----------------------------------------------------------------------=|
|=--------------=[ MPEG-CENC: Defective by Specification ]=--------------=|
|=-----------------------------------------------------------------------=|
|=---------------------=[ David "retr0id" Buchanan ]=--------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

0 - Introduction
1 - The Video Streaming DRM Landscape
    1.0 - Pointing a Camera at the Screen (Aka “The Analog Hole”)
    1.1 - Digitally Recording the HDMI Port
    1.2 - Exfiltrating the Decrypted but Not-Yet-Decompressed Data
    1.3 - Exfiltrating Content Keys
    1.4 - Exfiltrating CDM Secrets
    1.5 - EME, MSE, WTF?
2 - The DeCENC Exploit
    2.0 - How to Bypass a Video Decoder
    2.1 - Leveraging I_PCM
    2.2 - The Devilish Details
        2.2.0 - Background: AES-CTR
        2.2.1 - NAL Unit Emulation Prevention Bytes
        2.2.2 - Chroma Subsampling
        2.2.3 - Limited Range Color
        2.2.4 - Crafting I_PCM Bitstreams
        2.2.5 - Metadata Preparation
        2.2.6 - Video Stream Substitution
        2.2.7 - Putting It All Together
3 - Capabilities
4 - Mitigations
5 - Aside: Learning about h264, MP4, and ISO-BMFF
6 - Reflections
7 - References

[=================
[ 0. Introduction
[=================

You've probably heard the saying "DRM is defective by design". It's true,
and I can prove it.

In this paper I present DeCENC, a generic attack on the MPEG-CENC file
format. DeCENC enables decryption of video files without direct knowledge
of the key. The fundamental flaw involves the use of encryption without
authentication - a rookie error[0], although exploiting it in this context
is fiddly, to say the least.

MPEG-CENC is not DRM[1], but it is an encrypted media container format
commonly used as part of DRM systems. Any DRM'd playback system that
correctly implements the MPEG-CENC specification is conceptually
vulnerable to DeCENC.
The attack relies on interactions with video codec features present in
either h264 (AVC) or h265 (HEVC), which are both widely supported.
Applicability to other codecs is plausible but has not yet been
investigated.

DeCENC is a security research tool that may be used to assess the
robustness of CENC-compatible video DRM systems. Although the exploit aims
to be generic, I make no specific claims of compatibility with any
particular DRM system or configurations thereof. However, the PoC source
release includes documentation for testing against "ClearKey", a
pseudo-DRM scheme defined as part of the W3C's EME specification[2]. The
source is available here[3]:

    https://github.com/DavidBuchanan314/DeCENC

By the way, all the relevant MPEG specs are paywalled (thanks ISO,) so
I'll try to keep my explanations here self-contained.

[======================================
[ 1. The Video Streaming DRM Landscape
[======================================

Before I get into the attack itself, I'd like to give some background. I'm
trying to steer clear of vendor-specific implementation details, lest I
lose the Do Not Violate The DMCA Challenge (2024 edition,) so here's an
overview of how a generic video streaming DRM system might work:

                +----- The Big Scary DRM Black-Box -----+
                |                                       |
+----------+    | +-------------+                       |
|          |    | |   License   |                       |
|  Movies  |<---->| Acquisition |                       |
|   R Us   |    | +-------------+                       |
| dot com  |    |        | Keyz                         |
| (content |    |        v                              |
| provider)|    | +-------------+   +---------------+   | +---------+
|          |----->| Decryption  |-->| Video Decoder |---->| Monitor |-> eyes
+----------+    | +-------------+   +---------------+   | +---------+
                +---------------------------------------+

Like most video on the internet, it's compressed, with a codec like h264.
But now it's encrypted, too. Your computer needs to decrypt it before it
can render it to your screen, and that's where a CDM (Content Decryption
Module) comes in.
The CDM runs on "your" device, and is either implemented using software,
secure hardware (e.g. inside a secure enclave,) or some combination of the
two. My diagram represents it as "The Big Scary DRM Black-Box" - you're
not supposed to be able to tamper with it, or meaningfully inspect its
operation. In theory.

Before the CDM can decrypt the video, it needs the decryption key. How
does the key get inside the CDM? It depends, but normally there's a
protocol between the CDM and the content provider. During "license
acquisition", the content provider decides whether it trusts the CDM,
whether the user has permission to access the content, etc. If the
licensing authority is happy with all the details, then it'll issue a
"license" (containing relevant key material) to the CDM. This protocol is
secured so that an eavesdropper can't just sniff keys as they travel over
the network.

MPEG-CENC is a container file format that stores the metadata a CDM needs
in order to do its job, telling it which parts of the file are encrypted,
how, and with which keys. It doesn't store keys directly (that would be
too easy to break!) but instead references keys by an ID. The CDM is
responsible for figuring out how to map a key ID to an actual decryption
key.

CENC stands for "Common ENCryption"; the idea is that it's a common
standard that many DRM systems can share. This is convenient for streaming
platforms, because they can (in theory) serve the same file to all their
users, regardless of which DRM system they're using (because not all
platforms support all DRM systems.)

It's important to note that CENC is just a file format. The CENC
specification doesn't say anything about how DRM should work; it is only
concerned with encryption metadata. You could in theory use CENC for some
non-DRM purpose, or architect the DRM differently to what I just described
above.

So that's how it's all *supposed* to work.
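
To make the key ID indirection concrete, here is a minimal Python sketch
of how a ClearKey-style license could be turned into a key ID -> key
lookup table. It assumes the JSON Web Key Set layout that the EME
specification describes for ClearKey licenses (a "keys" array of "oct"
JWKs whose "kid" and "k" fields are unpadded base64url). The function
names and the example KID/key bytes are made up for illustration; real
CDMs keep this table inside the black box and fill it via their own
license protocols.

```python
import base64
import json
from typing import Dict


def b64url_decode(text: str) -> bytes:
    """Decode base64url text that has had its '=' padding stripped."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as base64url without '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_clearkey_license(license_json: str) -> Dict[bytes, bytes]:
    """Build a {key_id: key} table from a ClearKey-style JWK Set."""
    table: Dict[bytes, bytes] = {}
    for jwk in json.loads(license_json)["keys"]:
        if jwk.get("kty") != "oct":
            continue  # only symmetric ("octet sequence") keys are relevant
        table[b64url_decode(jwk["kid"])] = b64url_decode(jwk["k"])
    return table


# Hypothetical 16-byte key ID and key, not taken from any real content.
EXAMPLE_KID = bytes(range(16))
EXAMPLE_KEY = bytes(range(16, 32))

# What a license carrying that single key might look like (assumed layout).
EXAMPLE_LICENSE = json.dumps({
    "keys": [{
        "kty": "oct",
        "kid": b64url_encode(EXAMPLE_KID),
        "k": b64url_encode(EXAMPLE_KEY),
    }],
    "type": "temporary",
})

if __name__ == "__main__":
    keys = parse_clearkey_license(EXAMPLE_LICENSE)
    # The file's metadata only names the KID; the key comes from the table.
    print(keys[EXAMPLE_KID].hex())
```

Given the KID referenced in a file's encryption metadata, the lookup is
then just keys[kid]; a missing entry means no license has been acquired
for that key.
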
Now let's go through some common ways that systems like this are broken,
ordered roughly from easiest to hardest.

--[ Method 0: Pointing a Camera at the Screen (Aka “The Analog Hole”)

This attack is so low-tech that it's impossible to prevent, although
watermarking can discourage it. No matter how good your camera is, your
recording will be imperfect. Sometimes called a "camrip", these are the
bottom of the barrel in the video archival scene.

--[ Method 1: Digitally Recording the HDMI Port

HDCP ("High-bandwidth Digital Content Protection") is supposed to make
this impossible, by encrypting the video link, but in practice even newer
versions of HDCP are trivially bypassed using "splitter" dongles[4].

Similarly, it may be possible to record a device's screen using pure
software methods, although CDMs can take steps to prevent this using
platform-specific features.

The result of this approach is much better than a camrip, but it also
necessitates re-compressing the video data. This is undesirable because it
either inflates the file size, introduces codec artifacts, or both. This
problem is known as Generation Loss[5]. The resulting video file might be
labeled as a "WEBRip".

--[ Method 2: Exfiltrating the Decrypted but Not-Yet-Decompressed Data

Video decoding (i.e. decompression) is a separate process to decryption.
At the very least, these will be implemented by two different areas of
software, or even different pieces of hardware (e.g. a hardware video
decoder.) CDMs will do their best to prevent it, but as the data travels
between these two components it is potentially exposed to adversarial
archivists.

--[ Method 3: Exfiltrating Content Keys

For decryption to work, the relevant keys must be held *somewhere* within
the walls of the CDM, within the playback device owned by the attacker.
The keys can be obfuscated[6], put in secure hardware, etc., but they're
still in there somewhere. A sufficiently determined attacker will always
be able to get them back out again.
Cryptographic side-channel attacks[7] are very much on the cards here.

--[ Method 4: Exfiltrating CDM Secrets

In practice, the CDM must contain some sort of key material that it uses
to authenticate itself as genuine, during License Acquisition (i.e.
content key provisioning.) This key material might be provisioned to
hardware during device manufacturing, or it might just be another
software-obfuscated secret.

If this identification/authentication material can be extracted[8][9][10]
(or perhaps merely "code lifted"[11], in the case of software
obfuscation,) then an attacker can replace the whole CDM with their own
code, and request content keys from the licensing authority directly.
They'll still need permission to view the content (e.g. a premium account
on a streaming service,) but now they can trivially access its decryption
keys.

This general approach is perhaps the most difficult to achieve in the
first place, but once you've got it working it's extremely repeatable.

Those last 3 techniques all permit an archivist to get a complete and
"untouched" copy of the original video file, without any re-encoding or
other losses. The resulting file might be referred to as a "WEBDL", which
is as good as it gets for archival of streamed videos (Note: Some people
use the terms "WEBDL" and "WEBRip" interchangeably. I'm not one of those
people.) Truly discerning archivists will usually opt for files sourced
from physical media[12], but that's out of scope for this paper.

Every time you see "WEBDL" or "WEBRip" in a media file name, it's likely
that one of the above techniques was used to obtain it, or some variation
thereof. From the existence of these files we can perhaps infer that DRM
is a "solved problem" (from the archival perspective, at least,) but many
of those solutions remain closely guarded secrets.

--[ 1.5: EME, MSE, WTF?

There's one last piece of background to get out of the way before I move
on to the fun stuff.

EME stands for Encrypted Media Extensions. It's a standardized API for the
web platform that allows web pages to show DRM-encumbered content. CENC
still exists as a standalone format, but it's most commonly used today as
a subcomponent of EME. EME doesn't specify any actual DRM, it just
describes an interface between DRM systems and web browsers.

MSE stands for Media Source Extensions. It's a closely related API that
allows for more flexibility in how video data gets piped into HTML