AES-128 HLS Downloader: How the Cipher Actually Works

Most articles tell you how to download an AES-128 HLS stream. This one explains why most tools break, what the cipher is actually doing per segment, and what a correct implementation must handle under the hood. Audience: developers, ops engineers, and power users who want to understand the machinery, not just run a command.

By the Vidora engineering team

Researchers validating video pipelines, ops teams archiving course content they licensed, developers integrating HLS playback into their own apps: they all eventually collide with an EXT-X-KEY line in the manifest and a tool that silently produces a corrupt MP4. The encryption is not exotic. AES is the same block cipher that protects HTTPS traffic, database backups, and disk images, and CBC is one of its oldest and best-documented modes. What makes it tricky in the HLS context is not the algorithm itself but the details layered around it: per-segment initialization vectors, key URL authentication, rendition-level key isolation, and key rotation across hundreds of segments. Vidora ships AES-128 decryption as a first-class feature, handling all of those edge cases natively inside the browser extension with no external process required.

If you want the practical step-by-step on downloading an encrypted HLS stream, the companion page How to download encrypted M3U8 video covers the three working methods with concrete commands. This page focuses on the underlying mechanics so you understand what any AES-128-capable tool must do internally.

1. What AES-128 means in the HLS context

AES stands for Advanced Encryption Standard, the symmetric block cipher standardized by NIST in 2001. "128" refers to the key length in bits: 128 bits = 16 bytes, which is the minimum AES key size. AES also comes in 192-bit and 256-bit variants, but HLS only defines METHOD=AES-128, so you will always be dealing with 16-byte keys.

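The mode of operation is CBC: Cipher Block Chaining. Concretely, the plaintext is split into 16-byte blocks. Each block is XORed with the previous ciphertext block before being encrypted with the key, and the very first block, which has no predecessor, is XORed with the initialization vector (IV) instead. Decryption reverses this: each ciphertext block is decrypted with the key and then XORed with the previous ciphertext block, or with the IV for the first block.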

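The practical implication: to decrypt a single HLS segment you need exactly three inputs: the 16-byte key, the 16-byte IV for that segment, and the encrypted ciphertext bytes. A wrong key turns the entire segment into garbage. A wrong IV is subtler: because of the chaining, it corrupts only the first 16 bytes of the segment and leaves the rest intact. The AES algorithm itself raises no error in either case. The most a library can report is a padding error when the final block does not decrypt to valid PKCS#7 padding, and a wrong IV normally does not even trigger that.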
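To make the chaining tangible, here is a minimal sketch of CBC mode in Python. It is an illustration under a loud assumption: the "block cipher" is a toy stand-in (XOR with the key plus a byte reversal), NOT AES, chosen only so the example runs on the standard library. The chaining logic around it is the real CBC structure, and it shows that decrypting with the wrong IV damages only the first 16-byte block.

```python
BLOCK = 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # Stand-in for AES block encryption. NOT secure, NOT AES.
    return xor(block, key)[::-1]


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    # Exact inverse of toy_encrypt_block.
    return xor(block[::-1], key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0, "pad the plaintext first"
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (or the IV), then encrypt.
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decrypt, then XOR with the previous ciphertext block (or the IV).
        out.append(xor(toy_decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)


key = bytes(range(16))
iv = (42).to_bytes(16, "big")
plaintext = bytes(range(48))            # three 16-byte blocks

ciphertext = cbc_encrypt(plaintext, key, iv)
good = cbc_decrypt(ciphertext, key, iv)
bad_iv = cbc_decrypt(ciphertext, key, bytes(16))   # all-zero IV instead of 42
```

With the correct IV, `good` equals `plaintext`. With the zero IV, `bad_iv` differs from `plaintext` in its first 16 bytes only, because every later block is chained to ciphertext rather than to the IV.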

Why AES-128 and not AES-256?

128-bit AES is already computationally infeasible to brute-force with current hardware. The security delta between 128-bit and 256-bit AES is irrelevant for streaming media protection: the bottleneck is always the key delivery mechanism (HTTPS with authentication), not the cipher strength. AES-128 is also faster, at 10 rounds per block versus 14 for AES-256, which matters when you are decrypting hundreds of segments in sequence on a mobile device or inside a JavaScript runtime.

For background on how HLS segments and playlists are structured in general, the M3U8 and HLS explainer covers the format from the ground up.

2. How EXT-X-KEY works in the m3u8 manifest

The #EXT-X-KEY tag is the mechanism by which an HLS playlist signals encryption. It can appear anywhere in a media playlist (not the master playlist) and applies to every segment listed after it, until the next #EXT-X-KEY tag overrides it. This scoping rule is the foundation of key rotation, which we cover in section 6.

A complete #EXT-X-KEY line looks like this:

#EXT-X-KEY:METHOD=AES-128,URI="https://cdn.example.com/keys/k1.key",IV=0x00000000000000000000000000000001,KEYFORMAT="identity",KEYFORMATVERSIONS="1"

Each attribute has a precise meaning:

Attribute          Required?                Meaning
METHOD             Yes                      Encryption method: NONE, AES-128, or SAMPLE-AES
URI                Yes (if METHOD != NONE)  URL of the 16-byte binary key file
IV                 No                       Explicit 16-byte IV as a 32-hex-digit string (0x-prefixed)
KEYFORMAT          No                       Defaults to "identity" (raw binary key). Other values signal DRM key delivery systems.
KEYFORMATVERSIONS  No                       Version of the KEYFORMAT scheme, usually "1"
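As an illustration of how these attributes can be pulled out of a playlist line, here is a minimal Python sketch. The function name `parse_ext_x_key` and the regular expression are our own assumptions for the example, not part of any HLS library. It handles quoted and unquoted values and applies the "identity" default for KEYFORMAT.

```python
import re

# NAME=value pairs; a value is either a quoted string or runs up to the next comma.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_ext_x_key(line: str) -> dict:
    """Parse one #EXT-X-KEY line into a dict of attribute name -> value."""
    prefix = "#EXT-X-KEY:"
    if not line.startswith(prefix):
        raise ValueError("not an EXT-X-KEY line")
    attrs = {}
    for name, value in ATTR_RE.findall(line[len(prefix):]):
        # Drop the surrounding quotes from quoted-string values.
        attrs[name] = value[1:-1] if value.startswith('"') else value
    # KEYFORMAT defaults to "identity" when the attribute is absent.
    attrs.setdefault("KEYFORMAT", "identity")
    return attrs


line = ('#EXT-X-KEY:METHOD=AES-128,URI="https://cdn.example.com/keys/k1.key",'
        'IV=0x00000000000000000000000000000001')
key_attrs = parse_ext_x_key(line)
```

Note that the URI is kept exactly as written. A real downloader would still need to resolve a relative URI against the playlist URL before fetching the key.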

The IV: explicit vs. implicit

The IV attribute deserves close attention because its absence changes the decryption logic entirely.

When IV is present, the decryptor uses that exact value for every segment governed by this key. The IV is static per key entry.

When IV is absent, the HLS specification (RFC 8216 section 5.2) defines the IV implicitly: it is the segment's media sequence number, expressed as a 128-bit big-endian integer. So for segment number 42, the IV is 0x0000000000000000000000000000002A. A downloader that hardcodes IV = 0x00...00 when no IV attribute is found will correctly decrypt the first segment (when the playlist starts at sequence number 0) but will corrupt the first 16-byte block of every later segment. In a .ts segment that block holds the first packet's sync byte and header, so the damage surfaces as demuxer errors or glitches at segment boundaries rather than as an obvious crash. This is one of the most common silent failure modes in naive implementations.
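The two IV paths can be sketched in a few lines of Python. The helper name `resolve_iv` is hypothetical. The logic follows the rule stated above: use the explicit attribute when present, otherwise encode the media sequence number as 16 big-endian bytes.

```python
from typing import Optional


def resolve_iv(iv_attr: Optional[str], media_sequence: int) -> bytes:
    """Return the 16-byte IV for one segment.

    iv_attr is the raw IV attribute value (e.g. "0x...01") or None if absent.
    """
    if iv_attr is not None:
        hex_digits = iv_attr[2:] if iv_attr.lower().startswith("0x") else iv_attr
        # Left-pad defensively so a short value still yields 16 bytes.
        iv = bytes.fromhex(hex_digits.zfill(32))
        if len(iv) != 16:
            raise ValueError("IV attribute must encode exactly 16 bytes")
        return iv
    # Implicit IV: the media sequence number as a 128-bit big-endian integer.
    return media_sequence.to_bytes(16, "big")


implicit = resolve_iv(None, 42)
explicit = resolve_iv("0x00000000000000000000000000000001", 42)
```

Here `implicit.hex()` is `0000000000000000000000000000002a`, matching the segment 42 example, while `explicit` ignores the sequence number entirely.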

How the key file is structured

The URI points to a tiny binary file: exactly 16 bytes, no header, no wrapper. An HTTP GET to that URL returns the raw key octets. Any authentication (cookies, signed URLs, Bearer tokens) must be included in that request just as it was in the segment requests. This is where session inheritance becomes critical, and where many tools first break.

3. Why most downloaders fail on AES-128 streams

The failure modes are specific and correctable. Understanding them is the fastest path to evaluating whether a given tool is actually AES-128-capable or just claims to be.

Failure mode A: the key tag is ignored entirely

Unsophisticated HLS scrapers parse the playlist, collect segment URLs, download each .ts file, and concatenate the bytes. They never look at #EXT-X-KEY. Because METHOD=AES-128 encrypts each segment in its entirety, container framing included, the resulting file is ciphertext from the first byte to the last: no transport stream sync bytes, no codec headers, nothing for a demuxer to hold on to. Players either refuse to open the file or show a black screen or visual noise. This accounts for the majority of "I downloaded the video but it won't play" reports on forums.

Failure mode B: the key fetch is unauthenticated

The key URL in production deployments almost never responds to anonymous requests. It sits behind the same CDN authentication layer as the video segments: signed URLs with expiry parameters, Referer checks, or cookie-gated access. A command-line tool that launches a fresh HTTP client without the page's session cookies will receive a 403 or 401 on the key request and either crash or silently skip decryption. Tools running inside the authenticated browser session do not have this problem because they inherit the session cookies automatically.

Failure mode C: the IV is hardcoded to zero

As described in section 2, when no IV attribute is present, each segment's IV is its media sequence number. A tool that always uses IV=0 decrypts only segment 0 cleanly (and only when the playlist's media sequence starts at 0, the default). Every subsequent segment comes out with its first 16-byte block corrupted. The audio track, if in a separate rendition playlist with its own sequence numbering, is damaged at different points, and the resulting boundary glitches and audio/video drift are very hard to diagnose without knowing the root cause.

Failure mode D: PKCS#7 padding is not stripped

HLS uses PKCS#7 padding with AES-CBC: before encryption, 1 to 16 bytes are appended to each segment so that its length is a multiple of 16, and every pad byte holds the pad length. A correct decryptor must strip that padding before writing the segment to disk or passing it to the muxer. If the padding bytes remain, the segment has extra garbage bytes at the end. For .ts segments this usually causes the next segment's sync byte (0x47) to be misaligned, which some muxers tolerate and others do not. The bug manifests as a few corrupted frames at each segment boundary, usually invisible but occasionally causing a green flash or audio dropout.
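Stripping PKCS#7 padding is short enough to show in full. This Python sketch (the function name `strip_pkcs7` is ours) reads the pad length from the last byte, validates that all pad bytes agree, and removes them. Many crypto libraries do this for you, so treat it as an illustration of what happens under the hood rather than something every tool must reimplement.

```python
def strip_pkcs7(plaintext: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#7 padding from a decrypted segment."""
    if not plaintext or len(plaintext) % block_size:
        raise ValueError("length must be a positive multiple of the block size")
    pad = plaintext[-1]                      # the last byte is the pad length
    if not 1 <= pad <= block_size:
        raise ValueError("invalid PKCS#7 pad length (wrong key?)")
    if plaintext[-pad:] != bytes([pad]) * pad:
        raise ValueError("inconsistent PKCS#7 pad bytes (wrong key?)")
    return plaintext[:-pad]


# A 188-byte TS packet padded to 192 bytes (12 blocks) with four 0x04 bytes.
padded = b"\x47" + bytes(187) + bytes([4]) * 4
packet = strip_pkcs7(padded)
```

After the call, `packet` is back to 188 bytes. A segment whose plaintext length is already a multiple of 16 carries a full extra block of sixteen 0x10 bytes, which the same function removes.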

Failure mode E: audio rendition uses a different key

In a master playlist with separate audio and video renditions, each rendition has its own media playlist. Each media playlist can have its own #EXT-X-KEY line with a different URI, different IV, or both. A downloader that fetches only the key from the video playlist and applies it to the audio segments will produce silent or garbled audio. CBC has no integrity check, so the only symptom a library can report is a padding error on the final block. Tools that ignore or disable that check write the garbage straight into the output file.

4. What a correct AES-128 HLS downloader must do

Reducing the problem to its essential steps makes the requirements clear. A compliant AES-128 HLS downloader executes this pipeline for each segment:

# For each segment in the media playlist:
1. resolve_key(segment):
     key_url = segment.ext_x_key.uri          # from the governing EXT-X-KEY tag
     key     = fetch_authenticated(key_url)    # 16 bytes, with session headers
     iv      = segment.ext_x_key.iv            # explicit, OR media_sequence_number as 16 big-endian bytes
     return (key, iv)

2. fetch_segment(segment.url)                  # authenticated, with Referer + cookies
   -> ciphertext (N * 16 bytes, padded)

3. plaintext = decrypt_aes128_cbc(ciphertext, key, iv)
   raw_bytes = strip_pkcs7_padding(plaintext)  # critical: last byte gives the pad length (1-16)
   -> raw_ts_or_m4s_bytes

4. mux_append(raw_bytes)                       # feed into MP4 muxer incrementally

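The subtleties that trip up real implementations are the failure modes from section 3: ignoring the key tag, fetching the key without the session's credentials, assuming a zero IV, leaving the padding in place, and reusing the video key for the audio rendition.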

To find the m3u8 manifest URL in the first place, the DevTools method for finding m3u8 URLs is the standard starting point, and our M3U8 detector tool can surface manifest URLs without opening DevTools manually.

5. Vidora's AES-128 implementation in practice

Vidora ships AES-128 decryption as a core primitive, not an afterthought. The implementation runs entirely inside the Chrome extension's offscreen document, which is a hidden Chromium page that provides a full DOM and Web Crypto API without being a service worker. This matters for two reasons: the offscreen document can call crypto.subtle.decrypt() with AES-CBC parameters, and it maintains persistent state (key cache, segment buffer) across the entire download without the service worker's 30-second idle kill.

The per-segment key cache

A naively correct implementation fetches the key URL on every segment. For a 60-minute 1080p stream with 6-second segments, that is 600 key fetches even if the same key governs every segment. Vidora caches the raw key bytes by URI so each distinct key URL is fetched exactly once, with subsequent lookups served from memory. For streams with key rotation (different key per segment or per group), the cache still handles it correctly: each new URI gets its own cached entry, and expiry is tied to the download session lifetime.
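The caching idea is simple enough to sketch. The Python below is an illustration of the general technique, not Vidora's actual code (which the article describes as running in the extension). The class name `KeyCache` and the injected `fetch` callable are assumptions for the example.

```python
from typing import Callable, Dict


class KeyCache:
    """Cache raw key bytes by URI so each distinct key URL is fetched once."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._keys: Dict[str, bytes] = {}
        self.fetch_count = 0

    def get(self, uri: str) -> bytes:
        if uri not in self._keys:
            # First time this URI is seen: go to the network exactly once.
            self._keys[uri] = self._fetch(uri)
            self.fetch_count += 1
        return self._keys[uri]


# Stand-in fetcher for the demo; a real one would perform the authenticated GET.
cache = KeyCache(fetch=lambda uri: bytes(16))

# 60 minutes of 6-second segments, all governed by the same key URI.
for _ in range(600):
    cache.get("https://cdn.example.com/keys/k1.key")
```

After the loop, `cache.fetch_count` is 1 rather than 600. A rotated key arrives under a new URI and therefore gets its own entry on first use.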

Authenticated key fetch via the browser session

The key request is made through the extension's fetch call, which inherits the browser's cookie jar for the originating domain. No header forging is needed: the key CDN sees the same authenticated request the video player would send. This is the structural advantage of a browser extension over any command-line tool. yt-dlp's --cookies-from-browser flag approximates this but requires the cookies to have been persisted to disk, which some session-only cookies never are.

IV resolution: explicit and implicit paths

Vidora's manifest parser extracts the IV attribute if present and converts the 0x-prefixed hex string to a 16-byte Uint8Array. If the IV attribute is absent, the parser uses the segment's #EXT-X-MEDIA-SEQUENCE offset plus the segment's position within the playlist to compute the correct media sequence number, then encodes it as a 16-byte big-endian integer. Both paths produce an identical input type to crypto.subtle.decrypt(), so the decryption call is identical regardless of which IV source was used.

Separate audio rendition handling

When the master playlist declares separate audio and video renditions, Vidora resolves both media playlists independently and applies the key and IV from each rendition's own #EXT-X-KEY tag. This is validated against Apple-style audio rendition fixtures in the extension's E2E test suite (8/8 passing), which includes cases where the audio key URI differs from the video key URI. The muxer (a TypeScript port of mp4-muxer targeting fragmented MP4) receives pre-decrypted raw bytes from both renditions and interleaves them with correct timestamp alignment.

What Vidora does not do

Vidora decrypts METHOD=AES-128 streams, which are governed by the open HLS specification. It does not bypass KEYFORMAT values other than "identity" (which are always DRM key delivery systems: FairPlay, Widevine via HLS). It also does not decrypt SAMPLE-AES content that uses DRM-bound keys. This is by design, not a limitation. For the alternatives landscape, the Vidora alternatives page covers the full range of HLS download tooling.

6. Edge cases: key rotation, signed URLs, byte ranges, LL-HLS

The baseline AES-128 case (one key per playlist, IV derived from sequence number) is well supported by any tool that passes a basic correctness check. The edge cases separate production-grade implementations from toys.

Key rotation

The HLS specification allows a new #EXT-X-KEY tag to appear at any point in the playlist. When it does, every segment after that point is encrypted with the new key (and possibly a new IV). Some platforms rotate keys every segment (one unique key per 6-second chunk), which is used as a lightweight anti-scraping measure. A correct downloader must re-evaluate the governing key for every segment in parse order, not once at playlist parse time. Vidora's segment iterator tracks the "current key context" as it walks the playlist line by line, updating it on each #EXT-X-KEY encounter.
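A minimal Python sketch of such a line-by-line walk is shown here. It is our own illustration of the "current key context" idea, not Vidora's iterator. The `Segment` type and `walk_playlist` name are hypothetical, and the sketch leaves relative URL resolution and every other tag out of scope.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Segment:
    url: str
    media_sequence: int
    key_line: Optional[str]   # governing #EXT-X-KEY line, None if unencrypted


def walk_playlist(text: str) -> Iterator[Segment]:
    sequence = 0              # default when EXT-X-MEDIA-SEQUENCE is absent
    current_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-KEY:"):
            # A new key tag replaces the context for all following segments.
            current_key = None if "METHOD=NONE" in line else line
        elif line and not line.startswith("#"):
            yield Segment(line, sequence, current_key)
            sequence += 1     # every segment URI advances the sequence number


playlist = """#EXTM3U
#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-KEY:METHOD=AES-128,URI="https://cdn.example.com/keys/k1.key"
#EXTINF:6.0,
seg100.ts
#EXTINF:6.0,
seg101.ts
#EXT-X-KEY:METHOD=AES-128,URI="https://cdn.example.com/keys/k2.key"
#EXTINF:6.0,
seg102.ts
"""
segments = list(walk_playlist(playlist))
```

The three segments come out with media sequence numbers 100, 101, and 102. The first two carry the k1 key line and the third carries k2, which is exactly the per-segment re-evaluation that key rotation requires.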

Signed key URLs with short-lived tokens

Some CDNs (Bunny.net with token auth, AWS CloudFront with signed URLs, custom implementations) embed expiry parameters in the key URL itself. The key URL might look like https://cdn.example.com/keys/k1.key?token=abc&expires=1715000000. If the download takes longer than the expiry window (typically 5-15 minutes), the key fetch fails partway through the stream. The correct mitigation is to fetch keys on demand rather than pre-fetching all keys upfront, combined with early detection of 403 responses that could signal token expiry.
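As a rough illustration, a downloader can check how much time a signed key URL has left before deciding whether to fetch now or refresh the manifest first. The Python sketch below assumes the expiry is a Unix timestamp in a query parameter named `expires`, as in the example URL above. That parameter name is an assumption: real CDNs use different names and formats, so treat it as configurable.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(url: str, now: Optional[float] = None,
                         param: str = "expires") -> Optional[float]:
    """Return the seconds left before the URL expires, or None if unknown."""
    values = parse_qs(urlparse(url).query).get(param)
    if not values:
        return None                      # no expiry parameter in this URL
    try:
        expires_at = int(values[0])      # assumed to be a Unix timestamp
    except ValueError:
        return None                      # some other format; cannot tell
    return expires_at - (time.time() if now is None else now)


key_url = "https://cdn.example.com/keys/k1.key?token=abc&expires=1715000000"
remaining = seconds_until_expiry(key_url, now=1714999700)
```

With the fixed `now` used here, `remaining` is 300 seconds. A negative value means the token has already expired and a 403 is the likely outcome.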

Byte-range segments

HLS permits segments to be byte ranges within a single large file, using the #EXT-X-BYTERANGE tag. An encrypted byte-range segment requires a Range request against the container file, followed by AES-CBC decryption of exactly those bytes. The IV rules do not change for byte ranges: each byte-range segment is a media segment with its own media sequence number, so an explicit IV is used when present and the implicit rule from section 2 applies when it is absent. Any tool that handles byte-range segments must issue a Range: bytes=N-M HTTP request and then decrypt the returned slice.
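The translation from an #EXT-X-BYTERANGE value to an HTTP Range header is easy to get off by one, so here is a small Python sketch. The tag value has the form length[@offset]. When the offset is omitted, the range starts where the previous one ended. The function name `byterange_to_header` is our own.

```python
from typing import Optional, Tuple


def byterange_to_header(value: str,
                        previous_end: Optional[int] = None) -> Tuple[str, int]:
    """Convert "length[@offset]" into a Range header value.

    Returns the header value and the exclusive end offset, which the caller
    passes back in as previous_end for the next byte-range segment.
    """
    length_text, _, offset_text = value.partition("@")
    length = int(length_text)
    if offset_text:
        start = int(offset_text)
    elif previous_end is not None:
        start = previous_end             # continue right after the last range
    else:
        raise ValueError("offset omitted but there is no previous byte range")
    end = start + length                 # exclusive
    # HTTP Range uses inclusive byte positions, hence end - 1.
    return f"bytes={start}-{end - 1}", end


first_header, first_end = byterange_to_header("1024@2048")
second_header, second_end = byterange_to_header("512", previous_end=first_end)
```

The first call yields `bytes=2048-3071` and the second continues with `bytes=3072-3583`. For an encrypted stream, each length should be a multiple of 16, since the slice is a complete CBC ciphertext.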

Low-Latency HLS (LL-HLS)

LL-HLS (Apple's extension defined in the HLS spec draft) introduces partial segments and preload hints. AES-128 encryption applies to partial segments as well, with the same CBC mechanics. The difference is that partial segments may be streamed before they are complete, so a decryptor cannot rely on the final block being available when it starts processing. LL-HLS decryption requires streaming-capable AES-CBC that buffers ciphertext in multiples of 16 bytes and flushes output as complete blocks arrive. This is a non-trivial streaming implementation challenge distinct from the standard batch-per-segment model.
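The buffering half of that problem can be sketched independently of the cipher. The Python class below (a hypothetical `BlockAligner`, not taken from any library) accepts network chunks of arbitrary size, releases ciphertext only in whole 16-byte blocks, and always holds back the most recent block, because it may turn out to be the final padded block that needs PKCS#7 handling. A streaming CBC decryptor would sit behind it and carry the last ciphertext block of each released chunk forward as the chaining value for the next.

```python
class BlockAligner:
    """Release ciphertext in 16-byte multiples, holding back the final block."""

    def __init__(self, block_size: int = 16) -> None:
        self._block = block_size
        self._pending = b""

    def feed(self, chunk: bytes) -> bytes:
        """Add a network chunk; return the block-aligned bytes safe to decrypt."""
        self._pending += chunk
        # Keep between 1 and block_size bytes in reserve whenever data exists.
        usable = max((len(self._pending) - 1) // self._block * self._block, 0)
        ready, self._pending = self._pending[:usable], self._pending[usable:]
        return ready

    def finish(self) -> bytes:
        """Return the held-back final block once the stream has ended."""
        if len(self._pending) != self._block:
            raise ValueError("ciphertext length is not a multiple of the block size")
        last, self._pending = self._pending, b""
        return last


aligner = BlockAligner()
out1 = aligner.feed(b"a" * 10)    # less than one block: nothing released
out2 = aligner.feed(b"b" * 30)    # 40 bytes buffered: 32 released, 8 held
out3 = aligner.feed(b"c" * 8)     # exactly one block held: still nothing
last_block = aligner.finish()     # stream ended: release the final block
```

The caller decrypts `out2` as soon as it arrives and strips padding only from the plaintext of `last_block`.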

Manual verification: decrypting a single segment with OpenSSL

When debugging a pipeline, it is useful to verify that your key and IV are correct before trusting the download tool. You can do this manually with curl and OpenSSL:

# 1. Fetch the 16-byte key and hex-encode it
curl -s "https://cdn.example.com/keys/k1.key" | xxd -p | tr -d '\n'
# output: 0a1b2c3d4e5f...  (32 hex chars)

# 2. Fetch one encrypted segment
curl -s "https://cdn.example.com/seg001.ts" -o seg001.enc.ts

# 3. Decrypt with OpenSSL (IV must be 32 hex chars, no 0x prefix)
openssl enc -d -aes-128-cbc \
  -K "0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d" \
  -iv "00000000000000000000000000000001" \
  -in seg001.enc.ts \
  -out seg001.dec.ts

# 4. Inspect: a valid .ts segment starts with sync byte 0x47
xxd -l 4 seg001.dec.ts

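If the first byte of the decrypted output is 47, the key and IV are almost certainly correct and you have a valid transport stream segment. If it is anything else, look at offset 188, where the second TS packet starts: a 47 there means the key is right and only the IV is wrong (a wrong IV damages just the first 16 bytes), while garbage throughout, usually accompanied by a "bad decrypt" error from OpenSSL, points at the key. This test is tool-agnostic and works as a ground-truth check before investing time in a full pipeline debug.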
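The same check can be scripted. This Python sketch, with the hypothetical function name `diagnose`, samples the sync byte at the start of the first few 188-byte TS packets. It relies on the CBC property that a wrong IV damages only the first 16 bytes of the plaintext, so a missing sync byte at offset 0 with correct ones at 188, 376, and so on points at the IV rather than the key. It applies to MPEG-TS segments only. fMP4 segments do not start with 0x47.

```python
TS_PACKET = 188
SYNC = 0x47


def diagnose(decrypted: bytes, packets_to_check: int = 5) -> str:
    """Classify a decrypted .ts segment by sampling its packet sync bytes."""
    limit = min(len(decrypted), packets_to_check * TS_PACKET)
    hits = [decrypted[offset] == SYNC for offset in range(0, limit, TS_PACKET)]
    if not hits:
        return "empty input"
    if all(hits):
        return "ok"
    if not hits[0] and len(hits) > 1 and all(hits[1:]):
        # Only the first block is damaged: the key worked, the IV did not.
        return "wrong IV"
    return "wrong key or not MPEG-TS"


# Synthetic data for the demo: five well-formed (empty) TS packets.
good_segment = (bytes([SYNC]) + bytes(TS_PACKET - 1)) * 5
iv_damaged = bytes(16) + good_segment[16:]      # first 16 bytes clobbered
all_garbage = bytes(TS_PACKET * 5)              # no sync bytes anywhere
```

Calling `diagnose` on the three samples returns "ok", "wrong IV", and "wrong key or not MPEG-TS" respectively. On a real file you would pass `open("seg001.dec.ts", "rb").read()`.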

For converting the resulting segments to a playable file, the online M3U8 to MP4 converter handles unencrypted segments after you have verified decryption manually. For a comparison of extension-based approaches, see the Vidora vs Video DownloadHelper breakdown.

7. Frequently asked questions

What is the difference between AES-128 HLS and DRM?

AES-128 in HLS is a transport-layer encryption defined in the open HLS specification (RFC 8216). The key is a plain binary file fetched over HTTPS alongside the video. Any HLS-compliant player, including open-source ones like VLC and ffmpeg, can read it. DRM (Widevine, FairPlay, PlayReady) is a content-layer protection: the key is bound to a hardware-backed Content Decryption Module on the playback device and is never exposed to user-space code. AES-128 is not DRM. Vidora, ffmpeg, and yt-dlp can all handle AES-128. None of them bypass DRM, by design.

Why does my downloader produce a file that plays only the first segment correctly?

Almost certainly an IV bug. If the EXT-X-KEY tag has no explicit IV attribute, the IV for each segment is its media sequence number encoded as a 128-bit big-endian integer. A tool that defaults to IV = 0 will only decrypt segment 0 cleanly. Segments 1, 2, 3... will each have their first 16-byte block corrupted, which damages the leading transport stream packet and can make the demuxer or codec stumble at every segment boundary. Check your tool's changelog for "IV handling" or "media sequence IV" fixes.

Can the open-source ffmpeg or yt-dlp handle AES-128 HLS?

Yes, both handle METHOD=AES-128 correctly when given the right session context. ffmpeg fetches the key URL natively when it processes the playlist. yt-dlp does the same. The common failure with both tools is not the AES implementation but the authentication layer: the key URL requires cookies or a signed token that the tool does not have unless you explicitly pass them. Inside a browser extension the session is inherited automatically, which is why extension-based tools tend to have a higher success rate on production streams without manual configuration.

What is KEYFORMAT="identity" and when does it matter?

KEYFORMAT="identity" means the URI points to a raw 16-byte binary key file. This is the default when the attribute is absent and covers all standard AES-128 streams. Non-identity KEYFORMAT values (like "com.apple.streamingkeydelivery" for FairPlay) signal that the key is delivered via a DRM key exchange protocol, not as a raw file. A downloader that only handles identity keys (which is all that the open HLS spec requires) cannot do anything with non-identity keys. This is not a bug; it is the boundary between AES-128 and DRM territory.

How much performance overhead does AES-128 decryption add?

On modern hardware with AES acceleration (AES-NI on most x86 CPUs released since about 2010, the ARMv8 Cryptography Extensions on every Apple Silicon chip), AES-128-CBC decryption runs at multiple gigabytes per second per core, because CBC decryption, unlike CBC encryption, can process blocks in parallel. Decrypting a 6-second 1080p segment (roughly 3-6 MB) takes on the order of a millisecond. The bottleneck is always network bandwidth and segment fetch latency, never the cipher. In JavaScript via crypto.subtle.decrypt(), which delegates to the browser's native crypto implementation, the overhead is similarly negligible compared to the segment download time. You will not notice any decryption slowdown in practice.

Does AES-128 HLS encryption protect against redistribution?

No. AES-128 in HLS is a transport protection, not a content rights management system. Once a compliant player decrypts the segments for playback, the raw video bytes are in memory and accessible to anyone with the right tooling. AES-128 raises the bar above "right-click save" and prevents casual hotlinking of segment URLs (since each key fetch needs auth). It does not prevent a determined person with authenticated session access from saving the content. That is the honest capability boundary. Content that needs strong redistribution protection requires DRM, which is a fundamentally different system.

What is SAMPLE-AES and how is it different from AES-128?

METHOD=SAMPLE-AES encrypts individual codec samples (NAL units in H.264, AAC frames) rather than the entire transport stream segment. The effect is that the container framing (the .ts or .m4s wrapper) remains unencrypted and readable, but the media data inside is encrypted. This makes it harder for a naive scraper to detect that it has downloaded encrypted content. SAMPLE-AES is used with FairPlay on Apple platforms where the CDM decrypts samples inside the playback pipeline. It is not the same as AES-128, and tools that handle AES-128 do not automatically handle SAMPLE-AES with DRM-bound keys.

Conclusion

AES-128-CBC in HLS is a well-specified, auditable encryption scheme. Its failure in practice is almost never about the cipher and almost always about the plumbing around it: IV derivation, authenticated key fetches, per-rendition key isolation, and padding cleanup. A correct AES-128 HLS downloader handles all of these as first-class concerns, not special cases. Vidora was built with that correctness requirement from the start, validated against 8/8 E2E fixtures covering the range of real-world production configurations. For the practical walkthrough on using any of these tools against a real stream, the companion guide on downloading encrypted M3U8 video picks up from here.

About the author

RGC Digital LLC builds Vidora, a Chrome extension for downloading HLS, DASH, and MP4 video from Vimeo, Bunny.net, Wistia, and Loom. We test every method we publish against real production streams and maintain an E2E test harness covering AES-128, DASH fMP4, and Apple-style audio renditions.