ESAM Explained: How AWS MediaConvert Uses ESAM to Insert Ad Markers in HLS Playlists
Author
Alexian Kauffmann
Modern streaming workloads increasingly rely on server-side ad insertion (SSAI), where ad breaks are inserted into HLS manifests on the server to create a seamless, broadcast‑like viewing experience. In the AWS video stack, a typical VOD flow consists of the following components:
- AWS Elemental MediaConvert transcodes source media and generates HLS playlists.
- AWS Elemental MediaTailor reads those playlists, interprets ad markers, communicates with an ad decision server, and generates personalized manifests.
- A player, such as Video.js, plays the resulting HLS stream as a continuous playback experience.
Accurate ad signaling is central to this workflow. The packager, in this case MediaConvert, must determine:
- When ad opportunities occur in the program timeline.
- What those events represent in SCTE‑35 terms.
- How those events should be expressed as HLS tags for downstream systems.
This is where ESAM (Event Signaling and Management) becomes important. ESAM provides a standardized XML-based mechanism for describing ad events and instructing packagers such as MediaConvert to modify HLS manifests accordingly.
ESAM in Context: ESAM vs. SCTE‑35 vs. HLS Tags
To understand the role of ESAM, it is important to distinguish between the three signaling layers involved in an SSAI workflow.
SCTE-35
SCTE-35 is a binary, in-band signaling standard widely used in broadcast environments to identify splice points, durations, and segmentation events.
It defines signaling semantics such as:
- Splice command types
- Segmentation event identifiers
- UPIDs (Unique Program Identifiers)
HLS Ad Tags
HLS ad tags are textual markers embedded directly into media playlists. Common examples include:
- #EXT-X-CUE-OUT
- #EXT-X-CUE-IN
- #EXT-X-SCTE35
Custom tags may also be used depending on the workflow requirements. These tags are interpreted by SSAI services such as AWS Elemental MediaTailor and, in some cases, directly by video players.
ESAM
ESAM is an out-of-band XML-based control mechanism between a session manager and a signal processor, which in this workflow is AWS Elemental MediaConvert.
ESAM instructs MediaConvert on:
- The logical ad events, including SCTE-35 semantics and timestamps
- The HLS manifest modifications that should occur at those events
In AWS VOD workflows, native SCTE-35 packets are typically not embedded directly into the mezzanine asset. Instead, ESAM XML documents are provided to MediaConvert, which then generates the required HLS ad tags for downstream components to interpret.
The Two ESAM Documents Used by MediaConvert
MediaConvert relies on two ESAM XML files:
- SignalProcessingNotification (SPN): Defines when ad events occur and specifies their SCTE-35 semantics.
- ManifestConfirmConditionNotification (MCCN): Defines how each event should be represented in the HLS manifest.
A useful way to think about these documents is as a separation between event definition and manifest behavior:
| ESAM document | Responsibility |
| --- | --- |
| SPN | Defines timeline placement and SCTE-35 event semantics (“when” and “what”) |
| MCCN | Defines the HLS tags and manifest modifications associated with each event (“how” the HLS should look) |
Both documents reference events using the same identity fields:
- acquisitionPointIdentity
- acquisitionSignalID
MediaConvert applies a manifest modification only when it can successfully match a ManifestResponse in MCCN with a corresponding ResponseSignal in SPN using this identity pair.
1. SignalProcessingNotification (SPN): “When” and “What”
The SPN document defines ad events as <ResponseSignal> elements. Each element represents a single signaling event or ad marker.
Identity Fields
Each event is identified using the following fields:
- acquisitionPointIdentity: Identifies the overall acquisition context, typically corresponding to an asset or channel.
- acquisitionSignalID: Uniquely identifies a specific signal within that acquisition context.
Timing Information
Event timing is controlled through:
- sig:NPTPoint@nptPoint: Specifies the presentation timestamp, in seconds from program start, where the marker should be inserted into the output timeline.
SCTE-35 Semantics
The SCTE-35 metadata associated with an event is described using:
- sig:SCTE35PointDescriptor
This descriptor can include fields such as:
- spliceCommandType: For example, 06 (time_signal), the command type that carries segmentation descriptors.
- SegmentationDescriptorInfo attributes such as segmentEventId, segmentTypeId, and upidType.
Within AWS Elemental MediaConvert, the SCTE-35 descriptor is primarily used for event classification and downstream compatibility. The primary operational control points are:
- nptPoint, which determines when the marker is triggered
- acquisitionPointIdentity and acquisitionSignalID, which determine which MCCN entry can attach HLS tags to the event
Example SPN Document
The following example defines one linear ad marker and two overlay ad markers:
<?xml version="1.0" encoding="UTF-8"?>
<SignalProcessingNotification
    xmlns="urn:cablelabs:iptvservices:esam:xsd:signal:1"
    xmlns:sig="urn:cablelabs:md:xsd:signaling:3.0"
    xmlns:common="urn:cablelabs:iptvservices:esam:xsd:common:1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:content="urn:cablelabs:iptvservices:esam:xsd:content:1"
    acquisitionPointIdentity="ExampleService">
  <common:BatchInfo batchId="1">
    <common:Source xsi:type="content:MovieType"/>
  </common:BatchInfo>

  <!-- Linear ad opportunity at 180s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="2"
      signalPointID="180.000"
      action="create">
    <sig:NPTPoint nptPoint="180.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="201"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>

  <!-- Overlay ad at 360s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="3"
      signalPointID="360.000"
      action="create">
    <sig:NPTPoint nptPoint="360.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="301"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>

  <!-- Overlay ad at 540s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="4"
      signalPointID="540.000"
      action="create">
    <sig:NPTPoint nptPoint="540.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="302"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>
</SignalProcessingNotification>
Key points:
- Each <ResponseSignal> represents a single ad marker.
- nptPoint uses program time in seconds. Adjusting this value shifts the marker position in the output timeline.
- acquisitionPointIdentity and acquisitionSignalID must uniquely identify each event and must match the corresponding MCCN entry.
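When SPN documents are produced per asset, generating them from a list of events avoids hand-editing XML. The following is a minimal sketch that mirrors the structure of the example above; the function name `build_spn` and the shape of the `events` argument are assumptions for illustration, and the fixed values (`segmentTypeId="52"`, `upidType="0"`, `batchId="1"`) are simply copied from the example rather than chosen for any particular workflow.

```python
from xml.sax.saxutils import quoteattr


def build_spn(acquisition_point, events):
    """Build an SPN document as a string.

    events: iterable of (signal_id, npt_seconds, segment_event_id).
    Hypothetical helper; layout copied from the example SPN document.
    """
    api = quoteattr(acquisition_point)  # returns the value already wrapped in quotes
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<SignalProcessingNotification',
        '    xmlns="urn:cablelabs:iptvservices:esam:xsd:signal:1"',
        '    xmlns:sig="urn:cablelabs:md:xsd:signaling:3.0"',
        '    xmlns:common="urn:cablelabs:iptvservices:esam:xsd:common:1"',
        '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
        '    xmlns:content="urn:cablelabs:iptvservices:esam:xsd:content:1"',
        f'    acquisitionPointIdentity={api}>',
        '  <common:BatchInfo batchId="1">',
        '    <common:Source xsi:type="content:MovieType"/>',
        '  </common:BatchInfo>',
    ]
    for signal_id, npt_seconds, segment_event_id in events:
        npt = f"{npt_seconds:.3f}"  # seconds with millisecond precision, e.g. 180.000
        parts += [
            f'  <ResponseSignal acquisitionPointIdentity={api}',
            f'      acquisitionSignalID={quoteattr(str(signal_id))}',
            f'      signalPointID="{npt}" action="create">',
            f'    <sig:NPTPoint nptPoint="{npt}"/>',
            '    <sig:SCTE35PointDescriptor spliceCommandType="06">',
            f'      <sig:SegmentationDescriptorInfo segmentEventId="{int(segment_event_id)}"',
            '          segmentTypeId="52" upidType="0"/>',
            '    </sig:SCTE35PointDescriptor>',
            '  </ResponseSignal>',
        ]
    parts.append('</SignalProcessingNotification>')
    return "\n".join(parts)
```

Calling `build_spn("ExampleService", [(2, 180, 201), (3, 360, 301), (4, 540, 302)])` would produce a document equivalent to the example, apart from comments and whitespace.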
2. ManifestConfirmConditionNotification (MCCN): “How” the Manifest Is Tagged
The MCCN document defines how AWS Elemental MediaConvert should modify the HLS manifest for each event defined in the SPN document.
Each <ns2:ManifestResponse> element performs two functions:
- Identifies the corresponding SPN event
- Defines the HLS tags that should be inserted into the manifest
Event Identification
Each manifest response references an SPN event using:
- acquisitionPointIdentity
- acquisitionSignalID
These values must match the corresponding <ResponseSignal> entry in the SPN document.
Manifest Modification Instructions
The manifest behavior is described using the following elements:
- ns2:SegmentModify: Defines which segment modifications should be applied.
- ns2:FirstSegment: Specifies that the modification applies to the first segment associated with the event.
- ns2:Tag value="...": Defines the literal HLS tag line that MediaConvert inserts into the playlist.
Unlike SPN, MCCN does not control event timing. Its role is limited to defining how existing events should be represented in the output manifest.
Example MCCN Document
The following example:
- Inserts linear ad markers using #EXT-X-CUE-OUT:0 and #EXT-X-CUE-IN
- Inserts overlay opportunities using a custom #EXT-X-OVERLAY-AD tag
<?xml version="1.0" encoding="UTF-8"?>
<ns2:ManifestConfirmConditionNotification
    xmlns:ns2="http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2"
    xmlns="http://www.cablelabs.com/namespaces/metadata/xsd/core/2"
    xmlns:ns3="http://www.cablelabs.com/namespaces/metadata/xsd/signaling/2">

  <!-- Linear ad: use #EXT-X-CUE-OUT:0 for VOD with MediaTailor -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="2">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag value="#EXT-X-CUE-OUT:0"/>
        <ns2:Tag value="#EXT-X-CUE-IN"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>

  <!-- Overlay ad at 360s; quotes inside the attribute value are escaped as &quot; -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="3">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag
            value="#EXT-X-OVERLAY-AD:ID=&quot;1&quot;,DURATION=5.0"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>

  <!-- Overlay ad at 540s -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="4">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag
            value="#EXT-X-OVERLAY-AD:ID=&quot;2&quot;,DURATION=5.0"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>
</ns2:ManifestConfirmConditionNotification>
Key points:
- Every <ns2:ManifestResponse> should correspond to a matching <ResponseSignal> in the SPN document.
- The value attribute contains the exact HLS tag line inserted into the manifest.
- If a tag is not explicitly defined in MCCN, MediaConvert does not emit it into the output playlist.
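Because the value attribute carries a literal HLS tag line, tags that themselves contain double quotes (such as the overlay tag) need to be quoted or escaped correctly in the XML. The sketch below generates an MCCN document from a mapping of signal IDs to tag lines and relies on `xml.sax.saxutils.quoteattr` to pick a safe quoting style. The function name `build_mccn` and its argument shape are assumptions for illustration only.

```python
from xml.sax.saxutils import quoteattr


def build_mccn(acquisition_point, tags_by_signal):
    """Build an MCCN document as a string.

    tags_by_signal: dict mapping acquisitionSignalID -> list of HLS tag lines.
    Hypothetical helper; layout copied from the example MCCN document.
    """
    api = quoteattr(acquisition_point)
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<ns2:ManifestConfirmConditionNotification',
        '    xmlns:ns2="http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2"',
        '    xmlns="http://www.cablelabs.com/namespaces/metadata/xsd/core/2"',
        '    xmlns:ns3="http://www.cablelabs.com/namespaces/metadata/xsd/signaling/2">',
    ]
    for signal_id, tags in tags_by_signal.items():
        parts += [
            f'  <ns2:ManifestResponse acquisitionPointIdentity={api}',
            f'      acquisitionSignalID={quoteattr(str(signal_id))}>',
            '    <ns2:SegmentModify>',
            '      <ns2:FirstSegment>',
        ]
        # quoteattr switches to single quotes or escapes as needed, so a tag
        # containing double quotes still yields well-formed XML.
        parts += [f'        <ns2:Tag value={quoteattr(tag)}/>' for tag in tags]
        parts += [
            '      </ns2:FirstSegment>',
            '    </ns2:SegmentModify>',
            '  </ns2:ManifestResponse>',
        ]
    parts.append('</ns2:ManifestConfirmConditionNotification>')
    return "\n".join(parts)
```

Parsing the result with any XML parser should return the tag lines unchanged, including the embedded quotes in the overlay tag.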
How MediaConvert Processes ESAM to Generate HLS Markers
At a high level, MediaConvert processes ESAM documents in four stages to generate HLS manifests with ad signaling markers.
1. Parse the SPN Document
MediaConvert first parses the SPN document and builds an internal event list.
For each <ResponseSignal> element, MediaConvert records:
- The event identity: acquisitionPointIdentity and acquisitionSignalID
- The event timing: sig:NPTPoint@nptPoint
- The SCTE-35 metadata: sig:SCTE35PointDescriptor
Each event is internally keyed using the combination of acquisitionPointIdentity and acquisitionSignalID.
2. Parse the MCCN Document
MediaConvert then parses the MCCN document and builds the manifest modification rules associated with each event.
For every <ns2:ManifestResponse> element, MediaConvert:
- Searches for a matching SPN event using acquisitionPointIdentity and acquisitionSignalID
- Ignores the manifest response if no matching SPN event exists
- Associates the defined HLS tags with the matched event
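The matching step described above can be reproduced offline as a pre-flight check before a job is submitted. The sketch below is not MediaConvert's implementation; it only models the join on the identity pair as this article describes it. The names `match_esam` and `identity` are hypothetical.

```python
import xml.etree.ElementTree as ET

SPN_NS = "{urn:cablelabs:iptvservices:esam:xsd:signal:1}"
MCCN_NS = "{http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2}"


def identity(element):
    """Return the (acquisitionPointIdentity, acquisitionSignalID) pair."""
    return (element.get("acquisitionPointIdentity"),
            element.get("acquisitionSignalID"))


def match_esam(spn_xml, mccn_xml):
    """Join MCCN responses to SPN events on the identity pair.

    Returns (matched, orphans, untagged):
      matched  - dict of identity pair -> list of HLS tag lines
      orphans  - MCCN identities with no SPN event (would be ignored)
      untagged - SPN identities with no MCCN entry (no tags emitted)
    """
    spn_events = {}
    for signal in ET.fromstring(spn_xml).iter(SPN_NS + "ResponseSignal"):
        key = identity(signal)
        if key in spn_events:
            raise ValueError(f"duplicate SPN identity: {key}")
        spn_events[key] = signal

    matched, orphans = {}, []
    for response in ET.fromstring(mccn_xml).iter(MCCN_NS + "ManifestResponse"):
        key = identity(response)
        if key in spn_events:
            matched[key] = [t.get("value") for t in response.iter(MCCN_NS + "Tag")]
        else:
            orphans.append(key)

    untagged = [key for key in spn_events if key not in matched]
    return matched, orphans, untagged
```

Non-empty `orphans` or `untagged` lists point at the identifier mismatches that would otherwise surface only as missing tags in the output playlist.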
3. Align Events with HLS Segments
During transcoding and packaging, MediaConvert generates the HLS ABR ladder and segments the media.
For each ESAM event, MediaConvert:
- Identifies the first HLS segment whose start time occurs at or after the specified nptPoint
- Applies the SegmentModify, FirstSegment, and Tag operations to that segment
Because HLS signaling is segment-based, markers are applied at segment boundaries rather than arbitrary frame positions.
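To estimate where a marker will land, the rule above ("first segment whose start time is at or after nptPoint") can be modeled with ceiling division. This sketch assumes fixed-duration segments starting at time zero, which real outputs only approximate; the function name is hypothetical, and `Fraction` is used to avoid floating-point rounding on values such as 180.000.

```python
from fractions import Fraction


def first_segment_at_or_after(npt_point, segment_duration):
    """Return (segment_index, segment_start_seconds) for an nptPoint.

    Assumes every segment has the same duration and the first starts at 0.
    """
    npt = Fraction(str(npt_point))
    seg = Fraction(str(segment_duration))
    index = -(-npt // seg)  # ceiling division on exact fractions
    return int(index), float(index * seg)
```

With 6-second segments, an nptPoint of 180.000 maps to segment 30 starting at 180 s, while 181.5 maps to segment 31 starting at 186 s, which is the forward shift described above.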
4. Generate the Output HLS Manifests
The resulting HLS playlists contain the tags defined in MCCN, such as:
- #EXT-X-CUE-OUT:0
- #EXT-X-CUE-IN
- #EXT-X-OVERLAY-AD:ID="1",DURATION=5.0
These markers can then be interpreted downstream by:
- SSAI services such as AWS Elemental MediaTailor
- Client-side playback applications and ad logic
Important Behavioral Constraints
Strict ID Matching
Both acquisitionPointIdentity and acquisitionSignalID must match between SPN and MCCN.
Any mismatch, including typographical errors or missing entries, prevents MediaConvert from applying the associated manifest modification.
Timing Is Controlled Only by SPN
Event timing is defined exclusively through sig:NPTPoint@nptPoint.
Changing MCCN does not affect marker placement. To reposition a marker, the nptPoint value in SPN must be modified.
Tag Content Is Controlled Only by MCCN
The HLS tags inserted into the manifest are defined entirely in MCCN.
For example, changing #EXT-X-CUE-OUT:0 to another tag format requires editing MCCN rather than SPN.
Segment Boundary Alignment
Markers are inserted at HLS segment boundaries.
If an nptPoint falls in the middle of a segment, the corresponding tags are typically applied to the next segment header.
Designing Reliable ESAM Configurations for Linear Ad Breaks
Linear ad breaks, including pre-roll, mid-roll, and post-roll placements, are commonly handled by AWS Elemental MediaTailor using HLS ad markers generated by AWS Elemental MediaConvert.
ESAM provides a structured mechanism for defining those markers consistently across VOD assets.
Choosing NPT Times and Segments
Each ad opportunity should be represented by a dedicated <ResponseSignal> element in the SPN document.
When defining event timing:
- nptPoint should reference the intended position in the output timeline rather than the source asset timeline
- Segment duration and GOP structure should be considered during packaging design
For precise marker placement:
- GOP and segment durations should be aligned so that ad markers occur close to segment boundaries
- Segment durations in the 4 to 6 second range with aligned keyframes are commonly used
Otherwise, markers may shift forward to the next available segment boundary during packaging.
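Two quick checks can support this design work: whether each segment holds a whole number of GOPs, and which segment boundary is nearest to a planned ad position. The helpers below are a sketch under the assumption of a constant frame rate and fixed segment duration; both function names are hypothetical.

```python
from fractions import Fraction


def segments_align_with_gops(segment_duration, gop_frames, fps):
    """True when a segment contains a whole number of GOPs."""
    gop_seconds = Fraction(gop_frames) / Fraction(str(fps))
    return (Fraction(str(segment_duration)) / gop_seconds).denominator == 1


def snap_to_boundary(planned_seconds, segment_duration):
    """Return the segment boundary nearest to a planned ad position."""
    seg = Fraction(str(segment_duration))
    return float(round(Fraction(str(planned_seconds)) / seg) * seg)
```

For example, a 60-frame GOP at 30 fps (2 seconds) divides a 6-second segment evenly, whereas a 48-frame GOP does not. A mid-roll planned at 1201 s with 6-second segments would snap to 1200 s, and using the snapped value as the nptPoint keeps the marker from drifting to the following segment.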
HLS VOD Signaling with #EXT-X-CUE-OUT:0
For HLS VOD workflows using MediaTailor, a common approach is to emit:
#EXT-X-CUE-OUT:0
The 0 value indicates that the manifest does not explicitly define the replacement duration for the ad break.
Instead, MediaTailor determines the effective ad duration dynamically during manifest generation and ad stitching.
Using a non-zero CUE-OUT value in VOD workflows can lead to unintended playback behavior, including the removal or skipping of portions of the primary content timeline.
To indicate the conclusion of the ad opportunity, the following marker is typically emitted:
#EXT-X-CUE-IN
For this reason, the MCCN example shown earlier emits both tags for the same event.
Recommended Minimal ESAM Structure for VOD SSAI
SPN Responsibilities
The SPN document should:
- Define each ad opportunity using precise nptPoint values
- Include valid SCTE-35 segmentation descriptors for downstream compatibility
MCCN Responsibilities
The MCCN document should:
- Emit #EXT-X-CUE-OUT:0
- Emit #EXT-X-CUE-IN
- Associate those tags with the corresponding SPN event identifiers
Overlay and Banner Ads with Custom HLS Tags
ESAM is not limited to standard linear ad markers. Custom HLS tags can also be inserted into manifests to support client-driven experiences such as banner ads, overlays, or picture-in-picture promotions.
A common approach uses a custom tag format such as:
- #EXT-X-OVERLAY-AD:ID="1",DURATION=5.0
In this example:
- ID defines a stable identifier associated with the overlay opportunity
- DURATION specifies how long the overlay should remain visible, in seconds.
These tags are inserted into the manifest by AWS Elemental MediaConvert through MCCN definitions, in the same way that linear ad markers are inserted.
The workflow typically follows this structure:
- SPN defines overlay events at particular nptPoint values.
- MCCN maps each event to an #EXT-X-OVERLAY-AD tag with matching IDs.
- The output HLS manifest contains the custom overlay markers at the appropriate segment boundaries.
Unlike standard SSAI markers, custom overlay tags have no intrinsic playback behavior. Additional client-side logic is required to consume and act on them.
A player application or associated client component typically performs the following actions:
- Parses the manifest
- Detects #EXT-X-OVERLAY-AD lines
- Triggers overlay rendering at the appropriate playback position
- Removes the overlay after the specified duration
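The parsing and detection steps can be sketched as follows. A production player would normally do this in its own runtime (for example JavaScript in a Video.js plugin); Python is used here only to illustrate the logic. The sketch assumes the overlay tag uses exactly the `ID="...",DURATION=...` form shown above and derives each overlay's start time by summing the #EXTINF durations of the segments that precede it. The function name `overlay_cues` is hypothetical.

```python
import re

OVERLAY_RE = re.compile(r'^#EXT-X-OVERLAY-AD:ID="([^"]*)",DURATION=([0-9.]+)$')


def overlay_cues(playlist_text):
    """Return [(start_seconds, overlay_id, duration_seconds)] from a media playlist."""
    cues = []
    position = 0.0  # start time of the segment currently being described
    duration = 0.0  # duration from the most recent #EXTINF line
    for raw in playlist_text.splitlines():
        line = raw.strip()
        match = OVERLAY_RE.match(line)
        if match:
            cues.append((position, match.group(1), float(match.group(2))))
        elif line.startswith("#EXTINF:"):
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            position += duration  # a URI line closes the current segment
    return cues
```

Each returned tuple gives the playback position at which the overlay should appear and how long it should stay visible, which is what the rendering and removal steps need.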
The following sections describe architectures that combine both linear ad signaling and custom overlay signaling within the same playback workflow.
End-to-End Workflow: From File Ingest to Tagged HLS
A typical VOD workflow using ESAM with AWS Elemental MediaConvert generally consists of four stages: ingest and metadata extraction, transcoding and packaging, server-side ad insertion, and client playback.
Ingest and Metadata Processing
The workflow typically begins with source asset ingestion into Amazon S3.
A common implementation pattern includes:
- Source mezzanine assets uploaded to an S3 bucket acting as a watch folder
- An AWS Lambda function triggered on object creation events
- Media analysis performed using tools such as MediaInfo to determine asset duration, structural metadata, and stream characteristics
Based on predefined business rules, the Lambda function generates the ESAM documents:
- SPN for event timing and SCTE-35 semantics
- MCCN for HLS manifest modifications
Typical rules may include:
- Mid-roll insertion every 20 minutes
- Pre-roll insertion
- Post-roll insertion
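Rules like these reduce to a small function that turns an asset duration into a list of ad positions, which can then feed SPN generation. The sketch below is one possible reading of the rules; the function name, the parameter names, and the choice to place the post-roll exactly at the asset duration are assumptions, and how a marker at the very end of an asset is handled downstream is not covered here.

```python
def plan_ad_breaks(duration_seconds, midroll_interval=1200.0,
                   preroll=True, postroll=True):
    """Return ad positions in seconds for a VOD asset.

    Mid-rolls are placed at every multiple of midroll_interval that falls
    strictly inside the asset (default 1200 s = 20 minutes).
    """
    points = []
    if preroll:
        points.append(0.0)
    position = float(midroll_interval)
    while position < duration_seconds:
        points.append(position)
        position += midroll_interval
    if postroll:
        points.append(float(duration_seconds))
    return points
```

For a 3700-second asset this yields a pre-roll at 0, mid-rolls at 1200, 2400, and 3600 seconds, and a post-roll at 3700 seconds.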
Transcoding and Packaging
After generating the ESAM documents, the workflow submits a MediaConvert job for the asset.
The job configuration commonly includes:
- A mezzanine asset stored in S3 as the source input
- An HLS output group
- SPN and MCCN XML documents supplied through the job's ESAM settings
During processing, MediaConvert:
- Transcodes the source asset
- Generates the HLS ABR ladder
- Segments the media
- Inserts HLS tags according to the ESAM instructions
The resulting manifests and media segments are typically written to an S3 bucket serving as the content origin.
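As a rough illustration of how the ESAM documents reach the job, the sketch below builds the ESAM portion of a job settings dictionary. The field names (`Esam`, `SignalProcessingNotification.SccXml`, `ManifestConfirmConditionNotification.MccXml`, `ResponseSignalPreroll`) reflect the MediaConvert job settings schema as I understand it and should be verified against the current API reference. The helper name is hypothetical, and HLS output group settings related to ad markers are deliberately left out.

```python
def esam_job_settings(spn_xml, mccn_xml, preroll_ms=0):
    """Return the ESAM fragment of a MediaConvert job's Settings dict.

    Field names are assumed from the MediaConvert job schema; confirm them
    against the API reference before relying on this.
    """
    return {
        "Esam": {
            "SignalProcessingNotification": {"SccXml": spn_xml},
            "ManifestConfirmConditionNotification": {"MccXml": mccn_xml},
            # Milliseconds of lead time applied to ESAM signals.
            "ResponseSignalPreroll": preroll_ms,
        }
    }
```

The returned fragment would be merged with the job's `Inputs` and `OutputGroups` and passed as the `Settings` argument when the job is created, for example through the boto3 MediaConvert client.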
Server-Side Ad Insertion
AWS Elemental MediaTailor then consumes the generated HLS manifests.
MediaTailor:
- Reads the HLS master and media playlists from S3
- Interprets ad signaling markers such as #EXT-X-CUE-OUT:0 and #EXT-X-CUE-IN
- Uses SCTE-35 metadata when available
- Communicates with an ad decision server
- Generates personalized manifests containing stitched ad segments
Client Playback and Overlay Rendering
On the client side, a playback application such as Video.js consumes the MediaTailor-generated manifest.
Within this playback workflow:
- Linear advertisements appear as stitched media segments
- Ad metadata tracks can be used for playback enforcement and analytics
- Overlay advertisements are triggered using custom tags such as #EXT-X-OVERLAY-AD
Overlay rendering logic is commonly implemented using client-side advertising SDKs such as Google Interactive Media Ads.
The result is a unified playback workflow in which both linear and overlay advertisements are synchronized through the same ESAM-defined event model.
Common Pitfalls and How to Avoid Them
Several implementation issues commonly appear when working with ESAM and AWS Elemental MediaConvert workflows. Most can be traced back to a small set of configuration and synchronization problems.
1. Missing or Mismatched Event Identifiers
Symptom: Expected HLS tags do not appear in the output manifest.
Cause: acquisitionPointIdentity or acquisitionSignalID values differ between SPN and MCCN due to typographical errors, copy-paste inconsistencies, or incomplete mappings.
Fix:
- Ensure every SPN <ResponseSignal> has a corresponding MCCN <ns2:ManifestResponse>.
- Keep IDs in a single source of truth (code or templates) to avoid manual mismatches.
2. Assuming MCCN Controls Timing
Symptom: Tags appear at unexpected times, or moving tags in MCCN does not affect marker placement.
Cause: MCCN does not define timing information and does not contain nptPoint values.
Fix:
- Modify event timing exclusively through sig:NPTPoint@nptPoint in the SPN document.
- Treat MCCN as pure decoration of events already defined in SPN.
3. Segment Boundary Misalignment
Symptom: Markers appear one segment later than expected.
Cause: The specified nptPoint occurs within a segment rather than near its boundary. MediaConvert applies tags at segment boundaries and therefore shifts the marker to the next eligible segment.
Fix:
- Align planned ad positions as closely as possible with segment boundaries
- Adjust GOP structure and segment duration to support predictable ad insertion behavior
4. Incomplete Tag Sets
Symptom: The manifest contains #EXT-X-CUE-OUT without #EXT-X-CUE-IN, or vice versa.
Cause: One of the required tags is omitted from the MCCN definition.
Fix:
- For each linear ad event, emit both #EXT-X-CUE-OUT:0 and #EXT-X-CUE-IN in MCCN unless a specific workflow requirement dictates otherwise.
- Automated validation can also help detect incomplete signaling sequences before deployment by scanning generated manifests for expected tag patterns.
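A validation pass of this kind can be as simple as walking the playlist and tracking whether an ad break is currently open. The sketch below is one possible check, with a hypothetical function name; it treats #EXT-X-CUE-OUT-CONT lines as continuation markers rather than new breaks.

```python
def check_cue_pairs(playlist_text):
    """Return a list of problems; an empty list means every CUE-OUT has a CUE-IN."""
    problems = []
    open_line = None  # line number of the CUE-OUT that has not been closed yet
    for number, raw in enumerate(playlist_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#EXT-X-CUE-OUT-CONT"):
            continue  # continuation of an open break, not a new one
        if line.startswith("#EXT-X-CUE-OUT"):
            if open_line is not None:
                problems.append(
                    f"line {number}: CUE-OUT while break from line {open_line} is open")
            open_line = number
        elif line.startswith("#EXT-X-CUE-IN"):
            if open_line is None:
                problems.append(f"line {number}: CUE-IN without a preceding CUE-OUT")
            open_line = None
    if open_line is not None:
        problems.append(f"line {open_line}: CUE-OUT is never closed by a CUE-IN")
    return problems
```

Running this against every generated media playlist before publishing catches incomplete signaling sequences early.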
5. Incorrect CUE-OUT Values in VOD
Symptom: Playback skips sections of primary content or exhibits unexpected ad transition behavior.
Cause: Using #EXT-X-CUE-OUT:<non-zero> for VOD instead of #EXT-X-CUE-OUT:0.
Fix:
- For VOD with MediaTailor, use #EXT-X-CUE-OUT:0 and allow MediaTailor to determine effective ad break duration dynamically
- Reserve non-zero CUE-OUT durations for specialized workflows that have been explicitly validated against the target SSAI behavior
Validation Checklist
Before deploying ESAM-driven workflows into production, the generated signaling and manifest outputs should be validated for each asset.
SPN
- Every intended ad opportunity is represented by a corresponding <ResponseSignal>
- Each event contains the correct nptPoint value
- acquisitionSignalID values are unique within the associated acquisitionPointIdentity
MCCN
- Each <ResponseSignal> has a corresponding <ManifestResponse> unless intentionally omitted
- acquisitionPointIdentity and acquisitionSignalID values match the SPN document exactly
- All required HLS tags are present
- Tag definitions are syntactically valid
Output manifests
Inspect the generated HLS playlists directly using text inspection tools or automated validation workflows.
Confirm that:
- Tags appear at the expected approximate playback positions, accounting for segment-boundary alignment behavior
- Generated tag values match the MCCN definitions exactly
- Linear and overlay markers appear consistently across all required media playlists
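The last two checks can be automated by comparing the tag lines defined in MCCN against each variant playlist. The sketch below assumes the playlists have already been downloaded as text and that tags must match verbatim; the function name and argument shapes are hypothetical.

```python
def missing_tags(expected_tags, playlists):
    """Report expected tag lines that are absent from each playlist.

    expected_tags: list of tag lines taken from the MCCN document.
    playlists: dict mapping a playlist name to its text content.
    Returns {playlist_name: [missing tag lines]}; empty when all are present.
    """
    report = {}
    for name, text in playlists.items():
        lines = {line.strip() for line in text.splitlines()}
        missing = [tag for tag in expected_tags if tag not in lines]
        if missing:
            report[name] = missing
    return report
```

An empty report indicates that every rendition carries the same markers. Any entry names the playlist and the tag lines it lacks.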
Key Takeaways
ESAM provides a structured way to control ad signaling in AWS VOD workflows by separating event timing from manifest generation behavior.
Within this architecture:
- SPN defines when ad events occur and describes their SCTE-35 semantics
- MCCN defines how those events are represented in the HLS manifest
- AWS Elemental MediaConvert combines media processing with ESAM instructions to generate HLS playlists containing standard and custom ad markers
- AWS Elemental MediaTailor consumes linear ad markers to perform server-side ad insertion
- Client applications can consume custom HLS tags to drive overlays, banners, and additional interactive advertising behavior
A reliable ESAM implementation depends on several core principles:
- Consistent identifier matching between SPN and MCCN
- Accurate nptPoint timing definitions
- Proper alignment between signaling events and HLS segment boundaries
- Clear separation between timing logic and manifest decoration logic
When ESAM documents are treated as validated configuration artifacts rather than static XML files, they become a scalable mechanism for managing both linear and non-linear advertising workflows across large VOD catalogs.
This approach provides precise control over HLS manifest generation while supporting extensible monetization strategies built on standardized signaling workflows.