TrackIt

ESAM Explained: How AWS MediaConvert Uses ESAM to Insert Ad Markers in HLS Playlists

Author: Alexian Kauffmann

Date Published

Modern streaming workloads increasingly rely on server-side ad insertion (SSAI), where ad breaks are inserted into HLS manifests on the server to create a seamless, broadcast‑like viewing experience. In the AWS video stack, a typical VOD flow consists of the following components:

  • AWS Elemental MediaConvert transcodes source media and generates HLS playlists.
  • AWS Elemental MediaTailor reads those playlists, interprets ad markers, communicates with an ad decision server, and generates personalized manifests.
  • A player, such as Video.js, plays the resulting HLS stream as a continuous playback experience.

Accurate ad signaling is central to this workflow. The packager, in this case MediaConvert, must determine:

  • When ad opportunities occur in the program timeline.
  • What those events represent in SCTE‑35 terms.
  • How those events should be expressed as HLS tags for downstream systems.

This is where ESAM (Event Signaling and Management) becomes important. ESAM provides a standardized XML-based mechanism for describing ad events and instructing packagers such as MediaConvert to modify HLS manifests accordingly.

ESAM in Context: ESAM vs. SCTE‑35 vs. HLS Tags

To understand the role of ESAM, it is important to distinguish between the three signaling layers involved in an SSAI workflow.

SCTE-35

SCTE-35 is a binary, in-band signaling standard widely used in broadcast environments to identify splice points, durations, and segmentation events.

It defines signaling semantics such as:

  • Splice command types
  • Segmentation event identifiers
  • UPIDs (Unique Program Identifiers)

HLS Ad Tags

HLS ad tags are textual markers embedded directly into media playlists. Common examples include:

  • #EXT-X-CUE-OUT
  • #EXT-X-CUE-IN
  • #EXT-X-SCTE35

Custom tags may also be used depending on the workflow requirements. These tags are interpreted by SSAI services such as AWS Elemental MediaTailor and, in some cases, directly by video players.

ESAM

ESAM is an out-of-band XML-based control mechanism between a session manager and a signal processor, which in this workflow is AWS Elemental MediaConvert.

ESAM instructs MediaConvert on:

  • The logical ad events, including SCTE-35 semantics and timestamps
  • The HLS manifest modifications that should occur at those events

In AWS VOD workflows, native SCTE-35 packets are typically not embedded directly into the mezzanine asset. Instead, ESAM XML documents are provided to MediaConvert, which then generates the required HLS ad tags for downstream components to interpret.

The Two ESAM Documents Used by MediaConvert

MediaConvert relies on two ESAM XML files:

  • SignalProcessingNotification (SPN): Defines when ad events occur and specifies their SCTE-35 semantics.
  • ManifestConfirmConditionNotification (MCCN): Defines how each event should be represented in the HLS manifest.

A useful way to think about these documents is as a separation between event definition and manifest behavior:

| ESAM document | Responsibility |
| --- | --- |
| SPN | Defines timeline placement and SCTE-35 event semantics (“when” and “what”) |
| MCCN | Defines the HLS tags and manifest modifications associated with each event (“how” the HLS should look) |

Both documents reference events using the same identity fields:

  • acquisitionPointIdentity
  • acquisitionSignalID

MediaConvert applies a manifest modification only when it can successfully match a ManifestResponse in MCCN with a corresponding ResponseSignal in SPN using this identity pair.

1. SignalProcessingNotification (SPN): “When” and “What”

The SPN document defines ad events as <ResponseSignal> elements. Each element represents a single signaling event or ad marker.

Identity Fields

Each event is identified using the following fields:

  • acquisitionPointIdentity: Identifies the overall acquisition context, typically corresponding to an asset or channel.
  • acquisitionSignalID: Uniquely identifies a specific signal within that acquisition context.

Timing Information

Event timing is controlled through:

  • sig:NPTPoint@nptPoint: Specifies the presentation timestamp, in seconds from program start, where the marker should be inserted into the output timeline. 

SCTE-35 Semantics

The SCTE-35 metadata associated with an event is described using:

  • sig:SCTE35PointDescriptor

This descriptor can include fields such as:

  • spliceCommandType: For example, 06 (time_signal), the SCTE-35 command type that carries segmentation descriptors.
  • SegmentationDescriptorInfo attributes such as:
    • segmentEventId
    • segmentTypeId
    • upidType

Within AWS Elemental MediaConvert, the SCTE-35 descriptor mainly serves event classification and downstream compatibility. The fields that actually control marker placement and tagging are:

  • nptPoint, which determines when the marker is triggered
  • acquisitionPointIdentity and acquisitionSignalID, which determine which MCCN entry can attach HLS tags to the event

Example SPN Document

The following example defines one linear ad marker and two overlay ad markers:

<?xml version="1.0" encoding="UTF-8"?>
<SignalProcessingNotification
    xmlns="urn:cablelabs:iptvservices:esam:xsd:signal:1"
    xmlns:sig="urn:cablelabs:md:xsd:signaling:3.0"
    xmlns:common="urn:cablelabs:iptvservices:esam:xsd:common:1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:content="urn:cablelabs:iptvservices:esam:xsd:content:1"
    acquisitionPointIdentity="ExampleService">
  <common:BatchInfo batchId="1">
    <common:Source xsi:type="content:MovieType"/>
  </common:BatchInfo>

  <!-- Linear ad opportunity at 180s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="2"
      signalPointID="180.000"
      action="create">
    <sig:NPTPoint nptPoint="180.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="201"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>

  <!-- Overlay ad at 360s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="3"
      signalPointID="360.000"
      action="create">
    <sig:NPTPoint nptPoint="360.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="301"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>

  <!-- Overlay ad at 540s -->
  <ResponseSignal
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="4"
      signalPointID="540.000"
      action="create">
    <sig:NPTPoint nptPoint="540.000"/>
    <sig:SCTE35PointDescriptor spliceCommandType="06">
      <sig:SegmentationDescriptorInfo
          segmentEventId="302"
          segmentTypeId="52"
          upidType="0"/>
    </sig:SCTE35PointDescriptor>
  </ResponseSignal>
</SignalProcessingNotification>

Key points:

  • Each <ResponseSignal> represents a single ad marker.
  • nptPoint uses program time in seconds. Adjusting this value shifts the marker position in the output timeline.
  • acquisitionPointIdentity and acquisitionSignalID must uniquely identify each event and must match the corresponding MCCN entry.
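
To make the identity and timing fields concrete, the following Python sketch reads an SPN document and builds a lookup from the identity pair to nptPoint. It is an illustration only, not a description of MediaConvert internals: the function name parse_spn_events and the shortened sample document are hypothetical, and the namespace URIs are copied from the SPN example above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the SPN example in this article.
NS = {
    "esam": "urn:cablelabs:iptvservices:esam:xsd:signal:1",
    "sig": "urn:cablelabs:md:xsd:signaling:3.0",
}


def parse_spn_events(spn_xml: str) -> dict:
    """Map (acquisitionPointIdentity, acquisitionSignalID) to nptPoint in seconds."""
    root = ET.fromstring(spn_xml)
    events = {}
    for signal in root.findall("esam:ResponseSignal", NS):
        key = (
            signal.get("acquisitionPointIdentity"),
            signal.get("acquisitionSignalID"),
        )
        npt = signal.find("sig:NPTPoint", NS)
        if npt is None:
            raise ValueError(f"ResponseSignal {key} has no sig:NPTPoint")
        if key in events:
            raise ValueError(f"duplicate event identity {key}")
        events[key] = float(npt.get("nptPoint"))
    return events


# Shortened, hypothetical sample: two events, descriptors omitted for brevity.
SAMPLE_SPN = """<SignalProcessingNotification
    xmlns="urn:cablelabs:iptvservices:esam:xsd:signal:1"
    xmlns:sig="urn:cablelabs:md:xsd:signaling:3.0"
    acquisitionPointIdentity="ExampleService">
  <ResponseSignal acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="2" signalPointID="180.000" action="create">
    <sig:NPTPoint nptPoint="180.000"/>
  </ResponseSignal>
  <ResponseSignal acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="3" signalPointID="360.000" action="create">
    <sig:NPTPoint nptPoint="360.000"/>
  </ResponseSignal>
</SignalProcessingNotification>"""

events = parse_spn_events(SAMPLE_SPN)
```

Rejecting duplicate identity pairs at parse time is a design choice of this sketch; it surfaces ID collisions before a job is submitted.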

2. ManifestConfirmConditionNotification (MCCN): “How” the Manifest Is Tagged

The MCCN document defines how AWS Elemental MediaConvert should modify the HLS manifest for each event defined in the SPN document.

Each <ns2:ManifestResponse> element performs two functions:

  • Identifies the corresponding SPN event
  • Defines the HLS tags that should be inserted into the manifest

Event Identification

Each manifest response references an SPN event using:

  • acquisitionPointIdentity
  • acquisitionSignalID

These values must match the corresponding <ResponseSignal> entry in the SPN document.

Manifest Modification Instructions

The manifest behavior is described using the following elements:

  • ns2:SegmentModify: Defines which segment modifications should be applied.
  • ns2:FirstSegment: Specifies that the modification applies to the first segment associated with the event.
  • ns2:Tag value="...": Defines the literal HLS tag line that MediaConvert inserts into the playlist.

Unlike SPN, MCCN does not control event timing. Its role is limited to defining how existing events should be represented in the output manifest.

Example MCCN Document

The following example:

  • Inserts linear ad markers using #EXT-X-CUE-OUT:0 and #EXT-X-CUE-IN
  • Inserts overlay opportunities using a custom #EXT-X-OVERLAY-AD tag

<?xml version="1.0" encoding="UTF-8"?>
<ns2:ManifestConfirmConditionNotification
    xmlns:ns2="http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2"
    xmlns="http://www.cablelabs.com/namespaces/metadata/xsd/core/2"
    xmlns:ns3="http://www.cablelabs.com/namespaces/metadata/xsd/signaling/2">

  <!-- Linear ad: use #EXT-X-CUE-OUT:0 for VOD with MediaTailor -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="2">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag value="#EXT-X-CUE-OUT:0"/>
        <ns2:Tag value="#EXT-X-CUE-IN"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>

  <!-- Overlay ad at 360s -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="3">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag
            value="#EXT-X-OVERLAY-AD:ID=&quot;1&quot;,DURATION=5.0"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>

  <!-- Overlay ad at 540s -->
  <ns2:ManifestResponse
      acquisitionPointIdentity="ExampleService"
      acquisitionSignalID="4">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag
            value="#EXT-X-OVERLAY-AD:ID=&quot;2&quot;,DURATION=5.0"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>
</ns2:ManifestConfirmConditionNotification>

Key points:

  • Every <ns2:ManifestResponse> should correspond to a matching <ResponseSignal> in the SPN document.
  • The value attribute contains the exact HLS tag line inserted into the manifest.
  • If a tag is not explicitly defined in MCCN, MediaConvert does not emit it into the output playlist.
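
The relationship between a manifest response and its literal tag lines can be sketched in Python as follows. The function name parse_mccn_tags and the shortened sample are hypothetical; the namespace URI is copied from the MCCN example above. Note that the XML parser decodes &quot; entities, so the returned strings are the tag lines as they would appear in a playlist.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the MCCN example in this article.
NS = {"ns2": "http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2"}


def parse_mccn_tags(mccn_xml: str) -> dict:
    """Map (acquisitionPointIdentity, acquisitionSignalID) to a list of HLS tag lines."""
    root = ET.fromstring(mccn_xml)
    tags_by_event = {}
    for response in root.findall("ns2:ManifestResponse", NS):
        key = (
            response.get("acquisitionPointIdentity"),
            response.get("acquisitionSignalID"),
        )
        tag_path = "ns2:SegmentModify/ns2:FirstSegment/ns2:Tag"
        tags_by_event[key] = [t.get("value") for t in response.findall(tag_path, NS)]
    return tags_by_event


# Shortened, hypothetical sample with one linear and one overlay event.
SAMPLE_MCCN = """<ns2:ManifestConfirmConditionNotification
    xmlns:ns2="http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2">
  <ns2:ManifestResponse acquisitionPointIdentity="ExampleService" acquisitionSignalID="2">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag value="#EXT-X-CUE-OUT:0"/>
        <ns2:Tag value="#EXT-X-CUE-IN"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>
  <ns2:ManifestResponse acquisitionPointIdentity="ExampleService" acquisitionSignalID="3">
    <ns2:SegmentModify>
      <ns2:FirstSegment>
        <ns2:Tag value="#EXT-X-OVERLAY-AD:ID=&quot;1&quot;,DURATION=5.0"/>
      </ns2:FirstSegment>
    </ns2:SegmentModify>
  </ns2:ManifestResponse>
</ns2:ManifestConfirmConditionNotification>"""

tags = parse_mccn_tags(SAMPLE_MCCN)
```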

How MediaConvert Processes ESAM to Generate HLS Markers

At a high level, MediaConvert processes ESAM documents in four stages to generate HLS manifests with ad signaling markers.

1. Parse the SPN Document

MediaConvert first parses the SPN document and builds an internal event list.

For each <ResponseSignal> element, MediaConvert records:

  • The event identity:
    • acquisitionPointIdentity
    • acquisitionSignalID
  • The event timing:
    • sig:NPTPoint@nptPoint
  • The SCTE-35 metadata:
    • sig:SCTE35PointDescriptor

Each event is internally keyed using the combination of:

  1. acquisitionPointIdentity
  2. acquisitionSignalID

2. Parse the MCCN Document

MediaConvert then parses the MCCN document and builds the manifest modification rules associated with each event.

For every <ns2:ManifestResponse> element, MediaConvert:

  • Searches for a matching SPN event using:
    • acquisitionPointIdentity
    • acquisitionSignalID
  • Ignores the manifest response if no matching SPN event exists
  • Associates the defined HLS tags with the matched event

3. Align Events with HLS Segments

During transcoding and packaging, MediaConvert generates the HLS ABR ladder and segments the media.

For each ESAM event, MediaConvert:

  • Identifies the first HLS segment whose start time occurs at or after the specified nptPoint
  • Applies the SegmentModify, FirstSegment, and Tag operations to that segment

Because HLS signaling is segment-based, markers are applied at segment boundaries rather than arbitrary frame positions.
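
The alignment rule described above can be approximated with a small calculation. The sketch below assumes a simplified model that is not taken from MediaConvert itself: all segments have the same duration and the first segment starts at 0. The function name is hypothetical. Times are converted to integer milliseconds so that floating-point error does not push an exactly aligned nptPoint into the following segment.

```python
def first_segment_at_or_after(npt_point: float, segment_duration: float):
    """Return (segment_index, segment_start_seconds) for the first segment
    whose start time is at or after npt_point.

    Simplified model: uniform segment durations, first segment starts at 0.
    """
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    if npt_point < 0:
        raise ValueError("npt_point must not be negative")
    npt_ms = round(npt_point * 1000)
    seg_ms = round(segment_duration * 1000)
    index = -(-npt_ms // seg_ms)  # ceiling division on integers
    return index, index * seg_ms / 1000


aligned = first_segment_at_or_after(180.0, 6.0)    # falls exactly on a boundary
shifted = first_segment_at_or_after(182.5, 6.0)    # falls inside a segment
```

Under this model, an nptPoint of 182.5 with 6-second segments lands on the segment that starts at 186 seconds, a shift of 3.5 seconds.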

4. Generate the Output HLS Manifests

The resulting HLS playlists contain the tags defined in MCCN, such as:

  • #EXT-X-CUE-OUT:0
  • #EXT-X-CUE-IN
  • #EXT-X-OVERLAY-AD:ID="1",DURATION=5.0

These markers can then be interpreted downstream by:

  1. SSAI services such as AWS Elemental MediaTailor
  2. Client-side playback applications and ad logic

Important Behavioral Constraints

Strict ID Matching

Both acquisitionPointIdentity and acquisitionSignalID must match between SPN and MCCN.

Any mismatch, including typographical errors or missing entries, prevents MediaConvert from applying the associated manifest modification.
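
A pre-submission check for this constraint can be as simple as a set comparison. The following Python sketch assumes the identity pairs have already been extracted from both documents; the function name diff_event_ids and the sample values are hypothetical.

```python
def diff_event_ids(spn_ids, mccn_ids):
    """Report identity pairs that appear in only one of the two ESAM documents."""
    spn, mccn = set(spn_ids), set(mccn_ids)
    return {
        "missing_in_mccn": sorted(spn - mccn),    # SPN events with no manifest response
        "unmatched_in_mccn": sorted(mccn - spn),  # manifest responses with no SPN event
    }


spn_ids = [("ExampleService", "2"), ("ExampleService", "3")]
# The second pair contains a deliberate typo in acquisitionPointIdentity.
mccn_ids = [("ExampleService", "2"), ("ExampleServce", "3")]

report = diff_event_ids(spn_ids, mccn_ids)
```

In this example a single typo shows up twice: once as an SPN event without tags and once as a manifest response that matches nothing.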

Timing Is Controlled Only by SPN

Event timing is defined exclusively through:

  • sig:NPTPoint@nptPoint

Changing MCCN does not affect marker placement.

To reposition a marker, the nptPoint value in SPN must be modified.

Tag Content Is Controlled Only by MCCN

The HLS tags inserted into the manifest are defined entirely in MCCN.

For example, changing:

#EXT-X-CUE-OUT:0

to another tag format requires editing MCCN rather than SPN.

Segment Boundary Alignment

Markers are inserted at HLS segment boundaries.

If an nptPoint falls in the middle of a segment, the corresponding tags are typically applied to the next segment header.

Designing Reliable ESAM Configurations for Linear Ad Breaks

Linear ad breaks, including pre-roll, mid-roll, and post-roll placements, are commonly handled by AWS Elemental MediaTailor using HLS ad markers generated by AWS Elemental MediaConvert.

ESAM provides a structured mechanism for defining those markers consistently across VOD assets.

Choosing NPT Times and Segments

Each ad opportunity should be represented by a dedicated <ResponseSignal> element in the SPN document.

When defining event timing:

  • nptPoint should reference the intended position in the output timeline rather than the source asset timeline
  • Segment duration and GOP structure should be considered during packaging design

For precise marker placement:

  • GOP and segment durations should be aligned so that ad markers occur close to segment boundaries
  • Segment durations in the 4 to 6 second range with aligned keyframes are commonly used

Otherwise, markers may shift forward to the next available segment boundary during packaging.

HLS VOD Signaling with #EXT-X-CUE-OUT:0

For HLS VOD workflows using MediaTailor, a common approach is to emit:

#EXT-X-CUE-OUT:0

The 0 value indicates that the manifest does not explicitly define the replacement duration for the ad break.

Instead, MediaTailor determines the effective ad duration dynamically during manifest generation and ad stitching.

Using a non-zero CUE-OUT value in VOD workflows can lead to unintended playback behavior, including the removal or skipping of portions of the primary content timeline.

To indicate the conclusion of the ad opportunity, the following marker is typically emitted:

#EXT-X-CUE-IN

For this reason, the MCCN example shown earlier emits both tags for the same event.

Recommended Minimal ESAM Structure for VOD SSAI

SPN Responsibilities

The SPN document should:

  • Define each ad opportunity using precise nptPoint values
  • Include valid SCTE-35 segmentation descriptors for downstream compatibility

MCCN Responsibilities

The MCCN document should:

  • Emit #EXT-X-CUE-OUT:0
  • Emit #EXT-X-CUE-IN
  • Associate those tags with the corresponding SPN event identifiers
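
One way to keep the two documents consistent is to generate both from a single list of break times, so that each identity pair is written once and reused. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the XML structure and the descriptor values (spliceCommandType 06, segmentTypeId 52, upidType 0) are copied from the examples earlier in this article, and using the signal ID as segmentEventId is an arbitrary choice made here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

SPN_NS = "urn:cablelabs:iptvservices:esam:xsd:signal:1"
MCCN_NS = "http://www.cablelabs.com/namespaces/metadata/xsd/confirmation/2"


def build_esam_pair(break_times, acquisition_point="ExampleService"):
    """Return (spn_xml, mccn_xml) for linear breaks at the given nptPoint values."""
    point = quoteattr(acquisition_point)  # escaped attribute value, quotes included
    signals, responses = [], []
    for signal_id, npt in enumerate(break_times, start=1):
        npt_text = f"{npt:.3f}"
        signals.append(
            f'  <ResponseSignal acquisitionPointIdentity={point}'
            f' acquisitionSignalID="{signal_id}" signalPointID="{npt_text}" action="create">\n'
            f'    <sig:NPTPoint nptPoint="{npt_text}"/>\n'
            f'    <sig:SCTE35PointDescriptor spliceCommandType="06">\n'
            f'      <sig:SegmentationDescriptorInfo segmentEventId="{signal_id}"'
            f' segmentTypeId="52" upidType="0"/>\n'
            f'    </sig:SCTE35PointDescriptor>\n'
            f'  </ResponseSignal>\n'
        )
        responses.append(
            f'  <ns2:ManifestResponse acquisitionPointIdentity={point}'
            f' acquisitionSignalID="{signal_id}">\n'
            '    <ns2:SegmentModify>\n'
            '      <ns2:FirstSegment>\n'
            '        <ns2:Tag value="#EXT-X-CUE-OUT:0"/>\n'
            '        <ns2:Tag value="#EXT-X-CUE-IN"/>\n'
            '      </ns2:FirstSegment>\n'
            '    </ns2:SegmentModify>\n'
            '  </ns2:ManifestResponse>\n'
        )
    spn = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<SignalProcessingNotification xmlns="{SPN_NS}"\n'
        '    xmlns:sig="urn:cablelabs:md:xsd:signaling:3.0"\n'
        '    xmlns:common="urn:cablelabs:iptvservices:esam:xsd:common:1"\n'
        '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        '    xmlns:content="urn:cablelabs:iptvservices:esam:xsd:content:1"\n'
        f'    acquisitionPointIdentity={point}>\n'
        '  <common:BatchInfo batchId="1">\n'
        '    <common:Source xsi:type="content:MovieType"/>\n'
        '  </common:BatchInfo>\n'
        + "".join(signals)
        + '</SignalProcessingNotification>\n'
    )
    mccn = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<ns2:ManifestConfirmConditionNotification xmlns:ns2="{MCCN_NS}"\n'
        '    xmlns="http://www.cablelabs.com/namespaces/metadata/xsd/core/2"\n'
        '    xmlns:ns3="http://www.cablelabs.com/namespaces/metadata/xsd/signaling/2">\n'
        + "".join(responses)
        + '</ns2:ManifestConfirmConditionNotification>\n'
    )
    return spn, mccn


def event_ids(xml_text, namespace, element):
    """List the identity pairs of all `element` children, in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (e.get("acquisitionPointIdentity"), e.get("acquisitionSignalID"))
        for e in root.findall(f"{{{namespace}}}{element}")
    ]


spn_xml, mccn_xml = build_esam_pair([180.0, 1380.0])
spn_ids = event_ids(spn_xml, SPN_NS, "ResponseSignal")
mccn_ids = event_ids(mccn_xml, MCCN_NS, "ManifestResponse")
```

Parsing the generated strings back, as event_ids does, doubles as a well-formedness check before the documents are handed to a job.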

Overlay and Banner Ads with Custom HLS Tags

ESAM is not limited to standard linear ad markers. Custom HLS tags can also be inserted into manifests to support client-driven experiences such as banner ads, overlays, or picture-in-picture promotions. 

A common approach uses a custom tag format such as:

  • #EXT-X-OVERLAY-AD:ID="1",DURATION=5.0

In this example:

  • ID defines a stable identifier associated with the overlay opportunity
  • DURATION specifies how long the overlay should remain visible, in seconds.

These tags are inserted into the manifest by AWS Elemental MediaConvert through MCCN definitions, in the same way that linear ad markers are inserted.

The workflow typically follows this structure:

  • SPN defines overlay events at particular nptPoint values.
  • MCCN maps each event to an #EXT-X-OVERLAY-AD tag with matching IDs.
  • The output HLS manifest contains the custom overlay markers at the appropriate segment boundaries.

Unlike standard SSAI markers, custom overlay tags have no intrinsic playback behavior. Additional client-side logic is required to consume and act on them.

A player application or associated client component typically performs the following actions:

  • Parses the manifest
  • Detects #EXT-X-OVERLAY-AD lines
  • Triggers overlay rendering at the appropriate playback position
  • Removes the overlay after the specified duration
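
The parsing and detection steps can be sketched as follows. The example is written in Python for consistency with the other sketches in this article; a real player integration would more likely be written in JavaScript. It assumes that the custom tag reaches the client unchanged and that it precedes the #EXTINF line of the segment it applies to. The function name and the sample playlist are hypothetical.

```python
import re

OVERLAY_PREFIX = "#EXT-X-OVERLAY-AD:"
# Matches KEY=value or KEY="quoted value" pairs in an attribute list.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def find_overlay_cues(playlist_text: str) -> list:
    """Return overlay cues with their start position in seconds.

    The position is the sum of the #EXTINF durations seen before the tag,
    which is the start time of the segment the tag is attached to.
    """
    cues = []
    position = 0.0
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        if line.startswith("#EXTINF:"):
            position += float(line[len("#EXTINF:"):].split(",")[0])
        elif line.startswith(OVERLAY_PREFIX):
            attr_text = line[len(OVERLAY_PREFIX):]
            attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}
            cues.append(
                {"start": position, "id": attrs["ID"], "duration": float(attrs["DURATION"])}
            )
    return cues


SAMPLE_PLAYLIST = "\n".join([
    "#EXTM3U",
    "#EXT-X-VERSION:3",
    "#EXT-X-TARGETDURATION:6",
    "#EXTINF:6.0,",
    "seg_00001.ts",
    "#EXTINF:6.0,",
    "seg_00002.ts",
    '#EXT-X-OVERLAY-AD:ID="1",DURATION=5.0',
    "#EXTINF:6.0,",
    "seg_00003.ts",
    "#EXT-X-ENDLIST",
])

cues = find_overlay_cues(SAMPLE_PLAYLIST)
```

A player would then show the overlay when playback reaches start and hide it once start plus duration has passed.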

The following section describes an architecture that combines both linear ad signaling and custom overlay signaling within the same playback workflow.

End-to-End Workflow: From File Ingest to Tagged HLS

A typical VOD workflow using ESAM with AWS Elemental MediaConvert generally consists of four stages: ingest and metadata extraction, transcoding and packaging, server-side ad insertion, and client playback.

Ingest and Metadata Processing

The workflow typically begins with source asset ingestion into Amazon S3.

A common implementation pattern includes:

  • Source mezzanine assets uploaded to an S3 bucket acting as a watch folder
  • An AWS Lambda function triggered on object creation events
  • Media analysis performed using tools such as MediaInfo to determine:
    • Asset duration
    • Structural metadata
    • Stream characteristics

Based on predefined business rules, the Lambda function generates the ESAM documents:

  • SPN for event timing and SCTE-35 semantics
  • MCCN for HLS manifest modifications

Typical rules may include:

  • Mid-roll insertion every 20 minutes
  • Pre-roll insertion
  • Post-roll insertion
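
Rules of this kind reduce to a short calculation over the asset duration. The Python sketch below computes candidate nptPoint values only; the function name and parameters are hypothetical, and whether a marker at 0 or at the very end of the asset behaves as intended in a given MediaConvert and MediaTailor setup is an assumption that needs to be validated separately.

```python
def plan_break_times(duration_s, midroll_interval_s=1200.0, preroll=True, postroll=True):
    """Return candidate ad break times in seconds for a VOD asset.

    Mid-rolls are placed at multiples of midroll_interval_s that fall
    strictly inside the asset; pre-roll is 0 and post-roll is the duration.
    """
    if duration_s <= 0 or midroll_interval_s <= 0:
        raise ValueError("duration_s and midroll_interval_s must be positive")
    times = [0.0] if preroll else []
    k = 1
    while k * midroll_interval_s < duration_s:
        times.append(float(k * midroll_interval_s))
        k += 1
    if postroll:
        times.append(float(duration_s))
    return times


# A 50-minute asset with the default 20-minute mid-roll interval.
schedule = plan_break_times(3000.0)
```

The resulting list can feed whatever component writes the SPN document, with one ResponseSignal per entry.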

Transcoding and Packaging

After generating the ESAM documents, the workflow submits a MediaConvert job for the asset.

The job configuration commonly includes:

  • A mezzanine asset stored in S3 as the source input
  • An HLS output group
  • SPN and MCCN XML documents supplied through the job's ESAM settings

During processing, MediaConvert:

  • Transcodes the source asset
  • Generates the HLS ABR ladder
  • Segments the media
  • Inserts HLS tags according to the ESAM instructions

The resulting manifests and media segments are typically written to an S3 bucket serving as the content origin.
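
The ESAM portion of such a job might be assembled as shown below. This is a fragment, not a complete job: inputs, output groups, and the submission call are omitted. The key names (Esam, SignalProcessingNotification.SccXml, ManifestConfirmConditionNotification.MccXml, ResponseSignalPreroll) are given as they are understood to appear in the MediaConvert job settings schema and should be verified against the current API reference before use. The helper name esam_job_settings is hypothetical.

```python
def esam_job_settings(spn_xml: str, mccn_xml: str, preroll_seconds: int = 0) -> dict:
    """Build the ESAM fragment to merge into a MediaConvert job's Settings.

    Key names are assumptions based on the MediaConvert job settings schema;
    verify them against the API reference.
    """
    if not spn_xml.strip() or not mccn_xml.strip():
        raise ValueError("both ESAM documents are required")
    return {
        "Esam": {
            "SignalProcessingNotification": {"SccXml": spn_xml},
            "ManifestConfirmConditionNotification": {"MccXml": mccn_xml},
            "ResponseSignalPreroll": preroll_seconds,
        }
    }


# Placeholder XML strings stand in for the real SPN and MCCN documents.
settings = esam_job_settings("<SignalProcessingNotification/>",
                             "<ManifestConfirmConditionNotification/>")
```

In this sketch the XML travels inside the job settings as strings, so the documents generated at ingest time do not need to be written to S3 unless they are also kept for auditing.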

Server-Side Ad Insertion

AWS Elemental MediaTailor then consumes the generated HLS manifests.

MediaTailor:

  • Reads the HLS master and media playlists from S3
  • Interprets ad signaling markers such as:
    • #EXT-X-CUE-OUT:0
    • #EXT-X-CUE-IN
  • Uses SCTE-35 metadata when available
  • Communicates with an ad decision server
  • Generates personalized manifests containing stitched ad segments

Client Playback and Overlay Rendering

On the client side, a playback application such as Video.js consumes the MediaTailor-generated manifest.

Within this playback workflow:

  • Linear advertisements appear as stitched media segments
  • Ad metadata tracks can be used for playback enforcement and analytics
  • Overlay advertisements are triggered using custom tags such as:
    • #EXT-X-OVERLAY-AD

Overlay rendering logic is commonly implemented using client-side advertising SDKs such as Google Interactive Media Ads.

The result is a unified playback workflow in which both linear and overlay advertisements are synchronized through the same ESAM-defined event model.

Common Pitfalls and How to Avoid Them

Several implementation issues commonly appear when working with ESAM and AWS Elemental MediaConvert workflows. Most can be traced back to a small set of configuration and synchronization problems.

1. Missing or Mismatched Event Identifiers

Symptom: Expected HLS tags do not appear in the output manifest.

Cause: acquisitionPointIdentity or acquisitionSignalID values differ between SPN and MCCN due to typographical errors, copy-paste inconsistencies, or incomplete mappings.

Fix:

  • Ensure every SPN <ResponseSignal> has a corresponding MCCN <ns2:ManifestResponse>.
  • Keep IDs in a single source of truth (code or templates) to avoid manual mismatches.

2. Assuming MCCN Controls Timing

Symptom: Tags appear at unexpected times, or moving tags in MCCN does not affect marker placement.

Cause: MCCN does not define timing information and does not contain nptPoint values.

Fix:

  • Modify event timing exclusively through sig:NPTPoint@nptPoint in SPN.
  • Treat MCCN as pure decoration of events already defined in SPN.

3. Segment Boundary Misalignment

Symptom: Markers appear one segment later than expected.

Cause: The specified nptPoint occurs within a segment rather than near its boundary. MediaConvert applies tags at segment boundaries and therefore shifts the marker to the next eligible segment.

Fix:

  • Align planned ad positions as closely as possible with segment boundaries
  • Adjust GOP structure and segment duration to support predictable ad insertion behavior

4. Incomplete Tag Sets

Symptom: The manifest contains #EXT-X-CUE-OUT without #EXT-X-CUE-IN, or vice versa.

Cause: One of the required tags is omitted from the MCCN definition.

Fix:

  • For each linear ad event, emit both #EXT-X-CUE-OUT:0 and #EXT-X-CUE-IN in MCCN unless a specific workflow requirement dictates otherwise.
  • Automated validation can also help detect incomplete signaling sequences before deployment by scanning generated manifests for expected tag patterns.
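
A scan of this kind can be sketched in a few lines of Python. The check below walks a generated media playlist and reports any #EXT-X-CUE-OUT that is not closed by a #EXT-X-CUE-IN, and any #EXT-X-CUE-IN without an open break. The function name and the sample playlists are hypothetical.

```python
def check_cue_pairing(playlist_text: str) -> list:
    """Return a list of human-readable problems; an empty list means the
    CUE-OUT / CUE-IN sequence is balanced."""
    problems = []
    open_break_line = None  # line number of the CUE-OUT awaiting its CUE-IN
    for number, raw_line in enumerate(playlist_text.splitlines(), start=1):
        line = raw_line.strip()
        # Match the bare tag or the tag with a value, but not #EXT-X-CUE-OUT-CONT.
        if line == "#EXT-X-CUE-OUT" or line.startswith("#EXT-X-CUE-OUT:"):
            if open_break_line is not None:
                problems.append(
                    f"line {number}: CUE-OUT while break from line {open_break_line} is open"
                )
            open_break_line = number
        elif line.startswith("#EXT-X-CUE-IN"):
            if open_break_line is None:
                problems.append(f"line {number}: CUE-IN without a preceding CUE-OUT")
            open_break_line = None
    if open_break_line is not None:
        problems.append(f"line {open_break_line}: CUE-OUT never closed by CUE-IN")
    return problems


good = "\n".join(["#EXTM3U", "#EXT-X-CUE-OUT:0", "#EXT-X-CUE-IN", "#EXTINF:6.0,", "seg1.ts"])
bad = "\n".join(["#EXTM3U", "#EXT-X-CUE-OUT:0", "#EXTINF:6.0,", "seg1.ts"])

good_problems = check_cue_pairing(good)
bad_problems = check_cue_pairing(bad)
```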


5. Incorrect CUE-OUT Values in VOD

Symptom: Playback skips sections of primary content or exhibits unexpected ad transition behavior.

Cause: The MCCN definition emits #EXT-X-CUE-OUT:<non-zero> for VOD instead of #EXT-X-CUE-OUT:0.

Fix:

  • For VOD with MediaTailor, use #EXT-X-CUE-OUT:0 and allow MediaTailor to determine effective ad break duration dynamically
  • Reserve non-zero CUE-OUT durations for specialized workflows that have been explicitly validated against the target SSAI behavior

Validation Checklist

Before deploying ESAM-driven workflows into production, the generated signaling and manifest outputs should be validated for each asset.

SPN

  • Every intended ad opportunity is represented by a corresponding <ResponseSignal>
  • Each event contains the correct nptPoint value
  • acquisitionSignalID values are unique within the associated acquisitionPointIdentity

MCCN

  • Each <ResponseSignal> has a corresponding <ns2:ManifestResponse> unless intentionally omitted
  • acquisitionPointIdentity and acquisitionSignalID values match the SPN document exactly
  • All required HLS tags are present
  • Tag definitions are syntactically valid

Output manifests

Inspect the generated HLS playlists directly using text inspection tools or automated validation workflows.

Confirm that:

  • Tags appear at the expected approximate playback positions, accounting for segment-boundary alignment behavior
  • Generated tag values match the MCCN definitions exactly
  • Linear and overlay markers appear consistently across all required media playlists
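
The cross-playlist check can be automated by comparing the marker lines, together with the segment index they precede, across all renditions. The Python sketch below assumes the playlists have already been downloaded as text; the function names, rendition names, and sample playlists are hypothetical.

```python
MARKER_PREFIXES = ("#EXT-X-CUE-OUT", "#EXT-X-CUE-IN", "#EXT-X-OVERLAY-AD")


def markers_by_segment(playlist_text: str) -> list:
    """Return (segment_index, tag_line) pairs, where segment_index is the
    number of media URIs that appear before the tag."""
    markers, segment_index = [], 0
    for raw_line in playlist_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(MARKER_PREFIXES):
            markers.append((segment_index, line))
        elif not line.startswith("#"):
            segment_index += 1  # a non-comment line is a segment URI
    return markers


def inconsistent_playlists(playlists: dict) -> list:
    """Return the names of playlists whose markers differ from the first one."""
    names = list(playlists)
    if not names:
        return []
    reference = markers_by_segment(playlists[names[0]])
    return [name for name in names[1:] if markers_by_segment(playlists[name]) != reference]


tagged = "#EXTM3U\n#EXTINF:6.0,\na1.ts\n#EXT-X-CUE-OUT:0\n#EXT-X-CUE-IN\n#EXTINF:6.0,\na2.ts\n"
untagged = "#EXTM3U\n#EXTINF:6.0,\nb1.ts\n#EXTINF:6.0,\nb2.ts\n"

mismatched = inconsistent_playlists({"1080p": tagged, "720p": tagged, "360p": untagged})
```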

Key Takeaways

ESAM provides a structured way to control ad signaling in AWS VOD workflows by separating event timing from manifest generation behavior.

Within this architecture:

  • SPN defines when ad events occur and describes their SCTE-35 semantics
  • MCCN defines how those events are represented in the HLS manifest
  • AWS Elemental MediaConvert combines media processing with ESAM instructions to generate HLS playlists containing standard and custom ad markers
  • AWS Elemental MediaTailor consumes linear ad markers to perform server-side ad insertion
  • Client applications can consume custom HLS tags to drive overlays, banners, and additional interactive advertising behavior

A reliable ESAM implementation depends on several core principles:

  • Consistent identifier matching between SPN and MCCN
  • Accurate nptPoint timing definitions
  • Proper alignment between signaling events and HLS segment boundaries
  • Clear separation between timing logic and manifest decoration logic

When ESAM documents are treated as validated configuration artifacts rather than static XML files, they become a scalable mechanism for managing both linear and non-linear advertising workflows across large VOD catalogs.

This approach provides precise control over HLS manifest generation while supporting extensible monetization strategies built on standardized signaling workflows.