VAST From the Inside: How the XML Template Works and Why It Is Important For Accurate Tracking

May 12, 2026
The online video advertising market continues to grow, and with it the requirements for the delivery of commercials are becoming more complex. Advertisers need transparent data, publishers need stable monetization, and developers need a clear technical standard that works equally across the web, mobile apps, CTV, and OTT environments.

The VAST standard, the Video Ad Serving Template, has become the foundation on which modern video advertising systems are built. It covers creative delivery, event tracking, analytics, support for different devices, and integration between players and advertising servers.

In this article, we'll look at how VAST XML works, which key tags are included in its structure, and why correct markup is crucial for accurate tracking of impressions, clicks, and views.
How The Player and The Server "Talk" to Each Other
In the context of video advertising, VAST acts as a universal language between a media player and an advertising platform. The player can work in a browser, mobile app, Smart TV app, or OTT service. The advertising side can be represented by an ad server, DSP, SSP, or other adtech system.

When the moment comes to display an ad — for example, before the start of a video, in the middle of viewing, or after completing the content — the player sends a request to the advertising platform. The request may include device parameters, type of playback environment, screen size, available formats, content information, and other technical specifications.

In response, the advertising server returns a VAST tag, an XML document with instructions for the player: which video to download, which events to track, where to send tracking requests, and what to do when the user clicks.

Key elements of the exchange:
1. Request, or ad call. The player informs the advertising platform that it is ready to play the commercial.
2. Response, or VAST response. The server returns an XML file with the description of the ad, links to media files, tracking pixels, and click settings.
3. Tracking. During playback, the player calls the URLs specified in VAST, capturing events: impression, start, firstQuartile, midpoint, complete, click, skip, and others.

As a result, the player and the advertising platform understand each other through a single shared standard.
What It Looks Like During Playback?
A typical scenario can be described as follows:
  1. The player downloads the main video content, for example via CDN.
  2. If pre-roll is provided, the advertisement is played before the main video.
  3. If an ad should appear in the middle of the content, the player reaches the ad marker and initiates an advertising request.
  4. The advertising server returns VAST XML or another related structure, such as VMAP for managing commercial breaks.
  5. The player reads VAST, selects the appropriate media file, and launches the commercial.
  6. During playback, the player sends tracking requests to the specified URLs.
  7. After the ad ends, the player returns to the main content.

On streaming platforms and in CTV environments, players often additionally use SDKs to verify ads, measure viewability, control errors, and protect against fraud. Such SDKs can be proprietary or provided by third-party technology partners.

For Eastern European markets, this is especially important when working with international advertisers who need comparable metrics, brand safety, correct measurement of views, and compliance with privacy requirements.
Messaging Scheme
From the point of view of interaction between the participants, the process can be reduced to a simple scheme.

Player → Ad Server.
The player sends an advertising request. It can contain the User-Agent, IP address, device parameters, content data, screen size, type of playback environment, and other technical information.

Ad Server → Player.
The advertising server analyzes the incoming parameters, selects a suitable video creative, and generates VAST XML. The player receives the XML and starts processing it.

Player → Tracking URLs
While playing the commercial, the player calls the tracking URLs: start, firstQuartile, midpoint, thirdQuartile, complete, and others. This is necessary for analytics and for confirming that the impression actually took place.

Player → Landing Page.
If the user clicks on the video, the player opens the advertiser's page via <ClickThrough> and simultaneously sends a <ClickTracking> signal.

Thus, the VAST tag becomes a technical map for the player: where to get the video, which events to capture, which URLs to trigger, and how to handle user interaction.
The Minimum Required VAST Structure
In order for the player to show a simple video clip, VAST must have a basic set of elements.

An example of a minimal VAST:
<VAST version="3.0">
  <Ad id="SomeAdID">
    <InLine>
      <AdSystem>Ad_Server_Name</AdSystem>
      <AdTitle>Video Ad Title</AdTitle>
      <Impression><![CDATA[https://tracking.example.com/impression]]></Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:30</Duration>
            <MediaFiles>
              <MediaFile
                delivery="progressive"
                type="video/mp4"
                width="640"
                height="360"
              >
                <![CDATA[https://cdn.example.com/video.mp4]]>
              </MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>
Basic elements:
  • <VAST version="3.0"> is the root element that indicates the version of the specification.
  • <Ad> is a container for describing a single advertising unit.
  • <InLine> is a full-fledged ad with media files, tracking, and click settings.
  • <AdSystem> is the system that generated the ad.
  • <AdTitle> is the name of the ad.
  • <Impression> is the URL that the player calls to record the impression.
  • <Creatives> is a block with creatives.
  • <Linear> is a linear video creative that is played before, during, or after the main content.
  • <Duration> is the duration of the video clip.
  • <MediaFile> is a link to the video file and its technical parameters.

This minimal block is already enough to display a video clip and record the impression.
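To show how a player might read this minimal structure, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. The function name `summarize_inline_ad` and the shape of the returned dictionary are assumptions made for illustration; real players implement this step in their own way.

```python
import xml.etree.ElementTree as ET

# The minimal VAST 3.0 document from the example above.
MINIMAL_VAST = """<VAST version="3.0">
  <Ad id="SomeAdID">
    <InLine>
      <AdSystem>Ad_Server_Name</AdSystem>
      <AdTitle>Video Ad Title</AdTitle>
      <Impression><![CDATA[https://tracking.example.com/impression]]></Impression>
      <Creatives>
        <Creative>
          <Linear>
            <Duration>00:00:30</Duration>
            <MediaFiles>
              <MediaFile delivery="progressive" type="video/mp4" width="640" height="360">
                <![CDATA[https://cdn.example.com/video.mp4]]>
              </MediaFile>
            </MediaFiles>
          </Linear>
        </Creative>
      </Creatives>
    </InLine>
  </Ad>
</VAST>"""


def summarize_inline_ad(vast_xml):
    """Hypothetical helper: pull out what is needed to play a simple linear ad."""
    root = ET.fromstring(vast_xml)
    inline = root.find("./Ad/InLine")
    if inline is None:
        raise ValueError("no <InLine> ad found")
    linear = inline.find("./Creatives/Creative/Linear")
    if linear is None:
        raise ValueError("no <Linear> creative found")
    media = linear.find("./MediaFiles/MediaFile")
    if media is None or not media.text:
        raise ValueError("no <MediaFile> found")
    return {
        "version": root.get("version"),
        "title": inline.findtext("AdTitle", default="").strip(),
        "impressions": [e.text.strip() for e in inline.findall("Impression") if e.text],
        "duration": linear.findtext("Duration", default="").strip(),
        "media_url": media.text.strip(),  # CDATA content arrives as plain text
        "media_type": media.get("type"),
    }


summary = summarize_inline_ad(MINIMAL_VAST)
print(summary["media_url"])
```

The CDATA wrapper is transparent to the parser, so the URL only needs its surrounding whitespace stripped.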
The <VAST> Root Element
The <VAST> root tag always contains the version attribute, which indicates which version of the standard the XML document is based on.

For example:
<VAST version="4.2">
  <!-- The content of the advertisement -->
</VAST>
The version is important because players and advertising platforms rely on it when processing the tag. Some features are only available in newer versions and may not be supported by older implementations.

There can be one or more <Ad> elements inside the <VAST>. This allows you to transmit not only single commercials, but also sequences of ads, such as ad pods for CTV and OTT environments.
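As a rough illustration, a player could read the version and list the ads in a response like this. The sample document and the helper name `describe_response` are assumptions; the `sequence` attribute is what VAST 3.0 and later use to order ads inside a pod.

```python
import xml.etree.ElementTree as ET

# Illustrative response with two ads forming a pod (made-up ids).
POD_VAST = (
    '<VAST version="4.2">'
    '<Ad id="a1" sequence="1"><InLine/></Ad>'
    '<Ad id="a2" sequence="2"><InLine/></Ad>'
    '</VAST>'
)


def describe_response(vast_xml):
    """Hypothetical helper: return the VAST version and (id, sequence) of each ad."""
    root = ET.fromstring(vast_xml)
    if root.tag != "VAST":
        raise ValueError("root element is not <VAST>")
    ads = [(ad.get("id"), ad.get("sequence")) for ad in root.findall("Ad")]
    return root.get("version"), ads


version, ads = describe_response(POD_VAST)
print(version, ads)
```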

For Eastern Europe, this is especially useful in premium video and connected TV inventory, where ad blocks can consist of several videos and be sold using a more complex model than the classic pre-roll.
The <Ad> Tag and The Choice Between <InLine> and <Wrapper>
The <Ad> element is a container describing a specific advertising unit. It usually contains an id attribute that helps identify the ad.

There can be one of two main types of ads inside <Ad>:
  1. <InLine> is a full-fledged ad that already contains links to the video file, tracking events, click settings, and additional parameters.
  2. <Wrapper> is a wrapper that does not contain the final creative, but links to another VAST file via <VASTAdTagURI>.

When to choose InLine
<InLine> is suitable if the system already knows which video needs to be shown and can immediately return all the data: media file, tracking, click links and playback parameters.

This is a good option for direct campaigns, in-house ad server integrations, and scenarios where minimal latency and easy debugging are important.

When to choose Wrapper
<Wrapper> is used if the advertising request needs to be passed on: for example, from a publisher to an SSP, from an SSP to a DSP, or from a DSP to a third-party ad server.

Wrapper also allows you to add your own tracking links on top of an existing tag. This is convenient in programmatic chains, where several participants want to capture display events and interactions.
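From the player's side, the difference between the two types can be sketched as a simple check on the child of <Ad>. The sample fragments, the URL, and the helper name `classify_ad` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two illustrative <Ad> fragments (made-up ids and URL).
INLINE_AD = '<Ad id="direct"><InLine><AdSystem>Ad_Server_Name</AdSystem></InLine></Ad>'
WRAPPER_AD = (
    '<Ad id="forwarded"><Wrapper>'
    '<VASTAdTagURI><![CDATA[https://adserver.example.com/vast?id=123]]></VASTAdTagURI>'
    '</Wrapper></Ad>'
)


def classify_ad(ad_xml):
    """Return ("inline", None) or ("wrapper", next_vast_url)."""
    ad = ET.fromstring(ad_xml)
    if ad.find("InLine") is not None:
        return ("inline", None)
    wrapper = ad.find("Wrapper")
    if wrapper is not None:
        # The wrapper only points at the next VAST document in the chain.
        return ("wrapper", wrapper.findtext("VASTAdTagURI", default="").strip())
    raise ValueError("<Ad> contains neither <InLine> nor <Wrapper>")


print(classify_ad(INLINE_AD))
print(classify_ad(WRAPPER_AD))
```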
VAST's Key Elements and Their Role
<MediaFile>: video parameters
The <MediaFile> tag inside the <MediaFiles> block indicates the media resource that the player should play.

Basic attributes:
  • type is the MIME type of the file, for example, video/mp4 or video/webm.
  • delivery is the delivery method: progressive for direct file download or streaming for streamed delivery.
  • width and height are the resolution of the video.
  • bitrate is the bitrate of the video, which helps the player choose the appropriate quality.

The player can receive several <MediaFile> elements and select the most appropriate one, taking into account the device, connection speed, supported formats, and screen size.
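One possible selection strategy is sketched below: keep only the files the device can decode and that fit the screen and bandwidth, then take the largest. The sample URLs, the function name `pick_media_file`, and the ranking rule are assumptions; actual players use their own heuristics.

```python
import xml.etree.ElementTree as ET

# Illustrative <MediaFiles> block with three renditions (made-up URLs).
MEDIA_FILES = """<MediaFiles>
  <MediaFile delivery="progressive" type="video/mp4" width="640" height="360" bitrate="800"><![CDATA[https://cdn.example.com/360.mp4]]></MediaFile>
  <MediaFile delivery="progressive" type="video/mp4" width="1280" height="720" bitrate="2500"><![CDATA[https://cdn.example.com/720.mp4]]></MediaFile>
  <MediaFile delivery="progressive" type="video/webm" width="1920" height="1080" bitrate="5000"><![CDATA[https://cdn.example.com/1080.webm]]></MediaFile>
</MediaFiles>"""


def pick_media_file(media_files_xml, supported_types, max_width, max_bitrate_kbps):
    """Return the URL of the best playable rendition, or None if nothing fits."""
    candidates = []
    for mf in ET.fromstring(media_files_xml).findall("MediaFile"):
        width = int(mf.get("width", "0"))
        bitrate = int(mf.get("bitrate", "0"))  # VAST expresses bitrate in kbps
        playable = (
            mf.get("type") in supported_types
            and width <= max_width
            and bitrate <= max_bitrate_kbps
        )
        if playable and mf.text:
            candidates.append((width, bitrate, mf.text.strip()))
    if not candidates:
        return None
    # Prefer the widest rendition, then the highest bitrate.
    return max(candidates)[2]


print(pick_media_file(MEDIA_FILES, {"video/mp4"}, 1920, 3000))
```

Returning None when nothing fits gives the caller a clear place to apply fallback logic instead of attempting to play an unsupported file.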

In CTV and OTT environments, HLS and DASH are also often used. In such cases, it is important to check the compatibility of the player, the stream format, and the correctness of the fallback logic in advance.

<TrackingEvents>: Viewing and interaction events

The <TrackingEvents> block contains <Tracking> tags, each of which describes an event that the player should capture.

Example:
<TrackingEvents>
  <Tracking event="start"><![CDATA[https://tracking.example.com/start]]></Tracking>
  <Tracking event="firstQuartile"><![CDATA[https://tracking.example.com/25perc]]></Tracking>
  <Tracking event="midpoint"><![CDATA[https://tracking.example.com/50perc]]></Tracking>
  <Tracking event="thirdQuartile"><![CDATA[https://tracking.example.com/75perc]]></Tracking>
  <Tracking event="complete"><![CDATA[https://tracking.example.com/100perc]]></Tracking>
</TrackingEvents>
These events capture key viewing points:
  • start: the ad has started playing;
  • firstQuartile: the user has viewed 25%;
  • midpoint: the user has viewed 50%;
  • thirdQuartile: the user has viewed 75%;
  • complete: the user has watched the video to the end.

Other events, such as skip, pause, resume, mute, unmute, fullscreen, and creativeView, can also be used.

It is tracking events that allow advertisers to understand how the audience interacts with the video: whether they watch the video, skip the ad, turn off the sound, and click on the ad.
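The quartile logic can be sketched as a function that, given the playback position, returns the tracking URLs that have become due and have not been fired yet. The names `tracking_map` and `due_urls` are hypothetical, and firing `start` at position zero is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# The <TrackingEvents> block from the example above.
TRACKING_XML = """<TrackingEvents>
  <Tracking event="start"><![CDATA[https://tracking.example.com/start]]></Tracking>
  <Tracking event="firstQuartile"><![CDATA[https://tracking.example.com/25perc]]></Tracking>
  <Tracking event="midpoint"><![CDATA[https://tracking.example.com/50perc]]></Tracking>
  <Tracking event="thirdQuartile"><![CDATA[https://tracking.example.com/75perc]]></Tracking>
  <Tracking event="complete"><![CDATA[https://tracking.example.com/100perc]]></Tracking>
</TrackingEvents>"""

# Share of the duration at which each event becomes due.
THRESHOLDS = [
    ("start", 0.0),
    ("firstQuartile", 0.25),
    ("midpoint", 0.5),
    ("thirdQuartile", 0.75),
    ("complete", 1.0),
]


def tracking_map(xml_text):
    """Map each event name to the list of URLs declared for it."""
    urls = {}
    for tracking in ET.fromstring(xml_text).findall("Tracking"):
        if tracking.text:
            urls.setdefault(tracking.get("event"), []).append(tracking.text.strip())
    return urls


def due_urls(position_s, duration_s, fired, urls):
    """Return URLs for events reached at this position; each event fires once."""
    progress = position_s / duration_s
    result = []
    for event, threshold in THRESHOLDS:
        if progress >= threshold and event not in fired:
            fired.add(event)
            result.extend(urls.get(event, []))
    return result


urls = tracking_map(TRACKING_XML)
fired = set()
print(due_urls(0, 30, fired, urls))
print(due_urls(16, 30, fired, urls))
```

Keeping the set of fired events outside the function means a seek backwards or a repeated timer tick does not send the same event twice.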

<VideoClicks>: managing clicks and landing pages
The <VideoClicks> tag defines what happens when a user clicks on a video ad.

It usually contains:
  • <ClickThrough> is the final URL to which the user will be redirected.
  • <ClickTracking> is a tracking link that records the fact of a click.
  • <CustomClick> is used for additional click scenarios, if they are supported by the player.

Example:
<VideoClicks>
  <ClickThrough><![CDATA[https://advertiser.example.com/landing-page]]></ClickThrough>
  <ClickTracking><![CDATA[https://tracking.example.com/click]]></ClickTracking>
</VideoClicks>
When clicked, the player first sends a request to <ClickTracking>, and then opens <ClickThrough>. This helps advertising platforms count CTR, analyze the effectiveness of a creative, and compare clicks with the user's subsequent actions.
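A minimal sketch of that order of operations follows. The `ping` and `open_url` callbacks are hypothetical stand-ins for the player's network request and navigation functions.

```python
import xml.etree.ElementTree as ET

# The <VideoClicks> block from the example above.
VIDEO_CLICKS = """<VideoClicks>
  <ClickThrough><![CDATA[https://advertiser.example.com/landing-page]]></ClickThrough>
  <ClickTracking><![CDATA[https://tracking.example.com/click]]></ClickTracking>
</VideoClicks>"""


def handle_click(video_clicks_xml, ping, open_url):
    """Fire every click tracker, then open the landing page."""
    root = ET.fromstring(video_clicks_xml)
    for tracker in root.findall("ClickTracking"):
        if tracker.text:
            ping(tracker.text.strip())
    landing = root.findtext("ClickThrough", default="").strip()
    if landing:
        open_url(landing)
    return landing


# Record the calls instead of performing real requests.
calls = []
handle_click(
    VIDEO_CLICKS,
    ping=lambda url: calls.append(("ping", url)),
    open_url=lambda url: calls.append(("open", url)),
)
print(calls)
```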

For campaigns in Eastern Europe, it is important to take into account in advance the language version of the landing page, the user's country, the rules of consent management and the correct operation of redirects between local domains.

<Extensions>: additional functionality
The <Extensions> block allows you to add custom data without violating the basic VAST structure.

The following can be passed inside <Extensions>:
  • parameters of interactive formats;
  • data for VPAID or SIMID scenarios;
  • verification pixels;
  • brand safety parameters;
  • additional campaign IDs;
  • settings specific to a particular adtech platform.

The player can ignore unknown extensions or process them if it supports the appropriate logic.
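This "process what you know, skip the rest" behaviour can be sketched as a dispatch on the `type` attribute of each <Extension>. The extension types `campaign-ids` and `vendor-x` and the helper `handle_extensions` are invented for illustration and do not belong to any real platform.

```python
import xml.etree.ElementTree as ET

# Illustrative block with one known and one unknown extension type.
EXTENSIONS = """<Extensions>
  <Extension type="campaign-ids"><CampaignId>12345</CampaignId></Extension>
  <Extension type="vendor-x"><Setting>on</Setting></Extension>
</Extensions>"""


def handle_extensions(extensions_xml, handlers):
    """Run a handler for each known extension type and skip the others."""
    handled, ignored = [], []
    for ext in ET.fromstring(extensions_xml).findall("Extension"):
        ext_type = ext.get("type")
        handler = handlers.get(ext_type)
        if handler is None:
            ignored.append(ext_type)  # unknown extension: safe to skip
            continue
        handler(ext)
        handled.append(ext_type)
    return handled, ignored


campaign_ids = []
handlers = {"campaign-ids": lambda ext: campaign_ids.append(ext.findtext("CampaignId"))}
handled, ignored = handle_extensions(EXTENSIONS, handlers)
print(handled, ignored, campaign_ids)
```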

This makes VAST flexible enough: the basic standard remains the same, but if necessary, it can be expanded to meet the requirements of specific partners, platforms, or regional campaigns.
Inline and Wrapper: When and Why to Use
Direct embedding of the creative via Inline
Inline ads are used when the advertising platform has already selected a specific video and can immediately transmit all the data in a single XML document.

Scenario: The advertiser, agency, or direct ad server stores the video, tracking links, and CTA. In this case, it is logical to return <InLine> so that the player immediately receives the final instructions and does not waste time on additional requests.

Advantages of Inline:
  • minimum delay;
  • easier to test and debug;
  • less risk of losing tracking events;
  • higher predictability of playback.

The disadvantage of Inline: if you need to add third-party pixels or verification, they must be specified in advance in the VAST structure.

Multi-level redirects via Wrapper
Wrapper is more often used in a programmatic environment, where several parties take part in the selection and delivery of a creative.

For example, the player calls the SSP, the SSP returns a Wrapper that points to the DSP, the DSP can pass the request on to the ad server, and the final creative arrives in the last VAST response.

Advantages of Wrapper:
  • flexible request transmission logic;
  • the ability to connect multiple partners;
  • convenient addition of tracking at different levels of the chain;
  • the ability to change the advertising provider without changing the player code.

Disadvantages of Wrapper:
  • increased latency due to additional requests;
  • higher risk of timeouts;
  • more difficult error diagnosis;
  • the probability of losing some of the events in case of incorrect implementation.

For CTV and OTT inventory, long Wrapper chains are particularly sensitive: the user expects smooth viewing, and delays in loading ads directly worsen the experience.

In practice, it is worth limiting the number of Wrapper levels and agreeing in advance with partners on the acceptable depth of the chain. It is optimal to keep the structure as short and testable as possible.
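A depth limit of this kind could be enforced roughly as follows. The limit of three, the `fetch` callable, and the canned responses are assumptions standing in for real HTTP requests and whatever depth was agreed with partners.

```python
import xml.etree.ElementTree as ET

MAX_WRAPPER_DEPTH = 3  # assumed limit; agree on the real value with partners

# Canned responses that stand in for network calls (made-up URLs).
FAKE_RESPONSES = {
    "https://ssp.example.com/vast": (
        '<VAST version="3.0"><Ad id="w1"><Wrapper><AdSystem>SSP</AdSystem>'
        '<VASTAdTagURI><![CDATA[https://dsp.example.com/vast]]></VASTAdTagURI>'
        '<Impression><![CDATA[https://ssp.example.com/imp]]></Impression>'
        '</Wrapper></Ad></VAST>'
    ),
    "https://dsp.example.com/vast": (
        '<VAST version="3.0"><Ad id="a1"><InLine><AdSystem>DSP</AdSystem>'
        '<AdTitle>Video Ad Title</AdTitle>'
        '<Impression><![CDATA[https://dsp.example.com/imp]]></Impression>'
        '<Creatives/></InLine></Ad></VAST>'
    ),
}


def resolve_chain(first_url, fetch, max_depth=MAX_WRAPPER_DEPTH):
    """Follow wrappers until an InLine ad is found, collecting impression URLs."""
    url = first_url
    impressions = []
    for hops in range(max_depth + 1):
        ad = ET.fromstring(fetch(url)).find("Ad")
        if ad is None:
            raise ValueError("VAST response contains no <Ad>")
        wrapper = ad.find("Wrapper")
        node = wrapper if wrapper is not None else ad.find("InLine")
        if node is None:
            raise ValueError("<Ad> contains neither <InLine> nor <Wrapper>")
        # Every level may add its own trackers; keep all of them.
        impressions += [e.text.strip() for e in node.findall("Impression") if e.text]
        if wrapper is None:
            return node, impressions, hops
        url = wrapper.findtext("VASTAdTagURI", default="").strip()
    raise RuntimeError("wrapper chain exceeds the allowed depth")


inline, impressions, hops = resolve_chain(
    "https://ssp.example.com/vast", FAKE_RESPONSES.__getitem__
)
print(hops, impressions)
```

Raising an error once the limit is exceeded lets the player give up early and return to content rather than wait on a chain that may never resolve.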
The Main Tracking Events and How They Help In Analytics
Video ads are valuable not only for their coverage, but also for the quality of their analytics. Through <TrackingEvents>, you can collect data that helps you evaluate creative effectiveness, inventory quality, and audience behavior.

Quartile events
The quartile events show how deeply the user watched the video:
  • start: the video has started playing.
  • firstQuartile: 25% viewed.
  • midpoint: 50% viewed.
  • thirdQuartile: 75% viewed.
  • complete: the video has been watched to the end.

If a significant portion of users leave before midpoint, this may indicate that the video is too long, the opening seconds are weak, the audience is irrelevant, or the placement is poorly chosen.

If the completion rate is high but there are few clicks, it is possible that the video holds the attention well, but the CTA is not noticeable enough or the landing page does not meet the user's expectations.
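This kind of reading can be turned into a small calculation. The event counts below are invented purely to show the arithmetic, and `funnel_report` is a hypothetical helper, not part of any analytics product.

```python
FUNNEL = ["start", "firstQuartile", "midpoint", "thirdQuartile", "complete"]

# Invented event counts for a single campaign, for illustration only.
COUNTS = {
    "start": 1000,
    "firstQuartile": 800,
    "midpoint": 500,
    "thirdQuartile": 450,
    "complete": 400,
}


def funnel_report(counts):
    """Return retention relative to start and the drop-off between stages."""
    starts = counts["start"]
    if starts == 0:
        raise ValueError("no start events recorded")
    retention = {event: counts.get(event, 0) / starts for event in FUNNEL}
    drop_off = {}
    for previous, current in zip(FUNNEL, FUNNEL[1:]):
        prev_count = counts.get(previous, 0)
        # Share of viewers lost between two neighbouring stages.
        drop_off[current] = 1 - counts.get(current, 0) / prev_count if prev_count else 0.0
    return retention, drop_off


retention, drop_off = funnel_report(COUNTS)
worst_stage = max(drop_off, key=drop_off.get)
print(retention["complete"], worst_stage)
```

With these made-up numbers the largest loss happens between firstQuartile and midpoint, which is the pattern described above.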

Other important events
  • ClickThrough. The user went to the advertiser's website.
  • ClickTracking. The system registered the fact of the click.
  • skip. The user skipped the video, which is possible if the ad defines a skipoffset.
  • pause and resume. The user paused the ad or resumed viewing.
  • mute and unmute. The user turned the sound off or on.
  • Error. The player encountered an error when loading or playing the ad.

These events help not only to measure the campaign, but also to optimize it. For example, a high skip rate may indicate a problem with the creative or the targeting, and frequent errors may indicate technical problems with a tag, media file, or Wrapper chain.
Recommendations for Reducing Errors In the Structure
Errors in VAST-XML can lead to ads not being played, tracking not working, or analytics being incomplete. Below are practical recommendations for checking the tag before launching.

Use CDATA for links
If the URL contains special characters, such as &, < or >, it is better to enclose the link in CDATA so as not to violate the XML syntax.
<ClickThrough><![CDATA[https://example.com/?param=1&another=2]]></ClickThrough>
This is especially important for click-through links, tracking pixels, and redirects, where UTM tags and additional parameters are often used.
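The effect is easy to reproduce with Python's standard XML parser: a raw ampersand makes the document ill-formed, while the CDATA and entity-escaped forms both parse to the same URL. The helper name `parse_url` is hypothetical.

```python
import xml.etree.ElementTree as ET

RAW = "<ClickThrough>https://example.com/?param=1&another=2</ClickThrough>"
CDATA = "<ClickThrough><![CDATA[https://example.com/?param=1&another=2]]></ClickThrough>"
ESCAPED = "<ClickThrough>https://example.com/?param=1&amp;another=2</ClickThrough>"


def parse_url(xml_text):
    """Return the element text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


print(parse_url(RAW))      # unescaped ampersand breaks the parser
print(parse_url(CDATA))
print(parse_url(ESCAPED))
```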

Keep track of the duration and skipoffset
<Duration> must be specified in the HH:MM:SS format (optionally with milliseconds, HH:MM:SS.mmm), for example:
<Duration>00:00:15</Duration>
If the video is skippable, the skipoffset value should not exceed the duration of the video. Otherwise, the player may handle the skip option incorrectly.
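A pre-launch check for this rule might look like the sketch below. It assumes skipoffset is given either as a time value or as a percentage of the duration, the two forms the VAST specification allows. The function names are hypothetical.

```python
import re

TIME_PATTERN = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d{3}))?")


def parse_time(value):
    """Convert HH:MM:SS or HH:MM:SS.mmm into seconds."""
    match = TIME_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a valid VAST time value: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return total + (int(millis) / 1000 if millis else 0.0)


def skip_offset_seconds(skipoffset, duration_s):
    """Resolve a skipoffset given as a time value or as a percentage."""
    skipoffset = skipoffset.strip()
    if skipoffset.endswith("%"):
        return duration_s * float(skipoffset[:-1]) / 100
    return parse_time(skipoffset)


def skip_is_valid(duration, skipoffset):
    """True when the skip button would appear before the ad ends."""
    duration_s = parse_time(duration)
    return skip_offset_seconds(skipoffset, duration_s) <= duration_s


print(skip_is_valid("00:00:15", "00:00:05"))
print(skip_is_valid("00:00:15", "00:00:20"))
```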

Check the logical consistency
Tracking events must match the actual structure of the ad.

For example:
  • if the video is not skippable, do not specify the skip event;
  • if there is no clickable area, do not add unnecessary click logic;
  • if several <MediaFile> elements are specified, each of them must actually be available;
  • the width and height attributes must match the actual video dimensions;
  • the MIME type must match the actual file format.

Control Wrapper chains
If you do not control the entire chain, agree with your partners on the allowed number of Wrapper levels. Too long a chain increases delays and complicates diagnosis.

For premium CTV and OTT inventory, it is better to use shorter and more predictable chains to reduce the risk of timeouts and deterioration of the user experience.

Check for compatibility with privacy requirements
For campaigns in Eastern Europe, it is important to consider which markets the ads will be launched in.

In EU countries, it is necessary to take into account GDPR, ePrivacy and the work of the consent management platform. Other jurisdictions in the region may have their own requirements for processing user data.

Technically, this means that advertising requests, identifiers, user parameters, and third-party pixels must be used correctly and within the framework of applicable rules.

Test the tag before launching
Before production, the VAST tag must be checked in a validator or inspector. This helps to identify in advance:
  • XML syntax errors;
  • inaccessible media files;
  • invalid URLs;
  • problems with tracking;
  • overly long Wrapper chains;
  • incompatible video formats;
  • errors when processing clicks.

Tools like UMG VAST Inspector allow you to quickly check the tag and reduce the risk of problems when launching a campaign.
Conclusion
The VAST XML structure is the basis of any video advertising system. It determines which video will be played, which events will be recorded, how the click will be processed, and how accurate the analytics will be.

Understanding the key elements — <MediaFile>, <TrackingEvents>, <VideoClicks>, <Extensions>, <InLine>, and <Wrapper> — helps advertisers, publishers, and developers avoid common mistakes and build a more reliable advertising infrastructure.

The main conclusions:
  • VAST sets a single standard for interaction between the player and the advertising platform.
  • The XML structure must be syntactically correct and logically consistent.
  • Inline is better suited for direct creative delivery and minimal delay.
  • Wrapper is useful for programmatic chains, but requires control of the depth of redirects.
  • TrackingEvents provide accurate analytics of views, clicks, skips, and errors.
  • For regional campaigns, it is important to consider not only the technical structure of the tag, but also data requirements, consent management, and compatibility with local platforms.

In the next article, we'll look at how VAST has evolved: from the first versions to the 4.x specifications. We'll also look at why the new features of the standard are especially important when working with OTT, CTV, and premium video inventory.