This is an in-depth technical explainer. If you're looking for a high-level introduction to Attribution Reporting with event-level reports, head over to Event-level reports in the Attribution Reporting API.
A list of all API guides and blog posts for this API is also available here.
This document is an explainer for a potential new web platform feature which allows for measuring and reporting ad click and view conversions.
See the explainer on aggregate measurement for a potential extension on top of this.
Table of Contents
- Motivation
- Related work
- API Overview
- Registering attribution sources
- Handling an attribution source event
- Publisher-side Controls for Attribution Source Declaration
- Triggering Attribution
- Registration requests
- Data limits and noise
- Trigger attribution algorithm
- Multiple sources for the same trigger (Multi-touch)
- Sending Scheduled Reports
- Attribution Reports
- Data Encoding
- Optional attribution filters
- Optional: extended debugging reports
- Noisy fake conversion example
- Storage limits
- Privacy Considerations
- Security considerations
Currently, the web ad industry measures conversions via identifiers they can associate across sites. These identifiers tie information about which ads were clicked to information about activity on the advertiser's site (the conversion). This allows advertisers to measure ROI, and for the entire ads ecosystem to understand how well ads perform.
Since the ads industry today uses common identifiers across advertiser and publisher sites to track conversions, these common identifiers can be used to enable other forms of cross-site tracking.
This doesn’t have to be the case, though, especially in cases where identifiers such as third-party cookies are either unavailable or undesirable. A new API surface can be added to the web platform to satisfy this use case without them, in a way that provides better privacy to users.
This API alone will not be able to support all conversion measurement use cases. We envision this API as one of potentially many new APIs that will seek to reproduce valid advertising use cases in the web platform in a privacy preserving way. In particular, we think this API could be extended by using server side aggregation to provide richer data, which we are continuing to explore.
There is an alternative Private Click Measurement draft spec in the PrivacyCG. See this WebKit blog post for more details.
There are multiple aggregate designs in wicg/privacy-preserving-ads, including Bucketization and MaskedLARK.
Folks from Meta and Mozilla have published the Interoperable Private Attribution (IPA) proposal for doing aggregate attribution measurement.
Brave has published and implemented an Ads Confirmation Protocol.
Attribution sources are events which future triggers can be attributed to.
Sources are registered by returning a new HTTP response header on requests which are eligible for attribution. A request is eligible as long as it has the Attribution-Reporting-Eligible request header.
There are two types of attribution sources, navigation sources and event sources.
navigation sources are registered via clicks on anchor tags:
<a href="https://advertiser.example/landing"
attributionsrc="https://adtech.example/attribution_source?my_ad_id=123">
click me
</a>
or via calls to window.open that occur with transient activation:
// Encode the attributionsrc URL in case it contains special characters, such as '=', that will
// cause the parameter to be improperly parsed
const encoded = encodeURIComponent('https://adtech.example/attribution_source?my_ad_id=123');
window.open(
"https://advertiser.example/landing",
"_blank",
`attributionsrc=${encoded}`);
event sources do not require any user interaction and can be registered via <img> or <script> tags with the new attributionsrc attribute:
<img src="https://advertiser.example/pixel"
attributionsrc="https://adtech.example/attribution_source?my_ad_id=123">
<script src="https://advertiser.example/register-view"
attributionsrc="https://adtech.example/attribution_source?my_ad_id=123"></script>
Specifying a URL value for attributionsrc within <a>, <img>, <script> or window.open will cause the browser to initiate a separate keepalive fetch request which includes the Attribution-Reporting-Eligible request header.
When the attributionsrc attribute is present in these surfaces/APIs, both with and without a value, existing requests made via src/href attributes or window.open will now include the Attribution-Reporting-Eligible request header. Each of these requests will be able to register attribution sources.
event sources can also be registered using existing JavaScript request APIs by setting the Attribution-Reporting-Eligible header manually:
const headers = {
'Attribution-Reporting-Eligible': 'event-source'
};
// Optionally set keepalive to ensure the request outlives the page.
window.fetch("https://adtech.example/attribution_source?my_ad_id=123",
{ headers, keepalive: true });
Other request APIs which allow specifying headers (e.g. XMLHttpRequest) will also work.
The response to these requests will configure the API via a new JSON HTTP header Attribution-Reporting-Register-Source of the form:
{
"source_event_id": "12340873456",
"destination": "[eTLD+1]",
"expiry": "[64-bit signed integer]",
"priority": "[64-bit signed integer]"
}
- destination: an origin whose eTLD+1 is where attribution will be triggered for this source.
- source_event_id: (optional) A string encoding a 64-bit unsigned integer which represents the event-level data associated with this source. This will be limited to 64 bits of information but the value can vary. Defaults to 0.
- expiry: (optional) expiry in seconds for when the source should be deleted. Default is 30 days, with a maximum value of 30 days. The maximum expiry can also vary between browsers. This will be rounded to the nearest day.
- priority: (optional) a signed 64-bit integer used to prioritize this source with respect to other matching sources. When a trigger redirect is received, the browser will find the matching source with the highest priority value and generate a report. The other sources will not generate reports.
Once this header is received, the browser will proceed with handling an attribution source event. Note that it is possible to register multiple sources for the same request using HTTP redirects (though these multiple sources may not set distinct destinations).
Note that we sometimes call the attributionsrc's origin the "reporting origin" since it is the origin that will end up receiving attribution reports.
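As a rough illustration, a reporting origin's server could assemble the header value along these lines. This is a minimal sketch: the function name buildRegisterSourceHeader and its input shape are hypothetical, and only the JSON field names come from the description above.

```javascript
// Hypothetical server-side helper: returns the string to send as the value of
// the Attribution-Reporting-Register-Source response header.
function buildRegisterSourceHeader({ destination, sourceEventId, expirySeconds, priority }) {
  if (typeof destination !== 'string' || destination === '') {
    throw new Error('destination is required');
  }
  const registration = { destination };

  // 64-bit values travel as decimal strings, because a JSON number cannot
  // represent every 64-bit integer exactly.
  if (sourceEventId !== undefined) {
    const id = BigInt(sourceEventId);
    if (id < 0n || id >= 2n ** 64n) {
      throw new RangeError('source_event_id must fit in an unsigned 64-bit integer');
    }
    registration.source_event_id = id.toString();
  }
  if (expirySeconds !== undefined) {
    registration.expiry = BigInt(expirySeconds).toString();
  }
  if (priority !== undefined) {
    const p = BigInt(priority);
    if (p < -(2n ** 63n) || p >= 2n ** 63n) {
      throw new RangeError('priority must fit in a signed 64-bit integer');
    }
    registration.priority = p.toString();
  }
  return JSON.stringify(registration);
}
```

A server would call this once per eligible request and attach the returned string to its response. Omitted optional fields are simply left out so that the browser applies its defaults.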
A navigation attribution source event will be logged to storage if the resulting document being navigated to ends up sharing an eTLD+1 with the destination origin. Additionally, the navigation needs to occur with transient user activation.
event sources don’t require any of the above constraints to be logged.
An attribution source will be eligible for reporting if any page on the destination eTLD+1 (advertiser site) triggers attribution for the associated reporting origin.
In order to prevent arbitrary third parties from registering sources without the publisher’s knowledge, the Attribution Reporting API will need to be enabled in child contexts by a new Permissions Policy:
<iframe src="https://advertiser.example" allow="attribution-reporting 'src'">
<a … attributionsrc="https://ad-tech.example?..."></a>
</iframe>
The API will be enabled by default in the top-level context and in same-origin children. Any script running in these contexts can declare a source with any reporting origin. Publishers who wish to explicitly disable the API for all parties can do so via an HTTP header.
Without a Permissions Policy, a top-level document and cooperating iframe could recreate this functionality. This is possible by using postMessage to send the source_event_id, attributionsrc origin, and destination values to the top-level document, which can then wrap the iframe in an anchor tag (with some additional complexities behind handling clicks on the iframe). Using Permissions Policy prevents the need for these hacks. This is in line with the classification of powerful features as discussed on this issue.
NOTE: For the Chromium Origin Trial, the Chromium implementation of the Attribution Reporting API will temporarily ship with a Permissions Policy default of *, which bypasses the need for top-level documents to delegate permission to cross-origin iframes.
Attribution can only be triggered for a source on a page whose eTLD+1 matches the eTLD+1 of the site provided in destination. To trigger attribution, a mechanism similar to source event registration is used, via HTML:
<img src="https://ad-tech.example/conversionpixel"
attributionsrc="https://adtech.example/attribution_trigger?purchase=13">
or JavaScript:
const headers = {
'Attribution-Reporting-Eligible': 'trigger'
};
// Optionally set keepalive to ensure the request outlives the page.
window.fetch("https://adtech.example/attribution_trigger?purchase=13",
{ headers, keepalive: true });
As a stop-gap to support pre-existing conversion tags which do not include the attributionsrc attribute, or use a different Fetch API, the browser will also process trigger registration headers for all subresource requests on the page where the attribution-reporting Permissions Policy is enabled.
Like source event registrations, these requests should respond with a new HTTP header Attribution-Reporting-Register-Trigger which contains information about how to treat the trigger event:
{
"event_trigger_data": [{
"trigger_data": "[unsigned 64-bit integer]",
"priority": "[signed 64-bit integer]",
"deduplication_key": "[unsigned 64-bit integer]"
}]
}
- trigger_data: optional coarse-grained data to identify the triggering event. The value will be limited to either 3 bits or 1 bit depending on the attributed source type.
- priority: optional signed 64-bit integer representing the priority of this trigger compared to other triggers for the same source.
- deduplication_key: optional unsigned 64-bit integer which will be used to deduplicate multiple triggers which contain the same deduplication_key for a single source.
When this header is received, the browser will schedule an attribution report as detailed in Trigger attribution algorithm. Note that the header can be present on redirect requests.
Triggering attribution requires the attribution-reporting Permissions Policy to be enabled in the context the request is made. As described in Publisher-side Controls for Attribution Source Declaration, this Permissions Policy will be enabled by default in the top-level context and in same-origin children, but disabled in cross-origin children.
Navigation sources may be attributed up to 3 times. Event sources may be attributed up to 1 time.
Depending on the context in which it was made, a request is eligible to register sources, triggers, sources or triggers, or nothing, as indicated in the Attribution-Reporting-Eligible request header, which is a structured dictionary.
The reporting origin may use the value of this header to determine which registrations, if any, to include in its response. The browser will likewise ignore invalid registrations:
- <a> and window.open will have navigation-source.
- Other APIs that automatically set Attribution-Reporting-Eligible (like <img>) will contain event-source, trigger.
- Requests from JavaScript, e.g. window.fetch, can set this header manually, but it is an error for such requests to specify navigation-source.
- All other requests will not have the Attribution-Reporting-Eligible header. For those requests the browser will permit trigger registration only.
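A reporting origin could branch on this header roughly as follows. This is a deliberately simplified sketch: the function names are hypothetical, and the parsing only handles bare dictionary keys (it ignores parameters and explicit boolean values) rather than the full structured-field syntax.

```javascript
// Simplified reader for the Attribution-Reporting-Eligible request header.
// It only understands bare keys such as "event-source, trigger". A production
// server should use a complete Structured Field Values parser instead.
function parseEligibleKeys(headerValue) {
  const keys = new Set();
  for (const member of headerValue.split(',')) {
    // Drop any ";param" or "=value" suffix and keep just the key.
    const key = member.split(';')[0].split('=')[0].trim();
    if (key !== '') keys.add(key);
  }
  return keys;
}

// Decides which registration headers are worth including in the response.
function allowedRegistrations(headerValue) {
  // No header at all: the browser permits trigger registration only.
  if (headerValue === undefined || headerValue === null) {
    return { source: false, trigger: true };
  }
  const keys = parseEligibleKeys(headerValue);
  return {
    source: keys.has('navigation-source') || keys.has('event-source'),
    trigger: keys.has('trigger'),
  };
}
```

For instance, allowedRegistrations('navigation-source') allows only a source registration, while a request with no header allows only a trigger registration.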
The source_event_id will be limited to 64 bits of information to enable uniquely identifying an ad click.
The advertiser-side data must therefore be limited quite strictly, by limiting the amount of data and by applying noise to the data. navigation sources will be limited to only 3 bits of trigger_data, while event sources will be limited to only 1 bit.
Noise will be applied to whether a source event will be reported truthfully. When an attribution source is registered, the browser will perform one of the following steps given a probability p:
- With probability 1 - p, the browser logs the source as normal.
- With probability p, the browser chooses randomly among all the possible output states of the API. This includes the choice of not reporting anything at all, or potentially reporting multiple fake reports for the event.
Note that this scheme is an instantiation of k-randomized response, see Differential privacy.
Strawman: we can set p such that each source is protected with randomized response that satisfies an epsilon value of 14. This would entail:
- p = .24% for navigation sources
- p = .00025% for event sources
Note that correcting for this noise addition is straightforward in most cases, please see <TODO link to de-biasing advice/code snippet here>. Reports will be updated to include p so that noise correction can work correctly in the event that p changes over time, or if different browsers apply different probabilities:
{
"randomized_trigger_rate": 0.0024,
...
}
Note that these initial strawman parameters were chosen as a way to ease adoption of the API without negatively impacting utility substantially. They are subject to change in the future with additional feedback and do not necessarily reflect a final set of parameters.
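The strawman values can be sanity-checked with a short calculation. If, with probability p, the browser picks uniformly among k output states (the true state included), the true state is reported with probability 1 - p + p/k and any other state with probability p/k. Bounding the ratio of the two by e^epsilon gives p = k / (k - 1 + e^epsilon). The sketch below evaluates that formula. The function name is hypothetical, and the state counts used (2925 for navigation sources, 3 for event sources) are assumptions about how output states are counted, chosen because they reproduce the figures above.

```javascript
// Probability p of replacing the true output with a uniformly random state,
// for k-ary randomized response satisfying the given epsilon.
function randomizedResponseRate(numStates, epsilon) {
  return numStates / (numStates - 1 + Math.exp(epsilon));
}

// Assumed state counts (not stated in this document).
const navigationRate = randomizedResponseRate(2925, 14); // roughly 0.0024
const eventRate = randomizedResponseRate(3, 14);         // roughly 0.0000025

console.log((navigationRate * 100).toFixed(2) + '%');
console.log((eventRate * 100).toFixed(5) + '%');
```

Because p depends on the number of output states, a browser that changes the number of reporting windows, the trigger data size, or the report cap would arrive at a different p for the same epsilon.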
When the browser receives an attribution trigger redirect on a URL matching the destination eTLD+1, it looks up all sources in storage that match <attributionsrc origin, destination> and picks the one with the greatest priority. If multiple sources have the greatest priority, the browser picks the one that was stored most recently.
The browser then schedules a report for the source that was picked by storing {attributionsrc origin, destination eTLD+1, source_event_id, decoded trigger_data, priority, deduplication_key} for the source. Scheduled reports will be sent as detailed in Sending Scheduled Reports.
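The source selection step described above can be sketched as follows. The record shape (reportingOrigin, destinationSite, priority, registrationTime) is hypothetical and stands in for whatever the browser keeps in storage. Priorities are modelled as BigInt because they are 64-bit values.

```javascript
// Picks the stored source a trigger should be attributed to: the matching
// source with the greatest priority, breaking ties by most recent storage.
function pickSource(sources, reportingOrigin, destinationSite) {
  let best = null;
  for (const source of sources) {
    if (source.reportingOrigin !== reportingOrigin) continue;
    if (source.destinationSite !== destinationSite) continue;
    const higherPriority = best !== null && source.priority > best.priority;
    const samePriorityButNewer =
      best !== null &&
      source.priority === best.priority &&
      source.registrationTime > best.registrationTime;
    if (best === null || higherPriority || samePriorityButNewer) {
      best = source;
    }
  }
  return best; // null when no stored source matches
}
```

When every source leaves priority at its default, the tie-break alone decides, so the most recently stored source wins.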
The browser will create reports for a source only if the trigger's deduplication_key has not already been associated with a report for that source.
Each navigation source is allowed to schedule only a maximum of three reports, while each event source is only allowed to schedule a maximum of one.
If a source has already scheduled the maximum number of reports when a new report is being scheduled, the browser will compare the priority of the new report with the priorities of the scheduled reports for that source. If the new report has the lowest priority, it will be ignored. Otherwise, the browser will delete the scheduled report with the lowest priority and schedule the new report.
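The deduplication and report cap rules can be sketched as below. All names are hypothetical. Two details are assumptions rather than statements from this document: a new report whose priority merely ties the lowest scheduled priority is treated as ignored, and a deduplication key stays recorded even if its report is later replaced.

```javascript
// source: { sourceType: 'navigation' | 'event', reports: [], dedupKeys: Set }
// trigger: { triggerData: string, priority?: bigint, deduplicationKey?: string }
// Returns true if a report was scheduled for the source.
function maybeScheduleReport(source, trigger) {
  const maxReports = source.sourceType === 'navigation' ? 3 : 1;
  const priority = trigger.priority ?? 0n;

  // Skip triggers whose deduplication key was already used for this source.
  if (trigger.deduplicationKey !== undefined &&
      source.dedupKeys.has(trigger.deduplicationKey)) {
    return false;
  }

  if (source.reports.length >= maxReports) {
    // Find the scheduled report with the lowest priority.
    let lowest = 0;
    for (let i = 1; i < source.reports.length; i++) {
      if (source.reports[i].priority < source.reports[lowest].priority) lowest = i;
    }
    // Assumption: ties count as "the new report has the lowest priority".
    if (priority <= source.reports[lowest].priority) return false;
    source.reports.splice(lowest, 1);
  }

  source.reports.push({ triggerData: trigger.triggerData, priority });
  if (trigger.deduplicationKey !== undefined) {
    source.dedupKeys.add(trigger.deduplicationKey);
  }
  return true;
}
```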
If multiple sources were registered and associated with a single attribution trigger, send reports for the one with the highest priority. If no priority is specified, the browser performs last-touch.
There are many possible alternatives to this, like providing a choice of rules-based attribution models. However, it isn’t clear the benefits outweigh the additional complexity. Additionally, models other than last-click potentially leak more cross-site information if sources are clicked across different sites.
Reports for event sources will be sent 1 hour after the source expires at its expiry.
Reports for navigation sources may be reported earlier than the source's expiry, at specified points in time relative to when the source event was registered. See here for the precise algorithm.
Note that the report may be sent at a later date if the browser was not running when the window finished. In this case, reports will be sent on startup. The browser may also decide to delay some of these reports for a short random time on startup, so that they cannot be joined together easily by a given reporting origin.
To send a report, the browser will make a non-credentialed (i.e. without session cookies) secure HTTP POST request to:
https://<reporting origin>/.well-known/attribution-reporting/report-event-attribution
The report data is included in the request body as a JSON object with the following keys:
- attribution_destination: the attribution destination set on the source
- source_event_id: 64-bit event id set on the attribution source
- trigger_data: Coarse data set in the attribution trigger registration
- report_id: A UUID string for this report which can be used to prevent double counting
- source_type: Either "navigation" or "event", indicates whether this source was associated with a navigation.
- randomized_trigger_rate: Decimal number between 0 and 1 indicating how often noise is applied.
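On the receiving side, a reporting origin might use report_id to avoid double counting, along these lines. This is a minimal in-memory sketch with a hypothetical class name. A real endpoint would persist the seen identifiers and would be wired to the POST handler of whatever server framework is in use.

```javascript
// Collects event-level reports and drops any whose report_id was seen before.
class ReportCollector {
  constructor() {
    this.seenReportIds = new Set();
    this.reports = [];
  }

  // bodyText is the raw JSON body of the POST request.
  // Returns true if the report was stored, false if it was rejected.
  accept(bodyText) {
    let report;
    try {
      report = JSON.parse(bodyText);
    } catch (e) {
      return false; // not valid JSON
    }
    if (report === null || typeof report !== 'object') return false;
    if (typeof report.report_id !== 'string') return false;
    if (this.seenReportIds.has(report.report_id)) return false; // duplicate
    this.seenReportIds.add(report.report_id);
    this.reports.push(report);
    return true;
  }
}
```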
The source event id and trigger data should be specified in a way that is amenable to the privacy assurances a browser wants to provide (i.e. the number of distinct data states supported).
The input values will be 64-bit integers which the browser will interpret modulo a maximum data value of its choosing. The browser will take the input and perform the equivalent of:
function getData(input, max_value) {
return input % max_value;
}
The benefit of this method over using a fixed bit mask is that it allows browsers to implement max_values that aren’t powers of 2. That is, browsers can choose a "fractional" bit limit if they want to.
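One practical caveat when reproducing this calculation in JavaScript: a Number cannot hold every 64-bit integer exactly, so large inputs should go through BigInt. The helper below is a hypothetical variant of getData that accepts the decimal-string form used in the registration headers.

```javascript
// Reduces a 64-bit unsigned value (given as a decimal string) modulo maxValue,
// using BigInt so that no precision is lost for large inputs.
function getDataExact(input, maxValue) {
  return (BigInt(input) % BigInt(maxValue)).toString();
}

console.log(getDataExact('13', 8));                   // "5"
console.log(getDataExact('18446744073709551615', 8)); // "7" (2^64 - 1 mod 8)
```

With plain Number arithmetic the second input would first be rounded to 2^64, and the result would come out as 0 instead of 7.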
Source and trigger registration has additional optional functionality to both:
- Selectively filter some triggers (effectively ignoring them)
- Choose trigger data based on source event data
This can be done via simple extensions to the registration configuration.
Source registration:
{
"source_event_id": "12345678",
"destination": "https://toasters.example",
"expiry": "604800",
"filter_data": {
"conversion_subdomain": ["electronics.megastore", "electronics2.megastore"],
"product": ["1234"]
// Note that "source_type" will be automatically generated as
// one of {"navigation", "event"}
}
}
Trigger registration:
{
... // existing fields, such as `event_trigger_data`
// Note that "not_filters", which filters with a negation, is also supported.
"filters": {
"conversion_subdomain": ["electronics.megastore"],
// Not set on the source side, so this key is ignored
"directory": ["/store/electronics"]
}
}
For every key that appears in both the filters JSON and filter_data, the browser intersects the two lists of values. If any of these intersections is empty, the trigger is completely ignored.
Note: A key which is present in one JSON and not the other will not be included in the matching logic.
Note: The filter JSON does not support nested dictionaries or lists. filter_data and filters are only allowed to have a list of values with string type.
The event_trigger_data field can also be extended to do selective filtering to set trigger_data based on filter_data:
// Filter by the source type to handle different bit limits.
{
"event_trigger_data": [
{
"trigger_data": "2",
// Note that "not_filters", which filters with a negation, is also supported.
"filters": {"source_type": ["navigation"]}
},
{
"trigger_data": "1",
"filters": {"source_type": ["event"]}
}
]
}
If the filters do not match for any of the event triggers, no event-level report will be created.
If the filters match for multiple event triggers, the first matching event trigger is used.
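The matching rules can be sketched as follows. The function names are hypothetical, not_filters is left out for brevity, and the caller is assumed to have already added the automatically generated source_type entry to the source's filter_data.

```javascript
// Returns true unless some key present on both sides has no value in common.
function filtersMatch(filterData, filters) {
  for (const [key, values] of Object.entries(filters)) {
    // A key present on only one side takes no part in matching.
    if (!Object.prototype.hasOwnProperty.call(filterData, key)) continue;
    const sourceValues = new Set(filterData[key]);
    if (!values.some((value) => sourceValues.has(value))) return false;
  }
  return true;
}

// Returns the first event_trigger_data entry whose filters match,
// or null if none does (in which case no event-level report is created).
function selectEventTrigger(filterData, eventTriggerData) {
  for (const entry of eventTriggerData) {
    if (filtersMatch(filterData, entry.filters ?? {})) return entry;
  }
  return null;
}
```

With the registrations shown above, a navigation source would select the entry with trigger_data "2" and an event source the entry with trigger_data "1".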
The Attribution Reporting API is a new and fairly complex way to do attribution measurement without third-party cookies. As such, we are open to introducing a transitional mechanism to learn more information about attribution reports while third-party cookies are available. This ensures that the API can be fully understood during roll-out, helps flush out any bugs (either in browser or caller code), and makes it easier to compare its performance to cookie-based alternatives.
Source and trigger registrations will both accept a new field debug_key:
{
...
"debug_key": "[64-bit unsigned integer]"
}
Reports will include up to two new parameters which pass any specified debug keys from source and trigger events unaltered:
{
// normal report fields...
"source_debug_key": "[64-bit unsigned integer]",
"trigger_debug_key": "[64-bit unsigned integer]"
}
If a report is created with both source and trigger debug keys, a duplicate debug report will be sent immediately to a .well-known/attribution-reporting/debug/report-event-attribution endpoint. The debug reports will be identical to normal reports, including the two debug key fields. Including these keys in both allows tying normal reports to the separate stream of debug reports.
Note that event-level reports associated with false trigger events will not have trigger_debug_keys. This allows developers to more closely understand how noise is applied in the API.
To ensure that this data (which could enable cross-site tracking) is only available in a transitional phase while third-party cookies are available and are already capable of user tracking, the browser will check (at both source and trigger registration) for the presence of a special SameSite=None cookie set by the reporting origin:
Set-Cookie: ar_debug=1; SameSite=None; Secure; HttpOnly
If a cookie of this form is not present, debugging information will be ignored.
Note that in the context of proposals such as CHIPS, the cookie must be unpartitioned in order to allow debug keys to be registered.
publisher.example wants to show ads on their site, so they contract out to ad-tech.example. ad-tech.example's script in the main document creates a cross-origin iframe to host the third party advertisement for toasters.example.
Within the iframe, ad-tech-3p.example code annotates their anchor tags to use the ad-tech.example reporting origin, and sets the attributionsrc attribute based on the ad that was served (e.g. some ad with id 123456).
<iframe src="https://ad-tech-3p.example/show-some-ad"
allow="attribution-reporting">
...
<a
href="https://toasters.example/purchase"
attributionsrc="https://ad-tech.example?adid=123456">
click me!
</a>
...
</iframe>
A user clicks on the ad and this opens a window that lands on a URL to toasters.example/purchase. In the background, the browser issues an HTTP request to https://ad-tech.example?adid=123456. The ad-tech responds with an Attribution-Reporting-Register-Source JSON header:
{
"source_event_id": "12345678",
"destination": "https://toasters.example",
"expiry": "604800"
}
2 days later, the user buys something on toasters.example. toasters.example triggers attribution on the few different ad-tech companies it buys ads on, including ad-tech.example, by adding conversion pixels:
<img src="..." attributionsrc="https://ad-tech.example/trigger-attribution?model=toastmaster3000&price=$49.99&...">
ad-tech.example receives this request, and decides to trigger attribution on toasters.example. They must compress all of the data into 3 bits, so ad-tech.example chooses to encode the value as "2" (e.g. some bucketed version of the purchase value). They respond to the request with an Attribution-Reporting-Register-Trigger header:
{
"event_trigger_data": [{
"trigger_data": "2"
}]
}
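One way ad-tech.example could arrive at such a value is to bucket the purchase price. The sketch below is illustrative only: the boundaries and function names are made up, and they happen to place a price of 49.99 in bucket 2.

```javascript
// Hypothetical bucket boundaries (in the advertiser's currency). Seven
// boundaries split purchase values into 8 buckets, which fits in 3 bits.
const PRICE_BOUNDARIES = [10, 25, 75, 150, 300, 600, 1000];

// Returns the number of boundaries at or below the price, i.e. a bucket 0..7.
function bucketPurchaseValue(price, boundaries = PRICE_BOUNDARIES) {
  let bucket = 0;
  for (const boundary of boundaries) {
    if (price >= boundary) bucket++;
  }
  return bucket;
}

// Builds the Attribution-Reporting-Register-Trigger header value.
function buildRegisterTriggerHeader(price) {
  return JSON.stringify({
    event_trigger_data: [{ trigger_data: String(bucketPurchaseValue(price)) }],
  });
}

console.log(buildRegisterTriggerHeader(49.99));
// {"event_trigger_data":[{"trigger_data":"2"}]}
```

Since event sources only carry 1 bit of trigger data, a real deployment would likely pair a scheme like this with the source_type filters described earlier.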
The browser sees this response, and schedules a report to be sent. The report is associated with the 7-day deadline as the 2-day deadline has passed. Roughly 5 days later, ad-tech.example receives the following HTTP POST to https://ad-tech.example/.well-known/attribution-reporting/report-event-attribution with the following body:
{
"attribution_destination": "https://toasters.example",
"source_event_id": "12345678",
"trigger_data": "2"
}
Assume the caller uses the same inputs as in the above example, however, the noise mechanism in the browser chooses to generate completely fake data for the source event. This occurs with some probability p.
To generate fake events, the browser considers all possible outputs for a given source event:
- No reports at all
- One report with metadata "0" at the first reporting window
- One report with metadata "1" at the first reporting window and one report with metadata "3" at the second reporting window
- and so on, for every other combination of reports
After enumerating all possible outputs of the API for a given source event, the browser simply selects one of them at random uniformly. Any subsequent true trigger events that would be attributed to the source event are completely ignored.
In the above example, the browser could have chosen to generate three reports:
- One report with metadata "7", sent 2 days after the click
- One report with metadata "3", sent 7 days after the click
- One report with metadata "0", also sent 7 days after the click
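The enumeration step can be sketched as below. The way states are counted here (an unordered collection of up to maxReports reports, each with a reporting window and a trigger data value) is an assumption, as are the function names and the parameter values in the usage lines. Math.random is used only to keep the sketch short.

```javascript
// Lists every possible output for one source: each state is an array of
// { window, triggerData } reports, from zero reports up to maxReports.
function enumerateOutputStates(numWindows, dataCardinality, maxReports) {
  // Every (window, trigger data) combination a single report can take.
  const slots = [];
  for (let w = 0; w < numWindows; w++) {
    for (let d = 0; d < dataCardinality; d++) {
      slots.push({ window: w, triggerData: d });
    }
  }

  const states = [];
  // Slot indices are chosen in non-decreasing order so that the same set of
  // reports is not counted once per ordering.
  function extend(chosen, startIndex) {
    states.push(chosen.map((i) => slots[i]));
    if (chosen.length === maxReports) return;
    for (let i = startIndex; i < slots.length; i++) {
      extend([...chosen, i], i);
    }
  }
  extend([], 0);
  return states;
}

// Picks one state uniformly at random.
function pickFakeOutput(states, random = Math.random) {
  return states[Math.floor(random() * states.length)];
}

// Assumed parameters: 3 windows, 3 bits of data, up to 3 reports.
const navigationStates = enumerateOutputStates(3, 8, 3);
console.log(navigationStates.length); // 2925
console.log(pickFakeOutput(navigationStates));
```

Under the same counting, an event source with one window, 1 bit of data and one report has just 3 states: no report, a report with "0", or a report with "1".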
The browser may apply storage limits in order to prevent excessive resource usage.
Strawman: There should be a limit of 1024 pending sources per source origin.
Strawman: There should be a limit of 1024 pending event-level reports per destination site.
A primary privacy goal of the API is to make linking identity between two different top-level sites difficult. This happens when either a request or a JavaScript environment has two user IDs from two different sites simultaneously.
Secondary goals of the API are to:
- give some level of plausible deniability to cross-site data leakage associated with source events.
- limit the raw amount of cross-site information a site can learn relative to a source event
In this API, the 64-bit source ID can encode a user ID from the publisher’s top-level site, but the low-entropy, noisy trigger data could only encode a small part of a user ID from the advertiser’s top-level site. The source ID and the trigger data are never exposed to a JavaScript environment together, and the request that includes both of them is sent without credentials and at a different time from either event, so the request adds little new information linkable to these events. This allows us to limit the information gained by the ad-tech relative to a source event.
Additionally, there is a small chance that all the output for a given source event is completely fabricated by the browser, giving the user plausible deniability whether subsequent trigger events actually occurred the way they were reported.
Trigger data, e.g. advertiser-side data, is extremely important for critical use cases like reporting the purchase value of a conversion. However, too much advertiser-side data could be used to link advertiser identity with publisher identity.
Mitigations against this are to provide only coarse information (only a few bits at a time), and introduce some noise to the API output. Even sophisticated attackers will therefore need to invoke the API multiple times (through multiple clicks/views) to join identity between sites with high confidence.
Note that this noise still allows for aggregate measurement of bucket sizes with an unbiased estimator (assuming rate-limits are not hit). See generic approaches of dealing with Randomized response for a starting point.
TODO: Update this script to account for the more complicated randomized response approach.
By bucketing reports within a small number of reporting deadlines, it becomes harder to associate a report with the identity of the user on the advertiser’s site via timing side channels.
Reports within the same reporting window occur within an anonymity set with all others during that time period. For example, if we didn’t bucket reports with a delay (and instead sent them immediately after a trigger event), the reports (which contain publisher IDs) could be easily joined up with the advertiser’s first-party information via correlating timestamps.
Note that the delay windows / deadlines chosen represent a trade-off with utility, since it becomes harder to properly assign credit to a click if the time from click to conversion is not known. That is, time-to-conversion is an important signal for proper attribution. Browsers should make sure that this trade-off is concretely evaluated for both privacy and utility before deciding on a delay.
If the advertiser is allowed to cycle through many possible reporting origins, then the publisher and advertiser don’t necessarily have to agree a priori on what origin to use, and which origin actually ends up getting used reveals some extra information.
To prevent this kind of abuse, the browser should limit the number of reporting origins per <source site, destination site> pair, counted per source registration. This should be limited to 100 origins per 30 days.
Additionally, there should be a limit of 10 reporting origins per <source site, destination site, 30 days>, counted for every attribution that is generated.
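A limit of this kind could be tracked roughly as follows. The class name and storage layout are hypothetical, and treating the 30 days as a sliding window measured from each origin's last use is an assumption. The same structure could serve either limit by changing maxOrigins.

```javascript
// Tracks distinct reporting origins per <source site, destination site> pair
// and rejects a new origin once the pair has used maxOrigins within windowMs.
class ReportingOriginLimiter {
  constructor(maxOrigins, windowMs) {
    this.maxOrigins = maxOrigins;
    this.windowMs = windowMs;
    this.originsByPair = new Map(); // pair key -> Map(origin -> last use time)
  }

  allow(sourceSite, destinationSite, reportingOrigin, nowMs) {
    const pairKey = JSON.stringify([sourceSite, destinationSite]);
    let origins = this.originsByPair.get(pairKey);
    if (origins === undefined) {
      origins = new Map();
      this.originsByPair.set(pairKey, origins);
    }
    // Forget origins whose last use fell out of the window.
    for (const [origin, lastUsed] of origins) {
      if (nowMs - lastUsed >= this.windowMs) origins.delete(origin);
    }
    if (!origins.has(reportingOrigin) && origins.size >= this.maxOrigins) {
      return false;
    }
    origins.set(reportingOrigin, nowMs);
    return true;
  }
}

// 100 origins per 30 days, as in the source registration limit above.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
const sourceLimiter = new ReportingOriginLimiter(100, THIRTY_DAYS_MS);
```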
Attribution source data and attribution reports in browser storage should be clearable using existing "clear browsing data" functionality offered by browsers.
To limit the amount of user identity leakage between a <source site, destination> pair, the browser should throttle the amount of total information sent through this API in a given time period for a user. The browser should set a maximum number of attributions per