Price Modeling: The right level of data aggregation

Authors:
Amit Gupta
Hari Hariharan
©2014 Copyright Fractal Analytics, Inc., all rights reserved.
Confidential and proprietary Information of Fractal Analytics Inc. Fractal is a registered trademark of Fractal Analytics Limited.
Abstract
When developing price models, market researchers face the
constant debate around using market and retailer level data
versus store-level data. Aggregated data carries aggregation
bias, which affects parameter estimates. Research shows
aggregation bias can be avoided by examining homogeneous
entities and restraining the projection of estimates.
Store-level data, in comparison, can lead to erroneous estimates
due to noise caused by local effects.
In addition, store-level data is expensive, often has restricted
access and can be hard to work with.
This paper provides a point-of-view and framework to
identify the right level of data that appropriately suits the
level of decision making for efficient and accurate forecasts.
Table of Contents
1. Motivation
2. Objectives and Research questions
3. Issue with Aggregated Data – Aggregation Bias
3.1 What is aggregation bias?
3.2 What are the antecedents of aggregation bias?
3.3 Aggregation bias is bigger when a smaller component of an
aggregated set is exposed
4. Dealing with Bias
4.1 Aggregation bias can be avoided without store-level data
5. Problems with store-level data
6. Proposed Framework
7. References
1. Motivation
CPG companies increasingly rely on econometric models to enable
data-driven marketing decision making in the area of pricing and
promotions. They have realized that such rigorous scientific
decision making has made significant bottom line impact.
Businesses rely on either external consultants or have internal
expertise to design, develop, implement and interpret the results
of these models. All the businesses rely on the two primary
sources of syndicated data, namely IRI and Nielsen, to create and
deploy the econometric models.
When designing the research approach for conducting a
price-and-promo analysis, researchers have to make crucial
decisions regarding the level of data with which to work. In such
studies, the statistical models that need to be built can
leverage the data at different levels of aggregation across
channels (store, market or retailer), products (SKUs or Product
Price Groups) and time (weekly, monthly etc.). In practice, since
it is difficult to deal with the huge number of SKUs, CPG
companies and retailers tend to rely on product price groups for
pricing decisions, and the use of weekly data has become the
normal practice as well. The more debated question is whether to
build models with store-level data or aggregated retailer/account
level data or market level data.
This choice of data for analyses is driven by a few factors. The
first factor is the availability and cost of obtaining data at
the different levels of aggregation. Second, the amount of time
and effort required to work with the data should be considered.
Third and perhaps the most important consideration is the level
of data aggregation that provides the most accurate parameter
estimates, which in turn facilitates accurate sales estimation
and forecasts.
Industry experts also have different perspectives. Some believe
that models built with aggregated data produce biased parameter
estimates, given that the models are inherently non-linear while
the aggregation is linear. The biased estimates in turn produce
inaccurate forecasts. The store-level data has its own challenges
as well. First, the data is based on a sample of stores. Second,
store-level data carries a lot of noise (due to local factors)
which cannot be accounted for in the models, leading to
inaccurate estimates. For example, sales of sodas may run up in a
store because there was a local football match nearby. However,
such data could never get captured, leading to inaccurate
estimates.
2. Objectives and Research questions
The objective of the paper is to help marketers and researchers
align the level of syndicated data aggregation with their
marketing problem.
The specific questions that this paper will address are:
• What are the different levels of data aggregation that can be
used to model price?
• What is the accuracy of the estimates of the different methods
of aggregation?
• How can we handle biases in data?
• How can we handle noise in data?
• How do we choose the right level of aggregation for a given
decision?
Based on the above, we will provide a framework to assess the
alignment of the data aggregation method with the marketing
decision problem.
The framework should help marketers and decision makers to choose
the right level of data aggregation for their specific marketing
decision problems.
3. Issue with Aggregated Data – Aggregation Bias
3.1 What is aggregation bias?
Aggregation bias refers to the distortion in analytic results
caused by aggregating data into a smaller number of data points.
This aggregation can be of data that belongs to different
products, channels, markets or points in time.
Consider the example of two stores in a market. The first store
has a promotional event while the second one doesn’t. When we
model store-level data, the model will capture the lift in the
first store. There is no lift beyond base sales in the second
store. If, on the other hand, we model aggregated data, the model
will capture the lift in sales given 50 percent ACV of feature,
assuming both stores are of equal size.
The lift captured through 50 percent ACV of promotion, when
projected to 100 percent ACV, is likely to give a biased result,
overstating the lift. This is aggregation bias, which emerged
because the data for the two stores were aggregated to build a
single data point to be analyzed.
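The two-store example can be worked through numerically. The
sketch below is illustrative only: the base sales of 100 units
per store, the true promotional multiplier of 2.0, and the
log-linear (multiplicative) response form are assumptions made
for the illustration, not figures from any study.

```python
import math

# Hypothetical inputs (assumptions for illustration only).
BASE_SALES = 100.0      # weekly base sales of each of the two equal-sized stores
TRUE_MULTIPLIER = 2.0   # true sales multiplier in a store that runs the promotion

# Store-level view: store 1 is promoted, store 2 is not.
store_1 = BASE_SALES * TRUE_MULTIPLIER
store_2 = BASE_SALES

# Aggregated view: a single data point with 50 percent ACV of promotion.
acv_share = 0.5
aggregate_multiplier = (store_1 + store_2) / (2 * BASE_SALES)

# A log-linear model, log(sales) = a + b * ACV, fitted to the aggregated
# point implies this coefficient for the promotion variable.
b = math.log(aggregate_multiplier) / acv_share

# Projecting that coefficient to 100 percent ACV.
projected_multiplier = math.exp(b)

print(f"aggregate multiplier at 50% ACV:  {aggregate_multiplier:.2f}")
print(f"projected multiplier at 100% ACV: {projected_multiplier:.2f}")
print(f"true multiplier at 100% ACV:      {TRUE_MULTIPLIER:.2f}")
```

Under these assumptions the projection gives 2.25 against a true
value of 2.00, so the overstatement comes purely from combining a
promoted and a non-promoted store into one data point.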
3.2 What are the antecedents of
aggregation bias?
Heterogeneity is one important antecedent of aggregation bias. In
the example above, the bias arose because the two aggregated
stores had different exposure to promotions: one had the
promotion while the other did not.
Researchers show that biases occur when we aggregate entities
that are not homogeneous, or in other words, don’t witness the
same activity or input (see Link 1995; Wittink et al 1997).
Consider ‘aggregation’ from a wider perspective. We know that
scanner data can be characterized using three different
dimensions and each may have different levels of aggregation
(see Table 1).
Table 1: Dimensions in scanner data
S.No. | Dimension | Levels and aggregation
1 | Channel | Transaction, shopper, store, market, retailer
2 | Product | SKU, size, variant, brand
3 | Time | Day, week, month
As Table 1 illustrates, to some extent aggregation may happen on
each of the dimensions. For example, SKUs are aggregated to size
or variant level, days are aggregated to weeks or months, and
different shoppers within a store are aggregated to total store
sales.
Why then are researchers only concerned with store data
aggregated at the market level, and not with other aggregations?
The reason is simple: aggregation bias occurs only when
heterogeneous entities are grouped together despite their
differing characteristics.
• When we aggregate different shoppers within a store to get to
store-level sales, there is no aggregation bias since they are
all exposed to the same activity in the store.
• Similarly, when we aggregate different SKUs that follow the
same pricing strategy to get a size-level group, there is no
bias. This set of SKUs is termed a Promoted Product Group (PPG).
• When we aggregate days of a week to weekly sales, there is no
bias since all seven days of a week witness the same activity
most of the time.
If we group a set of stores that follow the same pricing and
promotions, there will be no bias because these stores are
homogeneous.
Note: ACV (All Commodity Volume) is a measure of the width or
coverage of a promotional event.
3.3 Aggregation bias is bigger when a
smaller component of an aggregated
set is exposed
Research also suggests that the bias is bigger when a smaller set
of stores in a market are exposed to an activity (see Link 1995).
In other words, in the aggregated model, when we estimate a
parameter for an activity that is 10 percent ACV wide as opposed
to another with 70 percent width, the bias is much larger when
projected to 100 percent ACV for the activity that is 10 percent
wide.
Hence another significant factor that causes or impacts the
extent of bias is the gap between the actual and projected amount
of activity in the aggregated set of stores.
4. Dealing with Bias
The bias that emerges from aggregation can be dealt with in two
different ways:
1. Avoid the bias by using the appropriate level of data
aggregation.
2. Adjust the parameter estimates to mitigate the bias.
Researchers have established several ways of countering the bias
by adjusting the parameter estimates. Research suggests that you
will have to create a custom, category-specific solution to
mitigate the effects of bias (see Wittink et al 1997). This paper
focuses on the ways of avoiding the bias and not on how to adjust
for the bias once it is there.
4.1 Aggregation bias can be avoided without store-level data
While there are ways to adjust the estimates to overcome the
bias, it is better to try to avoid the bias altogether. There are
specific reasons that cause and govern the extent of bias. These
are:
(i) Heterogeneity in aggregated entities
(ii) The gap between the actual occurrence and projected activity
Once these reasons are known, it is possible to avoid the bias
using the following methods:
Meaningful aggregation of homogeneous entities
Most pricing and promotion decisions are made at retailer (e.g.
Kroger) and retailer-market (e.g. Kroger Ohio) combination level.
If researchers use data aggregated at these levels, they will
avoid aggregation bias and at the same time also avoid the noise
that appears in disaggregated store-level data.
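The effect of the gap between actual and projected activity can
be illustrated with a short calculation. This is a minimal
sketch, assuming stores with identical base sales, a log-linear
aggregate model, and a hypothetical true promotional multiplier
of 2.0; the function names are introduced here for illustration
only.

```python
import math


def projected_multiplier(acv_share: float, true_multiplier: float) -> float:
    """Multiplier at 100 percent ACV implied by a log-linear aggregate model.

    Assumes equal base sales across stores and a promotion that multiplies
    sales by `true_multiplier` in the exposed share of ACV.
    """
    if not 0.0 < acv_share <= 1.0:
        raise ValueError("acv_share must be in (0, 1]")
    # Aggregate sales multiplier observed when only part of the ACV is exposed.
    aggregate = (1.0 - acv_share) + acv_share * true_multiplier
    # Coefficient fitted at acv_share, then projected to full coverage.
    return math.exp(math.log(aggregate) / acv_share)


def relative_bias(acv_share: float, true_multiplier: float) -> float:
    """Overstatement of the projected multiplier relative to the true one."""
    return projected_multiplier(acv_share, true_multiplier) / true_multiplier - 1.0


TRUE_MULTIPLIER = 2.0  # hypothetical value, for illustration only
for share in (0.10, 0.70):
    bias = relative_bias(share, TRUE_MULTIPLIER)
    print(f"activity {share:.0%} ACV wide: relative bias {bias:.1%}")
```

Under these assumptions the overstatement is several times larger
for the activity that is 10 percent ACV wide than for the one
that is 70 percent wide, and it vanishes when the activity
already covers 100 percent of ACV.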
Variables definition to limit projections
Another source of bias is projecting the impact of activity that
occurs in a small set of stores to a larger store set. The bias
occurs in parameter estimates for the %-of-stores kind of
variables (e.g. ACV), which are then projected to 100 percent of
the stores.
Such a bias can be avoided by breaking the variable that captures
such activity into multiple variables that capture different
levels of activity. For instance, instead of one ACV Display
variable, create separate variables for 0 percent to 20 percent
ACV of display, 20 percent to 40 percent ACV of display, 40
percent to 60 percent ACV of display, etc. By doing this, the
models provide separate parameter estimates for different levels
of activity, avoiding the need to project the parameter estimate
from a smaller level to a larger level of activity, and hence
avoiding the bias itself.
5. Problems with store-level data
Researchers often state that sales estimation and forecasts are
more precise when done using estimates from the analysis of
market-level data. Why is this the case when we know that the
estimates from store-level data analysis are free from
aggregation bias?
Store-level data often includes a lot of “noise”, which occurs
because of random variations in sales that are not due to any
marketing or pricing decisions, but instead are organic, e.g.
construction activity near a store could drive sales down, a
change in temperature around a store may drive sales up, etc.
Such noise in the data may disturb the parameter estimate and
make it inaccurate or unstable. In other words, the parameter
estimates tend to be inaccurate when we look at such
disaggregated data, even when these estimates are mostly free
from any aggregation bias.
This raises the question: how helpful and reliable are estimates
from store-level data that are free from aggregation bias, but
tend to be inaccurate themselves?
Table 2 suggests that the better approach is to use data
aggregated for homogeneous entities.
Table 2: Issues with different levels of data
Issue | Store-level data | Aggregated store-level data
Accuracy of estimate | Estimates can be inaccurate and unstable, given the noise | Estimates are accurate and stable
Aggregation bias | No aggregation bias | Homogeneous: no aggregation bias. Heterogeneous: avoid aggregation bias by limiting the projection
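The banded variables described under “Variables definition to
limit projections”, which is what limiting the projection refers
to in Table 2, can be sketched as follows. This is a minimal
illustration, not a prescribed implementation: the function name,
the variable naming scheme, the half-open band boundaries, and
the choice to carry the ACV value inside its band (rather than a
0/1 indicator) are all assumptions.

```python
from typing import Dict, List

# Band edges in percent ACV. The 20-point width follows the text; treating
# each band as (lower, upper] is an assumption made for this sketch.
BAND_EDGES = [0, 20, 40, 60, 80, 100]


def band_acv(acv_display: List[float]) -> Dict[str, List[float]]:
    """Split one ACV display series into one variable per activity band.

    Each output variable holds the week's ACV when it falls inside that
    band and 0.0 otherwise, so a week with no display is 0.0 in every band.
    """
    bands = list(zip(BAND_EDGES[:-1], BAND_EDGES[1:]))
    columns: Dict[str, List[float]] = {
        f"acv_display_{lo}_{hi}": [] for lo, hi in bands
    }
    for value in acv_display:
        if not 0 <= value <= 100:
            raise ValueError(f"ACV must be between 0 and 100, got {value}")
        for lo, hi in bands:
            in_band = lo < value <= hi
            columns[f"acv_display_{lo}_{hi}"].append(float(value) if in_band else 0.0)
    return columns


# Hypothetical weekly ACV of display for one retailer-market.
weekly_acv = [0, 10, 35, 20, 90]
banded = band_acv(weekly_acv)
for name, series in banded.items():
    print(name, series)
```

Each banded variable then receives its own parameter estimate, so
an effect estimated on weeks with 10 percent ACV of display is
never extrapolated to weeks with 90 percent.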
6. Proposed Framework
Framework for Assessing Alignment and Marketing Decision Making Focus
Focus of Marketing Decision Making | Decision Granularity, Cadence and Modeling approach | Data used for Models | Assessment of Alignment
Price and promotion | Decision at the account level & weekly decisions | Store-level data for modeling and estimates at the account level | Data is available; Estimates not biased; Uncommon situation; Risk of noise
Price and promotion | Decision at the account level & weekly decisions | Aggregated store-level data for modeling and estimates at the account level | Data is available; Estimates not biased; Uncommon situation
Price and promotion | Decision at retailer-market level & weekly decisions | Store-level data for modeling at retailer-market level | Data is available; Estimates not biased; Common situation; Risk of noise
Price and promotion | Decision at retailer-market level & weekly decisions | Aggregated store-level data for modeling at retailer-market level | Data is available; Estimates not biased; Common situation
Media spend | Regional level across accounts | Store-level data for modeling and estimates at the regional level | Data is available; Estimates not biased; Uncommon situation; Risk of noise
Media spend | Regional level across accounts | Aggregated store-level data for modeling and estimates at the regional level | Data is available; Estimates not biased; Common situation
Table 3: Overall Trade-off Matrix
Criterion | Market level | Retailer level | Retailer-Market level | Store level
Aggregation Bias | May appear because of heterogeneity | Very limited bias given the stores within a retailer are largely homogeneous | No bias given the stores are homogeneous in pricing decisions | No bias as there is no store aggregation
Data Availability | Yes | Yes | Yes | Not readily accessible
Cost | Low | Moderate | High | Very High
Implementation | Quick | Easy | Comprehensive | Tedious
In conclusion, we suggest the following guiding principles:
• Use the level of aggregation at which pricing decisions are
made and business planning is done. This ensures homogeneity and
avoids aggregation bias.
• Avoid projecting the impact of a narrow promotional activity to
a much wider set.
• Store-level data is not always the best solution and it is
certainly not required to mitigate aggregation bias.
• Store-level data can suffer from noise which is hard to model.
• For pricing decisions the ideal level of aggregation is either
Retailer or Retailer-Market level.
7. References
Marcus Christen, Sachin Gupta, John C. Porter, Richard Staelin,
Dick R. Wittink, “Using Market-level Data to Understand Promotion
Effects in a Nonlinear Model”, Journal of Marketing Research
(1997), 322-334
Steven Tenn, “Estimating Promotional Effects with Retailer-Level
Scanner Data” (2003)
Ross Link, “Are aggregate scanner data models biased?” (1995)
About Fractal Analytics
Fractal Analytics is a global analytics firm that helps Fortune
500 companies gain a competitive advantage by providing them a
deep understanding of consumers and tools to improve business
efficiency. Producing accelerated analytics that generate
data-driven decisions, Fractal Analytics delivers insight,
innovation and impact through predictive analytics and visual
story-telling.
Fractal Analytics was founded in 2000 and has 700 people in 12
offices around the world serving clients in over 100 countries.
Fractal Analytics is backed by TA Associates, a global growth
private equity firm, and recently partnered with Aimia, a global
loyalty and consumer insights firm.
The company has earned recognition by industry analysts and has
been named one of the top five “Cool Vendors in Analytics” by
research advisor Gartner.
Fractal Analytics has also been recognized for its rapid growth,
being ranked on the exclusive Inc. 5000 list for the past three
years and also being named among the USPAACC’s Fast 50
Asian-American owned businesses for the past two years.
For more information, contact us at:
+1 650 378 1284
info@fractalanalytics.com
Authors
Amit Gupta
VP Growth – Tech,
amitg@fractalanalytics.com
Hari Haran
VP Global Consulting – CPG, Retail and FSI,
harih@fractalanalytics.com