Asymmetric Information, Adverse Selection and
Seller Revelation on eBay Motors∗
Gregory Lewis†
Department of Economics
University of Michigan
December 28, 2006
Abstract
Since the pioneering work of Akerlof (1970), economists have been aware of the adverse
selection problem that asymmetric information can create in durable goods markets. The
success of eBay Motors, an online auction market for used cars, thus poses something of a
puzzle. I argue that the key to resolving this puzzle is the auction webpage, which allows
sellers to reduce information asymmetries by revealing credible signals about car quality
in the form of text, photos and graphics. To validate my claim, I develop a new model of
a common value eBay auction with costly seller revelation and test its implications using
a unique panel dataset of car auctions from eBay Motors. Precise testing requires a structural estimation approach, so I develop a new pseudo-maximum likelihood estimator for
this model, accounting for the endogeneity of information with a semiparametric control
function. I find that bidders adjust their valuations in response to seller revelations, as the
theory predicts. To quantify how important these revelations are in reducing information
asymmetries, I simulate a counterfactual in which sellers cannot post information to the
auction webpage and show that in this case a significant gap between vehicle value and
expected price can arise. Since this gap is dramatically narrowed by seller revelations, I conclude that information asymmetry and adverse selection on eBay Motors are endogenously
reduced by information transmitted through the auction webpage.
∗ I would like to thank the members of my thesis committee, Jim Levinsohn, Scott Page, Lones Smith and
especially Pat Bajari for their helpful comments. I have also benefited from discussions with Jim Adams,
Tilman Börgers, Han Hong, Kai-Uwe Kühn, Serena Ng, Nicola Persico, Dan Silverman, Jagadeesh Sivadasan,
Doug Smith, Charles Taragin and seminar participants at the Universities of Michigan and Minnesota. All
remaining errors are my own.
† Email: gmlewis@umich.edu
1 Introduction
eBay Motors is, at first glance, an improbable success story. Ever since Akerlof’s classic paper,
the adverse selection problems created by asymmetric information have been linked with their
canonical example, the used car market. One might think that an online used car market,
where buyers usually don’t see the car in person until after purchase, couldn’t really exist.
And yet it does - and it prospers. eBay Motors is the largest used car market in America,
selling over 15000 cars in a typical month. This poses a question: How does this market limit
asymmetric information problems?
The answer, this paper argues, is that the seller may credibly reveal his private information
through the auction webpage. Consider Figure 1, which shows screenshots from an eBay
Motors auction for a Corvette Lingenfelter. The seller has provided a detailed text description
of the features of the car and taken many photos of both the interior and the exterior. He has
in addition taken photographs of original documentation relating to the car, including a series
of invoices for vehicle modifications and an independent analysis of the car’s performance. This
level of detail appears exceptional, but it is in fact typical: in most eBay car auctions sellers
post many photos, a full text description of the car’s history and features, and sometimes
graphics showing the car’s condition.
I find strong empirical evidence that buyers respond to these seller revelations. The data
are consistent with a model of costly revelation, in which sellers weigh costs and benefits in
deciding how much detail about the car to provide. An important prediction of this model is
that bidders’ prior valuations should be increasing in the quantity of information posted on the
webpage. I find that this is the case. Using the estimated structural parameters, I simulate a
counterfactual in which sellers cannot reveal their private information on the auction webpage.
In that case a significant gap arises between the value of the vehicle and the expected price.
I quantify this price-value gap, showing that seller revelations dramatically narrow it. Since
adverse selection occurs when sellers with high quality cars are unable to receive a fair price
for their vehicle, I conclude that these revelations substantially limit asymmetric information
and thus the potential for adverse selection.
My empirical analysis of this market proceeds through the following stages. Initially, I run
a series of hedonic regressions to demonstrate that there is a large, significant and positive
relationship between auction prices and the number of photos and bytes of text on the auction
webpage. To explain this finding, I propose a stylized model of a common values auction
on eBay with costly revelation by sellers, and subsequently prove that in equilibrium, there
is a positive relationship between the number of signals revealed by the seller and the prior
valuation held by bidders.
Figure 1: Information on an auction webpage. On this auction webpage, the seller has provided
many different forms of information about the Corvette he is selling. These include the standardized
eBay description (top panel), his own full description of the car’s options (middle panel), and many
photos (two examples are given in the bottom panel). The right photo shows the results of a car
performance analysis done on this vehicle.
In the third stage, I test this prediction. A precise test requires
that I structurally estimate my eBay auction model, and I develop a new pseudo-maximum
likelihood estimation approach to do so. I consider the possibility that my information measure
is endogenous, providing an instrument and a semiparametric control function solution for
endogeneity. The final step is to use the estimated parameters of the structural model to
simulate a counterfactual and quantify the impact of public seller revelations in reducing the
potential for adverse selection.
The first of these stages requires compiling a large data set of eBay Motors listings for a 6 month
period, containing variables relating to item, seller and auction characteristics. I consider two
measures of the quantity of information on an auction webpage: the number of photographs,
and the number of bytes of text. Through running a series of hedonic regressions, I show that
these measures are significantly and positively correlated with price. The estimated coefficients
are extremely large, and this result proves robust to controls for marketing effects, seller size
and seller feedback. This suggests that it is the webpage content itself rather than an outside
factor that affects prices, and that the text and photos are therefore genuinely informative. It
leaves open the question of why the quantity of information is positively correlated with price.
I hypothesize that this is an artifact of selection, where sellers with good cars put up more
photos and text than those selling lemons.
To formally model this, in the second stage I propose a stylized model of an eBay auction.
Bidders are assumed to have a common value for the car, since the common values framework
allows for bidder uncertainty, and such uncertainty is natural in the presence of asymmetric
information.1 Sellers have access to their own private information, captured in a vector of
private signals, and may choose which of these to reveal. The seller’s optimal strategy is
characterized by a vector of cutoff values, so that a signal is revealed if and only if it is
sufficiently favorable. In equilibrium, the model predicts that bidders’ prior valuation for the
car being sold is increasing in the number of signals revealed by the seller.
I test this prediction in the third stage by structurally estimating the common value auction
model. It is necessary to proceed structurally because the prediction is in terms of a latent
variable - the bidder’s prior value - and is thus only directly testable in a structural model.
Strategic considerations arising from the Winner’s Curse preclude an indirect test based on the
relationship between price and information measures. In estimating the model, I propose a new
pseudo-maximum likelihood estimation approach that addresses two of the common practical
concerns associated with such estimation. One of the concerns is that in an eBay context it is
1 I also show that this assumption finds support in the data. In an extension, to be found in the appendix, I test the symmetric common values framework used here against a symmetric private values alternative. I reject the private values framework at the 5% level.
difficult to determine which bids actually reflect the bidder’s underlying valuations, and which
bids are spurious. Using arguments similar to those made by Song (2004) for the independent
private values (IPV) case, I show that in equilibrium the final auction price is in fact equal
to the valuation of the bidder with second most favorable private information. I obtain a
robust estimator by maximizing a function based on the observed distribution of prices, rather
than the full set of observed bids. The second concern relates to the computational cost of
computing the moments implied by the structural model, and then maximizing the resulting
objective function. I show that by choosing a pseudo-maximum likelihood approach, rather
than the Bayesian techniques of Bajari and Hortaçsu (2003) or the quantile regression approach
of Haile, Hong, and Shum (2003), one is able to use a nested loop procedure to maximize
the objective function that limits the curse of dimensionality. This approach also provides
transparent conditions for the identification of the model.
The theory predicts that bidders respond to the information content of the auction webpage,
but this is unobserved by the econometrician. Moreover, under the theory, this omitted content
is certainly correlated with quantitative measures of information such as the number of photos
and number of bytes of text. To address this selection problem I adopt a semiparametric
control function approach, forming a proxy for the unobserved webpage content using residuals
from a nonparametric regression of the quantity of information on exogenous variables and an
instrument. I exploit the panel data to obtain an instrument, using the quantity of information
provided for the previous car sold by that seller. As the theory predicts, I find that the quantity
of information is endogenous and positively correlated with the prior mean valuation of bidders.
My results suggest that the information revealed on the auction webpage reduces the information asymmetry between the sellers and the bidders. One would thus expect that seller
revelations limit the potential for adverse selection. My final step is to investigate this conjecture with the aid of a counterfactual simulation. I simulate the bidding functions and compute
expected prices for the observed regime and a counterfactual in which sellers cannot reveal
their private information through the auction webpage. In the counterfactual regime, sellers
with a “peach” - a high quality car - cannot demonstrate this to bidders except through private
communications. The impact of removing this channel for information revelation is significant,
driving a wedge between the value of the car and the expected price paid by buyers. I find
that a car worth 10% more than the average car with its characteristics will sell for only 6%
more in the absence of seller revelation. This gap disappears when sellers can demonstrate the
quality of their vehicle by posting text, photos and graphics to the auction webpage.
I make a number of contributions to the existing literature. Akerlof (1970) showed that in a
market with information asymmetries and no means of credible revelation, adverse selection
can occur. Grossman (1981) and Milgrom (1981) argued that adverse selection can be avoided
when the seller can make ex-post verifiable and costless statements about the product he is
selling. I consider the intermediate case where the seller must pay a cost for credibly revealing
his private information and provide structural analysis of a market where this is the case.
Estimation and simulation allow me to quantify by how much credible seller revelation reduces
information asymmetries and thus the potential for adverse selection. This contributes to the
sparse empirical literature on adverse selection, which has until now focused mainly on insurance
markets (Cohen and Einav (2006), Chiappori and Salanié (2000)).
In the auctions literature Milgrom and Weber (1982) show that in a common value auction
sellers should commit a priori to revealing their private information about product quality if
they can do so. But since private sellers typically sell a single car on eBay Motors, there is no
reasonable commitment device and it is better to model sellers as first observing their signals
and then deciding whether or not to reveal. This sort of game is tackled in the literature on
strategic information revelation. The predictions of such models depend on whether revelation
is costless or not: in games with costless revelation, Milgrom and Roberts (1986) have shown
that an unravelling result obtains whereby sellers reveal all their private information. By
contrast, Jovanovic (1982) shows that with revelation costs it is only the sellers with sufficiently
favorable private information that will reveal. Shin (1994) obtains a similar partial revelation
result in a setting without revelation costs by introducing buyer uncertainty about the quality
of the seller’s information. My model applies these insights in an auction setting, showing
that with revelation costs, equilibrium is characterized by partial revelation by sellers. I also
extend to the case of multidimensional private information, showing that the number of signals
revealed by sellers is on average positively related to the value of the object. This relates the
quantity of information to its content.
The literature on eBay has tended to focus on the seller reputation mechanism. Various authors
have found that prices respond to feedback ratings, and have suggested that this is due to the
effect of these ratings in reducing uncertainty about seller type (Resnick and Zeckhauser (2002),
Melnik and Alm (2002), Houser and Wooders (2006)). I find that on eBay Motors the size
of the relationship between price and information measures that track webpage content is far
larger than that between price and feedback ratings. This suggests that in analyzing eBay
markets with large information asymmetries, it may also be important to consider the role
of webpage content in reducing product uncertainty. I contribute to the literature here by
providing simple information measures, and showing that they explain a surprisingly large
amount of the underlying price variation in this market.
Finally, I make a number of methodological contributions with regard to auction estimation.
Bajari and Hortaçsu (2003) provide a clear treatment of eBay auctions, offering a model which
can explain late bidding and a Bayesian estimation strategy for the common values case. Yet
their model implies that an eBay auction is strategically equivalent to a second price sealed bid
auction, and that all bids may be interpreted as values. I incorporate the insights of Song (2004)
in paying more careful attention to bid timing and proxy bidding, and show that only the final
price in an auction can be cleanly linked to the distribution of bidder valuations. I provide
a pseudo maximum likelihood estimation strategy that uses only the final price from each
auction, and is therefore considerably more robust. There are also computational advantages
to my common values estimation approach relative to that of Bajari and Hortaçsu (2003) and
the quantile regression approach of Hong and Shum (2002). One important potential problem
with all these estimators lies in not allowing for unobserved heterogeneity and endogeneity. For
the independent private values case, Krasnokutskaya (2002) and Athey, Levin, and Seira (2004)
have provided approaches for dealing with unobserved heterogeneity. Here I develop conditions
under which a semiparametric control function may be used to deal with endogeneity provided
an appropriate instrument can be found. In sum, I provide a robust and computable estimator
for common value eBay auctions. I also provide an implementation of the test for common
values suggested by Athey and Haile (2002) in the appendix, using the subsampling approach
of Haile, Hong, and Shum (2003).
This research leads me to three primary conclusions. First, webpage content in online auctions
informs bidders and is an important determinant of price. Bidder behavior is consistent with a
model of endogenous seller revelation, where sellers only reveal information if it is sufficiently
favorable, and thus bidders bid less on objects with uninformative webpages. Second, quantitative measures of information serve as good proxies for content observed by buyers, but not the
econometrician. This follows from the theory which yields a formal link between the number
of signals revealed by sellers and the content of those signals. Finally, and most important,
the auction webpage allows sellers to credibly reveal the quality of their vehicles. Although
revelation costs prevent full revelation, information asymmetries are substantively limited by
this institutional feature and the gap between values and prices is considerably smaller than it
would be if such revelations were not possible. I conclude that seller revelations through the
auction webpage reduce the potential for adverse selection in this market.
The paper proceeds as follows. Section 2 outlines the market, and section 3 presents the reduced form empirical analysis. Section 4 introduces the auction model with costly information
revelation, while section 5 describes the estimation approach. The estimation results follow in
section 6. Section 7 details the counterfactual simulation and section 8 concludes.2
2 Proofs of all propositions, as well as most estimation results, are to be found in the appendix.
2 eBay Motors
eBay Motors is the automobile arm of online auctions giant eBay. It is the largest automotive
site on the Internet, with over $1 billion in revenues in 2005 alone. Every month, approximately
95000 passenger vehicles are listed on eBay, and about 15% of these will be sold on their first
listing.3 This trading volume dwarfs those of its online competitors, the classified services
cars.com, autobytel.com and Autotrader.com. In contrast to these sites, most of the sellers
on eBay Motors are private individuals, although dealers still account for around 30% of the
listings.
Listing a car on eBay Motors is straightforward. For a fixed fee of $40, a seller may post a
webpage with photos, a standardized description of the car, and a more detailed description
that can include text and graphics. The direct costs of posting photos, graphics and text
are negligible: text and graphics are free, while each additional photo costs $0.15. But the
opportunity costs are considerably higher, as it is time-consuming to take, select and upload
photos, write the description, generate graphics, and fill in the forms required to post all of
these to the auction webpage.4
Figure 2 shows part of the listing by a private seller who used eBay’s standard listing template.
In addition to the standardized information provided in the table at the top of the listing, which
includes model, year, mileage, color and other features, the seller may also provide his own
detailed description. In this case the seller has done poorly, providing just over three lines of
text. He also provided just three photos. This car, a 1998 Ford Mustang with 81000 miles on
the odometer, received only four bids, with the highest bid being $1225.
Figure 3 shows another 1998 Ford Mustang, in this case with only 51000 miles. The seller is a
professional car dealership, and has used proprietary software to create a customized auction
webpage. This seller has provided a number of pieces of useful information for buyers: a text
based description of the history of the car, a full itemization of the car features in a table, a
free vehicle history report through CARFAX, and a description of the seller’s dealership. This
listing also included 28 photos of the vehicle. The car was sold for $6875 after 37 bids were
placed on it.
The difference between these outcomes is striking. The car listed with a detailed and informative webpage sold for $1775 more than the Kelley Blue Book private party value, while that
with a poor webpage sold for $3475 less than the corresponding Kelley Blue Book value. This
3 The remainder are either re-listed (35%), sold elsewhere or not traded. For the car models in my data, sales are more prevalent, with about 30% sold on their first listing.
4 Professional car dealers will typically use advanced listing management software to limit these costs. Such software is offered by CARad (a subsidiary of eBay), eBizAutos and Auction123.
Figure 2: Example of a poor auction webpage. This webpage consists mainly of the standardized information required for a listing on eBay. The seller has provided little additional information
about either the car or himself. Only three photos of the car were provided. In the auction, the seller
received only four bids, with the highest bid being $1225. The Kelley Blue Book private party value,
for comparison, is $4700.
Figure 3: Example of a good auction webpage. The dealer selling this vehicle has used
proprietary software to create a professional and detailed listing. The item description, seen on the
right, contains information on the car, a list of all the options included, warranty information, and a
free CARFAX report on the vehicle history. The dealer also posted 28 photos of the car. This car
received 37 bids, and was sold for $6875. The Kelley Blue Book retail price, for comparison, is $7100,
while the private party value is $5100.
Figure 4: Additional information. The left panel shows a graphic that details the exterior condition of the vehicle. The right panel shows the Kelley Blue Book information for the model-year of
vehicle being auctioned.
suggests that differences in the quality of the webpage can be important for auction prices.
How can webpages differ? Important sources of differences include the content of the text-based description, the number of photos provided, the information displayed in graphics, and
information from outside companies such as CARFAX. Examples of the latter two follow
in Figure 4. In the left panel I show a graphic depicting the condition of the exterior of the
car. In the right panel, I show the Kelley Blue Book information posted by a seller in order to
inform buyers about the retail value of the vehicle.
It is clear that many of these pieces of information, such as photos and information from outside
institutions, are hard to fake. Moreover, potential buyers have a number of opportunities to
verify this information. Potential buyers may also choose to acquire vehicle history reports
through CARFAX or Autocheck.com, or purchase a third party vehicle inspection from an
online service for about $100. Bidders may query the seller on particular details through
direct communication via e-mail or telephone. Finally, blatant misrepresentation by sellers
is subject to prosecution as fraud. In view of these institutional details, I will treat seller
revelations as credible throughout this paper. All of these sources of information - the text,
photos, graphics and information from outside institutions, as well as any private information
acquired - act to limit the asymmetric information problem faced by bidders. The question
I wish to examine is how important these sources of information are for auction prices. This
question can only be answered empirically, and so it is to the data that I now turn.
3 Empirical Regularities
3.1 Data and Variables
I collected data from eBay Motors auctions over a six month period.5 I drop observations with
nonstandard or missing data, re-listings of cars, and those auctions which received fewer than
two bids.6 I also drop auctions in which the webpage was created using proprietary software.7
The resulting dataset consists of over 18000 observations of 18 models of vehicle. The models of
vehicle may be grouped into three main types: those which are high volume Japanese cars (e.g.
Honda Accord, Toyota Corolla), a group of vintage and newer “muscle” cars (e.g. Corvette,
Mustang), and most major models of pickup truck (e.g. Ford F-series, Dodge Ram).
The data contain a number of item characteristics including model, year, mileage, transmission
and title information. I also observe whether the vehicle had an existing warranty, and whether
the seller has certified the car as inspected.8 In addition, I have data on the seller’s eBay
feedback and whether they provided a phone number and address in their listing. I can also
observe whether the seller has paid an additional amount to have the car listing “featured”,
which means that it will be given priority in searches and highlighted.
Two of the most important types of information that can be provided about the vehicle by the
seller are the vehicle description and the photos. I consider two simple quantitative measures
of these forms of information content. The first measure is the number of bytes of text in the
vehicle description provided by the seller. The second measure is the number of photos posted
on the webpage. I use both of these variables in the hedonic regressions in the section below.
5 I downloaded the auction webpages for the car models of interest every twenty days, recovered the urls for the bid history, and downloaded the history. I implemented a pattern matching algorithm in Python to parse the html code and obtain the auction variables.
6 I do this because, in order to estimate my structural auction model, I need at least two bids per auction. Examining the data, I find that those auctions with fewer than two bids are characterized by extremely high starting bids (effectively reserves), but are otherwise similar to the rest of the data. One respect in which there is a significant difference is that in these auctions the seller has provided less text and fewer photos. This is in accordance with the logic detailed in the rest of the paper.
7 Webpages created using proprietary software are often based on standardized templates, and some of the text of the item description is not specific to the item being listed. Since one of my information measures is the number of bytes of text, comparisons of standard eBay listings with those generated by advanced software are not meaningful.
8 This certification indicates that the seller has hired an outside company to perform a car inspection. The results of the inspection can be requested by potential buyers.
3.2 Relating Prices and Information
In order to determine the relationship between the amount of text and photos and the final
auction price, I run a number of hedonic regressions. Each specification has the standard form:
log(pj ) = zj β + εj    (1)
where zj is a vector of covariates.9 I estimate the relationship via ordinary least squares (OLS),
including model and title fixed effects.10 I report the results in Table 1, suppressing the fixed
effects.
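For concreteness, the specification can be sketched in Python with statsmodels; the dataset and column names are hypothetical stand-ins for the variables described in section 3.1, with C(model) and C(title) absorbing the fixed effects.

```python
import numpy as np
import statsmodels.formula.api as smf

def run_hedonic(df):
    """OLS of log price on covariates with model and title fixed effects."""
    df = df.assign(log_price=np.log(df["price"]),
                   log_text=np.log(df["text_bytes"]),
                   log_miles=np.log(df["miles"]))
    formula = ("log_price ~ age + I(age ** 2) + log_miles + log_text"
               " + n_photos + manual + C(model) + C(title)")
    return smf.ols(formula, data=df).fit()

# res = run_hedonic(df)   # df: one row per auction
# print(res.params[["log_text", "n_photos"]])
```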
In the first specification, the vector of covariates includes standard variables, such as age,
mileage and transmission, as well as the log number of bytes of text and the number of photos.
The coefficients generally have the expected sign and all are highly significant.11 Of particular
interest is the sheer magnitude of the coefficients on the amount of text and photos. I find
that postings with twice the amount of text sell for 8.7% more, which for the average car in
the dataset is over $800 more. Likewise, each additional photo on the webpage is associated
with a selling price that is 2.6% higher - a huge effect.
I am not asserting a causal relationship between the amount of text and photos and the auction
price. My claim is that buyers are using the content of the text and photos to make an informed
decision as to the car’s value. The content of this information includes the car’s options, the
condition of the exterior of the vehicle, and vehicle history and usage, all of which are strong
determinants of car value. Moreover, as I will show in the structural model below, buyers will
in equilibrium interpret the absence of information as a bad signal about vehicle quality, and
adjust their bids downwards accordingly. I will also show that the amount of information is an
effective proxy for the content of that information. Thus my interpretation of this regression
is that the large coefficients on text and photos arise because these measures are proxying for
attributes of the vehicle that are unobserved by the econometrician.
In the next four regressions, I explore alternate explanations. The first alternative is that this
is simply a marketing effect, whereby slick webpages with a lot of text and photos attract
more bidders and thus the cars sell for higher prices. I therefore control for the number of
bidders, and also add a dummy for whether the car was a featured listing. I find that adding
these controls does reduce the information coefficients, but they still remain very large. A
9 I use the log of price instead of price as the dependent variable so that it has range over the entire real line.
10 The seller may report the title of the car as clean, salvage, other or unspecified. The buyer may check this information through CARFAX.
11 One might have expected manual transmission to enter with a negative coefficient, but I have a large number of convertible cars and pickups in my dataset, and for these, a manual transmission may be preferable.
Table 1: Hedonic Regressions

                                 (1)        (2)        (3)        (4)        (5)
Age                           -0.1961    -0.1935    -0.1931    -0.1924    -0.1875
                              (0.0019)   (0.0018)   (0.0018)   (0.0018)   (0.0020)
Age Squared                    0.0039     0.0038     0.0038     0.0038     0.0037
                              (0.0000)   (0.0000)   (0.0000)   (0.0000)   (0.0000)
Log of Miles                  -0.1209    -0.1225    -0.1221    -0.1215    -0.1170
                              (0.0041)   (0.0040)   (0.0040)   (0.0040)   (0.0041)
Log of Text Size (in bytes)    0.0852     0.0686     0.0816     0.0794     0.0978
                              (0.0073)   (0.0072)   (0.0075)   (0.0076)   (0.0085)
Number of Photos               0.0269     0.0240     0.0241     0.0234     0.0260
                              (0.0009)   (0.0009)   (0.0009)   (0.0009)   (0.0010)
Manual Transmission            0.0349     0.0348     0.0328     0.0338     0.0319
                              (0.0124)   (0.0122)   (0.0122)   (0.0122)   (0.0121)
Featured Listing                          0.2147     0.2112     0.2113     0.2125
                                         (0.0211)   (0.0210)   (0.0211)   (0.0210)
Number of Bidders                         0.0355     0.0356     0.0355     0.0357
                                         (0.0015)   (0.0015)   (0.0015)   (0.0015)
Log Feedback                                        -0.0211    -0.0200    -0.0179
                                                    (0.0033)   (0.0033)   (0.0033)
Percentage Negative Feedback                        -0.0031    -0.0031    -0.0032
                                                    (0.0015)   (0.0015)   (0.0015)
Total Number of Listings                                       -0.0020    -0.0019
                                                               (0.0015)   (0.0015)
Phone Number Provided                                           0.1151     0.1193
                                                               (0.0123)   (0.0122)
Address Provided                                               -0.0498    -0.0510
                                                               (0.0183)   (0.0182)
Warranty                                                                   0.8258
                                                                          (0.1638)
Warranty*Logtext                                                          -0.0738
                                                                          (0.0248)
Warranty*Photos                                                           -0.0189
                                                                          (0.0031)
Inspection                                                                 0.6343
                                                                          (0.1166)
Inspection*Logtext                                                        -0.0654
                                                                          (0.0176)
Inspection*Photos                                                         -0.0054
                                                                          (0.0022)
Constant                       9.7097     9.5918     9.5746     9.5704     9.2723
                              (0.0995)   (0.0980)   (0.0980)   (0.0980)   (0.1025)
Estimated standard errors are given in parentheses. The model fits well, with the R2 ranging from 0.58 in (1)
to 0.61 in (5). The nested specifications (2)-(4) include controls for marketing effects, seller feedback and dealer
status. Specification (5) includes warranty and inspection dummies and their interactions with the information
measures.
second explanation is that the amount of text and photos is somehow correlated with seller
feedback ratings, which are often significant determinants of prices on eBay. I find not only
that including these controls has little effect on the information coefficients, but that for this
specific market the effects of seller reputation are extremely weak. The percentage negative
feedback has a very small negative effect on price, while the coefficient on total log feedback
is negative, which is the opposite of what one would expect. A possible reason for these
results is that for the used car market, the volume of transactions for any particular seller is
small and this makes seller feedback a weak measure of seller reputation. Another problem is
that feedback is acquired through all forms of transaction, and potential buyers may only be
interested in feedback resulting from car auctions.
Returning to the question of alternative explanations, a third possibility is that frequent sellers
such as car dealerships tend to produce better webpages, since they have stronger incentives to
develop a good template than less active sellers. Then if buyers prefer to buy from professional
car dealers, I may be picking up this preference rather than the effects of information content.
To control for this, I include the total number of listings by the seller in my dataset as a
covariate, as well as whether they posted a phone number and an address. All of these measures
track whether the seller is a professional car dealer or not. I report the results in column
(4). I find that although posting a phone number is valuable (perhaps because it facilitates
communication), both the number of listings and the address have negative coefficients. That
is, there is little evidence that buyers are willing to pay more to buy from a frequent lister
or a firm that posts its address. Again, the coefficients on text and photos remain large and
significant.
Finally, I try to test my hypothesis that text and photos matter because they inform potential
buyers about car attributes that are unobserved by the econometrician. To do this, I include
dummies for whether the car is under warranty, and has been certified by the seller as inspected.
Both of these variables should reduce the value of information, because the details of car
condition and history become less important if one knows that the car is under warranty or
has already been certified as being in excellent condition. In line with this idea, I include
interaction terms between the warranty and inspection dummies and the amount of text and
photos. The results in column (5) indicate that the interaction terms are significantly negative
as expected, while both the inspection and warranty significantly raise the expected price of
the vehicle. For cars that have neither been inspected nor are under warranty, the coefficients
on text and photos increase.
There are problems with a simple hedonic regression, and in order to deal with the problem
of bids being strategic and different from the underlying valuations, I will need a structural
model. I will also need to demonstrate that my measures of the quantity of information are
effective proxies for the content of that information. Yet the reduced form makes the point
that the amount of text and photos are positively related to auction price, and that the most
compelling explanation for this is that their content gives valuable information to potential
buyers.
4 Theory
Modeling demand on eBay is not a trivial task. The auction framework has a bewildering
array of institutional features, such as proxy bidding, secret reserves and sniping software.
With this in mind, it is critical to focus on the motivating question: how does seller revelation
of information through the auction webpage affect bidder beliefs about car value, and thereby
behavior and prices?
I make three important modeling choices. The first is to model an auction on eBay Motors as a
symmetric common value auction. In this model, agents have identical preferences, but are all
uncertain as to the value of the object upon which they are bidding. The bidders begin with
a common prior distribution over the object’s value, but then draw different private signals,
which generates differences in bidding behavior. This lies in contrast to a private values model,
in which agents are certain of their private valuation, and differences in bidding behavior result
from differences in these private valuations.
I believe that the common value framework is better suited to the case of eBay Motors than
the private values framework for a number of reasons. First, it allows bidders to be uncertain
about the value of the item they are bidding on, and this uncertainty is necessary for any model
that incorporates asymmetric information and the potential for adverse selection. Second, the
data support the choice of a common values model. I show this in the appendix, where I
implement the Athey and Haile (2002) test of symmetric private values against symmetric
common values, and reject the private value null hypothesis at the 5% level.12 Third, it
provides a clean framework within which one may think clearly about the role of the auction
webpage in this market. The auction webpage is the starting point for all bidders in an online
auction, and can thus be modeled as the source of the common prior that bidders have for the
object’s value. I can identify the role of information on eBay Motors by examining its effect
on the distribution of the common prior. Finally, though market participants on eBay Motors
12 The test is based on the idea that the Winner’s Curse is present in common values auctions, but not in private values auctions. This motivates a test based on stochastic dominance relationships between bid distributions as the number of bidders n, and thus the Winner’s Curse, increases.
will in general have different preferences, it may be reasonable to assume that those bidders
who have selected into any particular auction have similar preferences. This is captured in the
common values assumption.13
The second modeling choice is to use a modified version of the Bajari and Hortaçsu (2003)
model of late bidding on eBay. On eBay Motors, I observe many bids in the early stages of the
auction that seem speculative and unreflective of underlying valuations; while in the late stages
of the auction, I observe more realistic bids. With late bidding, bidders often do not have time
to best respond to opponents’ play. “Sniping” software can be set to bid on a player’s behalf
up to one second before the end of the auction, and bidders cannot best respond to these late
bids. Bajari and Hortaçsu (2003) capture these ideas in a two stage model, in which bidders
may observe each other’s bids in the early stage, but play the second stage as though it were
a sealed bid second price auction. I vary from their model by taking account of the timing of
bids in the late stage, since a bid will only be recorded if it is above the current price.14
Accounting for timing is important, as an example will make clear. Suppose that during the
early stage of an auction bidder A makes a speculative bid of $2000. As the price rises and it
becomes obvious this bid will not suffice, she programs her sniping software to bid her valuation
of $5600 in the last five seconds of an auction. Now suppose that bidder B bids $6000 an hour
before the end of the auction, and so when bidder A’s software attempts to make a bid on her
behalf at the end of the auction, the bid is below the standing price and hence not recorded.
The highest bid by bidder A observed during the auction is thus $2000. An econometrician
who treats an eBay auction as simply being a second price sealed bid auction may assume that
bidder A’s valuation of the object is $2000, rather than $5600. This will lead to inconsistent
parameter estimates. I show that regardless of bid timing, the final price in the auction with
late bidding is the valuation of the bidder with the second highest private signal. This allows
me to construct a robust estimation procedure based on auction prices, rather than the full
distribution of bids.
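The recording rule is easy to replay in code. The toy version below assumes eBay’s proxy mechanics in simplified form (a bid is recorded only if it beats the standing price, taken here to be the second-highest proxy bid so far), and adds a third bidder C who pushes the standing price above bidder A’s planned snipe:

```python
def replay(bids):
    """bids: (time, bidder, amount) triples, processed in time order.
    Returns the recorded bids and the final price (second-highest proxy)."""
    recorded, proxies = [], {}
    for t, bidder, amount in sorted(bids):
        standing = sorted(proxies.values())[-2] if len(proxies) > 1 else 0.0
        if amount > standing:             # bids below the standing price vanish
            proxies[bidder] = max(amount, proxies.get(bidder, 0.0))
            recorded.append((t, bidder, amount))
    return recorded, sorted(proxies.values())[-2]

bids = [(1.0, "A", 2000),   # A's speculative early bid
        (22.0, "B", 6000),  # B bids an hour before the close
        (23.0, "C", 5800),  # C pushes the standing price above 5600
        (23.9, "A", 5600)]  # A's snipe arrives below the standing price
recorded, price = replay(bids)
print(max(a for _, b, a in recorded if b == "A"))  # 2000, not A's value 5600
print(price)                                       # 5800, second-highest value
```

The highest observed bid for A understates her valuation by $3600, while the final price still equals the second-highest valuation among the bidders; this is the logic behind estimating from prices alone.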
Lastly, I allow for asymmetric information, and let the seller privately know the realizations
of a number of signals affiliated with the value of the car. One may think of these signals as
describing the car history, condition and features. The seller may choose which of these signals
to reveal on the auction webpage, but since writing detailed webpages and taking photos is
costly, such revelations are each associated with a revelation cost. In line with my earlier
description of the institutional details of eBay Motors, all revelations made by the seller are
13 One could consider a more general model of affiliated values, incorporating both common and private value elements. Unfortunately, without strong assumptions, such a model will tend to be unidentified.
14 Song (2004) also makes this observation, and uses it to argue that in the independent private values (IPV) context, the only value that will be recorded with certainty is that of the second highest bidder.
modeled as being credible. This assumption prevents the model from being a signaling model
in the style of Spence (1973).15 Instead, this setup is similar to the models of Grossman (1981)
and Milgrom (1981), in which a seller strategically and costlessly makes ex-post verifiable
statements about product quality to influence the decision of a buyer. In these models, an
unravelling result obtains, whereby buyers are skeptical and assume the worst in the absence
of information, and consequently sellers reveal all their private information.
My model differs in three important respects from these benchmarks. First, I allow for revelation costs. This limits the incentives for sellers to reveal information, and thus the unravelling
argument no longer holds. Second, the parties receiving information in my model are bidders,
who then proceed to participate in an auction, inducing a multi-stage game. Finally, I allow
the seller’s private information to be multidimensional. While each of these modifications has
received treatment elsewhere,16 there is no unified treatment of revelation games with all these
features, and attempting a full analysis of the resulting multi-stage game is beyond the scope of
this paper. Instead I provide results for the case in which the relationship between the seller’s
private information and the car’s value is additively separable in the signals. This assumption
will be highlighted in the formal analysis below.
4.1 An Auction Model with Late Bidding
Consider a symmetric common values auction model, as in Wilson (1977). There are N
symmetric bidders, bidding for an object of unknown common value V . Bidders have a common
prior distribution for V , denoted F . Each bidder i is endowed with a private signal Xi , and
these signals are conditionally i.i.d., with common conditional distribution G|v. The variables
V and X = X1 , X2 · · · Xn are affiliated.17 The realization of the common value is denoted
v, and the realization of each private signal is denoted xi . The realized number of bidders
n is common knowledge.18 One may think of the bidders’ common prior as being based on
the content of the auction webpage. Their private signals may come from a variety of sources
15 In those games, the sender may manipulate the signals she sends in order to convince the receiver that she is of high type. Here the sender is constrained to either truthfully reveal the signal realization, or to not reveal it at all.
16 See for example Jovanovic (1982) for costly revelation, Okuno-Fujiwara, Postlewaite, and Suzumura (1990) for revelations in a multi-stage game and Milgrom and Roberts (1986) for multidimensional signal revelation.
17 This property is equivalent to the log supermodularity of the joint density of V and X. A definition of log supermodularity is provided in the appendix.
18 This assumption is somewhat unrealistic, as bidders in an eBay auction can never be sure of how many other bidders they are competing with. The alternative is to let auction participants form beliefs about n, where those beliefs are based on a first stage entry game, as in Levin and Smith (1994). But dealing with endogenous entry in this way has certain problems in an eBay context (see Song (2004)), and thus I choose to go with the simpler model and make n common knowledge.
such as private communications with the seller through e-mail or phone and the results of
inspections they have contracted for or performed in person.
Bidding takes place over a fixed auction time period [0, τ ]. At any time during the auction, t,
the standing price pt is posted on the auction website. The standing price at t is given by the
second highest bid submitted during the period [0, t). Bidding takes place in two stages. In
the early stage [0, τ − ε), the auction takes the form of an open exit ascending auction. Bidders
may observe the bidding history, and in particular observe the prices at which individuals drop
out of the early stage. In the late stage [τ − ε, τ ], all bidders are able to bid again. This
time, however, they don’t have time to observe each other’s bidding behavior. They may each
submit a single bid, and the timing of their submission is assumed to be randomly distributed
over the time interval [τ − ε, τ ].19 If their bid bit is lower than the standing price pt , their bid is
not recorded. The following proposition summarizes the relevant equilibrium behavior in this
model:
Proposition 1 (Equilibrium without Revelation) There exists a symmetric Bayes-Nash
equilibrium in which:
(a) All bidders bid zero during the early stage of the auction.
(b) All bidders bid their pseudo value during the late stage, where their pseudo value is defined
by
v(x, x; n) = E[V |Xi = x, Y = x, N = n]    (2)
and Y = maxj≠i Xj is the maximum of the other bidders’ signals.
(c) The final price pτ is given by
pτ = v(x(n−1:n) , x(n−1:n) ; n)    (3)
where x(n−1:n) denotes the second highest realization of the signals {Xi }.
The idea is that bidders have no incentive to bid a positive amount during the early stage of
the auction, as this may reveal their private information to the other bidders. Thus they bid
only in the late stage, bidding as in a second price sealed bid auction. The second highest bid
will certainly be recorded, and I obtain an explicit formula for the final price in the auction as
the bidding function evaluated at the second highest signal. This formula will form the basis
of my estimation strategy later in the paper. In the model thus far, though, I have not allowed
the seller to reveal his private information. I extend the model to accommodate information
asymmetry and seller revelation in the next section.
19 Allowing bidders to choose the timing of their bid would not change equilibrium behavior, and would unnecessarily complicate the argument.
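The proposition can also be checked numerically under assumed primitives (a normal prior over the log value and normal signals, anticipating the parametric choices of section 5.1), computing the pseudo value in equation (2) by grid integration:

```python
import numpy as np
from scipy import stats

def pseudo_value(x, n, mu=0.0, sigma=1.0, r=0.5, grid=2001):
    """v(x, x; n) = E[V | Xi = x, Y = x, N = n] by grid integration."""
    v = np.linspace(mu - 6 * sigma, mu + 6 * sigma, grid)  # log-value grid
    gx = stats.norm.pdf(x, v, r)            # density of one signal equal to x
    Gx = stats.norm.cdf(x, v, r)            # prob. that one signal is below x
    # posterior weight on v given Xi = x and max of the other n-1 signals = x
    w = stats.norm.pdf(v, mu, sigma) * gx * (n - 1) * gx * Gx ** (n - 2)
    return (np.exp(v) * w).sum() / w.sum()  # E[V | ...], with V = exp(v)

rng = np.random.default_rng(0)
n, v0 = 6, rng.normal()                     # realized log value
signals = rng.normal(v0, 0.5, size=n)       # conditionally i.i.d. signals
bids = sorted(pseudo_value(x, n) for x in signals)
# v(x, x; n) is increasing in x, so the second-highest bid is the pseudo
# value at the second-highest signal, exactly as in equation (3):
print(bids[-2], pseudo_value(sorted(signals)[-2], n))
```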
4.2 Information Revelation
Let the seller privately observe a vector of real-valued signals S = S1 · · · Sm with common
bounded support [s̲, s̄]. As with the bidders’ private information, these signals are conditionally
i.i.d with common distribution H|v and are conditionally independent of the bidder’s private
signals X1 · · · Xn . Let V and S = S1 · · · Sm be affiliated. The seller may report his signals to
the bidders prior to the start of the auction, and bidders treat all such revelations as credible.
The seller’s objective is to maximize his expected revenue from the auction. A (pure strategy)
reporting policy for the seller is a mapping R : Rm → {0, 1}m , where reporting the signal is
denoted 1 and not reporting is denoted 0. Letting s = s1 · · · sm be a realization of the signal
vector S = S1 · · · Sm , define I = #{i : Ri (s) = 1} as the number of signals reported by the
seller. Finally, let the seller face a fixed revelation cost ci for revealing the signal Si , and let
the associated cost vector be c. Revelation costs are common knowledge.
This setup generates a multi-stage game in which the seller first makes credible and public
statements about the object to be sold, and then bidders update their priors and participate
in an auction. Since each revelation by the seller is posted to the auction webpage, there
is a natural link between the number of signals I and my empirical measures of information
content, the number of photos and bytes of text on the webpage.
Define the updated pseudo value w(x, y, s; n) = E[V |X = x, Y = y, S = s, N = n] where
Y = maxj≠i Xj is the maximum of the other bidders’ signals. Then in the absence of revelation
costs, the equilibrium reporting policy for the seller is to reveal all his information:
Proposition 2 (Unravelling) For c = 0, there exists a sequential equilibrium in which the
seller reports all his signals (R(s) = 1 ∀s), bidders bid zero in the early stage of the auction,
and bid their updated pseudo value w(x, x, s; n) in the late stage.
This result seems counter-intuitive. It implies that sellers with “lemons” - cars with a bad
history or in poor condition - will disclose this information to buyers, which runs counter to
the intuition behind models of adverse selection. It also implies that the quantity of information
revealed on the auction webpage should be unrelated to the value of the car, since sellers will
fully disclose, regardless of the value of the car. Since I know from the hedonic regressions that
price and information measures are positively related, it suggests that the costless revelation
model is ill suited to explaining behavior in this market. Revealing credible information is
a costly activity for the seller, and thus sellers with “lemons” have few incentives to do so.
I can account for this by considering the case with strictly positive revelation costs, but this
generates a non-trivial decision problem for the seller. In order to make the equilibrium analysis
tractable, I make a simplifying assumption.
18
Assumption 1 (Constant Differences) The function w(x, y, s; n) exhibits constant differences in si for all (x, y, s−i ). That is, there exist functions ∆i (s1 , s0 ; n), i = 1 · · · m, such
that:
∆i (s1 , s0 ; n) = w(x, y, s−i , s1 ; n) − w(x, y, s−i , s0 ; n) ∀(x, y, s−i )
This assumption says that from the bidder’s point of view, the implied value of each signal
revealed by the seller is independent of all other signals the bidder observes.20 The extent to
which this assumption is reasonable depends on the extent to which car features, history and
condition are substitutes or complements for value. Under this assumption, one may obtain a
clean characterization of equilibrium.
Proposition 3 (Costly Information Revelation) For c > 0, there exists a sequential
equilibrium in which:
(a) The reporting policy is characterized by a vector of cutoffs t = t1 · · · tm with
Ri (s) = 1 if si ≥ ti , and Ri (s) = 0 otherwise,
where ti = inf {t : E [∆i (t, t0 ; n) | t0 < t] ≥ ci }.
(b) Bidders bid zero during the early stage of the auction, and bid
vR (x, x, sR ; n) = E[w(x, x, s; n)|{si < ti }i∈U ]
during the late stage, where sR is the vector of reported signals and U = {i : Ri (s) = 0}
is the set of unreported signals.
(c) The final auction price is pτ = vR (x(n−1:n) , x(n−1:n) , sR ; n).
In contrast to the result with costless revelation, this equilibrium is characterized by partial
revelation, where sellers choose to reveal only sufficiently favorable private information. Bidders
(correctly) interpret the absence of information as a bad signal about the value of the car, and
adjust their bids accordingly. Then the number of signals I revealed by the seller is positively
correlated with the value of the car.
Corollary 1 (Expected Monotonicity) E[V |I] is increasing in I.
I prove the result by showing that the seller’s equilibrium strategy induces affiliation between
V and I. This result is useful for my empirical analysis, as it establishes a formal link between
20 A sufficient condition for the constant differences assumption to hold is that the conditional mean E[V |X, S] is additively separable in the signals S, so that E[V |X = x, S = s] = f1 (s1 ) + · · · + fm (sm ) + g(x) for some increasing real-valued functions fj and g.
the number of signals I, quantified in the number of photos and bytes of text on the auction
webpage, and the content of that information, the public signals S. It also provides a testable
prediction: that the prior mean E[V |I] should be increasing in the amount of information
posted. I now proceed to outline the strategy for estimating this auction model.
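The corollary is easy to see in a small simulation, with illustrative distributions and a common cutoff standing in for the vector t of Proposition 3:

```python
import numpy as np

rng = np.random.default_rng(1)
m, cutoff, draws = 8, 0.3, 200_000
v = rng.normal(size=draws)                    # log value of each simulated car
s = v[:, None] + rng.normal(size=(draws, m))  # m seller signals, affiliated with v
I = (s >= cutoff).sum(axis=1)                 # only favorable signals are revealed

for i in range(m + 1):
    if (I == i).any():
        print(f"I = {i}: E[V | I] = {np.exp(v[I == i]).mean():.3f}")
# The conditional mean value rises with the revelation count I (Corollary 1).
```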
5 Auction Estimation
Estimating a structural model is useful here. First, I am able to account for the strategic
interaction between bidders and uncover the relationship between valuations and item characteristics, rather than price and characteristics. Second, I can test the model’s implications
precisely. The model predicts a positive relationship between the conditional mean E[V |I]
and I. This in turn implies that E[p|I] is increasing in I, but that positive relationship could
also hold because better informed bidders are less afraid of the Winner’s Curse and therefore
bid more.21 Estimating the structural model allows a test of the model prediction that is free
of strategic considerations on the part of bidders. Finally, and most important, structural
estimation permits simulation of counterfactuals. This will allow me to quantify the impact of
the auction webpage in reducing information asymmetries and adverse selection below.
Yet estimating a common values auction is a thorny task. There are two essential problems.
Firstly, a number of model primitives are latent: the common value V , and the private signals
X1 · · · Xn . Secondly, the bidding function itself is highly non-linear and so even in a parametric model computing the moments of the structural model is difficult and computationally
intensive. Addressing endogeneity in addition to the above problems is especially tricky. My
estimation approach is able to deal with all of these problems and remain computationally
tractable, and thus may be of independent interest. Two different approaches have been tried
in recent papers. Bajari and Hortaçsu (2003) employ a Bayesian approach with normal priors
and signals, and estimate the posterior distribution of the parameters by Markov Chain Monte
Carlo (MCMC) methods. Hong and Shum (2002) use a quantile estimation approach with
log normal priors and signals, and obtain parameter estimates by maximizing the resulting
quantile objective function.
My estimation approach, which is based on maximizing a pseudo-likelihood function, has a
number of advantages relative to these types of estimators. The main advantage lies in the
computational simplicity of my approach. I am able to show that my estimator may be
21 The Winner’s Curse is the result that bidders in a common value auction will be “cursed” if they bid their conditional valuation of the object E[V |Xi ], as they will only win when their bid exceeds everyone else’s. In this event, they also had the highest signal, which suggests that they were over-optimistic in their initial valuation. Rational bidders therefore shade their bids down, always bidding below their conditional valuation.
maximized through a nested loop, in which the inner loop runs weighted least squares (WLS)
and the outer loop is a standard quasi-Newton maximization algorithm. For most applications,
the outer loop will search over a low dimensional space, while the inner loop may be of almost
arbitrarily high dimension since the maximizer may be computed analytically. This avoids the
curse of dimensionality, and allows me to include over thirty regressors in the structural model
while still maximizing the objective function in less than thirty minutes.22 By contrast, the
existing estimation techniques would require an MCMC implementation in a thirty dimensional
parameter space, and it would typically take days of computation to get convergence of the
chain to the stationary distribution of the parameters.23
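The division of labor in the nested loop can be sketched as follows. The structural computation of the price moments is hidden in a placeholder (the paper derives these moments from the auction model; the formulas below are not the actual ones): given a candidate (β, r), the high-dimensional α has a closed-form weighted least squares solution, so the outer quasi-Newton search runs only over the low-dimensional parameters.

```python
import numpy as np
from scipy.optimize import minimize

def price_moments(beta_r, z, n):
    """Placeholder for the structural mean shift and variance of log price
    implied by (beta, r) and the number of bidders n -- NOT the paper's
    actual formulas."""
    beta, r = beta_r[:-1], beta_r[-1]
    shift = -0.1 * (r ** 2) * np.log(n)    # placeholder mean shift
    var = np.exp(2 * (z @ beta)) + r ** 2  # placeholder variance, always > 0
    return shift, var

def neg_pseudo_loglik(beta_r, logp, z, n):
    shift, var = price_moments(beta_r, z, n)
    w = 1.0 / var
    zw = z * w[:, None]
    # Inner loop: given (beta, r), alpha is exact weighted least squares.
    alpha = np.linalg.solve(z.T @ zw, zw.T @ (logp - shift))
    resid = logp - shift - z @ alpha
    return 0.5 * np.sum(np.log(var) + w * resid ** 2)

rng = np.random.default_rng(2)
z = rng.normal(size=(500, 3))
n = rng.integers(2, 10, size=500)
logp = z @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 0.3, size=500)
# Outer loop: quasi-Newton over the low-dimensional (beta, r) only.
res = minimize(neg_pseudo_loglik, np.zeros(4), args=(logp, z, n), method="BFGS")
```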
A second advantage is that my estimation approach permits a semiparametric control function
method for dealing with endogeneity. Here I consider the case where precisely one variable is
endogenous and show that if the relationship between the endogenous variable and the source
of the endogeneity is monotone, a nonparametric first stage can be used to estimate a proxy
variable that controls for endogeneity. I note that the econometric theory given in Andrews
(1994a) guarantees the consistency and asymptotic normality of the resulting estimator, even
in the complicated non-linear setting of a common value auction.
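A minimal sketch of that first stage, using a simple Gaussian-kernel smoother (the text does not commit to a particular nonparametric estimator, and bandwidth selection is ignored here); the residual then enters the structural index alongside the other covariates:

```python
import numpy as np

def nadaraya_watson(xq, x, y, h=1.0):
    """Gaussian-kernel regression of y on x, evaluated at the rows of xq."""
    d = (xq[:, None, :] - x[None, :, :]) / h
    k = np.exp(-0.5 * (d ** 2).sum(axis=2))
    return (k @ y) / k.sum(axis=1)

def control_function(info, exog, instrument):
    """First stage: info = g(exog, instrument) + error; the residual proxies
    webpage content seen by bidders but not by the econometrician."""
    x = np.column_stack([exog, instrument])
    return info - nadaraya_watson(x, x, info)

# z_aug = np.c_[z, control_function(n_photos, z_exog, photos_prev_listing)]
```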
The third advantage is eBay-specific, and lies in the fact that I identify the model using only variation in auction prices, the number of bidders n, and the covariates. This contrasts with papers that use all bids from each auction, which can lead to misinterpretation. Identification is also extremely transparent in this setup, which is not the case with the other estimators. I now provide the specifics of the estimation approach.
5.1 Specification
The theory provides a precise form for the relationship between the final price in each auction and the bidders' priors and private signals. In order to link price to the auction covariates, such as item characteristics and information, I first specify a parametric form for the interim prior distribution F|z and the conditional signal distribution G|v.[24] I may then specify the relationship between the moments of these distributions and the auction covariates z. Let F|z be the log-normal distribution, and let G|v be the normal distribution centered at ṽ = log v.
[22] Most of these are fixed effects, but this approach could accommodate thirty continuous regressors.
[23] This remark applies to the quantile estimation approach as well, since the objective function is not smooth and MCMC is therefore considered the best approach for maximizing it. See Chernozhukov and Hong (2003) for details.
[24] By "interim prior", I mean the updated prior held by bidders after viewing the webpage but before obtaining their private signal.
Then we have:
\[ \tilde v = \mu + \sigma \varepsilon_v \sim N(\mu, \sigma^2), \qquad x_i \mid \tilde v = \tilde v + r \varepsilon_i \sim N(\tilde v, r^2) \]
where ε_v and the ε_i are i.i.d. standard normal random variables.[25] Three parameters form the primitives of the auction model: the interim prior mean µ, the interim prior standard deviation σ, and the signal standard deviation r.
Next I link the primitives (µ, σ, r) to the auction covariates. As argued earlier, the information posted on the auction webpage changes the common prior distribution over values for all bidders. Consequently, I allow the mean µ and standard deviation σ of F to vary with the auction covariates z. I include in z my empirical measures of information content, the number of photos and bytes of text. The signal standard deviation parameter r is treated as a constant, as the private information is by definition orthogonal to the publicly available information captured in the covariates. The chosen specification takes the index form:
\[ \mu_j = \alpha z_j, \qquad \sigma_j = \kappa(\beta z_j) \]
where κ is an increasing function with strictly positive range, so that κ(x) > 0 for all x.[26] This specification yields admissible values for σ_j.
5.2 Method
The above specification governs the relationship between the parameter vector θ = (α, β, r),
bidder prior valuations, and bidder signals. Prices are a complex function of these parameters,
since they are equal to the equilibrium bid of the bidder with the second highest signal, an
order statistic. Thus even though I have a fully specified parametric model for valuations
and signals, the distribution of prices is hard to determine. This prevents a straightforward
maximum likelihood approach. I can, however, compute the mean and variance of log prices implied by the structural model, E[log p | n, z_j, θ] and Ω[log p | n, z_j, θ].[27] Notice that these predicted moments vary with the number of bidders n. Computing these moments allows me to
[25] The choice of the log normal distribution for v is motivated by two considerations. First, all valuations should be positive, so it is important to choose a distribution with strictly positive support. Second, the log prices observed in the data are approximately normally distributed, and since the price distribution is generated by the value distribution, this seems a sensible choice. The choice of a normal signal distribution is for convenience, and seems reasonable given that the signals are latent and unobserved.
[26] For computational reasons, κ(·) is based on the arctan function; specifics are given in the appendix.
[27] Explicit formulae for these moments are to be found in the appendix.
estimate the model by pseudo-maximum likelihood (Gourieroux, Monfort, and Trognon 1984). In this approach, I maximize a pseudo-likelihood function derived from a quadratic exponential family. Since the observed price distribution is approximately log normal, and the log normal distribution is quadratic exponential, I choose it as the basis of my pseudo-likelihood function and maximize the criterion function:
\[ L(\theta) = -\frac{1}{2} \sum_{j=1}^{J} \left[ \log\big(\Omega[\log p \mid n, z_j, \theta]\big) + \frac{\big(\log p_j - E[\log p \mid n, z_j, \theta]\big)^2}{\Omega[\log p \mid n, z_j, \theta]} \right] \tag{4} \]
Provided that the model is identified, the PML estimator θ̂ is both consistent and asymptotically normal (Gourieroux, Monfort, and Trognon 1984).[28] To see why the model is identified, consider the case where there are no covariates apart from n, and the econometrician observes the prices from repeatedly auctioning objects with values drawn from a common prior. The number of bidders n in each auction is determined prior to the auction. The aim is to identify the model primitives: the mean µ and standard deviation σ of the prior distribution, and the signal standard deviation r. One may identify the prior mean µ from the mean prices of the objects sold, and the variance in prices pins down σ and r, though it does not separately identify them. For that, one needs variation in the number of bidders n, which separately identifies these parameters through variation in the Winner's Curse effect.[29]
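As a minimal sketch of how the criterion (4) would be evaluated in code: `model_moments` below is a hypothetical stand-in for the quadrature-based moment computation described in the appendix, and the function simply accumulates the two terms of the pseudo-likelihood.

```python
import numpy as np

def pml_objective(theta, log_prices, n_bidders, Z, model_moments):
    """Pseudo-log-likelihood of equation (4), up to an additive constant.
    model_moments(n, z, theta) is assumed to return the model-implied mean
    E[log p | n, z, theta] and variance Omega[log p | n, z, theta]."""
    total = 0.0
    for log_p, n, z in zip(log_prices, n_bidders, Z):
        mean, var = model_moments(n, z, theta)
        total += np.log(var) + (log_p - mean) ** 2 / var
    return -0.5 * total
```

Because the log-normal family is quadratic exponential, this estimator is consistent provided the mean and variance functions are correctly specified, which is the sense of the Gourieroux, Monfort, and Trognon (1984) result invoked above.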
5.3 Computation
Due to the large number of covariates, the parameter space is of high dimension, and it is
possible that the problem could quickly become computationally intractable. Fortunately, the
following result eases the computational burden:
Proposition 4 (Properties of the Bidding Function) Let v(x, x; n, µ, σ, r) be the bidding function under model primitives (µ, σ, r), and let v(x − a, x − a; n, µ − a, σ, r) be the bidding function when the signals of all bidders and the prior mean are decreased by a. Then log v(x, x; n, µ, σ, r) = a + log v(x − a, x − a; n, µ − a, σ, r), and consequently:
\[ E[\log p \mid n, z_j, \alpha, \beta, r] = \alpha z_j + E[\log p \mid n, z_j, \alpha = 0, \beta, r] \tag{5} \]
\[ \Omega[\log p \mid n, z_j, \alpha, \beta, r] = \Omega[\log p \mid n, z_j, \alpha = 0, \beta, r] \tag{6} \]
[28] Those familiar with the auctions estimation literature may be concerned about parameter support dependence, a problem noted in Donald and Paarsch (1993). It can be shown that under this specification the equilibrium bid distribution has full support on [0, ∞), so this problem does not arise here.
[29] The global identification condition is that if E[log p | n, z_j, θ_0] = E[log p | n, z_j, θ_1] and Ω[log p | n, z_j, θ_0] = Ω[log p | n, z_j, θ_1] for all (n, z_j), then θ_0 = θ_1. I have numerically verified this condition for a range of parameter values.
This result is extremely useful. To see this, fix the parameters β and r, and consider finding the parameter estimate α̂(β, r) that maximizes the criterion function (4). Using the result of the above proposition, we can write this parameter estimate as:
\[ \hat\alpha(\beta, r) = \operatorname*{Argmin}_{\alpha} \sum_{j=1}^{J} \frac{1}{\Omega[\log p \mid n, z_j, \alpha = 0, \beta, r]} \big(\log p_j - E[\log p \mid n, z_j, \alpha = 0, \beta, r] - \alpha z_j\big)^2 \]
which for fixed β and r can be solved by weighted least squares (WLS) with dependent variable y_j = log p_j − E[log p | n, z_j, α = 0, β, r] and weights w_j = 1/Ω[log p | n, z_j, α = 0, β, r].
This allows a nested estimation procedure in which I search over the parameter space of (β, r), using WLS to calculate α̂(β, r) at each step. Since I choose a specification in which most of the covariates affect the prior mean and not the variance, this space is considerably smaller than the full space of (α, β, r), and the computation is much simpler. Further computational details are to be found in the appendix.
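A sketch of the nested procedure follows. It assumes a helper `moments_at_alpha0(n, z, beta, r)` returning E[log p | n, z, α = 0, β, r] and Ω[log p | n, z, α = 0, β, r] (one possible implementation, via the pre-computed grid, is sketched in the appendix), and BFGS stands in for the quasi-Newton outer loop.

```python
import numpy as np
from scipy.optimize import minimize

def alpha_hat(beta, r, log_prices, n_bidders, Z, moments_at_alpha0):
    """Inner loop: closed-form WLS estimate of alpha given (beta, r)."""
    mv = [moments_at_alpha0(n, z, beta, r) for n, z in zip(n_bidders, Z)]
    means = np.array([m for m, _ in mv])
    w = 1.0 / np.array([v for _, v in mv])   # weights w_j = 1 / Omega[...]
    y = np.asarray(log_prices) - means       # y_j = log p_j - E[log p | alpha = 0]
    Zm = np.asarray(Z)
    ZtW = Zm.T * w                           # WLS: alpha = (Z'WZ)^{-1} Z'Wy
    return np.linalg.solve(ZtW @ Zm, ZtW @ y)

def outer_objective(params, k, log_prices, n_bidders, Z, moments_at_alpha0):
    """Outer loop criterion: minus the pseudo-likelihood, concentrated in alpha."""
    beta, r = params[:k], params[k]
    a = alpha_hat(beta, r, log_prices, n_bidders, Z, moments_at_alpha0)
    total = 0.0
    for log_p, n, z in zip(log_prices, n_bidders, Z):
        mean0, var = moments_at_alpha0(n, z, beta, r)
        total += np.log(var) + (log_p - mean0 - a @ z) ** 2 / var
    return 0.5 * total

# Quasi-Newton search over (beta, r) only, e.g.:
# res = minimize(outer_objective, x0,
#                args=(k, log_prices, n_bidders, Z, moments_at_alpha0),
#                method="BFGS")
```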
5.4 Endogeneity
In the above analysis, differences between observed prices and those predicted by the structural
model result from the unobserved deviations of the latent value and bidder signals from their
mean values. There is no unobserved source of heterogeneity between cars with identical
observed characteristics. Yet I know that in fact cars sold on eBay differ in many respects
that are not captured by the variables observed by the econometrician, such as in their history
or features. When sellers have favorable private information about the car they are selling,
they will reveal such information on the auction webpage. I capture the dollar value of the
webpage content in a variable ξ. But included in my covariates z are quantitative measures
of information, and these will be correlated with the omitted webpage content value ξ. This
may cause an endogeneity problem. This section investigates when this will be the case, and
provides a control-function based solution if needed.
In this section I will assume that I have a scalar endogenous information measure I.[30] For my application, I must choose between using the number of bytes of text and the number of photos as this measure. I choose the text measure because I have a large number of missing observations for the number of photos. Fortunately, the amounts of text and photos are highly correlated in the data, so the amount of text will serve as a good measure of the information content of the webpage by itself.[31]
[30] I suspect that extending the approach to deal with a vector of endogenous variables is possible, but showing this is beyond the scope of the paper.
[31] The correlation is 0.35, which is higher than the correlation between age and mileage (0.26).
Partition the auction covariates so that z_j = [I_j, z′_j], where z′_j is the standardized characteristic data observed by the econometrician, and I_j is the number of bytes of text on the auction webpage. Now, following the theory, consider a new model specification in which the mean and standard deviation of the interim prior distribution F|s_R depend on the vector of reported signals, s_R:
\[ \mu_j = \phi_\mu s_{Rj} = \alpha z_j + \xi_j \tag{7} \]
\[ \sigma_j = \phi_\sigma s_{Rj} = \kappa(\beta z_j) \tag{8} \]
where \( \xi_j = \sum_{s_{Ri} \notin z'_j} \phi^i_\mu s_i \). In other words, ξ gives the contribution of the signals observed by the bidders but not the econometrician to the mean of the interim prior. I will sometimes refer to ξ as the content value of the webpage. I specify that the unobserved signals make no contribution to the standard deviation, since it is unusual in demand estimation to account for this possibility.[32] At the cost of more notation, these results can be extended to cover this case.
Under the theoretical model, the information measure I is the number of signals s_i above a threshold t_i. It is therefore reasonable to believe that I is positively correlated with ξ. The information measure I will thus be a valid proxy for the contribution of the webpage content ξ to the interim prior mean valuation µ, under standard assumptions:
Assumption 2 (Redundancy) The information measure I is redundant in the true specification of the interim prior mean. That is, µ_j = E[log V | I_j, z′_j, ξ_j] = E[log V | z′_j, ξ_j].

Assumption 3 (Conditional Independence) The conditional mean E[ξ | I_j] is independent of the observed characteristics z′_j.
The redundancy assumption is quite plausible here, as I believe that it is the webpage content
rather than the quantity of information that matters for the interim prior valuation of bidders.
The conditional independence assumption is standard for a non-linear proxy, and essentially
says that when I is included as a regressor, the remaining covariates do not generate an
endogeneity problem. This sort of assumption is typical of the demand estimation literature,
in which only the potential endogeneity of price is typically addressed. With these assumptions
in place, the amount of information I - or more generally an unknown function of it - is a valid
proxy for the contribution of the omitted content to bidder priors. This justifies the estimation
approach in the above sections where I included information measures as regressors without
accounting for endogeneity.
[32] This does not imply that the signals observed by bidders are not informative; it just implies that the prior variance does not vary with the realization of the signals.
Arguing that the potentially endogenous variable I is simply redundant seems to make sense
here, but is clearly not an acceptable solution to the more general problem of endogeneity
in common value auction estimation. I next show how a control function approach can be
used to control for endogeneity when assumption 2 fails but assumption 3 is maintained so
that there is only one endogenous variable. Suppose that I have a valid instrument W for the
endogenous variable I, so that W is independent of ξ but correlated with I. I will need two
final assumptions:
Assumption 4 (Additive Separability) The endogenous variable I is an additively separable function of the observed characteristics z′, the instrument W, and the omitted variable ξ:
\[ I_j = g(z'_j, W_j) + h(\xi_j) \]
Assumption 5 (Monotonicity) The function h is strictly monotone.
These are relatively weak assumptions. For my particular application, additive separability is
plausible since the signals are conditionally independent, and so it is not unreasonable to believe
that I depends in a separable fashion on the observed characteristics z and the unobserved
content value ξ. Monotonicity is also reasonable, as when the seller has favorable information,
the theoretical model predicts that he will reveal it.
Now define m_j as the residual I_j − g(z′_j, W_j). Since m_j is monotone in ξ_j, it is clear that some function of m_j can be used to control for ξ_j. I may estimate m_j as the residual from a first-stage nonparametric procedure in which I regress I_j on the instrument W_j and the other car characteristics z′_j.[33] I may then use the estimated residuals m̂_j as a control for ξ in the second stage. I summarize this result in a proposition:
Proposition 5 (Endogeneity) Under assumptions 3, 4 and 5, there exists a control function f such that:
(a) E[log p | n, z, m, θ] = αz + E[log p | n, z, α = 0, β, r] + f(m)
(b) Ω[log p | n, z, m, θ] does not depend on ξ
Under technical conditions needed to ensure uniform convergence of the estimator, the PML estimate θ̂ obtained by maximizing the analogue of (4) under the above moment specification remains both consistent and asymptotically normal. The relevant conditions are provided by Andrews (1994a). Andrews (1994b) shows that if the first stage nonparametric estimation
[33] This approach uses ideas from Blundell and Powell (2003), Newey, Powell, and Vella (1999) and Pinkse (2000). I also consider a simpler first stage OLS procedure and present the results with this specification in the results section below.
procedure is kernel based, then most of these conditions can be satisfied by careful choice of
the kernel, bandwidth and trimming. A more complete discussion of these technical issues is
deferred to the appendix.
To obtain an instrument, I exploit the fact that I have panel data. My instrument W is the number of bytes of text in the description of the previous car sold by that seller. The idea is that the quantity of information posted on the auction webpage for the previous car should be independent of the value of the webpage content for the current car being sold. This relies on the unobserved content of the auction webpages ξ being randomly assigned over time for a given seller. This is reasonably plausible, as each car listed by a given dealer is likely to have idiosyncratic features and history. On the other hand, the data show a strong relationship between the amount of information posted for the current car, I, and that for the previous car, W. This makes W a valid instrument. I estimate the fitted residuals m̂ using kernel regression with a multivariate Gaussian kernel, bandwidth chosen by cross-validation, and trimming as in Andrews (1994b). I implement the control function f as a second-order polynomial in m̂.[34]
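A minimal sketch of this first stage follows, with a fixed scalar bandwidth h standing in for the cross-validated choice and with the smooth trimming omitted; the helper names are mine, not the paper's.

```python
import numpy as np

def kernel_regression(X, y, x0, h):
    """Nadaraya-Watson estimate of E[y | X = x0] with a product Gaussian kernel."""
    u = (X - x0) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))
    return np.sum(w * y) / np.sum(w)

def first_stage_residuals(I, W, Z0, h):
    """m_hat_j = I_j - g_hat(z0_j, W_j), leaving out observation j in its own fit."""
    X = np.column_stack([np.asarray(Z0), np.asarray(W)])
    I = np.asarray(I, dtype=float)
    m_hat = np.empty(len(I))
    for j in range(len(I)):
        mask = np.arange(len(I)) != j            # leave-one-out
        m_hat[j] = I[j] - kernel_regression(X[mask], I[mask], X[j], h)
    return m_hat

def control_terms(m_hat):
    """Second-order polynomial control f(m) entering the second-stage mean."""
    return np.column_stack([m_hat, m_hat ** 2])
```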
Regardless of whether or not I is endogenous in the econometric sense, the control function f(m) should be a better proxy than I for ξ, as it is orthogonal to the contribution of the observed characteristics z′ and the firm-specific effects captured in W, which are included in I but irrelevant for ξ. My model thus predicts that the control function will be significant in the regression. In the following section, I present the results of tests for significance and analyze the results of the structural estimation.
6 Results
The results of the demand estimation follow in Tables 3, 4, 5 and 6, to be found in the appendix.[35] In the first table, I consider estimation of demand without controlling for endogeneity. As noted earlier, this approach will be valid under assumptions 2 and 3, in which case the number of bytes of text will be a valid proxy for the unobserved webpage content. I consider five specifications, corresponding to those given in the hedonic regressions earlier in the paper. Throughout this section, note that model and title fixed effects are included but not reported.
[34] As is usual with control function approaches, one cannot be sure of the appropriate form of the control function, and so a flexible function is used. I have verified that the pseudo log-likelihood does not change significantly with the use of higher-order or orthogonal (Legendre) polynomials, and so this simpler quadratic model was chosen.
[35] In all cases, standard errors are obtained from estimates of the asymptotic variance-covariance matrix. I do not account for first stage error in constructing the standard errors for the parameters estimated using the control function.
The coefficients on µ give the estimated relationship between the prior valuation and the covariates. The coefficients on logged regressors may be read as elasticities, while a one-unit change in an unlogged regressor i has a 100α_i% effect on the prior valuation. It is immediately apparent that age and mileage have significant effects on valuation in all specifications. Ignoring the curvature introduced by the age-squared term, an additional year of age is expected to decrease bidders' valuations by 21%, while every percentage increase in mileage implies an approximately 0.125% decrease in the price.
As in the hedonic regressions, the effects of the information measures on µ are significant in all specifications and of large magnitude. I find that each additional percent of text implies an increase of between 0.047% and 0.061% in the price (ignoring the final specification, which includes interactions). For a typical car in the dataset, this implies that a one percent increase in the amount of text supplied is associated with an increase in the price of about $5, which is a sizable effect. But when one thinks of the range of seller revelations possible, including important details such as car features and history, this is not really surprising. There is also a positive and highly significant relationship between the number of photos and the prior valuation. I find that each photo is associated with an approximately 1.8% increase in the prior valuation, which amounts to $180 for a typical car in my dataset. These results are as predicted in Corollary 1, and therefore consistent with the theoretical model.
The additional covariates included in specifications (2)-(5) have coefficients of similar sign and magnitude to those in the hedonic regressions. Of interest is the sizable coefficient on featured listings, which implies that featured cars are valued by buyers 19% more. Since these returns vastly outstrip the costs of such a listing, one should be concerned about endogeneity, and I do not include this regressor in later estimations. Finally, I find that, as before, the coefficients on the interactions between the warranty and inspection dummies and my information measures are significant and negative. This is consistent with the idea that webpage content such as vehicle history and photos is less valuable while the vehicle is still under warranty.
The second half of the table contains the coefficients on σ and the constant bidder signal standard deviation r. These are harder to interpret. Note first that the coefficient on text size is not significant, suggesting that the text does little to reduce prior uncertainty about the quality of the car. The remaining coefficients for σ are significant at the 5% level.[36] To interpret the coefficients, I turn to the marginal effects. An extra year of age is associated with a 0.013 increase in the standard deviation of the interim prior, while an additional percent of text is associated with a 0.00007 decrease in that deviation. Photos do far more to reduce uncertainty, with each additional photo associated with a 0.0067 decrease in the interim prior deviation. The photo content of auction webpages thus improves the precision of bidders' prior estimates and helps them to avoid the Winner's Curse.

[36] The link function used to convert r_INDEX into r implies that the t-test here is against the null that r = 1.25. So the estimate for r is relatively precise, even though r_INDEX has a large standard error.
One may wish to quantify the magnitude of this Winner's Curse effect. To get at this, I need the estimated marginal effect of σ on price. I find that an increase of one standard deviation in the prior uncertainty is associated with a 10% decrease in price. Thus each additional photo, through decreasing the prior uncertainty, is expected to lead to a 0.067% increase in price, or about $7 for a typical car. This effect is clearly dwarfed by the direct effect of the photo content. I can also use the estimates of r and σ to back out how heavily bidders weight their private signal relative to the public information in forming their posterior estimate of car value. The relevant statistic is the ratio r/(r + σ), which gives the weight on the public information. This is estimated to be around 0.6 in all specifications, indicating that private communications between buyers and sellers, while important, are less heavily weighted than webpage content in forming posterior valuations.
In Table 4, I examine how these effects vary for different models of car. I group cars into
three groups, one consisting of “classic” cars such as the Mustang and Corvette, one consisting
of “reliable cars” such as the Honda Accord and Civic, and one consisting solely of pickup
trucks (such as the Ford F Series and GM Silverado). The vehicles in the “classic” group are
considerably older than those in the other two groups, at an average of 24 years old versus 10
years for the reliable cars and 15 for the pickup trucks. A number of differences arise. The predicted relationship between mileage, age and price varies across the three groups. The difference is especially marked in the case of classic cars, where mileage has a considerably stronger negative effect. This may be because vintage cars in near-mint condition command much higher premiums. I also find differences in the coefficient on transmission, which presumably reflects a preference for manual transmission when driving a Corvette or Mustang, but not when driving a pickup.
Most importantly, I find that the coefficients on the information measures vary considerably. For classic cars, where vehicle history, repair and condition are of great importance, the text size and photo coefficients are both large. On the other hand, for reliable and largely standardized cars such as the Accord, these effects are much smaller. The coefficients for pickups fall somewhere in between, perhaps because pickups have more options (long bed, engine size) for a given model, and more variation in wear and tear, than do the reliable cars. I also find that the prior standard deviation varies considerably among groups, even after accounting for age, text and photos. There is considerable prior uncertainty about the value of classic cars, and far less for reliable cars, which makes sense. I do find, though, that the uncertainty is quickly increasing in age for reliable cars, which is somewhat surprising. This may be a local effect, which cannot be extrapolated beyond a neighborhood of the "mean car". Finally, I find that the weighting ratio between public and private information is similar for classic cars and pickups, at around 0.6, but is higher for reliable cars, at around 0.65. This makes sense, since these cars have fewer idiosyncratic features and bidders will therefore weight private signals less heavily.
In Table 5, I present a comparison of the outcomes of the structural regression with and without a non-linear control function (modeled as a quadratic). The second specification gives the control constructed with a nonparametric first stage and bandwidth h chosen by cross-validation. The third and fourth specifications are a robustness check, showing that the coefficients remain similar when I vary the bandwidth. The final specification considers a control function constructed with an OLS first stage, for comparison. The number of observations is smaller here (around 7,000, as opposed to 35,000 for the preceding regressions). This is because construction of the instrument requires a panel of at least two observations for any given seller, and the first observation of any panel is dropped for lack of an instrument. In all specifications I use the log text size as my information measure, using only one such measure because I only have an instrument for this variable.
One of the predictions of the theory is that the amount of information is an endogenous variable, in that it is chosen strategically.[37] This prediction is validated here: I find that the controls are jointly significant in all specifications. For example, I conduct a likelihood ratio test of the exclusion of all controls, using the log-likelihoods from the first and second specifications. The likelihood ratio test statistic is LR = 19.88, which exceeds the chi-squared critical value χ²_{0.999} = 13.81 (two degrees of freedom, one for each term of the quadratic control), so I conclude that the controls are jointly significant at the 0.1% level. After including the control function, I find a corresponding decrease in the coefficient on text size. It is somewhat surprising, though, that it remains significant and positive. I explore this in more detailed regressions below. Finally, note that the remaining coefficients remain largely unchanged after introducing the control function. This makes sense, since the control is orthogonal to the other characteristics z′ by construction.
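The test itself is elementary to reproduce. A sketch follows, under the assumption that the quadratic control contributes two restrictions:

```python
from scipy.stats import chi2

def lr_test(loglik_restricted, loglik_full, df=2):
    """Likelihood ratio test for excluding the control terms."""
    lr = 2.0 * (loglik_full - loglik_restricted)
    return lr, chi2.ppf(0.999, df), chi2.sf(lr, df)

# chi2.ppf(0.999, 2) = 13.8155..., matching the critical value quoted above.
```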
I consider a specification with controls and different car groups in Table 6. This specification shows that the surprisingly large coefficient on log text in Table 5, even after including controls, was due to the classic car group. For the other two car groups, the effect of the amount of text is statistically indistinguishable from zero after including controls. For both of these groups, the controls show up as jointly significant. This validates the model prediction that it is webpage content, rather than the amount of information, that matters.

[37] Though it need not cause an endogeneity problem in the econometric sense; see the discussion in the section on endogeneity.
Yet for classic cars, both the controls and the amount of text are significant and large in magnitude. A possible explanation lies in the nature of the instrument used. For my instrument to be valid, I need random assignment of the content value ξ across cars sold by the same seller. For car dealerships, which typically deal in high-volume cars such as pickups and reliable cars, this assumption is reasonable. On the other hand, many of the sellers of classic cars on eBay buy them elsewhere and restore them. In that case, much of the value of the car is wrapped up in the restoration process, and the unobserved content ξ will have a large seller-specific term. The coefficient on log text may be picking up this effect.
The main results are as follows. First, the predictions of the costly revelation model hold up in the data, across all specifications. Second, the fact that log text is not significant after controlling for endogeneity in two of the three specifications suggests that the amount of text proxies for webpage content, rather than having any independent effect of its own. Third, I find that webpage content plays an important role in explaining prior valuations, which suggests that seller revelation is important for informing bidders in this market.
7 Counterfactual Simulation
One of the strengths of structural estimation is the ability to perform counterfactual simulations
of market outcomes. In this section I return to the idea that motivated the paper, and quantify
the magnitude by which seller revelations through the auction webpage help to reduce the
potential for adverse selection. My tool in this endeavor is a stark counterfactual: a world in
which eBay stops sellers from posting their own information on the auction webpage, so that
only basic and standardized car characteristic data is publicly available.
This is hardly an idle comparison. Local classified advertisements typically provide only basic information, while even some professional websites do not allow sellers to post their own photos.[38] In the case of local purchases, of course, potential buyers will view the cars in person, and so their private signals will be much more precise. This is typically not the case on eBay Motors. I will show that even when private signals reflect the true value of the vehicle, the less precise prior valuations held by bidders in the counterfactual world will lead them to systematically underbid on "peaches" and overbid on "lemons". Relative to the counterfactual, then, seller revelations through the auction webpage help to limit differences between prices and valuations, and reduce adverse selection.

[38] The South African version of Autotrader, autotrader.co.za, is such an example. Even on large online websites such as cars.com and autobytel.com, it is unusual to find multiple photos for a vehicle.
Formally, let the data generating process (DGP) be as specified in the structural model presented earlier. For simplicity, I assume throughout this section that there is no unobserved heterogeneity or endogeneity, so that I have access to all the information needed to simulate the DGP. Since bidders in fact learn more from the auction webpage than I capture in my coarse information measures, this assumption will lead me to underestimate the role of seller revelation in my simulation results. Recall that values are log-normally distributed, so that ṽ = log v is normally distributed with mean µ = E[ṽ | I, z′] and standard deviation σ = √(Var[ṽ | I, z′]). Bidders independently observe unbiased and normally distributed private signals of the log value ṽ, with signal standard deviation r.
Now consider two regimes. In the true regime, bidders observe the auction webpage and form a prior valuation based on the standardized characteristic data z′ and the amount of information I. Their prior is correct in equilibrium, and thus parameterized by µ and σ. In the counterfactual regime, the auction webpage contains only the standardized data, and so bidders have a less informative prior. This prior is log normal, and parameterized by µ_c = E[ṽ | z′] and σ_c = √(Var[ṽ | z′]). Bidders still obtain independent and unbiased private signals in both regimes.
To see how prices respond to seller revelation in the equilibrium and in the counterfactual case, I simulate the bidding function and compute expected prices under both regimes for a random sample of 1000 car auctions in my dataset. For the equilibrium regime, this is relatively straightforward. The estimated structural parameters imply estimates of the model primitives µ, σ and r for each data point, and I can compute the expected price E[p | I, z′] predicted by the model. To get the expected price under the counterfactual, E[p_c | I, z′], I estimate µ_c and σ_c. I do this by integrating out I from E[ṽ | I, z′] and Var[ṽ | I, z′] over the conditional distribution of I given z′. This gives me the counterfactual prior, and allows me to simulate the bidding function for any signal realization x_i. I obtain the expected price E[p_c | I, z′] by integrating over the distribution of ordered signal realizations x^{(n−1:n)} specified by the DGP. Since it will be useful to compare relative prices and values across auctions, I specify a baseline price. I define the baseline price as the expected price E[p_b | z′] which would obtain if the counterfactual prior specified by µ_c and σ_c were in fact correct, so that µ_c = µ and σ_c = σ. One may think of this as the case in which the seller is selling a car whose condition is "average", in the sense that his equilibrium revelations would not cause bidders to revise their priors at all. Notice that under the DGP the distribution of values depends on I, and thus so do the private signals and the expected prices. This is the sense in which private signals reveal the true value of the car even in the counterfactual world.[39]

[39] Additional details about the simulation are given in the appendix.
[Figure 5 appears here: a scatter plot of relative price (%) against relative value (%), with both axes running from −15 to 15.]

Figure 5: Simulation Results. Relative value is defined as a function of the ratio E[V | I, z′]/E[V | z′], so that cars with a high relative value are revealed through seller revelations to be "peaches", while the opposite implies the car is a lemon. The blue dots show the percentage by which the simulated prices for these cars exceed a baseline price E[p_b | z′] under the true regime. The green dots show the relative price under the counterfactual, in which sellers cannot reveal their private information on the auction webpage but buyers still get accurate private signals.
The results are depicted graphically in Figure 5. On the x-axis is the relative value of the car, which I define as the percentage by which the informed prior valuation E[V | I, z′] exceeds the uninformed prior E[V | z′]. A simple characterization is that cars with relative value greater than zero are "peaches", while those with relative value less than zero are "lemons". On the y-axis is the relative price of the car, which I define as the percentage by which the expected price E[p | I, z′] or E[p_c | I, z′] exceeds the baseline price E[p_b | z′]. Since the baseline price is that of an average car with characteristics z′, relative prices should be positive when the car is better than average (i.e. a peach), and negative when the car is worse than average (i.e. a lemon). The points in blue are expected prices under the equilibrium model, while those in green are from the counterfactual simulation.
It is clear from Figure 5 that a linear model would describe the relationship between relative value and relative price quite well in both regimes. Fitting such a model yields an estimated slope coefficient of 1.014 for the equilibrium model, while that for the counterfactual is 0.609.[40] This implies that a seller with a car revealed to be worth 10% more than the baseline can expect to fetch 10.1% more than the baseline price when the car is sold on eBay Motors. Thus a seller with a peach can post information, inform buyers of the car's quality, and hence obtain a price premium commensurate with the value of the car.[41] By contrast, under the counterfactual, that same car would fetch only 6% more than the baseline, because the prior value that potential buyers place on the car is the baseline value. Although bidders will on average receive favorable private signals indicating that the quality of the car is high, such signals receive limited weight because they are noisy. The overall effect is that, without revelation on the auction webpage, the seller is on average able to extract only 61% of the additional value of his vehicle in revenue.

[40] Both are estimated extremely precisely, with t-statistics of 413 and 121 respectively.
[41] In fact, the seller will do slightly better, because the additional information reduces uncertainty and the Winner's Curse, and thereby raises prices.
These simulation results make clear the role that public seller revelations can play in reducing adverse selection in this market. Whenever sellers with high-quality cars are unable to extract a commensurate price premium from buyers because of information asymmetries, there is the potential for adverse selection. In the counterfactual regime without revelation, the gap between relative value and relative price is nearly 40% of the value differential, and this potential is high. It is clear from my simulations, then, that seller revelations significantly reduce the potential for adverse selection.
8 Conclusion
In a world in which ever more esoteric items are sold online, it is important to understand
what makes these markets work. This paper shows that information asymmetries in these
markets can be endogenously resolved, so that adverse selection need not occur. The required
institutional feature is a means for credible information revelation. With this in place, sellers
have both the opportunity and the incentives to remedy information asymmetries between
themselves and potential buyers.
Where such mechanisms for credible information revelation exist, it is clear from my results
that these revelations can be important determinants of buyer behavior and thus prices. The
theory developed in this paper suggests that in the face of revelation costs, sellers will reveal
only the most favorable of their private information. It follows that quantitative measures of
information, such as those provided in this paper, will have predictive power for prices. Future
research could see whether this holds true in other settings, for if it does, this provides a useful
set of proxies for unobserved differences in heterogeneous goods markets with information
asymmetries.
References
Akerlof, G. A. (1970): “The Market for Lemons: Quality Uncertainty and the Market
Mechanism,” Quarterly Journal of Economics, 84(3), 488–500.
Andrews, D. W. K. (1994a): “Asymptotics for Semiparametric Econometric Models via
Stochastic Equicontinuity,” Econometrica, 62(1), 43–72.
(1994b): “Nonparametric Kernel Estimation for Semiparametric Models,” Econometric Theory, 10.
Athey, S. (2002): “Monotone Comparative Statics under Uncertainty,” Quarterly Journal of
Economics, 117(1), 187–223.
Athey, S., and P. Haile (2002): “Identification of Standard Auction Models,” Econometrica,
70(6), 2107–2140.
Athey, S., J. Levin, and E. Seira (2004): “Comparing Open and Sealed-Bid Auctions:
Theory and Evidence from Timber Auctions,” mimeo, Stanford University.
Bajari, P., and A. Hortaçsu (2003): “The winner’s curse, reserve prices and endogenous entry: empirical insights from eBay auctions,” Rand Journal of Economics, 34(2), 329–355.
Blundell, R., and J. Powell (2003): “Endogeneity in Nonparametric and Semiparametric Regression Models,” in Advances in Economics and Econometrics, ed. by M. Dewatripont, L. P. Hansen, and S. J. Turnovsky. Cambridge University Press.
Chernozhukov, V., and H. Hong (2003): “An MCMC approach to classical estimation,”
Journal of Econometrics, 115, 293–346.
Chiappori, P.-A., and B. Salanie (2000): “Testing for Asymmetric Information in Insurance Markets,” Journal of Political Economy, 108(1), 56–78.
Cohen, A., and L. Einav (2006): “Estimating Risk Preferences from Deductible Choice,”
forthcoming in American Economic Review.
Donald, S. G., and H. J. Paarsch (1993): “Piecewise Pseudo Maximum Likelihood Estimation in Empirical Models of Auctions,” International Economic Review, 34(1), 121–148.
Gourieroux, C., A. Monfort, and A. Trognon (1984): “Pseudo Maximum Likelihood
Methods: Theory,” Econometrica, 52(3), 681–700.
Grossman, S. J. (1981): “The Informational Role of Warranties and Private Disclosures,”
Journal of Law and Economics, 24(3), 461–483.
Haile, P., H. Hong, and M. Shum (2003): “Nonparametric tests for common values at
first-price sealed bid auctions,” NBER Working Paper 10105.
Hong, H., and M. Shum (2002): “Increasing Competition and the Winner’s Curse: Evidence
from Procurement,” Review of Economic Studies, 69, 871–898.
Houser, D., and J. Wooders (2006): “Reputation in Auctions: Theory and Evidence from
eBay,” Journal of Economics and Management Strategy, 15(2), 353–370.
Jovanovic, B. (1982): “Truthful Disclosure of Information,” Bell Journal of Economics,
13(1), 36–44.
Krasnokutskaya, E. (2002): “Identification and Estimation in Highway Procurement Auctions under Unobserved Auction Heterogeneity,” mimeo, University of Pennsylvania.
Levin, D., and J. L. Smith (1994): “Equilibrium in Auctions with Entry,” American Economic Review, 84(3), 585–599.
Linton, O., E. Maasoumi, and Y.-J. Whang (2005): “Consistent Testing for Stochastic
Dominance under General Sampling Schemes,” Review of Economic Studies, 72, 735–765.
Melnik, M., and J. Alm (2002): “Does a Seller’s Ecommerce reputation matter? Evidence
from eBay Auctions,” Journal of Industrial Economics, L(3), 337–349.
Milgrom, P. (1981): “Good News and Bad News: Representation Theorems and Applications,” Bell Journal of Economics, 12(2), 380–391.
Milgrom, P., and J. Roberts (1986): “Relying on the information of interested parties,”
Rand Journal of Economics, 17(1).
Milgrom, P., and R. J. Weber (1982): “A Theory of Auctions and Competitive Bidding,”
Econometrica, 50(5), 1089–1122.
Newey, W. K., J. L. Powell, and F. Vella (1999): “Nonparametric Estimation of Triangular Simultaneous Equations Models,” Econometrica, 67(3), 565–603.
Okuno-Fujiwara, M., A. Postlewaite, and K. Suzumura (1990): “Strategic Information
Revelation,” Review of Economic Studies, 57(1), 25–47.
Pinkse, J. (2000): “Nonparametric Two-Step Regression Estimation when Regressors and
Error are Dependent,” Canadian Journal of Statistics, 28(2), 289–300.
Politis, D. N., J. P. Romano, and M. Wolf (1999): Subsampling. Springer, New York.
Resnick, P., and R. Zeckhauser (2002): Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System, vol. 11 of Advances in Applied Microeconomics, chap. 5. Elsevier Science.
Shin, H. (1994): “News Management and the Value of Firms,” Rand Journal of Economics,
25(1), 58–71.
Song, U. (2004): “Nonparametric Estimation of an eBay Auction Model with an Unknown
Number of Bidders,” Working Paper, University of British Columbia.
Spence, M. A. (1973): “Job Market Signalling,” The Quarterly Journal of Economics, 87(3),
355–74.
Wilson, R. (1977): “A Bidding Model of Perfect Competition,” Review of Economic Studies, 44, 511–518.
9 Appendix
A. Proof of Proposition 1

The first two results follow from Proposition 1 in Bajari and Hortaçsu (2003), after noting that the two auction models are strategically equivalent. To obtain the final result, order the bidders by the realization of their signals, beginning with the bidder with the highest signal, bidder 1. Now, let bidder 2 submit his bid in the late stage at a randomly determined time t_2 ∈ [τ − , τ]. Since bids are monotone in the signals, and the standing price is the second highest bid submitted on [0, t_2), under the equilibrium strategies the standing price p_{t_2} cannot exceed v(x^{(n−2:n)}, x^{(n−2:n)}; n). Thus the bid of bidder 2, v(x^{(n−1:n)}, x^{(n−1:n)}; n), will exceed the standing price and be recorded. At some point t_1 in [τ − , τ], the bid of bidder 1 will likewise be recorded. Thus at the end of the auction, the final price will be the second highest of the recorded bids, which is v(x^{(n−1:n)}, x^{(n−1:n)}; n).
B. Proof of Proposition 2

Sequential equilibrium requires that bidders have consistent beliefs and that their behavior is sequentially rational. So consider any equilibrium reporting policy R. On path, consistency requires that bidders believe that reported signals S_R are equal to their reports s_R, and that unreported signals S_U are consistent with the reporting policy, so R(S) = r. Now, a report vector r is off-path if there does not exist a signal realization s such that R(s) = r. In that case, consistency still requires that bidders believe that reported signals S_R are equal to their reports s_R, but places limited restrictions on the unreported signals. I specify that these beliefs are S_i = \underline{s} for all i ∈ U, so that bidders believe the worst of off-path unreported signals. These beliefs satisfy consistency. Now, the symmetric strategies in which bidders bid zero in the early stage of the auction and bid E[w(x, x, s; n) | R(s) = r, S_R = s_R] (on path) or E[w(x, x, s; n) | S_R = s_R, S_U = \underline{s}] (off path) in the late stage are mutual best responses. No bidder has a strict incentive to deviate in the early stage. In the late stage, bidders condition on the revealed information s_R and make (correct) inferences about the unrevealed signals based on the seller's equilibrium reporting policy. Their play satisfies sequential rationality. To complete the equilibrium construction, I must show that the reporting policy R_i(s) = 1 for all i is optimal for the seller. Notice that for any deviation R′, there must exist signal realizations s with R′_i(s) = 0. But for such realizations, the seller expects to receive (weakly) less revenue than under the equilibrium reporting policy, since the bids under R′ take the form E[w(x, x, s; n) | S_R = s_R, S_U = \underline{s}], and this is weakly less than the equilibrium bid w(x, x, s; n).
C. Proof of Proposition 3

I begin with a useful lemma.

Lemma 1 Let U be a subset of 1···m that includes element i, and let U_{−i} be the same set without element i. Then:
\[ E\big[w(x,x,s;n) \mid \{s_j < t_j\}_{j \in U_{-i}}\big] - E\big[w(x,x,s;n) \mid \{s_j < t_j\}_{j \in U}\big] = E\big[\Delta_i(s_i, s'_i; n) \mid s'_i < t_i\big] \]

Proof:
\begin{align*}
E\big[w(x,x,s;n) \mid \{s_j < t_j\}_{j\in U_{-i}}\big] &- E\big[w(x,x,s;n) \mid \{s_j < t_j\}_{j\in U}\big] \\
&= E\big[ w(x,x,s;n) - E[w(x,x,s;n) \mid s_i < t_i] \,\big|\, \{s_j < t_j\}_{j\in U_{-i}} \big] \\
&= E\big[ E[ w(x,x,s_{-i},s_i;n) - w(x,x,s_{-i},s'_i;n) \mid s'_i < t_i ] \,\big|\, \{s_j < t_j\}_{j\in U_{-i}} \big] \\
&= E\big[ E[ \Delta_i(s_i,s'_i;n) \mid s'_i < t_i ] \,\big|\, \{s_j < t_j\}_{j\in U_{-i}} \big] \\
&= E\big[ \Delta_i(s_i,s'_i;n) \mid s'_i < t_i \big]
\end{align*}
where the final line follows by Assumption 1.
Now, to prove the proposition, let bidders have beliefs as specified in the proof of Proposition 2 above. Let the seller's reporting policy be as in the proposition. As before, the symmetric strategies where bidders bid zero during the early stage of the auction and E[w(x, x, s; n) | R(s) = r, S_R = s_R] in the late stage are mutual best responses (all points are on path). Under the seller's reporting policy, I may write E[w(x, x, s; n) | R(s) = r, S_R = s_R] as v_R(x, x, s_R; n) = E[w(x, x, s; n) | {s_i < t_i}_{i∈U}]. It remains to show that the seller's strategy is optimal. To this end, consider a deviation. Under the constant differences assumption, it suffices to show that for all realizations of the signal vector s, there is no signal s_i for which a deviation in R(s_i) is profitable. So consider revealing s_i < t_i for some i. The payoff to deviation depends on the seller's expected revenue, and is given by:
\begin{align*}
\pi &= E\Big[ E\big[ v_R(x,x,s_{U_{-i}};n) - v_R(x,x,s_U;n) \mid X^{(n-1:n)} = x \big] \,\Big|\, S = s \Big] - c_i \\
&= E\Big[ E\big[ E[w(x,x,s;n) \mid \{s_j<t_j\}_{j\in U_{-i}}] - E[w(x,x,s;n) \mid \{s_j<t_j\}_{j\in U}] \,\big|\, X^{(n-1:n)}=x \big] \,\Big|\, S=s \Big] - c_i \\
&= E\Big[ E\big[ E[\Delta_i(s_i,s'_i;n) \mid s'_i<t_i] \,\big|\, X^{(n-1:n)}=x \big] \,\Big|\, S=s \Big] - c_i \\
&= E\big[ \Delta_i(s_i,s'_i;n) \mid s'_i<t_i \big] - c_i
\end{align*}
where the third line follows by Lemma 1, and the fourth by Assumption 1. Now since t_i = inf{t : E[∆_i(t, t′; n) | t′ < t] ≥ c_i} and s_i < t_i, this expression is negative, and thus the deviation is unprofitable. It can similarly be proved that the opposite deviation (not revealing s_i > t_i) is unprofitable. Hence the seller's reporting policy is a best response to the bidders' bidding strategies. Finally, part (iii) follows immediately from the proof of Proposition 1.
D. Proof of Corollary 1

By Bayes' Rule:
\[ E[V \mid I] = \int v f(v \mid I)\, dv = \int v\, \frac{g(I \mid v)}{\int g(I \mid y)\, dF(y)}\, dv = \int v\, \ell(v, I)\, dv \]

One may show that ℓ(I, v) is log supermodular (LSPM); see below. It follows immediately that v·ℓ(I, v) is also LSPM. But then the right hand side has the form E_V[h(v, I)] where h(v, I) is LSPM, and from the results in Athey (2002) it follows that E_V[h(v, I)] is increasing in I. This establishes the result.
E. Log Supermodularity of ℓ(I, v)

A real-valued function g(x, y) : R^{m+n} → R is said to be LSPM if for all (x_1, y_1) and (x_2, y_2):
\[ g(x_1 \wedge x_2,\, y_1 \wedge y_2)\, g(x_1 \vee x_2,\, y_1 \vee y_2) \;\ge\; g(x_1 \wedge x_2,\, y_1 \vee y_2)\, g(x_1 \vee x_2,\, y_1 \wedge y_2) \]
where ∨ is the component-wise maximum operator and ∧ is the component-wise minimum operator. Since ℓ(I, v) is two-dimensional, it will suffice to prove that for v_1 < v_2 and I_1 < I_2 we have:
\[ \ell(v_1, I_1)\, \ell(v_2, I_2) \ge \ell(v_1, I_2)\, \ell(v_2, I_1) \]
Let p_i(v) = 1 − H(t_i | v). By the affiliation of S_i and V for all i, it follows that p_i(v) is increasing in v for all i. Let r be a vector of reports, and let ||r|| = #{i : r_i = 1}. Then we have:
\[ g(I \mid v) = \sum_{r : \|r\| = I} \left( \prod_{r_i = 1} p_i(v) \prod_{r_i = 0} \big(1 - p_i(v)\big) \right) \]
where I use the conditional independence of the signals S_i to express the right hand side as products. Since the denominators on both sides cancel, it will suffice to prove that g(I_1 | v_1) g(I_2 | v_2) ≥ g(I_1 | v_2) g(I_2 | v_1). Take the LHS and expand it:
\begin{align*}
\text{LHS} &= \left( \sum_{r:\|r\|=I_1} \prod_{r_i=1} p_i(v_1) \prod_{r_i=0} \big(1-p_i(v_1)\big) \right) \left( \sum_{r:\|r\|=I_2} \prod_{r_i=1} p_i(v_2) \prod_{r_i=0} \big(1-p_i(v_2)\big) \right) \\
&= \sum_{j=1}^{M} \sum_{k=1}^{N} \left( \prod_{r^j_i=1} p_i(v_1) \prod_{r^j_i=0} \big(1-p_i(v_1)\big) \prod_{r^k_i=1} p_i(v_2) \prod_{r^k_i=0} \big(1-p_i(v_2)\big) \right)
\end{align*}
where j = 1···M indexes the report vectors with ||r|| = I_1 and k = 1···N indexes those with ||r|| = I_2. Now expand and index the RHS in an identical fashion, and compare the jk-th elements. After canceling identical terms on both sides, the result is:
\[ \prod_{r^j_i=0,\; r^k_i=1} p_i(v_2)\big(1-p_i(v_1)\big) \;\ge\; \prod_{r^j_i=0,\; r^k_i=1} p_i(v_1)\big(1-p_i(v_2)\big) \]
This inequality holds by the monotonicity of p_i(v) for all i, since that implies p_i(v_2) > p_i(v_1) and 1 − p_i(v_1) > 1 − p_i(v_2). But then, since the inequality holds for each term jk in the sum, it holds for the summation as well. This proves the log supermodularity of ℓ(I, v).
F. Proof of Proposition 4

By definition, v(x, x; n, µ, σ, r) = E[V | X_i = x, Y = x, N = n]. Conditioning on all the other signals, we have:
\begin{align*}
v(x, x; n, \mu, \sigma, r) &= E_{X_{-i}|Y<x}\big[ V \mid X_i = x,\, X_{-i} = x_{-i},\, N = n \big] \\
&= E_{X_{-i}|Y<x}\left[ \exp\left\{ \frac{r\mu + \sigma\sum_{i=1}^{n} x_i}{r + n\sigma} + \frac{1}{2}\,\frac{r\sigma}{r + n\sigma} \right\} \,\middle|\, X_i = x,\, X_{-i} = x_{-i},\, N = n \right] \\
&= E_{X_{-i}|Y<x}\left[ \exp\left\{ a + \frac{r(\mu - a) + \sigma\sum_{i=1}^{n}(x_i - a)}{r + n\sigma} + \frac{1}{2}\,\frac{r\sigma}{r + n\sigma} \right\} \,\middle|\, X_i = x,\, X_{-i} = x_{-i},\, N = n \right] \\
&= e^{a}\, E_{X_{-i}|Y<x}\left[ \exp\left\{ \frac{r(\mu - a) + \sigma\sum_{i=1}^{n}(x_i - a)}{r + n\sigma} + \frac{1}{2}\,\frac{r\sigma}{r + n\sigma} \right\} \,\middle|\, X_i = x,\, X_{-i} = x_{-i},\, N = n \right] \\
&= e^{a}\, v(x - a, x - a; n, \mu - a, \sigma, r)
\end{align*}
where the second line follows from the posterior mean and variance of log v given normal unbiased signals {x_i}, and the fourth line follows on noting that e^a is a constant and can therefore be taken outside the expectation. Taking logs on both sides gives log v(x, x; n, µ, σ, r) = a + log v(x − a, x − a; n, µ − a, σ, r).

To get equation (5), note that E[log p(n, µ, σ, r)] = E[log v(x^{(n−1:n)}, x^{(n−1:n)}; n, µ, σ, r)]. It follows that E[log v(x^{(n−1:n)}, x^{(n−1:n)}; n, µ, σ, r)] = a + E[log v(x^{(n−1:n)}, x^{(n−1:n)}; n, µ − a, σ, r)], where the signals in the RHS expectation have the same conditional distribution G|v as those on the LHS, but the latent v now has a prior with mean µ − a. Now take a = αz_j for any observation j. Then by the specification µ − a = 0, and
\[ E[\log p(n, z_j, \alpha, \beta, r)] = \alpha z_j + E[\log p(n, z_j, \alpha = 0, \beta, r)] \]
The variance result follows in similar fashion.
G. Proof of Proposition 5

To begin, one must establish the results on the moments. Part (b) is immediate, since by Proposition 4, Ω[log p | n, z, m, θ] does not depend on µ, and by assumption ξ enters only µ and not σ or r. For (a), I argue as in Pinkse (2000) and write:
\begin{align*}
E[\log p \mid n, z, m, \theta] &= \alpha z + E[\log p \mid n, z, \alpha = 0, \beta, r] + E[\xi \mid I, z', m] \\
&= \alpha z + E[\log p \mid n, z, \alpha = 0, \beta, r] + E[\xi \mid m] \\
&= \alpha z + E[\log p \mid n, z, \alpha = 0, \beta, r] + h^{-1}(m) \\
&= \alpha z + E[\log p \mid n, z, \alpha = 0, \beta, r] + f(m)
\end{align*}
where the first line follows by Proposition 4 above. The second line follows on noting that I − m = g(z′, W) is independent of ξ, as is z′. The existence of the inverse in the third line follows from the monotonicity of h.
H. Asymptotics for the Control Function Estimator

In this section I provide a discussion of the requirements for consistency and asymptotic normality of the PML estimator with a control function. All the analysis here is based on Andrews (1994a) and Andrews (1994b). Define the sample and limiting objective functions:
\[ L_J(\theta, m) = -\frac{1}{2J} \sum_{j=1}^{J} \left[ \log\big(\Omega[\log p \mid n, z_j, \theta]\big) + \frac{\big(\log p_j - E[\log p \mid n, z_j, m_j, \theta]\big)^2}{\Omega[\log p \mid n, z_j, \theta]} \right] \]
\[ L_\infty(\theta, m) = -\frac{1}{2}\, E\left[ \log\big(\Omega[\log p \mid n, z, \theta]\big) + \frac{\big(\log p - E[\log p \mid n, z, m, \theta]\big)^2}{\Omega[\log p \mid n, z, \theta]} \right] \]
Denote the true parameter vector by θ_0 and the true residual by m_0. Following Theorem A-1 in Andrews (1994a), four requirements need to be satisfied for consistency. The first is that L_J(θ, m) uniformly weakly converges to L_∞(θ, m) on Θ × F, for F a function space for m = I − E[I | W, z′]. This follows by a WLLN, since the sample consists of independent draws from the stochastic process that generates the random variables (P, Z, N, W). The second is that the supremum distance sup_{θ∈Θ} ||L_∞(θ, m̂) − L_∞(θ, m_0)|| converges in probability to zero. This follows from the consistency of the first stage kernel estimator for m_0. Third, I need uniform continuity of L_J(θ, m), which is straightforward. Finally, I need a separation condition: for every neighborhood Θ_0 of θ_0, sup_{θ∈Θ∖Θ_0} L_∞(θ, m_0) < L_∞(θ_0, m_0). This follows from identification of the model and the Kullback inequality, as in Gourieroux, Monfort, and Trognon (1984).
Obtaining asymptotic normality is somewhat less straightforward. The idea is to first show that √T(θ̂_{m_0} − θ_0) is asymptotically normally distributed, where θ̂_{m_0} is the maximizer of the sample criterion function with the true residuals m_0 substituted for the estimated residuals m̂. Then, if the convergence of m̂ to m_0 is sufficiently smooth, asymptotic normality of √T(θ̂ − θ_0) will follow. The formal requirements are given in Theorem 1 of Andrews (1994a). The asymptotic normality of √T(θ̂_{m_0} − θ_0) follows immediately from the asymptotic normality results in Gourieroux, Monfort, and Trognon (1984), under standard conditions on L_J(θ, m). Here one must also extend these standard conditions to all of F, so that L_J(θ, m) is twice continuously differentiable and the vector of first order conditions satisfies a uniform WLLN for all m ∈ F. But the key requirements are an asymptotic orthogonality condition on the vector of first order conditions √T ∂/∂θ L_J(θ_0, m̂), and the stochastic equicontinuity of L_J around m_0. For a kernel-based estimator, Andrews (1994b) provides sufficient conditions on the first stage estimator for these to hold. For the asymptotic orthogonality condition, it is necessary that the kernel estimator be adaptive; I implement the kernel estimator with bandwidth chosen by cross-validation to satisfy this. For the stochastic equicontinuity condition, it is necessary to use kernels with smooth derivatives, and to trim observations in a smooth manner. To satisfy these conditions, I use a multivariate Gaussian kernel and underweight outlying observations when forming the conditional expectation m̂. It is important that the trimming function not only assigns zero weight to outlying observations but also underweights observations near the outliers, and so, as suggested in Andrews (1994b), I use a second kernel to re-weight observations based on their Mahalanobis distance from the other points in the data.
I. Demand System Computation
In this section I provide a number of additional details on the demand system computation. I
do this for the case in which a control function is not used, as the case with a control function
is extremely similar apart from the first stage estimation of residuals. The first step is to
43
compute the moments E[log p|n, µ = 0, σ, r] and Ω[log p|n, µ = 0, σ, r] for a grid of values of r,
σ and n (my grid has 3600 points). To do this, I use the following formulae:
R q
e φ((x − q)/r)2 Φ((x − q)/r)n−1 φ((q − µ)/σ)dq
v(x, x, n; µ, σ, r) = R
φ((x − q)/r)2 Φ((x − q)/r)n−1 φ((q − µ)/σ)dq
Z Z
E[log p|n, µ = 0, σ, r] =
log v(x, x, n, µ = 0, σ, r)φ((x − q)/r) (1 − Φ((x − q)/r)) Φ((x − q)/r)n−2 φ(q/σ)dxdq
Z Z
E[(log p)2 |n, µ = 0, σ, r] =
log v(x, x, n, µ = 0, σ, r)2 φ((x − q)/r) (1 − Φ((x − q)/r)) Φ((x − q)/r)n−2 φ(q/σ)dxdq
where $q = \log v$ is the latent log value of the object, and $\phi$ and $\Phi$ are the standard normal density and cumulative distribution function respectively. I compute these moments to very high accuracy using Gauss-Kronrod quadrature. In the algorithm below, when required to provide either of these moments for a point not on the grid, I use three-dimensional linear interpolation, which remains highly accurate. I transform the indices $\beta z$ and $r_{INDEX}$ with the functions $\kappa(\beta z) = \underline{\sigma} + (\bar{\sigma} - \underline{\sigma})\left(0.5 + \arctan(\beta z)/\pi\right)$ and $\nu(r_{INDEX}) = \underline{r} + (\bar{r} - \underline{r})\left(0.5 + \arctan(r_{INDEX})/\pi\right)$, where the grid boundaries for fixed $n$ are $[\underline{\sigma}, \bar{\sigma}]$ and $[\underline{r}, \bar{r}]$. These transformations constrain the indices to lie on the grid, which is convenient for computation. The estimated values do not lie near the boundaries, so the choice of grid boundaries does not appear to constrain the estimator.
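As an illustration, these moments can be computed with off-the-shelf quadrature; SciPy's QUADPACK routines are Gauss-Kronrod based. The sketch below is slow but transparent. The truncation of the integrals at `width` standard deviations, the function names, and the treatment of $\Omega$ as the conditional variance (second moment minus squared first moment) are assumptions of the example, not the paper's code.

```python
import numpy as np
from scipy.integrate import quad, dblquad
from scipy.stats import norm

def v(x, n, sigma, r, mu=0.0, width=8.0):
    # Ratio of one-dimensional integrals over the latent log value q,
    # truncated at mu +/- width*sigma for numerical purposes (an assumption)
    f = lambda q: (norm.pdf((x - q) / r) ** 2
                   * norm.cdf((x - q) / r) ** (n - 1)
                   * norm.pdf((q - mu) / sigma))
    lo, hi = mu - width * sigma, mu + width * sigma
    num, _ = quad(lambda q: np.exp(q) * f(q), lo, hi)
    den, _ = quad(f, lo, hi)
    return num / den

def moment_log_p(n, sigma, r, power=1, width=8.0):
    # E[(log p)^power | n, mu=0, sigma, r] as a double integral over (q, x);
    # power=1 gives the mean, power=2 the second moment
    g = lambda x, q: (np.log(v(x, n, sigma, r)) ** power
                      * norm.pdf((x - q) / r)
                      * (1 - norm.cdf((x - q) / r))
                      * norm.cdf((x - q) / r) ** (n - 2)
                      * norm.pdf(q / sigma))
    val, _ = dblquad(g, -width * sigma, width * sigma,     # outer: q
                     lambda q: q - width * r, lambda q: q + width * r)  # inner: x
    return val
```

The variance is then `moment_log_p(n, sigma, r, 2) - moment_log_p(n, sigma, r, 1)**2`, computed once for every grid point.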
The algorithm for finding the estimate $\hat{\theta}$ given this grid of pre-computed values is then as follows:

1. Draw an initial parameter vector $(\beta_0, r_0)$. For each observation, generate $\sigma_j$ under the specification and then interpolate on the grid to find $E[\log p \mid n, \mu = 0, \sigma_j, r]$ and $\Omega[\log p \mid n, \mu = 0, \sigma_j, r]$.
2. As in the text, generate a temporary dependent variable $y_j = \log p_j - E[\log p \mid n, \mu = 0, \sigma_j, r]$ and weights $1/\Omega[\log p \mid n, \mu = 0, \sigma_j, r]$, and find $\hat{\alpha}(\beta, r)$ by WLS.
3. Evaluate the objective function and, using a quasi-Newton algorithm, pick a new point $(\beta_1, r_1)$. Repeat until convergence is achieved.
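A compact sketch of this loop follows, with $\hat{\alpha}(\beta, r)$ concentrated out by WLS inside the objective and the outer search handled by a quasi-Newton routine. The `lookup` handle stands in for the trilinear grid interpolation of step 1, and the weighted sum of squared residuals stands in for the pseudo-likelihood criterion; both are assumptions of the sketch rather than the paper's exact objective.

```python
import numpy as np
from scipy.optimize import minimize

def to_interval(index, lo, hi):
    # The arctan transformation constraining an unbounded index to [lo, hi]
    return lo + (hi - lo) * (0.5 + np.arctan(index) / np.pi)

def objective(theta, z_sigma, x_mu, logp, lookup, sig_lo, sig_hi, r_lo, r_hi):
    beta, r_index = theta[:-1], theta[-1]
    sigma = to_interval(z_sigma @ beta, sig_lo, sig_hi)  # sigma_j per auction
    r = to_interval(r_index, r_lo, r_hi)
    mean, var = lookup(sigma, r)   # interpolated grid moments, per auction
    y = logp - mean                # temporary dependent variable
    w = 1.0 / var                  # WLS weights
    # Concentrate alpha out: weighted least squares of y on the mu covariates
    A = x_mu * w[:, None]
    alpha = np.linalg.solve(x_mu.T @ A, A.T @ y)
    e = y - x_mu @ alpha
    return np.sum(w * e**2)        # weighted criterion (a PML stand-in)

# result = minimize(objective, theta0, args=(...), method='BFGS')  # quasi-Newton
```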
J. Additional Simulation Details
There are two steps to the simulation: estimating the counterfactual prior parameters $\mu^c$ and $\sigma^c$, and computing the expected true, counterfactual and baseline prices. For the first step, note that
$$\mu^c_j = E[\tilde{v} \mid z^0_j] = E\left[E[\tilde{v} \mid I_j, z^0_j] \mid z^0_j\right] = E[\mu_j \mid z^0_j] = \alpha_0 z^0_j + \alpha_I E[I \mid z^0_j].$$
I replace the parameter $\alpha$ with the structural estimate $\hat{\alpha}$, and the term $E[I \mid z^0_j]$ with a nonparametric kernel estimate $\hat{I}$, to get an estimate $\hat{\mu}^c_j$. Similar analysis shows
$$\sigma^c_j = \sqrt{E[\sigma_j^2 \mid z^0_j] + \alpha_I^2 \left( E[I^2 \mid z^0_j] - E[I \mid z^0_j]^2 \right)}.$$
I estimate all unknown terms nonparametrically. Then I use the following formulas in computing the true, counterfactual and baseline expected prices:
$$E[p_j \mid I, z^0] = \iint v(x, x, n; \hat{\mu}_j, \hat{\sigma}_j, \hat{r})\, \phi((x-q)/\hat{r})\, (1 - \Phi((x-q)/\hat{r}))\, \Phi((x-q)/\hat{r})^{n-2}\, \phi((q - \hat{\mu}_j)/\hat{\sigma}_j)\, dx\, dq$$

$$E[p^c_j \mid I, z^0] = \iint v(x, x, n; \hat{\mu}^c_j, \hat{\sigma}^c_j, \hat{r})\, \phi((x-q)/\hat{r})\, (1 - \Phi((x-q)/\hat{r}))\, \Phi((x-q)/\hat{r})^{n-2}\, \phi((q - \hat{\mu}_j)/\hat{\sigma}_j)\, dx\, dq$$

$$E[p^b_j \mid z^0] = \iint v(x, x, n; \hat{\mu}^c_j, \hat{\sigma}^c_j, \hat{r})\, \phi((x-q)/\hat{r})\, (1 - \Phi((x-q)/\hat{r}))\, \Phi((x-q)/\hat{r})^{n-2}\, \phi((q - \hat{\mu}^c_j)/\hat{\sigma}^c_j)\, dx\, dq$$
where the dependence of $\mu$ and $\sigma$ on $I$ and $z$ is suppressed. I compute the integrals using Gauss-Kronrod quadrature, evaluating all three for a sample of 2000 points from my dataset.
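Concretely, the first step amounts to three kernel regressions on $z^0$ plus two plug-in formulas. A minimal sketch, assuming a Gaussian product kernel and leave-in Nadaraya-Watson fits (the function names are illustrative):

```python
import numpy as np

def kernel_fit(Z, y, h):
    # Nadaraya-Watson fit of E[y | Z] at every sample point (Gaussian kernel)
    K = np.exp(-0.5 * (((Z[:, None, :] - Z[None, :, :]) / h) ** 2).sum(axis=2))
    return K @ y / K.sum(axis=1)

def counterfactual_prior(z0, I, sigma2_hat, alpha0, alpha_I, h):
    EI = kernel_fit(z0, I, h)                 # E[I | z0]
    EI2 = kernel_fit(z0, I ** 2, h)           # E[I^2 | z0]
    Es2 = kernel_fit(z0, sigma2_hat, h)       # E[sigma_j^2 | z0]
    mu_c = z0 @ alpha0 + alpha_I * EI
    sigma_c = np.sqrt(Es2 + alpha_I ** 2 * (EI2 - EI ** 2))
    return mu_c, sigma_c
```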
K. Testing for Common Values
I justified the use of a symmetric common values (CV) auction model by arguing that the
common uncertainty generated by the information asymmetries between the seller and the
bidders is more important than idiosyncrasies in individual preferences. Here I provide a
formal test of symmetric common values against the alternative of symmetric private values
(PV). This test, originally proposed by Athey and Haile (2002), depends on the following
equality:
$$\frac{2}{n} \Pr\left(B^{(n-2:n)} < b\right) + \frac{n-2}{n} \Pr\left(B^{(n-1:n)} < b\right) = \Pr\left(B^{(n-2:n-1)} < b\right) \qquad (9)$$
where $B^{(i:n)}$ denotes the $i$-th ordered bid in an $n$-bidder auction.
This relationship holds in symmetric PV models since in those frameworks the number of
bidders n has no effect on valuations. By contrast, in CV models the Winner’s Curse effect
means that bids are on average lower as n increases than in the PV model. This implies that
the distribution on the LHS of (9) is first order stochastically dominated by that on the right
hand side.
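Intuitively, equality (9) says that dropping one of the $n$ bidders uniformly at random leaves the second-highest of the remaining $n-1$ bids: with probability $2/n$ the dropped bidder held one of the two highest bids, so the second-highest of the remainder is $B^{(n-2:n)}$, and otherwise it is still $B^{(n-1:n)}$. A quick Monte Carlo check of the identity under iid draws (a hypothetical verification exercise, not part of the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, b = 5, 200_000, 0.7
bids = np.sort(rng.normal(size=(T, n)), axis=1)        # n-bidder auctions
bids_m = np.sort(rng.normal(size=(T, n - 1)), axis=1)  # (n-1)-bidder auctions

# LHS of (9): mixture of the third- and second-highest of n draws
lhs = (2 / n) * (bids[:, n - 3] < b).mean() \
      + ((n - 2) / n) * (bids[:, n - 2] < b).mean()
# RHS of (9): second-highest of n-1 draws, i.e. B^(n-2:n-1)
rhs = (bids_m[:, n - 3] < b).mean()
print(lhs, rhs)  # agree up to simulation error when values do not depend on n
```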
I observe the second and third highest bids, $B^{(n-1:n)}$ and $B^{(n-2:n)}$, in every auction, and thus for each $n > 2$ I can construct the distributions on either side of (9). Let
$$G_n(b) = \frac{2}{n} \Pr\left(B^{(n-2:n)} < b\right) + \frac{n-2}{n} \Pr\left(B^{(n-1:n)} < b\right), \qquad H_n(b) = \Pr\left(B^{(n-2:n-1)} < b\right).$$
To test for stochastic dominance, I use a version of the Kolmogorov-Smirnov test statistic proposed by Haile, Hong, and Shum (2003), given by:
$$\delta_T = \sum_{n=\underline{n}}^{\bar{n}} \sup_{\varepsilon} \left\{ \hat{G}^T_n(\varepsilon) - \hat{H}^T_n(\varepsilon) \right\} \qquad (10)$$
where $\bar{n}$ and $\underline{n}$ are respectively the highest and lowest number of bidders observed in all auctions, and $T = \sum_{n=\underline{n}}^{\bar{n}} T_n$ for $T_n$ the number of observations of auctions with $n$ bidders. $\hat{G}^T_n(\varepsilon)$ and $\hat{H}^T_n(\varepsilon)$ are calculated for each $n$ from the empirical distribution function of the observed residuals, $\hat{F}_n(\cdot)$, where the residuals have been obtained from separate OLS regressions of the second and third highest bids on covariates $z$. I use residuals instead of the bids in forming the test statistic because the objects being auctioned are not identical, and a sensible null hypothesis is that of conditionally independent private values, where the distribution of values is identical across auctions conditional on the covariates $z$.
Under the null hypothesis, the distributions $G_n$ and $H_n$ are equal for each $n$, whilst under the common values alternative, first order stochastic dominance implies that the test statistic will be large. To obtain an asymptotically valid critical value, I use a subsampling approach as in Haile, Hong, and Shum (2003) and Linton, Maasoumi, and Whang (2005). Define the normalized test statistic $S_T = \sqrt{T}\,\delta_T$. This normalization differs from Haile, Hong, and Shum (2003) because they are obliged to use a nonparametric first stage in estimating the underlying valuations, and thus obtain a slower uniform convergence rate. I draw $m$ samples $r_1, \ldots, r_m$ of size $b$ from the original dataset without replacement, calculating $S_b$ for each one. Forming the empirical distribution function $\Psi$ of the $m$ statistics $S_b$, I take the critical value for a test of size $\alpha$ to be the $(1-\alpha)$-th quantile $S_{1-\alpha}$ of $\Psi$. The validity of this approach follows from results in Politis, Romano, and Wolf (1999). In practice, I choose the number of draws $m$ to be 1000, and since results can be sensitive to the choice of subsample size, I consider two sizes, 30% and 40% of my original dataset. The results follow in Table 2 below:
Table 2: Test for Common Values

                            Critical Values
  Subsample Size      S_0.90      S_0.95      S_0.99
  30%                 48.592      50.184      52.595
  40%                 49.41       50.393      53.07

  Test statistic: 51.664
The test statistic is significant at the 5% level, and I can reject the null of symmetric private values. But there is reason to view this result with caution, as in general this test will tend to over-reject the null in an eBay context. This is because the third highest observed bid, $B^{(n-2:n)}$, may not actually be the bid of the bidder with the third highest valuation (in a PV context) or signal (in a CV context). As noted earlier, such a bidder may be too late in entering his or her bid, and not have it recorded. It follows that the estimated distribution $\hat{G}^T_n(b)$ may be stochastically dominated by the true distribution $G_n(b)$ even in a private values context, where "true" should be interpreted here to mean the distribution of bids in the counterfactual case that all bids were recorded. Nonetheless, this result is consistent with the assumption of symmetric common values in the model used in the body of the paper.
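For completeness, here is a sketch of how the statistic in (10) and its subsampled critical value can be computed from the second- and third-highest-bid residuals. The evaluation grid for the supremum, the handling of bidder counts at the boundary, and the function names are assumptions of the sketch.

```python
import numpy as np

def ecdf(sample, grid):
    # Empirical CDF of `sample` evaluated at each point of `grid`
    return np.searchsorted(np.sort(sample), grid, side='right') / len(sample)

def delta(res2, res3, nbid, n_lo, n_hi):
    # Sum over n of sup_eps { G_n(eps) - H_n(eps) }, as in equation (10);
    # res2, res3 are OLS residuals of the second- and third-highest bids
    stat = 0.0
    for n in range(n_lo, n_hi + 1):
        grid = np.concatenate([res3[nbid == n], res2[nbid == n],
                               res2[nbid == n - 1]])
        G = (2 / n) * ecdf(res3[nbid == n], grid) \
            + ((n - 2) / n) * ecdf(res2[nbid == n], grid)
        H = ecdf(res2[nbid == n - 1], grid)  # B(n-2:n-1): 2nd-highest, n-1 bidders
        stat += np.max(G - H)
    return stat

def subsample_critical_value(res2, res3, nbid, n_lo, n_hi, b,
                             m=1000, alpha=0.05, seed=0):
    # m subsamples of size b drawn without replacement; critical value is the
    # (1 - alpha) quantile of the normalized statistics sqrt(b) * delta_b
    rng = np.random.default_rng(seed)
    stats = np.empty(m)
    for i in range(m):
        idx = rng.choice(len(nbid), size=b, replace=False)
        stats[i] = np.sqrt(b) * delta(res2[idx], res3[idx], nbid[idx], n_lo, n_hi)
    return np.quantile(stats, 1 - alpha)
```

The full-sample statistic $S_T = \sqrt{T}\,\delta_T$ is then compared against the returned critical value.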
Table 3: Structural Demand Estimation

                                    (1)         (2)         (3)         (4)         (5)
µ
Age                              -0.2107     -0.2094     -0.2091     -0.2088     -0.2095
                                 (0.0023)    (0.0023)    (0.0023)    (0.0023)    (0.0024)
Age Squared                       0.0043      0.0043      0.0043      0.0043      0.0043
                                 (0.0001)    (0.0001)    (0.0001)    (0.0001)    (0.0001)
Log of Miles                     -0.1258     -0.1248     -0.1246     -0.1242     -0.1256
                                 (0.0059)    (0.0058)    (0.0058)    (0.0058)    (0.0060)
Log of Text Size                  0.0537      0.0471      0.0592      0.0610      0.0785
                                 (0.0065)    (0.0065)    (0.0068)    (0.0070)    (0.0079)
Number of Photos                  0.0185      0.0179      0.0179      0.0176      0.0207
                                 (0.0009)    (0.0009)    (0.0009)    (0.0009)    (0.0010)
Manual Transmission               0.0404      0.0404      0.0385      0.0378      0.0341
                                 (0.0111)    (0.0111)    (0.0110)    (0.0110)    (0.0110)
Featured Listing                              0.1915      0.1897      0.1925      0.1983
                                             (0.0159)    (0.0159)    (0.0160)    (0.0159)
Log Feedback                                             -0.0195     -0.0178     -0.0167
                                                         (0.0029)    (0.0029)    (0.0029)
Percentage Negative Feedback                             -0.0027     -0.0025     -0.0027
                                                         (0.0013)    (0.0013)    (0.0013)
Total Number of Listings                                             -0.0040     -0.0044
                                                                     (0.0011)    (0.0011)
Phone Number Provided                                                 0.0605      0.0641
                                                                     (0.0106)    (0.0105)
Address Provided                                                     -0.0361     -0.0352
                                                                     (0.0154)    (0.0154)
Warranty                                                                          0.4761
                                                                                 (0.1175)
Warranty*Logtext                                                                 -0.0503
                                                                                 (0.0182)
Warranty*Photos                                                                  -0.0139
                                                                                 (0.0022)
Inspection                                                                        0.4388
                                                                                 (0.0937)
Inspection*Logtext                                                               -0.0433
                                                                                 (0.0142)
Inspection*Photos                                                                -0.0052
                                                                                 (0.0018)
Constant                         10.6771     10.6952     10.6726     10.6485     10.4939
                                 (0.0764)    (0.0759)    (0.0759)    (0.0767)    (0.0853)

Estimated standard errors are given in parentheses. The table gives the estimated relationship between the prior mean valuation µ and auction covariates. The nested specifications (2)-(4) include controls for marketing effects, seller feedback and dealer status. Specification (5) includes warranty and inspection dummies and their interactions with the information measures, showing that the value of photos and text is lower when the car has been inspected or is under warranty.
Table 3: Structural Demand Estimation (continued)

                            (1)         (2)         (3)         (4)         (5)
σ
Age                       0.0336      0.0337      0.0337      0.0335      0.0335
                         (0.0022)    (0.0022)    (0.0022)    (0.0022)    (0.0022)
Log of Text Size         -0.0185     -0.0206     -0.0215     -0.0221     -0.0220
                         (0.0357)    (0.0360)    (0.0360)    (0.0360)    (0.0363)
Photos                   -0.0178     -0.0182     -0.0182     -0.0182     -0.0183
                         (0.0053)    (0.0054)    (0.0054)    (0.0054)    (0.0055)
Constant                 -1.1490     -1.1384     -1.1344     -1.1287     -1.1345
                         (0.2234)    (0.2241)    (0.2239)    (0.2237)    (0.2254)
r_INDEX                  -0.0630     -0.0648     -0.0637     -0.0619     -0.0642
                         (0.0373)    (0.0364)    (0.0366)    (0.0374)    (0.0363)

Marginal Effects for σ
Age                       0.0127      0.0126      0.0126      0.0125      0.0124
Log of Text Size         -0.0070     -0.0077     -0.0081     -0.0083     -0.0082
Photos                   -0.0067     -0.0068     -0.0068     -0.0068     -0.0068
Mean σ                    0.8472      0.8448      0.8442      0.8433      0.8402
r                         1.2099      1.2088      1.2095      1.2106      1.2092

Estimated standard errors are given in parentheses. The parameters in this table show the estimated relationship between the prior standard deviation σ, which is a measure of prior uncertainty, and auction covariates z. Since the relationship between σ and z is modeled as non-linear, the marginal effects are easiest to interpret. An estimate of r, the standard deviation of bidders' private signals, is also provided.
Table 4: Structural Demand Estimation by Car Group

                         Classic Cars   Reliable Cars    Pickups
µ
Age                        -0.1794        -0.1732        -0.1936
                           (0.0029)       (0.0159)       (0.0038)
Age Squared                 0.0038         0.0007         0.0030
                           (0.0001)       (0.0006)       (0.0001)
Log of Miles               -0.1370        -0.1059        -0.1097
                           (0.0070)       (0.0213)       (0.0124)
Log of Text Size            0.0657         0.0329         0.0339
                           (0.0095)       (0.0206)       (0.0115)
Photos                      0.0202         0.0062         0.0172
                           (0.0011)       (0.0024)       (0.0022)
Manual Transmission         0.2204         0.0881        -0.1757
                           (0.0140)       (0.0221)       (0.0231)
Constant                   10.8249        10.7823        11.1512
                           (0.0953)       (0.2557)       (0.1646)
σ
Age                         0.0278         0.0950         0.0321
                           (0.0026)       (0.0306)       (0.0035)
Log of Text Size           -0.0273        -0.0236        -0.0062
                           (0.0400)       (0.1542)       (0.0853)
Photos                     -0.0132        -0.0169        -0.0213
                           (0.0050)       (0.0192)       (0.0156)
Constant                   -1.1110        -2.0891        -1.1467
                           (0.2635)       (1.0739)       (0.5371)
r_INDEX                    -0.0793        -0.2537         0.0248
                           (0.0292)       (0.0290)       (0.5906)

Marginal Effects for σ
Age                         0.0109         0.0190         0.0109
Log of Text Size           -0.0107        -0.0047        -0.0021
Photos                     -0.0052        -0.0034        -0.0072
Mean σ                      0.8512         0.6567         0.8065
r                           1.1996         1.0918         1.2658

Estimated standard errors are given in parentheses. This table gives the estimated relationship between the structural parameters and the covariates for each of the main car groups in my dataset. In all cases the prior mean is positively related to the amount of text and photos, but the magnitudes vary significantly by group.
Table 5: Structural Demand Estimation with Controls

                       No Control    Control     Control     Control    OLS Control
                                    (h = 0.2)   (h = 0.1)   (h = 0.3)
µ
Age                     -0.2037     -0.2041     -0.2040     -0.2041     -0.2040
                        (0.0037)    (0.0037)    (0.0037)    (0.0037)    (0.0037)
Age Squared              0.0043      0.0044      0.0043      0.0044      0.0043
                        (0.0001)    (0.0001)    (0.0001)    (0.0001)    (0.0001)
Log of Miles            -0.1218     -0.1221     -0.1222     -0.1221     -0.1184
                        (0.0085)    (0.0084)    (0.0084)    (0.0084)    (0.0085)
Log of Text Size         0.0623      0.0444      0.0486      0.0406      0.0466
                        (0.0082)    (0.0104)    (0.0097)    (0.0112)    (0.0089)
Manual Transmission      0.0565      0.0575      0.0575      0.0574      0.0579
                        (0.0170)    (0.0170)    (0.0170)    (0.0170)    (0.0169)
Control                 -0.2975      0.0508      0.0440      0.0556      0.0779
                        (0.0496)    (0.0177)    (0.0161)    (0.0190)    (0.0163)
Control Squared          0.6089     -0.0179     -0.0175     -0.0180     -0.0182
                        (0.0418)    (0.0116)    (0.0116)    (0.0116)    (0.0114)
Constant                10.6941     10.8380     10.8074     10.8657     10.7643
                        (0.1152)    (0.1234)    (0.1209)    (0.1266)    (0.1156)
σ
Age                      0.0317      0.0316      0.0316      0.0316      0.0316
                        (0.0035)    (0.0035)    (0.0035)    (0.0035)    (0.0035)
Log of Text Size        -0.1256     -0.1248     -0.1245     -0.1250     -0.1247
                        (0.0485)    (0.0483)    (0.0483)    (0.0483)    (0.0484)
Constant                -0.6006     -0.6072     -0.6093     -0.6055     -0.6103
                        (0.4186)    (0.4176)    (0.4177)    (0.4174)    (0.4182)
r_INDEX                 -0.0532     -0.0534     -0.0535     -0.0534     -0.0541
                        (0.0495)    (0.0493)    (0.0492)    (0.0493)    (0.0487)

Marginal Effects for σ
Age                      0.0113      0.0113      0.0113      0.0113      0.0113
Log of Text Size        -0.0450     -0.0447     -0.0446     -0.0448     -0.0445
Mean σ                   0.8324      0.8319      0.8319      0.8319      0.8311
r                        1.2162      1.2160      1.2160      1.2160      1.2156

Estimated standard errors are given in parentheses. I do not account for first-stage estimation error in computing the errors. The different specifications provide estimates with and without the use of a control function, and for varying bandwidth h in the nonparametric first stage. The final specification is estimated with an OLS first stage.
Table 6: Demand Estimation by Car Group with Control

                         Classic Cars   Reliable Cars    Pickups
µ
Age                        -0.1629        -0.1815        -0.1925
                           (0.0044)       (0.0253)       (0.0058)
Age Squared                 0.0036         0.0013         0.0031
                           (0.0001)       (0.0010)       (0.0002)
Log of Miles               -0.1341        -0.0804        -0.1028
                           (0.0097)       (0.0331)       (0.0163)
Log of Text Size            0.0660        -0.0216         0.0108
                           (0.0192)       (0.0428)       (0.0153)
Manual Transmission         0.2499         0.0570        -0.1607
                           (0.0223)       (0.0303)       (0.0334)
Control                     0.0749         0.1157         0.0641
                           (0.0305)       (0.0370)       (0.0319)
Control Squared            -0.0293        -0.0180        -0.0003
                           (0.0167)       (0.0201)       (0.0224)
Constant                   10.9973        10.8878        11.3938
                           (0.1732)       (0.5546)       (0.2632)
σ
Age                         0.0240         0.0760         0.0331
                           (0.0035)       (0.0599)       (0.0048)
Log of Text Size           -0.0766        -0.2602        -0.1750
                           (0.0529)       (0.2543)       (0.0915)
Constant                   -0.8223        -0.4439        -0.1309
                           (0.4113)       (2.4866)       (0.7419)
r_INDEX                    -0.0407        -0.0190         0.1491
                           (0.0550)       (0.5599)       (0.8446)

Marginal Effects for σ
Age                         0.0101         0.0124         0.0104
Log of Text Size           -0.0322        -0.0423        -0.0548
Mean σ                      0.8753         0.6064         0.7784
r                           1.2241         1.2379         1.3442

Estimated standard errors are in parentheses. I do not account for first-stage estimation error in computing the errors. The three specifications show the estimated relationship between the structural parameters and the covariates for different car groups. After including the control function, I find no significant relationship between my information measures and the prior mean in the case of reliable cars and pickups.