1 Introduction

During the last few years Extreme Value Theory (henceforth EVT; see ^{Fisher and Tippett, 1928}; ^{Gumbel, 1935}) has proved its usefulness across various scientific fields, such as engineering, finance and even public health (^{Thomas et al., 2016}). The new literature that has started to explore the potential of EVT in Economics has focused mostly on theoretical problems (^{Gabaix et al., 2003}, ^{2006}; ^{Benhabib and Bisin, 2006}).

In this article we address the use of EVT for the estimation of auction models within the framework presented by Haile and Tamer (^{2003}), which restricts the focus to the case of incomplete datasets and, in particular, to those environments where only transaction prices can be observed. This is the case, for instance, of descending bid (Dutch) auctions, where only the winning bidder reveals information about his valuation.

Menzel and Morganti (^{2013}) show that under such conditions the nonparametric estimator for the distribution function converges slowly, and that the small sample bias spreads to all the estimates of functionals of practical interest, such as the expected revenue or the optimal reserve price. In general, with small samples it is preferable to adopt a parametric approach. However, the choice of the parametric distributional form is usually arbitrary, as researchers typically do not have theories to guide their choice (see ^{Mohlin et al., 2015}; ^{Takano et al., 2014} for studies under complete datasets). Luckily, this is not the case in the present context: EVT guides us toward a natural parametric assumption that allows us to approximate functionals of interest analytically. de Haan et al. (^{2009}) and (^{2013}) introduce EVT in the estimation of auction models, restricting their analysis to the expected value and the number of active bidders. In this article, we extend the analysis to other functionals such as the optimal reserve price. We also analyze the performance of standard nonparametric estimators and quantify how the bias spreads across functionals of practical interest.

Over the years, the analysis of auctions has inspired one of the most successful marriages between theoretical and econometric models. Since the seminal work of Vickrey (^{1961}), theorists have constructed a rich framework to map private valuations into bids. In their attempt to identify and estimate the distribution of these private values, econometricians (see, for instance, ^{Guerre, Perrigne and Vuong, 2000}; ^{Aradillas-Lopez, Gandhi and Quint, 2013}) have used the results from the theory as restrictions on the data (that is, the bids).

The general approach to nonparametric identification in auction models relies on this *theoretical mapping* between the distribution of bidders’ valuations - the object of interest - and the distribution of observed bids - the data. Given the latter, we can obtain the former by inverting the mapping.

When an econometrician has access to limited data - for instance, to a dataset reporting only transaction prices - Athey and Haile (^{2002}) and Haile and Tamer (^{2003}) show that it is still possible to recover the missing object of interest using a *statistical mapping*, which establishes a relationship between the distribution of any order statistic and the underlying distribution of the data. The use of such a mapping is justified by the observation that transaction prices are an order statistic of the bids, as explicitly described by the rules of the auction. For instance, in a second price auction, the transaction price is equal to the second highest bid. Given the distribution of any order statistic, it is possible to invert the statistical mapping to back out the underlying distribution.^{1}
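As a minimal sketch of this statistical mapping, assume a second price auction with n symmetric bidders and i.i.d. bids, so that the transaction price is the second highest of n draws. Its distribution G is then linked to the parent distribution F by G = n F^(n-1) - (n-1) F^n, which is strictly increasing in F on [0, 1] and can therefore be inverted numerically. The function names are illustrative and are not taken from the article.

```python
def second_highest_cdf(f, n):
    """CDF of the second highest of n i.i.d. draws, given the parent CDF value f."""
    return n * f ** (n - 1) - (n - 1) * f ** n


def invert_second_highest(g, n, tol=1e-12):
    """Recover the parent CDF value f in [0, 1] from the order-statistic CDF value g.

    The mapping is strictly increasing in f on [0, 1], so bisection suffices.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if second_highest_cdf(mid, n) < g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: parent CDF value -> price CDF value -> recovered parent CDF value.
n = 5
for f in (0.1, 0.5, 0.9):
    g = second_highest_cdf(f, n)
    print(f, g, invert_second_highest(g, n))
```

In an estimation exercise, g would be replaced by the empirical distribution of the observed transaction prices evaluated at a given point.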

However, as Menzel and Morganti (^{2013}) pointed out, even though the statistical inversion preserves consistency, convergence of the estimated distribution to the true one with respect to an appropriate function norm fails to reach the root-

Because an econometrician observes just an extreme (or a function of an extreme) of the parental distribution, the dataset is *unbalanced* - that is, observations on the lower part of the support will be undersampled, whereas observations on the higher portion of the support will be oversampled.
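To give a sense of how unbalanced such a dataset can be, the following sketch simulates second price auctions with bids drawn from a uniform distribution on [0, 1] and reports how many transaction prices fall in the lower half of the support. The uniform parent, the seed, the sample size and the function name are assumptions made for illustration only.

```python
import random


def simulate_prices(n_bidders, n_auctions, seed=0):
    """Second highest of n_bidders U(0,1) bids, for each of n_auctions auctions."""
    rng = random.Random(seed)
    prices = []
    for _ in range(n_auctions):
        bids = sorted(rng.random() for _ in range(n_bidders))
        prices.append(bids[-2])  # the transaction price is the second highest bid
    return prices


for n in (5, 50):
    prices = simulate_prices(n, 1000)
    share_low = sum(p <= 0.5 for p in prices) / len(prices)
    print(f"n = {n:2d}: share of prices in the lower half = {share_low:.3f}, "
          f"lowest observed price = {min(prices):.3f}")
```

With many bidders, the lower half of the support is essentially never observed, which is the source of the difficulties discussed next.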

Consequently, inverting the distribution of an extreme imposes a downward bias around the left end of the support, and an upward bias on the right end. All the quantiles are thereby pushed to the right, and the estimates based on them will suffer as a result. The problem is particularly evident when the number of participants in an auction approaches infinity, as the distribution of transaction prices collapses to a degenerate one with mass point at the upper extreme of the support. Monte Carlo experiments show that even when

In principle it is possible to attenuate the problems on the right tail by smoothing the nonparametric estimators with an appropriate nearest-neighbor estimator, but in practice this will be difficult and time-consuming. Trimming and smoothing procedures could, in theory, solve the problem on the left tail, though the choice of the regularization parameters is obstructed by several trade-offs, and the criteria for an efficient procedure are still not available.

Given these considerations, we suggest an alternative, practical approach based on EVT. This parametric method relies on well-known convergence results concerning the extremes of a distribution (^{Fisher and Tippett, 1928}; ^{Gnedenko, 1943}). Under very mild assumptions, the distribution of such extremes - appropriately normalized - converges uniformly to one of three possible distributions, the so-called Extreme Value Distributions (EVD). When we rely on these results, it is possible to obtain approximate estimates of functionals of practical interest, such as the expected revenue or the optimal reserve price, in two steps. First, we estimate the two normalizing constants by minimizing the distance between the normalized empirical distribution of an extreme and the corresponding EVD. Second, by applying a simple change of variable to the integral that expresses the expected revenue of the auction, we can rewrite everything in terms of EVDs and their transformations. EVT also suggests a natural approximation for the underlying distribution of bids: the Generalized Pareto distribution.
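For reference, the three limiting types can be written as follows. The notation is the standard textbook one (with shape parameter $\alpha > 0$) and may differ from the symbols used later in the article.

```latex
\begin{align*}
\text{Gumbel:}  \quad & \Lambda(x) = \exp\!\left(-e^{-x}\right), && x \in \mathbb{R}, \\
\text{Fr\'echet:} \quad & \Phi_{\alpha}(x) = \exp\!\left(-x^{-\alpha}\right), && x > 0 \quad (\Phi_{\alpha}(x) = 0 \text{ for } x \le 0), \\
\text{Weibull:} \quad & \Psi_{\alpha}(x) = \exp\!\left(-(-x)^{\alpha}\right), && x \le 0 \quad (\Psi_{\alpha}(x) = 1 \text{ for } x > 0).
\end{align*}
```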

We present results from Monte Carlo simulations, which show that the approximation method performs better than the nonparametric one - even in cases where the convergence of the extreme and the limiting distribution happens at a very slow rate.^{2}

Even though this extreme value estimator and its functionals suffer from the same limitations on the left tail as their nonparametric counterparts, they appear to be more robust. Moreover, as this relative advantage of EVT seems to hold also for distributions for which the approximation is poor, we can be reasonably confident in the generality of this approach.

This approximation gets more precise as

EVT provides a general framework that can be adapted to all problems in which an order statistic is observed. For instance, an interesting application for financial markets is the estimation of the unobserved distribution of valuations in multi-unit auctions with uniform price.

This article is structured as follows: Section 2 presents the nonparametric estimator and discusses its behavior on the tails. Section 3 introduces basic and general results from EVT. In Section 4 we apply EVT to the auction framework and show how it is possible to obtain useful results relying only on EVDs and their transformations. Finally, Section 5 presents results from Monte Carlo simulations.

2 Nonparametric Identification and Estimation

We restrict our attention to symmetric independent private value (IPV) auction models, where only the transaction price is observed. For expositional purposes we focus on the case of second price auctions^{3}. The typical dataset consists of observations from ^{Athey and Haile, 2002}; ^{Haile and Tamer, 2003}). Every bidder ^{4}. The ^{5}

Athey and Haile (^{2002}) show that the mapping implicitly described above is always invertible: therefore it is possible to obtain the distribution of the bids, *statistical inversion* ). A simple nonparametric estimator for the distribution of the transaction prices is

which^{6}, by the Glivenko-Cantelli theorem, converges almost surely and uniformly to the true distribution

Following Haile and Tamer (^{2003}), the Continuous Mapping Theorem gives

However, as shown in Menzel and Morganti (^{2013}), this mapping is not Lipschitz continuous, meaning that its derivative is unbounded at critical points of the support, ^{7}

This creates a serious problem in the estimation, because even small biases will be magnified in neighborhoods around these points. Moreover, it is possible to see that, as

The convergence of the estimated distribution to the true one will be slow and dependent on the number of bidders. When

This means that when the number of bidders is high we should expect nonparametric estimates to be a poor description of the behavior of the lower tail of the distribution. The distribution of the bids is irregularly identified. A similar problem with finite-dimensional parameters has been analyzed by Khan and Tamer (^{2010}). Figure 1 shows how the nonparametric estimator of the cdf of a uniform fails to identify the lower tail of the distribution. Notice that increasing the number of observations from

The rate of convergence of the nonparametric estimator **Proof Remark 1** For the kernel estimator defined above,

We need to show that

We restrict our attention to the class of problems where ^{8}. First we show that ^{9} as

Because

The typical dataset is necessarily unbalanced. Higher values of the support are oversampled whereas lower values are undersampled to the point that entire portions of the lower tail might not even be observed in finite samples. All the measures based on our nonparametric estimates will be distorted accordingly: for instance, both expected revenue of the auction and reserve price will be systematically upward biased. This problem becomes worse as

3 Extreme Value Theory

The fundamental result of EVT is the following: if the distribution of the maximum of

Formally, let ^{10}tends to some nondegenerate limit

If it is possible to find a shifting parameter and a scaling parameter, such that the normalized distribution of the maximum converges, then the limiting distribution belongs to the Extreme Value family. The theorem grants a natural parametric approximation for the distribution of the maximum, up to two normalizing parameters. Gnedenko (^{1943}) also gave necessary and sufficient conditions for ^{1936}) derived a set of sufficient conditions which are more easily testable.
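A minimal numerical illustration of this convergence, assuming a unit-rate negative exponential parent: with shifting parameter log n and scaling parameter 1, the exact distribution of the normalized maximum is (1 - e^(-x)/n)^n, which approaches the Gumbel distribution exp(-e^(-x)). These normalizing constants are the textbook choice for this parent and are not taken from the article; the function names and the grid are likewise illustrative.

```python
import math


def gumbel_cdf(x):
    """Standard Gumbel (type I extreme value) CDF."""
    return math.exp(-math.exp(-x))


def normalized_max_cdf(x, n):
    """Exact CDF of max(X_1..X_n) - log(n) for i.i.d. unit-rate exponential X_i."""
    t = x + math.log(n)  # undo the shift
    if t <= 0.0:
        return 0.0
    return (1.0 - math.exp(-t)) ** n


def sup_distance(n, lo=-5.0, hi=10.0, steps=3000):
    """Largest gap between the exact and the limiting CDF on an evenly spaced grid."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(normalized_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)


for n in (5, 50, 500):
    print(n, sup_distance(n))
```

The gap shrinks as n grows, which is the sense in which the EVD serves as a parametric approximation for the distribution of the maximum.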

It is possible to show that the class of distributions that satisfy the von Mises conditions is wide, and includes all known analytical distributions. More interestingly for our purposes, Falk and Marohn (^{1993}) rewrite the von Mises conditions in terms of convergence of the underlying distribution to a corresponding Generalized Pareto Distribution (^{1985}) shows that the von Mises conditions imply pointwise convergence of the density ^{11}. This, by virtue of Scheff

The rate of convergence of ^{12}. The fastest possible convergence rate is of order

The results of EVT presented so far are not limited to the first maximum: in fact, they extend to the whole joint distribution of the extremes. Define

4 EVT in the Estimation of Auction Models

We can now use the results of the previous section to approximate the distribution of the extreme with the appropriate EVD. We are going to show that objects of interest such as expected revenue and optimal reserve price can be easily obtained through a simple transformation.

We assume that ^{2002}))

We want to emphasize that, for the simple case we are considering, it is neither necessary nor advisable to compute the integral in order to obtain the expected revenue of the auction: for this purpose it is enough to find the expected value of the transaction prices. The expected value does not suffer from the bias and should therefore be used in estimation. However, for expositional purposes, we are going to refer to the integral as a benchmark for the heavy bias that affects the nonparametric estimator. Estimation of the distribution

We are going to show that the integral can be transformed and expressed in terms of EVDs, with no significant loss in precision.
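As a rough check on how little precision such an approximation can lose, consider the illustrative case of a unit-rate negative exponential parent, for which both sides are available in closed form. The exact expected second highest bid among n bidders is H_n - 1, where H_n is the n-th harmonic number; the limiting law of the second extreme under Gumbel normalization (shift log n, scale 1) has mean gamma - 1, where gamma is the Euler-Mascheroni constant, giving the approximation log n + gamma - 1. The parent distribution, the normalization and the function names are assumptions of this sketch and not the article's theorem.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exact_expected_price(n):
    """E[second highest of n i.i.d. unit-rate exponentials] = H_n - 1."""
    return sum(1.0 / k for k in range(1, n + 1)) - 1.0


def evt_expected_price(n):
    """Approximation from the limiting law of the second extreme: log(n) + gamma - 1."""
    return math.log(n) + EULER_GAMMA - 1.0


for n in (5, 50, 500):
    exact, approx = exact_expected_price(n), evt_expected_price(n)
    bias = 100.0 * (approx - exact) / exact
    print(f"n = {n:3d}: exact = {exact:.4f}, EVT = {approx:.4f}, bias = {bias:.2f}%")
```

Since H_n - log n - gamma is approximately 1/(2n), the gap in this example is noticeable for a handful of bidders and vanishes quickly as participation grows.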

[Expected Revenue] If there exists

For instance, for the class of distributions

We construct the proof through a sequence of simple Lemmas.

** Proof :** if

** Proof :** Because

for some

** Proof of Theorem 2 ** The proof of the theorem is concluded by performing a simple change of variable in the original integral,

The approximation does not depend on the unknown distribution

The normalizing constants can be estimated through some standard minimum distance (MD) criterion^{13}. A widely used criterion is the Cram^{14} metrics, the most common is the Kolmogorov-Smirnov

where

**Optimal Reserve Value:** Using a similar approach we can estimate the optimal Reserve Price (^{15}: through a numerical search over the parameter

Notice that Lemma 2 suggests the possibility to approximate the right tail of the distribution^{16}
^{1975}), Balkema and de Haan (^{1974})).

Can we use what we learn from auctions with high participation (that is, with high

5 Monte Carlo Simulations

In this section we are going to present some results from Monte Carlo simulations in support of the theory advanced in the previous sections. In order to simplify the discussion, we are going to focus on the case of second price auctions: this implies that the bids drawn are also the valuations of the bidders. Using MATLAB, we draw

From equation (2), we estimate the nonparametric distribution of our set of random draws, which we then use to find the normalizing constants using the Kolmogorov-Smirnov measure (see equation 8).^{17} A useful outcome of the Kolmogorov-Smirnov criterion is the availability of a test for the goodness of fit: in all simulations, the normalized empirical distribution is not significantly different from the corresponding EVD, the Gumbel^{18}.
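The estimation step can be sketched as follows. The sketch is in Python rather than MATLAB, assumes a unit-rate negative exponential parent, and for simplicity observes the maximum of the n bids in each auction rather than the second highest bid. It picks the shift and scale by a coarse grid search over the Kolmogorov-Smirnov distance to the Gumbel distribution. The grid, the seed and all function names are illustrative, and equation (8) may use a different parametrization.

```python
import math
import random


def gumbel_cdf(z):
    """Standard Gumbel CDF."""
    return math.exp(-math.exp(-z))


def ks_distance(sample_sorted, shift, scale):
    """KS distance between the empirical CDF and Gumbel((x - shift) / scale)."""
    t = len(sample_sorted)
    d = 0.0
    for i, x in enumerate(sample_sorted):
        g = gumbel_cdf((x - shift) / scale)
        # The empirical CDF jumps from i/t to (i+1)/t at x; check both sides.
        d = max(d, abs(g - i / t), abs(g - (i + 1) / t))
    return d


def fit_normalizing_constants(sample, shifts, scales):
    """Grid search for the (shift, scale) pair with the smallest KS distance."""
    s = sorted(sample)
    return min((ks_distance(s, b, a), b, a) for b in shifts for a in scales)


rng = random.Random(1)
n_bidders, n_auctions = 50, 100
maxima = [max(rng.expovariate(1.0) for _ in range(n_bidders))
          for _ in range(n_auctions)]

shifts = [2.0 + 0.05 * i for i in range(81)]  # candidate shifts, 2.0 to 6.0
scales = [0.5 + 0.05 * i for i in range(31)]  # candidate scales, 0.5 to 2.0
dist, shift_hat, scale_hat = fit_normalizing_constants(maxima, shifts, scales)
print(f"KS distance = {dist:.3f}, shift = {shift_hat:.2f} "
      f"(log n = {math.log(n_bidders):.2f}), scale = {scale_hat:.2f}")
```

A grid search is used only for transparency; a numerical optimizer could replace it without changing the criterion.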

Figure 2 and Figure 3 provide a graphical representation of the goodness of fit of the nonparametric estimator and of the estimator based on EVT^{19}. While we used different values for both *large sample* , for the purpose of asymptotic behavior. The remaining values define a realistic *small sample*. The approximate distribution is represented by the dashed curve; the continuous curve represents the nonparametric estimator. The dotted curve is the true CDF.

The figures immediately illustrate four points: first, as the number of bidders rises the bias of the nonparametric estimator increases. Second, the nonparametric estimator is biased in two different regions of the support: in the upper tail, because those observations are overweighted, and in the lower tail. Third, the size of the dataset seems to have very little effect on the quality of the estimates. Finally, for the case of the Negative Exponential the approximation performs well, whereas when we analyze the case of the normal distribution the fit is less satisfactory: as the number of bidders increases, EVT delivers better results than the nonparametric estimator, but the bias in the lower tail stays relevant.

Next, we are going to show how the different approaches perform in predicting the expected revenue from the auction, computed using equation (7). Rather than analysing asymptotic behavior, we here focus on plausible datasets of size 50 and 100, though in our simulations we produced results for a wide range of values. Obviously, increasing sample size makes all estimators more precise. However, we show that it is the impact of the number of bidders,

EVT provides a good estimate of the expected revenue: the bias from the approximation is high for a small number of bidders, but it rapidly decreases. The sample size affects the precision of the estimation of the normalizing constants,

Again, EVT performs slightly better when the parental distribution is the negative exponential, but the difference in the fit is small. The nonparametric approach favors distributions with a slow rate of convergence, like the normal one, but it still drastically underperforms compared to EVT.

N. bidders | N. auctions | True Rev. | EVT Rev. | NonP Rev. | Bias EVT (%) | Bias NonP (%) |
---|---|---|---|---|---|---|
5 | 50 | 10.95 | 8.78 | 17.11 | −19.80 | 56.30 |
5 | 100 | 10.95 | 8.79 | 17.40 | −19.69 | 59.00 |
50 | 50 | 13.72 | 13.43 | 172.88 | −2.14 | 1160.05 |
50 | 100 | 13.72 | 13.41 | 174.58 | −2.25 | 1172.47 |

N. bidders | N. auctions | True Rev. | EVT Rev. | NonP Rev. | Bias EVT (%) | Bias NonP (%) |
---|---|---|---|---|---|---|
5 | 50 | 6.44 | 4.81 | 9.43 | −25.38 | 46.65 |
5 | 100 | 6.44 | 5.46 | 11.26 | −15.16 | 74.87 |
50 | 50 | 17.33 | 17.27 | 231.63 | −0.37 | 1236.60 |
50 | 100 | 17.33 | 16.48 | 221.44 | −4.89 | 1177.81 |

Next, we are going to focus on the optimal Reserve Price of the auction when the seller has an outside value equal to

N. bidders | N. auctions | True RP | EVT RP | NonP RP | Bias EVT (%) | Bias NonP (%) |
---|---|---|---|---|---|---|
5 | 50 | 12.08 | 13.23 | 12.31 | 9.52 | 1.90 |
5 | 100 | 12.08 | 13.16 | 12.27 | 8.94 | 1.57 |
50 | 50 | 12.08 | 12.34 | 10.80 | 2.15 | −10.60 |
50 | 100 | 12.08 | 12.33 | 10.80 | 2.06 | −10.60 |

N. bidders | N. auctions | True RP | EVT RP | NonP RP | Bias EVT (%) | Bias NonP (%) |
---|---|---|---|---|---|---|
5 | 50 | 6.25 | 7.85 | 1.25 | 25.60 | −80.00 |
5 | 100 | 6.25 | 7.79 | 1.25 | 24.64 | −80.00 |
50 | 50 | 6.25 | 7.07 | 1.25 | 13.12 | −80.00 |
50 | 100 | 6.25 | 6.86 | 1.25 | 9.76 | −80.00 |

Auction theory shows that the true reserve price is not affected by the number of bidders, nor by the sample size: within the boundaries of numerical computation, the Monte Carlo exercise supports the theory. However, the number of bidders does affect the estimated reserve price under both approaches. The EVT-estimator gets closer to the true value as

The sample size affects the precision of the estimates only slightly: this confirms the argument that convergence occurs slowly.
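The independence of the true reserve price from the number of bidders follows from the standard first-order condition r = v0 + (1 - F(r)) / f(r), in which the number of bidders does not appear. The sketch below solves this condition by bisection, assuming that r - (1 - F(r)) / f(r) is increasing in r. The parent distributions, the outside value v0 = 0.2, the mean of the exponential and the function names are chosen for illustration and are not the ones used in the simulations above.

```python
import math


def optimal_reserve(cdf, pdf, v0, lo, hi, tol=1e-10):
    """Solve r - (1 - F(r)) / f(r) = v0 by bisection on [lo, hi].

    Assumes the left-hand side is increasing in r and crosses v0 on [lo, hi].
    """
    def excess(r):
        return r - (1.0 - cdf(r)) / pdf(r) - v0

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


v0 = 0.2

# Uniform on [0, 1]: the closed form is (1 + v0) / 2.
r_unif = optimal_reserve(lambda r: r, lambda r: 1.0, v0, 0.0, 1.0)

# Negative exponential with mean mu: the closed form is v0 + mu.
mu = 2.0
r_exp = optimal_reserve(lambda r: 1.0 - math.exp(-r / mu),
                        lambda r: math.exp(-r / mu) / mu, v0, 0.0, 50.0)

print(r_unif, r_exp)
```

Any dependence of an estimated reserve price on the number of bidders therefore comes from the estimate of the distribution, not from the target itself.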

Last, from the estimates of the normalizing constant we try to make out-of-sample predictions about the expected revenue. As above, we take draws from a normal distribution and a negative exponential. We try to interpolate the expected revenue for

As expected, the interpolation deteriorates the further we go out-of-sample. However, the expected revenue functional seems to mitigate the progressive bias of the normalizing constant: as far as this exercise is concerned, the results seem close to the true ones.

We have derived results from other distributions, such as uniform, lognormal and mixed distributions for which there is no analytical expression, and the evidence seems consistent. The approach based on EVT systematically provides better estimates than the nonparametric approach. It is to be noted that the approximation method is computationally easier to perform, as it breaks down to the estimation of only two normalizing constants: all the subsequent steps can be solved analytically, using the appropriate

6 Conclusions

Econometricians are usually left to make arbitrary parametric choices for the estimation of their models. In this article we showed how EVT guides us towards a natural parametric approximation in auction models with incomplete data.

We addressed the quality of nonparametric estimators in auction models with incomplete data, and we showed through simulations the magnitude of the bias that affects estimates of functionals of practical interest. Monte Carlo simulations show that, even when the sample size increases, the bias stays relevant and does not disappear fast enough. The number of bidders strongly affects the precision of the estimates, and dominates the benefits coming from large sample sizes.

The approximate distribution performs better than its nonparametric counterpart, even when the approximation is known to occur slowly, as in the case of the normal distribution. Increasing the value of

Even though the form of the approximating distribution is analytical, the set of assumptions that justifies its use is very mild, and we could reasonably expect most existing distributions to satisfy them. The practical advantage of adopting analytical formulas lies in saving computational time, making the computation of the relevant measures a minor feat.