Goodness-of-Fit Tests

        Many people study goodness-of-fit tests (e.g., the chi-square and Kolmogorov-Smirnov tests) in university courses, and there seems to be a feeling that these tests are relatively easy to understand and to implement.  However, this is far from the truth, as we will see below.

        Suppose that we have a sample of n observations from an unknown distribution, and that we would like to determine which probability distribution provides a good representation of the data.  Suppose further that we hypothesize that the data come from a lognormal distribution, but unbeknownst to us the data really come from a gamma distribution with a shape parameter of 2 and a scale parameter of 1.  If we perform a Kolmogorov-Smirnov (K-S) test at level 0.1 to test the null hypothesis (H0) that our data come from a lognormal distribution, then what is the probability that we reject H0?  Because H0 is false here, this probability is called the power of the test, and we would like it to be as close to 1 as possible.  We certainly expect the power to increase as the sample size n gets larger.

        To estimate the power, we performed 25 independent experiments for each of the sample sizes n = 50, 100, and 200.  For a particular experiment corresponding to a sample size of n, we generated a sample of n independent observations from a gamma distribution with shape and scale parameters of 2 and 1, respectively.  We then fit a lognormal distribution to the data set and performed a K-S test at level 0.1 to test the following null hypothesis:

         H0: The data are an independent sample from the fitted lognormal distribution

For each data set, the K-S test was actually performed using both ExpertFit and another distribution-fitting package.  The results of our experiments are given in the following table.

Table 1. Proportion of the 25 experiments in which the K-S test rejected H0 at level 0.1.

Sample Size n    ExpertFit    Other Software
50               0.44         0.04
100              0.72         0.04
200              0.80         0.24

Thus, in the case of n = 100, ExpertFit rejected the null hypothesis H0 in 18 of the 25 experiments, for an estimated power of 0.72, while the other software rejected H0 in just 1 of the 25 experiments, for an estimated power of 0.04.  In these experiments there is therefore a tremendous difference in power (or discriminating ability) between ExpertFit and the other distribution-fitting software.  It follows that the other software is much more likely to "accept" a poorly fitting distribution, which could compromise the integrity of your work.
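
The experiment described above can be sketched in Python with NumPy and SciPy.  This is only an illustration under stated assumptions, not the procedure used by ExpertFit or by the other package, whose methods for computing K-S critical values are not described here.  The sketch assumes a two-parameter lognormal (no location shift) fitted by maximum likelihood, and it compares two decision rules: a "naive" rule that uses the standard K-S p-value, which treats the fitted parameters as if they had been known in advance, and a "calibrated" rule whose critical value is obtained by Monte Carlo simulation with the parameters re-estimated from every simulated sample.  The function names and the number of Monte Carlo replications are hypothetical choices.

```python
import numpy as np
from scipy import stats


def ks_fitted_lognormal(x):
    """K-S test of x against a two-parameter lognormal fitted by maximum likelihood.

    Taking logs turns this into a K-S test of the standardized log-data
    against the standard normal distribution.
    """
    logs = np.log(x)
    mu, sigma = logs.mean(), logs.std()  # lognormal MLEs (assumes no location shift)
    return stats.kstest((logs - mu) / sigma, "norm")


def calibrated_critical_value(n, alpha, reps, rng):
    """Monte Carlo (1 - alpha) quantile of the K-S statistic when H0 is true,
    re-estimating the parameters from every simulated sample."""
    d = [ks_fitted_lognormal(rng.lognormal(size=n)).statistic for _ in range(reps)]
    return float(np.quantile(d, 1.0 - alpha))


def estimate_power(n, experiments=25, alpha=0.1, reps=2000, seed=0):
    """Return (naive, calibrated) rejection proportions for gamma(2, 1) data."""
    rng = np.random.default_rng(seed)
    crit = calibrated_critical_value(n, alpha, reps, rng)
    naive = calibrated = 0
    for _ in range(experiments):
        x = rng.gamma(shape=2.0, scale=1.0, size=n)
        result = ks_fitted_lognormal(x)
        naive += int(result.pvalue < alpha)         # p-value ignores the fitting step
        calibrated += int(result.statistic > crit)  # critical value accounts for it
    return naive / experiments, calibrated / experiments


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, estimate_power(n))
```

Because the logarithm of a lognormal variable is normal, the distribution of the K-S statistic under H0 with estimated parameters does not depend on the true parameter values, so one calibration per sample size is enough.  The proportions printed by this sketch depend on the random seed and on the assumptions above, and should not be expected to reproduce Table 1.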
