Goodness-of-Fit Tests
Many people study goodness-of-fit tests (e.g., the chi-square and Kolmogorov-Smirnov tests) in university courses, and there seems to be a feeling that these tests are relatively easy to understand and to implement. However, as we will see below, this is far from the truth.
Suppose that we have a sample of n observations from an unknown distribution, and that we would like to determine what probability distribution provides a good representation for the data. Suppose further that we hypothesize that the data come from a lognormal distribution, but unbeknownst to us the data really come from a gamma distribution with a shape parameter of 2 and a scale parameter of 1. If we perform a Kolmogorov-Smirnov (K-S) test at level 0.1 to test the null hypothesis (H0) that our data come from a lognormal distribution, then what is the probability that we reject H0? This is called the power of the test, and we would like it to be as close to 1 as possible. We certainly expect that the power will increase as the sample size n gets larger.
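For reference, the quantities discussed above can be written out explicitly. The notation ($F_n$ for the empirical distribution function, $\hat{F}$ for the CDF of the fitted lognormal distribution, and $c_{0.1}(n)$ for the level-0.1 critical value) is ours rather than the original's. The appropriate critical value depends on how the test accounts for the parameters estimated from the data.

$$
D_n = \sup_{x} \left| F_n(x) - \hat{F}(x) \right|, \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}
$$

$$
\text{power}(n) = P\!\left( D_n > c_{0.1}(n) \right) \quad \text{when } X_1, \ldots, X_n \text{ are i.i.d. gamma with shape 2 and scale 1, i.e., density } f(x) = x e^{-x},\ x > 0.
$$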
To estimate the power, we performed 25 independent experiments for each of the sample sizes n = 50, 100, and 200. For a particular experiment corresponding to a sample size of n, we generated a sample of n independent observations from a gamma distribution with shape and scale parameters of 2 and 1, respectively. We then fit a lognormal distribution to the data set and performed a K-S test at level 0.1 to test the following null hypothesis:
H0: The data are an independent sample from the fitted lognormal distribution
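The experimental procedure above can be sketched in Python using only the standard library. This is a minimal illustration, not the code used by ExpertFit or by the other package. The helper names (`ks_statistic`, `fit_lognormal`, `ks_rejects`, `estimate_power`) are hypothetical, and the lognormal is fitted by the mean and sample standard deviation of the logged data. Because the text does not say which critical values either package uses, the sketch offers two assumed choices: the classical level-0.1 K-S critical value, which is valid only when the hypothesized distribution is fully specified in advance, and Stephens' modified level-0.1 critical value for a normal distribution with estimated mean and variance, applied to the logs of the data. Treat these constants as assumptions to verify. Because the experiments are random, the output should not be expected to reproduce Table 1.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """K-S statistic D_n = sup_x |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def fit_lognormal(sample):
    """Estimate mu and sigma of log(X).

    Assumption: stdev uses the n-1 divisor; a maximum-likelihood fit would use n.
    """
    logs = [math.log(x) for x in sample]
    return statistics.mean(logs), statistics.stdev(logs)


def ks_rejects(sample, use_estimated_parameter_cvs):
    """Level-0.1 K-S test of H0: the data come from the fitted lognormal."""
    n = len(sample)
    mu, sigma = fit_lognormal(sample)
    d = ks_statistic(sample, lambda x: normal_cdf((math.log(x) - mu) / sigma))
    root_n = math.sqrt(n)
    if use_estimated_parameter_cvs:
        # Assumed Stephens-type adjustment for a normal with estimated
        # parameters, applied on the log scale (D_n is unchanged by taking logs).
        return (root_n - 0.01 + 0.85 / root_n) * d > 0.819
    # Assumed classical value, valid only for a fully specified null distribution.
    return (root_n + 0.12 + 0.11 / root_n) * d > 1.224


def estimate_power(n, experiments=25, use_estimated_parameter_cvs=True, seed=None):
    """Fraction of experiments in which H0 is rejected for gamma(2, 1) data."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # random.gammavariate(alpha, beta): alpha is the shape, beta the scale.
        sample = [rng.gammavariate(2.0, 1.0) for _ in range(n)]
        if ks_rejects(sample, use_estimated_parameter_cvs):
            rejections += 1
    return rejections / experiments


if __name__ == "__main__":
    for n in (50, 100, 200):
        adjusted = estimate_power(n, seed=1)
        classical = estimate_power(n, use_estimated_parameter_cvs=False, seed=1)
        print(f"n={n}: adjusted cvs {adjusted:.2f}, classical cvs {classical:.2f}")
```

With the same seed, both choices see identical samples, and for these sample sizes the adjusted rule rejects whenever the classical one does. Comparing the two therefore isolates the effect of the critical value alone.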
For each data set, the K-S test was actually performed using both ExpertFit and another distribution-fitting package. The results of our experiments are given in the following table.
Table 1. Proportion of the 25 experiments in which the K-S test rejected H0 at level 0.1.
| Sample Size n | ExpertFit | Other Software |
|---|---|---|
| 50 | 0.44 | 0.04 |
| 100 | 0.72 | 0.04 |
| 200 | 0.80 | 0.24 |
For example, in the case of n = 100, ExpertFit rejected the null hypothesis H0 in 18 of the 25 experiments, for an estimated power of 0.72, while the other software rejected H0 in just 1 of the 25, for an estimated power of 0.04. Thus, there is a tremendous difference in power (or discriminating ability) between ExpertFit and the other distribution-fitting package. It follows that the other package is much more likely to "accept" a poorly fitting distribution, which could compromise the integrity of your work.
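Each estimated power in Table 1 is simply a rejection count divided by 25. The short Python sketch below recovers the counts from the table (proportion × 25). As our own addition, not part of the original study, it attaches a rough binomial standard error, sqrt(p̂(1 − p̂)/25), to each estimate. The names `rejections` and `power_estimate` are hypothetical.

```python
import math

# Rejection counts out of 25 experiments, recovered from Table 1 as proportion * 25.
EXPERIMENTS = 25
rejections = {
    50: {"ExpertFit": 11, "Other Software": 1},
    100: {"ExpertFit": 18, "Other Software": 1},
    200: {"ExpertFit": 20, "Other Software": 6},
}


def power_estimate(count, m=EXPERIMENTS):
    """Return p_hat = count / m and its rough binomial standard error."""
    p_hat = count / m
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / m)
    return p_hat, std_err


for n, by_package in rejections.items():
    for package, count in by_package.items():
        p_hat, se = power_estimate(count)
        print(f"n={n:3d} {package:15s} power ~ {p_hat:.2f} (s.e. {se:.2f})")
```

With only 25 experiments per cell, no standard error exceeds 0.1. The gaps between the two columns are therefore large relative to this sampling noise at every sample size in the table.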