Estimation via least squares is a mature area of statistics, but a phenomenon that occurs under certain conditions has escaped attention for hundreds of years and is the focus of this dissertation. This discovery demonstrates, both graphically and mathematically, that certain conditions cause data points to have no influence on predictions made using ordinary least squares models. Least squares predictions are widely used in many disciplines to make decisions or to determine what may happen in the future. The loss of data when predicting y-values in a linear model is a loss of information, and such a prediction may be suboptimal in comparison to some other prediction technique that uses all the y-data points in its calculation. Since noncontributory data can be identified before the dependent variable data are even collected, this research can be used as a tool to help statisticians structure their input data more efficiently and analyze existing data with better understanding.
In this dissertation, the mathematical relationships between predictions and data points that are independent of those predictions have been developed and proven for least squares straight-line models, general polynomial models, and general univariate models that are linear in the unknown coefficients. The effects of noncontributory data were analyzed and shown graphically via numerous examples and mathematically in the general form. The important concept of data wells was introduced, defined, and examined to demonstrate the far-reaching effect of this discovery on least squares estimation. Data wells show that the phenomenon of noncontributory data is continuous rather than discrete, a fact that extends the impact of this discovery dramatically. Finally, recommendations were made regarding future research in least squares sensitivity analysis, including work that will ultimately find a remedy for the phenomenon discussed in this dissertation.
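For the straight-line case, the idea can be illustrated with the standard result that an ordinary least squares prediction is a weighted sum of the observed y-values, with weights that depend only on the x-values. The sketch below is not taken from the dissertation's own derivations; it assumes the textbook weight 1/n + (x0 - x̄)(x_i - x̄)/Sxx for observation i when predicting at x0, and the function names `prediction_weights` and `zero_weight_location` and the sample data are hypothetical. It locates the prediction point at which a chosen observation receives zero weight, then checks numerically that changing that observation's y-value leaves the prediction unchanged.

```python
import numpy as np

def prediction_weights(x, x0):
    """Weights w such that the OLS straight-line prediction at x0 equals w @ y."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    return 1.0 / x.size + (x0 - xbar) * (x - xbar) / sxx

def zero_weight_location(x, i):
    """Prediction point x0 at which observation i has zero weight.

    Obtained by setting the weight formula to zero and solving for x0.
    Undefined when x[i] equals the mean of x (division by zero).
    """
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    sxx = np.sum((x - xbar) ** 2)
    return xbar - sxx / (x.size * (x[i] - xbar))

# Hypothetical design points; only x is needed to find the zero-weight location.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x0 = zero_weight_location(x, 4)      # where the last observation drops out
w = prediction_weights(x, x0)

# Two data sets that differ only in the y-value of the last observation.
y_a = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y_b = y_a.copy()
y_b[4] = 100.0

# Fit each straight line by least squares and predict at x0.
pred_a = np.polyval(np.polyfit(x, y_a, 1), x0)
pred_b = np.polyval(np.polyfit(x, y_b, 1), x0)

print("x0 =", x0)
print("weights at x0 =", w)
print("predictions:", pred_a, pred_b)
```

Because the weights are computed from the x-values alone, the zero-weight location is known before any y-data are collected, which matches the claim that noncontributory data can be identified in advance.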
This dissertation provides a foundation for future work in sensitivity analysis and will help researchers better understand their data both before and after collection. Future research in this area should ultimately result in better predictions and should save researchers both time and money in their work.