1. NAME OF CANDIDATE: OKEWOLE D. M. O.

2. NAME OF SUPERVISOR: Dr. O.E. Olubusoye

3. YEAR OF COMPLETION: 22/01/2014

4. TITLE OF Ph.D THESIS: The Bayesian Approach to Estimation of Multi-Equation Econometric Models in The Presence of Multicollinearity

5. ABSTRACT

The Bayesian approach conveys information that is not available in the data but is based on prior knowledge of the subject matter, which enables one to make probability statements about the parameters of interest, whereas the classical approaches deal solely with the data. Several studies of the classical approaches have shown them to be sensitive to multicollinearity, a violation of one of the assumptions of multi-equation models which often plagues economic variables. Studies on the performance of the Bayesian method in this context are, however, limited. This study was aimed at investigating the performance of the Bayesian approach in estimating multi-equation models in the presence of multicollinearity.

The purely just-identified and over-identified multi-equation models were considered. In both cases, the normal distribution with zero mean and large variance served as a locally uniform prior for the regression coefficients. Three Bayesian Method Prior Variances (BMPV) were specified as 10, 100 and 1000 in a Monte Carlo prior variance sensitivity analysis. The Wishart distribution with zero degrees of freedom served as the prior distribution for the inverse of the error variance-covariance matrix, being its conjugate. The posterior distributions for the two models were then derived from the prior distributions and the likelihood functions as bivariate Student-t and generalised Student-t distributions respectively. The estimates were then compared with those from the classical estimators: Ordinary Least Squares (OLS), Two-Stage Least Squares (2SLS), Three-Stage Least Squares (3SLS) and Limited Information Maximum Likelihood (LIML). Samples of sizes T = 20, 40, 60 and 100 in 5000 replicates were generated based on eight specified research scenarios. The Mean Squared Error (MSE) of the estimates was computed and used as the evaluation criterion.
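
The Monte Carlo comparison described above can be illustrated with a much simplified sketch. It is not the thesis design: it assumes a single equation with two correlated exogenous variables, a known error variance, and a N(0, v·I) prior on the coefficients, so the posterior mean has a closed form. The true coefficients, the correlation `rho` and the function name `simulate_mse` are hypothetical choices made only for illustration.

```python
import numpy as np

def simulate_mse(T=20, rho=0.95, prior_var=10.0, reps=2000, seed=0):
    """Monte Carlo MSE of OLS and of a Bayesian posterior mean for one equation."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 0.5])               # hypothetical true coefficients
    sigma2 = 1.0                              # error variance, treated as known here
    cov = np.array([[1.0, rho], [rho, 1.0]])  # correlated exogenous variables
    se_ols = np.zeros(2)
    se_bayes = np.zeros(2)
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=T)
        y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=T)
        xtx, xty = X.T @ X, X.T @ y
        b_ols = np.linalg.solve(xtx, xty)
        # Posterior mean under beta ~ N(0, prior_var * I) with known sigma2
        b_bayes = np.linalg.solve(xtx + (sigma2 / prior_var) * np.eye(2), xty)
        se_ols += (b_ols - beta) ** 2
        se_bayes += (b_bayes - beta) ** 2
    return se_ols / reps, se_bayes / reps

# Prior variance sensitivity analysis over the three BMPV values
for v in (10.0, 100.0, 1000.0):
    mse_ols, mse_bayes = simulate_mse(prior_var=v)
    print(f"prior variance {v:6.0f}: OLS {mse_ols.round(4)}, Bayes {mse_bayes.round(4)}")
```

In this toy setting a smaller prior variance shrinks the estimates more strongly, which is one way to see why the choice among 10, 100 and 1000 can matter most in small samples with correlated regressors.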

The BMPV 10 produced the least MSE in the prior variance sensitivity analysis for the over-identified model, whereas for the just-identified model without multicollinearity, the MSE for BMPV 100 was the smallest. The Bayesian method was better than the classical estimators in the small sample cases (T ≤ 40) for β (the coefficient of the exogenous variable in the just-identified model): when T = 20, the MSEs for BMPV 10, 100 and 1000 were 0.169, 0.168 and 0.171 respectively, whereas OLS, 2SLS, 3SLS and LIML yielded the same result, 0.244; when T = 40, the MSEs for BMPV 10, 100 and 1000 were 0.1220, 0.1272 and 0.1361 respectively, and 0.1262 for the classical methods. The 2SLS and 3SLS estimates of γ (the coefficient of the endogenous explanatory variable), which were the same in the over-identified model, had smaller MSE than the Bayesian method: when T = 20, the MSE for 2SLS/3SLS was 0.0280, whereas it was 0.0286 for BMPV 10, 0.0300 for BMPV 100 and 0.033 for BMPV 1000. The Bayesian method was less sensitive to multicollinearity in estimating the coefficients of the correlated exogenous variables: the MSEs (T = 20) for BMPV 10, 100 and 1000 were 0.4529, 0.5220 and 0.5290 respectively, while the MSE was 0.7492 for the classical estimators. The MSE of LIML (0.0036) was similar to that of BMPV 100 (0.0036) and BMPV 1000 (0.0036) in the large sample case (T = 100) for γ.

The Bayesian approach was suitable for estimating the parameters of exogenous variables in the small sample cases when the model is purely just-identified, and in the over-identified model in the presence of multicollinearity.

Keywords: Bayesian approach, Prior distribution, Multicollinearity, Mean squared error.

Word Count: 498

1. NAME OF CANDIDATE: UDOMBOSO C. G.

2. NAME OF SUPERVISOR: Prof. G.N. Amahia / Prof. I. K. Dontwi

3. YEAR OF COMPLETION: 03/02/2014

4. TITLE OF Ph.D THESIS: On the Level of Precision of an Heterogeneous Statistical Neural Network Model

5. ABSTRACT

The multi-layer perceptron is a type of Statistical Neural Network (SNN) model that is more precise than the linear regression model. However, it fails to attain high precision due to the use of homogeneous Transfer Functions (TFs) which do not appropriately link the input layer to the output layer. Therefore, an alternative SNN model using heterogeneous TFs to overcome the limitations of homogeneous TFs was developed. An Adjusted Network Information Criterion (ANIC) for testing the adequacy of the SNN models was also derived.

An Heterogeneous SNN (HETSNN) model was derived by the convolution of two Homogeneous SNN (HOMSNN) models: $y_1 = \alpha X + \sum_{h=1}^{H} \beta_h g_1\left(\sum_{i=0}^{I} \gamma_{hi} x_i\right) + e_i$ and $y_2 = \alpha X + \sum_{h=1}^{H} \beta_h g_2\left(\sum_{i=0}^{I} \gamma_{hi} x_i\right) + e_i$, where $y_1$ and $y_2$ are the dependent variables, $X$ is a matrix of independent variables, $\alpha$, $\beta$ and $\gamma$ are the parameters of the network, $e_i$ is the noise, normally distributed with mean 0 and variance $\sigma^2$ ($e_i \sim N(0, \sigma^2)$), $g_1(\cdot)$ and $g_2(\cdot)$ are the transfer functions, $h = 1, 2, \ldots, H$ indexes the hidden units, and $i = 0, 1, \ldots, I$ indexes the input units. One TF was used in the HOMSNN model while a convolution of two TFs was used to derive the HETSNN. Two sets of meteorological data [Amravati Hydrology Project at Manasgaon station, India from 1990 to 2004 and Nigeria Meteorological (NIMET) station, Ibadan from 1971 to 2004] were used to investigate the fit of the derived model. Thirteen sub-samples of sizes 10, 20, 40, 60, 80, 100, 125, 150, 175, 200, 250, 300 and 400, generated from the NIMET data, were used to investigate the asymptotic behaviour of the models and ANIC using the Kolmogorov-Smirnov one-sample normality test. The Network Information Criterion (NIC) and the derived ANIC, using Kullback’s symmetric divergence, were computed for determining the adequacy of the two models. Variance-ratio tests were carried out on the significance of the HETSNN model.

The derived HETSNN model was $y = \alpha X + \sum_{h=1}^{H} \beta_h \left[ g_1\left(\sum_{i=0}^{I} \gamma_{hi} x_i\right) g_2\left(\sum_{i=0}^{I} \gamma_{hi} x_i\right) \right] + e_i e_j$, where $e_i e_j \sim N(0, \sigma^2)$. The HETSNN approached zero asymptotically faster than HOMSNN. The asymptotic behaviour of the models showed that HETSNN improved steadily over HOMSNN at an average rate of 3.0% to 15.0%. The rates of model adequacy for HETSNN and HOMSNN using NIC were 72.9% and 58.1% respectively. Consequently, using ANIC, the rates were 67.7% and 65.0% respectively. This indicated that ANIC was an unbiased information criterion for the two models. The ANIC decayed to zero much slower than NIC with increasing sample size. The asymptotic behaviour of the ANIC also showed that as the sample size increased, ANIC approximated the standard normal distribution, N(0,1). There was a significant difference (p
The heterogeneous statistical neural network model linked the input layer to the output layer more appropriately. The derived adjusted network information criterion performed better in model selection when compared with the network information criterion.
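
A minimal sketch of the noise-free part of the two model forms may help fix ideas. It reads the bracketed term of the derived HETSNN model as the product of the two transfer functions applied to the same hidden-unit input. The choice of tanh and logistic as g_1 and g_2, the random parameter values and the function name `snn_predict` are assumptions for illustration only and are not taken from the thesis.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def snn_predict(X, alpha, beta, gamma, g1, g2=None):
    """Noise-free network output for inputs X of shape (n, I + 1).

    Column 0 of X is assumed to be the constant input x_0 = 1.
    With g2=None the model is homogeneous (one transfer function);
    otherwise each hidden unit uses the product g1(.) * g2(.).
    """
    hidden_input = X @ gamma.T              # (n, H): sum_i gamma_hi * x_i
    hidden = g1(hidden_input)
    if g2 is not None:
        hidden = hidden * g2(hidden_input)  # heterogeneous transfer function
    return X @ alpha + hidden @ beta        # linear part plus hidden-layer part

rng = np.random.default_rng(0)
n, n_inputs, H = 5, 3, 4                    # hypothetical sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, n_inputs))])
alpha = rng.normal(size=n_inputs + 1)
beta = rng.normal(size=H)
gamma = rng.normal(size=(H, n_inputs + 1))

y_hom = snn_predict(X, alpha, beta, gamma, np.tanh)            # HOMSNN form
y_het = snn_predict(X, alpha, beta, gamma, np.tanh, logistic)  # HETSNN form
print(y_hom.shape, y_het.shape)
```

Setting the second transfer function to a constant 1 recovers the homogeneous form, which is a convenient sanity check on any implementation.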

Keywords: Transfer functions, heterogeneous statistical neural network, adjusted network information criterion.

Word count: 458

1. NAME OF CANDIDATE: KAYODE J. A.

2. NAME OF SUPERVISOR: Prof. G.N. Amahia

3. YEAR OF COMPLETION: 30/09/2014

4. TITLE OF Ph.D THESIS: Modelling Response Propensities in Household Surveys in Ibadan Metropolis, Oyo State

5. ABSTRACT

Response propensity is a respondent’s tendency to willingly answer survey questions. Propensity models have been used to study low response resulting from non-response, and to reduce non-response bias. These propensity models considered only the effects of main characteristics on response propensities, ignoring their interaction effects. Therefore, this study was designed to adapt and validate response propensity models with main and interaction effects in household surveys.

A two-stage stratified sampling scheme was used to select 400 households in Ibadan metropolis, using the master sample list of the National Integrated Survey of Households prepared by the National Bureau of Statistics as the sampling frame. Households in both urban and rural areas were interviewed in five waves within a period of fifteen months (January 2011 – March 2012). An interviewer-administered questionnaire was used to collect data on household characteristics and response propensities: age, sex, household size, educational level, employment status, nature of employment, location, marital status, religion, literacy level, and household income and expenditure. Household characteristics were analysed using summary statistics. Multi-way contingency tables were constructed to investigate the presence of relationships and dependence structures among the characteristics under consideration.

The characteristics that exhibited significant relationships and dependence structures were used in constructing a propensity model with main and interaction effects. The household characteristics were used to construct a Response Surface Polynomial Model (RSPM). The RSPM was subjected to canonical analysis to characterise its turning point, which was used to determine the saddle point, minimum point and maximum point respectively.
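
The canonical analysis step can be sketched for a generic second-order surface y = b0 + x'b + x'Bx with symmetric B: the stationary point is x_s = -B^{-1}b/2, and the signs of the eigenvalues of B classify it as a maximum, a minimum or a saddle point. The coefficients below are hypothetical values for two coded characteristics, not estimates from the study, and the function name `canonical_analysis` is introduced here only for illustration.

```python
import numpy as np

def canonical_analysis(b, B):
    """Stationary point and its nature for y = b0 + x'b + x'Bx (B symmetric)."""
    x_s = -0.5 * np.linalg.solve(B, b)      # solves the gradient equation b + 2Bx = 0
    eigvals = np.linalg.eigvalsh(B)         # eigenvalues of the symmetric matrix B
    if np.all(eigvals < 0):
        kind = "maximum"
    elif np.all(eigvals > 0):
        kind = "minimum"
    else:
        kind = "saddle"                     # mixed signs (zero eigenvalues not treated separately)
    return x_s, eigvals, kind

# Hypothetical first-order and second-order coefficients
b = np.array([2.0, 1.0])
B = np.array([[-1.0, 0.25],
              [0.25, -0.5]])
x_s, eigvals, kind = canonical_analysis(b, B)
print(x_s, eigvals, kind)
```

With these illustrative coefficients both eigenvalues are negative, so the stationary point is classified as a maximum.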

The average household sizes in rural and urban areas were 6.2 ± 0.8 and 5.8 ± 0.4 respectively. The proportion of households headed by women (a measure of parity) was 0.28 in rural and 0.21 in urban areas. The disparities in the response rates in the urban waves were insignificant (p > 0.06) while they were significant (p
Household size, employment status and their interactions were found to play significant roles in obtaining optimum response in household surveys in Ibadan metropolis. A policy that promotes employment as well as encourages six persons per household is recommended as this would greatly enhance response propensities in household surveys.

Keywords: Response rate, Non-response bias, Propensity model, Interaction effects, Canonical analysis.

Word count: 453

1. NAME OF CANDIDATE: BOLARINWA I. A

2. NAME OF SUPERVISOR: Dr. O.E. Olubusoye

3. YEAR OF COMPLETION: 27/11/2014

4. TITLE OF Ph.D THESIS: On the Sensitivity of Binary Choice Models to Misspecified Tolerance

5. ABSTRACT

Binary choice parametric models require some stringent assumptions for validity. The most critical of the assumptions is the distribution of the tolerance. However, the sensitivity of some binary choice models to departure from the tolerance distribution is a major gap in the literature. Hence, this research was designed to assess the sensitivity of some binary choice models to misspecified tolerance.

The latent model under consideration was the one-way fixed effects model $y^*_{it} = \beta x_{it} + \mu_i + v_{it}$, with $\mu_i$ representing individual heterogeneity and $v_{it}$ the remainder disturbance; the mapping from the latent $y^*_{it}$ to the observed $y_{it}$ was $y_{it} = 1(y^*_{it} > 0)$. Two sets of experiments were involved. The first experiment adopted methods similar to those used in Heckman’s method in its simulation of the discrete response variable. The exogenous variable $x_{it}$ was generated using a method similar to Nerlove’s, in which a random variable $\varepsilon_{it}$, uniformly distributed on the interval [-5, 5], was used to simulate the model $x_{it} = 0.1t + 0.5x_{i,t-1} + \varepsilon_{it}$. The disturbance $v_{it}$ was generated as standard Cauchy; gamma with parameters k = 1 and θ = 1; lognormal with parameters m = 1 and s = 0.5; and standard normal. For the second experiment, $x_{it}$ was generated as […]. For the experiments, the number of individuals, N, was set purposely at 25, 50, 100, 150, 200, 250 and 300, while the number of time points, T, was set purposely at 5, 10, 15 and 20. Model parameters were estimated by the method of maximum likelihood. The two experiments were replicated 100, 500 and 1000 times, with Absolute Bias (AB), Variance and Root Mean Squared Error (RMSE) as performance criteria. A model was considered best when the AB of tolerance was minimum. The models involved were probit, logit, complementary log-log and gompertz.
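
The data generating process of the first experiment can be sketched as follows. Several details are assumptions made only for illustration because the abstract does not state them: the start-up value x_{i0} = 0, a standard normal draw for the individual effect, β = 1, the reading of k and θ as the gamma shape and scale, and the reading of m and s as the mean and standard deviation of the log for the lognormal. The function name `simulate_panel` is hypothetical.

```python
import numpy as np

def simulate_panel(N=25, T=5, beta=1.0, tolerance="cauchy", seed=0):
    """Simulate y_it = 1(beta * x_it + mu_i + v_it > 0) for one replication."""
    rng = np.random.default_rng(seed)
    draw = {
        "cauchy": lambda size: rng.standard_cauchy(size),
        "gamma": lambda size: rng.gamma(shape=1.0, scale=1.0, size=size),
        "lognormal": lambda size: rng.lognormal(mean=1.0, sigma=0.5, size=size),
        "normal": lambda size: rng.standard_normal(size),
    }[tolerance]
    x = np.zeros((N, T))
    x_prev = np.zeros(N)                      # assumed start-up value x_i0 = 0
    for t in range(1, T + 1):
        eps = rng.uniform(-5.0, 5.0, size=N)  # epsilon_it ~ U[-5, 5]
        x_prev = 0.1 * t + 0.5 * x_prev + eps
        x[:, t - 1] = x_prev
    mu = rng.standard_normal(N)               # assumed distribution of mu_i
    v = draw((N, T))                          # tolerance (remainder disturbance)
    y_star = beta * x + mu[:, None] + v       # latent response
    return x, (y_star > 0).astype(int)        # observed binary response

x, y = simulate_panel(N=25, T=5, tolerance="gamma", seed=1)
print(x.shape, y.mean())
```

Each binary choice model would then be fitted to such data by maximum likelihood, and the AB, variance and RMSE computed over the replications.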

The first experiment revealed that the absolute biases of probit averaged 39.1, 66.6, 26.2 and 34.6% for gamma, Cauchy, lognormal and normal tolerance respectively. The average absolute biases for logit were 15.2, 35.0, 451.4 and 2649.1%. The values for complementary log-log were 21.3, 63.3, 7.4 and 77.8%. Absolute biases averaged 21.2, 54.6, 6.9 and 81.6% for the gompertz model. Maximum RMSE and variance were 440.4, 177217.2; 319.5, 92714.5; 138.2, 18231.2 and 323.7, 103511.5 for gamma, Cauchy, lognormal and normal respectively. For the second experiment, […] log-log and gompertz were respectively 89.2%, 4.1, 0.31; 82.4%, 118.5, 10945.0; 88.2%, 1.0, 0.2 and 87.3%, 0.9, 0.03. The AB increased with increase in autocorrelation. The two experiments revealed that variance typically diminished with increase in T.

The logit, complementary log-log and gompertz models were most sensitive to normal tolerance, while the probit model was most sensitive to Cauchy. The binary choice models are at their best when the correct tolerance is specified, having proved to be quite sensitive to misspecified tolerance.

Keywords: Panel data, Link function, Latent variable, Autocorrelation

Word count: 451

1. NAME OF CANDIDATE: AYOOLA F. J

2. NAME OF SUPERVISOR: Dr. O.E. Olubusoye

3. YEAR OF COMPLETION: 28/11/2014

4. TITLE OF Ph.D THESIS: On the Performance of Eight Panel Data Estimators in the Presence of Exponential Heteroscedasticity Structure

5. ABSTRACT

Panel data (PD) refers to the pooling of cross-section observations over several time periods. The PD model of interest is one in which the error is decomposed into an Individual Effect (IE) and a Remainder Effect (RE). The IE represents the combined effect of omitted variables peculiar to the individual, while the RE represents all that is not accounted for by the model. Of central concern is the consequence of differences in the conditional variance (heteroscedasticity), which is usually assumed to be constant (homoscedasticity) across time and cross-sectional units. The presence of heteroscedasticity in PD econometrics results in misleading inferences. Most research has focused on the Linear Heteroscedasticity Structure (LHS) and the Quadratic Heteroscedasticity Structure (QHS), but little is known about the Exponential Heteroscedasticity Structure (EHS). This study was aimed at assessing the performance of eight PD estimators in the presence of EHS on individual and remainder effects.

The error components model investigated was […], where […] denote IE and RE. Two experiments were designed following the Roy and Li approach. For the first experiment, EHS was introduced by allowing IE to vary as […] while RE was distributed as […]. For the second experiment, EHS was introduced by allowing the RE to vary as […] and IE was distributed as […], where […] is the individual mean of […] and […] are variants of EHS. For meaningful comparison across different data generating processes, values 4 and 8 were assigned to […] respectively. Six cross-sectional units purposely selected at four time periods for four variants of EHS were investigated in both experiments. Three replicates of sizes 100, 1000 and 5000 were generated for the N and T combinations. The eight estimators purposely considered were Pooled Ordinary Least Squares (POLS), Between Group (BG), Within Group (WG), Amemiya (AM), Wallace and Hussain (WALHUS), Swamy and Arora (SWAR), Nerlove (NER) and Panel Generalized Least Squares (PGLS). The relative performances of these estimators were assessed using Absolute Bias (ABIAS) and Root Mean Squared Error (RMSE). The estimators were then ranked according to their performances.
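
The kind of experiment described above can be sketched for two of the eight estimators (POLS and WG). The variance functions are missing from this text, so the sketch assumes one common exponential form, Var(mu_i) = sigma_mu^2 · exp(lam · x̄_i), with a homoscedastic remainder effect; that form, the parameter values, the sample sizes and the function name `simulate_and_estimate` are all assumptions made for illustration, not the thesis design.

```python
import numpy as np

def simulate_and_estimate(N=50, T=5, beta=1.0, lam=1.0, seed=0):
    """One replication: return the pooled OLS and within-group slope estimates."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0, size=(N, T))
    x_bar = x.mean(axis=1)
    # Assumed exponential structure on the individual effect (sigma_mu^2 = 1)
    sd_mu = np.sqrt(1.0 * np.exp(lam * x_bar))
    mu = rng.normal(0.0, sd_mu)
    v = rng.normal(0.0, 1.0, size=(N, T))     # homoscedastic remainder effect
    y = 2.0 + beta * x + mu[:, None] + v
    # Pooled OLS slope: deviations from the overall means
    xc, yc = x - x.mean(), y - y.mean()
    b_pols = (xc * yc).sum() / (xc ** 2).sum()
    # Within-group slope: deviations from the individual means
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    b_wg = (xw * yw).sum() / (xw ** 2).sum()
    return b_pols, b_wg

# ABIAS and RMSE over 200 replications (true beta = 1.0)
est = np.array([simulate_and_estimate(seed=s) for s in range(200)])
abias = np.abs(est.mean(axis=0) - 1.0)
rmse = np.sqrt(((est - 1.0) ** 2).mean(axis=0))
print("ABIAS (POLS, WG):", abias.round(4))
print("RMSE  (POLS, WG):", rmse.round(4))
```

The same loop extends to the other estimators and to heteroscedasticity placed on the remainder effect; ranking then follows from the ABIAS and RMSE columns.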

The performance of estimators in the presence of EHS on individual and remainder effects in the first experiment when […], at different levels of EHS variants and replications, showed that PGLS had minimum ABIAS and RMSE. The POLS had maximum ABIAS and RMSE. Also, in the second experiment, when […], PGLS performed better than other estimators while POLS performed poorly. Similarly, when […], in the first experiment, PGLS performed better than other estimators with minimum ABIAS and RMSE while WG had maximum ABIAS and RMSE. When […], in the second experiment, PGLS still performed better than other estimators and the POLS performed poorly. Generally, the performance improved as the combinations of N and T increased in both experiments. The ranking of the eight estimators for the two experiments is in the order PGLS (95%), SWAR (69%), NER (64%), WG (45%), AM (43%), WALHUS (37%), BG (36%) and POLS (28%).

Panel generalised least squares estimator performed better than other estimators for the exponential heteroscedasticity structure. This will help in the choice of estimators in empirical work where datasets exhibit exponential heteroscedasticity.

Keywords: Panel data, Exponential heteroscedasticity structures, Error component model, Individual and remainder effects

Word count: 473

1. NAME OF CANDIDATE: KORTER G. O.

2. NAME OF SUPERVISOR: Dr. O.E. Olubusoye

3. YEAR OF COMPLETION: 19/02/2015

4. TITLE OF Ph.D THESIS: Modelling Road Traffic Crashes using Modified Spatial Autoregressive Model

5. ABSTRACT

Spatial contiguity of locations is a major factor in modelling Road Traffic Crashes (RTC) to enable a proper understanding of Spatial Dependency (SD) in RTC occurrences in order to maximise information for effective delivery of road safety remedial measures. The Spatial Autoregressive (SAR) model often used in modelling RTC accounts for SD only in the Endogenous Variable (EV). This method ignores SD in the Disturbance Term (DT) and Instrumental Variable (IV), which does not allow complete and consistent estimation when modelling RTC. However, for RTC to be fully modelled, SD in the DT and IV should be captured. The objective of this study was to modify the SAR model to include SD in the DT and the IV.

The SAR models, which included the Spatial Autoregressive with Spatial Autoregressive Disturbance (SAR-SARD) and the Spatial Autoregressive with Spatial Autoregressive Disturbance with Instrumental Variable (SAR-SARD-IV) models, were used to model RTC. Data were obtained from the Federal Road Safety Commission, Oyo State, Nigeria. The longitude and latitude records for RTC (2012) were taken to ascertain the locations and frequency within each of the 33 Local Government Areas (LGAs). A 33 × 33 weights matrix, travel density, land area and major road length of each LGA were used as exogenous variables, and population was the IV. The estimates of the parameters were obtained using Maximum Likelihood for the SAR and SAR-SARD models and Generalised Spatial Two-Stage Least Squares for the SAR-SARD-IV model. The spatial dependence, λ, for RTC was estimated for the SAR, SAR-SARD and SAR-SARD-IV models, while the spatial error dependence, ρ, was estimated for the SAR-SARD and SAR-SARD-IV models. Moran's Index (M) and the Getis and Ord statistic (G) were used to identify spatial pattern, concentration levels and hotspots for RTC.
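
As an illustration of the spatial pattern statistic mentioned above, global Moran's I can be computed directly from a vector of values and a weights matrix as I = (n / S0) · (z'Wz) / (z'z), where z holds deviations from the mean and S0 is the sum of all weights. The four-area contiguity matrix and crash counts below are hypothetical toy values, not the Oyo State data.

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for a vector of values and a spatial weights matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                  # deviations from the mean
    n = z.size
    s0 = W.sum()                      # sum of all spatial weights
    return (n / s0) * (z @ W @ z) / (z @ z)

# Four hypothetical areas on a line; neighbours share a border
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
crashes = [1, 2, 3, 4]
print(round(morans_i(crashes, W), 4))  # 0.3333
```

Positive values indicate that similar counts cluster in contiguous areas, while values near -1 indicate an alternating pattern.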

The derived models were […] and […], where […] is the vector of observations; […] and […] are weighting matrices; […] is a matrix of observations; […] is a vector of unknown coefficients and […] is the coefficient of EV. The coefficient estimates for population, travel density, land area and major road length were [0.63; 0.42]; [−0.07; 0.01]; [−0.10; −0.18] and [−0.25; −0.10] for the SAR-SARD and SAR-SARD-IV models. The LGAs with larger populations had more RTC; traffic generated was associated with more RTC; decrease in area of administration led to less RTC and the existence of a freeway link across a LGA led to reduction in RTC. The estimated spatial dependence, λ, was 0.37 (p = 0.06); 0.71 (p = 0.001) and 1.21 (p = 0.001) for the SAR, SAR-SARD and SAR-SARD-IV models respectively, which confirmed the existence of SD.

The frequency of RTC in a LGA was strongly related to the frequency of RTC across LGAs that are contiguous. The estimated spatial error dependence values were −0.45 (p = 0.050) and −1.18 (p = 0.018) for the SAR-SARD and SAR-SARD-IV models. This suggested that an exogenous shock to one LGA caused moderate changes to RTC in contiguous LGAs. The estimated M and G statistics were 0.19 (p = 0.01) and 0.36 (p = 0.01). The spatial pattern was clustered, indicating strong SD. The highest concentrations and hotspots for RTC were in Egbeda, Oluyole and Akinyele LGAs.

The spatial autoregressive with spatial autoregressive disturbance and the spatial autoregressive with spatial autoregressive disturbance with instrumental variable models are more informative and appropriate for modelling road traffic crashes than the spatial autoregressive model.

Keywords: Spatial modelling, Maximum likelihood estimate, Spatial dependence, Road traffic crashes

Word count: 495

1. NAME OF CANDIDATE: OLUBIYI A. O.

2. NAME OF SUPERVISOR: Dr. O.E. Olubusoye

3. YEAR OF COMPLETION: 05/03/2015

4. TITLE OF Ph.D THESIS: Geoadditive Bayesian Model for Data with Limited Spatial Information

5. ABSTRACT

Large area estimation has been mostly accomplished using Geoadditive Models (GM), which combine the ideas of geostatistics and additive models. The GM relaxes the classical assumptions of the traditional parametric model by simultaneously incorporating linear and nonlinear, nonparametric effects of covariates, nonlinear interactions and spatial effects into a geoadditive predictor. In the past, estimation of GM has been based on large areas as a result of insufficient information in small areas. However, the Bayesian approach allows out-of-sample information which can be used to augment the limited information in small areas. Hence, this study adopted the Geoadditive Bayesian model to estimate small areas with insufficient spatial information, focusing on small district areas.

The GM by Kammann and Wand was specified using Effect Coding (EC) to capture the spatial effect. The posterior was obtained by combining the likelihood (data) with the prior (out-of-sample) information. The likelihood and the prior information were assumed to follow Gaussian and inverse gamma distributions respectively. Numerical solutions were obtained for the posterior distribution, which had no closed-form solution, using the Markov Chain Monte Carlo (MCMC) simulation technique. Finite difference and partial derivative methods were used to estimate other components of the Geoadditive Bayesian model. A Kane analyser was used to collect vehicular emissions (carbon dioxide, carbon monoxide and hydrocarbon). Information was also collected on age of vehicles, vehicle types (cars and buses) and vehicle uses (private and commercial) from 9211 vehicles for 3 years (2008-2011) covering 4 locations: Abeokuta, Sagamu, Ijebu-Ode and Sango-Ota. Data were also collected on the respiratory health records of 9211 individuals (18 years and below) in six different hospitals on number of visits (nv) and diagnosis within the locality of the collection point of pollutants. Exploratory Data Analysis (EDA) was carried out on emitted pollutants and age of vehicles. An autocorrelation plot was used to determine model performance.
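
The idea of sampling a posterior built from a Gaussian likelihood and an inverse gamma prior can be shown with a deliberately small Gibbs sampler. This toy model (a single mean with a flat prior and a variance with an IG(a, b) prior) is far simpler than the geoadditive model of the thesis; the hyperparameters, the simulated data and the function name `gibbs_normal` are assumptions for illustration only.

```python
import numpy as np

def gibbs_normal(y, a=2.0, b=2.0, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for y_i ~ N(mu, sigma2), flat prior on mu, sigma2 ~ IG(a, b)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, y_bar = y.size, y.mean()
    mu, sigma2 = y_bar, y.var()               # starting values
    draws = np.empty((n_iter, 2))
    for k in range(n_iter):
        # mu | sigma2, y ~ N(y_bar, sigma2 / n)
        mu = rng.normal(y_bar, np.sqrt(sigma2 / n))
        # sigma2 | mu, y ~ IG(a + n/2, b + 0.5 * sum((y - mu)^2))
        rate = b + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(shape=a + n / 2.0, scale=1.0 / rate)
        draws[k] = mu, sigma2
    kept = draws[burn_in:]                    # discard the burn-in draws
    return kept.mean(axis=0), kept.std(axis=0)

# Hypothetical data: 200 observations with mean 2.0 and standard deviation 1.5
data = np.random.default_rng(42).normal(2.0, 1.5, size=200)
post_mean, post_sd = gibbs_normal(data)
print("posterior means (mu, sigma2):", post_mean.round(3))
print("posterior sds   (mu, sigma2):", post_sd.round(3))
```

The retained draws give posterior means and standard errors of the kind reported from MCMC output, and their autocorrelation at increasing lags is what an autocorrelation plot of the chain would display.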

The Geoadditive Bayesian model was $\exp\left(g_0(t) + \frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{1}{2}(\beta_j)^2} \sum_{j=1}^{p} z_{ij} + \frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{1}{2}(\beta_j)^2} + \frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{1}{2}(\beta^{spat})^2} + \frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{1}{2}(\beta_{gi})^2}\right) \cdot \exp\int_0^{t_i} \exp\left(g_0(u) + \sum_{j=1}^{p} g_j(u) z_{ij}\right) du$, where $z_{ij}$, $g_j$, $\beta^{spat}$ and $\beta_j$ were the non-linear time-varying effect, linear time-varying effect, spatial effect and random component, respectively. The MCMC simulation technique gave the posterior means and the standard errors. This revealed that nv, diagnosis, vehicle uses and vehicle types jointly determined the health effect of pollutants on the individuals considered. Compared with Abeokuta, individuals who lived in Sagamu (posterior mean = 0.036) were more likely to be affected by emitted pollutants, while those in Sango-Ota (posterior mean = -0.002) and Ijebu-Ode (posterior mean = -0.015) were less likely to be affected. The EDA indicated non-linearity in the pollutants and age of vehicles. The parameters converged at lag 250. A significant increase in the nonlinear effects was observed for age of vehicle (5 – 12 years), carbon dioxide (10100 – 14400 ppm), carbon monoxide (0 – 25000 ppm) and hydrocarbon (4953 – 19812 ppm).

The derived Geoadditive Bayesian Model was found suitable and therefore recommended for estimating location effect of small areas with limited spatial information.

Keywords: Geoadditive Bayesian Model, Autocorrelation plot, Spatial Information.

Word Count: 467

1. NAME OF CANDIDATE: YAKUBU YISA

2. NAME OF SUPERVISOR: Dr. Angela U. Chukwu/Prof. G.N. Amahia

3. YEAR OF COMPLETION: 09/03/2015

4. TITLE OF Ph.D THESIS: Robustness of Split-Plot Response Surface Designs in the Presence of Missing Observations

5. ABSTRACT

Robustness of an experimental design in the presence of a missing observation is a measure of insensitivity of the design to the missing observation. In most experimental designs, situations often arise where some observations are missing due to some unforeseen factors. In such situations, some properties like optimality, orthogonality and rotatability, which are performance criteria of a design, are destroyed. In terms of these criteria, Completely Randomized Response Surface Designs (CRRSDs) that are robust to missing observations have been extensively studied. However, CRRSDs become inadequate in most industrial experimental situations where some factor levels are difficult to change or control. An alternative in such situations is the split-plot design approach, which has received less attention on robustness to missing observations. This study was therefore aimed at developing Split-Plot Response Surface Designs (SPRSDs) that are robust to missing observations.

Two configurations of SPRSDs, Full Factorial Replicate Designs (FFRDs) and Half Factorial Replicate Designs (HFRDs), were considered. For these configurations, a criterion which minimises the maximum loss in information due to missing observations in SPRSDs was derived by an extension of the existing criterion for CRRSDs. Second-order SPRSDs were used in establishing the robustness of the derived criterion. Based on this criterion, losses in the design’s information due to missing a single observation of a factorial point ($L_f$), whole-plot axial point ($L_\alpha$), subplot axial point ($L_\beta$) and center point ($L_c$) were investigated for each configuration, at various distances ($\rho$) of the axial point from the design center. Furthermore, losses $L_{ff}$, $L_{\alpha\alpha}$, $L_{\beta\beta}$ and $L_{cc}$, due to missing pairs of observations, were also investigated. The efficiency of the reduced FFRDs due to these missing pairs was investigated in terms of the optimality criteria trace (A), maximum prediction variance (G) and integrated prediction variance (V), for different correlation ratios (d): 0.5, 1.0, 5.0 and 10.0.

The derived criterion for the two configurations was $L_m = 1 - \frac{|X' V^{-1} X|_m}{|X' V^{-1} X|}$, where $L_m$ is the loss due to m missing observations, $X$ is the model matrix, $|X' V^{-1} X|$ and $|X' V^{-1} X|_m$ are the determinants of the full and the reduced information matrices respectively, and $V$ is the error structure. When an observation was missing, the losses $L_\alpha$ and $L_c$ were not significant in the two configurations, which indicated a low effect of the corresponding missing points. In FFRD, $L_f$ = 0.432, 0.431, 0.430, 0.430 and $L_\beta$ = 0.430, 0.431, 0.432, 0.435, for $\rho$ = 0.81, 0.82, 0.85, 0.90 respectively. In HFRD, $L_f$ = 0.881, 0.866, 0.843, 0.808 and $L_\beta$ = 0.626, 0.661, 0.720, 0.808, when $\rho$ = 2.23, 2.50, 3.00, 4.07 respectively. The maximum losses were 0.432, 0.431, 0.432, 0.435 and 0.881, 0.866, 0.843, 0.808 for the two configurations respectively. These designs were robust to one missing point at $L_f = L_\beta$ = 0.431 for $\rho$ = 0.82 and $L_f = L_\beta$ = 0.808 when $\rho$ = 4.07 respectively, which are the minimum of all the maximum losses. When a pair of observations was missing, $L_{\alpha\alpha}$ and $L_{cc}$ were also not significant in the two configurations. In FFRD, $L_{ff}$ = 0.706, 0.704, 0.699, 0.693 and $L_{\beta\beta}$ = 0.697, 0.704, 0.717, 0.738, for $\rho$ = 1.25, 1.33, 1.50, 1.75 respectively. In HFRD, $L_{ff}$ = 0.968, 0.962, 0.957, 0.953 and $L_{\beta\beta}$ = 0.944, 0.962, 0.972, 0.979 when $\rho$ = 3.50, 4.04, 4.50, 5.00 respectively. The respective maximum losses were 0.706, 0.704, 0.717, 0.738 and 0.967, 0.962, 0.972, 0.979. These designs were robust to a missing pair at $L_{ff} = L_{\beta\beta}$ = 0.704 for $\rho$ = 1.33 and $L_{ff} = L_{\beta\beta}$ = 0.962 when $\rho$ = 4.04 respectively. Maximum efficiency losses due to missing ff, αα, ββ, cc were observed only at d = 0.5; these were 19.1, 0.9, 10.6, 15.7%; 10.1, 0.1, 16.1, 0.1%; 0.1, 0.1, 1.1, 0.2% for A, G, V respectively. As d increased, the losses became insignificant.
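
The loss criterion can be evaluated numerically by deleting the rows of X, and the matching rows and columns of V, that correspond to the missing observations. The sketch below uses a small toy design (four whole plots with two subplot runs each and a first-order model with interaction) rather than the second-order designs of the thesis, and it assumes the error structure V = I + d·ZZ', with d read as the ratio of whole-plot to subplot error variance. The design, that reading of d and the function names are assumptions for illustration.

```python
import numpy as np

def information_det(X, V):
    """Determinant of the information matrix X' V^{-1} X."""
    return np.linalg.det(X.T @ np.linalg.solve(V, X))

def loss_missing(X, V, missing):
    """L_m = 1 - |X'V^{-1}X|_m / |X'V^{-1}X| for the rows listed in `missing`."""
    keep = np.setdiff1d(np.arange(X.shape[0]), missing)
    reduced = information_det(X[keep], V[np.ix_(keep, keep)])
    return 1.0 - reduced / information_det(X, V)

# Hypothetical toy design: 4 whole plots, 2 subplot runs each
w = np.repeat([-1.0, 1.0, -1.0, 1.0], 2)        # hard-to-change (whole-plot) factor
s = np.tile([-1.0, 1.0], 4)                     # easy-to-change (subplot) factor
X = np.column_stack([np.ones(8), w, s, w * s])  # first-order model with interaction
Z = np.kron(np.eye(4), np.ones((2, 1)))         # whole-plot indicator matrix

for d in (0.0, 0.5, 1.0, 5.0, 10.0):
    V = np.eye(8) + d * Z @ Z.T                 # assumed split-plot error structure
    print(d, round(loss_missing(X, V, [0]), 3), round(loss_missing(X, V, [0, 1]), 3))
```

A minimax comparison of designs would repeat this calculation for every type of missing point and choose the design setting whose largest loss is smallest.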

The robustness potentials of split-plot response surface designs when some observations are missing were established. The developed designs are good alternatives in any experimental situations when factor levels are difficult to change.

Keywords: Split-plot response surface designs, Losses due to missing observations, Correlation ratio

Word count: 499

1. NAME OF CANDIDATE: OBISESAN K

2. NAME OF SUPERVISOR: Prof. T.A. Bamiduro

3. YEAR OF COMPLETION: 24/03/2015

4. TITLE OF Ph.D THESIS: Modelling Multiple Changepoints Detection

5. ABSTRACT

Changepoints occur when the statistical properties of a series change abruptly. This can be applied to explain changes in environmental systems. As a result of climate change, the concerns mostly encountered at extreme points of change include flooding, water pollution and eutrophication. However, most existing works on multiple changepoints involve single changepoint detection, which takes a lot of time. The objectives of this study include proposing a multiple changepoints model for detecting changes simultaneously using rainfall series and, in addition, investigating the behaviour of some regular assumptions as a result of the changepoints problem. The model will also cater for change detection when the assumption of a single changepoint fails as a result of increase in sample size.

The data sets used consist of monthly rainfall for South-western Nigeria from 1948 to 2008. For comparison, the United Kingdom Blackwater rainfall data from 1908 to 2000 were also used to estimate multiple changepoints. Statistical simulation was used in investigating failure of assumptions when changepoints exist, such that given $x_1, \ldots, x_T$ there exists a changepoint $\delta \in \{1, \ldots, T-1\}$ and the statistical properties of $\{x_1, \ldots, x_\delta \mid f(x;\theta_0)\}$ and $\{x_{\delta+1}, \ldots, x_T \mid f(x;\theta_1)\}$ differ, where the densities $f(x;\theta_0) \neq f(x;\theta_1)$. The likelihood approach was used to develop a multiple changepoints model $\sum_{i=1}^{m+1} K\left(x_{(\delta_{i-1}+1):\delta_i}\right) + \beta f(m)$, where $\delta = \delta_1, \ldots, \delta_m$ and $l_\theta = \prod_{i=1}^{\delta_1} f(x_i;\theta_1), \ldots, \prod_{i=\delta_{m-1}+1}^{\delta_m} f(x_i;\theta_m)$, extending the case where $l_\theta = \sum_{i=1}^{\delta} \log f(x_i;\theta_0) + \sum_{i=\delta+1}^{T} \log f(x_i;\theta_1)$. These models were used to isolate positions of changes using the various rainfall series. The Bayesian approach was also used in estimating changepoints using Bayes factors, allowing imposition of a prior distribution on the changepoints. Extreme changes were evaluated using extreme value distributions.
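
The penalised cost above can be sketched for the simplest case of a change in mean. The sketch assumes a Gaussian model with constant variance, so that the segment cost K reduces to a residual sum of squares, and takes f(m) = m as the penalty function; both choices, the simulated series and the function names are assumptions made only for illustration.

```python
import numpy as np

def segment_cost(x):
    """Cost K for one segment: residual sum of squares about the segment mean."""
    return float(np.sum((x - x.mean()) ** 2))

def best_single_changepoint(x):
    """Return (delta, cost) minimising the two-segment cost over delta in 1..T-1."""
    T = len(x)
    costs = [segment_cost(x[:d]) + segment_cost(x[d:]) for d in range(1, T)]
    best = int(np.argmin(costs))
    return best + 1, costs[best]

def penalised_cost(x, changepoints, beta):
    """Sum of segment costs plus the penalty beta * m (f(m) = m is assumed)."""
    bounds = [0, *sorted(changepoints), len(x)]
    total = sum(segment_cost(x[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return total + beta * len(changepoints)

# Hypothetical series with one shift in mean after the 30th observation
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 30), rng.normal(5.0, 0.1, 30)])
delta, cost = best_single_changepoint(x)
print(delta, round(cost, 3))
print(penalised_cost(x, [], beta=1.0), penalised_cost(x, [delta], beta=1.0))
```

A multiple changepoint search minimises the same penalised cost over sets of changepoints rather than over a single position, so all changes are located in one optimisation instead of by repeated single-change tests.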

Statistical simulation revealed the failure of regular assumptions inherent in many existing methods, as the log-likelihood function indicated a step function. The likelihood function produced a new test statistic $l_\theta = -T \log\left\{\left[\sum_{i=1}^{\delta} (x_i - \theta_0)^2 + \sum_{i=\delta+1}^{T} (x_i - \theta_1)^2\right]^{-1} \sum_{i=1}^{T} (x_i - \theta)^2\right\}$, indicating the existence of changepoints at the 5 percent significance level. Fitting the multiple changepoint model to the south-western data revealed changes around 1958, 1972, 1980, 1999 and 2008. The extreme value distributions indicated return levels increasing over time, from 352.9 mm for a 10-year return period to 462.6 mm for a 100-year period, in an increasing trend. The rainfall data indicated time point 65 in 1973 as the changepoint, while the Bayes factor for change in mean level supported the frequentist method to validate the changepoint. The prior distribution used for the Bayes factor takes the scaled Inverse Chi-square as in Inv- (v, ) and ( | )= N( , ) such that ( ) = ( ) ( | ).

A multiple changepoints model was developed following criticisms of existing models arising from the collapse of assumptions underlying the latter. An increasing trend was detected in disastrous changepoints, which therefore provides an early warning mechanism of dangers whose solution means well for sustainable development.

Keywords: Changepoints, Likelihood function, Regular assumptions, Bayes factor, Extreme value

Word count: 480

1. NAME OF CANDIDATE: USMAN Baba Yahaya

2. NAME OF SUPERVISOR: Dr. O. I. Shittu

3. YEAR OF COMPLETION: 28/09/15

4. TITLE OF Ph.D THESIS: Development of Multihalver Techniques for Detecting Outliers in Time Series Data

5. ABSTRACT

Outliers are observations that lie outside the overall pattern of a distribution. The Multihalver technique (MT) had been used for outlier detection only in small, even-numbered samples under a certain criterion, and with serious masking effects. Little attention has been given to MT for detecting outliers in large samples with no discrimination between odd- and even-numbered data. This study was aimed at developing Modified Multihalver (MM) and Multihalver Distribution (MD) techniques that are capable of handling large or small samples with no discrimination between odd- and even-numbered data.

For the MM technique, the Gaussian quantiles $(q_i)$ were redefined in terms of $a_i = \lvert m_i - \bar{m}_{0.20} \rvert - \min(\ldots)$ … Here y is the observed series, μ is the mean, σ² is the variance and h is the optimal number of halvings. An adapted test statistic that followed the standard normal distribution based on MD was used to detect outliers at the 5% level of significance. Four data sets were used to validate the techniques: the monthly Sunspot number (SSN) for 1749-2013, obtained online, of size 248, and Body temperature (BT), Systolic Blood pressure (SBP) and Diastolic Blood pressure (DBP) data of size 187 each, collected from a private hospital. Three sets of simulated data (SD) of sizes 450, 950 and 1500, with randomly injected outliers of sizes 4, 9 and 14 respectively, were also used.

The MM, which employed MOD, detected 68 (28%) outlying values in SSN, 32 (17%) in BT, 56 (30%) in SBP and 34 (18.2%) in DBP. The MD and the adapted test statistic detected 55 (22%) outlying values in SSN, 54 (29%) in BT, and 42 (22.5%) each in SBP and DBP. The MD detected 3 (75%), 8 (89%) and 14 (100%) of the randomly injected outliers in the three simulated data sets, respectively. The rate of detection of outliers using MD increased with data size.

Modified Multihalver and Multihalver Distribution techniques detected outliers in small and large data sets without masking effects and with no restriction on even or odd numbered data. These new techniques are recommended for detection of outliers in small or large time series data with no restrictions to odd or even numbered data.

Keywords: Outliers, Multihalver techniques, Trimmed mean, Gaussian quantiles.

Word count: 442

1. NAME OF CANDIDATE: AIDEYAN Donald Osaro

2. NAME OF SUPERVISOR: Dr. O. I. Shittu

3. YEAR OF COMPLETION: 05/10/15

4. TITLE OF Ph.D THESIS: Wavelet Shrinkage Model for Detecting Aberrant Observations in Time Series

5. ABSTRACT

Aberrant observations (AOs) are data that are inconsistent with the rest of the series and tend to render statistical inference invalid. The spectral method is widely used for detecting AOs, but it is limited to series that are stationary and periodic. Wavelet shrinkage is an alternative to the spectral method that reduces the series into resolutions without losing the properties of the series. This study proposed Bayesian parametric and non-parametric wavelet shrinkage methods capable of detecting AOs in non-stationary and non-periodic time series at each resolution.

For the Bayesian parametric wavelet method, a Normal distribution was specified for the unobservable clean series. A Normal distribution was assumed for the AOs, $f_x \sim N(\mu_o, \sigma_o^2)$, which gave the Contaminated Normal distribution (CND) for the posterior. An Exponential distribution was also assumed for the AOs, and gave the Laplace distribution (LD) as another posterior, $\{ f_{y/x} \sim N(\mu_c, \sigma_o^2 + \sigma_n^2) \}$, where $\mu_o$ and $\sigma_o^2$ are the mean and variance of the aberrant series and $\sigma_n^2$ is the variance of the unobservable clean series. The likelihood estimates of these posteriors were used to detect AOs. In the non-parametric method, Tukey's lower and upper quartiles and interquartile range were computed at each resolution and were used to detect AOs. The University College Hospital, Ibadan diabetic data (UCHDD) and the Zadakat data (ZD), daily offerings in a local mosque in Ibadan, were used to validate these methods at α = 0.05 and 0.1. Three contaminated series, designated A, B and C, of sizes 512, 1024 and 2048 respectively, were also used for validation.
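The non-parametric step can be sketched as follows. The sketch assumes a Haar wavelet and the conventional 1.5 × IQR fences, neither of which is stated in the abstract, and the function names are illustrative only. At each resolution the detail coefficients are computed and those lying outside Tukey's fences are flagged.

```python
import math
import statistics


def haar_step(series):
    """One level of the Haar transform: returns (approximation, detail) coefficients."""
    approx, detail = [], []
    for left, right in zip(series[0::2], series[1::2]):
        approx.append((left + right) / math.sqrt(2))
        detail.append((left - right) / math.sqrt(2))
    return approx, detail


def tukey_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def flag_by_resolution(series, levels):
    """Map each resolution level to the indices of flagged detail coefficients."""
    flagged, current = {}, list(series)
    for level in range(1, levels + 1):
        current, detail = haar_step(current)
        flagged[level] = tukey_outliers(detail)
    return flagged
```

A flagged index at level j corresponds to a block of 2^j consecutive observations in the original series, which is how the same location can be tracked across resolutions.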

At α = 0.05, the Bayesian parametric wavelet shrinkage approach detected AOs at the first, second, third and fourth resolutions for UCHDD and at the first and second resolutions for ZD. In the simulated series, it detected AOs at the first, second, 75% of the third and 50% of the fourth resolutions for series A; at the first to fifth resolutions for series B; and at the first to seventh resolutions for series C. At α = 0.1, it detected AOs at the first to fourth resolutions for both UCHDD and ZD and, in the simulated series, at the first to fourth resolutions for series A, the first to fifth for series B and the first to seventh for series C. In the non-parametric approach, Tukey's method detected AOs at the first to fourth resolutions for UCHDD and at the first and second resolutions for ZD. In the simulated series, it detected AOs at the first to fourth resolutions for series A, the first to fifth for series B and the first to seventh for series C.

The Bayesian parametric and non-parametric wavelet shrinkage techniques were suitable for detecting aberrant observations in the same location at different resolutions, especially when the series is non-stationary and non-periodic. They are therefore recommended for use where stationarity and periodicity of the series are not required.

Keywords: Wavelet analysis, Contaminated Normal Distributions, Likelihood function, Resolution level.

Word count: 461

1. NAME OF CANDIDATE: OYAMAKIN Oluwafemi Samuel

2. NAME OF SUPERVISOR: Prof. T. A. Bamiduro / Dr. Angela U. Chukwu

3. YEAR OF COMPLETION: 14/10/15

4. TITLE OF Ph.D THESIS: Development of Alternative Nonlinear Growth Models for Biological Processes Based on Hyperbolic Sine Function

5. ABSTRACT

Growth modelling plays an important role in the study of biological processes. Studies have shown that the majority of growth models emanated from the Malthusian growth equation (MGE), which has the serious limitation of growing without bound. Variants of the MGE include the Richards, Gompertz and logistic equations, which have been extensively studied and applied to a wide range of biological studies but with restrictions on their inflexion points, even though growth often shows complex patterns. This study was aimed at developing alternative growth models by reparameterising the intrinsic rate of increase in the MGE and its variants using the hyperbolic sine function, which is more flexible, to enhance internal prediction and robustness in terms of the normality and independence assumptions of the models through valid comparison.

The intrinsic rate of increase in the MGE was modified by considering a growth equation that produces flexible asymmetric curves through nonlinear ordinary differential equations of the form $dH/dt = H\left[r + \theta/\sqrt{1+t^2}\right]$, where r is the intrinsic rate of increase, H is the response variable, t is the time variable and θ is the allometric constant based on the hyperbolic sine function. Both additive and multiplicative error structures were considered in the modelling process. Statistical tests of independence and normality of the error components were carried out. The Japanese quail (JQ), Malaysian oil palm (MOP) and Norwegian top height (NTH) published datasets, together with Gmelina arborea (GA), Pine height (PH) and Pine diameter at breast height (PDBH) data obtained from the records of the Forestry Research Institute of Nigeria, were used to test the validity of the new models in terms of general fitness and internal predictive status as well as robustness. Mean Square Error (MSE), Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used to determine the best models among the proposed and the existing models.
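The role of the hyperbolic sine function can be seen by solving the differential equation directly. Separating variables gives ln H = rt + θ·asinh(t) + constant, because the integral of 1/√(1+t²) is the inverse hyperbolic sine, so H(t) = H₀·exp(rt + θ·asinh(t)). The sketch below checks this closed form against a Runge-Kutta integration; the initial value `h0` and the function names are illustrative assumptions, and the thesis's fitted model forms are not reproduced here.

```python
import math


def hyperbolic_growth(t, h0, r, theta):
    """Closed-form solution of dH/dt = H[r + theta / sqrt(1 + t^2)] with H(0) = h0."""
    return h0 * math.exp(r * t + theta * math.asinh(t))


def rk4_growth(t_end, h0, r, theta, steps=2000):
    """Integrate the same equation numerically with the classical RK4 scheme."""
    def rhs(t, h):
        return h * (r + theta / math.sqrt(1.0 + t * t))

    dt = t_end / steps
    t, h = 0.0, h0
    for _ in range(steps):
        k1 = rhs(t, h)
        k2 = rhs(t + dt / 2, h + dt * k1 / 2)
        k3 = rhs(t + dt / 2, h + dt * k2 / 2)
        k4 = rhs(t + dt, h + dt * k3)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h
```

With θ = 0 the expression reduces to Malthusian exponential growth, which shows how θ adds flexibility to the growth rate at early times.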

The developed models are the Hyperbolic Exponential Growth Model (HEGM), Hyperbolic Monomolecular Growth Model (HMGM), Hyperbolic Gompertz Growth Model (HGGM), Hyperbolic Richards Growth Model (HRGM) and Hyperbolic von Bertalanffy Growth Model (HVBGM). HGGM was the best in JQ, MOP and PDBH, with R² (0.999, 0.95, 0.99), MSE (0.657, 3.552, 0.226), AIC (-0.8787, 27.581, -21.19) and BIC (-1.095, 31.359, -18.50), respectively. HEGM was the best in NTH, with R² 0.999, MSE 0.0152, AIC -39.008 and BIC -38.1. HMGM was the best in PH, with R² 0.95, MSE 1.143, AIC 5.72 and BIC 9.03. The nonparametric tests established independence and normality of the error terms, with p-value > 0.05 for all the proposed models. In the twenty-four scenarios considered for robustness, the proposed models outperformed the existing models in JQ, MOP and NTH with ratios 77/23, 88/12 and 90/10 using R²; 54/46, 54/46 and 50/50 using MSE; and 54/46, 50/50 and 46/54 using AIC as the model selection criterion, respectively.

All the newly developed hyperbolic growth models showed improved general fitness, robustness and predictive status over their existing source models. The developed models provide better alternatives for modelling the growth of biological processes.

Keywords: Growth models, Hyperbolic sine function, Top height, Diameter at breast height.

Word count: 490

1. NAME OF CANDIDATE: ADEPOJU Kazeem Adesola

2. NAME OF SUPERVISOR: Dr. O. I. Shittu / Dr. Angela U. Chukwu

3. YEAR OF COMPLETION: 04/03/16

4. TITLE OF Ph.D THESIS: Robustness of Exponentiated F Distribution in One-way ANOVA to Outlying Observations

5. ABSTRACT

The Classical F (CF) test relies on three major underlying assumptions, namely non-overlapping of the populations, constant variance and absence of outliers. These assumptions are often violated because of the presence of outliers in observations. Previous attempts to address these violations resulted in inflation of the type I error, because the conventional F table is still being used by researchers in decision making. This study was therefore designed to develop a modified robust Exponentiated F test and to generate its corresponding modified statistical table for decision making in one-way analysis of variance (ANOVA) tests in the presence of outliers.

The CF distribution was redefined by introducing one shape parameter, c, using the exponentiated link function applied to the density and distribution functions of the F distribution. The statistical properties, namely the distribution function, moment generating function, moments and order statistics, were derived for the modified robust Exponentiated F distribution. Data were simulated using a Monte Carlo algorithm for balanced and unbalanced experimental designs with numbers of treatments k = 3 and k = 5 at varying replications (2-15). Varying degrees of outliers were introduced randomly. The percentages of type I errors were calculated for the CF, the existing robust F-tests, namely Welch, Scott-Smith, Brown-Forsythe (BF), Kenward-Roger (KR) and Parametric Bootstrap (PB), and the proposed modified robust Exponentiated F test (FEXPO). The FEXPO was also used to develop a modified statistical table of critical values for the test for different values of the shape parameter and varying degrees of freedom.
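For orientation, the exponentiated family is commonly written as below, where f and F are the baseline density and distribution function (here those of the F distribution) and c > 0 is the added shape parameter. That the thesis uses exactly this parameterisation is an assumption.

```latex
G(x) = \left[F(x)\right]^{c}, \qquad
g(x) = c\, f(x)\, \left[F(x)\right]^{c-1}, \qquad c > 0 .
```

Setting c = 1 recovers the baseline distribution, so under this form the classical F distribution is a special case of the exponentiated one.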

The developed distribution and test were expressed in terms of r1, r2, k, r, σ and c, which are the degrees of freedom, number of treatments, number of replications within a population, standard deviation and shape parameter, respectively. Percentage type I errors for the CF, the existing robust tests and the modified robust Exponentiated F test (FEXPO) were as follows. For balanced designs with k = 3, (3, 3, 3), (5, 5, 5), (7, 7, 7), (10, 10, 10), they were CF (9.90%), Welch (5.23%), Scott-Smith (94.60%), BF (5.35%), KR (4.23%), PB (8.35%) and FEXPO (0.38%); with k = 5, (3, 3, 3, 3, 3), (5, 5, 5, 5, 5), (7, 7, 7, 7, 7), (10, 10, 10, 10, 10), they were CF (17.54%), Welch (6.33%), Scott-Smith (93.45%), BF (6.40%), KR (4.00%), PB (7.80%) and FEXPO (0.30%). For unbalanced designs with k = 3, (2, 3, 3), (3, 5, 7), (5, 7, 10), (10, 8, 15), they were CF (12.50%), Welch (22.93%), Scott-Smith (95.70%), BF (3.78%), KR (5.63%), PB (27.65%) and FEXPO (1.23%); with k = 5, (2, 2, 3, 3, 5), (4, 4, 6, 6, 10), (3, 5, 7, 10, 10), they were CF (12.53%), Welch (4.57%), Scott-Smith (93.05%), BF (16.40%), KR (0.01%), PB (0.13%) and FEXPO (0.00%). The modified robust Exponentiated F test gives more precise decisions than the Classical F and the existing robust F-tests for one-way analysis of variance in the presence of outliers.

Keywords: Outliers, Exponentiated link function, Robust F-tests, Type I error

Word count: 439

1. NAME OF CANDIDATE: AWE Olushina Olawale

2. NAME OF SUPERVISOR: Dr. Adedayo A. Adepoju

3. YEAR OF COMPLETION: 22/03/16

4. TITLE OF Ph.D THESIS: Recursive Bayesian Algorithm for Estimating Time-Varying Parameters

5. ABSTRACT

Estimation in Dynamic Linear Models (DLMs) with Fixed Parameters (FPs) has faced considerable limitations owing to its inability to capture the dynamics of most time-varying phenomena in econometric studies. An attempt to address this limitation resulted in the use of Recursive Bayesian Algorithms (RBAs), which in turn suffer from increased computational problems in estimating the Evolution Variance (EV) of the Time-Varying Parameters (TVPs). The aim of this study was therefore to modify the existing RBA for estimating TVPs in DLMs so as to reduce its computational challenges.

The existing DLM, $y_t = \theta_t + v_t$ with $\theta_t = G_t\theta_{t-1} + w_t$ [where $y_t$ is the endogenous variable, $G_t$ is the transition matrix of order p × p, $w_t \sim N(0, \Omega_t)$ with $\Omega_t$ as the EV, $X_t$ is the matrix of predictors and $v_t \sim N(0, \varphi_t)$ with $\varphi_t$ the observational variance], was modified by injecting a discounting value, λ, recursively into the EV of the DLM to give the modified RBA. Two Monte Carlo Experiments (MCEs) using sample sizes n = 20, 30, …, 100 at 10,000 replications each were conducted to investigate the performance of the modified RBA. Experiment 1 involved the existing DLM with FPs, while experiment 2 involved TVPs in the estimated DLMs. The lowest Average Granularity Range (AGR) of λ required for convergence and for minimum Mean Squared Prediction Error (MSPE) was used to determine optimal performance. The sensitivities of the RBA on DLMs with FPs and TVPs were examined using λ and MSPE. The distributions of λ and MSPE selected by the modified RBA for the specified sample sizes were also examined in the MCEs. Additional secondary data on three macroeconomic series (external reserves, export and Gross Domestic Product (GDP)) obtained from the Central Bank of Nigeria for the period 1960-2011 were used to assess the performance of the modified RBA. The AGR of λ required for convergence was determined for the simulated and secondary data.
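The idea of discounting the evolution variance can be sketched for a scalar local-level model. The recursion below uses the standard discount-factor device, in which the prior variance at each step is the previous posterior variance divided by λ, so that no evolution variance has to be estimated explicitly. This standard device, the unit transition (G = 1), the known observational variance and all names are assumptions for illustration; the abstract does not give the modified RBA's exact recursion.

```python
def discount_filter(y, lam, m0=0.0, c0=1e6, obs_var=1.0):
    """Scalar local-level filter with discount factor lam in (0, 1].

    Returns (filtered_levels, mspe), where mspe is the mean squared
    one-step-ahead prediction error. lam = 1 means no evolution (fixed level).
    """
    m, c = m0, c0
    levels, sq_errors = [], []
    for obs in y:
        r = c / lam              # discounted prior variance, replaces c + evolution variance
        q = r + obs_var          # one-step forecast variance
        gain = r / q             # adaptive coefficient
        err = obs - m            # one-step-ahead forecast error
        m = m + gain * err       # posterior mean of the level
        c = r - gain * r         # posterior variance of the level
        levels.append(m)
        sq_errors.append(err * err)
    return levels, sum(sq_errors) / len(sq_errors)
```

Running the filter over a grid of λ values and keeping the value with the smallest MSPE mirrors, in a simplified way, the selection of λ described above.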

The performance of the modified RBA in terms of λ and MSPE in MCE 1 when n = 20, 30, 40, 50, 60, 70, 80, 90, 100 was: (0.91 and 9.20); (0.86 and 4.90); (0.88 and 4.80); (0.90 and 4.50); (0.90 and 4.04); (0.89 and 4.23); (0.90 and 3.98); (0.90 and 3.95); (0.90 and 4.10), respectively. Estimated DLMs with FPs by RBA converged with AGR 0.85 ≤ λ ≤ 0.99. In MCE 2, the performance of the modified RBA in terms of λ and MSPE was: (0.63 and 2.84); (0.61 and 3.78); (0.63 and 3.28); (0.63 and 2.96); (0.58 and 3.82); (0.57 and 3.94); (0.57 and 3.63); (0.57 and 3.40); (0.63 and 4.10), respectively. Estimated DLMs with TVPs converged faster, with AGR 0.50 ≤ λ ≤ 0.80. Estimated DLMs by RBA with TVPs associated with external reserves, export and GDP gave λ values of 0.55, 0.65 and 0.70, respectively. The selected λ for the DLMs with TVPs was consistently lower than for the DLMs with FPs for all sample sizes. Also, the MSPEs were consistently lower for the TVP models than for the FP models.

The modified recursive Bayesian algorithm performed better and converged faster for estimating dynamic linear models with time-varying parameters. This reduced the complexity involved in estimating the evolution variance.

Keywords: Discounted variance, Dynamic model, Granularity range, Estimation algorithm.

Word count: 493

1. NAME OF CANDIDATE: AKANBI Olawale Basheer

2. NAME OF SUPERVISOR: Dr. O. E. Olubusoye

3. YEAR OF COMPLETION: 30/03/16

4. TITLE OF Ph.D THESIS: Some Modified g-Priors for Parameters in Bayesian Model Averaging

5. ABSTRACT

The Bayesian Model Averaging (BMA) method for measuring the uncertainty inherent in model selection processes depends on an appropriate choice of model and parameter priors. The correct specification of parameter priors is still a major concern in BMA, because existing parameter priors give extremely low Posterior Model Probability (PMP). This study was therefore designed to modify some existing parameter g-priors, with the aim of improving their sensitivities to PMP values and determining their predictive performances.

The functional form of the g-priors used was $g = m_1(k_j)/m_2(n)$, where $m_1(k_j)$ and $m_2(n)$ are functions of the number of regressors per model j and the sample size n, with $\lim_{n\to\infty} m_2(n) \le \infty$. The g was specified in two major forms: as a function of the sample size alone, and as a function of both the sample size and the number of regressors per model. Consistencies of the posterior distribution for the Fernandez, Ley and Steel (FLS) models based on the modified g-priors were examined. The FLS models $y = 4 + 2X^*_{(1)} - X^*_{(5)} + 1.5X^*_{(7)} + X^*_{(11)} + 0.5X^*_{(13)} + V$ (Model 1) and $y = 1 + V$ (Model 2), where $V \sim N(0, 6.25)$, were used to examine the prior sensitivity. The asymptotic properties of the modified g-priors were also derived. Five sub-samples of sizes 50, 100, 1000, 10000 and 100000, generated from the normal distribution and each replicated 100 times, were used to investigate the sensitivity of the modified g-priors. The PMP and predictive performance of the modified g-priors were compared with five of the FLS g-priors using the percentage difference in PMPs and Log Predictive Scores (LPS).

The modified g-priors were $g_1 = 1/n^2$, $g_2 = \sqrt{k_j}/n$, $g_3 = k_j/n^2$, $g_4 = k_j^2/n$ and $g_5 = 3/(\log n)^3$. The five modified g-priors showed consistency with FLS models 1 and 2. The asymptotic properties derived for each of the priors were the parameter prior distribution, marginal likelihood of the model, Bayes factor, posterior parameter distribution, posterior model probability, predictive distribution and relationship to an information criterion. The PMPs for the modified g-priors were 0.9999, 0.3767, 0.9226, 0.9226, 0.0379 for model 1 and 0.9996, 0.2221, 0.8732, 0.8732, 0.0065 for model 2. The $g_1$ prior was considered the best g-prior, with the largest PMP. The predictive performances were 2.3300, 2.3280, 2.2760, 2.2840 and 2.3560 for the five g-priors. The $g_1$ prior had the best predictive performance, with a value closest to the expected LPS threshold of 2.3350 specified for BMA. The $g_1$ prior was better than the best of the FLS g-priors by 8.0% of the difference in PMPs for model 1 and 13.0% for model 2. Three of the five modified g-priors and two of the FLS g-priors were found to be close to the expected threshold value of 2.3350 for predictive performance.
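To see how the five priors scale, the following sketch evaluates them for a given sample size n and number of regressors k_j. The function name is illustrative, and the logarithm in g5 is taken to be the natural logarithm, which is an assumption.

```python
import math


def modified_g_priors(n, k_j):
    """Evaluate the five modified g-priors for sample size n and k_j regressors."""
    return {
        "g1": 1 / n ** 2,
        "g2": math.sqrt(k_j) / n,
        "g3": k_j / n ** 2,
        "g4": k_j ** 2 / n,
        "g5": 3 / math.log(n) ** 3,  # natural log assumed
    }
```

For fixed k_j every prior shrinks towards zero as n grows, with g1 and g3 shrinking fastest (order 1/n²) and g5 slowest (order 1/(log n)³).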

The modified g-priors performed better in model selection by their improved Posterior Model Probability values under the Bayesian Model Averaging framework.

Keywords: Posterior distribution, Model uncertainty, Bayes factor, Predictive performance

Word count: 427

1. NAME OF CANDIDATE: OLAJIDE Johnson Taiwo

2. NAME OF SUPERVISOR: Dr. O. E. Olubusoye

3. YEAR OF COMPLETION: 30/03/16

4. TITLE OF Ph.D THESIS: Sensitivity of Some Dynamic Panel Model Estimators in the Presence of Autocorrelation

5. ABSTRACT

Inference in the Dynamic Panel (DP) model is limited by the presence of autocorrelation in the error terms, which often leads to biased and inconsistent parameter estimates. Little has been done by previous researchers to assess the performances of DP estimators in the presence of different levels of autocorrelation. This study was therefore designed to examine the sensitivity of some selected DP model estimators at varying degrees of autocorrelation.

The DP model investigated relates the dependent variable to its own lag through an autoregressive parameter, to a row vector of exogenous variables through its parameter, to an unobserved individual-specific effect and to an error term that varies over the cross-section and time. From the DP model, data were generated using a Monte Carlo simulation procedure in which the exogenous variable depended on its own lag, through the parameter of the lagged exogenous variable, and on a random error. The error term followed autoregressive, moving average and autoregressive-moving average processes, and the design included all possible combinations of cross-sections (N = 50, 100, 200) and time periods (T = 5, 10, 20) for parameter values specified at 0.2, 0.5 and 0.8. Each combination was replicated 100 and 500 times. The parameters of the model were obtained using the following estimators: Ordinary Least Squares (OLS), Least Squares Dummy Variable (LSDV), Anderson-Hsiao1 (AH(l)), Anderson-Hsiao2 (AH(d)), Arellano-Bond Generalised Method of Moments Estimators 1 and 2 (ABGMM1 and ABGMM2), and Blundell-Bond Estimators (SSY1 and SSY2). The bias and Root Mean Square Error (RMSE) were used as criteria to assess the sensitivity of the eight estimators.
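A data-generating process of this kind can be sketched as follows. The sketch covers only the autoregressive error case, and the symbol names (`delta` for the autoregressive parameter, `beta` for the exogenous-variable parameter, `rho` for the error autocorrelation, `x_ar` for the lagged exogenous-variable parameter), the standard normal innovations and the burn-in length are all assumptions made for illustration, since the abstract's own notation is not legible. A helper for the two assessment criteria is included.

```python
import math
import random


def simulate_dynamic_panel(n_units, n_periods, delta, beta, rho,
                           x_ar=0.5, seed=0, burn_in=50):
    """Simulate y_it = delta*y_i,t-1 + beta*x_it + mu_i + u_it with AR(1) errors u_it.

    Returns one list per unit containing (y, lagged_y, x) tuples.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0, 1)  # unobserved individual-specific effect
        y_prev = u_prev = x_prev = 0.0
        rows = []
        for t in range(burn_in + n_periods):
            x = x_ar * x_prev + rng.gauss(0, 1)   # exogenous variable with its own lag
            u = rho * u_prev + rng.gauss(0, 1)    # autocorrelated error term
            y = delta * y_prev + beta * x + mu + u
            if t >= burn_in:                      # discard start-up values
                rows.append((y, y_prev, x))
            y_prev, u_prev, x_prev = y, u, x
        panel.append(rows)
    return panel


def bias_and_rmse(estimates, true_value):
    """Bias and root mean square error of replicated estimates of one parameter."""
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse
```

In a full experiment each estimator would be applied to every replicated panel and `bias_and_rmse` evaluated over the replications for each (N, T, parameter) combination.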

For the parameter of the lagged dependent variable when T = 20, the performances of the eight estimators in terms of bias and RMSE were: AH(d), 0.0002 and 0.0149; ABGMM2, 0.0008 and 0.022; OLS, 0.0012 and 0.0218; LSDV, 0.0012 and 0.0164; AH(l), 0.0021 and 0.0151; SSY1, 0.0022 and 0.0165; ABGMM1, 0.0085 and 0.0227; and SSY2, 0.1428 and 0.2468, respectively. The AH(d) performed better than the other estimators, with minimum bias and RMSE. For the parameter of the exogenous variable when T = 20, the performances of the estimators in terms of bias and RMSE were: ABGMM1, 0.00036 and 0.0195; LSDV, 0.0013 and 0.0391; OLS, 0.0025 and 0.0343; ABGMM2, 0.0026 and 0.0206; AH(l), 0.003 and 0.0392; SSY1, 0.0043 and 0.0405; and SSY2, 0.0053 and 0.0528, respectively. The ABGMM1 performed better than the other estimators, with minimum bias and RMSE.

The Anderson-Hsiao2 and Arellano-Bond Generalised Method of Moments 1 dynamic panel model estimators were the least sensitive in estimating the parameters of the lagged dependent variable and the exogenous variable, respectively, in the presence of autocorrelation.

Keywords: Exogenous variable, Generalised method of moments, Lagged dependent variable, Monte Carlo simulation, Serial correlation

Word count: 408

1. NAME OF CANDIDATE: MMADUAKOR Chika Obianuju

2. NAME OF SUPERVISOR: Prof. G. N. Amahia

3. YEAR OF COMPLETION: 23/08/16

4. TITLE OF Ph.D THESIS: An Alternative Almost Unbiased Ratio Estimator for the Population Mean

5. ABSTRACT

The Classical Ratio Estimator (CRE) of the population mean is more precise than the sample mean in Simple Random Sampling (SRS). A major setback of the CRE is that it is biased. The magnitude of this bias plays an important role in estimation because it results in misleading inference, especially when the sample size is small. The CRE requires advance knowledge of the population mean of the auxiliary variable. However, information on the auxiliary variable is not always available. Little has been done in the past to reduce the magnitude of the bias when the auxiliary variable is unknown. Therefore, this study was aimed at designing an Alternative Almost Unbiased Ratio Estimator (AAURE) of the population mean when the auxiliary variable is unknown.

A double sampling design was used to obtain an estimate of the population mean of the auxiliary variable. This estimate was used to construct a biased estimator of the population mean from the large-sample estimate obtained from the first-phase sample and the sample estimates obtained from the second-phase sample. A correction for the bias of this estimator was then derived in terms of the first- and second-phase sample sizes. A linear cost function of the unit costs of obtaining information in the first and second phases and the total cost of the survey was used to obtain the optimum variances for the developed estimator and the population mean in SRS. Using first principles, the condition under which the AAURE is more efficient than the population mean in SRS was derived. The AAURE was compared with the population mean in SRS under an assumed model in three scenarios: X ~ G(2,1) and e ~ G(0.25,1); X ~ Exp(0.25) and e ~ G(0.25,1); and X ~ N(10,4) and e ~ N(0,1), for n = 20, 40 and 100. Additional secondary data from 80 factories in a region of the USA (Population I: fixed capital and number of workers; Population II: fixed capital and output) were also used for comparison.
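The two-phase idea can be illustrated with the classical double-sampling ratio estimator, in which the unknown population mean of the auxiliary variable is replaced by its first-phase sample mean. This is the textbook estimator, not the bias-corrected AAURE, whose formula is not legible in the abstract. The cost function below assumes the usual linear form (total cost = first-phase unit cost × first-phase size + second-phase unit cost × second-phase size), and all names are illustrative.

```python
def double_sampling_ratio_estimate(first_phase_x, second_phase_x, second_phase_y):
    """Classical double-sampling ratio estimate of the population mean of y.

    first_phase_x: auxiliary values from the large first-phase sample.
    second_phase_x, second_phase_y: paired values from the second-phase subsample.
    """
    x_bar_first = sum(first_phase_x) / len(first_phase_x)
    x_bar_second = sum(second_phase_x) / len(second_phase_x)
    y_bar_second = sum(second_phase_y) / len(second_phase_y)
    return (y_bar_second / x_bar_second) * x_bar_first


def linear_survey_cost(n_first, n_second, unit_cost_first, unit_cost_second):
    """Total survey cost under an assumed linear cost function."""
    return unit_cost_first * n_first + unit_cost_second * n_second
```

When y is exactly proportional to x, the estimate equals the proportionality constant times the first-phase mean of x, which is why a large and cheap first phase can sharpen the estimate.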

The AAURE and its optimum variance were obtained, together with the optimum variance of the population mean in SRS. The AAURE was more efficient than the population mean in SRS under the derived condition, and the comparison was reported for the gamma, normal and exponential scenarios. In populations I and II, the variances of the AAURE and the population mean in SRS indicated that the AAURE is more efficient than the population mean in SRS for some populations met in practice. The AAURE consistently gave the optimum variance for all the distributions and the two populations considered.

The Alternative Almost Unbiased Ratio Estimator for the population mean was designed and its optimum variance derived. The AAURE reduced the magnitude of the bias, thereby minimising the problem of misleading inference in small samples where the auxiliary variable is unknown for all units in the population.

Keywords: Double sampling design, Optimum variance, Auxiliary variable, Population mean, Classical Ratio Estimation.

Word count: 486

1. NAME OF CANDIDATE: YEMITAN Raphel Adebayo

2. NAME OF SUPERVISOR: Dr. O. I. Shittu / Prof. J. O. Iyaniwura

3. YEAR OF COMPLETION: 07/10/16

4. TITLE OF Ph.D THESIS: Modified State Space Model with Nonlinear State Equation

5. ABSTRACT

The Classical State Space Model (CSSM) is a system of two equations where the first is the Measurement Equation (ME) and the second is the State Equation (SE). Previous research on CSSM relies on linear approximations due to the assumption of linearity of the SE while excluding the prospects of nonlinear estimation. Therefore, this study was designed to modify the existing CSSM to include nonlinear estimation and inference from the SE.

The CSSM was modified by reparametrising the SE ($x_{t+1} = x_t + \omega_{t+1}$) with the Smooth Transition Autoregressive model, $x_{t+1} = \varphi_1 x_t[1 - G(x_t;\theta)] + \varphi_2 x_t G(x_t;\theta) + \omega_{t+1}$, where $x_t$ is the transition variable, $\varphi_1$ and $\varphi_2$ are known constants, $G(x_t;\theta) = \left(1 + e^{-\gamma(x_t - c)}\right)^{-1}$ is the transition function, $\gamma > 0$ is the transition parameter, c is the threshold parameter and $\omega_{t+1}$ are state innovations with covariance $P_{t+1}$. The Lagrange Multiplier (LM) test statistic was used to validate the nonlinearity of the system at α = 0.05 before estimating the Modified State Space Model (MSSM). The Predicted State (PS), Blending Equation (BE), Kalman Gain (KG) and Filtered State Covariance (FSC), which are key attributes of the MSSM, were derived using the Kalman Filter. The Nigerian Consumer Price Index (CPI) and Gross Domestic Product (GDP), obtained from the National Bureau of Statistics, and simulated data samples of sizes 250, 500 and 1,000 from a logistic function were used to validate the MSSM and CSSM. Akaike Information Criterion (AIC), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) were used as assessment criteria to evaluate the MSSM and CSSM.
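The smooth transition mechanism in the state equation can be sketched for the scalar case. The functions below evaluate the logistic transition function and the noise-free part of the state equation exactly as written above; the function and argument names are illustrative.

```python
import math


def logistic_transition(x, gamma, c):
    """Logistic transition function G = 1 / (1 + exp(-gamma * (x - c)))."""
    return 1.0 / (1.0 + math.exp(-gamma * (x - c)))


def star_state_mean(x, phi1, phi2, gamma, c):
    """Noise-free next state: phi1*x*(1 - G) + phi2*x*G."""
    g = logistic_transition(x, gamma, c)
    return phi1 * x * (1.0 - g) + phi2 * x * g
```

Far below the threshold c the state evolves with coefficient phi1, far above it with phi2, and gamma controls how sharply the model moves between the two regimes.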

The MSSM was of the form ME: $y_{t+1} = Hx_{t+1} + v_{t+1}$, where $y_{t+1}$ are observations, H is a known constant and $v_{t+1}$ are observation innovations; and SE: $x_{t+1} = \varphi_1 x_t[1 - G(x_t;\theta)] + \varphi_2 x_t G(x_t;\theta) + \omega_{t+1}$. The PS, BE, KG and FSC for the MSSM were $\hat{x}^*_{t+1} = \frac{\varphi_1[1-G(x_t;\gamma,\theta)] + \varphi_2 G(x_t;\gamma,\theta)}{\gamma(\varphi_1+\varphi_2)}\, G(x_t;\gamma,\theta)[1-G(x_t;\gamma,\theta)]$, $\hat{x}_{t+1} = \hat{x}^*_{t+1} + K_t\left(y_{t+1} - H_{t+1}\hat{x}^*_{t+1}\right)$, $K_{t+1} = \left(H_{t+1}P^*_t\right)\left(H_{t+1}P^*_tH'_{t+1} + R_t\right)^{-1}$ and $P_t = P^*_t - K_tH_{t+1}P^*_t$, respectively. The LM test values for CPI and GDP were 0.33 and 0.06 respectively, showing nonlinearity. The assessment tests for CPI were: 23.02 (MSSM) and 25.84 (CSSM) for AIC, 0.43 (MSSM) and 0.60 (CSSM) for MAPE, and 2.13 (MSSM) and 3.65 (CSSM) for RMSE. The assessment tests for GDP were: 565.14 (MSSM) and 970.19 (CSSM) for AIC, 0.77 (MSSM) and 3.80 (CSSM) for MAPE, and 2.18 (MSSM) and 10.54 (CSSM) for RMSE. The LM test values for the simulated data sets with n = 250, 500 and 1000 were 0.00, 0.36 and 0.03, respectively. This implies linearity for samples 250 and 1000 and nonlinearity for sample 500. The assessment tests for the nonlinear sample (n = 500) were: 3,957 (MSSM) and 5,161 (CSSM) for AIC, -14.9 (MSSM) and 80.5 (CSSM) for MAPE, and 12.6 (MSSM) and 41.2 (CSSM) for RMSE.
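The blending, gain and covariance expressions have a simple scalar reading, sketched below. The function takes a predicted state and its covariance as given and applies the update step in the form stated above; the scalar setting and the names `x_pred`, `p_pred`, `h` and `r` are assumptions made for illustration, and time subscripts are dropped.

```python
def kalman_update(x_pred, p_pred, y, h, r):
    """Scalar Kalman update: returns (filtered_state, filtered_covariance, gain).

    x_pred, p_pred: predicted state and its covariance.
    y: new observation; h: measurement constant; r: observation noise variance.
    """
    gain = (h * p_pred) / (h * p_pred * h + r)       # Kalman gain
    x_filt = x_pred + gain * (y - h * x_pred)        # blending equation
    p_filt = p_pred - gain * h * p_pred              # filtered state covariance
    return x_filt, p_filt, gain
```

A large observation variance r drives the gain towards zero, so the filtered state stays near the prediction, while a small r moves it towards the observation.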

The modified state space model improved estimation and inference for phenomena exhibiting nonlinear relationships.

Keywords: Kalman filter, Smooth transition autoregressive model, Transition function, Measurement and state innovations.

Word count: 432

1. NAME OF CANDIDATE: OPAYINKA Hannah Folashade

2. NAME OF SUPERVISOR: Dr. Adedayo A. Adepoju

3. YEAR OF COMPLETION: 28/10/16

4. TITLE OF Ph.D THESIS: Modification of Efron, Smooth and Moon Bootstrap Methods for Heavy-Tailed Distributions

5. ABSTRACT

Bootstrap methods are resampling methods that have been found more effective than parametric methods when dealing with heavy-tailed distributions. The major limitation of bootstrap methods is their sensitivity to the upper tail of the distribution of a given data set, which often leads to large Standard Errors (SE) and slow convergence. Little has been done by past researchers to address these challenges with heavy-tailed distributions. Therefore, this study was aimed at modifying three bootstrap methods [Efron bootstrap (EB), Smooth bootstrap (SB) and m-out-of-n (moon) bootstrap (MB)] to address the problems of large SE and slow convergence.

Modified Efron Bootstrap (MEB), Modified Smooth Bootstrap (MSB) and Modified moon Bootstrap (MMB) were obtained from EB, SB and MB, respectively. The empirical distribution was decomposed by subgrouping the observations into individual ranks. For MEB and MMB, the proportional allocation ($n_s/n$) method was used to obtain bootstrap samples (where n and $n_s$ are the original and stratum sample sizes, respectively). For MSB, the corresponding random noise ($\varepsilon_s$) was added to individual observations before obtaining bootstrap samples. In MEB and MSB, bootstrap samples of size n were taken from the original sample of size n, while in MMB, bootstrap samples of size $\dddot{n}$ …
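The effect of resampling within rank-based strata with proportional allocation can be sketched as follows. The sketch assumes that strata are formed by cutting the ordered sample into contiguous groups of near-equal size, which is one possible reading of the decomposition described above, and it estimates the standard error of the sample mean. The function names and the number of strata are illustrative, not the thesis's specification.

```python
import random
import statistics


def rank_strata(data, n_strata):
    """Split the ordered sample into contiguous rank-based strata of near-equal size."""
    ordered = sorted(data)
    n = len(ordered)
    bounds = [round(i * n / n_strata) for i in range(n_strata + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(n_strata)]


def stratified_resample(strata, rng):
    """Draw n_s values with replacement from each stratum (proportional allocation)."""
    sample = []
    for stratum in strata:
        sample.extend(rng.choices(stratum, k=len(stratum)))
    return sample


def bootstrap_se(data, n_boot=500, n_strata=None, seed=0):
    """Bootstrap standard error of the mean; n_strata=None gives the ordinary Efron bootstrap."""
    rng = random.Random(seed)
    strata = rank_strata(data, n_strata) if n_strata else [list(data)]
    means = [statistics.fmean(stratified_resample(strata, rng)) for _ in range(n_boot)]
    return statistics.stdev(means)
```

Because every resample keeps the same share of observations from each stratum, including the upper tail, the between-stratum part of the variance drops out of the bootstrap distribution, which is consistent with the smaller standard errors reported for the modified methods.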

The modified methods were verified to satisfy the assumptions guiding the application of bootstrap methods. The decomposed empirical distributions still followed the original distributions. For the simulated data, when n = 15000, the SE and RMSE respectively were: 0.4446 and 0.0141 for EB; 0.4619 and 0.0146 for SB; 0.0970 and 0.0031 for MEB; 0.4621 and 0.0146 for MSB. Also, when $\dddot{n}$ = 3000, the SE and RMSE were: 0.9832 and 0.0310 for MB; 0.2196 and 0.0069 for MMB. For MOF, when n = 13052, the SE and RMSE were: 748.62 and 23.67 for EB; 748.69 and 23.67 for SB; 204.26 and 6.45 for MEB; 723.42 and 22.87 for MSB. Also, when $\dddot{n}$ = 2610, the SE and RMSE were: 1625.58 and 51.40 for MB; 470.85 and 14.88 for MMB. Thus, MEB had the minimum values of both SE and RMSE among the four methods EB, SB, MEB and MSB, while MMB had the minimum values between MB and MMB. The relative convergence rate of MEB over the other methods was between 17.9% and 28.8%, while that of MMB over MB was above 50.0%.

The modified bootstrap methods were less sensitive to the upper tail of distributions, resulting in small standard errors and fast convergence; hence, they are suitable for handling heavy-tailed distributions.

Keywords: Proportional allocations, Resampling methods, Relative convergence, Bootstrap method

Word count: 475

1. NAME OF CANDIDATE: BADMUS Nofiu Idowu

2. NAME OF SUPERVISOR: Prof. T. A. Bamiduro

3. YEAR OF COMPLETION: 21/11/16

4. TITLE OF Ph.D THESIS: An Alternative Generalized Weighted Weibull Regression Model

5. ABSTRACT

Classical Regression Models (CRM) such as Weibull regression are commonly used for estimating relationships among variables. The problem with CRM is their dependence on the assumptions of normality and homoscedasticity of the residual terms. However, the assumption of normality is not valid for several real-life events, especially time-to-event phenomena where the data exhibit a high level of skewness. Previous research on CRM has generally excluded non-normality of the residual terms. Therefore, this study was aimed at developing an Alternative Generalised Weighted Weibull Regression Model (AGWWRM) for improved inference when the residual terms are not normal.

The Weighted Weibull Distribution (WWD) has the density

$$f(x)=\frac{\gamma+1}{\gamma}\,\frac{\alpha}{\beta^{\alpha}}\,x^{\alpha-1}\exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right]\left\{1-\exp\left[-\gamma\left(\frac{x}{\beta}\right)^{\alpha}\right]\right\},$$

where $\gamma$, $\alpha$ and $\beta$ are the weight, scale and shape parameters. The WWD was redefined by introducing two shape parameters, $a$ and $b$, to accommodate skewness in the data, based on the beta link function

$$g(x)=\frac{1}{B(a,b)}\,[F(x)]^{a-1}\,[1-F(x)]^{b-1}\,f(x),$$

where $B$ is the beta function and $F(x)$ is the distribution function of the WWD. To obtain a location-scale regression model linking the response variable $y_i$ to a vector $X$ of $p$ explanatory variables, $y_i=X_i^{T}\beta^{*}+\sigma z_i$, where $z_i=(y_i-\mu)/\sigma$ is the error term, $\beta^{*}$ is the vector of regression model parameters, and $\mu$ and $\sigma$ are the location and dispersion parameters for $i=1,2,\ldots,n$, the transformations $Y=\log(T)$, $\alpha=1/\sigma$ and $\mu=\log(\beta)$ were used. $T$ is a random variable having the beta Weighted Weibull (WW) density function and $Y$ is a log-beta WW variable. The statistical properties, namely moments, moment generating functions, skewness and kurtosis, were determined for the Alternative Generalised Weighted Weibull (AGWW) distribution. The performance of the AGWWRM was determined using secondary data on time-to-completion of a Ph.D. programme from a sample of 187 Ph.D. graduates of the University of Ibadan. The explanatory variables used were supervisor ($x_1$), employment ($x_2$), marital status ($x_3$), age ($x_4$) and faculty ($x_5$), while the dependent variable $y$ was time-to-completion. The AGWWRM was compared with six existing generalised WW regression models: log-beta Weibull, log-beta normal, log-Weibull, log-normal, log-logistic and log-weighted. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were used as the assessment criteria for AGWWRM.
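The construction above can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: it takes the Weibull kernel of the WWD to be (α/β)(x/β)^(α-1) exp(-(x/β)^α), uses a closed-form F(x) obtained here by integrating that density directly, and picks arbitrary parameter values. The function names and the Simpson integrator are hypothetical helpers, not part of the thesis.

```python
import math

def ww_pdf(x, alpha, beta, gamma):
    """Weighted Weibull density, assuming the kernel (alpha/beta)*(x/beta)**(alpha-1)*exp(-u)."""
    u = (x / beta) ** alpha
    kernel = (alpha / beta) * (x / beta) ** (alpha - 1) * math.exp(-u)
    return (gamma + 1) / gamma * kernel * (1 - math.exp(-gamma * u))

def ww_cdf(x, alpha, beta, gamma):
    """CDF from integrating ww_pdf in closed form (derived for this sketch)."""
    u = (x / beta) ** alpha
    inner = (1 - math.exp(-u)) - (1 - math.exp(-(1 + gamma) * u)) / (gamma + 1)
    # Clamp to [0, 1] to guard against floating-point rounding in the tails.
    return min(1.0, max(0.0, (gamma + 1) / gamma * inner))

def beta_ww_pdf(x, a, b, alpha, beta, gamma):
    """Beta-generated density g(x) = F^(a-1) * (1-F)^(b-1) * f / B(a, b)."""
    F = ww_cdf(x, alpha, beta, gamma)
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return F ** (a - 1) * (1 - F) ** (b - 1) * ww_pdf(x, alpha, beta, gamma) / math.exp(log_B)

def simpson(func, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

# Arbitrary illustrative parameter values (not estimates from the thesis).
alpha, beta, gamma, a, b = 1.5, 2.0, 0.8, 2.0, 3.0
area_ww = simpson(lambda x: ww_pdf(x, alpha, beta, gamma), 0.0, 40.0)
area_bww = simpson(lambda x: beta_ww_pdf(x, a, b, alpha, beta, gamma), 0.0, 40.0)
print(f"integral of f: {area_ww:.6f}, integral of g: {area_bww:.6f}")
```

Both integrals should be close to 1, which is a quick sanity check that the weight factor (γ+1)/γ normalises the WWD and that the beta link function yields a proper density.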

The derived AGWW distribution was

$$g(z;a,b,\gamma,\mu,\sigma)=\frac{\gamma+1}{\sigma\gamma B(a,b)}\,\exp(z)\,\exp[-\exp(z)]\,\{1-\exp[-\gamma\exp(z)]\}\,[F(z)]^{a-1}\,[1-F(z)]^{b-1},$$

where

$$F(z)=\frac{\gamma+1}{\gamma}\left\{\left(1-\exp[-\exp(z)]\right)-\frac{1}{\gamma+1}\left(1-\exp[-(1+\gamma)\exp(z)]\right)\right\}.$$

The developed AGWWRM was

$$y=\beta_0^{*}+\beta_1^{*}x_1+\beta_2^{*}x_2+\beta_3^{*}x_3+\beta_4^{*}x_4+\beta_5^{*}x_5+\sigma z_i$$

where 0
model $(\beta_0^{*},\beta_1^{*},\beta_2^{*},\beta_3^{*},\beta_4^{*},\beta_5^{*})$ = (2.550, 3.250, 1.250, 4.150, 1.310, 5.150). The AIC and BIC for the AGWWRM were -9112754.000 and -9112751.000, while the AIC values for the six generalised WW regression models were -474248.600 for log-beta Weibull, -3076234.000 for log-beta normal, -487430.400 for log-Weibull, -1541.182 for log-normal, -1102.662 for log-logistic and -252807.000 for log-weighted, respectively. The corresponding BIC values were -474249.500, -3076205.000, -487428.500, -1518.564, -1080.044 and -252804.800, respectively. The assessment criteria for the AGWWRM were consistently lower than those from the existing generalised WW regression models, indicating improved inference.
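For reference, the two assessment criteria follow the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the sample size and ln L the maximised log-likelihood. The Python sketch below illustrates the comparison rule only. The sample size 187 is taken from the abstract, while the model names, log-likelihoods and parameter counts are hypothetical.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

n_obs = 187  # sample size reported in the abstract

# Hypothetical (log-likelihood, number of parameters) pairs, for illustration only.
candidates = {"model_A": (-250.0, 9), "model_B": (-262.5, 4)}

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda m: scores[m][0])
best_by_bic = min(scores, key=lambda m: scores[m][1])

for name, (a_val, b_val) in scores.items():
    print(f"{name}: AIC = {a_val:.2f}, BIC = {b_val:.2f}")
print("lowest AIC:", best_by_aic, "| lowest BIC:", best_by_bic)
```

For both criteria the model with the smaller value is preferred, which is the rule applied in the comparison above. Because BIC penalises extra parameters more heavily once ln(n) exceeds 2, the two criteria can rank models differently, as they do for these hypothetical values.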

The developed Alternative Generalised Weighted Weibull Regression model exhibited improved inference when the residual terms are not normal.

Keywords: Beta link function, Log-beta distribution, Location-scale regression model, Log-normal distribution

Word count: 480