Bayesian inference for non-homogeneous Poisson process models for software reliability

Date of Completion

January 2006


Statistics|Computer Science




In this dissertation, we present Bayesian inference for models based on non-homogeneous Poisson processes (NHPP) for software reliability data. Statistical research on software reliability is an effort to quantify the reliability of software, either in the testing stage or after business release. Estimation and prediction of software failures collected during the development cycle are major research goals. Given the discrete nature of the failure counts and the time dependence of the observations and covariates, models based on NHPPs have been at the center of the software reliability modeling literature. Here, three different classes of models are proposed to understand various aspects of new data from the software testing context and to better characterize and predict the patterns and variations in the data.

First, we present a new model for software reliability characterization using a growth curve formulation that allows model parameters to vary as a function of available covariate information. In the software reliability framework, covariates may include such things as the number of lines of code for a product throughout its development cycle, the size of the test team, etc. The mean function of the underlying NHPP in the proposed model is allowed to vary according to a linear function linked with the covariate information. We employ the Bayesian framework using Markov chain Monte Carlo (MCMC) for inference and model assessment. The methods are illustrated using simulated defect data and defect data collected during development for two large commercial software products.

For the second problem, we develop NHPP models to characterize categorized event data, with application to modeling the discovery process for categorized software defects. Conditioning on the total number of defects, multivariate models are proposed for modeling the defects by type. A latent vector autoregressive structure is used to characterize dependencies among the different types. We show how Bayesian inference can be achieved via MCMC procedures, with the posterior prediction-based L-measure used for model selection. Simulation studies are presented to validate the models. The results are then illustrated for defects of different types found during the system test phase of a software product.

In the third problem, two NHPP models with Markov chain switches are studied to characterize categorized software defects. The first is a univariate suites model, which focuses on a Markov switch occurring in the NHPP mean function with varying discovery rate. Introducing the switch allows for more variability among observed events, as documented in the literature. In the second suites model, conditioning on the total number of events, a multivariate setup is proposed for modeling the defects by categorized types. A latent Markov chain is applied to model the evolving probabilities of the types of events. In addition to model estimation via Bayesian methods using MCMC, model dimension is selected via posterior probabilities, Schwarz criterion and BIC. After the models are validated via simulation studies, the proposed methods are illustrated for software failure data from a real testing situation.

The last chapter briefly discusses a comparison of the models developed in the previous chapters, with application to real data. Cross-validation prediction via CPO, the Schwarz criterion, and the L-measure is used to facilitate the discussion.