Oncology
Estimating the effects of the factors underlying the progression of Familial Adenomatous Polyposis using Longitudinal Com-Poisson Model
Vandna Jowaheer^{1}, Naushad Ali Mamode Khan^{1}, Durga Charan Pati^{2}
Author Affiliations
^{1}University of Mauritius, Mauritius
^{2}Department of Surgery, SSR Medical College, Mauritius
correspondence to
Vandna Jowaheer; vandnaj@uom.ac.mu
Received: December 7, 2013
Revised: March 19, 2014
Accepted: March 21, 2014
Abstract
This paper aims to develop a statistical model capable of quantifying the effects of various factors on the progression of Familial Adenomatous Polyposis (FAP), a genetic disorder affecting the colon and rectum in human beings. The progression of FAP in affected individuals is monitored by counting the number of polyps developed over a period of time. These count responses, repeatedly observed over time, are overdispersed and highly correlated, resulting in a complicated longitudinal count data structure that renders the commonly used Gaussian regression model unsuitable. We designed a statistical model based on the Com-Poisson distribution, which can efficiently analyze such a data structure. The estimates of the overdispersion and correlation parameters confirm the nature of the real data. Analysis of the model indicates that males are 50% more at risk of developing polyps than females. With respect to the type of treatment, vitamin C and E with high fibre is the most effective remedy, followed by vitamin C and E only, as compared to placebo. In men as well as in women, initial polyp counts positively affect the polyp counts at a given time.
KEY WORDS: Familial adenomatous polyposis; Longitudinal responses; Correlation and overdispersion; Com-Poisson model
Adenomatous polyps are benign pedunculated outgrowths from epithelium with varying malignant potential. Familial Adenomatous Polyposis (FAP) is an autosomal dominant disorder diagnosed when the patient has more than 100 polyps in the large bowel or when a member of an FAP family has any number of colonic adenomas detectable. Polyps can also occur in the stomach, duodenum and small intestine. The incidence is between 1 in 8000 and 1 in 28000 individuals. Its main risk is large bowel cancer. The gene responsible is the APC gene on the long arm of chromosome 5 (5q21)^{1}. FAP can also occur sporadically by new mutation. In these cases large bowel cancer occurs in young adulthood. Males and females are both affected. FAP can be associated with benign tumours such as abdominal wall tumours (desmoid tumour) and bone tumours (osteoma).
Polyps are usually visible by the age of 15 years via sigmoidoscopy. Carcinomatous changes (nearly 100%) occur 10-20 years after the onset of polyposis. Hence, if left untreated, the person develops cancer by the age of 35-40. A person with familial polyposis has a 50 percent chance of passing the condition down to each child. The symptoms associated with the growth of polyps are gastrointestinal problems such as diarrhoea, constipation, abdominal cramps, blood in the stool, or weight loss. Patients may also develop other non-malignant tumours, and bone and dental abnormalities. They may also exhibit a spot on the retina of the eye. Patients are usually treated surgically for FAP. It is important to control the recurrence and spread of the polyps in such patients.
The spread of polyposis can be monitored by counting the number of polyps observed in patients over time. Gender, the initial number of polyps at the time of detection and the type of treatment are some of the important factors determining the progression of familial polyposis. One of two types of treatment, Vitamin C+E or Vitamin C+E+high fibre, is usually administered to the patients in order to control the polyp counts. Repeated monitoring results in longitudinal count data where the responses are correlated and highly overdispersed. Moreover, the correlation structure is unknown since the joint distribution of the polyp counts is unknown. A Gaussian regression model cannot take such features of the polyp count data into account, and its application will provide highly inefficient estimates of the factor effects.
Stukel^{2} and Crouchley and Davies^{3} proposed the generalized estimating equations (GEE) and random effects modelling approaches, respectively, to analyze this type of polyp count data. The 'working' correlation structure based GEE approach of Liang and Zeger^{4} suffers from the drawbacks of using a misspecified 'working' correlation matrix, as highlighted by Crowder^{5} and Sutradhar and Das^{6}. Hence, the GEE approach used by Stukel^{2}, which assumes an approximate covariance structure based on an equicorrelation model, fails to yield efficient estimates of the regression parameters. Also, random effects models similar to those designed by Thall and Vail^{7} and Crouchley and Davies^{3} are not suitable, as they model the overdispersion but are not efficient in modelling the time-lag correlations among the counts repeatedly collected over time, as discussed by Jowaheer and Sutradhar^{8}. Moreover, estimating the regression parameters by evaluating the integrated likelihood function is quite complicated, and the efficiency of these estimates depends on the assumed distribution of the random effects. In contrast, Jowaheer and Sutradhar^{8} proposed a negative binomial longitudinal model that accounts for the overdispersion, and estimated its parameters using joint generalized estimating equations based on the true autocorrelation structure of the count responses repeatedly collected over time. Khan and Jowaheer^{9} used this negative binomial longitudinal regression model, based on a stationary autocorrelation structure, to analyse polyps data. In this paper, we propose to use a Com-Poisson longitudinal model^{10} and joint generalized quasi-likelihood (GQL) estimating equations^{11} based on the true stationary autocorrelation structure of the counts to reanalyze the rectal polyps data from Stukel^{2}.
The original data analyzed by Stukel^{2} consist of the rectal polyp counts of 58 patients recorded over nine visits, along with information on gender, two baseline measures of the polyp counts taken before the treatment, and the type of treatment. However, there are some missing counts in these data. In this application, we exclude the patients with missing data and consider only 45 subjects over 9 three-monthly visits. The means and variances of the responses for the 9 visits are shown in Table 1. The average lag correlations are displayed in Table 2.
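The summaries reported in Tables 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the counts are made-up toy values rather than the polyp data, and averaging products of standardised counts is one common way to obtain lag correlations, assumed here because the paper does not state its exact formula.

```python
from statistics import mean, variance

def visit_summaries(counts):
    """Per-visit sample means and variances for an I x T table of counts."""
    by_visit = list(zip(*counts))  # transpose: one tuple of counts per visit
    return [mean(col) for col in by_visit], [variance(col) for col in by_visit]

def lag_correlation(counts, lag):
    """Average product of standardised counts that are `lag` visits apart."""
    means, variances = visit_summaries(counts)
    sds = [v ** 0.5 for v in variances]
    n_visits = len(means)
    total, n_pairs = 0.0, 0
    for row in counts:
        z = [(y - m) / s for y, m, s in zip(row, means, sds)]
        for t in range(n_visits - lag):
            total += z[t] * z[t + lag]
            n_pairs += 1
    return total / n_pairs

# Hypothetical toy data: 4 subjects observed at 4 visits (NOT the polyp data).
toy_counts = [[2, 3, 1, 2], [10, 8, 9, 7], [0, 1, 0, 2], [25, 20, 22, 18]]
toy_means, toy_variances = visit_summaries(toy_counts)
print(toy_means, toy_variances)
print([lag_correlation(toy_counts, h) for h in (1, 2, 3)])
```

A variance well above the mean at every visit signals overdispersion, and lag correlations that shrink as the lag grows are what an autoregressive pattern would produce.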
It is noted from Table 1 that the variances are larger than their corresponding means, indicating that the data are highly overdispersed. Also, the lag correlations displayed in Table 2 decrease gradually as the lag increases, showing an autoregressive pattern underlying the responses repeatedly collected over 9 visits. There are three covariates: type of treatment, gender and the sum of baseline rates (BR). Patients are allocated to one of three treatment groups: placebo, Vitamin C+E (TR 1) and Vitamin C+E+high fibre (TR 2). These three groups are represented by x_{1} (TR 1) and x_{2} (TR 2), with placebo being the reference group. Hence, x_{1} = 0 and x_{2} = 0 stands for an individual allocated to placebo; x_{1} = 1 and x_{2} = 0 stands for an individual allocated to TR 1; x_{1} = 0 and x_{2} = 1 stands for an individual allocated to TR 2. The covariate gender is represented by x_{3}, which is 0 for male and 1 for female. The sum of baseline rates (BR) is the third covariate, represented by x_{4}.
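The covariate coding described above can be written as a small helper. This is a sketch: the function name and the string labels are hypothetical, and the leading 1 assumes the model includes an intercept, which the text does not state explicitly (it is inferred from four coded covariates and p = 5).

```python
def design_row(treatment, sex, baseline_rate):
    """Return [1, x1, x2, x3, x4] for one subject.

    treatment: "placebo", "TR1" (Vitamin C+E) or "TR2" (Vitamin C+E+high fibre)
    sex: "male" or "female"
    baseline_rate: sum of the two baseline rates (BR)
    """
    if treatment not in ("placebo", "TR1", "TR2"):
        raise ValueError("unknown treatment: %r" % (treatment,))
    if sex not in ("male", "female"):
        raise ValueError("unknown sex: %r" % (sex,))
    x1 = 1 if treatment == "TR1" else 0   # placebo is the reference group
    x2 = 1 if treatment == "TR2" else 0
    x3 = 1 if sex == "female" else 0      # 0 for male, 1 for female
    return [1, x1, x2, x3, baseline_rate]  # leading 1: assumed intercept

print(design_row("TR2", "female", 12.0))
```

Because the covariates do not change over time, the same row applies to all nine visits of a subject.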
In order to estimate the effect of covariates on the number of polyps developed over a period of 2 years after the start of the treatment, we propose to use a Com-Poisson regression model based on an AR(1) type autocorrelation structure^{11}. In this section, we provide the structure of this model^{10}. The parameters of this model will be estimated using the joint generalized quasi-likelihood (JGQL) estimation approach discussed in the next section.
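For orientation, a stationary AR(1) type structure implies that the correlation between two visits decays geometrically with their separation. A minimal sketch, where the function name and the value of rho are illustrative assumptions rather than estimates from the paper:

```python
def ar1_correlation(rho, n_visits):
    """Correlation matrix whose (s, t) entry is rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_visits)] for s in range(n_visits)]

# Illustrative value only; the paper's estimated correlations are in its Table 4.
for row in ar1_correlation(0.5, 4):
    print(row)
```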
Let y_{it} be a count response and x_{it} be a p-dimensional vector of covariates for subject i (i = 1,…,I) observed at time t (t = 1,…,T). Let β be the p×1 vector of regression parameters. For the i^{th} subject, let y_{i} = (y_{i1},…,y_{it},…,y_{iT})^{T} be the T×1 response vector and x_{i} = (x_{i1},…,x_{iT})^{T} be the T×p matrix of covariates. We assume y_{it} follows the Com-Poisson distribution^{11} with probability mass function
$$f(y_{it}) = \frac{\lambda_{it}^{y_{it}}}{(y_{it}!)^{v}\, Z(\lambda_{it}, v)}, \qquad y_{it} = 0, 1, 2, \ldots \tag{1}$$
where
$$Z(\lambda_{it}, v) = \sum_{k=0}^{\infty} \frac{\lambda_{it}^{k}}{(k!)^{v}} \tag{2}$$
and the parameter v is the dispersion index such that v = 1, v < 1 and v > 1 correspond to equi-, over- and under-dispersion, respectively. Since the series in equation (2) does not have a closed-form expression, an asymptotic expression^{10} is used. This expression is given by:
$$Z(\lambda_{it}, v) \approx \frac{\exp\left(v\,\lambda_{it}^{1/v}\right)}{\lambda_{it}^{(v-1)/(2v)}\,(2\pi)^{(v-1)/2}\,\sqrt{v}} \tag{3}$$
Hence,
$$E(Y_{it}) \approx \lambda_{it}^{1/v} - \frac{v-1}{2v}$$
and
$$\mathrm{Var}(Y_{it}) \approx \frac{\lambda_{it}^{1/v}}{v} \tag{4}$$
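The step from the asymptotic normalising constant to the approximate mean and variance can be sketched as below. The derivation assumes the standard Com-Poisson form, with probability mass proportional to $\lambda^{y}/(y!)^{v}$ and the asymptotic expression for $Z$ given by Shmueli et al.^{10}; the subscripts $i$ and $t$ are dropped for readability.

```latex
\begin{align*}
\log Z(\lambda, v) &\approx v\,\lambda^{1/v} - \frac{v-1}{2v}\log\lambda
                     - \frac{v-1}{2}\log(2\pi) - \frac{1}{2}\log v, \\
E(Y) &= \lambda\,\frac{\partial \log Z}{\partial \lambda}
      \approx \lambda^{1/v} - \frac{v-1}{2v}, \\
\mathrm{Var}(Y) &= \lambda\,\frac{\partial E(Y)}{\partial \lambda}
      \approx \frac{\lambda^{1/v}}{v}.
\end{align*}
```

When $\lambda^{1/v}$ is large, the variance-to-mean ratio is therefore close to $1/v$, which is why $v < 1$ corresponds to overdispersion.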
Here, I = 45, T = 9 and p = 5.
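The quality of the asymptotic approximation can be checked numerically. The sketch below assumes the standard Com-Poisson form, with probability mass proportional to lam**y / (y!)**v, and compares a truncated series with the asymptotic formulas. All function names are hypothetical, and the truncation at 500 terms is an assumption that suits moderate parameter values only.

```python
import math

def log_z_series(lam, v, terms=500):
    """log of sum_{k < terms} lam**k / (k!)**v, computed with log-sum-exp."""
    logs = [k * math.log(lam) - v * math.lgamma(k + 1) for k in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

def log_z_asymptotic(lam, v):
    """log of the asymptotic expression for the normalising constant."""
    return (v * lam ** (1 / v)
            - (v - 1) / (2 * v) * math.log(lam)
            - (v - 1) / 2 * math.log(2 * math.pi)
            - 0.5 * math.log(v))

def pmf(y, lam, v, terms=500):
    """Com-Poisson probability of the count y, using the truncated series."""
    return math.exp(y * math.log(lam) - v * math.lgamma(y + 1)
                    - log_z_series(lam, v, terms))

def series_moments(lam, v, terms=500):
    """Mean and variance obtained by summing over the truncated support."""
    log_z = log_z_series(lam, v, terms)
    probs = [math.exp(k * math.log(lam) - v * math.lgamma(k + 1) - log_z)
             for k in range(terms)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def approx_moments(lam, v):
    """Approximate mean and variance implied by the asymptotic expression."""
    root = lam ** (1 / v)
    return root - (v - 1) / (2 * v), root / v

# Illustrative parameter values only (not estimates from the paper).
print(series_moments(3.0, 0.5), approx_moments(3.0, 0.5))
```

With v = 1 the distribution reduces to the Poisson, so both routes give a mean and variance equal to lam; with v < 1 the variance exceeds the mean.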
Since all the covariates are time-independent,
...............................................................(5)
The parameters of the model considered in equations (1) to (4) are estimated using the consistent and efficient joint generalized quasi-likelihood (JGQL) estimation approach^{11}, which is briefly explained in this section. The JGQL estimating equation to estimate the regression and overdispersion parameters is given by:
$$\sum_{i=1}^{I} D_{i}^{T}\,\Omega_{i}^{-1}\,(f_{i} - \mu_{i}) = 0 \tag{6}$$
where f_{i} and μ_{i} are 2T×1 vectors with f_{i} = (y_{i1},…,y_{iT}, y_{i1}^{2},…,y_{iT}^{2})^{T} and μ_{i} = E(f_{i}),
where Ω_{i} is the covariance matrix of the score vector f_{i} and D_{i} is the derivative matrix consisting of:
∂μ_{i}/∂β^{T}, ∂μ_{i}/∂v.
The mathematical details of the covariance matrix are available in Mamode Khan and Jowaheer^{11}. Note that:
...........................................................(7)
for where . The iterative solution of JGQL estimating equation (6) is given by:
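$$\begin{pmatrix}\hat{\beta}_{r+1}\\ \hat{v}_{r+1}\end{pmatrix} = \begin{pmatrix}\hat{\beta}_{r}\\ \hat{v}_{r}\end{pmatrix} + \left[\sum_{i=1}^{I} D_{i}^{T}\,\Omega_{i}^{-1}\,D_{i}\right]_{r}^{-1}\left[\sum_{i=1}^{I} D_{i}^{T}\,\Omega_{i}^{-1}\,(f_{i}-\mu_{i})\right]_{r} \tag{8}$$
where (β̂_{r}, v̂_{r}) is the value of (β̂, v̂) at the r^{th} iteration, the quantities D_{i}, Ω_{i}, f_{i} and μ_{i} are those of the estimating equation (6), with Ω_{i} the covariance matrix of f_{i} and μ_{i} = E(f_{i}), and
[.]_{r} is the value of the expression at the r^{th} iteration. The data analysis can be easily performed using the open-source software R^{12}.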
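One iteration of a Gauss-Newton type update for an estimating equation of the form sum_i D_i^T Omega_i^{-1} (f_i - mu_i) = 0 can be sketched in plain Python as follows. This is not the authors' implementation (the paper points to R); the function names are hypothetical, and the caller is assumed to supply, for each subject, the derivative matrix D_i, the covariance matrix Omega_i and the residual f_i - mu_i evaluated at the current parameter value.

```python
def solve(a, b):
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n matrix and b an n x m matrix, both given as lists of rows.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(b[0]))] for i in range(n)]

def gauss_newton_step(alpha, subjects):
    """Return the updated parameter vector after one iteration.

    alpha: current parameter values (regression and dispersion parameters).
    subjects: iterable of (D, omega, resid), where D is an m x q derivative
    matrix, omega an m x m covariance matrix and resid = f - mu (length m).
    """
    q = len(alpha)
    info = [[0.0] * q for _ in range(q)]   # accumulates sum of D^T omega^{-1} D
    score = [0.0] * q                      # accumulates sum of D^T omega^{-1} resid
    for d, omega, resid in subjects:
        m = len(d)
        # Solve omega W = [D | resid] rather than inverting omega explicitly.
        w = solve(omega, [list(d[k]) + [resid[k]] for k in range(m)])
        for a in range(q):
            score[a] += sum(d[k][a] * w[k][q] for k in range(m))
            for b in range(q):
                info[a][b] += sum(d[k][a] * w[k][b] for k in range(m))
    delta = solve(info, [[s] for s in score])
    return [a_old + step[0] for a_old, step in zip(alpha, delta)]
```

In practice the step is repeated, recomputing D_i, Omega_i and the residuals at each new parameter value, until successive estimates stop changing. For a mean that is linear in the parameters, a single step from any starting point already gives the generalized least squares solution, which is a convenient sanity check.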
The results after fitting the model to the data are presented in Table 3.
The average estimates of the correlation parameters are provided in Table 4.
The model fits the data very well. The estimates of the lag autocorrelations are large, indicating that the data are highly correlated, and their decrease with increasing time lag justifies the AR(1) autocorrelation pattern. The treatment parameters TR 1 and TR 2 are both negative, indicating that both treatments are capable of reducing the number of cancerous polyps when compared to placebo. However, we may conclude that vitamin C and E with high fibre treatment is more effective in the reduction of polyps than vitamin C and E alone. The negative sign of the sex parameter indicates that there are fewer polyps among the female group. The growth of polyps is lower by almost 50 percent in females as compared to males. Also, if the baseline rates increase by 1 percent, then the polyp counts will show an increase of 3 percent. The estimate of v is < 1, confirming that the data are overdispersed. These findings are in line with the findings of Khan and Jowaheer^{9} after fitting an alternative negative binomial model to the same data set. It should be remarked that the Com-Poisson model is preferred to the negative binomial model due to its flexibility in accommodating different types of dispersion structures.
Familial polyps, once arising in a human excretory system, multiply quite fast and lead to cancer. The growth of these polyps can be monitored by counting the number of polyps. It is of interest to understand and estimate the effect of important factors such as the type of treatment, sex and the baseline counts on the growth of the polyps over time. Polyp count data, collected longitudinally together with information on covariates, are generally overdispersed with a gradually decreasing autocorrelation pattern. The analysis of such data is quite challenging and requires the application of a properly designed statistical model. In this paper, we have analysed polyp count data using the longitudinal Com-Poisson regression model based on an AR(1) type autocorrelation structure. The estimation of the regression and overdispersion parameters is done using a joint generalized quasi-likelihood approach. The estimates thus obtained are reliable and consistent, with very small standard errors. Based on this study, we may conclude that Vitamin C and E with high fibre is the most effective remedy, followed by Vitamin C and E only, as compared to placebo in the reduction of polyps in human bodies. Males are 50 percent more at risk of developing polyps than females. Hence, with a familial history of polyposis, offspring, especially males, should be on their guard and should seek early medical check-ups for the disease.
1. KW, Nilbert MC. Identification of FAP locus genes from chromosome 5q21. Science. 1991;253(5020):661-5.
2. Stukel TA. Comparison of methods for the analysis of longitudinal interval count data. Stat Med. 1993;12(14):1339-51.
3. Crouchley R, Davies RB. A comparison of population average and random effect models for the analysis of longitudinal count data with baseline information. J Royal Statist Soc. 1999;162(3):331-47.
4. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13-22.
5. Crowder M. On the use of a working correlation matrix in using generalized linear models for repeated measures. Biometrika. 1995;82:407-10.
6. Sutradhar BC, Das K. On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika. 1999;86:459-65.
7. Thall PF, Vail SC. Some covariance models for longitudinal count data with overdispersion. Biometrics. 1990;46:657-71.
8. Jowaheer V, Sutradhar BC. Analysing longitudinal count data with overdispersion. Biometrika. 2002;89:389-99.
9. Khan NM, Jowaheer V. Analysing familial polyposis using negative binomial longitudinal regression model. Conference proceedings of International Conference on Medical, Biological and Pharmaceutical Sciences, Thailand. 2011.
10. Shmueli G, Minka T, Borle J, Boatwright P. A useful distribution for fitting discrete data. J Royal Statist Soc. 2005;54:127-42.
11. Khan NM, Jowaheer V. Comparing joint GQL estimation and GMM adaptive estimation in COM-Poisson longitudinal regression model. Commun Stat-Simul C. 2013;42(4):755-70.
12. R development core team. http://www.r-project.org/
