Loading web-font TeX/Math/Italic
Typesetting
in Revista mexicana de ciencias forestales
How to correct the heteroscedasticity and autocorrelation of residuals in taper and height growth models?
Resumen:
En la modelación del ahusamiento y del crecimiento en altura dominante con datos de series de tiempo, es muy común la presencia de heterocedasticidad y autocorrelación de los errores. Funciones de varianza (varFunc) y estructuras de correlación (corStruct) para corregir la heterocedasticidad y modelar dependencia de los errores, respectivamente. Estas fueron combinadas y evaluadas en ecuaciones de ahusamiento y crecimiento en altura de Pinus teocote en Durango, México. La base de datos se obtuvo de 51 análisis troncales con 768 observaciones de ahusamiento y 634 de altura. Las varFunc utilizadas fueron: 1) función de potencia (varPower); 2) función exponencial (varExp); 3) función constante y de potencia (varConstPower); y 4) función combinada de potencia y exponencial (varComb). Las corStruct incluyeron: simetría compuesta (corCompSymm), autorregresiva de orden 1 (corAR1), autorregresiva continua (corCAR1), autorregresiva de media móvil (corARMA2-0), corARMA1-1, corARMA2-1, corARMA2-2, corARMA3-1 y corARMA3-2. Las ecuaciones se ajustaron por mínimos cuadrados generalizados no lineales; y se evaluaron con un sistema de calificación con los estadísticos de ajuste: RMSE, R2, AIC, BIC, LogLik, CV y sesgo promedio. Con base en la calificación, las mejores combinaciones para el ahusamiento y crecimiento en altura fueron 1-9, 2-5, 3-8 y 4-6 y 1-6, 2-9, 3-7 y 4-4, respectivamente. En el ahusamiento solo la combinación 2-5 fue homocedástica con residuales independientes al igual que las ecuaciones de altura seleccionadas y las varFunc y corStruct presentaron influencia en la trayectoria de las curvas de índice de sitio construidas.
Abstract:
In modeling of taper functions and dominant height growth with time series data, the presence of heteroscedasticity and autocorrelation in residuals is common. Variance Functions (varFunc) and correlation structures (corStruct) were used to correct heteroscedasticity and autocorrelation; both were combined and evaluated through taper and height growth equations for Pinus teocote in Durango, Mexico. A dataset of 51 stems analysis with 768 taper observations and 634 height growth observations was used. The varFuncs applied were: 1) power function (varPower); 2) exponential function (varExp); 3) constant plus power function (varConstPower); and 4) a combination of power and exponential functions (varComb). The corStructs were: compound symmetry (corCompSymm), autoregressive of order 1 (corAR1), continuous-time autoregressive of order 1 (corCAR1), autoregressive-moving average (corARMA2-0), corARMA1-1, corARMA2-1, corARMA2-2, corARMA3-1 and corARMA3-2. To fit the equations, the generalized nonlinear least squares method was used and evaluated with a rating system through: RMSE, R2, AIC, BIC, LogLik, VC and average bias. According to the rating system, the best combinations for taper and height growth equations were 1-9, 2-5, 3-8 and 4-6 and 1-6, 2-9, 3-7 and 4-4, respectively. In the taper equation, only the combination 2-5 was homoscedastic with independent residuals, and the selected height growth equations were homoscedastic with independent residuals; the varFunc and corStruct had influence on the trajectories of site index curves.
Main Text
Introduction
The planning, implementation and monitoring of sustainable forest management require research to support decision-making and the assessment of the established goals. The estimation of the timber stock and productivity of the stands is a main objective in forest management systems; therefore, it is essential to know the growth of commercial species (Aguirre-Calderón, 2015; Salas et al., 2016).
Research on the estimation of volume, growth and increment is a key tool for understanding the dynamics of ecosystems involved in forest management; therefore, these approaches continue to be necessary for the planning and implementation of forest activities. Taper and dominant height growth equations have been widely explored topics (Castillo et al., 2013; Paulo et al., 2015; Rodríguez-Carrillo et al., 2015; Krisnawati, 2016; Corral-Rivas et al., 2017; Fierros-Mateo et al., 2017; Tamarit et al., 2017) because they are biometric tools that allow characterizing the profile of the trees, as well as estimating the total volume and the commercial volume, and the forest productivity, based on the site index approach (Crecente-Campo et al., 2013; Quiñonez-Barraza et al., 2015; Özçelik and Crecente-Campo, 2016).
Since the datasets used for fitting taper and height growth equations are time series obtained from measured variables on the same tree and resulting from taper and tree stem analyses, it is reasonable to assume that the observations in each tree are correlated and, also the residuals of the adjusted equations (Arias-Rodil et al., 2015; Quiñonez-Barraza et al., 2015; Corral-Rivas et al., 2017); furthermore, their predictions might show a variation in the levels of the predictor variables, which is known as heteroscedasticity (Gujarati and Porter, 2011).
The term autocorrelation refers to the correlation between the residuals of a regression model when series of observations arranged in time are used, e.g. time-series data, or in space, and with cross-sectional data. The linear and nonlinear regression models are based on the theoretical assumption that the residues have the same variance and are, therefore, homoscedastic. The presence of autocorrelation and heteroscedasticity leads to estimations of non-minimum variance parameters and to somewhat unreliable prediction intervals, especially in modelling taper and volume equations (Fortin et al., 2007; Xu et al., 2014; Tang et al., 2016). Consequently, the usual
or
tests are not valid. Therefore, the use of generalized least squares (GNLS) approach with variance functions and correlation structures is an alternative to generate the best unbiased linear parameter estimates (Gujarati and Porter, 2011).
In studies of taper and dominant height growth models, correlation structures and power functions to correct the autocorrelation and heteroscedasticity of the residuals, respectively, are frequently used (Quiñonez-Barraza et al., 2014; Quiñonez-Barraza et al., 2015; Sharma et al., 2015; Özçelik and Crecente-Campo, 2016; Corral-Rivas et al., 2017; Tamarit et al., 2017); continuous-time autoregressive structures of errors with one, two or three power functions with known variance are highlighted.
Because it is important to improve the predictive capacity and the interpretation of the statistic properties in equations fitting, the objective of this study was to evaluate the combination of variance functions with correlation structures, in order to model the heteroscedasticity and the errors dependence in taper and dominant height growth equations of Pinus teocote Schiede ex Schltdl. & Cham. in Durango, Mexico.
Materials and Methods
Study area and description of experimental data
The information from stem analysis of 51 Pinus teocote trees collected in mixed stands of the Forest Management Unit (Umafor) 1005 Santiago Papasquiaro y Anexos, in northeastern Durango, Mexico, was used. The study area is located on the Sierra Madre Occidental, between 24°48’16.98’’ and 25°13’47.25’’ N, and 105°53’9.81’’ and 106°12’52.58’’ W. The forest polygon was San Diego de Tezains ejido, with a total area of 61 098.25 ha, of which 26 636.09 ha are programed for timber production with forest management (Quiñonez-Barraza et al., 2014). The predominant climates are temperate warm humid and temperate subhumid, with a mean annual precipitation of 1 375 mm. The mean annual temperatures ranges from 8 °C in the highest areas to 24 °C in the lower areas, where the mean altitude is around 600 m (García, 1981; Quiñonez-Barraza et al., 2015).
The data was taken using a totally random design for the stands of the timber production area, and was considered a normal distribution for the diameter categories. The trees were felled and divided into sections in order to register the growth in dominant height and taper by relative heights. The first measurement corresponded to the stump height; subsequently, to lengths of 0.30 m, 0.60 m and 1.3 m; furthermore, growth slices were extracted every 2 m (Quiñonez-Barraza et al., 2015), whose diameters, heights and number of rings were recorded. In whole dataset, 768 diameter-height (taper) and 685 height-age combinations were used. Table 1 shows the descriptive statistics of the analyzed variables.
Fitting models
The taper was modeled with the segmented equation developed by Fang et al. (2000), which has been used in several studies in order to generate compatible taper and merchantable volume systems in species of interest (Quiñonez-Barraza et al., 2014; Uranga-Valencia et al., 2015; Özçelik and Crecente-Campo, 2016; Corral-Rivas et al., 2017; Tamarit et al., 2017). The segmented taper equation is as follows:
dij=c1(Hi(K-β1)/β1(1-hijHi)(K-R)/RA(I1+I2)1AI22)0.5+εij
(1)
Where:
;
d ij = Diameter j of the tree i at the commercial height h ij (cm)
H i = Total height of the tree i (m)
h ij = Commercial height j of the tree i (m)
D i = Normal diameter of the tree i (cm)
h bi = Stump height of the tree i (m)
α i = Total volume parameters (i = 1, 2, 3)
β i = Taper parameters (i = 1, 2, 3)
ε ij = Residual of diameter j in the tree i
The dominant height as an intrinsic site index equation was modeled using the dynamic equation in the generalized algebraic difference approach (GADA), derived by Quiñonez-Barraza et al. (2015) and expressed as equation 2.
h2ij=e[β1+β2(ln(h1ij)-β1ln(1-e-β3t1ij)+β2)](1-e-β3t2ij)[ln(h1ij)-β1ln(1-e-β3t1ij)+β2]+εij
(2)
Where:
= Height
of the tree
at age
of the tree
in status 2 (m)
= Height
the tree
at age
of the tree
in status 1 (m)
= Age
of the tree
in status 1 (years)
= Age
of the tree
in status 2 (years)
= Parameters to be estimated (i = 1, 2, 3)
= Exponential function
= Natural logarithm
= Residual of height
in the tree
Autocorrelation and heteroscedasticity
Combinations of variance functions with correlation structures were used in taper equation (Eq. 1) and dominant height equation (Eq. 2). The varFunc and corStruct were determined according to Pinheiro and Bates (2000).
Variance functions. The variance functions were as follow: 1) power function (varPower); 2) exponential function (varExp); 3) constant power function (varConstPower), and 4) combination of power-exponential functions (varComb). The variance functions were used in order to model the variability between the measurements of each tree
with the merchantable height covariables (
) for the taper equation, and the dominant height (
) at state
for the height growth equation. The general structure of the power functions for modeling the heteroscedasticity considers two arguments for most varFuncs: the parameter value and shape. The first specifies the value of the variance parameter (
), and the second specifies the variance covariable
), which in this case may be considered a stratified variable for each tree; i.e. a variance parameter can be estimated for each dataset of the several trees (Pinheiro and Bates, 2000).
The variance model (
or
) for varPower is represented by equation 3 and corresponds to the variance function
) in equation 4. Covariable
was utilized for the taper equation (Eq. 1), while
was used for the dominant height growth equation (Eq. 2).
Var(εij)=σ2|vij|2δ
(3)
g(vij,δ)=|vij|δ
(4)
The varExp variance model of is represented by equation 5, and the corresponding function, by equation 6, for the same covariables previously defined for taper and dominant height growth equation. The varConstPower variance model is defined in equation 7, and the variance function, in equation 8. The varComb variance model (varExp and varPower) is defined in equation 9, with the respective function expressed in equation 10 (Pinheiro and Bates, 2000). In all cases, the same covariables, previously defined, were used. For the varConstPower,
represents the constant parameter, and
, the corresponding power parameter;
corresponds to the varExp parameter, while
corresponds to varPower.
Var(εij)=σ2e(2δvij)
(5)
g(vij,δ)=e(δvij)
(6)
Var(εij)=σ2(δ1+|vij|δ2)2
(7)
g(vij,δ)=δ1+|vij|δ2
(8)
Var1(εij)×Var2(εij)=σ2e(2δ1vij)×σ2|vij|2δ2
(9)
g1(vij,δ1)×g2(vij,δ2)=e(δ1vij)×|vij|δ2
(10)
The correlation structures were: 1) compound symmetry (corCompSymm); 2) autoregressive of order 1 (corAR1); 3) continuous-time autoregressive of order 1 (corCAR1); 4) autoregressive-moving average 2, 0 (corARMA2-0); 5) autoregressive-moving average 1, 1 (corARMA1-1); 6) autoregressive-moving average 2, 1 (corARMA2-1); 7) autoregressive-moving average 2, 2 (corARMA2-2); 8) autoregressive-moving average 3, 1 (corARMA3-1), and 9) autoregressive-moving average 3, 2 (corARMA3-2). The correlation structures were used to model the dependence between residuals of each tree, with time-series data (Pinheiro and Bates, 2000). This study modeled the dependence between the diameter and height measurements in the same tree for the aim of achieving independence in the residuals of taper equation (Eq. 1) and dominant height growth equation (Eq. 2). The general structure of correlation between groups for a single grouping level is expressed as equation 11 (Pinheiro and Bates, 2000).
cor(εij,εi1j-1)=f[d(pi1j,pi1j-1),ρ]
(11)
Where:
ε ij = Residual of the diameter or height
of the tree i
= Residual of the diameter or delayed height growth
in the tree i
= Correlation function
, taking up values between 1 and -1
= Distance between position vectors
and
, which, for the taper equation, were the differences between the commercial height
and
; while, for the dominant height growth equation, they were the heights defined in two positions:
and me podria pasar
= Correlation parameters vector
In the correlation structure corCompSymm, an equal correlation is assumed for all errors of the same group within the same tree; the correlation model is given by equation 12. The corAR1 model is represented in equation 13, while the carCAR1 model is expressed by equation 14. The autoregressive-moving average (corARMA2-0) is defined by equation 15, and corARMA1-1, by equation 16, which is the generalization for corARMA p, q structures; i.e. corARMA2-0, corARMA1-1, corARMA2-1, corARMA2-2, corARMA3-1, and corARMA3-2 are represented by the changes in p values for the autoregressive structure (AR), and q, for the moving average (MA) structure. The new error term (
) defines the dependent residuals in the regression model; in the moving average (MA) structure, the residual is determined by a ij .
εij=ρ1[d(pi1j,pi1j-1)]+ϵij
(12)
εij=ρ1εij-1+ϵij
(13)
εij=ρ1[d(pi1j,pi1j-1)]εij-1+ϵij
(14)
εij=q=2k=1ρkεij-k+ϵij
(15)
εij=p=1k=1ρkεij-k+q=1k=1θ1aij-k+aij+ϵij
(16)
Fitting equations and goodness-of-fit statistics
The equations were fitted with generalized nonlinear least squares (GNLS) of the statistical package NLME (nonlinear mixed effects models) (Pinheiro et al., 2015), in the R environment (R Core Team, 2017). In order to assess the fit of taper and dominant height growth equations, combinations of the variance functions with correlation structures like those studied by Pinheiro and Bates (2000) were used. The goodness-of-fit was evaluated using the following six statistics: root mean square error (RMSE), adjusted coefficient of determination (R2a), Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Log-Likelihood (logLik), coefficient of variation (CV) and absolute mean bias (Bias). A rating system was generated with these statistics in order to select the best varFunc and corStruct combinations. Each statistic was assigned a value from 1 to 9; 1 corresponds to the combination with the best statistic, and 9, to the one with the worst statistic (Sakici et al., 2008; Tamarit et al., 2014). The Durbin-Watson (Dw) statistic (Durbin and Watson, 1971) was used to evaluate the correction of the autocorrelation, with a robust modification (DwM), such as the average Dw between groups, since errors are regarded as dependent on the measurements of each tree, but not on the general dataset.
The modified statistic is shown in equation 17. The homogeneity of variances was tested using Bartlett’s test (Bartlett, 1954; Layard, 1973; Hidding et al., 2013); the variation between the groups and a significance value of 1 % (
) were considered for taper and dominant height growth equations. The Assumption of homogeneity of variance (null hypothesis, H0) is expressed in equation 18; therefore, higher values than 0.01 of probability value assume variance homogeneity, and, conversely (alternate hypothesis, Ha), lower values than 0.01 imply that at least two variances are different.
DwM=Jj=2(εij-εij-1)2Jj=2ε2ijN
(17)
Where:
= Modified Durbin-Watson statistic
= Diameter or height
residual for the tree
= Number of observations in each tree
= Number of trees i in the database
X2=(N-k)ln(s2p)-ki=1(ni-1)ln(S2i)1+13(k-1)(ki=1(1ni-1)-1N-k)
(18)
Where:
= Samples of size
= Sample variance
(estimation of the combined variance)
Results and Discussion
The combinations of variance functions and correlation structures generated 36 models for taper, and 36 for height growth, based on equations 1 and 2, respectively. The fit statistics and the ranking score (RS) showed the goodness-of-fit of the equations (Table 2, for taper, and Table 3, for height growth). Also the DwM statistics and probability of Bartlett’s test of homogeneity of variances (P-value).
The ranking score exhibited the combinations of variance functions with correlation structures by hierarchical order (Sakici et al., 2008). The lowest RS value was the statistically best combination, and the highest RS corresponded to the worst combination, based on the sum of the ranks for each fitting statistic (Tamarit et al., 2014).
In the case of taper, the varPower function combined with corARMA3-2 performed best, while the worst combination was with corCAR1. In all cases of correlation structures combined with varPower, values ranges from 1.634 to 1.862 were displayed for the DwM statistic. However, for the test of homogeneity of variances, all the combinations were heteroscedastic. In the variance function varExp, all the combinations with correlation structures, except for corCompSymm, were homoscedastic (P>0.01). And the corARMA1-1 structure performed best according to the RS, with a DwM value of 1.553.
The varConstPower and varComb functions combined with the correlation structures have consistent statistics and independent residuals (DwM values from 1.722 to 1.973); however, they exhibit heteroscedasticity in Bartlett’s test. Structures corARMA3-1 and corARMA2-1 were the best for varConstPower and varComb, respectively.
In combinations of variance functions and correlation structures, the height growth generated DwM statistics of approximately 1.12. However, for most equations, they exhibited homogeneous variances. The varPower function is likewise statistically combined with the corARMA2-1, corARMA2-2 and corARMA3-2 structures, since the RS was 22 for all three cases. The corCAR1 structure had the lowest fitting, with unequal variances. Table 3 shows the behavior of the variance functions varExp, varConstPower and varComb with the combinations of correlation structures.
In all the fitted equations, the estimated parameters were statistically different from zero at a significance level of 5 % (P<0.05). The estimated parameters for the best combinations of each variance function with the correlation structures are summarized in Table 4, for both taper (T) and height growth (HG). In the case of taper, the combinations varExp&corARMA3-2, varExp&corARMA2-2 and varConstPower&corARMA3-1 represented 14 parameter estimates, and the test of homogeneity of variances displayed unequal variances; while the combination varExp&corARMA1-1, with 12 parameter estimates, exhibited constant variances.
The estimated parameters defining the changes of the dendrometric stem shapes of the segmented taper model were found for the change from neiloid into paraboloid, as 4.0 % to 7.2 %; while the change from paraboloid into cone occurs in average at 70 % of the total height in the tree stem profiles. Similar results have been documented by researchers for different Pinus species (Uranga-Valencia et al., 2015; Özçelik and Crecente-Campo, 2016; Corral-Rivas et al., 2017; Tamarit et al., 2017), with values range from 4 % to 7 % for the first inflection point, and range from 55 % to 75 % for the corresponding second inflection point. Furthermore, continuous-time autoregressive structures of orders 1 and 2 and power functions were used, and a known variance was assumed. Nevertheless, the studies do not include the corresponding test of homogeneity of variances.
The tendency of the residuals by relative (h/H; %) of the four best combinations of variance functions with correlation structures for taper are shown as box and whisker plots (Figure 1). Although the tendencies are very similar, only the varExp&corARMA1-1 (H2-A5) combination exhibited a constant variance, and the dispersion of the extreme values are shown closer to the values of the corresponding quantiles.
The combinations in the height growth equations with the largest number of parameter estimates (9) were varExp&corARMA3-2 and varConstPower&corARMA2-2; the varExp&corARMA2-1 and varComb&corARMA2-0 combinations had seven estimated parameters.
The use of the equations with multiple parameters can cause an overparameterization, and the predictions resulting from them could not be the most efficient (Gregoire and Schabenberger, 1996). In the four cases, the variances were constant, which guarantees that the parameters are unbiased and efficient and have a minimum variance (Gujarati and Porter, 2011; Tang et al., 2016).
The selected combinations of the height growth equations generated a dispersion of residuals close to zero line, which appear in the form of boxes and whisker plots by relative height (h/H, %) in Figure 2. Although there is an overestimation from 1 % to 10 % relative height classes, and an underestimation from 10 % to 25 % relative height classes, the variances were constant for all four cases, according to Bartlett’s test of homogeneity of variances (Bartlett, 1954; Hidding et al., 2013).
As for the residuals, although the DwM test generated values of approximately 1.2, they were considered to be independent within the measurements of height and age carried out in the same tree (Crecente-Campo et al., 2013; Quiñonez-Barraza et al., 2015), and therefore the estimated parameters are unbiased, consistent, and efficient (Williams et al., 2013).
In dominant height growth and site index equations with an algebraic difference approach (ADA) or generalized (GADA) equations, the power or exponential functions have been used for the correlation of the heteroscedasticity, in which unequal variances are assumed to exist in the generalized least squares adjustment process (Castillo et al., 2013; Quiñonez-Barraza et al., 2015; Rodríguez-Carrillo et al., 2015; González et al., 2016), in addition to continuous-time autoregressive structures of the errors, in order to correct the autocorrelation with tree stem analysis of various species in natural forests and commercial forest plantations, such as Pinus durangensis Martínez, P. arizonica Engelm., Juniperus deppeana Steud, P. teocote Schiede ex Schltdl. & Cham. and P. ayacahuite Ehrenb. ex Schltdl. The approach developed in this paper considers the correction of the heteroscedasticity and of the autocorrelation as combinations of functions (Rodríguez et al., 2013; Williams et al., 2013).
Figure 3 contrasts the predictions of taper and height growth equations with the selected combinations of power variance functions and correlation structures; it shows the observed tendency of the profile of a tree, the fitted equation without heteroscedasticity and autocorrelation correction (NC) and the combinations H1-A9, H2-A5, H3-A8 and H4A6. Only the combination H2-A5 exhibited constant variances. With regard to the height growth equation, the comparison between three SI curves for the fitted equation without correction (IS10, IS14 and IS18) and also the equations with corresponding combinations H1-A6, H2-A9, H3-A7 and H4-A4 are graphically represented. Furthermore, this figure shows the growth tendencies of the trees in dataset; in this case, all the combinations displayed constant variances, whereby the desirable properties in the estimated parameters are guaranteed (Beale et al., 2010; Vogelsang, 2012).
Conclusions
The variance functions in combination with the correlation structures corrected the heteroscedasticity assumptions of variances and error autocorrelation in the taper and dominant height growth equations, using a generalized nonlinear least square approach; as a result, unbiased parameters with a minimum variance were obtained.
The predictions of the selected taper equations are more efficient, with consistent fitting statistics; the combination of the exponential variance function with an autoregressive-moving average correlation structure (corARMA1-1) produces constant variances by relative height categories of the stem profiles. The dominant height and site index model, at a base age of 60 years, exhibits realistic predictions. The selected combinations (H1-A6, H2-A9, H3-A7 and H4-A4) exhibit constant variances by relative height categories at specific ages. In both, the residuals are independent, and the properties of the tests of hypothesis of the estimated parameters are guaranteed.
The use of compatible taper and commercial volume equations, as well as of height growth and index site equations, is defined by the use of the intrinsic parameters in each equation; therefore, the parameters of the variance functions and correlation structures are only statistical indicators for rendering the fitting more efficient.
Resumen:
Abstract:
Main Text
Introduction
Materials and Methods
Study area and description of experimental data
Fitting models
Autocorrelation and heteroscedasticity
Fitting equations and goodness-of-fit statistics
Results and Discussion
Conclusions