April 25, 2024
January, 2024
When sampling pests at low densities, it is common to obtain a large number of zeros. These are difficult to manage because the Poisson and negative binomial probability distributions are not suitable for modeling them and equations to estimate the optimal sample size are not available. This study aimed to model the excess of zeros, to estimate the parameters of the zero-inflated Poisson and zero-inflated negative binomial distributions by the methods of moments and maximum likelihood, and to derive equations to calculate the optimal sample size. Systematic sampling was used to select 100 trees per grove of Río Red grapefruit (Citrus paradisi Macfad) at Finca Sayula, Veracruz, Mexico (latitude 19.20722, longitude -96.35194), in June and July 2021 and January 2022. The number of leafminers (Phyllocnistis citrella Stainton) and aphids (Toxoptera citricida Kirkaldy) present on three leaves per shoot per tree, considered as the sample unit, was counted. Simulations were performed in RStudio with different proportions of zeros (0.1, 0.4, and 0.6) for comparison with the parameters obtained in the field using the methods of moments and maximum likelihood. Equations were derived to estimate the optimal sample size in studies of pests with low densities, based on the zero-inflated Poisson and zero-inflated negative binomial probability distributions. The method of moments yields optimal sample sizes smaller than those obtained by maximum likelihood because it distinguishes the origin of the zeros, so its use is recommended.
Keywords:
sampling, zero-inflated negative binomial, zero-inflated Poisson
In the population dynamics of pest organisms, count data reflect the presence and abundance of species in a fixed period of time (Hashim et al., 2021). Samples of pest populations commonly contain excess zeros because of the complex interactions between biotic and abiotic components, the inherent characteristics of pest species, spatial-temporal dependencies, unexplained environmental heterogeneity (Zou et al., 2021), and agroecological control techniques (Villanueva-Jimenez et al., 2017; García-González et al., 2018).
Studying and monitoring the periods in which pest populations show excess zeros can be very useful. It allows preventive management of their populations and the recognition of early stages of pest invasion, so that preventive management methods, such as those offered by precision agriculture (Jankielsohn, 2017; Clay et al., 2018), and combat tactics can be applied before pests damage crops. This would prevent the abusive use of organic-synthetic pesticides and thus also reduce damage to the environment (Shannon et al., 2018; Talaviya et al., 2020).
The excess of zeros is a theoretical and practical problem that arises when the high frequency of zeros alters the probabilities expected under the discrete Poisson and negative binomial distributions (Yesilova et al., 2010; Hashim et al., 2021; Haslett et al., 2022). No attention has been paid to the mechanisms that explain the origin of zero, despite its impact on the estimation of population parameters in species of pest organisms (Haslett et al., 2022).
For the study of pest populations in agroecosystems, it is proposed to analyze the excess of zeros following Mullahy (1986) and Lambert (1992); that is, to recognize two possible origins of zero, distinguishing between the structural zero (plants without susceptible shoots for the establishment of a pest) and the non-structural zero (plants with susceptible shoots free of the pest and susceptible shoots infested), to model zero by its origin with binomial distributions (Lambert, 1992; Zou et al., 2021; Haslett et al., 2022) and, depending on the observed counts greater than zero, to study the effect of overdispersion (Hall, 2000; Cheung, 2002; Doyle, 2009).
In pest counts, the optimal sample size equations for the Poisson or negative binomial distribution are used on a recurring basis, but due to the excess of zeros, the estimated optimal sample sizes are so large as to be impractical (Southwood and Henderson, 2000); however, in integrated pest management, there are no equations that estimate the optimal sample size of zero-inflated distributions, nor proposals that consider the origin of zero.
Equations for estimating the optimal sample size (Karandinos, 1976), adjusted to zero-inflated distributions, are proposed here. The objectives of the present research were to model the excess of zeros, to estimate the parameters of the zero-inflated Poisson and zero-inflated negative binomial distributions using the methods of moments and maximum likelihood, and to derive equations to calculate the optimal sample size.
For the estimation of the optimal sample size, the excess of zeros was modeled; the parameters were determined by the methods of moments and maximum likelihood of the zero-inflated Poisson and zero-inflated negative binomial distributions and the equations for calculating the sample size were derived.
To model the excess of zeros, the following stages were performed: i) the absence of plant tissue that allows the pest to be housed was included as a cause of extra zeros. In this way, there were two origins: the ‘structural zero’, when there is no susceptible tissue in the plant that can be occupied by the pest, and the ‘non-structural zero’, when there is adequate tissue in the plant, but it is not inhabited by the pest.
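The two origins of zero can be illustrated with a short sketch. The Python fragment below is only an illustration under stated assumptions, not the procedure used in the study (which worked in RStudio): the `SampleUnit` record, its field names, and the function `zero_proportions` are hypothetical names introduced here. It computes the proportion of structural zeros (Prsz) and of non-structural zeros (Prnsz) from a list of sample units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleUnit:
    # Hypothetical record for one sample unit (three leaves per shoot per tree).
    has_susceptible_tissue: bool  # False -> any zero on this unit is structural
    count: int                    # insects counted on the unit


def zero_proportions(units):
    """Return (Prsz, Prnsz): proportions of structural and non-structural zeros."""
    n = len(units)
    structural = sum(1 for u in units if not u.has_susceptible_tissue)
    non_structural = sum(
        1 for u in units if u.has_susceptible_tissue and u.count == 0
    )
    return structural / n, non_structural / n


# Toy data: 3 units without susceptible tissue, 4 pest-free units, 3 infested units.
units = (
    [SampleUnit(False, 0)] * 3
    + [SampleUnit(True, 0)] * 4
    + [SampleUnit(True, 2)] * 3
)
prsz, prnsz = zero_proportions(units)
print(prsz, prnsz)
```

Recording whether a unit has susceptible tissue at sampling time is what makes the two kinds of zero separable later; a bare count of zero does not carry that information.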
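With this definition, the frequency of structural zeros was modeled using a binomial distribution (Mullahy, 1986). Let X be the number of structural zeros present in a sample of size n; therefore:

$$X \sim \mathrm{Binomial}(n, p_e)$$

Where: $p_e$ is the probability of a structural zero. Thus, the probability function of the random variable X, the number of structural zeros in the sample of size n, is given by:

$$P(X = x) = \binom{n}{x}\, p_e^{x}\, (1 - p_e)^{n - x}, \qquad x = 0, 1, \ldots, n \tag{1}$$

If […] (2). Where: […]

The Poisson distribution is used on a sample […]:

$$P(Y = y) = \frac{e^{-\lambda} \lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots \tag{3}$$

Where: Y is the number of insects in a unit that is not a structural zero and λ is the mean of the number of insects in the population, excluding structural zeros (i.e., sample units without susceptible tissue are not considered).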
With overdispersion, the negative binomial is used, where Y is the number of insects in a unit that is not a structural zero: […] (4). Where: λ is the mean of the number of insects in the population, excluding structural zeros; k is an overdispersion parameter and Γ(y) is the gamma function. In this way, estimates are not affected by excess zeros (structural zeros).

It can be noted that, under this scheme, the probability of a non-structural zero is given by: […] if it is Poisson and […] (5). The mean of this distribution is […] (6). The mean of this distribution is […]
To obtain the parameters of the distributions i) zero-inflated Poisson; and ii) zero-inflated negative binomial, the methods of moments and maximum likelihood were used. a) For the zero-inflated Poisson distribution, the moment estimators for […] (7). With […] The maximum likelihood estimators for […] (8); b) for the zero-inflated negative binomial distribution, there are no moment estimators for […] (9). If structural zeros are excluded, the […] (10). Where: […] The maximum likelihood estimator for […] (11). Based on the above, it is proposed to use the moment estimators of the negative binomial distribution (Banik and Kibria, 2009), but excluding structural zeros from the equation, as an approximation to the moments of the zero-inflated negative binomial distribution.
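As an illustration of the method of moments described above, the following Python sketch uses the textbook moment estimators, which may differ in form from the equations of the study. It rests on two assumptions that are not taken from the source: for the zero-inflated Poisson, the mean is $(1 - p_e)\lambda$ and the variance is $(1 - p_e)\lambda(1 + p_e\lambda)$; for the negative binomial, k is the classical size parameter with variance $\lambda + \lambda^2/k$ (the study may use another convention for k). The function names are hypothetical.

```python
from statistics import mean, variance


def zip_moments(counts):
    """Moment estimates (p_e, lam) of a zero-inflated Poisson.

    Solves mean = (1 - p_e) * lam and var = mean * (1 + p_e * lam).
    Assumes the sample mean is greater than zero.
    """
    m = float(mean(counts))
    s2 = float(variance(counts))  # sample variance (n - 1 denominator)
    lam = m + s2 / m - 1.0
    p_e = (s2 - m) / (s2 + m * m - m)
    return p_e, lam


def nb_moments_excluding_structural(counts, is_structural):
    """Moment estimates (lam, k) of a negative binomial on units that are
    not structural zeros. k is infinite when no overdispersion is observed."""
    kept = [c for c, s in zip(counts, is_structural) if not s]
    m = float(mean(kept))
    s2 = float(variance(kept))
    k = m * m / (s2 - m) if s2 > m else float("inf")
    return m, k


counts = [0, 0, 0, 0, 1, 2, 7]
is_structural = [True, True, False, False, False, False, False]
print(zip_moments(counts))
print(nb_moments_excluding_structural(counts, is_structural))
```

Dropping the structural zeros before computing the mean and variance is what distinguishes the second estimator from a plain negative binomial fit to all the counts.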
To derive the equations of optimal sample size, the parameters obtained from the models iii and iv were substituted in the equations of Karandinos (1976), related to the coefficient of variation (CV), the fixed proportion of the mean (D), and half the confidence interval (h) (Table 1).
Six systematic samplings (n= 100) were carried out in three Río Red grapefruit (Citrus paradisi Macfad) groves at Finca Sayula, SPR de RL de CV, Veracruz, Mexico (latitude 19.20722, longitude -96.35194). Sampling data were direct counts in small units (three leaves per shoot per tree), conducted during the months of June and July 2021 and January 2022.
Three of the samplings were carried out to detect the presence of the citrus leafminer Phyllocnistis citrella Stainton and three more to detect the presence of the citrus tristeza virus vector aphid Toxoptera citricida Kirkaldy. In addition, three samplings were simulated with zero-inflated Poisson and three samplings with zero-inflated negative binomial, each with n = 100 randomly generated numbers. The simulations were performed with RStudio using the functions rbinom (100, size = 1, prob = 0.1, 0.4, 0.6), rpois (100-x, 1.5), rnbinom (100, 1.5) and zeroinfl (x∼1 | 1, dist = ‘poisson’, ‘negbin’) with the VGAM and pscl libraries.
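The zero-inflated Poisson simulation described above can be sketched as follows. This is a Python analogue written for illustration, not the R code of the study: the function names `rpois` and `simulate_zip` and the `seed` argument are assumptions introduced here, and the Poisson draws use Knuth's multiplication method instead of R's generator. A Bernoulli draw marks each unit as a structural zero with probability `p_structural`; the remaining units receive a Poisson count with mean `lam`.

```python
import math
import random


def rpois(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, p_structural, lam, seed=0):
    """Simulate n sample units from a zero-inflated Poisson.

    Returns (is_structural, counts): a list of flags and a list of counts.
    """
    rng = random.Random(seed)
    is_structural = [rng.random() < p_structural for _ in range(n)]
    counts = [0 if s else rpois(rng, lam) for s in is_structural]
    return is_structural, counts


# Same settings as in the text: n = 100, mean 1.5, zero proportions 0.1, 0.4, 0.6.
for p in (0.1, 0.4, 0.6):
    flags, counts = simulate_zip(100, p, 1.5, seed=1)
    zeros = sum(1 for c in counts if c == 0)
    print(p, sum(flags), zeros)
```

Keeping the structural flags alongside the counts makes it possible to compare estimators that use the origin of zero with those that see only the counts.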
For the six field samplings, three of P. citrella (Table 2) and three of T. citricida (Table 3), and for the six simulations (Table 4), the simulated and observed proportion of structural zeros, the non-structural zeros, the overdispersion parameter k, the probability of structural zero and the optimal sample size were estimated using the coefficient of variation equations, proportion of mean and half confidence interval (Table 1).
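For orientation, the three precision criteria can be sketched with the classical general forms commonly attributed to Karandinos (1976): $n = s^2/(CV^2\bar{x}^2)$, $n = (z_{\alpha/2}/D)^2 s^2/\bar{x}^2$ and $n = (z_{\alpha/2}/h)^2 s^2$. The Python fragment below is a sketch under assumptions and does not reproduce the zero-inflated equations of Table 1: the function names are hypothetical, and the zero-inflated Poisson mean and variance plugged in are the standard expressions $(1 - p_e)\lambda$ and $(1 - p_e)\lambda(1 + p_e\lambda)$.

```python
from statistics import NormalDist


def zip_mean_var(p_e, lam):
    """Mean and variance of a zero-inflated Poisson (standard expressions)."""
    mean = (1.0 - p_e) * lam
    var = mean * (1.0 + p_e * lam)
    return mean, var


def n_by_cv(mean, var, cv):
    """Sample size for a target coefficient of variation of the mean."""
    return var / (cv ** 2 * mean ** 2)


def n_by_proportion(mean, var, d, alpha=0.05):
    """Sample size for a confidence half-width equal to a proportion d of the mean."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (z / d) ** 2 * var / mean ** 2


def n_by_half_width(var, h, alpha=0.05):
    """Sample size for a fixed confidence half-width h."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (z / h) ** 2 * var


mean, var = zip_mean_var(p_e=0.4, lam=1.5)
print(n_by_cv(mean, var, cv=0.25))
print(n_by_proportion(mean, var, d=0.25))
print(n_by_half_width(var, h=0.25))
```

The results are real numbers; in practice they would be rounded up to the next whole sample unit.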
| Sampling | Method | Probability distribution | Prsz / Prnsz | k | pe | CV | D | h |
|---|---|---|---|---|---|---|---|---|
| 1 | log-lik mom | ZIP ZINB ZIP ZINB | 0.33/0.43 | 1.4e-5 1.29 | 0.67 0.67 0.629 0.33 | 81 69 70 75 | 81 69 70 75 | 51 51 - 351 |
| 2 | log-lik mom | ZIP ZINB ZIP ZINB | 0.27/0.45 | 1.9e-5 2.69 | 0.537 0.537 0.465 0.27 | 53 55 43 102 | 53 55 43 102 | 41 41 - 472 |
| 3 | log-lik mom | ZIP ZINB ZIP ZINB | 0.13/0.46 | 8.1e-6 1.35 | 0.543 0.543 0.499 0.13 | 54 34 47 42 | 54 34 47 42 | 148 148 50 1151 |
| Sampling | Method | Probability distribution | Prsz / Prnsz | k | pe | CV | D | h |
|---|---|---|---|---|---|---|---|---|
| 1 | log-lik mom | ZIP ZINB ZIP ZINB | 0.33/0.64 | 181.8 0.02 | 0.97 0.97 0.987 0.33 | 1061 2994 2447 18 | 1061 2994 2447 18 | - 24686 - 1207 |
| 2 | log-lik mom | ZIP ZINB ZIP ZINB | 0.27/0.68 | 0.426 0.056 | 0.95 0.949 0.96 0.27 | 623 450 801 17 | 623 450 801 17 | 2266 3945 - 983 |
| 3 | log-lik mom | ZIP ZINB ZIP ZINB | 0.13/0.84 | 0.474 0.025 | 0.97 0.969 0.978 0.13 | 1050 779 1475 12.55 | 1050 779 1475 12.55 | 5738 8486 - 854 |
| Sampling | Method | Probability distribution | Prsz | k | pe | CV | D | h |
|---|---|---|---|---|---|---|---|---|
| ZIPS1 | log-lik | ZIP | 0.1 | 4.8e-5 | 0.089 | 19 | 19 | 29 |
| ZIPS2 | log-lik | ZIP | 0.4 | 0.107 | 0.479 | 45 | 45 | 31 |
| ZIPS3 | log-lik | ZIP | 0.6 | 1e-5 | 0.476 | 45 | 45 | 22 |
| ZINBS1 | log-lik | ZINB | 0.1 | 2.221 | 0.005 | 39 | 39 | 664 |
| ZINBS2 | log-lik | ZINB | 0.4 | 0.623 | 0.429 | 32 | 32 | 1268 |
| ZINBS3 | log-lik | ZINB | 0.6 | 0.656 | 0.651 | 62 | 62 | 1935 |
The equations proposed to estimate the optimal sample size of pests with excess zeros are detailed in the methodology (Table 1).
It was found that the optimal sample size calculated by the proportion of the mean (D) […]
The optimal sample size of half the confidence interval (h) increased as the overdispersion parameter (k) increased, resulting in very large or difficult-to-estimate optimal sample sizes when pest populations have excess zeros (Tables 2, 3 and 4).
The estimation of the optimal sample size by log-likelihood of the parameter k of the samples of P. citrella (Table 2) indicated that the samples have zero-inflated Poisson distribution. The k estimated by the moment method of the zero-inflated negative binomial distribution, by excluding structural zeros, showed that non-structural zeros and positive integer values had overdispersion.
This result is consistent with that reported by Banik and Kibria (2009), who indicated that, by conditioning or eliminating the structural zeros of a population modeled with a zero-inflated Poisson distribution, it can also be modeled with a negative binomial distribution, provided that the data of the non-structural component present overdispersion.
The values of pe for the methods of moments and log-likelihood for zero-inflated Poisson were similar; therefore, both methods are efficient for the estimation of the parameters. The estimated sample sizes for P. citrella are smaller when estimated by moments than by log-likelihood, even when the proportion of structural zeros (Prsz) is greater; however, the difference between the two estimates is not very large (< 20 units).
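To make the maximum likelihood side of this comparison concrete, the sketch below fits a zero-inflated Poisson by an expectation-maximization (EM) iteration in Python. It is a swapped-in technique for illustration only: the study obtained its likelihood estimates with zeroinfl in R, and the function name `zip_mle_em`, the starting values, and the number of iterations are assumptions made here. The fit sees only the counts, so it cannot tell a structural zero from a non-structural one.

```python
import math


def zip_mle_em(counts, iters=500):
    """Maximum likelihood estimates (p_e, lam) of a zero-inflated Poisson via EM."""
    n = len(counts)
    n_zero = sum(1 for c in counts if c == 0)
    total = sum(counts)
    if total == 0:
        raise ValueError("all counts are zero; lambda cannot be estimated")
    # Crude starting values: half of the observed zeros treated as excess zeros.
    p_e = 0.5 * n_zero / n
    lam = total / n
    for _ in range(iters):
        # E-step: probability that an observed zero is an excess zero.
        w = p_e / (p_e + (1.0 - p_e) * math.exp(-lam))
        excess = n_zero * w
        # M-step: update the mixing proportion and the Poisson mean.
        p_e = excess / n
        lam = total / (n - excess)
    return p_e, lam


counts = [0] * 60 + [1] * 15 + [2] * 15 + [3] * 10
p_e, lam = zip_mle_em(counts)
print(round(p_e, 3), round(lam, 3))
```

At convergence the fitted model reproduces both the sample mean and the observed proportion of zeros, which is a convenient check on the iteration.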
The effect of overdispersion significantly affected the sample size estimated by h; for P. citrella, the results indicate that estimation by CV or by […]
In the samplings of T. citricida (Table 3), an insect with a high tendency to aggregation, the k values estimated by log-likelihood indicate populations with zero-inflated negative binomial distribution. The k values estimated by the method of moments were low, which indicates that, when the structural component was excluded, the few sample units found with the pest presented low variation.
The result is interesting since populations with zero-inflated negative binomial distribution present random distribution at the farm level, but the few occupied trees had a high number of individuals, indicating aggregation, in accordance with the biology of the insect. The exclusion of structural zero, the frequency of non-structural zeros, and the reduction of variation in counts with positive integer values resulted in very small sample sizes for CV and […]
The optimal sample size of the zero-inflated negative binomial distribution, calculated by moments, is smaller because it distinguishes the different origins of zero. By considering only the non-structural zeros and the positive integer values for the estimation of the sample size, a difference was established with the parameters estimated by log-likelihood that does not distinguish the origin of zero. Therefore, the method of moments for zero-inflated Poisson and zero-inflated negative binomial allows estimating optimal sample sizes similar to or smaller than those estimated by maximum likelihood.
In the simulations (Table 4), it was observed that, as the number of structural zeros increased, the sample size increased in both distributions, because the sample size was estimated only by the log-likelihood method, which does not distinguish the origin of zero in the simulated data. In addition, the estimated value of the overdispersion parameter k is consistent with the values obtained in the field.
For zero-inflated Poisson, very small k values were obtained due to the proximity of the mean and variance values, while for the simulations of the zero-inflated negative binomial, the overdispersion parameter was greater than zero, indicating overdispersion, similar to that reported by Zou et al. (2021) and Haslett et al. (2022).
The zero-inflated Poisson and zero-inflated negative binomial probability distributions allow modeling populations of pest organisms with low densities and excess zeros. The parameters obtained by the moment method distinguish the origin of zero and estimate optimal sample sizes equivalent to or less than those estimated by log-likelihood, which does not distinguish the origin of zero. A zero-inflated Poisson population can also be modeled with a negative binomial distribution, provided that the non-structural component is overdispersed.
The estimation of the optimal sample size in pest populations with excess zeros can be performed equivalently with the coefficient of variation (CV) equation and the mean proportion (D) […]