Sampling and nonsampling errors are often measured by two quantities: bias and variance. The bias of an estimator of an unknown population value is the difference, averaged over all possible samples of the same size and design, between the estimator and the unknown population value. Any systematic error, or inaccuracy that affects all samples of a specified design in a similar way, may bias the resulting estimates. Variance is the squared difference, averaged over all possible samples of the same size and design, between an estimator and its average value. Descriptions of sampling and nonsampling errors for the 1997 Commodity Flow Survey (CFS) are provided in the following sections.
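The two definitions above can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the original report: \theta is the unknown population value, \hat{\theta} is the estimator, and E[.] denotes the average over all possible samples of the same size and design.

```latex
\operatorname{Bias}(\hat{\theta}) = \operatorname{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{Var}(\hat{\theta}) = \operatorname{E}\!\left[\left(\hat{\theta} - \operatorname{E}[\hat{\theta}]\right)^{2}\right]
```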
The particular sample used in this survey is one of a large number of samples of the same size and design that could have been selected. If all possible samples had been surveyed, under the same conditions, an estimate of an unknown population value could have been obtained from each sample. The estimates obtained from these samples give rise to a distribution of estimates for the unknown population value. A statistical measure of the variability among these estimates is the standard error, which can be approximated from any one sample. The coefficient of variation (or relative standard error) of an estimate is the standard error of the estimate divided by the estimate. Measures of sampling variability, such as the standard error or coefficient of variation, are estimated from the sample and are also subject to sampling variability. (Technically, we should refer to the estimated standard error or the estimated coefficient of variation of an estimator. However, we have omitted this detail for the sake of brevity.) It is important to note that the standard error and coefficient of variation measure only sampling variability. They do not measure any biases in the estimates.
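The idea of a distribution of estimates over all possible samples can be illustrated with a toy calculation. The sketch below is not the CFS design: it assumes a hypothetical population of five values, simple random sampling without replacement of two units, and an expansion estimator of the total. All names and numbers are made up for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical population of five shipment values (illustrative, not CFS data).
population = [10.0, 20.0, 30.0, 40.0, 50.0]
sample_size = 2


def estimate_total(sample, population_size):
    """Expansion estimator of the population total (assumes equal-probability sampling)."""
    return population_size * mean(sample)


# One estimate for every possible sample of the given size.
estimates = [
    estimate_total(sample, len(population))
    for sample in combinations(population, sample_size)
]

true_total = sum(population)
average_estimate = mean(estimates)      # average over all possible samples
bias = average_estimate - true_total    # difference between that average and the true value
standard_error = pstdev(estimates)      # spread of the distribution of estimates
coefficient_of_variation = standard_error / average_estimate

print(f"possible samples: {len(estimates)}")
print(f"bias: {bias:.2f}")
print(f"standard error: {standard_error:.2f}")
print(f"coefficient of variation: {coefficient_of_variation:.3f}")
```

In practice only one sample is observed, so the standard error and the coefficient of variation are themselves estimated from that sample; here the coefficient of variation is computed relative to the average estimate only because every possible sample is available in this toy setting.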
An estimate of an unknown population value and its approximate standard error can be used to construct a confidence interval. A confidence interval is a range about a given estimate that has a specified probability, or confidence, of containing the unknown population value. If, for each possible sample, an estimate of an unknown population value and the estimate's approximate standard error were obtained, then:
For approximately 90 percent of the possible samples, the interval from 1.65 standard errors below to 1.65 standard errors above the estimate would include the unknown population value.
For approximately 95 percent of the possible samples, the interval from two standard errors below to two standard errors above the estimate would include the unknown population value.
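The two intervals above can be computed as in the following minimal sketch. The function names, the example estimate, and the example coefficient of variation are hypothetical; the sketch also assumes the coefficient of variation is expressed as a proportion rather than a percentage.

```python
def standard_error_from_cv(estimate, cv):
    """Recover the standard error from an estimate and its coefficient of variation.

    Assumes cv is a proportion (0.05 means 5 percent), not a percentage.
    """
    return abs(estimate) * cv


def confidence_interval(estimate, standard_error, multiplier):
    """Interval from `multiplier` standard errors below to above the estimate."""
    half_width = multiplier * standard_error
    return (estimate - half_width, estimate + half_width)


# Hypothetical published estimate and coefficient of variation (not CFS figures).
estimate = 1000.0
cv = 0.05

se = standard_error_from_cv(estimate, cv)
ci90 = confidence_interval(estimate, se, 1.65)  # approximately 90 percent confidence
ci95 = confidence_interval(estimate, se, 2.0)   # approximately 95 percent confidence

print("90 percent interval:", ci90)
print("95 percent interval:", ci95)
```

These intervals reflect sampling variability only; they do not account for any bias in the estimate.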
A potentially large source of bias in the estimates is nonresponse. Nonresponse is defined as the inability to obtain all the intended measurements or responses from all the selected establishments. Four levels of nonresponse can occur in the CFS: item, shipment, quarter (reporting week), and establishment. Item nonresponse occurs when either a question is unanswered or the response to the question fails computer or analyst edits. Item nonresponse is corrected by imputation. (Imputation is the procedure by which a missing value is replaced by a predicted value obtained from an appropriate model.) Shipment, quarter, and establishment nonresponse describe the inability to obtain sufficient information about a sampled shipment, quarter, or establishment, respectively, which prevents it from contributing to tabulations. Shipment and quarter nonresponse are corrected during the estimation procedure by reweighting. Reweighting allocates characteristics to the nonrespondents in proportion to the characteristics observed for the respondents. The amount of bias introduced by this nonresponse adjustment procedure depends on the extent to which the nonrespondents differ in their characteristics from the respondents. Establishment nonresponse is corrected during the estimation procedure by the SIC-level adjustment weight. (See Appendix C for a description of the estimation procedure.) In most cases of establishment nonresponse, none of the four questionnaires has been returned to the Census Bureau despite several attempts to elicit a response. Approximately 67 percent of the sampled establishments provided at least one quarter of data that contributed to tabulations.
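The reweighting idea described above can be sketched with a generic ratio adjustment within a single adjustment cell. This is an illustrative assumption, not the CFS estimation procedure (which is described in Appendix C): respondent weights are inflated so that they sum to the total weight of all sampled units in the cell. The function name and the example weights are hypothetical.

```python
def nonresponse_adjusted_weights(base_weights, responded):
    """Inflate respondent weights so they also represent the nonrespondents.

    Generic weighting-cell ratio adjustment (an assumption for illustration):
    factor = (sum of all base weights) / (sum of respondent base weights).
    Nonrespondents receive a weight of zero.
    """
    total_weight = sum(base_weights)
    respondent_weight = sum(w for w, r in zip(base_weights, responded) if r)
    if respondent_weight == 0:
        raise ValueError("no respondents in the adjustment cell")
    factor = total_weight / respondent_weight
    return [w * factor if r else 0.0 for w, r in zip(base_weights, responded)]


# Hypothetical cell: four sampled units, the second did not respond.
base_weights = [10.0, 10.0, 20.0, 20.0]
responded = [True, False, True, True]

adjusted = nonresponse_adjusted_weights(base_weights, responded)
print(adjusted)
print(sum(adjusted), sum(base_weights))  # the two totals should agree
```

The adjustment preserves the total weight of the cell, but it removes bias only to the extent that respondents and nonrespondents in the cell have similar characteristics.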
Some possible sources of bias that are attributed to respondent-conducted sampling include misunderstanding the definition of a shipment, constructing an incomplete frame of shipments from which to sample, ordering the shipment sampling frame by selected shipment characteristics, and selecting shipment records by a method other than the one specified in the questionnaire's instructions. We often contacted respondents who reported shipments having atypically large value or weight when compared to the rest of their reported shipments. Upon contact, if we were able to collect information on all of a given respondent's large shipments made either for a particular reporting week or for the entire quarter, then we identified these large shipments as certainty shipments. (See Appendix C for a description of how certainty shipments are used in the estimation process.)