Geographic Profile of Employment and Unemployment, 2014  Bulletin 2873
Appendix B: Sampling and estimation procedures and sampling error tables 




Appendix B Tables: (PDF)

The estimates presented in this bulletin are based on annual averages of monthly data obtained from the Current Population Survey (CPS), a sample survey of the civilian noninstitutional population. The survey, conducted each month by the U.S. Census Bureau for the U.S. Bureau of Labor Statistics, provides comprehensive data on the labor force, including such characteristics as age, gender, race, Hispanic or Latino ethnicity, marital status, occupation, and industry. The survey also provides data on the characteristics of those not in the labor force.

Each month, trained interviewers collect information from a scientifically selected sample of about 60,000 eligible households. This sample, designed to represent the civilian noninstitutional population, also includes about 10,000 households in order to meet the requirements of the State Children’s Health Insurance Program (SCHIP) legislation. The SCHIP legislation required the Census Bureau to improve state estimates of the number of children who live in low-income families and lack health insurance. These estimates are obtained from the Annual Demographic Supplement to the CPS. In September 2000, the Census Bureau began expanding the monthly CPS sample in 31 states and the District of Columbia because of the SCHIP legislation.

Selected respondents in the eligible households are interviewed to obtain information about the employment status of each household member 16 years of age and older. The information that is collected pertains to a “reference week,” usually the calendar week (Sunday to Saturday) that includes the 12th of the month, with actual interviewing occurring during the week following the reference week, known as the “survey week.”

Sampling procedures

The 2014 sample encompasses 824 sample areas, with coverage of every state and the District of Columbia.
It is based, to a large extent, on information about the distribution of the population as reported in the Census 2000 enumeration. (A redesigned Census 2000-based sample was phased in from April 2004 through July 2005.) The 824 areas were selected by dividing the entire area of the United States into 2,025 primary sampling units (PSUs). With some minor exceptions, a PSU consists of a county or a number of contiguous counties. Most metropolitan areas constitute separate PSUs.

To improve the efficiency of the sample, the 2,025 PSUs are grouped into strata within each state. Those PSUs that are in a stratum by themselves are called self-representing and are generally the most populous in each state. Other strata are formed by combining PSUs that are similar in such characteristics as population growth, proportion of Blacks and Hispanics, and distributions by occupation and industry and by age and gender. PSUs selected from these strata are non-self-representing, because each one chosen represents the entire stratum. One PSU is selected from each stratum, with the probability of selection proportional to the relative population size of the PSU.

In states with a SCHIP sample, the self-representing PSUs are the same for both the regular CPS and SCHIP. In most states, the same non-self-representing sample PSUs are in the sample for both the regular CPS and SCHIP; however, to improve the reliability of the SCHIP estimates in Maine, Maryland, and Nevada, the SCHIP non-self-representing PSUs are selected independently of the regular CPS sample PSUs, with replacement. The method for stratification of PSUs for SCHIP in these states is similar to that of the other stratifications, except that the stratification variable used is the number of people under age 18 with household income below twice the poverty level.

Within each of the selected PSUs, the number of households to be enumerated each month is determined in two steps.
First, a sample of the unit’s census enumeration districts (EDs) is selected through the use of the population size probability selection procedure. EDs are administrative units and contain, on average, about 300 households. Second, clusters of approximately four addresses (contiguous wherever possible) are selected to be enumerated within each designated ED.

Part of the sample is changed, or rotated, each month. A given rotation group is in the sample for 4 consecutive months, leaves the sample during the next 8 months, and then returns for another 4 consecutive months. A primary reason for rotating the sample is to minimize the lack of cooperation that may result from interviewing a constant panel indefinitely. The rotation plan provides for three-fourths of the sample to be identical from one month to the next and one-half to be identical with that from the same month a year earlier.

Methods of estimation

Under the methods of estimation used in the CPS, all of the results for a given month become available simultaneously and are based on returns from the entire sample of respondents. The estimation procedure involves weighting the data from each respondent by the inverse of the probability of the person being in the sample. The result gives a rough measure of the number of actual people that each sample person represents. Through a series of estimation steps (outlined next), the selection probabilities are adjusted for noninterviews and survey undercoverage; data from previous months are incorporated into the estimates through the composite estimation procedure.

1. Noninterview adjustment. The weights for all interviewed households are adjusted to the extent needed to account for occupied sample households for which no information was obtained because of absence, impassable roads, refusals, or unavailability of the respondents for other reasons.
This noninterview adjustment is made separately for clusters of similar sample areas that are usually, but not necessarily, contained within a state. Similarity of sample areas is based on metropolitan area status and size. Within each cluster, there is a further breakdown by residence. The proportion of sample households not interviewed averages about 7 percent to 8 percent, depending upon a number of factors, including weather and vacations.

2. Ratio estimates. The distribution of the population selected for the sample may differ somewhat, by chance, from that of the population as a whole in such characteristics as age, race, gender, and state of residence. Because these characteristics are closely correlated with labor force participation and other principal measurements made from the sample, the survey estimates can be substantially improved when weighted appropriately by the known distribution of the population characteristics. This task is accomplished through four stages of adjustment, as follows:
The use of the categories of Black alone and non-Black alone compensates for the fact that the racial composition of a non-self-representing (NSR) sample PSU could differ substantially from the racial composition of the stratum it is representing. This adjustment is not necessary for self-representing (SR) PSUs, because they represent only themselves. Adjustment factors are computed for the two race categories for each state containing NSR PSUs. The Black-alone and non-Black-alone cells are collapsed within a state when a cell meets one of four sampling criteria.^{1} As a result of these criteria, the first-stage ratio adjustment actually is used (i.e., does not collapse to 1.0) in less than half of the states.
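The collapsing rule can be made concrete with a small sketch. The Python below only illustrates the four criteria listed in footnote 1; the function names, argument names, and example inputs are hypothetical and are not taken from Census Bureau or BLS production code. Treating a collapsed cell as having a factor of 1.0 is one reading of the parenthetical remark above.

```python
def should_collapse(adjustment_factor, nsr_sample_psus, expected_interviews):
    """Return True when a race cell meets any of the four criteria in footnote 1."""
    upper_limit = 1.3
    lower_limit = 1 / 1.3  # about 0.769230
    return (
        adjustment_factor > upper_limit      # criterion 1
        or adjustment_factor < lower_limit   # criterion 2
        or nsr_sample_psus < 4               # criterion 3
        or expected_interviews < 10          # criterion 4
    )


def effective_first_stage_factor(adjustment_factor, nsr_sample_psus, expected_interviews):
    """Assumed reading of the text: a collapsed cell gets a factor of 1.0."""
    if should_collapse(adjustment_factor, nsr_sample_psus, expected_interviews):
        return 1.0
    return adjustment_factor


# Hypothetical inputs: a factor of 1.1 in a state with 5 NSR sample PSUs.
kept = effective_first_stage_factor(1.1, nsr_sample_psus=5, expected_interviews=50)
# Hypothetical inputs: a factor of 1.4 exceeds the 1.3 limit, so the cell collapses.
collapsed = effective_first_stage_factor(1.4, nsr_sample_psus=5, expected_interviews=50)
```

Because any single criterion triggers a collapse, a state with few NSR sample PSUs ends up with no effective first-stage adjustment regardless of the computed factor.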
The adjustment is done separately for each month-in-sample (MIS) pair (1 and 5, 2 and 6, 3 and 7, and 4 and 8). Because adjusting the weights to match one set of controls can cause differences in other controls, an iterative process is used to simultaneously control all variables. Successive iterations begin with the weights as adjusted by all previous iterations. Ten iterations are performed, resulting in (virtual) consistency between the sample estimates and the population controls.

The independent population controls used for the CPS are produced by the Census Bureau’s Population Division. The CPS population controls are based on a demographic framework of population accounting. Under this framework, time series of population estimates and projections are anchored by the latest decennial census enumerations, with populations for dates since the latest decennial census derived from the estimation, or projection, of population change. In the simplest terms, information from a variety of data sources is used to derive estimates of population change by adjusting the resident population as enumerated in the latest decennial census for births, deaths, and net migration. Estimates of the resident population are adjusted to represent the civilian noninstitutional population 16 years of age and older (the eligible CPS population) by subtracting estimates of the number of residents under 16 years of age, the number of residents in the Armed Forces, and the number of residents who are institutionalized.

3. Composite estimation procedure. The last step in the preparation of most CPS estimates makes use of a composite estimation procedure. The composite estimate consists of a weighted average of two factors: (1) the second-stage ratio estimate based on the entire sample from the current month and (2) the composite estimate for the previous month, plus an estimate of the month-to-month change based on the six rotation groups common to both months.
In addition, a bias adjustment term is added to the weighted average to account for relative bias associated with MIS estimates. The compositing procedure results in a further reduction in sampling error, that is, a reduction beyond that which is achieved after the two stages of ratio adjustment.

Effective with the release of January 1998 data, a new composite estimation method was implemented for the CPS. The new technique provides increased operational simplicity for microdata users and allows optimization of compositing coefficients for different labor force categories. Under the new procedure, weights are derived for each record. These weights, when aggregated, produce estimates consistent with those produced by the composite estimator.

Under the previous procedure, composite estimation was performed at the macrolevel. The composite estimator for each tabulated cell was a function of the aggregated weights for respondents contributing to the cell in question in current and previous months. The different months of data were combined by use of compositing coefficients. Thus, microdata users needed several months of data to compute composite estimates. To ensure consistency, the same coefficients had to be used for all estimates. The values of the coefficients selected were much closer to optimal for unemployment values than for employment or labor force values.

The new composite weighting method involves two steps: (1) the computation of composite estimates for the main labor force categories, classified by important demographic characteristics, and (2) the adjustment of the microdata weights, through a series of ratio adjustments, to agree with these composite estimates, thus incorporating the effect of composite estimation into the microdata weights. Under this procedure, the sum of the composite weights of all sample people in a particular labor force category equals the composite estimate of the level for that category.
Thus, to produce a composite estimate for a particular month, a data user needs simply to access the microdata file for that (single) month and compute a weighted sum. The new composite weighting approach also improves the accuracy of labor force estimates by using different compositing coefficients for different labor force categories. The weighting adjustment method ensures additivity while allowing variation in compositing coefficients.

Reliability of the estimates

The estimates in this bulletin are based upon a sample of the population rather than a complete count. Therefore, they may differ from the figures that would have been obtained if it had been possible to take a complete census using the same questionnaire and procedures that are used in the CPS. There are two types of errors in an estimate based on a sample survey: sampling error and nonsampling error. Tables B-2 through B-5 indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration but do not measure any systematic biases in the data.

Sampling variability. The standard error is primarily a measure of sampling variability, that is, the variation that occurs by chance because a sample rather than the entire population is surveyed. The sample estimate and its standard error enable one to construct confidence intervals: ranges that would include the average result of all possible samples with a known probability. For example, if all possible samples were selected, each of these samples were surveyed under essentially the same conditions by use of the same sample design, and an estimate and its estimated standard error were calculated from each sample, then the following would occur:
The error of a sample estimate varies inversely with the size of the sample and directly with the size of the estimate. Hence, an estimate for a subgroup constituting a small proportion of a population will tend to have a larger error relative to its size than will an estimate for a larger subgroup.

Reliability standards

The CPS sample design takes into consideration both national and state reliability. For the state data, a minimum reliability standard is set: an expected maximum coefficient of variation (CV) on the level of total unemployment of 8 percent annually. This CV is calculated with the assumption of a 6-percent unemployment rate. Because each state's sample design must meet the reliability standard, the CPS sampling rate differs by state. (The sampling rate is the proportion of all households that are selected for the sample.) Generally, the smaller the state population, the higher is the sampling rate. Sampling rates range roughly from 1 in every 200 households to 1 in every 2,500 households in each stratum within the state.

Publication standards for state and area CPS data

To achieve comparability of the data for regions, divisions, states, metropolitan areas, metropolitan divisions, and cities for publication purposes, a unique requirement for minimum levels for the labor force and for employment and unemployment was developed for each area. This requirement is based on the known differences in sampling rates among these areas. Before estimates are published for a specific category (such as Hispanic unemployment in a particular state), a predetermined “critical cell” must meet a 50-percent CV requirement. As a result of this requirement, minimum bases for publication have been developed for each area. Table B-1 lists the minimum necessary base for publication of data in each of the census regions and divisions; in the states and the District of Columbia; and in the metropolitan areas, metropolitan divisions, and cities appearing in this bulletin.
Estimates are not shown when they do not meet the minimum base for the state or area listed in table B-1. In tables showing the labor force status of the population (that is, the number of employed and unemployed), publishability is determined by whether the labor force level exceeds the minimum base for unemployment in table B-1. If the labor force level is less than the unemployment minimum base, all data (labor force, employment, unemployment, and unemployment rate) are suppressed. In all other tables, the determining factor is whether the size of the base of the distribution exceeds the minimum base for employment or unemployment separately, depending on whether the table presents a distribution of employment or unemployment for the area or population subgroup. For example, in the table showing unemployed people by reason for unemployment, the entire line of data will be suppressed if the total unemployment is less than the minimum base for unemployment. If a subgroup appears in the table (such as a given gender or race), data for the subgroup also will be suppressed if the total for the reason in question does not meet the minimum base. Data are not published for any cell with a level of less than 500 people or less than 0.05 percent of the total for a given characteristic.

Using the sampling error tables

Tables B-2 through B-5 provide sampling errors for use in constructing 90-percent confidence intervals (approximately 1.645 standard errors) for major labor force characteristics. The sampling errors provided are approximations and thus indicate the order of magnitude of the sampling error rather than the precise amount of the possible error in an estimate. Illustrations on the use of these tables are provided next. In all cases, the computations present the estimated levels in thousands of people.

Sampling error of an estimated number.
Table B-5 shows that an estimate of 50,000 unemployed people in Michigan will have an absolute sampling error of 10,000, for a relative sampling error of 20 percent (10,000/50,000). In comparison, an estimate of 100,000 unemployed people in Michigan has an absolute sampling error of 14,000, yielding a relative sampling error of 14 percent (14,000/100,000). A statement that unemployment for a particular group is between 40,000 and 60,000 in the first instance, and between 86,000 and 114,000 in the second, can be made with approximately 90-percent confidence. The latter statement can be interpreted as follows: if one were to draw all possible samples, make an estimate from each sample (using the same methods and techniques), and construct an interval around each estimate (with the sampling errors shown in the tables), then 90 percent of the intervals would contain the average value of all possible samples.

To convert a sampling error from 90-percent confidence, as displayed in the tables, to 68-percent confidence (1 standard error), multiply the sampling error shown in the tables by 0.63. To convert the sampling error from 90-percent to 95-percent confidence (approximately 2 standard errors), multiply the sampling error by 1.23. For the example given, the sampling error at 90-percent confidence is 10,000. At 68-percent confidence, the error would be about 6,300 (10,000 × 0.63). At 95-percent confidence, the error would be about 12,300 (10,000 × 1.23).

Sampling error of a difference. To compute the error of a difference from the tables, an additional step is required. If, for instance, one wishes to know whether a change in the unemployment rate from one year to the next in a particular area for a particular population group is statistically significant or whether the difference in the unemployment rate between two areas or population groups is statistically meaningful, the significance of the difference needs to be computed.
(Differences between estimates for 2 consecutive years may be influenced to some extent by a redesign of the CPS concepts, questionnaire, and collection procedures, such as the one that occurred in 1994.) As noted before, differences can take two general forms: (1) differences between population groups and/or geographic areas, and (2) differences for the same population group and geographic area over time. Either type of difference can be calculated with the following formula, noting the limiting covariance assumption discussed later:

SE_{d} = [(SE_{1}^{2} + SE_{2}^{2}) - 2C × (SE_{1} × SE_{2})]^{1/2}.
Here, SE_{1} = the sampling error of one group or year, SE_{2} = the sampling error of another group or year, and C = the covariance (or relationship) term. The SE_{1} and SE_{2} can be found in the appropriate table of Geographic Profile for each year if the comparison is between different years, because the size of the samples and, consequently, sampling errors may differ from year to year.

Values for the covariance, or “C” term, for employment and unemployment for differences between consecutive years are as follows: for labor force or employment levels, C = 0.58; for unemployment levels or rates, C = 0.37. It is important to note that these C terms are usable only for calculating the sampling error of a difference for over-the-year change for the same geographic area and population group. Covariance terms for the relationship between different population groups or geographic areas in this bulletin are not available.

In calculating sampling errors for differences between two different population groups or geographic areas, a C term of zero must be assumed. The effect of this assumption is that (1) if the relationship between two groups, areas, or years (differences for nonconsecutive years) is small, then the C term can legitimately be ignored and the sampling errors will not be adversely affected, and (2) if there is a strong positive relationship between the two groups, areas, or years (differences for consecutive years), then the error computed without a C term will be overstated. An overstatement could lead one to state that a difference or change was not statistically significant when, in fact, it was. When there is a strong relationship over time for a characteristic such as employment (people tend to remain employed from one year to the next), the importance of using a C term to calculate the sampling error of a difference over time increases greatly.

The next example illustrates how to calculate the sampling error of a difference.
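Before turning to the worked example, the formula and C terms given earlier in this section can be sketched in Python. This is an illustration only: the function and constant names are hypothetical, and the over-the-year inputs are made-up values rather than figures from tables B-2 through B-5. The inputs 22 and 19 (in thousands) are the table values used in the California and New York example.

```python
import math

# C terms quoted in the text for differences between consecutive years.
C_LABOR_FORCE_OR_EMPLOYMENT = 0.58
C_UNEMPLOYMENT = 0.37


def se_difference(se1, se2, c=0.0):
    """Sampling error of a difference: sqrt(se1^2 + se2^2 - 2*c*se1*se2)."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * c * se1 * se2)


def is_significant(estimate1, estimate2, se1, se2, c=0.0):
    """True when the absolute difference exceeds the error of the difference."""
    return abs(estimate1 - estimate2) > se_difference(se1, se2, c)


# Two different areas, so C = 0 (levels and errors in thousands).
between_areas = se_difference(22, 19)

# Same area and group in consecutive years; these inputs are hypothetical.
over_the_year = se_difference(14, 14, C_UNEMPLOYMENT)
without_c_term = se_difference(14, 14)
```

With a positive C term the over-the-year error is smaller than the error computed with C = 0, which is the overstatement the text warns about when the covariance is ignored.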
Suppose one wished to know whether a hypothetical difference between an unemployment level of 250,000 for a particular population group in California and an unemployment level of 200,000 for the same group in New York was statistically significant at 90-percent confidence. Table B-5 gives the error for an unemployment level of 250,000 in California as approximately 22,000 and the error for an unemployment level of 200,000 in New York as 19,000. Using the formula described previously without the C term produces the following results (levels in thousands):
SE_{1}^{2} + SE_{2}^{2} = 22^{2} + 19^{2} = 845;
SE_{d} = (SE_{1}^{2} + SE_{2}^{2})^{1/2} = 29.

Because each state's sample is independent, there is no measurable correlation between the two estimates, and a C term of zero can be assumed. Thus, the error of the difference is approximately 29,000. Because the actual difference (50,000) is greater than the error of the difference, it can be stated with 90-percent confidence that the difference in the unemployment level is attributable to factors other than sampling variability alone.

Sampling errors for unemployment rates. Unemployment rates and error ranges for these rates are provided in tables 1, 14, and 27. This information can be used to derive a sampling error for an unemployment rate if one is needed. The error range is a 90-percent confidence interval around the unemployment rate. By subtracting the estimated unemployment rate from the upper bound of the range (or subtracting the lower bound of the range from the estimated unemployment rate), the sampling error for the rate can be obtained. This sampling error can then be used in the formula given previously for computing the sampling error of a difference, or for any other purpose the user chooses.

Interpolation and extrapolation. Although sampling errors are listed for selected levels of employment and unemployment in tables B-2 through B-5, users may wish to know the sampling error for an estimate whose value is not listed. To derive such a sampling error, it is necessary to use interpolation or extrapolation. For example, in order to derive the sampling error for the 2014 total unemployment level for women in Ohio, it is necessary to use interpolation because table B-5 contains no sampling error for an unemployment-level estimate of 143,000. The following formula and accompanying example show how to interpolate for this estimate:
SE = {[(A - G) / (F - G)] × (X - Y)} + Y

In this equation,
A = the estimated value (143,000),
F = the table value (200,000) immediately above the estimated value,
G = the table value (100,000) immediately below the estimated value,
X = the sampling error of F (19,000), and
Y = the sampling error of G (14,000).

Thus (levels in thousands),

SE = {[(143 - 100) / (200 - 100)] × (19 - 14)} + 14
SE = (0.43 × 5) + 14
SE = 2.15 + 14
SE = 16

If the sample-based estimate lies outside the boundaries of the error tables, extrapolation can be used to approximate the sampling error. The formula for extrapolation is the same as that for interpolation; however, the F term becomes the highest value in the table and the G term becomes the next-highest value.

Derivation of sampling errors

The state and area sampling errors are developed with a generalized regression procedure and are not based on sample data for each individual area, population group, or labor force characteristic. As with all sampling error tables produced for CPS state and area data, a number of approximations are required in order to derive sampling errors that apply to a wide variety of items. As a result, these sampling errors indicate the order of magnitude of the error rather than a precise error for any specific item.

The sampling error tables are derived from standard error equations and special parameters developed by the Bureau of Labor Statistics. These parameters are available upon request from the Division of Local Area Unemployment Statistics, Bureau of Labor Statistics, Room 4675, 2 Massachusetts Avenue NE, Washington, DC 20212-0001. Telephone: (202) 691–6392.

Tables B-2 through B-5 can be used for estimates pertaining to any race or ethnic group whose data are published. As noted, the sampling errors are based on a generalized regression procedure and are approximate. Generally, the degree of precision in these tables is slightly greater for Whites (and the total of all race and ethnic groups) than it is for Blacks or Hispanics.
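The interpolation and extrapolation rule described under "Interpolation and extrapolation" can also be written as a short Python sketch. The function name is hypothetical; the argument names follow the A, F, G, X, and Y terms defined in the text. The first call reproduces the Ohio example. The second call is purely illustrative: it treats 200 as if it were the highest level listed in the table, which is an assumption and not a statement about table B-5.

```python
def interpolate_sampling_error(a, f, g, x, y):
    """Linear interpolation between two table rows.

    a: estimated level
    f, g: table levels above and below a (for extrapolation, the highest
          and next-highest table levels)
    x, y: sampling errors listed for f and g
    """
    return ((a - g) / (f - g)) * (x - y) + y


# Worked example from the text (levels and errors in thousands).
ohio_women = interpolate_sampling_error(a=143, f=200, g=100, x=19, y=14)
print(round(ohio_women))  # 16.15 rounds to 16

# Extrapolation reuses the same formula with an estimate above f.
# Hypothetical: assumes 200 and 100 are the two highest listed levels.
beyond_table = interpolate_sampling_error(a=230, f=200, g=100, x=19, y=14)
```

Because the published errors are themselves approximations, the result is best rounded to the same precision as the table entries, as the text does when it reports 16.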
1 The four sampling criteria are (1) that the adjustment factor be greater than 1.3; (2) that the adjustment factor be less than 1/1.3 (or 0.769230 in decimal form); (3) that there be fewer than four NSR sample PSUs in the state; and (4) that there be fewer than 10 expected interviews in an age-race cell in the state.



E-mail: gpinfo@bls.gov Last Updated: September 23, 2015