This page documents the Consumer Expenditure Surveys (CE) Public Use Microdata (PUMD): their conventions, files, sample code, and methodology.
The Consumer Expenditure Surveys (CE) program provides data on expenditures, income, and demographic characteristics of consumers in the United States. The CE program provides these data in tables, LABSTAT databases, news releases, publications, and public use microdata files.
CE data are collected by the Census Bureau for the Bureau of Labor Statistics (BLS) in two surveys: the Interview Survey for major and/or recurring items, and the Diary Survey for minor or frequently purchased items. CE data are primarily used to revise the relative importance of goods and services in the market basket of the Consumer Price Index. The CE program conducts the only Federal household survey to provide information on the complete range of consumers' expenditures and income. For more information, see the overview section in the CE chapter in the BLS Handbook of Methods.
CE PUMD provide the individual responses to the two surveys from respondents. The data have been adjusted to protect the confidentiality of respondents. The CE PUMD allow researchers to analyze expenditure, income, and demographic data beyond what is provided in published tabulations.
CE PUMD include data from both the Interview and Diary Surveys. Most files are analogous between the two surveys; however, the Interview Survey includes roughly 50 additional detailed data files, as well as paradata files that provide detail about the collection process. All PUMD files cover a full calendar year. For years prior to 1996, file availability may be limited. Table 3 (Interview Survey files and content) lists the major files currently available and their content. For a more comprehensive list of files provided in the CE PUMD, see the Dictionary for the Interview and Diary Surveys.
For both the Interview and Diary Surveys, the files use the following conventions:
The Interview Survey is a rotating panel survey in which approximately 10,000 addresses are contacted each calendar quarter, yielding approximately 6,000 usable interviews. One-fourth of the addresses contacted each quarter are new to the survey. After a housing unit has been in the sample for four consecutive quarters, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
Before 2015, the Interview Survey included a preliminary bounding interview, and each CU could be contacted up to five times over five quarters. Although data from the bounding interview were not published, its purpose was to minimize telescoping errors.ⅰ The CE program stopped fielding the bounding interview in 2015 due to concerns about its effectiveness in reducing telescoping errors, cost, and impact on respondent burden. For more information, see Ian Elkin's article Recommendation regarding the use of a CE bounding interview.ⅱ
For the Interview Survey, the files use the following conventions:
Prior to 2020, each data release contained five quarters of Interview Survey data to allow users to calculate a calendar year estimate. For more information on this calculation, see Section 6.1 Estimation procedures for the Interview Survey. Starting in 2020, the data package does not include the first quarter of the calendar year; users need to download the prior year's data package for the first quarter of data.
Each annual data release of the CE PUMD is processed using new data and new disclosure avoidance guidelines. For quarters that appear in two different data releases, an "x" is added to the end of the file name. This "x" is used as an indicator to inform users that the two files were processed under a different set of rules and conditions and therefore the content may differ slightly. It is at the user's discretion as to which file to use.
|Is the "x" included?||What file and release?||Did the files, methods, or data change?||Example|
|No||Fifth file of previous year's release||No, they stayed the same as in the previous four quarters in the package.||FMLIYYQ.sas7bdat|
|Yes||First file of current year's release||Yes, they changed from the previous year's version.||FMLIYYQx.sas7bdat|
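The naming pattern above can be sketched as a small Python helper. This is a minimal illustration for releases before 2020, assuming the FMLIYYQ and FMLIYYQx patterns shown in the table; the function name `interview_release_files` and its parameters are hypothetical and not part of any CE tooling.

```python
def interview_release_files(release_year, prefix="FMLI"):
    """Return the five quarterly file names in a pre-2020 Interview release.

    Illustrative helper only. The first-quarter file carries an "x" because
    the same quarter also appears, without the "x", as the fifth file of the
    previous year's release.
    """
    if release_year >= 2020:
        raise ValueError("releases from 2020 onward omit the first-quarter file")
    yy = release_year % 100
    next_yy = (release_year + 1) % 100
    names = [f"{prefix}{yy:02d}1x"]                        # Q1, reprocessed
    names += [f"{prefix}{yy:02d}{q}" for q in (2, 3, 4)]   # Q2 through Q4
    names.append(f"{prefix}{next_yy:02d}1")                # Q1 of the next year
    return names
```

For example, `interview_release_files(2016)` lists FMLI161x through FMLI164 followed by FMLI171. File extensions are left off because they depend on the download format.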
|Valid blank; a blank field where a response is not anticipated|
|Invalid blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU|
|Blank due to "Don't know," refusal, or other nonresponse|
|Valid value; unadjusted|
|Valid value; allocated|
|Valid value; imputed or adjusted in some other way|
|Valid value; allocated and imputed|
|Valid blank for an expenditure that is a "parent record" where the expenditure was allocated to other records and the original expenditure was overwritten with a blank|
|Valid value; topcoded or suppressed|
|Valid value; allocated then topcoded or suppressed|
|Valid value; imputed or adjusted in some other way then topcoded or suppressed|
|Valid value; allocated and imputed or adjusted in some other way then topcoded or suppressed|
Table 3 summarizes the Interview Survey files currently available. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.
Table 3: Interview Survey files and content
|Name||Content||Variable periodicityⅲ||Files per release||Primary keysⅳ||First year|
| ||CU level summary expenditures; CU level income, assets, and liabilities; CU characteristics and weights||Quarterlyⅴ; Annual; NA||5||NEWID||1980|
| ||Monthly expenditures||Monthly||5||NEWID, SEQNO, ALCNO, UCC, RTYPE, EXPNAME, UCCSEQ, REF_MO, REF_YR||1980|
| ||Member level income; Member characteristics||Annual; NA||5||NEWID, MEMBNO||1980|
| ||Detailed income||Monthly||5||NEWID, UCC, REF_MO, REFYR||1980|
| ||Imputed income iterations||Monthly||5||NEWID, UCC, REF_MO, REFYR, IMPNUM||2004|
| ||Estimated federal and state income taxes||Annual||5||NEWID and TAXID||2013 Q2|
| ||Detailed expenditure and non-expenditure data||Quarterly, monthly, weekly, or NA||Varies by year*||NEWID, SEQNO, ALCNO||1980|
| ||Data related to the survey process||NA||1||NEWID||2009|
| ||Data related to the contact history||NA||1||NEWID||2009|
* For the specific detailed data files available, see PUMD dictionary.
The roughly 50 detailed data files include expenditure and non-expenditure information that is collected directly from sections of the Interview Survey (see the Survey materials page for more information). For years prior to 1994, there may be fewer files. The Dictionary for the Interview and Diary Surveys contains additional information on the content and makeup of each of these files. Each detailed data file consists of five quarters of data. Because these files correspond to specific sections in the survey, they differ from one another in a number of ways. These are the main differences:
Here is an example of the detailed data file VEQ (Vehicles, maintenance and repair) and some of the variables it contains.
Paradata files provide data about the interview process. In 2009, the CE program began releasing paradata for the Interview Survey. The CE program does not release paradata for the Diary Survey. Paradata are available in two datasets:
The Diary Survey is a panel survey in which approximately 5,000 addresses are contacted each calendar quarter, yielding approximately 3,000 usable interviews.ⅵ After a housing unit has been in the sample for two consecutive weeks, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
For the Diary Survey, the files use the following conventions:
|Valid blank; a blank field where a response is not anticipated|
|Blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU|
|Blank due to "Don't know," refusal, or other nonresponse|
|Valid value; unadjusted|
|Valid value; allocated|
|Valid value; topcoded or suppressed|
For Diary Survey expenditures located on the EXPD files, the variable ALLOC can be used to determine whether an expenditure has been adjusted, allocated, topcoded, or any combination of the three. Table 5 lists the allocation codes and their corresponding flag values.
|ALLOC Code||Description||Corresponding Flag|
|Valid value, unadjusted||D|
|Valid value, allocated||E|
|Topcoded and allocated||T|
|Topcoded, not allocated||T|
Table 6 summarizes the Diary Survey files currently available. Data prior to 1994 may include fewer files. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.
Table 6: Diary Survey files and content
|Name||Content||Variable periodicity||Files per release||Primary keysⅶ||First year|
| ||Summary expenditures; CU level income, assets, and liabilities; CU characteristics and weights||Weekly; Annual; Annual||4||NEWID||1980|
| ||Member level income; Member characteristics||Annual; NA||4||NEWID and MEMBNO||1980|
| ||Detailed expenditure and non-expenditure data||Weekly||4||NEWID and ALLOC||1980|
| ||Detailed income||Annual||4||NEWID and UCC||1980|
| ||Income imputation iterations||Annual||4||NEWID, UCC, IMPNUM||2004|
This section provides sample code for CE PUMD. When using the code, users may want to consider these points:
The CE program provides the following sample code:
Approximates CE table 1203 Income before taxes
|Weighted||Calendar||Aggregate annual expenditures; expenditure means and standard errors by income groups||CE PUMD estimates may not match the table estimates. For more information, see FAQ 26 on the CE FAQ page.||SAS|
Aggregates selected UCCs
|Weighted||Calendar||Aggregate annual expenditures; expenditure means and standard errors||The code integrates data from both surveys on a UCC level. For a list of UCCs, see introduction of hierarchical grouping files.||SAS|
Aggregates selected variables
|Weighted and un-weighted||Collection and calendar||Computes all items above and regresses on imputed and non-imputed data.||Code uses the Balanced Repeated Replication (BRR) method in a predefined SAS proc. For more information, see SAS macro for the CE surveys.||SAS|
This section describes the estimation procedures for the Interview Survey and the Diary Survey, the formulas for weighted annual calendar year estimates, and sampling statements. The CE program integrates information from both the Interview and Diary Surveys in its publications. Therefore, any analysis limited to only one survey may produce results that do not match the published CE estimates. In addition, users may find that estimates do not match the published estimates due to the non-disclosure criteria that are applied to the CE PUMD. For more information on non-disclosure requirements, see the Protection of Respondent Confidentiality page.
This section discusses procedures for estimating annual calendar year means with data from the Interview Survey. Field representatives interview CUs to collect the cost of all expenses during the prior three months. Data collected in each interview are treated as statistically independent: each quarter's interview is separately weighted to be representative of the population. For more information, see the collections and data sources section in the Consumer Expenditures and Income chapter in the BLS Handbook of Methods.
For the Interview Survey, users may want to consider the following general concepts:
When calculating data for 2016, interviews conducted in January 2017 cover expenditures made between October 2016 and December 2016 and are used to estimate data for these three months in 2016. Similarly, interviews conducted in March 2017 cover expenditures between December 2016 and February 2017 and are used to estimate data for December 2016. Thus, users have to use the first file for 2017 to estimate data for the last quarter of 2016. Chart 1 illustrates this concept. The green months are in scope for the 2016 estimates, and the yellow months are the months in 2017 that are out of scope.
Chart 1: Months in scope for quarter 5 (FMLI171)
A similar differentiation of scope happens at the beginning of the year. The data collected in January 2016 are not in scope for 2016 expenditures because the January interview collects data for the last three months of 2015. However, data collected in February and March 2016 are partially in scope. Data collected in February 2016 include data for January 2016, and data collected in March 2016 include data for January and February 2016. See chart 2.
Chart 2: Months in scope for quarter 1 (FMLI161)
Finally, for interviews conducted from April through December, all reference months are in scope. For example, quarter 2 interviews conducted in April, May, and June collect expenditure data for January 2016 through May 2016, which are all in scope for 2016. See chart 3. The same holds true for quarters 3 and 4.
Chart 3: Months in scope for quarter 2 (FMLI162)
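The scoping rules illustrated in charts 1 through 3 can be sketched as a small Python function. This is a minimal illustration, assuming each interview looks back on the three calendar months before the interview month; the function name `months_in_scope` and the `fifth_quarter` parameter are hypothetical.

```python
def months_in_scope(interview_month, fifth_quarter=False):
    """Number of reference months (0 to 3) that fall in the target calendar year.

    interview_month: calendar month of the interview, 1 (January) to 12.
    fifth_quarter:   True for interviews from the first quarter of the
                     following year (for example FMLI171 for 2016 estimates).
    """
    if not 1 <= interview_month <= 12:
        raise ValueError("interview_month must be between 1 and 12")
    if fifth_quarter:
        if interview_month > 3:
            raise ValueError("fifth-quarter interviews occur in January through March")
        # January looks back on October-December (3 months in scope),
        # February on 2 in-scope months, March on 1.
        return 4 - interview_month
    # A January interview looks back entirely on the prior year; February
    # and March are partially in scope; April onward is fully in scope.
    return min(interview_month - 1, 3)
```

For 2016 estimates, a March 2016 interview has two months in scope (January and February 2016), while a March 2017 interview has one (December 2016).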
This section provides users of the Diary Survey with procedures to estimate annual calendar means.
CUs self-report a detailed description of all expenses using a product-oriented diary for two consecutive 1-week periods. Data entries can start on any day of the week. Data collected each week are treated as statistically independent: each week's diary is separately weighted to be representative of the population. For more information, see the collections and data sources section in the Consumer Expenditures and Income chapter in the BLS Handbook of Methods.
For the Diary Survey, users may want to consider the following concepts:
The formulas described below can be used to calculate weighted estimates that use data from both surveys. The formulas calculate annual calendar year aggregates, averages, and standard errors for expenditures and reported income. While these formulas can also be used to calculate annual averages of imputed income, they cannot be used to calculate standard errors for imputed income. For more information on this topic, see the Description of Income Imputation Beginning with 2004 Data.
When integrating data across surveys, keep in mind that estimates created from the Diary Survey yield a weekly amount, so users need to adjust their estimates so that each survey result represents the same time period. Multiplying the Diary Survey UCC estimate by 13 yields a quarterly amount, which can then be summed with an Interview Survey estimate.
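The adjustment described above can be sketched in a few lines of Python. The function name and the dollar amounts are hypothetical and serve only to show the arithmetic.

```python
WEEKS_PER_QUARTER = 13  # multiplier given in the text above


def integrated_quarterly_estimate(diary_weekly, interview_quarterly):
    """Put a weekly Diary estimate on a quarterly basis and add it to an
    Interview estimate that is already quarterly."""
    return diary_weekly * WEEKS_PER_QUARTER + interview_quarterly


# Hypothetical amounts, for illustration only.
total = integrated_quarterly_estimate(diary_weekly=20.0, interview_quarterly=500.0)
```

Here a weekly Diary amount of 20 becomes 260 per quarter before it is added to the quarterly Interview amount of 500.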
This section presents the methods to calculate the population, aggregate values, and average values for expenditures or income for a calendar year.
Denominator: Population
NEWID = Identifier for one CU for one quarter
FINLWT21 = Weight of each NEWID
QNUM = Number of quarters in the analysis (Usually equal to 4 for a 1 year estimate)
MO_SCOPE = Indicator for the number of months in scope for each NEWID
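For the first four quarters, MO_SCOPE is defined by the value of QINTRVMO (the interview month): 0 for January (QINTRVMO = 1), 1 for February (2), 2 for March (3), and 3 for April through December (4 through 12).
For the fifth quarter, MO_SCOPE is defined by the value of QINTRVMO: 3 for January (QINTRVMO = 1), 2 for February (2), and 1 for March (3).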
Numerator: Aggregate value
X = Expenditures or income variables by NEWID. This formula can be used for quarterly, annual, weekly, or monthly data.
Quotient: Average value
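A minimal Python sketch of the three quantities follows. It assumes that the population is the sum of FINLWT21 × MO_SCOPE / 3 divided by QNUM, that the aggregate is the sum of FINLWT21 × X, and that the average is their quotient. These forms are inferred from the variable definitions above and should be checked against the CE sample code. The function name, the record layout, and the sample values are hypothetical.

```python
def calendar_year_mean(records, qnum=4):
    """Weighted calendar-year population, aggregate, and mean.

    records: iterable of dicts with keys
        "FINLWT21" - weight of the NEWID
        "MO_SCOPE" - months in scope (0 to 3) for the NEWID
        "X"        - in-scope expenditure or income for the NEWID
    qnum: number of quarters in the analysis (4 for a one-year estimate)
    """
    records = list(records)
    # Assumed denominator: each weight counts in proportion to months in scope.
    population = sum(r["FINLWT21"] * r["MO_SCOPE"] / 3 for r in records) / qnum
    # Assumed numerator: weighted sum of the in-scope values.
    aggregate = sum(r["FINLWT21"] * r["X"] for r in records)
    if population == 0:
        raise ValueError("no in-scope weight; the mean is undefined")
    return population, aggregate, aggregate / population


# Hypothetical records, for illustration only.
sample = [
    {"FINLWT21": 12000.0, "MO_SCOPE": 3, "X": 900.0},
    {"FINLWT21": 8000.0, "MO_SCOPE": 3, "X": 600.0},
    {"FINLWT21": 10000.0, "MO_SCOPE": 0, "X": 0.0},
]
population, aggregate, mean = calendar_year_mean(sample)
```

The third record has no months in scope, so it adds nothing to the population or the aggregate. X is expected to contain only values for months that fall in the target calendar year.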
Description of sampling and non-sampling errors
Sample surveys are subject to two types of errors, sampling and non-sampling. Sampling errors occur because observations are not taken from every unit in the entire population. Standard errors measure sampling errors. The primary purpose of standard errors is to provide users with a measure of the variability associated with the mean estimates. The sample estimate and its estimated standard error enable one to construct confidence intervals.
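As an illustration of the last point, the sketch below builds a confidence interval from an estimate and its standard error. It assumes a normal approximation with z = 1.96 for an approximate 95 percent interval; the text does not prescribe a confidence level, and the function name and the input values are hypothetical.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard_error.

    z=1.96 gives an approximate 95 percent interval (an assumption of this
    sketch, not a CE program rule).
    """
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical mean and standard error, for illustration only.
low, high = confidence_interval(estimate=3120.0, standard_error=50.0)
```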
Non-sampling errors can be attributed to many sources, such as definitional difficulties, differences in the interpretation of questions, inability or unwillingness of the respondent to provide correct information, mistakes in recording or coding the data obtained, and other errors of collection, response, processing, coverage, and estimation of missing data. Estimates using a small number of observations are less reliable. Research articles examining CE measurement error and nonresponse bias are included in the CE library. The CE program regularly examines CE data in the annual data quality assessment and compares CE results with other sources of federal statistics. For more information, see the Data Quality and Comparisons page.
Estimating sampling error
The CE program estimates sampling error using Balanced Repeated Replication (BRR). The CE program implements this method with three steps:
Replicate means for expenditures
WTREP = 44 Replicate weights (WTREP01-WTREP44)
Note that prior to 1990, 20 replicate weights were used, instead of the 44 that are currently in use. When developing a standard error using data prior to 1990, use the replicate weight variables FINLWT01-FINLWT20 in your calculation.
Note that this method does not work for imputed income data. For information on calculating sampling errors from imputed income, see the User's Guide to Income Imputation in the CE.
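The replication approach can be sketched in Python as follows. The sketch assumes the standard error is the square root of the average squared deviation of the replicate means from the full-sample mean, with one replicate mean per replicate weight (WTREP01-WTREP44) and the full-sample mean computed with FINLWT21. It is simplified: it omits the months-in-scope adjustment, and the function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values under one set of weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def brr_standard_error(values, full_weights, replicate_weights):
    """Replication standard error of a weighted mean.

    values:            one value per CU record
    full_weights:      full-sample weight per record (FINLWT21)
    replicate_weights: one list per replicate (WTREP01-WTREP44 gives 44),
                       each holding one weight per record
    """
    full_mean = weighted_mean(values, full_weights)
    replicate_means = [weighted_mean(values, w) for w in replicate_weights]
    # Assumed form: root of the average squared deviation of the replicate
    # means from the full-sample mean.
    variance = sum((m - full_mean) ** 2 for m in replicate_means) / len(replicate_means)
    return math.sqrt(variance)
```

With two records valued 10 and 20, equal full weights, and two replicates that each keep one record, the replicate means are 10 and 20 around a full-sample mean of 15, so the function returns 5.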
The CE survey sample is a nationwide household survey representing the entire U.S. civilian noninstitutional population. It includes people living in houses, condominiums, apartments, and group quarters such as college dormitories. It excludes military personnel living overseas or on base, nursing home residents, and people in prisons. The civilian noninstitutional population represents more than 98 percent of the total U.S. population. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.
Each CU included in the CE sample represents a given number of CUs in the U.S. population, which is considered to be the universe. Weighting is used to adjust the relative contribution of each CU to reflect the inverse of its selection probability, as well as to account for nonresponse and to match certain characteristics to known control totals. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.
The state weights initiative by CE is an effort to produce research microdata products that can allow users to explore consumer expenditure data at the state level, a feature previously unavailable in the data. The CE program intends to explore the viability of the CE sample to support weight creation for as many states as possible. The first available states are California, Florida, New York, and Texas. Users should take note that the state weights are considered a research product, and may not be consistently available across the five listed states going forward as they are highly dependent on sample composition. New Jersey is available for 2016-2020, but beginning in 2021 it was determined the sample in New Jersey could not support a state estimate.
Care should be taken when analyzing public use microdata using the state weights, as the small number of households for some expenditures can cause the mean dollar estimate to be imprecise. The more aggregated summary variables will produce more precise estimates. Additionally, it should be noted that these weights are only for their respective states and cannot be used to make inferences about any other geographic areas. The provided data must be used in conjunction with the public use microdata to obtain state level estimates.
For users interested in using the state weights, please visit the CE PUMD files page.
The following are considerations users should be aware of when working with the CE PUMD. While the PUMD contain a wealth of data, the CE surveys were designed with the specific purpose of finding out how U.S. consumers spend their money, and therefore may not be applicable to every research endeavor.
The CE surveys are designed to produce national expenditure estimates. The estimates are calculated from a relatively small sample of predominantly urban areas. Within these areas, the CE program surveys only a small percentage of those households. For example, in New York State the CE program successfully interviewed roughly 1,500 households for the Interview Survey in 2017.
At the subnational level, the current CE sample design allows data users to create estimates for 4 Census regions, 9 Census divisions, 4 states, and over 25 metropolitan statistical areas. However, the PUMD do not contain information by county or zip code. For more information, see the CE geographic data page.
Data users cannot identify who bought an item or who consumed it because neither CE survey asks these questions. That limits the ability of data users to connect the data with demographic information of the specific purchaser or consumer.
However, data users may be able to infer some demographic characteristics for purchases by single-member households because expenditures by a single-member household are likely purchased and consumed by that person.
Inferring demographic information for households with more than one member is more difficult than for single-member households. However, in some cases it may be possible. For example, a women's garment is more likely to be used by a woman in the household.
Generally, the CE surveys only provide the total cost and no unit value. Thus, an expense of $220 on wine could be one expensive bottle or several cases of bottles.
However, the CE surveys do provide limited information on the quantity and quality of a few expenditures. For example, the Interview Survey indicates the number purchased for selected large items, like cars or appliances. The Diary Survey may identify the number of meals purchased away from home, but not how many people ate at each meal.
With respect to quality, the PUMD do contain roughly 50 detailed expenditure files that provide additional information about an expenditure. The information booklets distributed to respondents with the survey also describe what information data users can find in the data. For information on the questions the CE surveys ask, see the Survey Materials page.
Reported data on a Consumer Unit's (CU) income, income taxes, and financial assets may have limited analytical use because of two main factors:
The available detail for some categories may have limited analytical use due to the following major reasons:
Trend analysis of individual households is limited because the CE program interviews each household for a fixed time period. The specific duration depends on the survey, the type of data, and the respondent's willingness to participate.
Trend analyses over several years of aggregated PUMD variables and, to a lesser degree, of items in CE tables are limited by changes in collection and sampling methods. For example, every ten years the CE program introduces a new sample design. Generally, when the CE program introduces a new sample design, method, item, or question, the program does not create an overlap where both the old and the new version are available during the transition.
However, longitudinal analysis of major categories across several years is possible if the data user concludes that the changes in the underlying collection methods do not affect the overall trend. Generally, larger categories are less impacted than small categories. For a list of the main survey changes in the history of the CE program, see Consumer Expenditures and Income: History in the BLS Handbook of Methods.
ⅰ Telescoping errors refer to the temporal displacement of an event. Respondents of the CE surveys may perceive recent events to be more remote than they are (backwards telescoping) and distant events to be more recent than they are (forward telescoping).
ⅱ Ian Elkin, Recommendation regarding the use of a CE bounding interview, 2013, Bureau of Labor Statistics.
ⅲ Variable periodicity refers to the period that a given value represents.
ⅳ Primary keys identify each unique record in the database.
ⅴ Quarterly summary expenditures are presented as two variables - one containing expenditures made in the previous calendar quarter and one containing expenditures made in the current calendar quarter.
ⅶ Primary keys identify each unique record in the database.
Last Modified Date: May 4, 2023