
Consumer Expenditure Surveys Public Use Microdata Getting Started Guide

This page provides documentation for Consumer Expenditure Surveys (CE) Public Use Microdata (PUMD), its conventions, files, sample code, and methodology.

Section 1. CE program
Section 2. CE PUMD
Section 3. Interview Survey
Section 4. Diary Survey
Section 5. Sample code
Section 6. Methodology
Section 7. Considerations

Section 1. CE program

The Consumer Expenditure Surveys (CE) program provides data on expenditures, income, and demographic characteristics of consumers in the United States. The CE program provides these data in tables, LABSTAT databases, news releases, publications, and public use microdata files.

CE data are collected by the Census Bureau for the Bureau of Labor Statistics (BLS) in two surveys: the Interview Survey, for major and/or recurring items, and the Diary Survey, for minor or frequently purchased items. CE data are primarily used to revise the relative importance of goods and services in the market basket of the Consumer Price Index. The CE program conducts the only Federal household survey to provide information on the complete range of consumers' expenditures and income. For more information, see the overview section in the CE chapter in the BLS Handbook of Methods.

Section 2. CE PUMD

CE PUMD provide respondents' individual responses to the two surveys. The data have been adjusted to protect the confidentiality of respondents. The CE PUMD allow researchers to analyze expenditure, income, and demographic data beyond what is provided in published tabulations.

2.1 CE PUMD files

CE PUMD include data from both the Interview and Diary Surveys. Most files are analogous between the two surveys; however, the Interview Survey also has roughly 50 additional detailed data files, as well as paradata files that provide detail about the collection process. All PUMD files cover a full calendar year. For years prior to 1996, file availability may be limited. Table 3: Interview Survey files and content and Table 6: Diary Survey files and content list the major files currently available and their content. For a more comprehensive list of files provided in the CE PUMD, see the Dictionary for the Interview and Diary Surveys.

Are the CE PUMD required for a particular research topic?
A research topic may require the detail that only PUMD files provide, but the CE program does provide a wealth of information that has already been tabulated and may be sufficient for a user's analysis. This information includes tables, LABSTAT databases, news releases, and publications. To learn more about these products, see the introduction to the CE data products.
What is required in order to use the CE PUMD?
Users of CE PUMD need to be familiar with statistical concepts and be proficient with a statistical software package, such as SAS, R, or STATA.
What should users consider when using PUMD files?
The files contain individual survey responses, so their appropriate uses depend on the survey design. For example, the CE survey design supports reliable national averages of major expenditures, but it may not support reliable estimates for some states. For more information, see CE Considerations When Using the public use Microdata.
What data formats are available?
The data files are available in SAS, STATA, and CSV and can be downloaded from the public use microdata data files page. If users' research requires access to CE microdata without the disclosure restrictions applied, they can apply to be visiting researchers on the BLS onsite researcher page.
Where can users obtain additional information?
If users have comments or questions about this page and its contents, contact us.

2.2 CE PUMD file conventions

For both the Interview and Diary Surveys, the files use the following conventions:

How are CE PUMD files named?
CE PUMD file names consist of three parts:
- File name
- Calendar year (YY)
- Quarter (Q) if applicable. Quarter can be 1-5.
The detailed annual Interview Survey files do not specify the quarter but only the year, for example intrvw16.zip\expn16\cla16.sas7bdat.
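As a minimal sketch, the naming convention above can be parsed with a regular expression. The pattern is an assumption inferred from the examples on this page: a letter-only file name, a two-digit year, an optional quarter digit, and the optional trailing "x" described in Section 3.2. `parse_pumd_name` and `PumdFileName` are hypothetical helpers and are not part of any BLS tool.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional

# Assumed pattern: letters (file name), 2-digit year, optional quarter digit, optional "x"
_PUMD_NAME = re.compile(r"^(?P<base>[a-z]+)(?P<yy>\d{2})(?P<q>\d)?(?P<x>x)?$", re.IGNORECASE)


class PumdFileName(NamedTuple):
    base: str               # file name part, e.g. "FMLI" or "CLA"
    yy: str                 # two-digit calendar year, kept as text
    quarter: Optional[int]  # None for annual detailed files such as cla16
    reprocessed: bool       # True when the name ends in "x"


def parse_pumd_name(filename: str) -> PumdFileName:
    """Split a bare PUMD file name such as 'fmli171x.sas7bdat' into its parts."""
    stem = PurePath(filename).stem  # drop the extension (.sas7bdat, .csv, ...)
    match = _PUMD_NAME.match(stem)
    if match is None:
        raise ValueError(f"not a recognized PUMD file name: {filename!r}")
    quarter = match.group("q")
    return PumdFileName(
        base=match.group("base").upper(),
        yy=match.group("yy"),
        quarter=int(quarter) if quarter else None,
        reprocessed=match.group("x") is not None,
    )
```

For example, `parse_pumd_name("fmli171x.sas7bdat")` returns base `FMLI`, year `17`, quarter `1`, and `reprocessed=True`. The year is kept as text because a two-digit year alone does not identify the century.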
What types of values do CE PUMD variables use?
CE PUMD variables are stored in one of the following three formats:
- Numeric (NUM): Variables that predominantly contain dollar amounts and counts
- String (CHAR): Variables that contain a sequence of alphanumeric characters
- Categorical (CHAR): Coded variables
Where do data users find descriptions for the CE PUMD variables?
The Dictionary for the Interview and Diary Surveys contains a description for each variable. In addition to the description, the dictionary lists the associated codes, the location of a variable within the files and within the survey, and the years in which a variable existed in the PUMD.
How do data users track changes in PUMD?
The Dictionary for the Interview and Diary Surveys tracks detailed changes to files, variables, and codes. For large survey changes, consult the history page in the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
How do data users identify a unique record in the CE PUMD?
Identifying a unique record depends on the file. For a list of the primary key variables for each file, see Table 3 Interview Survey files and content and Table 6 Diary Survey files and content.
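Before analyzing a file, it can help to confirm that the primary keys listed in Table 3 or Table 6 identify exactly one record each. The sketch below is minimal and assumes each file has been loaded as a list of dictionaries keyed by variable name. `duplicate_keys` is a hypothetical helper, and the rows are made-up illustrations.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def duplicate_keys(records: Iterable[Mapping[str, object]], keys: Sequence[str]) -> list:
    """Return every combination of key values that occurs in more than one record."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return [combo for combo, n in counts.items() if n > 1]


# Hypothetical rows: one reported expenditure (SEQNO 1) allocated into two records
rows = [
    {"NEWID": 1234561, "SEQNO": 1, "ALCNO": 1},
    {"NEWID": 1234561, "SEQNO": 1, "ALCNO": 2},
]
print(duplicate_keys(rows, ["NEWID", "SEQNO", "ALCNO"]))  # [] -> keys are unique
print(duplicate_keys(rows, ["NEWID", "SEQNO"]))           # [(1234561, 1)] -> not unique
```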
How do data users link an interview or diary for a given Consumer Unit (CU) in different files?
NEWID links data for one CU across interviews and files. Users cannot link CUs across surveys because the Diary and Interview surveys use different samples.
How is the variable NEWID structured?
NEWID is a unique sequential number concatenated with the interview number. The last digit of NEWID indicates the interview number in a series of 4, or the week of diary collection in a series of 2. All digits before the last digit identify a CU.
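The CU identifier and the interview (or week) number share one field, so integer arithmetic can separate them. This minimal sketch assumes NEWID has been read as an integer; convert it with `int()` first if your software loaded it as text. The helper names and the NEWID values in the test are hypothetical.

```python
from collections import defaultdict


def split_newid(newid: int) -> tuple:
    """Return (CU identifier, last digit) for a NEWID.

    The last digit is the interview number (Interview Survey) or the
    diary week (Diary Survey); the preceding digits identify the CU.
    """
    if newid < 10:
        raise ValueError("NEWID must have at least two digits")
    return divmod(newid, 10)


def group_by_cu(newids):
    """Map each CU identifier to the sorted last digits observed for it."""
    groups = defaultdict(list)
    for newid in newids:
        cu_id, last_digit = split_newid(newid)
        groups[cu_id].append(last_digit)
    return {cu_id: sorted(digits) for cu_id, digits in groups.items()}
```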

Section 3. Interview Survey

3.1 Interview Survey overview

The Interview Survey is a rotating panel survey in which approximately 10,000 addresses are contacted each calendar quarter, yielding approximately 6,000 usable interviews. One-fourth of the addresses contacted each quarter are new to the survey. After a housing unit has been in the sample for four consecutive quarters, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.

Before 2015, the Interview Survey included a preliminary bounding interview, and each CU could be contacted up to five times over five quarters. Data from the bounding interview were not published; its purpose was to minimize telescoping errors.^ⅰ The CE program stopped fielding the bounding interview in 2015 because of concerns about its effectiveness in reducing telescoping errors, its cost, and its impact on respondent burden. For more information, see Ian Elkin's article Recommendation regarding the use of a CE bounding interview.^ⅱ

3.2 Interview Survey file conventions

For the Interview Survey, the files use the following conventions:

What does an Interview "quarter" refer to?
The interview "quarter" refers to the calendar quarter in which the interview occurred. For example, data for any CU interviewed in April, May, or June are stored in the quarter 2 (YYQ2) datasets. During an interview, the CU is asked to report expenditures for the three months prior to the interview. So, for a CU interviewed in April, the expenditures stored in the YYQ2 files are for January, February, and March. This distinction is important to remember when calculating calendar year estimates.
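The three-month recall period can be expressed as a small date calculation. This sketch assumes only the rule stated above: expenditures are reported for the three months before the interview month. `reference_months` is a hypothetical helper.

```python
def reference_months(interview_year: int, interview_month: int) -> list:
    """Return (year, month) pairs for the three months before the interview month."""
    if not 1 <= interview_month <= 12:
        raise ValueError("interview_month must be between 1 and 12")
    # Count months from year 0 so that stepping back across January wraps the year.
    current = interview_year * 12 + (interview_month - 1)
    return [((current - back) // 12, (current - back) % 12 + 1) for back in (3, 2, 1)]
```

For an April 2016 interview, the function returns January through March 2016. For a March 2017 interview, it returns December 2016 through February 2017, which matches the example in Section 6.1.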
How many quarters does a CE PUMD release include?
Prior to 2020, each CE PUMD release included five quarters: the four quarters of the release year (YYQ1-YYQ4) and the first quarter of the next year. Beginning in 2020, each data release contains the last three quarters of a year and the first quarter of the following year.
Why do some CE PUMD Interview Survey files exist as part of two different data releases?
Prior to 2020, each data release contained five quarters of Interview Survey data so that users could calculate a calendar year estimate. For more information on this calculation, see Section 6.1 Estimation procedures for the Interview Survey. Starting in 2020, the data package does not include the first quarter of the calendar year, so users need to download the prior year's data package to obtain that quarter.
What does the "x" mean in some of the data file names before 2020?
Each annual data release of the CE PUMD is processed using new data and new disclosure avoidance guidelines. For quarters that appear in two different data releases, an "x" is added to the end of the file name. This "x" informs users that the two files were processed under different rules and conditions, so their content may differ slightly. Which of the two files to use is at the user's discretion.

Table 1: Description of "x" in file names for the fifth quarter
Is the "x" included?	What file and release?	Did the files, methods, or data change?	Example
No	Fifth file of previous year's release	No, they stayed the same as in the previous four quarters in the package.	FMLIYYQ.sas7bdat
Yes	First file of current year's release	Yes, they changed from the previous year's version.	FMLIYYQx.sas7bdat

What do the flag values in the Interview Survey represent?
In the Interview Survey files, data fields are explained by using flags for selected variables. Variables that have a flag variable associated with them are identified in the Dictionary for the Interview and Diary Surveys, on the Variable tab, under the column "Flag Name." Table 2 lists the codes for flags in the Interview Survey. Pre-1996 data will contain a subset of the flag values listed below.

Table 2: Interview Survey flag variable codes
Flag value	Description
A	Valid blank; a blank field where a response is not anticipated
B	Invalid blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
C	Blank due to "Don't know," refusal, or other nonresponse
D	Valid value; unadjusted
E	Valid value; allocated
F	Valid value; imputed or adjusted in some other way
G	Valid value; allocated and imputed
H	Valid blank for an expenditure that is a "parent record" where the expenditure was allocated to other records and the original expenditure was overwritten with a blank
T	Valid value; topcoded or suppressed
U	Valid value; allocated then topcoded or suppressed
V	Valid value; imputed or adjusted in some other way then topcoded or suppressed
W	Valid value; allocated and imputed or adjusted in some other way then topcoded or suppressed
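When filtering records by data quality, it can help to turn the codes in Table 2 into explicit properties. The grouping below is read directly from the descriptions in Table 2, and the helper and set names are hypothetical. The Diary Survey codes in Table 4 are a subset of these, so the same lookup also covers them.

```python
# Interview Survey flag codes from Table 2, grouped by what they say about a field.
BLANK_FLAGS = {"A", "B", "C", "H"}                      # field is blank (valid or invalid)
VALUE_FLAGS = {"D", "E", "F", "G", "T", "U", "V", "W"}  # field holds a valid value
ALLOCATED_FLAGS = {"E", "G", "U", "W"}                  # value was allocated
IMPUTED_OR_ADJUSTED_FLAGS = {"F", "G", "V", "W"}        # imputed or otherwise adjusted
TOPCODED_FLAGS = {"T", "U", "V", "W"}                   # topcoded or suppressed


def describe_flag(flag: str) -> dict:
    """Summarize one flag value as a set of yes/no properties."""
    code = flag.strip().upper()
    if code not in BLANK_FLAGS | VALUE_FLAGS:
        raise ValueError(f"unknown flag value: {flag!r}")
    return {
        "has_value": code in VALUE_FLAGS,
        "unadjusted": code == "D",
        "allocated": code in ALLOCATED_FLAGS,
        "imputed_or_adjusted": code in IMPUTED_OR_ADJUSTED_FLAGS,
        "topcoded_or_suppressed": code in TOPCODED_FLAGS,
    }
```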

3.3 Interview Survey file types

Table 3 summarizes the Interview Survey files currently available. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.

Table 3: Interview Survey files and content
Name	Content	Variable periodicity^ⅲ	Files per release	Primary keys^ⅳ	First year
FMLI	CU level summary expenditures	Quarterly^ⅴ	5	NEWID	1980
	CU level income, assets, and liabilities	Annual
	CU characteristics and weights	NA
MTBI	Monthly expenditures	Monthly	5	NEWID, SEQNO, ALCNO, UCC, RTYPE, EXPNAME, UCCSEQ, REF_MO, REF_YR	1980
MEMI	Member level income	Annual	5	NEWID, MEMBNO	1980
	Member characteristics	NA
ITBI	Detailed income	Monthly	5	NEWID, UCC, REF_MO, REFYR	1980
ITII	Imputed income iterations	Monthly	5	NEWID, UCC, REF_MO, REFYR, IMPNUM	2004
NTAXI	Estimated federal and state income taxes	Annual	5	NEWID, TAXID	2013 Q2
Detailed data files	Detailed expenditure and non-expenditure data	Quarterly, monthly, weekly, or NA	Varies by year*	NEWID, SEQNO, ALCNO	1980
FPAR	Data related to the survey process	NA	1	NEWID	2009
MCHI	Data related to the contact history	NA	1	NEWID	2009

* For the specific detailed data files available, see the PUMD dictionary.

3.3.1 Detailed data files - Detailed expenditure and non-expenditure data

The roughly 50 detailed data files include expenditure and non-expenditure information that is collected directly from sections of the Interview Survey (see the Survey materials page for more information). For years prior to 1994, there may be fewer files. The Dictionary for the Interview and Diary Surveys contains additional information on the content and makeup of each of these files. Each detailed data file consists of five quarters of data. Because these files correspond to specific sections of the survey, they differ from one another. These are the main differences:

The reference periods may differ due to different questions.
The number of records per CU differs. Some files have multiple records per CU, some have one record per CU, and some have no records for a given CU interviewed in a quarter.
The method to identify unique records differs. Users can identify unique records with NEWID and, depending on the file, these variables:
- SEQNO is assigned sequentially during the interview as each expenditure record is entered into the database.
- ALCNO is assigned sequentially for each record that has been allocated from one expenditure. For example, a CU may report spending $50 on a pair of men's pants and a shirt. The CE program allocates that record into two separate records, one for men's pants and shorts ($30) and one for men's shirts ($20).

Here is an example of the detailed data file VEQ (Vehicle maintenance and repair) and some of the variables it contains.

VEQ - Vehicle maintenance and repair

VOPSERVY is an indicator variable that describes the type of maintenance or repair.
VOPMOA is an indicator variable for the month in which the expense occurred.
VOPEXPX is the total cost of the maintenance or repair expense.

3.3.2 Interview Survey Paradata files

Paradata files provide data about the interview process. In 2009, the CE program began releasing paradata for the Interview Survey; it does not release paradata for the Diary Survey. Paradata are available in two datasets:

FPAR - Data related to the survey process

Contains data about the survey, including timing for each section and whether the respondent used records.
Organized by NEWID.
Unique records are defined by NEWID and QYEAR.

MCHI - Data related to the contact history

Contains data about the contact history between the field representative and the respondent, including reasons for interview refusal and time of contact.
The files are organized by NEWID.
Unique records are defined by NEWID and QYEAR.

How many quarters are in the paradata files?
Each paradata file has nine quarters. These include four quarters for the first year, four for the second year, and one for the first quarter of the third year.

Section 4. Diary Survey

4.1 Diary Survey overview

The Diary Survey is a panel survey in which approximately 5,000 addresses are contacted each calendar quarter, yielding approximately 3,000 usable interviews.^ⅵ After a housing unit has been in the sample for two consecutive weeks, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.

4.2 Diary Survey file conventions

For the Diary Survey, the files use the following conventions:

What does a Diary Survey "quarter" refer to?
A Diary Survey "quarter" refers to the calendar quarter in which the Diary Survey booklet was placed in the home of the CU by the Census Field Representative. All Diary Survey files are organized as quarterly files.
What does a Diary Survey "week" refer to?
The Diary Survey "week" refers to the 7 consecutive days during which the data were recorded. Respondents record only expenditures made during that week. Each CU is in the sample for two consecutive weeks. Each Diary Survey week is assigned to the Diary Survey quarter in which it was recorded.

What do the flag values in the Diary Survey represent?
In the Diary Survey files, data fields are explained by using flags for selected variables. Variables that have a flag variable associated with them are identified in the Dictionary for the Interview and Diary Surveys, on the Variable tab, under the column "Flag Name." Table 4 lists the codes for flags in the Diary Survey. Pre-1996 data contain a subset of the flag values listed below.

Table 4: Diary Survey flags
Flag value	Description
A	Valid blank; a blank field where a response is not anticipated
B	Blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
C	Blank due to "Don't know," refusal, or other nonresponse
D	Valid value; unadjusted
E	Valid value; allocated
T	Valid value; topcoded or suppressed

For Diary Survey expenditures in the EXPD files, the variable ALLOC can be used to determine whether an expenditure has been adjusted, allocated, topcoded, or any combination of the three. Table 5 lists the allocation codes and their corresponding flag values.

Table 5: Diary Survey allocation codes
ALLOC Code	Description	Corresponding Flag
0	Valid value, unadjusted	D
1	Valid value, allocated	E
2	Topcoded and allocated	T
3	Topcoded, not allocated	T
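Table 5 can be encoded as a lookup when screening EXPD records. This minimal sketch assumes ALLOC has been read as an integer. `ALLOC_CODES` and `alloc_status` are hypothetical names, and the descriptions and flags are copied from Table 5.

```python
# Table 5: Diary Survey ALLOC code -> (description, corresponding flag)
ALLOC_CODES = {
    0: ("Valid value, unadjusted", "D"),
    1: ("Valid value, allocated", "E"),
    2: ("Topcoded and allocated", "T"),
    3: ("Topcoded, not allocated", "T"),
}


def alloc_status(alloc: int) -> dict:
    """Describe one EXPD expenditure from its ALLOC code."""
    if alloc not in ALLOC_CODES:
        raise ValueError(f"unknown ALLOC code: {alloc!r}")
    description, flag = ALLOC_CODES[alloc]
    return {
        "description": description,
        "flag": flag,
        "allocated": alloc in (1, 2),
        "topcoded": alloc in (2, 3),
    }
```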

4.3 Diary Survey file types

Table 6 summarizes the Diary Survey files currently available. Data prior to 1994 may include fewer files. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.

Table 6: Diary Survey files and content
Name	Content	Variable periodicity	Files per release	Primary keys^ⅶ	First year
FMLD	Summary expenditures	Weekly	4	NEWID	1980
	CU level income, assets, and liabilities	Annual
	CU characteristics and weights	Annual
MEMD	Member level income	Annual	4	NEWID, MEMBNO	1980
	Member characteristics	NA
EXPD	Detailed expenditure and non-expenditure data	Weekly	4	NEWID, ALLOC	1980
DTBD	Detailed income	Annual	4	NEWID, UCC	1980
DTID	Income imputation iterations	Annual	4	NEWID, UCC, IMPNUM	2004

Section 5. Sample code

This section provides sample code for CE PUMD. When using the code, users may want to consider these points:

The code can integrate data from both surveys or draw solely from one survey. Integration refers to the process of combining estimates from the Interview and Diary Surveys. Options within the programs allow users to choose between separate and integrated estimates.
The code may utilize the hierarchical groupings (zip), which are available from 1996 forward.
The code was built for use with the current PUMD structure and may require adjustments for earlier years, particularly for years before 1996.

The CE program provides the following sample code:

Purpose	Weights	Period	Estimates	Notes	Code
Approximates CE table 1203 Income before taxes	Weighted	Calendar	Aggregate annual expenditures; expenditure means and standard errors by income groups	CE PUMD estimates may not match the table estimates. For more information, see FAQ 26 on the CE FAQ page.	SAS
Aggregates selected UCCs	Weighted	Calendar	Aggregate annual expenditures; expenditure means and standard errors	The code integrates data from both surveys on a UCC level. For a list of UCCs, see introduction of hierarchical grouping files.	SAS R STATA
Aggregates selected variables	Weighted and un-weighted	Collection and calendar	Computes all items above and regresses on imputed and non-imputed data.	Code uses the Balanced Repeated Replication (BRR) method in a predefined SAS proc. For more information, see SAS macro for the CE surveys.	SAS

Section 6. Methodology

This section describes the estimation procedures for the Interview Survey and for the Diary Survey, the formulas to calculate weighted annual calendar year estimates, and a reliability statement. The CE program integrates information from both the Interview and Diary Surveys in its publications. Therefore, any analysis limited to only one survey may produce results that do not match the published CE estimates. In addition, estimates may not match the published estimates because of the non-disclosure criteria applied to the CE PUMD. For more information on non-disclosure requirements, see the Protection of Respondent Confidentiality page.

6.1 Estimation procedures for the Interview Survey

This section discusses procedures for estimating annual calendar year means with data from the Interview Survey. Field representatives interview CUs to collect the cost of all expenses during the prior three months. Data collected in each interview are treated as statistically independent: each quarter's interviews are separately weighted to be representative of the population. For more information, see the collections and data sources section in the Consumer Expenditures and Income chapter in the BLS Handbook of Methods.

For the Interview Survey, users may want to consider the following general concepts:

What information does the Interview Survey ask CUs?
The Interview Survey asks respondents about all expenses that the CU incurs during the survey period as well as information about financial data and demographic information. For more information on what is included and excluded in expenditures, see the entry on expenditures on the glossary page.
How are Interview Survey data organized?
The Interview Survey data are organized and identified by quarter, but particular files may provide data by month or by year. For more information on the data's periodicity, see Table 3: Interview Survey files and content.
How many quarters does a user need for calendar year estimates?
To produce calendar year estimates, users need all five quarters of data: the four quarters of the year of interest and the first quarter of the subsequent year. For example, for 2017 estimates, users need the files for quarters 1 through 4 of 2017 and quarter 1 of 2018.
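The rule above can be written as a helper that lists the file stems to read. It also covers the four-quarter rule for the Diary Survey described in Section 6.2. This is a sketch under stated assumptions: stems follow the name-year-quarter convention in Section 2.2 in lower case, and the user resolves any "x" suffix (Section 3.2) and which release package holds each file. The function name is hypothetical.

```python
def files_for_calendar_year(year: int, prefix: str, survey: str) -> list:
    """List quarterly file stems (e.g. 'fmli171') needed for one calendar year."""
    yy = f"{year % 100:02d}"
    stems = [f"{prefix}{yy}{quarter}" for quarter in range(1, 5)]
    if survey == "interview":
        # Interview Survey: also need Q1 of the following year (the fifth quarter).
        stems.append(f"{prefix}{(year + 1) % 100:02d}1")
    elif survey != "diary":
        raise ValueError("survey must be 'interview' or 'diary'")
    return stems
```

For 2017 Interview data, `files_for_calendar_year(2017, "fmli", "interview")` lists fmli171 through fmli174 plus fmli181.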
Why do users need data from two years to estimate one calendar year?
Users need data from two consecutive years to calculate calendar year estimates because, in the Interview Survey, respondents report expenditures for the three months prior to the interview. Thus, in January, February, and March interviews, a CU may report expenditures from the previous year, which are considered out of scope when developing a current calendar year estimate.

When calculating estimates for 2016, interviews conducted in January 2017 cover expenditures made from October 2016 through December 2016 and are used to estimate data for those three months of 2016. Similarly, interviews conducted in March 2017 cover expenditures from December 2016 through February 2017 and are used to estimate data for December 2016. Thus, users need the first quarter file for 2017 to estimate data for the last quarter of 2016. Chart 1 illustrates this concept: the green months are in scope for the 2016 estimates, and the yellow months, in 2017, are out of scope.

Chart 1: Months in scope for quarter 5 (FMLI171)

A similar scope distinction applies at the beginning of the year. The data collected in January 2016 are not in scope for 2016 expenditures because the January interview collects data for the last 3 months of 2015. However, data collected in February and March 2016 are partially in scope. Data collected in February 2016 include data for January 2016, and data collected in March 2016 include data for January and February 2016. See Chart 2.

Chart 2: Months in scope for quarter 1 (FMLI161)

Finally, for interviews conducted from April through December, all reference months are in scope. For example, quarter 2 interviews conducted in April, May, and June collect expenditure data for January 2016 through May 2016, all of which are in scope for 2016. See Chart 3. The same holds true for quarters 3 and 4.

Chart 3: Months in scope for quarter 2 (FMLI162)
How much does a CU contribute to a calendar year estimate in each interview month?
A CU's contribution depends on the interview month and year. For information on how to identify a CU's contribution to a calendar year estimate, see Section 6.3 Formulas.
Is the periodicity of variable values consistent across files?
No. Different files, and different variables within files, may have different periodicities. For more information, see Table 3: Interview Survey files and content.

6.2 Diary Survey estimation procedures

This section provides users of the Diary Survey with procedures to estimate annual calendar means.

CUs self-report a detailed description of all expenses in a product-oriented diary for two consecutive 1-week periods. Data entries can start on any day of the week. Data collected each week are treated as statistically independent: each week's diary is separately weighted to be representative of the population. For more information, see the collections and data sources section in the Consumer Expenditures and Income chapter in the BLS Handbook of Methods.

For the Diary Survey, users may want to consider the following concepts:

What information does the Diary Survey ask CUs?
The Diary Survey asks about almost all expenses that the CU incurs during the survey week. The Diary Survey also asks about income and demographic information. It excludes expenses incurred by family members while away from home overnight or on vacation, as well as credit and installment plan payments.
How are Diary Survey data organized?
The Diary Survey data are organized and identified by the day an item was purchased.
How do users identify the purchase date?
Users cannot identify the exact purchase date. However, users can identify the start month of the reference week (STRTMNTH), the day of the week (EXPNWDY), the sequential day of the survey (EXPNSQDY), as well as the reference month (EXPNMO) and year (EXPNYR).
How many quarters does a user need for annual calendar year estimates?
To produce calendar year estimates, users need the four collection-quarter files of the year of interest. For example, for 2017 estimates, users need the files for quarters 1 through 4 of 2017.
How much does a CU contribute to a calendar year estimate in each interview month?
In the Diary Survey, a CU contributes 100 percent of its expenditures to the calendar year. Unlike the Interview Survey, the Diary Survey has no lag between the time an expenditure occurs and the time it is reported, which means that the potential contribution of each CU to the mean is the same.

6.3 Formulas

The formulas described below can be used to calculate weighted estimates that use data from both surveys. The formulas calculate annual calendar year aggregates, averages, and standard errors for expenditures and reported income. Although these formulas can also be used to calculate annual averages of imputed income, they cannot be used to calculate standard errors for imputed income. For more information on this topic, see the Description of Income Imputation Beginning with 2004 Data.

What is the impact of different periodicity by different variables?
Variable periodicity can be annual, quarterly, monthly, or weekly. When working with the PUMD, users need to take each variable's periodicity into account. For example, when combining weekly data with annual data, users need to account for the difference in periodicity by inflating the weekly data to represent a quarterly value.
How to calculate comprehensive estimates of expenditures and income?
To gain a complete picture of expenditures and income, users need to integrate data from both surveys. The CE program collects data with two independent surveys. While they complement each other with respect to the data collection, they use independent samples that do not overlap. To see which UCCs the CE tables use to integrate and from which survey, see the Source Selection File.
How to integrate data from both surveys?
To integrate data from both surveys, users first need to choose which expenditures they would like to integrate. Some items are only collected in one survey while others are collected in both surveys. Once a user has established which UCCs they would like to integrate, users can then develop an estimate for each UCC individually. After estimates have been developed for each UCC individually, sum the results to develop an integrated estimate.

When integrating data across surveys, keep in mind that estimates created from the Diary Survey yield a weekly amount. Users therefore need to adjust their estimates so that each survey's result represents the same time period. Multiplying a Diary Survey UCC estimate by 13 (the number of weeks in a quarter) yields a quarterly amount, which can then be summed with an Interview Survey estimate.
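The sketch below shows the integration step in minimal form. It assumes per-UCC estimates have already been computed for each survey, with Interview Survey estimates as quarterly amounts and Diary Survey estimates as weekly amounts. The UCC codes, the "I"/"D" source encoding, and the function name are hypothetical placeholders and do not reflect the actual format of the Source Selection File.

```python
from typing import Mapping

WEEKS_PER_QUARTER = 13  # Diary weekly estimate x 13 = quarterly amount


def integrated_quarterly_estimate(
    interview_quarterly: Mapping[str, float],
    diary_weekly: Mapping[str, float],
    source_by_ucc: Mapping[str, str],
) -> float:
    """Sum per-UCC estimates, taking each UCC from the survey named in source_by_ucc.

    source_by_ucc maps a UCC to "I" (Interview) or "D" (Diary); this encoding is
    hypothetical and would come from the user's reading of the Source Selection File.
    """
    total = 0.0
    for ucc, survey in source_by_ucc.items():
        if survey == "I":
            total += interview_quarterly[ucc]
        elif survey == "D":
            total += diary_weekly[ucc] * WEEKS_PER_QUARTER
        else:
            raise ValueError(f"unknown source {survey!r} for UCC {ucc}")
    return total
```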
How to calculate representative statistics?
Users can calculate representative statistics with the weight variable FINLWT21. This variable attributes a weight to each NEWID, which allows users to estimate values for the entire population. This variable is available in the FMLI and FMLD files.
Does the CE program provide sample code that uses the below formulas?
Yes, the CE program provides sample code with the same logic in SAS, STATA, and R on the PUMD documentation page.

6.3.1 Developing a weighted calendar year estimate

This section presents the methods to calculate the population, aggregate values, and average values for expenditures or income for a calendar year.

Denominator: Population

NEWID = Identifier for one CU for one quarter
FINLWT21 = Weight of each NEWID
QNUM = Number of quarters in the analysis (usually equal to 4 for a 1-year estimate)
MO_SCOPE = Indicator for the number of months in scope for each NEWID

How do users calculate representative population weights (FINLWT21)?
To make the population weights representative of the U.S. population, data users need to generate two adjustment factors:
- "QNUM" adjusts the weights from annual to quarterly. In each quarter of collection, the CE sample is designed to represent the entire U.S. population, so the weight (FINLWT21) needs to be divided by the number of quarters in the analysis. Without this adjustment, the population in the denominator of an annual estimate would be 4 times as large as the U.S. population. For example, for an annual estimate (4 quarters), QNUM is 4.
- "MO_SCOPE/3" adjusts the weights for CUs that are out of scope. Interviews conducted in January, February, or March are not fully in scope. (This applies only to the Interview Survey. For more information, see Section 6.1 Estimation procedures for the Interview Survey.) For these months, only the in-scope part should be used for representative population weights, so MO_SCOPE/3 scales each CU's weight by the share of its three reference months that are in scope.
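Combining the two adjustment factors above, each NEWID contributes its FINLWT21 scaled by 1/QNUM and by MO_SCOPE/3. The population denominator can therefore be written as follows (a restatement of the rules in this section):

$$
\text{Population} = \sum_{\text{NEWID}} \frac{\text{FINLWT21}}{\text{QNUM}} \times \frac{\text{MO\_SCOPE}}{3}
$$

For a one-year estimate (QNUM = 4), each NEWID's adjusted weight is FINLWT21 × MO_SCOPE / 12.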
How do users determine the value of MO_SCOPE?
The value of MO_SCOPE depends on the survey. In the Diary Survey, MO_SCOPE is 3 for all four quarters. In the Interview Survey, MO_SCOPE depends on the year and month of the interview. Users can identify the year using the FMLI variable QINTRVYR. For a description of which months are in scope, see Section 6.1 Estimation procedures for the Interview Survey.

For interviews conducted in the calendar year being estimated (the first four quarters), MO_SCOPE is defined by the value of QINTRVMO:
- If QINTRVMO is 1 then MO_SCOPE is 0
- If QINTRVMO is 2 then MO_SCOPE is 1
- If QINTRVMO is 3 then MO_SCOPE is 2
- If QINTRVMO is 4-12 then MO_SCOPE is 3

For the fifth quarter (interviews conducted in January through March of the following year), MO_SCOPE is defined by the value of QINTRVMO:
- If QINTRVMO is 1 then MO_SCOPE is 3
- If QINTRVMO is 2 then MO_SCOPE is 2
- If QINTRVMO is 3 then MO_SCOPE is 1
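
The rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not CE sample code: the function name, the survey labels "diary" and "interview", and the calendar_year parameter are all hypothetical.

```python
def mo_scope(survey: str, qintrvyr: int, qintrvmo: int, calendar_year: int) -> int:
    """Months in scope (0-3) for one NEWID in a calendar-year estimate.

    survey: hypothetical label, "diary" or "interview".
    qintrvyr, qintrvmo: interview year and month (FMLI QINTRVYR, QINTRVMO).
    calendar_year: the year being estimated (hypothetical parameter).
    """
    if survey == "diary":
        return 3  # every Diary quarter is fully in scope
    if survey != "interview":
        raise ValueError(f"unknown survey: {survey!r}")
    if not 1 <= qintrvmo <= 12:
        raise ValueError(f"QINTRVMO out of range: {qintrvmo}")
    if qintrvyr == calendar_year:
        # First four quarters: Jan -> 0, Feb -> 1, Mar -> 2, Apr-Dec -> 3
        return qintrvmo - 1 if qintrvmo <= 3 else 3
    if qintrvyr == calendar_year + 1 and qintrvmo <= 3:
        # Fifth quarter: Jan -> 3, Feb -> 2, Mar -> 1
        return 4 - qintrvmo
    raise ValueError("interview falls outside the calendar-year estimate")


print([mo_scope("interview", 2022, m, 2022) for m in range(1, 13)])
# [0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3]
print([mo_scope("interview", 2023, m, 2022) for m in (1, 2, 3)])
# [3, 2, 1]
```

Raising an error for out-of-range interviews, rather than returning 0, is a design choice meant to surface records that were pulled from the wrong files.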

Numerator: Aggregate value

X = Expenditures or income variables by NEWID. This formula can be used for quarterly, annual, weekly, or monthly data.
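For expenditures, a matching aggregate can be written as follows, assuming each X_i already contains only the amounts that fall in the period being estimated (for the Interview Survey, the in-scope months):

```latex
\widehat{\mathit{AGG}}_X = \sum_{i} \mathit{FINLWT21}_i \times X_i
```

Under that assumption, no QNUM or MO_SCOPE factor appears here, because each quarter's weighted sum covers only its own in-scope months.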

Quotient: Average value
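Dividing the aggregate by the population gives the average value per CU. Written out with the same variables, and with i indexing the NEWIDs in the analysis, one consistent form is:

```latex
\bar{X} = \frac{\sum_{i} \mathit{FINLWT21}_i \, X_i}{\sum_{i} \frac{\mathit{FINLWT21}_i}{\mathit{QNUM}} \cdot \frac{\mathit{MO\_SCOPE}_i}{3}}
```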

6.3.2 Reliability statement

Description of sampling and non-sampling errors
Sample surveys are subject to two types of errors, sampling and non-sampling. Sampling errors occur because observations are not taken from every unit in the entire population. Standard errors measure sampling errors. The primary purpose of standard errors is to provide users with a measure of the variability associated with the mean estimates. The sample estimate and its estimated standard error enable one to construct confidence intervals.
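As a worked form of that last sentence, assuming the usual normal approximation, a two-sided confidence interval for a mean estimate with standard error SE is:

```latex
\bar{X} \pm z_{1-\alpha/2} \, \mathrm{SE}(\bar{X}), \qquad z_{0.975} \approx 1.96 \text{ for a 95 percent interval}
```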

Non-sampling errors can be attributed to many sources, such as definitional difficulties, differences in the interpretation of questions, inability or unwillingness of the respondent to provide correct information, mistakes in recording or coding the data obtained, and other errors of collection, response, processing, coverage, and estimation of missing data. Estimates using a small number of observations are less reliable. Research articles examining CE measurement error and nonresponse bias are included in the CE library. The CE program regularly examines CE data in the annual data quality assessment and compares CE results with other sources of federal statistics. For more information, see the Data Quality and Comparisons page.

Estimating sampling error
The CE program estimates sampling error using Balanced Repeated Replication (BRR). The CE program implements this method with three steps:

Selects 44 subsamples that are balanced half-samples of the full sample.
Estimates a statistic for each half-sample, using the replicate weight variables WTREP01-WTREP44. Each replicate weight variable contains a value greater than 0 for CUs assigned to that replicate and a missing value for CUs not assigned to it.
Estimates the variance of the full-sample estimate from the squared differences between each half-sample estimate and the full-sample estimate.

Replicate means for expenditures

WTREP = The 44 replicate weights (WTREP01-WTREP44)
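Written in the same form as the full-sample average, each replicate mean substitutes the replicate weight WTREP_r for FINLWT21 in both the numerator and the denominator. A missing replicate weight is treated as 0 in this sketch, which follows from the definitions above:

```latex
\bar{X}_r = \frac{\sum_{i} \mathit{WTREP}_{r,i} \, X_i}{\sum_{i} \frac{\mathit{WTREP}_{r,i}}{\mathit{QNUM}} \cdot \frac{\mathit{MO\_SCOPE}_i}{3}}, \qquad r = 1, \dots, 44
```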

Standard error
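Following the third BRR step above, let X̄ be the full-sample mean and X̄_r the mean computed with replicate weight WTREP_r. The standard error can then be written as the root mean squared deviation of the 44 replicate means around the full-sample mean:

```latex
\mathrm{SE}(\bar{X}) = \sqrt{\frac{1}{44} \sum_{r=1}^{44} \left( \bar{X}_r - \bar{X} \right)^2}
```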

Note that prior to 1990, 20 replicate weights were used, instead of the 44 that are currently in use. When developing a standard error using data prior to 1990, use the replicate weight variables FINLWT01-FINLWT20 in your calculation.
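The BRR steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CE program's SAS, Stata, or R sample code. The function names are hypothetical. Each NEWID's X is assumed to hold only its in-scope expenditures. A missing replicate weight is represented by None and treated as 0. The variance divides by the number of replicates supplied, which is 44 for WTREP01-WTREP44. Dividing by 20 when the pre-1990 FINLWT01-FINLWT20 are passed in is an assumption by analogy.

```python
import math
from typing import Optional, Sequence


def weighted_mean(
    weights: Sequence[Optional[float]],
    x: Sequence[float],
    mo_scope: Sequence[int],
    qnum: int,
) -> float:
    """sum(w * X) / sum(w / QNUM * MO_SCOPE / 3); a None weight is skipped (treated as 0)."""
    numerator = 0.0
    denominator = 0.0
    for w, xi, months in zip(weights, x, mo_scope):
        if w is None:  # CU not assigned to this replicate
            continue
        numerator += w * xi
        denominator += w / qnum * months / 3
    return numerator / denominator


def brr_mean_and_se(
    finlwt21: Sequence[float],
    replicate_weights: Sequence[Sequence[Optional[float]]],
    x: Sequence[float],
    mo_scope: Sequence[int],
    qnum: int = 4,
) -> tuple[float, float]:
    """Full-sample mean and its BRR standard error.

    replicate_weights holds one column per replicate (44 columns for
    WTREP01-WTREP44), each aligned with the NEWIDs in x.
    """
    full = weighted_mean(finlwt21, x, mo_scope, qnum)
    replicate_means = [weighted_mean(col, x, mo_scope, qnum) for col in replicate_weights]
    # Mean squared deviation of replicate means around the full-sample mean
    variance = sum((m - full) ** 2 for m in replicate_means) / len(replicate_means)
    return full, math.sqrt(variance)


if __name__ == "__main__":
    # Toy data: three NEWIDs from one quarter (QNUM = 1) and only two replicates.
    mean, se = brr_mean_and_se(
        finlwt21=[1.0, 1.0, 2.0],
        replicate_weights=[[2.0, None, 2.0], [None, 2.0, 2.0]],
        x=[10.0, 20.0, 30.0],
        mo_scope=[3, 3, 3],
        qnum=1,
    )
    print(mean, se)  # 22.5 2.5
```

Passing the replicate columns as a list keeps the function independent of the replicate count, so the same code accepts either the current 44 columns or 20 pre-1990 columns.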

Note that this method does not work for imputed income data. For information on calculating sampling errors from imputed income, see the User's Guide to Income Imputation in the CE.

6.4 Sampling statement

6.4.1 Survey sample design

The CE is a nationwide household survey whose sample represents the entire U.S. civilian noninstitutional population. It includes people living in houses, condominiums, apartments, and group quarters such as college dormitories. It excludes military personnel living overseas or on base, nursing home residents, and people in prisons. The civilian noninstitutional population represents more than 98 percent of the total U.S. population. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.

6.4.2 Weighting

Each CU included in the CE sample represents a given number of CUs in the U.S. population, which is considered to be the universe. Weighting is used to adjust the relative contribution of each CU to reflect the inverse of its selection probability, as well as to account for nonresponse and to match certain characteristics to known control totals. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.

6.5 State Weight Files

6.5.1 State weights

The CE state weights initiative is an effort to produce research microdata products that allow users to explore consumer expenditure data at the state level, a feature previously unavailable in the data. The CE program intends to explore the viability of the CE sample to support weight creation for as many states as possible. The states currently available are California, Florida, New York, and Texas. New Jersey was available for 2016-2020, but beginning with 2021 it was determined that the New Jersey sample could not support a state estimate (for more information, see the memo regarding this change). Users should note that the state weights are a research product and may not be consistently available for these states going forward, because availability depends heavily on sample composition.

Care should be taken when analyzing public use microdata using the state weights, as the small number of households for some expenditures can cause the mean dollar estimate to be imprecise. The more aggregated summary variables will produce more precise estimates. Additionally, it should be noted that these weights are only for their respective states and cannot be used to make inferences about any other geographic areas. The provided data must be used in conjunction with the public use microdata to obtain state level estimates.

For users interested in using the state weights, please visit the CE PUMD files page.

6.5.2 Documentation

An Overview of the State-Level Weighting Procedure provides information to the user on how the weights are created and how they differ from the national weight included in the PUMD.
Using the state weights on the PUMD provides a detailed description of the variables included in the files along with tables of descriptive information about the data such as computation targets and population totals.

Section 7. Considerations

The following are considerations users should be aware of when working with the CE PUMD. While PUMD contain a wealth of data, the CE surveys were designed with the specific purpose of finding out how U.S. consumers spend their money, and therefore may not be suited to every research endeavor.

7.1 Geographic data

The CE surveys are designed to produce national expenditure estimates. The estimates are calculated from a relatively small sample of predominantly urban areas. Within these areas, the CE program surveys only a small percentage of households. For example, in New York State the CE program successfully interviewed roughly 1,500 households for the Interview Survey in 2017.

At the subnational level, the current CE sample design allows data users to create estimates for 4 Census regions, 9 Census divisions, 4 states, and over 25 metropolitan statistical areas. However, the PUMD do not contain information by county or zip code. For more information, see the CE geographic data page.

7.2 Information on purchasers or consumers

Data users cannot identify who bought an item or who consumed it because neither CE survey asks these questions. That limits the ability of data users to connect the data with demographic information on the specific purchaser or consumer.

However, data users may be able to infer some demographic characteristics of purchasers in single-member households, because expenditures by a single-member household are likely purchased and consumed by that person.

Inferring demographic information for households with more than one member is more difficult, but in some cases it may be possible. For example, a women's garment is more likely to be used by a woman in the household.

7.3 Information on quantity and quality

Generally, the CE surveys provide only the total cost and no unit value. Thus, a $220 expenditure on wine could be one expensive bottle or several cases.

However, the CE surveys do provide limited information on the quantity and quality of a few expenditures. For example, the Interview Survey indicates the number purchased for selected large items, like cars or appliances. The Diary Survey may identify the number of meals purchased away from home, but not how many people ate at each meal.

With respect to quality, the PUMD do contain roughly 50 detailed expenditure files that provide additional information about an expenditure. The information booklets distributed to respondents with the survey also describe what information data users can find in the data. For information on the questions the CE surveys ask, see the Survey Materials page.

7.4 Reported income, income taxes, and other financial assets

Reported data on a Consumer Unit's (CU) income, income taxes, and financial assets may have limited analytical use because of two main factors:

Some respondents are unwilling or unable to calculate specific income items. Some income items require calculations that respondents may be unable or unwilling to perform, particularly for income taxes. For example, when the CE program asks in July about wages for the last 12 months, respondents must combine income from the second half of the previous year with income from the first half of the current year. With the 2004 data, the CE program began to impute missing income values, and with the second quarter of the 2013 data, the program started to provide estimated federal and state income taxes using the NBER TAXSIM program instead of respondent-reported values. For more information, see Aaron Cobet's presentation New CE income tax estimates.
The CE sample may underrepresent households with income over $100,000. These high-income households have been shown to be more reluctant to respond to the surveys. For more information, see the John Sabelhaus et al. article Is the Consumer Expenditure Survey Representative by Income? The CE program began to adjust the weights of high-income households to account for their underrepresentation with the 2015 data.

7.5 Impact of CE methods on some categories of data

The available detail for some categories may have limited analytical use for the following reasons:

The CE program bundles some products into single categories: Some items are grouped with others into one Universal Classification Code (UCC) because sparse data preclude them from being presented separately. For example, Apple watches, answering machines, Bluetooth accessories, cell phones, cell phone covers, chargers, cordless telephones, headsets, phone jacks and cords, selfie sticks, smartphones, and smartwatches are all bundled into the category "Telephones and accessories." In this case, data users cannot identify expenditures on the individual items contained in the bundled UCC.
PUMD exclude data that could divulge a respondent's identity. To prevent data users from identifying respondents, the CE program applies a number of methods to mask respondents' identities. Thus, a published value may differ from the reported value. PUMD flag these items. For more information on these methods, see CE's Protecting Respondent Confidentiality.

7.6 Analyzing individual households over time

Trend analysis of individual households is limited because the CE program interviews each household for a fixed time period. The specific duration depends on the survey, the type of data, and the respondent's willingness to participate.

Interview Survey expenditures: Data users can analyze one sampled address for up to four quarters. This means that one household can provide data for a maximum of 12 months. However, some households drop out earlier, because they move or do not continue to participate. This attrition biases the sample of households that participate for four quarters toward older and more affluent households, and toward owners over renters. In addition, because the CE surveys sample addresses, not people, if a new household moves into a sampled address, the CE program will continue the survey with the new household.
Diary Survey expenditures: Data users can analyze one household for up to two weeks because the Diary Survey asks a household to provide its expenditures for only two consecutive weeks.
Income and assets: Both CE surveys collect various types of income data, and the Interview Survey also collects data on some types of assets, for the previous 12 months. Data users cannot analyze trends in these items for individual households over time.

7.7 Analyzing aggregate data over time

Trend analyses over several years of aggregated PUMD variables, and to a lesser degree of items in CE tables, are limited by changes in collection and sampling methods. For example, every ten years the CE program introduces a new sample design. Generally, when the CE program introduces a new sample design, method, item, or question, the program does not create an overlap period in which both the old and the new versions are available during the transition.

However, longitudinal analysis of major categories across several years is possible if the data user concludes that the changes in the underlying collection methods do not affect the overall trend. Generally, larger categories are less impacted than small categories. For a list of the main survey changes in the history of the CE program, see Consumer Expenditures and Income: History in the BLS Handbook of Methods.

^ⅰ Telescoping errors refer to the temporal displacement of an event. Respondents of the CE surveys may perceive recent events to be more remote than they are (backwards telescoping) and distant events to be more recent than they are (forward telescoping).

^ⅱ Ian Elkin, Recommendation regarding the use of a CE bounding interview, 2013, Bureau of Labor Statistics.

^ⅲ Variable periodicity refers to the period that a given value represents.

^ⅳ Primary keys identify each unique record in the database.

^ⅴ Quarterly summary expenditures are presented as two variables - one containing expenditures made in the previous calendar quarter and one containing expenditures made in the current calendar quarter.

^ⅵ For more information on the number of contacted addresses and completed interviews, see the CE Data Quality Profile.

^ⅶ Primary keys identify each unique record in the database.

Last Modified Date: October 23, 2023