"Cleaning" the Household Datasets
This paper uses the Cambodia Household Socio-Economic Survey of 2003 to 2004 and the Lao Expenditure and Consumption Survey (LECS III) of 2002 to 2003. The Cambodia survey covers all 24 provinces, from which 900 villages (6.3%) were selected with probability proportional to size. It records household socio-demographics such as age, educational attainment, occupation, industry, sector, and possession of durable goods. It also records household income, consumption, expenditure, crop production, and time use, as well as social indicators such as health, fertility, HIV/AIDS,[1] migration, and victims of violence. The sampling frame of LECS III comprises all 18 provinces, from which 540 villages (5%) were sampled with probability proportional to size (i.e., the number of villages per province). LECS III includes age, educational attainment, industry, sector, possession of durable goods, time use, household income, household consumption, crop production, and household expenditure, as well as social indicators such as health and infrastructure.[2]
Both datasets were "cleaned" using the following procedure:
- Observations with missing socio-demographic variables and household income were deleted, as there was no relevant information on which to base imputed values. The Cambodia dataset initially consisted of 14,984 households; 11 were deleted, leaving 14,973 households for analysis. For Lao PDR, only one respondent was deleted, leaving 7,998 households for analysis.
- Units of analysis across the survey data files were standardized at the household level. Variables reported separately by each household member, such as wages, outputs, expenses, profits, property rents, dividends, and transfers, were therefore aggregated into a single value for the whole household.
- Wages were imputed for the self-employed using the mean wage of wage earners with the same set of characteristics (age, education, skill, and industry of employment). For observations with no match, education and industry were aggregated into broader classifications (Tables A1.3, A1.4, and A2.2). For example, the wages of employees reported as 18 years old, seventh graders, unskilled, and farmers could not be used to impute wages for self-employed respondents with otherwise identical characteristics (18 years old, unskilled, farmers) who were eighth graders. To remedy this, all respondents who had finished sixth to eighth grade were collapsed into a single category, lower secondary. This allowed self-employed respondents to be matched with wage earners sharing the characteristics 18 years old, unskilled, farmer, and lower secondary (i.e., sixth, seventh, or eighth grade). This procedure greatly increased the number of matched respondents.
- Some missing industry data were filled in using clues provided by occupation. For example, for respondents whose occupation was salesperson or security guard, the industry was assumed to be services.
- One concern with the Lao PDR dataset was that respondents were classified as agriculture or non-agriculture workers but not as employed or self-employed. To separate the employed from the self-employed among non-agriculture workers, the household head's responses to the family business questions were used: if the household owned a non-agriculture business, the household head was classified as self-employed. Other household members working in the same industry as the household head were assumed to work in the family business and were also classified as self-employed. Among agriculture workers, the household head's responses to the agriculture business questions were used in the same way: if the household operated its own agriculture business, its agriculture workers were considered self-employed. Members of households that did not run their own farm business were assumed to be employed by others who ran an agricultural business.
- For Cambodia, occupation was used to determine the skill level for each income category. The Lao PDR dataset, however, does not report occupation, so education (Table A2.2) was used as a proxy for skill, since studies have shown that workers' productivity or skill depends on both years of education and what is learned at school (Heckman, Layne-Farrar, and Todd 1995; Murnane, Willett, and Levy 1995, as cited in Fasih 2008). Respondents who had reached ninth grade or above were considered skilled; the rest were categorized as unskilled.
- Finally, for the Lao PDR dataset, 90% of the crop price data were taken from the Food and Agriculture Organization's statistical database (FAOSTAT).
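The deletion rule in the first cleaning step amounts to listwise deletion on a set of required fields. The sketch below is illustrative only: the field names and the three toy records are hypothetical, since the survey files' actual variable names are not given here.

```python
from typing import Any, Iterable

# Hypothetical field names; the real survey variable names are not given in the text.
DEFAULT_REQUIRED = ("age", "education", "industry", "household_income")


def drop_incomplete(
    records: Iterable[dict[str, Any]],
    required: tuple[str, ...] = DEFAULT_REQUIRED,
) -> list[dict[str, Any]]:
    """Keep only records in which every required field is present and not None.

    No imputation is attempted: records lacking these variables are dropped.
    """
    return [r for r in records if all(r.get(f) is not None for f in required)]


# Toy records for illustration only.
households = [
    {"hh_id": 1, "age": 41, "education": 6, "industry": "agriculture", "household_income": 950.0},
    {"hh_id": 2, "age": 35, "education": None, "industry": "services", "household_income": 1200.0},
    {"hh_id": 3, "age": 52, "education": 9, "industry": "industry", "household_income": None},
]
kept = drop_incomplete(households)
print([r["hh_id"] for r in kept])  # [1]
```

Making `required` a parameter lets the same function serve both datasets, whose available variables differ.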
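Standardizing member-level variables to the household level is a group-by on a household identifier. A minimal sketch follows, assuming that "aggregated" means summed and that a missing component counts as zero; both are assumptions, as are the field names `hh_id`, `wage`, and so on.

```python
from collections import defaultdict
from typing import Any

# Components named in the text; the field names themselves are hypothetical.
COMPONENTS = ("wage", "outputs", "expenses", "profits", "property_rents", "dividends", "transfers")


def aggregate_to_household(members: list[dict[str, Any]]) -> dict[int, dict[str, float]]:
    """Sum each component over all members sharing a household ID.

    Assumption: a missing or None component is treated as zero.
    """
    totals: dict[int, dict[str, float]] = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0.0))
    for m in members:
        hh = totals[m["hh_id"]]
        for c in COMPONENTS:
            hh[c] += m.get(c) or 0.0
    return dict(totals)


members = [
    {"hh_id": 1, "wage": 100.0},
    {"hh_id": 1, "wage": 50.0, "transfers": 20.0},
    {"hh_id": 2, "profits": 300.0},
]
by_household = aggregate_to_household(members)
print(by_household[1]["wage"], by_household[1]["transfers"], by_household[2]["profits"])  # 150.0 20.0 300.0
```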
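The wage imputation step is a cell-mean imputation with a fallback to coarser cells. The sketch below reproduces the 18-year-old farmer example: it tries an exact match on (age, grade, skill, industry) and, failing that, matches on collapsed education. Only the grades 6 to 8 "lower secondary" band comes from the text; the other bands, the field names, and the wage figures are hypothetical, and industry could be coarsened the same way.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Optional


def coarse_education(grade: int) -> str:
    """Collapse grades into broader bands (only 6-8 -> lower_secondary is from the text)."""
    if grade <= 5:
        return "primary_or_less"  # hypothetical band
    if grade <= 8:
        return "lower_secondary"
    return "upper_secondary_plus"  # hypothetical band


def cell(person: dict[str, Any], coarse: bool) -> tuple:
    """Matching cell: (age, education, skill, industry)."""
    edu = coarse_education(person["grade"]) if coarse else person["grade"]
    return (person["age"], edu, person["skill"], person["industry"])


def impute_wages(wage_earners, self_employed) -> list[Optional[float]]:
    """Mean wage of matching earners: exact cell first, then coarsened cell, else None."""
    fine, broad = defaultdict(list), defaultdict(list)
    for w in wage_earners:
        fine[cell(w, coarse=False)].append(w["wage"])
        broad[cell(w, coarse=True)].append(w["wage"])
    result = []
    for s in self_employed:
        wages = fine.get(cell(s, coarse=False)) or broad.get(cell(s, coarse=True))
        result.append(mean(wages) if wages else None)
    return result


earners = [
    {"age": 18, "grade": 7, "skill": "unskilled", "industry": "farming", "wage": 100.0},
    {"age": 18, "grade": 6, "skill": "unskilled", "industry": "farming", "wage": 80.0},
]
self_emp = [
    {"age": 18, "grade": 8, "skill": "unskilled", "industry": "farming"},  # matched only after collapsing
    {"age": 18, "grade": 7, "skill": "unskilled", "industry": "farming"},  # exact match
    {"age": 50, "grade": 7, "skill": "unskilled", "industry": "farming"},  # no match at either level
]
print(impute_wages(earners, self_emp))  # [90.0, 100.0, None]
```

Trying the exact cell first keeps the finer information whenever it exists, and collapsing is used only to rescue otherwise unmatched respondents.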
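Filling missing industry from occupation can be expressed as a lookup table that is consulted only when industry is absent. In this sketch only the salesperson and security guard entries come from the text; the table name and record layout are hypothetical.

```python
from typing import Any

# Hypothetical lookup; only these two entries are taken from the text.
OCCUPATION_TO_INDUSTRY = {
    "salesperson": "services",
    "security guard": "services",
}


def fill_industry(person: dict[str, Any]) -> dict[str, Any]:
    """Infer industry from occupation when industry is missing.

    A reported industry is never overwritten; an unknown occupation leaves industry as None.
    """
    if person.get("industry") is not None:
        return person
    occupation = (person.get("occupation") or "").strip().lower()
    return {**person, "industry": OCCUPATION_TO_INDUSTRY.get(occupation)}


print(fill_industry({"occupation": "Security guard", "industry": None})["industry"])  # services
```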
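The rules used to split Lao PDR workers into employed and self-employed can be written as a small decision function. This is a sketch: the field names are hypothetical, and treating non-head members outside the head's industry as employed is an assumption, since the text does not state how they were classified.

```python
from typing import Any


def employment_status(person: dict[str, Any], household: dict[str, Any]) -> str:
    """Classify a Lao PDR worker as 'self-employed' or 'employed' (sketch; field names hypothetical)."""
    if person["sector"] == "agriculture":
        # Own farm -> self-employed; otherwise assumed to work for another farm operator.
        return "self-employed" if household["owns_farm_business"] else "employed"
    if not household["owns_nonfarm_business"]:
        return "employed"
    if person["is_head"]:
        return "self-employed"
    # Members in the head's industry are assumed to work in the family business.
    if person["industry"] == household["head_industry"]:
        return "self-employed"
    return "employed"  # assumption: not covered explicitly in the text


hh = {"owns_farm_business": False, "owns_nonfarm_business": True, "head_industry": "trade"}
print(employment_status({"sector": "non-agriculture", "is_head": False, "industry": "trade"}, hh))  # self-employed
```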
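The education-based skill proxy for Lao PDR reduces to a single threshold at ninth grade. A minimal sketch, assuming education is coded as the highest grade attained (an integer); how post-secondary levels are coded in the survey is not stated and is left as an assumption.

```python
SKILLED_MIN_GRADE = 9  # "Ninth graders and above were considered skilled"


def skill_from_education(highest_grade: int) -> str:
    """Proxy skill level from the highest grade attained (assumed integer coding)."""
    return "skilled" if highest_grade >= SKILLED_MIN_GRADE else "unskilled"


print([skill_from_education(g) for g in (5, 8, 9, 12)])  # ['unskilled', 'unskilled', 'skilled', 'skilled']
```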
The views expressed in this paper are the views of the authors and do not necessarily reflect the views or policies of the Asian Development Bank Institute (ADBI), the Asian Development Bank (ADB), its Board of Directors, or the governments they represent. ADBI does not guarantee the accuracy of the data included in this paper and accepts no responsibility for any consequences of their use. Terminology used may not necessarily be consistent with ADB official terms.