Son Preference in Indian Families: Absolute Versus Relative Wealth Effects
Sylvestre Gaudin
ONLINE SUPPLEMENTS
Supplement 1. DATA
1. Data Manipulations and Selection Issues
Table S1. Observations per Household: Effective Sample
2. Construction of the Dependent Variable and Data Selection Issues
Table S2. Distribution by wealth quintiles before and after selection on SP.
Table S3. Distribution by wealth quintiles with and without missing observations on independent variables.
Table S4. Distribution by wealth quintiles before and after removing small PSUs.
Table S5. Distribution by wealth quintiles before and after data selection.
Table S6. Distribution of educational attainment before and after data selection.
3. Modifications from B&Z on Individual-Level Independent Variables
Supplement 2. STABILITY TESTS FOR SMALL SAMPLE PCA
Table S7. Summary Statistics on Correlations between Full-sized and Reduced-Sample PC Scores from 50 Independent Randomized Household Selections
Figure S1. Median Correlations between Reduced Sample and Full Sample P.C. Results
Supplement 3. ALTERNATIVE LOCAL AREA GROUPINGS (NFHS-2)
Figure S2. Grouping of Households into Local Areas based on NFHS-2 Geographical Identifiers
Table S8. Constructed Local Areas from Household Level NFHS-2 Data
Table S9. Correlation Between PSU and Local Area PC Scores by PSU Size.
Supplement 4. ADDITIONAL RESULTS
1. NFHS-3 Multilevel Estimation Results
Table S10. Multilevel linear estimation of son preference models: NFHS-3
2. Discussion of Results on Ideal Number of Children and Its Square
3. Comparison of OLS, Logit and Ordered Logit: NFHS-2 and NFHS-3 Results
Table S11. Comparison of OLS, logit, and ordered logit specifications: NFHS-2
Table S12. Comparison of OLS, logit, and ordered logit specifications: NFHS-3
4. Random-Effect Logit with State Fixed Effects. NFHS-2 Results
Table S13. Son preference models: mixed effects logit estimation for NFHS-2
5. Educational Preference Model
Table S14. Raw distribution of answers on educational preferences
Table S15. Multilevel Linear Estimation of Stated Educational Bias: NFHS-2 Sample
Supplement 5. GEOGRAPHICAL REACH OF MARRIAGES
Table S1. Observations per household: effective sample
| Observations per household | NFHS-2 | | NFHS-3 | |
| | Frequency | Cumulative % | Frequency | Cumulative % |
| 1 | 59,143 | 75.94 | 67,529 | 80.6 |
| 2 | 14,417 | 94.45 | 13,213 | 96.37 |
| 3 | 3,378 | 98.78 | 2,390 | 99.22 |
| 4 | 766 | 99.77 | 484 | 99.8 |
| 5 | 122 | 99.92 | 126 | 99.95 |
| 6 | 25 | 99.96 | 36 | 99.99 |
| 7 | 35 | 100 | 7 | 100 |
| Total cases | 77,886 | | 83,785 | |
Table S2. Distribution by wealth quintiles before and after selection on SP.
| All | NFHS-2 | | NFHS-3 | |
| Wealth quintile | % before | % after | % before | % after |
| 1 | 15.62 | 15.23 | 11.65 | 11.49 |
| 2 | 16.99 | 16.63 | 14.7 | 14.54 |
| 3 | 19.55 | 19.48 | 19.66 | 19.59 |
| 4 | 23.09 | 23.30 | 24.47 | 24.54 |
| 5 | 24.76 | 25.37 | 29.53 | 29.84 |
| Based on N = | 84,348 | 79,731 | 89,189 | 86,176 |
Table S3. Distribution by wealth quintiles with and without missing observations on independent variables.
| All | NFHS-2 | | NFHS-3 | |
| Wealth quintile | % before | % after | % before | % after |
| 1 | 15.23 | 15.25 | 11.49 | 11.44 |
| 2 | 16.63 | 16.65 | 14.54 | 14.53 |
| 3 | 19.48 | 19.52 | 19.59 | 19.59 |
| 4 | 23.30 | 23.35 | 24.54 | 24.55 |
| 5 | 25.37 | 25.22 | 29.84 | 29.88 |
| Based on N = | 79,731 | 78,782 | 86,176 | 84,871 |
Table S4. Distribution by wealth quintiles before and after removing small PSUs.
| All | NFHS-2 | | NFHS-3 | |
| Wealth quintile | % before | % after | % before | % after |
| 1 | 15.25 | 15.31 | 11.44 | 11.54 |
| 2 | 16.65 | 16.72 | 14.53 | 14.54 |
| 3 | 19.52 | 19.55 | 19.59 | 19.47 |
| 4 | 23.35 | 23.33 | 24.55 | 24.51 |
| 5 | 25.22 | 25.09 | 29.88 | 29.95 |
| Based on N = | 78,782 | 77,886 | 84,871 | 83,785 |
Table S5. Distribution by wealth quintiles before and after data selection
| All | NFHS-2 | | NFHS-3 | |
| Wealth quintile | % de jure sample | % final sample | % de jure sample | % final sample |
| 1 | 15.62 | 15.31 | 11.65 | 11.54 |
| 2 | 16.99 | 16.72 | 14.7 | 14.54 |
| 3 | 19.55 | 19.55 | 19.66 | 19.47 |
| 4 | 23.09 | 23.33 | 24.47 | 24.51 |
| 5 | 24.76 | 25.09 | 29.53 | 29.95 |
| Based on N = | 84,348 | 77,886 | 89,189 | 83,785 |
Table S6. Distribution of educational attainment before and after data selection
| | NFHS-2 | | NFHS-3 | |
| Educational attainment | % de jure sample | % final sample | % de jure sample | % final sample |
| No education | 50.48 | 49.47 | 39.91 | 39.27 |
| Incomplete primary | 9.71 | 9.67 | 8.7 | 8.53 |
| Complete primary | 7.36 | 7.47 | 6.94 | 6.94 |
| Incomplete secondary | 16.28 | 16.73 | 31.27 | 31.71 |
| Complete secondary | 7.15 | 7.32 | 4.78 | 4.91 |
| Higher | 9.03 | 9.34 | 8.39 | 8.63 |
| Based on N = | 84,325 | 77,886 | 89,183 | 83,785 |
The following two sections give additional information on the construction of independent variables not essential to the comprehension of the paper.
2. CONSTRUCTION OF PER CAPITA STATE DOMESTIC PRODUCTS
The figures come from a table of Gross State Domestic Product in current prices compiled by the Central Statistical Office (CSO) of the Ministry of Statistics and Programme Implementation of the Government of India; they are converted into constant 2001 prices using the CSO’s Consumer Price Index for Industrial Workers by center (Annual Report 2006, CSO, Table 3). For states with more than one “center”, a weighted average of the state's centers is used. This is feasible because the CSO table provides the weight of each center within the corresponding state. A few small states do not have centers for measurement of the CPI, in which case the CPI of the largest closest state is used. Per capita figures are calculated using the 1991 and 2001 state populations from the Indian Census. To give an idea of levels and to check consistency, values of constant GDP per capita are checked using the 2008 dollar exchange rate. For 1999 the highest per capita GDP is USD 1,040 in the capital Delhi and USD 187 in the poorest state,
It is important to note that state delineations were changed in 2000. The NFHS-3 sample includes 29 states instead of the 26 in NFHS-2: Jharkhand was split out of
a. Education of the woman and her partner are measured in years of education instead of the categorical variables chosen by Bhat and Zavier (2003). The choice was made for two reasons: first because definitions of categories between the two surveys were changed; and second, to simplify the exposition and better focus on the variables of primary interest.
b. In addition to years of education, a literacy variable is included to account for the large number of interviewed women with no education (about 55% in NFHS-2 and 43% in NFHS-3). The variable takes the value of 1 if the woman is illiterate, 0 otherwise. When comparing results across periods, however, it is important to know that definitions of literacy changed between the two surveys. In NFHS-2, a woman who answered no to “can you read and write?” was marked illiterate. In NFHS-3, the woman was asked whether she could read all or part of a sentence from a literacy card; women who could not read at all were coded as illiterate. Unfortunately, there is no way to tell how the two variables compare exactly. In NFHS-2, those who said they could not read or write may have been able to read part of the NFHS-3 sentence, which would inflate illiteracy in NFHS-2 relative to NFHS-3. On the other hand, there was no question in NFHS-2 to check reading skills, and women may have wrongly declared being able to read and write.
c. Age. The squared term for the respondent’s age was dropped. Preliminary tests indicated that the effect of age goes primarily through its correlation with total number of children; given the definition of SP, including the square of the mother’s age did not improve the fit of the model and had no impact on other results.
Although principal components analysis (PCA) is widely recognized as a means to create wealth indices from survey data, the analysis is normally performed using large sample sizes. When the comparison base is a community, the number of households interviewed in the same community may be small, even if it constitutes a random sample for that community, as is the case here for Primary Sampling Units (PSUs). The literature does not provide information about the minimum sample size necessary to correctly rank households using PCA. To evaluate whether the method correctly ranks households in smaller samples, I ran simulation tests using all PSUs with sample size N≥55 (63 PSUs in NFHS-2 and 99 in NFHS-3). The following procedure was followed:
i. PCA is performed for the full size PSUs as in the main analysis; results are recorded in PCALL for scores and PCQALL for quintiles.
ii. Numbers from 1 to N/5 are randomly assigned to households in each quintile (regardless of how they ranked in score); samples are reduced to n=50, 40, 35, 30, 25, 20, 15, 10, and 5 households by keeping households numbered 1 through n/5 for each value of n.
iii. The same PCA is run for each sample size; scores are recorded in PCn and quintiles in PCQn.
Using this procedure for all PSUs, ten different wealth scores and quintiles are obtained, based on sample sizes from the full N≥55 down to five households (one per original full-sample quintile). Because the selection of households within each quintile affects the resulting PC scores, the procedure is repeated k=50 times, each time with a new randomization of the ordering of households in each quintile (adding more runs did not change the summary statistics of the correlation coefficients). The principal components scores and quintiles thus obtained are recorded in PCnk and PCQnk, k=1, 2, …, 50. Correlation coefficients between PCALL and PCnk (rsnk) and between PCQALL and PCQnk (rqnk), k=1, 2, …, 50, are calculated for all n, based on the five households per PSU that were assigned the number 1 in the random ranking of the kth run, and are recorded in the random variables rsn and rqn.
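The procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the data are synthetic (eight continuous indicators driven by one latent wealth factor), every PSU has exactly N=60 households, only scores (not quintile correlations) are compared, and the names `first_pc_scores`, `quintile_of`, and `stability` are hypothetical.

```python
import numpy as np


def first_pc_scores(X):
    """Scores on the first principal component of column-standardized X."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                 # guard against constant columns in tiny samples
    Z = (X - X.mean(axis=0)) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]                  # direction of largest variance
    if loadings.sum() < 0:            # fix the arbitrary sign: higher score = wealthier
        loadings = -loadings
    return Z @ loadings


def quintile_of(scores):
    """Quintile index 0..4 from the rank of each score."""
    ranks = scores.argsort().argsort()
    return (ranks * 5) // len(scores)


def stability(n_psu=20, N=60, sizes=(50, 25, 10, 5), runs=10, seed=0):
    rng = np.random.default_rng(seed)
    psus = []
    for _ in range(n_psu):            # step i: full-sample PCA per synthetic PSU
        wealth = rng.normal(size=N)
        load = rng.uniform(0.6, 1.0, size=8)
        X = wealth[:, None] * load + rng.normal(scale=0.5, size=(N, 8))
        full = first_pc_scores(X)
        psus.append((X, full, quintile_of(full)))
    corr = {n: [] for n in sizes}
    for _ in range(runs):             # each run uses a new random numbering
        pooled = {n: ([], []) for n in sizes}
        for X, full, q in psus:
            order = np.empty(N, dtype=int)
            for g in range(5):        # step ii: random numbering within each quintile
                idx = np.flatnonzero(q == g)
                order[idx] = rng.permutation(len(idx))
            for n in sizes:           # step iii: rerun PCA on the reduced sample
                keep = np.flatnonzero(order < n // 5)
                reduced = first_pc_scores(X[keep])
                anchor = order[keep] == 0   # households present at every sample size
                pooled[n][0].append(full[keep][anchor])
                pooled[n][1].append(reduced[anchor])
        for n in sizes:
            a, b = (np.concatenate(p) for p in pooled[n])
            corr[n].append(np.corrcoef(a, b)[0, 1])
    return {n: float(np.mean(r)) for n, r in corr.items()}


print(stability())
```

The correlations are pooled over the five anchor households of every PSU, mirroring the way the same five households per PSU are compared at all sample sizes.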
Table S7 gives summary statistics on rsn and rqn by sample size n. Correlations of quintiles are slightly lower than correlations of scores because of threshold effects, but correlations with the full-sample results still average above .90 for all n>20 and above .80 for n>10. Results are very similar across the two samples. The median correlation is virtually identical to the mean in all cases; Figure S1 shows the gradual change in median correlations between full-sample and reduced-sample scores and quintiles as sample sizes are reduced. The tests reveal that the principal components procedure is relatively robust to the number of households in the sample. Correlations decrease with the size of the sample, but there are no obvious breaks down to sample sizes of 10. The rate at which deterioration occurs increases slightly when sample sizes fall below 25 (although the range of y-values chosen emphasizes the magnitude of the deterioration).
Table S7. Summary statistics on correlations between full-sized and reduced-sample principal components scores from 50 independent randomized household selections.
| | NFHS-2 (based on 63 PSUs) | | | | NFHS-3 (based on 99 PSUs) | | | |
| Sample sizes (n) | Min | Max | Mean | St. dev. | Min | Max | Mean | St. dev. |
| 50 (10 per quintile) | .96 | 1 | .99 | .01 | .96 | 1 | .99 | .01 |
| 40 | .92 | .99 | .98 | .02 | .95 | .99 | .98 | .01 |
| 35 | .92 | .99 | .98 | .02 | .92 | .99 | .97 | .01 |
| 30 | .87 | .99 | .97 | .02 | .87 | .98 | .96 | .02 |
| 25 | .86 | .98 | .96 | .02 | .91 | .98 | .95 | .02 |
| 20 | .85 | .97 | .94 | .02 | .86 | .97 | .93 | .02 |
| 15 | .84 | .96 | .92 | .03 | .85 | .95 | .91 | .03 |
| 10 | .77 | .94 | .90 | .03 | .80 | .93 | .89 | .03 |
| 5 | .78 | .91 | .85 | .03 | .76 | .89 | .83 | .03 |
Note: Correlations for different size samples are calculated using the 5 observations per PSU with calculated principal components scores at all levels, so 315 observations in NFHS-2 and 495 in NFHS-3 are used to calculate the correlations. The number of observations is the same across the 50 runs but the households are different.
Fig. S1. Median correlations between reduced- and full-sample
principal components results
Although not available in NFHS-3, the village/town, tehsil, and district of residence are recorded for each household in NFHS-2. Instead of using the PSU as the local base to calculate relative wealth scores, households in the NFHS-2 sample could be grouped into “local areas” based on these geographical identifiers (the grouping was not possible with PSUs because PSUs were numbered irrespective of location). Choosing groupings by area rather than PSU allowed larger sample sizes and more variation in the size of comparison groups, in line with population densities. Samples ranged from 30 (by construction) to 2,435. The largest samples represented areas identified as a single village/town. The less densely populated an area, the more likely the grouping involved full tehsils or districts. This may be appropriate for evaluating relative wealth if households in low-density areas tend to position themselves relative to larger geographical areas. One drawback of grouping by geographical identifiers, however, is that the resulting samples are no longer statistically representative of the area.
The procedure to create local areas required intensive, detailed manipulation of the data set. The principle followed was to find the closest grouping with at least 30 households. In a few cases, this implied grouping two districts together, but in most cases groupings remained within one district and, in more than half the data, within a single village/town. Figure S2 represents the algorithm used to find the smallest geographical area with samples of at least 30 households. Single villages/towns, tehsils, and districts are denoted respectively as Li, Ti, and Di, i being the identifier corresponding to the local residence of the household considered. Neighboring villages/towns, tehsils, and districts could be identified if they were numbered consecutively.
Fig. S2. Grouping of households into local areas based on NFHS-2
geographical identifiers
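The core of the grouping rule (use the smallest area that contains at least 30 sampled households) can be sketched as follows. This is a simplified illustration, not the procedure actually coded for the paper: it only considers the single village/town, tehsil, and district levels, and it merely flags households that would need the joined levels (neighboring villages, tehsils, or districts) instead of resolving them. The function name and the tuple layout are hypothetical.

```python
from collections import Counter

MIN_HOUSEHOLDS = 30  # minimum sample size for a local area


def assign_local_area(households):
    """households: list of (district, tehsil, village) identifiers, one per household.

    Returns one (level, key) pair per household, using the smallest area
    that contains at least MIN_HOUSEHOLDS sampled households.
    """
    by_village = Counter(households)
    by_tehsil = Counter((d, t) for d, t, _ in households)
    by_district = Counter(d for d, _, _ in households)
    areas = []
    for d, t, v in households:
        if by_village[(d, t, v)] >= MIN_HOUSEHOLDS:
            areas.append(("village/town", (d, t, v)))
        elif by_tehsil[(d, t)] >= MIN_HOUSEHOLDS:
            areas.append(("tehsil", (d, t)))
        elif by_district[d] >= MIN_HOUSEHOLDS:
            areas.append(("district", (d,)))
        else:
            # Would require joining with a consecutively numbered neighbor.
            areas.append(("needs joining", (d,)))
    return areas


# Hypothetical sample: one large village, two small ones, and a tiny district.
sample = [(1, 1, 1)] * 35 + [(1, 1, 2)] * 10 + [(1, 2, 1)] * 5 + [(2, 1, 1)] * 3
levels = Counter(level for level, _ in assign_local_area(sample))
print(levels)
```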
Table S8 indicates the extent to which local areas were aggregated to find the local base for each household. About half of the households were compared to their own village/town area without need for aggregation and 60% were compared to geographical areas smaller than the tehsil. For 10% of households, district-level comparisons were necessary. The average number of households in the local area samples used for PCA is 206, the median is 52, with a minimum of 30 (by construction) and a maximum of 2,435.
Table S8. Constructed local areas from household level NFHS-2 data
| Lowest level retained for PCA | Number of households in area used as base for PCA | | | | |
| | Total | % | Min | Max | Mean |
| Village/town (L1) | 50,333 | 54.42 | 30 | 2,435 | 321 |
| Joined villages/towns (L2 to L5) | 6,384 | 6.9 | 30 | 88 | 48 |
| Tehsil (T1) | 16,135 | 17.45 | 30 | 356 | 64 |
| Joined tehsils (T2) | 10,347 | 11.19 | 36 | 178 | 71 |
| District (D1) | 6,956 | 7.52 | 30 | 220 | 85 |
| Joined districts (D2) | 2,331 | 2.52 | 31 | 206 | 87 |
Principal components scores and quintiles were calculated using these comparison groups. Table S9 gives correlations between the relative wealth scores calculated at the PSU level and those calculated using local identifiers, for different PSU sizes. Correlations are much lower for small PSUs because the smaller the PSU, the more likely it is that a much wider geographical base (such as the district) was used.
Table S9. Correlations between PSU and local area principal components scores by PSU sample size
| Sample size in PSU | No. of PSUs | No. of households | Correlation coeff. |
| All sizes | 3,215 | 92,486 | 0.87 |
| ≥50 | 112 | 6,181 | 0.94 |
| 40-49 | 232 | 10,086 | 0.93 |
| 35-39 | 312 | 11,462 | 0.93 |
| 30-34 | 637 | 20,208 | 0.94 |
| 25-29 | 868 | 23,487 | 0.81 |
| 20-24 | 650 | 14,503 | 0.79 |
| 15-19 | 316 | 5,429 | 0.76 |
| 10-14 | 85 | 1,106 | 0.66 |
| <10 | 3 | 24 | not significant |
The article includes tables for NFHS-2 and pooled samples. Full estimation results for NFHS-3 are given below (Table S10).
Table S10. Multilevel
linear estimation of son preference models: NFHS-3
|
Model |
|||
Independent variable |
MW1 |
MWR1 |
MWR2 |
MWR3 |
State
level: |
|
|
|
|
GSP/c |
-0.00039* |
-0.00039* |
-0.00037* |
-0.00036* |
|
(0.068) |
(0.069) |
(0.085) |
(0.088) |
Household
level: |
|
|
|
|
W |
-0.049*** |
-0.050*** |
-0.068*** |
-0.064*** |
WR |
|
|
0.014** |
0.0039 |
|
|
|
(0.005) |
(0.531) |
WR × No land |
|
|
|
0.081 |
|
|
|
|
(0.152) |
WR × Land |
|
|
|
0.016*** |
|
|
|
|
(0.001) |
Land
acres × Urban (×100) |
|
-0.017 |
-0.018 |
-0.024 |
|
|
(0.323) |
(0.293) |
(0.168) |
Land
acres× Rural (×100) |
|
0.0096 |
0.0042 |
0.0018 |
|
|
(0.569) |
(0.803) |
(0.916) |
Individual
level |
|
|
|
|
Illiterate |
0.0034 |
0.0035 |
0.0037 |
0.0038 |
|
(0.278) |
(0.263) |
(0.237) |
(0.225) |
Education,
self |
-0.0019*** |
-0.0018*** |
-0.0018*** |
-0.0018*** |
Education,
partner |
-0.00035 |
-0.00032 |
-0.00035 |
-0.00039 |
|
(0.148) |
(0.185) |
(0.152) |
(0.114) |
Paid work |
-0.0055*** |
-0.0055*** |
-0.0052** |
-0.0051** |
|
(0.008) |
(0.008) |
(0.012) |
(0.013) |
Other
work |
0.010*** |
0.0098*** |
0.0093*** |
0.0090** |
|
(0.004) |
(0.006) |
(0.008) |
(0.011) |
Media
exposure |
-0.0069*** |
-0.0068*** |
-0.0069*** |
-0.0070*** |
|
(0.003) |
(0.001) |
(0.003) |
(0.003) |
Religion:
Ref. Hindu |
|
|
|
|
Muslim |
-0.0046 |
-0.0044 |
-0.0043 |
-0.004 |
|
(0.148) |
(0.164) |
(0.175) |
(0.207) |
Sikh |
0.012 |
0.011 |
0.012 |
0.011 |
|
(0.144) |
(0.154) |
(0.151) |
(0.172) |
Christian |
-0.030*** |
-0.030*** |
-0.030*** |
-0.030*** |
Other |
-0.0028 |
-0.0028 |
-0.0024 |
-0.0023 |
|
(0.614) |
(0.622) |
(0.665) |
(0.680) |
Scheduled
Caste |
0.0021 |
0.0021 |
0.0023 |
0.0026 |
|
(0.401) |
(0.397) |
(0.357) |
(0.301) |
Scheduled
Tribe |
-0.0082** |
-0.0087** |
-0.0091** |
-0.0092** |
|
(0.022) |
(0.015) |
(0.011) |
(0.011) |
Age
(respondent) |
0.000069 |
0.000083 |
0.000088 |
0.000089 |
|
(0.594) |
(0.521) |
(0.497) |
(0.491) |
Sons |
0.025*** |
0.025*** |
0.025*** |
0.025*** |
Daughters |
-0.014*** |
-0.014*** |
-0.014*** |
-0.014*** |
Sons-
dead |
0.0062*** |
0.0062*** |
0.0062*** |
0.0062*** |
|
(0.002) |
(0.002) |
(0.002) |
(0.002) |
Daughters-
dead |
-0.0021 |
-0.0022 |
-0.0022 |
-0.0022 |
|
(0.336) |
(0.309) |
(0.308) |
(0.309) |
Ideal-
total |
0.0026 |
0.0029 |
0.0027 |
0.0027 |
|
(0.416) |
(0.367) |
(0.386) |
(0.391) |
Ideal-
total squared |
-0.0011** |
-0.0011** |
-0.0011** |
-0.0011** |
|
(0.015) |
(0.012) |
(0.012) |
(0.013) |
Odd
ideal |
0.17*** |
0.17*** |
0.17*** |
0.17*** |
Fixed Effects |
|
|
|
|
Region: Ref. East |
|
|
|
|
North |
0.029*** |
0.029*** |
0.029*** |
0.029*** |
|
(0.004) |
(0.004) |
(0.004) |
(0.004) |
Central & West |
0.027** |
0.027** |
0.027** |
0.027** |
|
(0.025) |
(0.025) |
(0.025) |
(0.025) |
South |
-0.035*** |
-0.035*** |
-0.035*** |
-0.034*** |
|
(0.004) |
(0.003) |
(0.003) |
(0.004) |
Urban Residence |
-0.012*** |
-0.012*** |
-0.0097*** |
-0.0087*** |
|
|
|
(0.001) |
(0.002) |
Constant |
0.073*** |
0.073*** |
0.072*** |
0.071*** |
Random components (standard deviations) |
||||
Level 1: State |
0.020*** |
0.020*** |
0.020*** |
0.018*** |
Level 2: PSU |
0.037*** |
0.037*** |
0.037*** |
0.037*** |
Level 3: Household |
0.022*** |
0.020*** |
0.020*** |
0.020*** |
Residual error |
0.25*** |
0.25*** |
0.25*** |
0.25*** |
Regression Statistics |
|
|
|
|
Akaike
Information Criterion |
2,966 |
2,937 |
2,931 |
2,929 |
Nested
groups (unbalanced) States Local areas (PSUs) Households |
29 3,722 75,343 |
29 3,722 75,343 |
29 3,722 75,343 |
29 3,722 75,343 |
N |
83,785 |
83,785 |
83,785 |
83,785 |
note: p-values
in parentheses (p>|z|), omitted when p<.001.
*p<0.10 ** p<0.05 ***p<0.01
Coefficients on ideal-total are negative and significant in NFHS-2 and the pooled sample but insignificant when using NFHS-3 alone. Coefficients on the squared term are positive and significant in the reported results but significantly negative in the NFHS-3 results (reported above, Table S10). Alternative estimations with logit and ordered logit yielded significant results in line with B&Z (see section D3). However, B&Z’s continuous variable, also proportional to ideal family size, yielded the same direction of effect with OLS as with logit. It appeared that results on ideal-total were very sensitive to the specification of the dependent variable and the data set used. This deserved further inquiry.
To understand the discrepancy, I estimated the model with alternative dependent variables. When the dependent variable was calculated as ideal-boys divided by ideal-total, as in B&Z, the OLS signs were found to coincide with those from logit, i.e. a positive sign on ideal-total and a negative sign on the squared term. A small modification to the variable, however, inverted the signs (although, importantly for this article, signs and significance levels on other variables were unchanged). The modification concerned the cases in which women gave the same number for children of either sex as for their ideal total number of children, for which the value of the dependent variable was 1/2 in B&Z, instead of zero here. Note that these responses are neither less son-preferring nor more girl-preferring than responses with equal ideal numbers of boys and girls, so a score of zero makes sense.
Two alternative dependent variables measuring son preference are constructed, resembling those used in the literature. The first is a binary variable that takes the value of 1 when a woman indicated more sons than daughters in her ideal family composition. The second is an ordered categorical variable constructed by grouping the continuous variable into three categories according to the difference between the ideal number of boys and the ideal number of girls (D): it takes the value of 0 if D≤0, 1 if D=1, and 2 if D≥2. The estimation is done without taking account of the hierarchical structure of the data, but standard errors do take account of the complex survey design, with clustering at the PSU level and strata coinciding with rural and urban areas of states. Results for the pooled sample are presented in the article. This supplement gives additional tables for the separate NFHS-2 (Table S11) and NFHS-3 (Table S12) results. Coefficient estimates for the logit and ordered logit regressions are reported as odds ratios. This is useful in that it gives a better idea of the relative magnitude of different effects; it is also easier to compare with B&Z’s logit results. To compare with OLS estimates, however, one must remember that values less than one correspond to negative signs in the linear regression. Results of the logit and ordered logit models give the same significance level and direction of effect for all the variables pertaining to the hypothesis of this paper.
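The coding of the two alternative dependent variables follows directly from the definitions above and can be written as a short sketch. The function and argument names are hypothetical; the inputs are a woman's stated ideal numbers of boys and girls.

```python
def binary_son_preference(ideal_boys, ideal_girls):
    """1 if the woman wants more sons than daughters, 0 otherwise."""
    return 1 if ideal_boys > ideal_girls else 0


def ordered_son_preference(ideal_boys, ideal_girls):
    """0 if D <= 0, 1 if D == 1, 2 if D >= 2, where D = boys - girls."""
    d = ideal_boys - ideal_girls
    if d <= 0:
        return 0
    if d == 1:
        return 1
    return 2


# A few illustrative ideal family compositions (boys, girls).
for boys, girls in [(1, 1), (2, 1), (3, 1), (1, 2)]:
    print(boys, girls,
          binary_son_preference(boys, girls),
          ordered_son_preference(boys, girls))
```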
Table S11. Comparison of OLS, logit, and ordered logit specifications: NFHS-2
|
Estimation method (a) |
||
Independent Variables |
OLS |
Logit (b) |
Ordered logit (b) |
GSP/c |
-0.0016*** |
0.98*** |
0.98*** |
W |
-0.047*** |
0.53*** |
0.65*** |
WR (PSU) |
0.00275 |
1.03 |
0.996 |
|
(0.718) |
(0.799) |
(0.955) |
WR×Land |
0.015*** |
1.28*** |
1.19*** |
|
(0.006) |
(0.001) |
|
Land acres×urban (×100) |
-0.00086 |
0.62 |
0.81 |
|
(0.943) |
(0.163) |
(0.249) |
Land acres×rural (×100) |
0.036*** |
1.58*** |
1.28*** |
|
|
(0.008) |
(0.0008) |
Illiterate |
0.0081* |
1.02 |
1.02 |
|
(0.058) |
(0.717) |
(0.722) |
Education, self |
-0.0028*** |
0.95*** |
0.97*** |
Education, partner |
-0.00022 |
1.00 |
0.998 |
|
(0.456) |
(0.611) |
(0.518) |
Paid work |
-0.0073** |
1.005 |
1.01 |
|
(0.016) |
(0.899) |
(0.712) |
Other work |
0.0057 |
1.07 |
1.05*** |
|
(0.115) |
(0.127) |
(0.161) |
Media Exposure |
-0.014*** |
0.87*** |
0.90*** |
Religion: Ref. Hindu |
|
|
|
Muslim |
-0.015*** |
0.74*** |
0.85*** |
Sikh |
0.034*** |
1.43*** |
1.31*** |
Christian |
-0.0067 |
0.73*** |
0.78*** |
|
(0.334) |
(0.003) |
(0.002) |
Other |
-0.0170 |
0.85 |
0.88 |
|
(0.268) |
(0.195) |
(0.162) |
Scheduled caste |
0.0034 |
1.03 |
1.02 |
|
(0.263) |
(0.541) |
(0.491) |
Scheduled tribe |
-0.022*** |
0.72*** |
0.80*** |
Age (respondent) |
-0.00034** |
0.989*** |
0.995*** |
|
(0.034) |
|
|
Sons |
0.025*** |
1.37*** |
1.26*** |
Daughters |
-0.015*** |
0.84*** |
0.87*** |
Sons, dead |
0.010*** |
1.11*** |
1.08*** |
Daughters, dead |
-0.0027 |
0.99 |
0.98 |
|
(0.214) |
(0.716) |
(0.364) |
Ideal-total |
-0.0015** |
2.91*** |
2.25*** |
|
(0.023) |
|
|
Ideal-total squared |
0.0015** |
0.91*** |
0.95*** |
|
(0.057) |
|
|
Odd ideal |
0.18*** |
37.89*** |
13.51*** |
Region: Ref. East |
|
|
|
North |
0.047*** |
1.79*** |
1.49*** |
West |
0.030*** |
1.36*** |
1.25*** |
South |
-0.034*** |
0.43*** |
0.56*** |
Urban Residence |
-0.0066* |
0.95 |
0.98** |
|
(0.081) |
(0.333) |
(0.546) |
Constant |
0.142*** |
0.026*** |
|
Constant 0-1 |
|
|
28*** |
Constant 1-2 |
|
|
390*** |
Regression Statistics |
|
|
|
N |
77,886 |
77,886 |
77,886 |
F |
554 |
406 |
375 |
R-Squared |
0.18 |
-- |
-- |
Note: p-values (p>|z|) in parentheses below the coefficient estimate, omitted when p<.001.
(a) All standard errors corrected for group heteroskedasticity caused by the NFHS complex survey design. Strata are rural/urban areas of each state in each NFHS sample. Household effects are ignored.
(b) Odds ratios reported; numbers <1 indicate negative relationships.
* p<0.1; ** p<0.05; *** p<0.01
Table S12. Comparison of OLS, logit, and ordered logit specifications: NFHS-3
|
Estimation method (a) |
||
Independent Variables |
OLS |
Logit (b) |
Ordered logit (b) |
GSP/c |
-0.00028*** |
0.99*** |
0.996*** |
|
(0.003) |
|
|
W |
-0.026** |
0.65** |
0.76** |
|
(0.032) |
(0.022) |
(0.044) |
WR (PSU) |
0.0013 |
1.06 |
1.003 |
|
(0.870) |
(0.654) |
(0.973) |
WR×Land |
0.013** |
1.20** |
1.18*** |
|
(0.014) |
(0.028) |
(0.007) |
Land acres×urban ×100 |
-0.042 |
0.66 |
0.68 |
|
(0.114) |
(0.389) |
(0.260) |
Land acres×rural×100 |
0.033 |
1.31 |
1.31 |
|
(0.167) |
(0.513) |
(0.293) |
Illiterate |
-0.0020 |
0.95 |
0.95 |
|
(0.618) |
(0.367) |
(0.206) |
Education, self |
-0.0031*** |
0.95*** |
0.96*** |
Education, partner |
-0.00062** |
0.99 |
0.994** |
|
(0.037) |
(0.263) |
(0.048) |
Paid work |
-0.0027 |
0.97 |
0.99 |
|
(0.334) |
(0.496) |
(0.692) |
Other work |
0.016*** |
1.19** |
1.15*** |
|
(0.001) |
(0.01) |
(0.004) |
Media exposure |
-0.0095*** |
0.89*** |
0.92*** |
|
|
(0.003) |
(0.003) |
Religion: Ref. Hindu |
|
|
|
Muslim |
-0.010*** |
0.75*** |
0.86*** |
|
(0.004) |
|
|
Sikh |
0.033*** |
1.43*** |
1.36*** |
|
|
(0.004) |
(0.002) |
Christian |
-0.025*** |
0.57*** |
0.63*** |
Other |
-0.024*** |
0.90 |
0.86 |
|
(0.006) |
(0.440) |
(0.142) |
Scheduled caste |
0.0023 |
1.05 |
1.05 |
|
(0.447) |
(0.283) |
(0.165) |
Scheduled tribe |
-0.0050 |
0.88* |
0.93 |
|
(0.313) |
(0.077) |
(0.142) |
Age (respondent) |
-0.000067 |
0.996 |
0.998 |
|
(0.67) |
(0.105) |
(0.255) |
Sons |
0.022*** |
1.35*** |
1.24*** |
Daughters |
-0.012*** |
0.84*** |
0.87*** |
Sons, dead |
0.0041* |
1.05 |
1.04 |
|
(0.054) |
(0.106) |
(0.103) |
Daughters, dead |
-0.000034 |
0.998 |
0.998 |
|
(0.989) |
(0.963) |
(0.916) |
Ideal-total |
-0.00089 |
2.57*** |
2.35*** |
|
(0.905) |
|
|
Ideal-total squared |
0.000022 |
0.89*** |
0.95*** |
|
(0.982) |
|
|
Odd ideal |
0.19*** |
42.4*** |
21.52*** |
Region: Ref. East |
|
|
|
North |
0.0078** |
1.14** |
1.07* |
|
(0.043) |
(0.022) |
(0.077) |
West |
0.011*** |
1.24*** |
1.16*** |
|
(0.009) |
|
(0.001) |
South |
-0.044*** |
0.40*** |
0.48*** |
Urban Residence |
-0.0074** |
0.86 |
0.90** |
|
(0.032) |
(0.006) |
(0.012) |
Constant |
0.082*** |
0.011*** |
|
Constant 0-1 |
|
|
51*** |
Constant 1-2 |
|
|
954*** |
Regression Statistics |
|
|
|
N |
83,785 |
83,785 |
83,785 |
F |
421 |
344 |
300 |
R-Squared |
0.17 |
-- |
-- |
Note: p-values (p>|z|) are in parentheses below the coefficient estimate; they are omitted when p<.001.
(a) All standard errors corrected for group heteroskedasticity caused by the NFHS complex survey design. Strata are rural/urban areas of each state in each NFHS sample. Household effects are ignored.
(b) Odds ratios reported; numbers <1 indicate negative relationships.
* p<0.1; ** p<0.05; *** p<0.01
The estimation treats household effects as random and state effects as fixed. Standard errors are not corrected for clustering at the PSU level. This is not the ideal multilevel procedure (multilevel logit estimation was not feasible for such a large data set and complex model structure given our computer resources at the time), but it gives a good idea of whether the multilevel results in the text suffer from a less than ideal distribution of the dependent variable.
Here the dependent variable is dichotomous: either the woman declared preferring more sons or she did not. Results of the analysis are reported in Table S13. As above, results are reported as odds ratios so that the magnitude of effects can be easily compared. An estimated odds ratio below 1 is equivalent to a negative relationship, while an odds ratio above 1 indicates a positive relationship. For odds ratios below 1, the lower the ratio, the greater the effect; the opposite is true for odds ratios above 1, although the relative probabilities cannot be compared directly.
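The link between an odds ratio and the sign of the underlying logit coefficient can be made concrete with a small sketch. The helper names are hypothetical; the two odds ratios used as inputs (0.464 and 1.23) are taken from Table S13 purely as illustrations.

```python
import math


def logit_coefficient(odds_ratio):
    """Logit coefficient implied by an odds ratio (its natural logarithm)."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the regressor."""
    return (odds_ratio - 1.0) * 100.0


for name, ratio in [("odds ratio below 1", 0.464), ("odds ratio above 1", 1.23)]:
    print(name, ratio,
          round(logit_coefficient(ratio), 3),
          round(percent_change_in_odds(ratio), 1))
```

An odds ratio below 1 maps to a negative coefficient and a reduction in the odds; an odds ratio above 1 maps to a positive coefficient and an increase in the odds.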
All variables of interest
(wealth-related) get the same direction of effect and higher significance
level. In wealthier households (in
absolute terms), the odds of being son-preferring are found to be less than one
half the odds of expressing no son preference, on average. The impact of
absolute wealth in reducing the odds of son-preference is found much larger at
the household level than for state wealth. The strength of the relationship
increases by 4% when only land ownership is controlled for, it increases by 22%
when relative wealth and land ownership are both in the estimation. The Akaike
information criteria reveal the same pattern as the multilevel linear
estimation.
Table S13. Son preference models: mixed-effects
logit estimation for NFHS-2
|
Model |
|||
Independent variable |
MW-1 |
MWR-1 |
MWR-2 |
MWR-3 |
GSP/c |
0.984*** |
0.984*** |
0.985*** |
0.986*** |
W |
0.464*** |
0.443*** |
0.345*** |
0.375*** |
|
WR |
|
|
1.22*** |
1.03 |
|
|
|
(0.003) |
(0.681) |
WR×Land |
|
|
|
1.23*** |
|
|
|
|
(0.001) |
Land acres × Urban (×100) |
|
1.0002 |
1.0002 |
1.0001 |
|
|
(0.351) |
(0.377) |
(0.789) |
Land acres × Rural (×100) |
|
1.0006*** |
1.0005*** |
1.0005*** |
|
|
|
|
|
Illiterate |
0.969 |
0.972 |
0.977 |
0.982 |
|
(0.533) |
(0.568) |
(0.646) |
(0.713) |
Education,
self |
0.956*** |
0.957*** |
0.958*** |
0.959*** |
Education,
partner |
0.996 |
0.996 |
0.995 |
0.994* |
|
(0.267) |
(0.220) |
(0.163) |
(0.098) |
Paid
work |
0.956 |
0.962 |
0.967 |
0.969 |
|
(0.174) |
(0.23) |
(0.307) |
(0.337) |
Other
work |
1.09** |
1.07* |
1.07* |
1.06 |
|
(0.030) |
(0.059) |
(0.077) |
(0.134) |
Media
exposure |
0.944* |
0.945* |
0.943* |
0.939* |
|
(0.082) |
(0.089) |
(0.076) |
(0.057) |
Religion: Ref. Hindu |
|
|
|
|
Muslim |
0.814*** |
0.820*** |
0.821*** |
0.829 |
Sikh |
1.35*** |
1.35*** |
1.35*** |
1.33*** |
|
(0.004) |
(0.004) |
(0.005) |
(0.007) |
Christian |
0.649*** |
0.649*** |
0.648*** |
0.647*** |
Other |
0.959 |
0.958 |
0.962 |
0.965 |
|
(0.646) |
(0.641) |
(0.675) |
(0.697) |
Scheduled
caste |
0.979 |
0.987 |
0.992 |
1.0015 |
|
(0.557) |
(0.715) |
(0.822) |
(0.966) |
Scheduled
tribe |
0.767*** |
0.770*** |
0.764*** |
0.766*** |
Age
(respondent) |
0.990*** |
0.990*** |
0.990*** |
0.990*** |
Sons |
1.41*** |
1.41*** |
1.41*** |
1.41*** |
Daughters |
0.826*** |
0.825*** |
0.825*** |
0.825*** |
Sons,
dead |
1.13*** |
1.13*** |
1.13*** |
1.13*** |
Daughters,
dead |
0.973 |
0.973 |
0.973 |
0.973* |
|
(0.279) |
(0.281) |
(0.280) |
(0.276) |
Ideal-total |
2.98*** |
2.98*** |
2.98*** |
2.97*** |
Ideal-total
squared |
0.911*** |
0.911*** |
0.911*** |
0.912*** |
Odd
ideal |
60.2*** |
60.3*** |
60.4*** |
60.2*** |
Fixed
Effects |
|
|
|
|
Urban
Residence |
0.859*** |
0.883*** |
0.906** |
0.940 |
|
|
|
(0.011) |
(0.121) |
Constant |
0.039*** |
0.038*** |
0.037*** |
0.037*** |
State
effects omitted (25) |
|
|
|
|
Random Component (log
variance) |
||||
Household
effect |
0.594** |
0.595** |
0.595** |
0.590** |
|
(0.011) |
(0.011) |
(0.011) |
(0.010) |
Regression
Statistics |
|
|
|
|
LL |
-25,318 |
-25,308 |
-25,303 |
-25,297 |
Akaike Information Criterion |
50,735 |
50,718 |
50,711 |
50,701 |
N |
77886 |
77886 |
77886 |
77886 |
Note: Coefficients reported as odds ratios;
p-values in parentheses (p>|z|), omitted when p<0.001.
*p<.10 ** p<.05 ***p<.01
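Because the table reports odds ratios rather than logit coefficients, it can help to convert a few entries back to log-odds and to percentage changes in the odds. The sketch below does this for four MW-1 entries copied from the table; the function name and the choice of entries are illustrative only and are not part of the original analysis.

```python
import math

# Odds ratios copied from the MW-1 column of the table above.
odds_ratios = {
    "Sons": 1.41,
    "Daughters": 0.826,
    "Sikh": 1.35,
    "Christian": 0.649,
}

def describe(odds_ratio):
    """Return (logit coefficient, percent change in odds) for one odds ratio."""
    log_odds = math.log(odds_ratio)           # coefficient on the logit scale
    pct_change = (odds_ratio - 1.0) * 100.0   # change in odds per unit increase
    return log_odds, pct_change

for name, value in odds_ratios.items():
    coef, pct = describe(value)
    print(f"{name}: logit coefficient {coef:+.3f}, odds change {pct:+.1f}%")
```

An odds ratio above 1 (for example, Sons) corresponds to a positive logit coefficient, and one below 1 (for example, Daughters) to a negative one.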
Following the last question on ideal family size, the NFHS-2 questionnaire included the following questions:
“In your opinion, how much education should be given to girls these days?” followed by
“In your opinion, how much education should be given to boys these days?”
Answers to these questions were used to construct an alternative dependent variable measuring educational bias. Table S14 presents the raw distribution of answers.
Table S14. Raw distribution of answers on educational preferences

| Answer | Frequency: Girls | Frequency: Boys | Percent: Girls | Percent: Boys |
|---|---|---|---|---|
| No education | 830 | 163 | 0.99 | 0.19 |
| Less than primary | 660 | 123 | 0.78 | 0.15 |
| Primary | 4,264 | 741 | 5.06 | 0.88 |
| Middle | 6,461 | 1,983 | 7.67 | 2.35 |
| High school | 15,626 | 8,069 | 18.55 | 9.58 |
| Higher secondary | 7,320 | 7,564 | 8.69 | 8.98 |
| Graduate and above | 6,930 | 9,245 | 8.23 | 10.97 |
| Professional degree | 3,605 | 6,425 | 4.28 | 7.63 |
| As much as he/she desires | 29,601 | 39,194 | 35.13 | 46.52 |
| Depends | 7,269 | 9,366 | 8.63 | 11.12 |
| Don't know | 1,686 | 1,379 | 2.00 | 1.64 |
| Total | 84,252 | 84,252 | 100 | 100 |
Answers were converted into approximate years of education (Y), up to Y=12 for higher secondary; Y=14 was used for anything above secondary. Answers of “as much as he/she desires” were also given a value of 14. All answers that were exactly the same for boys and girls (including “Don’t know” and “Depends”) were coded as zero bias; other “Don’t know” and “Depends” answers were dropped. The educational bias variable was calculated as
,
where subscript b is for boys and g is for girls. The variable is highly skewed toward more education for boys, with less than 1% of the responses indicating more education for girls (Figure S3). The mean and standard deviation of EduBias are 0.09 and 0.18, respectively; the median and mode are zero.
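The coding rules described above can be sketched as follows. This is a minimal illustration, not the author's code: only Y=12 for higher secondary and Y=14 for higher levels and for “as much as he/she desires” come from the text, while the year values for lower levels are assumptions. The function returns the raw gap in years (boys minus girls) and does not apply the normalization used to build EduBias.

```python
# Approximate years of education per answer category.
# Values marked "assumed" are NOT given in the supplement.
YEARS = {
    "No education": 0,                 # assumed
    "Less than primary": 3,            # assumed
    "Primary": 5,                      # assumed
    "Middle": 8,                       # assumed
    "High school": 10,                 # assumed
    "Higher secondary": 12,            # stated in the text
    "Graduate and above": 14,          # stated: 14 for anything above
    "Professional degree": 14,         # stated: 14 for anything above
    "As much as he/she desires": 14,   # stated in the text
}
UNCODED = {"Depends", "Don't know"}

def year_gap(girls_answer, boys_answer):
    """Return Y_b - Y_g in years, 0 for identical answers, None if dropped."""
    if girls_answer == boys_answer:
        return 0          # identical answers, including Depends / Don't know
    if girls_answer in UNCODED or boys_answer in UNCODED:
        return None       # other Depends / Don't know answers are dropped
    return YEARS[boys_answer] - YEARS[girls_answer]

print(year_gap("High school", "Graduate and above"))
print(year_gap("Depends", "Depends"))
print(year_gap("Don't know", "Primary"))
```

Under this coding, two different answers that map to the same number of years (for example, “Graduate and above” and “Professional degree”) also yield a gap of zero.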
Figure S3. Distribution of the educational bias variable (NFHS-2)
Although the correlation between EduBias and SP is not as high as one would wish for an alternative dependent variable (r=.12), EduBias is likely to capture a large part of the gender bias expressed in son preference. The model is run using the same linear multilevel method as for the SP model. Elasticities for the variables of interest are compared to SP elasticities in the article. Table S15 reports full results on coefficients and regression statistics.
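For orientation, a linear multilevel model with the nesting reported in Table S15 (states, PSUs, households, respondents) can be written as below. This is a generic sketch that assumes random intercepts only and normally distributed components; the index names (i for respondent, h for household, j for PSU, s for state) and the symbols are illustrative and are not taken from the article.

```latex
\[
\mathit{EduBias}_{ihjs} = \beta_0 + \mathbf{x}_{ihjs}'\boldsymbol{\beta}
  + u_s + v_{js} + w_{hjs} + \varepsilon_{ihjs},
\]
\[
u_s \sim N(0,\sigma_u^2),\quad
v_{js} \sim N(0,\sigma_v^2),\quad
w_{hjs} \sim N(0,\sigma_w^2),\quad
\varepsilon_{ihjs} \sim N(0,\sigma_\varepsilon^2).
\]
```

In this notation, the standard deviations listed under the random components of Table S15 would correspond to $\sigma_u$ (state), $\sigma_v$ (PSU), $\sigma_w$ (household), and $\sigma_\varepsilon$ (residual).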
Table S15. Multilevel linear estimation of stated educational bias: NFHS-2 sample

| Level, variables, statistics | MW-1 | MWR-1 | MWR-2 | MWR-3 |
|---|---|---|---|---|
| State level: | | | | |
| GSP/c | -0.00184 (0.012) | -0.00184 (0.012) | -0.00181 (0.013) | -0.00184 (0.011) |
| Household level: | | | | |
| W | -0.109 | -0.109 | -0.12 | -0.124 |
| WR | | | 0.00758 (0.063) | |
| WR×No land (×100) | | | | 0.0149 (0.002) |
| WR×Land (×100) | | | | 0.00643 (0.117) |
| Land acres × Urban | | 0.0135 (0.253) | 0.0135 (0.253) | 0.0185 (0.121) |
| Land acres × Rural | | -0.0029 (0.682) | -0.0043 (0.542) | -0.0011 (0.877) |
| Individual level | | | | |
| Illiterate | 0.0163 | 0.0163 | 0.0164 | 0.0163 |
| Education, self | -0.00088 (0.002) | -0.000875 (0.002) | -0.000849 (0.003) | -0.000881 (0.002) |
| Education, partner | -0.00227 | -0.00228 | -0.00229 | -0.00226 |
| Paid work | 0.0079 | 0.00789 | 0.00802 | 0.00793 |
| Other work | 0.0130 | 0.0131 | 0.0129 | 0.0133 |
| Media exposure | -0.00795 | -0.00794 | -0.00805 | -0.00791 |
| Religion: Ref. Hindu | | | | |
| Muslim | 0.0102 | 0.0102 | 0.0103 | 0.00993 |
| Sikh | -0.00186 (0.734) | -0.00186 (0.735) | -0.00195 (0.722) | -0.0013 (0.812) |
| Christian | -0.00867 (0.027) | -0.00866 (0.028) | -0.00864 (0.028) | -0.00865 (0.028) |
| Other | -0.0157 (0.001) | -0.0156 (0.001) | -0.0155 (0.001) | -0.0156 (0.001) |
| Scheduled caste | 0.0062 (0.001) | 0.00618 (0.001) | 0.00634 (0.001) | 0.00597 (0.001) |
| Scheduled tribe | 0.00836 (0.002) | 0.00834 (0.002) | 0.00818 (0.002) | 0.00809 (0.002) |
| Age (respondent) | -0.00028 (0.001) | -0.000276 (0.002) | -0.000274 (0.002) | -0.000276 (0.002) |
| Sons | 0.00348 | 0.00348 | 0.00346 | 0.00347 |
| Daughters | 0.00625 | 0.00625 | 0.00624 | 0.00624 |
| Fixed Effects | | | | |
| Region (Ref. East) | | | | |
| North | 0.0537 (0.001) | 0.0538 (0.001) | 0.0542 (0.001) | 0.0545 (0.001) |
| Central & West | 0.0755 | 0.0755 | 0.0755 | 0.0756 |
| South | 0.0192 (0.275) | 0.0193 (0.275) | 0.0194 (0.270) | 0.0192 (0.270) |
| Urban Residence | -0.0169 | -0.0172 | -0.0158 | -0.0173 |
| Constant | 0.14 | 0.14 | 0.14 | 0.141 |
| Random Components (Standard Deviations) | | | | |
| Level 1: State | 0.029*** | 0.029*** | 0.029*** | 0.029*** |
| Level 2: PSU | 0.043*** | 0.043*** | 0.043*** | 0.043*** |
| Level 3: Household | 0.071*** | 0.071*** | 0.071*** | 0.071*** |
| Residual error | 0.141*** | 0.141*** | 0.141*** | 0.141*** |
| Regression Statistics | | | | |
| Akaike Information Criterion | -60443 | -60440 | -60642 | -60448 |
| Nested groups (unbalanced) | | | | |
| States | 26 | 26 | 26 | 26 |
| Local areas (PSUs) | 3,127 | 3,127 | 3,127 | 3,127 |
| Households | 65,123 | 65,123 | 65,123 | 65,123 |
| N | 74,168 | 74,168 | 74,168 | 74,168 |
Note: p-values in parentheses (p>|z|), omitted when p<.001.
*p<.1 ** p<.05 ***p<.01
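One way to read the random components of Table S15 is to square the reported standard deviations and express each variance as a share of their sum. The sketch below performs only that arithmetic on the published figures, which are identical across the four models; the variable names are illustrative, and this decomposition is an added reading aid, not a result reported by the author.

```python
# Standard deviations of the random components, as reported in Table S15.
std_devs = {
    "state": 0.029,
    "psu": 0.043,
    "household": 0.071,
    "residual": 0.141,
}

# Convert standard deviations to variances and compute each level's share.
variances = {level: sd ** 2 for level, sd in std_devs.items()}
total_variance = sum(variances.values())
shares = {level: var / total_variance for level, var in variances.items()}

for level, share in shares.items():
    print(f"{level}: {share:.1%} of unexplained variance")
```

On these figures, most of the unexplained variance sits at the respondent (residual) level, followed by the household, PSU, and state levels in that order.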
The arguments linking son preference to relative wealth through marriage traditions depend greatly on the geographical reach of marriages. In a world of perfect mobility with geographically unlimited marriage searches, relative wealth should have no effect, at least none explained through marriage, because its effect would be confounded with that of absolute wealth. Although the empirical analysis finds that relative wealth does have a significant effect, the model developed here cannot identify the role of each theoretical argument in explaining the magnitude of that effect; this may provide some direction for future research.
a. Evidence from
anthropological and sociological literatures. What do we know about the
geographical reach of marriages in
Klass
(1966) provides anthropological evidence on marriage in West Bengal where, as
opposed to other parts of
“[...] while the villager considers himself free to
choose any locality, comparatively few marriages are contracted within the same
village or between families of very distant villages. The majority of marriages
are arranged with families of villages other than one's own, but within a
radius of about five to ten miles.” (Klass 1966: 961)
Dutt,
Noble, and Davgun (1981) report a radius of 25 miles for 80% of marriages in
two village communities in Punjab (
Rosenzweig
and Stark (1989) show that distant marriages are not a feature of richer
families but a way to diversify risk. To test their hypothesis, they use a
panel of six farm villages in three different agroclimatic regions in the
semi-arid tropics of
Babu and Naidu (1992) report mean marriage distances in three endogamous caste populations in Andhra Pradesh ranging between 17 and 40 kilometers (approximately 10 to 25 miles).
From rules of village and kin exogamy/endogamy, one could deduce that marriage distance must be shorter in areas with endogamous rules (mostly the South) and longer in areas with exogamous rules (North). However, Dalmia and Lawrence (2005) find no difference in average distance of marriage migration between the two regions.
b. Dynamics. As mentioned above, marriage rules in
References
Babu, B.V. and J. M. Naidu. 1992. Marriage distance among four caste population in Andhra Pradesh. Man in
Barber, Jennifer S. 2004. “Community Social Context and Individualistic Attitudes toward Marriage.” Social Psychology Quarterly 67(3): 236-256.
Dutt, A. K., A. G. Noble, and S. K. Davgun. 1981. Socio-Economic Factors Affecting Marriage Distance in Two Sikh Villages of
Klass, Morton. 1966. “Marriage Rules in
Rosenzweig, Mark R. and Oded Stark. 1989. Consumption Smoothing, Migration, and Marriage: Evidence from Rural