The chi-square results are shown below. Although the negligible p-value suggests a strong dependence between race and the decision to lay off an employee, these data alone are insufficient to conclude that discrimination occurred. The chi-square test cannot take into account factors other than race that may have contributed to the decision to lay off an employee. For example, if seniority in the firm is highly correlated with race (so that employees with longer tenure with the company tend to be more likely to be white), then decisions that are based on employee seniority may appear to be based on race.
Chi-Square Test

Expected counts are printed below observed counts

           white     black     Total
    1       1051       113      1164
         1036.58    127.42

    2         31        20        51
           45.42      5.58

Total       1082       133      1215

Chi-Sq = 0.201 + 1.631 + 4.577 + 37.232 = 43.641
DF = 1, P-Value = 0.000
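The arithmetic behind the output above can be checked by hand. The following is a minimal sketch in Python using only the standard library; the function name `chi_square_independence` is ours and not part of any package, and the p-value shortcut through `math.erfc` is valid only when DF = 1.

```python
import math

def chi_square_independence(observed):
    """Return (expected counts, chi-square statistic, degrees of freedom)
    for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Observed counts from the table above (columns: white, black)
observed = [[1051, 113], [31, 20]]
expected, chi2, df = chi_square_independence(observed)

# For DF = 1 only, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 2), df, p_value)
```

The p-value computed this way is far below 0.0005, which is why the output rounds it to 0.000.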
(There are several ways to do this; below is just one example.)
Tabulated Statistics

Count (Expected Frequency)

            |   Active   | Terminated | Total |
Age < 40    | 18 (14.09) |  7 (10.91) |  25   |
Age ≥ 40    | 13 (16.91) | 17 (13.09) |  30   |
Total       |     31     |     24     |  55   |

Chi-Square = 4.556, DF = 1, P-Value = 0.033
The chi-square analysis indicates that, were age unrelated to the decision to terminate an employee, the expected number of terminated employees under age 40 would be nearly 11, while the expected number aged 40 or over would be about 13. Instead, only 7 employees under 40 were terminated, while 17 aged 40 or over were terminated. The p-value of 0.033 means that, if the decision to terminate were independent of the employee’s age (that is, if the decisions were made at random with respect to age), the probability of a discrepancy at least this large between the observed and expected counts would be only 0.033.
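For a 2×2 table the same statistic has a closed form, N(ad − bc)² divided by the product of the four marginal totals, which makes the age figures above easy to verify. This is a hedged sketch: the helper name `chi_square_2x2` is ours, and it assumes no continuity correction was applied, which appears consistent with the reported 4.556.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
        a  b
        c  d
    together with its p-value on 1 degree of freedom."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Age < 40: 18 active, 7 terminated; Age >= 40: 13 active, 17 terminated
chi2, p_value = chi_square_2x2(18, 7, 13, 17)
print(round(chi2, 3), round(p_value, 3))  # 4.556 0.033
```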
Tabulated Statistics

Count (Expected Frequency)

              |   Active   | Terminated | Total |
Wages < mean  | 22 (15.22) |  5 (11.78) |  27   |
Wages ≥ mean  |  9 (15.78) | 19 (12.22) |  28   |
Total         |     31     |     24     |  55   |

Chi-Square = 13.605, DF = 1, P-Value = 0.000
The above chi-square analysis shows that the real relationship is between wages and status.
Which is more convincing:
Clearly the plaintiffs have the better case here, at least given the data that we have. The notion that termination decisions were made based on salaries rather than age does not pass the proverbial “giggle test”, since salaries are highly correlated with age (the correlation coefficient between the two is 0.964 here). Furthermore, the defendants specifically claimed that the decisions were made randomly.
“Wringing” the Bell Curve (15 points)
a. Comment on each of the problems –
Problem 1: If there are interactions between the independent variables included in the regression that are not accounted for in the model, the β’s and their associated significance values cannot properly be interpreted. For example, IQ, socioeconomic status, and age may very well interact with one another in producing an effect on income.
Problem 2: The positive value of the β coefficient on the IQ variable tells us only that IQ and income are associated. Such an association cannot properly be interpreted as evidence of causality. It may be that IQ is positively correlated with other factors not included in the model (e.g., drive or ambition) that are in fact what lead to higher income.
Problem 3: By normalizing the predictor variable IQ, the relationship between IQ and the outcome variable could be affected. That is, a relationship might “artificially” appear to exist when in fact it does not. (This is really a very picky point, though.)
Problem 4: If educational attainment and IQ are collinear – as seems likely – then the choice to include one but not the other in the model ignores the effect that the omitted variable has on the dependent variable. That is, the “true” effects of the variables overlap, but by including only one of the variables, the effects of both are attributed to the one that is included.
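The mechanism in Problem 4 can be illustrated with a small simulation. The sketch below uses entirely synthetic data, not the Bell Curve data: the variable names, the true coefficients of 1.0, and the 0.5 link between education and IQ are all made-up assumptions chosen only to show how an omitted collinear variable inflates the coefficient on the included one.

```python
import random

def simple_slope(x, y):
    """Least-squares slope from regressing y on x alone."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic, standardized "IQ"; "education" is partly driven by IQ
iq = [rng.gauss(0, 1) for _ in range(n)]
educ = [0.5 * q + rng.gauss(0, 1) for q in iq]

# Hypothetical truth: IQ and education each add 1.0 to "income"
income = [1.0 * q + 1.0 * e + rng.gauss(0, 1) for q, e in zip(iq, educ)]

# Regressing on IQ alone omits education, so the slope is pushed
# toward 1.0 + 1.0 * 0.5 = 1.5 rather than the true 1.0
slope_iq_only = simple_slope(iq, income)
print(round(slope_iq_only, 2))
```

In this setup the IQ-only slope lands near 1.5, so half of the apparent "IQ effect" actually belongs to the omitted education variable.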
b. Propose a different model –
Different specifications of the model might include interaction effects and test for the significance of the coefficients on the interaction variables. Also, including level of education in the model and testing the significance of its coefficient with and without IQ in the model would be useful. Any model should be tested for its overall usefulness in predicting income via an F-test.
Chattergee (10 points)
Regression Analysis

The regression equation is
LogUsage = 4.31 - 1.60 LogTemp

Predictor       Coef     StDev        T      P
Constant      4.3083    0.1819    23.69  0.000
LogTemp      -1.5989    0.1059   -15.09  0.000

S = 0.1051    R-Sq = 81.1%    R-Sq(adj) = 80.8%

Analysis of Variance

Source            DF       SS       MS        F      P
Regression         1   2.5167   2.5167   227.77  0.000
Residual Error    53   0.5856   0.0110
Total             54   3.1023

Unusual Observations

Obs  LogTemp  LogUsage     Fit  StDev Fit  Residual  St Resid
  3     1.76    1.2858  1.5009     0.0149   -0.2152    -2.07R
 45     1.74    1.2868  1.5257     0.0145   -0.2389    -2.29R
 47     1.86    1.0176  1.3387     0.0210   -0.3211    -3.12R
 54     1.38    2.0056  2.1016     0.0378   -0.0960    -0.98 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values

   Fit  StDev Fit          95.0% CI            95.0% PI
1.5919     0.0142  (1.5634, 1.6205)    (1.3792, 1.8047)
LogUsage = 1.5919 ⇒ Usage = 10^1.5919 = 39.075 kWh
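The back-transformation can be checked directly. This sketch assumes the logs are base 10, which is consistent with 10^1.5919 ≈ 39.075. The temperature of 50 is our inference from the fitted value and is not stated in the output; the function names are ours. Coefficients and interval limits are copied from the regression output.

```python
import math

# Coefficients from the fitted equation LogUsage = 4.3083 - 1.5989 LogTemp
B0, B1 = 4.3083, -1.5989

def predict_log_usage(temp):
    """Fitted LogUsage for a given temperature (base-10 logs assumed)."""
    return B0 + B1 * math.log10(temp)

def back_transform(log_value):
    """Convert a base-10 log value back to the original units."""
    return 10 ** log_value

# Point prediction reported in the output
usage = back_transform(1.5919)

# Endpoints of the 95% prediction interval, converted to kWh
pi_low, pi_high = back_transform(1.3792), back_transform(1.8047)

# Assumption: a temperature of 50 reproduces the reported fit
fit_at_50 = predict_log_usage(50)

print(round(usage, 2), round(pi_low, 1), round(pi_high, 1), round(fit_at_50, 3))
```

Because the log transform is monotone, transforming the interval endpoints gives a 95% prediction interval for Usage itself, roughly 24 to 64 kWh.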
11.27 (30 points)
a. E(y) = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5
β0 is the constant term. Holding the other variables constant, β1 gives the additional salary attributable to being male; β2 the additional salary attributable to being white; β3 the additional salary per year of education; β4 the additional salary per year with the firm; and β5 the additional salary per hour worked per week.
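To make the interpretation concrete, the sketch below evaluates E(y) for two employees who differ only in gender. The coefficient values are placeholders invented for illustration (the fitted estimates are not reproduced in this answer), and the function and argument names are ours.

```python
def expected_salary(beta, male, white, educ_years, tenure_years, hours_per_week):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5."""
    x = [1, male, white, educ_years, tenure_years, hours_per_week]
    return sum(b * xi for b, xi in zip(beta, x))

# Placeholder coefficients, NOT estimates from the exercise
beta = [10000.0, 1500.0, 800.0, 1200.0, 400.0, 50.0]

# Two otherwise identical employees: 16 years of education,
# 5 years with the firm, 40 hours per week
man = expected_salary(beta, 1, 1, 16, 5, 40)
woman = expected_salary(beta, 0, 1, 16, 5, 40)

# The gap equals b1 regardless of the values held fixed
print(man - woman)  # 1500.0
```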
b. Interpret the β’s as above.
c. R² = 0.240 suggests that approximately 24% of the variation in salaries is explained by the independent variables included in the model. For α = 0.05, Fα = 2.42; the model's F statistic exceeds this critical value, so reject H0. (Alternatively, p = 0.0000...)
d. H0: β1 = 0, HA: β1 > 0. p = 0.025, so we will reject H0.
e. A discrepancy in the salary figures alone is not enough to conclude that the difference stems from gender discrimination. For example, if women in the sample are generally less educated than the men in the sample, or have been at the firm for fewer years, we would expect them to receive lower salaries for these reasons. This would not be indicative of discrimination. By including these factors in the model, we can make inferences about the salary discrepancy that remains after these other influences are accounted for.
f. If gender and tenure with the firm interact, the β coefficients on these variables will be biased. That is, the model does not account for the possibility that the effect of a change in one of these variables depends on the value of the other.
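One way to see what an interaction term would add: with a gender × tenure product in the model, the male/female gap is no longer a single number but changes with years at the firm. The values below are made-up placeholders, and `b_int` (the interaction coefficient) is a hypothetical addition that is not part of the model in part a.

```python
def gender_gap(b_male, b_int, tenure_years):
    """Male minus female expected salary, other variables held fixed,
    in a model containing b_male*x1 + b_tenure*x4 + b_int*x1*x4.
    The b_tenure*x4 term cancels in the difference."""
    return b_male + b_int * tenure_years

# Placeholder values, NOT estimates from the exercise
b_male, b_int = 1500.0, 100.0

# With an interaction, the gap grows (or shrinks) with tenure
for tenure in (0, 5, 10):
    print(tenure, gender_gap(b_male, b_int, tenure))

# Without the interaction term (b_int = 0) the gap is constant
print(gender_gap(b_male, 0.0, 10))
```

A model without the product term forces the gap to be the same at every tenure level, which is the restriction part f warns about.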