Final Assignment

Suggested answers

  1. Take a random sample of size 25, with replacement, from the original sample. Calculate the proportion of students in this simulated sample who work 5 or more hours. Repeat this process 1000 times to build the bootstrap distribution. Take the middle 95% of this distribution to construct a 95% confidence interval for the true proportion of statistics majors who work 5 or more hours.
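The resampling procedure described above can be sketched in Python. This is a minimal illustration, not the course's own code: the data (15 of 25 students working 5 or more hours) are hypothetical, since the original sample is not reproduced here, and the function name `bootstrap_ci` and the fixed seed are assumptions made for the example.

```python
import random


def bootstrap_ci(sample, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for a proportion; `sample` is a list of 0/1 values."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    n = len(sample)
    props = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        props.append(sum(resample) / n)      # proportion of 1s in this resample
    props.sort()
    tail = (1 - level) / 2
    lower = props[round(tail * reps)]            # cut off the lowest 2.5%
    upper = props[round((1 - tail) * reps) - 1]  # cut off the highest 2.5%
    return lower, upper


# Hypothetical data: 1 = works 5 or more hours, 0 = does not.
sample = [1] * 15 + [0] * 10
lower, upper = bootstrap_ci(sample)
print(f"95% bootstrap CI: ({lower:.2f}, {upper:.2f})")
```

The bounds vary slightly from run to run when the seed changes, which is why answers close to the stated interval are reasonable.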

  2. The exact 95% CI is (40%, 80%). Answers reasonably close to the upper and lower bounds would be accepted.

  3. (e) None of the above. The correct interpretation is “We are 95% confident that 40% to 80% of statistics majors work at least 5 hours per week.”

  4. (c) For every additional $1,000 of annual salary, the model predicts the raise to be higher, on average, by 0.016%.

  5. $R^2$ of raise_2_fit is higher than $R^2$ of raise_1_fit since raise_2_fit has one more predictor and $R^2$ always increases (or at least never decreases) when a predictor is added to the model.
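The behaviour of R² under an added predictor can be checked numerically. The sketch below fits two nested least-squares models in plain Python and compares their R² values. All data are hypothetical, and the helper names (`solve`, `r_squared`) are assumptions for illustration; they are not the models raise_1_fit and raise_2_fit themselves.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: salary in $1,000s, years of service, percent raise.
salary = [40, 55, 60, 72, 85, 90, 110, 120]
years = [1, 3, 2, 6, 4, 8, 7, 10]
raise_pct = [1.8, 2.1, 2.0, 2.9, 2.6, 3.4, 3.3, 4.0]

r2_one = r_squared([[s] for s in salary], raise_pct)
r2_two = r_squared([[s, t] for s, t in zip(salary, years)], raise_pct)
print(f"one predictor: {r2_one:.4f}, two predictors: {r2_two:.4f}")
```

Because the one-predictor model is a special case of the two-predictor model, the second fit can never have a larger residual sum of squares, so its R² cannot be lower.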

  6. The reference level of performance_rating is High, since it’s the first level alphabetically. Therefore, the coefficient -2.40% is the predicted difference in raise for those with a Successful rating compared with those with a High rating. In this context a negative coefficient makes sense, since we would expect those with a High performance rating to get higher raises than those with a Successful rating.

  7. (a) “Poor”, “Successful”, “High”, “Top”.

  8. Option 3. The model is linear with no interaction effect, so the two lines are parallel. Since the coefficient for salary_typeSalaried is positive, the line for salaried employees has the higher intercept. The equations of the lines are as follows:

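    • Hourly:

      $$\begin{aligned}\widehat{\text{percent\_incr}} &= 1.24 + 0.0000137 \times \text{annual\_salary} + 0.913 \times \text{salary\_typeSalaried} \\ &= 1.24 + 0.0000137 \times \text{annual\_salary} + 0.913 \times 0 \\ &= 1.24 + 0.0000137 \times \text{annual\_salary}\end{aligned}$$

    • Salaried:

      $$\begin{aligned}\widehat{\text{percent\_incr}} &= 1.24 + 0.0000137 \times \text{annual\_salary} + 0.913 \times \text{salary\_typeSalaried} \\ &= 1.24 + 0.0000137 \times \text{annual\_salary} + 0.913 \times 1 \\ &= 2.153 + 0.0000137 \times \text{annual\_salary}\end{aligned}$$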
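To see why the two lines are parallel, the fitted equation can be evaluated for both salary types. The sketch below uses the coefficients quoted in this answer; the function name `predict_percent_incr` and the example salaries are assumptions for illustration.

```python
INTERCEPT = 1.24          # predicted percent increase at salary 0 for hourly employees
SLOPE_SALARY = 0.0000137  # change in predicted percent increase per additional dollar
COEF_SALARIED = 0.913     # shift in intercept for salaried employees


def predict_percent_incr(annual_salary, salaried):
    """Evaluate the no-interaction model for one employee."""
    indicator = 1 if salaried else 0  # salary_typeSalaried dummy variable
    return INTERCEPT + SLOPE_SALARY * annual_salary + COEF_SALARIED * indicator


for salary in (40_000, 80_000, 120_000):  # hypothetical salaries
    hourly = predict_percent_incr(salary, salaried=False)
    salaried = predict_percent_incr(salary, salaried=True)
    print(salary, round(hourly, 3), round(salaried, 3), round(salaried - hourly, 3))
```

The gap between the two predictions is 0.913 at every salary, which is what a shared slope with different intercepts means.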

  9. A parsimonious model is the simplest model with the best predictive performance.

  10. (c) The exponentiated coefficient (6.502427) represents the factor by which the percentage increase is higher for Successful ratings compared to Poor ratings.

  11. (a) and (d).

  12. Let $u(x) = \sin(x^2) + \cos(ax)$. Then, $g(x) = [u(x)]^k$.

    Using the chain rule, we get:

    $$g'(x) = k\,[u(x)]^{k-1}\,u'(x)$$

    Now, we need to compute $u'(x)$, where:

    $$u(x) = \sin(x^2) + \cos(ax)$$

    Using the chain rule for each term:

    $$\frac{d}{dx}\sin(x^2) = \cos(x^2)\cdot 2x, \qquad \frac{d}{dx}\cos(ax) = -\sin(ax)\cdot a$$

    Thus,

    $$u'(x) = 2x\cos(x^2) - a\sin(ax)$$

    Combining these results:

    $$g'(x) = k\left(\sin(x^2) + \cos(ax)\right)^{k-1}\left(2x\cos(x^2) - a\sin(ax)\right)$$
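A quick numerical check of this derivative is possible with a central difference. The sketch below compares the chain-rule result with a finite-difference estimate; the values of a, k, and the evaluation point are hypothetical, chosen only for illustration.

```python
import math


def g(x, a, k):
    """g(x) = (sin(x^2) + cos(a x))^k."""
    return (math.sin(x ** 2) + math.cos(a * x)) ** k


def g_prime(x, a, k):
    """Derivative of g from the chain rule: k * u^(k-1) * u'."""
    u = math.sin(x ** 2) + math.cos(a * x)
    du = 2 * x * math.cos(x ** 2) - a * math.sin(a * x)
    return k * u ** (k - 1) * du


def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, k, x0 = 2.0, 3, 0.7  # hypothetical parameter values
analytic = g_prime(x0, a, k)
numeric = central_difference(lambda x: g(x, a, k), x0)
print(analytic, numeric)
```

The two printed values should agree to several decimal places; a sign error in either term of u'(x) would show up as a clear mismatch.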

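  13. We can split the integral into two separate integrals:

    $$\int_a^b e^{cx}\,dx + \int_a^b \frac{1}{x^n}\,dx$$

    1. Integral of $e^{cx}$:

    $$\int e^{cx}\,dx = \frac{1}{c}e^{cx}$$

    Thus,

    $$\int_a^b e^{cx}\,dx = \frac{1}{c}\left[e^{cx}\right]_a^b = \frac{1}{c}\left(e^{cb} - e^{ca}\right)$$

    2. Integral of $\frac{1}{x^n}$:

    For $n \neq 1$,

    $$\int \frac{1}{x^n}\,dx = \int x^{-n}\,dx = \frac{x^{-n+1}}{-n+1} = \frac{1}{1-n}x^{1-n}$$

    Thus,

    $$\int_a^b \frac{1}{x^n}\,dx = \frac{1}{1-n}\left[x^{1-n}\right]_a^b = \frac{1}{1-n}\left(b^{1-n} - a^{1-n}\right)$$

    Combining these results:

    $$\int_a^b \left(e^{cx} + \frac{1}{x^n}\right)dx = \frac{1}{c}\left(e^{cb} - e^{ca}\right) + \frac{1}{1-n}\left(b^{1-n} - a^{1-n}\right)$$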
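The closed form can be compared against a numerical approximation of the integral. The sketch below uses a simple midpoint rule; the limits and constants (a = 1, b = 2, c = 0.5, n = 3) are hypothetical values chosen so that the integrand is well defined on the interval.

```python
import math


def closed_form(a, b, c, n):
    """Closed-form value of the integral of e^(c x) + 1/x^n from a to b (n != 1, c != 0)."""
    exp_part = (math.exp(c * b) - math.exp(c * a)) / c
    power_part = (b ** (1 - n) - a ** (1 - n)) / (1 - n)
    return exp_part + power_part


def midpoint_rule(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


a, b, c, n = 1.0, 2.0, 0.5, 3  # hypothetical values
exact = closed_form(a, b, c, n)
numeric = midpoint_rule(lambda x: math.exp(c * x) + 1 / x ** n, a, b)
print(exact, numeric)
```

With these values the power term alone contributes 0.375, which matches evaluating $-\frac{1}{2x^2}$ between 1 and 2 by hand.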

  14. The transpose of the vector x is:

    $$\mathbf{x}^\top = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}$$

  15. The transpose of the matrix N is:

    $$N^\top = \begin{bmatrix} n_{11} & n_{21} & n_{31} & n_{41} \\ n_{12} & n_{22} & n_{32} & n_{42} \end{bmatrix}$$
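Transposition swaps rows and columns, so entry (i, j) of the original becomes entry (j, i) of the result. The sketch below illustrates this with symbolic string entries. It assumes N is a 4×2 matrix, which is inferred from the indices in the answer rather than stated in it, and the helper name `transpose` is an assumption for the example.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


# Assumed layout of N: 4 rows, 2 columns, with entries named by their indices.
N = [
    ["n11", "n12"],
    ["n21", "n22"],
    ["n31", "n32"],
    ["n41", "n42"],
]

N_T = transpose(N)
for row in N_T:
    print(row)
```

Transposing twice returns the original matrix, which is a handy sanity check.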

  16. Solution parts:

    1. The dimensions of C are 3×2.
    2. The dimensions of D are 2×3.
    3. For the matrix product CD:
      1. The product is valid because the number of columns in C (which is 2) is equal to the number of rows in D (which is 2).
      2. The dimensions of the resulting matrix CD will be 3×3 (the number of rows of C by the number of columns of D).
  17. Solutions:

    1. The dimensions of E are 3×2.

    2. The dimensions of F are 2×1.

    3. For the matrix product EF:

      1. The product is valid because the number of columns in E (which is 2) is equal to the number of rows in F (which is 2).
      2. The resulting matrix EF is computed as follows:

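      $$EF = \begin{bmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \\ e_{31} & e_{32} \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{21} \end{bmatrix} = \begin{bmatrix} e_{11}f_{11} + e_{12}f_{21} \\ e_{21}f_{11} + e_{22}f_{21} \\ e_{31}f_{11} + e_{32}f_{21} \end{bmatrix}$$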

    The resulting matrix EF has dimensions 3×1.
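The dimension rule used in the last two answers (columns of the left matrix must equal rows of the right matrix, and the result has the rows of the left by the columns of the right) can be sketched in plain Python. The numeric entries below are hypothetical, and the helper names `shape` and `matmul` are assumptions for illustration.

```python
def shape(matrix):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(matrix), len(matrix[0])


def matmul(A, B):
    """Multiply A by B after checking that the inner dimensions match."""
    rows_a, cols_a = shape(A)
    rows_b, cols_b = shape(B)
    if cols_a != rows_b:
        raise ValueError(f"cannot multiply {rows_a}x{cols_a} by {rows_b}x{cols_b}")
    return [
        [sum(A[i][k] * B[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


E = [[1, 2], [3, 4], [5, 6]]  # hypothetical 3x2 matrix
F = [[10], [20]]              # hypothetical 2x1 matrix
EF = matmul(E, F)
print(EF, shape(EF))
```

Each entry of EF is a row of E multiplied term by term with the single column of F, mirroring the symbolic result above. Calling `matmul(F, E)` raises a `ValueError`, since F has 1 column and E has 3 rows.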