Midterm Solutions

Suggested answers

  1. b, c, f, g -

    • The blizzard_salary dataset has 409 rows.

    • The percent_incr variable is numerical and continuous.

    • The salary_type variable is categorical.
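Facts like these can be checked directly in pandas. The following is a minimal sketch on a small, invented stand-in data frame (`toy` and its values are hypothetical, not the real blizzard_salary data): `.shape` gives the row count and `.dtypes` shows which columns are numerical.

```python
import pandas as pd

# Hypothetical stand-in for blizzard_salary; the values are invented.
toy = pd.DataFrame({
    "salary_type": ["Hourly", "Salaried", "Salaried"],
    "percent_incr": [1.5, 3.0, 2.25],
})

n_rows, n_cols = toy.shape   # (number of rows, number of columns)
print(n_rows)

# One dtype per column: a float dtype signals a continuous numerical
# variable, while a text or category dtype signals a categorical one.
print(toy.dtypes)
```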

  2. Figure 1 - A dodged histogram makes it easier to compare summary statistics for the variable on the x-axis.

  3. c - It’s a value higher than the median for hourly but lower than the mean for salaried.

  4. b - There is more variability around the mean compared to the hourly distribution.

  5. a, b, e - Pie charts and waffle charts are for visualizing distributions of categorical data only. Scatterplots are for visualizing the relationship between two numerical variables.

  6. c - .Categorical() (called as pd.Categorical()) is used to create a categorical variable or to modify an existing one.

  7. a - "Poor", "Successful", "High", "Top"
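As an illustration of how the level order above could be set, here is a minimal sketch using `pd.Categorical` with an explicit `categories` list. The `ratings` values are invented, and the exact code in the exam question may differ.

```python
import pandas as pd

# Invented ratings, deliberately out of order.
ratings = pd.Series(["Top", "Poor", "High", "Successful", "Poor"])

# The order of this list becomes the order of the levels.
levels = ["Poor", "Successful", "High", "Top"]
ordered = pd.Categorical(ratings, categories=levels, ordered=True)

print(list(ordered.categories))

# Sorting follows the level order, not alphabetical order.
sorted_values = list(pd.Series(ordered).sort_values())
print(sorted_values)
```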

  8. b - Option 2. The plot in Option 1 shows the number of employees with a given performance rating for each salary type while the plot in Option 2 gives the proportion of employees with a given performance rating for each salary type. In order to assess the relationship between these variables (e.g., how much more likely is a Top rating among Salaried vs. Hourly workers), we need the proportions, not the counts.
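The difference between counts and proportions can be sketched with `pd.crosstab`. The data below are invented (4 hourly and 6 salaried employees), so the numbers are not those of the exam plots. Passing `normalize="index"` converts each row of counts into proportions within that salary type.

```python
import pandas as pd

# Invented data: 4 hourly and 6 salaried employees.
toy = pd.DataFrame({
    "salary_type": ["Hourly"] * 4 + ["Salaried"] * 6,
    "performance_rating": ["Top", "High", "High", "Poor"]
                          + ["Top", "Top", "Top", "High", "High", "Poor"],
})

# Counts: how many employees fall in each cell.
counts = pd.crosstab(toy["salary_type"], toy["performance_rating"])

# Proportions within each salary type: every row sums to 1.
props = pd.crosstab(toy["salary_type"], toy["performance_rating"],
                    normalize="index")

print(counts)
print(props)
```

In this toy example the counts of Top ratings are 1 and 3, but the groups differ in size; the proportions (0.25 for Hourly, 0.5 for Salaried) are what make the two groups comparable.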

  9. There may be some NaNs in these two variables that are not visible in the plot.

  10. The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.

  11. c - blizzard_salary[(blizzard_salary['salary_type'] != "Hourly") & (blizzard_salary['performance_rating'] == "Poor")] - There are 5 observations for “not Hourly” “and” Poor.
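The same filtering pattern can be tried on a small invented data frame (a hypothetical stand-in, so the row count differs from the 5 in the real data). Each condition needs its own parentheses because `&` binds more tightly than `!=` and `==`.

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "salary_type": ["Hourly", "Salaried", "Salaried", "Hourly"],
    "performance_rating": ["Poor", "Poor", "Top", "High"],
})

# Both conditions must hold: not Hourly AND rated Poor.
mask = (toy["salary_type"] != "Hourly") & (toy["performance_rating"] == "Poor")
subset = toy[mask]

print(len(subset))
```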

  12. a - .sort_values() - The result is arranged in increasing order of annual_salary, which is the default for .sort_values().
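A minimal sketch of the default ordering, using invented salaries: `.sort_values()` sorts in increasing order unless `ascending=False` is passed.

```python
import pandas as pd

# Invented salaries for illustration only.
toy = pd.DataFrame({"annual_salary": [52000, 41000, 78000]})

asc = toy.sort_values("annual_salary")                    # default: increasing
desc = toy.sort_values("annual_salary", ascending=False)  # decreasing

print(asc["annual_salary"].tolist())
print(desc["annual_salary"].tolist())
```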

  13. c, d, e, f.

  14. Part 1: The following should be fixed:

    • There should be a | after the # and before label

    • There should be a : after label, not =

    • There shouldn’t be a space in the chunk label; it should be plot-blizzard

    • There should be spaces after commas in the code

    • plt.show() should always be used to remove the plot from the working memory.

    Part 2: Add a category for missing values in performance_rating:

    blizzard_salary['performance_rating'] = blizzard_salary['performance_rating'].fillna('Missing')
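One caveat, offered as a hedged sketch: if performance_rating has already been converted to a categorical column (an assumption; the answer key does not say), pandas will refuse to fill with a value that is not among the existing categories. In that case the new level can be added first with `.cat.add_categories()`. The series below is invented for illustration.

```python
import pandas as pd

# Invented categorical column with one missing value.
s = pd.Series(pd.Categorical(["Top", None, "Poor"],
                             categories=["Poor", "Top"]))

# Register "Missing" as a valid level, then fill the gaps with it.
filled = s.cat.add_categories("Missing").fillna("Missing")

print(list(filled))
print(list(filled.cat.categories))
```

For a plain text (non-categorical) column, calling `.fillna('Missing')` directly is enough.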

  15. Part 1:

    1. Render: Run all of the code and render all of the text in the document and produce an output.
    2. Commit: Take a snapshot of your changes in Git with an appropriate message.
    3. Push: Send your changes off to GitHub.

    Part 2: c - Rendering or committing isn’t sufficient to send your changes to your GitHub repository; a push is needed. A pull is not needed to view the changes in the browser.