Midterm Solutions
Suggested answers
b, c, f, g - The `blizzard_salary` dataset has 409 rows. The `percent_incr` variable is numerical and continuous. The `salary_type` variable is categorical.
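Both facts can be checked directly in pandas with `.shape` and `.dtypes`. The sketch below uses a tiny made-up stand-in for `blizzard_salary` with only two of its columns and invented values, so the printed numbers will not match the real data.

```python
import pandas as pd

# Hypothetical stand-in for blizzard_salary: invented values, only two columns
blizzard_salary = pd.DataFrame({
    "percent_incr": [1.5, 3.0, 0.0, 2.25],
    "salary_type": ["Salaried", "Hourly", "Salaried", "Hourly"],
})

n_rows = blizzard_salary.shape[0]  # number of rows (409 for the real dataset)
print(n_rows)

# percent_incr is a float column; salary_type is stored as text
# (object or string dtype, depending on the pandas version)
print(blizzard_salary.dtypes)
```

A float dtype only tells you the column is numeric. Calling `percent_incr` continuous rather than discrete is still a judgment about what the values represent.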
Figure 1 - A dodged histogram makes it easier to compare summary statistics for the variable on the x-axis.
c - It’s a value higher than the median for hourly but lower than the mean for salaried.
b - There is more variability around the mean compared to the hourly distribution.
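Both of these comparisons rest on per-group summary statistics. Here is a minimal sketch with invented values. It assumes the variable in these two questions is `annual_salary` (a column that appears later in these solutions), grouped by `salary_type`.

```python
import pandas as pd

# Hypothetical toy data; the real blizzard_salary values differ
blizzard_salary = pd.DataFrame({
    "salary_type": ["Hourly"] * 4 + ["Salaried"] * 4,
    "annual_salary": [40000, 42000, 45000, 60000,
                      70000, 90000, 120000, 200000],
})

# Center (median, mean) and spread (standard deviation) for each salary type
summary = (
    blizzard_salary.groupby("salary_type")["annual_salary"]
    .agg(["median", "mean", "std"])
)
print(summary)
```

A value "higher than the median for Hourly but lower than the mean for Salaried" falls between `summary.loc["Hourly", "median"]` and `summary.loc["Salaried", "mean"]`. A larger `std` means more variability around the mean.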
a, b, e - Pie charts and waffle charts are for visualizing distributions of categorical data only. Scatterplots are for visualizing the relationship between two numerical variables.
c - `.Categorical()` is used to create or modify a categorical variable.
a - "Poor", "Successful", "High", "Top"
b - Option 2. The plot in Option 1 shows the number of employees with a given performance rating for each salary type, while the plot in Option 2 shows the proportion of employees with a given performance rating for each salary type. To assess the relationship between these variables (e.g., how much more likely a Top rating is among Salaried vs. Hourly workers), we need the proportions, not the counts, because the counts also reflect how many employees are in each salary type.
There may be some `NaN`s in these two variables that are not visible in the plot.
The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.
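A small pandas sketch shows three things together: the rating order, the difference between counts and proportions, and the hidden `NaN`s. The data are invented, and the column names are assumed to match `blizzard_salary`.

`pd.crosstab(..., normalize="index")` divides each row by its total, so each salary type's proportions sum to 1, as in the Option 2 plot. Rows with a missing rating are silently left out of the table, which is why the `NaN`s do not show up.

The plain `.fillna('Missing')` in the Part 2 answer later on works when the column holds strings. Once the column is a categorical, however, `fillna()` raises an error for any value that is not already a category. The sketch therefore calls `.cat.add_categories()` first.

```python
import pandas as pd

# Hypothetical toy data (not the real values); None marks a missing rating
blizzard_salary = pd.DataFrame({
    "salary_type": ["Hourly", "Hourly", "Hourly", "Hourly",
                    "Salaried", "Salaried", "Salaried", "Salaried", "Salaried"],
    "performance_rating": ["Poor", "Successful", "Successful", None,
                           "High", "Top", "Top", "Successful", None],
})

# Store the ratings as an ordered categorical, lowest to highest
rating_order = ["Poor", "Successful", "High", "Top"]
blizzard_salary["performance_rating"] = pd.Categorical(
    blizzard_salary["performance_rating"], categories=rating_order, ordered=True
)

# Counts per salary type and rating; rows with a missing rating are dropped
counts = pd.crosstab(blizzard_salary["salary_type"],
                     blizzard_salary["performance_rating"])

# Proportions within each salary type (each row sums to 1)
props = pd.crosstab(blizzard_salary["salary_type"],
                    blizzard_salary["performance_rating"], normalize="index")

# Make missing ratings visible: add "Missing" as a category, then fill
blizzard_salary["performance_rating"] = (
    blizzard_salary["performance_rating"]
    .cat.add_categories("Missing")
    .fillna("Missing")
)
props_with_missing = pd.crosstab(blizzard_salary["salary_type"],
                                 blizzard_salary["performance_rating"],
                                 normalize="index")
print(counts)
print(props)
print(props_with_missing)
```

`add_categories()` appends `"Missing"` at the end of the order, so it ranks above `"Top"`. That is harmless for plotting but would matter for order comparisons such as `>`.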
c - `blizzard_salary[(blizzard_salary['salary_type'] != "Hourly") & (blizzard_salary['performance_rating'] == "Poor")]` - There are 5 observations that are not Hourly and have a Poor rating.
a - `.sort_values()` - The result is arranged in increasing order of `annual_salary`, which is the default for `.sort_values()`.
c, d, e, f.
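The filtering and sorting answers above can be tried on a small made-up table. Because the data are not real, the count differs from 5.

```python
import pandas as pd

# Hypothetical toy data; the real blizzard_salary has 409 rows
blizzard_salary = pd.DataFrame({
    "salary_type": ["Hourly", "Salaried", "Salaried", "Hourly", "Salaried"],
    "performance_rating": ["Poor", "Poor", "Top", "Poor", "Poor"],
    "annual_salary": [45000, 98000, 150000, 39000, 72000],
})

# Each condition needs its own parentheses: in Python, & binds more
# tightly than != and ==
not_hourly_poor = blizzard_salary[
    (blizzard_salary["salary_type"] != "Hourly")
    & (blizzard_salary["performance_rating"] == "Poor")
]
print(len(not_hourly_poor))  # number of matching rows

# sort_values() sorts in increasing order unless ascending=False is passed
by_salary = not_hourly_poor.sort_values("annual_salary")
print(by_salary)
```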
Part 1: The following should be fixed:
- There should be a `|` after `#` before `label`.
- There should be a `:` after `label`, not `=`.
- There shouldn't be a space in the chunk label; it should be `plot-blizzard`.
- There should be spaces after commas in the code.
- `plt.show()` should always be used to remove the plot from the working memory.
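With these fixes applied, the corrected chunk would look roughly like the fragment below. The plotting code itself is not reproduced in these solutions, so it appears as a placeholder comment. Only the label line and the closing `plt.show()` follow from the list above.

```{python}
#| label: plot-blizzard

# ... plotting code from the question, with a space after each comma ...
plt.show()
```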
Part 2: Add a category for missing values in `performance_rating`:

```{python}
blizzard_salary['performance_rating'] = blizzard_salary['performance_rating'].fillna('Missing')
```

Part 1:
- Render: Run all of the code, render all of the text in the document, and produce an output.
- Commit: Take a snapshot of your changes in Git with an appropriate message.
- Push: Send your changes off to GitHub.
Part 2: c - Rendering or committing isn't sufficient to send your changes to your GitHub repository; a push is needed. A pull is not needed to view the changes in the browser either.