Midterm Solutions
Suggested answers
b, c, f, g - The `blizzard_salary` dataset has 409 rows. The `percent_incr` variable is numerical and continuous. The `salary_type` variable is categorical.
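As a rough illustration of how the row count and variable types in this answer could be checked, here is a minimal sketch. The `toy` data frame and its values are made up for the example; the real `blizzard_salary` data is not reproduced here.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in rows; the real blizzard_salary dataset has 409 rows.
toy = pd.DataFrame({
    "salary_type": ["Hourly", "Salaried", "Salaried"],
    "percent_incr": [1.0, 2.5, 0.0],
})

n_rows = toy.shape[0]                                  # number of rows
incr_is_numeric = is_numeric_dtype(toy["percent_incr"])  # numerical variable
type_is_numeric = is_numeric_dtype(toy["salary_type"])   # categorical, stored as text

print(n_rows, incr_is_numeric, type_is_numeric)
```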
Figure 1 - A dodged histogram makes it easier to compare summary statistics for the variable on the x-axis.
c - It’s a value higher than the median for hourly but lower than the mean for salaried.
b - There is more variability around the mean compared to the hourly distribution.
a, b, e - Pie charts and waffle charts are for visualizing distributions of categorical data only. Scatterplots are for visualizing the relationship between two numerical variables.
c - `.Categorical()` is used to create or modify a categorical variable.
a - `"Poor", "Successful", "High", "Top"`
b - Option 2. The plot in Option 1 shows the number of employees with a given performance rating for each salary type while the plot in Option 2 gives the proportion of employees with a given performance rating for each salary type. In order to assess the relationship between these variables (e.g., how much more likely is a Top rating among Salaried vs. Hourly workers), we need the proportions, not the counts.
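The two ideas above, ordering the rating levels with `pd.Categorical()` and comparing proportions rather than counts, can be sketched together. The rows below are invented for illustration, and using `pd.crosstab()` is one possible way to get the numbers behind such plots, not necessarily how the exam plots were made.

```python
import pandas as pd

# Made-up rows for illustration only (4 Hourly, 6 Salaried).
toy = pd.DataFrame({
    "salary_type": ["Hourly"] * 4 + ["Salaried"] * 6,
    "performance_rating": [
        "Poor", "Successful", "High", "Top",
        "Top", "Top", "Top", "High", "Successful", "Poor",
    ],
})

# Store the ratings as an ordered categorical, lowest to highest.
levels = ["Poor", "Successful", "High", "Top"]
toy["performance_rating"] = pd.Categorical(
    toy["performance_rating"], categories=levels, ordered=True
)

# Counts per salary type (what a plot like Option 1 shows).
counts = pd.crosstab(toy["salary_type"], toy["performance_rating"])

# Proportions within each salary type (what a plot like Option 2 shows).
props = pd.crosstab(
    toy["salary_type"], toy["performance_rating"], normalize="index"
)

print(counts)
print(props)
```

In this toy data the Salaried group is larger, so its counts are higher across the board. Only the row-wise proportions make the two groups comparable.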
There may be some `NaN`s in these two variables that are not visible in the plot.
The proportions under Hourly would go in the Hourly bar, and those under Salaried would go in the Salaried bar.
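On the point about missing values not being visible: a quick way to see how many there are is to count them directly. This is a small sketch with made-up ratings, not the exam data.

```python
import pandas as pd

# Hypothetical ratings, two of them missing.
ratings = pd.Series(["Top", "High", None, "Poor", None])

visible = ratings.value_counts()                   # missing values dropped by default
with_missing = ratings.value_counts(dropna=False)  # missing values get their own row
n_missing = int(ratings.isna().sum())

print(visible.sum(), with_missing.sum(), n_missing)
```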
c - `blizzard_salary[(blizzard_salary['salary_type'] != "Hourly") & (blizzard_salary['performance_rating'] == "Poor")]` - There are 5 observations for “not Hourly” “and” Poor.
a - `.sort_values()` - The result is arranged in increasing order of `annual_salary`, which is the default for `.sort_values()`.
c, d, e, f.
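The filtering and sorting answers above can be tried out on a small example. The `toy` rows are hypothetical; only the column names come from the questions.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "salary_type": ["Hourly", "Salaried", "Salaried", "Salaried"],
    "performance_rating": ["Poor", "Poor", "Top", "Poor"],
    "annual_salary": [40000, 90000, 120000, 60000],
})

# Keep rows that are not Hourly AND rated Poor; each condition needs parentheses.
mask = (toy["salary_type"] != "Hourly") & (toy["performance_rating"] == "Poor")
not_hourly_poor = toy[mask]

# sort_values() sorts in increasing order by default.
increasing = not_hourly_poor.sort_values("annual_salary")

# Pass ascending=False for decreasing order.
decreasing = not_hourly_poor.sort_values("annual_salary", ascending=False)

print(increasing)
```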
Part 1: The following should be fixed:
- There should be a `|` after `#`, before `label`.
- There should be a `:` after `label`, not `=`.
- There shouldn’t be a space in the chunk label; it should be `plot-blizzard`.
- There should be spaces after commas in the code.
- `plt.show()` should always be used to remove the plot from the working memory.
Part 2: Add a category for missing values in `performance_rating`:
`blizzard_salary['performance_rating'] = blizzard_salary['performance_rating'].fillna('Missing')`
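A minimal sketch of this fix on made-up data follows. The second half assumes the column might be stored with a categorical dtype, in which case the new level has to be registered before filling; whether that applies to the exam data is not stated.

```python
import pandas as pd

# Hypothetical ratings stored as plain text, two of them missing.
toy = pd.DataFrame({"performance_rating": ["Top", None, "Poor", None]})
toy["performance_rating"] = toy["performance_rating"].fillna("Missing")
counts = toy["performance_rating"].value_counts()

# Assumption: if the column were categorical, "Missing" must first be added
# as a category, otherwise fillna() raises an error.
cat = pd.Series(pd.Categorical(["Top", None, "Poor"]))
cat = cat.cat.add_categories("Missing").fillna("Missing")

print(counts)
print(list(cat))
```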
Part 1:
- Render: Run all of the code and render all of the text in the document and produce an output.
- Commit: Take a snapshot of your changes in Git with an appropriate message.
- Push: Send your changes off to GitHub.
Part 2: c - Rendering or committing isn’t sufficient to send your changes to your GitHub repository; a push is needed. A pull is also not needed to view the changes in the browser.