- Schools that spend between 585-615 dollars per student have a better percent passing reading, percent passing math, and percent overall passing rate than schools that spend more than 615 dollars per student. Also, those schools that spend less per student are smaller schools with 2,000 students or less.
- It is also found that the schools with 2000 students or less have a higher percent passing reading, percent passing math, and percent overall passing rate than larger schools that have 2000 – 5000 students. Furthermore, the smaller the school, the less they spend per student.
- With that being said,Charter schools have a higher percent passing reading, percent passing math, and percent overall passing rate than District schools. Thus being the top five performing schools based on percent passing rates, whereas the bottom five performing schools based on percent passing rate are District schools.
- In all 15 schools, the Average Reading Score is higher than the Average Math Score.
- Lasly, for the top 5 performing schools, the percent passing math is about 20 percent lower than the percent passing reading.
# Dependencies
import pandas as pd
import numpy as np
# The path to our csv file
schools_complete = "Resources/schools_complete.csv"
students_complete = "Resources/students_complete.csv"
# create dataframe using mapping
schools_complete_df = pd.read_csv(schools_complete)
students_complete_df = pd.read_csv(students_complete)
schools_complete_df.head()
students_complete_df.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Student ID | name | gender | grade | school | reading_score | math_score | |
---|---|---|---|---|---|---|---|
0 | 0 | Paul Bradley | M | 9th | Huang High School | 66 | 79 |
1 | 1 | Victor Smith | M | 12th | Huang High School | 94 | 61 |
2 | 2 | Kevin Rodriguez | M | 12th | Huang High School | 90 | 60 |
3 | 3 | Dr. Richard Scott | M | 12th | Huang High School | 67 | 58 |
4 | 4 | Bonnie Ray | F | 9th | Huang High School | 97 | 84 |
# finding total schools
total_schools = len(schools_complete_df['School ID'])
# The total number of students
total_students = schools_complete_df['size'].sum()
# Get the total budget
total_budget = schools_complete_df['budget'].sum()
# Get the Average Math Score
average_math_score = round(students_complete_df['math_score'].mean(),6)
# Get the Average Reading Score
average_reading_score = round(students_complete_df['reading_score'].mean(),5)
# Get the percentage of students passing math
students_passing_math = students_complete_df.loc[students_complete_df["math_score"] > 70,:]
percent_passing_math = round(float(students_passing_math['math_score'].count()/total_students)*100,6)
# Get the percentage of students passing reading
students_passing_reading = students_complete_df.loc[students_complete_df["reading_score"] >70,:]
percent_passing_reading = round(float(students_passing_reading["reading_score"].count()/total_students)*100,6)
# Get the overall passing rate for math and reading
overall_passing_rate =round((percent_passing_math + percent_passing_reading)/2,6)
# Create a disctric Summary Dataframe
summary_df = pd.DataFrame({"Total Schools":[total_schools],
"Total Students":[total_students],
"Total Budget":[total_budget],
"Average Math Score":[average_math_score],
"Average Reading Score":[average_reading_score],
"Percent Passing Math":[percent_passing_math],
"Percent Passing Reading":[percent_passing_reading],
"Percent Overall Passing Rate":[overall_passing_rate]})
# summary_df
# to put it in the right format
district_summary_df = pd.DataFrame(summary_df, columns=["Total Schools",
"Total Students",
"Total Budget",
"Average Math Score",
"Average Reading Score",
"Percent Passing Math",
"Percent Passing Reading",
"Percent Overall Passing Rate"])
district_summary_df["Total Students"] = district_summary_df["Total Students"].map('{:,}'.format)
district_summary_df["Total Budget"] = district_summary_df["Total Budget"].map('${:,.2f}'.format)
district_summary_df
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Total Schools | Total Students | Total Budget | Average Math Score | Average Reading Score | Percent Passing Math | Percent Passing Reading | Percent Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|
0 | 15 | 39,170 | $24,649,428.00 | 78.985371 | 81.87784 | 72.392137 | 82.971662 | 77.6819 |
# Change header name from the dataframe from name to school
schools_complete_df = schools_complete_df.rename(columns={"name": "school"})
# schools_complete_df.columns
# Merge the two data frames together on = school
schools_complete_data_df = pd.merge(schools_complete_df, students_complete_df, on='school')
#testing
# schools_complete_data_df.head()
# count number of students in each school
student_count = schools_complete_data_df['school'].value_counts()
# testing
# student_count.head()
# changing the school type into a str
school_type = schools_complete_data_df.groupby('school')['type'].unique()
school_type = school_type.str[0]
#testing
# school_type
# total budget for each school
budget_per_school = schools_complete_data_df.groupby('school')['budget'].count()
# budget_per_school.head()
# testing
# budget_per_school
# budget for each student
budget_per_student = schools_complete_df.set_index('school')['budget']/schools_complete_df.set_index('school')['size']
# budget_per_student.head()
# testing
# budget_per_student
# Average math and readin scores for each school
school_average_math = round(schools_complete_data_df.groupby('school')['math_score'].mean(),2)
school_average_reading = round(schools_complete_data_df.groupby('school')['reading_score'].mean(),2)
# dataframe with reading and math passing only scores
passing_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70)
& (schools_complete_data_df['reading_score'] >=70)]
passing_math_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70)]
passing_reading_df = schools_complete_data_df.loc[(schools_complete_data_df['reading_score'] >=70)]
# testing
# passing_math_df.head()
# passing_reading_df.head()
# percentage for students passing math and reading
percent_passing_math = round((passing_math_df.groupby('school')['math_score'].count()/student_count)*100,1)
percent_passing_reading = round((passing_reading_df.groupby('school')['reading_score'].count()/student_count)*100,1)
# percent_passing_reading
# percent_passing_math
# overall passing percentage
overall_passing_percent = round((percent_passing_math + percent_passing_reading)/2, 2)
# overall_passing_percent
# school summary dataframe
school_summary_df = pd.DataFrame({"School Type":school_type,
"Total Students":student_count,
"Total School Budget":budget_per_school,
"Budget Per Student":budget_per_student,
"Average Math Score": school_average_math,
"Average Reading Score": school_average_reading,
"% Passing Math": percent_passing_math,
"% Passing Reading": percent_passing_reading,
"% Overall Passing Rate":overall_passing_percent},columns=["School Type",
"Total Students",
"Total School Budget",
"Budget Per Student",
"Average Math Score",
"Average Reading Score",
"% Passing Math",
"% Passing Reading",
"% Overall Passing Rate"])
#testing
# school_summary_df.head()
school_summary_df["Total Students"] = school_summary_df["Total Students"].map('{:,}'.format)
school_summary_df["Total School Budget"] = school_summary_df["Total School Budget"].map('${:,.2f}'.format)
school_summary_df["Budget Per Student"] = school_summary_df["Budget Per Student"].map('${:,.2f}'.format)
school_summary_df
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
School Type | Total Students | Total School Budget | Budget Per Student | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Bailey High School | District | 4,976 | $4,976.00 | $628.00 | 77.05 | 81.03 | 66.7 | 81.9 | 74.30 |
Cabrera High School | Charter | 1,858 | $1,858.00 | $582.00 | 83.06 | 83.98 | 94.1 | 97.0 | 95.55 |
Figueroa High School | District | 2,949 | $2,949.00 | $639.00 | 76.71 | 81.16 | 66.0 | 80.7 | 73.35 |
Ford High School | District | 2,739 | $2,739.00 | $644.00 | 77.10 | 80.75 | 68.3 | 79.3 | 73.80 |
Griffin High School | Charter | 1,468 | $1,468.00 | $625.00 | 83.35 | 83.82 | 93.4 | 97.1 | 95.25 |
Hernandez High School | District | 4,635 | $4,635.00 | $652.00 | 77.29 | 80.93 | 66.8 | 80.9 | 73.85 |
Holden High School | Charter | 427 | $427.00 | $581.00 | 83.80 | 83.81 | 92.5 | 96.3 | 94.40 |
Huang High School | District | 2,917 | $2,917.00 | $655.00 | 76.63 | 81.18 | 65.7 | 81.3 | 73.50 |
Johnson High School | District | 4,761 | $4,761.00 | $650.00 | 77.07 | 80.97 | 66.1 | 81.2 | 73.65 |
Pena High School | Charter | 962 | $962.00 | $609.00 | 83.84 | 84.04 | 94.6 | 95.9 | 95.25 |
Rodriguez High School | District | 3,999 | $3,999.00 | $637.00 | 76.84 | 80.74 | 66.4 | 80.2 | 73.30 |
Shelton High School | Charter | 1,761 | $1,761.00 | $600.00 | 83.36 | 83.73 | 93.9 | 95.9 | 94.90 |
Thomas High School | Charter | 1,635 | $1,635.00 | $638.00 | 83.42 | 83.85 | 93.3 | 97.3 | 95.30 |
Wilson High School | Charter | 2,283 | $2,283.00 | $578.00 | 83.27 | 83.99 | 93.9 | 96.5 | 95.20 |
Wright High School | Charter | 1,800 | $1,800.00 | $583.00 | 83.68 | 83.96 | 93.3 | 96.6 | 94.95 |
# the top schools ( only top 5 )
top_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=False, inplace=False)
top_performing_schools.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
School Type | Total Students | Total School Budget | Budget Per Student | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Cabrera High School | Charter | 1,858 | $1,858.00 | $582.00 | 83.06 | 83.98 | 94.1 | 97.0 | 95.55 |
Thomas High School | Charter | 1,635 | $1,635.00 | $638.00 | 83.42 | 83.85 | 93.3 | 97.3 | 95.30 |
Griffin High School | Charter | 1,468 | $1,468.00 | $625.00 | 83.35 | 83.82 | 93.4 | 97.1 | 95.25 |
Pena High School | Charter | 962 | $962.00 | $609.00 | 83.84 | 84.04 | 94.6 | 95.9 | 95.25 |
Wilson High School | Charter | 2,283 | $2,283.00 | $578.00 | 83.27 | 83.99 | 93.9 | 96.5 | 95.20 |
# only showing bottom 5
Bottom_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=True, inplace=False)
Bottom_performing_schools.head()
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
School Type | Total Students | Total School Budget | Budget Per Student | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|---|---|---|---|
Rodriguez High School | District | 3,999 | $3,999.00 | $637.00 | 76.84 | 80.74 | 66.4 | 80.2 | 73.30 |
Figueroa High School | District | 2,949 | $2,949.00 | $639.00 | 76.71 | 81.16 | 66.0 | 80.7 | 73.35 |
Huang High School | District | 2,917 | $2,917.00 | $655.00 | 76.63 | 81.18 | 65.7 | 81.3 | 73.50 |
Johnson High School | District | 4,761 | $4,761.00 | $650.00 | 77.07 | 80.97 | 66.1 | 81.2 | 73.65 |
Ford High School | District | 2,739 | $2,739.00 | $644.00 | 77.10 | 80.75 | 68.3 | 79.3 | 73.80 |
from pandas.api.types import CategoricalDtype
#Reset the grade order in the original students data frame "students_complete_df".
students_complete_df["grade"] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df
math_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="math_score"),2)
math_scores_grade.index.name = None
math_scores_grade
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
grade | 9th | 10th | 11th | 12th |
---|---|---|---|---|
Bailey High School | 77.08 | 77.00 | 77.52 | 76.49 |
Cabrera High School | 83.09 | 83.15 | 82.77 | 83.28 |
Figueroa High School | 76.40 | 76.54 | 76.88 | 77.15 |
Ford High School | 77.36 | 77.67 | 76.92 | 76.18 |
Griffin High School | 82.04 | 84.23 | 83.84 | 83.36 |
Hernandez High School | 77.44 | 77.34 | 77.14 | 77.19 |
Holden High School | 83.79 | 83.43 | 85.00 | 82.86 |
Huang High School | 77.03 | 75.91 | 76.45 | 77.23 |
Johnson High School | 77.19 | 76.69 | 77.49 | 76.86 |
Pena High School | 83.63 | 83.37 | 84.33 | 84.12 |
Rodriguez High School | 76.86 | 76.61 | 76.40 | 77.69 |
Shelton High School | 83.42 | 82.92 | 83.38 | 83.78 |
Thomas High School | 83.59 | 83.09 | 83.50 | 83.50 |
Wilson High School | 83.09 | 83.72 | 83.20 | 83.04 |
Wright High School | 83.26 | 84.01 | 83.84 | 83.64 |
#Reset the grade order in the original students data frame "students_complete_df".
students_complete_df['grade'] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df
reading_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="reading_score"),2)
reading_scores_grade.index.name = None
reading_scores_grade
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
grade | 9th | 10th | 11th | 12th |
---|---|---|---|---|
Bailey High School | 81.30 | 80.91 | 80.95 | 80.91 |
Cabrera High School | 83.68 | 84.25 | 83.79 | 84.29 |
Figueroa High School | 81.20 | 81.41 | 80.64 | 81.38 |
Ford High School | 80.63 | 81.26 | 80.40 | 80.66 |
Griffin High School | 83.37 | 83.71 | 84.29 | 84.01 |
Hernandez High School | 80.87 | 80.66 | 81.40 | 80.86 |
Holden High School | 83.68 | 83.32 | 83.82 | 84.70 |
Huang High School | 81.29 | 81.51 | 81.42 | 80.31 |
Johnson High School | 81.26 | 80.77 | 80.62 | 81.23 |
Pena High School | 83.81 | 83.61 | 84.34 | 84.59 |
Rodriguez High School | 80.99 | 80.63 | 80.86 | 80.38 |
Shelton High School | 84.12 | 83.44 | 84.37 | 82.78 |
Thomas High School | 83.73 | 84.25 | 83.59 | 83.83 |
Wilson High School | 83.94 | 84.02 | 83.76 | 84.32 |
Wright High School | 83.83 | 83.81 | 84.16 | 84.07 |
# Create the bins ( school spending )
spending_bins = [0, 585, 615, 645, 675]
# Names for the bins
spending_range = ["<$585", "$585-615", "$615-645", "$645-675"]
#schools_spending_df = school_summary_df
school_summary_df["Spending Ranges(Per Student)"] = pd.cut(budget_per_student, bins=spending_bins, labels=spending_range)
school_summary_df.head()
spending_math_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Math Score"].mean()
spending_reading_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Reading Score"].mean()
spending_passing_math = school_summary_df.groupby(["Spending Ranges(Per Student)"])["% Passing Math"].mean()
spending_passing_reading = school_summary_df.groupby(["Spending Ranges(Per Student)"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2
# Creating dataframe
scores_by_spending = pd.DataFrame ({"Average Math Score": spending_math_score,
"Average Reading Score": spending_reading_score,
"% Passing Math": spending_passing_math,
"% Passing Reading": spending_passing_reading,
"Overall Passing Rate": overall_passing_rate})
scores_by_spending
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
% Passing Math | % Passing Reading | Average Math Score | Average Reading Score | Overall Passing Rate | |
---|---|---|---|---|---|
Spending Ranges(Per Student) | |||||
<$585 | 93.450000 | 96.600000 | 83.452500 | 83.935000 | 95.025000 |
$585-615 | 94.250000 | 95.900000 | 83.600000 | 83.885000 | 95.075000 |
$615-645 | 75.683333 | 86.083333 | 79.078333 | 81.891667 | 80.883333 |
$645-675 | 66.200000 | 81.133333 | 76.996667 | 81.026667 | 73.666667 |
# Create the bins ( school size)
size_bins = [0, 1000, 2000, 5000]
# Names for the bins
group_names = ["small (<1000)", "Medium(1000-2000)", "Large(2000-5000)"]
school_size_df = school_summary_df
school_size_df["School Size"] = pd.cut(student_count, bins=size_bins, labels=group_names)
school_size_df
spending_math_score = school_size_df.groupby(["School Size"])["Average Math Score"].mean()
spending_reading_score = school_size_df.groupby(["School Size"])["Average Reading Score"].mean()
spending_passing_math = school_size_df.groupby(["School Size"])["% Passing Math"].mean()
spending_passing_reading = school_size_df.groupby(["School Size"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2
# Creating dataframe
scores_by_size = pd.DataFrame ({"Average Math Score": spending_math_score,
"Average Reading Score": spending_reading_score,
"% Passing Math": spending_passing_math,
"% Passing Reading": spending_passing_reading,
"Overall Passing Rate": overall_passing_rate})
scores_by_size
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
% Passing Math | % Passing Reading | Average Math Score | Average Reading Score | Overall Passing Rate | |
---|---|---|---|---|---|
School Size | |||||
small (<1000) | 93.5500 | 96.10 | 83.820 | 83.92500 | 94.82500 |
Medium(1000-2000) | 93.6000 | 96.78 | 83.374 | 83.86800 | 95.19000 |
Large(2000-5000) | 69.9875 | 82.75 | 77.745 | 81.34375 | 76.36875 |
#* Repeat the above breakdown, but this time group schools based on school type (Charter vs. District)*
scores_school_type = school_summary_df[["School Type","Average Math Score",
"Average Reading Score",
"% Passing Math",
"% Passing Reading",
"% Overall Passing Rate"]]
scores_school_type = scores_school_type.groupby('School Type').mean()
scores_school_type
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate | |
---|---|---|---|---|---|
School Type | |||||
Charter | 83.472500 | 83.897500 | 93.625000 | 96.575000 | 95.100000 |
District | 76.955714 | 80.965714 | 66.571429 | 80.785714 | 73.678571 |