weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more.
- Plays well with pandas.
- Support for weighted means, medians, quantiles, standard deviations, and distributions.
- Support for grouped calculations, using DataFrameGroupBy objects.
- Raises an error when your data contains null-values.
- Full test coverage.
pip install weightedcalcs
Every weighted calculation in weightedcalcs begins with an instance of the weightedcalcs.Calculator class. Calculator takes one argument: the name of your weighting variable. So if you're analyzing a survey where the weighting variable is called "resp_weight", you'd do this:
import weightedcalcs as wc
calc = wc.Calculator("resp_weight")
Currently, weightedcalcs.Calculator supports the following calculations:
- calc.mean(my_data, value_var): The weighted arithmetic average of value_var.
- calc.quantile(my_data, value_var, q): The weighted quantile of value_var, where q is between 0 and 1.
- calc.median(my_data, value_var): The weighted median of value_var, equivalent to .quantile(...) where q=0.5.
- calc.std(my_data, value_var): The weighted standard deviation of value_var.
- calc.distribution(my_data, value_var): The weighted proportions of value_var, interpreting value_var as categories.
- calc.count(my_data): The weighted count of all observations, i.e., the total weight.
- calc.sum(my_data, value_var): The weighted sum of value_var.
The my_data parameter above should be one of the following:
- A pandas DataFrame object
- A pandas DataFrame.groupby object
- A plain Python dictionary where the keys are column names and the values are equal-length lists.
Below is a basic example of using weightedcalcs to find what percentage of Wyoming residents are married, divorced, et cetera:
import pandas as pd
import weightedcalcs as wc
# Load the 2015 American Community Survey person-level responses for Wyoming
responses = pd.read_csv("examples/data/acs-2015-pums-wy-simple.csv")
# `PWGTP` is the weighting variable used in the ACS's person-level data
calc = wc.Calculator("PWGTP")
# Get the distribution of marriage-status responses
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)
# -- Output --
# marriage_status
# Married 0.425
# Never married or under 15 years old 0.421
# Divorced 0.097
# Widowed 0.046
# Separated 0.012
# Name: PWGTP, dtype: float64
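Grouped calculations apply the same weighted arithmetic separately within each group. As a rough illustration of the idea only, the plain-Python sketch below computes a weighted mean per group. It does not use weightedcalcs or pandas; the function name, column names, and data are hypothetical and are not drawn from the ACS file above.

```python
# Plain-Python sketch of a grouped weighted mean.
# NOT weightedcalcs code: names and data below are hypothetical.
from collections import defaultdict


def grouped_weighted_mean(data, group_var, value_var, weight_var):
    weighted_totals = defaultdict(float)  # sum of value * weight per group
    weight_totals = defaultdict(float)    # sum of weight per group
    rows = zip(data[group_var], data[value_var], data[weight_var])
    for group, value, weight in rows:
        weighted_totals[group] += value * weight
        weight_totals[group] += weight
    # Divide within each group, so groups never influence one another.
    return {g: weighted_totals[g] / weight_totals[g] for g in weight_totals}


people = {
    "county": ["A", "A", "B", "B"],
    "age": [30, 50, 20, 40],
    "weight": [1.0, 3.0, 2.0, 2.0],
}

print(grouped_weighted_mean(people, "county", "age", "weight"))
# County A: (30*1 + 50*3) / 4 = 45.0; county B: (20*2 + 40*2) / 4 = 30.0
```

With the library itself, the feature list above indicates that the grouping is expressed by passing a DataFrameGroupBy object in place of the DataFrame.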
See this notebook for examples of other calculations, including grouped calculations.
Max Ghenis has created a version of the example notebook that can be run directly in your browser, via Google Colab.