diff --git a/README.md b/README.md
new file mode 100644
index 0000000..524f179
--- /dev/null
+++ b/README.md
@@ -0,0 +1,69 @@
+# ProbabilisticParameterEstimators.jl
+
+Implementation of different parameter estimators that take in measurements under uncertainty and produce a probability distribution over the parameters.
+
+## High Level Example
+
+``` julia
+using ProbabilisticParameterEstimators
+using Distributions, LinearAlgebra
+
+# observation function with multivariate observations
+f(x, p) = [(x + 1)^2 - sum(p);
+           (x + 1)^3 + diff(p)[1]]
+
+# true parameter (to be estimated) and a prior belief
+θtrue = [1.0, 2.0]
+prior = MvNormal(zeros(2), 4.0 * I)
+
+# observation inputs x
+xs = rand(5)
+
+# observation noise model (one noise distribution per observation)
+obsnoises = [rand()/10 * I(2) * MvNormal(zeros(2), I) for _ in eachindex(xs)]
+noisemodel = UncorrGaussianNoiseModel(obsnoises)
+
+# noisy observations y
+ysmeas = f.(xs, [θtrue]) .+ rand.(obsnoises)
+
+# find a probabilistic description of θ either as samples or as a distribution
+# currently we provide three methods
+for est in [MCMCEstimator(prior, f),
+            LinearApproxEstimator(prior, f),
+            LSQEstimator(prior, f)]
+    # either
+    samples = predictsamples(est, xs, ysmeas, noisemodel, 100)
+    # or
+    dist = predictdist(est, xs, ysmeas, noisemodel; nsamples=100)
+end
+```
+
+## Problem Setup
+We assume parameters $\theta \in \mathbb{R}^m$, inputs $x \in \mathbb{R}^n$, and measurements $y \in \mathbb{R}^l$, linked by an observation function $$y = f(x, \theta) + \varepsilon$$ where $\varepsilon$ is sampled from a known noise distribution $p_{\bar{\varepsilon}}$.
+Further assumptions on the noise models are discussed below.
+Notice also that $x$, $y$, and $\theta$ may all be multidimensional, with different dimensions.
+
+Given that we have uncertainty in the observations, we are interested in constructing a probabilistic description $p_{\bar{\theta}}(\theta \mid y)$ of the parameters $\theta$, either as a distribution or as a set of samples.
+We implement three estimators for this task, which map to either samples or a distribution via `predictsamples(est, xs, ys, noisemodel, nsamples)` and `predictdist(est, xs, ys, noisemodel)`, respectively.
+The conversion between samples and a distribution is done automatically, either by sampling from the distribution or by fitting a multivariate normal distribution to the samples.
+
+![Estimator Overview](figs/distribution_graph/distribution_graph.png)
+
+### MCMCEstimator
+The `MCMCEstimator` simply phrases the problem as a Markov chain Monte Carlo (MCMC) inference problem, which we solve using the `NUTS` algorithm provided by `Turing.jl`.
+Therefore `predictsamples(::MCMCEstimator, xs, ys, noisemodel, nsamples)` will create `nsamples` samples (after skipping a number of warmup steps).
+`predictdist(::MCMCEstimator, xs, ys, noisemodel)` will do the same, and then fit an `MvNormal` distribution.
+
+### LSQEstimator
+The `LSQEstimator` works by sampling noise $\varepsilon^{(k)}$ from the noise model and repeatedly solving a least-squares parameter estimation problem for modified measurements $y - \varepsilon^{(k)}$, i.e.
+$$
+\theta^{(k)} = \arg \min_\theta \sum_i \left( (y_i - \varepsilon_i^{(k)}) - f(x_i, \theta) \right)^2 \cdot w_i
+$$
+for uncorrelated noise, where the weights $w_i$ are chosen as the inverse variances.
+For correlated noise, the weights are instead derived from the full covariance matrix.
+
+Therefore `predictsamples(::LSQEstimator, xs, ys, noisemodel, nsamples)` will solve `nsamples` optimization problems and return one parameter sample for each.
+`predictdist(::LSQEstimator, xs, ys, noisemodel)` will do the same, and then fit an `MvNormal` distribution.
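+
+As a rough illustration of the procedure above, the following hypothetical sketch (not the package implementation) draws noise realizations, re-solves a weighted least-squares problem for each perturbed measurement vector, and collects the resulting parameter samples. The linear observation model `A(x)`, the scalar `Normal` noise, and all variable names are assumptions made for this sketch, chosen so that each solve has a closed-form solution instead of requiring a numerical optimizer.
+
+``` julia
+using Distributions, LinearAlgebra
+
+A(x) = [x  x^2]                  # illustrative 1×2 design row, so y = A(x)*θ + ε
+θtrue = [1.0, 2.0]
+xs = rand(10)
+noise = Normal(0.0, 0.1)         # uncorrelated scalar observation noise
+ys = [only(A(x) * θtrue) + rand(noise) for x in xs]
+
+w = 1 / var(noise)               # inverse-variance weight wᵢ (identical for all i here)
+Amat = reduce(vcat, A.(xs))      # stacked 10×2 design matrix
+
+samples = map(1:100) do _
+    εk = rand(noise, length(ys))                   # ε⁽ᵏ⁾ drawn from the noise model
+    (w * Amat' * Amat) \ (w * Amat' * (ys .- εk))  # weighted LSQ solution θ⁽ᵏ⁾
+end
+θdist = fit(MvNormal, reduce(hcat, samples))       # analogous to the predictdist step
+```
+
+With a nonlinear `f` and a multivariate noise model, each re-solve becomes an actual optimization problem as described above, but the sample-then-fit structure stays the same.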
+
+### LinearApproxEstimator
+The `LinearApproxEstimator` solves the optimization problem above just once, and then constructs a multivariate normal distribution centered at the solution.
+The covariance is constructed by computing the Jacobian of $f(x, \theta)$ with respect to $\theta$ and (roughly) propagating the measurement uncertainty through it.
+See also [this Wikipedia article](https://en.wikipedia.org/wiki/Non-linear_least_squares#Extension_by_weights).
+
+Therefore `predictdist(::LinearApproxEstimator, xs, ys, noisemodel)` will solve one optimization problem and compute one Jacobian, yielding an `MvNormal` and making it very efficient.
+`predictsamples(::LinearApproxEstimator, xs, ys, noisemodel, nsamples)` will simply sample `nsamples` times from this distribution, which is also very fast.
diff --git a/figs/distribution_graph/distribution_graph.pdf b/figs/distribution_graph/distribution_graph.pdf
new file mode 100644
index 0000000..50cb939
Binary files /dev/null and b/figs/distribution_graph/distribution_graph.pdf differ
diff --git a/figs/distribution_graph/distribution_graph.png b/figs/distribution_graph/distribution_graph.png
new file mode 100644
index 0000000..3bfaf67
Binary files /dev/null and b/figs/distribution_graph/distribution_graph.png differ
diff --git a/figs/distribution_graph/distribution_graph.tex b/figs/distribution_graph/distribution_graph.tex
new file mode 100644
index 0000000..bb30770
--- /dev/null
+++ b/figs/distribution_graph/distribution_graph.tex
@@ -0,0 +1,24 @@
+\documentclass[tikz, convert={density=600,outext=.png}]{standalone}
+\usetikzlibrary{positioning}
+
+\begin{document}
+\begin{tikzpicture}[
+  mybox/.style={minimum width=2.0cm, minimum height=0.70cm, draw, thick},
+  myarrow/.style={shorten <=2pt, shorten >=2pt, ->, thick},
+  ]
+  \node[mybox] (mvnormal) {\texttt{MvNormal}};
+  \node[mybox, right=2 of mvnormal, align=center] (samples) {\texttt{Samples}};
+
+  \draw[myarrow] (mvnormal) to[bend left=20] node[above] {\texttt{rand}} (samples);
+  \draw[myarrow] (samples) to[bend left=20] node[below] {\texttt{fit(MvNormal, samples)}} (mvnormal);
+
+  \node[draw, minimum width=4.5cm, above=3 of mvnormal.west, anchor=east] (mcmcestimator) {\texttt{MCMCEstimator}};
+  \node[draw, minimum width=4.5cm, below=0 of mcmcestimator] (lsqestimator) {\texttt{LSQEstimator}};
+  \node[draw, minimum width=4.5cm, below=0 of lsqestimator] (linearapproxestimator) {\texttt{LinearApproxEstimator}};
+
+  \draw[->] (mcmcestimator) to[out=0, in=90] (samples);
+  \draw[->] (lsqestimator) to[out=0, in=120] node[above, rotate=-30, pos=0.62, scale=0.6] {\texttt{predictsamples}} (samples);
+  \draw[->] (linearapproxestimator) to[out=0, in=90] node[above, rotate=-70, scale=0.6, pos=0.62] {\texttt{predictdist}} (mvnormal);
+\end{tikzpicture}
+
+\end{document}