add note
add poisson reg
math4mad committed Oct 14, 2023
1 parent a9dd4a6 commit 2341f2d
Showing 11 changed files with 533 additions and 7 deletions.
@@ -1,9 +1,9 @@
{
"hash": "cf44232c5d30886c49b64477351ae1dd",
"hash": "52abb92e4afe72b925684cfb35dc0d22",
"result": {
"markdown": "---\ntitle: \"1-salary-linear-reg\"\ncode-fold: true\n---\n\n:::{.callout-note,title=\"简介\"}\n > explore `YearsExperience` and `Salary` relationship\n\n 1. dataset: [`kaggle salary dataset`](https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression)\n \n 2. 数据类型需要做转换: `to_ScienceType(d)=coerce(d,:YearsExperience=>Continuous,:Salary=>Continuous)`\n 3. using `MLJLinearModels.jl` [🔗](https://github.com/alan-turing-institute/MLJLinearModels.jl)\n:::\n\n## 1. load package\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\n include(\"../utils.jl\")\n import MLJ:fit!,fitted_params\n using GLMakie,MLJ,CSV,DataFrames\n```\n:::\n\n\n## 2. process data\n\n::: {.panel-tabset}\n# `load(csv)->dataframe` ==>\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code code-fold=\"show\"}\ndf=CSV.File(\"./data/salary_dataset.csv\") |> DataFrame |> dropmissing;\nfirst(df,5)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n<div><div style = \"float: left;\"><span>5×3 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">Column1</th><th style = \"text-align: left;\">YearsExperience</th><th style = \"text-align: left;\">Salary</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Int64\" style = \"text-align: left;\">Int64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0</td><td style = \"text-align: right;\">1.2</td><td 
style = \"text-align: right;\">39344.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">1</td><td style = \"text-align: right;\">1.4</td><td style = \"text-align: right;\">46206.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">2</td><td style = \"text-align: right;\">1.6</td><td style = \"text-align: right;\">37732.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">3</td><td style = \"text-align: right;\">2.1</td><td style = \"text-align: right;\">43526.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">4</td><td style = \"text-align: right;\">2.3</td><td style = \"text-align: right;\">39892.0</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n# `sciencetype` ==> \n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nto_ScienceType(d)=coerce(d,:YearsExperience=>Continuous,:Salary=>Continuous)\nnew_df=to_ScienceType(df)\nfirst(new_df,5)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n<div><div style = \"float: left;\"><span>5×3 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">Column1</th><th style = \"text-align: left;\">YearsExperience</th><th style = \"text-align: left;\">Salary</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Int64\" style = \"text-align: left;\">Int64</th><th title = \"Float64\" style = \"text-align: 
left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0</td><td style = \"text-align: right;\">1.2</td><td style = \"text-align: right;\">39344.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">1</td><td style = \"text-align: right;\">1.4</td><td style = \"text-align: right;\">46206.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">2</td><td style = \"text-align: right;\">1.6</td><td style = \"text-align: right;\">37732.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">3</td><td style = \"text-align: right;\">2.1</td><td style = \"text-align: right;\">43526.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">4</td><td style = \"text-align: right;\">2.3</td><td style = \"text-align: right;\">39892.0</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n# `MLJ table`\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\n X=MLJ.table(reshape(new_df[:,2],30,1))\n y=Vector(new_df[:,3])\n show(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[39344.0, 46206.0, 37732.0, 43526.0, 39892.0, 56643.0, 60151.0, 54446.0, 64446.0, 57190.0, 63219.0, 55795.0, 56958.0, 57082.0, 61112.0, 67939.0, 66030.0, 83089.0, 81364.0, 93941.0, 91739.0, 98274.0, 101303.0, 113813.0, 109432.0, 105583.0, 116970.0, 112636.0, 122392.0, 121873.0]\n```\n:::\n:::\n\n\n:::\n\n## 3. 
MLJ workflow\n### 3.1 load model\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n LinearRegressor = @load LinearRegressor pkg=MLJLinearModels\n model=LinearRegressor()\n mach = MLJ.fit!(machine(model,X,y))\n fp=MLJ.fitted_params(mach) #学习的模型参数\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n[ Info: For silent loading, specify `verbosity=0`. \n[ Info: Training machine(LinearRegressor(fit_intercept = true, …), …).\n┌ Info: Solver: MLJLinearModels.Analytical\n│ iterative: Bool false\n└ max_inner: Int64 200\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nimport MLJLinearModels ✔\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n(coefs = [:x1 => 9449.962321455077],\n intercept = 24848.203966523164,)\n```\n:::\n:::\n\n\n### 3.2 build linear function \n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n a=fp.coefs[1,1][2]\n b=fp.intercept\n line_func(t)=a*t+b\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```\nline_func (generic function with 1 method)\n```\n:::\n:::\n\n\n## 4. plot results\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nxs=range(extrema(new_df[:,2])...,200)\nfig=Figure()\nax=Axis(fig[1,1];xlabel=\"YearsExperience\",ylabel=\"Salary\")\nlines!(ax,xs,line_func.(xs);label=\"fit-line\",linewidth=3)\nscatter!(ax,new_df[:,2],new_df[:,3];label=\"data\",marker_style...)\naxislegend(ax)\nfig\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](1-salary-linear-reg_files/figure-html/cell-8-output-1.png){}\n:::\n:::\n\n\n",
"markdown": "---\ntitle: \"1-salary-linear-reg\"\ncode-fold: true\n---\n\n:::{.callout-note title=\"简介\"}\n > explore `YearsExperience` and `Salary` relationship\n\n 1. dataset: [`kaggle salary dataset`](https://www.kaggle.com/datasets/abhishek14398/salary-dataset-simple-linear-regression)\n \n 2. 数据类型需要做转换: `to_ScienceType(d)=coerce(d,:YearsExperience=>Continuous,:Salary=>Continuous)`\n 3. using `MLJLinearModels.jl` [🔗](https://github.com/alan-turing-institute/MLJLinearModels.jl)\n:::\n\n## 1. load package\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\n include(\"../utils.jl\")\n import MLJ:fit!,fitted_params\n using GLMakie,MLJ,CSV,DataFrames\n```\n:::\n\n\n## 2. process data\n\n::: {.panel-tabset}\n# `load(csv)->dataframe` ==>\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code code-fold=\"show\"}\ndf=CSV.File(\"./data/salary_dataset.csv\") |> DataFrame |> dropmissing;\nfirst(df,5)\n```\n\n::: {.cell-output .cell-output-display execution_count=3}\n```{=html}\n<div><div style = \"float: left;\"><span>5×3 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">Column1</th><th style = \"text-align: left;\">YearsExperience</th><th style = \"text-align: left;\">Salary</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Int64\" style = \"text-align: left;\">Int64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0</td><td style = \"text-align: right;\">1.2</td><td 
style = \"text-align: right;\">39344.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">1</td><td style = \"text-align: right;\">1.4</td><td style = \"text-align: right;\">46206.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">2</td><td style = \"text-align: right;\">1.6</td><td style = \"text-align: right;\">37732.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">3</td><td style = \"text-align: right;\">2.1</td><td style = \"text-align: right;\">43526.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">4</td><td style = \"text-align: right;\">2.3</td><td style = \"text-align: right;\">39892.0</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n# `sciencetype` ==> \n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nto_ScienceType(d)=coerce(d,:YearsExperience=>Continuous,:Salary=>Continuous)\nnew_df=to_ScienceType(df)\nfirst(new_df,5)\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```{=html}\n<div><div style = \"float: left;\"><span>5×3 DataFrame</span></div><div style = \"clear: both;\"></div></div><div class = \"data-frame\" style = \"overflow-x: scroll;\"><table class = \"data-frame\" style = \"margin-bottom: 6px;\"><thead><tr class = \"header\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">Row</th><th style = \"text-align: left;\">Column1</th><th style = \"text-align: left;\">YearsExperience</th><th style = \"text-align: left;\">Salary</th></tr><tr class = \"subheader headerLastRow\"><th class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\"></th><th title = \"Int64\" style = \"text-align: left;\">Int64</th><th title = \"Float64\" style = \"text-align: 
left;\">Float64</th><th title = \"Float64\" style = \"text-align: left;\">Float64</th></tr></thead><tbody><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">1</td><td style = \"text-align: right;\">0</td><td style = \"text-align: right;\">1.2</td><td style = \"text-align: right;\">39344.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">2</td><td style = \"text-align: right;\">1</td><td style = \"text-align: right;\">1.4</td><td style = \"text-align: right;\">46206.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">3</td><td style = \"text-align: right;\">2</td><td style = \"text-align: right;\">1.6</td><td style = \"text-align: right;\">37732.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">4</td><td style = \"text-align: right;\">3</td><td style = \"text-align: right;\">2.1</td><td style = \"text-align: right;\">43526.0</td></tr><tr><td class = \"rowNumber\" style = \"font-weight: bold; text-align: right;\">5</td><td style = \"text-align: right;\">4</td><td style = \"text-align: right;\">2.3</td><td style = \"text-align: right;\">39892.0</td></tr></tbody></table></div>\n```\n:::\n:::\n\n\n# `MLJ table`\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\n X=MLJ.table(reshape(new_df[:,2],30,1))\n y=Vector(new_df[:,3])\n show(y)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[39344.0, 46206.0, 37732.0, 43526.0, 39892.0, 56643.0, 60151.0, 54446.0, 64446.0, 57190.0, 63219.0, 55795.0, 56958.0, 57082.0, 61112.0, 67939.0, 66030.0, 83089.0, 81364.0, 93941.0, 91739.0, 98274.0, 101303.0, 113813.0, 109432.0, 105583.0, 116970.0, 112636.0, 122392.0, 121873.0]\n```\n:::\n:::\n\n\n:::\n\n## 3. 
MLJ workflow\n### 3.1 load model\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code code-fold=\"false\"}\n LinearRegressor = @load LinearRegressor pkg=MLJLinearModels\n model=LinearRegressor()\n mach = MLJ.fit!(machine(model,X,y))\n fp=MLJ.fitted_params(mach) #学习的模型参数\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n[ Info: For silent loading, specify `verbosity=0`. \n[ Info: Training machine(LinearRegressor(fit_intercept = true, …), …).\n┌ Info: Solver: MLJLinearModels.Analytical\n│ iterative: Bool false\n└ max_inner: Int64 200\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nimport MLJLinearModels ✔\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\n(coefs = [:x1 => 9449.962321455077],\n intercept = 24848.203966523164,)\n```\n:::\n:::\n\n\n### 3.2 build linear function \n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n a=fp.coefs[1,1][2]\n b=fp.intercept\n line_func(t)=a*t+b\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n```\nline_func (generic function with 1 method)\n```\n:::\n:::\n\n\n## 4. plot results\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nxs=range(extrema(new_df[:,2])...,200)\nfig=Figure()\nax=Axis(fig[1,1];xlabel=\"YearsExperience\",ylabel=\"Salary\")\nlines!(ax,xs,line_func.(xs);label=\"fit-line\",linewidth=3)\nscatter!(ax,new_df[:,2],new_df[:,3];label=\"data\",marker_style...)\naxislegend(ax)\nfig\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](1-salary-linear-reg_files/figure-html/cell-8-output-1.png){}\n:::\n:::\n\n\n",
"supporting": [
"1-salary-linear-reg_files"
"1-salary-linear-reg_files/figure-html"
],
"filters": [],
"includes": {
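The regression output recorded in this diff (`coefs = [:x1 => 9449.962321455077]`, `intercept = 24848.203966523164`) is an ordinary least-squares line fitted by the `MLJLinearModels.Analytical` solver. To illustrate what such a fit computes, here is a minimal Julia sketch that solves the same kind of least-squares problem with the backslash operator. It assumes plain OLS with an intercept. `ols_line` is a hypothetical helper, not part of MLJ or MLJLinearModels, and the input values are synthetic, not the Kaggle salary data.

```julia
using LinearAlgebra  # stdlib; provides least-squares `\` for rectangular matrices

# Hypothetical helper (not an MLJ API): fit y ≈ a*x + b by ordinary least squares.
"""
    ols_line(x, y) -> (a, b)

Return slope `a` and intercept `b` minimising sum((y .- (a .* x .+ b)).^2).
"""
function ols_line(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    A = hcat(x, ones(length(x)))  # design matrix: a column for x and a column of ones for the intercept
    a, b = A \ y                  # least-squares solution of the overdetermined system A * [a, b] ≈ y
    return a, b
end

# Synthetic, noise-free data on the line y = 2x + 1 (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
y = 2.0 .* x .+ 1.0

a, b = ols_line(x, y)
line_func(t) = a * t + b  # same form as `line_func` in section 3.2 of the notebook
```

On exact data this recovers a slope of about 2 and an intercept of about 1. Since the OLS solution is unique when `x` is not constant, applying it to the `YearsExperience` and `Salary` columns should give values very close to the fitted parameters above, up to floating-point differences between solvers.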
