Merge pull request #40 from spentelow/main

Update analysis file to format results tables/figure; revise text of …
UBC-MDS · Nov 29, 2020 · fbb6dcb
2 parents 53111cc + 80948ea
commit fbb6dcb
Showing 7 changed files with 875 additions and 178 deletions.
diff --git a/doc/ufo_report.Rmd b/doc/ufo_report.Rmd
@@ -2,11 +2,13 @@
 title: "UFO Report"
 author: "Group-20 DSCI-522"
 date: "11/28/2020"
-always_allow_html: true
-output: 
+output:
+  pdf_document:
+    toc: yes
   html_document:
     df_print: paged
-    toc: true
+    toc: yes
+always_allow_html: yes
 bibliography: ufo_refs.bib
 ---
 
@@ -22,7 +24,7 @@ library(knitr)
 
 # Introduction
 
-Unidentified flying objects (UFOS) have a long and somewhat contentious history. Contrary to popular belief, most sightings are actually honest mistakes and not hoaxes. Weather balloons, satellites, and other explicable phenomena account for the vast majority of sightings. We wondered if these different phenomenon might leave traces in the data. Suspecting that different causes would be associated with different shapes of UFO reported in the sighting, we thought that these different causes might lead to different duration of sightings in our home areas of Washington and British Columbia. 
+Unidentified flying objects (UFOS) have a long and somewhat contentious history. Contrary to popular belief, most sightings are actually honest mistakes and not hoaxes. Weather balloons, satellites, and other explicable phenomena account for the vast majority of sightings. We wondered if these different phenomenon might leave traces in the data. Suspecting that different causes would be associated with different shapes of UFO reported in the sighting, we thought that these different causes might lead to different duration of sightings in our home areas of Washington and British Columbia.
 
 ## Data
 
@@ -37,17 +39,17 @@ To test our hypothesis, we selected the dataset UFO sightings maintained by Nati
 
 Data was analyzed using both the R programming language [@R] and Python [@Python]. Packages utilized in analysis as well as report generation include the Tidyverse package [@tidyverse], docopt for both Python and R [@docopt; @docoptpython], as well as knitr [@knitr].
 
-Text reports of sightings were converted to seconds. We removed sightings that had approximate times or provided a range of times for example: `still here`, `seconds`, `unknown`, `some minutes`. Reports that did not specify any shape or specified something other than shape, for example `Flash`, `Light`, `Unknown`, `Other`, `Changing`, were removed. We also applied a log-transform to the duration of sightings in seconds to aid in visualizing our data and to help make the results of our testing more interpretable. The final data used in the analysis has `r nrow(ufo_tidy)` observations.
+Text reports of sightings were converted to seconds. We removed sightings that had approximate times or provided a range of times for example: `still here`, `seconds`, `unknown`, `some minutes`. Reports that did not specify any shape or specified something other than shape, for example `Flash`, `Light`, `Unknown`, `Other`, `Changing`, were removed. We also applied a log-transform to the duration of sightings in seconds to aid in visualizing our data. The final data used in the analysis has `r nrow(ufo_tidy)` observations.
 
 ## Analysis
 
 **Hypothesis**
 
--   $H_0$ The median population log duration of all the shapes are equal
+-   $H_0$ The mean ranks of the duration of sightings for all shapes are equal.
 
--   $H_A$ The median population log duration of all the shapes are equal
+-   $H_A$ The mean ranks of the duration of sightings for all shapes are not equal.
 
-We took a non-parametric approach because of differences in group size, skewed distribution even after transformation, and variance between the different duration of different shapes. We selected the Kruskal-Wallis H Test to test to determine if significant differences existed. Dunn's test was utilized for Post-Hoc analysis with Bonferroni's correction to identify pairs of groups whose median population duration are significantly different. We selected a significance level of $\alpha =0.05$ for both steps in testing.
+We took a non-parametric approach because of differences in group size, skewed distribution, and variance between the different duration of different shapes. We selected the Kruskal-Wallis H Test to test to determine if significant differences existed. Dunn's test was utilized for Post-Hoc analysis with Bonferroni's correction to identify pairs of groups whose median population duration are significantly different. We selected a significance level of $\alpha =0.05$ for both steps in testing.
 
 # Results & Discussion
 
@@ -64,17 +66,17 @@ knitr::kable(kw_test, caption = kw_cap)
 **P value of significant pairs from Dunn Test**
 
 ```{r}
-summary <- readRDS(here::here("results", "summary_shape.rds"))
-summary_cap <- "Table 2. Summary"
+summary <- readRDS(here::here("results", "Dunn.rds"))
+summary_cap <- "Table 2. Shape pairs with significant difference in mean ranks"
 knitr::kable(summary, caption = summary_cap)
 ```
 
 **Post-Hoc Analysis Result**
 
 ![](../results/pairwise_plt.png)
 
-Ultimately, our testing revealed several significant differences in log median duration between shapes. Further experimentation will be necessary to confirm if this is, in fact, due to different underlying causes. 
+Ultimately, our testing revealed several significant differences in mean rank of duration of sightings between shapes. Further experimentation would be necessary to determine the underlying cause(s) of the differences.
 
-Additionally, there are some important limitations to this work. As discussed in previous sections we removed a good deal of data in processing. There is potential that we somehow skewed our results through this process. Furthermore, our sample was not random or representative of all UFO sightings in the BC and Washington area because we only had access to samples that were reported.
+Additionally, there are some important limitations to this work. As discussed in previous sections we removed a good deal of data in processing. There is potential that we somehow introduced a bias into our results through this process. Furthermore, our sample was not random or representative of all UFO sightings in the BC and Washington area because we only had access to samples that were reported.
 
 # References
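
Note on the revised hypotheses: the change from "median population log duration" to "mean ranks" matches what the Kruskal-Wallis H test actually compares. The sketch below is a minimal, standard-library illustration of that idea, not the code used in this repository. The function names (`average_ranks`, `kruskal_wallis_h`) and the sample durations are hypothetical, and each group is assumed to be non-empty. It pools all observations, ranks them (tied values share an average rank), and computes the tie-corrected H statistic together with each group's mean rank.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """groups: dict mapping a shape label to a non-empty list of durations.

    Returns (H, mean_ranks), where H is the tie-corrected Kruskal-Wallis
    statistic and mean_ranks maps each label to its mean pooled rank.
    """
    labels, values = [], []
    for label, obs in groups.items():
        labels.extend([label] * len(obs))
        values.extend(obs)

    n = len(values)
    ranks = average_ranks(values)

    rank_sums = defaultdict(float)
    for label, r in zip(labels, ranks):
        rank_sums[label] += r

    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(obs) for g, obs in groups.items()
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups.
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction

    mean_ranks = {g: rank_sums[g] / len(obs) for g, obs in groups.items()}
    return h, mean_ranks


if __name__ == "__main__":
    # Made-up durations in seconds, for illustration only.
    sample = {
        "circle": [1, 2, 3],
        "triangle": [4, 5, 6],
        "disk": [7, 8, 9],
    }
    h, mean_ranks = kruskal_wallis_h(sample)
    print(round(h, 3), mean_ranks)
```

Because the statistic depends only on ranks, a monotone transform such as the log applied to durations leaves H unchanged, which is consistent with the diff restricting the log-transform's stated purpose to visualization. Under $H_0$, H is approximately chi-squared with (number of groups - 1) degrees of freedom; the p-value step is omitted here.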