Merge pull request #40 from spentelow/main
Update analysis file to format results tables/figure; revise text of …
JacobMcFarlane authored Nov 29, 2020
2 parents 53111cc + 80948ea commit fbb6dcb
Showing 7 changed files with 875 additions and 178 deletions.
26 changes: 14 additions & 12 deletions doc/ufo_report.Rmd
@@ -2,11 +2,13 @@
title: "UFO Report"
author: "Group-20 DSCI-522"
date: "11/28/2020"
always_allow_html: true
output:
output:
  pdf_document:
    toc: yes
  html_document:
    df_print: paged
    toc: true
    toc: yes
always_allow_html: yes
bibliography: ufo_refs.bib
---

@@ -22,7 +24,7 @@ library(knitr)

# Introduction

Unidentified flying objects (UFOs) have a long and somewhat contentious history. Contrary to popular belief, most sightings are actually honest mistakes and not hoaxes. Weather balloons, satellites, and other explicable phenomena account for the vast majority of sightings. We wondered if these different phenomena might leave traces in the data. Suspecting that different causes would be associated with different shapes of UFO reported in the sighting, we thought that these different causes might lead to different durations of sightings in our home areas of Washington and British Columbia.
Unidentified flying objects (UFOs) have a long and somewhat contentious history. Contrary to popular belief, most sightings are actually honest mistakes and not hoaxes. Weather balloons, satellites, and other explicable phenomena account for the vast majority of sightings. We wondered if these different phenomena might leave traces in the data. Suspecting that different causes would be associated with different shapes of UFO reported in the sighting, we thought that these different causes might lead to different durations of sightings in our home areas of Washington and British Columbia.

## Data

@@ -37,17 +39,17 @@ To test our hypothesis, we selected the dataset UFO sightings maintained by Nati

Data was analyzed using both the R programming language [@R] and Python [@Python]. Packages utilized in analysis as well as report generation include the Tidyverse package [@tidyverse], docopt for both Python and R [@docopt; @docoptpython], as well as knitr [@knitr].

Text reports of sightings were converted to seconds. We removed sightings that had approximate times or provided a range of times, for example `still here`, `seconds`, `unknown`, `some minutes`. Reports that did not specify any shape or specified something other than a shape, for example `Flash`, `Light`, `Unknown`, `Other`, `Changing`, were removed. We also applied a log-transform to the duration of sightings in seconds to aid in visualizing our data and to help make the results of our testing more interpretable. The final data used in the analysis has `r nrow(ufo_tidy)` observations.
Text reports of sightings were converted to seconds. We removed sightings that had approximate times or provided a range of times, for example `still here`, `seconds`, `unknown`, `some minutes`. Reports that did not specify any shape or specified something other than a shape, for example `Flash`, `Light`, `Unknown`, `Other`, `Changing`, were removed. We also applied a log-transform to the duration of sightings in seconds to aid in visualizing our data. The final data used in the analysis has `r nrow(ufo_tidy)` observations.
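
The cleaning step described above can be illustrated with a minimal Python sketch. It is not the project's actual cleaning script: the function names `duration_to_seconds` and `log_duration`, the unit table, and the rule of accepting only an exact `<number> <unit>` report are all assumptions made for illustration.

```python
import math
import re

# Hypothetical unit table; the report does not list which units were handled.
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def duration_to_seconds(text):
    """Return the duration in seconds, or None when the report is not an
    exact '<number> <unit>' value (approximate, ranged, or free text)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+?)s?\s*", text.lower())
    if match is None:
        return None  # e.g. "still here", "seconds", "5-10 minutes"
    value, unit = match.groups()
    if unit not in SECONDS_PER_UNIT:
        return None  # unknown unit
    seconds = float(value) * SECONDS_PER_UNIT[unit]
    # A zero duration has no logarithm, so it is treated as unusable too.
    return seconds if seconds > 0 else None


def log_duration(text):
    """Natural log of the duration in seconds, or None if it cannot be parsed."""
    seconds = duration_to_seconds(text)
    return None if seconds is None else math.log(seconds)


reports = ["5 minutes", "still here", "some minutes", "2 hours"]
usable = [r for r in reports if duration_to_seconds(r) is not None]
print(usable)
```

Returning `None` rather than guessing keeps the filtering decision explicit, which matters here because the limitations section notes that removing reports may bias the results.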

## Analysis

**Hypothesis**

- $H_0$ The median population log duration of all the shapes are equal
- $H_0$ The mean ranks of the duration of sightings for all shapes are equal.

- $H_A$ The median population log duration of all the shapes are equal
- $H_A$ The mean ranks of the duration of sightings for all shapes are not equal.

We took a non-parametric approach because of differences in group size, skewed distribution even after transformation, and differences in the variance of duration between shapes. We selected the Kruskal-Wallis H Test to determine if significant differences existed. Dunn's test was utilized for post-hoc analysis with Bonferroni's correction to identify pairs of groups whose median population duration is significantly different. We selected a significance level of $\alpha = 0.05$ for both steps in testing.
We took a non-parametric approach because of differences in group size, skewed distribution, and differences in the variance of duration between shapes. We selected the Kruskal-Wallis H Test to determine if significant differences existed. Dunn's test was utilized for post-hoc analysis with Bonferroni's correction to identify pairs of groups whose median population duration is significantly different. We selected a significance level of $\alpha = 0.05$ for both steps in testing.
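
To make the rank-based test concrete, the following is a minimal standard-library Python sketch of the Kruskal-Wallis H statistic (with the usual tie correction) and of a Bonferroni adjustment. It is not the code used to produce the results in this report. The function names are hypothetical, and the sketch stops at the statistic: a p-value would come from comparing H against a chi-squared distribution with (number of groups - 1) degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups of observations."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    # If every observation is identical the correction is 0 and H is undefined.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative numbers only, not data from the UFO dataset.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because the statistic depends only on ranks, applying it to durations or to log durations gives the same H, since the logarithm preserves order.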

# Results & Discussion

@@ -64,17 +66,17 @@ knitr::kable(kw_test, caption = kw_cap)
**P value of significant pairs from Dunn Test**

```{r}
summary <- readRDS(here::here("results", "summary_shape.rds"))
summary_cap <- "Table 2. Summary"
summary <- readRDS(here::here("results", "Dunn.rds"))
summary_cap <- "Table 2. Shape pairs with significant difference in mean ranks"
knitr::kable(summary, caption = summary_cap)
```

**Post-Hoc Analysis Result**

![](../results/pairwise_plt.png)

Ultimately, our testing revealed several significant differences in log median duration between shapes. Further experimentation will be necessary to confirm if this is, in fact, due to different underlying causes.
Ultimately, our testing revealed several significant differences in mean rank of duration of sightings between shapes. Further experimentation would be necessary to determine the underlying cause(s) of the differences.

Additionally, there are some important limitations to this work. As discussed in previous sections, we removed a good deal of data in processing. It is possible that we somehow skewed our results through this process. Furthermore, our sample was not random or representative of all UFO sightings in the BC and Washington area because we only had access to samples that were reported.
Additionally, there are some important limitations to this work. As discussed in previous sections, we removed a good deal of data in processing. It is possible that we somehow introduced a bias into our results through this process. Furthermore, our sample was not random or representative of all UFO sightings in the BC and Washington area because we only had access to samples that were reported.

# References