Experiments

Anirudh Vegesana edited this page Jul 26, 2019 · 9 revisions

Introduction

The Experiments page lists all the experiments the team has hosted, including those run within the CAM2 team and those on AMT.
The latest experiment result should be positioned at the top of this page.

Since major changes were made based on results from the previous experiment, the 2nd experiment was divided into 2 parts: 2.1 and 2.2. The order of the experiments is as follows: 2.1-phase01a -> 2.2-phase01a -> 2.1-phase01b -> 2.1-phase03.

Experiment 2.1:

  • Number of participants (counted by the number of HITs released):
    • phase01a: 5 ($0.15/HIT)
    • phase01b: 5 ($0.15/HIT)
    • phase03: 15 ($0.05/HIT)
  • Cost: Unknown (it varies with the acceptance rate of the HITs).
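
Although the actual cost is unknown, the HIT counts and rewards listed above give an upper bound on the reward payout. The following is a minimal sketch, assuming every released HIT is accepted and ignoring any platform fees; the `acceptance_rate` parameter and the function name are hypothetical and only illustrate how the total scales with acceptance.

```python
from fractions import Fraction

# (number of HITs released, reward per HIT in dollars), taken from the list above
PHASES = {
    "phase01a": (5, Fraction("0.15")),
    "phase01b": (5, Fraction("0.15")),
    "phase03": (15, Fraction("0.05")),
}

def reward_cost(phases, acceptance_rate=Fraction(1)):
    """Total reward paid if the given fraction of HITs is accepted in every phase.

    acceptance_rate is an assumption for illustration; the real rate was not recorded.
    """
    return sum(count * reward * acceptance_rate for count, reward in phases.values())

max_cost = reward_cost(PHASES)
print(f"Maximum reward cost: ${float(max_cost):.2f}")  # Maximum reward cost: $2.25
```

Exact fractions are used instead of floats so that the cent amounts add up without rounding error.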

This is the first full-run experiment. phase01a was changed to show 3 images instead of one, and players were asked to find similarities instead of attributes. The results improved significantly.
A bug in phase03 caused the results to be unexpectedly low (e.g. -90).
When we reported to Professor Yin, the following problems of this experiment surfaced:

  1. We couldn't track the merging flow conveniently.
  2. Instruction gifs were played too fast for players to read subtitles.
  3. The mechanism for choosing the question to be merged into was monotonous (later inputs were always merged into the earliest one).
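
The merge rule described in item 3 can be sketched as follows. This is only an illustration of the "earliest input wins" behaviour as we read it from the description, not the actual game code; the `Question` class, the `submitted_at` field, and the sample questions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    submitted_at: int  # hypothetical submission order (lower means earlier)

def pick_merge_target(duplicates):
    """Return the earliest-submitted question; all others are merged into it."""
    return min(duplicates, key=lambda q: q.submitted_at)

# Made-up group of questions that players judged to be duplicates
group = [
    Question("Is the vehicle red?", 3),
    Question("Is it red?", 1),
    Question("Red car?", 2),
]
target = pick_merge_target(group)
merged = [q.text for q in group if q is not target]
print(target.text, "<-", merged)
```

Because the target depends only on submission order, a poorly worded early question always survives, which is why the rule was considered monotonous.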

Experiment 2.2:

  • Number of participants (counted by the number of HITs released):
    • phase01a: 5 ($0.15/HIT)
  • Cost: Unknown (it varies with the acceptance rate of the HITs).

The main difference between 2.1 and 2.2 is the dataset used. There is no difference in design.

Only phase01a was conducted, for testing purposes. The results are stored.

  • Number of participants (counted by the number of HITs released):
    • phase01a: 15

At this time, we were still using the new phase01a design created at the beginning of summer 2019: showing one image in each round, along with all the previous questions entered by other players. Players were asked to look through the previous question list and come up with different questions accordingly.
However, as the results of this experiment show, players tended not to read the entire previous question list carefully. Instead, most inputs were meaningless or unrelated, apparently because of carelessness or misunderstanding, such as "how did the P 51 Mustang get its name?" and "what is this?".
The results indicate that this experiment cannot be considered successful, but it did reveal several defects in our design. Changes were made accordingly afterwards.

CAM2 Team Experiment (4/17/2019)

In total, we hosted 2 experiments within the CAM2 team at the monthly celebration party. However, due to the lack of documentation and the low quality of the first one, only the second CAM2 Team Experiment is recorded here.

  • Number of participants: 14
  • Attributes from phase01: 66
  • Reduced results from phase02: 51

The following is a snippet of the final attribute result sheet:

The main problem of this experiment is that players tended not to read the entire list in phase02 (reduce redundancy). 66 attributes were collected from phase01, but they were only reduced to 51 after phase02. Many results we expected to see didn't show up either.
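
To put the phase02 numbers in perspective, the reduction can be worked out directly from the two counts reported above. This is a small arithmetic sketch; the variable names are illustrative only.

```python
collected = 66   # attributes from phase01
remaining = 51   # attributes left after phase02

removed = collected - remaining
reduction = removed / collected
print(f"{removed} attributes removed ({reduction:.1%})")  # 15 attributes removed (22.7%)
```

In other words, fewer than a quarter of the collected attributes were removed as redundant.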