sushil edited this page Sep 13, 2010 · 17 revisions

The codesearch module in Sourcerer includes a tool for evaluating retrieval schemes for locating APIs. This page lists the resources used to run an evaluation.

A paper titled “Evaluation of Retrieval Schemes to Locate API Usage Examples in Code Repositories” has been submitted to the MSR2010 conference. This page points to the entire dataset used for the evaluation in that paper. We hope it will be useful to others who want to replicate or extend the study. The files used for the MSR2010 paper are located in the following folder in this repository: (Link). Links to individual files and information on other resources are given below.

Candidate Queries and the Oracle

  • A list of 20 queries and the solutions used to make relevance judgements. (Download File)

Evaluation Data

  • Two files in TREC format store the ranked hits for each query and the relevance judgement made for each result.
    • Ranking List (Download File)
      • Follows the format: (query_id, , document_id, rank, score, scheme_id)
    • Judgement (Download File)
      • Follows the format: (query_id, , document_id, relevancy)
    • Evaluation Results generated using Galagosearch’s evaluation tool (Download File)
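
The two file layouts listed above can be read with a few lines of Python. The following is a minimal sketch, not part of Sourcerer or Galagosearch. It assumes that fields are separated by whitespace, that the unnamed second field is an unused placeholder column, and that a relevancy value greater than zero means "relevant". The function names and the sample records are made up for illustration.

```python
from collections import defaultdict


def parse_ranking(lines):
    """Group ranking-list records by (scheme_id, query_id), ordered by rank.

    Assumed layout per line: query_id, unused, document_id, rank, score, scheme_id.
    """
    runs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _unused, document_id, rank, score, scheme_id = parts
        runs[(scheme_id, query_id)].append((int(rank), document_id, float(score)))
    for hits in runs.values():
        hits.sort()  # tuples sort by rank first
    return dict(runs)


def parse_judgements(lines):
    """Map (query_id, document_id) to the integer relevancy value.

    Assumed layout per line: query_id, unused, document_id, relevancy.
    """
    judged = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _unused, document_id, relevancy = parts
        judged[(query_id, document_id)] = int(relevancy)
    return judged


def precision_at_k(hits, judged, query_id, k):
    """Fraction of the top-k hits judged relevant (unjudged hits count as not relevant)."""
    top = hits[:k]
    relevant = sum(1 for _, doc, _ in top if judged.get((query_id, doc), 0) > 0)
    return relevant / k


# Made-up sample records, only to show the call sequence.
ranking_lines = [
    "q1 Q0 docB 2 0.50 schemeX",
    "q1 Q0 docA 1 0.90 schemeX",
]
judgement_lines = [
    "q1 0 docA 1",
    "q1 0 docB 0",
]

runs = parse_ranking(ranking_lines)
judged = parse_judgements(judgement_lines)
p_at_2 = precision_at_k(runs[("schemeX", "q1")], judged, "q1", 2)
```

Keying the parsed hits by scheme and query keeps the ranked lists of different retrieval schemes apart, so they can be compared query by query. For the metrics reported in the paper, the Galagosearch evaluation tool listed on this page remains the reference.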

Repository information

The repository was created using the JARs from the plugins folder of a standard installation of Eclipse V3.5.1.

  • List of all JARs used to populate the repository

Tools used:

  • Sourcerer Feature Extractor
  • Sourcerer’s Index Creator
  • Sourcerer’s Usage Calculator and Similarity Calculator to generate Hamming Distance and Tanimoto Coefficient based similarity information
  • Sourcerer Code Search
    • Search Adapter
    • Snippet generator and Evaluation tool
  • Galagosearch’s evaluation tool for calculating metrics
  • R Scripts to generate plots/data
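
The two similarity measures named in the tool list can be illustrated on small sets. This is a sketch of the textbook definitions only, not of Sourcerer's Usage Calculator or Similarity Calculator. It assumes each code entity is represented by the set of API names it uses, which is equivalent to a binary usage vector. The function names, the API names, and the convention for two empty sets are all choices made for this example.

```python
def hamming_distance(apis_a, apis_b):
    """Number of APIs used by exactly one of the two entities.

    For binary usage vectors this equals the count of positions that differ.
    """
    return len(apis_a ^ apis_b)


def tanimoto_coefficient(apis_a, apis_b):
    """Shared APIs divided by all APIs used by either entity.

    Returns 0.0 when both sets are empty (a convention chosen for this sketch).
    """
    union = apis_a | apis_b
    if not union:
        return 0.0
    return len(apis_a & apis_b) / len(union)


# Hypothetical API usage of two code entities.
entity_a = {"List.add", "List.size", "Map.put"}
entity_b = {"List.add", "List.size", "Set.contains"}

distance = hamming_distance(entity_a, entity_b)        # Map.put and Set.contains differ
similarity = tanimoto_coefficient(entity_a, entity_b)  # 2 shared out of 4 distinct
```

Hamming distance grows with the number of differing APIs, so smaller values mean more similar usage. The Tanimoto coefficient is normalized to the range 0 to 1, with larger values meaning more similar usage.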

(under construction, please check back soon)
