DPCS-team · SzymonPajzert · Apr 12, 2016
diff --git a/docs/StackOverflowCrawler.md → ...ckOverflowCrawler/StackOverflowCrawler.md b/docs/StackOverflowCrawler.md → ...ckOverflowCrawler/StackOverflowCrawler.md
diff --git a/docs/StackOverflowCrawler/similiarPapers.md b/docs/StackOverflowCrawler/similiarPapers.md
@@ -0,0 +1,31 @@
+# List of similar projects and notes
+Most of the papers in the list are taken from:
+http://meta.stackexchange.com/questions/134495/academic-papers-using-stack-exchange-data
+
+### [Mining StackOverflow to Turn the IDE into a Self-Confident Programming Prompter](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2014b/Ponz2014b.pdf) + [Prompter: A Self-confident Recommender System](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2014d/Ponz2014d.pdf)
+An IDE plugin that builds queries from code snippets and retrieves evaluated solutions. Unfortunately, the algorithm relies on search engines rather than on its own machine learning algorithm.
+
+**Possible project value:** important
+
+### [Predicting Tags for StackOverflow Posts](http://chil.rice.edu/research/pdf/StanleyByrne2013StackOverflow.pdf)
+Predicts tags for a given text with 65% accuracy, using a Bayesian probabilistic model.
+
+**Possible project value:** significant
+
+### [StORMeD: Stack Overflow Ready Made Data](http://www.inf.usi.ch/phd/ponzanelli/profile/publications/2015a/Ponz2015a.pdf)
+A ready-made model and algorithms for mining Stack Overflow data.
+
+**Possible project value:** meagre
+
+### [Mining Questions Asked by Web Developers](http://salt.ece.ubc.ca/publications/docs/kartik-msr14.pdf)
+Unsupervised learning (topic clustering). The data consisted of questions about HTML5, JavaScript and CSS. The main goal was to group and label the questions using natural language processing and Latent Dirichlet Allocation, a type of statistical model that can be used to discover hidden topics in
+a collection of documents, based on the statistics of words in each document.
+
+**Possible project value:** meagre
+
+### [Automatic categorization of questions from Q&A sites](http://lascam.facom.ufu.br/cms/userfiles/downloads/2014/SAC2014CameraReady.pdf)
+Classification algorithms for Q&A questions. Questions on SO are divided into 3 categories: how-to-do-it, need-to-know, seeking-something. The presented algorithms classify the data with varying effectiveness; the best turned out to be Naive Bayes.
+
+Naive Bayes: these classifiers assume that all the attributes are independent of each other given the category and that each contributes equally to the categorization. A category is assigned to an item (here, a question) by combining the contribution of each feature.
+
+**Possible project value:** meagre