# Extract-Wikipedia-Infobox-data
How to extract “Infobox company” data from Wikipedia XML dumps with the `edu.jhu.nlp.wikipedia` (wikixmlj) parser.
Code:
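```java
package infoboxextractor;

import edu.jhu.nlp.wikipedia.*;

public class InfoboxExtractor {
    public static void main(String[] args) throws Exception {
        // Path to the Wikipedia XML dump; a different path can be passed as the first argument.
        String dumpPath = args.length > 0 ? args[0] : "/home/dsteam/wikipedia infobox/demowikidump.xml";
        WikiXMLParser parser = WikiXMLParserFactory.getSAXParser(dumpPath);

        // The callback runs once per page while the SAX parser streams through the dump.
        parser.setPageCallback(new PageCallbackHandler() {
            @Override
            public void process(WikiPage page) {
                try {
                    InfoBox infobox = page.getInfoBox();
                    // getInfoBox() may return null for pages without an infobox; skip them
                    // instead of failing with a NullPointerException on dumpRaw().
                    if (infobox == null) {
                        return;
                    }
                    System.out.println(page.getTitle());
                    System.out.println(page.getID());
                    // dumpRaw() returns the infobox as unparsed wikitext.
                    System.out.println(infobox.dumpRaw());
                    // Further processing of the infobox (filtering, field parsing) goes here.
                } catch (WikiTextParserException e) {
                    // The infobox markup could not be parsed; report it and move on to the next page.
                    e.printStackTrace();
                }
            }
        });
        parser.parse();
    }
}
```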
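The program above prints the infobox of every page, whatever its template. To keep only company infoboxes, one option is to look at the template name at the start of the raw text. The helper below is a minimal sketch that assumes `dumpRaw()` begins with `{{Infobox <name>`, as in the sample output; `InfoboxFilter`, `templateName` and `isCompanyInfobox` are hypothetical names and not part of wikixmlj.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of wikixmlj): identifies "Infobox company" templates in raw wikitext. */
public class InfoboxFilter {
    // Captures the template name after "{{Infobox", up to the first "|", "}" or line break.
    // Case-insensitive, and accepts "Infobox_company" as well as "Infobox company".
    private static final Pattern TEMPLATE_NAME =
            Pattern.compile("^\\s*\\{\\{\\s*Infobox[ _]+([^|\\n}]*)", Pattern.CASE_INSENSITIVE);

    /** Returns the template name, e.g. "company", or null if the text does not start with an infobox. */
    public static String templateName(String raw) {
        if (raw == null) {
            return null;
        }
        Matcher m = TEMPLATE_NAME.matcher(raw);
        return m.find() ? m.group(1).trim() : null;
    }

    /** True only for the "Infobox company" template. */
    public static boolean isCompanyInfobox(String raw) {
        String name = templateName(raw);
        return name != null && name.equalsIgnoreCase("company");
    }

    public static void main(String[] args) {
        String company = "{{Infobox company\n| name = Example Corp\n}}";
        String medical = "{{Infobox medical condition (new)\n| name = Autism\n}}";
        System.out.println(templateName(company));     // company
        System.out.println(isCompanyInfobox(company)); // true
        System.out.println(templateName(medical));     // medical condition (new)
        System.out.println(isCompanyInfobox(medical)); // false
    }
}
```

Inside `process()`, the check could guard the printing, e.g. `if (infobox != null && InfoboxFilter.isCompanyInfobox(infobox.dumpRaw())) { ... }`. Matching on the template name rather than on field names keeps the filter independent of which optional fields a given company infobox fills in.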
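OUTPUT of InfoboxExtractor for the “Autism” page (the program dumps every infobox it finds, so this sample is an “Infobox medical condition”, not a company infobox):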
```
{{Infobox medical condition (new)
| name = Autism
| image = Autism-stacking-cans 2nd edit.jpg
| alt = Boy stacking cans
| caption = Repetitively stacking or lining up objects is associated with autism.
| field = [[Psychiatry]]
| symptoms = Trouble with [[Interpersonal relationship|social interaction]], impaired [[communication]], restricted interests, repetitive behavior
| complications =
| onset = By age two or three
| duration = Long-term
| causes = [[Heritability of autism|Genetic]] and environmental factors
| risks =
| diagnosis = Based on behavior and developmental history
| differential = [[Reactive attachment disorder]], [[intellectual disability]], [[schizophrenia]]
| prevention =
| treatment = [[Behavioral therapy]], [[speech therapy]], [[psychotropic medication]]
| medication = [[Atypical antipsychotics|Antipsychotics]], [[antidepressants]], [[stimulants]] (associated symptoms)
| prognosis = Frequently poor
| frequency = 24.8 million (2015)
| deaths =
}}
```
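`dumpRaw()` returns the template as unparsed wikitext. Because each field in the sample output sits on its own `| key = value` line, a line-based regex is enough for a first pass; the sketch below also reduces `[[Target|Label]]` links to their visible label. `InfoboxFields`, `parse` and `stripLinks` are hypothetical names, and the approach assumes one field per line: values that span several lines or contain nested templates (e.g. `{{Plainlist}}`) would need a real wikitext parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of wikixmlj): splits a raw infobox into named fields. */
public class InfoboxFields {
    // One "| key = value" line; the value may be empty, as in "| risks =".
    private static final Pattern FIELD =
            Pattern.compile("^\\s*\\|\\s*([^=]+?)\\s*=\\s*(.*?)\\s*$");
    // [[Target|Label]] keeps Label; [[Target]] keeps Target.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]");

    /** Maps each field name to its raw value, preserving the order used in the infobox. */
    public static Map<String, String> parse(String raw) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : raw.split("\\r?\\n")) {
            Matcher m = FIELD.matcher(line);
            if (m.matches()) {
                fields.put(m.group(1), m.group(2));
            }
        }
        return fields;
    }

    /** Replaces wiki links with their visible text. */
    public static String stripLinks(String value) {
        return LINK.matcher(value).replaceAll("$1");
    }

    public static void main(String[] args) {
        String raw = String.join("\n",
                "{{Infobox medical condition (new)",
                "| name = Autism",
                "| field = [[Psychiatry]]",
                "| causes = [[Heritability of autism|Genetic]] and environmental factors",
                "| risks =",
                "}}");
        for (Map.Entry<String, String> e : parse(raw).entrySet()) {
            System.out.println(e.getKey() + " = [" + stripLinks(e.getValue()) + "]");
        }
        // name = [Autism]
        // field = [Psychiatry]
        // causes = [Genetic and environmental factors]
        // risks = []
    }
}
```

A `LinkedHashMap` keeps the fields in infobox order, which makes the result easy to compare against the page; empty fields such as `risks` are kept with an empty value so that "field present but blank" stays distinguishable from "field absent".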