I use WEKA for data mining, and WEKA stores its datasets in ARFF files.
WEKA is a powerful tool, but sometimes I want to look at the datasets in a more universal format that I can manipulate as I please. This Python script converts ARFF files to XML or JSON.
Usage:
    ./arff_parser.py [filename1.arff filename2.arff ... ] [-json | -xml] [--debug]

    # Debug mode prints a variable dump of the schema that was read in. It is off by default.
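
For illustration, the command line above could be handled with Python's standard argparse module. This is only a sketch under assumptions: the README does not say how arff_parser.py actually reads its arguments, and build_parser and the 'fmt' destination are hypothetical names. The README also does not say what happens when neither -json nor -xml is given, so the sketch leaves the format unset in that case.

```python
import argparse


def build_parser():
    """Build a parser for the usage line shown above (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="arff_parser.py")
    # Zero or more ARFF files, as in [filename1.arff filename2.arff ... ]
    parser.add_argument("filenames", nargs="*", metavar="filename.arff")
    # -json and -xml exclude each other, as in [-json | -xml]
    fmt = parser.add_mutually_exclusive_group()
    fmt.add_argument("-json", dest="fmt", action="store_const", const="json")
    fmt.add_argument("-xml", dest="fmt", action="store_const", const="xml")
    # Debug mode is off unless --debug is passed
    parser.add_argument("--debug", action="store_true")
    return parser


args = build_parser().parse_args(["weather.arff", "-json", "--debug"])
print(args.filenames, args.fmt, args.debug)
```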
Expects:
    - a WEKA file ('some-filename.arff') formatted as follows:

        @relation <dataset-name>
        % Comments (ignored)
        @attribute attribute1 [{value1,value2,value3} | datatype]
        @attribute attribute2 [{value1,value2,value3} | datatype]
        ...
        @data
        [attr1,attr2,...]
        [attr1,attr2,...]
        [attr1,attr2,...]
        [attr1,attr2,...]
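
As a rough sketch of how a file in this layout can be read, the function below walks the lines once and collects the relation name, the attributes, and the data rows. It is not the code in arff_parser.py: parse_arff is a hypothetical name, and the sketch assumes standard ARFF attribute lines of the form '@attribute <name> <spec>', plain comma-separated data rows, and no quoted values containing commas or spaces. All data values are kept as strings.

```python
def parse_arff(text):
    """Parse ARFF text into a dict with 'relation', 'attributes' and 'data'."""
    schema = {"relation": None, "attributes": [], "data": []}
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and % comments are ignored
        lower = line.lower()
        if in_data:
            # Pair each cell with the attribute name in the same position
            names = [a["name"] for a in schema["attributes"]]
            cells = [c.strip() for c in line.split(",")]
            schema["data"].append(dict(zip(names, cells)))
        elif lower.startswith("@relation"):
            schema["relation"] = line.split(None, 1)[1].strip()
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            spec = spec.strip()
            if spec.startswith("{") and spec.endswith("}"):
                # Nominal attribute: a list of allowed values
                values = [v.strip() for v in spec[1:-1].split(",")]
            else:
                values = spec  # a datatype name such as 'real'
            schema["attributes"].append({"name": name, "values": values})
        elif lower.startswith("@data"):
            in_data = True
    return schema


sample = """@relation weather
% a comment
@attribute outlook {sunny,overcast,rainy}
@attribute temperature real
@data
sunny,85
rainy,65
"""
parsed = parse_arff(sample)
print(parsed["relation"], len(parsed["data"]))
```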
Outputs:
    -json:
        'some-filename.json'

        {
            "relation": dataset-name,
            "attributes": [
                {"name": attribute1, "values": [value1, value2, ...]},
                {"name": attribute2, "values": [value1, value2, ...]}
            ],
            "data": [
                {attr1: val1, attr2: val2, ...},
                {attr1: val1, attr2: val2, ...},
                {attr1: val1, attr2: val2, ...},
                ...
            ]
        }
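
Once a dataset is in this JSON layout, it can be loaded with Python's standard json module and handled like any other dict. The sketch below is only an illustration: the sample document is made up to match the layout above, and count_values is a hypothetical helper, not part of the script.

```python
import json
from collections import Counter

# Made-up sample that follows the layout above
sample = """
{
    "relation": "weather",
    "attributes": [
        {"name": "outlook", "values": ["sunny", "overcast", "rainy"]},
        {"name": "temperature", "values": "real"}
    ],
    "data": [
        {"outlook": "sunny", "temperature": "85"},
        {"outlook": "rainy", "temperature": "65"},
        {"outlook": "sunny", "temperature": "72"}
    ]
}
"""


def count_values(document, attribute):
    """Count how often each value of one attribute occurs in the data rows."""
    return Counter(row[attribute] for row in document["data"])


document = json.loads(sample)
counts = count_values(document, "outlook")
print(document["relation"], dict(counts))
```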
    -xml:
        'some-filename.xml'

        <dataset>
            <relation>dataset-name</relation>
            <attributes>
                <attribute>
                    <name>attribute1</name>
                    <values>real</values>
                </attribute>
                <attribute>
                    <name>attribute2</name>
                    <values>
                        <value>val1</value>
                        <value>val2</value>
                        <value>val3</value>
                        <value>...</value>
                    </values>
                </attribute>
            </attributes>
            <data>
                <entry><attr1>val</attr1><attr2>val</attr2>...</entry>
                <entry><attr1>val</attr1><attr2>val</attr2>...</entry>
                ...
            </data>
        </dataset>
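
The XML layout can be read back with Python's standard xml.etree.ElementTree module. The sketch below is an illustration under assumptions: the sample document is made up to match the layout above, and read_attributes and read_entries are hypothetical helpers. An attribute whose <values> element holds <value> children is returned as a list, and one that holds only text (such as 'real') is returned as that string.

```python
import xml.etree.ElementTree as ET

# Made-up sample that follows the layout above
sample = """
<dataset>
  <relation>weather</relation>
  <attributes>
    <attribute><name>temperature</name><values>real</values></attribute>
    <attribute>
      <name>outlook</name>
      <values><value>sunny</value><value>rainy</value></values>
    </attribute>
  </attributes>
  <data>
    <entry><temperature>85</temperature><outlook>sunny</outlook></entry>
    <entry><temperature>65</temperature><outlook>rainy</outlook></entry>
  </data>
</dataset>
"""


def read_attributes(root):
    """Map each attribute name to a list of values or a datatype string."""
    attributes = {}
    for attribute in root.find("attributes"):
        name = attribute.findtext("name")
        values = attribute.find("values")
        nominal = [v.text for v in values.findall("value")]
        attributes[name] = nominal if nominal else values.text
    return attributes


def read_entries(root):
    """Turn every <entry> into a dict keyed by its child tag names."""
    return [{cell.tag: cell.text for cell in entry} for entry in root.find("data")]


root = ET.fromstring(sample)
attributes = read_attributes(root)
entries = read_entries(root)
print(root.findtext("relation"), attributes, len(entries))
```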