Added Convert Function as discussed on Issue#22 #33
base: master
@@ -0,0 +1,8 @@

```markdown
# Dataset Utils

This directory contains utility functions related to datasets.

Currently implemented features:

* Convert CSV files to JSON ([Issue](https://github.com/mlpack/models/issues/22))
* Convert CSV files to XML ([Issue](https://github.com/mlpack/models/issues/22))
```
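The README lists the two conversions but does not show what a converted row looks like. As a rough illustration, the sketch below maps one CSV row onto the `annotation.object.<column name>` layout that `create_JSON` builds. It is a std-only stand-in for Boost.PropertyTree, not the PR's code. `RowToJSON` and `EscapeJSON` are hypothetical names. The output is compact, while Boost's `write_json` pretty-prints by default. Like `write_json`, it writes every value as a string. It assumes column names are unique and contain no `.`, which `ptree::put` would treat as a path separator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Escape the characters that must not appear raw inside a JSON string.
// (Other control characters are not handled in this sketch.)
std::string EscapeJSON(const std::string& s)
{
  std::string out;
  for (char c : s)
  {
    switch (c)
    {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:   out += c;
    }
  }
  return out;
}

// Hypothetical helper: map one CSV row onto
// {"annotation":{"object":{"<tag>":"<value>",...}}}, values kept as strings.
std::string RowToJSON(const std::vector<std::string>& tags,
                      const std::vector<std::string>& row)
{
  if (tags.size() != row.size())
    throw std::invalid_argument("row and header have different lengths");

  std::string json = "{\"annotation\":{\"object\":{";
  for (std::size_t i = 0; i < tags.size(); ++i)
  {
    if (i > 0)
      json += ",";
    json += "\"" + EscapeJSON(tags[i]) + "\":\"" + EscapeJSON(row[i]) + "\"";
  }
  json += "}}}";
  return json;
}
```

For a header `name,pose` and a row `cat,left`, this yields `{"annotation":{"object":{"name":"cat","pose":"left"}}}`.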
@@ -0,0 +1,105 @@

```cpp
#include <iostream>
#include <unordered_map>
#include <boost/range/combine.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/tokenizer.hpp>
#include <string>
#include <fstream>
#include <vector>
#include <locale>

using namespace boost::property_tree;
using namespace boost;

class Convert
{
  // Split one CSV line into fields. escaped_list_separator<char> uses ','
  // as the separator, '"' as the quote and '\' as the escape character.
  auto tokenize(std::string& line)
  {
    std::vector<std::string> col_names;
    tokenizer<escaped_list_separator<char>> tk(line,
        escaped_list_separator<char>());
    for (tokenizer<escaped_list_separator<char>>::iterator i(tk.begin());
        i != tk.end(); ++i)
      col_names.push_back(*i);
    return col_names;
  }

  // Write one data row to "<ctr>.xml", storing each field under
  // annotation.object.<column name>.
  auto create_XML(std::vector<std::string>& tags, std::vector<std::string> rows)
```
Review comment on lines +19 to +28:

I think there is some issue with indentation. Also, can we replace `auto` with the data type? That makes it easier to figure out all the properties.

Reply: @heisenbuug, can I also do this?
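Replacing `auto` with an explicit return type makes the signature self-documenting. Below is a minimal sketch of `tokenize` written that way, with a `const` reference parameter. To stay self-contained, it swaps Boost's `escaped_list_separator` for a hand-written loop that approximates its defaults: `,` separates fields, `"` quotes, and `\` escapes. Unlike Boost, this simplified version does not turn `\n` into a newline or reject unknown escapes. `Tokenize` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line into fields.
// ',' separates fields, '"' toggles quoting, '\' escapes the next character.
std::vector<std::string> Tokenize(const std::string& line)
{
  std::vector<std::string> fields;
  if (line.empty())
    return fields;                 // an empty line yields no fields

  std::string field;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i)
  {
    const char c = line[i];
    if (c == '\\')
    {
      if (i + 1 == line.size())
        throw std::invalid_argument("line ends with an escape character");
      field += line[++i];          // keep the escaped character literally
    }
    else if (c == '"')
    {
      inQuotes = !inQuotes;        // the quote marks themselves are dropped
    }
    else if (c == ',' && !inQuotes)
    {
      fields.push_back(field);     // end of the current field
      field.clear();
    }
    else
    {
      field += c;
    }
  }
  fields.push_back(field);         // the last field (possibly empty)
  return fields;
}
```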
```cpp
  {
    static int ctr;  // zero-initialized; numbers the files 0.xml, 1.xml, ...
    ptree XMLobjectL;
    std::string tag, value;

    for (auto i : boost::combine(tags, rows))
    {
      // tag holds the column name, value holds the corresponding field.
      boost::tie(tag, value) = i;
      XMLobjectL.put("annotation.object." + tag, value);
    }

    write_xml(std::to_string(ctr) + ".xml", XMLobjectL, std::locale(),
        xml_writer_make_settings<ptree::key_type>(' ', 1u));
    ctr++;
  }

  // Write one data row to "<ctr>.json" using the same tree layout.
  auto create_JSON(std::vector<std::string>& tags, std::vector<std::string> rows)
```
Review comment:

Hey, can we use camel case to be consistent?

Reply: @heisenbuug, can I do it for you?
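In the naming style the reviewers ask for (camel case with a capitalized first letter), the row-to-XML step might look like the following std-only sketch. `CreateXML` and `EscapeXML` are hypothetical names. The sketch builds the same `annotation` / `object` / `<column name>` nesting that `create_XML` stores in its `ptree`. It emits a single string, without the XML declaration and indentation that `write_xml` adds. It also checks the row length explicitly instead of relying on `boost::combine`, and it assumes every column name is a valid XML element name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Escape the five characters XML reserves in text content.
std::string EscapeXML(const std::string& s)
{
  std::string out;
  for (char c : s)
  {
    switch (c)
    {
      case '&':  out += "&amp;"; break;
      case '<':  out += "&lt;"; break;
      case '>':  out += "&gt;"; break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default:   out += c;
    }
  }
  return out;
}

// Hypothetical helper: build
// <annotation><object><tag>value</tag>...</object></annotation> for one row.
std::string CreateXML(const std::vector<std::string>& tags,
                      const std::vector<std::string>& row)
{
  // Explicit length check in place of boost::combine().
  if (tags.size() != row.size())
    throw std::invalid_argument("row and header have different lengths");

  std::string xml = "<annotation><object>";
  for (std::size_t i = 0; i < tags.size(); ++i)
    xml += "<" + tags[i] + ">" + EscapeXML(row[i]) + "</" + tags[i] + ">";
  xml += "</object></annotation>";
  return xml;
}
```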
```cpp
  {
    static int ctr;  // separate counter: 0.json, 1.json, ...
    ptree JSONobject;
    std::string tag, value;

    for (auto i : boost::combine(tags, rows))
    {
      // tag holds the column name, value holds the corresponding field.
      boost::tie(tag, value) = i;
      JSONobject.put("annotation.object." + tag, value);
    }

    write_json(std::to_string(ctr) + ".json", JSONobject);
    ctr++;
  }

  // Read the header line as tags, then write one output file per data row.
  void convertHelper(std::string path, std::string to)
  {
    std::ifstream file(path);
    if (!file)
    {
      std::cerr << "Cannot open " << path << std::endl;
      return;
    }

    std::string line;
    std::getline(file, line);
    std::vector<std::string> tags = tokenize(line);

    while (std::getline(file, line))
    {
      std::vector<std::string> row = tokenize(line);
      // boost::combine() expects ranges of equal length, so skip rows
      // (e.g. a trailing empty line) that do not match the header.
      if (row.size() != tags.size())
        continue;

      if (to == "xml")
        create_XML(tags, row);
      else if (to == "json")
        create_JSON(tags, row);
    }
  }

 public:
  void convert(std::string path, std::string to)
```
Review comment:

Let's make this function static. There is no need to declare an object just to convert datasets. Also, capitalize the first letter of the function name.
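Here is a sketch of the static form the reviewer suggests. Capitalizing the name has one catch: C++ does not allow a member function to have the same name as its class, so a capitalized `Convert()` cannot be a member of `class Convert`. The sketch therefore uses a hypothetical class name, `DatasetConverter`. To stay self-contained, it only counts data rows at the point where the PR's `convertHelper` writes output files.

```cpp
#include <fstream>
#include <string>

// Hypothetical class name: a member function may not share its class's name,
// so a capitalized Convert() cannot live inside a class called Convert.
class DatasetConverter
{
 public:
  // Static, so callers write DatasetConverter::Convert(...) without creating
  // an object. Returns the number of data rows read, or -1 if the target
  // format is unsupported or the file cannot be opened.
  static int Convert(const std::string& path, const std::string& to)
  {
    if (to != "xml" && to != "json")
      return -1;

    std::ifstream file(path);
    if (!file)
      return -1;

    std::string line;
    std::getline(file, line);  // header row: the column names
    int rows = 0;
    while (std::getline(file, line))
      ++rows;                  // convertHelper() would write one file per row here
    return rows;
  }

  // No instances are needed.
  DatasetConverter() = delete;
};
```

Callers then write `DatasetConverter::Convert("path_to_csv.csv", "xml");` without creating an object.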
```cpp
  {
    convertHelper(path, to);
  }
};

// How to invoke: writes 0.xml, 1.xml, ... and 0.json, 1.json, ... into the
// current working directory, one file per data row.
/*
int main() {
  Convert foo;
  foo.convert("path_to_csv.csv", "xml");
  foo.convert("path_to_csv.csv", "json");
}*/
```
Review comment:

Hey, can we also add a class description and usage here?
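One possible class description could sit directly above `class Convert`, written in the Doxygen `/** ... */` style that mlpack uses. The wording below is a suggestion derived from the code in this diff, not text from the PR.

```cpp
/**
 * Convert a CSV file into one XML or JSON file per data row.
 *
 * The first line of the CSV file supplies the column names. Every following
 * row is written to its own file in the current working directory, named
 * 0.xml, 1.xml, ... (or 0.json, 1.json, ...) and numbered consecutively
 * across calls. Each field is stored under annotation.object.<column name>.
 *
 * Example usage:
 *
 * @code
 * Convert foo;
 * foo.convert("path_to_csv.csv", "xml");
 * foo.convert("path_to_csv.csv", "json");
 * @endcode
 */
class Convert;
```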