Create spam classification tutorial #112
It's nice to see other people who do data science with `sed`, `awk`, `rev`, `tr`, and `grep` too! 😄
Remove the extra line here.
Not sure I get the last sentence.
Perhaps it can be reworded as:
Words are used as the features, so to make them easy to compare, only the letters a-z, line endings (`\n`), and spaces are kept. A larger character set can be helpful, but in small data sets other symbols do not occur often enough to help with classification.
Hm, I would remove `nice` from the default behaviour; we could mention it on the side.
On a low-end laptop, `nice` is quite useful because it leaves room for other work. On a more powerful machine the effect will not be too drastic, so the code works in both cases.
Another option, if you like, would be to write a utility C++ program and have users of this tutorial compile it. However, I suppose we are not guaranteed that the user has a compiler available, since they are only using the command-line bindings. Let me know what you think.
(Also, we have some TF-IDF support coming into mlpack, so maybe the bash script above could be replaced in the future with that! It will be a lot faster too. 👍)