Tokenizer: improve the serialization of the tokens by a tiny bit.
crista committed Sep 23, 2016
1 parent c0dc34a commit af758ef
Showing 1 changed file with 1 addition and 4 deletions.
5 changes: 1 addition & 4 deletions tokenizers/file-level/tokenizer.py
@@ -161,11 +161,8 @@ def get_proj_stats_helper(process_num, proj_id, proj_path, file_id_global_var, F
 tokens_count_unique = str(len(file_string_for_tokenization))

 t_time = dt.datetime.now()
-tokens = []
 #SourcererCC formatting
-for k, v in file_string_for_tokenization.items():
-    tokens.append(k+'@@::@@'+str(v))
-tokens = ','.join(tokens)
+tokens = ','.join(['{}@@::@@{}'.format(k, v) for k,v in file_string_for_tokenization.iteritems()])
 tokens_time += (dt.datetime.now() - t_time).microseconds

 # MD5
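As a quick illustration of the serialization this commit touches (a sketch, not part of the commit), here is what the new one-liner produces for a small hypothetical token-count dict. The variable name file_string_for_tokenization and the '@@::@@' separator come from the diff above; the sample values are made up, and .iteritems() plus the print statement assume Python 2, as in the surrounding code:

# Hypothetical sample: token -> count map, as built earlier in the tokenizer (values made up).
file_string_for_tokenization = {'public': 3, 'static': 2, 'void': 1}

# The serialization introduced by this commit: each pair becomes "token@@::@@count",
# and the pairs are joined with commas into a single string.
tokens = ','.join(['{}@@::@@{}'.format(k, v) for k, v in file_string_for_tokenization.iteritems()])

print tokens  # e.g. 'public@@::@@3,static@@::@@2,void@@::@@1' (dict ordering is not guaranteed)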
