Handle unicode regularizer
On Python 3, the default string type is Unicode, which broke this
comparison. In particular, the length check was off because the Unicode
string may occupy more bytes than the equivalent ASCII string. To fix
that, encode Unicode strings as ASCII and convert them to C strings;
this handles Python byte strings just as well. Comparing the string
length and the string value is then straightforward. If the conversion
fails (for example, because the object is not a string at all), we get a
`NULL` value and raise the same exception as when the string does not
match an accepted value.
jakirkham committed Jun 12, 2017
1 parent fa40d03 commit d93778d
Showing 1 changed file with 5 additions and 2 deletions.
7 changes: 5 additions & 2 deletions code/cmt/python/src/pyutils.cpp
@@ -1,5 +1,6 @@
 #include "pyutils.h"
 #include <inttypes.h>
+#include <string.h>
 
 #include "cmt/utils"
 using CMT::Exception;
@@ -378,10 +379,12 @@ Regularizer PyObject_ToRegularizer(PyObject* regularizer) {
     Regularizer::Norm norm = Regularizer::L2;
 
     if(r_norm) {
-        if(PyString_Size(r_norm) != 2)
+        char* r_norm_str = PyString_AsString(r_norm);
+
+        if((r_norm_str == NULL) || (strlen(r_norm_str) != 2))
             throw Exception("Regularizer norm should be 'L1' or 'L2'.");
 
-        switch(PyString_AsString(r_norm)[1]) {
+        switch(r_norm_str[1]) {
             default:
                 throw Exception("Regularizer norm should be 'L1' or 'L2'.");

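The check introduced by this change can be sketched in isolation, without the Python headers. This is a minimal illustration, not the CMT code: `parse_norm`, `rejects`, `demo`, and the `Norm` enum are hypothetical names, `std::invalid_argument` stands in for `CMT::Exception`, and the `'1'` and `'2'` cases are assumed because the diff elides the rest of the `switch`. The `const char*` argument plays the role of `r_norm_str`, where a null pointer means the conversion to a C string failed.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Stand-in for Regularizer::Norm; the real type lives in CMT.
enum class Norm { L1, L2 };

// Hypothetical helper (not part of CMT). `norm_str` plays the role of the
// pointer returned by the string conversion: nullptr means it failed.
Norm parse_norm(const char* norm_str) {
    // Reject a failed conversion and any string that is not two bytes long.
    if (norm_str == nullptr || std::strlen(norm_str) != 2)
        throw std::invalid_argument("Regularizer norm should be 'L1' or 'L2'.");

    // Only the second character is inspected, as in the diff.
    switch (norm_str[1]) {
        case '1': return Norm::L1;  // assumed case, elided in the diff
        case '2': return Norm::L2;  // assumed case, elided in the diff
        default:
            throw std::invalid_argument("Regularizer norm should be 'L1' or 'L2'.");
    }
}

// Returns true if parse_norm rejects the input.
bool rejects(const char* s) {
    try { parse_norm(s); } catch (const std::invalid_argument&) { return true; }
    return false;
}

// Usage sketch: accepted and rejected inputs.
void demo() {
    assert(parse_norm("L1") == Norm::L1);
    assert(parse_norm("L2") == Norm::L2);
    assert(rejects(nullptr));  // conversion failed
    assert(rejects("L"));      // too short
    assert(rejects("L22"));    // too long
    assert(rejects("L3"));     // right length, unknown norm
}
```

Checking the pointer before calling `strlen` matters: `strlen` on a null pointer is undefined behavior, so the null test has to come first in the `||` expression.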