Py3 std::string is not converted to 'bytes' #85

Closed
pavelschon opened this issue Aug 26, 2016 · 4 comments

pavelschon commented Aug 26, 2016

In Python 3, std::string is incorrectly converted to the Python 'str' type; it should be converted to 'bytes'. Pull request #54 probably solves this issue.

Code of str_test module:

#include <boost/python.hpp>
#include <string>

std::string getString()
{
    return "string";
}

std::wstring getWString()
{
    return L"wstring";
}

BOOST_PYTHON_MODULE( str_test )
{
    boost::python::def( "getString",  &getString );
    boost::python::def( "getWString", &getWString );
}

The first example is Python 2 and shows the correct results.

Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from str_test import getString, getWString
>>> type(getString())
<type 'str'>
>>> type(getWString())
<type 'unicode'>

The second example is Python 3 and shows that std::string is unexpectedly converted to 'str'.

Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from str_test import getString, getWString
>>> type(getString())
<class 'str'>     # ooops!!! expected bytes there!
>>> type(getWString())
<class 'str'>

stefanseefeld (Member) commented

I agree this is an issue, though I'm not yet sure what the right fix is. std::wstring is defined as std::basic_string<wchar_t>, with wchar_t being of non-portable, compiler-dependent width. So it isn't even clear what data it can hold, much less what encoding the user chooses to store in it.

I think it would be best to not provide a default conversion for it, rather than second-guessing what the user may want / need.
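
The portability concern about wchar_t can be checked from Python itself. The following is a minimal sketch, assuming only the standard ctypes module; the mapping from width to encoding is a common convention used here for illustration, not something the C++ standard guarantees.

```python
import ctypes

# Width in bytes of the platform's wchar_t, as reported by ctypes.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# Conventional encoding for that width (an assumption, not a guarantee).
conventional = {2: "UTF-16", 4: "UTF-32"}.get(wchar_size, "unknown")

print("sizeof(wchar_t) =", wchar_size, "-> conventionally", conventional)
```

The printed width depends on the platform and compiler, which is exactly why a single default conversion for std::wstring is hard to justify.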

pavelschon (Author) commented

I think the issue is not with std::wstring, but std::string.

The following workaround works for me:

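#if PY_MAJOR_VERSION >= 3
// Python 3: build a 'bytes' object explicitly instead of relying on the default converter.
boost::python::object getString()
{
    return boost::python::object( boost::python::handle<>( PyBytes_FromString( "string" ) ) );
}
#else /* python 2.x */
std::string getString()
{
    return "string";
}
#endif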

Then it produces the expected results:

Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from str_test import getString, getWString
>>> type(getString())
<class 'bytes'>
>>> type(getWString())
<class 'str'>
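
When the C++ binding cannot be changed, the caller can convert on the Python side instead. The sketch below uses a hypothetical stand-in, get_string(), for the unmodified str_test.getString(), and assumes the data held by the std::string is UTF-8 text; neither is confirmed by this thread.

```python
def get_string():
    # Hypothetical stand-in for the unmodified str_test.getString(),
    # which returns 'str' under Python 3 according to the session above.
    return "string"


text = get_string()

# Recover the byte representation; UTF-8 is an assumption about
# what the C++ side stored in the std::string.
raw = text.encode("utf-8")

print(type(text).__name__, type(raw).__name__)  # str bytes
```

This only works when the std::string really contains valid text in the assumed encoding; arbitrary binary data would still need a bytes-returning binding such as the workaround above.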

stefanseefeld (Member) commented

OK, fair enough. :-)
Still, I think the handling of getWString() is wrong, too, i.e. Boost.Python shouldn't provide a default converter for std::wstring.

tadeu (Contributor) commented Jan 3, 2017

I think the handling is already correct in both cases, because std::string and std::wstring are both supposed to handle text, not bytes. std::wstring is supposed to be text encoded in UTF-16 when sizeof(wchar_t) == 2 (e.g., on Windows) and in UTF-32 when sizeof(wchar_t) == 4 (e.g., on Unix with default compiler flags). By extension, the universally accepted encoding for std::string would be UTF-8.

In Python 3 philosophy, bytes is supposed to be just that, an array of bytes, not text. Sometimes this array of bytes may be holding an encoded representation of text, but this is just a special case.
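
The distinction between text and bytes can be illustrated in plain Python 3. This is a minimal sketch with an arbitrary sample string and an arbitrary binary blob chosen for illustration: the same text has different byte representations under UTF-8, UTF-16 and UTF-32, while some byte sequences are not valid text at all.

```python
# Sample text: "naive" with a diaeresis, a space, and the euro sign.
text = "na\u00efve \u20ac"

# One piece of text, three different byte representations.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# Number of characters, then number of bytes in each encoding.
print(len(text), len(utf8), len(utf16), len(utf32))  # 7 10 14 28

# An arbitrary blob that is not an encoding of any text in UTF-8.
blob = bytes([0xFF, 0xFE, 0x00, 0x01])
try:
    blob.decode("utf-8")
    blob_is_text = True
except UnicodeDecodeError:
    blob_is_text = False

print("blob decodes as UTF-8:", blob_is_text)  # blob decodes as UTF-8: False
```

A std::string that carries data like the blob above cannot be turned into a Python 3 'str' by UTF-8 decoding, which is the case where a 'bytes' result is needed.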
