IconvQtPOC: Integrating iconv() for Unicode conversion with a Qt app (POC)

This project is a proof-of-concept for using iconv(3) to convert text from any charset to Unicode, integrated in a Qt app.

Why does this exist?

Up to Qt5, Qt provided a convenient class, QTextCodec, that converts between Unicode and other character sets. In Qt6 this class was moved from QtCore to the Qt5Compat module, which means that it has a lower maintenance priority and may be removed entirely in the future.

Drumstick::File is a library, part of the Drumstick project, that reads and writes Standard MIDI Files (SMF, .MID). These files include text metadata such as lyrics, comments, credits, copyrights, instruments, marks and cue names. Some programs based on Drumstick (like kmidimon and dmidiplayer) display this metadata to the user, so the text needs to be converted to Unicode.

SMF metadata

The SMF standard was created as a file interchange format, so users could export compositions from the proprietary format of one MIDI sequencer and import them into another sequencer. Over time, some hardware and software sequencers became obsolete and unusable, and their data files in proprietary formats became unreadable. That work was lost unless the users had kept exported SMF files. Those files should now be treated as original manuscripts for data preservation.

The authors of the standard naively stated that all text metadata within a MIDI file should be encoded in ASCII, overlooking the fact that many cultures around the world need other encodings to reliably store their languages. As a result, composers, manufacturers and software developers all over the world ignored that requirement and used whatever encoding was available at the time. Nowadays, a good encoding choice is UTF-8, and this is what Rosegarden uses when it exports compositions as MIDI files. When importing MIDI files, however, applications need to convert text from unknown encodings into Unicode for processing.

Tested solution

This project does not read SMF files. Instead, it reads plain text files with unknown encodings and tries to detect the input charset using uchardet. If the detected charset is not correct, the user can choose another one from a combobox. The input data is then converted to Unicode by iconv(3) and displayed to the user.
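The conversion step could look roughly like the following sketch. It is not taken from this project's sources: `convertToUtf8` is a hypothetical helper name, the charset name is assumed to come from the detector or from the user's combobox choice, and UTF-8 is chosen as the Unicode target only for illustration. It also assumes the glibc-style `iconv()` prototype that takes a `char **` input pointer; some other implementations declare that parameter as `const char **`.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of this project): converts `input`, which is
// assumed to be encoded in `fromCharset`, into a UTF-8 encoded std::string.
std::string convertToUtf8(const std::string &input, const char *fromCharset)
{
    // iconv_open() takes the target charset first, then the source charset.
    iconv_t cd = iconv_open("UTF-8", fromCharset);
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error(std::string("unsupported charset: ") + fromCharset);
    }

    std::string output;
    char buffer[256];
    // glibc declares the input as char**; the input bytes are not modified.
    char *inPtr = const_cast<char *>(input.data());
    std::size_t inLeft = input.size();
    bool flushing = false;

    for (;;) {
        char *outPtr = buffer;
        std::size_t outLeft = sizeof(buffer);
        // After all input is consumed, a call with a null input pointer
        // flushes any pending shift sequence of stateful encodings.
        const std::size_t rc = flushing
            ? iconv(cd, nullptr, nullptr, &outPtr, &outLeft)
            : iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        const int err = errno; // save before any other call can change it
        output.append(buffer, sizeof(buffer) - outLeft);

        if (rc == static_cast<std::size_t>(-1)) {
            if (err == E2BIG) {
                continue; // output buffer was full: drain it and call again
            }
            // EILSEQ or EINVAL: invalid or truncated input for this charset
            iconv_close(cd);
            throw std::runtime_error("input is not valid in the given charset");
        }
        if (flushing) {
            break;
        }
        flushing = true;
    }

    iconv_close(cd);
    return output;
}
```

For example, `convertToUtf8("caf\xE9", "ISO-8859-1")` returns the bytes `"caf\xC3\xA9"`. In a Qt app the result could then be handed to `QString::fromUtf8()` for display. Throwing on invalid input is a design choice of this sketch; an interactive app might prefer to show a partial result and let the user pick another charset.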

There is also a functionally equivalent alternative POC that uses ICU.

CMake minimum requirement is v3.11, because FindIconv.cmake was introduced in that version.

iconv() is a POSIX API, and it is included in the C library of many Unix systems, such as the GNU libc. CMake can detect this case, in which no additional library needs to be linked to the application. The external libiconv is also supported when needed.

Character set detection was never provided by Qt. This POC uses the latest uchardet library (0.0.7 at the time of writing), which only supports pkg-config. A newer version will also include support for find_package() and imported targets.