Dev Blog

UTF-8 is here!

by Taavi on September 21, 2011

Today I’m happy to announce that FreshBooks is fully UTF-8!

What this means for you: You can use the Unicode snowman (☃) and every other character in the Unicode Basic Multilingual Plane to your heart’s content, knowing that each symbol takes the space of only one code point (essentially a “letter”). For example, you can now fit a full 50 non-Latin characters into an invoice item name. It also means that you’ll be able to use snowmen and other extended characters in the time tracking section of FreshBooks!
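To make the “one code point” claim concrete, here is a minimal Python sketch comparing a real snowman with the same character written as a numeric entity. The 50-character limit comes from the paragraph above; the idea that an entity would count character-by-character against that limit is an assumption for illustration, not a description of FreshBooks internals.

```python
# Compare how much room a snowman takes as a real character vs. as an entity.
snowman = "\u2603"   # U+2603 SNOWMAN, inside the Basic Multilingual Plane
entity = "&#9731;"   # the same character written as a numeric entity

def code_points(text: str) -> int:
    """Count Unicode code points (what len() reports for a Python 3 str)."""
    return len(text)

name_limit = 50  # item-name limit mentioned in the post

print(code_points(snowman))               # 1
print(code_points(entity))                # 7
print(len(snowman.encode("utf-8")))       # 3 (bytes on the wire)
print(name_limit // code_points(entity))  # 7 entity-written characters would fit
```

Under that assumption, a 50-character field holds 50 real snowmen but only 7 entity-written ones.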

What this means for you as an API consumer: You can now send FreshBooks the full gamut of BMP characters directly as UTF-8 with no encoding shenanigans! Previously anything not in the ISO-8859-1 character set would be smashed into a question mark, including XML entities representing those extended characters. As a workaround, one could “double-encode” entities, e.g. your XML would contain an ASCII string such as &amp;#9731; to get a snowman. This is no longer required!
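For the curious, the old behaviour and the workaround can be approximated in a few lines of Python. This is only a sketch: the `<name>` element is hypothetical, and the lossy conversion below stands in for whatever the server actually did.

```python
import html
import xml.etree.ElementTree as ET

snowman = "\u2603"

# ISO-8859-1 has no snowman, so a lossy conversion leaves a question mark.
smashed = snowman.encode("iso-8859-1", errors="replace")
print(smashed)  # b'?'

# The old workaround: double-encode the entity in the XML payload.
# <name> is a hypothetical element, not a documented FreshBooks field.
payload = b"<name>&amp;#9731;</name>"
stored = ET.fromstring(payload).text
print(stored)  # &#9731;  (plain ASCII, so it survives ISO-8859-1)

# A browser rendering the stored text as HTML then shows the snowman.
print(html.unescape(stored) == snowman)  # True
```

The XML parser strips one layer of escaping, which is why the second layer was needed to get an entity through intact.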

Here’s how things work, starting today:

If you submit this via your browser | It looks like this in the browser | And like this in an API response
& | & | &
☃
☃ | ☃ | ☃
If you submit this via the API | It looks like this in the browser | And like this in an API response
& | & | &
☃
☃

For the next month or so we’ll continue to accept HTML entities through the web application (and double-encoded entities through the API), and display them just like we have for years. But after that, we’ll migrate all the existing entity-based characters into real UTF-8-encoded code points, and flip the switch such that plain text is plain text.
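A migration like this could look roughly like the sketch below. `migrate_field` is a hypothetical helper, and using Python’s `html.unescape` is an assumption about the approach, not the actual FreshBooks migration code.

```python
import html

def migrate_field(stored: str) -> str:
    """Turn entity-based text into real characters (hypothetical helper)."""
    return html.unescape(stored)

old_value = "Snow removal &#9731; &amp; salting"
new_value = migrate_field(old_value)

print(new_value)                  # Snow removal ☃ & salting
print(new_value.encode("utf-8"))  # the bytes that would be stored as UTF-8

# One pass removes exactly one layer of escaping, so it should run once only.
print(migrate_field("&amp;#9731;"))  # &#9731;
```

Running the pass a second time would decode text that was meant to stay literal, so a real migration would need to track which rows it has already converted.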

When that happens, the behaviour will change like this:

If you submit this via your browser | It looks like this in the browser | And like this in an API response
& | & | &amp;
☃ | ☃ | ☃
☃ | ☃ | ☃
If you submit this via the API | It looks like this in the browser | And like this in an API response
& | & | &
☃
☃ | ☃ | ☃
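One way to read “plain text is plain text”: whatever you type is stored verbatim and only escaped when it is written into an XML (or HTML) response. The sketch below illustrates that reading; `render` is a hypothetical helper and the escaping step is an assumption, not confirmed server behaviour.

```python
from xml.sax.saxutils import escape

def render(stored: str) -> str:
    """Escape stored plain text for an XML or HTML response (hypothetical)."""
    return escape(stored)

for submitted in ["&", "&#9731;", "\u2603"]:
    # Stored as typed; only the markup-significant characters get escaped.
    print(repr(submitted), "->", render(submitted))
```

Under this reading, typing the seven characters of a numeric entity gives you those seven characters back, while a real snowman passes through untouched.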

Enjoy!

2 Comments

Heiko Haljand says:
Jan 25/12 7:25 am

I assume it means the API understands UTF-8 encoded content IF the XML sent is specified as encoding="UTF-8"? Does it still leave the option to encode XML as ISO-8859-1?

Fresh Taavi says:
Jan 25/12 10:34 am

Yes, you can specify an encoding in your XML declaration (the <?xml ... ?> line at the top), just as you could before.

Previous behaviour was very silly, because by default we’d accept UTF-8 bytes, and they’d THEN be converted to ISO-8859-1. If you used to specify ISO-8859-1 explicitly, the behaviour hasn’t changed at all. But please DO NOT send double-encoded entities, as they will stop displaying correctly soon.
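As a rough illustration of declaring the encoding, here is how a standards-following XML parser (Python’s ElementTree in this sketch) treats the two declarations. The `<name>` element is hypothetical, and this shows generic parser behaviour rather than the FreshBooks server itself.

```python
import xml.etree.ElementTree as ET

# <name> is a hypothetical element used only for illustration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9 \u2603</name>'
).encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly.
latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text

print(latin1_text)  # café
print(utf8_text)    # café ☃
```

The bytes on the wire have to match the declared encoding; the declaration only tells the parser how to read them.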

Also be aware that the XML parser uses a strict interpretation of encoding, so ISO-8859-1 bytes 0x80 through 0x9F are control codes mapped to U+0080 through U+009F. In many other places on the web (e.g. a browser!), those bytes are interpreted as Windows-1252 (i.e. 0x80 is €). If you actually have Windows-1252 bytes, be sure to specify encoding="Windows-1252" instead.
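The difference is easy to see with Python’s built-in codecs, which follow the same strict mapping for ISO-8859-1. This is a small sketch of the codec behaviour, not of the FreshBooks parser.

```python
raw = b"\x80"

strict = raw.decode("iso-8859-1")     # strict ISO-8859-1: a C1 control code
lenient = raw.decode("windows-1252")  # Windows-1252: the euro sign

print(f"U+{ord(strict):04X}")   # U+0080
print(f"U+{ord(lenient):04X}")  # U+20AC
print(lenient)                  # €
```

So a euro sign sent as byte 0x80 under an ISO-8859-1 declaration arrives as an invisible control character rather than €.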
