I've been hunting down internationalization bugs recently, and it's funny how hard they are to uncover sometimes. In .NET there are tons of globalization options, but basically you work in one of two modes. In the first, you want conversions based on the current user's settings. This covers display and input: number and date formats, string casing, sort order and so on. Usually this is what you want, but there are times you don't, for example when you write a date to disk. That is the second mode, where you want a standard format. This way, when you reload, you know which is the month and which is the day in 1/2/2004 and can translate for the user accordingly. To get this 'standard neutral format' you use CultureInfo.InvariantCulture. This is described as being 'culturally insensitive', so naturally it is associated with English.
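To make the two modes concrete, here is a minimal sketch of the date round trip: invariant culture for what goes to disk, current culture for what the user sees. The class name and the "yyyy-MM-dd" storage pattern are my own choices for illustration; any fixed pattern would do.

```csharp
using System;
using System.Globalization;

class InvariantDateRoundTrip
{
    static void Main()
    {
        DateTime original = new DateTime(2004, 2, 1); // 1 February 2004

        // Mode two: a fixed pattern plus InvariantCulture for storage,
        // so the text on disk does not depend on the user's settings.
        string stored = original.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        // Reload with the same pattern and culture; month and day are unambiguous.
        DateTime reloaded = DateTime.ParseExact(stored, "yyyy-MM-dd", CultureInfo.InvariantCulture);

        // Mode one: format for display using the current user's culture.
        Console.WriteLine(reloaded.ToString("d", CultureInfo.CurrentCulture));
    }
}
```

The displayed string changes with the regional settings, but the stored string stays the same.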
The good news is you can see this stuff fail by just switching the region/language settings in Control Panel. No need to go buy 200 international versions of XP (although there is a multilingual version available for developers, which lets you install the OS in any language).
More things can fail than you would first think. The first problem was just boneheaded on my part: I had assumed Double.Parse defaulted to InvariantCulture, but of course why would it! It uses the current culture. The error was for a float, 0.0F. I had been stripping the possible suffix letters ('F', 'f', 'D', 'd', 'M', 'm') before parsing the values and assumed the problem was there. Of course the real cause was that some cultures write the decimal separator as a comma, oops. This is solved by passing the invariant culture to Double.Parse. My real error was not specifying a culture at all, lesson learned...
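Roughly, the fix looks like the sketch below. ParseFloatLiteral is a hypothetical helper name, not the original code, and the suffix stripping is simplified (TrimEnd removes every trailing suffix letter, not just one).

```csharp
using System;
using System.Globalization;

class ParseLiteral
{
    // Hypothetical helper: strip a trailing type suffix, then parse with
    // the invariant culture so '.' is always the decimal separator.
    static double ParseFloatLiteral(string text)
    {
        string digits = text.TrimEnd('F', 'f', 'D', 'd', 'M', 'm');
        return double.Parse(digits, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        // Gives the same result whatever the regional settings are.
        double value = ParseFloatLiteral("0.0F");
        Console.WriteLine(value == 0.0);

        // By contrast, a comma-decimal culture such as German expects "0,5".
        double german = double.Parse("0,5", new CultureInfo("de-DE"));
        Console.WriteLine(german == 0.5);
    }
}
```

Calling double.Parse(digits) with no culture argument is the version that breaks as soon as the machine is switched to a comma-decimal region.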
The second problem was trickier. I have an enum of LiteralTypes (e.g. int, bool etc.), however for consistency the enum uses title case (Int, Bool). Actually, lowercase isn't possible here anyway, as those are reserved words. So when parsing, you have the string "int" and need to check whether that is a literal type (LiteralType.Int). At first I used Enum.Parse with ignore case set, and all was well. This worked fine for all languages I tested (editor's note: that would be English), however I didn't test Turkish (and thank you Burak for pointing out this bug). In Turkish there are two letter I's, one with a dot and one without. So when you capitalize i you end up with a capital I with a dot on top, which is not the same as an I without the dot. So int was not equal to Int. There is no way (that I can see) to specify InvariantCulture on Enum.Parse, so that failed. Attempt two used InvariantCulture in all the ToUpper and ToLower conversions, but that didn't do it either; it seems the intermediate strings still use the local culture settings. What I really needed was the TextInfo object of InvariantCulture. As a bonus, it has a method called ToTitleCase, and with that, afaik, the problem is solved.
TextInfo ti = CultureInfo.InvariantCulture.TextInfo;
string val = ti.ToTitleCase(value);
// typeof(LiteralType) is needed here; GetType() cannot be called on the type name
p_literalType = (LiteralType)Enum.Parse(typeof(LiteralType), val, false);
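If you want to see the dotted I for yourself, here is a small self-contained sketch. The LiteralType enum below is a stand-in with made-up members, and the comparison assumes the runtime has Turkish (tr-TR) culture data available.

```csharp
using System;
using System.Globalization;

// Stand-in enum for illustration; the real one has more members.
enum LiteralType { Int, Bool }

class TurkishI
{
    static void Main()
    {
        CultureInfo turkish = new CultureInfo("tr-TR");

        // Turkish casing maps 'i' to the dotted capital (U+0130),
        // the invariant culture maps it to plain 'I'.
        string upperTurkish = "int".ToUpper(turkish);
        string upperInvariant = "int".ToUpper(CultureInfo.InvariantCulture);
        Console.WriteLine(upperTurkish == upperInvariant);

        // Title-case with the invariant TextInfo, then do an exact-case parse.
        TextInfo ti = CultureInfo.InvariantCulture.TextInfo;
        string val = ti.ToTitleCase("int");
        LiteralType parsed = (LiteralType)Enum.Parse(typeof(LiteralType), val, false);
        Console.WriteLine(parsed); // prints "Int"
    }
}
```

One thing to watch: ToTitleCase leaves words that are entirely uppercase alone (it treats them as acronyms), so "INT" would stay "INT" and the exact-case parse would fail. Lowercasing with the invariant culture first would cover that case.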
Here is the lowdown on some of the problems you may have if you are uncultured rather than just insensitive:
Culture-Insensitive String Operations
As for SWF, its 'toUpperCase' is culturally insensitive, so it will always behave just like you would expect. Especially if you're English.
It often seems that similar things can be less compatible than radically different ones. I had assumed that testing on Far Eastern languages (CJK) would be best for flushing out bugs, however these are so different they are more like 'something else'. Turkish has only a handful of differences from English, and that is exactly why it is hard. I used to think that was because similar things were more likely to share the same domain (causing more friction), but it probably has more to do with our built-in expectations not being met.
posted on Friday, August 06, 2004 6:57 AM