Internationalizing Data on Mac and iPhone

Session 125 WWDC 2010

Mac OS X and iPhone OS have a rich model for processing and presenting language and locale-specific information such as dates, times, numbers, calendars, and time zones. Avoiding common mistakes when handling international data is critical to making your application ready for a global audience. Get the detailed knowledge you need to make your application shine no matter where in the world your users are.

OK, hello everybody.

I'm Deborah Goldsmith and today we're going to be talking about making your application ready for the entire world, or at least a big part of it.

So why should you care about that?

Well recently over half of Apple's revenue has been coming from customers outside of the United States.

And most of those customers don't use English as their primary language.

Now some foreign languages like French and German are pretty close to English in the way they behave.

But many of our biggest markets use languages that follow very different rules from English.

So it might be counter-intuitive what you need to do to your application in order to support those languages.

Fortunately you don't need to figure out what those rules are yourself.

You can make your application ready for those markets by calling system APIs that do all the work for you.

And somewhat unusually today, at least compared to prior years, we're not going to be spending a lot of time talking about API details or code samples.

Mostly we're going to be talking about concepts that will help you understand when you should call system APIs and which ones you should call.

So today we're mostly going to be talking about Internationalization not Localization.

What's the difference?

Localization is the process of translating your application's user interface.

So for example the text in the menus, the text in the buttons, the text in other controls, those kinds of things, it's the language that your application uses to talk to the user.

Internationalization is different.

Internationalization is about making data in all the world's languages work and there's many different kinds of that data.

For example there's text content, which may come from the user or it may come from an external source.

Dates, times, numbers, currency amounts, calendars can vary and we're also going to be talking about time zones a little bit, because those behave differently around the world.

So the goal for this talk is to help you understand how to make your application world ready.

And the goal is to have one version of your application, not a French version and a Chinese version, a Russian version, one binary that can support content in any language and also can run its user interface in any of the languages that the system supports.

So today again we'll be focusing on the Internationalization part not how to do Localization.

So, content: content is data that the user provides, or that comes from an external source, maybe a website.

Content can be in multiple languages and the language doesn't have to match the language of the user interface.

That is the Localization language.

And not only can it be in multiple languages it can be in multiple languages at the same time in the same document or even in the same paragraph.

Now this is something we're not going to be focusing on today, but it's something to keep in mind as you figure out how your application processes text.

OK, language: the language preference controls the Localization, that is, it controls the language of the menus and controls of the user interface.

The way it works is the user picks a primary language in either the Language and Text pref pane in Mac OS X or in the Language Preference on iOS and that will pick one lang.lproj out of your application or out of another kind of bundle.

In addition the primary language controls a few other things.

It also controls the algorithm that is used to sort words for presenting an ordered list to the user and it also controls word breaking behavior.

On the desktop you can actually set those separately.

I don't know if you can see it peeking out from behind the iPhone, there's a little pop up there, which is the order for sorted lists.

Mac OS X lets you set that independent from the UI language, although by default it's the same as the UI language.

If you change the UI language you must restart applications for it to take effect.

And in the case of iOS it actually restarts the device so that all of the user interface is running in the same language.

OK the thing that we're going to be spending the most time on today is controlled by the Locale or Region preference.

Again that's in the Language and Text pref pane on the desktop and it's in International Preferences on iOS.

And this controls how things like dates, times, numbers, calendars and so on are formatted.

There is a language component to the Locale or Region, and usually that language component is the same as the UI language, but it isn't always.

Now one big difference between the Locale or Region and the UI language is that you can change the Locale without having to re-launch applications.

So that's what users are going to expect in terms of correct behavior.

So here's our cast of characters or cast of classes in any case.

These 6 classes are what we're going to be spending the most time talking about today.

NSLocale is kind of the controller class for all of this.

It embodies the current region and format preferences from the user and it has a lot of different properties that you can set individually.

NSNumberFormatter as you might expect is a class that you use to format numbers and also to parse them.

NSDateFormatter does the same thing for dates and times and you may have heard of all of these classes already.

A couple of less familiar classes are NSCalendar, which handles calendar operations and NSTimeZone, which encapsulates logic for time zones.

And I'm assuming that you've all heard of NSString and we're not going to be going over all of NSString today but just the parts that pertain to natural language processing.

So let's start with NSLocale.

NSLocale again is set by the Region Format preference in the pref pane and all Locales have an identifier associated with them.

That identifier is a string, which kind of sums up the part of the world that the Locale has to do with.

So one example is the US English Locale and the identifier for that is just en_US.

Below that I have a more complex example, which we'll go through in detail, which shows most of the parts of a Locale identifier.

So every Locale identifier has a language and in this case the language is Serbian as represented by the small sr there at the front.

That part of the Locale identifier uses ISO language codes.

Almost always there's also a region; in this case, sort of counter-intuitively, the RS stands for Serbia.

As I said almost all Locales have a region but sometimes they don't.

For example there's an Esperanto Locale.

Esperanto is an artificial language.

It doesn't really correspond to any country and so there is no country or region for the Esperanto Locale.

Also a region doesn't necessarily have to be a country.

There are letter-based codes, alphabetic codes for countries but there are also numeric codes for regions that represent parts of the world.

So for example, there is a region for all of Latin America.

There is a region for all of Europe.

Mostly we don't use those but it is possible to have that in that position.

Now something that you'll sometimes see in a Locale identifier is the script and the script is there usually for one of two reasons.

First and most importantly is if you need it for disambiguation.

So for example, Serbian is written with about equal frequency in the Cyrillic script and the Latin script.

So you always need to specify which script you're using when you're specifying a Serbian Locale, so it's required.

Sometimes you want it for overrides.

For example, there are two kinds of Chinese writing that we support in the system.

There is the simplified Chinese set and the traditional Chinese set.

And usually you can infer which one to use based on the region.

So, for example, the Chinese in Hong Kong region implies traditional Chinese.

However, if you wanted to set your Region preference to be Hong Kong using Chinese, but you wanted to force it to use the simplified version in that case you'd specify the script explicitly.

Sometimes there's a variant, very occasionally and I'll give an example of that later on.

I don't want to focus on it too much.

Finally Locale identifiers can also have keywords and in this case there's a keyword for the currency, which overrides the currency that comes from the Locale data.

So for example, in this case we're specifying that we want to use the Euro regardless of what the default is for Serbia.

You can also override the calendar using a keyword.
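As a rough sketch of how the pieces of an identifier can be read back in code, the following builds an NSLocale from an identifier string and queries individual components. The identifier `sr_Latn_RS@currency=EUR` and the helper name `LocalePart` are assumptions made for illustration and are not necessarily what appeared on the slide; the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Illustrative identifier (an assumption): language_Script_REGION@keyword=value
static NSString *const kExampleIdentifier = @"sr_Latn_RS@currency=EUR";

// Hypothetical helper: returns one component (language, script, region,
// currency, ...) of the locale named by `identifier`.
static NSString *LocalePart(NSString *identifier, NSString *key) {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:identifier];
    return [locale objectForKey:key];
}
```

Calling `LocalePart(kExampleIdentifier, NSLocaleScriptCode)` would be expected to return the script code, and passing `NSLocaleCurrencyCode` should reflect the keyword override rather than the region's default currency.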

So where do you get a Locale object from?

Well you can create a Locale from the identifier string if you want a specific Locale, but the usual way to get one is to call currentLocale and that will give you a Locale object that corresponds to the user's current Preference Settings.

Now that object won't change after creation and remember users expect that if they change the Preference Setting that the behavior of your application will change almost immediately.

So in order to react to that there's a notification you can respond to, which is NSCurrentLocaleDidChangeNotification, say that 10 times fast.

And so your application can look for that and when you receive it you can go through and update all your objects.

There is a convenience function, which is the autoupdatingCurrentLocale class method on NSLocale and that will give you an NSLocale that responds to that notification itself so it will update itself when the user changes their preference.

In addition, if you set that Locale on a number formatter or a date formatter or on any other kind of foundation object that takes a Locale, those objects will in turn update automatically when the notification comes in.

Now something you have to watch out for if you are looking for the notification yourself is that the NSLocale is looking for the same notification so you can get into a little bit of a race condition and I'll give you an example where that could occur.

Let's say you have a window and that window shows today's date at the top and you want that date to change when the user changes their Preference so that it uses the proper Region format.

So you might set up that date with an NSDateFormatter that's set to the autoupdatingCurrentLocale and then you look for the NSCurrentLocaleDidChangeNotification to repaint the window.

Well the problem is that you and the locale are both looking for the same notification and if you get it first you'll repaint the window before the Locale has had a chance to update and you'll still get the old formatting.

So, if you're using the autoupdatingCurrentLocale and you're also looking for the notification, it's important to keep in mind that it's non-deterministic who gets it first.
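A minimal sketch of the two approaches just described, assuming ARC. The helper names are hypothetical. Assigning `currentLocale` to the formatter inside the notification handler is one possible way to avoid depending on delivery order; that choice is an assumption of this sketch, not something the talk prescribes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a date formatter that follows the user's Region setting.
static NSDateFormatter *MakeHeaderDateFormatter(void) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = [NSLocale autoupdatingCurrentLocale];
    formatter.dateStyle = NSDateFormatterLongStyle;
    formatter.timeStyle = NSDateFormatterNoStyle;
    return formatter;
}

// Hypothetical helper: when the locale changes, refresh the formatter's
// locale explicitly and then call `repaint` on the main queue.
// Keep the returned token and pass it to -removeObserver: when finished.
static id ObserveLocaleChanges(NSDateFormatter *formatter, void (^repaint)(void)) {
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSCurrentLocaleDidChangeNotification
                    object:nil
                     queue:[NSOperationQueue mainQueue]
                usingBlock:^(NSNotification *note) {
                    // Assumption: reading +currentLocale here avoids relying on
                    // the autoupdating locale having already handled the change.
                    formatter.locale = [NSLocale currentLocale];
                    repaint();
                }];
}
```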

So let's move on to talk about numbers and some of the differences you might see in the way they're formatted between Locales.

So, one important difference is the decimal point character and the grouping separator character, but also the size of the groups.

In the United States and in many other countries groups are 3 digits in length so it's every 1,000 but in some Locales groups are 4 digits in length and in still other Locales different groups in the same number can have 4 digits or 3 digits.

So the first group might have 4 digits but subsequent groups might have 3 digits.

In this case we have a U.S. English formatted number on the left and we have a French formatted number on the right.

The French number uses a non-breaking space for the thousands separator, the grouping separator, and a comma for the decimal point.

Not every Locale uses the ASCII digits for representing numbers, most do, but not all of them.

In this case on the right we've got a number formatted according to the Arabic Locale and you can see that it uses a completely different set of digits in order to represent the number.

Currency can also vary, not just the symbol but also where it appears.

So for example, again we've got a French formatted currency amount on the right and the Euro appears, the currency appears after the number and separated by a space.

Another thing to keep in mind is that the currency symbol can change even if the currency is the same.

So for example, in the United States when we represent an amount in dollars, we just use the dollar sign.

But if you're representing that same amount in dollars in say Australia then it would say U.S. dollars because in Australia a single dollar character means the Australian dollar not the U.S. dollar.

Percentages can also vary in the way they're formatted.

Not just the digits used but as you can see Arabic uses a different percent character than the Roman alphabet and also the different digits.

Also the positioning of the percentage sign either after or before the number can vary and finally even for floating point concepts like not a number or infinity some Locales localize that data.

So, for example, we use NaN for not a number in the U.S. English Locale but Icelandic uses a different string, which I don't even know how to pronounce but that's it on the right.

So if your needs are simple for number formatting it's very straightforward: NSNumberFormatter has a class method, and this is new in Mac OS X 10.6 and in iOS 4.

All you have to do is pass in the NSNumber for your Number and which Number style you want and you get a string back.

No muss, no fuss.

And there are 4 basic Number Formatting paradigms that are supported.

Paradigm is probably too big a word for this, styles.

OK, general just formats things as a general floating point number.

There's the currency style, which uses the currency symbol, there's percentage and you'll notice that for percentage the number is multiplied by 100 and the reason is that you would use the percentage format to represent a number between 0 and 1 and that would be formatted as a percentage between 0 and 100.

And finally there's a scientific style if you want a scientific notation, and again this can vary between Locales.
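As an illustration of the simple case, the class method in question is `+[NSNumberFormatter localizedStringFromNumber:numberStyle:]`. The wrapper name `FormatForUser` below is hypothetical; the output depends on the user's current Region setting, so no particular string is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: formats `value` for the current locale in the given style.
static NSString *FormatForUser(double value, NSNumberFormatterStyle style) {
    return [NSNumberFormatter localizedStringFromNumber:@(value) numberStyle:style];
}
```

For example, `FormatForUser(0.25, NSNumberFormatterPercentStyle)` passes a fraction between 0 and 1 and gets back a percentage; `NSNumberFormatterDecimalStyle`, `NSNumberFormatterCurrencyStyle` and `NSNumberFormatterScientificStyle` select the other three styles.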

There are some more advanced things that you might want to do with numbers and for those advanced uses you want to create an NSNumberFormatter object and keep it.

So one example is if you're formatting a lot of numbers: creating one NSNumberFormatter and then calling it repeatedly is more efficient than calling the class method.

There's no class method for parsing numbers.

So if you want to parse a string into a number you need to create an NSNumberFormatter.

And if you need to tweak the format for example, controlling the number of significant digits, whether the fraction is shown, whether the decimal point is always shown, how the sign is represented, etc., etc., etc., there are accessor functions on NSNumberFormatter that set that up and if you need to do that you need to create an object.
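A sketch of the "create one and keep it" case, assuming ARC. The helper name `MakeDecimalFormatter` is hypothetical, and the locale is passed explicitly here only to make the behavior reproducible; an application would normally use the user's locale.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a reusable decimal formatter for one locale that
// always shows exactly two fraction digits.
static NSNumberFormatter *MakeDecimalFormatter(NSString *localeIdentifier) {
    NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
    formatter.numberStyle = NSNumberFormatterDecimalStyle;
    formatter.locale = [[NSLocale alloc] initWithLocaleIdentifier:localeIdentifier];
    formatter.minimumFractionDigits = 2;
    formatter.maximumFractionDigits = 2;
    return formatter;
}
```

The same object handles both directions: `-stringFromNumber:` for formatting and `-numberFromString:` for parsing. With a French formatter, a string such as `1,50` parses with the comma treated as the decimal point.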

Well what are some of the things that can go wrong if you don't call system APIs?

And we'll have several slides like this and all of these are lifted from real examples that happened in real applications.

So, one problem is using stringWithFormat or printf or scanf for formatting or parsing numbers.

The problem is that %e and %f will not handle non-ASCII digits like the Arabic example that we saw earlier.

So you cannot use these APIs to format numbers in a localized way.

People sometimes assume that the decimal point or the grouping separator or the size of groups are the same as whatever country they are living in whereas it varies considerably around the world.

The same thing for percents; people assume that a percent is always formatted as the ASCII digits 0 through 9 followed by an ASCII % sign.

Sometimes people will create an NSNumberFormatter and then set the pattern string and that's fine in some circumstances, but it will erase the Locale specific formatting that you get for free and so all of a sudden your number formatting is not localized any more.

And finally, a problem that people can run into is let's say you've got a document and it's showing $2.00 and the user goes and changes their Currency preference to another currency, say the Euro.

Well now you've got two Euros except of course $2.00 is not two Euros.

So if it's user-supplied data then that's generally not something you need to worry about.

But if you are writing an application where you could be converting amounts between currencies it's important to realize that the system does not do that for you.

All that's changing is the way the number is represented and any currency conversions you have to handle yourself.

OK, moving right along, let's talk about dates and times and what differs between Locales.

I mentioned that a Locale has a language associated with it and that is the language that controls the names of the months, the days of the week, the AM/PM strings and also relative terms like today, yesterday, or tomorrow.

Now if I'm running my system in English but I set my Region Preference to be a French Locale I'm going to get French month names because that is what's associated with the Locale.

So there's an example of today's date in the French Locale.

Another thing that can differ between Locales is the Calendar in use.

So here again is today's date, except in this case we're using the Japanese Locale and we have the Japanese Calendar set, and the date is the 22nd year of the Heisei era, June 10th.

Another thing that can be different between Locales is which day is the first day of the week.

In the U.S. when you're representing days in a calendar Sunday is on the left and Saturday is on the right.

But in other places Monday is on the left and Sunday is on the right so that can vary.

Some places use 12-hour time like the U.S. and other places like Japan or in Europe use 24-hour time and that also varies.

Even if the language is the same the order of the date elements can be different.

So, for example, in the U.S. we say June 10, 2010, but in the UK they write it like this 10 June 2010.

So again there are some predefined styles for date and time formatting and also parsing and those have the names Short, Medium, Long and Full.

And as you can see as you go from Short to Full you get more and more information represented in the resulting string.

Now there are two ways you can use this.

One is to just pick a style and stick with it.

So say I'm always going to use the long date.

But another thing you can do is start at a particular length and then maybe make it smaller, for example, if the date is in a column and the user shrinks the column, and we do this in the Finder.

We start with the longer forms of the date and time and then as the user shrinks the column that represents that data we fall back from the Long to the Medium and the Medium to the Short to take up less room.
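The fallback idea can be sketched roughly as follows, assuming ARC. `DateStringThatFits` is a hypothetical helper, and a character count stands in for a real measurement of the column width, which an application would do with its own text-measuring code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the longest predefined date style whose
// output is at most `maxCharacters` long (a stand-in for a width check).
static NSString *DateStringThatFits(NSDate *date, NSLocale *locale,
                                    NSUInteger maxCharacters) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = locale;
    formatter.timeStyle = NSDateFormatterNoStyle;

    NSDateFormatterStyle styles[] = {
        NSDateFormatterFullStyle, NSDateFormatterLongStyle,
        NSDateFormatterMediumStyle, NSDateFormatterShortStyle
    };
    NSString *result = nil;
    for (size_t i = 0; i < sizeof(styles) / sizeof(styles[0]); i++) {
        formatter.dateStyle = styles[i];
        result = [formatter stringFromDate:date];
        if (result.length <= maxCharacters) {
            break;
        }
    }
    return result; // The Short style is returned if nothing fits.
}
```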

Starting in Mac OS X 10.6 and in iOS 4 we have some new features, both of which are very useful and I'd like to spend a little bit of time talking about them.

So you've probably noticed that in different places in the OS, for example the Finder or in Mail on the phone, you'll see relative terms like today or yesterday. It hasn't been very easy to do that in the past, but now NSDateFormatter has a doesRelativeDateFormatting property; you can set that and your dates will do the same thing. There's an example there, except it wasn't yesterday, it should have been June 9th, but never mind.
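A minimal sketch of relative date formatting, assuming ARC; the helper name `RelativeDateString` is hypothetical. The relative words only replace the date portion when the date is close to the present, and the exact wording comes from the locale.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: formats `date` with relative terms where the locale
// provides them, for example a word for "today" instead of the full date.
static NSString *RelativeDateString(NSDate *date) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = [NSLocale autoupdatingCurrentLocale];
    formatter.dateStyle = NSDateFormatterMediumStyle;
    formatter.timeStyle = NSDateFormatterShortStyle;
    formatter.doesRelativeDateFormatting = YES;
    return [formatter stringFromDate:date];
}
```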

Another new facility, which is very useful is the Date Template Facility.

Now what would you use that for?

That is what you would use if the predefined date formats or time formats don't meet your needs.

If you need a different subset of the date elements, what you do is pass in a template string, a Locale and some options, and this class method will return a format string, which you can then turn around and set on your NSDateFormatter and that will format things according to what you requested.

So, for example, let's say I'm writing a Calendar application and I want to put an hour view down or say a day view where I have hours down the left hand side, well, I guess the left hand side would be over here for you guys, but I just want the hour, but I don't know if it's 12- or 24-hour time.

And before this API you had to do a lot of fussing around with the date format string to try to figure out how to set up this piece of a date or time format.

In this case though, you can just pass the template string j, which is a meta-character that just says give me the hour whether it's 12 or 24 and depending on the Locale you'll get back a different format string.

So for English you may get the 12-hour hour followed by an AM/PM indicator, or for another Locale you'd get the 24-hour hour character, and you get the two different results you see on the right.

Maybe my Calendar application has a month view and the month view has the month and the year at the top, but I don't know whether the month or the year is supposed to come first.

If the user is using the Japanese calendar, do I put the era in?

This API takes care of that for you.

So in this case you pass a template where I say I want the shortest possible representation of the year but I want the full name of the month.

And the strings you get back are for the U.S. Locale you get the name of the month, a space and a year.

But in the Japanese Locale using the Japanese calendar I get the era, I get the year, I get the month and everything works out fine.
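The template facility is `+[NSDateFormatter dateFormatFromTemplate:options:locale:]`. The sketch below assumes ARC; the helper names are hypothetical, and the template `yMMMM` is an assumed spelling of "shortest year, full month name" rather than the exact string from the slide.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: asks the system for a locale-appropriate format string.
static NSString *FormatPattern(NSString *templateString, NSLocale *locale) {
    return [NSDateFormatter dateFormatFromTemplate:templateString
                                           options:0
                                            locale:locale];
}

// Hypothetical helper: formats `date` using a template instead of a fixed style.
static NSString *StringFromDateUsingTemplate(NSDate *date,
                                             NSString *templateString,
                                             NSLocale *locale) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = locale;
    formatter.dateFormat = FormatPattern(templateString, locale);
    return [formatter stringFromDate:date];
}
```

Passing `@"j"` requests just the hour in whichever cycle the locale prefers, and `@"yMMMM"` requests a month-and-year heading with the fields ordered for that locale.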

So what are some of the kinds of errors that we've seen people run into with dates and times?

Well something that seems to happen a lot is that people use NSDateFormatter for parsing or formatting non-localized dates.

For example dates that you get off the Internet or dates that appear in some internal data format and if you use NSDateFormatter that way without understanding that it's localized you can get bad results.

Typically dates like that aren't localized.

Now to get an NSDateFormatter that you can use to parse or format dates and times like that instead of getting one set to the current Locale, which is the default, just create a new Locale, set it to this identifier and this is an example of the variant that we talked about earlier when we were talking about Locale identifiers.

So in this case this is the POSIX variant of the U.S. English Locale, en_US_POSIX, and this is the Locale identifier that corresponds to the standard C Locale.

This will always give you stuff back that uses English names of months and is formatted in a standard way.

So if you set your NSDateFormatter using that Locale you'll get non-localized dates.

Another option is to just call the BSD layer where there are APIs for parsing and formatting dates and times and if you do use those, just pass NULL for the Locale because that indicates again the C Locale and that makes perfect sense because the primary purpose of NSDateFormatter is to handle localized dates and times.

If you're dealing with a non-localized date or time you can set it up to do that but you don't really need to use it.
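A sketch of parsing a fixed-format, non-localized timestamp with NSDateFormatter, assuming ARC. The identifier `en_US_POSIX` is the POSIX variant of U.S. English that the talk refers to; the helper name and the particular timestamp layout are assumptions chosen for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses timestamps such as "2010-06-10T12:00:00Z"
// independently of the user's Region setting.
static NSDate *ParseInternetTimestamp(NSString *string) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    formatter.locale = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    formatter.dateFormat = @"yyyy'-'MM'-'dd'T'HH':'mm':'ss'Z'";
    formatter.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    return [formatter dateFromString:string]; // nil if the string does not match
}
```

Setting the time zone explicitly matters here because the literal `Z` in this format is only matched as text; the formatter would otherwise interpret the fields in the user's time zone.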

Another thing that people sometimes did is parse format strings.

So, for example, if I were writing a Calendar application and I wanted to put the months and the year at the top of the view, there was no way to get NSDateFormatter to do that prior to these recent releases.

So what people would do is they would set the full date format, then they would extract the format string from the NSDateFormatter and then they would go picking through it to try to figure out which pieces to use and that's very error prone.

But now that there's a dateFormatFromTemplate you don't need to do that anymore.

Another thing that people do is use NSCalendarDate at all; it's deprecated and you shouldn't use it.

You can use NSDate to represent a date and time.

In fact, that's its primary purpose but people have used the description method on NSDate to format dates and it will not format a date in a proper localized fashion.

So you should use NSDateFormatter whenever you're parsing or formatting dates that are localized.

And then another mistake that people have made is assuming that the calendar is always Gregorian.

OK there we go.

So Calendars, let's talk about some of the things that can differ from different Calendars and different Locales.

Well one is the year.

This is the year 2010 in the Gregorian calendar.

However, in the Thai Buddhist calendar this is the year 2553 and in various other calendars the years are all over the place.

Every Calendar has an implicit era.

So for example in the Gregorian calendar we're in the AD era, but usually we don't bother representing that.

However, for some calendars like the Japanese calendar the era changes rather more frequently and it's important to take that into account.

So, for example, this is the 22nd year of the Heisei era.

Another thing that can vary between calendars is the number of months in a year.

So for example, the Gregorian calendar always has 12 months but some calendars have 12 months or 13 months; the number of months can even vary from year to year.

The lengths of months can also vary.

You can remember the lengths of the months of the Gregorian calendar using the nursery rhyme, but other calendars have a different set of months and those months have different lengths than the months of the Gregorian calendar.

Even the lengths of the months in the Gregorian calendar can vary depending on whether it's a leap year or not.

And some calendars, for example the Coptic calendar, have months as short as 5 days.

Another thing that you really wouldn't expect is that the year can change other than at the first day of the first month of the year.

So for example, in the Japanese calendar the year changes when the reign of a new emperor begins, which doesn't have to be January 1st.

So for example, the day after January 7th of the 64th year of the Showa era is January 8th of the first year of the Heisei era.

Well fortunately an NSCalendar takes care of all of this for you.

It abstracts all the operations that you might want to do on calendars or dates, determining how many days are in a particular month, how many months are in a year, converting between calendar components and an absolute date/time, doing operations like what's the date 3 days after this one, all sorts of things like that.
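A rough sketch of a few of those operations, assuming ARC. The helper names are hypothetical. The constants use the current SDK spellings (`NSCalendarIdentifierGregorian`, `NSCalendarUnitDay`); SDKs from the time of this talk used older names such as `NSGregorianCalendar` and `NSDayCalendarUnit`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: a calendar pinned to GMT so that results do not
// depend on the time zone of the machine running the code.
static NSCalendar *CalendarWithIdentifier(NSString *identifier) {
    NSCalendar *calendar = [[NSCalendar alloc] initWithCalendarIdentifier:identifier];
    calendar.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    return calendar;
}

// Hypothetical helper: noon GMT on a Gregorian year/month/day.
static NSDate *GregorianDate(NSInteger year, NSInteger month, NSInteger day) {
    NSDateComponents *parts = [[NSDateComponents alloc] init];
    parts.year = year;
    parts.month = month;
    parts.day = day;
    parts.hour = 12;
    return [CalendarWithIdentifier(NSCalendarIdentifierGregorian) dateFromComponents:parts];
}

// Number of days in the month that contains `date`, in `calendar`.
static NSUInteger DaysInMonthContaining(NSDate *date, NSCalendar *calendar) {
    return [calendar rangeOfUnit:NSCalendarUnitDay
                          inUnit:NSCalendarUnitMonth
                         forDate:date].length;
}

// The year number of `date` as `calendar` counts it.
static NSInteger YearOf(NSDate *date, NSCalendar *calendar) {
    return [calendar components:NSCalendarUnitYear fromDate:date].year;
}

// The date `days` days after `date`, letting the calendar handle rollover.
static NSDate *DateByAddingDays(NSDate *date, NSInteger days, NSCalendar *calendar) {
    NSDateComponents *delta = [[NSDateComponents alloc] init];
    delta.day = days;
    return [calendar dateByAddingComponents:delta toDate:date options:0];
}
```

The same instant can then be asked about in different calendars, for example `YearOf(date, CalendarWithIdentifier(NSCalendarIdentifierJapanese))` versus the Gregorian or Buddhist calendar.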

Mac OS X 10.6 has support for a large set of non-Gregorian calendars.

iOS 4 supports what we call Gregorian-like non-Gregorian calendars, and those are calendars where the set of months is the same as in the Gregorian calendar but the year and era may be different.

And at some point in the future we plan to expand support of non-Gregorian calendars on iOS also.

So what are some of the things that can go wrong when you're doing Calendar operations and you don't let the system handle it for you?

And again, these are all lifted from real situations that we've seen.

One is assuming Gregorian calendar, assuming that there are always 12 months in a year.

This is an interesting one, assuming that month numbers are sequential.

Remember that I mentioned that some calendars have years with 12 months and years with 13 months?

An example of that is the Hebrew calendar.

Well in the year that has 12 months or rather the year that has 13 months that extra month is not at the end, it's in the middle.

So in a year with 12 months that month is not there.

So you skip over it when you're numbering months in a year without that month.

The same thing can happen with days, even in the Gregorian calendar.

For example, October 1582 in the Gregorian calendar only has 21 days and you go straight from October 4th to October 15th.

You can't assume that the era is optional because just, for example, in the Japanese calendar seeing the year 22 doesn't tell you anything if you don't know whether it's Heisei or Showa or what have you.

Some Apps assume that weeks always start on a Sunday.

People have been tripped up by the fact that the year can change other than on the first day of the first month.

And something that's really tricky is recurrences.

So again let's assume you're writing a Calendar application and you want to allow the user to set up a meeting that happens once a month or somebody's birthday, which is once a year or, for example, the last Tuesday of the month or the second Thursday.

Well what those terms mean changes when you change the calendar.

So, for example, if I have somebody's birthday and it's a particular day of a particular month in a particular calendar that recurrence relationship is different if I switch calendars.

The day that is the second Tuesday of the third month is not the same in the Gregorian and Arabic calendars.

They are different days.

So if you're defining a recurrence relationship like this in your calendar or a similar kind of application, it's important to keep track of the calendar that was used to define it.

So if the user set their birthday in the Arabic calendar you should keep track of the fact that it was set in the Arabic calendar.

OK let's spend a little time talking about Time Zones.

Those can also be a little counter-intuitive.

Every time zone has an offset from what's called Greenwich Mean Time or Coordinated Universal Time; although those are not precisely the same thing, they're close enough for what we're talking about.

There are also rules about whether daylight time is observed and when it's observed.

Every time zone has a unique identifier.

Time Zone information in Mac OS X and iOS comes from something called the Olson database and that's used by a wide variety of computer systems.

Every time zone in the Olson database is uniquely identified by an ID.

But time zones also have localized names.

I will spend a little bit more time talking about that.

And NSTimeZone represents the abstraction of the Time Zone and it will tell you the answers to all of those questions.

So what are some of the errors that we've seen people make in working with Time Zones in applications?

One is assuming they know what the GMT offset is or what the rules are for daylight savings time.

So, for example, the U.S. as a country changed the dates where we observe daylight savings time a few years back.

So those can be different based on what time period you're observing.

So if you're formatting a date that is back in say 2001 then it's going to be different from formatting a date that's in 2010 in terms of when daylight savings time kicks in.

In addition, historically whether you're observing daylight savings time or when it happens or even what your GMT offset is can vary.

So for example, for a long time Indiana did not observe daylight savings time and then when the U.S. changed the rules a few years back they decided that they would start observing it but at the same time different counties in Indiana decided that they would switch their time zones.

So some counties that were previously Central became Eastern and some that were Eastern became Central and so NSTimeZone takes care of that for you.

It tracks it; as long as the user sets the Time Zone preference correctly, it will keep track of all those historical changes.
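A small sketch of asking NSTimeZone about a specific moment instead of hard-coding an offset, assuming ARC. The helper names are hypothetical; the zone identifier `America/Los_Angeles` is the Olson ID for the Pacific time zone.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: noon GMT on the given Gregorian date.
static NSDate *NoonUTC(NSInteger year, NSInteger month, NSInteger day) {
    NSCalendar *calendar =
        [[NSCalendar alloc] initWithCalendarIdentifier:NSCalendarIdentifierGregorian];
    calendar.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    NSDateComponents *parts = [[NSDateComponents alloc] init];
    parts.year = year;
    parts.month = month;
    parts.day = day;
    parts.hour = 12;
    return [calendar dateFromComponents:parts];
}

// Offset from GMT, in seconds, that applied in the zone at `date`.
static NSInteger OffsetFromGMT(NSString *olsonID, NSDate *date) {
    return [[NSTimeZone timeZoneWithName:olsonID] secondsFromGMTForDate:date];
}

// Whether daylight time was in effect in the zone at `date`.
static BOOL IsDaylightTime(NSString *olsonID, NSDate *date) {
    return [[NSTimeZone timeZoneWithName:olsonID] isDaylightSavingTimeForDate:date];
}
```

Because the answer is computed per date, the same calendar day in March can come out differently for 2001 and for 2010, reflecting the change in U.S. daylight time rules in between.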

Another thing that people do is use the Olson ID which is really more like a programming identifier to show to the end user.

So, for example, the time zone that we're in right now is called America/Los_Angeles.

That's not really something that you want to show to a user and NSTimeZone and NSDateFormatter will let you get a localized name that will make more sense.

If you do call NSTimeZone make sure that you're getting the right version of the Time Zone name, the generic name versus the daylight name versus the standard name.
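A minimal sketch of asking for those three kinds of names (again an illustration in Swift rather than code from the session; the `en_US` locale is an assumption chosen for the example, and in a real application you would normally pass the user's current locale):

```swift
import Foundation

let zone = TimeZone(identifier: "America/Los_Angeles")!
let locale = Locale(identifier: "en_US")  // assumption for the example

// The name style selects the generic, standard, or daylight name.
let generic = zone.localizedName(for: .generic, locale: locale)
let standard = zone.localizedName(for: .standard, locale: locale)
let daylight = zone.localizedName(for: .daylightSaving, locale: locale)

// The Olson ID is for programs; the localized names are for people.
print("Olson ID:", zone.identifier)
print("Generic: ", generic ?? "(none)")
print("Standard:", standard ?? "(none)")
print("Daylight:", daylight ?? "(none)")
```

Which style is right depends on context: the generic name suits a setting that applies all year, while the standard or daylight name suits a specific moment in time.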

Another assumption that people sometimes make is that the short IDs for time zones, things like PST for Pacific Standard Time, are unique. They are not.

For example, PST is also used in Australia.

So you can't look at something like PST and assume you know what the full Time Zone is.

OK, lastly we're going to spend some time talking about Natural Language Processing with NSString.

So there are two operations that we're going to talk about today.

One is Breaking a string into pieces and the other is Sorting.

So new in 10.6 and iOS 4 there is an API on NSString called enumerateSubstringsInRange:options:usingBlock:.

And you can use that to perform lexical operations on a string of Natural Language text.

You can find word boundaries, line break opportunities, sentence boundaries and so on.

And this is one of the APIs that's controlled by the UI language not by the Locale.
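Here is a small sketch of that API as it appears in Swift, where it is spelled `enumerateSubstrings(in:options:_:)`. This is an illustration rather than code from the session; the helper `pieces(of:by:)` and the sample strings are hypothetical.

```swift
import Foundation

// Hypothetical helper: collect the substrings the system finds for a unit
// (words, sentences, lines, and so on).
func pieces(of text: String, by unit: String.EnumerationOptions) -> [String] {
    var result: [String] = []
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: unit) { substring, _, _, _ in
        if let substring = substring {
            result.append(substring)
        }
    }
    return result
}

let english = "Hello, world. How are you?"
print(pieces(of: english, by: .byWords))
print(pieces(of: english, by: .bySentences))

// Japanese has no spaces between words, so splitting on whitespace would
// return the whole sentence; the system tokenizer finds the boundaries.
let japanese = "私は東京に住んでいます"
print(pieces(of: japanese, by: .byWords))
```

The same call covers every unit, so the code does not need its own rules about what separates words in a given language.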

Another thing that's controlled by the UI language is the sort order.

Excuse me.

I'm just going to take a sip of water.

And that's important because different languages can have very different sort orders and the way certain features are handled varies between those languages.

So, for example, the way diacritics are handled when sorting English is completely different from the way it's handled when sorting French.

And the API that you can use to do any kind of comparison for sorting purposes is localizedStandardCompare.

So here I have two examples of a sorted list just to show you how different a sort order can be.

On the left we have Hawaiian.

Now that list may not look alphabetized to you, but it is and the reason for that is that native Hawaiian uses a set of letters which is a subset of the 26 letters that are used for English and words with those letters, that is native Hawaiian words, always sort before words that use other letters.

So in this case letters like B and C are not used in native Hawaiian and therefore they sort at the end.

Similarly for French the thing that's different is the way that accents are handled.

So if you look at the last three items in the French list, in French the accent at the end of the word is more significant than the accent at the beginning.

And so you can see that those last three words are sorted first according to the accent at the end and then according to the accent at the second position, and it generalizes.

You go through the accents in French backwards in order to determine the sort order.

So localizedStandardCompare will take care of all of this for you as long as you call it.

So what are the kinds of errors that we've seen people make in applications by not calling the right APIs?

One is that people often assume that words and lines are always separated by whitespace: space characters, tabs, returns, etc. That's not true for many languages, including some in large markets like Japanese, Chinese and Thai.

A very common mistake is to use NSString's Compare method for sorting a list that is going to be shown to the end user.

The problem is that Compare is not localized.

It will not take any of those language issues we just discussed into account.

It just uses a fixed binary order.

So Compare is great if you're doing something like building a B tree index where you need a fixed comparison order and you don't want it to change when the user changes their preference.

But if you're sorting a list to show to users then you need to use a localized comparison.
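To make the difference concrete, here is a minimal sketch in Swift (an illustration, not code from the session; the list of names is made up). It sorts the same list once with the fixed, non-localized `compare` and once with `localizedStandardCompare`:

```swift
import Foundation

let names = ["banana", "Zebra", "apple"]

// Fixed order: uppercase "Z" comes before lowercase "a", so "Zebra"
// sorts first. Stable across users, but not what a reader expects.
let fixedOrder = names.sorted { $0.compare($1) == .orderedAscending }

// Localized order: follows the user's language settings. With English
// as the user's language, apple and banana are expected before Zebra.
let localizedOrder = names.sorted {
    $0.localizedStandardCompare($1) == .orderedAscending
}

print(fixedOrder)
print(localizedOrder)
```

The fixed order is the one to persist in an index; the localized order is the one to put on screen.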

There is an extended version of Compare that can do localized compares if you need it for advanced use.

So, for example, that API is compare:options:range:locale: and you'll get a localized sort as long as you pass something for the locale.

As an example of where you might want to use this, localizedStandardCompare turns on the option that we call numeric sorting, and what that does is, if you have numbers that appear as part of the strings that you're sorting, it will compare those by their actual numeric value.

And that's what the Finder uses so that, for example, if you have File 1 through File 9 and File 10 through File 19, the Finder will sort those in numeric order.

If you don't want that, you can call the advanced form of Compare and turn off that option.

That's the kind of case where you might want to call this more advanced API.
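A short sketch of that choice in Swift (an illustration rather than code from the session; the file names and the `en_US` locale are assumptions made for the example):

```swift
import Foundation

let files = ["File 10", "File 9", "File 1"]
let locale = Locale(identifier: "en_US")  // assumption for the example

// Localized compare with numeric sorting turned on:
// embedded numbers are compared by value, so 9 comes before 10.
let numericOrder = files.sorted {
    $0.compare($1, options: [.numeric], range: nil, locale: locale)
        == .orderedAscending
}

// Still localized because a locale is passed, but with numeric sorting
// off the digits are compared character by character ("1" before "9").
let characterOrder = files.sorted {
    $0.compare($1, options: [], range: nil, locale: locale)
        == .orderedAscending
}

print(numericOrder)
print(characterOrder)
```

Passing a locale is what keeps the second sort localized; only the numeric behavior changes between the two calls.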

Another error that we see people make quite commonly is doing comparisons for sorting with diacritic- and case-insensitivity.

Now those options are intended for searching, like in a Find dialog, not for sorting a list.

If you turn on diacritic-insensitivity that French example that we saw will not be sorted properly.

Similarly, if you turn on case-insensitivity you will not get the right order, because the order in which the upper and lower case versions are sorted differs based on the language.

Some languages put the uppercase version first and others put the lowercase version first.

Again, if you turn on case-insensitivity when you're sorting, the order in between upper and lower case will be essentially random.

It's whatever falls out of your sort algorithm.

So whenever you're doing a sort always make sure that you are diacritic- and case-sensitive.

OK, well, we're pretty much done.

Two sessions that are related to this topic are Advanced Text Handling for iPhone OS, which was Tuesday at 4:30 and Understanding Foundation, which was the session immediately before this one.

So all of you get in your time machine and get back and go and watch those sessions.

But if you don't have a time machine and you didn't attend the sessions already then you can find all the information about these sessions on the WWDC website.

And for more information you can go to http://devforums.apple.com.

I'm sure you all have this URL already.
