Sorting Strings in PHP the Right Way: Why Locale Matters
Sorting text might look trivial at first glance. You take an array of names, run sort(), and you’re done, right? Well, not exactly. When dealing with human names, words with accents, or different alphabets, plain sorting can lead to unexpected and incorrect results.
Let’s explore why locale-aware sorting is crucial in PHP and how the Intl extension’s Collator class solves this problem.
The Problem with Default Sorting
Consider the following PHP array:
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
sort($names);
You might expect Ana, Ása, Émile, Zoë. What you actually get is Ana, Zoë, Ása, Émile: the accented names land after Zoë. The default sort() function compares strings byte by byte, which for UTF-8 text amounts to code point order, not linguistic rules.
That means sorting is based on the Unicode value of each character. While this works fine for plain ASCII (A–Z), it fails once accents, umlauts, and non-English letters enter the picture.
For example, Á and É are treated very differently than plain A or E. To a computer, they’re separate characters with their own numeric codes, all of them higher than Z. To a human, though, Ása should clearly be near Ana, not after Zoë.
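To make the byte-level ordering visible, here is a minimal sketch that sorts the same array and prints the raw bytes of each name. It assumes the source file is saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, so 'Á' is the two bytes C3 81.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
sort($names); // byte-by-byte comparison, no locale involved

foreach ($names as $name) {
    // bin2hex() exposes the raw bytes that sort() actually compared
    printf("%s => %s\n", $name, bin2hex($name));
}
// Resulting order: Ana, Zoë, Ása, Émile
```

Every accented capital here starts with the byte C3, which is larger than the byte for Z (5A), so those names sink to the end.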
Enter Collator – Locale-Aware Sorting
The solution is to use the Intl (Internationalization) extension in PHP, specifically the Collator class.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator = new Collator('fr_FR');
usort($names, fn($a, $b) => $collator->compare($a, $b));
print_r($names);
Result:
['Ana', 'Ása', 'Émile', 'Zoë']
This is correct French ordering. Why? Because the Collator understands that in French (fr_FR locale), Ana comes before Ása, and so on.
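As an alternative to usort(), the Collator class has its own sort() method, which reorders the array in place. A minimal sketch, assuming the intl extension is loaded:

```php
<?php
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator = new Collator('fr_FR');

// Collator::sort() sorts $names in place and returns false on failure
if (!$collator->sort($names)) {
    throw new RuntimeException($collator->getErrorMessage());
}

echo implode(', ', $names), "\n"; // Ana, Ása, Émile, Zoë
```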
Why Locale is Crucial
Every language has its own rules for alphabetical order. For example:
- In French, é is considered close to e.
- In Swedish, Å, Ä, and Ö are actually separate letters at the end of the alphabet.
- In German, ä is sorted like a in dictionary order and like ae in phone-book order.
- In Spanish, ch and ll used to be considered distinct letters (until officially changed in 1994).
If you’re building software that deals with international data (names, cities, dictionaries, product lists), you cannot rely on default sorting. You must use a Collator with the correct locale.
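The effect of the locale is easiest to see by sorting one list under two sets of rules. The sketch below uses made-up sample names and a hypothetical helper, sortedFor(), and assumes the intl extension is loaded.

```php
<?php
// Hypothetical sample data chosen to show the Swedish/German difference.
$names = ['Örjan', 'Zlatan', 'Åke', 'Adam'];

// Hypothetical helper: returns a copy of $items sorted for $locale.
function sortedFor(string $locale, array $items): array
{
    $collator = new Collator($locale);
    $collator->sort($items); // sorts the local copy in place
    return $items;
}

$swedish = sortedFor('sv_SE', $names);
$german  = sortedFor('de_DE', $names);

echo implode(', ', $swedish), "\n"; // Adam, Zlatan, Åke, Örjan
echo implode(', ', $german), "\n";  // Adam, Åke, Örjan, Zlatan
```

Swedish places Å and Ö after Z, while German treats them as variants of A and O, so the same four names come out in a different order.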
Practical Considerations
- Locale Choice
  Choose the locale carefully:
  - fr_FR for French rules
  - en_US for U.S. English
  - sv_SE for Swedish
  - etc.
  If your app is multilingual, you might even need dynamic locale-based sorting depending on the user’s settings.
- Performance Impact
  Collation is slower than binary comparison, especially for large datasets. If you’re sorting millions of rows, consider database-level collation (e.g., MySQL utf8mb4_unicode_ci) instead of PHP-only sorting.
- Consistency Between Layers
  If your database uses one collation (e.g., utf8mb4_general_ci) and PHP uses another (fr_FR Collator), you may see different results between backend and frontend. Always keep collation consistent.
- Fallback Handling
  If Intl is not available, provide a safe fallback (like plain sort()) but warn developers that results may be incorrect for internationalized data.
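The locale choice and fallback points above can be combined into one small helper. This is only a sketch: sortForLocale() and $userLocale are hypothetical names, and where the locale comes from (a user profile, a request header) is left to your application.

```php
<?php
/**
 * Hypothetical helper: sort strings for a given locale, falling back to
 * byte-order sort() when the intl Collator is missing or fails.
 */
function sortForLocale(array $items, string $locale): array
{
    if (class_exists('Collator')) {
        $collator = new Collator($locale);
        if ($collator->sort($items)) {
            return $items;
        }
    }

    // Fallback: results may be wrong for non-ASCII text, so warn loudly.
    trigger_error(
        'intl Collator unavailable or failed; using byte-order sort()',
        E_USER_WARNING
    );
    sort($items);
    return $items;
}

// 'fr_FR' is a placeholder for whatever locale the user has configured.
$userLocale = 'fr_FR';
$sorted = sortForLocale(['Ása', 'Zoë', 'Émile', 'Ana'], $userLocale);
```

Raising a warning rather than failing silently keeps the fallback visible in logs, so a missing extension in production does not go unnoticed.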
Extension Requirement
The Collator class is part of the Intl extension. Ensure it’s installed and enabled. On Debian or Ubuntu, for example:
sudo apt-get install php-intl
Without it, any attempt to create a Collator fails with a fatal "Class not found" error.
Other Use Cases for Collator
Besides sorting, Collator is useful for:
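- Case-insensitive comparison (e.g., “Ana” vs “ana”) without writing hacks, by lowering the collator’s strength to Collator::SECONDARY.
- Searching for names or words where diacritics should not block matches, by lowering the strength to Collator::PRIMARY.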
- Human-friendly comparisons in multilingual applications.
Example for comparison:
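$collator = new Collator('fr_FR');
$collator->setStrength(Collator::PRIMARY); // ignore accents and case; the default strength treats É and E as different
echo $collator->compare('Émile', 'Emile'); // 0 (considered equal at primary strength)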
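Which differences count as "equal" depends on the collator's strength. The sketch below walks through the three most common levels and ends with a small accent-insensitive search. The variable names and the sample list are illustrative only.

```php
<?php
$collator = new Collator('fr_FR');

// Default strength is TERTIARY: accents and case both count as differences.
$tertiaryAccent = $collator->compare('Émile', 'Emile'); // non-zero
$tertiaryCase   = $collator->compare('Ana', 'ana');     // non-zero

// SECONDARY ignores case but still distinguishes accents.
$collator->setStrength(Collator::SECONDARY);
$secondaryCase   = $collator->compare('Ana', 'ana');     // 0
$secondaryAccent = $collator->compare('Émile', 'Emile'); // non-zero

// PRIMARY compares base letters only: accents and case are both ignored.
$collator->setStrength(Collator::PRIMARY);
$primaryAccent = $collator->compare('Émile', 'Emile'); // 0

// Accent- and case-insensitive exact-match search over a sample list.
$people  = ['Ana', 'Émile', 'Zoë'];
$needle  = 'emile';
$matches = array_values(array_filter(
    $people,
    fn(string $name): bool => $collator->compare($name, $needle) === 0
));
// $matches is ['Émile']
```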
Finally
When building global applications, text handling is never trivial. Sorting strings correctly is not just a “nice to have”; it’s about respecting cultural rules and providing users with intuitive results.
The Intl Collator is your best friend for this. It makes sorting human-aware instead of machine-only.
So, next time you see an array of names like ['Ása', 'Zoë', 'Émile', 'Ana'], remember:
- Default sorting is wrong for humans.
- Collator sorting is right for humans.
- And the locale you choose makes all the difference.
👉 Pro tip: If you’re writing multi-tenant SaaS or global products, always store a user’s locale and apply it everywhere you handle text: sorting, searching, formatting, even pluralization. That’s how you build truly international-ready software.