Sorting Strings in PHP the Right Way: Why Locale Matters
Sorting text might look trivial at first glance. You take an array of names, run sort(), and you’re done, right? Well, not exactly. When dealing with human names, words with accents, or different alphabets, plain sorting can lead to unexpected and incorrect results.
Let’s explore why locale-aware sorting is crucial in PHP and how the Intl extension’s Collator class solves this problem.
The Problem with Default Sorting
Consider the following PHP array:
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
sort($names);
You might expect Ása to land right next to Ana, but sort() returns ['Ana', 'Zoë', 'Ása', 'Émile']. The default sort() function compares strings byte by byte, not by linguistic rules.
That means the order depends on the numeric value of each encoded byte. This works fine for plain ASCII (A–Z), but it fails once accents, umlauts, and non-English letters enter the picture: in UTF-8, every accented letter is encoded with bytes larger than any ASCII letter, so it sorts after Z.
For example, Á and É are treated very differently from plain A or E. To a computer, they’re separate characters with their own numeric codes. To a human, though, Ása should clearly be near Ana, not after Zoë.
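To see the byte-level comparison in action, here is a minimal sketch. It assumes the source file is saved as UTF-8 and uses bin2hex() only to expose the bytes that sort() actually compares.

```php
<?php
// Assumes this file is saved as UTF-8.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];

// Plain sort() compares the raw bytes of each string.
sort($names);
print_r($names); // Ana, Zoë, Ása, Émile

// The first byte of each name explains the order.
echo bin2hex('A'), PHP_EOL; // 41
echo bin2hex('Z'), PHP_EOL; // 5a
echo bin2hex('Á'), PHP_EOL; // c381 (two bytes, both above 0x7f)
echo bin2hex('É'), PHP_EOL; // c389
```

Because 0xc3 is larger than 0x5a, both accented names end up after Zoë.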
Enter Collator: Locale-Aware Sorting
The solution is to use the Intl (Internationalization) extension in PHP, specifically the Collator class.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator = new Collator('fr_FR');
usort($names, fn($a, $b) => $collator->compare($a, $b));
print_r($names);
Result:
['Ana', 'Ása', 'Émile', 'Zoë']
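This is correct French ordering. Why? Because the Collator for the fr_FR locale first compares base letters, so Á sorts with A and É sorts with E. Accents only break ties between otherwise identical words. Ana therefore comes before Ása (n precedes s), Émile follows both, and Zoë comes last.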
Why Locale is Crucial
Every language has its own rules for alphabetical order. For example:
- In French, é is considered close to e.
- In Swedish, Å, Ä, and Ö are actually separate letters at the end of the alphabet.
- In German, ä sorts together with a in dictionary order and is treated as ae in phonebook order.
- In Spanish, ch and ll used to be considered distinct letters (until officially changed in 1994).
If you’re building software that deals with international data (names, cities, dictionaries, product lists), you cannot rely on default sorting. You must use a Collator with the correct locale.
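The difference between locales is easy to demonstrate. The sketch below sorts the same list under German and Swedish rules. The word list is made up for illustration, and the code assumes the intl extension is loaded with ICU data for both locales.

```php
<?php
// Illustrative word list (not from the article); requires the intl extension.
$words = ['Zebra', 'Äpple', 'Apple'];

// German dictionary order: Ä sorts together with A.
$german = $words;
(new Collator('de_DE'))->sort($german);
print_r($german); // Apple, Äpple, Zebra

// Swedish order: Ä is a separate letter near the end of the alphabet.
$swedish = $words;
(new Collator('sv_SE'))->sort($swedish);
print_r($swedish); // Apple, Zebra, Äpple
```

Collator::sort() sorts the array in place, which is a shorter alternative to usort() with a compare() callback.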
Practical Considerations
- Locale Choice
Choose the locale carefully:
  - fr_FR for French rules
  - en_US for U.S. English
  - sv_SE for Swedish
  - etc.
If your app is multilingual, you might even need dynamic locale-based sorting depending on the user’s settings.
- Performance Impact
Collation is slower than binary comparison, especially for large datasets. If you’re sorting millions of rows, consider database-level collation (e.g., MySQL utf8mb4_unicode_ci) instead of PHP-only sorting.
- Consistency Between Layers
If your database uses one collation (e.g., utf8mb4_general_ci) and PHP uses another (fr_FR Collator), the same data may come back in a different order depending on whether the database or PHP sorted it. Always keep collation consistent.
- Fallback Handling
If Intl is not available, provide a safe fallback (like plain sort()) but warn developers that results may be incorrect for internationalized data.
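On the performance point above: when the same strings are compared many times, one option is to compute a sort key for each string once and then compare the keys as plain bytes. The following is a sketch of that idea, not a benchmarked recommendation; whether it is faster depends on the data and the ICU version.

```php
<?php
// Requires the intl extension.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator = new Collator('fr_FR');

// Compute one binary sort key per name. Comparing two keys byte by byte
// gives the same order as Collator::compare() on the original strings.
$keys = array_map(fn($name) => $collator->getSortKey($name), $names);

// Sort the keys as plain strings and reorder $names alongside them.
array_multisort($keys, SORT_STRING, $names);

print_r($names); // Ana, Ása, Émile, Zoë
```

The Collator class also offers sortWithSortKeys(), which applies the same idea internally.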
Extension Requirement
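The Collator class is part of the Intl extension. Ensure it’s installed and enabled (on Debian or Ubuntu, for example):
sudo apt-get install php-intl
Without it, any attempt to instantiate Collator stops the script with a fatal error, because the class is not defined.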
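One way to avoid the hard failure is to check for the extension at runtime and fall back to byte-order sorting with a warning. The helper below is a hypothetical sketch; the function name sortNames() and the warning text are assumptions, not part of any library.

```php
<?php
/**
 * Hypothetical helper: sorts names for the given locale when intl is
 * available, otherwise falls back to byte-order sorting with a warning.
 */
function sortNames(array $names, string $locale): array
{
    if (extension_loaded('intl')) {
        $collator = new Collator($locale);
        $collator->sort($names); // sorts in place
        return $names;
    }

    // Fallback: order may be wrong for accented or non-Latin text.
    trigger_error(
        'intl extension not loaded; falling back to byte-order sort',
        E_USER_WARNING
    );
    sort($names, SORT_STRING);
    return $names;
}

print_r(sortNames(['Ása', 'Zoë', 'Émile', 'Ana'], 'fr_FR'));
```

Passing the locale as a parameter also covers the multilingual case, where the value would come from the user’s settings.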
Other Use Cases for Collator
Besides sorting, Collator is useful for:
- Case-insensitive comparison (e.g., “Ana” vs “ana”) without writing hacks.
- Searching for names or words where diacritics should not block matches.
- Human-friendly comparisons in multilingual applications.
Example for comparison:
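$collator = new Collator('fr_FR');
// Compare base letters only; at the default (tertiary) strength the accent makes the strings differ.
$collator->setStrength(Collator::PRIMARY);
echo $collator->compare('Émile', 'Emile'); // 0 (equal when accents and case are ignored)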
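The result of compare() depends on the collator’s strength setting. The sketch below walks through three strength levels to show when case and accents count as differences. It assumes the intl extension is loaded.

```php
<?php
// Requires the intl extension.
$collator = new Collator('fr_FR');

// Default strength (tertiary): accents and case both count.
var_dump($collator->compare('Émile', 'Emile')); // int(1)

// Secondary strength: case is ignored, accents still count.
$collator->setStrength(Collator::SECONDARY);
var_dump($collator->compare('Ana', 'ana'));     // int(0)
var_dump($collator->compare('Émile', 'Emile')); // int(1)

// Primary strength: only base letters count.
$collator->setStrength(Collator::PRIMARY);
var_dump($collator->compare('Émile', 'Emile')); // int(0)
var_dump($collator->compare('Ana', 'ana'));     // int(0)
```

Secondary strength suits case-insensitive matching, while primary strength suits searches where diacritics should not block a match.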
Finally
When building global applications, text handling is never trivial. Sorting strings correctly is not just a nice-to-have; it’s about respecting cultural rules and providing users with intuitive results.
The Intl Collator is your best friend for this. It makes sorting human-aware instead of machine-only.
So, next time you see an array of names like ['Ása', 'Zoë', 'Émile', 'Ana'], remember:
- Default sorting is wrong for humans.
- Collator sorting is right for humans.
- And the locale you choose makes all the difference.
👉 Pro tip: If you’re writing multi-tenant SaaS or global products, always store a user’s locale and apply it everywhere you handle text: sorting, searching, formatting, even pluralization. That’s how you build truly international-ready software.