Sorting Strings in PHP the Right Way: Why Locale Matters


Sorting text might look trivial at first glance. You take an array of names, run sort(), and you’re done, right? Well, not exactly. When dealing with human names, words with accents, or different alphabets, plain sorting can lead to unexpected and incorrect results.

Let’s explore why locale-aware sorting is crucial in PHP and how the Intl extension’s Collator class solves this problem.


The Problem with Default Sorting

Consider the following PHP array:

$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
sort($names);

You might expect Ása to sit right next to Ana, but sort() returns Ana, Zoë, Ása, Émile: both accented names land after Zoë. The default sort() function compares strings byte by byte, not by linguistic rules.

For UTF-8 text, that means the order follows the Unicode code point of each character. This works for plain ASCII letters of a single case (A–Z), but it fails once accents, umlauts, and non-English letters enter the picture.

For example, Á and É are encoded with higher values than any unaccented letter, Z included. To a computer, they’re separate characters with their own numeric codes. To a human, though, Ása should clearly be near Ana, not after Zoë.
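
To see where the byte-wise order comes from, the short sketch below sorts the same array and prints the first two bytes of each name. It assumes the source file and its strings are UTF-8 encoded, which the article does not state explicitly.

```php
<?php
// Assumes this file is saved as UTF-8, so 'Á' is the two bytes C3 81.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
sort($names); // default flags: non-numeric strings are compared byte by byte

foreach ($names as $name) {
    // Show each name next to the hex value of its first two bytes.
    printf("%-6s %s\n", $name, bin2hex(substr($name, 0, 2)));
}
```

Plain A (0x41) and Z (0x5A) are single bytes, while Á and É both start with the byte 0xC3, so the two accented names end up after Zoë.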


Enter Collator – Locale-Aware Sorting

The solution is to use the Intl (Internationalization) extension in PHP, specifically the Collator class.

$names = ['Ása', 'Zoë', 'Émile', 'Ana'];

$collator = new Collator('fr_FR'); 
usort($names, fn($a, $b) => $collator->compare($a, $b));

print_r($names);

Result (order of elements):

['Ana', 'Ása', 'Émile', 'Zoë']

This is the order a French reader expects. The Collator compares base letters first and treats accents as a secondary difference, so Á sorts with A and É with E: Ana comes before Ása, Émile follows, and Zoë comes last.
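
The same ordering is available without a callback: Collator has its own sort() method, plus asort() for cases where array keys must be preserved. A minimal sketch, assuming the intl extension is loaded; the $byId array and its keys are made up for illustration.

```php
<?php
$collator = new Collator('fr_FR');

// Collator::sort() reorders the array in place and reindexes it.
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator->sort($names);

// Collator::asort() does the same but keeps the original keys,
// which helps when the keys are record IDs (hypothetical IDs here).
$byId = [17 => 'Zoë', 4 => 'Émile', 23 => 'Ana'];
$collator->asort($byId);

print_r($names);
print_r($byId);
```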


Why Locale is Crucial

Every language has its own rules for alphabetical order. For example:

  • In French, é is considered close to e.
  • In Swedish, Å, Ä, and Ö are actually separate letters at the end of the alphabet.
  • In German, dictionary order treats ä like a, while phone-book order treats it as ae.
  • In Spanish, ch and ll used to be considered distinct letters (until officially changed in 1994).

If you’re building software that deals with international data (names, cities, dictionaries, product lists), you cannot rely on default sorting. You must use a Collator with the correct locale.
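
The effect of the locale is easy to observe by sorting one list under two locales. The sketch below assumes the intl extension is loaded; sortedFor() is a hypothetical helper name and the word list is invented for the demonstration.

```php
<?php
$words = ['Zebra', 'Äpple', 'Apple'];

// Hypothetical helper: returns a copy of $words sorted under $locale.
function sortedFor(string $locale, array $words): array
{
    $collator = new Collator($locale);
    $collator->sort($words); // $words is a local copy; the caller's array is untouched
    return $words;
}

echo implode(', ', sortedFor('de_DE', $words)), "\n"; // Apple, Äpple, Zebra
echo implode(', ', sortedFor('sv_SE', $words)), "\n"; // Apple, Zebra, Äpple
```

German keeps Ä next to A, while Swedish places it after Z, so the same data yields two different, equally correct orders.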


Practical Considerations

  1. Locale Choice
    Choose the locale carefully:
    • fr_FR for French rules
    • en_US for U.S. English
    • sv_SE for Swedish
    • and so on for other languages
    If your app is multilingual, you may need to pick the locale dynamically from each user’s settings.
  2. Performance Impact
    Collation is slower than binary comparison, especially for large datasets. If you’re sorting millions of rows, consider database-level collation (e.g., MySQL utf8mb4_unicode_ci) instead of PHP-only sorting.
  3. Consistency Between Layers
    If your database uses one collation (e.g., utf8mb4_general_ci) and PHP uses another (fr_FR Collator), lists sorted by the database and lists sorted in PHP may come out in different orders. Keep collation consistent across both layers.
  4. Fallback Handling
    If Intl is not available, provide a safe fallback (like plain sort()) but warn developers that results may be incorrect for internationalized data.
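
On the performance point, one common way to reduce the cost of collation inside PHP is to compute a binary sort key once per string with Collator::getSortKey() and then order those keys with a plain byte comparison. The sketch below assumes the intl extension is loaded and ignores the case where getSortKey() fails; whether it is actually faster for a given dataset is something to measure rather than assume.

```php
<?php
$names = ['Ása', 'Zoë', 'Émile', 'Ana'];
$collator = new Collator('fr_FR');

// One getSortKey() call per element instead of one compare() per comparison.
$keys = [];
foreach ($names as $i => $name) {
    $keys[$i] = $collator->getSortKey($name);
}

// Sort keys are binary strings built to be compared byte by byte,
// so SORT_STRING is the right flag here. asort() keeps the indexes.
asort($keys, SORT_STRING);

// Rebuild the list of names in key order.
$sorted = [];
foreach (array_keys($keys) as $i) {
    $sorted[] = $names[$i];
}

print_r($sorted);
```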

Extension Requirement
The Collator class is part of the Intl extension. Ensure it’s installed and enabled. On Debian or Ubuntu, for example:

sudo apt-get install php-intl

Without it, any call to new Collator() fails with a fatal “class not found” error.
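
A guard around the Collator call keeps the code running on hosts without intl. The following is a minimal sketch of the fallback idea from the list above; sortNames() is a hypothetical helper name, and the en_US default locale is an arbitrary choice.

```php
<?php
// Hypothetical helper: locale-aware sort when intl is present,
// byte-wise sort() plus a warning when it is not.
function sortNames(array $names, string $locale = 'en_US'): array
{
    if (extension_loaded('intl')) {
        $collator = new Collator($locale);
        $collator->sort($names);
        return $names;
    }

    // Fallback: the order may be wrong for accented or non-Latin text.
    trigger_error('intl extension missing, falling back to sort()', E_USER_WARNING);
    sort($names, SORT_STRING);
    return $names;
}

print_r(sortNames(['Ása', 'Zoë', 'Émile', 'Ana'], 'fr_FR'));
```

The warning makes the degraded behavior visible in logs instead of silently returning a byte-ordered list.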


Other Use Cases for Collator

Besides sorting, Collator is useful for the following, usually after lowering its strength with setStrength():

  • Case-insensitive comparison (e.g., “Ana” vs “ana”) without writing hacks.
  • Searching for names or words where diacritics should not block matches.
  • Human-friendly comparisons in multilingual applications.

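Example of an accent-insensitive comparison:

$collator = new Collator('fr_FR');
$collator->setStrength(Collator::PRIMARY); // compare base letters only
echo $collator->compare('Émile', 'Emile'); // 0 (at the default strength this would be 1)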
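
How strict a comparison is depends on the collator’s strength setting. The sketch below, assuming the intl extension is loaded, walks through the three most common levels; the $people array and the search term are made up for illustration.

```php
<?php
$collator = new Collator('fr_FR');

// Default strength is TERTIARY: accents and case both count.
var_dump($collator->compare('Émile', 'Emile') === 0); // bool(false)

// SECONDARY ignores case but still distinguishes accents.
$collator->setStrength(Collator::SECONDARY);
var_dump($collator->compare('Ana', 'ana') === 0);     // bool(true)
var_dump($collator->compare('Émile', 'Emile') === 0); // bool(false)

// PRIMARY compares base letters only, so accents stop blocking matches.
$collator->setStrength(Collator::PRIMARY);
var_dump($collator->compare('Émile', 'emile') === 0); // bool(true)

// Accent- and case-insensitive lookup in a list (hypothetical data).
$people  = ['Émile', 'Zoë', 'Ana'];
$matches = array_filter($people, fn($p) => $collator->compare($p, 'zoe') === 0);
print_r($matches); // array_filter keeps the original key of 'Zoë'
```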

Finally

When building global applications, text handling is never trivial. Sorting strings correctly is not just a nice-to-have; it’s about respecting cultural rules and giving users intuitive results.

The Intl Collator is your best friend for this. It makes sorting human-aware instead of machine-only.

So, next time you see an array of names like ['Ása', 'Zoë', 'Émile', 'Ana'], remember:

  • Default sorting is wrong for humans.
  • Collator sorting is right for humans.
  • And the locale you choose makes all the difference.

👉 Pro tip: If you’re writing multi-tenant SaaS or global products, always store a user’s locale and apply it everywhere you handle text—sorting, searching, formatting, even pluralization. That’s how you build truly international-ready software.
