
As mentioned in the initial post, only certain characters may be used in URL fragments. In the first version of the extension, the field value was therefore checked with PHP when saving the record, and replacements were made where necessary. Now JavaScript performs an equivalent check right away, without waiting for the record to be saved!

The individual steps are basically the same as in PHP:

  1. Convert to lowercase letters
  2. Remove HTML tags
  3. Replace spaces and tabs with a hyphen
  4. Replace diacritical characters and currencies (e.g. àä€)
  5. Keep only valid characters (remove all others)
  6. Convert multiple hyphen characters to a single one
  7. Convert to lowercase letters again
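
The steps above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the function name sanitizeFragment, the tiny substitution table and the naive tag pattern are assumptions, and step 5 assumes an engine that supports Unicode property escapes.

```javascript
// Hypothetical, heavily shortened substitution table (assumption for illustration).
const SUBSTITUTIONS = new Map([
  ['à', 'a'], ['ä', 'a'], ['é', 'e'], ['ö', 'o'], ['ü', 'u'], ['ß', 'ss'], ['€', 'eur'],
]);

// sanitizeFragment is a hypothetical name, not the extension's real function.
function sanitizeFragment(value) {
  return value
    .toLowerCase()                          // 1. convert to lowercase
    .replace(/<[^>]*>/g, '')                // 2. remove HTML tags (naive pattern)
    .replace(/[ \t]/g, '-')                 // 3. each space or tab becomes a hyphen
    .replace(/[^\x00-\x7F]/gu,              // 4. substitute known non-ASCII characters,
      (char) => SUBSTITUTIONS.get(char) ?? char) //    leave unknown ones untouched
    .replace(/[^\p{L}\p{M}0-9\-_.]/gu, '')  // 5. drop everything that is not valid
    .replace(/-{2,}/g, '-')                 // 6. collapse runs of hyphens
    .toLowerCase();                         // 7. convert to lowercase again
}

console.log(sanitizeFragment('<b>Über  Cafés</b> 5€')); // uber-cafes-5eur
```

Replacing every single space with a hyphen in step 3 keeps that step trivial; the doubled hyphens it can produce are cleaned up in step 6.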

Replace diacritical characters

The class CharsetConverter is used in PHP with a character table that can be found under typo3temp/var/charset/csascii_utf-8.tbl. This table would be too extensive for JavaScript, though.

Instead, I found a JavaScript map with diacritical characters and their substitutions in the DataTables project. Fortunately, its license is compatible with GPL v2. Since the value has already been converted to lowercase at this point, I removed the capital letters from the map and added some common currencies.
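
Trimming such a map could look like the sketch below. The shape of the source map, its entries, the currency substitutions and the helper name buildLowercaseMap are all assumptions for illustration; the actual DataTables map may be structured differently.

```javascript
// Hypothetical excerpt of a source map containing upper- and lowercase entries.
const sourceMap = { 'À': 'A', 'à': 'a', 'Ä': 'A', 'ä': 'a', 'Ç': 'C', 'ç': 'c' };

// Assumed currency substitutions, not taken from the extension.
const currencies = { '€': 'eur', '£': 'gbp', '¥': 'jpy' };

// Keeps only keys that are already lowercase, then appends the extra entries.
function buildLowercaseMap(source, extras) {
  const lowercaseOnly = Object.fromEntries(
    Object.entries(source).filter(([char]) => char === char.toLowerCase())
  );
  return { ...lowercaseOnly, ...extras };
}

const map = buildLowercaseMap(sourceMap, currencies);
console.log(Object.keys(map).join('')); // àäç€£¥
```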

Keep only valid characters

If Firefox were capable of handling Unicode property escapes, it would have been short and easy:

value = value.replace(/[^\p{L}\p{M}0-9\-_.]/ug, '');

But Firefox isn't (yet). So I had to find another solution. With the help of a transpiler, I converted the regular expression so that every browser can handle it. Unfortunately, the expression grew considerably as a result.
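
For comparison, a different option than transpiling would be feature detection with a fallback. The sketch below is not what the extension does: the names FALLBACK_PATTERN and buildInvalidCharPattern are hypothetical, and the fallback ranges are deliberately incomplete (basic Latin, Latin-1 and Latin Extended-A letters plus combining diacritical marks), whereas a transpiled expression covers far more ranges.

```javascript
// Deliberately incomplete fallback (assumption for illustration): it only knows
// Latin letters up to U+017F and the combining marks U+0300 to U+036F.
const FALLBACK_PATTERN =
  /[^a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F\u0300-\u036F0-9\-_.]/g;

// Returns a pattern matching every character that is NOT allowed.
function buildInvalidCharPattern() {
  try {
    // Throws a SyntaxError in engines without Unicode property escapes.
    return new RegExp('[^\\p{L}\\p{M}0-9\\-_.]', 'gu');
  } catch (error) {
    return FALLBACK_PATTERN;
  }
}

console.log('naïve café #1'.replace(buildInvalidCharPattern(), '')); // naïvecafé1
```

Building the expression from a string matters here: a regex literal with an unsupported escape would fail when the script is parsed, before the try block could catch anything.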

The project was a nice opportunity to take a closer look at regular expressions. Killed two birds with one stone!
