How to check only alphanumeric and other language alphabet by using preg_replace?

Question

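I need to create an SEO-friendly string that keeps only alphanumeric characters and the characters of my native language, which is Sinhala.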

The string I expect should look like this:

$myString = "this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන";

I am using a function to create such strings. The function is as follows:

function seoUrl($string) {
    //Lower case everything
    $string = strtolower($string);
    //Make alphanumeric (removes all other characters)
    $string = preg_replace("/[^a-z0-9_\s-]/", "", $string);
    //Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}

This function only works for English characters. For the string above, the output is:

$title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
echo $title; // this-is-a-

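Can anyone tell me how to modify the function above so that it keeps all of my characters, including the ones from my native language?

Hope someone can help me. Thank you.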

Answer 1

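You are using a multibyte encoding. Without the /u modifier, preg_replace treats the subject as individual bytes, so it does not handle multibyte text correctly. You could use the mb_ereg_replace function instead.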
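As a rough illustration of the byte-versus-character issue this answer points at: by default the preg functions treat the subject as a sequence of bytes, and the /u modifier makes them treat it as UTF-8. The sketch below counts matches of a single dot both ways. The variable names are made up for this example, and it assumes the source file is saved as UTF-8.

```php
<?php
// One Sinhala word from the question, encoded as UTF-8.
$word = "දුක්";

// Without /u the dot matches one byte at a time.
$byteMatches = preg_match_all('/./s', $word);

// With /u the dot matches one whole code point at a time.
$charMatches = preg_match_all('/./su', $word);

echo "bytes: $byteMatches, code points: $charMatches\n";
```

The byte count is larger than the code point count, which is why a byte-oriented class such as [^a-z0-9_\s-] removes every byte of a Sinhala character.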

Answer 2

For Unicode, use the /u modifier, with \pL for letters and \pN for numbers.
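A minimal sketch of how that suggestion might look in practice, not a definitive implementation. The function name seoUrlUnicode is hypothetical, and the sketch assumes UTF-8 input and the mbstring extension (for mb_strtolower). It also adds \p{M}, which the answer does not mention: Sinhala vowel signs and the virama are combining marks rather than letters, so \pL alone would strip them. Unlike the original function, this variant collapses underscores together with spaces and dashes and trims dashes at the ends.

```php
<?php
// Hypothetical variant of seoUrl() built on preg_replace with the /u modifier.
function seoUrlUnicode(string $string): string {
    // Lower-case via mbstring so the text is handled as UTF-8 characters.
    $string = mb_strtolower($string, 'UTF-8');
    // Keep letters (\p{L}), combining marks (\p{M}), numbers (\p{N}),
    // underscores, whitespace and dashes; remove everything else.
    // preg_replace returns null on invalid UTF-8, hence the fallback.
    $string = preg_replace('/[^\p{L}\p{M}\p{N}_\s-]/u', '', $string) ?? '';
    // Collapse runs of whitespace, underscores and dashes into one dash.
    $string = preg_replace('/[\s_-]+/u', '-', $string) ?? '';
    // Drop leading and trailing dashes.
    return trim($string, '-');
}

echo seoUrlUnicode("This is a දහසක් බාධක!"), "\n";
// prints: this-is-a-දහසක්-බාධක
```

Input that is already in slug form, such as the string from the question, passes through unchanged.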

Edit: because some of the characters are multibyte, mb_ereg_replace is a good choice:

function seoUrl($string) {
    //Lower case everything
    $string = strtolower($string);
    //Make alphanumeric (removes all other characters)
    $string = mb_ereg_replace("[^\x0D-\x0E\w\s-]", "", $string);
    //Clean up multiple dashes or whitespaces
    $string = preg_replace("/[\s-]+/", " ", $string);
    //Convert whitespaces and underscore to dash
    $string = preg_replace("/[\s_]/", "-", $string);
    return $string;
}
$title = seoUrl("this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන");
echo $title;

Output:

this-is-a-දහසක්-බාධක-දුක්-කම්කටොලු-මැදින්-ලෝකය-දිනන්නට-වෙර-දරන


php

preg-replace