How can I delete words that contain certain letters?
I need to delete words that contain the letters "ц", "щ", "ы", "ь". I wrote this function for that, but it runs slowly.
public function CheckToInsert($text)
{
    $xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
    foreach ($xarfho as $xarf)
    {
        if (stripos($text, $xarf) !== false)
        {
            return true;
        }
    }
    return false;
}

public function UnsetUncorrectWords($words)
{
    foreach ($words as $key => $value)
    {
        if ($this->CheckToInsert($value) == false) unset($words[$key]);
        if (strlen($value) < 3) unset($words[$key]);
    }
    return $words;
}
I suggest rewriting your function like this (or not using a function at all):
public function UnsetUncorrectWords($words)
{
    return preg_grep('~\A[^қӣғҷҳӯҚӢҒҶҲӮ]{3,}\z~u', $words);
}
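preg_grep returns the array items that match the pattern; items that do not match are filtered out.
The pattern describes a word of at least 3 characters, none of which is қ, ӣ, ғ, ҷ, ҳ, ӯ, Қ, Ӣ, Ғ, Ҷ, Ҳ or Ӯ. (ц, щ, ы and ь are not in this character class; add them if those letters must be excluded as well.)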
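To see what the anchored pattern keeps, here is a small sketch. The sample words are made up for illustration and do not come from the question, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample input, chosen only for illustration.
$words = ['аа', 'ааа', 'қалам', 'китоб'];

// \A and \z anchor the match to the whole string, so every character
// must be outside the listed letters and there must be at least 3 of them.
$kept = preg_grep('~\A[^қӣғҷҳӯҚӢҒҶҲӮ]{3,}\z~u', $words);

// 'аа' is too short and 'қалам' contains қ, so only keys 1 and 3 remain.
print_r($kept);
```

preg_grep preserves the original keys, so wrap the result in array_values() if you need a list indexed from 0.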
Note that you can't use strlen on strings with multibyte characters, because it returns the number of bytes, not the number of characters.
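A quick way to check this yourself, assuming the source file is saved as UTF-8 and the mbstring extension is available (mb_strlen is not part of the answer above, it is only used here for comparison):

```php
<?php
// Two Cyrillic letters; each one takes two bytes in UTF-8.
$word = 'ыь';

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters (needs mbstring)

var_dump($bytes); // int(4)
var_dump($chars); // int(2)
```

So a two-letter Cyrillic word slips past a strlen($value) < 3 check, because strlen reports 4 for it.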
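You can use preg_grep to get the array items that contain a regex match or, with the PREG_GREP_INVERT flag, the items that do not contain a match.
So, to get all items that contain none of your chosen letters, use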
$xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
$wrds = array('Еыфвҷ','цӣвееп','аааа');
$pat = '/[' . implode("", $xarfho) . ']/u';
$res = preg_grep($pat, $wrds, PREG_GREP_INVERT);
// => Array ( [2] => аааа )
To get the items that contain any of the letters "ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ", use
$xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
$wrds = array('Еыфвҷ','цӣвееп','аааа');
$pat = '/[' . implode("", $xarfho) . ']/u';
$res = preg_grep($pat, $wrds);
// => Array ( [0] => Еыфвҷ [1] => цӣвееп )
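If you also want the minimum-length rule from the question, the two checks can be chained. This is only a sketch: filterWords and its parameters are hypothetical names, not part of either answer, and the extra word 'аа' is added to the sample just to show the length check.

```php
<?php
/**
 * Hypothetical helper: drops words that contain any forbidden letter
 * or that have fewer than $minLength characters.
 */
function filterWords(array $words, array $forbidden, int $minLength = 3): array
{
    // preg_quote() is a precaution in case a listed character is special in a regex.
    $class = implode('', array_map(function ($c) {
        return preg_quote($c, '/');
    }, $forbidden));

    // Step 1: drop every word that contains at least one forbidden letter.
    $clean = preg_grep('/[' . $class . ']/u', $words, PREG_GREP_INVERT);

    // Step 2: keep only words with at least $minLength characters (not bytes).
    return preg_grep('/\A.{' . $minLength . ',}\z/us', $clean);
}

$xarfho = array("ц", "щ", "ы", "ь","қ","ӣ","ғ","ҷ","ҳ","ӯ","Қ","Ӣ","Ғ","Ҷ","Ҳ","Ӯ");
$wrds = array('Еыфвҷ','цӣвееп','аааа','аа');
$res = filterWords($wrds, $xarfho);
// => Array ( [2] => аааа )
```

Counting with .{3,} under the /u modifier counts characters, so it avoids the byte-counting problem of strlen.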
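See another PHP demo.
The regex looks like /[цщы]/u, where [...] is a character class that matches any one of the characters (or character ranges) defined in it. The /u modifier is required because your pattern contains non-ASCII characters: the UNICODE modifier makes the regex engine treat both the pattern and the input strings as UTF-8 instead of as raw bytes.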
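A minimal sketch of what goes wrong without /u, assuming the file is saved as UTF-8. The letter 'с' is picked on purpose because it shares its first byte with 'ц':

```php
<?php
// 'с' (U+0441) is encoded as bytes D1 81 and 'ц' (U+0446) as D1 86.
$word = 'с';

// Without /u the class [ц] is a set of two bytes, D1 and 86,
// so the first byte of 'с' matches it by accident.
$withoutU = preg_match('/[ц]/', $word);

// With /u the class holds the single character ц, which 'с' is not.
$withU = preg_match('/[ц]/u', $word);

var_dump($withoutU); // int(1), a false positive
var_dump($withU);    // int(0)
```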