Unexpected characters counted when trying to find word occurrences in phpMyAdmin

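I have a table containing sentences of varying length, from one word up to 40 words. I want to count each distinct word and the number of times it appears in the table. However, whenever a sentence contains only one word, the output includes an unexpected entry. Any idea why?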

Here is a demo of my database:

DB demo

Here is the code:

create table messages(sent varchar(200), verif int);
insert into messages values
    ('HI', null),
    ('HI alex how are you', null),
    ('bye', null);

select * from messages;

UPDATE messages set sent = TRIM(sent);
UPDATE messages set sent = REGEXP_REPLACE(sent,' +',' ');

with recursive cte as (
    select 
        substring(concat(sent, ' '), 1, locate(' ', sent)) word,
        substring(concat(sent, ' '), locate(' ', sent) + 1) sent
    from messages
    union all
    select 
        substring(sent, 1, locate(' ', sent)) word,
        substring(sent, locate(' ', sent) + 1) sent
    from cte
    where locate(' ', sent) > 0
)
select row_number() over(order by count(*) desc, word) wid, word, count(*) freq
from cte 
group by word
order by wid

Output of the code:


wid word    freq
1       2
2   HI  2
3   alex    1
4   are     1
5   bye     1
6   how     1
7   you     1


Expected output:
wid word    freq
1   HI  2
2   alex    1
3   are     1
4   bye     1
5   how     1
6   you     1


Your problem is in these lines:

substring(concat(sent, ' '), 1, locate(' ', sent)) word,
substring(concat(sent, ' '), locate(' ', sent) + 1) sent

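When sent does not contain a space, locate(' ', sent) returns 0, so substring(concat(sent, ' '), 1, 0) returns an empty string. That empty string is the blank word you see counted in your output. To fix this, use concat(sent, ' ') in place of sent inside locate: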

substring(concat(sent, ' '), 1, locate(' ', concat(sent, ' '))) word,
substring(concat(sent, ' '), locate(' ', concat(sent, ' ')) + 1) sent

For your sample data, this gives:

wid     word    freq
1       HI      2
2       alex    1
3       are     1
4       bye     1
5       how     1
6       you     1

Demo on dbfiddle
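
For reference, here is a sketch of how the two corrected lines fit into the full query from the question. It assumes the messages table from the question and a server that supports recursive CTEs and window functions (for example MySQL 8.0 or a recent MariaDB). The first statement is only a diagnostic added for illustration, to show where the blank word came from.

```sql
-- Diagnostic: a single-word string has no space, so LOCATE returns 0
-- and SUBSTRING(..., 1, 0) produces an empty string.
select locate(' ', 'HI') as pos,
       substring(concat('HI', ' '), 1, 0) as word;

-- Full query with the fix applied to the anchor part of the CTE:
-- LOCATE now searches the space-terminated string, so it always finds a space.
with recursive cte as (
    select
        substring(concat(sent, ' '), 1, locate(' ', concat(sent, ' '))) word,
        substring(concat(sent, ' '), locate(' ', concat(sent, ' ')) + 1) sent
    from messages
    union all
    -- Recursive part is unchanged: peel off the next word until no space is left.
    select
        substring(sent, 1, locate(' ', sent)) word,
        substring(sent, locate(' ', sent) + 1) sent
    from cte
    where locate(' ', sent) > 0
)
select row_number() over(order by count(*) desc, word) wid, word, count(*) freq
from cte
group by word
order by wid;
```

Only the anchor select needs the change. In the recursive part, the where clause already guarantees that sent contains a space before locate is used.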