反向因子算法

Question

我是新手。我已经学习算法两个月了。我一般都做得很好，但我不擅长理解搜索算法，也不擅长实现它们。我特别喜欢这种模式搜索算法，反向因子。我已经研究了一个星期了，但我仍然没有完全理解它，更不用说实施了。我没有可以问的人，但我不想跳过任何算法。到目前为止，我已经找到了这个算法。但我不太明白。我也不是母语人士。你能帮帮我吗？

目的是"search a pattern p in string t"。

Algorithm RF /* reverse factor string matching */
    /* denote t[i + j.. i + m] by x;
         it is the last-scanned part of the text */

    i:= 0; 
    while i _< n - m do
    { 
        j:= m; 
        while j > 1 and x ϵ FACT(p) 
            do j:=j- 1;
        /* in fact, we check the equivalent condition x^R ϵ FACT(p^R) */
        if x = p then 
            report a match at position i;
        shift := RF shift[x];
        i := i + shift;
    }
end.

Fact(p) 是 p 的所有因子（子串）的集合。

提前谢谢你。

Answer 1

我试试看：

i:= 0; 
while i _< n - m do //start at character 0
{ 
    j:= m; //start at character i + m (the potentially last character)
    whilej > 1 and x ϵ FACT(p)
        do j:=j- 1; //step back as long as t[i+j,i+m] is a substring of the pattern p
    /* in fact, we check the equivalent condition x^R ϵ FACT(p^R) */
    if x = p then // x=[i+0, i+m] == p
        report a match at position i; 
    shift := RF shift[x]; // look up the number of chars to advance
    i := i + shift; // advance
}

数组shift的构造相当困难。我不记得这是怎么做到的。但是我可以说在 shift[x].

会找到什么

shift[x] = the number of save character shifts such that the next search does not miss a match.

示例：有一个字符串 abcabcdab 和一个模式 bcd (| is i+m, * is i+j):

abc*|abcdab // start with i=0,j=3
ab*c|abcdab // c is a factor => continue
a*bc|abcdab // bc is a factor => continue
*abc|abcdab // abc is not a factor => shift = shift[bc] = 1
abca*|bcdab 
abc*a|bcdab // a is not a factor => shift = shift[] = 3
abcabcd*|ab 
abcabc*d|ab // d is a factor => continue
abcab*cd|ab // cd is a factor => continue
abca*bcd|ab // bcd is a factor and j = 0 => report match

请在此处查看 example for debugging in Java。它不像你的伪代码那么简单，但你可以调试它以更好地理解它。

反向因子算法

Reverse Factor Algorithm

algorithm

search

string-search