检测按顺序发生的字符串切片的重复元素

Question

我需要检测并列出切片的字符串字符，这些字符的顺序重复次数多于或等于 N 次。我已经设法用 Rust 编写了非高阶函数解决方案（如下），但我想知道这是否可以简化为链接 iter 方法。

想法：

let v = "1122253225";
let n = 2;

输出：

There are 2 repetition of '1'
There are 3 repetition of '2'
There are 2 repetition of '2'

发生重复的索引并不重要。重复必须按顺序进行（即 3 次“2”的重复由其他值分隔，另外 2 次“2”的重复算作单独的输出行）。

我的非迭代器解决方案：

    let mut cur_ch = '[=12=]';
    let mut repeat = 0;

    for ch in v.chars() {
        if ch == cur_ch {
            repeat = repeat + 1;
        }
        else {
            if repeat >= n {
                printf!("There are {} repetition of '{}'", repeat, cur_ch);
            }
            cur_ch = ch;
            repeat = 1;
        }
    }

    if repeat >= n {
        printf!("There are {} repetition of '{}'", repeat, cur_ch);
    }

它可以工作，但是有没有更好的方法使用链接 iter 方法来做到这一点？

Answer 1

这是使用 filter_map() 的一种尝试：

fn foo(v: &str, n: usize) -> impl Iterator<Item = (usize, char)> + '_ {
    let mut cur_ch = '[=10=]';
    let mut repeat = 0;
    v.chars().chain(std::iter::once('[=10=]')).filter_map(move |ch| {
        if ch == cur_ch {
            repeat += 1;
            return None;
        }
        
        let val = if repeat >= n {
            Some((repeat, cur_ch))
        } else {
            None
        };
        cur_ch = ch;
        repeat = 1;
        val
    })
}

fn main() {
    for (repeat, ch) in foo("1122253225", 2) {
        println!("There are {} repetition of '{}'", repeat, ch);
    }
}

然后你可以将其概括为如下内容：

fn foo<'i, I, T>(v: I, n: usize) -> impl Iterator<Item = (usize, T)> + 'i
where
    I: Iterator<Item = T> + 'i,
    T: Clone + Default + PartialEq + 'i,
{
    let mut cur = T::default();
    let mut repeat = 0;
    v.chain(std::iter::once(T::default()))
        .filter_map(move |i| {
            if i == cur {
                repeat += 1;
                return None;
            }

            let val = if repeat >= n {
                Some((repeat, cur.clone()))
            } else {
                None
            };
            cur = i;
            repeat = 1;
            val
        })
}

这将是 higher-order，但不确定它是否真的比使用 for 循环简单得多！

Answer 2

这是一个使用 scan 和 filter_map 的解决方案：

fn main() {
    let s = "112225322555";
    let n = 2;

    let i = s
        .chars()
        .map(|v| Some(v))
        .chain(std::iter::once(None))
        .scan((0, None), |(count, ch), v| match ch {
            Some(c) if *c == v => {
                *count += 1;
                Some((None, *count))
            }
            _ => Some((ch.replace(v), std::mem::replace(count, 1))),
        })
        .filter_map(|(ch, count)| match ch {
            Some(Some(ch)) if count >= n => Some((ch, count)),
            _ => None,
        });

    for (ch, num) in i {
        println!("There are {} repititions of {}", num, ch);
    }
}

Playground Link

第一步是用scan统计相邻字符的个数。 scan 的第一个参数是一个状态变量，它被传递给作为第二个参数传递的闭包的每次调用。在这种情况下，状态变量是一个包含当前字符及其出现次数的元组。

注：

我们需要将迭代扩展到我们正在分析的字符串的末尾之外（否则我们会错过字符串末尾包含满足运行个字符的情况标准）。我们通过将迭代映射到 Option<char> 然后链接到单个 None 来做到这一点。这比 special-casing 个字符如 [=17=] 更好，这样我们甚至可以数 [=17=] 个字符。
出于同样的原因，我们使用 Option<char> 作为状态元组中的当前字符。
scan 的 return 值是 (Option<Option<char>>, i32) 上的迭代器。对于迭代器中的每个重复字符，元组中的第一个值将是 None，而在字符更改的每个边界处它将是 Some(Some(char))
我们对return当前字符和计数使用replace，同时将状态元组设置为新值

最后一步是使用filter_map来：

删除 (None, i32) 变体，表示传入字符没有变化
过滤掉计数未达到限制的情况n。

检测按顺序发生的字符串切片的重复元素

Detect duplicated elements of a string slice happening in order

iterator

method-chaining

higher-order-functions

rust