Splitting a UTF-8 string into chunks

Question

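I want to split a UTF-8 string into chunks of equal size, measured in characters. I came up with a solution that does exactly that. Now I want to simplify it and, if possible, remove the first collect call. Is there a way to do that?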

fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .collect::<Vec<char>>()
        .chunks(3)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}

Answer 1

You can use chunks() from Itertools.

use itertools::Itertools; // 0.10.1

fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .chunks(3)
        .into_iter()
        .map(|chunk| chunk.collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}

Answer 2

This does not need Itertools as a dependency, and the iterator itself does not allocate, because it yields slices of the original string:

fn chunks(s: &str, length: usize) -> impl Iterator<Item = &str> {
    assert!(length > 0);
    // Byte offsets at which each character starts.
    let mut indices = s.char_indices().map(|(idx, _)| idx).peekable();

    std::iter::from_fn(move || {
        // Start of the next chunk; stop once the string is exhausted.
        let start_idx = indices.next()?;
        // Skip the remaining characters of this chunk.
        for _ in 0..length - 1 {
            indices.next();
        }
        // The chunk ends where the next character starts, or at the end of the string.
        let end_idx = indices.peek().copied().unwrap_or(s.len());
        Some(&s[start_idx..end_idx])
    })
}


fn main() {
    let strings = chunks("ĄĆĘŁŃÓŚĆŹŻ", 3).collect::<Vec<&str>>();
    println!("{:?}", strings);
}
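
A slightly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: step_by can pick out the byte offset of every length-th character, so the inner loop that skips characters is no longer needed. The name chunks_step_by is made up for this illustration, and only the standard library is used.

```rust
// Hypothetical helper for illustration: yields slices of `s` that each hold
// up to `length` chars, without copying any string data.
fn chunks_step_by(s: &str, length: usize) -> impl Iterator<Item = &str> {
    assert!(length > 0);
    // Byte offsets where each chunk starts (every `length`-th char).
    let mut starts = s
        .char_indices()
        .step_by(length)
        .map(|(idx, _)| idx)
        .peekable();

    std::iter::from_fn(move || {
        let start = starts.next()?;
        // A chunk ends where the next one starts, or at the end of the string.
        let end = starts.peek().copied().unwrap_or(s.len());
        Some(&s[start..end])
    })
}

fn main() {
    let strings: Vec<&str> = chunks_step_by("ĄĆĘŁŃÓŚĆŹŻ", 3).collect();
    assert_eq!(strings, ["ĄĆĘ", "ŁŃÓ", "ŚĆŹ", "Ż"]);
    println!("{:?}", strings);
}
```

As with the version above, the last chunk is shorter when the number of characters is not a multiple of length.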

Answer 3

With graphemes in mind, I ended up with the following solution.

I used the unicode-segmentation crate.

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let length = 3; // chunk size in graphemes; 3 matches the question
    let strings = "ĄĆĘŁŃÓŚĆŹŻèèèèè"
        .graphemes(true)
        .collect::<Vec<&str>>()
        .chunks(length)
        .map(|chunk| chunk.concat())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}

I hope it can be simplified further.
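
To make the grapheme concern concrete, here is a small sketch that uses only the standard library. It applies the char-based chunking from the question to a string in which "é" is written as 'e' followed by U+0301 (combining acute accent). The input string and the helper name char_chunks are chosen purely for illustration. Chunking by char separates the accent from its base letter, which is what splitting on graphemes avoids.

```rust
// Illustrative helper: the char-based approach from the question.
fn char_chunks(s: &str, length: usize) -> Vec<String> {
    s.chars()
        .collect::<Vec<char>>()
        .chunks(length)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect()
}

fn main() {
    // "é" spelled as 'e' + U+0301 COMBINING ACUTE ACCENT, followed by 'x'.
    let s = "e\u{301}x";
    // Two visible letters, but three chars.
    assert_eq!(s.chars().count(), 3);
    // With a chunk size of 1 the accent ends up in a chunk of its own.
    assert_eq!(char_chunks(s, 1), ["e", "\u{301}", "x"]);
}
```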
