boost::split pushes an empty string to the vector even with token_compress_on

Question

When the input string is empty, boost::split returns a vector containing a single empty string.

Is it possible to make boost::split return an empty vector instead?

MCVE:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    std::vector<std::string> result;
    boost::split(result, "", boost::is_any_of(","), boost::algorithm::token_compress_on);
    std::cout << result.size();
}

Output: 1

Expected output: 0

Answer 1

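Compression collapses adjacent delimiters into one; it does not suppress empty tokens.

The behavior is consistent, as you can see if you consider the following test cases: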

Live On Coliru

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    for (std::string const& test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::vector<std::string> result;
        boost::split(result, test, boost::is_any_of(","), boost::algorithm::token_compress_on);
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";
        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}

Prints:

=== TEST:          === '' 
=== TEST: token    === 'token' 
=== TEST: ,        === '' '' 
=== TEST: token,   === 'token' '' 
=== TEST: ,token   === '' 'token' 
=== TEST: ,,       === '' '' 
=== TEST: ,token,  === '' 'token' '' 
=== TEST: ,,token  === '' 'token' 
=== TEST: token,,  === 'token' ''

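With compression on, empty tokens can only come from leading or trailing delimiters, or from an entirely empty input. So you can fix it by trimming delimiters from the front and back and only splitting if the remaining input is non-empty: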

Live On Coliru

#include <boost/algorithm/string.hpp>
#include <boost/utility/string_view.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    auto const delim = boost::is_any_of(",");

    for (std::string test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";

        std::vector<std::string> result;

        boost::trim_if(test, delim);
        if (!test.empty())
            boost::split(result, test, delim, boost::algorithm::token_compress_on);

        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}

Prints:

=== TEST:          === 
=== TEST: token    === 'token' 
=== TEST: ,        === 
=== TEST: token,   === 'token' 
=== TEST: ,token   === 'token' 
=== TEST: ,,       === 
=== TEST: ,token,  === 'token' 
=== TEST: ,,token  === 'token' 
=== TEST: token,,  === 'token'

BONUS: Boost Spirit

In my opinion, using Spirit X3 is more flexible and potentially more efficient:

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    static auto const delim = boost::spirit::x3::char_(",");

    for (std::string test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";

        std::vector<std::string> result;
        parse(test.begin(), test.end(), -(+~delim) % delim, result);

        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}


