How to read file or stream until string found
I am writing a dictionary program whose input comes from a file and is parsed like this:
std::string savedDictionary(std::istreambuf_iterator<char>(std::ifstream(DICTIONARY_SAVE_FILE)), {});
// entire file loaded into savedDictionary
for (size_t end = 0; ;)
{
    size_t term = savedDictionary.find("|TERM|", end);
    size_t definition = savedDictionary.find("|DEFINITION|", term);
    if ((end = savedDictionary.find("|END|", definition)) == std::string::npos) break;
    // store term and definition here...
}
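This throws std::bad_alloc on the machines of some of my third-world users, which do not have enough RAM to hold both the dictionary string and the dictionary as my program stores it.
If I could do this: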
std::string term;
for (std::ifstream file(DICTIONARY_SAVE_FILE); file; std::getline(file, term, "|END|"))
{
    // same as above
}
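that would be great, but std::getline does not accept a string as a delimiter.
So what is the most idiomatic way to read the file until I find "|END|", without allocating a large amount of memory up front?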
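We can get the requested functionality with a very simple proxy class. With it, the standard algorithms and stream iterators can be used as usual.
So we define a small proxy class called LineUntilEnd. It works with any input stream, such as std::ifstream or whatever you like. In particular, you can simply use the extraction operator to read a value from the input stream into the desired variable.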
// Here we will store the lines until |END|
LineUntilEnd lue;
// Simply read the line until |END|
while (testInput >> lue) {
    // use lue.data here
}
It works as expected.
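Once an entry has been read into such a string, we can parse it afterwards with simple regex operations.
I added a small example that puts the resulting values into a std::multimap to build a demo dictionary.
Please see the code below: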
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <map>
#include <sstream>

// Ultra simple proxy class to read data until a given word is found
struct LineUntilEnd
{
    // Overload the extractor operator
    friend std::istream& operator >>(std::istream& is, LineUntilEnd& lue);
    // Intermediate storage for result
    std::string data{};
};

// Read stream until the "|END|" symbol has been found
std::istream& operator >>(std::istream& is, LineUntilEnd& lue)
{
    // Clear destination string
    lue.data.clear();
    // We will count how many bytes of the search string have been matched
    size_t matchCounter{ 0U };
    // Read characters from stream
    char c{ '\0' };
    while (is.get(c))
    {
        // Add character to resulting string
        lue.data += c;
        // Check for a match. All characters must be matched
        if (c == "|END|"[matchCounter]) {
            // Check next matching character
            ++matchCounter;
            // If there is a match for all characters in the search string
            if (matchCounter >= (sizeof "|END|" - 1)) {
                // Then stop reading
                break;
            }
        }
        else {
            // Mismatch. Start over, but let a '|' count as the first character of a new match
            matchCounter = (c == '|') ? 1U : 0U;
        }
    }
    return is;
}

// Input test data
std::istringstream testInput{ "|TERM|bonjour|TERM|hola|TERM|hi|DEFINITION|hello|END||TERM|Adios|TERM|Ciao|DEFINITION|bye|END|" };
// Regex definitions. Used to build up a dictionary
std::regex reTerm(R"(\|TERM\|(\w+))");
std::regex reDefinition(R"(\|DEFINITION\|(\w+)\|END\|)");

// Test code
int main()
{
    // We will store the found values in a dictionary
    std::multimap<std::string, std::string> dictionary{};
    // Here we will store the lines until |END|
    LineUntilEnd lue;
    // Simply read the line until |END|
    while (testInput >> lue) {
        // Search for the definition string
        std::smatch sm{};
        if (std::regex_search(lue.data, sm, reDefinition)) {
            // Definition string found
            // Iterate over all terms
            std::sregex_token_iterator tokenIter(lue.data.begin(), lue.data.end(), reTerm, 1);
            while (tokenIter != std::sregex_token_iterator()) {
                // Store values in dictionary
                dictionary.insert({ sm[1],*tokenIter++ });
            }
        }
    }
    // And show some result to the user
    for (const auto& d : dictionary) {
        std::cout << d.first << " --> " << d.second << "\n";
    }
    return 0;
}
For future readers, this is what I ended up writing:
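// Requires <istream>, <optional> and <string>.
// Reads one entry up to and including "|END|"; returns std::nullopt if the stream ends first.
std::optional<std::string> ReadEntry(std::istream& stream)
{
    std::string entry;
    std::size_t matched = 0;
    for (char ch; stream.get(ch); )
    {
        entry.push_back(ch);
        if (ch != "|END|"[matched]) matched = (ch == '|') ? 1 : 0; // restart; '|' may begin a new match
        else if (++matched == sizeof("|END|") - 1) return entry;   // whole delimiter seen
    }
    return {};
}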