C++: Get the substring between custom delimiters without using regex
I have a string in a simple format:
"lorem ipsum <span id='1'>extract_me-1</span> dolor
sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum
sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem"
Now I need to extract the strings between specified custom delimiters.
For example,
Substring("<span id='1'>","</span>") = extract_me-1
Substring("<span id='2'>","</span>") = extract_me-2
Substring("lorem","<span id='1'>") = ipsum
Substring("extract_me-1","dolor") = </span>
I have already done this using regex:
std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";
std::smatch match;
std::regex rgx ("<span id='1'>(.*?)</span>");
if (std::regex_search(str, match, rgx)) {
    // First substring
    std::cout << match.str(1);
}
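Is there a way to do this without regex? I have tried using substr a few times, but to no avail. Any help is much appreciated, thanks.
Edit: the input str is not well-formed HTML, just some random tags. I only need the text from the start delimiter up to the next closest end delimiter (yes, even if there are identical spans or repeated nested tags).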
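You need to check the return value of every str.find() call, as I did for the first one, but this is the gist of it. You may prefer to search for just the tag and then for the id, but then you also need to handle a tag that has no id: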
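#include <iostream>
#include <string>

int main() {
    const std::string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";
    const std::string tag = "<span id='";
    std::string r;
    for (std::size_t pos = 0;;) {
        // Find the next opening tag; stop once there are none left.
        const std::size_t tag_pos = str.find(tag, pos);
        if (tag_pos == std::string::npos) {
            break;
        }
        // The id runs from the end of the tag prefix to the closing quote.
        // Note: the find() calls below are not checked against npos here.
        const std::size_t id_pos = tag_pos + tag.size();
        const std::size_t id_pos2 = str.find("'", id_pos);
        // The text runs from just after '>' to the next '<'.
        const std::size_t txt_pos = str.find(">", id_pos2) + 1;
        const std::size_t txt_pos2 = str.find("<", txt_pos);
        r += "txt";
        r += str.substr(id_pos, id_pos2 - id_pos);
        r += " = ";
        r += str.substr(txt_pos, txt_pos2 - txt_pos);
        r += "\n";
        // Continue searching after the text that was just extracted.
        pos = txt_pos2;
    }
    std::cout << r;
}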
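As a rough illustration of checking every find() result, here is a minimal sketch of a general helper. The name between and the use of std::optional (C++17) are assumptions made for this example, not something taken from the answer above. It returns the text between the first occurrence of an opening delimiter and the next closing delimiter after it, or an empty optional when either one is missing.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from the original answer): returns the text between
// the first occurrence of `open` and the next occurrence of `close` after it,
// or std::nullopt if either delimiter cannot be found.
std::optional<std::string> between(std::string_view text,
                                   std::string_view open,
                                   std::string_view close) {
    const auto open_pos = text.find(open);
    if (open_pos == std::string_view::npos) {
        return std::nullopt;  // opening delimiter missing
    }
    const auto start = open_pos + open.size();
    const auto close_pos = text.find(close, start);
    if (close_pos == std::string_view::npos) {
        return std::nullopt;  // closing delimiter missing
    }
    return std::string(text.substr(start, close_pos - start));
}
```

Under these assumptions, between(str, "<span id='1'>", "</span>") would hold extract_me-1 for the sample string, and a caller can tell "not found" apart from an empty match by testing has_value().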
I solved this using .find and .substr. It turned out to be easier than I thought.
#include <string>
#include <iostream>
using namespace std;

string str="lorem ipsum <span id='1'>extract_me-1</span> dolor sit amet <span id='2'>extract_me-2</span> adispicing consequit lorem ipsum sit amet <span id='3'>extract_me-3</span> adispicing dolor lorem";

string subStrng(const string& start, const string& end);

int main() {
    string txt1 = subStrng("<span id='1'>","</span>");
    string txt2 = subStrng("<span id='2'>","</span>");
    string txt3 = subStrng("<span id='3'>","</span>");
    cout<<txt1<<"\n"<<txt2<<"\n"<<txt3;
    return 0;
}

// Returns the text between the first 'start' and the next closest 'end'
// after it, or "" if either string is not found in str.
string subStrng(const string& start, const string& end){
    // find() returns string::npos (an unsigned value) on failure, so compare
    // against npos instead of storing the result in an int and testing >= 0.
    string::size_type t1 = str.find(start);
    if (t1 == string::npos) {
        return "";
    }
    // 'start' exists in str; now find the next closest 'end'.
    t1 += start.length();
    const string::size_type t2 = str.find(end, t1);
    if (t2 == string::npos) {
        return "";
    }
    // Extract the substring in between.
    return str.substr(t1, t2 - t1);
}
Cheers
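If every delimited section is wanted rather than one specific id, the same find/substr approach can be run in a loop. The sketch below assumes C++17, and all_between is a hypothetical name chosen for this example. It collects each piece of text that sits between an opening delimiter and the next closing delimiter, resuming the search after each closing delimiter.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the original thread): collects every piece of
// text found between `open` and the next `close` that follows it.
std::vector<std::string> all_between(std::string_view text,
                                     std::string_view open,
                                     std::string_view close) {
    std::vector<std::string> out;
    if (open.empty() || close.empty()) {
        return out;  // empty delimiters would never advance the search
    }
    std::size_t pos = 0;
    while (true) {
        const auto open_pos = text.find(open, pos);
        if (open_pos == std::string_view::npos) break;
        const auto start = open_pos + open.size();
        const auto close_pos = text.find(close, start);
        if (close_pos == std::string_view::npos) break;
        out.emplace_back(text.substr(start, close_pos - start));
        pos = close_pos + close.size();  // resume after the closing delimiter
    }
    return out;
}
```

Because the search always stops at the next closest closing delimiter, nested or repeated tags are not balanced; that matches the "next closest end" behaviour asked for in the question's edit, but it would not suit real HTML parsing.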