How to extract the text between <br> tags in Python
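I copied some divs from http://www.sanfoundry.com/c-programming-questions-answers-variable-names-1/ using "Inspect Element". Inside the div there are <p> elements, and inside each <p> the text is broken into lines by <br>. I am trying to extract those pieces of text one by one, so that I can store the text before and after each <br> in an array or a database.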
<div class="entry-content" style="visibility: visible; opacity: 1;">
<div style="text-align:justify">
This section on C interview <span id="IL_AD1" class="IL_AD">questions and answers</span> focuses on “Variable Names”. One shall practice these <span id="IL_AD5" class="IL_AD">interview questions</span> to improve their C programming skills needed for various interviews (campus interviews, walkin interviews, company interviews), placements, entrance exams and other competitive exams. These questions can be attempted by anyone focusing on learning C Programming language. They can be a beginner, fresher, engineering graduate or an experienced IT professional. Our C Interview questions come with detailed explanation of the <span id="IL_AD2" class="IL_AD">answers</span> which helps in better understanding of C <span id="IL_AD3" class="IL_AD">concepts</span>.<p></p>
<p>Here is a listing of C interview questions on “Variable Names” along with answers, explanations and/or solutions:
</p></div>
<p>1. C99 standard guarantees uniqueness of ____ characters for internal names.<br>
a) 31<br>
b) 63<br>
c) 12<br>
d) 14</p>
<span class="collapseomatic" id="id5489" tabindex="0" title="View Answer">View Answer</span><div id="target-id5489" class="collapseomatic_content " style="display: none;">Answer:b<br>
Explanation:ISO C99 compiler may consider only first 63 characters for internal.<br>
</div>
<p>2. C99 standard guarantess uniqueness of _____ characters for external names.<br>
a) 31<br>
b) 6<br>
c) 12<br>
d) 14</p>
<span class="collapseomatic " id="id7970" tabindex="0" title="View Answer">View Answer</span><div id="target-id7970" class="collapseomatic_content " style="display: none;">Answer:a<br>
Explanation:ISO C99 compiler may consider only first 31 characters for external<br>
variables having 31 characters due to which it may not be unique.<br>
</div>
<p>3. Which of the following is not a valid variable name declaration?<br>
a) int __a3;<br>
b) int __3a;<br>
c) int __A3;<br>
d) None of the mentioned</p>
<span class="collapseomatic " id="id5714" tabindex="0" title="View Answer">View Answer</span><div id="target-id5714" class="collapseomatic_content " style="display: none;">Answer:d<br>
Explanation:None.<br>
</div>
<p>4. Which of the following is not a valid variable name declaration?<br>
a) int _a3;<br>
b) int a_3;<br>
c) int 3_a;<br>
d) int _3a</p>
So how can I get "C99 standard guarantees uniqueness of ____ characters for internal", "31", "63", "12", "14", "C99 standard guarantess uniqueness of _____ characters for external" and "31", "6", "12", "14" separately, and so on?
Besides that, I don't need the question numbers or the option letters.
Code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(htmls, 'html.parser')
h4s = soup.find_all('p')
for h4 in h4s:
    for text in h4.find_next_siblings('br'):
        print(text.strip())
Any idea where I'm going wrong?
In this case, you can try a CSS selector:
- soup.select('div.entry-content p') selects the div whose class name is entry-content (the . prefix denotes a class) and all the p elements inside that div. I am assuming only one div has that class name.
from bs4 import BeautifulSoup as bs
html = """<div class="entry-content" style="visibility: visible; opacity: 1;">
<div style="text-align:justify">
This section on C interview <span id="IL_AD1" class="IL_AD">questions and answers</span> focuses on “Variable Names”. One shall practice these <span id="IL_AD5" class="IL_AD">interview questions</span> to improve their C programming skills needed for various interviews (campus interviews, walkin interviews, company interviews), placements, entrance exams and other competitive exams. These questions can be attempted by anyone focusing on learning C Programming language. They can be a beginner, fresher, engineering graduate or an experienced IT professional. Our C Interview questions come with detailed explanation of the <span id="IL_AD2" class="IL_AD">answers</span> which helps in better understanding of C <span id="IL_AD3" class="IL_AD">concepts</span>.<p></p>
<p>Here is a listing of C interview questions on “Variable Names” along with answers, explanations and/or solutions:
</p></div>
<p>1. C99 standard guarantees uniqueness of ____ characters for internal names.<br>
a) 31<br>
b) 63<br>
c) 12<br>
d) 14</p>
<span class="collapseomatic" id="id5489" tabindex="0" title="View Answer">View Answer</span><div id="target-id5489" class="collapseomatic_content " style="display: none;">Answer:b<br>
Explanation:ISO C99 compiler may consider only first 63 characters for internal.<br>
</div>
<p>2. C99 standard guarantess uniqueness of _____ characters for external names.<br>
a) 31<br>
b) 6<br>
c) 12<br>
d) 14</p>
<span class="collapseomatic " id="id7970" tabindex="0" title="View Answer">View Answer</span><div id="target-id7970" class="collapseomatic_content " style="display: none;">Answer:a<br>
Explanation:ISO C99 compiler may consider only first 31 characters for external<br>
variables having 31 characters due to which it may not be unique.<br>
</div>
<p>3. Which of the following is not a valid variable name declaration?<br>
a) int __a3;<br>
b) int __3a;<br>
c) int __A3;<br>
d) None of the mentioned</p>
<span class="collapseomatic " id="id5714" tabindex="0" title="View Answer">View Answer</span><div id="target-id5714" class="collapseomatic_content " style="display: none;">Answer:d<br>
Explanation:None.<br>
</div>
<p>4. Which of the following is not a valid variable name declaration?<br>
a) int _a3;<br>
b) int a_3;<br>
c) int 3_a;<br>
d) int _3a</p>"""
soup = bs(html,'html.parser')
p = soup.select('div.entry-content p')
for i in p[2:]:  # skip the two introductory <p> elements
    print(i.text)
    print('\n' * 3)  # just print three newlines
Output:
1. C99 standard guarantees uniqueness of ____ characters for internal names.
a) 31
b) 63
c) 12
d) 14
2. C99 standard guarantess uniqueness of _____ characters for external names.
a) 31
b) 6
c) 12
d) 14
3. Which of the following is not a valid variable name declaration?
a) int __a3;
b) int __3a;
c) int __A3;
d) None of the mentioned
4. Which of the following is not a valid variable name declaration?
a) int _a3;
b) int a_3;
c) int 3_a;
d) int _3a
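The output above still carries the numbering and keeps each question as one block. If every piece is needed separately, one possible follow-up (a sketch, not part of the original answer) is to iterate over stripped_strings, which yields the text nodes sitting between the <br> tags, and to strip the leading "1." or "a)" label with a regular expression. The html sample here is a shortened copy of the markup from the question, and LABEL and split_question are hypothetical names introduced only for illustration.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumed to match the real page's shape).
html = """<div class="entry-content">
<p>Here is a listing of C interview questions:
</p>
<p>1. C99 standard guarantees uniqueness of ____ characters for internal names.<br>
a) 31<br>
b) 63<br>
c) 12<br>
d) 14</p>
<p>3. Which of the following is not a valid variable name declaration?<br>
a) int __a3;<br>
b) int __3a;<br>
c) int __A3;<br>
d) None of the mentioned</p>
</div>"""

# Matches a leading label such as "1." or "a)" plus surrounding whitespace.
LABEL = re.compile(r'^\s*(?:\d+\.|[a-z]\))\s*')

def split_question(p):
    """Return (question_text, [option, ...]) for one <p> element."""
    # stripped_strings yields each text node between the <br> tags, trimmed.
    parts = [LABEL.sub('', s) for s in p.stripped_strings]
    return parts[0], parts[1:]

soup = BeautifulSoup(html, 'html.parser')

# Keep only the <p> elements that contain a <br>, i.e. the question blocks.
questions = [
    split_question(p)
    for p in soup.select('div.entry-content p')
    if p.find('br') is not None
]

for text, options in questions:
    print(text, options)
```

Filtering on the presence of a <br> avoids relying on the p[2:] slice, which only works while the page has exactly two introductory paragraphs. Each (text, options) pair can then be appended to a list or written to a database row as needed.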