JSOUP - Accessing elements within a div class / stopping when reaching a specific div class

I am trying to parse data from HTML. I need to extract specific content from the HTML, where the order of the elements or the HTML content itself may vary.

<h1>Latest Deals</h1>\r\n </div>\r\n </div>\r\n</div>\r\n\r\n
<div class=\"breadcrumb-wrapper\">\r\n    
<ul class=\"breadcrumb\">\r\n        
<li><a href=\"/Home\">Home</a></li>\r\n        
<li><a href=\"/Deals\">Deals</a></li>\r\n        
<li class=\"active\">Mau Mudik Hemat? Nikmati Diskon Hingga 20%</li>\r\n 
</ul>\r\n</div>\r\n\r\n
<div class=\"article outer clearfix\">\r\n    
<div class=\"col-sm-12\">\r\n        
<img alt=\"Mau Mudik Hemat? Nikmati Diskon Hingga 20%\" title=\"Mau Mudik Hemat? Nikmati Diskon Hingga 20%\" src=\"/images/slider/id/special-raya-offer-id-v2.jpg\">\r\n        
<h1>Mau Mudik Hemat? Nikmati Diskon Hingga 20%</h1>\r\n        
<p class=\"date\">May 18th, 2018</p>\r\n        
<p><strong class=\"text-red\"></strong></p>\r\n\r\n        
<p>This is the first paragraph</p>\r\n\r\n        
<p>This is the second paragraph.</p>\r\n\r\n        
<p>This is the third paragraph</p>\r\n\r\n        
<p>Below is the point form start:</p>\r\n\r\n        
<ol>\r\n            
<li>Point form A</li>\r\n            
<li>Point form B</li>\r\n            
<li>Point form C</li>\r\n            
<li>Point form D</li>\r\n            
</ol>\r\n\r\n\r\n\r\n        
<div class=\"m-top30 m-bottom20\">\r\n    
<a href=\"/home\" class=\"btn btn-lg btn-orange\">Home</a>\r\n\r\n    \r\n\r\n\r\n</div>\r\n\r\n\r\n

Previously I managed to get the content I wanted with the following:

Document doc = Jsoup.parse(content);
Element eTitle = doc.getElementsByTag("h1").get(1);
Elements eBody = doc.getElementsByTag("p");

for (Element body : eBody) {
   detailContent += "<p>" + body.html() + "</p>";
}

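With the code above I get the article's "h1" (the element at index 1, i.e. the second "h1" in the document) and every "p" element from my long HTML. However, in some cases I may now have "ol" elements between the "p" elements. For example: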

<div class=\"col-sm-12\">\r\n <img alt=\"abc\" title=\"abcd\" src=\"/images/slider/id/abcd.jpg\">\r\n 
<h1>This is the header</h1>\r\n
<p class=\"date\">November 4th, 2015</p>\r\n 
<p><strong class=\"text-red\">Sorry, this promotion has expired.</strong></p>\r\n  
<p> Paragraph 1 </p>\r\n
<p> Paragraph 2 </p>\r\n
<ol>\r\n            
<li> Point 1 </li>\r\n            
<li> Point 2 </li>\r\n            
</ol>\r\n
<p> Paragraph 3 </p>\r\n
<p> Paragraph 4 </p>\r\n
<ol>\r\n            
<li> Point 1 </li>\r\n            
<li> Point 2 </li>\r\n            
</ol>\r\n
<div class=\"m-top30 m-bottom20\">

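How should I write my code to get all of these items?
P.S. I am only thinking of doing the following:
1) Get the elements inside the div "col-sm-12", up to the last element before "m-top30 m-bottom20"
2) Ignore certain elements contained in "col-sm-12"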

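Switching to CSS selectors and adding filters such as 'p' under the first div should help. However, from the HTML above it is not clear whether the first div ends before the second div begins. If you share more details about the HTML, we may be able to improve the selector. I have noted my assumptions and my understanding in the code comments.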

    String eTitle = doc.select("div.col-sm-12 > h1").text(); // I'm assuming you are trying to fetch the title text.

    // Scope both 'p' and 'ol' to direct children of this div; a bare ", ol" would match every 'ol' in the document.
    Elements eBody = doc.select("div.col-sm-12 > p, div.col-sm-12 > ol");

    for (Element body : eBody) {
      // work with the 'body' element here.
    }
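
If the selector alone is not enough, another option is to walk the direct children of the div in order and stop at the button wrapper. Below is a minimal sketch of that idea, not a definitive implementation. The class name `ArticleExtractor`, the method name `extractBody`, and the choice to skip `p.date` are my own assumptions for illustration. It also assumes the HTML string has already been unescaped (no literal `\r\n` or `\"`), and it uses `selectFirst`, which needs jsoup 1.11.1 or later.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ArticleExtractor {

    /**
     * Returns the outer HTML of the 'p' and 'ol' children of div.col-sm-12,
     * in document order, stopping at div.m-top30.m-bottom20 if it is a child.
     */
    static String extractBody(String html) {
        Document doc = Jsoup.parse(html);
        Element container = doc.selectFirst("div.col-sm-12");
        if (container == null) {
            return ""; // no article container in this document
        }

        StringBuilder detailContent = new StringBuilder();
        for (Element child : container.children()) {
            // Requirement 1: stop once the button wrapper is reached.
            // If that div is actually a sibling of col-sm-12, this never
            // triggers and the loop simply ends at the last child.
            if (child.is("div.m-top30.m-bottom20")) {
                break;
            }
            // Requirement 2: ignore selected elements.
            // Assumption: the date line is one of the elements to skip.
            if (child.is("p.date")) {
                continue;
            }
            // Keep paragraphs and ordered lists; everything else
            // (img, h1, ...) is dropped.
            if (child.is("p, ol")) {
                detailContent.append(child.outerHtml());
            }
        }
        return detailContent.toString();
    }
}
```

You would call it as `String detailContent = ArticleExtractor.extractBody(content);`. Using `outerHtml()` keeps the original `<p>` and `<ol>` tags, so there is no need to wrap each item in `<p>` by hand as in the question.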