I'm getting empty values when parsing Microdata
I am trying to use the Jsoup library to parse the values of every itemprop found under each itemtype attribute in an HTML source.
Here is the body of the sample HTML page:
<body class="single single-post postid-2334 single-format-standard custom-header header-image header-full-width full-width-content" itemscope="itemscope" itemtype="http://schema.org/WebPage"><div class="site-container"><header class="site-header" role="banner" itemscope="itemscope" itemtype="http://schema.org/WPHeader"><div class="wrap"><div class="title-area"><p class="site-title" itemprop="headline"><a href="http://blogbasics.com/">Blog Basics</a></p><div id="title_image"><a href="http://blogbasics.com/" title="Blog Basics"><img src="http://blogbasics.com/wp-content/uploads/cropped-cropped-Win-1.png" title="Blog Basics" /></a><style>#title { display:none; }</style></div><p class="site-description" itemprop="description">Starting a blog? Learn how to make it amazing.</p></div></div></header><nav class="nav-primary" role="navigation" itemscope="itemscope" itemtype="http://schema.org/SiteNavigationElement"><div class="wrap"><ul id="menu-primary-navigation" class="menu genesis-nav-menu menu-primary"><li id="menu-item-2590" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-home menu-item-2590"><a title="Blog Basics" href="http://blogbasics.com">Home</a></li>
<li id="menu-item-3187" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3187"><a href="http://blogbasics.com/blog">Blog</a></li>
<li id="menu-item-3722" class="menu-item menu-item-type-custom menu-item-object-custom menu-item-3722"><a href="http://blogbasics.com/welcome">Free Updates</a></li>
<li id="menu-item-2578" class="menu-item menu-item-type-post_type menu-item-object-page menu-item-2578"><a title="Blogging Tools" href="http://blogbasics.com/blogging-tools/">Blogging Tools</a></li>
</ul></div></nav><div class="site-inner"><div class="feature-area widget-area">
<div id="spyr_tru_notifybar-2" class="widget notify_bar"><div class="widget-wrap">Starting a blog? Learn how to make it awesome!</div></div>
<div id="spyr_tru_twocolumn-3" class="widget widget_spyr_tru_twocolumn"><div class="widget-wrap">
<div class="column one-half first original"><div align="middle"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" target="_blank"><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0"></a><script data-leadbox="14581e773f72a2:12e927026b46dc" data-url="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" data-config="%7B%7D" type="text/javascript" src="https://curlcentric.leadpages.net/leadbox-910.js"></script></div>
</div>
<div class="column one-half last original"><p>Learn how to build a blog that generates traffic, revenue, & popularity in 30 days.</p>
<p>Just enter your email address in the box below and click "Submit".</p>
</div>
<div class="clear"></div>
</div></div>
<div id="spyr_tru_subscribesocial-2" class="widget feature-area-bottom tru_subscribe_social"><div class="widget-wrap">
<div class="tru_subscribesocial_wrap">
<form action="http://www.aweber.com/scripts/addlead.pl" method="post" target="_blank">
<div class="hidden_fields"><input type="hidden" name="meta_web_form_id" value="276964962" />
<input type="hidden" name="meta_split_id" value="" />
<input type="hidden" name="listname" value="awlist3567293" />
<input type="hidden" name="redirect" value="http://www.aweber.com/thankyou-coi.htm?m=text" id="redirect_f956eccce03104dc62dec5f8c897285e" />
<input type="hidden" name="meta_adtracking" value="Blog_Basics" />
<input type="hidden" name="meta_message" value="1" />
<input type="hidden" name="meta_required" value="email" />
<input type="hidden" name="meta_tooltip" value="" /></div>
<input type="email" class="default_value" name="email" value="Enter email to get updates" /></span>
<input type="submit" value="Submit" />
</form>
<div class="social_menu">
<ul id="menu-social" class="menu superfish">
</ul>
</div>
<div class="clear"></div>
</div>
</div></div>
</div><div class="content-sidebar-wrap"><main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope" itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope" itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header"><h1 class="entry-title" itemprop="headline">Examples of Blogs</h1>
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> | Go from 0 to 5,000 blog subscribers in 60 days <a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a></p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543" alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content" itemprop="text"><h3>Overview</h3>
<p>This article includes examples of blogs from various niches. There are millions of example blogs out there in all different shapes and sizes. A good place to start is <a href="http://technorati.com/" target="_blank">Technorati</a>, a directory of blogs, or <a href="http://alltop.com/" target="_blank">Alltop</a>. Search these websites and then come back and tell us about the good blogs and the bad blogs that you found. Below are also more examples of blogs that you should look at:</p>
<h3><strong>Personal blogs</strong></h3>
<p><a title="Curl Centric" href="http://www.curlcentric.com/natural-hair-101/" target="_blank">Curl Centric</a>: Dedicated to providing healthy hair care information.</p>
<h3>Travel</h3>
<p><a href="http://boardingarea.com/" target="_blank">Boarding Area</a>: A collection of bloggers on travel. Range from personal stories to specific advice on airlines, hotels and places.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, & popularity.</a></div>
<p><a href="http://vivisrandomramblings.blogspot.com/" target="_blank">Vivi’s Random Ramblings</a>: A nice collection of random posts mostly demonstrating that Violy is a well-travelled, excellent photographer.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Blog Basics - 300 x 250 -->
<ins class="adsbygoogle"
style="display:inline-block;width:300px;height:250px"
data-ad-client="ca-pub-5556427932737077"
data-ad-slot="6553509385"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<p><a href="http://www.whygo.com/" target="_blank">Why go network of blogs</a>: Another group of travel bloggers. Each blogger has their own patch, which range from Portland, which looks a nice city, to Iceland and France.</p>
<h3>Technical</h3>
<p><a href="http://techcrunch.com/" target="_blank">Techcrunch</a>: This is the one to learn all about technology and in particular technology business, technology start-ups and gadgets. You’ll usually hear the techie gossip here first.</p>
<p><a href="http://speckyboy.com/2010/02/25/50-amazing-personal-blog-web-designs/" target="_blank">Speckyboy.com</a>: Great blog on the design of websites. Good on lists, (usually 50) of well researched examples of good or unusual design. Gives even the least technical good ideas to discuss with their own designers.</p>
<h3>On Blogging</h3>
<p><a href="http://www.trafficgenerationcafe.com/" target="_blank">Traffic Generation Cafe</a>: Ana Hoffman’s very friendly, very knowledgeable blog on building traffic for your blog.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, & popularity.</a></div>
<p><a href="http://blogbasics.com/blog/" target="_blank">Blog Basics</a>: This website is a blog that is focused on topics like ‘how to blog’ and ‘how to make money blogging’.</p>
<h3>Over to you</h3>
<p>Which blogs do you like? Are you writing a blog? Then tell us about it.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Banner -->
<ins class="adsbygoogle"
style="display:inline-block;width:468px;height:60px"
data-ad-client="ca-pub-5556427932737077"
data-ad-slot="1983708988"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<div style="font-size:0px;height:0px;line-height:0px;margin:0;padding:0;clear:both"></div><div style="clear:both;"></div><div id='ois-1' class='ois-design' ><div class="ois-outer ois-8-outer">
<div class="ois-8-call-top"></div>
<div class="ois-8-inner ois-inner">
<div class="col-md-7 ois-8-left">
<div class="ois-8-title">Get Exclusive Tips</div>
<div class="ois-8-subtitle">Instantly discover how you can start a blog that generates traffic and income when you join the Blog Basics Tribe (It’s Free). Here's your chance. Just type in your email address.</div>
</div> <!-- .span7 left side -->
<div class="col-md-5 ois-8-right">
<div class="ois-8-img-wrapper">
<img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /><noscript><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /></noscript>
</div>
<div class="ois-8-form">
<form action="http://www.aweber.com/scripts/addlead.pl" method="post" id="ois-form-1" data-service="aweber" ><div id="ois-8-email-input-wrapper">
<input type="text" name="email" class="ois-8-email-input ois-email-input ois-form-control" placeholder="Your Email"/>
</div>
<div id="ois-8-button-wrapper">
<input type="submit" class="ois-btn ois-8-button" value="Submit"/>
</div><input type='hidden' name='listname' value='awlist3567293'/>
<input type='hidden' name='meta_message' value='1'/>
<input type='hidden' name='redirect' value='http://www.aweber.com/thankyou-coi.htm?m=video&e=example%40example.com&name=Example%20Subscriber&l=awlist3567293'/>
</form>
</div> <!-- #ois-8-form -->
</div><!-- .right .col-md-5 right side-->
<div style="clear:both"></div>
</div> <!-- inner -->
</div> <!-- outer --></div></div>
<div class="spyr_sliding_share">
<div class="spyr_sliding_share_text">Share this article</div>
<div class="spyr_sliding_share_wrap">
<div class="spyr_sliding_share_button spyr_sb_facebook">
<a href="#" class="icon icon-facebook"><span>Facebook</span></a>
<div class="spyr_sb_inner"><div class="fb-like" data-href="http://blogbasics.com/examples-of-blogs/" data-send="false" data-layout="button_count" data-width="100" data-show-faces="false"></div></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_twitter">
<a href="#" class="icon icon-twitter"><span>Twitter</span></a>
<div class="spyr_sb_inner"><a href="https://twitter.com/share" class="twitter-share-button" data-url="http://blogbasics.com/examples-of-blogs/" data-text="Examples of Blogs | Blog Basics" data-via="kbyrdjr">Tweet</a></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_gplus">
<a href="#" class="icon icon-gplus"><span>Google+</span></a>
<div class="spyr_sb_inner"><div class="g-plusone" data-size="medium" data-href="http://blogbasics.com/examples-of-blogs/"></div></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_pinterest">
<a href="#" class="icon icon-pinterest"><span>Pinterest</span></a>
<div class="spyr_sb_inner"><a href="http://pinterest.com/pin/create/button/?url=http://blogbasics.com/examples-of-blogs/&media=http://blogbasics.com/wp-content/uploads/Examples-of-Blogs-550x367.jpg&description=Examples of Blogs" class="pin-it-button" count-layout="horizontal"><img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" /></a></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_mail">
<a href="#" class="icon icon-mail"><span>Email a Friend</span></a>
<div class="spyr_sb_inner"><a href="mailto:?subject=Examples of Blogs&body=I found value in this and I think you will too.%0A%0AExamples of Blogs: http://blogbasics.com/examples-of-blogs/">Email a Friend</a></div>
</div>
</div>
<div class="clear"></div>
</div><footer class="entry-footer"></footer></article><div class="entry-comments" id="comments"><h3>Comments</h3><ol class="comment-list">
<li class="comment even thread-even depth-1" id="comment-261">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&d=mm&r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&d=mm&r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name">violy</span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-09T04:42:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-261" class="comment-time-link" itemprop="url">January 9, 2012 at 4:42 am</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi sir thank you so much for the nice compliment about my blog (Vivi’s Random Ramblings”), I’m blogging for not even 2 months now and it’s really overwhelming to see this compliment and getting a lot of good feedback  too and traffic which is a real surprise .. thank you so much!! – violy</p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-261' onclick='return addComment.moveForm( "comment-261", "261", "respond", "2334" )' aria-label='Reply to violy'>Reply</a></div>
</article>
<ul class="children">
<li class="comment odd alt depth-2" id="comment-262">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://blogbasics.com" class="comment-author-link" rel="external nofollow" itemprop="url">Paul Odtaa</a></span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-09T09:44:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-262" class="comment-time-link" itemprop="url">January 9, 2012 at 9:44 am</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi Violy, </p>
<p>I really like your blog and your photography is great. </p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-262' onclick='return addComment.moveForm( "comment-262", "262", "respond", "2334" )' aria-label='Reply to Paul Odtaa'>Reply</a></div>
</article>
</li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
<li class="comment even thread-odd thread-alt depth-1" id="comment-270">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://allisondduncan.com" class="comment-author-link" rel="external nofollow" itemprop="url">Allison Duncan</a></span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-20T21:17:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-270" class="comment-time-link" itemprop="url">January 20, 2012 at 9:17 pm</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi there,</p>
<p>Thanks for featuring my blog on your site. It’s always nice to see your work being appreciated and linked to.</p>
<p>I look forward to seeing what your site has coming down the pike.</p>
<p>Thanks for reading!</p>
<p>Allison</p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-270' onclick='return addComment.moveForm( "comment-270", "270", "respond", "2334" )' aria-label='Reply to Allison Duncan'>Reply</a></div>
</article>
</li><!-- #comment-## -->
I am using the jsoup library to parse the HTML and extract these values. This is the code I am trying:
try {
    Document doc = Jsoup.connect("http://blogbasics.com/examples-of-blogs/").get();
    Elements links = doc.select("itemtype > [itemprop]");
    for (Element element : links) {
        System.out.println(" itemprop :" + element.attr("itemprop"));
    }
} catch (IOException e) {
    e.printStackTrace();
}
But I get empty values. I am new to this, so please show me the correct code. If there is any other way to extract itemtype and itemprop from HTML, please share it; that would be a great help.
<div class="content-sidebar-wrap">
<main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope"
itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish
format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope"
itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header">
<h1 class="entry-title" itemprop="headline">Examples of Blogs</h1>
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope"
itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> |
Go from 0 to 5,000 blog subscribers in 60 days
<a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a>
</p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543"
alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content"
itemprop="text"><h3>Overview</h3><p>This article includes examples of blogs
from various niches. There are millions of example blogs out there in all
different shapes and sizes. A good place to start is
</p>
Expected output
itemtype="http://schema.org/Blog"
itemprop="mainContentOfPage"
itemtype="http://schema.org/BlogPosting"
itemprop="blogPost"
itemtype="http://schema.org/Person"
itemprop="author"
itemprop="name"
itemprop="text"
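I am not sure what you really want, but it seems you need every element that has both the itemtype and itemprop attributes, plus every element that has only itemprop but is a direct child of an element with itemtype. If that is the case, you can use this: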
String html = ""
+"<div class=\"content-sidebar-wrap\">"
+"<main class=\"content\" role=\"main\" itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/Blog\"><article class=\"post-2334 post type-post status-publish "
+"format-standard has-post-thumbnail category-blog-basics entry\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\"><header class=\"entry-header\">"
+"<h1 class=\"entry-title\" itemprop=\"headline\">Examples of Blogs</h1> "
+"<p class=\"entry-meta\">by <span class=\"entry-author\" itemprop=\"author\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/Person\"><span class=\"entry-author-name\" itemprop=\"name\">Kenneth Byrd</span></span> |"
+" Go from 0 to 5,000 blog subscribers in 60 days"
+" <a href=\"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/\" rel=\"nofollow\">(Click Here)</a>"
+" </p></header><img src=\"http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg\" width=\"5315\" height=\"3543\" "
+" alt=\"examples of blogs\" title=\"\" class=\"attachment-tru-post wp-post-image\" /><div class=\"entry-content\""
+" itemprop=\"text\"><h3>Overview</h3><p>This article includes examples of blogs"
+" from various niches. There are millions of example blogs out there in all "
+" different shapes and sizes. A good place to start is "
+" </p>"
;
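// An empty base URI is fine here because no relative links need resolving.
Document doc = Jsoup.parse(html, "");
// Match elements carrying both attributes, or itemprop elements whose parent has itemtype.
Elements els = doc.select("*[itemtype][itemprop], *[itemtype] > *[itemprop]");
for (Element el : els) {
    // Print the itemtype (when present) on its own line, preceded by a blank line.
    System.out.print(el.attr("itemtype").isEmpty() ? "" : ("\n" + el.attr("itemtype") + "\n"));
    System.out.println(el.attr("itemprop"));
}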
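The important part is the Jsoup CSS selector *[itemtype][itemprop], *[itemtype] > *[itemprop], which has two parts:
*[itemtype][itemprop] selects elements that have both attributes.
*[itemtype] > *[itemprop] selects elements with the itemprop attribute that are direct children of an element with the itemtype attribute. If you want to allow all descendants, not only direct children, leave out the >.
The comma between the selectors acts as "OR", so every element matching any of the listed selectors is returned.
In the question's selector, itemtype is written without brackets, so it is treated as a tag name rather than an attribute. The page contains no <itemtype> element, which is why nothing was returned.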
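Since the question also asks for other ways to do this, here is a minimal sketch of the same selection using only the XPath support built into the JDK. It assumes the input is well-formed XHTML, which real-world pages usually are not, so it is an illustration of the matching logic rather than a replacement for Jsoup. The class name MicrodataXPath, the extract method, and the reduced SAMPLE markup are hypothetical names made up for this example. The `or` inside the predicate plays the role of the comma in the CSS selector, and switching `parent::` to `ancestor::` corresponds to leaving out the >.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MicrodataXPath {
    // Hypothetical, well-formed reduction of the sample markup from the question.
    static final String SAMPLE =
        "<main itemprop='mainContentOfPage' itemscope='itemscope' itemtype='http://schema.org/Blog'>"
      + "<article itemprop='blogPost' itemscope='itemscope' itemtype='http://schema.org/BlogPosting'>"
      + "<header><h1 itemprop='headline'>Examples of Blogs</h1>"
      + "<p>by <span itemprop='author' itemscope='itemscope' itemtype='http://schema.org/Person'>"
      + "<span itemprop='name'>Kenneth Byrd</span></span></p></header>"
      + "<div itemprop='text'><h3>Overview</h3></div>"
      + "</article></main>";

    // Returns "itemtype=..." / "itemprop=..." lines for every matching element.
    // allDescendants=false mirrors "*[itemtype] > *[itemprop]" (direct children only);
    // allDescendants=true mirrors "*[itemtype] *[itemprop]" (any depth).
    static List<String> extract(String xhtml, boolean allDescendants) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            String axis = allDescendants ? "ancestor" : "parent";
            // Keep itemprop elements that carry itemtype themselves,
            // or whose parent (or any ancestor) carries itemtype.
            String expr = "//*[@itemprop][@itemtype or " + axis + "::*[@itemtype]]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                if (el.hasAttribute("itemtype")) {
                    out.add("itemtype=" + el.getAttribute("itemtype"));
                }
                out.add("itemprop=" + el.getAttribute("itemprop"));
            }
            return out;
        } catch (Exception e) {
            // Malformed input ends up here; real HTML generally needs a lenient parser.
            throw new IllegalStateException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        extract(SAMPLE, false).forEach(System.out::println);
    }
}
```

With allDescendants set to false, the headline property is skipped because its parent is the plain header element, which matches the expected output in the question. Passing true includes it as well.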
<h3>Travel</h3>
<p><a href="http://boardingarea.com/" target="_blank">Boarding Area</a>: A collection of bloggers on travel. Range from personal stories to specific advice on airlines, hotels and places.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, & popularity.</a></div>
<p><a href="http://vivisrandomramblings.blogspot.com/" target="_blank">Vivi’s Random Ramblings</a>: A nice collection of random posts mostly demonstrating that Violy is a well-travelled, excellent photographer.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Blog Basics - 300 x 250 -->
<ins class="adsbygoogle"
style="display:inline-block;width:300px;height:250px"
data-ad-client="ca-pub-5556427932737077"
data-ad-slot="6553509385"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<p><a href="http://www.whygo.com/" target="_blank">Why go network of blogs</a>: Another group of travel bloggers. Each blogger has their own patch, which range from Portland, which looks a nice city, to Iceland and France.</p>
<h3>Technical</h3>
<p><a href="http://techcrunch.com/" target="_blank">Techcrunch</a>: This is the one to learn all about technology and in particular technology business, technology start-ups and gadgets. You’ll usually hear the techie gossip here first.</p>
<p><a href="http://speckyboy.com/2010/02/25/50-amazing-personal-blog-web-designs/" target="_blank">Speckyboy.com</a>: Great blog on the design of websites. Good on lists, (usually 50) of well researched examples of good or unusual design. Gives even the least technical good ideas to discuss with their own designers.</p>
<h3>On Blogging</h3>
<p><a href="http://www.trafficgenerationcafe.com/" target="_blank">Traffic Generation Cafe</a>: Ana Hoffman’s very friendly, very knowledgeable blog on building traffic for your blog.</p>
<div class="content-box-yellow"><a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/">Learn how to start a blog that generates traffic, revenue, & popularity.</a></div>
<p><a href="http://blogbasics.com/blog/" target="_blank">Blog Basics</a>: This website is a blog that is focused on topics like ‘how to blog’ and ‘how to make money blogging’.</p>
<h3>Over to you</h3>
<p>Which blogs do you like? Are you writing a blog? Then tell us about it.</p>
<!-- Quick Adsense WordPress Plugin: http://quicksense.net/ -->
<div style="float:none;margin:5px 0 5px 0;text-align:center;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Banner -->
<ins class="adsbygoogle"
style="display:inline-block;width:468px;height:60px"
data-ad-client="ca-pub-5556427932737077"
data-ad-slot="1983708988"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
<div style="font-size:0px;height:0px;line-height:0px;margin:0;padding:0;clear:both"></div><div style="clear:both;"></div><div id='ois-1' class='ois-design' ><div class="ois-outer ois-8-outer">
<div class="ois-8-call-top"></div>
<div class="ois-8-inner ois-inner">
<div class="col-md-7 ois-8-left">
<div class="ois-8-title">Get Exclusive Tips</div>
<div class="ois-8-subtitle">Instantly discover how you can start a blog that generates traffic and income when you join the Blog Basics Tribe (It’s Free). Here's your chance. Just type in your email address.</div>
</div> <!-- .span7 left side -->
<div class="col-md-5 ois-8-right">
<div class="ois-8-img-wrapper">
<img src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /><noscript><img src="https://lh3.ggpht.com/wMcDasimpny0oKmsIxI0xowRIxbFpaa9Rjg3aAUMG8UUtp4XamG03gYcGlsXTRmvkFqgySaihhq2_KCSr8cN7Q=s0" class="ois-img ois-8-img" /></noscript>
</div>
<div class="ois-8-form">
<form action="http://www.aweber.com/scripts/addlead.pl" method="post" id="ois-form-1" data-service="aweber" ><div id="ois-8-email-input-wrapper">
<input type="text" name="email" class="ois-8-email-input ois-email-input ois-form-control" placeholder="Your Email"/>
</div>
<div id="ois-8-button-wrapper">
<input type="submit" class="ois-btn ois-8-button" value="Submit"/>
</div><input type='hidden' name='listname' value='awlist3567293'/>
<input type='hidden' name='meta_message' value='1'/>
<input type='hidden' name='redirect' value='http://www.aweber.com/thankyou-coi.htm?m=video&e=example%40example.com&name=Example%20Subscriber&l=awlist3567293'/>
</form>
</div> <!-- #ois-8-form -->
</div><!-- .right .col-md-5 right side-->
<div style="clear:both"></div>
</div> <!-- inner -->
</div> <!-- outer --></div></div>
<div class="spyr_sliding_share">
<div class="spyr_sliding_share_text">Share this article</div>
<div class="spyr_sliding_share_wrap">
<div class="spyr_sliding_share_button spyr_sb_facebook">
<a href="#" class="icon icon-facebook"><span>Facebook</span></a>
<div class="spyr_sb_inner"><div class="fb-like" data-href="http://blogbasics.com/examples-of-blogs/" data-send="false" data-layout="button_count" data-width="100" data-show-faces="false"></div></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_twitter">
<a href="#" class="icon icon-twitter"><span>Twitter</span></a>
<div class="spyr_sb_inner"><a href="https://twitter.com/share" class="twitter-share-button" data-url="http://blogbasics.com/examples-of-blogs/" data-text="Examples of Blogs | Blog Basics" data-via="kbyrdjr">Tweet</a></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_gplus">
<a href="#" class="icon icon-gplus"><span>Google+</span></a>
<div class="spyr_sb_inner"><div class="g-plusone" data-size="medium" data-href="http://blogbasics.com/examples-of-blogs/"></div></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_pinterest">
<a href="#" class="icon icon-pinterest"><span>Pinterest</span></a>
<div class="spyr_sb_inner"><a href="http://pinterest.com/pin/create/button/?url=http://blogbasics.com/examples-of-blogs/&media=http://blogbasics.com/wp-content/uploads/Examples-of-Blogs-550x367.jpg&description=Examples of Blogs" class="pin-it-button" count-layout="horizontal"><img border="0" src="//assets.pinterest.com/images/PinExt.png" title="Pin It" /></a></div>
</div>
<div class="spyr_sliding_share_button spyr_sb_mail">
<a href="#" class="icon icon-mail"><span>Email a Friend</span></a>
<div class="spyr_sb_inner"><a href="mailto:?subject=Examples of Blogs&body=I found value in this and I think you will too.%0A%0AExamples of Blogs: http://blogbasics.com/examples-of-blogs/">Email a Friend</a></div>
</div>
</div>
<div class="clear"></div>
</div><footer class="entry-footer"></footer></article><div class="entry-comments" id="comments"><h3>Comments</h3><ol class="comment-list">
<li class="comment even thread-even depth-1" id="comment-261">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&d=mm&r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=48&d=mm&r=g" srcset='http://0.gravatar.com/avatar/9e0fce9a7b5f7de7f6c528f448007f08?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name">violy</span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-09T04:42:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-261" class="comment-time-link" itemprop="url">January 9, 2012 at 4:42 am</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi sir thank you so much for the nice compliment about my blog (Vivi’s Random Ramblings”), I’m blogging for not even 2 months now and it’s really overwhelming to see this compliment and getting a lot of good feedback  too and traffic which is a real surprise .. thank you so much!! – violy</p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-261' onclick='return addComment.moveForm( "comment-261", "261", "respond", "2334" )' aria-label='Reply to violy'>Reply</a></div>
</article>
<ul class="children">
<li class="comment odd alt depth-2" id="comment-262">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/4383ac0510061b683651a0eca3d58e42?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://blogbasics.com" class="comment-author-link" rel="external nofollow" itemprop="url">Paul Odtaa</a></span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-09T09:44:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-262" class="comment-time-link" itemprop="url">January 9, 2012 at 9:44 am</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi Violy, </p>
<p>I really like your blog and your photography is great. </p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-262' onclick='return addComment.moveForm( "comment-262", "262", "respond", "2334" )' aria-label='Reply to Paul Odtaa'>Reply</a></div>
</article>
</li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
<li class="comment even thread-odd thread-alt depth-1" id="comment-270">
<article itemprop="comment" itemscope="itemscope" itemtype="http://schema.org/UserComments">
<header class="comment-header">
<p class="comment-author" itemprop="creator" itemscope="itemscope" itemtype="http://schema.org/Person">
<img alt='' src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" data-src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /><noscript><img alt='' src="http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=48&d=mm&r=g" srcset='http://1.gravatar.com/avatar/ad28f49a806d0e3abd1a94013a259b9b?s=96&d=mm&r=g 2x' class='avatar avatar-48 photo' height='48' width='48' /></noscript><span itemprop="name"><a href="http://allisondduncan.com" class="comment-author-link" rel="external nofollow" itemprop="url">Allison Duncan</a></span> <span class="says">says</span> </p>
<p class="comment-meta">
<time class="comment-time" datetime="2012-01-20T21:17:00+00:00" itemprop="commentTime"><a href="http://blogbasics.com/examples-of-blogs/#comment-270" class="comment-time-link" itemprop="url">January 20, 2012 at 9:17 pm</a></time> </p>
</header>
<div class="comment-content" itemprop="commentText">
<p>Hi there,</p>
<p>Thanks for featuring my blog on your site. It’s always nice to see your work being appreciated and linked to.</p>
<p>I look forward to seeing what your site has coming down the pike.</p>
<p>Thanks for reading!</p>
<p>Allison</p>
</div>
<div class="comment-reply"><a rel='nofollow' class='comment-reply-link' href='#comment-270' onclick='return addComment.moveForm( "comment-270", "270", "respond", "2334" )' aria-label='Reply to Allison Duncan'>Reply</a></div>
</article>
</li><!-- #comment-## -->
I'm using the jsoup library to parse the HTML above and extract its Microdata. This is the code I'm trying:
try {
    Document doc = Jsoup.connect("http://blogbasics.com/examples-of-blogs/").get();
    Elements links = doc.select("itemtype > [itemprop]");
    for (Element element : links) {
        System.out.println(" itemprop :" + element.attr("itemprop"));
    }
} catch (IOException e) {
    e.printStackTrace();
}
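But I'm getting empty values. I'm new to this, so please show me the correct code. If there is any other way to extract itemtype and itemprop from the HTML, please share it; that would help a lot.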
<div class="content-sidebar-wrap">
<main class="content" role="main" itemprop="mainContentOfPage" itemscope="itemscope"
itemtype="http://schema.org/Blog"><article class="post-2334 post type-post status-publish
format-standard has-post-thumbnail category-blog-basics entry" itemscope="itemscope"
itemtype="http://schema.org/BlogPosting" itemprop="blogPost"><header class="entry-header">
<h1 class="entry-title" itemprop="headline">Examples of Blogs</h1>
<p class="entry-meta">by <span class="entry-author" itemprop="author" itemscope="itemscope"
itemtype="http://schema.org/Person"><span class="entry-author-name" itemprop="name">Kenneth Byrd</span></span> |
Go from 0 to 5,000 blog subscribers in 60 days
<a href="https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/" rel="nofollow">(Click Here)</a>
</p></header><img src="http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg" width="5315" height="3543"
alt="examples of blogs" title="" class="attachment-tru-post wp-post-image" /><div class="entry-content"
itemprop="text"><h3>Overview</h3><p>This article includes examples of blogs
from various niches. There are millions of example blogs out there in all
different shapes and sizes. A good place to start is
</p>
Expected output
itemtype="http://schema.org/Blog"
itemprop="mainContentOfPage"
itemtype="http://schema.org/BlogPosting"
itemprop="blogPost"
itemtype="http://schema.org/Person"
itemprop="author"
itemprop="name"
itemprop="text"
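I'm not sure exactly what you want, but it looks like you need every element that has both the itemtype and itemprop attributes, plus every element that has only itemprop but is a direct child of an element that has itemtype. Your selector itemtype > [itemprop] returns nothing because a bare itemtype is read as a tag name, and the page has no <itemtype> elements; attributes have to go in square brackets. If that is what you are after, you can use this: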
String html = ""
+"<div class=\"content-sidebar-wrap\">"
+"<main class=\"content\" role=\"main\" itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/Blog\"><article class=\"post-2334 post type-post status-publish "
+"format-standard has-post-thumbnail category-blog-basics entry\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/BlogPosting\" itemprop=\"blogPost\"><header class=\"entry-header\">"
+"<h1 class=\"entry-title\" itemprop=\"headline\">Examples of Blogs</h1> "
+"<p class=\"entry-meta\">by <span class=\"entry-author\" itemprop=\"author\" itemscope=\"itemscope\" "
+"itemtype=\"http://schema.org/Person\"><span class=\"entry-author-name\" itemprop=\"name\">Kenneth Byrd</span></span> |"
+" Go from 0 to 5,000 blog subscribers in 60 days"
+" <a href=\"https://curlcentric.leadpages.net/leadbox/14581e773f72a2%3A12e927026b46dc/5758142528880640/\" rel=\"nofollow\">(Click Here)</a>"
+" </p></header><img src=\"http://blogbasics.com/wp-content/uploads/Examples-of-Blogs.jpg\" width=\"5315\" height=\"3543\" "
+" alt=\"examples of blogs\" title=\"\" class=\"attachment-tru-post wp-post-image\" /><div class=\"entry-content\""
+" itemprop=\"text\"><h3>Overview</h3><p>This article includes examples of blogs"
+" from various niches. There are millions of example blogs out there in all "
+" different shapes and sizes. A good place to start is "
+" </p>"
;
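Document doc = Jsoup.parse(html, "");
// Elements with both attributes, or itemprop elements directly under an itemtype element
Elements els = doc.select("*[itemtype][itemprop], *[itemtype] > *[itemprop]");
for (Element el : els) {
    // Print the itemtype on its own line when present, then the itemprop
    System.out.print(el.attr("itemtype").isEmpty() ? "" : ("\n" + el.attr("itemtype") + "\n"));
    System.out.println(el.attr("itemprop"));
}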
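The important part is the jsoup CSS selector *[itemtype][itemprop], *[itemtype] > *[itemprop], which has two parts:
*[itemtype][itemprop] selects elements that have both attributes.
*[itemtype] > *[itemprop] selects elements with the itemprop attribute that are direct children of elements with the itemtype attribute. If you want to allow all descendants rather than only direct children, leave out the >.
The comma between the selectors acts as an "OR", so every element that matches any of the listed selectors is returned.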
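Since the question also asks about other ways to do this, here is a rough sketch of the same selection logic using only the JDK's built-in XML parser and XPath instead of jsoup. It is an illustration, not a replacement: the class name MicrodataXPath and the helper extract are made up for this example, and it assumes the input is well-formed XML. A real page like the one in the question (unescaped &, stray closing tags) would most likely fail to parse this way, which is why jsoup stays the practical choice. The XPath mirrors the CSS selector: | plays the role of the comma, / corresponds to >, and replacing / with // would allow any descendant.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MicrodataXPath {

    // Hypothetical helper: returns "itemtype=..." / "itemprop=..." strings
    // for every element matched by the XPath below.
    // Assumption: xhtml is well-formed XML; tag-soup HTML will throw here.
    static List<String> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Elements with both attributes, OR itemprop elements whose
        // parent element carries itemtype.
        String expr = "//*[@itemtype and @itemprop] | //*[@itemtype]/*[@itemprop]";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);

        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            String type = el.getAttribute("itemtype"); // "" when absent
            if (!type.isEmpty()) {
                out.add("itemtype=" + type);
            }
            out.add("itemprop=" + el.getAttribute("itemprop"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed, well-formed version of the markup from the question.
        String xhtml = "<main itemprop=\"mainContentOfPage\" itemscope=\"itemscope\" "
                + "itemtype=\"http://schema.org/Blog\">"
                + "<article itemprop=\"blogPost\" itemscope=\"itemscope\" "
                + "itemtype=\"http://schema.org/BlogPosting\">"
                + "<h1 itemprop=\"headline\">Examples of Blogs</h1>"
                + "<p>by <span itemprop=\"author\" itemscope=\"itemscope\" "
                + "itemtype=\"http://schema.org/Person\">"
                + "<span itemprop=\"name\">Kenneth Byrd</span></span></p>"
                + "</article></main>";
        for (String line : extract(xhtml)) {
            System.out.println(line);
        }
    }
}
```

On the trimmed markup in main, the expression matches the main, article, h1 and both span elements; the p element is skipped because it has no itemprop.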