Read RDF/XML from a URL in Jena

Question

I am trying to read an RDF/XML file with Jena, and this works fine:

    final String url = "http://www.bbc.co.uk/nature/life/Human";
    Model model = ModelFactory.createDefaultModel();
    model.read(url, "RDF/XML");

But when I try another URL, where a paragraph contains a <br> or a link, it gives me this error:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 25, col: 6 ] {E202} Cannot have both string data "Great white sharks are at the very top of the marine food chain. Feared as man-eaters, they are only responsible for about 5-10 attacks a year, which are rarely fatal. Great whites are ultimate predators. Powerful streamlined bodies and a mouth full of terrifyingly sharp, serrated teeth, combine with super senses that can detect a single drop of blood from over a mile away. Hiding from a great white isn't an option as they can detect and home in on small electrical discharges from hearts and gills. Unlike most other sharks, live young are born that immediately swim away.
" and XML data <br> inside a property element. Maybe you want rdf:parseType='Literal'.

This is the link for the second case, where Jena throws this error: http://www.bbc.co.uk/nature/life/Great_white_shark

What should I do to make Jena ignore this?

Answer 1

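The problem is in the data from the BBC site: <br/> needs to be escaped as &lt;br/&gt; to put HTML markup into a string value. In RDF/XML, a plain string value cannot contain raw markup.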

Unfortunately, the BBC site does not fully handle content negotiation: asking for Turtle or N-Triples returns an XHTML page.

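You need to download the file with an ordinary HTTP request, sending the header Accept: application/rdf+xml, patch the content, and parse the fixed version. One way is to read it into a Java string, use a regular expression to replace <br/> with &lt;br/&gt;, and then parse from the string.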
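
The download-and-patch steps could look roughly like the sketch below. It is only an illustration under some assumptions: it uses the java.net.http client (Java 11 or later), the class and method names (RdfXmlPatch, fetchRdfXml, escapeBrTags) are made up for this example, and the regular expression only handles br tags. Other embedded markup, such as links, would need its own handling.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jena.
public class RdfXmlPatch {

    // Matches <br>, <br/> and <br /> in any letter case.
    // Assumption: these are the only forms of raw markup to fix.
    private static final Pattern BR_TAG =
            Pattern.compile("<br\\s*/?>", Pattern.CASE_INSENSITIVE);

    /** Downloads the document, explicitly asking for RDF/XML. */
    public static String fetchRdfXml(String url)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** Rewrites every raw br tag as the escaped text &lt;br/&gt;. */
    public static String escapeBrTags(String rdfXml) {
        return BR_TAG.matcher(rdfXml).replaceAll("&lt;br/&gt;");
    }
}
```

The patched string can then be handed to Jena through a reader, for example model.read(new StringReader(fixed), url, "RDF/XML"), where fixed is the result of escapeBrTags and url serves as the base URI.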
