JSoup.clean() is not preserving relative URLs
I have tried:
Whitelist.relaxed();
Whitelist.relaxed().preserveRelativeLinks(true);
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp");
Whitelist.relaxed().addProtocols("a","href","#","/","http","https","mailto","ftp").preserveRelativeLinks(true);
None of these work: when I clean a relative URL such as <a href="/test.xhtml">test</a>, the href attribute is removed (<a>test</a>).
I am using JSoup 1.8.2.
Any ideas?
The problem most likely comes from how you call the clean method. If you supply a base URI, everything should work as expected:
String html = ""
+ "<a href=\"/test.xhtml\">test</a>"
+ "<invalid>stuff</invalid>"
+ "<h2>header1</h2>";
String cleaned = Jsoup.clean(html, "http://base.uri", Whitelist.relaxed().preserveRelativeLinks(true));
System.out.println(cleaned);
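The above works and keeps the relative link. With the two-argument overload, which takes no base URI:
String cleaned = Jsoup.clean(html, Whitelist.relaxed().preserveRelativeLinks(true));
the link's href is removed.
Note the documentation of Whitelist.preserveRelativeLinks(true):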
Note that when handling relative links, the input document must have
an appropriate base URI set when parsing, so that the link's protocol
can be confirmed. Regardless of the setting of the preserve relative
links option, the link must be resolvable against the base URI to an
allowed protocol; otherwise the attribute will be removed.
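To make the quoted rule concrete, here is a minimal sketch of the idea using only java.net.URI. It is not jsoup's actual implementation; the class name RelativeLinkCheck, the helper resolvesToAllowedProtocol, and the ALLOWED set are hypothetical and exist only for illustration. It shows why a relative href has no protocol to check until it is resolved against a base URI:

```java
import java.net.URI;
import java.util.Set;

// Illustration only: NOT jsoup code. All names here are hypothetical.
class RelativeLinkCheck {
    // Assumed allow-list, mirroring the protocols named in the question.
    static final Set<String> ALLOWED = Set.of("http", "https", "mailto", "ftp");

    // Resolves href against baseUri, then checks that the resolved URI
    // carries a scheme from the allow-list.
    static boolean resolvesToAllowedProtocol(String baseUri, String href) {
        URI resolved = URI.create(baseUri).resolve(href);
        String scheme = resolved.getScheme();
        return scheme != null && ALLOWED.contains(scheme.toLowerCase());
    }

    public static void main(String[] args) {
        // With a base URI the link resolves to http://base.uri/test.xhtml.
        System.out.println(resolvesToAllowedProtocol("http://base.uri", "/test.xhtml")); // true
        // With an empty base URI the link stays "/test.xhtml" and has no scheme.
        System.out.println(resolvesToAllowedProtocol("", "/test.xhtml")); // false
    }
}
```

Under this reading, preserveRelativeLinks(true) only controls whether the attribute is written back in its relative form. Whether the attribute is kept at all still depends on the resolved link having an allowed protocol, which is why the base URI matters.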