Jaunt Java getText() returns the correct text, but with lots of "?" characters
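The title says it all. In addition, I have already tried removing them with
(String).replace("?", "")
, but without success. (The text is there, but instead of "aldo" I get "al?do", and the "?" seem to appear in a random pattern.)
I also tried the following with combinations of UTF_8, UTF_16 and ISO-8859, also without success: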
byte[] ptext = tempName.getBytes(UTF_8);
String tempName1 = new String(ptext, UTF_16);
An example of what I get:
Studded Regular Sweatshirt // Instead of this
S?tudde?d R?eg?ular? Sw?eats?h?irt // I get this
Could it be that the website notices the headless browser and tries to "spoof" its content? How can I overcome this?
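The site you are trying to scrape is most likely mixing 3f (the hex code for "?") and 64 characters into your results.
So you either have to disguise yourself as a normal browser in order to scrape it, or filter the extra characters out with a replace.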
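For the "disguise yourself as a normal browser" option, here is a minimal sketch of a request that carries browser-like headers. It uses the JDK's own java.net.http API (Java 11+) rather than Jaunt, and it assumes the site decides based on the User-Agent header, which the thread does not confirm. The class name, the header values and the example.com URL are placeholders made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BrowserLikeRequest {
    // Hypothetical desktop-browser User-Agent string, for illustration only.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Builds (but does not send) a GET request with browser-like headers.
    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .header("Accept-Language", "en-US,en;q=0.9")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://example.com/");
        // Print the headers that would be sent.
        System.out.println(request.headers().map());
    }
}
```

If the text still comes back with stray characters after this, the header is probably not what the site checks, and filtering is the remaining option.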
Sample text
Sca???rfa???ce??? E???mbr???oi�d???ered L�e???athe
After filtering
Scarface Embroidered Leathe
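One possible reason a plain replace("?", "") appears to do nothing is that the inserted characters are not literal question marks at all: a console usually prints "?" for any character it cannot encode. The sketch below first lists the real code points and then strips "?", the replacement character U+FFFD and invisible format characters in one pass. The class and method names are made up for illustration, and removing the whole Unicode "Cf" category is an assumption about what the site injects.

```java
public class StrayCharFilter {
    // Lists every code point as U+XXXX so you can see what each "?" really is.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Removes "?" (U+003F), the replacement character (U+FFFD) and
    // invisible format characters (Unicode category Cf, e.g. U+200B).
    static String clean(String s) {
        return s.replaceAll("[?\\uFFFD\\p{Cf}]", "");
    }

    public static void main(String[] args) {
        String raw = "S?tudde?d R?eg?ular? Sw?eats?h?irt";
        System.out.println(describe(raw));
        System.out.println(clean(raw));
    }
}
```

Note that this also deletes any legitimate "?" in the scraped text, so it only suits fields such as product names where a question mark is not expected.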
import java.nio.charset.StandardCharsets;

// Input:  Sca???rfa???ce??? E???mbr???oi�d???ered L�e???athe
// Output: Scarface Embroidered Leathe
String hex = "5363613f3f3f7266613f3f3f63653f3f3f20453f3f3f6d62723f3f3f6f69643f3f3f65726564204c653f3f3f61746865";
byte[] bytes = hexStringToBytes(hex);
// The only line you need: decode as UTF-8, then strip "?" (U+003F)
// and the Unicode replacement character (U+FFFD).
String res = new String(bytes, StandardCharsets.UTF_8).replace("?", "").replace("\uFFFD", "");
// Converts one uppercase hex digit ('0'-'9', 'A'-'F') to its numeric value.
private static byte charToByte(char c) {
    return (byte) "0123456789ABCDEF".indexOf(c);
}

// Decodes a hex string such as "3f64" into bytes.
// Returns null for null or empty input.
public static byte[] hexStringToBytes(String hexString) {
    if (hexString == null || hexString.isEmpty()) {
        return null;
    }
    hexString = hexString.toUpperCase();
    int length = hexString.length() / 2;
    char[] hexChars = hexString.toCharArray();
    byte[] d = new byte[length];
    for (int i = 0; i < length; i++) {
        int pos = i * 2;
        // High nibble from the first digit, low nibble from the second.
        d[i] = (byte) (charToByte(hexChars[pos]) << 4 | charToByte(hexChars[pos + 1]));
    }
    return d;
}

// Encodes bytes as a lowercase hex string, two digits per byte.
// Returns null for null or empty input.
public static String bytesToHexString(byte[] src) {
    if (src == null || src.length == 0) {
        return null;
    }
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < src.length; i++) {
        int v = src[i] & 0xFF;
        String hv = Integer.toHexString(v);
        if (hv.length() < 2) {
            stringBuilder.append(0);
        }
        stringBuilder.append(hv);
    }
    return stringBuilder.toString();
}

// Same encoding as bytesToHexString, but returns "" for an empty array.
public String printHexString(byte[] b) {
    StringBuilder a = new StringBuilder();
    for (int i = 0; i < b.length; i++) {
        String hex = Integer.toHexString(b[i] & 0xFF);
        if (hex.length() == 1) {
            hex = '0' + hex;
        }
        a.append(hex);
    }
    return a.toString();
}
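The hand-written hex helpers above are only needed to reproduce the sample. On Java 17 or newer, the standard java.util.HexFormat class can do the same conversions. The sketch below assumes such a JDK, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

public class HexRoundTrip {
    // Decodes hex into UTF-8 text, then strips "?" and U+FFFD.
    static String decodeAndClean(String hex) {
        byte[] bytes = HexFormat.of().parseHex(hex);
        return new String(bytes, StandardCharsets.UTF_8)
                .replace("?", "")
                .replace("\uFFFD", "");
    }

    // Encodes text as lowercase hex, useful for inspecting scraped strings.
    static String toHex(String text) {
        return HexFormat.of().formatHex(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(toHex("Sca???r"));
        System.out.println(decodeAndClean("5363613f3f3f72"));
    }
}
```

Dumping a scraped string with toHex is a quick way to check whether the stray characters really are 3f bytes before deciding what to filter.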