Google images search XPath - extract part of text()
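I need a lot of images, and Google image search is of course a great source.
I have been looking for the best way to do this. Getting the smaller "thumbnail" images is possible, but I want the original size.
Using: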
//*[@id="rg_s"]/div/div/text()
I do find the URL of the original-size image. For example:
{"cb":9,"cl":9,"cr":9,"ct":9,"id":"twpCKa-qACVbrM:","isu":"twitter.com",
"itg":false,"ity":"jpg","oh":512,"ou":
"https://pbs.twimg.com/profile_images/698459967624474624/FsezpZpl.jpg",
"ow":512,"pt":"Manchester United (@ManUtd) | Twitter","rid":"5Q1F7uGUbUotPM",
"ru":"https://twitter.com/manutd","s":"","sc":1,"th":225,"tu":
"https://encrypted-tbn2.gstatic.com/images?
q\u003dtbn:ANd9GcRELkTX0VqGU4OHs9sgS93dedTCNsW0TvJT3S72YuOCCHfXxZSa","tw":225}
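Here,
https://pbs.twimg.com/profile_images/698459967624474624/FsezpZpl.jpg
is the URL of the original-size image. I don't really know where on the page this block of text actually comes from, but what I want to know is: can the URL itself be isolated and extracted?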
You can't extract part of a JSON value with XPath, but you can apply a regular expression to the text value that the XPath returns. For example:
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    public class Program
    {
        static void Main(string[] args)
        {
            // Load the XML, then read the text node selected by the XPath, e.g.:
            // string s = xml.SelectSingleNode("//*[@id=\"rg_s\"]/div/div/text()").Value;
            // Here the sample text from the question is hard-coded instead.
            string s = @"{""cb"":9,""cl"":9,""cr"":9,""ct"":9,""id"":""twpCKa-qACVbrM:"",""isu"":""twitter.com"",
""itg"":false,""ity"":""jpg"",""oh"":512,""ou"":
""https://pbs.twimg.com/profile_images/698459967624474624/FsezpZpl.jpg"",
""ow"":512,""pt"":""Manchester United (@ManUtd) | Twitter"",""rid"":""5Q1F7uGUbUotPM"",
""ru"":""https://twitter.com/manutd"",""s"":"""",""sc"":1,""th"":225,""tu"":
""https://encrypted-tbn2.gstatic.com/images?
q\u003dtbn:ANd9GcRELkTX0VqGU4OHs9sgS93dedTCNsW0TvJT3S72YuOCCHfXxZSa"",""tw"":225}";
            // Capture the quoted value that follows the "ou" key.
            // The backslash in \s must be doubled inside a regular C# string literal.
            Console.WriteLine(Regex.Match(s, "\"ou\":\\s*?\"([^\"]+)\"").Groups[1].Value);
            Console.ReadKey();
        }
    }
}
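Since the text node is JSON, an alternative to the regex is to parse it and read the "ou" property directly. The following is only a minimal sketch of that swapped-in approach, not part of the answer above: it assumes .NET Core 3.0 or later (for System.Text.Json), it assumes the text node holds a single well-formed JSON object, and the class and method names (OriginalUrlExtractor, GetOriginalUrl) are hypothetical. The sample string is a shortened, single-line version of the one from the question.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, named for illustration only.
public static class OriginalUrlExtractor
{
    // Returns the string value of the "ou" property,
    // or null if the property is missing or is not a string.
    public static string GetOriginalUrl(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object &&
                root.TryGetProperty("ou", out JsonElement ou) &&
                ou.ValueKind == JsonValueKind.String)
            {
                return ou.GetString();
            }
            return null;
        }
    }

    public static void Main()
    {
        // Shortened sample; in practice this would be the text found by the XPath.
        string s = @"{""id"":""twpCKa-qACVbrM:"",""ity"":""jpg"",""oh"":512,""ou"":""https://pbs.twimg.com/profile_images/698459967624474624/FsezpZpl.jpg"",""ow"":512}";

        // Prints the original-size image URL.
        Console.WriteLine(GetOriginalUrl(s));
    }
}
```

Unlike the regex, a parser also decodes JSON escapes such as \u003d, but JsonDocument.Parse throws a JsonException on malformed input, so the regex may be the more forgiving choice if the text is not strictly valid JSON.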