How can I access all of the text elements from this table (Scrapy)?
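I think I'm overthinking this and missing the big picture. I've just started building my first Scrapy spider; until now I've mostly used Selenium for scraping. I'm trying to scrape the text from the left side of the table at https://whattomine.com/coins/1-btc-sha-256. The keys would be Algorithm, Block time, Last block, and so on; the values would be "SHA-256", "10m 15s", "709,104", and so on.
In the Scrapy shell, this almost gets me there: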
key = response.css('dt::text').getall()
val = response.css('dd::text').getall()
data = dict(zip(key, val))
But for some of the values I get "\n". How can I get all of the values?
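- The 'Ex. rate' text sits inside an 'a' tag, so a selector that only takes the direct text of 'dd' does not see it.
- You need to filter out the "\n" entries.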
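The first point can be seen on a tiny example. This is only a sketch: it uses the standard library's ElementTree as a stand-in for Scrapy selectors, and the `SAMPLE` markup is made up to imitate the page's `dl` layout, not copied from it. `dd.text` plays the role of the direct text of `dd`, and `itertext()` plays the role of `dd//text()`.

```python
import xml.etree.ElementTree as ET

# Made-up fragment imitating the page's <dl> layout (not the real markup).
SAMPLE = """
<dl>
  <dt>Algorithm:</dt>
  <dd>SHA-256</dd>
  <dt>Ex. rate:</dt>
  <dd>
    <a href="#">123.45 (Binance)</a>
  </dd>
</dl>
"""

root = ET.fromstring(SAMPLE.strip())

for dd in root.findall("dd"):
    # Text that sits directly inside <dd>, before any child element.
    direct = dd.text or ""
    # Text of <dd> and of all its descendants, joined together.
    nested = "".join(dd.itertext())
    print(repr(direct), "->", repr(nested.strip()))
```

For the second `dd`, the direct text is only the whitespace around the link, which is where the stray "\n" values come from; the rate itself only shows up once descendant text is included.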
In [1]: value = response.xpath('//dl/dd//text()').getall()
   ...: key = response.xpath('//dl/dt/text()').getall()
In [2]: value = [v.strip() for v in value if v.strip()]
In [3]: value
Out[3]:
['SHA-256',
'10m 3s',
'709,114',
'6.38',
'6.38',
'21,659,345M',
'21,659,345M',
'21,659,345M',
'21,659,345M',
'154,272.27 Ph/s',
',837.38 (Binance)',
',192.50 (Binance)',
',568.38 (Binance)',
',718.73 (Binance)',
'44,571.56 BTC',
',261,169,962,655',
'2,412.46 Days',
'0.00 Days']
In [4]: data = dict(zip(key, value))
In [5]: data
Out[5]:
{'Algorithm:': 'SHA-256',
'Block time:': '10m 3s',
'Last block:': '709,114',
'Bl. reward:': '6.38',
'Bl. reward 24h:': '6.38',
'Difficulty:': '21,659,345M',
'Difficulty 24h:': '21,659,345M',
'Difficulty 3 days:': '21,659,345M',
'Difficulty 7 days:': '21,659,345M',
'Nethash:': '154,272.27 Ph/s',
'Ex. rate:': ',837.38 (Binance)',
'Ex. rate 24h:': ',192.50 (Binance)',
'Ex. rate 3 days:': ',568.38 (Binance)',
'Ex. rate 7 days:': ',718.73 (Binance)',
'Ex. volume 24h:': '44,571.56 BTC',
'Market cap:': ',261,169,962,655',
'Create 1 BTC in:': '2,412.46 Days',
'Break even in:': '0.00 Days'}
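If you want to reuse this in a spider, the cleanup steps could be wrapped into a small helper along these lines. This is a sketch, not part of Scrapy: `clean_fragments` and `build_table` are hypothetical names, the sample lists are made up to look like `.getall()` output, and stripping the trailing colon from the keys is an optional extra.

```python
def clean_fragments(fragments):
    """Strip each text fragment and drop the ones that are only whitespace."""
    return [f.strip() for f in fragments if f.strip()]


def build_table(keys, values):
    """Pair cleaned keys with cleaned values; refuse lists of unequal length."""
    keys = [k.rstrip(":") for k in clean_fragments(keys)]
    values = clean_fragments(values)
    if len(keys) != len(values):
        # zip() would silently truncate here and could misalign the pairs.
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))


# Made-up input shaped like the lists returned by .getall().
raw_keys = ["Algorithm:", "Block time:", "Last block:"]
raw_values = ["SHA-256", "\n", "10m 3s", "\n", "709,114"]

table = build_table(raw_keys, raw_values)
print(table)
# {'Algorithm': 'SHA-256', 'Block time': '10m 3s', 'Last block': '709,114'}
```

In the Scrapy shell you would pass the two `.getall()` results straight in. The length check is there because `zip` stops at the shorter list without complaining, so a `dd` that contains more than one non-empty text node would otherwise shift every later value by one.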