Changing scraped strings (convert to float and back) inside a list
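I'm practicing scraping websites and I've ended up with a list of price strings. I'm not very familiar with lists and how they work, so I'm not sure, but I want to convert USD to AUD at a ratio of roughly 1 USD : 1.32 AUD. I assume the strings would first be eval()'d into a list of floats and then simply multiplied by 1.32, but I'm not sure how to actually do the conversion: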
from tkinter import *
from re import findall, MULTILINE
rss = open('rss.xhtml', encoding="utf8").read()
# prints 10 price values
regex_test = findall(r'([0-9]+[.]*[0-9]*) USD', rss)
price = ["$" + regex_test for regex_test in regex_test]
for cost in range(10):
    print(price[cost])
This prints 10 prices. Below, => shows the conversion I'm after for each price, i.e. $20 USD becomes $26.40 AUD:
- $20.00 => $26.40
- $20.00 => $26.40
- $20.00 => $26.40
- $20.00 => $26.40
- $16.00 => $21.12
- $23.50 => $31.02
- $20.00 => $26.40
- $16.00 => $21.12
- $189.00 => $249.48
- $16.00 => $21.12
To help with extracting prices using the same regex, here is a similar rss feed: https://www.etsy.com/au/shop/ElvenTechnology/rss
A range of 10 is used because I don't want to scrape hundreds of entries, only the top few.
To make your for loop more Pythonic:
from tkinter import *
from re import findall, MULTILINE
rss = open('rss.xhtml', encoding="utf8").read()
# prints the price values
regex_test = findall(r'([0-9]+[.]*[0-9]*) USD', rss)
price = ["$" + regex_test for regex_test in regex_test]
for individual_price in price:
    print(individual_price)
To convert the list to AUD, assuming you just want to multiply by a single value, it seems better for your code to go back to the list as it was before the dollar sign was added:
aud_usd_ratio = 1.32 # 1.32 AUD to 1 USD
aud_price_list = ["$" + str(float(x)*aud_usd_ratio) for x in regex_test]
print(aud_price_list)
If you need the two decimal places, you can also use string formatting:
aud_price_list = ["${:.2f}".format(float(x) * aud_usd_ratio) for x in regex_test]
print(aud_price_list)
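As a rough sketch of the same idea wrapped in a helper: the function name `usd_strings_to_aud`, its `limit` parameter, and the sample list are made up for illustration and are not part of the original code. It uses float() rather than eval(), since the captured strings are plain numbers and float() will not execute arbitrary text from a scraped page.

```python
def usd_strings_to_aud(usd_strings, ratio=1.32, limit=None):
    """Convert price strings such as '20.00' into formatted AUD strings.

    Hypothetical helper: `ratio` is the assumed AUD-per-USD rate and
    `limit` optionally keeps only the first few entries.
    """
    selected = usd_strings[:limit]  # limit=None keeps the whole list
    return ["${:.2f}".format(float(s) * ratio) for s in selected]


# Sample data standing in for regex_test
sample = ["20.00", "16.00", "23.50", "189.00"]
print(usd_strings_to_aud(sample))
print(usd_strings_to_aud(sample, limit=2))
```

Slicing with `[:limit]` also avoids an IndexError when the feed has fewer entries than expected.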
Assuming regex_test is the same as my prices_list_usd:
prices_list_usd = [11.11,12.22,21.324,3.11]
usd_aud_ratio = 1.32
prices_list_aud = [price*usd_aud_ratio for price in prices_list_usd]
combined_list = zip(prices_list_usd,prices_list_aud)
for pair in combined_list:
    print("$USD {0} => $AUD {1}".format(pair[0],pair[1]))
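Note that findall returns strings, not floats, so the list would probably need converting first. A minimal sketch under that assumption, with `usd_strings` as made-up sample data standing in for regex_test, and both sides formatted to two decimals:

```python
usd_strings = ["20.00", "16.00", "23.50"]  # sample data, stands in for regex_test
usd_aud_ratio = 1.32

# Convert the captured strings to floats before doing any arithmetic
prices_usd = [float(s) for s in usd_strings]
prices_aud = [p * usd_aud_ratio for p in prices_usd]

# Pair each USD price with its AUD equivalent and format both to 2 decimals
lines = [
    "$USD {:.2f} => $AUD {:.2f}".format(usd, aud)
    for usd, aud in zip(prices_usd, prices_aud)
]
for line in lines:
    print(line)
```

Unpacking `usd, aud` directly in the loop reads a little cleaner than indexing `pair[0]` and `pair[1]`.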
I think you need to extract all the values, cast them to float, and then format them accordingly:
import re
# I don't know the rss file, so this is a dummy variable
rss = "$20.00 => $26.40 $20.00 => $26.40 $16.00 => $21.12 $189.00 => $249.48"
# the lookbehind needs an escaped \$ so it matches a literal dollar sign
costs = re.findall(r'(?<=\$)\d+\.\d+', rss)
# cast to float and multiply by 1.32
costs = [float(cost) * 1.32 for cost in costs]
# now format them
for i in range(0, len(costs), 2):
    print("{:.2f} => {:.2f}".format(costs[i], costs[i + 1]))
# output
# 26.40 => 34.85
# 26.40 => 34.85
# 21.12 => 27.88
# 249.48 => 329.31
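The same extract, cast, format steps can be applied to the "NN.NN USD" pattern from the question. This is only a sketch: the `sample` fragment is invented (the real feed's markup may differ), and the pattern is a slightly stricter variant of the question's regex that requires digits after the decimal point.

```python
import re

# Hypothetical fragment; the real feed's markup may differ.
sample = (
    "<description>Ring 20.00 USD</description>"
    "<description>Lamp 189 USD</description>"
)

# Capture an integer or decimal number that is followed by " USD"
usd_strings = re.findall(r'(\d+(?:\.\d+)?) USD', sample)

# Cast each captured string to float, convert, and format to 2 decimals
aud = ["{:.2f}".format(float(s) * 1.32) for s in usd_strings]

print(usd_strings)
print(aud)
```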
With a slight change to glycoaddict's solution, you can build a list of updated prices (or a similar "variable") and then pull each value out of the list individually:
# imports the necessary modules
from tkinter import *
from re import findall, MULTILINE
import urllib.request
# downloads an rss feed to use; the feed is downloaded,
# then saved under the given name and format (xhtml, html, etc.)
urllib.request.urlretrieve("https://www.etsy.com/au/shop/ElvenTechnology/rss", "rss.xhtml")
# opens the downloaded file to read from; 'U' can be used instead
# of 'encoding="utf8"', however this causes issues on some feeds, for
# example this particular feed needs to be decoded as utf8, otherwise
# a decoding error occurs as shown below:
# return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError:
# 'charmap' codec can't decode byte 0x9d in position 12605: character maps to <undefined>
rss = open('rss.xhtml', encoding="utf8").read()
# regex is used to find all instances within the document which was opened
# and called rss
regex_test = findall(r'([0-9]+[.]*[0-9]*) USD', rss)
# formats the returned strings as the desired values (glycoaddict);
# an aud_usd_ratio = 1.32 variable would behave the same as the literal
# 1.32 used here, it would just give the multiplier a name
AUD_price = ["${:.2f}".format(float(USD)*1.32) for USD in regex_test]
# loops 10 times; this stops rss feeds with thousands of results
# from listing endlessly. Only the first 10 are returned, taken from
# the formatted/modified list of strings, and each value is printed
# individually, which is useful for, say, a list of labels
# in tkinter to be looped over and placed
for individual_item_price in range(10):
    print(AUD_price[individual_item_price])
Note that the rss file is downloaded and updated on every run, which means the prices can be treated as live: run it now, then again an hour or a few hours later, and it may return different results.
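If saving the feed to disk isn't needed, one possible variation is to read it straight into memory and keep the parsing in a separate function so it can be checked offline. This is a sketch, not a drop-in replacement: `parse_aud_prices`, `fetch_aud_prices`, and `USD_PATTERN` are hypothetical names, and it assumes the feed is UTF-8 encoded as in the answer above.

```python
import re
import urllib.request

# Stricter variant of the question's pattern: digits, optional decimal part
USD_PATTERN = re.compile(r'([0-9]+(?:\.[0-9]+)?) USD')


def parse_aud_prices(text, ratio=1.32, limit=10):
    """Return up to `limit` formatted AUD prices found in `text`."""
    usd = USD_PATTERN.findall(text)[:limit]  # slicing is safe for short feeds
    return ["${:.2f}".format(float(s) * ratio) for s in usd]


def fetch_aud_prices(url, ratio=1.32, limit=10):
    """Download the feed into memory (assumed UTF-8) and parse it."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf8")
    return parse_aud_prices(text, ratio, limit)


# Offline check with made-up text; no network access needed
print(parse_aud_prices("A 16.00 USD B 23.50 USD", limit=1))

# Live usage (requires network access):
# fetch_aud_prices("https://www.etsy.com/au/shop/ElvenTechnology/rss")
```

Because the list is sliced rather than indexed with range(10), a feed with fewer than 10 prices simply returns a shorter list instead of raising an IndexError.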