Formatting Raw String Python
I have a raw string in Python that is retrieved via an imap library.
It looks like this:
Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA
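What is the best way to scrape the data after each label (a label being, for example, DATE:)?
For example, DATE: 02/13/2018 * - ( previously 02/06/2018 ) would be assigned to a variable such as date, so that print(date) outputs 02/13/2018 * - ( previously 02/06/2018 ).
I tried the code below, but it prints one character per line. Thanks!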
for line in message:
    if "DATE:" in line:
        print(line)
You can use regular expressions and a dictionary:
import re
s = """
Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA
"""
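final_dict = {}
for line in filter(None, s.split('\n')):
    parts = re.split(r':\s', line, maxsplit=1)  # split only at the first "colon + whitespace"
    if len(parts) > 1:  # skip lines that carry no label
        key, value = (part.strip('\r') for part in parts)
        final_dict[key[1:] if key.startswith('*') else key] = value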
Output:
{'Home/Host': 'Clear Lake', 'Season': 'Winter 2017-18', 'START TIME': '6:15PM CST* - ( previously 4:30PM CST )', 'Location': 'Clear Lake High School, 125 N. 20th Street, Clear Lake, IA', 'Activity': 'Basketball - Boys JV', 'DATE': '02/13/2018 * - ( previously 02/06/2018 )', 'Event type': 'Game', 'Opponent': 'Webster City'}
You can use str.splitlines() to split the string into lines. Then iterate over the lines and use a regular expression to extract the data, for example:
import re
for line in message.splitlines():
    match = re.match(r'\*DATE: (.*)', line)  # capture everything after "*DATE: "
    if match:
        date = match.group(1)
        print(date)
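If every label is needed rather than just DATE, the same regular-expression idea can be generalized. The following is only a sketch: the name FIELD_RE and the shortened sample text are assumptions for illustration, and it presumes each field sits on a single line with the label ending at the first colon.

```python
import re

# Sample text copied from the question (shortened).
message = """Season: Winter 2017-18
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
"""

# Optional leading "*", then the label up to the first colon, then the value.
# FIELD_RE is a hypothetical name, not part of any library.
FIELD_RE = re.compile(r'^\*?([^:\n]+):[ \t]*(.*)$', re.MULTILINE)

# findall returns (label, value) tuples; strip() drops stray spaces or "\r".
fields = {label.strip(): value.strip() for label, value in FIELD_RE.findall(message)}
date = fields["DATE"]
print(date)
```

Because the label pattern stops at the first colon, the colon inside a time such as 6:15PM stays in the value.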
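for line in message iterates over each item in message: simply put, message is a string and its items are characters, so the loop visits every character rather than every line.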
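A short sketch of that difference, using a two-line sample string assumed for illustration:

```python
message = "Season: Winter 2017-18\n*DATE: 02/13/2018 * - ( previously 02/06/2018 )"

# Iterating over a str yields single characters, not lines.
chars = [item for item in message]
print(chars[:6])

# splitlines() yields whole lines, which is what the loop in the question expected.
lines = message.splitlines()
print(lines[1])
```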
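Splitting is a simple/naive way to solve the problem, but as long as your data does not get too complicated, it will probably work:
Use message.split("\n") to split the string on newlines and iterate over the result. You can then use line.strip().strip("*").split(":", maxsplit=1) to separate the key from the value. The first strip() removes any extra whitespace that may remain (for example a stray "\r"), and the second removes the extra asterisk. maxsplit=1 stops at the first colon (which could be a problem if your data has colons as part of a label).
I say key/value because you probably do not need (or want) to assign these pairs dynamically to actual variables; you can simply store them in a dictionary and query it as needed.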
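output = dict()
for line in message.split("\n"):  ## Split lines
    line = line.strip().strip("*")  ## Remove extra whitespace and the leading asterisk
    if ":" not in line:  ## Skip blank lines and lines without a label
        continue
    key, value = line.split(":", maxsplit=1)  ## Split at the first colon
    output[key] = value.strip()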
Edit: I was under the impression that "date" was just your example, but if that is all you are looking for, simply add the line if key == "DATE" and return/print/etc. the value.
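One way the if key == "DATE" variant could look is sketched below. The function name find_value and the shortened sample text are assumptions for illustration, not part of the original answer.

```python
def find_value(message, wanted):
    """Return the text after the first colon on the line labeled `wanted`, or None."""
    for line in message.split("\n"):
        line = line.strip().strip("*")  # drop whitespace/"\r" and the leading asterisk
        if ":" not in line:             # skip blank or unlabeled lines
            continue
        key, value = line.split(":", maxsplit=1)
        if key == wanted:
            return value.strip()
    return None


# Sample text copied from the question (shortened).
message = """Season: Winter 2017-18
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
"""

date = find_value(message, "DATE")
print(date)
```

Returning None for a missing label lets the caller decide how to handle a message that lacks the field.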
If your data is in a file named datafile.txt, you can try this:
with open('datafile.txt', 'r') as f:
    for line in f:
        if line.startswith("*DATE:"):
            print(line)
This solution works (and is, I believe, fairly "Pythonic"):
lines = message.split("\n")  # Split your message into "lines"
sections = [line.split(": ", 1) for line in lines if ": " in line]  # Split each labeled line at the first "colon space"; lines without one are skipped
message_dict = {section[0].lstrip(' '): section[1] for section in sections}  # Dictionary comprehension to put your keys and values into a dict struct. Also removes leading whitespace from your keys.