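Split list based on first character - Python
I'm new to Python and can't quite find a solution to my problem. I want to split one list into two lists based on how the list items begin. My list looks like this, with each line representing one item (yes, this is not proper list notation, but I'll leave it this way for a better overview):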
***
**
.param
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
.model
+spam = eggs
+spamspam = eggseggs
+spamspamspam = eggseggseggs
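So I want one list that contains all lines starting with '+' between .param and .model, and another list that contains all lines starting with '+' after .model until the end.
I have looked at enumerate() and split(), but since I have a list rather than a string, and I'm not trying to match every item in the list, I'm not sure how to apply them.
What I have is: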
paramList = []
for line in newContent:
    while line.startswith('+'):
        paramList.append(line)
        if line.startswith('.'):
            break
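This is just my attempt to create the first list. The problem is that the code also reads the second block of '+' lines, because break only exits the while loop, not the for loop.
I hope you understand my problem; thanks in advance for any pointers!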
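One way to avoid the nested while is to keep a single for loop and remember which section the loop is currently in. A minimal sketch, assuming the lines from the question are stored in a list; the names `new_content`, `current`, `param_list` and `model_list` are illustrative and not from the original post:

```python
new_content = [
    '***', '**',
    '.param', '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
    '.model', '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs',
]

param_list = []
model_list = []
current = None  # the list that '+' lines are currently appended to

for line in new_content:
    if line.startswith('.param'):
        current = param_list
    elif line.startswith('.model'):
        current = model_list
    elif line.startswith('.'):
        current = None  # assumption: '+' lines of any other section are ignored
    elif line.startswith('+') and current is not None:
        current.append(line)

print(param_list)
print(model_list)
```

Because there is only one loop, no break is needed; a new '.' line simply switches the target list.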
data = {}
for line in newContent:
    if line.startswith('.'):
        cur_dict = {}
        data[line[1:]] = cur_dict
    elif line.startswith('+'):
        key, value = line[1:].split(' = ', 1)
        cur_dict[key] = value
This creates a dictionary of dictionaries:
{'model': {'spam': 'eggs',
           'spamspam': 'eggseggs',
           'spamspamspam': 'eggseggseggs'},
 'param': {'foo': 'bar',
           'foofoo': 'barbar',
           'foofoofoo': 'barbarbar'}}
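The loop above assumes that a '.' line always appears before the first '+' line; otherwise cur_dict would not exist yet. A hedged sketch of the same idea wrapped in a function that guards against that case (`parse_sections`, `lines` and `parsed` are hypothetical names introduced here):

```python
def parse_sections(lines):
    """Group '+key = value' lines under the preceding '.section' line."""
    sections = {}
    current = None  # no section seen yet
    for line in lines:
        if line.startswith('.'):
            # Start (or reuse) the dict for this section name.
            current = sections.setdefault(line[1:], {})
        elif line.startswith('+') and current is not None:
            # partition() splits on the first ' = ' only.
            key, _, value = line[1:].partition(' = ')
            current[key] = value
    return sections

lines = [
    '***', '**',
    '.param', '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
    '.model', '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs',
]
parsed = parse_sections(lines)
print(parsed['param']['foo'])
print(list(parsed['model']))
```

Individual values can then be looked up by section and key, for example parsed['param']['foo'].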
I am new to Python
Oops. Never mind my answer, then.
I want a list that contains all lines starting with a '+' between
.param and .model and another list that contains all lines starting
with a '+' after model until the end.
import itertools as it
import pprint
data = [
    '***',
    '**',
    '.param',
    '+foo = bar',
    '+foofoo = barbar',
    '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs',
    '+spamspam = eggseggs',
    '+spamspamspam = eggseggseggs',
]
# Keep only the runs of consecutive lines that start with '+'.
results = [
    list(group) for key, group in it.groupby(data, lambda s: s.startswith('+'))
    if key
]
pprint.pprint(results)
print('-' * 20)
print(results[0])
print('-' * 20)
pprint.pprint(results[1])
--output:--
[['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar'],
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']]
--------------------
['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
--------------------
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
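This part here:
it.groupby(data, lambda x: x.startswith('+'))
...tells Python to create groups based on the first character of each string. If the first character is '+', the string goes into a True group. If the first character is not '+', the string goes into a False group. However, there are more than two groups, because each run of consecutive False strings forms one group and each run of consecutive True strings forms another.
Based on your data, the first three strings:
***
**
.param
will create one False group. Then the next strings:
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
will create one True group. Then the next string:
'.model'
will create another False group. Then the next strings:
'+spam = eggs'
'+spamspam = eggseggs'
'+spamspamspam = eggseggseggs'
will create another True group. The result will look something like this:
{
    False: [strs here],
    True: [strs here],
    False: [strs here],
    True: [strs here]
}
Then it's just a matter of picking out each True group, if key, and converting the corresponding group to a list, list(group).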
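Strictly speaking, groupby() does not build a dictionary; it yields (key, group) pairs, one per run of consecutive items with the same key. A small sketch that materializes those pairs for the sample data (the name `pairs` is only for illustration):

```python
import itertools as it

data = [
    '***', '**', '.param',
    '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs',
]

# Each group is a lazy iterator, so turn it into a list right away.
pairs = [
    (key, list(group))
    for key, group in it.groupby(data, lambda s: s.startswith('+'))
]

for key, group in pairs:
    print(key, group)
```

The keys come out in the order False, True, False, True, which is why filtering on the key leaves exactly the two '+' blocks.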
Response to comment:
where exactly does python go through data, like how does it know s is
the data it's iterating over?
groupby() works similarly to do_stuff() below:
def do_stuff(items, func):
    for item in items:
        print(func(item))
# Create the arguments for do_stuff():
data = [1, 2, 3]
def my_func(x):
    return x + 100
# Call do_stuff() with the proper argument types:
do_stuff(data, my_func)  # Just like when calling groupby(), you provide some data
                         # and a function that you want applied to each item in data
--output:--
101
102
103
It can also be written like this:
do_stuff(data, lambda x: x + 100)
lambda creates an anonymous function, which is handy for simple functions that don't need to be referred to by name.
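To make the equivalence concrete, here is a short sketch that applies the same test once through a named function and once through a lambda. The names `starts_with_plus`, `lines`, `named` and `anonymous` are made up for this illustration:

```python
def starts_with_plus(s):
    """Named version of the test used as the groupby() key."""
    return s.startswith('+')

lines = ['.param', '+foo = bar', '.model', '+spam = eggs']

# map() calls the given function once per item, just like groupby() does.
named = list(map(starts_with_plus, lines))
anonymous = list(map(lambda s: s.startswith('+'), lines))

print(named)
print(anonymous)
```

Both lists are identical; the only difference is whether the function has a name.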
This list comprehension:
[
    list(group)
    for key, group in it.groupby(data, lambda s: s.startswith('+'))
    if key
]
is equivalent to:
results = []
for key, group in it.groupby(data, lambda s: s.startswith('+')):
    if key:
        results.append(list(group))
Writing the for loop explicitly is clearer, but the list comprehension usually runs a little faster. Here are some details:
[
    list(group)  # The item you want to be in the results list for the current iteration of the loop
    for key, group in it.groupby(data, lambda s: s.startswith('+'))  # A for loop
    if key  # Only include the item for the current loop iteration in the results list if key is True
]
What you really want is a simple task that can be done with list slicing and a list comprehension:
data = ['**','***','.param','+foo = bar','+foofoo = barbar','+foofoofoo = barbarbar',
        '.model','+spam = eggs','+spamspam = eggseggs','+spamspamspam = eggseggseggs']
# First get the interesting positions.
param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')
# Get all elements between tags.
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# Slice to the end of the list; an end index of -1 would drop the last item.
models = [model for model in data[model_tag_pos + 1:] if model.startswith('+')]
print(params)
print(models)
Output
>>> ['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
>>> ['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
Response to comment:
Suppose you have a list containing the numbers from 0 to 5.
l = [0, 1, 2, 3, 4, 5]
Then, using list slices, you can select a subset of l:
another = l[2:5] # another is [2, 3, 4]
That is what we are doing here:
data[param_tag_pos + 1: model_tag_pos]
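With the sample data, the slice boundaries work out as in the sketch below. The variable names `between`, `after` and `all_but_last` are introduced here only to label the three slices:

```python
data = ['**', '***', '.param', '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
        '.model', '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

param_tag_pos = data.index('.param')  # 2
model_tag_pos = data.index('.model')  # 6

# The start index is included, the end index is excluded.
between = data[param_tag_pos + 1: model_tag_pos]   # items 3, 4, 5
after = data[model_tag_pos + 1:]                   # items 7 through the end
all_but_last = data[model_tag_pos + 1: -1]         # -1 stops before the last item

print(between)
print(after)
print(all_but_last)
```

Leaving the end index empty runs the slice to the end of the list, whereas an end index of -1 leaves out the final element.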
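Regarding your last question: "...how does Python know that param is the line in data it should iterate over, and what exactly does the first param do?"
Python doesn't know; you have to tell it.
First, param is just a variable name I used here; it could be x, list_items, or anything you want.
I'll translate this line of code into plain English for you: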
# Pythonian
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# English
params is a list of "things", for each "thing" we can see in the list `data`
from position `param_tag_pos + 1` to position `model_tag_pos`, just if that "thing" starts with the character '+'.
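I suggest doing things step by step.
1) Grab each word from the array individually.
2) Grab the first letter of the word.
3) Check whether it is '+' or '.'.
Example code: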
class Dark():
    def __init__(self):
        # Array
        x = ['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs']
        xPlus = []
        xDot = []
        # Values
        i = 0
        # Look through every word in the array one by one.
        while (i != len(x)):
            # Grab every word (s), and convert to string (y).
            s = x[i:i+1]
            y = '\n'.join(s)
            # Print word
            print(y)
            # Grab the first letter.
            letter = y[:1]
            if (letter == '+'):
                xPlus.append(y)
            elif (letter == '.'):
                xDot.append(y)
            else:
                pass
            # Add +1
            i = i + 1
        # Print lists
        print(xPlus)
        print(xDot)
# Run class
Dark()
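The same three steps can be written more compactly with a plain for loop over the items. This is only a sketch; the function name `split_by_prefix` and the variable names are made up for illustration:

```python
def split_by_prefix(words):
    """Return (plus_words, dot_words) based on each word's first character."""
    plus_words = []
    dot_words = []
    for word in words:        # 1) take each word in turn
        first = word[:1]      # 2) first character ('' for an empty string)
        if first == '+':      # 3) sort the word by that character
            plus_words.append(word)
        elif first == '.':
            dot_words.append(word)
    return plus_words, dot_words

plus_words, dot_words = split_by_prefix(['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs'])
print(plus_words)
print(dot_words)
```

Iterating directly over the list removes the need for an index counter and for joining a one-element slice back into a string.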