Split list based on first character - Python

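I am new to Python and can't quite find a solution to my problem. I want to split a list into two lists, based on what each list item starts with. My list looks like this; each line represents one item (yes, this is not correct list notation, but I'll leave it like this for a better overview):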

***
**
.param
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
.model
+spam = eggs
+spamspam = eggseggs
+spamspamspam = eggseggseggs

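So I want a list that contains all lines starting with a '+' between .param and .model, and another list that contains all lines starting with a '+' after .model until the end.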

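I have looked at enumerate() and split(), but since I have a list and not a string, and I'm not trying to match every item in the list, I'm not sure how to apply them. What I have is this: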

paramList = []
for line in newContent:
    while line.startswith('+'):
        paramList.append(line)
        if line.startswith('.'):
            break

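This is just my attempt at creating the first list. The problem is that the code also reads the second block of '+' lines, because break only exits the while loop, not the for loop. I hope you can understand my problem, and thanks in advance for any pointers!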
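One way to repair the loop above, as a minimal sketch: drop the inner while loop (it never advances to the next line, so it cannot terminate on a '+' line) and instead remember which section the for loop is currently in. The names current, param_list and model_list are illustrative, and new_content stands in for the newContent list from the question.

```python
# Sample data matching the question; stands in for newContent.
new_content = [
    '***', '**', '.param',
    '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs',
]

param_list = []
model_list = []
current = None  # the list that '+' lines are currently collected into

for line in new_content:
    if line == '.param':
        current = param_list
    elif line == '.model':
        current = model_list
    elif line.startswith('.'):
        current = None  # some other section: stop collecting
    elif line.startswith('+') and current is not None:
        current.append(line)

print(param_list)
print(model_list)
```

The check is written as current is not None rather than a plain truth test because an empty list is falsy, so the first '+' line of a section would otherwise be skipped.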

data = {}
for line in newContent:
    if line.startswith('.'):
        cur_dict = {}
        data[line[1:]] = cur_dict
    elif line.startswith('+'):
        key, value = line[1:].split(' = ', 1)
        cur_dict[key] = value

This will create a dict of dicts:

{'model': {'spam': 'eggs',
           'spamspam': 'eggseggs',
           'spamspamspam': 'eggseggseggs'},
 'param': {'foo': 'bar',
           'foofoo': 'barbar',
           'foofoofoo': 'barbarbar'}}

I am new to Python

Whoops. Don't bother with my answer then.

I want a list that contains all lines starting with a '+' between .param and .model and another list that contains all lines starting with a '+' after model until the end.

import itertools as it
import pprint

data = [
    '***',
    '**',
    '.param',
    '+foo = bar',
    '+foofoo = barbar',
    '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs',
    '+spamspam = eggseggs',
    '+spamspamspam = eggseggseggs',
]

results = [
    list(group) for key, group in it.groupby(data, lambda s: s.startswith('+'))
    if key
]


pprint.pprint(results)
print('-' * 20)
print(results[0])
print('-' * 20)
pprint.pprint(results[1])

--output:--
[['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar'],
 ['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']]
--------------------
['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
--------------------
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

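This thing here:

it.groupby(data, lambda x: x.startswith('+'))

...tells python to create groups based on the first character of the strings. If the first character is a '+', then the string gets put into a True group. If the first character is not a '+', then the string gets put into a False group. However, there are more than two groups, because each run of consecutive False strings forms its own group, and each run of consecutive True strings forms its own group.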

Based on your data, the first three strings:

***
**
.param

will create one False group. Then, the next strings:

+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar

will create one True group. Then the next string:

'.model'

will create another False group. Then the next strings:

'+spam = eggs'
'+spamspam = eggseggs'
'+spamspamspam = eggseggseggs'

will create another True group. The result will be something like:

{
    False: [strs here],
    True:  [strs here],
    False: [strs here],
    True:  [strs here]
}

Then it's just a matter of picking out each True group: if key, and then converting the corresponding group to a list: list(group).
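The dict-like picture above is only an illustration, since a real dict cannot hold the same key twice. To look at the actual (key, group) pairs that groupby() yields, something along these lines should work; the name pairs is just for this sketch.

```python
import itertools as it

data = [
    '***', '**', '.param',
    '+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs',
]

# Materialize each group right away: a group iterator is only valid
# until groupby() advances to the next group.
pairs = [
    (key, list(group))
    for key, group in it.groupby(data, lambda s: s.startswith('+'))
]

for key, group in pairs:
    print(key, group)
```

The keys come out in the order False, True, False, True, one per run of consecutive strings.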

Response to comment:

where exactly does python go through data, like how does it know s is the data it's iterating over?

groupby() works sort of like do_stuff() below:

def do_stuff(items, func):
    for item in items:
        print(func(item))


#Create the arguments for do_stuff():

data = [1, 2, 3]

def my_func(x):
    return  x + 100 

#Call do_stuff() with the proper argument types:

do_stuff(data, my_func) #Just like when calling groupby(), you provide some data 
                        #and a function that you want applied to each item in data

--output:--
101
102
103

You can also write it like this:

do_stuff(data, lambda x: x + 100)

lambda creates an anonymous function, which is convenient for simple functions that you don't need to refer to by name.

This list comprehension:

[ 
    list(group) 
    for key, group in it.groupby(data, lambda s: s.startswith('+')) 
    if key 
]

is equivalent to:

results = []

for key, group in it.groupby(data, lambda s: s.startswith('+') ):
   if key:
       results.append(list(group))

Writing out the for loop explicitly is clearer, but the list comprehension usually executes a bit faster. Here are some details:

[ 
    list(group)  #The item you want to be in the results list for the current iteration of the loop here:
    for key, group in it.groupby(data, lambda s: s.startswith('+')) #A for loop
    if key #Only include the item for the current loop iteration in the results list if key is True
]

What you really want is a simple task that can be accomplished with list slicing and a list comprehension:

data = ['**','***','.param','+foo = bar','+foofoo = barbar','+foofoofoo = barbarbar',
     '.model','+spam = eggs','+spamspam = eggseggs','+spamspamspam = eggseggseggs']

# First get the interesting positions.
param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')
# Get all elements between tags.
params =  [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
models =  [model for model in data[model_tag_pos + 1:] if model.startswith('+')]

print(params)
print(models)

Output

>>> ['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
>>> ['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

Response to comment:

Suppose you have a list containing the numbers from 0 to 5.

l = [0, 1, 2, 3, 4, 5]

Then, using list slices, you can select a subset of l:

another = l[2:5]   # another is [2, 3, 4]

That's what we are doing here:

data[param_tag_pos + 1: model_tag_pos]
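A few more slices on the same list, as a quick sketch of how the end index behaves: it is exclusive, leaving it out runs to the end of the list, and -1 stops just before the last item.

```python
l = [0, 1, 2, 3, 4, 5]

print(l[2:5])   # [2, 3, 4]     end index 5 is not included
print(l[2:])    # [2, 3, 4, 5]  no end index: up to the end of the list
print(l[2:-1])  # [2, 3, 4]     -1 stops before the last item
```

So to take everything after a tag up to the end of the list, the slice needs an open end, as in l[2:].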

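About your last question: ...how does python know param is the line of data it's supposed to iterate over, and what exactly does the first param do?

Python doesn't know; you have to tell it.

First of all, param is just the variable name I'm using here. It could be x, list_items, anything you want.

I'll translate this line of code into plain English for you: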

# Pythonian
params =  [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]

# English
params is a list of "things", for each "thing" we can see in the list `data` 
from position `param_tag_pos + 1` to position `model_tag_pos`, just if that "thing" starts with the character '+'.
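The same line can also be unrolled into an ordinary for loop, which may make the role of each param clearer. This is only an equivalent sketch using the data list from the answer; params_loop is a name made up for the comparison.

```python
data = ['**', '***', '.param', '+foo = bar', '+foofoo = barbar',
        '+foofoofoo = barbarbar', '.model', '+spam = eggs',
        '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')

# List comprehension, as in the answer.
params = [param for param in data[param_tag_pos + 1: model_tag_pos]
          if param.startswith('+')]

# Equivalent explicit loop.
params_loop = []
for param in data[param_tag_pos + 1: model_tag_pos]:  # loop variable
    if param.startswith('+'):                         # the filter
        params_loop.append(param)                     # the first "param"

print(params == params_loop)
```

The first param in the comprehension is the value that gets appended; the second one, after for, is the loop variable that takes each item of the slice in turn.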

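I suggest doing things step by step.
1) Grab each word from the array separately.
2) Grab the first letter of the word.
3) See whether that is a '+' or a '.'

Example code: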

class Dark():
    def __init__(self):
        # Array 
        x = ['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs']
        xPlus = []
        xDot = []
        # Values
        i = 0
        # Look through every word in the array one by one. 
        while (i != len(x)):
            # Grab every word (s), and convert to string (y).
            s = x[i:i+1]
            y = '\n'.join(s)
            # Print word
            print(y)
            # Grab the first letter.
            letter = y[:1]
            if (letter == '+'):
                xPlus.append(y)
            elif (letter == '.'):
                xDot.append(y)
            else:
                pass
            # Add +1
            i = i + 1
        # Print lists
        print(xPlus)
        print(xDot)

#Run class
Dark()
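The same three steps can also be written as a plain function that loops over the list directly, without the index counter or the class. This is a sketch only; split_by_prefix is a hypothetical helper name.

```python
def split_by_prefix(words):
    """Return (plus_words, dot_words) based on each word's first character."""
    plus_words, dot_words = [], []
    for word in words:               # 1) take each word in turn
        if word.startswith('+'):     # 2) and 3) test its first character
            plus_words.append(word)
        elif word.startswith('.'):
            dot_words.append(word)
    return plus_words, dot_words


x = ['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs']
x_plus, x_dot = split_by_prefix(x)
print(x_plus)
print(x_dot)
```

Note that, like the class above, this splits by first character only; it does not separate the '+' lines that follow .param from those that follow .model.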