How to apply multiple functions onto a single DataFrame column?
Suppose I have this df:
Name    Sequence
Bob     IN,IN
Marley  OUT,IN
Jack    IN,IN,OUT,IN
Harlow
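The df holds names and sequences of 'ins/outs'. The Sequence column can contain blank values. How can I efficiently apply these two functions to the Sequence column? Something like this pseudocode: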
df['Sequence'] = converter(sequencer(df['Sequence']))
# takes a string of IN/OUT, converts to bits, returns a bitstring. 'IN,OUT,IN' -> '101'
def sequencer(seq):
    # 'IN,IN' -> ['IN', 'IN']
    seq = seq.split(',')
    # map to bits, then keep up to 3 alternating digits. [0,0,1,1,0] -> '010'
    seq = [1 if x == 'IN' else 0 for x in seq]
    a = seq[0]
    try:
        # position of the first bit that differs from the leading one
        b = seq.index(1 - a, 1)
    except ValueError:
        return str(a)
    # slice (not a single element) so the membership test runs on a list
    if a not in seq[b + 1:]:
        return str(a) + str(1 - a)
    return str(a) + str(1 - a) + str(a)

# converts bitstring back into in/out format
def converter(seq):
    return '-'.join(['IN' if x == '1' else 'OUT' for x in seq])
to produce this DataFrame?
Name    Sequence
Bob     IN
Marley  OUT-IN
Jack    IN-OUT-IN
Harlow
I had a look here, and the comments said not to use apply because it is inefficient. I need efficiency because I am working with a large dataset.
itertools
- Use groupby to get the unique (consecutively non-repeating) items.
- Use islice to get the first 3.
from itertools import islice, groupby

def f(s):
    return '-'.join([k for k, _ in islice(groupby(s.split(',')), 3)])

df.assign(Sequence=[*map(f, df.Sequence.fillna(''))])
     Name   Sequence
0     Bob         IN
1  Marley     OUT-IN
2    Jack  IN-OUT-IN
3  Harlow
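To see what each of the two itertools pieces contributes, here is a minimal standalone sketch on a single string. The sample string and the variable names are made up for illustration and are not part of the answer above.

```python
from itertools import groupby, islice

# Illustrative input, longer than the rows in the question.
s = "IN,IN,OUT,OUT,IN,OUT"

# With no key function, groupby groups runs of equal consecutive items,
# so keeping only the keys collapses each run to a single value.
runs = [k for k, _ in groupby(s.split(","))]
# runs == ['IN', 'OUT', 'IN', 'OUT']

# islice stops after the first 3 groups and ignores the rest.
first_three = [k for k, _ in islice(groupby(s.split(",")), 3)]
# first_three == ['IN', 'OUT', 'IN']

# A blank cell filled with '' splits to [''], which joins back to ''.
blank = "-".join(k for k, _ in islice(groupby("".split(",")), 3))
# blank == ''
```

This is also why the fillna('') call matters: str.split is not defined on a missing value, while an empty string passes through the function unchanged.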
A variant with a closure, for maximum future flexibility:
from itertools import islice, groupby

def get_f(n, splitter=',', joiner='-'):
    def f(s):
        return joiner.join([k for k, _ in islice(groupby(s.split(splitter)), n)])
    return f

df.assign(Sequence=[*map(get_f(3), df.Sequence.fillna(''))])
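As a rough illustration of the flexibility the closure buys, the sketch below builds two differently configured functions from the same factory. The names first_two and semicolons, and the ';' and ' > ' separators, are hypothetical choices for this example only.

```python
from itertools import groupby, islice

def get_f(n, splitter=',', joiner='-'):
    # Same factory as above, repeated so the snippet runs on its own.
    def f(s):
        return joiner.join([k for k, _ in islice(groupby(s.split(splitter)), n)])
    return f

# Keep only the first 2 runs, with the default separators.
first_two = get_f(2)

# Keep 3 runs, reading ';'-separated input and joining with ' > '.
semicolons = get_f(3, splitter=';', joiner=' > ')

a = first_two('IN,IN,OUT,IN')        # 'IN-OUT'
b = semicolons('OUT;OUT;IN;OUT;IN')  # 'OUT > IN > OUT'
```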
Another variant that makes what I'm doing more obvious (less obnoxious Python bling):
from itertools import islice, groupby

def get_f(n, splitter=',', joiner='-'):
    def f(s):
        return joiner.join([k for k, _ in islice(groupby(s.split(splitter)), n)])
    return f

f = get_f(3)
df['Sequence-InOut'] = [f(s) for s in df.Sequence.fillna('')]
df
     Name      Sequence Sequence-InOut
0     Bob         IN,IN             IN
1  Marley        OUT,IN         OUT-IN
2    Jack  IN,IN,OUT,IN      IN-OUT-IN
3  Harlow          None
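As a sanity check, the sketch below compares the question's two-step route (sequencer, then converter) with the groupby route on the four sample values, using plain strings instead of a DataFrame. It assumes the membership test in sequencer was meant to be a slice (bits[b + 1:]), and the wrappers via_bits and via_groupby are hypothetical names. The blank-cell guard in via_bits is also an assumption, since sequencer on its own would turn an empty string into 'OUT'.

```python
from itertools import groupby, islice

def sequencer(seq):
    # 'IN,IN,OUT,IN' -> [1, 1, 0, 1] -> '101'
    bits = [1 if x == 'IN' else 0 for x in seq.split(',')]
    a = bits[0]
    try:
        b = bits.index(1 - a, 1)
    except ValueError:
        return str(a)
    if a not in bits[b + 1:]:  # assumed intent: test the remaining slice
        return str(a) + str(1 - a)
    return str(a) + str(1 - a) + str(a)

def converter(bits):
    return '-'.join('IN' if x == '1' else 'OUT' for x in bits)

def via_bits(s):
    # Hypothetical wrapper: blank cells are passed through unchanged.
    return converter(sequencer(s)) if s else ''

def via_groupby(s):
    return '-'.join(k for k, _ in islice(groupby(s.split(',')), 3))

samples = ['IN,IN', 'OUT,IN', 'IN,IN,OUT,IN', '']
results = [(via_bits(s), via_groupby(s)) for s in samples]
# Each pair holds the same string for these four sample values.
```

On these inputs both routes give 'IN', 'OUT-IN', 'IN-OUT-IN' and an empty string, so the single groupby function can stand in for the pair of functions here.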