Python pandas apply: problem with None in first row

I need some help. I wrote code that finds the mode of each group and replaces None with that mode. It does not work when None is in the first row:

import pandas as pd
df = pd.DataFrame([[16, None, 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])

dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()

df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

It raises this error:

TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-129-0f7009f92c25> in <module>
----> 1 df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    765         key = com._apply_if_callable(key, self)
    766         try:
--> 767             result = self.index.get_value(self, key)
    768 
    769             if not is_scalar(result):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   3116         try:
   3117             return self._engine.get_value(s, k,
-> 3118                                           tz=getattr(series.dtype, 'tz', None))
   3119         except KeyError as e1:
   3120             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'B'

But when there is no None in the first row, it works fine!

df = pd.DataFrame([[16, "y", 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])

dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()

df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]

0    y
1    v
2    v
3    z
4    a
5    a
Name: B, dtype: object

How can I fix this?
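
A likely cause, offered as an assumption rather than a confirmed diagnosis: the lambda returns a scalar for rows where B is missing and the whole row s otherwise, and apply with axis=1 appears to choose the shape of its result from the first row's return value. When the first row returns a scalar, the result is a Series indexed by row number, so the trailing ["B"] lookup raises KeyError. A minimal sketch of a row-wise repair in which both branches return a scalar (the names mode_by_group and filled_b are illustrative and not part of the original code):

```python
import pandas as pd

df = pd.DataFrame(
    [[16, None, 3], [17, None, 30], [10, "v", 30],
     [10, "z", 3], [None, "a", 23], [2, "a", 23]],
    columns=["A", "B", "C"],
)

# Most frequent non-missing B value per C group, as a plain dict {C: mode}.
# Like the original, .iat[0] assumes every group has at least one non-missing B.
mode_by_group = df.groupby("C")["B"].agg(lambda x: x.mode().iat[0]).to_dict()

# Both branches return a scalar, so apply builds a Series whatever the first row holds.
filled_b = df.apply(
    lambda row: mode_by_group[row["C"]] if pd.isna(row["B"]) else row["B"],
    axis=1,
)
print(filled_b.tolist())  # ['z', 'v', 'v', 'z', 'a', 'a']
```

Because the result is already a Series of B values, no trailing ["B"] lookup is needed.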

Use GroupBy.transform to get a Series the same size as the original, so that the Nones or NaNs can be replaced with Series.fillna.

Also, for a more general solution, add next with iter so that None is returned when mode returns an empty Series and iat[0] would fail:

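import pandas as pd

df = pd.DataFrame([[16, None, 3], [17, None, 30],
                   [10, "v", 30], [10, "z", 3],
                   [None, "a", 23], [2, "a", 23]],
                  columns=['A', 'B', 'C'])
# Per-row Series holding the mode of B within each C group (None if the group has no mode)
s = df.groupby('C')['B'].transform(lambda x: next(iter(x.mode()), None))
# Fill only the missing B values from the aligned group modes
df['B'] = df['B'].fillna(s)
print(df)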

      A  B   C
0  16.0  z   3
1  17.0  v  30
2  10.0  v  30
3  10.0  z   3
4   NaN  a  23
5   2.0  a  23
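
To illustrate why next with iter is the safer choice, here is a small sketch on a single made-up group whose values are all missing (the Series all_missing is a hypothetical example, not data from the question). Series.mode drops missing values by default, so it returns an empty Series for such a group:

```python
import pandas as pd

# Hypothetical group in which every value is missing.
all_missing = pd.Series([None, None], dtype=object)

modes = all_missing.mode()  # missing values are dropped, so nothing is left
print(len(modes))           # 0

# Positional access on the empty result fails.
try:
    modes.iat[0]
except IndexError as exc:
    print("iat[0] failed:", exc)

# next(iter(...), default) falls back to None instead of raising.
safe_mode = next(iter(modes), None)
print(safe_mode)            # None

# With at least one non-missing value, the first mode is returned as usual.
print(next(iter(pd.Series(["z", None]).mode()), None))  # z
```

A group that falls back to None simply leaves its missing values unfilled after fillna.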

@jezrael Thanks! This is what I ended up with:

import numpy as np
import pandas as pd

df = pd.DataFrame([[5, None, 23],
                   [17, "v", 30],
                   [10, "v", 3],
                   [10, "z", 23],
                   [None, "a", 23],
                   [2, "a", None]],
                  columns=['A', 'B', 'C'])

group_by_col = "C"
method_imput_num = "mean"
encoder_dict = {}
X = df.copy()

if not isinstance(X, pd.DataFrame):
    X = pd.DataFrame(X)
    X = X.astype(object).replace("None", np.nan)
    X = X.astype(object).replace("nan", np.nan)

# Build {column: {group value: fill value}} for every column except the grouping column
for col in X.loc[:, X.columns != group_by_col].columns:
    if X[col].dtype == "object":
        dict_col = X.groupby(group_by_col)[col].\
            agg(lambda x: next(iter(x.mode()), None)).to_frame().to_dict()
        encoder_dict.update(dict_col)
    else:
        if method_imput_num == "mean":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: x.mean()).to_frame().to_dict()
            encoder_dict.update(dict_col)
        elif method_imput_num == "median":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: x.median()).to_frame().to_dict()
            encoder_dict.update(dict_col)
        elif method_imput_num == "mode":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: next(iter(x.mode()), None)).to_frame().to_dict()
            encoder_dict.update(dict_col)
print(encoder_dict)

# Fill the missing values of each column with the value stored for the row's group
for col in X.loc[:, X.columns != group_by_col].columns:
    X[col] = X[col].fillna(X[group_by_col].map(encoder_dict[col]))