Python pandas apply: problem with None in first row
I need some help.
I wrote code that finds the mode of each group and replaces None with that mode.
It does not work when None is in the first row:
df = pd.DataFrame([[16, None, 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])
dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()
df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
This raises an error:
TypeError Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-129-0f7009f92c25> in <module>
----> 1 df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
765 key = com._apply_if_callable(key, self)
766 try:
--> 767 result = self.index.get_value(self, key)
768
769 if not is_scalar(result):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3116 try:
3117 return self._engine.get_value(s, k,
-> 3118 tz=getattr(series.dtype, 'tz', None))
3119 except KeyError as e1:
3120 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: 'B'
But when there is no None in the first row, it works fine!
df = pd.DataFrame([[16, "y", 3], [17, None, 30], [10, "v", 30], [10, "z", 3], [None, "a", 23], [2, "a", 23]], columns=['A', 'B', 'C'])
dict_group = df.groupby('C')['B'].agg(lambda x: pd.Series.mode(x).iat[0]).to_frame().to_dict()
df.apply(lambda s: dict_group["B"][s["C"]] if ((s["B"]==None) | (pd.isnull(s["B"])==True)) else s, axis=1)["B"]
0 y
1 v
2 v
3 z
4 a
5 a
Name: B, dtype: object
How can I fix this?
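A likely explanation for the difference, offered as a reading of the code rather than a confirmed diagnosis: the lambda returns a scalar (the group mode) for rows where B is missing, but the whole row s for every other row. DataFrame.apply with axis=1 seems to choose between building a DataFrame and a Series from what the first row returns, so a scalar in row 0 gives a Series indexed 0..5, and ["B"] then raises KeyError. A minimal sketch that keeps the apply approach but returns a scalar for every row; the names group_mode and filled_b are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[16, None, 3], [17, None, 30], [10, "v", 30],
                   [10, "z", 3], [None, "a", 23], [2, "a", 23]],
                  columns=['A', 'B', 'C'])

# Mode of B within each group of C, as a plain {group: mode} dict.
group_mode = df.groupby('C')['B'].agg(lambda x: x.mode().iat[0]).to_dict()

# Return a scalar in both branches, so the result is always a Series
# and no trailing ["B"] lookup is needed.
filled_b = df.apply(
    lambda row: group_mode[row['C']] if pd.isnull(row['B']) else row['B'],
    axis=1,
)
print(filled_b)
```

Row-wise apply is slow on large frames, so this is only meant to show where the original went wrong.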
Use GroupBy.transform to get a Series with the same size as the original, so the None or NaN values can be replaced with Series.fillna.
Also, for a more general solution, add next and iter so that None is returned when mode returns an empty Series and iat[0] would fail:
df = pd.DataFrame([[16, None, 3], [17, None, 30],
[10, "v", 30], [10, "z", 3],
[None, "a", 23], [2, "a", 23]],
columns=['A', 'B', 'C'])
s = df.groupby('C')['B'].transform(lambda x: next(iter(x.mode()), None))
df['B'] = df['B'].fillna(s)
print (df)
A B C
0 16.0 z 3
1 17.0 v 30
2 10.0 v 30
3 10.0 z 3
4 NaN a 23
5 2.0 a 23
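To make the point about next and iter concrete, here is a small sketch of what happens when a group contains only missing values. The helper name first_mode_or_none and the two sample Series are assumptions made up for this illustration:

```python
import pandas as pd

def first_mode_or_none(values):
    """Return the first mode of a Series, or None if mode() is empty."""
    return next(iter(values.mode()), None)

with_values = pd.Series(["v", "v", "z", None])
all_missing = pd.Series([None, None], dtype=object)

mode_with_values = first_mode_or_none(with_values)
mode_all_missing = first_mode_or_none(all_missing)

# mode() ignores missing values, so an all-missing group gives an
# empty Series and positional access with iat[0] raises IndexError.
try:
    all_missing.mode().iat[0]
    iat_failed = False
except IndexError:
    iat_failed = True

print(mode_with_values, mode_all_missing, iat_failed)
```

With iat[0] a single all-missing group is enough to abort the whole groupby call, whereas the next/iter version just leaves that group unfilled.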
@jezrael Thanks! This is how I did it:
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, None, 23],
                   [17, "v", 30],
                   [10, "v", 3],
                   [10, "z", 23],
                   [None, "a", 23],
                   [2, "a", None]],
                  columns=['A', 'B', 'C'])
group_by_col = "C"
method_imput_num = "mean"
encoder_dict = {}
X = df.copy()
if not isinstance(X, pd.DataFrame):
    X = pd.DataFrame(X)
# Treat the strings "None" and "nan" as missing values
X = X.astype(object).replace("None", np.nan)
X = X.astype(object).replace("nan", np.nan)
# Build a {column: {group: fill value}} lookup
for col in X.loc[:, X.columns != group_by_col].columns:
    if X[col].dtype == "object":
        dict_col = X.groupby(group_by_col)[col].\
            agg(lambda x: next(iter(x.mode()), None)).to_frame().to_dict()
        encoder_dict.update(dict_col)
    else:
        if method_imput_num == "mean":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: x.mean()).to_frame().to_dict()
            encoder_dict.update(dict_col)
        elif method_imput_num == "median":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: x.median()).to_frame().to_dict()
            encoder_dict.update(dict_col)
        elif method_imput_num == "mode":
            dict_col = X.groupby(group_by_col)[col].\
                agg(lambda x: next(iter(x.mode()), None)).to_frame().to_dict()
            encoder_dict.update(dict_col)
print(encoder_dict)
# Fill the missing values in each column from its group's value
for col in X.loc[:, X.columns != group_by_col].columns:
    X[col] = X[col].fillna(X[group_by_col].map(encoder_dict[col]))