Can dtype be used for assignment in order to set -99.0 for numeric columns and X for text columns?

Question

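I have 20+ columns of data. There must be a non-manual way to use the dtype to fill blanks with -99.0 in numeric columns (the software I use recognizes -99.0 as numeric missing) and with X in text columns (the software recognizes X as text missing). I searched and only found approaches that spell out every column name by hand. That would be fine if the column names never changed, but from project to project I will not always have the same columns or column names, so it means repeated work. I am trying to automate this. Here is a small example: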

ID	Project	From	To	Value1	Value2
1	AAA	0	10	15	0.578
1	AAA	10	20	7.6
2		0	100	14	0.777
2		100	200	6.5
1	ABA	0	5	22.7	0.431
1	BBB	15	20	0.8	17.4
2		0	10		1.200
2	BBB	10	20	6.9	200.8

I know I can do this, but it only covers the numeric case:

result.fillna(0, inplace=True)

I could also try this, substituting -99.0 for the 0:

dataframe[list_of_columns].replace(r'\s+', 0, regex=True)

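But that is very manual, and I want this automated because I have many projects and want to save time. It also only handles numeric columns, not text columns.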

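I found this, but I could not get it to convert the text blanks to 'X'. I assume the solution would be something similar, where I store list_of_columns and then run a for loop?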

def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        # Turn whitespace-only cells into NaN, then fill every NaN with 0
        dataframe[column] = dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column] = dataframe[column].fillna(0)
    return dataframe

In the end I would like it to look like this:

ID	Project	From	To	Value1	Value2
1	AAA	0	10	15	0.578
1	AAA	10	20	7.6	-99.0
2	X	0	100	14	0.777
2	X	100	200	6.5	-99.0
1	ABA	0	5	22.7	0.431
1	BBB	15	20	0.8	17.4
2	X	0	10	-99.0	1.200
2	BBB	10	20	6.9	200.8

Thanks in advance!

Answer 1

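If your columns have the proper dtypes, you can use DataFrame.select_dtypes. Select the numeric columns and fill them with -99, then exclude the numeric columns and fill the rest with 'X'. Finally, concatenate the two results and reindex (if you care about column order).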

import pandas as pd
import numpy as np

df = (pd.concat([df.select_dtypes(include=np.number).fillna(-99),
                 df.select_dtypes(exclude=np.number).fillna('X')], axis=1)
        .reindex(df.columns, axis=1))

   ID Project  From   To  Value1   Value2
0   1     AAA     0   10    15.0    0.578
1   1     AAA    10   20     7.6  -99.000
2   2       X     0  100    14.0    0.777
3   2       X   100  200     6.5  -99.000
4   1     ABA     0    5    22.7    0.431
5   1     BBB    15   20     0.8   17.400
6   2       X     0   10   -99.0    1.200
7   2     BBB    10   20     6.9  200.800
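For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It assumes the blanks have already been read in as NaN/None and rebuilds the question's sample by hand; the variable name filled is just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks are represented as None / NaN
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 1, 1, 2, 2],
    "Project": ["AAA", "AAA", None, None, "ABA", "BBB", None, "BBB"],
    "From": [0, 10, 0, 100, 0, 15, 0, 10],
    "To": [10, 20, 100, 200, 5, 20, 10, 20],
    "Value1": [15, 7.6, 14, 6.5, 22.7, 0.8, np.nan, 6.9],
    "Value2": [0.578, np.nan, 0.777, np.nan, 0.431, 17.4, 1.2, 200.8],
})

# Fill numeric and non-numeric columns separately, then restore column order
filled = (pd.concat([df.select_dtypes(include=np.number).fillna(-99),
                     df.select_dtypes(exclude=np.number).fillna("X")], axis=1)
            .reindex(df.columns, axis=1))

print(filled)
```

Note that a numeric column whose blanks came in as empty strings would have object dtype and end up on the 'X' side, so it is worth checking df.dtypes first.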

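Another valid option is to use select_dtypes only to get the column labels and then fill those columns directly. Since we only care about the labels, and a column always has a single dtype, we can call it on .head(1). It turns out that because df.select_dtypes returns a slice of the DataFrame, it becomes slow for larger DataFrames, but here we only need a single row.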

num_cols = df.head(1).select_dtypes(include=np.number).columns
oth_cols = df.head(1).select_dtypes(exclude=np.number).columns

df[num_cols] = df[num_cols].fillna(-99)
df[oth_cols] = df[oth_cols].fillna('X')
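Since the question asks about a for loop, here is a hedged sketch of a per-column variant that can be reused across projects. It is not part of the approaches above: the function name fill_missing_by_dtype and its parameters are made up for illustration, it relies on pandas.api.types.is_numeric_dtype, and it assumes unique column names and blanks that are already NaN/None.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_missing_by_dtype(frame, num_fill=-99.0, text_fill="X"):
    """Return a copy with NaN replaced by num_fill in numeric columns
    and by text_fill in all other columns (hypothetical helper)."""
    out = frame.copy()
    for col in out.columns:
        # Pick the fill value from the column's dtype, not its name
        fill = num_fill if is_numeric_dtype(out[col]) else text_fill
        out[col] = out[col].fillna(fill)
    return out


# Small usage example with one numeric and one text column
sample = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})
result = fill_missing_by_dtype(sample)
print(result)
```

Working on a copy leaves the input untouched, which makes the helper safe to call from a pipeline. Unlike select_dtypes(include=np.number), is_numeric_dtype also counts boolean columns as numeric.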

python

missing-data

pandas

dtype