Pandas dataframe amount value with dollar symbol

Question

I have a pandas dataframe with the following columns. Column_1 is string/text, not integer or decimal. A few rows hold plain strings such as names (see row 6).

S.No.  Column_1
1      256
2      1
3      $0.54672
4      756
5      $2.34333
6      Andrew

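I want to convert all values in Column_1 to numbers/int, except the dollar values and the rows with names. The dollar sign must be retained, but the amount should be rounded to two decimal places.

Expected output: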

S.No.  Column_1
1      256
2      1
3      $0.55
4      756
5      $2.34
6      Andrew

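I converted the whole column to numeric using pd.to_numeric() with errors='coerce', but the amount values became blank (null) because they cannot be parsed as numbers.

Any suggestions/help on this would be appreciated. Thank you.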
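The behaviour described above can be reproduced with a small frame. This is only a sketch: the sample values assume that the amount rows carry a leading $ (for example $0.54672), as the question text implies.

```python
import pandas as pd

# Sample data assumed from the question: amounts carry a leading "$".
df = pd.DataFrame({
    "S.No.": [1, 2, 3, 4, 5, 6],
    "Column_1": ["256", "1", "$0.54672", "756", "$2.34333", "Andrew"],
})

# errors="coerce" turns every value that cannot be parsed into NaN,
# so both the "$" amounts and the name are lost.
converted = pd.to_numeric(df["Column_1"], errors="coerce")
print(converted.isna().tolist())
```

The `$` rows and the name row all come back as missing, which is why a single pd.to_numeric() call over the whole column is not enough here.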

Answer 1

Filter the values that start with $ using Series.str.startswith, remove the $ with Series.str.strip, convert to numeric, round, convert back to strings and prepend $:

m = df['Column_1'].str.startswith('$', na=False)

s = '$' + df.loc[m, 'Column_1'].str.strip('$').astype(float).round(2).astype(str)

Or:

s = df.loc[m, 'Column_1'].str.strip('$').astype(float).round(2).astype(str).radd('$')

df.loc[m, 'Column_1'] = s


print (df)
   S.No. Column_1
0      1      256
1      2        1
2      3    $0.55
3      4      756
4      5    $2.34
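One caveat with round(2).astype(str) is that trailing zeros are dropped, so an amount such as $2.5 would come out as $2.5 rather than $2.50. If exactly two decimals are required, a format string is one option. A minimal sketch, where the sample values (including "$2.5") are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical sample values; "$2.5" is included to show the trailing-zero case.
df = pd.DataFrame({"Column_1": ["256", "$0.54672", "$2.5", "Andrew"]})

# True only for strings that start with "$"; na=False treats missing values as non-matches.
m = df["Column_1"].str.startswith("$", na=False)

# Strip the "$", parse as float, then format with exactly two decimals.
amounts = df.loc[m, "Column_1"].str.strip("$").astype(float)
df.loc[m, "Column_1"] = amounts.map("${:.2f}".format)

print(df["Column_1"].tolist())
```

The mask and the strip step are the same as in the answer; only the final formatting differs.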

Finally, if the non-matched values need to be converted to numeric, note that the column ends up with mixed data types: strings with $ and numbers without $:

df.loc[~m, 'Column_1'] = pd.to_numeric(df.loc[~m, 'Column_1'])
print (df)
   S.No.    Column_1
0      1         256
1      2           1
2      3    $0.54672
3      4         756
4      5    $2.34333

print (df['Column_1'].apply(type))
0    <class 'int'>
1    <class 'int'>
2    <class 'str'>
3    <class 'int'>
4    <class 'str'>
Name: Column_1, dtype: object

EDIT: For the last step, you can add errors='coerce' to convert non-numeric values to missing values and then replace those with the original values:

df.loc[~m, 'Column_1'] = pd.to_numeric(df.loc[~m, 'Column_1'], errors='coerce').fillna(df['Column_1'])
print (df)
   S.No. Column_1
0      1      256
1      2        1
2      3    $0.55
3      4      756
4      5    $2.34
5      6   Andrew

print (df['Column_1'].apply(type))

0    <class 'float'>
1    <class 'float'>
2      <class 'str'>
3    <class 'float'>
4      <class 'str'>
5      <class 'str'>
Name: Column_1, dtype: object
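The steps above can be combined into one function. This is a sketch rather than the answer's exact code: clean_column is a hypothetical helper name, the sample data is assumed from the question, and Series.where is used in place of fillna to put the original text back.

```python
import pandas as pd

def clean_column(col: pd.Series) -> pd.Series:
    """Hypothetical helper: round "$" amounts, convert plain numbers, keep other text."""
    out = col.astype(object).copy()
    m = out.str.startswith("$", na=False)
    # "$" amounts: strip the sign, round to two decimals, put the sign back.
    out.loc[m] = "$" + out.loc[m].str.strip("$").astype(float).round(2).astype(str)
    # Everything else: numeric where possible, original text where not.
    nums = pd.to_numeric(out.loc[~m], errors="coerce")
    out.loc[~m] = nums.astype(object).where(nums.notna(), out.loc[~m])
    return out

# Sample data assumed from the question.
df = pd.DataFrame({
    "S.No.": [1, 2, 3, 4, 5, 6],
    "Column_1": ["256", "1", "$0.54672", "756", "$2.34333", "Andrew"],
})
df["Column_1"] = clean_column(df["Column_1"])
print(df)
print(df["Column_1"].map(type))
```

As in the answer, the result is an object column holding floats and strings side by side, so later arithmetic on the column still needs to filter out the string rows first.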
