(Python) How to fix numerical representation error in dataframe column values

Question

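Just a (somewhat) quick question: if I have a dataframe with a column made up of numbers in the form 1.305.000, 4.65, 99.9, 443.111.34000, how can I convert them to the 'correct' format: 1305.000, 4.65, 99.9, 443111.34000?

If it helps, the values come from one of the columns in a .csv file, say 'Total Net Revenue':

In code-block form: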

Day Service Total Net Revenue
0   1   te  1.305.000
1   1   as  4.65
2   2   qw  99.9
3   3   al  443.111.34000
4   6   al  443.111.34000
5   6   te  1.305.000
6   7   pp  200
7   7   te  1.305.000
8   7   al  443.111.34000
9   7   te  1.305.000

And, based on feedback, in another form:

[{'Day': 1, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 1, 'Service': 'as', 'Total Net Revenue': '4.65'},
 {'Day': 2, 'Service': 'qw', 'Total Net Revenue': '99.9'},
 {'Day': 3, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 6, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 6, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 7, 'Service': 'pp', 'Total Net Revenue': '200'},
 {'Day': 7, 'Service': 'te', 'Total Net Revenue': '1.305.000'},
 {'Day': 7, 'Service': 'al', 'Total Net Revenue': '443.111.34000'},
 {'Day': 7, 'Service': 'te', 'Total Net Revenue': '1.305.000'}]

I can't seem to find any reference on this, so any insight would be appreciated. Thanks!

Answer 1

I would define a function to parse the number and then use apply on the dataframe column. For example:

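def parse_number(number):
    # Split on every dot; only the last dot is treated as the decimal point.
    split_number = number.split(".")
    if len(split_number) <= 1:
        return number  # no dot at all, nothing to fix
    # Glue the thousands groups together, then re-attach the decimal part.
    return ".".join(["".join(split_number[:-1]), split_number[-1]])

# The result is still a string; wrap it in float() if you need a numeric column.
df["parsed_value"] = df["Total Net Revenue"].apply(parse_number)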
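As a quick sanity check of the idea above, here is a minimal, self-contained sketch that runs the same rule over the sample values from the question without needing a dataframe. The `samples` list and the final `float()` step are my own additions for illustration, not part of the original answer.

```python
def parse_number(number: str) -> str:
    """Drop every dot except the last one, which is treated as the decimal point."""
    split_number = number.split(".")
    if len(split_number) <= 1:
        return number
    return "".join(split_number[:-1]) + "." + split_number[-1]


# Sample values copied from the question (illustrative list, not a dataframe).
samples = ["1.305.000", "4.65", "99.9", "443.111.34000", "200"]

parsed = [parse_number(s) for s in samples]
print(parsed)
# ['1305.000', '4.65', '99.9', '443111.34000', '200']

# The parsed values are still strings; float() turns them into numbers.
as_floats = [float(p) for p in parsed]
print(as_floats)
# [1305.0, 4.65, 99.9, 443111.34, 200.0]
```

Note that the rule assumes the last dot is always the decimal point, so a value such as "1.305" is read as 1.305 and not as 1305. Whether that is right depends on how the file was produced.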

Answer 2

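This isn't really a pandas question; it is actually asking how to convert odd-looking strings into numbers (tag: number-formatting).

The following function converts such strings into the desired numbers: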

import unittest


def cleanup(s: str) -> float:
    """Treat the last dot as the decimal point and drop all earlier dots."""
    parts = s.split('.')
    if len(parts) > 1:
        s = ''.join(parts[:-1]) + '.' + parts[-1]
    return float(s)


class TestCleanup(unittest.TestCase):

    def test_cleanup(self):
        self.assertEqual(200, cleanup('200'))
        self.assertEqual(4.65, cleanup('4.65'))
        self.assertEqual(1305, cleanup('1.305.000'))
        self.assertEqual(443111.34, cleanup('443.111.34000'))


if __name__ == '__main__':
    unittest.main()

If these are currency figures, you might consider using Decimal, which motivates the "scaled integer" approach.
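To make the Decimal suggestion a bit more concrete, here is a rough sketch of what that could look like. The names `cleanup_decimal` and `to_cents` are hypothetical, and the choice of two decimal places with half-up rounding is an assumption on my part; real currency data may call for a different scale or rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def cleanup_decimal(s: str) -> Decimal:
    """Like cleanup(), but returns an exact Decimal instead of a float."""
    # rpartition splits at the LAST dot; if there is no dot, `dot` is ''.
    whole, dot, frac = s.rpartition('.')
    if not dot:
        return Decimal(s)
    # Remove the remaining (thousands) dots from the integer part.
    return Decimal(whole.replace('.', '') + '.' + frac)


def to_cents(amount: Decimal) -> int:
    """Scaled-integer form: the amount in hundredths (assumes 2 decimal places)."""
    rounded = amount.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    return int(rounded * 100)


print(cleanup_decimal('443.111.34000'))            # 443111.34000
print(to_cents(cleanup_decimal('443.111.34000')))  # 44311134
print(to_cents(cleanup_decimal('4.65')))           # 465
print(to_cents(cleanup_decimal('200')))            # 20000
```

Storing the integer number of cents avoids binary floating-point rounding when the values are later summed or compared.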

It is a simple matter to .apply() the cleanup() function to an existing dataframe:

df['numeric_revenue'] = df['Total Net Revenue'].apply(cleanup)
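If you would rather avoid a Python-level function call per row, the same "keep only the last dot" rule can probably be expressed with pandas string methods. This is only a sketch: the small dataframe is rebuilt from the question's sample rows, and the regular expression is my own formulation of the rule, not something taken from the answers above.

```python
import pandas as pd

# A few rows from the question; the values are strings, as read from the CSV.
df = pd.DataFrame({
    'Day': [1, 1, 2, 3, 7],
    'Service': ['te', 'as', 'qw', 'al', 'pp'],
    'Total Net Revenue': ['1.305.000', '4.65', '99.9', '443.111.34000', '200'],
})

# Remove every dot that still has another dot somewhere after it,
# which leaves only the last dot (the decimal point) in place.
cleaned = df['Total Net Revenue'].str.replace(r'\.(?=.*\.)', '', regex=True)

# Convert the cleaned strings to a numeric column.
df['numeric_revenue'] = pd.to_numeric(cleaned)

print(df)
```

The lookahead `(?=.*\.)` is what distinguishes the thousands dots from the final decimal dot, so the result should match cleanup() on these inputs.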

Tags: python, number-formatting, dataframe, python-3.x, pandas