Method-chaining without permanently mutating the object

Question

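I'm learning how to write Python classes and how to chain methods. Basically, I want a Python (2.7) class that holds my data and has (chainable) methods that let me filter the data without mutating the original. I did some googling, and it looks like the answer may have something to do with return self, but I'm not sure how to implement it so that the methods don't mutate my original data.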

Suppose I have data stored in an Excel file called file, like this:

+--------+-----+-------+
| Person | Sex | Score |
+--------+-----+-------+
| A      | M   |    10 |
| B      | F   |     9 |
| C      | M   |     8 |
| D      | F   |     7 |
+--------+-----+-------+

I'd like to write a class called MyData so that I can do some basic data access and filtering.

This is what I have so far:

class MyData:
    def __init__ (self, file):
        import pandas as pd
        self.data = pd.read_excel (file)
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def male_only(self):
        self.data = self.data[self.Sex=="M"]
        self.Person = self.Person[self.Sex=="M"]
        self.Score = self.Score[self.Sex=="M"]
        self.Sex = self.Sex[self.Sex=="M"]
        return self

    def female_only(self):
        self.data = self.data[self.Sex=="F"]
        self.Person = self.Person[self.Sex=="F"]
        self.Score = self.Score[self.Sex=="F"]
        self.Sex = self.Sex[self.Sex=="F"]
        return self

This seems to work, but unfortunately my original data is permanently mutated by this code. For example:

Data = MyData(file)
Data.data
>>> Data.data
  Person Sex  Score
0      A   M     10
1      B   F      9
2      C   M      8
3      D   F      7

Data.male_only().data
>>> Data.male_only().data
  Person Sex  Score
0      A   M     10
2      C   M      8

Data.data
>>> Data.data
  Person Sex  Score
0      A   M     10
2      C   M      8

I'd like the class to return the same answer for Data.male_only().Person and Data.Person.male_only(), or for Data.male_only().data and Data.data.male_only(), without permanently mutating Data.data or Data.Person.

Answer 1

When you write self.data = ... in the first line of the method, you are explicitly modifying self.data. Instead, you can return a new instance of the data:

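    def male_only(self):
        # Requires `import copy` at the top of the module.
        # copy.copy() does not call __init__, which would need a file argument.
        newdata = copy.copy(self)
        mask = self.Sex == "M"
        newdata.data = self.data[mask]
        newdata.Person = self.Person[mask]
        newdata.Score = self.Score[mask]
        newdata.Sex = self.Sex[mask]
        return newdata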

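Based on your comments, here is a suggestion for a filter-based solution: have functions that activate some flags/filters, and then write functions that get the attributes: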

# self.filters should be initialized to [] in __init__
def male_only(self):
    self.filters.append('male_only')
    return self  # return self so that calls can be chained
def person(self):
    if "male_only" in self.filters:
        return self.Person[self.Sex=="M"]
    else:
        return self.Person

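To see whether this is feasible, you should really finish writing your test cases, which will help you pin down your ideas (it's always best practice to write the test cases first, then the class).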
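As a rough sketch of that test-first idea: the stand-in class below keeps its rows as plain tuples instead of a DataFrame, and stores the filters in a tuple so that each filter method returns a new object rather than appending to self.filters (which would still change the original). The names _with_filter and min_score are made up for illustration and are not part of the code above.

```python
class MyData(object):
    """Stand-in for the class in the question; rows are plain tuples."""

    def __init__(self, rows, filters=()):
        self._rows = tuple(rows)        # (person, sex, score) tuples
        self._filters = tuple(filters)  # predicates, applied lazily

    def _with_filter(self, predicate):
        # Build a new object instead of appending to self._filters.
        return MyData(self._rows, self._filters + (predicate,))

    def male_only(self):
        return self._with_filter(lambda row: row[1] == "M")

    def female_only(self):
        return self._with_filter(lambda row: row[1] == "F")

    def min_score(self, threshold):
        # Hypothetical extra filter, only here to show a longer chain.
        return self._with_filter(lambda row: row[2] >= threshold)

    def person(self):
        # Filters are only evaluated when the attribute is requested.
        return [row[0] for row in self._rows
                if all(f(row) for f in self._filters)]


data = MyData([("A", "M", 10), ("B", "F", 9), ("C", "M", 8), ("D", "F", 7)])

# Test cases describing the behaviour the question asks for.
assert data.male_only().person() == ["A", "C"]
assert data.male_only().min_score(9).person() == ["A"]
assert data.female_only().person() == ["B", "D"]
assert data.person() == ["A", "B", "C", "D"]  # the original is unchanged
```

Writing the assertions first makes it obvious which call orders you actually need to support before you commit to a design.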

Answer 2

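I'd like to elaborate on @Demi-Lune's answer. I don't think there is a way around creating an instance of MyData, modifying it, and returning it from your chainable methods. The whole reason this kind of thing works in the first place is that all of your chainable methods belong to the same class, and they return an instance of that class.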

For example, str.swapcase, str.zfill and str.replace are all methods of str, and they all return a str.

>>> string = "Hello World"
>>> string.swapcase().zfill(16).replace("L", "T")
'00000hETTO wORTD'
>>> string
'Hello World'
>>>

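What you are trying to do (Data.Person.male_only()) breaks this pattern, because it now implies that male_only is not a method of MyData but a method of the Person object. What is self.Person or self.data["Person"]? I'm not very familiar with Pandas. Is it a string? A list of strings? In any case, whatever it is, what you're trying to achieve would basically involve adding a new method called male_only to the class of that type.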
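For what it's worth, self.data["Person"] is a pandas Series. One way to "add a method to that type's class" on the DataFrame side is to subclass DataFrame, as in the sketch below. MyFrame is a made-up name, and this leans on the _constructor hook that pandas documents for subclassing, so treat it as an illustration rather than a recommendation.

```python
import pandas as pd


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that adds chainable filters."""

    @property
    def _constructor(self):
        # Tells pandas to build MyFrame (not DataFrame) for derived results.
        return MyFrame

    def male_only(self):
        # Boolean indexing returns a new frame; self is left untouched.
        return self[self["Sex"] == "M"]

    def female_only(self):
        return self[self["Sex"] == "F"]


frame = MyFrame({
    "Person": ["A", "B", "C", "D"],
    "Sex": ["M", "F", "M", "F"],
    "Score": [10, 9, 8, 7],
})

males = frame.male_only()
print(type(males).__name__)
print(males["Person"].tolist())
print(len(frame))
```

This gives you frame.male_only().Person, but frame.Person.male_only() still would not work: the Person column on its own is a Series that knows nothing about the Sex column.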

Answer 3

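I agree with @Demi-Lune.

I changed the OP's code so that male_only() and female_only() always return a copy of the object they belong to. I also changed __init__(), because I assume you don't want to call pd.read_csv() every time a new object is created. As a result, male_only() and female_only() always return a newly created object and do not affect any other object.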

import pandas as pd

# Added for creating file on memory.
import io
csv = '''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

class MyData:
    def __init__(self, file=None, data=None):
        # pandas is already imported at module level.
        if file is not None:
            self.data = pd.read_csv(file)
        else:
            self.data = data
        self.Person = self.data['Person']
        self.Sex = self.data['Sex']
        self.Score = self.data['Score']

    def copy_d(self):
        return MyData(data=self.data.copy())

    def male_only(self):
        d = self.copy_d()
        d.data = self.data[self.Sex=="M"]
        d.Person = self.Person[self.Sex=="M"]
        d.Score = self.Score[self.Sex=="M"]
        d.Sex = self.Sex[self.Sex=="M"]
        return d

    def female_only(self):
        d = self.copy_d()
        d.data = self.data[self.Sex=="F"]
        d.Person = self.Person[self.Sex=="F"]
        d.Score = self.Score[self.Sex=="F"]
        d.Sex = self.Sex[self.Sex=="F"]
        return d

d = MyData(file)
print(d.female_only().data)
#   Person Sex  Score
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6

print(d.male_only().data)
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3

print(d.data)
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6

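However, if all you do is wrap a pandas.DataFrame, another approach is to just use the bare pandas.DataFrame. First of all, in most cases a pandas.DataFrame object already exposes each column as an attribute with the same name (this holds when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method). So you don't actually need to define attributes like Person, Sex and Score, because they already exist on the DataFrame object.

For example: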

import pandas as pd
import numpy as np
df = pd.DataFrame(np.eye(3,3), columns=['Person', 'Sex', 'Score'])

# `df` already has these properties.
df.Person
df.Sex
df.Score
# In [986]: df.Person
# Out[986]: 
# 0    1.0
# 1    0.0
# 2    0.0
# Name: Person, dtype: float64

# In [987]: df.Sex
# Out[987]: 
# 0    0.0
# 1    1.0
# 2    0.0
# Name: Sex, dtype: float64

# In [988]: df.Score
# Out[988]: 
# 0    0.0
# 1    0.0
# 2    1.0
# Name: Score, dtype: float64
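One caveat is worth a quick sketch: attribute access only reaches a column when its name is a valid Python identifier and does not collide with something DataFrame already defines. The column names "count" and "Final Score" below are invented just to show the two failure cases.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["A", "B"],
    "count": [3, 5],         # collides with the DataFrame.count method
    "Final Score": [10, 9],  # not a valid Python identifier
})

print(df.Person.tolist())          # attribute access works for this column
print(callable(df.count))          # df.count is the method, not the column
print(df["count"].tolist())        # bracket access always reaches the column
print(df["Final Score"].tolist())  # the only way to reach this column
```

If your column names come from a spreadsheet you don't control, bracket access is the safer habit.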

So your male_only() and female_only() methods become plain functions that look like this.

import pandas as pd

# Added for creating file on memory.
import io
csv = '''Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6'''
file = io.StringIO(csv)

def male_only(df):
    return df[df.Sex=='M']

def female_only(df):
    return df[df.Sex=='F']

df = pd.read_csv(file)
male_only(df)
# In [1034]: male_only(df)
# Out[1037]: 
#   Person Sex  Score
# 0     p1   M      1
# 1     p2   M      2
# 2     p3   M      3

female_only(df)
# In [1038]: female_only(df)
# Out[1041]: 
#   Person Sex  Score
# 3     p4   F      4
# 4     p5   F      5
# 5     p6   F      6
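If you still want to write these calls as a left-to-right chain, DataFrame.pipe passes the frame to a plain function as its first argument. Here is a minimal sketch, assuming the same CSV data; min_score is a hypothetical extra filter added only to make the chain longer.

```python
import io

import pandas as pd

csv = u"""Person,Sex,Score
p1,M,1
p2,M,2
p3,M,3
p4,F,4
p5,F,5
p6,F,6"""


def male_only(df):
    return df[df.Sex == "M"]


def min_score(df, threshold):
    # Hypothetical helper: keep rows whose Score is at least `threshold`.
    return df[df.Score >= threshold]


df = pd.read_csv(io.StringIO(csv))

# Each step returns a new DataFrame, so df itself is never modified.
result = df.pipe(male_only).pipe(min_score, 2)
print(result.Person.tolist())
print(len(df))
```

Because the filters are ordinary functions, they can be tested and reused without any wrapper class.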

Hope this helps.
