Classifying pandas columns according to range limits

Question

I have a DataFrame with several numeric columns whose values range from 1 to 5 or from 1 to 10.

I want to create two lists of these column names, like this:

names_1to5 = list of all columns in df with numbers ranging from 1 to 5


names_1to10 = list of all columns in df with numbers from 1 to 10

Example:

IP  track  batch  size  type
1    2      3     5      A
9    1      2     8      B
10   5      5     10     C

From the DataFrame above:

  names_1to5 = ['track', 'batch']
  names_1to10 = ['IP', 'size']

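I want a function that takes a DataFrame and performs the classification above only on the columns whose numbers fall within these ranges.

I know that if a column's max() is 5, it is a 1to5 column; the same idea applies when max() is 10.

What I have done so far: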

def test(df):
    list_1to5 = []
    list_1to10 = []
    
    for col in df:
        if df[col].max() == 5:
            list_1to5.append(col)
        else:
            list_1to10.append(col)
    return list_1to5, list_1to10

I tried the above, but it returns the following error message:

'>=' not supported between instances of 'float' and 'str'

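The columns have dtype 'object', which may be the cause. If so, how can I fix the function without converting these columns to float? There are many such columns, sometimes hundreds. If I run:

df['column'].max()

I get 10 or 5.

What is the best way to write this function?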

Answer 1

Use:

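import pandas as pd

string = """alpha IP  track  batch  size
A   1    2      3     5
B   9    1      2     8
C   10   5      5     10"""

# Parse the sample text into a header row and data rows (all values are strings)
temp = [x.split() for x in string.split('\n')]
cols = temp[0]
data = temp[1:]

def test(df):
    list_1to5 = []
    list_1to10 = []

    for col in df.columns:
        # Skip object (non-numeric) columns such as 'alpha'
        if df[col].dtype != 'O':
            if df[col].max() == 5:
                list_1to5.append(col)
            else:
                # Any other numeric column is treated as a 1-to-10 column
                list_1to10.append(col)
    return list_1to5, list_1to10

# Cast only the numeric columns to float; dtype=float in the constructor
# would also try to cast the non-numeric 'alpha' column
df = pd.DataFrame(data, columns=cols)
df = df.astype({c: float for c in cols[1:]})

print(test(df))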

Output:

(['track', 'batch'], ['IP', 'size'])
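
If the columns really are stored with dtype 'object', as the question suggests, one possible variation is to let pd.to_numeric decide which columns are numeric instead of relying on the dtype. The following is only a sketch under some assumptions: the function name classify_columns is made up for illustration, a column is skipped as soon as it contains any non-numeric or missing value, and the range is checked with both min() and max() rather than with max() alone.

```python
import pandas as pd

def classify_columns(df):
    """Split numeric-like columns into 1-to-5 and 1-to-10 groups (hypothetical helper)."""
    names_1to5, names_1to10 = [], []
    for col in df.columns:
        # Entries that cannot be parsed as numbers become NaN instead of raising
        values = pd.to_numeric(df[col], errors="coerce")
        # Assumption: skip any column with a non-numeric or missing value (e.g. 'type')
        if values.isna().any():
            continue
        if values.min() >= 1 and values.max() <= 5:
            names_1to5.append(col)
        elif values.min() >= 1 and values.max() <= 10:
            names_1to10.append(col)
    return names_1to5, names_1to10

# Sample data from the question, stored as strings so every column has dtype object
df = pd.DataFrame({
    "IP": ["1", "9", "10"],
    "track": ["2", "1", "5"],
    "batch": ["3", "2", "5"],
    "size": ["5", "8", "10"],
    "type": ["A", "B", "C"],
})

print(classify_columns(df))
```

The conversion happens on a temporary Series for each column, so the original DataFrame keeps its object columns unchanged. Columns whose values fall outside both ranges end up in neither list.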
