How to iterate over the rows of each column in a DataFrame

My current code runs and produces one plot when there is only one sensor, i.e. when col2 and col3 are removed from the sample data below, leaving a single column.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(baseline)
baseline_num = pd.DataFrame(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points



dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

Basically, instead of generating just one plot, I want this loop to iterate over every column:

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

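I have tried solutions from other questions, but none of them seems to solve this. So far I have tried converting things to lists and tuples several times and then indexing them like this: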

baseline_num[i,column] += d[i - x,column]

as well as

baseline_num[i][column] += d[i - x][column]

while iterating with a loop of the form

for column in columns

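However, no matter how I arrange the solution, I always get some KeyError, an error saying that integer or slice indices were expected, or some other error. See the pictures of the expected/possible output for one column on the actual data with different input parameters (the sensitivity value and off_require vary from case to case). One solution that did not work was the looping approach from this link: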

https://www.geeksforgeeks.org/iterating-over-rows-and-columns-in-pandas-dataframe/

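I also tried creating a loop with iteritems as the outer loop. That did not work either.

Below are links to possible graph outputs for various sensitivity values and windows on my actual data set, with only one column (i.e. I manually deleted the other columns and plotted just that one with the current program):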

sensitivity 1000, window 8

sensitivity 800, window 5

sensitivity 1500, window 5

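If I have left out anything that would help solve this, please let me know so I can correct it right away.

This picture shows the head of my original DataFrame: df.head

Have you tried this?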

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():
        print(df[colname][rownum])

The first loop iterates over all the columns, and the second loop iterates over all the rows of that column.
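On recent pandas versions the nested loop needs a small change: `iteritems()` was deprecated in pandas 1.5 and removed in 2.0, and `items()` is the equivalent that also works on older releases. A minimal sketch of the same two loops, using a small made-up frame (the values below are placeholders, not the sensor data from the question):

```python
import pandas as pd

# Hypothetical stand-in for the question's sensor data.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

visited = []
for colname, column in df.items():        # (column label, Series) pairs
    for rownum, value in column.items():  # (index label, scalar) pairs
        visited.append((colname, rownum, value))

print(visited[:3])
```

The outer loop yields each column as a Series, so the inner loop already has the cell value and does not need a second `df[colname][rownum]` lookup.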

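Edit:

Based on our conversation below, I think your baseline DataFrame and your df DataFrame do not have the same column names, because the way you create them does not match the way you access their elements.

My suggestion is to create the baseline DataFrame as a copy of df and edit the values in it from there.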
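To make the column mismatch concrete, here is a small sketch on made-up data. It assumes the baseline frames should start at zero, as in the question; the frame contents and the name `from_list` are placeholders for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the question's sensor data.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})
rows = len(df)

# Built from a flat list: the only column is labelled 0,
# so from_list["col1"] would raise a KeyError.
from_list = pd.DataFrame([0] * rows)

# Same index and column labels as df, filled with zeros; df is left untouched.
baseline_num = pd.DataFrame(0.0, index=df.index, columns=df.columns)
baseline = baseline_num.copy()

# Write single cells with .loc[row, column] instead of chained [col][row].
baseline_num.loc[1, "col2"] += df.loc[1, "col2"]

print(list(from_list.columns), list(baseline_num.columns))
```

Note that `pd.DataFrame(df)` does not necessarily make an independent copy, so writes to the new frame can show up in `df` as well; `df.copy()` or a freshly constructed frame like the one above avoids that.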

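Edit:

I have managed to get your code working for one pass of the loop, but then I run into an index error. I am not sure what your optimization is meant to do, but I think that is what causes it. Take a look.

It is this part: baseline_num[colname][rownum - 1]. On the second pass of the loop, I am guessing that because you compute rownum (0) - 1, you get index -1. You need to change it so that rownum is 1 on the first iteration, or something along those lines; I am not sure what you are trying to do there.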
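One possible way around the -1 lookup, sketched under assumptions: the counter is reset at the start of every column, so the look-back window can never reach before row 0. This is a simplified illustration, not the asker's full algorithm: it leaves out the sensitivity test (every row is treated as "off"), uses a made-up frame, and picks `off_require = 2` only to keep the demo short.

```python
import pandas as pd

# Hypothetical stand-in for the question's sensor data.
df = pd.DataFrame({"col1": [8.0, 16.0, 24.0, 32.0],
                   "col2": [1.0, 2.0, 3.0, 4.0]})

# With the default RangeIndex, [] on a Series looks up labels, and -1 is not a label.
try:
    df["col1"][-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

off_require = 2  # demo value, not the 8 used in the question
baseline = pd.DataFrame(0.0, index=df.index, columns=df.columns)

for colname, column in df.items():
    off_count = 0  # reset per column so the window never starts before row 0
    for rownum in range(len(column)):
        off_count += 1
        if off_count >= off_require:
            # Positional slice of the last off_require values, ending at rownum.
            window = column.iloc[rownum - off_require + 1 : rownum + 1]
            # Assumes the default RangeIndex, where labels equal positions.
            baseline.loc[rownum, colname] = window.sum() / off_require

print(baseline)
```

Using `.iloc` for the look-back keeps the arithmetic positional, while the guard on `off_count` guarantees the slice start is never negative.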

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(df)
baseline_num = pd.DataFrame(df)
#print(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    #print(colname)
    for rownum,rowitem in colitems.iteritems():
        #print(rownum)
        #display(baseline[colname][rownum])
        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points

        print(baseline[colname][rownum])


dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

My output is this:

-324.0
-238.0
-314.0
-276.0
-264.0
-306.0
-371.0
-806.0
638.0
-412.0

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    354                 try:
--> 355                     return self._range.index(new_key)
    356                 except ValueError as err:

ValueError: -1 is not in range


The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    355                     return self._range.index(new_key)
    356                 except ValueError as err:
--> 357                     raise KeyError(key) from err
    358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: -1

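I don't have enough reputation to comment, but here is what I was able to work out. Hope it helps!

While working out an answer I tried to use the to_list() function, which resulted in this error: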

AttributeError: 'DataFrame' object has no attribute 'to_list'

So I decided to work around that method and came up with this:

indexes = [x for x in df.index]

row_vals = []

for index in indexes:
    for val in df.loc[index].values:  # look up each row by its index label
        row_vals.append(val)

The row_vals object will contain all the values in row order.

If you only want the row values for a specific row or set of rows, you would do this:

indx_subset = [1, 2, 5, 6]  # example list of row index labels

row_vals = []

for indx in indx_subset:
    for val in df.loc[indx].values:
        row_vals.append(val)

row_vals will hold all the row values from the specified indices.
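The same flattening can also be done without explicit loops. The sketch below is an alternative to the loops above, not a replacement the original answer proposed; it uses a small made-up frame, and the names `subset_vals` and `col_vals` are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the question's sensor data.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# All values in row order: row 0 first, then row 1, and so on.
row_vals = df.to_numpy().ravel().tolist()

# Only the rows with index labels 0 and 2.
subset_vals = df.loc[[0, 2]].to_numpy().ravel().tolist()

# A single column is a Series, which does have to_list().
col_vals = df["col1"].to_list()

print(row_vals, subset_vals, col_vals)
```

`to_list()` is defined on Series and Index objects but not on DataFrame, which is why the call on the whole frame raised the AttributeError shown earlier.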