Finding data changes in a column of numbers with pandas
I have a large CSV table with data like this:
Loop_3_OP Loop_3_PV Line1_Cleaning Line2_Cleaning time date
59.17 29.63 0 0 18:00:33.239000 2015-11-01
59.17 29.63 0 0 18:00:34.231000 2015-11-01
Throughout the table, Line1_Cleaning and Line2_Cleaning switch between 0 and 1, like this:
59.17 29.63 0 0 18:06:22.343000 2015-11-01
59.17 29.63 1 0 18:06:34.565000 2015-11-01
59.17 29.63 1 0 18:06:34.565000 2015-11-01
59.17 29.63 1 0 18:06:35.918000 2015-11-01
59.17 29.63 1 0 18:06:35.918000 2015-11-01
59.17 29.63 0 0 18:06:35.929000 2015-11-01
I would like to select only the rows where a transition happens, for example:
59.17 29.63 1 0 18:06:34.565000 2015-11-01
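I can do this in conventional Python by iterating over the rows:
import csv

read = csv.reader(ifile)  # ifile: an already-open CSV file object
next(read)                # skip the header row
lastval = 0
for row in read:
    val = int(row[2])     # Line1_Cleaning
    if val > lastval:
        print(val, row[4], "L1 Start Clean")
    lastval = val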
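I was wondering whether someone could tell me if this can be done in pandas. I am working through Anaconda and IPython and would like to see whether it is feasible.
Regards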
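numpy.diff may be useful here: compute the difference along each column, and the positions where the difference != 0 give you the row indices. You can combine the differences of the two columns with a boolean OR, and don't forget to offset the indices by 1.
Something like: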
import numpy as np

diff1 = np.diff(table['Line1_Cleaning'])
diff2 = np.diff(...
diff = (diff1 != 0) | (diff2 != 0)
indices = np.arange(len(diff))[diff] + 1
changing_rows = table.iloc[indices]  # positional lookup; .ix was removed in pandas 1.0
(Completely untested.)
(Perhaps pandas also has a diff function/method, but I'm more familiar with numpy.)
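For anyone who wants to run the idea above, here is a minimal self-contained sketch. The sample frame is made up to mirror the rows in the question, the second difference is assumed to be taken over Line2_Cleaning, and np.flatnonzero is swapped in for the np.arange indexing purely for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's rows (flag columns and time only).
table = pd.DataFrame({
    "Line1_Cleaning": [0, 0, 0, 1, 1, 1, 1, 0],
    "Line2_Cleaning": [0, 0, 0, 0, 0, 0, 0, 0],
    "time": ["18:00:33.239000", "18:00:34.231000", "18:06:22.343000",
             "18:06:34.565000", "18:06:34.565000", "18:06:35.918000",
             "18:06:35.918000", "18:06:35.929000"],
})

# np.diff returns n-1 values: element i is row i+1 minus row i.
diff1 = np.diff(table["Line1_Cleaning"].to_numpy())
diff2 = np.diff(table["Line2_Cleaning"].to_numpy())  # assumption: the second column

# True wherever either column changed relative to the previous row.
changed = (diff1 != 0) | (diff2 != 0)

# Offset by 1 so each position points at the row *after* the change.
indices = np.flatnonzero(changed) + 1

changing_rows = table.iloc[indices]
print(changing_rows)
```

Note that != 0 picks up both the 0 to 1 and the 1 to 0 transition; comparing with > 0 instead should keep only the rows where cleaning starts.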
IIUC you can use diff:
In [16]:
df[df['Line1_Cleaning'].diff() > 0]
Out[16]:
Loop_3_OP Loop_3_PV Line1_Cleaning Line2_Cleaning time \
3 59.17 29.63 1 0 18:06:34.565000
date
3 2015-11-01
So this calls diff to subtract the previous row from each row, and keeps the rows where the difference is > 0.
The output of diff:
In [17]:
df['Line1_Cleaning'].diff()
Out[17]:
0 NaN
1 0
2 0
3 1
4 0
5 0
6 0
7 -1
Name: Line1_Cleaning, dtype: float64
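Since the question mentions both Line1_Cleaning and Line2_Cleaning, the same diff idea can probably be applied to both columns at once. A rough sketch, with made-up sample values and variable names (cols, delta, starts, stops) that are only for illustration:

```python
import pandas as pd

# Hypothetical sample; column names follow the question.
df = pd.DataFrame({
    "Line1_Cleaning": [0, 0, 0, 1, 1, 1, 1, 0],
    "Line2_Cleaning": [0, 1, 1, 0, 0, 0, 0, 0],
})

cols = ["Line1_Cleaning", "Line2_Cleaning"]

# Difference to the previous row, column by column; the first row is all NaN.
delta = df[cols].diff()

# Rows where at least one line switches on (difference of +1).
starts = df[(delta > 0).any(axis=1)]

# Rows where at least one line switches off (difference of -1).
stops = df[(delta < 0).any(axis=1)]

print(starts.index.tolist())
print(stops.index.tolist())
```

Comparisons against the NaN in the first row evaluate to False, so the first row is never reported as a transition.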
If I understand you correctly, you want to select the rows where the Line1_Cleaning value is 1. If so, you can do this:
df = df[df.Line1_Cleaning == 1]
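To see what this filter returns, here is a quick check on made-up values that mirror the Line1_Cleaning column from the question. It keeps every row recorded while cleaning is active, which may be more than the single transition row the question asks for.

```python
import pandas as pd

# Hypothetical sample mirroring the Line1_Cleaning values in the question.
df = pd.DataFrame({"Line1_Cleaning": [0, 0, 0, 1, 1, 1, 1, 0]})

# Keeps every row where the flag is set, not only the first one after a change.
active = df[df.Line1_Cleaning == 1]

print(active.index.tolist())
```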
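I think you are looking for the rows where the current Line1_Cleaning value is greater than the previous row's Line1_Cleaning value, and want to extract those rows, i.e. only the rows where Line1_Cleaning changes from 0 to 1.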
import pandas as pd
df = pd.read_csv(ifile)
final_df = df[df['Line1_Cleaning'] > df['Line1_Cleaning'].shift(1)]
print(final_df)
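One edge case that may matter: if the file already starts with Line1_Cleaning at 1, the comparison with shift(1) will not report that first row, because the shifted value there is NaN. A small sketch on made-up data, assuming a pandas version in which shift accepts fill_value:

```python
import pandas as pd

# Hypothetical sample in which the column already starts at 1.
df = pd.DataFrame({"Line1_Cleaning": [1, 1, 0, 0, 1, 1, 0]})
col = df["Line1_Cleaning"]

# A comparison against NaN is False, so row 0 is never reported.
plain = df[col > col.shift(1)]

# Treat the row "before" the first one as 0, so an initial 1 counts as a start.
with_fill = df[col > col.shift(1, fill_value=0)]

# diff() > 0 selects the same rows as the plain shift comparison.
via_diff = df[col.diff() > 0]

print(plain.index.tolist())
print(with_fill.index.tolist())
```

Whether the first row should count as a start depends on what the data means, so this is a choice rather than a fix.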