Counting elements based on placement in dataframe
Below, I have a table in which columns TST1 through TST5 can either be empty or hold one of the following values: NOT_DONE, INCOMP, UNTESTED, 30, 35, 40, 45, 50.
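I need to count the number of elements (rows) in the table below that are validated.
An element is considered valid when its rightmost value is a number between 30 and 50 (in steps of 5, so 30, 35, 40, ...). This means that a row with no value in any of TST1 through TST5 is not counted. A numeric value that sits to the left of a NOT_DONE, INCOMP, or UNTESTED entry is not validated either.
In other words, I need to read each row from right to left.
In the table below, for example, only 6 elements are considered valid.
Finally, I need to count how many of them belong to group A and how many to group B.
My initial idea was to create a new column containing all validated elements, but I'm really not sure how to do that.
I'm using Python 2.7 and pandas 0.24.2. I'm new to this, so any help or guidance is greatly appreciated.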
+-------+----------+----------+----------+--------+----------+
| Group | TST1 | TST2 | TST3 | TST4 | TST5 |
+-------+----------+----------+----------+--------+----------+
| A | | NOT_DONE | | | 50 |
+-------+----------+----------+----------+--------+----------+
| A | | | 35 | | |
+-------+----------+----------+----------+--------+----------+
| B | | | | | |
+-------+----------+----------+----------+--------+----------+
| A | | | INCOMP | | |
+-------+----------+----------+----------+--------+----------+
| B | UNTESTED | | 50 | INCOMP | |
+-------+----------+----------+----------+--------+----------+
| B | | | | | |
+-------+----------+----------+----------+--------+----------+
| B | | 30 | | | |
+-------+----------+----------+----------+--------+----------+
| A | | INCOMP | 40 | | |
+-------+----------+----------+----------+--------+----------+
| B | | | | | UNTESTED |
+-------+----------+----------+----------+--------+----------+
| A | | | | | |
+-------+----------+----------+----------+--------+----------+
| B | | INCOMP | | | |
+-------+----------+----------+----------+--------+----------+
| A | | | | | |
+-------+----------+----------+----------+--------+----------+
| B | | 50 | | | |
+-------+----------+----------+----------+--------+----------+
| B | | | UNTESTED | 35 | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
| B | | | | | |
+-------+----------+----------+----------+--------+----------+
| A | | 40 | | INCOMP | |
+-------+----------+----------+----------+--------+----------+
| A | | | | 30 | |
+-------+----------+----------+----------+--------+----------+
| B | | | | | |
+-------+----------+----------+----------+--------+----------+
| B | | NOT_DONE | | 30 | NOT_DONE |
+-------+----------+----------+----------+--------+----------+
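Edit:
Here is what I tried, but it counts every row that contains a numeric value anywhere, rather than only the rows whose rightmost value is numeric. I really don't know how to select starting from the right.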
filter1 = df.loc[:, 'TST1':'TST5']\
    .apply(lambda x: x.astype(str).str.match(r'\d+\.*\d*'), axis=0)\
    .any(axis=1)
number_validated = filter1.sum()
print "Number of validated items: ", number_validated
The expected output should just be a short text summary:
Number of validated items: 6
Number of group A validated items: 4
Number of group B validated items: 2
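Another option, tested on Python 2.7.18 and pandas 0.24.2 (it also works fine on Python 3):
Use ffill along the columns so that each row's rightmost non-missing value ends up in the last column, then use to_numeric to coerce those values to numbers (the status strings become NaN):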
import pandas as pd

rightmost = df.filter(like='TST').ffill(axis='columns').iloc[:, -1]
rightmost = pd.to_numeric(rightmost, errors='coerce')
# 0     50.0
# 1     35.0
# 2      NaN
# 3      NaN
# 4      NaN
# 5      NaN
# 6     30.0
# 7     40.0
# 8      NaN
# 9      NaN
# 10     NaN
# 11     NaN
# 12    50.0
# 13     NaN
# 14     NaN
# 15     NaN
# 16    30.0
# 17     NaN
# 18     NaN
# Name: TST5, dtype: float64
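The forward fill only skips cells that pandas treats as missing, so this step assumes the blank cells are NaN. The following is a minimal sketch on a hypothetical three-row frame (the `demo` data and all variable names are illustrative, not taken from the question), assuming the blanks arrive as empty strings instead:

```python
import numpy as np
import pandas as pd

# Hypothetical data: blank cells are empty strings rather than NaN.
demo = pd.DataFrame({
    "TST1": ["", "UNTESTED", ""],
    "TST2": ["NOT_DONE", "", ""],
    "TST3": ["", 50, ""],
    "TST4": [50, "INCOMP", ""],
})

# ffill only fills real missing values, so turn the blanks into NaN first.
cells = demo.replace("", np.nan)

# After a row-wise forward fill, the last column holds each row's
# rightmost non-missing value (or NaN when the whole row is empty).
rightmost_demo = cells.ffill(axis="columns").iloc[:, -1]

# Status strings cannot be parsed as numbers, so errors="coerce"
# maps them to NaN.
rightmost_num = pd.to_numeric(rightmost_demo, errors="coerce")
print(rightmost_num.tolist())
# [50.0, nan, nan]
```

Row 0 ends in 50, so it survives; row 1 ends in INCOMP, so the 50 to its left is discarded; row 2 is empty.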
Then groupby the Group column and count how many of the values are between 30 and 50 (inclusive):
valid = rightmost.groupby(df.Group).apply(
    lambda g: g.between(30, 50).sum()  # both bounds are included by default
).to_frame('Valid')
#        Valid
# Group
# A          4
# B          2
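For reference, here is a self-contained sketch that rebuilds the table exactly as printed in the question and produces the requested text summary. It assumes blank cells are NaN, and the `ROWS` literal, the `validated` column name, and the other variable names are illustrative choices rather than anything from the original post. It stores the result as a new boolean column, which is the approach the question first had in mind, and uses `isin` instead of `between` so that only the multiples of 5 are accepted.

```python
import pandas as pd

TESTS = ["TST1", "TST2", "TST3", "TST4", "TST5"]

# One (Group, {column: value}) pair per table row, top to bottom.
# Columns that are not listed are blank and become NaN.
ROWS = [
    ("A", {"TST2": "NOT_DONE", "TST5": 50}),
    ("A", {"TST3": 35}),
    ("B", {}),
    ("A", {"TST3": "INCOMP"}),
    ("B", {"TST1": "UNTESTED", "TST3": 50, "TST4": "INCOMP"}),
    ("B", {}),
    ("B", {"TST2": 30}),
    ("A", {"TST2": "INCOMP", "TST3": 40}),
    ("B", {"TST5": "UNTESTED"}),
    ("A", {}),
    ("B", {"TST2": "INCOMP"}),
    ("A", {}),
    ("B", {"TST2": 50}),
    ("B", {"TST3": "UNTESTED", "TST4": 35, "TST5": "NOT_DONE"}),
    ("B", {}),
    ("A", {"TST2": 40, "TST4": "INCOMP"}),
    ("A", {"TST4": 30}),
    ("B", {}),
    ("B", {"TST2": "NOT_DONE", "TST4": 30, "TST5": "NOT_DONE"}),
]

df = pd.DataFrame(
    [dict(cells, Group=group) for group, cells in ROWS],
    columns=["Group"] + TESTS,
)

# Rightmost non-missing value per row, coerced to a number (strings -> NaN).
rightmost = pd.to_numeric(
    df[TESTS].ffill(axis="columns").iloc[:, -1], errors="coerce"
)

# New boolean column: True when the rightmost entry is 30, 35, ..., 50.
df["validated"] = rightmost.isin(list(range(30, 55, 5)))

total = int(df["validated"].sum())
per_group = df.groupby("Group")["validated"].sum()

print("Number of validated items: {}".format(total))
for group in ["A", "B"]:
    print("Number of group {} validated items: {}".format(
        group, int(per_group[group])))
# Number of validated items: 6
# Number of group A validated items: 4
# Number of group B validated items: 2
```

On the table as printed, the validated rows are 0, 1, 6, 7, 12 and 16, which matches the six valid elements the question describes.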