Python 3, pandas.DataFrame: how to select and display data according to certain rules
I have a pandas.DataFrame and I want to select some of its rows according to certain rules.
The following code generates the DataFrame:
import datetime
import pandas as pd
import numpy as np
today = datetime.date.today()
dates = list()
for k in range(10):
    a_day = today - datetime.timedelta(days=k)
    dates.append(np.datetime64(a_day))
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(10, 3)),
                  columns=('other1', 'actual', 'other2'),
                  index=['{}'.format(i) for i in range(10)])
df.insert(0, 'dates', dates)
df['err_m'] = np.random.rand(10, 1)*0.1
df['std'] = np.random.rand(10, 1)*0.05
df['gain'] = np.random.rand(10, 1)
Now I want to select rows according to the following rules:
1. compute the sum of 'err_m' and 'std', then sort the df so that the sum is descending
2. from the result of step 1, select the part where 'actual' is > 50
Thanks.
Create a new column, then sort by that column:
df['errsum'] = df['err_m'] + df['std']
# Return a sorted DataFrame (DataFrame.sort was removed in pandas 0.20; use sort_values)
df_sorted = df.sort_values('errsum', ascending=False)
Select the rows you want:
# Create a boolean Series that is True where the condition is met
# (the question filters on 'actual', not on 'errsum')
selector = df_sorted['actual'] > 50
# Return the rows of df_sorted where the condition is met, still in sorted order
df_sorted[selector]
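The two steps can also be chained without adding a column to the original DataFrame. This is only a minimal sketch: the stand-in data below is hypothetical (it reuses the question's column names but not its exact values), and the name `result` is just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same column names as in the question
rng = np.random.default_rng(5)
df = pd.DataFrame({
    'actual': rng.integers(0, 100, size=10),
    'err_m': rng.random(10) * 0.1,
    'std': rng.random(10) * 0.05,
})

result = (
    df.assign(errsum=df['err_m'] + df['std'])    # step 1: add the sum as a new column
      .sort_values('errsum', ascending=False)    # step 1: sort by the sum, descending
      .loc[lambda d: d['actual'] > 50]           # step 2: keep rows where actual > 50
)
print(result)
```

Because `assign` returns a new DataFrame, `df` itself is left unchanged here; if the `errsum` column should persist, assigning it directly as in the answer above works just as well.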