How to create a derived column in a CSV file using Linux/Python?
I have a (sample) CSV file with the following columns:
PC_name,Time,Plant,Section,PC_value
35901052,2017-08-01 05:50,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 05:51,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 05:56,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 06:01,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 06:06,MIYAKONOJO,MIYAKONOJO_05,0.000
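I want a new column "New" derived from the "Time" column, as described below.
If the timestamp falls between 6pm (18:00) and 6am (06:00), the value should be "Night", otherwise "Day".
Sample output: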
PC_name,Time,Plant,Section,PC_value,New
35901052,2017-08-01 05:50,MIYAKONOJO,MIYAKONOJO_05,0.000,Night
35901052,2017-08-01 05:51,MIYAKONOJO,MIYAKONOJO_05,0.000,Night
35901052,2017-08-01 05:56,MIYAKONOJO,MIYAKONOJO_05,0.000,Night
35901052,2017-08-01 06:01,MIYAKONOJO,MIYAKONOJO_05,0.000,Day
35901052,2017-08-01 06:06,MIYAKONOJO,MIYAKONOJO_05,0.000,Day
If you can use pandas and numpy, use numpy.where and pandas.Series.dt.hour.
Do the following:
import numpy as np
import pandas as pd
df = pd.read_csv('filename.csv', parse_dates=['Time'])
df['New'] = np.where((df.Time.dt.hour > 5) & (df.Time.dt.hour < 18), 'Day', 'Night')
>>> df
    PC_name                 Time       Plant        Section  PC_value    New
0  35901052  2017-08-01 05:50:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
1  35901052  2017-08-01 05:51:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
2  35901052  2017-08-01 05:56:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
3  35901052  2017-08-01 06:01:00  MIYAKONOJO  MIYAKONOJO_05       0.0    Day
4  35901052  2017-08-01 06:06:00  MIYAKONOJO  MIYAKONOJO_05       0.0    Day
df.to_csv('New_filename.csv', index=False)
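For a quick check without touching any files, the same steps can be run end to end on an in-memory copy of the sample rows. This is only a sketch: the helper name add_day_night is made up for illustration, and the date_format argument assumes you want the Time column written back in its original minute-resolution form. Note that PC_value is parsed as a float, so it may not be written back with the same number of decimals.

```python
import io

import numpy as np
import pandas as pd

SAMPLE = """PC_name,Time,Plant,Section,PC_value
35901052,2017-08-01 05:50,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 05:51,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 05:56,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 06:01,MIYAKONOJO,MIYAKONOJO_05,0.000
35901052,2017-08-01 06:06,MIYAKONOJO,MIYAKONOJO_05,0.000
"""


def add_day_night(df):
    """Return a copy of df with a 'New' column: 'Day' for 06:00-17:59, else 'Night'."""
    hour = df["Time"].dt.hour
    out = df.copy()
    out["New"] = np.where((hour >= 6) & (hour < 18), "Day", "Night")
    return out


# Read the sample from memory instead of 'filename.csv'.
df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["Time"])
result = add_day_night(df)

# index=False drops the row index; date_format keeps the "YYYY-MM-DD HH:MM" form.
csv_text = result.to_csv(index=False, date_format="%Y-%m-%d %H:%M")
print(csv_text)
```

Passing a file path as the first argument to to_csv writes the same text to disk instead of returning it as a string.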
You can convert the series to datetime and extract the hour, then map each hour to a value:
df["Time"] = pd.to_datetime(df["Time"])
df["New"] = df["Time"].dt.hour.map({hour: "Night" if hour >= 18 or hour < 6 else "Day" for hour in range(24)})
Output:
>>> df
    PC_name                 Time       Plant        Section  PC_value    New
0  35901052  2017-08-01 05:50:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
1  35901052  2017-08-01 05:51:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
2  35901052  2017-08-01 05:56:00  MIYAKONOJO  MIYAKONOJO_05       0.0  Night
3  35901052  2017-08-01 06:01:00  MIYAKONOJO  MIYAKONOJO_05       0.0    Day
4  35901052  2017-08-01 06:06:00  MIYAKONOJO  MIYAKONOJO_05       0.0    Day
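If pandas is not available, the same rule can probably be applied with only the standard library. The following is a rough sketch rather than a drop-in tool: the function names label_for and add_column are hypothetical, and it assumes the Time column always uses the "YYYY-MM-DD HH:MM" format shown in the question. Because the rows are copied as text, values such as 0.000 are passed through unchanged.

```python
import csv
import io
from datetime import datetime


def label_for(timestamp):
    """Return 'Day' for 06:00-17:59 and 'Night' for 18:00-05:59."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    return "Day" if 6 <= hour < 18 else "Night"


def add_column(src, dst):
    """Copy CSV rows from src to dst, appending a 'New' column."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + ["New"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["New"] = label_for(row["Time"])
        writer.writerow(row)


# In-memory stand-ins for the input and output files.
sample = io.StringIO(
    "PC_name,Time,Plant,Section,PC_value\n"
    "35901052,2017-08-01 05:56,MIYAKONOJO,MIYAKONOJO_05,0.000\n"
    "35901052,2017-08-01 06:01,MIYAKONOJO,MIYAKONOJO_05,0.000\n"
)
out = io.StringIO()
add_column(sample, out)
print(out.getvalue())
```

With real files, open both with newline="" (as the csv module documentation recommends) and pass the file objects to add_column in place of the StringIO buffers.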