Creating new features with a certain percentile of price
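I am working on a forex classification problem and need help creating the features detailed below. I have shared my code below and attached an image as a visual reference for the problem.
Feature: opensimilarclose
(1 if the open price = close price plus or minus 2 pips, else 0)
Feature: opencloselow
(1 if open and close are both > 90% of the candle size, else 0)
Feature: openclosehigh
(1 if open and close are both < 10% of the candle size, else 0)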
MY CODE:
data['opensimilarclose'] = np.where(data.Open-data.Close<=0.02, 1,0)
data['openclosehigh'] = np.where((abs(data.Close-data.Low)>=abs(data.High-data.Low)*0.9 and ()), 1, 0)
data['opencloselow'] = np.where(abs(data.Close-data.Low)<=abs(data.High-data.Low)*0.1, 1, 0)
Please find a data sample below:
Date Timestamp Open High Low Close Volume
2004-01-01 00:00:00 414.92199999999997 414.92199999999997 414.23199999999997 414.55800000000005 0.738269000896253
2004-01-02 00:00:00 414.32199999999995 416.098 413.86699999999996 415.395 3.82642700810902
2004-01-04 00:00:00 414.278 414.69800000000004 414.096 414.444 0.0564850000591832
2004-01-05 00:00:00 415.376 423.981 414.23400000000004 421.89300000000003 10.4188560213806
2004-01-06 00:00:00 422.332 430.17800000000005 420.07800000000003 421.777 11.182643023759699
2004-01-07 00:00:00 420.773 424.121 418.974 419.626 11.956311026187901
2004-01-08 00:00:00 419.574 424.798 416.27 423.298 12.439296027514501
2004-01-09 00:00:00 423.298 426.897 419.42699999999996 425.404 9.2499640192309
2004-01-11 00:00:00 426.49800000000005 426.49800000000005 425.876 426.23 0.0673800002332428
2004-01-12 00:00:00 425.853 428.459 422.219 424.598 10.6995250192995
2004-01-13 00:00:00 424.598 426.395 421.651 423.69800000000004 11.1990780260712
2004-01-14 00:00:00 423.389 424.397 416.78 419.298 10.835633025399101
2004-01-15 00:00:00 418.98 421.098 406.906 408.44699999999995 12.266192030985598
2004-01-16 00:00:00 408.546 410.398 404.43300000000005 406.298 9.26100601695725
2004-01-18 00:00:00 405.842 406.098 405.543 405.75300000000004 0.0658050001220545
2004-01-19 00:00:00 407.18800000000005 408.68300000000005 405.402 406.751 5.688531011830491
2004-01-20 00:00:00 406.449 412.69699999999995 404.417 411.921 10.6885030245794
2004-01-21 00:00:00 411.99800000000005 412.91 406.721 409.832 10.672994028404
2004-01-22 00:00:00 410.043 412.69800000000004 407.216 409.033 9.949593026152801
2004-01-23 00:00:00 409.398 412.29699999999997 405.461 407.398 8.921345019130971
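There are a few small mistakes in your code:
- You only check whether Open - Close is at most 0.02 and forget to take the absolute value (with open=5 and close=8 the difference is -3, which still passes the <= 0.02 test).
- "openclosehigh" and "opencloselow" in your code differ from what you described: the code only considers the close price.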
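To make the first point concrete, here is a minimal sketch comparing the signed and the absolute difference. The two-row frame `demo` is made up purely for illustration and is not taken from the question's data.

```python
import pandas as pd

# Hypothetical two-row frame, chosen only to expose the sign problem.
demo = pd.DataFrame({"Open": [5.0, 8.0], "Close": [8.0, 5.0]})

# Signed difference: row 0 gives -3.0, which passes the <= 0.02 test.
signed = (demo["Open"] - demo["Close"] <= 0.02).astype(int)

# Absolute difference: both rows differ by 3.0, so neither is flagged.
absolute = ((demo["Open"] - demo["Close"]).abs() <= 0.02).astype(int)

print(signed.tolist())    # [1, 0]
print(absolute.tolist())  # [0, 0]
```

Without the absolute value, every candle that closes above its open is flagged as "similar", however large the move.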
I personally prefer using pandas directly instead of where, since where is not needed here: you have a simple condition.
Check the following example:
import pandas as pd
df = pd.DataFrame({"Open": [4, 3.6, 7, 6], "Close": [4.1, 3.5, 6.7, 6.8], "High": [4.12, 3.6, 7.02, 6.8], "Low":[4, 3.498, 6.7, 5.7]})
df["opensimilarclose"] = (abs(df["Open"] - df["Close"]) <= 0.02).astype(int)
df["relative_open"] = (df["Open"] - df["Low"]) / (df["High"] - df["Low"])
df["relative_close"] = (df["Close"] - df["Low"]) / (df["High"] - df["Low"])
df["openclosehigh"] = ((df["relative_open"] > 0.9) & (df["relative_close"] > 0.9)).astype(int)
df["opencloselow"] = ((df["relative_open"] < 0.1) & (df["relative_close"] < 0.1)).astype(int)
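The third line computes opensimilarclose by directly asking whether the absolute difference between open and close is at most 0.02. This is a condition, so the result is True/False. To change it to 1/0, I added .astype(int). In my opinion, applying the condition directly to whole columns like this is more convenient than using where.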
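For your second and third columns, I thought it more convenient to compute the percentages first and then check the conditions. The "relative_open" and "relative_close" columns contain the position of open/close as a fraction of the candle's range (0 at the low, 1 at the high), and only in the following two lines do I use both as conditions to fill "opencloselow" and "openclosehigh". You can remove the extra columns afterwards with drop, or select only the columns you need with loc. You could also keep the intermediate results as temporary Series instead of extra columns (tmp_series = (df["Close"]...).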
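The temporary-Series variant mentioned above might look like the sketch below. The names `candle`, `rel_open` and `rel_close` are placeholders of my own, and the frame is the same toy data used in the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Open": [4, 3.6, 7, 6],
    "Close": [4.1, 3.5, 6.7, 6.8],
    "High": [4.12, 3.6, 7.02, 6.8],
    "Low": [4, 3.498, 6.7, 5.7],
})

# Intermediate results live in plain variables, not in df,
# so there is nothing to drop afterwards.
candle = df["High"] - df["Low"]                # candle size (high-low range)
rel_open = (df["Open"] - df["Low"]) / candle   # 0 at the low, 1 at the high
rel_close = (df["Close"] - df["Low"]) / candle

df["openclosehigh"] = ((rel_open > 0.9) & (rel_close > 0.9)).astype(int)
df["opencloselow"] = ((rel_open < 0.1) & (rel_close < 0.1)).astype(int)

print(df.columns.tolist())
```

If you have already created the "relative_open" and "relative_close" columns, df.drop(columns=["relative_open", "relative_close"]) gives the same final layout. Note that a candle with High equal to Low produces 0/0 = NaN here; comparisons against NaN are False, so both features come out as 0 for such rows.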
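You can visualize the candles using the following code... You will see red and green candles... The color of a candle is decided by the previous close... but here I have not used the previous close...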
import matplotlib.pyplot as plt
import plotly.graph_objects as go
fig = go.Figure(data=[go.Candlestick(x=df.index,
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])],
                layout={'height': 500, 'width': 1000})
fig.update_layout(xaxis_rangeslider_visible=False)
fig.show()
This is the industry standard. I understand that you took the previous close and computed the other features....