Train/test DataFrame for multiple columns
I have a CSV file:
Date,Open,High,Low,Close,Adj Close,Volume,Cash EPS,Book Value,Div/share,Net profit/share,NPM,ROE,ROCE,ROA,DEBT/EQ,ATR,CR
2004-04-26,82.924217,82.924217,82.924217,82.924217,60.026066,0,221.24,488.21,129.5,186.6,26.11,38.22,38.22,24.2,0,92.67,1.65
2004-04-27,82.778122,82.778122,79.765625,80.24453,58.086323,28616000,221.24,488.21,129.5,186.6,26.11,38.22,38.22,24.2,0,92.67,1.65
Only 2 rows are shown for ease of computation. I created a DataFrame:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

dataframe1 = pd.read_csv('test.csv')
# .copy() gives an independent frame, so the column assignment below is safe
df = dataframe1.dropna().copy()

# Assumption: df1 is meant to hold the Close column as a single-feature array
scaler = MinMaxScaler(feature_range=(0, 1))
df1 = scaler.fit_transform(df[["Close"]].to_numpy())

# Scale the listed numeric columns to [0, 1]; "Cash EPS" is not in the list and stays unscaled
cols = ["Open", "High", "Low", "Close", "Adj Close", "Volume", "Book Value", "Div/share", "Net profit/share", "NPM", "ROE", "ROCE", "ROA", "DEBT/EQ", "ATR", "CR"]
min_max_scaler = MinMaxScaler()
df[cols] = min_max_scaler.fit_transform(df[cols])
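To train on this dataset, I need the date and the prediction target, i.e. the Close column.
However, the Close value depends on multiple columns (i.e. all the columns present in this CSV).
How can I train on the Date and Close columns based on all the other columns, so that I can predict future Close values?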
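If I understand the question correctly, you are looking for a multivariate time series model. In other words, a model that takes multiple input variables at each time step in order to make a forward-looking prediction. Here is an example: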
https://www.relataly.com/stock-market-prediction-with-multivariate-time-series-in-python/1815/
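I also recommend looking at the Kaggle stock market prediction competitions, which have hundreds of examples of how people have approached this problem.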
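To make "multiple variables per time step" concrete, here is a minimal sketch of how the data could be shaped for such a model. It is not taken from the linked tutorial: the helper names `make_windows` and `chronological_split`, the window length of 10, the 80/20 split, and the synthetic array standing in for the scaled DataFrame are all assumptions chosen for illustration.

```python
import numpy as np

def make_windows(features, target, window):
    """Turn a (n_rows, n_features) array into supervised samples.

    X[i] holds `window` consecutive rows of all features;
    y[i] is the target value of the row right after that window.
    """
    X, y = [], []
    for start in range(len(features) - window):
        end = start + window
        X.append(features[start:end])
        y.append(target[end])
    return np.array(X), np.array(y)

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling, so the test samples come after the training ones."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Synthetic stand-in for the scaled frame: 100 days, 3 feature columns.
# With the real data this would be something like df[cols].to_numpy(),
# sorted by Date, with Close as one of the columns.
rng = np.random.default_rng(0)
data = rng.random((100, 3))
close = data[:, 0]  # assumption: column 0 plays the role of Close

X, y = make_windows(data, close, window=10)
X_train, X_test, y_train, y_test = chronological_split(X, y)

print(X_train.shape, X_test.shape)  # (72, 10, 3) (18, 10, 3)
```

Each sample is a block of shape (window, n_features), which is the 3D input that sequence models such as LSTMs usually expect; for a plain regression model the windows could be flattened with `X.reshape(len(X), -1)` instead. Splitting by position rather than at random keeps future rows out of the training set, and for the same reason it is probably safer to fit the scaler on the training rows only.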