I have 12 pandas DataFrames; I want to extract one column from each, combine them into a new DataFrame, and rename each column after its source DataFrame
**I need to extract every "adj close" column into a new DataFrame, rename each one after its source, and align them by date:
new_DF = date AAL AAPL ALK ..... (containing adj close)
Please help**
AAL = pd.read_csv("AAL.csv")
AAPL = pd.read_csv("AAPL.csv")
ALK = pd.read_csv("ALK.csv")
AMZN = pd.read_csv("AMZN.csv")
BHC = pd.read_csv("BHC.csv")
CS = pd.read_csv("CS.csv")
DB = pd.read_csv("DB.csv")
GS = pd.read_csv("GS.csv")
GOOG = pd.read_csv("GOOG.csv")
HA = pd.read_csv("HA.csv")
JNJ = pd.read_csv("JNJ.csv")
MRK = pd.read_csv("MRK.csv")
SP500 = pd.read_csv("S&P500.csv")
df = date | open | high | low | close | adj close | volume
Try this:
import glob
import os

import pandas as pd

# folder containing the csv files
files_folder = '/path/to/csv/'

# collect the selected column from every csv file
df_list = []
for file in glob.glob(os.path.join(files_folder, '*.csv')):
    df = pd.read_csv(file)
    # name the column after the file, e.g. 'AAL.csv' -> 'AAL'
    name = os.path.splitext(os.path.basename(file))[0]
    # put the column you want to select here
    df_column = df['column_name'].rename(name)
    df_list.append(df_column)

# concatenate the list of columns into one dataframe
df_final = pd.concat(df_list, axis=1)
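To make the loop above concrete for this question, here is a minimal sketch that names each column after the file's base name and aligns rows on the date. It assumes each CSV has `Date` and `Adj Close` headers (the exact spelling in your files may differ). The `collect_column` helper and the sample files written to a temporary folder are hypothetical, there only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd


def collect_column(folder, column="Adj Close", date_col="Date"):
    """Return one DataFrame holding `column` from every CSV in `folder`.

    Each output column is named after its source file (without '.csv').
    The default column names are assumptions; adjust them to your files.
    """
    series = {}
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path, index_col=date_col, parse_dates=True)
        series[path.stem] = df[column]
    # A dict passed to concat uses its keys as the new column names,
    # and axis=1 aligns the rows on the shared date index.
    return pd.concat(series, axis=1)


# Hypothetical sample data, written to a throwaway folder for the demo.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "AAL.csv").write_text(
        "Date,Close,Adj Close\n2022-04-04,18.23,18.23\n2022-04-05,17.84,17.84\n"
    )
    Path(tmp, "AAPL.csv").write_text(
        "Date,Close,Adj Close\n2022-04-04,178.44,178.20\n2022-04-05,175.06,174.82\n"
    )
    new_df = collect_column(tmp)

print(new_df)
```

Reading the date column as the index is what makes the rows line up by date rather than by row position, which matters if the files do not all cover the same days.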
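Here is an example. Without your .csv files we have to be a little creative about getting data, but assume you have a dict of DataFrames, one per ticker.
Here we use Yahoo Finance to get similar data. The column we are looking for ('adj close') is not in that data, so for this example we use Close instead.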
import yfinance as yf
tickers = ['AAL', 'AAPL', 'AMZN', 'GOOG']
sources = {ticker: yf.Ticker(ticker).history(period='5d') for ticker in tickers}
At this point we have data for each ticker. For example:
>>> sources['AAPL']
Open High Low Close Volume Dividends Stock Splits
Date
2022-03-30 178.550003 179.610001 176.699997 177.770004 92633200 0 0
2022-03-31 177.839996 178.029999 174.399994 174.610001 103049300 0 0
2022-04-01 174.029999 174.880005 171.940002 174.309998 78699800 0 0
2022-04-04 174.570007 178.490005 174.440002 178.440002 76468400 0 0
2022-04-05 177.500000 178.300003 174.419998 175.059998 73311300 0 0
In your case you would get the data from the CSV files instead, so:
sources = {k: pd.read_csv(f'{k}.csv').set_index('Date') for k in tickers}
Now, to answer your question:
df = pd.concat([v['Close'].to_frame(k) for k, v in sources.items()], axis=1)
>>> df
AAL AAPL AMZN GOOG
Date
2022-03-30 18.049999 177.770004 3326.020020 2852.889893
2022-03-31 18.250000 174.610001 3259.949951 2792.989990
2022-04-01 18.240000 174.309998 3271.199951 2814.000000
2022-04-04 18.230000 178.440002 3366.929932 2872.850098
2022-04-05 17.840000 175.059998 3281.100098 2821.260010
Again, in your case you would select the 'adj close' column.
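Applied to the original question, the same pattern might look like the sketch below. The inline CSV text stands in for the real files and its values are made up; the `Date` and `Adj Close` headers are assumptions based on the column list in the question. It also shows what happens when a ticker is missing a date: `pd.concat` keeps the union of dates and fills the gap with `NaN`.

```python
import io

import pandas as pd

# Made-up stand-ins for AAL.csv and AAPL.csv; with real files you would
# call pd.read_csv("AAL.csv", parse_dates=["Date"]) and so on.
raw = {
    "AAL": "Date,Close,Adj Close\n2022-04-01,18.24,18.24\n2022-04-04,18.23,18.23\n",
    "AAPL": "Date,Close,Adj Close\n2022-04-04,178.44,178.20\n2022-04-05,175.06,174.82\n",
}

sources = {
    ticker: pd.read_csv(io.StringIO(text), parse_dates=["Date"]).set_index("Date")
    for ticker, text in raw.items()
}

# One 'Adj Close' column per ticker, renamed to the ticker and aligned on Date.
new_df = pd.concat(
    [frame["Adj Close"].rename(ticker) for ticker, frame in sources.items()],
    axis=1,
).sort_index()

# Optional: turn the Date index back into an ordinary column.
flat = new_df.reset_index()
print(flat)
```

If you only want dates present for every ticker, passing `join="inner"` to `pd.concat` keeps the intersection of dates instead of the union.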