How to append dataframes to an empty dataframe using concurrent.futures
I want to run a function using concurrent.futures in Python. This is my function:
import concurrent.futures
import pandas as pd
import time

def putIndf(file):
    listSel = getline(file)
    datFram = savetoDataFrame(listSel)
    return datFram  # datatype: dataframe

def main():
    newData = pd.DataFrame()
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        for i, file in zip(fileList, executor.map(dp.putIndf, fileList)):
            df = newData.append(file, ignore_index=True)
    return df

if __name__ == '__main__':
    main()
I want to join the dataframes into one dataframe, newData, but the result is only the last dataframe returned by the function.
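Basically, you are re-assigning df on every iteration and never growing it: newData stays empty, so each pass overwrites df with just the current dataframe. What you probably meant (ill-advised) was to initialize an empty df and append to it iteratively: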
df = pd.DataFrame()
...
df = df.append(file, ignore_index=True)
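Nonetheless, the preferred method is to build a collection of dataframes and concatenate them all at once outside the loop, and to avoid growing any complex object, such as a dataframe, inside a loop. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat is the way to go.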
def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        # LIST COMPREHENSION
        df_list = [file for i, file in zip(fileList, executor.map(dp.putIndf, fileList))]
        # DICTIONARY COMPREHENSION
        # df_dict = {i: file for i, file in zip(fileList, executor.map(dp.putIndf, fileList))}
    df = pd.concat(df_list, ignore_index=True)
    return df
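If you go with the commented-out dictionary variant, pd.concat also accepts a mapping, and the keys become the outer level of the resulting index. That lets you trace each row back to its file. A minimal sketch, assuming the dictionary is keyed by file name; the file names and values here are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes returned by putIndf, keyed by file name.
df_dict = {
    "a.txt": pd.DataFrame({"value": [1, 2]}),
    "b.txt": pd.DataFrame({"value": [3, 4]}),
}

# Passing a mapping uses its keys as the outer index level.
combined = pd.concat(df_dict, names=["file", "row"])

# Select the rows that came from a single file.
from_b = combined.loc["b.txt"]
```

Drop the keys afterwards with combined.reset_index(level="file") if you would rather have the file name as a regular column.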
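Alternatively, since you are iterating over the pool's results anyway, append each dataframe to a list and still concatenate only once, outside the loop: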
def main():
    df_list = []  # df_dict = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        for i, file in zip(fileList, executor.map(dp.putIndf, fileList)):
            df_list.append(file)
            # df_dict[i] = file
    df = pd.concat(df_list, ignore_index=True)
    return df
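For a version you can run as-is, here is a rough sketch of the same collect-then-concat pattern. Two things are assumptions on my part: put_in_df is a hypothetical stand-in for your putIndf (it fabricates a tiny frame instead of reading a file), and I use ThreadPoolExecutor rather than ProcessPoolExecutor so the snippet works when pasted into any session. Executor.map behaves the same for both, so you can pass in a ProcessPoolExecutor instead, provided the worker function lives at module top level and the call sits under the if __name__ == '__main__' guard.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def put_in_df(name):
    # Hypothetical worker: builds a two-row frame instead of parsing a real file.
    return pd.DataFrame({"source": [name, name], "value": [1, 2]})


def combine(names, executor):
    # executor.map yields results in input order; collect them all first,
    # then concatenate exactly once.
    frames = list(executor.map(put_in_df, names))
    return pd.concat(frames, ignore_index=True)


with ThreadPoolExecutor(max_workers=4) as executor:
    combined = combine(["a.txt", "b.txt", "c.txt"], executor)
```

Since executor.map already returns the results in order, the zip with fileList is only needed if you actually use the file name, as in the dictionary variant.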