Making this pandas code as lean and speedy as possible? [iterating over large DataFrames and setting]

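For context, my main dataset is a DataFrame of 24541 rows x 1830 columns that contains either NaN or floats (stock prices). I process this DataFrame 11 times, each time setting values in a second DataFrame that has the same index and columns. Here are examples of the two DataFrames: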

data = pd.DataFrame.from_csv(filepath)
data = pd.DataFrame(data=data, dtype=np.float64)

#dataset of daily prices
data.head()

Out[14]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-02    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-04    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-05    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-06    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN

[5 rows x 1830 columns]

MA_a_frame = pd.DataFrame(
        data=0,
        index=data.index, 
        columns=data.columns)

#bool DataFrame
MA_a_frame.head()

Out[15]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31      0      0      0      0  ...      0      0      0      0
1926-01-02      0      0      0      0  ...      0      0      0      0
1926-01-04      0      0      0      0  ...      0      0      0      0
1926-01-05      0      0      0      0  ...      0      0      0      0
1926-01-06      0      0      0      0  ...      0      0      0      0

[5 rows x 1830 columns]

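A value in MA_a_frame (and in the 10 other identical DataFrames) is set to 1 if a certain condition on the DataFrame "data" is met: namely, if the price in "data" is within 1% (the parameter "j") of a value computed in an entirely different DataFrame, which was generated by an earlier function. So each iteration handles up to 3 large DataFrames in total.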

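As for my iterators, I simply created two separate lists ("dates" and "securities") from data.index and data.columns, so I am in effect iterating indirectly over the index and columns of data. Without further ado, here is the basis of the code that runs 11 times in total in my program (the part I am trying to speed up!):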

def gen_a():

    for date in dates:

        for security in securities: 

            try: 

                if type(data.loc[date, security]) is not float:

                    pass
                    #lots of the data is NaN, so skip these altogether

                elif j > math.log(
                        MA_a_csv.loc[date, security]/
                        data.loc[date, security]) > -j:

                    MA_dict['a'].loc[date, security] = 1

                print(f'Passed {date}, {security}')

            except: 

                print(f'Failed {date}, {security}')

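Now, the problem is that one pass of this loop takes about 8 hours, so I expect each full run to take close to 90 hours. I have an academic paper due as a graduation requirement, and with the deadline approaching these numbers are really starting to scare me! Assuming my output is perfect, things should be fine, but if anyone has suggestions that would bring the run time down, I would be eternally grateful. Otherwise, I may have to narrow the scope of my data, which would reduce the power of my statistical analysis.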

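P.S. I am running this through Spyder on Windows 10 with an Intel i7 3970X. I do not have access to any other computing power. I considered GPU acceleration, but my GPU is a GTX 670, which is not Pascal and is therefore not compatible with cuDF.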

Edit:

Here are the last five rows of the data DataFrame:

s.head()
Out[16]: 
            49154      65541  32791  65568  ...  24563  81910  24571  90110
DATE                                        ...                            
2018-12-24  61.55  232.70000    NaN    NaN  ...    NaN  15.71    NaN    NaN
2018-12-26  65.11  244.59000    NaN    NaN  ...    NaN  16.48    NaN    NaN
2018-12-27  64.71  252.17999    NaN    NaN  ...    NaN  16.71    NaN    NaN
2018-12-28  64.96  249.64999    NaN    NaN  ...    NaN  16.55    NaN    NaN
2018-12-31  66.09  254.50000    NaN    NaN  ...    NaN  16.74    NaN    NaN

[5 rows x 1830 columns]

And here is a sample from one of the comparison DataFrames:

Out[23]: 
              49154       65541  32791  65568  ...  24563    81910  24571  90110
DATE                                           ...                              
2018-12-24  76.3430  258.376200    NaN    NaN  ...    NaN  19.8672    NaN    NaN
2018-12-26  75.9530  258.143600    NaN    NaN  ...    NaN  19.7980    NaN    NaN
2018-12-27  75.5552  258.127199    NaN    NaN  ...    NaN  19.7238    NaN    NaN
2018-12-28  75.1382  257.878799    NaN    NaN  ...    NaN  19.6440    NaN    NaN
2018-12-31  74.7716  257.683199    NaN    NaN  ...    NaN  19.5600    NaN    NaN

[5 rows x 1830 columns]

Edit 2:

As requested, here is data.head().to_dict():

  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '44792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85753': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12044': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20239': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28433': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12052': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
  Timestamp('1926-01-02 00:00:00'): 326.5,
  Timestamp('1926-01-04 00:00:00'): 325.0,
  Timestamp('1926-01-05 00:00:00'): 325.5,
  Timestamp('1926-01-06 00:00:00'): 326.25},
 '12062': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12067': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77605': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77606': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20263': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12073': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12076': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
  Timestamp('1926-01-02 00:00:00'): 124.25,
  Timestamp('1926-01-04 00:00:00'): 127.125,
  Timestamp('1926-01-05 00:00:00'): 123.75,
  Timestamp('1926-01-06 00:00:00'): 124.5},
 '61241': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12095': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28484': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77644': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28505': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77659': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12124': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77661': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28513': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61284': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77668': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12140': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85869': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28548': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77702': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12167': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85908': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
  Timestamp('1926-01-02 00:00:00'): 78.0,
  Timestamp('1926-01-04 00:00:00'): 77.5,
  Timestamp('1926-01-05 00:00:00'): 76.875,
  Timestamp('1926-01-06 00:00:00'): 76.5},
 '44951': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85913': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85914': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12191': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20386': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77730': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28580': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85926': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20394': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69550': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12212': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20407': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20415': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77768': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85963': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45014': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61399': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69607': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85991': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20474': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20482': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86021': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69649': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12308': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20503': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86041': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12319': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20511': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12345': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20554': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12369': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20562': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86102': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20570': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86111': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
  Timestamp('1926-01-02 00:00:00'): 124.0,
  Timestamp('1926-01-04 00:00:00'): 123.25,
  Timestamp('1926-01-05 00:00:00'): 123.5,
  Timestamp('1926-01-06 00:00:00'): 122.75},
 '36978': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86136': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28804': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86158': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61583': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20626': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77976': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53401': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86176': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12449': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69796': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12456': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12458': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20650': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28847': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 ...}

Unfortunately, I have run out of space in this post, but MA_a_csv.head().to_dict() produces the same result as above, except that every value is NaN, with no data points at all.

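Perhaps use the chunksize parameter when reading the CSV. You will need to experiment to find the best size; a rule of thumb I have heard is to pick it so that one chunk takes up about half of your available memory. Note that chunksize is a number of rows, and that read_csv then returns an iterator of DataFrames rather than a single DataFrame.

chunks = pd.read_csv("your.csv", chunksize=rows_per_chunk)  # rows_per_chunk: an integer you choose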

When writing the results back to a file, make sure append mode is set:

df.to_csv("yourresults.csv", mode='a')

Either delete the file each time you run your code, or make sure the first call to to_csv() is made in write mode (the default).
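As a rough sketch of how the two calls fit together, the loop below reads a file in chunks and appends each processed chunk to a results file. The temporary file, the value of rows_per_chunk and the `chunk * 2` step are all placeholders made up for illustration, not part of the original suggestion.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small input file standing in for "your.csv".
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "your.csv")
dst = os.path.join(tmp_dir, "yourresults.csv")
pd.DataFrame({"a": np.arange(10.0), "b": np.arange(10.0) * 2}).to_csv(src, index=False)

rows_per_chunk = 4  # assumed value; tune it to your memory budget

# With chunksize set, read_csv yields DataFrames of at most rows_per_chunk rows.
for i, chunk in enumerate(pd.read_csv(src, chunksize=rows_per_chunk)):
    result = chunk * 2  # placeholder for the real per-chunk work
    first = i == 0
    # First chunk: write mode with a header. Later chunks: append, no header.
    result.to_csv(dst, mode="w" if first else "a", header=first, index=False)

out = pd.read_csv(dst)
```

Writing the first chunk in write mode means a results file left over from a previous run is overwritten rather than appended to.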

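Other options I would try:

1) Use a cloud resource such as AWS EC2 and rent a high-spec, high-memory machine, transfer your data and code to it, and let it run your code. That should be much faster.

2) I would consider using something like PySpark to distribute the load across multiple machines, but if you are not already familiar with it, it may take some time to get up to speed.

Good luck!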

Combining two short comments into one answer.

1) The statement

j > math.log(
   MA_a_csv.loc[date, security]/
   data.loc[date, security]) > -j

can be simplified slightly with abs, e.g. j > abs(...)

and can be sped up significantly by computing the logs separately, once, and using the fact that log(a/b) == log(a) - log(b).

Even if each cell is only computed once per run, you can compute the logs and write them back to disk, which speeds up re-runs.
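A minimal sketch of both points, on made-up prices. The value j = 0.01 and the names log_data, log_ma and within_band are assumptions for illustration; the practical gain is that log_data only has to be computed once and can be reused against every comparison frame.

```python
import math

import numpy as np
import pandas as pd

j = 0.01  # assumed 1% band

# Made-up prices; NaN marks missing data.
data = pd.DataFrame({"A": [100.0, 101.0, np.nan], "B": [50.0, 50.2, 49.0]})
MA_a_csv = pd.DataFrame({"A": [100.5, 103.0, 99.0], "B": [50.4, 50.0, np.nan]})

# Take each log once for the whole frame, not once per cell inside a loop.
log_data = np.log(data)
log_ma = np.log(MA_a_csv)

# log(a/b) == log(a) - log(b), and -j < x < j is the same as abs(x) < j.
within_band = (log_ma - log_data).abs() < j

# Spot check one cell against the original scalar expression.
a, b = MA_a_csv.loc[0, "A"], data.loc[0, "A"]
scalar_check = j > math.log(a / b) > -j
```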

2) If you have those print statements in your real code, they will account for a large share of the total time.

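I made my own sample data generator based on the example you provided. I think it matches what you have, but let me know if it doesn't. If the data matches, don't worry about the details of how I made it.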

import numpy as np
import pandas as pd

rows = 6
cols = 5
np.random.seed(0)
# random prices on a daily DatetimeIndex starting 1928-12-31
data = pd.DataFrame(np.random.rand(rows, cols) * 100, 
                  index=pd.date_range(start='1928-12-31', periods=rows, freq='D'))
# blank out the leading rows of a few randomly chosen columns
nan_cols = len(data.columns) // 2
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                     .sample(nan_cols, random_state=1, replace=True), 
                     pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
    data.loc[:row, col] = np.nan

# comparison frame: each price moved up or down by at most 2%
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50 
                        * np.random.choice([-1, 1], size=(rows, cols))))

So data looks like

                    0          1          2          3          4
1928-12-31  54.881350  71.518937        NaN  54.488318        NaN
1929-01-01  64.589411  43.758721        NaN  96.366276  38.344152
1929-01-02  79.172504  52.889492  56.804456  92.559664   7.103606
1929-01-03   8.712930   2.021840  83.261985  77.815675  87.001215
1929-01-04  97.861834  79.915856  46.147936  78.052918  11.827443
1929-01-05  63.992102  14.335329  94.466892  52.184832  41.466194

and MA_a_csv looks like

                    0          1          2          3          4
1928-12-31  55.171734  72.626384        NaN  55.107778        NaN
1929-01-01  63.791557  44.294412        NaN  98.185186  38.867028
1929-01-02  78.603241  53.351780  57.597027  92.448175   7.008877
1929-01-03   8.829794   2.013333  83.047291  77.324770  86.368349
1929-01-04  98.977844  80.616881  45.235708  77.893620  11.876852
1929-01-05  63.785651  14.522579  94.945445  52.671519  41.668902

I ran these through something that looks like your gen_a, then made a vectorized version that gives the same answer:

logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})

where ans is

            0  1  2  3  4
1928-12-31  1  0  0  0  0
1929-01-01  0  0  0  0  0
1929-01-02  1  1  0  1  0
1929-01-03  0  1  1  1  1
1929-01-04  0  1  0  1  1
1929-01-05  1  0  1  1  1

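np.log can operate on an entire array at once, and pandas is probably also doing something fancy to vectorize the greater-than comparisons. & is bitwise AND, applied elementwise, so it just checks whether both conditions are true at each position.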
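One detail worth spelling out is that the NaN check from the loop is not needed here: any comparison involving NaN evaluates to False, so missing prices come out as 0 on their own. The sketch below wraps the same idea in a function and applies it to several comparison frames. The function name flag_within_band, the comparisons dict and the sample prices are hypothetical, and j = 0.01 is assumed.

```python
import numpy as np
import pandas as pd


def flag_within_band(prices, reference, j):
    """Return a 0/1 DataFrame: 1 where log(reference / prices) lies strictly inside (-j, j)."""
    logs = np.log(reference / prices)
    # Comparisons involving NaN evaluate to False, so missing prices end up as 0.
    return ((logs < j) & (logs > -j)).astype(int)


idx = pd.date_range("2018-12-24", periods=3, freq="D")
data = pd.DataFrame({"49154": [61.55, 65.11, np.nan]}, index=idx)

# Hypothetical stand-ins for the comparison frames, keyed like MA_dict.
comparisons = {
    "a": pd.DataFrame({"49154": [61.60, 70.00, 64.0]}, index=idx),
    "b": pd.DataFrame({"49154": [np.nan, 65.00, 64.0]}, index=idx),
}

MA_dict = {key: flag_within_band(data, frame, j=0.01)
           for key, frame in comparisons.items()}
```

Because the two frames share the same index and columns, the division aligns cell by cell and no explicit iteration over dates or securities is required.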

This is 180 times faster than my version of gen_a, which has no try/except or print statements, so it should be an even bigger improvement over your code.

You also don't need the .replace({True: 1, False: 0}) part - in Python, 1 == True just as 0 == False, so you should be able to use them interchangeably.
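A small illustration of that interchangeability, on a made-up boolean frame (the names flags, hits_per_column and as_int are hypothetical). If an integer dtype is wanted anyway, for example when exporting, astype(int) is one way to get it:

```python
import pandas as pd

flags = pd.DataFrame({"x": [True, False, True], "y": [False, False, True]})

# Booleans behave as 0/1 in arithmetic, so sums and counts work directly.
hits_per_column = flags.sum()

# Convert to an integer dtype in one step when 0/1 values are required.
as_int = flags.astype(int)
```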

Let me know if you have any questions about this. For further reading, I recommend Tom Augspurger's Modern Pandas articles - the Fast Pandas section is particularly applicable.