BSpline 结合显式代码和外部代码表现不同

Question

下面是一个示例代码，其中 BSplineComp 与 ExplicitComp 或 ExternalCodeComp 相结合。这两个都做同样的计算，两个分量的梯度都是用有限差分计算的。

如果我运行版本 Bspline+ExplicitComp 结果在 2,3 次迭代内实现。如果我运行版本 Bspline+ExternalCodeComp 我必须等待很多。在这种情况下，它试图找到关于每个输入的输出梯度。因此，例如，在 bspline 组件中有 9 个控制点被插值到 70 个点。然后外部组件必须评估与插值点一样多（70次）

因此，在 bspline 与昂贵的外部代码相结合的情况下，有限差分需要与其插值的点数一样多，这成为计算的瓶颈。

基于这个输入我有两个问题

1- 如果外部代码组件基于显式组件，导致行为差异的主要差异是什么？（考虑到两者的输入都是 shape=70）

2- 在前面提到的 bspline 与昂贵的外部代码组合的场景中，除了这里显示的方式之外，还有更有效的组合它们的方式。

主代码：'external' 变量是切换 external/explicit 代码合成的标志。将 true/false 设置为运行上面解释的两种情况。

from openmdao.components.bsplines_comp import BsplinesComp
from openmdao.api import  IndepVarComp, Problem, ExplicitComponent,ExecComp,ExternalCodeComp
from openmdao.api import  ScipyOptimizeDriver, SqliteRecorder, CaseReader
import matplotlib.pyplot as plt
import  numpy as np

external=True # change this to true for the case with external code comp. or false for the case with explicit comp.  

rr=np.arange(0,70,1)
"Explicit component for the area under the line calculation" 
class AreaComp(ExplicitComponent):
    def initialize(self):
        self.options.declare('lenrr', int)        
        self.options.declare('rr', types=np.ndarray)        
    def setup(self):        
        self.add_input('h', shape=lenrr)
        self.add_output('area')
        self.declare_partials(of='area', wrt='h', method='fd')
    def compute(self, inputs, outputs):
        rr = self.options['rr']        
        outputs['area'] = np.trapz(rr,inputs['h'])          

class ExternalAreaComp(ExternalCodeComp):
    def setup(self):
        self.add_input('h', shape=70)
        self.add_output('area')

        self.input_file = 'paraboloid_input.dat'
        self.output_file = 'paraboloid_output.dat'

        # providing these is optional; the component will verify that any input
        # files exist before execution and that the output files exist after.
        self.options['external_input_files'] = [self.input_file]
        self.options['external_output_files'] = [self.output_file]

        self.options['command'] = [
            'python', 'extcode_paraboloid.py', self.input_file, self.output_file
        ]

        # this external code does not provide derivatives, use finite difference
        self.declare_partials(of='*', wrt='*', method='fd')

    def compute(self, inputs, outputs):
        h = inputs['h']

        # generate the input file for the paraboloid external code
        np.savetxt(self.input_file,h)
        # the parent compute function actually runs the external code
        super(ExternalAreaComp, self).compute(inputs, outputs)

        # parse the output file from the external code and set the value of f_xy
        f_xy=np.load('a.npy')

        outputs['area'] = f_xy


prob  = Problem()
model = prob.model

n_cp = 9
lenrr = len(rr)
"Initialize the design variables"
x = np.random.rand(n_cp)

model.add_subsystem('px', IndepVarComp('x', val=x))
model.add_subsystem('interp', BsplinesComp(num_control_points=n_cp,
                                           num_points=lenrr,
                                           in_name='h_cp',
                                           out_name='h'))

if external:
    comp=ExternalAreaComp()
    model.add_subsystem('AreaComp', comp)
else:
    comp = AreaComp(lenrr=lenrr, rr=rr)
    model.add_subsystem('AreaComp', comp)

case_recorder_filename2 = 'cases4.sql'
recorder2 = SqliteRecorder(case_recorder_filename2)
comp.add_recorder(recorder2)
comp.recording_options['record_outputs']=True
comp.recording_options['record_inputs']=True


model.connect('px.x', 'interp.h_cp')
model.connect('interp.h', 'AreaComp.h')
model.add_constraint('interp.h', lower=0.9, upper=1, indices=[0])

prob.driver = ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
prob.driver.options['disp'] = True
#prob.driver.options['optimizer'] = 'COBYLA'
#prob.driver.options['disp'] = True

prob.driver.options['tol'] = 1e-9

model.add_design_var('px.x', lower=1,upper=10)
model.add_objective('AreaComp.area',scaler=1)

prob.setup(check=True)
#prob.run_model()
prob.run_driver()
cr = CaseReader(case_recorder_filename2)

case_keys = cr.system_cases.list_cases()
cou=-1
for case_key in case_keys:
    cou=cou+1
    case = cr.system_cases.get_case(case_key)
    plt.plot(rr,case.inputs['h'],'-*')

外部代码extcode_paraboloid.py如下

import numpy as np
if __name__ == '__main__':
    import sys

    input_filename = sys.argv[1]
    output_filename = sys.argv[2]

    h=np.loadtxt(input_filename)
    rr=np.arange(0,70,1)

    rk= np.trapz(rr,h)          
    np.save('a',np.array(rk))

Answer 1

在这两种情况下，您的代码都需要 3 次迭代才能运行。外部代码的壁挂时间要长得多，这仅仅是因为文件 io 的成本加上每次调用函数时都需要进行系统调用以假脱机处理新进程的要求。是的，系统调用非常昂贵，文件 i/o 也不便宜。如果你有一个成本更高的分析，它没什么大不了的，但你会明白为什么应该尽可能避免它。

在这种情况下，您可以降低 FD 成本。因为你只有 9 个 bspline 变量，你已经正确地推断出你可以运行少得多的 FD 步骤。您想要使用 OpenMDAO v2.4 中的 approximate semi-total derivative 功能来跨组而不是跨每个单独的组件设置 FD。

就这么简单：

.
. 
. 

if external:
    comp=ExternalAreaComp()
    model.add_subsystem('AreaComp', comp)
else:
    comp = AreaComp(lenrr=lenrr, rr=rr)
    model.add_subsystem('AreaComp', comp)

model.approx_totals()

.
. 
.

BSpline 结合显式代码和外部代码表现不同

BSpline combined with explicit and externalcode behaves differently

openmdao