How to repeat a code segment without the use of function or class for high performance loop in C++

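My C++11 program performs online processing of serialized data, and the loop has to run over millions of memory locations. Computational efficiency is a must. My concern is that calling a function or a class method inside such a loop adds unnecessary operations that hurt efficiency, for example transferring the several pointer values the operation needs between different variable scopes.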

To illustrate, consider the following dummy example, where "something" is the repeated operation. Note that the code in "something" uses variables from the loop's scope.

do {
    something(&spam,&foo);
    spam++;
    foo++;
    if ( spam == spam_spam ) {
      something(&spam,&foo);
      other_things(&spam,&foo);
      something(&spam,&foo);
    }
    else {
      something(&spam,&foo);
      still_other_things(&spam,&foo);
      something(&spam,&foo);
    }
}
while (foo<bar);

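Is there a way to repeat a code block while avoiding unnecessary operations that move and copy variables? Does using functions and classes in such a loop really imply extra operations, and if so, how can they be avoided?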


Update

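As suggested, I ran some tests using the code provided below. I tested several options for calling a simple increment 100 million times. I am using GCC on RHEL 7 Server 7.6 in an x86_64 virtual machine under Hyper-V.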

Initially, I compiled with "g++ -std=c++17 -o test.o test.cpp".

Based on these results, I realized that the compiler was not following the inline suggestion, even after trying to force it as suggested in "g++ doesn't inline functions".

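Later, following Mat's suggestion in an answer to the same post, I turned on compiler optimization with "g++ -std=c++17 -O2 -o test.o test.cpp" and got the following results for the same number of iterations as in the test without optimization.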

Conclusions so far:

The code used for the tests:

// Libraries
    #include <iostream>
    #include <cmath>
    #include <chrono>
    #include <cstdint>
    #include <string>
    #include <utility>

// Namespaces
    using namespace std;
    using namespace std::chrono;

// constants that control program behaviour
    const long END_RESULT = 100000000;
    const double AVERAGING_LENGTH = 40.0;
    const int NUMBER_OF_ALGORITHM = 9;
    const long INITIAL_VALUE = 0;
    const long INCREMENT = 1;

// Global variables used for test with void function and to general control of the program;
    long global_variable;
    long global_increment;

// Function that returns the execution time for a simple loop
int64_t simple_loop_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable += local_increment;
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return(duration_cast<microseconds>( timer_stop - timer_start ).count());
}

// Functions that compute the execution time when using inline code within the loop
// (the attribute must be declared on the same two-argument signature that is called)
inline long increment_variable(long, long) __attribute__((always_inline));
inline long increment_variable(long local_variable, long local_increment) {
    return local_variable += local_increment;
}

int64_t inline_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_variable(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Function that computes the execution time when using a lambda within the loop
int64_t labda_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // define lambda function
        auto lambda_increment = [&] {
            local_variable += local_increment;
        };

    // Perform the computation for baseline
        do {
            lambda_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// define macro
    #define define_increment() local_variable += local_increment

// Function that computes the execution time when using a macro within the loop
int64_t define_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            define_increment();
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop passing variable values
long increment_with_values_function(long local_variable, long local_increment) {
    return local_variable += local_increment;
}

int64_t function_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_values_function(local_variable,local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop passing variable pointers
long increment_with_pointers_function(long *local_variable, long *local_increment) {
    return *local_variable += *local_increment;
}

int64_t function_pointers_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Perform the computation for baseline
        do {
            local_variable = increment_with_pointers_function(&local_variable,&local_increment);
        } while ( local_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Functions that compute the execution time when calling a function within the loop without passing variables 
void increment_with_void_function(void) {
    global_variable += global_increment;
}

int64_t function_void_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // set global variables
        global_variable = local_variable;
        global_increment = local_increment;

    // Perform the computation for baseline
        do {
            increment_with_void_function();
        } while ( global_variable != END_RESULT);

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}
// Object and Function that compute the duration when using a method of the object where data is stored without passing variables
struct object {
    long object_variable = 0;
    long object_increment = 1;

    object(long local_variable, long local_increment) {
        object_variable = local_variable;
        object_increment = local_increment;
    }

    void increment_object(void){
        object_variable+=object_increment;
    }

    void increment_object_with_value(long local_increment){
        object_variable+=local_increment;
    }
};

int64_t object_members_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object();
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

// Function that compute the duration when using a method of the object where data is stored passing variables
int64_t object_values_computation(long local_variable, long local_increment) {
    // Starts the clock to measure the execution time for the baseline
        high_resolution_clock::time_point timer_start = high_resolution_clock::now();

    // Create object
        object object_instance = {local_variable,local_increment};

    // Perform the computation for baseline
        do {
            object_instance.increment_object_with_value(local_increment);
        } while ( object_instance.object_variable != END_RESULT);

    // Get the results out of the object
        local_variable = object_instance.object_variable;

    // Stop the clock to measure performance of the silly version
        high_resolution_clock::time_point timer_stop = high_resolution_clock::now();

        return duration_cast<microseconds>( timer_stop - timer_start ).count();
}

int main() {

    // Create array to store execution time results for all tests
        pair<string,int64_t> duration_sum[NUMBER_OF_ALGORITHM]={
            make_pair("Simple loop computation (baseline): ",0.0),
            make_pair("Inline Function: ",0.0),
            make_pair("Lambda Function: ",0.0),
            make_pair("Define Macro: ",0.0),
            make_pair("Function passing values: ",0.0),
            make_pair("Function passing pointers: ",0.0),
            make_pair("Function with void: ",0.0),
            make_pair("Object method operating with members: ",0.0),
            make_pair("Object method passing values: ",0.0),
        };

    // loop to compute average of several execution times
        for ( int i = 0; i < AVERAGING_LENGTH; i++) {
            // Compute the execution time for a simple loop as the baseline
                duration_sum[0].second = duration_sum[0].second + simple_loop_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using inline code within the loop (expected same as baseline)
                duration_sum[1].second = duration_sum[1].second + inline_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when using lambda code within the loop (expected same as baseline)
                duration_sum[2].second = duration_sum[2].second + labda_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a define macro
                duration_sum[3].second = duration_sum[3].second + define_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables values
                duration_sum[4].second = duration_sum[4].second + function_values_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop passing variables pointers
                duration_sum[5].second = duration_sum[5].second + function_pointers_computation(INITIAL_VALUE, INCREMENT);

            // Compute the execution time when calling a function within the loop without passing variables
                duration_sum[6].second = duration_sum[6].second + function_void_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored without passing variables
                duration_sum[7].second = duration_sum[7].second + object_members_computation(INITIAL_VALUE, INCREMENT);

            // Compute the duration when using a method of the object where data is stored passing variables
                duration_sum[8].second = duration_sum[8].second + object_values_computation(INITIAL_VALUE, INCREMENT);
        }


        double average_baseline_duration = 0.0;

    // Print out results
        for ( int i = 0; i < NUMBER_OF_ALGORITHM; i++) {
        // compute average from sum
            average_baseline_duration = ((double)duration_sum[i].second/AVERAGING_LENGTH)/1000.0;

        // Print the result
            cout << duration_sum[i].first << average_baseline_duration << "ms \n";
        }

    return 0;
}

If the code is short enough to be declared inline, the compiler will inline it. If it isn't, well, then repeating it probably won't help.

But honestly, this is the least effective form of optimization. Focus on efficient algorithms and cache-efficient data structures.

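As others have suggested, the inline keyword can do this, but it has limitations (different for each compiler): some compilers won't inline functions that contain loops, etc. So if the function contains something your compiler doesn't like, it won't be inlined at all...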

So if you want a better solution without such limitations (for more complex code) that always works, I usually use #define:

// definition
#define code_block1(operands) { code ... }
#define code_block2(operands) { code ... \
 code ... \
 code ... \
 code ... \
 code ... }

// usage:

code ...
code_block1(); // this is macro so the last ; is not needed but I like it there ...
code_block2();

code ...
code_block2();

code ...
code_block1();

code ...
code_block2();
code_block1();
...

// undefinition so tokens do not fight with latter on code
#undef code_block1
#undef code_block2

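So you simply define your code as a macro (#define) instead of a function... it can use both global and local variables... The { } are not required, but they are a good idea so that the macro behaves like a single statement. This prevents later headaches once you start using things like this: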

for (i=0;i<100;i++) code_block1();

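Without the { } inside the macro, this code would break, because only the first statement of the macro would be inside the loop... and that is not obvious from a quick look at the code.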

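For short code you can write the whole definition on one line; for longer code you can split it across several lines with \ at the end of each line. Do not use // comments on those definition lines: the \ line continuation is processed before comments are removed, so a // comment swallows every following line of the macro body. If you need comments inside the definition, use /* ... */ instead.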

The (operands) part is optional; without operands it is just:

#define code_block1 { code ... }

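The #undef part is optional; it depends on whether you want such a macro available throughout the code or only locally in some function, class, file... Also, as you can see, the macro uses only token names, with no operands at all.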

I use this a lot, for example:

and look for loop_beg and loop_end ... it is a loop macro, used like this:

loop_beg custom_code; loop_end

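That is why it has no {}: with braces it would become {loop_beg} custom_code; {loop_end}, and the loop could no longer enclose custom_code.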