Calculate rowSums in Chapel for a matrix

Question

Continuing my Chapel adventure...

I have a matrix A.

var idx = {1..n};
var adom = {idx, idx};
var A: [adom] int;
//populate A;

var rowsums: [idx] int;

What is the most efficient way to populate rowsums?

Answer 1

The most efficient solution is hard to define. However, here is one way to compute rowsums that is both parallel and elegant:

config const       n = 8;          // "naked" n would cause compilation to fail
const indices = 1..n;              // tio.chpl:1: error: 'n' undeclared (first use this function)
const adom = {indices, indices};
var A: [adom] int;

// Populate A
[(i,j) in adom] A[i, j] = i*j;

var rowsums: [indices] int;


forall i in indices {
  rowsums[i] = + reduce(A[i, ..]);
}

writeln(rowsums);

Try it online!

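This makes use of a + reduction over array slices of A.

Note that both forall and + reduce introduce parallelism into the program above. If the size of indices is small enough, it may be more efficient to use only a for loop and so avoid the cost of spawning tasks.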
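As a minimal sketch of that trade-off, the program below picks between a fully serial path and the parallel path based on the problem size. The name parThreshold and its default value are hypothetical and are not part of the answer above; a sensible cutoff, if one exists at all on a given machine, would have to be found by measurement. Since + reduce is itself a parallel construct, the serial branch sums each row with a plain nested for loop instead.

```chapel
config const n = 8;
config const parThreshold = 1024;   // hypothetical cutoff, to be tuned by measurement

const indices = 1..n;
const adom = {indices, indices};
var A: [adom] int;

// Populate A
[(i,j) in adom] A[i, j] = i*j;

var rowsums: [indices] int;

if n < parThreshold {
  // Serial path: plain nested for loops, so this branch spawns no tasks
  for i in indices {
    var s = 0;
    for j in indices do s += A[i, j];
    rowsums[i] = s;
  }
} else {
  // Parallel path: the same forall / + reduce combination as above
  forall i in indices do
    rowsums[i] = + reduce A[i, ..];
}

writeln(rowsums);
```

Because both values are declared as config const, they can be overridden at run time, for example with --n=4096 --parThreshold=0 to force the parallel path.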

Answer 2

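A few remarks to make the code actually run live in both `SEQ` and `PAR` modes:

Apart from a few implementation details, @bencray's assumption above about the presumed overhead costs of a PAR setup, which might favor purely serial processing in a SEQ setup, was not confirmed experimentally. It is also fair to note that, for obvious reasons, the distributed mode was not tested on the live <TiO>-IDE, and a tiny-scale distributed implementation would be more of an oxymoron than a scientifically meaningful experiment to run.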

Facts matter

The rowsums[] processing, even at the smallest possible scale of 2x2, was slower in SEQ mode than in PAR mode at a scale of 256x256.

Good job, Chapel team: the optimum alignment for harnessing the compact silicon resources to the max in [PAR] is indeed cool!

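For a record of the exact run-time performance, see the self-documented tables below, or do not hesitate to visit the live IDE run (ref. above) and experiment on your own.

Readers may also recognize extrinsic noise in the small-scale experiments, as O/S-related and hosted-IDE-related processes interfere with resource usage and influence the run-time performance of the <SECTION-UNDER-TEST> through adverse CPU / Lx-CACHE / memIO / process / et al conflicts. This in fact rules out using these measurements for any generalized interpretation.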
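One common way to dampen such noise, sketched here under stated assumptions, is to repeat the timed section several times and keep the fastest run. This sketch is not part of the measurements reported in this answer. It assumes a Chapel release that provides the stopwatch type in the Time module (the code in this answer uses the older Timer type), and the names trials and best, as well as the default sizes, are hypothetical.

```chapel
use Time;

config const n = 256;       // matrix dimension (hypothetical default)
config const trials = 5;    // number of repetitions (hypothetical default)

const indices = 1..n;
const adom = {indices, indices};
var A: [adom] int;

[(i,j) in adom] A[i, j] = i*j;        // Populate A

var rowsums: [indices] int;
var sw: stopwatch;
var best = max(real);                 // fastest run so far, in seconds

for t in 1..trials {
  sw.clear();                         // reset the accumulated time
  sw.start();
  forall i in indices do
    rowsums[i] = + reduce A[i, ..];
  sw.stop();
  best = min(best, sw.elapsed());     // elapsed() reports seconds
}

writeln("best of ", trials, " runs: ", best * 1e6, " [us] for ", n, " elements");
```

Taking the minimum does not remove the interference from a shared host, but it is less sensitive to a single disturbed run than one measurement taken on its own.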

Hope everyone enjoys the lovely `[TIME]` results Chapel delivers, demonstrated here across growing `[EXPSPACE]` scales of the computing landscape:

/* NOTE: Timer is the Time-module type available when this was written; newer Chapel releases provide the stopwatch type instead */
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ use Time;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_SEQ: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_PAR: Timer;

//nst max_idx =    123456;                   // seems to be too fat  for <TiO>-IDE to allocate                  <TiO>--   /wrappers/chapel: line 6: 24467 Killed
const max_idx =      4096;
//nst max_idx =      8192;                   // seems to be too long for <TiO>-IDE to let it run [SEQ] part     <TiO>--  The request exceeded the 60 second time limit and was terminated
//nst max_idx =     16384;                   // seems to be too long for <TiO>-IDE to let it run [PAR] part too <TiO>--   /wrappers/chapel: line 6: 12043 Killed
const indices = 1..max_idx;

const   adom  = {indices, indices};
var A: [adom] int;

[(i,j) in adom] A[i, j] = i*j;               // Populate A[,]

var rowsums: [indices] int;

/* ---------------------------------------------SECTION-UNDER-TEST--*/ aStopWATCH_SEQ.start();
for       i in indices {                     // SECTION-UNDER-TEST--
  rowsums[i] = + reduce(A[i, ..]);           // SECTION-UNDER-TEST--
}                                            // SECTION-UNDER-TEST--
/* ---------------------------------------------SECTION-UNDER-TEST--*/ aStopWATCH_SEQ.stop();

/* 
                                               <SECTION-UNDER-TEST> took     8973 [us] to run in [SEQ] mode for    2 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took    28611 [us] to run in [SEQ] mode for    4 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took    58824 [us] to run in [SEQ] mode for    8 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   486786 [us] to run in [SEQ] mode for   64 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  1019990 [us] to run in [SEQ] mode for  128 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  2010680 [us] to run in [SEQ] mode for  256 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  4154970 [us] to run in [SEQ] mode for  512 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  8260960 [us] to run in [SEQ] mode for 1024 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took 15853000 [us] to run in [SEQ] mode for 2048 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took 33126800 [us] to run in [SEQ] mode for 4096 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took      n/a [us] to run in [SEQ] mode for 8192 elements on <TiO>-IDE

   ============================================ */


/* ---------------------------------------------SECTION-UNDER-TEST--*/ aStopWATCH_PAR.start();
forall    i in indices {                     // SECTION-UNDER-TEST--
  rowsums[i] = + reduce(A[i, ..]);           // SECTION-UNDER-TEST--
}                                            // SECTION-UNDER-TEST--
/* ---------------------------------------------SECTION-UNDER-TEST--*/ aStopWATCH_PAR.stop();
/*
                                               <SECTION-UNDER-TEST> took  12131 [us] to run in [PAR] mode for    2 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   8095 [us] to run in [PAR] mode for    4 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   8023 [us] to run in [PAR] mode for    8 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   8156 [us] to run in [PAR] mode for   64 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   7990 [us] to run in [PAR] mode for  128 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took   8692 [us] to run in [PAR] mode for  256 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  15134 [us] to run in [PAR] mode for  512 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  16926 [us] to run in [PAR] mode for 1024 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took  30671 [us] to run in [PAR] mode for 2048 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took 105323 [us] to run in [PAR] mode for 4096 elements on <TiO>-IDE
                                               <SECTION-UNDER-TEST> took 292232 [us] to run in [PAR] mode for 8192 elements on <TiO>-IDE

   ============================================ */



writeln( rowsums,
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_SEQ.elapsed( Time.TimeUnits.microseconds ), " [us] to run in [SEQ] mode for ", max_idx, " elements on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_PAR.elapsed( Time.TimeUnits.microseconds ), " [us] to run in [PAR] mode for ", max_idx, " elements on <TiO>-IDE"
         );

This is what makes Chapel so great

Thank you for developing and improving such a great computing tool for HPC.
