Chapel 1.16.0 pre-release internal error (-999): what other approach is possible for an a priori domain-size-agnostic declaration?

Prototyping returned an internal error:

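While the purpose of this particular setup is neither interesting nor relevant here,
the compiler finished with the following debug notice;
any advice on syntax that avoids this collision would be much appreciated: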

<TiO>-IDE-Debug::____________________________________________________

.code.tio.chpl:77: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

Note: This source location is a guess.

Internal errors indicate a bug in the Chapel compiler ("It's us, not you"),
and we're sorry for the hassle.  We would appreciate your reporting this bug -- 
please see http://chapel.cray.com/bugs.html for instructions.  In the meantime,
the filename + line number above may be useful in working around the issue.


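(The compiler team will obviously have some additional interest in, and concerns about, the internal handling of the observed situation; that is not the main intent or subject of this post.)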


The code, live @ <TiO>-IDE::

/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ use Time;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_RND_GEN: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_LIN_ALG: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_MAT_REC: Timer;
/* ---------------------------------------SETUP-SECTION-UNDER-TEST--*/ var aStopWATCH_ARR_REC: Timer;
config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64); // real(32); // may've shown some byte-word alignment artifacts
       var   B: [aDomain] real(64); // real(32); // may've shown some byte-word alignment artifacts
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64); // real(32); // OK: must've been set real(64) to avoid /LinearAlgebra.chpl:535: error: type mismatch in assignment from real(64) to real(32)

/* -----------------------------------------------------------------*/ use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();
/* 

   ============================================ */

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {                      /*
                                         <Brad> If the domain/size of the array being returned cannot be described directly in the function prototype,
                                                I believe your best bet at present is to omit any description of the return type and lean on Chapel's type inference machinery
                                                to determine that you're returning an array

                                                >>> 

                                                */
     var                       arrC: [aDomain] real(64);
                                      /*
                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:77: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

                                                Note: This source location is a guess.

                                                Internal errors indicate a bug in the Chapel compiler ("It's us, not you"),
                                                and we're sorry for the hassle.  We would appreciate your reporting this bug -- 
                                                please see http://chapel.cray.com/bugs.html for instructions.  In the meantime,
                                                the filename + line number above may be useful in working around the issue.

                                                */

 /*  var                       arrC: [{1..arrA.dim( 1 ).length(),       // ..#arrA.dim( 1 ),
                                       1..arrB.dim( 2 ).length()        // ..#arrB.dim( 2 )
                                       }
                                      ] real(64);

                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:49: error: unresolved call '[domain(2,int(64),false)] real(64).dim(1)'
                                                $CHPL_HOME/modules/internal/ChapelArray.chpl:1215: note: candidates are: _domain.dim(d: int)
                                                $CHPL_HOME/modules/internal/ChapelArray.chpl:1218: note:                 _domain.dim(param d: int)

                                                */
  // forall      (row, col) in arrC.domain {    // [ROW:77] reports: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)
     forall      (row, col) in     aDomain {    // [ROW:78] reports: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999) 
        for                              i in arrA.dim( 2 ) do          // NB: dim() is defined on the domain, not on the array ( cf. the unresolved-call error above )
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64) // int(32) failed:
                                      //      <- config const n_power = 5 // .code.tio.chpl:64: error: unresolved call 'arr_REC_POW([domain(2,int(64),false)] real(64), int(64))'
                  ):    [ D] real(64) {     /* 
                                                <Brad> If the domain/size of the array being returned cannot be described directly in the function prototype,
                                                       I believe your best bet at present is to omit any description of the return type and lean on Chapel's type inference machinery
                                                       to determine that you're returning an array

                                                       >>> 

                                                <TiO>-IDE-Debug::____________________________________________________

                                                .code.tio.chpl:56: error: unable to resolve return type of function 'arr_REC_POW'
                                                .code.tio.chpl:56: In function 'arr_REC_POW':
                                                .code.tio.chpl:61: error: called recursively at this point


                                                // The ? operator is called the query operator, and is used to take
                                                // undetermined values like tuple or array sizes and generic types.
                                                // For example, taking arrays as parameters. The query operator is used to
                                                // determine the domain of A. This is useful for defining the return type,
                                                // though it's not required.

                                                //                  (c) 2017 Ian J. Bertolacci, Ben Harshbarger
                                                // Originally contributed by Ian J. Bertolacci, and updated by 8 contributor(s).

                                                        >>> https://learnxinyminutes.com/docs/chapel/>
                                                */

     if      n <= 1 then return        arrM;                 // n <= 1, so the result is arrM^n ( n < 1 returned arrM^(n+1) )
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();

   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();
/* 

   ============================================ */
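Regarding the question in the title: one approach, in the spirit of Brad's advice quoted in the code above, is to size the result from the argument domains obtained through the `?DX` / `?DY` queries and to leave the return type to inference. The sketch below assumes 2-D `real` arrays; `matMulAgnostic` is a hypothetical name (not a LinearAlgebra routine), and the code has not been checked against the 1.16 pre-release used here.

```chapel
// Hypothetical helper, not part of the LinearAlgebra module:
// returns C = X * Y for two 2-D real arrays of any compatible size.
proc matMulAgnostic(X: [?DX] real, Y: [?DY] real)
    where DX.rank == 2 && DY.rank == 2 {
  // dims() yields a tuple of ranges; destructuring it avoids depending on
  // whether dim() numbers dimensions from 1 (1.16) or from 0 (1.22+).
  const (rows, inner)  = DX.dims();
  const (innerY, cols) = DY.dims();
  if inner != innerY then
    halt("inner dimensions differ: ", inner, " vs ", innerY);

  // The result is sized from the actual arguments, not from a global domain.
  var Z: [rows, cols] real;
  forall (i, j) in Z.domain with (ref Z) {
    var acc = 0.0;
    for k in inner do
      acc += X[i, k] * Y[k, j];
    Z[i, j] = acc;
  }
  return Z;   // no declared return type: left to Chapel's type inference
}

// Usage: a 2x3 by 3x2 product yields a 2x2 result.
var P: [1..2, 1..3] real;
var Q: [1..3, 1..2] real;
for (i, j) in P.domain do
  P[i, j] = i: real;        // row i is filled with the value i
Q = 1.0;                    // all ones
const PQ = matMulAgnostic(P, Q);
writeln(PQ.domain);
writeln(PQ);
```

Because the query is made on the domain `DX` rather than on the array, this also sidesteps the array-vs-domain `.dim()` pitfall behind the unresolved-call error above.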

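Reduced for <TiO>-IDE (regrettably without the code-folding productivity available in other IDE environments; agreed with Ben that the experimental self-documenting layout under review is a matter of personal preference)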

Still:

chpl:30: internal error: IMP0586 chpl Version 1.16.0 pre-release (-999)

where chpl:30: is:

forall      (row, col) in    aDomain {

>>> click-through with the updated code: no syntax warnings, but still (-999) @<TiO>-IDE

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {

     var     arrC: [aDomain] real(64);

     forall      (row, col) in    aDomain {
             arrC[row, col]  = 0;
        for                              i in arrA.dim( 2 ) do          // NB: dim() is defined on the domain, not on the array; fixed later as arrA.domain.dim( 2 )
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64)
                  ):    [ D] real(64) {

     if      n <= 1 then return        arrM;                 // n <= 1, so the result is arrM^n ( n < 1 returned arrM^(n+1) )
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();
   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();

proc mat_REC_POW( matM: [] real(64),
                  n:        int(64)
                  ) {

     if      n <= 1 then return                   matM;      // n <= 1, so the result is matM^n ( n < 1 returned matM^(n+1) )
     else               return LinearAlgebra.dot( matM, mat_REC_POW( matM, n - 1 ) );
}

/* -----------------------------------------------re-fill-m?[,]-----*/
    Random.fillRandom(  A ); mA = Matrix( A ); // re-fill mA[,]
    Random.fillRandom(  B ); mB = Matrix( B ); // re-fill mB[,]
/* -----------------------------------------------re-fill-m?[,]-----*/

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.start();
   forall  (row, col)              in mS.domain {
         mS[row, col]  = mat_REC_POW( mA, n_power )[row,col]
                       + mat_REC_POW( mB, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.stop();

/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom() took ",          aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
         );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id );

Mission accomplished! Domain helpers were used; the recursion still needs some investigation.

Many thanks to all who helped get this far.
The earlier W.I.P. interim comments are kept for educational purposes.

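[OK]: the code now passes the compiler's initial syntax checks,

but BLAS+ATLAS could not be made to work as the v1.15/.16 documentation suggests
(this is being resolved with the help of the <TiO>-IDE administrators)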

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      1000;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();

proc arrMUL( arrA: [?DA] real(64),
             arrB: [?DB] real(64)
             ) {

     var     arrC: [aDomain] real(64);

     forall      (row, col) in    aDomain {
             arrC[row, col]  = 0;
     // for                              i in arrA.dim( 2 ) do          // ERROR: calls .dim(2) on the array; dim is defined only on the domain, not on the array
        for                              i in arrA.domain.dim( 2 ) do   // FIX:   calls .dim(2) on the array's domain
             arrC[row, col] += arrA[row, i]
                             * arrB[     i, col];
     }
     return  arrC;
}

proc arr_REC_POW( arrM: [?D] real(64),
                  n:          int(64)
                  ):    [ D] real(64) {

     if      n <= 1 then return        arrM;                 // n <= 1, so the result is arrM^n ( n < 1 returned arrM^(n+1) )
     else               return arrMUL( arrM, arr_REC_POW( arrM, n - 1 ) );
}

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.start();
   forall (row, col)             in S.domain {
         S[row, col] = arr_REC_POW( A, n_power )[row,col]
                     + arr_REC_POW( B, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_ARR_REC.stop();

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();

proc mat_REC_POW( matM: [?Dm] real(64),
                  n:           int(64)
                  ):    [ Dm] real(64) {

 //  if      n < 1 then return                    matM;                                           // chpl:65: error: unable to resolve return type of function 'mat_REC_POW'
     if      n <= 1 then return LinearAlgebra.dot( matM, LinearAlgebra.eye( matM.shape[1] ) );    // [DID NOT HELP]: added so as to help the compiler infer the return type; n <= 1 so the result is matM^n
     else               return LinearAlgebra.dot( matM,       mat_REC_POW( matM, n - 1 ) );       // chpl:70: error: called recursively at this point
}

/* -----------------------------------------------re-fill-m?[,]-----*/
    Random.fillRandom(  A ); mA = Matrix( A ); // re-fill mA[,]
    Random.fillRandom(  B ); mB = Matrix( B ); // re-fill mB[,]
/* -----------------------------------------------re-fill-m?[,]-----*/

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.start();
   forall  (row, col)              in mS.domain {
         mS[row, col]  = mat_REC_POW( mA, n_power )[row,col]
                       + mat_REC_POW( mB, n_power )[row,col];
   }
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_MAT_REC.stop();

/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom() took ",          aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE",
        "\n <SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
         );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id );
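The remaining recursion issue (`unable to resolve return type ... called recursively at this point`) can be sidestepped by not recursing at all. A minimal sketch, assuming square 2-D `real` arrays whose row and column ranges coincide (e.g. `{1..n, 1..n}`) and `n >= 0`; `sqMatMul` and `matPowIter` are hypothetical names, not LinearAlgebra routines, and the code has not been tried on 1.16:

```chapel
// Hypothetical helper (not part of LinearAlgebra): Z = X * Y for square
// 2-D real arrays whose row and column ranges coincide, e.g. {1..n, 1..n}.
proc sqMatMul(X: [?D] real, Y: [] real) where D.rank == 2 {
  const (rows, inner) = D.dims();
  if rows != inner then
    halt("sqMatMul expects a square domain, got ", D);
  var Z: [D] real;
  forall (i, j) in D with (ref Z) {
    var acc = 0.0;
    for k in inner do
      acc += X[i, k] * Y[k, j];
    Z[i, j] = acc;
  }
  return Z;
}

// Hypothetical helper: M^n computed iteratively. No procedure calls itself,
// so there is no recursive return type left for the compiler to infer.
proc matPowIter(M: [?D] real, n: int) where D.rank == 2 {
  if n < 0 then
    halt("matPowIter expects n >= 0, got ", n);
  var R: [D] real;
  forall (i, j) in D with (ref R) do
    R[i, j] = if i == j then 1.0 else 0.0;   // R = M^0 = identity
  for s in 1..n do                            // n products: R = M^n
    R = sqMatMul(R, M);
  return R;
}

// Usage: twoI = 2*I, so twoI^3 = 8*I.
var twoI: [1..2, 1..2] real;
for (i, j) in twoI.domain do
  twoI[i, j] = if i == j then 2.0 else 0.0;
writeln(matPowIter(twoI, 3));
```

This performs n products starting from the identity; exponentiation by repeated squaring would need only O(log n) of them.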

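BLAS + ATLAS protest when we try to compile/link them >>> @ <TiO>-IDE, even though the administrators confirmed they are installed and reviewed/verified that both modules are in place ([OK]: resolved by the <TiO>-IDE site administrator and Brad; many thanks for that):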

/usr/bin/ld: cannot find -lblas
/usr/bin/ld: cannot find -latlas

The <TiO>-IDE administrator + Brad's advice helped to get it working.

Single-locale process performance (threaded ATLAS build):

.fillRandom()         took  582125 [us] for A[,], B[,] having 2000000-real(64) elements in total.    
 <SECTION-UNDER-TEST> took 2702530 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE

--print-commands reported these compiler switches:

gcc    -I/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/include
       -I/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/include
       -DCHPL_TASKS_MODEL_H=\"tasks-qthreads.h\"
       -DCHPL_THREADS_MODEL_H=\"threads-none.h\"
       -DCHPL_WIDE_POINTER_STRUCT
       -DCHPL_JEMALLOC_PREFIX=chpl_je_
       -DCHPL_HAS_GMP
       -Wno-unused
       -Wno-uninitialized
       -Wno-pointer-sign
       -Wno-tautological-compare
       -Wno-stringop-overflow
       -Wno-strict-overflow
       -c
       -o /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp.o
       -I/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/include
       -I.
       -I/opt/chapel//lib/chapel/1.16/runtime/include/localeModels/flat
       -I/opt/chapel//lib/chapel/1.16/runtime/include/localeModels
       -I/opt/chapel//lib/chapel/1.16/runtime/include/comm/none
       -I/opt/chapel//lib/chapel/1.16/runtime/include/comm
       -I/opt/chapel//lib/chapel/1.16/runtime/include/tasks/qthreads
       -I/opt/chapel//lib/chapel/1.16/runtime/include/threads/none
       -I/opt/chapel//lib/chapel/1.16/runtime/include
       -I/opt/chapel//lib/chapel/1.16/runtime/include/qio
       -I/opt/chapel//lib/chapel/1.16/runtime/include/atomics/intrinsics
       -I/opt/chapel//lib/chapel/1.16/runtime/include/mem/jemalloc
       -I/opt/chapel//lib/chapel/1.16/third-party/utf8-decoder
       -I/opt/chapel/share/chapel/1.16/runtime//../build/runtime/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none/include
       -I/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/include
       -I/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/include
       -I/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/include /tmp/chpl-runner-15040.deleteme/_main.c

g++    -L/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/qthread/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/gmp/install/linux64-gnu-native/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/re2/install/linux64-gnu-native/lib
       -Wl,-rpath,/opt/chapel//lib/chapel/1.16/third-party/re2/install/linux64-gnu-native/lib
       -o /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp
       -L/opt/chapel//lib/chapel/1.16/runtime/lib/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none
       /tmp/chpl-runner-15040.deleteme/.bin.tio.tmp.o
       /opt/chapel//lib/chapel/1.16/runtime/lib/linux64/gnu/arch-native/loc-flat/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-intrinsics/gmp/hwloc/re2/wide-struct/fs-none/main.o
       -lchpl
       -lm
       -lblas -L/usr/lib64/atlas
       -ltatlas
       -lgmp
       -lchpl
       -lqthread -L/opt/chapel//lib/chapel/1.16/third-party/hwloc/install/linux64-gnu-native-flat/lib
       -L/opt/chapel//lib/chapel/1.16/third-party/jemalloc/install/linux64-gnu-native/lib
       -ljemalloc
       -lhwloc
       -lm
       -lre2
       -lpthread

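Last but not least, let me share a few final remarks on the performance data, the setup overheads and the boundaries that the series of experiments could explore.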

While the largest [PAR] powers were outside the scope of testability (administratively not available, for obvious reasons, on the publicly sponsored <TiO>-IDE infrastructure) and might be investigated further on more realistic computing devices, for example those available within Cray's internal resources used by Cray's Chapel initiative, the essence of both benefits, the language's expressiveness and the actual state of the language implementation, is impressive.
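One way to write down the overhead-aware Amdahl's Law formulation referred to in the THREADED-MODE code comment is sketched here. $T_{\mathrm{SEQ}}$ is the `-lsatlas` run time and $o$ the roughly constant extra setup overhead of `-ltatlas` (about 440 to 500 ms per that comment); $p$, the fraction of $T_{\mathrm{SEQ}}$ that ATLAS can parallelize, and $N$, the number of processing units actually available, are assumptions introduced for illustration, not values measured in this post.

$$
S(N) = \frac{T_{\mathrm{SEQ}}}{T_{\mathrm{PAR}}(N)}
     = \frac{T_{\mathrm{SEQ}}}{(1-p)\,T_{\mathrm{SEQ}} + \dfrac{p\,T_{\mathrm{SEQ}}}{N} + o}
\qquad
S(N) > 1 \iff o < p\,T_{\mathrm{SEQ}}\left(1 - \frac{1}{N}\right)
$$

With $N = 1$, as the `{ REAL:1 | VIRT:1 | TEOR:1 } PUnits` locale report for tio2 suggests, the right-hand side is zero, so under this model the threaded build cannot beat the serial one there; the trade-off only becomes worth checking on hosts with more processing units.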

Some other issues to investigate could be:


Acknowledgements

Thanks again to Dennis @<TiO>-IDE support and to Brad @Cray, and best wishes to the team for pushing and extending this great software project to make it ever better.


writeln(// "______________________________________ChplCode.<-lsatlas> implementation___________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE" );
           "______________________________________ChplCode.<-ltatlas> implementation_________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE" );
        /*
           As the experimentally collected performance data below show,
           there is an approximately constant,
           matrix-scale-invariant,
           additional overhead of about +440 to +500 [ms]
           for
           the THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE,
           believed to be
           associated with setting up a thread pool and related processing pre-arrangements.
           This overhead
           ought to be accounted for in
           an overhead-aware Amdahl's Law formulation, to pre-validate
           whether a [PAR] build, using -ltatlas,
           or        a [SEQ] build, using -lsatlas, of the [LinearAlgebra] module support
           will yield faster processing times.
           */

                    use Time;

var aStopWATCH_RND_GEN: Time.Timer;
var aStopWATCH_LIN_ALG: Time.Timer;
var aStopWATCH_MAT_REC: Time.Timer;
var aStopWATCH_ARR_REC: Time.Timer;

config const n_power =         5;
config const L_size  =      2600;
       const indices = 1..L_size;
       const aDomain = {indices, indices};

       var   A: [aDomain] real(64);
       var   B: [aDomain] real(64);
       const dtype =    "-real(64)";
       var   S: [aDomain] real(64);

use Random;
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.start();
    Random.fillRandom(  A );
    Random.fillRandom(  B );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_RND_GEN.stop();
writeln( ".fillRandom()        took ",                                     aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ),
         " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );

     use LinearAlgebra;
var mA = LinearAlgebra.Matrix( A );
var mB = LinearAlgebra.Matrix( B );
var mS = LinearAlgebra.Matrix( S );

/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.start();
    mS = LinearAlgebra.matPlus( LinearAlgebra.matPow( mA, n_power ),
                                LinearAlgebra.matPow( mB, n_power )
                                );
/* ---------------------------------------------SECTION-UNDER-TEST--*/     aStopWATCH_LIN_ALG.stop();
/* |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| PERF--*/

writeln( ".fillRandom()        took ", aStopWATCH_RND_GEN.elapsed( Time.TimeUnits.microseconds ), " [us] for A[,], B[,] having ", 2 * ( L_size * L_size ), dtype, " elements in total." );
writeln(
       "\n<SECTION-UNDER-TEST> took ", aStopWATCH_LIN_ALG.elapsed( Time.TimeUnits.microseconds ), " [us] in [LIN_ALG] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
   // ,"\n<SECTION-UNDER-TEST> took ", aStopWATCH_MAT_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [MAT_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
   // ,"\n<SECTION-UNDER-TEST> took ", aStopWATCH_ARR_REC.elapsed( Time.TimeUnits.microseconds ), " [us] in [ARR_REC] mode ( A^n + B^n ) for [", L_size, ",", L_size, "] on <TiO>-IDE"
        );
/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| INF--*/

writeln(                     "<TiO>-IDE-LocaleSpace is: ", LocaleSpace, " massive. Code is executing [here], being Locale ", here.id  );
for                                                i in    LocaleSpace do
    writeln(                 "          Locale #", i, "'s ID is: ", Locales[i].id,                                            "\n                                having a name of <_",
                                                                    Locales[i].name,                                        "_>\n                                having { REAL:"              ,
                                                                 // Locales[i].numPUs( logical = false, accessible =  true ),   " | VIRT:"                       ,
                                                                    Locales[i].numPUs(           false,               true ),   " | VIRT:"                       ,
                                                                 // Locales[i].numPUs( logical =  true, accessible =  true ),   " | TEOR:"                       ,
                                                                    Locales[i].numPUs(            true,               true ),   " | TEOR:"                       ,
                                                                 // Locales[i].numPUs( logical =  true, accessible = false ),   " } PUnits"                      ,
                                                                    Locales[i].numPUs(            true,              false ),   " } PUnits\n                                having max ",
                                                                    Locales[i].maxTaskPar,                                      " 'just'-[CONCURRENT]-tasks\n                                having max ",
                                                                    Locales[i].callStackSize,                                   "-callStackSIZE."
                                                                    );

/* ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| RES:

.fillRandom()        took      560773 [us] for A[,], B[,] having  2000000-real(64) elements in total. <BEST-CASE>s IN SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took     2521920 [us] for A[,], B[,] having  8000000-real(64) elements in total.
.fillRandom()        took     2717450 [us] for A[,], B[,] having  9680000-real(64) elements in total.
.fillRandom()        took     3630820 [us] for A[,], B[,] having 11520000-real(64) elements in total.

.fillRandom()        took     4429820 [us] for A[,], B[,] having 13520000-real(64) elements in total.
.fillRandom()        took     4048440 [us] for A[,], B[,] having 13520000-real(64) elements in total. ( IN THREADED-MODE ) was faster, but not systematically

.fillRandom()        took     4793110 [us] for A[,], B[,] having 15680000-real(64) elements in total. 
.fillRandom()        took     5055060 [us] for A[,], B[,] having 15680000-real(64) elements in total. ( IN THREADED-MODE )

.fillRandom()        took     5630540 [us] for A[,], B[,] having 18000000-real(64) elements in total.



<TiO>-IDE-LocaleSpace is: {0..0} massive. Code is executing [here], being Locale 0
          Locale #0's ID is: 0
                                having a name of <_tio2_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits
                                having max 4 'just'-[CONCURRENT]-tasks
                                having max 8388608-callStackSIZE.

<SECTION-UNDER-TEST> took    15110000 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2000,2000] on <TiO>-IDE <BEST-CASE>s IN SERIAL-MODE
<SECTION-UNDER-TEST> took    17880300 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2200,2200] on <TiO>-IDE
<SECTION-UNDER-TEST> took    25094100 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2400,2400] on <TiO>-IDE
<SECTION-UNDER-TEST> took    31550900 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2600,2600] on <TiO>-IDE
<SECTION-UNDER-TEST> took    32996500 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2600,2600] on <TiO>-IDE
<SECTION-UNDER-TEST> took    34390400 [us] in [LIN_ALG] mode ( A^n + B^n ) for [2800,2800] on <TiO>-IDE
<SECTION-UNDER-TEST> KILL-ed                                               for [3000,3000] on <TiO>-IDE, having 18,000,000-real(64) elements .fillRandom()-ed in ~ 5.6 [s] time.

______________________________________ChplCode.<-lsatlas> implementation___________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
                                                  ^________________________________________________________ SERIAL-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took 5.60773e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.62970e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.64366e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.70291e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.75086e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.85121e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.25645e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.77903e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.96932e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.98700e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.

<SECTION-UNDER-TEST> took 2.06538e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.07902e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.08725e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.12497e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.13071e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.22075e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.28035e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.32674e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.33844e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.35908e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE


______________________________________ChplCode.<-ltatlas> implementation_________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
                                                  ^______________________________________________________ THREADED-MODE [ATLAS] SUPPORT FOR [LinearAlgebra] MODULE
.fillRandom()        took 5.68652e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.73797e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.74911e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.81389e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.87079e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 5.92182e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.20989e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.62606e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.69875e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.
.fillRandom()        took 6.71270e+05 [us] for A[,], B[,] having 2000000-real(64) elements in total.

<SECTION-UNDER-TEST> took 2.53459e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.57695e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.59966e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.61859e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.70356e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.76325e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.85588e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.92058e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.92204e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE
<SECTION-UNDER-TEST> took 2.97887e+06 [us] in [LIN_ALG] mode ( A^n + B^n ) for [1000,1000] on <TiO>-IDE


<TiO>-IDE-LocaleSpace is: {0..0} massive.             +--------------------------------<_tio2_>
Code is executing [here], being Locale 0              V
          Locale #0's ID is: 0, having a name of <_tio2_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits,
                                having max 4 just-[CONCURRENT]-tasks,
                                having 8388608-callStackSIZE.

                                                      +--------------------------------<_tio3_>
                                                      V
...                             having a name of <_tio3_>
                                having { REAL:1 | VIRT:1 | TEOR:1 } PUnits
                                having max 4 'just'-[CONCURRENT]-tasks
                                having max 8388608-callStackSIZE.


*/