PMPI and OTF2: linking C code in a C++ program
I have a C++ program generated by wrap.py. wrap.py is used to produce wrappers for MPI programs. It redirects every normal MPI call to the corresponding PMPI call so that the call can be intercepted, e.g. for performance analysis. The generated code can be downloaded here. I use otf2 to trace the MPI program.
Code explanation:
// test4.cpp
__attribute__((constructor)) void init(void)
{
    if(!is_init)
    {
        archive = OTF2_Archive_Open( "./",
                                     "ArchiveTest",
                                     OTF2_FILEMODE_WRITE,
                                     1024 * 1024 /* event chunk size */,
                                     4 * 1024 * 1024 /* def chunk size */,
                                     OTF2_SUBSTRATE_POSIX,
                                     OTF2_COMPRESSION_NONE );
        is_init = true;
    }
}
__attribute__((destructor)) void fini(void)
{
    if(is_init)
    {
        OTF2_Archive_Close( archive );
        is_init = false;
    }
}
I want to compile the code into a .so file, so that the constructor is called when the library is loaded and the destructor is called when the .so is unloaded.
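The load/unload behaviour described above can be sketched in isolation. This is a minimal illustration that assumes GCC or Clang, since `__attribute__((constructor))` and `__attribute__((destructor))` are compiler extensions. The names `tracer_loaded`, `on_load`, `on_unload` and `tracer_is_loaded` are made up for the example and are not part of test4.cpp.

```cpp
#include <cstdio>

// Hypothetical flag. It is constant-initialized, so it already holds a valid
// value before any constructor function runs.
static bool tracer_loaded = false;

// GCC/Clang extension: runs when the shared object (or executable) is loaded,
// before main() starts.
__attribute__((constructor)) static void on_load() {
    tracer_loaded = true;
    std::fputs("tracer: loaded\n", stderr);
}

// Runs when the object is unloaded or the process exits normally.
__attribute__((destructor)) static void on_unload() {
    tracer_loaded = false;
    std::fputs("tracer: unloaded\n", stderr);
}

// Lets other code check whether the constructor has already run.
bool tracer_is_loaded() { return tracer_loaded; }
```

Calling `tracer_is_loaded()` from `main()` should return true, because the constructor runs before `main()` is entered.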
Following the official otf2 documentation here, I compile the program:
mpic++ -fpic -c `otf2-config --cflags` -o test4.o test4.cpp
mpic++ -shared -o libtest4.so `otf2-config --ldflags` `otf2-config --libs` test4.o
Expanding the command lines above gives:
mpic++ -fpic -c -I/usr/include -o test4.o test4.cpp
mpic++ -shared -o libtest4.so -L/usr/lib -lotf2 -lm test4.o
The intercepted MPI program comes from here.
Interception:
$ mpirun -n 2 -x LD_PRELOAD=./libtest4.so ./send_recv
./send_recv: symbol lookup error: ./libtest4.so: undefined symbol: OTF2_Archive_Open
./send_recv: symbol lookup error: ./libtest4.so: undefined symbol: OTF2_Archive_Open
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[20246,1],0]
Exit code: 127
--------------------------------------------------------------------------
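It looks like mixing C and C++ causes the problem: the linker does not seem to generate the symbols for the C functions OTF2_Archive_Open and OTF2_Archive_Close correctly.
I added two declarations to tell the linker that these are C functions (download the modified program here):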
_EXTERN_C_ OTF2_Archive* OTF2_Archive_Open ( const char * archivePath,
                                             const char * archiveName,
                                             const OTF2_FileMode fileMode,
                                             const uint64_t chunkSizeEvents,
                                             const uint64_t chunkSizeDefs,
                                             const OTF2_FileSubstrate fileSubstrate,
                                             const OTF2_Compression compression
                                           );
_EXTERN_C_ OTF2_ErrorCode OTF2_Archive_Close ( OTF2_Archive * archive );
But the problem above persists. Any suggestions?
Update 1:
OTF2 provides a .a file, not a .so file.
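A note on the declarations: the sketch below shows what a macro like _EXTERN_C_ typically does, assuming it expands to `extern "C"` when compiled as C++ (the macro name `MY_EXTERN_C` and both functions are hypothetical). The error message already reports the plain name OTF2_Archive_Open rather than a mangled C++ name, which suggests that name mangling is probably not the cause here.

```cpp
// Assumed shape of an _EXTERN_C_-style macro (hypothetical name MY_EXTERN_C).
#ifdef __cplusplus
#define MY_EXTERN_C extern "C"
#else
#define MY_EXTERN_C
#endif

// C linkage: the object file refers to the plain symbol name `add_c`.
MY_EXTERN_C int add_c(int a, int b);

// C++ linkage: the symbol name is mangled, e.g. `_Z7add_cppii` under the
// Itanium C++ ABI used by GCC and Clang on Linux.
int add_cpp(int a, int b);

MY_EXTERN_C int add_c(int a, int b) { return a + b; }
int add_cpp(int a, int b) { return a + b; }
```

Both functions behave the same when called. Only the symbol names that `nm` shows for the object file differ.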
$ nm /usr/lib/libotf2.a| grep -i OTF2_Archive_Open
U otf2_archive_open
0000000000000000 T OTF2_Archive_Open
U otf2_archive_open_def_files
00000000000032e0 T OTF2_Archive_OpenDefFiles
U otf2_archive_open_evt_files
00000000000030e0 T OTF2_Archive_OpenEvtFiles
U otf2_archive_open_snap_files
00000000000034e0 T OTF2_Archive_OpenSnapFiles
U OTF2_Archive_Open
0000000000001180 T otf2_archive_open
0000000000005a40 T otf2_archive_open_def_files
U OTF2_Archive_OpenDefFiles
0000000000005880 T otf2_archive_open_evt_files
U OTF2_Archive_OpenEvtFiles
0000000000005c00 T otf2_archive_open_snap_files
U OTF2_Archive_OpenSnapFiles
$ ldd ./libtest4.so
linux-vdso.so.1 => (0x00007ffe3a6ce000)
libmpi_cxx.so.1 => /usr/lib/libmpi_cxx.so.1 (0x00007f4757d67000)
libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f4757a91000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f475770e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f47574f8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f475712e000)
libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f4756f1e000)
libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f4756ca4000)
libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f4756a07000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f47567e9000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f47564e0000)
/lib64/ld-linux-x86-64.so.2 (0x00005620bef03000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f47562dc000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f47560a1000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4755e99000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f4755c96000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f4755a8a000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f4755880000)
$ nm ./libtest4.so | grep -i OTF2_Archive_Open
U OTF2_Archive_Open
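Strangely, I do not see libotf2.a anywhere in the ldd output. However, if you try the standard otf2 MPI writer example from their website, it works, and the ldd output for that example does not contain libotf2.a either.
You can find the example here.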
The order of linking matters. Your own code (here test4.o) must come before the libraries it links against, for example:
mpic++ -shared test4.o -o libtest4.so `otf2-config --ldflags` `otf2-config --libs`
The linker resolves undefined symbols from left to right. See this answer for details.
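To make the left-to-right rule concrete, here is a toy model of a single-pass link. It is a simplified sketch, not how any real linker is implemented: it treats a static archive as one unit instead of individual members, and the `Input` struct and `unresolved_after_link` function are invented for the illustration. It also shows why the original command still produced a library: with -shared, undefined symbols are allowed by default at link time, so OTF2_Archive_Open simply stays undefined (the `U` entry in the nm output for libtest4.so) until the runtime lookup fails.

```cpp
#include <set>
#include <string>
#include <vector>

// One linker input: an object file (always linked in) or a static archive
// (used only if it defines a symbol that is still undefined at that point).
struct Input {
    std::string name;
    bool is_archive;
    std::set<std::string> defines;  // symbols this input provides
    std::set<std::string> needs;    // symbols this input references
};

// Returns the symbols still undefined after one left-to-right pass.
std::set<std::string> unresolved_after_link(const std::vector<Input>& inputs) {
    std::set<std::string> defined;
    std::set<std::string> undefined;
    for (const Input& in : inputs) {
        if (in.is_archive) {
            bool wanted = false;
            for (const std::string& s : in.defines) {
                if (undefined.count(s)) { wanted = true; break; }
            }
            if (!wanted) continue;  // nothing pending needs it, so it is skipped
        }
        for (const std::string& s : in.defines) {
            defined.insert(s);
            undefined.erase(s);
        }
        for (const std::string& s : in.needs) {
            if (!defined.count(s)) undefined.insert(s);
        }
    }
    return undefined;
}
```

With a libotf2.a entry placed before test4.o, the archive is skipped because nothing needs it yet, and both OTF2 symbols remain unresolved. With test4.o first, the archive is pulled in and nothing is left undefined.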
If libotf2.a was not built with -fPIC, that may still not work. I recommend configuring otf2 with --enable-shared and using the .so instead.