torch.nn.CrossEntropyLoss().ignore_index crashes when the transformers library is imported

I'm using the layoutlm GitHub repo, which requires Python 3.6 and transformers 2.9.0. I created a conda environment:

    name: env_test
    channels:
    - defaults
    - conda-forge
    dependencies:
    - python=3.6
    - pip=20.3.3
    - pytorch=1.4.0
    - cudatoolkit=10.1
    - pip:
      - transformers==2.9.0

I have the following test.py code to reproduce the problem:

    import sys

    import torch
    from torch.nn import CrossEntropyLoss

    from transformers import (
        BertConfig,
        __version__
    )

    print(sys.version)
    print(torch.__version__)
    print(__version__)
    CrossEntropyLoss().ignore_index  # segfaults here once transformers has been imported

    print("success!")

Importing the transformers library results in a segmentation fault (core dumped) when calling CrossEntropyLoss().ignore_index:
$python test.py 
3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) 
[GCC 7.3.0]
1.4.0
2.9.0
Segmentation fault (core dumped)
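One way to confirm that the crash depends on importing transformers, rather than on CrossEntropyLoss alone, is to run both variants in fresh child interpreters and inspect their exit status, so a segfault cannot take down the checking script itself. A minimal sketch, assuming a POSIX system (where a negative return code means the child was killed by that signal); the two snippet strings are illustrative variations of test.py, not part of the original post:

```python
import signal
import subprocess
import sys

# Illustrative variants of test.py: the same CrossEntropyLoss call,
# with and without importing transformers beforehand.
VARIANTS = {
    "torch only": (
        "import torch; from torch.nn import CrossEntropyLoss; "
        "CrossEntropyLoss().ignore_index"
    ),
    "torch + transformers": (
        "import torch; from torch.nn import CrossEntropyLoss; "
        "import transformers; CrossEntropyLoss().ignore_index"
    ),
}


def describe_exit(returncode):
    """Translate a child-process return code into a readable status (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # A negative return code means the child was terminated by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_variant(code):
    """Run a snippet in a fresh interpreter so a segfault only kills the child."""
    result = subprocess.run([sys.executable, "-c", code])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    for name, code in VARIANTS.items():
        print("%-22s %s" % (name, run_variant(code)))
```

If the import really is the trigger, only the second variant should report `killed by SIGSEGV`.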

I tried to investigate a bit, but I can't really see where the problem is:

gdb python
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) r test.py 
Starting program: /home/jupyter/.conda-env/env_test/bin/python test.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) 
[GCC 7.3.0]
1.4.0
2.9.0

Program received signal SIGSEGV, Segmentation fault.
0x00007f97000055fb in ?? ()
(gdb) where
#0  0x00007f97000055fb in ?? ()
#1  0x00007f97f4755729 in void pybind11::cpp_function::initialize<void (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), void, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
   from /home/jupyter/.conda-env/env_test/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#2  0x00007f97f436bca6 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/jupyter/.conda-env/env_test/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#3  0x000055fbadd73a14 in _PyCFunction_FastCallDict () at /tmp/build/80754af9/python_1599604603603/work/Objects/methodobject.c:231
#4  0x000055fbaddfba5c in call_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4851
#5  0x000055fbade1e25a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:3335
#6  0x000055fbaddf5c1b in _PyFunction_FastCall (globals=<optimized out>, nargs=1, args=<optimized out>, co=<optimized out>) at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4933
#7  fast_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4968
#8  0x000055fbaddfbb35 in call_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4872
#9  0x000055fbade1e25a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:3335
#10 0x000055fbaddf5166 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4166
#11 0x000055fbaddf5e51 in fast_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4992
#12 0x000055fbaddfbb35 in call_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4872
#13 0x000055fbade1e25a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:3335
#14 0x000055fbaddf5166 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4166
#15 0x000055fbaddf5e51 in fast_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4992
#16 0x000055fbaddfbb35 in call_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4872
#17 0x000055fbade1e25a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:3335
#18 0x000055fbaddf5166 in _PyEval_EvalCodeWithName () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4166
#19 0x000055fbaddf632c in _PyFunction_FastCallDict () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:5084
#20 0x000055fbadd73ddf in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1599604603603/work/Objects/abstract.c:2310
#21 0x000055fbadd78873 in _PyObject_Call_Prepend () at /tmp/build/80754af9/python_1599604603603/work/Objects/abstract.c:2373
#22 0x000055fbadd7381e in PyObject_Call () at /tmp/build/80754af9/python_1599604603603/work/Objects/abstract.c:2261
#23 0x000055fbaddcc88b in slot_tp_init () at /tmp/build/80754af9/python_1599604603603/work/Objects/typeobject.c:6420
#24 0x000055fbaddfbd97 in type_call () at /tmp/build/80754af9/python_1599604603603/work/Objects/typeobject.c:915
#25 0x000055fbadd73bfb in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1599604603603/work/Objects/abstract.c:2331
#26 0x000055fbaddfbbae in call_function () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4875
#27 0x000055fbade1e25a in _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:3335
#28 0x000055fbaddf6969 in _PyEval_EvalCodeWithName (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwstep=2, kwcount=<optimized out>, kwargs=0x0, kwnames=0x0, argcount=0, args=0x0, 
    locals=0x7f98035bf1f8, globals=0x7f98035bf1f8, _co=0x7f980357aae0) at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4166
#29 PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:4187
#30 0x000055fbaddf770c in PyEval_EvalCode (co=co@entry=0x7f980357aae0, globals=globals@entry=0x7f98035bf1f8, locals=locals@entry=0x7f98035bf1f8) at /tmp/build/80754af9/python_1599604603603/work/Python/ceval.c:731
#31 0x000055fbade77574 in run_mod () at /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:1025
#32 0x000055fbade77971 in PyRun_FileExFlags () at /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:978
#33 0x000055fbade77b73 in PyRun_SimpleFileExFlags () at /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:419
#34 0x000055fbade77c7d in PyRun_AnyFileExFlags () at /tmp/build/80754af9/python_1599604603603/work/Python/pythonrun.c:81
#35 0x000055fbade7b663 in run_file (p_cf=0x7fff210dc16c, filename=0x55fbaefa6dc0 L"test.py", fp=0x55fbaefda800) at /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:340
#36 Py_Main () at /tmp/build/80754af9/python_1599604603603/work/Modules/main.c:811
#37 0x000055fbadd4543e in main () at /tmp/build/80754af9/python_1599604603603/work/Programs/python.c:69
#38 0x00007f9803fd6bf7 in __libc_start_main (main=0x55fbadd45350 <main>, argc=2, argv=0x7fff210dc378, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff210dc368) at ../csu/libc-start.c:310
#39 0x000055fbade24d0b in _start () at ../sysdeps/x86_64/elf/start.S:103
(gdb)
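The backtrace above contains only C frames (the pybind11 dispatcher in libtorch_python.so followed by the CPython evaluation loop), so it does not show which Python line was executing. Python's standard faulthandler module can print the Python-level traceback when the process receives SIGSEGV; from the shell this is simply `python -X faulthandler test.py`. The helper below is a small sketch of the same thing driven from Python with the output captured; the function and script names are illustrative:

```python
import subprocess
import sys


def run_with_faulthandler(script_path, *args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    Equivalent to `python -X faulthandler script_path args...` in a shell.
    If the child segfaults, faulthandler writes "Fatal Python error: ..."
    followed by the Python-level traceback to stderr before the process dies.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", script_path] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; also works on Python 3.6
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python run_faulthandler.py test.py
    result = run_with_faulthandler(*sys.argv[1:])
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    print("return code:", result.returncode)
```

A negative return code (for example -11 for SIGSEGV on Linux) together with a traceback in `result.stderr` points at the last Python line executed before the crash.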

Here is my package list:

_libgcc_mutex             0.1                        main    defaults
_pytorch_select           0.2                       gpu_0    defaults
blas                      1.0                         mkl    defaults
ca-certificates           2020.12.8            h06a4308_0    defaults
certifi                   2020.12.5        py36h06a4308_0    defaults
cffi                      1.14.4           py36h261ae71_0    defaults
chardet                   4.0.0                    pypi_0    pypi
click                     7.1.2                    pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0    defaults
cudnn                     7.6.5                cuda10.1_0    defaults
dataclasses               0.8                      pypi_0    pypi
filelock                  3.0.12                   pypi_0    pypi
idna                      2.10                     pypi_0    pypi
intel-openmp              2020.2                      254    defaults
joblib                    1.0.0                    pypi_0    pypi
ld_impl_linux-64          2.33.1               h53a641e_7    defaults
libedit                   3.1.20191231         h14c3975_1    defaults
libffi                    3.3                  he6710b0_2    defaults
libgcc-ng                 9.1.0                hdf63c60_0    defaults
libstdcxx-ng              9.1.0                hdf63c60_0    defaults
mkl                       2020.2                      256    defaults
mkl-service               2.3.0            py36he8ac12f_0    defaults
mkl_fft                   1.2.0            py36h23d657b_0    defaults
mkl_random                1.1.1            py36h0573a6f_0    defaults
ncurses                   6.2                  he6710b0_1    defaults
ninja                     1.10.2           py36hff7bd54_0    defaults
numpy                     1.19.2           py36h54aff64_0    defaults
numpy-base                1.19.2           py36hfa32c7d_0    defaults
openssl                   1.1.1i               h27cfd23_0    defaults
pip                       20.3.3           py36h06a4308_0    defaults
pycparser                 2.20                       py_2    defaults
python                    3.6.12               hcff3b4d_2    defaults
pytorch                   1.4.0           cuda101py36h02f0884_0    defaults
readline                  8.0                  h7b6447c_0    defaults
regex                     2020.11.13               pypi_0    pypi
requests                  2.25.1                   pypi_0    pypi
sacremoses                0.0.43                   pypi_0    pypi
sentencepiece             0.1.94                   pypi_0    pypi
setuptools                51.0.0           py36h06a4308_2    defaults
six                       1.15.0           py36h06a4308_0    defaults
sqlite                    3.33.0               h62c20be_0    defaults
tk                        8.6.10               hbc83047_0    defaults
tokenizers                0.7.0                    pypi_0    pypi
tqdm                      4.55.1                   pypi_0    pypi
transformers              2.9.0                    pypi_0    pypi
urllib3                   1.26.2                   pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0    defaults
xz                        5.2.5                h7b6447c_0    defaults
zlib                      1.2.11               h7b6447c_3    defaults

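What is causing this core dump (I have a VM with 30 GB of memory)? It seems to be related to transformers. Is there some dependency problem that conda didn't catch? The code seems to work with the latest transformers release, 4.1.1, but that version is not compatible with layoutlm. Any suggestions?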

There seems to be a problem with layoutlm on pytorch 1.4 (related issue). Switching to pytorch 1.6 fixed the core dump, and the layoutlm code runs without any modification.
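If the environment has to stay pinned for layoutlm, a version guard at the top of a script can fail fast with a readable message instead of a segfault. A minimal sketch: the 1.6 threshold comes only from the observation above that pytorch 1.6 worked while 1.4 crashed, not from any official requirement, and the helper names are hypothetical:

```python
import re

# Assumption based on the answer above: pytorch 1.6 ran the layoutlm code,
# while 1.4 segfaulted. Treated as the minimum known-good version here.
MIN_KNOWN_GOOD = (1, 6)


def parse_torch_version(version):
    """Extract (major, minor) from strings such as '1.4.0' or '1.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def torch_is_known_good(version, minimum=MIN_KNOWN_GOOD):
    """Return True if the (major, minor) version is at least `minimum`."""
    return parse_torch_version(version) >= minimum
```

Comparing integer tuples rather than raw strings matters, because as strings "1.10" sorts before "1.6". In a script it would be called as `torch_is_known_good(torch.__version__)` right after `import torch`.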