Linux Kernel 4.2.x: Why does the expected system call address not match the actual address when checked?

Question

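Brief background

I am currently writing a Linux kernel module as a project to better understand the Linux kernel internals. I have written 'hello world' type modules before, but I want to go beyond that, so I am trying to replace some common system calls such as open, read, write, and close with my own so that I can print a bit more information to the system log.

Some of the material I found while searching targets pre-2.6 kernels, which is not useful because the sys_call_table symbol stopped being exported starting with kernel 2.6.x. On the other hand, the material I found for 2.6.x or later seems to have its own problems, even though it apparently worked at the time.

One particular O'Reilly article, which I found on the sys_call_table in linux kernel 2.6.18 post, suggests that what I'm trying to do ought to work, but it isn't. (Specifically, see the Intercepting sys_unlink() Using System.map section.)

I have also read through Linux Kernel: System call hooking example and another post; although somewhat informative, they were not useful to me.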

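Problems and questions

Part 1 - Unexpected address mismatch

I am using Linux kernel 4.2.0-16-generic on a Kubuntu 15.10 x86_64 installation. Since the sys_call_table symbol is no longer exported, I grepped the address from the system map file: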

# grep 'sys_call_table' < System.map-4.2.0-16-generic
ffffffff818001c0 R sys_call_table
ffffffff81801580 R ia32_sys_call_table

With this, I added the following line to my kernel module:

static unsigned long *syscall_table = (unsigned long *) 0xffffffff818001c0;

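Based on this, I expected that a simple check would confirm that I am indeed pointing to the location I think I am pointing to, that is, the kernel's unexported sys_call_table. So I wrote a simple check like the one below in the module's init function to verify it: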

if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
        pr_info("sys_close = 0x%p, syscall_table[__NR_close] = 0x%p\n", sys_close, syscall_table[__NR_close]);
        return -ENXIO;
}

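This check fails, and different addresses are printed in the log.

I did not expect the body of this if statement to execute, because I thought syscall_table[__NR_close] would hold the same address as sys_close, but it does execute.

Q1: Have I missed something so far regarding the expected address-based comparison? If so, what?

Part 2 - Partially successful?

If I remove this check, it seems that I am partially successful because, apparently, I can at least replace the read call using the code below: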

static asmlinkage ssize_t (*original_read)(unsigned int fd, char __user *buf, size_t count);
// ...
static void systrap_replace_syscalls(void)
{
    pr_debug("systrap: replacing system calls\n");

    original_read  = syscall_table[__NR_read];
    original_write = syscall_table[__NR_write];
    original_close = syscall_table[__NR_close];

    write_cr0(read_cr0() & ~0x10000);

    syscall_table[__NR_read]  = systrap_read;
    syscall_table[__NR_write] = systrap_write;
    syscall_table[__NR_close] = systrap_close;

    write_cr0(read_cr0() | 0x10000);

    pr_debug("systrap: system calls replaced\n");
}
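For readers puzzled by the 0x10000 constant: it is bit 16 of the CR0 register, the Write Protect (WP) flag, which the kernel headers name X86_CR0_WP. The sketch below only reproduces the mask arithmetic in plain user-space C; the macro and function names are made up for illustration, and it does not touch the real register.

```c
/* Bit 16 of CR0 is the Write Protect (WP) flag; 1UL << 16 == 0x10000. */
#define CR0_WP_MASK (1UL << 16)

/* Value that would be written to CR0 to allow writes to read-only pages. */
unsigned long cr0_without_wp(unsigned long cr0)
{
    return cr0 & ~CR0_WP_MASK;
}

/* Value that would be written to CR0 to turn write protection back on. */
unsigned long cr0_with_wp(unsigned long cr0)
{
    return cr0 | CR0_WP_MASK;
}
```

Using a named mask instead of the bare literal makes it harder to clear the wrong bit by accident.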

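My replacement functions simply print a message and forward the call to the actual system call. For example, the code for the read replacement function is as follows: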

static asmlinkage ssize_t systrap_read(unsigned int fd, char __user *buf, size_t count)
{
        pr_debug("systrap: reading from fd = %u\n", fd);
        return original_read(fd, buf, count);
}

When I insmod and rmmod the module, the system log shows the following output:

kernel: [23226.797460] systrap: setting up module
kernel: [23226.797462] systrap: replacing system calls
kernel: [23226.797464] systrap: system calls replaced
kernel: [23226.797465] systrap: module setup complete
kernel: [23226.864198] systrap: reading from fd = 4279272912

<similar output omitted for brevity>

kernel: [23235.560663] systrap: reading from fd = 2835745072
kernel: [23235.564774] systrap: reading from fd = 861079840
kernel: [23235.564986] systrap: cleaning up module
kernel: [23235.564990] systrap: trying to restore system calls
kernel: [23235.564993] systrap: restored sys_read
kernel: [23235.564995] systrap: restored sys_write
kernel: [23235.564997] systrap: restored sys_close
kernel: [23235.565000] systrap: system call restoration attempt complete
kernel: [23235.565002] systrap: module cleanup complete

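I can let it run for a long time and, oddly enough, I never observe entries for the write and close calls, only for reads, which is why I think I am only partially successful.

Q2: Have I missed something regarding the replaced system calls? If so, what?

Part 3 - Unexpected error message on `rmmod` command

Although the module appears to work fine, I always get the following error when I rmmod the module from the kernel: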

rmmod: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '(null)/modules.builtin.bin'

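My module cleanup function simply calls another function (below) that tries to restore the system calls by doing the opposite of the replacement function above: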

// called by the exit function
static void systrap_restore_syscalls(void)
{
    pr_debug("systrap: trying to restore system calls\n");
    write_cr0(read_cr0() & ~0x10000);

    /* make sure no other modules have made changes before restoring */
    if(syscall_table[__NR_read] == systrap_read)
    {
            syscall_table[__NR_read] = original_read;
            pr_debug("systrap: restored sys_read\n");
    }
    else
    {
            pr_warn("systrap: sys_read not restored; address mismatch\n");
    }
    // ... omitted: same stuff for other sys calls

    write_cr0(read_cr0() | 0x10000);
    pr_debug("systrap: system call restoration attempt complete\n");
}

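Q3: I don't know what causes the error message; any ideas here?

Part 4 - `sys_open` marked as deprecated?

In another unexpected turn of events, I found that the __NR_open macro is no longer defined by default. To see the definition, I have to #define __ARCH_WANT_SYSCALL_NO_AT before #including the header files: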

/*
 * Force __NR_open definition. It seems sys_open has been replaced by sys_openat(?)
 * See include/uapi/asm-generic/unistd.h:724-725
 */
#define __ARCH_WANT_SYSCALL_NO_AT

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
// ...

Looking at the kernel source code (mentioned in the comment above), you find the following comment:

/*
* All syscalls below here should go away really,
* these are provided for both review and as a porting
* help for the C library version.
*
* Last chance: are any of these important enough to
* enable by default?
*/
#ifdef __ARCH_WANT_SYSCALL_NO_AT
#define __NR_open 1024
__SYSCALL(__NR_open, sys_open)
// ...

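Can anyone clarify:

Q4: ...the comments above on why __NR_open is not available by default?

Q5: ...whether it's a good idea to do what I'm doing with the #define? and

Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?

Epilogue - Crashing my system

I tried using __NR_openat, replacing that call as I had done with the previous ones: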

static asmlinkage long systrap_openat(int dfd, const char __user *filename, int flags, umode_t mode)
{
    pr_debug("systrap: opening file dfd = %d, name = % s\n", filename);
    return original_openat(dfd, filename, flags, mode);
}

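But this only helped me unceremoniously crash my own system, since it caused other processes to segfault when they tried to open a file, for example: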

kernel: [135489.202693] systrap: opening file dfd = 0, name = P^Q
kernel: [135489.202913] zsh[11806]: segfault at 410 ip 00007f3a380abe60 sp 00007ffd04c5b550 error 4 in libc-2.21.so[7f3a37fe1000+1c0000]

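It also showed odd/garbage information when trying to print the parameter data.

Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?

I have spent several days working on this and I just hope I am not missing something really stupid...

Please let me know in the comments if something is not entirely clear and I will try to clarify.

I would find it most helpful if you could provide some code snippets that actually work and/or point me in a precise enough direction for me to understand what I am doing wrong and how to quickly fix it.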

Answer 1

I have managed to get this working, and I am now taking the time to document my findings.

Q1: Have I missed something so far regarding the expected address-based comparison?

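The problem with this comparison is that, after looking at /proc/kallsyms, I saw that sys_close and other related symbols are no longer exported either. I already knew this for some symbols, but I was still (wrongly) under the impression that some others were available. So the check I used (below) evaluated to true and caused the module to fail the 'safety' check.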

if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
        /* ... */
}

In short, you simply have to trust the system call table address retrieved from the System.map-$(uname -r) file. The 'safety' check is unnecessary and will not work as expected.
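Since the address has to be taken from System.map on trust, it helps to at least extract it mechanically rather than copy it by hand. The following is a minimal user-space sketch, not part of the original module: the helper names parse_symbol_line and find_symbol are hypothetical, and it assumes the usual "address type name" line layout and that the file really belongs to the running kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one System.map line such as
 * "ffffffff818001c0 R sys_call_table". On an exact name match the
 * address is stored in *addr and 1 is returned; otherwise 0.
 */
int parse_symbol_line(const char *line, const char *name,
                      unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char symbol[128];

    if (sscanf(line, "%llx %c %127s", &value, &type, symbol) != 3)
        return 0;
    if (strcmp(symbol, name) != 0)
        return 0;   /* e.g. ia32_sys_call_table must not match */
    *addr = value;
    return 1;
}

/* Scan a System.map-style file for `name`; returns 1 if it was found. */
int find_symbol(const char *path, const char *name,
                unsigned long long *addr)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return 0;
    while (!found && fgets(line, sizeof line, fp))
        found = parse_symbol_line(line, name, addr);
    fclose(fp);
    return found;
}
```

Unlike a plain grep for 'sys_call_table', the exact string comparison skips ia32_sys_call_table, so only the 64-bit table address is returned.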

Q2: Have I missed something regarding the replaced system calls?

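This issue was eventually traced to one or both of the following header files that I had included (I did not bother to figure out which one):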

#include <uapi/asm-generic/unistd.h>
#include <uapi/asm-generic/errno-base.h>

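These caused the __NR_* macros to be redefined, and therefore to expand to incorrect values, at least for the x86_64 architecture. For example, the indexes of sys_read and sys_write in the system call table should be 0 and 1 respectively, but they got other values and ended up indexing completely unexpected locations in the table.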

Simply removing the header files above fixed the issue with no other code changes.
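To make the mismatch concrete, here is a small user-space sketch that puts the x86_64 syscall numbers next to the ones defined in include/uapi/asm-generic/unistd.h. The struct and function names are invented for illustration, and the numbers are hardcoded from my reading of the kernel sources rather than pulled from the headers, so treat them as an assumption to verify against your own tree.

```c
#include <stddef.h>
#include <string.h>

/* One syscall with its number on x86_64 and in the asm-generic header. */
struct nr_pair {
    const char *name;
    int x86_64;       /* arch-specific table used by the running kernel */
    int asm_generic;  /* include/uapi/asm-generic/unistd.h */
};

const struct nr_pair nr_pairs[] = {
    { "read",   0,   63 },
    { "write",  1,   64 },
    { "close",  3,   57 },
    { "openat", 257, 56 },
};

/* Return the entry for `name`, or NULL if it is not in the table. */
const struct nr_pair *lookup_nr(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof nr_pairs / sizeof nr_pairs[0]; i++)
        if (strcmp(nr_pairs[i].name, name) == 0)
            return &nr_pairs[i];
    return NULL;
}

/* How many slots away from the intended entry a hook would land. */
int slot_offset(const char *name)
{
    const struct nr_pair *p = lookup_nr(name);

    return p ? p->asm_generic - p->x86_64 : 0;
}
```

With the generic values, a hook installed at syscall_table[__NR_read] lands on slot 63, which on x86_64 belongs to uname as far as I can tell. That would be consistent with the garbage "fd" values in the log (they look like user-space pointers) and with never seeing write or close entries.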

Q3: I don't know what causes the error message; any ideas here?

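The error message is a side effect of the previous issue. Apparently, the incorrect indexing of the system call table (see Q2) caused other locations in memory to be modified.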

Q4: ...the comments above on why __NR_open is not available by default?

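This was a false alarm from an IDE that I have since stopped using. The __NR_open macro was already defined; the fix in Q2 just made that more obvious.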

Q5: ...whether it's a good idea to do what I'm doing with the #define?

Short answer: no, it is not a good idea, and it is definitely not needed. See Q2 above.

Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open

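Based on the answers to the previous questions, this is not a problem. Using __NR_open is fine and works as expected. This part was muddled by the header files in Q2.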

Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?

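The use of __NR_openat and the crashes were probably caused by the macro being expanded to incorrect values (again, see Q2). However, I can say that I had no real need to use it. I should have been using __NR_open as specified above, but I was trying __NR_openat as a workaround for the problem that was fixed in Q2.

In short, the answer to Q2 helped fix several issues in a cascading effect.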
