Learning mm_struct

mm_struct analysis

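The previous post covered the Linux (x86) paging mechanism and noted that the PGD comes from mm_struct. To understand a process's memory in more detail, this post examines the memory descriptor structure, mm_struct.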

mm_struct is defined in include/linux/mm_types.h. The process address space it abstracts is shown in the figure below:


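A Linux process is implemented in the kernel as an instance of task_struct. The mm field of task_struct points to the memory descriptor, mm_struct, which summarizes the whole of a program's memory. As the figure above shows, it stores:

- the start and end of each memory segment,
- the number of physical memory pages the process uses (RSS, resident set size),
- the total amount of virtual address space in use.

The memory descriptor also contains two important members, the virtual memory areas (VMAs) and the page tables, as illustrated below: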

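Each virtual memory area (VMA) is a contiguous range of virtual addresses, and these areas never overlap. An instance of vm_area_struct fully describes a memory area, including:

- its start and end addresses,
- flags that determine access rights and behavior,
- the vm_file field, which specifies which file (if any) the area maps.

A VMA that does not map a file is anonymous. Each memory segment above (heap, stack and so on) corresponds to a single VMA. The exception is the memory mapping segment, which may contain several.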
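The VMA invariants can be made concrete with a minimal userspace sketch. It assumes 4 KB pages, and `struct model_vma`, `vma_list_valid()` and `vma_total_pages()` are hypothetical names, not kernel APIs. The sketch checks that an address-sorted VMA list is page-aligned and non-overlapping. It also sums the list's size in pages, which is roughly what the total_vm counter in mm_struct reports.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL   /* assumption: 4 KB pages, as on x86 */

/* Hypothetical cut-down vm_area_struct: the range [vm_start, vm_end) and the list link. */
struct model_vma {
        unsigned long vm_start;      /* first address in the area */
        unsigned long vm_end;        /* first byte after the area */
        struct model_vma *vm_next;   /* next area, sorted by address */
};

/* 1 if every area is page-aligned and non-empty, and the list is sorted and non-overlapping. */
static int vma_list_valid(const struct model_vma *vma)
{
        unsigned long prev_end = 0;

        for (; vma != NULL; vma = vma->vm_next) {
                if (vma->vm_start % MODEL_PAGE_SIZE != 0 || vma->vm_end % MODEL_PAGE_SIZE != 0)
                        return 0;
                if (vma->vm_start >= vma->vm_end || vma->vm_start < prev_end)
                        return 0;    /* empty area, or it overlaps/precedes the previous one */
                prev_end = vma->vm_end;
        }
        return 1;
}

/* Total size of all areas in pages, roughly what mm_struct's total_vm counter tracks. */
static unsigned long vma_total_pages(const struct model_vma *vma)
{
        unsigned long pages = 0;

        assert(vma_list_valid(vma));
        for (; vma != NULL; vma = vma->vm_next)
                pages += (vma->vm_end - vma->vm_start) / MODEL_PAGE_SIZE;
        return pages;
}
```

For example, the first three VMAs from the module output further below (0x400000-0x401000, 0x600000-0x601000, 0x601000-0x602000) form a valid list of 3 pages. Adjacent areas may touch (one's vm_end equals the next one's vm_start) because vm_end is exclusive.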
struct mm_struct

struct mm_struct {
        struct {
                struct vm_area_struct *mmap;            /* list of VMAs */
                struct rb_root mm_rb;
                u64 vmacache_seqnum;                   /* per-thread vmacache */
                ...
                pgd_t * pgd;    /* points to the process's page global directory (top-level page table) */
                ...
#if IS_ENABLED(CONFIG_HMM)
                /* HMM needs to track a few things per mm */
                struct hmm *hmm;
#endif
        } __randomize_layout;

        RH_KABI_RESERVE(1)
        RH_KABI_RESERVE(2)
        RH_KABI_RESERVE(3)
        RH_KABI_RESERVE(4)
        RH_KABI_RESERVE(5)
        RH_KABI_RESERVE(6)
        RH_KABI_RESERVE(7)
        RH_KABI_RESERVE(8)

        /*
         * The mm_cpumask needs to be at the end of mm_struct, because it
         * is dynamically sized based on nr_cpu_ids.
         */
        unsigned long cpu_bitmap[];
};

struct vm_area_struct

struct vm_area_struct {
        /* The first cache line has the info for VMA tree walking. */

        unsigned long vm_start;         /* Our start address within vm_mm. */
        unsigned long vm_end;           /* The first byte after our end address
                                           within vm_mm. */

        /* linked list of VM areas per task, sorted by address */
        struct vm_area_struct *vm_next, *vm_prev;

        struct rb_node vm_rb;

        ...
        unsigned long vm_flags;         /* Flags, see mm.h. */

        ...

        /* Information about our backing store: */
        unsigned long vm_pgoff;         /* Offset (within vm_file) in PAGE_SIZE
                                           units */
        struct file * vm_file;          /* File we map to (can be NULL). */
        ...

        RH_KABI_RESERVE(1)
        RH_KABI_RESERVE(2)
        RH_KABI_RESERVE(3)
        RH_KABI_RESERVE(4)
} __randomize_layout;


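The memory descriptor stores a program's VMAs in two forms. The first is a linked list in the mmap field, sorted by starting virtual address. The second is a red-black tree rooted at the mm_rb field, which lets the kernel quickly find the memory area covering a given virtual address. When you read /proc/<pid>/maps, the kernel simply walks the process's VMA list and prints each entry.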
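Since /proc/<pid>/maps prints one line per VMA, a small parser is enough to recover vm_start, vm_end, the permissions and the backing file from it. This is a minimal sketch that assumes a 64-bit system, so that unsigned long can hold a user address. `struct map_entry` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of /proc/<pid>/maps, i.e. one VMA as printed by the kernel. */
struct map_entry {
        unsigned long start;   /* vm_start */
        unsigned long end;     /* vm_end */
        char perms[5];         /* e.g. "r-xp": read, write, execute, private/shared */
        unsigned long offset;  /* offset into the mapped file, in bytes */
        char path[256];        /* backing file or [heap]/[stack]; empty if anonymous */
};

/* Parses one maps line; returns 0 on success, -1 on a malformed line. */
static int parse_maps_line(const char *line, struct map_entry *e)
{
        int consumed = 0;

        assert(line != NULL && e != NULL);
        e->path[0] = '\0';
        /* start-end perms offset dev inode, then record where the optional path begins */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &e->start, &e->end, e->perms, &e->offset, &consumed) != 4)
                return -1;
        if (consumed > 0)      /* the pathname column is optional */
                sscanf(line + consumed, "%255[^\n]", e->path);
        return 0;
}

static int map_is_anonymous(const struct map_entry *e)
{
        return e->path[0] == '\0';
}

static int map_is_heap(const struct map_entry *e)
{
        return strcmp(e->path, "[heap]") == 0;
}
```

To inspect the calling process, open /proc/self/maps with fopen() and pass each line read by fgets() to parse_maps_line().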

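The processor consults the page tables to translate virtual addresses into physical memory addresses, and each process has its own set of page tables. Whenever a process switch occurs, the user-space page tables are switched as well. A CPU register usually holds the address of the page tables (CR3 on x86), so switching page tables means loading a new value into that register. Linux stores a pointer to the process's page tables in the pgd field of the memory descriptor. Every virtual page has a page table entry (PTE). In regular x86 paging a PTE is a simple 4-byte record, as shown below: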

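Linux provides functions to read and set each flag in a PTE:

- P (present) tells the processor whether the virtual page is present in physical memory. If it is clear (0), accessing the page triggers a page fault. While this bit is zero, the kernel is free to use the remaining fields however it likes.
- R/W stands for read/write. If it is clear, the page is read-only.
- U/S stands for user/supervisor. If it is clear, only the kernel can access the page.

These flags implement the read-only memory and protected kernel space described earlier.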
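The permission logic of these three flags can be sketched as a pure function over a 32-bit PTE. This is a simplified model, not the MMU's full algorithm:

- It looks only at the PTE. The real check also combines the page-directory entry.
- It assumes CR0.WP=1, which Linux sets.
- It ignores NX, SMEP/SMAP and the remaining bits.

`pte_check()` and the `PTE_*` macros are hypothetical names. The bit positions follow the classic non-PAE x86 PTE format.

```c
#include <stdint.h>

/* Flag bits of a classic 32-bit (non-PAE) x86 PTE. */
#define PTE_P   (1u << 0)   /* present */
#define PTE_RW  (1u << 1)   /* 1 = writable, 0 = read-only */
#define PTE_US  (1u << 2)   /* 1 = user accessible, 0 = kernel (supervisor) only */

enum access_result { ACCESS_OK, FAULT_NOT_PRESENT, FAULT_PROTECTION };

/* Would this access succeed? Checks only the PTE and assumes CR0.WP=1. */
static enum access_result pte_check(uint32_t pte, int write, int user_mode)
{
        if (!(pte & PTE_P))
                return FAULT_NOT_PRESENT;   /* other bits are free for the kernel's own use */
        if (write && !(pte & PTE_RW))
                return FAULT_PROTECTION;    /* write to a read-only page */
        if (user_mode && !(pte & PTE_US))
                return FAULT_PROTECTION;    /* user access to a kernel-only page */
        return ACCESS_OK;
}
```

For example, a PTE with low bits 0x65 (P, U/S, A, D set, R/W clear) allows a user read but turns a user write into a protection fault.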

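The D and A bits mean dirty and accessed. A dirty page has been written to, while an accessed page has been written to or read. Both flags are sticky: the processor only sets them, and the kernel must clear them. Finally, the PTE stores the starting physical address of the corresponding page frame, aligned to 4 KB.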
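The address arithmetic and the sticky A/D behavior can also be sketched for classic two-level 32-bit paging. There, a virtual address splits into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset. All names are hypothetical. `pte_clear_accessed()` only mimics the kind of update the kernel makes with helpers such as pte_mkold().

```c
#include <stdint.h>

#define PTE_A            (1u << 5)    /* accessed */
#define PTE_D            (1u << 6)    /* dirty */
#define PTE_FRAME_MASK   0xFFFFF000u  /* physical frame address, 4 KB aligned */
#define PAGE_OFFSET_MASK 0x00000FFFu  /* low 12 bits: offset inside the page */

/* 32-bit address split: 10 bits directory index | 10 bits table index | 12 bits offset. */
static unsigned pgd_index32(uint32_t vaddr) { return vaddr >> 22; }
static unsigned pte_index32(uint32_t vaddr) { return (vaddr >> 12) & 0x3FFu; }

/* Physical address = frame base from the PTE + offset from the virtual address. */
static uint32_t pte_translate(uint32_t pte, uint32_t vaddr)
{
        return (pte & PTE_FRAME_MASK) | (vaddr & PAGE_OFFSET_MASK);
}

/* What the CPU does on an access: always set A, set D only on writes, never clear either. */
static uint32_t pte_mark_access(uint32_t pte, int write)
{
        pte |= PTE_A;
        if (write)
                pte |= PTE_D;
        return pte;
}

/* Clearing A is left to the kernel (e.g. when aging pages). */
static uint32_t pte_clear_accessed(uint32_t pte)
{
        return pte & ~PTE_A;
}
```

For instance, virtual address 0x08049A3C selects directory entry 32, table entry 73 and offset 0xA3C. With a PTE whose frame is 0x12345000, it translates to 0x12345A3C.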

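In Linux, each page frame is tracked by a descriptor (struct page) and several flags. Together these descriptors track the computer's entire physical memory, so the exact state of every page frame is always known. Physical memory is managed with the buddy memory allocation technique. An allocated page frame can be:

- anonymous, holding program data, or
- in the page cache, holding data stored in a file or block device.

Page frames have other, more exotic uses, but we leave them aside for now.

The figure below uses a user heap as an example: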
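The buddy technique relies on simple index arithmetic. Blocks hold 2^order pages and are naturally aligned, so a block's buddy is found by flipping bit `order` of its page frame number (PFN). This minimal sketch covers only that arithmetic, not the free lists. The function names are hypothetical.

```c
#include <assert.h>

/* Smallest order whose block of 2^order pages can hold n pages. */
static unsigned int order_for_pages(unsigned long n)
{
        unsigned int order = 0;

        assert(n > 0);
        while ((1UL << order) < n)
                order++;
        return order;
}

/* The buddy of the naturally aligned 2^order-page block at pfn: flip bit 'order'. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
        assert((pfn & ((1UL << order) - 1)) == 0);   /* block must be aligned to its size */
        return pfn ^ (1UL << order);
}

/* Two free buddies merge into the block at the lower PFN, one order higher. */
static unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
        return pfn & ~(1UL << order);
}
```

For example, a request for 5 pages is served from an order-3 (8-page) block. Freeing the order-2 block at PFN 12 means looking at its buddy at PFN 8. If that block is also free, the two merge into the order-3 block at PFN 8.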

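The blue rectangles represent pages within the VMA range, and the arrows represent PTEs mapping pages onto page frames. Some virtual pages have no arrow, which means their PTE has the Present flag cleared. Either the pages have never been touched or their contents have been swapped out. In both cases, accessing these pages causes page faults even though they lie inside the VMA. The figure below shows how the kernel handles memory allocation: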

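When a program asks for more memory through the brk() system call, the kernel only updates the heap VMA. No page frames are allocated at this point, and the new pages are not present in physical memory. When the program first touches one of these pages, the processor raises a page fault and do_page_fault() runs. It handles the fault in three steps:

1. It calls find_vma() to look for the VMA covering the faulting virtual address.
2. If a VMA is found, it checks the VMA's permissions against the attempted access (read or write).
3. If no suitable VMA exists, the process gets a segmentation fault (SIGSEGV).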
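The decision sequence above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not the kernel's code:

- `struct fvma`, `model_find_vma()`, `model_page_fault()` and `model_brk()` are made-up names.
- A single `writable` flag stands in for VM_WRITE in vm_flags.
- Stack expansion (VM_GROWSDOWN) is ignored.

model_find_vma() keeps the contract of the real find_vma(): it returns the first VMA that ends above the address, so the caller must still check vm_start.

```c
#include <stddef.h>

/* Hypothetical cut-down VMA with just what the fault path looks at. */
struct fvma {
        unsigned long vm_start, vm_end;  /* the range [vm_start, vm_end) */
        int writable;                    /* stands in for VM_WRITE */
        struct fvma *vm_next;            /* sorted by address */
};

enum fault_result { FAULT_MAP_PAGE, FAULT_SEGV_NO_VMA, FAULT_SEGV_ACCESS };

/* First VMA with addr < vm_end; it may start above addr. */
static struct fvma *model_find_vma(struct fvma *vma, unsigned long addr)
{
        for (; vma != NULL; vma = vma->vm_next)
                if (addr < vma->vm_end)
                        return vma;
        return NULL;
}

static enum fault_result model_page_fault(struct fvma *mmap, unsigned long addr, int write)
{
        struct fvma *vma = model_find_vma(mmap, addr);

        if (vma == NULL || addr < vma->vm_start)
                return FAULT_SEGV_NO_VMA;   /* SIGSEGV, si_code SEGV_MAPERR */
        if (write && !vma->writable)
                return FAULT_SEGV_ACCESS;   /* SIGSEGV, si_code SEGV_ACCERR */
        return FAULT_MAP_PAGE;              /* get a page frame and fill in the PTE */
}

/* brk() growing the heap: only vm_end moves, no page frame is allocated. */
static void model_brk(struct fvma *heap, unsigned long new_end)
{
        if (new_end > heap->vm_end)
                heap->vm_end = new_end;
}
```

Using the text and [heap] ranges from the /proc maps output below, a write to the read-only text VMA yields FAULT_SEGV_ACCESS. A write just past the heap end yields FAULT_SEGV_NO_VMA until model_brk() moves vm_end.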

Verifying the VMAs with code

kernel module

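#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/version.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
#include <linux/sched/mm.h>     /* get_task_mm(), mmput() */
#include <linux/sched/signal.h> /* for_each_process() */
#endif
#include <linux/mm.h>

static int pid_mem = 1;
module_param(pid_mem, int, 0);
MODULE_PARM_DESC(pid_mem, "PID of the process whose mm_struct is printed");

/*
 * Walk the VMA list of @mm. Written for kernels before 5.8, which still have
 * mm->mmap_sem (mmap_read_lock() from 5.8) and the mm->mmap list (replaced
 * by a maple tree in 6.1).
 */
static void print_mem(struct mm_struct *mm)
{
        struct vm_area_struct *vma;
        int count = 0;

        down_read(&mm->mmap_sem);       /* keep the VMA list stable while walking it */
        pr_info("This mm_struct has %d vmas.\n", mm->map_count);
        for (vma = mm->mmap; vma; vma = vma->vm_next)
                pr_info("Vma number %d: starts at 0x%lx, ends at 0x%lx, vma flags 0x%lx\n",
                        ++count, vma->vm_start, vma->vm_end, vma->vm_flags);
        /* the value of mm->pgd; the original printed &mm->pgd, the field's own address */
        pr_info("pgd = %p\n", mm->pgd);
        /* mm_users includes the extra reference taken by get_task_mm() */
        pr_info("mm_users = %d, mm_count = %d\n",
                atomic_read(&mm->mm_users), atomic_read(&mm->mm_count));
        pr_info("Code  Segment start = 0x%lx, end = 0x%lx\n"
                "Data  Segment start = 0x%lx, end = 0x%lx\n"
                "Stack Segment start = 0x%lx\n",
                mm->start_code, mm->end_code,
                mm->start_data, mm->end_data,
                mm->start_stack);
        up_read(&mm->mmap_sem);
}

static int __init mm_exp_load(void)
{
        struct task_struct *task;
        struct mm_struct *mm = NULL;

        pr_info("Got the process id to look up as %d.\n", pid_mem);
        rcu_read_lock();                /* the task list is RCU-protected */
        for_each_process(task) {
                if (task->pid == pid_mem) {
                        pr_info("%s[%d]\n", task->comm, task->pid);
                        mm = get_task_mm(task); /* pins mm_users; NULL for kernel threads */
                        break;
                }
        }
        rcu_read_unlock();

        if (mm) {
                print_mem(mm);          /* may sleep, so it runs outside the RCU section */
                mmput(mm);
        }
        return 0;
}

static void __exit mm_exp_unload(void)
{
        pr_info("Print segment information module exiting.\n");
}

module_init(mm_exp_load);
module_exit(mm_exp_unload);
MODULE_LICENSE("GPL");   /* get_task_mm() and mmput() are GPL-only exports */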

userspace process

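/* Build: gcc -pthread vma_userspace.c -o vma_userspace */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int our_init_data = 30;   /* initialized global: .data */
int our_noinit_data;      /* uninitialized global: .bss */

/* Idle worker: keeps the thread (and its stack VMA) alive. */
static void *idle_thread(void *arg)
{
        (void)arg;
        for (;;)
                pause();
        return NULL;
}

void our_prints(void)
{
        int our_local_data = 1;   /* local variable: stack */

        printf("\nPid of the process is = %d", getpid());
        printf("\nAddresses which fall into:");
        printf("\n 1) Data segment = %p", (void *)&our_init_data);
        printf("\n 2) BSS segment = %p", (void *)&our_noinit_data);
        printf("\n 3) Code segment = %p", (void *)our_prints);
        printf("\n 4) Stack segment = %p\n", (void *)&our_local_data);
        fflush(stdout);
}

int main(void)
{
        pthread_t t[2];
        int i;

        /* two extra threads share this mm_struct, so mm_users becomes 3 */
        for (i = 0; i < 2; i++)
                if (pthread_create(&t[i], NULL, idle_thread, NULL) != 0)
                        return EXIT_FAILURE;
        our_prints();
        for (;;)
                pause();   /* sleep instead of spinning while the module inspects us */
        return 0;
}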

Kernel log (dmesg) output:

Got the process id to look up as 8479.
[610812.058537]
Print segment information module exiting.
[610813.706706]
Got the process id to look up as 8498.
[610813.707618] vma_userspace[8498]
[610813.707885]
This mm_struct has 26 vmas.
[610813.708354]
Vma number 1:
[610813.708830]   Starts at 0x400000, Ends at 0x401000, vma flags 0x8000875
[610813.709075]
Vma number 2:
[610813.709542]   Starts at 0x600000, Ends at 0x601000, vma flags 0x8100871
[610813.709806]
Vma number 3:
[610813.710256]   Starts at 0x601000, Ends at 0x602000, vma flags 0x8100873
[610813.710497]

...

Vma number 8:
[610813.713782]   Starts at 0x7f23f34a5000, Ends at 0x7f23f3ca5000, vma flags 0x8100073
[610813.714022]
Vma number 9:
[610813.714460]   Starts at 0x7f23f3ca5000, Ends at 0x7f23f3e5d000, vma flags 0x8000075
[610813.714715]
Vma number 10:
[610813.715164]   Starts at 0x7f23f3e5d000, Ends at 0x7f23f405d000, vma flags 0x8000070
[610813.715402]

...

[610813.726094]   Starts at 0x7f23f44a7000, Ends at 0x7f23f44a8000, vma flags 0x8100073
[610813.726361]
Vma number 25:
[610813.726885]   Starts at 0x7fff0ca81000, Ends at 0x7fff0caa2000, vma flags 0x100173
[610813.727153]
Vma number 26:
[610813.727670]   Starts at 0x7fff0caaf000, Ends at 0x7fff0cab1000, vma flags 0x8040075
[610813.727945]
Pgdp = 0xffff880075df3ed8
mm_users = 3, mm_count = 1
Code  Segment start = 0x400000, end = 0x400d0c
Data  Segment start = 0x600e00, end = 0x601068
Stack Segment start = 0x7fff0caa0380

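The mappings as printed by /proc/<pid>/maps. This is a separate run (PID 8546), so the randomized addresses differ from the module output above: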

[root@localhost VMA_Lab]# cat /proc/8546/maps 
00400000-00401000 r-xp 00000000 fd:00 34442116                           /root/VMA_Lab/vma_userspace
00600000-00601000 r--p 00000000 fd:00 34442116                           /root/VMA_Lab/vma_userspace
00601000-00602000 rw-p 00001000 fd:00 34442116                           /root/VMA_Lab/vma_userspace
0172b000-0174c000 rw-p 00000000 00:00 0                                  [heap]
7fd8cf0bc000-7fd8cf0bd000 ---p 00000000 00:00 0 
7fd8cf0bd000-7fd8cf8bd000 rw-p 00000000 00:00 0 
7fd8cf8bd000-7fd8cf8be000 ---p 00000000 00:00 0 
7fd8cf8be000-7fd8d00be000 rw-p 00000000 00:00 0 
7fd8d00be000-7fd8d0276000 r-xp 00000000 fd:00 33795                      /usr/lib64/libc-2.17.so
7fd8d0276000-7fd8d0476000 ---p 001b8000 fd:00 33795                      /usr/lib64/libc-2.17.so
7fd8d0476000-7fd8d047a000 r--p 001b8000 fd:00 33795                      /usr/lib64/libc-2.17.so
7fd8d047a000-7fd8d047c000 rw-p 001bc000 fd:00 33795                      /usr/lib64/libc-2.17.so
7fd8d047c000-7fd8d0481000 rw-p 00000000 00:00 0 
7fd8d0481000-7fd8d0498000 r-xp 00000000 fd:00 33821                      /usr/lib64/libpthread-2.17.so
7fd8d0498000-7fd8d0697000 ---p 00017000 fd:00 33821                      /usr/lib64/libpthread-2.17.so
7fd8d0697000-7fd8d0698000 r--p 00016000 fd:00 33821                      /usr/lib64/libpthread-2.17.so
7fd8d0698000-7fd8d0699000 rw-p 00017000 fd:00 33821                      /usr/lib64/libpthread-2.17.so
7fd8d0699000-7fd8d069d000 rw-p 00000000 00:00 0 
7fd8d069d000-7fd8d06be000 r-xp 00000000 fd:00 24092                      /usr/lib64/ld-2.17.so
7fd8d08b2000-7fd8d08b5000 rw-p 00000000 00:00 0 
7fd8d08bc000-7fd8d08be000 rw-p 00000000 00:00 0 
7fd8d08be000-7fd8d08bf000 r--p 00021000 fd:00 24092                      /usr/lib64/ld-2.17.so
7fd8d08bf000-7fd8d08c0000 rw-p 00022000 fd:00 24092                      /usr/lib64/ld-2.17.so
7fd8d08c0000-7fd8d08c1000 rw-p 00000000 00:00 0 
7ffeb9f38000-7ffeb9f59000 rw-p 00000000 00:00 0                          [stack]
7ffeb9f6d000-7ffeb9f6f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Points worth noting:

The difference between mm_users and mm_count
Experimenting with multithreading, for example a process that starts two threads, gives:

mm_users = 3, mm_count = 1

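However many user-space threads share the address space, only mm_users grows. mm_count stays at 1 because all users together hold a single reference on the mm_struct. So when mm_count is greater than 1, something other than the users holds the mm_struct itself. Typically this is a kernel thread that has borrowed the address space as its active_mm (see context_switch() below).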
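The relationship between the two counters can be sketched as follows. The `model_*` functions are hypothetical stand-ins that mirror the roles of the kernel's mmget(), mmput(), mmgrab() and mmdrop():

- mm_users counts the users of the address space, and all of them together hold one mm_count reference.
- The address space is torn down when mm_users reaches 0.
- The mm_struct itself is freed only when mm_count reaches 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two mm_struct reference counts. */
struct model_mm {
        int mm_users;           /* users of the address space (threads sharing it) */
        int mm_count;           /* references to the struct; all users together count as 1 */
        bool address_space_up;  /* VMAs and page tables still exist */
        bool struct_freed;      /* the mm_struct itself has been freed */
};

static void model_mm_init(struct model_mm *mm)
{
        mm->mm_users = 1;
        mm->mm_count = 1;
        mm->address_space_up = true;
        mm->struct_freed = false;
}

/* Like mmget(): another thread starts sharing the address space. */
static void model_mmget(struct model_mm *mm) { mm->mm_users++; }

/* Like mmgrab(): pin only the struct, e.g. for a lazy-TLB kernel thread. */
static void model_mmgrab(struct model_mm *mm) { mm->mm_count++; }

/* Like mmdrop(): free the struct when its last reference goes away. */
static void model_mmdrop(struct model_mm *mm)
{
        assert(mm->mm_count > 0);
        if (--mm->mm_count == 0)
                mm->struct_freed = true;
}

/* Like mmput(): the last user tears down the address space, then drops the users' shared mm_count. */
static void model_mmput(struct model_mm *mm)
{
        assert(mm->mm_users > 0);
        if (--mm->mm_users == 0) {
                mm->address_space_up = false;
                model_mmdrop(mm);
        }
}
```

With one main thread plus two model_mmget() calls the counters read mm_users = 3, mm_count = 1, matching the output above. A kernel thread's model_mmgrab() is what would push mm_count to 2.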

The difference between mm and active_mm

struct mm_struct *mm, *active_mm;

Printing these two task_struct fields for the test process gives:

task->mm = 1978423168, task->active_mm = 1978423168

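For a user process, mm and active_mm are therefore the same. For a kernel thread, however, mm is NULL, and active_mm points to the mm borrowed from the previously running task. The kernel thread uses that mm for kernel-space addressing. This works because the kernel part of every address space is identical, and borrowing the previous task's page tables avoids a page-table switch.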

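How is memory switched during a process switch?

Here we focus on how the mm is switched during a process switch, using context_switch() as the example: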

static inline void
context_switch(struct rq *rq, struct task_struct *prev,
               struct task_struct *next)
{
        struct mm_struct *mm, *oldmm;

        prepare_task_switch(rq, prev, next);

        mm = next->mm;
        oldmm = prev->active_mm;               /* address space currently loaded on this CPU */

        ...

        if (!mm) {                             /* next is a kernel thread: it has no user mm */
                next->active_mm = oldmm;       /* borrow prev's address space */
                atomic_inc(&oldmm->mm_count);  /* pin the mm_struct while it is borrowed */
                enter_lazy_tlb(oldmm, next);   /* keep the old page tables loaded (lazy TLB) */
        } else
                switch_mm(oldmm, mm, next);    /* load next's page tables (CR3 on x86) */

        if (!prev->mm) {                       /* prev was a kernel thread */
                prev->active_mm = NULL;        /* it gives back the borrowed mm */
                rq->prev_mm = oldmm;           /* its mm_count is dropped in finish_task_switch() */
        }

        ...

        context_tracking_task_switch(prev, next);
        /* Here we just switch the register state and the stack. */
        switch_to(prev, next, prev);

        barrier();
        /*
         * this_rq must be evaluated again because prev may have moved
         * CPUs since it called schedule(), thus the 'rq' on its stack
         * frame will be invalid.
         */
        finish_task_switch(this_rq(), prev);   /* calls mmdrop(rq->prev_mm) if it was set */
}
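The mm bookkeeping done by context_switch() and finish_task_switch() can be traced with a small userspace model. All types and names are hypothetical. The `loaded` field stands in for the page-table register (CR3 on x86), and the register/stack switch performed by switch_to() is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_mm   { int mm_users; int mm_count; };
struct model_task { struct model_mm *mm; struct model_mm *active_mm; };
struct model_rq   { struct model_mm *prev_mm; struct model_mm *loaded; };

static void model_mmdrop(struct model_mm *mm)
{
        assert(mm->mm_count > 0);
        mm->mm_count--;   /* the real mmdrop() frees the struct when this hits 0 */
}

/* Mirrors the mm-related steps of context_switch() plus finish_task_switch(). */
static void model_context_switch(struct model_rq *rq,
                                 struct model_task *prev, struct model_task *next)
{
        struct model_mm *oldmm = prev->active_mm;

        if (next->mm == NULL) {          /* kernel thread: borrow prev's mm */
                next->active_mm = oldmm;
                oldmm->mm_count++;       /* pin it, like atomic_inc(&oldmm->mm_count) */
        } else {                         /* user task: load its own page tables */
                rq->loaded = next->mm;   /* switch_mm() */
        }
        if (prev->mm == NULL) {          /* prev was a kernel thread: hand the mm back */
                prev->active_mm = NULL;
                rq->prev_mm = oldmm;
        }
        /* ... switch_to() would switch registers and stack here ... */
        if (rq->prev_mm != NULL) {       /* finish_task_switch() */
                model_mmdrop(rq->prev_mm);
                rq->prev_mm = NULL;
        }
}
```

Tracing user task A, then a kernel thread, then user task B shows the pattern. A's mm_count rises to 2 while the kernel thread borrows A's mm, and the page tables stay loaded. The count drops back to 1 once B runs and B's page tables are loaded.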

GitHub repository:
git@github.com:jianpingzhao/learn-linux.git

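Summary:

This post covered what mm_struct means, along with the VMA fields of a process and what they represent. A few questions remain open after this study:

  • The detailed flow of memory allocation in a process, especially how kmalloc and malloc are alike and how they differ;
  • How mmap and alloc differ;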
  • How programs are laid out in memory.

References:
https://manybutfinite.com/post/how-the-kernel-manages-your-memory/
https://www.cs.columbia.edu/~junfeng/13fa-w4118/lectures/l20-adv-mm.pdf
