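Memory Paging Mechanism
Linux's paging mechanism implements a virtual memory system managed in units of pages. To resolve an address, the paging unit translates it into a physical address. On x86, segmentation first turns a logical address into a linear address, and paging then maps that linear address to a physical one.
Control Registers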
CR0
Bit | Name | Full Name | Description
--- | --- | --- | ---
0 | PE | Protected Mode Enable | If 1, the system is in protected mode; otherwise it is in real mode
1 | MP | Monitor co-processor | Controls the interaction of WAIT/FWAIT instructions with the TS flag in CR0
2 | EM | Emulation | If set, no x87 floating-point unit is present; if clear, an x87 FPU is present
3 | TS | Task switched | Allows the x87 task context to be saved on a task switch only after an x87 instruction is used
4 | ET | Extension type | On the 386, specified whether the external math coprocessor was an 80287 or an 80387
5 | NE | Numeric error | When set, enables internal x87 floating-point error reporting; otherwise enables PC-style x87 error detection
16 | WP | Write protect | When set, the CPU cannot write to read-only pages when the privilege level is 0
18 | AM | Alignment mask | Alignment checking is enabled if AM is set, the AC flag (in EFLAGS) is set, and the privilege level is 3
29 | NW | Not write-through | Globally enables/disables write-through caching
30 | CD | Cache disable | Globally enables/disables the memory cache
31 | PG | Paging | If 1, enables paging and uses the CR3 register; otherwise disables paging
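To make the table concrete, the sketch below decodes a raw CR0 value into the flag names listed above. The CR0_* macro names and the cr0_has()/cr0_print() helpers are hypothetical and exist only in this example. Only the bit positions come from the table. Reading the real CR0 requires privilege level 0, so the value is passed in as an ordinary integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Masks built from the bit numbers in the CR0 table (macro names are illustrative). */
#define CR0_PE (UINT64_C(1) << 0)
#define CR0_MP (UINT64_C(1) << 1)
#define CR0_EM (UINT64_C(1) << 2)
#define CR0_TS (UINT64_C(1) << 3)
#define CR0_ET (UINT64_C(1) << 4)
#define CR0_NE (UINT64_C(1) << 5)
#define CR0_WP (UINT64_C(1) << 16)
#define CR0_AM (UINT64_C(1) << 18)
#define CR0_NW (UINT64_C(1) << 29)
#define CR0_CD (UINT64_C(1) << 30)
#define CR0_PG (UINT64_C(1) << 31)

static const struct { uint64_t mask; const char *name; } cr0_bits[] = {
    { CR0_PE, "PE" }, { CR0_MP, "MP" }, { CR0_EM, "EM" }, { CR0_TS, "TS" },
    { CR0_ET, "ET" }, { CR0_NE, "NE" }, { CR0_WP, "WP" }, { CR0_AM, "AM" },
    { CR0_NW, "NW" }, { CR0_CD, "CD" }, { CR0_PG, "PG" },
};

/* True if every bit in mask is set in cr0. */
static bool cr0_has(uint64_t cr0, uint64_t mask)
{
    return (cr0 & mask) == mask;
}

/* Prints the names of the set flags and returns how many of them are set. */
static int cr0_print(uint64_t cr0)
{
    assert((cr0 >> 32) == 0); /* bits 63:32 of CR0 are reserved on x86-64 */
    int set = 0;
    for (size_t i = 0; i < sizeof cr0_bits / sizeof cr0_bits[0]; i++) {
        if (cr0_has(cr0, cr0_bits[i].mask)) {
            printf("%s ", cr0_bits[i].name);
            set++;
        }
    }
    putchar('\n');
    return set;
}
```

For example, cr0_print(0x80050033) prints PE MP ET NE WP AM PG and returns 7.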
CR1: Reserved.
CR2
Contains a value called Page Fault Linear Address (PFLA). When a page fault occurs, the address the program attempted to access is stored in the CR2 register.
CR3
Used when virtual addressing is enabled, hence when the PG bit is set in CR0. CR3 enables the processor to translate linear addresses into physical addresses by locating the page directory and page tables for the current task. Typically, the upper 20 bits of CR3 become the page directory base register (PDBR), which stores the physical address of the first page directory entry. If the PCIDE bit in CR4 is set, the lowest 12 bits are used for the process-context identifier (PCID).[1]
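The CR3 layout can be expressed as plain bit arithmetic. This sketch assumes x86-64 rather than the 32-bit layout described above. On x86-64, the top-level table's physical address occupies bits 51:12 (assuming the maximum 52-bit physical address width), and the low 12 bits form the PCID when CR4.PCIDE is set. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 layout: bits 51:12 = top-level table address, bits 11:0 = PCID. */
#define CR3_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)
#define CR3_PCID_MASK UINT64_C(0xFFF)

/* Builds a CR3 value from a page-aligned table address and a 12-bit PCID. */
static uint64_t cr3_make(uint64_t table_phys, uint16_t pcid)
{
    assert((table_phys & ~CR3_ADDR_MASK) == 0); /* page aligned, within 52 bits */
    assert(pcid <= CR3_PCID_MASK);
    return table_phys | pcid;
}

/* Physical address of the top-level page table. */
static uint64_t cr3_table_base(uint64_t cr3)
{
    return cr3 & CR3_ADDR_MASK;
}

/* Without CR4.PCIDE the low bits are not a PCID (bits 3 and 4 are cache-control flags). */
static uint16_t cr3_pcid(uint64_t cr3, bool pcide)
{
    return pcide ? (uint16_t)(cr3 & CR3_PCID_MASK) : 0;
}
```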
Enabling Paging
Setting bit 31 (PG) of CR0 to 1 enables paging:
PG (Paging): if 1, enables paging and uses the CR3 register; otherwise disables paging.
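Turning paging on therefore amounts to setting bit 31 in the value written to CR0. The sketch below only computes that value. The actual write is a privileged mov to CR0 in early boot code, and CR3 must already point to valid page tables at that moment. The helper name cr0_enable_paging() is hypothetical. The sketch also encodes one architectural rule: setting PG while PE is clear raises a general-protection fault, so protected mode must already be on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR0_PE (UINT64_C(1) << 0)  /* bit 0: protected mode enable */
#define CR0_PG (UINT64_C(1) << 31) /* bit 31: paging */

/* Computes the CR0 value that turns paging on; refuses when PE is clear (#GP on x86). */
static bool cr0_enable_paging(uint64_t cr0, uint64_t *new_cr0)
{
    assert(new_cr0 != NULL);
    if (!(cr0 & CR0_PE))
        return false;
    *new_cr0 = cr0 | CR0_PG;
    return true;
}
```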
Taking the commonly used x86 architecture as an example, the kernel source (arch/x86/Kconfig) contains:
config PGTABLE_LEVELS
int
default 5 if X86_5LEVEL
default 4 if X86_64
default 3 if X86_PAE
default 2
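This configuration shows that X86_64 uses 4-level paging by default and 5 levels when X86_5LEVEL is enabled. 32-bit kernels use 3 levels with PAE and 2 levels without it.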
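Kconfig evaluates the default lines from top to bottom, and the first one whose condition holds wins. The following is a hypothetical C model of that selection. The struct x86_config type is made up for illustration. It models only the compile-time value: with X86_5LEVEL, a 64-bit kernel can still fall back to 4 levels at boot on CPUs without 5-level support.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three Kconfig symbols involved. */
struct x86_config {
    bool x86_5level;
    bool x86_64;
    bool x86_pae;
};

/* Same order as the "default" lines: the first true condition decides. */
static int pgtable_levels(struct x86_config c)
{
    assert(!c.x86_5level || c.x86_64); /* X86_5LEVEL depends on X86_64 */
    if (c.x86_5level) return 5;
    if (c.x86_64)     return 4;
    if (c.x86_pae)    return 3;
    return 2;
}
```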
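Four-Level Page Table Model
The figure below shows the four-level page table model.
As the figure shows, there are four types of page tables:
- PGD (Page Global Directory)
- PUD (Page Upper Directory)
- PMD (Page Middle Directory)
- PTE (the Page Table, whose entries are page table entries)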
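The Page Global Directory holds the addresses of Page Upper Directories. Each Page Upper Directory holds the addresses of Page Middle Directories, and each Page Middle Directory holds the addresses of Page Tables. Each Page Table entry points to one page frame, that is, a physical page. A linear address is therefore split into 5 parts: the PGD index, the PUD index, the PMD index, the PTE index, and the offset within the page.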
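On x86-64 with 4 KiB pages, each table has 512 entries (9 index bits per level) and the page offset takes 12 bits. A 48-bit linear address therefore splits as 9+9+9+9+12. The following is a minimal sketch of that split, assuming 4-level paging. The shift values match the kernel's PGDIR_SHIFT, PUD_SHIFT, PMD_SHIFT and PAGE_SHIFT for this configuration, but they are defined locally here. The split_va() helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64, 4-level paging, 4 KiB pages. */
#define PAGE_SHIFT     12
#define PMD_SHIFT      21
#define PUD_SHIFT      30
#define PGDIR_SHIFT    39
#define PTRS_PER_TABLE 512

struct va_parts {
    unsigned pgd, pud, pmd, pte; /* index into each level, 0..511 */
    unsigned offset;             /* byte offset inside the 4 KiB page */
};

static struct va_parts split_va(uint64_t va)
{
    uint64_t top = va >> 47;
    assert(top == 0 || top == UINT64_C(0x1FFFF)); /* canonical: bits 63:47 all equal */
    struct va_parts p = {
        .pgd    = (unsigned)((va >> PGDIR_SHIFT) & (PTRS_PER_TABLE - 1)),
        .pud    = (unsigned)((va >> PUD_SHIFT) & (PTRS_PER_TABLE - 1)),
        .pmd    = (unsigned)((va >> PMD_SHIFT) & (PTRS_PER_TABLE - 1)),
        .pte    = (unsigned)((va >> PAGE_SHIFT) & (PTRS_PER_TABLE - 1)),
        .offset = (unsigned)(va & ((UINT64_C(1) << PAGE_SHIFT) - 1)),
    };
    return p;
}
```

For example, split_va(0x00007f1234567abc) yields PGD index 254, PUD index 72, PMD index 418, PTE index 359 and offset 0xabc.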
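Data Structures for the Four Page Table Types
pgd_t, pud_t, pmd_t and pte_t are the data structures for the four kinds of page table entries, defined as follows: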
typedef struct { pgdval_t pgd; } pgd_t;
typedef struct { pudval_t pud; } pud_t;
typedef struct { pmdval_t pmd; } pmd_t;
typedef struct { pteval_t pte; } pte_t;
On x86-64, pgdval_t, pudval_t, pmdval_t and pteval_t are all defined as unsigned long.
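Wrapping each raw value in a one-member struct lets the compiler tell the levels apart at no run-time cost. The sketch below illustrates this with two of the four types. The *_from_raw()/*_to_raw() helpers are hypothetical stand-ins for the kernel's __pte()/pte_val() style accessors. The sketch assumes that unsigned long is 64 bits, as on x86-64.

```c
#include <assert.h>

typedef unsigned long pmdval_t;
typedef unsigned long pteval_t;

typedef struct { pmdval_t pmd; } pmd_t;
typedef struct { pteval_t pte; } pte_t;

/* The wrapper adds no space: it is exactly as large as the raw value. */
static_assert(sizeof(pte_t) == sizeof(pteval_t), "pte_t must match pteval_t in size");

/* Hypothetical constructor/accessor pairs. */
static inline pte_t    pte_from_raw(pteval_t v) { return (pte_t){ .pte = v }; }
static inline pteval_t pte_to_raw(pte_t p)      { return p.pte; }
static inline pmd_t    pmd_from_raw(pmdval_t v) { return (pmd_t){ .pmd = v }; }
static inline pmdval_t pmd_to_raw(pmd_t p)      { return p.pmd; }

/* pte_t and pmd_t are distinct struct types, so pte_to_raw(pmd_from_raw(0)) does not
   compile; with bare unsigned long values the mix-up would go unnoticed. */
```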
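Address Translation with Paging
Each process has its own set of page tables: the pgd member of the process's mm_struct points to its Page Global Directory.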
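The figure below shows how a virtual address is decomposed:
- CR3 holds the physical base of the PGD; indexing it with the PGD index from the virtual address gives the page global directory entry
- The PUD index selects the PUD entry in the table that the PGD entry points to
- The PMD index selects the PMD entry
- The PTE index selects the page table entry
- The page frame address from the PTE plus the offset gives the final physical address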
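The steps above can be modelled in user space. In this toy simulation, all names are hypothetical. "Physical memory" is an array of 4 KiB frames, and each table holds 512 64-bit entries. Bit 0 is the Present flag, and bits 51:12 hold the physical address of the next level. CR3 is modelled simply as the physical address of the PGD frame. The simulation ignores huge pages, permission bits and the TLB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define ENTRIES    512                          /* 9 index bits per level */
#define PRESENT    UINT64_C(1)                  /* bit 0: Present */
#define ADDR_MASK  UINT64_C(0x000FFFFFFFFFF000) /* bits 51:12: next table or page frame */

/* Toy physical memory: NFRAMES page frames of 512 64-bit entries each. */
#define NFRAMES 16
static uint64_t phys_mem[NFRAMES][ENTRIES];
static unsigned next_free = 1; /* frame 0 is never handed out */

/* Hands out a zeroed frame and returns its "physical address". */
static uint64_t alloc_frame(void)
{
    assert(next_free < NFRAMES);
    return (uint64_t)next_free++ << PAGE_SHIFT;
}

/* Turns a table's physical address into a pointer we can read. */
static uint64_t *table_at(uint64_t phys)
{
    assert((phys >> PAGE_SHIFT) < NFRAMES);
    return phys_mem[phys >> PAGE_SHIFT];
}

/* Level 3 = PGD (bits 47:39), 2 = PUD, 1 = PMD, 0 = PTE (bits 20:12). */
static unsigned index_at(uint64_t va, int level)
{
    return (unsigned)((va >> (PAGE_SHIFT + 9 * level)) & (ENTRIES - 1));
}

/* Builds any missing intermediate tables, then points va's PTE at frame_phys. */
static void map_page(uint64_t cr3, uint64_t va, uint64_t frame_phys)
{
    uint64_t table = cr3 & ADDR_MASK;
    for (int level = 3; level > 0; level--) {
        uint64_t *entry = &table_at(table)[index_at(va, level)];
        if (!(*entry & PRESENT))
            *entry = alloc_frame() | PRESENT;
        table = *entry & ADDR_MASK;
    }
    table_at(table)[index_at(va, 0)] = (frame_phys & ADDR_MASK) | PRESENT;
}

/* CR3 -> PGD entry -> PUD entry -> PMD entry -> PTE -> frame + offset. */
static int translate(uint64_t cr3, uint64_t va, uint64_t *pa)
{
    uint64_t table = cr3 & ADDR_MASK;
    for (int level = 3; level >= 0; level--) {
        uint64_t entry = table_at(table)[index_at(va, level)];
        if (!(entry & PRESENT))
            return -1; /* real hardware would record va in CR2 and raise #PF */
        table = entry & ADDR_MASK; /* after the PTE this is the page frame */
    }
    *pa = table | (va & (PAGE_SIZE - 1));
    return 0;
}
```

A usage example: allocate a frame for the PGD and use its address as "CR3". Next, call map_page(cr3, va, 0xABCDE000). translate() then returns 0xABCDE000 plus the low 12 bits of va. An address in the next, unmapped page fails with -1.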
Finding the Page Table Entry for a Virtual Address
Let's look at the corresponding code, __follow_pte_pmd() from mm/memory.c:
static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
			    unsigned long *start, unsigned long *end,
			    pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp)
{
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep;

	/* The process's top-level table is mm->pgd; pgd_offset() returns
	 * a pointer to the PGD entry that covers address */
	pgd = pgd_offset(mm, address);
	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		goto out;

	/* When 5-level paging is not in use, the P4D level is folded and
	 * p4d_offset() just returns the PGD entry pointer (cast to p4d_t *) */
	p4d = p4d_offset(pgd, address);
	if (p4d_none(*p4d) || unlikely(p4d_bad(*p4d)))
		goto out;

	/* Pointer to the PUD entry for address */
	pud = pud_offset(p4d, address);
	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
		goto out;

	/* Pointer to the PMD entry for address */
	pmd = pmd_offset(pud, address);
	VM_BUG_ON(pmd_trans_huge(*pmd));

...

	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
		goto out;

...
	/* Map the page table and take its lock; ptep points to the PTE for address */
	ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
	/*
	 * Check the Present flag. If it is set, the page is in memory; if it is
	 * clear, the page is not. When the hardware accesses a non-present page,
	 * the paging unit stores the faulting address in CR2 and raises
	 * exception 14 (page fault). This software walk simply fails with -EINVAL.
	 */
	if (!pte_present(*ptep))
		goto unlock;
	*ptepp = ptep;
	return 0;
unlock:
	pte_unmap_unlock(ptep, *ptlp);
	if (start && end)
		mmu_notifier_invalidate_range_end(mm, *start, *end);
out:
	return -EINVAL;
}
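The final Present check can be mimicked on raw x86-64 entry bits. In this sketch, decode_pte() and follow_result() are hypothetical helpers. Bit 0 (P), bit 1 (R/W) and bit 2 (U/S) are the architectural flag positions. The real pte_present() on x86 is slightly broader because it also treats PROT_NONE entries as present.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86-64 page table entry (names local to this sketch). */
#define PTE_PRESENT (UINT64_C(1) << 0) /* P: page is resident in memory */
#define PTE_RW      (UINT64_C(1) << 1) /* R/W: writes allowed */
#define PTE_USER    (UINT64_C(1) << 2) /* U/S: user-mode access allowed */

struct pte_flags { bool present, writable, user; };

static struct pte_flags decode_pte(uint64_t pte)
{
    return (struct pte_flags){
        .present  = (pte & PTE_PRESENT) != 0,
        .writable = (pte & PTE_RW) != 0,
        .user     = (pte & PTE_USER) != 0,
    };
}

/* Models the last step of __follow_pte_pmd(): a non-present entry ends the software
   walk with -EINVAL, whereas a hardware access would load CR2 and raise #PF (vector 14). */
static int follow_result(uint64_t pte)
{
    return decode_pte(pte).present ? 0 : -EINVAL;
}
```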
Converting a PTE to a Physical Address
int follow_phys(struct vm_area_struct *vma,
		unsigned long address, unsigned int flags,
		unsigned long *prot, resource_size_t *phys)
{
	int ret = -EINVAL;
	pte_t *ptep, pte;
	spinlock_t *ptl;

	/* Only I/O or raw-PFN mappings (VM_IO / VM_PFNMAP) are handled */
	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
		goto out;

	/* follow_pte() wraps __follow_pte_pmd(): find the PTE and take its lock */
	if (follow_pte(vma->vm_mm, address, &ptep, &ptl))
		goto out;
	pte = *ptep;

	/* A write was requested but the page is read-only */
	if ((flags & FOLL_WRITE) && !pte_write(pte))
		goto unlock;

	/* Protection bits, and the page's physical base address: PFN << PAGE_SHIFT */
	*prot = pgprot_val(pte_pgprot(pte));
	*phys = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;

	ret = 0;
unlock:
	pte_unmap_unlock(ptep, ptl);
out:
	return ret;
}
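The tail of follow_phys() reduces to two bit operations on the PTE. First it tests the writable bit, then it shifts the page frame number back into an address. The following is a user-space sketch under x86-64 assumptions: the frame address is in bits 51:12 (52-bit physical addresses) and R/W is bit 1. The name pte_to_phys() is hypothetical, and the VMA checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PTE_RW        (UINT64_C(1) << 1)               /* R/W flag */
#define PTE_ADDR_BITS UINT64_C(0x000FFFFFFFFFF000)     /* bits 51:12: frame address */

/* Refuses writes to read-only pages, then converts the PFN into a physical address. */
static int pte_to_phys(uint64_t pte, bool want_write, uint64_t *phys)
{
    assert(phys != NULL);
    if (want_write && !(pte & PTE_RW))
        return -EINVAL;                               /* same error follow_phys() returns */
    uint64_t pfn = (pte & PTE_ADDR_BITS) >> PAGE_SHIFT; /* page frame number */
    *phys = pfn << PAGE_SHIFT;                        /* page base, no in-page offset */
    return 0;
}
```

Like follow_phys(), this returns the base of the page. To get the physical address of a specific byte, add the in-page offset (address & 0xfff) separately.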
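Updating CR3 (CR3 holds a physical address)
When switching to another process's address space, the kernel loads the next process's PGD into CR3:
load_cr3(next->pgd);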
load_cr3 is defined as:
static inline void load_cr3(pgd_t *pgdir)
{
	/* pgdir is a kernel virtual address; __pa() converts it to the physical address CR3 needs */
	write_cr3(__pa(pgdir));
}
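For addresses in the kernel's direct mapping of physical memory, __pa() boils down to subtracting the start of that mapping. The sketch below assumes x86-64 with 4-level paging and no KASLR, where recent kernels place the direct map at 0xffff888000000000. The real __pa() (via __phys_addr()) also handles addresses inside the kernel image mapping, which this simplified, hypothetical direct_map_pa() helper does not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: start of the direct map on x86-64, 4-level paging, no KASLR. */
#define DIRECT_MAP_BASE UINT64_C(0xffff888000000000)

/* Simplified __pa() for direct-map addresses only. */
static uint64_t direct_map_pa(uint64_t kvaddr)
{
    assert(kvaddr >= DIRECT_MAP_BASE);
    return kvaddr - DIRECT_MAP_BASE;
}
```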
Summary
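By walking through the kernel code, we took a brief look at Linux's 4-level paging mechanism and got an intuitive picture of how a logical address is translated into a physical address.
Building on this, the following questions are left for further study:
- What is the relationship between the memory descriptor struct mm_struct and the process descriptor struct task_struct?
- When the scheduler switches between processes, how is the address space switched?
- How exactly is memory allocated and freed?
- Who sets and clears the Present flag?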