Linux VM System

Why?
• OS theory is all fine and good, but the real
world is messier.
– The Linux VM is really complicated, so we’ll just
get a glimpse of the structure.
• OS research and innovation is still important.
– The Linux VM has been replaced at least 3 times in
the last few versions. (I’ve lost track.)
• Caveat: I am glossing over many details, and
may get some things wrong.
• A problem with open source is that
documentation is thin or lags behind
development.
You have seen this before…
[Figure: virtual address space layout, top to bottom]
0xFFFFFFFF
Kernel (mapped into the upper 1 GB of the address space, starting at 0xC0000000)
Stack (dynamically allocated)
Heap (dynamically allocated and mapped into the virtual address space on demand)
BSS segment
data segment
text segment
0x00000000

Physical Memory Layout
[Figure: physical memory, with address markers from top to bottom: size of RAM, 1 GB, 16 MB, 8 MB, 1 MB, 0]
• mem_map: physical memory map
• DMA region: reserved for DMA
• Kernel code image: the kernel binary itself
• BIOS comm area: used by some devices
• Kernel address space is 1 GB, so any physical memory > 1 GB is mapped into the virtual address space on demand.
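Read together, the two layouts say that low physical memory is reachable through the kernel's 1 GB window at a fixed offset, while anything above it has to be mapped on demand. A minimal sketch of that rule, assuming a 32-bit split with the window starting at 0xC0000000 and using the slide's round 1 GB figure; the function names are made up for illustration and are not kernel APIs.

```python
PAGE_OFFSET = 0xC0000000   # assumed start of the kernel's window in virtual memory
KERNEL_WINDOW = 1 << 30    # 1 GB, the round figure used on the slide


def is_directly_mapped(phys_addr):
    """True if the physical address fits inside the kernel's fixed window."""
    return 0 <= phys_addr < KERNEL_WINDOW


def phys_to_virt(phys_addr):
    """Translate a low physical address to its fixed kernel virtual address."""
    if not is_directly_mapped(phys_addr):
        # Memory above the window has no fixed address; it is mapped on demand.
        raise ValueError("physical address above 1 GB: must be mapped on demand")
    return phys_addr + PAGE_OFFSET
```

For example, physical address 0x100000 (1 MB) would appear at kernel virtual address 0xC0100000 under these assumptions.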
Physical Memory Map
• Linux maintains a global array, mem_map,
consisting of one entry for each physical
page in the system
• Each mem_map entry is a page struct:
– ref count: # of processes using the page
– addr space: linked list of addr space mappings for mmapped files
– index: index in addr space
– flags: dirty, locked, etc.
– lru: head of the lru list this page is on

VM Structures Overview
• task_struct points to an mm_struct (mm_struct *mm), which heads a list of vm_area_structs linked by vm_next
• task_struct
– PID
– State (runnable, etc)
– User id, grp id, etc.
– mm_struct *mm
– sched priority
– etc.
• mm_struct
– Virtual mem areas (list of vm_area_structs, each holding segment info and vm_next)
– Top-level page table (points to the page table structure)
– Refcount (#threads)
– Start of code section
– Start of heap
– etc.

vm_area_struct
• One vm_area_struct per segment in the address space
– The list of VMAs comprises entire address space
– VMAs cannot overlap
• Fields:
– mm_struct *mm
– start and end vaddr
– flags (readonly etc.)
– vm_ops: open(), close(), nopage() (called on page fault)
– vm_file: file struct for the file corresponding to the VMA (NULL if anonymous region)
– *vm_next: next vm_area_struct
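To make the relationships between these structures concrete, here is a rough model of a VMA list hanging off an mm_struct and of mem_map indexed by page frame number. It assumes 4 KB pages and half-open [start, end) ranges; the class and function names (`Page`, `VMArea`, `MM`, `find_vma_containing`, `page_for`) are invented for this sketch and keep only a few of the fields listed above.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # assumes 4 KB pages


@dataclass
class Page:
    ref_count: int = 0   # number of processes using the page
    flags: int = 0       # dirty, locked, etc.


@dataclass
class VMArea:
    start: int                          # first virtual address of the segment
    end: int                            # one past the last address (assumed half-open)
    readonly: bool = False
    vm_file: Optional[str] = None       # None for an anonymous region
    vm_next: Optional["VMArea"] = None  # next VMA in the address space


@dataclass
class MM:
    mmap: Optional[VMArea] = None       # head of the VMA list


def find_vma_containing(mm, addr):
    """Walk the vm_next chain and return the VMA covering addr, or None."""
    vma = mm.mmap
    while vma is not None:
        if vma.start <= addr < vma.end:
            return vma
        vma = vma.vm_next
    return None


def page_for(mem_map, phys_addr):
    """mem_map has one entry per physical page, so index it by frame number."""
    return mem_map[phys_addr >> PAGE_SHIFT]
```

Because VMAs cannot overlap, at most one entry in the list can match a given address, so the first hit is the answer.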
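
• Linux supports three-level page tables in
software
– These map onto two-level page tables on x86.
[Figure: page table walk. The linear address is split into PGD#, PMD# and PTE# indices plus an offset. mm_struct->pgd locates the PGD; the PGD entry selects a PMD, the PMD entry selects a page of PTEs, and the PTE supplies the PFN of the page in physical memory, to which the offset is added.]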
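A sketch of how a 32-bit linear address might be split for the three-level walk. It assumes the common two-level x86 layout of 10 + 10 + 12 bits, with the middle (PMD) level folded down to zero bits so that its index is always 0. The nested-dict page table and the function names are illustrative only.

```python
PAGE_SHIFT = 12   # 4 KB pages: low 12 bits are the offset
PTE_BITS = 10     # index into a page of PTEs
PMD_BITS = 0      # assumed: middle level folded away on two-level x86
PGD_BITS = 10     # index into the top-level table


def split_linear_address(addr):
    """Return (pgd index, pmd index, pte index, offset) for a linear address."""
    offset = addr & ((1 << PAGE_SHIFT) - 1)
    pte = (addr >> PAGE_SHIFT) & ((1 << PTE_BITS) - 1)
    pmd = (addr >> (PAGE_SHIFT + PTE_BITS)) & ((1 << PMD_BITS) - 1)
    pgd = (addr >> (PAGE_SHIFT + PTE_BITS + PMD_BITS)) & ((1 << PGD_BITS) - 1)
    return pgd, pmd, pte, offset


def translate(pgd_table, addr):
    """Walk pgd -> pmd -> pte and return the physical address, or None."""
    pgd, pmd, pte, offset = split_linear_address(addr)
    try:
        pfn = pgd_table[pgd][pmd][pte]
    except KeyError:
        return None   # no mapping: the hardware would raise a page fault
    return (pfn << PAGE_SHIFT) | offset
```

Keeping the PMD step in the code even though it contributes no bits mirrors the idea on the slide: the software walk always has three levels, and the architecture decides how many of them are real.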
Page Fault Handling
[Figure: page fault flowchart]
• Hardware page fault trap
– Valid VMA? No: SEGV. Yes: allocate page table entries.
– Page present? No: check whether the page is on disk.
• On disk? Yes: read page from disk. No: allocate and zero new page.
– Page present? Yes: check the type of access.
• Write access? Yes: copy on write. No: mark page as referenced.
– OK

Reclaiming Page Frames
• kswapd: Kernel pageout daemon
– Thread that executes in the kernel to free
up physical pages.
• Each physical memory “zone”
maintains a count of free pages.
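The page fault flowchart (Page Fault Handling) can be read as a short decision procedure. The sketch below follows the boxes on the slide and nothing more; the `VMA` and `PTE` classes, the dict used as a page table, and the returned strings are stand-ins invented for this example, and permission checks beyond the ones drawn are left out.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumes 4 KB pages


@dataclass
class VMA:
    start: int
    end: int     # assumed half-open: [start, end)


@dataclass
class PTE:
    present: bool = False      # page is in physical memory
    on_disk: bool = False      # page contents live in a file or in swap
    referenced: bool = False


def handle_fault(vmas, ptes, addr, is_write):
    """Return the action the flowchart takes for a fault at addr."""
    # Valid VMA? If no segment covers the address, the access is illegal.
    if not any(v.start <= addr < v.end for v in vmas):
        return "SEGV"
    # Allocate page table entries on first touch.
    pte = ptes.setdefault(addr >> PAGE_SHIFT, PTE())
    if not pte.present:
        action = "read page from disk" if pte.on_disk else "allocate and zero new page"
        pte.present = True
        return action
    if is_write:
        return "copy on write"
    pte.referenced = True
    return "mark page as referenced"
```

A first read of an untouched anonymous page returns "allocate and zero new page"; a later read of the same page returns "mark page as referenced".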
Reclaiming Page Frames
• Kernel maintains a “page cache”
– Set of physical pages corresponding to blocks on
disk.
• Either files or swap space.
– Before doing any disk I/O, always check page
cache for the page
• Page cache has two lists of pages: “active”
and “inactive”
– Active pages have been used recently
– Inactive pages have not been used recently and
may be swapped out.
• Physical memory zones correspond to DMA memory, “main”
memory (up to 1 GB) and “high” memory
(above 1 GB)
Shrinking the page cache
• To reclaim physical memory, first try to shrink the
page cache
– kswapd has a target for the number of pages to free
– If a page has reference bit set to 0, move it to
“inactive” list
– Otherwise, move it to front of the “active” list
• This is essentially the Clock algorithm
• Next step: Decide which pages in the inactive
list to swap out
– Tries to minimize disk I/O
– If not enough pages freed, try again with higher target
– If enough pages freed, lower the target
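The reference-bit scan described above can be sketched with two queues. This is only a model of the idea, not the kernel's code: the function name `refill_inactive`, the use of `deque`, scanning from the tail, and clearing the reference bit when a page gets its second chance (as the textbook Clock algorithm does) are all assumptions made for the example.

```python
from collections import deque


def refill_inactive(active, inactive, referenced, nr_to_scan):
    """Scan up to nr_to_scan pages from the tail of the active list.

    active, inactive: deques of page ids, front of the list on the left.
    referenced: dict mapping page id -> reference bit.
    """
    for _ in range(min(nr_to_scan, len(active))):
        page = active.pop()                # assumed: oldest page sits at the tail
        if referenced.get(page, False):
            referenced[page] = False       # assumption: clear the bit, as Clock does
            active.appendleft(page)        # recently used: front of the active list
        else:
            inactive.appendleft(page)      # not used recently: swap-out candidate
```

With active pages a, b, c (a at the front) and only b referenced, a full scan leaves b alone on the active list and moves c and then a onto the inactive list.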
Shrinking the page cache
• Other kernel caches are shrunk in a
similar manner
– e.g., Special caches for file pages, kernel
VM allocator and others
• Only if we can’t free enough memory
from these caches is process memory
freed
– Scan over process page tables and try to
free inactive pages.
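The ordering on this slide (caches first, process memory only as a last resort) amounts to a simple fallback loop. In this sketch each shrinker is a hypothetical callable that takes the number of pages still wanted and returns how many it freed; none of these names correspond to real kernel functions.

```python
def reclaim(target, cache_shrinkers, free_process_pages):
    """Try each cache in turn; touch process memory only if still short."""
    freed = 0
    for shrink in cache_shrinkers:
        if freed >= target:
            break
        freed += shrink(target - freed)
    if freed < target:
        # Last resort: scan process page tables for inactive pages.
        freed += free_process_pages(target - freed)
    return freed
```

If the caches alone can satisfy the target, `free_process_pages` is never called.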