内核zone里有个水位的概念,根据这个水位判断内存压力,从而进行内存回收,版本3.18。

struct zone {
/* Read-mostly fields */

/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long watermark[NR_WMARK];

enum zone_watermarks {
WMARK_MIN,
WMARK_LOW,
WMARK_HIGH,
NR_WMARK
};

#define min_wmark_pages(z) (z->watermark[WMARK_MIN])
#define low_wmark_pages(z) (z->watermark[WMARK_LOW])
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])

struct zone {
/* Read-mostly fields */

/* zone watermarks, access with *_wmark_pages(zone) macros */
unsigned long watermark[NR_WMARK];

watermark计算会用到managed_pages:

/*
* spanned_pages is the total pages spanned by the zone, including
* holes, which is calculated as:
* spanned_pages = zone_end_pfn - zone_start_pfn;
*
* present_pages is physical pages existing within the zone, which
* is calculated as:
* present_pages = spanned_pages - absent_pages(pages in holes);
*
* managed_pages is present pages managed by the buddy system, which
* is calculated as (reserved_pages includes pages allocated by the
* bootmem allocator):
* managed_pages = present_pages - reserved_pages;
*
* So present_pages may be used by memory hotplug or memory power
* management logic to figure out unmanaged pages by checking
* (present_pages - managed_pages). And managed_pages should be used
* by page allocator and vm scanner to calculate all kinds of watermarks
* and thresholds.
...
*/

安装代码在__setup_per_zone_wmarks:

static void __setup_per_zone_wmarks(void)
{
unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long pages_low = extra_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;

for_each_zone(zone) {
...
zone->watermark[WMARK_LOW] = min_wmark_pages(zone) +
low + (min >> 2);
zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) +
low + (min >> 1);
}
}

so, wmark min < low < high

那到底背后的判断zone水位压力逻辑是什么,先看下内核文档了解下:

水位的计算有两个变量:

min_free_kbytes:

This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

extra_free_kbytes

This parameter tells the VM to keep extra free memory between the threshold
where background reclaim (kswapd) kicks in, and the threshold where direct
reclaim (by allocating processes) kicks in.

This is useful for workloads that require low latency memory allocations
and have a bounded burstiness in memory allocations, for example a
realtime application that receives and transmits network traffic
(causing in-kernel memory allocations) with a maximum total message burst
size of 200MB may need 200MB of extra free memory to avoid direct reclaim
related latencies.

再看一段描述

watemark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
are per-zone fields, used to determine when a zone needs to be balanced. When
the number of pages falls below watermark[WMARK_MIN], the hysteric field
low_on_memory gets set. This stays set till the number of free pages becomes
watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
try to free some pages in the zone (providing GFP_WAIT is set in the request).
Orthogonal to this, is the decision to poke kswapd to free some zone pages.
That decision is not hysteresis based, and is done when the number of free
pages is below watermark[WMARK_LOW]; in which case zone_wake_kswapd is also set.

so, 当free page小于wmark low时, kswapd开始活动,必须要先到wmark min以下才会触发kswapd? need to check out kswapd code.

当水位在min以下后直到high才停止kswapd,否则分配也会尝试释放内存。可以通过proc下的zoneinfo查。

这里的描述是2.x,3.18内核在用zone_balanced判断是否平衡,大概看下

bool zone_watermark_ok_safe(struct zone *z, unsigned int order,
unsigned long mark, int classzone_idx, int alloc_flags)
{
long free_pages = zone_page_state(z, NR_FREE_PAGES);

if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark)
free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES);

return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
free_pages);
}

判断标准就是free page,传入mark是high_wmark_pages。

所以说提高水位min_free_kbytes or extra_free_kbytes,回收内存提早了,因为free page更容易低于该wmark。

我们从官方Android 4.4 low memory优化说明也能看出:

<!-- Device configuration setting the /proc/sys/vm/extra_free_kbytes tunable in the kernel (if it exists). A high value will increase the amount of memory that the kernel tries to keep free, reducing allocation time and causing the lowmemorykiller to kill earlier. A low value allows more memory to be used by processes but may cause more allocations to block waiting on disk I/O or lowmemorykiller. Overrides the default value chosen by ActivityManager based on screen size. 0 prevents keeping any extra memory over what the kernel keeps by default. -1 keeps the default. -->
<integer name="config_extraFreeKbytesAbsolute">-1</integer>