The mainline Linux kernel ties vmpressure to CONFIG_MEMCG, and the interface is meant for user space.

The Makefile makes this explicit:

obj-$(CONFIG_MEMCG) += memcontrol.o page_cgroup.o vmpressure.o

The kernel documentation likewise covers it in memory.txt, under cgroups:

11. Memory Pressure

The pressure level notifications can be used to monitor the memory
allocation cost; based on the pressure, applications can implement
different strategies of managing their memory resources. The pressure
levels are defined as following:

The "low" level means that the system is reclaiming memory for new
allocations. Monitoring this reclaiming activity might be useful for
maintaining cache level. Upon notification, the program (typically
"Activity Manager") might analyze vmstat and act in advance (i.e.
prematurely shutdown unimportant services).

The "medium" level means that the system is experiencing medium memory
pressure, the system might be making swap, paging out active file caches,
etc. Upon this event applications may decide to further analyze
vmstat/zoneinfo/memcg or internal memory usage statistics and free any
resources that can be easily reconstructed or re-read from a disk.

The "critical" level means that the system is actively thrashing, it is
about to out of memory (OOM) or even the in-kernel OOM killer is on its
way to trigger. Applications should do whatever they can to help the
system. It might be too late to consult with vmstat or any other
statistics, so it's advisable to take an immediate action.

In short, the kernel delivers vmpressure events to user space, and applications apply different memory-management strategies depending on the level.

At the low level reclaim is proceeding normally, at medium the system has started swapping, and at critical it is about to run out of memory.
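
For reference, here is a minimal sketch of how a user-space program might subscribe to one of these levels through the cgroup v1 interface described in memory.txt: open memory.pressure_level, create an eventfd, and write "<event_fd> <pressure_level_fd> <level>" to cgroup.event_control. The function names format_registration and register_pressure_listener are my own, and the cgroup directory is whatever path the memory controller is mounted at on your system; treat it as an illustration rather than production code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: builds "<event_fd> <pressure_level_fd> <level>".
 * Returns the string length, or -1 if it does not fit in buf. */
int format_registration(char *buf, size_t len, int event_fd,
                        int pressure_fd, const char *level)
{
    int n = snprintf(buf, len, "%d %d %s", event_fd, pressure_fd, level);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: registers for `level` ("low", "medium" or
 * "critical") on the memory cgroup at cgroup_dir.
 * Returns the eventfd to read from, or -1 on any failure.
 * *pressure_fd_out is kept open for the lifetime of the listener. */
int register_pressure_listener(const char *cgroup_dir, const char *level,
                               int *pressure_fd_out)
{
    char path[512], line[64];
    int efd = -1, pfd = -1, cfd = -1, n;

    n = snprintf(path, sizeof(path), "%s/memory.pressure_level", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    pfd = open(path, O_RDONLY);
    if (pfd < 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        goto fail;
    cfd = open(path, O_WRONLY);
    if (cfd < 0)
        goto fail;

    efd = eventfd(0, 0);
    if (efd < 0)
        goto fail;

    n = format_registration(line, sizeof(line), efd, pfd, level);
    if (n < 0 || write(cfd, line, (size_t)n) != (ssize_t)n)
        goto fail;

    close(cfd);               /* only needed for the registration write */
    *pressure_fd_out = pfd;
    return efd;

fail:
    if (efd >= 0)
        close(efd);
    if (cfd >= 0)
        close(cfd);
    if (pfd >= 0)
        close(pfd);
    return -1;
}
```

After a successful registration, a blocking read of 8 bytes (a uint64_t counter) from the returned eventfd wakes up each time the kernel signals the chosen level.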

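Android platforms generally do not enable memcg (at least on 7.0 and earlier); the reasons are left for later. Turning to vmpressure first: on Qualcomm platforms, Android exposes vmpressure to in-kernel clients, for example in Qualcomm's lowmemorykiller.c: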

static int __init lowmem_init(void)
{
    register_shrinker(&lowmem_shrinker);
    vmpressure_notifier_register(&lmk_vmpr_nb); /* not present in the mainline kernel */
    return 0;
}

The rest of this note focuses on the Android platform.

There are several ways to compute vmpressure:

  • vmpressure_prio(): derived from the reclaimer priority level; it is called from the reclaim path
  • vmpressure(): derived from the scanned/reclaimed ratio; the core of it is vmpressure_calc_pressure

static unsigned long vmpressure_calc_pressure(unsigned long scanned,
                            unsigned long reclaimed)
{
    unsigned long scale = scanned + reclaimed;
    unsigned long pressure;

    /*
     * We calculate the ratio (in percents) of how many pages were
     * scanned vs. reclaimed in a given time frame (window). Note that
     * time is in VM reclaimer's "ticks", i.e. number of pages
     * scanned. This makes it possible to set desired reaction time
     * and serves as a ratelimit.
     */
    pressure = scale - (reclaimed * scale / scanned);
    pressure = pressure * 100 / scale;

    return pressure;
}

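pressure is therefore (1 - reclaimed/scanned) * 100; scale is presumably there only to limit integer rounding error. Put differently, vmpressure is (scanned - reclaimed)/scanned, the fraction of scanned pages that could not be reclaimed.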
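
To make the arithmetic concrete, here is a small user-space sketch that repeats the two integer steps of vmpressure_calc_pressure on plain numbers. The name calc_pressure is my own, and unlike the kernel path the sketch simply assumes the caller passes scanned > 0.

```c
#include <stdio.h>

/* User-space copy of the arithmetic in vmpressure_calc_pressure().
 * Assumption: the caller guarantees scanned > 0. */
unsigned long calc_pressure(unsigned long scanned, unsigned long reclaimed)
{
    unsigned long scale = scanned + reclaimed;
    unsigned long pressure;

    /* scale * (1 - reclaimed / scanned), kept in integer arithmetic */
    pressure = scale - (reclaimed * scale / scanned);
    /* normalise back to a percentage */
    pressure = pressure * 100 / scale;

    return pressure;
}

/* Hypothetical helper that prints one sample. */
void print_pressure(unsigned long scanned, unsigned long reclaimed)
{
    printf("scanned=%lu reclaimed=%lu -> pressure=%lu\n",
           scanned, reclaimed, calc_pressure(scanned, reclaimed));
}
```

For example, scanning 1000 pages and reclaiming 400 gives scale = 1400, then 1400 - 560 = 840, and 840 * 100 / 1400 = 60. Reclaiming nothing gives 100, and reclaiming every scanned page gives 0.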

Accordingly, the comment on the level definitions also notes that the higher the vmpressure, the more unsuccessful reclaims there were.

/*
 * These thresholds are used when we account memory pressure through
 * scanned/reclaimed ratio. The current values were chosen empirically. In
 * essence, they are percents: the higher the value, the more number
 * unsuccessful reclaims there were.
 */
static const unsigned int vmpressure_level_med = 60;
static const unsigned int vmpressure_level_critical = 95;
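
As a rough sketch of how these two thresholds split the 0..100 range into the three levels from memory.txt, the comparison can be written as below. The enum and function names are made up for this example; only the two threshold values come from the kernel source quoted above.

```c
/* Hypothetical names for the three levels described in memory.txt. */
enum sketch_level { SKETCH_LOW, SKETCH_MEDIUM, SKETCH_CRITICAL };

/* Threshold values as quoted from vmpressure.c. */
static const unsigned int sketch_level_med = 60;
static const unsigned int sketch_level_critical = 95;

/* Map a pressure percentage to a level: check the highest threshold first. */
enum sketch_level pressure_to_level(unsigned long pressure)
{
    if (pressure >= sketch_level_critical)
        return SKETCH_CRITICAL;
    if (pressure >= sketch_level_med)
        return SKETCH_MEDIUM;
    return SKETCH_LOW;
}

/* Human-readable name, matching the strings used by the cgroup interface. */
const char *level_name(enum sketch_level level)
{
    switch (level) {
    case SKETCH_CRITICAL:
        return "critical";
    case SKETCH_MEDIUM:
        return "medium";
    default:
        return "low";
    }
}
```

So a pressure of 59 is still "low", 60 through 94 is "medium", and anything from 95 up is "critical".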

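vmpressure_win is the lower bound on scanned pages: scanned and reclaimed counts accumulate, and as long as the accumulated scanned count is below vmpressure_win the call returns without computing any pressure.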
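
The windowing can be sketched in user space roughly as follows. This is a simplified model, not the kernel code: it leaves out the locking and the deferred work item, the struct and function names are invented, and the window size of 512 pages is an assumption based on mainline defining vmpressure_win as SWAP_CLUSTER_MAX * 16 (a vendor kernel may use a different value).

```c
#include <stdbool.h>

/* Assumed window size: SWAP_CLUSTER_MAX * 16 = 512 pages in mainline. */
#define SKETCH_WIN 512UL

/* Hypothetical accumulator for one reclaim window. */
struct sketch_window {
    unsigned long scanned;
    unsigned long reclaimed;
};

/* Add one reclaim pass to the window. Returns false while the window is
 * still filling up; once at least SKETCH_WIN pages have been scanned,
 * stores the pressure in *pressure_out, resets the window and returns true. */
bool window_account(struct sketch_window *w, unsigned long scanned,
                    unsigned long reclaimed, unsigned long *pressure_out)
{
    unsigned long scale, pressure;

    w->scanned += scanned;
    w->reclaimed += reclaimed;

    if (w->scanned < SKETCH_WIN)
        return false;           /* too few pages scanned: ignore for now */

    /* Same arithmetic as vmpressure_calc_pressure() */
    scale = w->scanned + w->reclaimed;
    pressure = scale - (w->reclaimed * scale / w->scanned);
    *pressure_out = pressure * 100 / scale;

    w->scanned = 0;
    w->reclaimed = 0;
    return true;
}
```

Feeding it three passes of 200 scanned pages each (with 100, 100 and 40 reclaimed) produces no result for the first two calls; the third call crosses the 512-page window and yields a pressure of 60.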