Kernel version 3.18. Memory reclaim and memory allocation are tied together, so let's walk through both:

what is order in mm:

From Mel Gorman's book:

the allocator maintains blocks of free pages where each block is a power of two number of pages. The exponent for the power of two-sized block is referred to as the order.

A block is made up of 2^n pages, and that exponent n is the order.
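
For example, order = 3 means a block of 2^3 = 8 physically contiguous pages. A minimal sketch (hypothetical kernel-module style code, not from the source):

#include <linux/gfp.h>
#include <linux/mm.h>

static void order_example(void)
{
    /* order = 3: 2^3 = 8 contiguous pages, i.e. 32KiB with 4KiB pages */
    struct page *pages = alloc_pages(GFP_KERNEL, 3);

    if (pages)
        __free_pages(pages, 3); /* must free with the same order */
}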

Memory allocation splits into a fast path (get_page_from_freelist) and a slow path (__alloc_pages_slowpath); when the fast path fails, allocation falls back to the slow path.

/*
 * This is the 'heart' of the zoned buddy allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
            struct zonelist *zonelist, nodemask_t *nodemask)
{
    ...
    /* First allocation attempt */
    page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
            zonelist, high_zoneidx, alloc_flags,
            preferred_zone, classzone_idx, migratetype);
    if (unlikely(!page)) { // first attempt failed
        /*
         * Runtime PM, block IO and its error handling path
         * can deadlock because I/O on the device might not
         * complete.
         */
        gfp_mask = memalloc_noio_flags(gfp_mask);
        page = __alloc_pages_slowpath(gfp_mask, order,
                zonelist, high_zoneidx, nodemask,
                preferred_zone, classzone_idx, migratetype);
    }
    ...
}

In the fast path, if the watermark check fails, zone_reclaim is called to reclaim memory from that zone and the allocation is retried. The slow path performs direct reclaim, and it also wakes kswapd for background reclaim.
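
A condensed excerpt of that fast-path logic in get_page_from_freelist() (3.18, heavily trimmed, not verbatim):

    for_each_zone_zonelist_nodemask(zone, z, zonelist,
                        high_zoneidx, nodemask) {
        ...
        mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
        if (!zone_watermark_ok(zone, order, mark,
                       classzone_idx, alloc_flags)) {
            ...
            ret = zone_reclaim(zone, gfp_mask, order);
            ...
            /* did we reclaim enough to pass the watermark? */
            if (zone_watermark_ok(zone, order, mark,
                        classzone_idx, alloc_flags))
                goto try_this_zone;
            ...
        }
try_this_zone:
        page = buffered_rmqueue(preferred_zone, zone, order,
                        gfp_mask, migratetype);
        ...
    }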

Let's look at the slow path. The rough logic, read from the code (a condensed sketch follows the list):

  1. wake up kswapd (unless gfp_mask carries __GFP_NO_KSWAPD)
  2. get_page_from_freelist again, still checking watermarks
  3. if no page, see whether we may retry without checking watermarks (ALLOC_NO_WATERMARKS)
  4. if still no page, try direct compaction (__alloc_pages_direct_compact)
  5. if still no page, try direct reclaim and then allocate again
  6. if still no page, fall back to the OOM killer (__alloc_pages_may_oom)
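
A condensed sketch of __alloc_pages_slowpath() showing the order of those steps (3.18, labels and error handling trimmed, not verbatim):

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, ...)
{
    ...
restart:
    /* step 1: kick background reclaim */
    if (!(gfp_mask & __GFP_NO_KSWAPD))
        wake_all_kswapds(order, zonelist, high_zoneidx,
                 preferred_zone, nodemask);

    alloc_flags = gfp_to_alloc_flags(gfp_mask);

    /* step 2: retry with watermarks still enforced */
    page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
            high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
            preferred_zone, classzone_idx, migratetype);
    if (page)
        goto got_pg;

    /* step 3: privileged callers may ignore watermarks */
    if (alloc_flags & ALLOC_NO_WATERMARKS)
        page = __alloc_pages_high_priority(gfp_mask, order, ...);
    ...
    /* step 4: direct compaction */
    page = __alloc_pages_direct_compact(gfp_mask, order, ...);
    ...
    /* step 5: direct reclaim, then allocate again */
    page = __alloc_pages_direct_reclaim(gfp_mask, order, ...);
    ...
    /* step 6: last resort, the OOM killer */
    page = __alloc_pages_may_oom(gfp_mask, order, ...);
    ...
}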

Compaction exists for allocating huge pages: after long uptime memory fragments easily, so high-order huge-page requests rarely succeed. See the Kconfig description:

config COMPACTION
   bool "Allow for memory compaction"
   def_bool y
   select MIGRATION
   depends on MMU
   help
     Allows the compaction of memory for the allocation of huge pages.

There is a good write-up on compaction: https://lwn.net/Articles/368869/
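
For reference, the direct compaction call chain in 3.18:

__alloc_pages_direct_compact -> try_to_compact_pages -> compact_zone_order -> compact_zone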

To sum up, the memory reclaim paths:

  • reclaim in fast path allocation
get_page_from_freelist -> zone_reclaim (if watermark is not ok) -> shrink_zone
  • direct reclaim

Direct reclaim lives in the slow-path allocation; let's take a look:

/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
    struct zonelist *zonelist, enum zone_type high_zoneidx,
    nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
    int classzone_idx, int migratetype, unsigned long *did_some_progress)
{
    struct page *page = NULL;
    bool drained = false;

    *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
                           nodemask);
    if (unlikely(!(*did_some_progress)))
        return NULL;

    /* After successful reclaim, reconsider all zones for allocation */
    if (IS_ENABLED(CONFIG_NUMA))
        zlc_clear_zones_full(zonelist);

retry:
    page = get_page_from_freelist(gfp_mask, nodemask, order,
                    zonelist, high_zoneidx,
                    alloc_flags & ~ALLOC_NO_WATERMARKS,
                    preferred_zone, classzone_idx,
                    migratetype);

    /*
     * If an allocation failed after direct reclaim, it could be because
     * pages are pinned on the per-cpu lists. Drain them and try again
     */
    if (!page && !drained) {
        drain_all_pages();
        drained = true;
        goto retry;
    }

    return page;
}

First it reclaims directly via __perform_reclaim, then retries get_page_from_freelist; if that still fails, the per-CPU page lists are drained once and the allocation is retried.

__perform_reclaim -> try_to_free_pages -> do_try_to_free_pages -> shrink_zones
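
__perform_reclaim itself is small; roughly as in 3.18 (lightly trimmed):

static int
__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist,
                    nodemask_t *nodemask)
{
    struct reclaim_state reclaim_state;
    int progress;

    cond_resched();

    /* We now go into synchronous reclaim */
    current->flags |= PF_MEMALLOC;
    reclaim_state.reclaimed_slab = 0;
    current->reclaim_state = &reclaim_state;

    progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);

    current->reclaim_state = NULL;
    current->flags &= ~PF_MEMALLOC;

    cond_resched();

    return progress;
}
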
  • kswapd reclaim

call graph:

kswapd -> balance_pgdat -> kswapd_shrink_zone -> shrink_zone

At allocation time, if the zones' free pages fall below the low watermark and gfp_mask does not carry __GFP_NO_KSWAPD (i.e. kswapd reclaim is allowed), kswapd is woken up to reclaim in the background.
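
The wakeup itself is wakeup_kswapd() in mm/vmscan.c; condensed from 3.18 (not verbatim):

void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
{
    pg_data_t *pgdat;

    if (!populated_zone(zone))
        return;
    ...
    pgdat = zone->zone_pgdat;
    if (pgdat->kswapd_max_order < order) {
        pgdat->kswapd_max_order = order;
        pgdat->classzone_idx = min(pgdat->classzone_idx, classzone_idx);
    }
    if (!waitqueue_active(&pgdat->kswapd_wait))
        return;         /* kswapd is already running */
    if (zone_balanced(zone, order, 0, 0))
        return;         /* already above the watermark */

    wake_up_interruptible(&pgdat->kswapd_wait);
}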