Kernel version 3.18. Memory reclaim and memory allocation are tied tightly together, so let's walk through them side by side:

what is order in mm:

From Mel Gorman's book:

the allocator maintains blocks of free pages where each block is a power of two number of pages. The exponent for the power of two-sized block is referred to as the order.

A block is made up of 2^n pages, and this n is the order.
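To make the order/size relationship concrete, here is a minimal userspace sketch (my own illustration, not kernel code) assuming 4 KiB pages; with the usual MAX_ORDER of 11, the largest buddy block is order 10, i.e. 4 MiB:

#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assumed page size: 4 KiB */

int main(void)
{
	unsigned int order;

	/* pages and size covered by a block of each order */
	for (order = 0; order <= 10; order++)
		printf("order %2u -> %4lu pages, %6lu KiB\n",
		       order, 1UL << order, (PAGE_SIZE << order) >> 10);
	return 0;
}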

Memory allocation is split into a fast path (get_page_from_freelist) and a slow path (__alloc_pages_slowpath); when the fast path fails, allocation falls back to the slow path.

/*
 * This is the 'heart' of the zoned buddy allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
			struct zonelist *zonelist, nodemask_t *nodemask)
{
	...
	/* First allocation attempt */
	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
			zonelist, high_zoneidx, alloc_flags,
			preferred_zone, classzone_idx, migratetype);
	if (unlikely(!page)) {	/* the fast path failed to allocate */
		/*
		 * Runtime PM, block IO and its error handling path
		 * can deadlock because I/O on the device might not
		 * complete.
		 */
		gfp_mask = memalloc_noio_flags(gfp_mask);
		page = __alloc_pages_slowpath(gfp_mask, order,
				zonelist, high_zoneidx, nodemask,
				preferred_zone, classzone_idx, migratetype);
	}

In the fast path, if a zone's watermark check fails, the allocator reclaims memory with zone_reclaim and tries that zone again. The slow path performs direct reclaim, and it also wakes kswapd for background reclaim.
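The per-zone loop inside get_page_from_freelist has roughly this shape (a paraphrased sketch of the 3.18 logic, not the verbatim source; cpuset and zonelist-cache checks are omitted):

	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx, nodemask) {
		unsigned long mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];

		if (!zone_watermark_ok(zone, order, mark, classzone_idx, alloc_flags)) {
			/* second chance: reclaim from this zone, then re-check */
			if (zone_reclaim(zone, gfp_mask, order) != ZONE_RECLAIM_SUCCESS)
				continue;
			if (!zone_watermark_ok(zone, order, mark, classzone_idx, alloc_flags))
				continue;
		}
		page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask, migratetype);
		if (page)
			return page;
	}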

Let's look at the slow path. The basic flow, read straight from the code, is (a condensed sketch follows the list):

  1. wake up kswapd, unless gfp_mask carries __GFP_NO_KSWAPD
  2. get_page_from_freelist with watermark checking
  3. if no page, and ALLOC_NO_WATERMARKS is set, try again ignoring watermarks
  4. if still no page, try direct compaction (__alloc_pages_direct_compact)
  5. if still no page, try direct reclaim, then allocate again
  6. if still no page, fall back to the OOM killer (__alloc_pages_may_oom)
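Condensed, __alloc_pages_slowpath on 3.18 looks roughly like this (a paraphrase, not the verbatim source; most arguments, retry loops and corner cases are elided):

	if (!(gfp_mask & __GFP_NO_KSWAPD))				/* 1 */
		wake_all_kswapds(order, zonelist, high_zoneidx, preferred_zone);

	alloc_flags = gfp_to_alloc_flags(gfp_mask);
	page = get_page_from_freelist(...);				/* 2 */
	if (page)
		goto got_pg;

	if (alloc_flags & ALLOC_NO_WATERMARKS) {			/* 3 */
		page = __alloc_pages_high_priority(...);
		if (page)
			goto got_pg;
	}

	page = __alloc_pages_direct_compact(...);			/* 4 */
	if (page)
		goto got_pg;

	page = __alloc_pages_direct_reclaim(..., &did_some_progress);	/* 5 */
	if (page)
		goto got_pg;

	if (!did_some_progress)
		page = __alloc_pages_may_oom(...);			/* 6 */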

Compaction is there for allocating huge pages: after long-term use memory fragments easily, and huge-page allocation requests become hard to satisfy. See the config description:

config COMPACTION
	bool "Allow for memory compaction"
	def_bool y
	select MIGRATION
	depends on MMU
	help
	  Allows the compaction of memory for the allocation of huge pages.
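A toy userspace demo (my own, hypothetical; not kernel code) of why fragmentation kills huge-page allocations: half of the pages are free, yet the longest contiguous run is a single page, so an order-9 request (512 pages, one 2 MiB huge page with 4 KiB pages) can never be satisfied without moving pages around:

#include <stdio.h>
#include <stdbool.h>

#define NPAGES     1024	/* toy "physical memory": 1024 page frames */
#define HUGE_ORDER 9	/* one 2 MiB huge page = 2^9 pages of 4 KiB */

int main(void)
{
	bool used[NPAGES];
	int i, run = 0, longest = 0, free_pages = 0;

	/* worst-case external fragmentation: every other page is pinned */
	for (i = 0; i < NPAGES; i++)
		used[i] = (i % 2 == 0);

	/* count free pages and the longest contiguous free run */
	for (i = 0; i < NPAGES; i++) {
		if (!used[i]) {
			free_pages++;
			if (++run > longest)
				longest = run;
		} else {
			run = 0;
		}
	}

	printf("free: %d/%d pages, longest contiguous run: %d\n",
	       free_pages, NPAGES, longest);
	printf("order-%d block needs %d contiguous pages -> %s\n",
	       HUGE_ORDER, 1 << HUGE_ORDER,
	       longest >= (1 << HUGE_ORDER) ? "possible" : "impossible without compaction");
	return 0;
}

Compaction fixes exactly this: it migrates the used pages together so that the free pages coalesce into large contiguous blocks.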

There is a good write-up of compaction at https://lwn.net/Articles/368869/.

A quick summary of the memory-reclaim entry points:

  • reclaim in fast path allocation
get_page_from_freelist -> zone_reclaim (if watermark is not ok) -> shrink_zone
  • direct reclaim

Direct reclaim of memory sits inside the slow-path allocation; let's take a look:

/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
	struct zonelist *zonelist, enum zone_type high_zoneidx,
	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
	int classzone_idx, int migratetype, unsigned long *did_some_progress)
{
	struct page *page = NULL;
	bool drained = false;

	*did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
					       nodemask);
	if (unlikely(!(*did_some_progress)))
		return NULL;

	/* After successful reclaim, reconsider all zones for allocation */
	if (IS_ENABLED(CONFIG_NUMA))
		zlc_clear_zones_full(zonelist);

retry:
	page = get_page_from_freelist(gfp_mask, nodemask, order,
					zonelist, high_zoneidx,
					alloc_flags & ~ALLOC_NO_WATERMARKS,
					preferred_zone, classzone_idx,
					migratetype);

So it first reclaims directly via __perform_reclaim and then calls get_page_from_freelist again.
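The rest of the function, not quoted above, handles the case where reclaim made progress but the retry still failed (paraphrased from the 3.18 source, not verbatim): freed pages may be sitting on per-CPU free lists, so they are drained and the allocation is retried once:

	/* a failed retry may be due to pages stuck on per-CPU free lists */
	if (!page && !drained) {
		drain_all_pages();
		drained = true;
		goto retry;
	}

	return page;
}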

__perform_reclaim -> try_to_free_pages -> do_try_to_free_pages -> shrink_zones
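Inside do_try_to_free_pages the core is a priority loop (again a paraphrase): each pass calls shrink_zones, and the priority value is decremented so that scanning pressure rises until enough pages have been reclaimed:

	do {
		sc->nr_scanned = 0;
		shrink_zones(zonelist, sc);	/* walk the eligible zones' LRU lists */
		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;			/* reclaimed enough, stop */
	} while (--sc->priority >= 0);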
  • kswapd reclaim

call graph:

kswapd -> balance_pgdat -> kswapd_shrink_zone -> shrink_zone

When allocating, if the free pages of the zones are below the low watermark and gfp_mask does not carry __GFP_NO_KSWAPD (i.e. reclaiming via kswapd is allowed), kswapd is woken up to reclaim in the background.
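The wakeup itself is a tiny helper in the slow path, roughly (paraphrased; wakeup_kswapd checks the zone's low watermark internally and does nothing if the zone is already balanced):

	if (!(gfp_mask & __GFP_NO_KSWAPD))
		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx)
			wakeup_kswapd(zone, order, zone_idx(preferred_zone));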