Kernel version 3.18. Memory reclaim and memory allocation are tied together, so let's walk through both:

what is order in mm:

From Mel Gorman's book:

the allocator maintains blocks of free pages where each block is a power of two number of pages. The exponent for the power of two-sized block is referred to as the order.

A block is made up of 2^n pages, and that exponent n is the order.
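
For example, order = 3 means a block of 2^3 = 8 physically contiguous pages. A minimal sketch (hypothetical kernel-module style code, not from the source):

#include <linux/gfp.h>
#include <linux/mm.h>

static void order_example(void)
{
    /* order = 3: 2^3 = 8 contiguous pages, i.e. 32KiB with 4KiB pages */
    struct page *pages = alloc_pages(GFP_KERNEL, 3);

    if (pages)
        __free_pages(pages, 3); /* must free with the same order */
}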

Memory allocation splits into a fast path (get_page_from_freelist) and a slow path (__alloc_pages_slowpath); when the fast path fails, allocation falls back to the slow path.

/*
 * This is the 'heart' of the zoned buddy allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
            struct zonelist *zonelist, nodemask_t *nodemask)
{
    ...
    /* First allocation attempt */
    page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
            zonelist, high_zoneidx, alloc_flags,
            preferred_zone, classzone_idx, migratetype);
    if (unlikely(!page)) { // first attempt failed
        /*
         * Runtime PM, block IO and its error handling path
         * can deadlock because I/O on the device might not
         * complete.
         */
        gfp_mask = memalloc_noio_flags(gfp_mask);
        page = __alloc_pages_slowpath(gfp_mask, order,
                zonelist, high_zoneidx, nodemask,
                preferred_zone, classzone_idx, migratetype);
    }
    ...
}

In the fast path, if the watermark check fails, zone_reclaim is called to reclaim memory from that zone and the allocation is retried. The slow path performs direct reclaim, and it also wakes kswapd for background reclaim.
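
A condensed excerpt of that fast-path logic in get_page_from_freelist() (3.18, heavily trimmed, not verbatim):

    for_each_zone_zonelist_nodemask(zone, z, zonelist,
                        high_zoneidx, nodemask) {
        ...
        mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
        if (!zone_watermark_ok(zone, order, mark,
                       classzone_idx, alloc_flags)) {
            ...
            ret = zone_reclaim(zone, gfp_mask, order);
            ...
            /* did we reclaim enough to pass the watermark? */
            if (zone_watermark_ok(zone, order, mark,
                        classzone_idx, alloc_flags))
                goto try_this_zone;
            ...
        }
try_this_zone:
        page = buffered_rmqueue(preferred_zone, zone, order,
                        gfp_mask, migratetype);
        ...
    }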

Let's look at the slow path. The rough logic, read from the code (a condensed sketch follows the list):

  1. wake up kswapd (unless gfp_mask carries __GFP_NO_KSWAPD)
  2. get_page_from_freelist again, still checking watermarks
  3. if no page, see whether we may retry without checking watermarks (ALLOC_NO_WATERMARKS)
  4. if still no page, try direct compaction (__alloc_pages_direct_compact)
  5. if still no page, try direct reclaim and then allocate again
  6. if still no page, fall back to the OOM killer (__alloc_pages_may_oom)
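
A condensed sketch of __alloc_pages_slowpath() showing the order of those steps (3.18, labels and error handling trimmed, not verbatim):

static inline struct page *
__alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, ...)
{
    ...
restart:
    /* step 1: kick background reclaim */
    if (!(gfp_mask & __GFP_NO_KSWAPD))
        wake_all_kswapds(order, zonelist, high_zoneidx,
                 preferred_zone, nodemask);

    alloc_flags = gfp_to_alloc_flags(gfp_mask);

    /* step 2: retry with watermarks still enforced */
    page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
            high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
            preferred_zone, classzone_idx, migratetype);
    if (page)
        goto got_pg;

    /* step 3: privileged callers may ignore watermarks */
    if (alloc_flags & ALLOC_NO_WATERMARKS)
        page = __alloc_pages_high_priority(gfp_mask, order, ...);
    ...
    /* step 4: direct compaction */
    page = __alloc_pages_direct_compact(gfp_mask, order, ...);
    ...
    /* step 5: direct reclaim, then allocate again */
    page = __alloc_pages_direct_reclaim(gfp_mask, order, ...);
    ...
    /* step 6: last resort, the OOM killer */
    page = __alloc_pages_may_oom(gfp_mask, order, ...);
    ...
}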

Compaction exists for allocating huge pages: after long uptime memory fragments easily, so high-order huge-page requests rarely succeed. See the Kconfig description:

config COMPACTION
   bool "Allow for memory compaction"
   def_bool y
   select MIGRATION
   depends on MMU
   help
     Allows the compaction of memory for the allocation of huge pages.

There is a good write-up on compaction: https://lwn.net/Articles/368869/
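
For reference, the direct compaction call chain in 3.18:

__alloc_pages_direct_compact -> try_to_compact_pages -> compact_zone_order -> compact_zone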

To sum up, the memory reclaim paths:

  • reclaim in fast path allocation
get_page_from_freelist -> zone_reclaim (if watermark is not ok) -> shrink_zone
  • direct reclaim

Direct reclaim lives in the slow-path allocation; let's take a look:

/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
    struct zonelist *zonelist, enum zone_type high_zoneidx,
    nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
    int classzone_idx, int migratetype, unsigned long *did_some_progress)
{
    struct page *page = NULL;
    bool drained = false;

    *did_some_progress = __perform_reclaim(gfp_mask, order, zonelist,
                           nodemask);
    if (unlikely(!(*did_some_progress)))
        return NULL;

    /* After successful reclaim, reconsider all zones for allocation */
    if (IS_ENABLED(CONFIG_NUMA))
        zlc_clear_zones_full(zonelist);

retry:
    page = get_page_from_freelist(gfp_mask, nodemask, order,
                    zonelist, high_zoneidx,
                    alloc_flags & ~ALLOC_NO_WATERMARKS,
                    preferred_zone, classzone_idx,
                    migratetype);

    /*
     * If an allocation failed after direct reclaim, it could be because
     * pages are pinned on the per-cpu lists. Drain them and try again
     */
    if (!page && !drained) {
        drain_all_pages();
        drained = true;
        goto retry;
    }

    return page;
}

First it reclaims directly via __perform_reclaim, then retries get_page_from_freelist; if that still fails, the per-CPU page lists are drained once and the allocation is retried.

__perform_reclaim -> try_to_free_pages -> do_try_to_free_pages -> shrink_zones
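
__perform_reclaim itself is small; roughly as in 3.18 (lightly trimmed):

static int
__perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist,
                    nodemask_t *nodemask)
{
    struct reclaim_state reclaim_state;
    int progress;

    cond_resched();

    /* We now go into synchronous reclaim */
    current->flags |= PF_MEMALLOC;
    reclaim_state.reclaimed_slab = 0;
    current->reclaim_state = &reclaim_state;

    progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);

    current->reclaim_state = NULL;
    current->flags &= ~PF_MEMALLOC;

    cond_resched();

    return progress;
}
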
  • kswapd reclaim

call graph:

kswapd -> balance_pgdat -> kswapd_shrink_zone -> shrink_zone

At allocation time, if the zones' free pages fall below the low watermark and gfp_mask does not carry __GFP_NO_KSWAPD (i.e. kswapd reclaim is allowed), kswapd is woken up to reclaim in the background.
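
The wakeup itself is wakeup_kswapd() in mm/vmscan.c; condensed from 3.18 (not verbatim):

void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
{
    pg_data_t *pgdat;

    if (!populated_zone(zone))
        return;
    ...
    pgdat = zone->zone_pgdat;
    if (pgdat->kswapd_max_order < order) {
        pgdat->kswapd_max_order = order;
        pgdat->classzone_idx = min(pgdat->classzone_idx, classzone_idx);
    }
    if (!waitqueue_active(&pgdat->kswapd_wait))
        return;         /* kswapd is already running */
    if (zone_balanced(zone, order, 0, 0))
        return;         /* already above the watermark */

    wake_up_interruptible(&pgdat->kswapd_wait);
}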