First, scan the zonelist and use the watermark to find a zone with enough free pages; if the whole scan comes up empty, try once more, this time also considering remote nodes.
First, the try-once-more / remote-node part:
	/*
	 * The first pass makes sure allocations are spread fairly within the
	 * local node. However, the local node might have free pages left
	 * after the fairness batches are exhausted, and remote zones haven't
	 * even been considered yet. Try once more without fairness, and
	 * include remote zones now, before entering the slowpath and waking
	 * kswapd: prefer spilling to a remote zone over swapping locally.
	 */
	if (alloc_flags & ALLOC_FAIR) {
		alloc_flags &= ~ALLOC_FAIR;
		if (nr_fair_skipped) {		// me: local node with ZONE_FAIR_DEPLETED
			zonelist_rescan = true;
			reset_alloc_batches(preferred_zone);
		}
		if (nr_online_nodes > 1)	// me: consider remote nodes
			zonelist_rescan = true;
	}

	if (unlikely(IS_ENABLED(CONFIG_NUMA) && zlc_active)) {
		/* Disable zlc cache for second zonelist scan */
		zlc_active = 0;
		zonelist_rescan = true;
	}

	if (zonelist_rescan)
		goto zonelist_scan;

	return NULL;
}
The comment gives the reasons for trying once more:
the local node may still have free pages left after the fairness batches are exhausted: with ALLOC_FAIR, each zone only gets an allocation batch (NR_ALLOC_BATCH) so allocations are spread round-robin across zones, and once a zone's batch hits zero it is skipped even though it still has free pages; a second pass without ALLOC_FAIR can use them
consider remote nodes on a NUMA system
Now the main part: how a zone with enough free pages is found. zone_watermark_ok() first checks whether the zone's free pages are above the watermark; if they are, control falls through to the try_this_zone path; if not, zone_reclaim() is run to reclaim some pages and then zone_watermark_ok() is checked again, same as before.
The watermark that gets passed in is set here:
/*
 * This is the 'heart' of the zoned buddy allocator.
 */
struct page *
__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
			struct zonelist *zonelist, nodemask_t *nodemask)
{
	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
	struct zone *preferred_zone;
	struct zoneref *preferred_zoneref;
	struct page *page = NULL;
	int migratetype = gfpflags_to_migratetype(gfp_mask);
	unsigned int cpuset_mems_cookie;
	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;	// me: here
	int classzone_idx;
OK, it's ALLOC_WMARK_LOW: once a zone's free pages drop below the low watermark, the fast path fails here and we enter the slowpath, which wakes kswapd.