The kernel version is 3.18. First, a few concepts the kernel uses to describe memory.

The zone concept

NUMA stands for Non-Uniform Memory Access. The point is that the cost of a CPU accessing a memory bank depends on the distance between the two.

Memory is divided into banks, and each bank is called a node. Each node is split into several blocks called zones. A node is represented by pglist_data, and a zone contains pages.

/*
* The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
* (mostly NUMA machines?) to denote a higher-level memory zone than the
* zone denotes.
*
* On NUMA machines, each NUMA node would have a pg_data_t to describe
* it's memory layout.
*
* Memory statistics and page replacement data structures are maintained on a
* per-zone basis.
*/
typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
struct zonelist node_zonelists[MAX_ZONELISTS];
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
struct page *node_mem_map;
#ifdef CONFIG_MEMCG
struct page_cgroup *node_page_cgroup;
#endif
...
}pg_data_t;

The member node_zones holds the zones inside the node, and nr_zones is their count.
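
The node, zone, page hierarchy can be sketched in user-space C. This is only a toy model: toy_zone, toy_node and toy_node_pages are hypothetical names invented for illustration, and the fields are far fewer than in the real struct zone and pglist_data.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; names and fields are illustrative only. */
enum toy_zone_type {
    TOY_ZONE_DMA,
    TOY_ZONE_NORMAL,
    TOY_ZONE_HIGHMEM,
    TOY_MAX_NR_ZONES
};

struct toy_zone {
    unsigned long start_pfn;  /* first page frame covered by the zone */
    unsigned long nr_pages;   /* number of pages in the zone */
};

struct toy_node {
    struct toy_zone node_zones[TOY_MAX_NR_ZONES]; /* fixed-size array, like node_zones */
    int nr_zones;                                  /* how many entries are in use */
};

/* Sum the pages of every zone in use in one node. */
unsigned long toy_node_pages(const struct toy_node *node)
{
    unsigned long total = 0;

    assert(node != NULL);
    assert(node->nr_zones >= 0 && node->nr_zones <= TOY_MAX_NR_ZONES);
    for (int i = 0; i < node->nr_zones; i++)
        total += node->node_zones[i].nr_pages;
    return total;
}
```

A node with a 4096-page DMA zone followed by a 258048-page normal zone would report 262144 pages in total.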

The comment says this structure is used with CONFIG_DISCONTIGMEM. That option selects a memory model, that is, how a node manages its pages.

A zone is represented by struct zone. The main zone types are ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.

ZONE_HIGHMEM is for 32-bit systems, and ZONE_DMA exists for architecture-specific devices that need it. ZONE_NORMAL is defined as:

/*
* Normal addressable memory is in ZONE_NORMAL. DMA operations can be
* performed on pages in ZONE_NORMAL if the DMA devices support
* transfers to all addressable memory.
*/
ZONE_NORMAL,

On 64-bit phone systems there is usually just a single ZONE_NORMAL.

Memory model

Linux has three memory models. In order of evolution they are: flat memory, discontiguous memory, and sparse memory.

Here is how the kernel configuration describes them:

config DISCONTIGMEM_MANUAL
bool "Discontiguous Memory"
depends on ARCH_DISCONTIGMEM_ENABLE
help
This option provides enhanced support for discontiguous
memory systems, over FLATMEM. These systems have holes
in their physical address spaces, and this option provides
more efficient handling of these holes. However, the vast
majority of hardware has quite flat address spaces, and
can have degraded performance from the extra overhead that
this option imposes.

Many NUMA configurations will have this as the only option.

If unsure, choose "Flat Memory" over this option.

config SPARSEMEM_MANUAL
bool "Sparse Memory"
depends on ARCH_SPARSEMEM_ENABLE
help
This will be the only option for some systems, including
memory hotplug systems. This is normal.

For many other systems, this will be an alternative to
"Discontiguous Memory". This option provides some potential
performance benefits, along with decreased code complexity,
but it is newer, and more experimental.

If unsure, choose "Discontiguous Memory" or "Flat Memory"
over this option.

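Flat means the physical address space is contiguous. Discontiguous means it is not: the physical address space has holes, which is the case on NUMA systems. Sparse targets memory hotplug. Taken literally the name suggests memory that is even more scattered, and combined with hotplug it is easy to picture many scattered memory regions that can be added or removed at any time.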
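
To see why the holes matter, here is a rough sketch that compares the size of one page array spanning the whole address range (the flat approach) with per-range arrays that cover only populated memory. Everything in it is an assumption made for illustration: toy_range, both function names, and the 64-byte descriptor size are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of one page descriptor in bytes; illustrative only. */
#define TOY_PAGE_DESC_SIZE 64UL

struct toy_range {
    unsigned long start_pfn;  /* first page frame of a populated range */
    unsigned long nr_pages;   /* number of page frames in the range */
};

/* Flat approach: one array from the lowest to the highest pfn, holes included. */
unsigned long toy_flat_map_bytes(const struct toy_range *r, int n)
{
    assert(r != NULL && n > 0);

    unsigned long lo = r[0].start_pfn;
    unsigned long hi = r[0].start_pfn + r[0].nr_pages;

    for (int i = 1; i < n; i++) {
        unsigned long end = r[i].start_pfn + r[i].nr_pages;

        if (r[i].start_pfn < lo)
            lo = r[i].start_pfn;
        if (end > hi)
            hi = end;
    }
    return (hi - lo) * TOY_PAGE_DESC_SIZE;
}

/* Per-range arrays: descriptors only for page frames that actually exist. */
unsigned long toy_split_map_bytes(const struct toy_range *r, int n)
{
    assert(r != NULL && n > 0);

    unsigned long total = 0;

    for (int i = 0; i < n; i++)
        total += r[i].nr_pages;
    return total * TOY_PAGE_DESC_SIZE;
}
```

With two 1000-page ranges starting at pfn 0 and pfn 9000, the single spanning array needs descriptors for 10000 page frames while the per-range arrays need only 2000.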

Now the page-related members:

typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
struct zonelist node_zonelists[MAX_ZONELISTS];
int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
struct page *node_mem_map;
#ifdef CONFIG_MEMCG
struct page_cgroup *node_page_cgroup;
#endif
#endif

config FLAT_NODE_MEM_MAP
def_bool y
depends on !SPARSEMEM

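The node_mem_map here is for discontiguous memory. Judging from the definition, which depends only on !SPARSEMEM, it should apply to flat memory as well.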

#ifndef CONFIG_NEED_MULTIPLE_NODES
/* use the per-pgdat data instead for discontigmem - mbligh */
unsigned long max_mapnr;
struct page *mem_map;

EXPORT_SYMBOL(max_mapnr);
EXPORT_SYMBOL(mem_map);
#endif

#ifndef CONFIG_DISCONTIGMEM
/* The array of struct pages - for discontigmem use pgdat->lmem_map */
extern struct page *mem_map;
#endif

#ifndef CONFIG_NEED_MULTIPLE_NODES

extern struct pglist_data contig_page_data;
#define NODE_DATA(nid) (&contig_page_data)
#define NODE_MEM_MAP(nid) mem_map

#else /* CONFIG_NEED_MULTIPLE_NODES */

#include <asm/mmzone.h>

#endif /* !CONFIG_NEED_MULTIPLE_NODES */

The snippets above depend on two config options, CONFIG_NEED_MULTIPLE_NODES and CONFIG_DISCONTIGMEM:

#
# Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's
# to represent different areas of memory. This variable allows
# those dependencies to exist individually.
#
config NEED_MULTIPLE_NODES
def_bool y
depends on DISCONTIGMEM || NUMA

With discontiguous memory the address space is split into multiple nodes. These nodes are the banks mentioned earlier.

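So mem_map should be for flat memory. Going by the #ifndef guards alone, could sparse use it too? Probably not, since node_mem_map only exists when CONFIG_FLAT_NODE_MEM_MAP is set, which requires !SPARSEMEM. mem_map is simply node 0's node_mem_map.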

The code is as follows:

static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
{
...
#ifndef CONFIG_NEED_MULTIPLE_NODES
/*
* With no DISCONTIG, the global mem_map is just set as node 0's
*/
if (pgdat == NODE_DATA(0)) {
mem_map = NODE_DATA(0)->node_mem_map;
...

The implementation shows that the flat model is single-node and the discontiguous model is multi-node. Flat is a special case of discontiguous in which only node 0 exists.
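
A minimal user-space sketch of that aliasing, assuming a single-node build. The mini_ names are hypothetical stand-ins for contig_page_data, NODE_DATA, mem_map and alloc_node_mem_map, and the function takes a caller-supplied array instead of allocating one as the kernel does.

```c
#include <assert.h>
#include <stddef.h>

struct mini_page { unsigned long flags; };

struct mini_pgdat {
    struct mini_page *node_mem_map;      /* per-node page array */
    unsigned long node_spanned_pages;    /* pages covered by the node */
};

/* Single-node build: one pgdat, and the node id argument is ignored. */
struct mini_pgdat mini_contig_page_data;
#define MINI_NODE_DATA(nid) (&mini_contig_page_data)

/* Global page array used by the flat model. */
struct mini_page *mini_mem_map;

/* Attach a page array to a node; for node 0 also publish it as the global map. */
void mini_alloc_node_mem_map(struct mini_pgdat *pgdat, struct mini_page *pages,
                             unsigned long nr_pages)
{
    assert(pgdat != NULL && pages != NULL);
    pgdat->node_mem_map = pages;
    pgdat->node_spanned_pages = nr_pages;
    if (pgdat == MINI_NODE_DATA(0))
        mini_mem_map = MINI_NODE_DATA(0)->node_mem_map;
}
```

After the call for node 0, mini_mem_map and the node's node_mem_map point at the same array, which is the sense in which flat is just node 0 of the discontiguous layout.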

Now for how sparse manages pages:

#ifdef CONFIG_SPARSEMEM
struct mem_section {
/*
* This is, logically, a pointer to an array of struct
* pages. However, it is stored with some other magic.
* (see sparse.c::sparse_init_one_section())
*
* Additionally during early boot we encode node id of
* the location of the section here to guide allocation.
* (see sparse.c::memory_present())
*
* Making it a UL at least makes someone do a cast
* before using it wrong.
*/
unsigned long section_mem_map;

Here section_mem_map represents the page array. Phones today generally use the sparse model.
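
How a pfn maps to a section is plain bit arithmetic. The sketch below mirrors the kernel's pfn_to_section_nr and section_nr_to_pfn helpers, but the EX_ constants are assumed values picked for illustration (4 KiB pages, 128 MiB sections). The real values are set per architecture.

```c
#include <assert.h>

/* Assumed values for illustration; the real ones are per-architecture. */
#define EX_PAGE_SHIFT         12  /* 4 KiB pages */
#define EX_SECTION_SIZE_BITS  27  /* 128 MiB sections */
#define EX_PFN_SECTION_SHIFT  (EX_SECTION_SIZE_BITS - EX_PAGE_SHIFT)
#define EX_PAGES_PER_SECTION  (1UL << EX_PFN_SECTION_SHIFT)

/* Which section does this page frame belong to? */
unsigned long ex_pfn_to_section_nr(unsigned long pfn)
{
    return pfn >> EX_PFN_SECTION_SHIFT;
}

/* First page frame of a section. */
unsigned long ex_section_nr_to_pfn(unsigned long nr)
{
    assert(nr <= (~0UL >> EX_PFN_SECTION_SHIFT)); /* shift must not overflow */
    return nr << EX_PFN_SECTION_SHIFT;
}

/* Index of the page frame inside its section. */
unsigned long ex_pfn_offset_in_section(unsigned long pfn)
{
    return pfn & (EX_PAGES_PER_SECTION - 1);
}
```

With these values a section holds 32768 pages, so pfn 100000 lands in section 3 at offset 1696.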

#define page_to_pfn __page_to_pfn
#define pfn_to_page __pfn_to_page

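These two macros are implemented differently for each memory model. pfn stands for page frame number: it numbers each page and serves as the index into the page array.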

For flatmem, finding a page is just mem_map + n, where n is the page's pfn (minus ARCH_PFN_OFFSET when physical memory does not start at pfn 0).

For discontigmem, first find the node, then use node_mem_map + n, with n counted from the node's first pfn.

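For sparsemem, first find the section, then use section_mem_map + n. The stored value has to be decoded first, since the comment above notes it is kept with some extra encoding.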
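
The three lookups can be put side by side in a user-space sketch. All sim_ names are hypothetical, and the code deliberately simplifies. The node search is a linear scan rather than an architecture-provided pfn-to-node mapping. The section stores a plain pointer rather than the encoded section_mem_map. Sections hold only 16 pages so the example stays small.

```c
#include <assert.h>
#include <stddef.h>

struct sim_page { unsigned long flags; };

/* flat: one global array; first_pfn plays the role of ARCH_PFN_OFFSET */
struct sim_flat {
    struct sim_page *mem_map;
    unsigned long first_pfn;
    unsigned long nr_pages;
};

struct sim_page *sim_flat_pfn_to_page(const struct sim_flat *m, unsigned long pfn)
{
    assert(pfn >= m->first_pfn && pfn - m->first_pfn < m->nr_pages);
    return m->mem_map + (pfn - m->first_pfn);
}

/* discontiguous: pick the node, then index that node's own array */
struct sim_node {
    struct sim_page *node_mem_map;
    unsigned long node_start_pfn;
    unsigned long node_spanned_pages;
};

struct sim_page *sim_discontig_pfn_to_page(const struct sim_node *nodes,
                                           int nr_nodes, unsigned long pfn)
{
    for (int i = 0; i < nr_nodes; i++) {
        const struct sim_node *n = &nodes[i];

        if (pfn >= n->node_start_pfn &&
            pfn - n->node_start_pfn < n->node_spanned_pages)
            return n->node_mem_map + (pfn - n->node_start_pfn);
    }
    return NULL; /* pfn falls in a hole between nodes */
}

/* sparse: pick the section, then index the section's array */
#define SIM_PFN_SECTION_SHIFT 4 /* 16 pages per section, tiny on purpose */
#define SIM_PAGES_PER_SECTION (1UL << SIM_PFN_SECTION_SHIFT)

struct sim_section {
    struct sim_page *pages; /* NULL when the section is not present */
};

struct sim_page *sim_sparse_pfn_to_page(const struct sim_section *secs,
                                        unsigned long nr_secs, unsigned long pfn)
{
    unsigned long nr = pfn >> SIM_PFN_SECTION_SHIFT;

    if (nr >= nr_secs || secs[nr].pages == NULL)
        return NULL;
    return secs[nr].pages + (pfn & (SIM_PAGES_PER_SECTION - 1));
}
```

The flat lookup is a single addition, while the other two pay for one extra step (finding the node or the section) in exchange for not needing page descriptors for the holes.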