memcg: understanding memory usage
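Android Go's userspace low memory killer (lmkd) uses the following memcg memory-usage statistics to detect memory pressure: memory.usage_in_bytes and memory.memsw.usage_in_bytes.
memcg is a subsystem of cgroup. So how do these two files account for memory usage? With that question in mind, let's look at the code (kernel 3.18, MSM platform).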
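For context, here is a minimal userspace sketch of what a reader such as lmkd has to do with these files: read one decimal byte count. It assumes memcg is mounted at /dev/memcg, as the shell prompt further down suggests. The helper names parse_usage and read_memcg_usage are hypothetical and are not lmkd's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the single decimal value a *usage_in_bytes file holds, e.g. "123456789\n". */
static int parse_usage(const char *buf, unsigned long long *out)
{
    char *end;

    errno = 0;
    unsigned long long v = strtoull(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;
    /* Only a trailing newline is allowed after the number. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Hypothetical helper: read one usage file such as
 * /dev/memcg/memory.usage_in_bytes. Returns 0 on success, -1 on error. */
static int read_memcg_usage(const char *path, unsigned long long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_usage(buf, out);
}
```

A caller would pass /dev/memcg/memory.usage_in_bytes or /dev/memcg/memory.memsw.usage_in_bytes as the path.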
How to read
static struct cftype mem_cgroup_files[] = {
All of these files are read through a single entry point, mem_cgroup_read_u64:
static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
Here both usage_in_bytes and memsw.usage_in_bytes end up in mem_cgroup_usage(memcg, bool swap), with swap set to false for the former and true for the latter.
static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
Reading the code, a root memcg goes through mem_cgroup_recursive_stat, which loops over the hierarchy and sums statistics. A non-root memcg calls res_counter_read_u64 directly to read res.
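The branch described above can be modeled in userspace. This is a simplified sketch, not kernel code. struct memcg_model, its fields, and the 4 KiB page size are assumptions, and the root branch takes page counts that are assumed to be already summed over the hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

enum { MODEL_STAT_CACHE, MODEL_STAT_RSS, MODEL_STAT_SWAP, MODEL_NR_STATS };

/* Hypothetical, flattened stand-in for struct mem_cgroup. */
struct memcg_model {
    bool is_root;
    uint64_t res_usage;                     /* bytes in the "res" counter */
    uint64_t memsw_usage;                   /* bytes in the "memsw" counter */
    uint64_t subtree_pages[MODEL_NR_STATS]; /* page counts, already summed over the subtree */
};

/* Mirrors the branch structure of mem_cgroup_usage(memcg, swap). */
static uint64_t model_usage(const struct memcg_model *m, bool swap)
{
    if (!m->is_root)
        return swap ? m->memsw_usage : m->res_usage; /* already in bytes */

    uint64_t pages = m->subtree_pages[MODEL_STAT_CACHE] +
                     m->subtree_pages[MODEL_STAT_RSS];
    if (swap)
        pages += m->subtree_pages[MODEL_STAT_SWAP];
    return pages << MODEL_PAGE_SHIFT; /* pages -> bytes */
}
```

The non-root path returns a byte counter as-is. The root path has to convert page counts to bytes.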
The kernel documentation describes the design behind res (the res_counter) as follows:
2.1. Design
The core of the design is a counter called the res_counter. The res_counter
tracks the current memory usage and limit of the group of processes associated
with the controller. Each cgroup has a memory controller specific data
structure (mem_cgroup) associated with it.
2.2. Accounting
        +--------------------+
        |  mem_cgroup        |
        |  (res_counter)     |
        +--------------------+
         /          ^        \
        /           |         \
+---------------+   |    +---------------+
|  mm_struct    |   |....|  mm_struct    |
|               |   |    |               |
+---------------+   |    +---------------+
                    |
                    + --------------+
                                    |
+---------------+            +------+--------+
|  page         +----------> |  page_cgroup  |
|               |            |               |
+---------------+            +---------------+

(Figure 1: Hierarchy of Accounting)
Figure 1 shows the important aspects of the controller
- Accounting happens per cgroup
- Each mm_struct knows about which cgroup it belongs to
- Each page has a pointer to the page_cgroup, which in turn knows the
cgroup it belongs to
struct mem_cgroup {
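Here is a single-level sketch of the res_counter idea: a counter that tracks usage against a limit. The field names usage, max_usage, limit and failcnt are borrowed from the kernel's res_counter. The struct name, the functions rc_charge/rc_uncharge, and the behaviour (refuse the charge rather than reclaim) are simplifying assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's res_counter. */
struct res_counter_model {
    uint64_t usage;     /* bytes currently charged */
    uint64_t max_usage; /* high-water mark of usage */
    uint64_t limit;     /* bytes allowed */
    uint64_t failcnt;   /* number of refused charges */
};

/* Add val bytes, or return -1 (and count a failure) if the limit would be exceeded. */
static int rc_charge(struct res_counter_model *c, uint64_t val)
{
    if (c->usage + val > c->limit) {
        c->failcnt++;
        return -1;
    }
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;
    return 0;
}

/* Remove val bytes; usage must never go negative. */
static void rc_uncharge(struct res_counter_model *c, uint64_t val)
{
    assert(val <= c->usage);
    c->usage -= val;
}
```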
The root memcg's statistics come from the count array in struct mem_cgroup_stat_cpu:
struct mem_cgroup_stat_cpu {
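A per-CPU counter lets each writer touch only its own CPU's slot, while readers sum all slots. The sketch below models that with a fixed array of four CPUs. NR_CPUS_MODEL, the struct names, and stat_add/stat_read are hypothetical. An individual slot can go negative (for example, a page charged on one CPU and uncharged on another), so only the sum is meaningful.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4 /* assumption: a fixed, small CPU count */

enum stat_index { STAT_CACHE, STAT_RSS, STAT_SWAP, NR_STATS };

/* One slot per CPU, like the count[] array in struct mem_cgroup_stat_cpu. */
struct stat_cpu_model {
    long count[NR_STATS];
};

struct memcg_stats_model {
    struct stat_cpu_model cpu[NR_CPUS_MODEL];
};

/* Writer: only the current CPU's slot is updated (no shared cache line). */
static void stat_add(struct memcg_stats_model *s, int cpu,
                     enum stat_index idx, long nr_pages)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    s->cpu[cpu].count[idx] += nr_pages;
}

/* Reader: sum the slot of every CPU. */
static long stat_read(const struct memcg_stats_model *s, enum stat_index idx)
{
    long val = 0;

    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        val += s->cpu[cpu].count[idx];
    return val;
}
```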
What is the root memcg? It is the memory cgroup created during initialization:
static struct cgroup_subsys_state * __ref
cgroup_init -> cgroup_init_subsys -> mem_cgroup_css_alloc(NULL)
init (system/core/init/init.cpp) creates the non-root memcgs, as follows:
// Set memcg property based on kernel cmdline argument
Creating a non-root memcg:
cgroup_mkdir -> create_css -> mem_cgroup_css_alloc
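The two call chains above differ only in whether a parent exists when the css is allocated. The sketch below models that. memcg_node, memcg_alloc and memcg_is_root are hypothetical names. They loosely mirror how 3.18 records the first allocation in a global root_mem_cgroup and compares against it in mem_cgroup_is_root().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct memcg_node {
    struct memcg_node *parent; /* NULL only for the root */
};

static struct memcg_node *root_memcg; /* set once, like root_mem_cgroup */

/* parent == NULL only on the cgroup_init path; every later mkdir
 * passes the memcg of the parent directory. */
static struct memcg_node *memcg_alloc(struct memcg_node *parent)
{
    struct memcg_node *m = calloc(1, sizeof(*m));

    if (m == NULL)
        return NULL;
    m->parent = parent;
    if (parent == NULL) {
        assert(root_memcg == NULL); /* there is only one root */
        root_memcg = m;
    }
    return m;
}

/* "Root" is decided by pointer identity, not by a flag. */
static bool memcg_is_root(const struct memcg_node *m)
{
    return m == root_memcg;
}
```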
OK, so the memory usage lmkd reads is really the root memcg's statistic. The root memcg's figure is the sum of the mem_cgroup_stat_cpu counts over all memcgs.
static unsigned long mem_cgroup_recursive_stat(struct mem_cgroup *memcg,
A memcg's memory usage is counted as MEM_CGROUP_STAT_CACHE + MEM_CGROUP_STAT_RSS, plus MEM_CGROUP_STAT_SWAP when swap is included (memsw.usage_in_bytes).
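Here is one way to sum a statistic over a memcg and all its descendants and then convert pages to bytes. Several parts are assumptions. The first-child/next-sibling tree and all names are hypothetical, since the kernel iterates with for_each_mem_cgroup_tree instead of recursion. Pages are assumed to be 4 KiB. A negative total is clamped to 0 because the per-CPU counters being summed are signed.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

enum { STAT_CACHE, STAT_RSS, STAT_SWAP, NR_STATS };

/* Hypothetical memcg tree node with its own (already per-CPU-summed) counts. */
struct memcg_tree {
    long stat[NR_STATS];
    struct memcg_tree *first_child;
    struct memcg_tree *next_sibling;
};

/* Signed sum over m and all descendants. */
static long subtree_sum(const struct memcg_tree *m, int idx)
{
    long val = m->stat[idx];

    for (const struct memcg_tree *c = m->first_child; c != NULL; c = c->next_sibling)
        val += subtree_sum(c, idx);
    return val;
}

/* Clamp: racing signed per-CPU updates can make the total dip below 0. */
static unsigned long recursive_stat(const struct memcg_tree *m, int idx)
{
    long val = subtree_sum(m, idx);

    return val < 0 ? 0 : (unsigned long)val;
}

/* CACHE + RSS, plus SWAP for the memsw variant, converted to bytes. */
static unsigned long long root_usage_bytes(const struct memcg_tree *root, int with_swap)
{
    unsigned long long pages = recursive_stat(root, STAT_CACHE) +
                               recursive_stat(root, STAT_RSS);
    if (with_swap)
        pages += recursive_stat(root, STAT_SWAP);
    return pages << MODEL_PAGE_SHIFT;
}
```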
Let's look at the cgroups on the system:
xxx:/dev/memcg/system # cat /proc/cgroups
The memory cgroup subsystem has a single hierarchy, with ID 1, and that hierarchy contains 128 memcgs.
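The /proc/cgroups columns are subsys_name, hierarchy, num_cgroups and enabled, so the memory line carries the hierarchy ID and memcg count quoted above. Here is a small parser sketch. struct cgroup_subsys_info and parse_proc_cgroups_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct cgroup_subsys_info {
    char name[32];
    int hierarchy;   /* hierarchy ID the subsystem is attached to */
    int num_cgroups; /* number of cgroups in that hierarchy */
    int enabled;
};

/* Parse one data line of /proc/cgroups. Returns 0 on success, or -1 for
 * the "#subsys_name ..." header line and malformed input. */
static int parse_proc_cgroups_line(const char *line, struct cgroup_subsys_info *out)
{
    memset(out, 0, sizeof(*out));
    if (line[0] == '#')
        return -1;
    if (sscanf(line, "%31s %d %d %d", out->name, &out->hierarchy,
               &out->num_cgroups, &out->enabled) != 4)
        return -1;
    return 0;
}
```

For an illustrative line such as "memory\t1\t128\t1", this yields hierarchy 1 and 128 cgroups.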
A quick look at hierarchies:
- Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.
The hierarchy is created by creating the appropriate cgroups in the
cgroup filesystem. Consider for example, the following cgroup filesystem
hierarchy

        root
      /  |   \
     /   |    \
    a    b     c
               | \
               |  \
               d   e
In the diagram above, with hierarchical accounting enabled, all memory
usage of e, is accounted to its ancestors up until the root (i.e, c and root),
that has memory.use_hierarchy enabled. If one of the ancestors goes over its
limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
children of the ancestor.
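Hierarchical charging can be sketched as a walk from the charged memcg up to the root that undoes the partial charge if some level would exceed its limit. The sketch assumes use_hierarchy is enabled at every level. struct hmemcg and hcharge are hypothetical. Where the kernel would try to reclaim from the memcg that hit its limit, the sketch simply returns that memcg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memcg node with its own usage counter (bytes). */
struct hmemcg {
    struct hmemcg *parent; /* NULL for the root */
    uint64_t usage;
    uint64_t limit;
};

/*
 * Charge val bytes to m and to every ancestor up to the root.
 * Returns NULL on success. If some level would go over its limit, the
 * levels charged so far are rolled back and that memcg is returned,
 * which is where a real implementation would start reclaiming.
 */
static struct hmemcg *hcharge(struct hmemcg *m, uint64_t val)
{
    for (struct hmemcg *c = m; c != NULL; c = c->parent) {
        if (c->usage + val > c->limit) {
            for (struct hmemcg *u = m; u != c; u = u->parent) {
                assert(u->usage >= val);
                u->usage -= val;
            }
            return c;
        }
        c->usage += val;
    }
    return NULL;
}
```

With the tree from the diagram, charging e also charges c and root. If c is over its limit, the charge on e is undone and c is reported.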
How to record
Usage is recorded mainly through charge and uncharge. From the documentation:
Charge
a page/swp_entry may be charged (usage += PAGE_SIZE) at
  mem_cgroup_try_charge()
Uncharge
a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
  mem_cgroup_uncharge()
    Called when a page's refcount goes down to 0.
  mem_cgroup_uncharge_swap()
    Called when swp_entry's refcnt goes down to 0. A charge against swap disappears.
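Treating memsw as memory plus swap, the charge and uncharge rules above can be modeled with two byte counters. This is a simplified sketch. struct iusage and the function names are hypothetical. page_swapped_out() compresses what happens when a page moves to swap into one step. PAGE_SIZE is assumed to be 4 KiB.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* res: memory only; memsw: memory + swap. */
struct iusage {
    unsigned long res;
    unsigned long memsw;
};

/* A page is charged: it counts as memory and as memory+swap. */
static void charge_page(struct iusage *u)
{
    u->res += MODEL_PAGE_SIZE;
    u->memsw += MODEL_PAGE_SIZE;
}

/* A page moves to swap: it leaves "res", but its swap entry keeps
 * the "memsw" charge. */
static void page_swapped_out(struct iusage *u)
{
    assert(u->res >= MODEL_PAGE_SIZE);
    u->res -= MODEL_PAGE_SIZE;
}

/* Models the mem_cgroup_uncharge() case: the page's refcount hit 0. */
static void uncharge_page(struct iusage *u)
{
    assert(u->res >= MODEL_PAGE_SIZE && u->memsw >= MODEL_PAGE_SIZE);
    u->res -= MODEL_PAGE_SIZE;
    u->memsw -= MODEL_PAGE_SIZE;
}

/* Models the mem_cgroup_uncharge_swap() case: the swp_entry's refcount hit 0. */
static void uncharge_swap(struct iusage *u)
{
    assert(u->memsw >= MODEL_PAGE_SIZE);
    u->memsw -= MODEL_PAGE_SIZE;
}
```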
charge-commit-cancel
Memcg pages are charged in two steps:
mem_cgroup_try_charge()
mem_cgroup_commit_charge() or mem_cgroup_cancel_charge()
At try_charge(), there are no flags to say “this page is charged”.
at this point, usage += PAGE_SIZE.
At commit(), the page is associated with the memcg.
At cancel(), simply usage -= PAGE_SIZE.
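Here is a sketch of the two-step protocol. The names are hypothetical, and the page-cache insert is reduced to an insert_ok flag. The point is the order: reserve with try, then either associate the page (commit) or return the reservation (cancel).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

struct jmemcg { unsigned long usage; };  /* bytes */
struct jpage  { struct jmemcg *memcg; }; /* NULL until committed */

/* Step 1: reserve. Nothing marks the page yet; only usage grows. */
static void try_charge_m(struct jmemcg *m)
{
    m->usage += MODEL_PAGE_SIZE;
}

/* Step 2a: the operation succeeded, so associate the page with the memcg. */
static void commit_charge_m(struct jpage *p, struct jmemcg *m)
{
    assert(p->memcg == NULL);
    p->memcg = m;
}

/* Step 2b: the operation failed, so give the reservation back. */
static void cancel_charge_m(struct jmemcg *m)
{
    assert(m->usage >= MODEL_PAGE_SIZE);
    m->usage -= MODEL_PAGE_SIZE;
}

/* Hypothetical caller, e.g. a page-cache insert whose outcome is insert_ok. */
static int add_page_m(struct jpage *p, struct jmemcg *m, int insert_ok)
{
    try_charge_m(m);
    if (!insert_ok) {
        cancel_charge_m(m);
        return -1;
    }
    commit_charge_m(p, m);
    return 0;
}
```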
For the root memcg, mem_cgroup_try_charge does not account anything against res_counter:
static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
cancel charge:
static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
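This sketch shows how the two kinds of memcg diverge. try and cancel skip the res counter for root, while the commit step updates the statistics for every memcg. The commit-time update reflects my reading of 3.18, where mem_cgroup_commit_charge() calls mem_cgroup_charge_statistics(). The struct and function names below are hypothetical, and the per-CPU statistics are collapsed into one counter.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

struct kmemcg {
    bool is_root;
    unsigned long res_usage; /* bytes, res_counter stand-in */
    long stat_pages;         /* per-CPU statistics collapsed into one counter */
};

/* try: root skips the res counter entirely. */
static void model_try_charge(struct kmemcg *m, unsigned int nr_pages)
{
    if (m->is_root)
        return;
    m->res_usage += nr_pages * MODEL_PAGE_SIZE;
}

/* cancel: mirror image of try, so root skips it too. */
static void model_cancel_charge(struct kmemcg *m, unsigned int nr_pages)
{
    if (m->is_root)
        return;
    assert(m->res_usage >= nr_pages * MODEL_PAGE_SIZE);
    m->res_usage -= nr_pages * MODEL_PAGE_SIZE;
}

/* commit: statistics are updated for every memcg, root included. */
static void model_commit_charge(struct kmemcg *m, unsigned int nr_pages)
{
    m->stat_pages += nr_pages;
}
```

Read alongside mem_cgroup_usage(), this split would explain why root's usage_in_bytes is derived from the statistics while a child's is read from res.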
So non-root memcgs and the root memcg are accounted separately. I'll look at the implementation details later.
So when does accounting happen? For example, adding a page to the page cache triggers a try charge.
References
- kernel3.18/Documentation/cgroups/memory.txt
- kernel3.18/Documentation/cgroups/memcg_test.txt
Copyright notice: all articles on this site are licensed under CC BY-NC-SA 4.0 CN. Please credit the original link when reposting.