Original link: https://mp.weixin.qq.com/s/PQb_PwgxzyeeFZyz3FsO6w

What the EROFS pcluster mode is for:

It’s used to judge whether inplace I/O can be used due to the current status of pclusters in the chain.

There are four modes: INFLIGHT, HOOKED, FOLLOWED and FOLLOWED_NOINPLACE. The source code quoted in this article is from Linux kernel 6.x.

FOLLOWED mode

/*
 * The current collection has been linked with the owned chain, and
 * could also be linked with the remaining collections, which means
 * if the processing page is the tail page of the collection, thus
 * the current collection can safely use the whole page (since
 * the previous collection is under control) for in-place I/O, as
 * illustrated below:
 *  ________________________________________________________________
 * |  tail (partial) page |          head (partial) page           |
 * |  (of the current cl) |      (of the previous collection)      |
 * | PCLUSTER_FOLLOWED or |                                        |
 * |_____PCLUSTER_HOOKED__|___________PCLUSTER_FOLLOWED____________|
 *
 * [  (*) the above page can be used as inplace I/O.               ]
 */
Z_EROFS_PCLUSTER_FOLLOWED,

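The comment says that in this mode the pcluster currently being collected has been linked into the owned chain, and can also be linked with the remaining collections. How should that be read? Let's go straight to the code.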

If the pcluster being collected already exists, z_erofs_try_to_claim_pcluster is called:

static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
{
    struct z_erofs_pcluster *pcl = f->pcl;
    z_erofs_next_pcluster_t *owned_head = &f->owned_head;

    /* type 1, nil pcluster (this pcluster doesn't belong to any chain.) */
    if (cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_NIL,
                *owned_head) == Z_EROFS_PCLUSTER_NIL) {
        *owned_head = &pcl->next;
        /* so we can attach this pcluster to our submission chain. */
        f->mode = Z_EROFS_PCLUSTER_FOLLOWED;
        return;
    }
    [...]
}

Suppose this pcluster has already been decompressed once:

static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
                                       int err)
{
    [...]
    /* pcluster lock MUST be taken before the following line */
    WRITE_ONCE(pcl->next, Z_EROFS_PCLUSTER_NIL);
    mutex_unlock(&pcl->lock);
    return err;
}

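That leaves pcl->next == Z_EROFS_PCLUSTER_NIL, so the cmpxchg succeeds and the pcluster is put on this chain: its next takes the old value of owned_head, and owned_head now points at it.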

When a brand-new pcluster is collected, it is added to this chain directly:

static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
{
    [...]
    pcl->next = fe->owned_head;
    pcl->pageofs_out = map->m_la & ~PAGE_MASK;
    fe->mode = Z_EROFS_PCLUSTER_FOLLOWED;
    [...]
}

OK, so if the page currently being visited is the tail page of the whole collection, that page can be used for in-place I/O.

static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
                               struct z_erofs_bvec *bvec, bool exclusive)
{
    int ret;

    if (exclusive) {
        /* give priority for inplaceio to use file pages first */
        if (z_erofs_try_inplace_io(fe, bvec))
            return 0;
        [...]
    }
    [...]
}

As shown above, when a page is attached and exclusive is true, inplace I/O is tried first.

exclusive = (!cur && (!spiltted || tight));

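Once the tail page part has been visited (a page is walked backwards from its end), cur is 0 and exclusive then depends on tight, which in turn is decided by the pcluster mode: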

/*
 * Ensure the current partial page belongs to this submit chain rather
 * than other concurrent submit chains or the noio(bypass) chain since
 * those chains are handled asynchronously thus the page cannot be used
 * for inplace I/O or bvpage (should be processed in a strict order.)
 */
tight &= (fe->mode >= Z_EROFS_PCLUSTER_HOOKED &&
          fe->mode != Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE);

In other words, tight stays true, and the tail page remains a candidate for inplace I/O, only when the pcluster it belongs to is in HOOKED or FOLLOWED mode.

HOOKED mode

/*
 * The current pclusters was the tail of an exist chain, in addition
 * that the previous processed chained pclusters are all decided to
 * be hooked up to it.
 * A new chain will be created for the remaining pclusters which are
 * not processed yet, so different from Z_EROFS_PCLUSTER_FOLLOWED,
 * the next pcluster cannot reuse the whole page safely for inplace I/O
 * in the following scenario:
 *  ________________________________________________________________
 * |     tail (partial) page      |       head (partial) page       |
 * |  (belongs to the next pcl)   |  (belongs to the current pcl)   |
 * |_______PCLUSTER_FOLLOWED______|________PCLUSTER_HOOKED__________|
 */
Z_EROFS_PCLUSTER_HOOKED,

The current pcluster sits at the tail of an already existing chain, i.e. pcl->next == Z_EROFS_PCLUSTER_TAIL, so a new chain is simply started for whatever gets collected next.

static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
{
    [...]
    /*
     * type 2, link to the end of an existing open chain, be careful
     * that its submission is controlled by the original attached chain.
     */
    if (*owned_head != &pcl->next && pcl != f->tailpcl &&
        cmpxchg(&pcl->next, Z_EROFS_PCLUSTER_TAIL,
                *owned_head) == Z_EROFS_PCLUSTER_TAIL) {
        *owned_head = Z_EROFS_PCLUSTER_TAIL;
        f->mode = Z_EROFS_PCLUSTER_HOOKED;
        f->tailpcl = NULL;
        return;
    }
    [...]
}

In that case tight ends up false, because HOOKED fails the following check:

if (cur)
    tight &= (fe->mode >= Z_EROFS_PCLUSTER_FOLLOWED);

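While the head page part is being visited, cur has not yet dropped to 0, so the part is clearly not exclusive and cannot take the inplace I/O path. cur is computed as follows: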

cur = end - min_t(unsigned int, offset + end - map->m_la, end);
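To see how the quoted statements interact, here is a minimal userspace sketch of one page being walked from its end. It is not kernel code: the enum ordering, the struct page_walk type, the handle_part() helper, the order in which the statements run, and the way spiltted and end are updated after each part are all assumptions made for illustration. Only the three expressions for tight, cur and exclusive are taken from the excerpts above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering; only the comparisons quoted above rely on it. */
enum pcluster_mode {
    MODE_INFLIGHT,
    MODE_HOOKED,
    MODE_FOLLOWED_NOINPLACE,
    MODE_FOLLOWED,
};

/* Hypothetical per-page state; field names follow the quoted kernel locals. */
struct page_walk {
    unsigned int end;       /* bytes of the page still to handle */
    unsigned int spiltted;  /* parts already attached (kernel spelling) */
    bool tight;
};

/*
 * Handle the part of the page that belongs to the extent starting at
 * logical address m_la. offset is the file offset of the page.
 * Returns the value 'exclusive' would have for this part.
 */
static bool handle_part(struct page_walk *w, enum pcluster_mode mode,
                        unsigned long long offset, unsigned long long m_la)
{
    unsigned long long span;
    unsigned int cur;
    bool exclusive;

    assert(w && w->end > 0);
    assert(m_la < offset + w->end);   /* the extent must reach into the page */

    w->tight = w->tight && mode >= MODE_HOOKED &&
               mode != MODE_FOLLOWED_NOINPLACE;

    /* cur = end - min(offset + end - m_la, end) */
    span = offset + w->end - m_la;
    cur = w->end - (unsigned int)(span < w->end ? span : w->end);

    exclusive = (!cur && (!w->spiltted || w->tight));
    if (cur)
        w->tight = w->tight && mode >= MODE_FOLLOWED;

    ++w->spiltted;   /* assumption: one more part has been attached */
    w->end = cur;    /* assumption: continue with the bytes before cur */
    return exclusive;
}
```

Take a 4096-byte page at file offset 8192 whose last extent starts at 9000: the first call gives cur = 4096 - (8192 + 4096 - 9000) = 808, so that head part is not exclusive. If the pcluster of that head part is FOLLOWED and the earlier extent (starting at 4096, say) belongs to a HOOKED or FOLLOWED pcluster, the second call has cur = 0 and returns true. If the head part belongs to a HOOKED pcluster instead, tight is already false and the second call returns false, which matches the two diagrams in the kernel comments.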

INFLIGHT mode

For a pcluster that already exists, leaving the nil case aside, it is either the end of a chain (HOOKED, above) or not the end of a chain.

static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
{
    [...]
    /* type 3, it belongs to a chain, but it isn't the end of the chain */
    f->mode = Z_EROFS_PCLUSTER_INFLIGHT;
}
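The three "types" handled by z_erofs_try_to_claim_pcluster can be put side by side in a small userspace model. This is only a sketch under several assumptions: C11 atomics stand in for the kernel's cmpxchg, next points directly at another pcluster instead of at its next field, NIL is modelled as NULL and TAIL as the address of a dummy object, the tailpcl check is left out, and every name below (struct pcl, struct frontend, try_to_claim, the MODE_* constants) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum claim_mode { MODE_INFLIGHT, MODE_HOOKED, MODE_FOLLOWED };

/* Simplified stand-in for a pcluster: only the chain link is modelled. */
struct pcl {
    _Atomic(struct pcl *) next;
};

/* Model-only sentinels: NIL = on no chain, TAIL = last element of a chain. */
static struct pcl tail_marker;
#define PCL_NIL  ((struct pcl *)NULL)
#define PCL_TAIL (&tail_marker)

/* Simplified stand-in for the decompress frontend. */
struct frontend {
    struct pcl *owned_head;   /* assumed to start out as PCL_TAIL */
    enum claim_mode mode;
};

static void pcl_init(struct pcl *p)
{
    atomic_init(&p->next, PCL_NIL);
}

static void try_to_claim(struct frontend *f, struct pcl *pcl)
{
    struct pcl *expected;

    assert(f && pcl);

    /* type 1: the pcluster is on no chain, push it onto our own chain */
    expected = PCL_NIL;
    if (atomic_compare_exchange_strong(&pcl->next, &expected,
                                       f->owned_head)) {
        f->owned_head = pcl;
        f->mode = MODE_FOLLOWED;
        return;
    }

    /* type 2: it ends another open chain, hook our chain behind it */
    expected = PCL_TAIL;
    if (f->owned_head != pcl &&
        atomic_compare_exchange_strong(&pcl->next, &expected,
                                       f->owned_head)) {
        f->owned_head = PCL_TAIL;   /* later pclusters start a new chain */
        f->mode = MODE_HOOKED;
        return;
    }

    /* type 3: it is somewhere inside a chain owned by someone else */
    f->mode = MODE_INFLIGHT;
}
```

In this model a frontend that finds a fresh pcluster ends up in MODE_FOLLOWED, a second frontend that meets the same pcluster while it is still the tail of the first chain ends up in MODE_HOOKED, and a third one that arrives after that sees neither sentinel and gets MODE_INFLIGHT. Storing PCL_NIL back into next, as the decompression path quoted earlier does with WRITE_ONCE, makes the pcluster claimable again.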

FOLLOWED_NOINPLACE mode

The name pretty much gives it away: this mode does not need inplace I/O.

/*
 * a weak form of Z_EROFS_PCLUSTER_FOLLOWED, the difference is that it
 * could be dispatched into bypass queue later due to uptodated managed
 * pages. All related online pages cannot be reused for inplace I/O (or
 * bvpage) since it can be directly decoded without I/O submission.
 */
Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE,
};

In z_erofs_bind_cache(), if find_get_page() finds every page of the pcluster, no I/O is needed at all.

static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
                               struct page **pagepool)
{
    [...]
    for (i = 0; i < pcl->pclusterpages; ++i) {
        [...]
        page = find_get_page(mc, pcl->obj.index + i);

        if (page) {
            t = (void *)((unsigned long)page | 1);
        } else {
            /* I/O is needed, no possible to decompress directly */
            standalone = false;
            [...]
        }
        [...]
    }
    /*
     * don't do inplace I/O if all compressed pages are available in
     * managed cache since it can be moved to the bypass queue instead.
     */
    if (standalone)
        fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
}
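The standalone logic above boils down to "did every lookup hit the cache?". A minimal sketch, assuming the managed-cache lookups are replaced by a plain array of booleans (in_cache[i] plays the role of find_get_page() returning a page for compressed page i); the function name, the enum and the array are hypothetical and only mirror the control flow of the excerpt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum bind_mode { MODE_FOLLOWED, MODE_FOLLOWED_NOINPLACE };

/*
 * Return the mode after scanning the compressed pages of one pcluster.
 * in_cache[i] is true when compressed page i was found in the cache.
 */
static enum bind_mode bind_cache_mode(enum bind_mode current,
                                      const bool *in_cache, size_t npages)
{
    bool standalone = true;
    size_t i;

    assert(in_cache != NULL || npages == 0);

    for (i = 0; i < npages; ++i) {
        if (!in_cache[i])
            standalone = false;   /* this page needs real I/O */
    }

    /* every page cached: nothing to read, so no page is needed for inplace I/O */
    return standalone ? MODE_FOLLOWED_NOINPLACE : current;
}
```

With three compressed pages that are all cached the result is MODE_FOLLOWED_NOINPLACE. As soon as one of them is missing the mode passed in is returned unchanged.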

In addition, the inline case does not need inplace I/O either:

static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
                                struct page *page, struct page **pagepool)
{
    [...]
    if (z_erofs_is_inline_pcluster(fe->pcl)) {
        [...]
        fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
    } else {
        [...]
    }
    [...]
}

BTW: the latest kernel versions have removed the HOOKED mode.