Problem

As the title says, the Linux workqueue documentation notes the following while describing WQ_CPU_INTENSIVE:

This flag is meaningless for unbound wq.

Why is that? This article tries to work out what the flag actually does.

An unbound wq is one created with WQ_UNBOUND:

WQ_UNBOUND
Work items queued to an unbound wq are served by the special
worker-pools which host workers which are not bound to any
specific CPU. This makes the wq behave as a simple execution
context provider without concurrency management. The unbound
worker-pools try to start execution of work items as soon as
possible. Unbound wq sacrifices locality but is useful for
the following cases.

* Wide fluctuation in the concurrency level requirement is
  expected and using bound wq may end up creating large number
  of mostly unused workers across different CPUs as the issuer
  hops through different CPUs.

* Long running CPU intensive workloads which can be better
  managed by the system scheduler.

In short, the workers are not bound to any CPU. Two scenarios benefit from this: when the required concurrency level fluctuates widely (a flood of work one moment, almost none the next), and for long-running CPU-intensive work.
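
For reference, here is a minimal sketch (module, wq and function names are invented for illustration) of how a driver could create such an unbound wq with the standard alloc_workqueue() API:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_unbound_wq;

static void demo_work_fn(struct work_struct *work)
{
        /* bursty or long-running processing; it runs on whatever CPU the
         * scheduler picks, because the serving worker is not bound */
}
static DECLARE_WORK(demo_work, demo_work_fn);

static int __init demo_init(void)
{
        /* WQ_UNBOUND: served by the unbound worker-pools, no concurrency
         * management, locality sacrificed */
        demo_unbound_wq = alloc_workqueue("demo_unbound", WQ_UNBOUND, 0);
        if (!demo_unbound_wq)
                return -ENOMEM;
        queue_work(demo_unbound_wq, &demo_work);
        return 0;
}

static void __exit demo_exit(void)
{
        destroy_workqueue(demo_unbound_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");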

What, then, is WQ_CPU_INTENSIVE:

WQ_CPU_INTENSIVE
Work items of a CPU intensive wq do not contribute to the
concurrency level. In other words, runnable CPU intensive
work items will not prevent other work items in the same
worker-pool from starting execution. This is useful for bound
work items which are expected to hog CPU cycles so that their
execution is regulated by the system scheduler.

A wq carrying this flag says that its work items are CPU intensive: they are expected to hog CPU cycles, and with the flag set their execution is regulated by the system scheduler instead of the workqueue's concurrency management.
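
A similarly hedged sketch of the bound case the flag is meant for (names again invented, and it assumes the same module boilerplate and includes as the sketch above): a CPU-hogging work item queued on a WQ_CPU_INTENSIVE wq, so that it does not stall other work items of the same per-CPU pool.

static struct workqueue_struct *crunch_wq;

static void crunch_fn(struct work_struct *work)
{
        /* long CPU-bound computation; when it runs is left to the system
         * scheduler, and it does not count against the pool's concurrency */
}
static DECLARE_WORK(crunch_work, crunch_fn);

static int __init crunch_init(void)
{
        crunch_wq = alloc_workqueue("crunch", WQ_CPU_INTENSIVE, 0);
        if (!crunch_wq)
                return -ENOMEM;
        queue_work(crunch_wq, &crunch_work);
        return 0;
}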

How should we read "do not contribute to the concurrency level"? Let's look at the code, using a 6.x kernel as the reference.

Code analysis

The only place where a wq's CPU-intensive flag is acted upon is process_one_work():

static void process_one_work(struct worker *worker, struct work_struct *work)
__releases(&pool->lock)
__acquires(&pool->lock)
{
        struct pool_workqueue *pwq = get_work_pwq(work);
        struct worker_pool *pool = worker->pool;
        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE; //tj: here
        [...]
        /*
         * CPU intensive works don't participate in concurrency management.
         * They're the scheduler's responsibility. This takes @worker out
         * of concurrency management and the next code block will chain
         * execution of the pending work items.
         */
        if (unlikely(cpu_intensive))
                worker_set_flags(worker, WORKER_CPU_INTENSIVE);

As the comment says, the concurrency of CPU intensive works is the scheduler's responsibility, not the workqueue's. How is that achieved? It happens in worker_set_flags():

static inline void worker_set_flags(struct worker *worker, unsigned int flags)
{
        struct worker_pool *pool = worker->pool;

        WARN_ON_ONCE(worker->task != current);

        /* If transitioning into NOT_RUNNING, adjust nr_running. */
        if ((flags & WORKER_NOT_RUNNING) &&
            !(worker->flags & WORKER_NOT_RUNNING)) {
                pool->nr_running--;
        }

        worker->flags |= flags;
}

Only two things actually happen here: nr_running is decremented, and the WORKER_CPU_INTENSIVE flag is added to the worker.

nr_running:

/*
* The counter is incremented in a process context on the associated CPU
* w/ preemption disabled, and decremented or reset in the same context
* but w/ pool->lock held. The readers grab pool->lock and are
* guaranteed to see if the counter reached zero.
*/
int nr_running;

How is it used?

void wq_worker_running(struct task_struct *task)
{
        [...]
        preempt_disable();
        if (!(worker->flags & WORKER_NOT_RUNNING))
                worker->pool->nr_running++;
        [...]
}

void wq_worker_sleeping(struct task_struct *task)
{
        [...]
        pool->nr_running--;
}

static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
{
        [...]
        if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
                if (!(worker->flags & WORKER_NOT_RUNNING))
                        pool->nr_running++;
}

static void unbind_workers(int cpu)
{
        [...]
        /*
         * The handling of nr_running in sched callbacks are disabled
         * now. Zap nr_running. After this, nr_running stays zero and
         * need_more_worker() and keep_working() are always true as
         * long as the worklist is not empty. This pool now behaves as
         * an unbound (in terms of concurrency management) pool which
         * are served by workers tied to the pool.
         */
        pool->nr_running = 0;
        [...]
}

static bool __need_more_worker(struct worker_pool *pool)
{
        return !pool->nr_running;
}

From these uses it is clear that nr_running is the handle of concurrency management: while at least one counted worker is runnable the pool does not wake extra workers, and once nr_running drops to zero, __need_more_worker() returns true and another worker is woken so that pending work keeps moving.

Back to the code above: the nr_running-- in the cpu_intensive path takes this worker out of that accounting. That is exactly the point of WQ_CPU_INTENSIVE, just as the documentation describes.
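
To make the accounting concrete, here is a toy user-space model (not kernel code; struct and function names are invented) of the decision that nr_running drives:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a worker_pool, just enough to show the decision. */
struct toy_pool {
        int nr_running;    /* workers the pool counts as runnable */
        int worklist_len;  /* pending work items */
};

/* Loosely mirrors need_more_worker(): wake another worker only when
 * nothing the pool accounts for is running and work is still pending. */
static bool toy_need_more_worker(struct toy_pool *pool)
{
        return pool->worklist_len > 0 && pool->nr_running == 0;
}

int main(void)
{
        struct toy_pool pool = { .nr_running = 1, .worklist_len = 3 };

        /* A counted worker is runnable: keep executing work items serially. */
        printf("worker running -> need more? %d\n", toy_need_more_worker(&pool));

        /* The worker blocked, or was marked CPU intensive: nr_running drops
         * to 0 and the pool wakes another worker for the pending items. */
        pool.nr_running = 0;
        printf("nr_running == 0 -> need more? %d\n", toy_need_more_worker(&pool));

        return 0;
}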

As for the worker flags:

/* worker flags */
WORKER_DIE = 1 << 1, /* die die die */
WORKER_IDLE = 1 << 2, /* is idle */
WORKER_PREP = 1 << 3, /* preparing to run works */
WORKER_CPU_INTENSIVE = 1 << 6, /* cpu intensive */
WORKER_UNBOUND = 1 << 7, /* worker is unbound */
WORKER_REBOUND = 1 << 8, /* worker was rebound */

WORKER_NOT_RUNNING = WORKER_PREP | WORKER_CPU_INTENSIVE |
WORKER_UNBOUND | WORKER_REBOUND,

Every check of WORKER_CPU_INTENSIVE goes through the WORKER_NOT_RUNNING mask.

So the only place where CPU_INTENSIVE changes behaviour is that initial worker_set_flags() call, right? More precisely, the condition guarding nr_running--: if worker->flags already contains any of the WORKER_NOT_RUNNING bits, setting WORKER_CPU_INTENSIVE makes no difference at all.
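
The following toy user-space program (not kernel code; the toy_* names are invented, the flag values are the ones quoted above) reproduces exactly that guard and shows the difference between a bound and an unbound worker:

#include <stdio.h>

enum {
        WORKER_PREP          = 1 << 3,
        WORKER_CPU_INTENSIVE = 1 << 6,
        WORKER_UNBOUND       = 1 << 7,
        WORKER_REBOUND       = 1 << 8,
        WORKER_NOT_RUNNING   = WORKER_PREP | WORKER_CPU_INTENSIVE |
                               WORKER_UNBOUND | WORKER_REBOUND,
};

struct toy_worker { unsigned int flags; };
struct toy_pool   { int nr_running; };

/* Same guard as worker_set_flags(): only the transition *into*
 * NOT_RUNNING decrements nr_running. */
static void toy_set_flags(struct toy_pool *pool, struct toy_worker *w,
                          unsigned int flags)
{
        if ((flags & WORKER_NOT_RUNNING) && !(w->flags & WORKER_NOT_RUNNING))
                pool->nr_running--;
        w->flags |= flags;
}

int main(void)
{
        /* Bound worker: wq_worker_running() has counted it, nr_running == 1.
         * Marking it CPU intensive takes it out of the accounting. */
        struct toy_pool bound_pool = { .nr_running = 1 };
        struct toy_worker bound = { .flags = 0 };
        toy_set_flags(&bound_pool, &bound, WORKER_CPU_INTENSIVE);
        printf("bound:   nr_running = %d\n", bound_pool.nr_running);   /* 0 */

        /* Unbound worker: WORKER_UNBOUND is already a NOT_RUNNING bit, so
         * nr_running was never incremented and is not decremented now. */
        struct toy_pool unbound_pool = { .nr_running = 0 };
        struct toy_worker unbound = { .flags = WORKER_UNBOUND };
        toy_set_flags(&unbound_pool, &unbound, WORKER_CPU_INTENSIVE);
        printf("unbound: nr_running = %d\n", unbound_pool.nr_running); /* still 0 */

        return 0;
}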

OK, now bring WQ_UNBOUND into the picture:

alloc_workqueue -> alloc_and_link_pwqs:

static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
        bool highpri = wq->flags & WQ_HIGHPRI;
        int cpu, ret;

        if (!(wq->flags & WQ_UNBOUND)) {
                [...]
                return 0;
        }
        // tj: below is the WQ_UNBOUND path
        if (wq->flags & __WQ_ORDERED) {
                ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
                /* there should only be single pwq for ordering guarantee */
                WARN(!ret && (wq->pwqs.next != &wq->dfl_pwq->pwqs_node ||
                              wq->pwqs.prev != &wq->dfl_pwq->pwqs_node),
                     "ordering guarantee broken for workqueue %s\n", wq->name);
        } else {
                ret = apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
        }
        [...]
}

apply_workqueue_attrs() -> apply_workqueue_attrs_locked() -> apply_wqattrs_prepare() -> alloc_unbound_pwq() -> get_unbound_pool()

get_unbound_pool()
|-> init_worker_pool() -> set POOL_DISASSOCIATED
|-> create_worker() -> worker_attach_to_pool() -> set WORKER_UNBOUND if pool flags has POOL_DISASSOCIATED

init_worker_pool() marks the pool POOL_DISASSOCIATED by default:

/*
* worker_pool flags
*
* A bound pool is either associated or disassociated with its CPU.
* While associated (!DISASSOCIATED), all workers are bound to the
* CPU and none has %WORKER_UNBOUND set and concurrency management
* is in effect.
*
* While DISASSOCIATED, the cpu may be offline and all workers have
* %WORKER_UNBOUND set and concurrency management disabled, and may
* be executing on any CPU. The pool behaves as an unbound one.
*
[...]
POOL_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */

POOL_DISASSOCIATED means no concurrency management: such a worker pool is an unbound pool, all of its workers are unbound workers, and they may execute on any CPU.

create_worker() spawns a worker thread:

static struct worker *create_worker(struct worker_pool *pool)
{
        [...]
        worker->task = kthread_create_on_node(worker_thread, worker, pool->node,
                                               "kworker/%s", id_buf);

Later, worker_attach_to_pool() sets WORKER_UNBOUND on it:

static void worker_attach_to_pool(struct worker *worker,
                                  struct worker_pool *pool)
{
        mutex_lock(&wq_pool_attach_mutex);

        /*
         * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
         * stable across this function. See the comments above the flag
         * definition for details.
         */
        if (pool->flags & POOL_DISASSOCIATED)
                worker->flags |= WORKER_UNBOUND; //tj: here
        else
                kthread_set_per_cpu(worker->task, pool->cpu);

OK. When a work item is queued, a worker is woken up to process it:

__queue_work ->  insert_work -> wake_up_worker

worker_thread -> process_one_work

At this point the worker's flags already contain WORKER_UNBOUND, i.e. the worker already counts as WORKER_NOT_RUNNING, so in the cpu_intensive case the nr_running decrement below never fires:

/* If transitioning into NOT_RUNNING, adjust nr_running. */
if ((flags & WORKER_NOT_RUNNING) &&
    !(worker->flags & WORKER_NOT_RUNNING)) {
        pool->nr_running--;
}

So, back to the title: when alloc_workqueue() is called with WQ_UNBOUND and WQ_CPU_INTENSIVE at the same time, WQ_CPU_INTENSIVE simply has no effect.
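
As a closing illustration, a hedged fragment (the wq name "demo" is invented): this is exactly the combination the documentation calls meaningless, because the workers serving it already carry WORKER_UNBOUND and sit outside concurrency management before WQ_CPU_INTENSIVE ever gets a chance to act.

struct workqueue_struct *wq;

/* the WQ_CPU_INTENSIVE half is a no-op here: the serving workers are
 * unbound and therefore already excluded from concurrency management */
wq = alloc_workqueue("demo", WQ_UNBOUND | WQ_CPU_INTENSIVE, 0);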