schedutil CPUFreq governor: code analysis
This walkthrough is based on the 5.x kernel. The code lives in kernel/sched/cpufreq_schedutil.c, and the Kconfig entry reads:
config CPU_FREQ_GOV_SCHEDUTIL
bool "'schedutil' cpufreq policy governor"
depends on CPU_FREQ && SMP
select CPU_FREQ_GOV_ATTR_SET
select IRQ_WORK
help
This governor makes decisions based on the utilization data provided
by the scheduler. It sets the CPU frequency to be proportional to
the utilization/capacity ratio coming from the scheduler. If the
utilization is frequency-invariant, the new frequency is also
proportional to the maximum available frequency. If that is not the
case, it is proportional to the current frequency of the CPU. The
frequency tipping point is at utilization/capacity equal to 80% in
both cases.
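In short, the governor scales the frequency in response to changes in CPU utilization as reported by the scheduler. EAS is meant to be used with schedutil only, because EAS also makes its decisions based on util. The documentation puts it this way: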
told to do, for example), schedutil as opposed to other CPUFreq governors at
least requests frequencies calculated using the utilization signals.
Consequently, the only sane governor to use together with EAS is schedutil,
because it is the only one providing some degree of consistency between
frequency requests and energy predictions.
Update Utilization
The entry point is cpufreq_update_util(), which is called whenever util changes, for example from CFS:
static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
The ->func callback invoked here is registered in sugov_start():
if (policy_is_shared(policy))
void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
The update path distinguishes two kinds of policy: shared (several CPUs share one policy) and single (only one CPU uses it).
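The registration and dispatch pattern described above can be sketched in user-space C. This is a simplified model, not the kernel code: every sketch_* name, the fixed-size array standing in for per-CPU data, and the call counters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for the model */

/* Hypothetical stand-in for struct update_util_data plus two counters. */
struct sketch_hook {
    void (*func)(struct sketch_hook *hook, uint64_t time, unsigned int flags);
    int calls_single;
    int calls_shared;
};

/* One slot per CPU; NULL means no governor callback is registered. */
static struct sketch_hook *sketch_hooks[SKETCH_NR_CPUS];

static void sketch_update_single(struct sketch_hook *h, uint64_t time,
                                 unsigned int flags)
{
    (void)time;
    (void)flags;
    h->calls_single++;   /* a real governor would evaluate one CPU here */
}

static void sketch_update_shared(struct sketch_hook *h, uint64_t time,
                                 unsigned int flags)
{
    (void)time;
    (void)flags;
    h->calls_shared++;   /* a real governor would aggregate all CPUs here */
}

/* Models the start step: pick a callback by policy type, then register it. */
static void sketch_start(int cpu, struct sketch_hook *h, bool shared)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    h->func = shared ? sketch_update_shared : sketch_update_single;
    sketch_hooks[cpu] = h;
}

/* Models the scheduler side: call the hook only if one is registered. */
static void sketch_update_util(int cpu, uint64_t time, unsigned int flags)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    struct sketch_hook *h = sketch_hooks[cpu];

    if (h != NULL)
        h->func(h, time, flags);
}
```

The scheduler side only ever sees the function pointer, so the shared and single variants can differ freely without the caller knowing which one is installed.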
There are also two ways of applying the new frequency, a fast path and a slow path, for example in sugov_update_single_freq():
if (sg_policy->policy->fast_switch_enabled) {
sugov_fast_switch():
static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
The slow path is handled by a work item, which calls __cpufreq_driver_target() and has to take a mutex.
static void sugov_work(struct kthread_work *work)
Before switching, there is also an if (sugov_update_next_freq()) check:
static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
The logic is: whenever the need_freq_update flag is set, update (which fits the flag's name). Otherwise, if the new frequency is the same as the previous one, skip the update.
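The decision just described can be sketched as a small user-space function. The struct and function names are hypothetical, and clearing the flag right here is a simplifying assumption: the real function differs between kernel versions and may keep the flag set depending on driver flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the governor's per-policy state. */
struct sketch_policy {
    unsigned int next_freq;           /* last frequency that was requested */
    uint64_t last_freq_update_time;   /* when it was requested, in ns */
    bool need_freq_update;            /* force the next update through */
};

/* Returns true when the caller should go on and switch frequency. */
static bool sketch_update_next_freq(struct sketch_policy *p, uint64_t time,
                                    unsigned int next_freq)
{
    assert(p);

    if (p->need_freq_update) {
        /* Assumption: the forced update consumes the flag. */
        p->need_freq_update = false;
    } else if (p->next_freq == next_freq) {
        /* Nothing changed, so skip the switch. */
        return false;
    }

    p->next_freq = next_freq;
    p->last_freq_update_time = time;
    return true;
}
```

Note that a forced update goes through even when the requested frequency equals the previous one, which is what makes the flag useful after a limits change.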
need_freq_update is set from limits_changed:
if (unlikely(sg_policy->limits_changed)) {
limits_changed is set in two places. One is sugov_limits(), which runs when the policy min/max changes. The other is ignore_dl_rate_limit():
/* |
If deadline tasks have increased the CPU utilization, the rate limit is ignored and an update is forced.
sugov_get_util():
static void sugov_get_util(struct sugov_cpu *sg_cpu)
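It obtains the current CPU utilization through effective_cpu_util(), which aggregates util across the scheduling classes. That part belongs to the scheduler side and is skipped here.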
Computing the Frequency
get_next_freq():
/** |
The comment is detailed. The next frequency is 1.25 * freq * util / max, where the factor 1.25 comes from the 0.8 tipping point (1 / 0.8 = 1.25).
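As a worked example, the formula can be evaluated with integer arithmetic, computing the 1.25 factor as freq + freq / 4 via a shift. The function name is hypothetical, and this sketch covers only the raw formula: it assumes the caller supplies freq, util and max, and it leaves out the later step of resolving the result to a frequency the driver actually supports.

```c
#include <assert.h>

/* next = 1.25 * freq * util / max, with 1.25 * freq as freq + (freq >> 2). */
static unsigned long long sketch_next_freq(unsigned long long freq,
                                           unsigned long long util,
                                           unsigned long long max)
{
    assert(max > 0);
    return (freq + (freq >> 2)) * util / max;
}
```

With freq = 2000000 kHz, a util/max ratio of 0.8 (for example util = 800, max = 1000) yields exactly 2000000, which is the tipping point. Higher ratios request more than freq, lower ratios request less.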
IO-wait boosting
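If a task has recently been waiting on I/O repeatedly, the frequency is raised (step by step) toward the maximum, so that throughput does not suffer across consecutive I/O requests.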
Two functions do the work: sugov_iowait_boost() and sugov_iowait_apply().
sugov_iowait_boost():
/** |
Note the last paragraph of the comment: to keep doubling, an IO boost has to be requested at least once per tick; otherwise boosting starts over.
static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
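When iowait_boost is nonzero and more than one tick has passed since the last request, the accumulated boost is discarded and reset, so boosting starts from scratch.
static bool sugov_iowait_reset(struct sugov_cpu *sg_cpu, u64 time,
/* Boost only tasks waking up after IO */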
SCHED_CPUFREQ_IOWAIT is passed in by the scheduler from enqueue_task_fair(). If the flag is not set, no boost is needed.
/* |
Continuing:
/* Ensure boost doubles only one time at each request */
Each request doubles the boost only once.
/* First wakeup after IO: start with minimum boost */
The first boost starts at IOWAIT_BOOST_MIN.
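The steps above (reset after an idle tick, boost only on IO wakeups, double once per request, start at the minimum) can be put together in a user-space model. All sketch_* names are hypothetical. The 4 ms tick is an assumed value, last_update is refreshed inside the function rather than in the caller, and sketch_iowait_consume() is a stand-in that only clears the pending flag; the decay that the apply step also performs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CAPACITY_SCALE 1024u                    /* same value as SCHED_CAPACITY_SCALE */
#define SKETCH_BOOST_MIN (SKETCH_CAPACITY_SCALE / 8)   /* modelled on IOWAIT_BOOST_MIN */
#define SKETCH_TICK_NS 4000000LL                       /* assumption: 250 Hz tick */

struct sketch_io_cpu {
    uint64_t last_update;        /* time of the previous request, in ns */
    unsigned int iowait_boost;   /* current boost, 0 means none */
    bool iowait_boost_pending;   /* a request has not been consumed yet */
};

/* Handle one utilization update; iowait tells whether the IOWAIT flag came in. */
static void sketch_iowait_boost(struct sketch_io_cpu *c, uint64_t time, bool iowait)
{
    assert(c);
    int64_t delta_ns = (int64_t)(time - c->last_update);

    c->last_update = time;   /* assumption: folded in here for brevity */

    /* More than one tick since the last request: start over. */
    if (c->iowait_boost && delta_ns > SKETCH_TICK_NS) {
        c->iowait_boost = iowait ? SKETCH_BOOST_MIN : 0;
        c->iowait_boost_pending = iowait;
        return;
    }

    /* Boost only wakeups that follow IO. */
    if (!iowait)
        return;

    /* Double at most once per request. */
    if (c->iowait_boost_pending)
        return;
    c->iowait_boost_pending = true;

    if (c->iowait_boost) {
        unsigned int doubled = c->iowait_boost << 1;

        c->iowait_boost = doubled < SKETCH_CAPACITY_SCALE ? doubled
                                                          : SKETCH_CAPACITY_SCALE;
        return;
    }

    /* First wakeup after IO: start with the minimum boost. */
    c->iowait_boost = SKETCH_BOOST_MIN;
}

/* Hypothetical stand-in for the apply step: marks the request as consumed. */
static void sketch_iowait_consume(struct sketch_io_cpu *c)
{
    assert(c);
    c->iowait_boost_pending = false;
}
```

Starting from zero, back-to-back IO wakeups within a tick walk the boost through 128, 256, 512 and 1024, as long as each request is consumed before the next one arrives.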
Rate Limit
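The rate limit reduces the overhead caused by updating the frequency too often. The documentation describes the rate_limit_us tunable as follows: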
Minimum time (in microseconds) that has to pass between two consecutive runs of governor computations (default: 1000 times the scaling driver's transition latency). The purpose of this tunable is to reduce the scheduler context overhead of the governor which might be excessive without it.
static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
There is also a way for DL to bypass the rate limit (ignore_dl_rate_limit()):
/* |
Here is how it is used:
ignore_dl_rate_limit(sg_cpu);
if (unlikely(sg_policy->limits_changed)) {
The update is then forced through need_freq_update.
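The interaction between the rate limit and the DL bypass can be modelled as below. This is a sketch under stated assumptions: the struct, the function names and the bandwidth parameters are hypothetical, and checks unrelated to rate limiting (such as whether this CPU may update the policy at all) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-policy state used for rate limiting. */
struct sketch_rl_policy {
    uint64_t last_freq_update_time;   /* ns */
    int64_t freq_update_delay_ns;     /* rate limit converted to ns */
    bool limits_changed;
    bool need_freq_update;
};

/* Returns true when a new frequency evaluation is allowed at this time. */
static bool sketch_should_update_freq(struct sketch_rl_policy *p, uint64_t time)
{
    assert(p);

    /* A pending limits change bypasses the rate limit and forces an update. */
    if (p->limits_changed) {
        p->limits_changed = false;
        p->need_freq_update = true;
        return true;
    }

    return (int64_t)(time - p->last_freq_update_time) >= p->freq_update_delay_ns;
}

/* Models the DL bypass: a grown deadline bandwidth marks limits as changed. */
static void sketch_ignore_dl_rate_limit(struct sketch_rl_policy *p,
                                        unsigned long cur_bw_dl,
                                        unsigned long seen_bw_dl)
{
    assert(p);
    if (cur_bw_dl > seen_bw_dl)
        p->limits_changed = true;
}
```

Reusing limits_changed for the DL case means the bypass needs no separate code path: the same check that handles a policy min/max change also lets a deadline bandwidth increase through immediately.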
Reference Docs
- Documentation/admin-guide/pm/cpufreq.rst (schedutil)
- Documentation/scheduler/sched-energy.rst