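These notes are based on kernel 4.9. CPUFreq, short for CPU frequency scaling or CPU performance scaling, lets you change the CPU clock speed at runtime.

The documentation lives under ./Documentation/cpu-freq/. First, a few concepts: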

Some CPU frequency scaling-capable processor switch between various
frequencies and operating voltages “on the fly” without any kernel or
user involvement. This guarantees very fast switching to a frequency
which is high enough to serve the user’s needs, but low enough to save
power.

2.1 Policy

On these systems, all you can do is select the lower and upper
frequency limit as well as whether you want more aggressive
power-saving or more instantly available processing power.

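Some CPUs switch frequency by themselves, without kernel control. The switching is fast, and the hardware still aims for a frequency low enough to save power. On such systems the user only configures the max/min frequency, which is wrapped into the policy. In code this is ->setpolicy; more on that later.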

2.2 Governor

On all other cpufreq implementations, these boundaries still need to
be set. Then, a “governor” must be selected. Such a “governor” decides
what speed the processor shall run within the boundaries. One such
“governor” is the “userspace” governor. This one allows the user - or
a yet-to-implement userspace program - to decide what specific speed
the processor shall run at.

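All other systems are controlled by the kernel: the kernel selects a governor, the governor decides the frequency, and the max/min limits still have to be set. In code, ->setpolicy is not used; ->target or ->target_index takes its place. Presumably this gives the kernel more room to save power than ->setpolicy, though that depends on the governor and the hardware.

Governors fall roughly into two classes: static ones that pin the frequency to the maximum (performance) or the minimum (powersave), and dynamic ones, such as schedutil, which Qualcomm phone platforms use on the 4.9 kernel. The documentation has a clear flow chart: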

CPU can be set to switch independently   |         CPU can only be set
      within specific "limits"           |       to specific frequencies

                                 "CPUfreq policy"
                consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                    \
                        /                      \
                       /                       the cpufreq governor decides
                      /                        (dynamically or statically)
                     /                         what target_freq to set within
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
        Using the ->setpolicy call,              Using the ->target/target_index call,
            the limits and the                    the frequency closest
             "policy" is set.                     to target_freq is set.
                                                  It is assured that it
                                                  is within policy->{min,max}
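
The two branches of the flow chart can be modeled in a few lines of userspace C. This is only a sketch, not kernel code: struct policy_model, struct driver_model, apply_policy() and the demo_* names are hypothetical stand-ins, and the "closest frequency" rule is assumed here to mean the nearest table entry that lies inside the policy limits.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct policy_model {
	unsigned int min, max;	/* policy limits, kHz */
	unsigned int cur;	/* last frequency applied, kHz */
};

struct driver_model {
	/* Hardware picks the frequency itself; the driver only gets limits. */
	int (*setpolicy)(struct policy_model *p);
	/* The kernel picks the frequency; the driver switches to a table index. */
	int (*target_index)(struct policy_model *p, unsigned int index);
	const unsigned int *table;	/* available frequencies, kHz */
	size_t table_len;
};

/* Index of the in-limits table entry closest to target, or -1 if none. */
static int closest_index(const struct driver_model *d,
			 const struct policy_model *p, unsigned int target)
{
	unsigned int best_diff = 0;
	int best = -1;
	size_t i;

	for (i = 0; i < d->table_len; i++) {
		unsigned int f = d->table[i];
		unsigned int diff = f > target ? f - target : target - f;

		if (f < p->min || f > p->max)
			continue;	/* never leave policy->{min,max} */
		if (best < 0 || diff < best_diff) {
			best_diff = diff;
			best = (int)i;
		}
	}
	return best;
}

/* Apply a governor request, following the two branches of the chart. */
int apply_policy(const struct driver_model *d, struct policy_model *p,
		 unsigned int target_freq)
{
	int idx, ret;

	if (d->setpolicy)
		return d->setpolicy(p);	/* left branch: hand over the limits */
	if (!d->target_index)
		return -1;

	/* right branch: clamp the request, then pick a table entry */
	if (target_freq < p->min)
		target_freq = p->min;
	if (target_freq > p->max)
		target_freq = p->max;

	idx = closest_index(d, p, target_freq);
	if (idx < 0)
		return -1;
	ret = d->target_index(p, (unsigned int)idx);
	if (ret == 0)
		p->cur = d->table[idx];
	return ret;
}

/* Demo driver: a four-entry table and a switch that always succeeds. */
const unsigned int demo_table[] = { 300000, 600000, 900000, 1200000 };

int demo_target_index(struct policy_model *p, unsigned int index)
{
	(void)p;
	(void)index;
	return 0;
}
```

With limits of 400000 to 1000000 kHz, a request far above the maximum lands on the 900000 entry and a request far below the minimum lands on 600000, so the result always stays within the policy limits.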

A few cpufreq sysfs files come up all the time. Here are the ones that raise questions:

cpuinfo_cur_freq :              Current frequency of the CPU as obtained from
                                the hardware, in KHz. This is the frequency
                                the CPU actually runs at.

scaling_cur_freq :              Current frequency of the CPU as determined by
                                the governor and cpufreq core, in KHz. This is
                                the frequency the kernel thinks the CPU runs
                                at.
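
To compare the two values from userspace, a small reader like the following can help. It is a minimal sketch: read_khz() and print_cur_freqs() are hypothetical helper names, the cpu0 paths assume the usual /sys/devices/system/cpu/cpuN/cpufreq/ layout, and cpuinfo_cur_freq may be readable only by root on some systems.

```c
#include <stdio.h>

/*
 * Read one unsigned value in kHz from a cpufreq sysfs file.
 * Returns 0 on success, -1 if the file is missing, unreadable, or
 * does not start with a number (cpuinfo_cur_freq can print "<unknown>").
 */
int read_khz(const char *path, unsigned int *khz)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", khz) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print both views of the current frequency of cpu0. */
void print_cur_freqs(void)
{
	const char *hw = "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq";
	const char *sw = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
	unsigned int khz;

	if (read_khz(hw, &khz) == 0)
		printf("cpuinfo_cur_freq: %u kHz\n", khz);
	else
		printf("cpuinfo_cur_freq: unavailable\n");

	if (read_khz(sw, &khz) == 0)
		printf("scaling_cur_freq: %u kHz\n", khz);
	else
		printf("scaling_cur_freq: unavailable\n");
}
```

Calling print_cur_freqs() from main() prints whatever the running system exposes; either line may report "unavailable" if the driver or permissions do not allow the read.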

So cpuinfo_cur_freq is read from the hardware, while scaling_cur_freq is what the governor and the cpufreq core decided. The relevant code:

/**
 * show_cpuinfo_cur_freq - current CPU frequency as detected by hardware
 */
static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
					char *buf)
{
	unsigned int cur_freq = __cpufreq_get(policy);

	if (cur_freq)
		return sprintf(buf, "%u\n", cur_freq);

	return sprintf(buf, "<unknown>\n");
}

static unsigned int __cpufreq_get(struct cpufreq_policy *policy)
{
	unsigned int ret_freq = 0;

	if (!cpufreq_driver->get)
		return ret_freq;

	ret_freq = cpufreq_driver->get(policy->cpu);
	...

For example, on QCOM:

static struct cpufreq_driver msm_cpufreq_driver = {
	...
	.get		= msm_cpufreq_get_freq,
	...
};

static unsigned int msm_cpufreq_get_freq(unsigned int cpu)
{
	return clk_get_rate(cpu_clk[cpu]) / 1000;
}

OK, the value comes straight from cpu_clk, converted from Hz to kHz. Now for scaling_cur_freq:

static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
{
	ssize_t ret;

	if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
		ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu));
	else
		ret = sprintf(buf, "%u\n", policy->cur);
	return ret;
}

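Surprisingly, this also calls cpufreq_driver->get. Isn't that the hardware read, which cpuinfo_cur_freq already exposes? It turns out the path was added for the sake of some userspace tools: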

commit c034b02e213d271b98c45c4a7b54af8f69aaac1e
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date: Mon Oct 13 08:37:40 2014 -0700

cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers

Currently the core does not expose scaling_cur_freq for set_policy()
drivers this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and if the
set_policy() driver supports the get() callback use it to retrieve the
current frequency.

Then again, set_policy drivers are part of CPUFreq scaling too, so exposing the file for them is reasonable.

Next, the two pairs of max/min frequencies:
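
The selection logic in show_scaling_cur_freq() boils down to one condition, which the following userspace model isolates. It is a sketch under stated assumptions: struct driver_model, scaling_cur_freq() and the demo_* callbacks are hypothetical names, and the callbacks return made-up values.

```c
#include <stddef.h>

/* Hypothetical model of the two driver callbacks that matter here. */
struct driver_model {
	int (*setpolicy)(void);			/* set for setpolicy-style drivers */
	unsigned int (*get)(unsigned int cpu);	/* hardware read, kHz */
};

/*
 * Value reported as scaling_cur_freq: the hardware read for setpolicy
 * drivers that provide get(), otherwise the cached policy->cur.
 */
unsigned int scaling_cur_freq(const struct driver_model *drv,
			      unsigned int cpu, unsigned int policy_cur)
{
	if (drv && drv->setpolicy && drv->get)
		return drv->get(cpu);
	return policy_cur;
}

/* Demo callbacks with made-up return values. */
int demo_setpolicy(void)
{
	return 0;
}

unsigned int demo_get(unsigned int cpu)
{
	(void)cpu;
	return 1500000;
}
```

For a ->target style driver the cached value wins even when get() exists, which is why scaling_cur_freq and cpuinfo_cur_freq can disagree on such systems.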

scaling_min_freq and
scaling_max_freq                show the current "policy limits" (in
                                kHz). By echoing new values into these
                                files, you can change these limits.
                                NOTE: when setting a policy you need to
                                first set scaling_max_freq, then
                                scaling_min_freq.

cpuinfo_min_freq :              this file shows the minimum operating
                                frequency the processor can run at (in kHz)
cpuinfo_max_freq :              this file shows the maximum operating
                                frequency the processor can run at (in kHz)
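
The ordering rule in the NOTE (scaling_max_freq first, then scaling_min_freq) can be captured in a small userspace helper. This is a minimal sketch: write_khz() and set_policy_limits() are hypothetical names, and the directory argument is assumed to be a policy directory such as /sys/devices/system/cpu/cpu0/cpufreq, where writing normally requires root.

```c
#include <stdio.h>

/* Write "<khz>\n" to <dir>/<name>. Returns 0 on success, -1 on error. */
int write_khz(const char *dir, const char *name, unsigned int khz)
{
	char path[512];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	n = fprintf(f, "%u\n", khz);
	if (fclose(f) != 0 || n < 0)
		return -1;
	return 0;
}

/* Set new policy limits in the documented order: max first, then min. */
int set_policy_limits(const char *dir, unsigned int min_khz,
		      unsigned int max_khz)
{
	if (min_khz > max_khz)
		return -1;
	if (write_khz(dir, "scaling_max_freq", max_khz) != 0)
		return -1;
	return write_khz(dir, "scaling_min_freq", min_khz);
}
```

A call such as set_policy_limits("/sys/devices/system/cpu/cpu0/cpufreq", 600000, 1200000) would request a 600 MHz to 1.2 GHz window; the kernel may still adjust the values it accepts.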

The kernel code behind these files:

#define show_one(file_name, object)			\
static ssize_t show_##file_name				\
(struct cpufreq_policy *policy, char *buf)		\
{							\
	return sprintf(buf, "%u\n", policy->object);	\
}

show_one(cpuinfo_min_freq, cpuinfo.min_freq);
show_one(cpuinfo_max_freq, cpuinfo.max_freq);
show_one(scaling_min_freq, min);
show_one(scaling_max_freq, max);
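
The show_one() macro uses token pasting to stamp out one function per sysfs file. The following standalone sketch reproduces the pattern in userspace; struct cpuinfo_model and struct policy_model are hypothetical stand-ins for the kernel structures, and the return type is a plain int instead of ssize_t.

```c
#include <stdio.h>

/* Hypothetical, cut-down versions of the kernel structures. */
struct cpuinfo_model {
	unsigned int min_freq, max_freq;	/* hardware limits, kHz */
};

struct policy_model {
	struct cpuinfo_model cpuinfo;
	unsigned int min, max;			/* policy limits, kHz */
};

/* show_##file_name pastes the file name into the function name. */
#define show_one(file_name, object)					\
int show_##file_name(const struct policy_model *policy, char *buf)	\
{									\
	return sprintf(buf, "%u\n", policy->object);			\
}

/* Each line expands to a complete function definition. */
show_one(cpuinfo_min_freq, cpuinfo.min_freq)
show_one(cpuinfo_max_freq, cpuinfo.max_freq)
show_one(scaling_min_freq, min)
show_one(scaling_max_freq, max)
```

After expansion, show_scaling_min_freq() prints policy->min and show_cpuinfo_max_freq() prints policy->cpuinfo.max_freq, each followed by a newline, and returns the number of characters written.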

cpufreq_table_validate_and_show() detects cpuinfo_min/max_freq by calling cpufreq_frequency_table_cpuinfo(), which also initializes scaling_min/max_freq to the same values:

int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
				    struct cpufreq_frequency_table *table)
{
	struct cpufreq_frequency_table *pos;
	unsigned int min_freq = ~0;
	unsigned int max_freq = 0;
	unsigned int freq;

	cpufreq_for_each_valid_entry(pos, table) {
		freq = pos->frequency;

		if (!cpufreq_boost_enabled()
		    && (pos->flags & CPUFREQ_BOOST_FREQ))
			continue;

		pr_debug("table entry %u: %u kHz\n", (int)(pos - table), freq);
		if (freq < min_freq)
			min_freq = freq;
		if (freq > max_freq)
			max_freq = freq;
	}

	policy->min = policy->cpuinfo.min_freq = min_freq;
	policy->max = policy->cpuinfo.max_freq = max_freq;
	...
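
The scan above is easy to try out in userspace. The sketch below mirrors it under a few assumptions: struct freq_entry, struct limits and table_limits() are hypothetical names, ENTRY_INVALID and TABLE_END are modeled as sentinel values in the frequency field, and returning -1 for a table with no usable entry is a choice made for this model.

```c
/* Sentinel values and flag, modeled after the kernel's table conventions. */
#define ENTRY_INVALID	(~0u)
#define TABLE_END	(~1u)
#define BOOST_FREQ	(1u << 0)

struct freq_entry {
	unsigned int flags;
	unsigned int frequency;		/* kHz, or a sentinel */
};

struct limits {
	unsigned int min_freq, max_freq;	/* kHz */
};

/*
 * Scan a TABLE_END-terminated table for the lowest and highest usable
 * frequency. Invalid entries are skipped, and so are boost entries
 * unless boost_enabled is nonzero. Returns 0 on success, -1 if the
 * table has no usable entry.
 */
int table_limits(const struct freq_entry *table, int boost_enabled,
		 struct limits *out)
{
	unsigned int min_freq = ~0u;
	unsigned int max_freq = 0;
	const struct freq_entry *pos;

	for (pos = table; pos->frequency != TABLE_END; pos++) {
		if (pos->frequency == ENTRY_INVALID)
			continue;
		if (!boost_enabled && (pos->flags & BOOST_FREQ))
			continue;
		if (pos->frequency < min_freq)
			min_freq = pos->frequency;
		if (pos->frequency > max_freq)
			max_freq = pos->frequency;
	}

	if (min_freq == ~0u)
		return -1;
	out->min_freq = min_freq;
	out->max_freq = max_freq;
	return 0;
}
```

Given a table holding 300000, an invalid entry, 1200000 and a boost-only 1800000, the limits come out as 300000 to 1200000 with boost disabled and 300000 to 1800000 with boost enabled.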

The table is usually configured in the device tree. Writing to the scaling_min/max_freq nodes then sets a new policy:

store_one(scaling_min_freq, min);
store_one(scaling_max_freq, max);

...
/*
 * policy : current policy.
 * new_policy: policy to be set.
 */
static int cpufreq_set_policy(struct cpufreq_policy *policy,
				struct cpufreq_policy *new_policy)
{
	struct cpufreq_governor *old_gov;
	int ret;

	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
		 new_policy->cpu, new_policy->min, new_policy->max);
	...
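
Requested limits have to end up inside the hardware range before they take effect. The kernel has a helper for this kind of clamping, cpufreq_verify_within_limits(); the userspace sketch below shows the idea, with struct policy_model and verify_within_limits() as hypothetical names. The body of cpufreq_set_policy() is elided above, so this is an illustration of the clamping step and not a reconstruction of that function.

```c
/* Hypothetical stand-in holding only the policy limits, in kHz. */
struct policy_model {
	unsigned int min, max;
};

/* Clamp p->min and p->max into [lo, hi] and keep min <= max. */
void verify_within_limits(struct policy_model *p,
			  unsigned int lo, unsigned int hi)
{
	if (p->min < lo)
		p->min = lo;
	if (p->max < lo)
		p->max = lo;
	if (p->min > hi)
		p->min = hi;
	if (p->max > hi)
		p->max = hi;
	if (p->min > p->max)
		p->min = p->max;
}
```

A request of 100000 to 5000000 kHz against hardware limits of 300000 to 1800000 kHz is therefore narrowed to exactly the hardware range.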

OK. The CPUFreq subsystem currently consists of three layers:

+------------------+
| scaling governor | ---------------> such as: cpufreq_performance.c
+------------------+
| the core         | ---------------> cpufreq.c + freq_table.c
+------------------+
| scaling driver   | ---------------> such as: cpufreq-dt.c
+------------------+

Newer kernels reword this description and move it to Documentation/admin-guide/pm/cpufreq.rst:

CPU Performance Scaling in Linux

The Linux kernel supports CPU performance scaling by means of the CPUFreq
(CPU Frequency scaling) subsystem that consists of three layers of code: the
core, scaling governors and scaling drivers.

Done.