Fixing excessive delays caused by kernel msleep()

While profiling boot time on a low-end Qualcomm platform (Linux kernel 3.18), I found one spot that took 2.5 s. The msm camera driver contained code like this:

for (i = 0; i < 128; i++) {
        do_something();
        msleep(delay);
        do_something();
}

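After adding logging: delay is 1 ms, but the log shows each msleep() call actually taking up to 20 ms, so the whole loop takes about 2.5 s (128 × 20 ms ≈ 2.56 s). How do we get the delay we actually asked for?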

Let's look at what the kernel document Documentation/timers/timers-howto.txt says about delays:

ATOMIC CONTEXT:
You must use the *delay family of functions. These
functions use the jiffie estimation of clock speed
and will busy wait for enough loop cycles to achieve
the desired delay:

   ndelay(unsigned long nsecs)
   udelay(unsigned long usecs)
   mdelay(unsigned long msecs)

   udelay is the generally preferred API; ndelay-level
   precision may not actually exist on many non-PC devices.

   mdelay is macro wrapper around udelay, to account for 
   possible overflow when passing large arguments to udelay.
   In general, use of mdelay is discouraged and code should
   be refactored to allow for the use of msleep.

In atomic context, delays must use the busy-waiting *delay family of functions. Here is the udelay implementation in include/asm-generic/delay.h:

#define udelay(n)                                                       \
        ({                                                              \
                if (__builtin_constant_p(n)) {                          \
                        if ((n) / 20000 >= 1)                           \
                                 __bad_udelay();                        \
                        else                                            \
                                __const_udelay((n) * 0x10c7ul);         \
                } else {                                                \
                        __udelay(n);                                    \
                }                                                       \
        })
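
The constant 0x10c7 in the macro above is 4295, that is 2^32 / 10^6 rounded up, so multiplying by it turns microseconds into a 32-bit binary fraction of a second without a division. Below is a minimal userspace sketch of that scaling. The helper name usecs_to_cycles and the macro name UDELAY_MULT_SKETCH are made up for illustration, and the real conversion lives in per-architecture code that differs in detail.

```c
#include <stdint.h>

/* 2^32 / 1000000, rounded up: the multiplier seen in the generic udelay(). */
#define UDELAY_MULT_SKETCH 0x10c7ULL

/*
 * Hypothetical helper: convert a delay in microseconds into ticks of a
 * counter running at ticks_per_sec, using multiply-and-shift instead of
 * a division. Callers are expected to keep usecs below 20000, as the
 * macro does, so the 64-bit intermediate cannot overflow.
 */
static uint64_t usecs_to_cycles(uint32_t usecs, uint32_t ticks_per_sec)
{
        uint64_t xloops = (uint64_t)usecs * UDELAY_MULT_SKETCH;

        return (xloops * ticks_per_sec) >> 32;
}
```

With an assumed 19.2 MHz counter, usecs_to_cycles(1000, 19200000) comes out at 19200 ticks, which is what 1 ms should be at that rate.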

__bad_udelay() flags an out-of-range argument. __udelay() and __const_udelay() are architecture-specific; on ARM, for example, arch/arm/include/asm/delay.h has:

#define __udelay(n)             arm_delay_ops.udelay(n)
#define __const_udelay(n)       arm_delay_ops.const_udelay(n)

which eventually calls:

#if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
#define cpu_relax()                     smp_mb()
#else
#define cpu_relax()                     barrier()
#endif

static void __timer_delay(unsigned long cycles)
{
        cycles_t start = get_cycles();

        while ((get_cycles() - start) < cycles)
                cpu_relax();
}
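
The loop in __timer_delay() subtracts first and compares second, which keeps it correct even if the free-running counter wraps around during the wait. Here is a userspace sketch of the same pattern, assuming a 32-bit counter. read_counter() and fake_counter are stand-ins invented for this illustration, not kernel APIs.

```c
#include <stdint.h>

/* Fake free-running 32-bit counter, advanced by every read (illustration only). */
static uint32_t fake_counter;

static uint32_t read_counter(void)
{
        return fake_counter++;
}

/* Elapsed ticks between two readings; unsigned arithmetic absorbs wraparound. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t now)
{
        return now - start;
}

/* Busy-wait until at least `cycles` ticks have passed; returns the last reading. */
static uint32_t busy_wait_cycles(uint32_t cycles)
{
        uint32_t start = read_counter();
        uint32_t now;

        do {
                now = read_counter();
        } while (elapsed_ticks(start, now) < cycles);

        return now;
}
```

Comparing `now < start + cycles` instead would break whenever start + cycles overflows, which is why the subtraction form is used.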

What is barrier()? The kernel document Documentation/memory-barriers.txt explains:

=========================
WHAT ARE MEMORY BARRIERS?
As can be seen above, independent memory operations are effectively performed
in random order, but this can be a problem for CPU-CPU interaction and for I/O.
What is required is some way of intervening to instruct the compiler and the
CPU to restrict the order.

Memory barriers are such interventions. They impose a perceived partial
ordering over the memory operations on either side of the barrier.

Such enforcement is important because the CPUs and other devices in a system
can use a variety of tricks to improve performance, including reordering,
deferral and combination of memory operations; speculative loads; speculative
branch prediction and various types of caching. Memory barriers are used to
override or suppress these tricks, allowing the code to sanely control the
interaction of multiple CPUs and/or devices.

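So a barrier enforces an ordering on memory operations, right? (Strictly speaking, barrier() is only a compiler barrier that stops the compiler from reordering memory accesses across it, while smp_mb() is a full CPU memory barrier.) I'll dig into the details another time.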

NON-ATOMIC CONTEXT:
You should use the *sleep[_range] family of functions.
There are a few more options here, while any of them may
work correctly, using the “right” sleep function will
help the scheduler, power management, and just make your
driver better :)
   -- Backed by busy-wait loop:
           udelay(unsigned long usecs)
   -- Backed by hrtimers:
           usleep_range(unsigned long min, unsigned long max)
   -- Backed by jiffies / legacy_timers
           msleep(unsigned long msecs)
           msleep_interruptible(unsigned long msecs)

   Unlike the *delay family, the underlying mechanism
   driving each of these calls varies, thus there are 
   quirks you should be aware of. 

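In non-atomic context, use the *sleep[_range] functions. Picking the wrong one causes performance problems, so there are some quirks to watch out for. Let's see what they are: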

   SLEEPING FOR "A FEW" USECS ( < ~10us? ):
           * Use udelay

           - Why not usleep?
                   On slower systems, (embedded, OR perhaps a speed-
                   stepped PC!) the overhead of setting up the hrtimers
                   for usleep *may* not be worth it. Such an evaluation
                   will obviously depend on your specific situation, but
                   it is something to be aware of.

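For delays under ~10 us, use udelay as well, rather than the hrtimer-backed usleep, because the cost of setting up the timer may be too high. What are hrtimers? See Documentation/timers/hrtimers.txt: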

subsystem for high-resolution kernel timers

High-resolution timers, so naturally they are accurate.

   SLEEPING FOR ~USECS OR SMALL MSECS ( 10us - 20ms):
           * Use usleep_range

           - Why not msleep for (1ms - 20ms)?
                   Explained originally here:
                           http://lkml.org/lkml/2007/8/3/250
                   msleep(1~20) may not do what the caller intends, and
                   will often sleep longer (~20 ms actual sleep for any
                   value given in the 1~20ms range). In many cases this
                   is not the desired behavior.

           - Why is there no "usleep" / What is a good range?
                   Since usleep_range is built on top of hrtimers, the
                   wakeup will be very precise (ish), thus a simple
                   usleep function would likely introduce a large number
                   of undesired interrupts.

                   With the introduction of a range, the scheduler is
                   free to coalesce your wakeup with any other wakeup
                   that may have happened for other reasons, or at the
                   worst case, fire an interrupt for your upper bound.

                   The larger a range you supply, the greater a chance
                   that you will not trigger an interrupt; this should
                   be balanced with what is an acceptable upper bound on
                   delay / performance for your specific code path. Exact
                   tolerances here are very situation specific, thus it
                   is left to the caller to determine a reasonable range.

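For 10 us - 20 ms, use usleep_range. Our delay is 1 ms, so msleep is the wrong choice: as explained above, it will sleep longer than asked. usleep_range is the right call. Why a range? Because usleep_range is built on hrtimers, an exact wakeup time would mean a dedicated interrupt for every sleep, whereas a range lets the scheduler coalesce the wakeup with others. The right range depends on what the calling code can tolerate.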
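
The ~20 ms figure can be reproduced with a little arithmetic. In this kernel generation msleep() converts its argument to jiffies, rounding up, and then adds one more jiffy so that it never returns early. The sketch below models that in userspace. HZ = 100 is an assumption (it matches the 20 ms seen in the log), and the function names are made up for illustration.

```c
/* Assumed tick rate; the real value comes from the kernel configuration. */
#define HZ_ASSUMED 100u

/* Round a millisecond count up to whole jiffies, as msecs_to_jiffies() does. */
static unsigned int model_msecs_to_jiffies(unsigned int msecs)
{
        return (msecs * HZ_ASSUMED + 999u) / 1000u;
}

/* Upper bound, in ms, on how long msleep(msecs) sleeps in this model. */
static unsigned int model_msleep_max_ms(unsigned int msecs)
{
        /* msleep() waits for msecs_to_jiffies(msecs) + 1 timer ticks. */
        unsigned int timeout = model_msecs_to_jiffies(msecs) + 1u;

        return timeout * (1000u / HZ_ASSUMED);
}
```

Under these assumptions model_msleep_max_ms(1) is 20, and 128 iterations give 128 × 20 ms = 2560 ms, in line with the 2.5 s observed.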

   SLEEPING FOR LARGER MSECS ( 10ms+ )
           * Use msleep or possibly msleep_interruptible

           - What's the difference?
                   msleep sets the current task to TASK_UNINTERRUPTIBLE
                   whereas msleep_interruptible sets the current task to
                   TASK_INTERRUPTIBLE before scheduling the sleep. In
                   short, the difference is whether the sleep can be ended
                   early by a signal. In general, just use msleep unless
                   you know you have a need for the interruptible variant

For delays of 10 ms or more, use msleep or msleep_interruptible.
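
The three buckets from timers-howto.txt can be summarized as a small decision function. This is only a sketch of the guidance, not kernel code: the enum and pick_delay_api() are hypothetical names, and since the document's ranges overlap between 10 ms and 20 ms, the cutoff of 20 ms used here is my own choice.

```c
/* Which delay primitive the howto recommends (names invented for this sketch). */
enum delay_api {
        USE_UDELAY,
        USE_USLEEP_RANGE,
        USE_MSLEEP,
};

/* Pick a delay primitive for a delay of `usecs` microseconds. */
static enum delay_api pick_delay_api(unsigned long usecs, int atomic)
{
        if (atomic)
                return USE_UDELAY;        /* sleeping is not allowed here */
        if (usecs < 10)
                return USE_UDELAY;        /* hrtimer setup may cost more than the wait */
        if (usecs <= 20000)
                return USE_USLEEP_RANGE;  /* msleep would overshoot in this range */
        return USE_MSLEEP;                /* long waits: jiffies precision is enough */
}
```

For the 1 ms delay in the camera driver loop, pick_delay_api(1000, 0) lands in the usleep_range bucket.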

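So what range should we use here? I figured other kernel driver developers must have run into this too, and sure enough, this patch targets exactly this kind of delay: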

Date	Tue, 29 Nov 2016 07:51:55 +0100
From	Vojtech Pavlik <>
Subject	Re: [PATCH] Input: joystick: gf2k - change msleep to usleep_range for small msecs
	

On Tue, Nov 29, 2016 at 01:11:49AM +0530, Aniroop Mathur wrote:

> msleep(1~20) may not do what the caller intends, and will often sleep longer.
> (~20 ms actual sleep for any value given in the 1~20ms range)
> This is not the desired behaviour for many cases like device resume time,
> device suspend time, device enable time, connection time, probe time,
> loops, retry logic, etc
> msleep is built on jiffies / legacy timers which are not precise whereas
> usleep_range is build on top of hrtimers so the wakeups are precise.
> Thus, change msleep to usleep_range for precise wakeups.
> 
> For example:
> On a machine with tick rate / HZ as 100, msleep(4) will make the process to
> sleep for a minimum period of 10 ms whereas usleep_range(4000, 4100) will make
> sure that the process does not sleep for more than 4100 us or 4.1ms

And once more, patch not needed.

> 
> Signed-off-by: Aniroop Mathur <a.mathur@samsung.com>
> ---
>  drivers/input/joystick/gf2k.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/input/joystick/gf2k.c b/drivers/input/joystick/gf2k.c

I changed msleep(delay) to usleep_range(delay * 1000, delay * 1000 + 100) and tested it: the delay is now accurate, and about 2.5 s is saved.
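
As a rough check on the saving, the sketch below builds the same (min, max) pair from a millisecond delay and bounds the total time the 128 sleeps can request. It is userspace arithmetic only, not driver code. struct sleep_range and both helpers are hypothetical names, and the time spent in do_something() and any scheduling latency are ignored.

```c
/* Bounds, in microseconds, that would be passed to usleep_range(). */
struct sleep_range {
        unsigned long min_us;
        unsigned long max_us;
};

/* Build a range from a millisecond delay plus a slack for wakeup coalescing. */
static struct sleep_range range_for_ms(unsigned long delay_ms,
                                       unsigned long slack_us)
{
        struct sleep_range r;

        r.min_us = delay_ms * 1000UL;
        r.max_us = r.min_us + slack_us;
        return r;
}

/* Upper bound on the sleep time requested by `iterations` loop passes. */
static unsigned long loop_sleep_max_us(unsigned int iterations,
                                       struct sleep_range r)
{
        return iterations * r.max_us;
}
```

For delay = 1 and a slack of 100 us this gives a range of 1000 to 1100 us, so the 128 sleeps request at most 140.8 ms in total, compared with roughly 2.5 s measured with msleep(1).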
