Power Efficient Idle Injection
Jacob Pan
Intel Open Source Technology Center
LinuxCon Japan 2015

Agenda
• Introduction to idle injection
• Techniques available in Linux
• Experiment results
• Future work

Why Inject Idle?
• Primary: thermal/power limiting
• Secondary:
    • Performance management
    • Pay per use
    • Idle power efficiency

Understanding Processor Idle States/C-States

Motivation for Idle Injection: Increasingly Lower Idle Power
Deep idle power is negligible!
[Chart: Idle Power vs Running Power on Broadwell. Power (watts): 95% pc7 = 0.32, 95% pc2 = 1.9, TDP C0 = 14. *TDP = Thermal Design Power]

When to use idle injection?
Idle injection at LFM (low frequency mode)

Idle Injection in Linux
• Intel PowerClamp driver
• Scheduler throttling, RT or CFS bandwidth control

Intel PowerClamp V1 (current design in mainline kernel)
The idea: play idle!

PowerClamp V1 timeline of idle injection
[Figure labels: sched tick, throttled, unthrottled, RT kthread]

Limitations of Intel PowerClamp V1
• CPU appears busy while playing idle
• Scheduler ticks not stopped in NOHZ idle
    • Removal of tick_nohz_idle_enter/exit() API
    • RCU grace period
• Relies on timely jiffies updates

Limitations of Intel PowerClamp V1
CPU appears busy while playing idle

Limitations of Intel PowerClamp V1
Scheduler ticks not stopped in NOHZ idle
• Interrupted sleep is less power efficient
• Removal of tick_nohz_idle_enter/exit() API
• RCU grace period

Limitations of Intel PowerClamp V1
Relies on secondary timing source
• Timely jiffies updates
• Periodic timers

Scheduler Based Throttling
Normal tasks under the completely fair scheduling (CFS) class
› Bandwidth control via CPU control group/container
› Runqueue throttling by enqueuing/dequeuing tasks
[Diagram: cgroup hierarchy. Root CG; CG1 (CG1.1, CG1.2); CG2 (CG2.1)]

Time chart of CFS Bandwidth Control (two cgroups, multithreaded workload)
• Pros: no fake idle task, finer per-cgroup controls
• Cons: no synchronization, loss of package C-state opportunities
[Figure labels: cgroup1 and cgroup2, each with its own throttle/unthrottle points]

Power Clamp V2 (work in progress)
• Runqueue throttling of CFS class
• Synchronization around rounded ktime instead of jiffies

Time Chart: PowerClamp V1 vs. V2

Experiment Data
• Goals:
    • Comparing power efficiency
    • Scalability
    • CPU HW design trend: old vs. new
• Configurations:
    • CPUs: Ivy Bridge/Haswell/Broadwell clients, Haswell EX server
    • Workload: fspin by Len Brown. CPU bound, floating
    • Test case: inject idle from 0 to 50% in 5% increments

Power and Performance Control V1 vs. V2

Power Efficiency Comparison on a Client Platform

Scalability Tests V1 vs. V2 (144 core 4 socket Haswell EX)

Power Efficiency Comparison on a Server Platform

Comparing Deep vs. Shallow Package C-States (PowerClamp V2)

Conclusions
• Idle injection can effectively reduce power beyond the energy efficient frequency
• With deeper package C-states, near linear performance and power reduction can be achieved
• Scheduler runqueue throttling results in a cleaner and more efficient solution
• Aligning activities results in significant power savings

Future plan
• Better handling of interrupts
• Integration with scheduler
• Synchronize with devices with latency tolerance
• Work with hardware duty cycling

Backups

Time Chart of Redesigned Power Clamp

Entering Idle Injection Period

Exiting Idle Injection
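
Sketch: aligning idle windows on rounded time
The Power Clamp V2 slide lists "synchronization around rounded ktime instead of jiffies". The following is a minimal Python sketch of that idea, not the driver's code: if every CPU reads the same monotonic clock and rounds it up to the next multiple of a shared period, all CPUs pick the same idle window without exchanging messages, so their idle time overlaps and the package can enter a deep C-state. The names next_window and InjectionWindow, the 100 ms period, and the 25% ratio are assumptions made for illustration.

```python
from dataclasses import dataclass

NSEC_PER_MSEC = 1_000_000


@dataclass(frozen=True)
class InjectionWindow:
    """One idle window, in nanoseconds on a shared monotonic clock."""
    start_ns: int
    end_ns: int


def next_window(now_ns: int, period_ns: int, idle_ratio_pct: int) -> InjectionWindow:
    """Return the next idle window that starts on a period boundary.

    Hypothetical helper: the start is the current time rounded up to the
    next multiple of period_ns, so any CPU calling this within the same
    period computes an identical window.
    """
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    if not 0 <= idle_ratio_pct < 100:
        raise ValueError("idle_ratio_pct must be in [0, 100)")

    # Round up to the boundary strictly after now_ns.
    start_ns = (now_ns // period_ns + 1) * period_ns
    # Idle for the requested share of each period.
    idle_ns = period_ns * idle_ratio_pct // 100
    return InjectionWindow(start_ns, start_ns + idle_ns)


if __name__ == "__main__":
    period = 100 * NSEC_PER_MSEC  # assumed period, not taken from the slides
    # Two CPUs sample the clock at different instants within one period.
    cpu_a = next_window(1_234_567_890, period, 25)
    cpu_b = next_window(1_250_000_001, period, 25)
    print(cpu_a, cpu_b, cpu_a == cpu_b)
```

Rounding against a high-resolution clock rather than counting ticks means the alignment does not depend on timely jiffies updates, which is the limitation the V1 slides point out.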