System-level IPC on Multi-core Platforms
SICS Multicore Day – 2013-09-23
Ola Dahl, CTO Office, Enea
Enea Confidential – Under Copyright © 2013 EneaNDA AB

Before we start
• Enea – ~400 employees, 468 MSEK revenue, products and services (services, middleware, OSE, Linux)
• Myself
[Timeline figures: Enea from its founding in 1968 to now; personal timeline with the labels LTH, STLiU, Ericsson]

System-level IPC
• Message-passing between processes – intra-node and inter-node
• Monitoring and event handling – fault-tolerance
• OSE operating system – kernel services, file system services, IP communication, program management, run-time loader, LINX
• Number of communicating entities ~ tens of thousands (pid space extension from 16 to 20 bits) – number of nodes ~ 100s

System-level IPC
• Element Messaging Framework – name server, message dispatch, communication patterns, HA functionality, Linux
• #nodes ~ 100(s), #threads/node ~ 1000s
[Figure: communicating entities A, B, C, D deployed on SoC, Platform, Fixed Multi-Node, Elastic Multi-Node and Cloud]

IPC
[Figure: communicating entities hosted by two operating systems]
Communicating entities - Linux process, Linux thread, RTOS task, Bare-metal executive, User-space thread, Other executing entity (e.g.
in an event-driven execution model)

IPC and Multicore
[Figure: two operating systems on multicore hardware; cores labelled C0–C4 and D0–D2; bus, interconnect, cache, controllers, I/O]
• Multicore, multiple processing entities
• Parallelism on different levels – inside one SoC block, inside SoC, between SoC
• Communication on different levels – interconnect, caches, memory, hardware buffers and hardware IPC support

IPC and Multicore
[Figure: the same hardware, now with a real-time operating system and a non-real-time operating system]
• Real-time – core isolation – dedicated cores for real-time response

Heterogeneous Hardware
TCI6638K2K - Multicore DSP+ARM KeyStone II System-on-Chip - http://www.ti.com/product/tci6638k2k
• Processing – 8 C66x DSP cores (up to 1.2 GHz), 4 ARM cores (up to 1.4 GHz), wireless comm (3GPP) coprocessors
• Interconnect and control - Multicore Navigator, TeraNet, Multicore Shared Memory Controller, HyperLink

Heterogeneous Software
Core isolation for real-time response
Real-time domain and non-real-time domain
Run-time categories in real-time domain
• Native threads
• User-space threads
• RTOS migration
• Other execution frameworks, e.g. Open Event Machine
• ENEA LWRT
[Figure: real-time and non-real-time domains above an operating system; cores C0, C1, D0, D1, D2; bus, interconnect, cache, controllers, I/O]

System-level IPC and Multicore
Communicating entities – e.g.
processes, threads, user-space threads, bare-metal executives
Levels of parallelism
• Multicore processor in a SoC
• Multiple blocks in a SoC
• Multiple SoC in a node
• Multiple nodes
Communication on different levels (e.g. intra-node and inter-node)
• On each level – establish contact, perform communication, monitor and act on events, close

Where are we heading?
• Linux
• Hardware
• Virtualisation

Linux
EE Times report - http://seminar2.techonline.com/~additionalresources/embedded_mar1913/embedded_mar1913.pdf
• Linux usage: 2013 – 50%, 2012 – 46%

Linux
Status of embedded Linux – March 2013 - http://elinux.org/images/c/cf/Status-of-Embedded-Linux-2013-03-JJ44.pdf
• Average time between Linux releases, 3.3 to 3.8 – 70 days
• Linux 3.4 – RPMsg for IPC between Linux and e.g. RTOS
• Linux 3.7 – ARM multi-platform support, ARM 64-bit support
• Linux 3.7 – perf trace (alternative to strace)
Status of Linux – September 2013
• Latest stable kernel – 3.11.1
• Example changes in 3.11 (released September 2, 2013):
  – ARM huge page support, KVM and Xen support for ARM64
  – SYSV IPC message queue scalability improvements
• Example changes in 3.10 (released June 30, 2013):
  – Timerless multitasking

Linux and real-time
• Real-time framework, e.g. Xenomai - http://www.xenomai.org/
• PREEMPT_RT - https://rt.wiki.kernel.org/index.php/Main_Page
• Core isolation and tickless operation – striving for “Bare-Metal Multicore Performance in a General-Purpose Operating System” - http://www2.rdrop.com/~paulmck/scalability/paper/BareMetalMW.2013.02.25a.pdf
• Timerless multitasking in 3.10 retains a 1 Hz tick also on isolated cores
• Linux 3.12-rc1 (2013-09-16) - even more tickless kernel (1 Hz maintenance tick removed) – still work to be done, e.g.
with memory management

Hardware
ITRS - http://public.itrs.net - fifteen-year assessment of the semiconductor industry’s future technology requirements
ITRS 2012 UPDATE - http://public.itrs.net/Links/2012ITRS/Home2012.htm
• System Drivers - SOC Networking Driver, SOC Consumer Driver, Microprocessor (MPU) Driver, Mixed-Signal Driver, Embedded Memory Driver
• SOC networking driver - moving towards “multicore architectures with heterogeneous on-demand accelerator engines”, with “integration of onboard switch fabric and L3 caches”

Hardware
SOC networking driver – MC/AE Architecture – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf

Hardware
SOC networking driver – System performance and # of cores – from http://public.itrs.net/Links/2011ITRS/2011Chapters/2011SysDrivers.pdf
• Assumptions - constant cost (die area), per-year increase of number of cores (1.4 x), core frequency (1.05 x), accelerator engine frequency (1.05 x) - logic, memory, cache hierarchy, switching-fabric and system interconnect will scale consistently with the number of cores
• System performance – the “product of number of cores, core frequency, and accelerator engine frequency”

Virtualization
NFV – Network Function Virtualization
ETSI - http://portal.etsi.org/NFV/NFV_White_Paper.pdf
“leveraging standard IT virtualisation technology to consolidate many network equipment types onto industry standard high volume servers, switches and storage, which could be located in Datacentres, Network Nodes and in the end user premises”
Virtualization using e.g.
KVM or Xen

System-level IPC aspects
Establishing and performing efficient communication
Constraints from
• Real-time
• Hardware
with an increasing interest in virtualization

IPC and Linux
Is there any remaining work to do?

IPC in Linux (and UNIX)
[Timeline figure, 1964 to now. Labels: pipe, UNIX SysV, mmap (SVR4), flock (4.2BSD), POSIX rt, Linux 1.0, POSIX shmem (Linux 2.4), POSIX named semaphore (Linux 2.6), POSIX mq (Linux 2.6.6), eventfd (Linux 2.6.22), CMA (Linux 3.2); reference markers: Enea founded, Emacs]
Overview, book, man pages, etc. by Michael Kerrisk - http://man7.org/

IPC on Linux
[Timeline figure, 2000 to now. Labels: nanomsg, OpenMPI, TIPC, kdbus, AF_BUS, Binder, DBUS, RPMsg, 0MQ, LINX for Linux, Enea Element]

Work in progress
sysv ipc shared mem optimizations, June 18, 2013 - http://lwn.net/Articles/555469/
“With these patches applied, a custom shm microbenchmark stressing shmctl doing IPC_STAT with 4 threads a million times, reduces the execution time by 50%”
ALS: Linux interprocess communication and kdbus, May 30, 2013 - http://lwn.net/Articles/551969/
“The work on kdbus is progressing well and Kroah-Hartman expressed optimism that it would be merged before the end of the year. Beyond just providing a faster D-Bus (which could be accomplished without moving it into the kernel, he said), it is his hope that kdbus can eventually replace Android's binder IPC mechanism.”

Work in progress
Speeding up D-Bus, February 29, 2012 - http://lwn.net/Articles/484203/
“D-Bus currently relies on a daemon process to authenticate processes and deliver messages that it receives over Unix sockets.
Part of the performance problem is caused by the user-space daemon, which means that messages need two trips through the kernel on their way to the destination”
Fast interprocess communication revisited, November 9, 2011 - https://lwn.net/Articles/466304/
“Rather we start with the observation that this many attempts to solve essentially the same problem suggests that something is lacking in Linux. There is, in other words, a real need for fast IPC that Linux doesn't address”

Work in progress
Fast interprocess messaging, September 15, 2010 - http://lwn.net/Articles/405346/
“Rather than copy messages through a shared segment, they would rather deliver messages directly into another process's address space. To this end, Christopher Yeoh has posted a patch implementing what he calls cross memory attach.”

Which IPC to use?
• Functionality
• Performance
• Cost
• Technology constraints

Choosing an IPC - Functionality
Functionality | SysV Shared memory | POSIX Shared memory | FIFO | Stream Socket | 0MQ | LINX
End-point addressing | SysV key | Shmem object name | File system node | AF_UNIX – file system node, AF_INET – IP address and port | Transport and address (Transport = TCP, ipc, inproc) | Endpoint name specifying path to peer
End-point repr. | Variable | File desc | File desc x 2 | Socket descriptor | 0MQ socket | LINX endpoint, spid
Channels | A memory area | A memory area | The FIFO (unidirectional) | The socket (bidirectional) | 0MQ socket internal (bidirectional) – e.g. TCP or UNIX domain socket | Buffer associated with LINX endpoint
Initialisation | shmget, shmat | shm_open, mmap | mkfifo, open | socket, bind, listen, accept, connect | Create 0MQ context and 0MQ socket | linx_open, linx_hunt
Closing | shmdt | munmap, shm_unlink | close, unlink | close | Close 0MQ socket | linx_close

Choosing an IPC - Functionality
Functionality | SysV Shared memory | POSIX Shared memory | FIFO | Stream Socket | 0MQ | LINX
Sending | Write to memory, no synchronization | Write to memory, no synchronization | write | write | Send message or number of bytes to 0MQ socket | Send LINX signal
Receiving | Read from memory, no synchronization | Read from memory, no synchronization | read | read | Receive message or number of bytes from 0MQ socket | Receive LINX signal
Blocking | No (unless implemented separately) | No (unless implemented separately) | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Blocking and non-blocking R/W | Receive is blocking (non-blocking possible), send is not
Monitoring | No (unless implemented separately) | No (unless implemented separately) | select, poll | select, poll | Monitoring callback can be registered with 0MQ context | LINX attach

Choosing an IPC – Technology constraints
Technology | 0MQ | kdbus | LINX
Sockets | Yes | No | Yes, own type
Daemons | No | No | Discovery daemon (optional)
Kernel modules | No | Yes | Yes
Pthread synchronization | Yes | No | Yes
Kernel synchronization | No | Yes | Yes
Programming languages | C and more | C | C
Development status | Latest stable release is 3.2.3, from May 2013 | Estimated to be ready in 2013 | Initial release 2006, current version is 2.6.5, released June 2013
License | LGPLv3 | LGPL | BSD and GPLv2

Choosing an IPC - performance
• ipc-bench: A UNIX inter-process communication benchmark
• University of Cambridge - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/
• Measures latency, throughput, IPI latency
• Public results dataset
“Since
we have found IPC performance to be a complex, multi-variate problem, and because we believe that having an open corpus of performance data will be useful to guide the development of hypervisors, kernels and programming frameworks, we provide a database of aggregated ipc-bench datasets.”
• Enea and ipc-bench – porting to 32-bit, porting to ARM, porting to PowerPC, adding tests for CMA, LINX, ZeroMQ

Measuring IPC performance
Why is this interesting?
From The case for reconfigurable I/O channels, S. Smith et al, RESoLVE12, 2012 - http://anil.recoil.org/papers/2012-resolve-fable.pdf
“We show dramatic differences in performance between communication mechanisms depending on locality and machine architecture, and observe that the interactions of communication primitives are often complex and sometimes counter-intuitive”
“Furthermore, we show that virtualisation can cause unexpected effects due to OS ignorance of the underlying, hypervisor-level hardware setup”

Measuring IPC performance
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Pairwise IPC latency between cores
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB, Linux 3.8.5-030805-generic, x86_64

Measuring IPC performance
Submitted measurements - http://www.cl.cam.ac.uk/research/srg/netos/ipc-bench/details/tmpn2YlFp.html
Pairwise IPC throughput between cores.
(x-axis is packet size, y-axis is Gbps)
64 cores, AMD Opteron(TM) Processor 6272, 8 NUMA nodes, 125.9 GB, Linux 3.8.5-030805-generic, x86_64

Measuring IPC performance
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
[Chart: throughput at sizes 64, 4096 and 65536 for mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; y-axis 0 to 180000]

Measuring IPC performance
ARM Pandaboard @ 1 GHz, Cores 0 and 1
[Chart: throughput at sizes 64, 4096 and 65536 for mempipe_spin_thr, mempipe_thr, pipe_thr, tcp_thr, unix_thr, vmsplice_coop_pipe_thr, vmsplice_pipe_thr; y-axis 0 to 3000]

Measuring IPC performance
Intel(R) Xeon(R) CPU - X3460 @ 2.80GHz, Cores 6 and 7
0MQ vs UNIX sockets
[Chart: throughput at sizes 64, 4096 and 65536 for zmq_inproc_thr, zmq_ipc_thr, zmq_tcp_thr, unix_thr; y-axis 0 to 30000]

Profiling and Performance
Brendan Gregg - Linux Performance Analysis and Tools - SCaLE 11x 2013 - http://dtrace.org/blogs/brendan/2013/06/08/linux-performance-analysis-and-tools/
[Figure: Linux stack – apps and libs; system call interface; VFS, file systems, block device interface; sockets, TCP/UDP, IP, Ethernet; scheduler, VM; device drivers]
- perf - https://perf.wiki.kernel.org/index.php/Main_Page
- DTrace - https://github.com/dtrace4linux
- SystemTap - http://sourceware.org/systemtap/

Profiling and Performance
Collecting data with perf – IPC test with pipes

Profiling and Performance
Analyzing data recorded with perf

Profiling and Performance
Examining where time is spent

Profiling and Performance
A lot more to choose from*: strace, netstat, top, pidstat, mpstat, dstat, vmstat,
slabtop, free, tcpdump, ip, nicstat, iostat, iotop, blktrace, ps, pmap, traceroute, ntop, ss, lsof, oprofile, gprof, kcachegrind, valgrind, google profiler, nfsiostat, cifsiostat, latencytop, powertop, LTTng, ktap, ...
* http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf

Summary
• IPC in Linux - stable but not finished
• IPC on Linux – diversified
• Performance and profiling – ipc-bench (with adaptations and extensions), a large selection of profiling tools

Conclusions
• A variety of IPC mechanisms exist
• There is no clear one-size-fits-all solution
• Performance aspects and functionality aspects (location transparency, robustness) – different trade-offs for different use-cases
• IPC and Linux – many stable mechanisms but still work in progress (e.g. kdbus)
• Performance and profiling required
  – ipc-bench (with adaptations and extensions)
  – perf for performance profiling (one of several tools, but with a powerful feature set)

Challenges
• Systems requirements and design - parallelism, partitioning, heterogeneity, functional requirements, performance requirements – choosing an IPC mechanism
• Programming - frameworks and execution environments – legacy and re-use – choosing a programming paradigm
• Verification - measurements and profiling - are we designing (and implementing) the system as we planned? – choosing the right tools
Enea as an IPC partner - long-term experience, competence for building future IPC systems – development, integration, configuration, performance assessment

SICS Multicore day
System-level IPC on multicore platforms
Multicore System-on-Chip solutions, offering parallelization and partitioning, are increasingly used in real-time systems.
As the number of cores increases, often in combination with increased heterogeneity in the form of hardware-accelerated functionality, we see increased demands on effective communication, both inside a multicore node and at the inter-node system level. The presentation will outline some of the challenges, as seen from Enea, to be expected when building future communication mechanisms, with requirements on performance and scalability, as well as transparency for applications. We will give examples from ongoing work in the Linux area, from Enea and from other open source contributors.
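
Sketch: stream socket life cycle
The Stream Socket column of the "Choosing an IPC - Functionality" slides lists socket, bind, listen, accept, connect for initialisation, write and read for sending and receiving, select or poll for monitoring, and close for closing. A minimal sketch of that sequence in C follows, assuming an AF_UNIX socket and running both ends inside one process to keep it short. The function name stream_socket_roundtrip, the socket path and the one-second poll timeout are illustrative assumptions, not something taken from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not from the slides): sends msg over an AF_UNIX
 * stream socket and reads it back on the accepted end. Returns the number
 * of bytes received, or -1 on error. path is the file system node that
 * serves as end-point address. */
int stream_socket_roundtrip(const char *path, const char *msg,
                            char *out, size_t out_size)
{
    struct sockaddr_un addr;
    struct pollfd pfd;
    int listener = -1, client = -1, server = -1, result = -1;
    size_t len = strlen(msg);
    ssize_t n;

    assert(out != NULL && out_size > 0);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                         /* remove a stale node, if any */

    /* Initialisation: socket, bind, listen, connect, accept */
    listener = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listener < 0) goto done;
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    if (listen(listener, 1) < 0) goto done;
    client = socket(AF_UNIX, SOCK_STREAM, 0);
    if (client < 0) goto done;
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto done;
    server = accept(listener, NULL, NULL);
    if (server < 0) goto done;

    /* Sending: write */
    if (write(client, msg, len) != (ssize_t)len) goto done;

    /* Monitoring: poll until the accepted end is readable (1 s timeout) */
    pfd.fd = server;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLIN)) goto done;

    /* Receiving: read */
    n = read(server, out, out_size - 1);
    if (n < 0) goto done;
    out[n] = '\0';
    result = (int)n;

done:
    /* Closing: close the descriptors and unlink the file system node */
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    unlink(path);
    return result;
}
```

In a real system the two ends would live in different processes, and the poll call would typically watch several descriptors at once.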
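
Sketch: pipe round-trip latency
The "Choosing an IPC - performance" slide names latency as one of the quantities ipc-bench measures, and the perf slides use an IPC test with pipes. The following is a rough sketch of how such a latency figure can be obtained, assuming two processes that bounce a single byte over a pair of pipes. It is not ipc-bench code; the function name pipe_pingpong_ns and the choice of CLOCK_MONOTONIC are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of ipc-bench): bounces one byte between a
 * parent and a child over two pipes, rounds times, and returns the mean
 * round-trip time in nanoseconds, or -1.0 on error. */
double pipe_pingpong_ns(int rounds)
{
    int to_child[2], to_parent[2];
    struct timespec t0, t1;
    char byte = 'x';
    pid_t pid;
    int i, ok = 1;

    assert(rounds > 0);
    if (pipe(to_child) < 0)
        return -1.0;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);  close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1) {
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        }
        _exit(0);
    }

    /* Parent: keep only the write end to the child and the read end back. */
    close(to_child[0]);
    close(to_parent[1]);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < rounds && ok; i++) {
        ok = write(to_child[1], &byte, 1) == 1 &&
             read(to_parent[0], &byte, 1) == 1;   /* blocking read */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(to_child[1]);                 /* child sees end-of-file and exits */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    if (!ok)
        return -1.0;

    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / rounds;
}
```

The result depends strongly on which cores the two processes run on, which is the point made on the "Measuring IPC performance" slides; pinning the processes to chosen cores is left out of this sketch.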
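
Sketch: growth implied by the ITRS assumptions
The "System performance and # of cores" slide quotes per-year factors of 1.4 x for number of cores and 1.05 x each for core frequency and accelerator engine frequency, and defines system performance as the product of the three. Read literally, that gives a growth factor of 1.4 × 1.05 × 1.05 = 1.5435 per year. The sketch below compounds this factor over a number of years; the function name and the normalisation to 1.0 in year zero are assumptions made for illustration, and the ITRS chapter itself should be consulted for the actual projections.

```c
#include <assert.h>

/* Per-year factors as quoted on the ITRS slide */
#define CORES_PER_YEAR      1.40
#define CORE_FREQ_PER_YEAR  1.05
#define AE_FREQ_PER_YEAR    1.05

/* Hypothetical helper: system performance relative to year zero, taking
 * performance as the product of number of cores, core frequency and
 * accelerator engine frequency, each growing by its per-year factor. */
double relative_system_performance(int years)
{
    const double per_year =
        CORES_PER_YEAR * CORE_FREQ_PER_YEAR * AE_FREQ_PER_YEAR;
    double perf = 1.0;
    int y;

    assert(years >= 0);
    for (y = 0; y < years; y++)
        perf *= per_year;
    return perf;
}
```

Under these assumptions the number of cores contributes most of the growth, which is one way to read the slide's emphasis on interconnect and caches scaling consistently with the number of cores.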