Tutorial: Finding Hotspots on a Remote Linux* System Intel® VTune™ Amplifier for Systems Linux* OS C++ Sample Application Code Document Number: 330219-001 Legal Information Tutorial: Finding Hotspots on a Remote Linux* System Contents Legal Information................................................................................ 3 Overview.............................................................................................. 5 Chapter 1: Navigation Quick Start Chapter 2: Finding Hotspots Prepare Your Host.................................................................................... 10 Prepare Your Target Device....................................................................... 10 Cross Build and Load the Sampling Drivers..........................................12 Prepare Your Sample Application................................................................13 Run Advanced Hotspot Analysis................................................................. 14 View Your Results.................................................................................... 16 Chapter 3: Summary Chapter 4: Key Terms 2 Legal Information By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http:// www.intel.com/design/literature.htm Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD, Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel CoFluent, Intel Core, Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, Puma, skoool, the skoool logo, SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. 3 Tutorial: Finding Hotspots on a Remote Linux* System Java is a registered trademark of Oracle and/or its affiliates. Copyright (C) 2010-2014, Intel Corporation. All rights reserved. 4 Overview Discover how to use Advanced Hotspots Analysis of the Intel(R) VTune(TM) Amplifier for Systems to understand where your embedded application is spending time by identifying hotspots - the most timeconsuming program units. Advanced Hotspots Analysis is useful to analyze the performance of both serial and parallel applications. The Intel VTune Amplifier for Systems supports analysis of remote Linux* applications running on regular or embedded Linux systems, but this tutorial will focus on embedded platforms. About This Tutorial This tutorial uses the sample tachyon and guides you through the basic steps required to use the GUI to analyze the code for hotspots by means of remote data collection. Estimated Duration • • Learning Objectives After you complete this tutorial, you will be able to find hotspots by: More Resources 20 minutes: Preparing your host and target device for use 15 minutes: Preparing your sample application and analyzing it • • • • • Preparing your Host Preparing your Target Device Preparing your sample application, tachyon Running an Advanced Hotspot Analysis Viewing your results • The Intel(R) Developer Zone is a site devoted to software development tools, resources, forums, blogs, and knowledge bases, see http://software.intel.com The Intel(R) Software Documentation Library is part of the Intel(R) Developer Zone and is an online collection of Release Notes, User and Reference Guides, White Papers, Help, and Tutorials for Intel software products, http:// software.intel.com/en-us/intel-software-technical-documentation For troubleshooting the creation and installation of the sep drivers, see http://software.intel.com/en-us/articles/troubleshooting-issues-with-sep-inthe-embedded-tool-suite-intel-system-studio • • Start Here 5 1 Tutorial: Finding Hotspots on a Remote Linux* System Navigation Quick Start 1 The Intel(R) VTune(TM) Amplifier for Systems for Linux* provides information on code performance for users developing serial and multithreaded applications on supported embedded platforms. VTune Amplifier helps you analyze algorithm choices and identify where and how your application can benefit from available hardware resources. It reports your most significant problems thereby showing you the best ways to utilize your available optimization schedule and resources. VTune Amplifier for Systems Graphical User Interface (GUI) Access The VTune Amplifier installation includes shell scripts that you can run in your terminal window to set up required environment variables: 1. From the installation directory, enter source amplxe-vars.sh. This script sets the PATH environment variable that specifies locations of the product's graphical user interface and command line utilities. NOTE For the VTune Amplifier for Systems installed as part of Intel System Studio, the default <install_dir> is: For super-users: /opt/intel/system_studio_<version>/vtune_amplifier_< version>_for_systems For ordinary users: $HOME/intel/system_studio_<version>/vtune_amplifier_< version>_for_systems For the standalone VTune Amplifier for Systems installed without Intel System Studio, the default <install_dir> is: For super-users: /opt/intel/vtune_amplifier_for_systems_<version> For ordinary users: $HOME/intel/vtune_amplifier_for_systems_<version> 2. You can modify your login shell to include these important shell variables. For example, if you use the bash shell, you can add this line to your $HOME/.bashrc: source /opt/intel/ vtune_amplifier_<version>_for_systems/amplxe-vars.sh 3. Enter amplxe-gui to launch the product graphical interface. 6 1 Navigation Quick Start Configure and manage projects and results, and launch new analyses from the primary toolbar. Click the Project Properties button on this toolbar to manage result file locations. Newly completed and opened analysis results along with result comparisons appear in the results tab for easy navigation. Use the VTune Amplifier menu to control result collection, define and view project properties, and set various options. The Project Navigator provides an iconic representation of your projects and analysis results. Click the Project Navigator button on the toolbar to enable/disable the Project Navigator. Click the (change) link to select a viewpoint, a preset configuration of windows/panes for an analysis result. For each analysis type, you can switch among several viewpoints to focus on particular performance metrics. Click the yellow question mark icon to read the viewpoint description. Switch between window tabs to explore the analysis type configuration options and collected data provided by the selected viewpoint. Use the Grouping drop-down menu to choose a granularity level for grouping data in the grid. Use the filter toolbar to filter out the result data according to the selected categories. Next step: Finding Hotspots 7 2 Tutorial: Finding Hotspots on a Remote Linux* System Finding Hotspots 2 Use the Intel(R) VTune(TM) Amplifier 2014 for Systems for Linux* to identify and analyze hotspot functions in your serial or parallel embedded application by performing a series of steps in a workflow. This tutorial guides you through these workflow steps while using a sample ray-tracer application named tachyon that runs on your embedded device. To optimize the performance of your embedded application, you must first understand its current performance qualities as it runs on the embedded device. You then modify the application based on that performance data, and, check the new performance metrics to compare the results. You can repeat this cycle until the results match your performance goals. When you check the performance of your application you run advanced sampling-based performance analysis on the application as it runs. These analyses help you identify performance hotspots and bottlenecks, and if they are not where you expect them, you can rewrite your code accordingly and test again. Each time you make a change and test it, you compare the new results time over time to insure an increase in performance. To obtain this important sampling-based performance data, you compile and run your application in a supported, embedded development environment. Then, you define and launch a profiling agent which is called a remote data collector that also runs on the embedded device. This remote data collector then records specified performance data collected from your running application. Then this performance information is automatically transferred to a server system where you can view and analyze it, and plan your optimization strategy and its implementation based on your available time and resources. Your embedded application must be cross compiled and present on this server system as well, so that your results will accurately reflect the function names and the line numbers in your code. While there are several supported embedded OS versions, this tutorial focuses on the Yocto Project* 1.* environment. To summarize, for this tutorial you will collect data on your embedded system with the VTune Amplifier GUI amplxe-gui and SSH communication, started from the host system. Copying the kernel and drivers from your host to your target system is a one-time setup procedure, after which you can run multiple data collection sessions and view and compare the results. 8 2 Finding Hotspots Once you have collected performance data you make modifications to your code to improve its performance profile, and test again. NOTE This tutorial focuses on obtaining the baseline results for Advanced Hotspots Analysis and the tachyon sample application. For more information on the iterative process of testing, modifying, improving, and retesting your code for comparative analysis, see Tutorial: Finding Hotspots: Compare with Previous Result at http://software.intel.com/en-us/node/471858 To find hotspots in your application complete these activities: Step 1: Prepare your host • Step 2: Prepare your target device • • Set up your Linux host Set up a cross compilation environment • Install a target package including remote collectors Build a Yocto* Project kernel Cross build and load the sampling driver (sep) Configure ssh for a no-password connection Step 3: Prepare your sample application • • Cross compile tachyon for use Copy tachyon to your Yocto* Project target Step 4: Run Advanced Hotspot Analysis • • Use the GUI to set up your remote configuration Collect performance data on your application Step 5: View your results • See analysis results • • Next step: Prepare Your Host 9 2 Tutorial: Finding Hotspots on a Remote Linux* System Prepare Your Host Here are the steps for setting up a Yocto* Project 1.x environment on your host. Go to http:// www.yoctoproject.org/docs/current/ref-manual/ref-manual.html#required-packages-for-thehost-development-system to see a list of required packages for Debian*, Fedora* or other releases. 1. On your Linux* host, download the pre-built toolchain from http://downloads.yoctoproject.org/ releases/yocto/yocto-1.x/toolchain/. The following sample toolchain tar file is for a Yocto Project 1.2.1, 32-bit development host system and a 32-bit target architecture: 2. Install your selected tar file on your Linux development host in the root "/" directory, to create an installation area /opt/poky/1.x. 3. Set up your Linux host poky-eglibc-i686-i586-toolchain-gmae-1.2.1.tar.bz2 a. There are many supported Linux distributions. See the Yocto Project* Quick Start for information on the required setup for supported Linux* distributions: http://www.yoctoproject.org/ docs/1.2/yocto-quick-start/yocto-project-qs.html. b. To setup a Ubuntu* x64 12.04 system for example, run: $ sudo apt-get install sed wget cvs subversion git-core coreutils \ unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk \ python-pysqlite2 diffstat help2man make gcc build-essential \ g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev \ mercurial autoconf automake groff Next step: Prepare Your Target Device Prepare Your Target Device After you have installed the VTune Amplifier for Systems on your host, you must install a target package with remote collectors on your target device. Then, build the Yocto* Project kernel by downloading and customizing the latest stable build system. You will also configure ssh so for a password-less connection, and cross build and load the required sampling drivers. NOTE You will not be able to identify time-consuming code in your application using Advanced Hotspots Analysis if the nmi_watchdog interrupt capability is enabled on your target system, which prevents collecting accurate event-based sampling data. You will have to disable nmi_watchdog interrupt, or see the "Troubleshooting" section of the product documentation for details. 1. Copy the required package archive located at [program files]\intel\system_studio_201x. 0.<package_num>\vtune_amplifier_201x_for_systems\targets directory to your remote Linux* target system. a. 10 The vtune_amplifier_target_sepx86_64.tgz provides hardware event-based standalone sampling collector only (sep) for x86 and 64-bit systems. It is not possible to run this driver remotely from the VTune Amplifier: it is for sep only. 2 Finding Hotspots 2. The vtune_amplifier_target_x86_64.tgz provides all VTune Amplifier remote collectors for x86 and 64-bit systems. a. b. 3. Unzip the target archive to the default /opt/intel/ directory on the target. Unzip both x86 and x86-64 packages if you plan to run and analyze 32-bit processes on 64-bit systems. Next, for the Yocto Project version you want, clone and checkout the meta-intel package. git clone git://git.yoctoproject.org/meta-intel/ cd meta-intel git checkout denzil 4. If you have not yet done so, download the Yocto Project version you selected, for example, version 1.2.1: wget http://downloads.yoctoproject.org/releases/yocto/yocto-1.2.1/pokydenzil-7.0.1.tar.bz You can see a list of required packages for Debian*, Fedora* or other releases here http:// www.yoctoproject.org/docs/current/ref-manual/ref-manual.html#required-packages-forthe-host-development-system. For a list of all Yocto Project versions, see https:// www.yoctoproject.org/downloads/yocto-project. a. Extract the contents of the archive: tar xjf poky-denzil-7.0.1.tar.bz 5. Take the appropriate BSP package if you need it. For example, for Fish River Island 2 platform, choose one of these two ways: wget https://www.yoctoproject.org/download/intel-atom-e640t-fish-river-island-2, or Checkout with git (see steps 2a, 2b, and 2c above). The fri2 BSP is located at meta-intel/metafri2. 6. For a list of all BSPs, see https://www.yoctoproject.org/downloads/bsp. Setup the build environment: source poky-denzil-7.0.1/oe-init-build-env poky-denzil-7.0.1-build 7. Setup the BSP a. b. c. Edit poky-denzil-7.0.1/build/conf/local.con by tailoring MACHINE for the BSP you want to build. In the case of the tutorial example fri2-noemgd. Edit poky-denzio-7.0.1/build/conf/bblayers.conf by specifying the meta-intel you checked out and the BSP meta directory, in this example, meta-intel/meta-fri2. You may also need to add paths to BBLAYERS in bblayess.conf, such as: poky-denzip-7.0.1/meta-intel \ poky-denzil-7.0.1/meta-intel/meta-fri2 d. It is important to check and set your parallelism options, where "n" is the number of cores on your host machine: BB_NUMBER_THREADS="n" PARALLEL_MAKE="-j n" 8. Build the Yocto Project kernel: bitbake core-image-sato. 9. This creates a kernel that will build and run sep. Configure ssh to work in password-less mode so it does not prompt for a password on each invocation. To do this, use the key generation utility on the host system. host> ssh-keygen host> cat ~/.ssh/id_dsa.pub | ssh user@target "cat >> ~/.ssh/authorized_keys" You will need the target user password to complete this operation, and will be prompted for it. If this command completes successfully, you will not require it afterwards. 11 2 Tutorial: Finding Hotspots on a Remote Linux* System After you set the password-less mode, run a command to verify that a password is not required anymore, for example: host> ssh user@target ls Make sure that only the owner (root) has read/write/execute permissions to the $HOME/.ssh/ directory. In these examples, Target can be a hostname or IP address. NOTE Another example of building a Yocto project and installing it is available at the Intel(R) Developer Zone https://software.intel.com/en-us/forums/topic/507002. Next step: Cross Build and Load Sampling Drivers Cross Build and Load the Sampling Drivers Build the sampling drivers for your target environment on your Linux* host and transfer them to the target, where you load them into the kernel you customized for this purpose. If you do not build the drivers for your specific device by version and build number, your driver will not load; or, if it loads, it will not work. To find the kernel-version, see $KERNEL-SRC-DIR/include/generated/utsrelease.h. To find what version of the kernel is currently running, use the uname -a command on the target. 1. 2. Change into the source directory: cd sepdk/src Build the sampling driver. For example: /opt/intel/system_studio_201x.0.0.***/sepdk/src/ \ build-driver -ni --c-compiler=i586-poky-linux-gcc \ --kernel-src-dir=~/yocto/poky-denzil-7.0/build/tmp/work/\ fri2_noemgd-poky-linux/linuxyocto3.2.11+git1\ +5b4c9dc78b5ae607173cc3ddab9bce1b5f78129b_1+7\ 6dc683eccc4680729a76b9d2fd425ba540a483-r1/linux-fri2-noemgd-\ standard-build --kernel-version=3.0.24-yocto-standard \ --make-args="PLATFORM=x32 ARITY=smp" --install-dir=../prebuilt 3. Once the driver files are built, copy them from your host to your target machine, and load them: cd /opt/intel/system_studio_201x.0.0.*** scp -r sepdk root@10.224.36.208:/home/root Load the sampling drivers on your target machine: target> cd /home/root/sepdk/src target> ./insmod-sep3 -re Checking for PMU arbitration service (PAX)...detected. PAX service is accessible to users in group "0" Executing: insmod ./sep3_15-x32-3.0.24-yocto-standardsmp.ko Creating /dev/sep3_15 base devices with major number 251...done. Creating /dev/sep3_15 percpu devices with major number 250 ... done. The sep3_15 drivers has been successfully loaded. Checking for vtsspp driver ... not detected. 12 2 Finding Hotspots Executing: insmod ./vtsspp/vtsspp-x32-3.0.24-yocto-standardsmp.ko gid=0 mode=0666 The vtsspp driver has been successfully loaded. 4. For some embedded Linux systems the insmod-vtss command may not work. In that event, you can load the kernel module directly by using insmod: ./insmod-sep3 -re cd /home/root/sep/sepdk/src/vtsspp insmod vtsspp.ko 5. Confirm that the driver has been installed: lsmod | grep sep sep3_10 80108 0 lsmod | grep vtsspp vtsspp 295740 0 Next step: Prepare Your Sample Application Prepare Your Sample Application The Intel(R) VTune(TM) Amplifier for Systems release includes sample code called tachyon for you to compile and use on the target system. The needed changes to the Makefiles listed in this section have been completed in the makefiles located in your distribution, which are included as examples. After compiling tachyon, copy the application to your target. 1. Change directories so you can untar the sample code: 2. Unarchive (untar) the tachyon sample application: 3. tar xvzf /opt/intel/vtune_amplifier_2014_for_systems/samples/en/C++/ tachyon_vtune_amp_xe.tgz The tachyon sample code included with your distribution is modified for the Yocto* environment. In the top level Makefile, the line containing CXX has been commented out. In the lower level tachyon/ common/gui/Makefile.gmake file, the following lines have been added: 4. If the host system is x86_64, you must comment some lines in the Makefile: cd /~yocto #ifeq ($(shell uname -m),x86_64 #Arch=intel64 #CXXXFLAGS+= -m64 #else Arch=ia32 CXXFLAGS+= -m32 #endif 5. Source important environmental variables: source /opt/poky/1.2/environment-setup-i586-poky-linux 13 2 6. Tutorial: Finding Hotspots on a Remote Linux* System Compile the tachyon code: make 7. Copy the tachyon binary, the dat folder and the libtbb.so folder to your Yocto* target, to an appropriate target area where the executable can find it. For example: scp tachyon_find_hotspots date libtbb.so root@target_ip:/usr/local/sbin Next step: Run Advanced Hotspot Analysis Run Advanced Hotspot Analysis The following steps show you how to launch the Intel(R) VTune(TM) Amplifier for Systems GUI and create a new project. 1. 2. Run amplxe-gui. Click on New Project and enter an identifying project name such as tachyon1 so that you can identify this project from other projects. Keep or change the default project file Location: and then select Create Project. 3. Specify remote Linux (SSH) for the Target system. You must also specify the user name and the host name or IP address of the remote system you are profiling at the SSH detailsitem. Then, at the Application: option, enter the full pathname for the target binary, in this case,/home/root/tachyon, and any needed arguments or application parameters, in this case the path to the needed data file / home/root/dat/balls.dat and click OK. 14 2 Finding Hotspots 4. Basic Hotspots Analysis is the default for new projects you set up. To change this to Advanced Hotspot Analysis, select Advanced Hotspots in the Analysis Type frame. You will notice communication with the remote system before the Analysis Type screen appears. Change the Analysis type from Basic Hotspots to Advanced Hotspots. 15 2 5. 6. Tutorial: Finding Hotspots on a Remote Linux* System Launch the Advanced Hotspots Analysis session by clicking on the Start button. The VTune Amplifier sets up the now passwordless SSH connection to your target device and launches the target application. It collects Advanced Hotspots data with default settings, and then copies those results back to the host. Next step: View Your Results View Your Results After the target device sends the tachyon results - usually within a minute or two - the results appear on your display: 16 2 Finding Hotspots Next step: Prepare your own embedded applications for analysis using the VTune Amplifier to view hotspots. 17 3 Tutorial: Finding Hotspots on a Remote Linux* System 3 Summary You have completed the Finding Hotspots tutorial. Here are some important things to remember when using the Intel(R) VTune(TM) Amplifier 2014 for Systems for Linux* to analyze your code for hotspots: Step Tutorial Recap Key Tutorial Take-aways 1. Prepare your host You downloaded and setup the prebuilt Yocto Project* toolchain on your Linux* host and configured a kernel for use on your target system. • You installed a stable Yocto Project kernel; setup a password-less connection to your target; and compiled and loaded sampling drivers. • 2. Prepare your target device 3. Prepare your sample application You extracted the tachyon code and if necessary modified it for use in your specific embedded environment. • • • • • 4. Run Advanced Hotspot Analysis 5. View your results You ran the VTune Amplifier GUI to configure and launch Advanced Hotspot Analysis on the tachyon code on your target device. It ran on your target and the results were sent via ssh back to your server. You viewed the Advanced Hotspots analysis on the tachyon in the GUI that you launched it from. • • • • Download and extract an appropriate toolchain from the Yocto Project web site and create an installation area Build a Yocto Project kernel and transfer it to your target for use Configure ssh so there is no password request for file transfers between your server and target Compile sampling drivers and transfer them to your target for use Unarchive tachyon in the /~yocto directory View the necessary changes to the top level Makefile View the necessary changes to the lower level Makefile.gmake Launch the GUI using theamplxe-gui command. Use the Project Properties: Target tab to choose and configure your analysis target. Use the Analysis Type configuration window to choose, configure, and run the Advanced Hotspot Analysis. You can also use the VTune Amplifier command-line interface by running the amplxe-cl command to test your code for hotspots and regressions. For details see the Command-line Interface Support section in the VTune Amplifier online help. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or 18 3 Summary Optimization Notice effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessordependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 19 4 Tutorial: Finding Hotspots on a Remote Linux* System Key Terms 4 baseline: A performance metric used as a basis for comparison of the application versions before and after optimization. Baseline should be measurable and reproducible. CPU time: The amount of time a thread spends executing on a logical processor. For multiple threads, the CPU time of the threads is summed. The application CPU time is the sum of the CPU time of all the threads that run the application. CPU usage: A performance metric when the VTune Amplifier identifies a processor utilization scale, calculates the target CPU usage, and defines default utilization ranges depending on the number of processor cores. Utilizatio n Type Default color Description Idle All CPUs are waiting - no threads are running. Poor Poor usage. By default, poor usage is when the number of simultaneously running CPUs is less than or equal to 50% of the target CPU usage. OK Acceptable (OK) usage. By default, OK usage is when the number of simultaneously running CPUs is between 51-85% of the target CPU usage. Ideal Ideal usage. By default, Ideal usage is when the number of simultaneously running CPUs is between 86-100% of the target CPU usage. Elapsed time:The total time your target ran, calculated as follows: Wall clock time at end of application – Wall clock time at start of application. finalization: A process during which the Intel® VTune™ Amplifier converts the collected data to a database, resolves symbol information, and pre-computes data to make further analysis more efficient and responsive. hotspot: A section of code that took a long time to execute. Some hotspots may indicate bottlenecks and can be removed, while other hotspots inevitably take a long time to execute due to their nature. Advanced Hotspots Analysis: A non-default analysis type used to understand the application flow of control and to identify hotspots, that works directly with the CPU without the influence of the booted operating system. VTune Amplifier creates a list of functions in your application ordered by the amount of time spent in a function. It also detects the call stacks for each of these functions so you can see how the hot functions are called. VTune Amplifier uses a low overhead (about 5%) user-mode sampling and tracing collection that gets you the information you need without slowing down the application execution significantly. A target is an executable file you analyze using the Intel® VTune™ Amplifier. host system: The Linux* server on which you install amplxe-gui and from which you launch your application analysis and view those results. target system: The supported, embedded device on which you install sampling drivers and run the application you are running performance analysis on. 20 4 Key Terms viewpoint: A preset result tab configuration that filters out the data collected during a performance analysis and enables you to focus on specific performance problems. When you select a viewpoint, you select a set of performance metrics the VTune Amplifier shows in the windows/panes of the result tab. To select the required viewpoint, click the (change) link and use the drop-down menu at the top of the result tab. 21 Tutorial: Finding Hotspots on a Remote Linux* System Index H Host, Prepare Your 10 Hotspot Analysis, Run Advanced 14 N Navigation Quick Start 6 R S Sample Application, Prepare Your 13 Sampling Drivers, Cross Build and Load 12 Summary 18 T Target Device, Prepare Your 10 Results, View Your 16 22
© Copyright 2024