Tutorial: Finding Hotspots on a Remote Linux* System

Tutorial: Finding Hotspots on a Remote
Linux* System
Intel® VTune™ Amplifier for Systems Linux* OS
C++ Sample Application Code
Document Number: 330219-001
Legal Information
Tutorial: Finding Hotspots on a Remote Linux* System
Contents
Legal Information................................................................................ 3
Overview.............................................................................................. 5
Chapter 1: Navigation Quick Start
Chapter 2: Finding Hotspots
Prepare Your Host.................................................................................... 10
Prepare Your Target Device....................................................................... 10
Cross Build and Load the Sampling Drivers..........................................12
Prepare Your Sample Application................................................................13
Run Advanced Hotspot Analysis................................................................. 14
View Your Results.................................................................................... 16
Chapter 3: Summary
Chapter 4: Key Terms
2
Legal Information
By using this document, in addition to any agreements you have with Intel, you accept the terms set forth
below.
You may not use or facilitate the use of this document in connection with any infringement or other legal
analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free
license to any patent claim thereafter drafted which includes subject matter disclosed herein.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR
SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT
OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or
indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH
MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES,
SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH,
HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES
ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR
DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR
ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL
PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers
must not rely on the absence or characteristics of any features or instructions marked "reserved" or
"undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts
or incompatibilities arising from future changes to them. The information here is subject to change without
notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may
cause the product to deviate from published specifications. Current characterized errata are available on
request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before
placing your product order. Copies of documents which have an order number and are referenced in this
document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://
www.intel.com/design/literature.htm
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific
computer systems, components, software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with
other products.
BlueMoon, BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino Inside, Cilk, Core Inside, E-GOLD,
Flexpipe, i960, Intel, the Intel logo, Intel AppUp, Intel Atom, Intel Atom Inside, Intel CoFluent, Intel Core,
Intel Inside, Intel Insider, the Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel
SingleDriver, Intel SpeedStep, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, Intel
StrataFlash, Intel vPro, Intel Xeon Phi, Intel XScale, InTru, the InTru logo, the InTru Inside logo, InTru
soundmark, Itanium, Itanium Inside, MCS, MMX, Pentium, Pentium Inside, Puma, skoool, the skoool logo,
SMARTi, Sound Mark, Stay With It, The Creators Project, The Journey Inside, Thunderbolt, Ultrabook, vPro
Inside, VTune, Xeon, Xeon Inside, X-GOLD, XMM, X-PMU and XPOSYS are trademarks of Intel Corporation in
the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
3
Tutorial: Finding Hotspots on a Remote Linux* System
Java is a registered trademark of Oracle and/or its affiliates.
Copyright (C) 2010-2014, Intel Corporation. All rights reserved.
4
Overview
Discover how to use Advanced Hotspots Analysis of the Intel(R) VTune(TM) Amplifier for Systems to
understand where your embedded application is spending time by identifying hotspots - the most timeconsuming program units. Advanced Hotspots Analysis is useful to analyze the performance of both serial
and parallel applications. The Intel VTune Amplifier for Systems supports analysis of remote Linux*
applications running on regular or embedded Linux systems, but this tutorial will focus on embedded
platforms.
About This Tutorial
This tutorial uses the sample tachyon and guides you through the basic steps
required to use the GUI to analyze the code for hotspots by means of remote
data collection.
Estimated Duration
•
•
Learning Objectives
After you complete this tutorial, you will be able to find hotspots by:
More Resources
20 minutes: Preparing your host and target device for use
15 minutes: Preparing your sample application and analyzing it
•
•
•
•
•
Preparing your Host
Preparing your Target Device
Preparing your sample application, tachyon
Running an Advanced Hotspot Analysis
Viewing your results
•
The Intel(R) Developer Zone is a site devoted to software development tools,
resources, forums, blogs, and knowledge bases, see http://software.intel.com
The Intel(R) Software Documentation Library is part of the Intel(R) Developer
Zone and is an online collection of Release Notes, User and Reference Guides,
White Papers, Help, and Tutorials for Intel software products, http://
software.intel.com/en-us/intel-software-technical-documentation
For troubleshooting the creation and installation of the sep drivers, see
http://software.intel.com/en-us/articles/troubleshooting-issues-with-sep-inthe-embedded-tool-suite-intel-system-studio
•
•
Start Here
5
1
Tutorial: Finding Hotspots on a Remote Linux* System
Navigation Quick Start
1
The Intel(R) VTune(TM) Amplifier for Systems for Linux* provides information on code performance for
users developing serial and multithreaded applications on supported embedded platforms. VTune Amplifier
helps you analyze algorithm choices and identify where and how your application can benefit from available
hardware resources. It reports your most significant problems thereby showing you the best ways to utilize
your available optimization schedule and resources.
VTune Amplifier for Systems Graphical User Interface (GUI) Access
The VTune Amplifier installation includes shell scripts that you can run in your terminal window to set up
required environment variables:
1.
From the installation directory, enter source amplxe-vars.sh.
This script sets the PATH environment variable that specifies locations of the product's graphical user
interface and command line utilities.
NOTE
For the VTune Amplifier for Systems installed as part of Intel System Studio, the default
<install_dir> is:
For super-users: /opt/intel/system_studio_<version>/vtune_amplifier_<
version>_for_systems
For ordinary users: $HOME/intel/system_studio_<version>/vtune_amplifier_<
version>_for_systems
For the standalone VTune Amplifier for Systems installed without Intel System Studio, the default
<install_dir> is:
For super-users: /opt/intel/vtune_amplifier_for_systems_<version>
For ordinary users: $HOME/intel/vtune_amplifier_for_systems_<version>
2.
You can modify your login shell to include these important shell variables. For example, if you use the
bash shell, you can add this line to your $HOME/.bashrc: source /opt/intel/
vtune_amplifier_<version>_for_systems/amplxe-vars.sh
3.
Enter amplxe-gui to launch the product graphical interface.
6
1
Navigation Quick Start
Configure and manage projects and results, and launch new analyses from the primary
toolbar. Click the Project Properties button on this toolbar to manage result file locations.
Newly completed and opened analysis results along with result comparisons appear in the
results tab for easy navigation.
Use the VTune Amplifier menu to control result collection, define and view project properties,
and set various options.
The Project Navigator provides an iconic representation of your projects and analysis
results. Click the Project Navigator button on the toolbar to enable/disable the Project
Navigator.
Click the (change) link to select a viewpoint, a preset configuration of windows/panes for an
analysis result. For each analysis type, you can switch among several viewpoints to focus on
particular performance metrics. Click the yellow question mark icon to read the viewpoint
description.
Switch between window tabs to explore the analysis type configuration options and collected
data provided by the selected viewpoint.
Use the Grouping drop-down menu to choose a granularity level for grouping data in the grid.
Use the filter toolbar to filter out the result data according to the selected categories.
Next step: Finding Hotspots
7
2
Tutorial: Finding Hotspots on a Remote Linux* System
Finding Hotspots
2
Use the Intel(R) VTune(TM) Amplifier 2014 for Systems for Linux* to identify and analyze hotspot
functions in your serial or parallel embedded application by performing a series of steps in a workflow. This
tutorial guides you through these workflow steps while using a sample ray-tracer application named tachyon
that runs on your embedded device.
To optimize the performance of your embedded application, you must first understand its current
performance qualities as it runs on the embedded device. You then modify the application based on that
performance data, and, check the new performance metrics to compare the results. You can repeat this cycle
until the results match your performance goals.
When you check the performance of your application you run advanced sampling-based performance analysis
on the application as it runs. These analyses help you identify performance hotspots and bottlenecks, and if
they are not where you expect them, you can rewrite your code accordingly and test again. Each time you
make a change and test it, you compare the new results time over time to insure an increase in
performance.
To obtain this important sampling-based performance data, you compile and run your application in a
supported, embedded development environment. Then, you define and launch a profiling agent which is
called a remote data collector that also runs on the embedded device. This remote data collector then
records specified performance data collected from your running application.
Then this performance information is automatically transferred to a server system where you can view and
analyze it, and plan your optimization strategy and its implementation based on your available time and
resources. Your embedded application must be cross compiled and present on this server system as well, so
that your results will accurately reflect the function names and the line numbers in your code. While there
are several supported embedded OS versions, this tutorial focuses on the Yocto Project* 1.* environment.
To summarize, for this tutorial you will collect data on your embedded system with the VTune Amplifier GUI
amplxe-gui and SSH communication, started from the host system.
Copying the kernel and drivers from your host to your target system is a one-time setup procedure, after
which you can run multiple data collection sessions and view and compare the results.
8
2
Finding Hotspots
Once you have collected performance data you make modifications to your code to improve its performance
profile, and test again.
NOTE
This tutorial focuses on obtaining the baseline results for Advanced Hotspots Analysis and the tachyon
sample application. For more information on the iterative process of testing, modifying, improving, and
retesting your code for comparative analysis, see Tutorial: Finding Hotspots: Compare with Previous
Result at http://software.intel.com/en-us/node/471858
To find hotspots in your application complete these activities:
Step 1: Prepare
your host
•
Step 2: Prepare
your target
device
•
•
Set up your Linux host
Set up a cross compilation environment
•
Install a target package including remote collectors
Build a Yocto* Project kernel
Cross build and load the sampling driver (sep)
Configure ssh for a no-password connection
Step 3: Prepare
your sample
application
•
•
Cross compile tachyon for use
Copy tachyon to your Yocto* Project target
Step 4: Run
Advanced
Hotspot Analysis
•
•
Use the GUI to set up your remote configuration
Collect performance data on your application
Step 5: View
your results
•
See analysis results
•
•
Next step: Prepare Your Host
9
2
Tutorial: Finding Hotspots on a Remote Linux* System
Prepare Your Host
Here are the steps for setting up a Yocto* Project 1.x environment on your host. Go to http://
www.yoctoproject.org/docs/current/ref-manual/ref-manual.html#required-packages-for-thehost-development-system to see a list of required packages for Debian*, Fedora* or other releases.
1.
On your Linux* host, download the pre-built toolchain from http://downloads.yoctoproject.org/
releases/yocto/yocto-1.x/toolchain/. The following sample toolchain tar file is for a Yocto
Project 1.2.1, 32-bit development host system and a 32-bit target architecture:
2.
Install your selected tar file on your Linux development host in the root "/" directory, to create an
installation area /opt/poky/1.x.
3.
Set up your Linux host
poky-eglibc-i686-i586-toolchain-gmae-1.2.1.tar.bz2
a.
There are many supported Linux distributions. See the Yocto Project* Quick Start for information
on the required setup for supported Linux* distributions: http://www.yoctoproject.org/
docs/1.2/yocto-quick-start/yocto-project-qs.html.
b.
To setup a Ubuntu* x64 12.04 system for example, run:
$ sudo apt-get install sed wget cvs subversion git-core coreutils \
unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk \
python-pysqlite2 diffstat help2man make gcc build-essential \
g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev \
mercurial autoconf automake groff
Next step: Prepare Your Target Device
Prepare Your Target Device
After you have installed the VTune Amplifier for Systems on your host, you must install a target package with
remote collectors on your target device. Then, build the Yocto* Project kernel by downloading and
customizing the latest stable build system. You will also configure ssh so for a password-less connection, and
cross build and load the required sampling drivers.
NOTE
You will not be able to identify time-consuming code in your application using Advanced Hotspots
Analysis if the nmi_watchdog interrupt capability is enabled on your target system, which prevents
collecting accurate event-based sampling data. You will have to disable nmi_watchdog interrupt, or
see the "Troubleshooting" section of the product documentation for details.
1.
Copy the required package archive located at [program files]\intel\system_studio_201x.
0.<package_num>\vtune_amplifier_201x_for_systems\targets directory to your remote Linux*
target system.
a.
10
The vtune_amplifier_target_sepx86_64.tgz provides hardware event-based standalone
sampling collector only (sep) for x86 and 64-bit systems. It is not possible to run this driver
remotely from the VTune Amplifier: it is for sep only.
2
Finding Hotspots
2.
The vtune_amplifier_target_x86_64.tgz provides all VTune Amplifier remote collectors for x86
and 64-bit systems.
a.
b.
3.
Unzip the target archive to the default /opt/intel/ directory on the target.
Unzip both x86 and x86-64 packages if you plan to run and analyze 32-bit processes on 64-bit
systems.
Next, for the Yocto Project version you want, clone and checkout the meta-intel package.
git clone git://git.yoctoproject.org/meta-intel/
cd meta-intel
git checkout denzil
4.
If you have not yet done so, download the Yocto Project version you selected, for example, version
1.2.1:
wget http://downloads.yoctoproject.org/releases/yocto/yocto-1.2.1/pokydenzil-7.0.1.tar.bz
You can see a list of required packages for Debian*, Fedora* or other releases here http://
www.yoctoproject.org/docs/current/ref-manual/ref-manual.html#required-packages-forthe-host-development-system. For a list of all Yocto Project versions, see https://
www.yoctoproject.org/downloads/yocto-project.
a.
Extract the contents of the archive:
tar xjf poky-denzil-7.0.1.tar.bz
5.
Take the appropriate BSP package if you need it. For example, for Fish River Island 2 platform, choose
one of these two ways:
wget https://www.yoctoproject.org/download/intel-atom-e640t-fish-river-island-2, or
Checkout with git (see steps 2a, 2b, and 2c above). The fri2 BSP is located at meta-intel/metafri2.
6.
For a list of all BSPs, see https://www.yoctoproject.org/downloads/bsp.
Setup the build environment:
source poky-denzil-7.0.1/oe-init-build-env poky-denzil-7.0.1-build
7.
Setup the BSP
a.
b.
c.
Edit poky-denzil-7.0.1/build/conf/local.con by tailoring MACHINE for the BSP you want
to build. In the case of the tutorial example fri2-noemgd.
Edit poky-denzio-7.0.1/build/conf/bblayers.conf by specifying the meta-intel you
checked out and the BSP meta directory, in this example, meta-intel/meta-fri2.
You may also need to add paths to BBLAYERS in bblayess.conf, such as:
poky-denzip-7.0.1/meta-intel \
poky-denzil-7.0.1/meta-intel/meta-fri2
d.
It is important to check and set your parallelism options, where "n" is the number of cores on
your host machine:
BB_NUMBER_THREADS="n"
PARALLEL_MAKE="-j n"
8.
Build the Yocto Project kernel:
bitbake core-image-sato.
9.
This creates a kernel that will build and run sep.
Configure ssh to work in password-less mode so it does not prompt for a password on each invocation.
To do this, use the key generation utility on the host system.
host> ssh-keygen
host> cat ~/.ssh/id_dsa.pub | ssh user@target "cat >> ~/.ssh/authorized_keys"
You will need the target user password to complete this operation, and will be prompted for it. If this
command completes successfully, you will not require it afterwards.
11
2
Tutorial: Finding Hotspots on a Remote Linux* System
After you set the password-less mode, run a command to verify that a password is not required
anymore, for example:
host> ssh user@target ls
Make sure that only the owner (root) has read/write/execute permissions to the $HOME/.ssh/
directory. In these examples, Target can be a hostname or IP address.
NOTE
Another example of building a Yocto project and installing it is available at the Intel(R) Developer Zone
https://software.intel.com/en-us/forums/topic/507002.
Next step: Cross Build and Load Sampling Drivers
Cross Build and Load the Sampling Drivers
Build the sampling drivers for your target environment on your Linux* host and transfer them to the target,
where you load them into the kernel you customized for this purpose. If you do not build the drivers for your
specific device by version and build number, your driver will not load; or, if it loads, it will not work. To find
the kernel-version, see $KERNEL-SRC-DIR/include/generated/utsrelease.h. To find what version of the
kernel is currently running, use the uname -a command on the target.
1.
2.
Change into the source directory: cd sepdk/src
Build the sampling driver. For example:
/opt/intel/system_studio_201x.0.0.***/sepdk/src/ \
build-driver -ni --c-compiler=i586-poky-linux-gcc \
--kernel-src-dir=~/yocto/poky-denzil-7.0/build/tmp/work/\
fri2_noemgd-poky-linux/linuxyocto3.2.11+git1\
+5b4c9dc78b5ae607173cc3ddab9bce1b5f78129b_1+7\
6dc683eccc4680729a76b9d2fd425ba540a483-r1/linux-fri2-noemgd-\
standard-build --kernel-version=3.0.24-yocto-standard \
--make-args="PLATFORM=x32 ARITY=smp" --install-dir=../prebuilt
3.
Once the driver files are built, copy them from your host to your target machine, and load them:
cd /opt/intel/system_studio_201x.0.0.***
scp -r sepdk root@10.224.36.208:/home/root
Load the sampling drivers on your target machine:
target> cd /home/root/sepdk/src
target> ./insmod-sep3 -re
Checking for PMU arbitration service (PAX)...detected.
PAX service is accessible to users in group "0"
Executing: insmod ./sep3_15-x32-3.0.24-yocto-standardsmp.ko
Creating /dev/sep3_15 base devices with major number 251...done.
Creating /dev/sep3_15 percpu devices with major number 250 ... done.
The sep3_15 drivers has been successfully loaded.
Checking for vtsspp driver ... not detected.
12
2
Finding Hotspots
Executing: insmod ./vtsspp/vtsspp-x32-3.0.24-yocto-standardsmp.ko gid=0 mode=0666
The vtsspp driver has been successfully loaded.
4.
For some embedded Linux systems the insmod-vtss command may not work. In that event, you can
load the kernel module directly by using insmod:
./insmod-sep3 -re
cd /home/root/sep/sepdk/src/vtsspp
insmod vtsspp.ko
5.
Confirm that the driver has been installed:
lsmod | grep sep
sep3_10
80108 0
lsmod | grep vtsspp
vtsspp
295740 0
Next step: Prepare Your Sample Application
Prepare Your Sample Application
The Intel(R) VTune(TM) Amplifier for Systems release includes sample code called tachyon for you to
compile and use on the target system. The needed changes to the Makefiles listed in this section have
been completed in the makefiles located in your distribution, which are included as examples. After
compiling tachyon, copy the application to your target.
1.
Change directories so you can untar the sample code:
2.
Unarchive (untar) the tachyon sample application:
3.
tar xvzf /opt/intel/vtune_amplifier_2014_for_systems/samples/en/C++/
tachyon_vtune_amp_xe.tgz
The tachyon sample code included with your distribution is modified for the Yocto* environment. In the
top level Makefile, the line containing CXX has been commented out. In the lower level tachyon/
common/gui/Makefile.gmake file, the following lines have been added:
4.
If the host system is x86_64, you must comment some lines in the Makefile:
cd /~yocto
#ifeq ($(shell uname -m),x86_64
#Arch=intel64
#CXXXFLAGS+= -m64
#else
Arch=ia32
CXXFLAGS+= -m32
#endif
5.
Source important environmental variables:
source /opt/poky/1.2/environment-setup-i586-poky-linux
13
2
6.
Tutorial: Finding Hotspots on a Remote Linux* System
Compile the tachyon code:
make
7.
Copy the tachyon binary, the dat folder and the libtbb.so folder to your Yocto* target, to an
appropriate target area where the executable can find it. For example:
scp tachyon_find_hotspots date libtbb.so root@target_ip:/usr/local/sbin
Next step: Run Advanced Hotspot Analysis
Run Advanced Hotspot Analysis
The following steps show you how to launch the Intel(R) VTune(TM) Amplifier for Systems GUI and create a
new project.
1.
2.
Run amplxe-gui.
Click on New Project and enter an identifying project name such as tachyon1 so that you can identify
this project from other projects. Keep or change the default project file Location: and then select
Create Project.
3.
Specify remote Linux (SSH) for the Target system. You must also specify the user name and the host
name or IP address of the remote system you are profiling at the SSH detailsitem. Then, at the
Application: option, enter the full pathname for the target binary, in this case,/home/root/tachyon,
and any needed arguments or application parameters, in this case the path to the needed data file /
home/root/dat/balls.dat and click OK.
14
2
Finding Hotspots
4.
Basic Hotspots Analysis is the default for new projects you set up. To change this to Advanced Hotspot
Analysis, select Advanced Hotspots in the Analysis Type frame. You will notice communication with
the remote system before the Analysis Type screen appears. Change the Analysis type from Basic
Hotspots to Advanced Hotspots.
15
2
5.
6.
Tutorial: Finding Hotspots on a Remote Linux* System
Launch the Advanced Hotspots Analysis session by clicking on the Start button.
The VTune Amplifier sets up the now passwordless SSH connection to your target device and launches
the target application. It collects Advanced Hotspots data with default settings, and then copies those
results back to the host.
Next step: View Your Results
View Your Results
After the target device sends the tachyon results - usually within a minute or two - the results appear on
your display:
16
2
Finding Hotspots
Next step: Prepare your own embedded applications for analysis using the VTune Amplifier to
view hotspots.
17
3
Tutorial: Finding Hotspots on a Remote Linux* System
3
Summary
You have completed the Finding Hotspots tutorial. Here are some important things to remember when
using the Intel(R) VTune(TM) Amplifier 2014 for Systems for Linux* to analyze your code for hotspots:
Step
Tutorial Recap
Key Tutorial Take-aways
1. Prepare your
host
You downloaded and setup the prebuilt Yocto Project* toolchain on your
Linux* host and configured a kernel
for use on your target system.
•
You installed a stable Yocto Project
kernel; setup a password-less
connection to your target; and
compiled and loaded sampling
drivers.
•
2. Prepare your
target device
3. Prepare your
sample
application
You extracted the tachyon code and
if necessary modified it for use in
your specific embedded environment.
•
•
•
•
•
4. Run Advanced
Hotspot Analysis
5. View your
results
You ran the VTune Amplifier GUI to
configure and launch Advanced
Hotspot Analysis on the tachyon
code on your target device. It ran on
your target and the results were sent
via ssh back to your server.
You viewed the Advanced Hotspots
analysis on the tachyon in the GUI
that you launched it from.
•
•
•
•
Download and extract an appropriate
toolchain from the Yocto Project web site
and create an installation area
Build a Yocto Project kernel and transfer
it to your target for use
Configure ssh so there is no password
request for file transfers between your
server and target
Compile sampling drivers and transfer
them to your target for use
Unarchive tachyon in the /~yocto
directory
View the necessary changes to the top
level Makefile
View the necessary changes to the lower
level Makefile.gmake
Launch the GUI using theamplxe-gui
command.
Use the Project Properties: Target
tab to choose and configure your
analysis target.
Use the Analysis Type configuration
window to choose, configure, and run
the Advanced Hotspot Analysis.
You can also use the VTune Amplifier
command-line interface by running the
amplxe-cl command to test your code
for hotspots and regressions. For details
see the Command-line Interface Support
section in the VTune Amplifier online
help.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
18
3
Summary
Optimization Notice
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessordependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to
the applicable product User and Reference Guides for more information regarding the specific instruction
sets covered by this notice.
Notice revision #20110804
19
4
Tutorial: Finding Hotspots on a Remote Linux* System
Key Terms
4
baseline: A performance metric used as a basis for comparison of the application versions before and after
optimization. Baseline should be measurable and reproducible.
CPU time: The amount of time a thread spends executing on a logical processor. For multiple threads, the
CPU time of the threads is summed. The application CPU time is the sum of the CPU time of all the threads
that run the application.
CPU usage: A performance metric when the VTune Amplifier identifies a processor utilization scale,
calculates the target CPU usage, and defines default utilization ranges depending on the number of processor
cores.
Utilizatio
n Type
Default
color
Description
Idle
All CPUs are waiting - no threads are running.
Poor
Poor usage. By default, poor usage is when the number of simultaneously
running CPUs is less than or equal to 50% of the target CPU usage.
OK
Acceptable (OK) usage. By default, OK usage is when the number of
simultaneously running CPUs is between 51-85% of the target CPU usage.
Ideal
Ideal usage. By default, Ideal usage is when the number of simultaneously
running CPUs is between 86-100% of the target CPU usage.
Elapsed time:The total time your target ran, calculated as follows: Wall clock time at end of application
– Wall clock time at start of application.
finalization: A process during which the Intel® VTune™ Amplifier converts the collected data to a database,
resolves symbol information, and pre-computes data to make further analysis more efficient and responsive.
hotspot: A section of code that took a long time to execute. Some hotspots may indicate bottlenecks and
can be removed, while other hotspots inevitably take a long time to execute due to their nature.
Advanced Hotspots Analysis: A non-default analysis type used to understand the application flow of
control and to identify hotspots, that works directly with the CPU without the influence of the booted
operating system. VTune Amplifier creates a list of functions in your application ordered by the amount of
time spent in a function. It also detects the call stacks for each of these functions so you can see how the hot
functions are called. VTune Amplifier uses a low overhead (about 5%) user-mode sampling and tracing
collection that gets you the information you need without slowing down the application execution
significantly.
A target is an executable file you analyze using the Intel® VTune™ Amplifier.
host system: The Linux* server on which you install amplxe-gui and from which you launch your
application analysis and view those results.
target system: The supported, embedded device on which you install sampling drivers and run the
application you are running performance analysis on.
20
4
Key Terms
viewpoint: A preset result tab configuration that filters out the data collected during a performance analysis
and enables you to focus on specific performance problems. When you select a viewpoint, you select a set of
performance metrics the VTune Amplifier shows in the windows/panes of the result tab. To select the
required viewpoint, click the (change) link and use the drop-down menu at the top of the result tab.
21
Tutorial: Finding Hotspots on a Remote Linux* System
Index
H
Host, Prepare Your 10
Hotspot Analysis, Run Advanced 14
N
Navigation Quick Start 6
R
S
Sample Application, Prepare Your 13
Sampling Drivers, Cross Build and Load 12
Summary 18
T
Target Device, Prepare Your 10
Results, View Your 16
22