DSP Engineering / Winter 2000

An incredibly large number of everyday products employ DSP technology. The seemingly insatiable demand for digital wireless handsets leads the list, followed by networking infrastructures, consumer electronics, voice-over-IP products, industrial and automotive control systems, and hard disk controllers. Market pressures are forcing makers of these products to reduce cost, size, and power consumption, while continuing to add value and differentiate their products.

Many of these products include a RISC or CISC processor, in addition to the DSP. According to the 1999 Embedded Systems Study conducted by Beacon Technology Partners [1], the average number of different processor architectures per embedded system has been rising for several years, and is now between two and three. (See Figure 1.)

[Figure 1: Number of processor architectures per embedded design. The number of different processor architectures per embedded system is consolidating around 2-3 per design.]

There are significant benefits to using DSP chips in combination with CISC and RISC chips. However, board designers face difficult problems during the debug phase of the project. DSP and CISC/RISC technologies have evolved independently, and they are supported by different sets of development and debugging tools.

An even more difficult problem is on the horizon. According to the sample taken in the same Beacon Technologies study, the percentage of high-end multiple-processor systems is rapidly increasing: over 30% of respondents use three or more processors per embedded design, and over 10% of the embedded systems designed in 1999 employed eight or more processor chips. (See Figure 2.)

[Figure 2: Number of processors per embedded design.]

Many of today's high-end multiprocessor systems are built with backplane-based architectures, such as VMEbus. Future embedded multiprocessing architectures will also employ hybrid system-on-chip (SOC) architectures, which will incorporate both DSP and MPU cores. How will these complex hybrid systems be debugged?

Hardware integration and software complexity

The design of today's embedded systems is driven by stringent time-to-market requirements. However, increased silicon integration has the undesirable effect of working against faster time-to-market, by complicating the software development and debug process.

Multiprocessing increases the complexity of hardware and software design and debugging. When moving from a single processor architecture to a multiple processor architecture, a whole new set of design factors must be taken into account in order to make the new system as efficient as possible.

Higher levels of integration have driven a trend toward systems that integrate:

- multiple processors
- large amounts of memory
- complex peripheral communication links

The SHARC DSP processor

In order to explore the problems encountered when designing multiprocessing architectures, we will use the development of a SHARC DSP multiprocessor system as an example. However, many of the same issues face developers designing SOC-based systems.

The SHARC DSP chip has many features that are specifically designed to support multiprocessing. For example, it can support:

- point-to-point architectures
- shared bus architectures

Point-to-point multiprocessing architectures

In point-to-point multiprocessing architectures a dedicated communication channel is provided between each pair of processors, using the SHARC DSP's link ports. In most multiple-processor systems each processing node will need to have multiple point-to-point connections, and therefore multiple communication ports.

Shared bus multiprocessing architectures

A shared bus multiprocessing architecture uses the SHARC DSP's external port to provide a connection to a single shared bus that interconnects all of the processors. The external port provides a simple glueless connection between up to six SHARC DSPs and a host processor.

The SHARC also has an internal bank of shared memory that can be accessed through this external port. When several SHARCs are interconnected on a shared bus, the resulting unified address space allows direct interprocessor accesses between each other's internal shared memory, as well as access to any external shared memory that resides on the shared bus.

The complexity of these systems provides new challenges to software engineers. The tools provided with the SHARC DSP do support multiprocessor code development, including a linker that supports the creation of executables for multiple processors, and for shared memory. However, the debugging of SHARC-based systems is often difficult.

Debugging often requires multiple tool sets

Even more difficult is the debugging of heterogeneous systems that also include a RISC or CISC processor. Software and hardware engineers are forced to use separate (and incompatible) debuggers and tools, provided for each of the processor architectures. In spite of this, they must find some way to:

- debug the interaction between the processors
- meet the real-time performance requirements

Internal visibility is vital

Engineers need to have visibility into the internals of the processors, including the contents of control registers, memory locations, and peripheral registers. However, with the very high functional densities of today's silicon devices, there are only a few pins available to support test and debug. This means that system debugging must be done with a scan-based emulator. The SHARC DSP (like many other processor chips) uses the IEEE 1149.1 (JTAG) standard to scan information out of (or into) the processor chip, without adding significantly to the pin count of the silicon device.

System-on-chip debugging is especially difficult due to:

- limited visibility into the flow of the program
- limited control over internal operations

As SOC designs integrate more functions into a single silicon chip, the buses between what were previously separate functional units get integrated into the silicon, and the bus signals might no longer be accessible at the package pins. This prevents the engineer from monitoring these signals for test and debug purposes. Special design accommodations can be made to provide better internal signal visibility. However, these accommodations are often limited by the number of available pins on the SOC package.

The application code is often written in mixed languages

To further complicate the debug process, DSP software is often written using both assembly code and C or C++ code. The most time-critical functions are written in assembly language for maximum efficiency, while the remaining portion of the application software might be written in C or C++.

Version control is vital

Due to the complexity of multiple processor systems, the project team typically includes several people with complementary technical backgrounds, including:

- software engineers
- silicon engineers
- application experts

Frequently the team members collaborate from different locations, and even different time zones. With work proceeding concurrently at these locations, formal hardware and software version control is essential to efficient development.

The target hardware is often unavailable

Due to stringent time-to-market requirements, software development often begins prior to the availability of target hardware, regardless of whether that hardware employs discrete processor chips or SOC technology. Concurrent software and hardware debugging can be a highly iterative challenge. Everything should be done to test each software component as it is developed, in order to minimize uncertainties with respect to proper operation and adequate performance.

A debugger that has been integrated with an instruction set simulator allows testing and performance benchmarking of the software in the absence of the target hardware. Clearly, an instruction set simulator cannot test the system at full speed, or in real-time. However, it can provide useful information about performance and about the correctness of the program's algorithm. When the target hardware eventually becomes available, engineers can begin integrating the hardware and software components, and testing the complete system in real-time.

Software bugs are often difficult to localize

In multiprocessor systems, it's often difficult to track down the source of software bugs, since the processors interact with each other. For example, if one processor sends an erroneous message to another, it might precipitate an error that is exhibited in another processor at a later time.

When an error is detected it is often helpful to capture the state of the entire system, or to pause the system operation. This allows the state of the system to be analyzed. If all of the processor cores are equipped with compatible on-chip debug support, this can be done with a scan-based emulation tool. However, in heterogeneous systems this might not be the case, and customized bridging hardware might be needed to synchronously control the operation of the various processor cores.

In summary, designing and integrating software into a multiprocessor embedded system based upon DSP and microprocessor technology can be quite complex and challenging.

Today's sophisticated hardware design and simulation capabilities, licensable processor cores, multiprocessing-enabled devices, and advanced silicon fabrication methods allow the design of powerful, cost-effective hardware. However, the system software still must be developed and debugged, in order to make such systems deliver on their performance promises. Without tools to develop, integrate, and optimize embedded software, producing such a product within cost and time-to-market constraints is unlikely, if not impossible.

Requirement #1: Combining DSP and RISC/CISC tools

In order to use DSP code generation tools along with RISC/CISC code generation tools, software development environments should:

- be open and able to seamlessly use tools from various suppliers.
- reduce the learning required, and eliminate costly errors, by providing the development team with a simple, intuitive, graphical user interface to manipulate and coordinate all project files.
- include a powerful programming language editor that has been designed specifically for writing software.
- provide interactive compilation and editing, to facilitate the location and the correction of compilation errors.
- provide an open interface to industry standard version control systems.
- include built-in network support, to make local and remote team development practical, as well as efficient.

Requirement #2: Target access and control

Embedded target hardware resources are typically limited to those needed to support the application. As a result, most development and debugging is done with cross development tools, where the Integrated Development Environment (IDE) executes on a host computer system (usually a PC or a UNIX workstation) which is connected to the target system through a communications link.

Debugging systems that have on-chip debug hardware

Many DSP and RISC/CISC cores have an on-chip serial Test Access Port (TAP) which is compatible with the IEEE 1149.1 JTAG specification. Through the use of a scan-based emulator connected to this TAP, the host-resident debugger can temporarily halt a processor at any point in the execution of the application software. Each time the execution is halted, the debugger can access the contents of the processor's on-chip hardware resources (such as its registers, memory, or peripherals) through the TAP.

Scan-based emulation using a TAP can be extremely helpful. It allows the host-based debugger to discover (and control) the internal state of the processor core. Should the target system crash, the host system can still collect data from the target system, to perform a post-mortem analysis. It can then restart the target. The primary disadvantage of the typical JTAG on-chip debug implementation is its intrusiveness on the real-time execution of the application software, since it stops the processor each and every time it needs to access the processor's internal state.

Debugging systems that don't have on-chip debug hardware

Some target processors do not have a TAP. Other processors do have a TAP, but it's only designed to verify the operation of internal hardware during the chip's manufacture, and does not support software debugging. In either case, host-to-target access (using scan-based emulation) is not an option.

However, target access and control can still be achieved by using a target-resident program called a debug monitor, which communicates with the host computer and allows the host to control the execution of downloaded application software. Debug monitors have been widely used for many years, and a properly designed monitor consumes only a small percentage of the target processing power and memory.

If a host-resident IDE is equipped to send commands to target-resident debug monitors (and if it can handle the monitor's particular communication protocols) a software engineer can use the same code generation and debugging tools as those used for scan-based debugging.

If the target has a multitasking operating system, the debug monitor should run as an independent task, executing concurrently with the application software that is being debugged. Because it runs concurrently with the application software, it can process commands from the host-resident debugger without halting the processor, or the application program.

When the host tells the target-resident debug monitor that it wants a breakpoint set at a particular point in the application program, the debug monitor copies and retains the machine language instruction stored at the corresponding memory location. Then it overwrites that memory location with an illegal instruction. When the illegal instruction is fetched, the application program execution is temporarily halted, and an exception handling routine is invoked.

When the host tells the debug monitor to resume execution of the application program, the monitor replaces the illegal instruction with the original instruction, and then resumes execution at the restored instruction.

Disadvantages to using a target-resident debug monitor

There are a couple of disadvantages to using a target debug monitor:

- The debug monitor shares target resources with the application program, using a portion of the target memory and processing power.
- Since the monitor executes on the same processor as the application program, the application program might wipe out the monitor when it crashes. If the monitor is wiped out, the host loses access to the target, thereby making it practically impossible to determine the cause of the crash.
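The breakpoint mechanism that the article describes for a target-resident debug monitor (retain the original instruction, overwrite it with an illegal instruction, restore it on resume) can be illustrated with a toy model. This is only a sketch under stated assumptions: the `DebugMonitor` class, the `ILLEGAL_OPCODE` value, and the list-of-words "memory" are all hypothetical and do not correspond to any particular monitor or processor. Re-arming a breakpoint after resuming is not modeled, because the article does not describe it.

```python
ILLEGAL_OPCODE = 0xFFFFFFFF  # hypothetical encoding that the toy target traps on


class DebugMonitor:
    """Toy target-resident monitor; 'memory' is a list of instruction words."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}  # breakpoint address -> original instruction word

    def set_breakpoint(self, address):
        # Copy and retain the original instruction, then overwrite it.
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = ILLEGAL_OPCODE

    def run(self, pc):
        """Execute from pc; return (halt address or None, executed words)."""
        executed = []
        while pc < len(self.memory):
            word = self.memory[pc]
            if word == ILLEGAL_OPCODE:
                # Fetching the illegal instruction stands in for the exception
                # handler that halts the application and reports to the host.
                return pc, executed
            executed.append(word)  # stand-in for really executing the word
            pc += 1
        return None, executed

    def resume(self, address):
        # Put the original instruction back, then resume at that address.
        self.memory[address] = self.saved.pop(address)
        return self.run(address)
```

In this model, setting a breakpoint at address 2 of a four-word program makes `run(0)` halt at address 2 after executing the first two words, and `resume(2)` restores the original word and runs to the end. The sketch also makes the second disadvantage above easy to see: the saved instruction lives in the monitor's own data, so an application that corrupts the monitor loses the information needed to restore the program.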
Debugging without prototype hardware In most embedded development projects, the target hardware is not available to the software engineer during the early stages of the project. However, time-to-market demands often dictate that software development begin before the hardware prototype is available. Si 1 I I Most 1 simulators model only the processor core - they don't simulate the peripherals of a highly integrated device or system. However. they do provide instruction-accurate or cycle-accurate fidelity, and are thus very useful for validating logic, verifying algorithms, and for measuring the processing resources consumed by the execution of the application software. ' Provide Requirement #3: visibility into I I timers con~municationchips Requirement #4: Evaluate and optimize system performance To paraphrase an old saying , "A correct answer that is not provided within the required time limit is the wrong answer". This is true for real-time software. It is best to discover software inefficiencies as early in the development cycle as possible. include a debugger architecture that could be used to debug a wide range of processor types, using either commercially-available tools or in-housedeveloped tools - all under a common graphical interface. allow a software engineering team to create, debug, and manage revisions of their software application. provide simulators to allow application software development to proceed without target hardware. allow access and control of target hardware. using either ROM monitors or emulators. allow application programmers to evaluate and optimize the performance of their software. ASPEX: An open, multi-core IDE One example of such an open, multi-core IDE is ASPEX by Allant Software Corporation. It's design is based on experience gained from developing three earlier generations of tools for embedded RISC and CISC n~icroprocessors. 
ASPEX supports the debug of a wide range of processors, including: DSPs from Analog Devices DSPs from DSP Group DSPs from Motorola DSPs from Texas Instruments ARMIThumb processors StrongARM processors nl Using DSP and RISCICISC tools The ASPEX ooen environment facilitates integration of a wide range of code generation tools and utilities. Yet, it still provides a uniform, graphical interface that tightly integrates the tools, for ease of use. All of the following tools are easily accessible from the main debugger window, and operate together seamless1y : y Typically, such analysis is an iterative process that attempts to locate the portions of the application code that use the largest portions of the processor time. When a "hot spot" is identified, the application programmer optimizes that code to "cool it down". This process can be iterated to progressively reduce the processing time needed to execute the application program. A debugger should facilitate this process by allowing the programmer to capture performance data without having to modify the application source code or having to rebuild the application program. The debugger should also be able to capture performance data when the application program is running on a simulator. It should also be able to import and use performance data from logic analyzers. 16 / DSP Engineering / Winter 2000 A proposed solution: An open, multi-core IDE How would you meet the four challenges outlined above i f you were developing a complex multiprocessor embedded system? Ideally you would like to have an open, multi-core IDE that integrates an otherwise inconsistent and incompatible collection of software development and debugging tools. 
This IDE would: tO integrated peripherals Highly integrated SOC architectures (as well as designs using discrete processors) include peripherals that are integral to the proper functioning of the product, such as: It should also assist the programmer in defining the required values to write into the internal chip configuration registers, based on the desired configuration of the internal chip resources. in J external memory blocks external memory-mapped registers bit Fields within external registers enumerations for external register bit fields Pr I To allow this, a debugger should allow configuration of the debugging environment to match the target system. For example, it should allow the application programmer to define and display: le J However, most software debuggers provide little or no visibility into these peripheral devices. During the early phases of application software development the application programmer must either ignore them, or deal with them as "black boxes". It would be highly beneficial if application programmers could see how the peripherals interact with their application software. ng The development and testing of application software can still proceed if the IDE includes a target simulator that uses the same code generation and debugging tools that will eventually be used when the target hardware becomes available. special 110 chips memory chips ASIC chips FPGA chips the editor the debugger the code generation tools the trace analysis the project manager the visualization tools Figure 3 shows the Code Window (the main debugger window) with two files open for editing. Editing is done in the debugger window, and can be done during debugging sessions. of the build as it progresses, and any errors that occur during the build process are displayed. Figure 5 shows the results of the build in the Output Logging Window. 
If an error message appears in the Output Logging Window during the build process the user double-clicks on it and the cursor is placed at the location of that error in the source code of the Code Window. The code can then be fixed, in preparation for a new build. Figure 6 shows the cursor positioned to correct a compile error. After achieving a successful build of the application software, the user can load the resulting file and then debug it, by stepping through the source code in the same window in which it was edited. Si The ASPEX debugger supports the debug of multiple-processor, heterogeneous systems from a single debugger. It also provides extensive controls over program execution of hybrid C/C++ and assembly language programs. tor ( i f - 0; i < 0x50000. i++) ng le I Figure 3 USTOM=defaul! Figure 4 shows the Project Settings Window, from which a user can graphically specify the project's build settings. This eliminates potential errors, since it saves the user from having to learn cryptic command string settings to specify tool options. The consistent graphical interface of the Project Settings Window is particularly helpful when using code generation tools from different manufacturers. The ASPEX build facility knows about the interrelationships of the files in the project and it automaticallv determines which files need to be recompiled, due to editing of the source code since the last compile. A project build is launched from the ~ b o l smenu of the Code Window. The Build tab of the Output Logging Window shows the progress 18 / DSP Engineering / Winter 2000 aul t Link Advanced Pre Post Link Aspex Commands y I ASSEMBLE=^^^ nl I Files can be opened by dragging them from Windows Explorer and dropping them on the Code Window. To open a second editor window for the same file, a file tab can be dragged and dropped from the Code Window to the Windows desktop. 
tO 1 d Open Close cation Histo: in ri Pr The editor provides language sensitive editing - source code is displayed using color coding to distinguish keywords, text, and comments. This allows the user to quickly spot problems during an editing session. Compatibility is provided for viy~rief,and visual c++.A user-supplied editor can be easily integrated into I To support the development of large-scale embedded programs, ASPEX includes easy-to-use source code navigation features, including browsers that quickly access program variables, as well as displaying their type, value. or definition. It Figure 4 saaplearn c - 0 warnings (+ 1 suppressed) 0 errors Mik*: Brrca- cod* &2$. wbilaÑIcin 'ÈÈÈplÇçr uk*: Error cod* 2ES, çhilÑkin 'rebuild Coxnand exited with error or warning 255 Figure 5 1 s e r i o u s error chip debug support. Figure 7a shows a Code window for RISC processor, and Figure 7b shows a Code window for a DSP. (Figure 7a also shows the Register window.) Note that the title bar of each window identifies the processor. £S 24 25 26 27 28 int main(int argc, char **argv) Debugging systems with JTAG ports JTAG-based core debugging typically uses a 5-wire interface to serially shift data in and out. (See Figure 8) The role of the JTAG TAP during the debug process is to provide access to the processor core's pipeline. This allows the debugger to: register unsigned int i = 0. unsigned int j = 0. k 0; - meninit(), for (i 0; i < 0x5000; i++) Si Some signals are available to TAP via periphery boundary le ng Some signals pass through the TAP ¥à Figure 6 also allows the user to set breakpoints on program variables, which display all of this information. and systems that mix JTAG and other forms of on-chip debugging. It can simultaneously debug multiple DSPs and RISC processors from a single instance of the debugger, whether they are connected to single or multiple emulators. 
It can also synchronize the starting and stopping of all processors during the debug process, within the limits of the on- REDUCE TIME-TO-MARKET Figure 8 insert instructions for reading or writing to registers insert instructions for reading or writing to memory start execution set watchpoints for tracing stop and reset the core nl tO in Pr Single emulator and multi-emulator debugging The ASPEX debugger supports both multiple processor systems that have all processors on the same JTAG scan path, JTAG interface In 5-wires I JUMPSTART YOUR DSP DESIGN 3-1/2 DAYS LECTURES & LABS W I N A FREE COURSE y USE HARDWARE/SOFTWARE DEVELOPMENT TOOLS LEARN DSP CHIP ARCHITECTURE ASSEMBLY/C PROGRAMMING LABS DSP WORKSHOPS, 814 SAN JACINTO BLVD., SUITE 200 AUSTIN. TX 78701.512-320-0032, DSP@IO.COM Enter 19 on Reader Service Card DSP Engineering / Winter 2000 / 19 with the host computer using a serial or Ethernet link. These monitors are often used on low-cost evaluation boards, to provide a debug capability without requir- ing an emulator. 32 call ( 34 35 36 if ( ! 33d 1 > ~ r i n taai I Host, DSP core ~ 1 DIM detects and traps reads and writes to external memory. le Figure 7a Figure 9 Each time the monitor writes to (or reads from) this external RAM, the DIM detects and traps this operation, notifies the Host Debug Card (HDC) and then passes the operation to the host through dual-ported memory. The debugger (running on the host) reads and writes to this same dualported memory. The cable between the HDC card and the DIM has signals that allow the HDC card to stop and hold the core in debug mode. Whenever this happens, the monitor in the DSP core copies all register values to the external RAM. Then it writes a message to a mailbox location (indicating that it has finished saving the DSP state) and enters a loop, waiting for another mailbox location to change. Eventually the debugger writes a request to this mailbox location, and the monitor then processes that request. 
I Figure 7b Under this model, the TAP also has access to some of the bus signals in the core, including: i- changing the divider, becoming the clock source, or by synchronizing with the clock. 4 the address bus 4 the data bus 4 the control bus The 5-wire JTAG interface can also be daisy chained with other compatible cores to enable tightly-coupled multiple processor synchronized debugging. Further, the TAP can be used to read the core's debug state, and to place the core in the debug state. It is also common to use the TAP to control the core clock, by Debugging systems with on-chip monitors and non-JTAG ports On-chip debug monitors typically reside in the target memory, and communicate 20 / DSP Engineering / Winter 2000 y nl tO in Pr RW 008136 i4=i6; RW:008137 modify (i dm(1n4,i4) RW - 008138 RW:008139 rlO=rlO+l . PRIMES\#27 estnum += RW:OO813A r2=0x2, RW 00813B r6=rb+r2, : = 0 PRIMES\#30 RW - OO8l3C r3=0; RW.OO813D 073EOOFFFFE2 jump (PC. . PRIMES\#31 . . 3 2 exit 1 x more inferoation, select H& fms ng Si 40 w i d call(in% 41 f 42 static i n t HZCV FIQ IBQ STATE M 43 switch ine 0010 %IS 'DIS 32bis "=I 44 However, not all target debug monitors use a serial or Ethernet link to communicate with the host computer. Some DSPs include an on-chip monitor that stores its data in memory on the external bus. The external bus is then connected (via a multi-pin connector) to dual-ported memory on a debug interface module (DIM). (See Figure 9) The on-chip monitor and external RAM model described above cannot be daisy chained with JTAG-compatible devices. This presents a challenge to engineers who need tightly-coupled, synchronized debugging. Bridging hardware is needed to provide tightly-coupled synchronized multiprocessor debugging. Debugging systems with on-chip monitors and JTAG ports The on-chip monitor approach described above has been modified for some DSPs, allowing the use of JTAG as a communi- ~ cation scheme. 
In this model the JTAG TAP serves as the external data memory for the on-chip monitor. (See Figure 10) Whenever the monitor writes data to (or reads data from) the external memory in the JTAG TAP, the TAP withholds the data transfer acknowledge signal, thus keeping the DSP in wait states. When the JTAG controller finishes scanning the data written by the monitor (or read by the monitor) in the DSP, it sends the data transfer acknowledge signal to the DSP, and execution resumes. Thus, instead of the JTAG TAP controlling the execution state of the core, the JTAG TAP can only stoplhold the core when the monitor 1 writes or reads to the TAP. I -I tightness of the synchronization varies, depending on whether the debugging is being done on the target hardware or a target simulator, and it also varies somewhat from processor to processor. Functions that can be synchronized include: 4 4 4 4 Start Stop Single-stepping Cross-triggering breakpoints Figure 11 shows how processors are selected for synchronization in the debug Manager. Using the JTAG scan chain, the Start, Stop, and Single-step operations can be closely coordinated between the processors. To achieve this level of synchronization, ASPEX independently "sets up" each processor to perform the desired action, and then sends a final "execute" sequence to all of the processors. Since the "execute" is processed independently by each processor on the JTAG scan path (daisy chain), there will be a few nanoseconds of delay (skew) between the start-up on each processor, Si !- Manaaers BE? I le ng TAP accepts external memory access requests, from the core and then withholds the data transfer acknowledge signal until the debugger (via JTAG) shifts the value out (or in). 1 Figure 10 I I ASPEX supports several different kinds of synchronization between processors. 
The 22 / DSP Engineering /Winter 2000 Figure 11 Tightly-coupled processors When debugging the application software on the target hardware, ASPEX uses tightly-coupled synchronization, if it is supported by the underlying processor and emulator/monitor. This means (as discussed above) that the TAPSof the various processors must be daisy chained together. A major advantage that JTAG provides is that all devices on a scan chain can be started and stopped together. The tightness of the synchronization might vary from nanoseconds to microseconds, depending on the JTAG model used by the processor. y ,1 nl 1 1Code can be executed on multiple processors, in lock-step fashion. 4 It allows the entire system to be paused, allowing examination of the state of each processor. I It prevents data from being processed (or lost) while examining the state of one of the processors. "sample. due" as "sample. due"as tO ' Synchronized debugging synchronized debugging is the ability to have one processor control the execution of another during the debugging process. This can be useful for several reasons: 0x00020005 0x00020005 in 1 The on-chip monitor and connector wiring model conform to the 5-wire JTAG interface. Thus, this model allows the DSP core to be daisy chained with other JTAG-compatible devices, enabling tightly-coupled multi-processor synchronized debugging. However, the on-chip monitor with JTAG communications model used in some DSP cores requires that the JTAG controller "peek" at the DR (data register) in the TAP. This means that there can be only one such DSP core in the scan chain, and it must be the last device in the scan chain. 21062 21062 Pr DSP core due to clock synchronization. If heterogeneous processors are connected together, the skew might be as large as 100 nanoseconds, due to differences in the start-up times. When debugging the application software on multiple target simulators, a special plug-in (called SimBridge) can be used to couple the simulators. 
This allows tight coupling of:

- shared memory access
- synchronization of processor clocks
- simulation of other operations

Each simulator runs independently, in its own host thread. However, the sequencing of all the simulators is synchronized through the SimBridge module. The worst-case skew between the operations on the different processors depends somewhat on the type of simulator (phase-accurate vs. cycle-accurate vs. instruction-accurate). However, it is typically on the order of nanoseconds, or less.

Multiple processor synchronized debugging

ASPEX supports debugging of multi-processor systems because it was designed from the ground up to do this. It is designed to manage multiple processors from a single instance of the debugger, and thus can provide the necessary synchronization and execution management. Once the user has selected the processors to synchronize, the execution controls (buttons and commands) of a processor's Code window automatically control all of the processors within that synchronization group.

To keep the computer's display from getting too cluttered with debugger windows when debugging multiple processors, ASPEX allows the user to create attached or unattached windows, or a combination of the two. If attached to a processor (or to a thread), a window displays only information about that particular processor (or thread). If unattached, a window can display information about any processor (or thread) by clicking on the Increment Connection (Conn+) button in the Code window.

Cross-triggered breakpoints

In many processors, an output signal (called a breakout signal) changes when the processor stops or hits a breakpoint. In many processors, this signal can be programmed to assert or not to assert on a break or stop. Many processors also have an input trigger signal that will stop the processor (and put it into debug mode) when asserted. In some cases the trigger is under program control as well.
When multiple cores (or chips) are cross-wired correctly, it is possible to have one core (or chip) stop the others. For example, when one processor hits a breakpoint (or a watchpoint) it can stop all the other processors. With the proper hardware support in the cross connections, the debugger can be made to control two or more processors, configuring them to stop each other as desired. In some cases, an external mechanism (such as a TAP) can be used to program this triggering and control.

Most processors have an input signal pin to request that the processor enter emulation mode. Whether cores or chips are used, the signals to (or from) each processor should be tied together by a matrix that can be configured by memory-mapped control registers.

For example, given an array of eight SHARC DSP processors, the user may want to stop processors #3 and #7 when processor #1 hits a breakpoint, while allowing the other processors to continue running. The registers and the matrix logic defined by the board or chip designer should allow this flexibility.

Software requirements for cross-triggering of breakpoints

If the control registers are memory-mapped, the user can identify and control them using ASPEX's Extended Target Visibility features, which are discussed below. The user can then set the desired values before executing the application code.

ASPEX Extended Target Visibility (ETV)

ASPEX provides Extended Target Visibility (ETV) by allowing board designers (or SOC designers) to configure the debugging environment for their target.

Figure 12: Register window showing a user-defined Advanced Interrupt Controller, with IRQ pending and IRQ mask fields.

Whether the processors are tightly-coupled or loosely-coupled at the hardware (or simulator) level, the user can tell ASPEX to synchronize them.
Using unattached windows can conserve display space, and make the debugging of multiple processors (or threads) less confusing. In cases where it is necessary to see information for two or more processors simultaneously, attached windows can be used. A user can also combine these techniques, since individual windows can be either attached or unattached.

Hardware requirements for cross-triggering of breakpoints

Most processors have an input signal pin and an output signal pin for emulation mode. As discussed above, for the tightly-coupled scheme to work on the target hardware, these pins must be cross connected between the various processors in the array. When processor chips are used in the target hardware, this cross connection is best accomplished with a CPLD or FPGA. In a core-based SOC design this external hardware can be avoided by adding extra on-chip control logic.

The control registers for this cross connection should allow the user to specify which processor signals which other processors. Using this method, synchronization delays are reduced to picoseconds. This is extremely useful for tracking down hard-to-find interaction problems within multiple processor systems.

Loosely coupled processors

If the processors cannot be tightly coupled with cross-wired signals, and if SimBridge is not available to synchronize the particular simulators that are used, ASPEX can still use a loosely-coupled scheme for debugging multiple processors. It does this by sequentially issuing the appropriate commands to Start, Stop, or Step each processor. When one processor hits a breakpoint, ASPEX sends a "stop" command to all the other processors.

As there are communication delays between ASPEX and each processor (whether real or simulated), there will be unavoidable delays (skew) in the synchronization between the processors. These delays are usually on the order of milliseconds.
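The loosely-coupled scheme, in which the debugger issues commands to one processor after another, can be sketched with a toy model. Everything here is an assumption made for illustration: the `Target` class, its methods, and the round-robin `tick` scheduling are hypothetical and do not represent the ASPEX interface or real timing.

```python
class Target:
    """Hypothetical debugger-side handle for one processor."""

    def __init__(self, name, breakpoint_at=None):
        self.name = name
        self.pc = 0
        self.running = False
        self.breakpoint_at = breakpoint_at

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def tick(self):
        """Advance one instruction; return True if a breakpoint was hit."""
        if not self.running:
            return False
        self.pc += 1
        if self.pc == self.breakpoint_at:
            self.running = False
            return True
        return False


def run_loosely_coupled(targets, max_ticks=100):
    """Start each target in turn; on a breakpoint, stop the others in turn."""
    for t in targets:                      # sequential "start" commands
        t.start()
    for _ in range(max_ticks):
        for t in targets:
            if t.tick():
                for other in targets:      # sequential "stop" commands
                    if other is not t:
                        other.stop()
                return t
    return None


targets = [Target("dsp0"), Target("dsp1", breakpoint_at=3), Target("dsp2")]
hit = run_loosely_coupled(targets)
print(hit.name, [t.pc for t in targets])
```

In this model the targets do not all stop at the same program counter: the target polled after the one that hit the breakpoint is stopped one instruction earlier than the others. That difference is a crude stand-in for the skew that sequential commands introduce.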
An ETV definition provides the ASPEX debugging environment with detailed, hardware-specific information about the target being debugged. Information such as memory mapping and wait states is specified graphically, and then saved in an ETV file, which can then be distributed to other users of the same type of target. Distribution of ETV files is especially useful when multiple RAM-based prototype targets are built prior to the production of a final ROM-based target. (ETV definitions also work with simulated targets.) The ETV feature allows a user to:

- define and display external memory blocks
- define access widths and wait states for external memory blocks
- define external peripheral control blocks
- define external memory-mapped registers, and bit fields within those registers

Basic memory map editing can be done directly through the Memory tab on the Managers window. More complex definitions and mappings can be specified using ETV.

When connecting to a target, the memory map is updated, based on the built-in knowledge about the standard part and the definitions specified in the ETV for the target. When an executable image is loaded to the target, the memory map is checked to confirm that the memory locations specified in the executable are valid. The executable is then loaded to target memory and then read back, to verify a successful load as well as the presence of working memory. The "auto" sections of the memory map are also updated to reflect the type of memory, based on the information in the executable. The memory map determines how memory is displayed (colored) in the Memory window.

Cross triggering registers

Most processors have an output signal pin to indicate when the processor has entered emulation mode. If the external control registers that control cross-triggering of breakpoints between processors are memory-mapped, then the user can specify them using the ETV facility. The user can then write the desired values into them before executing code.
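As an illustration of what such cross-triggering registers might hold, the sketch below models one routing register per source processor, where each bit selects a processor to stop. The register layout, the `CrossTriggerMatrix` class, and the 1-based processor numbering are all assumptions made for this example; a real board or SOC designer would define the actual registers and matrix logic. The scenario is the article's own: eight processors, with processor #1 stopping #3 and #7.

```python
NUM_PROCESSORS = 8


class CrossTriggerMatrix:
    """Hypothetical register file: one routing mask per source processor."""

    def __init__(self, n=NUM_PROCESSORS):
        self.n = n
        self.route = [0] * n          # route[src-1]: bit (dst-1) set => src stops dst
        self.halted = [False] * n

    def write_route(self, src, targets):
        """Program the mask register for processor `src` (numbered from 1)."""
        mask = 0
        for dst in targets:
            mask |= 1 << (dst - 1)
        self.route[src - 1] = mask

    def breakpoint_hit(self, src):
        """Model the matrix propagating a break from `src` to its targets."""
        self.halted[src - 1] = True
        mask = self.route[src - 1]
        for dst in range(self.n):
            if mask & (1 << dst):
                self.halted[dst] = True

    def halted_processors(self):
        return [i + 1 for i, h in enumerate(self.halted) if h]


matrix = CrossTriggerMatrix()
matrix.write_route(src=1, targets=[3, 7])   # value written before running code
matrix.breakpoint_hit(1)
print(hex(matrix.route[0]), matrix.halted_processors())  # 0x44 [1, 3, 7]
```

With this layout, the value the user would write for processor #1 is 0x44 (bits 2 and 6), and processors #2, #4, #5, #6, and #8 keep running when the breakpoint is hit.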
Figure 12 shows that the user has defined an Advanced Interrupt Controller as a target peripheral (AIC in the left pane) and has defined the values and layout that will be dynamically added to the Register window.

When connecting to a target that has Extended Target Visibility definitions, the Register window, Memory Map, Memory window, and ASPEX internal access methods are all enhanced to include this information.

Registers

As shown in Figure 13, ASPEX has built-in knowledge about the core registers and other internal registers found in standard processor chips. These are displayed in the Register window. When additional registers are defined using ETV, they can also be added to the Register window, where they can be displayed or updated, just like standard registers.

Memory map

ASPEX also has built-in knowledge about the internal memory map for standard processor chips. It is aware of what internal memory blocks exist, and generally treats external or undefined memory as being determined by the application program.

Program Flow Trace

ASPEX provides a trace facility that correlates instruction-level execution with the source code. Collection and display of trace information is triggered by a notation inserted into the source code display, using the GUI. Once the trace has been acquired, the user can step through the trace, either at the assembly code level or at the source code level. It is also possible to step backward through a trace, to identify the root cause of a problem.

ASPEX uses the Analysis window to show trace information that can be collected from any of a variety of sources. The information can be viewed in raw form, or in a higher level view. The Profiling view and the function entry/exit pairs are especially useful for performance analysis.
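Stepping forward and backward through a recorded trace, at either the instruction level or the source level, can be sketched as a cursor over a list of program counter values. This is a minimal model, not the ASPEX implementation: the `LINE_TABLE` contents, the `TraceCursor` class, and the sample addresses are hypothetical.

```python
from bisect import bisect_right

# Hypothetical line table: (start address, source line), sorted by address.
LINE_TABLE = [(0x100, 10), (0x104, 11), (0x10C, 12), (0x110, 15)]
_STARTS = [addr for addr, _ in LINE_TABLE]


def source_line(pc):
    """Map a program counter value to the source line that contains it."""
    i = bisect_right(_STARTS, pc) - 1
    if i < 0:
        raise ValueError(f"no source line for address {pc:#x}")
    return LINE_TABLE[i][1]


class TraceCursor:
    """Cursor over recorded PC values; steps by instruction or source line."""

    def __init__(self, pcs):
        self.pcs = list(pcs)
        self.index = 0

    def current(self):
        pc = self.pcs[self.index]
        return pc, source_line(pc)

    def step(self, delta=1):
        """Instruction-level step; delta=-1 steps backward."""
        self.index = max(0, min(len(self.pcs) - 1, self.index + delta))
        return self.current()

    def step_source(self, delta=1):
        """Move until the source line changes or the trace ends."""
        start_line = self.current()[1]
        while True:
            before = self.index
            self.step(delta)
            if self.index == before or self.current()[1] != start_line:
                return self.current()


cursor = TraceCursor([0x100, 0x104, 0x108, 0x10C, 0x104, 0x108, 0x110])
print(cursor.step_source())      # first instruction of the next source line
print(cursor.step_source())
print(cursor.step(-1))           # one instruction backward
print(cursor.step_source(-1))    # back to the previous source line
```

Because the trace is just recorded data, stepping backward costs nothing more than moving the cursor, which is what makes walking back toward the root cause of a problem practical.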
The Analysis window allows the user to view the source code in the Code window, and the disassembled trace data (representing its execution by the target system) in the Analysis window. The user can then step forward or backward through the source code, simultaneously viewing the corresponding disassembled trace data. Trace data can be derived from several sources, including:

- Instruction set simulators, which can record the program counter content for each instruction as the simulation runs. The simulator can also record the cycle count. A trace trigger can be used to control when to start recording, and a ring buffer can be used to show the last "n" instructions when the buffer fills.
- Hardware logic analyzers, which can be attached to the processor that is being debugged.
- Breakpoint-based tracing, which is intrusive, but allows for profiling of algorithms, and also allows the execution flow to be examined.
- On-chip trace buffers, which can show the last "n" branches, or can be intrusively set up to record all branch flow.

The Analysis window also allows filtering by data access type and by name (for profiling). The data can also be sorted. The find menus allow searching for entries based on criteria such as time (and time range).

Conclusion

An IDE that is designed specifically for embedded systems that employ multiple DSPs and microprocessors can provide engineers with a single, easy-to-learn and easy-to-use tool set. However, debug support must be designed to adapt to available hardware resources, and to maximize debug fidelity when debugging multiple processors. Customized visibility into the target system exposes more of the system, and streamlines the debugging process. A trace and analysis capability simplifies debugging, and helps the application programmer optimize real-time performance.
An open architecture allows developers to use best-of-class tools, as well as support their own custom tools. With access to a comprehensive toolkit, application programmers can focus on developing product features that differentiate their product from the competition while meeting stringent time-to-market requirements.

References

[1] Beacon Technology Partners, 4B Damonmill Square, Concord, MA 01742. Tel: 978-371-3262, Fax: 978-371-3288.

Dan Jaskolski is CEO and co-founder of Allant Software Corporation. Allant specializes in providing debugging solutions for embedded DSP, RISC and SOC systems. Prior to founding Allant, Jaskolski was with embedded tools supplier Microtec Research for 13 years as Executive Vice President and Chief Operating Officer. He holds a Bachelor's degree in mathematics from La Salle College and an MBA from the University of Pittsburgh. If you have questions about this article, or if you would like to know more about Allant's products, you can contact Dan at:

Allant Software Corporation
1280 Civic Drive, Suite 206
Walnut Creek, CA 94596
Tel: 925-944-9690
Fax: 925-944-9612
Email: danj@allant.com
Web: www.allant.com

Function tracing

Debuggers can provide the calling sequence of all nested functions that have not finished executing, because their calling address still resides on the stack. However, once the function completes, its calling address is no longer on the stack.

By using the Trace capability of ASPEX, a user can see the exact calling sequence of all functions, whether they are still on the stack or not. By clicking on the Func tab of the Analysis window (see Figure 15) you can view all function entries that occurred during the trace. You can then go to the source code of any such function just by clicking on it.
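The difference between what the stack shows and what a trace shows can be made concrete with a short sketch. It assumes the trace has already been reduced to a list of function entry and exit events; the event format and the function names below are hypothetical and are used only for illustration.

```python
def call_sequence(events):
    """Return (depth, name) for every function entered during the trace."""
    depth = 0
    calls = []
    for kind, name in events:
        if kind == "enter":
            calls.append((depth, name))
            depth += 1
        elif kind == "exit":
            depth -= 1
    return calls


def live_stack(events):
    """Return only the functions that have not yet returned."""
    stack = []
    for kind, name in events:
        if kind == "enter":
            stack.append(name)
        elif kind == "exit":
            stack.pop()
    return stack


# Hypothetical trace: init and fft have already returned when the trace ends.
trace_events = [
    ("enter", "main"),
    ("enter", "init"),
    ("exit", "init"),
    ("enter", "filter"),
    ("enter", "fft"),
    ("exit", "fft"),
]

print(live_stack(trace_events))      # what a stack walk can still see
print(call_sequence(trace_events))   # what the trace can reconstruct
```

A stack walk at the end of this trace would report only `main` and `filter`, while the trace-based view also recovers the completed calls to `init` and `fft`, along with their nesting depth.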
The Analysis window uses tabs for quick access to different views of the data, including:

- the raw trace code and data buffer elements
- a disassembly view
- a function entry/exit view
- an execution profile

The find criteria include address (and address range), type of access, etc. Clicking on an entry will show that location in the Code window.

Figure 15: Analysis window (sample-21065, unattached).

Performance profiling

Typically, a very large percentage of a program's execution time is spent in a small percentage of its code. Thus, the ability to identify "hot spots" can be very helpful when a program's performance must be improved. Figure 14 shows how a user can view performance profile information, such as a histogram of the percentage of execution time for each function.

Not licensed for distribution. Visit opensystems-publishing.com/reprints for copyright permissions. © 2008 OpenSystems Publishing.