Ben Davis

XMAS: an open MIDI and sample-based music system

Computer Science Tripos
Robinson College
May 9, 2004

Proforma

Name: Ben Davis
College: Robinson College
Project Title: XMAS: an open MIDI and sample-based music system
Examination: Computer Science Tripos 2004
Word Count: 11302
Originator: Ben Davis
Supervisor: Neil Johnson

Original Aims of the Project

To provide a good solution by which a composer can write and distribute music to be played by a machine, particularly as part of a downloadable computer game. .mid files depend heavily on the hardware or software available at the destination; .mp3, .ogg and similar files are too large in many cases; Amiga-based module files (e.g. .mod, .s3m, .xm and .it) are difficult to compose, and their playback behaviour is not well defined. This project aimed to produce an open-source system that would avoid all of these problems.

Work Completed

An XML-based structure was designed that allows a .mid author to build his or her own instrument sounds from .wav files. A software library was written that parses the structure and renders the music to a mono or stereo PCM sample stream. In particular, this library incorporates a real-time resampler with cubic interpolation and a MIDI player. The project is fairly mature and will soon be available at http://xmas.sf.net/.

Special Difficulties

None.

Declaration

I, Ben Davis of Robinson College, being a candidate for Part II of the Computer Science Tripos, hereby declare that this dissertation and the work described in it are my own work, unaided except as may be specified below, and that the dissertation does not contain material that has already been used to any substantial extent for a comparable purpose.

Signed

Date

Contents

1 Introduction
1.1 The Technology
1.1.1 MIDI
1.1.2 Samples and Streamed Audio
1.1.3 Amiga mod-based Files
1.2 The Problem
1.3 The Solution
2 Preparation
2.1 Requirements
2.2 Initial Analysis of Requirements
2.2.1 Using Industry Standards
2.2.2 Compression and Kolmogorov Complexity
2.2.3 Structure and Tweaks
2.2.4 Same Output Everywhere
2.3 Project Layout
2.4 Further Analysis
2.4.1 Real-Time Playback
2.4.2 Third-Party Players
2.5 Choice of Programming Language
2.6 Refining the File Structure
2.7 Final Preparations
2.7.1 Core Requirements
2.7.2 Extensions
2.7.3 Work Plan and Timetable
2.7.4 Libraries and Code Used
2.7.5 Documentation Used
2.7.6 Code Management and Back-ups
3 Implementation
3.1 Overview
3.1.1 Digital Signal Processing Modules
3.1.2 The State Tree
3.1.3 Generating the Music
3.2 The XML
3.3 Variables
3.4 Parameter Tweaks
3.5 The Modules
3.5.1 Samples
3.5.2 Volume Envelopes
3.5.3 Multiplexers
3.5.4 MIDI Mappings
3.5.5 Variable Compute Blocks
3.6 The MIDI Playback Algorithm
3.6.1 Overview
3.6.2 State
3.6.3 Notes
3.6.4 Algorithm
3.6.5 Noteworthy Features
4 Evaluation
4.1 Goals Achieved
4.2 Evolution of the Plan
4.2.1 Generalisation
4.2.2 Filtering Whole Channels or Tracks
4.2.3 Reference Counting
4.3 Milestones
4.4 Testing
4.4.1 General
4.4.2 The Resampler
4.5 Profiling
4.6 Comments
4.6.1 Samples
4.6.2 Volume Envelopes
4.6.3 MIDI Playback
4.6.4 Flexibility
4.7 Problems Encountered
4.7.1 STL Containers
4.7.2 XML Files
4.7.3 Code Size
5 Conclusions
Bibliography
A The Cubic Interpolation Function
B Some Example XML
B.1 volenv.xml
B.2 compute.xml
B.3 clarinet.xmi
B.4 pizz.xmi
B.5 general.xmi
B.6 Example Music
C Demo CD Track Listing
D Project Proposal

List of Figures

3.1 An example piece of music
3.2 A state tree, with variables
3.3 How variables are implemented
3.4 The history buffer
3.5 Two examples of volume envelopes
4.1 How to filter a set of channels
4.2 The visual resampler test
4.3 The three interpolation functions
4.4 Some profiling results

Acknowledgements

Many thanks are due to Neil Johnson, my Project Supervisor, for the guidance he offered right from inception up until the final deadline.
Thanks also go to Dr Alan Mycroft, my Director of Studies, for his assistance with this dissertation. This dissertation was written inside the skeleton structure provided by Dr Martin Richards’ How to write a dissertation in LaTeX [8].

Chapter 1 Introduction

Electronic music is an exciting field. Many people will insist that it is no substitute for conventional acoustic music, and they are right. Acoustic instruments, and human performance, are an enormous challenge to emulate. Electronic music is not a substitute: it is a complement. It is a whole new world, populated with many synthesisers and filters, each with its own distinctive character, and free of such constraints as the span of a pianist’s hand. Moreover, it can all be done in software on any reasonably modern computer equipped with a sound card and speakers.

I am a proficient pianist and have composed a great deal of professional-quality music. This strong musical background enabled me to hear bugs in my project’s output and to reason about sound quality.

1.1 The Technology

The best-known example of an electronic musical instrument is the electronic keyboard. This features a piano-like keyboard, a pair of speakers and a range of buttons, and is capable of producing many different instrument sounds, from approximations of acoustic instruments to sounds unlike anything heard in the acoustic or natural world. There are other types of electronic musical instrument too: MIDI controllers feature just the keyboard, while MIDI modules¹ feature just the synthesiser. These devices must be connected together to function. Sequencers allow music to be recorded or programmed, and can then play it back if a synthesiser is connected.

¹ The word ‘module’ has different meanings in different contexts. The intended meaning will always be clear in this dissertation.

1.1.1 MIDI

Enter the MIDI Specification [1].
It was created in 1983 by Sequential Circuits, Roland and several other major synthesiser manufacturers as a protocol to allow instruments to communicate with one another. There are 16 channels, numbered from 1 to 16; a device can respond to events on some channels and not others, or assign different instruments to different channels. Events such as the following can be encoded in a byte stream and sent between devices (bytes are written in hexadecimal, and c is a single hexadecimal digit from 0 to F, one less than the channel number):

9c nn vv   Note On. Start note at pitch nn on channel c + 1 with velocity vv, a measure of how hard the key on the keyboard was hit.
8c nn vv   Note Off. Stop note at pitch nn on channel c + 1. The velocity vv here is a measure of how quickly the key was lifted.
Cc pp      Program Change. Assign program pp to channel c + 1. A program is typically an instrument sound, but some devices use it for other purposes such as rhythm selection.

The first byte is known as the status byte. There are many other commands, but the status byte is always in the range 80–FF, and data bytes are always in the range 00–7F. If consecutive messages share the same status byte, it need only be sent once: it becomes the running status, and subsequent messages consist of their data bytes alone.

MIDI commands can also be saved with time stamps in a Standard MIDI File, which has the extension .mid. XMAS uses the .mid file as a major component in a piece of music.

General MIDI, an addition made to the standard in 1991, designates a standard set of instrument names for the 128 values that can be used in a Program Change command. It further allows a single channel to be used for unpitched percussion (such as cymbal crashes), with standard percussive instrument names assigned to many of the 128 note values. On multi-part synthesisers, it is usual for Channel 10 to contain percussion.

1.1.2 Samples and Streamed Audio

When we hear sounds, our ears are detecting longitudinal oscillations, or pressure waves, in the air. These waves cause our eardrums to vibrate.
A microphone uses a diaphragm to detect the waves and convert them into a varying electric current. This current can be sampled at discrete intervals, commonly at 44100 Hz (the rate used on audio CDs), and stored in a .wav file. The individual samples, or sample points, usually have 16-bit values. Many recordings have two channels, one for each speaker in a stereo set-up;² here, the sample points are interleaved, and each left-right pair is called a sample frame. This system for encoding sound as a series of samples is called Pulse Code Modulation, or PCM. It is possible to store an arbitrary number of channels in a .wav file, but it is rare for more than two to be stored.

A sample, in another sense of this overloaded word, is a recording of a sound effect, a note, or a short, repeatable sequence of notes. It is typically stored in a .wav file. XMAS uses samples as a key component in instrument design.

Lossy compression algorithms exist for PCM data. The best known is MPEG-1 Layer 3, or .mp3; others are Ogg Vorbis (.ogg) and Windows Media Audio (.wma). These typically achieve compression ratios of 10:1 or more: CD-quality stereo PCM amounts to 44100 × 16 × 2 = 1411.2 kbit/s, about eleven times a common .mp3 bit rate of 128 kbit/s. It is possible to stream such compressed PCM data over a channel³ (e.g. for Internet radio), so these schemes are often referred to as streamed audio schemes. Supporting them in addition to .wav files was made an extension, for reasons given in Section 2.2.2.

Adjusting the speed of a sample, by stretching or compressing it in the time axis, scales every frequency in its spectrum by the same factor, which is perceived as a change of pitch. XMAS uses this to generate different notes from a single sample. It is not the best way to change the pitch of a note, since it also changes the note’s duration and character, but it is quick to do, and the same algorithm can enable playback at an arbitrary sampling frequency.

1.1.3 Amiga mod-based Files

The Commodore Amiga was a pioneer in its ability to play up to four samples at once, at varying frequencies, without using the CPU.
In 1987, Karsten Obarski released SoundTracker, a program for writing music to play on this hardware. Programs of its ilk (along with the composers who used them) became known as trackers, and the names of the files produced began with mod., short for module. This ‘extension’ moved to the end when the files were transferred to the IBM PC. As the PC’s audio and processing capabilities grew, trackers emerged for it, each offering more channels⁴ and more features, and each with its own file format. The three best known are Scream Tracker 3, Fasttracker II and Impulse Tracker, which save files with the extensions .s3m, .xm and .it respectively. The creators of these trackers have all documented their file formats.

² Note that the term channel has been overloaded. It may refer to parts in a MIDI performance or, as in this case, to speakers.
³ A third meaning of channel!
⁴ Analogous to MIDI channels. Conventionally, each channel can play one sample at a time, though Impulse Tracker relaxes this restriction.

I have co-authored DUMB, a library for playing these four mod-based formats [5]. The experience, both positive and negative, of working on DUMB proved very useful in specifying, planning and implementing this project.

1.2 The Problem

As a game developer by hobby, and soon by profession, my main interest is in using a standard home computer to generate background music for computer games. However, doing this in such a way that the music can be distributed is quite a challenge. Every existing solution has a prohibitive disadvantage.

.mid files. Most computers are built with the ability to play .mid files, in software if not in hardware. Since MIDI has so much industrial support, good hardware and software are available, and .mid files are easy to produce. However, owing to the nature of MIDI, the output varies vastly from one system to another.
Even General MIDI does not specify exact instrument sounds, only names: from experience I know that a change of MIDI device can be utterly devastating to a piece of music.

Compressed audio streams. Why not produce MIDI files for one’s own MIDI set-up, record the output and encode it in a compressed audio format such as .mp3? Many people do this, and for boxed games it is a fair solution. However, it is not a good choice for games made available for download on the Internet (including demos of commercial games): even a small selection of .mp3-format music could take a dial-up modem user hours to download. These formats have the further limitation that they are unstructured, so the game cannot make adjustments, e.g. to speed, on the fly.

Amiga mod-based formats. These seem like a good solution, but there are two problems. The first is that producing mod-based music is very difficult. The second is that you can never be sure your music will play correctly. All the original tracker software was closed-source. Third-party players have been developed, but most of them misinterpret the data, some of them very severely. We made a serious effort to get it right in DUMB, but there are still errors. The player shipped with the popular Winamp media player is one of the least accurate, which poses a real problem to anyone releasing mod-based music. Furthermore, there is a major third-party tracker, ModPlug Tracker, which differs significantly in several ways from the original tracker programs. As a result, there is no single correct way to play mod-based formats.

1.3 The Solution

This project set about creating a new solution to allow game developers to include music in their games. It defines a compact⁵ new file format, designed so that no new editing software needs to be written. It incorporates a library, libxmas, capable of loading and playing the music in real time.
The file format is XML-based, so I decided upon the following file name extensions:

.xmi: XML Instrument Definition. This defines one or more instrument sounds, most likely referring to .wav files or other .xmi files in the process.

.xmm: XML Music. This specifies a .mid file, along with an XML Instrument Definition defining the instrument sounds to play it with. The definition may either be embedded or consist of a reference to an .xmi file.

.xma: XML Music Archive. This contains zero or more .xmm files and all the files they depend on, using appropriate compression for each part. It allows music to be consolidated into a single file. In addition, it is a nice take on Microsoft’s .wma format!

Since .xma is the ideal final format for music, the project as a whole is called the XMA System, or XMAS for short.

⁵ Modulo the rather verbose XML glue, which is highly compressible.

Chapter 2 Preparation

2.1 Requirements

• It must be easy for a composer to produce music.
• It must be possible for the composer to keep file sizes to a minimum.
• The music must be structured so as to allow tweaks on the fly. Such tweaks might include speed variation, muting of some instruments and instrument substitution.
• The playback library must be capable of producing the same output on every computer.
• It must be able to do so comfortably in real time on a typical home computer for a piece of music of average complexity.

2.2 Initial Analysis of Requirements

2.2.1 Using Industry Standards

Using industry-standard file formats gives the user a choice of software and hardware. This is by far the best way of meeting the first requirement, since different people find different tools easy to use. It also enables me to use existing files for testing and demonstration.

2.2.2 Compression and Kolmogorov Complexity

Fundamental to keeping file sizes down is Kolmogorov complexity.
The Kolmogorov complexity of some data is the length of the shortest program capable of generating the data. This shortest program is known as the minimal description. In the worst case it will be a PRINT statement followed by the data to be output, but in the best case it can be extremely concise. The Kolmogorov complexity gives a lower bound, up to a constant that accounts for the size of the decompression program, on the size of losslessly compressed data. In general, no automatic process can be guaranteed to meet this bound, since Kolmogorov complexity is not computable. The minimal description is usually a representation of the structure of the data, so creating it requires a good understanding of this structure.

Music is highly structured, and some of the structure can be articulated. For instance, we think of a piece of music as a sequence of notes, and this is how music has always been written down, whether for live performers or for a computer. This is in fact just one of the many structures found in music. Themes and rhythms recur, harmonic progressions are often predictable, and even the scale itself is riddled with frequency ratios such as 2:3 (perfect fifth) and 3:5 (major sixth). Much of the structure is lost when the music is written down.¹ However, the Western twelve-note scale, including (to a close approximation) its ratios, is implicit and can be implemented in the library, and the idea of a sequence of notes is preserved. This is enough to bring the file size down well below that of an .mp3 file, if simple instrument definitions are used.

What if complicated instruments are used? I have an .xm file (mod-based) that is over 10 MB in size. An .mp3 version of typical quality would be smaller. The .xm file could still beat the .mp3 if its instrument samples were compressed, but this should be done with care. Samples are often looped, so that when playback reaches the end of a sample, it jumps back to a specified point (see Section 3.5.1). To avoid clicks, the point is chosen so that the resultant curve is continuous.
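Two ideas from this section, the twelve-note scale being implicit in the library and looping, can be made concrete with a minimal C++ sketch. It assumes twelve-tone equal temperament and a loop described by start and end positions measured in sample points; the names semitoneRatio, playbackStep, SampleLoop and advance are hypothetical and are not taken from libxmas, whose actual looping is described in Section 3.5.1.

```cpp
#include <cassert>
#include <cmath>

// Frequency ratio for a shift of `semitones` in twelve-tone equal temperament
// (an assumption): each semitone multiplies the frequency by 2^(1/12).
double semitoneRatio(int semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// How far to advance through the sample per output sample frame when playing
// `note` from a sample recorded at `rootNote`. Faster playback raises the pitch
// (Section 1.1.2), and the rate ratio handles arbitrary output sampling rates.
double playbackStep(int note, int rootNote, double sampleRate, double outputRate) {
    return semitoneRatio(note - rootNote) * sampleRate / outputRate;
}

// Hypothetical loop description: playback that reaches `end` jumps back to `start`.
struct SampleLoop {
    double start;  // in sample points
    double end;    // exclusive
};

// Advance a fractional playback position by `step`, wrapping into the loop.
double advance(const SampleLoop& loop, double pos, double step) {
    assert(loop.end > loop.start && step >= 0.0);
    pos += step;
    if (pos >= loop.end)
        // Keep the overshoot rather than resetting to loop.start, so the
        // loop's period (and hence the pitch of a short loop) is not distorted.
        pos = loop.start + std::fmod(pos - loop.start, loop.end - loop.start);
    return pos;
}
```

For example, semitoneRatio(12) is exactly 2 (an octave), while semitoneRatio(7) is about 1.4983, close to the 3:2 of a perfect fifth but not equal to it; equal temperament only approximates the simple ratios mentioned above.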
.mp3 and friends are lossy, and there is no guarantee that such a loop would be preserved. In light of this consideration, providing instrument compression has been left to the extensions.

2.2.3 Structure and Tweaks

Allowing the musician to specify the structure is the ideal solution. The musician, in collaboration with the other game developers, will know what will need to be tweaked and can structure the music accordingly.

¹ This is not necessarily a bad thing. Live performers will never repeat a pattern precisely, and an unstructured MIDI stream will be able to capture the variation. Of course, much electronic music, particularly club music, sounds best when precise!

2.2.4 Same Output Everywhere

Much computer music technology is unsuitable for use in this project. Examples are the DSP (digital signal processing) chip on the Sound Blaster Live! cards and VST plug-ins in Windows. The DSP chip allows a programmer to write an algorithm that does some DSP and load the algorithm into the sound card, where it will be applied to the output. This is unsuitable because it is specific to one sound card series.

VST stands for Virtual Studio Technology, a standard created by Steinberg to allow effect plug-ins conforming to the standard to be used by any VST-compatible program. It is very useful when the music is to be mastered and, for example, put on CD. However, VST plug-ins are platform-specific binaries, and a user will have a personal collection of plug-ins that other users may not have. If a composer used these plug-ins, the goal of being able to distribute music in a structured, compact form would not be met.

There is a second, related, consideration. It must not be possible for an arbitrary music program to load a piece of music designed for XMAS and play it back incorrectly. This precludes, for example, the possibility of using a .mid file for the whole piece of music.
The standard for .mid files allows proprietary data to be stored in units called chunks, but states that any unrecognised chunk should be skipped over; if XMAS used .mid files with proprietary chunks, all standard-compliant programs would succeed in loading music designed for XMAS and would proceed to play it using the wrong instrument sounds.

2.3 Project Layout

The considerations so far led me to decide on a system that loads XML-formatted data containing references to .wav and .mid files. The majority of a musician’s work goes into producing the .wav and .mid files, while writing the top-level XML structure is trivial by comparison. Moreover, since the top-level file is in a newly defined format, no existing software can inadvertently load it and generate the wrong output. The component .mid files could still be loaded into any program, but that problem is minor compared with having the music spread over many separate files; the .xma format will solve both problems (see Section 1.3).

2.4 Further Analysis

2.4.1 Real-Time Playback

The existing mod-based players and software MIDI synthesisers give a good idea of what a typical computer can do in real time. Predominantly sample-based players have no trouble keeping up, and even simple filters and reverberation are no problem nowadays. However, it is important to be able to apply any such effect to multiple MIDI channels in one go. Applying it to one channel at a time, when the same output could be achieved with a single effect applied to the sum of all channels’ output (as it can whenever the effect is linear and time-invariant, as filters and reverberation typically are), would be an unacceptable waste of processor time. Since this engine will be used in games, which must do their own processing, it is important to keep processor usage as low as possible.

2.4.2 Third-Party Players

Those who wish to develop third-party players for XMAS’s files should not be made to guess, probably wrongly, how to play the files (recall the comments about Amiga mod-based formats in Section 1.2).
The playback code I develop in this project will be freely available, and will serve as a reference.

2.5 Choice of Programming Language

The software is required to run quickly. However, it should also be flexible and extensible. C++ was designed to meet both of these goals, providing for highly structured programming while still allowing the programmer to sacrifice some internal safety and structure in order to gain speed. As an industry-standard language, it is a good choice for this project.

Most of my experience before this project was with C. Whilst I was not greatly familiar with the syntax of C++, I knew the language’s capabilities well enough to plan this project and see it through.

2.6 Refining the File Structure

XMAS will use a tree structure for the music. At the leaves of the tree will be samples. Above these will be instruments, specified in XML and responsible for such activities as choosing different samples for different notes, controlling the fade-out when a note stops, and applying any desired effects. At the root of the tree will be a node, also specified in XML, parenting a set of instruments and containing a reference to a .mid file.

Furthermore, the above types of node will be unified so that they can be strung together effortlessly in any layout; in particular, this enables short MIDI sequences to be used as instruments in larger ones. The term ‘DSP module’ will refer to any node, since a node’s purpose is typically to do some digital signal processing.

2.7 Final Preparations

2.7.1 Core Requirements

The project needs to show that it can do the job it is designed for. It may not fulfil all the requirements listed in Section 2.1, but it must be evident that the requirements have been considered and could be fulfilled with a small amount of work. By the end, I expect to have the project playing a piece of music reliably, accurately and fairly efficiently.
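The unified node structure proposed in Section 2.6 can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not libxmas’s code: DSPModule is the base-class name used in Chapter 3, but the members shown, the use of std::shared_ptr as a stand-in for the project’s own reference counting, and the helper countSampleLeaves are hypothetical. What it shows is that a MIDIMapping accepts any DSPModule as an instrument, including another MIDIMapping.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class DSPModule;
using ModulePtr = std::shared_ptr<const DSPModule>;  // stand-in for reference counting

// Common base: every node type can be the child of any other.
class DSPModule {
public:
    virtual ~DSPModule() = default;
    virtual std::vector<ModulePtr> children() const { return {}; }  // leaves have none
};

// Leaf: PCM data from a .wav file (only the file name is modelled here).
class Sample : public DSPModule {
public:
    explicit Sample(std::string file) : file_(std::move(file)) {}
    const std::string& file() const { return file_; }
private:
    std::string file_;
};

// Instrument-level node that processes exactly one child.
class VolumeEnvelope : public DSPModule {
public:
    explicit VolumeEnvelope(ModulePtr child) : child_(std::move(child)) {
        assert(child_ && "a volume envelope needs something to shape");
    }
    std::vector<ModulePtr> children() const override { return {child_}; }
private:
    ModulePtr child_;
};

// A .mid file plus the instruments that play it. Because instruments are plain
// DSPModules, a MIDIMapping can itself serve as an instrument in a larger one.
class MIDIMapping : public DSPModule {
public:
    MIDIMapping(std::string midiFile, std::vector<ModulePtr> instruments)
        : midiFile_(std::move(midiFile)), instruments_(std::move(instruments)) {}
    const std::string& midiFile() const { return midiFile_; }
    std::vector<ModulePtr> children() const override { return instruments_; }
private:
    std::string midiFile_;
    std::vector<ModulePtr> instruments_;
};

// Walk the data tree and count Sample leaves (a node shared by several parents
// is counted once per path to it).
int countSampleLeaves(const DSPModule& node) {
    if (dynamic_cast<const Sample*>(&node) != nullptr) return 1;
    int total = 0;
    for (const ModulePtr& child : node.children()) total += countSampleLeaves(*child);
    return total;
}
```

As a usage example with hypothetical file names, a riff (riff.mid, playing a single bass.wav Sample) can be placed alongside an enveloped flute.wav as one of the instruments of song.mid; countSampleLeaves on the song then finds two samples, one of them reached through the nested MIDIMapping.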
2.7.2 Extensions

The following two features will have to be consigned to extensions simply because of the amount of work they would involve:

• Compressed samples. As mentioned in Section 2.2.2, lossy compression should not be done blindly, and doing it properly could develop into a project in its own right. Lossless compression is of limited benefit, and is not important enough to be a core requirement.
• The .xma format, which will take some careful planning.

Other possible extensions include extra DSP modules (such as filters, distortion and echo), click removal for when samples start, stop and loop (not for clicks in the actual sample data), support for surround sound, a GUI for editing and testing .xmi and .xmm files, a stand-alone player, and XMMS and Winamp plug-ins.

2.7.3 Work Plan and Timetable

Having planned the project to the extent that I felt ready to begin writing code, I decided upon the following timetable. A spiral development model was adopted, with the aim of completing a design-implement-test cycle within each work package. All dates are Fridays.

24 Oct – 7 Nov    Preliminary research. In particular, read up on XML and find suitable documentation and libraries.
7 Nov – 28 Nov    Specify DSP module interface. Implement reference-counted .wav loader and sample player.
28 Nov – 19 Dec   Implement volume envelope module. Specify .xmi format. Implement reference-counted .xmi loader. Create an .xmi file for testing.
19 Dec – 9 Jan    Implement reference-counted .mid loader. Specify .xmm format. Implement reference-counted .xmm loader. Create an .xmm file for testing.
9 Jan – 30 Jan    Write the Progress Report.
30 Jan – 27 Feb   Implement MIDI sequence player and .xmm player.
27 Feb – 19 Mar   Implement command-line player. Create a more involved piece of music for testing and, later, demonstration.

2.7.4 Libraries and Code Used

XML Parsing

The XML libraries considered were:

           C          C++
XML 1.0    libxml     libxmlpp
XML 2.0    libxml2    xmlwrapp

As XML is backwards-compatible and the project is using C++, I investigated xmlwrapp [3]. It had clear documentation and a good API, so I decided upon it.

Expression Parsing and Evaluation

Three solutions for parsing expressions were considered:

• Ollivier’s Mathematical expression parser in C++ (mathexpr) [6].
• The L math processor (lmp) [7].
• Constructing my own with flex and bison.

lmp is written in C, and lacks object-oriented structure. Parsing an expression consists of setting global variables to point to the expression and calling a function. Furthermore, there is a single table of variables, stored in a global variable. This kind of API is not conducive to the object-oriented structure I want, and it is certainly not thread-safe. The same problem arises with flex and bison, which by default make heavy use of global variables.

By contrast, mathexpr is written in C++ and has a good object-oriented structure. The site presented a worrying description and example of the parser’s behaviour, but testing showed that these were inaccurate and that the parser behaved as one would expect. mathexpr treats concatenation of variable names as multiplication (e.g. xy is x × y), so I performed another test to see whether variable names longer than one character would be accepted. They were, with the restriction that a name could not consist of an existing name with a suffix added (so ‘note’ and ‘notevelocity’ could not coexist). I considered this an acceptable limitation and decided to use mathexpr.

Since mathexpr is not a proper library with an installation procedure, I incorporated it into libxmas’s code tree. (By contrast, a user who wishes to compile libxmas will have to obtain and install xmlwrapp first.)

2.7.5 Documentation Used

The .wav and .mid Formats

Files documenting the .wav and .mid file formats were found at http://www.wotsit.org/.
The .wav documentation was very thorough. The .mid documentation covered only the skeleton file structure, including how to load a MIDI byte stream for each track, and did not document the contents of the byte stream in sufficient detail.

MIDI

http://www.borg.com/~jglatt/tech/miditech.htm covers two important parts of the MIDI Specification in great detail. The first is the MIDI messages (e.g. Note On) that may be sent between devices or stored in .mid files. The second is the .mid file and the meta-events that are stored in it but are not MIDI messages per se (more on this later).

2.7.6 Code Management and Back-ups

Since I have experience with CVS, I set up a CVS server on my system. I also wrote a script to archive the repository and upload it to Pelican, the University’s back-up service, retaining the previous copy each time. Finally, I set up a cron job so that the script would run every day at 4:00 a.m.

Chapter 3 Implementation

3.1 Overview

3.1.1 Digital Signal Processing Modules

Figure 3.1 shows an example of a piece of music as defined by XMAS.

[Figure 3.1: An example piece of music. Nodes shown: a MIDIMapping (duet.mid) with its instruments; a GeneralMultiplexer with cases channel=10,note=57, channel=2 and channel=1; a LookupMultiplexer keyed on note, with ranges up to 49, 50–69 and from 70; a VolumeEnvelope with a sustain point; and Samples cymbals.wav, harplow.wav, harpmed.wav, harphigh.wav and flute.wav (loop="on").]

Each node in the tree is a digital signal processing module, or DSP module, and holds music data but no playback state. I refer to this tree as the data tree. In the library, the base class DSPModule abstracts all types of node.

3.1.2 The State Tree

When the music is to be played, a state tree is constructed alongside the data tree. The DSPModule class has a getPlayer method which constructs and returns a player object of a type derived from DSPModulePlayer.
This object keeps a pointer to the DSPModule and plays the music, single note or other sound that the DSPModule represents. Building the state tree involves constructing a player for the data tree’s root node. In this case a MIDIMappingPlayer is constructed.

There is an important difference between the data tree and the state tree. A player does not necessarily construct one child player for each child of its module in the data tree. It may construct zero or many children, and it may construct and destroy children dynamically. At any given moment, a MIDIMappingPlayer has one child for each note that is playing. Some modules are simpler: the VolumeEnvelope always constructs a single child, and the two multiplexers construct either one or none.

3.1.3 Generating the Music

Once the state tree is set up, we play the music by requesting PCM data from the root. The root MIDIMappingPlayer requests PCM data from its children, which are the currently playing notes, and generates output in which all the notes can be heard. Each child player does something similar. The VolumeEnvelopePlayer requests data from its child and provides a processed version as its output. The SamplePlayers generate their output from the PCM data stored in the Sample objects.

When multiple sounds occur at the same time, the pressure waves from the individual sources are added together at each moment. Mixing PCM streams therefore involves one addition operation per sample point. Since mixing is so common in music, it was decided that a DSP module would always add into a buffer passed to it rather than overwrite it.

The DSPModulePlayer class has a method called mixSamples(). It takes three arguments: a pointer to a buffer of floats, a count indicating how many sample frames (recall Section 1.1.2) are requested, and a reference to a StreamParameters struct containing the sampling rate and the number of channels (speakers). At present, the number of channels is always 1 or 2, but the API allows for more channels in the future.
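The following minimal C++ sketch illustrates the data-tree/state-tree split and the "always add into the buffer" contract. It is not the libxmas source. DSPModule, DSPModulePlayer, getPlayer(), mixSamples() and StreamParameters are names taken from the text, but their exact signatures here are assumptions. ConstantModule (a leaf emitting a fixed level for a fixed number of frames) and SumPlayer (a parent that mixes its children) are hypothetical. The sketch also assumes that a player signals completion by returning fewer frames than were requested.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Assumed layout: the text says it carries the sampling rate and channel count.
struct StreamParameters {
    double samplingRate;  // output sample frames per second
    int channels;         // 1 (mono) or 2 (stereo)
};

class DSPModulePlayer {
public:
    virtual ~DSPModulePlayer() = default;
    // Adds up to `count` interleaved frames into `buffer` (never overwrites it)
    // and returns the number of frames actually generated.
    virtual long mixSamples(float* buffer, long count, const StreamParameters& sp) = 0;
};

class DSPModule {
public:
    virtual ~DSPModule() = default;
    virtual std::unique_ptr<DSPModulePlayer> getPlayer() const = 0;
};

// Hypothetical leaf module: music data only (a constant level and a length).
class ConstantModule : public DSPModule {
public:
    ConstantModule(float level, long length) : level(level), length(length) {}
    std::unique_ptr<DSPModulePlayer> getPlayer() const override;
    const float level;
    const long length;  // in frames
};

// Its player holds the playback state and a pointer back to the module.
class ConstantPlayer : public DSPModulePlayer {
public:
    explicit ConstantPlayer(const ConstantModule* module) : module_(module) {}
    long mixSamples(float* buffer, long count, const StreamParameters& sp) override {
        assert(sp.channels == 1 || sp.channels == 2);
        const long n = std::min(count, module_->length - pos_);
        for (long i = 0; i < n * sp.channels; ++i)
            buffer[i] += module_->level;  // mix by addition
        pos_ += n;
        return n;  // fewer than `count` signals that this player has finished
    }
private:
    const ConstantModule* module_;
    long pos_ = 0;
};

std::unique_ptr<DSPModulePlayer> ConstantModule::getPlayer() const {
    return std::make_unique<ConstantPlayer>(this);
}

// Hypothetical parent player: every child adds into the same buffer, and
// children that report completion are destroyed.
class SumPlayer : public DSPModulePlayer {
public:
    void add(std::unique_ptr<DSPModulePlayer> child) { children_.push_back(std::move(child)); }
    long mixSamples(float* buffer, long count, const StreamParameters& sp) override {
        long longest = 0;
        for (auto it = children_.begin(); it != children_.end();) {
            const long n = (*it)->mixSamples(buffer, count, sp);
            longest = std::max(longest, n);
            if (n < count)
                it = children_.erase(it);
            else
                ++it;
        }
        return longest;
    }
private:
    std::vector<std::unique_ptr<DSPModulePlayer>> children_;
};
```

Because mixing is pure addition, the parent in this sketch needs no temporary buffer. It simply passes its own buffer to each child in turn.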
All DSP modules are expected to work with an arbitrary sampling rate. This is not usually done in music production, mainly because filters depend on the sampling rate. However, it is common in real-time situations, since lowering the sampling rate reduces the processor power required. It is possible to design filters to work with an arbitrary rate.

The mixSamples() method returns the number of sample frames generated. Generally this will be the same as the number requested. However, many DSP modules are designed to generate only a finite quantity of data. When a player has generated all of it, it uses the return value to tell the parent player (or the user of the library, who manages the root player) that it has finished.

3.2 The XML

Classes derived from DSPModule generally have a constructor or an initialiser that takes a (root) XML element and sets up a module tree from it. The base class DSPModule contains a static member function readModule(), which identifies an XML element by its name and calls the appropriate constructor. It also recognises the element <external>, which causes a module to be read from another XML file. This mechanism is used by all modules that load children.

The library does not distinguish between .xmi and .xmm files (defined in Section 1.3); the distinction is left to the user. As an example, the MIDIMapping in Figure 3.1 might be stored in an .xmm file that refers to an .xmi file for the GeneralMultiplexer. This .xmi file may in turn defer to further .xmi files for the individual instruments.

3.3 Variables

Instruments have to be able to play at arbitrary pitches and velocities¹. Many filters have cut-off frequencies, resonance levels and the like, and these need to be controllable by the MIDI sequence. MIDI has a plethora of parameters that could be used for this.
It would also be nice to make a filter’s parameters, or perhaps the speed of a volume envelope, depend on pitch or velocity. It would also be useful to have mechanisms to control the tempo (speed) at which a MIDI sequence is played, or to transpose the sequence. The list goes on, and clearly a great deal of flexibility is desired.

¹ Note velocity measures how fast a key was depressed. It is used to effect what classical musicians know as dynamics, and is often simply interpreted as volume.

XMAS uses a system of variables to achieve this flexibility. The system is illustrated in Figure 3.2. The MIDIMappingPlayer passes a set of variables to the constructor for each child. Three variables are shown in the diagram, but many more exist in reality.

[Figure 3.2: A state tree, with variables. The MIDIMappingPlayer has three notes playing. The note with channel = 10, note = 35, velocity = 127 matches nothing in its GeneralMultiplexerPlayer, so no child is constructed. The note with channel = 2, note = 72, velocity = 127 matches channel=2, and a LookupMultiplexerPlayer with look-up index note=72 constructs a SamplePlayer playing harphigh.wav. The note with channel = 10, note = 57, velocity = 127 matches channel=10,note=57, and its SamplePlayer plays cymbals.wav.]

The GeneralMultiplexerPlayers use the variables to decide which child to construct, if any. They pass the variables on to the child constructor. This matters because the LookupMultiplexerPlayer must consult the variables to decide which SamplePlayer to construct, and the SamplePlayers need to know what frequency to play the samples at.

Figure 3.3 shows how variables are set up. Each variable is encapsulated in a Variable object, which incorporates a reference to a value of type double. A Variables object manages the list of Variable objects required by a player. It also keeps a pointer to the Variables object passed down by the parent. For modules that do not need to create any variables of their own, a Variables object need not be constructed.
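The arrangement just described can be sketched in C++ as follows. Variable and Variables are the names from the text. The members, the define() and lookup() methods, and the rule that the nearest definition wins are assumptions made for illustration, not libxmas's actual interface. The sketch does reflect one feature of the design: each Variable refers to a double owned by a player, so lookup() returns a pointer into that player's state.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A named reference to a double owned by some player (Figure 3.3).
struct Variable {
    std::string name;
    double& value;
};

// One player's own variables, plus a pointer to the parent's Variables.
class Variables {
public:
    explicit Variables(const Variables* parent = nullptr) : parent_(parent) {}

    void define(const std::string& name, double& value) {
        assert(!name.empty());
        own_.push_back(Variable{name, value});
    }

    // Hypothetical lookup: search this level (latest definition first), then
    // the parent chain, so a child level can override an inherited name.
    double* lookup(const std::string& name) const {
        for (auto it = own_.rbegin(); it != own_.rend(); ++it)
            if (it->name == name) return &it->value;
        return parent_ ? parent_->lookup(name) : nullptr;
    }

private:
    std::vector<Variable> own_;
    const Variables* parent_;
};
```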
A constructor keeps a pointer to each double it needs. For instance, VolumeEnvelopePlayer stores a pointer to rate. Pointers to Variable or Variables objects are not stored, so these objects can conveniently be destroyed on exit from the constructor.

The use of pointers enables the MIDIMappingPlayer to vary the pitch of a note, or any other variable, over time. The various mixSamples() methods simply dereference such pointers each time they are called. The VariableComputeBlockPlayer (see Section 3.5.5) recalculates rate before every operation involving its child VolumeEnvelopePlayer, in case the note variable has changed.

[Figure 3.3: How variables are implemented. The MIDIMappingPlayer holds the doubles channel (2.0), note (72.0) and velocity (127.0), each wrapped in a named Variable listed in its Variables object. The VariableComputeBlockPlayer holds rate = 2^((note-60)/12) in a Variable named "rate". Its Variables object points to the parent's, and is passed down to the VolumeEnvelopePlayer as that player's parent variables.]

3.4 Parameter Tweaks

Many modules have a built-in ability to adjust their output volume. All modules can cope with an arbitrary sampling frequency, and this capability can also be used, albeit crudely, to vary the pitch. In some situations a parent module would like to tap into these capabilities. For example, a MIDI player will want to tell notes (the child module players) to respond to something like MIDI volume. Ideally, though, the note generator modules should not be unnecessarily specific to MIDI, in case an alternative to MIDI is added one day.

All DSP modules have a method pushParameterTweak(), which takes a parameter name and a double. The name will be something like “volume” or “delta”².
The current value of the given parameter is saved on a stack and then multiplied by the double. Later, a call to popParameterTweak() restores the old value.

² I use the term delta to refer to a frequency ratio. Its meaning, and the choice of terminology, will be clarified in Section 3.5.1 on samples.

3.5 The Modules

3.5.1 Samples

The first module implemented for libxmas was the Sample module. It encapsulates a mono or stereo sample loaded from a .wav file, which can be played forwards or backwards. A section of the sample can be looped, or played repeatedly. Two types of loop are available:
• straight loops, in which the position pointer jumps from the end of the loop back to the start;
• bidirectional loops, in which the direction changes each time the position hits an endpoint.

Bidirectional loops double the period of a sample loop without doubling the size of the data. The longer the period, the less likely it is that the listener will detect the repetition. They are also very useful for effects that sweep up and down periodically.

The output from the SamplePlayer consists of the sample played at any volume and any speed. Adjusting the volume is simple: we multiply each sample by the volume value. The real art of this module is in the code for adjusting the speed. This process is known as resampling.

A value named delta specifies the change in frequency. If delta is 1, the sample plays as it was recorded. If delta is 2, the sample plays twice as fast and is heard an octave higher. If we assume the sampling rates of the sample and the output are equal, delta specifies how many samples to advance in the source for each sample in the destination. It is added to the position pointer each time around the resampling loop, which is why it is called “delta”. The name is used throughout XMAS in the more abstract sense of speed or frequency adjustment.
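The sketch below shows, under stated assumptions, how delta drives the position pointer and how the two loop types behave. It also implements the parameter tweak stack from Section 3.4. SampleCursor, its members and the string parameter names are hypothetical, not the libxmas SamplePlayer interface. The sketch tracks only the position and the parameters; it does not read or scale sample data. It also treats delta as source samples per output sample (equal sampling rates) and does not model the end of a non-looping sample. It reflects at the exact endpoint position, which is a simplification: Figure 3.4 suggests that libxmas repeats the endpoint sample.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class LoopMode { None, Straight, Bidirectional };

// Hypothetical playback cursor for a sample: tracks the position (in source
// samples) as delta is added once per output sample, plus a tweak stack.
class SampleCursor {
public:
    SampleCursor(LoopMode mode, double loopStart, double loopEnd)
        : mode_(mode), loopStart_(loopStart), loopEnd_(loopEnd) {
        assert(loopStart < loopEnd);
        params_["delta"] = 1.0;
        params_["volume"] = 1.0;
    }

    // Save the current value on a stack, then multiply it by `factor`.
    // Returns false (the tweak fails) for parameters this module lacks.
    bool pushParameterTweak(const std::string& name, double factor) {
        auto it = params_.find(name);
        if (it == params_.end()) return false;
        saved_[name].push_back(it->second);
        it->second *= factor;
        return true;
    }

    // Restore the value saved by the matching push.
    void popParameterTweak(const std::string& name) {
        std::vector<double>& stack = saved_[name];
        assert(!stack.empty());
        params_[name] = stack.back();
        stack.pop_back();
    }

    double parameter(const std::string& name) const { return params_.at(name); }
    bool forward() const { return forward_; }

    // Advance by one output sample and return the new position.
    double step() {
        const double delta = params_["delta"];
        pos_ += forward_ ? delta : -delta;
        if (mode_ == LoopMode::Straight) {
            while (pos_ >= loopEnd_) pos_ -= loopEnd_ - loopStart_;  // jump back to start
        } else if (mode_ == LoopMode::Bidirectional) {
            for (;;) {  // reflect off an endpoint and reverse direction
                if (forward_ && pos_ > loopEnd_) {
                    pos_ = 2.0 * loopEnd_ - pos_;
                    forward_ = false;
                } else if (!forward_ && pos_ < loopStart_) {
                    pos_ = 2.0 * loopStart_ - pos_;
                    forward_ = true;
                } else {
                    break;
                }
            }
        }
        return pos_;
    }

private:
    LoopMode mode_;
    double loopStart_, loopEnd_;
    double pos_ = 0.0;
    bool forward_ = true;
    std::map<std::string, double> params_;
    std::map<std::string, std::vector<double>> saved_;
};
```

Pushing a "delta" tweak of 1.5 makes a bidirectional cursor with loop points 2.0 and 4.0 visit 1.5, 3.0, 3.5 (reflected off 4.0), 2.0 and 3.5 (reflected off 2.0). Popping the tweak restores delta to 1.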
The sampling theorem states that when data are sampled, no frequency at or above half the sampling rate can be represented. This cut-off point is known as the Nyquist frequency³. If we attempt to represent a frequency x Hz above the Nyquist frequency, it becomes a frequency x Hz below the Nyquist frequency. This phenomenon is known as aliasing. The reflection is an arithmetic (additive) transformation, whereas music is based on frequency ratios. Aliased components are therefore not harmonically related to the intended sound, so aliasing pollutes the spectrum and reduces quality.

Resampling is a huge discipline, and the term often suggests a thoroughly researched algorithm. It is therefore important to realise that the methods XMAS applies are relatively crude and well known. Interpolation is used, reducing aliasing to an extent sufficient for most applications.

³ Named after Harry Nyquist, whose work underlies the sampling theorem.

Most real-time resamplers keep a pointer into the sample data and effect interpolation by looking at samples before and after the current one. They have to take care not to overrun, and they cannot see transparently across changes of flow such as loop points. This can create an audible click each time a sample loops, even when the continuity across the loop points is perfect.

[Figure 3.4: The history buffer. A sample with a bidirectional loop ending at sample 12, shown together with the contents of the history buffer at seven points during playback: initially (pos = 0), shortly afterwards, just before and just after looping (pos = 12), after looping, and while playing backwards (pos = 7).]

The XMAS library uses a history buffer, which holds the last three samples seen before the one pos points to. The concept is illustrated in Figure 3.4, which shows the state of the history buffer at several points during playback.
Between them, the history buffer and the sample indicated by pos constitute a run of four samples, and the current playback position is considered to lie between the second and third. A subpos variable holds the fractional part of the position, indicating how far between the second and third samples we are. When it reaches 1, it is reduced by 1, pos is incremented and the history buffer is updated.

This method provides perfect continuity in all cases, but as presented it is hardly efficient. The library therefore switches seamlessly to a conventional algorithm shortly after starting and shortly after each change of flow.

Three interpolation functions are provided:
• The first is, ironically, the non-interpolating function, which always takes the second sample verbatim. The output is coarse and suffers from aliasing, but it can be computed quickly and is reminiscent of sounds from old, dearly loved computer systems such as the Commodore Amiga.
• The second function does linear interpolation between the second and third samples. This is a fair compromise, doing only a little more work than the first function in exchange for considerably less aliasing.
• The third function does cubic interpolation, taking all four samples into account. The tangent to the curve at the second sample is parallel to the line joining the first and third samples, and a similar property holds at the third sample. This ensures that the curve and its first derivative are continuous, giving the best sound quality of the three functions. Appendix A derives the equations and presents an optimisation that uses look-up tables to eliminate much of the computation.

The library provides a global variable through which the programmer can set a default interpolation function. The instrument designer can override this for a specific sample by specifying a minimum and maximum quality.
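The three interpolation functions might look as follows in a plain, unoptimised form. This is not the look-up-table version of Appendix A, and all function names are hypothetical. The tangent condition described above fixes the slope at the second sample to (s2 - s0)/2 and at the third to (s3 - s1)/2. Together with the two endpoint values, this determines a unique cubic: the Catmull-Rom spline used here. The hypothetical resampleCubic() reads its four-sample window directly from the data, as the conventional algorithm does. It does not maintain a history buffer across changes of flow, and it handles only non-looping samples.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// s[0..3]: the run of four samples. Playback lies between s[1] and s[2],
// a fraction t (the subpos, 0 <= t < 1) of the way from s[1] to s[2].
using Window = std::array<float, 4>;

// Hypothetical names throughout. No interpolation: take s[1] verbatim.
float interpolateNone(const Window& s, double /*t*/) { return s[1]; }

// Linear interpolation between the second and third samples.
float interpolateLinear(const Window& s, double t) {
    return static_cast<float>(s[1] + t * (s[2] - s[1]));
}

// Cubic (Catmull-Rom) interpolation: the slope at s[1] is (s[2] - s[0]) / 2
// and the slope at s[2] is (s[3] - s[1]) / 2.
float interpolateCubic(const Window& s, double t) {
    const double a = s[2] - s[0];
    const double b = 2.0 * s[0] - 5.0 * s[1] + 4.0 * s[2] - s[3];
    const double c = 3.0 * (s[1] - s[2]) + s[3] - s[0];
    return static_cast<float>(s[1] + 0.5 * t * (a + t * (b + t * c)));
}

// Resample a non-looping mono sample with cubic interpolation. As in the text,
// pos indexes the newest of the four samples and subpos is the fraction; the
// window is read straight from the data, with silence outside it.
std::vector<float> resampleCubic(const std::vector<float>& data, double delta) {
    assert(delta > 0.0);
    const long size = static_cast<long>(data.size());
    auto at = [&](long i) { return (i >= 0 && i < size) ? data[static_cast<std::size_t>(i)] : 0.0f; };
    std::vector<float> out;
    long pos = 2;          // s[3] = data[pos], so playback starts at data[0]
    double subpos = 0.0;
    while (pos - 2 < size) {
        const Window w{at(pos - 3), at(pos - 2), at(pos - 1), at(pos)};
        out.push_back(interpolateCubic(w, subpos));
        subpos += delta;
        while (subpos >= 1.0) {  // carry whole samples into pos
            subpos -= 1.0;
            ++pos;
        }
    }
    return out;
}
```

The cubic reproduces a straight line exactly. For the window {0, 1, 0, 0} at t = 0.5 it gives 0.5625, which is higher than the linear result of 0.5 because the curve is smooth across the peak.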
3.5.2 Volume Envelopes

[Figure 3.5: Two examples of volume envelopes, plotting volume (0 to 1) against time in seconds. Left: an envelope with a sustain point, on a time axis up to 1 second. Right: an envelope with a loop start and loop end, on a time axis marked at 0.05, 0.10 and 0.15 seconds.]

Behaviour

A volume envelope is a graph of volume against time. The VolumeEnvelope module models this graph as a series of linearly connected volume-time pairs, with time increasing monotonically. Each VolumeEnvelope object has one child, known as the subject. The VolumeEnvelopePlayer constructs one player for the subject and applies the envelope to that player’s output. In the case of the right-hand envelope in Figure 3.5, the VolumeEnvelopePlayer’s output will be silent initially, at full volume at 0.05 seconds, and silent again between 0.1 and 0.15 seconds.

A VolumeEnvelope can also manage two loops, each given in terms of a starting node and an ending node. These can be the same node if the envelope is to freeze at that node (see the left-hand example). One of the loops is the sustain loop, and is obeyed only as long as the note is held⁴. The other loop is obeyed at all times.

In Figure 3.5, the left-hand envelope fades a note in quickly, holds the note at full volume, and then fades it out pseudo-exponentially. This is quite usual, and it is the shape used by the envelope applied to flute.wav in Figure 3.1. The right-hand example is a lot more unusual, and potentially rather annoying!

When a volume envelope terminates at zero volume, the VolumeEnvelopePlayer terminates its output (recall Section 3.1.3). In the left-hand example, this happens after one second if the note is released immediately. This behaviour is important. The flute.wav Sample in Figure 3.1 is set to loop indefinitely, but the VolumeEnvelope above it can terminate the output when the note has faded out. This tells the MIDIMapping that the player can be destroyed. If this did not happen, the note would persist in memory and waste resources.

The VolumeEnvelopePlayer is influenced by a variable called rate.
If rate is 1, the output is as expected. If rate is 2, the position in the envelope advances twice as fast, so the left-hand envelope would elapse in half a second for a note released immediately. It is sometimes useful to compute rate from note or delta using a variable compute block (Section 3.5.5).

⁴ There is a variable to indicate when a note is held. See Section 3.6.3.

Implementation

The parameter tweak system allows a module to request from a child an adjustment that remains constant for a while, but it does not allow for gradual changes. Accordingly, the VolumeEnvelopePlayer tries to use tweaks only when the volume is not changing (as while sustaining in the left-hand example). In this case, it can ask the subject to mix samples into the buffer that was passed to the VolumeEnvelopePlayer itself. However, if the volume is changing (or if a tweak fails), the following steps are taken:
• a temporary sample buffer is allocated;
• the buffer is filled with zeros;
• the subject player is asked to mix its samples into the buffer;
• the VolumeEnvelopePlayer mixes the contents of the temporary buffer into its own output buffer, applying the gradual change in the process;
All subsequent operations on the multiplexer player are deferred to the subject player. There are two types of multiplexer: GeneralMultiplexers and LookupMultiplexers. They differ in how they choose a subject module. GeneralMultiplexers scan the modules in reverse order and the first matching module found is used. Each module is given with a set of variable ranges—for instance, one subject might be given with the two ranges 50 ≤ note ≤ 63 and 0 ≤ velocity ≤ 9—and the module matches if all range variables are defined and within the ranges. The extremes are always integers and the variables are rounded to the nearest integer before the comparisons take place. This is a linear search and will not scale well, so a large number of subject modules is not recommended. LookupMultiplexers specify an index variable and manage a table of subject modules. The index variable is rounded to the nearest integer and used as an index into the table. In addition to the table, there is a pointer to a module to be used for values below the table’s lower bound, and another for values above the table’s upper bound. LookupMultiplexers are more limited than GeneralMultiplexers, but the look-up is a constant-time operation. They are perfect for selecting an instrument using the program variable. 5 The example in Figure 3.1, page 15, chooses instruments according to channel instead of program. This was done so the choice could be combined with the step of identifying the percussion channel, but it is not recommended in real applications. Appendix B.5 shows the more usual approach. 3.6. THE MIDI PLAYBACK ALGORITHM 25 Both types of multiplexer can define one or more variables for use in making the decision. These are computed for the selection process only, and are not passed down to the child constructor. 3.5.4 MIDI Mappings MIDIMapping objects are very simple. A MIDIMapping manages the contents of a .mid file, which consists of a few values and a set of byte arrays (the tracks). 
A single subject module defines all the instrument sounds. The MIDIMapping also stores some playback control parameters that are not part of the .mid file. These include extensive looping information and information on what to do when a Note On event is received for a note that is already playing.

The MIDIMappingPlayer, comprising an entire MIDI playback algorithm, is a lot more involved! It is described in full in Section 3.6.

3.5.5 Variable Compute Blocks

A VariableComputeBlock module has one child. It allows new variables to be defined in terms of existing ones, and these are made available to the child. The mathexpr package is used to evaluate the expressions corresponding to the variables.

The new variables are set up and computed when the VariableComputeBlockPlayer is constructed. They are calculated again every time the VariableComputeBlockPlayer is used, so they stay up to date with the variables they depend on. It is possible to override an existing variable while using the existing variable to compute the replacement.

The main use for the VariableComputeBlock at the moment is to control the rate variable for a VolumeEnvelope.

3.6 The MIDI Playback Algorithm

3.6.1 Overview

There are two common types of .mid file. The first, Type 0, contains one track. The second, Type 1, contains a number of tracks that are to be played simultaneously and synchronously. There is a third type (Type 2) with independent tracks, but it is uncommon and libxmas does not support it. For simplicity, libxmas treats Type 0 as a special case of Type 1.

A track is merely a sequence of MIDI events and meta-events. Each is prefixed by a delta-time representing the amount of time separating the event from the previous one. In a conventional MIDI set-up, a sequencer does the timing and sends the MIDI events to the synthesisers while interpreting the meta-events itself.

.mid files adopt the classical concept of beats and subdivide them into delta-time ticks.
The number of ticks per beat can be specified in the file. By default, there are 120 beats per minute, but a meta-event can override this, specifying the tempo as a number of microseconds per beat (though the value presented to a user is usually in beats per minute).

Tracks are a logical subdivision of music. It is up to the author of a .mid file to decide what to put in each track. The sequencer will process all tracks simultaneously and dispatch events to the synthesisers, but information about which track an event came from is lost. Most events specify a MIDI channel (see Section 1.1.1). There is often a correspondence between tracks and channels, but they are distinct concepts not to be confused. The tracks exist in the sequencer, and the channels are distinguished by the synthesisers.

XMAS’s MIDIMappingPlayer behaves like a sequencer connected to a multipart synthesiser. Rather than using hardware timing, it does timing by emulating the synthesiser for precise amounts of time and changing state in between runs of emulation. In more concrete terms, it effects an elapsed time by requesting an appropriate number of samples from the synthesiser. There is no asynchronous behaviour, and the process is deterministic.

The algorithm described here is simplified for conciseness, though some of the extra complexity is alluded to.

3.6.2 State

For each track, the MIDIMappingPlayer maintains three values:
• a position counter for the track, which points to the event bytes (after the delta-time) for the next event to be processed, or holds the value -1 for tracks that have finished playing;
• the number of delta-time ticks to wait before the next event should be processed (the wait value);
• the running status byte (recall Section 1.1.1).

For each channel, the MIDIMappingPlayer stores a list of all the notes currently playing. A note consists of a pointer to a DSPModulePlayer along with some pertinent variables (more on this later).
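The per-track state and the timing rules above can be made concrete with a short sketch. Two details come from the Standard MIDI File format rather than from this text. First, each delta-time is stored as a variable-length quantity: seven bits per byte, most significant group first, with the top bit set on every byte except the last. Second, a tempo of T microseconds per beat makes n ticks last n × T / (ticksPerBeat × 10^6) seconds, and T = 500000 gives the default of 120 beats per minute. TrackState, readVarLen(), processDeltaTime() and ticksToSeconds() are hypothetical names. The sketch shows the arithmetic only, not the MIDIMappingPlayer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-track state listed in Section 3.6.2 (field names are assumptions).
struct TrackState {
    long position = 0;               // offset of the next event bytes, or -1 when finished
    std::uint32_t wait = 0;          // ticks to wait before the next event
    std::uint8_t runningStatus = 0;  // last status byte seen
};

// Read a .mid variable-length quantity starting at `pos`, advancing `pos`.
// Each byte contributes 7 bits; a set top bit means another byte follows.
std::uint32_t readVarLen(const std::vector<std::uint8_t>& bytes, std::size_t& pos) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {  // the format allows at most four bytes
        assert(pos < bytes.size());
        const std::uint8_t b = bytes[pos++];
        value = (value << 7) | (b & 0x7Fu);
        if ((b & 0x80u) == 0) break;
    }
    return value;
}

// Hypothetical helper: add a track's next delta-time to its wait value and
// advance its position to the event bytes that follow (as in Section 3.6.4).
void processDeltaTime(TrackState& track, const std::vector<std::uint8_t>& bytes) {
    assert(track.position >= 0);
    std::size_t pos = static_cast<std::size_t>(track.position);
    track.wait += readVarLen(bytes, pos);
    track.position = static_cast<long>(pos);
}

// Seconds spanned by `ticks`, given the file's ticks per beat and the current
// tempo in microseconds per beat (500000 by default, i.e. 120 beats per minute).
double ticksToSeconds(std::uint32_t ticks, unsigned ticksPerBeat, std::uint32_t tempoMicros = 500000) {
    return static_cast<double>(ticks) * tempoMicros / (static_cast<double>(ticksPerBeat) * 1e6);
}
```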
Some variables that are global to the channel are also stored. These include:
• the pitch wheel position, used on many devices to bend all notes up or down in pitch;
• the channel aftertouch, a measure of the pressure being applied to the keys on an electronic keyboard, averaged over all depressed keys;
• the current program, generally used to select an instrument;
• a multitude of MIDI controller⁶ values, such as the channel volume, the stereo pan (left-right positioning) and the modulation wheel position.

Most of these variables are made available to the instruments, but a few are processed in the MIDIMappingPlayer itself. In particular, the channel volume is applied to all notes using volume parameter tweaks, and the MIDIMappingPlayer takes it upon itself to calculate the final frequency for each note, incorporating pitch bend and other factors into the computation.

As stated in Section 3.5.4, a single DSPModule is used for all the notes. It is likely that the DSPModule will include a LookupMultiplexer switching on the program variable, but it may choose to use the program variable for something else, or not to use it.

Finally, the MIDIMappingPlayer also stores some global state, such as the tempo, the number of times the music has left to loop, and a measure of how much output to generate before the tracks’ wait values will be correct. This last measure is henceforth referred to as the global wait value, and is given in extremely fine units of 2^32 per second.

3.6.3 Notes

As stated, a note consists of a DSPModulePlayer and some pertinent variables. Some of the variables are note, velocity, and held. The held variable is 1 initially and goes to 0 when the Note Off event is encountered. Each instrument should be designed to respond to the held variable in an appropriate manner. The MIDIMappingPlayer never cuts notes off, so DSPModulePlayers should terminate themselves to avoid a build-up of old notes.
At present, VolumeEnvelope is the only module that responds to held. An instrument could incorporate a VolumeEnvelope designed to take the volume down to 0 after the note is released, or it might consist of a sample configured to play once without looping.

⁶ This is distinct from the MIDI controllers mentioned in Section 1.1. This kind of MIDI controller is simply a playback control parameter that can be set by a MIDI event.

3.6.4 Algorithm

The playback algorithm is essentially a form of discrete event simulation. When the MIDIMappingPlayer is constructed, all the variables are initialised and the track pointers are set up. The tracks’ wait values are set to 0, and then the initial delta-time for each track is processed. Processing a delta-time involves adding the delta-time to the track’s wait value and then advancing the track pointer to the following event bytes. Finally, the processMIDI() method is called.

For each track whose wait value is 0, processMIDI() processes MIDI events until it encounters a nonzero delta-time. It then determines how long to wait before another MIDI event will be due on any track (the smallest wait value among the tracks still playing). It subtracts that amount of time from all tracks’ wait values and adds it to the global wait value, scaling as necessary and factoring in the current tempo.

Each time the MIDIMappingPlayer’s mixSamples() method is called, the following steps are undertaken. (It may be helpful to refer back to the description of mixSamples() in Section 3.1.3.)
1. First, we use the global wait value and the sampling rate to determine how many samples to generate. If this number is greater than the count passed to mixSamples(), we reduce it accordingly.
2. Each note (on each channel) is asked to generate that many samples.
3. The global wait value is reduced in accordance with the number of samples generated. If it reaches zero or goes negative, we call processMIDI() until it goes positive again.
(It would always go positive straight away unless there were many delta-time ticks per sample. This is very unlikely, but the while loop does no harm.)
4. If we have not yet generated all the samples that were requested by the caller, we advance the buffer pointer and return to Step 1.

3.6.5 Noteworthy Features

Looping

The following loop control parameters are specified in the MIDIMapping:
• How many times to play the music. If this is 0, the music loops indefinitely.
• Where to loop back to. This can be used to avoid playing an introduction after the first time.
• Whether to stop outstanding notes, reset variables, or both, at the end of the music.
• An optional delay to be inserted at the end before looping. A period of silence at the end is an important part of some music, and it is often omitted in .mid files.
• A flag indicating whether to wait for a whole beat to elapse before looping. Some .mid files end as soon as the last Note Off event is seen, which may be a little too early to loop. Looping on a beat is most likely to sound correct.

Duplicate Note Handling

The MIDIMapping lets the musician specify a duplicate note policy. This comes into play when two Note On events are received for the same note on the same channel without an intervening Note Off event. The following options are available. Except in the case of preempt, a second Note On does not affect the first, and the notes play together.

stack (default). Each Note Off stops the most recently started note among those that were started before the current delta-time tick. If there is no such note, it stops the note most recently started on the current tick. This is useful when one track starts a note at the same time as another track stops it, but the former track is processed first.

strictstack. Each Note Off stops the most recently started note, including any started on this delta-time tick.

queue. Each Note Off stops the note that was started earliest.
preempt. A second Note On stops the first note (but allows it to fade out).

stopall. Notes can accumulate, but each Note Off stops all the accumulated notes.

Portamento

The MIDI Specification provides for a feature called portamento. However, I have found that neither my Creative Labs Sound Blaster Live! card nor the Yamaha Portatone PSR-550 electronic keyboard obeys the relevant MIDI controller values.

Portamento is loosely defined as sliding pitch, as exemplified by the clarinet at the beginning of Gershwin’s Rhapsody in Blue. XMAS implements it by keeping a single note playing and having this note slide to the new pitch every time a Note On event is seen. Note Off events are registered but not acted upon until portamento is disabled.

To effect the slide, libxmas bisects the buffer recursively until a ‘granularity’ measure becomes small enough. The granularity measure was defined as the product of the step length and the size of the step in semitones, since increasing either of these makes the steps more noticeable. The threshold was chosen by ear.

Chapter 4 Evaluation

4.1 Goals Achieved

Here I comment on the requirements listed in Section 2.1.
• “It must be easy for a composer to produce music.” I am able to use my favourite .mid and .wav editors, and writing the XML itself was painless. This goal was met.
• “It must be possible for the composer to keep file sizes down to a minimum.” No compressed audio file formats are supported, so this goal was not met. However, support for compressed audio could be added without major redesign. As explained in Section 2.2.2, doing it properly would have taken more time than was available.
• “The music must be structured so as to allow tweaks on the fly.
Such tweaks might include speed variation, muting of some instruments and instrument substitution.” No tweaks have been implemented for the MIDIMappingPlayer; I chose instead to put the available time towards supporting a good selection of MIDI events. The structure is there, so technically the goal was met.
• “The playback library must be capable of producing the same output on every computer.” As far as I know, it does! I took headphones to the computer lab when my computer was out of order, and the output was the same.
• “It must be able to do so comfortably in real time on a typical home computer for a piece of music of average complexity.” Overall, this goal has been met. Sections 4.1 and 4.6.2 discuss this further. See Section 4.5 for some measurements.

While not all goals have been met at this stage, the project would not need any major redesign to meet any of them. I consider the project a success.

4.2 Evolution of the Plan

4.2.1 Generalisation

The original project proposal (Appendix D) began by emphasising a specific structure in which a MIDI mapping appeared at the root of a tree and had modules called instruments as children. The MIDI mapping would act as a multiplexer on the MIDI program (instrument), and an instrument would multiplex on the note. An instrument would only have samples (or perhaps synthesisers) as children. Any effects such as filters or volume envelopes would be defined within the instrument modules, in the form of embedded effect trees. An effect tree would look much like the data tree in Figure 3.1, but there would be a missing leaf where a sample would be plugged in.

As an afterthought, the project plan mentioned that MIDI mappings, instruments, samples and effects would actually be generalised into a unit known as a DSP module. After the plan was submitted, it gradually became clear that the proposed structure could be simplified.
As the structure stood, it was not clear what should happen if an instrument (properly part of the main state tree) were defined as part of an effect tree: there would be many missing leaves, and it would not be clear into which one a sample should be plugged. Apart from managing the overcomplicated effect trees, the only job an instrument module performed was selecting a sample according to the note being played. I soon realised that this job would be better done by dedicated modules called multiplexers (Section 3.5.3). First, this allowed multiplexing to be done on any variable, not just the current note. Second, it became trivial for a musician to put effects anywhere they were required, above or below any multiplexer. Effect trees were therefore no longer necessary for defining instruments.

4.2.2 Filtering Whole Channels or Tracks

Some functionality has been lost as a result of the changes described in Section 4.2.1. In addition to the instruments, the MIDI mapping was going to manage some effect trees whose job would be to filter one or more whole channels or tracks.

It is still possible to filter whole channels: Figure 4.1 applies a filter to Channels 4, 5 and 6. However, this is rather involved, and the filter’s parameters must be controlled in control.mid, whereas it would sometimes be preferable for cool.mid to control them, especially when filtering single channels. A composer might well reject this approach in favour of filtering every note individually, which clearly wastes processor time.

[Figure 4.1: How to filter a set of channels. Diagram nodes: MIDIMapping (control.mid and cool.mid), Instruments, LookupMultiplexer on the channel variable, and a Filter whose Subject covers Channels 4, 5 and 6.]

I believe the best way to fix this would be to split the MIDI mapping so that ‘channel player’ or ‘track player’ modules could appear as descendants, with effects in between as desired.
Since no such effects were actually implemented as part of the core work, it seemed appropriate to leave splitting the MIDI mapping as an extension.

4.2.3 Reference Counting

I planned to implement reference counting for .wav, .mid, .xmi and .xmm files. When it came to doing it, I wanted to make my code reusable and could not immediately see how to achieve this. It was not an essential part of the project, so I left it as an extension.

4.3 Milestones

I underestimated the time the second work package would take; it consisted of the volume envelope and other components that can be used to define an instrument. A large part of this work was the system of variables discussed in Section 3.3. However, the subsequent three weeks’ work collapsed to a few days, as most of the functionality the .xmm format was going to implement already existed.

In summary, the structure of the project changed to such an extent that the milestones were no longer a good subdivision of the work to be done. Nevertheless, they did their job of providing short-term goals and keeping the project moving.

4.4 Testing

4.4.1 General

Most testing was performed aurally. To aid this, I prepared test files to check that newly added features were working properly, and wrote test programs that call the mixSamples() method for varying numbers of samples at a time. The one part of the project that required more than aural testing was the resampler.

4.4.2 The Resampler

I wrote a visual test for the resampler; Figures 4.2 and 4.3 present six screen shots from the test program. The program accepts the name of a .wav file on the command line. Keystrokes change the volume and delta parameters, select an interpolation function, adjust the looping settings, and switch between mono and stereo. The test calls mixSamples() repeatedly, requesting a random number of sample frames, from 1 to 8, each time.
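Requesting random block sizes checks that a generator's output does not depend on how the caller slices the buffer. The same property can also be checked automatically. The sketch below is not part of libxmas: ToyResampler, render() and chunkingIsInvariant() are hypothetical names, and the toy player only mimics the relevant trait of a real one, namely a fractional read position that must survive between calls. It renders 300 frames in one call, renders them again in random chunks of 1 to 8 frames, and requires the two buffers to be bit-identical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generator: plays a non-looping source with
// linear interpolation, stepping its read position by `delta` per output
// frame. Like a real sample player, its position persists between calls.
class ToyResampler {
public:
    ToyResampler(std::vector<float> src, double delta)
        : src_(std::move(src)), delta_(delta), pos_(0.0) {
        assert(delta_ > 0.0);
    }

    // Mix (add) `frames` output frames into dst. Frames beyond the last
    // interpolable source position contribute silence.
    void mixSamples(float* dst, int frames) {
        for (int i = 0; i < frames; ++i) {
            std::size_t idx = static_cast<std::size_t>(pos_);
            if (idx + 1 < src_.size()) {
                float t = static_cast<float>(pos_ - static_cast<double>(idx));
                dst[i] += src_[idx] + t * (src_[idx + 1] - src_[idx]);
            }
            pos_ += delta_;
        }
    }

private:
    std::vector<float> src_;
    double delta_;
    double pos_;
};

// Render `total` frames, either in one call or in random chunks of 1..8
// frames (the block sizes used by the visual test).
std::vector<float> render(const std::vector<float>& src, double delta,
                          int total, bool chunked, unsigned seed) {
    ToyResampler player(src, delta);
    std::vector<float> out(static_cast<std::size_t>(total), 0.0f);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> chunkSize(1, 8);
    int done = 0;
    while (done < total) {
        int n = chunked ? std::min(chunkSize(rng), total - done) : total - done;
        player.mixSamples(out.data() + done, n);
        done += n;
    }
    return out;
}

// True if every chunked rendering is bit-identical to the one-call rendering.
bool chunkingIsInvariant(unsigned trials) {
    std::vector<float> src(64);
    for (std::size_t i = 0; i < src.size(); ++i)
        src[i] = std::sin(0.3f * static_cast<float>(i));
    const int total = 300;  // deliberately runs past the end of the source
    const std::vector<float> whole = render(src, 0.25, total, false, 0);
    for (unsigned seed = 1; seed <= trials; ++seed)
        if (render(src, 0.25, total, true, seed) != whole)
            return false;
    return true;
}
```

Exact equality is a deliberate choice: a correct player performs exactly the same arithmetic whichever way the buffer is sliced, whereas one that failed to carry its position over between calls would produce different output and fail the check.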
This test proved invaluable in the construction of the resampler. Allowing many cases to be verified in a short space of time, it found many bugs that might otherwise have gone unnoticed.

[Figure 4.2: The visual resampler test. Panels: initial display, with volume = 1 and delta = 1; delta reduced to 1/4, with cubic interpolation at work; a straight loop (note how the curve is smooth everywhere).]

[Figure 4.3: The three interpolation functions. Panels: a bidirectional loop with cubic interpolation; the same loop with linear interpolation; the same loop with no interpolation.]

4.5 Profiling

Profiling was done using gprof, after the code was compiled and linked with g++’s -pg switch and run on my AthlonXP 1800+ running at 1145 MHz, a typical modern configuration. The jou5cred.xmm file featured on the Demo CD was played, and the audio output was piped into ALSA’s aplay command. Figure 4.4 shows the results for the three different interpolation modes.

   % cumulative     self                self    total
 time   seconds  seconds      calls  ms/call  ms/call  name
54.46     11.00    11.00     243660     0.05     0.05  void SamplePlayer::doResample...InterpCubicF...
12.33     13.49     2.49     141864     0.02     0.02  void VolumeEnvelopePlayer::applyVolumeRamp...
 9.01     15.31     1.82                               main
 6.63     16.65     1.34     179704     0.01     0.03  VolumeEnvelopePlayer::mixSamples...
 4.90     17.64     0.99    1161462     0.00     0.00  std::_Rb_tree<...ParameterTweak*...>::find...
 1.53     17.95     0.31     894780     0.00     0.00  DSPModulePlayer::PassDownTweak::applyTweak...
 1.44     18.24     0.29    1161462     0.00     0.00  DSPModulePlayer::pushParameterTweak...
 0.79     18.40     0.16   22656000     0.00     0.00  std::floor(float)
 0.79     18.56     0.16       1417     0.11    12.67  MIDIMappingPlayer::mixSamples...
Profiling results with the cubic resampler.

   % cumulative     self                self    total
 time   seconds  seconds      calls  ms/call  ms/call  name
50.32      9.51     9.51     243660     0.04     0.04  void SamplePlayer::doResample...InterpLinearF...
14.07     12.17     2.66     141864     0.02     0.02  void VolumeEnvelopePlayer::applyVolumeRamp...
 7.83     13.65     1.48                               main
 7.41     15.05     1.40     179704     0.01     0.03  VolumeEnvelopePlayer::mixSamples...
 5.87     16.16     1.11    1161462     0.00     0.00  std::_Rb_tree<...ParameterTweak*...>::find...
 1.96     16.53     0.37   22656000     0.00     0.00  std::floor(float)
 1.16     16.75     0.22         36     6.11     7.50  Sample::readSample...
 1.11     16.96     0.21       1417     0.15    11.84  MIDIMappingPlayer::mixSamples...
 1.01     17.15     0.19    1161462     0.00     0.00  DSPModulePlayer::pushParameterTweak...
Profiling results with the linear resampler.

   % cumulative     self                self    total
 time   seconds  seconds      calls  ms/call  ms/call  name
51.81      8.61     8.61     243660     0.04     0.04  void SamplePlayer::doResample...InterpNoneF...
13.72     10.89     2.28     141864     0.02     0.02  void VolumeEnvelopePlayer::applyVolumeRamp...
 6.50     11.97     1.08                               main
 6.44     13.04     1.07     179704     0.01     0.03  VolumeEnvelopePlayer::mixSamples...
 6.08     14.05     1.01    1161462     0.00     0.00  std::_Rb_tree<...ParameterTweak*...>::find...
 1.68     14.33     0.28       1417     0.20    10.66  MIDIMappingPlayer::mixSamples...
 1.50     14.58     0.25   22656000     0.00     0.00  std::floor(float)
 1.32     14.80     0.22    1161462     0.00     0.00  DSPModulePlayer::pushParameterTweak...
 1.20     15.00     0.20     894780     0.00     0.00  DSPModulePlayer::PassDownTweak::applyTweak...
Profiling results with the non-interpolating resampler.

Figure 4.4: Some profiling results

The resampler uses just over half the processor time. Considerable proportions go towards the volume ramping code in the VolumeEnvelopePlayer, discussed in Section 4.6.2, and towards the code in main that converts the output to 16-bit integers and writes it out; the latter is not a concern, since it is merely part of the test program and has not been optimised. Additionally, a noteworthy amount of time is spent processing parameter tweaks; this would deserve investigation given more time.

Surprisingly, the choice of interpolation function does not make much difference to the amount of processor time used by the resampler.
(The ‘self seconds’ column is the most appropriate measurement for this comparison.) I suspect that the code generated by the compiler is sub-optimal, and that the per-sample overhead outweighs the cost of the interpolation function itself.

Despite the above concerns, when compiled without the profiling overhead, the test program used 32.640 seconds of processor time to play jou5cred.xmm through aplay with cubic interpolation, as reported by Linux’s time command. The real time reported was 4 minutes 20.481 seconds (260.481 seconds). This equates to an average CPU usage of 32.640 / 260.481 ≈ 12.5%, which is comfortable.

4.6 Comments

4.6.1 Samples

Refer back to Figure 3.4 and observe how the history buffer begins filled with zeros. This ensures that the curve makes a smooth departure from the centre line as a sample starts. Unfortunately, the end of playback is another matter. If the sample in Figure 3.4 were set not to loop, and instead ended where the loop end is marked, then the output from the SamplePlayer would terminate after state 4. Ideally, the contents of the history buffer should be allowed to phase out and be replaced by zeros before the output terminates.

Luckily, this is rarely a problem. Most samples are set to loop and are faded out by an envelope. Those samples that are not set to loop usually include their own fade-out, however brief, so the output that is not generated would be very close to silence anyway.

4.6.2 Volume Envelopes

As mentioned in Section 3.5.2, the volume envelope implementation is not particularly efficient. An alternative implementation would be to use parameter tweaks and adjust the volume in small steps. I rejected this idea during implementation because it would create some clicking. However, the MIDI protocol itself cannot vary a parameter smoothly over time, so if a channel is faded in or out, the fade will have to be done in steps anyway.
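A stepped volume change clicks because the signal jumps by the full size of the step within a single sample; spreading each step over a short linear ramp keeps every per-sample jump small. The C++ sketch below illustrates only that idea. applyGainRamp() and maxJump() are hypothetical helpers, not the libxmas VolumeEnvelopePlayer::applyVolumeRamp code, and measuring the largest sample-to-sample jump is a deliberately crude stand-in for judging clicks by ear.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not libxmas code): multiply `frames` samples by a gain
// that moves linearly from `from` to `to`, reaching `to` on the last frame.
// Applying this at each volume step spreads the change over `frames` samples
// instead of making it all at once.
void applyGainRamp(float* buf, int frames, float from, float to) {
    assert(frames > 0);
    for (int i = 0; i < frames; ++i) {
        float t = static_cast<float>(i + 1) / static_cast<float>(frames);
        buf[i] *= from + (to - from) * t;
    }
}

// Largest absolute difference between adjacent samples: a crude proxy for
// how audible a discontinuity (click) is likely to be.
float maxJump(const std::vector<float>& buf) {
    float worst = 0.0f;
    for (std::size_t i = 1; i < buf.size(); ++i)
        worst = std::max(worst, std::fabs(buf[i] - buf[i - 1]));
    return worst;
}
```

For a constant signal of 1.0, switching abruptly from gain 0 to gain 1 leaves a jump of 1.0 between two adjacent samples, whereas applyGainRamp(buf, 64, 0.0f, 1.0f) limits every jump to 1/64.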
A better implementation would use steps and have generators endowed with the ability to remove clicks themselves. The SamplePlayer could do this by including the volume ramping functionality in the resampling loop, where it would cost considerably less.

4.6.3 MIDI Playback

It would be unreasonable to expect every MIDI event or feature to be handled, and quite a few are missing from libxmas. However, all the commonly occurring ones have been implemented, and all the .mid files I have tested play correctly. As evidenced in Section 3.6.5, libxmas sometimes outperforms commercially available MIDI players!

4.6.4 Flexibility

I am exceptionally pleased with the flexibility XMAS offers. As I was preparing music, I felt that some of the instruments were too loud on the high notes and too quiet on the low notes. No problem: XMAS allowed me to compensate by adjusting the velocity variable. I wanted one instrument to decay more slowly for low notes. No problem: the volume envelope will respond if I set the rate variable. This is leagues ahead of what an Amiga mod-based file or a SoundFont (an instrument definition for the MIDI player on a Creative Labs Sound Blaster) can do.

4.7 Problems Encountered

4.7.1 STL Containers

Early in the project’s development, having used g++ to compile the code so far, I decided to try Intel’s icc and see if the code would run faster (on my AMD processor). The difference in execution time was incredible. Investigating the cause of the immediate segmentation fault, I discovered behaviour at the machine-code level that suggested a compiler bug. When the project had progressed further and the code exhibited the same problem with g++, I knew the fault was my own.

After a while I realised what the problem was. Some Standard Template Library containers, notably std::vector, reallocate memory as they grow and have to move the objects to a new location (node-based containers such as std::list do not). If pointers to the objects exist anywhere, those pointers become invalid. The solution was to construct containers only of pointers to objects, so that the pointers would be moved and the objects would not. This hitch cost me a couple of days, but it did not throw the project off track.

4.7.2 XML Files

xmlwrapp seemed unable to read XML files unless they were in the working directory. This could pose problems for composers who want to use directories to organise their instruments. I did not have time to investigate this problem.

4.7.3 Code Size

A generic resampling algorithm, including the support for the history buffer described in Section 3.5.1, was written once in the form of C++ templates. It is instantiated with three different interpolation functions, with versions to play forwards and backwards, versions for mono and stereo source, and versions for mono and stereo destination: up to 3 × 2 × 2 × 2 = 24 combinations. This causes a combinatorial explosion in executable code size. Compiled with optimisation and stripped of all symbols, the MIDI playback test is 357 kB. Compressed with UPX [4] it is 120 kB, which is still a lot for just the music playback code. The size is likely to multiply again when surround sound support is added to XMAS, since each new channel configuration multiplies the number of combinations.

I do not currently have a solution to this problem. Changing the template parameters into variables would likely cause an unacceptable performance hit, and building code on the fly would tie me to a specific architecture and is prohibited on some machines for security reasons.

Chapter 5 Conclusions

I am extremely pleased with the outcome of this project. While a few problems are outlined in the Evaluation, they are minor, and it is easy to forget how much of the project went well. I have learnt a lot from the project, particularly in terms of instrument design and C++ experience, and I shall certainly use XMAS for games I write in the future.

A Demo CD is enclosed.
It includes some aural test results and two complete pieces of music. A track listing is given in Appendix C. After doing a little more work on XMAS, I intend to release the library as an open source project at http://xmas.sf.net/. Please visit this site if you are interested in XMAS.

Bibliography

[1] The MIDI Manufacturers Association. The complete MIDI 1.0 detailed specification. http://www.midi.org/about-midi/specinfo.shtml, 1996.
[2] B. N. Davis. Rock ‘n’ Spin. http://bdavis.strangesoft.net/?page=rockspin, 2000.
[3] P. Jones. xmlwrapp. http://pmade.org/pjones/software/xmlwrapp/, 2001–2003.
[4] Markus F. X. J. Oberhumer and László Molnár. UPX, the Ultimate Packer for eXecutables. http://upx.sf.net/, 1996–2002.
[5] B. N. Davis, R. J. Ohannessian and J. Cugnière. DUMB, Dynamic Universal Music Bibliothèque. http://dumb.sf.net/, 2002, 2003.
[6] Y. Ollivier. Mathematical expression parser in C++. http://www.eleves.ens.fr/home/ollivier/mathlib/mathexpr.html, 1997–2000.
[7] B. Pietsch. The L math processor. http://lmp.sf.net/, 2000.
[8] M. Richards. How to prepare a dissertation in LaTeX. http://www.cl.cam.ac.uk/users/mr/demodiss.tar, 2001.

Appendix A The Cubic Interpolation Function

The cubic interpolation function is based on the following formula, where $x$ is the interpolated value, $t$ is the fractional part of the sample position, and $a$, $b$, $c$ and $d$ are given in terms of four existing samples, $x_0$, $x_1$, $x_2$ and $x_3$. We are interpolating between samples $x_1$ and $x_2$.

$$x = at^3 + bt^2 + ct + d \tag{A.1}$$

$$\frac{dx}{dt} = 3at^2 + 2bt + c \tag{A.2}$$

At $t = 0$ we desire $x$ to evaluate to $x_1$, and at $t = 1$ we desire $x$ to evaluate to $x_2$. Substituting these values into Equation A.1 gives us the following equations:

$$d = x_1 \tag{A.3}$$

$$a + b + c + d = x_2 \tag{A.4}$$

At $t = 0$, we desire the curve’s gradient to be parallel to a line joining samples 0 and 2, so $\frac{dx}{dt} = \frac{1}{2}(x_2 - x_0)$.
Likewise, the gradient at $t = 1$ should be parallel to a line joining samples 1 and 3, so $\frac{dx}{dt} = \frac{1}{2}(x_3 - x_1)$. Substituting into Equation A.2 gives the following.

$$\tfrac{1}{2}(x_2 - x_0) = c \tag{A.5}$$

$$\tfrac{1}{2}(x_3 - x_1) = 3a + 2b + c \tag{A.6}$$

Equations A.3, A.4, A.5 and A.6 can be solved simultaneously, giving the following matrix equation:

$$\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \frac{1}{2}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 2 & -5 & 4 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \tag{A.7}$$

The formula for $x$ can also be expressed in matrix form:

$$x = \begin{pmatrix} t^3 & t^2 & t & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} \tag{A.8}$$

Substituting A.7 into A.8 gives

$$x = \frac{1}{2}\begin{pmatrix} t^3 & t^2 & t & 1 \end{pmatrix}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 2 & -5 & 4 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \tag{A.9}$$

Since matrix multiplication is associative, we can elect to do the multiplication by powers of $t$ first. The result is a vector of four values, each dependent on $t$ alone. It is therefore possible to use four look-up tables, each indexed by $t$, to construct this vector, after which the only necessary operation is a four-dimensional dot product:

$$x = \frac{1}{2}\begin{pmatrix} T_0(t) \\ T_1(t) \\ T_2(t) \\ T_3(t) \end{pmatrix}\cdot\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad \begin{aligned} T_0(t) &= -t^3 + 2t^2 - t \\ T_1(t) &= 3t^3 - 5t^2 + 2 \\ T_2(t) &= -3t^3 + 4t^2 + t \\ T_3(t) &= t^3 - t^2 \end{aligned} \tag{A.10}$$

Furthermore, observe the following results:

$$T_0(1-t) = -(1-t)^3 + 2(1-t)^2 - (1-t) = t^3 - t^2 = T_3(t) \tag{A.11}$$

$$T_1(1-t) = 3(1-t)^3 - 5(1-t)^2 + 2 = -3t^3 + 4t^2 + t = T_2(t) \tag{A.12}$$

Only two look-up tables are therefore required, and Equation A.10 becomes

$$x = \frac{1}{2}\begin{pmatrix} T_0(t) \\ T_1(t) \\ T_1(1-t) \\ T_0(1-t) \end{pmatrix}\cdot\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \tag{A.13}$$

and this is the formula used by libxmas.

Appendix B Some Example XML

B.1 volenv.xml

This is the test for the VolumeEnvelope module. A <volenv> element contains a subject and a list of nodes. The first node is assumed to be at time zero. This example plays a sample of a harpsichord, applying an envelope (the inner one) that begins at full volume, fades to silence, immediately fades to five times the full volume, and then fades down to twice the full volume before sticking there (end of envelope). The output does not terminate, since the final node is nonzero.
A second envelope, much like the right-hand example pictured in Figure 3.5, is applied to the result. The output can be heard on the enclosed Demo CD.

<?xml version='1.0'?>
<volenv>
  <subject>
    <volenv>
      <subject>
        <sample filename="harpsi.wav" />
      </subject>
      <node value="1" />
      <node time="0.15" value="0" />
      <node time="0.4" value="5" />
      <node time="0.7" value="2" />
    </volenv>
  </subject>
  <node value="1" loopstart="" />
  <node time="0.025" value="0" />
  <node time="0.035" value="0" />
  <node time="0.06" value="1" loopend="" />
</volenv>

B.2 compute.xml

This is the test for the VariableComputeBlock module. The same harpsichord sample is used, but this time the note at which it was recorded is specified. In the test, notes numbered from 48 to 72 are generated in quick succession. For the musicians, this constitutes a chromatic scale covering the octaves below and above middle C (60). The compute block assigns a value to velocity that starts at (72 − 48) × 5 + 7 = 127 and decreases linearly to (72 − 72) × 5 + 7 = 7. The result, a scale that starts loud and fades out, can be heard on the Demo CD.

<?xml version='1.0'?>
<compute>
  <variable name="velocity" value="(72-note)*5+7" />
  <subject>
    <sample filename="harpsi.wav" note="A4" />
  </subject>
</compute>

B.3 clarinet.xmi

This is an instrument definition for a clarinet. Three samples are used for different note ranges, and a GeneralMultiplexer (<multiplexer>) chooses between them. The note at which each sample was recorded is given. The samples are not quite at the right pitch, so the frequency is overridden to correct this. The samples are set to loop, and the loop start point is given; the loop end point defaults to the end of the sample.

clrinetl.wav is enclosed in a volume envelope with a constant amplification of 70%; it sounded too loud against the other samples, so I added the envelope to compensate. Around the multiplexer, there is a VariableComputeBlock.
Its purpose is to reduce the note velocity for high notes and increase it for low notes. This was judged necessary aurally, but a scientific explanation would be that higher frequency waves transmit greater power. The use of the velocity variable is a hack: what we want to adjust is the volume, and SamplePlayers simply interpret the note velocity as a volume. The outermost volume envelope applies a rapid, pseudo-exponential fade-out when the note is stopped.

<?xml version="1.0"?>
<volenv>
  <subject>
    <compute>
      <variable name="velocity" value="velocity*2^((60-note)/24)" />
      <subject>
        <multiplexer>
          <generator>
            <subject><volenv>
              <subject><sample filename="clrinetl.wav" note="C5" frequency="44127.51" loop="on" loopstart="6044" /></subject>
              <node value="0.7" />
            </volenv></subject>
          </generator>
          <generator>
            <range variable="note" low="63" />
            <subject><sample filename="clrinetm.wav" note="G5" frequency="44127.49" loop="on" loopstart="4160" /></subject>
          </generator>
          <generator>
            <range variable="note" low="75" />
            <subject><sample filename="clrineth.wav" note="E6" frequency="44096.84" loop="on" loopstart="4409" /></subject>
          </generator>
        </multiplexer>
      </subject>
    </compute>
  </subject>
  <node value="2" sustainpoint="" />
  <node time="0.05" value="1" />
  <node time="0.10" value="0.4" />
  <node time="0.15" value="0.14" />
  <node time="0.20" value="0.04" />
  <node time="0.25" value="0" />
</volenv>

B.4 pizz.xmi

pizz.xmi defines string instruments (the violin family) played pizzicato, where the performer plucks the strings instead of drawing a bow across them. The definition includes another example of a multiplexer, and a volume envelope. Note that this envelope has no sustain point, so the pseudo-exponential fade-out happens immediately. Here, a VariableComputeBlock sets the rate variable, which the envelope obeys. The result is a long decay for low notes and a short decay for high notes.
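The expressions in these examples are ordinary arithmetic on MIDI note numbers, with ^ denoting exponentiation. The C++ transcription below is only a sketch for checking the numbers by hand, not the libxmas expression evaluator. The function names are hypothetical, / is treated as real division, and pizzDecaySeconds() rests on the assumption that a rate of r makes the envelope run through its node times r times as fast.

```cpp
#include <cassert>
#include <cmath>

// compute.xml (B.2): velocity = (72-note)*5+7
int scaleTestVelocity(int note) {
    return (72 - note) * 5 + 7;
}

// clarinet.xmi (B.3): velocity = velocity*2^((60-note)/24),
// i.e. half the velocity two octaves above middle C, double two octaves below.
double clarinetVelocity(double velocity, int note) {
    return velocity * std::pow(2.0, (60 - note) / 24.0);
}

// pizz.xmi (B.4): rate = 2^((note-60)/12), doubling with each octave.
double pizzRate(int note) {
    assert(note >= 0 && note <= 127);
    return std::pow(2.0, (note - 60) / 12.0);
}

// Assumption: the envelope, whose last node is at 2.00 s, is traversed
// `rate` times faster, so its total length is 2.00 / rate seconds.
double pizzDecaySeconds(int note) {
    return 2.00 / pizzRate(note);
}
```

With these definitions, scaleTestVelocity(48) is 127 and scaleTestVelocity(72) is 7, matching Section B.2, and under the stated assumption the pizzicato decay lasts 4 seconds at note 48, 2 seconds at middle C and 1 second at note 72.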
Once again, the output is on the Demo CD, this time generated by the MIDI player using a simple scale.mid file that covers the entire range of a piano (88 notes). This is a greater range than the instruments represented can manage!

<?xml version="1.0"?>
<compute>
  <variable name="rate" value="2^((note-60)/12)" />
  <subject>
    <volenv>
      <subject>
        <multiplexer>
          <generator>
            <subject><sample filename="pizzl.wav" note="E4" frequency="11000" loop="on" loopstart="8403" /></subject>
          </generator>
          <generator>
            <range variable="note" low="58" />
            <subject><sample filename="pizzh.wav" note="E5" frequency="11000" loop="on" loopstart="6372" /></subject>
          </generator>
        </multiplexer>
      </subject>
      <node value="2" />
      <node time="0.10" value="1" />
      <node time="0.25" value="0.4" />
      <node time="0.45" value="0.18" />
      <node time="0.70" value="0.10" />
      <node time="1.00" value="0.06" />
      <node time="1.35" value="0.03" />
      <node time="2.00" value="0" />
    </volenv>
  </subject>
</compute>

B.5 general.xmi

This defines a whole set of instruments, referring to separate files for the individual definitions. It also accounts for percussion as defined by General MIDI (see Section 1.1.1). There are two LookupMultiplexers. The outer one switches on the channel variable: all notes on Channel 10 are rendered using the percussion.xmi definition, which chooses samples according to the note variable. For all other channels, the inner multiplexer uses the program variable to select an instrument definition.
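The listing that follows only makes sense if a later generator takes precedence over an earlier one: program 45 falls inside the range 40 to 55, yet it is meant to select pizz.xmi rather than strings.xmi, and the range-less first generator acts as a default. The C++ sketch below models that reading of the two LookupMultiplexers with a 128-entry table filled in document order. It is an assumption about the behaviour, not the libxmas implementation; Generator, LookupTable and chooseDefinition() are hypothetical names, and an empty <range /> is modelled as covering every value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One <generator>: an inclusive range of values and the definition it selects.
// An empty <range /> is modelled as 0..127.
struct Generator {
    int low;
    int high;
    std::string file;
};

// Hypothetical model of a LookupMultiplexer: generators are applied in
// document order, so a later range overwrites any earlier one it overlaps.
class LookupTable {
public:
    explicit LookupTable(std::vector<Generator> gens) : gens_(std::move(gens)) {
        table_.fill(-1);  // -1 means no generator covers this value
        for (std::size_t g = 0; g < gens_.size(); ++g) {
            assert(0 <= gens_[g].low && gens_[g].low <= gens_[g].high && gens_[g].high <= 127);
            for (int v = gens_[g].low; v <= gens_[g].high; ++v)
                table_[static_cast<std::size_t>(v)] = static_cast<int>(g);
        }
    }

    // The definition selected for `value`, or "" if nothing covers it.
    std::string select(int value) const {
        if (value < 0 || value > 127) return "";
        int g = table_[static_cast<std::size_t>(value)];
        return g < 0 ? "" : gens_[static_cast<std::size_t>(g)].file;
    }

private:
    std::vector<Generator> gens_;
    std::array<int, 128> table_;
};

// Channel first (Channel 10 is percussion), then program, as in general.xmi.
std::string chooseDefinition(int channel, int program) {
    static const LookupTable channels(std::vector<Generator>{
        {0, 127, "inner program lookup"}, {10, 10, "percussion.xmi"}});
    static const LookupTable programs(std::vector<Generator>{
        {0, 127, "piano.xmi"},    {13, 13, "xylophon.xmi"}, {27, 27, "bass.xmi"},
        {40, 55, "strings.xmi"},  {45, 45, "pizz.xmi"},     {46, 46, "harp.xmi"},
        {47, 47, "timpani.xmi"},  {56, 56, "trumpet.xmi"},  {57, 57, "trombone.xmi"},
        {60, 60, "horn.xmi"},     {68, 69, "oboe.xmi"},     {70, 70, "bassoon.xmi"},
        {71, 71, "clarinet.xmi"}, {73, 73, "flute.xmi"}});
    const std::string byChannel = channels.select(channel);
    return byChannel == "percussion.xmi" ? byChannel : programs.select(program);
}
```

Under this model, chooseDefinition(1, 45) yields pizz.xmi, chooseDefinition(1, 50) yields strings.xmi, and any program on Channel 10 yields percussion.xmi.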
<?xml version="1.0"?>
<lookup variable="channel">
  <generator>
    <range />
    <subject>
      <lookup variable="program">
        <generator> <range /> <subject><external filename="piano.xmi" /></subject> </generator>
        <generator> <range value="13" /> <subject><external filename="xylophon.xmi" /></subject> </generator>
        <generator> <range value="27" /> <subject><external filename="bass.xmi" /></subject> </generator>
        <generator> <range low="40" high="55" /> <subject><external filename="strings.xmi" /></subject> </generator>
        <generator> <range value="45" /> <subject><external filename="pizz.xmi" /></subject> </generator>
        <generator> <range value="46" /> <subject><external filename="harp.xmi" /></subject> </generator>
        <generator> <range value="47" /> <subject><external filename="timpani.xmi" /></subject> </generator>
        <generator> <range value="56" /> <subject><external filename="trumpet.xmi" /></subject> </generator>
        <generator> <range value="57" /> <subject><external filename="trombone.xmi" /></subject> </generator>
        <generator> <range value="60" /> <subject><external filename="horn.xmi" /></subject> </generator>
        <generator> <range low="68" high="69" /> <subject><external filename="oboe.xmi" /></subject> </generator>
        <generator> <range value="70" /> <subject><external filename="bassoon.xmi" /></subject> </generator>
        <generator> <range value="71" /> <subject><external filename="clarinet.xmi" /></subject> </generator>
        <generator> <range value="73" /> <subject><external filename="flute.xmi" /></subject> </generator>
      </lookup>
    </subject>
  </generator>
  <generator>
    <range value="10" />
    <subject><external filename="percussion.xmi" /></subject>
  </generator>
</lookup>

B.6 Example Music

Here is jou5cred.xmm, a simple MIDI mapping. By default, looping is off.

<?xml version="1.0"?>
<midimapping midifilename="jou5cred.mid">
  <external filename="general.xmi" />
</midimapping>

rockspin-piece10.xmm is set to loop.
The outer volume envelope fades the music out once it starts repeating. This was done for demonstration purposes, but it shows that a MIDIMapping is no different from any other module!

<?xml version='1.0'?>
<volenv>
  <subject>
    <midimapping midifilename="rockspin-piece10.mid" loop="on">
      <external filename="general.xmi" />
    </midimapping>
  </subject>
  <node value="1" />
  <node time="170" value="1" />
  <node time="180" value="0" />
</volenv>

Both these pieces may be heard on the Demo CD.

Appendix C Demo CD Track Listing

All tracks were generated using cubic interpolation in the resampler unless otherwise stated.

1. The output from the sample player test, playing harpsi.wav. The sample was recorded at 22050 Hz and the output is at 44100 Hz, so resampling is taking place.
2. The output from the volume envelope test described in Section B.1.
3. The result of the variable compute block test presented in Section B.2.
4. This track first shows the outcome of using a single piano sample for the whole range of the instrument, illustrating the need for multiple samples. Next, recordings of twelve notes spanning the entire range of the instrument are all adjusted to middle C and played in sequence, showing how different they are.
5. This track contains a scale covering every note on the piano. The astute listener will hear each change of sample, confirming that the multiplexer is at work. It is hoped that a casual listener can ignore the changes, especially in real music, where they are usually less noticeable.
6. The same scale is played using the strings pizzicato definition from Section B.4. Note how the decay rate varies with the pitch. Some aliasing can be heard on the high notes, but such high notes are rare.
7. The scale from the last track is played again with linear interpolation. Some unwanted high frequencies can be heard on some notes, but the effect is subtle.
8. The scale is played with the non-interpolating resampler.
The difference is very noticeable, particularly on low notes. Sometimes this effect is desired, and XMAS does indeed allow a musician to request it for a specific sample!
9. jou5cred.xmm, the first example of a complete piece of music (Section B.6). The underlying .mid file was my contribution to a game called Jou 5, which sadly its author has no further interest in distributing.
10. rockspin-piece10.xmm, the second example. The music comes from the final three levels of my game Rock ‘n’ Spin [2]. It loops, and Section B.6 shows how even the fade-out could be done by libxmas.

The material on the Demo CD is Copyright © 2004 Ben Davis.

Appendix D Project Proposal

Computer Science Tripos Part II Project Proposal

XMAS: an open MIDI and sample-based music system

B. N. Davis, Robinson College

Originator: B. N. Davis

22 October 2003

Special Resources Required

My own computer (if it breaks down I can use the computer room)

Project Supervisor: N. E. Johnson
Director of Studies: Dr A. Mycroft
Project Overseers: Dr I. Pratt & Dr G. Winskel

Introduction

The MIDI protocol is very useful in the production of music. Devices may use it to communicate performance events (such as when a note is pressed or released) with each other. It is an industry standard with widespread software and hardware support.

Unfortunately, it has been misused. Most software-based music editors can dump MIDI data to standard .mid files. These files store the aforementioned performance events, but not much else. MIDI module manufacturers have collaborated to implement a scheme called General MIDI, which specifies a standard instrument mapping (so a piano will be a piano everywhere), but synthesisers still vary wildly, and a piece that sounds great on one device is likely to sound unbalanced on another (for example, the string section may be too loud). This poses a problem for the distribution of .mid files.
The Amiga gave birth to ‘music modules’, which are files capable of storing samples in addition to the sequence data. The PC has expanded them beyond the Amiga’s limitations, and there are now several editors (‘trackers’) and players of varying quality. While not properly standardised, modules can be trusted to sound correct on any system if you are careful which software you use. However, they are limited, and the trackers are not very user-friendly.

Nowadays, music can be distributed using lossy compression. This is satisfactory in many situations, but not all: dial-up Internet users have to wait a long time to download such files, which is especially a problem if, for example, a game developer wants to offer a product for download and include one music track for each level. There are also people who can hear the degradation that results from the lossy compression.

This project will produce a solution that has the advantages of both MIDI and Amiga modules without necessarily the large size or quality loss of general-purpose streamed audio. Lossy compression may be used if small files are required; otherwise, lossless compression may be used if sound quality is paramount. That said, forms of compression will be considered extensions to this project, and I will not mention them again until the ‘Extensions’ section.

Description

A musician may produce one or more sequence files (.mid for the purposes of this project) using any existing software and hardware, and produce or obtain a set of samples (.wav for this project). Instruments may be specified in .xmi (XML Instrument) files; these are a layer above samples and may, for example, specify volume envelopes and different samples for different note ranges. Then a .xmm (XML Music Mapping) file ties the samples, instruments and sequences together. These two XML formats will be specified by this project. They will both allow author information and other human-readable notes to be embedded.
DSP trees are used at various points. These are trees of DSP modules, which are filters capable of generating, modifying or combining PCM data. Modules may have parameters to control them, and a tree will contain expressions for evaluating the parameters; a simple expression parser will be used here, and MIDI’s continuous controls will be accessible as variables. Volume envelopes and the sample and sequence players will be implemented as DSP modules. The sample player will support stereo and offer three different interpolation options: none, linear and cubic. The user chooses a preferred algorithm, but instruments may override this. Where a tree is used to modify sound, it may have a ‘missing leaf’, at which the input will be generated. When it is used to generate sound, it may not. A tree may never have multiple missing leaves. An instrument file specifies a generator DSP tree for each note of the scale. It may also specify modifier DSP trees to apply to note ranges and to all notes. A typical instrument will use a volume envelope at the very least. In a mapping file, one sequence is designated the root. This is the one that will be played. Each MIDI instrument is assigned an XML instrument, or another mapping to use as a sub-sequence; either of these may be a separate file or a nested XML block. Sub-mappings will inherit their parents’ instrument mappings, but these may be overridden. A mapping file may assign a modifier DSP tree to each track or each MIDI channel (but not both since these are two different ways of subdividing the same set), and to the whole. Since samples, instruments, sequences and mappings may be reused, they will be loaded once and reference-counted. It is worth reiterating that all the components that have been described will be treated as DSP modules. There will be a simple command-line playback tool that writes raw PCM data to stdout; this can be piped into ALSA’s aplay command. I will use C++ for this project. 
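The DSP-tree idea described above can be illustrated with a small C++ sketch in which every node exposes the same rendering call and modifiers own their subjects. Everything here is hypothetical and simplified (mono only, no parameters or expressions): DSPModule, ConstantSource, Gain and plugIn() are illustrative names, and only the name mixSamples() is borrowed from the libxmas method mentioned in Section 4.4.1. A Gain with no subject plays the role of a tree with a missing leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical common interface: every node in a DSP tree renders PCM on demand.
class DSPModule {
public:
    virtual ~DSPModule() = default;
    // Add `frames` mono samples into dst.
    virtual void mixSamples(float* dst, int frames) = 0;
};

// A generator leaf; a constant level stands in for a sample player.
class ConstantSource : public DSPModule {
public:
    explicit ConstantSource(float level) : level_(level) {}
    void mixSamples(float* dst, int frames) override {
        assert(frames >= 0);
        for (int i = 0; i < frames; ++i) dst[i] += level_;
    }
private:
    float level_;
};

// A modifier: renders its subject into scratch space, scales it and mixes
// the result out. Without a subject it is a tree with a missing leaf, and
// the input must be plugged in before it produces anything.
class Gain : public DSPModule {
public:
    explicit Gain(float gain) : gain_(gain) {}
    void plugIn(std::unique_ptr<DSPModule> subject) { subject_ = std::move(subject); }
    void mixSamples(float* dst, int frames) override {
        if (!subject_) return;  // missing leaf: contributes silence
        scratch_.assign(static_cast<std::size_t>(frames), 0.0f);
        subject_->mixSamples(scratch_.data(), frames);
        for (int i = 0; i < frames; ++i) dst[i] += gain_ * scratch_[i];
    }
private:
    float gain_;
    std::unique_ptr<DSPModule> subject_;
    std::vector<float> scratch_;
};
```

For example, a Gain(0.5f) renders silence until plugIn() attaches a ConstantSource(1.0f), after which each frame it mixes is 0.5. Because generators and modifiers share one interface, a modifier's subject can itself be another modifier, which is what lets the tree nest arbitrarily.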
Extensions

Perhaps the two most important extensions are support for other sample formats, particularly those using lossy compression such as Ogg Vorbis, and support for a ready-to-play archive format to make the music easier to distribute. The .xma (XMM Archive) format will allow for this; note that it is not XML, since it needs to store binary data compactly. It will be able to store any combination of samples, sequences and the two XML formats; typically it will be used for a whole piece of music, a shared sample database and files that refer to this database, or a whole album. There will be support for author information and human-readable notes, typically to be used to describe the collection as a whole, since there are already human-readable notes for individual pieces and instruments. Lossless compression will be used, ideally with algorithms optimised for the various types of data. The name of the project comes from the ultimate ideal of being able to distribute music as .xma files (a nice take on .wma files); ‘XMAS’ is short for ‘XMA System’.

Other possible extensions include extra DSP modules (such as filters, distortion and echo), click removal for when samples start, stop and loop (not for clicks in the actual sample data), support for surround sound, a GUI for editing and testing instrument and mapping files, a stand-alone player, and XMMS and Winamp plug-ins. Finally, the product could be developed into a complete music authoring environment, with facilities for recording and editing sample and MIDI data in the GUI, but this could become a whole project in itself.

Work that has to be done

The core implementation work breaks down into the following work packages:

1. Specify the DSP module interface. Implement a reference-counted .wav loader and a sample player.
2. Implement the volume envelope module. Specify the .xmi format. Implement a reference-counted .xmi loader. Create a .xmi file for testing.
3. Implement a reference-counted .mid loader. Specify the .xmm format. Implement a reference-counted .xmm loader. Create a .xmm file for testing.
4. Implement the MIDI sequence player and the .xmm player.
5. Implement the command-line player. Create a more involved piece of music for testing and, later, demonstration.

I have already written plenty of pieces of music that can be exported to .mid, and I will work with these, so creating example music will not take up a disproportionate amount of time.

Each of these work packages takes the project to a new level of complexity. First it will be able to play .wav files. Then it will support instruments and then, two work packages later, whole pieces of music. I will be able to perform tests at the end of each work package and thus fix most of the bugs as I go along.

Success Criteria

By the end of the project I will have a piece of music in .xmm (or .xma) format that takes advantage of multi-sample instruments and volume envelopes. The software will be able to play this music reliably and accurately; the specifications for the .xmm and .xmi file formats will define what constitutes accurate playback. If this is achieved, the project will be considered a success.

At the time of the progress report, I expect to have a program that plays a simple hard-coded tune using a .xmi file loaded from disk, in addition to textual output to verify the integrity of loaded .xmm and .mid files.

Difficulties to Overcome

The following main tasks will have to be undertaken before the project can be started:

• Learn about XML, select a library for XML parsing, and familiarise myself with the library. The library must be able to read XML from an arbitrary stream.
• Find a suitable mathematical expression parser.
• Secure documentation on the .wav and .mid formats, and on the MIDI protocol itself.

Starting Point

I have worked with MIDI before, and am reasonably familiar with its main features.
I have written a player for Amiga module-based formats, which incorporates a sample player with cubic interpolation; its code will serve as a reference. The player, including source code, is available at http://dumb.sf.net/.

Resources

All development work will be carried out on a Linux PC equipped with a standard PCM sound interface. I will be using my own machine primarily, but if it breaks down, I will bring headphones and use the machines in the William Gates Building. I will be using CVS to manage my source code, and the repository will be archived and uploaded nightly to Pelican, keeping one previous copy each time.

Work Plan

All dates listed here are Fridays.

24 October 2003 – 7 November 2003 (two weeks)
Preliminary work. Do the tasks listed in ‘Difficulties to Overcome’. In addition, set up a CVS server on my system and a script for making regular back-ups of the repository. Lay out the project and set up a Makefile system.

7 November 2003 – 28 November 2003 (three weeks)
Do the first work package listed in ‘Work that has to be done’. The project should be able to play samples.

28 November 2003 – 19 December 2003 (three weeks)
Do the second work package. This is the .xmi support. Hard-code a test that loads an instrument and plays several notes in sequence.

19 December 2003 – 9 January 2004 (three weeks)
Do the third work package. This is support for loading .mid and .xmm files, but not for playing them. Generate text output for the purposes of verifying loaded data structures.

9 January 2004 – 30 January 2004 (three weeks)
Write the progress report and prepare for the presentation. Include test results so far.

30 January 2004 – 27 February 2004 (four weeks)
Do the fourth work package. This is the music playback code and is liable to take longer. Test aurally.

27 February 2004 – 19 March 2004 (three weeks)
Do the fifth and final work package. This is the command-line player, the music for demonstration, and final bug-fixes.

19 March 2004 – 14 May 2004 (eight weeks)
This time will be spent on the dissertation. I will work on some extensions if I finish the dissertation early. 14 May is the final deadline.