Faculty of Mathematics and Computing
The Open University
Walton Hall, Milton Keynes MK7 6AA, UK

Computing for Commerce and Industry Programme
Sample dissertation for M801 - MSc Project

A PROTOCOL TO GUARANTEE THE ORDER OF MESSAGE PROCESSING WHEN INTEGRATING ENTERPRISE APPLICATIONS

A thesis submitted in partial fulfilment of the requirements for the Open University's Master of Science Degree in Software Development

Duncan Millard
U1796407
17 September 2004
Word Count: 14914

PREFACE

My thanks to my supervisor Dr. Rob Walker for his guidance and suggestions, every one of which helped to improve the academic quality of this work.

My thanks also go to my friends and family who have supported me throughout the last 3½ years of "spare time" study, particularly over the past few months as I neglected them to immerse myself in this final piece of my MSc. Now you can see what I have been up to while you were out enjoying the summer! My proofreaders also deserve a special mention for helping so selflessly with an arduous job.

This thesis is dedicated to Liz. Without your unconditional and limitless support, understanding and cups of coffee, it would have been a very different piece of work. Thank you.

TABLE OF CONTENTS

Preface
Table of Contents
List of Figures
List of Tables
Abstract
Glossary

Chapter 1: Introduction
  1.1. Overview
  1.2. Thesis Structure

Chapter 2: Enterprise Application Integration
  2.1. Introduction
    2.1.1. Integration: Invisible Glue
    2.1.2. B2B and B2C
  2.2. The Evolution of Software
    2.2.1. Before Integration: The Mainframe
    2.2.2. Integration through Data Sharing
    2.2.3. Stovepipe Systems
  2.3. Automating Integration
    2.3.1. Integration through Middleware
    2.3.2. Enterprise Applications
    2.3.3. Point-to-Point Integration
  2.4. A New Approach to Integration
    2.4.1. Hub and Spoke Architecture
    2.4.2. Integration Engines
    2.4.3. Automating Business Processes
    2.4.4. EAI in the Real World
    2.4.5. Asynchronous, Long-Running Processes
  2.5. The Message Ordering Problem for EAI
    2.5.1. Cost of Integration
    2.5.2. Extensibility
    2.5.3. Asynchronicity
    2.5.4. Hub and Spoke Architecture
    2.5.5. Resilience
    2.5.6. Flexibility and Efficiency
  2.6. Summary

Chapter 3: An Investigation of Message Ordering
  3.1. Introduction
  3.2. The Problem of Message Ordering
    3.2.1. An Informal Language for Discussion
    3.2.2. Illustrative Scenarios
  3.3. Message Ordering Approaches
    3.3.1. Total Ordering
    3.3.2. Causal Ordering
    3.3.3. Actual Causal Ordering
    3.3.4. Application Ordering
  3.4. Summary
  3.5. Conclusion

Chapter 4: Inferred Causal Ordering: A Protocol for EAI
  4.1. Introduction
  4.2. Guaranteeing Causal Ordering
  4.3. Obtaining Causality from Application Programs
    4.3.1. Basic Ordering Information
    4.3.2. XML
  4.4. Inferred Causal Ordering
    4.4.1. Causal Message Groups
    4.4.2. Cross-Group Dependencies
  4.5. Maintaining and Conveying Causality
    4.5.1. Causal Log
    4.5.2. Message Annotation
  4.6. Preserving Causality
    4.6.1. Remote Message Stores
    4.6.2. Dynamic Message Tag
  4.7. Removing Causality Information
  4.8. Evaluation
    4.8.1. Evaluation against EAI Features
    4.8.2. Theoretical Efficiency of the Protocol
  4.9. Summary

Chapter 5: Protocol Evaluation
  5.1. Introduction
  5.2. Evaluation Approach
  5.3. Performance Measures
    5.3.1. Test Measures: Theoretical Efficiency
    5.3.2. Test Measures: EAI Features
    5.3.3. Test Measures: Non-Quantitative Testing
  5.4. Simulation System
    5.4.1. System Variables
  5.5. Test Cases
  5.6. Summary

Chapter 6: Test Results
  6.1. Introduction
  6.2. Methodology
  6.3. Message Ordering
  6.4. Message Overhead
  6.5. Message Log Overhead
    6.5.1. Causal Information
    6.5.2. Dynamic Message Log Size
  6.6. Efficiency
  6.7. Resilience
  6.8. Limitations of the Simulation System
  6.9. Test Conclusions
    6.9.1. Performance Gains of the Protocol
    6.9.2. Comparison of Theoretical Efficiency with Optimal Example
  6.10. Summary

Chapter 7: Conclusions

Appendix B: Simulation System Architecture
  B.1 Overview
  B.2 Sending System Simulation
  B.3 EAI Hub
    B.3.1 Rules Engine
    B.3.2 Integration Engine: BizTalk 2004
    B.3.3 Message Logs
  B.4 Destination System Simulation

References
Index

LIST OF FIGURES

Figure 1: Stovepipe applications
Figure 2: Point-to-point integration between multiple systems
Figure 3: Hub and spoke integration
Figure 4: An informal language for message ordering
Figure 5: Sequential send, non-sequential receive
Figure 6: Sequential send, non-sequential receive
Figure 7: Dependent concurrent send
Figure 8: Total ordering
Figure 9: Causal ordering
Figure 10: A simple XML document
Figure 11: XML messages
Figure 12: A cross-group dependency
Figure 13: Message enveloping
Figure 14: An algorithm exhibiting O(n) efficiency
Figure 15: An algorithm exhibiting O(n^2) efficiency
Figure 16: An algorithm exhibiting O(1) efficiency
Figure 17: Message overhead independent of number of destinations
Figure 18: Message overhead independent of number of causal message groups
Figure 19: How message overhead varies with cross-group dependencies
Figure 20: How causal log size varies with the number of destinations
Figure 21: How causal log size varies with the number of causal groups
Figure 22: How causal log size varies with the number of destinations and causal groups
Figure 23: How dynamic log size varies for different message quantities
Figure 24: How the sending to delivery ratio affects dynamic log size
Figure 25: How variable latency affects dynamic log size
Figure 26: How run time reduces as the number of causal groups increases
Figure 27: Baseline dynamic log size for resilience testing
Figure 28: The impact of an unavailable destination on dynamic log size
Figure 29: Comparing theoretical efficiencies
Figure 30: Simulation system architecture
Figure 31: Test message format
Figure 32: Simulation system causal group identifier

LIST OF TABLES

Table 1: Suitability of A& et al.'s ordering protocol for EAI
Table 2: Suitability of Kshemkalyani and Singhal's ordering protocol for EAI
Table 3: Suitability of Cheng et al.'s ordering protocol for EAI
Table 4: Suitability of Singh and Badarpura's ordering protocol for EAI
Table 5: Suitability of message ordering approaches for EAI
Table 6: Summary of an inferred causal ordering protocol for EAI
Table 7: Test measures
Table 8: In-order message delivery
Table 9: Running times, showing system overload above 50 messages
Table 10: Impact of latency on run times
Table 11: Run times with unavailable destination

ABSTRACT

One of the significant emerging trends in modern computing is that of Enterprise Application Integration (EAI) - the connecting of two or more individual applications via custom, automated business processes. Within this still maturing discipline, there are a number of significant technical problems that are yet to be overcome. Integrated applications typically communicate by asynchronous message passing, resulting in the delivery of messages to remote systems in an order potentially different to their creation. This can have undesirable side effects such as a loss of data integrity.

Distributed systems research describes a number of different approaches to solving the message ordering problem. In order to assess these approaches, criteria are developed that must be met for a protocol to be suitable for use in EAI. Measured against these criteria, existing protocols are found to be lacking.

This thesis presents a novel message ordering protocol designed specifically for EAI, in which ordering information is inferred from the contents of the messages themselves. The overhead of the protocol and the efficiency gains it offers compare well with other implementations of traditional causal ordering.

GLOSSARY

A2A         Application to Application Integration
API         Application Programming Interface
B2B         Business to Business Integration
B2C         Business to Consumer Integration
EAI         Enterprise Application Integration
Enterprise  An organization that uses computers. In practice, the term is applied much more often to larger organizations than smaller ones (NIH, n.d.)
ERP         Enterprise Resource Planning
XML         extensible Markup Language

Chapter 1
INTRODUCTION

1.1. Overview

Businesses are increasingly realising the financial and competitive benefits of Enterprise Application Integration (EAI). Many new technologies and tools are emerging that greatly reduce the cost and complexity of implementing an integration strategy (Medjahed et al., 2003; Linthicum, 2004).

As with any relatively new technology, there are still a number of issues to resolve. Integrated applications typically communicate via asynchronous message passing (Bussler, 2002b) and are therefore comparable in nature to traditional distributed asynchronous systems. A problem common to both domains is the need to control the order of message processing between the different components of the system (Lamport, 1978) in order to ensure data integrity and consistency across each process or application.

Consider for example two messages from a Human Resources (HR) system destined for a payroll system - one notifying that a new employee has joined a company, and one to set the employee's salary level. If the payroll system receives the message to set the employee's salary before it receives the "new employee" message, the salary message may be rejected, causing inconsistency of data between the two systems.
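
The HR example above can be made concrete with a minimal sketch. It is not part of the protocol presented in this thesis; the class name `PayrollSystem`, the message fields and the employee identifier `E42` are all hypothetical, chosen purely to show how the same two messages leave the payroll system in different states depending on their arrival order.

```python
class PayrollSystem:
    """Hypothetical payroll system that applies messages in arrival order."""

    def __init__(self):
        self.salaries = {}   # employee id -> salary (None until one is set)
        self.rejected = []   # messages that could not be applied

    def receive(self, message):
        kind, employee = message["type"], message["employee"]
        if kind == "new_employee":
            self.salaries[employee] = None
        elif kind == "set_salary":
            if employee not in self.salaries:
                # The employee is not yet known, so the update is rejected.
                self.rejected.append(message)
            else:
                self.salaries[employee] = message["amount"]


# Illustrative messages only; the field names are assumptions.
new_employee = {"type": "new_employee", "employee": "E42"}
set_salary = {"type": "set_salary", "employee": "E42", "amount": 30000}

in_order = PayrollSystem()
for m in (new_employee, set_salary):
    in_order.receive(m)

out_of_order = PayrollSystem()
for m in (set_salary, new_employee):
    out_of_order.receive(m)

print(in_order.salaries)       # salary recorded for E42
print(out_of_order.salaries)   # E42 exists but has no salary
print(out_of_order.rejected)   # the salary message was rejected
```

In the out-of-order run the HR system believes a salary has been set while the payroll system holds none, which is the data inconsistency described above.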

This thesis investigates existing solutions to the message ordering problem, evaluates their suitability for EAI, and then presents and evaluates a message ordering protocol that is specifically designed to address the unique features of EAI.

1.2. Thesis Structure

This thesis has the following structure:

Chapter 2 presents a description of Enterprise Application Integration based on a literature search, showing how EAI is the latest of many attempts to connect computers together. It discusses the typical architecture of modern EAI systems, and explains why a message ordering problem exists.

Chapter 3 describes the message ordering problem in a wider context and presents a literature review of existing message ordering approaches. Protocols implementing these approaches are then assessed in the context of their suitability for EAI.

Chapter 4 presents a proposal for an EAI ordering protocol based on the problems found in Chapter 3. The approach to testing the protocol appears in Chapter 5, with the results and conclusions appearing in Chapters 6 and 7 respectively.

Appendices, a list of references and an index conclude the thesis.

Chapter 2
ENTERPRISE APPLICATION INTEGRATION

2.1. Introduction

Linthicum (2000) describes Enterprise Application Integration as "the unrestricted sharing of data and business processes amongst any connected applications and data sources in the enterprise". Whilst informative, this description describes a perfect enterprise - something that is unlikely to exist in reality. More realistically, EAI is the connection of two or more applications in a way that combines their data and processing capabilities, controlled by a central business process.

2.1.1. Integration: Invisible Glue

We have all used integrated systems, probably without realising it.
When making a purchase on a credit card, the retailer's systems talk to the credit card provider to authorise the sale. The credit card's payment processing system connects to the account management system, which in turn connects to a billing system. At the end of the month, the account management system integrates with a "supermarket-style" loyalty card and awards points based on the amount spent. Without the ability to integrate these separate applications programmatically, humans would have to be involved at every step, increasing the cost and time needed for processing.

2.1.2. B2B and B2C

Enterprise Application Integration (sometimes referred to as Application-to-Application Integration, or A2A) is one part of the wider field of integration. Other categories are Business-to-Business Integration (B2B) and Business-to-Consumer Integration (B2C). EAI can be thought of as the view from within the enterprise, whereas the other approaches move outside of the enterprise. Each of these types of integration has its own features and problems, but each shares a common core: connecting two or more entities together, whether those entities are applications, businesses or people.

2.2. The Evolution of Software

It is worthwhile looking at integration - or its closest equivalent at the time - in the context of each major development in computing architecture. This gives an understanding of how today's problems have evolved and of the increased need for EAI.

2.2.1. Before Integration: The Mainframe

Early computer platforms used in industry were mainframes responsible for all processing in the business. With all data and processing centrally located, there was no real concept of integration; if another process needed data it was already present on the machine and accessible, albeit with programming sometimes required to access it (Linthicum, 2000).

2.2.2. Integration through Data Sharing

The arrival of desktop-based computing led to a rapid growth in computing in the enterprise (Sinha, 1992). Data was no longer stored on the central mainframes; it was instead spread across many desktop machines in an uncontrolled manner. Departments were able to store data locally in ways that had not previously been possible. This created a vast number of "data islands", with data - some of it business-critical in nature - stored and managed in an ad-hoc fashion on desktop computers (Hackney, 1996).

As the desire grew to use information held by others, technologies such as file servers enabled the sharing of data between different machines and software packages. In a similar manner to data reuse on the mainframe, file conversion utilities facilitated data sharing.

2.2.3. Stovepipe Systems

Recognising that many people within a department were relying on data sharing to achieve their goals, department-level applications began to emerge. These applications addressed a particular department's processes, such as general ledger or sales management (Linthicum, 2000). They typically operated in isolation, with very little attention given to interoperability with other systems. This type of system was known as a stovepipe application, for reasons that are apparent from Figure 1.

Figure 1: Stovepipe applications

Data held in a stovepipe application is typically referred to as being in a "data silo" - a large data store that is only accessible through the application itself.

2.3. Automating Integration

2.3.1. Integration through Middleware

With no way to interrogate a data silo directly from outside of the particular stovepipe application, there was a need to pass data from one stovepipe to another in a more controlled, but automated, manner.
The umbrella name given to the group of technologies used to connect applications in this manner is middleware. There are many definitions of middleware. One of the best I have seen is:

    Software that provides a link between separate software applications. Middleware [...] connects two applications and passes data between them. (Fehai Student

This definition is specific enough to describe the role of middleware, without tying it to any particular technology.

Applications began to support interoperability with different proprietary middleware software. With these interfaces in place, some of the first truly automated integration began to emerge. Using applications together no longer involved custom or manual steps to extract data from an application; data was instead made available directly by each one. As these applications grew in size, a new category of system developed - the enterprise application.

2.3.2. Enterprise Applications

Enterprise applications are large, complex inter-departmental applications. They typically feature complex business logic and provide functionality for many groups of users in an organisation. Enterprise applications could replace the individual data islands and departmental-level applications, or provide new, enterprise-wide functionality.

Enterprise applications are more likely than other categories of system to feature support for advanced middleware. The last decade has seen the emergence of sophisticated middleware technologies, including CORBA, DCOM, Screen Wrappers, and JavaBeans (Themistocleous et al., 2001; Vinoski, 2002; Reyes et al., 2003). Enterprise applications are also more likely to expose application-programming interfaces (APIs) for accessing their functionality, thereby facilitating their integration with other systems.

2.3.3. Point-to-Point Integration

The use of middleware to connect applications led to point-to-point integration. This is the general term given to a direct connection between two computer products or systems. Due to the reliance on a direct connection, the addition of a third system requires two new links. Similarly, the addition of a fourth requires three more links. Figure 2 shows a number of systems connected via point-to-point integration.

Figure 2: Point-to-point integration between multiple systems

Point-to-point integration is attractive as a "quick fix" technology, but as a strategic integration architecture it is constrained by a number of limitations. Truman (2001) describes the point-to-point approach as "a spaghetti of interfaces closely binding systems together". He points out that systems of this type "normally exhibit the traits of high levels of complexity, risk and escalating cost".

A survey conducted by Themistocleous et al. (2001) examined the integration of Enterprise Resource Planning (ERP) systems with other systems. All respondents said that point-to-point integration was not the best approach for integrating their systems due to the maintenance problems it introduces. Further, Themistocleous et al. estimate that integrating x applications using a point-to-point approach requires the development of x*(x-1)/2 interfaces.

Espinosa and Pulido (2002) agree. Two additional drawbacks they identify are that point-to-point integration tends to be invasive to the applications involved, and that business processes are difficult to change without changing the code of the applications. Custom interfaces need to be developed to cope with messages from other systems, the development of which may require a high degree of understanding of those systems.
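
The growth implied by the x*(x-1)/2 estimate can be tabulated with a short sketch. The function name `point_to_point_interfaces` is an illustrative assumption; the formula itself is the one attributed above to Themistocleous et al., applied under the assumption that every application is connected to every other.

```python
def point_to_point_interfaces(x):
    """Interfaces needed to connect x applications pairwise: x*(x-1)/2."""
    return x * (x - 1) // 2


for x in range(2, 7):
    total = point_to_point_interfaces(x)
    # Links added by the most recently connected application.
    added = total - point_to_point_interfaces(x - 1)
    print(f"{x} applications: {total} interfaces ({added} new)")
```

The "new" column reproduces the observation made earlier in this section: the third system adds two links and the fourth adds three, so each additional application costs more to connect than the one before it.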
It is clear from these studies that point-to-point integration will lead to unsustainable, rapidly escalating costs as the number of systems increases.

2.4. A New Approach to Integration

2.4.1. Hub and Spoke Architecture

By the late 1990s, a number of factors began to converge that fuelled an increased demand for more complex integration. The extensible Markup Language, commonly known as XML (W3C, 2000), emerged as a de facto, transparent way of representing application data. XML received substantial backing from companies including IBM and Microsoft, but its adoption as a recommendation in 1998 by the W3C served to cement the place of XML in modern software (Eyal and Milo, 2001; Gosain et al., 2003). The industry also agreed on web services as the XML-based standard for inter-application communication (Levitt, 2001), offering significant benefits for application integration (Stal, 2002; Kreger, 2003).

This new cross-platform, cross-vendor approach, coupled with an increasing desire to connect applications and the high cost of point-to-point integration, led to a new way of thinking. Hub and spoke integration emerged, offering a centralised approach similar to that seen with client-server computing (Linthicum, 2004) and a way to control integration complexity. Figure 3 illustrates why this architecture is known as hub and spoke.

Figure 3: Hub and spoke integration

Each spoke represents a path of communication between the integration hub and the application at the end of the spoke, effectively a single point-to-point integration between the hub and each application (Hohpe and Woolf, 2004).
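
The idea that each application needs only one connection, to the hub, can be sketched as follows. This is a minimal in-memory illustration rather than a description of any real integration engine: the class `IntegrationHub`, its `register` and `send` methods and the application names are all hypothetical.

```python
class IntegrationHub:
    """Hypothetical in-memory hub: each application has a single spoke."""

    def __init__(self):
        self.spokes = {}  # application name -> callable that delivers a message

    def register(self, name, deliver):
        # Adding an application adds exactly one spoke, regardless of how
        # many applications are already connected.
        self.spokes[name] = deliver

    def send(self, destination, message):
        # Applications never talk to each other directly; the hub forwards.
        self.spokes[destination](message)


hub = IntegrationHub()
inboxes = {name: [] for name in ("hr", "payroll", "billing", "crm")}
for name, inbox in inboxes.items():
    hub.register(name, inbox.append)

hub.send("payroll", {"from": "hr", "body": "new employee"})

n = len(hub.spokes)
print(f"{n} applications, {n} spokes")
print(f"point-to-point would need {n * (n - 1) // 2} interfaces")
print(inboxes["payroll"])
```

With four applications the hub needs four spokes, whereas the pairwise estimate from section 2.3.3 gives six interfaces; the gap widens as more applications are added.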
As with point-to-point integration, applications connect to the hub using middleware - for example message queuing or Web Services. The central control of connections means that it is cost effective to offer support for a wide range of connection methods.

2.4.2. Integration Engines

At the centre of the hub is typically a specialised integration engine, specifically designed to cope with the needs of application integration. Companies such as Tibco, IBM, SeeBeyond and Microsoft have produced integration engines (Bussler, 2002a,b; Medjahed et al., 2003), and this is still a growing area of the industry.

Integration hubs typically provide the following features (Linthicum, 2000):

- Message transformation from one application's format to another
- Intelligent routing to decide which message to send to which application
- A rules engine to automate business processes

Intuitively, all three points require that the format of incoming messages is well known by the integration engine. Further, the automation of business processes requires that the integration engine has an implicit understanding of the data contained within those messages, as described in the next sub-section.

2.4.3. Automating Business Processes

Automated business processes, commonly known as orchestrations, are responsible for managing all interactions between the various systems and for making decisions based on the data that flows through them.

A typical example is an orchestration for processing purchasing requests from employees. The orchestration logic uses its knowledge of the message format to extract the amount of the purchase request. It then uses its implicit understanding of that data to decide whether the request is within the employee's budgetary limit.
If so, the request is routed directly to the purchasing system in a format recognised by that system. If not, the request could be converted to an email containing the employee's name and purchase request, which is then sent to a purchasing supervisor.

Orchestrated business processes are typically long running in nature (Kuo et al., 2003), in contrast to those carried out using a database, which tend to be short in duration.

2.4.4. EAI in the Real World

In section 2.4.2 I mentioned that many commercial vendors have produced integration engines. To help put this into a real-world context, I will briefly present three EAI case studies and scenarios drawn from industry publications and promotional literature. It is important to remember that these examples come from commercial magazine editorials and vendor-published case studies. As such, no academic conclusions should be drawn from what are essentially marketing and opinion pieces. Despite this caveat, they do accurately represent some of the current uses of EAI in industry, based on my own experiences working in this field.

Magazine Subscriptions

A publishing company receives magazine subscription information from two sources: either once a day in bulk, from a third party data-collection company who process postal applications, or on an individual ad-hoc basis from agents taking telephone bookings. In both cases, a common business process runs to administer the subscription records, executing tasks such as updating the subscriber database and issuing a receipt to the subscriber (Altman and Altman, 2004).

Content Brokering

General Motors uses EAI technology as a "broker", coordinating the communication between a number of different applications. The broker converts data between the many formats in use by the applications, and allows additional legacy systems to be added to the integrated whole (VaLl, 2000).
Field Force Enablement

Accenture offer a solution known as Field Force Enablement, which allows field workers to access data held in enterprise systems (Accenture, 2003). A work management system is used to assign tasks to employees, who then use handheld computers to report on the status of these tasks. Updates are automatically sent back to an integration hub, which updates all of the affected enterprise systems.

2.4.5. Asynchronous, Long-Running Processes

As illustrated in the previous section, and as described in the technical documentation of commercial integration engines (e.g. Microsoft, 2004), integration engines are designed for high volume, enterprise-scale processing. As such, they typically rely on an asynchronous, message-passing approach to communication (Bussler, 2002b), meaning that no single application is tightly coupled to any other.

The combination of asynchronous message passing and long-running transactions means that the processing of a particular message will take an unpredictable length of time. As a result, it is not possible to know the order in which incoming messages will emerge from the integration hub ready for delivery to other applications.

2.5. The Message Ordering Problem for EAI

As illustrated in section 1.1, a variable order of message processing can lead to problems with data integrity. At the human level, the HR example could have led to an employee not receiving their first salary cheque. This shows that the message ordering problem in hub and spoke integration can have a real and visible impact.

I will now examine some of the features of EAI that affect message ordering. This will allow the evaluation of existing ordering approaches and protocols to assess their suitability for use in hub and spoke EAI.
2.5.1. Cost of Integration

According to Attachmate Corporation (Attachmate Corporation, 2004), a major cost of integrating an application occurs when the application needs modification in order to work in the integrated environment. This conclusion is supported by the work of Espinosa and Pulido (2002). The ideal EAI-centric message ordering protocol should therefore be non-invasive to the integrated application - that is, the application should not need modification in order either to supply ordering information with its messages, or to understand ordering information on messages that it receives.

2.5.2. Extensibility

The Content Brokering example in section 2.4.4 (Vah, 2000) demonstrates that, over time, the applications participating in an integrated system will vary. Similarly, the Field Force Enablement solution makes no assumptions about which work management or resource scheduling package is in use, or even that such an application exists in the integrated system. An EAI protocol therefore must not require explicit or implicit knowledge of other applications in the system to function. This also allows an incremental approach to implementation, such as that described in Emmerich et al. (2001), rather than requiring a high-risk "all-in-one" implementation.

2.5.3. Asynchronicity

As discussed in section 2.4.5, one of the main features of an integration engine is that it uses asynchronous communication. Hence, it is essential that any protocol includes a level of decoupling between the sending of a message and its receipt and delivery.

2.5.4. Hub and Spoke Architecture

The hub and spoke architecture described in section 2.4.1 is a major architecture in modern integration solutions. A full consideration of architecture is vital in ensuring that a system can meet its required goals (Clements and Northrop, 1996), and so any protocol must be compatible with the hub and spoke approach.
This implies that the protocol must not rely on point-to-point communication between any of the integrated applications, placing further importance on the need for asynchronicity.

2.5.5. Resilience

Integrated systems are, by definition, situated on more than one physical machine. They are therefore susceptible to network failures, communication delays, and other similar occurrences - there is no guarantee that a complex, multi-application integrated system will be highly available. Any protocol must therefore be resilient and capable of dealing with the unavailability of, or delays in communicating with, a remote application.

2.5.6. Flexibility and Efficiency

It is widely accepted that a system processing a single message at a time will show lower throughput than a system capable of concurrent processing of multiple messages. Given the enterprise-scale nature of EAI, it is important that an ordering protocol does not place unnecessary constraints on the processing of messages, particularly in light of the asynchronous processing and unreliable communication issues already discussed.

In most cases, the optimal level of efficiency is constrained by the specific data integrity requirements of a system. A real-time stock trading system would require very strict ordering of messages to ensure fair trading, thereby limiting its throughput. In other applications, applying weaker ordering rules may be appropriate, increasing throughput at the potential cost of data and transactional integrity. An ordering protocol should therefore allow the efficient throughput of messages by being flexible enough to adapt to an integrated system's particular needs.

2.6. Summary

EAI is a growth area of computing, with the technology far from mature.
There are still technical hurdles to overcome, one of which is to find a solution to the message ordering problem inherent in an asynchronous hub and spoke architecture. This chapter identified a number of features that can be used to assess a particular ordering protocol's suitability for EAI, and these will be used in the next chapter to shape a literature review of existing ordering protocols.

Chapter 3
AN INVESTIGATION OF MESSAGE ORDERING

3.1. Introduction

The previous chapter illustrated the problem of message ordering as it relates to integrating enterprise applications. Message ordering is a relatively mature discipline within distributed systems research, and this chapter presents a literature review of current practice in this area. The message ordering approaches and protocols identified are assessed with respect to their suitability for EAI, according to the features described in section 2.5.

3.2. The Problem of Message Ordering

Message ordering is relatively simple to comprehend: messages sent to a remote system, where communication is subject to unpredictable delays, should arrive in the correct order. There are problems and subtleties in determining the correct order (or, more precisely, one of the correct orders) for message delivery in traditional distributed systems. These problems have warranted extensive examination in literature (for example Lamport, 1978; Cheng et al., 1995; Murty and Garg, 1997; Fritzke, Jr. et al., 1998).

3.2.1. An Informal Language for Discussion

Before illustrating the problem of message ordering, it is necessary to define an informal language for presenting the example scenarios. Figure 4 shows the key terms used.

m1: A message, m1
Send(m1): The sending of message m1 by a system
Receive(m1): The receipt of message m1 by a system
$\xrightarrow{hb}$: "Happens before" - e.g. $Send(m1) \xrightarrow{hb} Receive(m1)$ means that a message is sent before it is received

Figure 4: An informal language for message ordering

An immediate point to note is the decoupling of the sending of a message by a system, and receipt by its destination. In essence, this is where the crux of the message ordering problem lies: there is a variable delay between the sending of a message and its receipt. A message can therefore be received by a remote system before it is its "turn" to be processed - in other words when an earlier message on which it depends has not yet been received (Murty and Garg, 1997).

3.2.2. Illustrative Scenarios

I have created a small number of scenarios to highlight potential ordering problems, and illustrated these with a simple timeline for asynchronous systems.

Scenarios with one Sender and one Receiver

Figure 5: Sequential send, non-sequential receive

Figure 5 shows the most basic form of message ordering problem: system A sends two messages, m1 and m2, intending them to be processed in that order. Delays cause them to be received, and hence processed, in the order m2 then m1. Stated in the informal language, the intended sequence of events on process B is $process(m1) \xrightarrow{hb} process(m2)$, but $process(m2) \xrightarrow{hb} process(m1)$ is the sequence that occurs.

In some cases the impact may be negligible, but it is not hard to imagine where this could cause problems. With the Field Force Enablement example in section 2.4.4, consider a field engineer logging an additional piece of previously unplanned work on their mobile device (causing one message to be sent to the integration engine). When the engineer records that work as complete (causing another message to be sent to the integration engine), the messages must arrive in order, otherwise the integrated system would be told of the completion of a piece of work that it did not know about.
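The reordering in this scenario can be reproduced in a few lines of Python. This is an illustrative sketch only: the Message class, the delay values and the two function names are assumptions made for the example and are not part of any protocol discussed in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str        # e.g. "m1"
    sent_at: float   # time at which system A sends the message
    delay: float     # hypothetical, unpredictable transit delay

    @property
    def received_at(self) -> float:
        return self.sent_at + self.delay


def send_order(messages):
    """Names in the order the sender intended them to be processed."""
    return [m.name for m in sorted(messages, key=lambda m: m.sent_at)]


def receive_order(messages):
    """Names in the order a naive receiver processes them (arrival order)."""
    return [m.name for m in sorted(messages, key=lambda m: m.received_at)]


# m1 is sent first but is delayed for longer than m2.
m1 = Message("m1", sent_at=0.0, delay=5.0)
m2 = Message("m2", sent_at=1.0, delay=1.0)

intended = send_order([m1, m2])
actual = receive_order([m1, m2])
print("intended:", intended)
print("actual:  ", actual)
print("reordered:", intended != actual)
```

With these assumed delays, m2 arrives at time 2.0 and m1 at time 5.0, so a receiver that simply processes messages on arrival handles the "work complete" message before the "new work" message.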
Scenarios with Multiple Senders and one Receiver

Figure 6: Sequential send, non-sequential receive

The scenario represented in Figure 6 shows two separate processes, each sending messages to the same receiver. This is essentially the same as the previous scenario, with the added complication that the messages originate from different processes. As in the previous section, the likely intended sequence of events at process C is $process(m1) \xrightarrow{hb} process(m2)$, but instead $process(m2) \xrightarrow{hb} process(m1)$ is the sequence that occurs.

In some circumstances it will not be necessary to coordinate the delivery of these messages - for example where the data sets for A and B are orthogonal. However, where one or both of the senders are issuing time-based instructions, order of delivery could be critical.

As an example, consider the subscription processing system from section 2.4.4. Message m1 could contain a batch set of records from third party data processors, and m2 could contain a time-based instruction to generate the day's invoices. In this case, if m2 is processed before m1, the batch records submitted on the day in question will not be included in the day's invoices.

Scenarios with Multiple Senders and Multiple Receivers

Figure 7: Dependent concurrent send

Figure 7, taken from Yoshida (2001), shows one of the more difficult problems to address. Here, the sending of two messages is concurrent, with no apparent means to control the order in which they are processed. It may be that the system needs to constrain the order to preserve data integrity: for example ensuring that $process(m2) \xrightarrow{hb} process(m3)$ because m3 needs the result of m2 to continue.

The Field Force Enablement system faces this problem when sending messages to a mobile device.
Message m1 could represent a field worker notifying the work management system of a newly identified piece of work required in the field. Message m2 would originate from the work management system to notify the scheduling system of the new work, and m3 would be a message from the field device asking for a new schedule. In this particular case, it would be preferable for the request for a new schedule to be processed after the message recording the new piece of work.

3.3. Message Ordering Approaches

With the message ordering problem now illustrated, and the criteria for message ordering protocols in EAI established in section 2.5, it is possible to evaluate existing approaches and protocols for their suitability for use in EAI.

In the 1980s and 1990s, academics produced a number of approaches to message ordering designed to address the problems inherent with asynchronous distributed systems. Early approaches such as causal ordering and total ordering have been followed by more recent work, including application ordering (Singh and Badarpura, 2001) and actual causal ordering (Cheng et al., 1995). These more recent approaches attempt to enhance traditional causal ordering.

The following subsections describe each approach to ordering and evaluate a protocol that implements that approach.

3.3.1. Total Ordering

Total ordering is perhaps the most straightforward message ordering concept to understand. It requires that all messages sent to multiple destinations are processed at each destination in the same order - the order in which they were sent. Furthermore, two messages sent concurrently by two different senders will always be processed in the same order at every remote site that receives them, as shown in Figure 8. Total ordering ensures that, if $process(m1) \xrightarrow{hb} process(m2)$ at site C, then $process(m1) \xrightarrow{hb} process(m2)$ also holds at site D.
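The agreement-between-sites part of this condition can be expressed as a simple check over the processing logs of the receiving sites. The following is a minimal sketch, assuming each site records the names of the messages it has processed, in order; the function name and the log format are hypothetical. It does not check that the agreed order matches the sending order.

```python
from itertools import combinations


def is_totally_ordered(site_logs):
    """Return True if every pair of messages that two sites have in common
    was processed in the same relative order at both sites.

    site_logs maps a site name to the list of message names in the order
    that site processed them (a hypothetical log format).
    """
    for log_a, log_b in combinations(site_logs.values(), 2):
        position_b = {name: i for i, name in enumerate(log_b)}
        # Keep only messages seen by both sites, in site A's order.
        shared = [name for name in log_a if name in position_b]
        for first, second in combinations(shared, 2):
            # first precedes second at site A; it must also do so at site B.
            if position_b[first] > position_b[second]:
                return False
    return True


# Sites C and D agree on the order of m1 and m2.
print(is_totally_ordered({"C": ["m1", "m2"], "D": ["m1", "m2"]}))
# Site D processes m2 before m1, so the total ordering condition is broken.
print(is_totally_ordered({"C": ["m1", "m2"], "D": ["m2", "m1"]}))
```

The check is deliberately after the fact: it validates logs rather than enforcing an order, which is the job of a protocol such as the one evaluated next.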
Figure 8: Total ordering

Protocol Evaluation

Agarwal et al. (1998) describe a protocol for total ordering, Totem. Totem is designed for a single-broadcast environment such as a local network with a number of listening processes, together forming a "token ring". The protocol uses the concept of a logical token, which is transmitted point-to-point between processes. Possession of the token grants the holder the right to broadcast messages to the other members of the ring. Each outgoing message receives a sequence number, which identifies the order of sending of messages across the whole system. In addition to the main application messages, the system supports configuration messages, informing the ring of changes to its membership - for example a process joining or leaving.

The requirement for synchronous point-to-point communication between processes means that the token passing approach is not suitable for EAI, as it is incompatible with an asynchronous hub and spoke architecture. The processes also need to explicitly handle the token and control messages, requiring an invasive approach. Extensibility is supported, as any application can be added simply by including it in the token ring, and the protocol has built-in resilience through detection of the failure of a participating process. The protocol is neither efficient nor flexible, because it constrains the order of message transmission to the process holding the token, with no way of overriding this.

Summary

Table 1 summarises the suitability of Agarwal et al.'s protocol for use in EAI. A tick denotes that the protocol meets a criterion, and a cross denotes that it does not.

Non-Invasive               ✗
Extensible                 ✓
Asynchronous               ✗
Hub and Spoke Compatible   ✗
Resilient                  ✓
Efficient                  ✗
Flexible                   ✗

3.3.2. Causal Ordering

The basis for causal ordering is Lamport's well-known "happens before" relation (Lamport, 1978). Causal ordering is a relaxation of strict total ordering. There are a number of formal and informal definitions of causal ordering given in literature (for example Tyler, 1994; Mostefaoui et al., 2001), but all are essentially variations of the informal definition:

• Event A happens before event B if the two events are from the same process, and event A occurs in time before event B
• Event A happens before event B if event A is the sending of a message to a process and event B is the receipt of that message by that process. Further, a message cannot be received that has not previously been sent
• The happens before relation is transitive - if event A happens before event B, and event B happens before event C, then event A also happens before event C

Further, if neither event A happens before event B nor event B happens before event A, then the two events are concurrent. In contrast to total ordering, their order of processing at a remote site is not defined and may vary between remote sites, as illustrated in Figure 9.

Figure 9: Causal ordering

It is possible that no actual relationship exists between two messages even though causal ordering prescribes one. Causal ordering therefore represents only the potential causal order of a series of messages. As such, no causal ordering implementation can be considered truly efficient, as it will introduce delays where none are warranted (Cheng et al., 1995).

Protocol Evaluation

The protocol proposed by Kshemkalyani and Singhal (1995; 1996) for implementing (potential) causal ordering is far closer to the requirements for EAI than the total ordering protocol of Agarwal et al.
It uses asynchronous message passing to an arbitrary and dynamic number of destinations, with no assumptions made about communication patterns (abridged from the list in Kshemkalyani and Singhal, 1996). Causal data is held in two places: metadata added to the message itself, and message logs at the receiving processes.

Kshemkalyani and Singhal prove that their implementation is optimal when compared to other causal ordering protocols such as those of Birman and Joseph (1987), Raynal et al. (1991) and Skawratananond et al. (1998). This is because their dual storage requires only the minimum amount of data to be added to each message (as defined in Kshemkalyani and Singhal, 1995), and allows clear and optimal requirements for persisting causal history in message logs. Whilst this represents an efficient implementation of causal ordering, causal ordering itself is not efficient per se, as it sometimes creates unnecessary delays in processing, as will be shown in section 3.3.3. It is also inflexible, as the ordering is based purely on the time each message is sent. It is the responsibility of the receiving process to consume messages in the correct causal order using this data, hence making this an invasive protocol which is unsuitable for EAI.

The approach of adding data to the transmitted message fits well with the EAI feature of not relying on point-to-point communications between applications. All causal information is contained within the message itself, without reference to any specific application in the system. The concept of a message log for persisting causal data also fits well with a hub and spoke architecture, whereby such a message log could be centralised within the hub to maintain consistent causal information for all integrated systems.
Unfortunately, the protocol does not address resilience, instead assuming that a reliable communication method exists by which to transmit messages.

Summary

Table 2 summarises the suitability of Kshemkalyani and Singhal's protocol for use in EAI. A tick denotes that the protocol meets a criterion, and a cross denotes that it does not.

Non-Invasive               ✗
Extensible                 ✓
Asynchronous               ✓
Hub and Spoke Compatible   ✓
Resilient                  ✗
Efficient                  ✗
Flexible                   ✗

Table 2: Suitability of Kshemkalyani and Singhal's ordering protocol for EAI

3.3.3. Actual Causal Ordering

Cheng et al. (1995) developed the idea of causal ordering further, by noting that causal ordering based on Lamport's happens before relation describes only the potential causal ordering. Their observation is that although the happens before relation causally relates two messages, there may be no actual relationship between the two messages. This could lead to an unnecessary delay in processing the second message while the system waits for its potentially, but not actually, causally related predecessor.

They propose an ordering approach called actual causal ordering, whereby an application explicitly and programmatically specifies the causal relationships between the messages that it sends. This approach ensures that messages are not delayed unnecessarily when there is no causal relationship between the delivered and the undelivered messages.

Protocol Evaluation

The protocol for actual causal ordering described by Cheng et al. (1995) is very similar to the approach to potential causal ordering described in the previous section - the protocol adds data to an outgoing message to identify the causally preceding messages. The difference is in the definition of causally preceding. In Cheng et al.'s approach, the application developer specifies the causal order of sent messages using programming constructs, making it an invasive protocol.
This ensures the optimum throughput - or more accurately the minimum latency - by ensuring that messages do not incur unnecessary delays. The protocol is therefore flexible and adapts to the particular application, leading to efficient delivery of messages by reducing unnecessary delays.

Maintaining the full causal history on each message allows the protocol to deliver causally related messages to different receiving processes, ensuring that the protocol itself is independent of the destination processes and hence extensible.

This protocol also provides a degree of architectural detail not found in previously described work. It uses protocol servers to implement the actual causal ordering, accessed via a programmatic interface. Each protocol server has two parts: a sender and a receiver, which as the names suggest are responsible for transmitting a message to a destination, and delivering a received message respectively. The decoupling of the sender and the receiver is a good abstraction for supporting asynchronicity, which is critical for EAI. By positioning senders and receivers within the integration hub, the protocol could be adapted to a hub and spoke approach.

Summary

Table 3 summarises the suitability of Cheng et al.'s protocol for use in EAI. A tick denotes that the protocol meets a criterion, and a cross denotes that it does not.

Non-Invasive               ✗
Extensible                 ✓
Asynchronous               ✓
Hub and Spoke Compatible   ✓
Resilient                  ✓
Efficient                  ✓
Flexible                   ✓

Table 3: Suitability of Cheng et al.'s ordering protocol for EAI

3.3.4. Application Ordering

Noting that causal ordering was effectively a weakening of constraints between messages compared to total ordering, Singh and Badarpura (2001) propose an approach whereby message ordering requirements are instead strengthened, based upon application-specific constraints.
An example of where this is useful, paraphrased from their paper, considers a distributed teaching/student application in which students send questions to an instructor, who then replies. The application may wish to enforce that an instructor's reply must be delivered to all students before the next student question is displayed to the students, hence ensuring a correct pairing of question and answer for all present. Causal and total ordering would not allow this, because the second student question may well precede the sending of the answer.

In the example above, the communication layer of the distributed application knows in advance that a question will generate an answer, allowing it to predetermine a sequence number for the reply. The authors show that an "ordering specification" can be used to encode application-specific knowledge, causing a reduction in both the time taken to allocate and determine ordering information, and the complexity of the synchronisation logic needed by the application.

Protocol Evaluation

Singh and Badarpura's application ordering protocol "pre-allocates" spaces in a sequence for future messages that the application knows will be generated - for example an answer message in response to a question message. In doing so, they claim efficiency savings when compared to (potential) causal ordering. The actual causal ordering approach, which adopts a similar principle, is not considered in their paper.

This invasive protocol supports asynchronicity and a hub and spoke architecture, as it does not rely on point-to-point communication between sending and receiving applications. Although the sequencing produced is efficient for the particular application, the sequence is tied to the message flow of a specific set of applications, making it inflexible.
The participating applications could readily change in an integrated environment, as new systems are added, meaning that it would not meet the extensibility criteria I have identified.

However, the concept of interpreting application data and deducing causal ordering from this is one that merits further investigation, as the nature of XML means that application data can be readily accessed and understood. This is examined further in sections 4.3.2 and 4.4.

Summary

Table 4 summarises the suitability of Singh and Badarpura's protocol for use in EAI. A tick denotes that the protocol meets a criterion, and a cross denotes that it does not.

Non-Invasive               ✗
Extensible                 ✗
Asynchronous               ✓
Hub and Spoke Compatible   ✓
Resilient                  Not specified in paper
Efficient                  ✓
Flexible                   ✗

3.4. Summary

Table 5 summarises how each of the protocols compares against the EAI-suitability criteria identified in section 2.5.

Resilient   ✗   ✓   ✓   Not specified in paper
Efficient   ✗   ✗   ✓   ✓
Flexible    ✗   ✗   ✓   ✗

Table 5: Suitability of message ordering approaches for EAI

Key: ✗ = Feature not met, ✓ = Feature met

3.5. Conclusion

It is clear from the above that no single protocol or approach meets the criteria for EAI. The major problem area in all protocols is the use of an application-invasive approach and the lack of flexibility that this implies. As already discussed, these are highly undesirable features in EAI. In the next chapter, the best aspects of the above protocols will be combined to define an EAI-specific message ordering protocol.

Chapter 4
INFERRED CAUSAL ORDERING: A PROTOCOL FOR EAI

4.1. Introduction

The previous chapter summarised the strengths and weaknesses of a number of existing message ordering protocols against the features of EAI that affect message ordering. This chapter proposes an EAI-centric protocol for message ordering, and then assesses it against the evaluation criteria used in Chapter 3.

4.2. Guaranteeing Causal Ordering

Cheng et al. (1995) identify three aspects which are necessary to guarantee causal ordering:

• Obtaining causality from application programs
• Representing and conveying causality
• Preserving causality (that is, ensuring that the causality requirements are obeyed)

To these, I will add one more point that is required to ensure non-invasiveness to the receiving application:

• Removal of causality information from delivered messages

In other words, a message that has been modified to "represent and convey causality" must have this information removed before it is delivered to the target application. If this does not happen, the message could be rejected as invalid for containing unexpected data.

4.3. Obtaining Causality from Application Programs

4.3.1. Basic Ordering Information

One of the key issues for any protocol is how to obtain the correct ordering semantics. As shown in the previous chapter, traditional protocols require that the sending application explicitly specifies ordering information. To avoid this application-invasive approach, responsibility for adding these semantics must be deferred to the integration hub.

A simple sequence number could be applied to each message as it arrives at the integration hub. Provided that this order is enforced when delivering messages, the system would be operating with total ordering. Although this is a valid approach, a more sophisticated approach is made possible by considering the nature of XML messages.

4.3.2. XML

An XML document contains both data and structure.
Tags delimit data items in a form that follows a number of simple rules. A basic XML document showing details about a person is shown in Figure 10.

    <Person>
        <Forename>Duncan</Forename>
        <Surname>Millard</Surname>
        <Title>Mr</Title>
    </Person>

Figure 10: A simple XML document

As Figure 10 shows, XML is a very easy format to interpret. The rules governing its syntax are relatively straightforward, making it simple to implement parsers to create and consume XML data.

Sections 2.4.2 and 2.4.3 described how two of the main tasks of an integration engine are to transform the format of messages, and to execute business processes based on that data. This implies that the format of messages passed to the hub must be well known, and that the integration hub is empowered to make decisions based on the implicit meaning in the data that it receives.

Section 3.3.4 described the use of application data to predict future message ordering in the approach called application ordering. I propose taking this idea and combining it with the transparent nature of XML to infer the causal relationships between received messages.

4.4. Inferred Causal Ordering

Consider Figure 11, which shows three messages sent in succession by an HR application. Traditional potential causal ordering mandates that these messages must be processed in the order message 1, then 2, then 3. By inspecting the data however, it can be assumed that message 2 is independent of messages 1 and 3.
Message 1:
<CreateEmployee>
  <Number>12345</Number>
  <Name>Millard</Name>
</CreateEmployee>

Message 2:
<CreateEmployee>
  <Number>12346</Number>
  <Name>Adams</Name>
</CreateEmployee>

Message 3:
<SetPayLevel>
  <EmployeeID>12345</EmployeeID>
  <PayBand>C</PayBand>
</SetPayLevel>

Figure 11: XML messages

This assumption has been deduced from the "Number" element of the "CreateEmployee" message, and the "EmployeeID" element of the "SetPayLevel" message, as well as from the order in which messages were sent from the HR application.

Making decisions based on the content and structure of an incoming message is an integral part of the operation of the message hub (as discussed in section 2.4.3); this approach to obtaining causal information therefore fits well with a hub and spoke integration architecture.

In summary, there are two sources of ordering information that can be combined:

- Data - to determine the grouping of a message with other messages
- Sending order - to determine the order of messages within a group

Obtaining data from these sources requires no special interaction with the integrated application - they are simply extracted when a message arrives at the integration hub. I call this approach inferred causal ordering.

4.4.1. Causal Message Groups

Inferring relationships in this way creates what I term causal message groups. Each message within a causal message group is causally dependent on the earlier messages in that group as defined by the sequence numbers within that group. Each causal message group is assigned an appropriate identifier constructed from the content of the message - for example, from Figure 11 two causal groups could be created with group identifiers of Employee12345 and Employee12346.
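The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this thesis: the `GROUP_KEY_ELEMENT` mapping, the `CausalLog` class and its `assign` method are hypothetical names, and numbering each group from 0 is an assumed convention. The sketch reads the employee number from each message of Figure 11, builds the group identifier from it, and hands out the next sequence number for that group.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical configuration: for each message type (root tag), the element
# whose value identifies the employee the message is about.
GROUP_KEY_ELEMENT = {
    "CreateEmployee": "Number",
    "SetPayLevel": "EmployeeID",
}


class CausalLog:
    """Central log holding the next sequence number for each causal group."""

    def __init__(self):
        self._next_sequence = defaultdict(int)  # group id -> next number

    def assign(self, message_xml):
        """Return (group_id, sequence_number) for an arriving message."""
        root = ET.fromstring(message_xml)
        key = root.findtext(GROUP_KEY_ELEMENT[root.tag])
        group_id = "Employee" + key
        sequence = self._next_sequence[group_id]  # assumed to start at 0
        self._next_sequence[group_id] += 1
        return group_id, sequence


# The three messages of Figure 11, in the order the HR application sent them.
MESSAGES = [
    "<CreateEmployee><Number>12345</Number><Name>Millard</Name></CreateEmployee>",
    "<CreateEmployee><Number>12346</Number><Name>Adams</Name></CreateEmployee>",
    "<SetPayLevel><EmployeeID>12345</EmployeeID><PayBand>C</PayBand></SetPayLevel>",
]

log = CausalLog()
assignments = [log.assign(m) for m in MESSAGES]
```

Under these assumptions, messages 1 and 3 fall into group Employee12345 with consecutive sequence numbers, while message 2 starts its own group and is therefore free to be processed independently.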
The identification of causal message groups is entirely dependent upon the data received, hence allowing the creation of different groups depending on the specific ordering needs of the application being integrated.

4.4.2. Cross-Group Dependencies

So far I have assumed that each causal group is fully independent - for example, one causal group relating to employee 12345 and one relating to employee 98765.

Given this causal grouping, consider a message that relates to both employee 12345 and employee 98765, such as that shown in Figure 12. It is not immediately clear to which causal group the message should belong - effectively it could belong to either. Interpreting the message, the inferred information is that the message depends on the "CreateEmployee" messages for both employee 12345 and employee 98765.

Message 4:
<SetManager>
  <ManagedEmployeeID>12345</ManagedEmployeeID>
  <Manager>98765</Manager>
</SetManager>

Figure 12: A cross-group dependency

In order to model this, the message is assigned to one of the causal groups, and a cross-group dependency referencing the CreateEmployee message in the other group is added. In this particular example, the business rules and protocol configuration allow the deduction that the CreateEmployee message is always the first message in a causal group. The message can therefore be added to the causal group Employee12345 and a dependency added to message number 0 in the group Employee98765.

4.5.
Representing and Conveying Causality

The previous section identified two pieces of causal information that must be persisted:

- The causal group that a message belongs to
- The sequence number of a message within that group

Causal ordering protocols typically have two places that ordering information is stored (Chandra et al., 2004):

- A dynamic causal message log
- Control data added to each message

4.5.1. Causal Log

In order to apply causal information to incoming messages, the protocol needs a centralised "causal log", with an entry for each causal message group. Each entry will store the identifier of the group and the next sequence number to apply to a new message belonging to that group. As messages are received, the sequence number is increased.

The use of a message log in this way is similar to that of Kshemkalyani and Singhal (1996), as described in section 3.3.2, but is centrally located rather than distributed at each remote process.

4.5.2. Message Annotation

In addition to maintaining the causal log, it is necessary to append causal information to each message so that the causal identity of every message is known. This annotation will introduce an overhead on every message that flows through the system. The information that must be annotated to every message is:

- Causal message group identifier
- Causal message group sequence number
- Dependencies on messages from other causal groups

In order to ensure that the protocol is non-invasive, it must also be possible to remove the causal information prior to delivery to the destination system. There is a general technique of temporarily annotating an XML message, known as enveloping (Hohpe and Woolf, 2004), illustrated in Figure 13 below.
An un-enveloped message:
<Message>
  <Field1>Data</Field1>
</Message>

The same message, enveloped:
<Envelope>
  <AddedData>Values</AddedData>
  <Message>
    <Field1>Data</Field1>
  </Message>
</Envelope>

Figure 13: Message enveloping

An implementation of the protocol can make use of this technique within the boundaries of the integration hub to ensure that, from the point of view of the sending and receiving applications, the messages remain unchanged.

4.6. Preserving Causality

"Preserving causality" means ensuring messages are processed by their destination in the correct order. Rephrased, a way is required of delaying message delivery to a destination until such time as all its causally preceding messages have been processed.

4.6.1. Remote Message Stores

Kshemkalyani and Singhal (1996) used a message store at each remote process. The centralised nature of a hub and spoke architecture allows this mechanism to be centrally implemented, removing the need for multiple remote stores.

4.6.2. Dynamic Message Log

The protocol will use a dynamic message log to store messages that have been processed by the integration engine, but whose causal predecessors have not yet been delivered to the destination application.

In a similar approach to the causal log described in section 4.5.1, the dynamic log will need to track the next sequence number to be delivered for each causal group to each destination. Once the next message for a particular causal message group and destination appears in the dynamic log, as identified by the causal information annotated to that message, the message can be delivered to the destination. In this way, causality is successfully preserved.

4.7. Removing Causality Information

As discussed in section 4.5.2, an envelope can be easily added to and removed from an XML message. This message annotation is only required for placing the message in the dynamic message log.
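The enveloping and hold-back behaviour of sections 4.5.2 and 4.6.2 can be sketched together as follows. This is a simplified illustration, not the simulation system's code: the element names `CausalGroup` and `Sequence`, the functions `envelope` and `unwrap`, and the class `DynamicLog` are all hypothetical, sequence numbers are assumed to start at 0, and cross-group dependencies are deliberately left out.

```python
import xml.etree.ElementTree as ET


def envelope(message_xml, group_id, sequence):
    """Wrap a message in an envelope carrying its causal annotation."""
    env = ET.Element("Envelope")
    ET.SubElement(env, "CausalGroup").text = group_id
    ET.SubElement(env, "Sequence").text = str(sequence)
    env.append(ET.fromstring(message_xml))
    return ET.tostring(env, encoding="unicode")


def unwrap(envelope_xml):
    """Return (group_id, sequence, original_message_xml)."""
    env = ET.fromstring(envelope_xml)
    payload = env[-1]  # the original message is the last child
    return (
        env.findtext("CausalGroup"),
        int(env.findtext("Sequence")),
        ET.tostring(payload, encoding="unicode"),
    )


class DynamicLog:
    """Holds enveloped messages until their predecessors are delivered."""

    def __init__(self):
        self._held = {}  # (destination, group, sequence) -> envelope
        self._next = {}  # (destination, group) -> next sequence to deliver

    def submit(self, destination, envelope_xml):
        """Store a message; return the payloads that can now be delivered."""
        group_id, sequence, _ = unwrap(envelope_xml)
        self._held[(destination, group_id, sequence)] = envelope_xml
        expected = self._next.get((destination, group_id), 0)
        deliverable = []
        # Release messages for as long as the next expected one is present.
        while (destination, group_id, expected) in self._held:
            held = self._held.pop((destination, group_id, expected))
            deliverable.append(unwrap(held)[2])  # envelope stripped here
            expected += 1
        self._next[(destination, group_id)] = expected
        return deliverable
```

In this sketch a message that arrives ahead of its predecessor simply stays in the log, and the delivery position is tracked per destination and per group, so a backlog for one destination does not hold up another.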
Once the message is ready to be delivered to a destination application, the causality annotation can be removed.

4.8. Evaluation

4.8.1. Evaluation against EAI Features

The proposed protocol offers a non-invasive way of obtaining ordering information from received messages. The use of enveloping ensures that this causal information can be appended to messages without affecting the source or destination systems.

By centralising the message logs, hub and spoke compatibility is ensured and resilience is increased. If a remote application is unavailable, for example due to network problems, the message can be held in the log until such time as delivery is possible.

One negative impact on resilience is that the dynamic message log represents a single point of failure for the system, a problem common to any centralised system. It is worth noting that an integration hub is, by definition, a centralised system. As such, the integration engine and the hub as a whole are subject to the same problem. One way to mitigate this is to ensure that any system implementing the protocol shares the same server(s) as the integration engine. This minimises the chance that the protocol system would be unavailable whilst the integration engine was still running and available: any external system failure would affect the integration hub as a whole and not just the protocol's message logs.

The dynamic message log also represents a clear decoupling of the sending of a message to the hub, and its receipt and processing by a remote system, ensuring that asynchronous processing is possible.

Finally, the protocol offers a flexible and efficient approach by inferring ordering information from the data that it receives, rather than mandating an order purely on the basis of the timing of the sending of messages.
Table 6 below summarises how each of the EAI features is met by the proposed protocol.

EAI criteria: How this is achieved

Non-Invasive: [...] message data, not from sending application.

Extensible: Protocol makes no implicit or explicit assumptions about the source or destinations for a message. It purely operates on the data received.

Asynchronous: A dynamic message log, centrally located, decouples the sending and receiving of messages and allows asynchronous operation.

Hub and Spoke Compatible: The protocol is designed to operate in a centralised environment, matching the hub and spoke architecture.

Resilient: The dynamic message log offers a way to delay delivery of messages in the event of a communication failure. The tracking of delivery information separately for each destination means that no one destination depends on any other.

Efficient: Causal message groups allow causally unrelated messages to be processed independently, instead of having their order constrained unnecessarily.

Flexible: The protocol creates causal message groups based on the data in a message, and as such allows the implementation of different ordering restrictions depending on the needs of the applications being integrated.

Table 6: Summary of an inferred causal ordering protocol for EAI

4.8.2. Theoretical Efficiency of the Protocol

Introduction

The theoretical efficiency of an algorithm consists of determining mathematically the quantity of resources (execution time, memory space, etc.) needed by an algorithm as a function of the size of the input instances (Brassard and Bratley, 1988). For example, if the quantity of resources scales linearly as the input size increases, the algorithm is said to be "in the order of O(n)", where n is the size of the input.
An important point to note is that the theoretical efficiency does not measure the actual performance of an algorithm; it instead measures its ability to scale for larger inputs. As such, constant offsets are ignored - an algorithm requiring 20+n units of resource is still in the order of O(n), hence explaining the lack of units on the Y axis of Figure 14.

Figure 14: An algorithm exhibiting O(n) efficiency (x-axis: Input Variable Size)

Similarly, if an input of size 2 requires four times as much resource as an input of size 1, and an input of size 4 requires 16 times as much resource as an input of size 1, the algorithm is said to be in the order of O(n²), as illustrated in Figure 15.

Figure 15: An algorithm exhibiting O(n²) efficiency (x-axis: Input Variable Size)

Finally, if the resource required is independent of the variable size, the algorithm is said to be "in the order of O(1)", as illustrated in Figure 16.

Figure 16: An algorithm exhibiting O(1) efficiency (x-axis: Input Variable Size)

The theoretical efficiency of the protocol can be predicted by examining the following forms of space overhead:

- Individual Message Overhead
- Causal Log Size
- Dynamic Log Size

Message Overhead

The metadata added to a message is as described in section 4.5.2. A message is guaranteed to be assigned:

- A causal message group identifier
- A causal message group sequence number

This gives a constant overhead per message, and eliminates the need to attach a full causal history to every message. Hence, with respect to other applications in the system, the algorithm is in the order of O(1).

The only other data added to a message occurs if a message depends on one or more messages from one or more other causal groups.
In this case, only a dependency to the most recent message from each group is required. If there is no dependency, no additional data is added to the message. Hence, the best case overhead is in the order of O(1) and the worst case overhead is in the order of O(n), where n is the number of groups containing a message on which a message depends.

Causal Log Overhead

The causal log needs to maintain a permanent record of the current sequence number of each causal group for each destination. Hence the size of the message log is proportional to the product of the number of causal message groups and the number of destinations - the overhead is in the order of O(n.m), where n is the number of message groups and m is the number of destinations.

Dynamic Log Overhead

In addition, in order to support both resilience and ordering, the messages themselves must be stored until they have been processed by the remote application. The dynamic log size is therefore dependent upon the rate of receipt of messages destined for an application and the rate at which messages are consumed by that application. Once processed, a message can be discarded, meaning that the dynamic log overhead will vary with time.

4.9. Summary

This chapter proposed a new message ordering protocol suitable for use in Enterprise Application Integration. It has the unique feature of inferring ordering semantics from the messages themselves rather than relying on the participating applications to support message ordering.

The use of a dynamic message log and a causal log will allow central control of ordering, and messages will have ordering information added and removed as they enter and leave the integration hub.
An evaluation of the theoretical efficiency of the protocol gave a prediction of the performance of the protocol that is suitable for experimental validation.

Chapter 5

PROTOCOL EVALUATION

5.1. Introduction

The previous chapter presented a protocol for message ordering in EAI and performed a theoretical evaluation of its efficiency. This chapter defines the approach for measuring the performance characteristics of the protocol when implemented in a simulation environment.

5.2. Evaluation Approach

Theoretical analysis of the protocol in section 4.8.2 allowed the prediction of a number of performance characteristics of the algorithm. In order to test these hypotheses, and to prove that the protocol does ensure ordered message delivery, it is necessary to run the protocol in a simulation environment.

To ensure that meaningful results are obtained, I will first identify measures that will allow the system's performance to be modelled, and then identify the parameters required by the simulation to exercise these measures.

5.3. Performance Measures

In order to measure the actual performance and the suitability of the protocol to EAI, it is necessary to identify measures to assess the protocol in the context of the EAI features identified in section 2.5. It is also necessary to measure the message overhead, the causal log overhead, and the dynamic log overhead.

5.3.1. Test Measures: Theoretical Efficiency

Message Overhead

The message overhead is the amount of metadata added to each message, measured in bytes or kilobytes. This will show the variance of the overhead in different scenarios, and should confirm that the overhead is in the order O(1) for causal information and O(n) for causal dependencies.

Causal
and Dynamic Log Overheads

The number of message groups recorded and the number of messages held in the dynamic log will be measured in each of the test cases. This will show the variance of these overheads in different scenarios, and should confirm that they are in the order O(n.m) for causal information, and variable for the dynamic log depending on factors such as application availability and backlog.

5.3.2. Test Measures: EAI Features

Resilience

The resilience of the protocol is its ability to cope with the unavailability of a remote system. This is measurable by running a series of messages destined for two applications with both applications available, and then repeating this test when one of the applications is unavailable. The impact on the dynamic log overhead and the time taken to process all messages for the application will be measured.

5.3.3. Test Measures: Non-Quantitative Testing

Flexibility, Efficiency

It is not possible to measure the flexibility of the protocol quantitatively. Instead, the data used in the test cases will be set up to contain a value representing a "logical stream number". The efficiency and flexibility of the protocol will be implicitly shown by the creation of multiple causal message groups based on the value contained in this field. Test cases will be run to show the impact of splitting messages into different causal groups in this way, to quantify the benefit this approach brings.

Extensibility

Similarly, it is not possible to explicitly measure the extensibility of the system. Instead, the use of an arbitrary message format and different numbers of destination systems will show that the protocol operates successfully independently of the applications operating in the integrated system.

Non-Invasive

The sending and receiving test applications will not have any knowledge of causal ordering; the sending application will simply transmit test messages to the integration hub.
The messages stored in the causal log ready for delivery will be checked to verify that they do not contain any causal ordering information. These two factors will prove that the protocol is non-invasive.

Asynchronicity and Hub and Spoke Compatibility

The simulation system's test cases will use an asynchronous hub and spoke architecture for processing. The success of the test cases will show that the protocol is hub and spoke compatible and operates correctly under asynchronous conditions.

There are no widely accepted or standard benchmarking programs for causal ordering (Chandra et al., 2004). The best alternative is to run a simulation EAI system with a known and variable range of characteristics, and use this to model the performance of the protocol.

Intuitively, there are three components to a message ordering EAI simulation system:

- The sending application(s)
- The integration engine
- The receiving application(s)

In order to simulate real-world situations and assess the impact of different types of system on the protocol's performance, it is necessary to introduce a number of variables to the simulation.

5.4.1. System Variables

The Sending Application

- The Number of Sending Applications: As the number of sending applications increases, there is an expected impact on message log size.
- Time Between Sends: The discussion in section 4.8.2 hypothesises that the more frequently an application sends messages, the greater the load on the message log over time as messages form a backlog, given a constant receiver rate. This variable allows the testing of that hypothesis.
- Potential Message Concurrency: Any message sent by an application may or may not be causally related to a predecessor. This variable models the likelihood that a message is causally independent of other messages.
A value of 100 means that every message will be independent (that is, causally unrelated to any other message), and a value of 0 means that no message (other than the first message sent) is independent.
- Potential Cross-Group Dependency: Application-specific semantics may mean that a message in a causal group depends on a message from another causal group. This models the percentage likelihood of a cross-group dependency being added to a message.

The EAI Hub

- Duration of a Business Process: One of the main reasons for message ordering problems in EAI is the differing duration of the business processes triggered by each incoming message. This variable represents the probability that a random delay will be added to an incoming message. A zero value implies that no messages will be delayed - in other words, all follow the same processing rule - whilst a value of 100 implies that every message is subject to different delays. This allows the determination of the protocol's sensitivity to variable processing times.
- Transmission Delays: In addition to business process duration, a transmission delay can occur in the system. The impact of this delay on protocol performance can be measured by introducing a fixed delay on all communications and measuring the impact on message log and system throughput.
- Number of Destinations for a Message: The EAI hub is responsible for processing incoming messages and deciding to which of the potential destinations to send the message. The upper bound of this variable is the total number of applications in the system and the lower bound is zero. This allows assessment of the "multicast sensitivity" (Chandra et al., 2004) of the algorithm.

The Receiving Application

- The Number of Receiving Applications: This variable is required as a counterpart to the "Number of destinations" variable of the EAI hub.
- Enabled: In order to test resilience, it must be possible to "switch off" one or more destinations so that they do not respond to message delivery attempts.
- Message Receipt Frequency: Intuitively, the frequency with which receiving applications consume messages has a direct impact on the performance of the algorithm. This variable allows the testing of that impact.

5.5. Test Cases

Based on the variables in section 5.4 and the performance measures in section 5.3, I created a number of test cases to exercise each set of variables. The full list of test cases and steps is shown in Appendix A, and summarised in Table 7 below.

Criteria: Efficiency
Measurements: Creation of message groups based on a "logical stream" value
Measured by test case: Potential efficiency and impact of concurrent groups

Criteria: Resilience
Measurements: Time taken for end to end processing of non-failed application; message log overhead
Measured by test case: [...]

Criteria: Flexible, Non-Invasive
Measurements: Success at processing for an arbitrary message format and number of senders and destinations, without affecting applications
Measured by test case: Impact of number of senders on logs, impact of number of destinations on logs (implicit tests of suitability)

Criteria: Extensible, Asynchronous
Measurements: Repeated test cases run without failures with different system configurations
Measured by test case: All, particularly resilience, impact of number of senders, and number of destinations

Criteria: Message Overhead
Measurements: Variance of actual size of the message over time during each test scenario
Measured by test case: All, particularly impact of cross-group dependency

Criteria: Message Log Overhead
Measurements: Number of causal message groups tracked over time; number of messages held in the buffer pending delivery over time
Measured by test case: All test cases

Table 7: Test measures

5.6. Summary

Chapter 4 proposed a data-driven, flexible protocol for message ordering in EAI, and made predictions about the performance of that protocol under particular circumstances.
This chapter has identified the aspects of the protocol that must be tested in order to verify and understand its performance in response to varying input conditions.

Chapter 6

TEST RESULTS

6.1. Introduction

The previous chapter identified the variables that are necessary to test the protocol. This chapter briefly describes the testing methodology used to exercise those variables, before presenting the results of the tests. The chapter closes with a summary of the performance characteristics of the protocol.

6.2. Methodology

In order to give coverage of every variable, nine test cases were identified. Each test case consisted of a number of steps; in each step the value of a variable was changed. The full list of test cases and steps can be found in Appendix A.

Each simulated sending application generated an XML message containing the test name, a unique sender number, a test iteration number, and a "logical stream" number to simulate causally independent messages. The protocol was configured to construct causal message group names by combining these elements to give causal message group identifiers similar to: "Test_NumberOfMessages100_Iteration_2_From_0_S…0".

The simulation system is described in detail in Appendix B.

For each step, the system was cleared of data from the previous run, a "warm up" test case was executed to negate any initialisation delays, and then a fixed number of messages were passed through the system and the results recorded. Each step was executed three times to ensure accurate results. Some variance was observed, which is discussed in section 6.5.2. In total approximately 150 step executions were carried out, generating in excess of 25,000 rows of data.

6.3.
Message Ordering

The first, and most important, aspect of the protocol is that it must ensure all messages are delivered in order to all destinations. Although the asynchronous nature of the system meant that all test cases implicitly tested the in-order delivery aspects of the protocol, the variable business process speed tests in particular caused messages to pass through the system at different rates. Each test had a variable likelihood of causing a delay of up to 10 seconds to the processing.

Table 8 shows the number of messages that passed through the integration hub out of order, and confirms that all of those messages were then delivered in order. The quantities were calculated based on the arrival time of the message in the message log, and its delivery time to the remote destination, both of which were recorded by the simulation system.

Table 8 columns: Test case; Average number of messages arriving out of order in dynamic log; Total number of messages delivered out of order to destination
Test cases: 25%, 50%, 75% and 100% chance of variable business process speed
Values recovered, in source order (average arriving out of order, total delivered out of order): 36, 0; 36, 0; 39, 0; [...]

6.4. Message Overhead

The message overhead was as predicted, being in the order of O(1) for all variables other than cross-group dependency. Figure 17 shows that there is a constant message overhead, irrespective of the number of destinations.

Figure 17: Message overhead independent of number of destinations (x-axis: Number of Destinations)

Similarly, Figure 18 shows a constant message overhead irrespective of the number of causal message groups.
Figure 18: Message overhead independent of number of causal message groups (x-axis: Number of Causal Message Groups)

For cross-group dependencies, the overhead was in the order of O(n) as shown in Figure 19 below, where n is the number of dependencies expressed.

Figure 19: How message overhead varies with cross-group dependencies (x-axis: Number of Dependencies)

The precise size of the overhead is specific to the simulation system implementation. For a real world implementation, a more efficient XML representation could easily be adopted.

6.5. Message Log Overhead

The message log overhead consists of the overhead of both the causal log and the dynamic log.

6.5.1. Causal Information

The causal log size behaved as predicted. The following series of graphs shows the behaviour of the causal log under different conditions.

Number of Destinations

Figure 20 shows that, when tested using a constant number of causal message groups, the log size is in the order of O(n) with respect to number of destinations.

Figure 20: How causal log size varies with the number of destinations (x-axis: Number of Destinations)

Number of Causal Message Groups

Similarly, Figure 21 shows that for a constant number of destinations, the log size is in the order of O(n) with respect to the number of causal message groups.
Figure 21: How causal log size varies with the number of causal groups (x-axis: Number of Causal Message Groups)

Number of Destinations and Message Groups

Figure 22 combines these measurements to show that the causal message log size grows proportionally to the number of message groups and destinations.

Figure 22: How causal log size varies with the number of destinations and causal groups (axes: Causal Log Size (entries), Number of Destinations, Number of Causal Message Groups)

Number of Sending Applications

The number of causal message groups created depends entirely on the configuration of the protocol. The number of sending applications therefore had no direct bearing on the message log size. As discussed above, the protocol was configured to construct separate causal message groups for each sending application, resulting in the message log overhead being in the order of O(n) with respect to the number of sending applications. Different rules for constructing causal groups - for example, one which ignored the sending application's identity - would result in a different relationship.

6.5.2. Dynamic Message Log Size

The dynamic message log holds messages until they are ready for delivery to a destination. The following series of graphs shows the impact of a number of variables on the message log size.

Number of Messages Sent

The first test was to understand how the number of messages sent affects the log size, for a single sender and single receiver both operating with the same latency (1 message sent per second, 1 message received per second). Figure 23 shows the dynamic log overhead plotted against the run time of the test scenario, for input message quantities of 50, 100, 200 and 400.
The x-axis has been scaled to a percentage rather than an absolute run time to allow comparisons between the log sizes for each test case.

Figure 23: How dynamic log size varies for different message quantities (x-axis: Percentage of Run Time)

On first inspection it appears that the greater the number of messages passed into the system, the greater the dynamic log size for a constant send and delivery rate. This is counterintuitive, as theoretically messages are being stored into the log as quickly as they are delivered, i.e. at a rate of one per second, as shown by the log size trace for the "50 messages" test case.

During testing of larger volumes of messages, I observed that the test machine became extremely unresponsive, with the hard disk light permanently on, indicating heavy disk activity. Simultaneously, the CPU usage of the machine was low, indicating that a slow hard disk could be the cause of the problem.

I investigated this by considering the average length of time taken to process 50, 100, 200 and 400 messages. Intuitively, with messages being processed at the rate of 1 per second, processing times should be close to 50, 100, 200, and 400 seconds respectively, albeit with some constant degree of latency introduced by the testing system itself. However, the actual average run times for those test cases were as shown in Table 9.

Number of Messages    Average Run Length (seconds)    Variance between maximum and minimum run times
50                    53                              0%
100                   125                             17%
200                   248                             12%
400                   476                             10%

These results seem to confirm that variable hardware factors become significant somewhere between 50 and 100 messages and may explain the counter-intuitive test results for message quantities greater than 50. For all further test cases, I therefore limited the sample size to 50 messages.

Sending Frequency to Delivery Frequency Ratios

The next test measured the impact of varying the ratio of sending speed to delivery speed. Tests were run in which the speed of the sending system and the receiving system were changed. All messages were sent in the same causal message group to ensure delivery of only a single message at a time.

Figure 24: How the sending to delivery ratio affects dynamic log size (x-axis: Percentage of Run Time)

Figure 24 shows the impact of varying the sender to receiver frequency ratio on the dynamic log size - for example the 4:1 line shows the log size over time for a system that is sending messages in a single message group four times faster than the receiver can process them.

These results show that, as predicted in section 4.8.2, the greater the ratio of send speed to delivery speed, the greater the dynamic log size. Note that in all cases when a sending application stops sending messages, the log size returns towards zero.

Constant Latency

Having established the performance of the protocol with different ratios of rate of send to rate of delivery, I tested the impact of introducing a fixed latency in the integration hub for a send to deliver ratio of 1:1. The equal send rate to delivery rate ratio caused messages to be removed from the log at the same rate as they were added, meaning that the latency had no impact on message log size. The log size performance in all cases followed the behaviour shown in Figure 24 for the 1:1 ratio.

The latency did increase the overall run times for the tests, as shown in Table 10.
This behaviour is expected, as the latency produces a constant delay.

Latency (seconds)    Average Run time (seconds)
0                    53
1                    56
2                    57
4                    59
16                   71

Table 10: Impact of latency on run times

Given a zero-latency run time of 53 seconds, it would be expected that a latency of 1 second would result in a run time of 54 seconds. I believe that this discrepancy is due to the use of a commercial integration engine (BizTalk 2004) in the simulation system. In order to model latency, the orchestration includes a "wait for x seconds" step, executed when the desired latency was non-zero. This type of instruction tends to be designed for longer pauses, such as a day or more, and therefore it is likely that there is some additional overhead caused by internal BizTalk processes after issuing this instruction.

Variable Latency

In addition to a constant latency, a variable latency was introduced, to simulate business processes with different and random speeds of execution. Each incoming message has a random likelihood of being delayed by between 1 and 10 seconds. With so many random factors, the data gathered from these tests was intended only to show the type of impact that variable speed processing would have on dynamic message log size, without providing any predictions about future behaviours.

Figure 25: How variable latency affects dynamic log size (x-axis: Percentage of Run Time)

Figure 25 shows the log size over time for a delay likelihood of 25%, 50%, 75% or 100%. A variable delay is applied to messages to ensure that messages pass through the integration hub in a random order and do not arrive in sequence. The initial increase in log size is therefore expected, as incomplete sequences build, ready for delivery. For both the 25% and 100% test cases, the message log decreases in size at around 50% before increasing again.
This pattern indicates that a large sequence of messages was available for delivery. I believe that if the experiment was repeated with much larger message quantities (for example 1000 or 2000 messages) then this pattern would be repeated many times for all values of the delay likelihood, with the dynamic log size varying around some roughly constant value as sequences are assembled for delivery.

6.6. Efficiency

One of the primary goals of the protocol was to relax the restrictions inherent in causal ordering according to the relationships inferred from the incoming data. As described in the previous section, the protocol was configured to create causal message groups based on a "logical stream" identifier contained in the incoming message. A logical stream is analogous to a real-world entity, for example messages relating to a particular employee.

In order to see the benefit from concurrent processing, it is necessary to send messages into the system more quickly than they are delivered. If this is not done, there will only ever be one message available for delivery at a given time and therefore concurrent delivery is not possible. I therefore changed the send to deliver ratio to 1:6.

Figure 26 shows the run times obtained as the number of causal message groups generated by the protocol varied. The run times have been scaled by a factor of 6 (due to the 1:6 ratio) so that they are comparable with the other test results presented in this chapter.

Figure 26: How run time reduces as the number of causal groups increases (x-axis: Number of Causal Message Groups Created, with values 1, 4, 9, 13, 15, 20, 28, 33, 38 and 50)

It is clear that a rapid reduction in run-time and increase in efficiency is obtained by the creation of causal message groups. The precise timings are not the critical measure, as they depend on factors such as the order in which messages arrive for each group.
However, the clear trend is that the protocol is able to deliver significant efficiency gains when compared to traditional causal ordering.

From the experimental values, it initially seems that there is no efficiency benefit obtained above 13 groups, with any greater number of groups requiring between 9 and 10 seconds for the end-to-end run time. However, the fastest possible time to deliver 50 messages at the (scaled) rate of 6 per second is 8.33 seconds, ignoring any unavoidable latency from the integration hub. Therefore it is likely that with an even greater send to deliver ratio, greater efficiency gains would be realised for the higher number of message groups.

6.7. Resilience

To test the resilience of the protocol to a destination system being unavailable, I ran 50 messages through the simulation system, configured for delivery to two destinations. I then repeated the test with the second receiver disabled - that is, not acknowledging delivery and not processing any messages. The send to deliver ratio was set to 1:6, as in previous tests, therefore the run times shown in Table 11 have been scaled so that they are comparable with previous test results.

Table 11 shows that the time taken to deliver messages to the enabled destination was not affected by the unavailability of a destination system.

Configuration                         Average Run time (seconds)
Both destination systems available    53
One destination system unavailable    52

Baseline Measurement

In order to model the impact on the dynamic message log of an unavailable destination, I first ran a baseline test with both destinations enabled and a 0%, 25%, and 50% likelihood of concurrent processing, giving the results shown in Figure 27. This figure shows that the dynamic log size is largely as could be predicted by examining the results in Figure 24.
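The relationship between the send to deliver ratio and the dynamic log size can be illustrated with a toy model. The sketch below is not the simulation system: it assumes discrete ticks, a single causal message group, no hub latency and no hardware effects, and the name `simulate_log_size` is hypothetical. It only tracks how many messages have been sent but not yet delivered.

```python
def simulate_log_size(total_messages, send_per_tick, deliver_per_tick):
    """Return the number of undelivered messages after each tick.

    Assumes both rates are at least 1, so the loop always makes progress.
    """
    sent = delivered = 0
    sizes = []
    while delivered < total_messages:
        # The sender stops once every message has been sent.
        sent = min(total_messages, sent + send_per_tick)
        # The receiver can only deliver messages that have already arrived.
        delivered = min(sent, delivered + deliver_per_tick)
        sizes.append(sent - delivered)
    return sizes


# Peak log size for 50 messages at send:deliver ratios of 1:1, 2:1 and 4:1.
peaks = {ratio: max(simulate_log_size(50, ratio, 1)) for ratio in (1, 2, 4)}
print(peaks)
```

In this idealised model the peak grows with the ratio and the log always drains to zero once sending stops, which is the qualitative behaviour described for Figure 24. The measured logs also include effects the model leaves out, so the absolute values are not comparable.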

Figure 27: Baseline dynamic log size for resilience testing (x-axis: Percentage of Run Time)

Measurement with One Destination Disabled

Figure 28 shows the profile of the dynamic log size with the second destination disabled. Note that in this test, all steps were run for the same length of time. The messages destined for the unavailable receiver are simply held in the dynamic log, awaiting availability of the remote application in the same way that out-of-order messages are held waiting for the next causal message. Therefore the protocol is able to cope with unavailable destinations without impacting delivery to destinations that are still available.

Figure 28: The impact of an unavailable destination on dynamic log size (series: 0%, 25%, 50%; x-axis: Percentage of Measured Time)

6.8. Limitations of the Simulation System

As discussed in section 6.5.2, when executing an individual test step multiple times, some variance was seen in the time taken on each run of a step. Below I discuss the factors that may have influenced the test results. In total there were four factors influencing the running time of tests:

- Limitations of the hardware
- Use of a real integration hub in the simulation system
- Number of threads of execution
- Use of random values in the test cases

Limitations of the Hardware

Tests were carried out on a high-powered laptop with a 2.66GHz processor and 1 GB of RAM. Despite this, during testing it was observed that the hard disk was constantly being accessed. I have observed in normal day-to-day working that the laptop's hard disk is particularly slow and believe that this had a definite impact on the consistency of test results.

In order to reduce the impact of the latency of the hard disk, I reduced the frequency of message sending from the original design of 10 per second to one per second and scaled back the "receiver frequency" by the same amount, hence maintaining the send/deliver ratios and the validity of the testing.

This limitation was mitigated by taking averages of the length of time taken, and where appropriate expressing results in terms of the percentage of total run time, rather than in absolute terms.

Use of a Real Integration Hub

The simulation system was built around BizTalk 2004. As has been discussed, integration hubs are asynchronous systems subject to variable processing speeds. This meant that sending messages into the hub at the rate of 1 per second was not a guarantee of receiving them at one per second. Repeating a test case did not therefore guarantee a precisely identical result, but the results were in line with expectations.

Another factor is that if BizTalk detects that a machine is overloaded, it throttles its processing in order to reduce that load. I therefore had to find a suitable message quantity that would overcome the variable latency of BizTalk without overloading the system - this was determined to be 50 messages when sending at 1 per second.

Since this protocol is designed for use with EAI hubs this variability in no way devalues the testing results - it instead highlights the importance of the protocol.

Threads of Execution

As can be seen from the test system architecture in Appendix B, there are a number of threads simulating destination systems. Threads are not a "free" resource, and there is a natural limit to how many can be used in a particular context. By experimentation, I determined that, for this hardware, the limit was 8 sending threads, meaning that if more than 8 message groups are available for delivery at any one time, only 8 of those can be serviced.
This imposes a theoretical limit on the end to end speed gains observed with multiple parallel message groups, which was confirmed by the experimental results.

Random Values in Test Cases

Some of the test steps used a random factor when deciding how to sequence messages - for example "a 25% likelihood of one message being causally unrelated to another". This naturally means that different results were obtained for each execution of the test step.

Again, this limitation was mitigated by taking averages of the length of time taken, and by expressing results in terms of the percentage of total run time.

6.9. Test Conclusions

The findings presented in this chapter were obtained using a real-world hub and spoke integration engine, confirming the suitability of the protocol for this domain.

The tests confirmed the theoretical efficiency of the protocol predicted in section 4.8.2. The message overhead was found to be in the order of O(1) in the general case, and in the order of O(n) with respect to cross group dependencies. The causal log size was found to be in the order of O(n.m) with respect to the number of destinations and number of causal groups in the system.

The behaviour of the dynamic log size was found to be harder to predict, but was largely dependent on the difference between the frequency with which messages were sent to the application hub and the frequency with which they were processed by the destination application.

The automatic creation of causal message groups confirmed the flexible nature of the protocol, proving that inferring causal information from incoming messages is a successful technique representing a novel approach to obtaining ordering information.

6.9.1. Performance Gains of the Protocol

Traditional causal ordering is equivalent to all messages being processed in a single causal message group.
Figure 26 showed the significant performance gains that can be realised by splitting messages into different causal message groups, leading to a clear benefit of using this protocol for message ordering. The only overhead of using multiple causal groups is on the causal log size which, as shown in Figure 21, scales linearly with respect to the number of groups and destinations created.

6.9.2. Comparison of Theoretical Efficiency with Optimal Example

The canonical causal message ordering algorithm of Raynal et al. (1991) exhibits a performance of O(n²) for both message and log overheads, where n is the number of processes in a distributed system. The optimal implementation of this algorithm (Kshemkalyani and Singhal 1995) performs significantly better than this in the general case, but is still worse than O(n).

It is worth noting that in a non-EAI implementation each process maintains its own message store, hence repeating the overhead at multiple sites. Since O(n²) describes the message log size at a single site, the message log space overhead for the whole system is closer to O(n³).

The theoretical efficiency of an algorithm was discussed in depth in section 4.8.2, where it was clearly demonstrated that an algorithm exhibiting O(n) efficiency is preferable to one exhibiting O(n²). Figure 29 clearly shows that my protocol, which is in the order of O(n), will scale significantly better than Kshemkalyani and Singhal's optimal algorithm for causal ordering which varies between O(n²) and O(n³).

Figure 29: Comparing theoretical efficiencies (curves: O(n³), O(n²), O(n); x-axis: Input Size)

6.10. Summary

This chapter described the results of testing the protocol through an extensive suite of test cases.
The test results showed that the protocol offers a flexible and efficient approach to message ordering for Enterprise Application Integration and that the protocol compares well to other message ordering protocols.

Chapter 7
CONCLUSIONS

7.1. The Need for an EAI Protocol

This thesis began by describing the evolution of Enterprise Application Integration, and identifying the particular problem of message ordering in this domain. The current state of message ordering research was evaluated in the context of its suitability for solving the EAI ordering problem, and it was concluded that although the field was a rich and mature one, no one approach or protocol was suitable for EAI, as summarised by the results in Table 5.

By considering the best features of existing protocols and combining these with the aspects of EAI relevant to message ordering, I designed and evaluated a new message ordering approach and protocol, called inferred causal ordering. Table 6 presented the results of this evaluation against the EAI criteria.

The protocol took the novel step of inferring ordering information from the data contained within the messages, an approach made possible by the nature of Enterprise Application Integration, whereby implicit understanding of a message's contents is essential for orchestrating business processes.

7.2. Benefits of the Protocol

Chapter 6 tested the protocol using a simulation system built with a commercial integration engine. The results showed that the protocol successfully guarantees the order of delivery of messages to multiple remote systems for a typical EAI hub and spoke architecture.

The tests also showed that the protocol introduces a minimal and predictable overhead on each message, and similarly that the causal log size can be accurately predicted by understanding the profile of the causal relationships between the sending applications.
These conclusions were detailed in Chapter 6.

Perhaps the biggest advantage of the protocol was the novel way in which causal information was inferred from the messages themselves, rather than explicitly requiring applications to be modified to add causal information. Significant efficiency benefits were also obtained with the protocol when compared to normal causal ordering by splitting messages into causally unrelated groups.

7.3. Limitations of the Protocol

The dynamic log size is much harder to predict than the causal message log or the message overhead as it is susceptible to a number of factors. However, the main contributing influence is the ratio of the rate at which messages are sent into the system against the rate at which they can be delivered to a remote application. On the assumption that all messages will eventually be delivered, the dynamic log size will reduce to zero, but sufficient log space must be available in the dynamic log for the protocol to operate.

Section 4.8.1 also detailed a problem inherent in the centralised nature of the protocol, which represents a single point of failure for the system. Co-locating the protocol implementation on the integration engine's hardware will help to tie the availability of the protocol with that of the hub itself, but a more sophisticated algorithm-level solution is desirable.

7.4. Future Work

The protocol addresses Enterprise Application Integration, providing message ordering within a localised environment. The concepts behind it could be combined with those of asynchronous message groups as described by Fritzke, Jr. et al. (1998) to potentially create message ordering in a Business to Business (B2B) environment, for example by implementing timestamps as described by Mostefaoui et al. (2001).

This work could also be extended to investigate how to reduce the "single point of failure" problem inherent in the centralised design of the protocol, for example by spanning the protocol's message logs across multiple sites. From the hardware perspective, techniques such as clustering and redundant systems exist to reduce the impact of the failure of an individual system but a protocol-level solution is worth investigating further.

It would also be interesting to undertake a formal proof of this protocol in a similar way to Skawratananond et al. (1998), and to use this to identify any areas of the protocol that could be improved.

Finally, I would like to model the dynamic log behaviour for far larger message quantities and more complex interconnected systems by implementing it in a real-world integration project.

Appendix B
SIMULATION SYSTEM ARCHITECTURE

B.1 Overview

Figure 30 shows the simulation system used to test the protocol. It has three main components: a sending system simulation, the EAI hub, and a destination system simulation.

Figure 30: Simulation system architecture

B.2 Sending System Simulation

A sending simulation component initialised one thread per simulated sending application. Each thread generated the configured number of messages at the configured rate and passed them to a simple "rules engine" forming a part of the integration hub.

Each message contained additional information relevant to the test case, such as how likely the message was to experience a delay, how many destinations the message was for, or the static latency for that message. In addition, depending on the setting for concurrency, a message would be assigned to a logical stream.
Finally, timestamp and test iteration information were used to collate results in a database. Figure 31 shows the full input message format that was used.

    <Message>
      <SendingSystem>0</SendingSystem>
      <LogicalStream>0</LogicalStream>
      <LikelyhoodDelay>0</LikelyhoodDelay>
      <FixedDelay>0</FixedDelay>
      <NumberDestinations>1</NumberDestinations>
      <TestName>NumberOfMessages50</TestName>
      <TimeStamp>632295456317187472</TimeStamp>
      <Iteration>1</Iteration>
    </Message>

Figure 31: Test message format

B.3 EAI Hub

B.3.1 Rules Engine

The rules engine examined the message, and used the data within it to assign a causal message group. The causal groups were given a group identifier constructed as follows:

Figure 32: Simulation system causal group identifier

For example "Test~Efficiency50~Iteration~1~From~0~Stream~4".

An envelope was added to hold the causal information, and the message was then passed into a BizTalk 2004 orchestration to simulate a business process.

B.3.2 Integration Engine: BizTalk 2004

The BizTalk orchestration simply simulated any necessary delays, then wrote the message to the dynamic message log ready for delivery to the simulated systems.

B.3.3 Message Logs

The message logs were implemented as SQL Server 2000 tables. One table held the causal log, and another group of tables stored the dynamic message log.

B.4 Destination System Simulation

A Windows Service was written that ran eight threads to monitor the dynamic message log for causal message groups that were ready for delivery. When a thread identified an available message group it consumed the messages from that group one at a time, pausing for the configured delay between each message.

If a receiving system was configured to act as being disabled, it returned a failure code to the thread controller, which flagged the delivery as having failed, and requiring the message to be retried at an appropriate point.
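The flow described in B.3.1 and B.4 can be sketched in a few lines. This is an illustration under stated assumptions, not the simulation system's code: the field order in `causal_group_id` is inferred from the single example identifier, the "~" separator is copied from that example as extracted (the original character may have differed), and `drain_group` together with the in-memory `dynamic_log` are hypothetical stand-ins for the Windows Service threads and the SQL Server tables. Threading and the configured delay are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

SEPARATOR = "~"  # as it appears in the extracted example; an assumption

TEMPLATE = (
    "<Message><SendingSystem>{sender}</SendingSystem>"
    "<LogicalStream>{stream}</LogicalStream>"
    "<TestName>Efficiency50</TestName><Iteration>1</Iteration></Message>"
)


def causal_group_id(message_xml, separator=SEPARATOR):
    """Build a group identifier from the test, iteration, sender and stream."""
    root = ET.fromstring(message_xml)
    parts = [
        "Test", root.findtext("TestName", ""),
        "Iteration", root.findtext("Iteration", ""),
        "From", root.findtext("SendingSystem", ""),
        "Stream", root.findtext("LogicalStream", ""),
    ]
    return separator.join(parts)


def drain_group(queue, receiver):
    """Deliver queued messages in order and stop at the first failure.

    A message that is not acknowledged stays at the head of the queue, so
    its position is preserved for a later retry.
    """
    delivered = []
    while queue:
        if not receiver(queue[0]):
            break
        delivered.append(queue.popleft())
    return delivered


# Hypothetical in-memory dynamic log: group identifier -> waiting messages.
dynamic_log = defaultdict(deque)
for stream in (4, 4, 7):
    message = TEMPLATE.format(sender=0, stream=stream)
    dynamic_log[causal_group_id(message)].append(message)

group_4 = "Test~Efficiency50~Iteration~1~From~0~Stream~4"
group_7 = "Test~Efficiency50~Iteration~1~From~0~Stream~7"

delivered_4 = drain_group(dynamic_log[group_4], lambda m: True)   # available
delivered_7 = drain_group(dynamic_log[group_7], lambda m: False)  # disabled
```

Because each group has its own queue, a disabled receiver for one group leaves the other group's delivery unaffected, which mirrors the resilience behaviour reported in section 6.7.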

References

Accenture (2003) Field Force Enablement, http://www.accenture.com/xdoc/en/services/microsoft/field_force .pdf

Agarwal, D. A., Moser, L. E., Melliar-Smith, P. M. and Budhia, R. K. (1998) "The Totem multiple-ring ordering and topology maintenance protocol", ACM Trans. Comput. Syst., 16 (2), pp. 93-132.

Altman, R. and Altman, G. (2004) "An Integration Primer Part II", Business Integration Journal, March, pp. 45-47.

Attachmate Corporation (2004) Approaches to EAI Involving Legacy Host Applications: The Five R's, http://www.attachmate.com/article/0,1012,3163-1 3 85 8,OO.html

Birman, K. P. and Joseph, T. A. (1987) "Reliable communication in the presence of failures", ACM Trans. Comput. Syst., 5 (1), pp. 47-76.

Brassard, G. and Bratley, P. (1988) Algorithmics: Theory and Practice, Prentice-Hall, Inc., Englewood Cliffs, NJ.

Bussler, C. (2002a) "B2B integration technology architecture", In: Proceedings of the Fourth IEEE International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, pp. 147-152.

Bussler, C. (2002b) "P2P in B2BI", In: Proceedings of the 35th Annual Hawaii International Conference on System Sciences, pp. 3915-3924.

Chandra, P., Gambhire, P. and Kshemkalyani, A. D. (2004) "Performance of the Optimal Causal Multicast Algorithm: A Statistical Analysis", IEEE Trans. Parallel Distrib. Syst., 15 (1), pp. 40-52.

Cheng, W., Jia, X. and Werner, M. (1995) "A multicast mechanism for actual causal ordering", In: IEEE First International Conference on Algorithms and Architectures for Parallel Processing, pp. 303-314.

Clements, P. C. and Northrop, L. N. (1996) "Software Architecture: an Executive Overview", In: Brown, A. W. (ed.), Component-Based Software Engineering: Selected Papers from the Software Engineering Institute, IEEE Computer Society Press, pp. 55-68.

Emmerich, W., Ellmer, E. and Fieglein, H. (2001) "TIGRA - an architectural style for enterprise application integration", In: Proceedings of the 23rd International Conference on Software Engineering, pp. 567-576.

Espinosa, J. and Pulido, A. (2002) "IB (integrated business): a workflow based integration approach", In: Proceedings of the 35th Annual Hawaii International Conference on System Sciences, pp. 2566-2571.

Eyal, A. and Milo, T. (2001) "Integrating and customizing heterogeneous e-commerce applications", The VLDB Journal, 10 (1), pp. 16-38.

Federal Student Aid University (n.d.), Definitions, http://extranet.sfa.ed.gov/sfa-university/training/Q botw/dictionary-m.html

Fritzke, U., Jr., Ingels, P., Mostefaoui, A. and Raynal, M. (1998) "Fault-tolerant Total Order Multicast to asynchronous groups", In: Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems, pp. 228-234.

Gosain, S., Malhotra, A., Sawy, O. A. and Chehade, F. (2003) "The impact of common e-business interfaces", Commun. ACM, 46 (12), pp. 186-195.

Hackney, D. (1996) "Treasure in the Data Islands", DM Review, 6 (10).

Hohpe, G. and Woolf, B. (2004) Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Pearson Education, Boston, MA.

Kreger, H. (2003) "Fulfilling the Web services promise", Commun. ACM, 46 (6), pp. 29-ff.

Kshemkalyani, A. D. and Singhal, M. (1995) Necessary and Sufficient Conditions on Information for Causal Message Ordering and Their Optimal Implementation, Technical Report 29.2040, IBM Research Triangle Park.

Kshemkalyani, A. D. and Singhal, M. (1996) "An Optimal Algorithm for Generalized Causal Message Ordering", In: Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, p. 87.

Kuo, D., Fekete, A., Greenfield, P., Jang, J. and Palmer, D. (2003) "Just what could possibly go wrong in B2B integration?", In: Proceedings of the 27th Annual International Computer Software and Applications Conference, pp. 544-549.

Lamport, L. (1978) "Time, clocks, and the ordering of events in a distributed system", Commun. ACM, 21 (7), pp. 558-565.

Levin, J. (2001) "From EDI To XML And UDDI: A Brief History Of Web Services", Information Week, CMP MEDIA LLC.

Linthicum, D. S. (2000) Enterprise Application Integration, Addison-Wesley, Boston, MA.

Linthicum, D. S. (2004) Next Generation Application Integration: From Simple Information to Web Services, Pearson Education, Inc., Boston, MA.

Medjahed, B., Benatallah, B., Bouguettaya, A., Ngu, A. H. H. and Elmagarmid, A. K. (2003) "Business-to-business interactions: issues and enabling technologies", VLDB Journal: Very Large Data Bases, 12 (1), pp. 59-85.

Microsoft (2004) Microsoft BizTalk Server 2004 Product Documentation, http://www.msdn.microsoft.com/librarv/default.asp?url=~ibrarv/enus/def%tm/ebiz def portal page.asp

Mostefaoui, A., Raynal, M. and Verissimo, P. (2001) "The Logically Instantaneous Communication Mode: a Communication Abstraction", Future Generation Computer Systems, pp. 669-678.

Murty, V. V. and Garg, V. K. (1997) "Characterization of Message Ordering Specifications and Protocols", In: 17th International Conference on Distributed Computing Systems (17th ICDCS'97), pp. 492-499.

National Institute of Health (n.d.), NIHnet Handbook Glossary, http://www.cit.nih.gov/dnst/handbook/Main/glossary.htm

Raynal, M., Schiper, A. and Toueg, S. (1991) "The causal ordering abstraction and a simple way to implement it", Inf. Process. Lett., 39 (6), pp. 343-350.

Reyes, A., Espino, J., Mohan, V. and Nadkar, M. (2003) "Ad hoc software interfacing: enterprise application integration (eai) when middleware is overkill", In: Proceedings of the 27th Annual International Computer Software and Applications Conference, pp. 570-580.

Singh, G. and Badarpura, S. (2001) "Application ordering in group communication", In: 21st International Conference on Distributed Computing Systems Workshop, pp. 11-16.

Sinha, A. (1992) "Client-server computing", Commun. ACM, 35 (7), pp. 77-98.

Skawratananond, C., Mittal, N. and Garg, V. (1998) "A Lightweight Algorithm for Causal Message Ordering in Mobile Computing Systems".

Stal, M. (2002) "Web services: beyond component-based computing", Commun. ACM, 45 (10), pp. 71-76.

Themistocleous, M., Irani, Z., O'Keefe, R. and Paul, R. (2001) "ERP problems and application integration issues: an empirical survey", In: Proceedings of the 34th Annual Hawaii International Conference on System Sciences, pp. 1-10.

Truman, P. (2001) Integration Framework White Paper, Cap Gemini Ernst & Young.

Tyler, P. (1994) "Causal group multicast: a formal description", In: Proceedings of IEEE Region 10's Ninth Annual International Conference. Theme: 'Frontiers of Computer Technology', pp. 692-696.

...ah, S. (2000) "Reality bites", Computer Business Review (Online Edition), Computerwire, June 2000.

Vinoski, S. (2002) "Middleware "Dark Matter"", IEEE Internet Computing, 6 (5), pp. 92-95.

W3C (2000) Simple Object Access Protocol (SOAP) 1.1, http://www.w3c.org/TR/SOAP

Yoshida, T. (2001) "Message ordering based on the strength of a causal relation", In: Proceedings of the 15th International Conference on Information Networking, pp. 915-920.

Index

A2A, See Application-to-Application Integration
Actual Causal Ordering, 37
Application Ordering, 39
Application-to-Application Integration
Asynchronous Communication, 21, 23
B2B, See Business-to-Business Integration
B2C, See Business-to-Consumer Integration
Business Processes
    Automating, 19
Business-to-Business Integration, 12
Business-to-Consumer Integration, 12
Causal Message Groups, 48
Causal Ordering, 33
Cross-Group Dependencies, 48
Data Islands, 13
Data Sharing, 13
Data Silo, 14
Enterprise Application Integration, 11
    Definition, 11
    Real-World Uses, 20
    The Message Ordering Problem, 22
Enterprise Applications, 15
Enveloping, See XML, Enveloping
Hub and Spoke Integration, 17, 18
    Architecture, 17
Inferred Causal Ordering, 44
Integration Engine, 19
Lamport, Leslie
    "Happens Before" relation, 33
Metadata, 35
Middleware, 14, 15, 16, 18
Orchestration, 19
Point to Point Integration, 16
Stovepipe Systems, 13
Total Ordering, 31
Totem, 32
Web Services, 18, 19
XML, 17, 41, 45, 46, 47, 49, 51
    Enveloping, 51