Efficient Availability Mechanisms in Distributed Database Systems

Abdelsalam Helal
Computer Science Engineering
University of Texas at Arlington
Arlington, TX 76019
helal@cse.uta.edu

Bharat Bhargava
Dept. of Computer Sciences
Purdue University
W. Lafayette, IN 47907
bb@cs.purdue.edu

Abstract

The resiliency of distributed database systems can be realized through fault-tolerance mechanisms for data availability. These mechanisms include data replication, atomic commitment, failure detection, surveillance, and reconfiguration in the presence of various site and communication failures. In this paper, we discuss the implementation of availability mechanisms in the RAID distributed database system: a library of replication methods, an on-line replication controller, a surveillance facility for failure detection, and automatic reconfiguration. Repairs of site and network failures are also detected, thus triggering recovery. We show how high levels of availability and adaptability to failures are achieved by the integrated use of replication, surveillance, and data reconfiguration, and how new policies, including temporary isolation of data copies and partial reconfiguration, can be used to limit performance degradation during failures.

1 Introduction

RAID [10, 6] is a distributed database system developed at Purdue. (The Purdue RAID and RAID-V2 systems are not related to the concept of Redundant Array of Inexpensive Disks.) In RAID, data availability in the presence of site and communication failures is increased by maintaining multiple copies of the data and by integrating failure detection, surveillance, and reconfiguration with the replication control and atomic commitment mechanisms.

A surveillance facility monitors the connectivity of the system and detects failures as well as subsequent repairs. Once a failure is detected, the system configuration view is updated and reconfiguration actions may be initiated; changes in connectivity, such as the isolation of a site or the merging of network partitions, are treated as detectable occurrences to which the system and the database administrator can respond. Once a failure is predicted or anticipated, the system can redistribute parts of the database so as to avoid the failure, for example by relocating copies away from the sites that are expected to fail.
Transactions that access data whose site has failed, or whose site has become isolated in a network partition, suffer delays and may eventually be aborted. Such transactions are not only delayed before they can be aborted; they also waste system resources (CPU cycles and communication bandwidth) and, through data contention, delay other transactions that are not affected by the failure. The wasted time consists of the period until the point where the failure is detected (for example, a timeout period) plus the time of the work that must be re-executed. Failure detection and surveillance reduce this waste by detecting failures and repairs promptly and by exporting hints that allow transactions to avoid data that is unavailable.

Adaptability is another goal of the system design. The system configuration can be changed, by the administrator or automatically, to avoid predicted failures, to isolate parts of the database that have suffered failures, and to merge partitions once repairs are confirmed. Redistributing data and temporarily isolating copies eliminate unnecessary delays and performance degradation, and allow the system to adapt to variations in transaction requirements.

In this paper, we focus on the mechanisms for data availability and their integration in RAID. Data replication is implemented within a replication controller that provides access to a collection of replication methods. Site and communication-link failure detection is implemented in a surveillance facility, and data reconfiguration is implemented through the replication controller and the surveillance facility. Collectively, these techniques enhance availability and adaptability in the presence of failures. The design and empirical evaluation of some of these mechanisms are reported in [8, 7, 19].

The rest of the paper is organized as follows. Section 1.1 gives an overview of the RAID system (RAID-V2). Section 2 elaborates on the replication mechanisms. Section 3 gives details of the surveillance facility. Data reconfiguration is discussed in Section 4. Finally, a summary and conclusions are given in Section 5.

1.1 Overview of the RAID System

RAID-V2 is the second version of the RAID distributed database system. It is a server-based, relational system, developed on Sun workstations under the Unix operating system. A description of the design of the system can be found in [10, 6]. Each RAID site consists of seven servers, each of which encapsulates part of the functionality of the system: the User Interface (UI), the Action Driver (AD), the Access Manager (AM), the Atomicity Controller (AC), the Concurrency Controller (CC), the Replication Controller (RC), and the Surveillance Controller (SC).
The UI is a front-end that allows a user to invoke QUEL-type queries on a relational database. The AD translates parsed queries into a sequence of low-level read and write actions and is responsible for executing transactions. The AM manages the database objects on a physical storage device, provides indexing and retrieval, and writes committed updates atomically. The AC is responsible for the atomicity of transactions that update data at multiple sites. The CC checks for conflicts and ensures the serializability of transactions. The RC manages access to the copies of replicated relations and is responsible for their mutual consistency. The SC monitors the connectivity of the system and advertises view changes to the other servers. Figure 1 illustrates the RAID servers.

[Figure 1: The RAID system architecture.]

All RAID servers communicate solely through messages. The actions of a transaction are directed to the servers through inter-server requests; once a request is issued, no further progress is made on that transaction until a reply is received.

2 Replication Mechanisms

Replication control is responsible for ensuring that the replicated database behaves as a single, hypothetical one-copy database; this correctness criterion is one-copy serializability [5]. Many replication control methods have been introduced in the literature [4, 2, 26, 16, 22, 15, 1, 20]. Usually, replication control is part of a hierarchy in which it comes on top of a higher-level concurrency control. In RAID, the replication control component is built on top of the concurrency control component in a layered implementation that guarantees one-copy serializability [6]. A logical operation on a database object is tested for one-copy equivalence by the replication controller and is then transformed into physical operations on the available replicas; the replication method in use, as well as the view of which sites are available, determines the physical operations generated for each logical operation.

Usually, replication is employed to achieve fault-tolerance, increased availability, location transparency, and performance, and it has been implemented in several research projects [25, 14, 27, 24]. In RAID, replication is designed to meet certain objectives: increased data availability, adaptability, controlled performance degradation during failures, and data reconfigurability. To meet these objectives, we have designed and implemented a variety of mechanisms: off-line tools for creating, administrating, and managing replicated databases; an on-line replication controller (RC) server that transforms logical operations into physical operations on the copies; a library of replication control methods; a kernel-level multicast facility used to reduce the message traffic of replicated operations; and quorum selection heuristics used to reduce the quorum overhead. The distinctive feature of replication in RAID is the quorum-based interface of the RC to the library of replication methods. Details of the multicast facility can be found in [9]. In the rest of this section, we present the implementation details of the replication facility.
The granularity of replication in RAID is the relation. Three internal attributes are maintained for every tuple: tuple_id, version_number, and used_bit. The tuple_id is used to identify tuples, and the version_number acts as a marker of the most up-to-date copies.

2.1 Off-line Replication Management

A RAID database is created and initialized off-line, according to a specification (spec) file. For each relation in the database, the spec file contains the relation name, a list of attribute descriptors (attribute name, type, and length), and a list of Host-Path pairs that specify the sites at which a copy of the relation is to be stored. A Host is an internet host name and a Path is an absolute directory path at that host; for example, the Host-Path pair raid10.cs.purdue.edu and /u10/raid/databases specifies where the primary copy of the database DebitCredit9 is stored.
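For concreteness, the entries of a spec file might be laid out as follows. The actual keyword syntax of the RAID spec format is not shown in the paper, so this layout, the attribute names, and the second Host-Path pair are assumptions made only for illustration; the first Host-Path pair is the one from the example above.

    # hypothetical spec-file layout (syntax assumed, not the actual RAID format)
    relation Account
        attribute acct_id  int    4     # attribute name, type, length
        attribute balance  float  8
        copy raid10.cs.purdue.edu  /u10/raid/databases   # Host-Path pair
        copy raid20.cs.purdue.edu  /u20/raid/databases   # hypothetical second copy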
The dbcreate command creates a database according to a spec file. It takes the database name and the spec file as command-line arguments, reads the spec file, and creates an instance of the database directory at each Host-Path location. The logical names of relations and attributes are parsed and mapped into unique internal ids; communication among the RAID servers is in terms of these ids rather than the high-level logical names. Dbcreate also produces a configuration file that contains a representation of this mapping and of the Host-Path pairs of every relation. The schemas and the replication information are entered into two meta relations, .RELATION and .ATTRIBUTE, which constitute the replicated directory used by the replication controller; a local copy of the meta relations is stored with every instance of the database. Figure 2 depicts the meta relations.

[Figure 2: The RAID meta relations (.RELATION and .ATTRIBUTE).]

After creating the directory at every Host-Path pair, dbcreate optionally initializes the user relations by inserting tuples from input files. In addition to dbcreate, RAID provides commands to extend, reset, and remove databases. The dbextend command extends a database by setting up an appropriate directory at a new Host-Path pair and copying the database instance and configuration file to it. The dbreset command resets a database by removing all user tuples from its relations, leaving them empty as if they were just created. A remove command destroys a database by removing its files and directories at all hosts.

Finally, the raid command starts a database instance. For each Host-Path pair (Hi, Pj) at which a copy of the database is stored, the raid command remotely starts the RAID servers at host Hi, including an atomicity controller, a concurrency controller, an access manager, a replication controller, and a surveillance controller, and passes them the database name and the path Pj as command-line arguments. From the configuration file, the replication controller learns the mapping between logical ids and Host-Path pairs, and therefore knows how to locate the copies of every relation; the access manager knows where the local copy of the database is stored.
2.2 The On-line Replication Controller (RC)

The RC unifies access to a library of replication control methods through a quorum-based interface. A distinctive feature of this interface is that it hides the implementation of any particular method from the RC: the RC can opt to use certain methods, or adapt to new methods added in the future, without changes to its infrastructure. Currently, the library contains the read-one-write-all (ROWA) method and the general quorum consensus method. Because the methods are quorum-based, the RC can tolerate site and partition failures. In general, the aim is to maximize availability while minimizing the performance penalties of replication. The RC uses the view and copy hints exported by the surveillance facility to identify which sites and which physical copies are currently available.

2.2.1 The RC Interface

The RC maps a logical relation into the set of sites that hold its physical copies. When the RC receives a Read request from the local AD, it consults the local directory to locate the physical copies of the relation and uses the copy hints to find which of them are currently available. It then constructs a read quorum from the available copies. Requests on local copies are passed to the local CC; requests on remote copies are sent to the RCs of the respective sites. If all the replies contained in the quorum are positive, the RC returns the most up-to-date copy, the one with the highest version number, to the AD. If one or more replies are negative (due to a concurrency conflict), or if no quorum can be formed from the available copies, the RC sends a NackRead back to the AD; the transaction is then aborted, and any requests that never receive a reply eventually time out and are flushed out of the system.
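A minimal sketch of the local part of this read path is given below. The structure and function names are assumed for illustration and do not reproduce the RAID sources; the sketch only shows the selection logic described above: gather available copies into a read quorum and return the copy with the highest version number, or signal a NackRead when no quorum can be formed.

    /* Sketch of the RC read path described above.  All names are
     * hypothetical; they do not reproduce the actual RAID code. */
    #include <stdio.h>

    struct copy {                 /* one physical copy of a relation      */
        int  site;                /* site holding the copy                */
        int  available;           /* copy hint from the surveillance SC   */
        long version;             /* version_number of the copy           */
    };

    /* Return the index of the most up-to-date copy in a read quorum of
     * `need` available copies, or -1 (the NackRead case) otherwise.      */
    static int rc_read(const struct copy copies[], int ncopies, int need)
    {
        int i, got = 0, best = -1;

        for (i = 0; i < ncopies && got < need; i++) {
            if (!copies[i].available)          /* skip unreachable sites  */
                continue;
            got++;                             /* copy joins the quorum   */
            if (best < 0 || copies[i].version > copies[best].version)
                best = i;                      /* track highest version   */
        }
        return (got >= need) ? best : -1;
    }

    int main(void)
    {
        struct copy branch[3] = {
            { 1, 1, 4 }, { 2, 0, 5 }, { 3, 1, 5 },
        };
        int best = rc_read(branch, 3, 2);      /* read quorum of two copies */

        if (best < 0)
            printf("NackRead: no read quorum available\n");
        else
            printf("read copy at site %d, version %ld\n",
                   branch[best].site, branch[best].version);
        return 0;
    }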
Write operations are deferred until commit time. When the RC receives a StartCommit request from the AD, it constructs a commit request and passes it to the local AC, which controls the distributed commit. If a write quorum cannot be formed for some relation in the update list, the RC sends a NackStartCommit to the AD and the transaction is aborted. Figure 3 depicts the communication paths between the RC and the other servers in RAID; the labels on the figure correspond to the following steps.

[Figure 3: Control flow in the RAID Replication Controller (RC).]

1. A transaction arrives at the UI.
2. The AD processes the transaction; read operations are issued as the transaction executes, and write operations are saved until commit time.
3. Read requests are passed to the RC, which reads from the local CC or from remote RCs.
4. The tuples read are returned through the RC to the AD.
5. Replies awaited from remote sites are passed back to the AD as they arrive.
6. When the AD wishes to commit the transaction, it passes the read list and the update list to the RC.
7. The RC constructs a commit request that includes the set of sites to read from (the union of the read quorums), the set of sites to write to (the union of the write quorums), the list of updates, and the set of sites at which each update must be completed (illustrated below), and passes the request to the AC.
8. The AC sends a positive or negative acknowledgement to the RC.
9. The RC passes the acknowledgement on to the AD.
10. The AD returns the result of the transaction to the UI.
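Step 7 above amounts to taking unions of per-relation quorums. A small illustration of that bookkeeping is sketched below; the bit-mask representation of site sets and all names are assumptions made for the example, not the actual RAID data structures.

    /* Assembling a commit request as unions of per-relation quorums.
     * Site sets are represented as bit masks (bit i = site i); this
     * representation is an assumption made only for the illustration. */
    #include <stdio.h>

    struct commit_request {
        unsigned read_sites;    /* union of read quorums   */
        unsigned write_sites;   /* union of write quorums  */
        int      n_updates;     /* size of the update list */
    };

    int main(void)
    {
        /* per-relation quorums already chosen by the RC (hypothetical) */
        unsigned read_quorums[]  = { 0x03, 0x05 };   /* sites {0,1}, {0,2} */
        unsigned write_quorums[] = { 0x07 };         /* sites {0,1,2}      */
        struct commit_request req = { 0, 0, 1 };
        unsigned i;

        for (i = 0; i < sizeof read_quorums / sizeof read_quorums[0]; i++)
            req.read_sites |= read_quorums[i];       /* union of read quorums  */
        for (i = 0; i < sizeof write_quorums / sizeof write_quorums[0]; i++)
            req.write_sites |= write_quorums[i];     /* union of write quorums */

        printf("read sites 0x%x, write sites 0x%x, %d update(s)\n",
               req.read_sites, req.write_sites, req.n_updates);
        return 0;
    }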
2.2.2 The RC Server

The RC server consists of approximately 2500 lines of C code. In addition to the library of replication methods, the implementation uses the CLE [11]. The RC server code consists of three sections: initialization, the main loop, and termination.

Initialization includes registration with the ORACLE, a name server that knows the whereabouts of all RAID servers. The RC registers itself and issues Tell_Me_About requests for the servers it needs to communicate with: the local AD, the local CC, the local SC, and the remote RCs. Registration can be synchronous or asynchronous. In the synchronous case, the RC blocks until the awaited servers have registered with the ORACLE; in the asynchronous case, the RC proceeds and is notified by the ORACLE as soon as each awaited server registers. The RC then sets up communication paths to these servers, initializes its view of the system connectivity (either from an initial specification or from the local SC), and initializes the replication method to be used, the transaction table, and the statistics counters. The initialization information also includes the communication cost between sites, the computation power of each site (the inverse of its MIPS rating), the reachability matrix, and the list of gateways.

The main loop consists of a blocking receive of a message, decoding of the message to extract the message type and the id of the transaction to which the message is destined, and high-level processing of the message. The transaction table is searched for a matching transaction id; if one is found, the state information of the transaction is used to process the message, and the transaction may move from one state to another. Termination is triggered by a signal; upon termination, the RC server dumps its statistics, which include the number of aborts due to quorum unavailability, the average read quorum size, and the average write quorum size.

2.2.3 The Replication Control Automaton

[Figure 4: The RAID replication control automaton.]

Figure 4 depicts a simplified automaton of the possible sequences of states that a transaction may go through at the RC. A new transaction moves from the Non-Existing state to the Trans-Reading state when its first read request arrives. The RC loops in this state, constructing read quorums and sending read requests, until all read replies have arrived, at which point the transaction moves to the Trans-Processing state. When the AD issues a StartCommit, the transaction moves to the Trans-Committing state, where it stays until the commitment completes and the transaction is removed from the table. A transaction is aborted if it receives a negative reply due to a concurrency conflict, if an awaited message never arrives and a timeout expires, or if the user issues a kill request; in these cases it moves to the aborting state and is eventually removed from the table, returning to Non-Existing. A read request that arrives at a site other than the transaction's home site is handled as a subtransaction in the Subtr-Reading state, in which the RC either serves the read on behalf of the remote home site or forwards the request to it.
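The state sequences of Figure 4 can be pictured as a small dispatch function: each incoming event is applied to the transaction's current state to obtain the next state. The sketch below is only a schematic reading of the automaton with assumed names; it omits subtransactions, kill requests, and the actual message formats. In the real server the event is derived from the decoded message type and the entry found in the transaction table, as described above.

    /* Schematic of the RC transaction automaton of Figure 4.
     * States and events use assumed names for illustration only. */
    #include <stdio.h>

    enum rc_state { NON_EXISTING, TRANS_READING, TRANS_PROCESSING,
                    TRANS_COMMITTING, TRANS_ABORTING };

    enum rc_event { READ_REQUEST, ALL_READ_REPLIES, START_COMMIT,
                    COMMIT_ACK, NEGATIVE_REPLY, TIMEOUT };

    static enum rc_state next_state(enum rc_state s, enum rc_event e)
    {
        /* a negative reply (concurrency conflict) or a timeout aborts
           the transaction from any state                              */
        if (e == NEGATIVE_REPLY || e == TIMEOUT)
            return TRANS_ABORTING;

        switch (s) {
        case NON_EXISTING:      /* first read request creates the entry    */
            return (e == READ_REQUEST) ? TRANS_READING : s;
        case TRANS_READING:     /* loop here until all read replies arrive */
            return (e == ALL_READ_REPLIES) ? TRANS_PROCESSING : s;
        case TRANS_PROCESSING:  /* StartCommit moves to the commit phase   */
            return (e == START_COMMIT) ? TRANS_COMMITTING : s;
        case TRANS_COMMITTING:  /* commit acknowledgement ends the life cycle */
            return (e == COMMIT_ACK) ? NON_EXISTING : s;
        default:                /* aborting: entry is removed from the table  */
            return NON_EXISTING;
        }
    }

    int main(void)
    {
        enum rc_state s = NON_EXISTING;
        enum rc_event trace[] = { READ_REQUEST, ALL_READ_REPLIES,
                                  START_COMMIT, COMMIT_ACK };
        unsigned i;

        for (i = 0; i < sizeof trace / sizeof trace[0]; i++) {
            s = next_state(s, trace[i]);
            printf("after event %u -> state %d\n", i, (int)s);
        }
        return 0;
    }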
The interface between the RC and the library of replication methods is shown in Figure 5.

[Figure 5: The RC quorum interface.]

When a transaction is started, the RC invokes Init_RC_Protocol to perform any initialization required by the replication method. For each relation in the read set, the RC invokes Single_READ_Quorum, passing the relation name, the transaction id, and the set of available sites; the call returns a read quorum, that is, a set of sites from which the relation must be read, together with the version numbers used to determine the most up-to-date copy. For each relation in the update list, the RC invokes the analogous Single_WRITE_Quorum function, which returns a write quorum for that relation. The ALL_WRITE_Quorum call returns the union of the write quorums of all relations in the update list; this union is packed by Pack_Update into the commit request. Quorums are built by the Construct_Quorum function, which selects sites from Available_Sites using either Min_Site_Permutation or Random_Permutation, according to the quorum selection heuristic in use. The library currently provides the read-one-write-all functions (ROWA_Read, ROWA_Write) and the quorum consensus functions (QC_Read, QC_Write).

The RC and the SC are designed as separate servers. This functional separation is realized by the following requirements:

● The RC does not require the surveillance facility, although it imports the connectivity view and the copy hints (in the sense of [23]) that the facility exports. In this way, the correctness of the replication methods does not depend on the surveillance facility.
● The RC treats the SC as a view server and does not require a synchronized view from the SC servers.
● The RC can continue transaction processing despite the failure of the SC server, if so desired, and it can be instantiated without the SC server.

2.3 Quorum Selection Heuristics

In quorum consensus methods, a read or write operation can usually be performed by more than one quorum. In order to select a quorum, the RC uses one of three heuristics. The RANDOM heuristic generates a random permutation of the set of sites at which the relation is replicated and returns the smallest prefix of the permutation that constitutes a quorum. The MINSITE heuristic sorts the sites in descending order of their weights and returns the smallest prefix of the sorted list that constitutes a quorum; it therefore produces a smallest-size quorum. The RAND-MINSITE heuristic generates a random permutation and returns its smallest quorum prefix if that prefix has the same size as the MINSITE quorum; otherwise, it returns the MINSITE quorum. Intuitively, smallest-size quorums reduce message traffic but can result in a skewed load distribution among the sites, whereas random selection produces larger quorums, and hence more message traffic, but helps load balancing. The performance of these heuristics was studied in [18].
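A compact way to picture the RANDOM and MINSITE heuristics is sketched below: both scan a permutation of the replica sites and stop at the smallest prefix whose accumulated weight reaches the quorum threshold; RANDOM shuffles the sites first, while MINSITE sorts them by descending weight. The code is an assumed illustration, not the RAID implementation, and the example weights and threshold are hypothetical.

    /* RANDOM vs. MINSITE quorum selection, sketched with assumed
     * structures.  Both return the size of the smallest prefix of a
     * site permutation whose total weight reaches the threshold.   */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct site { int id; int weight; };

    static int by_weight_desc(const void *a, const void *b)
    {
        return ((const struct site *)b)->weight -
               ((const struct site *)a)->weight;
    }

    /* smallest prefix of `perm` whose weight sum meets `threshold`;
       returns the prefix length, or -1 if no quorum exists          */
    static int prefix_quorum(const struct site *perm, int n, int threshold)
    {
        int i, sum = 0;
        for (i = 0; i < n; i++) {
            sum += perm[i].weight;
            if (sum >= threshold)
                return i + 1;
        }
        return -1;
    }

    int main(void)
    {
        struct site copies[] = { {1,1}, {2,1}, {3,2}, {4,1}, {5,2} };
        int n = 5, threshold = 3;
        struct site perm[5];
        int i, j;

        /* RANDOM: shuffle the sites, then take the smallest prefix */
        memcpy(perm, copies, sizeof copies);
        for (i = n - 1; i > 0; i--) {          /* Fisher-Yates shuffle */
            struct site t;
            j = rand() % (i + 1);
            t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        printf("RANDOM quorum size : %d\n", prefix_quorum(perm, n, threshold));

        /* MINSITE: sort by descending weight, then take the prefix  */
        memcpy(perm, copies, sizeof copies);
        qsort(perm, n, sizeof perm[0], by_weight_desc);
        printf("MINSITE quorum size: %d\n", prefix_quorum(perm, n, threshold));
        return 0;
    }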
3 The Surveillance Mechanism

The surveillance mechanism in RAID is implemented as a separate server, the Surveillance Controller (SC). The SC servers monitor the connectivity of the system and detect site and link failures as well as subsequent repairs. Whenever a change in connectivity is detected, the SC builds a new view that reflects the change and passes a view hint and copy hints to the local Replication Controller. In response, the RC discards unreachable sites, so that quorums and transaction operations are directed only to sites that are reachable.

The surveillance protocol can be described as follows. Each SC maintains a timer for every other SC in the system. The SC blocks waiting for either messages or periodic alarm signals. When the SC is interrupted by an alarm, it wakes up and performs one of two chores, in alternation. In the first alternation, it broadcasts an I_am_alive message to all other SCs. In the second alternation, it checks which sites are reachable by decrementing, by one tick, the timers of the sites from which no I_am_alive message has been received. Whenever an I_am_alive message is received from site j, the timer corresponding to site j is reset to a positive constant number of ticks that we call TimeoutTicks. If the timer of site j reaches zero before an I_am_alive message is received from j, the SC concludes that site j has failed or that the link to it has been lost. The alarm periodicity is a fixed interval of time, delta; a site is therefore declared unreachable only if no I_am_alive message is received from it within (TimeoutTicks x delta) seconds, and the nonzero tick count guards against lost or delayed messages. If the view generated by this check differs from the SC's current view, the SC installs the new view and notifies the local RC of the change. The choice of the delta and TimeoutTicks parameters is a compromise between the overhead imposed by the protocol (in the case of small values of delta) and an increased abort rate (in the case of large values of delta).
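The timer bookkeeping behind this protocol is simple enough to sketch: one counter per remote SC, reset to TimeoutTicks on every I_am_alive message and decremented on every alarm period delta; a counter that reaches zero marks the site unreachable. The fragment below is only an assumed illustration of that bookkeeping, with hypothetical names and constants, not the SC's actual code.

    /* Timer bookkeeping of the surveillance protocol, sketched with
     * assumed names.  One tick counter per remote SC. */
    #include <stdio.h>

    #define NSITES        4
    #define TIMEOUT_TICKS 3    /* assumed value for the illustration */

    static int ticks[NSITES];

    /* an I_am_alive message arrived from site j: reset its timer */
    static void i_am_alive(int j)
    {
        ticks[j] = TIMEOUT_TICKS;
    }

    /* periodic alarm: decrement all timers and report expired sites */
    static void check_timers(void)
    {
        int j;
        for (j = 0; j < NSITES; j++) {
            if (ticks[j] > 0 && --ticks[j] == 0)
                printf("site %d unreachable: no I_am_alive within "
                       "TimeoutTicks * delta\n", j);
        }
    }

    int main(void)
    {
        int j, round;

        for (j = 0; j < NSITES; j++)   /* start with all sites reachable */
            ticks[j] = TIMEOUT_TICKS;

        for (round = 0; round < TIMEOUT_TICKS + 1; round++) {
            i_am_alive(1);             /* only site 1 keeps reporting     */
            check_timers();            /* sites 0, 2, 3 eventually expire */
        }
        return 0;
    }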
The RAID surveillance protocol has the following features:

● Symmetry: the protocol is decentralized and does not require a coordinator. This avoids running an election protocol in the case of a coordinator failure.
● Asynchronous coordination: participants are allowed to be completely out of synchrony. This avoids the problems of clock drift.
● Reliability: the protocol makes no assumptions on the timeliness or reliability of message delivery; instead, it guards against message loss or delays.
● Versatility: the protocol does not need to be restarted when adding or removing participants. This accommodates dynamic system reconfiguration.

The performance of the surveillance protocol and its effect on transaction processing during failures were studied in [19]. Several other surveillance protocols have been proposed in the literature [28, 3, 21, 13, 12, 17].

4 Data Reconfiguration

To improve the performance and availability of the RAID system during failures, data copies can be reconfigured. When failures are anticipated or predicted, RAID can relocate copies so as to avoid the anticipated failure, and it can redistribute copies to optimize for the dominant transaction mix under the new conditions. Redistribution aims at creating a local copy at sites where transaction response time would otherwise be dominated by remote access, and at moving copies away from sites whose failure is anticipated.

In addition to redistributing copies, the reconfiguration technique can temporarily invalidate some copies of an object. An invalid copy is not physically removed; it is excluded from quorums until it is regenerated from an up-to-date copy. Both forms of reconfiguration are implemented through the quorum relation. For each relation, the quorum relation specifies a read threshold R_Quorum, a write threshold W_Quorum, and a read/write weight assigned to each copy; the definition of R_Quorum (W_Quorum) prescribes the total weight that must be accessed by a read (write) operation. Copies are invalidated by setting their weights to zero, and the degree of replication and the quorum sizes are changed by updating the weights and thresholds, so adaptability through validation and invalidation of copies is encapsulated in the quorum relation.

The effect of reconfiguration and of varying the degree of replication was examined experimentally on a RAID configuration running a four-relation DebitCredit database with the relations Teller, Branch, Account, and History. The Account relation is fully replicated and is configured to use the read-one-write-all (ROWA) method; the other relations are configured to use the quorum consensus method with read-same-as-write quorums (called QC-RSW). As an example, consider the Branch relation replicated in an 8-site system: in the quorum relation of Table 1, only its first three copies carry nonzero weights, so a read must access a total weight of 1.00 and a write a total weight of 3.00 from those copies.
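With this representation, checking whether an operation can proceed reduces to summing the weights of the reachable, valid copies and comparing the sum against the relation's threshold; invalidating a copy is just zeroing its weight. The sketch below uses the Branch thresholds from Table 1 (following); the structure names and the reachability vector are assumptions made for the illustration, not the RAID code.

    /* Threshold check over per-copy weights (cf. Table 1).  Names and
     * structures are assumed for the illustration. */
    #include <stdio.h>

    #define NSITES 8

    struct quorum_entry {
        const char *relation;
        double r_threshold, w_threshold;
        double weight[NSITES];     /* 0.0 marks an invalid or absent copy */
    };

    /* can a read (is_write == 0) or write (is_write == 1) proceed,
       given which sites are currently reachable?                     */
    static int quorum_possible(const struct quorum_entry *q,
                               const int reachable[NSITES], int is_write)
    {
        double need = is_write ? q->w_threshold : q->r_threshold;
        double sum = 0.0;
        int i;

        for (i = 0; i < NSITES; i++)
            if (reachable[i])
                sum += q->weight[i];
        return sum >= need;
    }

    int main(void)
    {
        /* Branch relation from Table 1: R = 1.00, W = 3.00 */
        struct quorum_entry branch = {
            "Branch", 1.0, 3.0, { 1, 1, 1, 0, 0, 0, 0, 0 }
        };
        int reachable[NSITES] = { 1, 1, 0, 1, 1, 1, 1, 1 };  /* site 2 down */

        printf("Branch read  possible: %d\n",
               quorum_possible(&branch, reachable, 0));       /* 2 >= 1 -> yes */
        printf("Branch write possible: %d\n",
               quorum_possible(&branch, reachable, 1));       /* 2 <  3 -> no  */
        return 0;
    }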
quorum relation be accessecl uiethocl RAID that R-Quorum(W_Quorum) at site rrlust technique quorums. in an 8-site consider Branch replicated ● by reconfiguring of copies to copies that failure. is done This copies. a nuniber As an exaniple, tions. method quorum copies part addition our Relation 2n ualid In replication the size of the read weight invalid, is not the and the made copy reconfiguration, database weights respec- features: ● 1 shows repli- object, write distribution lation, tively. The and Debit(jredit of these and3ticks, read Q7~orwnL database increased object. implement replicated over- and transactions The to30. imposes comnlunication values of living-in-the-past parameters pa- that be invalid tech- fully is regenerated. reconfigure with we can be re- reconfiguration however, it RAID, be physically An until in of an object Our can, a data extent- To The replica to redistributing solrle copies objects or the reconfiguration invalid. that tem- it re-notifies could or rllade requires local an acknowledge- which data whereby its subsequent rw, achieve the notifies receive itjs local To use a technique is found SC installs by the to fail. conTzcks delayed view and not timers Timeout temporary old view. whose The as permanent alternation, ment ones tacks. new the view with the of against If different porary are number of the for 7 QC- worksta- Table Relation R-Thresllolcl 1: The RAID W_Tlmesllolcl Quoruln Relation WI w? w~ W4 W5 Wlj W7 Teller 4.00 6.00 0 1 1 0 1 1 1 1 Branch 1.00 3.00 1 1 1 0 0 0 0 0 Account 5.00 5.00 1 1 1 1 1 1 1 1 History 3.00 5.00 1 1 1 1 1 1 0 0 Conclusion 5 This tion e RO WA Availability paper of -e--e-+4--e-++- and cation 1 2345(378 9 RAID stand-alone terface 6: Read-one-write-all: O 90 Rellablllty Figure 0(% [Jpdat,es, via ., 1 ● .-” QGRSW Availability / / _- A facility hints. is terms off-line and with failures. that informa- how the surveil- replication to control Finally, is facility shown and an in- detection reachability integrated of a replication surveillance fault-tolerance during Repli- in Failure We have dis- control, performance we have demon- A” x?” 9 8 7 6 5 4 3 ~ A’ ~- strat,ecl the use of data adaptability o Q(”:-RSW reconfiguration to achieve failure and replication isolation. Response Time (seconds) References [1] Anw El tioi]ed Abbadi [2] of Iiepllcmtion F’. Alsberg J. Day. Availability Proc. Fijth in ACM of Database A principle parti- SIGA Systems, C’Tpages for resilient sharing of the Int’1 P~oceedings resOurces. .n In 19S6, aud fwe,,cr Toueg. on Principles h4arcll distributed S. databases. Sgmp. 240–251, 1 23456783 and replicated SIGMOD OhTTTnTtl Degree a reliable view the detection, heuristics, facility. network-independent degradation Llf! controller, tion lance failure implemented maintains called in mechanisms reconfiguration. selection to a surveillance to increase 0.9 08 0.7 0.6 05 04 0.3 02 0.1 w been quorum implementa- These replication, replication ilnplellleuted and RAID. through has ]Ilan agement, Degree of Replication data isolation in design mechanisms system adaptable failure the fault-tolerance database include 3 2 4 describes some tributed 0.2 01 ‘“-J 1’V8 Softwure Engineering, pages %zd 562-570, of Con- October 1976. [:3] Figure 7: Quorultl Consensus( [{eacl-salll(’50% Updates, O 90 Rellahlity ~ls-~vrite): J. F. Bartlett. f[uwaii P. A. Bernsteill currency dat reliability tion is 5 copies. impairs of 0.90. 
[Table 2: Practical degrees of replication, for update mixes of 0, 20, and 50 percent and workstation reliabilities of 0.90, 0.95, and 0.99, under the ROWA and QC-RSW methods.]

Once the practical degrees of replication are obtained, the adaptability of the database system is accomplished by updating the quorum relation in accordance with estimates of the transactions' update mix and of the workstation reliabilities.

5 Conclusion

This paper describes the design and implementation of availability mechanisms in the RAID adaptable distributed database system. These mechanisms include replication management, failure detection and surveillance, and data reconfiguration. Replication is implemented via a stand-alone replication controller with a quorum-based interface to a library of replication methods, supported by off-line management tools and quorum selection heuristics. Failure detection is implemented in a surveillance facility that maintains reachability information and exports view and copy hints. We have discussed how the surveillance facility is integrated with the replication controller to control performance degradation during failures. Finally, we have demonstrated the use of data reconfiguration and replication adaptability to achieve fault-tolerance and failure isolation.

References

[1] A. El Abbadi and S. Toueg. Availability in partitioned replicated databases. In Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 240-251, March 1986.

[2] P. Alsberg and J. Day. A principle for resilient sharing of distributed resources. In Proceedings of the 2nd International Conference on Software Engineering, pages 562-570, October 1976.

[3] J. F. Bartlett. A nonstop operating system. In Proceedings of the Hawaii International Conference on System Sciences, January 1978.

[4] P. A. Bernstein and N. Goodman. An algorithm for concurrency control and recovery in replicated distributed databases. ACM Transactions on Database Systems, 9(4):596-615, December 1984.

[5] P. A. Bernstein and N. Goodman. The failure and recovery problem for replicated databases. In Proceedings of the 2nd ACM Symposium on Principles of Distributed Computing, pages 114-122, August 1983.

[6] B. Bhargava, K. Friesen, A. Helal, S. Jagannathan, and J. Riedl. Design and implementation of the RAID V2 distributed database system. Technical Report CSD-TR-962, Purdue University, March 1990.

[7] B. Bhargava, K. Friesen, A. Helal, and J. Riedl. Adaptability experiments in the RAID distributed database system. In Proceedings of the 9th IEEE Symposium on Reliability in Distributed Systems, pages 76-85, October 1990.
[8] B. Bhargava, A. Helal, and K. Friesen. Analyzing availability of replicated database systems. International Journal in Computer Simulation, December 1991.

[9] B. Bhargava, E. Mafla, and J. Riedl. Communication in the Raid distributed database system. Computer Networks and ISDN Systems, special issue, 1991.

[10] B. Bhargava and J. Riedl. A model for adaptable systems for transaction processing. IEEE Transactions on Knowledge and Data Engineering, 1(4):433-449, December 1989.

[11] B. Bhargava, J. Riedl, and A. Royappa. Technical Report CSD-TR-691, Purdue University.

[12] K. Birman and T. Joseph. Exploiting virtual synchrony in distributed systems. In Proceedings of the 11th ACM Symposium on Operating Systems Principles, pages 123-138, 1987.

[13] S. Bruso. A failure detection and notification protocol for distributed computing systems. In Proceedings of the IEEE 5th International Conference on Distributed Computing Systems, pages 116-123, May 1985.

[14] P. Dasgupta and M. Morsi. An object-based distributed database system supported on the Clouds operating system. Technical Report GIT-ICS-86/07, Georgia Tech, 1986.

[15] D. K. Gifford. Weighted voting for replicated data. In Proceedings of the 7th Symposium on Operating System Principles, pages 150-162, December 1979.

[16] A. Heddaya. Managing Event-based Replication for Abstract Data Types in Distributed Systems. PhD thesis, Harvard University, October 1988. TR-20-88.

[17] C. Hedrick. Routing information protocol. RFC 1058, Network Working Group, June 1988.

[18] A. Helal. Experimental Analysis of Replication in Distributed Systems. PhD thesis, Purdue University, May 1991.

[19] A. Helal, Y. Zhang, and B. Bhargava. Surveillance for controlled performance degradation during failures. In Proceedings of the 25th Hawaii International Conference on System Sciences, Kauai, Hawaii, pages 202-210, January 1992.

[20] M. Herlihy. A quorum-consensus replication method for abstract data types. ACM Transactions on Computer Systems, 4(1):32-53, February 1986.

[21] W. Kim. Auditor: A framework for highly available DB-DC systems. In Proceedings of the IEEE Second Symposium on Reliability in Distributed Software and Database Systems, 1982.

[22] A. Kumar. Performance analysis of a hierarchical quorum consensus algorithm for replicated objects. In Proceedings of the 10th International Conference on Distributed Computing Systems, May 1990.

[23] B. Lampson. Hints for computer system design. In Proceedings of the 9th ACM Symposium on Operating Systems Principles, October 1983.

[24] B. Lindsay, L. Haas, C. Mohan, P. Wilms, and R. Yost. Computation and communication in R*: A distributed database manager. ACM Transactions on Computer Systems, 2(1), February 1984.

[25] B. Liskov. Distributed programming in Argus. Communications of the ACM, 31(3):300-312, March 1988.

[26] T. Minoura and G. Wiederhold. Resilient extended true-copy token scheme for a distributed data base system. IEEE Transactions on Software Engineering, May 1982.

[27] C. Pu. Replication and Nested Transactions in the Eden Distributed System. PhD thesis, University of Washington, 1985.

[28] B. Walter. Network partitioning and surveillance protocols. In Proceedings of the 5th International Conference on Distributed Computing Systems, pages 124-129, May 1985.