Soft Comput
DOI 10.1007/s00500-008-0356-2
FOCUS

A hybrid evolutionary approach for heterogeneous multiprocessor scheduling

C. K. Goh · E. J. Teoh · K. C. Tan

© Springer-Verlag 2008

Abstract This article investigates the assignment of interdependent tasks in a heterogeneous multiprocessor environment, in which task execution time varies with both the nature of the task and the processing element to which it is assigned. Solving this heterogeneous multiprocessor scheduling problem involves optimizing the complete task assignment and the processing order on the assigned processors to arrive at a minimum makespan, subject to precedence constraints. To solve this NP-hard combinatorial optimization problem, this paper presents a hybrid evolutionary algorithm that incorporates two local search heuristics, which exploit the intrinsic structure of the solution, as well as specialized genetic operators that promote exploration of the search space. The effectiveness and contribution of the proposed features are validated on a set of benchmark problems characterized by different degrees of communication time, task heterogeneity, and processor heterogeneity. Preliminary simulation results demonstrate the effectiveness of the proposed algorithm in finding useful schedules on this set of new benchmark problems.

Keywords Multiprocessor scheduling · Heterogeneous · Hybrid evolutionary algorithm · Local search · Precedence

C. K. Goh (B)
Spintronics, Media and Interface Division, Data Storage Institute, DSI Building, 5 Engineering Drive 1, Singapore 117608, Singapore
e-mail: GOH_Chi_Keong@dsi.a-star.edu.sg

E. J. Teoh · K. C. Tan
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore

1 Introduction

The multiprocessor scheduling problem belongs to a broad class of combinatorial optimization problems in which an originally large problem is broken down into smaller tasks. These partitioned tasks must then be suitably assigned to the individual processing units, or processing elements (PEs), of a multiprocessor system. Finding optimal schedules in such systems has been shown to be NP-hard in the general case (Garey and Johnson 1979; Kasahara and Narita 1984; Lewis and El-Rewini 1992; Papadimitriou and Yannakakis 1990). The motivation for this problem is significant, considering the emergence of computer programs with increasingly high computational requirements and algorithmic complexity. These factors have necessitated parallel PEs in a multi-computer environment, which in turn has increased the need for tasks to be distributed 'optimally' among the individual processing units.

A typical program can usually be decomposed into a set of smaller tasks, similar to a divide-and-conquer approach. These smaller tasks almost always have dependencies, and hence precedence requirements, in that the results of another set of tasks are required before a particular task can be executed. The critical aim of a scheduler is thus to assign partitioned tasks to available processors such that (1) the precedence requirements (or constraints) between these tasks are met and (2) the overall time required to execute the entire program, known as the schedule length or makespan, is minimized (Wu et al. 2004). Scheduling becomes more challenging still when communication delays are accounted for.
A multiprocessor scheduling problem can be categorized into different classes based on the characteristics of the problem, the tasks to be scheduled, the multiprocessor system, and the availability of a priori information regarding the processing time (El-Rewini et al. 1994; Kwok and Ahmad 1997, 1999). The PEs constituting a multi-computer environment can be of the same capability (a homogeneous environment) or of different capabilities (a heterogeneous environment); this paper focuses on the latter.

Numerous methods have been developed and applied to the multiprocessor scheduling problem, typically using a deterministic approach. El-Rewini et al. (1994) provide a fairly comprehensive taxonomy of how scheduling problems can be categorized and highlight the key differences that distinguish one class from the next. Kwok and Ahmad (1997, 1999) present a wide-ranging overview and classification of scheduling algorithms, focusing particularly on deterministic and static scheduling problems. Most of the present techniques are based on heuristics (Kruatrachue and Lewis 1987; Macey and Zomaya 1998) that are greedy in nature and solve only certain instances of the scheduling problem efficiently.

With that in mind, the approach proposed here is largely inspired by developments in computational intelligence. Evolutionary algorithms (EAs) are a class of stochastic global optimization techniques that have been gaining significant attention from researchers in many fields, and they have also been applied to the heterogeneous multiprocessor scheduling problem (Ritchie and Levine 2004; Zhong et al. 2004). While EAs are excellent global search algorithms, it is known that they can take a relatively long time to locate the local optimum in the region of convergence (Ong et al. 2006).
On the other hand, local search heuristics are capable of locating an optimum quickly but are prone to becoming trapped in local optima. EAs are therefore often hybridized with local search heuristics to maintain a balance between exploration and exploitation, which is crucial to the success of search and optimization processes (Burke et al. 2001; Franca et al. 2001; Ishibuchi et al. 2003; Merz and Freisleben 2000; Ong and Keane 2004; Tang et al. 2007; Zhou et al. 2007). Multiprocessor systems have also been exploited to improve EA performance (Lim et al. 2007).

This paper presents a new hybrid evolutionary algorithm (HEA) for solving the above heterogeneous multiprocessor scheduling problem. The proposed algorithm incorporates two local search operators, based firstly on list scheduling and secondly on task duplication; both attempt to exploit the intrinsic structure of the scheduling problem. Unlike existing evolutionary approaches to the heterogeneous multiprocessor scheduling problem, the proposed HEA also implements a variable length chromosome that preserves the precedence relations, a PE schedule crossover that facilitates the exchange of good schedules assigned to the individual processors, and specialized mutation operators that improve the diversity of the evolving population.

This paper is organized as follows: Sect. 2 gives an overview of existing works as well as the formulation of the heterogeneous multiprocessor scheduling problem. Section 3 presents the features of the proposed HEA, including the local search heuristics, the specialized genetic operators, and the algorithmic flow. Section 4 presents the simulation results and analysis of the proposed algorithm. Conclusions are then drawn in Sect. 5.
2 Background information

2.1 Overview of existing works

Multiprocessor scheduling based on methods motivated by evolutionary computation has been the focus of many research works over the last decade. Here we offer a brief, non-exhaustive overview of similar works that have motivated our interest and research.

Ahmad and Kwok (1998) proposed a task duplication approach (together with a review and comparison of some similar algorithms) to mitigate the expensive interprocessor communication overhead incurred when executing dependent tasks on multiple processors. In a similar manner, Baskiyar and Dickinson (2005) address static scheduling of a directed acyclic task graph (DAG) on a heterogeneous, bounded set of distributed processors to minimize the makespan, also based on a task duplication approach. Most of the present techniques are based on heuristics that are capable of solving only certain instances of the scheduling problem efficiently. However, the scheduling of tasks with communication overheads and dependencies is gaining increasing attention from researchers. Here, we investigate an alternative paradigm, based on biologically inspired algorithms, to solve the scheduling problem efficiently without the need to apply any restricting assumptions.

Aside from the above, other works in the literature have used EAs to determine task priorities based on list scheduling techniques. The list scheduling heuristic (LSH) assigns a priority to each task to be scheduled within a list, which is then sorted in decreasing task priority. The task with the highest priority in the unscheduled task list is typically assigned to the first available processor and then removed from the list. If more than one task has the same priority level, selection from among the candidate tasks is typically done randomly.
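The list scheduling procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not part of the cited works: every task takes the same time on every PE, and precedence constraints and communication delays are ignored. The function name `list_schedule` and its parameters are hypothetical.

```python
import heapq
import random


def list_schedule(durations, priorities, num_pes, seed=0):
    """Assign tasks to PEs in decreasing priority order (illustrative sketch).

    durations[t]  : execution time of task t (assumed identical on every PE)
    priorities[t] : a larger value means the task is scheduled earlier
    Precedence and communication delays are deliberately ignored here.
    """
    rng = random.Random(seed)
    tasks = list(range(len(durations)))
    rng.shuffle(tasks)                         # random order among equal priorities
    tasks.sort(key=lambda t: -priorities[t])   # stable sort keeps the shuffled tie order

    # Min-heap of (time at which the PE becomes free, PE index).
    free_at = [(0.0, pe) for pe in range(num_pes)]
    heapq.heapify(free_at)

    schedule = {pe: [] for pe in range(num_pes)}
    for t in tasks:
        start, pe = heapq.heappop(free_at)     # first available PE
        end = start + durations[t]
        schedule[pe].append((t, start, end))
        heapq.heappush(free_at, (end, pe))

    makespan = max((end for slots in schedule.values() for _, _, end in slots),
                   default=0.0)
    return schedule, makespan


if __name__ == "__main__":
    sched, length = list_schedule([4, 3, 2, 1], [4, 3, 2, 1], num_pes=2)
    print(sched, length)
```

Shuffling before a stable sort is one simple way to obtain the random tie-breaking mentioned in the text; any other tie-breaking rule could be substituted.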
This conventional approach will be applied in the comparative study conducted in this paper.

An alternative approach is to use EAs to directly evolve the task assignment and order on the processors. Hou et al. (1994) used an EA to evolve candidate solutions, or individuals, that in turn consist of multiple lists, with each list representing the tasks assigned to one processor; the authors restrict the explorable design space in order to avoid invalid solutions. However, their approach considers only homogeneous multiprocessor systems. The crossover operation exchanges tasks between corresponding processors from two different individuals, after which the mutation operator exchanges tasks within a single individual. Overall, this approach restricts the actions of the genetic operators to ensure the validity of evolved individuals. However, such a restriction means that some parts of the search space may be unreachable by the algorithm. Correa et al. (1999) subsequently claim to improve upon Hou's original approach to circumvent this problem and allow the entire search space to be explored.

Kwok and Ahmad (1997) proposed a coarse-grained parallel genetic algorithm (GA) together with a heuristic list scheduling method, where candidate solutions are vectors of length n, with n being the number of tasks to be scheduled. The elements of a vector represent the tasks themselves, and the order of the tasks gives the relative task priorities. A number of order-based crossover operators are presented, and a mutation operator is used to perform random swapping of tasks. Dhodi et al. (1995) proposed a "Problem Space Genetic Algorithm" (PSGA) for datapath synthesis. The problem itself is modified by the EA and subsequently transformed into solution space by means of a heuristic, thus avoiding infeasible solutions. Blickle et al.
(1996) use an EA to perform allocation and binding on a system level. Scheduling is achieved in a separate step. The authors use multichromosomal individuals to encode the problem and to subsequently guide repair heuristics in parallel. Tsuchiya et al. (1998) proposed an approach in which a GA scheduler allows task duplication, where a single task may be assigned to multiple processors. Zomaya et al. (1999) incorporate heuristics in the generation of the initial population of an EA and perform a thorough study of how GA performance varies with changing parameter settings. Wu et al. (2004) claim that an EA-based approach achieves good performance on most of the problems tested. They also suggest that GAs appear to be the most flexible algorithms for heterogeneous systems, because heterogeneous processors make it more difficult for list scheduling algorithms to estimate task priority accurately.

An alternative approach, motivated by ant colony optimization (ACO), is developed by Ritchie and Levine (2004). When combined with local and tabu search, the ACO-based algorithm is able to find shorter schedules on a few benchmark problems. ACO, as the authors also claim, has been shown to be a successful strategy for problems related to scheduling jobs in a heterogeneous computing environment. This approach was, however, tested only on a scheduling problem in a static environment with independent jobs.

2.2 Heterogeneous multiprocessor scheduling problem

Technological advancements have led to the development of large scale parallel and distributed systems for a wide range of applications. However, applications are only able to exploit parallelism when their parts do not wait for data longer than necessary. This necessitates appropriate scheduling strategies, both to control access to processing resources and to control the execution of these parallel application modules.
Thus, it is not surprising that the focus of research in this area has been on the efficiency and effectiveness of scheduling algorithms. There are increasing concerns that the comparative studies performed are not adequate to evaluate the true abilities of the algorithms under test. Addressing the issue of data set generation, Hall and Posner (2001) presented a set of guidelines on how data sets should be generated for the evaluation of scheduling algorithms. In order to generate a set of good test problems, the researcher must consider:

1. the purpose of the experiment,
2. the comparability of the tests performed,
3. unintended bias that can skew the test results, and
4. the reproducibility of the generation scheme.

Further, Hall and Posner state that the generation scheme should have properties such as variety, practical relevance, scale and size invariance, regularity, describability, efficiency, and parsimony.

Kwok and Ahmad (1999) presented a suite of five different benchmark graph sets: peer set graphs (PSG), random graphs with optimal solutions using branch-and-bound (RGBOS), random graphs with predetermined optimal schedules (RGPOS), random graphs with no known optimal schedules (RGNOS), and traced graphs (TG). RGPOS is probably the most interesting set of task graphs in the sense that the graphs are generated from a set of pre-determined solutions. To our knowledge, this is the first instance of such a generation scheme for a multiprocessor scheduling problem with communication delay. RGBOS also has a set of optimal solutions, which are determined using the A∗ algorithm (Ahmad and Kwok 1998). The A∗ algorithm is a search heuristic that incrementally searches paths from the starting point until it finds the shortest path to a goal. PSG is a collection of task graphs used by various researchers. RGNOS consists of large scale randomly generated task graphs, while TG represents real-world applications. Coll et al.
(2002) considered the issue of generating benchmark test sets for heterogeneous systems. The degree of heterogeneity between different processors is defined by a processor power ratio (PPR), which represents the relative speeds of the processors. In addition, they considered different precedence relationships based on the specific nature of the task to be processed. More recently, Davidovic and Crainic (2003) proposed a set of benchmark problems modeling homogeneous systems with communication delays. Based on the criteria proposed by Hall and Posner (2001), they proposed two sets of task graphs. Similar to Kwok and Ahmad (1999), one of the proposed sets is generated from a pre-determined desired solution. However, Davidovic and Crainic provide a much higher degree of control, allowing parameters such as dependency densities to be changed.

2.2.1 Problem formulation

The multiprocessor scheduling problem can be simply stated as follows: given n tasks that have to be executed on m processors, where and when should each task be executed such that some performance measure(s) is (are) optimized? The task of the scheduling algorithm is ultimately to minimize a given cost function of time. The objective function used in this paper is defined as:

\[ F = \min \max_{i=1,\dots,n} T_f(v_i) \tag{1} \]

where $T_f(v_i)$ denotes the time at which the execution of task $v_i$ is complete.

The goal of task assignment/mapping is to determine an assignment of tasks to processors, and an order in which tasks are executed, that optimizes some performance measure. Often, the assignment process should aim to minimize the total cost of executing the programs. An optimal assignment determines both the allocation (identifying the specific processor that runs each module) and the schedule (execution order) of each task. A task, in turn, is a collection of instructions, procedures or subroutines, possibly together with some data. Each task is assumed to be immutable.
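To make the objective in Eq. (1) concrete, the following Python sketch evaluates the finish time of every task, and hence the makespan, for a given assignment and per-PE execution order. It is a minimal sketch rather than the authors' evaluation routine. It assumes that each task appears on exactly one PE, that execution is non-preemptive, and that a communication delay is paid only when a task and its predecessor run on different PEs. The function name `makespan` and all parameter names are hypothetical.

```python
def makespan(pe_schedules, exec_time, preds, comm):
    """Return (finish_times, makespan) for a non-preemptive schedule.

    pe_schedules : dict PE -> ordered list of task ids run on that PE
    exec_time    : dict (task, PE) -> processing time of the task on that PE
    preds        : dict task -> list of predecessor task ids
    comm         : dict (pred, task) -> delay, charged only across different PEs
    """
    pe_of = {t: pe for pe, tasks in pe_schedules.items() for t in tasks}
    finish = {}                                   # T_f(v_i) for each completed task
    pe_free = {pe: 0.0 for pe in pe_schedules}    # time at which each PE is idle again
    nxt = {pe: 0 for pe in pe_schedules}          # index of the next task on each PE
    remaining = len(pe_of)

    while remaining:
        progressed = False
        for pe, tasks in pe_schedules.items():
            while nxt[pe] < len(tasks):
                t = tasks[nxt[pe]]
                ps = preds.get(t, [])
                if any(p not in finish for p in ps):
                    break                         # head of this queue still waits on a predecessor
                ready = max(
                    (finish[p] + (comm.get((p, t), 0.0) if pe_of[p] != pe else 0.0)
                     for p in ps),
                    default=0.0,
                )
                start = max(ready, pe_free[pe])
                finish[t] = start + exec_time[(t, pe)]
                pe_free[pe] = finish[t]
                nxt[pe] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("infeasible schedule: precedence deadlock")

    return finish, max(finish.values(), default=0.0)


if __name__ == "__main__":
    schedules = {"A": [0, 2], "B": [1]}
    times = {(0, "A"): 2.0, (2, "A"): 4.0, (1, "B"): 3.0}
    print(makespan(schedules, times, {2: [0, 1]}, {(0, 2): 3.0, (1, 2): 1.0}))
```

A search algorithm would call such a routine once per candidate schedule and minimize the returned makespan; a schedule in which a task is queued ahead of its own predecessor on the same PE is reported as infeasible.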
While distributing the tasks to parallel PEs is not difficult, introducing dependencies between the tasks degrades the overall system performance. There are bindings or linkages between some pairs of tasks (we call these dependencies), since a procedure in one task may wish to (1) transfer control to another procedure in a different task or (2) access data contained in or produced by a different task. It should be noted that these tasks only incur a communication delay when they are assigned to different PEs. It is thus important to make the assumption that the cost of executing tasks on different processors and the cost of the communication delay are known in advance.

As to why deterministic scheduling is considered, we are inclined to believe that a priori effort must be devoted to analyzing data from machine manufacturers and accumulating actual experience from running smaller programs on fewer processors. Such effort is fully justified, especially if repeated deterministic or production runs of important large programs will be made on large parallel systems where termination and successful results are expected. In fact, it is precisely for these production runs that the effort of optimizing the assignment is justified in the first place.

The duration of each task is known, as are the precedence relations among tasks, i.e. which tasks should be completed before some others can begin. In addition, if dependent tasks are executed on different processors, data transfer times or communication delays, which are given in advance, are also considered. These latencies also include memory access and synchronization delays. To add further realism to our problem model, we also consider a heterogeneous system, that is, a multiprocessor environment consisting of processors with different capabilities. Moreover, we only consider a non-preemptive system, that is, each PE will complete the processing of each task that is assigned to it.
Essentially, this means that a PE will not suspend its processing to take on another task.

2.2.2 Problem generator

In order to verify the efficacy of our proposed approach, a set of problems is needed for the experimental study. This is achieved via the construction of a benchmark problem generator, which produces a representative problem of a certain complexity based upon a set of input parameters. These test problems are in turn used as the input to the task scheduler. There are four key components in a task scheduler: the parallel program of interrelated tasks, the target machine (model), the generated schedule, and the performance criterion. Previous works on task scheduling with dependencies usually use a graph representation for the tasks of the parallel program, the computer model, or both.

In an actual multiprocessor computing system, particularly one consisting of heterogeneous elements, the running time of a particular job is not the sole or primary factor to be considered when scheduling jobs. An equally important consideration is the time that it takes to migrate the executables and their associated data from one processor to the next. Braun et al. (2001) defined three types of heterogeneity: task heterogeneity, machine heterogeneity and consistency. Task heterogeneity is defined as the amount of variance possible among the execution times of the jobs. Machine heterogeneity, on the other hand, represents the variation of the running time of a particular job across the processors. Lastly, consistency can be categorized as consistent, inconsistent or semi-consistent. A system is said to be consistent if, whenever a processor A executes a job C faster than another processor B, A also executes all other jobs faster than B. A consistent system can therefore be seen as modeling a heterogeneous system in which the processors differ only in their processing speed.
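The consistency definition above lends itself to a short check on a matrix of execution times. The Python sketch below is illustrative only; the matrix layout `etc[i][j]` (time of job i on processor j) and the function name `is_consistent` are assumptions, and ties (equal times on two processors) are treated as compatible with consistency, which the definition in the text does not address.

```python
from itertools import combinations


def is_consistent(etc):
    """Check the consistency property for an execution-time matrix.

    etc[i][j] is the execution time of job i on processor j (assumed layout).
    The system is reported as consistent if no pair of processors (A, B)
    exists where A is strictly faster on one job and B strictly faster on another.
    """
    num_pes = len(etc[0])
    for a, b in combinations(range(num_pes), 2):
        a_faster_somewhere = any(row[a] < row[b] for row in etc)
        b_faster_somewhere = any(row[b] < row[a] for row in etc)
        if a_faster_somewhere and b_faster_somewhere:
            return False
    return True


if __name__ == "__main__":
    print(is_consistent([[1, 2], [3, 4]]))   # processor 0 is faster on every job
    print(is_consistent([[1, 2], [4, 3]]))   # the faster processor depends on the job
```

A matrix that fails this check is not consistent in the sense used above; distinguishing inconsistent from semi-consistent systems would need a more detailed rule than the text provides.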
A semi-consistent system is made of elements from both consistent and inconsistent systems. Higher degrees of machine heterogeneity increase the complexity of the multiprocessor scheduling problem. This is because the scheduling algorithm now needs to account for the variation in the individual processors' capabilities, in that certain processors might be more suitable for certain tasks due to hardware or software configuration and compatibility.

The multiprocessor system is made up of m processors with their own local memories. The system can have various degrees of heterogeneity, and the processors are connected via bi-directional links of equal capacity. Each processor has an I/O unit that allows communication and processing to be performed simultaneously. We assume that there are no start-up costs for initiating each task and that input buffers have infinite capacity.

A convenient representation for the partially ordered set of tasks is a directed acyclic graph (DAG), also known as a task (dependency) graph, where a directed edge $e(p, i)$ between two tasks $v_p$ and $v_i$ specifies that task $v_p$ must be completed before $v_i$ can begin. The directed edges in a DAG correspond to the communication messages as well as the precedence constraints between the tasks. We consider a node and a task to be equivalent. A task is a set of instructions that must be executed sequentially on the same processor; tasks are considered to be the smallest instruction sets, which cannot be broken up any further. Mathematically, node $v_p$ is a predecessor of node $v_i$ if a directed edge originates from $v_p$ and ends at $v_i$. In a similar manner, node $v_s$ is a successor of node $v_i$ if a directed edge originating from $v_i$ and ending at $v_s$ exists.
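The predecessor and successor relations defined above map directly onto two adjacency lists. The following Python sketch, which is not taken from the paper, builds both maps from a list of edges $(p, i)$ and uses Kahn's algorithm (a standard technique, named here as a substitute for whatever the authors used) to confirm that the graph is acyclic. Task ids are assumed to be the integers 0 to num_tasks − 1; the function names are hypothetical.

```python
from collections import deque


def build_dag(num_tasks, edges):
    """Return (preds, succs) adjacency maps for edges given as (p, i) pairs."""
    preds = {v: [] for v in range(num_tasks)}
    succs = {v: [] for v in range(num_tasks)}
    for p, i in edges:
        preds[i].append(p)   # v_p must finish before v_i can begin
        succs[p].append(i)
    return preds, succs


def topological_order(preds, succs):
    """Kahn's algorithm: return a precedence-respecting order or raise on a cycle."""
    indegree = {v: len(ps) for v, ps in preds.items()}
    queue = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for s in succs[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    if len(order) != len(preds):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


if __name__ == "__main__":
    preds, succs = build_dag(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
    print(preds[3], succs[0], topological_order(preds, succs))
```

Any order returned by `topological_order` is one in which every task appears after all of its predecessors, which is the property a feasible execution order must respect.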
From a mathematical perspective, for any vertex v in the DAG, there is no non-empty directed path that starts and ends on v. DAGs are therefore well suited as models for our multiprocessor task scheduling problem, since it is not meaningful for a vertex to have a path to itself; for example, if an edge $v_p \to v_i$ indicates that $v_i$ is a part of $v_p$, such a path would indicate that $v_p$ is a part of itself, which is impossible. The test sets that were artificially generated using our benchmark problem generator are based on this concept of DAGs.

From a practical viewpoint, actual multiprocessor systems are immensely complicated combinations of hardware, software and network components, and it is thus difficult to make equitable comparisons of the different approaches that have been used on various systems. In constructing these problems artificially, and in a random manner, the input variables essentially control not only the size but also the complexity of the generated test set. Specifically, these variables are:

1. the number of nodes/tasks,
2. the number of processors available,
3. the degree of network connectivity,
4. the communication-to-computation ratio (CCR): the average communication cost divided by the average computation cost in a multiprocessor system. A DAG with a low CCR can be considered a computation-intensive application; a DAG with a high CCR is a communication-intensive application,
5. the mean processing time: the average processing time over all the available processors,
6. the variance of processing time: how large the spread of processing time between the available processors is,
7. the degree of heterogeneity: how widely the capabilities of the processors differ, i.e. processors have different execution times on the same task,
8. the degree of precedence/dependency relationship: how many predecessor tasks must be completed before a particular task can be executed.

Because the generator is randomized, it produces different test problems for the same set of input parameters. The input parameters to the generator are shown in Table 1, together with the associated range of values.

Table 1 Description of inputs to task generator
Parameter   Description                          Values
CCR         Communication-to-computation ratio   {0.5, 1, 1.5, 2}
Meanproc    Mean processing time                 {10}
h_pe        Variance of processing time          {0.25, 0.5, 0.75}
h_t         Degree of heterogeneity              {0.25, 0.5, 0.75}
d_pe        Width of DAG                         {0.5}
d_t         Degree of dependency                 {0.25, 0.5, 0.75}
n           Number of processors                 {15}
m           Number of tasks                      {100}

The variation in the processing times of the different tasks is modelled as follows. The mean processing time of the ith task is given by

\[ T_{mproc}(v_i) = meanproc + h_t \cdot meanproc \cdot U(-1, 1) \tag{2} \]

where $U(-1, 1)$ denotes a random number sampled from a uniform distribution on $(-1, 1)$. As mentioned before, each task may have different execution times on different processors. The actual processing time of the ith task on the jth processor is thus given by

\[ T_{proc}(v_i, pe_j) = T_{mproc}(v_i) + h_{pe} \cdot meanproc \cdot U(-1, 1) \tag{3} \]
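Equations (2) and (3) can be sampled directly. The Python sketch below draws one processing-time matrix under the assumption that each occurrence of $U(-1, 1)$ is an independent draw; the function name, the argument names and the use of a seeded generator are illustrative choices, not details given in the text.

```python
import random


def sample_processing_times(num_tasks, num_pes, meanproc=10.0,
                            h_t=0.25, h_pe=0.25, seed=0):
    """Return times[i][j], the processing time of task i on PE j.

    Each U(-1, 1) term is drawn independently (an assumption of this sketch).
    """
    rng = random.Random(seed)
    times = []
    for _ in range(num_tasks):
        # Eq. (2): mean processing time of the task.
        t_mean = meanproc + h_t * meanproc * rng.uniform(-1, 1)
        # Eq. (3): actual processing time of the task on each PE.
        row = [t_mean + h_pe * meanproc * rng.uniform(-1, 1)
               for _ in range(num_pes)]
        times.append(row)
    return times


if __name__ == "__main__":
    matrix = sample_processing_times(num_tasks=100, num_pes=15)
    print(min(map(min, matrix)), max(map(max, matrix)))
```

With meanproc = 10 and h_t = h_pe = 0.25, every sampled time lies between 5 and 15. Note that if h_t + h_pe reaches 1 or more, the two equations as written can yield non-positive times, so a practical generator would presumably need a lower bound; the text does not say how this case is handled.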
Table 2 Generated test sets
Test set   CCR   Meanproc   h_pe   h_t    d_pe   d_t    n    m
T1         0.5   10         0.25   0.25   0.5    0.5    15   100
T2         1     10         0.25   0.25   0.5    0.5    15   100
T3         1.5   10         0.25   0.25   0.5    0.5    15   100
T4         2     10         0.25   0.25   0.5    0.5    15   100
T5         1     10         0.25   0.25   0.5    0.25   15   100
T6         1     10         0.25   0.25   0.5    0.75   15   100
T7         1     10         0.5    0.25   0.5    0.5    15   100
T8         1     10         0.75   0.25   0.5    0.5    15   100
T9         1     10         0.25   0.5    0.5    0.5    15   100
T10        1     10         0.25   0.75   0.5    0.5    15   100

Using these inputs for the benchmark problem generator, sets of random DAGs were constructed to be used as the test bed problems in our experimental study. For our simulation study, ten test sets were generated using various combinations of the above input parameters; they are listed in Table 2.

While the standard multiprocessor scheduling problem is itself NP-hard, additional factors such as communication delays and heterogeneity increase the complexity of the problem. Hence, due to the sheer number of potential solutions in the search space, scheduling becomes a complex task without the use of an effective search algorithm.

The test sets are classified in terms of their possible difficulties. Each test set consists of different test problems with different degrees of heterogeneity and dependency. The ten test sets considered in this study differ in terms of degree of heterogeneity, density, and CCR. A higher CCR value penalizes dependencies that require the passing of messages from one processor to the next, making inter-processor communication less attractive. The variance of processing time and the degree of heterogeneity affect the individual processing capabilities of each processor, making 'slower' processors less likely to be assigned tasks and biasing the schedule towards 'faster' processors. Lastly, the degree of dependency affects the makespan in that each processor has to 'wait' for the tasks it depends on to finish execution.
3 Hybrid evolutionary algorithm

This section presents the HEA specifically designed to solve the heterogeneous multiprocessor scheduling problem by means of specialized genetic and local search operators. The procedure for generating the initial population is presented in Sect. 3.1, while Sect. 3.2 describes the structure of the variable-length chromosome used to encode the task schedule in the HEA. Sections 3.3 and 3.4 describe the specialized crossover and mutation operators used to explore the search space, respectively. Two local search heuristics that exploit the intrinsic structure of a heterogeneous multiprocessor scheduling solution are presented in Sect. 3.5. Finally, the algorithmic flow of the HEA is presented in Sect. 3.6.

3.1 Initialization

The initial population is built using a random LSH, which ensures that the precedence relationships among the tasks are preserved. The initialization process starts with the assignment of a priority to each task to be scheduled. In this paper, the priority of the ith task is simply the sum of the number of its parent tasks and their priorities, as given below:

\[ Pr_{T_i} = |P_i| + \sum_{j \in P_i} Pr_{T_j} \tag{4} \]

where $P_i$ is the set of parent tasks of the ith task. The list of tasks is then sorted in order of increasing priority. This priority list is also used during the genetic processes to maintain the precedence requirements. Instead of assigning the tasks to the earliest available PE, the task with the lowest priority is taken from the list and assigned to a randomly selected PE. The rationale is to provide the initial population with a wider range of diversity to start with.

3.2 Variable PE chromosome

Evolutionary algorithms operate on a set of encoded parameters to explore the solution space, providing researchers with the flexibility to design an appropriate representation that fulfills criteria such as ease of implementation or exploitation of the problem structure.
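As a worked illustration of the priority rule in Eq. (4) of Sect. 3.1, the Python sketch below computes the priorities recursively from the parent sets. It is a sketch under the assumption that every task appears as a key of the `parents` mapping and that the graph is acyclic; the function name `task_priorities` is hypothetical. Because every parent contributes at least 1 plus its own priority, a task always receives a strictly larger value than each of its parents, which is why sorting in increasing priority yields a precedence-respecting order.

```python
def task_priorities(parents):
    """Compute Pr_T for every task from Eq. (4).

    parents[t] is the list of direct parent tasks of t (empty for entry tasks).
    Every task is assumed to be a key of `parents`, and the graph to be acyclic.
    """
    memo = {}

    def priority(t):
        if t not in memo:
            ps = parents[t]
            # |P_i| plus the sum of the parents' priorities.
            memo[t] = len(ps) + sum(priority(p) for p in ps)
        return memo[t]

    return {t: priority(t) for t in parents}


if __name__ == "__main__":
    # Diamond-shaped graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
    diamond = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    pr = task_priorities(diamond)
    print(pr, sorted(diamond, key=pr.get))
```

For the diamond graph, the entry task has priority 0, its two children have priority 1, and the exit task has priority 2 + 1 + 1 = 4.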
For simplicity, the chromosome is often represented as a fixed structure, and the embedded variables are usually assumed to be independent and context insensitive. As mentioned before, the precedence relations among the tasks must be satisfied in the heterogeneous multiprocessor scheduling problem. In Braun et al. (2001) and Ritchie and Levine (2004), the chromosome is an n-dimensional array denoting the n tasks to be allocated, and the encoded variable in each element represents the PE scheduled to execute the associated task. While such an encoding scheme is simple to implement, it does not consider the order in which the various tasks are processed, and the evolved schedules are not guaranteed to satisfy the precedence constraints. On the other hand, Wu et al. (2004) considered a representation that encodes task-processor pairs, where the order in which the pairs appear in the chromosome determines the order in which the tasks will be performed on each processor.

This paper adopts a variable length chromosome, which is illustrated in Fig. 1. In contrast to the works mentioned above, this encoding scheme does not enforce a fixed number of PEs, i.e. the length of the chromosome varies with the actual number of PEs utilized. For each of these PEs, there is an associated list of assigned tasks in their order of execution. Each task list will henceforth be denoted as a PE schedule.

Fig. 1 Illustration of (a) the variable length chromosome and (b) the associated schedule

When a task is scheduled to run before its predecessor tasks, which have been assigned to other PEs, the only problem is the long idle time incurred while waiting for all the predecessor tasks to be completed.
On the other hand, if a task is scheduled to run before its predecessor tasks on the same PE, then there is no way the task will ever be completed. The overall schedule is infeasible only if a task is scheduled to be executed before a predecessor task within a PE. It follows that ensuring the feasibility of each PE schedule is sufficient to maintain a feasible overall schedule. The precedence relations for the tasks executed on a PE can be easily preserved in the proposed scheme by maintaining the order of priority calculated at the beginning of the optimization process.

3.3 PE schedule crossover

The crossover operation applied by most EAs to the heterogeneous multiprocessor scheduling problem generally involves the swapping of random segments of tasks or processes between chromosomes, which does not preserve the quality of the different PE schedules. Descriptions of a number of order-based crossovers for combinatorial problems can also be found in Davis (1991) and Eiben and Smith (2003). However, these crossover operators are not applicable here due to the unique structure of the proposed variable length chromosome.

The proposed PE schedule crossover is motivated by the fact that the makespan of the multiprocessor schedule depends on the fitness of the constituent PE schedules. Since the chromosome encodes a separate list of tasks for each PE, it is intuitive to design a crossover that allows good PE schedules to be shared with other chromosomes in the evolving population. The operation of the crossover is illustrated in Fig. 2. In the PE schedule crossover, a random PE schedule from each parent is selected for crossover. In the case where one of the selected chromosomes has only one PE schedule, only a schedule associated with a different PE is selected and inserted from the other parent.
The selected PE schedule of one parent is either inserted into the other chromosome as a new schedule or, if that particular PE is already present, replaces its original schedule. Duplicated tasks are deleted, while missing tasks are randomly inserted into the other original PE schedules. The new PE schedule remains intact. To ensure the feasibility of the chromosomes after the crossover, the priority list computed at the beginning of the evolutionary process is used to sort the tasks assigned to each PE in ascending order.

Fig. 2 Illustration of the PE schedule crossover for the various steps: a selection of random PE schedule, b swapping of selected PE schedules, and c deletion of duplicates and random insertion of missing tasks to form child chromosome

3.4 Specialized mutation

This paper applies three different specialized mutation operators to improve the diversity of the evolving population. For every chromosome undergoing the mutation process, only one mutation operator is applied, as shown by the pseudocode in Fig. 3. The main functionalities of the three mutation operators are summarized in Table 3. Similar to the PE schedule crossover, each PE schedule is sorted based on the priority list at the end of the mutation operation.

    Mutation Operation
    IF rand < mutation rate
        Select one mutation operator with equal probability
        IF Partial Exchange AND No. of PE Schedules > 1
            Perform Partial Exchange
        ELSEIF Schedule Merge AND No. of PE Schedules > 1
            Perform Schedule Merge
        ELSEIF Partial Split AND No. of PE Schedules < |PE|
            Perform Partial Split
        END
    END
    Sort task based on priority
    END

Fig. 3 Pseudocode of the mutation operation

3.5 Local search

3.5.1 Partial list scheduling

The quality of the multiprocessor schedule is only as good as the completion time of its last task. The idea of partial list scheduling (PLS) is to split up the workload among the PEs with the best and worst completion times to improve the makespan. The first step in this heuristic is to select the appropriate PEs, from which all tasks are extracted and placed in a list. A PE is selected if it meets either of two criteria: its completion time is greater than the upper quartile of the PE completion times, or its completion time is lower than the lower quartile. In the next step, the extracted tasks are sorted based on their priorities determined at the start of the evolutionary process. The tasks, in the order of their priorities, are then assigned to the best possible processor, i.e. the one which allows the earliest start time considering inter-task communication (ITC). The new solution is compared against the original and the better of the two is retained.

3.5.2 Duplication scheduling

In multiprocessor scheduling with task interdependencies, some PEs will be idle during various time slots because some tasks require data from parent tasks which are assigned to other processors. The idea of duplicating tasks in these idle time slots is to reduce the waiting and ITC delays incurred, and thereby the makespan. The pseudocode of the duplication scheduling (DS) heuristic is shown in Fig. 4. The task duplication procedure is conducted iteratively for every task, in the order of its execution, for each PE. The heuristic first determines the idle time, which is the difference between the actual and earliest possible start time of the task.
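The idle-time budget at the core of the DS heuristic can be made concrete with a small Python sketch of the inner loop of Fig. 4. It is a sketch under stated assumptions: the function names and the tuple layout `(task_id, completion_time, exec_time_on_this_pe)` are hypothetical, and the computation of start times from the task graph and the ITC costs is left out.

```python
def idle_time(actual_start, earliest_start):
    """Idle time before a task: actual minus earliest possible start time."""
    return actual_start - earliest_start


def parents_to_duplicate(t_idle, parents):
    """Choose parents to duplicate on the current PE, following Fig. 4.

    `parents` holds (task_id, completion_time, exec_time_on_this_pe) tuples.
    Returns the chosen parent ids and the idle time that is left over.
    """
    chosen = []
    # Parents that complete latest contribute most to the delay, so they go first.
    for task_id, _finish, t_exe in sorted(parents, key=lambda p: p[1], reverse=True):
        if t_exe < t_idle:
            chosen.append(task_id)
            t_idle -= t_exe  # the duplicate consumes part of the idle slot
        else:
            break  # as in Fig. 4, stop at the first parent that does not fit
    return chosen, t_idle


# Hypothetical numbers: a task could start at 12.0 but actually starts at 22.0.
budget = idle_time(22.0, 12.0)
chosen, left = parents_to_duplicate(budget, [(1, 5.0, 3.0), (2, 9.0, 4.0), (3, 7.0, 6.0)])
```

In this example only parent 2 is duplicated: it fits into the 10.0 units of idle time, whereas the next parent in the ordering needs 6.0 units, which is not strictly less than the 6.0 units remaining.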
The DS heuristic then attempts to duplicate the parent tasks, in the order of their contribution to the delay, until the idle time is used up. The new solution is compared against the original and the better of the two is retained.

    Duplication Scheduling Local Search
    FOR All PE Schedules
        FOR All Task in PE Schedule
            Compute Tidle before task execution
            Determine parents of task
            Sort parents in descending order of completion time
            FOR Parents
                Determine Texe required if duplicated
                IF Execution time < Tidle
                    Duplicate parent
                    Update Tidle: Tidle = Tidle - Texe
                ELSE
                    Break
                END
            END
        END
    END
    Sort task based on priority
    Evaluate new solution
    IF new solution is better than old solution
        Replace old solution
    END

Fig. 4 Pseudocode of the duplication scheduling local search

Table 3 Description of the mutation operation
Operator           Description
Partial exchange   The partial exchange operation involves a number of partial schedule exchanges. For each exchange, two PE schedules are randomly chosen, and a segment of each selected schedule is then randomly selected and exchanged. In addition, a mechanism is in place such that no PE schedule is selected twice in a particular partial exchange operation
Schedule merge     This operation concatenates the two PE schedules with the smallest number of tasks in the chromosome. Naturally, this operation is not applicable to solutions with only one PE schedule
Partial split      This operation searches for the PE schedule with the largest number of tasks and breaks the schedule into two at a random point. The upper segment of the divided schedule is then either assigned to an idle PE or inserted into the PE schedule with the smallest number of tasks, chosen at random

3.6 Algorithmic flow

The algorithmic flow of the HEA is shown in Fig. 5. The optimization process begins with the initialization of the population based on the procedure described in Sect. 3.1. After the initial evolving population is formed, all the chromosomes are evaluated and ranked according to their final execution time in the population. Following the ranking process, an archive population is updated. In this paper, an archive is applied to store the best solutions found during the search. The archive maintains a fixed number of solutions and the updating process consists of a few steps. The evolving population and the archived solutions are first combined and all duplicate solutions are deleted. The remaining solutions in the combined population are then inserted into the archive in the order of increasing rank until the archive is filled.

The binary tournament selection scheme is then performed on the archive. In the binary tournament selection, a pair of individuals is selected randomly from the archive. Thereafter, the selected pair of individuals enters a tournament in which the chromosome with the lower rank is selected for reproduction. This procedure is repeated until the mating pool is filled, so that the original population size is preserved. The genetic operators consist of the PE schedule crossover and the three mutation operators presented in Sects. 3.3 and 3.4, respectively. The PLS and DS are applied to the archive population at a fixed interval, TLS, for better local exploitation in the evolutionary search. Different schemes for incorporating the two local search methods will be explored in Sect. 4. The evolution process is repeated until the stopping criterion is satisfied.

Fig. 5 Flowchart of HEA

4 Simulation results and analysis

This section presents the extensive simulation results and analysis of the proposed HEA.
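Before the experimental setup is given, the archive update and the binary tournament selection of Sect. 3.6 can be sketched in Python as follows. This is a minimal sketch and not the original Matlab code: the function names are hypothetical, chromosomes are assumed to be dictionaries mapping a PE to its task list, the rank is taken to be the ordering by makespan, and `makespan` is a callable supplied by the caller. The toy evaluator used in the example (the length of the longest PE schedule) is only a stand-in for a real schedule evaluation.

```python
import random


def update_archive(archive, population, makespan, size):
    """Combine archive and population, delete duplicates, keep the `size` best."""
    unique = {}
    for sol in archive + population:
        # Two chromosomes are duplicates if they assign the same ordered
        # task lists to the same PEs.
        key = tuple(sorted((pe, tuple(tasks)) for pe, tasks in sol.items()))
        unique.setdefault(key, sol)
    return sorted(unique.values(), key=makespan)[:size]


def binary_tournament(archive, makespan, pool_size, rng):
    """Fill the mating pool with the winners of random pairwise tournaments."""
    pool = []
    while len(pool) < pool_size:
        a, b = rng.sample(archive, 2)  # needs at least two archived solutions
        pool.append(a if makespan(a) <= makespan(b) else b)
    return pool


# Toy stand-in for the makespan: the number of tasks on the busiest PE.
def toy_makespan(sol):
    return max(len(tasks) for tasks in sol.values())


a = {1: [1, 2, 3, 4]}
b = {1: [1, 2], 2: [3, 4]}
c = {1: [1, 2, 3], 2: [4]}
archive = update_archive([a, b], [b, c], toy_makespan, size=2)
pool = binary_tournament(archive, toy_makespan, pool_size=4, rng=random.Random(1))
```

Selecting from the archive rather than from the evolving population means that every parent is one of the best solutions found so far, which is consistent with the elitist flow of Fig. 5.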
The simulations are implemented using Matlab on an Intel Pentium 4 2.8 GHz computer and the results shown are based on the final makespan value of the best archived solution. Thirty independent runs are performed for each of the test sets in order to obtain statistical information, such as the consistency and robustness of the algorithms. The various parameter settings for the algorithm are listed in Table 4. The number of evaluations includes the evaluation of solutions from the main algorithmic cycle as well as the local search operations. Section 4.1 demonstrates the effectiveness of the proposed local search operators and analyzes how the various settings of the local search heuristics affect algorithmic performance. Section 4.3 investigates the impact of different problem characteristics on the performance of the HEA and how it compares against conventional heuristics.

Table 4 Parameter setting for HEA
Parameter                     Settings
Populations                   Population size 20, archive size 20
Chromosome                    Variable length chromosome
Selection                     Binary tournament selection
Crossover rate                0.9
Mutation rate                 0.3
Evaluations                   600
Local search frequency, TLS   5

4.1 Effects of local search

The HEA incorporates the local search heuristics in order to exploit local schedules in parallel with global evolutionary optimization. In this section, the dynamics and parameter settings of PLS and DS are examined. Note that T1 and T4 are used in the study here since it has been observed in previous works that ITC has a severe impact on schedule optimality. Six settings of HEA with various implementations of the local search operators are investigated, as shown in Table 5. No local search is applied in setup 1, while only one heuristic is applied to each solution undergoing local search in setups 2–6. In setup 2, either PLS or DS is randomly applied. In the third and fourth setups, only one heuristic is applied. The asterisk (*) in setup 5 and setup 6 denotes which local search is activated first, as the two are executed alternately.

Table 5 Different case setups to examine contribution of the local search heuristics
Setup   1   2        3     4     5            6
PLS     –   Random   Yes   –     Alternate*   Alternate
DS      –   Random   –     Yes   Alternate    Alternate*

Fig. 6 Evolutionary trend of the six setups for a T1 and b T4 (makespan versus number of evaluations)

The evolutionary trends of the makespan averaged over 30 runs for T1 and T4 are plotted in Fig. 6a, b. From the plots, it can be observed that the application of local search results in significant dips in the convergence trace, particularly in instances where DS is applied. Figure 6a, b clearly demonstrate the effectiveness of local exploitation in the HEA, as the five setups which incorporate local search perform better than setup 1. The performances of setup 2, setup 4, setup 5, and setup 6 are comparable, although setup 6, in which DS is activated first and alternates with PLS, seems to have a slight edge for both problems. On the other hand, setup 5, which activates PLS first, has a slower convergence rate for both T1 and T4. The effectiveness of duplicating tasks in reducing overall completion time is also evident, since the four settings that apply DS (setup 2, setup 4, setup 5 and setup 6) are able to find solutions with makespans that are significantly lower than those found without local search and with PLS only. Interestingly, the application of DS seems to have more impact on T1, with an average improvement of 10%, as compared to 5% for T4, which has a more severe CCR restriction. Setup 6 will be used as the default setup for all subsequent experiments.

4.1.1 Effect of local search frequency

In general, there is a need to maintain a balance between exploration and exploitation. Therefore, experiments are also conducted to study the impact of local search frequency on the performance of the HEA. Apart from the original setting of applying local search at TLS = 5, four other settings, in which local search is applied in every generation (TLS = 1) and at intervals of TLS = {3, 7, 10} generations, respectively, are used in the test. Thirty simulation runs of each of the five settings were performed and the convergence traces are plotted in Fig. 7a, b. It should be noted that the maximum number of evaluations is maintained at 400 for all simulations, i.e. increasing the frequency of local search reduces the number of generations.

Fig. 7 Effects of various TLS settings for a T1 and b T4 (makespan versus number of evaluations)

Fig. 8 Makespan of HEA with different individual local search selection schemes (archive only, population only, both) for a T1 and b T4

From the figures, it can be observed that the convergence speed increases with decreasing TLS. While it is expected that increasing local search frequency will improve convergence speeds, there is always a risk of yielding locally optimal solutions due to the lack of sufficient exploration. Nonetheless, we note that a well-designed LSH is capable of achieving schedules within 25% of the optimal solution. By comparing the results achieved by the HEA and the conventional heuristics, which will be shown in Sect. 4.3, it is thus unlikely that the HEA is trapped in a local optimum.
This is probably due to the global exploration capability of the HEA.

4.1.2 Effect of individual selection

Apart from TLS, another factor that influences the effectiveness of the local search process is the selection of individuals. In the preceding sections, only archived individuals are exploited by the heuristics. In this section, two other methods of individual selection are investigated. In the second approach, only individuals from the evolving population undergo local search, while in the third approach random individuals are selected via tournament selection from both the archive and the evolving population. The simulation results are summarized in the form of boxplots in Fig. 8a, b. From the figures, it can be observed that performing local search on archived solutions has an edge over exploiting the evolving population only. On the other hand, the original approach of exploiting the archive only is comparable to the third approach, which exploits a selected set of archived and evolving population individuals. A KS-test was also conducted and showed that only the second method is statistically different from the other methods.
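The three individual selection schemes compared in this section can be summarized by the following Python sketch. It is illustrative only: the function name, the scheme labels and the parameter `k` (the number of tournament winners in the third scheme) are assumptions, since the paper does not state how many individuals undergo local search. In the toy call, plain numbers stand in for solutions and each number is treated as its own makespan.

```python
import random


def pick_for_local_search(archive, population, scheme, makespan, rng, k=None):
    """Return the individuals that undergo local search under a given scheme."""
    if scheme == "archive":       # original approach: archived individuals only
        return list(archive)
    if scheme == "population":    # second approach: evolving population only
        return list(population)
    if scheme == "both":          # third approach: tournaments over both sets
        combined = archive + population
        k = len(archive) if k is None else k  # assumption: one pick per archive slot
        picks = []
        for _ in range(k):
            a, b = rng.sample(combined, 2)
            picks.append(a if makespan(a) <= makespan(b) else b)
        return picks
    raise ValueError(f"unknown scheme: {scheme}")


# Toy call with numbers as stand-in solutions (smaller is better).
rng = random.Random(3)
picks = pick_for_local_search([1, 2], [5, 6], "both", lambda s: s, rng)
```

Because each tournament keeps the better of two distinct candidates, the worst member of the combined set can never be picked in the third scheme.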
Table 6 Makespan of HEA with and without the various genetic operators for T1 and T4
                                   T1          T4
HEA              First quartile    236.5115    440.6800
                 Median            238.9705    444.2914
                 Third quartile    241.9287    447.5509
Crossover only   First quartile    236.5379    442.7701
                 Median            239.2379    445.0349
                 Third quartile    243.1408    447.935
Mutation only    First quartile    238.6333    443.9139
                 Median            241.7449    446.3875
                 Third quartile    245.8042    449.0084
The best result is highlighted in bold

Table 7 Simulation results of LSH, DSH and HEA for the various benchmark problems
       LSH        DSH        HEA
                             First quartile   Median     Third quartile
T1     241.2088   238.4844   236.5115         238.9705   241.9287
T2     339.3681   333.3969   313.3463         315.4760   317.0450
T3     438.3704   400.8312   368.8242         371.2755   372.9756
T4     496.6009   473.6011   440.6800         444.2914   447.5509
T5     301.7023   299.9843   293.0909         295.1409   297.8713
T6     342.1177   340.6002   319.5344         321.5004   323.6641
T7     350.9543   307.5352   295.8357         298.9078   304.0759
T8     322.0563   316.2526   275.6143         281.5325   284.9465
T9     388.6536   371.4102   337.1911         342.0356   344.8998
T10    454.2243   415.2846   391.3507         395.1606   397.0869

4.2 Effects of genetic operators

This section examines the contribution of the specialized genetic operators to the performance of HEA on problems T1 and T4. In order to assess the effects of the PE schedule crossover and the mutation operators, simulations for two different setups of HEA are conducted. Specifically, the first setup incorporates only the PE schedule crossover, while only the mutation operators are implemented in the second setup. The simulation results are summarized in Table 6. The results indicate a deterioration of algorithmic performance when either the crossover or the mutation operator is removed. Nonetheless, it can be observed that the crossover has a greater impact on algorithmic performance, indicating the importance of exchanging PE schedules between individuals. The KS-test conducted also showed that HEA is statistically better than HEA with the mutation operator only.
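For readers unfamiliar with the KS-test used in Sects. 4.1.2 and 4.2, the following Python sketch computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical distribution functions of two sets of final makespans, together with the usual large-sample critical value. The paper does not report the significance level or the implementation used, so the choice of alpha = 0.05 (coefficient 1.358) is an assumption, and the two samples shown are made-up numbers, not results from the paper.

```python
import math


def ks_two_sample(x, y):
    """Two-sample KS statistic D = max |F_x(v) - F_y(v)| over all values v."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    i = j = 0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both empirical CDFs past every observation equal to v.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


# Made-up makespans of two hypothetical algorithm variants (not from the paper).
variant_a = [236.1, 237.4, 238.0, 238.9, 239.5, 240.2, 241.0, 242.3]
variant_b = [239.0, 240.8, 241.7, 242.5, 243.9, 244.6, 245.8, 246.1]
d_stat = ks_two_sample(variant_a, variant_b)
differs = d_stat > ks_critical(len(variant_a), len(variant_b))
```

With 30 runs per algorithm, as used throughout Sect. 4, the critical value under this assumption is roughly 0.35, so two variants would be declared different when their empirical distributions are separated by more than about a third at some makespan value.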
4.3 Investigation of other test problems

In order to examine the effectiveness of HEA, a comparative study with the conventional LSH and the duplication scheduling heuristic (DSH) (Kruatrachue and Lewis 1987) is carried out on the ten test problems described earlier. LSH has been described in Sect. 2.1. DSH is an instantiation of the LSH with the task duplication described in Sect. 3.5. Specifically, in DSH, parent tasks are assigned to idle slots whenever possible after all tasks are assigned using LSH. As before, 30 simulation runs are conducted for all test problems and the results are summarized in Table 7. LSH and DSH are deterministic heuristics and only one solution is produced for each problem. As noted in Sect. 4.1, the effectiveness of task duplication is evident from a comparison of the performances of LSH and DSH. The difference between the two conventional heuristics becomes even more apparent as the CCR or degree of heterogeneity increases. On the other hand, the HEA outperforms both heuristics for all test problems. With the exception of T1, it can be observed from Table 7 that the third quartile makespan value attained by HEA is much lower than those of LSH and DSH for the benchmark problems. This also implies that the HEA is capable of evolving good schedules consistently.

In order to analyze the impact of the various problem parameters, the performance trends over the different settings are plotted in Fig. 9a–d. In general, increasing the degree of CCR, precedence and task heterogeneity results in higher makespans for all algorithms. Nonetheless, it can be observed that the problem and algorithmic performances have different sensitivities toward these parameters. For instance, total execution time seems to vary almost linearly with CCR and task heterogeneity.
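The comparison drawn from Table 7 earlier in this section can be checked with a few lines of Python. The sketch below copies the values of Table 7, lists the problems on which the third quartile of HEA is not below both conventional heuristics, and computes the relative gain of the HEA median over DSH. The function names are hypothetical and the percentage gain is a derived quantity that is not reported in the paper.

```python
# Rows of Table 7: problem -> (LSH, DSH, HEA first quartile, HEA median, HEA third quartile)
TABLE7 = {
    "T1": (241.2088, 238.4844, 236.5115, 238.9705, 241.9287),
    "T2": (339.3681, 333.3969, 313.3463, 315.4760, 317.0450),
    "T3": (438.3704, 400.8312, 368.8242, 371.2755, 372.9756),
    "T4": (496.6009, 473.6011, 440.6800, 444.2914, 447.5509),
    "T5": (301.7023, 299.9843, 293.0909, 295.1409, 297.8713),
    "T6": (342.1177, 340.6002, 319.5344, 321.5004, 323.6641),
    "T7": (350.9543, 307.5352, 295.8357, 298.9078, 304.0759),
    "T8": (322.0563, 316.2526, 275.6143, 281.5325, 284.9465),
    "T9": (388.6536, 371.4102, 337.1911, 342.0356, 344.8998),
    "T10": (454.2243, 415.2846, 391.3507, 395.1606, 397.0869),
}


def third_quartile_exceptions(table):
    """Problems where the HEA third quartile is not below both LSH and DSH."""
    return [name for name, (lsh, dsh, _q1, _med, q3) in table.items()
            if q3 >= min(lsh, dsh)]


def median_gain_over_dsh(table):
    """Relative reduction (in percent) of the HEA median makespan versus DSH."""
    return {name: 100.0 * (dsh - med) / dsh
            for name, (_lsh, dsh, _q1, med, _q3) in table.items()}


exceptions = third_quartile_exceptions(TABLE7)
gains = median_gain_over_dsh(TABLE7)
```

The only exception returned is T1, in agreement with the text. On T1 the HEA median also lies marginally above the single DSH result while the first quartile lies below it, so the advantage of HEA on this problem is small compared with the other nine problems.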
As CCR increases beyond a certain threshold, we can expect solutions which employ fewer PEs or, at least, concentrate the workload on a few core PEs to become more desirable. On the other hand, the initial increment in the degree of precedence relation from 25 to 50% leads to a sharp increase in makespan. This is probably due to the subsequent increase in waiting time before a task can be executed. However, it can be seen from Fig. 9b that such an effect seems to saturate as the degree of precedence is further increased to 75%. Interestingly, we can observe from Fig. 9c that increasing PE heterogeneity actually improves the makespan. This behavior can be attributed to PEs that can be either very efficient or very inefficient with certain tasks. As a result, HEA is able to exploit such a problem characteristic to generate schedules that are much better than those of DSH and LSH.

Fig. 9 Performance trend of HEA (open circle), LSH (open inverted triangle) and DSH (open square) for various degrees of a CCR (results for T1, T2, T3 and T4), b precedence (results for T2, T5 and T6), c PE heterogeneity (results for T2, T7 and T8) and d task heterogeneity (results for T2, T9 and T10)

5 Conclusion

Task scheduling in a multiprocessor system is an NP-hard problem that is critical in distinguishing the performance of a multiprocessor system over a single processor system. However, the fact that tasks are located on different PEs means that additional overheads are incurred.
Furthermore, minimizing time is only a single objective; practical requirements demand that other measures of cost be minimized as well. In this article, we proposed a HEA specifically designed to solve the heterogeneous multiprocessor scheduling problem by means of a variable-length chromosome, as well as specialized genetic and local search operators. The starting population is initialized using a random LSH to preserve the precedence relationships between the tasks. The evolutionary process is driven by two primary variation operators, namely a PE schedule crossover and three variants of the mutation operator (partial exchange, schedule merge, and partial split); the local search operators, on the other hand, consist of a partial list scheduling and a duplication scheduling approach. In presenting our results based on a fairly extensive simulation study, we showed that the proposed genetic operators, when coupled with the local search operators, performed better than in the cases where any one of the operators was omitted. By incorporating the local search operators, particularly the duplication local search, into the overall proposed algorithm, convergence time, as expected, was shown to decrease. This observation becomes more evident on test sets where the CCR is smaller.

References

Ahmad I, Kwok YK (1998) Optimal and near-optimal allocation of precedence-constrained tasks to parallel processors: defying the high complexity using effective search techniques. In: Proceedings of 1998 international conference on parallel processing, pp 423–431
Ahmad I, Kwok YK (1998) On exploiting task duplication in parallel program scheduling. IEEE Trans Parallel Distrib Syst 9(9):872–892
Baskiyar S, Dickinson C (2005) Scheduling directed a-cyclic task graphs on a bounded set of heterogeneous processors using task duplication. J Parallel Distrib Comput 65(8):911–921
Blickle T, Teich J, Thiele L (1996) System level synthesis using evolutionary algorithms, TIK-Report, Nr. 16
Braun TD, Siegel HJ, Beck N, Boloni LL, Maheswaran M, Reuther AI, Robertson JP, Theys MD, Yao B, Hensgen D, Freund RF (2001) A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J Parallel Distrib Comput 61(6):810–837
Burke EK, Cowling P, De Causmaecker P (2001) A memetic approach to the nurse rostering problem. Appl Intell 15(3):199–214
Coll PE, Ribeiro CC, de Sousa CC (2002) Test instances for scheduling unrelated processors under precedence constraints. http://www-di.inf.pucrio.br/celso/grupo/readme.ps
Correa RC, Ferreira A, Rebreyend P (1999) Scheduling multiprocessor tasks with genetic algorithms. IEEE Trans Parallel Distrib Syst 10(8):825–837
Davidovic T, Crainic TG (2003) New benchmarks for static task scheduling on homogenous multiprocessor systems with communication delays, Publication CRT, 2003-04, Centre de Recherche sur les Transports, Universite de Montreal, pp 123–136
Davis L (1991) Handbook of genetic algorithms. Van Nostrand Reinhold, London
Dhodi MK, Hielscher EH, Storer RH, Bhasker J (1995) Datapath synthesis using a problem space genetic algorithm. IEEE Trans CAD 14(8):934–944
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, New York
El-Rewini H, Lewis TG, Ali HH (1994) Task scheduling in parallel and distributed systems. Prentice Hall, Englewood Cliffs
Franca PM, Mendes A, Moscato P (2001) A memetic algorithm for the total tardiness single machine scheduling problem. Eur J Oper Res 132(1):224–242
Garey MR, Johnson DS (1979) Computers and intractability, a guide to the theory of NP-completeness. W.H. Freeman and Co., San Francisco
Hall NG, Posner ME (2001) Generating experimental data for computational testing with machine scheduling applications. Oper Res 49:854–865
Hou ES, Ansari N, Ren H (1994) A genetic algorithm for multiprocessor scheduling. IEEE Trans Parallel Distrib Syst 5(2):113–120
Ishibuchi H, Yoshida T, Murata T (2003) Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans Evol Comput 7(2):204–223
Kasahara H, Narita S (1984) Practical multiprocessor scheduling algorithms for efficient parallel processing. IEEE Trans Comput 33(11):1023–1029
Kruatrachue B, Lewis TG (1987) Duplication scheduling heuristic, a new precedence task scheduler for parallel systems, Technical Report 87-60-3, Oregon State University
Kwok Y, Ahmad I (1997) Efficient scheduling of arbitrary task graphs to multiprocessors using a parallel genetic algorithm. J Parallel Distrib Comput 47(1):58–77
Kwok Y, Ahmad I (1999) Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput Surv 31(4):406–471
Lewis TG, El-Rewini H (1992) Introduction to parallel computing. Prentice Hall, New York
Lim D, Ong YS, Jin Y, Sendhoff B, Lee BS (2007) Efficient hierarchical parallel genetic algorithm using grid computing. In: Future generation computer systems: the international journal of grid computing: theory, methods and applications, pp 658–670
Macey BS, Zomaya AY (1998) A performance evaluation of CP list scheduling heuristics for communication intensive task graphs. In: Proceedings of the joint 12th international parallel processing symposium and ninth symposium on parallel and distributed programming, pp 538–541
Merz P, Freisleben B (2000) Fitness landscape analysis and memetic algorithms for the quadratic assignment problem. IEEE Trans Evol Comput 4(4):337–352
Ong YS, Keane AJ (2004) Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol Comput 8(2):99–110
Ong YS, Lim MH, Zhu N, Wong KW (2006) Classification of adaptive memetic algorithms: a comparative study. IEEE Trans Syst Man Cybern B 36(1):141–152
Papadimitriou C, Yannakakis M (1990) Toward an architecture independent analysis of parallel algorithms. SIAM J Comput 19:322–328
Ritchie G, Levine J (2004) A hybrid ant algorithm for scheduling independent jobs in heterogeneous computing environments. In: Proceedings of the 23rd workshop of the UK planning and scheduling special interest group
Tang J, Lim MH, Ong YS (2007) Diversity-adaptive parallel memetic algorithm for solving large scale combinatorial optimization problems. Soft Comput 7(9):873–888
Tsuchiya T, Osada T, Kikuno T (1998) Genetic-based multiprocessor scheduling using task duplication. Microprocessors Microsyst 22:197–207
Wu AS, Yu H, Jin S, Lin KC, Schiavone G (2004) An incremental genetic algorithm approach to multiprocessor scheduling. IEEE Trans Parallel Distrib Syst 15(9):824–834
Zhou Z, Ong YS, Lim MH, Lee BS (2007) Memetic algorithm using multi-surrogates for computationally expensive optimization problems. Soft Comput 11(10):957–972
Zhong YW, Yang JG, Qi HN (2004) A hybrid genetic algorithm for task scheduling in heterogeneous computing systems. In: Proceedings of the third international conference on machine learning and cybernetics, pp 2463–2468
Zomaya AY, Ward C, Macey B (1999) Genetic scheduling for parallel processor systems: comparative studies and performance issues. IEEE Trans Parallel Distrib Syst 10(8):795–812