How to build towers of arbitrary heights, and other hard problems Debora Field1 and Allan Ramsay2 Abstract. This paper presents a reasoning-centred approach to planning: an AI planner that can achieve blocks-world goals of the form ‘above(X,Y)’ by knowing that ‘above’ is the transitive closure of ‘on’, and using that knowledge to reason over the effects of actions. The fully implemented model can achieve goals that do not match action effects, but that are rather entailed by them, which it does by reasoning about how to act: state-space planning is interwoven with theorem proving in such a way that a theorem prover uses the effects of actions as hypotheses. The planner was designed for the purpose of planning communicative actions, whose effects are famously unknowable and unobservable by the doer/speaker, and depend on the beliefs of and inferences made by the recipient/hearer. 1 INTRODUCTION The approach of many of today’s top domain-independent planners is to use heuristics, either to constrain the generation of a search space, or to prune and guide the search through the state space for a solution, or both [5, 6, 19, 25]. All such planners succeed by relying on the static effects of actions—on the fact that you can tell by inspection what the effects of an action will be in any situation—which limits their scope in a particular way. In their Graphplan paper, Blum and Furst make this observation [5, p. 299]: One main limitation of Graphplan is that it applies only to STRIPS-like domains. In particular, actions cannot create new objects and the effect of performing an action must be something that can be determined statically. There are many kinds of planning situations that violate these conditions. For instance, if one of the actions allows the planner to dig a hole of an arbitrary integral depth, then there are potentially infinitely many objects that can be created. . . The effect of this action cannot be determined statically. . . This limitation is also true of HSP [6], FF [19], REPOP [25], and all the planners in the planning graph and POP traditions. The class of problems that these planners do not attempt to solve—the ability to dig a hole of an arbitrary depth (or perhaps to build a tower of blocks of an arbitrary height)—was one that particularly interested us. 1 2 Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK: debora@csc.liv.ac.uk School of Informatics, University of Manchester, PO Box 88, Manchester M60 1QD, UK: allan@co.umist.ac.uk Our motivation for this research was the problem of planning communicative actions. The well-documented, considerable difficulties involved in making a plan to communicate a message to another person include this: a key player in the ensuing evolution of the environment is that person (the ‘hearer’ of the utterance).3 Consider Rob the robot, who can stack blocks, but cannot communicate messages. Rob’s goal in each task is to attain a state in which there is a certain configuration of blocks. Rob makes a plan to build a configuration of blocks, on a strong assumption that at plan execution time, he will be the author of the positions of the blocks. When Rob puts a block in a certain position, he knows that the block will have no desires and opinions of its own concerning which position it should take up—he, Rob, expects to be in reasonable control of the situation, (notwithstanding impeding concurrent events, possible sensor or motor failures, and other hindrances). 
If, however, human agent John's goal is to get human agent Sally to believe the proposition John is kind, in some respects, John has a harder problem than Rob. John knows that Sally has desires and opinions of her own, and, unlike Rob, John has no direct access to the environment he wishes to affect. John cannot simply implant John is kind into Sally's belief state—it is more complicated than this. John has to plan to do something that he considers will lead Sally to infer John is kind. This means that when John is planning his action—whether to give her some chocolate, pay her a compliment, lend her his credit card—he has to consider the many different messages Sally might infer from the one thing John chooses to say or do. There is no STRIPS operator [13] John can choose that will have his desired effect; he has to plan an action that he expects will entail the state he desires.

With the ultimate aim of planning communicative actions (both linguistic and non-linguistic), an agent was required that could make plans for the purpose of communication, while knowing that any single plan could have a myriad different effects on a hearer's belief state, depending on what the hearer chose to infer. 'Reasoning-centred' planning of actions that entailed goals was an approach we considered would enable this difficult predicament to be managed. By means of experimenting with an implementation, a planner was designed that reasons about the effects of actions, and is able to plan to achieve goals that do not match action effects, but that are entailed by the final state.

3 The term 'hearer' is used loosely to mean the observer or recipient of some act of communication.

2 PLANNER DESIGN ISSUES

The planner's general design was strongly influenced by its intended application: to investigate the planning of communicative acts without using 'speech acts' [4, 28].4 We will now discuss some of the early work we carried out on communicative action planning, in order to help explain the decisions that were taken concerning the planner's design.

2.1 An early experiment in communicative action

Some tests were carried out to investigate the idea of informing. 'Inform' was simplistically interpreted as 'to make know', the idea being that the effect of informing some agent a of a proposition p is knows(a,p). A STRIPS operator was designed, having the name inform(SPEAKER, HEARER, PROP).5 The operator had two preconditions

pre([ knows(SPEAKER, PROP),
      knows(SPEAKER, not(knows(HEARER, PROP))) ])

it had one positive effect:

add([ knows(mutual([SPEAKER, HEARER]), PROP) ])

and it had one negative effect:

delete([ knows(SPEAKER, not(knows(HEARER, PROP))) ])

The positive effect expressed the idea that after a speaker has uttered a proposition (PROP) to his hearer, then the speaker and hearer mutually know PROP (i.e., each knows PROP, each knows the other knows PROP, each knows the other knows they both know PROP, ...).6 The operator's negative effect matched one of the preconditions, and represented the idea that after informing HEARER of PROP, it is no longer the case that SPEAKER knows that HEARER does not know PROP.

With the three actions inform(SPEAKER,HEARER,PROP), stack(DOER,BLOCKX,BLOCKZ), unstack(DOER,BLOCKX,BLOCKZ), the planner could make plans to communicate items of information between agents, as well as to move blocks about. Tasks were designed to test inform, including some in which a colour-blind agent had to make blocks towers without putting two blocks of the same colour next to each other.7 To achieve a task, a second 'colour-seeing' agent would have to inform the colour-blind agent of the blocks' colours. The different agents therefore had to have their own distinct knowledge states, and so the theorem prover had to be developed into an epistemic theorem prover before the tasks would work. Here is one of the 'colour-blind' blocks-world tasks:

task52j(P):-
  plan(
    [% GOAL CONDITION
     knows(john,above(b,a)),
     knows(john,above(c,a)),
     knows(john,above(d,a)),
     knows(john,above(e,a))],
    [% INITIAL STATE
     % (1) JOHN'S KNOWLEDGE STATE
     [knows(john,isblock(a)), knows(john,on(a,b)), knows(john,clear(a)),
      knows(john,isblock(b)), knows(john,on(b,c)),
      knows(john,isblock(c)), knows(john,on(c,d)),
      knows(john,isblock(d)), knows(john,on(d,e)),
      knows(john,isblock(e)), knows(john,on(e,table))],
     % (2) GEORGE'S KNOWLEDGE STATE
     [knows(george,colour(a,blue)),  knows(george,not(knows(john,colour(a,blue)))),
      knows(george,colour(b,green)), knows(george,not(knows(john,colour(b,green)))),
      knows(george,colour(c,blue)),  knows(george,not(knows(john,colour(c,blue)))),
      knows(george,colour(d,green)), knows(george,not(knows(john,colour(d,green)))),
      knows(george,colour(e,blue)),  knows(george,not(knows(john,colour(e,blue))))],
     % RULE: above_1
     [knows(_,on(X,Z) ==> above(X,Z)),
     % RULE: above_2
      knows(_,(on(X,Y) and above(Y,Z)) ==> above(X,Z))
     ]],P).

and here is the first solution the early planner returned for it:

P = [unstack(john,a,b),unstack(john,b,c),
     inform(george,john,colour(b,green)),
     inform(george,john,colour(a,blue)),
     stack(john,b,a),unstack(john,c,d),
     inform(george,john,colour(c,blue)),
     stack(john,c,b),unstack(john,d,e),
     inform(george,john,colour(d,green)),
     stack(john,d,c),
     inform(george,john,colour(e,blue)),
     stack(john,e,d)] ?

4 Although early work on the 'speech acts with STRIPS' approach was promising [7, 10, 2, 3], subsequent work has found many problems with this approach [17, 26, 8].
5 The example is lifted from the model, which is implemented in Prolog. Uppercase arguments are variables.
6 The model embodies a deduction model of belief [20], rather than a 'possible worlds' model [18, 21] (to be discussed shortly). Thus agents are not required to draw all logically possible inferences, and are therefore not required to infer an infinite number of propositions from a mutual belief.
7 In order to make this possible, colour constraints were added to the stack operator.

Being the first steps towards planning linguistic actions, the inform operator, its accompanying tasks, and the planner's new communicative action procedures were very crude. First, testing of the inform operator at this early stage was minimal. There were many aspects of informative utterances that inform did not attempt to account for. These included: that people cannot know what other people do or do not know; that you cannot assume that if you tell someone p they will believe you; and that you may want to tell someone p in order to inform them of a proposition q that is different from p [15]. Secondly, with regard to procedure, the planner made plans by directly accessing both George's and John's knowledge states. The planner was behaving like a third agent asking itself, 'What would be the ideal solution if George and John had this problem?'.
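For reference, the pieces of the early inform operator quoted above can be assembled into a single operator definition. The following is a sketch only: the flat action(Name, pre(...), add(...), delete(...)) wrapper is an assumption made here for illustration (the shortened stack operator quoted in section 5 nests these components differently), not the model's own encoding.

% Hypothetical assembly of the early inform operator from its quoted
% preconditions, positive effect, and negative effect.
action(inform(SPEAKER, HEARER, PROP),
       pre([ knows(SPEAKER, PROP),
             knows(SPEAKER, not(knows(HEARER, PROP))) ]),
       add([ knows(mutual([SPEAKER, HEARER]), PROP) ]),
       delete([ knows(SPEAKER, not(knows(HEARER, PROP))) ])).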
A more realistic approach would have been if the planner had been planning from John’s point of view alone, using John’s knowledge about the current block positions, and his beliefs about George and what George might know, to ask George for information, in anticipation of needed answers. This early work highlights some of the peculiarities of planning communicative action as opposed to physical action. To communicate, agents have to be able to reason with their own beliefs about the belief states of other agents, and to use that reasoning in order to decide how to act—so this planner needs to be able to do epistemic reasoning about its current situation, and its goals. Communicating agents also have to be able to deal with the difficulty that uttering a proposition p may cause the hearer to believe p, or it may not, or it may cause the hearer to believe other things as well as or instead of p—in other words, that the effects of communicative actions are unknowable during planning (not to mention unobservable after application), and that they depend largely on the intention of the hearer. 2.2 Development strategy In order to develop a design which possessed the desired capabilities, the following strategy was taken: • A state-space search was implemented that searches backwards in hypothetical time from the goal via STRIPS operators (based on foundational work in classical planning [23, 24, 14, 22, 13]); • A theorem prover for first order logic was implemented that can constructively prove conjunctions, disjunctions, implications, and negations, and that includes procedures for modus ponens and unit resolution; • State-space search and theorem proving are interwoven in such a way that: – not only can disjunctions, implications and negations be proved true, they can also be achieved;8 – not only can a goal q be proved true by proving (p ⇒ q) ∧ p, but q can also be achieved by proving p ⇒ q and achieving p; – a goal can be achieved by using the theorem prover’s inference rules to reason with recursive domain-specific rules, which transform the ‘slot-machine’ effects of actions into powerful non-deterministic unquantifiable effects. • The theorem prover has been transformed into an epistemic theorem prover by incorporating a theory of knowledge and belief suitable for human reasoning about action, so that agents can make plans according to their beliefs about the world, including their beliefs about others’ beliefs. Since it is not possible to judge whether goals like above(e,f) are true by inspecting the current state (which contains no above( , ) facts), reasoning is carried out to find out which goals are false. Similarly, since goals like above(e,f) do not match any action effects, reasoning is also done to find out which actions would make a goal true. To accomplish this, a theorem prover uses the effects of actions as hypotheses. A goal is proved by assuming the effect of some action is true, on the grounds that it would be true in the situation that resulted from performing that action. Hence, a set of actions is computed that might be useful for achieving a goal by carrying out hypothetical proofs, where the hypotheses are the actions whose effects have been exploited. 
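To make the idea of 'effects as hypotheses' concrete, here is a minimal Prolog sketch, not the authors' implementation: the predicate names (fact/1, rule/1, prove/2, achievable/2) and the flat action/4 operator format are assumptions introduced for illustration only.

:- use_module(library(lists)).
:- dynamic fact/1, rule/1, action/4.
:- op(900, xfx, ==>).          % arrow used for the domain rules, as in the encoded tasks

% prove(Goal, Hyps): Goal follows from the current facts, the hypothesised
% effects Hyps, and the domain-specific rules.
prove(Goal, _Hyps) :- fact(Goal).
prove(Goal, Hyps)  :- member(Goal, Hyps).
prove(Goal, Hyps)  :- rule(Body ==> Goal), prove_all(Body, Hyps).

prove_all([], _Hyps).
prove_all([G|Gs], Hyps) :- prove(G, Hyps), prove_all(Gs, Hyps).

% achievable(Goal, Action): Goal would hold in the state resulting from
% Action, because it becomes provable once Action's add list is assumed.
achievable(Goal, Action) :-
    action(Action, pre(_Pre), add(Adds), delete(_Dels)),
    prove(Goal, Adds).

Under these assumptions, given fact(on(d,f)), the two above rules ([on(X,Z)] ==> above(X,Z) and [on(X,Y), above(Y,Z)] ==> above(X,Z)), and a stack operator whose add list is [on(X,Z)], the query achievable(above(e,f), A) can return both A = stack(e,f) and A = stack(e,d): stacking e onto d is hypothesised to achieve above(e,f) because on(d,f) already holds, which is the kind of hypothetical proof described above.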
2.3 Achieving entailed goals

The planner's design is essentially an interweaving of epistemic reasoning and planning in such a way that the planner is not limited to analysing actions purely as 'slot-machine' type operators (if you apply this action, you get this static effect, and that's the end of the matter). In the blocks-world context, for example, the planner is not limited to understanding the effect of stacking a block X onto another block Y as on(X,Y); it is able to reason that X will be above any block which happens to be beneath Y. In order to achieve a goal above(X,Y), where above is the transitive closure of on, something different from an action with an above( , ) expression in its add list is needed. Stacking e onto d, for example, might have the effect above(e,f), and it might not—it depends on the positions of d and f. The planner achieves an entailed goal (a goal which does not match any action effects) by achieving parts of what would otherwise be the constructive proof. It can thus achieve the goal above(e,f) (for example) by using rules which define the meaning of 'above' in terms that can be related to the available actions.

8 Proving a goal by assuming the effect of some action is true is referred to here as 'achieving a goal'. The terms 'proof', 'proving' etc. are reserved for the proving of goals within states.

2.4 Search direction

One approach to reasoning-centred planning might have been to grow a graph forwards from the initial state, using heuristics to constrain its growth, and then to search through it, employing a theorem prover and some inference rules to determine whether the goal condition was entailed by the state that had been attained. This strategy would do reasoning retrospectively—'Now that I've arrived at this state by applying some action/s, can I prove that my goal is entailed by this state?' A prerequisite of this research was, however, that the agent should be able to reason from the goal. This preference for a backwards search was motivated by a defining quality of the communication problem, as epitomised by utterance planning: there are too many applicable actions to make a forwards search feasible. People generally have the physical and mental capabilities to say whatever they want at any moment. This means that the answer to the question 'What can I say in the current state?' is something like 'Anything, I just have to decide what I want to say'. A backwards search is far more suitable than a forwards search under conditions like these.

3 DEDUCTION MODEL OF BELIEF

The planner's theorem prover embodies a deduction theory of belief [20], chosen in preference to the 'possible worlds' model [18, 21]. The meaning of implication under possible worlds semantics does not represent very well the way 'implies' is understood and used by humans. Under possible worlds semantics, an agent draws all the possible inferences from his knowledge (the well-documented problem of 'logical omniscience'). This means that knowing q automatically follows from knowing p ∧ (p ⇒ q), there is no progressive 'drawing' of the conclusion q, no awareness by the agent that the reason why he knows q is because he knows p ∧ (p ⇒ q). In contrast, humans reason with their knowledge; they make inferences, which they would not have to do if they were logically omniscient. To make their inferences, humans think about the content of an argument and the content relationships between the conclusions that they draw, and the premises from which the conclusions are drawn.
Furthermore, humans are not ideal reasoners—their belief states are incomplete, inaccurate, and inconsistent, and their inference strategies are imperfect. A model of belief that would reflect human reasoning better than the possible worlds model would be one in which an agent: (i) had a belief set in which some beliefs were contradictory; (ii) could choose certain beliefs from his belief set, while ignoring others; (iii) did not have a complete set of inference rules; and (iv) was not forced to infer every proposition that followed from his belief set. A model which satisfies points (iii) and (iv), and therefore allows for points (i) and (ii), is Konolige’s deduction model [20]. Konolige’s model avoids the problem of logical omniscience, enabling the process of inference to be modelled (how p is concluded, from which premises, and using which rules). 4 CONSTRUCTIVE LOGIC In addition to a deduction theory of belief, the planner’s theorem prover also embodies a constructive/intuitionist logic, in preference to classical logic. Classical logic can be thought of as investigation into truth, to the exclusion of all else: the emphasis is truth, the goal is truth, and the proofs are viewed as a means to the end of truth. Soundness and completeness are of paramount importance, while how the conclusion p is reached is not. In contrast, constructive logic can be thought of as an investigation into proofs, rather than into truth. The emphasis in constructive logic is on what can be done, what are the rules, what moves can be made, rather than on whether p is true or not. Classical logic can tell you that it’s true that there either is or there isn’t a very valuable stamp in your attic, simply by invoking the law of the excluded middle. p ∨ ¬p is always true. No-one has to look in the attic in order to carry out the proof. In contrast, constructive logic lacks the law of the excluded middle. This does not mean that in constructive logic p ∨ ¬p is false, but it does mean that in constructive logic, if there is no construction which proves p ∨ ¬p, then p ∨ ¬p is not provable. So to constructively prove that it’s true that there is or there is not a very valuable stamp in your attic, someone will have to look in the attic (i.e., construct a proof of p or construct a proof of ¬p).9 4.1 Why constructive logic? The reason behind the preference for constructive logic is the difference between the meaning of the implication operator in constructive logic, and its meaning in classical logic. Planning actions on the basis of reasoning about what one believes necessitates being able to make inferences concerning what is true and what is false. But under classical logic, an agent cannot infer ¬ p unless the formula ¬ p is already in his belief set. So, for example, John could not infer that Sally was not single, if he knew that Sally was married, and he knew that a person cannot be both married and single at the same time. The reason why under material implication it is unacceptable to infer a belief that ¬ p from a set of beliefs that does not include ¬ p is as follows. Classically, p ⇒ q can be proved from ¬ p, which means that Bjohn p ⇒ Bjohn q 10 can be proved from ¬Bjohn p. If we then assert that Bjohn (p ⇒ q) is entailed by Bjohn p ⇒ Bjohn q, then (by transitivity of implication) we are saying that Bjohn (p ⇒ q) is entailed by ¬Bjohn p. In other words, that John is required to infer 9 . . . and they will have to be a philately expert. This formula is in an epistemic logic notation (standard since [18]). 
In the formulae Ba p, a is the name of an agent, Ba is the epistemic operator for belief (which incorporates the agent’s name), and p is a proposition, and the scope of the epistemic operator. In English, Ba p means a believes p. 10 from not having the belief p that he has the belief p ⇒ q. This is an unacceptably huge inferencing leap to make. Since it is unacceptable to infer knowledge of the relation of implication from no knowledge of such a relation, it is therefore also unacceptable to infer knowledge of a negation from no knowledge of that negation (because the negation ¬ p has the same meaning in classical logic as the implication p ⇒ ⊥). Under constructive implication, however, John is not required to infer the belief p ⇒ q from not having the belief that p, because under constructive implication, Bjohn p ⇒ Bjohn q cannot be proved from ¬Bjohn p, it can only be proved if you can construct a proof of Bjohn q while pretending that Bjohn p is true. This means that it is acceptable to infer Bjohn (p ⇒ q) from Bjohn p ⇒ Bjohn q, because no unacceptably huge inferencing leaps are required by this. A small inferencing leap is made, however, and here follows our justification for it. 4.2 Argument from similarity We make the assumption that the agent’s reasoning powers are the same as ours, he is neither a more able reasoner, nor a less able reasoner than we humans are. We humans understand p ⇒ q to mean that there is some connection between objects p and q. That ⇒ signifies a connection is allowed for and even emphasised in constructive logic, but it is not acknowledged in classical logic. Since I understand ⇒ to mean some kind of connection between objects, then I can infer that John will understand ⇒ in the same way—to be the same kind of connection between objects—because John is just like me. This means that I can argue as follows. I know that the following is true: John believes it0 s raining ⇒ John believes it0 s cloudy (1) in other words, I can see that there is a connection (shown by ⇒) between John believing that it is raining, and John believing that it is cloudy. I make the assumption that John’s reasoning capabilities are just like mine. l can therefore infer that if (1) is true, John will himself be able to see that there is a connection (⇒) between his believing that it is raining, and his believing that it is cloudy—in other words, I can infer that (1) entails (2): John believes (it0 s raining ⇒ it0 s cloudy) 5 (2) REASONING-CENTRED PLANNING Now that the motivation for the planner design has been outlined, the remainder of the paper will concentrate on the theme of achieving goals that do not match action effects, but that are rather entailed by them. This section will present a summarised example to give an overview of what is meant by using the effects of actions as hypotheses. As explained, the planner is an epistemic planner. All predicates in the model therefore have an ‘epistemic context’ (after [30]). For example, believes(john,(on(a,b))) appears in the model, instead of simply on(a,b). However for the purposes of this paper, whose focus is planning rather than communication, epistemic context will be omitted from the remaining examples. Figure 1 shows a pictorial representation of a blocks-world problem. 
The long thick horizontal line represents the table; where two objects are touching, this indicates that the upper object is on the lower object; and where an arc is drawn between two blocks, this indicates that the upper block is above the lower block, where above is the transitive closure of on. Figure 1 is followed by the Prolog code for the task:11

[Figure 1. Task A: Initial State and Goal Condition]

taskA(P):-
  plan(
    [% GOAL CONDITION
     above(c,b),
     on(a,b)],
    [% INITIAL STATE
     isblock(a), on(a,b), clear(a),
     isblock(b), on(b,table),
     isblock(c), on(c,table), clear(c),
     % RULE: above_1
     [on(X,Z)] ==> above(X,Z),
     % RULE: above_2
     [on(X,Y),above(Y,Z)] ==> above(X,Z)],
    P).

Let us assume that we have two operators: stack and unstack. Here is a shortened version of stack:

action(stack(X, Z), pre([...],add([on(X, Z)]),delete([...]))).

The critical issue here is that although the goal condition of Task A contains a predicate of the form above(X,Y), the add list of stack does not contain any above predicates, it contains only a single on predicate. Our planner is able to find stack actions to achieve above goals, because it understands the relationship between 'above' and 'on' by virtue of the two above inference rules in the initial state:12 one block X is above another block Z in situations where X is on the top of a tower of blocks which includes Z, with any number of blocks (including zero) intervening between X and Z.

11 A conjunction of items (p ∧ q) appears in the encoded task as the list [p, q].
12 The rules shown are a simplified version of the model's above rules. In the model they include constraints ensuring that any attempt that is made to prove that an item is above itself, or to plan to put an item onto itself, will fail immediately.

To achieve Task A, the planning agent begins by establishing whether or not the goal condition is entailed by the initial state. It finds by inspection of the initial state that one of the goal condition conjuncts is true (on(a,b)), and by using a theorem prover to reason with the above rules it finds that the second goal condition conjunct is false (above(c,b)). The agent then reasons to discover which actions would achieve above(c,b) from the initial state. To accomplish this, theorem proving is carried out using the effects of actions as hypotheses. Using the first of its above rules (above 1), the agent reasons that above(c,b) would be entailed by a state in which on(c,b) was true:

[on(c, b)] ⇒ above(c, b)   (3)

and so, by state-space search, it considers planning to achieve on(c,b) by unstacking a from b, then stacking c onto b. This path fails, however, since on(a,b) is also a goal, and on(c,b) and on(a,b) cannot cooccur. So the agent invokes its second above rule (above 2):

[on(c, Y), above(Y, b)] ⇒ above(c, b)   (4)

Now by using both above rules, the agent reasons that above(c,b) would be entailed by a state in which c was on some block Y, and Y was on b, and so it makes a plan (by state-space search) to achieve the conjunction (on(c,Y) ∧ on(Y,b)). It starts by addressing the first conjunct, on(c,Y), and finds that action stack(c,a) would achieve it, and is applicable in the initial state. The antecedent of (4) now becomes fully instantiated:

[on(c, a), above(a, b)] ⇒ above(c, b)   (5)

Now the second conjunct above(a,b) of (5) is addressed.
Using rule above 1 only, the agent reasons that above(a,b) is entailed by the initial state, since it can see by inspection that on(a,b) is true in the initial state, and so no further actions are planned for the achievement of the antecedent of (5). Plan [stack(c,a)] is then applied to the initial state, a state is attained which entails the goal condition, and the planner returns solution [stack(c,a)].

6 ACHIEVING TRUE GOALS

A more detailed explanation will now be given of one of the features that enable the planner to make plans by reasoning with the above rules, its 'LAN' heuristic. During testing, it was found that there were many tasks the planner could not achieve unless it had the facility to achieve a goal that it could prove true in the current state. This sounds odd—why would one want to achieve a goal that was already true? The problem arises during the proof/achievement of a goal having a variable as an argument. The situations in which this occurs include when certain domain-specific rules are invoked (including recursive rules such as above 2, and absurdity rules), and when the preconditions of a desirable action are being addressed, as will now be discussed.

6.1 Recursive rules

Consider the procedure the planner follows to achieve the task shown in Figure 2:

[Figure 2. Task B: Initial State and Goal Condition (above(e,c) ∧ above(e,d))]

The planner initially tries to prove that both task goals are true in the initial state. To do this, it first invokes inference rule above 1 to prove the first task goal above(e,c) true in the initial state:

Step 1: Prove above(e, c) using rule above 1: [on(e, c)] ⇒ above(e, c)

Now it tries to prove that on(e,c) (the single antecedent of this rule) is true, but it fails, since block e is not on block c in the initial state. Now the planner backtracks and invokes inference rule above 2 to try to prove the same goal above(e,c) true in the initial state:

Step 2: Prove above(e, c) using rule above 2: [on(e, Y), above(Y, c)] ⇒ above(e, c)

Addressing the conjunctive antecedent of this rule, the planner now proves on(e,Y) from on(e,table) in the initial state, but then fails to prove above(table,c), and so the proof of above(e,c) from the initial state has failed. Now that this proof has failed, the planner abandons the first task goal, and tries to prove the second one instead: above(e,d). This proof also fails, and so the planner sets about making a plan to achieve the first task goal above(e,c). To do this, as with the attempted proofs, the planner first invokes inference rule above 1:

Step 3: Achieve above(e, c) using rule above 1: [on(e, c)] ⇒ above(e, c)

As before, the planner now tries to prove the rule's antecedent on(e,c) true, but this time when the proof fails, it makes a plan to achieve on(e,c) (to stack e onto c). Having done this, the planner finds that it cannot achieve the second task goal above(e,d) without invoking a goal protection violation [29], and so it backtracks to try to achieve the first task goal above(e,c) by invoking the alternative inference rule above 2:

Step 4: Achieve above(e, c) using rule above 2: [on(e, Y), above(Y, c)] ⇒ above(e, c)

Now the first conjunct on(e,Y) of the rule's antecedent is addressed.
Since every block in this blocks world is always on something, either on the table, or on another block, then in any state in which there is a block called e, on(e,Y) will always be true—and this is where we see that it is necessary to be able to achieve a goal that can be proved true in the current state. A proof of on(e,Y) holds, but block e may nevertheless have to be moved off Y in order to form a solution. To enable the planner to achieve true goals without disabling it, a heuristic was built in which constrains this activity to restricted circumstances. The heuristic has been named ‘LAN’, which stands for ‘look away now’. ‘Look away now’ is typically said by British TV sports news readers who are about to show a new football match result. It is said as a warning to viewers who don’t want to know the score yet, perhaps because they have videoed the match. The name ‘LAN’ reflects the characteristic feature of the heuristic, which is to look away from the Prolog database (the facts), and pretend that something is not true, for the sake of future gain. The LAN heuristic is restricted to working on goals that are not fully instantiated (i.e., goals which include at least one variable), which means it is only invoked in the following three contexts: when certain preconditions are being achieved, when the antecedents of rules are being achieved, and when there are variables in the task goal. The planner is prohibited from achieving goals which are fully instantiated and can be proved true in the current state, because there does not appear to be any need for this, and allowing this dramatically increases the search space, hence slowing the planner down enormously. 6.1.1 Skipping proofs At Step 4 in the procedure described above for Task B, although the first conjunct on(e,Y) of the antecedent of above 2 can be proved in the initial state of Task B from on(e,table), at this point no attempt is made to prove on(e,Y), the planner goes straight to searching for a plan to make on(e,Y) true. This skipping of a proof is only permitted: (i) when the first conjunct of the antecedent of a recursive rule is being addressed; and (ii) when the goal condition is false in the current state. Being unable to do a proof at this point speeds the planner up considerably, as it cuts out much repetitive searching and backtracking. It does not break the planner, because the proof has already been tried earlier and failed. If on(e,Y) were already true in a way conversant with the solution, the planner would not have reached this point (which it can only do by backtracking). The planner has already tried proving on(e,Y) (via above 1). The proof may have failed, or it may have succeeded, in which case the path must have reached a dead end and the planner must have backtracked to try achieving on(e,Y) instead. Either way, the planner must have also tried achieving on(e,Y) by stacking e onto Y (also via above 1). This must, however, also have failed to lead to a solution, because the planner has now backtracked to try achieving on(e,Y) by using above 2. 6.2 Preconditions When achieving the partially instantiated first conjunct of an antecedent, the planning agent does not even attempt a proof, because it knows that a proof could not lead to a solution. 
However, when a partially instantiated precondition of an action is being addressed, the planning agent does attempt a proof of the precondition, but if a successful proof fails to lead to a solution, the agent is permitted to try to achieve the precondition, in spite of it being true in the current state. Figure 3 shows a task which illustrates: Initial State Goal Condition c d p f q e r e q f (on(e,q) ∧ above(e,f)) Figure 3. Task C To achieve the goal condition conjunct above(e,f), the antecedent of above 2 gets instantiated as follows: [on(e, Y), above(Y, f)] ⇒ above(e, f) The preconditions of stack(e,Y) (for achieving on(e,Y)) include the following: [on(e, table), clear(Y), clear(e)] Block e is already on the table in the initial state, so the first condition is met. To get Y (some block) clear, nothing needs doing, because there is a clear block in the initial state that is not block e.13 To get e clear, the tower positioned on top of e in the initial state is unstacked, leaving five blocks clear (Figure 4). Precondition clear(Y) remains true, as there is still a clear block that is not e. p 6.2.1 Inverse action problems There is a problem with allowing the planner to achieve preconditions that are already true. The agent is prone to making ‘undesirable’ plans like [stack(X,Z),unstack(X,Z)] which do not advance the solution. For example, in order to get block a onto the table when a is not clear, what we want to do is first unstack whatever happens to be on top of a, in order to make a clear. If this path fails to lead to a solution, we do not want the planner to backtrack and achieve the precondition on(X,a) by planning to put something onto a, just so that the same block can then be unstacked. This, however, is what happens if we allow the planner to backtrack and achieve the precondition on(X,Z) of action unstack(X,Z) when it is true in the current state. One of the reasons for the generation of these ‘undesirable’ plans is that operators stack(X,Z) and unstack(X,Z) are related to each other in the sense that each operator is a kind of ‘inverse’ of the other. By this is meant that the ‘net’ effects of each operator satisfy precisely the preconditions of the other, nothing more happens than this, and nothing less. ‘Net’ effects means here the union of the contents of an operator’s add list, and those things which are true before the application of the operator and remain true after its application. If this was not stopped, the fruitless time the planner would spend making plans including undesirable ‘inverse’ pairs of actions would be significant. However, if the planner is totally prohibited from trying to achieve true goals, many tasks fall outside its scope (Task C is one of these). In order to cope with such cases, the planner carries out a search for action pairs where: (i) the net effect of applying them in sequence is zero; and (ii) one of them has a precondition which is an effect of the other. For one member of each such pair we mark as ‘special’ the preconditions which match effects of the other member of the pair (for instance, the precondition on(X,Z) of action unstack(X,Z) might be marked). If a precondition has been proved true in the current state, but the path has failed to lead to a solution, the planner is permitted to backtrack to try achieving it, unless the precondition has been marked; in this case, the agent may only attempt re-proofs, it may not try to achieve the precondition. 
(These restrictions only apply if the marked precondition cannot be proved in the current state: if it is not already provable then the planner is at liberty to try to achieve it.) q f d c e r Figure 4. Task C mid-search The difficulty is that to achieve this task, we do not at this point need any block Y to be clear, we need q to be made clear, so that the planner can make plans to stack e onto q, and q onto f —in other words, although clear(Y) is true, it is true ‘in the wrong way’, and the planner needs to make it true in a different way by planning some actions to achieve clear(Y), even though it is true. The LAN heuristic enables the planner to deal with this difficulty. 13 Operator constraints ensure that block Y cannot be the same block as block e. 7 EXPLOITING RECURSIVE RULES This section will discuss in more detail how the antecedent of above 2 is used to achieve an above goal. Consider Task D (depicted in Figure 5). Let us assume the planning agent has reached a point at which rule above 2 is instantiated for the achievement above(e,c): [on(e, Y), above(Y, c)] ⇒ above(e, c) (6) The first important point here is that when the agent does a proof of above(e,c), it is proving both conjuncts of the antecedent true at the same time/in the same state. However, when the agent is trying to achieve above(e,c), the plan to achieve on(e,Y), and the plan to achieve above(Y,c) apply to different times/states: the plan to Initial State Goal Condition heuristic reduces encounters with goal interactions considerably. Figure 6 shows a task that demonstrates ThA at work: e Initial State Goal Condition c/d/f c/d/f c d e c/d/f f (above(e,c) ∧ above(e,d) ∧ above(e,f)) c a a b b c (on(a,b) ∧ above(c,b)) Figure 5. Task D Figure 6. Task E achieve on(e,Y) must (with these operators) chronologically follow the plan to achieve above(Y,c). Secondly, the agent can infer from (6) that in order to achieve the goal above(e,c), action stack(e,Y) will have to be included at some point in the solution, and action stack(Z,c) will also have to be included in the solution. The agent does not yet know the identities that Y and Z will take in the solution. Y may be identical to Z, or it may not: Here is above 2 instantiated for the achievement of goal above(c,b): [on(c, Y), above(Y, b)] ⇒ above(c, b) (7) A plan is made that would achieve the first conjunct on(c,Y) of the antecedent of (7) from the initial state W0: [. . . stack(Z, c), . . . stack(e, Y), . . .] Thirdly, we observe that the antecedent of above 2 is conjunctive, and we know that using a linear approach to achieve a conjunction of goals presents the problem of goal interaction [29]. It may not be possible to achieve on(e,Y) from the state which is attained by the achievement of above(Y,c) without generating goal protection violations. Bearing these points in mind, a linear approach to achieving the antecedent of above 2 is unattractive. A linear approach will suffer from negative goal interactions, and even if it did not, there is no guarantee that the plan made to achieve [on(e,Y),above(Y,c)] from the current state could form part of the solution, because the consequent of (6), above(e,c), is itself one of a conjunction of goals in the goal condition. This planner therefore takes a slightly non-linear approach to the achievement of the antecedents of recursive rules: actions are planned within a linear framework, but a bit of plan cross-splicing is done, by means of a heuristic that has been named ‘Think Ahead’, or ‘ThA’ for short. 
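Before the detailed walkthrough in the next subsection, here is a minimal sketch of the cross-splice itself, under the reading suggested by plans (8), (9) and (10) below. The predicate names are illustrative assumptions, not the model's own code, and each plan is treated here as a flat list of actions.

% split_plan(Plan, PrecondsPlan, FinalAction): divide a plan into its
% 'preconditions plan' and the single final action that achieves the conjunct.
split_plan(Plan, PrecondsPlan, FinalAction) :-
    append(PrecondsPlan, [FinalAction], Plan).

% tha_suggest(+PlanForConjunct1, +PlanForConjunct2, -Suggested):
% splice the first conjunct's preconditions plan onto the final action
% planned for the second conjunct.
tha_suggest(PlanForConjunct1, PlanForConjunct2, Suggested) :-
    split_plan(PlanForConjunct1, Preconds1, _Final1),
    split_plan(PlanForConjunct2, _Preconds2, Final2),
    append(Preconds1, [Final2], Suggested).

For Task E this corresponds to dividing plan (8) into the preconditions plan [unstack(a,b),unstack(b,c)] plus final action stack(c,a), dividing plan (9) into [unstack(a,b)] plus final action stack(a,b), and returning the suggested plan (10), [unstack(a,b),unstack(b,c),stack(a,b)].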
7.1 Thinking ahead ThA works by incorporating thinking about the preconditions of chronologically later actions into the planning of earlier actions, which it does by exploiting the chronological information in the antecedent of a recursive domain-specific inference rule. It takes the plans made to achieve the separate conjuncts in the antecedent of a recursive rule, and uses parts of them to construct a new ‘suggested’ plan. The approach is a mixture of total order (linear) and plan-space planning; a commitment to chronological ordering is made at the time the actions are found, but the ordering of actions within the plan is systematically changed later on. By taking into account the preconditions of a later goal during planning for an earlier goal, the ThA [unstack(a, b), unstack(b, c) | stack(c, a)] (8) (8) is divided into two parts: the single action stack(c,a) that will achieve on(c,Y); and the ‘preconditions plan’ [unstack(a,b),unstack(b,c)], the plan that would transform W0 into a state in which stack(c,a) was applicable. Now a plan is made to achieve the second (now fully instantiated) antecedent conjunct of rule (7), above(a,b), also from state W0: [unstack(a, b) | stack(a, b)] (9) (9) is similarly divided into a final action, in this case stack(a,b), and a preconditions plan: [unstack(a,b)]. The preconditions plan for achieving on(c,Y) then gets spliced onto the final action planned for the achievement of above(Y,b): [unstack(a, b), unstack(b, c), stack(a, b)] (10) When (10) is applied to W0, Figure 7 is the result: a c Figure 7. b Task E mid-search: W1 By this point, task goal above(c,b) is still not true. To complete the solution, the planner again addresses above(c,b), but this time from the new state. Rule above 2 is again used: [on(c, Y), above(Y, b)] ⇒ above(c, b) Initial state (11) but this time no cross-splicing is done, and a different plan is made: [stack(c,a)]. This is because the first antecedent member on(c,Y) of (11) has plan [stack(c,a)] made to achieve it, whereupon the second antecedent member becomes instantiated to above(a,b). This is found to be true in the current state (W1), so no plan is made to achieve it. The path succeeds, and the following solution is returned: Goal condition a c b a b Figure 8. c Task Sussman1 [unstack(a, b), unstack(b, c), stack(a, b), stack(c, a)] 7.3 In the case of Task E, the cross-splicing alone led to a solution. There is, however, no guarantee that the application of the preconditions plan for the first conjunct will always attain a state in which the final action of the plan for the second conjunct can be applied. This is dealt with in the model by an additional state-space search. 7.2 A guarded states check prevents repeated work Planning actions to achieve above( , ) goals by using the two above rules can lead to fruitless repeated reasoning, unless this is checked. For example, consider again Task E (see Figure 6). This task has two goals: [on(a,b),above(c,b)]. Here is a trace of the first stages of reasoning for this task, in which the planner attains the same state via two different paths: 1. Achieves goal above(c,b) by stacking c onto b (using above 1). 2. The current complete plan is [unstack(a,b),unstack(b,c),stack(c,b)]. 2a. Tries to subsequently achieve on(a,b), but can’t without goal protection violation. 2b. Fails and backtracks to point (1) to try achieving above(c,b) in a different way (by using above 2). 3. Plans to achieve first conjunct of above 2, on(c,Y) by stack(c,Y). 3a. 
Preconditions of stack(c,Y) include that c is on the table and that Y is clear. 3b. To get c onto the table, unstacks a from b, and then b from c. 3c. To get Y clear, nothing has to be done, because two blocks are now clear (a and b). Block b is the first clear block the planner finds, so Y gets instantiated to b. 4. Achieves first conjunct on(c,Y) of above 2 by stacking c onto b. 5. The current complete plan is [unstack(a,b),unstack(b,c),stack(c,b)]. By this point, the planner has made the same plan [unstack(a,b),unstack(b,c),stack(c,b)] twice in two different ways. If the planner does not fail now and backtrack, it will repeat the work it did at point (2a) (trying to subsequently achieve on(a,b) and failing), before backtracking to try to achieve on(c,Y) in a different way. Fruitless repeated work like this is prevented by ‘guarding’ each state that is attained in the search for a solution. If the identical state is reached for a second time, the planner knows that there is no point in continuing, and is instructed to fail and backtrack. For simpler tasks (like Task E), the time wasted doing repeated work like this is not significant, but for harder tasks, the time grows very quickly. For Task C, for example, the planner takes 7 seconds to return the first solution if the guarded states check is made, and 95 seconds if the check is not made. Solving the Sussman anomaly With regard to the Sussman anomaly [27], if the goal condition is expressed by on(a,b) ∧ on(b,c), as in the original representation of the problem (see also Figure 8): task_sussman1(P):plan( [ % GOAL CONDITION on(a,b), on(b,c)], [ % INITIAL STATE [isblock(a), on(a,table)], [isblock(b), on(b,table), clear(b)], [isblock(c), on(c,a), clear(c) ]], P). the planner is unable to invoke the ThA heuristic (which reduces goal interaction problems), because ThA is designed for the achievement of the antecedents of domain-specific inference rules, and Task Sussman1 doesn’t contain any such rules. This means that the planner fails for this task. However, the planner can achieve the Sussman anomaly by the appropriate use of an above( , ) goal, and the inclusion of the above rules in the initial state. Figure 9 shows one option: Initial state Goal condition a b c a c b Figure 9. Task Sussman2 task_sussman2(P):plan( [ % GOAL CONDITION above(a,c), on(b,c)], [ % INITIAL STATE [isblock(a), on(a,table)], [isblock(b), on(b,table), clear(b)], [isblock(c), on(c,a), clear(c)], % RULE: above_1 [on(X,Z) ==> above(X,Z), % RULE: above_2 (on(X,Y) and above(Y,Z)) ==> above(X,Z) ]], P). for which the following plan is returned: P = [unstack(c, a), stack(b, c), stack(a, b)]? Alternatively, if there are additional blocks on the table besides a, b and c, and one requires that no blocks may ever be stacked between a and b, or between b and c in any solutions, the following goal condition will ensure this: task_sussman3(P):plan( [ % GOAL CONDITION above(a,c), on(a,b), on(b,c)], ... P). Yet another alternative is to specify the goal condition as the single goal above(a,c). The first solution returned for this task is: P = [unstack(c, a), stack(a, c)]?; while the second solution generates the state specified by the Sussman anomaly: P = [unstack(c, a), stack(b, c), stack(a, b)]? To solve the Sussman anomaly, then, the planner solves a harder problem, a consequence of which is that there are a number of different goal conditions which will ensure that the configuration of blocks shown in Figure 10: a b c Figure 10. 
can be achieved from the configuration of blocks shown in Figure 11:

[Figure 11. The initial configuration: c on a, with b on the table]

8 SUMMARY AND CONCLUSION

This paper has presented an AI planner that was designed and implemented in preparation for some research into planning dialogue without speech acts.14 Deliberations on the nature of the effects of communicative actions, on the intentions of a speaker, and on the role of the hearer in the evolution of the environment, guided decisions concerning planner design towards a model that could achieve entailed goals by reasoning about actions, and which embodied a theory of belief representative of human reasoning. The planner therefore has a reasoning-centred approach, rather than the more popular search-centred approach. This enables it, for example, to make plans for building blocks-world towers whose configurations are not specified in the goal condition—towers of arbitrary heights. The planner achieves goals of the form above(X,Y) by knowing that 'above' is the transitive closure of 'on', and using that knowledge to reason over the effects of actions. A theorem prover uses the effects of actions as hypotheses. A goal is proved by assuming the effect of some action is true, on the grounds that it would be true in the situation that resulted from performing that action. The model can thus achieve goals that do not match action effects, but that are entailed by them.

Decisions made concerning planning approach, logic, theorem proving, and modelling belief, were considered to be of paramount importance to the success of the research into planning dialogue without speech acts, and so care was taken to make suitable choices. The planner's epistemic theorem prover therefore embodies a constructive logic, and a deduction model of belief. The planner's essential design is an interweaving of reasoning, backwards state-space search via STRIPS operators, and plan-space planning. The search is enhanced with two specially designed heuristics which enable it to overcome the difficulties presented by planning for conjunctive goals using a backwards-searching state-space planner. The LAN ('look away now') heuristic enables the planner to, in certain highly restricted contexts, achieve a goal that is provable in the current state; the ThA ('think ahead') heuristic takes into account the preconditions of a chronologically later goal during planning for an earlier goal, which it does by exploiting the information held in the antecedents of recursive rules.

No claims are being made that the model is competitive in terms of efficiency with state-of-the-art AI planners for large problems not requiring inference; the claim that is being made is that it can solve an essentially more difficult kind of problem than these fast planners can solve, by understanding and using actions in the wider context of their positive ramifications. Since the model represents the first steps towards developing a very particular kind of planning, it suffers from a number of obvious limitations, including the following:

• Unlike real knowledge states, the initial state is very small;
• The initial state specifies everything the planning agent needs to know—the agent does not have to cope with an underspecified initial state, for example uncertain beliefs, or a lack of necessary information;
• The planner does not take into account possible conflicts between resources of time, energy, materials, etc.;
• The planner does not take into account the durative nature of actions, or that the universe is dynamic.

14 The research eventually led to the development of a single linguistic act for all contexts [12], and hence the abandonment of a distinct inform operator.

ACKNOWLEDGEMENTS

This research was funded by an EPSRC studentship.

REFERENCES

[1] J. Allen, J. Hendler, and A. Tate. Editors. Readings in planning, 1990. San Mateo, California: Morgan Kaufmann.
[2] J. F. Allen and C. R. Perrault. Analyzing intention in utterances, 1980. Artificial Intelligence 15: 143–78. Reprinted in [16], pp. 441–58.
[3] D. E. Appelt. Planning English referring expressions, 1985. Artificial Intelligence 26: 1–33. Reprinted in [16], pp. 501–17.
[4] J. L. Austin. How to do things with words, 1962. Oxford: Oxford University Press, 2nd edition.
[5] A. L. Blum and M. L. Furst. Fast planning through planning graph analysis, 1995. In Artificial Intelligence 90: 281–300, 1997.
[6] B. Bonet and H. Geffner. HSP: Heuristic Search Planner, 1998. Entry at Artificial Intelligence Planning Systems (AIPS)-98 Planning Competition. AI Magazine 21(2), 2000.
[7] B. C. Bruce. Generation as a social action, 1975. In B. L. Nash-Webber and R. C. Schank. Editors. Theoretical issues in natural language processing, pp. 74–7. Cambridge, Massachusetts: Ass. for Computational Linguistics. Reprinted in [16], pp. 419–22.
[8] P. R. Cohen and H. J. Levesque. Rational interaction as the basis for communication, 1990. In [9], pp. 221–55.
[9] P. R. Cohen, J. Morgan, and M. E. Pollack. Editors. Intentions in communication, 1990. Cambridge, Massachusetts: MIT.
[10] P. R. Cohen and C. R. Perrault. Elements of a plan-based theory of speech acts, 1979. Cognitive Science 3: 177–212. Reprinted in [16], pp. 423–40.
[11] E. A. Feigenbaum and J. Feldman. Editors. Computers and thought, 1995. Cambridge, Massachusetts: MIT Press. First published in 1963 by McGraw-Hill Book Company.
[12] D. Field and A. Ramsay. Sarcasm, deception, and stating the obvious: Planning dialogue without speech acts, 2004. Artificial Intelligence Review 22(2): 149–171.
[13] R. E. Fikes and N. J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving, 1971. Artificial Intelligence 2: 189–208. Reprinted in [1], pp. 88–97.
[14] C. Green. Application of theorem proving to problem solving, 1969. In Proc. 1st IJCAI, pp. 219–39. Reprinted in [1], pp. 67–87.
[15] H. P. Grice. Logic and conversation, 1975. In P. Cole and J. Morgan. Editors. Syntax and semantics, Vol. 3: Speech acts, 1975, pp. 41–58. New York: Academic Press.
[16] B. J. Grosz, K. Sparck Jones, and B. L. Webber. Editors. Readings in natural language processing, 1986. Los Altos, California: Morgan Kauffmann.
[17] B. J. Grosz and C. L. Sidner. Plans for discourse, 1990. In [9], pp. 416–44.
[18] J. Hintikka. Knowledge and belief: An introduction to the two notions, 1962. New York: Cornell University Press.
[19] J. Hoffmann and B. Nebel. The FF planning system: Fast plan generation through heuristic search, 2001. Journal of Artificial Intelligence Research 14: 253–302.
[20] K. Konolige. A deduction model of belief, 1986. London: Pitman.
[21] S. Kripke. Semantical considerations on modal logic, 1963. In Acta Philosophica Fennica 16: 83–94. Also in L. Linsky. Editor. Reference and modality (Oxford readings in philosophy), 1971, pp. 63–72. London: Oxford University Press.
[22] J. McCarthy and P. J. Hayes.
Some philosophical problems from the standpoint of artificial intelligence, 1969. Machine Intelligence 4: 463– 502. Reprinted in [1], pp. 393–435. [23] A. Newell, J. C. Shaw, and H. A. Simon. Empirical explorations with the logic theory machine, 1957. In Proceedings of the Western Joint Computer Conference, 15: 218–239, 1957. Reprinted in [11], pp. 109– 133. [24] A. Newell and H. A. Simon. GPS, a program that simulates human thought, 1963. In [11], pp. 279–93. [25] X. Nguyen and S. Kambhampati. Reviving partial order planning, 2001. In Proc. IJCAI, pp. 459–66. [26] M. E. Pollack. Plans as complex mental attitudes, 1990. In [9], pp. 77– 103. [27] E. D. Sacerdoti. The nonlinear nature of plans, 1975. In Proc. 4th IJCAI, Tbilisi, Georgia, USSR, pp. 206–14. Reprinted in [1], pp. 162– 70. [28] J. R. Searle. What is a speech act?, 1965. In M. Black. Editor. Philosophy in America, pp. 221–39. Allen and Unwin. Reprinted in J. R. Searle. Editor. The philosophy of language, 1991, pp. 39–53. Oxford: Oxford University Press. [29] G. J. Sussman. The virtuous nature of bugs, 1974. In Proc. Artificial Intelligence and the Simulation of Behaviour (AISB) Summer Conference. Reprinted in [1], pp. 111–17. [30] L. Wallen. Matrix proofs for modal logics, 1987. Proc. 10th IJCAI, pp. 917–23.