Rational Agents with Limited Performance Hardware - Multiple Execution Architectures (RALPH-MEA)

A diagram of this architecture

Philosophy:

Ogasawara and Russell approach architecture design from a decision-theoretic viewpoint. At any point in time, they would like the architecture to choose the optimal action d*. The optimal action is the one that gives the largest expected utility over the probability distribution of states that results from that action:

$$d^* = \arg\max_{d} \sum_{s \in S} p(s \mid d)\, u(s)$$

Here, d ranges over all actions, s in S ranges over all states that may result from action d, p(s|d) is the probability of obtaining state s given action d, and u(s) is the utility of state s.
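As a minimal sketch of the Maximum Expected Utility choice described above, the following Python computes the sum of p(s|d) u(s) for each action and picks the largest. The toy actions, probabilities, and utilities are hypothetical and are not taken from Ogasawara and Russell's paper; the function names are likewise illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable

State = Hashable
Action = Hashable


def expected_utility(
    action: Action,
    outcome_probs: Callable[[Action], Dict[State, float]],
    utility: Callable[[State], float],
) -> float:
    """Sum p(s|d) * u(s) over the states s that action d may produce."""
    return sum(p * utility(s) for s, p in outcome_probs(action).items())


def best_action(
    actions: Iterable[Action],
    outcome_probs: Callable[[Action], Dict[State, float]],
    utility: Callable[[State], float],
) -> Action:
    """Return d*, the action with the largest expected utility."""
    return max(actions, key=lambda d: expected_utility(d, outcome_probs, utility))


# Hypothetical toy problem (assumed for illustration only).
P = {
    "wait": {"safe": 1.0},
    "cross": {"arrived": 0.9, "hit": 0.1},
}
U = {"safe": 0.0, "arrived": 10.0, "hit": -100.0}

# Iterating over P yields its keys, i.e. the available actions.
choice = best_action(P, P.__getitem__, U.__getitem__)
```

In this toy case crossing has a high chance of a good outcome, but the small chance of a very bad one drags its expected utility below that of waiting. Note that the loop touches every action and every outcome, which is the cost the rest of the architecture tries to avoid.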

Computing this function for every action might take too long in circumstances where reaction time is important. However, it is at least as accurate as any other method, so it ought to be used when possible. To speed the computation, Ogasawara and Russell identified three types of knowledge needed to use the equation:

Given information about the state at time t, new information about the state may be deduced (A). Knowledge of the effects of actions (B) may be used to compute the new state that results from applying an action to the given state. Utility knowledge about future states (C), combined with decision theory's Maximum Expected Utility principle, may then be used to compute the next best move.

Having decomposed the knowledge into these logical mappings, Ogasawara and Russell's insight was that other knowledge could perform two or more of the mappings in a single step. They identified the following "composite mappings":

E knowledge maps the current state and a given action directly to the expected utility, F knowledge maps potential future states to the best action that yields them, and D knowledge directly maps the current state to the best action for it. The ways these kinds of knowledge are used as mappings are diagrammed below:
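As a complement to the diagram, the relationship between the primitive and composite mappings can be sketched in Python. Here E is derived by composing B with C, and D by applying the Maximum Expected Utility principle to E. This derivation is only meant to show what E and D summarize: in the architecture they would be stored directly, which is what makes them fast. F is left out of the sketch. The toy tables and function names are hypothetical.

```python
from typing import Dict, Iterable


def effects(state: str, action: str) -> Dict[str, float]:
    """Type B: probability of each next state given state and action (toy table)."""
    table = {
        ("at_door", "open"): {"door_open": 0.8, "at_door": 0.2},
        ("at_door", "wait"): {"at_door": 1.0},
    }
    return table[(state, action)]


def utility(state: str) -> float:
    """Type C: utility of a state (toy values)."""
    return {"door_open": 5.0, "at_door": 1.0}[state]


def action_utility(state: str, action: str) -> float:
    """Type E: (state, action) -> expected utility, here composed from B and C."""
    return sum(p * utility(s2) for s2, p in effects(state, action).items())


def condition_action(state: str, actions: Iterable[str]) -> str:
    """Type D: state -> best action, here composed from E plus the MEU choice."""
    return max(actions, key=lambda a: action_utility(state, a))


eu_open = action_utility("at_door", "open")
best = condition_action("at_door", ["open", "wait"])
```

A stored D rule for this state would simply read "at_door -> open", with no arithmetic at run time.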


Architecture:

Not wanting to commit to any one method, the RALPH-MEA architecture uses all four paths through the knowledge diagram given above. Each method of mapping the current state to the best action is done by a module called an Execution Architecture (or EA). Ogasawara and Russell name the EAs:

All the knowledge is in one global database that the EAs share. All of the EAs may use additional information about the current state (A). Because the Goal-Based and Decision-Theoretic EAs use knowledge that maps states to other states (B), they may plan sequences of actions by iteratively treating the resulting state as the present state.

The Condition-Action (CA) EA is the fastest, mapping from the current state to the best action in just one step. The Action-Utility (AU) EA benefits from good knowledge of the expected utility of actions (E). The Goal-Based (GB) EA similarly benefits from good utility knowledge of states, and may do multi-step planning. The Decision-Theoretic (DT) EA may also do multi-step planning, and it performs the full expected-utility computation. The cost of that full treatment makes it the slowest of the four, though.

Perception data is fed to the four EAs, which work in parallel. A meta-level planner (also called the meta-level arbitrator) takes their output, which in general emerges from the EAs at different times, and decides which action to execute. This choice relies on domain information too, so the meta-level planner also needs access to the global database. The architecture is diagrammed below:
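Alongside the diagram, the arbitration step can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' mechanism: the EAs are not actually run in parallel, each recommendation simply carries a simulated completion time; the `deadline` parameter and the "prefer the most deliberate EA that has finished" policy are stand-ins for whatever domain knowledge the real meta-level planner uses; and all names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Recommendation:
    ea: str          # which execution architecture produced it
    action: str      # the recommended action
    ready_at: float  # simulated time at which the EA finishes
    rank: int        # higher = more deliberate EA (assumed ordering)


def arbitrate(recs: List[Recommendation], deadline: float) -> Optional[Recommendation]:
    """Pick the most deliberate recommendation that is ready by the deadline."""
    ready = [r for r in recs if r.ready_at <= deadline]
    return max(ready, key=lambda r: r.rank, default=None)


# Hypothetical outputs of the four EAs for one perception.
recs = [
    Recommendation("CA", "brake", 0.01, 0),
    Recommendation("AU", "swerve", 0.05, 1),
    Recommendation("GB", "swerve", 0.20, 2),
    Recommendation("DT", "slow_and_steer", 1.50, 3),
]

urgent = arbitrate(recs, deadline=0.02)   # only the CA answer is ready
relaxed = arbitrate(recs, deadline=2.0)   # every EA has finished
too_soon = arbitrate(recs, deadline=0.001)  # nothing is ready yet
```

With a tight deadline only the reactive answer is available; with a generous one the decision-theoretic answer wins.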


Agent Properties:

All of RALPH's EAs and the meta-level arbitrator use the same database, so the knowledge is uniform and global. Ogasawara and Russell do not state so explicitly, but for problems to be decomposable into a finite number of states the representation must be symbolic. Knowledge for the deliberative Decision-Theoretic EA is declarative, while that for the reactive Condition-Action EA is more procedural.

Four EAs run in parallel, so the architecture is modular. Because the architecture operates in perceive-think-execute cycles, I would characterize its sensing policy as "intermediate cost". The environment may change dramatically over the course of a cycle, so to make the best decisions the agent should take in as much information as its sensors allow whenever it perceives. The sensors need not be on during thinking, however. The Condition-Action EA allows for rapid response.


Capabilities:

The RALPH capability that Ogasawara and Russell tout the most is planning. Planning is done to select the action sequence with the highest utility. The time for planning, however, may grow exponentially with depth. To bound the time planning takes, the act of planning is considered an action in its own right called replanning.
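The exponential growth with depth can be seen in a minimal sketch of exhaustive lookahead. The simplifications here are assumptions for illustration: the model is deterministic, every action sequence of a fixed length is scored, and the `depth` argument stands in for the time bound. The toy domain and all names are hypothetical, and the sketch does not model replanning itself.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def plan(
    state: int,
    actions: Sequence[str],
    step: Callable[[int, str], int],
    utility: Callable[[int], float],
    depth: int,
) -> Tuple[Tuple[str, ...], float, int]:
    """Score every action sequence of length `depth`.

    Returns (best_sequence, best_utility, sequences_examined).
    """
    best_seq: Tuple[str, ...] = ()
    best_u = float("-inf")
    examined = 0
    for seq in product(actions, repeat=depth):
        s = state
        for a in seq:
            s = step(s, a)  # the resulting state becomes the present state
        examined += 1
        u = utility(s)
        if u > best_u:
            best_seq, best_u = seq, u
    return best_seq, best_u, examined


# Hypothetical toy domain: a position on a line, with the goal at 3.
MOVES = {"left": -1, "stay": 0, "right": 1}
step = lambda s, a: s + MOVES[a]
utility = lambda s: -abs(3 - s)
actions = ["left", "stay", "right"]

seq2, u2, n2 = plan(0, actions, step, utility, depth=2)
seq3, u3, n3 = plan(0, actions, step, utility, depth=3)
```

With three actions, the number of sequences examined is 3 to the power of the depth, so each extra step of lookahead triples the work. That is why the planner needs some bound on how long it deliberates.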

There is no architecturally-defined learning mechanism in RALPH, but there is a natural way to add one. The Action-Utility and Goal-Based Execution Architectures may use rules cached from the more deliberate decision-theoretic Execution Architecture. Likewise, the reactive Condition-Action Execution Architecture may use cached results from all three.
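One possible reading of this caching idea, sketched in Python: a slow deliberate procedure is consulted only when no condition-action rule exists for the current state, and its answer is stored as a new rule. This is an assumption about how such a mechanism might look, since the architecture does not define one; the class, its attributes, and the toy policy are all hypothetical.

```python
from typing import Callable, Dict, Hashable


class CachingAgent:
    """Compile results of a slow deliberate EA into condition-action rules."""

    def __init__(self, deliberate: Callable[[Hashable], str]) -> None:
        self._deliberate = deliberate          # slow path, e.g. decision-theoretic
        self.rules: Dict[Hashable, str] = {}   # cached condition-action rules
        self.deliberations = 0                 # how often the slow path ran

    def act(self, state: Hashable) -> str:
        if state in self.rules:                # reactive path: one lookup
            return self.rules[state]
        self.deliberations += 1
        action = self._deliberate(state)       # deliberate path
        self.rules[state] = action             # cache the result as a rule
        return action


# Hypothetical stand-in for an expensive decision-theoretic computation.
agent = CachingAgent(lambda s: "flee" if s == "predator" else "graze")
first = agent.act("predator")    # deliberates, then caches
second = agent.act("predator")   # answered from the cached rule
```

A cache keyed on the exact state only helps when states recur; generalizing cached rules to similar states would need something beyond this sketch.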


Environment and Agent Body:

RALPH was designed for dynamic, unpredictable robotic domains. Although the domain was simulated, it included details like sensor and mechanical failures. Its EA configuration allows it to deliberate for as long as it believes it has time, so its computational limits are a function of the dynamism of its environment. When given goals of different importance, RALPH has demonstrated its ability to replan if new circumstances arise.

RALPH, however, may only work in domains where the Markov assumption holds: that all relevant information about a system is discernible from the present state. Also, if it is thinking and the sensors detect something of potential importance, there seems to be no way of interrupting the architecture.

Ogasawara and Russell claim that the meta-level planner chooses the best action for the given amount of time. But their paper does not explain how the planner knows how much time it has.


Issues:

RALPH's meta-level planner chooses among actions that have been recommended by EAs of varying reasoning capabilities. The quicker an EA is, the less "thought" goes into its solution. The choice is made based on the urgency of the need to respond; more time (ideally) yields a better solution. Thus, RALPH may be considered rational, and efficient in terms of time. But running four EAs in parallel makes it inefficient in terms of resource usage.

How does RALPH scale? The Markov assumption limits its applicability to a subset of interesting environments. Also, it will get slower as the amount of data grows. Unfortunately, the "smartest" EA, the Decision-Theoretic one, should suffer the most because it makes the most memory references.

RALPH was not designed to model how humans are believed to think.

