Here, d ranges over all actions, s in S ranges over all states that may result from action d, p(s|d) is the probability of obtaining state s given action d, and u(s) is the utility of state s.
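As a minimal sketch of the definition above, the following Python computes the sum of p(s|d) u(s) for each action and picks the action with the highest value. The action names, state names, and numbers are hypothetical and chosen only for illustration; they are not taken from RALPH-MEA.

```python
from typing import Dict

# p(s|d): for each action d, a distribution over resulting states s.
# All names and numbers here are hypothetical, for illustration only.
TransitionModel = Dict[str, Dict[str, float]]


def expected_utility(outcomes: Dict[str, float], utility: Dict[str, float]) -> float:
    """Sum p(s|d) * u(s) over the states s that one action d may produce."""
    return sum(p * utility[s] for s, p in outcomes.items())


def best_action(model: TransitionModel, utility: Dict[str, float]) -> str:
    """Return the action d with the highest expected utility."""
    return max(model, key=lambda d: expected_utility(model[d], utility))


model: TransitionModel = {
    "wait": {"good": 0.75, "bad": 0.25},
    "move": {"good": 0.5, "bad": 0.5},
}
utility = {"good": 10.0, "bad": -100.0}

choice = best_action(model, utility)
```

Every action is scored before one is chosen, so the cost grows with the number of actions and the number of states each action can lead to.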
Computing this function for every action might take too long in circumstances where reaction time is important. However, it is at least as accurate as any other method, so it ought to be used whenever time permits. To implement a decision-theoretic agent, Ogasawara and Russell identified a number of different knowledge types and four execution architectures of varying speed and sophistication.
The assumptions implicit in RALPH-MEA include:
All of the knowledge types encode information conditional only on the current state. Future performance and events are assumed not to depend on the entire history of the agent (the Markov assumption). In other words, the representation of the current state must incorporate all relevant information. In practice a state representation may fall short of this requirement, and the RALPH developers address the problem by distinguishing between the perceived state and the actual state. The actual state contains all relevant information, but the perceived state may be incomplete because of sensor errors or the lack of a sensor for certain information. The actual state therefore obeys the Markov assumption by definition, whereas the perceived state may not. The developers model sensor errors with a joint probabilistic network, which maintains a representation of the joint distribution over world states. This allows RALPH to react probabilistically to the actual state despite sensor errors.
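The idea of keeping a distribution over actual states while observing only a noisy perceived state can be illustrated with a simple discrete Bayes update. This is a simplified stand-in, not the joint probabilistic network that the RALPH developers use, and the state names, observation names, and sensor probabilities are assumptions made for the example.

```python
from typing import Dict

Belief = Dict[str, float]                   # distribution over actual states
SensorModel = Dict[str, Dict[str, float]]   # p(observation | actual state)


def update_belief(prior: Belief, sensor: SensorModel, observation: str) -> Belief:
    """Bayes rule: posterior(s) is proportional to p(observation | s) * prior(s)."""
    unnormalized = {s: prior[s] * sensor[s][observation] for s in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the sensor model")
    return {s: w / total for s, w in unnormalized.items()}


# Hypothetical two-state world with a sensor that is right 75% of the time.
prior: Belief = {"door_open": 0.5, "door_closed": 0.5}
sensor: SensorModel = {
    "door_open": {"sees_open": 0.75, "sees_closed": 0.25},
    "door_closed": {"sees_open": 0.25, "sees_closed": 0.75},
}

belief = update_belief(prior, sensor, "sees_open")
```

After the update the agent does not commit to the perceived state. It holds a probability for each actual state, which can then weight the utilities of the candidate actions.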
A major issue with decision theory is its computational intractability. The hope behind an MEA is that the different knowledge types provide enough versatility that at least one EA produces a good answer within the limited response time.
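One way to picture the trade-off between speed and sophistication is a selector that picks the most sophisticated EA whose estimated running time fits the available response time. The sketch below is purely hypothetical: the source does not describe how RALPH-MEA chooses among its architectures, and the class, field names, EA names, and timings are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionArchitecture:
    # All fields are hypothetical; they are not taken from RALPH-MEA.
    name: str
    estimated_seconds: float   # assumed cost estimate for one decision
    sophistication: int        # higher means a more thorough computation


def choose_architecture(
    eas: List[ExecutionArchitecture], time_budget: float
) -> Optional[ExecutionArchitecture]:
    """Pick the most sophisticated EA expected to finish within the budget."""
    feasible = [ea for ea in eas if ea.estimated_seconds <= time_budget]
    if not feasible:
        return None  # no EA is expected to answer in time
    return max(feasible, key=lambda ea: ea.sophistication)


eas = [
    ExecutionArchitecture("ea_fast", estimated_seconds=0.001, sophistication=0),
    ExecutionArchitecture("ea_medium", estimated_seconds=0.1, sophistication=1),
    ExecutionArchitecture("ea_slow", estimated_seconds=2.0, sophistication=2),
]

selected = choose_architecture(eas, time_budget=0.5)
```

With a budget of 0.5 seconds the slowest EA is ruled out, and the selector falls back to the next most sophisticated one.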