[Note: Mathematical notation on this page has degraded in the HTML rendering.]
For cases where the operator costs are deterministic and solely a function of the current state, A* is a well-understood state-space search technique that guarantees optimal paths. However, when costs are stochastic and path dependent, A* may prune partial paths that could lead to superior solutions.
Path-dependent costs occur in situations where the cost, cij, of applying operator j in state i depends on the cost of the path taken to that state. One source of path dependence is a utility function that is nonlinear in cost, for example if cost is measured in time and utility is based on meeting a deadline.
It has been shown (Kaufman and Smith 1993) that A* produces optimal solutions even with path-dependent cost functions, as long as a particular consistency, or monotonicity, condition applies. The monotonicity condition demands that for any path costs c < c',
Note that this form of monotonicity-on the accumulated path cost-is weaker than requiring monotonicity of cij itself.
It follows from this condition that for two paths, A and A', both leading to the same state, the superiority of one, Cost(A) < Cost(A'), implies that the same relation holds for these paths extended by a given action, a, that is, Cost(Aa) < Cost(A'a). Given this result, it is safe to prune A' because for any path to the goal based on that path, there is a path at least as good based on A.
A second variation of standard state-space search is to admit stochastic costs, that is, to treat cij as a random variable. If cij depends only on the state, and utility is linear in cost (i.e., the agent is risk neutral), then it is sufficient to use A* with operator costs represented by their means. However, if the problem requires both stochastic and path-dependent operator costs, then we are no longer justified in pruning paths based upon expected costs. Rather we must use the Stochastic Dominance A* (SDA*) algorithm (Wellman, Ford, and Larson 1995). There are four primary differences between SDA* and A*.
Stochastic Monotonicity: We require a stochastic version of the monotonicity condition used to address path dependence in the deterministic case. Specifically, for all costs c, c', and z, c < c' implies
Pruning: Rather than keeping the single lowest-cost path to a node, we must keep all of the admissible paths, where admissibility is defined by stochastic dominance. We have shown that if paths A and A' lead to the same state and A stochastically dominates A', Cost(A) <SD Cost(A'), then A' cannot be part of a uniquely optimal solution. Specifically, the stochastic monotonicity condition in this situation entails that for any incremental action a, Cost(Aa) <SD Cost(A'a). If, however, A' is not stochastically dominated, then it is possible to construct an example where it does in fact lead to the optimal solution.
Heuristics: When costs are path dependent, the heuristic functions must likewise take into account path cost as well as the state. And whereas a heuristic is admissible for A* if it underestimates cost to the goal, for the stochastic case an admissible heuristic must produce estimated cost distributions that stochastically dominate the actual cost distribution.
Priority: Search nodes are expanded in order of estimated expected utility. Like A*, the algorithm terminates when a goal node is popped off the priority queue. The reasoning is as follows: given that the heuristic function is stochastically admissible, and the accumulated path costs stochastically monotone, expected utility is monotonically decreasing along a path. Thus, when a solution is found, any intermediate path that had an estimated expected utility less than that of the solution must have already been explored or pruned.
Under the conditions described above, SDA* provides an optimal and complete solution procedure for problems with path-dependent stochastic operator costs. In Table 1a, we see that path-dependence alone can be accommodated by a monotonicity condition (Kaufman and Smith 1993), and stochastic costs alone by using means, but the conjunction of both requires SDA* (Wellman, Ford, and Larson 1995).
| scalar costs | multidimensional costs | |||
|---|---|---|---|---|
| State Dependent | Path Dependent | State Dependent | Path Dependent | |
| Deterministic Costs | A* | A* | MOA* | MOA* |
| Stochastic Costs | A* with means | SDA* | MOA* with means | MO-SDA* |
However, in our effort to apply SDA* to a problem of factory scheduling under uncertainty (Wurman and Wellman 1996), we required a further extension. The utility function employed in this problem depends on the tardiness penalty of each order, and thus our cost is not expressible as a scalar quantity (without undue expansion of the state space). We therefore define a two-dimensional cost structure, representing the penalty accumulated for orders completed thus far, along with the time taken along the path. Fortunately, the extension of A* to multidimensional costs has already been investigated by Stewart and White (Stewart and White 1991), who proposed the Multiobjective A* (MOA*) algorithm for this case.
Like SDA*, MOA* extends A* by pruning paths based on dominance rather than point utility. With multidimensional costs, it is generally not possible to order intermediate solutions. Under the assumption that utility is nonincreasing in each cost dimension, however, the policy of pruning based on dominance guarantees finding all optimal solutions. This result can be extended to stochastic and path-dependent costs in a manner analogous to the scalar case, as we diagram in the table above. The particular contribution of our work on factory scheduling under uncertainty (Wurman and Wellman 1996) lies in the lower right cell of this table.