Test beds are the environments in which the standard tasks may be implemented. In addition to the environment itself, these tools provide a method for data collection, the ability to control environmental parameters, and scenario generation techniques. The purpose of a test bed is to provide metrics for objective comparison and to give the experimenter fine-grained control when testing agents.
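As a rough illustration of these components, the following minimal sketch shows a toy grid-world test bed in Python. Everything in it is hypothetical and chosen only for illustration: the class `GridTestBed`, its parameters `size` and `max_steps`, the helper `greedy_agent`, and the success-rate metric are assumptions, not taken from any particular test bed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class GridTestBed:
    """Toy test bed (hypothetical): an agent must reach a goal cell on a grid."""
    size: int = 5        # controllable environmental parameter: grid width/height
    max_steps: int = 50  # controllable environmental parameter: step budget
    log: list = field(default_factory=list)  # data collection: one record per trial

    def generate_scenario(self, seed):
        """Scenario generation: the same seed always yields the same start and goal."""
        rng = random.Random(seed)
        start = (rng.randrange(self.size), rng.randrange(self.size))
        goal = (rng.randrange(self.size), rng.randrange(self.size))
        return start, goal

    def run_trial(self, agent, seed):
        """Run one agent on one generated scenario and record the outcome."""
        pos, goal = self.generate_scenario(seed)
        steps = 0
        while pos != goal and steps < self.max_steps:
            dx, dy = agent(pos, goal)
            # Clamp the move so the agent stays on the grid.
            pos = (min(max(pos[0] + dx, 0), self.size - 1),
                   min(max(pos[1] + dy, 0), self.size - 1))
            steps += 1
        record = {"seed": seed, "steps": steps, "solved": pos == goal}
        self.log.append(record)
        return record

    def success_rate(self):
        """Evaluation metric: fraction of recorded trials that reached the goal."""
        if not self.log:
            return 0.0
        return sum(r["solved"] for r in self.log) / len(self.log)


def greedy_agent(pos, goal):
    """Hypothetical agent: step one cell toward the goal, x axis first, then y."""
    if pos[0] != goal[0]:
        return (1 if goal[0] > pos[0] else -1, 0)
    if pos[1] != goal[1]:
        return (0, 1 if goal[1] > pos[1] else -1)
    return (0, 0)


bed = GridTestBed(size=5, max_steps=50)
for seed in range(10):
    bed.run_trial(greedy_agent, seed)
print(bed.success_rate())
```

Because scenarios are generated from seeds, two different agents can be run on identical scenarios and compared through the same logged metric, which is the kind of objective comparison described above.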
The use of test beds -- especially small, highly abstracted environments -- is somewhat controversial. There is a tension between bottom-up and top-down approaches to agent design. The former, which is somewhat reductionist, seeks to create agents by defining capabilities independently of one another. Test beds provide the means for developing these agents in a piecemeal fashion, which results in an incremental theory of behavior. The top-down approach is more engineering-oriented: agents are built and then their performance is tested. For such an approach, test beds offer only partial utility, since abstracting away environmental considerations may make the agent appear more capable than it actually is (i.e., the results may not be as general as they appear).
In both of these approaches, a small problem is used as an exemplar for very large problems. Yet there are issues in using small problems to predict or validate behavior on larger ones. The most significant are scalability and generality. With respect to scalability, maintaining rationality as more knowledge and more capabilities are added may become impossible; efficiency decreases as the scale of the problem increases. With respect to generality, there may be interactions among capabilities, and between individual capabilities and the environment, that were not considered in the smaller problem, in which case the system does not generalize to larger problems.
One way to avoid these issues is to experiment on full-scale systems. Controlled experimentation on such systems has been considered very difficult, or even impossible. However, systematic evaluation of large-scale systems will be necessary as cognitive architectures move out of the research laboratory and into real-world applications.