Learning by Observation and Instruction
One of the primary factors limiting the capabilities of advanced
computer-generated forces is the time and effort required to extract, encode, debug,
maintain, and extend the knowledge that drives their behavior. The goal of
this research is to explore, develop, and evaluate automated techniques for
extending and correcting the knowledge of advanced synthetic forces. These
techniques will not only improve the development of synthetic forces, but
will also make it possible to quickly correct and customize tactics,
through both instruction and learning by observation. Such techniques can
lead to more capable and more realistic synthetic forces that in turn
support more accurate and more realistic simulation environments for
training, mission rehearsal, and analysis.
The current approach for building synthetic forces relies on iterating
through multiple stages. The knowledge engineer starts by consulting
existing manuals for all "formal" information available on the desired
behavior of the synthetic forces (SFs). Formal documents, such as field
and training manuals, provide only a bare-bones specification of behavior.
They rightly assume that there will be other forms of instruction
(classroom and briefings) as well as field training and experience. Thus,
the knowledge engineer must rely on a subject matter expert (SME) to "fill
in the details." This involves extensive interviews followed by the
development of the SF. However, once the SF is built, there must be
additional rounds of interviews as the knowledge engineer discovers areas
in which the knowledge is incomplete or incorrect. In addition, the expert
must view the SF's behavior to verify that it is correct. This is critical
because it is extremely difficult for the
knowledge engineer and the SME to specify all aspects of behavior,
especially when there are interactions between different goals/objectives
in the SF. The complete process is very time-consuming and has the
additional flaw that at some point it stops, providing no means for
continual improvement, which is crucial given the dynamic nature of both
available weapons systems and doctrine.
We are pursuing two basic approaches to automatically and
semi-automatically construct synthetic forces: learning by observation and
learning by instruction.
- Learning by Observation
We are developing technology that involves detailed monitoring of a human
performing the desired tasks. Machine learning techniques are used to
induce and extract the knowledge required to generate that behavior in an
SF. This technique is called "learning by observation," or "behavioral
cloning," and has been demonstrated for learning to perform simple maneuvers
in a simulated plane by Claude Sammut and his colleagues. However, their
techniques required manual segmentation of the human's behavior. We have
recently demonstrated an extension to their technique for a small piece of
tactical behavior where real-time annotations are provided, making the
manual segmentation unnecessary. We see this as a very rich area of
research on issues dealing with directly extracting knowledge and
performance data from SMEs. This will allow us to not only build synthetic
forces quickly, but to build synthetic forces customized to specific
individuals or groups of individuals. Thus, this may make it possible to
quickly construct synthetic forces that use a specific country's tactics,
or to create a group of synthetic forces that model the diversity and
variety of behavior of an actual fighting group.
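As a rough illustration of the idea, the sketch below groups an observed trace by the goal annotation supplied in real time (so no manual segmentation is needed) and then imitates the human by copying the action taken in the closest recorded state. Everything here is an assumption made for illustration: the trace values, the state features, the action names, and the use of nearest-neighbor lookup as a simple stand-in for whatever induction method is actually used.

```python
from collections import defaultdict
from math import dist

# Hypothetical trace of (goal annotation, observed state, human action).
# State is (altitude_ft, speed_kts); all values are invented for illustration.
TRACE = [
    ("climb", (1000.0, 120.0), "pitch_up"),
    ("climb", (1900.0, 110.0), "level_off"),
    ("turn", (2000.0, 115.0), "bank_left"),
    ("turn", (2000.0, 90.0), "add_throttle"),
]


def segment_by_annotation(trace):
    """Group observations by the goal label the human supplied in real time."""
    segments = defaultdict(list)
    for goal, state, action in trace:
        segments[goal].append((state, action))
    return dict(segments)


def clone_behavior(segments):
    """Return a policy that copies the action of the closest observed state.

    Nearest-neighbor lookup is a stand-in learner, not the method used
    in the research described above.
    """
    def policy(goal, state):
        examples = segments[goal]
        _, action = min(examples, key=lambda ex: dist(ex[0], state))
        return action
    return policy


policy = clone_behavior(segment_by_annotation(TRACE))
print(policy("climb", (1100.0, 118.0)))  # prints "pitch_up"
```

Training on traces from different individuals would yield different policies, which is one way to picture the customization described above.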
- Learning by Instruction
We are also developing technology to create SFs that can receive
instruction directly from an SME to correct their behavior for the current
situation. In this approach, either the SF recognizes that it needs help
and requests assistance from an SME, or the SME notices some errors in the
SF's behavior. In either case, the SF momentarily pauses the simulation
and requests instruction. The SF then follows instructions from the SME
as it is performing its task, and then learns from them, thereby avoiding
future instruction. This focuses the knowledge acquisition process on
those places that need correction. Learning from instruction is an
extremely complex process to automate. However, restricting the
instruction so that it comes while the learner is performing a task
greatly simplifies the problem. This approach makes it easier for the
instructor because there is no need to pre-organize the material into a
coherent presentation and think of all possible cases. Instead, the
instructor just views the behavior of the learner, and only has to
determine what he or she would do in the same situation. The instructor can then
provide assistance when requested (because the learner lacks the needed
knowledge), or correct the learner when mistakes are made. The learner has a
much easier job because there is no need to try to interpret general
instructions and determine how they apply to arbitrary future situations.
Instead, the instruction can almost always be applied directly to the
current situation (or to close hypothetical situations). From a specific
example, the learner can use its existing knowledge of the domain to
explain to itself why the instruction was appropriate, and then generalize
the experience so it can use it in the future (using a form of learning
called explanation-based learning).
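The request-for-help half of this loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism used in this project: the class name, the feature names, the actions, and the `RELEVANT_FEATURES` table are all hypothetical. The table plays the role of existing domain knowledge, and keeping only the features it marks as relevant is a crude stand-in for the generalization step of explanation-based learning. The case where the SME interrupts to correct an error is not modeled.

```python
# Hypothetical domain knowledge: which situation features explain each action.
RELEVANT_FEATURES = {
    "evade": {"threat_detected"},
    "return_to_base": {"fuel_low"},
}


class InstructableAgent:
    def __init__(self, instructor):
        self.rules = []            # learned (conditions, action) pairs
        self.instructor = instructor
        self.requests = 0          # how often the agent had to ask for help

    def act(self, situation):
        # Use learned knowledge when some rule's conditions all hold.
        for conditions, action in self.rules:
            if all(situation.get(k) == v for k, v in conditions.items()):
                return action
        # No applicable knowledge: pause and request instruction.
        self.requests += 1
        action = self.instructor(situation)
        self.learn(situation, action)
        return action

    def learn(self, situation, action):
        # Keep only the features the domain knowledge says justify the
        # action; with no such knowledge, keep the whole situation.
        relevant = RELEVANT_FEATURES.get(action, set(situation))
        conditions = {k: situation[k] for k in relevant if k in situation}
        self.rules.append((conditions, action))


def sme(situation):
    """Hypothetical instructor: says what they would do in this situation."""
    if situation.get("threat_detected"):
        return "evade"
    return "return_to_base" if situation.get("fuel_low") else "continue"


agent = InstructableAgent(sme)
first = agent.act({"threat_detected": True, "fuel_low": False, "altitude": "high"})
second = agent.act({"threat_detected": True, "fuel_low": True, "altitude": "low"})
print(first, second, agent.requests)  # prints "evade evade 1"
```

The second situation differs from the first in fuel state and altitude, yet the agent handles it without asking, because the learned rule depends only on the feature that explained the instruction.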
Instructo-Soar is a system that learns from instruction, developed at the
University of Michigan under the supervision of the PI. Instructo-Soar
uses a specialization of the technique described above to extend its
knowledge of how to solve problems. Instructo-Soar learns new procedures
and extensions of procedures for novel situations, and other domain
knowledge such as control knowledge, knowledge of operator effects, and
state inferences. In this project, we will be taking the basic structure
of Instructo-Soar and applying it to a much more dynamic domain, which
requires significantly more domain knowledge.