Reading Between the Lines: Learning to Map High-level Instructions to Commands

 S.R.K. Branavan, Luke Zettlemoyer, Regina Barzilay

 Paper    Slides

Abstract

In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging—they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policy gradient reinforcement learning algorithm for text interpretation. This design enables learning for mapping high-level instructions, which previous statistical methods cannot handle.


Experimental Framework

During the reinforcement learning process, the learner maps each instruction document to a candidate sequence of actions, executes them in the target environment (in this case the Windows 2000 user interface), and learns from how well these candidate actions work. For this process to work, the learner needs to be able to control the Windows 2000 operating system in two ways:

  1. Reset the Windows 2000 OS to some specified initial state.
  2. Execute selected action sequences in the Windows 2000 user interface, and observe the resulting changes.
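The learn-by-execution loop described above can be sketched as follows. This is a toy illustration only, not the paper's model: the action names, the one-word "document", the indicator features, and the reward are invented stand-ins, and a bare REINFORCE-style update takes the place of the paper's full policy gradient algorithm.

```python
import math
import random

random.seed(0)

# Hypothetical UI actions; the real action space is the Windows 2000 UI.
ACTIONS = ["open_menu", "click_ok", "type_text"]

weights = {}  # feature -> weight, for a log-linear policy

def features(doc_word, action):
    # Toy features: one indicator per (word, action) pair.
    return [(doc_word, action)]

def action_probs(doc_word):
    """Softmax distribution over actions given the current weights."""
    scores = [sum(weights.get(f, 0.0) for f in features(doc_word, a))
              for a in ACTIONS]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(doc_word):
    probs = action_probs(doc_word)
    r, cum = random.random(), 0.0
    for a, p in zip(ACTIONS, probs):
        cum += p
        if r < cum:
            return a
    return ACTIONS[-1]

def reward(doc_word, action):
    # Stand-in for executing the action in the environment and
    # checking how well it worked.
    return 1.0 if (doc_word == "click" and action == "click_ok") else 0.0

def update(doc_word, action, r, lr=0.5):
    # REINFORCE: push weights toward the sampled action's features,
    # minus the expected feature counts, scaled by the reward.
    probs = action_probs(doc_word)
    for a, p in zip(ACTIONS, probs):
        for f in features(doc_word, a):
            grad = (1.0 if a == action else 0.0) - p
            weights[f] = weights.get(f, 0.0) + lr * r * grad

# Learning loop: sample an action, execute, observe reward, update.
for _ in range(200):
    a = sample_action("click")
    update("click", a, reward("click", a))
```

After training, the policy concentrates its probability mass on the action that earns reward, which is the essence of learning from how well candidate actions work.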


Resetting to an Initial State

The first requirement is met by running the Windows 2000 operating system on a virtual machine. In our experiments, VMware Workstation (http://www.vmware.com) was used as the virtualization software. This choice was simply due to familiarity; any alternative should work equally well for this purpose. The initial state to which the OS needs to be reset was saved as a virtual machine snapshot, and the command line interface of VMware was then used to programmatically revert the virtual machine to this snapshot when necessary. The reinforcement learner accesses the VMware command line through the VM snapshot reset process described below.
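A minimal sketch of how such a reset might drive VMware's command line from Python is shown below. The `.vmx` path matches the one in the linked reset script, but the `vmrun` tool invocation style and the snapshot name `initial_state` are assumptions here; consult the actual linked code for the version used in the experiments.

```python
import subprocess

VMRUN = "vmrun"  # VMware's command line control tool
# Path taken from the linked code; adjust to your own .vmx file.
VMX = "/home/virtual-machines/vmware/win2k_sp4/TEST_WIN2K_SP4.vmx"
SNAPSHOT = "initial_state"  # hypothetical snapshot name

def build_revert_command(vmx=VMX, snapshot=SNAPSHOT):
    """Build the vmrun invocation that reverts the VM to a named snapshot."""
    # "-T ws" selects the VMware Workstation product
    return [VMRUN, "-T", "ws", "revertToSnapshot", vmx, snapshot]

def reset_vm(vmx=VMX, snapshot=SNAPSHOT):
    """Revert to the saved snapshot, then power the VM back on."""
    subprocess.check_call(build_revert_command(vmx, snapshot))
    # revertToSnapshot leaves the VM powered off, so restart it
    subprocess.check_call([VMRUN, "-T", "ws", "start", vmx])
```

Separating command construction from execution makes the reset logic easy to test without touching a real virtual machine.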



Executing Actions and Observing User Interface State

The requirement of observing the current state of the Windows 2000 user interface, and of executing selected user interface actions, is met by the operating system instrumentation agent. This program, when run in the target Windows 2000 OS, connects to the reinforcement learner through a TCP/IP socket connection and communicates with it using a simple human-readable protocol. Through this agent, the learner can retrieve the current set of user interface objects along with their attributes, and can execute user interface commands on these objects.
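The wire format of the agent's protocol is not documented here beyond being a simple human-readable one, so the command names below (`list_objects`, `left_click`) are invented placeholders; only the line-oriented framing pattern is the point of this sketch.

```python
def frame_request(command, *args):
    """Encode one request as a single newline-terminated ASCII line.

    Hypothetical examples of such a line-oriented protocol:
      frame_request("list_objects")          -> b"list_objects\n"
      frame_request("left_click", "obj_42")  -> b"left_click obj_42\n"
    """
    return " ".join((command,) + args).encode("ascii") + b"\n"

def parse_attribute_line(line):
    """Split one 'attribute value' reply line into a (key, value) pair."""
    key, _, value = line.strip().partition(" ")
    return key, value
```

A real client would wrap these helpers around a `socket` connection to the agent's TCP port; keeping the framing separate from the socket code makes the protocol handling testable in isolation.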



Framework Diagram

Figure 1. This diagram shows the complete framework used in the Windows 2000 experiments. Only a single Reinforcement Learner and a single Target Environment are needed for our algorithm. However, the Cache process allows for the transparent, non-blocking multiplexing of multiple learners to multiple target environments. This multiplexing lets the environment state observations of the different learners contribute to the cache in parallel, thereby speeding up the cache-building process and reducing the wall-clock run times of all learners. The multiplexing also allows better utilization of available hardware resources by removing the need for an individual target environment for each learner.





List of Framework Components


Given below are descriptions of each component of the experimental framework, along with links to code and configuration files. See the code section below for a complete archive of all the components, packaged for ease of compilation.

1. Reinforcement Learner    [ code ]    [ configuration ]

Command line:

This is the implementation of the algorithm presented in the paper.

2. Cache    [ code ]    [ configuration ]

Command line:

The primary bottleneck in this experimental framework is the need to interact with the Windows 2000 environment. This interaction is expensive in terms of time: the environment must be reset for each document, which takes approximately 30 seconds, and every command execution takes approximately 1 second to complete. The Cache is simply a process that sits between the reinforcement learner and the environment and transparently caches the communications protocol. Since the Windows 2000 environment is deterministic, caching at the protocol level significantly mitigates this interaction time cost. The cache also allows multiple learners to be multiplexed to multiple target environments, removing the need for a dedicated target environment per learner and further improving performance by sharing observations across environments.
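Because the environment is deterministic, the observation returned by a command depends only on the sequence of commands executed since the last reset, so request/response pairs can be memoized. A minimal sketch of that idea follows; the real cache linked above additionally multiplexes multiple learners and environments, which is omitted here.

```python
class ProtocolCache:
    """Memoize environment replies, keyed on the full command history.

    Determinism means (commands since reset) -> observation is a pure
    function, so a previously seen history can be answered from the
    cache without touching the (slow) environment at all.
    """

    def __init__(self, backend):
        # backend: callable (history, command) -> observation,
        # standing in for a real round-trip to the environment.
        self.backend = backend
        self.store = {}

    def execute(self, history, command):
        key = (tuple(history), command)
        if key not in self.store:
            # Cache miss: forward to the real environment and record it.
            self.store[key] = self.backend(history, command)
        return self.store[key]
```

With 30-second resets and 1-second command executions, every cache hit replaces an expensive environment round-trip with a dictionary lookup.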

3. VM snapshot reset process    [ code ]

Command line : python vm_snapshot_reset_process.py 5002

This program allows the reinforcement learner to reset the Windows 2000 setup to an initial state through the command line interface of VMware. This code will need to be re-written if different virtualization software is used. If VMware is used, the following line in the code will need to be modified to point to the vmx file of your virtual machine:
  sVMX = "/home/virtual-machines/vmware/win2k_sp4/TEST_WIN2K_SP4.vmx"

The number specified on the command line (5002) is the TCP/IP port on which the learner will attempt to connect to this process. This value needs to match the corresponding world_*_reset_service port specified in the cache configuration file. The default value is 5002.

4. TCP packet relay    [ code ]

Command line : tcp_relay 5000

This program is a simple TCP/IP packet relay. It allows the reinforcement learner to connect to the operating system instrumentation agent while insulating the learner from the effects of the virtual machine being reset.

The number specified on the command line (5000) is the TCP/IP port on which the learner will attempt to connect to this process. This value needs to match the corresponding world_*_agent_service port specified in the cache configuration file. The default value is 5000.
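The essential behavior of such a relay, keeping the learner's socket permanently open while queuing its traffic whenever the agent side drops during a VM reset, can be sketched independently of the socket plumbing. This is an illustrative model only, not the linked `tcp_relay` implementation.

```python
class RelayBuffer:
    """Model of the relay's learner-side endpoint.

    The learner's connection stays up across VM resets; bytes sent
    while the agent side is down are queued and flushed once the
    agent reconnects, so the learner never sees a disconnection.
    """

    def __init__(self):
        self.pending = b""
        self.agent_up = False

    def from_learner(self, data, send_to_agent):
        if self.agent_up:
            send_to_agent(data)
        else:
            self.pending += data  # agent is resetting; hold the bytes

    def agent_connected(self, send_to_agent):
        self.agent_up = True
        if self.pending:
            send_to_agent(self.pending)
            self.pending = b""

    def agent_disconnected(self):
        self.agent_up = False
```

A full relay would wrap this logic around two listening TCP sockets (one for the learner, one for the agent) and forward traffic in both directions.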

5. Operating system instrumentation agent    [ code ]    [ runnable bundle (including dlls) ]    [ configuration ]

This program is run by double-clicking on interact.exe from the Windows file explorer.

This program, when run in Windows 2000, allows the reinforcement learner to observe and interact with the user interface of the operating system and of the applications running in it. Currently it can only observe and interact with user interface objects that are part of the standard Windows 2000 UI library. Interaction with other UI objects was not attempted due both to lack of documentation and to the peculiarities of their APIs.



Additional Notes

1. The current version of the operating system instrumentation agent can only observe and interact with user interface objects that are part of the standard Windows 2000 UI library. Interaction with other UI objects was not attempted due both to lack of documentation and to the peculiarities of their APIs.
2. Windows 2000 was selected as the target operating system both for ease of instrumentation, and availability of help documents.
3. During a normal learning run, the Windows 2000 virtual machine will be reset multiple times. At every reset, the TCP/IP connection from the operating system instrumentation agent to the reinforcement learner will be interrupted. The TCP packet relay process is used to insulate the learner from this repeated socket disconnection and reconnection.
4. In our experiments, simply for the sake of convenience and flexibility, the learner and the virtual machine were run on different compute hardware. This is the setup shown in Figure 1. However, both processes can be run on a single compute server if hardware resources are sufficient.
5. For the sake of performance, no anti-virus or firewall was installed on the Windows 2000 setup. To keep the operating system safe from intrusion, the virtual machine was set up to disallow network connectivity to the external world. Network connections from Windows 2000 were only allowed to the server on which the virtual machine was running (i.e. "local network only").

Code

The source code for this work can be downloaded from the links below.


  1. Single archive    [ code ]
This archive contains all of the code and configurations listed below, packaged for ease of compilation, with makefile include/library paths set as necessary. Data and annotation files are also included. Please refer to the readme file in the archive for further details.
  2. Reinforcement Learner    [ code ]    [ configuration ]
  3. Cache    [ code ]    [ configuration ]
  4. VM snapshot reset process    [ code ]
  5. TCP packet relay    [ code ]
  6. Libraries    [ code ]
  7. Operating system instrumentation agent    [ code ]    [ runnable bundle (including dlls) ]    [ configuration ]

The Windows instrumentation framework requires Visual C++ to compile. It was developed and tested using Visual C++ (version), but does not use any version-specific functionality, so it should compile and operate correctly under any recent version of VC++. The code uses the MFC library, but can be modified to remove this dependency if necessary.




Data

The datasets used in this work are available in text format from the link below:

       [ Microsoft Help & Support Windows 2000 dataset ]   source: support.microsoft.com
       [ A brief description of the format of the above data file ]   



Annotations

The gold standard annotations for the Windows dataset are available from the links below:

       [ Annotations for Microsoft Help & Support Windows 2000 dataset ]
       [ A brief description of the format of the above annotations ]