Weakly-supervised hierarchical incremental neurosymbolic learning
Last updated: Apr 24, 2021
Algorithm
Input: (1) a policy that lacks a skill (e.g. a policy that can traverse flat terrain but fails on terrain with holes) and (2) an oracle that signals when the new skill is needed (e.g. one that signals when there is a hole).
Output: (1) a new policy that has the desired skill and (2) a symbolic program that decides which of the two policies to execute.
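For concreteness, the sketches below assume the following hypothetical interfaces (these names are not part of the original notes): each policy exposes an act(obs) method, and the oracle is a callable from observations to a boolean.

```python
from typing import Any, Callable, Protocol

class Policy(Protocol):
    """Assumed interface for both the existing policy and the new one."""
    def act(self, obs: Any) -> Any: ...

# The oracle maps an observation to "the new skill is needed here",
# e.g. a terrain sensor that reports whether a hole is ahead.
Oracle = Callable[[Any], bool]
```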
1. Instantiate a new policy (e.g. a neural network).
2. Optimize the new policy with a reinforcement learning algorithm, using the oracle to decide which of the two policies is executed at every step.
3. Form a dataset of \((\text{observation}, \text{oracle})\) pairs.
4. Using the dataset, synthesize a program (e.g. a decision tree) that selects which of the two policies to execute, as sketched below.
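A minimal sketch of steps 3 and 4, assuming a Gym-style environment with the old 4-tuple step API and the hypothetical Policy/Oracle interfaces above; the RL optimization of the new policy (step 2) is assumed to have already been done and is not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def collect_oracle_dataset(env, base_policy, new_policy, oracle, episodes=50):
    """Roll out the oracle-gated policy pair and record (observation, oracle) pairs."""
    observations, labels = [], []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            use_new = oracle(obs)          # weak supervision: should the new skill act here?
            observations.append(obs)
            labels.append(int(use_new))
            policy = new_policy if use_new else base_policy
            obs, _reward, done, _info = env.step(policy.act(obs))  # old Gym 4-tuple API
    return np.array(observations), np.array(labels)

def synthesize_selector(observations, labels, max_depth=3):
    """Fit a decision tree that imitates the oracle's policy-selection signal."""
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(observations, labels)
    return tree
```

At test time the oracle is dropped and the synthesized tree decides, per observation, which of the two policies acts.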
Initial results
Observations
The fitted decision tree is not an effective selector: replacing the oracle with the tree leads to the robot failing the task.
Increasing the maximum depth of the decision tree led to similar results.
Possible next steps
Improve the synthesis of the “choosing” programs. Are we in the right space of symbolic programs for activating primitives in locomotion tasks? Do we need to change the representation or the algorithm? Is class imbalance in the dataset the cause of the poor performance? (See the sketch after this list.)
What happens in hierarchies of multiple levels?
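One cheap way to probe the imbalance question from the list above: refit the selector with class reweighting and compare per-class recall instead of overall accuracy. A sketch assuming the observations/labels dataset collected earlier; the balanced class weight, the depth, and the 80/20 split are arbitrary choices.

```python
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# `observations` and `labels` are the (observation, oracle) dataset collected above.
X_train, X_test, y_train, y_test = train_test_split(
    observations, labels, test_size=0.2, stratify=labels, random_state=0)

# "balanced" upweights the rare "use the new skill" class; look at per-class
# recall to see whether imbalance alone explains the poor selector.
tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced")
tree.fit(X_train, y_train)
print(classification_report(y_test, tree.predict(X_test)))
```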