Reinforcement Learning¶

We provide a full re-implementation of the PPO algorithm [1]. The policy supports three types of input features:

Users can specify any non-empty combination of these feature types (e.g., obs_hidden, obs_hidden_dist).

References¶