
Opponent modeling with trajectory representation clustering

Figure 1. Overview of contrastive predictive coding (CPC), a representation extraction algorithm that contrasts positive and negative samples. The context $$ c_t $$ and the subsequent state embeddings $$ \{z_{t+1}, z_{t+2}, \dots, z_{H-1}\} $$ are treated as positive samples when they come from the same trajectory and as negative samples otherwise. Increasing the similarity between positive samples and reducing the similarity between negative samples yields trajectory representations that distinguish different opponent policies.
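
The contrastive objective described in the caption can be illustrated with a minimal InfoNCE-style sketch. It is an assumption-laden illustration, not the authors' implementation: the function name `info_nce_loss`, the plain dot-product similarity (CPC is usually described with a learned scoring function instead), and the `temperature` parameter are all hypothetical choices made here. Row `i` of `contexts` plays the role of $$ c_t $$ for trajectory `i`, and row `i` of `futures` is one later state embedding from the same trajectory, so pair `(i, i)` is positive and every pair `(i, j)` with `j != i` is negative.

```python
import math


def info_nce_loss(contexts, futures, temperature=1.0):
    """Average InfoNCE loss over a batch of trajectories (illustrative sketch).

    contexts: list of N vectors; contexts[i] is the context of trajectory i.
    futures:  list of N vectors; futures[i] is a later state embedding
              taken from the same trajectory i.
    The similarity is a plain dot product, which is an assumption of this
    sketch rather than something stated in the source.
    """
    n = len(contexts)
    total = 0.0
    for i in range(n):
        # Similarity of context i to every candidate future embedding.
        scores = [
            sum(c * z for c, z in zip(contexts[i], futures[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy with the same-trajectory pair (i, i) as the target.
        total += log_norm - scores[i]
    return total / n
```

Minimizing this quantity raises the score of each same-trajectory pair relative to the cross-trajectory pairs in the batch, which is the "increase similarity of positives, reduce similarity of negatives" behavior the caption describes. When all embeddings are identical, the loss equals the logarithm of the batch size, the value of an uninformative representation.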

Intelligence & Robotics
ISSN 2770-3541 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
