Figure 3

Opponent modeling with trajectory representation clustering

Figure 3. (a) Average reward curve when interacting with opponent policy $$ \pi_1^{-1} $$; (b) change in the proportion of trajectories generated by opponent $$ \pi_1^{-1} $$ in the replay buffer.

Intelligence & Robotics
ISSN 2770-3541 (Online)

Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/