Figure 4

Opponent modeling with trajectory representation clustering

Figure 4. The average reward curve when interacting with opponent policy $$ \pi_1^{-1} $$ as $$ w $$ changes from 0.5 to 0, 0.02, 0.04, 0.06, 0.08, and 0.1.

Intelligence & Robotics
ISSN 2770-3541 (Online)
Portico

All published articles are preserved here permanently:

https://www.portico.org/publishers/oae/
