Reinforcement learning with parameterized action space and sparse reward for UAV navigation

Figure 4. The network architecture of HER-MPDQN. For the symbols in the figure, $$ s $$ denotes the observed state, $$ g $$ represents the goal of the agent, $$ \theta_x $$ and $$ \theta_Q $$ refer to network parameters, and (256, 128, 64) indicates the numbers of neurons in the network layers. $$ x_k $$ is the continuous parameter corresponding to the $$ k $$th discrete action, where $$ k = 1, 2, \ldots, K $$ and $$ K $$ is the total number of discrete actions. $$ xe_k $$ represents the expanded continuous parameter vector derived from $$ x_k $$. $$ Q_k $$ denotes the Q value associated with the $$ k $$th discrete action. The discrete action $$ k $$ is selected according to the largest Q value.
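The following is a minimal PyTorch sketch of the architecture described in the caption: a parameter network $$ \theta_x $$ mapping $$ [s, g] $$ to the continuous parameters $$ x_1, \ldots, x_K $$, a Q network $$ \theta_Q $$ mapping $$ [s, g, xe] $$ to one Q value per discrete action, and greedy selection of the discrete action with the largest Q value. The hidden sizes (256, 128, 64) are taken from the caption; the activations, the way $$ s $$ and $$ g $$ are concatenated, and the per-action parameter layout are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a HER-MPDQN-style network; all design details beyond the
# caption of Figure 4 (layer sizes, ReLU activations, parameter layout) are
# assumptions for illustration only.
import torch
import torch.nn as nn


class ParamNet(nn.Module):
    """Maps the goal-conditioned input [s, g] to the continuous parameters
    x_1, ..., x_K of all K discrete actions (the theta_x network)."""

    def __init__(self, state_dim, goal_dim, param_dims, hidden=(256, 128, 64)):
        super().__init__()
        self.param_dims = list(param_dims)       # size of x_k for each k
        layers, in_dim = [], state_dim + goal_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, sum(self.param_dims)))
        self.net = nn.Sequential(*layers)

    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1))   # all x_k concatenated


class QNet(nn.Module):
    """Maps [s, g, xe] to one Q value per discrete action (the theta_Q
    network), where xe is the expanded continuous parameter vector."""

    def __init__(self, state_dim, goal_dim, total_param_dim, num_actions,
                 hidden=(256, 128, 64)):
        super().__init__()
        layers, in_dim = [], state_dim + goal_dim + total_param_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, s, g, xe):
        return self.net(torch.cat([s, g, xe], dim=-1))   # shape (batch, K)


def select_action(param_net, q_net, s, g, param_dims):
    """Greedy selection: pick the discrete action k with the largest Q_k and
    return it together with its continuous parameter x_k."""
    offsets = [0]
    for d in param_dims:
        offsets.append(offsets[-1] + d)
    with torch.no_grad():
        x = param_net(s, g)                               # (batch, sum |x_k|)
        q_values = []
        for k in range(len(param_dims)):
            # xe_k keeps only x_k and zeroes the other actions' parameters.
            xe = torch.zeros_like(x)
            xe[:, offsets[k]:offsets[k + 1]] = x[:, offsets[k]:offsets[k + 1]]
            q_values.append(q_net(s, g, xe)[:, k])        # Q_k
        q_values = torch.stack(q_values, dim=-1)          # (batch, K)
        k_star = q_values.argmax(dim=-1)                  # chosen discrete action
        x_star = [x[i, offsets[int(k)]:offsets[int(k) + 1]]
                  for i, k in enumerate(k_star)]
    return k_star, x_star
```

Exploration noise, target networks, and the hindsight relabelling of the goal $$ g $$ would sit on top of this sketch; only the forward pass and the argmax selection shown in Figure 4 are reproduced here.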
