Reinforcement learning with parameterized action space and sparse reward for UAV navigation

Figure 4. The network architecture of HER-MPDQN. For the symbols in the figure, $$ s $$ denotes the observed state, $$ g $$ represents the goal of the agent, $$ \theta_x $$ and $$ \theta_Q $$ refer to network parameters, and (256, 128, 64) indicates the numbers of neurons in the network layers. $$ x_k $$ is the continuous parameter corresponding to the $$ k $$th discrete action, where $$ k = 1, 2, \ldots, K $$ and $$ K $$ is the total number of discrete actions. $$ xe_k $$ represents the expanded continuous parameter vector derived from $$ x_k $$. $$ Q_k $$ denotes the Q value associated with the $$ k $$th discrete action. The discrete action $$ k $$ is selected according to the largest Q value.
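The following is a minimal PyTorch sketch of the architecture described in the caption: a parameter network $$ \theta_x $$ mapping $$ [s, g] $$ to the continuous parameters $$ x_1, \ldots, x_K $$, a Q network $$ \theta_Q $$ mapping $$ [s, g, xe] $$ to one Q value per discrete action, and greedy selection of the discrete action with the largest Q value. The hidden sizes (256, 128, 64) are taken from the caption; the activations, the way $$ s $$ and $$ g $$ are concatenated, and the per-action parameter layout are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a HER-MPDQN-style network; all design details beyond the
# caption of Figure 4 (layer sizes, ReLU activations, parameter layout) are
# assumptions for illustration only.
import torch
import torch.nn as nn


class ParamNet(nn.Module):
    """Maps the goal-conditioned input [s, g] to the continuous parameters
    x_1, ..., x_K of all K discrete actions (the theta_x network)."""

    def __init__(self, state_dim, goal_dim, param_dims, hidden=(256, 128, 64)):
        super().__init__()
        self.param_dims = list(param_dims)       # size of x_k for each k
        layers, in_dim = [], state_dim + goal_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, sum(self.param_dims)))
        self.net = nn.Sequential(*layers)

    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1))   # all x_k concatenated


class QNet(nn.Module):
    """Maps [s, g, xe] to one Q value per discrete action (the theta_Q
    network), where xe is the expanded continuous parameter vector."""

    def __init__(self, state_dim, goal_dim, total_param_dim, num_actions,
                 hidden=(256, 128, 64)):
        super().__init__()
        layers, in_dim = [], state_dim + goal_dim + total_param_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, s, g, xe):
        return self.net(torch.cat([s, g, xe], dim=-1))   # shape (batch, K)


def select_action(param_net, q_net, s, g, param_dims):
    """Greedy selection: pick the discrete action k with the largest Q_k and
    return it together with its continuous parameter x_k."""
    offsets = [0]
    for d in param_dims:
        offsets.append(offsets[-1] + d)
    with torch.no_grad():
        x = param_net(s, g)                               # (batch, sum |x_k|)
        q_values = []
        for k in range(len(param_dims)):
            # xe_k keeps only x_k and zeroes the other actions' parameters.
            xe = torch.zeros_like(x)
            xe[:, offsets[k]:offsets[k + 1]] = x[:, offsets[k]:offsets[k + 1]]
            q_values.append(q_net(s, g, xe)[:, k])        # Q_k
        q_values = torch.stack(q_values, dim=-1)          # (batch, K)
        k_star = q_values.argmax(dim=-1)                  # chosen discrete action
        x_star = [x[i, offsets[int(k)]:offsets[int(k) + 1]]
                  for i, k in enumerate(k_star)]
    return k_star, x_star
```

Exploration noise, target networks, and the hindsight relabelling of the goal $$ g $$ would sit on top of this sketch; only the forward pass and the argmax selection shown in Figure 4 are reproduced here.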
