REFERENCES

1. Bledt G, Wensing PM, Ingersoll S, Kim S. Contact model fusion for event-based locomotion in unstructured terrains. In: 2018 IEEE International Conference on Robotics and Automation (ICRA); 2018. pp. 4399-406.

2. Hwangbo J, Bellicoso CD, Fankhauser P, Hutter M. Probabilistic foot contact estimation by fusing information from dynamics and differential/forward kinematics. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2016. pp. 3872-78.

3. Camurri M, Fallon M, Bazeille S, et al. Probabilistic contact estimation and impact detection for state estimation of quadruped robots. IEEE Robot Autom Lett 2017;2:1023-30.

4. Bloesch M, Gehring C, Fankhauser P, et al. State estimation for legged robots on unstable and slippery terrain. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2013. pp. 6058-64.

5. Sutton RS, Barto AG. Reinforcement learning: an introduction. IEEE Trans Neural Netw 2005;16:285-86.

6. Hwangbo J, Lee J, Dosovitskiy A, et al. Learning agile and dynamic motor skills for legged robots. Sci Robot 2019;4.

7. Lee J, Hwangbo J, Wellhausen L, Koltun V, Hutter M. Learning quadrupedal locomotion over challenging terrain. Sci Robot 2020;5. Available from: https://robotics.sciencemag.org/content/5/47/eabc5986 [Last accessed on 30 Aug 2022].

8. Miki T, Lee J, Hwangbo J, et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci Robot 2022;7:eabk2822. Available from: https://www.science.org/doi/abs/10.1126/scirobotics.abk2822 [Last accessed on 30 Aug 2022].

9. Yang C, Yuan K, Zhu Q, Yu W, Li Z. Multi-expert learning of adaptive legged locomotion. Sci Robot 2020;5. Available from: https://robotics.sciencemag.org/content/5/49/eabb2174 [Last accessed on 30 Aug 2022].

10. Peng XB, Coumans E, Zhang T, et al. Learning agile robotic locomotion skills by imitating animals. In: Robotics: Science and Systems; 2020.

11. Fu Z, Kumar A, Agarwal A, et al. Coupling vision and proprioception for navigation of legged robots. arXiv preprint arXiv: 2112.02094 2021. Available from: https://openaccess.thecvf.com/content/CVPR2022/html/Fu_Coupling_Vision_and_Proprioception_for_Navigation_of_Legged_Robots_CVPR_2022_paper.html [Last accessed on 30 Aug 2022].

12. Bohez S, Tunyasuvunakool S, Brakel P, et al. Imitate and repurpose: learning reusable robot movement skills from human and animal behaviors. arXiv preprint arXiv: 2203.17138 2022.

13. Yang R, Zhang M, Hansen N, Xu H, Wang X. Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers. arXiv preprint arXiv: 2107.03996 2021.

14. Zhang T, Mo H. Reinforcement learning for robot research: a comprehensive review and open issues. Int J Adv Robot Syst 2021;18:17298814211007305.

15. Ibarz J, Tan J, Finn C, et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int J Robot Res 2021;40:698-721.

16. Koos S, Mouret JB, Doncieux S. Crossing the reality gap in evolutionary robotics by promoting transferable controllers. In: Conference on Genetic and Evolutionary Computation. ACM; 2010. pp. 119-26. Available from: https://hal.archives-ouvertes.fr/hal-00633927.

17. Zhao W, Queralta JP, Westerlund T. Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE; 2020. pp. 737-44.

18. Yue J. Learning locomotion for legged robots based on reinforcement learning: a survey. In: 2020 International Conference on Electrical Engineering and Control Technologies (CEECT). IEEE; 2020. pp. 1-7.

19. Sutton RS, Barto AG. Introduction to reinforcement learning. MIT Press; 1998. Available from: https://login.cs.utexas.edu/sites/default/files/legacy_files/research/documents/1%20intro%20up%20to%20RL%3ATD.pdf [Last accessed on 30 Aug 2022].

20. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. pp. 1889-97. Available from: https://proceedings.mlr.press/v37/schulman15.html [Last accessed on 30 Aug 2022].

21. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347. Available from: http://arxiv.org/abs/1707.06347 [Last accessed on 30 Aug 2022].

22. Schulman J, Moritz P, Levine S, Jordan MI, Abbeel P. High-dimensional continuous control using generalized advantage estimation. In: Bengio Y, LeCun Y, editors. 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings; 2016. Available from: http://arxiv.org/abs/1506.02438 [Last accessed on 30 Aug 2022].

23. Mania H, Guy A, Recht B. Simple random search provides a competitive approach to reinforcement learning. CoRR 2018;abs/1803.07055. Available from: http://arxiv.org/abs/1803.07055 [Last accessed on 30 Aug 2022].

24. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy JG, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1856-65. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html [Last accessed on 30 Aug 2022].

25. Song HF, Abdolmaleki A, Springenberg JT, et al. V-MPO: on-policy maximum a posteriori policy optimization for discrete and continuous control. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net; 2020. Available from: https://openreview.net/forum?id=SylOlp4FvH [Last accessed on 30 Aug 2022].

26. Abdolmaleki A, Huang SH, Hasenclever L, et al. A distributional view on multi-objective policy optimization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. vol. 119 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 11-22. Available from: http://proceedings.mlr.press/v119/abdolmaleki20a.html [Last accessed on 30 Aug 2022].

27. Brakel P, Bohez S, Hasenclever L, Heess N, Bousmalis K. Learning coordinated terrain-adaptive locomotion by imitating a centroidal dynamics planner. CoRR 2021;abs/2111.00262. Available from: https://arxiv.org/abs/2111.00262 [Last accessed on 30 Aug 2022].

28. Gangapurwala S, Mitchell AL, Havoutis I. Guided constrained policy optimization for dynamic quadrupedal robot locomotion. IEEE Robot Autom Lett 2020;5:3642-49.

29. Chen X, Wang C, Zhou Z, Ross KW. Randomized ensembled double Q-learning: learning fast without a model. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net; 2021. Available from: https://openreview.net/forum?id=AY8zfZm0tDd [Last accessed on 30 Aug 2022].

30. Smith L, Kew JC, Peng XB, et al. Legged robots that keep on learning: fine-tuning locomotion policies in the real world. In: 2022 IEEE International Conference on Robotics and Automation (ICRA); 2022. pp. 1-7.

31. Coumans E, Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning; 2016-2021. Available from: http://pybullet.org.

32. Hwangbo J, Lee J, Hutter M. Per-contact iteration method for solving contact dynamics. IEEE Robot Autom Lett 2018;3:895-902. Available from: https://doi.org/10.1109/LRA.2018.2792536 [Last accessed on 30 Aug 2022].

33. Todorov E, Erez T, Tassa Y. MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE; 2012. pp. 5026-33.

34. Makoviychuk V, Wawrzyniak L, Guo Y, et al. Isaac Gym: high performance GPU-based physics simulation for robot learning. In: Vanschoren J, Yeung S, editors. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual; 2021. Available from: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/28dd2c7955ce926456240b2ff0100bde-Abstract-round2.html [Last accessed on 30 Aug 2022].

35. Rudin N, Hoeller D, Reist P, Hutter M. Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning. PMLR; 2022. pp. 91-100. Available from: https://proceedings.mlr.press/v164/rudin22a.html [Last accessed on 30 Aug 2022].

36. Margolis GB, Yang G, Paigwar K, Chen T, Agrawal P. Rapid locomotion via reinforcement learning. arXiv preprint arXiv: 2205.02824 2022.

37. Escontrela A, Peng XB, Yu W, et al. Adversarial motion priors make good substitutes for complex reward functions. arXiv preprint arXiv: 2203.15103 2022.

38. Vollenweider E, Bjelonic M, Klemm V, et al. Advanced skills through multiple adversarial motion priors in reinforcement learning. arXiv preprint arXiv: 2203.14912 2022.

39. Tan J, Zhang T, Coumans E, et al. Sim-to-real: learning agile locomotion for quadruped robots. In: Kress-Gazit H, Srinivasa SS, Howard T, Atanasov N, editors. Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26-30, 2018; 2018. Available from: http://www.roboticsproceedings.org/rss14/p10.html [Last accessed on 30 Aug 2022].

40. Hutter M, Gehring C, Jud D, et al. ANYmal - a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2016. pp. 38-44.

41. Ha S, Xu P, Tan Z, Levine S, Tan J. Learning to walk in the real world with minimal human effort. In: Conference on Robot Learning. vol. 155 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 1110-20. Available from: https://proceedings.mlr.press/v155/ha21c.html [Last accessed on 30 Aug 2022].

42. Gangapurwala S, Geisert M, Orsolino R, Fallon M, Havoutis I. RLOC: terrain-aware legged locomotion using reinforcement learning and optimal control. IEEE Trans Robot 2022.

43. Peng XB, van de Panne M. Learning locomotion skills using DeepRL: does the choice of action space matter? In: Teran J, Zheng C, Spencer SN, Thomaszewski B, Yin K, editors. Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Los Angeles, CA, USA, July 28-30, 2017. Eurographics Association/ACM; 2017. pp. 12:1-12:13. Available from: https://doi.org/10.1145/3099564.3099567 [Last accessed on 30 Aug 2022].

44. Chen S, Zhang B, Mueller MW, Rai A, Sreenath K. Learning torque control for quadrupedal locomotion. CoRR 2022;abs/2203.05194. Available from: https://doi.org/10.48550/arXiv.2203.05194 [Last accessed on 30 Aug 2022].

45. Carlo JD, Wensing PM, Katz B, Bledt G, Kim S. Dynamic locomotion in the MIT cheetah 3 through convex model-predictive control. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018, Madrid, Spain, October 1-5, 2018. IEEE; 2018. pp. 1-9. Available from: https://doi.org/10.1109/IROS.2018.8594448 [Last accessed on 30 Aug 2022].

46. Jain D, Iscen A, Caluwaerts K. Hierarchical reinforcement learning for quadruped locomotion. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019, Macau, SAR, China, November 3-8, 2019. IEEE; 2019. pp. 7551-57. Available from: https://doi.org/10.1109/IROS40897.2019.8967913 [Last accessed on 30 Aug 2022].

47. Li T, Lambert NO, Calandra R, Meier F, Rai A. Learning generalizable locomotion skills with hierarchical reinforcement learning. In: 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31 - August 31, 2020. IEEE; 2020. pp. 413-19. Available from: https://doi.org/10.1109/ICRA40945.2020.9196642 [Last accessed on 30 Aug 2022].

48. Lee J, Hwangbo J, Hutter M. Robust recovery controller for a quadrupedal robot using deep reinforcement learning. CoRR 2019;abs/1901.07517. Available from: http://arxiv.org/abs/1901.07517 [Last accessed on 30 Aug 2022].

49. Iscen A, Caluwaerts K, Tan J, et al. Policies modulating trajectory generators. In: 2nd Annual Conference on Robot Learning, CoRL 2018, Zürich, Switzerland, 29-31 October 2018, Proceedings. vol. 87 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 916-26. Available from: http://proceedings.mlr.press/v87/iscen18a.html [Last accessed on 30 Aug 2022].

50. Rahme M, Abraham I, Elwin ML, Murphey TD. Dynamics and domain randomized gait modulation with Bézier curves for sim-to-real legged locomotion. CoRR 2020;abs/2010.12070. Available from: https://arxiv.org/abs/2010.12070 [Last accessed on 30 Aug 2022].

51. Zhang H, Wang J, Wu Z, Wang Y, Wang D. Terrain-aware risk-assessment-network-aided deep reinforcement learning for quadrupedal locomotion in tough terrain. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021. IEEE; 2021. pp. 4538-45. Available from: https://doi.org/10.1109/IROS51168.2021.9636519 [Last accessed on 30 Aug 2022].

52. Yang Y, Zhang T, Coumans E, Tan J, Boots B. Fast and efficient locomotion via learned gait transitions. In: Faust A, Hsu D, Neumann G, editors. Conference on Robot Learning, 8-11 November 2021, London, UK. vol. 164 of Proceedings of Machine Learning Research. PMLR; 2021. pp. 773-83. Available from: https://proceedings.mlr.press/v164/yang22d.html [Last accessed on 30 Aug 2022].

53. Gangapurwala S, Geisert M, Orsolino R, Fallon MF, Havoutis I. Real-time trajectory adaptation for quadrupedal locomotion using deep reinforcement learning. In: IEEE International Conference on Robotics and Automation, ICRA 2021, Xi'an, China, May 30 - June 5, 2021. IEEE; 2021. pp. 5973-79. Available from: https://doi.org/10.1109/ICRA48506.2021.9561639 [Last accessed on 30 Aug 2022].

54. Yao Q, Wang J, Wang D, et al. Hierarchical terrain-aware control for quadrupedal locomotion by combining deep reinforcement learning and optimal control. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021. IEEE; 2021. pp. 4546-51. Available from: https://doi.org/10.1109/IROS51168.2021.9636738 [Last accessed on 30 Aug 2022].

55. Singla A, Bhattacharya S, Dholakiya D, et al. Realizing learned quadruped locomotion behaviors through kinematic motion primitives. In: International Conference on Robotics and Automation, ICRA 2019, Montreal, QC, Canada, May 20-24, 2019. IEEE; 2019. pp. 7434-40. Available from: https://doi.org/10.1109/ICRA.2019.8794179 [Last accessed on 30 Aug 2022].

56. Vollenweider E, Bjelonic M, Klemm V, et al. Advanced skills through multiple adversarial motion priors in reinforcement learning. arXiv preprint arXiv: 2203.14912 2022.

57. Li A, Wang Z, Wu J, Zhu Q. Efficient learning of control policies for robust quadruped bounding using pretrained neural networks. arXiv preprint arXiv: 2011.00446 2020.

58. Shao Y, Jin Y, Liu X, et al. Learning free gait transition for quadruped robots via phase-guided controller. IEEE Robot Autom Lett 2021;7:1230-37.

59. Luo J, Hauser KK. Robust trajectory optimization under frictional contact with iterative learning. Auton Robots 2017;41:1447-61. Available from: https://doi.org/10.1007/s10514-017-9629-x [Last accessed on 30 Aug 2022].

60. Kumar A, Fu Z, Pathak D, Malik J. RMA: rapid motor adaptation for legged robots. In: Proceedings of Robotics: Science and Systems. Virtual; 2021.

61. Liu J, Zhang H, Wang D. DARA: dynamics-aware reward augmentation in offline reinforcement learning. CoRR 2022;abs/2203.06662. Available from: https://doi.org/10.48550/arXiv.2203.06662 [Last accessed on 30 Aug 2022].

62. Shi H, Zhou B, Zeng H, et al. Reinforcement learning with evolutionary trajectory generator: a general approach for quadrupedal locomotion. IEEE Robot Autom Lett 2022;7:3085-92. Available from: https://doi.org/10.1109/LRA.2022.3145495 [Last accessed on 30 Aug 2022].

63. Zhao W, Queralta JP, Westerlund T. Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020, Canberra, Australia, December 1-4, 2020. IEEE; 2020. pp. 737-44. Available from: https://doi.org/10.1109/SSCI47803.2020.9308468 [Last accessed on 30 Aug 2022].

64. Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv: 1312.5602 2013.

65. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv: 1707.06347 2017.

66. Wang C, Yang T, Hao J, et al. ED2: an environment dynamics decomposition framework for world model construction. CoRR 2021;abs/2112.02817. Available from: https://arxiv.org/abs/2112.02817 [Last accessed on 30 Aug 2022].

67. Kostrikov I, Yarats D, Fergus R. Image augmentation is all you need: regularizing deep reinforcement learning from pixels. arXiv preprint arXiv: 2004.13649 2020.

68. Yarats D, Fergus R, Lazaric A, Pinto L. Mastering visual continuous control: improved data-augmented reinforcement learning. arXiv preprint arXiv: 2107.09645 2021.

69. Ahmed O, Träuble F, Goyal A, et al. CausalWorld: a robotic manipulation benchmark for causal structure and transfer learning. arXiv preprint arXiv: 2010.04296 2020.

70. Dittadi A, Träuble F, Wüthrich M, et al. The role of pretrained representations for the OOD generalization of RL agents. arXiv preprint arXiv: 2107.05686 2021.

71. Hsu K, Kim MJ, Rafailov R, Wu J, Finn C. Vision-based manipulators need to also see from their hands. arXiv preprint arXiv: 2203.12677 2022.

72. Eysenbach B, Asawa S, Chaudhari S, Levine S, Salakhutdinov R. Off-dynamics reinforcement learning: training for transfer with domain classifiers. arXiv preprint arXiv: 2006.13916 2020.

73. Liu J, Zhang H, Wang D. DARA: dynamics-aware reward augmentation in offline reinforcement learning. arXiv preprint arXiv: 2203.06662 2022.

74. Lee K, Seo Y, Lee S, Lee H, Shin J. Context-aware dynamics model for generalization in model-based reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. vol. 119 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 5757-66. Available from: http://proceedings.mlr.press/v119/lee20g.html [Last accessed on 30 Aug 2022].

75. Yu W, Tan J, Liu CK, Turk G. Preparing for the unknown: learning a universal policy with online system identification. arXiv preprint arXiv: 1702.02453 2017.

76. Chen D, Zhou B, Koltun V, Krähenbühl P. Learning by cheating. In: Conference on Robot Learning. PMLR; 2020. pp. 66-75. Available from: http://proceedings.mlr.press/v100/chen20a.html [Last accessed on 30 Aug 2022].

77. Tobin J, Fong R, Ray A, et al. Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2017. pp. 23-30.

78. Peng XB, Andrychowicz M, Zaremba W, Abbeel P. Sim-to-real transfer of robotic control with dynamics randomization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2018.

79. Kaiser L, Babaeizadeh M, Milos P, et al. Model-based reinforcement learning for Atari. arXiv preprint arXiv: 1903.00374 2019.

80. Schrittwieser J, Antonoglou I, Hubert T, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 2020;588:604-9.

81. Ye W, Liu S, Kurutach T, Abbeel P, Gao Y. Mastering Atari games with limited data. Adv Neural Inform Proc Syst 2021;34:25476-88. Available from: https://proceedings.neurips.cc/paper/2021/hash/d5eca8dc3820cad9fe56a3bafda65ca1-Abstract.html [Last accessed on 30 Aug 2022].

82. Chua K, Calandra R, McAllister R, Levine S. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Adv Neural Inform Proc Syst 2018;31. Available from: https://proceedings.neurips.cc/paper/2018/hash/3de568f8597b94bda53149c7d7f5958c-Abstract.html [Last accessed on 30 Aug 2022].

83. Janner M, Fu J, Zhang M, Levine S. When to trust your model: model-based policy optimization. Adv Neural Inform Proc Syst 2019;32. Available from: https://proceedings.neurips.cc/paper/2019/hash/5faf461eff3099671ad63c6f3f094f7f-Abstract.html [Last accessed on 30 Aug 2022].

84. Hansen N, Wang X, Su H. Temporal difference learning for model predictive control. arXiv preprint arXiv: 2203.04955 2022.

85. Tassa Y, Doron Y, Muldal A, et al. DeepMind control suite. arXiv preprint arXiv: 1801.00690 2018.

86. Peng XB, Ma Z, Abbeel P, Levine S, Kanazawa A. AMP: adversarial motion priors for stylized physics-based character control. ACM Trans Graph (TOG) 2021;40:1-20.

87. Peng XB, Guo Y, Halper L, Levine S, Fidler S. ASE: large-scale reusable adversarial skill embeddings for physically simulated characters. arXiv preprint arXiv: 2205.01906 2022.

88. Merel J, Hasenclever L, Galashov A, et al. Neural probabilistic motor primitives for humanoid control. arXiv preprint arXiv: 1811.11711 2018.

89. Hasenclever L, Pardo F, Hadsell R, Heess N, Merel J. CoMic: complementary task learning & mimicry for reusable skills. In: International Conference on Machine Learning. PMLR; 2020. pp. 4105-15. Available from: https://proceedings.mlr.press/v119/hasenclever20a.html [Last accessed on 30 Aug 2022].

90. Liu S, Lever G, Wang Z, et al. From motor control to team play in simulated humanoid football. arXiv preprint arXiv: 2105.12196 2021.

91. Escontrela A, Peng XB, Yu W, et al. Adversarial motion priors make good substitutes for complex reward functions. arXiv preprint arXiv: 2203.15103 2022.

92. Levine S, Kumar A, Tucker G, Fu J. Offline reinforcement learning: tutorial, review, and perspectives on open problems. arXiv preprint arXiv: 2005.01643 2020.

93. Duan Y, Schulman J, Chen X, et al. RL^2: fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv: 1611.02779 2016.

94. Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms. arXiv preprint arXiv: 1803.02999 2018.

95. Rakelly K, Zhou A, Finn C, Levine S, Quillen D. Efficient off-policy meta-reinforcement learning via probabilistic context variables. In: International Conference on Machine Learning. PMLR; 2019. pp. 5331-40. Available from: http://proceedings.mlr.press/v97/rakelly19a.html [Last accessed on 30 Aug 2022].

96. Seo Y, Lee K, James SL, Abbeel P. Reinforcement learning with action-free pre-training from videos. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. vol. 162 of Proceedings of Machine Learning Research. PMLR; 2022. pp. 19561-79. Available from: https://proceedings.mlr.press/v162/seo22a.html [Last accessed on 30 Aug 2022].

97. Mandi Z, Abbeel P, James S. On the effectiveness of fine-tuning versus meta-reinforcement learning. arXiv preprint arXiv: 2206.03271 2022.

98. Haarnoja T, Zhou A, Ha S, et al. Learning to walk via deep reinforcement learning. In: Robotics: Science and Systems; 2019.

99. Yang Y, Caluwaerts K, Iscen A, et al. Data efficient reinforcement learning for legged robots. In: Kaelbling LP, Kragic D, Sugiura K, editors. 3rd Annual Conference on Robot Learning, CoRL 2019, Osaka, Japan, October 30 - November 1, 2019, Proceedings. vol. 100 of Proceedings of Machine Learning Research. PMLR; 2019. pp. 1-10. Available from: http://proceedings.mlr.press/v100/yang20a.html [Last accessed on 30 Aug 2022].

100. Tsounis V, Alge M, Lee J, Farshidian F, Hutter M. DeepGait: planning and control of quadrupedal gaits using deep reinforcement learning. IEEE Robot Autom Lett 2020;5:3699-706.

101. Da X, Xie Z, Hoeller D, et al. Learning a contact-adaptive controller for robust, efficient legged locomotion. In: Conference on Robot Learning. PMLR; 2020. Available from: https://proceedings.mlr.press/v155/da21a.html [Last accessed on 30 Aug 2022].

102. Liang J, Makoviychuk V, Handa A, et al. GPU-accelerated robotic simulation for distributed reinforcement learning. CoRR 2018;abs/1810.05762. Available from: http://arxiv.org/abs/1810.05762 [Last accessed on 30 Aug 2022].

103. Escontrela A, Yu G, Xu P, Iscen A, Tan J. Zero-shot terrain generalization for visual locomotion policies. CoRR 2020;abs/2011.05513. Available from: https://arxiv.org/abs/2011.05513 [Last accessed on 30 Aug 2022].

104. Jiang Y, Zhang T, Ho D, et al. SimGAN: hybrid simulator identification for domain adaptation via adversarial reinforcement learning. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE; 2021. pp. 2884-90. Available from: https://doi.org/10.1109/ICRA48506.2021.9561731 [Last accessed on 30 Aug 2022].

105. Tan W, Fang X, Zhang W, et al. A hierarchical framework for quadruped locomotion based on reinforcement learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021. IEEE; 2021. pp. 8462-68. Available from: https://doi.org/10.1109/IROS51168.2021.9636757 [Last accessed on 30 Aug 2022].

106. Michel O. Webots: professional mobile robot simulation. CoRR 2004;abs/cs/0412052. Available from: http://arxiv.org/abs/cs/0412052 [Last accessed on 30 Aug 2022].

107. Fu Z, Kumar A, Malik J, Pathak D. Minimizing energy consumption leads to the emergence of gaits in legged robots. CoRR 2021;abs/2111.01674. Available from: https://arxiv.org/abs/2111.01674 [Last accessed on 30 Aug 2022].

108. Kim S, Sorokin M, Lee J, Ha S. Human motion control of quadrupedal robots using deep reinforcement learning. arXiv preprint arXiv: 2204.13336 2022. Available from: http://www.roboticsproceedings.org/rss18/p021.pdf [Last accessed on 30 Aug 2022].

109. Bogdanovic M, Khadiv M, Righetti L. Model-free reinforcement learning for robust locomotion using trajectory optimization for exploration. arXiv preprint arXiv: 2107.06629 2021.

110. Fernbach P, Tonneau S, Stasse O, Carpentier J, Taïx M. C-CROC: continuous and convex resolution of centroidal dynamic trajectories for legged robots in multicontact scenarios. IEEE Trans Robot 2020;36:676-91.

111. Zhang H, Starke S, Komura T, Saito J. Mode-adaptive neural networks for quadruped motion control. ACM Trans Graph (TOG) 2018;37:1-11.

112. Feldman A, Goussev V, Sangole A, Levin M. Threshold position control and the principle of minimal interaction in motor actions. Prog Brain Res 2007;165:267-81.

113. Winkler AW, Bellicoso CD, Hutter M, Buchli J. Gait and trajectory optimization for legged systems through phase-based end-effector parameterization. IEEE Robot Autom Lett 2018;3:1560-67.

114. Liu H, Jia W, Bi L. Hopf oscillator based adaptive locomotion control for a bionic quadruped robot. In: 2017 IEEE International Conference on Mechatronics and Automation (ICMA); 2017. pp. 949-54.

115. Carlo JD, Wensing PM, Katz B, Bledt G, Kim S. Dynamic locomotion in the MIT cheetah 3 through convex model-predictive control. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2018. pp. 1-9.

116. Bellicoso D, Jenelten F, Fankhauser P, et al. Dynamic locomotion and whole-body control for quadrupedal robots. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2017. pp. 3359-65.

117. Sethian JA. Fast marching methods. SIAM Rev 1999;41:199-235.

118. Ponton B, Khadiv M, Meduri A, Righetti L. Efficient multi-contact pattern generation with sequential convex approximations of the centroidal dynamics. CoRR 2020;abs/2010.01215. Available from: https://arxiv.org/abs/2010.01215 [Last accessed on 30 Aug 2022].

119. Zhang T, McCarthy Z, Jow O, et al. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. CoRR 2017;abs/1710.04615. Available from: http://arxiv.org/abs/1710.04615 [Last accessed on 30 Aug 2022].

120. Thor M, Kulvicius T, Manoonpong P. Generic neural locomotion control framework for legged robots. IEEE Trans Neural Netw Learn Syst 2021;32:4013-25.
