1 Abstract

Brief description of your task and how you went about solving it.

2 Introduction

Based on the literature (survey articles, books, journals, conference proceedings) which you have found, explain and discuss how the scientific subject you have been investigating is embedded in its superior field, e.g. learning by experimentation is a subfield of machine learning. What are the neighboring disciplines (inductive learning, one-shot learning, reinforcement learning, etc.)? What are the special aspects addressed in the subject, and how do they distinguish it from neighboring subjects/disciplines? What are the typical assumptions made in research on the subject? What is the methodology used in the field? (one page)

3 Description of the subject - Deep Learning in the context of robotics

Based on the literature (survey articles, books, journals, conference proceedings) which you have found, explain and discuss how the scientific subject you have been investigating is decomposed into different subfields and/or aspects and/or problem areas (e.g. learning by experimentation: cognitive/developmental psychology, epistemology, theory of experimentation, optimal design and evaluation of experiments, etc.). Explain for each subfield/aspect/problem area why you think it is of crucial importance to the subject you have been investigating. Explain why you think that the set of subfields/aspects/problems you have identified in fact covers the whole subject. (one page)

Harley’s note: add here a short summary of the next sections and what they cover.

Robot Design

Multimodal Sensors and Actuators

“‘Multimodal’ means to combine different channels of information simultaneously to understand our surroundings.”

Multimodal sensor fusion:

Locomotion

In robotic systems there are several ways to move the platform from a point “a” to a point “b”. From a taxonomic point of view, locomotion can be roughly divided by the medium through which the robotic system moves: essentially air, land, and water. Each medium can be further expanded into mechanical-structural categories [Yim M (1994) Locomotion with a Unit-Modular Reconfigurable Robot] [A review of robotics taxonomies in terms of form and structure]; for land locomotion, these categories correspond to legged, wheeled, and exoskeleton systems.

With the advancement of deep learning and deep reinforcement learning methods, it has become possible to develop locomotion models that adapt better to the dynamic nature of real environments, without the considerable human fine-tuning effort that classical approaches require.
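
As a rough illustration of what such a learned locomotion model can look like (not taken from any single cited paper; all sizes and gains below are invented for the example), a policy network maps proprioceptive observations to joint-position targets that a low-level PD controller then tracks:

# Minimal sketch of a learned locomotion policy: an MLP maps proprioceptive
# observations (joint angles, velocities, base orientation) to joint-position
# targets. Dimensions and gains are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 48, 12  # hypothetical sizes for a quadruped

policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, ACT_DIM),  # joint-position targets
)

obs = torch.randn(1, OBS_DIM)             # stand-in for sensor readings
action = policy(obs)                      # desired joint positions
# A low-level PD controller would track these targets on the robot:
q, dq = torch.zeros(1, ACT_DIM), torch.zeros(1, ACT_DIM)
kp, kd = 60.0, 2.0
torques = kp * (action - q) - kd * dq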

Zero-shot transfer and learned policies:

Manipulation

Inspired by research in the fields of biological cybernetics and neuroscience, which suggests that the brain constantly predicts the next state of sensation and movement and behaves so as to minimize the error (prediction error ≃ uncertainty) between prediction and reality, deep predictive learning (DPL) has been proposed. It is described as adjusting cognitive models (perceptual inference) and behavior (active inference) to the outside world.
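
To make the prediction-error-minimization idea concrete, here is a minimal sketch (an assumption-laden toy, not any paper’s exact architecture): a recurrent model predicts the next sensorimotor state, and training minimizes the error between prediction and observation.

# Hedged sketch of the deep predictive learning idea: a recurrent model
# predicts the next sensorimotor state, and training minimizes the
# prediction error between predicted and observed states.
import torch
import torch.nn as nn

STATE_DIM, HIDDEN = 32, 64  # illustrative sizes

rnn = nn.LSTM(STATE_DIM, HIDDEN, batch_first=True)
head = nn.Linear(HIDDEN, STATE_DIM)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

seq = torch.randn(8, 50, STATE_DIM)      # stand-in sensorimotor sequences
h, _ = rnn(seq[:, :-1])                  # encode states t = 0..T-1
pred = head(h)                           # predict states t = 1..T
loss = nn.functional.mse_loss(pred, seq[:, 1:])  # prediction error
opt.zero_grad(); loss.backward(); opt.step()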

3D registration:

Hardware-centric:

Learning by demonstration:

Deep Reinforcement Learning:

RL methods require careful hyperparameter tuning, are difficult to train, and do not scale well to high-dimensional action spaces.

Challenges:

Perception

Visual:

Haptic:

Hearing:

4 Annotated Bibliography - Deep Learning in the context of robotics

In this section you should establish a subsection for each subfield/aspect/problem area which you have identified in the foregoing section (“Description of the subject”). In each subsection, give a brief overview of the subfield and list the annotated bibliography, i.e. all the papers you found for this subfield, where each entry consists of the reference itself and a brief summary of the paper’s content. (as many pages as it takes)

Locomotion


My template


Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

Keywords: Reinforcement Learning, Legged Robots, Sim-to-real.

Abstract: In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged_gym/.
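
To illustrate the core idea of massive parallelism (a hedged toy sketch; the `step` function below is a stand-in for a batched GPU physics simulator, and all sizes are assumptions), thousands of robots are stepped as one batched tensor operation, so a single call produces thousands of transitions:

# Illustrative sketch of the massively parallel idea: thousands of simulated
# robots are stepped as a single batched tensor operation, so one workstation
# produces enormous amounts of experience per wall-clock second.
import torch

NUM_ENVS, OBS_DIM, ACT_DIM = 4096, 48, 12  # illustrative sizes

obs = torch.zeros(NUM_ENVS, OBS_DIM)       # one row per simulated robot

def step(obs, act):
    """Stand-in for a batched physics step of the simulator."""
    next_obs = obs + 0.01 * torch.randn_like(obs)
    reward = -act.square().mean(dim=1)      # toy reward
    return next_obs, reward

act = torch.randn(NUM_ENVS, ACT_DIM)
obs, reward = step(obs, act)   # 4096 transitions from a single call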

Proposed approach

Method(s) for evaluating approach

Contributions

Open-ended research questions


Learning robust perceptive locomotion for quadrupedal robots in the wild.

Authors: Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

Keywords:

Abstract: Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step – or are missing altogether due to high reflectance. Additionally, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed, because the robot has to physically feel out the terrain before adapting its gait accordingly. Here we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end-to-end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers.

Deep Reinforcement Learning Pipeline (a minimal teacher-student sketch follows the list):

  1. Privileged agent, learning the teacher policy:
  2. Student policy:
  3. Loss functions:
  4. Deployment:
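
As a hedged sketch of the teacher-student (privileged learning) step in this kind of pipeline (the teacher’s own RL training is omitted, and all dimensions are invented for the example):

# Hedged sketch of privileged teacher-student training: the teacher acts on
# privileged simulator state; the student only sees noisy onboard
# observations and is trained to imitate the teacher's actions.
import torch
import torch.nn as nn

PRIV_DIM, OBS_DIM, ACT_DIM = 64, 48, 12  # illustrative sizes

teacher = nn.Sequential(nn.Linear(PRIV_DIM, 128), nn.ELU(), nn.Linear(128, ACT_DIM))
student = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ELU(), nn.Linear(128, ACT_DIM))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

priv = torch.randn(256, PRIV_DIM)        # privileged state (simulation only)
obs = torch.randn(256, OBS_DIM)          # what the real robot would sense
with torch.no_grad():
    target_act = teacher(priv)           # teacher's actions as labels
loss = nn.functional.mse_loss(student(obs), target_act)  # imitation loss
opt.zero_grad(); loss.backward(); opt.step()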

Method(s) for evaluating approach

Contributions

Results:

Challenges

Personal Notes:


Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Markus Wulfmeier, Jan Humplik, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, Francesco Nori, Raia Hadsell, Nicolas Heess

Keywords:

Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner—well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156 % faster, took 63 % less time to get up, and kicked 24 % faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website: https://sites.google.com/view/op3-soccer.
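
The abstract credits targeted dynamics randomization as one enabler of sim-to-real transfer. A minimal sketch of that general technique (parameter names and ranges below are invented, not the paper’s):

# Illustrative sketch of dynamics randomization: at each episode, physical
# parameters are resampled within hand-chosen ranges so the learned policy
# cannot overfit to one simulator instance.
import random

RANDOMIZATION_RANGES = {
    "joint_friction": (0.8, 1.2),   # multiplier on the nominal value
    "link_mass":      (0.9, 1.1),
    "motor_latency":  (0.00, 0.02), # seconds
}

def sample_dynamics():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in RANDOMIZATION_RANGES.items()}

for episode in range(3):
    params = sample_dynamics()
    # env.set_dynamics(params)  # hypothetical simulator hook
    print(episode, params)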

Proposed approach:

Reward policy:

The action space:

The proprioceptive sensing consists of:

The game state information is obtained via a motion capture setup in the real environment:

Training pipeline:

Self-Play:

Sim-to-Real Transfer:

Method(s) for evaluating approach

Robotic platform: OP3

Training time:

To compare their results, they isolated three behaviors:

Personal Notes:

Open-ended research questions:


Learning High-Speed Flight in the Wild

Authors: Antonio Loquercio, Elia Kaufmann, René Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

Keywords:

Abstract: Quadrotors are agile. Unlike most other machines, they can traverse extremely complex environments at high speeds. To date, only expert human pilots have been able to fully exploit their capabilities. Autonomous operation with onboard sensing and computation has been limited to low speeds. State-of-the-art methods generally separate the navigation problem into subtasks: sensing, mapping, and planning. Although this approach has proven successful at low speeds, the separation it builds upon can be problematic for high-speed navigation in cluttered environments. The subtasks are executed sequentially, leading to increased processing latency and a compounding of errors through the pipeline. Here we propose an end-to-end approach that can autonomously fly quadrotors through complex natural and human-made environments at high speeds, with purely onboard sensing and computation. The key principle is to directly map noisy sensory observations to collision-free trajectories in a receding-horizon fashion. This direct mapping drastically reduces processing latency and increases robustness to noisy and incomplete perception. The sensorimotor mapping is performed by a convolutional network that is trained exclusively in simulation via privileged learning: imitating an expert with access to privileged information. By simulating realistic sensor noise, our approach achieves zero-shot transfer from simulation to challenging real-world environments that were never experienced during training: dense forests, snow-covered terrain, derailed trains, and collapsed buildings. Our work demonstrates that end-to-end policies trained in simulation enable high-speed autonomous flight through challenging environments, outperforming traditional obstacle avoidance pipelines.
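
The central mechanism here is privileged imitation: a student network maps noisy sensory input to trajectories, supervised by an expert planner with full map access. A hedged sketch of that training step (the convolutional architecture, image size, and waypoint encoding are assumptions for the example):

# Hedged sketch of privileged learning for high-speed flight: a student
# network maps (simulated, noisy) depth images to short trajectories,
# supervised by an expert planner that sees the full 3D map.
import torch
import torch.nn as nn

H, W, WAYPOINTS = 64, 64, 10  # illustrative input and trajectory sizes

student = nn.Sequential(
    nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 13 * 13, 3 * WAYPOINTS),  # (x, y, z) per waypoint
)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

depth = torch.randn(16, 1, H, W)                 # noisy depth images
expert_traj = torch.randn(16, 3 * WAYPOINTS)     # privileged expert output
loss = nn.functional.mse_loss(student(depth), expert_traj)
opt.zero_grad(); loss.backward(); opt.step()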

Proposed approach:

The policy:

Method(s) for evaluating approach:

Contributions:

Conclusions:

Results:

Challenges:

Personal Notes:

Manipulation


Robot peels banana with goal-conditioned dual-action deep imitation learning

Authors: Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi

Keywords: Imitation Learning, Deep Learning in Grasping and Manipulation, Dual Arm Manipulation, Force and Tactile Sensing, Telerobotics and Teleoperation

Abstract: A long-horizon dexterous robot manipulation task of deformable objects, such as banana peeling, is problematic because of difficulties in object modeling and a lack of knowledge about stable and dexterous manipulation skills. This paper presents a goal-conditioned dual-action deep imitation learning (DIL) which can learn dexterous manipulation skills using human demonstration data. Previous DIL methods map the current sensory input and reactive action, which easily fails because of compounding errors in imitation learning caused by recurrent computation of actions. The proposed method predicts reactive action when the precise manipulation of the target object is required (local action) and generates the entire trajectory when the precise manipulation is not required. This dual-action formulation effectively prevents compounding error with the trajectory-based global action while responding to unexpected changes in the target object with the reactive local action. Furthermore, in this formulation, both global/local actions are conditioned by a goal state which is defined as the last step of each subtask, for robust policy prediction. The proposed method was tested on a real dual-arm robot and successfully accomplished the banana peeling task.
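
A hedged sketch of the dual-action idea in miniature (the linear trajectory, proportional local policy, and distance-based switching threshold are stand-ins for the paper’s learned components):

# Hedged sketch of the dual-action idea: when precise manipulation is not
# needed, execute a pre-generated global trajectory; when the end effector
# is near the target, switch to a reactive local policy.
import numpy as np

def global_trajectory(start, goal, steps=20):
    """Open-loop trajectory: linear interpolation as a toy global action."""
    return np.linspace(start, goal, steps)

def local_policy(obs_error):
    """Reactive local action: a small corrective step toward the target."""
    return 0.2 * obs_error

ee, target = np.array([0.0, 0.0, 0.5]), np.array([0.3, 0.1, 0.2])
NEAR = 0.05  # illustrative switching threshold (meters)
if np.linalg.norm(target - ee) > NEAR:
    plan = global_trajectory(ee, target)      # trajectory-based global action
else:
    step = local_policy(target - ee)          # reactive local action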

Proposed approach:

A. Robot framework:
B. Task specification:
C. Goal-conditioned dual-action:

D. Model Architecture:

Training (TODO: describe the training process)

Method(s) for evaluating approach:

Ablation studies (each ablation study was tested with 15 bananas):

Ablation study:

The effect of goal conditioning:

Contributions:

Results

Challenges:

Open-ended research questions:


Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

Authors: Charles Sun, Jędrzej Orbik, Coline Devin, Brian Yang, Abhishek Gupta, Glen Berseth, Sergey Levine

Keywords: Mobile Manipulation, Reinforcement Learning, Reset-Free

Abstract: We study how robots can autonomously learn skills that require a combination of navigation and grasping. While reinforcement learning in principle provides for automated robotic skill learning, in practice reinforcement learning in the real world is challenging and often requires extensive instrumentation and supervision. Our aim is to devise a robotic reinforcement learning system for learning navigation and manipulation together, in an autonomous way without human intervention, enabling continual learning under realistic assumptions. Our proposed system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation, without human intervention, and without access to privileged information, such as maps, objects positions, or a global view of the environment. Our method employs a modularized policy with components for manipulation and navigation, where manipulation policy uncertainty drives exploration for the navigation controller, and the manipulation module provides rewards for navigation. We evaluate our method on a room cleanup task, where the robot must navigate to and pick up items scattered on the floor. After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically in around 40 hours of autonomous real-world training.

Proposed approach

Networks:

Grasping Policy Training (given an image, predict the likelihood of grasp success for each action):
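
A minimal sketch of this stated idea, under invented sizes (the discretization into 16 candidate grasp actions and the network shape are assumptions, not ReLMM’s exact design):

# Illustrative sketch: a CNN scores each candidate grasp action from an
# image, and the robot picks the action with the highest predicted success
# probability.
import torch
import torch.nn as nn

NUM_ACTIONS = 16  # e.g. a discretized grid of grasp positions

net = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_ACTIONS),  # one logit per candidate grasp
)

image = torch.randn(1, 3, 96, 96)
success_prob = torch.sigmoid(net(image))     # likelihood per action
best_action = success_prob.argmax(dim=1)     # greedy grasp selection
# Training would apply a binary cross-entropy loss on the executed action's
# observed success/failure label.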

Navigation Policy Training:

Autonomous Pseudo-Resets:

Training Curricula:

Method(s) for evaluating approach

Contributions:

Results:

Challenges:

Personal Notes


One-Shot Domain-Adaptive Imitation Learning via Progressive Learning

Authors: Dandan Zhang, Wen Fan, John Lloyd, Chenguang Yang, Nathan Lepora

Keywords:

Abstract: Traditional deep learning-based visual imitation learning techniques require a large amount of demonstration data for model training, and the pre-trained models are difficult to adapt to new scenarios. To address these limitations, we propose a unified framework using a novel progressive learning approach comprised of three phases: i) a coarse learning phase for concept representation, ii) a fine learning phase for action generation, and iii) an imaginary learning phase for domain adaptation. Overall, this approach leads to a one-shot domain-adaptive imitation learning framework. We use robotic pouring task as an example to evaluate its effectiveness. Our results show that the method has several advantages over contemporary end-to-end imitation learning approaches, including an improved success rate for task execution and more efficient training for deep imitation learning. In addition, the generalizability to new domains is improved, as demonstrated here with novel background, target container and granule combinations. We believe that the proposed method can be broadly applicable to different industrial or domestic applications that involve deep imitation learning for robotic manipulation, where the target scenarios have high diversity while the human demonstration data is limited.

Proposed approach:

Robotic Pouring Task:

The demonstration database is constructed from ten distinct pouring scenes.

A. Coarse Learning: an adapted ResNet18 model reorganized into a multi-head structure for multi-variable classification (see the sketch after this list)

Tilt angle control: a 3-class classification problem on the visual images

3D position adjustment: a 3-class classification problem

Encoding characteristics:

To update the model:
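
A hedged sketch of the multi-head structure described above (head names and class meanings are illustrative assumptions; only the shared-backbone-plus-heads pattern is taken from the description):

# Sketch of the multi-head idea: a shared ResNet18 backbone with one 3-class
# head per controlled variable (tilt angle, position adjustment).
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)
feat_dim = backbone.fc.in_features     # 512 for resnet18
backbone.fc = nn.Identity()            # expose features instead of logits

tilt_head = nn.Linear(feat_dim, 3)     # e.g. tilt more / hold / tilt back
pos_head = nn.Linear(feat_dim, 3)      # e.g. move closer / stay / move away

x = torch.randn(4, 3, 224, 224)
feats = backbone(x)
tilt_logits, pos_logits = tilt_head(feats), pos_head(feats)
# The total loss would sum one cross-entropy term per head.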

B. Fine Learning: action generation uses a Long Short-Term Memory (LSTM) recurrent neural network

C. Domain Adaptation: addresses the limited availability of aligned image pairs from different domains

Method(s) for evaluating approach:

Contributions:

Conclusions:

Results:

Challenges:

Personal Notes:


FFHNet: Generating Multi-Fingered Robotic Grasps for Unknown Objects in Real-time

Authors: Vincent Mayer, Qian Feng, Jun Deng, Yunlei Shi, Zhaopeng Chen, Alois Knoll

Keywords

Abstract: Grasping unknown objects with multi-fingered hands at high success rates and in real-time is an unsolved problem. Existing methods are limited in the speed of grasp synthesis or the ability to synthesize a variety of grasps from the same observation. We introduce Five-finger Hand Net (FFHNet), an ML model which can generate a wide variety of high-quality multi-fingered grasps for unseen objects from a single view. Generating and evaluating grasps with FFHNet takes only 30ms on a commodity GPU. To the best of our knowledge, FFHNet is the first ML-based real-time system for multi-fingered grasping with the ability to perform grasp inference at 30 frames per second (FPS). For training, we synthetically generate 180k grasp samples for 129 objects. We are able to achieve 91% grasping success for unknown objects in simulation and we demonstrate the model’s capabilities of synthesizing high-quality grasps also for real unseen objects.

Proposed approach

Grasp sampling:

Grasp evaluation:

Grasp generation: a Convolutional Variational Autoencoder (CVAE)

Grasp evaluation: learns to distinguish between successful and unsuccessful grasps

The core building block of both models is the FC ResBlock:
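
A hedged sketch of a fully connected residual block as the term is commonly used (the paper’s exact normalization and activation choices may differ):

# Sketch of an FC ResBlock: two fully connected layers with a residual
# (skip) connection, here with batch normalization and ReLU.
import torch
import torch.nn as nn

class FCResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.bn1 = nn.BatchNorm1d(dim)
        self.bn2 = nn.BatchNorm1d(dim)

    def forward(self, x):
        h = torch.relu(self.bn1(self.fc1(x)))
        h = self.bn2(self.fc2(h))
        return torch.relu(h + x)   # residual connection

block = FCResBlock(256)
out = block(torch.randn(8, 256))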

The dataset:

Data generation:

Sampling joint configurations:

Method(s) for evaluating approach:

Sim experimental evaluation:

Sim-to-real grasping:

Contributions:

Results:

Challenges:


Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

Authors: Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine

Keywords:

Abstract: Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.

Proposed approach:
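
The paper’s NAF-based update is not reproduced here; as a hedged illustration of the pooling idea only, several collectors push transitions into one shared replay buffer while a single learner samples off-policy batches from it (`q_update` below is a hypothetical placeholder):

# Illustration of pooled off-policy learning: several robots push
# transitions into one shared replay buffer, and a learner samples from it
# to update a single shared Q-network.
import random
import collections

replay = collections.deque(maxlen=100_000)   # shared replay buffer

def worker_step(robot_id):
    """Stand-in for one robot collecting a transition (s, a, r, s')."""
    return (robot_id, random.random(), random.random(), random.random())

for t in range(1000):
    for robot_id in range(4):                # 4 robots collecting in parallel
        replay.append(worker_step(robot_id))
    if len(replay) >= 64:
        batch = random.sample(list(replay), 64)  # off-policy update from pooled data
        # q_update(batch)  # hypothetical gradient step on the shared Q-network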

Contributions

Challenges:


Solving Rubik’s Cube With A Robot Hand

Authors: OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang

Keywords:

Abstract: We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/

Proposed approach:

Two tasks:

Automatic Domain Randomization:
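
A hedged sketch of the ADR mechanism (the threshold, step size, and single-sided expansion below are simplifications of the paper’s scheme, with invented numbers):

# Sketch of the ADR idea: each randomized parameter starts with a narrow
# range, and a range boundary is widened whenever the policy's recent
# performance at that boundary exceeds a threshold.
import random

class ADRParam:
    def __init__(self, lo, hi, step=0.05, threshold=0.8):
        self.lo, self.hi, self.step, self.threshold = lo, hi, step, threshold

    def sample(self):
        return random.uniform(self.lo, self.hi)

    def update(self, success_rate_at_hi):
        if success_rate_at_hi >= self.threshold:
            self.hi += self.step        # expand: the environment gets harder

friction = ADRParam(0.9, 1.1)
friction.update(success_rate_at_hi=0.85)   # range grows to (0.9, 1.15)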

Contributions:

Conclusions

Results:

Challenges:


Efficient multitask learning with an embodied predictive model for door opening and entry with whole-body control

Authors: Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, and Tetsuya Ogata

Keywords:

Abstract: Robots need robust models to effectively perform tasks that humans do on a daily basis. These models often require substantial developmental costs to maintain because they need to be adjusted and adapted over time. Deep reinforcement learning is a powerful approach for acquiring complex real-world models because there is no need for a human to design the model manually. Furthermore, a robot can establish new motions and optimal trajectories that may not have been considered by a human. However, the cost of learning is an issue because it requires a huge amount of trial and error in the real world. Here, we report a method for realizing complicated tasks in the real world with low design and teaching costs based on the principle of prediction error minimization. We devised a module integration method by introducing a mechanism that switches modules based on the prediction error of multiple modules. The robot generates appropriate motions according to the door’s position, color, and pattern with a low teaching cost. We also show that by calculating the prediction error of each module in real time, it is possible to execute a sequence of tasks (opening door outward and passing through) by linking multiple modules and responding to sudden changes in the situation and operating procedures. The experimental results show that the method is effective at enabling a robot to operate autonomously in the real world in response to changes in the environment.
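
The module-switching mechanism described in the abstract can be sketched as follows (a toy with random stand-in predictions; the module names echo the paper’s tasks but the arbitration rule shown is a simplification):

# Sketch of prediction-error-based module switching: each learned module
# predicts the next sensory state, and control is handed to the module
# whose prediction error is lowest.
import numpy as np

def prediction_error(predicted, observed):
    return float(np.mean((predicted - observed) ** 2))

observed = np.random.rand(10)            # stand-in for the next sensory state
modules = {
    "open_door_outward": np.random.rand(10),   # each module's prediction
    "pass_through":      np.random.rand(10),
}
errors = {name: prediction_error(pred, observed) for name, pred in modules.items()}
active = min(errors, key=errors.get)     # module that best explains the situation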

Proposed approach:

Method(s) for evaluating approach:

Contributions

Conclusions

Results

Challenges:

Personal Notes:


Deep Haptic Model Predictive Control for Robot-Assisted Dressing

Authors: Zackory Erickson, Henry M. Clever, Greg Turk, C. Karen Liu, and Charles C. Kemp

Keywords:

Abstract: Robot-assisted dressing offers an opportunity to benefit the lives of many people with disabilities, such as some older adults. However, robots currently lack common sense about the physical implications of their actions on people. The physical implications of dressing are complicated by non-rigid garments, which can result in a robot indirectly applying high forces to a person’s body. We present a deep recurrent model that, when given a proposed action by the robot, predicts the forces a garment will apply to a person’s body. We also show that a robot can provide better dressing assistance by using this model with model predictive control. The predictions made by our model only use haptic and kinematic observations from the robot’s end effector, which are readily attainable. Collecting training data from real world physical human-robot interaction can be time consuming, costly, and put people at risk. Instead, we train our predictive model using data collected in an entirely self-supervised fashion from a physics-based simulation. We evaluated our approach with a PR2 robot that attempted to pull a hospital gown onto the arms of 10 human participants. With a 0.2s prediction horizon, our controller succeeded at high rates and lowered applied force while navigating the garment around a person’s fist and elbow without getting caught. Shorter prediction horizons resulted in significantly reduced performance with the sleeve catching on the participants’ fists and elbows, demonstrating the value of our model’s predictions. These behaviors of mitigating catches emerged from our deep predictive model and the controller objective function, which primarily penalizes high forces.
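
A hedged sketch of the MPC loop around a learned force predictor (the predictor, cost weights, and two-dimensional action set below are stand-ins for the paper’s learned model G and its objective):

# Sketch of haptic model predictive control: sample candidate end-effector
# actions, predict the forces each would apply over a short horizon, and
# execute the action with the lowest predicted cost (high forces penalized).
import numpy as np

def predict_forces(action, horizon=5):
    """Stand-in for the learned recurrent predictor (haptics and kinematics
    in, predicted garment-to-body forces out)."""
    return np.abs(np.random.randn(horizon)) * np.linalg.norm(action)

def cost(forces, action):
    return 10.0 * forces.max() - action[0]  # penalize force, reward progress

candidates = [np.array([dx, dz]) for dx in (0.00, 0.01, 0.02)
              for dz in (-0.01, 0.0, 0.01)]
best = min(candidates, key=lambda a: cost(predict_forces(a), a))
# Execute `best` for one control step, then re-plan (receding horizon).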

Proposed approach:

Simulation And Model Training:

The predictor (G):

Advantages of a split architecture:

Method(s) for evaluating approach:

Experimental Setup:

Contributions:

Conclusions:

Results:

Challenges:


Deep Learning for Tactile Understanding From Visual and Haptic Data

Authors: Yang Gao, Lisa Anne Hendricks, Katherine J. Kuchenbecker, Trevor Darrell

Keywords:

Abstract: Robots which interact with the physical world will benefit from a fine-grained tactile understanding of objects and surfaces. Additionally, for certain tasks, robots may need to know the haptic properties of an object before touching it. To enable better tactile understanding for robots, we propose a method of classifying surfaces with haptic adjectives (e.g., compressible or smooth) from both visual and physical interaction data. Humans typically combine visual predictions and feedback from physical interactions to accurately predict haptic properties and interact with the world. Inspired by this cognitive pattern, we propose and explore a purely visual haptic prediction model. Purely visual models enable a robot to “feel” without physical interaction. Furthermore, we demonstrate that using both visual and physical interaction signals together yields more accurate haptic classification. Our models take advantage of recent advances in deep neural networks by employing a unified approach to learning features for physical interaction and visual observations. Even though we employ little domain specific knowledge, our model still achieves better results than methods based on hand-designed features.

Proposed approach:

Haptic CNN Model for classification:

Haptic LSTM Model: natural fit for understanding haptic time-series signals

Visual CNN Model: using transfer learning from a CNN that is fine-tuned on the Materials in Context Database (MINC)

Multimodal Learning:
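
To make the multimodal combination concrete, a hedged sketch of late fusion (signal dimensions, sequence length, and the adjective count are invented; the paper’s encoders are richer):

# Sketch of the multimodal idea: separate encoders for haptic time-series
# and visual input, with fused features feeding a shared classifier over
# haptic adjectives.
import torch
import torch.nn as nn

NUM_ADJECTIVES = 24  # e.g. "compressible", "smooth", ...

haptic_enc = nn.LSTM(input_size=4, hidden_size=64, batch_first=True)
visual_enc = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> 16-dim visual feature
)
classifier = nn.Linear(64 + 16, NUM_ADJECTIVES)

haptics = torch.randn(2, 100, 4)     # force/vibration signals over time
image = torch.randn(2, 3, 128, 128)
_, (h, _) = haptic_enc(haptics)      # final hidden state as haptic feature
fused = torch.cat([h[-1], visual_enc(image)], dim=1)
logits = classifier(fused)           # multi-label adjective predictions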

Method(s) for evaluating approach:

Contributions:

Conclusions:

Results:

5 Conclusions

Summarize your view on the state of the art in the field which you have been investigating. (half a page, Times Roman 11pt, single spacing)

6 References

List of references in IEEE, ACM, APA, etc. format. Attach an electronic copy of the paper to each reference!

7 Appendix

7.3 Sources

7.3.1 List of searched journals

7.3.2 List of searched conference proceedings

7.3.3 List of searched magazines

7.3.4 Other searched publications

7.4 Taxonomy of the subject (pictured either as a structured list or as a tree)

7.5 List of most important conferences

7.6 List of most important journals and magazines

Journals:

7.7 List of top research labs/researchers (in no particular order)

Top research labs:

7.8 Mindmap