You can find my published work on my Google Scholar page.
Research videos are available on YouTube.

2024

  1. Scalable Meta-Learning with Gaussian Processes
    Petru Tighineanu, Lukas Grossberger, Paul Baireuther, Kathrin Skubch, Stefan Falkner, Julia Vinogradska, Felix Berkenkamp
    in The 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

    Meta-learning is a powerful approach that exploits historical data to quickly solve new tasks from the same distribution. In the low-data regime, methods based on the closed-form posterior of Gaussian processes (GP) together with Bayesian optimization have achieved high performance. However, these methods are either computationally expensive or introduce assumptions that hinder a principled propagation of uncertainty between task models. This may disrupt the balance between exploration and exploitation during optimization. In this paper, we develop ScaML-GP, a modular GP model for meta-learning that is scalable in the number of tasks. Our core contribution is a carefully designed multi-task kernel that enables hierarchical training and task scalability. Conditioning ScaML-GP on the meta-data exposes its modular nature yielding a test-task prior that combines the posteriors of meta-task GPs. In synthetic and real-world meta-learning experiments, we demonstrate that ScaML-GP can learn efficiently both with few and many meta-tasks.

    @inproceedings{tighineanu2023scalable,
      title = {Scalable Meta-Learning with Gaussian Processes},
      booktitle = {The 27th International Conference on Artificial Intelligence and Statistics (AISTATS)},
      author = {Tighineanu, Petru and Grossberger, Lukas and Baireuther, Paul and Skubch, Kathrin and Falkner, Stefan and Vinogradska, Julia and Berkenkamp, Felix},
      year = {2024},
      eprint = {2312.00742},
      archiveprefix = {arXiv},
      primaryclass = {stat.ML},
      url = {https://arxiv.org/abs/2312.00742}
    }
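
    A minimal sketch of the modular idea above, in Python with scikit-learn: fit one GP per meta-task and combine their posteriors into a weighted prior mean and variance for the test task. The equal weights and the independence assumption are illustrative simplifications of mine; this is not the ScaML-GP kernel itself.

    # Illustrative only: combine meta-task GP posteriors into a test-task prior.
    # Weights and the independence assumption are simplifications, not the ScaML-GP kernel.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def meta_task_prior(meta_data, x_query, weights=None):
        """Build a prior mean/std for the test task from per-meta-task GP posteriors."""
        gps = [GaussianProcessRegressor(normalize_y=True).fit(X, y) for X, y in meta_data]
        weights = np.full(len(gps), 1.0 / len(gps)) if weights is None else np.asarray(weights)
        means, stds = zip(*(gp.predict(x_query, return_std=True) for gp in gps))
        prior_mean = sum(w * m for w, m in zip(weights, means))
        prior_var = sum(w**2 * s**2 for w, s in zip(weights, stds))
        return prior_mean, np.sqrt(prior_var)

    # Two synthetic meta-tasks on a 1D domain.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(20, 1))
    meta_data = [(X, np.sin(6 * X[:, 0])), (X, np.sin(6 * X[:, 0]) + 0.3)]
    mu, sigma = meta_task_prior(meta_data, np.linspace(0.0, 1.0, 5).reshape(-1, 1))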
    
  2. Information-Theoretic Safe Bayesian Optimization
    Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters
    Technical report, ArXiv, 2024

    We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown functions and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. The combination of this exploration criterion with a well known Bayesian optimization acquisition function yields a novel safe Bayesian optimization selection criterion. Our approach is naturally applicable to continuous domains and does not require additional explicit hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we learn about the value of the safe optimum up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.

    @misc{bottero2024informationtheoretic,
      title = {Information-Theoretic Safe Bayesian Optimization},
      publisher = {ArXiv},
      author = {Bottero, Alessandro G. and Luis, Carlos E. and Vinogradska, Julia and Berkenkamp, Felix and Peters, Jan},
      year = {2024},
      eprint = {2402.15347},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      url = {https://arxiv.org/abs/2402.15347}
    }
    
  3. MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
    Jiarong Pan, Stefan Falkner, Felix Berkenkamp, Joaquin Vanschoren
    in International Conference on Machine Learning (ICML), 2024

    Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.

    @inproceedings{Pan2024Malibo,
      title = {MALIBO: Meta-learning for Likelihood-free Bayesian Optimization},
      booktitle = {International Conference on Machine Learning (ICML)},
      author = {Pan, Jiarong and Falkner, Stefan and Berkenkamp, Felix and Vanschoren, Joaquin},
      year = {2024},
      url = {https://arxiv.org/abs/2307.03565}
    }
    

2023

  1. Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
    Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
    Technical report, ArXiv, 2023

    We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation (UBE), but the over-approximation may result in inefficient exploration. We propose a new UBE whose solution converges to the true posterior variance over values and leads to lower regret in tabular exploration problems. We identify challenges to apply the UBE theory beyond tabular problems and propose a suitable approximation. Based on this approximation, we introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied for either risk-seeking or risk-averse policy optimization with minimal changes. Experiments in both online and offline RL demonstrate improved performance compared to other uncertainty estimation methods.

    @misc{luis2023modelbased,
      title = {Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization},
      publisher = {ArXiv},
      author = {Luis, Carlos E. and Bottero, Alessandro G. and Vinogradska, Julia and Berkenkamp, Felix and Peters, Jan},
      year = {2023},
      eprint = {2312.04386},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      url = {https://arxiv.org/abs/2312.04386}
    }
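
    For orientation, the uncertainty Bellman equations referred to above share the generic fixed-point shape below (a hedged simplification in my notation; the local uncertainty term u, and whether the solution upper-bounds or equals the posterior variance over values, are exactly where this paper departs from prior work):

      U^\pi(s, a) = u(s, a) + \gamma^2 \sum_{s', a'} \pi(a' \mid s') \, \bar{P}(s' \mid s, a) \, U^\pi(s', a')

    Here \bar{P} denotes the posterior mean transition model and \gamma the discount factor.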
    
  2. Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
    Melrose Roderick, Gaurav Manek, Felix Berkenkamp, J. Zico Kolter
    Technical report, ArXiv, 2023

    A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or distribution shift, between the dataset and the distribution over states and actions visited by the learned policy. This problem is exacerbated in the fully offline setting. The main approach to correct this shift has been through importance sampling, which leads to high-variance gradients. Other approaches, such as conservatism or behavior-regularization, regularize the policy at the cost of performance. In this paper, we propose a new approach for stable off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error. In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also out-performs competing methods in tasks where the data-collection policy is significantly sub-optimal.

    @misc{roderick2023projected,
      title = {Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning},
      publisher = {ArXiv},
      author = {Roderick, Melrose and Manek, Gaurav and Berkenkamp, Felix and Kolter, J. Zico},
      year = {2023},
      eprint = {2311.14885},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      url = {https://arxiv.org/abs/2311.14885}
    }
    
  3. Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation
    Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter
    Technical report, ArXiv, 2023

    In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.

    @misc{roderick2023generative,
      title = {Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation},
      publisher = {ArXiv},
      author = {Roderick, Melrose and Berkenkamp, Felix and Sheikholeslami, Fatemeh and Kolter, Zico},
      year = {2023},
      eprint = {2312.17411},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      url = {https://arxiv.org/abs/2312.17411}
    }
    
  4. Value-Distributional Model-Based Reinforcement Learning
    Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
    Technical report, ArXiv, 2023

    Quantifying uncertainty about a policy’s long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.

    @misc{Luis2023ValueDistributional,
      title = {Value-Distributional Model-Based Reinforcement Learning},
      publisher = {ArXiv},
      author = {Luis, Carlos E. and Bottero, Alessandro G. and Vinogradska, Julia and Berkenkamp, Felix and Peters, Jan},
      year = {2023},
      eprint = {2308.06590},
      archiveprefix = {arXiv},
      primaryclass = {cs.LG},
      url = {https://arxiv.org/abs/2308.06590}
    }
    
  5. Model-Based Uncertainty in Value Functions
    Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
    in The 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

    We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.

    @inproceedings{Luis2023ModelUncertainty,
      title = {Model-Based Uncertainty in Value Functions},
      booktitle = {The 26th International Conference on Artificial Intelligence and Statistics (AISTATS)},
      author = {Luis, Carlos E. and Bottero, Alessandro G. and Vinogradska, Julia and Berkenkamp, Felix and Peters, Jan},
      year = {2023},
      url = {https://arxiv.org/abs/2302.12526}
    }
    

2022

  1. Information-Theoretic Safe Exploration with Gaussian Processes
    Alessandro G. Bottero, Carlos E. Luis, Julia Vinogradska, Felix Berkenkamp, Jan Peters
    in Neural Information Processing Systems (NeurIPS), 2022

    We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an a priori unknown (safety) constraint. A common approach is to place a Gaussian process prior on the unknown constraint and allow evaluations only in regions that are safe with high probability. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. Moreover, the way in which they exploit regularity assumptions about the constraint introduces an additional critical hyperparameter. In this paper, we propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate. Our approach is naturally applicable to continuous domains and does not require additional hyperparameters. We theoretically analyze the method and show that we do not violate the safety constraint with high probability and that we explore by learning about the constraint up to arbitrary precision. Empirical evaluations demonstrate improved data-efficiency and scalability.

    @inproceedings{Bottero2022SafeInformation,
      title = {Information-Theoretic Safe Exploration with Gaussian Processes},
      booktitle = {Neural Information Processing Systems (NeurIPS)},
      author = {Bottero, Alessandro G. and Luis, Carlos E. and Vinogradska, Julia and Berkenkamp, Felix and Peters, Jan},
      year = {2022},
      url = {https://arxiv.org/abs/2212.04914}
    }
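
    A stripped-down safe-exploration step in the spirit of the setting above, assuming a scikit-learn GP over the constraint: candidates are kept only if their lower confidence bound is non-negative, and the predictive standard deviation stands in for the paper's information-theoretic criterion (which instead measures how much an evaluation reveals about the safety of other parameters).

    # Sketch: restrict evaluations to the GP's high-probability safe set and pick the
    # most "informative" safe candidate. Predictive std is a stand-in for the paper's
    # information-theoretic criterion, not the criterion itself.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def safe_exploration_step(gp_constraint, candidates, beta=2.0):
        mean, std = gp_constraint.predict(candidates, return_std=True)
        safe = mean - beta * std >= 0.0      # constraint g(x) >= 0 with high probability
        if not np.any(safe):
            raise RuntimeError("no candidate is safe with high probability")
        idx = np.flatnonzero(safe)
        return candidates[idx[np.argmax(std[idx])]]

    # Usage: fit the constraint GP on observed (x, g(x)) pairs, then query the next point.
    X = np.array([[0.0], [0.1], [0.2]])
    g = np.array([0.5, 0.4, 0.3])
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, g)
    x_next = safe_exploration_step(gp, np.linspace(0.0, 1.0, 101).reshape(-1, 1))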
    
  2. On-Policy Model Errors in Reinforcement Learning
    Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp
    in International Conference on Learning Representations (ICLR), 2022

    Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model errors and bias can render learning unstable or suboptimal. In this paper, we present a novel method that combines real-world data and a learned model in order to get the best of both worlds. The core idea is to exploit the real-world data for on-policy predictions and use the learned model only to generalize to different actions. Specifically, we use the data as time-dependent on-policy correction terms on top of a learned model, to retain the ability to generate data without accumulating errors over long prediction horizons. We motivate this method theoretically and show that it counteracts an error term for model-based policy improvement. Experiments on MuJoCo and PyBullet benchmarks show that our method can drastically improve existing model-based approaches without introducing additional tuning parameters.

    @inproceedings{Froehlich2022OnPolicyCorrections,
      title = {On-Policy Model Errors in Reinforcement Learning},
      booktitle = {International Conference on Learning Representations (ICLR)},
      author = {Fröhlich, Lukas P. and Lefarov, Maksym and Zeilinger, Melanie N. and Berkenkamp, Felix},
      year = {2022},
      url = {https://arxiv.org/abs/2110.07985}
    }
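
    The correction idea can be sketched in a few lines (an illustration of my reading of the abstract, not the paper's exact algorithm): roll out the learned model, but at each time step add the residual between the recorded real transition and the model's prediction on that recorded transition.

    # Sketch of time-indexed on-policy corrections: the model generalizes over actions,
    # the recorded residuals remove its on-policy one-step error. Illustrative only.
    import numpy as np

    def corrected_rollout(model, real_states, real_actions, new_actions):
        """model(s, a) -> next state; trajectories are indexed by time step t."""
        corrections = [s_next - model(s, a)
                       for s, a, s_next in zip(real_states[:-1], real_actions, real_states[1:])]
        state, rollout = real_states[0], [real_states[0]]
        for t, action in enumerate(new_actions):
            state = model(state, action) + corrections[t]
            rollout.append(state)
        return np.array(rollout)

    # Toy usage with a scalar state and a linear "learned" model.
    model = lambda s, a: 0.9 * s + a
    real_states = np.array([0.0, 0.2, 0.35])
    real_actions = np.array([0.2, 0.17])
    trajectory = corrected_rollout(model, real_states, real_actions, new_actions=np.array([0.25, 0.1]))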
    
  3. Transfer Learning with Gaussian Processes for Bayesian Optimization
    Petru Tighineanu, Kathrin Skubch, Paul Baireuther, Attila Reiss, Felix Berkenkamp, Julia Vinogradska
    in The 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

    Bayesian optimization is a powerful paradigm to optimize black-box functions based on scarce and noisy data. Its data efficiency can be further improved by transfer learning from related tasks. While recent transfer models meta-learn a prior based on large amount of data, in the low-data regime methods that exploit the closed-form posterior of Gaussian processes (GPs) have an advantage. In this setting, several analytically tractable transfer-model posteriors have been proposed, but the relative advantages of these methods are not well understood. In this paper, we provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods. As part of the analysis, we develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity. We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods.

    @inproceedings{Tighineanu2022Transfer,
      title = {Transfer Learning with Gaussian Processes for Bayesian Optimization},
      booktitle = {The 25th International Conference on Artificial Intelligence and Statistics (AISTATS)},
      author = {Tighineanu, Petru and Skubch, Kathrin and Baireuther, Paul and Reiss, Attila and Berkenkamp, Felix and Vinogradska, Julia},
      year = {2022},
      url = {https://arxiv.org/pdf/2111.11223}
    }
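
    One member of the hierarchical family analyzed above can be sketched as a residual (boosted) GP: model the target as the source GP plus a GP fitted to the residuals of the source posterior mean on the target data. Class and variable names are mine, and the independence assumption in the predictive variance is a simplification.

    # Residual/boosted transfer-GP sketch: f_target(x) ~ f_source(x) + g(x). Illustrative only.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    class ResidualTransferGP:
        def __init__(self, X_source, y_source):
            self.source = GaussianProcessRegressor(normalize_y=True).fit(X_source, y_source)
            self.residual = None

        def fit_target(self, X_target, y_target):
            residuals = y_target - self.source.predict(X_target)
            self.residual = GaussianProcessRegressor(normalize_y=True).fit(X_target, residuals)

        def predict(self, X):
            mean_s, std_s = self.source.predict(X, return_std=True)
            mean_r, std_r = self.residual.predict(X, return_std=True)
            # Treat source and residual processes as independent (a simplification).
            return mean_s + mean_r, np.sqrt(std_s**2 + std_r**2)

    # Usage: fit on abundant source data, then adapt to a few target observations.
    model = ResidualTransferGP(np.array([[0.0], [1.0]]), np.array([0.0, 1.0]))
    model.fit_target(np.array([[0.5]]), np.array([0.7]))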
    

2020

    1. Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
      Sebastian Curi*, Felix Berkenkamp*, Andreas Krause
      in Neural Information Processing Systems (NeurIPS), 2020

      Spotlight talk

      Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty. However, while most algorithms distinguish these two uncertainties for learning the model, they ignore it when optimizing the policy. In this paper, we show that ignoring the epistemic uncertainty leads to greedy algorithms that do not explore sufficiently. In turn, we propose a practical optimistic-exploration algorithm (H-UCRL), which enlarges the input space with hallucinated inputs that can exert as much control as the epistemic uncertainty in the model affords. We analyze this setting and construct a general regret bound for well-calibrated models, which is provably sublinear in the case of Gaussian Process models. Based on this theoretical foundation, we show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms and different probabilistic models. Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions, a setting that is notoriously difficult for existing model-based reinforcement learning algorithms.

      @inproceedings{Curi2020OptimisticModel,
        title = {Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning},
        booktitle = {Neural Information Processing Systems (NeurIPS)},
        author = {Curi*, Sebastian and Berkenkamp*, Felix and Krause, Andreas},
        year = {2020},
        url = {https://arxiv.org/abs/2006.08684}
      }
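
      The hallucinated-inputs construction can be written down in a couple of lines (a sketch in my notation; beta and the function names are placeholders): an auxiliary input eta in [-1, 1]^d lets the policy pick any next state inside the model's epistemic confidence interval, which turns optimistic policy search into a standard problem on an enlarged input space.

      # Sketch of optimistic, "hallucinated" dynamics: eta in [-1, 1]^d selects a next state
      # within the model's epistemic confidence set. Names and beta are illustrative.
      import numpy as np

      def hallucinated_step(mean_fn, epistemic_std_fn, state, action, eta, beta=1.0):
          """Optimistic transition used while optimizing the policy (not for data collection)."""
          eta = np.clip(eta, -1.0, 1.0)
          return mean_fn(state, action) + beta * epistemic_std_fn(state, action) * eta

      # Toy usage with a scalar state.
      mean_fn = lambda s, a: 0.9 * s + a
      std_fn = lambda s, a: 0.1 * (1.0 + abs(s))   # stands in for the model's epistemic std
      s_next = hallucinated_step(mean_fn, std_fn, state=0.5, action=0.1, eta=1.0)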
      
    2. Structured Variational Inference in Partially Observable Unstable Gaussian Process State Space Models
      Silvan Melchior, Sebastian Curi, Felix Berkenkamp, Andreas Krause
      in Learning for Dynamics and Control (L4DC), 2020

      We propose a new variational inference algorithm for learning in Gaussian Process State-Space Models (GPSSMs). Our algorithm enables learning of unstable and partially observable systems, where previous algorithms fail. Our main algorithmic contribution is a novel approximate posterior that can be calculated efficiently using a single forward and backward pass along the training trajectories. The forward-backward pass is inspired by Kalman smoothing for linear dynamical systems but generalizes to GPSSMs. Our second contribution is a modification of the conditioning step that effectively lowers the Kalman gain. This modification is crucial to attaining good test performance where no measurements are available. Finally, we show experimentally that our learning algorithm performs well in stable and unstable real systems with hidden states.

      @inproceedings{Melchior2020Structured,
        title = {Structured Variational Inference in Partially Observable Unstable Gaussian Process State Space Models},
        booktitle = {Learning for Dynamics and Control (L4DC)},
        author = {Melchior, Silvan and Curi, Sebastian and Berkenkamp, Felix and Krause, Andreas},
        year = {2020},
        url = {https://arxiv.org/abs/1907.07035}
      }
      
    3. Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
      Noah Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller
      in International Conference on Learning Representations (ICLR), 2020

      Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior – the advantage-weighted behavior model (ABM) – to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch-RL that enables stable learning from conflicting data-sources. We find improvements on competitive baselines in a variety of RL tasks – including standard continuous control benchmarks and multi-task learning for simulated and real-world robots.

      @inproceedings{Siegel2020Keep,
        title = {Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning},
        author = {Siegel, Noah and Springenberg, Jost Tobias and Berkenkamp, Felix and Abdolmaleki, Abbas and Neunert, Michael and Lampe, Thomas and Hafner, Roland and Heess, Nicolas and Riedmiller, Martin},
        booktitle = {International Conference on Learning Representations (ICLR)},
        year = {2020},
        url = {https://openreview.net/forum?id=rke7geHtwH}
      }
      

2019

    1. Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics
      Felix Berkenkamp
      PhD thesis, ETH Zurich, 2019

      @phdthesis{Berkenkamp19Thesis,
        author = {Berkenkamp, Felix},
        title = {Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics},
        school = {ETH Zurich},
        year = {2019},
        url = {https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/370833/1/root.pdf}
      }
      
    2. Safe Exploration for Interactive Machine Learning
      Matteo Turchetta, Felix Berkenkamp, Andreas Krause
      in Neural Information Processing Systems (NeurIPS), 2019

      In interactive machine learning (IML), we iteratively make decisions and obtain noisy observations of an unknown function. While IML methods, e.g., Bayesian optimization and active learning, have been successful in applications, on real-world systems they must provably avoid unsafe decisions. To this end, safe IML algorithms must carefully learn about a priori unknown constraints without making unsafe decisions. Existing algorithms for this problem learn about the safety of all decisions to ensure convergence. This is sample-inefficient, as it explores decisions that are not relevant for the original IML objective. In this paper, we introduce a novel framework that renders any existing unsafe IML algorithm safe. Our method works as an add-on that takes suggested decisions as input and exploits regularity assumptions in terms of a Gaussian process prior in order to efficiently learn about their safety. As a result, we only explore the safe set when necessary for the IML problem. We apply our framework to safe Bayesian optimization and to safe exploration in deterministic Markov Decision Processes (MDP), which have been analyzed separately before. Our method outperforms other algorithms empirically.

      @inproceedings{Turchetta19Goose,
        author = {Turchetta, Matteo and Berkenkamp, Felix and Krause, Andreas},
        booktitle = {Neural Information Processing Systems (NeurIPS)},
        title = {Safe Exploration for Interactive Machine Learning},
        year = {2019},
        url = {https://arxiv.org/abs/1910.13726}
      }
      
    3. Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning
      Torsten Koller*, Felix Berkenkamp*, Matteo Turchetta, Joschka Boedecker, Andreas Krause
      Technical report, ArXiv, 2019

      Reinforcement learning (RL) has been successfully used to solve difficult tasks in complex unknown environments solely based on feedback signals from the system. However, these methods typically do not provide any safety guarantees, especially in early stages when the RL agent actively explores its environment. This prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides high-probability safety guarantees during the RL learning process. Based on a reliable statistical model, we construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we allow for input-dependent uncertainties. Based on these reliable predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. We evaluate the resulting algorithm to safely explore the dynamics of an inverted pendulum and to solve an RL task in a cart-pole dynamical system with safety constraints.

      @misc{Koller2019Learningbased,
        title = {Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning},
        publisher = {ArXiv},
        author = {Koller*, Torsten and Berkenkamp*, Felix and Turchetta, Matteo and Boedecker, Joschka and Krause, Andreas},
        year = {2019},
        eprint = {1906.12189},
        archiveprefix = {arXiv},
        primaryclass = {eess.SY},
        url = {https://arxiv.org/abs/1906.12189}
      }
      
    4. No-Regret Bayesian Optimization with Unknown Hyperparameters
      Felix Berkenkamp, Angela P. Schoellig, Andreas Krause
      Journal of Machine Learning Research, vol. 20, no. 50, 2019

      Bayesian optimization (BO) based on Gaussian process models is a powerful paradigm to optimize black-box functions that are expensive to evaluate. While several BO algorithms provably converge to the global optimum of the unknown function, they assume that the hyperparameters of the kernel are known in advance. This is not the case in practice and misspecification often causes these algorithms to converge to poor local optima. In this paper, we present the first BO algorithm that is provably no-regret and converges to the optimum without knowledge of the hyperparameters. During optimization we slowly adapt the hyperparameters of stationary kernels and thereby expand the associated function class over time, so that the BO algorithm considers more complex function candidates. Based on the theoretical insights, we propose several practical algorithms that achieve the empirical sample efficiency of BO with online hyperparameter estimation, but retain theoretical convergence guarantees. We evaluate our method on several benchmark problems.

      @article{Berkenkamp2019UnknownHyperparameters,
        author = {Berkenkamp, Felix and Schoellig, Angela P. and Krause, Andreas},
        title = {No-Regret Bayesian Optimization with Unknown Hyperparameters},
        journal = {Journal of Machine Learning Research},
        year = {2019},
        volume = {20},
        number = {50},
        pages = {1-24},
        url = {http://jmlr.org/papers/v20/18-213.html}
      }
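
      The mechanism described above, slowly expanding the function class, can be sketched as a lengthscale schedule for a stationary kernel. The growth function g(t) below is a placeholder of mine, not the paper's specific choice, and the kernel optimizer is disabled so the prescribed lengthscale is actually used.

      # Sketch: enlarge the hypothesis class over time by shrinking the kernel lengthscale.
      # g(t) is a placeholder schedule; the paper analyzes which schedules preserve no-regret.
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF

      def gp_at_iteration(X, y, t, initial_lengthscale=1.0, growth=lambda t: np.log(t + np.e)):
          kernel = RBF(length_scale=initial_lengthscale / growth(t))
          # optimizer=None keeps the prescribed lengthscale instead of refitting it from data.
          return GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True).fit(X, y)

      # At BO iteration t, refit the GP with the adapted kernel before maximizing the acquisition.
      X = np.array([[0.1], [0.5], [0.9]])
      y = np.array([0.2, 0.8, 0.3])
      gp_t = gp_at_iteration(X, y, t=10)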
      
    5. Information-Directed Exploration for Deep Reinforcement Learning
      Nikolay Nikolov, Johannes Kirschner, Felix Berkenkamp, Andreas Krause
      in International Conference on Learning Representations (ICLR), 2019

      Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.

      @inproceedings{Nikolov2018InformationDirected,
        title = {Information-Directed Exploration for Deep Reinforcement Learning},
        booktitle = {International Conference on Learning Representations (ICLR)},
        author = {Nikolov, Nikolay and Kirschner, Johannes and Berkenkamp, Felix and Krause, Andreas},
        year = {2019},
        url = {https://arxiv.org/abs/1812.07544},
        poster = {https://las.inf.ethz.ch/files/nikolov2019information_poster.pdf}
      }
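
      The information-directed sampling rule that the paper builds on can be stated in one line: choose the action with the smallest ratio of squared estimated regret to information gain. The two estimates below are placeholders for the distributional quantities constructed in the paper.

      # Generic IDS action selection (sketch): argmin_a regret(a)^2 / information(a).
      # regret_est and info_est stand in for the paper's distributional-RL estimates.
      import numpy as np

      def ids_action(regret_est, info_est, eps=1e-8):
          ratio = np.square(np.asarray(regret_est)) / (np.asarray(info_est) + eps)
          return int(np.argmin(ratio))

      action = ids_action(regret_est=[0.5, 0.1, 0.3], info_est=[0.2, 0.05, 0.4])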
      
    6. Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy
      Robin Spiess, Felix Berkenkamp, Jan Poland, Andreas Krause
      in Proceedings of the European Control Conference (ECC), 2019

      The energy output of photovoltaic (PV) power plants depends on the environment and thus fluctuates over time. As a result, PV power can cause instability in the power grid, in particular when increasingly used. Limiting the rate of change of the power output is a common way to mitigate these fluctuations, often with the help of large batteries. A reactive controller that uses these batteries to compensate ramps works in practice, but causes stress on the battery due to a high energy throughput. In this paper, we present a deep learning approach that uses images of the sky to compensate power fluctuations predictively and reduces battery stress. In particular, we show that the optimal control policy can be computed using information that is only available in hindsight. Based on this, we use imitation learning to train a neural network that approximates this hindsight-optimal policy, but uses only currently available sky images and sensor data. We evaluate our method on a large dataset of measurements and images from a real power plant and show that the trained policy reduces stress on the battery.

      @inproceedings{Spiess2018SkyImitation,
        title = {Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy},
        booktitle = {Proceedings of the European Control Conference (ECC)},
        author = {Spiess, Robin and Berkenkamp, Felix and Poland, Jan and Krause, Andreas},
        year = {2019},
        url = {https://arxiv.org/abs/1811.05788}
      }
      

2018

    1. The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems
      Spencer M. Richards, Felix Berkenkamp, Andreas Krause
      in Proceedings of The 2nd Conference on Robot Learning (CoRL), vol. 87, 2018

      Oral presentation

      Learning algorithms have shown considerable prowess in simulation by allowing robots to adapt to uncertain environments and improve their performance. However, such algorithms are rarely used in practice on safety-critical systems, since the learned policy typically does not yield any safety guarantees. That is, the required exploration may cause physical harm to the robot or its environment. In this paper, we present a method to learn accurate safety certificates for nonlinear, closed-loop dynamical systems. Specifically, we construct a neural network Lyapunov function and a training algorithm that adapts it to the shape of the largest safe region in the state space. The algorithm relies only on knowledge of inputs and outputs of the dynamics, rather than on any specific model structure. We demonstrate our method by learning the safe region of attraction for a simulated inverted pendulum. Furthermore, we discuss how our method can be used in safe learning algorithms together with statistical models of dynamical systems.

      @inproceedings{Spencer2018LyapunovNN,
        title = {The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamical Systems},
        booktitle = {Proceedings of The 2nd Conference on Robot Learning (CoRL)},
        author = {Richards, Spencer M. and Berkenkamp, Felix and Krause, Andreas},
        pages = {466--476},
        volume = {87},
        publisher = {PMLR},
        year = {2018},
        url = {https://arxiv.org/abs/1808.00924}
      }
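
      A neural Lyapunov candidate that is positive definite by construction can be sketched in a few lines of PyTorch. The paper's actual architecture and its adaptive training procedure differ; this only illustrates the "Lyapunov function as a neural network" idea.

      # Sketch: V(x) = ||phi(x) - phi(0)||^2 + eps * ||x||^2, so V(0) = 0 and V(x) > 0 otherwise.
      # The architecture and the adaptive training loss of the paper are not reproduced here.
      import torch
      import torch.nn as nn

      class LyapunovCandidate(nn.Module):
          def __init__(self, state_dim, hidden=64, eps=1e-3):
              super().__init__()
              self.phi = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, hidden))
              self.eps = eps

          def forward(self, x):
              features = self.phi(x) - self.phi(torch.zeros_like(x))  # enforce zero contribution at x = 0
              return (features ** 2).sum(dim=-1) + self.eps * (x ** 2).sum(dim=-1)

      V = LyapunovCandidate(state_dim=2)
      values = V(torch.tensor([[0.1, -0.2], [0.0, 0.0]]))   # second entry evaluates to 0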
      
    2. Learning-based Model Predictive Control for Safe Exploration
      Torsten Koller, Felix Berkenkamp, Matteo Turchetta, Andreas Krause
      in Proc. of the Conference on Decision and Control (CDC), 2018

      Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that can provide provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.

      @inproceedings{Koller2018BSafeMPC,
        title = {Learning-based Model Predictive Control for Safe Exploration},
        author = {Koller, Torsten and Berkenkamp, Felix and Turchetta, Matteo and Krause, Andreas},
        booktitle = {Proc. of the Conference on Decision and Control (CDC)},
        year = {2018},
        url = {https://arxiv.org/abs/1803.08287}
      }
      
    3. Verifying Controllers Against Adversarial Examples with Bayesian Optimization
      Shromona Ghosh, Felix Berkenkamp, Gireeja Ranade, Shaz Qadeer, Ashish Kapoor
      in Proc. of the International Conference on Robotics and Automation (ICRA), 2018

      Recent successes in reinforcement learning have led to the development of complex controllers for real-world robots. As these robots are deployed in safety-critical applications and interact with humans, it becomes critical to ensure safety in order to avoid causing harm. A first step in this direction is to test the controllers in simulation. To be able to do this, we need to capture what we mean by safety and then efficiently search the space of all behaviors to see if they are safe. In this paper, we present an active-testing framework based on Bayesian Optimization. We specify safety constraints using logic and exploit structure in the problem in order to test the system for adversarial counterexamples that violate the safety specifications. These specifications are defined as complex boolean combinations of smooth functions on the trajectories and, unlike reward functions in reinforcement learning, are expressive and impose hard constraints on the system. In our framework, we exploit regularity assumptions on individual functions in the form of a Gaussian Process (GP) prior. We combine these into a coherent optimization framework using problem structure. The resulting algorithm is able to provably verify complex safety specifications or alternatively find counterexamples. Experimental results indicate that the proposed method is able to find adversarial examples quickly.

      @inproceedings{Ghosh2018Verifying,
        title = {Verifying Controllers Against Adversarial Examples with Bayesian Optimization},
        booktitle = {Proc. of the International Conference on Robotics and Automation (ICRA)},
        author = {Ghosh, Shromona and Berkenkamp, Felix and Ranade, Gireeja and Qadeer, Shaz and Kapoor, Ashish},
        year = {2018},
        url = {https://arxiv.org/abs/1802.08678}
      }
      

2017

    1. Safe Model-based Reinforcement Learning with Stability Guarantees
      Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause
      in Neural Information Processing Systems (NeurIPS), 2017

      Full talk at CoRL 2017

      Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.

      @inproceedings{Berkenkamp2017SafeRL,
        title = {Safe Model-based Reinforcement Learning with Stability Guarantees},
        booktitle = {Neural Information Processing Systems (NeurIPS)},
        author = {Berkenkamp, Felix and Turchetta, Matteo and Schoellig, Angela P. and Krause, Andreas},
        year = {2017},
        url = {https://arxiv.org/abs/1705.08551}
      }
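
      The certificate described above amounts to requiring, with high probability under the GP model of the dynamics, that a Lyapunov function decreases along closed-loop trajectories inside a level set. A simplified form of such a condition (my notation, not the paper's exact statement) is

        V\bigl(\mu_n(x, \pi(x))\bigr) + L_V \, \beta_n \, \sigma_n(x, \pi(x)) \le V(x) - \epsilon \quad \text{for all } x \text{ in the candidate safe level set},

      where \mu_n and \sigma_n are the GP posterior mean and standard deviation of the dynamics, \beta_n a confidence scaling, L_V a Lipschitz constant of V, and \epsilon > 0 a decrease margin.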
      
    2. Constrained Bayesian optimization with Particle Swarms for Safe Adaptive Controller Tuning
      Rikky R.P.R. Duivenvoorden, Felix Berkenkamp, Nicolas Carion, Andreas Krause, Angela P. Schoellig
      in Proc. of the IFAC (International Federation of Automatic Control) World Congress, 2017

      Tuning controller parameters is a recurring and time-consuming problem in control. This is especially true in the field of adaptive control, where good performance is typically only achieved after significant tuning. Recently, it has been shown that constrained Bayesian optimization is a promising approach to automate the tuning process without risking system failures during the optimization process. However, this approach is computationally too expensive for tuning more than a couple of parameters. In this paper, we provide a heuristic in order to efficiently perform constrained Bayesian optimization in high-dimensional parameter spaces by using an adaptive discretization based on particle swarms. We apply the method to the tuning problem of an L1 adaptive controller on a quadrotor vehicle and show that we can reliably and automatically tune parameters in experiments.

      @inproceedings{Duivenvoorden2017SafeOptSwarm,
        author = {Duivenvoorden, Rikky R.P.R. and Berkenkamp, Felix and Carion, Nicolas and Krause, Andreas and Schoellig, Angela P.},
        booktitle = {Proc. of the IFAC (International Federation of Automatic Control) World Congress},
        pages = {12306--12313},
        title = {Constrained {B}ayesian optimization with Particle Swarms for Safe Adaptive Controller Tuning},
        year = {2017}
      }
      
    3. Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization
      Alonso Marco, Felix Berkenkamp, Philipp Hennig, Angela P. Schoellig, Andreas Krause, Stefan Schaal, Sebastian Trimpe
      in Proc. of the International Conference on Robotics and Automation (ICRA), 2017

      In practice, the parameters of control policies are often tuned manually. This is time-consuming and frustrating. Reinforcement learning is a promising alternative that aims to automate this process, yet often requires too many experiments to be practical. In this paper, we propose a solution to this problem by exploiting prior knowledge from simulations, which are readily available for most robotic platforms. Specifically, we extend Entropy Search, a Bayesian optimization algorithm that maximizes information gain from each experiment, to the case of multiple information sources. The result is a principled way to automatically combine cheap, but inaccurate information from simulations with expensive and accurate physical experiments in a cost-effective manner. We apply the resulting method to a cart-pole system, which confirms that the algorithm can find good control policies with fewer experiments than standard Bayesian optimization on the physical system only.

      @inproceedings{Marco17VirtualvsReal,
        author = {Marco, Alonso and Berkenkamp, Felix and Hennig, Philipp and Schoellig, Angela P. and Krause, Andreas and Schaal, Stefan and Trimpe, Sebastian},
        booktitle = {Proc. of the International Conference on Robotics and Automation (ICRA)},
        pages = {1557--1563},
        title = {Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with {B}ayesian Optimization},
        year = {2017}
      }
      

2016

    1. Bayesian Optimization for Maximum Power Point Tracking in Photovoltaic Power Plants
      Hany Abdelrahman, Felix Berkenkamp, Jan Poland, Andreas Krause
      in Proc. of the European Control Conference (ECC), 2016

      Best Application Paper Award

      The amount of power that a photovoltaic (PV) power plant generates depends on the DC voltage that is applied to the PV panels. The relationship between this control input and the generated power is non-convex and has multiple local maxima. Moreover, since the generated power depends on time-varying environmental conditions, such as solar irradiation, the location of the global maximum changes over time. Maximizing the amount of energy that is generated over time is known as the maximum power point tracking (MPPT) problem. Traditional approaches to solve the MPPT problem rely on heuristics and data-based gradient estimates. These methods typically converge to local optima and thus waste energy. Our approach formalizes the MPPT problem as a Bayesian optimization problem. This formalization admits algorithms that can find the maximum power point after only a few evaluations at different input voltages. Specifically, we model the power-voltage curve as a Gaussian process (GP) and use the predictive uncertainty information in this model to choose control inputs that are informative about the location of the maximum. We extend the basic approach by including operational constraints and making it computationally tractable so that the method can be used on real systems. We evaluate our method together with two standard baselines in experiments, which show that our approach outperforms both.

      @inproceedings{Abdelrahman16Bayesian,
        author = {Abdelrahman, Hany and Berkenkamp, Felix and Poland, Jan and Krause, Andreas},
        booktitle = {Proc. of the European Control Conference (ECC)},
        title = {Bayesian Optimization for Maximum Power Point Tracking in Photovoltaic Power Plants},
        year = {2016},
        pages = {2078--2083}
      }
      
    2. Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics
      Felix Berkenkamp, Andreas Krause, Angela P. Schoellig
      Technical report, ArXiv, 2016

      Robotics algorithms typically depend on various parameters, the choice of which significantly affects the robot’s performance. While an initial guess for the parameters may be obtained from dynamic models of the robot, parameters are usually tuned manually on the real system to achieve the best performance. Optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, these methods may evaluate parameters during the optimization process that lead to safety-critical system failures. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed and applied in robotics, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined based on the performance function. However, coupling performance and safety is not desirable in most cases. In this paper, we define separate functions for performance and safety. We present a generalized SafeOpt algorithm that, given an initial safe guess for the parameters, maximizes performance but only evaluates parameters that satisfy all safety constraints with high probability. It achieves this by modeling the underlying and unknown performance and constraint functions as Gaussian processes. We provide a theoretical analysis and demonstrate in experiments on a quadrotor vehicle that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters. Moreover, we show an extension to context- or environment-dependent, safe optimization in the experiments.

      @misc{Berkenkamp2016BayesianSafety,
        title = {Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics},
        publisher = {ArXiv},
        author = {Berkenkamp, Felix and Krause, Andreas and Schoellig, Angela P.},
        year = {2016},
        eprint = {1602.04450},
        archiveprefix = {arXiv},
        primaryclass = {cs.RO},
        url = {https://arxiv.org/abs/1602.04450}
      }
      
    3. Safe Learning of Regions of Attraction for Uncertain, Nonlinear Systems with Gaussian Processes
      Felix Berkenkamp, Riccardo Moriconi, Angela P. Schoellig, Andreas Krause
      in Proc. of the IEEE Conference on Decision and Control, 2016

      Control theory can provide useful insights into the properties of controlled, dynamic systems. One important property of nonlinear systems is the region of attraction (ROA), which is a safe subset of the state space in which a given controller renders an equilibrium point asymptotically stable. The ROA is typically estimated based on a model of the system. However, since models are only an approximation of the real world, the resulting estimated safe region can contain states outside the ROA of the real system. This is not acceptable in safety-critical applications. In this paper, we consider an approach that learns the ROA from experiments on a real system, without ever leaving the ROA of the real system. This approach enables us to find an estimate of the real ROA, without risking safety-critical failures. Based on regularity assumptions on the model errors in terms of a Gaussian process prior, we determine a region in which an equilibrium point is asymptotically stable with high probability, according to an underlying Lyapunov function. Moreover, we actively select areas of the state space to evaluate in order to expand the ROA. We demonstrate the effectiveness of this method in simulated experiments.

      @inproceedings{Berkenkamp2016ROA,
        title = {Safe Learning of Regions of Attraction for Uncertain, Nonlinear Systems with {G}aussian Processes},
        booktitle = {Proc. of the IEEE Conference on Decision and Control},
        author = {Berkenkamp, Felix and Moriconi, Riccardo and Schoellig, Angela P. and Krause, Andreas},
        year = {2016},
        pages = {4661--4666},
        url = {https://arxiv.org/abs/1603.04915}
      }
      
    4. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
      Matteo Turchetta, Felix Berkenkamp, Andreas Krause
      in Neural Information Processing Systems (NeurIPS), 2016

      In classical reinforcement learning, when exploring an environment, agents accept arbitrary short term loss for long term gain. This is infeasible for safety critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an, a priori unknown, safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.

      @inproceedings{Turchetta2016SafeMDP,
        title = {Safe Exploration in Finite {M}arkov {D}ecision {P}rocesses with {G}aussian Processes},
        booktitle = {Neural Information Processing Systems (NeurIPS)},
        author = {Turchetta, Matteo and Berkenkamp, Felix and Krause, Andreas},
        year = {2016},
        pages = {4305--4313},
        url = {https://arxiv.org/abs/1606.04753}
      }
      
    5. Safe Controller Optimization for Quadrotors with Gaussian Processes
      Felix Berkenkamp, Angela P. Schoellig, Andreas Krause
      in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2016

      One of the most fundamental problems when designing controllers for dynamic systems is the tuning of the controller parameters. Typically, a model of the system is used to obtain an initial controller, but ultimately the controller parameters must be tuned manually on the real system to achieve the best performance. To avoid this manual tuning step, methods from machine learning, such as Bayesian optimization, have been used. However, as these methods evaluate different controller parameters on the real system, safety-critical system failures may happen. In this paper, we overcome this problem by applying, for the first time, a recently developed safe optimization algorithm, SafeOpt, to the problem of automatic controller parameter tuning. Given an initial, low-performance controller, SafeOpt automatically optimizes the parameters of a control law while guaranteeing safety. It models the underlying performance measure as a Gaussian process and only explores new controller parameters whose performance lies above a safe performance threshold with high probability. Experimental results on a quadrotor vehicle indicate that the proposed method enables fast, automatic, and safe optimization of controller parameters without human intervention.

      @inproceedings{Berkenkamp2016SafeOpt,
        title = {Safe Controller Optimization for Quadrotors with {G}aussian Processes},
        booktitle = {Proc. of the IEEE International Conference on Robotics and Automation (ICRA)},
        author = {Berkenkamp, Felix and Schoellig, Angela P. and Krause, Andreas},
        year = {2016},
        pages = {493--496},
        url = {https://arxiv.org/abs/1509.01066}
      }
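
      A stripped-down version of the selection rule described above, assuming a scikit-learn GP over the performance measure: only parameters whose lower confidence bound exceeds the safe performance threshold are considered, and among those the most uncertain one is evaluated next. The full SafeOpt algorithm additionally distinguishes potential maximizers from expanders of the safe set, which this sketch omits.

      # SafeOpt-flavoured selection (sketch): keep parameters whose performance lower bound
      # exceeds the safety threshold j_min, then evaluate the most uncertain of them.
      # The maximizer/expander distinction of the actual algorithm is omitted.
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor

      def next_safe_parameters(gp_performance, candidates, j_min, beta=2.0):
          mean, std = gp_performance.predict(candidates, return_std=True)
          safe = mean - beta * std >= j_min
          if not np.any(safe):
              raise RuntimeError("safe set is empty")
          idx = np.flatnonzero(safe)
          return candidates[idx[np.argmax(std[idx])]]

      # Usage: the GP is fit on (controller parameters, measured performance) pairs.
      gp = GaussianProcessRegressor(normalize_y=True).fit(np.array([[0.4], [0.5]]), np.array([0.9, 1.0]))
      theta_next = next_safe_parameters(gp, np.linspace(0.0, 1.0, 101).reshape(-1, 1), j_min=0.5)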
      

2015

    1. Safe and Automatic Controller Tuning with Gaussian Processes
      Felix Berkenkamp, Angela P. Schoellig, Andreas Krause
      in Proc. of the Workshop on Machine Learning in Planning and Control of Robot Motion, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015

      One of the most fundamental problems when designing controllers for dynamic systems is the tuning of the controller parameters. Typically, a model of the system is used to design an initial controller, but ultimately, the controller parameters must be tuned manually on the real system to achieve the best performance. To avoid this manual tuning, methods from machine learning, such as Bayesian optimization, have been used. However, as these methods evaluate different controller parameters, safety-critical system failures may happen. We overcome this problem by applying, for the first time, a recently developed safe optimization algorithm, SafeOpt, to the problem of automatic controller parameter tuning. Given an initial, low-performance controller, SafeOpt automatically optimizes the parameters of a control law while guaranteeing system safety and stability. It achieves this by modeling the underlying performance measure as a Gaussian process and only exploring new controller parameters whose performance lies above a safe performance threshold with high probability. Experimental results on a quadrotor vehicle indicate that the proposed method enables fast, automatic, and safe optimization of controller parameters without human intervention.

      @inproceedings{Berkenkamp2015SafeTuning,
        title = {Safe and Automatic Controller Tuning with {{Gaussian}} Processes},
        booktitle = {Proc. of the Workshop on Machine Learning in Planning and Control of Robot Motion, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
        author = {Berkenkamp, Felix and Schoellig, Angela P. and Krause, Andreas},
        year = {2015}
      }
      
    2. Safe and Robust Learning Control with Gaussian Processes
      Felix Berkenkamp, Angela P. Schoellig
      in Proc. of the European Control Conference (ECC), 2015

      This paper introduces a learning-based robust control algorithm that provides robust stability and performance guarantees during learning. The approach uses Gaussian process (GP) regression based on data gathered during operation to update an initial model of the system and to gradually decrease the uncertainty related to this model. Embedding this data-based update scheme in a robust control framework guarantees stability during the learning process. Traditional robust control approaches have not considered online adaptation of the model and its uncertainty before. As a result, their controllers do not improve performance during operation. Typical machine learning algorithms that have achieved similar high-performance behavior by adapting the model and controller online do not provide the guarantees presented in this paper. In particular, this paper considers a stabilization task, linearizes the nonlinear, GP-based model around a desired operating point, and solves a convex optimization problem to obtain a linear robust controller. The resulting performance improvements due to the learning-based controller are demonstrated in experiments on a quadrotor vehicle.

      @inproceedings{Berkenkamp2015Robust,
        title = {Safe and Robust Learning Control with {{Gaussian}} Processes},
        booktitle = {Proc. of the European Control Conference (ECC)},
        author = {Berkenkamp, Felix and Schoellig, Angela P.},
        year = {2015},
        pages = {2501--2506}
      }
      

2014

    1. Learning-based Robust Control: Guaranteeing Stability while Improving Performance
      Felix Berkenkamp, Angela P. Schoellig
      in Proc. of the Workshop on Machine Learning in Planning and Control of Robot Motion, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014

      To control dynamic systems, modern control theory relies on accurate mathematical models that describe the system behavior. Machine learning methods have proven to be an effective method to compensate for initial errors in these models and to achieve high-performance maneuvers by adapting the system model and control online. However, these methods usually do not guarantee stability during the learning process. On the other hand, the control community has traditionally accounted for model uncertainties by designing robust controllers. Robust controllers use a mathematical description of the uncertainty in the dynamic system derived prior to operation and guarantee robust stability for all uncertainties. Unlike machine learning methods, robust control does not improve the control performance by adapting the model online. This paper combines machine learning and robust control theory for the first time with the goal of improving control performance while guaranteeing stability. Data gathered during operation is used to reduce the uncertainty in the model and to learn systematic errors. Specifically, a nonlinear, nonparametric model of the unknown dynamics is learned with a Gaussian Process. This model is used for the computation of a linear robust controller, which guarantees stability around an operating point for all uncertainties. As a result, the robust controller improves its performance online while guaranteeing robust stability. A simulation example illustrates the performance improvements due to the learning-based robust controller.

      @inproceedings{Berkenkamp2014LearningBased,
        title = {Learning-based Robust Control: Guaranteeing Stability while Improving Performance},
        booktitle = {Proc. of the Workshop on Machine Learning in Planning and Control of Robot Motion, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
        author = {Berkenkamp, Felix and Schoellig, Angela P.},
        year = {2014}
      }
      
    2. Hybrid Model Predictive Control of Stratified Thermal Storages in Buildings
      Felix Berkenkamp, Markus Gwerder
      Energy and Buildings, vol. 84, 2014

      In this paper a generic model predictive control (MPC) algorithm for the management of stratified thermal storage tanks in buildings is proposed that can be used independently of the building’s heat/cold generation, consumption and consumption control. The main components of the considered storage management are the short term load forecasting (STLF) of the heat/cold consumer(s) using weather forecast, the MPC algorithm using a bilinear dynamic model of the stratified storage and operating modes of the heat/cold generator(s), modeled as static operating points. The MPC algorithm chooses between these operating modes to satisfy the predicted cold demand with minimal costs. By considering the generator(s) in terms of operating modes the bilinearity in the storage model is resolved which leads to a hybrid MPC problem. For computational efficiency this problem is approximated by an iterative algorithm that converges to a close to optimal solution. Simulation results suggest that the approach is well suited for the use in buildings with a limited number of heat/cold generators. Additionally, the approach is promising for practical use because of its independence from the heat/cold consumer’s control and because it requires limited information and instrumentation on the plant, i.e. low costs for control equipment.

      @article{Berkenkamp2014Hybrid,
        title = {Hybrid Model Predictive Control of Stratified Thermal Storages in Buildings},
        volume = {84},
        issn = {0378-7788},
        timestamp = {2015-09-09T07:19:30Z},
        journal = {Energy and Buildings},
        author = {Berkenkamp, Felix and Gwerder, Markus},
        year = {2014},
        pages = {233--240}
      }