Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Latent Adversarial Training, published by Adam Jermyn on June 29, 2022 on The AI Alignment Forum.

The Problem

We'd like to train models to be robustly safe, even in environments that may fall well outside of the training distribution. Unfortunately, all we get to work with is the training distribution, which makes ensuring robust generalization difficult. Deception is an example of this concern: models may be well-behaved in the training environment because they realize it is a training environment, but behave poorly in deployment because they notice the distributional shift.

An Approach

Background

It would be great if we could generate scenarios where the model behaves badly. For instance, if the model is deceptive, we'd like to generate inputs that trick it into thinking it has been deployed so that it reveals that deception. Adversarial Training (AT) and Relaxed Adversarial Training (RAT) are approaches to this problem. Very briefly, in AT we search for inputs that generate large loss, while in RAT we search for distributions of inputs, or other more general evidence, that the model is capable of generating large loss. The approach I want to explore, called Latent Adversarial Training (LAT), is a form of RAT.

Concept

Suppose we have two models, the Agent and the Surgeon. The Agent is the model we're trying to train, and the Surgeon is going to be its adversary. The training proceeds as follows:

1. The Agent processes input x from the environment, producing latent state s(x) and output y(s).
2. The Surgeon takes inputs (x, s, y) and searches for a modification s → s + δs such that y(s + δs) does poorly on the loss function L(y).
3. We update the Agent based on L(y(s + δs)), treating the Surgeon's modifications as constants.

The idea is that the Surgeon gets to meddle with the latent state of the Agent with full knowledge of what the Agent was planning on doing, and we train the Agent to be robust against this kind of attack.

Implementation

Concretely, suppose the Agent is a neural network. The latent state s consists of the activations of all the neurons, and the Surgeon gets to specify modifications like "this neuron gets +5". How does the Surgeon decide what to do? There are a number of options. It could examine the gradient of the Agent's loss with respect to each activation, then push on the activations that give it the most loss. Or it could randomly generate its modifications and use some other optimization process, like the Metropolis-Hastings algorithm, to identify loss-maximizing options (a code sketch of the gradient-based option appears below). In general I'm imagining the Surgeon as more of a fuzzer with access to program memory and less of a complicated model. In principle it could be some complicated neural network, but then I'd want it to be simpler than the Agent, possibly of bounded size, so I could be quite sure I understood what it was doing. The last thing I want is a misaligned Surgeon trying to achieve its own mesa-objective by programming the Agent to be Surgeon-aligned and deceptive against humans. I suspect it's possible to have the Surgeon pick useful modifications that generate meaningful changes in the loss without having the Surgeon's complexity scale with that of the Agent, but that's just an intuition, and if it fails then this approach looks a lot less appealing.
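As a concrete illustration, here is a minimal sketch of the gradient-based Surgeon in PyTorch. The toy Agent architecture, the surgeon_step routine, and all hyperparameters are illustrative assumptions, not code from the post; the sketch only shows the mechanics of perturbing the latent state to maximize the loss, then training the Agent against the perturbed loss.

```python
# Minimal Latent Adversarial Training sketch (assumed toy setup, not the
# post's implementation): a gradient-based Surgeon perturbs the Agent's
# hidden activations to maximize the loss, then the Agent is trained on
# the perturbed loss with the perturbation held constant.
import torch
import torch.nn as nn

class Agent(nn.Module):
    def __init__(self, d_in=8, d_hidden=16, d_out=1):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)  # produces latent state s(x)
        self.head = nn.Linear(d_hidden, d_out)    # produces output y(s)

    def forward(self, x, delta_s=None):
        s = torch.relu(self.encoder(x))
        if delta_s is not None:                   # Surgeon's edit: s -> s + δs
            s = s + delta_s
        return self.head(s)

def surgeon_step(agent, x, target, loss_fn, lr=0.5, steps=5):
    """Gradient-based Surgeon: ascend the loss w.r.t. the activations."""
    delta_s = torch.zeros(x.shape[0], agent.encoder.out_features,
                          requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(agent(x, delta_s), target)
        (grad,) = torch.autograd.grad(loss, delta_s)
        with torch.no_grad():
            delta_s += lr * grad                  # push toward higher loss
    return delta_s.detach()                       # treated as a constant below

agent = Agent()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x, target = torch.randn(32, 8), torch.randn(32, 1)

delta_s = surgeon_step(agent, x, target, loss_fn)
loss = loss_fn(agent(x, delta_s), target)         # L(y(s + δs))
opt.zero_grad(); loss.backward(); opt.step()      # update the Agent only
```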
Core Ideas

The key difference between LAT and Adversarial Training is that the Surgeon gets to directly manipulate the Agent's inner state, which makes the Surgeon's job much easier than in the ordinary adversarial training setup. Take the classic example of an Agent that only defects if it sees a valid solution to a hard cryptographic problem: the Surgeon doesn't need to identify an input that causes the Agent to defect, it only needs to identify that somewhere in the middle of the Agent there is a switch which, if flipped, causes the Agent to defect and gene...
In season five episode three we chat about Five Papers for Mike Tipping, take a listener question on AI, and chat with Eoin O'Mahony of Uber. Here are Neil's five papers. What are yours?

Stochastic Variational Inference by Hoffman, Wang, Blei and Paisley. http://arxiv.org/abs/1206.7051 A way of doing approximate inference for probabilistic models with potentially billions of data... need I say more?

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget by Korattikara, Chen and Welling. http://arxiv.org/abs/1304.5299 Oh... I do need to say more... because these three are at it as well, but from the sampling perspective. Probabilistic models for big data: an idea so important it needed to be in the list twice.

Practical Bayesian Optimization of Machine Learning Algorithms by Snoek, Larochelle and Adams. http://arxiv.org/abs/1206.2944 This paper represents the rise of probabilistic numerics; I could also have chosen papers by Osborne, Hennig or others. There are too many papers out there already. Definitely an exciting area, be it optimisation, integration or differential equations. I chose this paper because it seems to have blown the field open to a wider audience, focussing as it did on deep learning as an application, so it lets me capture both an area of developing interest and an area that hits the national news.

Kernel Bayes' Rule by Fukumizu, Song and Gretton. http://arxiv.org/abs/1009.5736 One of the great things about ML is how we have different (and competing) philosophies operating under the same roof. But because we still talk to each other (and sometimes even listen to each other) these ideas can merge to create new and interesting things. Kernel Bayes' Rule makes the list.

ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever and Hinton. http://www.cs.toronto.edu/~hinton/absps/imagenet.pdf An obvious choice, but you don't leave the Beatles off lists of great bands just because they are an obvious choice.
Faculty of Mathematics, Informatics and Statistics - Digital Theses of the LMU - Part 02/02
Predicting the epidemiological effects of new vaccination programmes through mathematical-statistical transmission modelling is of increasing importance for the German Standing Committee on Vaccination. Such models commonly capture large populations using a compartmental structure whose dynamics are governed by a system of ordinary differential equations (ODEs). Unfortunately, these ODE-based models are generally computationally expensive to solve, which poses a challenge for any statistical procedure inferring the corresponding model parameters from disease surveillance data. Thus, in practice, parameters are often fixed based on epidemiological knowledge, hence ignoring uncertainty. A Bayesian inference framework incorporating this prior knowledge promises to be a more suitable approach, allowing for additional parameter flexibility.

This thesis is concerned with statistical methods for performing Bayesian inference of ODE-based models. A posterior approximation approach is presented, based on a Gaussian distribution centred at the posterior mode with covariance given by the observed Fisher information. By employing a newly proposed method for adjusting the impact of the likelihood via a power posterior, the approximation procedure is able to account for residual autocorrelation in the data given the model. As an alternative to this approximation approach, an adaptive Metropolis-Hastings algorithm is described which is geared towards efficient posterior sampling in the case of a high-dimensional parameter space and considerable parameter collinearities. In order to identify relevant model components, Bayesian model selection criteria based on the marginal likelihood of the data are applied. The estimation of the marginal likelihood for each considered model is performed via a newly proposed approach which utilizes the posterior sample obtained from the preceding Metropolis-Hastings algorithm.

Furthermore, the thesis contains an application of the presented methods: predicting the epidemiological effects of introducing rotavirus childhood vaccination in Germany. Again, an ODE-based compartmental model accounting for the most relevant transmission aspects of rotavirus is presented. After extending the model with vaccination mechanisms, it becomes possible to estimate the rotavirus vaccine effectiveness from routinely collected surveillance data. By employing the Bayesian framework, model predictions of the future epidemiological development under a high vaccination coverage rate incorporate uncertainty regarding both model structure and parameters. The forecast suggests that routine vaccination may cause a rotavirus incidence increase among older children and the elderly, but drastically reduces the disease burden among the target group of young children, even beyond the expected direct vaccination effect, by means of herd protection. Altogether, this thesis provides a statistical perspective on the modelling of routine vaccination effects in order to assist decision making under uncertainty. The presented methodology is easily applicable to other infectious diseases such as influenza.
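As a rough illustration of the sampling machinery the abstract refers to, here is a generic adaptive Metropolis-Hastings sketch in Python (Haario-style covariance adaptation). It is not the thesis's specific algorithm: log_post is a placeholder target, standing in for the ODE model's log-posterior, and all constants are illustrative.

```python
# Generic adaptive Metropolis-Hastings (Haario-style): the Gaussian
# proposal covariance is periodically re-fitted to the chain's history,
# which helps with high-dimensional, collinear parameter spaces.
import numpy as np

def log_post(theta):
    return -0.5 * np.sum(theta**2)   # placeholder target (standard normal)

def adaptive_mh(theta0, n_iter=5000, adapt_start=500, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    d = len(theta0)
    scale = 2.38**2 / d                        # classic adaptive-MH scaling
    samples = np.empty((n_iter, d))
    theta = np.asarray(theta0, float)
    lp = log_post(theta)
    cov = np.eye(d)
    for i in range(n_iter):
        if i >= adapt_start:                   # adapt proposal to past samples
            cov = np.cov(samples[:i].T) + eps * np.eye(d)
        prop = rng.multivariate_normal(theta, scale * cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH accept/reject
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples

draws = adaptive_mh(np.zeros(3))
print(draws.mean(axis=0), draws.std(axis=0))
```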
In a fun historical journey, Katie and Ben explore the history of the Manhattan Project, discuss the difficulties in modeling particle movement in atomic bombs with only punch-card computers and ingenuity, and eventually come to present-day uses of the Metropolis-Hastings algorithm... mentioning Solitaire along the way.
StatLearn 2013 - Workshop on "Challenging problems in Statistical Learning"
When an unbiased estimator of the likelihood is used within a Markov chain Monte Carlo (MCMC) scheme, it is necessary to trade off the number of samples used against the computing time. Many samples for the estimator will result in an MCMC scheme which has similar properties to the case where the likelihood is exactly known, but will be expensive. Few samples for the construction of the estimator will result in faster estimation, but at the expense of slower mixing of the Markov chain. We explore the relationship between the number of samples and the efficiency of the resulting MCMC estimates. Under specific assumptions about the likelihood estimator, we are able to provide guidelines on the number of samples to select for a general Metropolis-Hastings proposal. We provide theory which justifies the use of these assumptions for a large class of models. On a number of examples, we find that the assumptions on the likelihood estimator are accurate. This is joint work with Mike Pitt (University of Warwick) and Robert Kohn (UNSW).
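To make the trade-off concrete, below is a toy pseudo-marginal Metropolis-Hastings sketch in Python. The estimator lik_hat, the flat prior, and the tuning constants are all illustrative assumptions, not the talk's setup; the point is only that the noisy likelihood estimate is carried along with the current state, and that n_samples controls the estimator's variance and hence how well the chain mixes.

```python
# Toy pseudo-marginal Metropolis-Hastings: the likelihood is replaced by
# an unbiased Monte Carlo estimate, recomputed for each proposal and kept
# with the accepted state. More samples -> lower estimator variance ->
# better mixing, but higher cost per iteration.
import numpy as np

rng = np.random.default_rng(1)

def lik_hat(theta, n_samples):
    # Unbiased estimator of an (intractable) positive likelihood: a toy
    # importance-sampling average. Illustrative only.
    z = rng.standard_normal(n_samples)
    return np.mean(np.exp(-0.5 * (z - theta)**2))

def pseudo_marginal_mh(theta0, n_iter=2000, n_samples=10, step=0.5):
    theta = theta0
    L = lik_hat(theta, n_samples)       # estimate travels with the state
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        L_prop = lik_hat(prop, n_samples)
        # Flat prior assumed for simplicity; unbiasedness of lik_hat means
        # the chain still targets the exact posterior despite the noise.
        if rng.uniform() < L_prop / L:
            theta, L = prop, L_prop
        chain[i] = theta
    return chain

print(pseudo_marginal_mh(0.0).mean())
```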
Mathematics, Informatics and Statistics - Open Access LMU - Part 02/03
In this paper we present and evaluate a Gibbs sampler for a Poisson regression model including spatial effects. The approach is based on Frühwirth-Schnatter and Wagner (2004b), who show that by data augmentation through the introduction of two sequences of latent variables, a Poisson regression model can be transformed into an approximate normal linear model. We show how this methodology can be extended to spatial Poisson regression models and give details of the resulting Gibbs sampler. In particular, the influence of model parameterisation and different update strategies on the mixing of the MCMC chains is discussed. The developed Gibbs samplers are analysed in two simulation studies and applied to model the expected number of claims for policyholders of a German car insurance company. The mixing of the Gibbs samplers depends crucially on the model parameterisation and the update schemes. The best mixing is achieved when collapsed algorithms are used, and reasonably low autocorrelations for the spatial effects are obtained in this case. For the regression effects, however, autocorrelations are rather high, especially for data with very low heterogeneity. For comparison, a single-component Metropolis-Hastings algorithm is applied, which displays very good mixing for all components. Although the Metropolis-Hastings sampler requires a higher computational effort, it outperforms the Gibbs samplers, which would have to be run considerably longer in order to obtain the same precision of the parameters.
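For contrast with the data-augmentation Gibbs samplers, here is a hedged sketch of a single-component Metropolis-Hastings sampler for a toy Poisson regression, the kind of comparison sampler the abstract mentions, not the paper's scheme. The simulated data, the normal prior, and the step size are illustrative assumptions.

```python
# Single-component (one-coordinate-at-a-time) random-walk MH for a toy
# Poisson regression with a N(0, 10) prior on each coefficient.
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ beta_true))

def log_post(beta):
    eta = X @ beta
    # Poisson log-likelihood (up to a constant) plus Gaussian log-prior
    return np.sum(y * eta - np.exp(eta)) - 0.5 * np.sum(beta**2 / 10.0)

def single_component_mh(n_iter=3000, step=0.05):
    beta = np.zeros(p)
    lp = log_post(beta)
    out = np.empty((n_iter, p))
    for i in range(n_iter):
        for j in range(p):                 # update one component at a time
            prop = beta.copy()
            prop[j] += step * rng.standard_normal()
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                beta, lp = prop, lp_prop
        out[i] = beta
    return out

draws = single_component_mh()
print(draws[1000:].mean(axis=0))           # compare with beta_true
```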
Mathematics, Informatics and Statistics - Open Access LMU - Part 02/03
In this paper we present a Gibbs sampler for a Poisson model including spatial effects. Frühwirth-Schnatter and Wagner (2004b) show that by data augmentation via the introduction of two sequences of latent variables, a Poisson regression model can be transformed into a normal linear model. We show how this methodology can be extended to spatial Poisson regression models and give details of the resulting Gibbs sampler. In particular, the influence of model parameterisation and different update strategies on the mixing of the MCMC chains is discussed. The developed Gibbs samplers are analysed in two simulation studies and applied to model the expected number of claims for policyholders of a German car insurance data set. In general, both large and small simulated spatial effects are estimated accurately by the Gibbs samplers, and reasonably low autocorrelations are obtained when the data variability is rather large. However, for data with very low heterogeneity, the autocorrelations resulting from the Gibbs samplers are very high, eroding the computational advantage over a Metropolis-Hastings independence sampler, which exhibits very low autocorrelations in all settings.
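A minimal sketch of the independence Metropolis-Hastings idea mentioned above: proposals are drawn from one fixed distribution, independent of the current state, here a toy Gaussian approximation fitted at the posterior mode. The placeholder posterior and all names are assumptions for illustration, not the paper's sampler.

```python
# Independence Metropolis-Hastings: a fixed Gaussian proposal (a crude
# Laplace-style approximation at the mode), so the acceptance ratio must
# include the proposal density at both the current and proposed points.
import numpy as np
from scipy import optimize

rng = np.random.default_rng(3)

def log_post(theta):
    return -0.5 * np.sum((theta - 1.0)**2)    # placeholder log-posterior

# Fit the proposal at the posterior mode; BFGS also returns an inverse
# Hessian estimate we can reuse as the proposal covariance.
res = optimize.minimize(lambda t: -log_post(t), x0=np.zeros(2))
mode, cov = res.x, np.asarray(res.hess_inv)

def log_prop(theta):
    d = theta - mode
    return -0.5 * d @ np.linalg.solve(cov, d)

def independence_mh(n_iter=2000):
    theta = mode.copy()
    chain = np.empty((n_iter, len(mode)))
    for i in range(n_iter):
        prop = rng.multivariate_normal(mode, cov)   # ignores current state
        log_alpha = (log_post(prop) - log_post(theta)
                     + log_prop(theta) - log_prop(prop))
        if np.log(rng.uniform()) < log_alpha:
            theta = prop
        chain[i] = theta
    return chain

print(independence_mh()[500:].mean(axis=0))   # should be near [1, 1]
```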
Mathematics, Informatics and Statistics - Open Access LMU - Part 01/03
Dynamic models extend state space models to non-normal observations. This paper suggests a specific hybrid Metropolis-Hastings algorithm as a simple, yet flexible and efficient, tool for Bayesian inference via Markov chain Monte Carlo in dynamic models. Hastings proposals from the (conditional) prior distribution of the unknown, time-varying parameters are used to update the corresponding full conditional distributions. Several blocking strategies are discussed to ensure good mixing and convergence properties of the simulated Markov chain. It is also shown that the proposed method is easily extended to robust transition models using mixtures of normals. The applicability is illustrated with an analysis of a binomial and a binary time series known from the literature.
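The conditional-prior proposal idea can be sketched in a few lines: when a state is proposed from its conditional prior given the neighbouring states, the prior terms cancel in the Metropolis-Hastings ratio, leaving only the observation-likelihood ratio. The toy random-walk/Poisson model below is an assumption for illustration, not the paper's application, and it updates single states rather than blocks.

```python
# Conditional-prior proposals for a dynamic model: random-walk state
# alpha_t with Poisson observations y_t ~ Poisson(exp(alpha_t)). Proposing
# alpha_t from its conditional prior given its neighbours means the
# acceptance test reduces to the observation-likelihood ratio.
import numpy as np

rng = np.random.default_rng(4)
T = 100
alpha_true = np.cumsum(0.1 * rng.standard_normal(T))
y = rng.poisson(np.exp(alpha_true))
sigma = 0.1                                 # random-walk std, assumed known

def loglik_t(a, t):
    return y[t] * a - np.exp(a)             # Poisson log-likelihood, up to const

def sample_states(n_iter=200):
    alpha = np.zeros(T)
    for _ in range(n_iter):
        for t in range(T):
            # conditional prior of alpha_t given its neighbours
            if t == 0:
                mean, sd = alpha[1], sigma
            elif t == T - 1:
                mean, sd = alpha[T - 2], sigma
            else:
                mean, sd = 0.5 * (alpha[t - 1] + alpha[t + 1]), sigma / np.sqrt(2)
            prop = rng.normal(mean, sd)
            # prior and proposal terms cancel: accept on likelihood alone
            if np.log(rng.uniform()) < loglik_t(prop, t) - loglik_t(alpha[t], t):
                alpha[t] = prop
    return alpha

print(np.corrcoef(sample_states(), alpha_true)[0, 1])
```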
Mathematics, Informatics and Statistics - Open Access LMU - Part 01/03
Dynamic generalized linear mixed models are proposed as a regression tool for non-normal longitudinal data. This framework is an interesting combination of dynamic models (also known as state space models) and mixed models (also known as random effect models). The main feature is that both time- and unit-specific parameters are allowed, which is especially attractive if a considerable number of units is observed over a longer period. Statistical inference is done by means of Markov chain Monte Carlo techniques in a full Bayesian setting. The algorithm is based on iterative updating using full conditionals. Due to the hierarchical structure of the model and the extensive use of Metropolis-Hastings steps for updating, the algorithm mainly evaluates (log-)likelihoods at multivariate normally distributed proposals. It is derivative-free and covers a wide range of different models, including dynamic and mixed models, the latter with slight modifications. The methodology is illustrated through an analysis of artificial binary data and multicategorical business test data.