We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets. 2024: Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang https://arxiv.org/pdf/2403.05087v1.pdf
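The embedding described in the abstract (a Gaussian anchored to a triangle by barycentric coordinates, then pushed off the surface along a Phong-interpolated normal) is simple to sketch. The Python snippet below only illustrates that geometric idea under assumed conventions; the function and variable names (gaussian_center_on_mesh, bary, displacement) are mine, not the authors' code.

```python
import numpy as np

def gaussian_center_on_mesh(vertices, normals, bary, displacement):
    """Place a Gaussian center on a triangle via barycentric coordinates,
    then push it off the surface along the Phong-interpolated normal.

    vertices:     (3, 3) array, the triangle's corner positions
    normals:      (3, 3) array, per-vertex normals
    bary:         (3,) barycentric coordinates (non-negative, summing to 1)
    displacement: scalar offset along the interpolated normal
    """
    surface_point = bary @ vertices      # barycentric interpolation of positions
    phong_normal = bary @ normals        # barycentric interpolation of normals
    phong_normal /= np.linalg.norm(phong_normal)
    return surface_point + displacement * phong_normal

# Example: a unit triangle with upward normals, a Gaussian near its centroid.
tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nrm = np.tile(np.array([0.0, 0.0, 1.0]), (3, 1))
center = gaussian_center_on_mesh(tri, nrm, np.array([1/3, 1/3, 1/3]), 0.05)
print(center)  # when the triangle animates, the mesh drives this point along with it
```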
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: When is correlation transitive?, published by Ege Erdil on June 23, 2023 on LessWrong. It's a well-known property of correlation that it's not transitive in general. If X, Y, Z are three real-valued random variables such that ρ(X,Y) > 0 and ρ(Y,Z) > 0, it doesn't have to be the case that ρ(X,Z) > 0. Nevertheless, there are some circumstances under which correlation is transitive. I will focus on two such cases in this post. Primer: correlation as an inner product. For what follows, the key piece of background is that we can regard correlations of real-valued random variables with finite second moments as inner products in an appropriate Hilbert space. Specifically, if X, Y are two such random variables with zero mean and unit standard deviation, which is a simplification we can always make as correlation is invariant under translation and scalar multiplication, then we can compute ρ(X,Y) = cov(X,Y)/(σ_X σ_Y) = E[XY] − E[X]E[Y] = E[XY]. The pairing (A,B) ↦ E[AB] defines an inner product on the space of random variables with finite second moments, where two random variables are considered equivalent if they are equal with probability 1 (almost surely). The properties that we expect of an inner product are easy to check: the pairing is obviously bilinear and positive definite. Furthermore, it turns out this inner product turns the space of random variables with finite second moments into a Hilbert space: the vector space turns out to be complete under the induced norm ∥X∥ = √E[X²]. Roughly speaking, this means that we can take orthogonal projections onto closed subspaces with impunity. Now that we have this framework, we can move on to the main results of this post. Correlation is transitive when the correlations are sufficiently strong. I'll first prove the following: Claim 1: If ρ(X,Y) = a and ρ(Y,Z) = b, then ab − √((1−a²)(1−b²)) ≤ ρ(X,Z) ≤ ab + √((1−a²)(1−b²)). Moreover, these bounds are tight: for any a, b, there is a combination X, Y, Z for which we can make either the right or the left inequality into an equality. Proof: We can assume X, Y, Z have mean zero and unit variance without loss of generality. Taking orthogonal projections of X, Z onto the one-dimensional subspace spanned by Y, we can write X = aY + √(1−a²) H_XY and Z = bY + √(1−b²) H_ZY, where E[Y H_XY] = E[Y H_ZY] = 0 and the random variables H_XY, H_ZY have mean zero and variance 1. Taking inner products gives E[XZ] = ab + √((1−a²)(1−b²)) E[H_XY H_ZY]. Using the Cauchy-Schwarz inequality for our inner product finishes the proof: |E[H_XY H_ZY]| ≤ ∥H_XY∥ ∥H_ZY∥ = 1. For the existence proof, let Y be an arbitrary random variable with mean zero and unit variance and pick H_XY, H_ZY to be perfectly correlated or perfectly anti-correlated standard Gaussians that are uncorrelated with Y. Interpretation: When a, b are large and positive, the lower bound ab − √((1−a²)(1−b²)) is also positive, and so we have a guaranteed positive correlation between X and Z. One way to simplify this is to make it single-dimensional by assuming a = b. In this case, the lower bound is 2a² − 1. If we want a guaranteed positive correlation between X and Z, this means the correlations ρ(X,Y) = ρ(Y,Z) = a have to satisfy a > 1/√2 ≈ 0.7. This condition is quite strict, and we might wonder if some transitivity of correlation can be recovered in the absence of such strong correlations between X, Y and Y, Z. It turns out the answer is yes, at least if we assume the random variables are in some sense "generic". 
Correlation is transitive on the average. It turns out that in a suitable sense, when X, Y and Y, Z are positively correlated, there is a tendency for X, Z to also be positively correlated, even though per Claim 1 we can't deduce that they must be positively correlated. The precise version of this claim is as follows: Claim 2: Let X, Y, Z be vectors independently and uniformly distributed on the n-dimensional unit sphere S^n ⊂ R^(n+1), and let −1 ≤ a, b ≤ 1 be two re...
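Claim 1 is easy to sanity-check numerically: construct X and Z from a shared Y exactly as in the proof and confirm that the sampled ρ(X,Z) lands inside the stated interval. A minimal sketch, with my own variable names and a specific choice of a and b:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b = 0.8, 0.75  # rho(X,Y) and rho(Y,Z)

# Construct X and Z as in the proof: projections onto Y plus orthogonal noise.
Y = rng.standard_normal(n)
H_xy = rng.standard_normal(n)          # components uncorrelated with Y
H_zy = rng.standard_normal(n)
X = a * Y + np.sqrt(1 - a**2) * H_xy
Z = b * Y + np.sqrt(1 - b**2) * H_zy

rho_xz = np.corrcoef(X, Z)[0, 1]
lo = a * b - np.sqrt((1 - a**2) * (1 - b**2))
hi = a * b + np.sqrt((1 - a**2) * (1 - b**2))
print(f"rho(X,Z) = {rho_xz:.3f}, bounds = [{lo:.3f}, {hi:.3f}]")
assert lo - 0.01 <= rho_xz <= hi + 0.01
```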
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Hessian and Basin volume, published by Vivek Hebbar on July 10, 2022 on The AI Alignment Forum. Thanks to Thomas Kwa for the question which prompted this post. Note: This is mostly a primer / introductory reference, not a research post. However, the details should be interesting even to those familiar with the area. When discussing "broad basins" in the loss landscape of a DNN, the Hessian of the loss is often referenced. This post explains a simple theoretical approximation of basin volume which uses the Hessian of the loss. Suppose our minimum has loss = 0. Define the basin as the region of parameter space draining to our minimum where loss < threshold T. Simplest model: If all eigenvalues of the Hessian are positive and non-trivial, we can approximate the loss as a paraboloid centered on our minimum: the vertical axis is loss, and the horizontal plane is parameter space. The shape of the basin in parameter space is the shadow of this paraboloid, which is an ellipsoid. The principal directions of curvature of the paraboloid are given by the eigenvectors of the Hessian. The curvature (second derivative) in each of those directions is given by the corresponding eigenvalue. Radii of the ellipsoid: If we start at our minimum and walk away in a principal direction, the loss as a function of distance traveled is L(x) = (1/2) λ_i x², where λ_i is the Hessian eigenvalue for that direction. So given our loss threshold T, we will hit that threshold at a distance of x = √(2T/λ_i). This is the radius of the loss-basin ellipsoid in that direction. The volume of the ellipsoid is V_basin = V_n ∏_i √(2T/λ_i), where the constant V_n is the volume of the unit n-ball in n dimensions. Since the product of the eigenvalues is the determinant of the Hessian, we can write this as V_basin = V_n (2T)^(n/2) / √(det H). So the basin volume is inversely proportional to the square root of the determinant of the Hessian. Everything in the numerator is a constant, so only the determinant of the Hessian matters in this model. The problem with this model is that the determinant of the Hessian is usually zero, due to zero eigenvalues. Fixing the model: If we don't include a regularization term in the loss, the basin as we defined it earlier can actually be infinitely big (it's not just a problem with the paraboloid model). However, we don't really care about volume that is so far from the origin that it is never reached. A somewhat principled way to fix the model is to look at volume weighted by the initialization distribution. This is easiest to work with if the initialization is Gaussian. To make the math tractable, we can replace our ellipsoid with a "fuzzy ellipsoid" -- i.e. a multivariate Gaussian. Now we just have to integrate the product of two Gaussians, which should be easy. There are also somewhat principled reasons for using a "fuzzy ellipsoid", which I won't explain here. However, this is only somewhat principled; if you think about it further, it starts to become unclear: Should we use the initialization Gaussian, or one based on the expected final L2 norm? What about cases where the norm peaks in the middle of training, and is smaller at the start and finish? If we have an L2 regularization term in the loss, then the infinite volume problem usually goes away; the L2 term makes all the eigenvalues positive, so the formula is fine. If we have weight decay, we can interpret this as L2 regularization and add it to the loss. 
For a relatively simple approximation, I recommend the formula, where: Loss is the unregularized loss; λ is the amount of weight decay (or L2 regularization (1/2)λθ²); c = k/σ², where σ is the standard deviation of the initialization Gaussian and k is a constant on the order of unity (I have not calculated the theoretically most appropriate value of k; for a crude model, k = 1 is probably good enough); T is the loss threshold. If you really care about...
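Under the paraboloid model described above, the basin volume follows directly from the Hessian eigenvalues: V_basin = V_n (2T)^(n/2) / √(det H). A minimal numerical sketch of that formula, assuming a strictly positive-definite Hessian H and threshold T (function and variable names are mine):

```python
import numpy as np
from scipy.special import gammaln

def paraboloid_basin_volume(H, T):
    """Volume of {theta : 0.5 * theta^T H theta < T} for positive-definite H.

    The ellipsoid radius along eigenvector i is sqrt(2*T / lambda_i), so
    V = V_n * prod_i sqrt(2*T / lambda_i) = V_n * (2*T)^(n/2) / sqrt(det H),
    where V_n is the volume of the unit n-ball.
    """
    eigvals = np.linalg.eigvalsh(H)
    assert np.all(eigvals > 0), "model breaks down with zero/negative eigenvalues"
    n = len(eigvals)
    log_unit_ball = (n / 2) * np.log(np.pi) - gammaln(n / 2 + 1)  # log V_n
    log_vol = log_unit_ball + (n / 2) * np.log(2 * T) - 0.5 * np.sum(np.log(eigvals))
    return np.exp(log_vol)

# Tiny example: a 3-parameter "loss" with curvatures 1, 4, and 9.
H = np.diag([1.0, 4.0, 9.0])
print(paraboloid_basin_volume(H, T=0.1))
```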
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [MLSN #3]: NeurIPS Safety Paper Roundup, published by Dan Hendrycks on March 8, 2022 on The AI Alignment Forum. As part of a larger community-building effort, I am writing a safety newsletter which is designed to cover empirical safety research and be palatable to the broader machine learning research community. You can subscribe here or follow the newsletter on Twitter here. Welcome to the 3rd issue of the ML Safety Newsletter. In this edition, we cover: NeurIPS ML safety papers; experiments showing that Transformers have no edge for adversarial robustness and anomaly detection; a new method leveraging fractals to improve various reliability metrics; a preference learning benchmark; ... and much more. Robustness Are Transformers More Robust Than CNNs? This paper evaluates the distribution shift robustness and adversarial robustness of ConvNets and Vision Transformers (ViTs). Compared with previous papers, its evaluations are more fair and careful. After controlling for data augmentation, they find that Transformers exhibit greater distribution shift robustness. For adversarial robustness, findings are more nuanced. First, ViTs are far more difficult to adversarially train. When successfully adversarially trained, ViTs are more robust than off-the-shelf ConvNets. However, ViTs' higher adversarial robustness is explained by their smooth activation function, the GELU. If ConvNets use GELUs, they obtain similar adversarial robustness. Consequently, Vision Transformers are more robust than ConvNets to distribution shift, but they are not intrinsically more adversarially robust. Fractals Improve Robustness (+ Other Reliability Metrics) PixMix improves both robustness (corruptions, adversaries, prediction consistency) and uncertainty estimation (calibration, anomaly detection). PixMix is a data augmentation strategy that mixes training examples with fractals or feature visualizations; models then learn to classify these augmented examples. Whereas previous methods sacrifice performance on some reliability axes for improvements on others, this is the first to have no major reliability tradeoffs and is near Pareto-optimal. Other Recent Robustness Papers A new adversarial robustness state-of-the-art by finding a better way to leverage data augmentations. A highly effective gradient-based adversarial attack for text-based models. A new benchmark for detecting adversarial text attacks. Adversarially attacking language models with bidirectional and large-scale unidirectional language models. First works on certified robustness under distribution shift: [1], [2], [3]. A dataset where in-distribution accuracy is negatively correlated with out-of-distribution robustness. Improving performance in tail events by augmenting prediction pipelines with retrieval. A set of new, more realistic 3D common corruptions. Multimodality can dramatically improve robustness. Monitoring Synthesizing Outliers for Out-of-Distribution Detection The authors model the hidden feature representations of in-distribution examples as class-conditional Gaussians, and they sample virtual outliers from the low-likelihood region. The model is trained to separate in-distribution examples from virtual outliers. A path towards better out-of-distribution (OOD) detection is through generating diverse and unusual examples. 
As a step in that direction, this paper proposes to generate hidden representations or “virtual” examples that are outliers, rather than generate raw inputs that are outliers. The method is evaluated on many object detection and classification tasks, and it works well. It is not evaluated on the more difficult setting where anomalies are held-out classes from similar data generating processes. If the authors evaluated their CIFAR-10 model's ability to detect CIFAR-100 anomalies, then we would have more of...
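The virtual-outlier idea summarized here reduces to: fit a Gaussian to a class's hidden features, then keep samples from its low-likelihood region. The snippet below is a toy illustration of that sampling step only, not the paper's training pipeline; the synthetic features, the 2% quantile threshold, and all names are assumptions for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Pretend these are penultimate-layer features for one class.
features = rng.normal(loc=2.0, scale=1.0, size=(5000, 16))

# Fit a class-conditional Gaussian to the features.
mu = features.mean(axis=0)
cov = np.cov(features, rowvar=False) + 1e-4 * np.eye(16)
gauss = multivariate_normal(mean=mu, cov=cov)

# Draw candidates from the fitted Gaussian and keep only the
# low-likelihood ones (e.g. bottom 2%) as "virtual outliers".
candidates = gauss.rvs(size=20_000, random_state=0)
logp = gauss.logpdf(candidates)
threshold = np.quantile(logp, 0.02)
virtual_outliers = candidates[logp <= threshold]
print(virtual_outliers.shape)  # these would feed an outlier-vs-inlier training loss
```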
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How long does it take to become Gaussian?, published by Maxwell Peterson on the AI Alignment Forum. The central limit theorems all say that if you convolve stuff enough, and that stuff is sufficiently nice, the result will be a Gaussian distribution. How much is enough, and how nice is sufficient? Identically-distributed distributions converge quickly. For many distributions d, the repeated convolution d ∗ d ∗ ⋯ ∗ d looks Gaussian. The number of convolutions you need to look Gaussian depends on the shape of d. This is the easiest variant of the central limit theorem: identically-distributed distributions. The uniform distribution converges real quick: the result of uniform(1, 2) ∗ uniform(1, 2) ∗ ⋯ ∗ uniform(1, 2), with 30 distributions total. This plot is an animated version of the plots in the previous post. The black curve is the Gaussian distribution with the same mean and variance as the red distribution. The more similar red is to black, the more Gaussian the result of the convolutions is. The numbers on the x axis are increasing because the mean of f ∗ g is the sum of the means of f and g, so if we start with positive means, repeated convolutions shoot off into higher numbers. Similar for the variance - notice how the width starts as the difference between 1 and 2, but ends with differences in the tens. You can keep the location stationary under convolution by starting with a distribution centered at 0, but you can't keep the variance from increasing, because you can't have a variance of 0 (except in the limiting case). Here's a more skewed distribution: beta(50, 1). beta(50, 1) is the probability distribution that represents knowing that a lake has bass and carp, but not how many of each, and then catching 49 bass in a row. It's fairly skewed! This time, after 30 convolutions, we're not quite Gaussian - the skew is still hanging around. But for a lot of real applications, I'd call the result "Gaussian enough". beta(50, 1) convolved with itself 30 times. A similar skew in the opposite direction, from the exponential distribution: exp(20). I was surprised to see the exponential distribution go into a Gaussian, because Wikipedia says that an exponential distribution with parameter θ goes into a gamma(n, θ) distribution when you convolve it with itself n times. But it turns out gamma(n, θ) looks more and more Gaussian as n goes up. How about our ugly bimodal-uniform distribution? It starts out rough and jagged, but already by 30 convolutions it's Gaussian. And here's what it looks like to start with a Gaussian: the red curve starts out the exact same as the black curve, then nothing happens because Gaussians stay Gaussian under self-convolution. An easier way to measure Gaussianness (Gaussianity?): We're going to want to look at many more distributions under n convolutions and see how close they are to Gaussian, and these animations take a lot of space. We need a more compact way. So let's measure the kurtosis of the distributions, instead. The kurtosis is the standardized fourth moment of a probability distribution; it describes the shape of the tails. All Gaussian distributions have kurtosis 3. There are other distributions with kurtosis 3, too, but they're not likely to be the result of a series of convolutions. So to check how close a distribution is to Gaussian, we can just check how far from 3 its kurtosis is. 
We can chart the kurtosis as a function of how many convolutions have been done so far, for each of the five distributions above: We see our conclusions from the animations repeated: the exp(20), being very skewed, is the furthest from Gaussian after 30 convolutions. beta(50, 1), also skewed, is also relatively far (though close in absolute terms). The bimodal and uniform got to Gaussian much faster, in the animations, and we see that refle...
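The kurtosis-tracking experiment described above is easy to reproduce: discretize a density, convolve it with itself repeatedly, and watch the kurtosis approach the Gaussian value of 3. A sketch for the uniform(1, 2) case (grid resolution and the printed checkpoints are my choices, not the post's):

```python
import numpy as np

def kurtosis_of_density(x, p):
    """Kurtosis of a discretized density p on a uniform grid x (3 for a Gaussian)."""
    w = p / p.sum()                      # normalize to a probability mass function
    mean = np.sum(x * w)
    var = np.sum((x - mean) ** 2 * w)
    fourth = np.sum((x - mean) ** 4 * w)
    return fourth / var**2

dx = 0.001
x = np.arange(1.0, 2.0, dx)
p = np.ones_like(x)                      # uniform(1, 2) density on the grid

density = p.copy()
for n in range(2, 31):
    density = np.convolve(density, p) * dx          # one more self-convolution
    grid = np.arange(len(density)) * dx + n * 1.0   # support of the n-fold sum starts at n
    if n in (2, 5, 10, 30):
        print(n, "convolutions, kurtosis =", round(kurtosis_of_density(grid, density), 4))
```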
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Recent Progress in the Theory of Neural Networks, published by interstice on the AI Alignment Forum. It's common wisdom that neural networks are basically "matrix multiplications that nobody understands", impenetrable to theoretical analysis, which have achieved great results largely through trial-and-error. While this may have been true in the past, recently there has been significant progress towards developing a theoretical understanding of neural networks. Most notably, we have obtained an arguably complete understanding of network initialization and training dynamics in a certain infinite-width limit. There has also been some progress towards understanding their generalization behavior. In this post I will review some of this recent progress and discuss the potential relevance to AI alignment. Infinite Width Nets: Initialization. The most exciting recent developments in the theory of neural networks have focused on the infinite-width limit. We consider neural networks where the number of neurons in all hidden layers is increased to infinity. Typically we consider networks with Gaussian-initialized weights, and scale the variance at initialization as 1/√H, where H is the number of hidden units in the preceding layer (this is needed to avoid inputs blowing up, and is also the initialization scheme usually used in real networks). In this limit, we have obtained an essentially complete understanding of both behavior at initialization and training dynamics[1]. (Those with limited interest/knowledge of math may wish to skip to "Significance and Limitations" below.) We've actually had a pretty good understanding of the behavior of infinite-width neural networks at initialization for a while, since the work of Radford Neal (1994). He proved that in this limit, fully-connected neural networks with Gaussian-distributed weights and biases limit to what are known as Gaussian processes. Gaussian processes can be thought of as the generalization of Gaussian distributions from finite-dimensional spaces to spaces of functions. Neal's paper provides a very clear derivation of this behavior, but I'll explain it briefly here. A neural network with m real-valued inputs and 1 real-valued output defines a function from R^m to R. Thus, a distribution over the weights and biases of such a neural network -- such as the standard Gaussian initialization -- implicitly defines a distribution over functions on R^m. Neal's paper shows that, for fully-connected neural networks, this distribution limits to a Gaussian process. What is a Gaussian process? It's a distribution over functions f with the property that, for any finite collection of points X_1, …, X_N, the values f(X_1), …, f(X_N) have a joint distribution which is a multivariate Gaussian. Any Gaussian process is uniquely defined by its mean and covariance functions, μ(x) and C(x, x′). For points X_1, …, X_N, the distribution of (f(X_1), …, f(X_N)) will have mean (μ(X_1), …, μ(X_N)) with covariance matrix C_ij = C(X_i, X_j). The argument that fully-connected neural networks limit to Gaussian processes in the infinite-width limit is pretty simple. Consider a three-layer neural network, with an activation function σ in the second layer and a single linear output unit. This network can be defined by the equation y = Σ_k V_k σ(Σ_j W_kj X_j). 
At initialization, V and W are filled with independent Gaussians, with the variance of V scaled as the inverse square root of the number of hidden units. Each hidden unit h_k has a value for each of the inputs X_i: h_k(X_i) = σ(Σ_j W_kj (X_i)_j). Since W is random, for each k, h_k(X) is an independent random vector (where we write X for the collection (X_1, …, X_N)). All of these random vectors follow the same distribution, and the output y = f(X) of the network is simply the sum of these identically distributed random vectors multiplied by the univariate Gaussians V_k. By the multidi...
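Neal's limiting behavior can also be seen empirically: sample many random two-layer networks of increasing width and the output at a fixed input looks increasingly Gaussian. The sketch below does this with a ReLU activation and kurtosis as the Gaussianity check; the width schedule, activation, and sample sizes are my choices for illustration.

```python
import numpy as np

def random_net_outputs(X, hidden_width, n_nets, rng):
    """Outputs of n_nets random two-layer nets y = sum_k V_k * relu(sum_j W_kj x_j),
    with V scaled by 1/sqrt(hidden_width) so the output variance stays finite."""
    W = rng.standard_normal((n_nets, hidden_width, X.shape[1]))
    V = rng.standard_normal((n_nets, hidden_width)) / np.sqrt(hidden_width)
    H = np.maximum(0.0, np.einsum("nkj,pj->npk", W, X))   # hidden activations
    return np.einsum("npk,nk->np", H, V)                   # (n_nets, n_points) outputs

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))          # three fixed inputs in R^5
for width in (1, 10, 300):
    y = random_net_outputs(X, width, n_nets=20_000, rng=rng)[:, 0]
    kurt = np.mean((y - y.mean()) ** 4) / y.var() ** 2     # 3 for a Gaussian
    print(f"width {width:4d}: kurtosis of f(X_1) across nets = {kurt:.2f}")
```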
#tvae #topographic #equivariant Variational Autoencoders model the latent space as a set of independent Gaussian random variables, which the decoder maps to a data distribution. However, this independence is not always desired, for example when dealing with video sequences, we know that successive frames are heavily correlated. Thus, any latent space dealing with such data should reflect this in its structure. Topographic VAEs are a framework for defining correlation structures among the latent variables and induce equivariance within the resulting model. This paper shows how such correlation structures can be built by correctly arranging higher-level variables, which are themselves independent Gaussians. OUTLINE: 0:00 - Intro 1:40 - Architecture Overview 6:30 - Comparison to regular VAEs 8:35 - Generative Mechanism Formulation 11:45 - Non-Gaussian Latent Space 17:30 - Topographic Product of Student-t 21:15 - Introducing Temporal Coherence 24:50 - Topographic VAE 27:50 - Experimental Results 31:15 - Conclusion & Comments Paper: https://arxiv.org/abs/2109.01394 Code: https://github.com/akandykeller/topog... Abstract: In this work we seek to bridge the concepts of topographic organization and equivariance in neural networks. To accomplish this, we introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables. We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST. Furthermore, through topographic organization over time (i.e. temporal coherence), we demonstrate how predefined latent space transformation operators can be encouraged for observed transformed input sequences -- a primitive form of unsupervised learned equivariance. We demonstrate that this model successfully learns sets of approximately equivariant features (i.e. "capsules") directly from sequences and achieves higher likelihood on correspondingly transforming test sequences. Equivariance is verified quantitatively by measuring the approximate commutativity of the inference network and the sequence transformations. Finally, we demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks. Authors: T. Anderson Keller, Max Welling Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
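Returning to the modeling idea in the description: correlated latents can indeed be assembled from independent Gaussians by dividing one Gaussian by the pooled energy of a neighborhood of others, in the spirit of the product-of-Student-t construction the video discusses. The snippet below is a toy version with an invented ring-shaped neighborhood matrix, not the paper's model or code.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_samples = 10, 200_000

# Ring-shaped neighborhoods: variable i pools the squared "u" Gaussians
# of positions i-2 .. i+2 (with wrap-around).
W = np.zeros((dim, dim))
for i in range(dim):
    for j in range(i - 2, i + 3):
        W[i, j % dim] = 1.0

z = rng.standard_normal((n_samples, dim))   # independent Gaussians (numerators)
u = rng.standard_normal((n_samples, dim))   # independent Gaussians (pooled energies)
t = z / np.sqrt(u**2 @ W.T)                 # heavy-tailed, topographically organized

# The t_i themselves are uncorrelated, but their energies t_i^2 correlate
# for nearby units and not for distant ones (the topographic structure).
corr = np.corrcoef(t**2, rowvar=False)
print(np.round(corr[0], 2))
```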
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.03.281162v1?rss=1 Authors: Invernizzi, A., Haak, K. V., Carvalho, J., Renken, R., Cornelissen, F. Abstract: The majority of neurons in the human brain process signals from neurons elsewhere in the brain. Connective Field (CF) modeling is a biologically-grounded method to describe this essential aspect of the brain's circuitry. It allows characterizing the response of a population of neurons in terms of the activity in another part of the brain. CF modeling translates the concept of the receptive field (RF) into the domain of connectivity by assessing the spatial dependency between signals in distinct cortical visual field areas. Standard CF model estimation has some intrinsic limitations in that it cannot estimate the uncertainty associated with each of its parameters. Obtaining the uncertainty will allow identification of model biases, e.g. related to over- or under-fitting or a co-dependence of parameters, thereby improving the CF prediction. To enable this, here we present a Bayesian framework for the CF model. Using a Markov Chain Monte Carlo (MCMC) approach, we estimate the underlying posterior distribution of the CF parameters and consequently quantify the uncertainty associated with each estimate. We applied the method and its new Bayesian features to characterize the cortical circuitry of the early human visual cortex of 12 healthy participants that were assessed using 3T fMRI. In addition, we show how the MCMC approach enables the use of effect size (beta) as a data-driven parameter to retain relevant voxels for further analysis. Finally, we demonstrate how our new method can be used to compare different CF models. Our results show that single Gaussian models are favoured over differences of Gaussians (i.e. center-surround) models, suggesting that the cortico-cortical connections of the early visual system do not possess center-surround organisation. We conclude that our new Bayesian CF framework provides a comprehensive tool to improve our fundamental understanding of the human cortical circuitry in health and disease. Copyrights belong to the original authors. Visit the link for more info
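To make the MCMC idea concrete outside the neuroimaging setting, here is a generic random-walk Metropolis sketch that recovers a posterior (and hence per-parameter uncertainty) for the center and size of a Gaussian-shaped response profile; the toy data, flat prior, and proposal scale are invented for illustration and are not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a response profile that is (noisily) a Gaussian bump over "positions".
positions = np.linspace(-5, 5, 60)
true_center, true_size = 1.0, 0.8
signal = np.exp(-0.5 * ((positions - true_center) / true_size) ** 2)
data = signal + 0.1 * rng.standard_normal(positions.shape)

def log_posterior(center, size, noise_sd=0.1):
    if size <= 0:
        return -np.inf                      # flat prior restricted to size > 0
    model = np.exp(-0.5 * ((positions - center) / size) ** 2)
    return -0.5 * np.sum((data - model) ** 2) / noise_sd**2

# Random-walk Metropolis over (center, size).
theta = np.array([0.0, 1.0])
logp = log_posterior(*theta)
samples = []
for _ in range(20_000):
    proposal = theta + 0.05 * rng.standard_normal(2)
    logp_prop = log_posterior(*proposal)
    if np.log(rng.uniform()) < logp_prop - logp:
        theta, logp = proposal, logp_prop
    samples.append(theta.copy())

samples = np.array(samples[5_000:])          # drop burn-in
print("posterior mean:", samples.mean(axis=0))
print("posterior sd  :", samples.std(axis=0))   # the per-parameter uncertainty
```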
In part two of our conversation on what counts as an explanation in science, we pickup with special guest David Barack giving his thoughts on the "model–mechanism–mapping" criteria for explanation. This leads us into a lengthy discussion on explanatory versus phenomenological (or "descriptive") models. We ask if there truly is a distinction between these model classes or if a sufficiently good description will end up being explanatory. We illustrate these points with examples such as the Nernst equation, the Hodgkin-Huxley model of the action potential, and multiple uses of Difference of Gaussians in neuroscience. Throughout, we ask such burning questions as: can a model be explanatory if the people who made it thought it wasn't? are diagrams explanations? and, is gravity descriptive or mechanistic?
StatLearn 2012 - Workshop on "Challenging problems in Statistical Learning"
Consider the usual regression problem in which we want to study the conditional distribution of a response Y given a set of predictors X. Sufficient dimension reduction (SDR) methods aim at replacing the high-dimensional vector of predictors by a lower-dimensional function R(X) with no loss of information about the dependence of the response variable on the predictors. Almost all SDR methods restrict attention to the class of linear reductions, which can be represented in terms of the projection of X onto a dimension-reduction subspace (DRS). Several methods have been proposed to estimate the basis of the DRS, such as sliced inverse regression (SIR; Li, 1991), principal Hessian directions (PHD; Li, 1992), sliced average variance estimation (SAVE; Cook and Weisberg, 1991), directional regression (DR; Li et al., 2005) and inverse regression estimation (IRE; Cook and Ni, 2005). A novel SDR method, called MSIR, based on finite mixtures of Gaussians, has recently been proposed (Scrucca, 2011) as an extension to SIR. The talk will present the MSIR methodology and some recent advances. In particular, a BIC criterion for selecting the dimensionality of the DRS will be introduced, along with its extension for the purpose of variable selection. Finally, the application of MSIR in classification problems, both supervised and semi-supervised, will be discussed.
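For readers unfamiliar with the baseline that MSIR extends, here is a compact sketch of plain SIR: standardize X, average it within slices of the response, and take leading eigenvectors of the between-slice covariance as an estimated basis of the DRS. This is a generic illustration with invented toy data, not Scrucca's MSIR code.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Sliced inverse regression: estimate a basis of the dimension-reduction subspace."""
    n, p = X.shape
    # Standardize the predictors.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Z = (X - mean) @ L                                 # whitened predictors
    # Slice on the response and average Z within each slice.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)           # weighted between-slice covariance
    # Leading eigenvectors of M, mapped back to the original X scale.
    eigvals, eigvecs = np.linalg.eigh(M)
    directions = L @ eigvecs[:, ::-1][:, :n_directions]
    return directions / np.linalg.norm(directions, axis=0)

# Toy check: Y depends on X only through X @ beta.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 6))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(5000)
print(np.round(sir_directions(X, y).ravel(), 2))   # should align with +-beta
```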
Lecture by Professor Andrew Ng for Machine Learning (CS 229) in the Stanford Computer Science department. Professor Ng discusses unsupervised learning in the context of clustering, Jensen's inequality, mixture of Gaussians, and expectation-maximization.
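For reference, the EM updates for a mixture of Gaussians covered in the lecture fit in a few lines. A minimal 1D sketch; the synthetic data, initialization, and fixed iteration count are my choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 700)])

# Initialize mixture weights, means, and standard deviations.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sd = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: responsibilities r[i, k] = P(component k | x_i).
    dens = np.stack([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sd)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(np.round(pi, 2), np.round(mu, 2), np.round(sd, 2))
```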
Mathematics, Informatics and Statistics - Open Access LMU - Part 01/03
We consider sequential or online learning in dynamic neural regression models. By using a state space representation for the neural network's parameter evolution in time, we obtain approximations to the unknown posterior by either deriving posterior modes via the Fisher scoring algorithm or by deriving approximate posterior means with the importance sampling method. Furthermore, we replace the commonly used Gaussian noise assumption in the neural regression model by a more flexible noise model based on the Student t-density. Since the t-density can be interpreted as being an infinite mixture of Gaussians, hyperparameters such as the degrees of freedom of the t-density can be learned from the data based on an online EM-type algorithm. We show experimentally that our novel methods outperform state-of-the-art neural network online learning algorithms like the extended Kalman filter method, both in situations with standard Gaussian noise terms and in situations with measurement outliers.
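The scale-mixture interpretation mentioned in the abstract can be demonstrated directly: draw a Gamma-distributed precision per observation, then a Gaussian with that precision, and the marginal is a Student t-density with the corresponding degrees of freedom. A sketch (parameter names and the KS check are my choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 4.0, 200_000                       # degrees of freedom, sample size

# Scale-mixture construction: precision ~ Gamma(nu/2, rate=nu/2), then a Gaussian.
precision = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
samples = rng.normal(0.0, 1.0 / np.sqrt(precision))

# The marginal should match a Student-t with nu degrees of freedom.
ks = stats.kstest(samples, stats.t(df=nu).cdf)
print(f"KS statistic vs t(df={nu}): {ks.statistic:.4f}")   # small value -> good match
```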