Ludwig-Maximilians-Universität München
The generalized additive model is a well-established and powerful tool for modeling smooth effects of predictors on the response. However, if the link function, which is typically chosen as the canonical link, is misspecified, substantial bias is to be expected. A procedure is proposed that simultaneously estimates the form of the link function and the unknown form of the predictor functions, including selection of predictors. The procedure is based on boosting methodology, which obtains estimates by using a sequence of weak learners. It strongly dominates fitting procedures that are unable to modify a given link function if the true link function deviates from the fixed one. The performance of the procedure is shown in simulation studies and illustrated by a real-world example.
In this paper, individual differences scaling (INDSCAL) is revisited, considering INDSCAL as being embedded within a hierarchy of individual difference scaling models. We explore the members of this family, distinguishing (i) models, (ii) the role of identification and substantive constraints, (iii) criteria for fitting models and (iv) algorithms to optimise the criteria. Model formulations may be based either on data that are in the form of proximities or on configurational matrices. In its configurational version, individual difference scaling may be formulated as a form of generalized Procrustes analysis. Algorithms are introduced for fitting the new models. An application from sensory evaluation illustrates the performance of the methods and their solutions.
We discuss two-sample global permutation tests for sets of multivariate ordinal data in possibly high-dimensional setups, motivated by the analysis of data collected by means of the World Health Organisation's International Classification of Functioning, Disability and Health. The tests do not require any modelling of the multivariate dependence structure. Specifically, we consider testing for marginal inhomogeneity and direction-independent marginal order. Max-T test statistics are known to lead to good power against alternatives with few strong individual effects. We propose test statistics that can be seen as their counterparts for alternatives with many weak individual effects. Permutation tests are valid only if the two multivariate distributions are identical under the null hypothesis. By means of simulations, we examine the practical impact of violations of this exchangeability condition. Our simulations suggest that theoretically invalid permutation tests can still be 'practically valid'. In particular, they suggest that the degree of the permutation procedure's failure may be considered as a function of the difference in group-specific covariance matrices, the ratio of the group sizes, the number of variables in the set, the test statistic used, and the number of levels per variable.
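For orientation only (a generic illustration of the two aggregation schemes, not necessarily the exact statistics proposed in the paper): with standardized per-variable two-sample statistics $T_1, \dots, T_p$, a max-type statistic aggregates them by their maximum, whereas a sum-type counterpart accumulates evidence across all variables,
\[ T_{\max} = \max_{j = 1, \dots, p} |T_j|, \qquad T_{\mathrm{sum}} = \sum_{j = 1}^{p} T_j^2, \]
so that many small effects, none of which is individually striking, can still produce a large value of the sum-type statistic.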
In linear mixed models, the assumption of normally distributed random effects is often inappropriate and unnecessarily restrictive. The proposed approximate Dirichlet process mixture assumes a hierarchical Gaussian mixture that is based on the truncated version of the stick-breaking representation of the Dirichlet process. In addition to weakening the distributional assumptions, the specification makes it possible to identify clusters of observations with a similar random effects structure. An Expectation-Maximization algorithm is given that solves the estimation problem and that, in certain respects, may exhibit advantages over Markov chain Monte Carlo approaches when modelling with Dirichlet processes. The method is evaluated in a simulation study and applied to the dynamics of unemployment in Germany as well as lung function growth data.
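For reference, the truncated stick-breaking representation used for such approximations can be written as (notation assumed here; the paper's exact specification may differ in details)
\[ G = \sum_{k=1}^{K} \pi_k\, \delta_{\theta_k}, \qquad \pi_k = V_k \prod_{l < k} (1 - V_l), \qquad V_k \sim \mathrm{Beta}(1, \alpha) \ \text{for } k < K, \quad V_K = 1, \]
with atoms $\theta_k$ drawn independently from the base measure $G_0$; setting $V_K = 1$ guarantees that the truncated weights sum to one.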
Variable selection has been suggested for Random Forests to improve their efficiency in data prediction and interpretation. However, its basic element, the variable importance measure, cannot be computed straightforwardly when there is missing data. Therefore, an extensive simulation study has been conducted to explore possible solutions (multiple imputation, complete case analysis and a newly suggested importance measure) for several missing data generating processes. The ability to distinguish relevant from non-relevant variables has been investigated for these procedures in combination with two popular variable selection methods. Findings and recommendations: complete case analysis should not be applied, as it led to inaccurate variable selection and to models with the worst prediction accuracy. Multiple imputation is a good means of selecting variables that would be of relevance in fully observed data, and it produced the best prediction accuracy. By contrast, the application of the new importance measure yields a selection of variables that reflects the actual data situation, i.e. one that takes the occurrence of missing values into account. Its prediction error was only negligibly worse than that of imputation.
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21739/1/MA_Bitterlich.pdf Bitterlich, Manuela ddc:500, Ausgewählte Abschlussarbeiten, Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21729/1/BA_Zeis.pdf Zeis, Klara ddc:500, ddc:310, Ausgewählte Abschlussarbeiten, Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21731/1/BA_Huber_Cynt.pdf Huber, Cynthia ddc:500, Ausgewählte Abschlussarbeiten, Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21736/1/BA_Hoelzl.pdf Hölzl, Andreas ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21746/1/MA_Casalicchio.pdf Casalicchio, Giuseppe ddc:500, Ausgewählte Abschlussarbeiten, Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21742/1/MA_Ernst.pdf Ernst, Dominik ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21455/1/BA_Hummrich.pdf Hummrich, Katrin
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21732/1/MA_Berger.pdf Berger, Moritz ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21738/1/MA_Manuilova.pdf Manuilova, Ekaterina ddc:500, Ausgewählte Abschlussarbeiten
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21456/1/MA_Obst.pdf Obst, Ronert ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21730/1/MA_Meingast.pdf Meingast, Maximilian ddc:500, Ausgewählte Abschlussarbeiten, Statistik
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21735/1/BA_Wenzler.pdf Wenzler, Germaine ddc:500, Ausgewählte Abschlussarbeiten
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21452/1/MA_Poppe.pdf Poppe, Melanie ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
This short note contains an explicit proof of the Dirichlet distribution being the conjugate prior to the Multinomial sample distribution as resulting from the general construction method described, e.g., in Bernardo and Smith (2000). The well-known Dirichlet-Multinomial model is thus shown to fit into the framework of canonical conjugate analysis (Bernardo and Smith 2000, Prop.~5.6, p.~273), where the update step for the prior parameters to their posterior counterparts has an especially simple structure. This structure is used, e.g., in the Imprecise Dirichlet Model (IDM) by Walley (1996), a simple yet powerful model for imprecise Bayesian inference using sets of Dirichlet priors to model vague prior knowledge, and furthermore in other imprecise probability models for inference in exponential families where sets of priors are considered.
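For concreteness, the especially simple update structure is the familiar Dirichlet-Multinomial one: with a $\mathrm{Dir}(\alpha_1, \dots, \alpha_k)$ prior on the category probabilities $\theta_1, \dots, \theta_k$ and observed counts $n_1, \dots, n_k$ from a Multinomial sample,
\[ (\theta_1, \dots, \theta_k) \mid n_1, \dots, n_k \ \sim\ \mathrm{Dir}(\alpha_1 + n_1, \dots, \alpha_k + n_k), \]
i.e. the posterior parameters are obtained by simply adding the observed counts to the prior parameters; in the IDM this update is applied to every Dirichlet prior in the set.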
Mon, 9 Jul 2012 12:00:00 +0100 http://scitation.aip.org/content/aip/journal/jmp/53/9/10.1063/1.4728982 https://epub.ub.uni-muenchen.de/16210/1/Siedentop_16210.pdf Siedentop, Heinz; Maier, Thomas ddc:530, ddc:510, Mathematik, Informatik und Statistik
The use of the multinomial logit model is typically restricted to applications with few predictors, because in high-dimensional settings maximum likelihood estimates tend to deteriorate. In this paper we propose a sparsity-inducing penalty that accounts for the special structure of multinomial models. In contrast to existing methods, it penalizes the parameters that are linked to one variable in a grouped way and thus yields variable selection instead of parameter selection. We develop a proximal gradient method that efficiently computes stable estimates. In addition, the penalization is extended to the important case of predictors that vary across response categories. We apply our estimator to the modeling of party choice of voters in Germany, including voter-specific variables such as age and gender as well as party-specific features such as stance on nuclear energy and immigration.
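A sketch of the grouping idea (notation assumed here, not the paper's exact penalty): with $K$ response categories, reference category $K$, and $\beta_{jr}$ the coefficient of predictor $j$ in category $r$, all parameters belonging to the same predictor are collected in one group and penalized jointly,
\[ J(\beta) = \sum_{j=1}^{p} \sqrt{\beta_{j1}^2 + \dots + \beta_{j,K-1}^2}, \]
possibly with group-size weights, so that the whole coefficient group of a predictor is either shrunk to zero jointly (the predictor is removed from the model) or retained, which yields variable rather than parameter selection.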
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects, a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, an EM algorithm-based approach is considered that uses a penalized normal mixture as the random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter, a new model choice criterion is proposed.
A novel point process model, continuous in space-time, is proposed for quantifying the transmission dynamics of the two most common meningococcal antigenic sequence types observed in Germany in 2002-2008. Modelling is based on the conditional intensity function (CIF), which is described by a superposition of additive and multiplicative components. As an epidemiologically interesting finding, spread behaviour was shown to depend on type in addition to age: basic reproduction numbers were 0.25 (95% CI 0.19-0.34) and 0.11 (95% CI 0.07-0.17) for types B:P1.7-2,4:F1-5 and C:P1.5,2:F3-3, respectively. Altogether, the proposed methodology represents a comprehensive and universal regression framework for the modelling, simulation and inference of self-exciting spatio-temporal point processes based on the CIF. Usability of the modelling in biometric practice is promoted by an implementation in the R package surveillance.
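As a rough sketch of the model class (the notation here is illustrative and omits covariates and details of the paper's specification), the conditional intensity of such a self-exciting spatio-temporal point process combines an additive endemic component and a multiplicative epidemic component triggered by previous cases,
\[ \lambda(s, t) = \nu(s, t) + \sum_{j:\, t_j < t} \eta_j \, f(\lVert s - s_j \rVert) \, g(t - t_j), \]
where $\nu(s, t)$ is the endemic background rate, the sum runs over earlier cases $j$ with locations $s_j$ and times $t_j$, $\eta_j$ is a case-specific infectivity given by a log-linear predictor, and $f$ and $g$ are spatial and temporal interaction kernels.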
Sun, 1 Jan 2012 12:00:00 +0100 https://epub.ub.uni-muenchen.de/25521/1/MA_Fink_Paul.pdf Fink, Paul ddc:500, Ausgewählte Abschlussarbeiten, Statistik
Security frameworks are modular, initially abstract concepts that bundle coordinated technical and organizational measures for the prevention, detection and handling of information security incidents. Unlike assembling one's own security concepts from a multitude of isolated individual measures, the application of security frameworks aims at drawing on proven solution approaches for securing complex IT services and IT architectures with comparatively little effort. Putting a security framework into practice requires its scenario-specific adaptation and implementation, which in particular must ensure seamless integration into the existing infrastructure and lay the foundation for sustainable, efficient operation. This thesis deals with the integrated management of security frameworks. At the core of its considerations are therefore not individual framework concepts, but management methods, processes and tools for the parallel use of several framework instances in complex organization-wide and cross-organizational scenarios. Its focal points are motivated on the one hand by the currently very technical character of many security frameworks and on the other hand by the lack of consideration of their life cycle beyond the scenario-specific adaptation. So far, both aspects have had an inhibiting effect on practical deployment, since implementing security frameworks still requires considerable scenario-specific conceptual effort. After a discussion of the relevant foundations of security management and the positioning of security frameworks within information security management systems, more than 50 requirements for security frameworks are derived from the perspective of their management on the basis of selected concrete scenarios, and weighted with justification. The subsequent application of this catalogue of requirements to more than 75 current security frameworks reveals typical strengths and weaknesses and motivates, in addition to concrete suggestions for improving framework concepts, the management methods specific to security frameworks that are developed subsequently. A detailed analysis of the entire life cycle of security frameworks serves as the frame of reference for all of the thesis's own concepts; it is used for the fundamental specification of management tasks, responsibilities and interfaces to other management processes. Building on this, methods and processes adapted to the use of security frameworks are specified, among others for risk management and selected disciplines of operational security management; a security management architecture for security frameworks is designed; the process interfaces are elaborated comprehensively using ISO/IEC 27001 and ITIL v3 as examples; and the use of IT security metrics for assessing security frameworks is demonstrated. The practical application of these innovative methods requires dedicated management tools, which are then designed in detail, realized as prototypes or simulations, exemplified and evaluated. A comprehensive application example demonstrates the practical, parallel use of several security frameworks together with the specified concepts and tools.
Finally, all results achieved are critically assessed, and an outlook on possible further developments and open research questions in related areas is given.
The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of the PAUC, there exist only a few methods that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.
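For concreteness, writing $\mathrm{ROC}(u)$ for the true positive rate as a function of the false positive rate $u$, the partial area over a pre-specified range $[0, t_0]$ of false positive rates is
\[ \mathrm{PAUC}(t_0) = \int_0^{t_0} \mathrm{ROC}(u)\, du, \]
which is sometimes standardized by its maximal possible value $t_0$.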
Sun, 1 Jan 2012 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21743/1/MA_Stuckart.pdf Stuckart, Claudia ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Sun, 1 Jan 2012 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21856/1/DA_Bothmann.pdf Bothmann, Ludwig ddc:500, Ausgewählte Abschlussarbeiten, Mathematik, Informatik und Statistik
Sun, 1 Jan 2012 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21744/1/MA_Lindenlaub.pdf Lindenlaub, Christian ddc:500, Ausgewählte Abschlussarbeiten, Statistik, Mathematik, Informatik und Statistik
Sat, 1 Jan 2011 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21857/1/DA_Kuehnle.pdf Kühnle, Oliver ddc:500, Ausgewählte Abschlussarbeiten, Mathematik, Informatik und Statistik
Investigating differences between means of more than two groups or experimental conditions is a routine research question addressed in biology. In order to assess differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research, and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators, we propose a new statistical multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. The practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the "Evolution Canyons" I and II in Israel. The simulation results show that even under severely varying variances, the procedure controls the number of false positive findings very well. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality and heteroscedasticity.
The case of continuous effect modifiers in varying-coefficient models has been well investigated. Categorical effect modifiers, however, have been largely neglected. In this paper a regularization technique is proposed that allows for the selection of covariates and the fusion of categories of categorical effect modifiers in a linear model. A distinction is made between nominal and ordinal variables, since for the latter more economical parametrizations are warranted. The proposed methods are illustrated and investigated in simulation studies and real-world data evaluations. Moreover, some asymptotic properties are derived. The paper is a preprint of an article that has been accepted for publication in Statistica Sinica. Please use the journal version for citation.
'How can an application for learning with lecture recordings be designed so that it supports knowledge acquisition as effectively as possible?' This was the central question of this diploma thesis; to answer it, a new prototypical learning application was implemented, building on an existing system for providing recorded lectures. To this end, the development possibilities of the 'UnterrichtsMitschau' system at LMU München were worked out in line with moderate constructivist learning theory. To ensure that the application would be accepted by student users, their wishes and ideas were elicited in a focus group discussion and incorporated into the design concept. On this basis, an application was developed whose central innovations are the addition of annotations and the possibility of cooperative learning in two different modes. In the first cooperation mode, learners exchange views on the lecture content asynchronously, i.e. with a time delay, by means of annotations. In the second, the 'synchronous cooperative mode', learners are in direct contact with each other via an audio connection and work through the lecture recording synchronously. A follow-up study with 15 potential users showed, among other things, that both cooperative modes of the new system were rated better than the previous application. In particular, the users considered the process characteristics of learning from moderate constructivist learning theory to be better supported. Furthermore, the test persons would be more likely to use the new application in their studies than the previous one.
Variable selection and model choice are of major concern in many statistical applications, especially in high-dimensional regression models. Boosting is a convenient statistical method that combines model fitting with intrinsic model selection. We investigate the impact of base-learner specification on the performance of boosting as a model selection procedure. We show that variable selection may be biased if the covariates are of different nature. Important examples are models combining continuous and categorical covariates, especially if the number of categories is large. In this case, least squares base-learners offer increased flexibility for the categorical covariate and lead to its preferential selection even if it is non-informative. Similar difficulties arise when comparing linear and nonlinear base-learners for a continuous covariate. The additional flexibility in the nonlinear base-learner again yields a preference for the more complex modeling alternative. We investigate these problems from a theoretical perspective and suggest a framework for unbiased model selection based on a general class of penalized least squares base-learners. Making all base-learners comparable in terms of their degrees of freedom strongly reduces the selection bias observed in naive boosting specifications. The importance of unbiased model selection is demonstrated in simulations and in an application to forest health models.
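One common way to make base-learners comparable (assumed here as an illustration; the paper may use a refined definition) is to fix the degrees of freedom of each penalized least squares base-learner with design matrix $X$ and penalty matrix $K$ via the trace of its hat matrix,
\[ \mathrm{df}(\lambda) = \operatorname{trace}\!\left( X \left( X^\top X + \lambda K \right)^{-1} X^\top \right), \]
and to choose the smoothing parameter $\lambda$ of every base-learner such that all base-learners share the same, small value of $\mathrm{df}$.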
Many different cluster methods are frequently used in gene expression data analysis to find groups of co-expressed genes. However, cluster algorithms with the ability to visualize the resulting clusters are usually preferred. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. In this paper recent extensions of the R package gcExplorer are presented. gcExplorer is an interactive visualization toolbox for the investigation of the overall cluster structure as well as single clusters. The different visualization options, including arbitrary node and panel functions, are described in detail. Finally, the toolbox can be used to investigate the quality of a given clustering graphically as well as theoretically by testing the association between a partition and a functional group under study. It is shown that gcExplorer is a very helpful tool for a general exploration of microarray experiments. The identification of potentially interesting gene candidates or functional groups is substantially accelerated and eased. Inferential analysis on a cluster solution is used to judge its ability to provide insight into the underlying mechanistic biology of the experiment.
Finite mixture models are routinely applied to time course microarray data. Due to the complexity and size of this type of data the choice of good starting values plays an important role. So far initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally these procedures are also applied to a real dataset from E. coli.
We consider the following problem: estimate the size of a population marked with serial numbers after only a sample of the serial numbers has been observed. Its simplicity in formulation and the inviting possibilities of application make this estimation problem well suited for an undergraduate level probability course. Our contribution consists in a Bayesian treatment of the problem. For an improper uniform prior distribution, we show that the posterior mean and variance have nice closed form expressions and we demonstrate how to compute highest posterior density intervals. Maple and R code is provided on the authors’ web-page to allow students to verify the theoretical results and experiment with data.
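As a complement, a minimal R sketch of the Bayesian calculation (a numerical illustration under assumptions stated here, not the authors' published code): under an improper uniform prior on the population size N and a simple random sample of k serial numbers with maximum m, the posterior is proportional to 1/choose(N, k) for N >= m, and the well-known closed form of the posterior mean, (m - 1)(k - 1)/(k - 2) for k >= 3, can serve as a check.

# Illustrative data: k observed serial numbers
x <- c(47, 12, 133, 71, 58)
k <- length(x)                               # sample size
m <- max(x)                                  # largest observed serial number

# Posterior over N under an improper uniform prior:
# p(N | data) is proportional to 1 / choose(N, k) for N >= m
N_grid <- m:(200 * m)                        # numerical truncation of the support
post <- 1 / choose(N_grid, k)
post <- post / sum(post)                     # normalize

post_mean <- sum(N_grid * post)              # compare with (m - 1) * (k - 1) / (k - 2)
post_var  <- sum(N_grid^2 * post) - post_mean^2

# The posterior is decreasing in N, so the 95% HPD interval is [m, N_upper],
# with N_upper the smallest value whose cumulative posterior mass reaches 0.95
N_upper <- N_grid[which(cumsum(post) >= 0.95)[1]]
c(mean = post_mean, sd = sqrt(post_var), hpd_lower = m, hpd_upper = N_upper)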
In many applications it is known that the underlying smooth function is constrained to have a specific form. In the present paper, we propose an estimation method based on the regression spline approach, which makes it possible to include concavity or convexity constraints in an appealing way. Instead of using linear or quadratic programming routines, we handle the required inequality constraints on basis coefficients by boosting techniques. Therefore, recently developed componentwise boosting methods for regression purposes are applied, which make it possible to control the restrictions in each iteration. The proposed approach is compared to several competitors in a simulation study. We also consider a real world data set.
In general, risk of an extreme outcome in financial markets can be expressed as a function of the tail copula of a high-dimensional vector after standardizing marginals. Hence it is of importance to model and estimate tail copulas. Even for moderate dimension, nonparametrically estimating a tail copula is very inefficient and fitting a parametric model to tail copulas is not robust. In this paper we propose a semi-parametric model for tail copulas via an elliptical copula. Based on this model assumption, we propose a novel estimator for the tail copula, which proves favourable compared to the empirical tail copula, both theoretically and empirically.
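Recall the standard definition used in this context, shown here in the bivariate case for illustration (the paper works with higher-dimensional vectors): for a random vector $(X_1, X_2)$ with marginal distribution functions $F_1, F_2$, the (upper) tail copula is
\[ \Lambda(x, y) = \lim_{t \downarrow 0} \, t^{-1}\, P\big( 1 - F_1(X_1) \le t x, \; 1 - F_2(X_2) \le t y \big), \qquad x, y \ge 0, \]
provided the limit exists; it describes the joint behaviour of extremes after standardizing the marginals.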
For an AR(1) process with ARCH(1) errors, we propose empirical likelihood tests for testing whether the sequence is strictly stationary but has infinite variance, whether it is an ARCH(1) sequence, or whether it is an iid sequence. Moreover, an empirical likelihood based confidence interval for the parameter in the AR part is proposed. None of these results requires more than a finite second moment of the innovations. This includes the case of t-innovations for any degree of freedom larger than 2, which serves as a prominent model for real data.
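For reference, the model under consideration is typically written as (notation assumed here)
\[ X_t = \phi X_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \eta_t \sqrt{\alpha_0 + \alpha_1 \varepsilon_{t-1}^2}, \]
with i.i.d. innovations $\eta_t$ of mean zero and unit variance; $\phi = 0$ reduces the model to an ARCH(1) sequence, and $\phi = 0$ together with $\alpha_1 = 0$ gives an iid sequence.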
Recently there has been an increasing interest in applying elliptical distributions to risk management. Under weak conditions, Hult and Lindskog (2002) showed that a random vector with an elliptical distribution is in the domain of attraction of a multivariate extreme value distribution. In this paper we study two estimators for the tail dependence function, which are based on extreme value theory and the structure of an elliptical distribution, respectively. After deriving second order regular variation estimates and proving asymptotic normality for both estimators, we show that the estimator based on the structure of an elliptical distribution is better than that based on extreme value theory in terms of both asymptotic variance and optimal asymptotic mean squared error. Our theoretical results are confirmed by a simulation study.
In this article we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric predictor. We extend existing LVM with simple linear covariate effects by including nonparametric components for nonlinear effects of continuous covariates and interactions with other covariates as well as spatial effects. Full Bayesian modelling is based on penalized spline and Markov random field priors and is performed by computationally efficient Markov chain Monte Carlo (MCMC) methods. We apply our approach to a large German social science survey which motivated our methodological development.
Microaggregation is one of the most important statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. In this paper, we consider single-axis sorting, a frequently applied microaggregation technique where the formation of groups depends on the magnitude of a sorting variable related to the variables in the data set. The paper deals with the impact of this technique on a linear model in continuous variables. We show that parameter estimates are asymptotically biased if the sorting variable depends on the response variable of the linear model. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.
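A minimal R sketch of single-axis sorting (data, group size and variable names are illustrative, not taken from the paper): the observations are sorted by the sorting variable, partitioned into consecutive groups of approximately equal size, and each variable is replaced by its group mean; fitting the linear model to the aggregated data then illustrates the bias that arises when the sorting variable is related to the response.

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)                    # linear model of interest
s <- y + rnorm(n)                            # sorting variable related to the response
d <- data.frame(x, y, s)

k <- 3                                       # group size used for microaggregation
d_sorted <- d[order(d$s), ]                  # single-axis sorting by s
grp <- ceiling(seq_len(n) / k)               # consecutive groups of size k
d_sorted$x <- ave(d_sorted$x, grp)           # replace x by its group means
d_sorted$y <- ave(d_sorted$y, grp)           # replace y by its group means

coef(lm(y ~ x, data = d))                    # estimates from the original data
coef(lm(y ~ x, data = d_sorted))             # naive estimates from the aggregated data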
Most epidemiological studies suffer from misclassification in the response and/or the covariates. Since ignoring misclassification induces bias in the parameter estimates, correction for such errors is important. For measurement error, the continuous analogue of misclassification, a general approach for bias correction is the SIMEX (simulation extrapolation) method originally suggested by Cook and Stefanski (1994). This approach has recently been extended to regression models with a possibly misclassified categorical response and/or covariates by Küchenhoff et al. (2005), and is called the MC-SIMEX approach. To assess the importance of a regressor, not only its (corrected) estimate is needed, but also its standard error. For the original SIMEX approach, Carroll et al. (1996) developed a method for estimating the asymptotic variance. Here we derive the asymptotic variance estimators for the MC-SIMEX approach, extending the methodology of Carroll et al. (1996). We also include the case where the misclassification probabilities are estimated by a validation study. An extensive simulation study shows the good performance of our approach. The approach is illustrated using an example in caries research including a logistic regression model, where the response and a binary covariate are possibly misclassified.
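Schematically, the MC-SIMEX logic described in the cited literature works as follows (notation assumed here): given an estimated misclassification matrix $\Pi$, additional misclassification is simulated so that the data are effectively subject to $\Pi^{1+\lambda}$ for a grid of values $\lambda > 0$; naive estimates $\hat{\theta}(\lambda)$ are computed for each $\lambda$, an extrapolation function is fitted to them and evaluated at $\lambda = -1$, which corresponds to error-free data,
\[ \hat{\theta}(\lambda) \approx G(\lambda; \gamma), \qquad \hat{\theta}_{\mathrm{MC\text{-}SIMEX}} = G(-1; \hat{\gamma}), \]
where $G$ is, for example, a quadratic function of $\lambda$ fitted by least squares to the simulated estimates.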
Count data often exhibit overdispersion and/or require an adjustment for zero outcomes with respect to a Poisson model. Zero-modified Poisson (ZMP) and zero-modified generalized Poisson (ZMGP) regression models are useful classes of models for such data. In the literature, so far only score tests have been used for testing the necessity of this adjustment. For this testing problem, we show through a simulation study how poor the performance of the corresponding score test can be in comparison to that of the Wald and likelihood ratio (LR) tests. In particular, the score test in the ZMP case results in a power loss of 47% compared to the Wald test in the worst case, while in the ZMGP case the worst loss is 87%. Therefore, regardless of the computational advantage of score tests, the loss in power compared to the Wald and LR tests should not be neglected, and these much more powerful alternatives should be used instead. We also prove consistency and asymptotic normality of the maximum likelihood estimators in the above mentioned regression models to give a theoretical justification for the Wald and likelihood ratio tests.
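For reference, a zero-modified Poisson distribution adjusts the probability of a zero outcome relative to a Poisson($\mu$) baseline (notation assumed here; the ZMGP model replaces the Poisson kernel by the generalized Poisson one),
\[ P(Y = 0) = \omega + (1 - \omega)\, e^{-\mu}, \qquad P(Y = y) = (1 - \omega)\, \frac{e^{-\mu} \mu^{y}}{y!}, \quad y = 1, 2, \dots, \]
where $\omega = 0$ recovers the Poisson model, $\omega > 0$ corresponds to zero inflation and $\omega < 0$ (within its admissible range) to zero deflation; the tests discussed above concern the hypothesis $\omega = 0$.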
In this paper we consider regression models for count data allowing for overdispersion in a Bayesian framework. We account for unobserved heterogeneity in the data in two ways. On the one hand, we consider more flexible models than a common Poisson model allowing for overdispersion in different ways. In particular, the negative binomial and the generalized Poisson distribution are addressed where overdispersion is modelled by an additional model parameter. Further, zero-inflated models in which overdispersion is assumed to be caused by an excessive number of zeros are discussed. On the other hand, extra spatial variability in the data is taken into account by adding spatial random effects to the models. This approach allows for an underlying spatial dependency structure which is modelled using a conditional autoregressive prior based on Pettitt et al. (2002). In an application the presented models are used to analyse the number of invasive meningococcal disease cases in Germany in the year 2004. Models are compared according to the deviance information criterion (DIC) suggested by Spiegelhalter et al. (2002) and using proper scoring rules, see for example Gneiting and Raftery (2004). We observe a rather high degree of overdispersion in the data which is captured best by the GP model when spatial effects are neglected. While the addition of spatial effects to the models allowing for overdispersion gives no or only little improvement, a spatial Poisson model is to be preferred over all other models according to the considered criteria.
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two cutpoints that define an interval in the range of the ordinal predictor for which the probability of a positive outcome is particularly high (or low). A chi-square test may then be performed to compare the proportions of positive outcomes in and outside this interval. However, if the two cutpoints are chosen to maximize the chi-square statistic, referring the obtained chi-square statistic to the standard chi-square distribution is an inappropriate approach. It is then necessary to correct the p-value for multiple comparisons by considering the distribution of the maximally selected chi-square statistic instead of the nominal chi-square distribution. Here, we derive the exact distribution of the chi-square statistic obtained by the optimal two cutpoints. We suggest a combinatorial computation method and illustrate our approach by a simulation study and an application to varicella data.
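For illustration, a minimal R sketch of the maximally selected statistic itself (data and cutpoint search are illustrative; the exact null distribution derived in the paper is not reproduced here): for every pair of cutpoints defining an interval of the ordinal predictor, the 2x2 chi-square statistic comparing outcomes inside versus outside the interval is computed, and the maximum over all pairs is taken.

set.seed(1)
z <- sample(1:6, 200, replace = TRUE)              # ordinal predictor with levels 1..6
y <- rbinom(200, 1, ifelse(z %in% 3:4, 0.6, 0.3))  # binary outcome, elevated risk on [3, 4]

lev <- sort(unique(z))
max_chisq <- 0
best <- c(NA, NA)
for (a in seq_along(lev)) {
  for (b in a:length(lev)) {
    inside <- z >= lev[a] & z <= lev[b]
    if (all(inside) || !any(inside)) next           # interval must split the sample
    stat <- suppressWarnings(
      chisq.test(table(inside, y), correct = FALSE)$statistic
    )
    if (stat > max_chisq) { max_chisq <- stat; best <- c(lev[a], lev[b]) }
  }
}
max_chisq   # maximally selected chi-square statistic
best        # the two selected cutpoints (interval bounds)
# Referring max_chisq to a chi-square(1) distribution would ignore the selection;
# the p-value has to be based on the distribution of the maximum instead.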
We prove that the quasi-score estimator in a mean-variance model is optimal in the class of (unbiased) linear score estimators, in the sense that the difference of the asymptotic covariance matrices of the linear score and quasi-score estimator is positive semi-definite. We also give conditions under which this difference is zero or under which it is positive definite. This result can be applied to measurement error models where it implies that the quasi-score estimator is asymptotically more efficient than the corrected score estimator.
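For orientation, in a mean-variance model with mean function $\mu(x, \theta)$ and variance function $v(x, \theta)$, the quasi-score estimator solves the estimating equation (standard form, assumed here)
\[ \sum_{i=1}^{n} \frac{\partial \mu(x_i, \theta)}{\partial \theta}\, v(x_i, \theta)^{-1} \big( y_i - \mu(x_i, \theta) \big) = 0, \]
which is an unbiased estimating function that is linear in the response; the optimality result states that its asymptotic covariance matrix is smallest within this class.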