Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Many economic and financial time series exhibit time-varying volatility. GARCH models are tools for forecasting and analyzing the dynamics of this volatility. The co-movements of financial markets and financial assets around the globe have recently become a main area of interest for financial econometricians; hence, multivariate GARCH models have been introduced in order to capture these co-movements. A large variety of multivariate GARCH models exists, and each of these models has its advantages and limitations. An important goal in constructing multivariate GARCH models is to make them parsimonious without compromising their adequacy in real-world applications. Another is to ensure that the conditional covariance matrix is positive definite. Motivated by the idea that volatility in financial markets is driven by a few latent variables, a new parameterization in the multivariate context is proposed in this thesis. The factors in our proposed model are obtained through a recursive use of the singular value decomposition (SVD). This recursion enables us to sequentially extract the volatility clustering from the data set; accordingly, our model is called Sequential Volatility Extraction (SVX for short). The logarithmically transformed singular values and the components of their corresponding singular vectors are modeled using the ARMA approach. In its basic idea and modeling approach, our model thus resembles a stochastic volatility model. Empirical analysis and comparison with existing multivariate GARCH models show that our proposed model is parsimonious, since it requires fewer parameters to estimate than the two alternative models considered (DCC and GOGARCH). At the same time, the covariance matrices resulting from our model are positive (semi-)definite. Hence, we can argue that our model fulfills the basic requirements of a multivariate GARCH model. Based on these findings, it can be concluded that the SVX model can be applied to financial data of dimensions ranging from low to high.
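To illustrate the basic building blocks, the following R sketch performs rolling-window SVDs of a simulated return matrix and fits ARMA(1,1) models to the log singular values. It is a minimal illustration of the ingredients (SVD plus ARMA on log singular values), not the thesis's sequential extraction recursion; the data, window length and ARMA orders are placeholders.

```r
## Minimal sketch (not the exact SVX recursion): rolling-window SVDs of a
## return matrix, then ARMA(1,1) models for the log singular values.
set.seed(1)
returns <- matrix(rnorm(500 * 5, sd = 0.01), nrow = 500, ncol = 5)  # placeholder returns

window <- 50
n_win  <- nrow(returns) - window + 1
sv <- t(sapply(seq_len(n_win), function(i) {
  svd(returns[i:(i + window - 1), ])$d      # singular values of each window
}))

log_sv    <- log(sv)
arma_fits <- lapply(seq_len(ncol(log_sv)), function(j) {
  arima(log_sv[, j], order = c(1, 0, 1))    # ARMA(1,1) per log singular value
})

# One-step-ahead forecasts of the log singular values
sapply(arma_fits, function(f) predict(f, n.ahead = 1)$pred)
```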
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Large knowledge graphs increasingly add great value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. These applications include Google search, Bing search, and IBM's Watson, as well as smart mobile assistants such as Apple's Siri, Google Now, or Microsoft's Cortana. Popular knowledge graphs like DBpedia, YAGO, or Freebase store a broad range of facts about the world, to a large extent derived from Wikipedia, currently the biggest web encyclopedia. In addition to these freely accessible open knowledge graphs, commercial ones have also evolved, including the well-known Google Knowledge Graph and Microsoft's Satori. Since incompleteness and veracity of knowledge graphs are known problems, the statistical modeling of knowledge graphs has gained increasing attention in recent years. Some of the leading approaches are based on latent variable models, which show both excellent predictive performance and scalability. Latent variable models learn embedding representations of domain entities and relations (representation learning). From these embeddings, priors for every possible fact in the knowledge graph are generated, which can be exploited for data cleansing, for completion, or as prior knowledge to support triple extraction from unstructured textual data, as successfully demonstrated by Google's Knowledge Vault project. However, large knowledge graphs impose constraints on the complexity of the latent embeddings learned by these models. For graphs with millions of entities and thousands of relation-types, latent variable models must rely on low-dimensional embeddings for entities and relation-types in order to remain tractable. The work described in this thesis extends the application of latent variable models to large knowledge graphs in three important dimensions. First, it is shown how the integration of ontological constraints on the domain and range of relation-types enables latent variable models to exploit latent embeddings of reduced complexity for modeling large knowledge graphs. The integration of this prior knowledge into the models leads to a substantial increase in both predictive performance and scalability, with improvements of up to 77% in link-prediction tasks. Since manually designed domain and range constraints can be absent or fuzzy, we also propose and study an alternative approach based on a local closed-world assumption, which derives domain and range constraints from the observed data without the need for prior knowledge extracted from the curated schema of the knowledge graph. We show that such an approach also leads to similar significant improvements in modeling quality. Further, we demonstrate that these two types of domain and range constraints are of general value to latent variable models by integrating and evaluating them on the current state of the art of latent variable models, represented by RESCAL, Translational Embedding, and the neural network approach used by the recently proposed Google Knowledge Vault system. In the second part of the thesis it is shown that the three approaches just mentioned all perform well but do not share many commonalities in the way they model knowledge graphs. These differences can be exploited in ensemble solutions, which improve the predictive performance even further. The third part of the thesis concerns the efficient querying of the statistically modeled knowledge graphs.
This thesis interprets statistically modeled knowledge graphs as probabilistic databases, where the latent variable models define a probability distribution over triples. From this perspective, link prediction is equivalent to querying ground triples, which is a standard functionality of latent variable models. For more complex queries that involve, e.g., joins and projections, the theory of probabilistic databases provides evaluation rules. In this thesis it is shown how the intrinsic features of latent variable models can be combined with the theory of probabilistic databases to realize efficient probabilistic querying of the modeled graphs.
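As a toy illustration of how latent embeddings yield triple probabilities that can then be combined under probabilistic-database semantics, the R sketch below scores triples with a RESCAL-style bilinear form and evaluates a two-triple conjunctive query under tuple independence. The embedding dimension, entities and relation matrix are hypothetical placeholders, not taken from the thesis.

```r
## Toy sketch: RESCAL-style bilinear triple scores turned into probabilities,
## then combined under the tuple-independence assumption of probabilistic
## databases. Entities, relation matrix and dimensions are hypothetical.
set.seed(1)
d <- 10
A <- matrix(rnorm(3 * d), nrow = 3)      # latent embeddings of 3 entities
R <- matrix(rnorm(d * d), nrow = d)      # latent matrix of one relation-type

# Probability of the triple (subject s, relation R, object o)
triple_prob <- function(s, o) plogis(drop(A[s, ] %*% R %*% A[o, ]))

# Conjunctive query "triple(1, r, 2) AND triple(2, r, 3)": under tuple
# independence its probability is the product of the triple probabilities
triple_prob(1, 2) * triple_prob(2, 3)
```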
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Thu, 8 Oct 2015 12:00:00 +0100 https://edoc.ub.uni-muenchen.de/18757/ https://edoc.ub.uni-muenchen.de/18757/1/Rose_Doro.pdf Rose, Doro ddc:310, ddc:300, Fakultät für Mathematik, Informatik und Statistik
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Time series modeling and forecasting are of vital importance in many real-world applications. Nonlinear time series models have recently gained much attention, because linear time series models face various limitations in many empirical applications. In this thesis, a large variety of standard and extended linear and nonlinear time series models is considered in order to compare their out-of-sample forecasting performance. We examine the out-of-sample forecast accuracy of linear Autoregressive (AR), Heterogeneous Autoregressive (HAR), Autoregressive Conditional Duration (ACD), Threshold Autoregressive (TAR), Self-Exciting Threshold Autoregressive (SETAR), Logistic Smooth Transition Autoregressive (LSTAR), Additive Autoregressive (AAR) and Artificial Neural Network (ANN) models, as well as the extended Heterogeneous Threshold Autoregressive (HTAR) or Heterogeneous Self-Exciting Threshold Autoregressive (HSETAR) model, for financial, economic and seismic time series. We also extend previous studies by using Vector Autoregressive (VAR) and Threshold Vector Autoregressive (TVAR) models and compare their forecasting accuracy with that of linear models for the above-mentioned time series. Unlike previous studies, which typically specify threshold models with an internal threshold variable, we specify the threshold models with external transition variables and compare their out-of-sample forecasting performance with the linear benchmark HAR and AR models on financial, economic and seismic time series. To our knowledge, this is the first study of its kind to extend the use of linear and nonlinear time series models to the field of seismology, utilizing seismic data from the Hindu Kush region of Pakistan. The question addressed in this study is whether nonlinear models produce 1- through 4-step-ahead forecasts that improve upon linear models. The answer is that linear models mostly yield more accurate forecasts than nonlinear ones for financial, economic and seismic time series. Furthermore, when modeling and forecasting the financial (DJIA, FTSE100, DAX and Nikkei), economic (USA GDP growth rate) and seismic (earthquake magnitudes, consecutive elapsed times and consecutive distances between earthquakes in the Hindu Kush region of Pakistan) time series, it appears that using various external threshold variables in threshold models improves their out-of-sample forecasting performance. The results of this study suggest that constructing nonlinear models with external threshold variables has a positive effect on their forecasting accuracy. Similarly, for the seismic time series, TVAR and VAR models in some cases provide improved forecasts over the benchmark linear AR model. The findings of this study could help bridge the analytical gap between statistics and seismology through the potential use of linear and nonlinear time series models. Secondly, we extend the linear Heterogeneous Autoregressive (HAR) model in a nonlinear framework, namely the Heterogeneous Threshold Autoregressive (HTAR) model, to model and forecast time series that simultaneously contain nonlinearity and long-range dependence. The model has been applied successfully to financial data (DJIA, FTSE100, DAX and Nikkei), and the results show that the HTAR model has better 1-step-ahead forecasting performance than the linear HAR model for the DJIA data.
For the DJIA, combining the forecasts from the HTAR and linear HAR models improves upon the forecasts obtained from the benchmark HAR model. Furthermore, we conduct a simulation study to assess the performance of the HAR and HSETAR models in the presence of spurious long-memory-type phenomena in a time series. The main purpose of this study is to answer the question of whether the HAR and HSETAR models are able to detect spurious long-memory-type phenomena. The simulation results show that the HAR model is completely unable to discriminate between true and spurious long-memory-type phenomena, whereas the extended HSETAR model is capable of detecting them. This study provides evidence that it is better to use the HSETAR model when it is suspected that the underlying time series contains spurious long-memory-type phenomena. To sum up, this thesis is a valuable resource for researchers who have to choose the best forecasting model from the large variety of models discussed here for modeling and forecasting economic, financial and, in particular, seismic time series.
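For readers unfamiliar with the HAR benchmark used throughout, the following R sketch fits the linear HAR regression of daily realized volatility on its daily, weekly and monthly averages and produces a 1-step-ahead forecast. The series is a simulated placeholder rather than DJIA/FTSE100/DAX/Nikkei data, and the standard (1, 5, 22) lag structure may differ from the exact specification in the thesis.

```r
## Hedged sketch of the linear HAR benchmark: regression of daily realized
## volatility on its daily, weekly (5-day) and monthly (22-day) averages.
set.seed(1)
rv <- abs(rnorm(1000, sd = 0.01))   # placeholder daily realized volatility

har_data <- data.frame(
  rv_d = rv[23:1000],                                        # RV on day t
  d    = rv[22:999],                                         # RV on day t-1
  w    = sapply(22:999, function(i) mean(rv[(i - 4):i])),    # 5-day average up to t-1
  m    = sapply(22:999, function(i) mean(rv[(i - 21):i]))    # 22-day average up to t-1
)
har_fit <- lm(rv_d ~ d + w + m, data = har_data)

# 1-step-ahead forecast from the most recent daily/weekly/monthly averages
newd <- data.frame(d = rv[1000], w = mean(rv[996:1000]), m = mean(rv[979:1000]))
predict(har_fit, newdata = newd)
```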
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Wed, 11 Feb 2015 12:00:00 +0100 https://edoc.ub.uni-muenchen.de/18194/ https://edoc.ub.uni-muenchen.de/18194/1/Moest_Lisa.pdf Möst, Lisa ddc:310, ddc:300, Fakultät für Mathematik, Informatik und Statistik
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
Expectile regression can be seen as an extension of available (mean) regression models, as it describes more general properties of the response distribution. This thesis introduces expectile regression and presents new extensions of existing semiparametric regression models. The dissertation consists of four central parts. First, the one-to-one connection between expectiles, the cumulative distribution function (cdf) and quantiles is used to calculate the cdf and quantiles from a fine grid of expectiles. Quantiles-from-expectiles estimates are introduced and compared with direct quantile estimates regarding efficiency. Second, a method to estimate non-crossing expectile curves based on splines is developed. The case of clustered or longitudinal observations is also handled by introducing random individual components, which leads to an extension of mixed models to mixed expectile models. Third, quantiles-from-expectiles estimates in the framework of unequal probability sampling are proposed. All methods are implemented and available in the package expectreg for the open source software R. Fourth, a description of the package expectreg is given at the end of this thesis.
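A minimal sketch of the expectile idea: the asymmetric least-squares estimator can be computed by iteratively reweighted least squares. The R code below implements this basic estimator for a linear predictor; it is not the penalised-spline machinery of the expectreg package, and the data are simulated.

```r
## Minimal sketch of expectile regression via iteratively reweighted
## (asymmetric) least squares; not the spline-based estimator in expectreg.
expectile_reg <- function(y, X, tau = 0.8, maxit = 100, tol = 1e-8) {
  X <- cbind(Intercept = 1, X)
  beta <- solve(crossprod(X), crossprod(X, y))            # OLS start
  for (it in seq_len(maxit)) {
    w <- ifelse(drop(y - X %*% beta) >= 0, tau, 1 - tau)  # asymmetric weights
    beta_new <- solve(crossprod(X * w, X), crossprod(X * w, y))
    if (max(abs(beta_new - beta)) < tol) break
    beta <- beta_new
  }
  drop(beta)
}

# Example: the 80%-expectile regression line for simulated data
set.seed(1)
x <- runif(200); y <- 1 + 2 * x + rnorm(200, sd = 0.5)
expectile_reg(y, x, tau = 0.8)
```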
Mathematik, Informatik und Statistik - Open Access LMU - Teil 03/03
Combining national forest inventory (NFI) data with high-resolution digital site maps enables spatially explicit predictions of site productivity. The aim of this study is to explore the possibilities and limitations of this database for analyzing the environmental dependency of height growth of Norway spruce and for predicting site index (SI) on a scale that is relevant for local forest management. The study region is the German federal state of Bavaria. The exploratory methods comprise significance tests and hypervolume analysis. SI is modeled with a Generalized Additive Model (GAM); in a second step, the residuals are modeled using Boosted Regression Trees (BRT). The interaction between temperature regime and water supply strongly determined height growth. At sites with a very similar temperature regime and water supply, greater heights were reached if the depth gradient of base saturation was favorable. Statistical model criteria (Double Penalty Selection, AIC) preferred composite variables for water supply and the supply of basic cations. The ability to predict SI on a local scale was limited due to the difficulty of integrating soil variables into the model.
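A hedged sketch of the two-stage strategy (a GAM for SI, then boosted regression trees on its residuals) in R, using the mgcv and gbm packages as generic stand-ins. The variable names (si, temp, water, base_sat) and the simulated data are hypothetical and do not reproduce the study's model.

```r
## Two-stage sketch: GAM for site index, then boosted regression trees
## on the GAM residuals. All variables and data are placeholders.
library(mgcv)
library(gbm)

set.seed(1)
dat <- data.frame(temp = runif(300, 5, 9),        # mean temperature (hypothetical)
                  water = runif(300, 100, 300),   # water supply (hypothetical)
                  base_sat = runif(300))          # base saturation gradient (hypothetical)
dat$si <- 20 + 2 * dat$temp + 0.02 * dat$water + rnorm(300)

# Stage 1: GAM with smooth terms and a temperature-water interaction
gam_fit <- gam(si ~ s(temp) + s(water) + ti(temp, water), data = dat)

# Stage 2: boosted regression trees on the GAM residuals
dat$res <- residuals(gam_fit)
brt_fit <- gbm(res ~ temp + water + base_sat, data = dat,
               distribution = "gaussian", n.trees = 500,
               interaction.depth = 2, shrinkage = 0.01)
```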
Mathematik, Informatik und Statistik - Open Access LMU - Teil 03/03
Recent academic work argues that effective climate policies to stop global warming are urgently needed. Concrete policy proposals for reducing CO2 emissions have been developed by the IPCC. One of the major instruments proposed is a carbon tax. A main obstacle to its implementation, however, is concern about its short-term effects on employment and output. In order to mitigate possible negative effects of environmental taxes on output and employment, several European countries have introduced so-called environmental tax reforms (ETR), which are designed in a budget-neutral manner: revenues from the tax are used to reduce existing distortionary taxes or to subsidize less polluting activities. We apply this idea to a carbon tax scheme by performing a vector autoregression (VAR) with output and employment data of nine industrialized countries. We impose a simultaneous policy shock on the economy whereby a carbon tax is levied on high-carbon-intensive industries and the resulting tax revenue is redistributed to low-carbon-intensive industries. Impulse response analysis shows that such a policy allows for net gains in terms of output and employment.
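A minimal R sketch of the econometric machinery: a small VAR fitted with the vars package and an impulse response of employment to an output shock. The two simulated series stand in for the nine-country output and employment data, and the shock is generic rather than the paper's carbon-tax-and-redistribution shock.

```r
## Sketch of a bivariate VAR with impulse response analysis (vars package);
## the simulated AR(1) series are placeholders for the country data.
library(vars)

set.seed(1)
y <- data.frame(output     = arima.sim(model = list(ar = 0.5), n = 200),
                employment = arima.sim(model = list(ar = 0.4), n = 200))

var_fit <- VAR(y, p = 2, type = "const")   # VAR(2) with intercept
irf_out <- irf(var_fit, impulse = "output", response = "employment",
               n.ahead = 10, boot = TRUE)  # bootstrap confidence bands
plot(irf_out)
```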
Mathematik, Informatik und Statistik - Open Access LMU - Teil 03/03
Many face-to-face surveys use field staff to create lists of housing units from which samples are selected. However, housing unit listing is vulnerable to errors of undercoverage: Some housing units are missed and have no chance to be selected. Such errors are not routinely measured and documented in survey reports. This study jointly investigates the rate of undercoverage, the correlates of undercoverage, and the bias in survey data due to undercoverage in listed housing unit frames. Working with the National Survey of Family Growth, we estimate an undercoverage rate for traditional listing efforts of 13.6 percent. We find that multiunit status, rural areas, and map difficulties strongly correlate with undercoverage. We find significant bias in estimates of variables such as birth control use, pregnancies, and income. The results have important implications for users of data from surveys based on traditionally listed housing unit frames.
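The following toy R example illustrates how undercoverage that is correlated with a housing-unit characteristic translates into bias of a survey mean; the frame indicator, multiunit variable and outcome are simulated and are not NSFG quantities.

```r
## Toy illustration of an undercoverage rate and the resulting coverage bias.
set.seed(1)
N       <- 10000
multi   <- rbinom(N, 1, 0.3)                      # multiunit status (hypothetical)
covered <- rbinom(N, 1, plogis(2 - 1.5 * multi))  # listing less likely for multiunit structures
y       <- 50 + 10 * multi + rnorm(N, sd = 5)     # outcome correlated with coverage

mean(1 - covered)                                 # undercoverage rate
mean(y[covered == 1]) - mean(y)                   # coverage bias in the estimated mean
```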
Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
Tue, 1 Jan 2013 12:00:00 +0100 https://epub.ub.uni-muenchen.de/21729/1/BA_Zeis.pdf Zeis, Klara ddc:500, ddc:310, Ausgewählte Abschlussarbeiten, Statistik
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02
Thu, 20 Dec 2012 12:00:00 +0100 https://edoc.ub.uni-muenchen.de/15319/ https://edoc.ub.uni-muenchen.de/15319/1/Schreiber_Irene.pdf Schreiber, Irene ddc:310, ddc:300, Fakultät für Mathematik, Informatik und Statistik
Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
A novel point process model, continuous in space-time, is proposed for quantifying the transmission dynamics of the two most common meningococcal antigenic sequence types observed in Germany in 2002-2008. Modelling is based on the conditional intensity function (CIF), which is described by a superposition of additive and multiplicative components. As an epidemiologically interesting finding, spread behaviour was shown to depend on type in addition to age: basic reproduction numbers were 0.25 (95% CI 0.19-0.34) and 0.11 (95% CI 0.07-0.17) for types B:P1.7-2,4:F1-5 and C:P1.5,2:F3-3, respectively. Altogether, the proposed methodology represents a comprehensive and universal regression framework for the modelling, simulation and inference of self-exciting spatio-temporal point processes based on the CIF. Usability of the modelling in biometric practice is promoted by an implementation in the R package surveillance.
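As a stripped-down illustration of a conditional intensity with a self-exciting component, the R snippet below evaluates a purely temporal CIF in which each past case adds an exponentially decaying contribution whose integral equals a reproduction number. The spatial component, covariate effects and the actual estimation machinery of the surveillance package are deliberately omitted, and all parameter values are hypothetical.

```r
## Purely temporal toy version of a self-exciting conditional intensity:
## lambda(t) = nu + eta * sum_j g(t - t_j), with g an exponential density,
## so each case contributes eta expected offspring (a reproduction number).
cif <- function(t, events, nu = 0.1, eta = 0.25, delta = 7) {
  past <- events[events < t]
  nu + eta * sum(exp(-(t - past) / delta)) / delta
}

events <- c(3, 10, 12, 30)                  # hypothetical case times (days)
sapply(c(5, 15, 40), cif, events = events)  # intensity at selected time points
```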
Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of the PAUC, only a few methods exist that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.
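For reference, the PAUC criterion itself can be approximated in a few lines of base R via trapezoidal integration of the empirical ROC curve over a restricted false-positive range. This only illustrates the evaluation measure, not the boosting algorithm or stability selection proposed in the paper; the marker and labels are simulated.

```r
## Base-R sketch of the partial AUC over false-positive rates in [0, fpr_max],
## using a simple trapezoidal approximation of the empirical ROC curve.
pauc <- function(score, label, fpr_max = 0.1) {
  thr <- sort(unique(score), decreasing = TRUE)
  fpr <- sapply(thr, function(cut) mean(score[label == 0] >= cut))
  tpr <- sapply(thr, function(cut) mean(score[label == 1] >= cut))
  keep <- fpr <= fpr_max
  sum(diff(c(0, fpr[keep])) * (tpr[keep] + c(0, head(tpr[keep], -1))) / 2)
}

set.seed(1)
label <- rbinom(500, 1, 0.4)
score <- label + rnorm(500)        # an informative (simulated) marker
pauc(score, label, fpr_max = 0.1)
```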
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02
Standard theories of mathematical finance, such as the mean-variance approach to portfolio selection or asset pricing models, are all based on the assumption that returns are independently and identically distributed over time and follow a normal distribution. Empirical investigations, however, provide significant evidence that this assumption is inaccurate for important asset classes. Instead, asset returns are characterized by time-varying volatilities, heavy tails, tail dependence, and skewness. These properties have implications for both theoretical and practical modeling in finance. After presenting the theoretical background, the thesis addresses the modeling problems that arise from these frequently observed phenomena. Specifically, it treats questions concerning the modeling of market and credit risk in volatile markets, as well as problems of portfolio optimization using alternative risk measures and objective functions. Particular attention is paid to questions of practical implementation.
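As a small illustration of the alternative risk measures mentioned above, the following R snippet computes the empirical Value-at-Risk and Expected Shortfall of a simulated heavy-tailed (Student-t) return series; the data, degrees of freedom and confidence level are placeholders.

```r
## Empirical Value-at-Risk and Expected Shortfall of a heavy-tailed return
## series; the t-distributed returns are simulated placeholders.
set.seed(1)
returns <- rt(5000, df = 4) * 0.01      # heavy-tailed daily returns

alpha  <- 0.99
var_99 <- -quantile(returns, probs = 1 - alpha)                      # 99% VaR
es_99  <- -mean(returns[returns <= quantile(returns, 1 - alpha)])    # 99% Expected Shortfall
c(VaR = unname(var_99), ES = es_99)
```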
Multi-state models provide a unified framework for the description of the evolution of discrete phenomena in continuous time. One particular example is Markov processes, which can be characterised by a set of time-constant transition intensities between the states. In this paper, we will extend such parametric approaches to semiparametric models with flexible transition intensities based on Bayesian versions of penalised splines. The transition intensities will be modelled as smooth functions of time and can further be related to parametric as well as nonparametric covariate effects. In addition, covariates with time-varying effects and frailty terms can be included. Inference will be conducted either in a fully Bayesian manner (using Markov chain Monte Carlo simulation techniques) or in an empirical Bayes manner (based on a mixed model representation). A counting process representation of semiparametric multi-state models provides the likelihood formula and also forms the basis for model validation via martingale residual processes. As an application, we will consider human sleep data with a discrete set of sleep states such as REM and non-REM phases. In this case, simple parametric approaches are inappropriate since the dynamics underlying human sleep vary strongly throughout the night and individual-specific variation has to be accounted for using covariate information and frailty terms.
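To make the time-constant special case concrete, the R sketch below simulates a continuous-time Markov chain from a transition intensity matrix Q with hypothetical sleep states; the semiparametric, time-varying intensities of the paper are not reproduced here, and the intensity values are invented for illustration.

```r
## Simulating a continuous-time Markov chain from an intensity matrix Q
## (time-constant special case); states and intensities are hypothetical.
simulate_ctmc <- function(Q, state0 = 1, t_max = 8) {
  states <- rownames(Q)
  s <- state0; tm <- 0
  path <- data.frame(time = 0, state = states[s])
  while (tm < t_max) {
    rate <- -Q[s, s]                      # total exit intensity of current state
    if (rate <= 0) break                  # absorbing state
    tm <- tm + rexp(1, rate)              # exponential holding time
    probs <- Q[s, ]; probs[s] <- 0
    s <- sample(seq_along(states), 1, prob = probs / rate)
    path <- rbind(path, data.frame(time = tm, state = states[s]))
  }
  path
}

Q <- matrix(c(-1.0,  0.8,  0.2,
               0.5, -0.9,  0.4,
               0.6,  0.9, -1.5), nrow = 3, byrow = TRUE,
            dimnames = list(c("awake", "nonREM", "REM"),
                            c("awake", "nonREM", "REM")))
set.seed(1)
simulate_ctmc(Q)
```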
Mathematik, Informatik und Statistik - Open Access LMU - Teil 02/03
Functional magnetic resonance imaging (fMRI) has led to enormous progress in human brain mapping. Adequate analysis of the massive spatiotemporal data sets generated by this imaging technique, combining parametric and non-parametric components, imposes challenging problems in statistical modelling. Complex hierarchical Bayesian models in combination with computer-intensive Markov chain Monte Carlo inference are promising tools. The purpose of this paper is twofold. First, it provides a review of general semiparametric Bayesian models for the analysis of fMRI data. Most approaches focus on important but separate temporal or spatial aspects of the overall problem, or they proceed by stepwise procedures. Therefore, as a second aim, we suggest a complete spatiotemporal model for analysing fMRI data within a unified semiparametric Bayesian framework. An application to data from a visual stimulation experiment illustrates our approach and demonstrates its computational feasibility.
Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
Varying-coefficient models provide a flexible framework for semi- and nonparametric generalized regression analysis. We present a fully Bayesian B-spline basis function approach with adaptive knot selection. For each of the unknown regression functions or varying coefficients, the number and location of knots and the B-spline coefficients are estimated simultaneously using reversible jump Markov chain Monte Carlo sampling. The overall procedure can therefore be viewed as a kind of Bayesian model averaging. Although Gaussian responses are covered by the general framework, the method is particularly useful for fundamentally non-Gaussian responses, where fewer alternatives are available. We illustrate the approach with a thorough application to two data sets analysed previously in the literature: the kyphosis data set with a binary response and survival data from the Veteran’s Administration lung cancer trial.
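A simple frequentist stand-in for one varying-coefficient term is sketched below: the slope of x is allowed to change smoothly with a covariate z through a fixed B-spline basis and ordinary least squares, rather than the adaptive knot selection via reversible jump MCMC proposed in the paper; all data are simulated.

```r
## Varying-coefficient sketch: the effect of x varies smoothly with z via a
## fixed B-spline basis (no adaptive knots, no MCMC); data are simulated.
library(splines)

set.seed(1)
n <- 400
z <- runif(n)                       # effect modifier
x <- rnorm(n)                       # covariate with varying effect
beta_z <- sin(2 * pi * z)           # true varying coefficient
y <- beta_z * x + rnorm(n, sd = 0.3)

B   <- bs(z, df = 8)                # B-spline basis in z
Z   <- B * x                        # each basis column scaled by x
fit <- lm(y ~ Z - 1)                # varying slope, no global intercept
beta_hat <- drop(B %*% coef(fit))   # estimated beta(z)

plot(z, beta_z, pch = 16, cex = 0.4, ylab = "beta(z)")   # true curve
points(z, beta_hat, col = 2, cex = 0.4)                  # estimate
```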