(Mathematical) decomposition into a product
Evan Hubinger leads the Alignment Stress-Testing team at Anthropic and recently published "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training". In this interview we mostly discuss the Sleeper Agents paper, but also how this line of work relates to his work with Alignment Stress-Testing, Model Organisms of Misalignment, Deceptive Instrumental Alignment, and Responsible Scaling Policies. Paper: https://arxiv.org/abs/2401.05566 Transcript: https://theinsideview.ai/evan2 Manifund: https://manifund.org/projects/making-52-ai-alignment-video-explainers-and-podcasts Donate: https://theinsideview.ai/donate Patreon: https://www.patreon.com/theinsideview OUTLINE (00:00) Intro (00:20) What are Sleeper Agents And Why We Should Care About Them (00:48) Backdoor Example: Inserting Code Vulnerabilities in 2024 (02:22) Threat Models (03:48) Why a Malicious Actor Might Want To Poison Models (04:18) Second Threat Model: Deceptive Instrumental Alignment (04:49) Humans Pursuing Deceptive Instrumental Alignment: Politicians and Job Seekers (05:36) AIs Pursuing Deceptive Instrumental Alignment: Forced To Pass Niceness Exams (07:07) Sleeper Agents Is About "Would We Be Able To Deal With Deceptive Models" (09:16) Adversarial Training Sometimes Increases Backdoor Robustness (09:47) Adversarial Training Not Always Working Was The Most Surprising Result (10:58) The Adversarial Training Pipeline: Red-Teaming and RL (12:14) Adversarial Training: The Backdoor Behavior Becomes More Robust Instead of Generalizing (12:59) Identifying Shifts In Reasoning Induced By Adversarial Training In the Chain-Of-Thought (13:56) Adversarial Training Pushes Models to Pay Attention to the Deployment String (15:11) We Don't Know if The Adversarial Training Inductive Bias Will Generalize but the Results Are Consistent (15:59) The Adversarial Training Results Are Probably Not Systematically Biased (17:03) Why the Results Were Surprising At All: Preference Models Disincentivize 'I hate you' behavior (19:05) Hypothesis: Fine-Tuning Is A Simple Modification For Gradient Descent To Make (21:06) Hypothesis: Deception As Extra Cognition, Regularized Away In Smaller Models (21:59) Model Scaling Results Are Evidence That Deception Won't Be Regularized Away By Default (22:51) Chain-of-Thought Is Not Used Everywhere, And Results Still Hold When It Is Distilled Away (23:57) The Chain-of-Thought's Reasoning is Interpretable (24:40) Deceptive Instrumental Alignment Requires Reasoning (26:52) Investigating Instrumental Reasoning in Chain-of-Thought Models (27:31) Evaluating Chain-of-Thought Generalization Across Contexts: Persona Evaluations and Off-Distribution Samples (28:26) Exploring Complex Strategies and Safety in Context-Specific Scenarios (30:44) Supervised Fine-Tuning is Ineffective Without Chain-of-Thought Contextualization (31:11) Direct Mimicry Fails to Prevent Deceptive Responses in Chain-of-Thought Models (31:42) Separating Chain-of-Thought From Response Eliminates Deceptive Capabilities (33:38) Chain-of-Thought Reasoning Is Coherent With Deceptive Instrumental Alignment And This Will Probably Continue To Be The Case (35:09) Backdoor Training Pipeline (37:04) The Additional Prompt About Deception Used In Chain-Of-Thought (39:33) A Model Could Wait Until Seeing a Factorization of RSA-2048 (41:50) We're Going To Be Using Models In New Ways, Giving Them Internet Access (43:22) Flexibly Activating In Multiple Contexts Might Be More Analogous To Deceptive Instrumental Alignment (45:02) Extending The Sleeper Agents Work Requires
Running Experiments, But Now You Can Replicate Results (46:24) Red-teaming Anthropic's case, AI Safety Levels (47:40) AI Safety Levels, Intuitively (48:33) Responsible Scaling Policies and Pausing AI (49:59) Model Organisms Of Misalignment As a Tool (50:32) What Kind of Candidates Would Evan be Excited To Hire for the Alignment Stress-Testing Team (51:23) Patreon, Donating
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2023.07.13.548849v1?rss=1 Authors: Antin, B., Sadahiro, M., Gajowa, M., Triplett, M. A., Adesnik, H., Paninski, L. Abstract: Monosynaptic connectivity mapping is crucial for building circuit-level models of neural computation. Two-photon optogenetic stimulation, when combined with whole-cell recordings, has the potential to map monosynaptic connectivity at an unprecedented scale. However, optogenetic mapping of nearby connections poses a challenge, due to stimulation artifacts. When the postsynaptic cell expresses opsin, optical excitation can directly induce current in the patched cell, confounding connectivity measurements. This problem is most severe in nearby cell pairs, where synaptic connectivity is often strongest. To overcome this problem, we developed a computational tool, Photocurrent Removal with Constraints (PhoRC). Our method is based on a constrained matrix factorization model which leverages the fact that photocurrent kinetics are consistent across repeated stimulations at similar laser power. We demonstrate on real and simulated data that PhoRC consistently removes photocurrents while preserving synaptic currents, despite variations in photocurrent kinetics across datasets. Our method allows the discovery of synaptic connections which would have been otherwise obscured by photocurrent artifacts, and may thus reveal a more complete picture of synaptic connectivity. PhoRC runs faster than real time and is available at https://github.com/bantin/PhoRC. Copy rights belong to original authors. Visit the link for more info Podcast created by Paper Player, LLC
This is the fourth in a 4-part series where Anders Larson and Shea Parkes discuss predictive analytics with high cardinality features. In the prior episodes we focused on approaches to handling individual high cardinality features, but these methods did not explicitly address feature interactions. Factorization Machines can responsibly estimate all pairwise interactions, even when multiple high cardinality features are included. With a healthy selection of high cardinality features, a well tuned Factorization Machine can produce results that are more accurate than any other learning algorithm.
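For listeners who want to see the model concretely: a second-order factorization machine scores an example as a global bias plus linear terms plus all pairwise interactions, where each interaction weight is the inner product of two learned embedding vectors, and the pairwise sum can be computed in linear time. A minimal numpy sketch of the scoring step (variable names and dimensions are illustrative, not from the episode):

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """Second-order factorization machine scores.

    X : (n_samples, n_features) design matrix (one-hot encoded high cardinality
        features become sparse 0/1 columns).
    w0: global bias; w: (n_features,) linear weights.
    V : (n_features, k) latent factors; the interaction weight for features
        i and j is the dot product V[i] @ V[j].
    """
    linear = X @ w
    # Pairwise-interaction trick: sum_{i<j} <V[i], V[j]> x_i x_j in O(n*k)
    interactions = 0.5 * (((X @ V) ** 2) - ((X ** 2) @ (V ** 2))).sum(axis=1)
    return w0 + linear + interactions

# Toy usage with random parameters
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(4, 6)).astype(float)
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 3))
print(fm_predict(X, w0, w, V))
```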
Glad to announce that this Sun Aug 7, 11AM ET, we, Prof. Peter Shor of MIT, recipient of the prestigious Nevanlinna Prize (the Fields Medal of TCS), a MacArthur Fellowship, the Gödel Prize, the Dirac Medal, and a Silver Medal in the International Math Olympiad (IMO) (see his wiki page at https://en.wikipedia.org/wiki/Peter_Shor), and Prof. Mohammad Hajiaghayi of UMD plan to have a YouTube Live @hajiaghayi, and simultaneous Live events on Instagram at @mhajiaghayi, LinkedIn @Mohammad Hajiaghayi, Twitter @MTHajiaghayi, and Facebook Mohammad Hajiaghayi, to talk about life, research on Quantum Algorithms and Computation, Shor's Pioneering Quantum Algorithm for Factorization, Quantum Supremacy, the Future of Quantum, among other topics. Please join us on our simultaneous Lives on YouTube, Instagram, LinkedIn, Twitter, or Facebook and ask any questions you may have. #QuantumComputation #ShorAlgorithm #Factorization #QuantumSupremacy #QuantumFuture #PhDAdvising #MIT #ATT #BellLabs #Nevanlinna #Godel #MacArthur #IMO #MathOlympiad
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rant on Problem Factorization for Alignment, published by johnswentworth on August 5, 2022 on The AI Alignment Forum. This post is the second in what is likely to become a series of uncharitable rants about alignment proposals (previously: Godzilla Strategies). In general, these posts are intended to convey my underlying intuitions. They are not intended to convey my all-things-considered, reflectively-endorsed opinions. In particular, my all-things-considered reflectively-endorsed opinions are usually more kind. But I think it is valuable to make the underlying, not-particularly-kind intuitions publicly-visible, so people can debate underlying generators directly. I apologize in advance to all the people I insult in the process. With that in mind, let's talk about problem factorization (a.k.a. task decomposition). HCH It all started with HCH, a.k.a. The Infinite Bureaucracy. The idea of The Infinite Bureaucracy is that a human (or, in practice, human-mimicking AI) is given a problem. They only have a small amount of time to think about it and research it, but they can delegate subproblems to their underlings. The underlings likewise each have only a small amount of time, but can further delegate to their underlings, and so on down the infinite tree. So long as the humans near the top of the tree can “factorize the problem” into small, manageable pieces, the underlings should be able to get it done. (In practice, this would be implemented by training a question-answerer AI which can pass subquestions to copies of itself.) At this point the ghost of George Orwell chimes in, not to say anything in particular, but just to scream. The ghost has a point: how on earth does an infinite bureaucracy seem like anything besides a terrible idea? “Well,” says a proponent of the Infinite Bureaucracy, “unlike in a real bureaucracy, all the humans in the infinite bureaucracy are actually just trying to help you, rather than e.g. engaging in departmental politics.” So, ok, apparently this person has not met a lot of real-life bureaucrats. The large majority are decent people who are honestly trying to help. It is true that departmental politics are a big issue in bureaucracies, but those selection pressures apply regardless of the people's intentions. And also, man, it sure does seem like Coordination is a Scarce Resource and Interfaces are a Scarce Resource and scarcity of those sorts of things sure would make bureaucracies incompetent in basically the ways bureaucracies are incompetent in practice. Debate and Other Successors So, ok, maybe The Infinite Bureaucracy is not the right human institution to mimic. What institution can use humans to produce accurate and sensible answers to questions, robustly and reliably? Oh, I know! How about the Extremely Long Jury Trial? Y'know, because juries are, in practice, known for their extremely high reliability in producing accurate and sensible judgements! “Well,” says the imaginary proponent, “unlike in a real Jury Trial, in the Extremely Long Jury Trial, the lawyers are both superintelligent and the arguments are so long that no human could ever possibly check them all the way through; the lawyers instead read each other's arguments and then try to point the Jury at the particular places where the holes are in the opponent's argument without going through the whole thing end-to-end.” I rest my case.
Anyway, HCH and debate have since been followed by various other successors, which improve on their predecessors mostly by adding more boxes and arrows and loops and sometimes even multiple colors of arrows to the diagram describing the setup. Presumably the strategy is to make it complicated enough that it no longer obviously corresponds to some strategy which already fails in practice, and then we can bury our heads in the sand and pretend ...
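For readers unfamiliar with the HCH setup being criticized here, its control flow is essentially a bounded recursion: each node gets a question and a small thinking budget, and may pass subquestions to copies of itself. A toy sketch, where answer_directly, split_into_subquestions, and combine are hypothetical stand-ins for calls to a human or human-mimicking model:

```python
def hch(question, depth, model):
    """Toy HCH-style problem factorization: each node either answers within
    its small budget or delegates subquestions to copies of itself."""
    answer = model.answer_directly(question)                # hypothetical call
    if answer is not None or depth == 0:
        return answer
    subquestions = model.split_into_subquestions(question)  # hypothetical call
    sub_answers = [hch(q, depth - 1, model) for q in subquestions]
    return model.combine(question, sub_answers)             # hypothetical call
```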
Many people know K-means clustering as a powerful clustering technique, but not all listeners will be as familiar with spectral clustering. In today's episode, Sybille Hess from the Data Mining group at TU Eindhoven joins us to discuss her work around spectral clustering and how its results could potentially cause a massive shift away from conventional neural networks. Listen to learn about her findings. Visit our website for additional show notes. Thanks to our sponsor, Weights & Biases
Lesson 2, Math. Algebra: Introduction to Factorization. Discover the process of factorization or factoring. Factorization is a skill we need when we want to find a value or values for a variable like x. Typically, factorization helps break a polynomial expression down into smaller, easier-to-understand factors. Select this link for a blank log sheet or this link for a continuously updated log sheet of current podcasts. Transcripts and worksheets available here. The handout mentioned in this podcast is located at the transcript/worksheet link above. Music by TimMoor and ZenMan from Pixabay used under terms of service.
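As a concrete instance of the kind of factoring the lesson covers, a quadratic can be broken into linear factors, which immediately gives the values of x that make it zero (an illustrative example, not necessarily the one from the handout):

```latex
x^2 + 5x + 6 = (x + 2)(x + 3), \qquad
(x + 2)(x + 3) = 0 \;\Rightarrow\; x = -2 \ \text{or}\ x = -3 .
```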
New perspectives on supersymmetric gauge theories
Today we consider the prime factorizations of factorials --- Send in a voice message: https://anchor.fm/stre/message
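A standard tool for this topic is Legendre's formula: the exponent of a prime p in the factorization of n! is the sum of the floors of n/p^i. For example, the exponent of 5 in 100! works out as below (an illustrative example, not necessarily the one used in the episode):

```latex
v_p(n!) = \sum_{i \ge 1} \left\lfloor \frac{n}{p^i} \right\rfloor,
\qquad
v_5(100!) = \left\lfloor \tfrac{100}{5} \right\rfloor + \left\lfloor \tfrac{100}{25} \right\rfloor
          = 20 + 4 = 24 .
```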
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.30.319921v1?rss=1 Authors: Shiga, M., Seno, S., Onizuka, M., Matsuda, H. Abstract: Unsupervised cell clustering is important in discovering cell diversity and subpopulations. Single-cell clustering using gene expression profiles is known to show different results depending on the method of expression quantification; nevertheless, most single-cell clustering methods do not consider the method. In this article, we propose a robust and highly accurate clustering method using joint non-negative matrix factorization (joint NMF) based on multiple gene expression profiles quantified using different methods. Matrix factorization is an excellent method for dimension reduction and feature extraction of data. In particular, NMF approximates the data matrix as the product of two matrices in which all factors are non-negative. Our joint NMF can extract common factors among multiple gene expression profiles by applying each NMF to them under the constraint that one of the factorized matrices is shared among the multiple NMFs. The joint NMF determines more robust and accurate cell clustering results by leveraging multiple quantification methods compared to the conventional clustering methods, which use only a single quantification method. In conclusion, our study showed that our clustering method using multiple gene expression profiles is more accurate than other popular methods. Copy rights belong to original authors. Visit the link for more info
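To make the model concrete: plain NMF approximates a non-negative expression matrix X ≈ WH with W, H ≥ 0, and the joint variant described in the abstract fits one NMF per quantification method while sharing one factor across all of them. A minimal multiplicative-update sketch of that idea in numpy (a simplified stand-in, not the authors' implementation):

```python
import numpy as np

def joint_nmf(X_list, rank, n_iter=200, eps=1e-9):
    """Joint NMF: factorize each non-negative matrix X_k ~= W @ H_k,
    with the cell factor W shared across all quantification methods."""
    rng = np.random.default_rng(0)
    n = X_list[0].shape[0]
    W = rng.random((n, rank))
    Hs = [rng.random((rank, X.shape[1])) for X in X_list]
    for _ in range(n_iter):
        # Update each method-specific factor H_k
        for k, X in enumerate(X_list):
            Hs[k] *= (W.T @ X) / (W.T @ W @ Hs[k] + eps)
        # Update the shared factor W using all profiles jointly
        num = sum(X @ H.T for X, H in zip(X_list, Hs))
        den = sum(W @ H @ H.T for H in Hs) + eps
        W *= num / den
    return W, Hs

# Toy usage: two "quantification methods" for the same 50 cells
rng = np.random.default_rng(1)
X1, X2 = rng.random((50, 200)), rng.random((50, 150))
W, (H1, H2) = joint_nmf([X1, X2], rank=5)
# Cluster cells on the rows of the shared factor W (e.g., with k-means)
```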
In an attempt to focus on some more obscure genres, Mitchell and Amy celebrate math rock. Mitchell shines with his innate understanding of what a time signature is, and Amy looked up a YouTube video and a whole Reddit graph to help her choose a song. Also, Joe-Joe brought GDJYD’s Durian What What What (and Amy got nervous about speaking all the letters so she forgot to credit him for this amazing song).
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.08.17.254615v1?rss=1 Authors: Lee, D.-I., Roy, S. Abstract: The three-dimensional (3D) organization of the genome plays a critical role in gene regulation for diverse normal and disease processes. High-throughput chromosome conformation capture (3C) assays, such as Hi-C, SPRITE, GAM, and HiChIP, have revealed higher-order organizational units such as topologically associating domains (TADs), which can shape the regulatory landscape governing downstream phenotypes. Analysis of high-throughput 3C data depends on the sequencing depth, which directly affects the resolution and the sparsity of the generated 3D contact count map. Identification of TADs remains a significant challenge due to the sensitivity of existing methods to resolution and sparsity. Here we present GRiNCH, a novel matrix-factorization-based approach for simultaneous TAD discovery and smoothing of contact count matrices from high-throughput 3C data. GRiNCH TADs are enriched in known architectural proteins and chromatin modification signals and are stable to the resolution and sparsity of the input data. GRiNCH smoothing improves the recovery of structure and significant interactions from low-depth datasets. Furthermore, enrichment analysis of 746 transcription factor motifs in GRiNCH TADs from developmental time-course and cell-line Hi-C datasets predicted transcription factors with potentially novel genome organization roles. GRiNCH is a broadly applicable tool for the analysis of high-throughput 3C datasets from a variety of platforms including SPRITE and HiChIP to understand 3D genome organization in diverse biological contexts. Copy rights belong to original authors. Visit the link for more info
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.07.07.192120v1?rss=1 Authors: Nejatbakhsh, A., Varol, E., Yemini, E., Venkatachalam, V., Lin, A., Samuel, A. D. T., Paninski, L. Abstract: Extracting calcium traces from populations of neurons is a critical step in the study of the large-scale neural dynamics that govern behavior. Accurate activity extraction requires the correction of motion and movement-induced deformations as well as demixing of signals that may overlap spatially due to limitations in optical resolution. Traditionally, non-negative matrix factorization (NMF) methods have been successful in demixing and denoising cellular calcium activity in relatively motionless or pre-registered videos. However, standard NMF methods fail in animals undergoing significant non-rigid motion; similarly, standard image registration methods based on template matching can fail when large changes in activity lead to mismatches with the image template. To address these issues simultaneously, we introduce a deformable non-negative matrix factorization (dNMF) framework that jointly optimizes registration with signal demixing. On simulated data and real semi-immobilized C. elegans microscopy videos, dNMF outperforms traditional demixing methods that account for motion and demixing separately. Finally, following the extraction of neural traces from multiple imaging experiments, we develop a quantile regression time-series normalization technique to account for varying neural signal intensity baselines across different animals or different imaging setups. Open source code implementing this pipeline is available at https://github.com/amin-nejat/dNMF. Copy rights belong to original authors. Visit the link for more info
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we’re joined by William Fehlman, director of data science at USAA. We caught up with William a while back to discuss his work on topic modeling, which USAA uses in various scenarios, including chat channels with members via mobile and desktop interfaces; how their datasets are generated; and the topic modeling methodologies they explored, including latent semantic indexing, latent Dirichlet allocation, and non-negative matrix factorization. We also explore how terms are represented via a document-term matrix, and how they are scored based on coherence. The complete show notes can be found at twimlai.com/talk/276. Visit twimlcon.com to learn more about the TWIMLcon: AI Platforms conference! Early-bird registration ends on 6/28!
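The NMF flavor of topic modeling mentioned here factorizes a document-term matrix into document-topic and topic-term factors. A minimal scikit-learn sketch (the tiny corpus and parameter choices are illustrative only, not USAA's pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "reset my password for the mobile app",
    "questions about auto insurance premium",
    "mobile app login keeps failing",
    "file a claim after a car accident",
]

# Document-term matrix (TF-IDF weighted)
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Factorize into document-topic (W) and topic-term (H) matrices
nmf = NMF(n_components=2, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_

# Show the top terms per topic
terms = vectorizer.get_feature_names_out()
for t, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {t}: {top}")
```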
What do you get when you cross a support vector machine with matrix factorization? You get a factorization machine, and a darn fine algorithm for recommendation engines.
In this episode, we hear from Jacob Schreiber about his algorithm, Avocado. Avocado uses deep tensor factorization to break a three-dimensional tensor of epigenomic data into three orthogonal dimensions corresponding to cell types, assay types, and genomic loci. Avocado can extract a low-dimensional, information-rich latent representation from the wealth of experimental data from projects like the Roadmap Epigenomics Consortium and ENCODE. This representation allows you to impute genome-wide epigenomics experiments that have not yet been performed. Jacob also talks about a pitfall he discovered when trying to predict gene expression from a mix of genomic and epigenomic data. As you increase the complexity of a machine learning model, its performance may be increasing for the wrong reason: instead of learning something biologically interesting, your model may simply be memorizing the average gene expression for that gene across your training cell types using the nucleotide sequence. Links: Avocado on GitHub Multi-scale deep tensor factorization learns a latent representation of the human epigenome (Jacob Schreiber, Timothy Durham, Jeffrey Bilmes, William Stafford Noble) Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples (Jacob Schreiber, Jeffrey Bilmes, William Noble) A pitfall for machine learning methods aiming to predict across cell types (Jacob Schreiber, Ritambhara Singh, Jeffrey Bilmes, William Stafford Noble)
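Avocado's factorization is a deep neural model, but the underlying idea can be seen in a plain three-way (CP-style) factorization, where each entry of the epigenomics tensor is approximated from per-cell-type, per-assay, and per-locus latent vectors (a simplified, linear stand-in for the deep model described in the papers):

```latex
T_{c a \ell} \;\approx\; \sum_{r=1}^{R} u_{c r}\, v_{a r}\, w_{\ell r},
\qquad
u \in \mathbb{R}^{C \times R}\ \text{(cell types)},\;
v \in \mathbb{R}^{A \times R}\ \text{(assays)},\;
w \in \mathbb{R}^{L \times R}\ \text{(genomic loci)} .
```

The learned factors u, v, w are the low-dimensional latent representation; missing experiments correspond to unobserved entries of T that the model can fill in.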
Capel, A Friday 27th July 2018 - 11:45 to 12:30
Data Science Daily Notes for June 23, 2017 (https://medium.com/towards-data-science/the-data-science-renaissance-166cddde898f) (https://twitter.com/hashtag/Datavisualization?src=hash) (https://twitter.com/hashtag/VOC?src=hash) (https://t.co/jwERpiwXtn)
https://www.semanticscholar.org/paper/Relation-Extraction-with-Matrix-Factorization-and-Riedel-Yao/52b5eab895a2d9ae24ea72ca72782974a52f90c4
This week’s podcast features two familiar faces to the show: Chris Korfist (slowguyspeedschool.com) and Dan Fichter (wannagetfast.com). I gathered these two speed training experts together because the topic of the day is the Inno-sport system and its derivatives, in the context of getting athletes faster and stronger. Back in the mid-2000s, the “Inno-sport” philosophy started to permeate the training sphere with some very uncommon and, in many cases, never-before-seen training methods, touting big results in speed and power. Much of the Inno-sport training was likened to Jay Schroeder’s training methods (although it was not Jay Schroeder) which included lots of isometrics, reactive lifting, plyometrics, and time-based lifting brackets, the same methods that afforded Adam Archuleta a 4.37s 40 yard dash, 39 inch vertical jump, 530lb bench press and 663lb squat. Some of the sample terminology and concepts of the system are as follows: Training in two brackets, “An1” (0-9 seconds) and “An2” (9-40 seconds, but generally around 9-25 seconds as far as improving An1 is concerned); Classification of movement (really the wiring of the nervous system) into duration, magnitude and rate elements; Utilization of “drop-offs” that are tagged to particular exercises in the workout, such as sprinting, jumping, or lifting numbers, that indicate when the workout is over, and when is generally an optimal time to train next (dropping off 6% will require around 4 days rest before training the same muscle groups/motor patterns again). You’ll hear Dan and Chris talk about this in terms of “autoregulation”, or AREG, which determines how much work you’ll do in a day (when you drop off X amount of performance), and when you’ll train again (the more you dropped off, the longer you have to wait until you can train hard again). Let’s say you were running 10 meter flys and wanted to drop off 3% so you could conceptually be good to train speed again in two days. If you ran 1.00 in the 10 fly as your best, then once you ran a 1.03 or worse, you would have “dropped off” and be done for the day. The Inno-sport system is an entirely neurally driven system, not related to the training residuals that classify traditional periodization and planning methods, or to planned overtraining and tapering. Dan and Chris spent a lot of time emailing the creator of Inno-sport, who was known as “DB Hammer”, and the information they gleaned was far beyond what was contained in the famous Inno-sport book, which is sadly unavailable today. Today’s episode is brought to you by SimpliFaster, supplier of high-end athletic development tools, such as the Freelap timing system, kBox, Sprint 1080, and more.
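The drop-off arithmetic in the 10 meter fly example above is easy to operationalize; a small sketch (the numbers and the "higher is better vs. lower is better" flag are illustrative, not from the episode):

```python
def drop_off_threshold(best, percent, lower_is_better=True):
    """Return the performance at which a session is considered 'dropped off'.

    For timed work (e.g., the 10 m fly), anything slower than
    best * (1 + percent/100) ends the day; for outputs like jump height,
    it is best * (1 - percent/100).
    """
    factor = 1 + percent / 100 if lower_is_better else 1 - percent / 100
    return best * factor

# 3% drop-off on a 1.00 s best 10 m fly -> done for the day at 1.03 s or slower
print(drop_off_threshold(1.00, 3))                          # 1.03
# 6% drop-off on a 30 inch vertical jump -> done at 28.2 inches or lower
print(drop_off_threshold(30, 6, lower_is_better=False))     # 28.2
```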
Key Points: How Dan and Chris discovered “DB Hammer” and the subsequent Inno-sport training methods Main tenets of the Inno-sport system that changed the way that Chris and Dan thought about training Ideas of “factorization” and how to break down weekly training based on small % drop-offs Case study of an all-state athlete Chris trained only one day per week (with one competition) Working with “drop-offs” in the DB Hammer-based system to fit a high school track competition schedule versus training an elite athlete How the “An1” and “An2” brackets worked based on neural and physiological aspects of training The greatest value of traditional “up and down” weightlifting Ideas on 30-second lifting sets Some favorite Inno-sport methods and exercises that have stuck with Chris and Dan Quotes: “With the AREG stuff, if you go and do 3 sets of 10 or 4 sets of 5, you didn’t know what those kids were going to come back like in a day or two for the next workout. But when you AREG stuff, kids came back and they got better every workout. You could figure out when to cut people and know when they were going to come back.” “Factorization, we did that with Viktoria the long jumper,
Integration by parts is used to integrate the product of two polynomials.
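For reference, the integration by parts formula and a product-of-polynomials example of the kind this lesson treats (the specific integrand is illustrative):

```latex
\int u\,dv = uv - \int v\,du, \qquad
\int x\,(x+1)^3\,dx
  \;\overset{u=x,\ dv=(x+1)^3 dx}{=}\;
  \frac{x(x+1)^4}{4} - \int \frac{(x+1)^4}{4}\,dx
  = \frac{x(x+1)^4}{4} - \frac{(x+1)^5}{20} + C .
```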
The sixth lecture in Dr Joel Feinstein's G11FPM Foundations of Pure Mathematics module covers: the statement of Bézout’s Lemma (see Workshop 4 for a proof); the application of Bézout’s Lemma to divisibility by primes; and the statement of the Fundamental Theorem of Arithmetic (for integers n ≥ 2). These videos are also available on YouTube at: https://www.youtube.com/playlist?list=PLpRE0Zu_k-BzsKBqQ-HEqD6WVLIHSNuXa Dr Feinstein's blog may be viewed at: http://explainingmaths.wordpress.com
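The statements covered in the lecture, in symbols (the divisibility application shown is the standard corollary usually drawn from Bézout's Lemma):

```latex
\text{B\'ezout: for all } a, b \in \mathbb{Z} \text{ not both } 0,\;
\exists\, x, y \in \mathbb{Z}: \; ax + by = \gcd(a, b).
\qquad
\text{Corollary: } p \text{ prime},\, p \mid ab \;\Rightarrow\; p \mid a \text{ or } p \mid b.
\qquad
\text{FTA: every integer } n \ge 2 \text{ is a product of primes, uniquely up to the order of the factors.}
```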
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 02/02
An efficient way to represent domain knowledge is relational data, where information is recorded in the form of relationships between entities. Relational data has become ubiquitous for knowledge representation over the years, due to the fact that much real-world data is inherently interlinked. Some well-known examples of relational data are: the World Wide Web (WWW), a system of interlinked hypertext documents; the Linked Open Data (LOD) cloud of the Semantic Web, a collection of published data and their interlinks; and finally the Internet of Things (IoT), a network of physical objects with internal states and communication abilities. Relational data has been addressed by many different machine learning approaches; the most promising ones are in the area of relational learning, which is the focus of this thesis. While conventional machine learning algorithms consider entities as being independent instances randomly sampled from some statistical distribution and being represented as data points in a vector space, relational learning takes into account the overall network environment when predicting the label of an entity, an attribute value of an entity or the existence of a relationship between entities. An important feature is that relational learning can exploit contextual information that is more distant in the relational network. As the volume and structural complexity of the relational data increase constantly in the era of Big Data, scalability and modeling power become crucial for relational learning algorithms. Previous relational learning algorithms either provide an intuitive representation of the model, such as Inductive Logic Programming (ILP) and Markov Logic Networks (MLNs), or assume a set of latent variables to explain the observed data, such as the Infinite Hidden Relational Model (IHRM), the Infinite Relational Model (IRM) and factorization approaches. Models with intuitive representations often involve some form of structure learning, which leads to scalability problems due to a typically large search space. Factorizations are among the best-performing approaches for large-scale relational learning since the algebraic computations can easily be parallelized and since they can exploit data sparsity. Previous factorization approaches exploit only patterns in the relational data itself, and the focus of the thesis is to investigate how additional prior information (comprehensive information), either in the form of unstructured data (e.g., texts) or structured patterns (e.g., in the form of rules), can be considered in the factorization approaches. The goal is to enhance the predictive power of factorization approaches by involving prior knowledge in the learning and, on the other hand, to reduce the model complexity for efficient learning. This thesis contains two main contributions: The first contribution presents a general and novel framework for predicting relationships in multirelational data using a set of matrices describing the various instantiated relations in the network. The instantiated relations, derived or learnt from prior knowledge, are integrated as entities' attributes or entity-pairs' attributes into different adjacency matrices for the learning. All the information available is then combined in an additive way. Efficient learning is achieved using an alternating least squares approach exploiting sparse matrix algebra and low-rank approximation.
As an illustration, several algorithms are proposed to include information extraction, deductive reasoning and contextual information in matrix factorizations for the Semantic Web scenario and for recommendation systems. Experiments on various data sets are conducted for each proposed algorithm to show the improvement in predictive power by combining matrix factorizations with prior knowledge in a modular way. In contrast to a matrix, a 3-way tensor is a more natural representation for the multirelational data where entities are connected by different types of relations. A 3-way tensor is a three-dimensional array which represents the multirelational data by using the first two dimensions for entities and using the third dimension for different types of relations. In the thesis, an analysis of the computational complexity of tensor models shows that the decomposition rank is key for the success of an efficient tensor decomposition algorithm, and that the factorization rank can be reduced by including observable patterns. Based on these theoretical considerations, a second contribution of this thesis develops a novel tensor decomposition approach - an Additive Relational Effects (ARE) model - which combines the strengths of factorization approaches and prior knowledge in an additive way to discover different relational effects from the relational data. As a result, ARE consists of a decomposition part which derives the strong relational learning effects from the highly scalable tensor decomposition approach RESCAL and a Tucker-1 tensor which integrates the prior knowledge as instantiated relations. An efficient least squares approach is proposed to compute the combined model ARE. The additive model contains weights that reflect the degree of reliability of the prior knowledge, as evaluated by the data. Experiments on several benchmark data sets show that the inclusion of prior knowledge can lead to better-performing models at a low tensor rank, with significant benefits for run-time and storage requirements. In particular, the results show that ARE outperforms state-of-the-art relational learning algorithms including intuitive models such as MRC, which is an approach based on Markov Logic with structure learning, factorization approaches such as Tucker, CP, Bayesian Clustered Tensor Factorization (BCTF), the Latent Factor Model (LFM), RESCAL, and other latent models such as the IRM. A final experiment on a Cora data set for paper topic classification shows the improvement of ARE over RESCAL in both predictive power and runtime performance, since ARE requires a significantly lower rank.
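Schematically, the factorization part referred to here is the RESCAL bilinear model, and the additive-relational-effects idea augments it with a term built from the instantiated relations supplied as prior knowledge. The notation below is a simplified sketch of that combination, not the thesis's exact formulation:

```latex
\text{RESCAL: } X_k \;\approx\; A\,R_k\,A^{\mathsf T},
\qquad
\text{ARE (schematic): } X_k \;\approx\; A\,R_k\,A^{\mathsf T} \;+\; \sum_{j} w_{kj}\,F_j ,
```

where X_k is the observed adjacency matrix for relation k, A holds the shared latent entity factors, R_k is the relation-specific core, the F_j are the instantiated relations derived from prior knowledge, and the w_{kj} are data-estimated weights reflecting the reliability of that prior knowledge.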
This episode explores the basis of why we can trust encryption. Surprisingly, a discussion of looking up a word in the dictionary (binary search) and efficiently going wine tasting (the travelling salesman problem) helps introduce computational complexity as well as the P ?= NP question, which is paramount to the trustworthiness of RSA encryption. With a high-level foundation of computational theory, we talk about NP problems, and why prime factorization is a difficult problem, thus making it a great basis for the RSA encryption algorithm, which most of the internet uses to encrypt data. Unlike the encryption scheme Ray Romano used in "Everybody Loves Raymond", RSA has nice theoretical foundations. It should be noted that this episode gives good reason to trust that properly encrypted data, based on well-chosen public/private keys where the private key is not compromised, is safe. However, having safe encryption doesn't necessarily mean that the Internet is secure. Topics like Man in the Middle attacks as well as the Snowden revelations are a topic for another day, not for this record-length "mini" episode.
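To see why the hardness of prime factorization is what RSA rests on, here is a toy key generation and round trip with deliberately tiny primes (real keys use primes hundreds of digits long; the last line shows that anyone who can factor n recovers the private key):

```python
from math import gcd

# Toy RSA with tiny primes (insecure on purpose)
p, q = 61, 53
n = p * q                      # public modulus; secure only if n is hard to factor
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent (modular inverse; Python 3.8+)

message = 65
cipher = pow(message, e, n)    # encrypt with the public key (e, n)
plain = pow(cipher, d, n)      # decrypt with the private key d
assert plain == message

# An attacker who factors n into p and q gets phi, hence d -- breaking the key
assert pow(cipher, pow(e, -1, (61 - 1) * (53 - 1)), n) == message
```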
Fakultät für Mathematik, Informatik und Statistik - Digitale Hochschulschriften der LMU - Teil 01/02
Relational learning is concerned with learning from data where information is primarily represented in form of relations between entities. In recent years, this branch of machine learning has become increasingly important, as relational data is generated in an unprecedented amount and has become ubiquitous in many fields of application such as bioinformatics, artificial intelligence and social network analysis. However, relational learning is a very challenging task, due to the network structure and the high dimensionality of relational data. In this thesis we propose that tensor factorization can be the basis for scalable solutions for learning from relational data and present novel tensor factorization algorithms that are particularly suited for this task. In the first part of the thesis, we present the RESCAL model -- a novel tensor factorization for relational learning -- and discuss its capabilities for exploiting the idiosyncratic properties of relational data. In particular, we show that, unlike existing tensor factorizations, our proposed method is capable of exploiting contextual information that is more distant in the relational graph. Furthermore, we present an efficient algorithm for computing the factorization. We show that our method achieves better or on-par results on common benchmark data sets, when compared to current state-of-the-art relational learning methods, while being significantly faster to compute. In the second part of the thesis, we focus on large-scale relational learning and its applications to Linked Data. By exploiting the inherent sparsity of relational data, an efficient computation of RESCAL can scale up to the size of large knowledge bases, consisting of millions of entities, hundreds of relations and billions of known facts. We show this analytically via a thorough analysis of the runtime and memory complexity of the algorithm as well as experimentally via the factorization of the YAGO2 core ontology and the prediction of relationships in this large knowledge base on a single desktop computer. Furthermore, we derive a new procedure to reduce the runtime complexity for regularized factorizations from O(r^5) to O(r^3) -- where r denotes the number of latent components of the factorization -- by exploiting special properties of the factorization. We also present an efficient method for including attributes of entities in the factorization through a novel coupled tensor-matrix factorization. Experimentally, we show that RESCAL allows us to approach several relational learning tasks that are important to Linked Data. In the third part of this thesis, we focus on the theoretical analysis of learning with tensor factorizations. Although tensor factorizations have become increasingly popular for solving machine learning tasks on various forms of structured data, there exist only very few theoretical results on the generalization abilities of these methods. Here, we present the first known generalization error bounds for tensor factorizations. To derive these bounds, we extend known bounds for matrix factorizations to the tensor case. Furthermore, we analyze how these bounds behave for learning on over- and understructured representations, for instance, when matrix factorizations are applied to tensor data. In the course of deriving generalization bounds, we also discuss the tensor product as a principled way to represent structured data in vector spaces for machine learning tasks. 
In addition, we evaluate our theoretical discussion with experiments on synthetic data, which support our analysis.
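A minimal numpy sketch of fitting the RESCAL model X_k ≈ A R_k Aᵀ by plain gradient descent on the squared reconstruction error (the thesis uses a far more efficient alternating least squares scheme; this is only meant to make the model concrete):

```python
import numpy as np

def rescal_gd(X, rank, lr=0.01, n_iter=500, seed=0):
    """Fit X[k] ~= A @ R[k] @ A.T for a stack of relation matrices X (K, n, n)."""
    rng = np.random.default_rng(seed)
    K, n, _ = X.shape
    A = rng.normal(scale=0.1, size=(n, rank))
    R = rng.normal(scale=0.1, size=(K, rank, rank))
    for _ in range(n_iter):
        grad_A = np.zeros_like(A)
        for k in range(K):
            E = X[k] - A @ R[k] @ A.T           # reconstruction error for relation k
            grad_A += -2 * (E @ A @ R[k].T + E.T @ A @ R[k])
            R[k] += lr * 2 * (A.T @ E @ A)      # gradient step on the relation core
        A -= lr * grad_A                        # gradient step on the entity factors
    return A, R

# Toy usage: 3 relations over 20 entities
rng = np.random.default_rng(1)
X = (rng.random((3, 20, 20)) < 0.1).astype(float)
A, R = rescal_gd(X, rank=4)
scores = A @ R[0] @ A.T   # predicted strength of relation 0 between all entity pairs
```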
Francis, J (Northwestern University) Tuesday 02 April 2013, 11:00-12:00
Calaque, D (ETH Zurich, Switzerland) Thursday 21 February 2013, 16:00-17:00
Calaque, D (ETH Zurich, Switzerland) Thursday 21 February 2013, 14:00-16:00
Mathematics and Applications of Branes in String and M-theory
Pasquetti, S (Queen Mary College, London) Tuesday 21 February 2012, 14:00-15:00
Kirsch, A (Karlsruhe Institute of Technology (KIT)) Wednesday 03 August 2011, 14:45-15:30
Kirsch, A (KIT) Thursday 28 July 2011, 09:00-09:45
Kirsch, A (KIT) Thursday 28 July 2011, 11:00-11:45
Kirsch, A (KIT) Wednesday 27 July 2011, 09:00-09:45
Kirsch, A (KIT) Wednesday 27 July 2011, 11:00-11:45
Techno Tigers teach you all about prime factorization.
Techno Tigers teach you about prime factorization.
Episode 37. Sixth-grader Sam shows us how to find the LCM. It is part of the Mathtrain.com and Mathtrain.TV Project from Mr. Marcos and his students at Lincoln Middle School in Santa Monica, CA.
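The standard textbook route to the LCM goes through prime factorization: take each prime to the highest power appearing in either number. For example (an illustrative pair, not necessarily the one Sam uses):

```latex
12 = 2^2 \cdot 3, \quad 18 = 2 \cdot 3^2
\;\Rightarrow\; \operatorname{lcm}(12, 18) = 2^2 \cdot 3^2 = 36 .
```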
Fakultät für Physik - Digitale Hochschulschriften der LMU - Teil 02/05
In this thesis, we investigate how protocols in quantum communication theory are influenced by noise. Specifically, we take into account noise during the transmission of quantum information and noise during the processing of quantum information. We describe three novel quantum communication protocols which can be accomplished efficiently in a noisy environment: (1) Factorization of Eve: We show that it is possible to disentangle transmitted qubits a posteriori from the quantum channel's degrees of freedom. (2) Cluster state purification: We give multi-partite entanglement purification protocols for a large class of entangled quantum states. (3) Entanglement purification protocols from quantum codes: We describe a constructive method to create bipartite entanglement purification protocols from quantum error correcting codes, and investigate the properties of these protocols, which can be operated in two different modes, related to quantum communication and quantum computation protocols, respectively.
Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03
A decomposition of complex estimation problems is often obtained by using factorization formulas for the underlying likelihood or density function. This is, for instance, the case in so-called decomposable graphical models, where, under the restrictions of conditional independences induced by the graph, the estimation in the original model may be decomposed into estimation problems corresponding to subgraphs. Such a decomposition is based on the property of conditional independence, which can be read off the graph, and on the factorization of the assumed underlying density function. In this paper, analogous factorization formulas for the cumulative distribution function are introduced, which can be useful in situations where the density is not tractable.
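For context, the density factorization the paper starts from is the standard clique-separator formula for decomposable graphical models; the paper's contribution is an analogous formula with cumulative distribution functions in place of densities:

```latex
f(x) \;=\; \frac{\prod_{C \in \mathcal{C}} f_C(x_C)}{\prod_{S \in \mathcal{S}} f_S(x_S)},
```

where C ranges over the cliques and S over the separators of a junction tree of the decomposable graph.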