This course focused on signal processing on databases, drawing on detection theory and linear algebra applied to databases. By bringing together the concepts of strings and searching, students applied big data analysis skills to new domains.
Jeremy Kepner talked about his newly released book, "Mathematics of Big Data," which serves as the motivational material for the D4M course.
Introduction to signal processing applied to graphs. Course outline. Discussion of relevant programming and storage technologies. Constructing a graph from raw data.
D4M.mit.edu software demo examples/1Intro/2EdgeArt. Adjacency matrix construction. Incidence matrix construction.
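As a rough illustration of the two constructions named in this demo, the sketch below builds both matrices from a small edge list in plain Python rather than in D4M. The edge list and all function names are hypothetical and chosen only for this example. It also checks the standard relation that the adjacency matrix equals the transposed out-incidence matrix times the in-incidence matrix.

```python
def adjacency_matrix(edges, vertices):
    """A[i][j] counts the edges that go from vertices[i] to vertices[j]."""
    index = {v: i for i, v in enumerate(vertices)}
    A = [[0] * len(vertices) for _ in vertices]
    for src, dst in edges:
        A[index[src]][index[dst]] += 1
    return A


def incidence_matrices(edges, vertices):
    """Return (E_out, E_in), each with one row per edge and one column per vertex."""
    index = {v: i for i, v in enumerate(vertices)}
    E_out = [[0] * len(vertices) for _ in edges]
    E_in = [[0] * len(vertices) for _ in edges]
    for k, (src, dst) in enumerate(edges):
        E_out[k][index[src]] = 1  # edge k leaves src
        E_in[k][index[dst]] = 1   # edge k enters dst
    return E_out, E_in


def transpose_times(X, Y):
    """Compute X^T Y for dense matrices stored as lists of rows."""
    return [[sum(X[k][i] * Y[k][j] for k in range(len(X)))
             for j in range(len(Y[0]))]
            for i in range(len(X[0]))]


# Hypothetical example data: a small directed multi-graph (a -> b appears twice).
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

A = adjacency_matrix(edges, vertices)
E_out, E_in = incidence_matrices(edges, vertices)
print(A)
print(transpose_times(E_out, E_in) == A)
```

The incidence form keeps one row per edge, so repeated edges stay distinguishable, while the adjacency form collapses them into a count.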
Historical evolution of the web and cloud computing. Using the exploded (D4M) schema. Analyzing computer network data.
Computing statistics and analytics on data in the exploded (D4M) schema.
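One way to picture the exploded schema is that every field/value pair of a record becomes its own column, so a table turns into a sparse array of ones. The sketch below assumes a `field|value` column naming convention and uses made-up network log records; the function names are hypothetical and this is plain Python, not the D4M API. Once the data is in this form, simple statistics reduce to counting entries per column.

```python
from collections import Counter


def explode(records):
    """Turn {row_key: {field: value}} into a set of (row_key, "field|value") pairs.

    Each distinct field/value pair becomes its own column, so the table is
    represented as a sparse array whose nonzero entries are all 1.
    """
    return {(row, f"{field}|{value}")
            for row, fields in records.items()
            for field, value in fields.items()}


def column_counts(pairs):
    """Count how many rows contain each exploded column."""
    return Counter(col for _, col in pairs)


# Hypothetical network log records, keyed by a log line identifier.
records = {
    "log1": {"src": "10.0.0.1", "dst": "10.0.0.9"},
    "log2": {"src": "10.0.0.1", "dst": "10.0.0.7"},
    "log3": {"src": "10.0.0.2", "dst": "10.0.0.9"},
}

pairs = explode(records)
counts = column_counts(pairs)
print(counts["src|10.0.0.1"], counts["dst|10.0.0.7"])
```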
D4M.mit.edu software demo examples/2Apps/4BioBlast. Distribution of genetic sample data. Ingesting genetic data into an associative array using the exploded (D4M) schema. Correlating genetic data via associative array multiplication.
D4M.mit.edu software demo examples/2Apps/1EntityAnalysis. Incidence array of text entities. Computing the entity degree distribution.
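The degree distribution mentioned here can be sketched in a few lines, assuming the incidence array is available as (document, entity) pairs. The pairs, the `TYPE|name` entity labels, and the function names below are hypothetical stand-ins, and the code is plain Python rather than D4M.

```python
from collections import Counter


def entity_degrees(incidence):
    """Degree of an entity = number of distinct documents that mention it."""
    docs_per_entity = {}
    for doc, entity in incidence:
        docs_per_entity.setdefault(entity, set()).add(doc)
    return {entity: len(docs) for entity, docs in docs_per_entity.items()}


def degree_distribution(degrees):
    """Map each degree d to the number of entities that have degree d."""
    return dict(sorted(Counter(degrees.values()).items()))


# Hypothetical (document, entity) pairs standing in for the incidence array.
incidence = [
    ("doc1", "PERSON|alice"), ("doc1", "LOCATION|boston"),
    ("doc2", "PERSON|alice"), ("doc2", "PERSON|bob"),
    ("doc3", "PERSON|alice"), ("doc3", "LOCATION|boston"),
    ("doc3", "PERSON|carol"),
]

degrees = entity_degrees(incidence)
distribution = degree_distribution(degrees)
print(distribution)
```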
Genetic sequence analysis using associative arrays. Creation of a genetic processing pipeline. Ingest of genetic sample data into a database. Sub-sampling of data. Correlation of genetic samples using associative array multiplication.
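To make the correlation step concrete, the following sketch represents each sample as a row of a sparse sample-by-substring array and multiplies one array by the transpose of the other, so each output entry counts shared substrings. Splitting sequences into fixed-length words (k-mers), the word length of 3, the sequences, and the function names are all assumptions made for this illustration; it is not the course's pipeline.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_array(samples, k):
    """Build a sparse sample-by-k-mer array as {(sample, kmer): 1}."""
    return {(name, word): 1
            for name, seq in samples.items()
            for word in kmers(seq, k)}


def correlate(A, B):
    """Compute A * B^T: entry (i, j) counts the k-mers shared by samples i and j."""
    by_kmer = {}
    for (name, word), value in B.items():
        by_kmer.setdefault(word, []).append((name, value))
    C = {}
    for (i, word), a in A.items():
        for j, b in by_kmer.get(word, []):
            C[(i, j)] = C.get((i, j), 0) + a * b
    return C


# Hypothetical sequences; real samples would be far longer and use a larger k.
reference = {"ref1": "ACGTAC"}
unknown = {"unk1": "CGTACC", "unk2": "TTTTTT"}

shared = correlate(kmer_array(reference, 3), kmer_array(unknown, 3))
print(shared)
```

Pairs with no shared words never appear in the result, which keeps the output sparse.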
D4M.mit.edu software demo examples/3Scaling/2ParallelDatabase. MIT SuperCloud database management system. Starting the Apache Accumulo database. High performance database ingest. Using the D4M schema.
Theory of Kronecker graphs. Database ingest performance and database query performance. Array multiplication performance.
D4M.mit.edu software demo examples/3Scaling/1KroneckerGraph. Generation of power law graphs via Kronecker products. D4M.mit.edu software demo examples/3Scaling/3MatrixPerformance. Measuring the performance of array multiplication.
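The idea of growing a graph by repeated Kronecker products can be sketched as follows. This is a deterministic toy version that takes explicit Kronecker powers of a tiny seed adjacency matrix; it is not the generator used in the demo, and the seed matrix and function names are assumptions. Even so, it shows the skewed degrees that result: a few vertices with high degree and many with low degree.

```python
def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def kron_power(seed, n):
    """Return the n-fold Kronecker product of seed with itself (n >= 1)."""
    result = seed
    for _ in range(n - 1):
        result = kron(result, seed)
    return result


def out_degrees(A):
    """Row sums of an adjacency matrix."""
    return [sum(row) for row in A]


# Hypothetical 2x2 seed: vertex 0 links to both vertices, vertex 1 links to vertex 0.
seed = [[1, 1],
        [1, 0]]

G = kron_power(seed, 3)       # 8 x 8 adjacency matrix
print(sorted(out_degrees(G)))
```

Because row sums multiply under the Kronecker product, the degree of each vertex in `G` is a product of seed degrees, which is what spreads the degrees over several scales.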
Associative array mathematics. Relevant operations on an associative array. Semirings and matrices. See MIT Press book "Mathematics of Big Data."
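To illustrate how semirings and matrices fit together, the sketch below multiplies two sparse arrays while letting the caller choose the "addition" and "multiplication" operations. The dictionary representation, the function name, and the sample values are assumptions for this example and not the D4M implementation.

```python
import operator


def semiring_multiply(A, B, add, mul):
    """Sparse array product C(i, j) = add over k of mul(A(i, k), B(k, j)).

    A and B are dicts keyed by (row, column); missing entries are treated
    as absent rather than as an explicit zero.
    """
    B_by_row = {}
    for (k, j), b in B.items():
        B_by_row.setdefault(k, []).append((j, b))
    C = {}
    for (i, k), a in A.items():
        for j, b in B_by_row.get(k, []):
            term = mul(a, b)
            C[(i, j)] = add(C[(i, j)], term) if (i, j) in C else term
    return C


# Hypothetical arrays with string keys.
A = {("a", "x"): 2, ("a", "y"): 3}
B = {("x", "p"): 4, ("y", "p"): 1}

plus_times = semiring_multiply(A, B, operator.add, operator.mul)
max_plus = semiring_multiply(A, B, max, operator.add)
min_plus = semiring_multiply(A, B, min, operator.add)
print(plus_times, max_plus, min_plus)
```

The same loop yields an ordinary matrix product, a longest-path style product, or a shortest-path style product, depending only on the pair of operations passed in.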
D4M.mit.edu software demo examples/1Intro/3GroupTheory. Associativity, commutativity, and distributivity properties.
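These three properties can be spot-checked by brute force over a small set of values, as in the sketch below. The helper names and the choice of test values are assumptions; passing such a check on a finite sample is evidence that a property holds, not a proof, while a single failure does disprove it.

```python
import operator
from itertools import product


def is_associative(op, values):
    """Check op(op(a, b), c) == op(a, op(b, c)) for all triples of values."""
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(values, repeat=3))


def is_commutative(op, values):
    """Check op(a, b) == op(b, a) for all pairs of values."""
    return all(op(a, b) == op(b, a)
               for a, b in product(values, repeat=2))


def distributes_over(mul, add, values):
    """Check mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for all triples."""
    return all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
               for a, b, c in product(values, repeat=3))


values = range(-3, 4)  # small integer sample, chosen only for illustration

print(is_associative(operator.add, values))                   # + is associative
print(is_associative(operator.sub, values))                   # - is not
print(distributes_over(operator.mul, operator.add, values))   # * over +
print(distributes_over(operator.add, max, values))            # + over max
print(distributes_over(operator.add, operator.mul, values))   # + over * fails
```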
Statistical distribution of background/noise in databases. A power law distribution describes many backgrounds. A perfect power law distribution can be used to bin and model the background data.
D4M.mit.edu software demo examples/2Apps/3PerfectPowerLaw. Generate synthetic power law distributions. Analyze power law distributions in real data.
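As a simplified picture of generating and binning a power law, the sketch below assumes the number of vertices with degree d follows n(d) = n1 / d^alpha, rounds each count to a whole number, and then sums the counts in logarithmic bins. The parameter names, the rounding step, and the power-of-two bin edges are assumptions for this illustration and may differ from the construction used in the demo.

```python
def power_law_counts(n1, alpha, d_max):
    """Return {d: n(d)} with n(d) = n1 / d**alpha rounded to a whole number."""
    counts = {}
    for d in range(1, d_max + 1):
        n = round(n1 / d ** alpha)
        if n > 0:
            counts[d] = n
    return counts


def log2_bin(counts):
    """Sum counts over bins [1, 2), [2, 4), [4, 8), ...; keys are lower bin edges."""
    binned = {}
    for d, n in counts.items():
        edge = 1 << (d.bit_length() - 1)  # largest power of two <= d
        binned[edge] = binned.get(edge, 0) + n
    return binned


# Hypothetical parameters chosen so every count is a whole number.
counts = power_law_counts(n1=120, alpha=1, d_max=5)
binned = log2_bin(counts)
print(counts)
print(binned)
```

Logarithmic bins are a common choice here because the high-degree tail of measured data is sparse and noisy when plotted one degree at a time.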
Creating an exploded database schema. Standard database processing chain. Graph adjacency matrix. Vertex degree distribution. Directed graphs, multi-graphs, and hyper-graphs. Graph incidence matrix.
D4M.mit.edu software demo examples/2Apps/2TrackAnalysis. Tracking entities through space and time.
Introduction to associative arrays. D4M.mit.edu software demo examples/1Intro/1AssocIntro. Creating, writing, reading, selecting, and performing math on associative arrays.
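For readers new to the idea, an associative array can be pictured as a sparse matrix whose rows and columns are labeled with strings. The sketch below models one as a Python dictionary keyed by (row, column) and shows creation, row selection, addition, and transpose. It is a teaching aid with hypothetical names and data, not the D4M API.

```python
def make_assoc(rows, cols, vals):
    """Create an associative array from parallel lists of row keys, column keys, values."""
    return {(r, c): v for r, c, v in zip(rows, cols, vals)}


def select_rows(A, rows):
    """Keep only the entries whose row key is in rows."""
    wanted = set(rows)
    return {(r, c): v for (r, c), v in A.items() if r in wanted}


def add(A, B):
    """Element-wise sum; a key present in only one array keeps its value."""
    C = dict(A)
    for key, v in B.items():
        C[key] = C.get(key, 0) + v
    return C


def transpose(A):
    """Swap row and column keys."""
    return {(c, r): v for (r, c), v in A.items()}


# Hypothetical data: who likes which music genre.
A = make_assoc(
    ["alice", "alice", "bob"],
    ["likes|jazz", "likes|rock", "likes|jazz"],
    [1, 1, 1],
)

alice_only = select_rows(A, ["alice"])
doubled = add(A, A)
At = transpose(A)
print(alice_only)
print(doubled[("bob", "likes|jazz")])
```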