Multiscale Fast and Distributed Data & Statistics Summarization

Unprecedented amounts of data are being generated from many sources, including sensors and simulations. New data management algorithms are needed that can accurately, compactly, and efficiently summarize large amounts of data on existing petascale and future exascale systems. These algorithms must process, analyze, and summarize or reduce the supplied data in a single pass while minimizing data movement. To address this need, IAI and collaborators will apply Geometric Multi-Resolution Analysis (GMRA) to uncover low-dimensional structure in large-volume, high-dimensional data sets. The GMRA approach applies naturally to a wide variety of data types and will be implemented in a scalable, data-intensive computing environment. It provides linear scaling with the number of data points, a multi-resolution representation of the data, and provable error estimates. The technique is robust to noise, amenable to fast algorithms, and suitable for visualization and subsequent analysis. The coarse-scale representation in GMRA is a form of summarization in which separate partitions approximate the data. The fine-scale representation encodes details and allows zooming into particular structures of the data. GMRA's structure is conducive to many data analysis tasks, including clustering, anomaly detection, change detection, and visualization, and it is applicable to several sensor systems, including electro-optical, infrared, LiDAR, and hyperspectral. It reduces both the time and the resources needed to extract actionable information and make real-time decisions. GMRA will also improve information extraction from large amounts of text data, with applications in social media analytics, drug discovery, and health care. It can provide effective real-time summarization of streaming data, generate reduced bases to solve problems, and potentially enable a reduction in memory per core for exascale systems.
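The multiscale idea described above can be illustrated with a small sketch. It is not the project's implementation and it simplifies GMRA considerably: the data are partitioned recursively, each partition stores only a center and one principal direction, and a point is approximated by projecting it onto the local line of the partition that contains it at a chosen scale. Full GMRA uses local subspaces of higher dimension and encodes the differences between scales, which this sketch omits. The function names (build_tree, approximate), the split rule (sign of the projection onto the principal direction), and the synthetic noisy-circle data are all assumptions made for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def top_direction(points, center, iters=50):
    """Leading principal direction of a cell, found by power iteration."""
    centered = [sub(p, center) for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in center]
    for _ in range(iters):
        # w = (sum_x x x^T) v, computed without forming the covariance matrix
        w = [0.0] * len(center)
        for x in centered:
            s = dot(x, v)
            w = [wi + s * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # degenerate cell, e.g. all points identical
            break
        v = [wi / norm for wi in w]
    norm_v = math.sqrt(dot(v, v))
    return [vi / norm_v for vi in v]

def build_tree(points, max_depth, depth=0, min_size=4):
    """Hypothetical helper: recursive partition with a local line per cell."""
    center = mean(points)
    direction = top_direction(points, center)
    node = {"center": center, "dir": direction, "children": None}
    if depth < max_depth and len(points) >= 2 * min_size:
        left = [p for p in points if dot(sub(p, center), direction) < 0]
        right = [p for p in points if dot(sub(p, center), direction) >= 0]
        if left and right:
            node["children"] = (
                build_tree(left, max_depth, depth + 1, min_size),
                build_tree(right, max_depth, depth + 1, min_size),
            )
    return node

def approximate(node, p, scale):
    """Project p onto the local line of its cell, `scale` levels below the root."""
    for _ in range(scale):
        if node["children"] is None:
            break
        s = dot(sub(p, node["center"]), node["dir"])
        node = node["children"][0 if s < 0 else 1]
    s = dot(sub(p, node["center"]), node["dir"])
    return [c + s * d for c, d in zip(node["center"], node["dir"])]

# Synthetic data (an assumption): a noisy circle, i.e. a 1-D structure in 3-D.
points = []
for _ in range(400):
    t = rng.uniform(0.0, 2.0 * math.pi)
    points.append([math.cos(t) + rng.gauss(0.0, 0.01),
                   math.sin(t) + rng.gauss(0.0, 0.01),
                   rng.gauss(0.0, 0.01)])

tree = build_tree(points, max_depth=4)

def mean_squared_error(scale):
    total = 0.0
    for p in points:
        r = sub(p, approximate(tree, p, scale))
        total += dot(r, r)
    return total / len(points)

errors = [mean_squared_error(j) for j in range(5)]
for j, e in enumerate(errors):
    print(f"scale {j}: mean squared error {e:.5f}")
```

In this sketch the coarse scales play the role of the summary, since a few centers and directions stand in for all 400 points, while descending the tree corresponds to zooming into finer structure. The approximation error should shrink as the scale increases, until it reaches the level of the added noise.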