IAI has considerable experience in the research, development and transition of innovative Big Data Analytics tools, applications, systems and technologies. We have expertise in the areas of Data Mining and Informatics, Natural Language Processing and Text Analytics, and Social Media Analytics. IAI’s Big Data solutions are applied to many areas including scientific data analysis, health informatics and intelligence analysis. Our clients and collaborators include the Department of Defense (including Army, Navy, DARPA, and Air Force), Health and Human Services (including CDC and NIH), Department of Energy, Integrated Intelligent Software, the Mayo Clinic, University of Illinois at Urbana-Champaign, New York University, Cornell University and University of Michigan.
IAI’s data mining solutions include an agent-based meta-optimization data mining tool and an online data mining and knowledge management tool for Nanoinformatics. Our scalable, distributed and modular toolkit for mining text-rich heterogeneous information networks enhances contextual understanding. IAI also has extensive experience developing innovative data cubing and visual analytics solutions. Two examples are a heterogeneous information retrieval solution based on OLAP and faceted search technology, and multi-scale visualization techniques that integrate a cubing architecture with visual analytics.
IAI's natural language processing and text analytics solutions include event and attribute extraction for military persistence awareness; entity and relation extraction for scientific papers, clinical notes, and social media content; topic and opinion analysis for open source text and social media content; and dynamic text summarization.
IAI's social media analytics solutions bring the intelligence contained in massive social media data into decision making. Our Dynamic Warehousing and Mining (DWM) solution maximizes the ability of intelligence analysts to collect, organize, and analyze massive data to assess HSCB dimensions for a given group and predict current belief states and likely intended actions. Another solution, SHIELD, synthesizes social networks and their communication content based on the physical laws of networks and on linguistic features.
IAI has an OpenStack-based private cloud, DRACO (Distributed Robust Agile Cloud Operandi). The cloud is built using sets of 12-core (24 logical cores with hyper-threading) Intel Xeon processors, 32 GB of RAM per node, and 2 x 4 TB of local data storage per node. 1 Gbps Ethernet interconnects all nodes, and InfiniBand is also used.
IAI's cloud comprises two main layers, lower and higher. The lower layer consists of the OpenStack framework (currently the Folsom release), which offers IaaS functionality, whereas the higher layer contains the Hadoop ecosystem, supported by other open source frameworks such as GraphLab and Storm, and provides SaaS capabilities. DRACO IaaS is essentially a multi-user resource sharing framework with scalable computing, storage and networking services. Using DRACO IaaS, we provide the following capabilities, many of which are standard for OpenStack-based clouds:
- Rapid Virtual Machine (VM) configuration, where a set of VM images is provided, and users can create their own instances
- Web-based tools to manage and migrate VMs
- Custom image uploading
- Flexible resource allocation in an elastic and virtualized environment, and resource sharing
- Distributed storage based on GlusterFS, which provides persistent and fault-tolerant storage
- Project separation via Virtual Local Area Networking (VLAN), where projects do not interfere with each other
- Access control via key management
- Software Defined Networking (SDN) support via 6 HP OpenFlow switches
On top of the IaaS, DRACO supports the deployment of Hadoop clusters closely integrated with GraphLab for graph analytics and Storm for near-real-time Big Data analytics; this is one of its Software-as-a-Service (SaaS) capabilities. This capability is made possible by the configurable Hadoop-enabled VMs and by the Hadoop cluster deployment, management and control software. A representative list of software components that are integrated and provided for Big Data analytics in the VM cluster includes (i) MapReduce for parallel algorithm execution, (ii) GraphLab and Graphbuilder for graph analytics, (iii) HBase and Accumulo for