This project is supported by the National Science Foundation Grant No. 0844572.
Through a collaboration between the eScience Institute at the University of Washington, the University of Utah, and the NSF Science and Technology Center for Coastal Margin Observation and Prediction (CMOP), we are exploring architectures to access cloud-scale computing resources through a rich desktop environment.
The popularization of commodity multi-core CPUs and multi-GPU systems has opened many alternatives for the design and implementation of efficient visualization and data analysis algorithms. However, manually distributing the processing load among CPU cores and GPUs remains a complex and error-prone task.
Under NIH grant HG006091, Insilicos is extending the popular X!Tandem proteomics search engine to work in Hadoop, and has run experiments on the CluE platform through coordination with PI Howe.
Using the CluE platform, Keith Wiley, a research engineer in the Astronomy Department, has led the development of an image query pipeline involving Hadoop coupled to a relational database for pre-filtering.
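The pipeline described above pairs a relational pre-filter with a Hadoop stage. The following is a minimal sketch of that division of labor, with invented table, column, and function names (the actual schema and job are not described in the source): a metadata query narrows the candidate set before the bulk processing stage touches any image data.

```python
import sqlite3

def prefilter_images(conn, min_ra, max_ra):
    """Relational pre-filter: return IDs of images whose right ascension
    falls in [min_ra, max_ra]. Only these IDs reach the Hadoop stage."""
    cur = conn.execute(
        "SELECT image_id FROM image_metadata WHERE ra BETWEEN ? AND ?",
        (min_ra, max_ra))
    return [row[0] for row in cur]

def map_stage(image_ids):
    """Stand-in for the Hadoop image-processing job: emit (id, result) pairs."""
    return [(i, f"processed-{i}") for i in image_ids]

# Toy metadata catalog standing in for the real relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_metadata (image_id INTEGER, ra REAL)")
conn.executemany("INSERT INTO image_metadata VALUES (?, ?)",
                 [(1, 10.0), (2, 45.0), (3, 200.0)])

candidates = prefilter_images(conn, 0.0, 90.0)  # images 1 and 2 match
results = map_stage(candidates)
```

The design point is that the cheap, indexed SQL query prunes the search space so the expensive distributed stage scans far fewer images.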
Our VLDB 2010 paper on the HaLoop system was selected for inclusion in a special issue of the VLDB Journal highlighting the best papers from the VLDB 2010 conference.
We have submitted an extended version of the conference paper, adding new fault-tolerance experiments and a refined API.
HaLoop Code released!
We have released an initial version of the HaLoop infrastructure for efficient iterative distributed computations. The platform is based on Hadoop but adds a new programming model for multi-step loop bodies and non-trivial termination conditions; several inter-iteration caching mechanisms, together with a new scheduling algorithm that exploits them; and fault-tolerance capabilities via automatic cache reconstruction.
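To make the two key ideas concrete, here is a toy, single-machine sketch (not the HaLoop API; all names are invented) of the kind of computation HaLoop targets: a fixpoint iteration over a loop-invariant input with a non-trivial termination condition. In plain Hadoop the invariant link table would be re-read and re-shuffled on every pass; HaLoop's inter-iteration caches keep it resident on the workers, and the framework itself evaluates the termination test.

```python
# Loop-invariant input: in HaLoop this would be cached across iterations.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
# Loop-variant state, updated each pass.
ranks = {page: 1.0 for page in links}

def step(ranks):
    """One map+reduce pass: each page pushes its rank along its out-links."""
    new = {page: 0.0 for page in links}
    for page, out in links.items():
        share = ranks[page] / len(out)
        for dest in out:
            new[dest] += share
    return new

for _ in range(100):
    new_ranks = step(ranks)
    # Non-trivial termination condition: total change below a threshold.
    delta = sum(abs(new_ranks[p] - ranks[p]) for p in ranks)
    ranks = new_ranks
    if delta < 1e-6:
        break
```

Because `links` never changes across iterations while `ranks` does, separating the two is exactly what lets an iteration-aware scheduler place tasks where the invariant data is already cached.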
Large-scale visualization systems are historically designed for "throwing datasets" --- pushing pre-conditioned data through the graphics pipeline as quickly as possible. However, increasingly, scalable data manipulation, restructuring, and querying are becoming important features for a comprehensive exploratory visualization system. Simultaneously, cloud computing providers are gaining increasing market share (and mind share) by renting access to low-cost --- but typically low-end --- distributed computing facilities.
HaLoop, developed by Yingyi Bu, Bill Howe, and Magda Balazinska at the University of Washington, is an extension to Hadoop to support iterative MapReduce programs. Many machine learning and data mining applications run iteratively until they converge on a particular answer. These applications are currently implemented by using a separate "driver program" outside the MapReduce framework. We hypothesize that building recursion into the framework itself can offer significant improvements in both performance and ease-of-use.
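The "driver program" pattern mentioned above can be sketched as follows. This is an illustrative stand-in, not real Hadoop code: `run_job` represents submitting one full MapReduce job, and the loop and convergence test live in client code outside the framework, so every iteration pays job-launch and data-loading overhead that an iteration-aware framework could avoid.

```python
def run_job(state):
    """Pretend MapReduce job: one step of an iterative computation
    (here, a Newton iteration converging to sqrt(2))."""
    return 0.5 * (state + 2.0 / state)

state = 1.0
converged = False
for iteration in range(50):          # driver loop, outside the framework
    new_state = run_job(state)       # each call = one full job submission
    converged = abs(new_state - state) < 1e-10
    state = new_state
    if converged:                    # convergence checked by the driver
        break
```

The overhead is per-iteration, so for computations needing tens or hundreds of passes, moving the loop into the framework is where the hypothesized performance gain comes from.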
Science is becoming data-intensive, requiring new software architectures that can exploit resources at all scales: local GPUs for interactive visualization, server-side multi-core machines with fast processors and large memories, and scalable, pay-as-you-go cloud resources. Architectures that seamlessly and flexibly exploit all three platforms are largely unexplored. Informed by a long-term collaboration with ocean scientists, we articulate a suite of representative visual data analytics workflows and use them to design and implement a multi-tier immersive visualization system.