A Science Cloud on your Desktop: VisTrails + GridFields

Through a collaboration between the eScience Institute at the University of Washington, the University of Utah, and the NSF Science and Technology Center for Coastal Margin Observation and Prediction (CMOP), we are exploring architectures to access cloud-scale computing resources through a rich desktop environment.

Scalable Datalog over HaLoop

The main goal of this project is to design an efficient parallel implementation of Datalog on the MapReduce framework.
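
To make this concrete, here is a minimal, purely illustrative Python sketch (not the project's code) of semi-naive evaluation for the classic transitive-closure program tc(x,y) :- edge(x,y); tc(x,z) :- tc(x,y), edge(y,z). Each call to join_round stands in for one MapReduce job, with the grouping by source vertex playing the role of the shuffle.

    # Semi-naive Datalog evaluation expressed as MapReduce-style rounds.
    # Illustration only; HaLoop/Hadoop specifics are not shown.
    from collections import defaultdict

    def join_round(delta, edges_by_src):
        """One iteration: join newly derived tc facts with edge on the
        shared column. In a real MapReduce job this grouping is the
        shuffle phase."""
        new_facts = set()
        for (x, y) in delta:
            for z in edges_by_src.get(y, ()):
                new_facts.add((x, z))
        return new_facts

    def transitive_closure(edges):
        edges_by_src = defaultdict(list)
        for (x, y) in edges:
            edges_by_src[x].append(y)
        tc = set(edges)      # base rule: tc(x,y) :- edge(x,y).
        delta = set(edges)   # semi-naive: only join newly derived facts
        while delta:
            delta = join_round(delta, edges_by_src) - tc
            tc |= delta      # fixpoint reached when no new facts appear
        return tc

    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]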

HyperFlow: An Efficient Dataflow Architecture for Heterogeneous Systems

The popularization of commodity multi-core CPUs and multi-GPU units has opened many alternatives for the design and implementation of efficient visualization and data analysis algorithms. However, manually distributing the processing load among CPU cores and GPUs is a complex and error-prone task; HyperFlow is a dataflow architecture that aims to handle this distribution automatically.
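
As a rough sketch of the underlying idea only (HyperFlow's actual architecture and API differ), the toy Python below models each execution resource as a worker pool and greedily dispatches tasks to the least-loaded one:

    import queue
    import threading

    class Device:
        """Toy stand-in for an execution resource (CPU pool, GPU stream)."""
        def __init__(self, name, workers):
            self.name = name
            self.tasks = queue.Queue()
            for _ in range(workers):
                threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            while True:
                fn, args = self.tasks.get()
                fn(*args)
                self.tasks.task_done()

        def submit(self, fn, *args):
            self.tasks.put((fn, args))

    def dispatch(devices, fn, *args):
        # Greedy heuristic: route each task to the device with the
        # shortest queue. Real schedulers also weigh transfer costs
        # and relative device speed.
        target = min(devices, key=lambda d: d.tasks.qsize())
        target.submit(fn, *args)

    cpu = Device("cpu", workers=4)   # models a multi-core CPU
    gpu = Device("gpu", workers=1)   # models a single GPU stream
    for i in range(8):
        dispatch([cpu, gpu], lambda i=i: print("processed block", i))
    cpu.tasks.join()
    gpu.tasks.join()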

Peptide Search Using Hadoop

Under NIH grant HG006091, Insilicos is extending the popular X!Tandem proteomics search engine to work in Hadoop, and has run experiments on the CluE platform through coordination with PI Howe.
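
The parallelization pattern is, roughly, to partition spectra across mappers and score each spectrum against a shared peptide index. The Hadoop Streaming-style mapper below is a hypothetical Python sketch of that pattern; X!Tandem's actual scoring model, file formats, and the project's real job layout are not shown.

    import sys

    def load_peptide_index(path="peptides.txt"):
        # Assumed side file: one candidate peptide sequence per line,
        # shipped to each mapper (e.g., via the distributed cache).
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    def score(peaks, peptide):
        # Placeholder for a real scoring function (e.g., X!Tandem's
        # hyperscore); here we just compare against peptide length.
        return -abs(len(peaks) - len(peptide))

    def main():
        peptides = load_peptide_index()
        for line in sys.stdin:
            # Assumed input: spectrum_id TAB comma-separated peak m/z values.
            spectrum_id, peak_field = line.rstrip("\n").split("\t")
            peaks = [float(p) for p in peak_field.split(",")]
            best = max(peptides, key=lambda pep: score(peaks, pep))
            # Emit one key/value pair; a reducer can then collect the
            # top match per spectrum across all mappers.
            print(spectrum_id, best, score(peaks, best), sep="\t")

    if __name__ == "__main__":
        main()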

Astronomical Image Co-addition Using Hadoop

Using the CluE platform, Keith Wiley, a research engineer in the Astronomy Department, has led the development of an image query pipeline that couples Hadoop with a relational database for pre-filtering.
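
In outline, the relational pre-filter selects only those images whose footprints overlap the query region, and the cluster then stacks the selected images pixel by pixel. The Python sketch below illustrates that division of labor with hypothetical table and column names; it is not the project's code.

    import sqlite3
    import numpy as np

    def prefilter(db_path, ra_min, ra_max, dec_min, dec_max):
        """SQL pre-filter: fetch only images whose bounding box overlaps
        the query region, avoiding a full scan of the image archive."""
        con = sqlite3.connect(db_path)
        rows = con.execute(
            "SELECT path FROM images WHERE ra_max >= ? AND ra_min <= ? "
            "AND dec_max >= ? AND dec_min <= ?",
            (ra_min, ra_max, dec_min, dec_max)).fetchall()
        con.close()
        return [r[0] for r in rows]

    def coadd(image_arrays):
        """'Reduce' step: stack the images. A real pipeline would first
        reproject each image onto a common coordinate frame."""
        stack = np.zeros_like(image_arrays[0], dtype=np.float64)
        counts = np.zeros_like(stack)
        for img in image_arrays:
            mask = ~np.isnan(img)
            stack[mask] += img[mask]
            counts[mask] += 1
        return stack / np.maximum(counts, 1)

    # Usage: images = [load_fits(p) for p in prefilter("archive.db", ...)]
    #        deep_image = coadd(images)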

HaLoop paper one of Best of VLDB 2010!

Our VLDB 2010 paper on the HaLoop system was selected for inclusion in a special issue of the VLDB Journal highlighting the best papers from the VLDB 2010 conference.

We have submitted an extended version of the conference paper that adds more experiments on fault tolerance and a refined API.

HaLoop code released!

We have released an initial version of the HaLoop infrastructure for efficient iterative distributed computations. The platform is based on Hadoop, but adds a new programming model for multi-step loop bodies and non-trivial termination conditions; several inter-iteration caching mechanisms, with a new scheduling algorithm to exploit them; and fault-tolerance capabilities via automatic cache reconstruction.
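
In shape (though not in its actual Java-on-Hadoop API), the programming model looks roughly like the sketch below: a loop body, a user-supplied termination test, and loop-invariant input that is cached rather than re-read each iteration. All names here are hypothetical.

    # Shape-only sketch of the programming model; HaLoop's real API is
    # Java and runs on Hadoop. The "cached" value stands in for HaLoop's
    # inter-iteration caches, which avoid re-reading loop-invariant data
    # from the distributed file system every iteration.
    def iterate(loop_body, state, invariant, done, max_iters=100):
        cached = invariant
        for i in range(max_iters):
            new_state = loop_body(state, cached)
            if done(state, new_state):   # non-trivial termination condition
                return new_state, i + 1
            state = new_state
        return state, max_iters

    # Example: PageRank-style iteration until ranks stop changing.
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # loop-invariant graph

    def body(ranks, graph):
        out = {n: 0.15 for n in graph}
        for n, targets in graph.items():
            for t in targets:
                out[t] += 0.85 * ranks[n] / len(targets)
        return out

    def converged(old, new, eps=1e-6):
        return all(abs(old[n] - new[n]) < eps for n in old)

    ranks, steps = iterate(body, {n: 1.0 for n in links}, links, converged)
    print(steps, ranks)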

Get the code!

Massively Parallel Visualization in the Cloud: Challenges and Opportunities

Large-scale visualization systems have historically been designed for "throwing datasets": pushing pre-conditioned data through the graphics pipeline as quickly as possible. Increasingly, however, scalable data manipulation, restructuring, and querying are becoming important features of a comprehensive exploratory visualization system. Simultaneously, cloud computing providers are gaining market share (and mind share) by renting access to low-cost, but typically low-end, distributed computing facilities.

HaLoop: Efficient Iterative Data Processing on Large Clusters

HaLoop, developed by Yingyi Bu, Bill Howe, and Magda Balazinska at the University of Washington, is an extension to Hadoop that supports iterative MapReduce programs. Many machine learning and data mining applications run iteratively until they converge on an answer. These applications are currently implemented with a separate "driver program" outside the MapReduce framework. We hypothesize that building recursion into the framework itself can offer significant improvements in both performance and ease of use.
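
For contrast, here is the status quo described above as a runnable Python toy: an external driver resubmits one job per iteration (mocked as a plain function call) and re-reads the loop-invariant data set every time. Eliminating that repeated setup and I/O is what building recursion into the framework is meant to buy.

    import json
    import os
    import tempfile

    def run_job(data_path, centers):
        # One k-means iteration as a mock MapReduce job: "map" assigns
        # points to their nearest center, "reduce" recomputes each center
        # as a mean. Note the invariant data is re-read on every call,
        # as a real external driver re-reads its input from HDFS.
        points = json.load(open(data_path))
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        return [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]

    data_path = os.path.join(tempfile.mkdtemp(), "points.json")
    json.dump([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], open(data_path, "w"))

    centers = [0.0, 5.0]
    for step in range(100):                       # the external driver loop
        new_centers = run_job(data_path, centers)
        if max(abs(a - b) for a, b in zip(centers, new_centers)) < 1e-9:
            break                                 # convergence checked in driver
        centers = new_centers
    print(step + 1, centers)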

Evaluating Client+Cloud Architectures

Science is becoming data-intensive, requiring new software architectures that can exploit resources at all scales: local GPUs for interactive visualization, server-side multi-core machines with fast processors and large memories, and scalable, pay-as-you-go cloud resources. Architectures that seamlessly and flexibly exploit all three platforms are largely unexplored. Informed by a long-term collaboration with ocean scientists, we articulate a suite of representative visual data analytics workflows and use them to design and implement a multi-tier immersive visualization system.
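
As a toy illustration of the kind of placement decision such an architecture must make (thresholds and tier names are invented, not those of the system described here):

    # Hypothetical placement heuristic for a three-tier client+cloud
    # design: small interactive tasks stay on the local GPU, mid-sized
    # work goes to a shared server, and large batch jobs go to elastic
    # cloud resources.
    from enum import Enum

    class Tier(Enum):
        LOCAL_GPU = "local GPU (interactive visualization)"
        SERVER = "server (multi-core, large memory)"
        CLOUD = "cloud (scalable, pay-as-you-go)"

    def place(task_bytes, interactive):
        if interactive and task_bytes < 1 << 30:   # under ~1 GiB: local
            return Tier.LOCAL_GPU
        if task_bytes < 100 << 30:                 # under ~100 GiB: server
            return Tier.SERVER
        return Tier.CLOUD                          # otherwise: scale out

    print(place(200 << 20, interactive=True))    # Tier.LOCAL_GPU
    print(place(500 << 30, interactive=False))   # Tier.CLOUD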
