Map-D: MIT spinout takes big data real-time with GPUs

We all know that with enough expensive servers, big companies can crunch through massive amounts of data. In some cases, like trending search reports, dedicated computing resources can even make large-scale analysis happen in realtime. Now, startup Map-D has harnessed the power of GPUs to allow realtime analysis and visualization of huge datasets with a much smaller hardware investment. Winner of this year’s Emerging Companies Summit at Nvidia’s GPU Technology Conference (GTC), Map-D wowed the judges and attendees (they got my vote) with a compelling demo that lets hundreds of simultaneous users analyze tweets from around the world. Even as a canned demo it would have been cool, but the good news is that the system is live and public, so you can play with it yourself.

An in-memory database built around the GPU

Map-D CEO Thomas Graham accepts check from Nvidia for winning the Emerging Company Showcase at GTC 2014

Described in simplest terms, Map-D starts out as an in-memory, SQL-compatible database. Its genius lies in a radically new architecture that uses both CPUs and GPUs, with high-performance GPU memory serving as a cache for the most frequently used data. CPU memory then serves as a larger, next-level cache. Map-D also organizes its data by column, which lets it make more effective use of the memory it has than a traditional row-based organization would.
Map-D’s distributed architecture even allows it to scale across multiple nodes for extremely large databases, and it supports the realtime insertion of new data. This realtime updating is likely one of the reasons that companies, including Facebook and PayPal, have expressed interest in evaluating Map-D’s product for building realtime analytic systems. The tweet visualization screenshot below links to the live demo, so you can experience some of the power and flexibility of Map-D for yourself. Note that the tweets in the demo come from a historical dataset and are not being updated in realtime.
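
To make the column organization and the two-level caching idea more concrete, here is a minimal Python sketch. It is not Map-D's code: the class name `TieredColumnCache`, the `gpu_slots` and `cpu_slots` parameters, and the least-recently-used eviction policy are all assumptions made for illustration, and ordinary Python dictionaries stand in for GPU memory, CPU memory, and slower storage.

```python
from collections import OrderedDict


class TieredColumnCache:
    """Two-level LRU cache over named columns (illustrative only)."""

    def __init__(self, backing_store, gpu_slots, cpu_slots):
        self.backing_store = backing_store  # stands in for slower storage
        self.gpu = OrderedDict()            # small, fastest tier (assumed LRU)
        self.cpu = OrderedDict()            # larger, next-level tier (assumed LRU)
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.loads = {"gpu": 0, "cpu": 0, "store": 0}

    def get(self, name):
        """Return a column, recording which tier served it."""
        if name in self.gpu:
            self.gpu.move_to_end(name)      # mark as most recently used
            self.loads["gpu"] += 1
            return self.gpu[name]
        if name in self.cpu:
            column = self.cpu.pop(name)
            self.loads["cpu"] += 1
        else:
            column = self.backing_store[name]
            self.loads["store"] += 1
        self._promote(name, column)
        return column

    def _promote(self, name, column):
        """Place a column in the fast tier, demoting the coldest one if full."""
        self.gpu[name] = column
        if len(self.gpu) > self.gpu_slots:
            old_name, old_column = self.gpu.popitem(last=False)
            self.cpu[old_name] = old_column
            if len(self.cpu) > self.cpu_slots:
                self.cpu.popitem(last=False)  # drop from cache; store still has it


def count_where(cache, column_name, predicate):
    """Scan a single column; other columns are never touched."""
    return sum(1 for value in cache.get(column_name) if predicate(value))


# Each key is one column, stored contiguously rather than row by row.
tweets = {
    "lang": ["en", "es", "en", "ja"],
    "retweets": [3, 0, 12, 7],
    "text": ["a", "b", "c", "d"],
}
cache = TieredColumnCache(tweets, gpu_slots=1, cpu_slots=1)
english = count_where(cache, "lang", lambda v: v == "en")
popular = count_where(cache, "retweets", lambda v: v > 5)
```

Because the data is held column by column, neither query ever loads the `text` column, which is the memory saving the column layout is meant to provide. In a real system the hot tier would hold device buffers rather than Python lists, and the eviction policy could be something other than LRU.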

High performance through integration

A big part of Map-D’s amazing visualization performance comes from its integration of database, analytics, and visualization into a single package. Because all three components are integrated, data can stay in memory, even on the GPU, while it is queried, analytics are run, and the results are visualized. Traditional approaches that use separate applications typically require moving the data between applications, and often in and out of memory, which of course slows things down.
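
The idea of running query, analytics, and visualization over the same in-memory data can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `run_pipeline`, the column names, and the grid-binning step that stands in for rendering are hypothetical and not taken from Map-D.

```python
def run_pipeline(columns, lang, cell_size):
    """Query, aggregate, and bin for display without copying or exporting data."""
    # Query: select matching row indices; the columns themselves stay in place.
    rows = [i for i, value in enumerate(columns["lang"]) if value == lang]

    # Analytics: aggregate directly over the same in-memory column.
    total_retweets = sum(columns["retweets"][i] for i in rows)

    # Visualization: bin the selected points into grid cells for a heat map.
    grid = {}
    for i in rows:
        cell = (int(columns["lon"][i] // cell_size),
                int(columns["lat"][i] // cell_size))
        grid[cell] = grid.get(cell, 0) + 1
    return total_retweets, grid


# Hypothetical sample data, one list per column.
columns = {
    "lang": ["en", "en", "fr", "en"],
    "retweets": [3, 5, 1, 2],
    "lon": [-71.1, -71.3, 2.3, -122.4],
    "lat": [42.4, 42.1, 48.9, 37.8],
}
total, grid = run_pipeline(columns, "en", cell_size=10)
```

All three stages read the same lists, so nothing is serialized or handed to another process between steps. With separate tools, each arrow in that chain would typically involve an export and a reload.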

Next steps: A supercomputer in your pocket

Reaching into the future, Map-D also claims that its architecture is perfect for running on the increasingly powerful SoCs found in mobile devices. Right now it may be hard to imagine having enough data on your mobile device to need to run analytics on it. However, as memory continues to become more dense and less expensive, it is only a matter of time before our mobile devices have their own big data requirements — especially for processing-heavy mobile applications like medical diagnosis and image recognition. Instead of being tied to the cloud, someday those data-and-compute-intensive applications may truly be able to go mobile.