Intelligent CIO Europe Issue 06 | Page 65

CASE STUDY

“We had an HPC (high-performance computing) environment with shared storage that supported all our research and software development activity,” noted James Blackburn, Head of Data Engineering at Man AHL. “At times, you would have people manipulating large numbers of files at the same time as quants were trying to run large simulations. Our storage system just didn’t have the IOPS to support all those demands simultaneously, which meant some people would get frustrated by the response times. We needed something that would provide much greater IOPS and bandwidth.”

Collier noted: “We’ve had poor experiences in the past with expensive – and supposedly highly parallel – storage solutions simply not being able to deliver the bandwidth and IOPS we require. This time, we wanted to make sure we conducted extreme due diligence to explore every possibility.”

Blackburn was charged with managing the search for a new storage infrastructure. In addition to high IOPS and bandwidth, “we also wanted the new storage to be highly scalable,” Collier added. “We’re an extremely agile business and need to be able to add terabytes of storage almost on a whim. We just want to be able to start where we are, then add capacity and IOPS as necessary. In other words, we want something that scales linearly.”

In addition, Collier noted: “We are not in the storage business and do not want to dedicate multiple engineers to the job of building and maintaining bespoke storage solutions, or to administering complex-to-manage vendor products. That’s not what we’re best at; that’s not where the value-add from our technology lies.”

The search for a new solution led to Pure Storage

Knowing that its need for high bandwidth and IOPS meant next-generation storage, the team investigated suppliers of all-flash products. “Pure Storage is very well-respected, both by organisations like Gartner and by users,” Collier said. “More often than not, if you Google a storage vendor’s name, you’ll find horrific stories about data loss and other problems. But the customers who have purchased and deployed Pure take the time to write Internet posts about their positive experiences. That was a good sign for us from the start.”

Based on its initial due diligence, the Man AHL team arranged for a proof of concept (PoC) trial of a Pure FlashBlade system from Pure Storage.

FlashBlade is the first all-flash storage system purpose-built for modern analytics, architected from the ground up to deliver a powerful cloud-era data platform that is fast, compact, infinitely scalable and easy to manage. FlashBlade is designed for customers who need parallel storage to support data-intensive applications involving parallel processing, such as the type of financial simulations conducted by Man AHL.

Man AHL’s FlashBlade configuration has eight blades, each with 52TB raw capacity. Factoring in the overhead of the Purity OS and erasure coding, Man AHL is achieving 329.14 TiB of effective capacity thanks to an aggregate data reduction of 1.5:1 for all workloads running on FlashBlade.

“In the proof of concept phase, we pushed the Pure Storage system to the max,” Blackburn reported. “We tested total throughput for large I/O, maximum IOPS and other metrics. Overall, the system scaled as promised. We achieved 6GB/s bandwidth for large I/O reads and 3GB/s for writes. And the 600,000 IOPS we achieved were far more than we ever had before.”

Collier added: “Our quants want to test a model, get the results and then test another one, and another one all day long. So a 10x–20x improvement in performance can be a game-changer when it comes to creating a time-to-market advantage for us.”

The greatest benefit from installing Pure FlashBlade, Collier said, “is significantly improved productivity for the team and accelerated time-to-market for new trading ideas.”

The PoC involved workloads from across the firm’s research teams, “and we picked jobs that were very demanding on I/O, like our Jenkins builds and Spark users,” Blackburn said. “As we progressed, we loaded more and more concurrent workloads and did benchmark testing to see how close we were to the limits of the system. What we found is that FlashBlade gives us significant headroom beyond what we need for the business today. This gives us confidence knowing we can simply add a blade to gain more capacity and preserve the investment we already have.”

A month into the PoC, the firm’s entire research department had been migrated to FlashBlade.

In addition, Man AHL runs a concurrent 50TB Mongo database containing order-book data from stock exchanges and other sources. “During the PoC, we put a replica set of the Mongo data on FlashBlade using an NFS mount. It worked so well that we left it there,” Collier said.

Performance improvements create time-to-market advantage

The impact of the new storage infrastructure was noticed immediately. “Many of our researchers have found that the introduction of FlashBlade has made it easier to use Spark for performing multiple simulations. One of them experienced a 10x-to-20x improvement in throughput for his Spark workloads, compared to the previous storage system,” Blackburn said.

Beyond meeting the high-performance requirements of its core simulation applications, Blackburn said, “the real benefit of the FlashBlade is the consistent I/O and metadata access. To have a shared substrate of I/O that is consistent throughout the day makes users very happy. Storage used to be the bottleneck and now that has been removed.”
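
For readers who want to relate the capacity figures quoted in this case study to one another (eight blades of 52TB raw, 329.14 TiB effective, 1.5:1 data reduction), the arithmetic can be sketched in a few lines of Python. This is only an illustrative reading of the numbers, not a vendor sizing formula. It assumes that "52TB" means decimal terabytes, and that effective capacity equals usable physical capacity multiplied by the data reduction ratio. The function names are hypothetical and do not come from Pure Storage or Man AHL.

```python
# Illustrative capacity arithmetic for the figures quoted in the article.
# Assumptions (not from the source): raw capacity is in decimal TB, and
# effective capacity = usable physical capacity * data reduction ratio.

TB = 10**12   # decimal terabyte, in bytes
TIB = 2**40   # binary tebibyte, in bytes


def raw_capacity_tib(blades: int, tb_per_blade: float) -> float:
    """Total raw capacity of all blades, converted from TB to TiB."""
    return blades * tb_per_blade * TB / TIB


def implied_usable_tib(effective_tib: float, reduction_ratio: float) -> float:
    """Physical usable capacity implied by an effective capacity and a ratio."""
    return effective_tib / reduction_ratio


raw_tib = raw_capacity_tib(blades=8, tb_per_blade=52)
usable_tib = implied_usable_tib(effective_tib=329.14, reduction_ratio=1.5)

# Share of raw capacity not available as usable space under these assumptions
# (operating system, erasure coding and any other reserved space combined).
implied_overhead = 1 - usable_tib / raw_tib

print(f"raw capacity:       {raw_tib:.2f} TiB")
print(f"implied usable:     {usable_tib:.2f} TiB")
print(f"implied overhead:   {implied_overhead:.1%}")
```

Under a different reading of the quoted units or of how the reduction ratio is applied, the implied overhead would change, so the output is best treated as a consistency check on the published figures rather than a statement about the system itself.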