CASE STUDY
“We had an HPC (high-performance computing) environment with shared storage that supported all our research and software development activity,” noted James Blackburn, Head of Data Engineering at Man AHL. “At times, you would have people manipulating large numbers of files at the same time as quants were trying to run large simulations. Our storage system just didn’t have the IOPS to support all those demands simultaneously, which meant some people would get frustrated by the response times. We needed something that would provide much greater IOPS and bandwidth.”
Collier noted: “We’ve had poor experiences in the past with expensive – and supposedly highly parallel – storage solutions simply not being able to deliver the bandwidth and IOPS we require. This time, we wanted to make sure we conducted extreme due diligence to explore every possibility.”
Blackburn was charged with managing the search for a new storage infrastructure. In addition to high IOPS and bandwidth, “we also wanted the new storage to be highly scalable,” Collier added. “We’re an extremely agile business and need to be able to add terabytes of storage almost on a whim. We just want to be able to start where we are, then add capacity and IOPS as necessary. In other words, we want something that scales linearly.”
Collier also noted: “We are not in the storage business and do not want to dedicate multiple engineers to the job of building and maintaining bespoke storage solutions, or to administering complex-to-manage vendor products. That’s not what we’re best at, and that’s not where the value-add from our technology lies.”
The search for a new solution led to Pure Storage
Knowing that its need for high bandwidth and IOPS called for next-generation storage, the team investigated suppliers of all-flash products. “Pure Storage is very well respected, both by organisations like Gartner and by users,” Collier said. “More often than not, if you Google a storage vendor’s name, you’ll find horrific stories about data loss and other problems. But the customers who have purchased and deployed Pure take the time to write Internet posts about their positive experiences. That was a good sign for us from the start.”
Based on its initial due diligence, the Man AHL team arranged a proof of concept (PoC) trial of a Pure FlashBlade system from Pure Storage. Pure Storage describes FlashBlade as the first all-flash storage system purpose-built for modern analytics, architected from the ground up to deliver a powerful cloud-era data platform that is fast, compact, infinitely scalable and easy to manage. FlashBlade is designed for customers who need parallel storage to support data-intensive applications involving parallel processing, such as the type of financial simulations conducted by Man AHL.
Man AHL’s FlashBlade configuration has eight blades, each with 52 TB of raw capacity. After factoring in the overhead of the Purity OS and erasure coding, Man AHL is achieving 329.14 TiB of effective capacity, thanks to an aggregate data reduction ratio of 1.5:1 across all workloads running on FlashBlade.
“In the proof of concept phase, we pushed the Pure Storage system to the max,” Blackburn reported. “We tested total throughput for large I/O, maximum IOPS and other metrics. Overall, the system scaled as promised. We achieved 6 GB/s of bandwidth for large I/O reads and 3 GB/s for writes. And the 600,000 IOPS we achieved were far more than we ever had before.”
Collier added: “Our quants want to test a model, get the results and then test another one, and another one, all day long. So a 10x–20x improvement in performance can be a game-changer when it comes to creating a time-to-market advantage for us.” The greatest benefit from installing Pure FlashBlade, Collier said, “is significantly improved productivity for the team and accelerated time-to-market for new trading ideas.”
The PoC involved workloads from across the firm’s research teams, “and we picked jobs that were very demanding on I/O, like our Jenkins builds and Spark users,” Blackburn said. “As we progressed, we loaded more and more concurrent workloads and did benchmark testing to see how close we were to the limits of the system. What we found is that FlashBlade gives us significant headroom beyond what we need for the business today. This gives us confidence, knowing we can simply add a blade to gain more capacity and preserve the investment we already have.”
A month into the PoC, the firm’s entire research department had been migrated to FlashBlade. Alongside these workloads, Man AHL also runs a 50 TB MongoDB database containing order-book data from stock exchanges and other sources. “During the PoC, we put a replica set of the Mongo data on FlashBlade using an NFS mount. It worked so well that we left it there,” Collier said.
Performance improvements create time-to-market advantage
The impact of the new storage infrastructure was noticed immediately. “Many of our researchers have found that the introduction of FlashBlade has made it easier to use Spark for performing multiple simulations. One of them experienced a 10x-to-20x improvement in throughput for his Spark workloads, compared to the previous storage system,” Blackburn said.
Beyond meeting the high-performance requirements of its core simulation applications, Blackburn said, “the real benefit of the FlashBlade is the consistent I/O and metadata access. To have a shared substrate of I/O that is consistent throughout the day makes users very happy. Storage used to be the bottleneck and now that has been removed.”
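As a rough check on the capacity figures quoted for Man AHL’s configuration (eight blades at 52 TB raw each, 329.14 TiB effective at an aggregate 1.5:1 data reduction), the short sketch below backs out the usable capacity before data reduction and the share of raw capacity implied to go to Purity OS and erasure-coding overhead. It rests on two assumptions the article does not state: that “52 TB” means decimal terabytes, and that effective capacity is simply usable capacity multiplied by the reduction ratio. The function name `capacity_breakdown` is hypothetical.

```python
# Sketch only: relates the quoted raw and effective capacity figures.
# Assumptions (not stated in the case study):
#   - "52 TB" per blade is decimal terabytes (10**12 bytes)
#   - effective capacity = usable capacity x data-reduction ratio

TB = 10**12   # decimal terabyte, in bytes
TiB = 2**40   # binary tebibyte, in bytes


def capacity_breakdown(blades: int, raw_tb_per_blade: float,
                       effective_tib: float, reduction_ratio: float) -> dict:
    """Back out usable capacity and implied overhead from quoted figures."""
    # Total raw flash, converted from decimal TB to binary TiB.
    raw_tib = blades * raw_tb_per_blade * TB / TiB
    # Capacity available before data reduction is applied.
    usable_tib = effective_tib / reduction_ratio
    # Fraction of raw capacity not available as usable space.
    overhead_fraction = 1 - usable_tib / raw_tib
    return {
        "raw_tib": raw_tib,
        "usable_tib": usable_tib,
        "overhead_fraction": overhead_fraction,
    }


if __name__ == "__main__":
    result = capacity_breakdown(blades=8, raw_tb_per_blade=52,
                                effective_tib=329.14, reduction_ratio=1.5)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions, about 378 TiB of raw flash yields roughly 219 TiB of usable space, implying that around 42% of raw capacity goes to system and protection overhead. Because the 1.5:1 ratio is an aggregate that depends on the workload mix, the effective figure would shift as the data stored on the system changes.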
INTELLIGENTCIO