The Red Hat Summit

Posted by Saul Youssef

One of the inspirational things about being in Boston is that we’re swimming in a sea of technology companies and startups, including Intel, Dell EMC, Google, NVIDIA, Amazon, Facebook, IBM, Microsoft, MathWorks, Wolfram Research, Mellanox, Cisco and General Electric, just to name a few off the top of my head. For NESE, it’s especially fortunate that Red Hat is in town and has a partnership with Boston University. Because of this, we get to work with some of the top Red Hat engineers, including Sage Weil’s Ceph team.

On May 7-9, Red Hat held its annual Red Hat Summit at the convention center down in the Seaport district. Scott and I were on a panel that Uday organized about practical experiences with Ceph. We told them all about NESE: motivation, strategy and first deployment. Much of the panel was about success stories and Ceph “journeys” (this was new for me…everyone is “on a journey” all of a sudden). I was particularly impressed by an Israeli fellow named Idan who described migrating his university from proprietary storage appliances to open source and a single Ceph cluster. The part I liked best was when he explained that they completely replaced both their Ceph cluster hardware and their hardware vendor by swapping things out piece by piece, all with no downtime. Another big trend that became apparent during the panel is data science researchers migrating from specialized Hadoop and Spark clusters to larger, more general-purpose object stores. This is likely to create a flood of demand for NESE, since every one of our five collaborating institutions has a major new data science institute, initiative or center.

Sage, Scott and I also gave a talk about the road map and strategy for Ceph, with NESE included as an example. Sage described two big directions. The first is “Project Crimson”, a roughly two-year redesign of Ceph optimized for flash and 3D XPoint storage, hardware that is expected to take over from spinning drives over the next few years. With flash-based storage, the performance bottlenecks are actually in the CPU rather than in the storage hardware itself. Sage explained that this means the usual multi-threaded software design is going to be bottlenecked by context switching. To avoid context switching, they are pinning memory to individual cores in multi-core machines, using message passing to communicate between threads, and using protocols that bypass the kernel and communicate directly with storage devices. Naturally, this is a huge amount of work.
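To make the shared-nothing idea a bit more concrete, here is a toy sketch in Python. It is not Ceph or Crimson code, and every name in it (Shard, route, request, start_shards, stop_shards) is made up for illustration. Each worker thread owns its own slice of the data, is pinned to a core on a best-effort basis where the operating system allows it, and is reached only through a message queue, so no locks are needed on the data itself. Python threads do not run in parallel, so this only shows the structure, not the performance, and it does not attempt the kernel-bypass part at all.

```python
import os
import queue
import threading
import zlib

NUM_SHARDS = 4  # hypothetical shard count, one per core in the real design


class Shard(threading.Thread):
    """One worker that owns a private key/value store and an inbox."""

    def __init__(self, shard_id, cpu=None):
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.cpu = cpu
        self.inbox = queue.Queue()
        self.store = {}  # only this thread writes here, so no lock is needed

    def run(self):
        # Best-effort pinning of this thread to one core (Linux only).
        if self.cpu is not None and hasattr(os, "sched_setaffinity"):
            try:
                os.sched_setaffinity(0, {self.cpu})
            except OSError:
                pass  # pinning is an optimization, not a requirement
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown signal
                break
            op, key, value, reply = msg
            if op == "put":
                self.store[key] = value
                reply.put(True)
            else:  # treat anything else as "get"
                reply.put(self.store.get(key))


def route(shards, key):
    """Pick the shard that owns a key using a stable hash."""
    return shards[zlib.crc32(key.encode()) % len(shards)]


def request(shards, op, key, value=None):
    """Send a message to the owning shard and wait for its reply."""
    reply = queue.Queue(maxsize=1)
    route(shards, key).inbox.put((op, key, value, reply))
    return reply.get()


def start_shards(n=NUM_SHARDS):
    cpus = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
    shards = [Shard(i, cpus[i % len(cpus)] if cpus else None) for i in range(n)]
    for shard in shards:
        shard.start()
    return shards


def stop_shards(shards):
    for shard in shards:
        shard.inbox.put(None)
    for shard in shards:
        shard.join()


if __name__ == "__main__":
    shards = start_shards()
    request(shards, "put", "obj-1", b"hello")
    print(request(shards, "get", "obj-1"))
    stop_shards(shards)
```

The point of the design is that a key is only ever touched by the one thread that owns it, so threads never contend for the same data and coordinate purely by passing messages.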

The second big Ceph-related trend is integration with Kubernetes and OpenShift. The idea is to containerize the Ceph daemons (MON, OSD, MDS,…) and create multi-cloud Ceph clusters, so that storage could, for example, transparently migrate from NESE into multiple commercial clouds depending on economics and performance needs, or to provide coverage during downtime. This would certainly be a big strategic shift.

There were many interesting booths, with plenty of inspiration for control room, data science and AI software. – Saul