PEARC 2023 “Computing for the Common Good”

By: Rion Dooley

Data Machines recently attended the 2023 Practice and Experience in Advanced Research Computing (PEARC) conference in Portland, Oregon, to join members of the OpenStack Scientific SIG in a Birds of a Feather (BoF) session on “Open Cloud Infrastructure for advanced research computing workloads.”

The PEARC conference series is held once a year and is chock-full of highly technical content from HPC administrators, researchers, and application developers. Over the last five years, PEARC’s programming has reflected the pervasive adoption of cloud and container technologies seen across all areas of computational science. As a provider of bespoke private cloud solutions and support services for advanced computing workloads, DMC always finds the topics relevant to its own efforts, and the speakers are some of the top academic researchers in the world. The BoF was a great opportunity to share our experiences operating high-performance clouds and hear from other practitioners about their thoughts on the state of the industry.

We were fortunate to have a capacity crowd on hand for an early 8am start. Participants represented universities, national computing centers, and private companies in North America, Europe, and Australia. Much of the conversation drifted to aspects of tooling, application support, and hybrid use cases. Participants shared the various technology ecosystems used to access their research computing clouds and how OSS fits into the picture. Predefined applications-as-a-service through VM catalogs, such as Exosphere, or traditional Glance image management seemed to be popular. A few participants indicated that their users had begun leveraging underlying HPC storage and networking from their clouds to publish data, results, and provenance for the workloads running on their HPC systems. 

Hot Topics and Trends at PEARC

At the broader PEARC conference, AI disruption was a topic on the minds of many in attendance. From security hardening to hyperscaling, both speakers and attendees acknowledged the ongoing challenge of pinning down the mechanics and meaning of integrating AI into their work. The issues raised could be roughly grouped into three categories: finding value, finding funding, and finding a path to adoption.

Finding value

Most of the attendees we spoke to expressed a general awareness that AI was something they needed to consider for their work, but many had questions about what that meant in their specific situation.

  • Should I focus on AI or ML first?

  • Am I most concerned with optimization or prediction?

  • Is my use case better suited for batch or real-time utilization?

  • How can I benefit from incorporating ChatGPT or a similar LLM into my work?

How AI fits into your use case is unique to your situation. DMC helps many of our clients carry out this analysis and find a strategy that works for them.

Finding (more) funding

For those incorporating AI, questions came up early and often about how to pay for the added computing, licensing, cloud services, and storage costs. 

  • Is accuracy a function of capacity? (i.e., what is the best answer we can get for X dollars?)

  • Are non-traditional accelerators viable for general-purpose AI workloads?

  • Can we utilize GPU multitenancy and virtualization to reduce capital costs?


Planning an infrastructure investment begins with understanding the utilization of your current and planned workloads, available resources, and potential for growth. DMC has a wealth of experience in profiling, analyzing, optimizing, and building custom infrastructure suited to the exact needs of our clients. 


Finding a path

Among those deep in the weeds integrating AI into their current research, a lot of unanswered questions arose over how best to lower the barrier to entry while improving interoperability and reusability. Several existing efforts to clarify this topic were presented during the week. 

  • The ConvoHPC workshop explored ways in which conversational AI could be applied to specific areas of interest within HPC. 

  • A keynote by the Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) project discussed a per-domain approach to tackling the problem of AI integration using a holistic cyberinfrastructure approach. 

  • A BoF session entitled “The Impact of AI Computing Paradigms on Science Gateways and National Compute Resources” invited users to brainstorm different ways that AI could be incorporated into existing science gateway use cases.

The common thread running through all of these sessions was that mature integration requires a shared understanding of domain knowledge between two very different systems. Existing HPC systems are built from the ground up to interact primarily through synchronous command-line interactions. Efforts to expose batch schedulers, workloads, and data on these systems through programmatic interfaces are fairly recent developments, and they still operate at a level that does not translate into anything close to the semantic definitions a large language model requires to provide a conversational interface. Science gateways frequently provide REST interfaces to their functionality; however, the definitions are often incomplete or exist with far less granularity than is required to support anything more than basic “Hello World” tasks.
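As a toy illustration of the gap described above, consider what it takes to expose even a single scheduler action to an LLM as a callable “tool”: the command line must be wrapped in a typed function, and every parameter needs a machine-readable type and description. The sketch below is purely hypothetical; the function name, parameters, and tool schema are our own invention, not drawn from any project mentioned above.

```python
# Hypothetical sketch: wrapping a Slurm-style batch submission in a
# typed function plus the JSON-style tool schema an LLM tool-calling
# interface would need. All names and fields here are illustrative.

def build_sbatch_command(script_path, partition="batch", nodes=1, walltime="01:00:00"):
    """Build (but do not run) an sbatch command line for a job script."""
    return [
        "sbatch",
        f"--partition={partition}",
        f"--nodes={nodes}",
        f"--time={walltime}",
        script_path,
    ]

# The semantic description a conversational interface needs: each
# parameter carries a type and a human-readable meaning -- exactly
# the metadata most HPC CLIs never publish in machine-readable form.
SUBMIT_JOB_TOOL = {
    "name": "submit_job",
    "description": "Submit a batch job script to the cluster scheduler.",
    "parameters": {
        "type": "object",
        "properties": {
            "script_path": {"type": "string", "description": "Path to the job script."},
            "partition": {"type": "string", "description": "Scheduler partition or queue."},
            "nodes": {"type": "integer", "description": "Number of nodes to request."},
            "walltime": {"type": "string", "description": "Wall-clock limit, HH:MM:SS."},
        },
        "required": ["script_path"],
    },
}
```

Multiply this by every scheduler flag, data-movement operation, and gateway endpoint on a production system, and the scale of the semantic-definition problem the BoF participants described becomes clear.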

If this story sounds familiar, it should. The same systems integration problem comes up every few years whenever legacy technology is mapped onto a new computing paradigm. It happened when companies shifted primary access from the mainframe to the PC, again as consumers shifted from PC to mobile, and yet again as fleet management shifted to managing the IoT, just to name a few.

Interfaces aside, simply integrating AI into a legacy system can be a significant challenge from a software engineering perspective. Is the legacy system using continuous integration? Is there a mechanism in place to evaluate model accuracy and performance on the current system? Can the current production environment even be replicated for testing (not a guarantee on HPC systems)? These concerns are all in addition to the basic question of whether the development team has a broad enough skill set to take on the technical debt involved with adding an AI system that likely leverages a completely different toolchain than they usually work with.
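To make the model-evaluation question above concrete, here is a minimal sketch of the kind of CI-style gate a team might add: compare a model's accuracy on a pinned evaluation set against a stored baseline before promoting it. The function names, metric, and tolerance are placeholders, not a prescription for any particular pipeline.

```python
# Hypothetical sketch of a CI regression gate for model accuracy.
# The metric (plain accuracy) and tolerance are illustrative; a real
# pipeline would track whatever metrics matter for its domain.

def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def passes_regression_gate(predictions, labels, baseline, tolerance=0.01):
    """Fail the build if accuracy drops more than `tolerance` below baseline."""
    return accuracy(predictions, labels) >= baseline - tolerance
```

In practice, a check like this would run against a pinned evaluation dataset in the same CI job that builds the deployment artifact, which is exactly where the inability to replicate an HPC production environment for testing bites.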

The struggle is real. This is why efforts like ConvoHPC and ICICLE are underway. 


Integrating AI and ML into your existing technology stack can be a significant undertaking. Establishing an effective proof of concept up front gives you a framework from which to evaluate future performance as your models change and datasets evolve. If you are incorporating some combination of open-source, commercial, and in-house models into your solution, defining a flexible framework by which to add, remove, and evolve them over time is critical. DMC has experience helping our clients find success in integrating, evaluating, and managing catalogs of models and data from a variety of sources. 
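One way to picture the “flexible framework” described above is a thin catalog layer that registers models from mixed sources behind a stable interface, so implementations can be added, swapped, or retired without touching calling code. The class below is a deliberately minimal, hypothetical sketch; all names are our own.

```python
# Hypothetical sketch of a minimal model catalog: models from mixed
# sources (open-source, commercial, in-house) are registered under
# stable names, so callers never depend on a specific implementation.

class ModelCatalog:
    def __init__(self):
        self._models = {}

    def register(self, name, predict_fn, source="in-house"):
        """Add or replace the model backing a stable name."""
        self._models[name] = {"predict": predict_fn, "source": source}

    def retire(self, name):
        """Remove a model from the catalog if it is present."""
        self._models.pop(name, None)

    def predict(self, name, payload):
        """Dispatch to whichever implementation currently backs `name`."""
        return self._models[name]["predict"](payload)

    def sources(self):
        """Report where each registered model came from."""
        return {name: m["source"] for name, m in self._models.items()}
```

A real catalog would add versioning, evaluation metadata, and access control on top, but even this skeleton shows the key design choice: callers bind to a name, not a vendor, which is what makes it possible to evolve models over time.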


It was good to reconnect in person with the Scientific SIG group at PEARC this year. We look forward to following up with the community over the next year to see how they are coming to grips with AI. 
