POPULARITY
Boosting LLM/RAG Workflows & Scheduling w/ Composable Memory and Checkpointing // MLOps Podcast #270 with Bernie Wu, VP Strategic Partnerships/Business Development of MemVerge.

// Abstract
Limited memory capacity hinders the performance and potential of research and production environments that use Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques. This discussion explores how industry-standard CXL memory can be configured as a secondary, composable memory tier to alleviate this constraint. We highlight recent work integrating this novel class of memory into LLM/RAG/vector database frameworks and workflows. Disaggregated shared memory is envisioned to offer high-performance, low-latency caches for model/pipeline checkpoints, KV caches during distributed inferencing, LoRA adapters, and in-process data for heterogeneous CPU/GPU workflows. We expect to showcase these types of use cases in the coming months.

// Bio
Bernie is VP of Strategic Partnerships/Business Development for MemVerge. His focus has been building partnerships in the AI/ML, Kubernetes, and CXL memory ecosystems. He has 25+ years of experience as a senior executive for data center hardware and software infrastructure companies, including Conner/Seagate, Cheyenne Software, Trend Micro, FalconStor, Levyx, and MetalSoft. He is also on the Board of Directors of Cirrus Data Solutions. Bernie has a BS/MS in Engineering from UC Berkeley and an MBA from UCLA.

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: www.memverge.com
Accelerating Data Retrieval in Retrieval-Augmented Generation (RAG) Pipelines using CXL: https://memverge.com/accelerating-data-retrieval-in-rag-pipelines-using-cxl/

--------------- ✌️ Connect With Us ✌️ ---------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Bernie on LinkedIn: https://www.linkedin.com/in/berniewu/
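The KV-cache use case from the abstract is the easiest to picture: instead of discarding or recomputing attention state when the fast (GPU/local) memory fills up, evicted blocks spill to a larger, slower secondary tier and are pulled back on reuse. The sketch below illustrates only that tiering policy; the class name and the plain dictionary standing in for a CXL-backed pool are hypothetical, not MemVerge or CXL APIs.

```python
# Minimal sketch of a two-tier KV cache: a small "fast" tier (e.g. GPU/HBM)
# backed by a larger "secondary" tier (e.g. CXL-attached memory).
# Names are illustrative assumptions, not MemVerge APIs.
from collections import OrderedDict


class TieredKVCache:
    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity   # max entries kept in the fast tier
        self.fast_tier = OrderedDict()       # LRU-ordered: sequence_id -> kv blocks
        self.secondary_tier = {}             # larger, slower pool (stand-in for CXL memory)

    def put(self, seq_id: str, kv_blocks: bytes) -> None:
        """Insert or refresh a sequence's KV blocks in the fast tier."""
        self.fast_tier[seq_id] = kv_blocks
        self.fast_tier.move_to_end(seq_id)
        # Spill least-recently-used entries to the secondary tier instead of dropping them.
        while len(self.fast_tier) > self.fast_capacity:
            old_id, old_blocks = self.fast_tier.popitem(last=False)
            self.secondary_tier[old_id] = old_blocks

    def get(self, seq_id: str):
        """Fetch KV blocks, promoting from the secondary tier on a fast-tier miss."""
        if seq_id in self.fast_tier:
            self.fast_tier.move_to_end(seq_id)
            return self.fast_tier[seq_id]
        if seq_id in self.secondary_tier:
            blocks = self.secondary_tier.pop(seq_id)
            self.put(seq_id, blocks)          # promote back into the fast tier
            return blocks
        return None                           # full miss: caller must recompute (prefill)


# Example: with room for only two sequences in the fast tier, older KV state
# spills to the secondary tier rather than being recomputed from scratch.
cache = TieredKVCache(fast_capacity=2)
for i in range(3):
    cache.put(f"seq-{i}", f"kv-for-seq-{i}".encode())
assert "seq-0" in cache.secondary_tier        # spilled, not lost
assert cache.get("seq-0") is not None         # promoted back on reuse
```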
In this episode of Conversations in the Cloud, we’re joined by Matt Meinel, SVP of Business Development and Solutions Architecture at Levyx, and Rick Walsworth, VP of Product and Solution Marketing at Vexata. The companies worked on a joint solution using Vexata’s ultra-fast NVMe Flash arrays in conjunction with Levyx’s low-latency software to achieve increased performance with less infrastructure, resulting in a 300% improvement in price/performance over the industry’s next best alternative.

At the core of Levyx’s low-latency software is the company’s indexing algorithm, which integrates with Vexata’s NVMe-based systems to make the best use of their IO controller. After studying bottlenecks in the IO stack, Vexata re-architected the stack around a data path accelerated by FPGAs to deliver fast, low-latency services to applications. For financial use cases, the companies can break through IO bottlenecks by adding Intel FPGA-based calculations for derivative computations.

Looking to the future, the companies see real solutions being built for the growing AI industry. Solutions from Levyx and Vexata take advantage of heavy compute processing power to automate tasks, and the companies are working to optimize their solutions for machine learning and deep learning use cases.

A full IO stack benchmark report can be found at https://stacresearch.com/levyx. For more information on Levyx solutions, visit www.levyx.com or contact sales@levyx.com. For more information on Vexata, visit www.vexata.com or contact sales@vexata.com, and follow them on Twitter at https://twitter.com/vexatacorp.
Matt Meinel, Senior Vice President of Sales, Business Development and Solutions Architecture at Levyx, joins this episode of Conversations in the Cloud to discuss leading-edge financial services solutions that leverage high-performance data store technology. Levyx’s software-only solutions provide fundamental building blocks for next-generation software-defined storage infrastructure. Jake and Matt talk about Levyx’s software-defined data processors, Helium* and Xenon*, as well as the work Levyx has been doing on financial model backtesting in capital markets with Intel® Optane™ technology and the Intel® Programmable Acceleration Card (Intel® PAC) with Intel® Arria® 10 GX FPGA. To learn more about Levyx, go to www.levyx.com or follow them on Twitter at https://twitter.com/levyxinc.
Bernie Wu, Chief Business Development Officer at Levyx, joins us on this episode of Conversations in the Cloud to discuss how Levyx’s software and Intel® Optane™ technology can help usher in a new age of real-time persistent computing. Levyx aims to make big data operations and analytics faster, simpler, and cheaper through persistent computing. Bernie talks about the Levyx software portfolio, including Helium, Xenon, and the Levyx-Spark Connector. He also highlights that Levyx’s Helium Key-Value Store, combined with a pair of Intel® Optane™ drives, achieved over 20 million operations/sec with 99th-percentile latencies under 21 microseconds. To learn more, go to www.levyx.com or follow Levyx on Twitter at https://twitter.com/levyxinc.
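To make those figures concrete: throughput is total operations divided by wall-clock time, and "99th-percentile latency" is the per-operation latency that 99% of requests stay under. Below is a rough sketch of that measurement method, with a plain Python dict standing in for a key-value store; it is not the Helium API, and the numbers it prints are illustrative only.

```python
# Minimal sketch of measuring ops/sec and p99 latency against a key-value store.
# A plain dict stands in for the store; results are illustrative, not Levyx/Optane figures.
import random
import time


def benchmark_kv(store: dict, num_ops: int = 100_000):
    latencies_ns = []
    keys = [f"key-{i}" for i in range(10_000)]
    start = time.perf_counter()
    for _ in range(num_ops):
        key = random.choice(keys)
        t0 = time.perf_counter_ns()
        if random.random() < 0.5:
            store[key] = b"value"        # write path
        else:
            store.get(key)               # read path
        latencies_ns.append(time.perf_counter_ns() - t0)
    elapsed = time.perf_counter() - start
    ops_per_sec = num_ops / elapsed
    p99_us = sorted(latencies_ns)[int(0.99 * num_ops)] / 1_000  # 99th percentile, in microseconds
    return ops_per_sec, p99_us


if __name__ == "__main__":
    ops, p99 = benchmark_kv({})
    print(f"{ops:,.0f} ops/sec, p99 latency {p99:.1f} us")
```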