#╭・artificial-intelligence | NetApp | Page 1

cunning barn Nov 8, 2023, 5:26 PM

#

We've added a new channel to discuss AI/ML and more! To kick things off, if you haven't seen OpenAI's recent keynote, Sam Altman unveiled some exciting new stuff coming to GPT! https://www.youtube.com/live/U9mJuUkhUzk?si=NDlLx751sPGZ4lws

YouTube

OpenAI

OpenAI DevDay, Opening Keynote

Join us for the opening keynote from OpenAI DevDay — OpenAI’s first developer conference. We’re gathering developers from around the world for an in-person da...

ionic sandal Nov 8, 2023, 5:33 PM

#

I am actually interested in PaLM-E from Google.

mortal radish Nov 23, 2023, 1:44 AM

#

Does NetApp have any involvement in genAI?

cunning barn Dec 7, 2023, 1:05 AM

#

https://openai.com/blog/sam-altman-returns-as-ceo-openai-has-a-new-initial-board

Sam Altman returns as CEO, OpenAI has a new initial board

Mira Murati as CTO, Greg Brockman returns as President. Read messages from CEO Sam Altman and board chair Bret Taylor.

cunning barn Feb 12, 2024, 9:21 PM

#

https://www.wsj.com/tech/ai/sam-altman-seeks-trillions-of-dollars-to-reshape-business-of-chips-and-ai-89ab3db0

#

TRILLION? With a T?

lilac trail Feb 15, 2024, 10:11 AM

#

Practical question: how can we leverage the DataOps Toolkit to import existing datasets into containers? Most examples show how to use the toolkit to create new containers (workspaces) with a PVC, but not how to bind existing datasets. This can probably be achieved with Trident, but I don't see how to use the DataOps Toolkit for the same in a K8s environment.

cunning barn Feb 15, 2024, 3:20 PM

#

@hot field is in here, let’s see if the ping gets his attention. If not, I’ll go chase him down for ya

hot field Feb 16, 2024, 5:47 PM

#

lilac trail Practical question : how can we leverage the dataOps toolkit to import existing ...

@lilac trail - The DataOps Toolkit does not include any volume import capabilities. You would need to import the existing volume using tridentctl (instructions: https://docs.netapp.com/us-en/trident/trident-use/vol-import.html). After importing the volume with Trident, you will have a PVC which you can then attach to workspaces using the DataOps Toolkit.
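
A minimal sketch of the import step described above, in Python. It only builds the PVC manifest and the `tridentctl import volume` command line; it does not run anything. The backend name, volume name, storage class, and file name are hypothetical placeholders, so check the linked Trident docs for what your backend actually requires.

```python
import json

def build_import_pvc(pvc_name, storage_class, namespace="default",
                     access_mode="ReadWriteMany"):
    """Return a PVC manifest (as a dict) to hand to `tridentctl import volume -f`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
        },
    }

def build_import_command(backend, volume, pvc_file, trident_namespace="trident"):
    """Return the tridentctl invocation as an argument list (not executed here)."""
    return ["tridentctl", "import", "volume", backend, volume,
            "-f", pvc_file, "-n", trident_namespace]

if __name__ == "__main__":
    # All names below are illustrative assumptions, not values from this thread.
    manifest = build_import_pvc("existing-dataset", "ontap-nas")
    print(json.dumps(manifest, indent=2))
    print(" ".join(build_import_command("ontap_nas_backend", "training_data", "pvc.json")))
```

Once the import succeeds, the resulting PVC is what you would attach to a workspace with the DataOps Toolkit.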

fossil geyser Feb 19, 2024, 10:24 PM

#

https://www.wired.com/story/air-canada-chatbot-refund-policy/

WIRED

Air Canada Has to Honor a Refund Policy Its Chatbot Made Up

The airline tried to argue that it shouldn't be liable for anything its chatbot says.

lilac trail Feb 20, 2024, 11:15 AM

#

hot field @lilac trail - The DataOps Toolkit does not include any volume import c...

Any plans to integrate this functionality into the DataOps Toolkit? The main pain point, in my view, is not building and starting container workspaces but rather handling the backend data used with the container, which is often pre-existing or generated outside the containers used for training jobs. So snapshotting/cloning the existing data, providing versioning, and so on is where the value is.

normal yoke Mar 6, 2024, 5:14 PM

#

Posting this AI related blog here about doing LLM RAG development with NetApp, let me know if you have any questions, comments, or disagreements!

https://community.netapp.com/t5/Tech-ONTAP-Blogs/Accelerating-LLM-Retrieval-Augmented-Generation-development-with-NetApp/ba-p/451230

Accelerating LLM Retrieval-Augmented Generation development with Ne...

NetApp is the intelligent data infrastructure company. In this post, we will take a look at what this means in the context of the now ubiquitous technology that is Generative AI (genAI). Large language models (LLM) are at the center of genAI offerings. These models require months of time and sig...

echo hamlet Mar 14, 2024, 10:57 AM

#

The AI and Analytics exam is so tough 😮‍💨. I've made 7 attempts and still haven't been able to clear it.

torpid hazel Mar 18, 2024, 8:04 PM

#

#┊・events message

Here we goooo!

#

Nvidia GTC! My colleagues are on-site, I'm so jealous 😵‍💫🤯

fossil geyser Mar 18, 2024, 8:10 PM

#

Nice! I'd love to be there

torpid hazel Mar 18, 2024, 8:32 PM

#

@cunning barn you have some space left in your rack?

cunning barn Mar 18, 2024, 8:42 PM

#

I wonder if David Blackwell is there...

#

#

That's insane.

cunning barn Mar 18, 2024, 8:45 PM

#

torpid hazel @cunning barn you have some space left in your rack?

Space? Absolutely. Power/Cooling?

#

https://tenor.com/view/absolutely-not-dan-levy-david-david-rose-schitts-creek-gif-20805019

Tenor

#

130 TB/s?! No way.

cunning barn Mar 18, 2024, 9:30 PM

#

https://tenor.com/view/awesome-yes-yeah-baby-cheer-gif-15859596

Tenor

cunning barn Mar 19, 2024, 3:02 PM

#

https://www.netapp.com/newsroom/press-releases/news-rel-20240318-867369/

cunning barn Mar 19, 2024, 11:00 PM

#

#

Nice throwback to 5 years ago at GTC...

fossil geyser Mar 19, 2024, 11:21 PM

#

**Crysis

brittle helm Mar 20, 2024, 12:15 AM

#

nothing ever will actually be able to play Crysis. it's like the inverse of DOOM

thin sable Mar 20, 2024, 12:06 PM

#

brittle helm nothing ever will actually be able to play Crysis. it's like the inverse of DO...

That is like the greatest thing I've ever heard, I mean they put DOOM on an electronic pregnancy test... Crysis I still doubt would run well on Deep Thought.

brittle helm Mar 20, 2024, 1:19 PM

#

There's a whole subreddit on it 😄 https://www.reddit.com/r/itrunsdoom/

r/itrunsdoom

This subreddit focuses on odd hardware that runs Doom. Calculators, ATMs, fridges, old video game systems if it has a computer in it, it can possibly run Doom.

Please note that /r/itrunsdoom has gone dark indefinitely in protest of the API changes by Reddit. For more info, see here: https://www.reddit.com/r/Save3rdPartyApps/comments/1476ioa/red...

cunning barn Mar 27, 2024, 6:42 PM

#

https://nvdam.widen.net/s/xqt56dflgh/nvidia-blackwell-architecture-technical-brief?ncid=so-nvsh-253767

nvidia-blackwell-architecture-technical-brief.pdf

cunning barn Mar 29, 2024, 4:25 AM

#

https://www.uber.com/blog/scaling-ai-ml-infrastructure-at-uber/

Uber Blog

Scaling AI/ML Infrastructure at Uber

Machine Learning (ML) is celebrating its 8th year at Uber since we first started using complex rule-based machine learning models for driver-rider matching and pricing teams in 2016. Since then, our progression has been significant, with a shift towards employing deep learning models at the core of most business-critical applications today, whil...

#

Solid read! 🤘

cunning barn Apr 18, 2024, 7:41 PM

#

https://huggingface.co/blog/llama3

Welcome Llama 3 - Meta's new open LLM

rustic sequoia Apr 22, 2024, 7:19 PM

#

I just downloaded Llama3 on Ollama and am impressed with the quality and speed on my MacBook!

cunning barn Apr 24, 2024, 3:18 PM

#

https://docs.netapp.com/us-en/netapp-solutions/ai/aipod_nv_architecture.html

NetApp AIPod with NVIDIA DGX Systems - Solution Architecture

NetApp AI Pod with NVIDIA DGX Systems - Architecture

cunning barn May 13, 2024, 4:07 PM

#

https://www.youtube.com/watch?v=DQacCB9tDaw

YouTube

OpenAI

OpenAI Spring Update

We’ll be streaming live at 10AM PT Monday, May 13 to demo some ChatGPT and GPT-4 updates.

torpid hazel May 13, 2024, 10:57 PM

#

this is so sick, all these example-videos: https://openai.com/index/hello-gpt-4o/

#

🤯

zinc ether Nov 13, 2024, 6:43 PM

#

I'm deploying the NetApp GenAI Toolkit Preview v0.4 - would this be the right place to make inquires about functionality ?

cunning barn Nov 13, 2024, 6:51 PM

#

Sure! And we can ping @hot field with any questions!

zinc ether Nov 13, 2024, 6:53 PM

#

After I've deployed and indexed some data - is it correct to assume that when I click on the eyeglass that it tries to open an explorer type window? Cause currently when I do that I get an error prompt. It does say above the prompt window "Explore with a prompt or select a folder"

#

perhaps it's just a lack of a proper prompt..

zinc ether Nov 13, 2024, 8:08 PM

#

the chrome console is showing a 500 error when it tries to open http://<ip addr>/apiv1/prompt/ .... looking for an index-CSBXXXX.js file

cunning barn Nov 13, 2024, 10:07 PM

#

Mike is out on parental leave and will be back on Dec 2nd. Let me dig around and see if I can find someone to assist.

#

I found his backup, and he'll be here soon 🙂

zinc ether Nov 13, 2024, 10:14 PM

#

thanks appreciate the help

cunning barn Nov 13, 2024, 10:15 PM

#

@limpid storm here ya go...

#

Let's see if @torn fern gets his ping...

zinc ether Nov 13, 2024, 10:27 PM

#

no worries I'll check back in tomorrow

cunning barn Nov 14, 2024, 3:26 PM

#

zinc ether no worries I'll check back in tomorrow

zinc ether Nov 14, 2024, 3:32 PM

#

not sure the docker image is actually mounting the file system - cause I was able to get the prompt to show me an ls of /volumes/gcnv/<volname> and it returned a total of 12 files

#

given the bootscript - only the underlying OS would mount the NFS volume

#

but sure I'll create an issue in git

cunning barn Nov 14, 2024, 3:48 PM

#

Thanks William! That’s the best way to get to some of those engineers.

#

Prabu is on my team so I’ll make sure to follow up with him

zinc ether Nov 14, 2024, 4:06 PM

#

thanks, GitHub issue created - https://github.com/NetAppLabs/genai-toolkit-terraform-deployment/issues/37

GitHub

v0.4 release - web UI doesn't show directory of mounted volumes. · ...

name: Bug report about: Report a bug or technical issue title: "v0.4 release - web UI doesn't show directories of mounted volumes." labels: bug assignees: '' Describe the issu...

cunning barn Nov 19, 2024, 1:32 AM

#

#

It’s crazy to think it was just released in Dec 2022…

cunning barn Jan 25, 2025, 5:39 PM

#

https://github.com/meta-llama/llama-stack

GitHub

GitHub - meta-llama/llama-stack: Composable building blocks to buil...

Composable building blocks to build Llama Apps. Contribute to meta-llama/llama-stack development by creating an account on GitHub.

#

Keep an eye on this! Composable middleware that’s fully extensible!

hot field Jan 28, 2025, 12:57 PM

#

This is super interesting. Looks like it is essentially a standardized API layer that can sit on top of any inference server (ollama, vLLM, NIM, etc.), allowing users to swap inference providers without needing to change any of their API calls. Only catch is it is “Llama first” (“Explicit focus on Meta’s Llama models and partnering ecosystem”).

cunning barn Jan 28, 2025, 3:18 PM

#

Agreed! I’m trying to imagine use-cases where something like this would be implemented

wraith apex May 21, 2025, 6:36 PM

#

Anyone in the NetApp community doing anything interesting with MCP and Agents on top of the ONTAP API? Likely needs some guardrails if allowing the agent to make any modifications. But would be neat if just giving it read-only access. Maybe an MCP server with an ONTAP API tool along with a BlueXP API tool? "Agent, tell me what people or orgs within my enterprise consume the majority of my resources?"

cunning barn May 21, 2025, 9:19 PM

#

I’m tinkering with building a hardware universe agent, just need to verify I’m not duplicating efforts already in progress.

#

Not sure I would turn an agent loose on my data quite yet

wraith apex May 22, 2025, 3:46 AM

#

Pair that up with the IMT and that could be a powerful tool for partners - "Agent, here's a BOM for NetApp gear and some other 3rd party vendor hardware/software. This is being installed in an environment with this additional list of hardware/software. Validate that this is a supported configuration."

wicked crow May 26, 2025, 12:49 AM

#

And hope it doesn't hallucinate an unsupported config😂

wraith apex May 28, 2025, 9:23 PM

#

That's sort of the entire point of using MCP underneath the Agent. You would expose HWU and IMT as either APIs or ingest the data from those tools into a database. MCP either exposes the API or templated queries as "tools" that the agent can then use. You just describe back to the agent what the tool can be used for and what parameters to provide the tool (the API contract or query templates). The LLM then has to abide by the constraints of the tools it has access to. You can then provide the lineage of the requests to understand how the agent derived its response. This is basically what Claude.ai is now doing (and I believe ChatGPT can do this now too) where you tell it to search the internet for an answer. When it creates the response, it also provides you with the sources it used for each part of the response.

wheat sequoia May 30, 2025, 1:04 AM

#

wraith apex Anyone in the NetApp community doing anything interesting with MCP and Agents on...

Hi Taylor, I've built a workflow with n8n, using the NetApp APIs. It needs some tuning but I can get basic info out of ONTAP. This is in a lab environment 🙂
I use Telegram to chat with the workflow

wraith apex May 30, 2025, 11:54 AM

#

wheat sequoia Hi Taylor, i've built a workflow with n8n, using the NetApp api's. It needs some...

Nice! I've never heard of n8n before. Looks useful.

wheat sequoia May 30, 2025, 12:02 PM

#

wraith apex Nice! I've never heard of n8n before. Looks useful.

Once I get it 100% working, i'll create a video on it

velvet geyser Jul 3, 2025, 5:45 AM

#

wheat sequoia Hi Taylor, i've built a workflow with n8n, using the NetApp api's. It needs some...

Hi David, That's really great. IHAC who asked me a simple question related to getting some information via their chat. I think we can get it using a workflow based on the n8n + LLM environment; of course, we'd need to make some adjustments.

wheat sequoia Jul 3, 2025, 5:46 AM

#

velvet geyser Hi David, That's really great. IHAC who asked me a simple question related to ge...

I'm going to work on a short video this weekend of how it's all working and showcase a demo, few people have been asking 🙂

velvet geyser Jul 3, 2025, 6:12 AM

#

wheat sequoia I'm going to work on a short video this weekend of how it's all working and show...

Perfect! Many Thanks 🙂

velvet geyser Jul 3, 2025, 8:44 AM

#

If anyone is interested in the trend report on AI, please check this link. https://www.bondcap.com/report/pdf/Trends_Artificial_Intelligence.pdf
It's almost 340 pages.

hot field Jul 8, 2025, 9:36 PM

#

Wanted to make sure this community saw a couple of blogs that were recently posted:
https://community.netapp.com/t5/Tech-ONTAP-Blogs/Zero-to-LLM-Inference-in-Five-Minutes/ba-p/461655
https://community.netapp.com/t5/Tech-ONTAP-Blogs/Zero-to-LLM-Inference-vLLM-Edition/ba-p/461836

Zero to LLM Inference in Five Minutes

Confused about where to start with LLMs? You've come to the right place. In this post, I will walk through the deployment of a basic LLM inference stack that will run on any of NetApp's NVIDIA-based AIPods. This stack is appropriate for smaller-scale deployments and POCs. Prerequisites You must...

Zero to LLM Inference - vLLM Edition

A couple of weeks ago, I published a post walking through the deployment of a basic LLM inference stack that will run on any of NetApp's NVIDIA-based AIPods. In that post, I used NVIDIA NIM for LLMs as my inference server. NIM is powerful and easy to adopt, but it is not the only option for the infe...

cunning barn Jul 10, 2025, 11:36 PM

#

#

Good cheat sheet for Agentic protocols

wraith apex Jul 11, 2025, 10:40 PM

#

The first two have gained a lot of mindshare. I've yet to see the latter two put into practice.

hot field Jul 11, 2025, 11:36 PM

#

Hot off the press: https://community.netapp.com/t5/Tech-ONTAP-Blogs/LLM-Inference-KV-Cache-Offloading-to-ONTAP-with-vLLM-and-GDS/ba-p/461914

LLM Inference - KV Cache Offloading to ONTAP with vLLM and GDS

Today, we continue our series on LLM inference with an exploration of a more advanced topic, KV cache offloading. This post will build on my previous post in which I walked you through the deployment of an LLM inference stack using vLLM, a popular open-source LLM inference server. The vLLM deploymen...

hot field Aug 8, 2025, 8:40 PM

#

https://community.netapp.com/t5/Tech-ONTAP-Blogs/Host-OpenAI-gpt-oss-on-NetApp-AIPod/ba-p/462603

Host OpenAI gpt-oss on NetApp AIPod

Earlier this week, OpenAI released two state-of-the-art open-weight LLMs (large language models), gpt-oss-20b and gpt-oss-120b. These models have generated quite a bit of interest due to their reasoning capabilities, tool-use performance, and native efficiency. According to OpenAI, the gpt-oss-120b ...

cunning barn Aug 12, 2025, 2:46 PM

#

wraith apex Aug 14, 2025, 3:27 PM

#

Great posts @hot field ! I'm using a similar toolchain at home - Open WebUI with Ollama. Currently running on an RTX3060 with 12GB vRAM. Likely looking at an upgrade later this year, but having a hard time justifying $2,000+ for the higher end 50-series cards. Easier (and cheaper) to just use cloud-based GPU instances for the time being.

runic wadi Aug 20, 2025, 5:40 PM

#

Every day this week I've had a conversation that involved creative ways to incorporate Graph into retrieval pipelines. Curious if folks are hearing the buzz like I am. David wrote a great blog about the value of graph based RAG in the enterprise. Check it out: https://community.netapp.com/t5/Tech-ONTAP-Blogs/From-quot-Trust-Me-quot-to-quot-Prove-It-quot-Why-Enterprises-Need-Graph-RAG/ba-p/462813

From "Trust Me" to "Prove It": Why Enterprises Need Graph RAG

You are probably looking at the title of this blog post and saying to yourself, "I didn't know that you can build a RAG solution using other technologies besides a vector database." You aren't alone in this thought. There has been some great marketing out there for pushing vector databases as the on...

cloud sapphire Aug 20, 2025, 10:26 PM

#

https://www.netapp.com/forms/ai-thought-leadership-webinar/

#

WEBINAR: BUILDING THE AI READY ENTERPRISE
 NetApp webinar featuring NVIDIA and IDC

wraith apex Aug 21, 2025, 10:28 PM

#

@runic wadi - This has been a big part of my day-to-day for the last year in working on the Amazon Neptune team. Our team created an open-source library to assist with building GraphRAG applications. https://github.com/awslabs/graphrag-toolkit. Primarily supports Neptune, but we've open-sourced it and have been accepting PRs to support other Graph databases/datastores. There's also a really good website being curated by Neo4j on all of the different implementation patterns for GraphRAG: https://graphrag.com/

GraphRAG

GraphRAG with a Knowledge Graph

Design patterns for improving GenAI applications with a graph.

runic wadi Aug 23, 2025, 3:20 PM

#

wraith apex @runic wadi - This has been a big part of my day-to-day for the last y...

Very cool Taylor! @old jewel it might be interesting to look at integrating Neptune.

old jewel Aug 25, 2025, 3:44 PM

#

wraith apex @runic wadi - This has been a big part of my day-to-day for the last y...

will need to take a look! the other project which looks pretty interesting is https://github.com/getzep/graphiti which is next on my list to take a look at. it's a general purpose library that might be an easy button for a simple version of what I have proposed in the GitHub repo in that article

GitHub

GitHub - getzep/graphiti: Build Real-Time Knowledge Graphs for AI A...

Build Real-Time Knowledge Graphs for AI Agents. Contribute to getzep/graphiti development by creating an account on GitHub.

old jewel Aug 25, 2025, 3:45 PM

#

runic wadi Every day this week I've had a conversation that involved creative ways to incor...

thanks for posting this!

runic wadi Aug 25, 2025, 5:41 PM

#

old jewel will need to take a look! the other project which looks pretty interesting is ht...

Please do share with us what you find out about graphiti!

wraith apex Aug 25, 2025, 9:20 PM

#

Similar in nature to graphiti, we've integrated Neptune as a backend graph store for mem0 and cognee. We've also looked at integrating with graphiti, though there are some things we need to address on our end to support it.

wraith apex Sep 3, 2025, 12:56 PM

#

... and now supported with Graphiti: https://aws.amazon.com/about-aws/whats-new/2025/09/aws-neptune-zep-integration-long-term-memory-genai/

Amazon Web Services, Inc.

What's New at AWS - Cloud Innovation & News

runic wadi Sep 8, 2025, 11:44 AM

#

wraith apex ... and now supported with Graphiti: https://aws.amazon.com/about-aws/whats-new...

❗ @old jewel check it out...

old jewel Sep 8, 2025, 2:58 PM

#

Just posted my recent blog post and this might be of interest to some here...

We’ve all heard the phrase "treat infrastructure as code," but what if I told you it's time to give your datasets the same respect? In AI and machine learning, tiny shifts in data ripple through models, causing unpredictable behavior that can lead to compliance nightmares. Our latest blog explores a fundamental shift in mindset—treating data like code—showing you how immutable commits, provable lineage, and policy-driven pipelines can transform your AI from opaque magic to transparent, audit-ready solutions.

As regulators increasingly scrutinize AI applications (hello, EU AI Act!), understanding exactly what changed in your models is no longer optional—it's essential. Whether you’re a data scientist tired of explaining mysterious drifts or an AI stakeholder who needs to keep auditors happy (and lawsuits at bay), this piece aligns perfectly with today’s urgent conversations around AI governance and compliance. Imagine confidently pinpointing every decision to a specific data commit—no more guessing, just receipts.

Dive into the full article here: https://bit.ly/4ncqnRa

.

PS: If you are interested in some of the things you are protecting yourself from when you take your AI Governance seriously, I am re-presenting my in-person API World session this week for Virtual API World Week:

📅 Wednesday, September 10 at 11:00am PDT
🌏 VIRTUAL API World -- Workshop Stage A (PRO)
📖 More info: https://bit.ly/460Ryri

Compliance-Ready AI: Provenance, Lineage, and Policy You Can Prove

In my last blog post From "Trust Me" to "Prove It": Why Enterprises Need Graph RAG, we discussed why Enterprises need explainable, provable AI at inference because regulators, auditors, and risk teams demand verifiable answers. If you can't show why a model decided something, it creates legal, finan...

API World + CloudX + DataWeek 2025: [Virtual] PRO WORKSHOP (API): A...

View more about this event at API World + CloudX + DataWeek 2025

old jewel Nov 24, 2025, 7:37 PM

#

New blog: DocumentRAG Using OpenSearch: GraphRAG-like Structure Without the Graph Overhead

GraphRAG has exploded in popularity for its structure, but it forces teams to maintain a full graph ontology and schema. This post introduces the BM25-based Document RAG Agent (or what I am calling DocumentRAG), a practical middle ground between VectorRAG and GraphRAG that preserves explainability while avoiding graph overhead.

Read the full blog post here: https://bit.ly/48kWHvC

DocumentRAG Using OpenSearch: GraphRAG-like Structure Without the G...

Vector embeddings changed how teams build RAG systems. They made it easy to scan large datasets and pull back passages that feel semantically close to a question. And for a while, that was enough. You could drop your documents into an embedding model, compute vectors, plug everything into your favor...

old jewel Jan 16, 2026, 3:40 PM

#

Forgot to post the follow up to the blog above called: Hybrid RAG in the Real World: Graphs, BM25, and the End of Black-Box Retrieval

We discuss why Hybrid provides 96% factual faithfulness on answers when compared to plain vector embeddings, and we also provide an alternative to the Hybrid RAG (Graph + Vector) you typically see out there: a BM25 + Vector Hybrid RAG solution. This BM25 + Vector variation is an in-between solution that doesn't require the heavy lift of using a whole new database and maintaining graph ontologies/structure, while still getting most of the benefits that the Graph portion provides: factual data grounding in answers.
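
For anyone wondering how a BM25 result list and a vector result list can be merged at all, here is a small sketch using reciprocal rank fusion (RRF). This is a common fusion technique chosen for illustration; the blog post may well combine the two differently. The document IDs and the `k=60` constant are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in; scores are
    summed across lists, so documents ranked well by both retrievers rise.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    bm25_hits = ["doc_a", "doc_b", "doc_c"]     # keyword (BM25) order, made up
    vector_hits = ["doc_b", "doc_d", "doc_a"]   # embedding order, made up
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Because RRF only looks at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.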

Take a look at the blog post here: https://bit.ly/4pz0D3b

Hybrid RAG in the Real World: Graphs, BM25, and the End of Black-Bo...

In the earlier posts in this series, we talked about what happens when Retrieval-Augmented Generation leans too hard on vector search. The first post, From "Trust Me" to "Prove It": Why Enterprises Need GraphRAG, walked through why enterprises need retrieval that behaves more like a knowledge graph ...

hot field Jan 16, 2026, 8:15 PM

#

New blog: https://community.netapp.com/t5/Tech-ONTAP-Blogs/KV-cache-offloading-exploring-the-benefits-of-shared-storage/ba-p/465230

KV cache offloading - exploring the benefits of shared storage

Today, we continue our exploration of KV cache offloading. If you missed my previous posts on this topic, be sure to check them out here, here, and here. In this post, I will further explore the benefits of offloading your KV cache to shared storage. I will show the benefits of a shared storage tier...

cunning barn Jan 16, 2026, 9:16 PM

#

Killer posts @old jewel and @hot field! Love the research and demo content! Very well explained!

old jewel Mar 2, 2026, 5:58 PM

#

Dropping a new blog hot off the press titled:
Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Bigger models used to win headlines. Now they win (in not good ways) with power bills. This post looks at what changed after DeepSeek R1 made it clear that smarter engineering can compete with brute force. Instead of chasing parameter counts, we look at quantization, fine-tuning, and specialized Small Language Models that focus on one job and do it well. We also unpack what this means for agentic systems, where multiple focused models collaborate instead of one giant model trying to do everything.

This shift is happening for a reason. GPU costs are rising, data center power demand keeps climbing, and inference is now the line item that finance teams watch closely as token costs rise. NVIDIA’s recent inference-focused deal with Groq signals the same trend: latency, efficiency, and cost per token matter more than raw size. If you are building AI systems today, the question is no longer how big your model is. It is how much value it delivers per watt and per dollar.

Dive into the full article on the Open Data Science blog: https://bit.ly/4s6iKye

PS: for those interested in this topic, I will be presenting this topic at 3 different conferences (SCaLE this week, in April I have Devoxx France and a workshop at ODSC East) and 1 podcast due out Friday.

Open Data Science - Your News Source for AI, Machine Learning & more

Less Compute, More Impact: How Model Quantization Fuels the Next Wa...

Editor’s note: David vonThenen is speaking at ODSC AI East this April 28th-30th. Check out his talk, “Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI,” there! Early last year, DeepSeek dropped R1, and the market reacted as if someone had pulled the fire alarm....

old jewel Mar 11, 2026, 2:29 PM

#

If you are interested in the blog post above on why Small Language Models and Quantization are going to see a dramatic uptick in Agentic solutions....

I recently joined the Open Data Science Conference (ODSC) AI X Podcast with Sheamus McGovern to talk about what's actually happening inside production AI systems. Not the polished demos. The messy reality is when models meet budgets, latency limits, and infrastructure constraints.

We covered a lot of ground in this conversation:
• Why many RAG and agentic AI demos fall apart in production
• The shift from bigger models to smarter-per-watt systems
• What quantization really does when you move from FP32 to INT8 or INT4
• Why Small Language Models (SLMs) often work better for multi-agent systems
• Hybrid RAG architectures that combine vector embeddings with knowledge graphs
• The growing need for governance and observability in enterprise AI
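
As a quick illustration of the FP32 to INT8 bullet above, here is a sketch of symmetric per-tensor quantization in plain Python. It is one simple scheme among several (real toolchains often use per-channel scales, zero points, or calibration data), and the sample weights are made up.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 codes."""
    return [q * scale for q in quantized]

if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]   # illustrative values only
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, worst)
```

Each value is stored in 1 byte instead of 4, and the round-trip error per value is bounded by about half the scale. INT4 halves the storage again at the cost of a coarser grid.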

🎧 Listen to the podcast:
Spotify: https://bit.ly/40jPqsq
Apple Podcasts: https://apple.co/4bfx3tu
SoundCloud: https://bit.ly/4s0UrlV

Spotify for Creators

Smarter Per Watt with David vonThenen by ODSC's Ai X Podcast

In this episode of the ODSC Ai X Podcast, host Sheamus McGovern sits down with David vonThenen, Senior AI/ML Engineer in the Office of the CTO at NetApp. David is a seasoned keynote speaker and open-source contributor with deep expertise in Agentic AI, deep learning, model optimization, cloud-native architectures, and retrieval-augmented generat...

Apple Podcasts

ODSC

Smarter Per Watt with David vonThenen

Podcast Episode · ODSC's Ai X Podcast · March 6 · 48m

old jewel Mar 13, 2026, 2:05 PM

#

New Blog Post Alert:
Engineering Inference: KV Cache, Shared Storage, and the Economics of AI

Large language models burn through GPU memory and compute faster than most teams expect. Every prompt creates key-value tensors that sit in GPU memory, and that memory footprint grows with every token and every user. In this article, I walk through what is really happening inside KV cache systems and why architectures like vLLM and LMCache exist in the first place. Instead of treating caching as a performance trick, the post looks at it as a memory strategy that changes how inference systems are built.
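
A back-of-the-envelope sketch of why that footprint grows with every token, assuming standard attention where each layer stores one key and one value vector per KV head per token. The model dimensions below are illustrative assumptions, not figures from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    The factor of 2 covers keys and values; bytes_per_value=2 assumes FP16/BF16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens

if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    per_token = kv_cache_bytes(32, 8, 128, num_tokens=1)
    per_sequence = kv_cache_bytes(32, 8, 128, num_tokens=8192)
    print(f"{per_token / 1024:.0f} KiB per token")
    print(f"{per_sequence / 2**30:.0f} GiB for one 8192-token sequence")
```

Multiply the per-sequence number by the count of concurrent users and it becomes clear why reuse, tiering, and offloading matter.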

This topic matters right now because the economics of AI are shifting. Training made the headlines, but inference is what drives ongoing cost in production systems. Techniques such as KV cache reuse, memory tiering, and shared storage are becoming critical for controlling GPU spend and data center power consumption. As companies deploy chat systems, RAG pipelines, and agent workflows at scale, engineering the inference stack is becoming more important than adding more GPUs.

Dive into the full article here: https://bit.ly/4bl87kn

cunning barn Mar 18, 2026, 5:07 PM

#

https://github.com/NVIDIA/NemoClaw