#╭・artificial-intelligence

cunning barn
ionic sandal
#

I am actually interested in PaLM-E from Google.

mortal radish
#

Does NetApp have any involvement in GenAI?

cunning barn
lilac trail
#

Practical question: how can we leverage the DataOps Toolkit to import existing datasets into containers? Most examples show how to use the toolkit to create new containers (workspaces) with a PVC, but not how to bind existing datasets. This can probably be achieved with Trident, but I don't see how to use the DataOps Toolkit for the same thing in a K8s environment.

cunning barn
#

@hot field is in here, let’s see if the ping gets his attention. If not, I’ll go chase him down for ya

hot field
fossil geyser
lilac trail
# hot field <@456226577798135808> - The DataOps Toolkit does not include any volume import c...

Any plans to integrate this functionality into the DataOps Toolkit? The main pain point, in my view, is not building and starting container workspaces but handling the backend data used with the container, which is often pre-existing or generated outside the containers used for training jobs. So snapshotting/cloning the existing data, providing versioning, and so on is where the value is.

normal yoke
#

Posting this AI-related blog here about doing LLM RAG development with NetApp. Let me know if you have any questions, comments, or disagreements!

https://community.netapp.com/t5/Tech-ONTAP-Blogs/Accelerating-LLM-Retrieval-Augmented-Generation-development-with-NetApp/ba-p/451230

echo hamlet
#

The AI and Analytics exam is so tough 😮‍💨. 7 attempts and I still haven't been able to pass.

torpid hazel
#

Nvidia GTC! My colleagues are on-site, I'm so jealous 😵‍💫🤯

fossil geyser
#

Nice! I'd love to be there

torpid hazel
#

@cunning barn you have some space left in your rack?

cunning barn
#

I wonder if David Blackwell is there...

#

That's insane.

cunning barn
#

Nice throwback to 5 years ago at GTC...

fossil geyser
#

**Crysis

brittle helm
#

nothing ever will actually be able to play Crysis. it's like the inverse of DOOM

thin sable
brittle helm
cunning barn
#

Solid read! 🤘

rustic sequoia
#

I just downloaded Llama3 on Ollama and am impressed with the quality and speed on my MacBook!

cunning barn
torpid hazel
#

🤯

zinc ether
#

I'm deploying the NetApp GenAI Toolkit Preview v0.4 - would this be the right place to make inquiries about functionality?

cunning barn
#

Sure! And we can ping @hot field with any questions!

zinc ether
#

After I've deployed and indexed some data, is it correct to assume that clicking the eyeglass tries to open an explorer-type window? Currently when I do that I get an error prompt. It does say above the prompt window "Explore with a prompt or select a folder"

#

Perhaps it's just a lack of a proper prompt...

zinc ether
#

the chrome console is showing a 500 error when it tries to open http://<ip addr>/apiv1/prompt/ .... looking for an index-CSBXXXX.js file

cunning barn
#

Mike is out on parental leave and will be back on Dec 2nd. Let me dig around and see if I can find someone to assist.

#

I found his backup, and he'll be here soon 🙂

zinc ether
#

thanks appreciate the help

cunning barn
#

@limpid storm here ya go...

#

Let's see if @torn fern gets his ping...

zinc ether
#

no worries I'll check back in tomorrow

zinc ether
#

not sure the docker image is actually mounting the file system - cause I was able to get the prompt to show me an ls of /volumes/gcnv/<volname> and it returned a total of 12 files

#

given the bootscript - only the underlying OS would mount the NFS volume

#

but sure I'll create an issue in git

cunning barn
#

Thanks William! That’s the best way to get to some of those engineers.

#

Prabu is on my team so I’ll make sure to follow up with him

zinc ether
cunning barn
#

It’s crazy to think it was just released in Dec 2022…

cunning barn
#

Keep an eye on this! Composable middleware that’s fully extensible!

hot field
#

This is super interesting. Looks like it is essentially a standardized API layer that can sit on top of any inference server (ollama, vLLM, NIM, etc.), allowing users to swap inference providers without needing to change any of their API calls. Only catch is it is “Llama first” (“Explicit focus on Meta’s Llama models and partnering ecosystem”).
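
To make the "swap providers without changing your API calls" idea concrete, here is a minimal sketch in plain Python. It is not the project's actual API: `InferenceProvider`, `ReverseProvider`, `ShoutProvider`, and `ask` are all hypothetical names, and the two providers are toy stand-ins for real adapters (ollama, vLLM, NIM, etc.).

```python
from typing import Protocol


class InferenceProvider(Protocol):
    """Hypothetical contract: the one call shape every backend adapter implements."""

    def complete(self, prompt: str) -> str: ...


class ReverseProvider:
    """Toy stand-in for one inference backend."""

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


class ShoutProvider:
    """Toy stand-in for a different inference backend."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def ask(provider: InferenceProvider, question: str) -> str:
    # Application code depends only on the contract, never on a specific backend.
    return provider.complete(question)


# Swapping the backend changes nothing about how ask() is called.
for backend in (ReverseProvider(), ShoutProvider()):
    print(type(backend).__name__, "->", ask(backend, "hello"))
```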

cunning barn
#

Agreed! I’m trying to imagine use-cases where something like this would be implemented

wraith apex
#

Anyone in the NetApp community doing anything interesting with MCP and Agents on top of the ONTAP API? Likely needs some guardrails if allowing the agent to make any modifications. But would be neat if just giving it read-only access. Maybe an MCP server with an ONTAP API tool along with a BlueXP API tool? "Agent, tell me what people or orgs within my enterprise consume the majority of my resources?"

cunning barn
#

I’m tinkering with building a hardware universe agent, just need to verify I’m not duplicating efforts already in progress.

#

Not sure I would turn an agent loose on my data quite yet

wraith apex
#

Pair that up with the IMT and that could be a powerful tool for partners - "Agent, here's a BOM for NetApp gear and some other 3rd party vendor hardware/software. This is being installed in an environment with this additional list of hardware/software. Validate that this is a supported configuration."

wicked crow
#

And hope it doesn't hallucinate an unsupported config😂

wraith apex
#

That's sort of the entire point of using MCP underneath the Agent. You would either expose HWU and IMT as APIs or ingest the data from those tools into a database. MCP exposes either the API or templated queries as "tools" that the agent can then use. You just describe to the agent what each tool can be used for and what parameters to provide it (the API contract or query templates). The LLM then has to abide by the constraints of the tools it has access to. You can then provide the lineage of the requests to understand how the agent derived its response. This is basically what Claude.ai is now doing (and I believe ChatGPT can do this now too) where you tell it to search the internet for an answer. When it creates the response, it also provides you with the sources it used for each part of the response.
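
A rough sketch of that pattern in plain Python, without any MCP SDK: each tool carries a description and a parameter contract, the registry refuses tools that modify data, and every call is appended to a lineage log. All names here (`Tool`, `ToolRegistry`, `hwu_lookup`) are hypothetical, and the lookup table is made-up placeholder data, not real HWU content.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str          # what the agent is told the tool is for
    params: dict[str, type]   # the contract: parameter name -> expected type
    handler: Callable[..., Any]
    read_only: bool = True


@dataclass
class ToolRegistry:
    allow_writes: bool = False
    tools: dict[str, Tool] = field(default_factory=dict)
    lineage: list[dict[str, Any]] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        # Guardrail: a read-only registry rejects tools that can modify data.
        if not tool.read_only and not self.allow_writes:
            raise PermissionError(f"{tool.name} modifies data; registry is read-only")
        self.tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self.tools[name]  # KeyError if the agent was never given this tool
        if set(kwargs) != set(tool.params):
            raise ValueError(f"{name} expects parameters {sorted(tool.params)}")
        for key, expected in tool.params.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        result = tool.handler(**kwargs)
        # Record the request so the final answer can be traced back to its sources.
        self.lineage.append({"tool": name, "args": kwargs, "result": result})
        return result


# Made-up placeholder data standing in for an HWU-style source.
PLACEHOLDER_HWU = {"EXAMPLE-CONTROLLER": {"max_drives": 24}}


def hwu_lookup(model: str) -> dict:
    return PLACEHOLDER_HWU.get(model, {})


registry = ToolRegistry()
registry.register(
    Tool("hwu_lookup", "Look up limits for a hardware model", {"model": str}, hwu_lookup)
)
answer = registry.call("hwu_lookup", model="EXAMPLE-CONTROLLER")
print(answer, registry.lineage)
```

The model only ever sees the tool descriptions and contracts, so an answer about a configuration has to come from a recorded tool call rather than from the model's memory.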

cunning barn
#

Good cheat sheet for Agentic protocols

wraith apex
#

The first two have gained a lot of mindshare. I've yet to see the latter two put into practice.

hot field
#
hot field
cunning barn
wraith apex
#

Great posts @hot field! I'm using a similar toolchain at home - Open WebUI with Ollama. Currently running on an RTX 3060 with 12GB VRAM. Likely looking at an upgrade later this year, but having a hard time justifying $2,000+ for the higher-end 50-series cards. Easier (and cheaper) to just use cloud-based GPU instances for the time being.

runic wadi
#

Every day this week I've had a conversation that involved creative ways to incorporate Graph into retrieval pipelines. Curious if folks are hearing the buzz like I am. David wrote a great blog about the value of graph based RAG in the enterprise. Check it out: https://community.netapp.com/t5/Tech-ONTAP-Blogs/From-quot-Trust-Me-quot-to-quot-Prove-It-quot-Why-Enterprises-Need-Graph-RAG/ba-p/462813

cloud sapphire
#

WEBINAR: BUILDING THE AI READY ENTERPRISE
 NetApp webinar featuring NVIDIA and IDC

wraith apex
#

@runic wadi - This has been a big part of my day-to-day for the last year working on the Amazon Neptune team. Our team created an open-source library to assist with building GraphRAG applications: https://github.com/awslabs/graphrag-toolkit. It primarily supports Neptune, but we've open-sourced it and have been accepting PRs to support other graph databases/datastores. There's also a really good website being curated by Neo4j on all of the different implementation patterns for GraphRAG: https://graphrag.com/

runic wadi
old jewel
# wraith apex <@879384830788399135> - This has been a big part of my day-to-day for the last y...

Will need to take a look! The other project which looks pretty interesting is https://github.com/getzep/graphiti which is next on my list to take a look at. It's a general-purpose library that might be an easy button for a simple version of what I have proposed in the GitHub repo in that article.

runic wadi
wraith apex
wraith apex
runic wadi
old jewel
#

Just published a new blog post that might be of interest to some here...

We’ve all heard the phrase "treat infrastructure as code," but what if I told you it's time to give your datasets the same respect? In AI and machine learning, tiny shifts in data ripple through models, causing unpredictable behavior that can lead to compliance nightmares. Our latest blog explores a fundamental shift in mindset—treating data like code—showing you how immutable commits, provable lineage, and policy-driven pipelines can transform your AI from opaque magic to transparent, audit-ready solutions.

As regulators increasingly scrutinize AI applications (hello, EU AI Act!), understanding exactly what changed in your models is no longer optional—it's essential. Whether you’re a data scientist tired of explaining mysterious drifts or an AI stakeholder who needs to keep auditors happy (and lawsuits at bay), this piece aligns perfectly with today’s urgent conversations around AI governance and compliance. Imagine confidently pinpointing every decision to a specific data commit—no more guessing, just receipts.

Dive into the full article here: https://bit.ly/4ncqnRa

PS: If you are interested in some of the things you are protecting yourself from when you take AI governance seriously, I am re-presenting my in-person API World session this week for Virtual API World Week:

📅 Wednesday, September 10 at 11:00am PDT
🌏 VIRTUAL API World -- Workshop Stage A (PRO)
📖 More info: https://bit.ly/460Ryri

old jewel
#

New blog: DocumentRAG Using OpenSearch: GraphRAG-like Structure Without the Graph Overhead

GraphRAG has exploded in popularity for its structure, but it forces teams to maintain a full graph ontology and schema. This post introduces the BM25-based Document RAG Agent (or what I am calling DocumentRAG), a practical middle ground between VectorRAG and GraphRAG that preserves explainability while avoiding graph overhead.

Read the full blog post here: https://bit.ly/48kWHvC

old jewel
#

Forgot to post the follow up to the blog above called: Hybrid RAG in the Real World: Graphs, BM25, and the End of Black-Box Retrieval

We discuss why Hybrid RAG provides 96% factual faithfulness on answers compared to plain vector embeddings, and we offer an alternative to the Hybrid RAG (Graph + Vector) you typically see: a BM25 + Vector Hybrid RAG solution. This BM25 + Vector variation is an in-between option that doesn't require the heavy lift of adopting a whole new database and maintaining graph ontologies/structure, while still delivering most of the benefit the graph portion provides: factual data grounding in answers.
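
For anyone who wants to see the shape of a BM25 + Vector hybrid, here is a minimal sketch. It is not the implementation from the blog: the BM25 scorer is the standard Okapi formula written out by hand, the fusion step is reciprocal rank fusion (my assumption; the post may combine scores differently), and `vector_ranking` is a hand-written stand-in for whatever order an embedding search would return. The three documents are made-up examples.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranking(scores: list[float]) -> list[int]:
    """Document indices ordered from best score to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse several rankings; documents ranked high in any list rise to the top."""
    fused: dict[int, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


docs = [
    "snapshot and clone volumes for dataset versioning",
    "gpu memory sizing for inference",
    "clone a dataset volume before training",
]
bm25_ranking = ranking(bm25_scores("clone dataset volume", docs))
vector_ranking = [2, 1, 0]  # stand-in for the order an embedding search would return
hybrid = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(docs[hybrid[0]])
```

The keyword side keeps retrieval explainable (you can point at the matching terms), while the vector side catches paraphrases the keywords miss.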

Take a look at the blog post here: https://bit.ly/4pz0D3b

hot field
#
cunning barn
#

Killer posts @old jewel and @hot field! Love the research and demo content! Very well explained!

old jewel
#

Dropping a new blog hot off the press titled:
Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI

Bigger models used to win headlines. Now what they win is the biggest power bill. This post looks at what changed after DeepSeek R1 made it clear that smarter engineering can compete with brute force. Instead of chasing parameter counts, we look at quantization, fine-tuning, and specialized Small Language Models that focus on one job and do it well. We also unpack what this means for agentic systems, where multiple focused models collaborate instead of one giant model trying to do everything.

This shift is happening for a reason. GPU costs are rising, data center power demand keeps climbing, and inference is now the line item that finance teams watch closely as token costs rise. NVIDIA’s recent inference-focused deal with Groq signals the same trend: latency, efficiency, and cost per token matter more than raw size. If you are building AI systems today, the question is no longer how big your model is. It is how much value it delivers per watt and per dollar.

Dive into the full article on the Open Data Science blog: https://bit.ly/4s6iKye

PS: for those interested in this topic, I will be presenting this topic at 3 different conferences (SCaLE this week, in April I have Devoxx France and a workshop at ODSC East) and 1 podcast due out Friday.

Editor’s note: David vonThenen is speaking at ODSC AI East this April 28th-30th. Check out his talk, “Less Compute, More Impact: How Model Quantization Fuels the Next Wave of Agentic AI,” there! Early last year, DeepSeek dropped R1, and the market reacted as if someone had pulled the fire alarm....

old jewel
#

If you are interested in the blog post above on why Small Language Models and quantization are going to see a dramatic uptick in agentic solutions...

I recently joined the Open Data Science Conference (ODSC) AI X Podcast with Sheamus McGovern to talk about what's actually happening inside production AI systems. Not the polished demos. The messy reality when models meet budgets, latency limits, and infrastructure constraints.

We covered a lot of ground in this conversation:
• Why many RAG and agentic AI demos fall apart in production
• The shift from bigger models to smarter-per-watt systems
• What quantization really does when you move from FP32 to INT8 or INT4
• Why Small Language Models (SLMs) often work better for multi-agent systems
• Hybrid RAG architectures that combine vector embeddings with knowledge graphs
• The growing need for governance and observability in enterprise AI
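
On the quantization bullet above, a minimal sketch of what moving from FP32 to INT8 or INT4 does to a handful of weights. This uses symmetric per-tensor quantization, which is only one of several schemes and is my assumption here, not necessarily what the episode covers; `quantize` and `dequantize` are hypothetical helper names and the weights are made-up values.

```python
def quantize(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats onto signed integers."""
    qmax = 2 ** (bits - 1) - 1            # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; the difference is the quantization error."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]        # made-up FP32 weights
for bits in (8, 4):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"INT{bits}: values={q} scale={scale:.4f} worst error={worst:.4f}")
```

Each weight shrinks from 32 bits to 8 or 4, and the price is a rounding error of at most half a step (`scale / 2`) per weight, which is why fewer bits means a coarser model.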

🎧 Listen to the podcast:
Spotify: https://bit.ly/40jPqsq
Apple Podcasts: https://apple.co/4bfx3tu
SoundCloud: https://bit.ly/4s0UrlV

Spotify for Creators

In this episode of the ODSC Ai X Podcast, host Sheamus McGovern sits down with David vonThenen, Senior AI/ML Engineer in the Office of the CTO at NetApp. David is a seasoned keynote speaker and open-source contributor with deep expertise in Agentic AI, deep learning, model optimization, cloud-native architectures, and retrieval-augmented generat...

old jewel
#

New Blog Post Alert:
Engineering Inference: KV Cache, Shared Storage, and the Economics of AI

Large language models burn through GPU memory and compute faster than most teams expect. Every prompt creates key-value tensors that sit in GPU memory, and that memory footprint grows with every token and every user. In this article, I walk through what is really happening inside KV cache systems and why architectures like vLLM and LMCache exist in the first place. Instead of treating caching as a performance trick, the post looks at it as a memory strategy that changes how inference systems are built.
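
To put rough numbers on "grows with every token and every user", here is a back-of-the-envelope sketch. For standard (grouped-query) attention, each token stores one key and one value vector per layer per KV head. The model shape below is a hypothetical configuration chosen for round numbers, not a figure from the article, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(
    tokens: int,
    layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,   # 2 bytes per value for FP16/BF16
    sequences: int = 1,         # concurrent users / requests
) -> int:
    """KV cache size: 2 tensors (K and V) per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * sequences


# Hypothetical model shape, for illustration only.
shape = dict(layers=32, kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(tokens=1, **shape)
one_user = kv_cache_bytes(tokens=8192, **shape)
many_users = kv_cache_bytes(tokens=8192, sequences=16, **shape)

print(f"per token:         {per_token / 2**10:.0f} KiB")
print(f"1 user, 8k tokens: {one_user / 2**30:.0f} GiB")
print(f"16 users:          {many_users / 2**30:.0f} GiB")
```

The cost is linear in both context length and concurrency, which is the pressure that motivates reusing cached prefixes and tiering cache out of GPU memory.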

This topic matters right now because the economics of AI are shifting. Training made the headlines, but inference is what drives ongoing cost in production systems. Techniques such as KV cache reuse, memory tiering, and shared storage are becoming critical for controlling GPU spend and data center power consumption. As companies deploy chat systems, RAG pipelines, and agent workflows at scale, engineering the inference stack is becoming more important than adding more GPUs.

Dive into the full article here: https://bit.ly/4bl87kn

cunning barn
#

7k stars and over 700 forks already in what... 24 hours?

I'm curious... who's gonna drop this in their Enterprise?
