#llms

1 messages Ā· Page 1 of 1 (latest)

mystic skiff
#

YO

#

What is the minimum gpu requirement for inference using Llama 2 1b

cedar moat
#
mystic skiff
#

cool

floral tartan
#

Hey Guys!

Any1 building on LLMs *other than chatbots šŸ™‚

sly iris
young island
#

Can we use llm models on kaggle kernels

#

I would like to be a part of the project if someone is working on llms

restive garnet
#

Anyone here who has done any benchmarking on LLMs for Temporal tasks?

chrome kiln
# mystic skiff YO

it depends. E.g. gpu requirements if you use llama.cpp or mlc-llm is much less than if you use the model from huggingface. (e.g. I got mlc-llm to run llama2 13b on my 2080Ti with 12GB memory)

worldly meteor
#

Wooot! Wooot!

spark solstice
elder tusk
#

opensource LLM like Llama2 on AWS/own GPUs vs openAI APIs

customer service chatbots for companies. i want to work on this idea but openAI APIs seem very cheap compared to running Llama on AWS. why's it that doing everything by yourself is more expensive than the company doing things for you? is it simply because openAI is trying hard for customer acquisition with these prices ?

what's the point(in terms of commercial usage) of opensource LLMs if it's anyways expensive than proprietary?

chrome kiln
# elder tusk *opensource LLM like Llama2 on AWS/own GPUs vs openAI APIs* **customer service ...

rolling your own llms are not always more expensive, it all depends on volume, e.g. if you are building a POC and only you are using it obviously it's going to be lots more expensive to run on AWS than calling openai, but if you got thousands of users then costs can quickly escalate with openai, whereas you can keep a tighter control if it's on your own AWS instance. Also remember than openai has scale (e,g, they can batch together lots of requests from customers which mean they are using their GPUs effectively, so that they can provide predictions at a much lower cost than, say, if you are only serving very few users at any one time but have to keep the GPU up 24/7)

also cost is not the only consideration for companies who want to use LLMs. Factors such as data governance, ability to customise etc etc all contribute to whether a company decides to go with an open source model or one from an external provider

Finally, in terms of cost for open source models, if you want to use llama2 you can consider anyscale endpoints, which they claim cost ~$1 /million tokens

(and I have seen more startups now offering serverless GPU inference options, which can again help you keep costs down)

vague mesa
subtle bay
solar isle
#

Hi everyone. I am working in Roco Dasaset and trying to produce captions for images of radiology.Ā #llms I need help on this

worldly meteor
#

A quick question for everyone, has anyone explored combinations of different LLM models, frameworks (like LangChain, LlamaIndex, etc), vector databases, etc and have some experience to share, to help pick/find a good combination of them to use. This could be very much case-by-case and also depending on the kind of data and use cases we are trying to solve.

Added to it if anyone has used LLM monitoring with the help of Ragas or W&B Prompt, etc -- this is also an interesting thing to know about.

I would like to see code examples mainly, even notebooks are acceptable, does not just have to be github repos.

worldly meteor
# elder tusk *opensource LLM like Llama2 on AWS/own GPUs vs openAI APIs* **customer service ...

Economy of scale is the shortest answer. With repetition and experience comes skill and finesse.

Software solutions must run in optimised environments but when we try to replicate them in an less optimal environment, we start to pay the extra price either as tech debt or as cost of resources or both -- and here resources are not just tech resources but mostly people costs.

Many companies do not actually know the per unit cost they incur so they might be proposing a price that is easiest to consume, this could be one reason or from one perspective. Otherwise they wont have the customerbase they need to build for their future.

worldly meteor
strong edge
#

šŸ“¢ Exciting news! Ever wondered what vectors in a vector space look like? šŸ’­ Check out my latest Kaggle Notebook where I delve into the visual representation of word vectors using the** Word2Vec model on the Game of Thrones dataset**! šŸ‰šŸ“š

šŸ”— Link to the Notebook: Visual Representation of Word2Vec on GoT Dataset

In this notebook, I've explored the fascinating world of word embeddings and how they can be represented visually in a vector space. Using the popular Word2Vec model, I've** analysed the rich language of George R.R. Martin's iconic series, Game of Thrones.**

šŸ‘€ Dive in to:

šŸ” Understand the concept of word vectors and their significance in natural language processing.
šŸŽØ Visualise word embeddings in a high-dimensional vector space.
šŸ“Š Explore relationships between words and their vectors!
šŸ”€ Investigate semantic similarities and analogies between words.
šŸ’” Gain insights into how vector representations capture linguistic nuances and contexts.

Whether you're a seasoned data scientist or just curious about the magic behind language models, this notebook offers a captivating journey into the world of vectors and their applications.

Don't miss out! Click the link above to explore the notebook, and feel free to leave your comments, questions, and feedback. Let's dive into the realm of word vectors together! šŸš€āœØ

#DataScience #WordEmbeddings #nlp #GameOfThrones #Kaggle #Word2Vec #VectorSpace #DataVisualization

atomic canyon
mystic holly
#

hi everyone

#

how can i create a rag for excel files
now iam working on converting excel sheets into dataframe and then to sql tables.
but some excel sheets are to complicated and have multiple tables how can i create a structured table with correct table name as the table name given in excel sheet. If a excel sheet has multiple tables i need to create multiple tables how can i make this work. is there any better approach for excel ans csv

muted hollow
#

Hi, i need to deploy simple IRIS classifier on Azure, via CI/CD pipeline. Can someone please help me on this?

plush narwhal
#

Who has experience on llm fine-tuning for regression task?

muted hollow
# round reef Hi, What is your problem?

So I hav ea classifier model.pkl ready with me on github , and its realated code. Its super simple, just for learning purpose. I am trying to predict labe 1,2,3.
the issue is I dont know how to deploy it to azure platform.
What I want to do? place inputTrain.csv file on azure storage, which triggers training process, end result is model.pkl is created and stored on azure storage. Now IF test.csv is uploaded to azure storage, then test process is triggered, which utilized trained model.pkl and make predictions, end reuslt is it saves predictions in a csv file on azure storage.
only code is placed on github, i need to deploy it via docker and register in ecr. that ways i will have latest code there. Now to run code, i need to set up VMs. and need to set up triggers on storage folders.

round reef
#

I am going to setup a private LLM and configure it and train it on company resources.
How can I do it?
Help me, please.

round reef
#

Hi, everyone.
We are planning to deploy Llama3 for our app with millions of users.
How can we achieve this?
And which GPU series or cloud platforms are best for achieving high speed and scalability?

quaint lodge
#

Hi, everybody.
I have a question
I need to extract the abstract of papers not using GPT4, I have to rely on local resource.

from py_pdf_parser.loaders import load_file
from py_pdf_parser.components import ElementOrdering

document = load_file("JPM-2022-Harvey-25-46.pdf")
file_path = 'JPM-2022-Harvey-25-46.pdf'

document = load_file(
file_path, element_ordering=ElementOrdering.RIGHT_TO_LEFT_TOP_TO_BOTTOM
)

So I parsed the pdf using py_pdf_parser, and I'm going to merge the pieces until obtain the compelete abstract.
Now I try to use embedding models for this. But that doesn't work well.
If somebody has solution to about this, please help me.
In the case that I have to use the LLM models, the size should be under 2GB.
Thanks!

vale sequoia
#

Hello everybody, I would like to learn about LLMs, nevertheless cannot really find any courses for beginners. Can anyone suggest some course?

sly cloak
#

Hi all, i have a doubt like i am working on an project where i use RAG based retrieval for a domain specific problem, so in order to evaluate the results of my RAG based framework, what all evaluation metrics do i need and second question do i compare my RAG based appraoch with other LLMs or do i just compare it with and without RAG.

thorn storm
sly cloak
#

No it's a research project

wanton geode
#

I have a question on how models are submitted to Kaggle. Where can I find this info and is there some checks that Kaggle does or some proofs that model builders have to provide while submitting models to Kaggle?

patent oriole
earnest schooner
#

Anyone working on / with agents? Looking to join or create a group to share knowledge & learn together

twin ocean
#

Hey I am building a terms and condition summariser for software. Is anyone aware of available datasets with the T&Cs and summaries. I am using a Pegasus for text-generation and all it gives out is some gibberish with repeating words.

south fern
#

Hey anyone,

I have a set of question-answer pairs, and the idea is to pass our questions and documents to a library that uses an internal LLM to generate responses. Then, I’d like it to compare these generated responses with our existing answers using a default or customizable metric.

Is there an existing tool, library or approach that would facilitate this type of comparison?

granite magnet
#

HI, I am Abdullah I am an ML engineer want to join any team to particapte in kaggle competions

tiny egret
#

Hi, i'm trying to understand if I want to get useful insights from a big spreadsheet with survey results which way is better. I can create a rag but the rag is good to get similar content, i need to make the llm reason about what is a good insight. Any suggestion?

prime monolith
tiny egret
prime monolith
tiny egret
prime monolith
#

Thanks

fading maple
#

Hello guys. i'm looking for advice on LLM-Related topics.
So, i have some problem to decide to use Semantic Searching through Vector Database or using LLM to recognizing User-Intent prompt. i'll give you an examples by leveraging user intent to search on a site:

for Semantic Search:
we list all of prompt that might user do when they want to search on a site
when user prompt we will search for similarity through the list on our database
if we found the closest value from user prompt we could tell it that user want to search data on a site and maybe using web crawler to get it.

and for LLM:
Use a pre-trained NLP model (like OpenAI's GPT or other libraries like Hugging Face) to parse and classify user input to define what are user intent on.
if it detect user intent to search data to a site and we may use our web crawler to get it.

each of them has its own disadvantages that look like this:

Semantic Search:
Over-Reliance on Data Representations: Similarity search depends on how well your embedding model represents the query and content. Poor embeddings or mismatches in the model's training domain may lead to irrelevant results.
Limited Understanding of User Goals: While it matches meaning, it doesn't explicitly infer the user's intent (e.g., "search for this," "get a summary," or "navigate")
etc

LLM:
Computational Costs: High Resource Requirements and Inference Time
Dependency on Fine-Tuning and Prompt Engineering
etc

so, i want to know if any of you has any opinion about this matter ?

haughty trout
#

You can now build your own AI travel Agents!

Join us for a FREE workshop and automatically find the cheapest flights, compare hotels, and create perfect itineraries with AI.

šŸ“… Tomorrow, Nov 24th, 11AM IST

Limited seats available, register here: https://us06web.zoom.us/j/81989090192?pwd=MYs4Nme8smMQsMjgWIJs1aRhjoebF4.1

thorny thorn
#

šŸš€ Dive into the Future of AI with Marco O1!

I’ve just published a comprehensive deep dive into Alibaba’s groundbreaking AI model, Marco O1, designed for open-ended reasoning. This article unpacks how Marco O1 is setting new standards for developers and innovators with its cutting-edge capabilities.

Whether you’re an AI enthusiast, a developer, or just curious about where open-source tech is heading, this piece covers it all – from core functionalities to its game-changing applications in the real world.

šŸ”— Check it out here: https://www.linkedin.com/pulse/marco-o1-alibabas-advanced-groundbreaking-ai-model-nalkheda-wala-uhkmf

šŸ’” Trust me, this is more than just an overview – it’s a must-read deep dive for anyone passionate about the future of AI!

Feel free to share your thoughts – would love to know how you see this impacting the tech landscape! šŸš€

Explore Alibaba's Marco O1, a groundbreaking AI model for open-ended reasoning. This in-depth analysis covers its capabilities, impact on developers, and future

serene grove
simple coyote
#

Can anyone suggest me best course on multi agent RAG?

fierce oak
lilac sail
#

Hey folks I’m an OpenAI beta dev for fun. I’ve designed from 3.5 to what’s coming. I’m also the inventor of universal scripting language which currently lets you use 539+ languages in one place. Working on the transpiler now. Adding in one more set of characters per language (final checks) and then it’ll be into automating the ide with scheduling and cross interlacing. If I can be of any help, please let me know.

compact saddle
compact saddle
fierce oak
lilac sail
#

@compact saddle what level of work can you do? What are trained in so far? What do you enjoy doing with ml?

compact saddle
# lilac sail <@827545457790025739> what level of work can you do? What are trained in so far?...

Hi, Jordan, I can do deep learning. I use tensorflow. I have mostly worked on CNNs. I know most of the concepts related to traditional ml with sklearn and deep learning but I am currently trying to practice and get better. I am actually not sure what companies are looking for because I have tried implementing a few research papers too like resnets, densets etc. and worked on a few projects too but still...I guess working with someone who is more experienced can offer insights and ways of approaching problem, and help finding opportunities in this domain. Also, let me tell you I don't have engineering background or masters or phds, I am not sure but I guess these things may also be playing against my luck. I wish I could do these things but not in that kind of situation.

compact saddle
fierce oak
#

Ok I'll try

lilac sail
#

@compact saddle Well, what have you made for fun that blew you away?

compact saddle
# lilac sail <@827545457790025739> Well, what have you made for fun that blew you away?

So, as in model I don't have a fun answer for that as I have worked on a few and still working but there is something I have done that I think blew my mind when I first discovered it.
__
I have come up with a way to analyse the presence of consciousness in any AI system. It's a really solid idea according to me. I tried publishing my idea but it seem like research community these days are more into mathematical and data based things which I don't have. It's hard to prove it mathematically, and not even easy to conduct experiments.


My idea can easily tell if any AI system has presence of self-awareness or consciousness similar to humans. This is really something I believe can shift the way people see AI systems and intelligence. It can open up maybe possibilities to new ideas or better approach. I shared it online but it seems like people don't care unless you have some reputation or qualification of that level.

lilac sail
#

Okay. But can you explain how it does it. You can use any level of understanding to teach others about your project. Can you paste the paper text into a message and send it to me so we don’t clog the channel. If people are curious about it I have a place we can discuss any future idea. No links. No attachments.

#

If you can’t figure out the experiment. Let me know your methodology and I’ll work with you to see what can be done @compact saddle

compact saddle
compact saddle
lilac sail
#

@compact saddle I read through some of it. You’re asking if an isolated ai could be determined conscious if locked in a windowless room without knowing its own timeframe. The shadows on the wall their only points of reference. Depending on its innate abilities yes. But if you leave something by itself for too long it flanderizes into something you may not intend. If you make it time-blind it’s not a test. It’s more of can it survive or not. If you want them to compare to humans give them a writing instrument with their own learned abilities and have them take IQ batteries. They are surprising conversational.

compact saddle
# lilac sail <@827545457790025739> I read through some of it. You’re asking if an isolated ai...

Let me put this in this manner. My core idea is or assumption is:

  1. Anything that is conscious has a sense of flow of time, no matter how small. For example, if two human beings are talking to each other they both will be able to tell and even sense internally that some time has elapsed when they go back home or meet again. This ability to sense the flow of time is only and only present in conscious beings. Another example is, You can easily feel that some time has elapsed since the last time we chatted here on Discord (that is one day ago). You can sense it even if we remove hours, days, minutes kind of metrics. Time measurement is another thing but we both can tell and feel. Accuracy is not important. It's like we internally have a sense of moving from one time coordinate to another. Since the time flows in one direction in our universe this quality is present in every conscious being.

  2. The idea of the test emerges from the fact that we can still sense this passage of time if we are kept isolated somewhere in a room disconnected from the outside world. Over the centuries we have developed various ways to measure time like clocks etc. we developed these tools based on certain repeated pattern in the universe like motion of the planets and sun, and internally we have somehow aligned our sense of flow time with our inventions because time is moving in forward direction. Internally, it may not be in seconds, hours based but we kind of interpret our internal sense of flow of time with these inventions. So, if there is any AI that is conscious or self aware like human beings should also have this inner ability. It should also feel that some time has elapsed since it last chatted or to talked to a particular user. But how we are going to find this quality? It's by isolating it and letting it come up with a way to measure time or some repeated pattern. We'll have to first let its internal sense of time align with external world like clocks, or sun etc.

round reef
#

My LLM , is talking to itself
I tried chat template
But it is not working with tinyllama
Can anyone ,give me suggesion to resolve this issue

gloomy kelp
#

Hey folks, I m doing an AI master, I m strongly interested by LLMs and I wonder how broaden my knowledge beyond academic courses, do you have any recommandations to learn and practice more? btw, Im new also in kaggle

simple coyote
tulip locust
#

Hello everyone. I recently started to learn how to build language models(currently only bi-gram models) and I am having some problem in understanding a concept. I am talking about (B,T,C) (Batch,Time,Channel). I have tried to understand from online llms but failed to fully grasp it. If anyone here can make me understand it in layman's term, it will be very grateful for me. Also, any free learning resources for building of LM is highly appreciated.

astral gull
#

Hello all, i was working on a project where in I need to fetch data from multiple sources, the files are inclusive of text and video. The text files are too long which makes the entire data preprocessing time too long. I need to fetch all this data and then apply text summarization before generating the final output. Any idea how can i efficiently fetch these text files? They are not on aws or any other cloud platform. Currently the data sourcing takes more than 15mins.

rose tangle
#

https://www.kaggle.com/code/ashu009/finetuning-llama2

Hey guys I am stuck in this error for so long ... I am fine tuning llama2 and I am not getting what I doing wrong ... I followed krish naik sir's youtube tutorial... Tutorial link : https://youtu.be/Vg3dS-NLUT4?si=DVkaHUr17DFlbEnQ

Please help guys

In thsi video we will be dicussing about how we can fien tune LLAMA 2 model with custom dataset using parameter efficient Transfer Learning using LoRA :Low-Rank Adaptation of Large Language Models.
Code: https://drive.google.com/file/d/1Bd7c5rioBOmtJbDEax83vAHEPru-r06l/view?usp=sharing
-----------------------------------------------------------...

ā–¶ Play video
low prism
round reef
#

hello , im tryiing to fine tune blenderbot. I hv finetuned it , tested it and gives resposne but im makign error in saving model properly . And when i load model, it throws differnt errors.
i saved model like this
model.save_pretrained("/content/drive/MyDrive/Mental_Buddy/model") tokenizer.save_pretrained("/content/drive/MyDrive/Mental_Buddy/model")

but it doesnt make config.json and few more files ig .

sacred token
#

Hi all! I'm looking for a community to hang out in while learning. Does anyone use this place?

balmy token
#

Hi everyone,

I’m transitioning from a Neuroscience & Psychology and Data Science background to specialize in NLP/LLMs, and I’d love your advice:

Background:

  • BSc in Neuroscience & Psychology
  • 6-month Data Science specialization (2021)
  • 3 years as a Data Scientist in healthcare, focusing on EEG signal processing and classification. However, many of my projects didn’t reach completion, and I haven’t consistently kept up with the latest methodologies and innovations, so I recognize gaps in my expertise.

Current focus:

  • Completed Coursera’s Generative AI with LLMs course
  • Currently working on a small emotion-detection project (sentence classification) for my GitHub portfolio

Questions:

  • What core skills, tools, or frameworks should I master now to break into the NLP/LLM space and to be the most relevant to employers?
  • Which project ideas or portfolio pieces or experience in general you've found to be relevant in the hiring processes?
  • Any must-read papers, tutorials, or communities you’d recommend?
  • How do you stay up-to-date with all the changes, tools and innovations in the field?

Thanks in advance for your insights!

lethal canyon
lethal canyon
#
Amazon Web Services

In this post, we explore using Amazon Bedrock to create a text-to-SQL application using RAG. We use Anthropic’s Claude 3.5 Sonnet model to generate SQL queries, Amazon Titan in Amazon Bedrock for text embedding and Amazon Bedrock to access these models.

raven cradle
pearl orbit
#

Hello everyone, I am currently trying to understand chat templates for LLMS and their pros and cons
As of rightnow i am struggling to find enough resources to fully understand them so any help would be really appreciated

winged igloo
#

Hey everyone I want to learn LLMs how can I learn and where to start?

coral rain
#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 7–10 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384
.

delicate mango
#

Code Rehabilitation:
A New Paradigm
I share a concept "code rehabilitation and rehabilitators."

A story:

I appointed an excellent fast chef, a few serving robots, and retained my existing staff—even hired more. Why?

Reason 1: My chef started producing fast, good, tasty recipes with about 90% accuracy in the logistics of placing them in plates or packages to be served. But that means a lot of human interference is still needed.

Reason 2: More training is going on for my robots on how to handle exceptional situations—so there's a growing need for rehab-trainers.

Reason 3: The more demand for my recipes, the more I need marginally more staff and robots. While 90% of the task gets done swiftly (maybe even at "200 percent velocity"!), that remaining 10% requires more people and robots. As demand increases, this marginal staff requirement could become tremendous.

The Moral: An analogical scenario for why I still recruit—but in a different fashion, for a different paradigm. This reminds me of "creative destruction," a concept from economist Joseph Schumpeter (though not a Nobel laureate, his influence on economic thought is monumental). It's a sort of Brahmic cycle—simultaneous destruction and creative activity.

The Temporal Question
But here's the thing: Will this 10% shrink over time? Will next year's AI handle 95%, then 98%? Or will new complexities emerge—new edge cases, security concerns, integration challenges—that keep regenerating that stubborn 10%? The rehabilitation need might persist longer than we think, or it might evolve into something entirely different. Time will tell which way the wind blows.

What Lies Ahead
I foresee a Rehabilitation Engineering Program emerging soon—a structured discipline to train specialists who can bridge the gap between AI-generated output and production-ready systems. These rehabilitators will be the crucial link in our new *creative-destructive cycle.

(*With due regards to Joseph Schrumpeter who kindled this idea)

upper pewter
earnest meteor
#

Hi everyone,
I recently published an open-source research project focused on reducing overconfidence in large language models through architectural uncertainty quantification.

The work introduces a pyramidal Q1/Q2 model that separates aleatoric from epistemic uncertainty, based on the paper ā€œHow to Solve Skynet: A Pyramidal Law for Epistemic Equilibrium.ā€

šŸ“„ Paper: https://www.researchgate.net/publication/397341922_How_to_Solve_Skynet_A_Pyramidal_Law_for_Epistemic_Equilibrium
šŸ’» Code: https://github.com/AletheionAGI/aletheion-llm

The system shows an 89% calibration improvement over baseline models while maintaining stable height (truth proximity). Feedback and replication experiments are very welcome.

potent ravine
#

I'm finding a US developer for the collaboration. If anybody interested, please dm me.

hybrid osprey
#

hii

earnest meteor
#

šŸš€ After 6 months of building, I'm excited to launch AletheionGuard

The problem we're solving:

Companies are deploying AI (chatbots, RAG apps, agents) in production without knowing when their models are generating incorrect information.

This is especially critical in:
šŸ„ Healthcare - Wrong medical advice
šŸ’° Finance - Incorrect market analysis
āš–ļø Legal - Unsupported claims
šŸ¤ Customer Support - Wrong product information

Our solution:

An API that quantifies epistemic uncertainty in LLM responses. In simple terms: we tell you when your AI is making things up.

How it works:

  1. Your app gets a response from an LLM
  2. Send prompt + response to our API
  3. Get back confidence scores and recommendations
  4. Decide whether to show, flag, or reject the output

Real impact:

  • One healthcare client reduced incorrect answers from 23% to 4%
  • A legal tech company now catches 85% of unsupported claims
  • A customer support bot knows when to escalate to humans

We're offering a free tier (1,000 requests/month) so teams can test it risk-free.

If you're deploying AI in production and care about reliability, I'd love to hear your thoughts.

Try it: https://aletheionguard.com

What challenges are you facing with AI accuracy in your organization?

hashtag#AI hashtag#Enterprise hashtag#Technology hashtag#Innovation hashtag#Startup

dreamy spruce
#

Hello everyone,

Anyone know how to train the llm model in the local machine?

earnest meteor
novel dome
#

anyone tried building a byte latent transformer model from scratch?

dreamy spruce
novel dome
potent ravine
#

In kaggle notebook, I got the error when I run "from transformers import Trainer"


ImportError Traceback (most recent call last)
/tmp/ipykernel_77/3090685106.py in <cell line: 0>()
----> 1 from transformers import Trainer

/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in getattr(self, name)
2152 isdummy = True
2153
-> 2154 def getattribute(cls, key):
2155 if (key.startswith("") and key != "_from_config") or key == "is_dummy" or key == "mro" or key == "call":
2156 return super().getattribute(key)

/usr/local/lib/python3.11/dist-packages/transformers/utils/import_utils.py in _get_module(self, module_name)
2182 module_file: str,
2183 import_structure: IMPORT_STRUCTURE_T,
-> 2184 module_spec: Optional[importlib.machinery.ModuleSpec] = None,
2185 extra_objects: Optional[dict[str, object]] = None,
2186 explicit_import_shortcut: Optional[dict[str, list[str]]] = None,

127 
128 

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in <module>
136 )
137 from .training_args import OptimizerNames, ParallelMode, TrainingArguments
--> 138 from .utils import (
139 ADAPTER_CONFIG_NAME,
140 ADAPTER_SAFE_WEIGHTS_NAME,

ImportError: cannot import name 'is_torch_optimi_available' from 'transformers.utils' (/usr/local/lib/python3.11/dist-packages/transformers/utils/init.py)

How to fix that?

sweet ice
#

heya, has anyone done DPO with Kimi K2 Thinking?

We have the dataset for it (100K human preferences), would love to offer a bounty of compute cost reimbursement + bounty for a successful DPO run on it šŸ™‚

remote raven
#

Hey everyone šŸ‘‹
Quick question for folks building chatbots / LLM apps here —
how are you currently handling long-term user memory beyond a single session?
Curious what’s actually working in practice (RAG, DB, custom hacks, etc).

true tree
pastel patio
#

Anyone interested in giving a talk (45 -60 mins) , at University of Wisconsin - Milwaukee ???
My team is organizing a colloquium as part of the NSF REU Summer 2026 program(Link: https://sites.uwm.edu/reu/about/ ).

The colloquium will focus on Natural Language Processing (NLP), Natural Language Understanding (NLU), and Large Language Models (LLMs), with an audience made up primarily of undergraduate students. It's a wonderful opportunity to inspire the next generation of researchers and share your expertise in a meaningful way!
Kindly dm with your info if you are interested

Regards,
Pamela Martey

worldly sapphire
#

https://www.kaggle.com/datasets/mabubakrsiddiq/urdu-ghazal-dataset-32-poets-and-their-ghazals

The dataset contains poetry by 30 greatest urdu poets. Here they are:

'mirza-ghalib','allama-iqbal','faiz-ahmad-faiz','sahir-ludhianvi','meer-taqi-meer', 'dagh-dehlvi','kaifi-azmi','gulzar','bahadur-shah-zafar','parveen-shakir', 'jaan-nisar-akhtar','javed-akhtar','jigar-moradabadi','jaun-eliya', 'ahmad-faraz','meer-anees','mohsin-naqvi','firaq-gorakhpuri','fahmida-riaz','wali-mohammad-wali', 'waseem-barelvi','akbar-allahabadi','altaf-hussain-hali','ameer-khusrau','naji-shakir','naseer-turabi', 'nazm-tabatabai','nida-fazli','noon-meem-rashid', 'habib-jalib'
Every ghazal is given in three writing systems:

Urdu (Arabic Script)
Hindi (Hindi writing system)
English (Latin Script)
Divided into three folders: ur, en and hi.

Potential use cases:

NLP
Meter Detection
Modeling AI to predict the poet given the ghazal or couplet
Have fun with data!

stoic patrol
#

Hi Guys, wondering, anyone here using a 16gb vram GPU like the RTX 5060TI to run LLM workloads for actual daily coding work?

#

Been getting some good output using Llama.cpp but wondering if someone's been using similar and knows a stack that can make it run even faster etc (e.g. gemma4-26b-a4b I've managed to use a Q8_0 quantized version of it with partial GPU offload around 20t/s output)

ocean ore