#data-science-and-ml

1 messages · Page 177 of 1

serene scaffold
#

!slice

arctic wedgeBOT
#
Sequence slicing

Slicing is a way of accessing a part of a sequence by specifying a start, stop, and step. As with normal indexing, negative numbers can be used to count backwards.

Examples

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters[2:]  # from element 2 to the end
['c', 'd', 'e', 'f', 'g']
>>> letters[:4]  # up to element 4
['a', 'b', 'c', 'd']
>>> letters[3:5]  # elements 3 and 4 -- the right bound is not included
['d', 'e']
>>> letters[2:-1:2]  # Every other element between 2 and the last
['c', 'e']
>>> letters[::-1]  # The whole list in reverse
['g', 'f', 'e', 'd', 'c', 'b', 'a']
>>> words = "Hello world!"
>>> words[2:7]  # Strings are also sequences
"llo w"
spiral falcon
#

well hi, i dont learn data science in my university, but i'm learning to explore my job oppoturnity. I see that i should begin with statistics. Who can suggest me some course on udemy, coursera about it?

modern copper
#

i get it now

#

finally.

#

ill need some help in NumPy array 'axis' when i make arrays in higher dimensions

agile cobalt
random nymph
# spiral falcon well hi, i dont learn data science in my university, but i'm learning to explore...

in terms of statistics you need to know, it depends on what you're working on to be honest, but in school I've studied topics in hypothesis testing, regression, and ANOVA which have been helpful sometimes. I couldn't name you any courses, but I have an open source book that covers Python data science fundamentals. I've heard good things about it. https://jakevdp.github.io/PythonDataScienceHandbook/

mellow vector
#

Alternatively, there are some ML courses on udemy that will include a good deal of math but DS != ML engineer

#

Oh! One important note about Krista King, avoid the course with Jose, it has Data Science in the title but she pairs with an instructor who's really dry and it just mixes like oil and water (not good). I'd advise you not to waste your time on any of her programming courses.

obsidian talon
dire cipher
lime grove
#

one of the most frustrating aspects of AI with Python is that package versioning becomes a time wasting problem

rich moth
lime grove
#

I went to all this effort of setting up torch and a bazillion related packages (spacy, nltk, the entire mess), only to find that torchtext refuses to install if you're in python > 3.8

#

software rot is a HUGE issue in python

#

torchtext is in fact stranded at this stage.

rich moth
lime grove
#

nope, and thanks

spiral falcon
spiral falcon
runic flame
#

Hi everyone, not sure if it goes here but if people here are using VS code I'm sure its optimized for this purpose.

I'm currently learning the math behind ml models (starting with linear regression and gradient descent) and coding them in python as I go. My issue came up when I tried to use the sympy differentiation function to get my dJ/dw and dJ/db from my function J(w, b). I was then reminded that I don't have sympy after I cleared my library cache a bit ago (I think that's what I did but you get the point - I don't have it)

This lead me down a rabbit hole of stuff and wondering why I'm using homebrew and conda through miniforge and such. Anyway I thought I'd come here and ask for suggestions for how to set up my VS code (on an m-chip mac if that changes anything) from people more knowledgable than me. Open to any and all suggestions :)

agile cobalt
#

for most purposes, just using uv works great

uv add to add new packages, uv run if you need to use more advanced configurations, uv sync when cloning to another machine
very few packages require conda nowadays

runic flame
iron basalt
#

It's banned at some companies already, it's legacy.

runic flame
#

damn i didnt know that thanks

#

worth installing stuff through homebrew still?

iron basalt
#

IDK, I don't use Mac.

runic flame
#

ah ok thanks anyway

mellow vector
rich moth
floral badger
#

I'm using OpenCV to try to do camera calibration on my 5 DOF robot.

This camera is mounted on the robot behind the wrist. I need to solve for the matrix that related the End Effector (EE) to the Camera.

In open CV there is two commands, not sure what to use:

  • calibrateRobotWorldHandEye()
  • calibrateHandEye()

From my data, I collected the angles of each joint, and the corresponding image of an aruco marker.

I am able to calculate the relationship between the object and camera by solving PNP and the relationship between the base and EE using FK.

I have getting weird numbers saying that the camera is like 20cm away from the EE which makes no sense, it's closer to ~ 8cm

#

If anyone done this before it'll be really helpful!

knotty scaffold
#

in this funcn if the network is slow so the funcn keeps running but due network slow connection it cannot type for search so if there is a better alter to make sure that if the site is fully load or not rather than using the time.sleep

serene scaffold
serene scaffold
#

!warn 1446195934127063062 Your message was removed for asking for money in exchange for help. This will be your only warning regarding this behavior.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @flint sapphire.

rapid plaza
#

hello

#

guys where i can find for creating my own generative ai model and train it. Like i want a ai model coded not fine tuning not using anything else under the hood just hardcoded and trained by me model

#

but cant find any sources to help me build this

agile cobalt
#

first you'll need of a few millions of dollars worth of hardware and electricity bills
second you'll need of a team of about a dozen PhD researchers

waxen kindle
#

if everyone could do it they wouldn't be sold 200$/year by only like 5 companies in the world

rapid plaza
#

well i am not trying to do a nanobanana level project

#

i want to train it only on my own pictures and i would be the only output you know

waxen kindle
#

so you want a text-to-image algorithm built only by yourself ?

waxen kindle
#

nice, I think we can reduce the requirements to 2-3M$ worth of hardware and only 8 PhD researchers

rapid plaza
#

xd got it

#

but what about my own language model only trained on one topic for example history

waxen kindle
#

that's better, do you have 2 M$ and 6 PhD researchers ?

agile cobalt
rapid plaza
#

so basically 90% of AI projects are Fine Tuned versions of the others ?

waxen kindle
#

more like 98% I would say

rapid plaza
#

understood

agile cobalt
#

maybe more like 99.9%

tbh for LLMs might be like, 99% just use base models as-is with some prompt engineering, then 99% of that 1% that "train" a custom model are fine tuning it

rapid plaza
#

One more question lets say i want to create an ai app for historical things like wars, battles and revolutions and i want to have images like where there are casualities corpses and blood is there a gen ai that supports blood content

#

uncensored

waxen kindle
#

but you want to produce these images ?

#

Instead of showing historical ones ?

agile cobalt
#

I'm sure you can find some versions tuned for generating unsettling content online, I don't want to go anywhere near it though

and even leaving gore aside, I'd be very hesitant to show AI generated images as if they displayed real historical events

rapid plaza
waxen kindle
#

then don't show images for these events

#

tbh deep faking historical images is something I would avoid very much

#

I mean, if you make up images, why would I trust you regarding the remaining of the content ?

#

Is this battle AI generated too ?

#

how can I know ?

agile cobalt
waxen kindle
#

I don't think you need AI at all for your app, I think the only thing you need is historians

rapid plaza
#

this wont go public just for university to get my grade and thats it

#

if you have any better ideas i am more than open to them

agile cobalt
#

what even are the requirements set by the university?

waxen kindle
#

you don't even need to finetune a model, you only need a RAG

#

which is a good project in itself

rapid plaza
#

i wanted to do law but it had so many backdoors and exceptions

agile cobalt
#

perhaps better not though, there is a considerable chance it will make stuff up and I don't think you could ground it in reality very well

rapid plaza
#

Thanks guys a lot

#

but i am lost again xd

waxen kindle
#

I think a RAG and a bunch of historian reports on a specific war could be great

#

find something like one hundred documents regarding a specific war and recent war from which you can find some pictures

#

You can have an algorithm to retrieve the relevant picture based on the question

#

and the RAG will find an answer to it

#

somehting like that

rapid plaza
#

well my original idea was to only to take Azerbaijani history and get the information from our own historians and create a model to only answer questions if it is related to Azerbaijan. Well now i know i have to fine tune it and avoid using ai generated images for history because of realiability issues

waxen kindle
#

that's an idea, maybe find some document about a recent war or simply about Azerbaijan since 1900 idk

#

or since their indenpendance

#

you will be able to find a lot of picture and a lot of document

#

and not only about war

#

something about the "Azerbaijan since 1990"

rapid plaza
#

okay i will try to figure it out

#

thanks again

foggy egret
#

Yoo guys, could anyone please tell me how to practice python for Data Science and Analytics. I mean I always feel stucked while importing various libraries and downloading assets. How to manage it

jaunty helm
#

+1 on RAG, will be way more achievable
quite a few recent embedding models like jina v4 are multimodal too, which means you don't have to worry too much about lining up text vs. image embeddings, nor worry about documents coming in janky formats because you can just take each page as an image and embed that

midnight ermine
#

Hey guys

#

I made a package a while ago (it's currently in submission to JOSS) for deep learning specifically leaning heavily on transparent and Python only code, so that it is easy to learn the ML theory by reading the codebase

#

Just wanted to share it here in case anybody would find it helpful. Because I know people liked micrograd from Andrej, so this is something somewhat similar, but you can actually use it for real projects

#

Feel free to email or open an issue on github if you have any feedback or comments, or just DM on discord 🙂

placid pine
wide basin
#

hii, i am new to this ,i wan to learn ai ml dl ,etc like this stuff ,please help me ,can you just give stepstone marks that i have to cross in next 2 months

mellow vector
wide basin
#

Also ,shame to say but I am beginner. I am still learning python,so give step stones from starting

woven prairie
#

Can some tell me how can I reduce latency of my agent,
I am buliding a multi agent tool which can do some task for each task I have agent and each agent have certain tools. I am using gpt40 model result are fast but I am going to add speech to text and text to speech in it so it will cost some latency too. I am using langgraph to build it and It's my first time buliding and multi agent tool, truly speaking me building my first agent.

Any good advice to reduce latency what can I do,
Make prompt more better , in my tools I am having mongo db operation so make them fast , something like this I can do
Other than this what else I improve

serene scaffold
glad wing
#

Hi everyone! I'm currently working with a server (I can send you invites if you're interested) on a beginner-friendly AI/ Machine Learning project using Python, and we're looking for more teammates.
Requirements:

  • Basic Python (or JavaScript) knowledge
  • Interest in AI/ Machine Learning
  • Willing to learn and work as a team
    If you're interested, feel free to DM me. Thank you!
woven prairie
serene scaffold
# woven prairie I am using open ai key and model gpt40

so that means that you have to wait for the prompt to travel over the internet to OpenAI's servers, then wait for the result to generate, and for it to travel back to you. so there might not really be a way to speed that part up.

#

how many tokens are each prompt?

woven prairie
#

Yeah

#

See as poc I am using in memory of of langgraph and I am sending full state to model as conversation memory

serene scaffold
#

I'm not sure what you mean by that. but you might be able to cut down how long it takes for OpenAI to generate a response if you make the prompts more concise.

#

also, if you know where the OpenAI server is physically located (like in Ashburn, VA), you might try running your code on a server that's located near there.

woven prairie
#

Ok you mean to say if I am able to send a request to my nearest server of open ai then my response will be fast

serene scaffold
woven prairie
#

Correct me if I am wrong
If server one is located 1 mile near to me where I am having my tool .
Another server is 2 mile
So 1 mile one will give a fast response.

serene scaffold
woven prairie
#

Okkk

serene scaffold
#

is this a data science or AI question? otherwise, try #1035199133436354600. also, please be more specific about what you're trying to do.

clear light
#

Alr thanks

woven prairie
#

One thing I did was we were making a mongo operation, some of my team mates were fetching the complete data from mongo then filtering out it,
I updated the approach and made filtration itself in mongo db and then fetch the data.
So it reduces my space

serene scaffold
woven prairie
#

Yes, I am new so I lack fundamentals

#

But still learning

hardy wren
#

Hi everyone,

I’m planning to use this title for my Master’s dissertation:
"Machine Learning-Based Vibration Analysis for Early Fault Detection in Marine Diesel Engines."

Could you please share your thoughts on whether this title sounds clear and strong enough for future PhD scholarship opportunities?

Thanks a lot!

old token
#

Hi is there anyone from the Ilastik image processing organisation . Its a suborg under Python which I wanna contribute to

lime grove
#

today I learned: BERT-tiny, BERT-mini, BERT-small, and BERT-medium can all be trained on a 4 year old laptop with an RTX-3070

#

Only thing is that you need transformers[torch]

serene scaffold
lime grove
#

fine-tuned 🙂

#

basically get it from hugging face, and then apply some dataset to it.

#

you can get pretty good accuracy on dataset tasks with ~15 minutes training step

#

for the biggest BERT in the set, "AlbertForSequenceClassification"

iron lark
#

how hard is it to make an AI for recognizing sign language?

grand minnow
iron lark
grand minnow
iron lark
lapis sequoia
#

@glad wing

glad wing
lapis sequoia
polar anvil
#

Hi
Im busy doing a course on udemy 100 day course for python and wanted to know what would be worth doing next after finishing that course?
Should i start learning the modules and stuff?

silent nest
#

Hello I am learning file handling, data analyst should I message here ?

lofty horizon
polar anvil
lofty horizon
polar anvil
lofty horizon
flat abyss
flat abyss
iron lark
#

i do not know how neuron network work

flat abyss
#

So try to learn it before anything,
My recommendation is to staring with making a neuron-network that recognize XOR

#

You will get the fundamentals with pass forward, back propagation, neurons, weights and error cost and a global understanding of neuron network

iron lark
#

cant i do opencv tho

flat abyss
#

Yeah with python you'll need it

#

but idk if it recognize sign language

#

you will have to train an IA with sign languages then process with opencv

#

I think

opaque condor
#

Is there a beginners book to pytorch

iron lark
#

but then id need an ai and massive amounts of data to get words out of that

stuck brook
#

Hi to everyone, which are the best sources to get deeper into AI/ML? I know the basics, I just want to get very good in this area, to really understand it. Any github repos, forums, etc. would be more than helpful.
Thanks ! :D

serene scaffold
silent nest
#

I learnjng data analyst should I message here ?

silent nest
#
===creating folder===
my_folder already exist

===Creating files===
created - image.jpg
created - document.txt
created - song.mp3

===moving files===
file moved outside.txt into my_folder

[Program finished]
```py
silent nest
silent nest
rich moth
#

What coures are you taking?

silent nest
rich moth
#

Same here lol

#

Im more hands on

silent nest
silent nest
rich moth
#

Alright..

silent nest
#

But they don't tell me to buy a course

rich moth
#

So it a was free public learning course?

silent nest
silent nest
rich moth
#

Those are very good, just as well.

#

What country are you from?

silent nest
silent nest
rich moth
#

Cool man, you seem passionate with your choice. I salute you. 🫡

vapid notch
#

I have a strategy I checked it's results on ticks by ticks data but it's taking me ages to automate it

#

I tried making it for ninjatrader in c# but got errors

vapid notch
# worldly dawn why?

I don't know what's the error during compilation the code compiles and gives no error but in the strategy analyzer it isn't taking any trades.
I even tried in python by creatina bridge which would send trades to the platform but no success

worldly dawn
#

and why do you have compilation issues with python?

vapid notch
worldly dawn
vapid notch
worldly dawn
#

then use a broker API

vapid notch
worldly dawn
#

As engineers, it might be tempting to focus on very fancy ways to compute all sorts of things. But that would be missing the other half of the equation: a strategy that works

#

As you start putting your strategy in the real world, you will also learn many things about the market! Be it about slippage or bid-ask spread or other market behaviors

vapid notch
worldly dawn
# vapid notch I agree with this

yeah it's a classic mistake. You could spend years making a very fancy trading and backtesting platform, but have zero strategy to trade with

vapid notch
worldly dawn
vapid notch
#

I'm done with the Backtesting phase

worldly dawn
vapid notch
worldly dawn
vapid notch
worldly dawn
vapid notch
worldly dawn
#

That's 3k$/year

vapid notch
vapid notch
worldly dawn
#

think about gross vs net profit, slippage, etc. I doubt you will start betting 500k$ in capital in your first month either

worldly dawn
vapid notch
worldly dawn
vapid notch
#

So I already think what I got in the backtest in live markets the performance would go down to 20-30%

worldly dawn
#

could be more, could be less

vapid notch
#

yess i am thinking of hiring a developer

worldly dawn
vapid notch
worldly dawn
vapid notch
worldly dawn
vapid notch
#

thats what i did in the backtest for the last 1.5yrs of data

worldly dawn
#

jshell> 70 * .85 *40
$14 ==> 2380.0

so ~70 trades with 85% win ratio and each time you win, you make 40$

#

that should cover your VPS easily already

worldly dawn
#

you said real trades

vapid notch
#

i did on a demo account in the live market not with real capital

worldly dawn
#

ah not the same

vapid notch
# worldly dawn you said real trades

noo i said that the logic on which i have been backtesting my strategy for the 1.5 yrs of data so in live also i did the same but on a demo account not using real capital

worldly dawn
#

get your feet wet with real trades before thinking about hiring people

vapid notch
worldly dawn
#

build that experience

#

do things manually first so you can learn more

#

quite tangential, but nonetheless relevant

#

you want to see in real life how things happen, how it feels, the pressure with it and learning from it. From there, it will make automation easier and you can take these learning

#

you cannot skip that step

#

You will not be able to go from "in theory it works in my backtesting" to a "fully automated tick level strategy on a VPS in real life" without intermediary steps

#

at least if you want to keep or grow your capital

vapid notch
#

yess agree with it i will look into these things

#

thanks for your help men really appreciate it

worldly dawn
#

you got it! You will do great!

vapid notch
#

will update the progress here

worldly dawn
#

looking forward to it

vapid notch
stark plume
#

hi there,
i am developing a codebase search tool like deepwiki as a side project.
I am doing a hybrid search (semantic similary + bm25).
But while doing an A/B testing between Semantic Similarity only VS Hybrid.
This is what I observed, 1. When I ask "how does authentication works in this project?", some times hybrid search performs better, sometimes vector search performs better.
2. When I ask, "How does the DiagnosticAnswer model works here? Is it related to authentication here?", the hybrid search performed better. Hybrid search works better when we are specifically looking for something (keyword specific), for example, How a UserModel is implemented.

I have developed a RAG system around it.

RAG_BM25_WEIGHT=0.3             # BM25 contribution to RRF
RAG_VECTOR_WEIGHT=0.7           # Vector contribution to RRF
RAG_RRF_CONSTANT=60 

here are my initial weights.

Any suggestions on how can I improve the system?

jaunty helm
#

if you can take a bit of a performance hit, rerankers are models trained specifically to re-order retrieved documents based on relevance to the query, that might work better too

dire cipher
#

Need some second opinions for a good reliable Linux distro for database experimenting and program development for future Data Science projects, what would be a good route to go? Mainly will be playing with SQL and NoSQL w/ AI using those sources.

gentle flower
#

Linus Torvalds has officially released Linux kernel 6.18, the latest stable edition of the kernel that underpins countless devices — from tiny embedded boards and laptops to sprawling data-centre servers and supercomputers.

agile cobalt
worldly dawn
rich moth
lime grove
#

WSL2 is the next best thing

#

I would prefer to be in an all-Linux work environment, but there is too much Micro$oft legacy drag

#

I became a bit of a convert when I managed to compile my very own version of Vim on WSL. Everything was the same as it had happened on Linux

lime grove
#

heck, the base WSL distro even came with compilers included. I actually find that heart warming - it's the way things ought to be

stark plume
wicked parcel
#

Anyone here working with docker, docker containers with GPU utilization on Ubuntu servers for Agentic AI? Currently working on a big project need friends send help. 😂😂

wicked parcel
# grand belfry how can i help you?

Well it took me 12 hours to set up my old laptop (by old I mean 2 years old) up to run proxmox I have a Ubuntu desktop VM with full access and GPU utilization, which was the hardest part to pass through besides setting up networking because of current drivers for Nvidia and cuda. I am mainly looking for a mentor or group of people I can get in with who are cyber security focused. I am working on an Agentic AI network security infrastructure and have no one to talk to about this as all of my friends have no clue about anything I'm studying.

grand belfry
wicked parcel
#

Thanks man! Yeah I will say as someone who's always been into coding and tech, the past 2 years of study and diving deep into deep learning Agentic ai advanced Python and all of the libraries within, how does anyone do it?

#

I do currently have a simple chat bot I'm setting up on the server to aid in building the entire thing, as far as I know I have vs code set up to ssh into my container. What's like the best ide i should be using? I've messed around with jet brains stuff, cursor, zed, atom, and a couple of others but vs code has always been the most comfortable, probably because that's what they had us using in college

#

Also, if you work with Nvidia, do you know if they have support for Ubuntu 22.04lts to upgrade from nvidia-535 drivers to nvidia-570-open?

#

I mean the graphics cards not the company itself

jaunty helm
stark plume
jaunty helm
#

anyhow you can still try the ideas of reweighing and/or reranking

woven prairie
#

Can we Make and multi agent tool using langgraph and chain, where when user query so our master agent handel tool calling and normal conversation, when user query is formal and no other sub agent to activate master agent will reply normally but if master agent think that sub agent to activate then it will activate it then that agent will run tool and return output to user without giving back it to master agent , next time when user will query the user query will go directly to subagent not to master agent that subagent will decide that it has solved the user query then pass response to master agent and then master agent will reply back.

ashen ridge
#

Hi im hoping to get an apprenticeship in data science or as data analyst what sort of projects would make the company hire me quick??🌝🌝 its entry level but still, i need some gooood project ideass

serene scaffold
ashen ridge
serene scaffold
serene scaffold
#

in the current market, luck isn't on your side. if you're serious about starting a career in this space, I think you should reconsider your strategy.

woven prairie
#

My question above

subtle lotus
rich moth
#

Maybe you guys got some ideas, but im working on a self improving AI system that can rewrite its own code. so far so good, but on the 3rd cycle the AI made the code async I guess for performance.. but it didnt update the caller that expect that sync functions. the test still passed due to python funkyness but its breaks the system architecture. but for self modifying AI that will improve multiple files at once, what do you guys think the best recovery strategy would be here? Im not exactly set on how I want to build this part, how it iterates overself itself to apply fixs.

thick heron
#

can some one suggest me how to find legit actual implemented proven research papers that makes sense and actually contribute and not some review nor random case study on google scholar i am unable to find a really good paper that is worth spending time on ? or am i doing it wrong is ml being filled with random stuff ? i am trying to find papers on pure ml based implementations

worldly dawn
ashen ridge
serene scaffold
ashen ridge
# serene scaffold that's the same as luck. are you serious about doing things that would make you ...

Im on the survival mode in the current stage of life, am i serious about doing stuff? Yes ,,,, even if it takes years? Idk , i only have a year, either i get into the industry by starting uni next year or getting an apprenticeship, uni - not possible due to finance and im not eligible for student loans, Apprenticeship- the only other option to get in the industry, i could learn and most importantly I could earn.

serene scaffold
ashen ridge
ashen ridge
#

Uk

#

What country are you from?

thick heron
thick heron
ashen ridge
#

@serene scaffold r u gonee?

worldly dawn
serene scaffold
thick heron
worldly dawn
ashen ridge
#

Btw thankssss❤️

serene scaffold
thick heron
#

Can I get advice i optd ai and ds in my b tech every sem feels like another storm intense study is required and project must some how work in very little time (india btw ) that leaves me less room I don't know whether to do project or grind leetcode i already have 9.63 gpa i m in my 2-2 4 more sems left what should I do I feel like 0 skills i won't make it to the market

worldly dawn
flat sail
#

i dont get how to use tensorboard

grand minnow
rich moth
#

beautiful im running into the exact same problem current advance model deal with " python " this crap is haunting me

#

then when a model sees """ something, something """ in the header or anywhere in the code, it freaks out

#

but I've seen this pattern before. it might just be picked up by training models but it think its deeper in the python source, or more likely how """ is used in python and """ in the context of prompts

#

filtering stuff out just seems like the wrong approach in all this

#

its some sort of paradox around prompt injection and python sytax with triple quotes

rich moth
#

So the model loses strctucal boundaries when it collides with """ """ in python it seems.

#

but in training these models they invertedly created this whole nightmare? does this tie to PEP 257 docstring conventions

#

this seems pretty big

#

Because """ can be tokenized in 3 or 4 different ways , here we are lol

#

literally the ghost in the shell.

#

really all I can envison going forward in my project is completely removing docstrings

#

thats the wrong direction also though

#

they allow us a layer to talk to the LLM via the code and also humans. in some situations they can act as crucial instructions

vague inlet
#

has anyone tried

#

any cloud provider that hosts a big LLM for u

#

like deepseek and such

#

and just gives api

agile cobalt
#

hosting an arbitrary model you pick typically gets very expensive as they need to dedicate a big machine only for you

but serverless APIs exist for most common models and are the main way developers use LLMs

idle summit
#

@vague inlet not from a cloud provider specifically but we use openrouter in production. Been few months now

hasty elk
#

Been working on a tool that might help folks here - it crawls websites and outputs clean, deduplicated text ready for fine-tuning. Exports to JSONL (OpenAI format), Parquet, or CSV. Handles quality scoring and language detection automatically.

Useful if you're building training datasets but don't want to deal with HTML parsing and dedup manually.

https://apify.com/mea/ai-training-data-curator

Happy to answer questions about how it works!

Apify

Curate clean, deduplicated training data for AI models. Crawl any website, score quality, export JSONL/Parquet for GPT, Claude, Llama fine-tuning.

iron mauve
#

So I’ve been making a custom
LLM and just gave it its own voice the road to making real world Jarvis or Cortana is looking bright

rich moth
#

wow i found it ``` prompt = f"""Rewrite this Python file to address the improvement focus.

FILE: {file_path}
FOCUS: {focus_area}

CURRENT CODE:
{current_code} ``` current code started with """ it was throwing the whole thing off but i couldnt see the """ behind {current_code}

silent nest
full nexus
#

Hi! Do you guys have any recommended youtube channels for learning machine learning? Even just the basics

serene scaffold
#

(I'm sure there are more good ones, but I haven't been in the market for a few years.)

rich moth
#

What do you guys think of the UI? Got any suggestions

grand minnow
rich moth
#

What file is highlighted currently.

#

i should have sent a picture with that

ashen sable
rich moth
rich moth
#

I was thinking of quick keys like shift +1 and shift +2 to expanded or collapse the left or right panel. But 3 different models, like fully open, partial, or fully closed.

iron mauve
#

( please fork if creating your own ) push bug fixes to main though

rich moth
#

Got this Flux model running, this thing is a resource hog

rich moth
#

Jesus, I asked it to genereate some simple dobberman pixel art.

ashen sable
ashen sable
# rich moth like pytest?

What's that ?
like...les say I have 10lines of code I don't want to make another file just for this sake of those 10 lines just want to execute on terminal

ashen sable
#

Is there any product ,m

#

?* I'll be happy if there isn't...||I'll make one and add to my resume||

rich moth
#

ya im not exactly sure what you're asking. but i think thats what you are thinking

ashen sable
#

Like basically i write the code in a file i don't want to save it nor do any other bs just one click it'll run and give me output not line to line output also

#

Alr I'll just look into it ig..ipython looks better

rich moth
#

I think theres scratchpad and playground

#

sounds like what you're describing

ashen sable
lime grove
#

Who comes up with these silly UI choices?
The behavior of a classifier (or any estimator) in only printing non-default parameters is the default behavior in scikit-learn versions 0.23 and later. This change was implemented to make the representation of estimators more concise

#

Why do I have to go to the codebase to see what the defaults are?

#

I set a parameter, then print the estimator, but I can't see what's else is in there: it only shows me the one I set. So, I have to waste time and focus by hunting for the right article in the sklearn homepage

#

I literally spent hours trying to figure why the heck there was no regularization parameter in the LogisticRegression estimatorJFC

#

You have to add this line immediately after import sklearn sklearn.set_config(print_changed_only=False)

waxen kindle
#

Feel free to raise an issue on their github then....

lime grove
#

might do that

rich moth
#

😘

lime grove
#

That's my actual issue with this. Why make it opt out? It's so exasperating

wooden sail
#

there's no easy alternative to reading the source, other than writing your own functions with a lower level module

lime grove
#

I'm currently trying to translate a tensorflow NN to pytorch and yes, it's a problem

#

Because, for some wonderful reason, tensorflow stops at 3.9

#

Maybe just look at the NN with a visualizer, and take that the pytorch side where things just make more sense

#

It's also really annoying that unless you print the whole parameter set in the estimator, you'll overlook the fact they the default optimizer is LBFGS, which is aggressive and unstable

wooden sail
#

but it does make strong assumptions

rich moth
lime grove
#

The optimization is failing to converge

#

I need something more boring, like cg

rich moth
#

I installed tensorflow 2.17.1 on 3.11

#

Maybe im missing something.

lime grove
#

Maybe I am. I can install it on 3.12, just checked

rich moth
#

Ahh ok

gilded depot
wooden sail
gilded depot
wooden sail
#

under the conditions i mentioned 😛

gilded depot
#

the minima is on the right in the middle

wooden sail
#

you're showing several trajectories there that end up at different places

gilded depot
#

thats not bfgs I just googled the function image

#

let me find bfgs

wooden sail
#

it's very easy to show that if you don't have the above conditions, bfgs will reorient your gradient in the wrong direction

gilded depot
#

it can't since its always multiplying gradient by a positive definite matrix

#

that guarantees a descent direction

wooden sail
#

it doesn'T because that positive definite matrix is not the hessian of the true function

#

that's the strong assumption

gilded depot
#

the hessian is a zero matrix

#

its a fully piece wise function

wooden sail
#

i think you mean piecewise linear there

gilded depot
#

yeah

wooden sail
#

and then clearly you can't get the right hessian when quasi newton algorithms multiply the gradient by the inverse of the approximate hessian

gilded depot
#

thats theoretically but it still works really well

wooden sail
#

using a true newton method would NaN right away, and a quasi newton will give you a descent direction based on a positive dfinite matrix that is not the hessian

#

they are also strictly talking about lipschitz functions btw

#

so even if they are not everywhere differentiable, they are smooth in a different sense (not exploding values)

gilded depot
#

I have a problem like this but even worse, and its mainly like this because i was too lazy to make it better, but bfgs works on it really well

wooden sail
#

that's good, and bfgs is indeed more stable that regular newton methods

gilded depot
#

its aligning a grid of beats with one BPM change allowed to beats detected by some algorithm

wooden sail
#

but also it doesn't work on every problem

#

very importantly in the paper you shared, their conjecture says nothing about global optimality

gilded depot
#

no method works on every problem but I'd say BFGS is one of the most robust ones except for big problems

wooden sail
#

just that you will find some point with 0 in the subdifferential

#

(hence why i said you need to start close to the true solution)

gilded depot
#

oh you know what

#

i use pytorch and it has no subdifferentials

#

its just the gradient is either from left side or right side

#

which fixes it

#

at least im pretty sure

wooden sail
#

pytorch and all other ml modules are very opinionated with subdifferentials

#

they pick one for standard functions and always use that

#

this can give you different results if you run the same alg with tensorflow instead of pytorch

#

i don't remember off the top of my head what pytorch gives for abs(x) at x = 0, probably 0 or 1

#

but anything between -1 and 1 would work as subdifferential there, and bfgs will do its best to give you a positive definite matrix even at x = 0

gilded depot
#

I also used backtracking line search so probability of hitting subdifferential exactly is basically 0

#

i red somewhere that for bfgs its better to use backtracking

wooden sail
#

yeah should be the case

ashen ridge
#

How did u guys get into data science?

#

Are yall satisfied in life?

lime grove
#

To me it's just an extension of the work I did during PhD

ashen ridge
austere wing
#

Hello fellow snekers. I'm working with openpyxl to generate reports (I can use only openpyxl) where inserting values into cells are easy, ofc it is. But what I found struggling with is Excel Error Indicator (a little green triangles in left upper corner of Excel cell) that I can't remove by code. I tried manually change cell data_type with s, @ and even ' character. Nothing works. Anyone got super pro tip to get rid of that green Error Indicator?

silent nest
#

Is anyone working on data analyst, file handling ?

rich moth
lime grove
#

regarding optimizers, there is no single best solution in terms of both finding the minima, and computational efficiency. The only ones that will always find the minima are rather inefficient (e.g. simulated annealing with gradient-free simplex polytopes). So, it is sort of empirical in a way. If one doesn't work for some reason, just switch it with another one

#

the is particularly true if you do not know what the function looks like.

#

maybe it's the difference between 85.34% accuracy and 85% accuracy? It might not matter in the end. A convergence failure might actually result in a satisfactory result, given the type of insight that is sought.

austere wing
#

@rich moth I'm writing float strings, and also floats itself, for example:

ws["H20"] = f"{model.get_value():.2f}"
ws["H21"] = f"{model.get_value():.2f}%"
ws["H22"] = f"{model.get_value():.2f}%"
ws["H23"] = f"{model.get_value():.2f}"
ws["H24"] = model.get_value()
ws["H25"] = model.get_value()

Both are treated as dates...

I tried to set ws[cell].number_format = "0.00" or ws[cell].number_format = "0.00%"
that doesn't help either
also tried ws[cell].data_type = "s"

rich moth
austere wing
#

I found a solution from one dev on another server.

from openpyxl import Workbook
from openpyxl.worksheet.errors import IgnoredErrors, IgnoredError

wb = Workbook()
ws = wb.active

ws["A1"].value = "1234"

ws.ignored_errors = IgnoredErrors(ignoredError=[
    IgnoredError(sqref="H20:H25", numberStoredAsText=True),
])

wb.save("report.xlsx")

I'm trying this now

silent nest
#

I found this is a almost dead chat ig only few people intrested in ai and data science

serene scaffold
silent nest
#

):

lime grove
#

question for whomever: how much time do you, as a data scientist, spend reviewing actual pen and paper math? By this I mean going over things like proofs?

#

for instance, earlier we were discussing the LBFGS vs. CG algorithms, and their convergence properties. I would assume that algorithm stability could be traced to something like determinants of poorly conditioned matrices in a denominator somewhere

#

I would imagine that being able to walk up to a white board and justify a decision using math is a desirable ability

#

the thing that makes me nervous of gradient based optimizers is that you might have discontinuities in the surface being differentiated. This was often a situation in force field optimizations, and in such situations gradient-free approaches were recommended.

calm cipher
#

Has anyone been having issues with lightning recently? I'm working on a project where installing Lightning version 2.5.5 causes a huge memory leak in my model to the point it fails after an epoch or two, whereas previous versions of lightning don't have this issue

wooden sail
#

when interviewing new masters and phd students, we give out tasks to evaluate this type of reasoning when possible

lime grove
gilded depot
lime grove
#

Gradient-free optimizers don't have this problem, but they take forever to converger

#

The Nelder–Mead method (also downhill simplex method, amoeba method, or polytope method) is a numerical method used to find a local minimum or maximum of an objective function in a multidimensional space. It is a direct search method (based on function comparison) and is often applied to nonlinear optimization problems for which derivatives ma...

gilded depot
#

unless the objective has very specific well known structure I feel like all you can do is try different methods and see which one is the best

lime grove
#

yup, that was my point. Just try and see what works. Or, kill a fly with a howitzer, and use something like simulated annealing

#

Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. For large numbers of local optima, SA can find the global optimum. It is often used when the search space is di...

#

I recall that SA was typically used for optimizing models of proteins, to see if you could get the right folding behavior. Pretty heavy duty stuff

wooden sail
#

and SA also has no convergence guarantee

tidal garden
fallow coyote
#

has anyone tried implementing some form of machine learning in CAM/CNC programming? if so, for what purpose and how did you do it (e.g. for the CAM software you used)? and also a general question, for those of you in engineering (any discipline), how have you utilised ml and could you share your projects with me (i.e. link me your github)?

lime grove
wooden sail
#

by setting the hyperparameters to values you could only know if you already knew enough to solve the problem, yeah 😛

lime grove
#

yeah, it is always a chicken/egg problem. You don't know what's there till you know it.

#

in optimization the key is that your "walker" must have the ability to get out of local minima, perhaps even to a higher minima, and explore the surface in a non-local fashion.

#

this is not an easy problem, actually. Difficult AF

#

a good literature for this is David Wales' book, which is computational chemistry, but don't let that dissuade you from reading it - it is packed with algorithms for optimizing difficult landscapes
https://www.amazon.com/Energy-Landscapes-Applications-Biomolecules-Cambridge/dp/0521814154

#

Like real-world glasses, which are materials that are stuck in a type of intermediate phase because nature itself can't figure out the right optimizer to find a global minima within universe time scales

ashen ridge
#

What are some good entry level ds projects that could get me an internship

open trout
#

So if I want to make a machine learning AI, would python be the best option?

silent nest
#

I need a help is printing destination is correct or bro_files in the print stage . After shutil.move

bro_files = "my.txt"
with open(bro_files,'w') as file:
    file.write("hello")

destination = os.path.join(folder,bro_files)
shutil.move(bro_files,destination)
print(f"file moved {destination} to {folder}")
sleek temple
#

Hey linux this I'm data science student I'm in begginer lavel

woven prairie
#

I’m using FastAPI for the backend, but I’m not sure how to display or showcase the AI-to-AI conversation on the frontend.

wet dome
#

what is the point of the epsilon error term in Y = f(X) + epsilon
We assume there is some relationship between X and Y, so why do we need the error term?

wooden sail
#

the common example being noise: all electronic devices have noise due to heat causing electrons to move. so any data you measure with an electric/electronic sensor will have random noise added to it

#

another is if you assume there is a relationship between y and x, but the real relationship is slightly different. then the epsilon term can absorb this modeling error

wet dome
#

thank you

wooden sail
#

here, one assumes that y = mx + b, but our orange straight line does not actually pass through any of the data points

#

the difference between the orange line and the measurements (blue dots) is the epsilon

opaque condor
silent nest
opaque condor
#

Do I have to use have to load a custom data set and do I have to use a bounding box to track items moving in a video or just a picture

agile cobalt
#

custom data set
if you want to train a model you'll need of a dataset - if you want for it to perform great in a task, the dataset you use needs to reflect that task

whenever you want to train or just use a model someone else trained, as well as whenever you curate a dataset yourself or use one someone else has built, is up to you to decide

video or just a picture
not sure but pretty sure it depends on the model

opaque condor
#

No the import data set so that it sees it as a data set not just a file full of photos or images I don't know how it sees it as possible I just want to know since I know I could look online but I want to make a specific data set teach myself

thick heart
#

anybody use gemini or the cursor google squid thing while coding?
or maybe even firebender?
i feel like im stuck in 2018 the way im coding hearing about all the cool tools people are using to PROPELL their workflow, ive used chat gpt but found it produces many errors, Geniune question how are these ai tools helpful to you guys?

agile cobalt
#

it depends on what you are working on

for many projects working with large code bases or trying to tackle complex problems, it'll be more of a burden than of help

for simple tasks on small (or new) code bases, it can help a bunch with generic stuff and boilerplate

there's also some difference in between the latest premium models and older/cheaper ones, but even the best, latest and most expensive models still make simple mistakes at times

waxen kindle
#

Do you have questions regarding them ?

opaque condor
lime grove
#

🤦‍♀️
pip install ndjason

pallid panther
#

Any telugu ppl here ?

waxen kindle
#

what would you ask them is there were ?

rich moth
rich moth
#

Having the AI actually inside the environment and see you project gives you tremendous advantages. I think thats why things like claude code are becoming successful. But im having great sucess with it too, even with local models

#

i wanted to make something like that, but with local models

agile cobalt
flat plinth
wooden sail
#

even in the example i gave, where f is linear for simplicity, epsilon isn't the intercept

flat plinth
flat plinth
wooden sail
#

it's also not the bias of the network, that would go in f

flat plinth
# wooden sail it's not a constant here

? Constant in a way its not bring multiplied by input variable. Yes I know parameters are not constant thats the entire purpose of training to tweak them, I was just stating my intuition for the bias parameter in weight-input variable plane.

wooden sail
#

it's not the bias here

flat plinth
#

Okay my bad. I didn't notice y = f(x) + epsilon read that as y = mx + epsilon somewhy.

#

Lack of sleep signs fr

rich moth
rich moth
#

I did it! The qwen 3 30b model is a serious upgrade. It detecting the entire project entirely. The context is much longerbut its cutoff on your guys end. But then I turned it around and asked it to create a technical diagram using its knowledge.

#

well a prompt for it, I was going to feed it to the sd3 or flux model but ive having an issue copying and pasting in the prompt section of my UI.

#

my .venv is populating the RAG database too ...hmm

#

i need ideas for a work around the .venv folder contains 34k files.

latent pendant
#

in what cases i should use sqlalchemy

grand minnow
flat junco
#

Hi, all. A complete LLM rookie here.
I have recently built my first PC with 5060 ti 16gb and 64gb ram to learn LLMs.
I am interested in learning image LLM fine tuning and needed some tuts recommendations or any help.
What I have done so far is setup comfy ui and created a workflow with realvis v5 model and it works smooth.
I am also looking to explore a bit in text generation LLMs, where I need some UI and model suggestions that would are good for local hosting. I am not sure what 8b/30b/70b model exactly means in terms of how much can my gpu handle. I am guessing it can handle 30b.
Any help is much appreciated, thanks!

serene scaffold
agile cobalt
#

fine tuning uses up some extra storage on top of the normal amount you would need for inference though, and for LLMs the context can also require a fair bit of memory (proportional to how many tokens you want to fit in the input)

flat junco
serene scaffold
flat junco
#

are there any good model suggestions?

serene scaffold
#

to do what?

flat junco
#

text generation and fine tuning

serene scaffold
#

right, but what kind of text generation?

#

every LLM can generate text and be fine tuned.

flat junco
#

i don't know tbh, i need to explore the space

#

just need to learn

jaunty helm
jaunty helm
opaque condor
#

@rich moth how do you get your plots?

jaunty helm
flat junco
jaunty helm
flat junco
#

so i can use a sdxl model + lora to generate images? or lora can individually generate images?

jaunty helm
flat junco
outer nebula
#

I've been working through Kaggle a bit and must say, I find the occasional shade it throws entertaining

lime grove
#

regarding pipelines, scikit let's you do stuff like this
preprocessing_pipeline = Pipeline([ ("vect", CountVectorizer()), ("tfidf", TfidfTransformer()) ])

#

followed by a nicely abstracted statement
prep_trn = preprocessing_pipeline.fit_transform(data_trn.data) prep_tst = preprocessing_pipeline.transform(data_tst.data)

rich moth
#

ive been playing with a control theory style "governor" around text gen. but this time instead of doing a RAG and hope the whole thing behaves correctly, im measuring the drift of every step and clamp or reject generation when it starts going off the rails. but what surprised me was how often model stays on topic white making stuff up, obviously topic similarity is enough alone. I was tracking the topic aligment , evidence support and novelty and those get combined into a single error signal, once it crosses the thresholds the controller nudges temperatures or injects constraints or rejects the chunk and retries. its like a PPL feeback loop for language instead of open loop setup. The results coming back are really interesting though. I ran 40 steps but i didnt show them all.

lime grove
hasty ivy
#

guys can you give me roadmap for ai

karmic zealot
#

hey i was wondering if you guys can recommend some books to learn about LLMs and deep learning with pytorch, I am familiar with linear algebra and calculus.

weary timber
#

im trying to create a big ai and human written code dataset , as you could guess the most missing data is the ai code, if anyone wants to help dm me! (even you having cursor or almost fully ai coded (popular ai code editors preferred) would help)

woven prairie
#

What's the best way to write prompt for an agent, i tried many iteration and in 50 times 1 time my prompt fails.

#

I need accurate answer

rich moth
# lime grove How are you defining the drift? What is the metric?

im measuring topic alignment (cosine similarity between the current text embedding and anchor embedding (RAG context) , evidence support and novelty. but its basically (1 − topic_similarity) + penalties for unsupported novelty / low evidence, clipped to 0,1

serene scaffold
charred light
#

I'm trying to write my df to a csv that's 500k rows. Rereading it, Pandas and excel has no issues reading 500k rows. However, Databricks is misreading the columns because I have a "description" field that can contain your typical deliminator characters (Pipe, single quote, double quote, comma, etc).

How do I write this to_csv without any issues from databricks? I've seen csv.QUOTE_ALL option but wouldn't this just end up with the same issue if there's mismatched deliminator characters in the description?

Example description: bob's cat has many diseases that transferred to sally"s cat | no incurred, pending Using any standard deliminator here would still cause issues?

#

Also, it looks like missing values (np.nan) and None would turn into "nan" and "" respectively under csv.QUOTE_ALL

lime grove
rich moth
#

seems to be working well, better than i imagined. im testing it with llama3 right now

rich moth
rich moth
lime grove
#

yeah, that is what I suspected. I lack the hardware to self-host, so I am thinking about the cloud, and how much that will be.

rich moth
#

Lambda has H100s for 2.49 an hour

lime grove
#

as far I am concerned, anything I do in this regard is strictly pedagogical, so perhaps think of it as "tuition"

#

do you do much NLP work on the input / output side of this? You are mentioned a lot of cosine similarities, embeddings, and so on.

rich moth
#

ya, i got lost in the saunce a few years ago. I did a lot of RAG and NLP stuff.

#

not so much these days. I had a chatbot I was working on for a while, but get easily distracted with new projects. but will eventually circle back

lime grove
#

AWS EC2 has some reasonably priced accelerators. Maybe $.50 for A100 (P4 tier)

rich moth
#

ya thats not bad at all

lime grove
#

it has enough RAM, IMO

#

sorry, $1.475

rich moth
#

What do you plan to do with it?

lime grove
#

right now? my goal is RL

#

but it is all embryonic

#

focus on small LLMs. See how far I can take one

#

some can actually run on my laptop (RTX 3070 + 8GB), but that's just unwise

#

Runpod is basically 1/2 price for similar compute

rich moth
#

ya you can get a 5090 for .76/hr

#

A40, 48gigs is only .20 cents . seems like the best deals so far

rich moth
#

I did a test, one without the governor the baseline and one with.

#

The text was far better with it too, the baseline got a confused and start spewing unrelated stuff

#

ill make a github for it

#

right now you need a source of truth though, like a RAG.

#

i wanted a more modern lightweight vector DB. has anyone used LanceDB?

rich moth
rich moth
#

This is super interesting guys I wanted to share more. In this prompt you can see the system fighting as it attempts to not regurgitate the same thing thing it already said. But you can see between lack of data it pulls on the subject in addition to the model trying to maintain the maximum token output . Its like hallucination = (force output ) / (available knowledge) it seems likes important to instruction the models of their limitations by answering that it doesnt know based on available data.

solar osprey
#

Hi everybody, how to handle an outlier in a small dataset like 27 entries? the outlier is just 1 but the column is 'total_population'

i am new to data processing like this, any suggestion on how to handle this ?

Note : this outlier is in a csv that will be merge with other csv to make a model

agile cobalt
blissful monolith
#

is Microsoft Visual Studio good

#

pls tell me

serene scaffold
blissful monolith
serene scaffold
blissful monolith
#

whatever i use

serene scaffold
blissful monolith
serene scaffold
#

it doesn't matter. use whatever you like.

blissful monolith
agile cobalt
# blissful monolith is Microsoft Visual Studio good

Visual Studio or Visual Studio Code?
some editors are extremely suited towards specific languages or toolchains (great for that, but not very ideal for other use cases) - in general I'd recommend either using one suited towards whatever you are working on, or just using a general purpose one

Visual Studio is geared for the .NET ecosystem, while it's possible to use other languages with it, I'd strongly recommend using VSCode or PyCharm instead
(or something like Marimo / Jupyter notebooks)

#

if you are already familiar with any of them (including VS) you can use it, but if you'll need to learn how to use it either way, it's best to pick one that has better documentation on how to use it for whatever you need it for - like https://code.visualstudio.com/docs/python/python-tutorial

blissful monolith
rich moth
#

I cant decide on UI for this thing. Any suggestions?

rich moth
#

you guys rock

#

Here we are right now llama 8b with llava 13b right now

#

I like this UI better though..

plush shuttle
#

guys guys guys

#

i just built a ani ai for my robotic arm instead of using lerobot
Epoch 0 Loss 1113.9988
Epoch 1 Loss 350.0988
Epoch 2 Loss 349.2488
Epoch 3 Loss 349.2484

plush shuttle
ashen ridge
#

Hi what are some good books to cover all of maths for machine learning??

#

Even if i could get a list of topics i need to know will be helpful

dreamy solstice
plush shuttle
gloomy shore
# plush shuttle Its like lerobot

Cool, just seen a 5 minute video its hand moving by ai.. something buzzword about "Action tokens" what's interesting how do you get a target to compare for loss, or do you even get a target ?

proven raft
#

I’m new to this field, and I’m working with a Kaggle dataset (the current playground competition) that contains many extreme values. I don’t yet have experience handling them, so I’d appreciate learning what approaches you usually take in such cases.

limpid zenith
proven raft
limpid zenith
proven raft
#

thank you for the answer @limpid zenith I'll search for it

limpid zenith
#

what model are you using to solve the problem?

#

i can tell you what class imbalance algo you need

#

class imbalance is essentially when there are rare classes causing the model to predict only the dominant classes more frequently because there are more examples of it

proven raft
# limpid zenith what model are you using to solve the problem?

I haven’t decided yet. I’m currently in the learning phase, focusing mainly on data preprocessing techniques and feature engineering that work across different models. My plan is to train several models and try to get the best performance out of each one. For my first attempt, though, I’m planning to start with logistic regression.

limpid zenith
proven raft
limpid zenith
#

no problem 🙂

jaunty helm
#

I'd suggest beginning with the "getting started" competitions instead, at least in those you could actually do feature engineering

proven raft
jaunty helm
# proven raft thank you for the info I will try one of those competitions.

link here if you can't find it
specifically the titanic, space titanic, and house prices should be good places to start; some others like store sales dive into more advanced topics like time series
and if you still want to tackle this month's playground (diabetes), just note that the dataset's quite detached from reality and I don't think you can really use insights from the medical knowledge of the real world

proven raft
tight bough
#

hi ... can anyone please help remove this double index ? I am a newbie when it comes to pd

#

.drop() doesn't work as it ain't considering it as a column

#

this occured after i sorted the coin_id column and used .reset_index() resulting in the left most index

ruby mural
#

When you learn pytorch , how much control do you have over ai

limpid zenith
#

"control"

ruby mural
#

Can you use ai like in an app, I am new at this

limpid zenith
#

that's a very broad question, but yes

ruby mural
#

Like use it in the backend using pytorch like If i was to design an AI powered app

limpid zenith
#

yes it's possible

ruby mural
#

Ohkkk

#

You are a python dev , but what do you specialise in

serene scaffold
limpid zenith
warm lily
#

Intro:

Excited to find this channel, I managed some infrastructure for a guy with a PHD in Artificial Intelligence at my last job and learn some things because it was his first job, and I finally got the money to purchase some hardware.

tight furnace
#

are the constant warnings just something I have to put up with 🤣

#

looked into using stub files but couldn't seem to get them to work

#

wait I was being dumb, I forgot to delete my invalid ty.toml file lol

#

but also come to think of it, does anyone bother with stubs for packages like this? because I'm noticing how it completely kills autocomplete for anything not defined in the stubs file

warm lily
# fickle shale cool!!

I am absolutely thrilled to have some gear, I was able to get Ubuntu Server running with latest CUDA drivers and Toolkit to Serve LLMs via vLLM. My first real experiement is going to be fine tuning Llama3.2-1B-Instruct with documentation from my open source project SDKs just to see how it reacts.

tight bough
blazing lake
#

LLMKit - parallel LLM streaming comparison 🧵

Built with Next.js + SSE to handle:
• Multiple provider APIs (OpenAI, Anthropic, Google)
• Real-time TTFT metrics
• Custom scoring algorithms
• Secure local API key storage
Perfect for choosing models for production apps
https://www.llmkit.cc/model-comparison/gpt-4-vs-claude-3-5-sonnet

LLMKit

Compare GPT-4 Turbo (Azure) with the same input. See which model performs best in terms of quality, speed (TTFT, latency), tokens, and cost. Test and compare for free.

rich moth
#

What do you guys think?

inner roost
#

hi! could someone help me with a task im trying to finish? im supposed to make a house recommendation system and it was going well, no errors or anything but then the outcome itself was all NaN and i have no clue what i did wrong (im a complete beginner so this will look pretty stupid to many of you lol)

serene scaffold
#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

inner roost
#
import pandas as pd

all_houses= {
  'price':[100000,200000,800000,300000],  
  'rooms':[2,2,3,3],
  'size':[2200,3200,3300,2200],
  'floor number': [2,1,4,5]
}

row_labels=['house 1', 'house 2', 'house 3', 'house 4']

all_houses=pd.DataFrame(all_houses, index=row_labels)
all_houses

wanted_house = pd.Series(
    [400000,5,4400,3],
    index=['price', 'rooms', 'size', 'floor number']
)

recommendations= all_houses.corrwith(wanted_house)
recommendations
#

not sure if i did that right but..

serene scaffold
#

that's right, thanks. one sec

serene scaffold
inner roost
#

oh wow yes

#

what was the issue?

serene scaffold
#

the axis part.

inner roost
#

would you mind explaining that please?

serene scaffold
inner roost
serene scaffold
#

one axis is vertical and the other is horizontal. for rows and columns.
arrays can have even more dimensions.
apparently, when you did all_houses.corrwith(wanted_house) without specifying an axis, it picked the wrong one to align things.

inner roost
#

thank you so much :)

lime grove
#

model.selection.GridSearchCV(n_jobs=N) is what a multiprocessing abstraction should look like

jaunty helm
vale field
#

Does anyone know any resources or tutorials etc for exploratory analytics to do on datasets before training classifiers? I've been trying to understand distributions and relationships in my data in my datasets. I've been using youtube tutorials and other websites but I want more help. Would appreciate any advise and recommendations.

open umbra
woven prairie
#

Guys, have some used eleven labs before

elder epoch
#

Guys i m data scientist but i got the opportunity as an ai engineer but i need to switvh my whole cv to ai engineer version , is it good to mention in the project side that i ve worked on qcm genration from pdf which is smtg very very basic specially from someone that wait a lot from me

waxen kindle
#

What's the difference between both roles ?

#

I mean, put in your resume everything that is relevant for the role you apply to

serene scaffold
fluid latch
#

Hey folks, wanted to share nexttoken.co. We are ML Engineers who've frequently coded in notebooks, and we were frustrated by how clumsy AI agents still are with actual data workflows (pipelines, modeling, analysis, etc.). Agents on notebooks feel clunky and AI IDEs don't quite have the right UI for data work. So, we decided to build a new tool focused on this type of work. It's currently in beta and free to use. Would love for this community to take a look, and share their feedback. If you give it a try, please send any feedback (good or bad) to feedback@nexttoken.co.

ruby wharf
#

Hello, I’m a high schooler who is interested in the field of Ai. I am currently studying how to make a model generate words. Any thing I should know while going down this path?

serene scaffold
ashen marten
serene scaffold
ashen marten
#

really only meant like in terms of I have 2 extra consumer grade gpus and a unused cpu most crash course understanding of code and ai does a lot already to make it work. So are we to the point where you could build yourself a local machine essentially, with your own project. I'm sure there's enough code by much smarter people than me

even a good ide can fix more than I could imagine.

woven prairie
#

Hi

#

Has anyone worked with eleven labs before

grand minnow
woven prairie
#

Stream audio too

grand minnow
#

Audio is slower than text anyways

winter canyon
#

#python-discussion message
This was one of my questions I asked in #python-discussion and I found out that this doesn't need ai at all.

But obviously my intention is to build an ai assistant kind of thing that does some basic stuff and maybe add home automation as a feature later(someone said pyserial gotta research Abt that).

How do I approach this given I have only started ai/ds and maybe upgrade the assistant later.

How would I start learning about ai with this intention and Any suggestions/Tips for a beginner
Thank you in Advance

agile cobalt
# winter canyon https://discord.com/channels/267624335836053506/267624335836053506/1453289765515...

depends on which languages you want to support and if you need of any nonstandard features

for starters you can use Whisper v3 for speech to text (original by OpenAI, many more efficient variations exist), but there are a lot of models you can look into

and for text to speech, again depends on which languages and how much quality & expressiveness you want, but as a default kokoro is pretty good imo. Again, loads of alternatives

winter canyon
agile cobalt
winter canyon
#

English incident_actioned

#

I understood what u meant

agile cobalt
winter canyon
#

Nah I don't need it to have such features just basic to do and ✅

agile cobalt
#

if you wanted to create your own model or fine tune an existing model you should read them, but just for using it is not that much more complex than using an online API

winter canyon
#

Aah ic so how do I start about learning ai/ds or is my approach wrong

agile cobalt
#

if you want to study in order: math -> statistics -> data analysis & data science -> machine learning -> artificial intelligence

#

you may as well just start from those resources either way, it's just that the project you're doing likely won't require tinkering with modifying models

winter canyon
#

Ohh

thanks ❤️

I will learn from those resources incident_actioned
tho idk what u mean by math and statistics I will check that out too

#

Ohh I understood why math needed got it incident_actioned

winter canyon
#

I think I cant self learn this(Ai Ds) like I did python so I am doing this course:
Harvard CS50's Artificial intelligence with python(a video on YouTube)

#

So ig that's a good course incident_actioned

warm lily
#

@winter canyon good find on the CS50 I'm properly gonna do that as well this year let me know if you want a study buddy, I just started DeepLearning.ai's courses last week. I guess im a pro now lol.

strange lintel
#

i use a python script to catch drones and helicopters. the script uses datasets and ollama gemma3. ai queries got a lot faster yesterday. they went from 16-22 seconds to 1-4 seconds. i use ollama in windows 11 with an i7 cpu. has anyone experience a speed increase recently?

vale delta
#

can anyone recommend me python course for ds

strange lintel
#

i never took courses. i used to do mostly c, perl, php and some javascript until i started ufo hunting. i found sound datasets opencv tutorials in youtube and then i added ollama to the code

grand minnow
strange lintel
#

wow combat code to learn python

grand minnow
vale delta
grand minnow
winter canyon
abstract wasp
#

Hi can someone who’s in ML give me feedback on my cv so far, Imma apply to mle roles (I added the basic like name, school, major, contact info at the top + still not completely done w the app)

minor sphinx
#

for those who are graduated/already work in an AI field. Do you use propositional logic/predicatelogic/lambda calculus in your daily work life
or is it something useful in a way?
also merry christmas

blissful lynx
#

Hi

#

what AI tool do you see US engineers using the most these days?
ChatGPT vs Claude vs Gemini vs Grok vs DeepSeek ?

minor sphinx
#

idk if that helps

blissful lynx
#

Got it, thanks!

#

I’m mostly curious about what’s most common in real-world dev workflows rather than “best on benchmarks.” Which one do you see engineers sticking with, and why?

minor sphinx
#

I think each has its own use depending on the field of engineering. i just ran ur question through gemini and here is the summary (ofc i cant say anything ab the US because i dont live there, and nor am I an engineer yet)

blissful lynx
#

thanks!

#

I agree that it varies by field.

rugged condor
#

Hi everyone,
I’ve recently started learning about Bayesian Networks in my university AI Python course, and I’m having trouble understanding the core ideas behind them. I’d really appreciate some help clarifying a few concepts.

I’m confused about whether Bayesian Networks are mainly used to compute things like conditional, joint, and marginal probabilities, and what those probabilities actually represent or are useful for. Are Bayesian Networks only used for calculating these probabilities? I also don’t quite understand why we need a graphical or data structure to work with these probabilities instead of just calculating them by hand. Another concept I’m struggling with is inference — what it means in the context of Bayesian Networks and why it’s such a central idea.

I’m also curious about how Bayesian Networks are used in real AI applications, since right now the theory feels very abstract to me. What I’m really looking for are very simple explanations and beginner-level, pen-and-paper style exercises that help build intuition step by step. If anyone has an exercise sheet, notes, or learning resources that helped them when they were first learning this, I’d be extremely grateful.

Thanks in advance for any help!

blissful lynx
minor sphinx
blissful lynx
lime grove
#

yowza - API calls to deepseek are 1/20 the price of Mistral per M tokens. 1/10 for output.

limpid zenith
waxen kindle
#

What is the title of the section with your data scientist role ?

abstract wasp
waxen kindle
#

Okay so I would put everything in the same section "Experience"

#

I think the word relevant shouldn't be there, of course if something is not relevant, it won't be included

#

you don't need to say it

#

in my opinion, you don't have to write any single tool/technology you used in every single lines of your experiences

#

I mean it's not relevant to me that you used Deboose to do one specific task, or performed t-test etc.

#

Be succinct, you can make it fit into one page by turning all 2-lines bullets points into one

#

Plus remove the one that are not relevant for the role you target. You can keep the key word in the bullet points only if they fit he job description you apply to

vale delta
#

can anyone recommend good python course on youtube
i am getting confused in oop's concept

rugged condor
waxen kindle
#

!res

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

winter canyon
#

@warm lily u still there

wicked furnace
#

why is ai agents so tedious and unreliable? it often feels like it destroys way more than it provides... am i missing out on something?

outer cloak
#

guys...

#

did u all know that u can type
from sklearn import DecisionTreeClassifier
instead of typing whole,
from sklearn.model_selection import DecisionTreeClassifier

warm lily
#

@winter canyon I won’t be around today and have work tomorrow. My schedule might not compare to yours to do that course

winter canyon
warm lily
#

I could probably spare one hour a day to get through it. I think I can still get through my other learning objectives while doing that.

#

Dm me for sure

minor sphinx
#

Also thank you

#

and may I also ask what did u study or how you got into the role

abstract wasp
gritty ivy
#

hi people.
has anyone made a linear regression/neural network in assembly language before?

serene scaffold
waxen kindle
gritty ivy
#

finally, I'm free !

serene scaffold
gritty ivy
#

well, I kinda gave up asking people, it's been littearlly 4 months since I started this

serene scaffold
#

In the amount of time that you spent doing something that complicated in assembly, you could have finished that project in python, and learned a bunch more things

gritty ivy
iron basalt
gritty ivy
#

cool

iron basalt
#

Assembly is not really hard, just tedious.

#

Being able to read it is worth learning though for performance reasons, but writing it is rare now.

rich moth
#

Has anyone tried any of the ARC competitions before?

limpid zenith
rich moth
#

What was your expereince with it? What did you build?

limpid zenith
limpid zenith
#

I've solved 37/400 test examples
when u say u solve it...it was the public eval right? was it an inference time only solution?

#

we recently has somone who claimed they modified TRM with a better score and didn't even know what Exposure bias was but was training autoregressively...apparently they coded everything with claude code/llms...so the code was completely wrong

rich moth
# limpid zenith i'm in neural program synthesis

ya, the public eval. Its inference time only, no neural net training on the data. But its program synthesis using a couple different things ive been tinkering with. Parliament voting system , typed dsl with grammer based enumeration, beam search with near miss scoring and a dreamcoder style libray that attempts to extract resuable skills from solved program (symbolic). I tried expereming with an LLM as an advisor but it doesnt really change anything.

limpid zenith
#

yeah those typed DSL systems don't do well on ARC

rich moth
limpid zenith
#

yeah makes sense since the search space is so large

rich moth
rich moth
next glen
#

hey

#

i want to get started with machine learning

grand minnow
warm lily
#

I've used pgvector in the past on a project, im going through a training course that mentions weaviate, am I understanding the functionality correctly that I cannot associate the ID and vectors to text in the same database, I would still have to use something like Postgresql to store the human readable payload of the documents?

vale field
#

Guys i have a question, it might sound like a dumb question but plz excuse me I am a beginner in data science/analytics. For datasets where all the columns are categorical data, when you need to make visualisations to show the data distribution and relationships in the data, how can you show distribution in categorical data?

#

I've mostly worked with datasets that have columns with continous data so far.

warm lily
#

I wasn't 🙂

limpid ember
slender crown
#

Who will win

rich moth
#

Man, I feel like one of the biggest challenges im facing is automating a rule discovery system, something that can invent a new primitive from a small set of examples. So far im manually having todo it by running the 400 sample test each time, find the near misses, extract and train pairs them and then manually spot the hidden rule (which usually a single primitive) but then add that new primitive and solve. Only thing I can think of is focusing on near misses and building some type of feedback to steer the inventing/creation part. Anyone got any ideas?

vale field
#

also i have another question i wanna ask. Basically in the datasets I've been exploring so far i also made boxplots and noticed there are a lot of data points above the upper quartile. My question is should someone take these data points as outliers without using any anomaly detection technique or is it ok to say datapoints in boxplot above q3 are outliers so it is ok to remove? (depending on the dataset etc)

fiery dust
#

hey guys, where should I learn data science in python if I already know the math behind the math of all models?

#

and, are online certifications important? or not really

serene scaffold
fiery dust
#

Thanks for the answer.

jaunty helm
#

(but if you're really worried, just cross validate and check if removing them makes your model better)

rich moth
#

i had a wild idea. that RAG system i've been working on. I was already using the deepseek r1 8b for reasoning task with vector embeddings and the drift monitoring. Then I thought I'll just incorporate it into the ARC for the rule discovery system. And its exactly what I wanted, but i loaded 3 near miss task from the jsonl file, which contain tasks that are 50-75% correct. But it actually produced the correct formula I already solved.

#

Problem is what happens when we run out of near misses and hints/clues

glacial root
#

hey, i'm trying to understand the procedure of how backpropagation works and just wanted to make sure my understanding is correct. so when we take the gradients, since the partial derivatives also depend on the input, do we take the average across all inputs?

serene scaffold
#

If you fully expand all the computations of a neural network, it's one really big function with lots of nesting

limpid ember
glacial root
#

yeah but when we analytically compute the gradient, it also depends on the input right?

limpid ember
glacial root
#

as in the input is a part of the formula

glacial root
#

but there are many inputs

#

so would we take the average across all the inputs in our data?

limpid ember
glacial root
#

i mean the input vector

#

containing all the features for a specific input

limpid ember
limpid ember
jaunty helm
glacial root
#

so we pass in the inputs combined as a matrix?

glacial root
warm lily
#

Been a minute since my statistics, calculus, investments, and accounting college classes but it gets me by in healthcare tech. To keep progressing with understanding AI should I first brush up on some fundamentals then perhaps start with linear algebra then progress into Andrew ng’s deep learning module?

rich moth
#

49 low hanging fruit now the hard part begins.

limpid ember
limpid ember
rich moth
#

It makes totally sense how they design the ARC2 test though.

limpid ember
warm lily
rich moth
#

I knew I’d hit this wall. I’ve had some success combining an LLM + RAG using the proven formulas... it’s even detecting the successful vectors.

limpid ember
rich moth
#

Ive tried deepseek r1 8b and qwen 2.5 coder 14b

#

anything over those parameters just make it slow even when I can fit the entire model in the VRAM

jaunty helm
glacial root
limpid ember
glacial root
#

my question is, since we have many inputs in the dataset, would we plug in each input and take the average across all inputs

limpid ember
warm lily
jaunty helm
#

well, if you mean your 1 sample has multiple features - no averaging
if you mean you train on n samples at once - then yes ((mini)batch training)

limpid ember
glacial root
limpid ember
jaunty helm
#

if you mean your whole training set is n samples, and you train on 1 sample at a time, so you input a vector of shape [1, ...] into your network in 1 step, then no you're not averaging

rich moth
#

But it doesnt take a big model todo that.

glacial root
#

so in this case, would i repeat the process of feeding in 1 sample at a time until the loss is less than epsilon?

#

so then once i have gone through the last training example, i loop back if i have yet to get the loss within the threshold

jaunty helm
#

and you can train for multiple epochs

boreal nebula
#

damn

#

this chat seems intellectual

#

i am still stuck at scikit

glacial root
#

for regression, is it a good idea to use pca before training the neural network?

jaunty helm
jaunty helm
boreal nebula
fiery dust
#

Hey guys, what's your opinion on coding in C? Like, doing ML projects in C?

limpid ember
limpid ember
warm lily
#

You know its nice to prove yourself right with a little money spent, some experimentation, to justify leaving naive direction while you're still learning yourself, "We cannot use Docker for XYZ reason in our demo production environment". XYZ reason doesn't exist on my setup, sharing CUDA drivers to the container. 🙃 I never could fathom why accessing the hosts CUDA drivers was a bad idea and why doing so would prevent PyTorch from recognizing it.

Been waking up at 5am to learn without an alarm 😂 Happy New Year all looking forward to sharing and learning with everyone this new year.

exotic star
#

if i want to pursue a career in ai, is learning 3d modeling as well smart? or would it hinder my progress to much?

serene scaffold
exotic star
#

thats exactly why i asked if i should do that

karmic zealot
#

Any CV resource recommendations?

#

Computer vision

serene scaffold
exotic star
#

used to freelance a little with 3d too

exotic star
#

tho i wouldnt do 3d later on for work

serene scaffold
#

!cpban 1156584657371025438 scam thing

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied ban to @barren cloak until <t:1767194991:f> (4 days).

karmic zealot
#

?

#

I want to learn classification cv, any resources you guys have in mind?

limpid ember
limpid ember
exotic star
jaunty helm
#

3d generative modeling is an active area of research ig

foggy jay
#

Hi everyone 👋

#

What exactly
does it mean for a machine to learn something?

serene scaffold
hidden trail
#

Its all about patterns

iron basalt
# foggy jay What exactly does it mean for a machine to learn something?

Originally, behavioral modification through experience. For a machine to do this it involves storing what was experienced (sensed) in memory for later recall, either directly storing what was sensed uncompressed, in a perfectly compressed form, or in a compressed form that loses some of the information but uses way less space (lossy compression, like JPEG images). And/or by nudging some (virtual) system's behavior in some direction based on that sensed thing without really storing that thing. This is a sliding scale, because even nudging the system can be seen as storing (a very small) part of what was sensed in it (just very indirectly, not for direct recall).

#

What is stored depends on the problem statement, you may be able to ignore a lot of the incoming data/what was sensed (not relevant to the problem/noise).

rich moth
#

Man this person did not beat around the bush lol https://arxiv.org/abs/2503.23923

glacial root
#

also when it comes to linear regression, usually which is better, normal equations or gradient descent?

waxen kindle
#

Linear regression, there is a direct formula for it

#

Use it

exotic star
#

thanks for the link this looks realy interessting

foggy jay
rich moth
#

Maybe start somewhere like here.

bold grove
# foggy jay What exactly does it mean for a machine to learn something?

basically it’s just letting the machine fail until it figures out a pattern, there's a formal way to define it where a program learns if its performance on a specific task gets better the more data or experience you feed it

technically it’s just a massive math function with a bunch of adjustable numbers or knobs every time it makes a bad prediction it checks the error and tweaks those numbers slightly so it’s a bit closer to the right answer next time it’s not really thinking with consciousness but it's building its own internal logic instead of just following rules a human hardcoded

iron basalt
# foggy jay A computer is also a machine, Let's say our computer downloads the entire data f...

It's also about what that data is used for. The original definition of "machine learning," as it was originally coined, was referring to a program that played Checkers and would memorize board states and their evaluations such that next time it did not have to evaluate them. So if the intent of that storage is for a decision making program to make better decisions, it falls under machine learning.

#

Machine learning is directly tied to the history of AI, but since then has taken a life of its own as a programming paradigm, in which rather than explicitly programming in "rules," it's driven by data, lots of samples from which it determines the correct "rules" to have.

#

So you may be familiar with how functional programming differs in paradigm in that rather than given an explicit set of instructions you give a set of rules/constraints and it generates the correct instructions from that (declarative). This is kind of taking that even further in a way. Rather than write down the rules ourselves we just feed the program tons of examples and what to do in each case, and it tries to figure out something that works for all those cases (and more (generalization)).

#

So we are even taking away that step a human would have done, where they look at tons of examples and try to come up with the general rule(s) and then write those in code.

#

Not just out of laziness, but rather because in many cases it's too complex for humans to ever come up with the rules by hand (and even if we did, they may be so many and so massive such that we could not feasibly ever make a program for it by hand).

#

(For example with image detection, we previously tried this (heavily), and it got nowhere (only working somewhat under ideal conditions))

#

(To understand the problem with this, try making a human face detector with hand written rules, it teaches something important about complexity that humans can't deal with directly like that (this was often done in computer vision classes and still is, after which they then move on to deep learning) (it's tempting to assume that we have just not tried hard enough or are just not doing it right, but I can assure you we have historically taken that idea to the extreme, making very complex manual implementations, and that deep learning blew that all out of the water (this was deep learning's first popularity explosion moment actually)))

lime grove
#

Anyone here have any experience with Python's DoWhy module?

grand minnow
lime grove
# grand minnow why do you ask?

I am about to get into causal ML, and I would like to know if there are any caveats / instabilities, issues. For instance, I am already finding out that DAGitty seems to be no longer under any sort of maintenance

grand minnow
#

!pip dowhy

arctic wedgeBOT
#

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions

Released on <t:1762578310:D>.

lime grove
grand minnow
lime grove
grand minnow
lime grove
#

!pip jupyterlab-dagitty

arctic wedgeBOT
lime grove
#

maybe that ^^^^^

lime grove
#

that's a plot.

#

a graph has nodes and edges.

ashen marten
#

I don't really know how to code python, but I brute forced a couple different chatbots to create a .exe file renaming program mostly to make my life easier on renaming folders with many pdfs, is there a place I can share my github link with the .py and my released .exe?

ran out of my free limits on copilot and claude to get to v9 but I'm pretty happy with it. I don't really know what should go next after this for managing/fixing it

I struggled heavily through compiling it so I have the requirements.txt and .py because I'm betting the idea of running some unknown .exe isn't that great

But i'm not sure where/who I could share with to see if it works like I think on a different system

lime grove
#

I once spent something like 3 months trying to figure out WTF was going on with a binary I had compiled. It wasn't producing the right numbers on some standard benchmarks. Turns out that someone above my paygrade had mixed/matched compilers for the numerical libraries I had linked.

#

Compilation is usually a painpoint, e.g.

lime grove
ashen marten
arctic wedgeBOT
ashen marten
#

oh I meant to include the readme.md and the requirements.txt

openpyxl==3.1.2
tkinterdnd2==0.3.0
pyinstaller==6.3.0```
arctic wedgeBOT
ashen marten
solemn frigate
#

I want to learn data science,