#data-science-and-ml | Python | Page 169

opaque condor Jun 25, 2025, 10:57 PM

#

A medical I was taking photos of a human eye to see if that patient had a heart condition or something worse that couldn't be diagnosed early on

Somehow the AI found the biological sex of each patient without ever being trained on it

serene scaffold Jun 25, 2025, 10:58 PM

#

opaque condor A medical I was taking photos of a human eye to see if that patient had a heart ...

What kind of model was it?

opaque condor Jun 25, 2025, 10:58 PM

#

I can't remember what type of medical model think it was 2015

serene scaffold Jun 25, 2025, 10:59 PM

#

Okay

#

I guess we'll never know

lucid wren Jun 26, 2025, 3:35 AM

#

I am going to pursue Core CSE should I do?

hearty token Jun 26, 2025, 4:15 AM

#

Is there a good book on fine-tuning, for things like understanding hyperparameters and functional analysis of the different curves like loss grad norms?

rain kelp Jun 26, 2025, 8:42 AM

#

hello everyone i need someone to help me solve a problem that i am unable to tackle. i have created a forecasting model on a train df. i need to test the submission on a test df. the difference between these 2 is that the test df does NOT have the 'sales' column but the train df does. the test df starts on the day the train stops and goes on for 15 extra days. my question is:
how do i make the lag values for the first 7 days of the test df? i need to take the values of the first day and assign it the 'sales' value of the train df but i am unable to do it. Also there are many values for each day as sales are made for each store and family of a certain shop group.

could someone help me make this function?

#

the model i trained on the test df has shown that lag values are very important features in predicting sales so i need to make them as well for the dest data

jaunty helm Jun 26, 2025, 11:28 AM

#

rain kelp hello everyone i need someone to help me solve a problem that i am unable to tac...

why would you have problems specifically for creating lag features for the first 7 days in test_df?
I imagine you'd either have no problems making lags at all (e.g. you're using an autoregressive approach), or have problem making lags for all test data except for the first day, not this weird in between

agile cobalt Jun 26, 2025, 12:43 PM

#

"similar pricing"?
Haiku is 0.80/4.00 USD per million input/output tokens, while Gemini Flash is 0.30/2.50?

unless Anthropic's tokenizer is much more efficient that feels like a giant difference to me?

wet dome Jun 26, 2025, 3:30 PM

#

hi guys, im trying to implement my own linear regression class. One issue im running into is, for some datasets where values can go up to thousands for example my learning rate has to be really small, e.g. 1e-8. Otherwise my parameters blow up to infinity

#

does anyone know why

past meteor Jun 26, 2025, 3:41 PM

#

Do you have evals or just vibe check?

long ivy Jun 26, 2025, 3:44 PM

#

wet dome hi guys, im trying to implement my own linear regression class. One issue im run...

Standardize or normalize the values.
When calculating gradients, you are matrix multiplying those large values to result in large gradients.

wet dome Jun 26, 2025, 3:45 PM

#

is it a bad thing i have to use a tiny learning rate value? @long ivy

#

does it just mean it will take longer to find the optimal parameters? as smaller step size across the surface

long ivy Jun 26, 2025, 3:51 PM

#

wet dome does it just mean it will take longer to find the optimal parameters? as smaller...

If some features in your data have large values compared to others, it makes the optimization process inherently unstable. As some features might have a bigger impact than others.
If you are using a tiny lr to compensate for that, then there is a chance that your model might not even converge.

long ivy Jun 26, 2025, 3:54 PM

#

wet dome does it just mean it will take longer to find the optimal parameters? as smaller...

check this out
https://datascience.stackexchange.com/questions/112841/why-we-have-problem-with-gradients-when-feature-values-are-of-different-range

wet dome Jun 26, 2025, 3:57 PM

#

long ivy check this out https://datascience.stackexchange.com/questions/112841/why-we-hav...

Ty

past meteor Jun 26, 2025, 4:00 PM

#

prompting 🤢

#

Cached tokens to reduce cost?

#

idk how Anthropic is priced in detail but input tokens <<<< output tokens

#

Your responses seem quite short

#

Is it worth optimizing to this extent?

#

Even so, that's all on the input side?

#

What are you using again?

#

Haiku?

#

For the notification stuff you can also switch to the batch API

#

You send a request and they do it on a best effort basis in an hour or so

#

And then you can for instance poll every N minutes to see the status

#

Reduces cost by 50 %

#

(disclaimer: I mostly use OpenAI. All I've used Claude for is to provide support for my mini lib)

#

Looking at the docs, like OpenAI, they have a batch API

#

The Message Batches API is a powerful, cost-effective way to asynchronously process large volumes of Messages requests. This approach is well-suited to tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput.

past meteor Jun 26, 2025, 4:53 PM

#

How much does it cost you per day so far?

past meteor Jun 26, 2025, 6:12 PM

#

As in, reasoning models?

agile cobalt Jun 26, 2025, 6:28 PM

#

numerical scores like ```

Mood: 7/10 (focused, slightly tired)
Energy: 6/10 (afternoon dip)
Stress: 3/10 (manageable workload)

past meteor Jun 26, 2025, 6:30 PM

#

In my automated evals I do low/med/high instead of numbers, fire it N times concurrently and take the mode

agile cobalt Jun 26, 2025, 6:30 PM

#

past meteor Jun 26, 2025, 6:32 PM

#

so you watch south park during your break

hollow cobalt Jun 26, 2025, 8:11 PM

#

Hello all, I'm in process of creating a small LLM and I'm running into small problems as far as the performance of the models. I've pretrained and finetuned a few models, and I was curious if anyone in this group chat is experienced with creating LLM's. I would love to ask a few questions about how I can improve the model. Please let me know!

agile cobalt Jun 26, 2025, 8:13 PM

#

hollow cobalt Hello all, I'm in process of creating a small LLM and I'm running into small pro...

if you mean creating from scratch, not using a pre-trained model, then no wonders your performance will be awful

they requires a ridiculous amount of compute to start getting good results

#

iirc even "somewhat reads like English" requires over a day worth of training on a data center grade GPU

hollow cobalt Jun 26, 2025, 8:17 PM

#

I don't think that understanding context and the structure of conversation comes down to compute.

#

In essence a large corpus of data and parameters, will take the model so far. But for adequate performance more focus on the preparation is required.

turbid field Jun 26, 2025, 9:07 PM

#

hello i am trying to get annotated pictures of vehicles using fiftyone however when i am uploading it in roboflow the anotated images includes other classes instead of only 'bicycle' it also shows annotated boxes of license plate car person etc etc.. also how do i change the label name into bicycle or their orignal name since the classes name in roboflow is -m-0199g (i think this is the name for the bicycle)

thank you sorry i am new

import fiftyone as fo
import fiftyone.zoo as foz

# Parameters
CLASS_NAME = "Bicycle"
NUM_IMAGES = 10
SAVE_DIR = "./bicycle_dataset"  # change if needed

# Load 10 images with only 'Bicycle' bounding boxes from Open Images V7 validation set
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
    label_types=["detections"],
    classes=[CLASS_NAME],
    max_samples=NUM_IMAGES,
    dataset_dir=SAVE_DIR,
    shuffle=True
)

# Optional: launch FiftyOne app to preview
session = fo.launch_app(dataset)
print(f"Dataset loaded and saved in: {SAVE_DIR}")

gentle stone Jun 26, 2025, 11:18 PM

#

I like "ai for everyone by Andrew Ng". Its Basics of AI include ML, DL, type of ai and many more suitable for beginner but I don't like the slide presentation design. Also his tone of voice sometimes makes me sleepy.

opaque falcon Jun 26, 2025, 11:21 PM

#

hearty token Is there a good book on fine-tuning, for things like understanding hyperparamete...

I saw this:
https://www.udemy.com/course/fine-tuning-llm-with-hugging-face-transformers/?srsltid=AfmBOorVAPh1SRzq4Zqyb69ptm1PfkEVZX0bHqUzYP5xrKxP_Z-tfqho&couponCode=ST16MT230625G1

gentle stone Jun 26, 2025, 11:22 PM

#

Is there some kind of test about data science/ai from the company for those who don't have a degree? Or does the company need a certificate or portfolio?

opaque falcon Jun 26, 2025, 11:24 PM

#

gentle stone Is there some kind of test about data science/ai from the company for those who ...

Depends on the company. Having a good portfolio and blog about practical things to show you know what you are talking about is good. You can use github to host your jupyter notebooks like notes you organize:
https://ethanweed.github.io/pythonbook/landingpage.html

opaque falcon Jun 26, 2025, 11:25 PM

#

gentle stone Is there some kind of test about data science/ai from the company for those who ...

Or you can use Obsidian:
https://obsidian.md/publish

hollow cobalt Jun 26, 2025, 11:26 PM

#

If you want to learn about the GPT-2 architecture there's lots of educational material on it. But one book stood out to me, https://github.com/rasbt/LLMs-from-scratch. Great book.

gentle stone Jun 26, 2025, 11:27 PM

#

opaque falcon Depends on the company. Having a good portfolio and blog about practical things ...

Thank you bro, I really need that

hollow cobalt Jun 26, 2025, 11:28 PM

#

opaque falcon Or you can use Obsidian: https://obsidian.md/publish

No worries man, I must say its good for a grasping LLM's.

gentle stone Jun 26, 2025, 11:31 PM

#

Ai started to take over a lot bro, unemployed people increased every time in my country, How about your country?

hollow cobalt Jun 26, 2025, 11:33 PM

#

I haven't seen it with my own eyes, might look though

opaque falcon Jun 27, 2025, 12:16 AM

#

gentle stone Thank you bro, I really need that

I am looking for a study buddy if you are interested.

opaque falcon Jun 27, 2025, 12:17 AM

#

hollow cobalt No worries man, I must say its good for a grasping LLM's.

Do you have any notes on LLMs? Trying to crowdsource notes in easy to understand english. Would this be a project that would be of interest?

opaque falcon Jun 27, 2025, 12:20 AM

#

gentle stone Ai started to take over a lot bro, unemployed people increased every time in my ...

The same. I think its a world wide thing.

Some interesting observations from experience:

dedicating time to learn a highly paid skill pays off eventually
learning in community is better than doing it alone. Its more motivating.
working on practical projects is a great way to really remember and apply the concepts
doing it with a team of people keeps momentum and motivation up.
crowd sourcing notes is a plus. Everyone wins, especially when there is a question and it can be answered

#

I am starting a zero to hero study group. Is anyone interested in joining? I will be adding and crowd sourcing notes on a joint repo using obsidian, obsidian publish and juypter notebooks.

hollow cobalt Jun 27, 2025, 12:22 AM

#

opaque falcon Do you have any notes on LLMs? Trying to crowdsource notes in easy to understand...

Yeah I have a obsidian canvas that I documented all evaluation when creating my first LLM's.

pliant swift Jun 27, 2025, 12:23 AM

#

opaque falcon I am starting a zero to hero study group. Is anyone interested in joining? I wil...

sure thing man

#

I'm vykt on github

opaque falcon Jun 27, 2025, 12:23 AM

#

hollow cobalt Yeah I have a obsidian canvas that I documented all evaluation when creating my ...

Are the notes something you'd like to share and have others contribute to?

opaque falcon Jun 27, 2025, 12:23 AM

#

pliant swift I'm `vykt` on github

Will add you once I start the github group from sratch.

pliant swift Jun 27, 2025, 12:24 AM

#

I know very little math/data science, but would like to take an interest in it

hollow cobalt Jun 27, 2025, 12:24 AM

#

You should make the group on discord, and I'll join.

opaque falcon Jun 27, 2025, 12:25 AM

#

pliant swift I know very little math/data science, but would like to take an interest in it

Its cool. I want to create a book similar to this to teach/learn the Math through Python:
https://ethanweed.github.io/pythonbook/landingpage.html

opaque falcon Jun 27, 2025, 12:26 AM

#

pliant swift I know very little math/data science, but would like to take an interest in it

There is this cool course that I want to take:
https://www.udemy.com/course/math-with-python

pliant swift Jun 27, 2025, 12:26 AM

#

Looks really nice man. I'd like to learn some statistics too, and know enough linear algebra for very basic 3d stuff.

opaque falcon Jun 27, 2025, 12:26 AM

#

Also:
https://www.coursera.org/specializations/statistics-with-python

pliant swift Jun 27, 2025, 12:27 AM

#

I learned a little once to do 2d rotations, but thats it

opaque falcon Jun 27, 2025, 12:27 AM

#

pliant swift Looks really nice man. I'd like to learn some statistics too, and know enough li...

That's pretty cool. Sounds awesome acutally.

gentle stone Jun 27, 2025, 1:08 AM

#

opaque falcon The same. I think its a world wide thing. Some interesting observations from ex...

I'd like to join the community, Especially if we encourage each other and share knowledge with each other.

opaque falcon Jun 27, 2025, 1:20 AM

#

gentle stone I'd like to join the community, Especially if we encourage each other and share ...

Sweet! That awesome! What's your github name?

opaque falcon Jun 27, 2025, 1:27 AM

#

pliant swift I'm `vykt` on github

Invite sent

#

Created github org: https://github.com/TenGen-no-TenJin

pliant swift Jun 27, 2025, 1:28 AM

#

Where can I find it?

#

ah, nice

opaque falcon Jun 27, 2025, 1:28 AM

#

check email invite

pliant swift Jun 27, 2025, 1:28 AM

#

Just joined, thanks!

opaque falcon Jun 27, 2025, 1:33 AM

#

pliant swift Just joined, thanks!

What sort of info do you want to see?

pliant swift Jun 27, 2025, 1:33 AM

#

It's a bit specific, but anything that can be applied to binary analysis

#

but i dont know enough to know what can be applied to binary analysis, so really im up for anything!

#

learning statistics will find its uses even in unexpected places im sure

hollow cobalt Jun 27, 2025, 1:45 AM

#

My user name: 1stest

gentle stone Jun 27, 2025, 12:54 PM

#

opaque falcon Sweet! That awesome! What's your github name?

Username: msapawie

#

Let's connect!

short moth Jun 27, 2025, 1:47 PM

#

anyone know why i am getting an extra column (filled with NaNs) when I concatenate dataframes?

agile cobalt Jun 27, 2025, 1:48 PM

#

short moth anyone know why i am getting an extra column (filled with NaNs) when I concaten...

post a minimum complete reproducible example
(a snippet of code we can run without relying on other files which shows the issue)

short moth Jun 27, 2025, 1:50 PM

#

ok

#

import pandas as pd
import os
folder= r'Documents/ARINC-429 Datasets'
files = os.listdir(folder)
df_obj = []
for file in files:
    df_obj.append(pd.read_csv(fr'Documents/ARINC-429 Datasets/{file}'))
updated_df_obj = []
for i in range(len(df_obj)):
    updated_df_obj.append(df_obj[i].assign(BUS_NAME = files[i].replace('.csv','')))

full_df = pd.concat(updated_df_obj, ignore_index = True)
full_df

#

#

so I have a list of csv files. I want to concatenate them into 1 dataframe, but I also want to add an extra column called BUS_NAME with the name of the respective CSV for organizational purposes

#

so I tried that and it seems to work. only that it created a random column called BUS NAME which I never said to create

#

it could be that one of the csv files already had that column

#

let me check. I may have found my error

#

yes that was the issue

broken shadow Jun 27, 2025, 3:31 PM

#

help

Screenshot_2025-06-27_at_11.31.09_PM.png

serene scaffold Jun 27, 2025, 3:32 PM

#

broken shadow help

what is the value for "path"? please give the answer as text rather than a screenshot

broken shadow Jun 27, 2025, 3:38 PM

#

serene scaffold what is the value for "path"? please give the answer as text rather than a scree...

sry, kinda new to coding

# Download latest version
path = kagglehub.dataset_download("andrewmvd/medical-mnist")

from kagglehub import KaggleDatasetAdapter

train_data = kagglehub.datasets.dataset_load(
    adapter=KaggleDatasetAdapter.PANDAS,
    handle="andrewmvd/medical-mnist",
    path=path
)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipython-input-21-3633857365.py in <cell line: 0>()
      1 from kagglehub import KaggleDatasetAdapter
      2 
----> 3 train_data = kagglehub.datasets.dataset_load(
      4     adapter=KaggleDatasetAdapter.PANDAS,
      5     handle="andrewmvd/medical-mnist",

2 frames
/usr/local/lib/python3.11/dist-packages/kagglehub/pandas_datasets.py in _validate_read_function(file_extension, sql_query)
    106             f"Supported file extensions are: {', '.join(SUPPORTED_READ_FUNCTIONS_BY_EXTENSION.keys())}"
    107         )
--> 108         raise ValueError(extension_error_message) from None
    109 
    110     read_function = SUPPORTED_READ_FUNCTIONS_BY_EXTENSION[file_extension]

ValueError: Unsupported file extension: ''. Supported file extensions are: .csv, .tsv, .json, .jsonl, .xml, .parquet, .feather, .sqlite, .sqlite3, .db, .db3, .s3db, .dl3, .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt

serene scaffold Jun 27, 2025, 3:39 PM

#

broken shadow sry, kinda new to coding ``` # Download latest version path = kagglehub.dataset...

can you show the result of doing print(path)?

broken shadow Jun 27, 2025, 3:41 PM

#

serene scaffold can you show the result of doing `print(path)`?

uhh it's
```/kaggle/input/medical-mnist``

serene scaffold Jun 27, 2025, 3:43 PM

#

broken shadow sry, kinda new to coding ``` # Download latest version path = kagglehub.dataset...

it looks like the path parameter is for which file in the dataset you want to load. but you probably want to load all of them. so kagglehub.datasets.dataset_load might be the wrong function.

broken shadow Jun 27, 2025, 3:45 PM

#

serene scaffold it looks like the `path` parameter is for which file in the dataset you want to ...

okay, thank you

wet dome Jun 27, 2025, 3:52 PM

#

I'm trying to see how good my linear regression class is by measuring it's performance. I calculated the mean squared error (MSE), root mean squared error (RMSE) and the R squared value (R2)

#

Are these the metrics I should focus on to measure performance or is there something else I could do too?

#

Just trying to figure out how to see how good my model is

prisma forge Jun 27, 2025, 7:13 PM

#

Hello, is this where I can ask questions about getting help organizing a report by the data inside of it?

opaque falcon Jun 27, 2025, 7:56 PM

#

hollow cobalt My user name: `1stest `

Invite sent

opaque falcon Jun 27, 2025, 7:56 PM

#

gentle stone Username: msapawie

invite sent

thick heron Jun 28, 2025, 2:20 AM

#

Guys

#

I want to improve my little experiment i did

#

Help?

#

https://www.reddit.com/r/MachineLearning/s/NEXNJ4RWFl

From the MachineLearning community on Reddit

Explore this post and more from the MachineLearning community

#

I posted it on reddit

glacial root Jun 28, 2025, 2:52 AM

#

hey, let's say i wanted to create a small text processing library similar to nltk, for each module, for example the tokenization module, would it be better to implement it as a class or as a group of functions

#

i thought it wouldn't make a difference but not sure, just wanted to know which is better for library design

gentle stone Jun 28, 2025, 3:14 AM

#

opaque falcon invite sent

Guess what project we will make🤔

opaque falcon Jun 28, 2025, 3:14 AM

#

gentle stone Guess what project we will make🤔

What do you have in mind?

gentle stone Jun 28, 2025, 3:16 AM

#

opaque falcon What do you have in mind?

We have to do research first about human problems, then we have to solve it

opaque falcon Jun 28, 2025, 3:26 AM

#

gentle stone We have to do research first about human problems, then we have to solve it

I am thinking maybe something in finance.

#

Maybe something that researches alternative asset classes in emerging markets with good fundamentals for investment

#

Earns dividend or has substantial capital appericiation. How to shift through these types of equities.

gentle stone Jun 28, 2025, 5:46 AM

#

opaque falcon I am thinking maybe something in finance.

That's great idea, I also think about the financial. It seems like if we were to make a project like that we would have to recruit a lot of expert.

#

If we are not able to make big projects, I think we can start from simple money management apps/website first

wet dome Jun 28, 2025, 2:13 PM

#

After normalising your data say via min-max normalisation, is it normal to get different MSE, RMSE and R squared values?

fleet token Jun 28, 2025, 4:53 PM

#

Hi, i hope this fits in this channel.
For my homework i have to estimate the quality of a respiration signal calculated from an ECG/PPG signal in reference to a signal recorded with a respiration belt. Those signals are basically very noisy sinus/cosinus waves, so our professor gave us the hint to do it with the cross-correlation method extracting the highest coefficient and eliminating the phase-shift through that. Seems straight forward and i already tried it with the scipy signal implementation of correlation ( explained here: https://www.scicoding.com/cross-correlation-in-python-3-popular-packages/), but i somehow can't get the proper alignment of the signals to work.
Any help on this issue would be very much appreciate, bc i'm just no the brigtest, if it comes to statistics^^

lapis sequoia Jun 28, 2025, 5:27 PM

#

When people talk about fine tuning LLMs, they are not talking about Bert, or T5 or encoder, encoder-decoder models right? And are they fine tuning autoregressive models from hugging face using torch? And what models and why?

lapis sequoia Jun 28, 2025, 5:48 PM

#

opaque falcon Maybe something that researches alternative asset classes in emerging markets wi...

Use yfinance and pyportfolio and get SEC files. And get a FrediAPI key.

opaque falcon Jun 28, 2025, 5:48 PM

#

lapis sequoia Use yfinance and pyportfolio and get SEC files. And get a FrediAPI key.

Thanks! I will look into this.

lapis sequoia Jun 28, 2025, 5:52 PM

#

opaque falcon Thanks! I will look into this.

Just be safe with SEC filings, get a token. And you need an api key for Fred. Hide it if you use it. Pyportfolio and yfinance, you will be fine.

serene scaffold Jun 28, 2025, 6:10 PM

#

lapis sequoia When people talk about fine tuning LLMs, they are not talking about Bert, or T5 ...

BERT is an LLM, but it's not an interactive one like ChatGPT, and the latter is the kind that people talk about these days.

#

Imo, the first L in LLM should be dropped and replaced with "interactive" or "generative"

#

Like, the number of parameters in "large" language models can differ by orders of magnitude now.

plucky pebble Jun 28, 2025, 8:19 PM

#

hello everyone, does applying DML to smoothed/penalised Local Projections sounds like it could work to provide confidence intervals?

#

that are accurate

#

the motivation is to keep the reduced variance of smoothed estimators whilst mitigating the bias introduced by shrinkage for more accurate coverage to measure uncertaintty

weak oxide Jun 28, 2025, 10:39 PM

#

lapis sequoia Use yfinance and pyportfolio and get SEC files. And get a FrediAPI key.

Nah Edgartools is Godly

#

Fred API is ok

#

yfinance isn't really that good with SEC filings

#

yfinance is fine for stock prices daily

#

For other data, I learned how to webscrape and use IB API

lapis sequoia Jun 28, 2025, 10:46 PM

#

weak oxide Nah Edgartools is Godly

I didn’t mean yfinance for sec for filings I meant for market data. I was just saying examples

#

Specifically stocks. They have ETFS, crypto, FX pairs/currencies, and other stuff as well. I was literally just replying to an answer off of the top of my head.

weak oxide Jun 29, 2025, 1:20 AM

#

lapis sequoia I didn’t mean yfinance for sec for filings I meant for market data. I was just s...

I get it but I had to promote Edgartools

#

I'm in contact with the guy behind it

#

Really cool

rich moth Jun 29, 2025, 2:26 AM

#

What if I said "reasoning" is not a mysterious emergent property of brains, but as a measurable , mathematical property of information itself...

From John Wheelers, "It from a bit" and information therorist like Shannon and Kolmorgorov, it can all be worked into a practical working system.

Ever think why complex information organizes itself in complex spirals of information? DNA double helix, shells, cosmic formations ? There's something fundamental here. Like think of black holes and their natures. Complex spirals of information, quantized.

It's the same reason quantum entanglement even works. The same law is fundamental to reality itself. Thats why it can work for vast distantness'. It's all connected via math.

Im rambling here, because I made a new type of AI that can reason, using pure math, information theory. Complexity reasoning really.

#

Curve fitting a black box is not the future it seems

small wedge Jun 29, 2025, 2:32 AM

#

is it open source?

#

because, no offense, but the whole "spirals of information, quantum entanglement, reasoning" line tied together reads a bit schitzo XD

rich moth Jun 29, 2025, 2:46 AM

#

small wedge is it open source?

I one HUNDREDED precent get it, man.. It sound so far off and crazy. I dont make these profound insights without weeks of validation work. I'm not psycho.. I'm very well grounded in my work. Validation projects these days are the core of my work.

iron basalt Jun 29, 2025, 2:54 AM

#

rich moth What if I said "reasoning" is not a mysterious emergent property of brains, but ...

"AI that can reason, using pure math, information theory." That is every modern AI and therefore does not really say anything.

#

Are you referring to Solomonoff induction?

rich moth Jun 29, 2025, 2:56 AM

#

iron basalt "AI that can reason, using pure math, information theory." That is every modern ...

You're absoutely right.. But its how we use that math.

#

DL took a wrong turn somewhere in 2010-2015. I feel likt its more explotive in nature than true discovery

iron basalt Jun 29, 2025, 2:58 AM

#

rich moth DL took a wrong turn somewhere in 2010-2015. I feel likt its more explotive in ...

Elaborate, what do you mean by exploitative?

#

And what is "true discovery?" How does it apply here?

small wedge Jun 29, 2025, 2:58 AM

#

rich moth I one HUNDREDED precent get it, man.. It sound so far off and crazy. I dont ma...

I mean, I see you here a lot posting metrics about your models so I know you aren't like just some random idiot, I'd just like to see what you mean exactly before accepting that some genius on discord is revolutionizing AI. How does your model differ from modern DL in such a way that it qualifies as a new type of AI?

rich moth Jun 29, 2025, 3:00 AM

#

iron basalt Elaborate, what do you mean by exploitative?

It's fundemental different in philososhpy! My system works with more data which is completely and entirely opposite to tradional ML. The entire overfitting problem is exactly what highlights it.

#

Logically the system should work better with MORE data, not the opposite.

opaque condor Jun 29, 2025, 3:08 AM

#

Does anyone know where I couldn't find high quality images for conversion

rich moth Jun 29, 2025, 3:15 AM

#

I guess the best way to say it is modern AI finds correlations, mine also finds structure thats measurable across domains.

iron basalt Jun 29, 2025, 3:16 AM

#

rich moth I guess the best way to say it is modern AI finds correlations, mine also finds ...

What are the "domains" here?

rich moth Jun 29, 2025, 3:17 AM

#

iron basalt What are the "domains" here?

Time seires, images, text and tabular.

iron basalt Jun 29, 2025, 3:20 AM

#

rich moth I guess the best way to say it is modern AI finds correlations, mine also finds ...

Other modern AI does that then though, so it's still too vague to separate it from modern AI.

rich moth Jun 29, 2025, 3:20 AM

#

Give me a dataset and ill run the the AI on it and share results

#

You control the experiment...

iron basalt Jun 29, 2025, 3:22 AM

#

Ok, a simple "hello world" for AI is the classic Wumpus World.

rich moth Jun 29, 2025, 3:23 AM

#

I'm not trying to "pull the wool" over anyones eyes, Im just asking for engagement.

iron basalt Jun 29, 2025, 3:24 AM

#

https://cis.temple.edu/~giorgio/cis587/readings/wumpus.shtml

rich moth Jun 29, 2025, 3:43 AM

#

iron basalt Jun 29, 2025, 3:46 AM

#

rich moth

Can you show some other cases? The original layout as shown in the link?

#

It also did not kill the wumpus, which is the other objective.

#

"The objective of the game is to kill the wumpus, to pick up the gold, and to climb out with it. "

#

To leave it needs to return to the start and do the climb action.

rich moth Jun 29, 2025, 4:49 AM

#

iron basalt Can you show some other cases? The original layout as shown in the link?

https://pastecode.io/s/aer2vf3g

rich moth Jun 29, 2025, 4:51 AM

#

iron basalt Can you show some other cases? The original layout as shown in the link?

Heres the wumpus code I made to verify. https://pastecode.io/s/c9qo8e1p

#

Solved it in 19, 20 steps?

#

I love these test honestly.

iron basalt Jun 29, 2025, 5:04 AM

#

rich moth https://pastecode.io/s/aer2vf3g

Found the gold and returned but the wumpus still lives.

rich moth Jun 29, 2025, 5:06 AM

#

iron basalt Found the gold and returned but the wumpus still lives.

Maybe it was too cautious. I should add Wumpus hunting logic.

#

Sorry for typos, you get it. But ill refine it

#

Thank you. @iron basalt

rich moth Jun 29, 2025, 5:33 AM

#

small wedge is it open source?

I get it.. What is crazy? And What isn't? Sometimes its definitive to define other times it can be hazy. But have you considered the substance for such a claim against someone's own self worth? Like I just wonder why you jump so righteously to a claim of mental unawareness?

Yet I just think you are "unaware" in the matter . Does that make me "schitzo"?
Or does it make you a bully?

really up to the observer.

worldly dawn Jun 29, 2025, 5:38 AM

#

rich moth I get it.. What is crazy? And What isn't? Sometimes its definitive to define ot...

Is there one or more experiments that can be analyzed and reproduced in the context of #data-science-and-ml ?

sage sorrel Jun 29, 2025, 5:39 AM

#

yeah I think there are

worldly dawn Jun 29, 2025, 5:39 AM

#

where is the code so we can run it and analyze it?

sage sorrel Jun 29, 2025, 5:40 AM

#

IMBD movie review sentiment analysis

#

Mnist Handwritten digit recognition

#

There are several experiements

#

that can be analayzed and reproduced

worldly dawn Jun 29, 2025, 5:41 AM

#

sage sorrel that can be analayzed and reproduced

and they all demonstrate <#data-science-and-ml message> ?

sage sorrel Jun 29, 2025, 5:43 AM

#

lol nvm

#

I would like to see the code as well

#

so we can run it

opaque condor Jun 29, 2025, 5:52 AM

#

where's the best sites for good high quality images?

small wedge Jun 29, 2025, 5:53 AM

#

rich moth I get it.. What is crazy? And What isn't? Sometimes its definitive to define ot...

To be clear, schitzo is just a turn of phrase, im not actually accusing you of being mentally ill. Im saying it sounds made up. My intention is not to bully you, but to make clear how outrageous what you are saying sounds to me.

That said, I have asked if the code is open source, and for an explanation on how your new AI differs from DL as you said. You've provided answers to neither, so I remain convinced that the grandiose claims you made are not true until you provide some evidence otherwise.

Im happy to change my view if you provide evidence for it.

rich moth Jun 29, 2025, 5:53 AM

#

opaque condor where's the best sites for good high quality images?

honestly depends, its usualyl domain specific.

#

Like do you want HQ magic cards? pokemon cards? medical data images?

rich moth Jun 29, 2025, 5:55 AM

#

small wedge To be clear, schitzo is just a turn of phrase, im not actually accusing you of b...

i appreciate you and you're skepticism and criticism. Its a few of things keeping me grounded 😆

opaque condor Jun 29, 2025, 5:56 AM

#

I want any type of image just in case I want something else
My friend is staring me to make a AI for anima recognition + character recognition

small wedge Jun 29, 2025, 5:59 AM

#

I wouldnt say you could find any/all images there, but hugging face and google datasets are always good places to start looking for clean, high quality datasets; especially when it comes to popular topics.

#

Almost certainly some anime datasets between the two

opaque condor Jun 29, 2025, 6:02 AM

#

I'm also trying to learn to make my own just in case you know I can't find one or it's for a specific job

opaque condor Jun 29, 2025, 6:05 AM

#

small wedge I wouldnt say you could find any/all images there, but hugging face and google d...

Just trying to prepare if my job doesn't give me that Queen data set or even the data set at all

old lodge Jun 29, 2025, 3:04 PM

#

Ok so I got a weird request. I would like someone to recommend me a topic/idea for capstone project of Data Science Diploma for me to do, I can't get any ideas after thinking for few days...

#

ducky_sphere

small wedge Jun 29, 2025, 3:17 PM

#

ooh that's neat

opaque condor Jun 29, 2025, 5:43 PM

#

rich moth honestly depends, its usualyl domain specific.

May I DM you?

opaque falcon Jun 29, 2025, 6:46 PM

#

gentle stone That's great idea, I also think about the financial. It seems like if we were to...

Want to do this course together:
https://www.codecademy.com/enrolled/courses/machine-learning-perceptrons

#

Anyone familiar with discord servers for learning machine learning? Say at aimed at beginners and intermediate folks?

lapis sequoia Jun 29, 2025, 7:27 PM

#

Is rag that intense? And can all of these autoregressive tasks be done without lang chain? Just good old torch and Hugging Face.

serene scaffold Jun 29, 2025, 7:31 PM

#

lapis sequoia Is rag that intense? And can all of these autoregressive tasks be done without l...

Rag is pretty much just putting more information into the prompt. You don't need an LLM-specific library except for the one that you use to prompt the LLM.

lapis sequoia Jun 29, 2025, 7:32 PM

#

Retriever prompt template IR is just information retrieval that is use for the LLM to generate responses, right? You don’t fine tune anything

#

It’s like a searching engine

lapis sequoia Jun 29, 2025, 8:43 PM

#

I just did it and now I feel rad

verbal oar Jun 29, 2025, 8:44 PM

#

rag is for reducing hallucinations and improving quality of answer

lapis sequoia Jun 29, 2025, 8:48 PM

#

I thought it was like the hardest thing of all time because I heard someone say that so it must have been true

lapis sequoia Jun 29, 2025, 8:49 PM

#

verbal oar rag is for reducing hallucinations and improving quality of answer

Chapter 14 Stanford NLP. yes

weak oxide Jun 29, 2025, 10:09 PM

#

opaque falcon Anyone familiar with discord servers for learning machine learning? Say at aimed...

I know Ryan and Matt data science, he's really good

#

For YouTube machine learning

#

https://youtube.com/playlist?list=PLcQVY5V2UY4LNmObS0gqNVyNdVfXnHwu8&feature=shared

YouTube

Scikit-Learn Tutorials - Master Machine Learning

Scikit-Learn is a powerful and user-friendly machine learning library in Python that provides a wide array of tools for creating, training, and evaluating ma...

#

I suggest to not immediately go into the code

opaque falcon Jun 29, 2025, 10:14 PM

#

Any resources on how to read an ML paper for new folks?

lapis sequoia Jun 29, 2025, 10:15 PM

#

opaque falcon Anyone familiar with discord servers for learning machine learning? Say at aimed...

What do you mean intermediate? Avoid NeuralNine he is trash, he shouldn’t talk ever. Most YouTubers are garbage. Just read just do everything everyone says to do that is consistent. I guess Sentex is good, deep lizard is attractive and those videos are good but her boyfriend deadlifts less than me so I am personally indifferent. Really, statquest is good. ISLR (the textbook) is the Bible. The NLP Stanford book is great, the Python book for machine learning is extremely outdated.
Honestly, just read libraries and the other suggestions. You have to like this. Because it is time consuming

lapis sequoia Jun 29, 2025, 10:18 PM

#

opaque falcon Any resources on how to read an ML paper for new folks?

ISLR textbook.

#

And Corey shrader if you watch Python tutorials. you really just need him honestly. And the torch textbook is very good.

serene scaffold Jun 29, 2025, 11:36 PM

#

opaque falcon Any resources on how to read an ML paper for new folks?

ML papers are usually really poorly written, so don't feel bad if you don't understand it.
I usually try to work out what the notation means and what it might look like if I coded it in torch

#

I also like to write on them on my tablet. I have a color coding system.

opaque condor Jun 30, 2025, 12:29 AM

#

Where's the best area to get images for convolution like I mean a good website for their free high quality and have almost every image that you may need

lapis sequoia Jun 30, 2025, 12:46 AM

#

opaque condor Where's the best area to get images for convolution like I mean a good website f...

Kaggle. Finding enough images for training that are all labeled is next to impossible. Honestly, just use kaggle.

gentle stone Jun 30, 2025, 1:17 AM

#

opaque falcon Want to do this course together: https://www.codecademy.com/enrolled/courses/mac...

Thank you, I will watch that course after I finished my previous courses

lapis sequoia Jun 30, 2025, 1:35 AM

#

You guys ever use Gemini and it ask for your SSN?

serene scaffold Jun 30, 2025, 2:41 AM

#

lapis sequoia You guys ever use Gemini and it ask for your SSN?

It wouldn't be able to know that unless Google knows it and has RAG to answer it

woeful harness Jun 30, 2025, 10:03 AM

#

Hello everyone, I need a little help. I'm building a ranking system for businesses based on features like distance, rating, cost, workload, completion rate, and total projects. I don't have any user data, and I need a way to rank businesses effectively. I have also tried MCDA (Multi-Criteria Decision Analysis).
so the problem i am facing is : while ranking, I want to give newer businesses those that haven’t had many chances to provide services yet slightly higher rank for a limited time to help them get exposure. How can I solve this problem?

fallow coyote Jun 30, 2025, 11:09 AM

#

serene scaffold ML papers are usually really poorly written, so don't feel bad if you don't unde...

What ML papers do you recommend that are somewhat simple to read and understand? It can be for any topic within ML

past meteor Jun 30, 2025, 11:59 AM

#

fallow coyote What ML papers do you recommend that are somewhat simple to read and understand?...

Go with books, especially the ones in the pinned posts

fallow coyote Jun 30, 2025, 12:28 PM

#

past meteor Go with books, especially the ones in the pinned posts

I already have a couple books. Just want to read some papers to see what theyre doing in research

brazen willow Jun 30, 2025, 2:11 PM

#

Hey gang, so my AI response is returning incorrect json data, i.e "total": 10 but the individual items that comprise the total does not equal 10 if you add manually

What I did was implement a retry logic until the individual items have the correct total -- is there a better alternative than just retrying? Am just using an off the shelf model, not sure if engineering my prompt better will make the responses more correct

lapis sequoia Jun 30, 2025, 2:32 PM

#

hey yall, almost finished perian data's course on udemy, where can i start learning python numpy and pandas

past meteor Jun 30, 2025, 3:58 PM

#

fallow coyote I already have a couple books. Just want to read some papers to see what theyre ...

It's important to walk before you run 🙂 Most papers aren't paradigm shifts, they're incremental updates on paradigm shifts. The books will cover the key ideas form those papers in a structued, linear way and the ones I linked in the pinned posts in particular also have references to the papers

#

So if you really really want to read them, look them up there

#

The running joke is that the average ML paper is "Bayesian interpretation of <method>" if you don't know <method> or have a solid grounding in bayesian statistics, reading that paper makes no sense imo

woven prairie Jun 30, 2025, 4:46 PM

#

Hello everyone , is there anyone who has worked on rag based chatbot or any kind of rag based application.

serene scaffold Jun 30, 2025, 4:48 PM

#

woven prairie Hello everyone , is there anyone who has worked on rag based chatbot or any kind...

it's best to always ask your actual question. not if someone is willing to commit to answering once you ask it.

woven prairie Jun 30, 2025, 4:49 PM

#

I want to discuss that thing in a little brief so I just asked, if someone has worked.

serene scaffold Jun 30, 2025, 4:51 PM

#

woven prairie I want to discuss that thing in a little brief so I just asked, if someone has w...

what do you want to discuss about it?

woven prairie Jun 30, 2025, 4:52 PM

#

I have project to make

#

Basically this project is focused on writing blogs for medium and other blog websites

#

So basically user want that he writes his topic and he can get the content for his post

#

Like title , body header what images should be included

#

Now the important thing is

#

He wants all the content to be extracted from the docs

#

He is having multiple docs from where he wants to make this content

#

First thought In my mind was about using rag , rag based chatbot

#

But he says the content should be written in such a way that it should be seo optimize

woven prairie Jun 30, 2025, 5:13 PM

#

?

past meteor Jun 30, 2025, 5:23 PM

#

woven prairie Basically this project is focused on writing blogs for medium and other blog web...

Sounds like a fairly straightforward rag use case but don't underestimate the fact that most people really do not like reading LLM generated slop

woven prairie Jun 30, 2025, 6:16 PM

#

past meteor Sounds like a fairly straightforward rag use case but don't underestimate the fa...

But the thing be based on the documents

#

Entirely docs

past meteor Jun 30, 2025, 6:16 PM

#

What docs exactly?

woven prairie Jun 30, 2025, 6:16 PM

#

Docs regarding yoga

#

User will upload it's docs and he will ask the blog based on the docs

past meteor Jun 30, 2025, 6:17 PM

#

You can try prompting it heavily to use a specific tone of voice, and/or put entire examples in your system prompt

woven prairie Jun 30, 2025, 6:17 PM

#

I think you got what I am trying to say

woven prairie Jun 30, 2025, 6:18 PM

#

past meteor You can try prompting it heavily to use a specific tone of voice, and/or put ent...

Yeah

past meteor Jun 30, 2025, 6:18 PM

#

The risk is really that things written by LLMs really have this vibe

woven prairie Jun 30, 2025, 6:18 PM

#

Yeah

rich moth Jul 1, 2025, 4:25 AM

#

opaque condor May I DM you?

sure! sorry I just saw your message.

visual goblet Jul 1, 2025, 7:19 AM

#

vizz wizzs out there, anyone has an idea how to make a barplot with a broken Y axis so that bars with values much larger than the rest dont ruin the plot by making all other bars look tiny?

jaunty helm Jul 1, 2025, 7:25 AM

#

visual goblet vizz wizzs out there, anyone has an idea how to make a barplot with a broken Y a...

set a fixed y limit? set the y axis to use log scale?

visual goblet Jul 1, 2025, 7:27 AM

#

jaunty helm set a fixed y limit? set the y axis to use log scale?

I mean like this:

#

jaunty helm Jul 1, 2025, 7:28 AM

#

visual goblet I mean like this:

ah
I mean that depends on the plotting library but here's matplotlib docs on broken axes

rich condor Jul 1, 2025, 11:21 AM

#

Sorry for reposting, need some suggestions and #1035199133436354600 threads are a bit too short lived

Let's say currently there is a test build of a project which calls OpenAI. It then calls a library that indirectly calls OpenAI, which will decide on a series of documents (html, pdfs) to download from the internet and then does some logicking process based on the corpus to derive an analysis (let's assume it is a .doc or .md, doesn't really matter here).

The current process is inefficient because

it downloads the 'corpora' into memory, so the entire process just gets killed off if there is an error
the library being used is a bit opaque and there is a need to inject some observability tooling (traces, metrics, logs) to see where it is 'choking'

Need some suggestions on how you would proceed in this scenario, or any advice

fallow coyote Jul 1, 2025, 11:47 AM

#

is it recommended to use git as version control for jupyter notebooks or should I use git for other proejcts

robust granite Jul 1, 2025, 11:55 AM

#

I'm confused, shall I do more kaggle or leetcode.

buoyant vine Jul 1, 2025, 12:02 PM

#

fallow coyote is it recommended to use git as version control for jupyter notebooks or should ...

I would use git

#

At the end of the day the jupyter files are just JSON files

fallow coyote Jul 1, 2025, 12:04 PM

#

im using pyhcarm community edition so how would i go about using git? ive also got git bash installed

lapis sequoia Jul 1, 2025, 12:04 PM

#

how beta is trainable? does anyone know a real use case for this activation function? or is it obsolete/irrelevant

buoyant vine Jul 1, 2025, 12:13 PM

#

fallow coyote im using pyhcarm community edition so how would i go about using git? ive also g...

In pycharm you can go to VSC (not visual studio code) and then click create repo I think it is.

Or you can in the command line in the folder with your project do git init

fallow coyote Jul 1, 2025, 1:10 PM

#

Its just with pycharm, even though Ive been using it for years now, it does feel quite bloated. I might think about going towards using something vim/neovim or some other simple ide. I saw as well someone suggesting using jupytertext to convert a notebook to python and linking the two files

jaunty helm Jul 1, 2025, 3:59 PM

#

fallow coyote is it recommended to use git as version control for jupyter notebooks or should ...

every run it changes some metadata or smthn and I find it really annoying to look at, but yeah you still can use git
now I mostly just rock a .py file with the vscode extension which allows you to do this

# %%
# this starts a new chunk like in jupyter
print('hello!')

# %%[markdown]
# # Title
# you can write markdown too, and you can export to a ipynb if you need
```I think `marimo` has also been getting traction as a notebook alternative

safe agate Jul 1, 2025, 4:06 PM

#

marimo notebooks are .py files and version well with Git

serene scaffold Jul 1, 2025, 4:13 PM

#

@safe agate do people ever say "marimo jupyter notebooks"?

#

hopefully not

safe agate Jul 1, 2025, 4:14 PM

#

serene scaffold <@303698973091299329> do people ever say "marimo jupyter notebooks"?

Not that I've heard of, but yes I hope not

fallow coyote Jul 1, 2025, 4:28 PM

#

Might give marino a go. Ill stick with Jupyter for a few projects and then to marimo

#

Apologies if this is off topic but it says 'no more JSON merge conflicts'. Why is it with JSON theres issues with merging

serene scaffold Jul 1, 2025, 4:53 PM

#

fallow coyote Apologies if this is off topic but it says 'no more JSON merge conflicts'. Why ...

data science tooling is on-topic for this channel.
ipynb files follow the json format (you can parse them with json.load, even if they don't have the json extension). jupyter notebooks store things like matplotlib output as big scary blobs that look terrible in git diffs.

calm thicket Jul 1, 2025, 5:10 PM

#

marimo is great

fallow coyote Jul 1, 2025, 5:10 PM

#

Ive looked on the marimo website and I actually quite like the look of it. I can see why people would want to use marimo over jupyter, particularly for version control. I'm going to keep version control for my project super simple as possible for practice and then, maybe the next project I do, switch to marimo and see how I like it. Its a good thing they ahev a feature where i can covnert my jupyter notebooks to marimo notebooks

runic parcel Jul 1, 2025, 5:22 PM

#

Hey guys, i was using a qwen and llama model for the ocr of my screenshot, its working great but the resonse is like 7s. And i want it to work in less that a second. How can i do something like this?

serene scaffold Jul 1, 2025, 5:33 PM

#

runic parcel Hey guys, i was using a qwen and llama model for the ocr of my screenshot, its w...

is the model running on your computer, or are you making API calls over a network?

lapis sequoia Jul 1, 2025, 6:03 PM

#

with LLMs, is there any point in even writing code? Like, Gemini made a movie of a story I wrote ten years ago and it was very good.

agile cobalt Jul 1, 2025, 6:17 PM

#

someone still needs to tell the LLM what to write, and for the foreseeable future be able to diagnose and identify what went wrong when it fails to do what you asked it to

not to mention that humans are terrible at defining precise requirements, and much of the time the llm's only context about your problem are the requirements you give it

lapis sequoia Jul 1, 2025, 6:20 PM

#

agile cobalt someone still needs to tell the LLM what to write, and for the foreseeable futur...

I am happy that I went in complete order, starting from linear regression (years ago), to LLMS. I did not touch a LLM until I fine-tuned Bert, T5,Bart,Roberta, and that was after going through all RNN's which was followed by CountVectorizer and TfidVectorizer, which came after regular experessions . I went in complete chronological order of tasks. I did not jump on the latest trend one single time.

agile cobalt Jul 1, 2025, 6:23 PM

#

I don't get how that is relevant to what was said before?

#

or you meant that as the start of a new topic?

lapis sequoia Jul 1, 2025, 6:24 PM

#

Sorry. Just meant that people go straight to LLMs instead of actually understanding what they are doing

lapis sequoia Jul 1, 2025, 6:26 PM

#

agile cobalt I don't get how that is relevant to what was said before?

I just meant it in the way that I knew what I was doing. I didn’t take into account that a lot of people just jump to LLMs and don’t do anything that came before

runic parcel Jul 1, 2025, 6:27 PM

#

serene scaffold is the model running on your computer, or are you making API calls over a networ...

The model is in my computer using ollama, then making api call

serene scaffold Jul 1, 2025, 6:28 PM

#

runic parcel The model is in my computer using ollama, then making api call

API calls are slow because your request has to travel over the internet to their server, then they have to process your request and send it back.

agile cobalt Jul 1, 2025, 6:29 PM

#

runic parcel The model is in my computer using ollama, then making api call

"then making api call"?
but either way ollama is not exactly known for being fast, for < 1 second you'll want to use something else like vLLM and make sure you have enough VRAM to keep all weights loaded

past meteor Jul 1, 2025, 6:29 PM

#

runic parcel The model is in my computer using ollama, then making api call

Is it all running locally? You're doing an api call to the server running on your machine?

agile cobalt Jul 1, 2025, 6:32 PM

#

lapis sequoia I just meant it in the way that I knew what I was doing. I didn’t take into acco...

on one hand, fair
on the other hand, knowing the details behind how llms work doesn't directly helps you use end user tools like Gemini much better
(understanding tokenization and context windows can help a lot, but that doesn't requires much prior knowledge - it matters a lot more if you are fine tuning local models, but that isn't exactly applicable for consuming closed source models)

lapis sequoia Jul 1, 2025, 6:33 PM

#

agile cobalt on one hand, fair on the other hand, knowing the details behind how llms work do...

I have fine-tuned local models.

fresh sluice Jul 1, 2025, 6:45 PM

#

anyone up ?

runic parcel Jul 1, 2025, 6:46 PM

#

past meteor Is it all running locally? You're doing an api call to the server running on you...

Yes

runic parcel Jul 1, 2025, 6:46 PM

#

serene scaffold API calls are slow because your request has to travel over the internet to their...

Its fully funning locally toh the api is of localhost

fresh sluice Jul 1, 2025, 6:48 PM

#

Anyone who can tell me what are the necessary skills that one must know being in AI ML domain

Like, there are a lot of things one can know like keras , computer vision , image classification , transformers and so on but what are the ones i need to focus on ?

In simple , what as an AIML student are the things i should know and make projects on specifically

fallow coyote Jul 1, 2025, 6:53 PM

#

Maths. Genuinely, if your maths is not up to par, you won't be able to understand any of the tools they use in ML

fresh sluice Jul 2, 2025, 7:02 AM

#

fallow coyote Maths. Genuinely, if your maths is not up to par, you won't be able to understan...

alright then

#

Can you tell me some good repos or sites from where i can get some projects / ideas as well?

fallow coyote Jul 2, 2025, 10:32 AM

#

fresh sluice alright then

Just sesrch up machine learning guided projects and therell be a github link to a repo that has a ton of complete ml projects. But like I said, learn the maths first. Thats far more important than learning to use the tool

somber willow Jul 2, 2025, 11:33 AM

#

hi does anyone know on the requirements of maths in ai

clever karma Jul 2, 2025, 11:35 AM

#

somber willow hi does anyone know on the requirements of maths in ai

The only thing I know is Advanced Maths

somber willow Jul 2, 2025, 11:35 AM

#

clever karma The only thing I know is Advanced Maths

what is that

clever karma Jul 2, 2025, 11:41 AM

#

somber willow what is that

Linear Algebra

#

Calculus

#

Probability & Statistics

#

Graph Theory

runic parcel Jul 2, 2025, 12:55 PM

#

I want to capture the live screen and get the OCR from that, that’s possible right? Using VLM like Qwen ollama

grand minnow Jul 2, 2025, 1:13 PM

#

runic parcel I want to capture the live screen and get the OCR from that, that’s possible rig...

either use the OCR or just pass a frame of the screen to the vlm directly

runic parcel Jul 2, 2025, 1:56 PM

#

grand minnow either use the OCR or just pass a frame of the screen to the vlm directly

so i do the screen capture from cv2 right?

#

and the crop by setting the coordiantes

#

and provide the VLM on that part

grand minnow Jul 2, 2025, 2:06 PM

#

runic parcel so i do the screen capture from cv2 right?

you can use anything to screen capture but if you wanna use cv2, go for it

runic parcel Jul 2, 2025, 2:06 PM

#

grand minnow you can use anything to screen capture but if you wanna use cv2, go for it

thanks

somber willow Jul 2, 2025, 3:48 PM

#

guys what do you recommend about steward's calculus for ml and data science

serene scaffold Jul 2, 2025, 8:04 PM

#

how did you arrive at this error without knowledge of programming?

#

if you're having an issue with Google Earth, and you're not trying to write Python code, it seems that question isn't on-topic for this server. You'd need to contact Google support.

lapis flax Jul 2, 2025, 11:06 PM

#

I'm working on building a neural net via Pytorch but I need to run it with an optimization algorithm that is more complicated than the in-built functions in Pytorch; does anyone have experience with doing this, or can you point me to some guides focused on this?

#

I've found a couple of sources where people enter 'no_grad' mode to manually compute their gradient descent steps, which is helpful, but it would be nice to see more in-depth sources where people do this because they need to do an optimization that isn't built-in to Torch

raw hare Jul 3, 2025, 1:28 AM

#

lapis flax I'm working on building a neural net via Pytorch but I need to run it with an op...

I am pretty sure you can write your own optimizer just like modules

lapis flax Jul 3, 2025, 1:37 AM

#

raw hare I am pretty sure you can write your own optimizer just like modules

are there fairly straightforward tools to do this? i do optimization research so the thing i’m trying to build is very specific and unique

raw hare Jul 3, 2025, 1:45 AM

#

I just read the docs for nn.optim, you can write a custom optimizer by inherit torch.optim.Optimizer

#

and you can update the parameter

#

base on your need

lapis flax Jul 3, 2025, 2:54 AM

#

i’ll try to play around with that i guess, i’d just rather inherit like sgd then customize what happens during the step

ocean fiber Jul 3, 2025, 4:09 AM

#

hi people\

#

i need some help

serene scaffold Jul 3, 2025, 4:15 AM

#

ocean fiber i need some help

any time you need help, imagine that someone has agreed to help you and that you have to tell them everything they need to know to start helping you

ocean fiber Jul 3, 2025, 4:17 AM

#

so it not that it really hard to explain because yes it is ai but i am trying to make a 3d model move and interact with my ai code the eyes move mouth open when texting the hole heads can move around all in pycrame so i dont really know how to start and i was looking for some ideas if any of you can help

grand minnow Jul 3, 2025, 9:15 AM

#

ocean fiber so it not that it really hard to explain because yes it is ai but i am trying to...

like a vtuber?

urban basin Jul 3, 2025, 4:11 PM

#

The one above me hasn't showered in 2 days

serene scaffold Jul 3, 2025, 4:48 PM

#

!warn 1339137495585132598 Trolling is not permitted in this server.

arctic wedgeBOT Jul 3, 2025, 4:48 PM

#

:incoming_envelope: :ok_hand: applied warning to @urban basin.

opaque condor Jul 3, 2025, 6:53 PM

#

Is a .PNG good or .JPG

tidal bough Jul 3, 2025, 6:56 PM

#

yes

opaque condor Jul 3, 2025, 7:15 PM

#

Because I'm using a web scraper I might be able to use two and I need to make a filter for a specific type of image

haughty oxide Jul 3, 2025, 8:13 PM

#

could i get help with this

tidal bough Jul 3, 2025, 8:14 PM

#

on windows you'd generally use python or py

serene dew Jul 3, 2025, 10:33 PM

#

I have 5 research questions given by perplexity when I fed it research gaps

#

anyone here able to validate if they are good or not? I am not an expert, I just found gaps but Im not entirely able to understand them

serene scaffold Jul 3, 2025, 10:54 PM

#

serene dew I have 5 research questions given by perplexity when I fed it research gaps

When you ask for help, always give people the information they would need to start helping you. Don't ask for people to commit in advance.

serene dew Jul 3, 2025, 10:56 PM

#

serene scaffold When you ask for help, always give people the information they would need to sta...

if I post everything it will be seen as spamming

serene scaffold Jul 3, 2025, 10:56 PM

#

serene dew if I post everything it will be seen as spamming

You can put it in the paste bin

serene dew Jul 3, 2025, 10:57 PM

#

iLl just send as a doc

serene scaffold Jul 3, 2025, 10:57 PM

#

You can't do that

serene dew Jul 3, 2025, 10:57 PM

#

wait

#

oh

random sun Jul 4, 2025, 6:23 AM

#

I need quick solutions for this : I am working on muzzle detection project , I have annotated and using yolo v8 -seg model I have trained . With inference I have got output image with muzzle of cattle part correctly where I have done t shaped muzzle segmentation. Now for further classification model I need only that part in image , how will i use only that part in image , because if I use full image left out region will be noise and accuracy will be low , now how I’ll I do this to only take region of interest part in an image

lost lion Jul 4, 2025, 12:31 PM

#

hey there, i'm currently trying to make a wake word detection model detecting the word "Labeeb", an arabic word. i tried to use openwakeword's tool that is made in google colab, it actually made the model, but when i tried it, it was having lots of FPs. I thought maybe it is because the tool isn't made for the arabic language, because it seems like the model didn't train on arabic negatives, but only english negatives. so, can anyone tell me how to retrain this model on arabic negatives? maybe using the same tool? or should i make a script for it?

crimson girder Jul 4, 2025, 6:25 PM

#

im just starting to get into data science and deep learning with python
i have heard about two libraries, tensorflow and pytorch, but im not really sure about the differences
im not looking for opinion, just facts about which is better for what circumstance

agile cobalt Jul 4, 2025, 6:34 PM

#

if you're building on top of the work of someone that used tensorflow, you may want to use tensorflow
if you're building on top of the work of someone that used pytorch, you may want to use pytorch

and it would happen that most researchers have been using pytorch over tensorflow in the last few years

odd meteor Jul 4, 2025, 6:54 PM

#

crimson girder im just starting to get into data science and deep learning with python i have h...

Take Etrotta's advise and go with PyTorch 😋 however, if you need more convincing, check out the trend on PapersWithCode https://paperswithcode.com/trends

Papers with Code - Papers With Code : Trends

Papers With Code highlights trending Machine Learning research and the code to implement it.

#

You can use TensorFlow still for DL, however it will get to a point where TensorFlow won't be of much help anymore, specifically, once you get to LLMs.

Starting with PyTorch will do you a lot more good.

odd meteor Jul 4, 2025, 7:05 PM

#

serene dew anyone here able to validate if they are good or not? I am not an expert, I just...

I could help validate/look at it if it's in my domain. I do ML Research in low-resource NLP and PPML (Federated Learning, Differential Privacy, Homomorphic Encryption, SMPC, etc)

serene dew Jul 4, 2025, 7:06 PM

#

odd meteor I could help validate/look at it if it's in my domain. I do ML Research in low-r...

It;s about OOD

#

out of Distribution generalization

#

its kinda LLM tho..

odd meteor Jul 4, 2025, 7:07 PM

#

serene dew out of Distribution generalization

Okay. Give me more detail.

serene dew Jul 4, 2025, 7:08 PM

#

odd meteor Okay. Give me more detail.

Can i DM u or is that against server rules

#

ok here is first question

#

How can we systematically quantify and minimize test set contamination in large-scale OOD benchmarks?

odd meteor Jul 4, 2025, 7:09 PM

#

serene dew Can i DM u or is that against server rules

Why can't we have the discussion here? At least if I don't have enough idea on it others can help out?

#

If you're trying to "safeguard" the research idea, that's understandable. But honestly, if it's a truly good and publishable direction, chances are someone else has already thought of it — or is already working on it. What really matters is execution, not just the idea.

crimson girder Jul 4, 2025, 7:30 PM

#

odd meteor Take Etrotta's advise and go with PyTorch 😋 however, if you need more convincin...

thanks!

lapis sequoia Jul 4, 2025, 7:34 PM

#

Hi

odd meteor Jul 4, 2025, 7:37 PM

#

serene dew 1. How can we systematically quantify and minimize test set contamination in lar...

This isn't my area of specialty but here's the direction I'm thinking

I think we can use these two statistical approach to quantify test set contamination. Kolmogorov-Smirnov test and KL Divergence.
For detecting the contamination, if this LLM is multimodal, then maybe, CLIP (Contrastive Language–Image Pre-training) from OpenAI might be useful.
Perhaps, SimCLR (Simple Contrastive Learning of Representations) can as well be helpful in this case if you want to approach this using Self-Supervised learning.

serene dew Jul 4, 2025, 8:57 PM

#

odd meteor This isn't my area of specialty but here's the direction I'm thinking 1. I thin...

Thank you!!

exotic star Jul 4, 2025, 9:32 PM

#

any tips on how to get trough mathematics for ML easier and actually get a grasp of everything?

#

i kinda skimmed trough some pages and did some reading but it hasnt been going too good

#

in 17 so i havent learned math of that level yet

#

also is using gpt to explain alghoritms concepts formulas.... a good way to learn or can he hallucinate?

limpid zenith Jul 4, 2025, 11:10 PM

#

exotic star also is using gpt to explain alghoritms concepts formulas.... a good way to lear...

yeah GPT can hallucinate math extremely badlly, i would use Khan academy first to get through everything upto multivariable caclulus and linear algebra

I wouldn't suggest Mathematics for ML tbh, a better method would be to get dedicated textooks for each subject and do practice problems on a notebook. Reading won't do you any good, you need to actually do the problems.

turbid field Jul 5, 2025, 1:03 AM

#

for training images on vehicle classification to create machine learning model, how many images instances per class? thesis level

#

also, may i know what website or any offers image annotations?

odd meteor Jul 5, 2025, 1:07 AM

#

turbid field for training images on vehicle classification to create machine learning model, ...

As much as you can lay your hands on.

turbid field Jul 5, 2025, 1:08 AM

#

odd meteor As much as you can lay your hands on.

i think we can only handle 1000, is it enough?

#

since we have a total of 13 classes

odd meteor Jul 5, 2025, 1:34 AM

#

turbid field also, may i know what website or any offers image annotations?

If you really wanna immerse yourself in this completely, you should consider using any of this approach instead

Active learning
SSL (self supervised learning) domain adaptation technique + MIM (masked image modelling)
Few Shot learning (with LLM)

odd meteor Jul 5, 2025, 1:38 AM

#

turbid field i think we can only handle 1000, is it enough?

Get as much 'good data' as possible! The major constrain you probably have to think about is, access to compute (because you will need GPU)

exotic star Jul 5, 2025, 1:48 AM

#

I should first go trough khan academy and after i get basic idea of everything, get a math book with problems and do them in a seperate notebook for each subject?

#

also should I spend all the time in the math part or also learn about the python libraries requierd for ML?

turbid field Jul 5, 2025, 2:47 AM

#

odd meteor Get as much 'good data' as possible! The major constrain you probably have to th...

what determines a good data? also for the gpu i might use google collab pro or my rtx3060ti

serene scaffold Jul 5, 2025, 2:49 AM

#

@odd meteor I hope you're doing great flag_dc 💚 🇳🇬

odd meteor Jul 5, 2025, 4:14 AM

#

serene scaffold <@519319496868233227> I hope you're doing great <:flag_dc:1166907825344229469> �...

Yeah I'm doing great Pope. Thanks for asking 😊

narrow tiger Jul 5, 2025, 7:05 AM

#

is there any hybrid recommender system with minimal frontend available online?

spring field Jul 5, 2025, 10:02 AM

#

turbid field for training images on vehicle classification to create machine learning model, ...

now that's a true machine learning model

spring field Jul 5, 2025, 10:05 AM

#

turbid field also, may i know what website or any offers image annotations?

you want like a ready-made dataset or just a tool for annotating the images yourself?

#

for datasets you can check out HuggingFace

glacial root Jul 5, 2025, 10:24 AM

#

hi @serene scaffold, i just wanted to make sure since i often see differing answers to this question, how do byte pair encoding and word piece deal with spaces?

#

do they still include them as characters?

serene scaffold Jul 5, 2025, 10:24 AM

#

glacial root hi <@253696366952316929>, i just wanted to make sure since i often see differing...

Hello, please don't ping specific people to answer questions that aren't specific to them.

glacial root Jul 5, 2025, 10:25 AM

#

sorry, i pinged you because i thought you were experienced with nlp

serene scaffold Jul 5, 2025, 10:26 AM

#

Even so, unless it's a question that's literally only for that person, please direct your questions to the channel generally. No one is on call to answer questions about their area of expertise.

glacial root Jul 5, 2025, 10:26 AM

#

oh alright, sorry for pinging

serene scaffold Jul 5, 2025, 10:28 AM

#

I'm actually not sure what byte pair encodings are.
Word pieces can't have spaces that separate them from the "main" word piece. Unless maybe the tokenizer has rules to handle that.

glacial root Jul 5, 2025, 10:31 AM

#

serene scaffold I'm actually not sure what byte pair encodings are. Word pieces can't have space...

i'm pretty sure both are similar except for the rule that they use to choose which pair gets added to the vocabulary, with byte pair encoding using the most frequent pair and word piece using the likelihood score that also account for the frequency of the individual parts of the pair

#

am i correct here about word piece?

#

and i'm not quite sure what you mean by main word piece

serene scaffold Jul 5, 2025, 10:35 AM

#

glacial root and i'm not quite sure what you mean by main word piece

Look at the tokenization of something from BERT and you'll see what I mean

glacial root Jul 5, 2025, 10:37 AM

#

glacial root Jul 5, 2025, 10:38 AM

#

serene scaffold Look at the tokenization of something from BERT and you'll see what I mean

so it doesn't first split the input text into characters?

serene scaffold Jul 5, 2025, 10:44 AM

#

glacial root so it doesn't first split the input text into characters?

No

turbid field Jul 5, 2025, 12:19 PM

#

spring field you want like a ready-made dataset or just a tool for annotating the images your...

We will be using roboflow to annotate

#

just to many images to annotate

ivory umbra Jul 5, 2025, 12:21 PM

#

if im labeling 10k+ domains and plan to train it to a model, would it be better to just go straight to postgres (supabase) just in case i scale more later on or is sqlite enough?

grand minnow Jul 5, 2025, 12:32 PM

#

ivory umbra if im labeling 10k+ domains and plan to train it to a model, would it be better ...

How are you training the model? Maybe use something like Google Vertex and upload your dataset there? Or via Google Colab?

ivory umbra Jul 5, 2025, 12:38 PM

#

grand minnow How are you training the model? Maybe use something like Google Vertex and uploa...

well right now, im not at the part of training a model yet. im still at the process of labelling domains. basically, fetch a domain from the tranco list -> scrape it -> store the scraped data -> label the scraped domain using qwen -> store that dataset -> train a model using that dataset. im just wondering if i should just go straight to supabase if im gonna label a lot anyways

runic parcel Jul 5, 2025, 6:29 PM

#

Guys i am doing ocr using a vlm, but instead of doing in on the image i want to do it from the live screen. i will be seeing something on my pc and it should do ocr on that, how can i do it?

#

Also the vlm and respond in .07 second.

earnest sleet Jul 5, 2025, 7:15 PM

#

Hello! Very basic question here, feel free to just point me somewhere: Why exactly is Python the language of choice for data science and such?
Couldn't find anything but corpo LLM slop on the topic online

serene scaffold Jul 5, 2025, 7:18 PM

#

earnest sleet Hello! Very basic question here, feel free to just point me somewhere: Why exact...

The ecosystem of tools that grew around it, which itself are a consequence of its C API and operator overloading

#

It was never intended to the the DS/AI language

earnest sleet Jul 5, 2025, 7:26 PM

#

Alright, thanks. IT's all about the tooling in the end :)

serene scaffold Jul 5, 2025, 7:29 PM

#

earnest sleet Alright, thanks. IT's all about the tooling in the end :)

The libraries. Not the tooling

#

Tooling is stuff like IDEs and type checkers

iron basalt Jul 5, 2025, 7:39 PM

#

earnest sleet Hello! Very basic question here, feel free to just point me somewhere: Why exact...

1. You can just make a new file and go (no need to learn what a compiler is and setup a project).
2. The syntax is not scary, it's straight forward / boring / not symbol soup.
3. It's a memory safe/garbage collected language.
4. The garbage collection in combination with operator overloading allows for some high level math-y code to be written.
5. Most importantly, it has a massive and ever growing library of modules available online for every task imaginable (it has momentum at this point, and no matter how good your language design is, nothing beats having the code already written for you by someone else).

soft star Jul 5, 2025, 10:26 PM

#

serene scaffold It was never intended to the the DS/AI language

and intended languages did not work out !

static pasture Jul 6, 2025, 4:43 AM

#

https://discord.com/channels/267624335836053506/1391237689440731256

pls help

final dock Jul 6, 2025, 8:46 AM

#

hey can anyoe guide me in machine learning / deep learning for python , I have intermediate knowledge of python(Upto OOP)

rancid thorn Jul 6, 2025, 9:08 AM

#

I want to implement a neural network for the creatures in my game and i found this library called PyTorch and i was wondering if its good for many small NNs or if its better to create them from scratch

#

Also I've tried developing many AIs before (last one was a 2048 AI) but it never worked out. It actually seemed to get worse over time, but i cant understand why

jaunty helm Jul 6, 2025, 9:13 AM

#

rancid thorn I want to implement a neural network for the creatures in my game and i found th...

for anything other than learning purposes (i.e. implementing from scratch to learn say backprop), it's probably better to just use pytorch
on the other hand, what are you planning to do with nns with creatures? usually "game AI" is pretty different from the current "AI"s

rancid thorn Jul 6, 2025, 9:14 AM

#

For example my idea was that it would take how much water it has, how much energy and other things and then would output how much to grow, to deepen its roots, to spread seeds etc

jaunty helm Jul 6, 2025, 9:15 AM

#

rancid thorn For example my idea was that it would take how much water it has, how much energ...

then I think you're looking for more "traditional" game AI approaches
a simple one is just rule based, like

if energy > a and water > b:
  grow()
elif ...
...

rancid thorn Jul 6, 2025, 9:16 AM

#

Yeah thats what im currently doing but i wanted it to be more advanced

#

Also this way it might actually be heavier on my pc because its gotta store a lot of variables and actions

jaunty helm Jul 6, 2025, 9:17 AM

#

rancid thorn Also this way it might actually be heavier on my pc because its gotta store a lo...

a neural network will probably be like at least 10x heavier and still do worse + be unpredictable

jaunty helm Jul 6, 2025, 9:18 AM

#

rancid thorn Also this way it might actually be heavier on my pc because its gotta store a lo...

like how much do you think is "a lot"?

jaunty helm Jul 6, 2025, 9:19 AM

#

rancid thorn Yeah thats what im currently doing but i wanted it to be more advanced

I'm not familiar with game dev, but the first thing that comes to mind is like 4x games - there's tons of variables like resources, available information of enemies and the map, potential tech to unlock, ...
so looking at how they approach it is probably not a bad idea

rancid thorn Jul 6, 2025, 9:20 AM

#

Idk it has to store a variable for each action twice, like the variable for when its wet and the one for when its dry. So it would be actions*vital variables**2

#

So it would come to be like 40 float point variables

jaunty helm Jul 6, 2025, 9:21 AM

#

rancid thorn So it would come to be like 40 float point variables

even ignoring whether that's a good approach or not, 40 floats is not a lot by any means

rancid thorn Jul 6, 2025, 9:21 AM

#

Alright then, thank you for helping me

wet dome Jul 6, 2025, 7:08 PM

#

does anyone know when using numpy if you can use type hints to show the shape of an array, is there a library which can let you do this? e.g.

x: np.ndarray[3,4] # array of shape = (3,4)

serene scaffold Jul 6, 2025, 7:38 PM

#

wet dome does anyone know when using numpy if you can use type hints to show the shape of...

Numpy has a typing module

opaque condor Jul 6, 2025, 7:59 PM

#

Can I mix PNG and JPEG files together?

serene scaffold Jul 6, 2025, 8:24 PM

#

opaque condor Can I mix PNG and JPEG files together?

As long as you convert them to the same representation when they go into the model

opaque condor Jul 6, 2025, 8:38 PM

#

So under the same label

serene scaffold Jul 6, 2025, 8:41 PM

#

opaque condor So under the same label

What do you mean?

opaque condor Jul 6, 2025, 8:42 PM

#

A mixture of jpegs and pngs

balmy yoke Jul 6, 2025, 8:44 PM

#

is this the machine learning server

serene scaffold Jul 6, 2025, 8:44 PM

#

balmy yoke is this the machine learning server

It's the python server. This is the ml channel of it.

serene scaffold Jul 6, 2025, 8:45 PM

#

opaque condor A mixture of jpegs and pngs

What is the actual representation of an image that goes into the model?

balmy yoke Jul 6, 2025, 8:45 PM

#

ok thanks i have a question that isnt hard enough to go on posts i think can you help me

serene scaffold Jul 6, 2025, 8:47 PM

#

balmy yoke ok thanks i have a question that isnt hard enough to go on posts i think can you...

I have no way of knowing until you ask the question

opaque condor Jul 6, 2025, 8:47 PM

#

The photo itself let's say if I had a picture of a tomato or a pear I would take them and put them into separate categories for that specific fruit
Under classification or a folder that acts like the label

balmy yoke Jul 6, 2025, 8:47 PM

#

do you happen to know simple linear regressio

#

like 3 variables only

serene scaffold Jul 6, 2025, 8:47 PM

#

@opaque condor when you pass an image to a model, what is the literal thing that that image is in your python code?

balmy yoke Jul 6, 2025, 8:49 PM

#

for some reason i keep getting a message that"reg" is not defined even though it is idk what im doing wrong i even asked ai but it keeps saying that even after i do the suggested corrections

serene scaffold Jul 6, 2025, 8:50 PM

#

balmy yoke for some reason i keep getting a message that"reg" is not defined even though it...

Looks like you didn't run the cell that defines reg

#

You can the cell after it

balmy yoke Jul 6, 2025, 8:50 PM

#

tried that

serene scaffold Jul 6, 2025, 8:50 PM

#

If you defined reg, you're guaranteed to not get a name error

#

You might get an error, but it won't be that one.

balmy yoke Jul 6, 2025, 8:51 PM

#

so run this line again? :reg = linear_model.LinearRegression()
reg.fit(df[['ex', 'test_s', 'interview_s']], df['sal'])

serene scaffold Jul 6, 2025, 8:51 PM

#

For the first time actually

#

@balmy yoke what's the new error message, if you got one?

balmy yoke Jul 6, 2025, 8:53 PM

#

i didnt get a new one

#

mabye is there a way i can share my code with you

serene scaffold Jul 6, 2025, 8:54 PM

#

Yes, by copying and pasting it

#

!code

arctic wedgeBOT Jul 6, 2025, 8:54 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold Jul 6, 2025, 8:54 PM

#

^ like this

balmy yoke Jul 6, 2025, 8:54 PM

#

im in jupyter so its harder

#

i sent you a screen recording also

#

do you want my dataset as well

serene scaffold Jul 6, 2025, 8:55 PM

#

No, just copy and paste the relevant code and text output

#

Anything else would be harder for other people to read

balmy yoke Jul 6, 2025, 8:56 PM

#

alright

#

should we open up direct message so everyone doesnt get pinged or whatever

serene scaffold Jul 6, 2025, 8:57 PM

#

No, in this channel. No one else is getting pinged.

balmy yoke Jul 6, 2025, 8:57 PM

#

i will send each cell seperately to make it more clear

#

from sklearn import linear_model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#

df = pd.read_csv("/Users/srihari/Desktop/ml/variateLR/job.csv")
df

#

df.test_s.median()

#

df.test_s = df.test_s.fillna(df.test_s.median())
df

#

everything worked here

#

reg = linear_model.LinearRegression()
reg.fit(df[['ex', 'test_s', 'interview_s']], df['sal'])

#

reg.predict([[2,9,6]]) : 1 reg.predict([[2,9,6]])

NameError: name 'reg' is not defined

serene scaffold Jul 6, 2025, 8:59 PM

#

balmy yoke reg = linear_model.LinearRegression() reg.fit(df[['ex', 'test_s', 'interview_s']...

You somehow never ran the cell that defines reg. Try copying into the cell that uses reg, so we can be sure it gets defined

serene scaffold Jul 6, 2025, 8:59 PM

#

arctic wedge

@balmy yoke Please read this message from the bot

balmy yoke Jul 6, 2025, 8:59 PM

#

?

serene scaffold Jul 6, 2025, 9:00 PM

#

#data-science-and-ml message please read this

balmy yoke Jul 6, 2025, 9:00 PM

#

i used the pastebin but it added stuff to the code idk

serene scaffold Jul 6, 2025, 9:00 PM

#

The first part tells you how to format code on Discord.

#

"long code samples" are like, whole files.

balmy yoke Jul 6, 2025, 9:01 PM

#

i did

serene scaffold Jul 6, 2025, 9:01 PM

#

You didn't follow the bot's instructions for formatting code on discord.

Let me know what happens when you move the code that defines reg into the next cell and re run that cell.

balmy yoke Jul 6, 2025, 9:02 PM

#

oh wow it wokred

#

what changed when i moved that code?

serene scaffold Jul 6, 2025, 9:02 PM

#

What's the best explanation you can think of, based on what the error message said and what we've discussed?

balmy yoke Jul 6, 2025, 9:03 PM

#

it finally ran the code where it defined reg idk

serene scaffold Jul 6, 2025, 9:04 PM

#

What did the error message tell you?

balmy yoke Jul 6, 2025, 9:04 PM

#

that reg was not defined

serene scaffold Jul 6, 2025, 9:04 PM

#

Right

#

If a variable is defined, you can't possibly get that error

#

You might get a different error saying that you tried to use it in an invalid way

balmy yoke Jul 6, 2025, 9:05 PM

#

yeah also this message alwyas shows up btw

#

but it still shows the right output

#

/Users/srihari/.pyenv/versions/3.13.5/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(

serene scaffold Jul 6, 2025, 9:07 PM

#

That would require a deeper dive than I can do at the moment--I'm on mobile

balmy yoke Jul 6, 2025, 9:07 PM

#

ok but thanks for the help

serene scaffold Jul 6, 2025, 9:07 PM

#

balmy yoke do you happen to know simple linear regressio

By the way, your question ultimately had nothing to do with linear regression. It was just about name errors. That's why you should always just ask your actual question, not if anyone knows about what you think the topic is.

balmy yoke Jul 6, 2025, 9:08 PM

#

ok thanks

fringe jay Jul 7, 2025, 1:10 PM

#

RAG queries bring back unrelated information, or dont bring anything back for some reason, what are some ways to optimise RAG queries?

wanton viper Jul 7, 2025, 1:40 PM

#

Hi, I am Muhammad mustaqeem, 17. (UAE)
I want to get into tech and will be starting my Bsc program in AI & Daa Science in sept 2026. I want some advise on how to begin the self study journey as even majority of the uni studies are going to be self so i want to start from now basically. Alongside, i plan to join for basic IT role internships and part time jobs. Could anyone share some advise and practical experience on how to begin and especially what all do i need to know to get into any junior level IT role. (Currently learning python but haven't made any major projects ).

#

Data*

short moth Jul 7, 2025, 1:43 PM

#

wanton viper Hi, I am Muhammad mustaqeem, 17. (UAE) I want to get into tech and will be start...

get the book "Learning Python for Data" by Matt Harrison it is really good. then you can get the book "Effective Pandas" by Matt Harisson.

#

short read for both shouldn't take you more than 1 month

wanton viper Jul 7, 2025, 1:44 PM

#

Alr. Also, as i said i haven't made any major projects, so i am confused on whether i should first build software development skills or just straight off python for data science then whatever follows along.

short moth Jul 7, 2025, 1:47 PM

#

you can do both probably

#

in the end you will build software development skills as you learn data science and ai i think

#

although, I do not know for sure, I am not a full on software developer, I just use pandas for whatever I need in my field

#

i have a question for the pandas experts. I have some data I made here (2nd image) and I modified the data to remove the numbers so that I get this (1st text)
:

Process    Module    Hosting Environment
0    Platform    DPM    Core 1
1    UA Consolidator    DPM    Core 1
2    Traffic Server    DPM    Core 1
3    Flight Deck Window Manager    DPM    Core 2
4    Configuration Manager    DPM    Core 2
...    ...    ...    ...
144    Policy Handler    INSU    Core 7
145    ACMF    INSU    Core 8
146    AOIP    INSU    Core 8
147    Bluetooth UA    INSU    Core 8
148    SXM UA    INSU    Core 8

#

now i wanted to groupby the Module and then get the counts of each process for each module

#

so i did


updated_df = (df.assign(Process = lambda df_: df_.Process.str.replace(r'[0-9]', repl = ' ', regex = True))
              .assign(Module = lambda df_: df_.Module.str.replace(r'[0-9]', ' ', regex=True)).loc[:,'Process':'Module'].groupby('Module')
              .value_counts()
             )

#

I exported to excel and for some reason, I still have duplicates in the data!

#

#

why is this?

twilit topaz Jul 7, 2025, 2:18 PM

#

What are some good real life data sets to learn with. I heard of Kaggle but I watched a video saying I need to analyze with real datasets

#

I don't mind if it's unclean either

short moth Jul 7, 2025, 2:27 PM

#

it seems that it groups the elements correctly I do not understand why it duplicates some elements

serene scaffold Jul 7, 2025, 2:31 PM

#

fringe jay RAG queries bring back unrelated information, or dont bring anything back for so...

What query technique are you using? Vector search?

short moth Jul 7, 2025, 2:39 PM

#

even doing something like this:

#

updated_df = (df.assign(Process = lambda df_: df_.Process.str.replace(r'[0-9]', repl = ' ', regex = True))
              .assign(Module = lambda df_: df_.Module.str.replace(r'[0-9]', ' ', regex=True)).loc[:,'Process':'Module'].assign(counter = 1)
              .groupby(['Module','Process']).counter.sum()
             )

#

yields the same thing

#

having just 1 next to each module and process same result

#

i wonder why it does this it must thing that these mappers are not equal

serene scaffold Jul 7, 2025, 2:46 PM

#

to be clear, you wanted to remove the trailing digit from each value in Module, and then get the value counts of each (Process, Module) pair?

short moth Jul 7, 2025, 2:51 PM

#

serene scaffold to be clear, you wanted to remove the trailing digit from each value in Module, ...

yes but I realized the mistake, I had trailing whitespace

#

i didn't know but I used .strip() and removed it and it works

#

so the groupby did not treat certain equal elements as equal

#

but with .strip it works well

serene scaffold Jul 7, 2025, 2:52 PM

#

you could have also done .str.replace(r"\s*\d+$", ""), or something

short moth Jul 7, 2025, 2:52 PM

#

yes i added that in as well

woven prairie Jul 7, 2025, 5:19 PM

#

Is there anyone who have worked with open ai key

#

I tried to increase the max token in payload but still my output is not even more than 1500-1600 tokens

#

Basically I am trying to tell the model to write the html css code to make a poster or similar to poster

#

I know to make it one it writes a lot of line code and it will take a lot of tokens but I don't why it is not writing

agile cobalt Jul 7, 2025, 5:21 PM

#

the only thing max tokens does is cut the response short if the model still hasn't finished its response before hitting the token limit

#

the model will not make its responses shorter nor longer based on it, it'll just be cut midsentence if it crosses the limit

woven prairie Jul 7, 2025, 5:22 PM

#

But the gpt model has a 128k token limit

agile cobalt Jul 7, 2025, 5:23 PM

#

if it is finishing before the max tokens you set, that means the model considered its job done
either use a better model or prompt it better

woven prairie Jul 7, 2025, 5:23 PM

#

My task is to make a poster and a pamphlet using some data

#

.

#

I am using gpt4o

agile cobalt Jul 7, 2025, 5:24 PM

#

prompt it better then

woven prairie Jul 7, 2025, 5:25 PM

#

You have worked on a prompt can you help me

agile cobalt Jul 7, 2025, 5:25 PM

#

if you're expecting for the model to make SVG graphics, there is a huge chance it just straightway won't work

woven prairie Jul 7, 2025, 5:25 PM

#

No some where poster

agile cobalt Jul 7, 2025, 5:26 PM

#

which kind of poster? can you show an example?

woven prairie Jul 7, 2025, 5:26 PM

#

Ok

#

#

Something like this

agile cobalt Jul 7, 2025, 5:27 PM

#

how do you expect for it to handle the images 🙃

woven prairie Jul 7, 2025, 5:27 PM

#

Leaving the images

agile cobalt Jul 7, 2025, 5:31 PM

#

also I can barely read much of the text in that image, you're probably asking too much from the LLM

I'd recommend using a model fine tuned specifically for that, preferably create one yourself if you have some (>100, preferably >1000) data pairs of HTML that generates a given image

woven prairie Jul 7, 2025, 5:32 PM

#

Ok that I can not do rn but let's see what most I can get

#

Thanks for your help

fringe jay Jul 8, 2025, 12:53 AM

#

serene scaffold What query technique are you using? Vector search?

yep vector search

young beacon Jul 8, 2025, 1:07 AM

#

fringe jay yep vector search

Look into HyDE technique

serene scaffold Jul 8, 2025, 3:21 AM

#

fringe jay yep vector search

The vectorizer you're using might not be effective

fringe jay Jul 8, 2025, 9:13 AM

#

serene scaffold The vectorizer you're using might not be effective

using the multilangual from pinecone

#

rag keeps bringing up irrelevant stuff, or not bringing up relevant stuff

fringe jay Jul 8, 2025, 9:16 AM

#

young beacon Look into HyDE technique

ooh, very interesting

wet dome Jul 8, 2025, 3:33 PM

#

How do you measure the perfomance of a logistic regression model?

#

At the moment im calculating accuracy, i.e. how many times it got the correct label

#

what other metrics do people use

buoyant vine Jul 8, 2025, 4:17 PM

#

Normally for classification, you have Accuracy, Recall, Precision.
F1 core is normally my go too for an 'overall' metric for how well it is doing

past meteor Jul 8, 2025, 4:25 PM

#

wet dome How do you measure the perfomance of a logistic regression model?

I tend to exclusively look at precision or recall

#

Real world problems typically require on or the other, even at the expense of the other metric

#

Hence why I typically skip F1 score and/or AUC

serene scaffold Jul 8, 2025, 4:37 PM

#

past meteor I tend to *exclusively* look at precision or recall

what about micro and macro averages

past meteor Jul 8, 2025, 4:42 PM

#

serene scaffold what about micro and macro averages

Yeah, I'm sure there's a time and place where macro precision makes sense

reef spade Jul 8, 2025, 4:45 PM

#

Hello guys, I am trying to use the text to speech model chatterbox to clone my own voice and read a transcript. But im struggling with the installation. I got it working on google colab but it just doesnt work on VScode. The line "from chatterbox.tts import ChatterboxTTS" only works on google colab but not on vscode. I use the same isntallation comands i did on google colab. Can someone please help me out?

serene scaffold Jul 8, 2025, 4:50 PM

#

reef spade Hello guys, I am trying to use the text to speech model chatterbox to clone my o...

please copy and paste the text in the powershell terminal into this chat as text.

reef spade Jul 8, 2025, 4:53 PM

#

you mean the installation portion?

serene scaffold Jul 8, 2025, 4:53 PM

#

all of this as actual text

reef spade Jul 8, 2025, 4:54 PM

#

PS C:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1> & C:/Users/Admin/AppData/Local/Microsoft/WindowsApps/python3.11.exe "c:/Users/Admin/Documents/visual studio code projects/streamlit_apps/app1/file.py"
Traceback (most recent call last):
File "c:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1\file.py", line 3, in <module>
from chatterbox.tts import ChatterboxTTS
ModuleNotFoundError: No module named 'chatterbox.tts'
PS C:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1>

serene scaffold Jul 8, 2025, 4:55 PM

#

reef spade PS C:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1> & C...

try doing this command

C:/Users/Admin/AppData/Local/Microsoft/WindowsApps/python3.11.exe -m pip install chatterbox-tts

#

@reef spade

reef spade Jul 8, 2025, 4:56 PM

#

📎 message.txt

arctic wedgeBOT Jul 8, 2025, 4:56 PM

#

reef spade

~~Please react with ✅ to upload your file(s) to our paste bin, which is more accessible for some users.~~

reef spade Jul 8, 2025, 4:57 PM

#

it seems to have been unsucessful

serene scaffold Jul 8, 2025, 4:58 PM

#

reef spade it seems to have been unsucessful

remember to always follow instructions from the bot.

ERROR: Could not install packages due to an OSError: [WinError 206] The filename or extension is too long: 'C:\\Users\\Admin\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\onnx\\backend\\test\\data\\node\\test_attention_3d_with_past_and_present_qk_matmul_softcap_expanded\\test_data_set_0'

This is the issue, apparently

woven prairie Jul 8, 2025, 4:58 PM

#

Can some help me in writing a advance prompt

serene scaffold Jul 8, 2025, 4:59 PM

#

woven prairie Can some help me in writing a advance prompt

please be as specific as possible about what you're asking about.

reef spade Jul 8, 2025, 5:00 PM

#

serene scaffold remember to always follow instructions from the bot. ``` ERROR: Could not instal...

how do i rectify this error?

serene scaffold Jul 8, 2025, 5:00 PM

#

this might be the solution https://stackoverflow.com/questions/72352528/how-to-fix-winerror-206-the-filename-or-extension-is-too-long-error

#

you might also look into "windows enable long paths"

woven prairie Jul 8, 2025, 5:07 PM

#

@serene scaffold I want to write a prompt for that will guide gpt4o model to write a html css code that code we make a medical poster.

#

#

Example

serene scaffold Jul 8, 2025, 5:07 PM

#

woven prairie <@253696366952316929> I want to write a prompt for that will guide gpt4o model t...

is this part of a project you're trying to do in python?

woven prairie Jul 8, 2025, 5:07 PM

#

Yes

#

Basically I am getting some data from the rag pipeline and then I have to create a poster out of it.

serene scaffold Jul 8, 2025, 5:08 PM

#

okay. well, prompting LLMs isn't rocket science. just tell it what you want and try rendering the HTML that you get.

woven prairie Jul 8, 2025, 5:08 PM

#

I have been trying for week

#

But I am not getting desired output

serene scaffold Jul 8, 2025, 5:09 PM

#

what have you tried and what output are you getting?
when you ask for help, it's important that you're up front with all the information, so that we don't have to interview you to know what your question is.

woven prairie Jul 8, 2025, 5:10 PM

#

So I downloaded a poster form google gave it to claude told him to write html css code of it

#

Then told what thinking process you used to make such design

#

It gave me his thinking process and I used it as a prompt

serene scaffold Jul 8, 2025, 5:14 PM

#

@woven prairie I've never used claude. if I were doing this, I would describe what I want the page to look like from top to bottom, including the actual text, the size of the text, the color of text boxes, etc.

woven prairie Jul 8, 2025, 5:15 PM

#

Ok

mortal ivy Jul 8, 2025, 5:18 PM

#

hey guys, i need some help for a specific function in numpy.

serene scaffold Jul 8, 2025, 5:19 PM

#

mortal ivy hey guys, i need some help for a specific function in numpy.

hello, please always ask your actual question, so that people don't have to interview you to know what your question is. Never ask to ask.

mortal ivy Jul 8, 2025, 5:21 PM

#

i wanted to know how numpy.fft.fft() actually works, i went around here and there but could not understand

serene scaffold Jul 8, 2025, 5:22 PM

#

here's the implementation: https://github.com/numpy/numpy/blob/v2.3.0/numpy/fft/_pocketfft.py#L58

arctic wedgeBOT Jul 8, 2025, 5:22 PM

#

numpy/fft/_pocketfft.py line 58

def _raw_fft(a, n, axis, is_real, is_forward, norm, out=None):```

wooden sail Jul 8, 2025, 6:03 PM

#

mortal ivy i wanted to know how numpy.fft.fft() actually works, i went around here and ther...

do you want implementation details or just basic math background?

#

for vectors whose lengths are powers of 2, i believe the radix-2 cooley-tukey alg is the standard. in other cases, a prime factorization is used to determine the unique frequency components

mortal ivy Jul 8, 2025, 6:06 PM

#

wooden sail do you want implementation details or just basic math background?

basic understanding actually, here is the block and block after using fft --

 [-1198  -995  -916 ...   416   409   379]
 [  289   248   381 ...    -4    15   321]
 ...
 [ 3429  4607  3452 ... 13776 13738 13604]
 [13408  9697  4005 ...  5428  5389  5981]
 [ 6572  6834  6953 ...     0     0     0]]```
```blocks after fft :  [[ 2.03490000e+04+0.00000000e+00j  1.70904135e+03-1.72661480e+04j
  -6.65838677e+03+3.41195713e+04j ...  1.33128709e+03+3.95891200e+03j
  -6.65838677e+03-3.41195713e+04j  1.70904135e+03+1.72661480e+04j]
 [ 7.55290000e+04+0.00000000e+00j  6.26993787e+04+5.80008172e+03j
   9.69392154e+04-1.18256337e+04j ...  6.97357360e+04-2.70523446e+03j
   9.69392154e+04+1.18256337e+04j  6.26993787e+04-5.80008172e+03j]
 [-3.87660000e+04+0.00000000e+00j -5.49991855e+04+1.34433005e+04j
  -5.75556225e+04-4.38738493e+02j ... -5.49188931e+04-9.31813360e+03j
  -5.75556225e+04+4.38738493e+02j -5.49991855e+04-1.34433005e+04j]
 ...
 [ 4.76185000e+05+0.00000000e+00j  2.35227102e+05+1.13181706e+05j
  -1.36409643e+05+5.61953423e+05j ... -4.19050795e+04-1.94594074e+05j
  -1.36409643e+05-5.61953423e+05j  2.35227102e+05-1.13181706e+05j]
 [ 2.34655400e+06+0.00000000e+00j  7.02943400e+05-1.58216279e+06j
   6.85467802e+05+1.00273039e+06j ... -4.30924432e+05+1.08821807e+06j
   6.85467802e+05-1.00273039e+06j  7.02943400e+05+1.58216279e+06j]
 [ 4.58726000e+05+0.00000000e+00j -1.07550461e+06+7.62471567e+05j
   4.19460363e+05+1.06823585e+05j ...  4.96201820e+04-2.61617881e+06j
   4.19460363e+05-1.06823585e+05j -1.07550461e+06-7.62471567e+05j]]```

#

the way its converting arrays in 1d fourior transformation i dont understand at all, its like 1 + 2j with imaginary number

#

but here are all 'j'

wooden sail Jul 8, 2025, 6:08 PM

#

so what the fft does is apply a "discrete fourier transform" by using a fast algorithm

#

in general, the output of the fft is a complex-valued vector of the same length as the input vector

#

so you do expect all of the entries to be of the form a + bj

#

from wikipedia:

#

what the fft is doing is multiplying your vector by this matrix W of the appropriate size

#

if you interpret matrix-vector multiplication as taking several dot products, and the dot product as a measure of similarity, what the fft and dft do is measure how similar your input vector is to each of the rows of W

#

and the rows of W are complex exponentials of specific frequencies

#

so what the dft and fft do is decompose your input vector into complex exponentials of different frequencies

mortal ivy Jul 8, 2025, 6:13 PM

#

thats alot to digest lol

wooden sail Jul 8, 2025, 6:14 PM

#

this is something one would learn maybe late first year or early second year in uni if doing engineering. maybe earlier if doing maths

#

it's not exactly "elementary", but certainly very "fundamental" or important

mortal ivy Jul 8, 2025, 6:15 PM

#

hmm, thanks for the explaination. lemme digest this...

wooden sail Jul 8, 2025, 6:16 PM

#

https://en.wikipedia.org/wiki/DFT_matrix maybe read through this

DFT matrix

In applied mathematics, a DFT matrix is a square matrix as an expression of a discrete Fourier transform (DFT) as a transformation matrix, which can be applied to a signal through matrix multiplication.

undone steppe Jul 8, 2025, 6:55 PM

#

I'm working on what I thought would be a fairly simple project: trying write a python script that will send a prompt to a locally running LLM (google gemma, specifically). The request is to categorize a csv file of bank transactions, based on a csv file with the same format where I manually categorized a couple of previous months' transactions. I thought this sort of fuzzy logic matching would be in an LLM's wheelhouse, but it absolutely fails every time. Is my prompt incomplete? (See attached images) Is there another machine learning tool I should be using instead of an LLM?

agile cobalt Jul 8, 2025, 7:13 PM

#

You must give them clear instructions and preferably keep it short, avoid rambling.

Things like "must NOT do:" will frequently backfire, tell it only what to do, and at most include "without XYZ" for some specific instruction.
If it is doing something else instead, adjust your existing instructions before adding new ones.

Those file names are also extremely questionable, at least remove those "(1)"s, ideally rename to a standard format
not to mention "will likely be", "you may use any of them to draw your inferences from"... just remove that sort of hints, either let it draw its own conclusions or filter to only include the 'likely useful' categories

Lastly,

they are pretty damn bad when it comes to working with structured data
most local models can only handle so little context
overall you must give them clear and short instructions, individual short scoped tasks, and exactly as much context as needed to complete the task if you want to get good outputs out of them

#

try picking only 2~3 examples per category, then ask it to classify one individual row with a short prompt (a single phrase)

fallow coyote Jul 8, 2025, 8:06 PM

#

For NLP, is it better to use the nltk module or spaCY (i think its called)? Or is it a case of 'depending on what youre doing'

turbid epoch Jul 8, 2025, 8:37 PM

#

I want to get into AI/Machine Learning/LLMS etc how can I learn (for free possibily)

serene scaffold Jul 8, 2025, 9:07 PM

#

fallow coyote For NLP, is it better to use the nltk module or spaCY (i think its called)? Or i...

I have used spacy and have never used nltk.

serene scaffold Jul 8, 2025, 9:08 PM

#

turbid epoch I want to get into AI/Machine Learning/LLMS etc how can I learn (for free possib...

there are resources in the pins, and on our website. and I definitely encourage you to learn about things that interest you, but be advised that jobs in AI/ML/LLMs require formal training.

#

!resources data science

arctic wedgeBOT Jul 8, 2025, 9:08 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

scenic parcel Jul 9, 2025, 7:17 AM

#

can you learn RL with kaggle

turbid epoch Jul 9, 2025, 10:37 AM

#

serene scaffold there are resources in the pins, and on our website. and I definitely encourage ...

wdym by "formal training"

#

you mean a certification?

serene scaffold Jul 9, 2025, 2:14 PM

#

turbid epoch you mean a certification?

university degree

tawny bison Jul 9, 2025, 2:40 PM

#

do i need to learn matplotlib, seaborn and plotly, these all three for data science? and also please tell from where should i learn

#

and i think as i know how to use MATLAB it would be easier as i read the syntax

serene scaffold Jul 9, 2025, 2:41 PM

#

tawny bison do i need to learn matplotlib, seaborn and plotly, these all three for data scie...

all three of those are for data visualizations. you don't need to know all three. if you already know matlab, you might not hate matplotlib

#

even though matplotlib is, without a doubt, the worst python library.

tawny bison Jul 9, 2025, 2:42 PM

#

lmao ok

#

but i need to learn it

#

so from where should i do

serene scaffold Jul 9, 2025, 2:42 PM

#

what do you mean by "need"?

tawny bison Jul 9, 2025, 2:43 PM

#

i mean i'm doing an internship/training at a govt company so they want me to make some graphs using their data

slim hearth Jul 9, 2025, 2:43 PM

#

ok so my friend is training his chatbot and i want to advertise it but i dont want to be interrupting any discussion here so if u want to join in the beta testing team dm me

#

also im sry for advertising here

serene scaffold Jul 9, 2025, 2:44 PM

#

slim hearth ok so my friend is training his chatbot and i want to advertise it but i dont wa...

this message is not allowed. don't be sorry for advertising--just don't do it.

slim hearth Jul 9, 2025, 2:44 PM

#

could i ask where do i do it?

serene scaffold Jul 9, 2025, 2:44 PM

#

slim hearth could i ask where do i do it?

idk. not this server.

serene scaffold Jul 9, 2025, 2:45 PM

#

tawny bison i mean i'm doing an internship/training at a govt company so they want me to mak...

in data science, "graphs" are not data visualizations. they're plots/data visualizations.
what kind of plots do you need to make?

tawny bison Jul 9, 2025, 2:45 PM

#

@serene scaffold can you help me with the resource as on yt there as 1.5hrs of videos

tawny bison Jul 9, 2025, 2:45 PM

#

serene scaffold in data science, "graphs" are not data visualizations. they're plots/data visual...

2-d, heat map

serene scaffold Jul 9, 2025, 2:46 PM

#

tawny bison <@253696366952316929> can you help me with the resource as on yt there as 1.5hrs...

I don't recommend videos. if you do watch videos, it's supremely important that you do something to actively follow along. you won't passively learn by watching.

tawny bison Jul 9, 2025, 2:47 PM

#

so is there any documentation?

#

nvm got it

#

btw thanks @serene scaffold

woven prairie Jul 9, 2025, 3:04 PM

#

What actually does it do , can you explain it in an easy way?

last knot Jul 9, 2025, 3:09 PM

#

woven prairie What actually does it do , can you explain it in an easy way?

It takes high resolution, noisy images that are in a pixel-art-style, for example a screenshot of a pixel art asset or the output of pixel-art-style images from chatgpt, and returns usable true pixel resolution assets

woven prairie Jul 9, 2025, 3:45 PM

#

Ok

lapis sequoia Jul 9, 2025, 4:11 PM

#

i had a new idea

#

you know minecraft AI

#

heres my idea

#

so you know those physics simulators

#

so people wanted to run a physics simulator backwards in time

#

so you train that minecraft AI on a reversed physics simulator footage and you getit

#

and that way you can also play minecraft backwards in time

ruby thunder Jul 9, 2025, 4:26 PM

#

Guys can anyone recommend Good Data Science projects for Beginners ? Website from where I can practice the concepts of stats and learn in real time

exotic star Jul 9, 2025, 4:59 PM

#

I made a khan academy account and enrolled in alg 1,2 college alg hs statistics college statistics pre calculus college calculus and other but im not sure in which order should i start and complete the courses

lapis sequoia Jul 9, 2025, 5:07 PM

#

from typing import Dict
from fastapi import FastAPI,Depends
from pydantic import BaseModel
from utils.model import get_model,Model


app = FastAPI()

class LabelRequest(BaseModel):
    text: str
    
class LabelResponse(BaseModel):
    probabilities: Dict[str, float]
    sentiment: str
    confidence: float
    


@app.post("/predict",response_model=LabelResponse)
async def predict(request:LabelRequest,model = Depends(get_model)):
    category,confidence,pred_prob = model.predict(request.text)
    return LabelResponse(category=category,confidence=confidence,pred_prob=pred_prob)

#

why will this not work with bert

wary oracle Jul 9, 2025, 5:34 PM

#

lapis sequoia why will this not work with bert

this code assumes that the model has a predicttext method that returns a label confidence score and probabilities if youre using a raw bert model from hugging face it doesnt come with such a method by default youd need to implement that yourself also make sure the output is converted to plain python types like dicts and floats because fastapi cant return pytorch tensors or numpy types directly

twilit topaz Jul 9, 2025, 9:23 PM

#

serene scaffold even though matplotlib is, without a doubt, the worst python library.

I like matplotlib

#

It's ez on the memory it uses

#

And it has a lot of themes

serene scaffold Jul 9, 2025, 9:24 PM

#

idc if it can make a plot with a million data points using only one bit. it has the least intuitive API of all time.

twilit topaz Jul 9, 2025, 9:26 PM

#

serene scaffold idc if it can make a plot with a million data points using only one bit. it has ...

Just a matter of opinion, with millions of points you are asking for on Plotly or other Python Libraries will have issues

#

Seaborn is a good alternative but it's still based on matplotlib

#

It'll consume a ton of memory (Plotly)

#

Matplotlib doesn't to me

#

Mostly kB instead of mB

calm thicket Jul 9, 2025, 9:48 PM

#

when will I ever make a plot with millions of points

serene scaffold Jul 9, 2025, 9:58 PM

#

calm thicket when will I ever make a plot with millions of points

you wouldn't, because data visualizations are for human consumption, and we can't do anything with that much information at once.

iron basalt Jul 9, 2025, 10:01 PM

#

calm thicket when will I ever make a plot with millions of points

Gotta render that particle simulation and i'm too lazy to use a game engine.

ornate pebble Jul 9, 2025, 10:08 PM

#

Is it ok to start with numpy since I know python but I don't remember every inbuilt function, I usually code in C++ and Javascript, but as python is easy so its syntax is easy to remember, but still there are some inbuilt functions which I don't remember... so can I start with numpy, pandas and data visualization or should I memorize those inbuilt functions first

serene scaffold Jul 9, 2025, 10:54 PM

#

ornate pebble Is it ok to start with numpy since I know python but I don't remember every inbu...

you shouldn't try to memorize any functions. you'll naturally remember them if they're useful for you.

vestal moth Jul 9, 2025, 11:04 PM

#

Just curious. Are there any open source Action Recognition CV models in the wild nowadays? I’m aware of SlowFast, X3D, and ActionClip. Some of those libraries seem outdated due to download issues so I’m not sure if that field of CV is still being researched. Unless those are SOTA and difficult to improve. I assume it’s due to other models (object detection and VLMs) being popular at the moment but I wanted to get someone’s view with more experience in this than me.

rich bridge Jul 10, 2025, 4:18 AM

#

im doing a competition for crypto price prediction and were given like 500k rows and 800 features, some of which are anonymized, and i have to predict an arbitrary label related to the return, does anyone have any advice for approaches to try? its my first time doing a competition like this

jaunty helm Jul 10, 2025, 5:09 AM

#

rich bridge im doing a competition for crypto price prediction and were given like 500k rows...

I think I know which you're talking about (DRW Crypto Market?)
in which case, imo honestly there's not much to gain
you can't do meaningful feature engineering as all features are anomalized
top scoring solutions are achieved by reverse-engineering the data shuffling process which probably goes against the spirit of the competition

rich bridge Jul 10, 2025, 5:09 AM

#

jaunty helm I think I know which you're talking about (DRW Crypto Market?) in which case, im...

yeah i was just looking at that and someone even had made their notebook public and a bunch of ppl submitted an 87% pearson

jaunty helm Jul 10, 2025, 5:10 AM

#

like if you actually want to learn what realistic approaches you can use, trying one of the beginner competitions will be way more helpful

rich bridge Jul 10, 2025, 5:10 AM

#

jaunty helm I think I know which you're talking about (DRW Crypto Market?) in which case, im...

but ngl for all these quant firm competitions it seems they have anonymized features

rich bridge Jul 10, 2025, 5:10 AM

#

jaunty helm like if you actually want to learn what *realistic* approaches you can use, tryi...

i see, i mainly wanted to do DRW to put it on my resume as a good signal

#

like placed top x percent

jaunty helm Jul 10, 2025, 5:13 AM

#

rich bridge but ngl for all these quant firm competitions it seems they have anonymized feat...

at which point the only thing you can do other than reverse engineer the dataset is just smacking a bunch of compute at hyperopt tuning an ensemble of trees, which is not interesting
as for resume, I honestly dont know if top x percent in a kaggle competition helps, though maybe #career-advice knows

rich bridge Jul 10, 2025, 5:14 AM

#

alright i’ll just submit this last run that’s going and then wrap this up probably, thank you!

pure yacht Jul 10, 2025, 5:47 AM

#

not sure if this or #algos-and-data-structs is suitable but

i have a dict where the keys are human readable model names of a range of devices, and i've been provided multiple spreadsheets that contain model names and relevant data pertaining to each model

my issue right now is being able to map the names from these spreadsheets to an entry in the dict as the way each model is written will differ slightly (not always a space between its name and generation, casing, + symbol instead of 'Plus', etc). i've tried using difflib but it's not accurate

does anyone know a good solution to make mapping these values consistent and correct? i have a vague idea where i convert each name into a set of individual words, and i perform a difference between each sets to find the one with the least amount but i'm yet to try this out

#

if there's any better solution/implementation that exists i'd love to hear about it

pure pond Jul 10, 2025, 1:09 PM

#

im moving abroad and learning the local language. I wanna do a little thing where I wear a small mic, then at the end of each day review my speech for grammar and vocab improvements, to get to native fluency in a much more efficient way. like having a floating ghost coach following me around and giving a class reviewing the day. would I need to finetune a stt model to pick out specifically my voice? or is that not necessary? once I have that I can just feed the dialogue into any good enough llm

serene scaffold Jul 10, 2025, 4:07 PM

#

pure pond im moving abroad and learning the local language. I wanna do a little thing wher...

you might see if Whisper can do it, but speech recognition models usually assume that the speaker is saying grammatically correct things in a standardized accent. It's unlikely to correctly transcribe speech from a novice in a language.

fair solar Jul 10, 2025, 4:39 PM

#

which one are you using

#

did just adding the timestamp as a feature not help? (or some normalization of it)

glacial root Jul 10, 2025, 4:45 PM

#

serene scaffold No

since it's subword tokenization, would it be fine to go about this method but then just remove the tokens with spaces after?

serene scaffold Jul 10, 2025, 4:46 PM

#

glacial root since it's subword tokenization, would it be fine to go about this method but th...

you're wanting to remove subtokens that aren't the "main" one?

#

or the other way around?

#

each token has one "head subtoken" and zero or more "trailing subtokens". and the trailing subtokens start with ## when you look at each subtoken individually.

glacial root Jul 10, 2025, 4:47 PM

#

serene scaffold you're wanting to remove subtokens that aren't the "main" one?

just removing the ones with spaces, i start off with all the characters separated, and then i group based on the most common pair rules of either bpe or word piece

glacial root Jul 10, 2025, 4:47 PM

#

serene scaffold each token has one "head subtoken" and zero or more "trailing subtokens". and th...

wouldn't that be done in stemming/lemmatization?

serene scaffold Jul 10, 2025, 4:48 PM

#

not necessarily. lemmas can still have subtokens.

glacial root Jul 10, 2025, 4:48 PM

#

i thought with bpe or word piece, it just tokenizes based on the most common pair, and with word piece also accounts for the frequency of the subparts of each pairing

serene scaffold Jul 10, 2025, 4:48 PM

#

it's been several days, so I don't remember what you're trying to do.

glacial root Jul 10, 2025, 4:49 PM

#

serene scaffold it's been several days, so I don't remember what you're trying to do.

sorry, became busy with some other things and couldn't work on this for a few days

#

just trying to implement bpe and word piece tokenizers correctly

#

input of a corpus and outputting an array with the tokens

fair solar Jul 10, 2025, 4:59 PM

#

makes sense, what sort of control are you expecting?

#

diminishing importance of timestamp feature or something (with time)?

#

without needing to recluster completely i mean

#

is that even possible, would be interesting to see

calm cipher Jul 10, 2025, 7:12 PM

#

what clustering method are you currently using?

#

oh ha yes you answered my next question, which was how to do you determine the number of clusters

#

I'm assuming the main issue with this is that each message embedding is computed independently of the others, so you might have some messages that aren't grouped in the correct cluster even if they should be?

#

got it, that's a cool idea

hollow pagoda Jul 11, 2025, 2:35 AM

#

they should add a trading channel to the server

mighty knot Jul 11, 2025, 2:47 AM

#

Hey, I'm currently building my own neural network from scratch only using NP. It's fully functioning with backprop, but I'm wondering if I made a mistake in how my weight matrices are stored.

Currently, they are stored as each column being a different neuron, and each row being a weight. So N X M, where N is rows (weights), and M is columns (neurons). Should it be the other way around?

mystic willow Jul 11, 2025, 3:13 AM

#

mighty knot Hey, I'm currently building my own neural network from scratch only using NP. It...

I doubt it would change anything besides how you program the algorithms to deal with the data. Even then, it's basically just mirrored.

mighty knot Jul 11, 2025, 3:17 AM

#

Yeah just some stuff changes. Like my layer output is calculated as x dot w instead of w dot x. And I don’t have to transpose to matrix ever.

Just wanted to make it standard so it’s easier to debug, but idk what the standard is.

woven prairie Jul 11, 2025, 6:39 AM

#

What is maximum line of code can gpt4o can write

#

I am using gpt 4o api and It's not writing more than 200 lines of code

#

Has anyone faced the same issues

upper bridge Jul 11, 2025, 7:51 AM

#

Okay, so I was thinking of proposing an AI localisation plan for our local government So what I was thinking is that we would use an open source model like Qwen3 and fine-tune using these local languages that are spoken in these areas, and we would have data sets. I have already gathered the data each language has around 200,000 rows containing sentences and question answers poemsstories etc so I was thinking that I would find tune that we have two Nvidia DGX stations with the a 100 GPUs so I will train and find you in the model there and then I was thinking that we just run it in that server but I checked with ChatGPT but it said that we would rather have to get that model into small server like GPUs and run it basically on a server with lots of GPUs and then multiple users would be able to use that so like do you have any expertise here? Could you tell me how to plan this out or go about this?

glacial root Jul 11, 2025, 9:01 AM

#

where can i find a good n-grams dataset that is not more than 1 gb in size?

toxic mortar Jul 11, 2025, 10:30 AM

#

Check out graph structure

tawny bison Jul 11, 2025, 10:41 AM

#

i am not able to plot avg time diff on matplotlib, can anybody help

toxic mortar Jul 11, 2025, 10:52 AM

#

It's very case specific, but glancign on your context you might want to read something like this https://substack.com/home/post/p-164375851

Building proactive AI agents

How we've built agents with dynamic-resolution memory, stateful tools, and more.

rain kelp Jul 11, 2025, 10:56 AM

#

Can someone help me decode binary files in to readable text where I can extract information from? It’s pretty challenging and I have been scratching my head for the last week trying to get a solution. These files come from a game and are stored locally in my pc. When I convert them into hex and then pass a reader on them I get very strange values like level being over 1million or coordinates being also like that. The main problem is probably due to my TLV reader. If someone can help me with this I will be very grateful because I have tried the last week to solve it. Please ping me

toxic mortar Jul 11, 2025, 10:57 AM

#

Okay, crawl walk run

walk would be entity resolution, something you see in incremental KG construction. You can see examples in graph-rag literature

#

uneven pawn Jul 11, 2025, 2:50 PM

#

I was trying to move some files to another folder from my current dir but to overwrite existing files, tried mv and mv -f, didn't work, asked chatgpt for a command to overwrite move

#

It rm -rfed my entire folder 💀

#

"Safest and cleanest way" it said

serene scaffold Jul 11, 2025, 2:53 PM

#

@uneven pawn this channel is for talking about how to implement AI, not about using AI products. But you shouldn't use LLM-generated code that you don't understand for anything that can't be undone.

peak field Jul 11, 2025, 4:16 PM

#

Hey guys - I was hoping you could help me find a dataset of images of sleep deprived people? I dont think that exists but an alternate solution was for dark circles under the eyes or swollen eyes or any other indicators (e.g. drooping faces). How would I go about specifically recognizing those features?

#

Maybe some helpful links? or youtube videos?

#

Technically if I just fed it pictures of a sleep deprived person, it would recognize those features itself but thats not really what im tryna do

#

lol

#

not that bad of an idea actually

#

Hey, just a question tho, the learning wouldnt be that efficient if its like basically the same pictures tho right?

#

that is tru tho

#

Actually i found a research paper so 🤞

#

https://hal.science/hal-01837080/document

#

incase anyone else might need it

#

lol

#

thanks

#

it would actually be really cool if i made my own dataset

#

never even considered this! thanks!

#

man there are so many innovative ways to create new data lol

#

Wait i dont understand how that works, monte carlo simulations are probablistic models for like all possible outcomes tho right? Can you explain how i could use it in this?

#

If i remember correctly

modest vigil Jul 11, 2025, 6:42 PM

#

What version of pandas/numpy do you guys use? I'm having memory leaks with them.

serene scaffold Jul 11, 2025, 6:52 PM

#

modest vigil What version of pandas/numpy do you guys use? I'm having memory leaks with them.

if you're sure the problem is a memory leak, see if that's a known issue with the version that you have (including the bugfix version)

peak field Jul 11, 2025, 6:59 PM

#

OH MY GOD. I dont think i could ever do that

#

But i will try my best

#

Wait so yo track the path of a photon with random simulations but how does that actually apply to the image?

shadow atlas Jul 11, 2025, 7:02 PM

#

Hey. I am having issue with Google Gemini API key.
My Gemini API was working perfectly. After some minutes i kept getting error of it being invalid. Error 400. What can be the possible issue?

shadow atlas Jul 11, 2025, 7:02 PM

#

shadow atlas Hey. I am having issue with Google Gemini API key. My Gemini API was working per...

Please reply to this message so that i can read it later

modest vigil Jul 11, 2025, 7:04 PM

#

shadow atlas Hey. I am having issue with Google Gemini API key. My Gemini API was working per...

Thats a bad HTTP request, just try again, make sure you paste your API key right.

woven prairie Jul 11, 2025, 7:27 PM

#

Anyone can guide me

#

I want to get in gen ai field

#

How can I get , where to learn

#

What you mean by school

#

Start from data science

#

Maths

exotic star Jul 11, 2025, 9:30 PM

#

Which khan academy math courses should i do and in what order?

#

also should i dedicate all of the time to math or learn and do projects with pandas, numpy and matplot working with data

dreamy rapids Jul 12, 2025, 12:03 AM

#

when feeding a perceiverio like network entities

#

is it smart to just feed it all the data you possibly have about the entity or is it still smart to try to filter it down to what's most useful?

#

(please ping me on reply, i check here once every eon basically)

hallow badger Jul 12, 2025, 3:42 AM

#

What does meaning of the tag call snek

serene scaffold Jul 12, 2025, 3:53 AM

#

hallow badger What does meaning of the tag call snek

It's a meme version of saying "snake"

ruby thunder Jul 12, 2025, 7:54 AM

#

Hi, I am thinking of creating a RAG based Perfume Recommender frrom a particular website, in that Can anyone guide what I can use and how I can create also send some resourece which I can use ? I want to make it completely free

ashen blaze Jul 12, 2025, 12:40 PM

#

oh masters of Python data science community.

#

Please Bestow Upon me the knowledge of data science

#

How should I learn Pandas and Numpy?

calm thicket Jul 12, 2025, 12:51 PM

#

read the tutorial on their websites

shadow viper Jul 12, 2025, 12:55 PM

#

reading docs help actually

queen raft Jul 12, 2025, 1:31 PM

#

hi i am a beginner how start with ai and neural networks

exotic star Jul 12, 2025, 1:59 PM

#

calm thicket read the tutorial on their websites

i just started the kaggle course for pandas its a short one

#

is it enough to get u started?

#

i plan on soing projects after that

grand minnow Jul 12, 2025, 2:02 PM

#

ruby thunder Hi, I am thinking of creating a RAG based Perfume Recommender frrom a particular...

https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/

Agentic RAG

Build reliable, stateful AI systems, without giving up control

grand minnow Jul 12, 2025, 2:02 PM

#

ashen blaze How should I learn Pandas and Numpy?

https://www.kaggle.com/learn/pandas

Learn Pandas Tutorials

Solve short hands-on challenges to perfect your data manipulation skills.

edgy niche Jul 12, 2025, 2:40 PM

#

hey guys any roadmap to learn data analysis pls

mossy pond Jul 12, 2025, 2:58 PM

#

Science and Ai ... i code python with help of Ai... with a bit of brain works well 🙂

edgy niche Jul 12, 2025, 2:59 PM

#

huh

#

what is bro YAPPING about

grand minnow Jul 12, 2025, 4:13 PM

#

mossy pond Science and Ai ... i code python with help of Ai... with a bit of brain works we...

https://roadmap.sh/ai-data-scientist

roadmap.sh

AI and Data Scientist Roadmap

Learn to become an AI and Data Scientist using this roadmap. Community driven, articles, resources, guides, interview questions, quizzes for modern AI and Data Science.

clever void Jul 12, 2025, 4:38 PM

#

what do you need differential calculus for

#

the majority of calculus 3 and upward is almost hardly ever seen

mossy pond Jul 12, 2025, 4:39 PM

#

no nooooo ... learn what is possible with Ai-LLM and coding
I think all coders are afraid ... because coding is a language like English .. and now all people around the world are able to code ^^

clever void Jul 12, 2025, 4:39 PM

#

most of the calculus derivations that I see rests in the appendices of papers and not in the main portions

clever void Jul 12, 2025, 4:40 PM

#

mossy pond no nooooo ... learn what is possible with Ai-LLM and coding I think all coders a...

by definition

#

they are not able to code

#

LLMs are able to code

#

humans are just mediators in that case

mossy pond Jul 12, 2025, 5:00 PM

#

clever void humans are just mediators in that case

oc ... but the potential is not that bad ... Iv coded two stable windows tools ... without Ai it where impossible 😄

finite surge Jul 12, 2025, 6:05 PM

#

can i sell ai agents for lots of money

serene scaffold Jul 12, 2025, 6:10 PM

#

finite surge can i sell ai agents for lots of money

No

finite surge Jul 12, 2025, 6:11 PM

#

serene scaffold No

acc

#

i thought u could sell it to companies

serene scaffold Jul 12, 2025, 6:11 PM

#

finite surge i thought u could sell it to companies

Nope

finite surge Jul 12, 2025, 6:11 PM

#

serene scaffold Nope

what can i do for money then

serene scaffold Jul 12, 2025, 6:12 PM

#

finite surge what can i do for money then

get a degree and apply for full-time jobs with companies

finite surge Jul 12, 2025, 6:13 PM

#

serene scaffold get a degree and apply for full-time jobs with companies

im not there yet but what college should i go in manchester

#

after hs

serene scaffold Jul 12, 2025, 6:14 PM

#

finite surge im not there yet but what college should i go in manchester

I don't know.
I wouldn't even try to make money from programming while you're in high school. focus on doing well so that you can get into a good university that aligns with your goals.

finite surge Jul 12, 2025, 6:15 PM

#

serene scaffold I don't know. I wouldn't even try to make money from programming while you're in...

ok i wanted money quick cause i thought that i cant make money of coding in manchester so idk what to do when im older kinda stressing

mossy pond Jul 12, 2025, 6:15 PM

#

finite surge can i sell ai agents for lots of money

you can buy a GPU 500$ and doo all local with Ai 😉

serene scaffold Jul 12, 2025, 6:16 PM

#

finite surge ok i wanted money quick cause i thought that i cant make money of coding in manc...

there are no ways to make a quick buck with programming

finite surge Jul 12, 2025, 6:16 PM

#

serene scaffold I don't know. I wouldn't even try to make money from programming while you're in...

dw im cooking in skl top 10 maths ect

serene scaffold Jul 12, 2025, 6:17 PM

#

mossy pond you can buy a GPU 500$ and doo all local with Ai 😉

what are you suggesting they do with that GPU that will help them make money?

finite surge Jul 12, 2025, 6:18 PM

#

serene scaffold what are you suggesting they do with that GPU that will help them make money?

so i shouldnt learn coding now or?

mossy pond Jul 12, 2025, 6:18 PM

#

serene scaffold there are no ways to make a quick buck with programming

oc ... install LM-studio download Qwen2.5-Coder-14B and start ^^

finite surge Jul 12, 2025, 6:19 PM

#

serene scaffold I don't know. I wouldn't even try to make money from programming while you're in...

thing is i picked triple science not computer science will that affect me

serene scaffold Jul 12, 2025, 6:21 PM

#

mossy pond oc ... install LM-studio download Qwen2.5-Coder-14B and start ^^

you won't make money using that.

serene scaffold Jul 12, 2025, 6:21 PM

#

finite surge so i shouldnt learn coding now or?

if you're doing well in school and have extra time, then sure

mossy pond Jul 12, 2025, 6:24 PM

#

all coder s are afraid of Ai ... because its easy now to get a small part of code working now for every one 😉

serene scaffold Jul 12, 2025, 6:25 PM

#

mossy pond all coder s are afraid of Ai ... because its easy now to get a small part of cod...

I'm not sure what point you're making.
my job involves training LLMs to do different tasks. and I don't just get paid to download LLMs and "start".

mossy pond Jul 12, 2025, 6:30 PM

#

serene scaffold I'm not sure what point you're making. my job involves training LLMs to do diffe...

i see ... all fine ... iam not a coder ... but i know the logic and can use my brain ... so iam very happy that i can use the support of LLM´s
... oc some times I need help for details (eg restart a compiled exe) or hardware differ from my own ... for check out if its run ....

#

you train LLM for coding?

serene scaffold Jul 12, 2025, 6:31 PM

#

yes

mossy pond Jul 12, 2025, 6:34 PM

#

serene scaffold yes

that's great, what do you think is a good programming model - special in python?

serene scaffold Jul 12, 2025, 6:35 PM

#

mossy pond that's great, what do you think is a good programming model - special in python?

I usually use the biggest llama model on my company's llm server.

mossy pond Jul 12, 2025, 6:37 PM

#

serene scaffold I usually use the biggest llama model on my company's llm server.

oh... llama and sounds like pay for it 👀

#

maybe one kill question for your model ... windows, pyinstaller, internal restart of exe file ... I'm glad I have to ask a human for this kind of problem ^^

finite surge Jul 12, 2025, 6:54 PM

#

serene scaffold if you're doing well in school and have extra time, then sure

ty put me down right path cause search it up and everything entrepenuer may be rich but most dont make it and they stress cause not stable pay

serene scaffold Jul 12, 2025, 7:07 PM

#

finite surge ty put me down right path cause search it up and everything entrepenuer may be r...

The market can't enable every entrepreneur to be successful, no matter how hard they work

buoyant vine Jul 12, 2025, 7:08 PM

#

also, I will say, just because the AI can spew out some code doesn't mean you don't need to understand what it is doing

#

I have seen so many cases of people using AI to generate their product and it is naturally filled with issues and security leaks

serene scaffold Jul 12, 2025, 7:18 PM

#

(and they "developers" are powerless to fix it, without developing the skills they would have needed to produce it on their own.)

clever void Jul 12, 2025, 9:51 PM

#

mossy pond oc ... but the potential is not that bad ... Iv coded two stable windows tools ....

you're using the term wrong

#

you didn't code

#

the LLM did it

ripe vortex Jul 13, 2025, 6:37 AM

#

Hi, i'm using langchain to do an RAG application, but the web loader don't have support to supabase storage, so i have to change my file provider??

past meteor Jul 13, 2025, 7:56 PM

#

buoyant vine also, I will say, just because the AI can spew out some code doesn't mean you do...

The biggest issue so fa ris that LLMs seem to like just adding more and more code to solve problems

#

AI slop PRs that introduce 500 LoC for something that can be done in 50

calm cipher Jul 13, 2025, 7:58 PM

#

I noticed an odd quirk of agent generated code is that it likes to be extremely unnecessarily defensive

#

instead of just opening a file, it will implement a loop that attempts to load the file 10 times with elaborate error checking and tracking before giving up

#

as though a missing file might somehow mysteriously appear

past meteor Jul 13, 2025, 7:59 PM

#

Yeah, good observation. It'll validate inputs over and over

#

When it's much better design to validate stuff once at the edge and then to assume it'll be there

calm cipher Jul 13, 2025, 7:59 PM

#

that could actually be dangerous if the code is for a really high-traffic web site or something

#

if there's some error in the system that makes an expected file unavailable, it's probably super taxing for the host computer to deal with, and if lots of incoming requests are all asking for it you could end up overloading the system with thousands of unnecessary open attempts

#

totally nuts

#

also i've seen code interacting with DataFrames constantly checking for the existence of columns before interacting with them, as though the person submitting the work didn't know if those columns were there

tidal bough Jul 13, 2025, 8:13 PM

#

this wasn't my experience; I often see code that's too optimistic

calm cipher Jul 13, 2025, 8:19 PM

#

It probably depends on which LLM you're using, and how you word the prompt, and random chance

glacial hemlock Jul 13, 2025, 9:16 PM

#

good day !

#

somebody has a road map for data science ?

livid cipher Jul 13, 2025, 11:14 PM

#

What you mean road map?

shadow atlas Jul 14, 2025, 6:13 AM

#

How can we monetize mcp servers of our own? Any idea guys?

Please reply to this message so that i can read later.

agile cobalt Jul 14, 2025, 12:32 PM

#

shadow atlas How can we monetize mcp servers of our own? Any idea guys? Please reply to this...

the same way you would monetize any other API

raw sorrel Jul 14, 2025, 12:42 PM

#

Hello guys, i want to ask something, if i want to learn data analyst, can you give me recommendation free course pr resource to learning about data analyst?

please reply to this message if you have recommendation

shadow atlas Jul 14, 2025, 4:01 PM

#

agile cobalt the same way you would monetize any other API

I'm confused here

shadow atlas Jul 14, 2025, 4:01 PM

#

raw sorrel Hello guys, i want to ask something, if i want to learn data analyst, can you gi...

Go with kaggle.com

raven garden Jul 14, 2025, 5:29 PM

#

hi, I hope you are all doing well. I wanted to know if someone could guide me on how to set up physionet (.org). I want to use public healthcare datasets, but its not as straightforward as I thought ( setting up agreements, credentialing, ... ). is someone familiar enough with physionet to guide me on the steps I should take to be able to use any dataset on this plateform ? thanks !

raven garden Jul 14, 2025, 5:34 PM

#

raw sorrel Hello guys, i want to ask something, if i want to learn data analyst, can you gi...

hi, you also have the freecodeCamp 19h free course on youtube

raven garden Jul 14, 2025, 5:41 PM

#

raven garden hi, I hope you are all doing well. I wanted to know if someone could guide me on...

well actually that is indeed straightforward i am just blind, I got it. thanks anyways

raw sorrel Jul 14, 2025, 10:24 PM

#

raven garden hi, you also have the freecodeCamp 19h free course on youtube

what's the name of channel recommendation?

raven garden Jul 14, 2025, 10:25 PM

#

raw sorrel what's the name of channel recommendation?

FreeCodeCamp - you should fall on the video pretty quickly if you search "data analyst full course"

raw sorrel Jul 14, 2025, 10:26 PM

#

ahh i see, thanks bro for your recommendation

raven garden Jul 14, 2025, 10:27 PM

#

no problem!

exotic star Jul 14, 2025, 11:07 PM

#

is learning pandas and doing pandas projects a good ML start?

fallow coyote Jul 14, 2025, 11:08 PM

#

exotic star is learning pandas and doing pandas projects a good ML start?

Learn maths. If you dont understand the maths, you wont be able to understand how to use the tools

exotic star Jul 14, 2025, 11:11 PM

#

fallow coyote Learn maths. If you dont understand the maths, you wont be able to understand ho...

which courses from khan academy should i take and should i only do math and then programming or at the same time

#

not sure what level of math u need for this thats why im not sure which course

fallow coyote Jul 14, 2025, 11:28 PM

#

Make sure you know your high schools maths well if yoy already dont. Learn linear algebra, stats, probability and calculus. You can use khan academy for linear algebra and calculus. I used ISLP for the statisitcs side. Use Python for Data Science by Wes Kineey to learn the basics of the modules youll use for ML and statistical analysis. Start from there, research, practice, get angry, research, practice, bash your head against the wall and repeat

quaint mulch Jul 15, 2025, 12:38 AM

#

exotic star is learning pandas and doing pandas projects a good ML start?

Yes it is a good start. No it is not enough, as said, you also need to learn math, but I still think it is a good start.

exotic star Jul 15, 2025, 1:33 AM

#

got it. thanks a lot

shy otter Jul 15, 2025, 4:14 AM

#

If anyone is Agentic coding, I came across this service recently; gives $100 of free Anthropic credit to use Claude Code.

BIG Disclaimer : This is an affiliate link. Other than gaining credit I am not involved with this in any way.
I will just gain some free credit on their platform, but you also get double credit for signing up ($100 instead of $50) - So win/win.
https://anyrouter.top/register?aff=zb2p

You can strip off my affiliate if you want, but you'll only get $50 credit, not $100.

This is some Chinese site - probably not legitimate, and is probably scraping your code/data. It will probably disappear as quickly as it appeared, so I'm using it to test some ideas in Claude Code - Not for any production code, or sensitive data. It is a great way to 'try before you buy', but if you use this you do so knowing the potential risks

The instructions are translated to English in the attached.

📎 rocket_Quick_Start.txt

arctic wedgeBOT Jul 15, 2025, 4:14 AM

#

shy otter If anyone is Agentic coding, I came across this service recently; gives $100 of ...

Click here to see this code in our pastebin.

trim nymph Jul 15, 2025, 4:26 AM

#

shy otter If anyone is Agentic coding, I came across this service recently; gives $100 of ...

I'll use it

trim nymph Jul 15, 2025, 4:27 AM

#

exotic star is learning pandas and doing pandas projects a good ML start?

euhhh

#

just start with a project you would like to complete with ml related solution

gaunt rover Jul 15, 2025, 5:48 AM

#

Hey everyone! I'm Fahad. I started learning Python a few weeks ago for Data science AI & automation—still super new, but trying to stay consistent. I'm looking for a coding buddy or small group to learn and maybe build some mini-projects with. If anyone's interested, feel free to DM me or reply here! 🙂

tawny bison Jul 15, 2025, 6:20 AM

#

do i need to master numpy pandas and matplotlib as i just know the basics i mean the main commands and have to do seaborn and plotly

#

also can anybody suggest some seaborn and plotly resources

serene scaffold Jul 15, 2025, 10:08 AM

#

tawny bison do i need to master numpy pandas and matplotlib as i just know the basics i mean...

you don't need to "master" any of them as long as you understand them well enough to read the docs any time you need to.

#

I created every table in this paper in a total of probably a few thousand lines of pandas code. and I still look at the docs sometimes. https://www.sciencedirect.com/science/article/pii/S1532046421002999?via%3Dihub

quartz zodiac Jul 15, 2025, 10:19 AM

#

I have built a model classifier on the wine prediction data set in Kaggle. I want to make custom predictions using that model on my React website. Do I need to dump the entire model or just the model parameters into a file and load that model(from the file) on a flask server and send the input wine data by a request to the server, make the prediction and then return the answer. How would it work?

tawny bison Jul 15, 2025, 10:21 AM

#

serene scaffold you don't need to "master" any of them as long as you understand them well enoug...

oh okay

tawny bison Jul 15, 2025, 10:21 AM

#

serene scaffold I created every table in this paper in a total of probably a few thousand lines ...

ok thanks

#

so you just read the whole documentation @serene scaffold

serene scaffold Jul 15, 2025, 10:22 AM

#

tawny bison so you just read the whole documentation <@253696366952316929>

No, only the parts that are relevant.

tawny bison Jul 15, 2025, 10:22 AM

#

serene scaffold No, only the parts that are relevant.

oh okay

#

dm'ed you the way i'm doin

serene scaffold Jul 15, 2025, 10:23 AM

#

tawny bison dm'ed you the way i'm doin

I don't respond to those--you should post the message here.

serene scaffold Jul 15, 2025, 10:23 AM

#

quartz zodiac I have built a model classifier on the wine prediction data set in Kaggle. I wan...

do you know the difference between model parameters and the model itself?

tawny bison Jul 15, 2025, 10:23 AM

#

serene scaffold I don't respond to those--you should post the message here.

not able to upload any file

quartz zodiac Jul 15, 2025, 10:23 AM

#

I think i dont

serene scaffold Jul 15, 2025, 10:26 AM

#

quartz zodiac I think i dont

a model is the parameters, and the computation graph. and the computation graph is just "how you use the parameters"

#

so you can't really save the parameters as a separate thing from the model, in a sense that is useful.

quartz zodiac Jul 15, 2025, 10:29 AM

#

So if I want instant results I should save the entire weight along with the models, the training parameters, (ie the complete model itself)

serene scaffold Jul 15, 2025, 10:30 AM

#

quartz zodiac So if I want instant results I should save the entire weight along with the mode...

by training parameters, are you talking about the hyper parameters? those aren't part of the model.

quartz zodiac Jul 15, 2025, 10:32 AM

#

Yes

#

I get it now... actually

#

hyper parameters are pre training

#

and training parameters are during training that are calculated

serene scaffold Jul 15, 2025, 10:38 AM

#

quartz zodiac hyper parameters are pre training

the hyper parameters determine how the training is conducted

zealous girder Jul 15, 2025, 11:53 AM

#

I've entered into a kaggle competition, and i'm doing all the training and stuff on my machine locally and treating it as a serious project. What sort of project structure (directory structure) should I have?

peak field Jul 15, 2025, 1:01 PM

#

Hey guys

#

Any datasets for whether a person is walking or not

#

Basically stationary vs moving?

#

And using smartphone if possible but really just an accelerometer

#

(gyroscope if necessary)

fair solar Jul 15, 2025, 1:10 PM

#

which ones are you using btw

warped notch Jul 15, 2025, 1:13 PM

#

any recommendation on which book (any resource) to learn data mining?

fair solar Jul 15, 2025, 1:18 PM

#

oh makes sense. so wait, claude recalling stuff well is turning out to be a problem?

#

I reread this but still didn't understand it well enough ig

#

ah, so claude is just being fed with everything it needs and is only used for the final step i'm guessing

#

so how does it's temporal awareness affect you

#

makes sense

#

yeah, that makes sense

somber willow Jul 15, 2025, 2:30 PM

#

guys do you know where i can learn pytorch

odd meteor Jul 15, 2025, 3:13 PM

#

somber willow guys do you know where i can learn pytorch

https://youtu.be/Z_ikDlimN6A?si=CcfJc3CILU-FEpii

YouTube

Daniel Bourke

Learn PyTorch for deep learning in a day. Literally.

Welcome to the most beginner-friendly place on the internet to learn PyTorch for deep learning.

All code on GitHub - https://dbourke.link/pt-github
Ask a question - https://dbourke.link/pt-github-discussions
Read the course materials online - https://learnpytorch.io
Sign up for the full course on Zero to Mastery (20+ hours more video) - https:/...

▶ Play video

#

Then, if video content isn't your thing, check the tutorial section on PyTorch website

toxic sigil Jul 15, 2025, 3:27 PM

#

hi guys
as a machine learning beginner trying to make stock market price predictions or any project in general
how do i learn libraries and know how to use them ony my own and come up with ideas to use them for a project
i dont want to copy youtube videos etc. i want to use them on my own so i wanted to know if anyone has experience tell me the workflow or learn method

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import root_mean_squared_error

jaunty helm Jul 15, 2025, 3:53 PM

#

toxic sigil hi guys as a machine learning beginner trying to make stock market price predict...

try kaggle's beginner competitions
they usually come accompanied by a course that'll guide you through some basics

toxic sigil Jul 15, 2025, 3:55 PM

#

jaunty helm try kaggle's beginner competitions they usually come accompanied by a course tha...

i checked it out but does that recquire any previous knowledge of libraries or mathematics?

drowsy mesa Jul 15, 2025, 4:47 PM

#

Hello everyone i am new developer how can I use python for security purpose can any say it....

odd meteor Jul 15, 2025, 5:27 PM

#

drowsy mesa Hello everyone i am new developer how can I use python for security purpose can ...

Suggestions on how to use python for "security purpose" can be interpreted in various ways depending on who you asked and the domain of that person. Can you be more elaborate on what the end goal would be?

drowsy mesa Jul 15, 2025, 5:29 PM

#

In ai agent for security stored file

#

If I am uploading the file in backend and the ai reads It.. And I want to make sure the data should be secure

tranquil jasper Jul 15, 2025, 5:42 PM

#

hi
for someone to get into computer vision, what do we need to learn beforehand?

agile cobalt Jul 15, 2025, 6:00 PM

#

drowsy mesa If I am uploading the file in backend and the ai reads It.. And I want to make s...

"security" is not one thing you do, it's everything you do - and more importantly, everything you don't do

things are secure as long as you do not allow for anyone to access them.

every single time you allow for something to access it, you have to take into consideration who/what you're giving access to and what they can do with that access

fallow coyote Jul 15, 2025, 6:03 PM

#

anyone recommend any good commodity apis with a decent free plan? need to extract price data of gold, silver, platinum and copper

drowsy mesa Jul 15, 2025, 6:04 PM

#

What package can I use

agile cobalt Jul 15, 2025, 6:05 PM

#

that is not something a package can solve - no matter which tool you use, if you misconfigure things you'll create a security vulnerability

drowsy mesa Jul 15, 2025, 6:10 PM

#

We can encryption right

drowsy mesa Jul 15, 2025, 6:13 PM

#

agile cobalt that is not something a package can solve - no matter which tool you use, if you...

Just say what are ways to for security like ways

sharp crow Jul 15, 2025, 7:15 PM

#

pithink

fresh sluice Jul 15, 2025, 7:15 PM

#

Anyone who is working in AI field here?

serene scaffold Jul 15, 2025, 7:29 PM

#

fresh sluice Anyone who is working in AI field here?

what would you ask them?

fresh sluice Jul 15, 2025, 7:38 PM

#

serene scaffold what would you ask them?

I actually want to know what projects should I make which can make me standout as a 4th year in AIML domain

serene scaffold Jul 15, 2025, 7:39 PM

#

fresh sluice I actually want to know what projects should I make which can make me standout a...

don't do RAG, because everyone is doing RAG.

fresh sluice Jul 15, 2025, 7:49 PM

#

serene scaffold don't do RAG, because everyone is doing RAG.

True, all that AI Agents.... So what else am I supposed to do?

neon owl Jul 15, 2025, 9:22 PM

#

Anyone working in the Ai/Ml field , i am in my second year going to 3rd year which is year of internship my resume isnt that strong i am studying informatics but i want internship for Ai/ML as in international internship

Any recommendations for how to standout ?

serene scaffold Jul 15, 2025, 9:41 PM

#

neon owl Anyone working in the Ai/Ml field , i am in my second year going to 3rd year whi...

Are there research professors in your department?

rocky swan Jul 16, 2025, 12:28 AM

#

Is there anyway I could fine tune an LLM with only outputs as it's training data or do I need to make an LLM generate the input data and add it to the dataset or something.

serene scaffold Jul 16, 2025, 12:31 AM

#

rocky swan Is there anyway I could fine tune an LLM with only outputs as it's training data...

When you fine tune an (interactive) LLM, you're training it to respond a certain way to a given input. So you need pairs of inputs (an example input prompt) and outputs (the desired response to that input)

calm cipher Jul 16, 2025, 1:20 AM

#

This is true for question answering models but a lot of LLMs are just predicting the next token in a sequence, which you can train with only a giant pile of textual data

#

So in that case the input would be a document up to word t, and the output is word t+1

#

If that's what you want to do, and say you're starting with a pre trained model on huggingface, you would fine tune the casual variant of the model

serene scaffold Jul 16, 2025, 1:37 AM

#

calm cipher This is true for question answering models but a lot of LLMs are just predicting...

You probably already know this, but even interactive LLMs are just (repeatedly) predicting the next token. It's just that during inference, they keep generating until they get to an end-of-turn token.

stray void Jul 16, 2025, 3:12 AM

#

Are there libraries available that implement geodesic walking on an STL surface, given some travel direction? I know there are libraries that can calculate the geodesic distance between 2 points, but what if you don't know the endpoint in advance?

drowsy mesa Jul 16, 2025, 3:19 AM

#

Instead of rag what can we use ?

quaint mulch Jul 16, 2025, 3:48 AM

#

fresh sluice I actually want to know what projects should I make which can make me standout a...

publish here.

quaint mulch Jul 16, 2025, 3:50 AM

#

stray void Are there libraries available that implement geodesic walking on an STL surface,...

people in #game-development might have better answers because they work with 3d

gaunt apex Jul 16, 2025, 3:55 AM

#

Hi!
I'm training an OCR model (CRNN/Easter2 architectures) and getting inconsistent results on Kaggle despite using:

Same dataset and preprocessing
Same code/hyperparameters
Same random seeds
Previously got good CER performance, now stuck at 70%+ with repetitive predictions

The model gets stuck outputting repetitive character patterns instead of learning to read text properly, even with different seeds and learning rates.

Has anyone experienced:

Different OCR training behavior between Kaggle sessions?
Model collapse (repetitive predictions) with CRNN/Easter2 on P100s?
Memory constraints affecting OCR convergence?
Different PyTorch/CUDA behavior on Kaggle vs other platforms?

Could Kaggle's P100 GPU environment be causing this? Any insights on GPU-specific OCR training issues would be helpful!

Hardware: Kaggle P100
Framework: PyTorch
Models: CRNN, Easter2
Task: Text recognition

serene scaffold Jul 16, 2025, 4:03 AM

#

drowsy mesa Instead of rag what can we use ?

to do what?

fresh sluice Jul 16, 2025, 4:39 AM

#

quaint mulch publish here.

Where is it actually? This picture?

drowsy mesa Jul 16, 2025, 4:40 AM

#

serene scaffold to do *what*?

To read documents by ai agent

quaint mulch Jul 16, 2025, 5:35 AM

#

fresh sluice Where is it actually? This picture?

I'm not sure what you mean? the picture is just screnshots from google scholars

fresh sluice Jul 16, 2025, 5:48 AM

#

quaint mulch I'm not sure what you mean? the picture is just screnshots from google scholars

Oh understood, are these some research papers?

opaque condor Jul 16, 2025, 6:22 AM

#

could anyone look over this code and tell me what im doing wrong i have done this 3 times but its better i have a person who secilizes in convolutional networks:
https://paste.pythondiscord.com/C52A

sharp crow Jul 16, 2025, 7:25 AM

#

Oi lads I have a question
I am working on a simple regression problem and my scores are
TRAINING SCORE-0.999999
TESTING SCORE-0.999999
R2_SCORE- 0.999999
MSE- 4.52
RMSE- 2.13
These scores look too perfect to me

#

Any one who can help me

#

pithink

sharp crow Jul 16, 2025, 7:59 AM

#

Nvm found it

#

joe_salute

quaint mulch Jul 16, 2025, 8:17 AM

#

fresh sluice Oh understood, are these some research papers?

Kinda, those are academic conferences. Research papers are published in many places, including those places. I'm trying to say is that publishing in top AI conferences would make you really standout. Very difficult but I have seen people done it. This is one of the most "standout" thing that you can do.

fresh sluice Jul 16, 2025, 8:19 AM

#

quaint mulch Kinda, those are academic conferences. Research papers are published in many pla...

Understandable... I will look into those, but I am more of a technical person, uk building stuffs so that's something I am still looking for

quaint mulch Jul 16, 2025, 8:19 AM

#

what's "uk building" ?

grand minnow Jul 16, 2025, 8:21 AM

#

quaint mulch what's "uk building" ?

uk = you know

quaint mulch Jul 16, 2025, 8:23 AM

#

writing papers usually also involve building stuff.
Another alternative would be to win competitions in places like kaggle / aicrowd

toxic pilot Jul 16, 2025, 11:43 AM

#

sharp crow Oi lads I have a question I am working on a simple regression problem and my sc...

Check for data leaking