#data-science-and-ml


opaque condor
#

A medical AI was taking photos of a human eye to see if the patient had a heart condition or something worse that couldn't be diagnosed early on

Somehow the AI found the biological sex of each patient without ever being trained on it

serene scaffold
opaque condor
#

I can't remember what type of medical model it was; I think it was 2015

serene scaffold
#

Okay

#

I guess we'll never know

lucid wren
#

I am going to pursue Core CSE, what should I do?

hearty token
#

Is there a good book on fine-tuning, for things like understanding hyperparameters and functional analysis of the different curves like loss grad norms?

rain kelp
#

hello everyone i need someone to help me solve a problem that i am unable to tackle. i have created a forecasting model on a train df. i need to test the submission on a test df. the difference between these 2 is that the test df does NOT have the 'sales' column but the train df does. the test df starts on the day the train stops and goes on for 15 extra days. my question is:
how do i make the lag values for the first 7 days of the test df? i need to take the values of the first day and assign it the 'sales' value of the train df but i am unable to do it. Also there are many values for each day as sales are made for each store and family of a certain shop group.

could someone help me make this function?

#

the model i trained on the train df has shown that lag values are very important features in predicting sales so i need to make them as well for the test data
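One common way to do this is to stack train and test, sort by date within each store/family group, and shift `sales`. A minimal sketch, assuming the columns are called `date`, `store_nbr`, `family` and `sales` (only `sales` is named in the question; the other names are guesses) and that every group has exactly one row per day:

```python
import pandas as pd

def add_lag_features(train, test, lags=(7, 14), keys=("store_nbr", "family")):
    """Return a copy of `test` with sales lag columns looked up from `train`."""
    train = train.assign(_is_test=False)
    test = test.assign(_is_test=True, sales=float("nan"))  # test has no sales yet
    full = pd.concat([train, test], ignore_index=True)
    full = full.sort_values([*keys, "date"])
    for lag in lags:
        # shift within each (store, family) series so groups never mix
        full[f"sales_lag_{lag}"] = full.groupby(list(keys))["sales"].shift(lag)
    out = full[full["_is_test"]].drop(columns=["_is_test", "sales"])
    return out.sort_values(["date", *keys]).reset_index(drop=True)
```

Note that a 7-day lag can only be filled for the first 7 test days, because after that it would need sales that are not known yet. For a 15-day horizon you would either use lags at least as long as the horizon or predict day by day and feed the predictions back in as lags.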

jaunty helm
agile cobalt
#

"similar pricing"?
Haiku is 0.80/4.00 USD per million input/output tokens, while Gemini Flash is 0.30/2.50?

unless Anthropic's tokenizer is much more efficient that feels like a giant difference to me?

wet dome
#

hi guys, im trying to implement my own linear regression class. One issue im running into is, for some datasets where values can go up to thousands for example my learning rate has to be really small, e.g. 1e-8. Otherwise my parameters blow up to infinity

#

does anyone know why

past meteor
#

Do you have evals or just vibe check?

long ivy
wet dome
#

is it a bad thing i have to use a tiny learning rate value? @long ivy

#

does it just mean it will take longer to find the optimal parameters? as smaller step size across the surface
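A likely reason (not confirmed in the thread): with a squared-error loss the gradient grows with the square of the feature values, so features in the thousands force a step size around a millionth of what small features would allow. Standardizing the features first usually lets an ordinary learning rate work. A minimal sketch with NumPy; the function name and defaults are made up for illustration:

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on standardized features; returns weights in original units."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma              # every column now has mean 0, std 1
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        err = Xs @ w + b - y
        w -= lr * (2 / n) * (Xs.T @ err)
        b -= lr * (2 / n) * err.sum()
    # Convert the weights back so they apply to the unscaled features
    w_orig = w / sigma
    b_orig = b - (mu / sigma) @ w
    return w_orig, b_orig
```

So a tiny learning rate is not wrong in itself, it just converges slowly; scaling the inputs is the usual fix.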

long ivy
past meteor
#

prompting 🤢

#

Cached tokens to reduce cost?

#

idk how Anthropic is priced in detail but input tokens <<<< output tokens

#

Your responses seem quite short

#

Is it worth optimizing to this extent?

#

Even so, that's all on the input side?

#

What are you using again?

#

Haiku?

#

For the notification stuff you can also switch to the batch API

#

You send a request and they do it on a best effort basis in an hour or so

#

And then you can for instance poll every N minutes to see the status

#

Reduces cost by 50 %

#

(disclaimer: I mostly use OpenAI. All I've used Claude for is to provide support for my mini lib)

#

Looking at the docs, like OpenAI, they have a batch API

#

The Message Batches API is a powerful, cost-effective way to asynchronously process large volumes of Messages requests. This approach is well-suited to tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput.
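To put the numbers from this thread side by side, here is a small cost calculator. The per-million prices are the ones quoted above and may be out of date; the token counts are a made-up workload.

```python
def cost_usd(input_tokens, output_tokens, in_price, out_price, discount=0.0):
    """Prices are USD per million tokens; discount is a fraction (0.5 = 50% off)."""
    full = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return full * (1 - discount)

# Hypothetical daily workload: 2M input tokens, 200k output tokens
haiku = cost_usd(2_000_000, 200_000, 0.80, 4.00)
haiku_batch = cost_usd(2_000_000, 200_000, 0.80, 4.00, discount=0.5)
flash = cost_usd(2_000_000, 200_000, 0.30, 2.50)
print(f"Haiku: {haiku:.2f}  Haiku via batch: {haiku_batch:.2f}  Flash: {flash:.2f}")
```

For this workload the batch discount brings the two models close to each other, but only for requests that can wait.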

past meteor
#

How much does it cost you per day so far?

past meteor
#

As in, reasoning models?

agile cobalt
#

numerical scores like
```
  • Mood: 7/10 (focused, slightly tired)
  • Energy: 6/10 (afternoon dip)
  • Stress: 3/10 (manageable workload)
```
past meteor
#

In my automated evals I do low/med/high instead of numbers, fire it N times concurrently and take the mode

agile cobalt
#

Enum-like values can work better/more reliably/be more robust, like energy: LOW | DIMINISHING | MEDIUM | BUILDING UP | HIGH or mood: HAPPY | SAD | ANGRY
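The two ideas above (enum-like labels, and firing the prompt N times and taking the mode) can be sketched together. The `Level` enum and the helper names are illustrative, and the replies are assumed to be plain strings already collected from the model:

```python
from collections import Counter
from enum import Enum

class Level(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"

def parse_level(raw):
    """Map a raw model reply to a Level, or None if it is not a valid label."""
    try:
        return Level(raw.strip().upper())
    except ValueError:
        return None

def majority_label(raw_replies):
    """Return the most common valid label among N replies, or None if none are valid."""
    valid = [lvl for lvl in map(parse_level, raw_replies) if lvl is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]
```

Invalid replies are simply dropped before voting, which is one reason a closed label set tends to be easier to handle than free-form numbers.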

past meteor
#

so you watch south park during your break

hollow cobalt
#

Hello all, I'm in process of creating a small LLM and I'm running into small problems as far as the performance of the models. I've pretrained and finetuned a few models, and I was curious if anyone in this group chat is experienced with creating LLM's. I would love to ask a few questions about how I can improve the model. Please let me know!

agile cobalt
#

iirc even "somewhat reads like English" requires over a day worth of training on a data center grade GPU

hollow cobalt
#

I don't think that understanding context and the structure of conversation comes down to compute.

#

In essence a large corpus of data and parameters, will take the model so far. But for adequate performance more focus on the preparation is required.

turbid field
#

hello i am trying to get annotated pictures of vehicles using fiftyone, however when i upload them to roboflow the annotated images include other classes instead of only 'bicycle': they also show annotated boxes for license plate, car, person, etc. also how do i change the label name to bicycle or its original name, since the class name in roboflow is -m-0199g (i think this is the name for the bicycle)

thank you sorry i am new

import fiftyone as fo
import fiftyone.zoo as foz

# Parameters
CLASS_NAME = "Bicycle"
NUM_IMAGES = 10
SAVE_DIR = "./bicycle_dataset"  # change if needed

# Load 10 images with only 'Bicycle' bounding boxes from Open Images V7 validation set
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    split="validation",
    label_types=["detections"],
    classes=[CLASS_NAME],
    max_samples=NUM_IMAGES,
    dataset_dir=SAVE_DIR,
    shuffle=True
)

# Optional: launch FiftyOne app to preview
session = fo.launch_app(dataset)
print(f"Dataset loaded and saved in: {SAVE_DIR}")
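A possible fix, sketched under two assumptions. First, `classes=[...]` on its own selects images that contain a bicycle but still loads every box on those images; the zoo loader's `only_matching=True` argument asks for only the matching labels. Second, `-m-0199g` looks like the Open Images machine ID `/m/0199g` (Bicycle) with the slashes replaced, so a small lookup can restore the readable name. The helper names below are made up.

```python
# Open Images identifies classes by machine IDs (MIDs); "/m/0199g" is the MID for Bicycle.
MID_TO_NAME = {"/m/0199g": "Bicycle"}

def readable_label(raw):
    """Turn a label such as '-m-0199g' or '/m/0199g' into a readable class name."""
    mid = raw.replace("-", "/")
    return MID_TO_NAME.get(mid, raw)  # unknown labels are returned unchanged

def load_only_bicycles(num_images=10, dataset_dir="./bicycle_dataset"):
    # Imported here so readable_label() can be used without FiftyOne installed
    import fiftyone.zoo as foz
    return foz.load_zoo_dataset(
        "open-images-v7",
        split="validation",
        label_types=["detections"],
        classes=["Bicycle"],
        only_matching=True,  # keep only the Bicycle boxes on each image
        max_samples=num_images,
        dataset_dir=dataset_dir,
        shuffle=True,
    )
```

How the labels end up as MIDs in Roboflow depends on the export step, which is not shown in the question, so treat the renaming helper as a workaround rather than the root fix.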

gentle stone
#

I like "AI for Everyone" by Andrew Ng. It covers the basics of AI, including ML, DL, types of AI and much more, and is suitable for beginners, but I don't like the slide presentation design. Also his tone of voice sometimes makes me sleepy.

gentle stone
#

Is there some kind of test about data science/ai from the company for those who don't have a degree? Or does the company need a certificate or portfolio?

opaque falcon
hollow cobalt
gentle stone
hollow cobalt
gentle stone
#

Ai started to take over a lot bro, unemployed people increased every time in my country, How about your country?

hollow cobalt
#

I haven't seen it with my own eyes, might look though

opaque falcon
opaque falcon
opaque falcon
# gentle stone Ai started to take over a lot bro, unemployed people increased every time in my ...

The same. I think it's a worldwide thing.

Some interesting observations from experience:

  • dedicating time to learn a highly paid skill pays off eventually
  • learning in community is better than doing it alone. It's more motivating.
  • working on practical projects is a great way to really remember and apply the concepts
  • doing it with a team of people keeps momentum and motivation up.
  • crowd sourcing notes is a plus. Everyone wins, especially when there is a question and it can be answered
#

I am starting a zero to hero study group. Is anyone interested in joining? I will be adding and crowd sourcing notes on a joint repo using Obsidian, Obsidian Publish and Jupyter notebooks.

hollow cobalt
pliant swift
#

I'm vykt on github

opaque falcon
opaque falcon
pliant swift
#

I know very little math/data science, but would like to take an interest in it

hollow cobalt
#

You should make the group on discord, and I'll join.

opaque falcon
pliant swift
#

Looks really nice man. I'd like to learn some statistics too, and know enough linear algebra for very basic 3d stuff.

opaque falcon
pliant swift
#

I learned a little once to do 2d rotations, but thats it

opaque falcon
gentle stone
opaque falcon
opaque falcon
pliant swift
#

Where can I find it?

#

ah, nice

opaque falcon
#

check email invite

pliant swift
#

Just joined, thanks!

opaque falcon
pliant swift
#

It's a bit specific, but anything that can be applied to binary analysis

#

but i dont know enough to know what can be applied to binary analysis, so really im up for anything!

#

learning statistics will find its uses even in unexpected places im sure

hollow cobalt
#

My user name: 1stest

gentle stone
#

Let's connect!

short moth
#

anyone know why i am getting an extra column (filled with NaNs) when I concatenate dataframes?

agile cobalt
short moth
#

ok

#
import os

import pandas as pd

folder = r'Documents/ARINC-429 Datasets'
files = os.listdir(folder)

frames = []
for file in files:
    df = pd.read_csv(os.path.join(folder, file))
    # Tag every row with the name of the CSV it came from
    frames.append(df.assign(BUS_NAME=file.replace('.csv', '')))

# Columns missing from a file are filled with NaN by concat
full_df = pd.concat(frames, ignore_index=True)
full_df
#

so I have a list of csv files. I want to concatenate them into 1 dataframe, but I also want to add an extra column called BUS_NAME with the name of the respective CSV for organizational purposes

#

so I tried that and it seems to work. only that it created a random column called BUS NAME which I never said to create

#

it could be that one of the csv files already had that column

#

let me check. I may have found my error

#

yes that was the issue
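For next time, a quick way to spot this before concatenating is to list the columns that are not shared by every file, since `pd.concat` fills those with NaN for the other files. A small sketch; `odd_columns` is a made-up helper name:

```python
import pandas as pd

def odd_columns(frames, names):
    """Map each frame's name to the columns it has that are not shared by every frame."""
    common = set.intersection(*(set(df.columns) for df in frames))
    return {
        name: sorted(set(df.columns) - common)
        for name, df in zip(names, frames)
        if set(df.columns) - common
    }

a = pd.DataFrame({"x": [1]})
b = pd.DataFrame({"x": [2], "BUS NAME": ["b"]})
print(odd_columns([a, b], ["a.csv", "b.csv"]))
```

An empty dict means all files have the same columns.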

broken shadow
serene scaffold
# broken shadow help

what is the value for "path"? please give the answer as text rather than a screenshot

broken shadow
# serene scaffold what is the value for "path"? please give the answer as text rather than a scree...

sry, kinda new to coding

# Download latest version
path = kagglehub.dataset_download("andrewmvd/medical-mnist")

from kagglehub import KaggleDatasetAdapter

train_data = kagglehub.datasets.dataset_load(
    adapter=KaggleDatasetAdapter.PANDAS,
    handle="andrewmvd/medical-mnist",
    path=path
)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipython-input-21-3633857365.py in <cell line: 0>()
      1 from kagglehub import KaggleDatasetAdapter
      2 
----> 3 train_data = kagglehub.datasets.dataset_load(
      4     adapter=KaggleDatasetAdapter.PANDAS,
      5     handle="andrewmvd/medical-mnist",

2 frames
/usr/local/lib/python3.11/dist-packages/kagglehub/pandas_datasets.py in _validate_read_function(file_extension, sql_query)
    106             f"Supported file extensions are: {', '.join(SUPPORTED_READ_FUNCTIONS_BY_EXTENSION.keys())}"
    107         )
--> 108         raise ValueError(extension_error_message) from None
    109 
    110     read_function = SUPPORTED_READ_FUNCTIONS_BY_EXTENSION[file_extension]

ValueError: Unsupported file extension: ''. Supported file extensions are: .csv, .tsv, .json, .jsonl, .xml, .parquet, .feather, .sqlite, .sqlite3, .db, .db3, .s3db, .dl3, .xls, .xlsx, .xlsm, .xlsb, .odf, .ods, .odt
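The error suggests the `path` argument was the download folder (no file extension) rather than one file inside the dataset, and the pandas adapter only reads the tabular formats it lists. If the dataset is a folder of images with one sub-folder per class (an assumption about this dataset's layout), one option is to skip the adapter and index the downloaded files directly. `index_image_folder` is a hypothetical helper using only the standard library:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}

def index_image_folder(root):
    """Return sorted (file path, class label) pairs, taking the label from the parent folder name."""
    root = Path(root)
    return sorted(
        (str(p), p.parent.name)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

With the `path` returned by `kagglehub.dataset_download(...)`, something like `pd.DataFrame(index_image_folder(path), columns=["filepath", "label"])` would then give a table of image paths and labels to load from.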
serene scaffold
broken shadow
serene scaffold
wet dome
#

I'm trying to see how good my linear regression class is by measuring its performance. I calculated the mean squared error (MSE), root mean squared error (RMSE) and the R squared value (R2)

#

Are these the metrics I should focus on to measure performance or is there something else I could do too?

#

Just trying to figure out how to see how good my model is
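Those three are a reasonable start. Mean absolute error (MAE) is often reported next to them because it is less sensitive to a few large misses, and the numbers mean most when computed on data the model was not trained on. A plain-Python sketch of all four; the function name is made up:

```python
import math

def regression_report(y_true, y_pred):
    """MSE, RMSE, MAE and R2 for two equal-length lists of numbers."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(sq_err) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1 - sum(sq_err) / ss_tot,  # 0 means no better than predicting the mean
    }
```

Comparing the result against scikit-learn's `LinearRegression` on the same split is another quick sanity check for a hand-written class.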

prisma forge
#

Hello, is this where I can ask questions about getting help organizing a report by the data inside of it?

opaque falcon
opaque falcon
thick heron
#

Guys

#

I want to improve my little experiment i did

#

Help?

#

I posted it on reddit

glacial root
#

hey, let's say i wanted to create a small text processing library similar to nltk, for each module, for example the tokenization module, would it be better to implement it as a class or as a group of functions

#

i thought it wouldn't make a difference but not sure, just wanted to know which is better for library design

gentle stone
opaque falcon
gentle stone
opaque falcon
#

Maybe something that researches alternative asset classes in emerging markets with good fundamentals for investment

#

Earns dividends or has substantial capital appreciation. How to sift through these types of equities.

gentle stone
#

If we are not able to make big projects, I think we can start from simple money management apps/website first

wet dome
#

After normalising your data say via min-max normalisation, is it normal to get different MSE, RMSE and R squared values?
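It depends on what was normalised. If the target was min-max scaled too, MSE and RMSE are measured in the new units and will change, while R squared should stay the same because it is a ratio. A small check with made-up numbers, assuming predictions are rescaled with the same min and max as the target:

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def min_max(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]

y_true = [100.0, 200.0, 300.0, 500.0]
y_pred = [110.0, 190.0, 320.0, 480.0]
lo, hi = min(y_true), max(y_true)
y_true_s = min_max(y_true, lo, hi)
y_pred_s = min_max(y_pred, lo, hi)

print(mse(y_true, y_pred), mse(y_true_s, y_pred_s))  # differ by a factor of (hi - lo) ** 2
print(r2(y_true, y_pred), r2(y_true_s, y_pred_s))    # same value up to rounding
```

If only the features were scaled, a fully converged least-squares fit should give the same predictions; with gradient descent the metrics can still differ simply because the scaled problem converges further in the same number of steps.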

fleet token
#

Hi, i hope this fits in this channel.
For my homework i have to estimate the quality of a respiration signal calculated from an ECG/PPG signal in reference to a signal recorded with a respiration belt. Those signals are basically very noisy sine/cosine waves, so our professor gave us the hint to do it with the cross-correlation method, extracting the highest coefficient and eliminating the phase shift through that. Seems straightforward and i already tried it with the scipy signal implementation of correlation (explained here: https://www.scicoding.com/cross-correlation-in-python-3-popular-packages/), but i somehow can't get the proper alignment of the signals to work.
Any help on this issue would be very much appreciated, bc i'm just not the brightest when it comes to statistics^^
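A frequent stumbling block with this method is reading the lag off the wrong index of the "full" correlation output. A minimal NumPy sketch, assuming both signals have already been resampled to the same sampling rate and have the same length (`best_lag` is a made-up name):

```python
import numpy as np

def best_lag(reference, estimate):
    """Lag (in samples) by which `estimate` trails `reference`, plus the peak normalized correlation."""
    a = np.asarray(reference, dtype=float)
    b = np.asarray(estimate, dtype=float)
    # z-score both signals so the peak is comparable to a correlation coefficient
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    xcorr = np.correlate(b, a, mode="full") / len(a)
    # index 0 of the "full" output corresponds to lag -(len(a) - 1)
    lags = np.arange(-(len(a) - 1), len(b))
    k = int(np.argmax(xcorr))
    return int(lags[k]), float(xcorr[k])

t = np.arange(500)
belt = np.sin(2 * np.pi * t / 50)
derived = np.roll(belt, 5)          # same wave, delayed by 5 samples
lag, peak = best_lag(belt, derived)
aligned = np.roll(derived, -lag)    # undo the delay
```

A positive lag means the estimate is delayed relative to the reference. For real recordings you would trim the ends rather than `np.roll`, since rolling wraps samples around, and the peak value can serve as the quality score.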

lapis sequoia
#

When people talk about fine tuning LLMs, they are not talking about Bert, or T5 or encoder, encoder-decoder models right? And are they fine tuning autoregressive models from hugging face using torch? And what models and why?

lapis sequoia
opaque falcon
lapis sequoia
serene scaffold
#

Imo, the first L in LLM should be dropped and replaced with "interactive" or "generative"

#

Like, the number of parameters in "large" language models can differ by orders of magnitude now.

plucky pebble
#

hello everyone, does applying DML to smoothed/penalised Local Projections sound like it could work to provide confidence intervals?

#

that are accurate

#

the motivation is to keep the reduced variance of smoothed estimators whilst mitigating the bias introduced by shrinkage for more accurate coverage to measure uncertainty

weak oxide
#

Fred API is ok

#

yfinance isn't really that good with SEC filings

#

yfinance is fine for stock prices daily

#

For other data, I learned how to webscrape and use IB API

lapis sequoia
#

Specifically stocks. They have ETFS, crypto, FX pairs/currencies, and other stuff as well. I was literally just replying to an answer off of the top of my head.

weak oxide
#

I'm in contact with the guy behind it

#

Really cool

rich moth
#

What if I said "reasoning" is not a mysterious emergent property of brains, but as a measurable , mathematical property of information itself...

From John Wheeler's "It from Bit" and information theorists like Shannon and Kolmogorov, it can all be worked into a practical working system.

Ever think why complex information organizes itself in complex spirals of information? DNA double helix, shells, cosmic formations ? There's something fundamental here. Like think of black holes and their natures. Complex spirals of information, quantized.

It's the same reason quantum entanglement even works. The same law is fundamental to reality itself. That's why it can work over vast distances. It's all connected via math.

Im rambling here, because I made a new type of AI that can reason, using pure math, information theory. Complexity reasoning really.

#

Curve fitting a black box is not the future it seems

small wedge
#

is it open source?

#

because, no offense, but the whole "spirals of information, quantum entanglement, reasoning" line tied together reads a bit schitzo XD

rich moth
# small wedge is it open source?

I one HUNDRED percent get it, man.. It sounds so far off and crazy. I don't make these profound insights without weeks of validation work. I'm not psycho.. I'm very well grounded in my work. Validation projects these days are the core of my work.

iron basalt
#

Are you referring to Solomonoff induction?

rich moth
#

DL took a wrong turn somewhere in 2010-2015. I feel like it's more exploitative in nature than true discovery

iron basalt
#

And what is "true discovery?" How does it apply here?

small wedge
rich moth
#

Logically the system should work better with MORE data, not the opposite.

opaque condor
#

Does anyone know where I could find high quality images for conversion

rich moth
#

I guess the best way to say it is modern AI finds correlations, mine also finds structure thats measurable across domains.

iron basalt
rich moth
iron basalt
rich moth
#

Give me a dataset and I'll run the AI on it and share results

#

You control the experiment...

iron basalt
#

Ok, a simple "hello world" for AI is the classic Wumpus World.

rich moth
#

I'm not trying to "pull the wool" over anyones eyes, Im just asking for engagement.

rich moth
iron basalt
# rich moth

Can you show some other cases? The original layout as shown in the link?

#

It also did not kill the wumpus, which is the other objective.

#

"The objective of the game is to kill the wumpus, to pick up the gold, and to climb out with it. "

#

To leave it needs to return to the start and do the climb action.

rich moth
#

Solved it in 19, 20 steps?

#

I love these tests honestly.

iron basalt
rich moth
#

Sorry for typos, you get it. But ill refine it

#

Thank you. @iron basalt

rich moth
# small wedge is it open source?

I get it.. What is crazy? And What isn't? Sometimes its definitive to define other times it can be hazy. But have you considered the substance for such a claim against someone's own self worth? Like I just wonder why you jump so righteously to a claim of mental unawareness?

Yet I just think you are "unaware" in the matter . Does that make me "schitzo"?
Or does it make you a bully?

really up to the observer.

worldly dawn
sage sorrel
#

yeah I think there are

worldly dawn
#

where is the code so we can run it and analyze it?

sage sorrel
#

IMDb movie review sentiment analysis

#

MNIST handwritten digit recognition

#

There are several experiments

#

that can be analyzed and reproduced

worldly dawn
sage sorrel
#

lol nvm

#

I would like to see the code as well

#

so we can run it

opaque condor
#

where's the best sites for good high quality images?

small wedge
# rich moth I get it.. What is crazy? And What isn't? Sometimes its definitive to define ot...

To be clear, schitzo is just a turn of phrase, im not actually accusing you of being mentally ill. Im saying it sounds made up. My intention is not to bully you, but to make clear how outrageous what you are saying sounds to me.

That said, I have asked if the code is open source, and for an explanation on how your new AI differs from DL as you said. You've provided answers to neither, so I remain convinced that the grandiose claims you made are not true until you provide some evidence otherwise.

Im happy to change my view if you provide evidence for it.

rich moth
#

Like do you want HQ magic cards? pokemon cards? medical data images?

rich moth
opaque condor
#

I want any type of image just in case I want something else
My friend is asking me to make an AI for anime recognition + character recognition

small wedge
#

I wouldnt say you could find any/all images there, but hugging face and google datasets are always good places to start looking for clean, high quality datasets; especially when it comes to popular topics.

#

Almost certainly some anime datasets between the two

opaque condor
#

I'm also trying to learn to make my own just in case you know I can't find one or it's for a specific job

opaque condor
old lodge
#

Ok so I got a weird request. I would like someone to recommend me a topic/idea for capstone project of Data Science Diploma for me to do, I can't get any ideas after thinking for few days...

small wedge
#

ooh that's neat

opaque condor
opaque falcon
#

Anyone familiar with discord servers for learning machine learning? Say at aimed at beginners and intermediate folks?

lapis sequoia
#

Is rag that intense? And can all of these autoregressive tasks be done without lang chain? Just good old torch and Hugging Face.

serene scaffold
lapis sequoia
#

Retriever prompt template IR is just information retrieval that is used for the LLM to generate responses, right? You don’t fine tune anything

#

It’s like a searching engine
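That matches the basic picture: retrieve the most relevant text, paste it into the prompt, and leave the model weights untouched. A toy sketch in plain Python, using word-count cosine similarity as a stand-in for the embedding search a real system would use; all function names and the prompt wording are made up:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Fill a prompt template with the retrieved context; no fine-tuning involved."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what gets sent to whatever LLM you use, with or without a framework around it.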

lapis sequoia
#

I just did it and now I feel rad

verbal oar
#

rag is for reducing hallucinations and improving quality of answer

lapis sequoia
#

I thought it was like the hardest thing of all time because I heard someone say that so it must have been true

lapis sequoia
weak oxide
#

For YouTube machine learning

#

I suggest to not immediately go into the code

opaque falcon
#

Any resources on how to read an ML paper for new folks?

lapis sequoia
# opaque falcon Anyone familiar with discord servers for learning machine learning? Say at aimed...

What do you mean intermediate? Avoid NeuralNine he is trash, he shouldn’t talk ever. Most YouTubers are garbage. Just read just do everything everyone says to do that is consistent. I guess sentdex is good, deeplizard is attractive and those videos are good but her boyfriend deadlifts less than me so I am personally indifferent. Really, StatQuest is good. ISLR (the textbook) is the Bible. The NLP Stanford book is great, the Python book for machine learning is extremely outdated.
Honestly, just read libraries and the other suggestions. You have to like this. Because it is time consuming

lapis sequoia
#

And Corey Schafer if you watch Python tutorials. You really just need him honestly. And the torch textbook is very good.

serene scaffold
#

I also like to write on them on my tablet. I have a color coding system.

opaque condor
#

Where's the best place to get images for convolution? I mean a good website where they are free, high quality, and that has almost every image you may need

lapis sequoia
gentle stone
lapis sequoia
#

You guys ever use Gemini and it ask for your SSN?

serene scaffold
woeful harness
#

Hello everyone, I need a little help. I'm building a ranking system for businesses based on features like distance, rating, cost, workload, completion rate, and total projects. I don't have any user data, and I need a way to rank businesses effectively. I have also tried MCDA (Multi-Criteria Decision Analysis).
so the problem i am facing is : while ranking, I want to give newer businesses those that haven’t had many chances to provide services yet slightly higher rank for a limited time to help them get exposure. How can I solve this problem?
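One simple option is to add a small bonus to the existing score that fades out as the business gets older or builds up history. A sketch, where `base_score` is assumed to be the 0 to 1 score you already get from MCDA, and every parameter name and default is a placeholder to tune:

```python
def boosted_score(base_score, completed_projects, days_listed,
                  max_boost=0.15, boost_days=30, min_projects=10):
    """Add a bonus that fades to zero as a business gets older or gains history."""
    age_factor = max(0.0, 1 - days_listed / boost_days)              # 1 on day 0, 0 after boost_days
    history_factor = max(0.0, 1 - completed_projects / min_projects)  # 0 once it has enough projects
    return base_score + max_boost * age_factor * history_factor
```

Because the bonus is capped and time-limited, an established business with a clearly better score still ranks above a new one.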

fallow coyote
past meteor
fallow coyote
brazen willow
#

Hey gang, so my AI response is returning incorrect json data, i.e "total": 10 but the individual items that comprise the total does not equal 10 if you add manually

What I did was implement a retry logic until the individual items have the correct total -- is there a better alternative than just retrying? Am just using an off the shelf model, not sure if engineering my prompt better will make the responses more correct
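An alternative to retrying is to not trust the model with arithmetic at all: parse the items and compute the total in code. A sketch, assuming the response has a list under `items` with a numeric `amount` per item (both key names are placeholders for whatever your schema uses):

```python
import json

def parse_with_computed_total(raw_json, items_key="items", value_key="amount"):
    """Parse the model's JSON and overwrite `total` with the sum of the item values."""
    data = json.loads(raw_json)
    computed = sum(item[value_key] for item in data[items_key])
    data["model_total"] = data.get("total")  # keep the model's figure for logging
    data["total"] = computed
    return data
```

Logging how often `model_total` disagrees with the computed total also gives you a cheap metric for comparing prompts or models.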

lapis sequoia
#

hey yall, almost finished Pierian Data's course on udemy, where can i start learning python numpy and pandas

past meteor
#

So if you really really want to read them, look them up there

#

The running joke is that the average ML paper is "Bayesian interpretation of <method>" if you don't know <method> or have a solid grounding in bayesian statistics, reading that paper makes no sense imo

woven prairie
#

Hello everyone , is there anyone who has worked on rag based chatbot or any kind of rag based application.

serene scaffold
woven prairie
#

I want to discuss that thing in a little brief so I just asked, if someone has worked.

serene scaffold
woven prairie
#

I have project to make

#

Basically this project is focused on writing blogs for medium and other blog websites

#

So basically user want that he writes his topic and he can get the content for his post

#

Like title , body header what images should be included

#

Now the important thing is

#

He wants all the content to be extracted from the docs

#

He is having multiple docs from where he wants to make this content

#

First thought In my mind was about using rag , rag based chatbot

#

But he says the content should be written in such a way that it should be seo optimize

woven prairie
#

?

past meteor
woven prairie
#

Entirely docs

past meteor
#

What docs exactly?

woven prairie
#

Docs regarding yoga

#

The user will upload his docs and ask for a blog post based on the docs

past meteor
#

You can try prompting it heavily to use a specific tone of voice, and/or put entire examples in your system prompt

woven prairie
#

I think you got what I am trying to say

past meteor
#

The risk is really that things written by LLMs really have this vibe

woven prairie
#

Yeah

rich moth
visual goblet
#

vizz wizzs out there, anyone has an idea how to make a barplot with a broken Y axis so that bars with values much larger than the rest dont ruin the plot by making all other bars look tiny?
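One common approach in matplotlib is to draw the same bars on two stacked axes with different y-limits and hide the spines between them. A minimal sketch; `broken_bar` and its cut-off arguments are made up, and the cut-offs need to be picked by hand for your data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

def broken_bar(labels, values, low_max, high_min):
    """Bar plot with the y-axis cut between low_max and high_min."""
    fig, (top, bottom) = plt.subplots(
        2, 1, sharex=True, gridspec_kw={"height_ratios": [1, 3]}
    )
    for ax in (top, bottom):
        ax.bar(labels, values)  # same bars on both axes
    bottom.set_ylim(0, low_max)                   # small bars at full detail
    top.set_ylim(high_min, max(values) * 1.05)    # only the tips of the huge bars
    top.spines["bottom"].set_visible(False)
    bottom.spines["top"].set_visible(False)
    top.tick_params(axis="x", bottom=False)
    fig.subplots_adjust(hspace=0.05)
    return fig, top, bottom

fig, top, bottom = broken_bar(["a", "b", "c"], [3, 5, 120], low_max=10, high_min=100)
fig.savefig("broken_axis.png")
```

A log-scaled y-axis (`ax.set_yscale("log")`) is a simpler alternative when the exact bar heights matter less than the ordering.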

jaunty helm
visual goblet
jaunty helm
rich condor
#

Sorry for reposting, need some suggestions and #1035199133436354600 threads are a bit too short lived

Let's say currently there is a test build of a project which calls OpenAI. It then calls a library that indirectly calls OpenAI, which will decide on a series of documents (html, pdfs) to download from the internet and then does some logicking process based on the corpus to derive an analysis (let's assume it is a .doc or .md, doesn't really matter here).

The current process is inefficient because

  • it downloads the 'corpora' into memory, so the entire process just gets killed off if there is an error
  • the library being used is a bit opaque and there is a need to inject some observability tooling (traces, metrics, logs) to see where it is 'choking'

Need some suggestions on how you would proceed in this scenario, or any advice

fallow coyote
#

is it recommended to use git as version control for jupyter notebooks or should I use git for other projects

robust granite
#

I'm confused, shall I do more kaggle or leetcode.

buoyant vine
#

At the end of the day the jupyter files are just JSON files

fallow coyote
#

im using PyCharm community edition so how would i go about using git? ive also got git bash installed

lapis sequoia
#

how beta is trainable? does anyone know a real use case for this activation function? or is it obsolete/irrelevant

buoyant vine
fallow coyote
#

Its just with pycharm, even though Ive been using it for years now, it does feel quite bloated. I might think about moving to something like vim/neovim or some other simple ide. I saw as well someone suggesting using jupytext to convert a notebook to python and linking the two files

jaunty helm
# fallow coyote is it recommended to use git as version control for jupyter notebooks or should ...

every run it changes some metadata or smthn and I find it really annoying to look at, but yeah you still can use git
now I mostly just rock a .py file with the vscode extension which allows you to do this

```
# %%
# this starts a new chunk like in jupyter
print('hello!')

# %%[markdown]
# # Title
# you can write markdown too, and you can export to a ipynb if you need
```
I think `marimo` has also been getting traction as a notebook alternative
safe agate
#

marimo notebooks are .py files and version well with Git

serene scaffold
#

@safe agate do people ever say "marimo jupyter notebooks"?

#

hopefully not

safe agate
fallow coyote
#

Might give marimo a go. Ill stick with Jupyter for a few projects and then move to marimo

#

Apologies if this is off topic but it says 'no more JSON merge conflicts'. Why is it with JSON theres issues with merging
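A likely explanation: an .ipynb file stores the code, the cell outputs, the execution counts and some metadata together in one JSON document. Rerunning a cell changes the file even when the code is identical, so two people who both ran the notebook end up with conflicting edits to the same lines. Clearing outputs before committing avoids most of it. A sketch of the idea with the standard library (tools such as nbstripout do this properly):

```python
import json

def strip_outputs(notebook_json):
    """Return notebook JSON text with outputs and execution counts cleared from code cells."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(nb, indent=1, sort_keys=True)
```

After stripping, two copies of a notebook that differ only in what was run and when become byte-for-byte identical, which is what Git needs to merge them cleanly.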

serene scaffold
calm thicket
#

marimo is great

fallow coyote
#

Ive looked on the marimo website and I actually quite like the look of it. I can see why people would want to use marimo over jupyter, particularly for version control. I'm going to keep version control for my project as simple as possible for practice and then, maybe the next project I do, switch to marimo and see how I like it. Its a good thing they have a feature where i can convert my jupyter notebooks to marimo notebooks

runic parcel
#

Hey guys, i was using a qwen and llama model for the ocr of my screenshot, its working great but the response takes like 7s. And i want it to work in less than a second. How can i do something like this?

serene scaffold
lapis sequoia
#

with LLMs, is there any point in even writing code? Like, Gemini made a movie of a story I wrote ten years ago and it was very good.

agile cobalt
#

someone still needs to tell the LLM what to write, and for the foreseeable future be able to diagnose and identify what went wrong when it fails to do what you asked it to

not to mention that humans are terrible at defining precise requirements, and much of the time the llm's only context about your problem are the requirements you give it

lapis sequoia
# agile cobalt someone still needs to tell the LLM what to write, and for the foreseeable futur...

I am happy that I went in complete order, starting from linear regression (years ago), to LLMs. I did not touch an LLM until I fine-tuned BERT, T5, BART, RoBERTa, and that was after going through all RNNs, which was followed by CountVectorizer and TfidfVectorizer, which came after regular expressions. I went in complete chronological order of tasks. I did not jump on the latest trend one single time.

agile cobalt
#

I don't get how that is relevant to what was said before?

#

or you meant that as the start of a new topic?

lapis sequoia
#

Sorry. Just meant that people go straight to LLMs instead of actually understanding what they are doing

lapis sequoia
runic parcel
serene scaffold
agile cobalt
past meteor
agile cobalt
lapis sequoia
fresh sluice
#

anyone up ?

runic parcel
fresh sluice
#

Anyone who can tell me what are the necessary skills that one must know being in AI ML domain

Like, there are a lot of things one can know like keras , computer vision , image classification , transformers and so on but what are the ones i need to focus on ?

In simple , what as an AIML student are the things i should know and make projects on specifically

fallow coyote
#

Maths. Genuinely, if your maths is not up to par, you won't be able to understand any of the tools they use in ML

fresh sluice
#

Can you tell me some good repos or sites from where i can get some projects / ideas as well?

fallow coyote
# fresh sluice alright then

Just search up machine learning guided projects and there'll be a github link to a repo that has a ton of complete ml projects. But like I said, learn the maths first. Thats far more important than learning to use the tool

somber willow
#

hi does anyone know on the requirements of maths in ai

clever karma
somber willow
clever karma
#

Calculus

#

Probability & Statistics

#

Graph Theory

runic parcel
#

I want to capture the live screen and get the OCR from that, that’s possible right? Using VLM like Qwen ollama

grand minnow
runic parcel
#

and then crop by setting the coordinates

#

and provide the VLM on that part

grand minnow
somber willow
#

guys what do you think about Stewart's Calculus for ml and data science

serene scaffold
#

how did you arrive at this error without knowledge of programming?

#

if you're having an issue with Google Earth, and you're not trying to write Python code, it seems that question isn't on-topic for this server. You'd need to contact Google support.

lapis flax
#

I'm working on building a neural net via Pytorch but I need to run it with an optimization algorithm that is more complicated than the in-built functions in Pytorch; does anyone have experience with doing this, or can you point me to some guides focused on this?

#

I've found a couple of sources where people enter 'no_grad' mode to manually compute their gradient descent steps, which is helpful, but it would be nice to see more in-depth sources where people do this because they need to do an optimization that isn't built-in to Torch

raw hare
lapis flax
raw hare
#

I just read the docs for torch.optim, you can write a custom optimizer by inheriting from torch.optim.Optimizer

#

and you can update the parameter

#

based on your needs
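A minimal sketch of that approach, assuming PyTorch is installed: subclass `torch.optim.Optimizer`, hand the per-group defaults to the base class, and put the custom update rule in `step`. The `SignSGD` name and its sign-based update are illustrative assumptions, not something taken from this discussion.

```python
import torch


class SignSGD(torch.optim.Optimizer):
    """Toy optimizer (hypothetical): move each parameter by lr along -sign(grad)."""

    def __init__(self, params, lr=0.01):
        if lr <= 0:
            raise ValueError(f"lr must be positive, got {lr}")
        # Values in the defaults dict are copied into every parameter group.
        super().__init__(params, {"lr": lr})

    @torch.no_grad()  # parameter updates must not be recorded by autograd
    def step(self, closure=None):
        loss = None
        if closure is not None:
            # A closure re-evaluates the model, so it needs gradients enabled.
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # The custom update rule goes here.
                p.add_(torch.sign(p.grad), alpha=-group["lr"])
        return loss


# Tiny usage example: one step on f(w) = sum(w^2).
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = SignSGD([w], lr=0.1)
loss = (w ** 2).sum()
loss.backward()
optimizer.step()
print(w)
```

Anything more elaborate (momentum buffers, second-order terms) would typically keep its per-parameter data in `self.state[p]`, the same way the built-in optimizers do.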

lapis flax
#

i’ll try to play around with that i guess, i’d just rather inherit like sgd then customize what happens during the step

ocean fiber
#

hi people

#

i need some help

serene scaffold
# ocean fiber i need some help

any time you need help, imagine that someone has agreed to help you and that you have to tell them everything they need to know to start helping you

ocean fiber
#

so it's not that it's really hard to explain. yes, it is AI, but I'm trying to make a 3D model move and interact with my AI code: the eyes move, the mouth opens when it's texting, and the whole head can move around, all in PyCharm. I don't really know how to start, and I was looking for some ideas if any of you can help

urban basin
#

The one above me hasn't showered in 2 days

serene scaffold
#

!warn 1339137495585132598 Trolling is not permitted in this server.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied warning to @urban basin.

opaque condor
#

Is a .PNG good or .JPG

tidal bough
#

yes

opaque condor
#

Because I'm using a web scraper I might be able to use two and I need to make a filter for a specific type of image

haughty oxide
#

could i get help with this

tidal bough
#

on windows you'd generally use python or py

serene dew
#

I have 5 research questions given by perplexity when I fed it research gaps

#

anyone here able to validate if they are good or not? I am not an expert, I just found gaps but Im not entirely able to understand them

serene scaffold
serene dew
serene scaffold
serene dew
#

I'll just send it as a doc

serene scaffold
#

You can't do that

serene dew
#

wait

#

oh

random sun
#

I need a quick solution for this: I am working on a muzzle detection project. I have annotated the data and trained a YOLOv8-seg model. At inference I get an output image with the cattle muzzle correctly segmented (I did a T-shaped muzzle segmentation). For a further classification model I need only that part of the image, because if I use the full image, the left-out region will be noise and accuracy will be low. How do I take only the region of interest from an image?

lost lion
#

hey there, i'm currently trying to make a wake word detection model detecting the word "Labeeb", an arabic word. i tried to use openwakeword's tool that is made in google colab, it actually made the model, but when i tried it, it was having lots of FPs. I thought maybe it is because the tool isn't made for the arabic language, because it seems like the model didn't train on arabic negatives, but only english negatives. so, can anyone tell me how to retrain this model on arabic negatives? maybe using the same tool? or should i make a script for it?

crimson girder
#

im just starting to get into data science and deep learning with python
i have heard about two libraries, tensorflow and pytorch, but im not really sure about the differences
im not looking for opinion, just facts about which is better for what circumstance

agile cobalt
#

if you're building on top of the work of someone that used tensorflow, you may want to use tensorflow
if you're building on top of the work of someone that used pytorch, you may want to use pytorch

and it would happen that most researchers have been using pytorch over tensorflow in the last few years

odd meteor
#

You can use TensorFlow still for DL, however it will get to a point where TensorFlow won't be of much help anymore, specifically, once you get to LLMs.

Starting with PyTorch will do you a lot more good.

odd meteor
serene dew
#

out of Distribution generalization

#

its kinda LLM tho..

odd meteor
serene dew
#

ok here is first question

#
  1. How can we systematically quantify and minimize test set contamination in large-scale OOD benchmarks?
odd meteor
#

If you're trying to "safeguard" the research idea, that's understandable. But honestly, if it's a truly good and publishable direction, chances are someone else has already thought of it — or is already working on it. What really matters is execution, not just the idea.

lapis sequoia
#

Hi

odd meteor
# serene dew 1. How can we systematically quantify and minimize test set contamination in lar...

This isn't my area of specialty but here's the direction I'm thinking

  1. I think we can use two statistical approaches to quantify test set contamination: the Kolmogorov-Smirnov test and KL divergence.

  2. For detecting the contamination, if this LLM is multimodal, then maybe, CLIP (Contrastive Language–Image Pre-training) from OpenAI might be useful.

  3. Perhaps, SimCLR (Simple Contrastive Learning of Representations) can as well be helpful in this case if you want to approach this using Self-Supervised learning.
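
A rough sketch of the two statistics from point 1, in plain Python. It only computes the KL divergence between two discrete distributions and the two-sample KS statistic (no p-value). How a benchmark would be turned into the histograms or score samples being compared is left open here, and the function names are just illustrative.

```python
import math
from bisect import bisect_right


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
print(ks_statistic([1, 2, 3], [2, 3, 4]))
```

A value near zero from either function means the two inputs look alike under that measure; what threshold would count as "contaminated" is a separate question.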

exotic star
#

any tips on how to get through mathematics for ML more easily and actually get a grasp of everything?

#

i kinda skimmed through some pages and did some reading but it hasn't been going too well

#

I'm 17 so i haven't learned math of that level yet

#

also, is using GPT to explain algorithms, concepts, formulas, etc. a good way to learn, or can it hallucinate?

limpid zenith
turbid field
#

for training a vehicle classification model on images, how many image instances per class do I need? this is thesis level

#

also, may i know what websites or services offer image annotation?

odd meteor
turbid field
#

since we have a total of 13 classes

odd meteor
odd meteor
exotic star
#

Should I first go through Khan Academy and, after I get a basic idea of everything, get a math book with problems and do them in a separate notebook for each subject?

#

also, should I spend all the time on the math part or also learn about the Python libraries required for ML?

turbid field
serene scaffold
#

@odd meteor I hope you're doing great flag_dc 💚 🇳🇬

odd meteor
narrow tiger
#

is there any hybrid recommender system with minimal frontend available online?

spring field
spring field
#

for datasets you can check out HuggingFace

glacial root
#

hi @serene scaffold, i just wanted to make sure since i often see differing answers to this question, how do byte pair encoding and word piece deal with spaces?

#

do they still include them as characters?

serene scaffold
glacial root
#

sorry, i pinged you because i thought you were experienced with nlp

serene scaffold
#

Even so, unless it's a question that's literally only for that person, please direct your questions to the channel generally. No one is on call to answer questions about their area of expertise.

glacial root
#

oh alright, sorry for pinging

serene scaffold
#

I'm actually not sure what byte pair encodings are.
Word pieces can't have spaces that separate them from the "main" word piece. Unless maybe the tokenizer has rules to handle that.

glacial root
#

am i correct here about word piece?

#

and i'm not quite sure what you mean by main word piece

serene scaffold
glacial root
glacial root
turbid field
#

just too many images to annotate

ivory umbra
#

if im labeling 10k+ domains and plan to train it to a model, would it be better to just go straight to postgres (supabase) just in case i scale more later on or is sqlite enough?

grand minnow
ivory umbra
runic parcel
#

Guys, I am doing OCR using a VLM, but instead of doing it on an image I want to do it from the live screen. I will be looking at something on my PC and it should do OCR on that. How can I do it?

#

Also the vlm and respond in .07 second.

earnest sleet
#

Hello! Very basic question here, feel free to just point me somewhere: Why exactly is Python the language of choice for data science and such?
Couldn't find anything but corpo LLM slop on the topic online

serene scaffold
#

It was never intended to be the DS/AI language

earnest sleet
#

Alright, thanks. It's all about the tooling in the end :)

serene scaffold
#

Tooling is stuff like IDEs and type checkers

iron basalt
# earnest sleet Hello! Very basic question here, feel free to just point me somewhere: Why exact...
1. You can just make a new file and go (no need to learn what a compiler is and setup a project).
2. The syntax is not scary, it's straightforward / boring / not symbol soup.
3. It's a memory safe/garbage collected language.
4. The garbage collection in combination with operator overloading allows for some high level math-y code to be written.
5. Most importantly, it has a massive and ever growing library of modules available online for every task imaginable (it has momentum at this point, and no matter how good your language design is, nothing beats having the code already written for you by someone else).
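
Point 4 can be illustrated with a small sketch. The `Vec` class below is made up for this example; it shows how operator overloading lets ordinary arithmetic syntax work on a user-defined type, which is the mechanism numeric libraries rely on.

```python
class Vec:
    """Minimal vector type (hypothetical) to show operator overloading."""

    def __init__(self, *xs):
        self.xs = tuple(float(x) for x in xs)

    def __add__(self, other):
        # v + w: element-wise sum
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))

    def __mul__(self, k):
        # v * k: scale by a number
        return Vec(*(a * k for a in self.xs))

    __rmul__ = __mul__  # k * v works too

    def __matmul__(self, other):
        # v @ w: dot product
        return sum(a * b for a, b in zip(self.xs, other.xs))

    def __eq__(self, other):
        return isinstance(other, Vec) and self.xs == other.xs

    def __repr__(self):
        return f"Vec{self.xs}"


v, w = Vec(1, 2), Vec(0, 1)
print(2 * v + w)  # temporaries such as 2 * v are freed automatically
print(v @ w)
```

The intermediate objects created by an expression like `2 * v + w` never need to be released by hand, which is where the garbage collection part of the point comes in.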
soft star
static pasture
final dock
#

hey, can anyone guide me in machine learning / deep learning for Python? I have intermediate knowledge of Python (up to OOP)

rancid thorn
#

I want to implement a neural network for the creatures in my game and i found this library called PyTorch and i was wondering if its good for many small NNs or if its better to create them from scratch

#

Also I've tried developing many AIs before (last one was a 2048 AI) but it never worked out. It actually seemed to get worse over time, but i cant understand why

jaunty helm
rancid thorn
#

For example my idea was that it would take how much water it has, how much energy and other things and then would output how much to grow, to deepen its roots, to spread seeds etc

jaunty helm
rancid thorn
#

Yeah thats what im currently doing but i wanted it to be more advanced

#

Also this way it might actually be heavier on my pc because its gotta store a lot of variables and actions

jaunty helm
jaunty helm
jaunty helm
rancid thorn
#

Idk, it has to store a variable for each action twice, like the variable for when it's wet and the one for when it's dry. So it would be actions * vital variables * 2

#

So it would come to be like 40 floating-point variables

jaunty helm
rancid thorn
#

Alright then, thank you for helping me

wet dome
#

does anyone know when using numpy if you can use type hints to show the shape of an array, is there a library which can let you do this? e.g.

x: np.ndarray[3,4] # array of shape = (3,4)
serene scaffold
opaque condor
#

Can I mix PNG and JPEG files together?

serene scaffold
opaque condor
#

So under the same label

serene scaffold
opaque condor
#

A mixture of jpegs and pngs

balmy yoke
#

is this the machine learning server

serene scaffold
serene scaffold
balmy yoke
#

ok thanks i have a question that isnt hard enough to go on posts i think can you help me

serene scaffold
opaque condor
#

The photo itself let's say if I had a picture of a tomato or a pear I would take them and put them into separate categories for that specific fruit
Under classification or a folder that acts like the label

balmy yoke
#

do you happen to know simple linear regression

#

like 3 variables only

serene scaffold
#

@opaque condor when you pass an image to a model, what is the literal thing that that image is in your python code?

balmy yoke
#

for some reason i keep getting a message that"reg" is not defined even though it is idk what im doing wrong i even asked ai but it keeps saying that even after i do the suggested corrections

serene scaffold
#

You can the cell after it

balmy yoke
#

tried that

serene scaffold
#

If you defined reg, you're guaranteed to not get a name error

#

You might get an error, but it won't be that one.

balmy yoke
#

so run this line again? :reg = linear_model.LinearRegression()
reg.fit(df[['ex', 'test_s', 'interview_s']], df['sal'])

serene scaffold
#

For the first time actually

#

@balmy yoke what's the new error message, if you got one?

balmy yoke
#

i didnt get a new one

#

maybe there's a way i can share my code with you?

serene scaffold
#

Yes, by copying and pasting it

#

!code

arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold
#

^ like this

balmy yoke
#

im in jupyter so its harder

#

i sent you a screen recording also

#

do you want my dataset as well

serene scaffold
#

No, just copy and paste the relevant code and text output

#

Anything else would be harder for other people to read

balmy yoke
#

alright

#

should we open up direct message so everyone doesnt get pinged or whatever

serene scaffold
#

No, in this channel. No one else is getting pinged.

balmy yoke
#

i will send each cell separately to make it more clear

#

from sklearn import linear_model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#

df = pd.read_csv("/Users/srihari/Desktop/ml/variateLR/job.csv")
df

#

df.test_s.median()

#

df.test_s = df.test_s.fillna(df.test_s.median())
df

#

everything worked here

#

reg = linear_model.LinearRegression()
reg.fit(df[['ex', 'test_s', 'interview_s']], df['sal'])

#

reg.predict([[2,9,6]]) : 1 reg.predict([[2,9,6]])

NameError: name 'reg' is not defined

serene scaffold
serene scaffold
balmy yoke
#

?

serene scaffold
balmy yoke
#

i used the pastebin but it added stuff to the code idk

serene scaffold
#

The first part tells you how to format code on Discord.

#

"long code samples" are like, whole files.

balmy yoke
#

i did

serene scaffold
#

You didn't follow the bot's instructions for formatting code on discord.

Let me know what happens when you move the code that defines reg into the next cell and re run that cell.

balmy yoke
#

oh wow it worked

#

what changed when i moved that code?

serene scaffold
#

What's the best explanation you can think of, based on what the error message said and what we've discussed?

balmy yoke
#

it finally ran the code where it defined reg idk

serene scaffold
#

What did the error message tell you?

balmy yoke
#

that reg was not defined

serene scaffold
#

Right

#

If a variable is defined, you can't possibly get that error

#

You might get a different error saying that you tried to use it in an invalid way
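
The distinction can be reproduced in a plain script. The `reg` name mirrors the notebook above; the helper and the values are made up for illustration. An empty namespace stands in for a kernel where the defining cell never ran.

```python
def describe_failure(code, namespace):
    """Run a snippet in the given namespace and report the exception type, if any."""
    try:
        exec(code, namespace)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


# Like a notebook kernel where the cell defining `reg` was never executed.
fresh_kernel = {}
print(describe_failure("reg + 1", fresh_kernel))

# Here `reg` exists but is used in an invalid way, so the error is a different one.
kernel_with_reg = {"reg": "not a number"}
print(describe_failure("reg + 1", kernel_with_reg))
```

The first call reports a `NameError` and the second a `TypeError`, which is why a `NameError` is a reliable sign that the defining cell has not run in the current kernel.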

balmy yoke
#

yeah, also this message always shows up btw

#

but it still shows the right output

#

/Users/srihari/.pyenv/versions/3.13.5/lib/python3.13/site-packages/sklearn/utils/validation.py:2749: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(

serene scaffold
#

That would require a deeper dive than I can do at the moment--I'm on mobile

balmy yoke
#

ok but thanks for the help

serene scaffold
balmy yoke
#

ok thanks

fringe jay
#

RAG queries bring back unrelated information, or dont bring anything back for some reason, what are some ways to optimise RAG queries?

wanton viper
#

Hi, I am Muhammad mustaqeem, 17. (UAE)
I want to get into tech and will be starting my Bsc program in AI & Daa Science in sept 2026. I want some advise on how to begin the self study journey as even majority of the uni studies are going to be self so i want to start from now basically. Alongside, i plan to join for basic IT role internships and part time jobs. Could anyone share some advise and practical experience on how to begin and especially what all do i need to know to get into any junior level IT role. (Currently learning python but haven't made any major projects ).

#

Data*

short moth
#

short read for both shouldn't take you more than 1 month

wanton viper
#

Alr. Also, as i said i haven't made any major projects, so i am confused on whether i should first build software development skills or just straight off python for data science then whatever follows along.

short moth
#

you can do both probably

#

in the end you will build software development skills as you learn data science and ai i think

#

although, I do not know for sure, I am not a full on software developer, I just use pandas for whatever I need in my field

#

i have a question for the pandas experts. I have some data I made here (2nd image) and I modified the data to remove the numbers so that I get this (1st text)
:

Process    Module    Hosting Environment
0    Platform    DPM    Core 1
1    UA Consolidator    DPM    Core 1
2    Traffic Server    DPM    Core 1
3    Flight Deck Window Manager    DPM    Core 2
4    Configuration Manager    DPM    Core 2
...    ...    ...    ...
144    Policy Handler    INSU    Core 7
145    ACMF    INSU    Core 8
146    AOIP    INSU    Core 8
147    Bluetooth UA    INSU    Core 8
148    SXM UA    INSU    Core 8

#

now i wanted to groupby the Module and then get the counts of each process for each module

#

so i did


updated_df = (df.assign(Process = lambda df_: df_.Process.str.replace(r'[0-9]', repl = ' ', regex = True))
              .assign(Module = lambda df_: df_.Module.str.replace(r'[0-9]', ' ', regex=True)).loc[:,'Process':'Module'].groupby('Module')
              .value_counts()
             )
#

I exported to excel and for some reason, I still have duplicates in the data!

#

why is this?

twilit topaz
#

What are some good real life data sets to learn with. I heard of Kaggle but I watched a video saying I need to analyze with real datasets

#

I don't mind if it's unclean either

short moth
#

it seems that it groups the elements correctly I do not understand why it duplicates some elements

serene scaffold
short moth
#

even doing something like this:

#
updated_df = (df.assign(Process = lambda df_: df_.Process.str.replace(r'[0-9]', repl = ' ', regex = True))
              .assign(Module = lambda df_: df_.Module.str.replace(r'[0-9]', ' ', regex=True)).loc[:,'Process':'Module'].assign(counter = 1)
              .groupby(['Module','Process']).counter.sum()
             )
#

yields the same thing

#

having just 1 next to each module and process same result

#

i wonder why it does this, it must think that these keys are not equal

serene scaffold
#

to be clear, you wanted to remove the trailing digit from each value in Module, and then get the value counts of each (Process, Module) pair?

short moth
#

i didn't know but I used .strip() and removed it and it works

#

so the groupby did not treat certain equal elements as equal

#

but with .strip it works well

serene scaffold
#

you could have also done .str.replace(r"\s*\d+$", "", regex=True), or something
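
The same effect shows up without pandas: keys that differ only by trailing whitespace are counted as different groups. A plain-Python sketch with made-up rows, comparing the digit-to-space replacement against the trailing-digit regex suggested above:

```python
import re
from collections import Counter

# (Process, Module) pairs; the values are invented for illustration.
rows = [
    ("Platform", "DPM"),
    ("Platform", "DPM1"),
    ("Platform", "DPM12"),
    ("ACMF", "INSU 3"),
]

# Replacing each digit with a space leaves "DPM", "DPM " and "DPM  " as distinct keys.
padded = Counter((re.sub(r"[0-9]", " ", module), process) for process, module in rows)

# Removing the trailing digits together with any whitespace before them gives one key.
cleaned = Counter((re.sub(r"\s*\d+$", "", module), process) for process, module in rows)

print(len(padded), dict(padded))
print(len(cleaned), dict(cleaned))
```

Printing keys with `repr` (or exporting them with visible quotes) is a quick way to spot this kind of invisible difference.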

short moth
#

yes i added that in as well

woven prairie
#

Is there anyone here who has worked with an OpenAI API key?

#

I tried to increase the max token in payload but still my output is not even more than 1500-1600 tokens

#

Basically I am trying to tell the model to write the html css code to make a poster or similar to poster

#

I know that making one takes a lot of lines of code and a lot of tokens, but I don't know why it isn't writing them

agile cobalt
#

the only thing max tokens does is cut the response short if the model still hasn't finished its response before hitting the token limit

#

the model will not make its responses shorter nor longer based on it, it'll just be cut midsentence if it crosses the limit

woven prairie
#

But the gpt model has a 128k token limit

agile cobalt
#

if it is finishing before the max tokens you set, that means the model considered its job done
either use a better model or prompt it better

woven prairie
#

My task is to make a poster and a pamphlet using some data

#

.

#

I am using gpt4o

agile cobalt
#

prompt it better then

woven prairie
#

You have worked on a prompt can you help me

agile cobalt
#

if you're expecting the model to make SVG graphics, there is a huge chance it just straight up won't work

woven prairie
#

No some where poster

agile cobalt
#

which kind of poster? can you show an example?

woven prairie
#

Ok

#

Something like this

agile cobalt
#

how do you expect for it to handle the images 🙃

woven prairie
#

Leaving the images

agile cobalt
#

also I can barely read much of the text in that image, you're probably asking too much from the LLM

I'd recommend using a model fine tuned specifically for that, preferably create one yourself if you have some (>100, preferably >1000) data pairs of HTML that generates a given image

woven prairie
#

Ok that I can not do rn but let's see what most I can get

#

Thanks for your help

fringe jay
young beacon
serene scaffold
fringe jay
#

rag keeps bringing up irrelevant stuff, or not bringing up relevant stuff

fringe jay
wet dome
#

How do you measure the perfomance of a logistic regression model?

#

At the moment im calculating accuracy, i.e. how many times it got the correct label

#

what other metrics do people use

buoyant vine
#

Normally for classification, you have Accuracy, Recall, Precision.
F1 score is normally my go-to for an 'overall' metric for how well it is doing
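
For reference, a small plain-Python sketch of how these four numbers come out of the true/false positive and negative counts. The labels are made up, and the convention of returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

On this imbalanced toy data the accuracy is 0.625 while recall is only one third, which is the usual argument for not relying on accuracy alone.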

past meteor
#

Real world problems typically require one or the other, even at the expense of the other metric

#

Hence why I typically skip F1 score and/or AUC

serene scaffold
past meteor
reef spade
#

Hello guys, I am trying to use the text-to-speech model Chatterbox to clone my own voice and read a transcript, but I'm struggling with the installation. I got it working on Google Colab, but it just doesn't work in VS Code. The line "from chatterbox.tts import ChatterboxTTS" only works on Google Colab, not in VS Code. I use the same installation commands I did on Google Colab. Can someone please help me out?

serene scaffold
reef spade
#

you mean the installation portion?

serene scaffold
#

all of this as actual text

reef spade
#

PS C:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1> & C:/Users/Admin/AppData/Local/Microsoft/WindowsApps/python3.11.exe "c:/Users/Admin/Documents/visual studio code projects/streamlit_apps/app1/file.py"
Traceback (most recent call last):
File "c:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1\file.py", line 3, in <module>
from chatterbox.tts import ChatterboxTTS
ModuleNotFoundError: No module named 'chatterbox.tts'
PS C:\Users\Admin\Documents\visual studio code projects\streamlit_apps\app1>

serene scaffold
#

@reef spade

reef spade
arctic wedgeBOT
# reef spade

Please react with ✅ to upload your file(s) to our paste bin, which is more accessible for some users.

reef spade
#

it seems to have been unsuccessful

serene scaffold
# reef spade it seems to have been unsuccessful

remember to always follow instructions from the bot.

ERROR: Could not install packages due to an OSError: [WinError 206] The filename or extension is too long: 'C:\\Users\\Admin\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python311\\site-packages\\onnx\\backend\\test\\data\\node\\test_attention_3d_with_past_and_present_qk_matmul_softcap_expanded\\test_data_set_0'

This is the issue, apparently

woven prairie
#

Can someone help me write an advanced prompt

serene scaffold
serene scaffold
#

you might also look into "windows enable long paths"

woven prairie
#

@serene scaffold I want to write a prompt that will guide the gpt4o model to write HTML/CSS code that makes a medical poster.

#

Example

serene scaffold
woven prairie
#

Yes

#

Basically I am getting some data from the rag pipeline and then I have to create a poster out of it.

serene scaffold
#

okay. well, prompting LLMs isn't rocket science. just tell it what you want and try rendering the HTML that you get.

woven prairie
#

I have been trying for a week

#

But I am not getting desired output

serene scaffold
#

what have you tried and what output are you getting?
when you ask for help, it's important that you're up front with all the information, so that we don't have to interview you to know what your question is.

woven prairie
#

So I downloaded a poster from Google, gave it to Claude, and told it to write the HTML/CSS code for it

#

Then I asked what thinking process it used to make such a design

#

It gave me its thinking process and I used that as a prompt

serene scaffold
#

@woven prairie I've never used claude. if I were doing this, I would describe what I want the page to look like from top to bottom, including the actual text, the size of the text, the color of text boxes, etc.

woven prairie
#

Ok

mortal ivy
#

hey guys, i need some help for a specific function in numpy.

serene scaffold
mortal ivy
#

i wanted to know how numpy.fft.fft() actually works, i went around here and there but could not understand

serene scaffold
arctic wedgeBOT
#

numpy/fft/_pocketfft.py line 58

```py
def _raw_fft(a, n, axis, is_real, is_forward, norm, out=None):
```
wooden sail
#

for vectors whose lengths are powers of 2, i believe the radix-2 Cooley-Tukey algorithm is the standard. in other cases, the length is factored into primes and a mixed-radix version of the same idea is used

mortal ivy
# wooden sail do you want implementation details or just basic math background?

basic understanding actually, here is the block and block after using fft --

```
 [-1198  -995  -916 ...   416   409   379]
 [  289   248   381 ...    -4    15   321]
 ...
 [ 3429  4607  3452 ... 13776 13738 13604]
 [13408  9697  4005 ...  5428  5389  5981]
 [ 6572  6834  6953 ...     0     0     0]]
```
blocks after fft:

```
[[ 2.03490000e+04+0.00000000e+00j  1.70904135e+03-1.72661480e+04j
  -6.65838677e+03+3.41195713e+04j ...  1.33128709e+03+3.95891200e+03j
  -6.65838677e+03-3.41195713e+04j  1.70904135e+03+1.72661480e+04j]
 [ 7.55290000e+04+0.00000000e+00j  6.26993787e+04+5.80008172e+03j
   9.69392154e+04-1.18256337e+04j ...  6.97357360e+04-2.70523446e+03j
   9.69392154e+04+1.18256337e+04j  6.26993787e+04-5.80008172e+03j]
 [-3.87660000e+04+0.00000000e+00j -5.49991855e+04+1.34433005e+04j
  -5.75556225e+04-4.38738493e+02j ... -5.49188931e+04-9.31813360e+03j
  -5.75556225e+04+4.38738493e+02j -5.49991855e+04-1.34433005e+04j]
 ...
 [ 4.76185000e+05+0.00000000e+00j  2.35227102e+05+1.13181706e+05j
  -1.36409643e+05+5.61953423e+05j ... -4.19050795e+04-1.94594074e+05j
  -1.36409643e+05-5.61953423e+05j  2.35227102e+05-1.13181706e+05j]
 [ 2.34655400e+06+0.00000000e+00j  7.02943400e+05-1.58216279e+06j
   6.85467802e+05+1.00273039e+06j ... -4.30924432e+05+1.08821807e+06j
   6.85467802e+05-1.00273039e+06j  7.02943400e+05+1.58216279e+06j]
 [ 4.58726000e+05+0.00000000e+00j -1.07550461e+06+7.62471567e+05j
   4.19460363e+05+1.06823585e+05j ...  4.96201820e+04-2.61617881e+06j
   4.19460363e+05-1.06823585e+05j -1.07550461e+06-7.62471567e+05j]]```
#

I don't understand at all how it's converting the arrays in the 1D Fourier transform; it's like 1 + 2j, with an imaginary number

#

but here all of them have 'j'

wooden sail
#

so what the fft does is apply a "discrete fourier transform" by using a fast algorithm

#

in general, the output of the fft is a complex-valued vector of the same length as the input vector

#

so you do expect all of the entries to be of the form a + bj

#

from wikipedia:

#

what the fft is doing is multiplying your vector by this matrix W of the appropriate size

#

if you interpret matrix-vector multiplication as taking several dot products, and the dot product as a measure of similarity, what the fft and dft do is measure how similar your input vector is to each of the rows of W

#

and the rows of W are complex exponentials of specific frequencies

#

so what the dft and fft do is decompose your input vector into complex exponentials of different frequencies
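
A plain-Python sketch of both views, using only `cmath`. `dft` is the direct "dot product with each row of W" definition, and `fft_radix2` is a textbook recursive radix-2 Cooley-Tukey version for power-of-two lengths. This is not NumPy's actual implementation, only an illustration that assumes the same sign convention as `numpy.fft.fft`.

```python
import cmath


def dft(x):
    """Direct DFT: entry k is the dot product of x with row k of the DFT matrix W."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


signal = [1, 2, 3, 4]
print(dft(signal))
print(fft_radix2(signal))
```

For real-valued input the output is conjugate-symmetric: entry `N - k` is the complex conjugate of entry `k`, and entry 0 (the plain sum of the input) has zero imaginary part. That pattern is visible in the pasted output above, where the first entry of each row ends in `+0.00000000e+00j` and the last entries mirror the first ones with the sign of the imaginary part flipped.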

mortal ivy
#

that's a lot to digest lol

wooden sail
#

this is something one would learn maybe late first year or early second year in uni if doing engineering. maybe earlier if doing maths

#

it's not exactly "elementary", but certainly very "fundamental" or important

mortal ivy
#

hmm, thanks for the explanation. lemme digest this...

wooden sail
#

In applied mathematics, a DFT matrix is a square matrix as an expression of a discrete Fourier transform (DFT) as a transformation matrix, which can be applied to a signal through matrix multiplication.

undone steppe
#

I'm working on what I thought would be a fairly simple project: trying to write a Python script that will send a prompt to a locally running LLM (Google Gemma, specifically). The request is to categorize a CSV file of bank transactions, based on a CSV file with the same format where I manually categorized a couple of previous months' transactions. I thought this sort of fuzzy logic matching would be in an LLM's wheelhouse, but it absolutely fails every time. Is my prompt incomplete? (See attached images) Is there another machine learning tool I should be using instead of an LLM?

agile cobalt
#

You must give them clear instructions and preferably keep it short, avoid rambling.

Things like "must NOT do:" will frequently backfire, tell it only what to do, and at most include "without XYZ" for some specific instruction.
If it is doing something else instead, adjust your existing instructions before adding new ones.

Those file names are also extremely questionable, at least remove those "(1)"s, ideally rename to a standard format
not to mention "will likely be", "you may use any of them to draw your inferences from"... just remove that sort of hints, either let it draw its own conclusions or filter to only include the 'likely useful' categories

Lastly,

  • they are pretty damn bad when it comes to working with structured data
  • most local models can only handle so little context

Overall, if you want to get good outputs out of them, you must give them clear and short instructions, individual short-scoped tasks, and exactly as much context as needed to complete the task
#

try picking only 2~3 examples per category, then ask it to classify one individual row with a short prompt (a single phrase)
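
A sketch of what that could look like in code: build one short prompt per row, with at most a few labelled examples per category. The transaction strings, category names, and prompt wording are all made up for illustration, and sending the prompt to the local model is left out.

```python
from collections import defaultdict


def build_prompt(labelled, row, per_category=3):
    """Build a short few-shot prompt that asks for the category of one row."""
    examples = defaultdict(list)
    for description, category in labelled:
        if len(examples[category]) < per_category:  # keep only a few per category
            examples[category].append(description)

    lines = ["Classify the transaction into one of: " + ", ".join(sorted(examples)) + "."]
    for category in sorted(examples):
        for description in examples[category]:
            lines.append(f"{description} -> {category}")
    lines.append(f"{row} ->")  # the model only has to complete this last line
    return "\n".join(lines)


labelled = [
    ("SUPERMARKET 0231", "Groceries"),
    ("FRESH FOODS CO", "Groceries"),
    ("CORNER GROCER", "Groceries"),
    ("MARKET HALL 17", "Groceries"),
    ("CITY METRO PASS", "Transport"),
]
print(build_prompt(labelled, "TAXI RIDE 8841"))
```

Calling this once per uncategorized row keeps every request small enough for a local model's context window, at the cost of more requests.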

fallow coyote
#

For NLP, is it better to use the NLTK module or spaCy (I think it's called)? Or is it a case of 'depending on what you're doing'?

turbid epoch
#

I want to get into AI/Machine Learning/LLMs etc. How can I learn (for free, if possible)?

serene scaffold
serene scaffold
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

scenic parcel
#

can you learn RL with kaggle

turbid epoch
#

you mean a certification?

serene scaffold
tawny bison
#

do i need to learn matplotlib, seaborn and plotly, these all three for data science? and also please tell from where should i learn

#

and i think as i know how to use MATLAB it would be easier as i read the syntax

serene scaffold
#

even though matplotlib is, without a doubt, the worst python library.

tawny bison
#

lmao ok

#

but i need to learn it

#

so from where should i do

serene scaffold
#

what do you mean by "need"?

tawny bison
#

i mean i'm doing an internship/training at a govt company so they want me to make some graphs using their data

slim hearth
#

ok so my friend is training his chatbot and i want to advertise it but i dont want to be interrupting any discussion here so if u want to join in the beta testing team dm me

#

also im sry for advertising here

serene scaffold
slim hearth
#

could i ask where do i do it?

serene scaffold
serene scaffold
tawny bison
#

@serene scaffold can you help me with a resource? on YouTube there are only 1.5-hour videos

serene scaffold
tawny bison
#

so is there any documentation?

#

nvm got it

#

btw thanks @serene scaffold

woven prairie
#

What actually does it do , can you explain it in an easy way?

last knot
woven prairie
#

Ok

lapis sequoia
#

i had a new idea

#

you know minecraft AI

#

heres my idea

#

so you know those physics simulators

#

so people wanted to run a physics simulator backwards in time

#

so you train that Minecraft AI on reversed physics simulator footage and you get it

#

and that way you can also play minecraft backwards in time

ruby thunder
#

Guys, can anyone recommend good data science projects for beginners? Or a website where I can practice the concepts of stats and learn in real time?

exotic star
#

I made a Khan Academy account and enrolled in alg 1, alg 2, college alg, hs statistics, college statistics, precalculus, college calculus and others, but I'm not sure in which order I should start and complete the courses

lapis sequoia
#
from typing import Dict
from fastapi import FastAPI,Depends
from pydantic import BaseModel
from utils.model import get_model,Model


app = FastAPI()

class LabelRequest(BaseModel):
    text: str
    
class LabelResponse(BaseModel):
    probabilities: Dict[str, float]
    sentiment: str
    confidence: float
    


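@app.post("/predict",response_model=LabelResponse)
async def predict(request:LabelRequest,model = Depends(get_model)):
    # model.predict is expected to return (label, confidence, per-label probabilities)
    category,confidence,pred_prob = model.predict(request.text)
    # keyword names must match the LabelResponse fields: sentiment, confidence, probabilities
    return LabelResponse(sentiment=category,confidence=confidence,probabilities=pred_prob)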
#

why will this not work with bert

wary oracle
# lapis sequoia why will this not work with bert

This code assumes that the model has a predict(text) method that returns a label, a confidence score, and probabilities. If you're using a raw BERT model from Hugging Face, it doesn't come with such a method by default; you'd need to implement that yourself. Also make sure the output is converted to plain Python types like dicts and floats, because FastAPI can't return PyTorch tensors or NumPy types directly.
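A minimal sketch of the "convert to plain Python types" step, assuming the classifier hands back one raw score (logit) per label. The label names and the `to_plain_prediction` helper are hypothetical, not part of FastAPI or transformers; with a real tensor or array you would typically call `.tolist()` on it first.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical label order; a real model defines its own mapping.
LABELS = ("negative", "neutral", "positive")


def to_plain_prediction(
    logits: Sequence[float], labels: Sequence[str] = LABELS
) -> Tuple[str, float, Dict[str, float]]:
    """Turn raw classifier scores into values that serialize cleanly to JSON."""
    values = [float(x) for x in logits]  # force plain Python floats
    peak = max(values)
    # Softmax, shifted by the max score for numerical stability.
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs[best], probs
```

A wrapper class with a `predict(text)` method could tokenize the text, run the model, and return the result of a helper like this, so the endpoint only ever sees strings, floats, and dicts.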

twilit topaz
#

It's ez on the memory it uses

#

And it has a lot of themes

serene scaffold
#

idc if it can make a plot with a million data points using only one bit. it has the least intuitive API of all time.

twilit topaz
#

Seaborn is a good alternative but it's still based on matplotlib

#

It'll consume a ton of memory (Plotly)

#

Matplotlib doesn't to me

#

Mostly kB instead of mB

calm thicket
#

when will I ever make a plot with millions of points

serene scaffold
iron basalt
ornate pebble
#

Is it ok to start with numpy since I know python but I don't remember every inbuilt function, I usually code in C++ and Javascript, but as python is easy so its syntax is easy to remember, but still there are some inbuilt functions which I don't remember... so can I start with numpy, pandas and data visualization or should I memorize those inbuilt functions first

serene scaffold
vestal moth
#

Just curious. Are there any open source Action Recognition CV models in the wild nowadays? I’m aware of SlowFast, X3D, and ActionClip. Some of those libraries seem outdated due to download issues so I’m not sure if that field of CV is still being researched. Unless those are SOTA and difficult to improve. I assume it’s due to other models (object detection and VLMs) being popular at the moment but I wanted to get someone’s view with more experience in this than me.

rich bridge
#

im doing a competition for crypto price prediction and were given like 500k rows and 800 features, some of which are anonymized, and i have to predict an arbitrary label related to the return, does anyone have any advice for approaches to try? its my first time doing a competition like this

jaunty helm
rich bridge
jaunty helm
#

like if you actually want to learn what realistic approaches you can use, trying one of the beginner competitions will be way more helpful

rich bridge
rich bridge
#

like placed top x percent

jaunty helm
rich bridge
#

alright i’ll just submit this last run that’s going and then wrap this up probably, thank you!

pure yacht
#

not sure if this or #algos-and-data-structs is suitable but

i have a dict where the keys are human readable model names of a range of devices, and i've been provided multiple spreadsheets that contain model names and relevant data pertaining to each model

my issue right now is being able to map the names from these spreadsheets to an entry in the dict as the way each model is written will differ slightly (not always a space between its name and generation, casing, + symbol instead of 'Plus', etc). i've tried using difflib but it's not accurate

does anyone know a good solution to make mapping these values consistent and correct? i have a vague idea where i convert each name into a set of individual words, and i perform a difference between each sets to find the one with the least amount but i'm yet to try this out

#

if there's any better solution/implementation that exists i'd love to hear about it
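The word-set idea above can be sketched with just the standard library, assuming the differences are limited to casing, spacing between name and generation, and `+` versus `Plus`. The function names and the 0.6 threshold are illustrative assumptions; the score is Jaccard similarity (shared tokens divided by all tokens) rather than a raw set difference, so longer names are not penalised.

```python
import re
from typing import Dict, FrozenSet, Optional


def normalize(name: str) -> FrozenSet[str]:
    """Lowercase, spell out '+', and split letters from digits into tokens."""
    text = name.lower().replace("+", " plus ")
    # "s7" becomes {"s", "7"}, so "S7" and "S 7" normalize identically.
    return frozenset(re.findall(r"[a-z]+|\d+", text))


def best_match(
    raw: str, canonical: Dict[str, object], min_score: float = 0.6
) -> Optional[str]:
    """Return the dict key whose token set overlaps most with `raw`, or None."""
    target = normalize(raw)
    best_key, best_score = None, 0.0
    for key in canonical:
        tokens = normalize(key)
        union = target | tokens
        score = len(target & tokens) / len(union) if union else 0.0
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= min_score else None
```

Returning `None` below the threshold leaves the doubtful names for manual review instead of silently mapping them to the wrong device.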

pure pond
#

im moving abroad and learning the local language. I wanna do a little thing where I wear a small mic, then at the end of each day review my speech for grammar and vocab improvements, to get to native fluency in a much more efficient way. like having a floating ghost coach following me around and giving a class reviewing the day. would I need to finetune a stt model to pick out specifically my voice? or is that not necessary? once I have that I can just feed the dialogue into any good enough llm

serene scaffold
fair solar
#

which one are you using

#

did just adding the timestamp as a feature not help? (or some normalization of it)

glacial root
# serene scaffold No

since it's subword tokenization, would it be fine to go about this method but then just remove the tokens with spaces after?

serene scaffold
#

or the other way around?

#

each token has one "head subtoken" and zero or more "trailing subtokens". and the trailing subtokens start with ## when you look at each subtoken individually.
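As a small illustration of the head/trailing convention, here is a sketch that glues `##` pieces back onto the preceding head subtoken. `merge_wordpieces` is a hypothetical helper, not a function from any tokenizer library.

```python
from typing import List


def merge_wordpieces(subtokens: List[str]) -> List[str]:
    """Rebuild whole tokens from WordPiece-style subtokens."""
    words: List[str] = []
    for piece in subtokens:
        if piece.startswith("##") and words:
            # Trailing subtoken: append to the current word without the marker.
            words[-1] += piece[2:]
        else:
            # Head subtoken: start a new word.
            words.append(piece)
    return words
```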

glacial root
glacial root
serene scaffold
#

not necessarily. lemmas can still have subtokens.

glacial root
#

i thought with bpe or word piece, it just tokenizes based on the most common pair, and with word piece also accounts for the frequency of the subparts of each pairing

serene scaffold
#

it's been several days, so I don't remember what you're trying to do.

glacial root
#

just trying to implement bpe and word piece tokenizers correctly

#

input of a corpus and outputting an array with the tokens
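For reference, a minimal sketch of the BPE half, assuming whitespace-separated words and character-level starting symbols. It leaves out end-of-word markers and byte-level handling, and the function names are illustrative. WordPiece training follows the same loop but ranks pairs by a score that also divides by the frequencies of the two parts, rather than by the raw pair count.

```python
from collections import Counter
from typing import List, Tuple


def merge_pair(symbols: Tuple[str, ...], pair: Tuple[str, str]) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus: str, num_merges: int) -> List[Tuple[str, str]]:
    """Learn merge rules by repeatedly merging the most frequent adjacent pair."""
    words = Counter(tuple(w) for w in corpus.split())
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def encode(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Tokenize one word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Running `learn_bpe` on the corpus and then `encode` on each word gives the array of tokens described above.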

fair solar
#

makes sense, what sort of control are you expecting?

#

diminishing importance of timestamp feature or something (with time)?

#

without needing to recluster completely i mean

#

is that even possible, would be interesting to see

calm cipher
#

what clustering method are you currently using?

#

oh ha yes you answered my next question, which was how to do you determine the number of clusters

#

I'm assuming the main issue with this is that each message embedding is computed independently of the others, so you might have some messages that aren't grouped in the correct cluster even if they should be?

#

got it, that's a cool idea

hollow pagoda
#

they should add a trading channel to the server

mighty knot
#

Hey, I'm currently building my own neural network from scratch only using NP. It's fully functioning with backprop, but I'm wondering if I made a mistake in how my weight matrices are stored.

Currently, they are stored as each column being a different neuron, and each row being a weight. So N X M, where N is rows (weights), and M is columns (neurons). Should it be the other way around?

mystic willow
mighty knot
#

Yeah just some stuff changes. Like my layer output is calculated as x dot w instead of w dot x. And I don’t have to transpose the matrix ever.

Just wanted to make it standard so it’s easier to debug, but idk what the standard is.
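Both layouts are in common use: PyTorch's `nn.Linear` stores weights as (outputs, inputs) and multiplies by the transpose, while Keras' `Dense` stores its kernel as (inputs, outputs) and computes `x @ W`. A small sketch in plain Python (lists instead of NumPy, so it runs anywhere; the helper names are made up) showing that the two give the same layer output:

```python
from typing import List

Matrix = List[List[float]]


def vecmat(x: List[float], w_in_out: Matrix) -> List[float]:
    """x @ W with W shaped (inputs, outputs): one column per neuron."""
    n_out = len(w_in_out[0])
    return [sum(x[i] * w_in_out[i][j] for i in range(len(x))) for j in range(n_out)]


def matvec(w_out_in: Matrix, x: List[float]) -> List[float]:
    """W @ x with W shaped (outputs, inputs): one row per neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_out_in]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


# 3 inputs, 2 neurons, stored column-per-neuron as described above.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 0.0, -1.0]
same = vecmat(x, W) == matvec(transpose(W), x)
```

So neither layout is a mistake; what matters for debugging is picking one and keeping it consistent through the forward and backward passes.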

woven prairie
#

What is the maximum number of lines of code that GPT-4o can write?

#

I am using the GPT-4o API and it's not writing more than 200 lines of code

#

Has anyone faced the same issues

upper bridge
#

Okay, so I was thinking of proposing an AI localisation plan for our local government. The idea is to take an open-source model like Qwen3 and fine-tune it on the local languages spoken in these areas. I have already gathered the data: each language has around 200,000 rows containing sentences, question-answer pairs, poems, stories, etc. We have two Nvidia DGX Stations with A100 GPUs, so I would train and fine-tune the model there, and I was thinking we would just run it on that same machine. But when I checked with ChatGPT, it said we would rather have to move the model onto smaller server-class GPUs and run it on a server with lots of GPUs so that multiple users can use it. Does anyone have expertise here? Could you tell me how to plan this out or go about it?

glacial root
#

where can i find a good n-grams dataset that is not more than 1 gb in size?

toxic mortar
#

Check out graph structure

tawny bison
#

i am not able to plot avg time diff on matplotlib, can anybody help

toxic mortar
rain kelp
#

Can someone help me decode binary files in to readable text where I can extract information from? It’s pretty challenging and I have been scratching my head for the last week trying to get a solution. These files come from a game and are stored locally in my pc. When I convert them into hex and then pass a reader on them I get very strange values like level being over 1million or coordinates being also like that. The main problem is probably due to my TLV reader. If someone can help me with this I will be very grateful because I have tried the last week to solve it. Please ping me
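One common cause of absurdly large values is reading multi-byte integers with the wrong byte order, so that is worth ruling out first. The sketch below assumes a layout of 1-byte tag, 2-byte length, then the value; that layout is an assumption for illustration, not the game's real format, and `read_tlv` is a hypothetical helper.

```python
from typing import List, Tuple


def read_tlv(data: bytes, byteorder: str = "little") -> List[Tuple[int, bytes]]:
    """Parse records of the assumed form: 1-byte tag, 2-byte length, value."""
    records, pos = [], 0
    while pos + 3 <= len(data):
        tag = data[pos]
        length = int.from_bytes(data[pos + 1:pos + 3], byteorder)
        start, end = pos + 3, pos + 3 + length
        if end > len(data):
            # A length that runs past the file usually means a wrong layout guess.
            raise ValueError(f"record at offset {pos} claims {length} bytes")
        records.append((tag, data[start:end]))
        pos = end
    return records


# The same four value bytes read as a little- and big-endian unsigned integer.
sample = bytes([0x2A, 0x00, 0x00, 0x00])
as_little = int.from_bytes(sample, "little")  # 42
as_big = int.from_bytes(sample, "big")        # 704643072
```

If a field that should be a small level number comes out in the millions, trying the other byte order (or a different field width) on the raw bytes is a quick check before rewriting the whole reader.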

toxic mortar
#

Okay, crawl walk run

walk would be entity resolution, something you see in incremental KG construction. You can see examples in graph-rag literature

uneven pawn
#

I was trying to move some files to another folder from my current dir but to overwrite existing files, tried mv and mv -f, didn't work, asked chatgpt for a command to overwrite move

#

It rm -rfed my entire folder 💀

#

"Safest and cleanest way" it said

serene scaffold
#

@uneven pawn this channel is for talking about how to implement AI, not about using AI products. But you shouldn't use LLM-generated code that you don't understand for anything that can't be undone.

peak field
#

Hey guys - I was hoping you could help me find a dataset of images of sleep deprived people? I dont think that exists but an alternate solution was for dark circles under the eyes or swollen eyes or any other indicators (e.g. drooping faces). How would I go about specifically recognizing those features?

#

Maybe some helpful links? or youtube videos?

#

Technically if I just fed it pictures of a sleep deprived person, it would recognize those features itself but thats not really what im tryna do

#

lol

#

not that bad of an idea actually

#

Hey, just a question tho, the learning wouldnt be that efficient if its like basically the same pictures tho right?

#

that is tru tho

#

Actually i found a research paper so 🤞

#

incase anyone else might need it

#

lol

#

thanks

#

it would actually be really cool if i made my own dataset

#

never even considered this! thanks!

#

man there are so many innovative ways to create new data lol

#

Wait i dont understand how that works, monte carlo simulations are probablistic models for like all possible outcomes tho right? Can you explain how i could use it in this?

#

If i remember correctly

modest vigil
#

What version of pandas/numpy do you guys use? I'm having memory leaks with them.

serene scaffold
peak field
#

OH MY GOD. I dont think i could ever do that

#

But i will try my best

#

Wait so you track the path of a photon with random simulations but how does that actually apply to the image?

shadow atlas
#

Hey. I am having issue with Google Gemini API key.
My Gemini API was working perfectly. After some minutes i kept getting error of it being invalid. Error 400. What can be the possible issue?

shadow atlas
modest vigil
woven prairie
#

Anyone can guide me

#

I want to get in gen ai field

#

How can I get , where to learn

#

What you mean by school

#

Start from data science

#

Maths

exotic star
#

Which khan academy math courses should i do and in what order?

#

also should i dedicate all of the time to math or learn and do projects with pandas, numpy and matplot working with data

dreamy rapids
#

when feeding a perceiverio like network entities

#

is it smart to just feed it all the data you possibly have about the entity or is it still smart to try to filter it down to what's most useful?

#

(please ping me on reply, i check here once every eon basically)

hallow badger
#

What is the meaning of the tag called snek?

serene scaffold
ruby thunder
#

Hi, I am thinking of creating a RAG-based perfume recommender from a particular website. Can anyone guide me on what I can use and how I can build it, and also send some resources I can use? I want to make it completely free

ashen blaze
#

oh masters of Python data science community.

#

Please Bestow Upon me the knowledge of data science

#

How should I learn Pandas and Numpy?

calm thicket
#

read the tutorial on their websites

shadow viper
#

reading docs help actually

queen raft
#

hi i am a beginner how start with ai and neural networks

exotic star
#

is it enough to get u started?

#

i plan on doing projects after that

grand minnow
edgy niche
#

hey guys any roadmap to learn data analysis pls

mossy pond
#

Science and Ai ... i code python with help of Ai... with a bit of brain works well 🙂

edgy niche
#

huh

#

what is bro YAPPING about

grand minnow
clever void
#

what do you need differential calculus for

#

the majority of calculus 3 and upward is hardly ever seen

mossy pond
#

no nooooo ... learn what is possible with Ai-LLM and coding
I think all coders are afraid ... because coding is a language like English .. and now all people around the world are able to code ^^

clever void
#

most of the calculus derivations that I see rest in the appendices of papers and not in the main portions

clever void
#

they are not able to code

#

LLMs are able to code

#

humans are just mediators in that case

mossy pond
finite surge
#

can i sell ai agents for lots of money

serene scaffold
finite surge
#

i thought u could sell it to companies

serene scaffold
finite surge
serene scaffold
finite surge
#

after hs

serene scaffold
finite surge
mossy pond
serene scaffold
finite surge
serene scaffold
finite surge
mossy pond
finite surge
serene scaffold
serene scaffold
mossy pond
#

all coder s are afraid of Ai ... because its easy now to get a small part of code working now for every one 😉

serene scaffold
mossy pond
#

you train LLM for coding?

serene scaffold
#

yes

mossy pond
serene scaffold
mossy pond
#

maybe one kill question for your model ... windows, pyinstaller, internal restart of exe file ... I'm glad I have to ask a human for this kind of problem ^^

finite surge
serene scaffold
buoyant vine
#

also, I will say, just because the AI can spew out some code doesn't mean you don't need to understand what it is doing

#

I have seen so many cases of people using AI to generate their product and it is naturally filled with issues and security leaks

serene scaffold
#

(and the "developers" are powerless to fix it, without developing the skills they would have needed to produce it on their own.)

clever void
#

you didn't code

#

the LLM did it

ripe vortex
#

Hi, I'm using LangChain to build a RAG application, but the web loader doesn't support Supabase storage, so do I have to change my file provider?

past meteor
#

AI slop PRs that introduce 500 LoC for something that can be done in 50

calm cipher
#

I noticed an odd quirk of agent generated code is that it likes to be extremely unnecessarily defensive

#

instead of just opening a file, it will implement a loop that attempts to load the file 10 times with elaborate error checking and tracking before giving up

#

as though a missing file might somehow mysteriously appear

past meteor
#

Yeah, good observation. It'll validate inputs over and over

#

When it's much better design to validate stuff once at the edge and then to assume it'll be there
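A small sketch of "validate once at the edge", assuming the untrusted input arrives as a dict of strings (for example from a request body). The `Order` type and both functions are hypothetical, only meant to show where the checks live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    item_id: int
    quantity: int


def parse_order(raw: dict) -> Order:
    """The edge: check untrusted input once and fail loudly if it is bad."""
    item_id = int(raw["item_id"])    # KeyError / ValueError surface here
    quantity = int(raw["quantity"])
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return Order(item_id, quantity)


def total_price(order: Order, unit_price: float) -> float:
    # Inner code trusts the validated type: no re-checking, no retries.
    return order.quantity * unit_price
```

Everything past `parse_order` can assume the fields exist and are sane, which is what keeps the inner functions short.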

calm cipher
#

that could actually be dangerous if the code is for a really high-traffic web site or something

#

if there's some error in the system that makes an expected file unavailable, it's probably super taxing for the host computer to deal with, and if lots of incoming requests are all asking for it you could end up overloading the system with thousands of unnecessary open attempts

#

totally nuts

#

also i've seen code interacting with DataFrames constantly checking for the existence of columns before interacting with them, as though the person submitting the work didn't know if those columns were there

tidal bough
#

this wasn't my experience; I often see code that's too optimistic

calm cipher
#

It probably depends on which LLM you're using, and how you word the prompt, and random chance

glacial hemlock
#

good day !

#

somebody has a road map for data science ?

livid cipher
#

What you mean road map?

shadow atlas
#

How can we monetize mcp servers of our own? Any idea guys?

Please reply to this message so that i can read later.

agile cobalt
raw sorrel
#

Hello guys, I want to ask something: if I want to learn data analysis, can you recommend a free course or resource for learning about it?

please reply to this message if you have recommendation

shadow atlas
raven garden
#

hi, I hope you are all doing well. I wanted to know if someone could guide me on how to set up physionet (.org). I want to use public healthcare datasets, but its not as straightforward as I thought ( setting up agreements, credentialing, ... ). is someone familiar enough with physionet to guide me on the steps I should take to be able to use any dataset on this plateform ? thanks !

raven garden
raven garden
raw sorrel
raven garden
raw sorrel
#

ahh i see, thanks bro for your recommendation

raven garden
#

no problem!

exotic star
#

is learning pandas and doing pandas projects a good ML start?

fallow coyote
exotic star
#

not sure what level of math u need for this thats why im not sure which course

fallow coyote
#

Make sure you know your high school maths well if you don't already. Learn linear algebra, stats, probability and calculus. You can use Khan Academy for linear algebra and calculus. I used ISLP for the statistics side. Use Python for Data Analysis by Wes McKinney to learn the basics of the modules you'll use for ML and statistical analysis. Start from there: research, practice, get angry, research, practice, bash your head against the wall, and repeat

quaint mulch
exotic star
#

got it. thanks a lot

shy otter
#

If anyone is Agentic coding, I came across this service recently; gives $100 of free Anthropic credit to use Claude Code.

BIG Disclaimer : This is an affiliate link. Other than gaining credit I am not involved with this in any way.
I will just gain some free credit on their platform, but you also get double credit for signing up ($100 instead of $50) - So win/win.
https://anyrouter.top/register?aff=zb2p

You can strip off my affiliate if you want, but you'll only get $50 credit, not $100.

This is some Chinese site - probably not legitimate, and is probably scraping your code/data. It will probably disappear as quickly as it appeared, so I'm using it to test some ideas in Claude Code - Not for any production code, or sensitive data. It is a great way to 'try before you buy', but if you use this you do so knowing the potential risks

The instructions are translated to English in the attached.

arctic wedgeBOT
trim nymph
#

just start with a project you would like to complete with ml related solution

gaunt rover
#

Hey everyone! I'm Fahad. I started learning Python a few weeks ago for Data science AI & automation—still super new, but trying to stay consistent. I'm looking for a coding buddy or small group to learn and maybe build some mini-projects with. If anyone's interested, feel free to DM me or reply here! 🙂

tawny bison
#

do i need to master numpy pandas and matplotlib as i just know the basics i mean the main commands and have to do seaborn and plotly

#

also can anybody suggest some seaborn and plotly resources

serene scaffold
quartz zodiac
#

I have built a model classifier on the wine prediction data set in Kaggle. I want to make custom predictions using that model on my React website. Do I need to dump the entire model or just the model parameters into a file and load that model(from the file) on a flask server and send the input wine data by a request to the server, make the prediction and then return the answer. How would it work?

tawny bison
#

so you just read the whole documentation @serene scaffold

serene scaffold
tawny bison
#

dm'ed you the way i'm doin

serene scaffold
serene scaffold
tawny bison
quartz zodiac
#

I think i dont

serene scaffold
#

so you can't really save the parameters as a separate thing from the model, in a sense that is useful.

quartz zodiac
#

So if I want instant results I should save the entire weight along with the models, the training parameters, (ie the complete model itself)

serene scaffold
quartz zodiac
#

Yes

#

I get it now... actually

#

hyper parameters are pre training

#

and training parameters are during training that are calculated

serene scaffold
zealous girder
#

I've entered into a kaggle competition, and i'm doing all the training and stuff on my machine locally and treating it as a serious project. What sort of project structure (directory structure) should I have?

peak field
#

Hey guys

#

Any datasets for whether a person is walking or not

#

Basically stationary vs moving?

#

And using smartphone if possible but really just an accelerometer

#

(gyroscope if necessary)

fair solar
#

which ones are you using btw

warped notch
#

any recommendation on which book (any resource) to learn data mining?

fair solar
#

oh makes sense. so wait, claude recalling stuff well is turning out to be a problem?

#

I reread this but still didn't understand it well enough ig

#

ah, so claude is just being fed with everything it needs and is only used for the final step i'm guessing

#

so how does it's temporal awareness affect you

#

makes sense

#

yeah, that makes sense

somber willow
#

guys do you know where i can learn pytorch

odd meteor
# somber willow guys do you know where i can learn pytorch

Welcome to the most beginner-friendly place on the internet to learn PyTorch for deep learning.

All code on GitHub - https://dbourke.link/pt-github
Ask a question - https://dbourke.link/pt-github-discussions
Read the course materials online - https://learnpytorch.io
Sign up for the full course on Zero to Mastery (20+ hours more video) - https:/...

#

Then, if video content isn't your thing, check the tutorial section on PyTorch website

toxic sigil
#

hi guys
as a machine learning beginner trying to make stock market price predictions or any project in general
how do i learn libraries and know how to use them on my own and come up with ideas to use them for a project
i dont want to copy youtube videos etc. i want to use them on my own so i wanted to know if anyone has experience tell me the workflow or learn method

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import root_mean_squared_error

jaunty helm
toxic sigil
drowsy mesa
#

Hello everyone i am new developer how can I use python for security purpose can any say it....

odd meteor
drowsy mesa
#

In ai agent for security stored file

#

If I am uploading the file in backend and the ai reads It.. And I want to make sure the data should be secure

tranquil jasper
#

hi
for someone to get into computer vision, what do we need to learn beforehand?

agile cobalt
fallow coyote
#

anyone recommend any good commodity apis with a decent free plan? need to extract price data of gold, silver, platinum and copper

drowsy mesa
#

What package can I use

agile cobalt
#

that is not something a package can solve - no matter which tool you use, if you misconfigure things you'll create a security vulnerability

drowsy mesa
#

We can encryption right

drowsy mesa
sharp crow
fresh sluice
#

Anyone who is working in AI field here?

serene scaffold
fresh sluice
serene scaffold
fresh sluice
neon owl
#

Anyone working in the AI/ML field? I am in my second year going into my 3rd year, which is the internship year. My resume isn't that strong. I am studying informatics, but I want an AI/ML internship, ideally an international one

Any recommendations for how to standout ?

serene scaffold
rocky swan
#

Is there anyway I could fine tune an LLM with only outputs as it's training data or do I need to make an LLM generate the input data and add it to the dataset or something.

serene scaffold
calm cipher
#

This is true for question answering models but a lot of LLMs are just predicting the next token in a sequence, which you can train with only a giant pile of textual data

#

So in that case the input would be a document up to word t, and the output is word t+1

#

If that's what you want to do, and say you're starting with a pre-trained model on Hugging Face, you would fine-tune the causal variant of the model
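To make the "input up to word t, output word t+1" idea concrete, here is a sketch that turns a plain token list into training pairs without any separately written inputs. Splitting into word-level tokens and the `context_size` parameter are simplifying assumptions; real pipelines use subword tokens and batch this inside the training library.

```python
from typing import List, Tuple


def next_token_pairs(
    tokens: List[str], context_size: int
) -> List[Tuple[List[str], str]]:
    """Build (context, next token) examples from raw text alone."""
    pairs = []
    for t in range(1, len(tokens)):
        # Keep at most `context_size` tokens before position t.
        context = tokens[max(0, t - context_size):t]
        pairs.append((context, tokens[t]))
    return pairs
```

Every position in the text supplies its own label, which is why a pile of output-only documents is enough for this kind of fine-tuning.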

serene scaffold
stray void
#

Are there libraries available that implement geodesic walking on an STL surface, given some travel direction? I know there are libraries that can calculate the geodesic distance between 2 points, but what if you don't know the endpoint in advance?

drowsy mesa
#

Instead of rag what can we use ?

quaint mulch
gaunt apex
#

Hi!
I'm training an OCR model (CRNN/Easter2 architectures) and getting inconsistent results on Kaggle despite using:

  • Same dataset and preprocessing
  • Same code/hyperparameters
  • Same random seeds
  • Previously got good CER performance, now stuck at 70%+ with repetitive predictions

The model gets stuck outputting repetitive character patterns instead of learning to read text properly, even with different seeds and learning rates.

Has anyone experienced:

  • Different OCR training behavior between Kaggle sessions?
  • Model collapse (repetitive predictions) with CRNN/Easter2 on P100s?
  • Memory constraints affecting OCR convergence?
  • Different PyTorch/CUDA behavior on Kaggle vs other platforms?

Could Kaggle's P100 GPU environment be causing this? Any insights on GPU-specific OCR training issues would be helpful!

Hardware: Kaggle P100
Framework: PyTorch
Models: CRNN, Easter2
Task: Text recognition

serene scaffold
fresh sluice
drowsy mesa
quaint mulch
fresh sluice
opaque condor
#

could anyone look over this code and tell me what I'm doing wrong? I have done this 3 times, but it's better if I have a person who specializes in convolutional networks look at it:
https://paste.pythondiscord.com/C52A

sharp crow
#

Oi lads I have a question
I am working on a simple regression problem and my scores are
TRAINING SCORE-0.999999
TESTING SCORE-0.999999
R2_SCORE- 0.999999
MSE- 4.52
RMSE- 2.13
These scores look too perfect to me

#

Any one who can help me

sharp crow
#

Nvm found it

quaint mulch
# fresh sluice Oh understood, are these some research papers?

Kinda, those are academic conferences. Research papers are published in many places, including those. What I'm trying to say is that publishing in top AI conferences would make you really stand out. It's very difficult, but I have seen people do it. This is one of the most "standout" things you can do.

fresh sluice
quaint mulch
#

what's "uk building" ?

grand minnow
quaint mulch
#

writing papers usually also involves building stuff.
Another alternative would be to win competitions in places like kaggle / aicrowd