#data-science-and-ml


glacial root
#

yeah i was told previously that there were times that he had his model train overnight

#

why is numpy not optimal

#

oh you mean compared to c/c++?

iron basalt
iron basalt
glacial root
#

and how come python is the preferred language over c++

#

even if c++ is more efficient

#

is it just cause of simplicity

iron basalt
#

Because C++ came from the 9th ring of hell.

glacial root
#

lol

#

for real the people who made python are doing gods work

#

using a diabolical language to make a not diabolical language

iron basalt
#

Also, you won't be doing any of this manually; you'll use something like PyTorch instead. It technically also exists for C++ (LibTorch), but there won't be any difference except the extra pain of using C++.

#

So since you have a giant loop in plain Python, the solution is either to get rid of it by turning it into NumPy array operations, or to reduce the iteration bounds (make it smaller).

#

for i in range(50000): is a red flag for Python performance.
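A minimal sketch of what "turning it into NumPy stuff" can look like. It assumes the loop computes a squared-error cost per training example; the array names, shapes and the cost itself are hypothetical stand-ins, not the code from this conversation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_outputs = 50_000, 10
# Hypothetical network outputs and one-hot targets for every training example
outputs = rng.random((n_examples, n_outputs))
targets = np.eye(n_outputs)[rng.integers(0, n_outputs, size=n_examples)]


def average_cost_loop(outputs, targets):
    """Plain-Python loop: one interpreter round-trip per example."""
    total = 0.0
    for i in range(len(outputs)):
        total += float(np.sum((outputs[i] - targets[i]) ** 2))
    return total / len(outputs)


def average_cost_vectorized(outputs, targets):
    """Same arithmetic on whole arrays; the looping happens inside NumPy's compiled code."""
    per_example = np.sum((outputs - targets) ** 2, axis=1)
    return float(per_example.mean())
```

Both functions return the same number; the vectorized one replaces 50,000 Python-level iterations with a few whole-array operations.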

iron basalt
glacial root
#

but isn't it necessary cause that's all the training examples

#

and it's different for each image cause it depends on that image's activations

iron basalt
# glacial root i see

Well first, are you actually using this value in your updates? If it's just there to give you a measurement to print while training you don't need to compute this every iteration.

#

Also in that case it does not need to be exactly correct, you can do less than all of them since it's just to give you a feel.
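If the average cost is only printed to monitor training, a sketch like this estimates it from a random subset of examples. `cost_of_example` is a hypothetical callable standing in for "run the forward pass on example i and compute its cost"; `toy_cost` is made-up data.

```python
import random


def estimate_average_cost(cost_of_example, n_examples, sample_size=1000, seed=None):
    """Approximate the average cost by evaluating only a random subset of examples.

    cost_of_example(i) is called sample_size times instead of n_examples times.
    """
    rng = random.Random(seed)
    indices = rng.sample(range(n_examples), min(sample_size, n_examples))
    return sum(cost_of_example(i) for i in indices) / len(indices)


def toy_cost(i):
    # Hypothetical stand-in for "forward pass on example i, then its cost"
    return (i % 10) / 10


full = sum(toy_cost(i) for i in range(50_000)) / 50_000
approx = estimate_average_cost(toy_cost, 50_000, sample_size=1000, seed=0)
print(f"full: {full:.4f}  estimate from 1000 examples: {approx:.4f}")
```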

glacial root
#

cause i gotta get an average across all training examples

#

also the model will still train even if my pc goes into sleep mode due to inactivity right

iron basalt
glacial root
#

i've been away from my computer for a while

glacial root
#

if i only do 1 then wouldn't it just always classify everything as the number that is in the one training example

iron basalt
glacial root
#

what's a gaussian

#

just a cluster?

iron basalt
#

Bell curve.

glacial root
#

i see

#

oh wait what

#

bell curve?

#

oh wait i see what you mean

#

the peak of the bell curve is where the cluster is

iron basalt
#

I'm describing how these points were spawned; they tend to be mostly near the center.

#

And falls off exponentially.

bold rapids
#

Does anyone know how i can find rows where the LRank is greater than the WRank
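A minimal pandas sketch, assuming the data is in a DataFrame with numeric `LRank` and `WRank` columns (the `Winner` column and all values are made up):

```python
import pandas as pd

# Hypothetical match data with the two rank columns from the question
df = pd.DataFrame({
    "Winner": ["A", "B", "C"],
    "WRank": [5, 20, 3],
    "LRank": [12, 8, 40],
})

# Boolean mask: True on rows where LRank is greater than WRank
mask = df["LRank"] > df["WRank"]
rows = df[mask]
print(rows)
```

`df.query("LRank > WRank")` returns the same rows. If the rank columns were read in as strings, convert them first, e.g. with `pd.to_numeric(df["LRank"], errors="coerce")`.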

iron basalt
# iron basalt No, let put it this way. If I had a program that took in a bunch of 2d points th...

So say these points are your dataset. Initially you may have, say, a single point estimate, randomly chosen, so you can imagine there is some green point somewhere in this plot. Through many iterations, we adjust it a little bit each time: during each iteration, we get a random point and move our point a little bit towards it (interpolating by alpha times the distance between them). Now if you keep doing this, where do you think the green point will end up, roughly?
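The thought experiment can be simulated in a few lines of plain Python. This is a sketch under assumed settings (a Gaussian cloud of 500 points, alpha = 0.001, 50,000 iterations); none of these numbers come from the conversation.

```python
import random

rng = random.Random(42)

# Points "spawned" from a 2-D Gaussian: mostly near the center, thinning out further away
center = (3.0, -2.0)
points = [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(500)]

alpha = 0.001                                             # fraction of the distance to move each step
estimate = (rng.uniform(-10, 10), rng.uniform(-10, 10))   # random starting "green point"

for _ in range(50_000):
    px, py = rng.choice(points)                           # pick a random data point
    ex, ey = estimate
    # Interpolate: move alpha of the way from the estimate toward the picked point
    estimate = (ex + alpha * (px - ex), ey + alpha * (py - ey))

mean = (sum(x for x, _ in points) / len(points), sum(y for _, y in points) / len(points))
print("estimate: ", estimate)
print("true mean:", mean)
```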

glacial root
#

do you mean that you start off with these datapoints, plus your random estimate, and then you get a random point which you move your estimate towards?

iron basalt
glacial root
#

oh from the data

#

oh then wouldn't that give you an average across all the data?

iron basalt
#

Yeah.

#

But note we did not ever like, sum them all and then divide by N.

glacial root
#

(i'm not gonna lie, i don't know how that works)

#

like i kind of guessed it and kind of see how it would make sense

iron basalt
#

It's pretty intuitive, that is why I chose simple 2D points.

glacial root
#

but i don't really fully understand how that results in an exact average

#

or is this an approximate

iron basalt
#

It's about convergence.

#

Imagine it's given infinite time to "settle."
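Whether the result is exact or approximate depends on the step size. Writing the estimate after t picks as m_t and the t-th picked point as x_t, the update described above is:

```latex
\begin{aligned}
m_t &= m_{t-1} + \alpha_t\,(x_t - m_{t-1}) = (1-\alpha_t)\,m_{t-1} + \alpha_t\,x_t \\
\alpha_t = \tfrac{1}{t}:\quad t\,m_t &= (t-1)\,m_{t-1} + x_t
  \;\Longrightarrow\; m_t = \frac{1}{t}\sum_{i=1}^{t} x_i \\
\alpha_t = \alpha:\quad m_t &= (1-\alpha)^t\,m_0 + \alpha\sum_{i=1}^{t}(1-\alpha)^{t-i}\,x_i
\end{aligned}
```

With the shrinking step 1/t, induction on t·m_t = (t-1)·m_{t-1} + x_t gives the exact mean of every point picked so far, while only ever storing the current estimate and the count. With a constant α the starting guess fades out geometrically and recent picks weigh more, so the estimate settles near the mean and keeps jittering around it; a smaller α means a smaller jitter.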

glacial root
#

oh wait yeah that makes sense now

iron basalt
#

If you only do like 3 iterations, probably way off.

glacial root
#

yeah not sure why i didn't see it completely before

#

so it converges towards the average

iron basalt
#

And note what happens when you pull too hard towards a point too.

glacial root
#

and probably never gets to the exact average

#

but gets super close

iron basalt
#

If I pull with max strength, basically setting our point to the random one, it will just keep jumping everywhere, the result is not really a mix of all the points.

glacial root
#

i see

#

so these points are like our training examples

iron basalt
#

So instead we say like, move about 0.001 of the way there.

#

Like a small % of the way there.

glacial root
#

and how much we pull towards each point is our learning rate

iron basalt
#

Yes.
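A sketch of the effect of the pull strength, using a 1-D version of the same loop with two data points at -1 and +1 (assumed values chosen so the true average is 0). `final_estimates` is a hypothetical helper that repeats the whole loop many times and collects where the estimate ends up.

```python
import random
import statistics


def final_estimates(alpha, runs=200, steps=2_000, seed=0):
    """Run the 'pull toward a random point' loop many times; return the final 1-D estimates."""
    rng = random.Random(seed)
    data = [-1.0, 1.0]                      # two points; their average is 0
    finals = []
    for _ in range(runs):
        estimate = rng.uniform(-5, 5)
        for _ in range(steps):
            target = rng.choice(data)
            estimate += alpha * (target - estimate)   # move alpha of the way there
        finals.append(estimate)
    return finals


for alpha in (1.0, 0.01):
    finals = final_estimates(alpha)
    print(f"alpha={alpha}: mean={statistics.fmean(finals):+.3f}, "
          f"spread={statistics.pstdev(finals):.3f}")
```

With alpha = 1.0 each run simply ends on whichever point was picked last, so the results are spread out at ±1; with alpha = 0.01 they cluster tightly around 0.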

glacial root
#

what type of neural network is this

#

or is it still feed forward

iron basalt
#

Not a neural network really, more broad.

glacial root
#

so it's more so just a way of averaging across large datasets more efficiently

#

man sometimes i worry

#

like earlier when i didn't see right away how it converges to the average

#

now i feel like that's just common sense

#

but i couldn't think of it

iron basalt
#

Well, now let me ask this, if you have a neural network with a single neuron, and you are "pulling" based on one of two labels (binary classification problem), rather than input point itself, could you use this idea? What would "pulling" be in this case? When we were pulling towards other points we relied on some idea of "distance" or "difference" between the points (where we currently are and some kind of "target"), and moved part of the way towards it.

glacial root
#

so then it would be pulled toward a side

iron basalt
# glacial root well if it's a binary classification then we would probably have a threshold sim...

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on...
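A minimal sketch of the classic perceptron learning rule as an instance of the "pull" idea: the prediction error (label minus output, so 0, +1 or -1) plays the role of the difference between where we are and the target. The AND-gate data and the learning rate are assumptions for illustration.

```python
def predict(weights, bias, x):
    # Single neuron with a step activation: output 1 if the weighted sum is positive
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(data, lr=0.5, epochs=20):
    """Classic perceptron rule: nudge the weights only on misclassified examples."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, label in data:
            error = label - predict(weights, bias, x)   # 0 if correct, +1 or -1 if wrong
            # "Pull" the decision boundary a step in the direction that fixes the error
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical linearly separable toy data: the AND gate
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])
```

With weights that start at zero, the perceptron's learning rate only rescales the weights, so 0.5 here is an arbitrary (floating-point-friendly) choice.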

hearty depot
glacial root
#

oh so what i said isn't bullshit

#

man sometimes i feel that even if im able to figure things out, there's this slight feeling that what i'm saying is meaningless yap

iron basalt
#

I gotta go, but one more thing to ponder. In that random point thing we did. I said that each iteration we get a random point to move toward from the set. But what if it's not randomly picked? What if we took all the inputs and sorted them by x and y (in that order)? And then iterated over them in that order. Would it still end up giving a nice estimate in the end? What about with infinite iterations?

#

(Hint, imagine just two input points (or a bunch in each corner) on opposite ends of the image, where would the estimate end up?)

#

(With and without random picking)

glacial root
glacial root
iron basalt
#

You can also just try simulating it yourself in Python.
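Following that suggestion, a sketch comparing random picking with a fixed sorted order, in 1-D with 100 points at 0 and 100 points at 1 (assumed setup, true average 0.5). `run` is a hypothetical helper; in "sorted" mode it replays the same sorted order on every pass.

```python
import random


def run(points, alpha, order, epochs, seed=0):
    """Repeatedly pull a 1-D estimate toward data points, in random or sorted order."""
    rng = random.Random(seed)
    estimate = 0.0                                            # hypothetical initial guess
    for _ in range(epochs):
        if order == "random":
            sequence = [rng.choice(points) for _ in points]   # fresh random picks each pass
        else:
            sequence = sorted(points)                         # identical order every pass
        for p in sequence:
            estimate += alpha * (p - estimate)
    return estimate


# A bunch of points at each end; their true average is 0.5
points = [0.0] * 100 + [1.0] * 100
print("random order:", run(points, alpha=0.05, order="random", epochs=200))
print("sorted order:", run(points, alpha=0.05, order="sorted", epochs=200))
```

In sorted order every pass ends on the last group of points, so the estimate finishes close to 1 no matter how many passes run; with random picking it hovers around 0.5.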

glacial root
#

nah i mean in general

quick cairn
#

how would i get ai into python?

glacial root
#

what does this question even mean

fervent canopy
jaunty helm
hallow badger
#

have any good projects on GitHub you guys would recommend to me?

bold rapids
#

Hi

silent haven
#

hello guys, i'm new to this server.

#

i'm here because i'm doing a uni project, i would like to know how to start learning ML, it would be of great help if you could recommend me some sources

#

i am not looking to make an LLM or anything of that sort, just something that looks at a few numbers and judges the severity of a situation, and prompts a few actuators

jade prairie
#

which framework is easy to learn for CNN

serene scaffold
fickle shale
#

Can anyone give me a review of the Deep Learning book by Ian Goodfellow? Is it good to read, or too in-depth and a waste of time?

warm iron
#

Hey guys, I manually wrote a simple neural network, implementing each mathematical operation myself. It is my first attempt at creating something like this, so I wanted to share it here. I am open to being critiqued! 🙂

torn flint
#

That's impressive fr

#

I mean I'm a beginner, I wanted to code a nn from scratch I tried but it was tough so I kinda left that in the middle

#

Hope to redo it someday

opaque condor
#

why do i have a type error? https://paste.pythondiscord.com/H22A

lapis sequoia
#

he had great blog on it

#

i remember it used to be free but now he is monetising it with a book ig

opaque condor
main fox
#

Also just from looking at it, your forward pass doesn't do anything

opaque condor
# main fox Isn't that the same code Wendigo had?

Yes. I'm starting to try to learn convolution, but I couldn't find any tutorials, and I know that if I look on Amazon for a book that doesn't have what I need, it's going to take longer. I knew Wendigo had posted some code while working on convolution, and I thought: I know there's an error, but if I can find it, or at least have somebody guide me, maybe they would learn too if they came back to #data-science-and-ml.

opaque condor
opaque condor
# main fox Isn't that the same code Wendigo had?

Here is the video wendigo used: https://youtu.be/pDdP0TFzsoQ?si=-qKX0vOd8VB5TU9j

opaque condor
#

it has a feed forward now

plain glacier
unkempt wigeon
unkempt wigeon
weary timber
glacial root
#

yo guys for a simple feed forward neural network, what learning rate range should i be using

serene scaffold
glacial root
#

like for one of the digits the average cost will be like 0.9 and it just won't even touch that

#

but then it goes for the one that's 0.07 or something

torn flint
#

Yo anyone has worked with gtzan dataset

#

Music related stuff

cerulean violet
#

Guys is break dataset good for training my chatbot?

hearty depot
acoustic seal
#

has anyone used paperspace gradient? im considering subscribing to it but it sounds a bit too good(?)

acoustic seal
mystic peak
#

How do you give an ai a reward system

warm iron
warm iron
glacial root
#

i think that's the main issue i'm currently having

hearty depot
glacial root
#

or is that something different

hearty depot
glacial root
#

oh i see

#

so there's multiple backprop methods

#

does the activation function also have a decent impact on this

hearty depot
hearty depot
glacial root
#

but i'll definitely look into it still

#

recently i've just been learning as i go when it comes to math

hearty depot
glacial root
#

so it's just the backprop algorithm i should be worried about for right now

hearty depot
glacial root
hearty depot
glacial root
#

so the purpose of taking a random subset rather than the whole thing is just for computational efficiency right

#

i remember 3blue1brown talking about creating groups of data and then doing backpropagation on each group separately, then averaging the results

#

not sure if that's exactly what it was, was something like that though
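A sketch of mini-batch gradient descent on a tiny linear-regression problem (synthetic data with assumed true values w = 3, b = 0.5). In the standard version the gradient is averaged over the examples inside one batch and a weight update is made after every batch, so one pass over the data gives many small steps instead of one.

```python
import random


def minibatch_sgd(xs, ys, lr=0.05, batch_size=32, epochs=50, seed=0):
    """Fit y ≈ w*x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                    # new random grouping every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error, averaged over this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # One update per batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(1000)]
ys = [3.0 * x + 0.5 + rng.gauss(0, 0.1) for x in xs]   # hypothetical data
print(minibatch_sgd(xs, ys))
```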

lapis sequoia
#

Hi! I have joined somewhere as an intern and have to start working next Monday in the field of data science. I only have knowledge of numpy, pandas, matplotlib and seaborn up to an intermediate level, so should I focus more on machine learning, or should I make my understanding of the libraries more robust...

#

could anyone provide any advice

serene grail
# lapis sequoia could anyone provide any advice

I don't work in the field but other people will probably be able to help you better if you gave a more detailed job description.
Did they tell you what you will actually be doing on the job? "data science" is pretty vague

austere prawn
#

I got a Pandas dataframe with lot of rows, but every pair of rows should be combined:

1 a x 50.0
1 a y 60.5
2 b x 10
2 b y 10.3

=>

1 a 50.0 60.5 21
2 b 10   10.3 3

What are my options to achieve this? Groupby, agg, transform or apply?
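One option is `DataFrame.pivot`, which reshapes long data into wide columns. A sketch, assuming hypothetical column names `id`, `name`, `var` and `value` for the four columns shown (the extra third number in the desired output isn't explained in the question, so it is left out):

```python
import pandas as pd

# Hypothetical column names for the four columns in the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "a", "b", "b"],
    "var": ["x", "y", "x", "y"],
    "value": [50.0, 60.5, 10.0, 10.3],
})

# Long -> wide: one row per (id, name), one column per distinct value of "var"
wide = df.pivot(index=["id", "name"], columns="var", values="value").reset_index()
wide.columns.name = None   # drop the leftover "var" label on the column axis
print(wide)
```

Any derived column can then be computed from the new `x` and `y` columns. If some `(id, name, var)` combination occurs more than once, `pivot` raises an error and `pivot_table` with an `aggfunc` is the usual alternative.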

grand breach
#

is it a good idea to label a small subset of data for an image captioning task?

small wedge
#

on that first column after the index

fervent canopy
rich river
#

is super().__init__() required in pytorch models' __init__ function?
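Yes: `nn.Module.__init__` sets up the internal registries that layer and parameter assignments rely on, so it has to run before you assign layers. A minimal sketch (the class names are made up):

```python
import torch.nn as nn


class WithSuper(nn.Module):
    def __init__(self):
        super().__init__()           # sets up nn.Module's internal bookkeeping
        self.fc = nn.Linear(4, 2)    # registered as a submodule


class WithoutSuper(nn.Module):
    def __init__(self):
        self.fc = nn.Linear(4, 2)    # fails: Module.__init__ was never called


print(sum(p.numel() for p in WithSuper().parameters()))  # weights + biases of fc

try:
    WithoutSuper()
except AttributeError as err:
    print("AttributeError:", err)
```

`super().__init__()` and the older `super(ConvNet, self).__init__()` form do the same thing in Python 3.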

drifting gust
ivory root
#

Hey guys, I'm still new to ML. I just finished building my first supervised model but am still learning. I'm currently using Colab, and when I tried using a CSV file that isn't from the course I was following, it didn't work; I'm trying to upload it straight from my PC but it isn't working. If anyone has run into this issue I would love your help.

torn flint
#

Hi I'm training a cnn model, with like custom conv layers

Everything is fine, but over consecutive epochs the accuracy and everything else turn 0

#

Please help me

drifting gust
#

I CAN HELP U

#

probably ur messing up ur forward pass bro

#

googles accuracy

#

"Neural network accuracy, a measure of how often a model correctly predicts outcomes"

#

yes bro it sounds like ur overfitting

#

or some other world-ending phenomenon

#

like gradient vanishing cause u have like 5 layers of conv nets

#

and no dropout (ds0nt is currently drunk and getting into troubles)

drifting gust
torn flint
#

I will look up Vanishing Gradients in detail
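A rough sketch of why stacking many sigmoid layers shrinks gradients: the chain rule multiplies one local derivative per layer, and the sigmoid's derivative is at most 0.25. This deliberately ignores the weights (which can grow or shrink the product further) and is not a model of the conv net discussed above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # at most 0.25, reached at z = 0


# Backprop multiplies one local derivative per layer (chain rule).
# Best case for sigmoid: every pre-activation sits at 0 and every weight is 1.
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_grad(0.0) ** depth)
```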

austere prawn
#

Next is a jupyter notebook question. Can I with a for loop output text and plots interleaved?

for i in (1,2,3):
    print(i)
    plot(i) 

Outputs

1
2
Plot1
Plot2

Whereas I would like

1
Plot1
2
Plot2
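In Jupyter, printed text is streamed immediately while figures are normally rendered when the cell finishes, which gives the ordering in the question. A sketch that shows each figure inside the loop instead; the line plot is a made-up stand-in for `plot(i)`:

```python
import matplotlib.pyplot as plt

for i in (1, 2, 3):
    print(i)
    fig, ax = plt.subplots()
    ax.plot(range(10), [i * x for x in range(10)])   # stand-in for plot(i)
    ax.set_title(f"Plot{i}")
    plt.show()       # with Jupyter's inline backend, renders this figure now, before the next print
    plt.close(fig)   # explicitly free the figure (harmless if the backend already closed it)
```

`IPython.display.display(fig)` is another way to force a figure out at a specific point.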
fervent canopy
#

So, I used to read a lot of comics and watch a shit tonne of cartoon as a kid. In many of those marvel comics, the superior beings used to pretend that all of the heroes and the villains were just pieces

#

and everything was just a grand game

austere prawn
fervent canopy
torn flint
#

I just realised I should use ml algos instead of dl

#

Taking reference from existing kaggle notebook

#

Why to complicate things when it can be done easily

#

10 conv layers for a dataset containing 1000 items will obviously cause overfitting

dry lynx
#

anyone know how make click farm for youtube if yes dm

austere prawn
dry lynx
arctic wedgeBOT
#

The rules and guidelines that apply to this community can be found on our rules page. We expect all members of the community to have read and understood these.

dry lynx
#

mmmmmmmmmmmmmmmm

#

what is @sonic vapor

torn flint
#

Deep Learning my dear

austere prawn
#

I see 👍

austere prawn
serene scaffold
#

are we talking about data science?

torn flint
#

My salutes to u

torn flint
dry lynx
#

oki

austere prawn
#

Oh, GitHub? 😋

torn flint
#

Yes I stalked u, 💀

#

Seems like you're into coding since very long

#

That's inspiring

glacial root
#

what's haskell used for

torn flint
#

I've never used it but I feel it's something mathematical

glacial root
#

"strong emphasis on immutability"

#

sounds diabolical

lapis sequoia
#

hi everyone

#

how are you guys

austere prawn
torn flint
#

Rather call you Mr David instead of "Bro"

austere prawn
#

๐Ÿ™

glacial root
#

must give due respect

#

he's a veteran programmer

torn flint
#

I'm so pleased to meet him it's a great pleasure

#

Like honestly

glacial root
#

always great to meet an expert

magic sorrel
#

How do you organize your files? I took a bunch of courses and am now getting into data projects, but I have files all over the place. Some are Python venvs, some are Jupyter, some pull from different data sets, some are just trying new methods, and some are a continuous list of hypothesis tests. Is there a best practice or recommended way of organizing this?

stuck tapir
#

atl that works for me

magic sorrel
#

you are using vscode or jupyter ?

stuck tapir
magic sorrel
#

actually, maybe it's just right. - after thinking about it

austere prawn
#

Is it possible to run a regular python file as a jupyter notebook? I don't see why it couldn't with just a few markers for cell division.

from jupyter import MADE_UP_THING as next_cell

print(1)
next_cell()
print(2)
stuck tapir
stuck tapir
stuck tapir
tawdry sundial
#

where can i finetune llm online?

I am trying to finetune around 8B parameter models like llama

#

I tried on Google Colab but it's slow and very limited

#

I find it hard to understand the pricing range for gpu renting

glacial root
tawdry sundial
serene scaffold
#

Oh you're willing to buy credit

tawdry sundial
#

settled for runpod and modal they seem like a good option

austere prawn
river cape
#

Hey guys, so I was playing around with an RNN using the IMDB dataset. Initially I set a maxlen of 50 for pad_sequences, limited my input vocabulary to 10000, and then followed this architecture

model.add(Embedding(input_dim=10000,output_dim=2,input_length=50))
model.add(SimpleRNN(32,return_sequences=False))
model.add(Dense(1,activation='sigmoid'))

I did get an accuracy of 75% on the validation set, but then I made these changes to the architecture

model.add(Embedding(input_dim=88364,output_dim=80,input_length=2943))
model.add(SimpleRNN(32,return_sequences=False))
model.add(Dense(1,activation='sigmoid'))

and I am only getting a constant accuracy of 50%. How do I increase it? Is it possible to achieve 75%+ accuracy using SimpleRNN?

stuck tapir
# river cape Hey guys so I was playing around with RNN using the imdb dataset, so initially ...

the drastic accuracy drop likely stems from the increased vocabulary size and sequence length. with a larger vocabulary (88364), the embedding layer's weight matrix becomes significantly larger, making it harder to train effectively, especially with a simple rnn. similarly, the extended sequence length (2943) can lead to vanishing gradients, hindering learning. try reducing the vocabulary size, lowering the sequence length, or using lstm or gru layers, which handle long sequences better. also, experiment with different learning rates and optimizers, and consider adding regularization techniques like dropout. finally, verify your data preprocessing and ensure no unintended data leaks.
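To make the "reduce the vocabulary size, lower the sequence length" advice concrete, a plain-Python sketch of what capping the vocabulary and padding/truncating does to the integer sequences. `prepare` is a hypothetical helper; it is meant to roughly mirror Keras' `imdb.load_data(num_words=...)` followed by `pad_sequences(maxlen=...)` with their default pre-padding and pre-truncation, but check the Keras docs before relying on the exact details.

```python
def prepare(sequences, num_words=10_000, maxlen=200, oov_index=2, pad_index=0):
    """Cap the vocabulary and pad/truncate every review to the same length."""
    prepared = []
    for seq in sequences:
        # Rare words (index >= num_words) all share one out-of-vocabulary index
        capped = [tok if tok < num_words else oov_index for tok in seq]
        capped = capped[-maxlen:]                                     # keep the last maxlen tokens
        prepared.append([pad_index] * (maxlen - len(capped)) + capped)  # pad at the front
    return prepared


print(prepare([[1, 14, 22, 16, 43, 530, 12000]], num_words=10_000, maxlen=5))
```

The Embedding layer's `input_dim` then only needs to be `num_words`, since every index stays below that.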

stuck tapir
narrow tiger
#

Is it possible to know why the AI/LLM gave a certain answer??

stuck tapir
narrow tiger
stuck tapir
tender hearth
#

read the anthropic blog post on golden gate claude if you haven't already

narrow tiger
#

Thanks i will

bright comet
#

guys

#

how can i make an AI that helps me program?

stuck tapir
stuck tapir
smoky ingot
#

hello i am new i wanna learn about ai so from where should i start plz guide me

stuck tapir
smoky ingot
#

can you provide me some specific source

#

like some websites and youtube channels

stuck tapir
#

check coursera (andrew ng), fast.ai, scikit-learn's site, and tensorflow/pytorch docs. for youtube, try 3blue1brown, sentdex, or Lex Fridman. consistent practice is key

stuck tapir
#

yw

#

always there to guide you along your journey

eager horizon
#

im thinking about starting to make a resume and upload projects, what are all the avenues. Github Repositories, Resume, anything else?

serene scaffold
jaunty helm
austere prawn
#

How do I prevent seaborn from drawing multiple plots on top of each other? I didn't have this issue before, it suddenly started to happen 🥴

jaunty helm
#

or like

sns.scatterplot(...)
plt.figure()  # new figure, so the next plot doesn't draw on top of the previous one
sns.lineplot(...)
austere prawn
#

It was enough to do

sns.stripplot(...)
sns.barplot(...)

Without any ax, I think. But I might misremember

#

(because I was doing ax stuff at some point)

#

All of this state-machinery in matplotlib is so confusing. ๐Ÿ˜•
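A sketch of the object-oriented style: create each figure and axes explicitly and pass them to seaborn via `ax=`. Without `ax`, seaborn's axes-level functions draw on the current axes, which is why consecutive calls can end up stacked on one plot. The DataFrame is made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1.0, 2.0, 3.0, 4.0]})

# One figure + axes per plot, handed to seaborn explicitly
fig1, ax1 = plt.subplots()
sns.stripplot(data=df, x="group", y="value", ax=ax1)

fig2, ax2 = plt.subplots()
sns.barplot(data=df, x="group", y="value", ax=ax2)

plt.show()
```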

serene scaffold
austere prawn
#

What's the correlation? Completely independent things? Which is newer?

Seaborn is quite big, looks nice, and is built on top of matplotlib. It's also big and scary.
@serene scaffold

glacial root
river cape
austere prawn
serene scaffold
river cape
#

is it possible to build an llm from scratch? I do have the data and i do have graphical power to some extent

serene scaffold
austere prawn
river cape
serene scaffold
river cape
#

Is it possible to achieve that using fine-tuning?

austere prawn
serene scaffold
austere prawn
#

Did I just waste a week doing matplotlib stuff? 😱

river cape
runic ibex
#

can anyone help me build a RAG on prem with python. thinking of using railway app

serene scaffold
# river cape So how to combat that issue

there's no way to guarantee that an LLM will or won't do something. You can include instructions in the prompt to do things a certain way, and if the LLM is good at following explicit instructions ("don't include any personal names in your response"), it probably never will. but if this is a situation where you're worried about getting sued, you need to include some post-processing on the LLM's response that will deterministically guarantee that the rule is followed.
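A minimal sketch of deterministic post-processing, assuming the rule is about a known list of names (for example, names you already have in your own records). `redact_names` is a hypothetical helper; it catches only the names it is given, exactly as listed, and is not a general name detector.

```python
import re


def redact_names(text, names, placeholder="[REDACTED]"):
    """Remove every occurrence of the given names from an LLM response.

    Runs after generation, so the result holds regardless of what the prompt said.
    """
    # Longest names first, so "Ann Lee" is replaced before a shorter "Ann" could match
    for name in sorted(names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text


response = "Thanks for asking! Ann Lee handled that ticket yesterday."
print(redact_names(response, ["Ann Lee", "Bob"]))
```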

serene scaffold
austere prawn
austere prawn
austere prawn
#

Ok good, I got an empty browser screen on my first try with some Javascript loading... So got scared.

opaque condor
stuck tapir
austere prawn
austere prawn
#

Probably not done today. But thank you for the help

stuck tapir
# opaque condor Which line?

just remove
x = self.conv1()
from the ConvNet __init__

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
#

hopefully that solves the issue.
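For reference, a sketch of the same ConvNet with its layers applied in a `forward` method. The forward pass follows the common layout for this architecture and assumes 3x32x32 inputs (e.g. CIFAR-10), which is what `16 * 5 * 5` in `fc1` implies; it is not necessarily identical to the code in the paste.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # __init__ only defines layers; nothing is run on data here
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Layers are applied to the input here
        x = self.pool(F.relu(self.conv1(x)))   # 3x32x32 -> 6x28x28 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))   # -> 16x10x10 -> 16x5x5
        x = torch.flatten(x, 1)                # -> 400 features per image
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                     # raw scores (logits) for 10 classes


model = ConvNet()
print(model(torch.randn(4, 3, 32, 32)).shape)
```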

opaque condor
#

why is that there i dont see anything

stuck tapir
opaque condor
#

thank you

#

How would I go about fixing the test loop?

stuck tapir
#

ywyw

#

in your test loop
if (labels == pred):
should be comparing each label, not the full tensors, so just replace it with
if label == pred:

#

Line 111

opaque condor
#

I already changed that a long time ago

#

acc = 100.0 * n_class_correct[i] / n_class_samples[i]

#

It's giving me this error

#

ZeroDivisionError: float division by zero

opaque condor
stuck tapir
opaque condor
opaque condor
stuck tapir
#

fixed it overall,

with torch.no_grad():
    n_correct = 0
    n_sample = 0

    n_class_correct = [0 for _ in range(10)]
    n_class_samples = [0 for _ in range(10)]

    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs, 1)

        n_sample += labels.size(0)
        n_correct += (predicted == labels).sum().item()

        for i in range(len(labels)):
            label = labels[i].item()
            pred = predicted[i].item()
            if label == pred:
                n_class_correct[label] += 1
            n_class_samples[label] += 1

    acc = 100.0 * n_correct / n_sample
    print(f'Overall Accuracy: {acc:.2f}%')

    for i in range(10):
        if n_class_samples[i] == 0:
            acc = 0.0
        else:
            acc = 100.0 * n_class_correct[i] / n_class_samples[i]
        print(f'Accuracy of {classes[i]}: {acc:.2f}%')

Just added a zero check,

#

also, that error means that n_class_samples[i] is 0 for at least one class.

#

so recheck that,

#

Could've happened cuz of if (labels == pred): error
because you were comparing the entire labels tensor to a single pred, the condition was likely never true --> so n_class_correct[label] was never incremented, and in some cases, n_class_samples[label] wasn't either if it crashed

opaque condor
stuck tapir
#

the first problem in your code was that you were like running a layer with NO INPUT BTW in __init__ where we define layers so that's what was causing that

#

and the second error arose by using the WHOLE tensor instead of individual comparison, which obv would result in 0 then, and then /0

opaque condor
#

Are you able to get on voice chat? It'll make everything a whole lot faster if you can just listen and type. Also, I did something bad to my Visual Studio Code key bindings, so I guess I'm going to have to go to IDLE.

stuck tapir
#

in a shared space rn can't really vc

#

hope you understand the limitation </3

opaque condor
#

And what about voice chat when it's not being used? I'm sorry.

opaque condor
#

F-zero is array being used one doesn't have anyone in it so therefore it won't be able to be used unless you would be comfortable with something else

stuck tapir
#

Ah, you mean class 0 didn't appear in the test set, so n_class_samples[0] stayed zero? Makes sense, right, that's why the division by zero happened

opaque condor
#

How can I fix that division error?

stuck tapir
#

adding a zero check, like I did

    for i in range(10):
        if n_class_samples[i] == 0:
            acc = 0.0
        else:
            acc = 100.0 * n_class_correct[i] / n_class_samples[i]
        print(f'Accuracy of {classes[i]}: {acc:.2f}%')

and fixing if (labels==pred): to

if label == pred:
hollow pagoda
#

hi im learning neural networks, in the first two screenshots i learned how to use polynomial transforms to add nonlinearities to the linear regression,
im trying to understand why it's said that you cant do this same thing for the hidden layers in neural networks, and the first solution is to perform sigmoid transformation on the data (it might be something obvious im misunderstanding)

#

i understand why the sigmoid one works to calculate logistic loss on a logistic regression but can you not use that synthetic feature from the first examples on the hidden layer and leave the equations linear?

#

apparently it makes it an 'activation layer' tho so ima continue reading but id appreciate feedback

opaque condor
#

do I need Visual Studio Code to run torch

hollow pagoda
hollow pagoda
#

ok i get it now

stuck tapir
stuck tapir
# hollow pagoda ok i get it now

polynomial transformations in linear regression help add nonlinearity by expanding the feature space, but neural networks need activation functions (like sigmoid or ReLU) to introduce nonlinearity in each layer. without activations, the network would just be stacking linear equations, limiting its ability to learn complex patterns. activation functions allow the model to adapt and find the best features during training. it's not just about adding nonlinearity; it's about learning the right transformations for each layer.
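A small worked example of the "stacking linear equations" point, in plain Python with made-up 2x2 weights and no biases: two linear layers with nothing in between compute exactly what one linear layer with weights `W2 @ W1` computes.

```python
def matmul(a, b):
    """Matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, v):
    """Matrix-vector product: one linear layer without bias or activation."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


W1 = [[1, 2], [3, 4]]     # hypothetical "hidden layer" weights
W2 = [[0, 1], [-1, 2]]    # hypothetical "output layer" weights
x = [5, -1]

two_layers = apply(W2, apply(W1, x))   # layer 1 then layer 2, no activation in between
one_layer = apply(matmul(W2, W1), x)   # a single layer with the combined weights
print(two_layers, one_layer)
```

Adding biases doesn't change the conclusion, since the composition of two affine maps is still affine.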

opaque condor
opaque condor
#

Python

opaque condor
stuck tapir
#

you don't need visual studio code to use pytorch; you just need the right python environment set up. make sure you have pytorch installed in your environment. you can check that by running pip show torch in your terminal or command prompt. if it's not installed, you can install it with pip install torch. let me know if you're still having trouble
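A quick check from inside Python itself, useful when `pip` and your editor point at different interpreters (a common cause of "torch not found"). `torch_available` is a hypothetical helper name:

```python
import importlib.util
import sys


def torch_available():
    """Return True if `import torch` would find the package in *this* interpreter."""
    return importlib.util.find_spec("torch") is not None


print(sys.executable)   # the interpreter this code is actually running under
print("torch installed:", torch_available())
```

If it prints False, running `<that interpreter path> -m pip install torch` installs into the same environment the code runs in.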

jaunty helm
# hollow pagoda hi im learning neural networks, in the first two screenshots i learned how to us...

the non-linearity in nns is added thru activation functions; sigmoid's one of them, but nowadays ReLU (which is just max(0, value)) is more popular

why it's said that you cant do this same thing for the hidden layers
assuming you mean, why not more complex functions in neurons instead of simple weighted sums
the latter is very easy to compute, makes backprop a lot easier, and they can universally approximate anyways
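A tiny example of what the activation buys: |x| is not a linear function of x (it bends at 0), so no single weighted sum w*x + b can compute it, but two hidden ReLU units with plain weighted sums around them compute it exactly. The weights are hand-picked for illustration, not learned.

```python
def relu(z):
    return max(0.0, z)


def abs_via_relu(x):
    # Two hidden ReLU units with input weights +1 and -1, output weights both +1:
    # relu(x) + relu(-x) == |x|
    return relu(x) + relu(-x)


print([abs_via_relu(x) for x in (-3.0, -0.5, 0.0, 2.0)])
```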

rich river
#

my current project requires libtorch-gpu, onnxruntime, cuda-toolkit and cudnn, and the overall image size is so big, any ideas?

odd meteor
stuck tapir
stuck tapir
stuck tapir
rich river
stuck tapir
unkempt apex
silent haven
#

We could use a normal algorithm or threshold but we decided to go for bonus point using ML
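For a few numeric inputs mapped to a severity level, a small classical model is a reasonable ML starting point. A sketch with scikit-learn's `DecisionTreeClassifier`; the features, values and labels are entirely made up and would come from your own labeled readings.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical readings: [temperature, smoke_level]; labels 0 = normal, 1 = warning, 2 = critical
X = [[20, 0.1], [22, 0.2], [35, 0.4], [38, 0.5], [60, 0.9], [65, 1.0]]
y = [0, 0, 1, 1, 2, 2]

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X, y)

severity = model.predict([[36, 0.45]])[0]
print("severity:", severity)
# A real system would map each severity level to actuator commands here
```

Comparing it against the plain threshold version is a useful sanity check.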

opaque condor
#

Can a label be in a folder?

unkempt apex
stuck tapir
stuck tapir
silent haven
stuck tapir
opaque condor
opaque condor
#

Yes

stuck tapir
#

if yes then yea ofc thats how most people do it
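A sketch of the folder-per-class convention, where each sub-folder's name is the label. `labeled_files` is a hypothetical helper; if you're using PyTorch, torchvision's `datasets.ImageFolder` follows the same layout (class indices assigned from the sorted folder names).

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def labeled_files(root):
    """Collect (path, label_index) pairs where each sub-folder name is the class label.

    Expected layout (hypothetical):
        root/cat/001.png
        root/dog/002.jpg
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for file in sorted((root / name).iterdir()):
            if file.suffix.lower() in IMAGE_SUFFIXES:   # skip non-image files
                samples.append((file, class_to_idx[name]))
    return samples, class_to_idx
```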

warm iron
#

Any idea about the best / most common image preprocessing techniques for further use in CNNs? 🥲

serene scaffold
stuck tapir
past meteor
#

Typically whatever package you're using will have that readily available for the model you're using

warm iron
stuck tapir
safe agate
jaunty helm
#

if you're on vscode w/ the jupyter extension, you can do this in a .py file

# %%
print('cell 1')

# %% [markdown]
# # Header
# some md text

# %%
print('code cell 2')
austere prawn
safe agate
austere prawn
#

Cool 👍

#

Is there a time and date set?

past meteor
#

And if you're not using one, you probably should 😄

stuck tapir
# warm iron Why

consistency, optimal performance, avoiding errors
as for why to use pretrained models,
faster dev, better performance, resource efficiency

fervent canopy
# warm iron Why

Hyper is right about using pretrained models. You should almost always try to work with pretrained models unless you have access to a bunch of A100s and a shit tonne of data. By the rule of thumb, you should never try to reinvent the wheel

#

Pretrained models and fixed architectures exist for a reason

#

People almost always use Adam, Nadam or SGD

#

and as I like to say, the inventor of the wheel must've got paid nothing, but the owner of Ferrari gets paid

small wedge
#

Facts

past meteor
# warm iron Why

I like the analogy that the first part of training CNNs is just about teaching it what elementary features are

#

So instead of starting from 0, take one that is pretrained and continue training that one to fit it to your domain

serene grail
next ember
#

guys i have a question
is this result okay for my work?
it's a "pv panel image segmentation with ai" project: the 1st images are panel images, the 2nd are my masks and the 3rd are the model prediction results.
I got > 0.96 dice_coef and > 0.95 accuracy with 30 epochs on around 200 train images. please mention me in your reply, thanks.
the end goal is making the true masks and model predictions as close as possible
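For reference, a sketch of the Dice coefficient next to pixel accuracy on flat binary masks, with a made-up extreme case: when the object covers only a small part of the image, an empty prediction still scores high accuracy but near-zero Dice, so Dice is the more informative of the two numbers. `eps` is an assumed smoothing constant.

```python
def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return (2 * intersection + eps) / (total + eps)   # eps avoids 0/0 when both masks are empty


def pixel_accuracy(pred_mask, true_mask):
    return sum(p == t for p, t in zip(pred_mask, true_mask)) / len(true_mask)


# Toy case: the object covers 4 of 100 pixels and the prediction misses all of it
true_mask = [1] * 4 + [0] * 96
empty_pred = [0] * 100
print(pixel_accuracy(empty_pred, true_mask), dice_coefficient(empty_pred, true_mask))
```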

woeful escarp
#

Hello, I am starting in ML, I would like to work in a project to improve, send me DM

river cape
next ember
river cape
#

and after detecting edges , we tend to get a lil higher level features

river cape
next ember
past meteor
merry ridge
#

I have a folder with tens of thousands of training images, and sometimes I just want to quickly scroll through and visually inspect them in windows to make sure nothing odd showed up in the pipeline. Windows doesn't seem to really like managing extremely large folders. It can take a minute or two for the folder to even open. Is there something I can download that makes navigating these files a bit easier in the same way someone might use voidtools instead of the default Windows search.

stuck tapir
stuck tapir
stuck tapir
merry ridge
stuck tapir
fervent canopy
#

If anyone is interested in YOLOv12 and comparing its performance with YOLOv11 and YOLOv8 in real-world scenarios, I made this

stuck tapir
stuck tapir
waxen echo
#

is it easy to get into making like AI bots for games, or anything realistically?

serene scaffold
cedar tusk
#

1st is you need an interface that connects to the game or make a similar game yourself

#

2nd is, depending on the game, the number of variables can be too much. Obviously if you are doing pacman or pong it's not gonna be that hard, but if it's anything 3d, for example, it's gonna become impossible to properly manage and calculate the inputs for the ai model.

dry raft
#

this may be niche, but how can i get a pdb file (protein data bank) structure into embeddings for a huggingface model?

#

I know that I can turn it into a set of coords and perform dimensionality reduction and then embed it as text, but I feel like that there is something more advanced that can truly capture the complexities of protein structure

#

the alpha-helices, beta-pleated sheets, cysteine bonds, etc
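A plain-Python sketch of the "set of coordinates" step: pull the C-alpha atom of each residue out of the fixed-width ATOM records, then build the pairwise distance matrix, which (unlike raw coordinates) doesn't change when the protein is rotated or shifted, and in which helices and sheets leave characteristic patterns. This is a preprocessing sketch, not a learned structure embedding, and it ignores alternate locations and HETATM records.

```python
import math


def ca_coordinates(pdb_text):
    """Extract C-alpha (x, y, z) coordinates from the ATOM records of a PDB file.

    Uses the fixed-width PDB columns: atom name in 13-16, x/y/z in 31-54 (1-based).
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def distance_matrix(coords):
    """Pairwise C-alpha distances (in Angstrom)."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```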

#

btw ping or dm me

#

i'll be off maybe

stuck tapir
safe agate
mighty lake
#

for data science, without going into any further detail, at what point should I move on to R

plain leaf
#

yoo chat, i learnt langchain and thought cool, now i can build projects, but recently everyone started to bash langchain and is moving to llamaindex or PydanticAI. I need to get a job ASAP. I'm strong with fundamentals but struggling with these gazillion frameworks... someone please help, i need a production-ready project to start applying for stuff..

vocal cove
#

I doubt he wants sth as sophisticated as like NPCs from MGSV, or MW 2022, or TLOU 2.

#

He could try making the ghosts from PACMAN.

#

gym has environments for most of those OG games that you can make agents for.
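Gym (now maintained as Gymnasium) environments all share the same loop: `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The sketch below shows that loop against a tiny hypothetical "catch the falling ball" game written from scratch, so it runs without gym installed; `CatchEnv`, `chase_policy` and `run_episode` are made-up names, not part of any library.

```python
import random


class CatchEnv:
    """Hypothetical toy game with a Gymnasium-style reset/step API (not part of gym).

    Observation: (paddle_x, ball_x, ball_y). Actions: 0 = left, 1 = stay, 2 = right.
    """

    WIDTH, HEIGHT = 5, 5

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.paddle_x = self.WIDTH // 2
        self.ball_x = self.rng.randrange(self.WIDTH)
        self.ball_y = 0
        return self._obs(), {}

    def _obs(self):
        return (self.paddle_x, self.ball_x, self.ball_y)

    def step(self, action):
        # Move the paddle (clamped to the board), then let the ball fall one row.
        self.paddle_x = min(max(self.paddle_x + action - 1, 0), self.WIDTH - 1)
        self.ball_y += 1
        terminated = self.ball_y == self.HEIGHT - 1
        reward = 0.0
        if terminated:
            reward = 1.0 if self.paddle_x == self.ball_x else -1.0
        return self._obs(), reward, terminated, False, {}


def chase_policy(obs):
    """Hand-written 'agent': step toward the ball's column."""
    paddle_x, ball_x, _ = obs
    return 1 + (ball_x > paddle_x) - (ball_x < paddle_x)


def run_episode(env, policy, seed=None):
    """The standard agent-environment loop: act, observe, accumulate reward until done."""
    obs, info = env.reset(seed=seed)
    total, terminated, truncated = 0.0, False, False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
    return total


print(run_episode(CatchEnv(), chase_policy, seed=0))
```

A learning agent (Q-learning, DQN, ...) would replace `chase_policy`; with real Gymnasium, `env` would come from `gymnasium.make(...)` instead of the toy class.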

vocal cove
#

In Python you have matplotlib, pandas, scipy and numpy, all of which are amazing tools for data science and even basic AI/ML (for when you want to make a NN from scratch).

That's my 2 cents on it. You can of course see if R has any advantage depending on what you want to do in specific.

eager mantle
#

guys not to interrupt but can anyone tell me if these are good regression evaluation metrics for real estate predictions?

#

given that the mean house prices are over 1,200,000

vocal cove
#

Sth I usually like is just checking the accuracy on train and test datasets.

#

It's a percentage, it takes your dataset into consideration (thus context), and is easier to debug for model performance.
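Classification accuracy isn't defined for a continuous target like a house price. For regression, the percentage-style measures in the same spirit are usually MAPE (average error as a % of the true price) or MAE relative to the mean price, reported alongside RMSE and R². A minimal numpy sketch; the three prices in the example call are made-up numbers, not from the thread.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Common regression metrics, including two that read as percentages."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    return {
        "MAE": mae,                                                 # same units as the price
        "RMSE": np.sqrt(np.mean(err ** 2)),                         # punishes big misses more
        "MAPE_%": 100 * np.mean(np.abs(err) / np.abs(y_true)),      # avg error as % of true price
        "MAE_%_of_mean_price": 100 * mae / y_true.mean(),
        "R2": 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),  # 1 = perfect, 0 = mean-only
    }


# Made-up prices purely to show the call
report = regression_report([1_000_000, 1_200_000, 1_400_000], [1_100_000, 1_200_000, 1_300_000])
for name, value in report.items():
    print(f"{name}: {value:,.2f}")
```

With a mean price around 1,200,000, an MAE of, say, 60,000 reads as "about 5% off on average", which is often easier to judge than the raw number.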

eager mantle
#

Thanks

vocal cove
#

My pleasure. I'm curious to see what your acc is too now.

#

Vanishing gradients go burrrr

eager mantle
#

💀

vocal cove
#

When you train tensor networks, you really start to feel orthogonality catastrophe.

eager mantle
#

tensor flop

vocal cove
#

This is reaching that okay accuracy on a rather tiny tensor network. I wanted to keep it to a simple, global optimization approach, but that leads to orthogonality catastrophe, and that leads to vanishing gradients, and that leads to poor convergence as my networks get larger in size.

#

I have a love/hate relationship with it. I like how it makes sense, but don't like how inconvenient it is.

eager mantle
stuck tapir
stuck tapir
jaunty helm
#

if you're doing some niche statistics, R might have better support than python in some cases
example: in my very limited experience, if you want to do SEM, lavaan is more feature complete than semopy

stuck tapir
#

yep, that's true
R shines for niche stats like SEM (lavaan) or biostats
python's catching up, but R's libraries still lead in some areas
so if you're deep into stats, R's worth considering

jaunty helm
#

also another thing to consider: you can use both together
RMarkdown is like Jupyter notebooks, but you can run both R and Python snippets; packages like reticulate also allow you to access your pd.DataFrames in R code to bridge the gap

stuck tapir
#

yeah, exactly!
RMarkdown + reticulate is a great combo
lets you mix R and Python seamlessly, so you get the best of both worlds
perfect for projects that need both languages

verbal oar
#

if I have android, adobe, google
how can I cluster them
like distance between adobe and google is smaller than between android and adobe
make vectors from words and see in visualization
is it possible to do?

#

then I would have cluster with tech and cluster with companies

#

I think they would not be placed as I expect

jaunty helm
verbal oar
#

yes I know it in theory ok I check it in practice thanks

#

maybe also inter- and intra-cluster (forgot, was it distance?)

#

and wcss
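Both ideas are easy to compute once you have cluster labels: WCSS (within-cluster sum of squares, the "intra-cluster" spread) is what the elbow method plots, and distances between centroids give a rough "inter-cluster" separation. A small numpy sketch; the function names are just illustrative.

```python
import numpy as np


def wcss(X, labels):
    """Within-cluster sum of squares: squared distance of each point to its own cluster centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def centroid_distances(X, labels):
    """Pairwise Euclidean distances between cluster centroids (a simple inter-cluster measure)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.stack([X[labels == k].mean(axis=0) for k in np.unique(labels)])
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(wcss(X, labels))
print(centroid_distances(X, labels))
```

If you cluster with scikit-learn's `KMeans`, the WCSS of the fit is already available as `inertia_`, and `sklearn.metrics.silhouette_score` combines intra- and inter-cluster distances into one number.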

jaunty helm
verbal oar
#

actually I have android and adobe words, google is just hypothetical
I did pos tagging on words now I have nouns and want to filter it further

#

yes, closer of course; the example's not good because it complicates things

#

maybe another example: algebra, android, adobe

#

Corresponding to math, tech, company

#

but I don't have any labels

#

also I have 700 words

stuck tapir
# verbal oar I think they would not be placed as I expect

yep, you can create word vectors for "android", "adobe", and "google" and calculate the distances between them. once you do that, you can use a technique like t-SNE to visualize them in 2D or 3D. you'll likely see clusters form, but they might not match your exact expectations
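The distance part of that can be sketched in a few lines of numpy: normalise each vector, and one matrix product gives all pairwise cosine distances. The 4-d vectors below are hypothetical, hand-written so the example runs on its own; real ones would come from a pretrained embedding model (word2vec, GloVe, fastText or a sentence-embedding model), and the result depends entirely on that model.

```python
import numpy as np

# Hypothetical toy "embeddings"; real vectors come from a pretrained model, not by hand.
vectors = {
    "algebra": np.array([0.9, 0.1, 0.0, 0.1]),
    "android": np.array([0.1, 0.9, 0.3, 0.0]),
    "adobe": np.array([0.0, 0.5, 0.9, 0.1]),
}

words = list(vectors)
X = np.stack([vectors[w] for w in words])
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalise rows to unit length
dist = 1.0 - X_unit @ X_unit.T                          # pairwise cosine distances

for i, w in enumerate(words):
    others = [j for j in range(len(words)) if j != i]
    nearest = min(others, key=lambda j: dist[i, j])
    print(f"{w:>8} -> nearest: {words[nearest]} (cosine distance {dist[i, nearest]:.2f})")
```

With ~700 real words and no labels, the same matrix `X` can go straight into an unsupervised clusterer (e.g. scikit-learn's `KMeans` or `AgglomerativeClustering`) and into `TSNE` for the 2-D picture.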

dry raft
#

i really needed this, tryna do my first independent paper

stuck tapir
dry raft
plain leaf
stuck tapir
#

get complicated ideas, but then dedicate yourself to manifesting them

dry raft
#

idk if i mentioned it or not tbh

stuck tapir
# dry raft btw, how do I then tokenize the GNN for my huggingface model?

you can tokenize the gnn by first converting the graph structure into a format that huggingface can handle, like a sequence of node features or a graph-based input. you can use libraries like torch-geometric or dgl to handle graph processing, then extract node features and adjacency information. from there, you can tokenize the node features and edges into a suitable input format (e.g. tensors) and feed them into the huggingface model. if you're using esm2/protbert, you might want to integrate the graph structure with the sequence model's embeddings, but that's just imo
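One common way to get a protein structure into graph form is a residue contact graph: one node per residue (its C-alpha atom), with an edge whenever two C-alphas are closer than a cutoff, often around 8 Å. The numpy sketch below reads C-alpha coordinates from PDB `ATOM` records using the format's fixed columns and returns an `edge_index` array shaped `(2, n_edges)`, the layout torch-geometric uses. The cutoff and function names are assumptions, and node features (e.g. residue type or ESM embeddings) would be attached separately.

```python
import numpy as np


def read_ca_coords(pdb_text):
    """C-alpha (x, y, z) per residue from PDB ATOM records (fixed-column format)."""
    coords = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 31-54 the x/y/z coordinates (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return np.array(coords)


def contact_graph(ca_coords, cutoff=8.0):
    """Undirected contact graph: edge i <-> j when the C-alpha distance is below `cutoff` (Å).

    Returns (edge_index, edge_dist); edge_index has shape (2, n_edges), both directions listed.
    """
    X = np.asarray(ca_coords, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    src, dst = np.nonzero((dist < cutoff) & ~np.eye(len(X), dtype=bool))
    return np.stack([src, dst]), dist[src, dst]
```

`edge_dist` can serve as an edge feature; secondary structure (helices, sheets) then shows up as characteristic contact patterns rather than having to be encoded as text.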

dry raft
dry raft
#

besides, i'm learning discrete maths as a base for learning ml

stuck tapir
#

especially since you're working with graph-based models like GNNs. understanding concepts like graph representations, adjacency matrices, and node embeddings will make it easier to work with graph data and integrate them into your models. discrete math is a great base for this, as it covers the fundamentals you'll need to grasp graph theory and more advanced ML concepts down the road
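To make the adjacency-matrix part concrete: one round of message passing in a GCN-style layer (Kipf & Welling) multiplies node features by a normalised adjacency matrix with self-loops, so each node's new features mix its own and its neighbours'. A numpy sketch on a made-up 4-node path graph; a real layer would also multiply by a learned weight matrix and apply a nonlinearity.

```python
import numpy as np

# Hypothetical 4-node path graph 0-1-2-3, two features per node
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])

A_hat = A + np.eye(len(A))                    # self-loops: each node keeps some of its own features
deg = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt      # symmetric normalisation D^-1/2 (A + I) D^-1/2
H = A_norm @ X                                # one round of neighbourhood aggregation
print(H.round(3))
```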

muted vine
#

hello

#

can someone here explain to me how the Dialogflow AI works?

#

how it captures the text and directs it to the correct intent

stuck tapir
#

dialogflow works by using nlp to match user text to the closest intent you've set up. you define sample phrases per intent, and it uses ML to detect which one fits best. once matched, it can trigger a response or webhook to handle logic. you can access the raw user input too if needed.
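To make the "closest intent" idea concrete, here is a deliberately naive toy: bag-of-words cosine similarity between the user's text and each intent's training phrases, with a threshold standing in for a fallback intent. This is only an analogy, not how Dialogflow's own ML model works and not its API; the intents and phrases are made up.

```python
import math
import re
from collections import Counter

# Hypothetical intents with a few training phrases each
INTENTS = {
    "order.status": ["where is my order", "track my package", "has my order shipped"],
    "store.hours": ["when are you open", "what are your opening hours", "are you open on sunday"],
}


def bag_of_words(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_intent(text, threshold=0.3):
    """Best-matching (intent, score), or (None, score) below the threshold (the 'fallback')."""
    query = bag_of_words(text)
    best_intent, best_score = None, 0.0
    for intent, phrases in INTENTS.items():
        for phrase in phrases:
            score = cosine(query, bag_of_words(phrase))
            if score > best_score:
                best_intent, best_score = intent, score
    return (best_intent, best_score) if best_score >= threshold else (None, best_score)


print(match_intent("Where is my order?"))
print(match_intent("are you open today"))
```

A real NLU model generalises to paraphrases with no word overlap, which this toy cannot; that gap is what the ML in services like Dialogflow is for.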

knotty wolf
#

any aspiring Bengali data scientists here?

serene scaffold
knotty wolf
twin sail
#

Hey Guys, I need some help with an automation project using Pywinauto. I'm stuck on a crucial part: analyzing tabular data inside a Pane. The problem is that this data doesn't appear in the control identifiers, so I can't access it directly.

To work around this, I tried capturing an image of the table and using Tesseract OCR to extract the text. However, the accuracy is only around 80%, and some important data is being extracted incorrectly.

Would AI-based OCR be a better approach? Or is there another way to extract this data more reliably? Any suggestions would be appreciated!
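Before switching engines, it may be worth preprocessing the screenshot: Tesseract's own documentation on improving output quality recommends rescaling and binarisation, and small UI fonts are a common cause of misreads. Below is a numpy-only sketch of upscaling plus Otsu thresholding; `prepare_for_ocr` is a hypothetical helper, and the pytesseract call in the final comment is a suggestion to try, not something verified on this app. Separately, if the window was inspected with pywinauto's default `win32` backend, connecting with `backend="uia"` sometimes exposes controls the other backend doesn't.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu's threshold for a uint8 grayscale image: maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # weight of the "dark" class for each threshold
    mu = np.cumsum(prob * np.arange(256))        # cumulative mean of the dark class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(256)
    valid = denom > 1e-12                        # skip thresholds that leave one class empty
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))


def prepare_for_ocr(gray, scale=3):
    """Upscale (nearest neighbour), then binarise to pure black/white."""
    big = gray.repeat(scale, axis=0).repeat(scale, axis=1)
    t = otsu_threshold(big)
    return np.where(big > t, 255, 0).astype(np.uint8)


# Hypothetical next step (needs Pillow + pytesseract, not run here):
# text = pytesseract.image_to_string(Image.fromarray(prepare_for_ocr(gray)), config="--psm 6")
```

For light text on a dark UI, invert the result (`255 - out`) so the text ends up dark on light.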

serene scaffold
#

(tesseract is AI, but that's neither here nor there.)

stuck tapir
stuck tapir
#

okii

#

gl

dense needle
#

*plotting using ggplot2 package

#

if you end up using R, dplyr is usually recommended as the go to package for cleaning/manipulating data but i recommend data.table instead

#

syntax is less intuitive but it is much faster for big data sets

#

disclaimer: i actually did data science stuff in R before I learned python. however, R was not my first programming language

round parcel
#

can I do DL on an rtx 4060 laptop?

jaunty helm
naive axle
#

I'm training an image classifier using PyTorch, and from epoch to epoch the training accuracy changes from 0.08% to 0.0%. What could be the most probable cause of the training accuracy oscillating like this?

small wedge
naive axle
#

learning rate too high means like 0.001 or 0.1?

small wedge
#

.1 would generally be too high for any large model, .001 could work for a lot of models but the point is not the actual number that you set but the scheduling you're doing

#

usually to avoid the model jumping around right when it's about to reach convergence people will decay the learning rate
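Two common decay schedules, written out as plain functions so the shape of each is visible. These are the same update rules PyTorch's `StepLR` and `CosineAnnealingLR` (without restarts) apply; the base rate, factors and epoch counts are arbitrary example values.

```python
import math


def step_decay(epoch, base_lr=1e-3, gamma=0.5, step_size=10):
    """lr = base_lr * gamma ** (epoch // step_size): the rule StepLR applies."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_annealing(epoch, total_epochs, base_lr=1e-3, min_lr=1e-5):
    """Smooth decay from base_lr to min_lr: the formula CosineAnnealingLR uses."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))


for epoch in (0, 10, 25, 40, 50):
    print(epoch, f"{step_decay(epoch):.2e}", f"{cosine_annealing(epoch, 50):.2e}")
```

In PyTorch you would get these from `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` or `CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5)`, calling `scheduler.step()` once per epoch after the optimizer steps.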

#

batch size also matters though, regardless of your learning rate that could cause a problem by giving poor gradient estimates

naive axle
#

got it

hallow badger
#

Gemini 2.0 Flash image generation looks very powerful, what do you think about it?

stuck tapir
# round parcel can I do DL on an rtx 4060 laptop?

yeah def, an rtx 4060 laptop can handle most dl tasks pretty well. you can train cnn models, run transformers, even fine-tune small llms if you manage vram smartly. just keep an eye on thermals and maybe use mixed precision where you can.
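A back-of-the-envelope check for "will it fit": weights alone take parameter count × bytes per parameter, and full fp32 training with Adam needs roughly 16 bytes per parameter (weights, gradients, two optimiser moment buffers) before activations are counted. Laptop RTX 4060s have 8 GB of VRAM. A rough sketch; the byte counts are the usual rules of thumb, not measurements.

```python
BYTES_PER_PARAM = {
    "fp32 inference": 4.0,
    "fp16/bf16 inference": 2.0,
    "int8 inference": 1.0,
    "4-bit inference": 0.5,
    "fp32 training with Adam": 16.0,  # weights + grads + 2 Adam buffers; activations not included
}


def estimate_gib(n_params, mode):
    """Rough memory for the parameters (plus optimiser state when training), in GiB."""
    return n_params * BYTES_PER_PARAM[mode] / 1024**3


for n in (125e6, 1.5e9, 7e9):
    row = ", ".join(f"{mode}: {estimate_gib(n, mode):.1f}" for mode in BYTES_PER_PARAM)
    print(f"{n / 1e9:.3g}B params -> {row} (GiB)")
```

For example, a 7B model at 4 bits is about 3.3 GiB of weights and fits for inference, while full fp32 Adam fine-tuning of even a 1.5B model (about 22 GiB) does not; that gap is why mixed precision and parameter-efficient fine-tuning come up.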

stuck tapir
stuck tapir
stuck tapir
lapis sequoia
#

Yo so like, YOLO object detection, yay or nay? And why is CV so limited with deep learning or something?

stuck tapir
quartz karma
#

Hi, say I have a list of integers with arbitrary duplicates, like {1,2,3,3,4,5,6,6,6,7,...,10000}, and I slice the list into sublists of varying lengths so that no value appears in two different sublists. The sliced list looks like this: {1,2,3,3} {4,5,6,6,6} {7,...} ... {..., 10000}

#

My question is: is there a simple mathematical function subID = F(x), where x is the value of an arbitrary element and subID is the ID number of the sublist that x belongs to?

stable hollow
#

Hopefully this is a really simple question but I have a line chart with a table of the values beneath it. It would be really nice for the first cell in each row to have the little symbol showing which line it is. Can this be done?

(example is from excel)

lapis sequoia
#

Guys I need help, I am building a chatbot using RAG. The problem is that I feed data in through PDFs, but I need to fetch that data directly from the website (URL), so it answers with the updated information from the website. Is that possible?
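It is possible: instead of (or alongside) PDFs, fetch the page, strip the HTML to text, chunk it, and push the chunks through the same embedding/indexing step. Below is a standard-library sketch of the fetch-clean-chunk part; the function names and chunk sizes are assumptions, and the embedding/vector-store step is whatever your RAG stack already uses. Re-running it on a schedule is what keeps the answers in sync with the site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping windows of `size` words, ready to embed and index."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def fetch_page_chunks(url, size=500, overlap=50):
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return chunk_text(html_to_text(html), size, overlap)
```

Pages that build their content with JavaScript won't show it in the raw HTML; those need a headless browser or the site's API instead.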

unkempt apex
#

which website are you referring to?

jaunty helm
#

or ig, you mean that for all sublists, no item in sublist_i can be in sublist_j
but then you don't have a unique way to do that, e.g. you could've cut it into

{1, 2, 3, 3, 4, 5},  {6, 6, 6, 7},  ...
stuck tapir
# quartz karma Hi, if i have list of integer with arbitrary duplications as following {1,2,3,3,...

yo so if the sublists are made by grouping duplicates together, like all same numbers stay in the same sublist (and each value appears in only one sublist),
then nah, there's no simple math function F(x) that maps x → subID directly unless u track how many unique values showed up before x in the full list.

u basically need either a dict mapping x → subID (if slicing is done already),
or build F(x) by knowing the slice rules (e.g. sublist ends when next dup shows up)
so unless the sublist pattern is strict + predictable, can't define a clean F(x)
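If the sublists are consecutive slices of a sorted list (as in the example), F(x) does exist as a step function: store the first value of each sublist and binary-search it. A minimal sketch with Python's `bisect`; `make_sublist_lookup` is a hypothetical name, and values that fall outside every sublist's range raise `KeyError`.

```python
from bisect import bisect_right


def make_sublist_lookup(sublists):
    """Return F(x) -> index of the sublist containing x.

    Assumes the sublists are consecutive slices of one sorted list, so each covers a
    contiguous value range [first, last]. O(log k) per lookup for k sublists.
    """
    starts = [sub[0] for sub in sublists]

    def F(x):
        i = bisect_right(starts, x) - 1          # last sublist starting at or before x
        if i < 0 or x > sublists[i][-1]:
            raise KeyError(x)                    # x lies before the first or between sublists
        return i

    return F


F = make_sublist_lookup([[1, 2, 3, 3], [4, 5, 6, 6, 6], [7, 8, 9, 10]])
print(F(3), F(6), F(7))  # 0 1 2
```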

stuck tapir
stuck tapir
lofty knoll
#

Hello, good day to you. I need help. I'm having trouble pasting formulas as values in Python, how should I do this? I want to loop over all the .xlsx files in a folder and paste as values all of the live formulas in those files. 🙏
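A sketch with openpyxl, assuming the files were last saved by Excel: openpyxl doesn't evaluate formulas, but `load_workbook(..., data_only=True)` returns the values Excel cached on its last save, so the idea is to open each file twice and copy those cached values over the formula cells. `paste_values_in_folder` is a hypothetical helper. Work on copies, since openpyxl can drop content it doesn't read (e.g. shapes) when it re-saves a file.

```python
from pathlib import Path

import openpyxl


def paste_values_in_folder(folder):
    """Replace formulas with their cached values in every .xlsx file in `folder`.

    Returns the number of cells converted. Formula cells with no cached value (e.g. a file
    written by a script and never opened/saved in Excel) are left as formulas.
    """
    converted = 0
    for path in sorted(Path(folder).glob("*.xlsx")):
        if path.name.startswith("~$"):
            continue  # Excel's lock file for a workbook that is currently open
        wb = openpyxl.load_workbook(path)                       # formulas as "=..." strings
        cached = openpyxl.load_workbook(path, data_only=True)   # last values Excel calculated
        for ws in wb.worksheets:
            ws_cached = cached[ws.title]
            for row in ws.iter_rows():
                for cell in row:
                    if cell.data_type == "f":
                        value = ws_cached[cell.coordinate].value
                        if value is not None:
                            cell.value = value
                            converted += 1
        wb.save(path)
    return converted
```

If the files have never been recalculated by Excel, there are no cached values to paste; in that case something that actually runs Excel (or LibreOffice) has to recalculate them first.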

#

Thanks in advance 🙏

lapis sequoia
#

Hi! I am new to data analytics and want to practise. I have gone through libraries like numpy, pandas, matplotlib and seaborn, but practice is what I lack, so could anyone tell me which Kaggle dataset or code I can use, or what to do in this situation...

stable hollow
#

I'm hoping somebody can help me make the code to do it

serene scaffold
#

Where do you want these tables to appear?

stable hollow
#

On my computer screen?

serene scaffold
#

In a notebook? On a web page? As a PNG?

stable hollow
#

Not sure I understand what you mean

serene scaffold
#

In a word document? In a pdf?

stable hollow
austere prawn
austere prawn
glacial root
glacial root
stable hollow
#

Basically everything works except inserting these legend symbols into the cells

austere prawn
stable hollow
safe agate
#

Ooh, I'd be interested too in legend icons repeated in the table

jaunty helm
#

in the table sounds like a hassle
either you somehow put the string representation of what you want into a new column, or you hack jupyter's html to display what you want
(or maybe there's a 3rd way unknown to me)
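The string-in-a-column route can be sketched with matplotlib's own table: prefix each row label with a colour-swatch character and colour that label with its line's colour. It's a cheaper approximation than reproducing the exact marker and linestyle (which is what the bbox approach further down does); the data, labels and the choice of U+25A0 as the swatch are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: one line per series, with its values repeated in a table under the plot
columns = ["Q1", "Q2", "Q3", "Q4"]
series = {"Sales": [3, 5, 4, 6], "Costs": [2, 3, 3, 4]}

fig, ax = plt.subplots()
lines = {name: ax.plot(columns, values, marker="o", label=name)[0] for name, values in series.items()}
ax.legend()
ax.set_xticks([])  # the table's column labels take over from the x tick labels

table = ax.table(
    cellText=[[str(v) for v in values] for values in series.values()],
    rowLabels=[f"\u25a0 {name}" for name in series],  # U+25A0 BLACK SQUARE as a colour swatch
    colLabels=columns,
    loc="bottom",
)
# Row-label cells live in column -1; data rows start at 1 because row 0 holds the colLabels.
for row, name in enumerate(series, start=1):
    table[row, -1].get_text().set_color(lines[name].get_color())
fig.subplots_adjust(bottom=0.25)
```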

stable hollow
arctic wedgeBOT
stable hollow
austere prawn
#

The text I would guess is already in the table. The question is how to get an icon there that matches the legend's style. I think it would be A LOT of work

stable hollow
#

how come it's not a lot of work to get it in the seaborn legend???

#

surely it's just a bunch of shapes???

glacial root
#

for a digital camera pipeline should i be using opencv

serene scaffold
#

Did someone delete their own message?

hollow pagoda
#

does anyone know why normalization made V3 column clone V2

hollow pagoda
stuck tapir
stuck tapir
hearty depot
#

i find kaggle is way too simple most of the time, and more interesting data comes from the data sources themselves

main fox
#

Wondering if anyone familiar with NLP has come across a similar problem and can offer alternative approaches.

I was recently tasked with extracting evidence of a certain medical condition from PDFs. These PDFs are very non-standard in their form. They sometimes span hundreds of pages.

The particular evidence in question is valid if a patient has had a related screening in the last two years.
The dates of service for these procedures are also very non-standard. They can be yyyy-MM-dd, or of the form "Jan 12, 2025", etc.

Sometimes the patient refuses to have the procedure, so some sort of "assertion" needs to happen to check this. Also sometimes the evidence is related to a family member.

I ended up building a rule based program that just looks for keywords, parses dates using regex, looks if certain words like "refused" are present in a small context window. Being just regex/word matching, it runs very fast, and I know exactly how it works. But it can definitely miss charts that have valid evidence.
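For reference, a compact sketch of the rule-based approach described above: find keyword mentions, skip any with a refusal or family-member cue nearby, and accept a mention if a date in the same window falls within two years of a reference date. The keyword, cue words, date formats, window size and the 730-day approximation of "two years" are all illustrative assumptions; medspaCy's ConText (mentioned below) is a more principled version of the cue-word check.

```python
import re
from datetime import date, datetime

# Date formats seen so far (extend as new ones turn up); the m/d/Y one is a hypothetical addition.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),               # 2024-03-05
    (re.compile(r"\b[A-Z][a-z]{2} \d{1,2}, \d{4}\b"), "%b %d, %Y"),  # Jan 12, 2025
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),          # 03/05/2024
]
REFUSAL = re.compile(r"\b(refused|declined)\b", re.IGNORECASE)
FAMILY = re.compile(r"\b(mother|father|sister|brother|family history)\b", re.IGNORECASE)


def parse_dates(text):
    """Return (char_offset, date) for every parseable date in the text."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for m in pattern.finditer(text):
            try:
                found.append((m.start(), datetime.strptime(m.group(), fmt).date()))
            except ValueError:
                pass  # looked like a date but isn't one, e.g. 2024-13-45
    return found


def find_evidence(text, keyword, as_of=None, window=60, max_age_days=730):
    """Yield (date, snippet) for keyword mentions with a recent date and no refusal/family cue nearby."""
    as_of = as_of or date.today()
    dates = parse_dates(text)
    for m in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        lo, hi = max(0, m.start() - window), m.end() + window
        snippet = text[lo:hi]
        if REFUSAL.search(snippet) or FAMILY.search(snippet):
            continue
        recent = [d for pos, d in dates if lo <= pos < hi and 0 <= (as_of - d).days <= max_age_days]
        if recent:
            yield max(recent), snippet.strip()
```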

stuck tapir
# main fox Wondering if anyone familiar with NLP has come across a similar problem and can ...

yo that's a super common real-world nlp challenge
messy, domain-specific docs w/ inconsistent formats. honestly, your rule-based setup sounds solid for precision + speed. to boost recall, maybe try a hybrid setup: keep ur regex for speed, but add a lightweight ML/LLM layer (like a DistilBERT fine-tuned on examples of positive/negative evidence chunks) to catch edge cases. you can also use spaCy's dependency parsing for better assertion logic (e.g. link "refused" to the right subject). also worth extracting & normalizing dates w/ dateparser, it handles weird formats better than custom regex

serene scaffold
main fox
#

We have OCR in place that is very accurate

I'll have to look more into spaCy
It wasn't until after I built the program that I noticed medspacy has something called "ConText" for the assertion part

#

Handwritten notes are definitely a mess sometimes
The rest of the text, including text in tables, is extracted with good accuracy

serene scaffold
#

Medspacy is still a thing?

#

I'm at the creator of it in like 2018

#

I met*

#

He was trying to decide if you wanted to build on my platform or start something new

#

And he built something new

hollow pagoda
hollow pagoda
main fox
river cape
#

Hi , so I tried to train a model using colab , now after training the model , if I run the model.fit cell again , does it further train the same model

#

and if i want to train the model from scratch , should i restart the session?

jaunty helm
river cape
hollow pagoda
#

and just set the params to the latest results
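In Keras the weights (and optimiser state) live on the model object, so re-running the `model.fit` cell continues from where the last call stopped; to start from scratch, rebuild the model (e.g. re-run the cell that creates and compiles it) rather than restarting the Colab session. The pure-Python toy below (a hypothetical `TinyLinearModel`, not Keras) mimics that behaviour so it can run anywhere: two `fit(epochs=50)` calls end up exactly where one `fit(epochs=100)` does.

```python
import random


class TinyLinearModel:
    """Toy y = w*x + b model trained by SGD, standing in for a Keras model.

    The weights are attributes of the object, so each fit() call resumes from them.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w, self.b = rng.uniform(-1, 1), rng.uniform(-1, 1)

    def fit(self, xs, ys, epochs=1, lr=0.01):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = (self.w * x + self.b) - y
                self.w -= lr * err * x
                self.b -= lr * err
        return self

    def loss(self, xs, ys):
        return sum((self.w * x + self.b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
model = TinyLinearModel(seed=0)
model.fit(xs, ys, epochs=50)
model.fit(xs, ys, epochs=50)        # continues training: same as one 100-epoch call
fresh = TinyLinearModel(seed=0)     # "from scratch" = a new model object, no session restart
print(model.loss(xs, ys), fresh.loss(xs, ys))
```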

cedar tusk
#

can you check the mean and variance of both columns

#

maybe the correlation is 1
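That would explain it: if V3 is an exact positive linear function of V2 (V3 = a·V2 + b with a > 0), their correlation is 1, and any normalisation that subtracts a location and divides by a spread (z-score, min-max) maps both to identical columns. With a < 0 you get exact negatives instead. A numpy sketch with made-up values showing the effect.

```python
import numpy as np


def zscore(col):
    col = np.asarray(col, dtype=float)
    return (col - col.mean()) / col.std()


v2 = np.array([1.0, 4.0, 2.0, 8.0])
v3 = 3.0 * v2 + 10.0            # hypothetical: V3 is an exact linear function of V2

print(np.corrcoef(v2, v3)[0, 1])            # correlation of 1 (up to floating-point error)
print(np.allclose(zscore(v2), zscore(v3)))  # the normalised columns are identical
```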

hidden cloud
#

Hi guys ,
I'm a beginner

serene scaffold
fickle shale
hidden cloud
#

I wanna be a Data scientist what should I do as a beginner?

serene scaffold
serene scaffold
serene scaffold
#

@stuck tapir your message was removed for seeking an employee, which is not allowed.

stuck tapir
#

oh dang mb

hearty depot
#

like if u know basic linear and what convexity is

#

u basically have enough to make most models by just copying architecture

#

for stuff that has a little bit more math like vae u can learn the stats on the way

serene scaffold
flint grotto
#

Where are the LLM textbooks? Please recommend some books.

#

Please recommend resources.

glacial root
flint grotto
#

I want to study LLM, but all the books and materials are theses. I want to find books and materials, so please recommend some.

hearty depot
#

Then read gpt2

#

Once u read this, try to load the weights and perform inference

flint grotto
muted vine
lapis sequoia
#

OK, GridSearchCV hyperparameter tuning with cv=4: the ROC AUC score and the accuracy score from that are not the same as the ones from using sklearn.metrics for accuracy and ROC AUC, right?
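They are usually different numbers, though the metric definitions match: `GridSearchCV.best_score_` is the mean score over the `cv` validation folds carved out of the training data, while calling `sklearn.metrics` yourself usually means scoring one held-out test set. ROC AUC also uses predicted probabilities or scores, whereas accuracy uses hard labels, so those two won't match each other either. A sketch on scikit-learn's bundled breast-cancer dataset; the model and parameter grid are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

search = GridSearchCV(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",
    cv=4,
)
search.fit(X_train, y_train)  # also refits the best model on all of X_train at the end

cv_auc = search.best_score_                                          # mean over the 4 validation folds
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])  # one held-out set, probabilities
test_acc = accuracy_score(y_test, search.predict(X_test))            # hard 0/1 predictions

print(f"CV ROC AUC {cv_auc:.3f} | test ROC AUC {test_auc:.3f} | test accuracy {test_acc:.3f}")
```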

hollow pagoda
left sapphire
#

any pandas users here know how to run map() while retaining references to the current row and column of each element?

#

I have a data format where I have dozens of columns with categorical IDs, i.e.

incident_type, materials_type, human_factors
102, 50, 3
140, 42, 5

and each of those integers matches a lookup table where incident_type:102=STRUCTURE FIRE, materials_type:50=COMPRESSED GAS, etc

so I need to know what column I am on while doing an applymap so I can sub in the correct lookup table value
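`applymap` (renamed `DataFrame.map` in pandas 2.1) only hands you the cell value, but working column by column avoids the problem: `Series.map` accepts a dict, and each column Series carries its own name. A sketch using the two mappings quoted above; storing the lookup tables as a dict of dicts keyed by column name is an assumption about your setup.

```python
import pandas as pd

df = pd.DataFrame({
    "incident_type": [102, 140],
    "materials_type": [50, 42],
    "human_factors": [3, 5],
})

# One lookup table per column. Only the two codes quoted above are filled in; a real
# table would list every code. Codes missing from a table come back as NaN.
lookups = {
    "incident_type": {102: "STRUCTURE FIRE"},
    "materials_type": {50: "COMPRESSED GAS"},
}

# DataFrame.apply passes each column as a Series, and Series.name is the column label,
# so every column can pick its own lookup table. Columns without a table are left as-is.
labeled = df.apply(lambda col: col.map(lookups[col.name]) if col.name in lookups else col)
print(labeled)
```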

glacial root
#

this may be a trivial question, but do people generally prefer importing individual methods or modules, or just importing the whole library and having to type out everything (not wildcard import, don't worry i'm not that stupid)

serene scaffold
#

I think what you meant to ask was "do people prefer importing modules or importing individual classes and functions?"

glacial root
#

i mean like the "from ... import ..."

glacial root
#

is a module not a group of methods?

#

like for example with numpy, there's the linalg module

serene scaffold
#

A method is a function that belongs to a class. All methods are functions, but not vice versa

glacial root
#

wait yeah my bad i meant function

#

i gotta get rid of the habit of saying methods and parameters and instead say functions and arguments

serene scaffold
quaint mulch
quaint mulch
serene scaffold
spring field
#

it is also typically influenced by example code

#

like if I see that they import the module with an alias, I'll probably do the same in my code
if I see them importing specific names from a module, I'll also do that most likely
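Both styles end up with the very same function object; the choice is mostly about readability and convention. The snippet also shows the method/function distinction from above: `v.sum()` is a method of the array's class, `np.sum(v)` is a module-level function.

```python
import numpy as np                 # module import with the conventional alias
from numpy.linalg import norm      # import one function by name

v = np.array([3.0, 4.0])

print(norm is np.linalg.norm)      # True: two names for one function
print(norm(v), np.linalg.norm(v))  # 5.0 5.0

# Method (attached to the ndarray class) vs module-level function: same result here
print(v.sum(), np.sum(v))          # 7.0 7.0
```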

neat crystal
#

how can i stop my language model AI from just spamming punctuation to cheat the system

arctic wedgeBOT
austere prawn
#

Nice!

stable hollow
#

deriving the bbox position and spacing of the seaborn legend from the cell heights of the table, I am able to fake it

#

I am such a clever boy I deserve a treat

#

I know

#

as a little treat, I will take back the holy land from the nonbelievers

#

I can see it now - a holy war spreading across the land like unquenchable fire

#

fanatical legions worshipping at the shrine of my skull

#

a war in my name

#

everyones shouting my name

river cape
#

Heyyy guys I have a dissertation to make , could you recommend some problem statements that i should be working on ?

neat crystal
#

im making a ai and its already questioning me

Generated text after epoch 8:
<user> Hi how are you? <bot> '' what did you know ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? to do ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?

neat crystal
# river cape what model are you using??

What do you mean? uh, here, I'll let you look

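```python
# vocab_size, embedding_dim, embedding_matrix, CONFIG, lstm_units and dropout_rate
# are defined elsewhere in the original script
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout, Embedding
from tensorflow.keras.models import Sequential

model = Sequential([
    Embedding(
        vocab_size,
        embedding_dim,
        weights=[embedding_matrix],
        input_length=CONFIG["max_sequence_len"],
        trainable=True  # Fine-tune embeddings
    ),

    Bidirectional(LSTM(lstm_units, return_sequences=True)),
    Dropout(dropout_rate),

    LSTM(lstm_units, return_sequences=True),
    Dropout(dropout_rate),

    LSTM(lstm_units),
    Dropout(dropout_rate),

    Dense(vocab_size, activation='softmax')  # one probability per vocabulary token
])
```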
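If generation currently takes the argmax token at every step, a common generation-side mitigation for output like the "? ? ? ?" run above is to sample from the softmax output with a temperature and down-weight tokens that were just produced. Training-side fixes (more data, checking how often "?" appears in the targets) still matter. A numpy sketch; the penalty rule is a simple hypothetical scheme, not a Keras API, and `probs` would be one row of the model's `predict` output.

```python
import numpy as np


def sample_next_token(probs, recent_tokens=(), temperature=0.8, repetition_penalty=1.3, rng=None):
    """Pick the next token id from one softmax output row.

    Each token in `recent_tokens` first has its probability divided by
    `repetition_penalty`; then temperature is applied (< 1 sharpens, > 1 flattens)
    and the distribution is renormalised before sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12)
    for tok in set(recent_tokens):
        logits[tok] -= np.log(repetition_penalty)   # same as dividing that probability
    logits /= temperature
    p = np.exp(logits - logits.max())               # softmax, shifted for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p))


# Generation-loop sketch (model, x and `generated` come from your own script):
# probs = model.predict(x, verbose=0)[0]
# next_id = sample_next_token(probs, recent_tokens=generated[-10:])
```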
hollow pagoda
#

dense might be just the activation layer before output

neat crystal
hollow pagoda
#

ya i think each model computes its own layers, at least that's what it looks like

grand minnow
#

I made a chatbot that is powered by Google Gemini. How do I track and limit it so that it doesn't make me go broke?
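A minimal sketch of an app-side guard, assuming you can read a token count from each response (the Gemini SDKs report usage in a `usage_metadata` field; check the exact attribute names for your SDK version). `TokenBudget` and its limits are hypothetical. It complements a billing budget alert in Google Cloud, which notifies you but doesn't by itself stop usage.

```python
import time
from collections import defaultdict


class TokenBudget:
    """Per-user and app-wide daily token caps for an LLM-backed chatbot (hypothetical helper)."""

    def __init__(self, per_user_daily=20_000, global_daily=500_000, clock=time.time):
        self.per_user_daily = per_user_daily
        self.global_daily = global_daily
        self.clock = clock
        self._day = None
        self._used = defaultdict(int)
        self._total = 0

    def _roll_over(self):
        today = int(self.clock() // 86_400)  # UTC day number; counters reset at midnight UTC
        if today != self._day:
            self._day, self._used, self._total = today, defaultdict(int), 0

    def allow(self, user_id):
        """Check before calling the API."""
        self._roll_over()
        return self._used[user_id] < self.per_user_daily and self._total < self.global_daily

    def record(self, user_id, tokens):
        """Call after each response with the token count it reports."""
        self._roll_over()
        self._used[user_id] += tokens
        self._total += tokens
```

Usage: check `budget.allow(user_id)` before each request (reply with a "limit reached" message if it's False), then `budget.record(user_id, tokens_used)` afterwards. Persist the counters (e.g. in a database) if the bot runs in more than one process.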

hoary wigeon
#

Hey guys, I need help.

#

I started my journey in Data Science in 2022 and have been working with classical ML algorithms since then. Now that I have some time, I want to upgrade my skills and stay up to date with the latest tech stack. I need guidance on where to start and the best resources to learn from.

pearl barn
#

I have a question: which is the best course to learn Python data analysis, the Maven Analytics course or the Jose Portilla course??

static oar
#

I wanted to replicate a project from github which translates sign language to English and vice versa. I can't even run the original project, let alone create a new one. Appreciate your patience and time.

Here's the repo link: https://github.com/kevinjosethomas/sign-language-processing

glacial root
#

anyone here know of any good resources to look up on setting up a digital camera pipeline

olive obsidian
#

Hi, hopefully this is the right channel, but I'm sure someone around might have an idea about this. I'm working on a project where I'm receiving position data (x, y) from a sensor. I'm now looking into the Kalman filter to predict the sensor's position a short time ahead (~100-300 ms).

I've been reading up on the Kalman filter and trying to implement it. I'm curious if someone around has done this before and might want to help me a bit.

The Kalman filter I want to create should keep track of the estimate with [x, y, vx, vy]. What I'm especially curious about is how I should set up the state covariance matrix (P). Should I simply come up with some values for the P matrix?
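On the P question: a common starting point is a diagonal P holding the variance you believe each state component has at start-up. If x and y are initialised from the first measurement, their variance is about the sensor's measurement variance; vx and vy are unknown, so they get a large variance. The filter shrinks P within a few updates, so the exact initial numbers mostly affect the first few steps. The numpy sketch below is a constant-velocity filter with a white-noise-acceleration process noise Q and a `look_ahead` for the 100-300 ms prediction; every noise value (`meas_std`, `vel_std`, `accel_std`) is a placeholder to tune for the actual sensor.

```python
import numpy as np


class ConstantVelocityKF:
    """2-D constant-velocity Kalman filter with state [x, y, vx, vy]."""

    def __init__(self, x0, y0, meas_std=0.5, vel_std=10.0, accel_std=2.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        # Initial uncertainty: position comes from the first measurement (measurement
        # variance); velocity is unknown, so start with a large variance.
        self.P = np.diag([meas_std**2, meas_std**2, vel_std**2, vel_std**2])
        self.R = np.eye(2) * meas_std**2                # measurement noise covariance
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])       # we only observe x and y
        self.accel_std = accel_std

    def _transition(self, dt):
        F = np.eye(4)
        F[0, 2] = F[1, 3] = dt                          # x += vx*dt, y += vy*dt
        # Q from a white-noise acceleration model (random accel of std accel_std per axis)
        G = np.array([[0.5 * dt**2, 0.0], [0.0, 0.5 * dt**2], [dt, 0.0], [0.0, dt]])
        Q = self.accel_std**2 * G @ G.T
        return F, Q

    def predict(self, dt):
        F, Q = self._transition(dt)
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def update(self, zx, zy):
        z = np.array([zx, zy])
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P

    def look_ahead(self, horizon):
        """Extrapolated (x, y) `horizon` seconds ahead; does not change the filter state."""
        F, _ = self._transition(horizon)
        return (F @ self.x)[:2]
```

Usage: call `predict(dt)` then `update(x, y)` for every sensor reading, and `look_ahead(0.2)` whenever you need the position 200 ms out.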

glacial root
# hearty depot Opencv is decent

is there any way to detect and correct lens distortions of an image with just that image or do i need multiple images with different perspectives

random rune
#

Hello 🙂 could someone give me some clues on how to do my homework please 🙂 Apologies if this isn't the right place to post!

#

message me if you can maybe give me some clues - I want to figure it out for myself but I'm just needing a little push haha

#

I did manage to get some help from my classmate on task 2 in the end - i sort of understand it now

untold fable
arctic wedgeBOT
untold fable
#

pls check this out

opaque condor
#

@rich moth how did you manage to get your graphs to work

glacial root
glacial root
# cedar tusk lol youtube

all the youtube videos i've been finding either just give a very basic overview or they just straight up show the code (which i don't want, i want to be able to code it myself after learning the theory)

#

it wasn't like that for neural networks, i was able to find in-depth theory videos for that, which allowed me to implement it myself using just numpy

hearty depot
#

This is an OK primer to classical CV

glacial root
#

oh wait this is exactly what i was referred to by someone else lol

#

i could not understand the first part of 2.1 though

#

i must be stupid or something

#

after the homogeneous vector part i just did not know where things were coming from
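Homogeneous coordinates mostly exist to make such constructions linear: a 2-D point (x, y) becomes (x, y, 1), any nonzero multiple represents the same point, the line through two points is their cross product, and two lines meet at the cross product of the lines. A numpy sketch of those identities; the helper names are illustrative.

```python
import numpy as np


def to_homogeneous(p):
    """(x, y) -> (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)


def from_homogeneous(ph):
    """(x~, y~, w~) -> (x~/w~, y~/w~). Undefined for w~ == 0, which is a point at infinity."""
    ph = np.asarray(ph, dtype=float)
    return ph[:-1] / ph[-1]


def line_through(p, q):
    """Line (a, b, c), meaning a*x + b*y + c = 0, through two 2-D points: a cross product."""
    return np.cross(to_homogeneous(p), to_homogeneous(q))


def intersect(l1, l2):
    """Homogeneous intersection point of two lines: also a cross product."""
    return np.cross(l1, l2)


l1 = line_through((0, 0), (1, 1))     # y = x
l2 = line_through((0, 1), (1, 0))     # x + y = 1
print(from_homogeneous(intersect(l1, l2)))  # [0.5 0.5]
```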

hearty depot
#

I'd focus on getting good at linear first

#

A lot of ml will be painful if u don't know linear and stats well

glacial root
#

i see

#

i know linear to an extent, but it could definitely be better

#

part of it though was just that i didn't know where some of the equations were coming from

#

like for example with this

#

i get that lambda is a diagonal matrix (it is right?) but i have no clue as to what this is being used for

hearty depot
glacial root
#

so that gives us r = μ * p + λ * q

quaint mulch
#

I'm still not sure what do you want?
Do you need help to find more dataset than 600k?
Do you need help to refine the 7k instance?
Are you showing one project outcome and asking for feedback on this "report"?
Are you asking for another project idea?

quaint mulch
quaint mulch
# hollow pagoda im still learning but is this labeling accurate? or are those not layers

The input layers could have been multiple layers. Have to double check with the definition of the Embedding class. But it seems to be a single layer, since there is only one embedding_matrix. but looks like it is expecting a list, so it could have been multiple layers.

The output layer is most probably not just an activation layer, because the name of the class is Dense, which is usually a linear layer.

quaint mulch
quaint mulch
quaint mulch
quaint mulch
quaint mulch
quaint mulch
#

just scroll to the end of the page. It says so.
If you want a 3D line, then there is no constraint
if you want a 3D line SEGMENT, then you should constrain it
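The constraint is just a range on the parameter. Writing the line through p and q as r = (1 - λ)p + λq (the homogeneous form r̃ = μp̃ + λq̃ with μ = 1 - λ, which keeps the last coordinate at 1), any real λ gives a point on the infinite 3-D line, and 0 ≤ λ ≤ 1 restricts it to the segment between p and q. A small numpy sketch; the function names are illustrative.

```python
import numpy as np


def point_on_line(p, q, lam):
    """r = (1 - lam) * p + lam * q: lam = 0 gives p, lam = 1 gives q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return (1.0 - lam) * p + lam * q


def is_on_segment(lam):
    """Every real lam lies on the infinite line; only 0 <= lam <= 1 stays between p and q."""
    return 0.0 <= lam <= 1.0


p, q = (0.0, 0.0, 0.0), (2.0, 4.0, 6.0)
for lam in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(lam, point_on_line(p, q, lam), "segment" if is_on_segment(lam) else "line only")
```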

quaint mulch
glacial root
#

oh wait yeah i forgot only bolded items are matrices

quaint mulch
quaint mulch
glacial root
#

oh wait yeah

#

capital are matrices

quaint mulch
#

I just edited my explanation above, does it make sense now?

hollow pagoda
hollow pagoda
#

but lambda being x within the lines, like progression from point 1 to 2 as u explained

olive obsidian
#

I'm thinking about the prediction step of the Kalman filter: P' = APAᵀ + Q. I'm still trying to get a good understanding of why the Aᵀ is required there. If I got it right, it's meant to make sure that the result of the multiplication is a symmetric covariance matrix. Am I right?
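The Aᵀ comes from how a covariance transforms under a linear map, Cov(Ax) = A Cov(x) Aᵀ. Symmetry is a consequence of that rather than its purpose. Assuming the usual model (state evolves as x_k = A x_{k-1} + w with process noise w ~ N(0, Q), prediction x̂' = A x̂, and estimation error uncorrelated with w), the prediction step follows in a few lines:

```latex
% e = x_{k-1} - \hat{x} is the current estimation error, with P = E[e e^T]
\begin{aligned}
e' &= x_k - \hat{x}' = A(x_{k-1} - \hat{x}) + w = Ae + w \\
P' &= \mathbb{E}\left[e' e'^{\top}\right]
    = A\,\mathbb{E}\left[e e^{\top}\right]A^{\top}
      + A\,\mathbb{E}\left[e w^{\top}\right]
      + \mathbb{E}\left[w e^{\top}\right]A^{\top}
      + \mathbb{E}\left[w w^{\top}\right] \\
   &= APA^{\top} + Q \qquad \text{(cross terms vanish because } e \text{ and } w \text{ are uncorrelated)}
\end{aligned}
```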

broken gyro
#

is this the right place to ask about AI models?

jaunty helm
broken gyro
#

So I want to test some open source ai models and identify which ones can take max parameters + should be able to run without internet.

Am not familiar with using these models. Was suggested to find them on huggingface.

jaunty helm
dense lantern
#

Can somebody help me figure out why my x axis looks like that?

#

I am using matplotlib but the x axis value doesn't line up

dense lantern
#
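```python
import matplotlib.pyplot as plt

# fts: list of width_val lists, each shaped [[width, x, y], ...], built earlier
fig, ax = plt.subplots()
for width_val in fts:
    x = [item[1] for item in width_val]
    y = [item[2] for item in width_val]
    ax.plot(x, y)
```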
jaunty helm
#

what's width_val and fts

dense lantern
#

it is a list of list with this format [[<width>, <x>, <y>]]

broken gyro
dense lantern
#

fts is just a list of width_val

jaunty helm
# dense lantern ```fts``` is just a list of ```width_val```
```python
>>> import matplotlib.pyplot as plt
>>> from random import random
>>>
>>> fts = [[[random()]*3 for _ in range(5)] for _ in range(3)]
>>> fts
[[[0.6070407867652481, 0.6070407867652481, 0.6070407867652481], [0.21341951630147704, 0.21341951630147704, 0.21341951630147704], ...
>>> fig, ax = plt.subplots()
>>> for width_val in fts:
...     x = [item[1] for item in width_val]
...     y = [item[2] for item in width_val]
...     ax.plot(x, y)
...
[<matplotlib.lines.Line2D object at 0x000001E4EABC64B0>]
[<matplotlib.lines.Line2D object at 0x000001E4EABC6750>]
[<matplotlib.lines.Line2D object at 0x000001E4EABC6A80>]
>>> plt.show()
```
can't reproduce your x-axis thing
jaunty helm
#

yes, e.g. 7b means that model has 7 billion parameters

dense lantern
#

I think I found the solution: x is a string instead of a float. That's why the x values aren't lining up
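That's the usual cause: when matplotlib gets strings, it treats x as categorical and places each distinct string at 0, 1, 2, ... in order of first appearance, so the axis is neither sorted nor spaced numerically. Converting to `float` (and sorting by x so the line doesn't zigzag) fixes it. A sketch with made-up values in the same `[width, x, y]` layout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data in the [[width, x, y], ...] layout, with x and y read in as strings
fts = [
    [[5, "10", "3.0"], [5, "2", "1.0"], [5, "1.5", "0.5"]],
]

fig, ax = plt.subplots()
for width_val in fts:
    points = sorted((float(item[1]), float(item[2])) for item in width_val)  # convert, sort by x
    x = [pt[0] for pt in points]
    y = [pt[1] for pt in points]
    ax.plot(x, y)
```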

broken gyro
#

I have not used any models so idk how to set them up, need help with that too

jaunty helm
#

I'm not really understanding what you mean by "max parameters"

#

a 7b model has roughly 7 billion parameters
a 1.5b model has roughly 1.5 billion parameters

broken gyro
#

uhm, idk tbh 💀

jaunty helm
broken gyro
#

just test few models

jaunty helm
jaunty helm
# broken gyro yea

imo the easiest backends (the software used to run those models) are either ollama or koboldcpp
personally I prefer koboldcpp but the 2 are p similar in terms of ease of use

#
  1. download a version of koboldcpp from releases fit for your hardware
  2. download a model (that's stored in GGUF format), e.g. this; you'll see a lot of versions of the same model, e.g. Q4_K_M, Q6_K, you don't have to worry too much about it rn and just download the Q4_K_M one
  3. open koboldcpp.exe, select the file you downloaded, click Launch, and start chatting
broken gyro
#

so it's GUI-based?

jaunty helm
broken gyro
#

What if I want to set up a CLI-based one?

jaunty helm
#

if you have an AMD gpu and want to use specifically rocm, there's a fork

broken gyro
#

got rtx 3060

jaunty helm
austere prawn
#

Did someone post here recently about an alternative to jupyter notebook?

I've been using it for 2 weeks and it doesn't feel like the sweet spot of persistent and dynamic so I'm just looking for experimenting if there are alternatives available.

serene scaffold
austere prawn
jaunty helm
austere prawn
agile cobalt
#

there is also the option of just using the terminal directly, either literally running in the terminal (e.g. VSCode's shift + enter can send the line(s) of code you have selected to run in the terminal), or anything else like IPython

jaunty helm
austere prawn
hearty depot
#

it has nicer formatting too and better vim support

austere prawn
random rune
#

Hey 🙂 does anyone know anything about these functions? I'm so stuck on this... we have a dataset with the data being like 4000 bacterial strains in about 400 different conditions, and the correlation I've made with the conditions using .cor

#

this is something we did previously - still not 100% sure what line 2 does, but I think it has something to do with removing all the diagonals and repeated values
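The code itself isn't shown here, but if line 2 is doing something like R's `cor_mat[upper.tri(cor_mat)]`, the idea is this: a correlation matrix is symmetric with 1s on the diagonal, so keeping only the cells strictly above the diagonal gives each pair of conditions exactly once. The numpy equivalent uses `np.triu_indices_from`; the data below is random stand-in data with a made-up shape.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(4000, 6))        # stand-in: 4000 strains (rows) x 6 conditions (columns)

corr = np.corrcoef(data, rowvar=False)   # 6 x 6 condition-by-condition matrix, like R's cor(data)
iu = np.triu_indices_from(corr, k=1)     # positions strictly above the diagonal
pairs = corr[iu]                         # every condition pair once, no self-correlations

print(corr.shape, pairs.shape)           # with 400 conditions: 400 * 399 / 2 = 79,800 pairs
```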

harsh bane
#

Don't know if i can ask it in this channel, but what's the recommended cloud hosted chatbot akin to GPT4 to assist with python code? Or is GPT4 the best at that currently?

opaque condor
#

How can I import a neural network model into a simulation like Panda3D or Pygame?

keen veldt
#

Got laid off - thinking of doing a masters in data science or AI. do any of you have any feedback on if such a degree would be helpful?

serene scaffold
glacial root
#

yo guys does anyone know what i'm doing wrong here

#

i'm trying to convert this to a color image

#

and to do so aren't we supposed to just make each element an array or tuple of 3 of that number?

#

so pretty much just setting the rgb values all to the grayscale value

#

wait nevermind that would just keep it as the same grayscale

#

but how else would we turn it into color

spring field
# glacial root but how else would we turn it into color

you can't, information has been lost, there's no way back
you can only estimate and "guess"

in terms of deep learning, you can train a network to do that (or grab a pretrained model from Hugging Face, which you could fine-tune if necessary ig)
either way, I found this paper on the topic and it seems pretty interesting if you wanna dive deeper into how they achieved it https://www.mdpi.com/2073-8994/14/11/2295

obviously there are probably also free (and not so free) online services that can do this as well (unless you need to do this for a large dataset, in which case it'd likely cost something)

glacial root
#

this is the overall assignment

#

i'm so lost on all of this

#

is there anyone who could perhaps help me a little with this? my goal is to be able to code this myself but i just need some help finding the right direction in terms of concepts and some directions on how i would implement this

iron basalt
#

Skipping step 1, you need to learn the RAW sensor data format.

#

A camera raw image file contains unprocessed or minimally processed data from the image sensor of either a digital camera, a motion picture film scanner, or other image scanner. Raw files are so named because they are not yet processed, and contain large amounts of potentially redundant data. Normally, the image is processed by a raw converter,...

#

What you need to do depends on the camera and its settings.

spring field
iron basalt
#

Whether or not it's using Bayer etc depends on the camera. I'm assuming since this is an assignment they have certain assumptions for you to make. Otherwise you need to enumerate all possibilities and that is why there are big camera libraries.

#

For example https://en.wikipedia.org/wiki/Foveon_X3_sensor captures directly to RGB.

The Foveon X3 sensor is a digital camera image sensor designed by Foveon, Inc., (now part of Sigma Corporation) and manufactured by Dongbu Electronics.
It uses an array of photosites that consist of three vertically stacked photodiodes. Each of the three stacked photodiodes has a different spectral sensitivity, allowing it to respond differentl...

#

Not color filter array, and therefore no Bayer.

#

But it still needs some processing since it's not in sRGB, which is probably what is meant when they ask for "RGB."

spring field
#

the assignment suggests loading a colored image and converting it to RAW grayscale,
so I would presume Bayer
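For reference, a minimal sketch of what "colored image to RAW grayscale" could look like under the Bayer assumption. It assumes an RGGB layout (real sensors may use BGGR, GRBG or GBRG) and ignores everything else a real sensor does (noise, black level, a non-sRGB color response); the function name is hypothetical:

```python
import numpy as np

def rgb_to_bayer_rggb(rgb):
    """Sample an (H, W, 3) RGB image through an assumed RGGB Bayer pattern.

    Returns a single-channel (H, W) mosaic: each photosite keeps only the
    one color its filter lets through.
    """
    h, w, _ = rgb.shape
    mosaic = np.empty((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R on even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G on even rows, odd cols
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G on odd rows, even cols
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B on odd rows, odd cols
    return mosaic

# Usage with a random test image
rgb = np.random.default_rng(0).integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
mosaic = rgb_to_bayer_rggb(rgb)
print(mosaic.shape)  # (4, 6)
```

Half of the photosites are green because the eye is most sensitive to green, which is why the pattern is RGGB rather than RGB.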

iron basalt
#

Likely, but technically it might not be (or there might be no CFA at all, as with the sensor I linked).

spring field
#

(what I've learned in my quick research on this is that this is a deep spot in this area of the field, and I'm glad we have abstractions over it)

iron basalt
#

Color is a huge rabbit hole.

#

And all the other parts too, like there are multiple ways to convert to grayscale, and you may need to use a different one depending on what you are doing with the result.

#

(Video game graphics programmers will know about this stuff (as they need to enter this rabbit hole for their work))
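To make the "multiple ways to convert to grayscale" point concrete, a small sketch comparing a plain average with the BT.601 and BT.709 luma weights. It assumes an (H, W, 3) float array in [0, 1]; whether you apply the weights to gamma-encoded or linearized values is yet another choice in the rabbit hole:

```python
import numpy as np

WEIGHTS = {
    "average": (1 / 3, 1 / 3, 1 / 3),      # treats all channels equally
    "bt601": (0.299, 0.587, 0.114),        # classic SDTV luma weights
    "bt709": (0.2126, 0.7152, 0.0722),     # HDTV / sRGB primaries
}

def to_gray(rgb, method="bt601"):
    """Convert an (H, W, 3) float RGB array in [0, 1] to an (H, W) grayscale array."""
    # Matrix-vector product over the last axis: (H, W, 3) @ (3,) -> (H, W)
    return rgb @ np.asarray(WEIGHTS[method])

# A single pure-green pixel comes out very differently depending on the method
green = np.array([[[0.0, 1.0, 0.0]]])
for name in WEIGHTS:
    print(name, float(to_gray(green, name)[0, 0]))
```

All three weight sets sum to 1, so white stays white; they only disagree on how bright each hue should look.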

opaque condor
#

If I wanted to have a neural network control a puppet within a simulation, do I have to make it so that it can grab onto something like Blender IK bones?

glacial root
glacial root
#

oh this definitely helps

#

i just need to find out how to get raw Bayer image data

glacial root
#

ok so i found a raw image file on kaggle and tried using that, and i found a library called rawpy that processes the raw image file in just one line

#

it feels like cheating though

#

cause there's definitely a lot more work that goes into this
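rawpy's `postprocess()` bundles demosaicing with white balance, color conversion and gamma, which is why it's a one-liner. If you want to do the demosaic step yourself, the simplest possible version collapses each 2×2 RGGB cell into one RGB pixel. A sketch, assuming an RGGB mosaic as a 2D NumPy array (for example `raw.raw_image` from rawpy, after checking which pattern your camera actually uses); the function name is made up, and it skips black-level subtraction, white balance and gamma, so the output will typically look dark and greenish:

```python
import numpy as np

def demosaic_superpixel_rggb(mosaic):
    """Very simple demosaic: each 2x2 RGGB cell becomes one RGB pixel.

    Output is half resolution, (H // 2, W // 2, 3). R and B are taken
    directly; G is the mean of the two green photosites in the cell.
    """
    h, w = mosaic.shape
    # Drop a trailing odd row/column so every cell is complete
    m = mosaic[: h - h % 2, : w - w % 2].astype(np.float64)
    r = m[0::2, 0::2]
    g = (m[0::2, 1::2] + m[1::2, 0::2]) / 2
    b = m[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)
```

Proper demosaicing (bilinear, or edge-aware methods) keeps full resolution by interpolating the two missing colors at every photosite; this version trades resolution for simplicity.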

narrow tiger
#

Are there any resources you can recommend for learning about agentic AI?
There are too many tools to learn, so I need a reference for what I should cover first.

#

In most videos I watch, people just talk about theory, like a lot of theory. I need something practical

#

And up to date

serene scaffold
narrow tiger
serene scaffold
narrow tiger
#

By charts I mean
flow charts, ERD diagrams and stuff like that

serene scaffold
#

We call those plots. Or data visualizations.

#

I guess flow charts don't fall under plots.

narrow tiger
serene scaffold
narrow tiger
#

so basically if it can generate this Mermaid code, that'll be a good start

#

^ so for something like this, is it still better to train your own model?
What would i train it on? Mermaid docs?

glacial root
#

yo guys do you think working on an edge detection algorithm using just numpy is doable? i know it'll definitely be decently harder than just a plain feed-forward neural network but still worth doing right

#

only place i think i'll use another library is for getting the matrices of images and converting the edge matrix to an image

small wedge
#

absolutely doable yes, it sounds like a great project

glacial root
#

excellent, looks like my next task is decided

agile cobalt
glacial root
#

i'll first watch some videos to learn the theory/math behind it

#

and then i'll try to implement it myself

jaunty helm
#

it shouldn't be too hard to make a simple edge detection kernel and convolve it with your image
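A NumPy-only sketch of that idea with the Sobel kernels. It uses "valid" mode, so the output is 2 pixels smaller in each dimension; the function names are made up for this example:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """2D convolution (kernel flipped) in 'valid' mode: output shrinks by k - 1."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float64)
    # Accumulate one kernel tap at a time, vectorized over all output pixels
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * image[i:i + h - kh + 1, j:j + w - kw + 1]
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # same kernel, rotated to respond to horizontal edges

def sobel_magnitude(gray):
    gx = convolve2d_valid(gray, SOBEL_X)
    gy = convolve2d_valid(gray, SOBEL_Y)
    return np.hypot(gx, gy)  # gradient magnitude sqrt(gx**2 + gy**2)

# Usage: a vertical step edge between columns 2 and 3
gray = np.zeros((5, 6))
gray[:, 3:] = 1.0
print(sobel_magnitude(gray))
```

The loop runs over the 9 kernel taps rather than over the pixels, so it stays fast even for large images.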

arctic delta
#

Hey everyone, I have a small question about clustering. I now have a distance matrix between samples, but how can I cluster based on this matrix? Any clustering method is fine.

#

As far as I know, k-means doesn't support such a precomputed metric
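k-means does need coordinates, but hierarchical clustering works directly on pairwise distances. A sketch with SciPy, assuming `D` is a symmetric matrix with zeros on the diagonal; the helper name is made up. Average linkage is used here because methods like `ward` or `centroid` assume Euclidean distances:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_from_distance_matrix(D, n_clusters, method="average"):
    """Hierarchical clustering on a precomputed, symmetric distance matrix D."""
    D = np.asarray(D, dtype=np.float64)
    # linkage() expects the condensed (upper-triangle) form, not the square matrix;
    # checks=True verifies symmetry and a zero diagonal
    condensed = squareform(D, checks=True)
    Z = linkage(condensed, method=method)
    # Cut the tree so that at most n_clusters flat clusters remain (labels start at 1)
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Usage: two obvious groups, {0, 1} and {2, 3}
D = np.array([
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
])
print(cluster_from_distance_matrix(D, n_clusters=2))
```

If you'd rather not fix the number of clusters, scikit-learn's `DBSCAN(metric="precomputed")` also accepts a distance matrix directly.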

jaunty helm
slim storm
#

does anyone know a good model for imputing a dataset with both categorical and continuous features? i've tried a couple but none have really worked. IterativeImputer from sklearn doesn't support categorical features, and MultipleImputer from autoimpute just throws a weird error

serene scaffold
#

No matter what, the way that you impute categorical features will be different from continuous ones.

cursive wing
#

🌟 Excited to Share My Latest Project! 🌟
I’m thrilled to announce that I’ve successfully developed a chatbot powered by the advanced Google Gemini 2.0 LLM!…

slim storm
#

I'm guessing if i use two different formulas they can still use all features as inputs to impute, right?

lapis sequoia
#
  1. how to tackle the outliers to further clean the datasets
  2. how to figure out which visualisation might help you to clean or explore the dataset
  3. how to choose which ML model will be best for a given dataset
keen veldt
serene scaffold
#

even reputable unis will have master's programs in DS or in AI that are separate from their CS department, where all the actual academic rigor is.

serene scaffold
#

The admissions requirements for Northeastern are a red flag: you basically just have to send them unofficial transcripts, and they don't seem to care what courses you actually took.

glacial root
keen veldt
serene scaffold
pale condor
dense needle
#

decided it wasn't remotely worth it, but i gave them some contact info and they've basically tried to flag me down a couple of times lol

#

i was already not going to do it but it didn't inspire a lot of faith in the program

distant linden
#

Hi guys, I'm a first year computer engineering student and I would like to approach the world of AI, could you recommend me handouts, forums or books to start studying AI from scratch?

late vector
#

Why are there many ways to create graphs in R and Python?

#

I think it depends on preference and use cases.

late vector
civic vigil
#

Idk if this is the most appropriate channel but I'm trying to plot a confidence interval around a fit I did with scipy.optimize.curve_fit(). I asked chatgpt and it told me that I can do something like

var_f = J @ pcov @ J.T

Where J is the Jacobian. I don't really need any help, I just want to make sure this is true because I can't find it anywhere other than ChatGPT telling me
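That formula is standard first-order error propagation (the delta method): with J[i, j] = ∂f(xᵢ; p)/∂pⱼ evaluated at the fitted parameters, the diagonal of `J @ pcov @ J.T` is the approximate variance of the fitted curve at each xᵢ. A sketch with a made-up exponential model and synthetic data; the finite-difference `jacobian` helper is hypothetical, not a SciPy function:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothetical model for illustration: y = a * exp(b * x)
    return a * np.exp(b * x)

# Synthetic data (made up for this example)
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 40)
y = model(x, 2.0, 0.8) + rng.normal(0, 0.2, x.size)

popt, pcov = curve_fit(model, x, y, p0=(1.0, 1.0))

def jacobian(x, params, eps=1e-6):
    """Central finite-difference Jacobian of model() w.r.t. its parameters.

    Returns shape (len(x), len(params)): row i is the gradient of f(x_i; p).
    """
    J = np.empty((x.size, params.size))
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps * max(1.0, abs(params[k]))
        J[:, k] = (model(x, *(params + step)) - model(x, *(params - step))) / (2 * step[k])
    return J

J = jacobian(x, popt)
# diag(J @ pcov @ J.T), computed without building the full N x N matrix
var_f = np.einsum("ij,jk,ik->i", J, pcov, J)
sigma_f = np.sqrt(var_f)

y_fit = model(x, *popt)
lower = y_fit - 1.96 * sigma_f  # approx. 95% band for the fitted curve
upper = y_fit + 1.96 * sigma_f
```

You can then draw the band with e.g. `plt.fill_between(x, lower, upper)`. This is a band for the fitted mean curve, not a prediction interval for new points (that would also add the residual variance), and 1.96 assumes the normal approximation is reasonable.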

worn cosmos
#

I'm looking to train a NN with mixed continuous/discrete input features AND mixed continuous/discrete target features. What's a good place to start with this? I only really have experience with sklearn

iron basalt
glacial root
#

yo guys i'm having an issue with grayscale conversion, isn't this the correct way to do it?

glacial root
#

i thought we were supposed to round

#

cause it's pixel values

glacial root
iron basalt
#

Pillow wants 8-bit pixels for grayscale.

glacial root
#

it's probably set to float for me

glacial root
#

thank you bro
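For the float issue, a sketch of turning a float grayscale array into something Pillow treats as 8-bit grayscale: round, clip to 0 to 255, then cast. It assumes the values are already on a 0 to 255 scale (multiply by 255 first if they're in [0, 1]); the function name is made up:

```python
import numpy as np
from PIL import Image

def float_gray_to_pil(gray):
    """Turn a float grayscale array (expected range 0-255) into an 8-bit Pillow image."""
    # Round to the nearest integer, clamp to the valid range, then cast
    arr = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return Image.fromarray(arr)  # a 2D uint8 array gives a mode "L" image

img = float_gray_to_pil(np.array([[0.4, 127.6], [300.0, -5.0]]))
print(img.mode)  # L
```

Casting with `.astype(np.uint8)` alone truncates instead of rounding and wraps out-of-range values around, which is why the clip comes first.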

glacial root
#

for edge detection, do we typically just omit the outer layer?
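There's no single convention for the border. A 3×3 kernel can't be centered on the outermost pixels, so the usual options are to drop them ("valid" output, 2 pixels smaller in each dimension), zero-pad, or replicate/reflect the edge pixels so the output keeps the input's size. A sketch of the padded version using `np.pad`; the function name is made up, and it assumes an odd-sized kernel applied as correlation (for Sobel that only flips the sign):

```python
import numpy as np

def filter_same(image, kernel, pad_mode="edge"):
    """Apply an odd-sized kernel (as correlation); output has the same shape as image.

    pad_mode is passed to np.pad: "edge" repeats border pixels,
    "constant" zero-pads, "reflect" mirrors the image at the border.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(np.float64), ((ph, ph), (pw, pw)), mode=pad_mode)
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Each kernel tap multiplies a shifted copy of the padded image
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
flat = np.ones((4, 5))
print(filter_same(flat, sobel_x, "edge"))      # all zeros: no fake edges
print(filter_same(flat, sobel_x, "constant"))  # nonzero along the left/right border
```

Zero padding makes the border look like a jump to black, so you often get a frame of fake edges; `edge` or `reflect` avoids that.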

unique ridge
#

https://labelme.io/docs/export-to-yolo

is this command only for premium users? I tried to run it but it keeps failing:

```
usage: labelme [-h] [--version] [--reset-config] [--logger-level {debug,info,warning,fatal,error}] [--output OUTPUT]
               [--config CONFIG] [--nodata] [--autosave] [--nosortlabels] [--flags FLAGS] [--labelflags LABEL_FLAGS]
               [--labels LABELS] [--validatelabel {exact}] [--keep-prev] [--epsilon EPSILON]
               [filename]
labelme: error: unrecognized arguments: /annotations --class-names waste
```
Labelme - AI Image Annotation & Dataset Creation

Create private, flexible AI datasets with our offline annotation app. Save hours with AI-powered tools.

odd meteor
# worn cosmos I'm looking to train a NN with mixed continuous/discrete input features AND mixe...

You need to start by learning PyTorch or another deep learning framework like TensorFlow, JAX, etc.

If you ask me, I'd say, just zero in on PyTorch!

Once you've understood a bit about how PyTorch works and how to use it in training NN (there are nice videos on YouTube you can use to learn), then implement what you've learnt on your dataset.

You can use this video to learn PyTorch https://youtu.be/Z_ikDlimN6A?si=QtSTZSD7hc1SyFE8

Welcome to the most beginner-friendly place on the internet to learn PyTorch for deep learning.

All code on GitHub - https://dbourke.link/pt-github
Ask a question - https://dbourke.link/pt-github-discussions
Read the course materials online - https://learnpytorch.io
Sign up for the full course on Zero to Mastery (20+ hours more video) - https:/...

odd meteor
# slim storm Im guessing if i use two different formulas they can still use *all* features as...

You can use the SimpleImputer from sklearn.

from sklearn.impute import SimpleImputer

How to go about the imputation depends on your choice of imputation strategy. See the docs below for a detailed guide.

https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
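Since categorical and continuous columns need different strategies, one way to apply `SimpleImputer` to both is a `ColumnTransformer` with one imputer per column group. The DataFrame below is made up. Note that `SimpleImputer` fills each column from that column's own statistics and does not use the other features; if you want that, you'd need a model-based imputer instead:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one continuous and one categorical column with gaps
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "color": pd.Series(["red", "blue", np.nan, "red"], dtype=object),
})

imputer = ColumnTransformer([
    # Continuous: fill with the median of the observed values
    ("num", SimpleImputer(strategy="median"), ["age"]),
    # Categorical: fill with the most common category
    ("cat", SimpleImputer(strategy="most_frequent"), ["color"]),
])

filled = imputer.fit_transform(df)
print(filled)
```

Columns not listed in the transformer are dropped by default (`remainder="drop"`), so list every column you want to keep.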

odd meteor
clear goblet
#

hi, does anyone know any good websites to get datasets? i have one from my uni assignment but i wanna experiment on my own

hearty depot
velvet phoenix
#

Guys, I have a laptop with hybrid graphics (including an Nvidia GPU), and whenever I try to use PyTorch from Python the GPU doesn't show up. What is the issue?

torch.cuda.is_available() is giving false

glacial root
hearty depot
#

Model like yolo

#

Or efficientnet

glacial root
#

oh wait

#

object detection algorithm?

glacial root
zealous girder
#

What are some alternatives to the browser-based Jupyter notebook for running ipynbs? Currently I am using VSCode, but I'm open to other alternatives

jaunty helm
kind sage
#

Hey guys, I have a database of products from Amazon, but it's missing the "Date First Available" field. Does anyone know how I can scrape Amazon to get this information? Any tips or tools would be super helpful

red heron
#

hey guys i was searching for an ML project which is not trendy but very much useful, if anyone has any idea, do lemme know :pepeHype:

small rune
kind sage
river cape
#

Do you think MCP servers are the next big thing?

lofty thorn
weary timber
#

first (can be) useful project of mine

weary timber
#

if you have any ideas for new features pls tell me

quaint mulch
# distant linden Hi guys, I'm a first year computer engineering student and I would like to appro...
MIT Deep Learning 6.S191

MIT's introductory course on deep learning methods and applications.

GitHub

My personal list of what are the things to learn in deep learning. - aprbw/ArianDLPrimer

quaint mulch
glacial root
#

i've done that

worn cosmos
#

but thanks for the link this looks interesting!

quaint mulch
glacial root
#

also i think my implementation was just regular machine learning and not much computer vision involved since i didn't make it able to detect in real time

#

the only computer vision element to it is the fact that it's images, but in terms of the way the model was trained, it was just like any other extremely simple feed forward neural network

#

right now i'm trying edge detection, i've done the Sobel operator part and i need to try Canny, which i'm not yet sure how to do, so i'll have to figure it out

#

sobel though was a lot simpler than i thought it would be, just a basic kernel convolution
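For the Canny part: the usual pipeline is Gaussian smoothing, Sobel gradients, non-maximum suppression, a double threshold, and hysteresis edge tracking. The step that differs most from plain Sobel is non-maximum suppression, which thins edges to one pixel wide. A loop-based sketch, assuming `gx` is the derivative along columns and `gy` along rows (rows increasing downward), with directions rounded to 0/45/90/135 degrees; the function name is made up:

```python
import numpy as np

def non_max_suppression(magnitude, gx, gy):
    """Keep a pixel only if it is a local maximum along its gradient direction.

    Border pixels are set to 0. Angles are folded into [0, 180) since a
    gradient and its opposite point along the same line.
    """
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:    # gradient along x: compare left/right
                n1, n2 = magnitude[y, x - 1], magnitude[y, x + 1]
            elif a < 67.5:                # gradient toward down-right: main diagonal
                n1, n2 = magnitude[y - 1, x - 1], magnitude[y + 1, x + 1]
            elif a < 112.5:               # gradient along y: compare up/down
                n1, n2 = magnitude[y - 1, x], magnitude[y + 1, x]
            else:                         # gradient toward down-left: anti-diagonal
                n1, n2 = magnitude[y - 1, x + 1], magnitude[y + 1, x - 1]
            if magnitude[y, x] >= n1 and magnitude[y, x] >= n2:
                out[y, x] = magnitude[y, x]
    return out
```

After this, the double threshold marks pixels as strong, weak or discarded, and hysteresis keeps weak pixels only if they connect to strong ones.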

#

and i might try implementing the AlphaDog attack

flat dragon
#

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
print(A)
A_expanded = A[:, :, np.newaxis]  # Shape (2, 3, 1)

print("\nExpanded 3D Array:\n", A_expanded)
print("Shape:", A_expanded.shape)

does it become a tensor at this point? mathematically speaking, where a tensor is like a matrix but in higher dimensions

#

im trying to get intuition behind np.newaxis
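Some intuition for `np.newaxis`: it is just `None`, and indexing with it inserts a length-1 axis at that position without copying the data (you get a view). In NumPy terms the result is simply a 3-dimensional ndarray; ML libraries would loosely call it a rank-3 tensor, but the extra axis adds no information. Its most common use is setting up broadcasting:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

print(np.newaxis is None)             # True: np.newaxis is an alias for None
print(A[:, :, np.newaxis].shape)      # (2, 3, 1): new length-1 axis at the end
print(A[np.newaxis, :, :].shape)      # (1, 2, 3): new axis at the front
print(A[:, np.newaxis, :].shape)      # (2, 1, 3): new axis in the middle

# Typical use: turn a (2,) vector into a (2, 1) column so it broadcasts per row
col = np.array([10, 20])
print(A + col[:, np.newaxis])         # adds 10 to row 0 and 20 to row 1
```

Without the new axis, `A + col` would fail, because shapes (2, 3) and (2,) don't broadcast.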

serene scaffold
pale quarry
#

Hey guys, I'm a 4th-year CSE student,
I'm planning to work on data science for my mini project,
so I need you guys to recommend me some ideas,
beginner or medium level

opaque condor
#

Is it a good idea to make your own simulation or use something pre-made for a neural network, let's say a 3D parkour AI?

white reef
#

Hey, guys!

I've been doing research in natural language processing for conlangs, and during that research I developed a framework called ALF-T5, which uses Google's T5 to adapt to any language for translation by fine-tuning with transfer learning via PEFT's LoRA technique. It serves as a universal language translator trainer: you can train ALF-T5 on any language translation pair and it will adapt, thanks to T5's language comprehension capabilities.

There's more info on the repository itself, which is available at: https://github.com/matjsz/alf-t5

If anyone wants to check it out, please don't forget to leave a star on the repository, it helps me a lot! Thank you for reading and have a nice day! :)

serene scaffold
white reef
#

This started as a side project to translate my own conlangs to English and vice versa, which created the need for a bidirectional encoder-decoder structure, and it kept evolving. The thing is that conlangs have scarce data, unlike a consolidated language, so I was trying to find a way to train an encoder-decoder model to translate even with few examples in the dataset

#

That's when I decided to use T5 as a base model, as it has the knowledge of natural language that I need, and by applying transfer learning via fine-tuning I could keep the capabilities from T5 that I needed and add the data from the conlang that I wanted it to learn

#

Few-shot learning was key to that, too

#

The framework was born from this and is key to perform the research now, so if this was useful for me, maybe it will be useful for someone else, too

serene scaffold
white reef
#

Yes, exactly!

serene scaffold
#

what conlangs are you most interested in?

white reef
#

I really like agglutinative ones, like Na'vi, but the one I was testing the framework on is one of my own; it's inspired by Latin and is very straightforward

#

I was reading the Na'vi PDF a couple days ago, boggled my mind

#

Those are really wild, but pretty interesting

serene scaffold
#

does Na'vi have any phonemes that are impossible for humans?

white reef
#

I'm not really sure, to be honest, but it's sure difficult to learn

#

I couldn't tackle everything, but as it seems, the language has its similarities to some human languages

serene scaffold
#

if there are "impossible phonemes", it would be because the Na'vi have different throat/mouth structures than humans. which is what I would expect.

white reef
#

That makes sense, since it was thought to be spoken by an alien species

#

Could it be possible?

#

To like, design a conlang that way

#

I never thought about that, to be honest with you

serene scaffold
#

why wouldn't it be?

serene scaffold
white reef
#

Yeah, since we have the phonemes, we could think of some wild ones, that's right

white reef
#

That explains a lot haha

#

I actually thought about pursuing a degree in linguistics, but chose CS instead

serene scaffold
#

you made the right choice, unfortunately.

white reef
#

I really like both; even if linguistics is just a hobby, they are my main passions. But I don't have that much knowledge of linguistics, it's very dense and there's a lot to learn

white reef
#

But it's a fascinating field, really

#

It's really nice to know that there is another linguistics enjoyer here haha, with the difference that you are actually a professional. I just hopped in here, so I'm kind of getting the grip on the server, but it seems a nice place.

serene scaffold
#

that's because we ban everyone who makes it not nice.

white reef
#

That's fair haha, seems to be working

clear goblet
#

hi anyone know how you are supposed to optimise an ANN?

#

i keep trying to train mine and it only ever reaches an R² score of 0.14

#

but from what i've read so far, some areas might be undersampled?

#

but i've no way around this since most of the data sits in this 4 to 8 range
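If the target is concentrated in the 4 to 8 range, one common thing to try is weighting samples from sparse target ranges more heavily during training. A sketch of inverse-frequency weights computed from a histogram of the target; the helper name and the equal-width bins are assumptions, and the weights only help if your training code accepts per-sample weights (many scikit-learn estimators take a `sample_weight` argument in `fit`, and in PyTorch you can multiply per-sample losses by them):

```python
import numpy as np

def inverse_frequency_weights(y, n_bins=10):
    """Hypothetical helper: per-sample weights that up-weight rare target values.

    Bins y into n_bins equal-width bins, gives each sample 1 / (count of its bin),
    then rescales so the weights average to 1.
    """
    y = np.asarray(y, dtype=np.float64)
    counts, edges = np.histogram(y, bins=n_bins)
    # Inner edges map each value to a bin index 0..n_bins-1, matching np.histogram
    idx = np.digitize(y, edges[1:-1])
    w = 1.0 / counts[idx]
    return w * (len(y) / w.sum())

# Toy target: most values around 5, a few at the extremes
y = np.array([1.0] * 5 + [5.0] * 90 + [9.0] * 5)
w = inverse_frequency_weights(y, n_bins=3)
print(w[0], w[10])  # a rare sample gets 18x the weight of a common one
```

This can trade some accuracy in the dense range for better behavior at the extremes, so compare R² on a held-out set with and without the weights.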