#data-science-and-ml

boreal gale
#

i think that's exactly chebyshev's inequality just represented slightly differently

agile cobalt
serene scaffold
#

always ask an actual question that someone who knows the answer can start answering--don't ask to ask.

#

@boreal gale @agile cobalt thanks!

somber prism
#

model : resnet 152 initially loaded with imagenet weights
dataset : freihand and multiview hand pose dataset (approx total of 180k images)
optimizer : Adam with .01 lr
task : keypoints detection
hnet : trained for 10 epochs
hnet2 : +5 epochs (15 ep)
hnet3 : +5 epochs (20 ep)
hnet4 : +5 epochs (25 ep)

i've been training my resnet model for days and i've found there's a gradual decrease in MSE loss. my goal is for the model to work well during testing (let's say with my own hand, outside of this dataset). can someone suggest a way to improve this model? it would be really helpful. should i just train my model a bit more, like up to 50 epochs? so far i've cropped the images using the bounding box so the whole image focuses on the hand and noise is removed.

serene scaffold
#

this insight was the key

#

@boreal gale @agile cobalt any ideas for what to make of this? I'm not sure what to do with u (without a subscript; u has up to this point been the random variable) now that we have a sequence of u values.

boreal gale
agile cobalt
#

what I thought could be useful from the proof was mainly the y = (X - u) ** 2

serene scaffold
pale hemlock
#

omg

#

I think i found my room

pale hemlock
#

jaabir

#

ok honestly i'm a novice but i work with it so i could help?

serene scaffold
#

sorry I just wanted to be included.

somber prism
pale hemlock
#

ok

#

what exactly is the issue?

wooden sail
#

yeah you can take (u - mu)^2 as a new random variable, substitute into chebyshev's, and compute its expectation

pale hemlock
#

@somber prism >?

#

OH

#

Jaabir

#

would you mind phone call, i may interest you in a new way to work with this?

somber prism
#

you can dm

#

@pale hemlock

somber prism
serene scaffold
#

I just asked the prof, so we'll see when he replies.

#

I guess it's an arbitrary element from the sequence of u values, but I don't think that goes without saying.

boreal gale
#

is the question to figure out what is u 😛

serene scaffold
#

the question is "what is u intended to represent in the context of the homework question"?

#

and the homework question is to prove that inequality.

wooden sail
#

u is probably the mean of all the u_i, since that reduces the variance by a factor of n

#

you can already go ahead and try that out

boreal gale
#

my guess is it's probably supposed to be \bar{u}

wooden sail
#

but yeah, it's not written clearly

#

you can use the linearity of the expectation operator to show this one, as well as once again plugging into chebyshev's ineq
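
A sketch of the argument described here, under assumptions that are not stated in the chat: the $u_n$ are independent with common mean $\mu$ and finite variance $\sigma^2$, and the $u$ in the homework is their sample mean $\bar{u}$ (the guess made above). The exact inequality in the homework is not shown, so this is only the standard form it most likely takes.

```latex
\begin{align*}
\mathbb{E}[\bar{u}]
  &= \frac{1}{N}\sum_{n=1}^{N}\mathbb{E}[u_n] = \mu
  && \text{(linearity of expectation)} \\
\operatorname{Var}(\bar{u})
  &= \frac{1}{N^2}\sum_{n=1}^{N}\operatorname{Var}(u_n) = \frac{\sigma^2}{N}
  && \text{(independence)} \\
P\big(|\bar{u}-\mu| \ge \epsilon\big)
  &\le \frac{\operatorname{Var}(\bar{u})}{\epsilon^2} = \frac{\sigma^2}{N\epsilon^2}
  && \text{(Chebyshev applied to } \bar{u}\text{)}
\end{align*}
```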

#

as a general recommendation, it's a good idea to keep an eye out for the (central) moments of random variables

#

they tend to have nice properties and give you intuition about their distribution's properties

serene scaffold
#

what is u-bar, and if it's the mean of all u_i, how is that different from mu?

wooden sail
#

the bar is a common notation for the mean

#

and i guess i should say "sample mean" for clarity

#

.latex the sample mean is \[ \frac{1}{N} \sum_{n=1}^{N} u_n \]

strange elbowBOT
wooden sail
#

this is only equal to the true mean if N goes to infinity

#

people usually call the "true mean" the "expected value"

#

.latex that'd be
\[
\mathbb{E}(U) = \int_{-\infty}^{\infty} u f(u) \, du
\]

strange elbowBOT
wooden sail
#

oops

#

where f(u) is the pdf of U

#

these two being equivalent as N goes to infinity is the "law of large numbers", which btw doesn't hold in general (it needs the expected value to exist, so it fails for e.g. the Cauchy distribution)

#

but the takeaway for you is that the average and the expected value are not the same thing

#

what averaging DOES do is reduce the variance

#

this is what the problem is asking you to show

#

in signal processing terms, averaging is the same as applying a rolling average window, which is the same as lowpass filtering

#

another way to think of it is that the sample mean or average is a function of random variables, and so its output is also another random variable with a new distribution. one with the same mean as the original variables, but a lower variance. the expected value is a constant
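
The variance-reduction claim can be checked numerically. The sketch below is only an illustration: the choice of Uniform(0, 1) draws, the function name `sample_mean_variance`, and the trial counts are all assumptions made for the example, not part of the homework.

```python
import random
import statistics

def sample_mean_variance(n, trials=20000, seed=0):
    """Empirical variance of the mean of n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
    return statistics.pvariance(means)

single_var = 1 / 12  # exact variance of one Uniform(0, 1) draw
for n in (1, 4, 16):
    # Columns: n, simulated variance of the sample mean, theoretical sigma^2 / n
    print(n, round(sample_mean_variance(n), 5), round(single_var / n, 5))
```

The simulated column should track the theoretical one closely, while the mean of the sample means stays near 0.5 for every n.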

serene scaffold
#

@wooden sail

Oh wow, sorry, I just left that out of the question, and no one asked yet! Yes, u in part c is the sample mean of all the us.

#

it's due in 10 hours

wooden sail
#

😌

#

mystery solved. i left you 10 pages of lore explaining the background in the meantime

serene scaffold
#

I like lore

serene scaffold
past meteor
#

ooof

#

In this context \bar{u} would've been appropriate like Ry said then

#

Typos can be made ofc 🙂

past meteor
#

I always feel "guilty" when I realize that I used to know these things but I forgot 🤣 . Guess that's what mostly doing applied stuff does to you

boreal gale
#

i have 0 guilt 🤣 because i know i wouldn't even know how to do 20% of the things i easily whip out today back then

modest mauve
#

I'm making a python script that extracts the sudoku grid from the image. all numbers should be extracted into a 2d array matching the sudoku image

this is my colab notebook link:
https://colab.research.google.com/drive/1ykMxMtiPX0SVph6bQpzQTLCZGkPJFJuT?usp=sharing

there's some error in the text extraction, or it might be in the perspective correction, not sure what to do exactly?
kindly have a look at the code🙏

error:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py in _save(im, fp, tile, bufsize)
517 try:
--> 518 fh = fp.fileno()
519 fp.flush()

AttributeError: '_idat' object has no attribute 'fileno'

During handling of the above exception, another exception occurred:

SystemError Traceback (most recent call last)
10 frames
/usr/local/lib/python3.10/dist-packages/PIL/ImageFile.py in _encode_tile(im, fp, tile, bufsize, fh, exc)
531 encoder = Image._getencoder(im.mode, e, a, im.encoderconfig)
532 try:
--> 533 encoder.setimage(im.im, b)
534 if encoder.pushes_fd:
535 encoder.setfd(fp)

SystemError: tile cannot extend outside image

i put exception handling code there and now the whole extracted grid is 0000...

please help!!!!

white crest
#

Can someone help me with pandas in python? I have some issues in implementing some of the functions of pandas and numpy. I also need some help in creating a script to get some insights from a dataset.

serene scaffold
left tartan
#

Check your params carefully. I didn’t look at your code tho.

modest mauve
left tartan
#

I can’t right now, I’m half debugging something, just doing light discord 🙂

white crest
# serene scaffold be sure to always give enough information that someone can start helping--don't ...

hi @serene scaffold thanks for your reply, actually i want help in generating a scripts from specific condition to implement in dataset,
though the datasets are

  1. power data of equipments -
    {'Time stamp': ['2023-06-01 00:00', '2023-06-01 01:00', '2023-06-01 02:00', '2023-06-01 03:00', '2023-06-01 04:00'], 'HVAC 1 (kW)': [0.0, 0.0, 0.0, 0.0, 0.0], 'HVAC 2 (kW)': [0.0, 0.0, 0.0, 0.0, 0.0], 'HVAC 3 (kW)': [0.0, 0.0, 0.0, 0.0, 0.0], 'HVAC 4 (kW)': [0.6772, 0.5796, 0.4976, 0.6235, 0.5637], 'Kitchen Bar lights (kW)': [0.0, 0.0, 0.0, 0.0, 0.0], 'LCC Oxford Circus - Total (kW)': [5.39, 5.25, 5.17, 5.0, 4.42], 'Main 1 (kW)': [5.39, 5.25, 5.17, 5.0, 4.42], 'Main 1 L1 (kW)': [3.81, 3.85, 3.77, 3.72, 3.22], 'Main 1 L2 (kW)': [0.9682, 0.9715, 0.9935, 0.919, 0.9206], 'Main 1 L3 (kW)': [0.611, 0.4309, 0.4057, 0.3594, 0.2783]}

  2. Working hours of the site -
    {'WeekDay': ['Monday', 'Monday', 'Monday', 'Monday', 'Monday'], 'Type': ['Non Trading', 'Non Trading', 'Non Trading', 'Non Trading', 'Non Trading'], 'Hour': [0, 1, 2, 3, 4]}

halcyon hedge
#

df_temp = df.query('1970<Year<1981')
plt.pyplot.subplot(1, 5, 1)
df_temp.value_counts("Method").plot(kind = 'bar', title="1970-1980", figsize=(24,4))

df_temp = df.query('1980<Year<1991')
plt.pyplot.subplot(1, 5, 2)
df_temp.value_counts("Method").plot(kind = 'bar', title="1980-1990")

df_temp = df.query('1990<Year<2001')
plt.pyplot.subplot(1, 5, 3)
df_temp.value_counts("Method").plot(kind = 'bar', title="1990-2000")

df_temp = df.query('2000<Year<2010')
plt.pyplot.subplot(1, 5, 4)
df_temp.value_counts("Method").plot(kind = 'bar', title="2000-2010")

df_temp = df.query('2010<Year<2020')
plt.pyplot.subplot(1, 5, 5)
df_temp.value_counts("Method").plot(kind = 'bar', title="2010-2020");

plt.pyplot.suptitle("Main Title", fontsize=15)
plt.pyplot.subplots_adjust(hspace=4, top=4)
plt.pyplot.subplots_adjust(left=0.1,
bottom=0.1,
right=0.9,
top=0.9,
wspace=0.4,
hspace=0.4)

plt.pyplot.show()

#

Spacing and padding not working for the heading ("plt.pyplot.suptitle").

#

The word "Main Title" ("plt.pyplot.suptitle") is just overlapping with the titles of the individual graphs. Despite adding "plt.pyplot.subplots_adjust(left=0.1,
bottom=0.1,
right=0.9,
top=0.9,
wspace=0.4,
hspace=0.4)"

#

How to fix this
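
One possible fix, sketched under assumptions: the overlap is likely because `top=0.9` leaves only a thin strip of a 4-inch-tall figure for both the figure title and the per-axes titles. Letting matplotlib's constrained layout reserve the space usually avoids hand-tuning `subplots_adjust`. The small `df` below is made up (assumed columns `Year` and `Method`), and the decade bins use `start < Year <= end`, which differs slightly from the strict inequalities in the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the real dataframe (assumed columns: Year, Method).
df = pd.DataFrame({
    "Year": [1975, 1978, 1985, 1995, 1999, 2005, 2015, 2018],
    "Method": ["A", "B", "A", "A", "B", "B", "A", "A"],
})

decades = [(1970, 1980), (1980, 1990), (1990, 2000), (2000, 2010), (2010, 2020)]

# constrained_layout makes room for the figure-level title automatically.
fig, axes = plt.subplots(1, len(decades), figsize=(24, 4), constrained_layout=True)
for ax, (start, end) in zip(axes, decades):
    subset = df[(df["Year"] > start) & (df["Year"] <= end)]
    subset["Method"].value_counts().plot(kind="bar", ax=ax, title=f"{start}-{end}")
fig.suptitle("Main Title", fontsize=15)
```

If constrained layout is not an option, lowering `top` (for example `top=0.8`) in `subplots_adjust` is the manual alternative.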

serene scaffold
# white crest hi @serene scaffold thanks for your reply, actually i want help in generati...

to be clear, I wasn't making a commitment to help. I was just telling you that you need to ask your question if you want to get help.

"generating a scripts from specific condition to implement in dataset,"
I don't know what this means. it sounds like you don't have the vocabulary to convey what you're trying to do.

Try showing what result you want given the two dataframes that you have shown.

minor cloak
#

[PyGraft is looking for open-source contributors]

Hi there,

I recently open-sourced PyGraft, a configurable Python tool to generate synthetic knowledge graphs easily!
It can be used in any AI tasks (Machine Learning, Deep Learning, Reasoning, etc.) provided that you work with graphs.

The repo is gaining a lot of visibility, and I am looking for motivated contributors to support me in implementing new features and unit tests. Ideally, you should (or would like to) have a general understanding of knowledge graphs, semantic web, RDF/RDFS, and OWL vocabularies. In addition, strong Python programming skills are required. Experience in Software Engineering is a plus 🙂

DM me if you would like to contribute!

Otherwise, you can still take a look and star and fork the repo if you find the project interesting!

https://github.com/nicolas-hbt/pygraft


lunar wadi
#

Can someone help me find a good article or blog that discusses static evaluation of non-terminal game states?

left tartan
lunar wadi
#

Yeah, I am subscribed to that channel

left tartan
#

(That’s about all I know about games tho)

lunar wadi
#

I'm just using a simple approach in my game, minimax up to a certain depth, and the channel talks about deep learning

atomic hamlet
mighty patio
#

That kind of graph is called a histogram.
There is a plt.hist() function that will give you a bar plot, but there is also a histogram function in numpy which you can use in combination with matplotlib to make a line plot like shown there

atomic hamlet
weak mortar
#

while a histogram also has x and y axes and can display the same data, i'd say it's a line chart with smoothing applied to the line

#

wouldn't cost too much to take a look here and see if you like that type of plot. that's just one out of many libs. https://plotly.com/python/line-charts/


#

to make it smooth you want to look for what they call "spline" .. somewhere . the documentation is rather ok while not amazing

atomic hamlet
weak mortar
#

okay. you have time on X and what you plan for Y?

#

or i dont know if you have time on X, thats how i interpreted your msg

atomic hamlet
left tartan
atomic hamlet
#

(srry didnt think to make any yet)

left tartan
atomic hamlet
#

Its an object with a UNIX EPOCH Timestamp iirc

mighty patio
left tartan
#

You -can- do some of this stuff in the charting library, but the normal process is just to calculate what you want, then plot it.

#

(I don’t think of this as a histogram case, but if you added a date column to each object, I guess it’d work)

#

All that grouping stuff is really Pandas or SQL language, by the way

weak mortar
left tartan
#

Bins is what you’d call it in a histogram… but grouping is what you’d call the data transformation in data libraries and dbs.

weak mortar
#

ah cool. yea i see the histogram 2d and other 2d maps have nbins functions where scatter only has a few different named something with group

pulsar elk
#

In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

#

hello, I'm not even able to understand how the ols function works

prime token
#

Is data science just about grabbing data and making sense of it? Like if I wanted to create some analytics for rugby, I just grab the important data make sense of it via libraries and whatnot

mighty patio
prime token
#

Do you have an example of that if you dont mind me asking

mighty patio
#

I am an X-ray scientist so probably not the best person to ask for examples, but you can look into past kaggle competitions
I remember there was one for predicting supermarket sales (to predict how much produce should be stocked by the supermarket), and the sales of ice cream would increase on public holidays but only if the weather was good.

red latch
#

whats the best stats course i can take for data science? your recommendations?

left tartan
red latch
#

uhm, i have an over the top understanding of some concepts but i've taken high school math and engineering math as well so yeah

potent sky
red latch
#

but what would be a good course to take that teaches both stats and implementing it in python

left tartan
left tartan
prime token
#

I think you can make bank with a maths and stats degree. Quants require like some msc from what I see and you can make millions if you're good enough

left tartan
red latch
#

since im not really super idk

#

whats the word im looking for?

#

isolated from the math concepts?

left tartan
red latch
#

thanks for the pointers

#

how do you guys follow a book when its not prescribed as part of a course

#

thats dedication

#

pls

left tartan
red latch
#

ahh

#

thats far more convenient and freeing tbh

#

my brain is so "all or nothing" tho

#

i can't just pick a chapter; not doing the whole book perfectly seems so wrong

left tartan
#

Yah, I have a few books I’ll never finish, but I can’t put them away

pale hemlock
#

there's always finishing them with audio

wooden sail
#

that doesn't work if it's technical content

left tartan
#

Lol, I’d love to hear a stochastic calc text in audio

past meteor
gaunt sorrel
#

i know

simple tapir
#

hey

#
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def logistic_regression(data, max_iter=100, random_state=42):
    x = data.drop(["happiness_rank"], axis=1)
    y = data["happiness_rank"]
    x_train, x_test, y_train, y_test = train_test_split(x,y,random_state=random_state)
    model = LogisticRegression(max_iter=max_iter, random_state=random_state)
    model.fit(x_train,y_train)  
    preds = model.predict(x_test)
    score = accuracy_score(y_test, preds)
    return score
#

it shows that accuracy is 0.0

past meteor
#

How many levels? (E.g., is it 0/1 or from 1 to 10 or so?)

abstract wasp
#

Hello there, I want to train YOLO with my own dataset but I’d have to do the annotations manually and it would take forever to do so 😭 I was thinking of using edge detection to automatically create the bounding boxes. Do you guys agree or do you guys have a better idea?

dire iron
#

Anyone interested in optimization of LMs?

civic elm
#

Greetings, I am working on the kaggle medical transcript dataset and I am trying to re-balance the dataset because the mean word count is at 18 words, max words is 76. I am wondering what is the best move here? should I cut the top 25% highest word count?

lunar wadi
#

Hi, does anyone know of any resources on designing an evaluation function that takes in a state (mostly non-terminal) and the player and returns a score (for winning)?
An example of such a function in any game will also be fine. I just want to get the gist of the core concept of extracting features and evaluating them

I am making a connect4 game which is using some algorithmic approach for AI player. I'm using minimax under the hood.
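
A common way to build such an evaluation function for Connect 4 is to slide over every run of four cells and score it by how many discs each side has in it. The sketch below is one minimal version under stated assumptions: the board is a 6x7 list of lists with `0` for empty and `1`/`2` for the players, and the weights are made-up numbers that would need tuning. It is not taken from any particular engine.

```python
ROWS, COLS = 6, 7
EMPTY = 0

def windows(board):
    """Yield every horizontal, vertical and diagonal run of 4 cells."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + 3 * dr, c + 3 * dc
                if 0 <= end_r < ROWS and 0 <= end_c < COLS:
                    yield [board[r + i * dr][c + i * dc] for i in range(4)]

def score_window(window, player, opponent):
    """Feature: discs a side has in a window that the other side has not blocked."""
    weights = {0: 0, 1: 1, 2: 10, 3: 100, 4: 100000}  # made-up weights
    mine, theirs = window.count(player), window.count(opponent)
    if mine and theirs:
        return 0  # mixed window: nobody can complete it
    return weights[mine] - weights[theirs]

def evaluate(board, player, opponent):
    """Static evaluation of a (possibly non-terminal) position from player's view."""
    return sum(score_window(w, player, opponent) for w in windows(board))

board = [[EMPTY] * COLS for _ in range(ROWS)]
board[5][2] = board[5][3] = board[5][4] = 1  # three in a row on the bottom for player 1
print(evaluate(board, player=1, opponent=2))
```

Minimax would call `evaluate` at the depth cutoff instead of playing the game out to the end.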

echo lance
#

I was working through Elements of Statistical Learning and completed 3 chapters, but then started the Deep Learning for Coders 2022 course. Watched 4 lectures.
Now I just can't figure out whether I should first learn ML in depth or do DL. I know ML basics.

#

I have 2 years left of my clg (undergrad)

past meteor
#

Unless you specifically want to go into computer vision or NLP, you should know ML before DL, because deep learning offers solutions where "regular" ML fails.

echo lance
#

i was thinking that this is the best time to read this book with no pressure... but i also wonder whether it's really required, because i could invest this time in something else, like more implementing than theory

past meteor
#

These 3 books are what I always recommend

past meteor
#

ML is a very very leaky abstraction. Every model nowadays has a .fit() and .predict() method but they're all subtly different and for me at least knowing what most of them do gives me confidence that my work is correct.

drowsy dove
#

Hi all, does anyone know of any good online resources where I can find data science sample projects (and preferably solutions)? . I've gone through McKinney's book for Pandas but I feel like without some actual hands-on projects none of it will stick

potent sky
echo lance
#

Someone 📍 pin this message... really helpful

potent sky
drowsy dove
short heart
#

Is it ok to tune CatBoost parameters on GPU to then use these parameters for CPU training?

potent sky
# drowsy dove I wasn't actually. Do you recommend just going through the excercises here https...

Yep that might be useful.
But I was also referring to projects part, since you asked about projects.
Kaggle has tons and tons of projects/notebooks by other people.
You can find the right projects as per your interest.
Though do be careful to not pick up bad practices as these projects are not curated. Maybe keep an eye on the upvotes, user-level and discussion section to get some sort of idea.

delicate gyro
jovial elm
#

So I have this project I'm working on, and I need to capture the text on a screen and identify what number the text is. However, the area surrounding the text is transparent for the most part, therefore the background is subject to change, which may directly affect how the text is captured.
To capture the text, I'm using adaptive thresholding. I've attached a short video on the text being captured in different background environments.

My question is, what's the best approach to identify if the text equals "0%" or "1%" or "2%" etc.
I need the solution to be relatively efficient. So far, the capturing the text and adaptive thresholding is done in 0.03s roughly.
As of right now, I'm thinking of template matching.

serene scaffold
echo lance
#

Is data mining possible with LLMs? Like passing a whole bunch of pdfs and telling it to generate a csv from them...
I searched for some papers but couldn't find anything interesting.

#

I want to generate a medical dataset for medicines and symptoms

#

So thinking of this instead of doing text scraping

#

from medical books

serene scaffold
#

and you can't just pass PDFs--you'd need to convert them to text first.

echo lance
serene scaffold
#

but it might be that it doesn't matter.

echo lance
left tartan
echo lance
left tartan
bronze vessel
#

hey guys i want to learn opencv

#

Are someone suggest tuto

#

r

past meteor
odd meteor
silk drum
#

Good afternoon everybody! Does anyone here use seaborn?

serene scaffold
solar pagoda
#

Hi guys, I'm working with a dataset of cars in csv, and some of the data has the word 'Turbo' after something like a line break
so what i want to do is delete the CC and Turbo. i tried this code:

data['Power in cubic capacity (CC)'] = data['Power in cubic capacity (CC)'].str.replace('CC\nTurbo', '')

But Turbo isn't deleted, can somebody help me on how to delete it?
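
A likely reason the literal `'CC\nTurbo'` never matches is that the text between "CC" and "Turbo" is not exactly one `\n` (it could be a space plus a line break, or `\r\n`). A regex with `\s*` tolerates all of those. This is a sketch with made-up sample values; the real column contents are an assumption.

```python
import pandas as pd

# Made-up sample values; what the real cells look like is an assumption.
data = pd.DataFrame({
    "Power in cubic capacity (CC)": ["1500 CC", "1200 CC\nTurbo", "999 CC \r\nTurbo"],
})

col = "Power in cubic capacity (CC)"
# \s matches spaces, \n and \r\n alike, so the exact kind of line break no longer matters.
data[col] = (
    data[col]
    .str.replace(r"\s*CC\s*(?:Turbo)?", "", regex=True)
    .str.strip()
)
print(data[col].tolist())  # ['1500', '1200', '999']
```

Printing `repr()` of one problem cell from the real data shows which whitespace characters are actually there.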

dusk tide
#

Hi guys, I need help finding a regex pattern. I have a list of strings, e.g. ['TR ITA14TRK010 FF BROOKLYN STRAIGHT Beige 44','MB ITA14BLT016 35MM NA Olive Green 32','MB ITA15BLT004 40MM NA Reddish Brown 38 / 112CM'], and I want to split each string at the token that starts with IT (like ITA14TRK010, ITA14BLT016, ITA15BLT004) so that I can grab the brand name (at the start of the string, like TR, MB, etc.). How can I write a good regex for this? I have written one: r'IT[A-Z0-9]+'. Can someone validate it?
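
The pattern `r'IT[A-Z0-9]+'` does find the code in these three samples, but because it is unanchored it would also match inside an uppercase word such as "WHITE". A slightly stricter sketch that requires the code to be a whole token and captures the brand in one pass is below. The helper name `split_item` and the group names are made up for the example.

```python
import re

items = [
    "TR ITA14TRK010 FF BROOKLYN STRAIGHT Beige 44",
    "MB ITA14BLT016 35MM NA Olive Green 32",
    "MB ITA15BLT004 40MM NA Reddish Brown 38 / 112CM",
]

# brand = everything before the code; code = a whole token starting with IT.
pattern = re.compile(r"^(?P<brand>.+?)\s+(?P<code>IT[A-Z0-9]+)\b\s*(?P<rest>.*)$")

def split_item(text):
    """Return (brand, code, rest), or None if no IT code is found."""
    match = pattern.match(text)
    return match.group("brand", "code", "rest") if match else None

for item in items:
    print(split_item(item))
```

For the first sample this yields `('TR', 'ITA14TRK010', 'FF BROOKLYN STRAIGHT Beige 44')`.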

lavish lily
#

Is BertGeneration or gpt-3 better suited for text generation tasks

simple tapir
past meteor
simple tapir
#

let me try

past meteor
# simple tapir

Okay, now this makes sense. You need linear regression for this. There are too many levels in your target.

#

There's models that are (slightly) more adequate than linear regression for your problem (because you have integers) but I don't want to send you down a rabbit hole 🙂

#

What I can say is that your code isn't doing any feature scaling, and it should. Logistic regression (and the new model you should use, which is RidgeCV) applies regularisation by default, and that doesn't work properly if your features are on different scales. I can explain why if you want but for now my advice to you is to always rescale your data. Doing it where it's not necessary isn't as bad as not doing it
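
A minimal sketch of that advice using scikit-learn's `StandardScaler` and `RidgeCV` in a pipeline. The synthetic data, the `alphas` grid, and the function name `ridge_regression` are assumptions for illustration; the real happiness dataset is not shown in the chat. Since the target is now treated as a number, the sketch reports mean absolute error rather than accuracy.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def ridge_regression(x, y, random_state=42):
    """Scale features, fit RidgeCV, and return mean absolute error on the test split."""
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=random_state)
    # Inside the pipeline the scaler is fit on the training split only.
    model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
    model.fit(x_train, y_train)
    return mean_absolute_error(y_test, model.predict(x_test))

# Made-up data standing in for the real dataset: two features on very different scales.
rng = np.random.default_rng(0)
x = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 3 * x[:, 0] + 0.002 * x[:, 1] + rng.normal(0, 0.1, 200)
print(ridge_regression(x, y))
```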

simple tapir
#

I see

#

Thanks a lot for your help!

mint palm
#

how to do hyperparameter tuning?
my supervisor mentioned something that sounded like "doit search" (**oit search, where * means i'm not sure about the start lol), I didn't catch the pronunciation correctly.

past meteor
mint palm
past meteor
#

That's relatively important to mention because it influences what kind of hyperparameter tuning you can do 🙂

#

Typically for neural nets you'd run a hyperparameter tuning algorithm that does the search "sequentially" because you may not have enough VRAM to do it in parallel

low orbit
#

Hello.
I'm new to this field. #langchain
Can anyone give advice on how to ask chatGPT for a structured response with a validation model?
The official docs show document extraction for that. But I'm trying to build something like:

Question: Hello chat, what is the State of the Union?
Structured answer:
['Question',
'Alternative rephrased question',
'Second alternative rephrased question']

Kind of hard to build. Please give some advice.
I know it's somehow possible to get structured answers from chatGPT from just one-line questions.

spare briar
past meteor
#

I mostly mentioned this because in the past I never had the ability to train multiple NNs concurrently so instead of doing my default strategy (random search) I'd go with a variant of bayes opt

inland rivet
#

is there any function to get the index value of a given element from any column?

agile cobalt
#

can you give an example of what the input/output would be like?
edit; brb in like half an hour

inland rivet
#

I have 2 dataframes, A and B. A columns are vegetable and type. B columns are vegetable and amount. I want to iterate through B vegetable column, check if that vegetable is in A and asign the amount in B as a new column in A.

#

A might have vegetables that B doesn't, so I'll put a 0 in those cases.

#

I tried using concat inner.

tidal bough
#

I think it'd be something like A.merge(B, on="vegetable", how="left"), yeah.

agile cobalt
#

you could do it a bit more "manually" like ```py
# either of
veg_amount = B.set_index("vegetable")["amount"]
veg_amount = pd.Series(index=B["vegetable"], data=B["amount"].array)

# then
A["amount"] = A["vegetable"].map(veg_amount).fillna(0)
``` but you probably should prefer df.join or pd.merge
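
For comparison, the merge-based version could look like the sketch below. The two small frames are made up, using only the column names described above.

```python
import pandas as pd

# Made-up frames with the column names described above.
A = pd.DataFrame({"vegetable": ["carrot", "leek", "kale"], "type": ["root", "allium", "leaf"]})
B = pd.DataFrame({"vegetable": ["kale", "carrot"], "amount": [7, 3]})

# A left merge keeps every row of A; vegetables missing from B get NaN, which becomes 0.
result = A.merge(B, on="vegetable", how="left")
result["amount"] = result["amount"].fillna(0).astype(int)
print(result)
```

One caveat: if a vegetable appears more than once in B, the merge produces one row per match, so B may need to be aggregated first (for example with `groupby("vegetable")["amount"].sum()`).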

mint palm
#

i need to tune

3 weights of 3 losses
one temperature
learning rate

what hyperparameter tuning method should i use and why? when to prefer one over other? i am using torch
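
The random-search strategy mentioned earlier could be sketched for these five hyperparameters as follows. Everything here is an assumption for illustration: the search ranges are made up, and `train_and_validate` is a hypothetical placeholder that stands in for a real torch training run returning a validation loss.

```python
import math
import random

def sample_config(rng):
    """Draw one candidate: 3 loss weights, a temperature, and a learning rate."""
    return {
        "loss_weights": [rng.uniform(0.0, 1.0) for _ in range(3)],
        "temperature": rng.uniform(0.5, 5.0),
        # Learning rates are usually searched on a log scale.
        "learning_rate": 10 ** rng.uniform(-5, -2),
    }

def train_and_validate(config):
    """Hypothetical placeholder: swap in a real training run that returns validation loss."""
    return ((config["temperature"] - 2.0) ** 2
            + (math.log10(config["learning_rate"]) + 3.0) ** 2
            + sum((w - 0.5) ** 2 for w in config["loss_weights"]))

def random_search(n_trials=50, seed=0):
    """Evaluate n_trials random configs and return (best_loss, best_config)."""
    rng = random.Random(seed)
    trials = [(train_and_validate(c), c) for c in (sample_config(rng) for _ in range(n_trials))]
    return min(trials, key=lambda t: t[0])

best_loss, best_config = random_search()
print(best_loss, best_config)
```

Random search runs its trials independently, so it parallelises trivially; a Bayesian optimiser picks each new config from the results so far, which suits one-at-a-time training when each run is expensive.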

topaz night
#

is there any diff between the 4060 lp and the regular one ??

serene scaffold
verbal venture
#

do you have any thoughts on this @serene scaffold I want to make a web application that uses different LLMs from different organizations. I'm worried earlier inputs won't be used in the attention mechanism as they are different LLMs. is it possible to retain the different attention histories across each LLM? in other words, use the previous attention from one LLM in another

#

I was thinking concatenating attention might work but not sure if the math behind that does what I want it to

serene scaffold
#

I don't think it's guaranteed that every LLM uses attention (though it might be unusual for one not to), let alone in a way that can be used as-is in a different LLM

boreal blaze
#

Hi everyone, this question isn't particularly related to AI, more so to the philosophy of AI, so if there is a more suitable channel feel free to point it out and I will move it there.

#

I want to know what exactly the robot reply and the systems reply to the Chinese Room Argument mean.

serene scaffold
boreal blaze
#

So far, my understanding of the chinese room argument is as follows.

#

haha dw i want nothing to do with AI or datascience

#

this is for some content i'm learning for an algorithmics class

#

with the systems reply, the general gist is that even though the human doesn't understand Chinese, the room as a whole does. But what does "understand Chinese" mean in that context?

serene scaffold
#

"What does it mean to understand" that's the crux of the question.

left tartan
#

That summary also seems terribly written, no offense to the author.

boreal blaze
#

... the author is me ;-;

#

how can i improve it?

left tartan
#

It's the sentence about "Searle claims that if a human-like computer..."

serene scaffold
#

It sounds like you might not fully understand what the Chinese room analogy is intended to convey.

boreal blaze
#

to me it seems like the analogy is there to argue that there is no way for strong AI to exist, to act as a human mind does.

#

is that wrong?

#

or is it to argue the Turing Test?

serene scaffold
#

I don't think the analogy is intended to argue for one position or another, but to provide a basis for discussion

left tartan
serene scaffold
#

"if a robot participates in a conversation only by looking up responses from a table of inputs and outputs, can it be considered to understand what is being said?"

boreal blaze
#

isn't his basis that it can't, though? because he does rebut a lot of arguments that try to say that it can be intelligent?

#

actually wait basically, i just want to confirm this statement:

serene scaffold
#

I haven't read the Searle paper that you're referring to

left tartan
boreal blaze
#

Alan Turing believes that if a computer can simulate a human being well enough, it is intelligent. Searle argues that no matter how well a computer is programmed, it is still only simulating understanding, and is not intelligent

boreal blaze
left tartan
#

What's interesting... as I read through this, is not the argument, but the replies to the argument

boreal blaze
#

i like the brain simulator reply the most

#

i meant brain simulator

#

just cause the analogy is really well thought out,

left tartan
#

just got into it. Interesting read, enjoyed it.

iron basalt
# serene scaffold I don't think the analogy is intended to argue for one position or another, but ...

Searle does make a conclusion, and it's a non sequitur. It's another form of Vitalism, but for modern day. In addition, whether it can understand the tasks given depends on the tasks, specifically how much they rely on knowing about the real world. If the task is for example, math (symbol manipulation / algebra parts of it), then sure it can do those and "understand" them, just like a human would in that situation. Fundamentally it's arguing that computers lack symbol grounding, but they can have symbol grounding, this is an arbitrary assumption / premise made by Searle that makes their argument work at all.

#

This argument was also probably made in the context of understanding AI back how it was when everyone did symbolic ("good old fashioned") AI. Which is why it has the heavy focus on symbols and symbol grounding.

iron basalt
#

Seems like a bit of a walk back.

#

Something like "brains have the magic sauce" (again, Vitalism).

#

(Vitalism used to be huge when cells / humans being made of cells first started becoming accepted)

#

(we like to feel special)

boreal blaze
#

i feel like (based on his response to the other minds reply) that he wants to say the machine can't ever do it, because a machine can never become biological.

#

well not quite.

#

The Many Mansions Reply suggests that even if Searle is right in his suggestion that programming cannot suffice to cause computers to have intentionality and cognitive states, other means besides programming might be devised such that computers may be imbued with whatever does suffice for intentionality by these other means.

This too, Searle says, misses the point: it “trivializes the project of Strong AI by redefining it as whatever artificially produces and explains cognition” abandoning “the original claim made on behalf of artificial intelligence” that “mental processes are computational processes over formally defined elements.” If AI is not identified with that “precise, well defined thesis,” Searle says, “my objections no longer apply because there is no longer a testable hypothesis for them to apply to” (1980a, p. 422).

#

he wants to say that a machine is defined as something programmed, with instructions. If it doesn't work like that, the argument fails because the argument is not meant to target biological machines

iron basalt
#

That is what I read, but then I read that Searle basically said that brains can do it, machines can't, and that it would need to be demonstrated. Which is not the same as "not possible."

#

I think the opinion and message has changed over time.

iron basalt
#

The messaging is not really clear enough, so i'm just going to leave it at what I wrote.

somber panther
#

so someone posted this over in the excel discord, what is your impression?

#

excel and python are kind of my thing, wonder if i should milk this

#

🐄

boreal blaze
#

it looks nice, but the subscription at the end does seem like something Microsoft will milk.

latent ibex
#

Not sure if this is the right chat but I'm at a stage in which I need to extend some of the built in stats file for an open source library that I'm currently using. Upon checking in with chatgpt to help me do so, it suggests that it's best to subclass it rather than modify the original file.
In theory, if I'm subclassing would I just need to add the pertinent code to a whole new file and include this in the same directory as the rest of the library files? Or is there something else I'm missing? Thanks.

left tartan
latent ibex
left tartan
#

And it sounds like you’re not very familiar with subclassing, right?

latent ibex
left tartan
#

Generally, subclassing lets you override the base class's functionality or add to it. It requires a few examples to explain, but perhaps you should start with the inheritance section here: https://python.swaroopch.com/oop.html

#

In terms of file placement: you’d usually put your code in your own directory. It doesn’t matter where the library files are. You’d import the library, then any modules you wrote
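A minimal sketch of what that looks like, using the standard library's `statistics.NormalDist` as a stand-in for the library class you want to extend (the file name `my_stats.py`, the subclass name and the `cv` method are illustrative assumptions, not anything from your library):

```python
# my_stats.py -- lives in your own project directory, not inside the library.
from statistics import NormalDist  # stand-in for the library class being extended


class ExtendedNormalDist(NormalDist):
    """Hypothetical subclass: adds one method, leaves everything else untouched."""

    def cv(self):
        # Coefficient of variation: standard deviation relative to the mean.
        return self.stdev / self.mean


dist = ExtendedNormalDist(10, 2)
print(dist.cv())     # new behaviour added by the subclass
print(dist.cdf(10))  # inherited behaviour still works
```

The library files stay exactly as installed, so upgrading the library later doesn't wipe out your changes.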

serene scaffold
manic tangle
#

dipping my toes in ai / ml, hoping someone else can suggest some reading for what im trying to accomplish

#

my goal is to first determine whether a piece of text is code, and if it is classify which language it was written in. i understand this is very accomplishable without any ML but i just want a lil project 🤠

#

I just dont even know where to start cause ive never rly touched this

rigid oxide
#

is there a way to check if a variable is truthy and equals a value at the same time? I have to use this and it's annoying. I have a js background:

if meta_tag_property != None:
        if  meta_tag_property.startswith('og:'):
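Python's `and` short-circuits the same way `&&` does in JS, so both checks can go on one line. A small sketch (the helper name `is_og_property` is just for illustration):

```python
def is_og_property(meta_tag_property):
    """Return True only for non-empty strings that start with 'og:'."""
    # `and` short-circuits: startswith() is never called on None or "".
    return bool(meta_tag_property) and meta_tag_property.startswith("og:")


print(is_og_property("og:title"))
print(is_og_property(None))
```

Inline, that would be `if meta_tag_property and meta_tag_property.startswith('og:'):`.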
manic tangle
past meteor
# manic tangle my goal is to first determine whether a piece of text is code, and if it is clas...

Looks like a fun project. Off the top of my head I know a few ways I would solve this. I'd say what matters most for you is what part of AI you want to delve into:

  1. Do you want to do traditional ML and generate "features" (input variables) and then train a model?
  2. Do you want to pass it off to something like a neural network?
  3. Do you want to use an API to generate "features" (your input variables) and then train a model?
rigid oxide
manic tangle
manic tangle
past meteor
#

That's a fun one to do! I would suggest you pick a few programming languages but make it sufficiently hard for yourself (have C# and Java be in there together) and then for each language you read a bunch of code and ask yourself "what makes Python code Python"

#

... and then you'll need a lot of regex to make rules
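A rough sketch of what those regex rules could look like, assuming three languages and a handful of made-up patterns per language (the rule table and function names are hypothetical, and real rules would need to be far more thorough):

```python
import re

# Hypothetical hand-written rules: a few regexes per language that tend to
# show up in its source code. Illustrative only, nowhere near exhaustive.
RULES = {
    "python": [r"^\s*def \w+\(.*\):", r"^\s*import \w+", r"^\s*elif\b"],
    "java": [r"\bpublic static void main\b", r"\bSystem\.out\.println\b", r"^\s*import java\."],
    "csharp": [r"\busing System\b", r"\bConsole\.WriteLine\b", r"\bnamespace \w+"],
}


def score(text):
    """Count how many rules of each language match the text."""
    return {
        lang: sum(bool(re.search(p, text, re.MULTILINE)) for p in patterns)
        for lang, patterns in RULES.items()
    }


def guess_language(text):
    """Return the best-scoring language, or None if no rule matches at all."""
    scores = score(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("import os\n\ndef main():\n    print(os.getcwd())\n"))
```

The per-language match counts from `score` are exactly the kind of "features" you could later feed into a trained model instead of just taking the max.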

#

That's just what I would do off the top of my head. I'm also not sure how far it will get you 🤔 . You can always move on to 2) and 3) if it's not working well

manic tangle
#

so a neural network solution would look like what?

#

I know basically nothing about AI so most of these concepts r very foreign haha

past meteor
#

A neural network solution would process the raw strings into some sort of vector and then it would use that to make predictions. The difference is that you are no longer generating features yourself

manic tangle
#

mm okay, I assume for training that I would feed it the vector + label of each document?

past meteor
manic tangle
#

well that doesn't sound incredibly difficult thinkingsmirk

manic tangle
#

thank you!

verbal venture
timid kestrel
#

hey hey yall, anyone got any good data sources to practice machine learning models with python? Ive tried searching in kaggle but i dont think i have a good trained eye to select good data sets. im tryna practice my xgboost, random forest parameter setting and optimization skills. also am pretty new to python coding. i was told to browse thru the pins but i cant find anything too specific

cunning crystal
serene scaffold
past meteor
#

Specifically for xgboost just tune the number of trees, for random forest the cost_complexity

delicate lodge
quaint spade
#

does anyone know of site or video that could teach me how I can build something like this for options , data supplied by yfinance and CBOE

left tartan
#

Like, are you looking to write this yourself? GEX curves are a little annoying to calculate/draw out

#

but hang on a sec... I know a blog that posted this...

#

The method is solid, although slow. It can be vectorized, if youre up to the task

quaint spade
# left tartan Do you know black scholes?

nope can't say I'm familiar and yes I do want to write this myself if I have to, I just want my personal app website or whatever that can give me those levels , for now I'm getting them from another server by the name of investors haven , I don't fully understand how they do it but up for the challenge and definitely willing to learn more

quaint spade
verbal venture
#

Whoever authored that thought is an idiot (the chinese room author in this case)

#

If you simulate understanding, you have understanding, which is intelligence

kind herald
#

hey can someone whos done machine learning with both pytorch and tensorflow help me out. I can't decide which i wanna learn first.

lapis sequoia
kind herald
verbal venture
verbal venture
#

Basically if you can manipulate data to solve your problem

lapis sequoia
#

a calculator can do that... is it intelligent?

#

i think im too dumb to talk abt this

sonic knoll
#

Hello everyone!

#

I have to do a project using machine learning but I would like to know if anyone has an interesting dataset to share with me?

timid kestrel
abstract wasp
#

Hi, which library do you guys think is better for building decision trees?

magic dune
#

?

#

Decision trees is a pretty simple algorithm tho

abstract wasp
# magic dune Which libraries are you talking abt

Tensorflow or Scikit-Learn?
My idea for this is that once I get this other model working, I will use the output of that model to help me decide an approximate date, like month, of when an image was taken. I think I’ll include a CNN to help me identify the season and then do the decision tree portion. Do you think this is a good idea?

magic dune
#

but tensorflow is more made for nn

past meteor
#

I would only use TensorFlow's decision trees if for some reason I had to do the predictions on edge with TF Lite.

lapis sequoia
#

Pyspark

lapis sequoia
#

Hi guys, do any of you know how to display a neural network? It is a simple neural network (3 in, 2 hid, 3 out). I am using neat-python, if it helps, but you can use any package if you want. If you can help me out, that would be great!

red elk
#

Does anyone know a good deep speech yt tutorial?

halcyon hedge
#

results = all_months_data.groupby('Month').sum()
months = range(1,13)

plt.bar(months, results['Sales'])
plt.xticks(months)
plt.ylabel("Sales in USD($)")
plt.xlabel("Month")
plt.show();

#

Month contains datetime objects. Code runs perfectly fine on Jupyter but I get a "Cannot sum datetime object" error on kaggle, how to fix this.

agile cobalt
#

you can specify which columns you want to operate on after using groupby like df.groupby(groupby_col)[target_col].function()

halcyon hedge
agile cobalt
#

the month will become the index

#

if you need it as a column you could reset the index after running the aggregation
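Roughly like this, as a sketch on a toy frame (the column names follow the snippet above; "Order Date" is an assumed name for the datetime column that the sum is choking on):

```python
import pandas as pd

# Toy stand-in for all_months_data; "Order Date" is an assumed datetime column.
all_months_data = pd.DataFrame({
    "Month": [1, 1, 2],
    "Sales": [10.0, 5.0, 7.5],
    "Order Date": pd.to_datetime(["2019-01-03", "2019-01-20", "2019-02-11"]),
})

# Select the column before aggregating so the datetime column is never summed.
sales_per_month = all_months_data.groupby("Month")["Sales"].sum()

# Month is now the index; reset_index() turns it back into a regular column.
sales_table = sales_per_month.reset_index()
print(sales_table)
```

Selecting the column first behaves the same on older and newer pandas versions, so it doesn't matter which one the notebook happens to run on.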

halcyon hedge
#

I have 1.4.2 running locally and 2.0.3 on kaggle

agile cobalt
halcyon hedge
flat silo
#

Hello, has anyone here used TurboODBC (although it could just be a typical ODBC driver issue as well...) and dealing with a converted datatype from Pandas for BIT, MONEY, and TEXT to SQL Server (in Azure)? Getting various issues about cannot convert: Numeric Error.

wooden sail
#

anyone here very familiar with stochastic matrices?

small wedge
#

they're used in some RL right? like a matrix of probabilities for each state an agent can have?

potent sky
#

transition probability matrices yes, with markov chains

#

unless Edd is referring to a different type of Stochastic matrices ;-;

desert bobcat
#

heyy

#

did you work on AR projects.!?

tropic niche
#

I have a question about annotating data for training. I would like to train LayoutLM with my own dataset of scanned forms. I plan on annotating the data using the same method used for the Funsd dataset. I have used pyTesseract to extract the data from the images. Unfortunately pyTesseract, isn't perfect! even after pre-processing the images (removing lines, noise, and binarizing).

#

Does the annotation need be based on extracted data from pyTesseract or the data as it should appear?

#

For example, do the bounding box coordinates need to match those in the pyTesseract data? If there is a missing word, do I add it into the annotated data?

wooden sail
#

oops, i disappeared all of a sudden. yeah, stochastic matrices like in markov chains. say right stochastic matrices, more concretely (rows adding up to 1). do you know of any interesting properties of the product A^T A? maybe some bounds on the off diagonal elements 👀

verbal oar
#

if I want to try make some 3d model with AI what should I use instead of NeRf and pytorch3d?

#

and also rasterization based not rt

#

assuming I have dataset of images

languid prairie
#

Hi, looking for help: how do I fine-tune a model with LlamaIndex (for example the FinBERT model)?
So currently I am working on a project which consists of fine-tuning our FinBERT model with the LlamaIndex method (https://gpt-index.readthedocs.io/en/latest/examples/finetuning/embeddings/finetune_embedding_adapter.html) in order to get better results for sentiment analysis. I am a beginner so I would appreciate any kind of help in understanding this process better.

Looking forward to hearing from you 🙂

glossy adder
#

Does anyone have some good resources on feature selection? I have a dataset with several combinations of features. One way is to make a model for each combination and test these models against each other; another way (per a blog I read) is to train the model with the full features then test it on the different combinations of features (when it performs worse it means the missing feature was important). Is there any authoritative source on this?

past meteor
desert oar
#

i don't know of a single authoritative source for feature selection, there might be something about it in elements of statistical learning

#

there is a lot of old bad advice out there about "stepwise" regression, but there are a lot of problems with that

glossy adder
#

@past meteor Yes, as in selecting a set of existing features not inferring new ones.

#

aha, interesting @desert oar

past meteor
glossy adder
#

I might be overthinking my use case, I am making plots, plotting several dimensions of time series data (line showing the position, colour showing speed etc etc) so the thought is to find what features actually help the model

past meteor
#

Even if your model is capable of finding non-linear patterns explicitly making them can help

#

But using plots to remove features? idk, I would probably not do that.

#

Regularization is the answer

desert oar
past meteor
#

Knowing what features are important is so dangerous that I wouldn't touch it unless you know what you're doing

desert oar
#

the business people will want to know 🙂

glossy adder
#

@desert oar good point. As in - there is no reason to actually remove features. Thx @past meteor - I guess that might move me into overfitting landscape and such

desert oar
#

that said, i have run into cases where i did really want to remove "irrelevant" features, but it's a case-by-case situation. if you can explain what you're actually doing maybe we can provide more detailed advice

past meteor
#

I'd nuance it to the very very very maximum that it means nothing

#

"Under this particular instance of the model and our data the most important features appear to be ..., different instances may drastically find different importances."

#

Business cannot expect me to give them more unless they give me the € to do an experiment 🤣. I "fight" this every other day.

#

Their statistical literacy can be low, if you give them what they think they want they'll make decisions that hurt the business

glossy adder
#

haha, good point 🙂 kind of a mvp product, just minimally budgeted product

true scaffold
#

Hi guys, need some help, i have created n clusters, a cluster contains m docs which are embeddings of 2048 dims (1 doc = 2048 dim of vector, 1 cluster = m docs), now i have a query string, i want to get the most relevant/similar cluster that it can fall under, so i'm thinking of calculating an average embedding of a cluster, and finding cosine sim b/w the query embedding and the cluster embeddings to find the most relevant cluster it can belong to? Any other efficient approach?

past meteor
#

If you have 2 highly correlated features your regularizer will kill 1, that doesn't mean the feature is irrelevant to the problem etc etc

past meteor
#

Just use regularisation and call it a day imo

past meteor
past meteor
true scaffold
#

Thought u were referring to my question …

past meteor
#

Is this latent semantic analysis you did? (LSA or LSI)

past meteor
#

Because it might be just a standard case of LSA

tropic niche
#

I have a question about annotating data for training that I asked yesterday but has not been answered. I'm hoping someone can provide some insight. #data-science-and-ml message

true scaffold
past meteor
#

It's a retrieval problem? Someone gives an input, what do you want to give them, the most relevant document?

#

or the most relevant topic?

true scaffold
#

document

#

basically, a user uploads a csv file

#

each row is a doc let's say

#

now i take this csv and create clusters out of it, now the user also enters a query, now based on this query, i wanna recommend him a cluster

#

the most "similar/relevant" cluster

past meteor
#

Your case is exactly latent semantic indexing

#

From my old slides, that's what you want to do right?

true scaffold
#

yea i wanna give him the index of the cluster which contains relevant docs based on his query

past meteor
#

You actually don't need to cluster

true scaffold
#

i know, i can just show him most similar docs

#

i've done that

#

but the cluster part is a different feature

#

basically with clusters, the user can look at other options...

#

the ability to explore more...

past meteor
#

You can just take the cluster centres and do the same as before

#

That's indeed what you proposed originally

true scaffold
#

yea i guess its a KNN problem...?

past meteor
#

(I think you're making this a lot harder than it should be but...) take the mean of all the docs in the cluster, compute the cosine sim, take the most similar one, show the top N in that cluster
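As a sketch with NumPy, assuming the embeddings are rows of one array and the cluster assignments are an integer label per row (the function name and the toy 4-dim vectors are made up for illustration; yours would be 2048-dim):

```python
import numpy as np


def most_similar_cluster(query_vec, doc_embeddings, labels):
    """Return (best cluster id, similarities) using cosine similarity
    between the query and each cluster's mean embedding."""
    cluster_ids = np.unique(labels)
    # One centroid per cluster: the mean of that cluster's document vectors.
    centroids = np.stack([doc_embeddings[labels == c].mean(axis=0) for c in cluster_ids])
    # Cosine similarity = dot product of L2-normalised vectors.
    centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    sims = centroids @ query
    return cluster_ids[np.argmax(sims)], sims


docs = np.array([
    [1.0, 0.0, 0.0, 0.1],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.1, 0.0, 0.8, 0.1],
])
labels = np.array([0, 0, 1, 1])
best, sims = most_similar_cluster(np.array([1.0, 0.0, 0.1, 0.0]), docs, labels)
print(best, sims)
```

Ranking the docs inside the winning cluster by the same cosine similarity then gives the top N.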

true scaffold
past meteor
#

Yes it will but c'est la vie

true scaffold
#

wait let me translate that...

past meteor
#

that's life*

true scaffold
#

yea...

#

any better approach?>

past meteor
#

This is what my course had to say about it:

#

NLP from 1979 though 👀

#

Their suggestion is just to take the mean of the documents and return all in the cluster, which has drawbacks ofc as you mentioned.

true scaffold
#

hmm... yeah

#

anyway, thanks

#

ill implement what we discussed and will keep researching to find a better approach to solve this...

serene scaffold
#

Are these two expressions equivalent?

wooden sail
#

what's that fancy 1

serene scaffold
#

1 if the underset equality is true, else 0

#

I think (because the assignment doesn't say)
(the first expression is given in the assignment and the second is my attempt at rewriting it to be easier to reason about)

wooden sail
#

yeah, you factored out the -1 exponent from the log. looks equivalent

serene scaffold
#

ty math wizard

half herald
#

Why am I getting this error? I can't use cv2.imshow

agile cobalt
#

how did you install opencv? just pip install?

half herald
#

Yes, but after I got this error, I deleted it and reinstalled it, it said so on the internet, but it didn't work.

agile cobalt
agile cobalt
#

maybe try uninstalling it then reinstalling opencv

#

or just nuke your current venv and create a new one

half herald
#

@agile cobalt It didn't work, I still get the same error, it's ridiculous.

agile cobalt
#

try reinstalling python, this time from python.org instead of the windows store, then create a virtual environment before using pip

half herald
agile cobalt
#

tl;dr keep things tidy instead of ending up with messes that can cause all sorts of problems like what you just had

half herald
#

@agile cobalt What I am about to say may seem strange to you, but when I run the code using a different IDE, there is no problem, but I get this error in Visual Studio Code. I really couldn't understand

agile cobalt
#

that other IDE is PyCharm?

#

or something inside of Anaconda

#

both of these manage virtual environments for you to some extent

half herald
#

No, normal Python Idle

agile cobalt
#

it might be just pointing to a different python interpreter then

#

in VSCode, do you see the python version in the bottom right corner? Click it and select a different interpreter
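A quick way to check which interpreter each editor is actually running, and whether it can see the package (a small sketch; `describe_environment` is just an illustrative helper name):

```python
import importlib.util
import sys


def describe_environment(module_name="cv2"):
    """Report which Python is running and whether it can find the module."""
    return {
        "interpreter": sys.executable,
        # find_spec() returns None when the top-level module is not installed here.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


print(describe_environment("cv2"))
```

Run it once from IDLE and once from VS Code; if the interpreter paths differ, the package was installed into one environment and not the other.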

half herald
#

Wowwww I chose anaconda and it worked, very strange

#

So the problem is that simple

#

I've been trying to solve this problem for 2 hours

agile cobalt
#

managing dependencies in python can be a pain in the ass sometimes

half herald
#

Thank you so much bro

candid spruce
#

hi I was wondering if anyone wanted to work on a ai project with me 😄 here is a blue print for the ai

abstract wasp
#

I am building a CNN with data regarding most popular cities. If I train my cnn with the cities I have right now and gather more data of other cities later, will it remember the previous cities or will it forget and just remember the new ones?
Should I just wait until all my data and train it all together?

serene scaffold
#

@abstract wasp training a CNN to do what?

abstract wasp
serene scaffold
serene scaffold
# abstract wasp Yes

if you trained your classifier on cities {a, b, c}, and then continued training it on {d, e}, you would run the risk of the classifier forgetting {a, b, c}. and there are strategies for mitigating this, but unless the training you did on {a, b, c} can't wait and would be expensive to replicate, you'll get better results if you train once on {a, b, c, d, e}.

abstract wasp
serene scaffold
#

but that's probably a question that dissertations are written about.

serene scaffold
#

@abstract wasp why do you ask? do you already have a trained model that was expensive to train?

abstract wasp
#

But yeah, training a CNN with this amount of data would be very expensive, that’s also another factor.

magic dune
#

can I have a simple decision tree code review?

serene scaffold
serene scaffold
# abstract wasp No, I haven’t trained it yet. I’m still gathering data but rn, I’m just gatherin...

something to keep in mind as you approach this is: if someone with internet access looked at one of the images in the dataset, would they be able to figure out what city they're from? if the images are so non-descript (no famous buildings, no city-specific architecture, etc.) that there's nothing about them that could be tied to a particular city, a neural network won't be able to magically solve that for you.

magic dune
#

I think I did an ok job but might be able to improve

#

if anyone can review the code and tell me what I should improve on I would be happy to hear

abstract wasp
echo lance
#

Is there a specific book or vid series for ml that focuses on how to achieve accuracy for different data behaviours, and techniques to win competitions?
As most of the books focus on teaching algos...

echo lance
#

Yeah

tacit basin
#

Maybe the kaggle book?

scenic parcel
#

How is ^VIX being calculated as having only a 0.0272 correlation with VIXCLSx

#

Should I be normalizing first? Using pandas corr function, pearson correlation

lost plinth
#

Hi Folks,
During my free time I was doing a personal project: basically I created a chatbot which can answer your questions from a document. I used Langchain (framework), ChromaDB (vector database), Streamlit (UI) and either a local LLM (Llama2 based model) or the OpenAI API for the LLM. You can use PDF, TXT, CSV, and DOCX files for question answering. Any contributions to this project will be highly welcome. Thanks!
Github link: https://github.com/himanshu662000/InfoGPT

wispy junco
#

Hi I'm a complete beginner to ml and need to train a model to automatically find coordinates in an image, can someone please point me to some resources and libraries that can help me accomplish this, thanks.

scenic parcel
#

Does it get slow with large amounts of pdf, essentially if you gave it an entire bookshelf to search through? I'm definitely downloading this and trying it out though. How long did it take?

#

I already have a correction, it is supposed to be requirements.txt not requirement.txt just a tiny thing lol

lost plinth
scenic parcel
long canopy
#

any AI tools available for CLI prompt validation? i.e. to check whether a string answer to a command line prompt has an appropriate format

lapis sequoia
#

Do any of you guys know some high quality libraries for making maps in python

#

I have worked with plotly and folium

#

Plotly is pretty good but runs into some limitations from time to time

fallow frost
#

does anybody know the maximum length a SQL query can be with Athena DB ?

#

I basically need to do: SELECT ... WHERE col IN <very-long-list-of-values> the list/tuple can have upwards of 100k strings with at least 50 chars each

#

not sure if I pass it as query parameter if it will still matter or not...

wintry cloud
serene scaffold
#

I'm at a loss for how to proceed with this question. It appears that we have function K as R^d x R^d -> R, and Phi as R^d -> R^d, but I don't understand the relationship between K and Phi.

past meteor
#

You definitely have to look at it in terms of K and not phi, that's the property they want you to exploit

#

if x_i = x_j then the term is exp(0) = 1, and if they are different, ||x_i - x_j||² is a positive number which you multiply by -1/2, giving a negative exponent, so the term is always between 0 and 1.

#

I don't understand the <= 2
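One way the 2 could come out, assuming the question is asking for a bound on the squared distance between the feature vectors and that K(x_i, x_j) = Φ(x_i)·Φ(x_j) = exp(-½‖x_i - x_j‖²) (both are guesses about the assignment, which isn't shown here):

```latex
\begin{aligned}
\|\Phi(x_i) - \Phi(x_j)\|^2
  &= \Phi(x_i)\cdot\Phi(x_i) - 2\,\Phi(x_i)\cdot\Phi(x_j) + \Phi(x_j)\cdot\Phi(x_j) \\
  &= K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j) \\
  &= 2 - 2\exp\!\left(-\tfrac{1}{2}\|x_i - x_j\|^2\right) \\
  &\le 2 ,
\end{aligned}
```

since K(x, x) = exp(0) = 1 and the exponential term is strictly between 0 and 1 whenever the points differ.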

small wedge
small wedge
# scenic parcel gpt

Please never recommend chatGPT as a source of information, especially to beginners.

wispy junco
#

anyway, not even gonna try to sugarcoat this, I got the code from chatGPT

from PIL import Image
import numpy as np
from sklearn.cluster import KMeans

def get_dominant_colors(image, num_colors):
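    # Convert the image to an RGB numpy array (assumption: forcing RGB guards
    # against RGBA or grayscale inputs, which would break the reshape below)
    img_array = np.array(image.convert('RGB'))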

    # Reshape the image array to a list of pixels
    pixels = img_array.reshape(-1, 3)

    # Initialize K-Means with the desired number of clusters (colors)
    kmeans = KMeans(n_clusters=num_colors, random_state=0).fit(pixels)

    # Get the RGB values of the cluster centers (dominant colors)
    dominant_colors = kmeans.cluster_centers_.astype(int)

    return dominant_colors

# Open an example image (replace with your box)
box = Image.open('./doggo.jpeg')

# Specify the number of dominant colors you want to extract (5 in this case)
num_colors = 5

# Get the 5 dominant colors within the box
dominant_colors = get_dominant_colors(box, num_colors)

# Print the dominant colors (RGB values)
print("Dominant Colors:")
for color in dominant_colors:
    print(f"RGB: {color[0]}, {color[1]}, {color[2]}")

can someone tell me what's going wrong, I'm trying to get the 5 dominant colors in a image

wispy junco
#

I'm getting this error

lapis sequoia
small wedge
quaint spade
#

hey everyone, i need a favor, can someone run some code for me? it doesn't seem to work on my laptop and i want to see if the problem is me (maybe i didn't install all the right packages) or the code. it has to do with webscraping from cboe and gamma exposure for options https://github.com/Matteo-Ferrara/gex-tracker/tree/e4a5cd508268673004e7dcd2f73ce7f74bf251c5

abstract wasp
#

Hi, help, I get an data_iterator = data.as_numpy_iterator() AttributeError: 'DirectoryIterator' object has no attribute 'as_numpy_iterator'
This is my code:
import pandas as pd
import numpy as np
import os

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

#GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

#DATA LOADING
train_images = '/Users/avatarvaleria/Projects/colabs/Lys/time/data/images'
classes = os.listdir(train_images)
print(classes)

#DATA AUGMENTATION
data_aug = ImageDataGenerator(
    rotation_range=25,
    fill_mode='nearest',
    horizontal_flip=True,
    brightness_range=(.5, .5),
    zoom_range=.5
)

#APPLYING AUG
batch_size = 32

data = data_aug.flow_from_directory(
    train_images,
    target_size=(256, 256),
    batch_size=batch_size,
    class_mode='sparse'
)

data_iterator = data.as_numpy_iterator()
batch = data_iterator.next()

fig, ax = plt.subplots(ncols=5, figsize=(20,20))
for idx, img in enumerate(batch[0][:5]):
    ax[idx].imshow(img.astype(int))
    ax[idx].title.set_text(batch[1][idx])

#DATA PREPROCESSING
data = data.map(lambda x, y: (x/255, y))
data.as_numpy_iterator().next()

scaled = batch[0]/255
scaled.max()

scaled_iterator = data.as_numpy_iterator()
batch = scaled_iterator.next()

fig, ax = plt.subplots(ncols=4, figsize=(20,20))
for idx, img in enumerate(batch[0][:4]):
    ax[idx].imshow(img)
    ax[idx].title.set_text(batch[1][idx])

data.as_numpy_iterator().next()[0].max()

#SPLITTING
len(data)

train_size = int(len(data)*.7)
val_size = int(len(data)*.2)
test_size = int(len(data)*.1)

print(f"Train size: {train_size}")
print(f"Validation size: {val_size}")
print(f"Test size: {test_size}")

train = data.take(train_size)
val = data.skip(train_size).take(val_size)
test = data.skip(train_size+val_size).take(test_size)

quaint spade
#

also im a beginner , i just saw source code and thought that it would work

quaint spade
left tartan
#

All that’s saying is the request is returning no data. Presumably there’s no error handling, so it’s probably just hiding an error

quaint spade
#

why would it do that lol

left tartan
#

This code is a year old. Websites change.

#

Scraping is very fragile. I’ve worked with cboe to do this exact thing before (gex), but manually downloaded the data.

quaint spade
quaint spade
quaint spade
boreal gale
lapis sequoia
boreal gale
#

is folium/plotly lacking in any way for these?

lapis sequoia
#

Sort of like this

lapis sequoia
boreal gale
lapis sequoia
#

Like for example, plotly wasn't handling overlap of text very well. And I had to custom code a clustering logic which avoided the overlap

#

gpt did mention that

#

I mainly wish to make static maps btw

desert oar
#

no interactivity but relatively detailed control over output

#

i get map tiles using contextily

#

it's good enough for the static images in presentations and docs that i need

scenic parcel
small wedge
scenic parcel
#

gpt4 is highly reliable and most beginner programmers just want to do something simple that gpt 3.5 won't be hallucinating anything up for

#

Its instant responses chatting with an industry expert when the alternative is maybe getting a response every few hours from some people on discord/reddit or poring over documentation. Its how I got my start a few months ago and I found it invaluable

small wedge
#

It's absolutely not reliable as it can and will give you contradictory answers to the same logical question when worded differently. Not to mention hallucination is still a problem for gpt4 even if it's not as much of a problem as it was for 3.5. Again, not to mention you didn't recommend it for a simple task, you recommended it to someone who had a fairly complex task. Talking to gpt4 is absolutely not akin to talking to an industry expert.

iron basalt
left tartan
#

Even worse is when it gives a working answer that’s a bad practice

scenic parcel
# small wedge It's absolutely not reliable as it can and will give you contradictory answers t...

Is this not a helpful response? (3.5 btw)

Hi I'm a complete beginner to ml and need to train a model to automatically find coordinates in an image, can someone please point me to some resources and libraries that can help me accomplish this, thanks.

ChatGPT
Certainly! If you're a beginner in machine learning and want to train a model to automatically find coordinates in an image, you'll likely be working on an object detection task. Object detection involves identifying and locating objects in an image, which can be thought of as finding the coordinates of objects within the image. Here are some resources and libraries to get you started:

  1. Python: Most machine learning and computer vision tasks in the context of object detection are done in Python.

  2. Libraries/Frameworks:

TensorFlow Object Detection API: This is a popular framework for object detection. It provides pre-trained models and tools to train your own models. Here's the official GitHub repository.

PyTorch: PyTorch is another popular deep learning framework that can be used for object detection. You can find tutorials and pre-trained models in the PyTorch Hub.

OpenCV: OpenCV is a computer vision library that can be used for various tasks, including object detection. It has pre-trained models and tutorials for object detection. Check the OpenCV documentation.

YOLO (You Only Look Once): YOLO is a popular real-time object detection framework. You can find implementations and pre-trained models like YOLOv3 and YOLOv4 in various repositories, such as YOLO GitHub.

  3. Datasets: You'll need a dataset of images with labeled coordinates to train your model. Some popular object detection datasets include COCO (Common Objects in Context), Pascal VOC, and custom datasets you can create.

  4. Tutorials and Courses:

Coursera and Udacity offer machine learning and computer vision courses that cover object detection.

YouTube has numerous tutorials on object detection using different frameworks.

#

Blogs and tutorials on Medium and Towards Data Science often provide step-by-step guides for object detection tasks.

  5. Books: Books like "Deep Learning" by Goodfellow, Bengio, and Courville or "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien Géron can provide a solid foundation in machine learning and deep learning concepts.

  6. Forums and Communities: Websites like Stack Overflow and Reddit (e.g., r/MachineLearning) are great places to ask questions and seek guidance from the machine learning community.

  7. Online Coding Platforms: Platforms like Kaggle provide datasets, kernels (code notebooks), and competitions related to object detection. It's a great way to learn and practice.

Remember that object detection can be a complex task, especially for a beginner, but with dedication and practice, you can make progress. Start with the basics of machine learning and gradually delve into object detection techniques as you become more comfortable with the concepts and tools.

left tartan
#

Without even reading it; that content is no better than what the same google search would yield.

small wedge
#

Note I didn't say it can't provide helpful responses. I said it is unreliable

left tartan
#

That’s incredibly generic, overly verbose, and not particularly helpful advice to someone just starting.

#

In fact, I’d argue that is worse advice than googling and reading a few articles that explain -why- and put things in context

past meteor
shadow viper
#

good day everyone,
please does anyone know how i can edit a particular cell in powerBI?

left tartan
shadow viper
#

I'm making use of power bi and I have a column filled with null values and I want to edit one particular cell in the column to something else.
The replace function keeps replacing the whole column filled with null instead of the particular cell I want to replace.
How do I do this please?

left tartan
#

I don’t know powerbi, sorry

shadow viper
#

Alright

#

Thanks

scenic parcel
#

Does anybody else get this "UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged." its like unfixable

small wedge
#

!paste Could you show the code that creates the warning?

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

scenic parcel
#

Seems to be any time I import pandas, or if a library that I import uses pandas

#
import gdown

import zipfile
import os

def download_data_from_drive(zip_url, output_path):
    # Download the zip file from Google Drive
    gdown.download(zip_url, output_path, quiet=False)

    # Extract the zip file
    with zipfile.ZipFile(output_path, 'r') as zip_ref:
        zip_ref.extractall(os.path.dirname(output_path))
    os.remove(output_path)  # Remove the zip file after extraction

# Using the direct download link format provided by gdown's warning
url = r'https://drive.google.com/example_link'

output = str(DATA_DIR / 'Stock_data/dl_folder/example_data-2.zip')
download_data_from_drive(url, output)
#

Most recent time I've run into it, has happened a lot of other times. This is one of the smaller scripts that causes it

#

Stackoverflow said uninstall and reinstall pandas/numpy, have tried that. It happened with miniconda, tried using full anaconda instead, still happens. Uninstalled everything on conda and reinstalled everything with pip only, still happens

#

At this point I think it's a problem with Python 3.10.12 and have left it because it hasn't caused any noticeable effects, but the warning is annoying

steep shadow
#

Does anyone have any advice on how I can begin learning AI. I already know intermediate python and data structures.

frosty gale
#

Hey, I am having trouble installing OpenCV with CUDA. I am done with all the steps in cmake-gui, but when I try to build the files, it just throws an error:

MSBUILD : error MSB1009: Project file does not exist.
Switch: INSTALL.vcxproj
vale swallow
#

Hi, can someone pls help me. Can someone give me an example code of how to split my data. For example, I have a directory named “main_dir” and in this directory I have three directories, each for the three classes I have named “1”, “2”, “3” (with just images of each class). How can I split my data into train, val, test?
I’m seeing different ways using Tensorflow, Sklearn, and other ways so I’m confused on how I should do it.

frosty gale
vale swallow
frosty gale
# vale swallow Ok, thanks! Also, do you know how to implement data augmentation? I saw you can ...

hi yeah, you need to apply scaling or any kind of augmentation after splitting, to the training split rather than to the original dataset, because:
if you augment the original dataset, all data entries will be changed, and upon splitting into test and train sets, your test set will also be affected.
on the other hand, splitting the data first and then augmenting/scaling/changing only the train set preserves the original test set, giving a more trustworthy test result

#

am new to this too, so correct me if am wrong, anyone
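
A minimal sketch of the split-then-augment order described above, using only the standard library. The folder layout main_dir/1, main_dir/2, main_dir/3 comes from the question; the names `split_by_class` and `list_images`, the 70/15/15 ratios and the fixed seed are illustrative assumptions, not something recommended in the thread.

```python
import random
from pathlib import Path


def split_by_class(files_by_class, val_frac=0.15, test_frac=0.15, seed=0):
    """Split each class's file list into train/val/test (a stratified split).

    files_by_class maps a class label to a list of file paths.
    Returns three lists of (path, label) pairs.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, files in sorted(files_by_class.items()):
        files = sorted(files)      # deterministic order before shuffling
        rng.shuffle(files)
        n_test = round(len(files) * test_frac)
        n_val = round(len(files) * val_frac)
        test += [(f, label) for f in files[:n_test]]
        val += [(f, label) for f in files[n_test:n_test + n_val]]
        train += [(f, label) for f in files[n_test + n_val:]]
    return train, val, test


def list_images(main_dir):
    """Collect file paths from main_dir/<class_name>/ (hypothetical layout helper)."""
    return {d.name: [str(p) for p in d.iterdir() if p.is_file()]
            for d in Path(main_dir).iterdir() if d.is_dir()}


# Stand-in file names so the sketch runs without any images on disk.
demo = {c: [f"main_dir/{c}/img_{i}.png" for i in range(20)] for c in ("1", "2", "3")}
train, val, test = split_by_class(demo)
print(len(train), len(val), len(test))
```

Augmentation (flips, crops, scaling) would then be applied only to the `train` list, leaving `val` and `test` untouched. With real data, `split_by_class(list_images("main_dir"))` would replace the stand-in dictionary.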

small wedge
small wedge
scenic parcel
oak panther
#

what are the best algos to try for stock trading futures/indices?

small wedge
#

so if I made a requirements.txt it'd be like 100 lines long

#

if you just want my versions of these packages I could send that

scenic parcel
#

Also python version

small wedge
#
gdown==4.7.1
├── beautifulsoup4 [required: Any, installed: 4.10.0]
├── filelock [required: Any, installed: 3.12.4]
├── requests [required: Any, installed: 2.25.1]
├── six [required: Any, installed: 1.16.0]
└── tqdm [required: Any, installed: 4.66.1]
config==0.5.1

did any of these even use numpy

tona@albedo:~$ pip --version
pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
scenic parcel
#

zipfile ?

small wedge
#

isn't that a built-in module

scenic parcel
#

Yeah I think so nvm

#

So is that python 3.10.0

small wedge
#

mhm

lapis sequoia
quaint loom
#

I'm currently developing a module to detect small bubble events (ebullition), calculating the CH4 ebullition flux (eFCH4) by assuming a constant diffusion rate. To mitigate diffusion flux inhibition due to high CH4 concentration in a floating chamber, I select previous data within the observation period to calculate the diffusion flux, represented by the U-M line (prototype: https://paste.pythondiscord.com/K6RA). I employ the least squares method to fit the slope of the U-M line, obtaining the CH4 diffusion rate within the period. Additionally, I calculate the CH4 diffusion concentration at the observation's end (point E) based on the U-M line's slope. The change in CH4 ebullition concentration (Δc) results from subtracting the concentration at point E from point T during the observation period.

I want a module that can extract relevant time periods from raw data (an xls file) for analysis (e.g., 10:02:59 - 10:12:59, 11:23:59 - 11:26:59). This targeted approach eliminates the need to analyze the entire raw data range. Ebullition events occur when CH4 bubbles disrupt the linear increase in CH4 concentration.

While I've created a prototype for significant bubble events (https://paste.pythondiscord.com/K6RA), I'm seeking guidance on developing one for small bubbles. Additionally, I'm working on determining an appropriate threshold value (https://paste.pythondiscord.com/H7RQ). Any assistance or advice to enhance the module would be greatly appreciated.

quaint loom
night forge
#

Hi, I had a question with pytorch. Below is my model

#
from torch import nn
# create a two layer FCNN, avoid ValueError: optimizer got an empty parameter list
class img2latent(nn.Module):
    def __int__(self):
        super(img2latent,self).__init__()
        self.neuralDim=len(X_train[0])
        self.latentDim=len(Y_train[0])
        self.hiddenDim=self.neuralDim
        self.fc1=nn.Linear(self.neuralDim,self.hiddenDim)
        self.fc2=nn.Linear(self.hiddenDim,self.latentDim)
        # INITIALISE THE WEIGHTS, FC1 WITH ONES, FC2 WITH PARAMETERS OF RIDGE
        self.fc1.weight.data.fill_(1)
        self.fc1.bias.data.fill_(0)
        self.fc2.weight.data=ridge.coef_
        self.fc2.bias.data=ridge.intercept_
    
    def forward(self,x):
        x=self.fc1(x)
        # add reLU
        x=torch.relu(x)        
        x=self.fc2(x)
        return x
  
    

def train_loop(model,loss_fn, optimizer):
    model.train()
    # do full batch gradient descent
    pred=model(X_train)
    loss=loss_fn(pred,Y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
    
    
fullmodel=img2latent()
# SEND TO GPU
fullmodel=fullmodel.to(device)
# take mse loss + l2 regularisation on the weights of the second layer
loss=nn.MSELoss()
optimizer=torch.optim.Adam(fullmodel.parameters(),lr=0.01,weight_decay=0.01)
for t in range(1000):
    loss=train_loop(fullmodel,loss,optimizer)
    if t%100==0:
        print(t,f"{loss:0.2f}",end='\t')
#

However I get

ValueError: optimizer got an empty parameter list

How do I solve this issue
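
One likely cause, offered as a reading of the code above rather than a confirmed diagnosis: the constructor is spelled `__int__` instead of `__init__`, so it never runs, no `nn.Linear` layers get registered, and `fullmodel.parameters()` comes back empty. A second issue would show up right after that: `loss=train_loop(fullmodel,loss,optimizer)` overwrites the loss function with a float on the first iteration. The sketch below fixes both. The class name `Img2Latent`, the explicit dimension arguments and the random placeholder tensors are assumptions made to keep it self-contained, and the ridge-based initialisation is left out (the NumPy `ridge.coef_` array would need converting to a tensor first).

```python
import torch
from torch import nn


class Img2Latent(nn.Module):
    def __init__(self, neural_dim, latent_dim):   # note: __init__, not __int__
        super().__init__()
        self.fc1 = nn.Linear(neural_dim, neural_dim)
        self.fc2 = nn.Linear(neural_dim, latent_dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


torch.manual_seed(0)
X_train = torch.randn(32, 8)   # placeholder data, not the original dataset
Y_train = torch.randn(32, 3)

model = Img2Latent(neural_dim=8, latent_dim=3)
loss_fn = nn.MSELoss()         # kept under its own name so it is never overwritten
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)

losses = []
for t in range(200):
    pred = model(X_train)      # full-batch gradient descent, as in the question
    loss = loss_fn(pred, Y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(len(list(model.parameters())), f"{losses[0]:.3f} -> {losses[-1]:.3f}")
```

A quick check for this class of bug is `len(list(model.parameters()))` right after constructing the model: zero means no layers were registered.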

small wedge
desert oar
# quaint loom I'm currently developing a module to detect small bubble events (Ebullition), ca...

when moving into predicting events that are hard to distinguish from normal variation, you usually end up having to trade off between probability of true detection and probability of false detection. but ultimately just about every statistical technique revolves around doing something very similar to what you are already doing: proposing a baseline model, and then looking for deviations from that baseline model. where it gets more complicated is when you want to analyze the situation probabilistically, but that's often necessary in cases where it's not straightforward to distinguish the baseline and deviated scenarios. a probability model gives you a principled framework for distinguishing the two, and allows you to make trade-offs in terms of the probabilities of false positives, true positives, etc.

#

One thing that did not really occur to me when I was helping you previously is that, because you assume a constant rate of increase in the baseline scenario, when you remove the trend by taking first differences, you should get a flat line

#

That is, you can transform your data in terms of deviations from the expected trend line

#

Then instead of modeling the slope directly, you can just look for unexpectedly large positive deviations from trend

#

This is convenient because you can think in terms of level values rather than rates, which i think makes it all a little bit easier

#

It also simplifies the problem I think, because it reduces now to figuring out what is the normal baseline distribution of CH4 increases at any time step

#

In your case, it seems like maybe the variation in the data is constant over time? If that's true, then you can get pretty far using standard statistical hypothesis testing

quaint loom
# desert oar when moving into predicting events that are hard to distinguish from normal vari...

Thanks for explaining that, it makes a lot more sense now! I agree that focusing on deviations from the expected trend seems like a crucial first step in developing the module. It's true that many published reports on methane ebullition events are based on predictions rather than actual observations, which can introduce uncertainty.

Understanding the composition of bubbles and how other substances inside them change over time can significantly reduce uncertainties and improve predictions. There is ongoing research into the content of these bubbles. It seems like we're working towards making our predictions about methane bubbles more accurate and reliable, moving away from just educated guesses. I also want to mention that I think this is the first step for me to develop the module, so the module itself will have to be improved of course.

desert oar
#

it gets fuzzier and harder to distinguish events from non-events when you need to estimate the baseline distribution directly from the data without a theoretical model

quaint loom
quaint loom
# desert oar my understanding was that the expectation of a linear trend was derived from som...

To be honest, not always, but in some cases, the expectation of a linear trend is based on theoretical knowledge of the underlying chemical processes. This theoretical foundation provides us with a starting point for making predictions. However, remember that while theory guides us, real-world data can sometimes behave differently due to various factors. So, while we start with a theoretical basis, we also need to be prepared to adapt our models when necessary to account for deviations from the expected linear trend.

desert oar
#

remind me again: are you able to analyze the whole time series at once? or do you need to be able to detect events when they occur, using only past data?

quaint loom
quaint loom
desert oar
quaint loom
desert oar
#

okay, so if you are looking for methods that only use past data without seeing the full sequence, your keyword is "online" (although it's not very useful on its own given its other meanings)

#

i think last time you asked about this, changepoint detection was brought up, and ultimately i think that does describe what you are trying to do

quaint loom
# desert oar right, but is this something that is going to be running continuously and sendin...

Well, this is where the time-consuming part comes in. The instrument logs data every second. So, in the field where I use the instrument for 10 minutes at each site, I have to physically analyze the data afterward in Excel, empty the distributed dataset before placing it in the water, etc. And this is just raw data; there are a few other modules that have to be used afterward to obtain the actual Flux data.

desert oar
quaint loom
desert oar
#

makes sense

quaint loom
past meteor
#

If you can mathematically define what you want to do, finding a method that does it comes out the other end sometimes 😄

quaint loom
past meteor
#

Do they change the trajectory of your overall curve?

quaint loom
past meteor
#

I'd say the size of the bubble doesn't determine if it's an anomaly or not, you can choose what an anomaly is as the practitioner

#

If the bubble moves your line permanently it's a changepoint, if it doesn't I'd say it's an anomaly

quaint loom
past meteor
#

When you say temporary, does it mean it shifts back?

quaint loom
past meteor
#

Is the entire "lifespan" of your data impacted by an "event"?

#

Or just a "zone" around the "event"?

quaint loom
past meteor
#

Okay I'd say they're anomalies then. How fast do you need to spot them? You can be vague about this like "very fast", "medium" etc

#

And do you know the expected pattern before starting your process?

#

Additionally, do you have series that don't have any "bubble events"?

quaint loom
# past meteor Is the entire "lifespan" of your data impacted by an "event"?

It all depends on the length of time I measure. Often I sample for 10-15 minutes, so when I see a significant increase while analysing the data, it will be for the rest of the "lifespan" of the observed time window I look at. But when I look at the real-time data out in the field, it will of course drop down as the system flushes

past meteor
#

I don't know your domain so it's in both our best interest if you abstract away some of the details 🤣

quaint loom
past meteor
#

So not in real-time? After the fact?

quaint loom
past meteor
#

I'd look at the variance of N points that don't have any "bubble events" and then take "N" points that contain a bubble event in the beginning and the end

#

There should be some sort of difference in variance
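
A rough sketch of the variance comparison suggested here, using only the standard library. Working on first differences (so the steady trend drops out) is an assumption added on top of the suggestion, and the readings are made up for illustration.

```python
from statistics import pvariance


def first_differences(values):
    """Change between consecutive samples; a steady linear rise becomes a flat line."""
    return [b - a for a, b in zip(values, values[1:])]


# Made-up 1 Hz concentration readings: steady rise of about 0.5 per second.
quiet = [10.0, 10.5, 11.02, 11.5, 12.01, 12.5, 13.0, 13.49, 14.0, 14.5]
# Same kind of window, but with a sudden jump (a hypothetical bubble) halfway.
bubbly = [10.0, 10.5, 11.02, 11.5, 12.01, 17.5, 18.0, 18.49, 19.0, 19.5]

quiet_var = pvariance(first_differences(quiet))
bubbly_var = pvariance(first_differences(bubbly))
print(f"quiet: {quiet_var:.4f}  with bubble: {bubbly_var:.4f}")
```

How large the gap needs to be before a window counts as containing an event is exactly the threshold question raised in the thread, so that part is left open.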

quaint loom
# past meteor So not in real-time? After the fact?

Well, when looking at the data while I am in the field, the graph itself drops if I continuously keep measuring. But once I have decided that this time span is the data I will use, you may not always see its decrease unless it is a small bubble event.

#

Well, there will always be a constant change, as the measurement is taken every second. That is also why I need to have a threshold value.

lofty star
#

Data

desert oar
#

that is, they are assuming there is some steady state constant rate of increase, and these bubbles lead to large "steps", effectively positive y-intercept shifts

past meteor
#

I agree with all you said hence why I think we're going in circles 😄

desert oar
#

yeah that's why I was trying to get out whether this was online or not

#

The best I can gather is that this is not really an online problem, which allows you to produce a decent estimate of baseline variation around trend, so you can retroactively look and find large deviations that might be bubbles

#

basically what I'm proposing is a shortcut to find change points, using the specific assumptions of the problem, rather than a fully general algorithm

past meteor
#

No, what you're suggesting is enough

desert oar
#

however I suspect that most of the off-line change point detection algorithms that work by recursively partitioning the time series would also work very well to detect large mean shifts

#

where things get tricky is detecting smaller mean shifts, and that's where I got hung up before I had to go do something else

past meteor
#

There's a few cases where it will fail that I can foresee but they should start here and solving those will be easy

desert oar
#

if you just look at average deviation from trend, e.g. estimating sample standard deviation of first differences, that standard deviation estimate will include all of the shifts

#

and this is where I really regret dropping that nonparametric statistics class in grad school

#

because my hunch is that some kind of robust estimation would be appropriate here

#

essentially you have an extra distribution of mean shifts: baseline shifts, and shifts caused by bubbles

past meteor
#

Something simpler can work, I'd only reach for those if the diff-in-variance method fails

desert oar
#

so either you do something nonparametric to try to eliminate the bubbles from the baseline estimate, or you do something like a bayesian mixture model where you are being really meticulous about accounting for all sources of variation, but that might be harder to design

#

True, maybe it just works

#

@quaint loom if you have the opportunity to sit there and observe bubbles as they come up, I think that would help your analysis substantially

past meteor
#

Most summary statistics have robust counterparts incl. variance if that's an issue

desert oar
#

Literally just mark the time that a bubble occurs, then all of a sudden you have labeled data points and you can be much more confident about model/technique selection

desert oar
past meteor
#

You probably also use Huber loss?

quaint loom
quaint loom
desert oar
# quaint loom Yea, that is the simple way out.

It's not the simple way out, it's the correct solution. This is the "shoe leather" part of "statistics and shoe leather" that goes back to the earliest days of statistical analysis. Actually collecting good useful data almost always involves toil and manual effort

quaint loom
#

As the chamber is covering the water surface, you cannot always detect by eye that a bubble is appearing inside the chamber.

desert oar
#

I see, you said before that they were observable and I didn't realize that was limited

quaint loom
#

I would still like to develop this module. I just have to figure out what the best way is! I also missed so much in school about this...

lapis sequoia
#

anyone can help me in openpyxl?

desert oar
quaint loom
quaint loom
abstract wasp
#

I was going to run my model but I got this error:
(DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [5923]
[[{{node Placeholder/_4}}]]
2023-09-23 07:34:00.087774: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_4' with dtype int32 and shape [5923]
[[{{node Placeholder/_4}}]]
Epoch 1/60
Assertion failed: (f == nullptr || dynamic_cast<To>(f) != nullptr), function down_cast, file ./tensorflow/tsl/platform/default/casts.h, line 58.
Assertion failed: (f == nullptr || dynamic_cast<To>(f) != nullptr), function down_cast, file ./tensorflow/tsl/platform/default/casts.h, line 58.

desert oar
# quaint loom If you have some suggestions, please list them out to me when you have some time...

Well, we were talking about looking for outliers in the distribution of differences, right? So I would start by computing the sample standard deviation, or maybe the median absolute deviation, of the first differences, and then messing around setting thresholds on that quantity. For example, if a deviation is more than two median absolute deviations away from the median, it might be an outlier, i.e. a bubble event

#

The median absolute deviation should be more robust to the effects of including outliers in the estimation process, compared to the standard deviation
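
A minimal sketch of that thresholding idea, using only the standard library. The function name `flag_jumps`, the choice to flag only positive deviations, the default `k=2` and the synthetic series are all assumptions for illustration; a real threshold would need tuning against marked bubble events.

```python
from statistics import median


def flag_jumps(values, k=2.0):
    """Return sample indices where the step from the previous sample is unusually large.

    A step is flagged when it exceeds the median step by more than k times the
    median absolute deviation (MAD) of all steps.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    med = median(diffs)
    mad = median(abs(d - med) for d in diffs)
    if mad == 0:
        return []  # no spread to compare against, so a MAD threshold is meaningless
    return [i + 1 for i, d in enumerate(diffs) if d - med > k * mad]


# Synthetic series: slope 0.5 per sample, small repeating noise, +5 step at index 30.
noise = [0.02, -0.01, 0.0, 0.01, -0.02]
series = [0.5 * i + noise[i % 5] + (5.0 if i >= 30 else 0.0) for i in range(60)]
flagged = flag_jumps(series)
print(flagged)
```

Because the median and the MAD barely move when a handful of large steps are present, the baseline estimate is not inflated by the very events being searched for, which is the weakness of the plain standard deviation mentioned earlier.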

#

I would probably start by doing this on each time series individually, preferably one where you were able to actually observe and mark known bubbles

#

and by the way you might have a somewhat more enjoyable data analysis experience if you work with pandas or numpy. the trade-off is that they are fairly large libraries and might take a while to get comfortable with, but maybe you can set that as an intermediate goal before moving onto more sophisticated algorithms or scaling up to some kind of automated process

quaint loom
neon field
#

ANYONE WHO HAS DONE SIGN LANGUAGE DETECTION USING MACHINE LEARNING??!!! ITS URGEENTTT

verbal venture
#

can someone explain to me why np.axis=0 means "applied to each column individually, but along the rows"

echo mesa
#

Hi guys, this might be a dumb question, but do I have to know numpy in order to understand and use pytorch? I'm learning math right now for machine learning and AI, and I want to improve in Python as well and eventually use either pytorch or tensorflow to implement my math knowledge and build models. I understand that I need math, and I love learning math, so I'm wondering in what order I should do them: should I start with getting familiar with numpy and then move on to pytorch and tensorflow, or first get really good at math and then get into numpy, pytorch and tensorflow?
I hope this makes sense. I know that this is a dumb question, but I'm a beginner in this field.

scenic parcel
#

pip makes me want to kill a dolphin

narrow spear
#

is there anyone available? I'd like to ask about some easy things 🙂

cursive narwhal
#

I can but i'm also a beginner

narrow spear
#

in the first photo I can open the camera easily; in the second photo I would like to open the webcam again. I'm writing 0 as the default. what can I do?

#

when I write 0 at the prompt it works, but when I try to write it in code it doesn't work

#

where did I make a mistake?

cursive narwhal
#

oh

#

i'm noob)

narrow spear
#

hah okey 🙂

sterile barn
#

Hello. I am doing some preprocessing and have just 2 null values in a single column I want to look at. How can I look at just the two rows of data that have them? I have the typical suite of libraries installed: pandas, numpy, matplotlib and seaborn.

agile cobalt
#

something like df.loc[df.isna().any(axis=...)] might work? maybe also use subset= on isna
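
A small sketch of that idea on a made-up frame. As far as I know `isna()` itself takes no `subset=` argument, so to look at a single column it is simpler to select the column first. The column names here are placeholders.

```python
import pandas as pd

# Made-up data: column "b" has exactly two missing values.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10.0, None, 30.0, None, 50.0],
})

# Rows where the one column of interest is null.
rows_b_null = df[df["b"].isna()]

# Rows with a null in any column (axis=1 checks across each row).
rows_any_null = df[df.isna().any(axis=1)]

print(rows_b_null)
```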

sterile barn
#

Yeah, that was just about the first thing that popped up after I rephrased my own question. This always happens 🙃

#

Thank you so much though 😅

left tartan
left tartan
untold ginkgo
#

does anyone do tracking of certain products on the web?

left tartan
verbal venture
left tartan
#

That link explains it better than I would, could you check that first?

#

I understand your confusion, it’s just one of those things that’s how it works

tidal bough
#

the way I think of it: for reduction operations, axis=0 means "get rid of axis 0 by reducing along it".

#

so you have an (n,m) array, you do something like .sum(axis=0), you get an (m,) array.

desert oar
#

oh hah thats what reptile just said

#

so it's not that axis=0 means "operate columnwise", it means "operate everything-other-than-row-wise"

#

consider an array of RGB images, shape (m,n,3): m images, each flattened to n pixels with 3 color channels. let's say you want to find the average value of each color across all pixels across all images. that would be np.mean(images, axis=(0,1)), which leaves an array of shape (3,)
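
The "the axis you name is the one that disappears" reading can be checked directly. A small sketch with made-up arrays; the image batch is a stand-in with constant channel values so the result is easy to verify.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # shape (n, m) = (2, 3)
col_sums = a.sum(axis=0)            # axis 0 is reduced away -> shape (3,)
row_sums = a.sum(axis=1)            # axis 1 is reduced away -> shape (2,)

# Hypothetical batch: 4 images, 5 pixels each, 3 colour channels.
images = np.ones((4, 5, 3)) * np.array([0.2, 0.5, 0.8])
channel_means = np.mean(images, axis=(0, 1))   # both leading axes reduced -> shape (3,)

print(col_sums.shape, row_sums.shape, channel_means.shape)
```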

left tartan
#

What do you expect x_train[1] to do?

golden ridge
#

anyone know a good machine learning algorithm for a tyre degradation prediction model? the idea is you input the compound and the tyre lap and it should give the expected time

desert oar
#

you could imagine that maybe tires do not wear consistently. maybe they exhibit a lot of wear in the first couple of laps, then wear rate flattens, and then the tire deteriorates rapidly at the end of its life. or maybe something completely different. but i would always advocate for the simpler model first

#

assuming you are actually interested in making predictions about F1 race outcomes, you have the problem where you are not physically observing and measuring the condition of the tire, so any proposed model is more like a guess or theory and there's no real principled way to fit that to any data, because you have no data

#

so in that case you would almost definitely want to go with the linear model, always go with the simpler model in the absence of other information
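
For reference, the simple linear baseline can be fitted by hand in a few lines. This is only a sketch: the lap times are invented, a single compound is assumed, and `fit_line` is a hypothetical helper rather than anything from the thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up numbers for one compound: lap time (s) against laps on the tyre.
tyre_age = [1, 2, 3, 4, 5]
lap_time = [90.0, 90.1, 90.2, 90.3, 90.4]

slope, intercept = fit_line(tyre_age, lap_time)
predicted = intercept + slope * 10   # extrapolated lap time on a 10-lap-old tyre
print(f"{slope:.3f} s/lap, predicted {predicted:.2f} s")
```

Here the slope is the per-lap degradation. A separate fit per compound is one simple way to bring in the compound input before reaching for anything more elaborate.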

prisma hinge
#

Hello, I am currently trying to run a pretrained model that classifies the mnist number data set both from huggingface. I am having issues with the dimensions and format of the images. I have attached my code below along with the error raised and would appreciate any help regarding this. Thanks in advance.

prime galleon
#

Hello, I have dataset which have 1400 rows and 1800 rows. I am trying to recognize letters. My model can currently recognize every letter but A, B, D and H. I use randomforest algorithm. Do I need more data or is there some other way to solve this problem. At training it has accuracy of 97% but when trying in practice it doesn't recognize those above mentioned letters

golden ridge
plucky breach
#

Guys, why doesn't my spaCy code return the correct similarity? These sentences are even similar

import spacy

nlp = spacy.load('en_core_web_lg')

sentence1 = nlp("subjective test 3 _ Test paper (Biology) __ PDF ONLY __ (Neev 2024)")
sentence2 = nlp("biology test")
print(sentence1.similarity(sentence2))
0.1846538... 
oblique quarry
#

I've been implementing LDA, following a guide, making some tweaks occasionally. But I can't stop asking myself why I have to use the within-class scatter matrix. When I look at the formula I'm really tempted to just go for the covariance matrix... it would be so much easier on the computer in terms of computations.

midnight harbor
#

Hello Everyone👋🏼

I am thrilled to share that I have participated in a Datacamp competition to show my analytical and machine-learning skills. Just like as you've supported me in past competitions, I am reaching out to you again.🙌🏼

Your support means the world to me. To increase my chance of winning, I kindly ask for a moment of your time to visit my DataCamp workspace and upvote it from the link 👇🏼

https://app.datacamp.com/workspace/w/83209d5b-2341-46d3-88c3-113ebb8d587b

Your upvote could make all the difference. Your encouragement and support have always been a driving force and I am immensely grateful for it. ☺️

Thanks for taking the time to upvote my work ♥️

By
Umar and Faizan

It's that time of the year when summer has come, and it brings a feeling of happiness and liveliness, especially if you're in the Northern Hemisphere…

silk sun
#

Can anyone help me in making a Sign Language Recognition

golden ridge
#

hey guys, which algorithm do you think suits this model best? it is a model which predicts the lap time at a specific circuit due to tyre degradation, with inputs: compound, laps on the tyre, laps in race.

I have used linear regression but now want to try something else

past meteor
#

so for instance '50 hours'?

golden ridge
earnest wren
past meteor
#

This is likely a case where linear regression is not great 🙂

#

Probably look at a gamma regression

past meteor
golden ridge
past meteor
#

Can you plot the distribution of your target variable

golden ridge
#

yess

#

but it's bugged

#

i cant send it

#

I'll dm you

past meteor
#

Don't DM me please

#

Could you make it a histogram, something like this?

golden ridge
#

i dont rlly know how to use matplotlib, could you tell me how to do it

past meteor
golden ridge
#

but do you know why my plot is bugged??

past meteor
#

The link has clear examples on how to, you'd do something like: sns.displot(data=penguins, x="flipper_length_mm", kind="kde")

golden ridge
#

this??

past meteor
#

Yes

past meteor
golden ridge
past meteor
#

it's just an example, you'd put your own data there

golden ridge
#

but what does penguin stand for

#

like what does it do

past meteor
#

Your data frame and x is your column

golden ridge
past meteor
#

I think you need to read through the documentation (both links I sent you)

golden ridge
#

and what would be the hue

past meteor
#

I won't tell you and that's in your best interest 🙂 Learning to read documentation is probably top 3 most important things in programming.

golden ridge
#

the thing is i dnt rlly understand what histogram are you askin for

golden ridge
past meteor
#

To find out you make a histogram or kde plot

golden ridge
past meteor
golden ridge
#

nono, im saying whats the graph u want me to plot

golden ridge
#

so y is laptime

#

and x is...

past meteor
#

I'm not going to answer that 😄 Time for you to do the work.

golden ridge
#

this is not about an implementation, it's about something you want me to do

#

i don't know why a gamma regressor would be better

past meteor
#

Read the stuff I linked and you'll know what X is. I won't always be here

golden ridge
#

flipper length mm??

golden ridge
past meteor
#

I'll leave you on your own for now with your homework, good luck! (I'm not trying to be annoying, it's for your best interest)

shut wren
#

hello can anyone help me with a bug with my project

#

its a pre-trained image classifier to identify dog breeds

#

i just hv one small problem i can't fix

#

if interested dm asap please i really need to complete this project

nimble hawk
#

Hello everyone,
I share data science tutorials regularly every week on YouTube and I wanted to share the playlists I've created. If you are learning about data analysis, data science and machine learning, I have plenty of videos that can help you on this journey.

Data science projects playlist ->https://youtube.com/playlist?list=PLTsu3dft3CWg69zbIVUQtFSRx_UV80OOg&si=cLljLTBYA9c48Bys
This playlist contains my end-to-end data science projects which I provide with the datasets i use.

I share courses on my channel too, the PySpark course that I share on my channel -> https://www.youtube.com/watch?v=jWZ9K1agm5Y&t=2628s

My channel for more -> https://www.youtube.com/@onurbltc

Thanks for reading, have a great day!

left tartan
shut wren
#

oh alright

shut wren
#

i dont actually have a link for it

#

i was looking forward to getting on like a voice call to share my screen or smt

#

its from a course online thats why

verbal oar
#

can someone recommend course/book for deep learning for computer vision?
I'm interested mainly in CNN

prisma hinge
median fulcrum
#

What would you guys do to recognize the position in an online chess board image and return a FEN? Would you use trained models or coordinates on the board?

desert oar
desert oar
#

is it an image of a physical chessboard, or a computer generated 2d image or something else?

desert oar
#

but the error is that collections.Iterable was removed in recent versions of python, you need collections.abc.Iterable. it's essential to practice understanding and working through error messages on your own, it is a critical skill
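
For anyone hitting the same error: the alias `collections.Iterable` was removed in Python 3.10, and the abstract base classes live in `collections.abc`. A minimal sketch of the replacement import; the `flatten` helper is just a made-up example of code that typically needs it.

```python
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10


def flatten(items):
    """Yield leaves from arbitrarily nested iterables, treating strings as leaves."""
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item


flat = list(flatten([1, [2, (3, 4)], "ab"]))
print(flat)
```

If the failing import sits inside a third-party package rather than your own code, upgrading that package is usually the cleaner fix than patching it locally.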

median fulcrum
#

Would you have an idea?

#

I've already done image recognition

#

individually

#

how could I do in the chess board

#

with many pieces

#

and

#

I have to identify the squares that each piece are

#

already done

left tartan
#

Or a screenshot/whatever of a chess board where the positions are fixed?

median fulcrum
#

online chess board

left tartan
# median fulcrum screenshot

So can you decompose the problem to square level recognition? Instead of recognizing the board, just recognizing each square?
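
If the problem is decomposed that way, the last step (turning 64 per-square labels into a FEN) is plain string building. A sketch under assumptions: the per-square classifier is not shown, `board_to_fen` is a hypothetical helper, and only the piece-placement field of the FEN is produced (side to move, castling rights and so on cannot be read from a single image anyway).

```python
def board_to_fen(board):
    """Build the piece-placement field of a FEN string.

    board is 8 rows (rank 8 first) of 8 entries: a piece letter
    ("P", "n", ...) or None for an empty square.
    """
    ranks = []
    for row in board:
        out, empty = "", 0
        for square in row:
            if square is None:
                empty += 1          # count consecutive empty squares
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += square
        if empty:
            out += str(empty)
        ranks.append(out)
    return "/".join(ranks)


# Stand-in for per-square classifier output: the standard starting position.
start = (
    [list("rnbqkbnr"), list("pppppppp")]
    + [[None] * 8 for _ in range(4)]
    + [list("PPPPPPPP"), list("RNBQKBNR")]
)
fen = board_to_fen(start)
print(fen)
```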

median fulcrum
#

But I want opinions on which is the best way to do it

#

for example, find an online chess board in any image and try to use it

#

not just this perfect cropped screenshot

sharp sierra
#
# Fitness algorithm
def get_fitness(area):
    return (1000 * ((GRAY_AREA_TOTAL - area) ** 4)) / (GRAY_AREA_TOTAL ** 4)

# Generate segments
def generate_segments(neural_network):
    segments = []
    total_length = 0
    
    # Use the neural network to produce a list of segment endpoints
    output = neural_network.activate([1])
    
    # Interpret the segment endpoints as pairs of (x, y) coordinates
    for i in range(0, len(output), 4):
        x_start, y_start, theta, length = output[i:i+4]
        x_start = (x_start + 1) * X_SCALING_FACTOR + BORDER[0]
        y_start = (y_start + 1) * Y_SCALING_FACTOR + BORDER[2]
        length += 1
        total_length += length
        if total_length >= MAX_LENGTH:
            total_length -= length
            length = MAX_LENGTH - total_length
            x_end = x_start + length * math.cos(theta * math.pi)
            y_end = y_start + length * math.sin(theta * math.pi)

            segments.append(((x_start, y_start), (x_end, y_end)))
            return segments
        x_end = x_start + length * math.cos(theta * math.pi)
        y_end = y_start + length * math.sin(theta * math.pi)

        segments.append(((x_start, y_start), (x_end, y_end)))
    return segments
#
def evaluate_genome(genomes, config):
    nets = []
    sets = []

    # Create a neural network from each genome
    for _genome_id, g in genomes:
        net = neat.nn.FeedForwardNetwork.create(g, config)
        nets.append(net)
        g.fitness = 0

    # Get the segments NEAT generates (avoid shadowing the built-in `set`)
    for net in nets:
        segment_set = generate_segments(net)
        sets.append(segment_set)

    # Implement fitness
    for i, segments in enumerate(sets):
        # Calculate the remaining areas
        area_original = calculate_valid_area(segments)
        area_flipped = calculate_valid_area(flip(segments))
        area_total = area_original + area_flipped

        # Get the fitness of the segments
        fitness = get_fitness(area_total)

        genomes[i][1].fitness = fitness

def run_neat(config, gen_count):
    # Create the NEAT population
    population = neat.Population(config)
    
    # Add a reporter to monitor progress (optional)
    reporter = neat.StdOutReporter(True)
    population.add_reporter(reporter)
    stats = neat.StatisticsReporter()
    population.add_reporter(stats)
    
    # Run the NEAT algorithm
    winner = population.run(evaluate_genome, gen_count)  # Specify the number of generations

    # Retrieve the best genome (neural network)
    best_genome = winner

    return best_genome

###
# RUN
###

# Set configuration file
config_path = "./config-feedforward.txt"
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation, config_path)

run_neat(config, 10)
#

I want to get the output generated from generate_segments() that performed the best once the simulation is finished. How can i do this?

(I am using NEAT-python)
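
One possible approach, sketched under assumptions: record the best-scoring segment list while fitness is being evaluated. `BestResult` is a hypothetical helper, not part of NEAT-python, and the loop at the bottom uses made-up stand-in data so the snippet runs without NEAT installed.

```python
class BestResult:
    """Remember the highest-fitness candidate seen so far (hypothetical helper)."""

    def __init__(self):
        self.fitness = float("-inf")
        self.segments = None

    def update(self, fitness, segments):
        # Keep a copy so later mutation of the caller's list cannot change it.
        if fitness > self.fitness:
            self.fitness = fitness
            self.segments = list(segments)


best = BestResult()

# Inside evaluate_genome, right after `fitness = get_fitness(area_total)`:
#     best.update(fitness, segments)
# Once run_neat(...) returns, best.segments holds the best-scoring output.

# Stand-in data to show the behaviour:
for fitness, segments in [
    (3.0, [((0, 0), (1, 1))]),
    (7.5, [((0, 1), (2, 2))]),
    (5.0, []),
]:
    best.update(fitness, segments)

print(best.fitness, best.segments)
```

Alternatively, since `run_neat` already returns the winning genome, rebuilding the network with the same `neat.nn.FeedForwardNetwork.create(winner, config)` call used in `evaluate_genome` and passing it to `generate_segments` should reproduce the winner's output, given that the network is always activated with the same constant input.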

sharp sierra
#

^full script

prisma hinge
past meteor
lapis sequoia
sharp sierra
#

the first 2 plots show the fitness. it's the area in grey that isn't covered with either red or blue. The NN is attempting to draw lines, whose combined length has a strict limit, that meet the conditions

#

m and b represent the values in y = mx + b for which, in red or blue, the line goes through a segment in the set, or in gray, the line goes through the target area

#

In discrete geometry, an opaque set is a system of curves or other set in the plane that blocks all lines of sight across a polygon, circle, or other shape. Opaque sets have also been called barriers, beam detectors, opaque covers, or (in cases where they have the form of a forest of line segments or other curves) opaque forests. Opaque sets wer...

#

im super new to working with neural networks tho so im iffy on whether it's doing anything other than random selection at this point lmaooo

rose agate
#

does anyone know how to work with spatial autocorrelations? I have data spaced evenly every 10 meters measuring the change in a condition, and want to know how each point relates to its surrounding points. Most of what I find when searching for partial autocorrelations is about time series, which seems a little different, so I'm not sure how to start

#

e.g. a partial autocorrelation for a time-series
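
For evenly spaced samples along a line, one starting point is to treat distance the way a time series treats time, so that lag k corresponds to k × 10 m. The sketch below computes the plain sample autocorrelation in pure Python; the readings are made up and the function name is illustrative.

```python
from statistics import fmean


def autocorrelation(values, lag):
    """Sample autocorrelation of an evenly spaced series at `lag` samples."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mean = fmean(values)
    deviations = [v - mean for v in values]
    denom = sum(d * d for d in deviations)
    if denom == 0:
        raise ValueError("series is constant, autocorrelation is undefined")
    numer = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
    return numer / denom


SPACING_M = 10  # sample spacing from the question
readings = [1.0, 1.2, 1.1, 1.5, 1.9, 2.0, 1.8, 1.4, 1.1, 0.9, 1.0, 1.3]  # made-up

for lag in range(4):
    print(f"{lag * SPACING_M:>3} m: {autocorrelation(readings, lag):+.3f}")
```

The partial autocorrelation tools written for time series (for example `pacf` in `statsmodels.tsa.stattools`) should apply to a 1D transect in the same way. Data on a 2D grid would call for a genuinely spatial statistic such as Moran's I instead.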

lapis sequoia
wheat fox
#

Anyone familiar with gephi

versed gulch
#

Hi I have a 3D array which is a 3D image composed of 2D slices, is there a way I can rotate my 3D array in the x-z plane and the y-z plane?

serene scaffold
#

there's also flip, if that one isn't right.
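
A small sketch of 90-degree rotations and flips with NumPy, assuming the array is laid out as `volume[z, y, x]` (a stack of 2D slices along axis 0). If the axes are ordered differently, the `axes` tuples need adjusting.

```python
import numpy as np

# Assumed layout: volume[z, y, x], 2 slices of 3 rows by 4 columns.
volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Rotate 90 degrees in the x-z plane (axes 0 and 2); y is left alone.
rot_xz = np.rot90(volume, k=1, axes=(0, 2))

# Rotate 90 degrees in the y-z plane (axes 0 and 1); x is left alone.
rot_yz = np.rot90(volume, k=1, axes=(0, 1))

# Mirror instead of rotating, here by reversing the slice order.
flipped_z = np.flip(volume, axis=0)

print(rot_xz.shape, rot_yz.shape, flipped_z.shape)
```

`np.rot90` only handles multiples of 90 degrees. For arbitrary angles, `scipy.ndimage.rotate` takes a similar `axes` argument but interpolates, so the voxel values change.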

ember pawn
#

hi , i wanted to ask about maths part of ai

#

is there any reference book ?

serene scaffold
ember pawn
#

thanks

unborn saddle
#

Hey anyone interested in developing AI
Can join me I'm going to develop AI

We can build and learn together 😀

serene scaffold
#

lol @past meteor I linked that because you link it

past meteor
#

Hahaha so I'm just giving myself a pat on the back 🤣

serene scaffold
#

you deserve it 👍

left tartan
#

So pat yourself on the back with both hands 🙂

serene scaffold
serene scaffold
sharp sierra
tacit basin
#

do you have any recs for libs / tools for data anonymization?

tacit basin
#

also anyone could recommend books / resources for learning AI with C++? (asking for a friend 🙂 )

lapis sequoia
steep shadow
#

does anyone have any ideas about how i can begin learning ai in python (neural networks and like maybe image classification). I already have a good basis in python. I also don't want to spend money on resources and am looking for good free resources.

left tartan
lapis sequoia
median fulcrum
coral bridge
#

guys i need advice on how to segment English words by rules

golden canyon
#

hey guys, how do you prepare for the programming part of the interview? Leetcode?

#

I am applying for a graduate position

left tartan
stark bay
#

Hi... i want to ask a question of AI and DS engineers... what is your view on how AI art is created and whether it violates artists' copyright... some artists go as far as to claim that bots trained on such models use stolen images and art taken from artists without their consent, hence not just the bot but also the dev behind it is responsible... and monetizing them is even worse... i wanna ask where i can find an appropriate library/dataset to build such a model that doesn't violate such rights...
Previously i have been using kaggle for such datasets... is there one available that doesn't do that?

desert oar
stark bay
#

So are their claims legitimate? And if yes, then why have devs chosen to build bots/generators like that even if they can be illegal

weak mortar
#

im gonna play around with falcon today, now i cant really find any good info comparing 180b with 40b. which would you recommend me to go with at first? is 180b significantly more power hungry or harder to use?

#

can i even run it on a normal computer? it says somewhere 40b requires 90 gigabyte of gpu memory.....
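
A back-of-the-envelope check on figures like that: memory for the weights alone is roughly parameter count times bytes per parameter. The sketch below assumes 1 GB = 1e9 bytes and ignores activations, KV cache and framework overhead, so real requirements will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("7b", 7e9), ("40b", 40e9), ("180b", 180e9)]:
    sizes = ", ".join(
        f"{bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB"
        for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

At 16 bits per weight a 40b model needs about 80 GB for weights alone, which is in line with the quoted 90 GB once overhead is added. Quantizing to 8 or 4 bits shrinks this proportionally, at some cost in output quality.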

abstract wasp
past meteor
stark bay
# past meteor These questions are above the paygrade of 9/10 *engineers* and make more sense t...

Hahah... surely.. the point is, if they're right, then is there data available that isn't subject to such copyright claims... and if their claim is not correct, then how dare they insult us... either they didn't see this future coming and are now crying over spilled milk, or we misunderstood what such a thing could bring into our lives... usually when i asked other fellows these questions... they said they don't care... since art can be subjective

past meteor
#

I don't really know what you mean. At work we have a legal dept. if I want to know something I ask them

#

The AI art debate has so much nuance that most engineers (maybe I'm just projecting) don't have, it's a legal/philosophy matter I think. I can voice an opinion but it's likely going to be bad 😄

stark bay
#

It is fine for me... your bad can be my good and vice versa... but after having such a debate with such fellows i am now in shock about whether to even start my own ML model on such a thing... or not...

#

So for me...it is kinda a guiding light now

agile cobalt
stark bay
#

Not at all

agile cobalt
#

over half a million dollars

stark bay
#

I see...

#

Thats really high

agile cobalt
#

you can train a mini scale diffusion based image generation model that generates something like 32x32 grayscale images for 10 types of objects, but training something that generates high quality images for almost anything you can imagine requires an absurd amount of compute

#

which is why you pretty much only ever see giant corporations training their own models

#

there are a lot of different ways to customize these models though, for example a bunch of people fine tune Stable Diffusion to work better on generating specific kinds of images

stark bay
#

Yeah i figured... i dont wanna create a model like Stable Diffusion or DALL-E... i just wanna create something smaller as a proof of concept... which wont require data from such artists... i really like the concept of qr code art so i wanna create a small model like that

agile cobalt
#

the "qr code" part aside, getting a model good enough to generate something people might recognise as "art" is already insanely difficult as-is

#

iirc the current methods to generate it mostly use ControlNet to guide Stable Diffusion, you might want to look into these two in detail first if you haven't yet?
(ControlNet and Stable Diffusion)

stark bay
#

Oka... thanks for the help

#

I am just starting this field again so i thought to check this out as well... the imaging part...

weak mortar
languid prairie
#

Hi guys 👋🏻

For generating a synthetic dataset from financial PDFs:
I want to do query generation:
For that, I'm thinking about using a pre-trained language model (such as GPT-3, GPT-4, or other LLMs) to generate queries based on the content of each text chunk.

But the problem is that to use OpenAI's GPT models, I would need access to the OpenAI API and would have to set up an API key.

Do u think I can use LlamaIndex instead to generate these queries?

desert oar
languid prairie
#

It’s just for the questions and answers generated by GPT

#

From a pdf file

#

Like u can ask questions about the content of the PDF

#

but I want to generate those questions through another method in order to avoid using OpenAI

#

‘Cause I don’t have an API key / no budget for that

#

So I was wondering if I can do it through LlamaIndex
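
A rough sketch of the chunk-then-prompt pipeline described above, kept independent of any particular model or vendor. LlamaIndex is, as far as I know, a framework for wiring documents to an LLM rather than a model itself, so some locally run open model would still have to sit behind the `generate` callable. Every name here (`chunk_text`, `build_dataset`, `fake_generate`, the prompt wording) is hypothetical.

```python
from typing import Callable, Dict, Iterator, List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> Iterator[str]:
    """Split text into overlapping word-based chunks."""
    step = max_words - overlap
    if step <= 0:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + max_words])
        if start + max_words >= len(words):
            break  # the last chunk already reached the end of the text


PROMPT = (
    "Read the passage below and write {n} questions that it answers, "
    "one per line.\n\nPassage:\n{chunk}\n\nQuestions:"
)


def build_dataset(text: str, generate: Callable[[str], str],
                  n_questions: int = 3) -> List[Dict[str, str]]:
    """Ask the model for questions about each chunk and pair them with the chunk."""
    rows = []
    for chunk in chunk_text(text):
        reply = generate(PROMPT.format(n=n_questions, chunk=chunk))
        questions = [line.strip() for line in reply.splitlines() if line.strip()]
        rows.extend({"question": q, "context": chunk} for q in questions)
    return rows


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return "What was the revenue?\nWhat were the costs?"


rows = build_dataset("Revenue rose to 10 million. Costs fell.", fake_generate)
print(rows)
```

Swapping `fake_generate` for a function that calls a local model keeps the rest of the pipeline unchanged, which makes it easy to compare models later.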

desert oar
#

what do you mean by generate queries though?

#

or are you talking about fine-tuning a model using some text data that you have?

left tartan
#

This seems more of an NLP question; you want to query meaning from (presumably) 10-Ks and 10-Qs?

desert oar
#

i mean, querying from a corpus of documents seems to be one of the really strong use cases for fine tuning one of these open models

#

ive seen it mentioned a handful of times now, that llama performs pretty well when fine tuned

#

i have zero personal experience with it though

left tartan
#

I downloaded llama, just need to finally get around to trying it. Halfway there, I guess 🙂

past meteor
#

I took many NN based courses in uni and ultimately a lot of my projects have been time series so there's overlap in methods but LLMs specifically are a very specific niche. I have 0 FOMO at the "next big thing" coming out every other week, once the hype settles down a little bit I'll catch up.

willow quest
#

for Pandas: does anyone know how to get a datetime64[ns] to work with pd.cut()? it apparently supports datetime64 but not [ns]. I'm just trying to bin the datetime into months

jaunty helm
#

How should I deal with related features?

Example: I want to predict housing prices, and two features I have are distance to nearest school and nearest school type(as in, say elementary, middle, high)

I could keep them separate, but intuition tells me that I could "combine" them somehow, or there was some way to inform the model that these two are related, which could yield better results

serene scaffold
#

or are you trying to group the dataframe by year/month for some subsequent operation?

#

(ie, group by year/month so that days in March 2020 aren't in the same group as March 2023)

willow quest
#

it's just data from the past few months, trying to bin the entries by month. By now we've found a workaround by making the .index the datetime and then do .index.month. but I was kinda expecting pd.cut() to be able to do this type of binning, considering I saw the following:

df['day_bin'] = pd.cut(df['date'], bins='1D')
#

so my assumption was it would be able to also handle bins='1M' 🤷‍♂️

serene scaffold
#

are you sure that what you're trying to do is neither of the two options I gave? I've never used pd.cut before, and I'm trying to understand what you are doing.

#

I don't know precisely what it means to "bin the entries by month", and that's what I'm trying to understand.

willow quest
#

every entry in my df has a column with a datetime64[ns] dtype. I'm just trying to group the entries of the same month together, so all entries with the month 'May' get in a bin, 'June' their own bin, etc.

#

just binning

#

i technically got what i want now, such that all entries are grouped by the respective month in the datetime column. I was just thinking the pd.cut could make it easy
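
Two ways the month grouping could look, sketched on made-up data. `pd.cut` documents `bins` as an integer, a sequence of edges, or an `IntervalIndex`, so a frequency string such as `'1D'` or `'1M'` is, as far as I can tell, not something it accepts. Explicit datetime edges do work.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-05-03", "2023-05-28", "2023-06-01", "2023-06-15", "2023-07-09"]
    ),
    "value": [1, 2, 3, 4, 5],
})

# Option 1: label each row with its calendar month. The period keeps the
# year, so March 2020 and March 2023 would land in different groups.
df["month"] = df["date"].dt.to_period("M")
per_month = df.groupby("month")["value"].sum()

# Option 2: pd.cut with explicit datetime bin edges (left-closed intervals).
edges = pd.to_datetime(["2023-05-01", "2023-06-01", "2023-07-01", "2023-08-01"])
df["month_bin"] = pd.cut(df["date"], bins=edges, right=False)

print(per_month)
print(df[["date", "month_bin"]])
```

Option 1 is usually less work because the bin edges do not have to be listed by hand. `df["date"].dt.month` is a shorter alternative when all data falls within a single year.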

serene scaffold
small wedge
# jaunty helm How should I deal with related features? Example: I want to predict housing pr...

Hmm I'm not sure how much the model will gain from combining these features but there are definitely ways you could try. Say those are your only two features and you are one-hot encoding the nearest school type. Instead of using ones in your encoding you could use the distance to the nearest school; i.e. [2.5, 0, 0] is an elementary school that is 2.5 miles away, where [.5, 0, 0] would be one that's .5 miles away.

jaunty helm
#

Ah, so something like

```
elementary  |  middle  |  high
------------------------------
5.11           0          0
0              3.2        0
0              0          0
```
Instead of only `1` and `0` the value is now the distance
small wedge
#

Yeah, although you might have to fiddle with the values there, normalizing them and/or making the distance inversely proportional to the size of the number if being close would raise housing prices, etc.
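
The encoding discussed above could be sketched like this. The `inverse` option and the `1 / (1 + distance)` transform are illustrative choices, not something proposed in the thread.

```python
SCHOOL_TYPES = ("elementary", "middle", "high")


def encode_school(school_type, distance, inverse=False):
    """One-hot style vector whose active slot carries the distance.

    With inverse=True the slot holds 1 / (1 + distance), so closer schools
    get larger values and a distance of 0 maps to 1 rather than 0.
    """
    if school_type not in SCHOOL_TYPES:
        raise ValueError(f"unknown school type: {school_type!r}")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    value = 1.0 / (1.0 + distance) if inverse else float(distance)
    return [value if t == school_type else 0.0 for t in SCHOOL_TYPES]


print(encode_school("elementary", 2.5))
print(encode_school("middle", 3.2))
print(encode_school("high", 1.0, inverse=True))
```

One caveat with the raw-distance form: a school at distance 0 produces an all-zero row, which is indistinguishable from "no school of any type". The inverse form avoids that ambiguity.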

desert oar
jaunty helm
past meteor
desert oar
#

that's the way of the world

past meteor
#

sad but very true

weak mortar
#

Which of the popular open sauce gpt LLMs would you suggest for a system with 2x3060Ti (24gb vram + 128gb system ram)?

#

I came to realize falcon 180b that i was eyeing is way out of my league, maybe even 40b is. 7b just sounds so low in comparison

desert oar
desert oar