#data-science-and-ml


haughty topaz
#

fire

#
df1.loc[df1["kwartaal"] == "Q1", ["month", "day"]] = 1, 1
df1.loc[df1["kwartaal"] == "Q2", ["month", "day"]] = 4, 1
df1.loc[df1["kwartaal"] == "Q3", ["month", "day"]] = 7, 1
df1.loc[df1["kwartaal"] == "Q4", ["month", "day"]] = 10, 1
#

Can this be done more cleanly?

serene scaffold
lapis sequoia
serene scaffold
limber token
#

I think I understand but an example is welcome 😄
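Here is a minimal sketch of one cleaner option. It assumes `kwartaal` only holds the strings "Q1" to "Q4". Each quarter maps to its first month through a dictionary, which replaces the four `.loc` assignments. The sample frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; in the original, df1 already has a "kwartaal" column
df1 = pd.DataFrame({"kwartaal": ["Q1", "Q3", "Q2", "Q4", "Q1"]})

# Map each quarter straight to the first month of that quarter
quarter_start = {"Q1": 1, "Q2": 4, "Q3": 7, "Q4": 10}
df1["month"] = df1["kwartaal"].map(quarter_start)
df1["day"] = 1  # every quarter starts on day 1

print(df1)
```

Any quarter that is missing from the dictionary ends up as NaN in `month`. There is also an arithmetic alternative: `df1["kwartaal"].str[1].astype(int) * 3 - 2` gives 1, 4, 7 and 10 for Q1 to Q4.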

quasi cape
#

Hi guys, I have a small take-home test.
The data has an ID column and some values in another column.

My requirement is:
If I enter an ID number, the code should return the rows whose values are similar to the values of the ID I entered.

dusty valve
#

ahem, so i read a csv with pandas (columns are ID, Timestamp, Contents, Attachments). how would i use the data from the Contents column in tensorflow?

#

nvm

#

i got it into a TensorSliceDataset

#

how do i fit it into a model

placid flint
#

question about general AI: I'm trying to find patterns in strings (I'm going to try to use this for reaction prediction in chemistry). does anyone know a type of AI, library, or algorithm I could use or research to go more in depth into this topic?

#

thanks so much!

serene scaffold
#

because if your problem can be solved with an exact series of steps, you should just do that

placid flint
#

I don't mean to confuse you with the plusses, but it's a chemical equation, not a mathematical one

serene scaffold
placid flint
#

and really there isnt any easy way to do that

#

since there are far too many exceptions, reasons for interactions and all sorts of nonsense

#

so instead im trying to just feed it like 100 million example equations and hope it can find the relationships itself instead of having to code them all

serene scaffold
placid flint
#

no sorry :/

serene scaffold
# placid flint no sorry :/

it's where you take the inputs that you're given, and derive new inputs that help you figure out the outputs

placid flint
#

ohhh

#

are there any libraries or methods you would recommend for a text-based version? or is it all the same kind of idea

serene scaffold
#

like, you can write something that determines if there's at least two Hs and at least one O in the input, and then have has_water as a feature

#

a feature is like... a property of the input
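Here is a small sketch of the feature idea. It counts atoms per element in simple formulas and then derives `has_water` from the rule described above. The parser is an illustrative assumption. It handles formulas like `H2O` or `C6H12O6`, but not parentheses, charges, or hydrates.

```python
import re


def element_counts(formula):
    """Count atoms per element in a simple formula such as 'H2O' or 'C6H12O6'.
    Assumes no parentheses, charges or hydrates (a deliberate simplification)."""
    counts = {}
    # An element is a capital letter plus an optional lowercase letter,
    # followed by an optional count (missing count means 1)
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(num) if num else 1)
    return counts


def features(reactants):
    """Turn a list of reactant formulas into a dict of hand-written features."""
    total = {}
    for formula in reactants:
        for element, n in element_counts(formula).items():
            total[element] = total.get(element, 0) + n
    return {
        "n_H": total.get("H", 0),
        "n_O": total.get("O", 0),
        # at least two H and at least one O in the input
        "has_water": total.get("H", 0) >= 2 and total.get("O", 0) >= 1,
    }


print(features(["H2", "O2"]))
```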

placid flint
#

is there any way I can have it derive that? since there are a helluva lotta different reactions, combinations, and exceptions that I would need to code

serene scaffold
#

why do you need to do this? because if you don't already have some ML experience, this sounds like a very challenging first project.

placid flint
#

yeah :/ its rlly annoying since very small things can change everything, and there are so many properties that can affect every part of the equation 😩

placid flint
serene scaffold
placid flint
#

alr, tysm for the help!

limber token
#

Hey, it's me again but with another question lemon_smile

I have a df with ~9500 rows, and I need to find the 'period' of 500 rows that has the highest sum. Can I do that without having to loop through it a bajillion times?

#

Dataframe looks like this
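A rolling window finds the 500-row period with the highest sum without any explicit loop. The sketch below assumes the values sit in a numeric column called `value`, which is a hypothetical name. It also assumes the frame has the default `RangeIndex`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ~9500-row frame
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.normal(size=9500)})

window = 500
# The value at label i is the sum of rows i-499 .. i (NaN for the first 499 rows)
window_sums = df["value"].rolling(window).sum()

end = window_sums.idxmax()    # label of the last row in the best window (NaNs skipped)
start = end - window + 1      # valid because of the default RangeIndex
best = df.iloc[start:end + 1]

print(start, end, window_sums.max())
```

With a different index, use `window_sums.values.argmax()` instead. That gives a position, and you can then slice with `iloc`.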

#

Also, how can I filter out every row (the column is in datetime format) that has the same month AND year as another row?
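For the month/year question, one approach converts each date into a monthly period and then calls `duplicated`. The sketch assumes a datetime column named `date`, which is a hypothetical name. It covers both readings of the question. The first keeps only the first row of each month. The second drops every row whose month is shared with another row.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(
    ["2021-01-05", "2021-01-20", "2021-02-03", "2022-01-07"])})

# One key per (year, month) pair, e.g. Period('2021-01', 'M')
month_key = df["date"].dt.to_period("M")

# Reading 1: keep only the first row of each year/month
first_per_month = df[~month_key.duplicated(keep="first")]

# Reading 2: drop every row whose year/month appears more than once
unique_months_only = df[~month_key.duplicated(keep=False)]

print(first_per_month)
print(unique_months_only)
```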

fair nimbus
plush jungle
#

has anyone ever managed to run code from either the stylegan2 or stylegan3 repos?

#

I'm a huge fan of how advanced these gans are at creating novel human faces and I'd love to play with the code

#

but I'm told stylegan2 is basically abandoned, and that I should use this code instead

#

and the newest version is stylegan3

#

but I've been trying for months to get any of this code to run on both my windows computer and my ubuntu virtual machine

#

and every time I solve an error, a new error replaces it, until I run into an error that has an issue on the github project with no solutions, or the only solution given didn't work for me

#

and when I run into an error that isn't on stackoverflow, I really just feel like I've hit a dead end and it's impossible

#

is it just me, or are these codebases so outdated now that the only way they'd work would be with the exact versions of their dependencies that they released with

#

I'd love to know if anyone else can manage to run the generate.py files in either of these projects, and if so, what they did so I can copy it

grave burrow
#

The inner product cosine formula is a theorem in R^2 and R^3 by law of cosines. In higher dimensions, the inner product cosine formula is a result of how we define angle.

Why then does cosine similarity work in higher dimensions? Seems to me that angle only has meaning in R^2 and R^3.

wooden sail
#

what one does is define the angle pairwise between two vectors

#

since vectors don't have a location in space, only a length and an orientation, you can consider both vectors as starting at the origin and pointing toward a point in space

#

this gives you 3 points in space (the origin and the two tips), which, as long as the vectors aren't parallel, uniquely define a 2D plane

#

the angle between the vectors is computed on that plane

limber token
wooden sail
#

this is the same as how the 2-norm or length or euclidean distance of a vector is computed: you can compute it for 2 coordinates at a time and see that this easily extends to higher dimensions by just substituting repeatedly. this is equivalent to drawing 2D planes and using the pythagorean theorem repeatedly
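Here is a quick numeric check of the argument above in R^5. The cosine formula gives the same angle as a direct measurement inside the 2D plane spanned by the two vectors. The vectors are arbitrary examples. The plane's basis comes from Gram-Schmidt.

```python
import numpy as np


def cosine_similarity(u, v):
    # <u, v> / (||u|| ||v||); Cauchy-Schwarz keeps this in [-1, 1] in any dimension
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


u = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
v = np.array([2.0, 0.0, 1.0, 1.0, 1.0])

cos = cosine_similarity(u, v)
theta = np.degrees(np.arccos(cos))

# Orthonormal basis (e1, e2) of the plane spanned by u and v (Gram-Schmidt)
e1 = u / np.linalg.norm(u)
w = v - np.dot(v, e1) * e1
e2 = w / np.linalg.norm(w)

# 2D coordinates of both vectors inside that plane
u2 = np.array([np.dot(u, e1), np.dot(u, e2)])
v2 = np.array([np.dot(v, e1), np.dot(v, e2)])

# Ordinary 2D angle between them, measured with arctan2
theta_2d = np.degrees(np.arctan2(v2[1], v2[0]) - np.arctan2(u2[1], u2[0]))

print(theta, theta_2d)
```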

grave burrow
pseudo wren
#

Just finished my final data science project for a while

#

i'd love it if someone could review it really quickly

#

it's pretty short

#

but i want to make sure i got all the technical parts of it correct

wooden sail
native rune
feral acorn
#

"Here a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is “on” at a time." Can anyone please tell me what a K-level qualitative variable means in the context of dummy variables?

wooden sail
#

you'll find this as "one-hot encoding" in google

#

the idea is that you have a list of categories your object can fall into, or a list of adjectives if you prefer

#

only one of them can be true at a time

#

so let's say something like [big, medium, small, tiny]

#

if your object is big, you encode this as [1,0,0,0]

feral acorn
#

so thats it??

wooden sail
#

yep

feral acorn
#

so among the given answers, only one answer will be correct. but what's the use of the term K in this?

wooden sail
#

there are K answers to choose from

feral acorn
#

oh

#

and k-1 will be incorrect

wooden sail
#

in the example i gave you, K = 4
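In pandas, `pd.get_dummies` produces the same K = 4 encoding. The sketch declares the categories up front, which is an assumption here, using the size adjectives from the example. Doing so fixes the column order. It also keeps all K columns even when a level never appears in the data.

```python
import pandas as pd

# The four mutually exclusive levels from the example, in a fixed order
size_type = pd.CategoricalDtype(["big", "medium", "small", "tiny"])
sizes = pd.Series(["big", "tiny", "medium", "big", "small"], dtype=size_type)

# One column per level; exactly one 1 per row
one_hot = pd.get_dummies(sizes, dtype=int)
print(one_hot)
```

For linear regression, `pd.get_dummies(..., drop_first=True)` keeps only K - 1 columns. The dropped level is implied whenever all the other columns are 0.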

feral acorn
wooden sail
#

you need to have a good reason to believe this representation makes sense

#

i chose those adjectives because they are mutually exclusive

#

it doesn't make sense to do this with, for example, [big, small, blue, red]

feral acorn
#

oh

wooden sail
#

since 2 could be true at a time

#

then you have the problem that the possible options don't have the same norm

feral acorn
#

but minimum 1 will be correct right??

wooden sail
#

that's the idea, 1 has to be correct no matter what

#

if your vectors don't have the same norm, the gradients get really nasty after a few iterations

feral acorn
#

oh damn

#

that looks complex

#

thank you so much for the explanation @wooden sail !!

wooden sail
#

all righty

quasi light
#

hello, does anyone have an e-book or link about generative adversarial networks (GANs) for generating images?
my teacher gave me a project but I'm still clueless about GANs (I've been trying to find material and study GANs online).

patent pine
#

What is the best approach to image segmentation if I have color ranges?!

For example:
Range1: rgb(255,0,0) - rgb(255,0,10)

Range2: rgb(255,255,0) - rgb(255,255,10)

wooden sail
#

computing inequalities? 😛
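Here is the inequality suggestion taken literally, as a NumPy sketch. A pixel belongs to a range when every one of its channels lies between the lower and upper bound. The sketch assumes an RGB `uint8` array of shape (H, W, 3). The tiny test image is made up.

```python
import numpy as np


def in_range_mask(img, lower, upper):
    # True where every channel satisfies lower <= value <= upper (inclusive)
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    return np.all((img >= lower) & (img <= upper), axis=-1)


img = np.zeros((2, 3, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 5)     # inside Range1
img[0, 1] = (255, 255, 8)   # inside Range2
img[1, 2] = (0, 0, 255)     # in neither range

range1 = in_range_mask(img, (255, 0, 0), (255, 0, 10))
range2 = in_range_mask(img, (255, 255, 0), (255, 255, 10))

# Label map: 0 = no range, 1 = Range1, 2 = Range2
labels = np.zeros(img.shape[:2], dtype=np.uint8)
labels[range1] = 1
labels[range2] = 2
print(labels)
```

OpenCV's `cv2.inRange(img, lower, upper)` performs the same comparison and returns a 0/255 mask. Keep in mind that images loaded with `cv2.imread` are in BGR order, so the bounds must be given as BGR.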

noble falcon
#

Do you guys agree?
Context - Deep Learning, Activation Function

wooden sail
#

sounds about right

hasty grail
#

if you're dealing with RNNs then you want tanh instead of ReLU

lavish fable
fair nimbus
lapis sequoia
#

why does it keep saying invalid syntax

lavish fable
viscid bridge
#

Can someone explain what we mean by modality?

steady basalt
#

Anyone here rly good at CV could you teach me segmenting techniques?

rose agate
# lavish fable

i think you'd want to create a unique dictionary containing the times, then do a groupby on the name sorted by the time, and find the lowest time in the dict that isn't in the time of the groupby

subtle spoke
#

anyone willing to help with a data science/ai problem? (in a Jupyter Notebook)

lapis sequoia
#

is there a simple way to filter out dataframe columns that have the same name, if the combined values of those columns are less than a specific value?

summer pivot
#

Hi! would AI be a good career? some people told me that it'd be hard to get a job on that field.
If not, what's more hip?
and, is theory > practice? (if you had to choose a speciality that has only one of them)
Thanks!

steady basalt
summer pivot
steady basalt
summer pivot
steady basalt
#

I don’t know what type of person u are so no

#

Managment, consulting, anything easy and pays decent may be good

#

If u can’t sit down hours on end and do computational tasks

#

And spend over a year learning full time

#

Another downside to AI is if you aim to reach the top and are not insanely high iq and have a PhD in math I don’t think Ur able to work on research tasks for big tech

#

Take that with a grain of salt tho, but yeah making it to the top in such a field is not attainable

#

I doubt I’ll ever make it past senior ds for a medium company

#

Oh and don’t forget douchebag managers who can’t code and think they know more than u

#

If u can stomach all of the above I’d say 100% do it it’s worth it

summer pivot
#

alright. thanks mate

#

I think i can manage to get through this

viscid bridge
#

what do we mean by the signal and noise components of a dataset?

odd meteor
steady basalt
summer pivot
summer pivot
steady basalt
#

Good then it will be fun for u

#

Did u take stats in school? If so it will make life very easy

rose agate
summer pivot
hollow sentinel
#

you could do something like business analytics

#

join the dark side

#

afaik they do not need phDs in math

#

we make da dashboard

#

no seriously to this day i have no idea what a business analyst does besides analyze "business data" like customer churn etc.

steady basalt
#

Anyone here know how to score segmentation accuracy

#

Is it how many ground truth pixels were correctly guessed?

#

In a certain area around the segment
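Counting correctly classified pixels is called pixel accuracy. It is one option, but the background dominates it whenever the object is small. That is a common reason for suspiciously high segmentation accuracy. Intersection over union (IoU) and the Dice coefficient only look at foreground pixels. The sketch below works on binary masks, using a made-up 100x100 example. A model that predicts background everywhere still scores 99% pixel accuracy, yet both its IoU and its Dice are 0.

```python
import numpy as np


def pixel_accuracy(pred, truth):
    # Fraction of all pixels (foreground and background) labelled correctly
    return float((pred == truth).mean())


def iou(pred, truth):
    # Intersection over union of the foreground (True) pixels
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter / union) if union else 1.0


def dice(pred, truth):
    # 2 * |A and B| / (|A| + |B|)
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return float(2 * inter / total) if total else 1.0


truth = np.zeros((100, 100), dtype=bool)
truth[40:50, 40:50] = True        # small object: 100 of 10,000 pixels
pred = np.zeros_like(truth)       # a model that predicts "background" everywhere

print(pixel_accuracy(pred, truth), iou(pred, truth), dice(pred, truth))
```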

fervent vale
#

hello

#

I have some questions about clustering please

#

anyone can help ?

serene scaffold
fervent vale
#

OKKKK

#

I want to cluster a list of names according to 5 different parameters, these parameters have scores ranging from 1 to 100

#

I did this using the following code

#
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from yellowbrick.cluster import KElbowVisualizer

# --- This part computes the optimal number of clusters using an elbow curve ---
model = KMeans()
visualizer = KElbowVisualizer(model, k=(1,12)).fit(df_target)
visualizer.show()
# --- Comment out the previous section once the number of clusters is chosen ---
# We fix the number of clusters to 4 in the following

X = df_target.values  # shape (n_samples, n_parameters), here 5 columns


def calculate_cost(X, centroids, cluster):
    # Sum of Euclidean distances from each point to its assigned centroid,
    # computed over ALL columns (not just the first two)
    return np.linalg.norm(X - centroids[cluster.astype(int)], axis=1).sum()


def kmeans(X, k, seed=0):
    rng = np.random.default_rng(seed)
    cluster = np.zeros(X.shape[0], dtype=int)
    # initialise centroids from k distinct rows of X (not the global df_target)
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    while True:
        # assign each observation to its closest centroid
        for i, row in enumerate(X):
            mn_dist = float('inf')
            for idx, centroid in enumerate(centroids):
                # distance over every parameter, whatever the dimensionality
                d = np.linalg.norm(centroid - row)
                # store closest centroid
                if mn_dist > d:
                    mn_dist = d
                    cluster[i] = idx
        # recompute centroids; an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[cluster == j].mean(axis=0) if np.any(cluster == j) else centroids[j]
            for j in range(k)
        ])
        # if centroids are the same then leave
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, cluster

k = 4
centroids, cluster = kmeans(X, k)


# Create the figure
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')


# Generate the values (a 3D plot can only show 3 of the 5 parameters at once)
x_vals = X[:, 0]
y_vals = X[:, 1]
z_vals = X[:, 2]

cluster = cluster.astype(int)
ax.scatter(x_vals, y_vals, z_vals, c=cluster)
plt.show()
#

But this doesn't seem to work

#

do you have some suggestions please

#

For now the plot is in 3d

#

However given that I have 5 parameters

#

I didn't know how to proceed

#

The way I coded this is something I used for a 3 parameters problem, worked well

#

I use k-means

#

I looked on Stack Ex, apparently kmeans can work for problems up to 7 dimensions
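k-means itself works in any number of dimensions. What goes wrong in the original loop is that its distances only use the first two columns. Below is a vectorized sketch of the assignment step over all columns. It also adds a projection onto 3 principal components so that 5 parameters can be plotted in 3D, using a plain NumPy SVD. The function names are hypothetical.

```python
import numpy as np


def assign_clusters(X, centroids):
    # (n, 1, d) - (1, k, d) -> (n, k, d); norm over d gives all n*k distances at once
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def pca_3d(X):
    # Standardise each parameter, then project onto the first 3 principal components
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:3].T


rng = np.random.default_rng(1)
X = rng.uniform(1, 100, size=(50, 5))   # hypothetical scores for 5 parameters
labels = assign_clusters(X, X[:4])       # 4 centroids taken from the data
P = pca_3d(X)
```

To plot the result, call `ax.scatter(P[:, 0], P[:, 1], P[:, 2], c=labels)`. The standardization step divides by each column's standard deviation, so drop any constant column first.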

lunar folio
#

I have 4 graphs I plot with SHAP:

shap.summary_plot(shap_values[0], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[1], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[2], X_test, feature_names = X2.columns)
shap.summary_plot(shap_values[3], X_test, feature_names = X2.columns)
Is there a way to cycle between them with correspondent buttons?
steady basalt
#

Are you describing a web dev problem?

lunar folio
#

just jupyter

#

displaying results obtained with randomforest

#

I was thinking of using a callback function that clears the plot area and replots the graph selected when the user clicks

#

but i don't know how to do that with SHAP

steady basalt
#

What buttons

#

Interactive?

#

Use JS?

lunar folio
#

I'm not super familiar with JS and it would be preferred to use python in this task

#

could I use matplotlib maybe?

#

I don't know how well it works with SHAP

steady basalt
#

Plotly is what you need

mint palm
#

professionally, how are trained NNs handed over to the customer for deployment??

misty flint
#

¿

#

thats something asked during the business requirements gathering stage

#

usually not after the fact

spring marsh
#

Hey can someone please explain how LSTM neural networks work I have a very specific doubt and I cant find the answer anywhere.

lunar folio
mint palm
limber token
#

Good evening guys!

In Pandas, I have a df that's ~6000 rows deep, and I want to sum a specific column 500 rows at a time, something like this in Excel:

Next cell over would be =SUM(C2:C501) and so on
The df is exactly like this sheet, without the sum column

serene scaffold
limber token
#

Rolling sum

#

I've tried this:

serene scaffold
arctic wedgeBOT
#

Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
Provide rolling window calculations.
limber token
limber token
# arctic wedge

Yeah, I tried it, but it threw DataError: No numeric types to aggregate, I even tried to use pd.to_numeric on the valor column, but still get same error

serene scaffold
# limber token

can you do selic_rates.dtypes and show the result as text?

limber token
#

Sure thing

serene scaffold
# limber token

so valor is fine. it's probably data that is giving you issues. try selic_rates['valor'].rolling(500).sum()

limber token
serene scaffold
misty flint
limber token
#

The results are different from Excel somehow tho

#

Probably decimal place difference? Python uses more so the sums are a little bigger?

serene scaffold
#

but it looks like the answer is only different for the first one?

limber token
serene scaffold
#

I don't like jumping back and forth with my eyes 😛

limber token
#

Like, 501 on Excel has the value of 500 on Python, 502 on Excel of 501 on Python and so on
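The offset most likely comes from the way the two tools count rows. The sketch uses a hypothetical 1..1000 series. In pandas, positions start at 0, and the rolling sum is stored on the last row of each window. In Excel, the data starts at row 2 beneath a header. The offset you actually see also depends on which row holds the SUM formula and on the DataFrame's index. Comparing rows by date, not by row number, avoids the confusion.

```python
import pandas as pd

# Hypothetical stand-in for the "valor" column: 1, 2, ..., 1000
s = pd.Series(range(1, 1001), dtype=float)
sums = s.rolling(500).sum()

# First position with a complete 500-row window (positions 0..498 are NaN)
first = sums.first_valid_index()

# pandas position i holds rows i-499..i; with a header in Excel row 1,
# pandas position i corresponds to Excel row i + 2
excel_row = first + 2

print(first, excel_row, sums[first])
```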

limber token
limber token
dusty valve
#

can someone plz recommend a good tensorflow tutorial?

#

in text generation

dusty valve
#

nvm

wheat ice
#

does anyone know if pandas supports selecting specific table objects when reading from excel? note the blue table and top left where it says "Table4"

#

so Table4 might be in an arbitrary position

serene scaffold
#

if you read that sheet with pd.read_excel, whatever you get back is probably the most you're going to get.

misty flint
limber token
#

When converting a pandas column to datetime, is there a way to format the output? I'm currently passing format="%d/%m/%Y" but it's just telling pandas what the input format is

#

I tried using .strftime on the column later, but it converts them back to str

#

Also, probably last question: if I have a df with a date (daily) column and a rate column, but some days are missing, what is the fastest way to fill in the missing days copying the rate of the previous day?
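Here is a sketch for both questions. It assumes a day-first `data` column and a `valor` rate column, and the sample values are made up. `asfreq("D")` inserts the missing calendar days, and `ffill()` copies the previous day's rate into them. The formatted strings go into a separate column, so the datetime column itself stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "data": ["01/03/2022", "02/03/2022", "05/03/2022"],
    "valor": [0.10, 0.11, 0.12],
})
# format= describes the INPUT strings (day first); the result is a datetime column
df["data"] = pd.to_datetime(df["data"], format="%d/%m/%Y")

# Insert every missing calendar day, then carry the previous day's rate forward
daily = (
    df.set_index("data")
      .asfreq("D")
      .ffill()
      .rename_axis("data")
      .reset_index()
)

# Display-only copy without the time part; the datetime column is untouched
daily["data_str"] = daily["data"].dt.strftime("%d/%m/%Y")
print(daily)
```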

serene scaffold
limber token
#

Display mostly, and also cut out the time stamp, I only want the dates

#

mm/dd/yyyy is very confusing to a non-american

serene scaffold
# limber token Display mostly, and also cut out the time stamp, I only want the dates

be sure that whatever you do, you don't change the underlying data. this SO post goes over the distinction and your options https://stackoverflow.com/questions/38067704/how-to-change-the-datetime-format-in-pandas

limber token
#

I actually looked at that post, but hadn't seen the .style.format option, thanks 😄

limber token
serene scaffold
limber token
#

It wasn't, will check out! 😄

misty flint
#

curious

#

oh wow this is super useful

serene scaffold
misty flint
robust granite
#

How do I deal with inconsistent city names across my data?

#

some city names are added, some are dropped, some are renamed

serene scaffold
robust granite
#

States are consistent because the government can't directly change the name of a state.
but as I go deeper to subdistricts, things get messy.

serene scaffold
serene scaffold
#

look into "named entity resolution india places"

robust granite
#

So, I made the list of distinct state-district-subdistrict combinations for all years, and there are about 6k for each year.

serene scaffold
robust granite
#

so I get the output of entries that are in one list but not the other
for eg: a=[1,2,3,4,5,6]
b=[4,5,6,7]
op=[1,2,3,7]
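Finding what appears in one list but not in the other is a set operation. The sketch uses the numbers from the example. With real data, the elements would be (state, district, subdistrict) tuples. Tuples are hashable, so they behave the same way.

```python
a = [1, 2, 3, 4, 5, 6]
b = [4, 5, 6, 7]

only_in_a = set(a) - set(b)    # e.g. units that disappeared
only_in_b = set(b) - set(a)    # e.g. units that were added or renamed
changed = set(a) ^ set(b)      # symmetric difference: in exactly one list

print(sorted(only_in_a), sorted(only_in_b), sorted(changed))
```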

misty flint
#

fuzzy string matching may or may not help
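Here is a minimal fuzzy-matching sketch that uses the standard library's `difflib.get_close_matches`. It catches spelling variants. A genuine rename, such as Allahabad to Prayagraj, shares almost no characters, so it still needs a manual mapping table. The names and the 0.8 cutoff are illustrative choices.

```python
import difflib

old_names = ["Vishakhapatnam", "Allahabad"]
new_names = ["Visakhapatnam", "Prayagraj", "Lucknow"]

mapping = {}
for old in old_names:
    # Best candidate whose similarity ratio is at least the cutoff, if any
    match = difflib.get_close_matches(old, new_names, n=1, cutoff=0.8)
    mapping[old] = match[0] if match else None

print(mapping)
```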

robust granite
#

heard of dedupe?

delicate summit
#

Hi, is there a community for Matplotlib related questions?

untold smelt
#

hi can i get help with this question?

#

To better recommend songs, Spotify decides to create an agent trained recommender as a model of reinforcement learning. After training, the agent tends to recommend the same songs over and over again. How could this behavior be corrected during the learning process?

a) Decreasing the exploration decay rate.
b) Increasing the exploration decay rate.
c) It is not possible to correct this behavior.
d) Increasing the learning rate.
e) Increasing the discount factor.

tacit basin
tacit basin
plain drift
plain drift
odd meteor
quasi valley
#

Is anyone skilled at object-oriented programming? I have a problem I need help with

lapis sequoia
#

What's everyone's favorite ML algorithm?

odd meteor
odd meteor
steady basalt
quasi valley
# odd meteor Don't ask to ask. Simply ask your question right away. I'm sure if you had done ...

Okay, I'm trying to create a table using the tabulate module in Python. I've stored the data in a list of instances of a class called Shoe, and when I run the tabulate function I get a runtime error: Shoe object is not iterable. Why is that? Also, how does one add a variable that is not initialized in __init__ to each instance, by looping through the instances stored in a list? Like so:

for obj in list_of_objects:
    value = obj.quantity * obj.size

I want to store the variable "value" as a property of each instance.
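Here is a sketch of both parts, using a hypothetical minimal `Shoe` class. `tabulate` expects an iterable of rows, so each object has to become a list first. An attribute that `__init__` doesn't set can simply be assigned inside the loop.

```python
class Shoe:
    # Hypothetical minimal version of the Shoe class from the question
    def __init__(self, product, quantity, size):
        self.product = product
        self.quantity = quantity
        self.size = size


shoes = [Shoe("Runner", 10, 42), Shoe("Boot", 3, 44)]

# Plain assignment adds a new attribute to an existing instance
for shoe in shoes:
    shoe.value = shoe.quantity * shoe.size

# A Shoe object is not iterable, so build one list of values per object
headers = ["product", "quantity", "size", "value"]
rows = [[getattr(shoe, h) for h in headers] for shoe in shoes]
print(rows)
```

Pass the result to tabulate as `tabulate(rows, headers=headers)`. If `value` should always reflect the current quantity and size, you could define it as a `@property` on the class and skip storing it.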

summer pebble
#

is this a case of overfitting or underfitting. what can i do to improve the model?

steady basalt
#

I really need someone to help me getting my images in arrays properly

#

I’m reading them in with cv2, but I can’t convert them to tensors, so I can’t train a model

#

If anyones good at CV please dm me

mild dirge
#

Over-fitting just means that the model is performing really well on the data it is trained on, but is not able to generalize very well

lapis sequoia
#

I was trying to run the Dall-E mega model on my gtx 1080, but i seem to run out of vram, when trying to run their example code. Is there any way to add just a little bit of extra vram to my system without buying a super expensive 3080? Does Tensorflow support using the vram of 2 gpus connected through sli as a "pool"?
Are there any other ways i could get this to work?
I have 8GB of vram, which seems to be just a bit under the required amount

mild dirge
#

I thought you had to wait on a waiting list to use Dall-E?

#

Is there some download link to run it on your own machine?

lapis sequoia
#

Their model is open source, as far as i can tell?

#

lemme look up where i found it

mild dirge
#

Maybe it was Dall-E 2 then

#

Oh, but the transformer to generate the images is not open source?

#

Only the text to representation part I guess

lapis sequoia
#

This is where i found it

#

dunno if it is a modified dall-e mini, but it sure has the word mega in it

#
import wandb
run = wandb.init()
artifact = run.use_artifact('dalle-mini/dalle-mini/mega-1:latest')
artifact_dir = artifact.download()
#

But i cannot really evaluate it, as my gpu cannot run it :(

plain drift
lapis sequoia
#

That was my fear :(
Are you aware of how tensorflow interacts with sli? Could I get away with getting a cheapo 4g vram card and slapping it into my motherboard to have more vram?

I might just tinker with it on my CPU, if I can get that running. I'd like to have this on my remote server sometime anyway. Sad though

plain drift
#

Not sure, but since most of the new GPUs don't support SLI, I would think that the support for SLI might start dwindling.

#

That being said, it looks like keras/tensorflow handle that for you and make it easy, so it's possible that another little gpu for the vram would be helpful.
https://datascience.stackexchange.com/questions/46952/should-i-connect-my-two-gpus-with-sli-or-not-for-keras-tensorflow

stable isle
#

is anyone aware of the existence of an AI that can create Python programs? Something like "Create a program that traverses the file-system and finds all Linux distro ISOs and ftps them to this ftp server..." https://openai.com/blog/openai-api/ wow...

steady basalt
#

WHY

#

DONT

#

The errors

#

STOP

#

Willing to pay a lot of money to anyone who can fix my problem

#

Need to load images and masks into resnet

mild dirge
#

and masks?

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

mild dirge
#

resnet takes an image and outputs a class, so what do you even mean mask?

#

And yeah no money

stable isle
#

I hate to be that guy but hey...

plain drift
#

Not a solved problem though. It's more of a "helper"

stable isle
mint palm
#

i need to very precisely track the motion on pupils for some task

#

so i decided to use openflow for that

#

is it ok to use that?
assuming it is ok:
for eye detection I used dlib's 81 landmarks;
is that the best for this?
and for this I first need to detect faces;
what is the best face detection model?
is it VGG?

#

*so there are 3 questions here!

river oak
#

i am having trouble saving the image for the pytorch implementation of imagen running locally
https://paste.pythondiscord.com/udedoyuwaw this is my modified code from the repo. got help with the last 4 lines
original: https://paste.pythondiscord.com/unikupabiz
however, when i run it i get this error:

cv2.imwrite(f"image-{i}.png", images)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp:737: error: (-215:Assertion failed) image.channels() == 1 || image.channels() == 3 || image.channels() == 4 in function 'cv::imwrite_'

ive been trying various solutions such as putting in a subfolder, specifying the full path, trying without the f but no luck.

#

can anyone help? and if anyone know an alternative method of saving the image, it would be appreciated

river oak
#

ping if you know

tacit basin
river oak
#

although its fine if its just 1 at a time

#

i just cut it down to 1 in my attempt

tacit basin
#

But can open CV save batch? Or just single image?

river oak
#

idk

#

supposedly?

#

i tried with 1 prompt without the f and {i} doesnt work

tacit basin
#

If you do type(images)

#

What do you get?

river oak
#

where would you put that

tacit basin
#

After you call imagen.sample

river oak
#

like this?

tacit basin
#

If it's not jupyter then you want to print it to see it

river oak
#

i wanna save it to my pc locally

tacit basin
#

print(type(images))

#

That's right

river oak
#

sorry if i ask too many questions im not that good at python

tacit basin
#

Just add this line for now

river oak
#

so like this?

#

or after

tacit basin
#

Yeah what does it print?

river oak
#

aight lemme run it now, it takes around 4 minutes thats why i wanna make sure

tacit basin
#

In the last code if you change to

for i, image in enumerate(images):
    cv2.imwrite(f'image-{i}.png', image)

Would it work?

river oak
river oak
hollow sentinel
#

yay

#

i found healthcare data

tacit basin
river oak
#

ah so like this?

tacit basin
#

Yes

river oak
#

aight

#
    cv2.imwrite(f'image-{i}.png', image)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\imgcodecs\src\loadsave.cpp:737: error: (-215:Assertion failed) image.channels() == 1 || image.channels() == 3 || image.channels() == 4 in function 'cv::imwrite_'
river oak
tacit basin
#

It complains about number of channels in a image that it expects in png
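Here is a sketch of one possible conversion. It assumes the sampler returns a batch shaped (N, 3, H, W) with values in [0, 1]. That layout is typical for PyTorch image models, but check `images.shape` first. `cv2.imwrite` expects one (H, W, C) array per file, ideally `uint8` in BGR order.

```python
import numpy as np


def to_cv2_images(images):
    """Convert a batch shaped (N, C, H, W) with values in [0, 1] into a list of
    (H, W, C) uint8 BGR arrays. Accepts a NumPy array or a PyTorch tensor."""
    if hasattr(images, "detach"):              # torch.Tensor -> NumPy on the CPU
        images = images.detach().cpu().numpy()
    images = np.asarray(images)
    out = []
    for img in images:                          # iterate over the batch
        img = np.transpose(img, (1, 2, 0))      # CHW -> HWC
        img = (np.clip(img, 0.0, 1.0) * 255).round().astype(np.uint8)
        out.append(np.ascontiguousarray(img[:, :, ::-1]))  # RGB -> BGR for OpenCV
    return out
```

You can then save each image with `cv2.imwrite(f"image-{i}.png", image)` inside `for i, image in enumerate(to_cv2_images(images)):`.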

river oak
#

i saw smth online about tensors but i didnt quite understand it

tacit basin
#

Where's that code from?

river oak
tacit basin
#

Is there a GitHub repo? I see

burnt prism
#

Is this reliable?

#

Data science roadmap

tacit basin
steady basalt
#

Anyone here able to help me with my computer vision segmentation

#

With Unet experience

inland drum
#

Hi. I am wondering if I could get some clarifications regarding these matters please.

In order to pass data to train an SVM or other ML estimator, given a dataset with nested / hierarchical data like in this SO question, is it always necessary to convert/flatten the dataset so that the facts are all present in each row (each example)?

Follow up question. If it is always necessary (meaning classifiers cannot be directly trained or predict based off hierarchical data (examples)), are the only available techniques to flatten the data to either:

  1. Group the nested data into one representative value
  2. Expand the nested data into columns/variables (new features)

Follow up question 2. Does RNN have the same need for flattened data or can they handle the hierarchical data out of the box?

SO question -> https://stackoverflow.com/questions/66961525/how-to-feed-a-nested-array-into-an-svm-model

wooden sail
#

these are really limitations of how the procedure is coded, more than anything else. from the mathematical standpoint, these hierarchies don't really have any meaning. it is equivalent to just taking the "nested" data out and appending it at the end of the vector, for example

#

you can write an equivalent (multi)linear transformation nested with your favorite activation function regardless of how you choose to arrange the data. most libraries are written so as to take a single vector as input, and the order in which you do that doesn't matter as long as it's consistent

inland drum
wooden sail
#

what i mean is that if you were to look only at the math, the limitation is not there. the problem is that the libraries pick a fixed implementation

#

so it's not really a machine learning problem, but rather a software/API design choice

inland drum
#

That I understand. Until that point at least.

wooden sail
#

beyond that, you have an input vector that has one or more entries that are vectors themselves

#

you can simply make one long vector out of this

#

this is kinda like what you said on your point 2, but it's not one-hot encoding
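Here is a sketch of both options, using a hypothetical sample layout. The first concatenates the nested vector into one long fixed-length vector. The second aggregates the nested vector into summary features.

```python
import numpy as np

# Hypothetical samples: two scalar fields plus a nested vector of length 3
samples = [
    {"age": 31, "income": 4.2, "history": [1.0, 0.0, 2.0]},
    {"age": 45, "income": 3.1, "history": [0.5, 1.5, 1.0]},
]

# Option 1: append the nested entries to the end of one flat vector per sample
X_flat = np.array([
    np.concatenate(([s["age"], s["income"]], s["history"])) for s in samples
])

# Option 2: summarise the nested vector (R^n -> R) into a few aggregate features
X_agg = np.array([
    [s["age"], s["income"], np.mean(s["history"]), np.max(s["history"])]
    for s in samples
])

print(X_flat.shape, X_agg.shape)
```

Concatenation only works when every sample's nested vector has the same length. Variable-length data needs padding, truncation, or aggregation.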

inland drum
#

Yep, I was using the term one-hot encoding incorrectly.

So you expand the data; in some cases this expansion is done through pivoting, as in the Stack Overflow post, or you could also come up with aggregate variables that take the vector from R^n to R (e.g. the mean of the nested vector). Are those the only two ways?

past parcel
#

I'm just now downloading Anaconda and I was curious why I'm unable to check the first checkbox

wooden sail
#

you could technically turn the data into an ND array of arbitrary shape, not just a vector

past parcel
serene scaffold
past parcel
#

I should be able to check mark both checkboxes.

steady basalt
#

any calculus chads in chat able to tell me why my loss is 1x10^15

#

learning rate 0.0001 , batch size 1

wooden sail
#

rate could still be too high

steady basalt
#

its already far below default

wooden sail
#

whats the loss at the 1st iter

steady basalt
#

maximum, its only decreasing slowly

wooden sail
#

but it is decreasing? in that case you can try increasing the learning rate

steady basalt
#

it seems to converge though

wooden sail
#

the numbers alone mean nothing, just the trend. it's important to keep in mind that the admissible initial learning rate depends on the smoothness of the network's behavior. things like "default" don't exist

steady basalt
#

it stops decreasing and increases back to 12x10^15

wooden sail
#

try a smaller step size, then
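Here is a toy illustration of why the admissible step size depends on the problem. On f(x) = a·x², each gradient step multiplies x by (1 - 2·a·lr), so any lr above 1/a diverges, whatever a library default would suggest. The values of `a` and `lr` are arbitrary.

```python
def gradient_descent(lr, a=10.0, x0=1.0, steps=20):
    # Minimise f(x) = a * x**2, whose gradient is 2 * a * x.
    # Each update multiplies x by (1 - 2*a*lr): it diverges when lr > 1/a.
    x = x0
    losses = []
    for _ in range(steps):
        x -= lr * 2 * a * x
        losses.append(a * x ** 2)
    return losses


small = gradient_descent(lr=0.04)   # factor 1 - 0.8 = 0.2: the loss shrinks
large = gradient_descent(lr=0.15)   # factor 1 - 3.0 = -2: the loss grows 4x per step
```

In Keras, the step size is the optimizer's `learning_rate` argument, e.g. `tf.keras.optimizers.Adam(learning_rate=1e-5)`. `rho` is a separate decay parameter of `RMSprop`, not the learning rate.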

steady basalt
#

is that rho?

#

where is that in keras model

limber token
#

Hey guys! 🙂

Anyone have a little time and patience to help me figure out why my code is displacing the correct values one day back please?

#

(There are extra days because I filled in the missing days for other purposes but the daily amount earned for the filled in days are 0)

steady basalt
#

Anyone know why I’m getting 99% validation accuracy in segmenting

#

Well, more like 97.8%

#

I know for a fact it’s too good to be true

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

limber token
#

Shit, I found the problem

#

In df.iterrows() is there a way to access the value of the previous row?

#

Should I store it in some variable?

serene scaffold
limber token
#

Okay!

limber token
# serene scaffold you can do that, but if you show me the implementation, I can see if there's a m...

So basically I have this df (attached image) with daily dates (data) and interest rates (valor_ajustado is them in decimal format for calculations).

Then I have the function: (don't worry about frequency)

What I noticed was that I'm calculating each day's gains using its own daily rate, when I should be using the previous day's rate.

def calculate_interest(df, frequency, capital):
    calculations_df = filter_by_freq(df, frequency)
    answer_df = pd.DataFrame(
        [{'Date': df['data'].values[0], 'Capital': capital, 'Amount Earned': 0}]
    )
    
    if frequency.upper() == 'DAY':
        current_capital = capital

        for index, row in calculations_df.iterrows():
            # -> This next line
            current_capital *= (1 + row['valor_ajustado'])
            # <- This past line
            new_row = {
                'Date': row['data'],
                'Capital': current_capital,
                'Amount Earned': current_capital - capital
            }
                
            answer_df = answer_df.append(new_row, ignore_index=True)

    elif frequency.upper() == 'MONTH':
        pass
    
    elif frequency.upper() == 'YEAR':
        pass

    return answer_df
serene scaffold
#

appending to a dataframe (or numpy array) involves copying the entire dataframe into a new dataframe. it's incredibly inefficient

limber token
#

Oh damn

#

Had no idea

#

What's the best way to add a new row then?

serene scaffold
#

most people don't. I wish they'd just delete it from numpy and pandas
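
A minimal sketch of the usual alternative: collect each row as a plain dict in a Python list (cheap to append to) and build the DataFrame once at the end. The helper and its inputs are illustrative. For reference, `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, with `pd.concat` suggested for the cases where you really do need to combine frames.

```python
import pandas as pd


def build_result(dates, capitals, start_capital):
    """Hypothetical helper: one row per (date, capital) pair."""
    rows = []  # list.append is cheap; nothing gets copied
    for date, capital in zip(dates, capitals):
        rows.append({
            "Date": date,
            "Capital": capital,
            "Amount Earned": capital - start_capital,
        })
    # A single DataFrame construction at the end instead of one copy per row
    return pd.DataFrame(rows)


table = build_result(["2022-01-03", "2022-01-04"], [1000, 1010], 1000)
```
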

serene scaffold
limber token
#

Makes sense

limber token
serene scaffold
limber token
#

Okay, thank you so much 🙂

#

Holy shit

#

The program was taking ~55s to run and now it takes 1.5s lol

serene scaffold
inland drum
#

mental note. do not use df.append. got it. AA_Noted

serene scaffold
#

a thousand pepes on typewriters

halcyon grove
#

I managed to split out the year, month, and day

#

I need to hand in this assignment tomorrow

royal hound
#

hello there fellas, im learning AI and im kinda clueless about how to do what im trying to do

#

trying to analyze a large set of files, then auto-generate a new batch and manually rate them to train the model

#

not sure on how to do it to be honest

serene scaffold
royal hound
#

midi files

serene scaffold
#

so you're trying to generate music, or what?

royal hound
#

just midi files

serene scaffold
#

what do the midi files mean?

royal hound
#

umm its like the data of the melodies

serene scaffold
#

I assume you want your model to do something other than generate completely random audio.

royal hound
#

yea

serene scaffold
#

for the moment, you are the only one who knows what the midi files are, and what you want the output to be as compared to the input.

royal hound
#

this is what u see when u drag and drop a midi file into FL studio

#

it's an app for making music

#

a midi file holds data for each note

#

like the tone, pitch, and velocity

#

any clue @serene scaffold

stable isle
#

@royal hound are you trying to parse midi files?

#

i can help with the midi parsing/generating but other than that.....I know foo about training a model/ai/ml

limber token
#

Hey guys! Genuinely last question:

def filter_by_freq(df, frequency):
    filtered_df = df.copy()
    
    if frequency.upper() == 'DAY':
        pass
        
    else:
        date_obj = filtered_df['Date'].values[0]
        target_day = pd.to_datetime(date_obj).day
        target_month = pd.to_datetime(date_obj).month
        
        final_date_obj = filtered_df['Date'].values[-1]
        
        if frequency.upper() == 'MONTH':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
        
        elif frequency.upper() == 'YEAR':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.month.eq(target_month)]
    
    return filtered_df

How can I also include in the .loc the very last row from the original df? Tried doing (for frequency month): filtered_df = filtered_df.loc[(filtered_df['Date'].dt.day.eq(target_day)) | (filtered_df['Date'].dt.date.eq(final_date_obj))] but didn't work.
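
One way to keep the final row is to build the boolean mask first and then force its last entry to True before filtering, so that row needs no date comparison at all. The attempt above probably failed because `.dt.date` produces Python `date` objects while `.values[-1]` returns a NumPy `datetime64`, so the two sides of `eq` are different types. A sketch, assuming 'Date' is a datetime64 column and keeping the original behaviour of returning everything for 'DAY' (or any other frequency):

```python
import pandas as pd


def filter_by_freq(df, frequency):
    """Keep rows on the start date's day (and month, for 'YEAR'), plus the last row."""
    freq = frequency.upper()
    if freq not in ("MONTH", "YEAR"):
        return df.copy()

    dates = df["Date"]
    first = dates.iloc[0]  # a pandas Timestamp
    mask = dates.dt.day.eq(first.day)
    if freq == "YEAR":
        mask &= dates.dt.month.eq(first.month)
    # Always keep the final row of the original df, whatever its date is
    mask.iloc[-1] = True
    return df.loc[mask].copy()


daily = pd.DataFrame({"Date": pd.date_range("2022-01-15", "2022-04-20", freq="D")})
monthly = filter_by_freq(daily, "month")
```
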

royal hound
#

I just dont know/understand how to implement the ai part

delicate summit
#

Does anyone know how to use a rectangle with text in it as the legend of a chart in Matplotlib?

#

Something that looks like this in the legend box (two rectangles with different colors and example text in it)

-------------
|    69%    |    success (percentage)
-------------
-------------
|    31%    |    failure (percentage)
-------------
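
Matplotlib supports custom legend handlers: any object with a `legend_artist(legend, orig_handle, fontsize, handlebox)` method can draw its own legend key. A sketch, assuming a bar chart and made-up colors; `BoxedLabel` and `BoxedLabelHandler` are hypothetical names, and `handlelength`/`handleheight` are widened so the text fits inside the box:

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import matplotlib.text as mtext


class BoxedLabel:
    """Hypothetical legend handle: a colored box with a short string inside it."""

    def __init__(self, text, color):
        self.text = text
        self.color = color


class BoxedLabelHandler:
    """Draws a BoxedLabel as a filled rectangle with centered text."""

    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        x0, y0 = handlebox.xdescent, handlebox.ydescent
        width, height = handlebox.width, handlebox.height
        transform = handlebox.get_transform()
        box = mpatches.Rectangle((x0, y0), width, height, facecolor=orig_handle.color,
                                 edgecolor="black", transform=transform)
        label = mtext.Text(x0 + width / 2, y0 + height / 2, orig_handle.text,
                           ha="center", va="center", fontsize=fontsize, transform=transform)
        handlebox.add_artist(box)
        handlebox.add_artist(label)
        return box


fig, ax = plt.subplots()
ax.bar(["success", "failure"], [69, 31], color=["tab:green", "tab:red"])
handles = [BoxedLabel("69%", "tab:green"), BoxedLabel("31%", "tab:red")]
legend = ax.legend(handles, ["success (percentage)", "failure (percentage)"],
                   handler_map={BoxedLabel: BoxedLabelHandler()},
                   handlelength=3, handleheight=1.5)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```
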
sleek tapir
#

What time is Andrew Ng's course being released?

#

preferably in AEST

compact star
#

does anyone know how to use Theano? I have done the pip install Theano, but when I try to import it I get an error.

arctic wedgeBOT
#

Hey @compact star!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

compact star
sleek tapir
vestal spruce
#

Could anyone guide me on bullet hole detection system? As I'm quite frankly stuck on what to do next with these images

dusty turret
#

Any good resources for the AWS DeepRacer competition?
It's a reinforcement-learning-based platform, and we need to develop a reward function that keeps the car on track and finishes the race in the fastest time
You get input parameters like distance from center and track width etc
What algo would be best suited for this scenario
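
A reward function along the lines of AWS's "follow the center line" sample: DeepRacer calls `reward_function(params)` with a dict of inputs, and this version only reads `track_width` and `distance_from_center`. The marker fractions are tunable guesses, not recommended values, and speed is not rewarded here, so it targets staying on track rather than lap time.

```python
def reward_function(params):
    """Reward staying near the center line (modeled on AWS's center-line sample)."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Bands around the center line, as fractions of the track width (tunable guesses)
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track or about to leave it
    return float(reward)
```
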

cinder matrix
#

anyone got a tool/script about the PCA eigenface algorithm but with simple matrix inputs cause i wanna visualize the steps?
please @ when replying
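
For visualizing the steps, here is plain-NumPy PCA on a tiny matrix, where each row stands in for one flattened face image (the data is made up). Eigenface code often takes the eigenvectors of the smaller N×N matrix X·Xᵀ instead when there are far more pixels than images; this sketch uses the direct covariance route because it is easier to follow.

```python
import numpy as np


def pca_steps(X, n_components):
    """Plain PCA, returning every intermediate result so each step can be inspected."""
    mean = X.mean(axis=0)                            # 1. mean sample ("mean face")
    centered = X - mean                              # 2. subtract it from every row
    cov = centered.T @ centered / (X.shape[0] - 1)   # 3. covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)           # 4. eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]                # 5. largest variance first
    components = eigvecs[:, order[:n_components]]    #    top directions ("eigenfaces")
    projected = centered @ components                # 6. coordinates of each sample
    return {"mean": mean, "centered": centered, "cov": cov,
            "eigvals": eigvals[order], "components": components, "projected": projected}


# Three tiny 2-pixel "images"; all the variation is in the first pixel
X = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
steps = pca_steps(X, n_components=1)
for name, value in steps.items():
    print(name, value, sep="\n")
```
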

winged crest
#

I want to pick the 4 corners of an image, and n random points in the image. Then I want to create a new image that interpolates between all of those points. How would I do that? I tried SciPy's interp2d function, but it returns its own interpolator object and I can't get it back to a 2D array for further processing. Anyone know how to do this?

#

It's for background subtraction of telescope data
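
`scipy.interpolate.griddata` returns a plain array evaluated on whatever grid you pass, which avoids the interpolator-object problem (and `interp2d` has been deprecated in recent SciPy releases anyway). A sketch, assuming a 2D single-channel image and linear interpolation between sampled pixels; including the four corners puts every pixel inside the convex hull of the samples, so linear interpolation is defined everywhere. The function name and the synthetic frame are made up.

```python
import numpy as np
from scipy.interpolate import griddata


def background_from_samples(image, n_points, seed=None):
    """Estimate a smooth background by interpolating between sampled pixels."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    corners = np.array([[0, 0], [0, w - 1], [h - 1, 0], [h - 1, w - 1]])
    random_pts = np.column_stack([rng.integers(0, h, n_points), rng.integers(0, w, n_points)])
    # (row, col) sample coordinates; np.unique drops any duplicate picks
    points = np.unique(np.vstack([corners, random_pts]), axis=0)
    values = image[points[:, 0], points[:, 1]]
    rows, cols = np.mgrid[0:h, 0:w]
    # Evaluate the interpolant on every pixel: the result is an (h, w) array
    return griddata(points, values, (rows, cols), method="linear")


rows, cols = np.mgrid[0:40, 0:60]
frame = 2.0 * rows + 3.0 * cols + 1.0  # synthetic planar "sky background"
background = background_from_samples(frame, n_points=30, seed=0)
residual = frame - background  # background-subtracted image
```
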

earnest herald
#

tensorflow is running extremely slow, despite having installed CUDA, cuDNN and tensorflow-gpu. I'm running my deep q models, and they will take multiple hours to train on 1000 episodes, instead of minutes which I'd see on my old pc

winged crest
weary ridge
#

anyone comfortable with opencv and pytesseract?

serene scaffold
weary ridge
#

one of the words is not getting detected

#

it's not read correctly, so i am not able to match it with the expected text

#

how can i make it detect that word reliably?

#

can i send the pic?

#

SDLTC should come

jolly knoll
#

Hey there. How do we choose the number of components for NMF (Non-Negative Matrix Factorization)?

#

Or to be more precise n_components from sklearn.decomposition.NMF

#

(for topic extraction)
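
There is no single rule; one common heuristic is to fit NMF for several candidate values and look for the point where the reconstruction error stops dropping much (an "elbow"). For topic extraction, topic coherence or simply reading the top words of each topic is often used alongside it. A sketch, with a random matrix standing in for a TF-IDF document-term matrix:

```python
import numpy as np
from sklearn.decomposition import NMF


def nmf_errors(X, candidates, random_state=0):
    """Fit one NMF per candidate n_components and record its reconstruction error."""
    errors = {}
    for k in candidates:
        model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=random_state)
        model.fit(X)
        # With the default Frobenius loss: norm of X minus the reconstruction W @ H
        errors[k] = model.reconstruction_err_
    return errors


# Stand-in for a non-negative document-term matrix (20 documents, 30 terms)
X = np.random.default_rng(0).random((20, 30))
errors = nmf_errors(X, candidates=[1, 5, 10])
```
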

halcyon grove
jovial vault
#

Hi friends, new to this server and concept. Ive joined #help-potato with my problem - do I need to do anything else or just wait until someone can help?

rose agate
compact rose
#

Hello guys, I feel a bit stupid for not knowing this... I know these are simple terms, but I can't get it right. What is the difference between groupby and aggregating?
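
They work together rather than being alternatives: `groupby` only splits the rows into groups, and aggregation (`sum`, `mean`, `agg`, ...) reduces values to a single number, either per group or for a whole column. A small example with made-up data:

```python
import pandas as pd

sales = pd.DataFrame({"store": ["A", "A", "B"], "amount": [10, 20, 5]})  # made-up data

grouped = sales.groupby("store")            # split only: no numbers computed yet
per_store = grouped["amount"].agg("sum")    # aggregate each group -> one value per store
overall = sales["amount"].agg("sum")        # aggregate without grouping -> one value
```
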

spiral furnace
#

got any idea what the difference is between SimpleImputer().fit_transform() and SimpleImputer().transform()?

mild dirge
dusty valve
#

can someone recommend a tutorial on how to make a text gen model with a custom dataset

spiral furnace
# mild dirge

I've read it but still can't get why you do fit_transform() on your train data and transform() on your valid data, since you still train your data afterwards
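
`fit` is where the imputer learns its statistics (the column means here), and `transform` only applies them; `fit_transform` does both in one call. The imputer's fitting step is separate from the model's training, and fitting it on the training set only keeps validation values from leaking into what it learned. A minimal example with made-up numbers:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [3.0], [np.nan]])   # made-up training column
X_valid = np.array([[np.nan], [100.0]])        # made-up validation column

imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)  # learns the train mean (2.0) and fills with it
X_valid_filled = imputer.transform(X_valid)      # reuses 2.0; the 100.0 in X_valid is never "learned"
```

Had the imputer been fit on the validation set instead, its NaN would have been filled with 100.0, a value taken from the very data you are evaluating on.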

spiral furnace
indigo cove
#

Hello

#

Anyone here that could help me with python pulp linear programming?

steady basalt
lapis sequoia
#

Hello, can someone help me get feature importance for my data?
This is the code I'm running right now:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

myModel = DecisionTreeClassifier()
myModel.fit(X_train, y_train)
myModel.predict(X_test)

# feature_importances_ only exists on a fitted instance, not on the class itself
print(myModel.score(X_test, y_test), myModel.feature_importances_)

It shows feature importance, but without any labels. I want to show the name of each feature next to its feature importance. Any ideas? Thanks

mild dirge
#

How does it show it?

#

If it is just a list of numbers, it is likely in the same order as your training data

lapis sequoia
#

Yes, it's just the list of numbers but how would I know which number belongs to which feature for datasets with a large amount of features? And how can I sort the list of feature importances by which ones are the most important and knowing which feature they belong to?

#

I did this to get every feature with its importance:

for i in range(len(X.columns)):
    print(X.columns[i], myModel.feature_importances_[i])

But I still don't know how I can sort it by feature importance

mild dirge
#
feature_score = sorted(zip(X.columns, myModel.feature_importances_), key=lambda t: t[1])
#

@lapis sequoia

#

You zip the columns with the scores to get them grouped in tuples

#

then you sort the tuples
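
Another option is to put the scores in a pandas Series indexed by column name, which keeps names and scores paired and sorts them in one line (`ascending=False` puts the most important first; the `sorted(...)` version above lists the least important first). The iris dataset is only a stand-in for `X`/`y`:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)  # stand-in data with named columns
myModel = DecisionTreeClassifier(random_state=0).fit(X, y)

# One score per column, labelled by column name, most important first
importances = pd.Series(myModel.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```
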

lapis sequoia
vestal spruce
lapis sequoia
#

Does anyone have any recommendations on where to get Forex(EURUSD specifically) data from? I would like to be able to get data in 5 second intervals.

blazing bridge
#

this might be a dumb question but can someone help me with this

#

I have this corpus of text where the part highlighted in red is the label, and after the \ is the actual text

#

can someone help me turn that into a list of

sentences = []
labels = []
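
Assuming each line looks like `label\text` (the label, a single backslash, then the sentence), `str.partition` splits on the first backslash only, so any later backslashes inside the text survive. The function name and sample lines are made up:

```python
def split_corpus(lines):
    r"""Split lines of the form 'label\text' into parallel sentence and label lists."""
    sentences, labels = [], []
    for line in lines:
        label, sep, text = line.strip().partition("\\")  # split at the first backslash only
        if not sep:
            continue  # skip blank lines or lines without a label separator
        labels.append(label.strip())
        sentences.append(text.strip())
    return sentences, labels


sentences, labels = split_corpus(["pos\\great movie", "neg\\not good at all", ""])
```
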

blazing bridge
#

pls ping me if you can help

native dome
#

hello. anyone know how to use Power BI? I'm wondering how to merge two columns together WITHOUT affecting the original table's raw data, only changing the format in a matrix visual. (everything is only a visual change, no need to mess with the data)

misty flint
#

theres a way to do the aggregation you need with DAX

#

youd have to look into it

#

should be on the microsoft docs

blissful bone
zenith spear
#

Summer just started and I haven't been in school for over a year now. And yet, here I am, watching Khan Academy, taking notes in a physical notebook, and testing stuff out in Desmos, trying to figure out derivatives and dot products and sigmoid functions just so I can make a computer predict if some number is a 1 or a 0

serene scaffold
zenith spear
serene scaffold
zenith spear
#

Yeah, I'm a college CS student, but I've kinda just slipped through cracks and moved at the least convenient time so I've only had some class on local history since I graduated high school

#

I'd be doing this if I wasn't going, though. I just love technology and finally feel competent enough to understand this stuff

dusty valve
#

is an accuracy of 0.42 good for a text gen model? i made it myself, it's nowhere near gpt-2

serene scaffold
violet monolith
#

Hi,

oak olive
violet monolith
#

I am a professional ocr developer

oak olive
#

Really?

violet monolith
#

yes

oak olive
#

The OCR performance was not that good... Around 60%

#

I believe that the binarization is not correct

#

What do you think?

#

Or what would you suggest?

violet monolith
#

Please send me the example image

#

if you can send more images, it is better

oak olive
#

Hmmm

#

Let me try

violet monolith
#

yes

oak olive
#

By the way, I know that this is a Python server

#

But I am using Java

#

Just for educational purposes

serene scaffold
misty flint
#

i can hear the pain and frustration through these words

#

straight up an entire backstory in this one

lapis sequoia
#

could someone tell me if mediapipe with its pose solution is machine learning or deep learning? Thanks

weary ridge
#

can someone shed some light on why some of the words are not detected in my code using opencv and pytesseract

clever sorrel
oblique agate
#

"input type (torch.floattensor) and weight type (torch.cuda.floattensor) should be the same"

#

How to fix this error?

odd meteor
lapis sequoia
odd meteor
# lapis sequoia why does it say that it is a machine learning solution? then what algorithm does...

I believe there's a wide range of stuff one can do with MediaPipe aside from using it for computer vision projects.

https://www.assemblyai.com/blog/mediapipe-for-dummies/


lapis sequoia
odd meteor
lapis sequoia
odd meteor
lapis sequoia
# odd meteor Deep Learning is still Machine Learning, so yeah.

but I read on the internet that artificial intelligence includes machine learning, which in turn contains deep learning. So if I have to explain to someone what category MediaPipe falls into, should I say that it is machine learning and computer vision? I'm confused about this

odd meteor
lapis sequoia
peak ridge
#

Hey anybody here into data science/data analysis / data field?
i need some strong career /skill suggestions to make progress
thx in advance!

#

pls ping me if u're into it

haughty topaz
#

What's like the cheapest way of getting a python file to run 24/7 on a loop of like 30mins

#

without having your computer running all the time

haughty topaz
#

excel

hasty grail
#

if you're running your script on your own computer, you can't exactly keep running that when it's switched off, so you'll need to use a remote machine/service

dusty valve
#

they aren't coherent most of the time but they look about right for a self made text gen

#

for example

#

i entered "binary" as the string, and it returned "I want to string the python"

serene scaffold
# dusty valve they look good

well, you're not going to make something that rivals GPT-2, so if you've generated sentences that "sound good" and you learned from doing it, then that's a win.

compact rose
#

hello guys, has anybody here used PCA in PySpark and can help? im trying to use it on highly correlated features, but i dont know how

bold timber
#

Now, I'm studying time series forecasting. But I have a question: why does the result keep changing when I re-run the model?

mild dirge
#

With the same trained model?

#

Because predicting is normally deterministic, but training is not, as there is randomness in weight initialization, for example.

#

@bold timber
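
A tiny NumPy illustration of the point about random initialization: the same gradient-descent loop gives identical results with the same seed and different results with a different one. This is a toy linear model, not the time-series model from the screenshots. Recent TensorFlow versions provide `tf.keras.utils.set_random_seed(seed)` to seed Python, NumPy, and TensorFlow together, although some GPU operations can still be nondeterministic.

```python
import numpy as np


def train_tiny_model(x, y, seed, epochs=5, lr=0.05):
    """Fit y ≈ w*x + b by gradient descent, starting from random weights."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=2)  # random initialization: the run-to-run difference
    for _ in range(epochs):    # few epochs, so the starting point still matters
        error = w * x + b - y
        w -= lr * 2 * np.mean(error * x)  # gradient of the mean squared error
        b -= lr * 2 * np.mean(error)
    return float(w), float(b)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
```
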

bold timber
#

this is my model that I've built

mild dirge
#

Yeah but when you test it the second time, did you fit the model again as well?

bold timber
#

this is my first plot after running the model

#

and this is my second plot, after running the model a second time

feral acorn
#

hello, im studying machine learning and rn im studying linear models and least squares section, i saw this particular equation. is it necessary for me to remember this equation? Also i can somewhat understand what this equation does so is it fine for me to proceed further? or do i need to do some problem solving sums related to it?

wooden sail
#

all of machine learning deals with that equation, so you certainly need to remember it

feral acorn
wooden sail
#

one way to interpret it geometrically is as an affine hyperplane. this means that, if you work in 1D, this equation represents a point on a line

#

if you work in 2d, it represents a line on a plane. in 3D, it represents a plane inside 3D space. in 4d, a 3D hyperplane inside 4D space, and so on

wooden sail
#

because the sum written there in sigma notation is the same expression as a dot product

#

that means that you can take several equations of this kind, and write them as a single matrix equation

#

you reach the familiar y = Ax + b when doing so, and machine learning deals extensively with such equations
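
A small NumPy check of the points above, with made-up numbers: for one observation, the sigma form y_hat = beta_0 + sum over j of x_j * beta_j is a dot product, and stacking all observations as rows of X turns it into one matrix-vector product (the y = Ax + b shape).

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 observations (rows), p = 3 features (columns)
beta0 = 0.5
beta = np.array([0.1, -0.2, 0.3])

row = X[0]
# 1. Sigma notation for one observation: beta0 + sum over j of x_j * beta_j
y_sum = beta0 + sum(row[j] * beta[j] for j in range(len(beta)))
# 2. The same sum written as a dot product
y_dot = beta0 + row @ beta
# 3. Every observation at once: one matrix-vector product plus the intercept
y_all = X @ beta + beta0
```
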

feral acorn
#

okay so here's whats happening here, we are doing a dot product between Xj and B-hat j. and we are doing this for P times and then adding all these dot product values and then adding a B-hat 0 value right??

wooden sail
#

not quite

#

or well, that could actually be

feral acorn
#

oh

#

i thought sigma means summation

wooden sail
#

the equation as it is is ambiguous unless you tell me which quantities are vectors, matrices, and scalars

#

it does mean summation, but idk if the stuff inside are scalars or vectors

feral acorn
#

so like how does knowing if the stuff inside is a scalar or vector affect the addition??

wooden sail
#

yes

feral acorn
#

oh yes

#

vector addition and scalar additions are different

#

so is that what's affecting it??

#

or am i wrong?

wooden sail
#

multiplication is the issue, not addition

feral acorn
wooden sail
#

multiplication of matrices and vectors adds in other summations

feral acorn
#

ooh

wooden sail
#

.latex e.g. if $\boldsymbol{x} = [x_1, x_2, \dots, x_n]$ and $\boldsymbol{y} = [y_1, y_2, \dots, y_n]$ are vectors and we take their dot product $\boldsymbol{x} \cdot \boldsymbol{y} = \langle \boldsymbol{x} , \boldsymbol{y} \rangle = \boldsymbol{x}^{T} \boldsymbol{y}$

strange elbowBOT
wooden sail
#

sigh this thing

feral acorn
#

damn

wooden sail
#

.latex e.g. if $\boldsymbol{x} = [x_1, x_2, \cdots, x_n]$ and $\boldsymbol{y} = [y_1, y_2, \cdots, y_n]$ are vectors and we take their dot product $\boldsymbol{x} \cdot \boldsymbol{y} = \langle \boldsymbol{x} , \boldsymbol{y} \rangle = \boldsymbol{x}^{T} \boldsymbol{y}$

strange elbowBOT
wooden sail
#

well, i'm not gonna try and spam it here

feral acorn
#

XD

eager remnant
#

Hello peeps i have a Web scraping question
hey guys i need to know if there's a name or if what i'm trying to do is even possible. Let's say i have two functions

def a(self,response):
  yield from response.follow_all(xyz, self.b) 

def b(self, response):
  yield{
      "x":'x',
      "y":'y'
   }

result will be something like this

[{
      "x":'x',
      "y":'y'
   },{
      "x":'x',
      "y":'y'
   },{
      "x":'x',
      "y":'y'
   }]

What if i wanted to add "z":'z' but from the function a ? is it possible to do something like

def a(self,response):
  yield { "z":'z',
    yield from response.follow_all(xyz, self.b) 
  }
def b(self, response):
  yield{
      "x":'x',
      "y":'y'
   }
misty flint
eager remnant
# eager remnant Hello peeps i have a **Web scraping question** hey guys i need to know if there'...

basically i have a structure like this

/sectorsPage          / ==== needed for sector name
--//categoriesPage    \ ==== needed for sector name
----//subCategoriesPage
-------//organizationsPage
----------//organizationDetailsPage <-- needed for coordinates
----------//organizationDetailsPage
----------//organizationDetailsPage
----------//...
-------//organizationsPage
-------//...
----//subCategoriesPage
----//subCategoriesPage
----//....
--//categoriesPage
--//categoriesPage
--//....

sample output

[{"Nom": "ETS TAKTADSQK MOURAD  ", "Description": "      Notre tablissement toute industrie et les besoins de nos clients....", "category": "                ", "Addresse": " BP N 16 3089 ", "Tel": " ", "Fax": " ", "E-mail": " ets.@gmail.com", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1="},

{"Nom": "", "Description": "      Raison sociale : ", "category": null, "Addresse": "    ariana", "Tel": " ", "Fax": " ", "E-mail": " @yahoo.fr", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1="},

] 
#

but i want it to be

[{"Nom": "ETS TAKTADSQK MOURAD  ", "Description": "      Notre tablissements besoins de nos clients....", "category": "                ", "Addresse": " BP N 16 3089 ", "Tel": " ", "Fax": " ", "E-mail": " ets.@gmail.com", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1=","SECTOR":"finance"
},
{"Nom": "", "Description": "      Raison sociale : ", "category": null, "Addresse": "    ariana", "Tel": " ", "Fax": " ", "E-mail": " @yahoo.fr", "URL": "http://www.made-in-tunisia.net/vitrine/index.php?tc1=","SECTOR":"finance"},
] 
feral acorn
wooden sail
#

here we go

#

what range?

#

and what theta

feral acorn
#

oh fuck

#

rangle**

#

my bad

#

nvm that

feral acorn
wooden sail
feral acorn
#

now it cleared my doubts

#

also

#

should i look into how this was proved? or should i just remember this as it is?

wooden sail
#

here's the final form

wooden sail
feral acorn
wooden sail
#

by using latex

#

but the bot here is in... poor shape

#

so i'm on overleaf

feral acorn
feral acorn
wooden sail
#

not corollary. definition

feral acorn
#

oh

wooden sail
#

this is what i wrote on overleaf

feral acorn
#

okay okay thank you so much for the explanation :)

feral acorn
wooden sail
#

latex is good to learn because it lets you nicely typeset your maths

feral acorn
#

ooh

wooden sail
#

this is what people use to write papers, books, magazines, blogs, etc. that are math-heavy

feral acorn
#

oh sheesh

wooden sail
#

and overleaf is an easy way of using latex, since you don't have to download anything nor compile it yourself

misty flint
#

yep yep

feral acorn
#

now i understood how they typed all that

misty flint
#

overleaf is nice

feral acorn
#

and i was trying to do that on microsoft word 💀

misty flint
#

def worth bookmarking if you havent already

wooden sail
#

microsoft word also allows tex-style input in equation environments

feral acorn
#

i only know header and body in word XD

#

also ik its a dumb question but im reading this book called "The Elements of Statistical Learning" and i have no knowledge in ml

#

is it a nice idea?

#

or should i stop and start with a different book or resource?

wooden sail
#

presumably you'll learn what you need there

#

if you find it's going too fast, go back to something simpler

feral acorn
#

idk much about the resources but thanks ill look up online for some

#

its taking me some time to understand it

wooden sail
#

you can't escape learning at least basic linear algebra and multivar stats if you're doing ML, so you'll have to pick it up somewhere or another

feral acorn
#

its just that i need some basic brushup which im doing currently by following 3blue1brown's videos

wooden sail
#

i disagree, given the questions you just asked 😛

wooden sail
#

the stuff i just wrote out for you is week 1 of linear algebra

#

those are just definitions of basic operations with matrices

feral acorn
#

so is my approach to ml wrong?

wooden sail
#

there's no such thing, learning is different for everyone

#

and stats is such a wild, weird field that you can almost learn it disjointly from everything else

feral acorn
#

oof

wooden sail
#

at some point you might need linalg for it though, so you should brush that up sooner rather than later

#

anyway, the interpretation of this stuff is the intersection of affine hyperplanes

#

each equation of the kind you shared is the equation of a hyperplane that may or may not pass through the origin

feral acorn
#

ahh so far i need to look into scalars and vectors and matrices right??

wooden sail
#

mhm

feral acorn
#

thank you

#

also

#

may i ask how did you learn ai and ml?

#

it might help me in guiding me into it

wooden sail
#

i did a masters and am doing a phd 😛

#

so, going to class, reading books and papers, and writing papers

feral acorn
#

oof

#

thats sick man

wooden sail
#

i like Gilbert Strang's linalg book

#

Moses and Stoica's Spectral Analysis of Signals

#

Axler's Linear Algebra Done Right (this one is on the harder side)

#

Statistical Signal Processing by Kay

wooden sail
#

idk, still not sure if academia is my thing to begin with

silent flare
#

Hello! i'm going crazy.. i don't understand why with matplotlib, once i make an interactive graph, the graph is not updated on slider change. it must be something wrong with the update function but i really don't understand. Could anyone give a look? Many thanks! https://paste.pythondiscord.com/uhilebikal

misty flint
gray blade
hollow sentinel
#

make small projects

#

don't get stuck up on the math, the math will come with practice

#

and it's gonna take time so be patient

lapis sequoia
#

How long does it take to learn PyTorch?

agile cobalt
#

depends on how you define "learning" it, and how much previous knowledge you have

feral acorn
feral acorn
lapis sequoia
#

functions and relations

#

A little trigonometry

agile cobalt
#

if you know how neural networks are structured, learning how to use PyTorch should be simple
actually understanding why you build them the way they're built or getting good results from your projects is a completely different topic

gray blade
feral acorn
wooden sail
#

if you just want to use existing libs, you probably need nothing more than some base intuition. if you wanna produce new results yourself, you need the maths

feral acorn
wooden sail
#

just in general, since you're all discussing different studying methods and different math backgrounds

feral acorn
#

Yes, our college taught us some maths like complex integration and matrices and vectors, so I thought I'd be able to get into ML

lapis sequoia
#

is PyTorch capable of doing all the things?

#

@feral acorn show me your notebook?

feral acorn
lapis sequoia
#

It will help me

feral acorn
#

I don't have a well maintained notebook lol

lapis sequoia
agile cobalt
feral acorn
#

But if i have to go through, its calculus (differentiation, integration, maxima, minima), linear algebra (matrices and vectors and scalars),and iirc some statistics

agile cobalt
#

"doing all the things" is way too broad, but reality is: AI is not magic. They are made to do one thing each, but they do it well and fast

hollow sentinel
#

gets your hopes down 💀

#

they're not wrong

feral acorn
hollow sentinel
#

some

feral acorn
#

Oof

hollow sentinel
#

some statistics

lapis sequoia
#

I didn't understand that one. hopes == self-esteem?

feral acorn
#

Yes i don't recall most of the statistics

#

So I'm saying that ik some statistics

agile cobalt
feral acorn
lapis sequoia
#

Statistics is for ds algo

feral acorn
agile cobalt
# feral acorn What do you mean by "how you intend to use it"?

if you just want to make a simple "is it a dog or a cat" model, you can download an existing model and use it easily without understanding anything that's happening behind the scenes
if you want to make something that does not exist anywhere in the world yet, good luck

lapis sequoia
hollow sentinel
#

e.g besides downloading datasets from kaggle and running models on them

#

like classification and regression models

agile cobalt
hollow sentinel
#

which my github will be full of soon

#

sith kermit

agile cobalt
#

like I said before, AI is not magic - they are taught to do one task, and you must have ways of giving it input & reading the output

feral acorn
lapis sequoia
agile cobalt
hollow sentinel
#

it sounds like you need help framing the problem

lapis sequoia
#

Btw can you break captcha by ai-ml? 😂

#

Break=solve

wooden sail
#

the whole point of CAPTCHAs is for your input to be used to train AI

#

it's a two birds, one stone thing

slim dew
#

welp

#

i got an idea.

#

take the captcha, redraw it, then search for it with Google Images. it should show the thing.

#

idk not sure just a concept

spiral palm
#

weather data-driven art project using matplotlib

#

thinking of ways to carry out the next step of displaying it in places, i was thinking of just generating one per day and putting it on a website but i could also automate posting one to Twitter and embedding that on a site

median fog
#

Can anyone suggest any good courses or certifications for machine learning within python?

wooden sail
#

there's a line of TensorFlow courses on Coursera that covers the basics without going in depth into the math. you can ask for financial aid if you're a student of any kind, letting you take them for free while still getting a certificate

gray blade
tired oak
#

this has probably been asked plenty of times here, but does anybody know any fully structured path for data science/machine learning? there is lots of stuff online but it's hard to know what to read/watch first before moving to the next step

blazing bridge
#

Im doing text classification and I have 5 labels

#

1,2,3,4,5

#

and this is my model

#

When I set the last dense layer to 5 units, I get what's in the first image I sent

#

Loss of nan

#

but when I put 6 into the last dense layer I get a normal output. Why is that and can someone tell me if im doing something wrong

mild dirge
#

The labels should probably be 0, 1, 2, 3, 4 @blazing bridge

#

I would think, but not sure

blazing bridge
#

the labels are 1,2,3,4,5

mild dirge
#

So what does the model output?

blazing bridge
#

its supposed to predict these 5 labels

mild dirge
#

Right, but what does it predict?

#

can you show the output

blazing bridge
#

I haven't made predictions with it yet

#

I can try doing that

#

but idk why it wouldnt be 5 in the last dense layer instead of 6

mild dirge
#

What I think is that the model outputs 5 logits, and gets the index of the highest logit, which goes from 0 through 4

#

and your labels go up to 5

#

thus giving an undesired result when comparing them

#

But when increasing the final dense layer, the output is size 6, so the index can go to 5 now

#

Giving a valid loss

blazing bridge
#

so do I just keep it at 6 then

#

because at 5 nothing rlly happens

mild dirge
#

You reduce the labels by 1

blazing bridge
#

so I reduce every label by 1

#

can i use a for loop for that

mild dirge
#

you have a vector labels?

#

tensor or numpy array

#

Just do labels = labels - 1

blazing bridge
#

I have a list

#

that I then convert to a numpy array

#

it works now

#

with the dense layer of 5 at the end

#

I just used this for loop

#

final_labels = []
for i in range(0, len(labels)):
    final_labels.append(labels[i] - 1)
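
The loop can also be written as one vectorized NumPy operation. Assuming the model is compiled with sparse categorical cross-entropy (likely, given integer labels), Keras expects class indices from 0 to num_classes - 1, which is why labels 1 to 5 broke a 5-unit output layer, and why a predicted index needs +1 to map back to the original label. Stand-in data:

```python
import numpy as np

labels = [1, 2, 3, 4, 5, 3]  # stand-in labels using classes 1..5
num_classes = 5

# Same as the loop: shift every label down by one, all at once
final_labels = np.asarray(labels) - 1

# Class indices must lie in [0, num_classes) to match the 5 output units
assert final_labels.min() >= 0 and final_labels.max() < num_classes

# Going back: a predicted class index i corresponds to original label i + 1
predicted_index = 4
original_label = predicted_index + 1
```
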

mild dirge
blazing bridge
#

also any recommendations to lower validation loss?

#

theres a huge difference in loss values between training and validation

mild dirge
#

Look up "overfitting"

blazing bridge
#

yeah i tried a lot of dropout layers but my validation loss is still so different from the training loss

#

and the validation loss keeps going up
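
A validation loss that rises while the training loss keeps falling is the classic overfitting pattern, and early stopping is one common response: keep the weights from the epoch with the lowest validation loss and stop once it has not improved for a few epochs. Keras has this built in as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`; the plain-Python sketch below only shows the logic, on a made-up loss curve:

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), 0-based, for a list of validation losses."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0  # new best: reset patience
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch


# Made-up curve: validation loss improves, then climbs as the model overfits
history = [1.0, 0.8, 0.7, 0.75, 0.8, 0.9, 1.0]
stop, best = early_stopping_epoch(history, patience=3)
```
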

charred light
#

If I have a variable age (years, bounded by 1-99) with missing values, is there a difference between making age a categorical variable with "missing/na" as a category vs keeping age as a continuous variable with a null/missing flag indicator variable? I feel like they are the same, as age acts as a discrete variable in this case.

serene scaffold
mild dirge
#

You probably don't want to make it categorical/1-hot encoded, as the output for someone with age 43 will likely be similar to output for someone with age 44, whereas categorical data would not maintain this relation.

hollow sentinel
charred light
charred light
mild dirge
#

If you think you know how to group the ages then maybe yeah

charred light
#

I guess I'll find out once I try both and get the results. But I was just curious if there's some theoretical justification that's already known.

mild dirge
#

But maybe there is a big difference between a 51 year old and 53 year old, but you are binning it in 50-60

#

So you need to think about that

charred light
#

Yes, that's a good point too.
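
A small pandas sketch of the two encodings being compared, with made-up ages; the median fill value and the bin edges are arbitrary choices. Option 1 keeps the ordering (43 and 44 stay close), while option 2 treats each bin as unrelated to the others.

```python
import numpy as np
import pandas as pd

ages = pd.Series([23, np.nan, 43, 44, 67], name="age")  # made-up ages

# Option 1: keep age numeric, fill the gap, and add a 0/1 "was missing" flag
numeric = pd.DataFrame({
    "age": ages.fillna(ages.median()),
    "age_missing": ages.isna().astype(int),
})

# Option 2: bin age into categories, with "missing" as its own category
bins = pd.cut(ages, bins=[0, 30, 50, 100], labels=["<=30", "31-50", "51+"])
binned = pd.get_dummies(bins.cat.add_categories("missing").fillna("missing"), prefix="age")
```
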

native rune
#

How do I apply multilingual BERT to non-English text classification?

#

Does anyone have idea or if you have any good resources please share them

dusk tide
#

Is this a normal distribution??

wooden sail
#

looks more poisson-like to me

clever sorrel
#

is there anyone who's comfortable with cv?

#

one word is not getting detected in my code properly

lapis sequoia
#

I have that problem too

rose agate
#

from my 3.5 minutes of cv research I know that you should usually transform the image so it's black text on a white background and sometimes use small amounts of gaussian blur if the text is sharp

short heart
#

Hey guys I need a little advice. I have a feature which consists of the age when the person started drinking/smoking. The problem is that there are people who have never drunk/smoked and their values are nan. What's the best way to fill them?
I tried filling with high ages like 100 so that the model would think that the later the person starts drinking/smoking the better it is, but it didn't improve the score

wooden sail
#

sounds like something rather to be included in the cost function, not as a feature

#

you could have an extra penalty term added to the cost function, and it is multiplied by something like not isnan. then for people that never started, there is a nan and this term is multiplied by 0

brisk sage
#

I'm writing a paper on my doctoral thesis in medicine and did all the statistical analysis using python and scipy/pandas. How likely is it, that there are differences from python to SPSS, if the exact same test is performed with both?
Since my tutor is unfamiliar with python (as most physicians have no idea about anything IT related) I'd like to know, how "scientifically acknowledged" a pure python analysis is

spiral furnace
#

what is the difference between doing `newdf = df` and `newdf = df.copy()`?

#

if I just do newdf = df and then change something in df then I should see the change in the newdf too?
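
Yes: `newdf = df` only binds a second name to the same DataFrame object, so changes made through either name show up in both, while `df.copy()` makes an independent copy (deep by default). A minimal demonstration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
newdf = df           # a second name for the same object
copied = df.copy()   # an independent copy of the data

df.loc[0, "a"] = 100
```
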

wooden sail
# brisk sage I'm writing a paper on my doctoral thesis in medicine and did all the statistica...

it should be fine, the mathematics are the same. what might change are default parameters for the analysis, e.g. the number of bins for histograms, whether normalization is done, if means, variances etc. are computed as if for a sample or a full population, and the like. that means it is possible to tweak the parameters of spss to do exactly what python does, and also the other way around
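
One concrete example of such a default, within Python itself: NumPy's `np.var`/`np.std` use the population formula (ddof=0) while pandas' `.var()`/`.std()` use the sample formula (ddof=1), so the same data gives two different variances unless you set `ddof` explicitly. The numbers are made up:

```python
import numpy as np
import pandas as pd

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

population_var = np.var(values)            # NumPy default ddof=0: divide by n
sample_var = pd.Series(values).var()       # pandas default ddof=1: divide by n - 1
numpy_sample_var = np.var(values, ddof=1)  # matching pandas explicitly
```
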

serene scaffold
bleak coyote
#

Hey, does anyone have good materials for PCA?

#

I'm trying to do one based on gender and political parties, but it's not working out

#

😢

wooden sail
#

you can look up covariance matrices, eigenvalue decomposition, and singular value decomposition to learn more about it
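The small NumPy sketch below connects those three topics, using random made-up data. It shows that PCA via the eigendecomposition of the covariance matrix and PCA via the SVD of the centered data give the same explained variances. They also give the same principal axes, up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing  # made-up correlated data

Xc = X - X.mean(axis=0)  # PCA works on centered data
n = Xc.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)              # same as np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = S**2 / (n - 1)  # singular values -> explained variances

print(np.allclose(eigvals, svd_vars))              # True
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))  # True: same axes up to sign
scores = Xc @ Vt.T  # data projected onto the principal components
```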

native rune
#

I mean, comparing the text present in the image with the original text (a .txt file) and applying a BERT model to detect errors and grammatical mistakes

native rune
bleak coyote
#

Because I only have data with Name, Surname, Gender, and Party columns

bleak coyote
wooden sail
#

yeah you have to turn everything into numbers somehow

#

you can do one-hot encoding or just assign a scalar to each class (the subspace dimension is invariant, but the principal components themselves will look different depending on what you choose)
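Here is a minimal pandas sketch of both encodings, on a hypothetical four-row table with the Name, Surname, Gender, and Party columns mentioned above. The names are dropped first, since they act as identifiers rather than features. The resulting numeric frame is what you would then feed to PCA, or inspect with df.corr().

```python
import pandas as pd

# Hypothetical data with the columns mentioned above.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cem", "Dee"],
    "Surname": ["A", "B", "C", "D"],
    "Gender": ["F", "M", "M", "F"],
    "Party": ["Red", "Blue", "Green", "Blue"],
})

# Names/surnames are identifiers, not features, so drop them first.
features = df.drop(columns=["Name", "Surname"])

# Option 1: one-hot encoding, one 0/1 column per category.
onehot = pd.get_dummies(features, columns=["Gender", "Party"], dtype=float)
print(onehot.columns.tolist())
# ['Gender_F', 'Gender_M', 'Party_Blue', 'Party_Green', 'Party_Red']

# Option 2: one integer code per category (categories sorted alphabetically).
codes = features.apply(lambda col: col.astype("category").cat.codes)
```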

bleak coyote
#

Yeah, I did one-hot encoding on gender

#

Thanks for the help

#

Found this Kaggle article, which is actually really useful

wooden sail
#

the names are probably not important in your analysis

bleak coyote
#

Never even knew about df.corr()

compact egret
#

hi, I'm trying to develop a deep learning model. Is it bad that my loss on the training data is higher than on the validation data?

#

or not necessarily

mild dirge
#

Are you using dropout?

compact egret
#

no

mild dirge
compact egret
#

ok nvm, I have no idea what dropout is, I just got started xd

serene scaffold
plucky ibex
acoustic halo
#

There are loads of reasons it can happen: dropout, accidentally using your validation data while training, or the validation data simply being easier to classify
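To see why dropout alone can push training loss above validation loss, here is a minimal NumPy sketch of inverted dropout. Units are randomly zeroed only while training, and the layer is the identity at evaluation time. Frameworks such as Keras and PyTorch likewise disable dropout in evaluation mode, so the training loss is measured on a noisier network than the validation loss. The function signature is an illustrative assumption, not a framework API.

```python
import numpy as np


def dropout(x, p, training, rng):
    """Inverted dropout: zero each unit with probability p, but only in training."""
    if not training or p == 0.0:
        return x  # evaluation: no units dropped
    keep = rng.random(x.shape) >= p
    # Scale survivors by 1/(1-p) so the expected activation stays the same.
    return np.where(keep, x / (1.0 - p), 0.0)


rng = np.random.default_rng(0)
h = np.ones((4, 1000))
train_out = dropout(h, p=0.5, training=True, rng=rng)
eval_out = dropout(h, p=0.5, training=False, rng=rng)
print((train_out == 0).mean())      # roughly half the units are dropped
print(np.array_equal(eval_out, h))  # True: no noise at validation time
```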

native rune
native rune
acoustic halo
#

That's not classification though

#

Classification would involve giving discrete labels

#

Plus, depending on how you are calculating accuracy, ML seems a bit overkill anyway

brisk sage
# wooden sail it should be fine, the mathematics are the same. what might change are default p...

Thanks for the answer, and sorry for the late response, I didn’t see it earlier.

So you’re saying that I can trust the results Python gives me to be the same (for something like an independent Student's t-test, Wilcoxon signed-rank test, Shapiro-Wilk test, or Spearman correlation), without having to validate them in SPSS (and therefore learn how to do that, plus pay for the very expensive program)?
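One concrete way to check which variant SciPy is running: scipy.stats.ttest_ind defaults to Student's t-test (equal_var=True), and passing equal_var=False gives Welch's t-test. When comparing against any other program, this is the kind of option to match. The two samples below are made up.

```python
from scipy import stats

a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [6.0, 4.0, 7.5, 3.5, 6.8, 5.9, 4.4]

# SciPy's default: Student's t-test, which assumes equal variances.
student = stats.ttest_ind(a, b)
# Welch's t-test drops the equal-variance assumption.
welch = stats.ttest_ind(a, b, equal_var=False)

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)

# The other tests mentioned also exist in scipy.stats:
# stats.wilcoxon(x, y), stats.shapiro(x), stats.spearmanr(x, y)
```

With unequal group sizes and spreads like these, the two variants give different statistics and p-values, which is exactly the kind of mismatch that looks like a software difference but is really a parameter difference.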

hallow salmon
#

here, for each state code I need to add up the boys from the different districts and make it a single row

native rune
native rune
acoustic halo
#

Fair enough, since you said "is text classification according to labels possible using bert?", one would assume you wanted to classify something

#

How exactly do you want to find the errors between them? Like count the number of letters different?
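If counting differing letters is the goal, the usual measure is the Levenshtein (edit) distance. It is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Here is a plain-Python sketch with the standard two-row dynamic programming table; the example strings are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


ocr_text = "Teh quick brown fox"
original = "The quick brown fox"
print(levenshtein(ocr_text, original))  # 2
```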

brisk sage
brisk sage
#

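or nested in a for loop:

```python
# codes is assumed to hold the state codes to report, e.g. codes = df["state_cd"].unique()
for code in codes:
    # single quotes inside the f-string: reusing " there is a SyntaxError before Python 3.12
    print(f"Code: {code}, Total Boys: {df[df['state_cd'] == code]['Boys_total'].sum()}")
```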
serene scaffold
#
df.loc[df['state_cd'].eq(36), 'boys_total'].sum()
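For a result with one row per state, instead of one code at a time, a groupby sum is a common pandas pattern. Here is a minimal sketch on made-up district rows. The column is spelled boys_total here, but the two snippets above disagree on capitalization (Boys_total vs boys_total), so match whatever the real data uses.

```python
import pandas as pd

# Hypothetical district-level rows.
df = pd.DataFrame({
    "state_cd": [36, 36, 36, 12, 12],
    "district": ["d1", "d2", "d3", "d4", "d5"],
    "boys_total": [100, 250, 50, 70, 30],
})

# One row per state code, boys summed across that state's districts.
per_state = df.groupby("state_cd", as_index=False)["boys_total"].sum()
print(per_state)
```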
#

also, people can ask pandas questions in this channel.

#

@hallow salmon did you figure it out?

hallow salmon
serene scaffold
still frost
#

yo, I need help with matplotlib. Basically I have a huge dataframe containing a date and data from 100+ countries, and I need to show all of those countries' names as labels. I currently have this

mild dirge
#

Yeah well, that's what you get when you plot 100 lines

#

What did you expect?

still frost
#

yes but i need to show them, like this

mild dirge
#

Right but that's 2 lines, you have 100

still frost
#

I have no choice, it's kind of a requirement

mild dirge
#

100 lines in a single plot will be chaotic and uninformative

#

Let alone having a legend with the 100 labels too

still frost
#

hmmmm, you're right, maybe i should just get the average instead

mild dirge
#

Or separate into multiple plots, or make a grid of plots or something

#

Or a histogram, but 100 lines isn't the way to go
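Here is a minimal matplotlib sketch of the grid-of-plots suggestion. It assumes a wide DataFrame with a date column plus one column per country; the data below is random and the country names are placeholders. Each country gets its own small panel with its name as the panel title, which keeps every label visible without a 100-entry legend.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical wide data: a date column plus one column per country.
rng = np.random.default_rng(0)
countries = [f"Country {i}" for i in range(100)]
df = pd.DataFrame(rng.normal(size=(30, 100)).cumsum(axis=0), columns=countries)
df.insert(0, "date", pd.date_range("2022-01-01", periods=30, freq="D"))

ncols = 10
nrows = int(np.ceil(len(countries) / ncols))
fig, axes = plt.subplots(nrows, ncols, figsize=(25, 25),
                         sharex=True, sharey=True, squeeze=False)

for ax, country in zip(axes.flat, countries):
    ax.plot(df["date"], df[country], linewidth=1)
    ax.set_title(country, fontsize=8)  # the country name lives in the panel title
    ax.tick_params(labelsize=6)

# Hide leftover panels when the grid has more cells than countries.
for ax in axes.flat[len(countries):]:
    ax.set_visible(False)

fig.tight_layout()
# fig.savefig("countries_grid.png", dpi=80)
```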

brisk sage
frigid sparrow
#

@still frost do you have to make a lineplot?

uneven cargo
#

Would anyone like to give some feedback on an API for running embeddings-based classification, segmentation and reranking?

serene scaffold
uneven cargo
#

For example, the goal is to make it super flexible so that it can provide classification with BYO labels and no fine-tuning/training. Calling it https://similarity.ai, keen for feedback!

#

(based on things like kmeans and other algs for segmentation)
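For context, a common way to do classification with bring-your-own labels and no training is to embed both the items and the label names. Each item then gets the label with the highest cosine similarity. This NumPy sketch uses toy 3-D vectors in place of real embeddings. It is a generic illustration, not the similarity.ai API.

```python
import numpy as np


def classify_by_similarity(item_vecs, label_vecs, labels):
    """Assign each item the label whose embedding has the highest cosine similarity."""
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    protos = label_vecs / np.linalg.norm(label_vecs, axis=1, keepdims=True)
    sims = items @ protos.T  # (n_items, n_labels) cosine similarities
    return [labels[k] for k in sims.argmax(axis=1)]


# Toy "embeddings"; in practice these would come from an embedding model.
labels = ["sports", "finance"]
label_vecs = np.array([[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]])
item_vecs = np.array([[0.9, 0.0, 0.1], [0.1, 0.3, 0.8]])
print(classify_by_similarity(item_vecs, label_vecs, labels))  # ['sports', 'finance']
```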

serene scaffold
royal hound
#

Yo, I'm back. I still don't know how I would go about doing what I want to do

#

I want to make an AI that analyzes a certain file type (MIDI) and tries to generate new ones based on what it learns

serene scaffold
royal hound
#

not quite

#

i looked at some projects

#

they don't really do what I want to do

#

they really focus more on generating sound

#

than midis

serene scaffold
#

your problem statement is kind of underspecified.

royal hound
#

how should I specify it?

#

okay let me try

#

and specify again

serene scaffold
#

AIs don't just accumulate arbitrary data. you have to know what you expect to happen when you train it on a given MIDI

royal hound
#

so a MIDI file holds data about notes

each note in the MIDI file contains data like where it is, when it's played, its tone, pitch, velocity, etc.

I want to make an ML/AI that can look at these MIDIs and learn how they are made. After that, each generation needs to output a set of MIDI files, like 10-50, and then I rate them myself to give the AI some feedback

serene scaffold
royal hound
#

like try and guess/randomly make one

#

until it consistently makes something good

serene scaffold
#

what are the MIDIs going to be? music, or something else?

royal hound
#

MIDIs are just files that hold data of the notes

#

and then you can later use it urself to create music

serene scaffold
#

so are the MIDIs going to be completely random noise?

royal hound
#

it's not noise

serene scaffold
#

then what is it?

royal hound
#

i just said

#

like u know

#

c1 c#1 d1 d#1

#

then when you drag and drop it into music software like FL Studio

#

it reads it

#

and then the sound is created through FL Studio

#

the MIDI file is just data about the notes, that's all

#

Unlike regular audio files like MP3s or WAVs, these don't contain actual audio data and are therefore much smaller in size. They instead explain what notes are played, when they're played, and how long or loud each note should be.

Files in this format are basically instructions that explain how the sound should be produced once attached to a playback device or loaded into a particular software program that knows how to interpret the data.

This makes MIDI files perfect for sharing musical information between similar applications and for transferring over low-bandwidth internet connections. The small size also allows for storing on small devices like floppy disks, a common practice in early PC games.

acoustic halo
#

You would probably have to find a way to convert your input midi files into something a model can make use of, then do the inverse with the result
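One simple representation for that step is a piano roll: a (128 pitches x time steps) array, here holding velocities. The sketch below assumes the notes have already been read from the MIDI file and quantized to integer time steps; a library such as mido or pretty_midi can do the reading. The Note class is hypothetical. The decoder inverts the encoding, with one known limitation noted in the code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI note number, 0-127 (60 = middle C)
    start: int     # start time in quantized time steps
    duration: int  # length in time steps
    velocity: int  # loudness, 1-127


def to_piano_roll(notes, n_steps):
    """Encode notes as a (128, n_steps) array of velocities (0 = silent)."""
    roll = np.zeros((128, n_steps), dtype=np.int16)
    for n in notes:
        roll[n.pitch, n.start:n.start + n.duration] = n.velocity
    return roll


def from_piano_roll(roll):
    """Turn each run of equal nonzero values back into a Note.

    Limitation: two back-to-back notes with the same pitch and velocity
    merge into one long note.
    """
    notes = []
    for pitch in range(roll.shape[0]):
        row = roll[pitch]
        t = 0
        while t < len(row):
            if row[t] > 0:
                start, vel = t, int(row[t])
                while t < len(row) and row[t] == vel:
                    t += 1
                notes.append(Note(pitch, start, t - start, vel))
            else:
                t += 1
    return sorted(notes, key=lambda n: (n.start, n.pitch))
```

A model would then train on (slices of) these arrays, and its generated arrays would go back through from_piano_roll before being written out as MIDI.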

royal hound
#

yes, I've already done that

#

I just don't know how to process it with TensorFlow

#

I don't depend on tutorials unless it's complete beginner stuff

#

which is what I'm struggling to find for TensorFlow

frigid sparrow
#

uhm tensorflow homepage?..

royal hound
#

the only thing i know how to do right now is to make a model save the model and load the model

acoustic halo
#

yeah, TensorFlow has loads of documentation and tutorials, though personally I would use something like Keras or PyTorch. how much machine learning do you actually know?

royal hound
#

not much

mild dirge
#

Yeah when reading through what you want, it seems like you may just be in over your head

acoustic halo
mild dirge
#

This seems like a pretty complex task, and you don't have experience with ml

royal hound
#

hm

#

that's fair

acoustic halo
#

You would be better off following a structured course in ML to understand the actual concepts at work

royal hound
#

any sources you can link me?

#

im looking at the tensorflow homepage right now

frigid sparrow
#

isn't there something on it about MIDI files as an example?

royal hound
#

also, I don't think it would be that hard, I have a large dataset

frigid sparrow
#

uhm, so how much data science and/or ai/ml experience do you have @royal hound ?

royal hound
#

I have good experience in data science

#

I haven't gotten into ML/AI yet

#

this will be my first project

frigid sparrow
#

so you know that a large dataset might not be good if you have a lot of variation within your data?

royal hound
#

there's not much variation

#

i know what i mean when i said large dataset

cedar sierra
#

Hi, I'm using Google Colab and I use multiple cells for my Disco Diffusion artworks. Is there any way to change the text in multiple cells at once? For example, I have the same sentence in different cells and I need to change it everywhere at once. Can anyone help with that, please?
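Here is one possible approach, assuming the sentence is used as a Python string in each cell. Define it once in a variable in an early cell and reference that variable everywhere else. Changing it in that one place and re-running the cells then updates them all. PROMPT and the other variable names are hypothetical.

```python
# --- first cell: define the sentence once ---
PROMPT = "a lighthouse in a storm, oil painting"

# --- later cells: build on the variable instead of retyping the sentence ---
prompt_cell_2 = PROMPT
prompt_cell_3 = f"{PROMPT}, wide angle"
```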

tacit basin
cedar sierra
misty flint
#

we don't talk about AWS on this server, do we? specifically deploying ML models, if possible

misty flint
serene scaffold
#

I'm trying to deal with AWS right now and I hate it

#

I think it should be a banned topic.

misty flint
#

stellll

serene scaffold
#

also AWS should be banned

#

I don't care how good the uptime or scalability are if it ruins my life otherwise.

tacit basin
#

Let's ban AWS and Rex for talking about it 😂

misty flint
#

I need to figure out the tradeoffs between hosting an ML model on SageMaker Serverless vs. a traditional Lambda + API Gateway architecture
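For the Lambda + API Gateway side of that comparison, here is a minimal Python handler sketch. It assumes an API Gateway proxy integration: the request JSON arrives as a string in event["body"], and the response needs statusCode and body. DummyModel and the "instances"/"predictions" keys are hypothetical stand-ins. Loading the model at module scope means it is loaded once per container on cold start, not on every request.

```python
import json


class DummyModel:
    """Hypothetical stand-in for a real model."""

    def predict(self, rows):
        return [sum(r) for r in rows]


# Module scope: runs once per Lambda container (cold start), then reused.
MODEL = DummyModel()


def lambda_handler(event, context):
    """API Gateway proxy integration: JSON request body in, JSON response body out."""
    payload = json.loads(event.get("body") or "{}")
    rows = payload.get("instances", [])
    preds = MODEL.predict(rows)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"predictions": preds}),
    }
```

Cold-start time for loading a real model is one of the main factors in this kind of tradeoff, since it is paid on the first request to each new container.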

serene scaffold
misty flint