#data-science-and-ml

warped gate
#

Fellow ChODS member here btw 😄

fresh fable
#

ah nice haha

#

I'm just messing around. I learnt the Hückel theory approximation quite some time ago but never really understood how the energies for benzene or other larger molecules are computed

#

3x3 or 4x4 matrices were quite easy to examine but the larger ones like benzene made me curious as to how they're computed

split drift
#

Are some computations faster in pyarrow than in numpy?

warped gate
#

I guess if you plug the values in directly, you could use numpy.linalg.det(array) to calculate the determinant for matrices with larger dimensions

fresh fable
#

but i have variables in the matrix (x)

#

like this

warped gate
fresh fable
#

how do you mean?

warped gate
#

what is the determinant finding here?

fresh fable
#

the determinant gives an equation in x which is set equal to 0

#

in this case it's a quartic equation

#

the roots of this equation are the coefficients i'm looking for
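
A numeric route to those roots, as a minimal sketch. It assumes the usual Hückel substitution x = (α − E)/β, so the secular determinant is det(xI + A) = 0 with A the adjacency matrix of the carbon skeleton (the matrix from the chat is not shown, so this form is an assumption). Under that assumption the roots are the negated eigenvalues of A, and no symbolic determinant is needed. The helper names are made up for illustration.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of an n-membered carbon ring (benzene: n = 6)."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1.0
    return a

def chain_adjacency(n):
    """Adjacency matrix of a linear chain of n carbons (butadiene: n = 4)."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    return a

def huckel_roots(adjacency):
    """Roots x of det(x*I + A) = 0, i.e. the negated eigenvalues of A."""
    return np.sort(-np.linalg.eigvalsh(adjacency))

butadiene = chain_adjacency(4)
benzene = ring_adjacency(6)

# Coefficients of the quartic det(x*I + A) for butadiene, highest power first.
quartic = np.poly(-butadiene)

butadiene_roots = huckel_roots(butadiene)
benzene_roots = huckel_roots(benzene)
print(np.round(quartic, 6))
print(np.round(butadiene_roots, 3))
print(np.round(benzene_roots, 3))
```

With this convention each root maps back to an orbital energy through E = α − xβ.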

fresh fable
#

Ah I see, thank you!

heady spoke
#

Hello, my name is Agustin. I am from Argentina. I am currently working in data analytics. I am trying to solve a problem with pandas, specifically with a use of the method .astype() that will be deprecated in the near future. I don't know how to replace it. The warning itself suggests this replacement: Use obj.tz_localize(None) or obj.tz_convert('UTC').tz_localize(None) instead

#

Can someone help me with this? I dont understand how to replace this function with those options

#

that is the message I get when executing the astype() method. It's more a warning than an error, but in the near future it will become an error
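
A minimal sketch of the two replacements named in the warning, assuming the column is a timezone-aware datetime Series (the real data is not shown, so the values and the timezone below are made up). On a Series, both methods are reached through the `.dt` accessor.

```python
import pandas as pd

# Hypothetical timezone-aware column; the real data from the chat is not shown.
s = pd.Series(pd.to_datetime(["2022-09-01 12:00", "2022-09-02 08:30"]))
s = s.dt.tz_localize("America/Argentina/Buenos_Aires")

# Option 1: drop the timezone and keep the local wall-clock time.
local_naive = s.dt.tz_localize(None)

# Option 2: convert to UTC first, then drop the timezone.
utc_naive = s.dt.tz_convert("UTC").dt.tz_localize(None)

print(local_naive.iloc[0])
print(utc_naive.iloc[0])
```

Which option fits depends on whether the downstream code expects local times or UTC times.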

lone nacelle
#

Hello, I have a question about numpy, specifically about numpy.linalg.eig. In the documentation, it says that it returns “normalized” eigenvectors. However, I don’t want them normalized for a project I’m working on. I’ve looked at stackoverflow, but there’s no suggestion that doesn’t involve going to another package sympy. Is there any way to use just numpy and calculate the eigenvectors of a matrix without the normalizing?
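
One numpy-only possibility, sketched under the assumption that any fixed scaling is acceptable: eigenvectors are only defined up to a nonzero factor, so the unit-length columns returned by `numpy.linalg.eig` can simply be rescaled afterwards. Here each column is divided by its largest-magnitude entry; the example matrix is made up.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, unit_vectors = np.linalg.eig(A)  # columns have Euclidean norm 1

# Rescale each column so its largest-magnitude entry becomes exactly 1.
# Any other nonzero scaling would be an equally valid set of eigenvectors.
rows = np.argmax(np.abs(unit_vectors), axis=0)
pivots = unit_vectors[rows, np.arange(unit_vectors.shape[1])]
rescaled = unit_vectors / pivots

print(eigenvalues)
print(rescaled)
```

The rescaled columns still satisfy A v = λ v, which is the only property an eigenvector needs.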

cursive pond
#

Hello, general question about machine learning from a beginner here: what does it mean to train a model? I often see that being said in context of machine learning, but based on my experience with kNN and GAs, I dont really get what it means. How would I "train" my GA or kNN algorithm? Or is training needed for other algorithms and methods besides those simple ones?

Also, how exactly can I imagine machine "learning"? How is my machine "learning" something by using kNN or GAs? I only see it as following a strict pattern/algorithm and coming to a solution that way?

I hope someone can answer my questions, thanks in advance!

young granite
#

If you look at this model you will notice a certain peculiarity for areas of your model that are not covered by data.

if you would then analyze known data with it and check the predictions in a truth matrix you could make statements about the quality of your model

cursive pond
young granite
cursive pond
#

For example genetic algorithms or the k-nearest neighbors algorithm?

young granite
#

u got 3 main methods: supervised, unsupervised and reinforcement learning

young granite
#

the method is how ur model handles ur data

cursive pond
#

Can you give an example for that?

young granite
#

different algorithms result in different predictions for same dataset

#

Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbour and Naive Bayes are the main ones

#

and it's worth saying that each of the above expects its input data in a different form, so u have to normalise the inputs, but always starting from the original dataset

#

so the dataset always is the same

#

if that makes any sense 🗿

#

maybe @serene scaffold can crosscheck my explanation

cursive pond
young granite
cursive pond
#

Lets say im doing supervised kNN?

#

Or, as another example, unsupervised GA?

young granite
#

basically its linear regression and u got functions in ur NN with y=f(x)

cursive pond
#

NN stands for...?

young granite
#

and u got labelled data with which ur NN can check autonomously if it was right or wrong

#

neural network

cursive pond
#

Oh, im not using NNs

young granite
#

u always do

cursive pond
#

The only thing I did so far is really just implementing the k-nearest neighbor algorithm to predict data, and an evolutionary algorithm

#

Is that not machine learning, if im just using those algorithms without a NN?

young granite
#

ok so u just use statistics?

cursive pond
#

Hmm yeah I guess

#

I basically got these algorithms from a machine learning tutorial series, so I thought this would already be some sort of learning, but ig its just statistics then? Where/when does the actual learning part come in? With NNs?

strong sedge
young granite
#

and as I understand it, what's happening inside the hidden layers of a NN -> Black Box, cause the network finds correlations and causalities from n-data

cursive pond
#

Ok so, how is the NN interacting with, for example, the kNN algorithm then on a higher level? Who is influencing/adjusting what?

young granite
#

u got input (blue), hidden (yellow, red) and output (green)
all got a function y=f(x) and weights (black strings). there are different approaches, but the easiest is forward, so blue->yellow->red->green

#

"what wires together fires together"

#

depending on the value for a given neuron it fires or it wont

#

and thats the learning part

strong sedge
#

fyi knn and nn are 2 fundamentally different algorithms
dont get confused

cursive pond
#

I got a general idea of NNs, but how do those inputs and outputs now interact with an algorithm like kNN? Where would I put that in a NN?

strong sedge
#

knn stands for k nearest neighbours and nn stands for neural networks

cursive pond
strong sedge
cursive pond
#

And my question is, how are you using them together?

serene scaffold
young granite
#

first then learning

#

statistics always first step

strong sedge
#

they are 2 different things to do something similar

#

like you can ride a car, or a bike

#

both do the same thing

#

but both are fundamentally different

cursive pond
#

Alright, is it the same for GAs and NNs? Or is that possible to be used tgt?

cursive pond
#

I saw some of those videos with GAs and NNs on youtube

#

Together

young granite
#

@strong sedge i could give statistic functions as neuron functions tho cant i?

strong sedge
cursive pond
young granite
#

as I understand it, and like I tried to explain to bruce, neurons have functions and weights applied to them (y=f(x))

cursive pond
#

Maybe because the kNN wouldnt make sense cuz its a supervised algorithm and there wouldnt be much left to learn if ur just comparing the data etc.?

young granite
#

so it would be possible to, say, make a prediction with knn and use the value of it

strong sedge
strong sedge
#

knn != nn

#

they are different

young granite
#

ofc

strong sedge
#

knn doesnt have a neuron in it

young granite
#

i do know that

#

but neurons got functions

#

so i assumed i could apply a function to the neuron like knn

#

different approaches yes

#

and nn is not always >> statistics but i thought i could "combine" if wanted

strong sedge
#

no

#

that is not how nn works

young granite
#

elaborate pls

strong sedge
#

nn doesn't work on statistics, rather calculus
knn also doesn't work on statistics, it works on the distance between points

there is a separate algorithm called naive bayes that works on a statistical idea called Bayes' theorem

strong sedge
# young granite elaborate pls

a neuron in a neural network takes in inputs, does some processing on it and gives some outputs
in function form
y = f(x)

young granite
#

yes

strong sedge
#

knn works on the distances between points: the predicted value for a new point is the average of its k nearest points (or, for classification, their majority class)

#

2 very different ideas
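
The kNN idea described above fits in a few lines, which also shows why "training" kNN amounts to storing the labelled examples. This is a minimal regression sketch with made-up toy data and a hypothetical helper name, not a tuned implementation.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return train_y[nearest].mean()

# "Training" kNN is just storing labelled examples (toy data, made up here).
train_x = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train_y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

prediction = knn_predict(train_x, train_y, np.array([1.1]), k=3)
print(prediction)
```

A neural network, by contrast, adjusts its weights during training, so the stored knowledge lives in parameters rather than in the raw examples.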

young granite
#

i tried to keep it simple and not explain the underlying idea, but i thought i could combine em. thanks for correcting me

strong sedge
#

no worries

young granite
#

but in a nn i work somewhat with statistics cause the system searches for correlations and causalities, doesn't it?

strong sedge
young granite
#

best answer BLACK BOX 🗿

young granite
young granite
#

"note: the multiplier used should be small, there is no fixed value, can **you **what ever works for you"

#

looks fine to me and is sufficient, but i don't quite remember what my prof said a few years back

young granite
#

gladly

lapis sequoia
#

if I have an array of zeroes np.zeros((100, 336)), how do I update the 318th to 325th entries to 1?

serene scaffold
lapis sequoia
#

yes

#

in the first row for example

serene scaffold
lapis sequoia
#

thanks
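
The reply itself is missing from the log, so as a hedged sketch of one way to do it: slice assignment on the first row, assuming "318th to 325th" counts from 1 and includes both ends.

```python
import numpy as np

a = np.zeros((100, 336))  # the shape is passed as a single tuple

# Counting from 1, entries 318 to 325 of the first row are indices 317..324.
a[0, 317:325] = 1

print(int(a.sum()))
```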

#

can i ask a more complicated question?

#

i actually have an array of zeros np.zeros((100, 48*7))

#

each row is a factory shift in a data frame

#

and each column is a half hour segment of the week

serene scaffold
#

well fuck

lapis sequoia
#

i'm trying to iterate over a dataframe

#

which will count how many factory shifts are active at each half hour of the week

#

so something like

#

for r in df.iterrows():

serene scaffold
#

when you're doing numpy or pandas, just banish "iterate" from your mind.

lapis sequoia
#

ok

#

so what I have so far is

serene scaffold
#

hold that thought

#

do print(df.head().to_dict('list')) and put the result in the chat

#

and then explain what you're trying to do, without any code.

#

I won't look at any screenshots.

lapis sequoia
#

ok give me one sec

#

thanks

#

crap I dont have access to the file on my home computer

#

guess we can't do it?

serene scaffold
#

do you have the name of each column and its dtype memorized?

lapis sequoia
#

yes

serene scaffold
#

that's what I need to know

lapis sequoia
#

factory_position: str

#

day: int

#

shift_start_time: int

#

shift_end_time: int

serene scaffold
#

what time unit is shift_start_time? seconds?

lapis sequoia
#

its a 24 hour clock

#

so 2300 hours for example

serene scaffold
#

that doesn't answer my question.

#

oh, I see.

lapis sequoia
#

how should we proceed? should I tell you what I tried to do?

serene scaffold
#

so you need to know how many half-hour blocks in each day (00:00, 00:30, 01:00, etc.) are covered by a factory position?

lapis sequoia
#

i need to know how many factory workers/positions are needed at each half hour of the week

serene scaffold
#

so you need to know how many factory positions are active during each half-hour block?

lapis sequoia
#

eactly

#

*exactly

serene scaffold
#

okay, we can work with that.

lapis sequoia
#

thanks

#

so we loop through the df right and need to figure out the start half hour and end half hour right?

serene scaffold
#

no looping.

lapis sequoia
#

ok

#

so whats the approach?

serene scaffold
#

the first step is to represent everything as actual timestamps

```
In [3]: pd.Series([430, 1200, 0000])
Out[3]:
0     430
1    1200
2       0
dtype: int64

In [4]: s = _

In [6]: s.astype(str).str.zfill(4)
Out[6]:
0    0430
1    1200
2    0000
dtype: object

In [7]: pd.to_datetime(s.astype(str).str.zfill(4), format='%H%M')
Out[7]:
0   1900-01-01 04:30:00
1   1900-01-01 12:00:00
2   1900-01-01 00:00:00
dtype: datetime64[ns]
```
lapis sequoia
#

ok got it

serene scaffold
#

You can also add the days.

```
In [9]: pd.to_timedelta(pd.Series([1, 2, 3]), unit='D')
Out[9]:
0   1 days
1   2 days
2   3 days
dtype: timedelta64[ns]

In [10]: pd.to_datetime(s.astype(str).str.zfill(4), format='%H%M') + _
Out[10]:
0   1900-01-02 04:30:00
1   1900-01-03 12:00:00
2   1900-01-04 00:00:00
dtype: datetime64[ns]
```
lapis sequoia
#

ok i'm with you so far

#

the datetime is for the start right?

serene scaffold
#

yeah

#

will the start and end always be on an hour or on the half hour? like it will never be at 617 or 1535?

lapis sequoia
#

no

serene scaffold
#

no to which part

lapis sequoia
#

itll always be on the hour or half hour

serene scaffold
#

okay. I wonder if there's a way to "expand" each row into one row for each half-hour block

#

you said there's 7 days, right?

lapis sequoia
#

yes

serene scaffold
#

so you can do this

```
In [23]: pd.date_range(start='1900-01-01', freq='30min', periods=24 * 2 * 7)
Out[23]:
DatetimeIndex(['1900-01-01 00:00:00', '1900-01-01 00:30:00',
               '1900-01-01 01:00:00', '1900-01-01 01:30:00',
               '1900-01-01 02:00:00', '1900-01-01 02:30:00',
               '1900-01-01 03:00:00', '1900-01-01 03:30:00',
               '1900-01-01 04:00:00', '1900-01-01 04:30:00',
               ...
               '1900-01-07 19:00:00', '1900-01-07 19:30:00',
               '1900-01-07 20:00:00', '1900-01-07 20:30:00',
               '1900-01-07 21:00:00', '1900-01-07 21:30:00',
               '1900-01-07 22:00:00', '1900-01-07 22:30:00',
               '1900-01-07 23:00:00', '1900-01-07 23:30:00'],
              dtype='datetime64[ns]', length=336, freq='30T')
```
lapis sequoia
#

is this on each row?

serene scaffold
#

no, this is separate. it's every possible half-hour block

lapis sequoia
#

oh i see

serene scaffold
#

anyway, you can do something like this

```
In [36]: blocks = pd.date_range(start='1900-01-01', freq='30min', periods=24 * 2 * 7).to_numpy()[None, :]

In [37]: blocks.shape
Out[37]: (1, 336)

In [38]: shift_starts
Out[38]:
array([['1900-01-01T04:30:00.000000000'],
       ['1900-01-01T12:00:00.000000000'],
       ['1900-01-01T00:00:00.000000000']], dtype='datetime64[ns]')

In [39]: (shift_starts <= blocks) & (blocks < shift_ends)
Out[39]:
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])
```

where you use broadcasting to get a 2d array of bools. each column is a block and each row is a shift. and it's True if that shift overlaps with that block

lapis sequoia
#

i see

#

with you so far

serene scaffold
#

well, that's it

lapis sequoia
#

how do you get to a count of shift for each half hour?

serene scaffold
#

sum down each column (across the shifts), e.g. `.sum(axis=0)` on that array

lapis sequoia
#

ok cool

#

thanks for your help
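
Putting the steps from this exchange together, as a sketch under stated assumptions: the column names are the ones given in the chat, but the data is made up, `day` is assumed to run from 1 to 7, and no shift is assumed to cross midnight. The helper `to_timestamp` is hypothetical, and `pd.to_timedelta` is used for the day offset.

```python
import pandas as pd

# Hypothetical shifts using the column names given in the chat.
df = pd.DataFrame({
    "factory_position": ["welder", "packer", "welder"],
    "day": [1, 1, 2],                      # assumed to run from 1 to 7
    "shift_start_time": [430, 400, 0],     # 24-hour clock as integers
    "shift_end_time": [1200, 600, 100],
})

def to_timestamp(day, hhmm):
    """Combine a 1-based day and an HHMM integer into a timestamp Series."""
    offset = (pd.to_timedelta(day - 1, unit="D")
              + pd.to_timedelta(hhmm // 100, unit="h")
              + pd.to_timedelta(hhmm % 100, unit="min"))
    return offset + pd.Timestamp("1900-01-01")

starts = to_timestamp(df["day"], df["shift_start_time"]).to_numpy()[:, None]
ends = to_timestamp(df["day"], df["shift_end_time"]).to_numpy()[:, None]

block_index = pd.date_range(start="1900-01-01", freq="30min", periods=24 * 2 * 7)
blocks = block_index.to_numpy()[None, :]

# Rows are shifts, columns are half-hour blocks; a block counts as covered
# when it starts at or after the shift start and before the shift end.
active = (starts <= blocks) & (blocks < ends)

# Summing over the shift axis gives the number of active shifts per block.
counts = pd.Series(active.sum(axis=0), index=block_index)
print(counts[counts > 0])
```

Shifts that run past midnight would need their end timestamp moved to the next day before the comparison; that case is left out here.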

strong sedge
#
```
'''
y = wx + b
dy = dwx + wdx + db

dy / dw = dw * x / dw + w * dx / dw + db / dw

what I think should be correct
dy / dw = x

dy / db = 1

dy / dx = w
'''
```

this is technically wrong
it should be

#
```
'''
dw = dy * x
db = dy
dx = dy * w
'''
```
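
One way to read the corrected version: `dy` stands for the upstream gradient dL/dy of some loss L, and `dw`, `db`, `dx` are dL/dw, dL/db, dL/dx obtained by the chain rule. The sketch below checks that reading numerically; the squared-error loss and all values are assumptions chosen for illustration.

```python
def forward(w, x, b):
    return w * x + b

def loss(y, target):
    return (y - target) ** 2

w, x, b, target = 0.5, 3.0, -1.0, 2.0

y = forward(w, x, b)
dy = 2.0 * (y - target)      # upstream gradient dL/dy

# Chain rule: local derivative times the upstream gradient.
dw = dy * x
db = dy
dx = dy * w

def numeric(f, value, eps=1e-6):
    """Central finite difference of f at `value`."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)

dw_num = numeric(lambda v: loss(forward(v, x, b), target), w)
db_num = numeric(lambda v: loss(forward(w, x, v), target), b)
dx_num = numeric(lambda v: loss(forward(w, v, b), target), x)
print(dw, dw_num)
print(db, db_num)
print(dx, dx_num)
```

The local derivatives dy/dw = x, dy/db = 1 and dy/dx = w from the first block are still correct on their own; backpropagation just multiplies each of them by `dy`.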
strong sedge
fringe anvil
#

what style of plot is this?

#

ive looked at all the examples on matplotlib

merry pike
#

I have a h5 model, but how do I run it with opencv

#

this is the code im using, but it doesnt work:
```
from cvzone.ClassificationModule import Classifier
import cv2

cap = cv2.VideoCapture(0)
myClassifier = Classifier('eyedisease.h5', 'labels.txt')

while True:
    _, img = cap.read()
    predictions, index = myClassifier.getPrediction(img)
    print(predictions)

    cv2.imshow("Image", img)
    cv2.waitKey(1)
```
#

gives this error

rare socket
#

my reinforcement learning network has 5 inputs and 3 outputs. No matter how many middle layers there are or how many nodes it has, my output is always the same single option. I have tried different training algorithms and different activation functions but nothing works. Do I not have enough input nodes or something? I am not sure what to do. I would appreciate the help

novel python
#

how can I compare how many "lowest" values a column has compared to 3 other columns?
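
One reading of this question is "for each row, which of the four columns holds the smallest value, and how often does each column win?". Under that assumption, a minimal sketch with made-up data and placeholder column names:

```python
import pandas as pd

# Made-up data; the column names are placeholders.
df = pd.DataFrame({
    "a": [1, 5, 2, 5],
    "b": [3, 4, 8, 7],
    "c": [2, 6, 1, 8],
    "d": [4, 7, 3, 6],
})

# For every row, find which column holds the smallest value, then count
# how often each column wins. Ties go to the first column in column order.
lowest_counts = df.idxmin(axis=1).value_counts().reindex(df.columns, fill_value=0)
print(lowest_counts)
```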

woven pasture
#

why does jax.grad fail on the following method

```
def U(x):
    return np.sum(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1))
```
#

i truncated to np.linalg.norm after trying with np.sqrt(np.sum(np.square(... for a while

#

jax.grad is possible up until the np.sqrt part

#

also np is jax.numpy, not the standard numpy

rigid wadi
#

Hi, has anyone worked with receipt data extraction before? Like extracting the invoice number, receipt date, amount, etc.
Are there any models that are ready to train for this?

fleet pulsar
#

can anyone tell me good course about data science ?

hasty mountain
#

Guys, any tips on how to deal with vanishing gradients in a discriminator from a GAN?
(My discriminator has only 3 layers and its optimizer is an adam with lr=1)

#

I can think about residual blocks and batchnormalization, but I suppose residual blocks aren't really a good option for a GAN, right?

desert parcel
woeful hedge
#

ACCURACY = THE POINT AND RANGE OF A MEASURED AMOUNT OF CAPABILITY A POSSIBILITY CAN HAPPEN AND DETERMINE COME INTO EFFECT
RADIUS = SET RANGE OF A CENTERED POINT TO THE END DESTINATION
DIAMETER = SET RANGE POINT FROM START TO MIDDLE TO THE END WHILE PASSING THE RADIUS
CONVERT = CHANGE FORM AND OR CHARACTER AND OR FUNCTION
PATTERN = REPEATING METHOD
WRITE = ENSCRIBE FROM LOOKING AT WORDS
READ = DESCRIBE FROM LOOKING AT A PATH OF WORDS
SPAN = MEASURED LIMITED RANGE
VIBRATION = PARTS THAT MOVE BACK AND FORTH AT A GIVEN SPEED
TRANSFORM = MAKE A CHANGE IN FORM
SYNCHRONIZE = LINK AND SEND THE SAME RESULT TO ALL SOURCES
SCAN = ANALYZE A SPECIFIC WORD OR FIELD AND OR GIVE DATA ON THE ASKED INFORMATION TO SEARCH FOR
ANALYZE = READ AND LOOK OVER
CALCULATE = GIVE A DESIGNATED OF A CALCULATES DESCRIPTION FOR A NUMBER AND GIVE ANSWER FOR ALL OF VALUE
LIMIT = SET DEFINED AMOUNT FOR KNOWLEDGE WITH A GIVEN POWER LEVEL
RECALL = GAIN THE ABILITY TO VIEW PAST MEMORY INSIDE BRAIN
REACH = GRAB TO PULL INWARDS
PREDICT = GIVE PERFECT VALUE
REPEAT = CYCLE SAME EFFECT AGAIN INTO SAME FREQUENCY
RECOGNIZE = RECALL FROM AN EARLIER POINT WITHIN TIME
ENCODE = COMPRESS CODE
DECODE = DECOMPRESS CODE
RECODE = COMPRESS CODE ONCE MORE

#

LOOP = BIND IN A CYCLE
MEASURE = TAKE IN THE AMOUNT AND DISTANCE OF
ANSWER = SOLUTION TO A PROBLEM
SOLUTION = FINAL OUTCOME TO AN FORMULA
PROBLEM = UNFINISHED SOLUTION
SEARCH = FIND AND LOCATE SOMETHING
ASK = STATE A QUESTION
TIME = MEASUREMENT IN WHICH CURRENT REALITIES MUST PASS
SPACE = CONTAINER IN WHICH TIME MUST PASS THROUGH
UPLOAD = TRANSFER3 INTO DESCRIBED LOCATION
DOWNLOAD = TRANSFER3 TO CURRENT DEVICE
SIDELOAD = TRANSFER3 TO ALL DEVICES WITH STATUS OF STATED SET LOCATION
CLONE2 = MAKE AN IDENTICAL COPY OF
SYNCHRONIZE = LINK AND SEND THE SAME RESULT TO ALL SOURCES
ENCODE = COMPRESS CODE
DECODE = DECOMPRESS CODE
RECODE = COMPRESS CODE ONCE MORE
SETTING = A MEASUREMENT COMMAND THAT CAN BE ADJUSTED AND BY AN OPERATOR
ADJUST = EDIT AND MODIFY
EDIT = CHANGE AND OR MODIFY TO ADJUST TO A SPECIFIED PURPOSE
WORK = PRODUCING EFFORT TO FINISH A TASK
WORKLOAD = THE AMOUNT OF WORK
COMMAND = ORDER TO BE GIVEN
LINK = BRING TOGETHER AND ATTACH TO
BIND = EDIT AND MODIFY
LEVEL = NUMBER AMOUNT OF OR SIZE
UNIT = STORAGE CONTAINER
DIMENSION = NUMBER OF GIVEN AXIS POINTS
NUMBER = ARITHMETICAL VALUE THAT IS EXPRESSED BY A WORD AND OR SYMBOLE AND OR FIGURE REPRESENTING A PARTICULAR QUANTITY AND USED IN COUNTING AND MAKING CALCULATIONS AND OR FOR SHOWING ORDER IN A SERIES OR FOR IDENTIFICATION
FREQUENCY = REPEATED PATTERN AND OR SETTING

#

POWER = AMOUNT
STRENGTH = LEVEL INTENSITY
CALIBRATE = SCALE WITH A STANDARD SET OF READINGS THAT CORRELATES THE READINGS WITH THOSE OF A STANDARD IN ORDER TO CHECK THE INSTRUMENT AND ITS ACCURACY
PUBLIC = ACCESS TO ALL OF CREATORS INTERIOR DOMINION
PRIVATE = HIDDEN TO EVERYONE BUT CURRENT2 USER2
PERSONAL = EXCLUSIVE TO THE CREATOR
ESCAPE = RETURN TO SOURCE PLACE2
RETURN = GO BACK
CONSTANT = ALWAYS IN EFFECT
CYCLE = PROCESS OF REPEATING AN EVENT CONTINUOUSLY IN THE SAME ORDER
MEASUREMENT = AN ACT TO CALCULATE AND GIVE A SPECIFIC LENGTH ON SOMETHING
CALCULATOR = A DEVICE USED TO CALCULATE INFORMATION AND ANALYZE SET TASKS AS A ROOT VALUE OF LOGIC
WAVELENGTH = A SET OF WAVE PATTERNS GIVEN FREQUENCY FORMAT IN A LENGTH OF A WAVE VALUE DETERMINED BY A PREVIOUS VALUE EFFECT
LENGTH = HOW LONG A MEASURED DIMENSIONAL OBJECT IS EXTENDED
LATTICE = INTERLACED STRUCTURE AND OR PATTERN
LOCATION = SPECIFIED AREA
LINE2 = CHOSEN DIRECTION THAT IS SET IN A SINGLE PATH
WAVE2 = CONTINUAL FLUCTUATION OF FREQUENCY AND OR PATTERN
WIDTH = MEASUREMENT OF SOMETHING FROM SIDE TO SIDE
HEIGHT = THE LENGTH OF RAISING OR LOWERING IN A VERTICAL PATH
HERTZ = DEFINED SOUND WAVE FREQUENCY
MEASURE = TAKE IN THE AMOUNT AND DISTANCE OF

#

Those are the mathematical variables my language has
just some and not a complete list yet

#

looking for input and feedback and what others think of it. How others see it could be used if I made it as a smaller library/module for it to connect to the full language with.
What others see within its potential as well

thorn nova
#

Can anyone help me turn excel data into something that can be worked with in python? I'm new to data science and have already tried all the built in functions from pandas but it can never recognize my file for some reason, not sure if i should be saving it in a particular place first? Would appreciate if someone could hop on a call or something and help me work through this!

wary breach
#

Bit confused about how to combine two different models into one. I.e. if I fit a linear regression model and also fit a XGBoost model to a dataset. I know sometimes you can get better scores utilizing both models but am unsure how to go about this process. Can anyone point me in the right direction? Thanks! (@ me on reply if you can please)

wary breach
woeful hedge
#

@topaz night You Like it

hasty mountain
# wary breach Bit confused about how to combine two different models into one. I.e. if I fit a...

If you're using keras, you can pass the output of a model as input of another model.
Example: create a convolution model to extract features from an image and pass those features into XGBoost. Or you can extract features with PCA and pass them into a Decision Tree.

If you're using tensorflow or Pytorch for neural networks, things can get more interesting, as you can create a Neural Network with XGBoost inside of it.
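
For the original question (combining a linear regression with a boosted-tree model), stacking is one common option. The sketch below uses scikit-learn's `StackingRegressor` on synthetic data, with `GradientBoostingRegressor` standing in for XGBoost so that the example needs only scikit-learn; that swap and the data are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The base models are fitted with internal cross-validation, and the final
# estimator learns how to weight their out-of-fold predictions.
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("boosted", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),
)
stack.fit(X_train, y_train)
predictions = stack.predict(X_test)
print(stack.score(X_test, y_test))
```

A simpler alternative is to average the two models' predictions, which needs no extra fitting step but cannot learn unequal weights.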

ebon jewel
#

Need help with python and pandas code

mint palm
#
```
Traceback (most recent call last):
  File "/local/scratch/v_rahul_pratap_singh/UnsupervisedVAD/video_feature_extractor/extract.py", line 50, in <module>
    model = get_model(args)
  File "/local/scratch/v_rahul_pratap_singh/UnsupervisedVAD/video_feature_extractor/model.py", line 32, in get_model
    model = model.cuda()
  File "/shared/home/v_rahul_pratap_singh/miniconda3/envs/envRahul/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/shared/home/v_rahul_pratap_singh/miniconda3/envs/envRahul/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/shared/home/v_rahul_pratap_singh/miniconda3/envs/envRahul/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/shared/home/v_rahul_pratap_singh/miniconda3/envs/envRahul/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.```
is this an error or what? my code seems to continue running and making the relevant files, but this pops up
ebon jewel
#

Anyone interested to have a look in my project and help me

strong sedge
strong sedge
strong sedge
wary breach
ebon jewel
#

@strong sedge can we connect need to share my screen and make you understand my problem

silent stump
#

Hi guys ive got my entry, take profit, and stop loss stored in my dataframe, but cant figure out how to track the profit and loss of the strategy. Any advice? thanks. This is for a trading strategy

topaz night
topaz night
inland eagle
#
```
def convert_votes_to_int(v):
    return int(v.strip(','))
```
does anyone know why .strip won't work in this instance? i am trying to pull from a column in a data frame where the values are all strings, since the numbers have commas (e.g. 36,098), but i want to convert all of those values to ints without commas
```---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_191/1557144099.py in <module>
      2 def convert_votes_to_int(v):
      3     return int(v.strip(','))
----> 4 video_games = video_games.con
      5 video_games

AttributeError: 'DataFrame' object has no attribute 'con'```
this is the error message i am getting
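
Two separate things seem to be going on here. The AttributeError in the traceback comes from the unfinished line `video_games.con`, not from `.strip`. Independently of that, `str.strip(',')` only removes commas at the ends of a string, so it would not turn "36,098" into something `int()` accepts. A minimal sketch of a column-wide fix, where the column name `votes` is a guess:

```python
import pandas as pd

# Hypothetical frame; "votes" stands in for the real column name.
video_games = pd.DataFrame({"votes": ["36,098", "1,204", "987"]})

# str.strip(',') only trims commas at the two ends of the string, so the
# comma inside "36,098" survives and int() raises ValueError.
# Removing every comma first avoids that.
video_games["votes"] = (
    video_games["votes"].str.replace(",", "", regex=False).astype(int)
)
print(video_games["votes"].tolist())
```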
steady basalt
#

anyone know a cute way of getting the bottom 3 strings from a specific column? so lets say i did .tail(3) I want 3 values from that, it would be the same column for each row

#

lets call that rowCbottom3 = []

#

and then to match for row B

#

sorry

#

not row, COL

#

ah, worked it out

hushed kraken
#

How can I see if my prediction model is the best model?

serene scaffold
hushed kraken
desert oar
hushed kraken
#

accuracy I guess

desert oar
#

i am not being glib. that's a legitimate question and an important one that you must answer in any modeling project!

#

in general, it's hard to know if your model is "best" but you can compare various models to see which one is better

hushed kraken
#

So the only way is by testing multiple models and comparing them?

desert oar
#

in general yes. in statistics specifically, certain models have certain desirable mathematically-proven characteristics in some situations.

#

but that also doesn't make them "best" for any particular application

#

bias-variance tradeoff is also important to consider

#

would you prefer a model with really small average error, but huge variation in the predictions? or would you prefer a model with modest average error, but less variation in the predictions?

hushed kraken
desert oar
#

if you don't know what "bias-variance" tradeoff is, go look it up right now

hushed kraken
#

ok thnx

hushed kraken
desert oar
hushed kraken
#

but wouldnt a bigger bias also give incorrect results?

desert oar
#

this often comes up with observational data, such as that collected from the environment. it's often helpful to think of "the environment" as a big random sampling engine: physical phenomena are the outcomes of random data generating processes. you get exactly one opportunity to observe that data generating process, because time only runs forward!

#

so it's tempting to look at a time series at the millisecond scale of something like solar energy, and conclude that you have a big data set, and therefore that you don't care about variance and must minimize bias. but there is a legitimate interpretation in which you have a data set of exactly 1 data point.

hushed kraken
#

so its actually better to find a balance between the bias and variation error?

desert oar
hushed kraken
#

And how can I calculate these, because right now I'm only calculating the mse of the last training data and the mse of the prediction
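
Bias and variance are rarely computed directly. A common practical proxy is to compare the error on the training data with the error on held-out data across models of different flexibility. The sketch below does that with polynomial fits on synthetic data; the data, the split and the degrees are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every third point as a validation set.
val_mask = np.arange(x.size) % 3 == 0
x_train, y_train = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Store (training MSE, validation MSE) for each polynomial degree.
results = {}
for degree in (1, 3, 6):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        mse(y_train, np.polyval(coeffs, x_train)),
        mse(y_val, np.polyval(coeffs, x_val)),
    )
    print(degree, results[degree])
```

A model whose training and validation errors are both high is likely underfitting (high bias), while a low training error paired with a much higher validation error points to overfitting (high variance).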

#

Also another question, since I am using 2 models for the energy prices and solar energy production, would stacking be a good method to make a more accurate prediction?

grand olive
#

i need help choosing between tf and pytorch.
i've read that pytorch pretty much beats tf when it comes to use in research, and is starting to get more and more popular in the industry

i'm a bit concerned about deployment though (i'm only concerned about deploying to web apps)
read that it's a bit harder to deploy with pytorch. is this still true? or has it become easier to deploy pytorch now?

my interests include mostly NLP(mostly japanese) and music (music theory, metadata, genres)

fast rivet
#

this command Dataset.from_dict(dutch_dict) gives me this error pyarrow.lib.ArrowInvalid: Column 1 named validation expected length 43410 but got length 5426
I just want to convert a dictionary to a Dataset object which I've imported from datasets but I don't know why I'm getting this error.

lean jacinth
#

Tensorflow still has the most weight behind it, but it's a bit of a relic
Pytorch is the up and comer and will likely overtake eventually

grand olive
#

last question
how hard is it to deploy to the web with pytorch vs tensorflow as of now? (the articles i've been reading are from 1-2 years ago and i'd guess pytorch has improved since then)

lean jacinth
#

Like I use GCP for model deployment and both are integrated in the same way

hushed kraken
#

I got this error and can't fix it pls help : (

 Graph execution error:
fleet pulsar
lean jacinth
# fleet pulsar

Real programmers delete and start from scratch whenever they reload their IDE

lean jacinth
fleet pulsar
#

i started python

#

today

#

i feel hardness

hasty mountain
hasty mountain
lean jacinth
hasty mountain
#

In sklearn you can make a model's output be another model's input

hasty mountain
#

Use less neurons

lean jacinth
#

Ah

hasty mountain
#

However, it seems that it's stabilized for now... I'm adding random noise to the discriminator's inputs, using label-smoothing, weights initialization...

#

Aaaand updating after each batch, instead of each epoch

strong sedge
strong sedge
minor coral
#

hii

rare socket
#

hello, I am trying to change a single weight and bias in my model but I am not sure how to go about. Is there some sort of indexing through the model? model[column][row] <-- like this?

#

I'm using pytorch

hasty mountain
rare socket
#

thank you

rare socket
#

is there a way to index model.parameters() without the loop?

plush jungle
#

I got this error

#
RuntimeError: CUDA out of memory. Tried to allocate 2.43 GiB (GPU 0; 8.00 GiB total capacity; 5.70 GiB already allocated; 0 bytes free; 6.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
#

which I find strange, because my memory is supposed to be 16 gb

#

so why does it say 8.00 GiB total capacity?

agile cobalt
hasty mountain
#

Try checking your dxdiag

plush jungle
#

so my gpu only has 8gb of memory?

hasty mountain
#

Only

#

Yes

plush jungle
hasty mountain
#

Those big models usually do that

agile cobalt
plush jungle
#

task manager says I've got 2 gpus, a 3080 and "AMD radeon Graphics"

#

but when I try to train neural nets it only lets me use the 3080

#

is the amd one not a real gpu or something?

agile cobalt
#

CUDA = nvidia = probably doesn't support amd I guess

hasty mountain
plush jungle
#

so why would the manufacturer of the computer put two incompatible gpus?

agile cobalt
#

they're not incompatible per se
it's just not compatible with cuda

plush jungle
#

oh

hasty mountain
#

Unfortunately, it doesn't seem to be that easy to use AMD GPUs for neural networks... I remember I tried to search about it, and...nothing.

#

Maybe most algorithms tend to rely on NVidia GPUs because of that...except for Google's, since they like to use their TPUs

agile cobalt
#

if you're just messing with GANs for fun or even some project you haven't got much progress into yet, you might as well move on to Stable Diffusion tbh - almost completely (if not completely) different network architecture, but generalises a lot better as far as I know

plush jungle
hasty mountain
agile cobalt
agile cobalt
agile cobalt
#

that doesn't really fit into either "high quality" or "face datasets", I think?

plush jungle
#

my main goal was to generate new character designs, kind of like a "novel pokemon gan" I saw someone do

#

but I trained a vanilla gan from scratch and the results were both very blurry and extremely overfit

#

and then I discovered thiswaifudoesnotexist, which retrained stylegan2 on a small dataset of anime girls

hasty mountain
plush jungle
#

and it was super crisp

#

so now I'm trying to retrain stylegan3 on my dataset

#

to achieve the same result

hasty mountain
#

OpenAI even developed a diffusion model that is better than DeepMind's BigGAN, which is the state of the art GAN, but the computation power that thing demands...
Each checkpoint file has, like, 1 Gb.

agile cobalt
plush jungle
#

but the waifu one was excellent

hasty mountain
#

Blurry images tend to be normal, so GAN models usually rely on SuperResolution nets...
Maybe some of them don't, but others do.
I think BigGAN uses something to avoid this, but it was so complicated that I can't remember... but there's NVidia's Progressive Growing of GANs, where the networks grow in stages during training and end up generating quite interesting images at quite a high resolution.

mint palm
#

i am using ssh.
if i clean GPU cache because i am getting CUDA out of memory, will it affect others using that GPU?

timid kiln
#

Working with dates/times in a pandas dataframe.

One of the columns in a df of data from our SQL server is ip_date (initial production date). Pandas says it's type object. I need to work on this as a date, so I run .to_datetime on it, and now its type datetime64[ns]. However, when I try to get the data type off of an individual value in that column of data, its type is <class 'pandas._libs.tslibs.timestamps.Timestamp'>.

  meters_sql = ...  # result of the SQL query
  print(meters_sql.dtypes)  # Says column `ip_date` is `object`
  meters_sql['ip_date'] = pd.to_datetime(meters_sql['ip_date'])
  print(meters_sql.dtypes)  # Says column `ip_date` is `datetime64[ns]`
  print(type(meters_sql['ip_date'][1]))  # Says it's type ...timestamps.Timestamp

How do I force this to be a datetime? Or what module would I use to work with timestamp?

agile cobalt
#

timestamps are pandas's version of datetimes

#

!e import pandas; print(pandas.Timestamp.mro())

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

[<class 'pandas._libs.tslibs.timestamps.Timestamp'>, <class 'pandas._libs.tslibs.timestamps._Timestamp'>, <class 'pandas._libs.tslibs.base.ABCTimestamp'>, <class 'datetime.datetime'>, <class 'datetime.date'>, <class 'object'>]
agile cobalt
#

pandas.Timestamp is to datetime.datetime what numpy.float64 is to float
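
To make that concrete, here is a minimal sketch. The column values are made up and only stand in for `ip_date`. Since the MRO printed above includes `datetime.datetime`, a `Timestamp` should pass `isinstance` checks and accept the usual `datetime` methods such as `replace`:

```python
import datetime

import pandas as pd

# Toy column standing in for `ip_date`; the dates are invented for illustration.
ip_date = pd.Series(pd.to_datetime(["2022-05-14", "2018-05-19"]))

value = ip_date[0]
print(type(value).__name__)                  # Timestamp
print(isinstance(value, datetime.datetime))  # True, Timestamp subclasses datetime
print(value.replace(day=1))                  # 2022-05-01 00:00:00

# Only needed if some other library insists on the plain stdlib type.
plain = value.to_pydatetime()
```

So in most cases there is nothing to "force": the values can be used wherever a `datetime` is expected.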

timid kiln
# agile cobalt `pandas.Timestamp` is to `datetime.datetime` what `numpy.float64` is to `float`

OK... So the reason I'm asking this is I tried to run (forgive me for using terms badly) a list comprehension on the df to replace all the day values in the dates with the number 1. So 5/14/2022 would become 5/1/2022. I'm very much a beginner with list comprehensions so I tried this and got an error:

meters_sql['ip_date'] = [meters_sql['ip_date'].replace(day=1) for x in meters_sql['ip_date']]

error message: Series.replace() got an unexpected keyword argument 'day'

#

So my first thought was that the type of data in that series is not datetime so that's how I got to where I am now.

#

OK, so I think I figured out the first part of the list comprehension error. I have this now:

meters_sql['ip_date'] = [meters_sql['ip_date'][x].replace(day=1) for x in meters_sql['ip_date']]

The error message is: Exception has occurred: KeyError Timestamp('2018-05-19 00:00:00')

I'm at a loss as to what to do here.

agile cobalt
agile cobalt
#

look up pandas vectorized operations

timid kiln
#

hokey pokey, thx 🙂

#

Sounds complicated so if I start dropping those words around the developers maybe they'll think I'm smart lol

#

Oh man, that looks a lot simpler and easier to understand. At least the first couple examples I see.

agile cobalt
#

explicit loops are as bad as (or even worse than) pure python code without pandas
apply()/map() with user defined functions is bad and shouldn't be used either, but still beats explicit loops
you should always use specific built-in methods that operate over the entire series
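
For the "set every day to 1" case from earlier, a vectorized version might look like the sketch below. The frame is a toy stand-in for `meters_sql` with invented dates, and the two new column names are just for illustration:

```python
import pandas as pd

# Toy stand-in for meters_sql; the dates are invented.
meters_sql = pd.DataFrame(
    {"ip_date": pd.to_datetime(["2022-05-14", "2018-05-19", "2020-02-29"])}
)

# Option 1: subtract (day - 1) days from every value at once.
days_into_month = pd.to_timedelta(meters_sql["ip_date"].dt.day - 1, unit="D")
meters_sql["first_of_month"] = meters_sql["ip_date"] - days_into_month

# Option 2: round-trip through a monthly period, which lands on the month start.
meters_sql["first_of_month_alt"] = (
    meters_sql["ip_date"].dt.to_period("M").dt.to_timestamp()
)

print(meters_sql)
```

Both go through the `.dt` accessor, so there is no Python-level loop. Option 1 keeps any time-of-day component, option 2 resets it to midnight.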

timid kiln
agile cobalt
#

pretty much

digital locust
#

Hey there! I'm building a Django app and I use pandas a lot to process data. I have come across one big problem: at some point in my app, data analysis takes like forever. I have the following code:

    i = df['agencia'] == 'DHL'
    for row in tqdm(df[i].index):
        for col in df.columns:
            for supplement_col in supplements_columns_names:
                for supplement_col_total in supplements_columns_names_total:
                    for supplement_price_col in list_df_supplements_prices_columns:
                        if df.loc[row, col] == supplement_price_col:
                            df.at[row, supplement_price_col] = df_supplements_prices.at[0, supplement_price_col]
                            theoretical_price = df.at[row, supplement_price_col]
                            invoiced_price = df.at[row, supplement_col_total]

                            if theoretical_price != invoiced_price:
                                errors_data.append(
                                    {'Package number': df.at[row, 'agencia'], 'Supplement error': supplement_price_col,
                                     'Invoiced price': invoiced_price, 'Theoretical price': theoretical_price,
                                     'Difference': invoiced_price - theoretical_price})

    # Generation DF errors
    df_errors = pd.DataFrame(errors_data)

I know that pandas does not recommend to loop through a DF. But in my case, I have to get to a precise cell to append data, i.e. getting the row and column for this part :

df.at[row, supplement_price_col] = df_supplements_prices.at[0, supplement_price_col]

For 2000 rows, the analysis takes like 4 min (!), which is way too long. So here's my question, I know it is possible to do better, but could you please guide me? I did look up and saw df.apply(lambda row) for example but since I'm using a lot of conditions and loops, it is unclear to me whether I should use this function or not...

inland eagle
#

does anyone know how to keep only a certain amount of rows in a data frame

#

like for example i just want to keep the first 10 rows of a df with like over 200 rows

#

what function would i use

digital locust
#

@inland eagle maybe df.head(10) ?

timid kiln
inland eagle
#

"to a DataFrame that contains the ten most common genres of video games, in descending order"

inland eagle
#

or like 0-9

inland eagle
#
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_644/3463392015.py in <module>
      2 sheesh = yolo.get('title').sort_values(ascending=False)
      3 wut = yolo.get(['title']).assign(count = sheesh).drop(columns='title')
----> 4 most_common_genres = wut.head(10)
      5 most_common_genres

AttributeError: 'DataFrame' object has no attribute 'head'
this is the error message i am getting
#

ignore the variable and df names lol

timid kiln
#
print(wut.head(10))
plush jungle
#

I'm trying to retrain stylegan3 starting from one of their pretrained models, and I'm getting this

  File "C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\torch_utils\misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (12) must match the size of tensor b (24) at non-singleton dimension 0
#

this stackoverflow post about the same issue in stylegan2 says it's because my dataset doesn't have the same number of classes as the pretrained model

#

but how could I fix this?

lapis sequoia
#

web dev here, looking to start my first ai python project

#

any tips?

plush jungle
lapis sequoia
#

i would prefer nothing to do with data

#

anything else

#

wait nvm

#

any ai is fine

plush jungle
#

there's image recognition, image generation, NLP

lapis sequoia
#

image recognition sounds cool

plush jungle
#

if that's the case, then you should look into the basics of neural nets

#

train a pytorch neural net on the MNIST dataset

lapis sequoia
#

okay

plush jungle
#

I highly recommend 3blue1brown's youtube video on neural networks

lapis sequoia
#

okay

#

do i just install any mnist dataset?

plush jungle
#

training a neural net to look at images of handwritten digits and predict what number it is is a great starting project

lapis sequoia
#

okay

inland eagle
timid kiln
inland eagle
#

this is the output:
babypandas.bpd.DataFrame

inland eagle
inland eagle
#

like majority of the functions are the same

timid kiln
#

sure

#

what did you get for the print/type command?

#

oh sorry you posted it

inland eagle
#

lol

timid kiln
#

maybe the head command isn't available in babypandas

timid kiln
#

Take a look in there and see if there's a 'head' command anywhere

#

You could always just make a loop:

for i in range(10):
  print(df[i])
#

That might work for ya.

inland eagle
#

IndexError: BabyPandas only accepts Boolean objects when indexing against the data frame; please use .get to get columns, and .loc or .iloc for more complex cases.

timid kiln
#

hmmm, just a sec...

#

Give this a try:

print(df.iloc[0:10])
#

idk how limited baby pandas is tho

agile cobalt
#

...why would you use that over normal pandas?

inland eagle
#

i have no choice but to use it over regular pandas

inland eagle
timid kiln
inland eagle
timid kiln
#

Trying to replace the day in a datetime if the value of day is < 15.

I have this, but I get an error on df.loc[df['fom'].day

df.loc[df['fom'].day < 15, 'fom'] = df['fom'].apply(lambda dt: dt.replace(day=1))
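
A likely cause of that error: a Series has no `.day` attribute, the date parts live under the `.dt` accessor. A minimal sketch of the conditional replacement, assuming `fom` is already a datetime column (the sample dates are invented):

```python
import pandas as pd

# Toy frame; `fom` is assumed to already be datetime64.
df = pd.DataFrame(
    {"fom": pd.to_datetime(["2022-05-03", "2022-05-14", "2022-05-15", "2022-05-28"])}
)

# Boolean mask built with the .dt accessor instead of .day on the Series.
early = df["fom"].dt.day < 15

# First of the month for every row, computed without apply().
first_of_month = df["fom"] - pd.to_timedelta(df["fom"].dt.day - 1, unit="D")

# Replace only the rows where the mask is True.
df["fom"] = df["fom"].mask(early, first_of_month)

print(df)
```

Rows dated the 15th or later are left untouched.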
next sorrel
#

Hello, does anyone here have a good amount of experience with PyTorch?

#

I was just wondering if someone can help me understand how to prepare data for nn.LSTM or nn.LSTMCell, the long short term memory, a recurrent neural network

fringe anvil
#

anyone knows why my graph looks like this?

#

instead of this reference image.. i cant figure it out.. been at it for a while

#
fig2,ax2 = plt.subplots(figsize=(5,4))
fig2.patch.set_facecolor("None")
ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")
x,y = df2["year"],df2["player1 double faults"]/df2["player1 total points total"]
ax2.scatter(x,y,alpha=0.5)
ax2.plot(x,y,"-",color="orange")
mpl.style.use("default")
molten forge
#

Does anyone have experience in federated learning?

serene scaffold
novel python
#

how do I get all the values with pandas groupby? I want to sort it by 2 columns but I want the rest of the columns to come as a result too, but all I'm getting is a generic GroupBy object in return. Do I necessarily have to use a function with groupby for it to return something?

serene scaffold
#

but if you want "all the values", you might rethink why you're using groupby. you usually end up with less data after grouping and doing something with the groups, not the same amount.

novel python
#

oooh, I see. It makes total sense now, thanks! Basically, I wanted to turn this dataset:

#

into this one, where it separates by months

#

I thought using groupby would do, but doesn't look like it's the proper solution

serene scaffold
#

I think you're looking for pivot_table

novel python
#

oh, let me check that

serene scaffold
#

if you get stuck, do print(df.sample(10).to_dict('list')) for me and put it in the chat as text (no screenshots), and ping me.

#

also if the .sample(10) part doesn't give you rows with at least two months represented, just do it again until it does.

novel python
#

alrighty, thanks a lot, will try it out with the documentation and will reach you if I get stuck

desert oar
#

as stelercus said, usually you don't need to do this

#

but sometimes it comes in handy. i do it now and then

#

note that pivot_table might give funny results with a datetime column

#

obviously you can manually construct a "year-month" column first and use that for pivoting

#

personally i can never remember the arguments for pivot_table so i would probably do .resample followed by .unstack
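
For reference, the "year-month column, then pivot" route could be sketched like this. The original dataset was only posted as a screenshot, so the `date`, `item` and `value` columns here are hypothetical:

```python
import pandas as pd

# Hypothetical data; the real column names are not known from the chat.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-05", "2022-01-20", "2022-02-03", "2022-02-17", "2022-02-25"]
        ),
        "item": ["a", "b", "a", "a", "b"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Build a plain-string "year-month" key so the pivot does not have to
# deal with raw datetimes as column labels.
df["year_month"] = df["date"].dt.to_period("M").astype(str)

# One row per item, one column per month, values summed within each cell.
by_month = df.pivot_table(
    index="item", columns="year_month", values="value", aggfunc="sum", fill_value=0
)

print(by_month)
```

The `aggfunc` decides what happens when several rows fall into the same item/month cell; `"sum"` is just one reasonable choice.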

#

(what is RDD?)

hasty mountain
#

Hey guys, when I load an .wav file using librosa.load, what is the unit of measurement for the y axis?
I know that it loads audio data in a time-series, so the x is seconds, but what about the y? Amplitude in decibels?

desert parcel
#

I posted a question in #🤡help-banana would be very grateful if anyone could answer 🙏🏻

fringe anvil
#

i did some changes to my code, ive lost my orange line. but the data looks better. tho the x axis doesnt give anything proper. can anyone provide pointers?

fig2,ax2 = plt.subplots(figsize=(6,4))
fig2.patch.set_facecolor("None")

dbl_ratio = pd.DataFrame(df2["player1 double faults"]/df2["player1 total points total"]) # good
y_avr = dbl_ratio
x_grpby = df2.groupby("year")

x,y = df2["start date"].values,dbl_ratio

ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")
ax2.scatter(x,y,alpha=0.2) # good
ax2.plot(x_grpby,y_avr,"-",color="orange")
mpl.style.use("default")
#

it would need to look like this

#

i was thinking, using the mean for the data points to draw the orange line.. but nothing of what i use/do works
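
A possible way to get that mean line, as a sketch only: compute the ratio per match, then average it per year, and plot the averaged Series. The column names are taken from the snippet above, the numbers are invented:

```python
import pandas as pd

# Invented sample rows using the column names from the snippet above.
df2 = pd.DataFrame(
    {
        "year": [2010, 2010, 2011, 2011, 2011],
        "player1 double faults": [2, 4, 3, 6, 3],
        "player1 total points total": [100, 100, 100, 100, 100],
    }
)

# One ratio per match (this is what the scatter shows).
ratio = df2["player1 double faults"] / df2["player1 total points total"]

# One mean per year: index is the year, values are the averages.
yearly = ratio.groupby(df2["year"]).mean()

print(yearly)
```

Passing `yearly.index` and `yearly.values` to `ax2.plot` should then draw one point per year. A GroupBy object on its own is not plottable, and plotting every match in row order tends to produce a zigzag rather than a trend line.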

steady basalt
strong sedge
#

Basically
U plot 2 graphs on the same figure
1 is test vs preds
2 is test vs y_true

compact star
#

I am trying to create a NEAT implementation in Python, and the papers say that neural networks that haven't improved in x generations will be removed. What is the definition of not having improved, and how would I check for it?

strong sedge
compact star
#

is the fitness of the species the average fitness of the genomes in that species?
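
One way to sketch the bookkeeping, under assumptions: as I read the original NEAT paper, a species counts as stagnant when its maximum fitness has not improved for 15 generations, while some implementations let you pick another summary such as the mean. The class and function names below are hypothetical, not taken from any NEAT library:

```python
from dataclasses import dataclass


@dataclass
class Species:
    """Hypothetical record of a species' progress across generations."""
    best_fitness: float = float("-inf")
    generations_since_improvement: int = 0


def update_stagnation(species, fitnesses, species_fitness=max):
    """Update the counters after one generation.

    `fitnesses` holds the fitness of every genome in the species this
    generation. `species_fitness` reduces them to one number; `max` is an
    assumption, swap in statistics.mean to use the average instead.
    """
    current = species_fitness(fitnesses)
    if current > species.best_fitness:
        species.best_fitness = current
        species.generations_since_improvement = 0
    else:
        species.generations_since_improvement += 1


def is_stagnant(species, dropoff_age=15):
    """True once the species has gone `dropoff_age` generations without improving."""
    return species.generations_since_improvement >= dropoff_age


s = Species()
for generation_fitnesses in ([1.0, 2.0], [1.5, 2.0], [0.5, 1.0]):
    update_stagnation(s, generation_fitnesses)
print(s.generations_since_improvement, is_stagnant(s))
```

"Not improved" here means "did not beat the species' own previous best", which is why the best value is stored on the species rather than recomputed.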

compact star
strong sedge
#

he has a bunch of videos on explaining parts of neat

split drift
compact star
strong sedge
#

Did u try reading the original research paper ?

compact star
strong sedge
#

Check that out

compact star
#

how do I read a .ps file?

compact star
#

Because I have the java version of neat and that references the drop off age but I would need help understanding how that works

strong sedge
compact star
#

ok thank ty for ur help

bleak coyote
#

Whats the best way to display confusion matrices?

#

sklearn, but are there better alternatives

celest vine
#
empty_df = pd.DataFrame()

for name in sd_eth_list:
    profile_url = df.loc[df['name'].str.contains(name, case=False)]
    
    empty_df.append(profile_url)

The dataframe still remains empty after running the code.
What am I doing wrong?
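
A likely explanation: `DataFrame.append` never modified the frame in place, it returned a new one, so the result was thrown away on every iteration (and the method was removed entirely in pandas 2.0). A sketch of the same loop that collects the pieces and concatenates once; the sample rows come from the frame posted further down, and the contents of `sd_eth_list` are made up:

```python
import pandas as pd

# A few rows borrowed from the posted frame; sd_eth_list is hypothetical.
df = pd.DataFrame(
    {
        "name": ["The Doggies", "Julien ROMAN", "Crypto Amazo"],
        "profileUrl": [
            "https://twitter.com/TheSnoopAvatars",
            "https://twitter.com/JulienROMAN13",
            "https://twitter.com/cryptoamazo",
        ],
    }
)
sd_eth_list = ["doggies", "amazo"]

matches = []
for name in sd_eth_list:
    # regex=False treats the name as literal text rather than a pattern.
    hit = df.loc[df["name"].str.contains(name, case=False, regex=False)]
    matches.append(hit)  # list.append does mutate the list

# One concat at the end instead of growing a DataFrame row by row.
result = pd.concat(matches, ignore_index=True)
print(result)
```

Note that `pd.concat` raises on an empty list, so an empty `sd_eth_list` would need its own guard.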

rich olive
#
y = np.linalg.solve(random_img, heart_img)
#

why arent these the same matrix

rich olive
#
contains_name = df.query('name in @sd_eth_list')
#

filtering a dataframe is not done through iteration

#

You have to query or aggregate the data to reduce the size

#

or dimensionality, for aggregating

celest vine
rich olive
#

does my code work

celest vine
wheat snow
#
native-country      salary
United-States       >50K      7171
?                   >50K       146
Philippines         >50K        61
Germany             >50K        44

i got a lil dataframe here

e= df[['native-country', 'salary']]
highest_earning_country= e[e['salary']== '>50K'].value_counts()

i filtered it to show the country and the salary over 50K

now i want to print out the country with the leading salary, which would be the US
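
One way to pull out just the top label, sketched on invented rows that reuse the two column names above: count the countries among the `>50K` rows and ask for the index of the largest count.

```python
import pandas as pd

# Invented rows; only the column names match the real frame.
df = pd.DataFrame(
    {
        "native-country": [
            "United-States", "United-States", "United-States",
            "Germany", "Philippines", "United-States",
        ],
        "salary": [">50K", ">50K", ">50K", ">50K", "<=50K", "<=50K"],
    }
)

# Keep only the country column for rows earning more than 50K.
high_earners = df.loc[df["salary"] == ">50K", "native-country"]

# value_counts() gives country -> count; idxmax() returns the top label.
counts = high_earners.value_counts()
top_country = counts.idxmax()

print(top_country, counts.max())
```

Selecting the single column before `value_counts()` keeps the index flat, so `idxmax()` returns a plain string rather than a (country, salary) tuple.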

celest vine
#
urls = []

for name in sd_eth_list:
    
    if profile_url == (df['name'].str.contains(name, case=False)).any():
        url = df[df['name'].str.contains(name, case=False)]['profileUrl']
        
        urls.append(url)

Tried this as well, but it did not work either

rich olive
wheat snow
#

smh i forgot how to do this

#

via .index() maybe?

rich olive
celest vine
# rich olive send the df

profileUrl    screenName    name    bio    followersCount    friendsCount
0    https://twitter.com/TheSnoopAvatars    TheSnoopAvatars    The Doggies    Enter tha Metaverse with @SnoopDogg x @TheSand...    88130    16
1    https://twitter.com/JulienROMAN13    JulienROMAN13    Julien ROMAN    💵 Investisseur / Youtube 🎬\n\n💸 Finance - Inve...    88768    162
2    https://twitter.com/landz_nft    landz_nft    Landz.io - Minting NOW    The first disruptive Real Estate NFT collectio...    53608    266
3    https://twitter.com/borgetsebastien    borgetsebastien    Sebastien 🏞    Co-Founder & COO of @TheSandboxGame, the open ...    93652    1138
4    https://twitter.com/cryptoamazo    cryptoamazo    Crypto Amazo    Crypto Promoter | Giveaway | DM to sponsor a #...    15743    56
rich olive
#

share it the way you constructed it lol or as a csv

#

actually nvm ill figure it out with another df

wheat snow
celest vine
# rich olive send the df

Basically I have a list of names that I want to find in the dataframe's name column.
I can do it with .isin but that looks for exact match.
I want results the way .str.contains() gives
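
A sketch of one way to get `.str.contains()` behaviour for a whole list in a single pass: join the names into one regex alternation. The rows are taken from the frame posted above; the contents of `sd_eth_list` are made up for the example:

```python
import re

import pandas as pd

# Rows borrowed from the posted frame; sd_eth_list is hypothetical.
df = pd.DataFrame(
    {
        "profileUrl": [
            "https://twitter.com/TheSnoopAvatars",
            "https://twitter.com/JulienROMAN13",
            "https://twitter.com/landz_nft",
        ],
        "name": ["The Doggies", "Julien ROMAN", "Landz.io - Minting NOW"],
    }
)
sd_eth_list = ["doggies", "roman"]

# re.escape guards against names containing regex characters like "." or "+".
pattern = "|".join(re.escape(name) for name in sd_eth_list)

# One boolean mask for the whole column; na=False treats missing names as no match.
mask = df["name"].str.contains(pattern, case=False, na=False)
urls = df.loc[mask, "profileUrl"].tolist()

print(urls)
```

This is a substring match, so a short entry such as "an" would also hit "Julien ROMAN"; that is inherent to "contains" rather than to this approach.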

rich olive
#

nvm thats not a thing

celest vine
wheat snow
#

alr

celest vine
#

@rich olive where you at help me

rich olive
#

one sec

wheat snow
#

works 🫂

rich olive
#
filtered = df.query(lambda row: name_ele in row.name for name_ele in sd_eth_list)
#

maybe

#

wait you want contains

rich olive
#
filtered = df.query(lambda row: row.name.contains(name_ele) for name_ele in sd_eth_list)
#

case=False

celest vine
# rich olive ```py filtered = df.query(lambda row: row.name.contains(name_ele) for name_ele i...
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_8584\2498179663.py in <module>
----> 1 filtered = df.query(lambda row: row.name.contains(name_ele, case=False) for name_ele in sd_eth_list)

c:\users\user\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in query(self, expr, inplace, **kwargs)
   4055         if not isinstance(expr, str):
   4056             msg = f"expr must be a string to be evaluated, {type(expr)} given"
-> 4057             raise ValueError(msg)
   4058         kwargs["level"] = kwargs.pop("level", 0) + 1
   4059         kwargs["target"] = None

ValueError: expr must be a string to be evaluated, <class 'generator'> given
rich olive
#

yeah that makes sense. sorry Im pretty new too. can fix tho one sec

#
filtered = df.apply(lambda row: row.name.contains(name_ele) for name_ele in sd_eth_list)
fossil ivy
#

I have a multiindexed dataframe like:

                   Duration  Duration/MW       Cost  Cost (m)  Times Offshore Exceeded  Times Vessel Full
Vessel Start Date                                                                                        
JUV    2022-01-01    34.688        4.818  3.983e+06     3.983                      1.5                0.0
       2022-01-02    33.296        4.624  3.839e+06     3.839                      1.4                0.0
       2022-01-03    34.354        4.771  3.948e+06     3.948                      1.6                0.1
       2022-01-04    30.342        4.214  3.534e+06     3.534                      1.5                0.1
       2022-01-05    35.092        4.874  4.025e+06     4.025                      1.6                0.1
       2022-01-06    31.342        4.353  3.637e+06     3.637                      1.4                0.2
       2022-01-07    30.100        4.181  3.509e+06     3.509                      1.3                0.2

WTIV   2022-01-01    34.688        4.818  3.983e+06     3.983                      1.5                0.0
       2022-01-02    33.296        4.624  3.839e+06     3.839                      1.4                0.0
       2022-01-03    34.354        4.771  3.948e+06     3.948                      1.6                0.1
       2022-01-04    30.342        4.214  3.534e+06     3.534                      1.5                0.1
       2022-01-05    35.092        4.874  4.025e+06     4.025                      1.6                0.1
       2022-01-06    31.342        4.353  3.637e+06     3.637                      1.4                0.2
       2022-01-07    30.100        4.181  3.509e+06     3.509                      1.3                0.2

I need to create a boxplot of Duration for each Vessel/ Start Date combination. Ive been struggling to make it work could someone help me? It would be much appreciated

#

whoops wrong df, I created this for the time-series analysis of duration and costs but they are the mean of 10 runs for each pair Vessel, Start Date

#

I have a long ~7300x7 dataframe, where each entry Vessel/ Start Date is separate 20 times, with the same index

celest vine
fossil ivy
# fossil ivy I have a long `~7300x7` dataframe, where each entry `Vessel/ Start Date` is sepa...

that looks like

...
723   WTIV 2022-12-28    10.750  ...     2.248                       0                  0
724    JUV 2022-12-29    43.333  ...     4.876                       2                  0
725   WTIV 2022-12-29     6.833  ...     1.647                       0                  0
726    JUV 2022-12-30    43.667  ...     4.910                       2                  0
727   WTIV 2022-12-30    12.083  ...     2.452                       0                  0
728    JUV 2022-12-31    47.917  ...     5.349                       2                  0
729   WTIV 2022-12-31     8.000  ...     1.826                       0                  0
0      JUV 2022-01-01    35.375  ...     4.054                       1                  0
1     WTIV 2022-01-01     6.500  ...     1.596                       0                  0
2      JUV 2022-01-02    33.083  ...     3.817                       1                  0
3     WTIV 2022-01-02    10.250  ...     2.171                       0                  0
4      JUV 2022-01-03    30.875  ...     3.589                       1                  0
5     WTIV 2022-01-03     9.250  ...     2.018                       0                  0
6      JUV 2022-01-04    10.917  ...     1.528                       0                  1
...
rich olive
fossil ivy
#

Just appended to each other:
simulation(strategy) generates one of those 729x7 dfs

full_results = []

    for i in range(0, 10):
        print("Run", i+1, "of 10")
        full_results.append(simulation(strategy))
#

I managed to get a boxplot for each vessel, but using the entire year as data for each boxplot. Instead I need to have a boxplot for each day of the year per vessel

#

I don't quite know how to implement the Date still

rich olive
#

you can datetime or just manually parse the date

#

whats the difference doing it year vs day

fossil ivy
#

my research investigates the impact of weather seasonality on offshore wind farm decommissioning project performance

rich olive
#

okay what are you trying to boxplot lol

fossil ivy
#

the box and whisker for the duration per day

#

Because the time-series graph I create takes the average of 20 runs, so very high values and very low values are not considered

rich olive
#

so for each day of the year, you want the spread of duration across all vessels

fossil ivy
#

for each day of the year, I want the spread of duration per vessel

#

Because I want to investigate if one of the vessels is more subject to weather uncertainties/ impacts

rich olive
#

you cant boxplot that, its 3 dimensional

fossil ivy
#

The graph at the bottom of the thread

#

just with the year representing my vessels, and the a/b the start date on the x axis

rich olive
#

so you want each vessel as a sub-hierarchy to each day in a boxplot

fossil ivy
#

yes

#

Im rather new/ unknowledgable in coding so yeah... quite tough to get behind it

rich olive
#

me too so we'll see if I can even be any help

fossil ivy
#

wait a sec... Isn't my structure pretty much identical to the df in the thread?

#
   Vessel  Start Date    Duration
717   WTIV 2022-12-25    12.000  ...     2.439                       0                  0
718    JUV 2022-12-26    47.333  ...     5.289                       1                  0
719   WTIV 2022-12-26    10.000  ...     2.133                       0                  0
720    JUV 2022-12-27    45.917  ...     5.143                       2                  0
721   WTIV 2022-12-27    10.500  ...     2.210                       0                  0
#

Year in his example would be my Vessel, Text would be my Start Date and data would be duration?

rich olive
#

sure. I imagine most dfs would apply. Im reading through it now but pivoting is hard lol

fossil ivy
#

Pivoting is a bitch yeah

celest vine
fossil ivy
celest vine
#

Fuck this shit man! I think I should open a small grocery shop

rich olive
celest vine
rich olive
#
filtered = df.apply(lambda row: name_ele in row for name_ele in sd_eth_list)
#

yeah because I assumed you were using .contains() correctly lmao

fossil ivy
#

Soooooooooo yeaaah

#

Looks like too much data for this lol

celest vine
rich olive
#

Im saying thats why i put it in my code. you said dumb

#

try the above

fossil ivy
rich olive
fossil ivy
#

it looks pretty much like a greyed out version of my time-series

rich olive
#

i just wouldnt use a boxplot

fossil ivy
#

What would you suggest otherwise?

rich olive
#

hm one sec

fossil ivy
#

Something like this would be nice as well, but probably same story as the boxplot

rich olive
#

Maybe heatmap the duration on a 2D vessel x day grid and examine the spread separately

fossil ivy
#

bless you

rich olive
#

lmao np

fossil ivy
#

I meant bless you like what are those words lol

#

looking at it now though

rich olive
#

haha oh yeah theyre not words just had a micro-seizure

fossil ivy
#

fair enough loool

#

I see what you mean by heatmap

#

but how can I imagine the structure there?

#

Would you mean duration on y axis, date on x axis

#

and then heatmap per vessel

rich olive
#

one axis of heatmap is day the other is vessel so each square is a vessel on a day, trends along each axis, colour is avg duration

fossil ivy
#

but then that would not model spread (?)

rich olive
#

not spread per vessel per day. Thats what I meant by examine it separately like per vessel or per day

fossil ivy
#

aaah

rich olive
#

but youll be able to see spread of avg duration across year and vessel

fossil ivy
rich olive
#

and then use other examinations to highlight areas of interest on the heatmap

fossil ivy
#

Maybe if I were to combine the months instead of doing a boxplot for every single day that could work

rich olive
#

its still 20 vessels so whatever you think 240 boxes looks like

fossil ivy
#

its 2 vessel

rich olive
#

oh lol

fossil ivy
#

JUV and WTIV

#

yeah

rich olive
#

...you could do a 3d heatmap from an offset angle with cell height and whiskers showing spread

fossil ivy
#

yeah the bot is right

#

What the hell did you mean lol

rich olive
#

one sec

fossil ivy
#

uuh 3d heatmap looking nice tho

rich olive
#

actually the whiskers would be nonsensical

#

heatmap doesnt make sense with two of one variable

#

can you just dual candlestick chart it

fossil ivy
#

ayo

#

calm down with words

#

im out here googling their meaning nonstop haha

#

Wouldn't dual candlestick essentially be dual boxplot tho

rich olive
#

like a stock chart but with the grouped bars like in your SO example

#

lmao yeah

#

im dumb

fossil ivy
#

yeah nah I think a boxplot would be the best approach here

#

because it is intended to visualize variability isnt it

rich olive
#

yeah i guess you have to reduce the timeframe if you wanna see spread

fossil ivy
#

yeah I might just do one separately for each month

#

Then I could derive If you use vessel x in month y, the project performance is significantly uncertain some stuff like that
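
A rough sketch of the per-month grouping, assuming the runs are already stacked into one long frame with the `Vessel`, `Start Date` and `Duration` columns shown earlier. The handful of rows here is copied from the printout above just to have something to run:

```python
import pandas as pd

# A few rows from the long frame shown earlier (all runs stacked together).
runs = pd.DataFrame(
    {
        "Vessel": ["JUV", "WTIV", "JUV", "WTIV", "JUV", "WTIV", "JUV", "WTIV"],
        "Start Date": pd.to_datetime(
            [
                "2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02",
                "2022-12-30", "2022-12-30", "2022-12-31", "2022-12-31",
            ]
        ),
        "Duration": [35.375, 6.5, 33.083, 10.25, 43.667, 12.083, 47.917, 8.0],
    }
)

# Collapse the daily start dates into a month number to get 12 groups per vessel.
runs["Month"] = runs["Start Date"].dt.month

# The numbers a box-and-whisker plot would summarize, per vessel and month.
spread = runs.groupby(["Vessel", "Month"])["Duration"].agg(["min", "median", "max"])

print(spread)
```

With matplotlib installed, `runs.boxplot(column="Duration", by=["Vessel", "Month"])` should draw the corresponding 24 boxes from the same frame, which is a lot more readable than one box per day.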

normal hazel
#

Hi

#

Has anyone used dbt?

serene scaffold
fossil ivy
normal hazel
#

.getdbt

normal hazel
pure moat
#

guys im getting this error could someone help

#

if query[0] == activationWord:
TypeError: 'builtin_function_or_method' object is not subscriptable

serene scaffold
pure moat
#

sure

pure moat
#

this is what got printed

serene scaffold
jolly knoll
#

hello i got a score of 1.0 accuracy for my kNN for k=1 to k=15, what am i doing wrong?

arctic wedgeBOT
jolly knoll
#

exactly

serene scaffold
#

look at query = parseCommand().lower().split. you forgot the () at the end of split

pure moat
#

O

#

TYYY

serene scaffold
# jolly knoll yep

so you think your model can only have 100% accuracy if you've done something incorrectly? we can't guess what you did wrong if we don't know what you did.

serene scaffold
jolly knoll
serene scaffold
jolly knoll
#

ahh i've been told to be worried if my model has 100% accuracy haha. my dimensions are (247165, 19) for my training set. i scaled all numerical datasets before inserting it into a kNN

serene scaffold
#

usually, if you have 100%, it means that your model is very dependent on the training data, and wouldn't perform well in real situations. but I don't know what your model is intended to do.

wet valve
#

Hi I’m new to data science so what all modules should I learn in python for data science 🙂

#

i know pandas and numpy

pseudo basin
#

the first block, I'm learning Machine learning and AI

#

basically, in ML, we make use of matplotlib to plot graphs and sklearn to do the heavy-lifting. try to explain how dataframe says

wet valve
#

thanks

serene scaffold
#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

rare socket
#

I am trying to manually randomly change the weights and biases in my neural network but the only way I found to access them is to loop through them so this is what I did but it is not changing the weights and biases at all. This is very cumbersome, is there an easier way to index through it so that it actually changes the weights and biases?

#

using pytorch

restive python
#

Basically I have around 1,250,000 photos uploaded on boto3 and am trying to make an X file with all the rgb values, but colab takes way too long to download the files and turn them into np arrays
anyone have a better idea?

austere swift
#

You should enumerate in the loops to get the indexes along with the values, then change the value at that index to the new value

#

It would be something like param[idx1][idx2] = j

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

plush jungle
#

I'm trying to retrain stylegan3, but I keep running into the following error

#

this command

python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=datasets/ffhq_control.zip --gpus=1 --batch=4 --gamma=8.2 --mirror=1 --workers=1 --snap=50 --tick=4 --cbase=16384 --resume=C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\pretrained_models\stylegan3-t-ffhqu-1024x1024.pkl

produces this error

File "C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\training\training_loop.py", line 162, in training_loop
misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
File "C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\torch_utils\misc.py", line 163, in copy_params_and_buffers
tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (16) must match the size of tensor b (32) at non-singleton dimension 1

#

I'm resuming from a model trained on 1024x1024 images from the ffhq dataset

rare socket
plush jungle
#

and my dataset, "ffhq_control.zip" is 12 images from that same dataset

austere swift
plush jungle
#

so I figured if I can't even retrain it on the same dataset it's not an issue with my images

#

it also says this

#
Output directory:    ~/training-runs\00006-stylegan3-t-ffhq_control-gpus1-batch32-gamma8.2
Number of GPUs:      1
Batch size:          32 images
Training duration:   25000 kimg
Dataset path:        datasets/ffhq_control.zip
Dataset size:        12 images
Dataset resolution:  1024
Dataset labels:      False
Dataset x-flips:     True

Creating output directory...
Launching processes...
Loading training set...

Num images:  24
Image shape: [3, 1024, 1024]
Label shape: [0]
desert oar
#

do they provide instructions for running it?

plush jungle
#

yeah

#

in the section of the readme under Preparing Datasets and Training

#

my working theory is that it's something to do with my class labels (or lack thereof)

desert oar
plush jungle
#

yeah, like this

#
```
python dataset_tool.py --source=C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\datasets\ffhq_control --dest=datasets/ffhq_control.zip
```
desert oar
#

what happens if you follow one of their examples exactly as written? e.g. the metfaces example

#

"Fine-tune StyleGAN3-R for MetFaces-U using 1 GPU, starting from the pre-trained FFHQ-U pickle."

plush jungle
#

don't you need to download the entire metfaces dataset for that?

#

that's a terabyte at least I think

#

70,000 images or so

#

which is why I instead tried it with the 12 images I downloaded from ffhq

#

you think I'm missing a config file that comes with those datasets?

desert oar
#

oh i didn't realize it was a huge dataset

#

maybe there's a sample you can download

#

oh i see, the ffhq set is more manageable

#

seems like that should work too though

plush jungle
#

from the readme:

restive python
#

anyone know how to use boto3

#

having a little trouble w it rn

plush jungle
#
```
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels. Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
```
desert oar
#

makes sense

#

debugging other people's code is always difficult

#

hard to know where the breakdown is... if it were me, i'd probably file a bug report

plush jungle
#

this person seemed to have the same issue

desert oar
plush jungle
#

so I added this --cbase=16384 argument

#

per the answer

restive python
#

and they told me to come here

desert oar
#

please do read the guide on asking good questions

desert oar
#

apparently your cbase etc. options need to match the pre-trained model

restive python
#

i have 1.2 mil photos on s3 and am trying to turn them into a large csv file. Anyone have any idea on how to pickle these, because right now I am downloading each one and it's going to take around 50 days

desert oar
restive python
#

pixel data

desert oar
#

what are you doing with them? why do you need csv data?

restive python
#

into a huge np array

desert oar
#

...do you see how withholding information in your question wastes both yours and everyone else's time?

#

now it is (kind of) a data science question

plush jungle
#

you're trying to make a single numpy array of 1.2 million images?

desert oar
#

why do you need it all in a huge numpy array?

#

that seems ill-advised and like an "XY" problem

restive python
plush jungle
#

Is there even enough ram on any computer to do that?
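
A rough back-of-the-envelope check of the RAM question. The real trail-cam resolution is never given, so the 224x224 RGB shape below is purely a hypothetical assumption (a common input size for image models), with one byte per value:

```python
# Hypothetical image shape; the actual trail-cam resolution is not stated in the chat.
n_images = 1_200_000
height, width, channels = 224, 224, 3

bytes_per_image = height * width * channels  # uint8 means 1 byte per value
total_gib = n_images * bytes_per_image / 2**30

print(f"{bytes_per_image} bytes per image, about {total_gib:.0f} GiB in total")
```

Even at that small size the single array would need well over 100 GiB, which is why loading images in batches from disk is the usual approach rather than one giant array.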

restive python
#

that's why im asking

restive python
#

idk how to manage this much data

restive python
#

i'm making an animal recognition software and I have a harddrive with a lot of trail cam footage that I'm trying to make into something I can train a model with

#

I'm new to this

#

and am trying to learn

#

I've just never made something with this much data

desert oar
#

it's sometimes enticing to try to DIY things, but with a relatively large amount of data, and the relatively sophisticated models required to do ML on it, then you should probably just use a framework and spare yourself the difficulty

restive python
#

thank you so much ❤️

fringe anvil
#

so uh, im trying to do this (first image) but im getting this (second image)

here's the code

fig2,ax2 = plt.subplots(figsize=(6,4))
fig2.set_facecolor("None")

dbl_ratio = pd.DataFrame(df2["player1 double faults"]/df2["player1 total points total"]) # good
dbl_ratio_avr = dbl_ratio
year_grpby = df2.groupby("year").max()

x,y = df2["start date"],dbl_ratio

ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")
ax2.scatter(x,y,alpha=0.3) # good
ax2.plot(year_grpby,dbl_ratio,"-",color="orange")
mpl.style.use("default")
desert oar
fringe anvil
#

columns in a data frame

#

pd.read_csv

desert oar
#

those look like strings

#

i highly recommend instead converting start date to a proper datetime column

#

!d pandas.to_datetime

arctic wedgeBOT
#

```py
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
```
Convert argument to datetime.

This function converts a scalar, array-like, [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") or [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame")/dict-like to a pandas datetime object.
fringe anvil
#

year is pandas.core.series.Series and same for start date

desert oar
#

df['start date'].dtype

fringe anvil
#

oh sorry, i just started my course lol

desert oar
#

if you convert from strings to a proper datetime type, then you don't need the year column at all. you can do something like df.resample('AS', on='start date').max()

desert oar
#

also... what is year_grpby supposed to be the max of? right now your code takes the max of all columns

#

furthermore that line looks like a mean, not a max (and a smoothed one at that)

#
df2['start date'] = pd.to_datetime(df2['start date'])

x = df2["start date"]
y = dbl_ratio
y_year_mean = df2.resample('AS', on='start date').mean()

this might get you started, but i think you are missing some other things here

#

please do read the docs and not just copy my code though

novel python
#

@desert oar not sure if you'll remember me from yesterday, but only got time now to test again. Whenever you're free let me know and I'll send you the sample and the pivot table I created.

fringe anvil
fringe anvil
#

lmfao, getting somewhere i guess

fringe anvil
#

in my head its clear what i want to do, but putting it into code, doesnt look like its working much

desert oar
#

show me the code you used for the messed up chart you just posted above

fringe anvil
# desert oar well answer the practical question here. you want the yearly mean, right? so why...

it might seem simple to you, but to me, it wasnt working. ive tried 100s of iteration to the code. for some reasons, its just not clicking for this exercise. its really the first one where im having this much trouble. im behind, this is the first workshop, theres a second one. and i only have a few days left to upload it to github. i have a full time job, i wish i could take the time to dig into every single documentation, but right now its not possible

fringe anvil
# desert oar show me the code you used for the messed up chart you just posted above
fig2,ax2 = plt.subplots(figsize=(6,4))
fig2.set_facecolor("None")

df2['start date'] = pd.to_datetime(df2['start date'])

dbl_ratio = pd.DataFrame(df2["player1 double faults"]/df2["player1 total points total"]) # good
dbl_ratio_avr = dbl_ratio

x = df2["start date"]
y = dbl_ratio

ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")
ax2.scatter(x,y,alpha=0.3) # good
ax2.plot(x,dbl_ratio,"-",color="orange")
mpl.style.use("default")
desert oar
desert oar
#

i totally understand the stress of being short on time and not understanding what's going on

fringe anvil
desert oar
#

also ax2.plot(x,dbl_ratio,"-",color="orange") you didn't even plot dbl_ratio_avr

#

i think you understand more than you realize, you are just making silly mistakes at this point. maybe fatigue?

fringe anvil
#

been at it for 12 years. thats why im trying the bootcamp and maybe get a job. change of career. im all in

desert oar
#

you only need to do this once, when you load the dataset:

df2['start date'] = pd.to_datetime(df2['start date'])

and this should produce something like the plot you're looking for:

dbl_ratio = pd.DataFrame(df2["player1 double faults"] / df2["player1 total points total"])
dbl_ratio_avg = dbl_ratio.resample('AS', on='start date').mean()

x = df2["start date"]
y = dbl_ratio

fig2, ax2 = plt.subplots(figsize=(6,4))
fig2.set_facecolor("None")

ax2.scatter(x, y, alpha=0.3)
ax2.plot(x, dbl_ratio_avg, "-", color="C1")

ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")

plt.show()
fringe anvil
#

also, i usually come back from work, take a shower and code.. then i realise, like now, that i did not eat yet

desert oar
#

go eat and don't look at a computer screen. then go look at the code i just posted above and see if it makes more sense

wise iris
#

can someone explain to me what pytorch is and why I need it to train YoloV5?

desert oar
wise iris
#

or how to find out

desert oar
wise iris
#

no lol

desert oar
#

do you have a python environment set up?

wise iris
#

I was following tutorials and they just brought me here

wise iris
plush jungle
#

I managed to get the stylegan code to not throw any errors by just trying random pretrained models until one worked. But now the code freezes at

```
Setting up PyTorch plugin "filtered_lrelu_plugin"...
```
desert oar
#

tbh it might be a little easier to do this with conda, but installing and setting up conda is a bit of a pain

plush jungle
#

is there any way to know why it's freezing here?

desert oar
# wise iris so should I do it?

no, don't bother.

activate the venv, then run:

python -m pip install --extra-index-url https://pypi.ngc.nvidia.com nvidia-cuda-runtime-cu11

this will install cuda toolkit, as per https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#pip-wheels-installation-windows

you should be able to check the exact cuda version that was installed using the nvcc command.

then you should be able to run (with the venv still active):

python -m pip install torch torchvision

and you should have torch available in the venv

#

i assume you installed python 3.10 from python.org?

fringe anvil
#

@desert oar hmm i get KeyError: 'The grouper name start date is not found'

desert oar
fringe anvil
#

in the documentation for resample, i cant find "AS". where did you get those keyword?
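
For context: "AS" is not a resample keyword but one of pandas' frequency strings (offset aliases), meaning "year start". They are listed in the time series user guide rather than on the resample page. Newer pandas versions spell the same alias "YS". A minimal sketch with made-up data (only the column names come from the chat):

```python
import pandas as pd

# Made-up match data, for illustration only.
df = pd.DataFrame({
    "start date": pd.to_datetime(["2010-03-01", "2010-09-15", "2011-05-20"]),
    "dbl_ratio": [0.02, 0.04, 0.05],
})

# "YS" = year start; older pandas versions also accept "AS" for the same thing.
# The frame being resampled must actually contain the "start date" column.
yearly_mean = df.resample("YS", on="start date")["dbl_ratio"].mean()
print(yearly_mean)
```

The result has one row per calendar year, indexed by the first day of that year.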

desert oar
#

@fringe anvil i think it would be easier to do this using a datetime index, but that's a whole big pandas topic that i think we can hold off on (but you should learn it at some point)

#

oh also, one more thing

#

you don't need pd.DataFrame here:

dbl_ratio = df2["player1 double faults"] / df2["player1 total points total"]
#

the full code:

df2 = ...

df2["start date"] = pd.to_datetime(df2["start date"])

df2["dbl_ratio"] = df2["player1 double faults"] / df2["player1 total points total"]

dbl_ratio_year_avg = df2.resample("AS", on="start date")["dbl_ratio"].mean()

x = df2["start date"]
y = df2["dbl_ratio"]

fig2, ax2 = plt.subplots(figsize=(6,4))
fig2.set_facecolor("None")

ax2.scatter(x, y, alpha=0.3)
ax2.plot(x, dbl_ratio_year_avg, "-", color="C1")

ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")

plt.show()
fringe anvil
#

hmm, title dont show on the y and x now, and the style isnt white anymore.. idk if its my computer being janky lol.. i restarted the kernel reran everything. now i get ValueError: x and y must have same first dimension, but have shapes (1179,) and (15,)

#

ok set_xlabel needs to be called before .scatter and .plot

plush jungle
#

ok I found out that the reason it's freezing at filtered_lrelu_plugin is cause I have two versions of it in my pytorch files

#

how do I know which one to delete?

fringe anvil
#

3.9 is the last version, but which version does pytorch use?

novel python
#

I just pivoted a table and wanted to get rid of the top row (all DATA_USAGE_GB__C), and bring the months 1 row down so that the 3rd current row becomes the top one

#

wanted to do that with python, not simply moving them on the .csv file

#

anyone got an idea how to do that? got kinda confused trying here
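
One possible way to do this, assuming the two header rows come from a column MultiIndex (which is what pivot_table produces when values is passed as a list). The data and the "account" and "month" column names are made up; only DATA_USAGE_GB__C comes from the question:

```python
import pandas as pd

# Made-up data; only the column name DATA_USAGE_GB__C comes from the question.
df = pd.DataFrame({
    "account": ["a", "a", "b", "b"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "DATA_USAGE_GB__C": [1.5, 2.0, 0.5, 0.7],
})

pivoted = df.pivot_table(index="account", columns="month", values=["DATA_USAGE_GB__C"])

# The columns now have two levels: the top one repeats DATA_USAGE_GB__C,
# the second one holds the months.
pivoted.columns = pivoted.columns.droplevel(0)  # drop the top header row
pivoted.columns.name = None                     # drop the leftover "month" label
flat = pivoted.reset_index()                    # turn the index into a normal column

print(flat)
```

After this, flat has a single header row and can be written out with to_csv(index=False).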

fringe anvil
#

@desert oar the resample produces 15 rows, which doesn't match the shape of x: "start date" has 1179 rows

#

we passed it the whole column, it should have the same rows, both of them

fringe anvil
#

new code, new error. getting closer to the shape of x..
im able to generate the same graph with groupby.. not sure if its better or not
ValueError: x and y must have same first dimension, but have shapes (1179,) and (926,)

df2["start date"] = pd.to_datetime(df2["start date"]) # should be good now

fig2, ax2 = plt.subplots(figsize=(6,4)) # good
fig2.set_facecolor("None") # good

plt.style.use("default") # good
ax2.set_xlabel("Year") # good
ax2.set_ylabel("Double faults per match") # good

df2["dbl_ratio"] = (df2["player1 double faults"]/df2["player1 total points total"]) # good
dbl_ratio_avr = df2.groupby(["start date","dbl_ratio"])["dbl_ratio"].mean() # not good


x = df2["start date"] # good
y = df2["dbl_ratio"] # good

ax2.scatter(x, y, alpha=0.3) # good
ax2.plot(x, dbl_ratio_avr, "-", color="C1") # need to change something for y
#

152+926=1078 .. so still missing 101 rows .. ah geez this graph.. 3 failed days in a row lol

desert oar
#

or even just

dbl_ratio_avr.plot(ax=ax2, color='C1')

using the pandas built-in plotting helpers

#

(this is a taste of why indexes are useful)
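
The shape errors earlier in the thread come from plotting a per-year average (a handful of rows) against the full "start date" column (1179 rows). A minimal sketch of the idea, using made-up data in place of df2 and grouping by calendar year, which is an assumption about what the average should be taken over:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df2.
df2 = pd.DataFrame({
    "start date": pd.to_datetime(["2010-01-10", "2010-06-01", "2011-02-03", "2011-08-09"]),
    "dbl_ratio": [0.02, 0.04, 0.03, 0.05],
})

# One mean per year, indexed by year: far fewer rows than df2 itself.
years = df2["start date"].dt.year
dbl_ratio_avr = df2.groupby(years)["dbl_ratio"].mean()

fig2, ax2 = plt.subplots(figsize=(6, 4))
ax2.scatter(years, df2["dbl_ratio"], alpha=0.3)
# x comes from the averaged series' own index, so x and y have the same length.
ax2.plot(dbl_ratio_avr.index, dbl_ratio_avr.to_numpy(), "-", color="C1")
ax2.set_xlabel("Year")
ax2.set_ylabel("Double faults per match")
```

Because the averaged series carries its own x values in its index, there is no need to line it up with the original column by hand.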

desert oar
fringe anvil
#

@desert oar ❤️

lapis sequoia
#

how can I keep rows in a df, which have/don't have a match in another df while merging

#

anti_join, semi_join kind of thing
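
pandas has no dedicated anti_join or semi_join, but both can be sketched with isin, or with merge and its indicator option. The frames and the "key" column below are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 4, 5]})

# Semi join: rows of left whose key appears in right (no columns from right are added).
semi = left[left["key"].isin(right["key"])]

# Anti join: rows of left whose key does not appear in right.
anti = left[~left["key"].isin(right["key"])]

# Same anti join via merge: indicator=True adds a "_merge" column
# that says where each row found a match.
merged = left.merge(right.drop_duplicates(), on="key", how="left", indicator=True)
anti_via_merge = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```

The isin form is usually enough for a single key column; the merge form extends more naturally to joins on several columns.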

fossil sphinx
#

Greetings! I have a functional json implementation, for the most part. I am having difficulties with this section:

        puntreturns_t1 = dataGameStats['teams'][0]['stats'][7]['data'].split("-")[0]
        puntreturnsyards_t1 = dataGameStats['teams'][0]['stats'][7]['data'].split("-")[0]

Appropriate JSON code:

       {
           "stat" : "Punt Returns: Number-Yards",
           "data" : "-"
       }

How can I get puntreturns_t1 AND puntreturnsyards_t1 == 0 / None?

I am getting the following error with the current code:

ValueError: invalid literal for int() with base 10: ''
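
The error comes from calling int() on the empty string that "-".split("-") returns. A minimal sketch of one way around it, using a hypothetical helper parse_number_yards that treats missing pieces as 0 (returning None instead would work the same way). Note also that both lines in the snippet above take index [0]; the yards value is presumably meant to be the second piece:

```python
def parse_number_yards(data):
    """Split a 'Number-Yards' string such as '3-27' into two ints.

    Missing pieces (for example the bare '-' from the question) become 0.
    """
    # partition splits on the first '-' only, so negative yards like '2--4' survive.
    number, _, yards = data.partition("-")
    return int(number or 0), int(yards or 0)


puntreturns_t1, puntreturnsyards_t1 = parse_number_yards("-")
print(puntreturns_t1, puntreturnsyards_t1)
```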

trail yacht
#

I need some help. I have to do a project on pneumonia detection using deep learning and machine learning. Its a group project and we just know machine learning basics and a little algo. We don't know any deep learning. We do have the code but don't know how to distribute among 3 people. And also how to quickly learn deep learning.. just need to learn straight from the code... They will teach us later. Any tactics?

idle cairn
#

Does anyone know why my spline looks like this (blue)? I would expect it to be like the one I drew on top (red)..

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @earnest raven until <t:1666271518:f> (10 minutes) (reason: newlines rule: sent 106 newlines in 10s).

The <@&831776746206265384> have been alerted for review.

serene scaffold
#

!unmute 130213385265610753

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: pardoned infraction mute for @earnest raven.

serene scaffold
#

@earnest raven use the paste bin

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

earnest raven
#

Thank you! appreciate it.

serene scaffold
#

but I appreciate that you used ```py. sorry you got zapped.

earnest raven
#

No problem 🙂

#

I've been trying to make a distribution graph based on a dataset that contains duplicate data in an array like so: [5, 15, 15, 15, 15, 15, 20, 20, 20, 30]
However when I change the X axis to be a linear range instead of the actual values of the array, the graph morphs into something completely different.
This is my code which results in the first graph: https://paste.pythondiscord.com/omojaqoxoz
The second graph was created by using x = np.linspace(min(gatewayLatencyValues), max(gatewayLatencyValues), len(gatewayLatencyValues)), however this completely morphs the graph. It is notable that the boxplot stays correct regardless of what the X axis is, since it is generated by matplotlib and not based on array indices.

Anyone have any idea how to solve this?

mighty patio
#

what are you trying to achieve by changing the values on the x-axis?

earnest raven
mighty patio
# earnest raven Id like to smooth the lines out, but for that I need an X axis that has a lot of...

Your calculate_normal function does 2 things.
First it calculates the average and standard deviation
Then it makes the normal
You should separate this into two functions, the first avg, std = get_fit(array) and the second y = make_curve(x, avg, std)
The x you input to the second can have a high density of points, and should not be the same as the array you input to the first. This will give you a smooth curve
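
A minimal sketch of that split. The names get_fit and make_curve are the ones suggested above; the bodies are an assumption about what the original calculate_normal does (a plain normal density), since that code is only available via the paste link:

```python
import numpy as np


def get_fit(array):
    """Return the mean and standard deviation of the samples."""
    values = np.asarray(array, dtype=float)
    return values.mean(), values.std()


def make_curve(x, avg, std):
    """Evaluate the normal probability density with the given mean/std at x."""
    return np.exp(-0.5 * ((x - avg) / std) ** 2) / (std * np.sqrt(2 * np.pi))


latencies = [5, 15, 15, 15, 15, 15, 20, 20, 20, 30]  # sample array from the question
avg, std = get_fit(latencies)

# The fit uses the raw samples; the curve is drawn on a separate, dense x grid.
x = np.linspace(min(latencies), max(latencies), 200)
y = make_curve(x, avg, std)
```

A second dataset can go through get_fit on its own and then be drawn with make_curve on the same x grid, which keeps both curves on one shared axis.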

earnest raven
#

I also forgot to mention another thing, i'd like to add another graph to it with a different dataset using the same x axis

#

but I will keep what you mentioned in mind

#

I tried to just add it to the existing one, but it has more datapoints but less latency so that means the entire x axis has a different scale

mighty patio
#

I also advise you to set both dpi and figsize in plt.subplots(). Doing so allows you to control the fontsize regardless of the number of pixels in your graph
A high dpi+low figsize makes the text big while low dpi+high figsize makes the font small
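
A small sketch of that trade-off: both figures below come out at the same pixel size, but the text is sized in points (relative to inches), so it looks larger in the first one. The title string is just a placeholder:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Same pixel size (1200 x 800), different text size relative to the plot.
fig_big_text, ax1 = plt.subplots(figsize=(6, 4), dpi=200)
fig_small_text, ax2 = plt.subplots(figsize=(12, 8), dpi=100)

for ax in (ax1, ax2):
    ax.set_title("Gateway latency")  # placeholder title; font size is in points
```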

mint palm
#

if I do

from numba import cuda 
device = cuda.get_current_device()
device.reset()

will it affect other users(using that gpu)?

mint palm
#

i hope my prof data doesnt get reset

earnest raven
mint palm
#

RuntimeError: CUDA error: out of memory

#

what to do?

serene scaffold
#

keep in mind that none of us have any idea what you're doing that resulted in you getting that error unless you tell us.

mint palm
#

is this relevant?

$ free -g
              total        used        free      shared  buff/cache   available
Mem:            503         414          40           8          47          75
Swap:           255         255           0
#

seems less than normal to me though

serene scaffold
#

but if the model itself needs more memory than the size of the GPU, I think you're SOL.

mint palm
#

my video is pretty small, and i am doing it one by one

#

ok one minute

#

its this one(the model).
but i dont know about the GPU, can i find that out over ssh?

sharp citrus
#

hello guys, I'm very new here on discord and to data science. I started an internship and I have to do some forecasting with ml. is there anybody who can help me find some resources?

mint palm
# serene scaffold how big is the model, and how much memory does your GPU have? please answer usin...

got this
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|

#

| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 42% 89C P2 282W / 350W | 23647MiB / 24268MiB | 92% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:25:00.0 Off | N/A |
| 47% 86C P2 215W / 350W | 2667MiB / 24268MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:41:00.0 Off | N/A |
| 30% 37C P8 21W / 350W | 14999MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:61:00.0 Off | N/A |
| 30% 34C P8 25W / 350W | 13515MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA GeForce ... Off | 00000000:81:00.0 Off | N/A |
| 87% 68C P2 302W / 350W | 16301MiB / 24268MiB | 98% Default |
| | | N/A |

#

+-------------------------------+----------------------+----------------------+
| 5 NVIDIA GeForce ... Off | 00000000:A1:00.0 Off | N/A |
| 50% 58C P2 143W / 350W | 3213MiB / 24268MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA GeForce ... Off | 00000000:C1:00.0 Off | N/A |
| 33% 54C P2 144W / 350W | 3213MiB / 24268MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA GeForce ... Off | 00000000:E1:00.0 Off | N/A |
| 31% 47C P2 143W / 350W | 3159MiB / 24268MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage
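
In the table above GPU 0 is almost full (23647 of 24268 MiB) while several others have plenty of room, so on a shared machine the out-of-memory error may simply mean the job landed on a busy card. A sketch of picking the emptiest GPU from nvidia-smi's CSV output; pick_least_used_gpu is a hypothetical helper, and the sample text just repeats the memory figures from the table:

```python
def pick_least_used_gpu(csv_text):
    """Return the index of the GPU with the most free memory.

    csv_text is expected to look like the output of:
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
    """
    best_index, best_free = None, -1
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        free = total - used
        if free > best_free:
            best_index, best_free = index, free
    return best_index


# Memory figures (MiB) copied from the nvidia-smi table in the chat.
sample = """0, 23647, 24268
1, 2667, 24268
2, 14999, 24268
3, 13515, 24268
4, 16301, 24268
5, 3213, 24268
6, 3213, 24268
7, 3159, 24268
"""

gpu = pick_least_used_gpu(sample)
# Setting CUDA_VISIBLE_DEVICES to this index before the framework initialises CUDA
# restricts the process to that one GPU.
print(gpu)
```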

lapis sequoia
#

in sklearn's accuracy_score function, how do I implement sample_weight? https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
would I use
label, count = np.unique(y_true, return_counts = True)
and call accuracy_score(y_true, y_pred, count)
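
One caveat with that idea: sample_weight needs one weight per sample, while the counts from np.unique give one number per class. A sketch of expanding class counts into per-sample weights, with made-up labels and inverse class frequency as one hypothetical weighting choice. The weighted accuracy is computed by hand with NumPy here; passing the same array as sample_weight to accuracy_score should give the same number:

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])

# np.unique gives one count per class; return_inverse maps each sample to its class slot.
labels, inverse, counts = np.unique(y_true, return_inverse=True, return_counts=True)
sample_weight = 1.0 / counts[inverse]  # hypothetical choice: inverse class frequency

# Weighted accuracy: weighted mean of the "prediction is correct" indicator.
weighted_acc = np.average(y_true == y_pred, weights=sample_weight)
plain_acc = np.mean(y_true == y_pred)

print(weighted_acc, plain_acc)
```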

plush jungle
#

I'm trying to retrain stylegan3
https://github.com/NVlabs/stylegan3
but I keep getting this error:

```
  File "C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\torch_utils\misc.py", line 163, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 1
```
#

I can't figure out why the tensor shapes would be wrong. I'm running this command

```
py train.py --outdir=~/training-runs --cfg=stylegan3-t --data=datasets/control_dataset.zip --gpus=1 --batch=32 --gamma=8.2 --mirror=1 --workers=1 --snap=50 --tick=4 --resume=C:\python\Generative-Adversarial-Networks\stylegan3-main\stylegan3-main\pretrained_models\stylegan3-r-ffhqu-256x256.pkl
```
which runs a model trained on 256x256 images on my dataset (control_dataset.zip) which is also 256x256 images
fresh tiger
#

Hi, I have a question regarding ANNs, in particular what the neurons in the hidden layer represent

In the screenshot, ive just drawn up a quick ANN thats used to predict house prices based on the three features: num. of bedrooms, area, and dist. from closest school.

So the hidden layer consist of two layers of neurons. Looking at the first layer in the hidden layer, each neuron will take as input the same features, but the weights may be different, and hence different features may have more impact in some neurons than others. The neurons then apply an activation function etc and produce an output.

My question is, what exactly is this output? What sort of information is a specific neuron calculating and outputting?

Im assuming this is completely flying over my head, but I can not seem to find a clear/direct answer on this, and would appreciate any help

plush jungle
#

every hidden layer neuron is computing a score (the activation of that neuron) based on all the ones in the previous layer

#

I like to think of it like each hidden layer neuron is an olympic judge, and the input neurons are a diver

#

each input neuron is a quality that the diver had: gracefulness, amount of splash, difficulty of dive

#

and the hidden layer judges each value those qualities differently

#

the second hidden layer is like another panel of judges

#

only instead of judging the diver, they judge the olympic judges

#

and they too have preferences, so maybe one really hates the russian judge but really likes the swedish judge etc.

#

I don't know if this is making any sense, but TLDR; the first hidden layer finds patterns in the input data. the second hidden layer finds patterns in the patterns

wooden sail
#

yeah that'd be my take as well

plush jungle
wooden sail
#

you don't need to (and actually can't) draw them geometrically. you could just use thin rectangles and fat rectangles (and cubes/prisms/etc when dealing with multidimensional stuff and/or tensors)

serene scaffold
plush jungle
#

if you look, he never actually draws the whole net as vectors

#

this is literally just one neuron

plush jungle
wooden sail
#

well it's basically what you drew there just now, just removing the annotations of the elements of each object

#

but lemme fish something up

#

like what you see here for the convolutional parts

#

there's no reason you can't do the same for a dense network

fresh tiger
#

There's just one thing im still a bit unclear on

#

what would these patterns consist of?

wooden sail
#

my artistic interpretation. regarding the patterns, that depends entirely on what you're training the network to do, but in general they are not human-interpretable. most deep learning architectures are not interpretable

serene scaffold
#

So artistic

fresh tiger
#

Oh, so all we know is that it builds upon some sort of pattern?

#

And so via training the model, we set the weights so that at each layer our neurons find the pattern that lead to the best/most accurate output?

wooden sail
#

pretty much. that's why many people dislike it

#

it's hard to derive strict guarantees for its performance, but so far it anyway works better than most classical methods

fresh tiger
#

Ok ok I see, so just to summarize:

The different set of weights for each neuron will essentially lead to a different pattern being detected by each neuron. The outputs of all neurons when reaching the end in a way "combine"/each neuron contributes to affecting how the overall model will look at the end, and hence we can get models that can fit to any kind of data (ie models with many squiggly lines when graphed)?

#

OH so like if neuron 1 for example had weights that emphasized num of bedrooms and area

#

there could be a pattern in terms of num of bedrooms and area of house having a particular effect on the output right?

wooden sail
#

sure

fresh tiger
#

Alright, I think its making sense to me now. Thank you all very much for ur help 🙂

wooden sail
#

i prefer looking at it from the perspective of parameter estimation. you assume a model and find the model parameters that best explain the data

#

the deep learning model is "ayyy lmao idk what the model is, but this thing has so many parameters it can't go wrong"

fresh tiger
#

Right I see, so like in this image for example, the neurons which are connected with a purple line may have higher weights/parameters when connecting to the dog output neuron, and the neurons connected with the green lines may have higher weights for the cat neuron and lower weights for the dog neuron

#

So our model learned via estimating the parameters which neurons have more emphasis on determining if we have a dog or a cat.

wooden sail
#

well, but what you're calling a "neuron" here are just entries of intermediate (or final) vectors

#

the only reason those matter are because you yourself chose which one represents dog and which one represents cat

#

but yeah that's more or less the idea

#

the caveat being that the stuff going into that layer already has no interpretation

fresh tiger
wooden sail
#

mhm

fresh tiger
#

Sorry, I was referring to the neurons before those 2

wooden sail
#

the idea is basically the same

#

since you're applying an affine transformation, it's two vectors related via a matrix

#

you're finding the entries of that matrix, which correspond to the weights, as you call them
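
To make the "vectors related via a matrix" picture concrete, here is a minimal NumPy sketch of a forward pass for the house-price example. The layer sizes, the ReLU activation, the random weights and the input values are all assumptions for illustration; nothing here is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# One house: [bedrooms, area, distance to school]; values are made up.
x = np.array([3.0, 120.0, 0.8])

# Hypothetical layer sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)


def relu(z):
    """Element-wise activation: negative scores are clipped to zero."""
    return np.maximum(z, 0.0)


h1 = relu(W1 @ x + b1)   # each entry is one first-layer "neuron": a weighted score of the inputs
h2 = relu(W2 @ h1 + b2)  # second layer: weighted scores of the first layer's scores
price = W3 @ h2 + b3     # output layer, no activation for a regression target

print(h1, h2, price)
```

Each row of W1 holds the weights of one hidden neuron, so training the network amounts to finding good entries for W1, W2, W3 and the biases.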

fresh tiger
#

Ahaa ok yes. Thank you so much for all of your help! I really appreciate it 🙂

strong sedge
#

I have been working on my own neural network implementation using numpy
https://github.com/sivansh11/sklearn-nn-extension
try it out! I feel like there is probably a bug some where in the code lmao
I want this to be an extension to sklearn's neural network capabilities, ie work with all the infrastructure that sklearn has built

GitHub

A separate extension to sklearn for adding modular neural networks which in theory should be able to work with sklearn's infrastructure. - GitHub - sivansh11/sklearn-nn-extension: A separa...

desert oar
# strong sedge I have been working on my own neural network implementation using numpy https:/...

nice little project. admittedly i don't think i or many other people would use this when something like skorch is available:

https://towardsdatascience.com/skorch-pytorch-models-trained-with-a-scikit-learn-wrapper-62b9a154623e

https://skorch.readthedocs.io/en/latest/?badge=latest

but it seems like a good self study project!

Medium

A guide to understand how easy and simple it is to train PyTorch models with SKORCH

strong sedge
#

the only thing I dont understand is how/where to implement l1 and l2 regularisation
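
In a hand-written NumPy network, one common place for this is right where the loss and the weight gradients are computed: the penalty is added to the loss, and its derivative is added to each layer's weight gradient before the update step. A minimal sketch, with regularized_loss_and_grad as a hypothetical helper (it does not refer to anything in the linked repository):

```python
import numpy as np


def regularized_loss_and_grad(W, data_loss, data_grad, l1=0.0, l2=0.0):
    """Add L1/L2 penalties to a layer's loss and weight gradient.

    data_loss and data_grad are the unregularized loss and dLoss/dW from backprop.
    """
    loss = data_loss + l1 * np.abs(W).sum() + 0.5 * l2 * np.square(W).sum()
    grad = data_grad + l1 * np.sign(W) + l2 * W
    return loss, grad


W = np.array([[1.0, -2.0]])
loss, grad = regularized_loss_and_grad(W, data_loss=1.0, data_grad=np.zeros_like(W), l1=0.1, l2=0.2)
print(loss, grad)
```

Biases are usually left out of the penalty, so the same step would only be applied to the weight matrices.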

lapis sequoia
#

Hello, I have looked everywhere for the answer to this. I am using keras / tensorflow and creating a model

```
    history = model.fit(
  File "C:\Python310\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Python310\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Input is empty.
         [[{{node decode_image/DecodeImage}}]]
         [[IteratorGetNext]] [Op:__inference_test_function_7901]
2022-10-20 22:04:21.910095: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
```
#

The error is within the package, tried reinstalling, doesnt wanna work

wary crown
#

so I am trying to split a csv into X and y
here is my code:

# Python version
import sys

from sklearn.metrics import make_scorer

print('Python: {}'.format(sys.version))
# scipy
import scipy

print('scipy: {}'.format(scipy.__version__))
# numpy
import numpy

print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib

print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas

print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn

print('sklearn: {}'.format(sklearn.__version__))

# compare algorithms
from pandas import read_csv
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

# Load dataset
url = "energy.csv"
#url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['YEAR', 'TOTAL', 'PURCHASED', 'NUCLEAR', 'SOLAR', 'WIND', 'NATURAL_GAS', 'COAL', 'OIL']
dataset = read_csv(url, names=names)
print(dataset.shape)

# Split-out validation dataset
array = dataset.values
X = array[:, 0:8]
y = array[:, 8]


print(y)

when I print y in the last line tho
I get this:

[  19.9948    0.        0.        0.        0.        0.        0.
    0.        0.      260.2    1326.9          nan       nan       nan
       nan       nan  723.18   2070.    ]

which I dont believe is supposed to happen (the 'nan' thing)
can anyone who knows this kind of stuff tell me whats wrong because im not really sure
thanks in advance
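
One possible reading of the output above: `nan` is how pandas represents a missing value, so some rows of the file probably have an empty last field. The sketch below shows how to check that. The CSV text is made up (the real `energy.csv` is not shown here), and dropping incomplete rows is only one option among several.

```python
import io
import pandas as pd

# Hypothetical stand-in for energy.csv: two rows have an empty last field.
csv_text = """2000,10.0,1.0,2.0,0.0,0.0,3.0,2.0,19.9
2001,11.0,1.5,2.0,0.0,0.1,3.0,2.0,
2002,12.0,1.0,2.5,0.1,0.2,3.5,2.0,
2003,13.0,1.0,2.5,0.2,0.3,3.5,2.5,723.18
"""
names = ['YEAR', 'TOTAL', 'PURCHASED', 'NUCLEAR', 'SOLAR', 'WIND',
         'NATURAL_GAS', 'COAL', 'OIL']
dataset = pd.read_csv(io.StringIO(csv_text), names=names)

# Count missing values per column to see where the NaNs come from.
missing_per_column = dataset.isna().sum()
print(missing_per_column)

# One option: drop incomplete rows before splitting into X and y.
clean = dataset.dropna()
X = clean.iloc[:, 0:8].to_numpy()
y = clean.iloc[:, 8].to_numpy()
print(y)
```

If the real file starts with a header row, passing `names=` without `header=0` makes pandas treat that header as data, which is also worth ruling out.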

lapis sequoia
fringe anvil
#

lets say i made a nice scatterplot that uses the whole dataframe, and i want to generate smaller scatterplots, each limited to one specific value in one of the columns. say column "species" has 6 different birds. how do i generate those similar but slightly different subsets of my main scatterplot? idk if i make any sense

#

so here's what im working with and it made my original scatterplot, but 6 times. now i just need 6 different scatterplots with just the data of a specific entry of my column "surface". which has 6 entries.

num_rows, num_cols = 3,2
fig3, ax3 = plt.subplots(num_rows,num_cols,figsize=(10,12))
fig3.set_facecolor("None") # good

plt.style.use("default") # good

for i in range(num_rows):
    for j in range(num_cols):
        ax3[i,j].scatter(x,y,alpha=0.3)
        ax3[i,j].plot(dbl_ratio_year_avg.index, dbl_ratio_year_avg, "-", color="C1")
        ax3[2,0].set_xlabel("Year")
        ax3[2,1].set_xlabel("Year")
        ax3[0,0].set_ylabel("Double faults per match")
        ax3[1,0].set_ylabel("Double faults per match")
        ax3[2,0].set_ylabel("Double faults per match")
desert oar
fringe anvil
desert oar
#

and maybe do some clever indexing as well, but that's not strictly necessary

#

let's say that you want to split according to a series or array called categ

#

the only tricky bit here is figuring out which element in the axes array corresponds to which category

fringe anvil
#

this is what ive found. fifth column of my dataframe

desert oar
#

there are a couple different ways to do it actually

#

@fringe anvil
you can use some clever indexing for this:

df2["dbl_ratio"] = (df2["player1 double faults"] / df2["player1 total points total"])
surfaces = df2['surface'].unique().tolist()

num_rows, num_cols = 3,2
fig3, axs3 = plt.subplots(
    num_rows, num_cols,
    figsize=(10,12),
    sharex=True, sharey=True,
)

for k, surface in enumerate(surfaces):
    df_surface = df2.loc[df2['surface'] == surface]
    dbl_ratio_year_avg = df_surface.resample('YS', on='start date')["dbl_ratio"].mean()
    i, j = np.unravel_index(k, (num_rows, num_cols))
    a = axs3[i, j]
    a.scatter(df_surface['start date'], df_surface['dbl_ratio'], alpha=0.3)
    a.plot(dbl_ratio_year_avg.index, dbl_ratio_year_avg, color="C1")

axs3[2,0].set_xlabel("Year")
axs3[2,1].set_xlabel("Year")
axs3[0,0].set_ylabel("Double faults per match")
axs3[1,0].set_ylabel("Double faults per match")
axs3[2,0].set_ylabel("Double faults per match")

fig3.tight_layout()
plt.show()
#

you can also do this a bit more elegantly with pandas groupby, but this is good enough to start with
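
A rough sketch of what the groupby version could look like. The dataframe here is synthetic (random values and made-up surface names, only there to make the example runnable), and the column names are assumed to match the ones used above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for df2 (hypothetical data, not the real tennis dataset).
rng = np.random.default_rng(0)
n = 600
df2 = pd.DataFrame({
    "surface": rng.choice(
        ["Acrylic", "Carpet", "Clay", "Grass", "Hard", "Indoor"], size=n
    ),
    "start date": pd.to_datetime("2000-01-01")
    + pd.to_timedelta(rng.integers(0, 3650, size=n), unit="D"),
    "dbl_ratio": rng.random(n) * 0.1,
})

fig, axs = plt.subplots(3, 2, figsize=(10, 12), sharex=True, sharey=True)

# groupby yields (key, sub-dataframe) pairs, one per surface;
# zip pairs each of them with one axes from the flattened 3x2 grid.
for ax, (surface, df_surface) in zip(axs.ravel(), df2.groupby("surface")):
    yearly_avg = df_surface.resample("YS", on="start date")["dbl_ratio"].mean()
    ax.scatter(df_surface["start date"], df_surface["dbl_ratio"], alpha=0.3)
    ax.plot(yearly_avg.index, yearly_avg, color="C1")
    ax.set_title(surface)

fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure. Zipping over `axs.ravel()` avoids the row/column bookkeeping entirely, at the cost of silently skipping groups if there are more categories than axes.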

#

np.unravel_index is worth understanding. think of a 3x3 array:

a00  a01  a02
a10  a11  a12
a20  a21  a22

now imagine "walking" through this array by going across each row. when you get to the end of the row, jump down to the beginning of the next row, like a typewriter:

-->---->---->-|
a00  a01  a02 |
|--------------
-->---->---->-|
a10  a11  a12 |
|--------------
-->---->---->-|
a20  a21  a22

idk if my hilariously bad illustration helps

#

what's the array index of the 6th step (k = 5 with zero-indexing) along that walk? it's 1, 2.

imagine if you were to flatten out the array, connecting rows end-on-end, to produce a 1-d array. then flatten(a)[5] == a[1, 2]

#

!eval numpy calls this "ravel" (a pun on "unravel", like yarn or thread):

import numpy as np

a = np.arange(9).reshape((3, 3))
assert a.ravel()[5] == a[1, 2]
arctic wedgeBOT
#

@desert oar :warning: Your 3.11 eval job has completed with return code 0.

[No output]
desert oar
#

and you can convert between these "flat" ("raveled") indexes and the "non-flat" ("unraveled") array indexes with np.unravel_index and np.ravel_multi_index
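
A small sketch of the round trip between the two kinds of index, using the same 3x2 grid shape as the subplots. The variable names are only illustrative.

```python
import numpy as np

shape = (3, 2)  # same grid as num_rows, num_cols

pairs = []
for k in range(shape[0] * shape[1]):
    # flat index -> (row, col)
    i, j = np.unravel_index(k, shape)
    # (row, col) -> flat index, the inverse operation
    assert np.ravel_multi_index((i, j), shape) == k
    pairs.append((int(i), int(j)))

print(pairs)
```

The pairs come out in the same "typewriter" order as the walk described above: across each row, then down to the next one.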

#

so either of these would work

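    # option 1: convert the flat index k to (row, col)
    i, j = np.unravel_index(k, (num_rows, num_cols))
    a = axs3[i, j]
    # option 2: flatten the axes array and index it with k directly
    a = axs3.ravel()[k]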
#

@fringe anvil does that make any sense at all?

#

this is actually how numpy arrays are stored internally: as one big flat block of memory. all the multi-dimensional axis stuff is an illusion, produced by computing offsets into that flat block from the array's shape and strides
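
A quick way to see the flat storage from Python, as a sketch. It assumes a freshly created, C-contiguous array with 8-byte integers; arrays made by slicing or transposing can have different strides.

```python
import numpy as np

a = np.arange(9, dtype=np.int64).reshape((3, 3))

# strides: how many bytes to step in memory to move one position along each axis.
# Moving down one row skips 3 elements * 8 bytes, moving one column skips 8 bytes.
print(a.strides)

# For a contiguous array, ravel() returns a view onto the same memory, not a copy.
flat = a.ravel()
print(np.shares_memory(a, flat))

# Writing through the flat view changes the 2-d array as well.
flat[5] = -1
print(a[1, 2])
```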

fringe anvil
#

sorry im back. almost forgot to load my winter tires in the car for tomorrow lol

fringe anvil
desert oar
fringe anvil
desert oar
fringe anvil
#

thats what unravel_index does from what i can see

desert oar
fringe anvil
#

so we are enumerating on surfaces, is that df2["surface"]