#data-science-and-ml


serene scaffold
#

Sure. And lo and behold, the difference is negligible.

In [21]: %%timeit
    ...: i = 0
    ...: while i < 1_000_000:
    ...:     i += 1
    ...:
28.9 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [22]: %%timeit
    ...: i = 0
    ...: for _ in range(1_000_000):
    ...:     i += 1
    ...:
26.4 ms ± 224 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
strong sedge
#

yeah, for loops are faster, but not that much faster

serene scaffold
strong sedge
serene scaffold
idle urchin
fading crane
#

Are there any other communities like this one

#

For AI?

#

I literally could not find any others that had activity

strong sedge
serene scaffold
#

So for each row, you're trying to figure out how many prior rows satisfy some condition, or what?

idle urchin
#

yeah

#

and based on that increase the row_index

serene scaffold
# idle urchin yeah

pandas doesn't effectively support operations like that, but you could speed it up by caching the result for prior rows.

#

that way you don't have to start looping from the beginning each time.
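
A minimal sketch of that caching idea, assuming the condition depends only on each row's own value. `count_prior_matches` and the predicate are made-up names for illustration, not anything from pandas:

```python
def count_prior_matches(values, predicate):
    """For each position, count how many earlier values satisfy predicate."""
    counts = []
    running = 0  # cached count of matches among the rows seen so far
    for value in values:
        counts.append(running)  # answer for this row uses only prior rows
        if predicate(value):
            running += 1        # update the cache instead of rescanning
    return counts

prior = count_prior_matches([5, 1, 7, 2, 9], lambda v: v > 4)
print(prior)  # [0, 1, 1, 2, 2]
```

Each row is visited once, so the work grows linearly with the number of rows instead of quadratically.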

opal garden
#

Hi there, I'd like to start a career as a data analyst. For now, I know the basics of pandas and SQL, and of course core Python too. What libraries should I learn next?

serene scaffold
idle urchin
#

is there a way to fix that

opal garden
serene scaffold
serene scaffold
opal garden
#

I'm in my 2nd year, so far we've only had core Python

serene scaffold
wheat snow
#

what is the exact subject that you're studying?

uncut loom
#

Hey

#

I'm trying to make an AI

strong sedge
opal garden
strong sedge
#

can you explain better what you're trying to do?

uncut loom
#

Am I right here

serene scaffold
opal garden
uncut loom
#

Does anyone know how to create new objects while the code is running?

#

(Python)

idle urchin
strong sedge
serene scaffold
#

@idle urchin is there a CSV that animals comes from? please drag the file into this chat.

strong sedge
#

yeah, share the file and also explain what exactly you're trying to do

#

that would help

strong sedge
# serene scaffold Sure. and lo and behold, the difference is negligible. ```py In [21]: %%timeit ...

sorry for being petty 😅 but

Number of iterations: {100}
4.923000233247876e-06
5.7369998103240505e-06
Number of iterations: {10000}
0.0003440270002101897
0.0006337819995678728
Number of iterations: {500_000}
0.021400072999313124
0.03858755500004918

code

import time


def for_loop(iters):
    i = 0
    for _ in range(iters):
        i += _


def while_loop(iters):
    i = 0
    _ = 0
    while _ < iters:
        i += _
        _ += 1
        

def timer(f, iters = 1000000):
    start = time.perf_counter()
    f(iters)
    return time.perf_counter() - start

print("Number of iterations: {100}")
print(timer(for_loop,   100))
print(timer(while_loop, 100))

print("Number of iterations: {10000}")
print(timer(for_loop,   10000))
print(timer(while_loop, 10000))

print("Number of iterations: {500_000}")
print(timer(for_loop,   500_000))
print(timer(while_loop, 500_000))

at least on my machine, it's about 2x faster

#

I consider 2x significant lol

#
```
Number of iterations: {100}
4.971000635123346e-06
5.819999387313146e-06
Number of iterations: {10000}
0.00035766699966188753
0.0006018149997544242
Number of iterations: {500_000}
0.028559688000314054
0.04812260699964099
Number of iterations: {1_000_000}
0.05709153600037098
0.08422518800034595
```
serene scaffold
idle urchin
strong sedge
#

ummm, yeah

slate gate
#

is there a python library for the English dictionary where i can get a random word, or a random word of a specific type (ex: random noun)?

serene scaffold
# strong sedge yoo its chill, I am not trynna one up you lmao, you obviously know more than me ...

a for loop is essentially a while True loop that passes the iterator to next repeatedly until it raises StopIteration, at which time it breaks. So the overhead of each kind of loop depends on what the while condition is, or (for for loops) what work the iterator has to do to produce new values.

The execution of the code inside the loop, meanwhile, is completely unaffected by what kind of loop it is.
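
Roughly what that looks like when spelled out by hand. This is a sketch of the idea, not how the interpreter literally implements it, and `manual_for` is a made-up helper name:

```python
def manual_for(iterable, body):
    """Run body(item) for each item, spelling out what a for loop does."""
    iterator = iter(iterable)      # a for loop first asks for an iterator
    while True:
        try:
            item = next(iterator)  # then requests one value per pass
        except StopIteration:      # and stops once the iterator is exhausted
            break
        body(item)

seen = []
manual_for(range(3), seen.append)
print(seen)  # [0, 1, 2]
```

The per-iteration overhead here is the `next` call; in a hand-written `while i < n` loop it is the comparison plus the increment.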

strong sedge
#

but its implemented in C

#

not python, thats why for is faster

#

lemme try using range with while

#

maybe thatll be faster

#

yeaah, 100%

serene scaffold
strong sedge
#

for expensive computations, the computation would be the bottleneck

serene scaffold
wheat snow
#

guys i need some help, i wanna add a complete column to a tkinter combobox, how would i do that?

#

its a long column btw

#
```
0        2022-06-17
1        2022-06-09
6        2022-06-08
8        2022-06-05
10       2022-06-02
            ...
23022    2018-07-16
23030    2018-07-15
23047    2018-07-13
23059    2018-07-12
23073    2018-07-11
```
this is my output for the general column
#

now i wanna add the column into a combobox

#

but somehow it always takes the index with it

atomic palm
#

is a p, q value in the range of 50-60 correct for ARIMA?

wheat snow
#

i tried

```py
cb_start_time['values'] = df_date['Dates'].reset_index(drop=True)
```
but this doesn't seem to work
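
A possible way around it, sketched under the assumption that the combobox wants a plain list of strings and that passing a Series hands over its printed form, index included. `combobox_values` is a made-up helper name, and the three dates are taken from the output above:

```python
import pandas as pd

def combobox_values(series):
    """Return the Series' values as a plain list of strings, without the index."""
    return series.astype(str).tolist()

df_date = pd.DataFrame(
    {"Dates": ["2022-06-17", "2022-06-09", "2022-06-08"]},
    index=[0, 1, 6],  # non-contiguous index, like the real data
)
values = combobox_values(df_date["Dates"])
# In the GUI code this would be: cb_start_time["values"] = values
```

`reset_index(drop=True)` only renumbers the index; the result is still a Series, so the index is still there.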
gaunt anvil
serene scaffold
gaunt anvil
serene scaffold
#

Also, that's scientific notation. Though you might already know that.

gaunt anvil
#

yh

#

10^-3

#

alr ty i'll try it

#

how would i know if the learning_rate is small enough

#

training loss converges to 0?

serene scaffold
# gaunt anvil how would i know if the learning_rate is small enough

The learning rate determines by how much you adjust the weights at each update step. If your loss is oscillating (which means "going up and down repeatedly"), that means you keep overcorrecting, and then overcorrecting the overcorrection, and then overcorrecting again.

#

Because you can't make an adjustment small enough to get to the optimum
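
A toy illustration, assuming plain gradient descent on loss(w) = w**2, whose optimum is w = 0. The function `descend` and both learning rates are made up for the example:

```python
def descend(lr, w=1.0, steps=5):
    """Minimise loss(w) = w**2 by gradient descent; return w after each step."""
    weights = []
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # the learning rate scales every correction
        weights.append(w)
    return weights

small = descend(lr=0.1)  # w creeps toward 0 from one side
large = descend(lr=1.1)  # w jumps across 0 on every step and moves away from it
```

With the large rate, every update lands on the other side of the optimum, which is the overcorrection described above. On this particular toy the loss grows rather than bounces, but the mechanism is the same.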

gaunt anvil
#

ah i see, i didn't know that

#

thanks!

#

i assume as we continue to train then, we should expect to see training loss converging to 0?

serene scaffold
gaunt anvil
#

hmm

#

thank you :>

tame ocean
#

Yo guys I need help with making my first tensorflow project

#

I don't understand ml and ai much

#

If any of you could briefly explain the basic concepts of it

serene scaffold
hasty mountain
#

Hey guys, I want to make a neural network in Pytorch which has multiple outputs, and one of those outputs is conditioned by another.
However, I want the backpropagation for output A to be independent from the backpropagation for output B.
How should I do this? Simply using .detach()?

Code example:


class Subject(torch.nn.Module):

  def __init__(self):

    super(Subject,self).__init__()

    self.linear1 = torch.nn.Linear(1, 50)
    self.linear2 = torch.nn.Linear(100, 1)

  def forward(self, input):

    outA = self.linear1(input)

    input = torch.cat((outA, outA), 1) # This generates a conflict in backprop
    
    input = torch.cat((outA.detach(), outA.detach()), 1) # Maybe this is what I want?
    
    outB = self.linear2(input)

    return outA, outB

#

lol...it seems this is also making the output from the first iteration in the batch condition the output from the rest of the batch.
Conditioning a bit too much.

serene scaffold
#

I wonder if that would just make it not backprop all the way

hasty mountain
#

Well, the idea would be pass the outA into one loss, and outB into another loss, and apply backward on each loss.
I suppose lossA would backprop through linear1 and lossB through linear2?
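
A small check of that assumption, sketched with the same layer sizes as the class above. The random input and the squared-mean losses are placeholders, not the real losses:

```python
import torch

linear1 = torch.nn.Linear(1, 50)
linear2 = torch.nn.Linear(100, 1)

x = torch.randn(4, 1)
out_a = linear1(x)
# detach() cuts the graph here, so loss_b cannot reach linear1
out_b = linear2(torch.cat((out_a.detach(), out_a.detach()), dim=1))

loss_b = out_b.pow(2).mean()
loss_b.backward()
grad_after_b = linear1.weight.grad  # still None: nothing flowed back

loss_a = out_a.pow(2).mean()
loss_a.backward()
grad_after_a = linear1.weight.grad  # now populated, by loss_a only
```

If `grad_after_b` is `None` while `linear2.weight.grad` is set, lossB only trains linear2 and lossA only trains linear1.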

#

At least, that's the idea...

hasty mountain
#

Tested and confirmed: exactly what I want

serene scaffold
#

YAY

hasty mountain
#

Now I just have to wait and confirm if my learning rate is too high or if the model is simply still a bit confused with the data...as one of the output losses went from trillions to millions and is now going to the moon...but it's only the 7th epoch.

#

Also, is it normal to use a LogSoftmax + NLLLoss and get a loss value of...like...200.000?

idle urchin
#

are there any ways that are faster than "np.where" to search whether a column in a dataframe contains a certain value and return that row index?

hasty mountain
idle urchin
#

like I even tried it

hasty mountain
#

Oh...

idle urchin
hasty mountain
#

Nah, I don't use pandas that much since I've started studying neural networks

#

I just use it once in a while for visualizing data, separating X and y...

gaunt anvil
rugged comet
#

If anyone has the time, would someone please review my code and give me some guidance on where to go next? The accuracy is barely better than guessing randomly.
The important parts are build_mtg_model, build_preprocessing_model, and main. The file attached is main.py. Most of the other functions are commented-out because they were previous tries.
https://paste.pythondiscord.com/supecupuni
If you want to take a look at mtg.py, which loads the data, just let me know.

woeful hedge
#

Can someone give me some info and guidance on where to begin with making a local closed system NLP system that I can train with a neural network to read books from my Google play library or from my local hard drive.

gaunt anvil
#

how do i convert the model.inference to output a mel spectrogram so i can run it thru smth other than waveglow

gaunt anvil
#

what does this bit of code do? i.e. what's reduced_loss for

fervent hatch
#

Hello i just want to ask what does the cross validation score do?

drifting imp
#

Hi, I need a pre-trained model or library which can recognise hate speech in the input text. Any suggestions? Thanks <3.14

fossil ivy
#

good morning everyone

#

How can I get rid of the grey-ish background of the plot?

mighty patio
fossil ivy
#
import seaborn as sns
import matplotlib.pyplot as plt

def create_box_whisker(x):

    #sns.set(rc={'figure.figsize': (18, 8), 'axes.facecolor': 'white', 'figure.facecolor': 'white'})
    fig, ax = plt.subplots(figsize=(22, 8))
    #sns.set(rc={'axes.facecolor': 'white', 'figure.facecolor': 'white'})

    sns.barplot(x="Paper", y="Value", data=x, color="grey")
    plt.xlabel("")
    plt.ylabel("Dismantling duration per MW [h]", fontsize=22)
    plt.xticks(fontsize=22, rotation=0)
    plt.yticks(fontsize=22)

    plt.show()
#

This is the code, apparently the sns.set makes some default changes to the plot

#

so I did plt.subplots to define the figsize instead, making the grey-ish background disappear

mighty patio
# fossil ivy ```py import seaborn as sns import matplotlib.pyplot as plt def create_box_whis...

This is why I prefer pure matplotlib instead of seaborn.
Anyways, I suggest you set a higher dpi and lower figsize instead of adjusting the fontsize manually
Try the following

def create_box_whisker(x):
    fig, ax = plt.subplots(figsize=(11, 4), dpi = 200)
    sns.barplot(x="Paper", y="Value", data=x, color="grey", ax = ax)
    ax.set_xlabel("")
    ax.set_ylabel("Dismantling duration per MW [h]", fontsize=22)
    fig.show()
fossil ivy
#

looks even better

#

thanks

fossil ivy
#

@mighty patio may I ask you another question?
Im using pyplot to plot the results of my simulation model. It looks like this.
How hard would it be to have an average axhline for each plot in there? I've tried making it work but, again, couldn't find something for multiple lines

#

(also just added plt.tight_layout() to have the x label shown properly)

mighty patio
#

I have never used axhline before, but I assume you mean something like this?

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1,13,200)
y1 = np.cos(2*np.pi*x/12)+1
y2 = np.cos(2*np.pi*x/12+1)**3+1.7

fig, ax = plt.subplots(1,1, figsize = (6,4), dpi = 200)
ax.plot(x, y1, label = "A", color = [1,0.5,0])
ax.plot(x, y2, label = "B", color = [0,0.5,1])
ax.axhline(np.average(y1),ls = "--", color = [1,0.5,0])
ax.axhline(np.average(y2),ls = "--", color = [0,0.5,1])

ax.set_xticks(np.arange(12)+1)
ax.set_xticklabels(["Jan\n2022","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov", "Dec"])
ax.legend()
ax.set_xlabel("Starting date")
ax.set_ylabel("Project cost [M€]")
ax.set_xlim(1,13)
fig.tight_layout()
fig.savefig("temp.png")
fossil ivy
#

yes exactly

#

seems like I will have to make changes then

#

because I used

def create_timeseries(x, s, f):
    final_df = pd.concat(x)

    averages = final_df.groupby(["Vessel", "Start Date"]).mean()
    final_df.set_index(["Vessel", "Start Date"])
    averages.to_excel("SIMRESULTS_COSTPERMW_AVG.xlsx")

    with pd.option_context("display.max_rows", None,
                           "display.max_columns", 9,
                           "display.precision", 3,
                           "display.expand_frame_repr", False):
        print(final_df)
        print(averages)

    plt.figure(1, figsize=(50, 5))
    averages.unstack(level=0)["Duration"].plot()
    plt.xlabel("Starting Date")
    plt.ylabel("Project Duration")
    plt.title(f"Decommissioning duration per starting date, Number of Turbines: {f.n}")
drifting imp
#

Hi, how can I fix these red lines? Thanks!

fossil ivy
#

im not sure, have you tried using from keras.preprocessing.text import Tokenizer?

#

Like you import keras, so maybe the reference should also be keras only not tensorflow.keras

fossil ivy
#

from keras.preprocessing

#

not tensorflow.preprocessing

drifting imp
fossil ivy
#

so does it work?

drifting imp
keen notch
#

hey I have a python error was wondering if someone can help spot the problem

#
```py
# function first, then main script for plotting
# YOUR CODE HERE
def fourier(values, order):
    n = order
    xDivideT = values
    solution = []
    for i in range(len(xDivideT)):
        summation = 0
        for  m in range(1, n+1):
            j = (2*m)-1
            k = np.sin(j*2*np.pi*xDivideT[i])
            sum = (1/j)* k
        result.append(4* np.pi * sum)
    return result

##plotting 3 curves
values = np.linspace(0,1,200)

fourierOne = fourier(values, 3)
fourierTwo = fourier(values, 11)
fourierThree = fourier(values, 40)

plt.plot(values,fourierOne, color = "blue" )
plt.plot(values,fourierTwo, color = "green" )
plt.plot(values,fourierThree, color = "red" )

plt.title("Fourier series approximation")
plt.show()
```
wooden sail
keen notch
#

thank you🙈

wooden sail
#

similarly with summation and sum, btw

#

and if it's called sum(mation), you probably meant += as well?

drifting imp
fossil ivy
drifting imp
drifting imp
keen notch
#
```py
import matplotlib.pyplot as plt
# function first, then main script for plotting
# YOUR CODE HERE
def fourier(values, order):
    n = order
    xDivideT = values
    solution = []
    for i in range(len(xDivideT)):
        summation = 0
        for  m in range(1, n+1):
            j = (2*m)-1
            k = np.sin(j*2*np.pi*xDivideT[i])
            summation = (1/j)* k
        solution.append(4* np.pi * summation)
    return solution

##plotting 3 curves
values = np.linspace(0,1,200)

fourierOne = fourier(values, 3)
fourierTwo = fourier(values, 11)
fourierThree = fourier(values, 40)

plt.plot(values,fourierOne, color = "blue" )
plt.plot(values,fourierTwo, color = "green" )
plt.plot(values,fourierThree, color = "red" )

plt.title("Fourier series approximation")
##plt.legend()
plt.show()
```
#

the graph doesn't seem right

wooden sail
#

in the for loop here

for  m in range(1, n+1):
            j = (2*m)-1
            k = np.sin(j*2*np.pi*xDivideT[i])
            summation = (1/j)* k

i'm pretty sure you meant summation += something, otherwise there is no point in iterating

#

either that, or the append goes in the loop

#

what are you trying to do?

keen notch
#

I'm trying to write this; instead of just one line, I tried breaking it down

keen notch
#

it makes sense

wooden sail
#

right, so there's a summation 🙂 you need to += something

keen notch
#

i see i see thank you!!:)

#

looks good I think!!

#
```py
import numpy as np
import matplotlib.pyplot as plt
# function first, then main script for plotting
# YOUR CODE HERE
def fourier(values, order):
    n = order
    xDivideT = values
    solution = []
    for i in range(len(xDivideT)):
        summation = 0
        for  m in range(1, n+1):
            j = (2*m)-1
            k = np.sin(j*2*np.pi*xDivideT[i])
            summation += (1/j)* k
        solution.append(4* np.pi * summation)
    return solution

##plotting 3 curves
values = np.linspace(0,1,200)

fourierOne = fourier(values, 3)
fourierTwo = fourier(values, 11)
fourierThree = fourier(values, 40)

plt.plot(values,fourierOne, color = "blue" )
plt.plot(values,fourierTwo, color = "green" )
plt.plot(values,fourierThree, color = "red" )

plt.title("Fourier series approximation")
##plt.legend()
plt.show()
```
#

hmm but still getting an assertion error

wooden sail
#

i have no idea about that, idk what the assert is trying to check

#

those are arbitrary constants, so presumably it'll only work for a specific set of signals
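
For comparison, a sketch of the textbook square-wave series, which is only an assumption about what the assignment asks for since the formula itself isn't in the chat. It uses a prefactor of 4/pi, whereas the code above multiplies by 4*np.pi; whether that matters for the assert is a guess. `square_wave_partial_sum` is a made-up name:

```python
import math

def square_wave_partial_sum(x_over_t, order):
    """Sum the first `order` odd harmonics of the square-wave Fourier series."""
    total = 0.0
    for m in range(1, order + 1):
        j = 2 * m - 1
        total += math.sin(j * 2 * math.pi * x_over_t) / j  # accumulate, don't overwrite
    return 4 / math.pi * total  # assumed prefactor 4/pi, not 4*pi

approx = square_wave_partial_sum(0.25, 200)  # should sit close to 1 on the plateau
```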

compact star
#

Does anyone know how I could do forward propagation from scratch in python for a convolutional layer? If possible, could you show me how to vectorise it and then apply that whole process using numba?

keen notch
wooden sail
#

forward propagation is simply the application of the model. now, regarding convolutions, there's a LOT of freedom

keen notch
#

I have another error for another code

wooden sail
#

you can apply convolutions by constructing (multi level) toeplitz matrices, by using built-in convolution functions, or by doing it in the frequency domain
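
A 1D sketch showing that the three routes give the same numbers. The signal and kernel are arbitrary example values, and the 2D or multi-level case needs more bookkeeping than this:

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 0.0, -1.0])

# 1. Built-in routine ("full" output, length 4 + 3 - 1 = 6)
direct = np.convolve(signal, kernel)

# 2. Frequency domain: zero-pad both to the full length, multiply the spectra
n = len(signal) + len(kernel) - 1
via_fft = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

# 3. Toeplitz matrix: column j holds the kernel shifted down by j rows
toeplitz = np.zeros((n, len(signal)))
for j in range(len(signal)):
    toeplitz[j:j + len(kernel), j] = kernel
via_matrix = toeplitz @ signal
```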

keen notch
#
```py
'''
One of the simplest change detect methods is the
online exponential filter, dating back to early radar applications.
Change detection means the comparison of each incoming value to the previous
value, see the detail and formula below.  If that numerical comparison of
the current value with the previous value exceeds a fixed threshold value then
an alarm is raised (or the location is stored as in this exercise). This
process can be implemented on a computer as a simple digital filter

The filter takes one data item after the other (online). The filter is
implemented in the function 'expofilter(prval, data, alpha).
The factor alpha is a gain factor or 'forgetfulness' factor,
quantifying how much influence on the filter previous data values should
have with values in the interval 0<=alpha<=1. Small alpha lead to hardly
any smoothing and the filter will react on any change in the signal very
sensitively while large alpha should show a clear change but react
little on noisy input.
'''
# YOUR CODE HERE

import numpy as np
import matplotlib.pyplot as plt

def expofilter(prval, data, alpha):
    return alpha*prval + (1-alpha)*data # YR: no error in this line

def changeDetect(data, alpha, threshold):
    previousvalue = data
    response = []
    change = []
    alarms = []
    for counter, val in enumerate(data):
        value = expofilter(previousvalue, val, alpha)
        print(value, previousvalue, threshold)
        if abs(value-previousvalue)>threshold:
            change.append(counter)
        response.append(abs(value-previousvalue))
    return np.array(response), np.array(change)

# Use case and testing;
# YR: No error below this line, style changes as appropriate are possible.
tseries = np.random.randint(-4,4,100)
tseries[50] += 20
tseries[51] += 20
tseries[52] += 20
alarmlevel = 1
gainfactor = 0.85
resp, alarms = changeDetect(tseries, gainfactor, alarmlevel)


# plotting
plt.plot(resp)
plt.xlabel('time')
plt.ylabel('filter response')
plt.show()
```
wooden sail
#

so, it looks like you expect previousvalue and value to be scalars, but they aren't. in changeDetect, what is data supposed to be? presumably a numpy array

#

when you do previousvalue = data, you make previousvalue a numpy array too

#

then the operation if abs(previousvalue...) > something also returns a numpy array of booleans

#

if [numpy array of booleans] is ambiguous

#

but more importantly, i'm pretty sure you meant previousvalue to be a scalar in the first place 😛 rethink how you store the previous value
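
One way that could look, as a sketch only: it assumes the filter is seeded with the first sample and that the filtered value is carried forward each step, which the exercise text doesn't spell out. It returns plain lists instead of numpy arrays, and `change_detect` and the short test series are made up:

```python
def expofilter(prval, data, alpha):
    return alpha * prval + (1 - alpha) * data

def change_detect(data, alpha, threshold):
    """Return (responses, change_indices) for a sequence of scalar readings."""
    previous = float(data[0])  # assumption: seed the filter with the first sample
    responses, changes = [], []
    for i, sample in enumerate(data):
        value = expofilter(previous, sample, alpha)
        step = abs(value - previous)  # both are scalars, so the if is unambiguous
        if step > threshold:
            changes.append(i)
        responses.append(step)
        previous = value  # carry the filtered value into the next iteration
    return responses, changes

series = [0, 0, 0, 20, 20, 20, 0, 0]
responses, changes = change_detect(series, alpha=0.85, threshold=1)
print(changes)  # [3, 4, 5, 6]
```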

keen notch
wooden sail
#

looks like you're doing a bsc or msc in something that involves signal processing, control, electrical engineering, or something of the sort :x fun times

keen notch
wooden sail
#

heh. have fun!

keen notch
worn dome
#

Hello Everyone.

Does having too many Python installations create an issue when installing tensorflow?
i.e.
I have 1. Anaconda (Python 3.8.8 64-bit)
2. Miniconda3 (Python 3.9.12 64-bit)
3. Python 3.10.8 (64-bit)

Will it create any problem for installing tensorflow and object detection?
Can anyone help me with installation for the same?

wooden sail
#

there shouldn't be a problem, just make sure you specify which interpreter you're installing it for
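
A quick way to check which interpreter you are actually on, run from inside the Python you intend to use. The install command is only built here for illustration, not executed:

```python
import sys

# Path of the interpreter running this script; packages installed through
# this interpreter's pip end up in its own site-packages.
print(sys.executable)

# Same as typing "<that path> -m pip install tensorflow" in a shell.
install_cmd = [sys.executable, "-m", "pip", "install", "tensorflow"]
print(" ".join(install_cmd))
```

Calling pip as `python -m pip` through a specific interpreter avoids guessing which of the three installations a bare `pip` belongs to.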

mild sorrel
#

hi i'm currently trying to recreate Michael Reeves' sentiment analysis thing for fun and wondering how i would do it since i know nothing about this topic

hasty mountain
#

It's easier to start with isolated words, that is, marking a sentence with "good words"("nice", "good", "welldone", "well-made") with positive emotions, and making the counterpart for negative emotions
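
A bare-bones sketch of that word-list idea. The positive words are the ones listed above, the negative list is invented for the example, and `sentiment_score` is a made-up name:

```python
POSITIVE = {"nice", "good", "welldone", "well-made"}  # from the message above
NEGATIVE = {"bad", "awful", "broken"}                 # hypothetical counterpart list

def sentiment_score(sentence):
    """Count positive words minus negative words in a sentence."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("Nice work, really good!"))  # 2
print(sentiment_score("This is bad."))             # -1
```

A score above zero would be read as positive and below zero as negative. It ignores negation ("not good") and context, so it is only a starting point.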

obsidian belfry
#

hey, does anyone know a way to display the km scale on choropleth maps? (with plotly express)

simple fossil
#

Hello. Is there a way to make the following function faster?

```py
def calculate_distance_sklearn(target_representation: list, representations: pd.DataFrame):
    numpy_representations = np.array(representations["representation"].tolist())
    numpy_target_rep = np.array([target_representation])
    repr = cosine_similarity(numpy_representations, numpy_target_rep)
    representations["distance"] = repr.flatten()
    return representations
```

#

Currently, the function takes 5 seconds. The most expensive operation is .tolist() call. Any ideas on how to make it faster?

strong sedge
simple fossil
#

No, it fails with an error.

#
```
Traceback (most recent call last):
  File "vectorize.py", line 118, in <module>
    calculate_distance_sklearn_quick(target_rep, representations)
  File "D:\AI\website\api\utils\general.py", line 10, in wrap_func
    result = original_function(*args, **kwargs)
  File "vectorize.py", line 76, in calculate_distance_sklearn_quick
    repr = cosine_similarity(numpy_representations, numpy_target_rep)
  File "C:\Users\Martin\.conda\envs\tf\lib\site-packages\sklearn\metrics\pairwise.py", line 1377, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "C:\Users\Martin\.conda\envs\tf\lib\site-packages\sklearn\metrics\pairwise.py", line 155, in check_pairwise_arrays
    X = check_array(
  File "C:\Users\Martin\.conda\envs\tf\lib\site-packages\sklearn\utils\validation.py", line 856, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
ValueError: setting an array element with a sequence.
```
tidal bough
simple fossil
tidal bough
#

What's the dtype of that column (representations["representation"].dtype)?

simple fossil
#

Object

#

Here is an image to clarify what I'm trying to do.

tidal bough
#

Oh, each row being a python list is annoying indeed. tolist shouldn't be necessary still, but converting this column of list into a 2d array will still take some time.

#

@simple fossil I tested a few approaches, and I'm getting that the fastest is this very straightforward one:

def f3(df):
    col = df.lists.values
    n,m = len(col), len(col[0])
    arr = np.empty((n,m),dtype=type(col[0][0]))
    for i,row in enumerate(col):
        arr[i,:] = row
    return arr
%timeit np.array(df.lists.tolist())
%timeit np.vstack(df.lists.apply(np.array).values)
%timeit f3(df)
1.32 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2.29 ms ± 439 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.15 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
#

this is on this test dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({"lists":[[j+0.5 for j in range(i,i+100)] for i in range(100)]})
#

the reason this is fast is because it allocates an array of the final size right away, whereas solutions like vstack have to allocate arrays for each row first

simple fossil
#

Thank you. That did speed up. Now it takes 3.8 seconds.

#

Would there be a better way to do it that would be faster, like storing those values as a different type?

wooden sail
#

ah, reptile, now that you're around i wanna ask you a question. you happen to know a good estimator for the fundamental frequency of a signal consisting of a sinusoid and its harmonics?

drowsy cairn
#

Hey guys, I am pursuing an MS in DS. I have to choose a dataset for my course project, related to unsupervised learning (clustering). Can any of you help me out with choosing a dataset?

serene scaffold
drowsy cairn
serene scaffold
#

I wouldn't make it more challenging than your instructor intended tbh. a university course should introduce new concepts in a deliberate order. Unless you have prior data science experience, I'd focus on the specific goals of the assignment.

drowsy cairn
young granite
#

if i got a df like this:

```
     ID comp 1  amount 1 comp 2  amount 2 comp 3  amount 3 comp 4  amount 4
0  772D    D45       0.5    U45       0.3    T45       0.2    NaN       NaN
1  223P    D54       0.5    U54       0.5    NaN       NaN    NaN       NaN
2  212E    D45       0.6    U55       0.1    I23       0.2    Z23       0.1
```
and i want to transpose it in such a way that the values in "comp x" are cols. what would be a good approach? maybe a dict?
serene scaffold
young granite
serene scaffold
#

actually, hmm

#

so basically, you want columns that are (ID, comp, amount)

young granite
#

no

#

i want ID, comp values as cols

#

so i tried to get all unique values from each comp and then attach the right amount values 🗿
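
One way to sketch that reshaping, assuming each ID lists a given comp at most once: go from wide to long first, then pivot so every comp value becomes its own column. The frame is rebuilt from the table above, and the name `slot` for the 1..4 suffix is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["772D", "223P", "212E"],
    "comp 1": ["D45", "D54", "D45"], "amount 1": [0.5, 0.5, 0.6],
    "comp 2": ["U45", "U54", "U55"], "amount 2": [0.3, 0.5, 0.1],
    "comp 3": ["T45", np.nan, "I23"], "amount 3": [0.2, np.nan, 0.2],
    "comp 4": [np.nan, np.nan, "Z23"], "amount 4": [np.nan, np.nan, 0.1],
})

# Stack "comp N" / "amount N" pairs into two columns, one row per (ID, slot)
long = pd.wide_to_long(df, stubnames=["comp", "amount"], i="ID", j="slot", sep=" ")
long = long.dropna(subset=["comp"]).reset_index()

# One column per distinct comp value, holding that ID's amount (NaN if absent)
wide = long.pivot(index="ID", columns="comp", values="amount")
```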

serene scaffold
#

I'm in a meeting, so I can't focus in, unfortunately

young granite
#

its alright maybe later u find some time or someone else

#

thanks nevertheless

pure plover
#

Trying to subtract a portion of one dataframe for another (background subtraction - timeseries). I want to just subtract the reading values but I don't want to subtract the time columns. The headers are: {'Time [s]': [0.0, 60.015, 120.03, 180.048, 240.048], 'A1': [328, 394, 452, 515, 577], 'A2': [299, 360, 416, 472, 524], 'A3': [685, 826, 952, 1118, 1209], 'A4': [631, 768, 898, 1034, 1154], 'A5': [1420, 1689, 1956, 2236, 2460], 'A6': [1475, 1797, 2093, 2391, 2601], 'A7': [2231, 2569, 2935, 3262, 3588], 'A8': [2426, 2799, 3185, 3579, 3924]}

#

The header of the dataframe I would like to subtract from that data is: {'Time [s]': [0.0, 60.015, 120.03, 180.048, 240.048], 'A1': [84, 80, 79, 82, 79], 'A2': [167, 162, 154, 154, 155], 'A3': [330, 283, 280, 279, 281], 'A4': [256, 246, 248, 246, 246], 'A5': [545, 543, 557, 548, 545], 'A6': [563, 566, 552, 565, 576], 'A7': [1075, 1025, 1025, 1027, 1033], 'A8': [969, 974, 997, 996, 980]}

#

The purpose is background subtraction
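
A sketch of one way to do that, assuming both frames have the same row order and the same column names. Only the first three rows of A1 and A2 from above are used to keep it short:

```python
import pandas as pd

data = pd.DataFrame({"Time [s]": [0.0, 60.015, 120.03],
                     "A1": [328, 394, 452], "A2": [299, 360, 416]})
background = pd.DataFrame({"Time [s]": [0.0, 60.015, 120.03],
                           "A1": [84, 80, 79], "A2": [167, 162, 154]})

reading_cols = data.columns.drop("Time [s]")  # every column except the time axis
corrected = data.copy()                        # leave the raw readings untouched
corrected[reading_cols] = data[reading_cols] - background[reading_cols]
```

The time column is carried over unchanged because it is never part of the subtraction.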

tidal bough
torn monolith
#

def list_of_case(self):
    lisx = []
    for gulty in self.bomb:
        lisx.append(gulty['category'])
    return lisx

#

bro i made a class object and i always get an error from the for loop

#

can you help me ?

serene scaffold
#

And remember to never say that you "get an error" without copying and pasting the whole error message into the chat. No one knows what the error is except you.

wooden sail
compact star
#

I am trying to implement forward and backward propagation for a convolutional layer:

I currently have this as my class for that layer and I was wondering how I could create a function that uses guvectorize as it said in the page linked that it can be used for that.

Any help would be appreciated

class ConvolutionalLayer(Layer):
    def __init__(self, input_shape, stride, kernel_size, number_of_filters):
        self.input_depth, self.input_height, self.input_width = input_shape
        self.number_of_filters = number_of_filters
        self.input_shape = input_shape
        self.stride = stride

        self.output_shape = (number_of_filters, (self.input_height - kernel_size) // self.stride + 1, (self.input_width - kernel_size) // self.stride + 1)
        self.kernel_shape = (number_of_filters, kernel_size, kernel_size)
        self.kernels = np.random.randn(*self.kernel_shape)
        self.biases = np.random.randn(*self.output_shape)

https://numba.pydata.org/numba-doc/latest/user/vectorize.html#

desert oar
#

doesn't make sense to put that on the gpu even if you could do it sensibly

#

nor does vectorizing over input shapes make a lot of sense

#

if you wrote your own forward and backward pass implementations using numpy arrays and python loops, you could guvectorize those functions

#

then call those vectorized functions from the methods on the class

#

numba does support jit compiling entire classes but i don't think it supports vectorizing methods on those classes

compact star
#

ah, right thanks for that, I have now written a forward pass like this

def forward_propogation(self, a):
        self.input = a
        self.output = np.copy(self.biases)

        for i, channel in enumerate(self.input):
            for j, kernel in enumerate(self.kernels):
                # Accumulate into the output map of filter j (output is indexed by filter, not by input channel)
                self.output[j] = np.add(self.output[j], convolve2d(channel, kernel, self.stride, self.output_shape[1:]))
#

with the convolve2d function looking like this

from numpy import fliplr, flipud, zeros

def convolve2d(image, kernel, stride, output_shape):
    # Flip the kernel so this is a true convolution, not a cross-correlation
    kernel = flipud(fliplr(kernel))
    kernel_size = kernel.shape[0]

    output = zeros(output_shape)
    for y in range(image.shape[1]):
        # Stop once the kernel would run past the last column
        if y > image.shape[1] - kernel_size:
            break

        if y % stride == 0:
            for x in range(image.shape[0]):
                # Move on once the kernel is out of bounds
                if x > image.shape[0] - kernel_size:
                    break
                # Only convolve if x has moved by the specified stride
                if x % stride == 0:
                    # Index the output by step count, not by input position
                    output[x // stride, y // stride] = (kernel * image[x: x + kernel_size, y: y + kernel_size]).sum()

    return output
#

How would I use guvectorize for this?

desert oar
#

this is a good candidate for numba because it uses only python primitives and numba primitives

#

(i would start with @vectorize to test and debug on cpu first)

#

that said, i think this is already vectorized

#

unless you are trying to vectorize this over multiple images

#

i'm not sure how to specify what to "vectorize over" specifically

#

oh i see, there's the "layout" spec (n),()->(n)

#

you should be able to just guvectorize this as-written. if you want to vectorize over "stacks" of images, you need a for loop over those as well, and you need to modify your layout spec accordingly

fluid spindle
#

Hey, can I remove a feature from a Pandas DataFrame without affecting the source dataset?

gaunt anvil
#

using tacotron2, how do I convert model.inference to output a mel spectrogram so I can run it thru a different vocoder?

fluid spindle
#

Sadly only matplotlib and seaborn are allowed, also I am unfamiliar to PyTorch yet

compact star
desert oar
#

!d pandas.DataFrame.drop

arctic wedgeBOT
#

```
DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
```
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide <advanced.shown\_levels> for more information about the now unused levels.
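
With the default `inplace=False` shown in the signature, `drop` returns a new DataFrame and leaves the original as it was. A small sketch with a made-up frame:

```python
import pandas as pd

source = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
trimmed = source.drop(columns=["b"])  # new frame without the "b" feature

print(list(source.columns))   # ['a', 'b', 'c']
print(list(trimmed.columns))  # ['a', 'c']
```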
desert oar
fluid spindle
#

Thanks and sorry for being stupid

desert oar
knotty crystal
#

I have some questions on Reinforcement Learning, is there someone available to assist ?

gaunt anvil
#

Hi does anyone know what this error means? Using https://github.com/rishikksh20/HiFi-GAN on commit 7c049f9

```
Traceback (most recent call last):
  File "/home/user/HiFi-GAN/utils/train.py", line 87, in train
    step)
  File "/home/user/HiFi-GAN/utils/validation.py", line 25, in validate
    sc_loss, mag_loss = stft_loss(fake_audio[:, :, :audio.size(2)].squeeze(1), audio.squeeze(1))
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 130, in forward
    sc_l, mag_l = f(x, y)
  File "/home/user/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 89, in forward
    x_mag = stft(x, self.fft_size, self.shift_size, self.win_length, self.window)
  File "/home/user/HiFi-GAN/utils/stft_loss.py", line 23, in stft
    x_stft = torch.stft(x, fft_size, hop_size, win_length, window)
  File "/home/user/.local/lib/python3.6/site-packages/torch/functional.py", line 573, in stft
    normalized, onesided, return_complex)
RuntimeError: stft input and window must be on the same device but got self on cuda:0 and window on cpu```
#

how would i fix this error?

dusty valve
#

got this while training keras sequential model 2022-11-07 17:43:04.428412: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 504022000 exceeds 10% of free system memory.

#
```
model = Sequential((
    layers.Embedding(len(char2idx.items()), 256),
    layers.LSTM(128, activation='relu'),
    layers.Dense(len(char2idx.items()), activation='softmax')
))```
#

never got that before

#

nvm fixed

#

browser was taking up 6 gigs

upbeat dagger
#

Question for you all. Zip codes are categorical data stored as numbers right?

If I didn't have a column name or data dictionary to tell me, how might I test whether I'm looking at categorical data rather than numeric data?
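
One possible starting point, offered only as a rough heuristic sketch: the function name and the three checks below are assumptions for illustration, not an established test. Values that are all digits, share a fixed width, or carry leading zeros tend to be codes rather than quantities, but domain knowledge still has to make the final call.

```python
def code_like_flags(values):
    """Return heuristic flags hinting that number-like values are really codes."""
    strings = [str(v) for v in values]
    return {
        # only digits, so it could be either a code or a count
        "all_digits": all(s.isdigit() for s in strings),
        # codes such as zip codes usually share one width
        "fixed_width": len({len(s) for s in strings}) == 1,
        # a leading zero is meaningless for a quantity
        "has_leading_zero": any(len(s) > 1 and s.startswith("0") for s in strings),
    }

zip_flags = code_like_flags(["02134", "10001", "94105", "60614"])
count_flags = code_like_flags([3, 12, 250, 7])
print(zip_flags)
print(count_flags)
```

Another question worth asking by hand: does the mean or the sum of the column mean anything? For zip codes it does not.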

turbid wolf
hasty mountain
steady basalt
#

man im so damn rusty py_guido

#

time to get back into the swing

#

thinking kaggle

soft badge
#

What is the best introductory machine learning course?

#

Or a site to learn from?

steady basalt
#

i'd just google around and read some pages people wrote

#

coursera do paid courses

novel python
#

I'm getting "AttributeError: 'Series' object has no attribute 'columns'", but every item in my list is a dataframe with the attribute columns. When I access them individually it works, but when I try accessing it in a for loop it gives me this error.

steady basalt
#

i probably wont pay for another ML course in my life

steady basalt
#

each item is a DF object?

#

and for x in list, x.columns doesn't work?

novel python
#

it doesn't, and I can't really tell why since I can access them outside the for loop

steady basalt
#

show loop and list

#

and how u called the objects

novel python
#

List is:

```
jan_feb_items = [
                jan_feb_average_0, 
                jan_feb_average_1, 
                jan_feb_average_2, 
                jan_feb_average_3, 
                jan_feb_average_4, 
                jan_feb_average_5, 
                jan_feb_average_6, 
                jan_feb_average_7, 
                jan_feb_average_8, 
                jan_feb_average_9, 
                jan_feb_average_10
                ]```

And I'm trying to access a particular column for every dataframe in that list with:

```
for items in jan_feb_items:
    col = items.columns[8]
    within = items[abs(items[col]) <= 0.25]
    print(f"{col} amount of predictions within 0.25GB: ", len(within), len(within)/len(items))
```
#

if I do "jan_feb_average_0.columns[8]" it works

steady basalt
#

and how doesnt it work?

#

without the [8]?

novel python
#

inside the for loop

steady basalt
#

try just printing part?

novel python
#

it works completely fine

#

I could create a print statement for each object in that list, I just thought it'd be completely inefficient

steady basalt
#

yuh i think its cause ur finding the len of something within the object

novel python
#

since the for loop should work fine

steady basalt
#

ur doing items items

#

so i think its trying to get series

#

len(items[abs(items[items.

#

must be that right

#

ok

novel python
#

hmm let me check

steady basalt
#

ur calling items p

#

items [ .. ... .. <<< thats a series

#

cause its part of items

#

remember when u call a df column/series u do that too

serene scaffold
steady basalt
#

this is a bit true but the actual issue here is ur doing 'items[' which is inherently a series once u do that

#

if memory serves correctly

#

anyway dict is usually better

#

but i've in the past for some reason also done a list, can't recall why

novel python
serene scaffold
#

if you do print(jan_feb_average_0.head().to_dict('list')), show the text (no screenshots), and explain what you're trying to do without any code, I will try to help.

steady basalt
#

makes sense, altho it could be better to put them in a dictionary

#

the error tho is because of ur syntax: ur trying to find the columns of a series. where you did items[ you basically said series

#

u'd need to do items.columns only

novel python
# serene scaffold if you do `print(jan_feb_average_0.head().to_dict('list'))`, show the text (no s...

{'LINE__R.CARRIER_DEVICE_CATEGORY_FORMULA__C': ['Tablet', 'Tablet', 'Tablet', 'Tablet', 'Aircard'], 'January': [0.314, 0.544, 0.0, 0.0, 0.0], 'February': [0.045, 0.685, 0.0, 0.0, 0.0], 'March': [0.0, 0.389, 0.0, 0.0, 0.0], 'Variance': [0.018090250000000002, 0.004970250000000001, 0.0, 0.0, 0.0], 'Standard Deviation': [0.1345, 0.07050000000000001, 0.0, 0.0, 0.0], 'Average': [0.1795, 0.6145, 0.0, 0.0, 0.0], 'Last Month Model Prediction': [0.045, 0.685, 0.0, 0.0, 0.0], 'Last Month Model Difference': [0.045, 0.29600000000000004, 0.0, 0.0, 0.0]}

All I'm trying to do is access the 8th column of each of these dataframes and check how many values of that column are below a certain threshold. The other columns are completely useless for that analysis. So jan_feb_average_1, 2, 3, 4, etc. are the same shape, just with different values based on their average

novel python
steady basalt
#

if u know the column name, just specify items[column name]

#

and then theres some syntax ive long forgotten to check values below threshold but its easily googlable

#

for x in the list, get values under threshold of x

#

u wudnt need to make series like you did

serene scaffold
steady basalt
#

well, you would but then u wudnt use columns

serene scaffold
#

le means "less than or equal"

steady basalt
#

id actually specify a series as you did

steady basalt
#

that ought to work and print them all

serene scaffold
steady basalt
#

yes your solution is extremely pandonic

#

i didnt even know about le()

serene scaffold
#

there's eq, ne, lt, gt, le, and ge

#

so they all match the dunder methods

steady basalt
#

nice

serene scaffold
#

@novel python it looks to me like your problem is that you were trying to get "the 8th column" when your columns are indexed by strings. in which case you need .iloc to do position-based indexing.

#

I guess that also means the solution might be df.iloc[:, 7].le(.25).sum(), depending on whether you consider the leftmost column to be 1st or 0th.
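
A sketch of that approach over the whole list, assuming every frame has the same nine columns as the `head()` posted above (values rounded here, and the list holds two copies of the same frame purely as a stand-in for `jan_feb_items`). The original loop used `columns[8]`, so position 8 is used below.

```python
import pandas as pd

# Rebuilt from the head() output above; floats rounded for readability
jan_feb_average_0 = pd.DataFrame({
    "LINE__R.CARRIER_DEVICE_CATEGORY_FORMULA__C": ["Tablet", "Tablet", "Tablet", "Tablet", "Aircard"],
    "January": [0.314, 0.544, 0.0, 0.0, 0.0],
    "February": [0.045, 0.685, 0.0, 0.0, 0.0],
    "March": [0.0, 0.389, 0.0, 0.0, 0.0],
    "Variance": [0.01809025, 0.00497025, 0.0, 0.0, 0.0],
    "Standard Deviation": [0.1345, 0.0705, 0.0, 0.0, 0.0],
    "Average": [0.1795, 0.6145, 0.0, 0.0, 0.0],
    "Last Month Model Prediction": [0.045, 0.685, 0.0, 0.0, 0.0],
    "Last Month Model Difference": [0.045, 0.296, 0.0, 0.0, 0.0],
})

# Stand-in for the real list of eleven frames
jan_feb_items = [jan_feb_average_0, jan_feb_average_0.copy()]

results = []
for frame in jan_feb_items:
    col = frame.iloc[:, 8]        # select by position, not by label
    within = col.abs().le(0.25)   # boolean Series: |value| <= 0.25
    results.append((col.name, int(within.sum()), float(within.mean())))

for name, count, share in results:
    print(f"{name}: {count} predictions within 0.25GB ({share:.0%})")
```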

steady basalt
#

God I hope I’ll be allowed to install Linux at my new company I’ve had it up to here with windows cmd

steady basalt
#

Willing to learn Linux properly to dodge windows at this point

serene scaffold
steady basalt
#

It’s not just for the command line interface, it's the os in general

#

Felt kinda nice using a Linux cluster

#

Simple and good

#

Makes environments easier

novel python
serene scaffold
#

companies usually won't let you install linux on a company issued computer, unless they have a specific linux image that they maintain with all the security stuff they want.

steady basalt
#

Cringe

serene scaffold
steady basalt
#

I have hell awaiting then

serene scaffold
steady basalt
#

Wonder if I can sneak work on my personal mac and copy paste the end project back

#

Prob not allowed

serene scaffold
#

have fun getting fired

steady basalt
#

True

serene scaffold
#

you can't pick between windows and mac?

steady basalt
#

No I believe they all use Lenovo

#

Like most companies

#

Gona have to wait and see

serene scaffold
#

my company lets you pick between a macbook pro and some Dell model

steady basalt
#

Damn that’s amazing

serene scaffold
#

but no matter which you pick, it comes with a specific image of that operating system with tons of security crap on it

steady basalt
#

U know it’s quite funny how they’ve titled me DS consultant but I’ll be working on coding projects for them like some data pipelines and possible ml. normal?

#

That sounds like two roles in one job!

serene scaffold
#

job titles in the data science sphere are kind of arbitrary

serene scaffold
#

we don't use mcafee, no

steady basalt
#

My current company didn’t even try to counter when I handed notice

#

Was strange, guess they don’t care

#

I don’t got much experience w these things

serene scaffold
#

I've heard it's usually a bad idea to accept a counter offer anyway

steady basalt
#

Why

serene scaffold
#

because you've already revealed that you wanted to leave and were about to, which is a negative signal.

steady basalt
#

True, and they have acted weird about it too

#

I actively warned them I was interviewing and I was still not transparent enough apparently

#

I gave them the courtesy of notice instead of insta raise and changing to remote working so I wasn’t expecting to get any stick

#

Not to mention I hadn’t signed shit, so it shud go both ways and yet the manager still had to double check if I deserve holidays

desert oar
# steady basalt U know it’s quite funny how they’ve titled me DS consultant but I’ll be working ...

completely. data scientist is a more desired job title, but most companies need data engineers more than they need data scientists. moreover, actually good data engineers are in extreme demand, hard to find, and expensive. hence you have a lot of companies trying to pass off data engineering jobs as data science jobs, and a lot of candidates wanting to get into data science trying to hack it as under-qualified data engineers.

steady basalt
#

I’d honestly not mind following and asking later to move over to DE, as I don’t believe DS is inherently better

#

Nor should it be desired more

desert oar
#

you'll learn a lot either way. and being a good data engineer nowadays is always going to be an asset as a data science candidate

steady basalt
#

Yeah, but the fact that I’m also “consultant” adds extra workload ??

desert oar
#

btw regarding counteroffers, it works both ways: a company isn't going to want to throw extra money at someone they know is dissatisfied and going to leave anyway

steady basalt
#

Was curious why it’s in the title if I’ll be doing technical work - makes it look to other employers that I did only meetings and presentations

steady basalt
#

Was kinda a bit rude tbh given we do more than agreed to intusll

desert oar
steady basalt
#

Initially*

steady basalt
desert oar
#

there are plenty of actually technical consultants

#

it more describes the nature of the employment relationship than anything

steady basalt
#

It does confuse me…

#

I havnt used wsl

#

I could try to adapt and run things on cloud more

#

I also made it my mission to learn some etl tools

#

There’s a book pdf teaching a few

#

They’re a bit annoying sometimes tho

desert oar
#

it's pretty important to know docker nowadays anyway

#

you don't even need a dockerfile, you can just do docker run -it ubuntu:latest /bin/bash and you're in

steady basalt
#

Spark, Kafka, airflow

#

Etc , all things I shud really know at this point

steady basalt
#

My current role hasn’t rly required too much, and half of my queries have been like 5 left joins cause it’s a pretty late stage database

#

Left join x left join y left join z lol

#

One query

#

Bad practise?

#

Largely for fetching info for colleagues

desert oar
# steady basalt Spark, Kafka, airflow
  • spark is a huge can of worms, i honestly wouldn't worry about it. it takes a lot of infrastructure and tuning to run a cluster of JVM applications. you can start learning pyspark in a "local" cluster running on your own machine, but frankly it's becoming less relevant as purpose-built data warehouse tools catch up in their functionality, and most companies realize that they really do not need a computing cluster for their daily work.

  • kafka, kind of a specific tool. not necessary unless you are working with really high-volume data, and in that case you'll likely be able to collaborate with software engineers.

  • airflow is great to know. learn it, spend time practicing building pipelines with it

however, i think your highest priorities should be:

  • sql. get really good. if you think doing joins is good, you are a sql baby. learn about all different kinds of joins. convince yourself that an inner join is just a filtered cross join, which is a synonym for a cartesian product. learn about window functions and lateral joins, and understand when and why you want to use them. set up your own postgres server, populate it with more data than you're comfortable with (a couple tables of a few hundred million rows). learn about indexes and query plans. understand relational algebra.

  • excel. yes, fucking excel (or google sheets). learn about array functions, vlookup/hlookup/xlookup/index-match, and named ranges. try to make some sense of its internal data type system, especially how dates and percentages are stored. understand the difference between "3" and 3 in a cell.

  • try to make some kind of dashboardy thing somehow. there are a million tools for this, some more "industrial-strength" than others. data viz is never not important.

  • docker (as above)

  • airflow (as above)

#
  • oh also: dbt. dbt is a great tool. learn it before or after airflow, but not before docker. it integrates pretty well with airflow, so they will be a good pair.
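
The "inner join is just a filtered cross join" point from the sql bullet can be checked directly. A minimal sketch using Python's built-in `sqlite3` module rather than postgres, with two made-up tables (`users`, `orders`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, name TEXT);
CREATE TABLE orders (user_id INTEGER, total REAL);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES (1, 9.5), (1, 20.0), (3, 4.25);
""")

# Ordinary inner join
inner_rows = conn.execute("""
    SELECT u.name, o.total
    FROM users u INNER JOIN orders o ON u.id = o.user_id
    ORDER BY u.name, o.total
""").fetchall()

# Cartesian product of both tables, then filtered with the same condition
cross_rows = conn.execute("""
    SELECT u.name, o.total
    FROM users u CROSS JOIN orders o
    WHERE u.id = o.user_id
    ORDER BY u.name, o.total
""").fetchall()

# Size of the unfiltered product: 3 users x 3 orders
product_size = conn.execute(
    "SELECT COUNT(*) FROM users CROSS JOIN orders"
).fetchone()[0]

print(inner_rows == cross_rows, product_size)
conn.close()
```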
serene scaffold
desert oar
#

finally, spend some time messing with aws s3 or google cloud storage. just practice uploading and downloading data with python and the command line. you can set up an ec2 or gcp instance as well and try to get comfortable poking around in a command line linux environment (figure out how to make your own ssh keys and connect using ssh; change the ssh daemon port and disable password auth to make sure you did it right). this won't take very long unless you're a total raw noob, in which case you should learn anyway. this you can do a little over time.

desert oar
steady basalt
#

Good advice on airflow and, to a lesser extent, spark. I’ve already done s3 based dashboarding with Boto3

#

What can I do with airflow to make life easier

#

As for spark everyone keeps telling me to learn it but I get frustrated with it

#

Sql is a no-brainer to improve at, yes

#

I used ec2 to host the dashboard refreshes

desert oar
desert oar
steady basalt
#

I was until I taught myself in the office last second

#

But I used boto3 to send the data w sql, no need for airflow for simple dash

#

Cause I used cron

desert oar
# steady basalt As for spark everyone keeps telling me to learn it but I get frustrated with it

feel free to ask for help here. the key thing to remember about spark is that operations on rdds and dataframes are not executed right away. it builds up a chain or sequence of computations, and executes them all at once when you collect the data or perform an aggregating operation. it's a lot like programmatically building up a big sql query, and then running it at the end.

also don't even bother with scala spark. pyspark is good enough for most things.
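
The "nothing runs until you collect" idea can be illustrated without spark at all. The sketch below is an analogy only, using plain Python generators (it is not pyspark code, and `read_rows` is a made-up helper): building the pipeline does no work, consuming it does all of it.

```python
log = []

def read_rows():
    """Pretend data source that records every row it actually reads."""
    for i in range(5):
        log.append(f"read {i}")
        yield i

# Chaining a filter and a map: nothing is read yet,
# much like stacking transformations on a Spark DataFrame
pipeline = (x * 10 for x in read_rows() if x % 2 == 0)
reads_before = len(log)

# Consuming the pipeline triggers the whole chain,
# loosely comparable to collect() or an aggregation
result = list(pipeline)
reads_after = len(log)

print(reads_before, reads_after, result)
```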

desert oar
#

and if you have multiple steps in the pipeline, practice by breaking it up into separate airflow tasks

steady basalt
#

So run airflow on Linux?

#

I’m sure it’s straightforward but I didn’t try yet

#

As for pyspark yes I don’t know scala, but even the syntax annoys me

desert oar
steady basalt
desert oar
desert oar
#

or do something similar

steady basalt
desert oar
steady basalt
#

I had to import integer and string

desert oar
# steady basalt So run airflow on Linux?

airflow homework:

task 1: run script that writes data to a directory

task 2: load data in directory into db table

task 3: apply some data transformation that creates a different table

task 4: do some data quality checks

DAG:

T1 - T2 - T3
        \ T4
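
Before touching airflow, the dependency structure itself can be sketched with the standard library. This is not airflow code: it uses `graphlib.TopologicalSorter` (Python 3.9+), the task names are made up, and it reads the diagram as T4 branching off T2.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it
dependencies = {
    "T1_write_files": set(),
    "T2_load_table": {"T1_write_files"},
    "T3_transform": {"T2_load_table"},
    "T4_quality_checks": {"T2_load_table"},
}

# One valid execution order; T3 and T4 may come out in either order
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

In a real airflow DAG each of these names would become a task, and the same edges would be declared between them.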
steady basalt
#

I will try this tomorrow

desert oar
steady basalt
#

By that script writing data, did you mean creating a new csv

desert oar
steady basalt
#

I have plenty of data available

#

Ok

desert oar
#

oh here's another good skill to practice: simulating fake data
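
A minimal sketch of what that could look like with only the standard library. The table layout, field names, and distributions are all invented for illustration; the seed just makes the fake data reproducible.

```python
import csv
import io
import random

def simulate_orders(n, seed=0):
    """Generate n fake order rows; every field name here is made up."""
    rng = random.Random(seed)  # seeded so reruns give the same data
    rows = []
    for order_id in range(n):
        rows.append({
            "order_id": order_id,
            "customer_id": rng.randint(1, 50),
            # log-normal draw gives a skewed amount, rounded to cents
            "amount": round(rng.lognormvariate(3, 1), 2),
            "status": rng.choice(["paid", "pending", "refunded"]),
        })
    return rows

rows = simulate_orders(1000)

# Write to an in-memory CSV; use open("orders.csv", "w", newline="") for a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text.splitlines()[0])
```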

steady basalt
#

I have Athena as a source for downloading data

desert oar
#

its ridiculous how many things we have to know and be good at

steady basalt
#

True words

#

I’m still reading maths in my spare hours

desert oar
#

i've been learning new things every week for like 10 years and i still feel like a novice at a lot of things

#

you're doing the right things

steady basalt
#

I’m on a 1000 page precalc book cause catch-up innit

#

Who knew sets in python functions are in maths

desert oar
#

heh, that's where the name "set" comes from!

#

you want to catch up quickly? watch the 3blue1brown calculus and linear algebra series

steady basalt
#

I noticed today in one book a potential mistake in the example ?

#

Oh , no, calculus after precalc no linalg as I’d be busy for a year

#

Anyway

#

Say you have

#

Man I’ll just find it

#

Ok so it’s math syntax here, ) and ] being exclusive and inclusive of the set

iron basalt
#

*Unlike learning frameworks / random software tools, math is deep knowledge, it will not be outdated.

steady basalt
iron basalt
#

Software tools come and go as they fall in and out of fashion.

steady basalt
#

I know it’s basic shit but how is 3 included in the intersect?

#

I thought 3) meant it won’t be

#

A n B

#

Therefore 3 isn’t in A?

#

Oh my bad they wrote with ()

#

Bad memory, read that at like 2pm

desert oar
# steady basalt

this is good stuff. authors in calculus etc. will take it for granted that you know what these things are

#

fortunately you'll find that it all becomes very natural. it's a language that you will gain fluency with, much like programming.

iron basalt
desert oar
#

yep. or at least get better at piecing ideas together to produce nontrivial outputs

#

rather than just relying on what other people have done already

iron basalt
#

And also start to see how many are the same thing but with different paint (abstract algebra).

#

(Also like with programming languages)

desert oar
#

i never ever did well with combinatorial stuff, and we were doing algebra with permutations early in the course, and that kinda threw me off for the whole semester

#

linear algebra is also like that

steady basalt
#

I somewhat enjoy what I’m learning now, and I know linalg, probability and more pure stats should be important, but I’m not planning on them any time soon as I think precalc and calc are more enjoyable

desert oar
#

it's wild to see how many things can be reduced to linear algebra problems (including calculus problems)

desert oar
steady basalt
#

I think it will take me at least a year

iron basalt
steady basalt
#

Because I want to demolish the calc preface test in my book

#

And that requires some good precalc skills plus algebra

#

Even down to simple rules

#

It’s been years

#

I realised I was getting ahead of myself so I stepped back and went back to basics

desert oar
iron basalt
#

Some are fine with it, but those probably are self taught (obsessed), and have probably even studied it before reaching that class.

desert oar
#

and i did great in topology. again, stuff i could visualize

steady basalt
#

What u guys doing in DS, u could be researchers ?

desert oar
#

the problem i had w/ algebra is that it's the first time you've seen some really general abstract shit, and i don't think it was all too well motivated

steady basalt
#

Topology sounds very scary

desert oar
iron basalt
#

Linear algebra is usually fine still, specifically because it has so many directly understandable applications.

desert oar
#

i have used abstract algebra concepts exactly 0 times in data science, but plenty of times when learning functional programming. who knew?

steady basalt
#

Functional programming, seriously?

#

All u do is split up a script

iron basalt
#

"A monad is just a monoid in the category of endofunctors"

desert oar
desert oar
steady basalt
#

I’m not bothered to move on from python any time soon anyways

desert oar
#

it's not "breaking your code up into functions"

#

yeah, honestly don't even bother. it's something you'll come across eventually

#

focus on the stuff you actually need to focus on

steady basalt
#

I was briefly interested in c and js

iron basalt
#

BUT, you still need to know some of them to do anything.

steady basalt
#

Man I’m so shocked my new role didn’t have technical round w coding or theory, they just looked at last work

desert oar
#

i would argue that sql is becoming pretty much timeless, and that the underlying concepts will never go away even if the query interfaces do

iron basalt
#

Relational databases yes, SQL, IDK, I hope not.

desert oar
#

it's held on for 50 years already

iron basalt
#

Yea and so has C and I hope it goes away too.

#

I don't think we will be programming in C in 100 years.

steady basalt
#

Isn’t C so legacy it can never go?

iron basalt
#

If we are then I will be VERY sad.

serene scaffold
iron basalt
#

Then there is Zig, which is designed to slowly delete C.

steady basalt
#

I wonder what happens once we get quantum computers in users hands

#

I heard they’re very fast

#

I know nothing about it though

iron basalt
#

Quantum computers give speedups to specific problems, but also we just don't have enough qubits.

#

And probably never will.

warm verge
#

Lotta speculation

iron basalt
#

But countries like to flex their quantum computers because the word "quantum".

steady basalt
#

What do you mean by not enough qubits

serene scaffold
iron basalt
#

What we can expect is something like neuromorphic processors. Which are way more energy efficient, but the tradeoff is giving up von Neumann architecture.

steady basalt
#

Whilst we speculate on future technology, do we think AR visors will become mainstream once you get a irl HUD with really sleek implementation and interface

desert oar
#

i suspect that militaries and intelligence orgs will continue to invest in quantum stuff for crypto breaking

steady basalt
#

And will people buy apple or Meta

serene scaffold
iron basalt
#

If you look at how much energy we need to use compared to the brain, to do less, there is huge room for improvement.

steady basalt
#

Sucks because I bag hold @serene scaffold

mild dirge
#

I buy apples all the time

serene scaffold
#

bag hold?

steady basalt
#

Their stock is dead

desert oar
iron basalt
#

And unlike quantum we don't need to guess, we have brains, they exist and work.

#

Proof of concept biologically.

serene scaffold
#

but do we even have free will?

steady basalt
#

Hell why can’t we just be immortal with synthetic organs, what stops death by age

#

No heart stopping means no dead

iron basalt
serene scaffold
#

you have organs that need to work other than your heart.

steady basalt
#

And lungs and pancreas

#

Just the brain really.

iron basalt
#

Can't ask if we have X, before X is clearly defined.

steady basalt
#

So when we have essentially all organs but the brain synthetic, can we live 1000 years

#

Or some sort of drug that stops aging and no need for surgery

serene scaffold
# iron basalt Can't ask if we have X, before X is clearly defined.

that reminds me of the time that someone told me that Mormonism's truth claims were ridiculous as compared to protestant Christianity, and I told them that you can't say that without first assuming that protestant Christianity is the baseline for religious normality.

iron basalt
warm verge
steady basalt
#

Mmm philosophy, how you can tell it’s 2am

desert oar
#

differences vs. absolute positions

steady basalt
#

Now, how do you know you’re not an artificial neural network

iron basalt
#

Yes, because otherwise we could be arguing about whether two different things exist or not (your definition vs mine).

serene scaffold
steady basalt
#

I think it’s a universal catastrophe that humans die

iron basalt
# warm verge Kinda reductionist no?

Yes, it's trying to be pragmatic about it. No agreed upon definition is an endless debate for philosophers that will never reach any conclusion. Which is why there will never be a conclusion about "free will" and everyone loves to "debate" about it.

steady basalt
#

We’re so cool

warm verge
iron basalt
#

Don't need complete, just a majority, even just a good chunk.

steady basalt
#

Am I a Boltzmann brain?

#

First answer is truth

warm verge
iron basalt
#

And between those chunks, they debate forever in circles, or just "agree to disagree".

iron basalt
warm verge
#

Fairly reductionist tbqh

serene scaffold
#

btw, I'm allowing this discussion because I trust that all participants will make this channel available for data science discussion if someone has a question.

steady basalt
#

Could an ANN act as a brain in a vat

warm verge
#

No

#

Yes

#

Maybe

steady basalt
#

I think yes

iron basalt
steady basalt
#

(Non physical or printed, actually in binary)

iron basalt
#

Yes.

steady basalt
#

I concur, wonder if I am one

#

Unlikely

warm verge
#

What is the function of it

steady basalt
#

wdym?

warm verge
#

What behaviour is the NN replicating of a "brain"

steady basalt
#

i suppose the exact same as a real nn

#

stimulated and given data?

warm verge
#

For what output

steady basalt
#

why output? did we not learn visually/auditory or whatever

#

ah, you mean prediction?

#

yeah not like that

iron basalt
desert oar
#

interval vs. ratio scale

austere swift
#

the subject of that question doesn't really make sense in the first place anyways, it's pretty well understood that we don't consider humans as artificial intelligence so I don't see why the question is relevant.

iron basalt
#

*And interpretation. If just reading the words for the common definitions of artificial, one could see it in that way. But the written definitions are not how "artificial" is commonly understood (depending on group and context), but also that it can't really be written down as one singular thing (in one sentence context-free (could try to enumerate as many cases as possible)).

rugged comet
#

Does it make sense to use L2 regularization in non-linear models?

desert oar
#

dropout is also practically a form of regularization in NNs

rugged comet
#

I'm combining L2 reg and dropout in my MtG color-prediction model.

desert oar
#

afaik thats very common

rugged comet
#

Alright. Sounds good. Thanks.

desert oar
#

L1 vs L2 matters a lot in linear models where you maybe want to do explicit feature selection

#

and in other contexts as well like where the model has no exact solution

#

they also have different interpretations in a bayesian statistics framework

#

there's also "elastic net" regularization which is basically a weighted mix of L1 and L2

#

the weighting parameter (between 0 and 1) essentially becomes a tunable hyperparameter, in addition to the regularization strength itself
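
To make the "weighted mix" concrete, here is a small sketch of one common way to write the penalty. The function and parameter names are made up, and libraries differ in details (scikit-learn, for example, puts an extra factor of 0.5 on the L2 term), so treat the exact scaling as an assumption.

```python
import math

def elastic_net_penalty(weights, strength, l1_ratio):
    """strength * (l1_ratio * L1 + (1 - l1_ratio) * L2), with 0 <= l1_ratio <= 1."""
    l1 = sum(abs(w) for w in weights)   # L1 term: sum of absolute values
    l2 = sum(w * w for w in weights)    # L2 term: sum of squares
    return strength * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

weights = [1.0, -2.0, 0.5]
pure_l1 = elastic_net_penalty(weights, strength=1.0, l1_ratio=1.0)
pure_l2 = elastic_net_penalty(weights, strength=1.0, l1_ratio=0.0)
mixed = elastic_net_penalty(weights, strength=0.1, l1_ratio=0.5)
print(pure_l1, pure_l2, mixed)
```

Setting the ratio to 1 recovers plain L1 and setting it to 0 recovers plain L2, which is why it becomes one more hyperparameter to tune.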

rugged comet
#

Have you ever used keras_tuner for tuning hyperparameters? I forget if you're a tensorflow/keras person or not.

desert oar
#

i have not, im not great with either NN framework but im better with pytorch. i knew a bit of tensorflow in the 1.x era and forgot all of it. i only know keras from reading docs & its broad similarity w/ pytorch in some aspects

#

i've done hyperparameter tuning with a handful of "black box" optimizers though, so maybe i can offer general advice

#

sad, they don't have halving search

#

it's in scikit-learn, you could probably write your own keras tuner class for it if you wanted

#

i've had really excellent results with it, better model performance and substantially faster than bayesian optimization in a couple of cases

rugged comet
#

What is Bayesian optimization?

desert oar
#

it's a category of techniques for "black box" optimization, meaning trying to find the maximum of a completely unknown function for which you can only test individual points

rugged comet
#

Would the individual points be the hyperparameters?

desert oar
#

the individual points would be locations in the hyperparameter space + the model performance at those locations

#

so if you're searching over l1 and l2 regularization parameters, you are trying to optimize a completely unknown function of 2 inputs

#

the "bayesian" part is that you start with a prior distribution over some space of possible functions, and iteratively update your prior until you have a better and better estimate of the function

#

the most common technique for bayesian optimization is to use something called a "gaussian process"

#

there is also the Hyperband algorithm, which is based on the "multi-armed bandit" model

#

both Bayesian GP and Hyperband are listed in the keras docs here

rugged comet
#

Can you talk a little bit about hyperparameter space or as keras calls it I believe, search space?

desert oar
#

yeah, it's the search space for hyperparameters 🙂

#

have you ever done any "formal" math, maybe with sets and proofs?

rugged comet
#

I have not.

desert oar
#

have you heard of a "set" in math?

#

abstractly, it's just a collection of things.

(technically if you define it that way, you end up with an interesting logical paradox. look up russell's paradox if you're curious about that one.)

#

but if you think of such a collection as the complete collection of something, it's natural to interpret a set as its own little world. a "space" if you will.

#

so for example we might talk about the space of real numbers between 0 and 1

#

it's a self-contained universe, with some known rules and properties attached to it

#

there are many different specific kinds of spaces (e.g. vector spaces which are important in machine learning), but the concept is what i'm trying to illustrate here. the idea that a "space" is "a specific collection of things" and "some mathematical operations or properties" that define the space.

#

let's say you're fitting your neural network, and you're optimizing over the L2 regularization parameter and the dropout probability. you might say that the L2 parameters λ exist in one space, and the dropout probabilities p exist in another space. and if you take all possible pairs of L2 parameters and dropout probabilities, i.e. pairs of the form (λ, p), then you have defined a new space, consisting of all such pairs.

#

usually you wouldn't spend too much time studying the idea of spaces in general, but you'd start working with different kinds of spaces in math courses and eventually build up a general concept of "a space". and that's what we mean when we talk about "feature space" or "hyperparameter search space".

#

(although "feature space" is usually also specifically a vector space)

rugged comet
#

It sounds like the search space in this context would be all the combinations of the possible values for the hyperparameters. The search space would be the collection of those combinations over which we search.

#

We can define the search space to limit the number of combinations.
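
That reading can be sketched in a few lines. The candidate values below are arbitrary examples, not recommendations: the search space is the Cartesian product of the per-hyperparameter choices, and a search algorithm decides which of those points to actually try.

```python
import random
from itertools import product

# Hypothetical candidate values for each hyperparameter
l2_values = [1e-4, 1e-3, 1e-2]
dropout_values = [0.1, 0.3, 0.5]

# The search space: every (l2, dropout) pair
search_space = list(product(l2_values, dropout_values))

# Grid search would try all of them; random search tries only a sample
rng = random.Random(0)
trials = rng.sample(search_space, k=4)

print(len(search_space), trials)
```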

#

How does one decide which search algorithm to use?

desert oar
desert oar
gaunt anvil
#

also window is a tensor object .-.

void bone
#

Hello, does anyone have an idea how to use AdaBoost for multi-class classification? Do I split the dataset into two and categorize each group as 1 and 0?
Also, I'm new to machine learning and I'm kinda struggling with stuff like how to prepare datasets for machine learning models, and applying models for multi-class and multilabel classification. I would appreciate any helpful links. Thanks

gaunt anvil
ruby depot
#
ax1.plot(goog_data_signal.loc[goog_data_signal.positions==1.0].index,
    goog_data_signal.price[goog_data_signal.positions==1.0],"^",markersize=5,color="M")

Can someone explain to me why I have to write the condition two times? If I already set the condition before, why do I have to do it again?

rugged comet
ruby depot
#
ax1.plot(goog_data_signal.price[goog_data_signal.positions==1.0],"^",markersize=5,color="M")

I was asking why it can't be like this. Why do I have to look up the index where positions is equal to 1 if I could do it directly with the code above? Obviously there is something that I'm missing here.

hazy hare
#

THE train had many loss 😦

rugged comet
#

Sorry to hear that.

desert oar
lapis sequoia
#

There's more than this, but that's usually the path I have followed and have seen people following.

#

Gives you knowledge and you get on with things step by step, and ofc there's no rule of thumb to follow it at all.

young granite
#

if i got a df like this:

```
     ID comp 1  amount 1 comp 2  amount 2 comp 3  amount 3 comp 4  amount 4
0  772D    D45       0.5    U45       0.3    T45       0.2    NaN       NaN
1  223P    D54       0.5    U54       0.5    NaN       NaN    NaN       NaN
2  212E    D45       0.6    U55       0.1    I23       0.2    Z23       0.1
```
and i want to transpose it in such a way that the unique values in "comp x" are cols and the amounts are the values of the cols
#

i tried it rowise with .iloc and with a for/if loops as well as with .melt and .stack

rugged comet
lapis sequoia
rugged comet
#

Yeah not like searching a list or graph. It's somewhat different I think. Though some of the rules probably still apply.

lapis sequoia
#

Or perhaps give an example. a small one.

young granite
#

ok let me just do it by hand real quick

#
```
    ID  D45  D54  U45  U54  U55  T45  I23  Z23
0  772D  0.5  0.0  0.3  0.0  0.0  0.2  0.0  0.0
1  223P  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  212E  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
```
#

@lapis sequoia
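
One possible way to get from the wide "comp x / amount x" layout to one column per component is to stack the pairs into a long table and pivot it. This is only a sketch: it rebuilds the example frame by hand, assumes exactly four pairs, and fills missing components with 0 as in the hand-made target.

```python
import numpy as np
import pandas as pd

# The example frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "ID": ["772D", "223P", "212E"],
    "comp 1": ["D45", "D54", "D45"], "amount 1": [0.5, 0.5, 0.6],
    "comp 2": ["U45", "U54", "U55"], "amount 2": [0.3, 0.5, 0.1],
    "comp 3": ["T45", np.nan, "I23"], "amount 3": [0.2, np.nan, 0.2],
    "comp 4": [np.nan, np.nan, "Z23"], "amount 4": [np.nan, np.nan, 0.1],
})

# Stack every (comp i, amount i) pair into one long table with three columns.
pairs = [
    df[["ID", f"comp {i}", f"amount {i}"]].rename(
        columns={f"comp {i}": "comp", f"amount {i}": "amount"}
    )
    for i in range(1, 5)
]
long = pd.concat(pairs, ignore_index=True).dropna(subset=["comp"])

# One row per ID, one column per unique component, amounts as values.
wide = long.pivot_table(
    index="ID", columns="comp", values="amount", aggfunc="sum", fill_value=0
)
print(wide)
```

`aggfunc="sum"` only matters if the same component appears twice for one ID; whether summing is the right choice in that case depends on the data.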

gray dove
#

Has anyone here messed with conditional GANS for image generation before?

#

I am currently ankle-deep in it trying to work with an implementation I found that supposedly checks the boxes when it comes to best practices, but it just seems to skirt the edge of being useful and getting mode collapse

versed gulch
#

does anyone know how to put titles/labels on the rows of the images

#

i.e. i want to put ground-truth and filtered on the first and second row of these images

lapis sequoia
#

Hello guys, I am working on building a recommendation system for an online store. I've got a list of users and a list of items. I was thinking of ranking each item for each user based on how frequently the user has bought that item.

I am following this tutorial for my implementation: https://realpython.com/build-recommendation-engine-collaborative-filtering/#when-can-collaborative-filtering-be-used

My question is: How will my model learn from newly generated data ?(after the train)

The steps I have in mind are:

  1. Organise the data.
  2. Train the model.
  3. Make an api request to get recommendations for a specific user.
  4. ??? (How do I update the model with the new data if the recommendation was successful or not)
karmic ore
fossil ivy
#

Is there something like doing too many iterations of a simulation?

#

Currently I take 50 runs and take the averages for visualization, its quite a smooth graph, I am wondering if it is too smooth

mighty patio
lapis sequoia
karmic ore
quaint plover
#

Someone up for a code optimization challenge?
I've written a functioning piece of code that takes 50 minutes to run and I can't find ways to optimize it right now -- I have limited Python knowledge, so I might be doing something stupid

mighty patio
#

@quaint plover you can just post your code, you don't have to ask to ask

quaint plover
#

Don't want to spam this channel, come to help-pear

karmic ore
#

bro does anyone know how to remove a console log from jupyter notebook

#

single line

#

like remove a new line console log

#

i came across this and it makes me wanna shoot my self because they closed the issue without actually fucking fixing it

twin sleet
#

Yo, does anyone have some experience with intelligent chatbots?
I'm trying to make wake word detection for mine

#

Need some guidance

young granite
#

if i use scipy to interpolate a dataset i get an error:
ValueError: A value in x_new is above the interpolation range.
How do i adjust the interpolation range
i simply want to generate more datapoints in the original range
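
A common cause of this error is that the new x values reach slightly past the original data. A minimal sketch, with made-up sample data, that builds `x_new` strictly from the original minimum and maximum so it cannot leave the interpolation range:

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up sample data standing in for the real dataset.
x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.array([0.0, 2.0, 5.0, 8.0])

# Linear interpolation by default; evaluating outside [x.min(), x.max()]
# raises ValueError unless bounds_error/fill_value are changed.
f = interp1d(x, y)

# Generate more points inside the original range only.
x_new = np.linspace(x.min(), x.max(), 50)
y_new = f(x_new)
```

If values outside the range really are wanted, `interp1d(x, y, fill_value="extrapolate")` is an option, but extrapolated values should be treated with care.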

karmic ore
ember quail
#

hello people

#

im getting this error IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

#

when i predict thru my model

#
```
output = model(game1.cuda())
_, predictions = torch.max(output, 1)  # <-- the IndexError is raised here
print(predictions)
```
#

its something related to the loss function according to the internet

#

loss = loss_fn(outputs, labels ) #label shape: torch.Size([10]), output shape: torch.Size([10, 5])

#

my shapes are right but still
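
The message "expected to be in range of [-1, 0], but got 1" usually means the tensor passed to `torch.max` is 1-D, so dim 1 does not exist. One plausible cause (an assumption, since the model code isn't shown) is predicting on a single unbatched sample. A small sketch with a random stand-in for the model output:

```python
import torch

# Stand-in for the model output on ONE unbatched sample: shape [5], not [1, 5].
output = torch.randn(5)

# torch.max(output, 1) would raise IndexError here: a 1-D tensor only has dim 0 (or -1).

# Option 1: reduce over the last dimension, which works for 1-D and 2-D outputs.
_, prediction = torch.max(output, dim=-1)

# Option 2: add a batch dimension first, then reduce over dim 1 as in training.
batched = output.unsqueeze(0)            # shape [1, 5]
_, predictions = torch.max(batched, dim=1)

print(prediction, predictions)
```

Printing `output.shape` right before the failing line is a quick way to check whether this is what is happening.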

versed gulch
# mighty patio TBQH I would consider putting your labels inside figure, as you have a lot of em...
widths = [0, 5, 10, 15, 20]
# replacing a border zone full of zeros to the edges of the skeleton image
fig, ax = plt.subplots(2, 5, figsize = (25, 25))
for width in widths:
  g_bz, p_bz = g_skel.copy(), p_skel.copy()
  
  # REPLACE 512 WITH THE SHAPE [0]
  # ground truth
  g_bz[:width, :], g_bz[512-width:, :] = 0, 0 # rows
  g_bz[:, :width], g_bz[:, 512-width:] = 0, 0 # cols  
  # filtered
  p_bz[:width, :], p_bz[512-width:, :] = 0, 0 # rows
  p_bz[:, :width], p_bz[:, 512-width:] = 0, 0 # cols
 
  
  ax[0, widths.index(width)].imshow(g_bz, cmap = "gray")
  ax[0, widths.index(width)].set_title(f"border zone = {width}", fontsize = 20)
  ax[0, widths.index(width)].axis("off")
  
  ax[1, widths.index(width)].imshow(p_bz, cmap = "gray")
  ax[1, widths.index(width)].axis("off")
fig.subplots_adjust(hspace=-0.75)
fig.subplots_adjust(wspace=0.05)

i edited the plot regarding the spacing

mighty patio
#
import matplotlib.pyplot as plt
import numpy as np
im = np.zeros((256,256))
widths = [0, 5, 10, 15, 20]
# replacing a border zone full of zeros to the edges of the skeleton image
fig, ax = plt.subplots(2, 5, figsize = (9.5, 4), dpi = 200)
for width in widths:
    ax[0, widths.index(width)].imshow(im, cmap = "gray")
    ax[0, widths.index(width)].set_title(f"border zone = {width}", fontsize = 12)

    ax[1, widths.index(width)].imshow(im, cmap = "gray")
for x in ax.ravel():
    x.set_xticks([])
    x.set_yticks([])
ax[0,0].set_ylabel("Ground truth", fontsize = 12)
ax[1,0].set_ylabel("Filtered", fontsize = 12)
fig.tight_layout()
fig.savefig("plot.png")
#

Here I simply used the ylabel of the first column to label the rows.
I also reduced the figsize and increased the DPI instead to get more pixels (matplotlib plots tend to look better if the figsize is not too big, as a large figsize makes lines very thin).

spiral glacier
#

hi, how can i transform a dataframe like in the plotly examples with indexed=True parameter?

#

when i use .set_index("date"), i get close to the expected result, but still missing how to name the column "company"

serene scaffold
spiral glacier
#

thank you

serene scaffold
# spiral glacier thank you

no problem! keep in mind that you're not renaming a column--you're naming the whole column index. in other words, you're saying what kind of thing each column is.
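
As a small illustration of naming the column index rather than a column, here is a sketch with a made-up wide frame (the tickers and prices are invented). `rename_axis(columns=...)` sets the name of the whole column index:

```python
import pandas as pd

# Made-up wide data: one column per company, one row per date.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "AAA": [1.0, 1.1],
    "BBB": [2.0, 1.9],
})

# "date" becomes the row index; "company" names what the columns represent.
wide = df.set_index("date").rename_axis(columns="company")

print(wide.columns.name)   # company
print(wide.index.name)     # date
```

Assigning `wide.columns.name = "company"` directly has the same effect.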

void bone
arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

You would probably never get a code review from images

copper mica
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

spiral glacier
serene scaffold
spiral glacier
#

can you point me to place where i can read more about that, so that i can make more sense of this for different use cases?

upbeat dagger
#

I need help troubleshooting this code.

import numpy as np
import pandas as pd

series = pd.Series({0: '0001', 1: '0002', 2: '0003', 3: '0003', 4: '0004'})
print(f'series dtype is {series.dtype}')

series_with_converted_dtypes = series.convert_dtypes()
print(f'series dtype is {series_with_converted_dtypes.dtype}')

print(np.issubdtype(series.dtype, object))
print(np.issubdtype(series_with_converted_dtypes.dtype, str))

### OUTPUT
#
# series dtype is object
# series dtype is string
# True
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# Cell In [94], line 11
#       8 print(f'series dtype is {series_with_converted_dtypes.dtype}')
#      10 print(np.issubdtype(series.dtype, object))
# ---> 11 print(np.issubdtype(series_with_converted_dtypes.dtype, str))

# File c:\Python\3.10.8\lib\site-packages\numpy\core\numerictypes.py:416, in issubdtype(arg1, arg2)
#     358 r"""
#     359 Returns True if first argument is a typecode lower/equal in type hierarchy.
#     360 
#    (...)
#     413 
#     414 """
#     415 if not issubclass_(arg1, generic):
# --> 416     arg1 = dtype(arg1).type
#     417 if not issubclass_(arg2, generic):
#     418     arg2 = dtype(arg2).type

# TypeError: Cannot interpret 'string[python]' as a data type
#

There's a pandas series with numeric information encoded as text. I'm trying to come up with some logic to identify that the data is text encoded so that I can say "if series is string do some logic".

However, it comes in as object dtype, which I'm not sure is enough for me to say "yep, this is a column of strings". So I convert it using .convert_dtypes().. and that gives it the dtype string. However, when I try to test if the converted dtype is a string dtype, i get this error saying that the dtype can't be interpreted as a dtype.
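
`np.issubdtype` only understands NumPy dtypes, which is why it chokes on the pandas `string` extension dtype. A sketch of pandas-side checks that could be used instead (the series is the one from the snippet above):

```python
import pandas as pd

series = pd.Series(["0001", "0002", "0003", "0003", "0004"])
converted = series.convert_dtypes()

# Works with pandas extension dtypes, unlike np.issubdtype.
is_string = pd.api.types.is_string_dtype(converted)

# Explicit check for the pandas string extension dtype.
is_string_ext = isinstance(converted.dtype, pd.StringDtype)

# Inspects the actual values, so it also works on the original series
# without converting first.
inferred = pd.api.types.infer_dtype(series)

print(is_string, is_string_ext, inferred)
```

`infer_dtype` returns labels such as `"string"`, `"integer"` or `"mixed"`, so it can tell a column of strings apart from an object column holding mixed values.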

steady basalt
#

@serene scaffold working on my first language work w spacy, wondering how youd write in pandas 'replace integer inside the string sentence with string characters' whereby if the number was three digits, youd replace 500 with 'three digits 'inside the string

#

derived from a single string such as 'james had 20 apples' > 'james had two digits apples'
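
One way to do this kind of replacement in pandas is `Series.str.replace` with a regex and a callable. This is a sketch: the `DIGIT_WORDS` mapping and the helper name are made up, and numbers longer than five digits fall back to the numeral.

```python
import pandas as pd

# Hypothetical mapping from digit count to word; extend as needed.
DIGIT_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}

def describe_number(match):
    """Turn a matched run of digits into e.g. 'two digits'."""
    n = len(match.group())
    word = DIGIT_WORDS.get(n, str(n))
    return f"{word} digit" + ("" if n == 1 else "s")

sentences = pd.Series([
    "james had 20 apples",
    "she paid 500 dollars",
    "no numbers here",
])

# \d+ matches each run of digits; the callable decides the replacement.
replaced = sentences.str.replace(r"\d+", describe_number, regex=True)
print(replaced.tolist())
```

Strings without numbers pass through unchanged, so this can run on the whole column before tokenizing.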

#

as later on my classifier will take into account not exact values but how many digits they were

#

although not all strings contain them, im hoping to encode that later though

#

after this I will do 'return [token.lemma_ for token in doc if not token.is_stop and not token.is_punct and not token.is_digit]'

#

then I will expand the strings out by splitting them one word per column, and likely end with many having 'None' in the final few cols, but that shud be ok

paper wharf
#

Hello friends, I turned my python project files into exe and when I run the exe, I started getting this error, what is the solution?

serene scaffold
steady basalt
#

only temporary

#

well the manipulations are fine in general

#

I have managed to have tokenized lists in the column

#

instead of a string

#

else id get ''str' object has no attribute 'is_stop''

soft badge
#

Guys, I need to sum the values of one column for rows that are equal in another column, but when I use groupby('column').sum() it sums every column. I need to sum only 1 column. Can someone help me?

steady basalt
#

for example 'df["tokens"] = df["question"].apply(lambda x: [t.text for t in nlp.tokenizer(x)])'

#

ahhh now i have a list, not a doc

upbeat dagger
steady basalt
#

ayyy got it now

#

now i just need to figure out my first question, replacing numbers in strings depending on the number size with words

soft badge
#

So I am using this, but in my case I have 4 columns. I want to sum 1 column and not the others, because they hold values that can't be summed, but groupby('columns').sum() sums every column. Understand?
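
A sketch of two ways to sum only one column per group. The column names and values here are invented; the real frame wasn't posted.

```python
import pandas as pd

# Made-up data: "quantity" should be summed, "code" and "price" should not.
df = pd.DataFrame({
    "product": ["a", "a", "b"],
    "code": [10, 10, 20],
    "price": [1.5, 1.5, 3.0],
    "quantity": [2, 3, 4],
})

# Option 1: select the one column before aggregating.
totals = df.groupby("product", as_index=False)["quantity"].sum()

# Option 2: choose an aggregation per column, keeping the first value
# of the columns that must not be summed.
summary = df.groupby("product", as_index=False).agg(
    code=("code", "first"),
    price=("price", "first"),
    quantity=("quantity", "sum"),
)
print(summary)
```

Option 2 assumes the non-summed columns are constant within each group; if they are not, `"first"` silently picks one of the values.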

compact star
#

How can I implement backwards propagation in a convolutional layer?

soft badge
grand quarry
soft badge
#

yes, line equals

#

i will send a print

#

i want aggregate the columns, the rows that have same value

hasty mountain
#

But that's how you do it... simply assign window = window.cuda()

quaint plover
#

I've been trying to optimize a piece of code for a data science project. The following function runs many hundreds of thousands of times (once per path) and I'm trying to find elements in the loop that could be a source of waste. If you're interested in a little challenge, ping me and we can take it to a help channel!

# Initialise stack with first link
foo = list()
foo.append(path[0])

# Iterate over every step of user's path
for element in path[1:]:
    if element != '<':
        # If not return character, going forward, store information
        ## Source node is the current top of stack
        source_node = foo[-1]

        ## Count one impression for all pairs with source_node as source (source_node;*)
        df_links.loc[pos_map[source_node],'impressions'] += 1
        ## Add next link in list to top of stack
        foo.append(element)

        ## New top of stack is target
        target_node = foo[-1]

        # Create key for pair identification
        search_value = source_node + ';' + target_node

        ## Add one click-through for the pair (source_node;target_node)
        df_links.loc[df_links['linkPair']==search_value,'hits'] += 1
    else:
        # If return character is read, pop top of stack and don't store any info
        foo.pop()
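
A likely hotspot in the loop above is the two `df_links.loc[...] += 1` updates, especially the boolean mask that scans the whole `linkPair` column on every step. One possible restructuring, sketched here with made-up paths, is to count in plain Python `Counter` objects and write to the DataFrame once at the end. `count_path` is a hypothetical helper, not part of the original code.

```python
from collections import Counter

def count_path(path, impressions, hits):
    """Accumulate impressions per source and hits per (source, target) pair."""
    stack = [path[0]]
    for element in path[1:]:
        if element != "<":
            source = stack[-1]
            impressions[source] += 1          # one impression for every (source; *) pair
            hits[(source, element)] += 1      # one click-through for (source; target)
            stack.append(element)
        else:
            # Return character: go back one step, store nothing.
            stack.pop()

impressions, hits = Counter(), Counter()
for path in [["A", "B", "<", "C"], ["A", "B", "D"]]:   # made-up example paths
    count_path(path, impressions, hits)

print(impressions)
print(hits)
```

After all paths are processed, the counts can be written into `df_links` in a single pass, for example by mapping each link pair to its entry in `hits`. How much this helps depends on the size of `df_links`, so it is worth timing on real data.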
wheat snow
#

heyo

#

is there a way to make the following shorter?

#
    Jahr_all=df_vd.groupby(df_vd["Start Time"].dt.date)["Duration"].sum()
    Jahr_all.index = pd.to_datetime(Jahr_all.index)
    Jahr_all=(Jahr_all.dt.total_seconds()/60/60)
    Jahr_all= Jahr_all.groupby([Jahr_all.index.year]).sum()
    print(Jahr_all)
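
Since the end result is hours per year, the intermediate per-day grouping can probably be skipped. A sketch on made-up data, assuming "Start Time" is a datetime column and "Duration" a timedelta column as in the snippet above:

```python
import pandas as pd

# Made-up stand-in for df_vd.
df_vd = pd.DataFrame({
    "Start Time": pd.to_datetime(
        ["2021-03-01 10:00", "2021-07-15 20:00", "2022-01-02 09:00"]
    ),
    "Duration": pd.to_timedelta([60, 150, 45], unit="min"),
})

# Convert durations to hours, then sum them per calendar year.
hours = df_vd["Duration"].dt.total_seconds() / 3600
Jahr_all = hours.groupby(df_vd["Start Time"].dt.year).sum()
print(Jahr_all)
```

Summing per day first and then per year gives the same totals as summing per year directly, so the two-step grouping is not needed for this output.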
gaunt anvil
#

it errored out when I tried to do .cuda

hasty mountain
#

It's the way you pass that tensor into the cuda device

#

That, or simply passing the argument device=cuda when creating the tensor

turbid bay
#

Anyone know of any datasets for text topic classification.

Essentially assigning a block of text a topic like: sport, news, education, etc.

compact star
#

does anyone have a good resource on implementing back prop for a convolutional layer?

compact star
steady basalt
#

Deep learning libraries handle it for u

#

What library do you use?

uncut loom
#

Which data type can store objects

compact star
#

I am writing stuff from scratch as it is a school project but I could use a library for one function

serene scaffold
desert oar
hasty mountain
#

I mean...the derivative of a conv2d is another conv2d...right?

#

At least that's what I found when I tried to implement it
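
For the simplest case (single channel, stride 1, no padding) that claim can be written out in NumPy. This is a from-scratch sketch with hypothetical function names, not how any particular library implements it: the kernel gradient is a correlation of the input with the upstream gradient, and the input gradient is a "full" correlation of the upstream gradient with the flipped kernel.

```python
import numpy as np

def correlate2d_valid(x, k):
    """'Valid' cross-correlation, the operation DL libraries call convolution."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_backward(x, k, grad_out):
    """Gradients w.r.t. input and kernel, given dL/d(output) = grad_out."""
    kh, kw = k.shape
    # dL/dk: correlate the input with the upstream gradient.
    grad_k = correlate2d_valid(x, grad_out)
    # dL/dx: zero-pad the upstream gradient and correlate with the flipped kernel.
    padded = np.pad(grad_out, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    grad_x = correlate2d_valid(padded, k[::-1, ::-1])
    return grad_x, grad_k

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 6))
k = rng.normal(size=(3, 2))
y = correlate2d_valid(x, k)          # forward pass, shape (3, 5)
grad_out = rng.normal(size=y.shape)  # pretend upstream gradient
grad_x, grad_k = conv_backward(x, k, grad_out)
print(grad_x.shape, grad_k.shape)    # same shapes as x and k
```

Multiple channels, strides and padding add bookkeeping on top of this, and comparing against a finite-difference gradient is a good way to test a school-project implementation.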

compact star
fringe anvil
#

so i made a random walk function that uses an adjacency list (from a dictionary) .. where would i start if i had to modify my code so that it can use an adjacency matrix instead?

import random

def random_walk(graph, nodeid, steps):
    # Start at nodeid and keep walking until the walk holds `steps` nodes.
    da_walk = [nodeid]
    while len(da_walk) < steps:
        # Jump to a uniformly random neighbour of the current node.
        nodeid = random.choice(list(graph[nodeid]))
        da_walk.append(nodeid)
    return da_walk
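
For the adjacency-matrix version, the main change is how neighbours are looked up: instead of `graph[nodeid]`, take the row of the matrix and find its nonzero entries. A sketch assuming nodes are numbered 0..n-1 and a NumPy matrix where `adj[i, j] != 0` means an edge from i to j (`random_walk_matrix` is a hypothetical name):

```python
import random
import numpy as np

def random_walk_matrix(adj, start, steps):
    """Random walk on a graph given as an adjacency matrix."""
    walk = [start]
    node = start
    while len(walk) < steps:
        # Column indices of the nonzero entries in this row are the neighbours.
        neighbours = np.flatnonzero(adj[node])
        if neighbours.size == 0:
            break                      # dead end: no outgoing edges
        node = int(random.choice(neighbours))
        walk.append(node)
    return walk

# Small made-up graph: node 0 is connected to nodes 1 and 2.
adj = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
])
walk = random_walk_matrix(adj, 0, 6)
print(walk)
```

If the nodes have labels rather than integer ids, a dictionary from label to row index is needed on top of this.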
fringe anvil
hasty mountain
#

Maybe my code can inspire you

haughty pewter
#

In a regression tree, is there a way to set the root node (at the top) to a specific column?

#

For example, what if I wanted to set it to company_revenue

haughty pewter
#

Unless it just isn't possible

#

without only using 2 variables

mint palm
#

Apart from explicit .to(device) calls,
what else can fill CUDA memory?
I am getting an out of memory error, but all the .to(device) calls are in a file that gets called using os.system().
I mean, should they be freed automatically after they are done running?

#

Also, I get that error after I have called os.system numerous times in a loop. Doing one step doesn't show the error, but doing multiple steps is, I THINK, building up and filling memory.
What could be filling it apart from explicit .to(device) calls?

regal ingot
#

need help with naive bayes classifer

#

im stuck on what my prior probability should be

#

since the probability im using checks if a things value is over a certain number

#

so like out of a range 500 check if value is over > 250

#

so would the probability be .5

weak cliff
#

guys ... for data science would you guys advise pycharm or jupyter ?

misty flint
#

neither. vscode

#

if you really want code cells, you can get the jupyter notebook extension in vscode

#

this will help you transition easier when you need to deploy your models

weak cliff
#

why not pycharm? I find it so smooth...

#

but im watching freecodecamp video about data scientist, and he's saying everyone is using jupyter in data science ...

#

what is nano @charred egret ?

#

so.. to work in a company I can use what im more comfortable with?

#

or they choose in which program I should code?

#

yeah I agree 100%

#

other question a bit more technical ..

#

im using a macbook air m2 128gb ,, is that enough to code and to try all of these editors for terminals ?

tacit basin
#

But what the python program does would be more important. For example, for neural nets you may need Nvidia GPU access, which is not available on Macs. In this case you can connect to a remote machine from your M2.

#

In larger corporations especially most of the AI work happens in cloud. Powerful cloud systems would crunch a lot of data. For example spark/pyspark system would use multiple machines for huge datasets that otherwise would not fit in single machine ram.

weak cliff
#

usually for coding people are using mac right?

tacit basin
regal ingot
#

man im screwed

#

i got an answer but idk how to get it

#

my eyes r killing me

rugged comet
#

What's wrong?

lapis sequoia
#

what's an easy way to replace the duplicate rows by the mean of a particular column

#

this is the way I am trying to do it. Not sure what mean(0) does

#

But without mean(0) it was not showing created column

untold bloom
#

replace the duplicate rows by the mean of a particular column

col = df["some_col"]
df["some_col"] = col.where(~col.duplicated(), other=col.mean())

"keep where the column values are not duplicated; for the other positions, put mean of the column

If you are asking to do this per-group, then similar applies but you need GroupBy.transform to select the "other" part; what transform does is repeat the aggregated values per group so the shape is preserved & where et al. can work:

col_name = "some_col"
groupers = "area", "field", ...
df[col_name] = df[col_name].where(~df.duplicated([col_name, *groupers]), other=df.groupby(groupers)[col_name].transform("mean"))

Not sure what mean(0) does

it's passing a nontruthful value to "numeric_only" parameter of GroupBy.mean; by default, only the numeric columns' (int, float, etc.) mean are yielded back; with numeric_only=0 as you do, you can also incorporate other meanable types of columns; an example is the datetime64 type you have in 1 column there; mean of datetimes are also put to the result per group with numeric_only=False. Arguably, pandas should warn/error when you pass nonboolean to what expects a boolean. It does it in some other places.

uncut iris
#

how can this error?
I have followed the documentation from numpy but still error

fossil ivy
#

helloo data science people,
I am structuring the findings of my research at the moment, and am wondering about potential good visualization methods to analyse my results.
I am simulating the day-to-day decommissioning of offshore wind farms to investigate the impact of weather seasonality. I have a simple line plot at the moment

#

What would be good approaches for the analysis here? Show min/max?
I basically need to shed light on the impact of starting the decommissioning project on different days of the year (the line shows the duration per starting date throughout a year)

#

Any suggestions are much appreciated

tacit basin
steady basalt
#

does anyone know how loss.backward() and optimizer.step() do not need to be told which network these processes are working on? how has pytorch coded this? (primarily the optimizer object, because it was never told about any data)

#

im assuming that criterion and optimizer objects store this as attribute?

#

it goes criterion object > loss for our data object > .backward() call on this

#

and also optimizer on our graph, with a .step() called on it

#

much different to how tensorflow works, but seems so much more logical if i can just know how the classes work

#

ah, kind of got it now. man torch is so good compared to tf in terms of allowing to code the full process
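
For anyone else wondering about the same thing, a small sketch of the mechanism (toy model and random data, made up for illustration): the optimizer is handed the parameter tensors when it is constructed, and `loss.backward()` follows the autograd graph from the loss back to those same tensors.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
# The optimizer keeps references to the parameter tensors passed in here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(8, 3)
y = torch.randn(8, 1)
before = model.weight.detach().clone()

loss = criterion(model(x), y)   # the loss tensor remembers how it was computed
optimizer.zero_grad()
loss.backward()                 # fills .grad on every parameter that produced the loss
optimizer.step()                # updates the stored parameters using their .grad

# The optimizer's first stored parameter is the very same object as model.weight.
same_tensor = optimizer.param_groups[0]["params"][0] is model.weight
print(same_tensor)
```

So the criterion does not store the network; the link is the computation graph attached to `loss`, plus the parameter references held by the optimizer.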

static birch
#

I am planning to create a Django Application that uses NLP to analyze caption of a Facebook post ,Is it possible to create it or is it just a bad idea

steady basalt
#

Why Django?

#

It is possible to do all these things in theory; the question is whether it matches the use case

soft badge
#

guys i'm getting lost in my study of AI and machine learning, can anyone help me, with a script or course or tips

tacit basin
floral hollow
#

How can i convert custom images to be fed to the keras.datasets.fashion_mnist model

#

the images are 600x600 and i want them to be converted in a way to be used in model.predict()

paper wharf
#

I convert it to exe and start it and I get such an error, what should I do, does anyone have an idea?

#

I'm using Yolo v7

serene scaffold
#

I've never heard of anyone making an executable for a python program that involves cuda.

mild dirge
#

I assume the shape is (n_samples, height, width), so first of all you resize the images to 28x28 (width and height) and then put all the images in a list, and convert that list to a numpy array. @floral hollow

novel python
#

how do I fill a new column based on multiple conditions in a dataframe? I know the existence of np.where, but I wanted to use more than 1 condition to fill the values
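
`np.select` is the multi-condition counterpart of `np.where`. A sketch with invented columns and labels; conditions are checked in order and the first match wins:

```python
import numpy as np
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"age": [15, 30, 70], "income": [0, 50_000, 20_000]})

conditions = [
    df["age"] < 18,
    (df["age"] >= 18) & (df["income"] > 30_000),   # combine conditions with & and |
]
choices = ["minor", "high earner"]

# Rows matching none of the conditions get the default.
df["segment"] = np.select(conditions, choices, default="other")
print(df)
```

Each combined condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons.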

floral hollow
#
drawn_image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE) 
resized_drawn_image = cv2.resize(drawn_image, (28, 28), interpolation=cv2.INTER_LINEAR)
resized_drawn_image = resized_drawn_image.reshape(-1, 28, 28)
#

cause no matter what image i give it the model always guesses the same thing
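
Two things that often cause "always the same prediction" with custom images are missing scaling and inverted colours. Fashion-MNIST items are light on a black background, while drawings are usually dark on white. A sketch of the extra steps after resizing, assuming (this is a guess about the training code) that the model was trained on pixel values scaled to [0, 1]; `prepare_for_fashion_mnist` is a hypothetical helper:

```python
import numpy as np

def prepare_for_fashion_mnist(gray_28x28, invert=True):
    """Match a resized 28x28 uint8 grayscale image to Fashion-MNIST conventions."""
    img = gray_28x28.astype("float32")
    if invert:
        img = 255.0 - img          # dark-on-light drawing -> light-on-dark
    img /= 255.0                   # scale to [0, 1], assuming training did the same
    return img.reshape(1, 28, 28)  # add the batch dimension for model.predict

# Stand-in for the resized drawing: an all-white canvas.
resized = np.full((28, 28), 255, dtype=np.uint8)
batch = prepare_for_fashion_mnist(resized)
print(batch.shape, batch.min(), batch.max())
```

Whether to invert and how to scale depends on how the training data was preprocessed, so it is worth comparing a prepared image against one of the training images.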

boreal gale
#

random question, do people prefer naming their dataframe df_something (⬅️ ) or something_df (➡️) in pandas? personally a df_something guy..

serene scaffold
quaint plover
desert oar
# boreal gale random question, do people prefer naming their dataframe `df_something` (⬅️ ) o...

this is a form of what you might call "hungarian notation", and i don't find it necessary in most cases, because in my data science code, data frames are usually the "baseline" data structure that everything is built around.

however sometimes i am working with more than one "version" of the same data, in which case i'll use the suffix form to disambiguate:

results_lst = [ ... ]
results_df = pd.concat(results_lst)
results_np = results_df.to_numpy()
#

however for the most part in my notebooks i use variable names like this:

flights = pd.read_parquet('data/flights.parquet')
airports = pd.read_parquet('data/airports.parquet')

airports_start = airports.rename(columns=lambda c: f'start_{c}')
airports_dest = airports.rename(columns=lambda c: f'dest_{c}')

flights = (
    flights
    .join(airports_start)
    .join(airports_dest)
)
boreal gale
#

huh. i never actively think about the fact that you can opt to not put df in the name 😂
that's a really good explanation/reason to not put df in the name

but alas if i don't see df a small part of my brain just yells "put a damn df in the variable name" until i do it

desert oar
#

hungarian notation has its place imo

#

Hungarian notation is an identifier naming convention in computer programming, in which the name of a variable or function indicates its intention or kind, and in some dialects its type. The original Hungarian notation uses intention or kind in its naming convention and is sometimes called Apps Hungarian as it became popular in the Microsoft App...

#

e.g. you might have both battery_raw and battery_mvolt

#

true hungarian notation might be something like raw_battery and battery, but i think suffixes work better with underscores than prefixes

steady basalt
steady basalt
#

I use prefix and suffix depending on the way the wind blows tbh

#

But keep it constant throughout

signal ocean
#

guys how can i achieve this?

serene scaffold
signal ocean
#

SD = "2019-07-20"
ED = "2019-10-10"

#

Create a dataframe using a method which takes in the start and end date and create a date range for which the contract is valid.
The dataframe needs to include an additional column where it holds the number of days which is within the contract for each month.
for reference look at the "calendar" dataframe below
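
One possible approach, sketched under the assumption that both the start and end date count as contract days (the reference "calendar" frame isn't visible here, so the column names are made up): build the daily range, convert each day to its month, and count.

```python
import pandas as pd

def contract_calendar(start, end):
    """Days inside the contract per calendar month, both endpoints included."""
    days = pd.Series(pd.date_range(start, end, freq="D"))
    counts = days.dt.to_period("M").value_counts().sort_index()
    return counts.rename_axis("month").reset_index(name="days_in_contract")

SD = "2019-07-20"
ED = "2019-10-10"
calendar = contract_calendar(SD, ED)
print(calendar)
```

For these dates July contributes 12 days (20th to 31st), August 31, September 30 and October 10. If the end date should be excluded, drop the last day of the range before counting.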

signal ocean
#

been stuck on this for 5 hrs

serene scaffold
signal ocean
#

i tried but

#

stuck so bad

#

@serene scaffold could you help me pls

#

Anyone please help me

fallen ginkgo
#

I am trying to onboard some new employees at work and a large part of the job is data analysis and visualization. I am doing what I can to train them myself but it's been a long time since I was fresh and I don't know what the good resources are. Are there good introductions to Python, numpy, pandas and pyplot, which I could link some fresh talent to for self-study?

steady basalt
#

I’m shocked someone hired data analysts who aren’t comfortable w python and libraries

#

Training in this day and age wow

plush jungle
#

I'm trying to retrain a stylegan3 model on avatar the last airbender images, but I'm working with a dataset of about 2000 images and it's not improving after the first hour of training

#

it's stuck at about this quality

fallen ginkgo
#

some people still use matlab 💀

plush jungle
#

should I worry more about increasing the size of my dataset or the quality?

steady basalt
#

Physics n maths maybe

plush jungle
#

by quality I mean trying to make sure the images of different characters are more similar in pose and camera angle

steady basalt
#

These people have no reason to go beyond

plush jungle
#

or should I not be worrying about datasets and try to tune the hyperparameters?

simple fossil
#

Hi. Is there a free alternative for pinecone which you could install locally and deploy on the server?

fallen ginkgo
serene scaffold
# plush jungle

I'm impressed by how obviously ATLA these are. What is stylegan3 supposed to do? Can you guess which ATLA/LoK character is my favorite?

plush jungle
#

and I'm gonna guess Azula

serene scaffold
#

Why Azula?

plush jungle
# serene scaffold Why Azula?

she's pretty popular and one of the generated images I posted above kinda looks like her so I figured it was a decent guess. But probably I should have gone with Zuko since you also said /LoK

serene scaffold
#

Well it's Bolin.

#

Anyway, let's see how many images they used in the paper that introduced this technique.

plush jungle
#

but I'm not sure if where I'm going wrong is dataset diversity/noise, dataset size, hyperparameters or just time

serene scaffold
#

what kind of images was ffhqu-256 trained on? other animes?

plush jungle
#

before retraining it generates images like this

serene scaffold
#

And the 2000 images are headshots from ATLA? I wonder how many "headshot instances" are even in the series.

plush jungle
#

but there's way more noise in the full dataset

#

and I'm wondering if that's the problem

#

but the saying always goes that in deep learning throwing more data and more compute at the problem usually works, so I wonder if I shouldn't start cleaning my dataset more until I have like 10k images

#

I've confirmed that on a dataset of 30 images of aang, it overfits massively after a couple hours of training

#

these results are almost identical to training images

serene scaffold
#

A lot of the ATLA pictures cut off the top of the person's head. does that happen in the original data as well?

serene scaffold
plush jungle
#

if you mean the Flickr faces, no, they're usually full faces, but in my dataset there are a few extreme close ups

serene scaffold
#

you might omit the extreme close ups.

steady basalt
plush jungle
steady basalt
#

(im not experienced with generators)

serene scaffold
plush jungle
#

if it didn't overfit, then I have bigger problems

steady basalt
#

how is it overfitting? arent those generated?

#

are they too similar to training?

serene scaffold
#

but that's not the point.

plush jungle
#

increase dataset size?
remove samples that aren't exact face shots?
change learning rate or gamma (whatever that is)?

serene scaffold
#

to be honest, I'm not sure what to do. but I think this is really cool! 😄

#

thanks for sharing it.

plush jungle
serene scaffold
#

but with only 2000 instances, you've made faces that look like a nightmare version of ATLA 😄

steady basalt
#

Maybe u just need more variety yeah

plush jungle
serene scaffold
# steady basalt Maybe u just need more viariety yeah

judging by the slice of their training data that they showed, it looks like it might already be exhaustive. There are only so many possible training instances in the set of all headshots from a three-season anime.

plush jungle
#

the first couple iterations of training are always massively uncanny valley

steady basalt
#

Try to add fan art

serene scaffold
plush jungle
steady basalt
#

Anyone here use azure

#

And sdk for azure

fringe anvil
#

#help-cookie for ds/algo/random-walk .. im stuck i need help. thanks

steady basalt
#

This is DS, not DS

serene scaffold
fringe anvil
serene scaffold
fringe anvil
serene scaffold
steady basalt
#

Sounds like a comp sci class

#

I never touched DSA on my msc

serene scaffold
steady basalt
#

Not like DSA would help a lot in most of what we do

serene scaffold
fringe anvil
#

im in the middle of a change of career. it's a diploma from Concordia university on a 10 months period in the form of a bootcamp.. it's intensive, as i have a full time job at the same time. a bit tough on the maths side lol

steady basalt
#

My mate also had the same, but it was a shared module with comp sci

serene scaffold
serene scaffold
fringe anvil
#

oh yeah, numpy and pandas was kind of cool. but right now we are using that networkx library, which tbh, there isnt much info out there on the google. so i cant use my googlefu to learn all that stuff

steady basalt
fringe anvil
#

it used to be in person. but with covid and all, we have lectures in zoom, we also have a discord etc

serene scaffold
steady basalt
#

is graph theory all that complicated?

serene scaffold
#

Graph theory is very useful btw. I legitimately use it every day at work.

serene scaffold
fringe anvil
serene scaffold
#

no idea what it even involves.

#

other than flame

steady basalt
#

I know about bfs and dfs, but i cudnt code it off the top of my head

fringe anvil
#

when the pandemic hit, i decided to learn python on my own. then my gf did a full stack web dev bootcamp at the same university. and im like.. you know what, my turn now 😆

fringe anvil
#

that was fun, loll

steady basalt
#

yeah idk, ive watched alot of videos on it but theres 0 way i cud code it properly without cheating

#

its been a minute

fringe anvil
#

this is the module we are in now

steady basalt
#

test urself on leetcode

#

theres plenty of traversal questions

fringe anvil
#

ive done a lot of codewars back when i was doing a lot of python, even coded my own small discord/twitch api bot. but leetcode is something else. the easy challenges on codewars are for babies compared to the ones ive seen on leetcode. i thought maybe id wait a bit before tackling it, lol