#data-science-and-ml


stone glacier
#

Can we discuss NLP here in this channel?

atomic tide
#

It will probably be easier to sort them out before taking the square root, as the two solutions will be conjugates (in the complex case) or just two real numbers. If they're complex conjugates, you could put all the solutions with positive imaginary component on one side, and all the solutions with negative imaginary component on the other side. If you have two real solutions, you could group together the solutions that are the smaller of their pair, and group together the solutions that are the larger of their pair.

stone glacier
#

Oh, cool. Is there any trusted guide on how to get acquainted with some NLP methods for Twitter tweet analytics? in like a week?

#

There are many guides available, I know. But I want one which people trust (since I got a project deadline to meet so need to utilise what time I get for it)

boreal gale
#

in your example

[1.50622964+650.25736311j, 449.3031039+25.95939333j]
 [1.87818391 +631.29563062j, 789.98518552+34.33014745j]
 [1402.82082129+84.79794406j, 2.40353116 +607.05689764j]
 [1602.45701021+4146.32391044j, 3.18701564 +575.16495683j]

what would be your desired output?

candid garnet
#
```[  5.96173114+1112.47599554j,  33.73591086 +181.08331466j]
 [  5.79370343+1108.15897559j,  40.55934614 +154.85239687j]
 [ 51.96897045 +124.30033042j,   5.6327504 +1103.9652141j ]
 [ 73.19468058  +90.8068841j ,   5.47847438+1099.88955899j]
 [106.66038373  +64.1440674j  ,  5.3305056 +1095.92714543j]```

would ideally be 

```[  6.13726168+1116.92173512j , 29.1556275  +203.88033415j]
 [  5.96173114+1112.47599554j , 33.73591086 +181.08331466j]
 [  5.79370343+1108.15897559j , 40.55934614 +154.85239687j]
 [  5.6327504 +1103.9652141j , 51.96897045 +124.30033042j]
 [  5.47847438+1099.88955899j, 73.19468058  +90.8068841j]
 [  5.3305056 +1095.92714543j, 106.66038373  +64.1440674j]```
boreal gale
candid garnet
#

yeah there can still be legitimate big jumps

#

you can see here the big vertical line is where they have swapped incorrectly

#

that's with some conditions put in to check

#

this is what it looks like with no conditions

#

the second column should be relatively consistent though

boreal gale
#

okay, basically you are trying to optimise for minimal difference between successive entries' imaginary components (or maybe + real)

candid garnet
#

yeah i had (excuse the awfulness):

```
condition_1 = abs(item[0].imag - previous[1].imag) < abs(item[0].imag - previous[0].imag)
condition_2 = abs(item[0].real - previous[1].real) < abs(item[0].real - previous[0].real)
condition_3 = abs(item[1].imag - previous[0].imag) < abs(item[1].imag - previous[1].imag)
condition_4 = abs(item[1].real - previous[0].real) < abs(item[1].real - previous[1].real)

if (condition_1 and condition_2) or (condition_3 and condition_4):
    swap()
```
#

and it worked for one setup, but now i've made the variables a bit more complicated some rogues have slipped through the cracks

austere prawn
#

Hello, first time seeing pandas in our huge Python code base. It's used to output an HTML table (quite a simple use case, I think). Question copied from python-general:

What does it mean to style a pandas dataframe, and why is the outputted html so disgusting? example:
<th id="T_c9383ca4_a7c2_11ed_9342_f02f74177e5elevel2_row56"
It's annotating every table cell with a unique id 😲

candid garnet
austere prawn
tidal bough
candid garnet
#

you can see in row 27 a swap needs to happen

austere prawn
austere prawn
tidal bough
# candid garnet you can see in row 27 a swap needs to happen
import csv
from pathlib import Path

file = Path("ky_roots.csv")
res = []
with file.open() as fo:
    reader = csv.DictReader(fo)
    for line in reader:
        a, b = map(complex, (line["0"], line["1"]))
        if res:
            last_a, last_b = res[-1]
            dist = lambda a, b: abs(a - last_a) ** 2 + abs(b - last_b) ** 2
            # change order if that makes it closer:
            if dist(b, a) < dist(a, b):
                a, b = b, a
        res.append((a, b))

this works for me

#
import cmath
import matplotlib.pyplot as plt
roots = [(cmath.sqrt(a), cmath.sqrt(b)) for a,b in res]
plt.figure()
plt.plot([a.real for a,b in roots])
plt.plot([a.imag for a,b in roots])
plt.plot([b.real for a,b in roots])
plt.plot([b.imag for a,b in roots])
plt.show()

makes the plots pretty continuous I think

candid garnet
#

sadly that's VERY CLOSE but incorrect, it's supposed to be more like

#

two peaks need to be in the one array, with the second being much more stable

tidal bough
#

This doesn't even look the same - did I not take the root right or something?

candid garnet
#

this is for a different set of results that were sorted correctly, just a few different parameters but the first solution is definitely supposed to have two peaks

versed gulch
#

Hi does anyone know how to merge 2 datasets together into one in pytorch?

candid garnet
#

So that's the issue I'm facing now of how to find a method that will preserve two peaks

lapis sequoia
#

how can i add each five rows into a new row in pandas

serene scaffold
lapis sequoia
#

sum of last 5 changes

serene scaffold
arctic wedgeBOT
#

```
Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method='single')
```
Provide rolling window calculations.
serene scaffold
#

though this is with a sliding window. if you want separate groups of 5, with no overlap, that's different.
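
To make the difference concrete, here is a small sketch on a made-up Series (the data and variable names are assumptions, not taken from the question). `rolling(5).sum()` gives the overlapping sliding-window sum, while grouping on the integer-divided position gives separate blocks of five:

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the "change" values; purely illustrative.
s = pd.Series(np.arange(1, 11))

# Sliding window: each entry is the sum of the current and previous 4 values.
# The first 4 entries are NaN because the window is not yet full.
rolling_sum = s.rolling(5).sum()

# Non-overlapping blocks: positions 0-4 form group 0, 5-9 form group 1, ...
block_sum = s.groupby(np.arange(len(s)) // 5).sum()

print(rolling_sum.tolist())
print(block_sum.tolist())
```

The block version returns one row per group of five, so the result is shorter than the input; the rolling version keeps the original length.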

austere prawn
#

anyone here experimented with html output of pandas? The id hash generation is not very stable, irritatingly

austere prawn
#

I get diffs like this by just generating the same output again. My guess is that some hash is using the python id() of some object to create the id:

[-#T_e5c18203_a7c9_11ed_9342_f02f74177e5e-]{+#T_64ec984f_a7d1_11ed_9342_f02f74177e5e+}
tidal bough
#

Are you sure it's not just, like, a random uuid?

austere prawn
#

nope, it could be, but why would you do that?

#

oh

tidal bough
#

It seems to me like a straightforward solution to generate random unique ids for each cell.

#

!e the format matches, too:

import uuid
print(uuid.uuid4())
arctic wedgeBOT
#

@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.

b16cc679-402a-4840-9533-9efb901a1d21
austere prawn
#

to me, just using an index would be much more straightforward and fool-proof (and more stable)

tidal bough
#

no idea why they needed unique ids for each cell anyway, tbh

austere prawn
#

ooh, I didn't know about uuid, thanks!

tidal bough
#

but if they used the index, that could conflict between dataframes, so maybe that's why they didn't.

austere prawn
#

sure, but then they could just put a discriminant per dataframe

#

oh well, I'll just live with it :3

wooden sail
#

take the 3 previous values, fit a 2nd order polynomial to them, and try to predict the next value

#

then pick the order based on the point that minimizes the distance to the predicted value

#

this will enforce some amount of smoothness instead of being only pointwise
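
A rough sketch of that extrapolation idea. It assumes the swept parameter is equally spaced, in which case the quadratic through the last three points, evaluated one step ahead, reduces to `3*p[-1] - 3*p[-2] + p[-3]`. The function names and the two toy branches are made up for illustration and are not the real root data:

```python
def predict_next(history):
    """Extrapolate the next value from up to the last three values.

    Equal spacing of the swept parameter is an assumption. With fewer than
    three points this falls back to linear extrapolation or the last value.
    """
    if len(history) >= 3:
        return 3 * history[-1] - 3 * history[-2] + history[-3]
    if len(history) == 2:
        return 2 * history[-1] - history[-2]
    return history[-1]


def sort_pairs(pairs):
    """Order each (a, b) pair so that both columns continue smoothly."""
    col_a, col_b = [pairs[0][0]], [pairs[0][1]]
    for a, b in pairs[1:]:
        pred_a, pred_b = predict_next(col_a), predict_next(col_b)
        keep = abs(a - pred_a) ** 2 + abs(b - pred_b) ** 2
        swap = abs(b - pred_a) ** 2 + abs(a - pred_b) ** 2
        if swap < keep:
            a, b = b, a
        col_a.append(a)
        col_b.append(b)
    return list(zip(col_a, col_b))


# Toy branches that pass close to each other; every other pair is swapped.
truth = [(t**2 + 1j * t, 10 - t + 2j) for t in range(8)]
scrambled = [p if i % 2 == 0 else (p[1], p[0]) for i, p in enumerate(truth)]
sorted_pairs = sort_pairs(scrambled)
print(sorted_pairs == truth)
```

On this toy data a purely pointwise nearest-neighbour rule would pick the wrong order around the fourth pair, where the two branches approach each other, whereas the extrapolated prediction keeps following each branch. Whether that carries over to the real data with its two peaks would need checking.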

austere prawn
#

in my output I should say, there is probably some feature where the output actually uses the ids instead of just generating them as dead-weight :p

tidal bough
#

Are you using pandas.to_html here? Or what, jupyter's export to html?

#

maybe try these templates and see if they're better?

#

ah, I guess you're using pandas's to_html after all, which seems non-customizable

#

i mean I guess if you really wanted it, you could always, like, parse the output with bs4 and remove all the ids 🥴
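
A lighter variant of the same idea, sketched with only the standard library: instead of removing the ids, rewrite the per-run prefix to a fixed string so the CSS selectors still match. The regex assumes the id format seen in the snippets here (`T_` plus a UUID with underscores), and `stabilise_ids` and the `T_report_` replacement are made-up names:

```python
import re

# Matches the id prefix seen in the snippets: "T_" followed by a UUID whose
# dashes were replaced with underscores (8-4-4-4-12 hex digits).
UUID_ID = re.compile(
    r"T_[0-9a-f]{8}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{12}"
)


def stabilise_ids(html, replacement="T_report_"):
    """Replace the per-run UUID prefix with a fixed string.

    The CSS selectors and the id attributes share the same prefix, so
    rewriting both keeps the styling intact while keeping diffs quiet.
    """
    return UUID_ID.sub(replacement, html)


sample = (
    "<style>#T_c9383ca4_a7c2_11ed_9342_f02f74177e5erow0_col0 {color: red;}</style>"
    '<th id="T_c9383ca4_a7c2_11ed_9342_f02f74177e5elevel2_row56">x</th>'
)
out = stabilise_ids(sample)
print(out)
```

If several styled tables end up in the same page, a single fixed prefix would make their ids collide, so each table would need its own replacement string. It may also be worth checking whether the installed pandas offers `Styler.set_uuid`, which would avoid the post-processing entirely; that has not been verified against pandas 1.1.5 here.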

austere prawn
tidal bough
#

huh, so what are you actually looking at? how did you get the html file?

austere prawn
#

The script uses pandas and DataFrame, and to_markdown to generate a markdown report. But I haven't figured out how it's actually generating html. Perhaps html is the default output when doing string on a style.
The code has stuff like

df_specs.index = pd.MultiIndex.from_frame(
    df_specs.spec_id.str.split("/", n=2, expand=True).fillna("")
)
df_specs.drop(columns=["spec_id"], inplace=True)
style = set_style(
    df_specs.style.apply(paint_empty_red, colname="tested by", axis=1)
)
outfile.write(style.render(index=False).replace(", ", "<br>"))

and without ever having used pandas, it's hard to tell what it does. Although it's really smelly! 🤢

tidal bough
#

👀

austere prawn
#

oh, i guess perhaps render defaults to html. (brb, laundry time!)

tidal bough
#

magic is happening here I guess, lemme read docs more

tidal bough
#

if this is the right class

austere prawn
#

deprecation doesn't faze us, we are still on Python 3.6 and pandas 1.1.5

austere prawn
#

We have been trying to go up python version for over 2 years, but the US office is holding us back T_T
Hopefully we'll get python 3.9 or 3.10 any month now

hidden mist
#

Si! I've been out of school for awhile now. That might be trivial for you guys but it's a bit of a reach for me right now lol

#

I was talking elsewhere about this, but I understand a lot of these concepts on a fundamental level from prior knowledge and intuition, but the math really obfuscates a lot of these concepts in a way I'm having a hard time remedying.

austere swift
#

Try increasing your batch size until it runs out of memory then bring it back a bit till it runs (then a bit more since you don't want it to be just on the edge)

gusty agate
woeful ridge
#

@queen cradle how did you go with this? any luck?

fading gate
#

for my dataframe I can do something like for a,s in df.groupby("a"): ax.plot(s.bar, label=a)

#

but it's unclear how to accomplish the above if I do df.groupby(["a","b"]) any idea?

#

or is there a pandas specific discord I should use?

serene scaffold
fading gate
#

yeah

serene scaffold
#

like entirely separate images?

fading gate
#

nope, so the first example I'd get 1 plot (line) per a and a is like 0..10

#

but in the 2nd case instead of just groupby on a, I need to also aggregate by a,b

#

but then I'm stuck how to keep b as the index and plot 11 plots for a={0..10} like original

#
```py
import io
import pandas as pd

t = """a,b,c,d
0,1,2,3
0,2,5,10
0,3,10,15
1,1,20,30
1,2,50,100
1,3,100,150
2,1,21,31
2,2,51,11
3,3,11,13"""

df = pd.read_csv(io.StringIO(t))
df
```
#
```py
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for a, series in df.groupby("a"):
    ax.plot(series.set_index("b")["c"], label=a)
```
#

Having trouble with this: fig,ax = plt.subplots() for a,series in df.groupby(["a", "b"]): ax.plot(series.unstack()["c"], label=a)

lyric wave
#

hello, so I'm trying to make a Jarvis-like AI and now I'm working on automated Google search code. But every time I ask her to search something, she does it and it all looks fine and working, but about a second after the Google tab is opened it closes again.
can anyone help me ?

#

That's my code, sorry for screenshot

#

elif 'search' in command:
    keyword = command.replace('search', '')
    browser = webdriver.Chrome()
    browser.get('https://google.com/search?q=' + keyword)

#

That's in Text

fading gate
#

found out that you can do something like: for i,x in df.groupby("a"): ax.plot(x.groupby("b")["c"].sum(), ...) unsure if this is even ideal though
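
One possible alternative, sketched on the toy data posted above: reshape once with `pivot_table` so that `b` is the index and each value of `a` becomes a column. The `aggfunc="sum"` choice mirrors the `.sum()` in the message and is an assumption about what aggregation is wanted:

```python
import io
import pandas as pd

# Same toy data as posted above.
t = """a,b,c,d
0,1,2,3
0,2,5,10
0,3,10,15
1,1,20,30
1,2,50,100
1,3,100,150
2,1,21,31
2,2,51,11
3,3,11,13"""
df = pd.read_csv(io.StringIO(t))

# One row per b, one column per a; duplicate (a, b) pairs are summed.
wide = df.pivot_table(index="b", columns="a", values="c", aggfunc="sum")
print(wide)

# wide.plot() would then draw one line per value of a (needs matplotlib).
```

Combinations of `a` and `b` that never occur show up as NaN in the wide table, which leaves gaps in the corresponding lines.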

queen cradle
#

Your problem is that graph_objects.Volume is only able to render grids of points, and your data is not a complete grid. Try running:

import numpy as np
import pandas as pd

np.set_printoptions(linewidth=125, precision=4)

df = pd.read_csv("HygroGen_TempUniformity_PointCloud.csv")

xx = df['x'].to_numpy()
yy = df['y'].to_numpy()
zz = df['z'].to_numpy()

sort_idxs = np.lexsort((xx, yy, zz))
xx = xx[sort_idxs]
yy = yy[sort_idxs]
zz = zz[sort_idxs]
print(np.reshape(xx, (-1, 10))[:3])
print(np.reshape(yy, (-1, 10))[:3])
print(np.reshape(zz, (-1, 10))[:3])

and take a look at the output. These are your first 30 data points, in lexicographic order (by z first, then y, then x), in groups of ten. The first group has z equal to -47.641 and the second has z being -42.5167. For both of these, you have y in steps of 20 from -40 to +40. And for each of these y and z values, you have some positive x value and some negative x value; but the x values are different, so the data does not fit a grid. The last displayed group is actually really two: You have a single z value and evenly spaced y values, but only a single x value. Plotly does not know what to do with this.

You might have some luck with interpolating your data to a grid; but you will have to be careful and make sure that whatever behaviors you see are really in the data and not an artifact of the interpolation.
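
As a quick way to test the "complete grid" condition on other data, here is a small sketch. The helper name `is_complete_grid` is made up, and it assumes that coordinates meant to be equal are exactly equal as floats (rounding first may be needed otherwise):

```python
import numpy as np


def is_complete_grid(x, y, z):
    """True if every combination of the distinct x, y and z values occurs once."""
    pts = np.column_stack([x, y, z])
    n_expected = len(np.unique(x)) * len(np.unique(y)) * len(np.unique(z))
    n_distinct = len(np.unique(pts, axis=0))
    return len(pts) == n_expected and n_distinct == n_expected


# Toy data: a full 2 x 3 x 2 grid, then the same grid with one point removed.
gx, gy, gz = (g.ravel() for g in np.meshgrid([0.0, 1.0], [0.0, 1.0, 2.0], [5.0, 6.0]))
print(is_complete_grid(gx, gy, gz))
print(is_complete_grid(gx[:-1], gy[:-1], gz[:-1]))
```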

#

(Sorry for the delay, by the way. I only got a chance to look at it just now.)

muted zenith
#

i am using with cli and getting the following error,
: error: input types 'tensor<1x768x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
pytorch version is Version: 2.0.0.dev20230208

#

this is using pytorch with cli for openai's whisper

dense yarrow
#

is anyone available to help me with a code?

queen cradle
plush jungle
#

so I'm training a deep q learning agent to play a top down shooter game of my own creation and I'm having trouble teaching it to shoot at targets

#

I have two versions I've tried, one where the neural net is a CNN that takes the game window as input and uses that as the state

#

the other takes a vector of player position, enemy position, enemy velocity, player angle, etc.

#

the CNN has yet to converge to the optimal strategy, so maybe it just needs more training or maybe my hyperparameters need to be tweaked

#

.
but as for the other version, I realized that I haven't coded bullet position into its state vector

#

and I'm wondering how to do that considering there could be any number of bullets on screen and the neural net needs a fixed sized input vector

#

I thought about having a super long input vector with zeros representing bullets that don't exist yet

#

but then the neural net will store bullets as "bullet 0, bullet 1, etc" and it'll mess up the learning because each new bullet will have new weights associated with it that haven't been trained yet
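
One idea that might soften the "bullet 0, bullet 1" problem, offered only as a sketch and not something tested on this game: keep a fixed number of slots, fill them with the bullets nearest to the player (sorted by distance, positions relative to the player), and add a flag per slot so padding is distinguishable from a real bullet. `MAX_BULLETS` and `encode_bullets` are made-up names:

```python
import numpy as np

MAX_BULLETS = 4  # hypothetical cap on how many bullets the network sees


def encode_bullets(player_xy, bullets_xy, max_bullets=MAX_BULLETS):
    """Fixed-length encoding: (dx, dy, present) for the nearest bullets."""
    out = np.zeros((max_bullets, 3), dtype=np.float32)
    if len(bullets_xy) > 0:
        rel = np.asarray(bullets_xy, dtype=np.float32) - np.asarray(
            player_xy, dtype=np.float32
        )
        # Nearest bullets first, so slot 0 always means "closest bullet".
        order = np.argsort(np.linalg.norm(rel, axis=1))[:max_bullets]
        out[: len(order), :2] = rel[order]
        out[: len(order), 2] = 1.0  # flag separates real bullets from padding
    return out.ravel()


print(encode_bullets((0, 0), [(5, 0), (1, 1)]))
```

Because the slots are ordered by distance rather than by creation time, each slot keeps a consistent meaning across frames, which is the property the zero-padded "bullet 0, bullet 1, ..." layout lacks.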

midnight kayak
plush jungle
midnight kayak
#

nvm I found the method

#
dataset = tfds.builder_cls('scientific_papers')
fringe bay
#

hey everybody
I have coordinates in 2d space, which are actually gps coordinates in reality
Most of the points follow a good path, nothing crazy, but in some circumstances I get them like this.
Does anyone have an idea on what algorithm I should look into, if there's an easy or not so easy solution to this

boreal gale
fringe bay
boreal gale
#

and the task is to recreate/determine what's the real path?
or the task is to remove points that are not on the real path?

warm jungle
#

I'm doing a bit of profiling of some numpy code, and one of the more expensive lines looks like this: x[:] = y[z] Is there anything that might speed this up - I tried using take: np.take(y, z, out=x) but this is actually significantly slower (for reasons I don't understand)

hasty grail
#

and what is it for

warm jungle
#

x and z are matrices with about 10 million rows and 15 columns; y is 1-D with shape about (700,)

#

It's part of the score calculation for a game.

hasty grail
#

Hmm, from my understanding, using fancy indexing is already quite performant

#

is there a way you can avoid operating on such large data?

warm jungle
#

I need to use this data - it's not prohibitively expensive, I'm just looking around for potential improvements. (I don't get why take is more expensive tho')

#

Since the calculation for each row of the result is independent, there's potentially scope for parallelizing I guess - doing subsets in different threads

hasty grail
#

yeah, maybe you can try parallelizing the operation using Numba

warm jungle
#

the take thing is odd tho' because doing the fancy indexing and then assigning means allocating quite a big array, whereas take with out doesn't need anything allocating.
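
A possible explanation worth testing: as far as I recall, the NumPy documentation for `take` notes that `out` is always buffered when `mode='raise'` (the default) and suggests other modes for better performance. The sketch below uses small made-up arrays in place of the real ones and only checks that the results agree; any speed-up would need to be measured on the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(700)
z = rng.integers(0, len(y), size=(1000, 15))  # small stand-in for ~10M rows
x = np.empty(z.shape, dtype=y.dtype)

x[:] = y[z]                # fancy indexing: allocates a temporary, then copies
expected = x.copy()

np.take(y, z, out=x, mode="clip")   # non-"raise" mode, writes into x
assert np.array_equal(x, expected)  # same result while indices are in range
```

Note that `mode="clip"` changes the semantics for bad indices: out-of-range values are silently clipped and negative indices no longer wrap around, so this is only safe if `z` is known to be valid.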

candid garnet
wooden sail
#

was it any better at all?

fickle rock
#

Hello! Can anyone help me out? I'm getting a pretty bad r^2, negative in fact, but the graph where predicted values are plotted over test values looks very solid. How is this possible?

#

By the way, this is how Ridge regression graph looks in comparison to the Lasso one above:

zealous badger
#

does anybody know why we use sklearn.preprocessing.PolynomialFeatures for polynomial regression?

mild dirge
#

Because it allows you to perform linear regression methods on all possible polynomial features

#

@zealous badger

zealous badger
mild dirge
#

If you have two features, a and b. then linear regression just uses features a and b. but if we want to look at quadratic features, then we also have (a + b)^2

#

So a, b and then a^2, b^2, ab (actually 2ab, but it doesn't matter for linear regression as long as we are consistent)

#

And also 1 to be able to have a bias

#

so we have 1, a, b, a^2, b^2, ab

zealous badger
#

ah i see

mild dirge
#

The same can be done for higher degrees
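
To see those columns spelled out, here is a hand-rolled sketch of the expansion using NumPy and itertools. `polynomial_features` is a made-up helper for illustration, not the scikit-learn implementation, although I believe `PolynomialFeatures(degree=2)` produces the same columns in the same order:

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(X, degree=2):
    """Columns: 1, then every product of up to `degree` input features."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # the bias column
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


X = np.array([[2.0, 3.0]])            # one sample with a=2, b=3
expanded = polynomial_features(X)     # columns: 1, a, b, a^2, ab, b^2
print(expanded)
```

An ordinary linear regression fitted on the expanded matrix is then a polynomial regression in the original features.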

zealous badger
#

thank you :)

rigid bronze
austere swift
#

did you install the face recognition module?

dense yarrow
#

getting error message on this df.groupby("offspring_recode").mean()

#

'DataFrameGroupBy' object has no attribute 'groupby'

#

i googled it but didn't find anything helpful

austere swift
#

the error makes it look like df is already a groupby object

#

can you show more code?

dense yarrow
#

okay! I tried a diff approach now and reloaded the code, not getting the error msg but something new: ```py
df.groupby("age")

def function(o):
    if o == 1:
        return o.mean()

df.groupby("offspring_recode").apply(function)```

#

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

#

I want to find the mean, and then plot the fraction of respondents who have kids (set it to equal one before) with respect to age.

#

but idk how to get there

boreal gale
#

I want to find the mean,
mean of what? do we need to group by some common attributes?
and then plot the fraction of respondents who have kids (set it to equal one before)
" set it to equal one before" - what is this?
with respect to age.
do you mean group by age?

dense yarrow
#

which is what i'm trying to do in the earlier codes

dense yarrow
boreal gale
#

at this point it would be beneficial to show a subset of your data

dense yarrow
#

there is a column for age of the respondents

#

one sec

dense yarrow
boreal gale
#

best to show OKCupid_data itself

dense yarrow
#

this is column age: print(OKCupid_data["age"])

boreal gale
#

to me "calculate the average number of respondents" is just a weird request..

but anyway..

you want to filter the dataframe such that only rows with children remain, then group by the age group, and do the .mean aggregation after the group by.

arctic wedgeBOT
#

Hey @dense yarrow!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

dense yarrow
#

so i turned to function which didn't help either

#

it's calculating the average number of respondents who have kids

amber barn
boreal gale
#
df[df['kids'] == 1].groupby('age_group')['num_respondents'].mean()

are you doing something like this?

#
df[df['kids'] == 1]
  1. filter the dataframe so that only rows with children remain,
df[df['kids'] == 1].groupby('age_group')
  2. then group by the age group
df[df['kids'] == 1].groupby('age_group')['num_respondents'].mean()
  3. do the .mean aggregation after the group by (on the number of respondents, as requested)
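
If the goal is the fraction of respondents with kids at each age, as described earlier, a sketch like the following may be closer. The column names `age` and `has_kids` and the data are made up, since the real dataframe has not been shown. As for the earlier error: `apply` passes each group as a DataFrame, so `o == 1` is itself a DataFrame and `if` cannot reduce it to a single True/False.

```python
import pandas as pd

# Made-up stand-in for the survey data; the column names are assumptions.
df = pd.DataFrame(
    {
        "age": [25, 25, 30, 30, 30, 40],
        "has_kids": [0, 1, 1, 1, 0, 1],
    }
)

# The mean of a 0/1 column within each group is the fraction of 1s.
fraction_with_kids = df.groupby("age")["has_kids"].mean()
print(fraction_with_kids)

# fraction_with_kids.plot() would draw the fraction against age (needs matplotlib).
```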
dense yarrow
boreal gale
#

i just assumed you have something like that..
you said

first one asks to calculate the average number of respondents within each age group who have children by calling .mean()

#

show me a screenshot of the dataframe, even if it's only the first few rows it will do, because i might have misunderstood what you have

edit: i gotta head home now, so gonna be afk!

molten hamlet
#

is there a pandas function that plots an array with colors? 🤔

#

I mean, like in some spreadsheet, to make cells colorized

dense yarrow
#

Anyone know how to impute missing values? Basically what I understand is I have to use groupby and find the number of missing values in multiple columns but idk how the code should look

serene scaffold
molten hamlet
dense yarrow
prime hearth
#

hello, I would like to ask: would it be a good idea to drop certain features from the dataset and see if my SVM with RBF kernel performs better? The goal is to find the most important features

lapis sequoia
#

why is the index column showing fewer rows than there actually are

fringe bay
serene scaffold
#

basically, dropping rows creates gaps in the numbering.

lapis sequoia
#

how do i reset index

serene scaffold
#

.reset_index(drop=True)

lapis sequoia
serene scaffold
#

.reset_index returns a new df
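
A tiny illustration of that point on a made-up frame: calling `reset_index` without keeping the result leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]})
df = df.drop(index=[1])            # dropping a row leaves a gap in the index

df.reset_index(drop=True)          # result discarded, df is unchanged
index_before = df.index.tolist()
print(index_before)

df = df.reset_index(drop=True)     # keep the returned frame instead
index_after = df.index.tolist()
print(index_after)
```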

lapis sequoia
#

oh

serene scaffold
#

also, you should probably reset the index before doing an operation that depends on it

#

namely df.index % 3

lapis sequoia
#

true but reset didnt work

serene scaffold
#

yes it did

lapis sequoia
#

length is still 54512

#

oh wait

serene scaffold
#

I'll come back to this when my meeting ends

serene scaffold
lapis sequoia
#

yeah it worked actually

odd meteor
odd meteor
# dense yarrow Anyone know how to imputate missing values? Basically what I understand is I hav...

There's no single best approach to imputing missing values in a dataset. It all depends on the kind of dataset you are working with and the column that has missing data.

The code below will show you how many missing values you have in all the columns in your dataframe

df.isna().sum()

Now, how you decide to fill each column that has missing values is your prerogative. However, I hope the links below provide more clarity

  1. https://towardsdatascience.com/how-to-fill-missing-data-with-pandas-8cb875362a0d

  2. https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
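
As a concrete starting point, here is a minimal sketch of two common fills on made-up data (the column names and the choice of median and mode are assumptions, not a recommendation for any particular dataset):

```python
import numpy as np
import pandas as pd

# Made-up data; column names are purely illustrative.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "score": [1.0, np.nan, 10.0, np.nan],
        "city": ["x", None, "y", "y"],
    }
)

print(df.isna().sum())  # missing values per column

# Numeric column: fill with the median of the row's own group.
df["score"] = df["score"].fillna(df.groupby("group")["score"].transform("median"))

# Categorical column: fill with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])
print(df)
```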


inland condor
#

Hello, I never really did data analysis stuff before, but know databases, discordbots and a little matplotlib. I am interested in how raid performance in clash of clans is calculated. There is supposed to be a formula, but it is unknown. I have around 30k samples in my database with four input values and the actual performance I want to be able to compute. Can you recommend any concept, library, tool, guide or tutorial how to approach finding that formula or a close enough approximation?

charred light
inland condor
#

ah yes, regression looks like what I should dive into. Thanks 👍

patent lynx
solar torrent
#

Hello I'm trying to get some help with reading a dataframe from a csv... I need to know how to reshape/pivot the table for columns in the field TITLE . Here's the link to the .csv - https://www2.census.gov/programs-surveys/acs/data/2021/CD118_Data_Profiles/ALL_CD by Nation/DP03_1yr_500.csv and some sample code I've tried. I tried marking up the screenshot for my desired pivot, but let me know if it's unclear. Note the column GEONAME isn't unique so I'd need to pivot it without making it the index...

dense yarrow
#

i'm getting error message on this py OKCupid_data.groupby(['drugs', 'drinks', 'smokes]).isna().sum()

#

i want to know how many rows have non-NaN values in these columns. can anyone help?

serene scaffold
dense yarrow
serene scaffold
dense yarrow
#

got it!

serene scaffold
#

do you know what a string literal is?

dense yarrow
serene scaffold
# dense yarrow no 😦

it's where you have the actual string right there in the code.

foo = "cake"
print(foo)

"cake" is a string literal, and foo is a variable that refers to it.

#

and if you forget the last quote mark, Python will just assume that your code that comes after it is part of the string

#

(you forgot the last quote mark.)

#

see?

dense yarrow
#

ohhh

#

lemme fix it and try again

#

new error message: 'DataFrameGroupBy' object has no attribute 'isna'

serene scaffold
#

try doing isna before the groupby

#

because isna is done on individual elements, regardless of what row or column they're in

#

or what other data is in their rows/columns
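
For the original goal (how many rows have non-NaN values in these columns), no groupby may be needed at all. A sketch on a made-up stand-in for the data, using the three column names from the question:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for OKCupid_data with the three columns from the question.
data = pd.DataFrame(
    {
        "drugs": ["never", np.nan, "sometimes", np.nan],
        "drinks": ["socially", "often", np.nan, np.nan],
        "smokes": ["no", "no", "yes", np.nan],
    }
)
cols = ["drugs", "drinks", "smokes"]

missing_per_column = data[cols].isna().sum()    # missing values per column
present_per_column = data[cols].notna().sum()   # non-missing values per column
rows_complete = data[cols].notna().all(axis=1).sum()  # rows with all three present
print(missing_per_column, present_per_column, rows_complete, sep="\n")
```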

dense yarrow
#

thanks! idk if it's right but the code ran 🙏

patent lynx
#

Would you guys recommend splitting off another set of data, called the validation set? Assuming I have created a model from a train set and I want to fine-tune it

agile cobalt
# patent lynx Would you guys reccomend split another set of data called the validation set? As...

after you collect the data, before you fit the model, you should always split it into three sets - one for training, one for testing, and one for checking how well it will perform with completely unseen data (the last one only to be opened for evaluating once, after you have the model 100% decided, with no changes after seeing how it performs on it)
in some cases, you can get by with just training & test sets, especially if you do not plan to actually put that model in the real world (e.g. part of a competition or the model is just for proof of concept)
at least splitting into training&test sets is completely vital for any supervised models though
tl;dr yes

#

comparing the model's score on training & test sets can help you diagnose a bunch of problems, especially underfitting / overfitting, as well as give you a slightly better notion of how well it'll perform on unseen data
(furthermore ; if you make too many tweaks to perform better on the test set, it'll end up """fit""" to the test set to some extent, which is why you may want that third set)
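
A minimal sketch of such a three-way split using only NumPy. The fractions, the seed and the function name are arbitrary assumptions. The set used for tuning is called "test" in the messages above and is often called "validation" elsewhere; the code uses the latter naming and reserves "test" for the set that is opened once at the end:

```python
import numpy as np


def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return shuffled index arrays for the training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]                  # opened once, at the very end
    val_idx = idx[n_test : n_test + n_val]   # used while tuning
    train_idx = idx[n_test + n_val :]        # used for fitting
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays can then be used to slice the feature matrix and the labels consistently, e.g. `X[train_idx]` and `y[train_idx]`.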

dense yarrow
#

I feel like I knew how to do this but I can't figure it out,, i have to write a function to make new columns. here's the instructions: "if s is a Series, then s==<number> is another Series whose entries are True when entries of
s are that number and False otherwise"

#

here's my function : def sign_function(s) if s = True when s = 1

agile cobalt
arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

0    3
1    4
2    5
3    6
4    7
dtype: int64
0    False
1    False
2     True
3    False
4    False
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/tokuxerije.txt?noredirect

agile cobalt
#

did you mean that or you're asking how to assign that to a df?

dense yarrow
agile cobalt
#

adding a new column can be done the same way you would assign a value to a dictionary

#

(side note: when editing an existing column you may want to use .loc / .iloc to specify both which rows and which columns to edit)

dense yarrow
#

create 12 new columns. The new columns should contain a 1 for observations with the corresponding astrological sign and a 0 otherwise. this is what i have to do

agile cobalt
#

ah, so dummy variables?

#

look that term up - there are built in methods to do it

dense yarrow
#

i think this is supposed to be really simple, i just can't remember how to write this function

queen cradle
#

Also, depending on the next thing you're going to do with the data, you may be able to replace the assignment with the reduce method of a ufunc. Or you may be able to take out the assignment to x entirely.

queen cradle
dense yarrow
#

can anyone help me with this? I want to create multiple columns using a for-loop. this is the function I have (not sure if it's written correctly):

```
def sign_function(s):
  if s == 1:
    return True
  else:
    return False
```
#

here are the instructions: Loop through your dictionary of astrological signs (already have this). For each astrological sign, create a new column that checks whether sign_recode is the corresponding number. This new column will contain True and False entries, so vectorize int() over the Series to turn the boolean variables into 0s and 1s.

serene scaffold
#

@dense yarrow this is with pandas, right? You're adding columns to a DataFrame?

serene scaffold
# dense yarrow yeah!

When you're using pandas, assume that the solution will not involve a loop and will not involve iteration

dense yarrow
#

: /

serene scaffold
dense yarrow
serene scaffold
dense yarrow
#

hahaha

serene scaffold
#

I'm not joking though.

dense yarrow
#

we've been using for-loops for a lot of things with pandas

serene scaffold
#

What is the context for this course?

dense yarrow
#

why is it not okay? I'm a beginner i'd love to know

#

we're learning data cleaning and basic data visualization

#

for non-coders

serene scaffold
dense yarrow
#

ahh

serene scaffold
#

so vectorize int() over the Series to turn the boolean variables into 0s and 1s.
this isn't how you would do that. you would do astype(int)

#

your instructor is doing you a disservice.

#

can you print the astrological signs dict and the sign_recode columns, and show the text?

#

@dense yarrow

dense yarrow
#

i cannot copy paste the result bc the file is too large

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

I will only accept text, so please put the text in the paste bin as soon as you can.

dense yarrow
#

okay one sec

#

i think i did?

serene scaffold
#

you have to save it and give the link

dense yarrow
serene scaffold
#

@dense yarrow my instructions were "can you print the astrological signs dict and the sign_recode columns, and show the text?" -- please do a new paste bin

serene scaffold
dense yarrow
#

yeah!

#

12 columns

serene scaffold
# dense yarrow yeah!
0        6.0
1        7.0
2        NaN
3        3.0
4        2.0
        ... 
59941    NaN
59942    NaN
59943    NaN
59944    NaN
59945    NaN

idk what this is called, but let's assume it's df['sign_num']. do print(df['sign_num'] == 6) and look at what you get.

#

I guess it's df['sign_recode'], but idk what your df is called (df or what)

serene scaffold
dense yarrow
#

true when the value is 6 and false when it's not?

serene scaffold
#

because pandas is about doing operations to the whole data. so with == 6, you get the answer for every element

dense yarrow
#

ahh, yeah

serene scaffold
#

here's another thing you might need

#

!dict-iter

#

!iterdict

arctic wedgeBOT
#

There are two common ways to iterate over a dictionary in Python. To iterate over the keys:

for key in my_dict:
    print(key)

To iterate over both the keys and values:

for key, val in my_dict.items():
    print(key, val)
serene scaffold
#
for key, val in my_dict.items():
    print(key, val)

this part will come in useful, if you insist on looping.

#

also, adding a new column to a df is like assigning a value in a dict.
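
Putting those pieces together, a sketch of what the loop could look like. The dictionary contents and the tiny DataFrame are hypothetical stand-ins, since the real ones were not shown in full:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: the real dict and DataFrame were not shown in full.
signs = {"aries": 1, "taurus": 2, "gemini": 3}
df = pd.DataFrame({"sign_recode": [3.0, 1.0, np.nan, 2.0, 3.0]})

for name, number in signs.items():
    # The comparison is done for the whole column at once; NaN compares False.
    df[name] = (df["sign_recode"] == number).astype(int)

print(df)
```

The loop here runs over the twelve signs, not over the rows, so each iteration is still a whole-column operation. `pd.get_dummies` is the built-in route to the same kind of columns.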

#

anyway, see how far you can get with this. I need to take a shower irl

dense yarrow
#

how would i use this to make columns?

#

okay

serene scaffold
#

(I was walking home from the gym when this started. gains.)

serene scaffold
violet gull
#

why are my Dense outputs so large? It is causing issues when I try to put a large number through softmax and it tries e^1000, which returns NaN. My weights are initialized using a Gaussian distribution, my biases are initialized at 0, and my inputs range from -1 to 1

#

Dense output is just calculated by inputs * weights + biases
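
Independently of why the outputs are large, the overflow itself can usually be avoided by subtracting the maximum before exponentiating, which leaves the softmax result unchanged. A minimal NumPy sketch (the function is written from scratch here, not taken from the code in question):

```python
import numpy as np


def softmax(logits):
    """Softmax that subtracts the row maximum before exponentiating."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # largest entry becomes 0
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


probs = softmax([1000.0, 999.0, 0.0])
print(probs)
```

As for the size of the outputs: with unit-variance Gaussian weights, a sum over n inputs has a standard deviation that grows roughly like sqrt(n), so scaling the initial weights down (for example by 1/sqrt(n), as in Xavier- or He-style initialisation) may be worth trying. That is a guess at the cause, since the layer sizes were not given.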

slate hollow
#

could i have some clue on how to solve this?

wooden sail
slate hollow
#

hm i got to the same result but for a diff reason

#

sarsa looks @ the next action, which in e-greedy can sometimes take you into a red box

#

so it'll play it safe and go the long route

wooden sail
#

how would it know to "play it safe" though if it only has local knowledge

slate hollow
#

fair point

#

but

#

SARSA only looks at the current state

#

in the lecture i'm looking at, sarsa is updated like so:
Q(s, a) = (1- x)Q(s, a) + x(r + Q(s', a'))

#

(x is alpha)

wooden sail
#

well, 2 states. the current and where it would end up after taking an action

slate hollow
#

i mean it technically has map-wide knowledge through q though?

#

also uh q-learning is updated like this:
Q(s, a) = (1- x)Q(s, a) + x(r + max a' of Q(s', a'))

wooden sail
#

aren't the Q values the payoffs for each state action pair?

slate hollow
#

yeah

wooden sail
#

in the expressions you wrote, Q is evaluated at s, a. that means all you know is Q at one point, not the overall state

slate hollow
#

fair enough

#

going back to your argument, what does "expected value of the total reward" vs. "only looks at the current state" mean?

#

could you possibly elaborate on that a lil?

wooden sail
#

i wouldn't know what else to say

#

q-learning considers the overall reward, so the average of the sum of all the Q values along its path

wooden sail
#

we want the reward to be as big as possible, and in this example, all rewards are negative

#

so it picks a shorter path

slate hollow
#

looking at the 2 formulas i have, they both have access to essentially the same info

#

it's just the last a'

wooden sail
#

i'm trying to look for an alternative explanation

#

so the big difference is the presence of that "max" there

#

the best action is not necessarily the one taken by the agent

#

that's the big difference
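
A small sketch of the two update rules as written in the lecture formulas above (no discount factor, x is alpha). The Q-table, state names, and reward values are made up for illustration; only the two update functions follow the formulas from the chat:

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha):
    """SARSA: bootstrap from the action the agent actually takes next."""
    target = r + q[(s_next, a_next)]
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target


def q_learning_update(q, s, a, r, s_next, actions, alpha):
    """Q-learning: bootstrap from the best next action, taken or not."""
    target = r + max(q[(s_next, b)] for b in actions)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target


actions = ["safe", "risky"]
# Made-up Q-table: from state "B" the "risky" action is very costly.
base = {("A", "safe"): 0.0, ("A", "risky"): 0.0,
        ("B", "safe"): -1.0, ("B", "risky"): -100.0}

q_sarsa = dict(base)
q_qlearn = dict(base)

# Suppose the epsilon-greedy agent explored and picked "risky" in state B.
sarsa_update(q_sarsa, "A", "safe", -1.0, "B", "risky", alpha=0.5)
q_learning_update(q_qlearn, "A", "safe", -1.0, "B", actions, alpha=0.5)

print(q_sarsa[("A", "safe")])   # -50.5
print(q_qlearn[("A", "safe")])  # -1.0
```

With the same experience, SARSA's estimate for the state next to the costly action drops sharply because it uses the exploratory action that was actually taken, while Q-learning's estimate only reflects the best available action.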

rigid bronze
#

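Can anyone tell me what face_locations is used for and what output it gives?
And also, what arguments does cv2.rectangle take?

# face_locations returns a list of (top, right, bottom, left) tuples, one per detected face
faceLoc = face_recognition.face_locations(img1)[0]
# cv2.rectangle(img, pt1, pt2, color, thickness); the color and thickness here are arbitrary choices
cv2.rectangle(img1, (faceLoc[3], faceLoc[0]), (faceLoc[1], faceLoc[2]), (255, 0, 255), 2)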
#

(26, 187, 78, 135)

lapis sequoia
#
arr = df["change"].astype(str)
print(arr.dtype) # object
how do i convert it to `str`
violet gull
#

why are my Dense Outputs so large? It is causing issues when I try to put a large number through softmax and it tries e^1000 which returns NaN. My weights are initialized using a gaussian distribution, my biases are initialized at 0, and my inputs range from -1 to 1

mild dirge
#

Try a lower standard deviation @violet gull

mild dirge
#

Because the weights will then be closer to zero, so the magnitude of the output will be lower too

#

But with only 4 nodes it shouldn't be this high anyways if inputs are between -1 and 1

#

So maybe something is wrong in the calculation

#

The softmax output is wrong too, since softmax should sum up to 1

mild dirge
#

For the second one they are all E-24 or less

violet gull
wooden sail
#

there's a 1 in there 😛

mild dirge
#

Oh right

#

sneaky 1

violet gull
#

so we dont know whats wrong with it

mild dirge
#

If you have inputs between -1* and 1, and weights gaussian with std 1.0, then in worst case scenario all weights are like 4, and all inputs are 1, so then you have an output of 16

violet gull
#

the weights arent -1 to 1

mild dirge
#

-1, but same idea

violet gull
#

they arent that either

mild dirge
#

Oh sht, meant inputs

violet gull
#

ye

mild dirge
#

But basically that, it shouldn't really be possible to get those kinda outputs with that initialization and scaling of inputs

violet gull
#

so what do i change

mild dirge
#

Did you make the dense layer yourself?

violet gull
mild dirge
#

Probably miscalculated something I suppose?

violet gull
#

i dont think so

mild dirge
#

You just do a dot product like weights @ inputs?

violet gull
#

i just want this thing to work

mild dirge
#

Well that is literally just the whole forward pass, a dot product between the weights and the input vector

#

And then the activation

#

Which is softmax, but that seems to work fine in your case

violet gull
#

yes

#

so what the problem?

mild dirge
#

Well like I said, if your inputs are between -1 and 1, and the weights are sampled from a gaussian with mean 0, std 1.0, it's basically impossible to get those outputs

#

So either the inputs are not scaled correctly, the weights are not initialized correctly, or the forward pass isn't just a dot product

#

Check all of those

violet gull
#

i checked everything the best i could

#

i can send code though

mild dirge
#

I have to go in a bit, have class in half an hour

#

But maybe someone else could check

violet gull
#

i have test cases implemented for most of these and the tests all pass

violet gull
mild dirge
#

Nah, there are plenty of knowledgeable people here.

#

It may take a bit for them to wake up though, it's 8 am in EU

violet gull
#

i've genuinely been asking this same question for days

#

you're the first person who has responded

mild dirge
#

When I get home I could help if you haven't fixed it yet

violet gull
#

ok i appreciate it ❤️

mild dirge
#

But that will be in a few hours, I really have to go

violet gull
#

ok ping me whenever ur available

orchid carbon
#

someone know easy way to put my model into mobile without using web

#

tensorflow's

wooden sail
#

how about tflite?

wooden sail
wooden sail
#

aight. well yeah, there's no reason in general why the parameters would have to be small 😛

wooden sail
#

nothing inherent to the network or the optimization problem will prevent the parameters from becoming arbitrarily large

#

you either add extra constraints to the problem or try to circumvent the issue somehow

#

a nice trick is that you can subtract an arbitrary constant from the softmax argument

#

it should be the case that softmax(x) = softmax(x + c) for arbitrary c (x is a vector here and c is a scalar)

#

so you could subtract a c such that all the x values are negative

#

then you won't get float overflows, but instead some of the entries may be softmaxed down to 0

#

does that make sense?

violet gull
wooden sail
#

no

violet gull
#

ok

#

and what would an example of the constant be

#

like 100?

wooden sail
#

max(x)?

#

then all the numbers are 0 or negative

#

and all the exponentials are <= 1

violet gull
#

wat

wooden sail
#

hmm?

#

lemme make a minimum working example

#

!e

import numpy as np

def vanilla_softmax(x):
    return np.exp(x), np.exp(x)/np.sum(np.exp(x))

def shifted_softmax(x):
    c = np.max(x)
    return np.exp(x-c), np.exp(x-c)/np.sum(np.exp(x-c))

x = np.array([10,60,100,4])

e, s = vanilla_softmax(x)
print(f"numerator: {e}")
print(f"softmax output: {s}")

e, s = shifted_softmax(x)
print(f"numerator: {e}")
print(f"softmax output: {s}")
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | numerator: [2.20264658e+04 1.14200739e+26 2.68811714e+43 5.45981500e+01]
002 | softmax output: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
003 | numerator: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
004 | softmax output: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
violet gull
#

oh

wooden sail
#

here you go @violet gull . see, the output of the softmax is the same, but this computation only involves numbers that are each <= 1

violet gull
#

and this will allow me to put in any number of inputs into the dense?

wooden sail
#

that's because the + c can be factored out and cancels out
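
Written out for a single component, with a scalar c subtracted from every entry (the same identity as the "+ c" version above, with the sign flipped), the cancellation looks like this:

```latex
\[
\operatorname{softmax}(x - c)_i
  = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}
  = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}}
  = \frac{e^{x_i}}{\sum_j e^{x_j}}
  = \operatorname{softmax}(x)_i
\]
```

With c = max(x), every exponent is at most 0, so every exponential is at most 1, and the denominator is at least 1 because the largest entry contributes exactly e^0 = 1.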

wooden sail
violet gull
#

wat

wooden sail
#

this will just prevent the softmax from overflowing when the values are large

wooden sail
#

what does the number of inputs have to do with that?

violet gull
#

the big numbers only exist when the number of inputs is large

wooden sail
#

that's not in general true

violet gull
#

my results says it is

wooden sail
#

all you have is a small number of experiments

#

the only valid result here is computing upper and lower bounds for the cost function and its gradient

#

have you computed the lipschitz constant of the gradient of the cost?

violet gull
#

no

wooden sail
#

or done any other sort of analysis on the function

violet gull
#

no

wooden sail
#

ok

#

it is often the case that magnitudes of gradients and distances are larger in higher dimensions, but this does not necessarily translate into higher weights in the dense layer

#

that depends also on the properties of the cost function

violet gull
#
Softmax output: [0.0, 2.145833396191962E-129, 1.0, 0.0, 6.681095513076006E-290]
Loss: NaN
#

this is with 1 million inputs ^

#

1 iteration

wooden sail
#

mhm, so?

violet gull
#
Softmax output: [2.0499549890080238E-47, 0.9999999999966144, 1.2861755034666737E-52, 1.2367781133861323E-14, 3.3732857498361036E-12]
Loss: 3.385625113600146E-12
#

this is 10000 inputs

#

the dense outputs in the large input are massive but the dense outputs in the smaller input are more reasonable

wooden sail
#

sure, in this particular case

violet gull
#

thats the problem

wooden sail
#

still it's no proof or guarantee 😛 i also can't say whether it will work for arbitrarily large inputs because that may bring other problems

#

all i can tell you is that specific evaluations of softmax won't overflow

violet gull
#

im not understanding

wooden sail
#

you can either test and see if it works for you now, or do the math and show whether it works in general now

#

numerical results are not a proof

violet gull
#

it shows a trend

wooden sail
#

but not a proof

violet gull
#

that hasnt been wrong once

#

it is a fact that as I put in larger and larger amount of inputs, the dense outputs become larger

#

leading to e^really big number error

wooden sail
wooden sail
violet gull
#

with the softmax function yes

iron basalt
#

I have not really read this whole conversation, but consider changing your weight initialization scheme.

violet gull
#

but that just seems like a bandaid on a larger issue

#

multiple people have said the outputs im getting should be impossible

wooden sail
#

what are we calling outputs here? the output of the dense layer?

violet gull
#

yes

iron basalt
violet gull
#

normal distribution

iron basalt
#

Does your initialization take into account the number of inputs / outputs?

iron basalt
warm jungle
iron basalt
#

Choosing exactly which scheme will require analysis like Edd mentioned, but you can try something not that great like simply scaling by number of inputs to start.
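
A rough sketch of why the number of inputs matters, under assumptions that are not confirmed in the chat: inputs are independent and uniform in [-1, 1], weights are independent Gaussians with mean 0, and the bias is 0. Under those assumptions each product w*x has variance weight_std**2 / 3, so one dense output has a standard deviation of about weight_std * sqrt(n / 3). Dividing the weight standard deviation by sqrt(n) is one common scaling choice, used here only as an example and not as a recommendation for this particular network:

```python
import math
import random
import statistics


def dense_output_std(n_inputs, weight_std, trials=300, seed=0):
    """Estimate the spread of one dense output, sum(w_i * x_i), with zero bias."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n_inputs):
            x = rng.uniform(-1.0, 1.0)       # assumed input range [-1, 1]
            w = rng.gauss(0.0, weight_std)   # Gaussian weight, mean 0
            total += w * x
        outputs.append(total)
    return statistics.pstdev(outputs)


n = 1000
unscaled = dense_output_std(n, weight_std=1.0)
scaled = dense_output_std(n, weight_std=1.0 / math.sqrt(n))

print(f"std 1.0 weights:       {unscaled:.2f}  (theory {math.sqrt(n / 3):.2f})")
print(f"std 1/sqrt(n) weights: {scaled:.3f}  (theory {math.sqrt(1 / 3):.3f})")
```

Under the same assumptions, a million inputs would give sqrt(1e6 / 3), roughly 577, as the typical output magnitude with unit-variance weights, which is in the range where a naive exp() overflows.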

lapis sequoia
#

why does rolling().apply() get me float altho the column is str

warm jungle
fickle rock
clever owl
#

I have the following series'

s = pd.Series([0,1,'random',2,3,4])

s2 = pd.Series([5,6,7,8,9,10])  

How can I use s.mask to return a series where every even number in s is replaced by s2, and elements in s that can't get evaluated per the condition get ignored (e.g. 'random')?

I tried this, which gave a ValueError: Array conditional must be same shape as self

def is_even_if_is_number(x):
    if isinstance(x, int):
        return x % 2 == 0
    return False        

s.mask(lambda x: is_even_if_is_number(x), s2)

I want an output of this

0         5
1         1
2    random
3         8
4         3
5        10
mint palm
#

can we use a combination of fixed and flexible backbones? for a model where i am working with 3 modalities(text, video, audio)?

#

i was thinking of making audio fixed

boreal gale
# clever owl I have the following series' ```py s = pd.Series([0,1,'random',2,3,4]) s2 = pd....

i also didn't know the answer to this originally.

error ValueError: Array conditional must be same shape as self is an interesting one.

my approach to this is to use pdb to inspect what's going wrong.

final bit of the stacktrace is

File ~/.virtualenvs/poly/lib/python3.9/site-packages/pandas/core/generic.py:9052, in NDFrame._where(self, cond, other, inplace, axis, level, errors)
   9050         cond = np.asanyarray(cond)
   9051     if cond.shape != self.shape:
-> 9052         raise ValueError("Array conditional must be same shape as self")
   9053     cond = self._constructor(cond, **self._construct_axes_dict())
   9055 # make sure we are boolean

ValueError: Array conditional must be same shape as self

upon entering pdb and checking what cond is, we see cond is just array(-1) which doesn't seem right, this plus the error message points to the first argument being wrong.

and reading more into the docs of mask, i see you are supposed to pass in a conditional series with the same length (duh once you re-read the error message)

so, the correct thing to write is probably s.mask(s.map(is_even_if_is_number), s2).
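
For reference, a self-contained version of that suggestion using the series and helper from the question (the names cond and result are added here for readability):

```python
import pandas as pd

s = pd.Series([0, 1, 'random', 2, 3, 4])
s2 = pd.Series([5, 6, 7, 8, 9, 10])


def is_even_if_is_number(x):
    # Non-integers such as 'random' are treated as "not even" and left alone.
    if isinstance(x, int):
        return x % 2 == 0
    return False


# s.map applies the function element by element and returns a boolean
# Series with the same shape as s, which is what mask expects.
cond = s.map(is_even_if_is_number)
result = s.mask(cond, s2)
print(result)
```

When mask is given a callable instead, the callable is called once with the whole Series, not once per element, which is why the original lambda produced a condition of the wrong shape.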

sharp frost
#

can someone help me pls

patent lynx
#

Just ask the question

clever owl
boreal gale
clever summit
#

Hello. I need help.

#

So here's an excerpt of my code:
for c in range(len(centroidIdX)):
    # record the previous centroid positions
    centroidRecordX=centroidIdX.copy()
    centroidRecordy=centroidIdY.copy()
    x,y,w,h=cv2.boundingRect(cnt)
    cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
    # draw the centroid
    cx=int((x+x+w)/2)
    cy=int((y+y+h)/2)
    centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c])
    if centroidDistance<20:
        centroidIdX[c-1]=cx
        centroidIdY[c-1]=cy
    else:
        centroidIdX.append(cx)
        centroidIdY.append(cy)

However, it returns IndexError:

~\AppData\Local\Temp\ipykernel_13912\3650250176.py in <module>
     79                     cx=int((x+x+w)/2)
     80                     cy=int((y+y+h)/2)
---> 81                     centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c])
     82                     if centroidDistance<20:
     83                         centroidIdX[c-1]=cx

IndexError: list index out of range


What do you think? Does variable 'c' try to access an index position that does not exist at all? Which variable then?
patent lynx
patent lynx
clever summit
#

Yeah i know. But i was wondering how those 2 variable pairs have different length, that's all.

patent lynx
#

Oh right my bad you used copy, yeah one of the y's should be uppercase (centroidRecordy vs. centroidRecordY)

clever summit
#

Another question. Here's the code now:

                    # record the previous centroid positions
                    centroidRecordX=centroidIdX.copy()
                    centroidRecordY=centroidIdY.copy()
                    x,y,w,h=cv2.boundingRect(cnt)
                    cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
                    # draw the centroid
                    cx=int((x+x+w)/2)
                    cy=int((y+y+h)/2)
                    cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
                    centroidDistanceX=centroidRecordX[c]-centroidIdX[c]
                    centroidDistanceY=centroidRecordY[c]-centroidIdY[c]
                    #centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c])
                    if centroidDistanceX<20 and centroidDistanceY<20:
                        centroidIdX[c-1]=cx
                        centroidIdY[c-1]=cy
                    else:
                        centroidIdX.append(cx)
                        centroidIdY.append(cy)
                    car=len(centroidIdX)

From this code, i'm expecting centroidDistanceX to finally show a non-zero value, but instead:
Frame no. 16
CentroidIdX:  [256]
CentroidRecordX:  [160]
CentroidDistanceX:  0

What's happening here?
patent lynx
#

Isnt centroidIdX[c] and centroidRecordX[c] the same

clever summit
#

No.

patent lynx
#

Because you make centroidRecord a copy of centroidId, centroidRecordX[c]-centroidIdX[c] in the loop equals zero. I feel there is missing code here
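
A tiny sketch of that point, using the two numbers from the printed output above. The list names mirror the poster's code, but the order of operations and the distance variable names are an assumption about what was intended:

```python
# Values taken from the printed output in the chat; the new detection
# being stored at index 0 is an assumption about the intended flow.
centroidIdX = [160]

# Copy BEFORE updating, so the record really holds the previous frame.
centroidRecordX = centroidIdX.copy()
distance_before_update = centroidRecordX[0] - centroidIdX[0]

# Store the new detection, then measure how far the centroid moved.
centroidIdX[0] = 256
distance_after_update = abs(centroidRecordX[0] - centroidIdX[0])

print(distance_before_update)  # 0
print(distance_after_update)   # 96
```

Measured right after the copy, the two lists are identical and the difference is always 0. The distance only becomes meaningful once the new position has been written into centroidIdX.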

clever summit
#

Yeah, but then...

#

...I should have assigned new values for centroidIdX/Y...

#

Oh right.

#

I'm missing code.

meager ocean
hidden mist
meager ocean
#

haha its a ChatGPT project

#

should have mentioned that

#

its essentially supposed to be like a machine learning face recognition thing

#

@hidden mist

hidden mist
#

How would you define your Python knowledge?

meager ocean
#

bro spent that long just to say that

#

idk beginner

hidden mist
#

Yeah, start with something a little lower reaching.

meager ocean
#

ik its a ChatGPT thing

#

u gonna say anything?

#

i can see ur typing 😭

hidden mist
#

Don't be rude, I'm under no obligation to assist you.

meager ocean
#

ik

lapis sequoia
#

would be good

meager ocean
#

but u typing and it takes 4 ever

lapis sequoia
#

im a beginner and I got recommended chatbot by many ppl

meager ocean
#

same

hidden mist
#

I'm trying to think of a polite way to say that what you're doing and the things you're asking require an explanation in a depth that I don't think you'll understand-- so essentially it just boils down to 'I want someone to write this for me.'

meager ocean
#

haha yes i understand all im saying is that you were constantly typing for 4 mins to say like 7 words

hidden mist
#

I know this is a wild concept, but sometimes I consider the implications of what I'm typing before I press the enter key.

meager ocean
#

idk what that means lol

hidden mist
#

Start with the basics, hello world and the like.

#

Build up to implementing machine learning and modeling in TensorFlow.

meager ocean
#

done that

#

u not listening

#

this not a python thing

hidden mist
#

Ah, well good luck then, hopefully someone can help you out! 🙂

meager ocean
#

this a ChatGPT thing

lapis sequoia
#

Hey guys
Im a beginner in machine learning so my question might be pretty dumb
I tried searching the solution online but i couldnt find anything useful

model = Sequential()
model.add(Dense(10, input_dim = 7, kernel_initializer = 'normal', activation = 'relu'))
model.add(Dense(6, kernel_initializer = 'normal', activation = 'relu'))
model.add(Dense(1, kernel_initializer = 'normal'))

model.compile(loss = 'mean_squared_error', optimizer = 'adam')

model_history = model.fit(X_train, y_train, batch_size = 7, epochs = 100)
model.summary()
I wrote this code and I got this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
How do I solve this issue?

meager ocean
#

idk lol

#

ask the old guy

lapis sequoia
lapis sequoia
meager ocean
#

not u

#

the import pandas guy

lapis sequoia
lapis sequoia
meager ocean
#

idk

lapis sequoia
#

bruh

lapis sequoia
#

ok i saw

#

but he's offline

meager ocean
#

some guy who spends like 1 min for 1 work

#

*word

lapis sequoia
#

to the person you are referring to

#

panda guy

lapis sequoia
lapis sequoia
#

bruh marlie can't even write a word in 1 minute 🤣

#

but can you help me solve my question first

#

he wrote for so long and sent nothin

meager ocean
#

like discord ppl

#

i broke my best mates arm on purpose and he dident care lol

hidden mist
#

<@&831776746206265384>

lapis sequoia
# meager ocean like discord ppl

this is an educational server and not a random server. I believe it is proper etiquette to behave appropriately for the server you're in. E.g., in a goofy ahh server be goofy ahh, and in an educational server just be respectful to others

meager ocean
#

yh

lapis sequoia
carmine solstice
#

!mute 1010275447189287003

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @meager ocean until <t:1676046609:f> (1 hour).

devout sail
#

!mute 1010275447189287003 incident_investigating

arctic wedgeBOT
#

:x: According to my records, this user already has a mute infraction. See infraction #85703.

devout sail
#

Thanks cheeki 🙂

lapis sequoia
#

p e a c e

sleek herald
#

Excuse me, it's quite vague - I know nothing about programming and want to learn python to become a data analyst - how do I start? I downloaded anaconda and got jupyter notebook, heard it's a good place to start

lapis sequoia
#

they explain everything clearly and in a fun way

sleek herald
lapis sequoia
lapis sequoia
lapis sequoia
sleek herald
lapis sequoia
#

not only for data analysis

hidden mist
#

Wes McKinney is the original author of pandas, and he explains things in a level that is excellent for beginners broaching the subject.

sleek herald
#

Like I don't even know if I should use anaconda or pycharm, I'm at this level of knowledge

hidden mist
#

He gives you a precursory introduction to Python in Chapter 2, as well as making some recommendations himself on other resources.

#

You can audit Harvard's Data Analytics classes for free as well.

sleek herald
#

Is there a difference between the open edition and the physical copy? I'm from Poland so it's about 4times as expensive as it is for an average american

hidden mist
#

Use the open edition.

#

It's the same thing.

sleek herald
#

Thank you

hidden mist
#

I have the physical book, it's almost a direct copy.

sleek herald
#

What a good guy, so he's offering it for free but also allows other people to support him by buying the book?

hidden mist
#

This is a copy paste from a slack post but if you google any of these you're likely to stumble upon the free edition

#

SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (This is free linked above.)
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

sleek herald
#

are these all books written by him?

hidden mist
#

They're all O'Reilly books, but no they're not all by Wes lol

sleek herald
#

Also last question please:
There are quite a lot of things to learn for data analysis
Python
sql
excel, kind of
probability and statistics
power bi/tableau

hidden mist
#

Would be pretty weird if Wes McKinney was writing R books, but I have no doubts he's familiar with the language.

sleek herald
#

it gets quite convoluted and I'm not sure where to start, does it matter which one I start learning first? Is it fine to start with python first?

hidden mist
#

We introduce sample spaces and the naive definition of probability (we'll get to the non-naive definition later). To apply the naive definition, we need to be able to count. So we introduce the multiplication rule, binomial coefficients, and the sampling table (for sampling with/without replacement when order does/doesn't matter).

sleek herald
#

my probability calculus sucks though

hidden mist
#

If you feel you have a strong understanding of statistics, I'd start with learning the more advanced actions in Excel (PivotTables, Macros, etc.) So you always have a 'final destination' for data where, regardless of transformations in Python/R, you can manipulate it into something workable.

#

Then move onto either R or Python to learn how to actually manipulate your data in a programmatic sense.

sleek herald
#

There's a lot, but I kind of get the picture

hidden mist
#

Once you've nailed that, start looking into some of the more advanced concepts of PowerPoint, Tableau, Looker, etc. So you can actually turn your data insights into something valuable for stakeholders.

#

Kaggle is especially useful once you get some of the base concepts under your belt, they have competitions both currently and in the past that give you real problems to solve.

sleek herald
#

I heard a lot about kaggle, will definitely try that later down the line

#

But I'm really a baby bird at this point

hidden mist
#

Well I gave you some fantastic starting resources, dive in 🙂

sleek herald
#

one more thing

#

Wes McKinney's book, is it good for someone who has 0, I mean 0, knowledge about python, or do I need to learn some basics elsewhere first?

hidden mist
#

If you get to a point where the concepts are not making sense, he makes some recommendations in the book on building up that knowledge.

#

So... yes.

#

Even if the book itself doesn't address something, he's going to point you somewhere else that does.

sleek herald
#

Perfect, thank you so much

hidden mist
#

Feel free to poke me down the line if you find yourself lacking resources or a real direction to travel in. I gave you a pretty comprehensive roadmap on how I'd approach it, but it's not foolproof.

lapis sequoia
#

Good afternoon, I just saw this channel, which is very applicable to my current situation. I'm here with a 4090 and would like to utilize the true power of my GPU. Right now it seems that programs are more or less depending on my CPU, which is also strong but not as strong as my GPU of course. I often have about 15 seconds of 100% CPU usage and then 5 seconds of 100% GPU usage. Is there any way to bring it all over to my GPU? I am currently using PyTorch.

If TensorFlow is better please also let me know as I'm still a beginner and want to learn more about what modules are good etc, i am personally coming from javascript but that's more or less a dead end with deep learning which is why I'm now learning python.

lapis sequoia
hidden mist
#

If you've got CUDA installed and properly configured, PyTorch and TensorFlow will both take advantage of your GPU to (generally) the best of their abilities, provided the functions you're calling are supported. I was under the impression that PyTorch wouldn't let you allocate a tensor and operate on it with CUDA if it was allocated to the CPU, but it seems that's incorrect, so perhaps you'll get a bit more information out of the documentation than I'm able to provide.
https://pytorch.org/docs/stable/notes/cuda.html

lapis sequoia
#

I have in fact properly installed it and it has confirmed with me that it is by using torch.cuda.is_available() which returns true

#

I first had some issues but I had found out cuda 11.8 was a requirement

hidden mist
#

That doesn't strike me as remarkably surprising for native behavior, and frankly I think whatever you ended up with falls in line with PyTorch's best practices. I'm not an expert by any means, but you're supposed to be device agnostic, so you should be addressing scenarios where a CUDA-device may not be available.

#

To resolve that to something that's a little more digestible, explicit optimizations need to be made regarding synchronization and the stream output from the CUDA device in general to ensure that your models are accurate when they're passed between environments. I wouldn't think that those optimizations are incredibly worthwhile for hobby applications-- there's a good chance that you're utilizing your GPU's CUDA cores for the demanding processes regardless by virtue of the way PyTorch is written.

lapis sequoia
#

Oh okay so I guess my CPU will just be utilized at its fullest as well

#

Thank you

#

I'm going to be doing some more testing it's a lot of fun at least it is right now but my friends have told me the further you go the more painful it gets as more small issues occur which will be very hard to debug.

hidden mist
#

You can try asking again in a little bit, my knowledge on PyTorch/TF is tenuous at best, and attempting to explain something I don't have a comprehensive knowledge of is... difficult. lol

#

That isn't to say I think I misled you, there just might be a different or easier to implement solution that I'm unaware of.

lapis sequoia
#

Ah yes no worries thanks a lot for your time and I will look further in the pytorch docs as they should say about anything I could need. My code right now is functioning, and it runs so I got the time to do other things while it's training and hopefully along the way I find a way to fix it but this may very well be the expected thing to happen and I just don't know, especially because my GPU is so powerful my CPU may not be able to catch up this quick. Will look into it and possibly upgrade where needed I guess as this is probably going to be a new chapter for me as developer.

#

I saw the resources you and some others sent above and I will also be taking those in use as I really am very interested in this and it seems to be the future.

#

It's kind of like this with GPU dropping to 10%~ every few seconds and CPU to 100% every few seconds

#

If someone knows if this is expected please do let me know

clever summit
#

Hello. I need some help.

#

So i have been trying to construct a traffic-counting program using openCV object tracking method. Well, i'm still a beginner so don't expect this code to use some advanced functions like math.hypot().

The problem I encountered is this: I successfully implemented object detection using contours, and I've pretty much got the hang of it. But the object tracking code I've been working on still has this general issue: the vehicle counter rapidly increases up to a damn 500 every second whenever my program detects more than one vehicle. Here's the code:

# on the first frame
if frCount==1:
    # draw the bounding box
    x,y,w,h=cv2.boundingRect(cnt)
    cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
    # draw the centroid
    cx=int((x+x+w)/2)
    cy=int((y+y+h)/2)
    cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
    # add the centroid id and its location
    centroidIdX.append(cx)
    centroidIdY.append(cy)
else:
    for c in range(len(centroidIdX)):
        # record the previous centroid positions
        centroidRecordX = centroidIdX.copy()
        centroidRecordY = centroidIdY.copy()
        x,y,w,h=cv2.boundingRect(cnt)
        cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
        # draw the centroid
        cx=int((x+x+w)/2)
        cy=int((y+y+h)/2)
        cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
        # new centroidId position
        centroidIdX[c]=cx
        centroidIdY[c]=cy
        centroidDistanceX = centroidRecordX[c] - centroidIdX[c]
        centroidDistanceY = centroidRecordY[c] - centroidIdY[c]
        if centroidDistanceX<60 and centroidDistanceY<60:
            centroidIdX[c-1]=cx
            centroidIdY[c-1]=cy
        else:
            centroidIdX.append(cx)
            centroidIdY.append(cy)
car=len(centroidIdX)

I've been watching many YouTube tutorials about object tracking fundamentals but none seemed to work. Maybe any of you can help me with this?
arctic wedgeBOT
#

Hey @simple fossil!

It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

hidden mist
# clever summit So i have been trying to construct a traffic-counting program using openCV objec...

Not familiar with this in general, so going to speak in the abstract, someone might have better intuition on how to implement this. But here goes:

What's happening is, since your centroids are determined by your bounding boxes and you're drawing multiple bounding boxes, if those centroids are outside of the range (centroidDistanceX<60, etc.) they're incrementing your counter. Since every instance of a centroid being outside that range will result in incrementing the counter, the centroids for subsequent vehicles also fall into that category.

#

Without more distinguishing information such as... looking at relative color/shapes observed between points, your solutions could be to rate limit them based on how far you think vehicles might travel in that time. It's not foolproof, but cars will never move more than once per frame, so there's no reason to do that calculation more than once per frame.

#

And you never need to calculate the distance between more centroids than were drawn in the previous frame. (That is, if I have two vehicles in one frame, and three in the subsequent frames, I can rate limit it by only ever counting the first two instances, which will now have a distance associated with each of them.)

#

If you're simply interested in parameterizing it as "I want to count vehicles." then job done. If you want to count moving vehicles and be a little more precise, you're looking at something like... looking at the first frame, looking at the second frame, looking at the third frame, and assigning a line between those three objects that indicates the path you anticipate that center point to be moving. If a centerpoint continues to move along that line within your bounding box, you can safely disregard any subsequent occurrences of that centerpoint until it's traveled off screen.

#

The longer you wait to increment the counter, the higher your confidence can be that you only counted one car-- if your line is of irregular shape as it would be if you were counting two opposing lanes of traffic (it would zig-zag) you can safely disregard that line, and your confidence increases that you've drawn the trajectory of the vehicle itself as you add more points to the line (two points could indicate that you drew a line from one lane of traffic or oncoming traffic to another and would be perpendicular or at least not parallel with the line of actual travel.)

#

Object tracking, as you've discovered, is very simple. Unique object tracking is incredibly difficult.

#

Another thing to consider is that a car will likely never go backward in a frame, if it's on a trajectory, it's probably going to remain on that trajectory. You can leverage this to your advantage when it comes to vehicles with similar or identical centerpoints traveling along the same axis.
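
A rough sketch of the rate-limiting idea above, in plain Python: match each centroid in the current frame to the nearest unclaimed centroid from the previous frame, and only count it as a new vehicle when nothing from the previous frame is within a plausible per-frame travel distance. The `max_travel` threshold, the function name `count_new_vehicles` and the sample frames are assumptions for illustration, not part of the original program.

```python
import math

def count_new_vehicles(prev_centroids, curr_centroids, max_travel=60.0):
    """Return how many current centroids have no plausible match in the previous frame."""
    unmatched_prev = list(prev_centroids)
    new_count = 0
    for cx, cy in curr_centroids:
        # Find the closest previous centroid that has not been claimed yet.
        best = None
        best_dist = max_travel
        for p in unmatched_prev:
            d = math.hypot(cx - p[0], cy - p[1])
            if d <= best_dist:
                best, best_dist = p, d
        if best is None:
            new_count += 1               # nothing nearby last frame: treat as a new vehicle
        else:
            unmatched_prev.remove(best)  # each previous centroid is matched at most once
    return new_count

# Made-up centroids for three consecutive frames.
frames = [
    [(100, 50)],               # one vehicle appears
    [(110, 52)],               # same vehicle, moved slightly
    [(120, 54), (300, 200)],   # a second vehicle appears
]
total = 0
previous = []
for centroids in frames:
    total += count_new_vehicles(previous, centroids)
    previous = centroids
print(total)  # 2
```

This greedy matching is only a starting point; it will still miscount when two vehicles pass close to each other, which is where the trajectory idea above comes in.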

lapis sequoia
#

Hey, I am new to the data science field. I am confused about where I should practice Python questions for data science.
Can anyone help me?
Also, if you have any PDF of Python coding examples, please share.

nocturne eagle
#

!resources

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

drifting ridge
#

Hi everyone, I want to ask how to apply an LSTM for predicting stock trends, but I'm still confused about how. Thank you very much.

arctic fog
#

Hi everyone, any Vertex AI pros here?

serene scaffold
lapis sequoia
hasty mountain
#

Guys, I'm trying to make a Reward Model to assign rewards to a Reinforcement Learning network. I think this reward model might be a good way to provide continuous rewards instead of dense rewards.

Question is... if each reward follows a mathematical logic (a math function), then my model will be able to deduce this function and output it without directly calculating it, right?
However...if my reward is something more subjective/relative(in situation A, such action could provide a reward +1. In situation B, the same action provides reward -3), then how could my reward model be affected?

#

I know that OpenAI even used a reward model that classifies actions from worst to best in order to provide a reward and mitigate subjectivity, but I don't think I'll be able to use this here...
Unless...maybe, if I used classes like "awful, bad, neutral, good, excellent" and its indices as rewards for each situation...

wanton stone
#

Anyone online rn? Need some help

serene scaffold
#

(I'm in class though.)

wanton stone
#

Gotcha xD

#

So I got csv file which has 8columns
I want to calculate speed for it
Ik it's distance/time
I got the value of time it's 10

Now I want to calculate distance
How do I go upon doing that?
I got frame values
Position values
Euler angle values which all comprise of 8 columns

#

Any help would be helpful!

serene scaffold
#

Show the csv as text

wanton stone
#

One sec

arctic bridge
#

Has anyone here ever worked with point clouds and ICP algorithm ?

wanton stone
#

Rx Ry Rz are the euler angles
Tx Ty Tz are the position values

#

The frame is 10milliseconds apart

wanton stone
#

@serene scaffold 😅

serene scaffold
#

Same with "does anyone know about x"

hidden mist
#

This looks more like a math question than a programming question lol.

wanton stone
#

This is the data thats given

serene scaffold
#

It's still a calculus question, from what I can tell. Which is fine

wanton stone
#

Ya just want to implement a code which let's me calculate the distance

#

Cause I have the time value =10

#

After getting the distance value I'll just divide it by 10

#

But getting that distance values for x y z need some help in how to write it

hidden mist
#

Can you give a sample of the csv?

wanton stone
#

It's like this

hidden mist
#

I... should've seen that coming. Hold on I'll just make it myself with dummy values lol

wanton stone
#

XD

serene scaffold
#

@wanton stone please always do text as much as you can

wanton stone
#

Ya mb on that

hidden mist
#

Actually trying to implement this without a comprehensive understanding of the math behind it is proving to be a little more difficult than I anticipated.
You want to convert that into a pandas data frame.

import pandas as pd
isuckatnames = pd.read_csv('locationofthe.csv', skiprows=lambda x: x in [0,5])
# We stripped the column names off, let's add them back.
isuckatnames.columns = ['frame','subframe','rx','ry','rz','tx','ty','tz']
# We can probably index by frame now. 
# set_index returns a new frame, so assign the result back.
isuckatnames = isuckatnames.set_index('frame')
print(isuckatnames.head(10))
#

From that point you can act on the data as you see fit, I can help you if you can write out what you want to do math wise to the entries themselves.

wanton stone
#

Tryin to do without pandas

#

Pandas would be helpful
But tryin to do it with numpy
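
For the NumPy-only route, a minimal sketch of the distance/speed step. It assumes the column order `frame, subframe, rx, ry, rz, tx, ty, tz` used in the pandas snippet above and the 10 ms frame spacing mentioned in the thread; the sample rows are made up.

```python
import numpy as np

# Made-up rows in the order frame, subframe, rx, ry, rz, tx, ty, tz.
data = np.array([
    [1, 0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [2, 0, 0.0, 0.0, 0.0, 3.0, 4.0, 0.0],
    [3, 0, 0.0, 0.0, 0.0, 3.0, 4.0, 12.0],
])
dt = 0.010  # seconds between frames (10 ms, per the thread)

positions = data[:, 5:8]                    # tx, ty, tz
steps = np.diff(positions, axis=0)          # displacement between consecutive frames
distances = np.linalg.norm(steps, axis=1)   # straight-line distance per step
speeds = distances / dt                     # position units per second

print(distances)
print(speeds)
```

For the real file, something like `np.loadtxt(path, delimiter=",", skiprows=...)` could replace the hand-written array; how many header rows to skip depends on the CSV.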

hidden mist
#

That's a strangely arbitrary restriction. Can you give me a hastebin of the csv file?

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

lapis sequoia
#

Is it possible to make a Python chat bot that can gain feelings?

hidden mist
lapis sequoia
#

😄

#

🫶

drifting spear
#

Hello, I'm brand new to things like Python, so I'm sorry for what might seem like a dumb question. I need to make a KNN run in real time with a microphone. Is this possible?

arctic wedgeBOT
#

Hey @deep spire!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

wanton stone
clever summit
restive schooner
#

Hello folks I have a question for those that are AI, data scientist, machine learning engineers

serene scaffold
restive schooner
#

What type of math is needed for AI, data science, machine learning? From what I know stats is essential

serene scaffold
restive schooner
#

@serene scaffold do you have a degree in mathematics?

serene scaffold
#

(it being computer science)

restive schooner
#

ahh ok
The reason I asked is that I'm not sure whether to pursue a master's in CS or in Mathematics/Stats/Applied Mathematics @serene scaffold

bold timber
#

Hello guys, I have a question: why do I get a different result when I compile the model with the metrics parameter than when I compile it without?

hidden mist
#

There's a fair bit of overlap. If you want to lean into the analytics, a strong foundation in math is essential. If you're trying to build-a-bear your machine learning program, you'll end up utilizing a very good portion of both math and programming. If you're sticking more to pre-built ML packages like TF/PT/etc., or if you're looking to bark more up the Data Engineering tree, you'll end up relying a lot more on your CS/programming fundamentals.

At least that's my take.

bold timber
restive schooner
hidden mist
#

If you ask a programmer, they're going to say you need math. If you ask a mathematician, they're going to say you need programming.

#

I can say confidently that I'm so incredibly far out of my depth even having done college statistics when it comes to some of the raw math behind the modeling of machine learning.

#

I considered elaborating on that further but I'm going to leave my math-confidence alone and avoid kicking myself while I'm down KEKW

restive schooner
#

@hidden mist I may not know much but if you know how to program in this life you can walk but if you know both programming and mathematics, you can basically fly
Yea I know, its a weird analogy lol

hidden mist
#

I'd say that's fairly accurate honestly. I switched majors from CS to Chemistry when I was 19, and really my only requirements for both were some basic math classes. Nothing as advanced as I think you genuinely need to take advantage of in ML applications. Now that I'm going back to school, I don't have to take any math classes for Cybersecurity. And while I realize the applications of CS and Cybersecurity are very different fundamentally, I can still see a lot of benefit in having a strong math background in any IT/CS or programmer-related field.

bold timber
#

But why do I get a different result like this? I mean, when I only set the loss parameter (without metrics), the result only shows for model 0

#

This is the code that I've created without metrics parameters @deep spire

terse kindle
#

I have a mini project in ML to do in one month. But Im new to this. Recommend me any best courses on the basics and intermediate levels on ML that you guys know of

bold timber
#

But do you know the reason why for my case?

bold timber
#

But when I only use metrics = ['mae'] the result is still same like this.

arctic crown
#

please help
i have a very basic polynomial regression program here

from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


dataset = pd.read_csv("hiring.csv")

model = PolynomialFeatures(degree=2)
model.fit([dataset.level], dataset.salary)

plt.scatter(dataset.level, dataset.salary)
plt.show()```

and i was just wondering how can i actually see the regression line on the graph
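
Worth noting: `PolynomialFeatures` only generates the polynomial terms; it does not fit a regression by itself, so there is no line to draw yet. A minimal sketch of one way to get a curve, using `numpy.polyfit` in place of scikit-learn (a swapped-in technique), with made-up `level`/`salary` values standing in for `hiring.csv`:

```python
import numpy as np

# Made-up stand-ins for dataset.level and dataset.salary.
level = np.array([1, 2, 3, 4, 5, 6], dtype=float)
salary = np.array([3, 6, 11, 18, 27, 38], dtype=float)  # exactly level**2 + 2

coeffs = np.polyfit(level, salary, deg=2)         # coefficients, highest power first
grid = np.linspace(level.min(), level.max(), 100)
curve = np.polyval(coeffs, grid)                  # fitted salary along the grid

print(coeffs)
```

Calling `plt.plot(grid, curve)` before `plt.show()` would then draw the fitted curve over the scatter.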
violet gull
#
expected = torch.zeros(5)
expected[3] = 1
actual = torch.zeros(5)
actual[3] = 1
lossFunction = nn.CrossEntropyLoss()
print(lossFunction(actual, expected))```
#

tensor(0.9048)

#

it is lying to my face and i do not appreciate it
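
The number is consistent with how `nn.CrossEntropyLoss` works: it treats its first argument as raw logits and applies a log-softmax to it, and a float target of the same shape is read as class probabilities. A sketch of the same arithmetic in plain Python (no torch needed):

```python
import math

logits = [0.0, 0.0, 0.0, 1.0, 0.0]   # what was passed as `actual`
target = [0.0, 0.0, 0.0, 1.0, 0.0]   # what was passed as `expected`

# log-softmax: each logit minus the log of the summed exponentials
log_norm = math.log(sum(math.exp(z) for z in logits))
log_probs = [z - log_norm for z in logits]

# cross entropy against a probability target
loss = -sum(t * lp for t, lp in zip(target, log_probs))
print(round(loss, 4))  # 0.9048
```

So a "perfect" one-hot vector is not a perfect prediction here; the logit for class 3 would have to be much larger than the others (say 10 or 20 rather than 1) for the loss to approach zero.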

arctic saffron
arctic saffron
junior stone
#

what do yall think do you agree ?

junior stone
#

lol

#

nickname

arctic saffron
#

Can I have a look to learn please? though im still a newbie to DS and AI🤣

#

Alright sure thank you!

clever owl
#

Is there a way to easily filter a df by its index, like you would when filtering a column? Or do I always have to reset_index and treat the index as a column

e.g. I want to get the sums for col1 and col2 for all even indexes

df = pd.DataFrame({
    'col1': [5,6,7,8,9,10,11],
    'col2': [12,13,14,15,16,17,18]
})

df.index.name = 'index_name'

# This will give a key error since 'index_name' isnt a column name
df.loc[df['index_name'] % 2 == 0][['col1','col2']].sum()
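
One way to do this without `reset_index`, as a minimal sketch: the index supports the same vectorised comparisons as a column, so the mask can be built from `df.index` directly.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [5, 6, 7, 8, 9, 10, 11],
    'col2': [12, 13, 14, 15, 16, 17, 18],
})
df.index.name = 'index_name'

# Build the boolean mask from df.index rather than from a column.
even_sums = df.loc[df.index % 2 == 0, ['col1', 'col2']].sum()
print(even_sums)
```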
clever owl
#

Oh lol that works yeah

#

thanks haha

odd dagger
#

is there any way in pandas to read csv and excel file with one single function instead of read_csv or read_excel

#

ping me to reply, thanks

south viper
odd dagger
#

I am using os to split filename and extension

#

then running an if case to check whether it's Excel
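
As far as I know pandas has no single reader that covers both formats, so a small dispatcher on the file extension is a common approach. A minimal sketch; the helper name `read_table` is made up here:

```python
from pathlib import Path

import pandas as pd

def read_table(path, **kwargs):
    """Pick a pandas reader from the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == '.csv':
        return pd.read_csv(path, **kwargs)
    if suffix in ('.xls', '.xlsx'):
        # read_excel needs an Excel engine such as openpyxl to be installed
        return pd.read_excel(path, **kwargs)
    raise ValueError(f'unsupported file type: {suffix!r}')
```

Extra keyword arguments are passed straight through, so `read_table('data.xlsx', sheet_name=0)` or `read_table('data.csv', sep=';')` would both work as with the underlying readers.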

distant tide
#

anyone know anything about fourier analysis using PyFFTW or SciPy FFTpack

tender knot
#

import cv2 as cv
import numpy as np

blank= np.zeros((500,500), dtype='uint8')# numpy.zeros:Return a new array of given shape and type, filled with zeros.
cv.imshow('Blank', blank)
#draw a line:
cv.line(blank,(0,250),(0,255,0))
cv.waitKey(0)

#

hey this is my code

#

it says that there's an unexpected indent on line 5

#

im currently working on opencv rn

#

can anyone tell me why

distant tide
#

can you send it in python format @tender knot

tender knot
#

yes

arctic wedgeBOT
tender knot
#

i send this link instead of the file bc of this

#

i hope it still works

distant tide
tender knot
#

yeah but

#

it still says unexpected indent

#

should i save or create a new terminal

distant tide
#

run it in an IDE

#

like replit or pycharm

tender knot
#

tks i

#

succed

distant tide
#

what

clever owl
#

How can I filter for all rows where a label is not present. e.g. I can select rows where the index is 'label', df.loc['label'], how can I get the inverse? All rows where the label is not present instead
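
A minimal sketch of two ways to get the inverse, using a made-up frame: a boolean mask on the index, or dropping the label.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=['label', 'a', 'label', 'b'])

without_label = df.loc[df.index != 'label']   # boolean mask on the index
dropped = df.drop(index='label')              # same rows, by dropping the label
print(without_label)
```

For several labels at once, `df.loc[~df.index.isin([...])]` follows the same pattern.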

strange wolf
#

does a free faster compiler than google colab with gpu exist?

green wasp
#

idk if it's bad etiquette but #1073962529291771964 if anyone has two mins to spare. Trying to learn ComputerVision and getting stuck on something that I honestly feel should be trivial, but I don't have the skills to fix it. I know the problem, idk how to tackle it

tacit galleon
#

Hi everyone.
I have a question.
I have a group of 10 images and I want to compare them with a new image to find out which image from my base group is the most similar. So my question is: what is the easiest way to do this? I don't need it to be super precise.

serene scaffold
tacit galleon
#

There are images from a building

chrome zealot
#

i wanna know that how can i get into ai

tacit galleon
#

So I just want to compare a zone from the building

tacit galleon
#

right?

serene scaffold
tacit galleon
serene scaffold
tacit galleon
#

I don't know if this is the best way to solve this problem.
I didn't explain well

#

I have this group of images

#

So if I pass a new image

#

like this one

#

the 2 most similar are the first ones

#

and at the end I need to select one from those 2, the most similar image

serene scaffold
#

are you always comparing new images to that set of 10?

tacit galleon
#

Yes

slender wyvern
#

you could calculate the sum of the squared distances or look for the maximum value in the cross correlation matrix

tacit galleon
serene scaffold
# tacit galleon Yes

so for this, you might want to use a distance formula. in AI, "distance" is where a function has two parameters, and it returns 0 when the two are exactly the same, and it never returns a negative number.

tacit galleon
serene scaffold
#

I'm not an image AI person, though, so all I can do is give you terms to Google.

slender wyvern
tacit galleon
slender wyvern
#

If your image is stored as a numpy array, you can simply apply the squared distance calculation pointwise and perform the normalized sum at the end to get a scalar.
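
A minimal sketch of that calculation, assuming all images are NumPy arrays of the same shape; the random "images" and the noisy query are made up for illustration.

```python
import numpy as np

def normalized_ssd(a, b):
    """Mean of squared pixel differences; 0 means identical images."""
    a = np.asarray(a, dtype=np.float64)   # cast first so uint8 values cannot wrap around
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
base_images = [rng.integers(0, 256, size=(32, 32), dtype=np.uint8) for _ in range(10)]

# Made-up query: a slightly noisy copy of base image 3.
noise = rng.integers(-5, 6, size=(32, 32))
query = np.clip(base_images[3].astype(np.int64) + noise, 0, 255).astype(np.uint8)

scores = [normalized_ssd(query, img) for img in base_images]
best = int(np.argmin(scores))   # index of the most similar base image
print(best)  # 3
```

This only works well when the photos are taken from roughly the same viewpoint; for differently framed shots of the building the scores will be dominated by misalignment.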

tacit galleon
#

Okay

#

I'll give it a try and see what my results are.

green wasp
#

Anyone know why opencv isn't finding the sudoku puzzle in the middle? Thought it had to do with the line being too thin after blurring so I tried to make it bolder but it still doesn't find it

#

if I add some noise it does

#
def findBiggest(cont):
    biggest = np.array([]) # Init empty array of biggest points
    max_area = 0
    for i in cont:
        area = cv2.contourArea(i) # Get area of cont
        print(area)
        if area > 50:
            # Is big enough to be considered
            peri = cv2.arcLength(i, True)
            approx = cv2.approxPolyDP(i, 0.02*peri, True) # Approximates perimeter
            if area > max_area and len(approx) == 4: # if the segment we're considering is bigger than the current and it's rectangular/square
                biggest = approx
                max_area = area
    return biggest, max_area

def order(points):
    points = points.reshape((4,2))
    pNew = np.zeros((4,1,2), dtype=np.int32)
    add = points.sum(axis=1) # We sum all the values on the X axis finding which is higher
    pNew[0] =  points[np.argmin(add)]
    pNew[2] =  points[np.argmax(add)]
    diff = np.diff(points, 1) # We subtract all values on the X axis to find which is lowest
    pNew[1] = points[np.argmin(diff)]
    pNew[3] =  points[np.argmax(diff)]
    return pNew

def contours(image_neg, img_orig):
    # find all contours
    contours, hierarchy = cv2.findContours(image_neg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # Find all external contours
    print(contours)
    #cv2.drawContours(img_orig, contours, -1, (0, 255, 0), 10) # contourIdx is neg so all cont are drawn
    biggest, max_area = findBiggest(contours)
   # print(biggest)
    if biggest.size != 0:
        biggest = order(biggest)
        #print(biggest)
        return cv2.drawContours(img_orig, biggest, -1, (0, 255, 0), 10) # contourIdx is neg so all cont are drawn``` the code
slender wyvern
#

On another note - I have a pandas dataframe with a lot of unneeded entries,
therefore I want to convert it to a sparse array.

As an example, create some sparse data:

import numpy as np
import pandas as pd
from numpy.random import default_rng

rng = default_rng(seed=0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal(size=(6, 4)), index=dates, columns=list("ABCD"))
df_filtered = df[(df > 0.5)]

Now I can convert this to a scipy.sparse.coo_array object, but the conversion
is somewhat awkward, as I have to specify the astype conversion using
pd.SparseDtype multiple times. Is there a better way than what I'm doing?

from scipy.sparse import coo_array

def df_float_to_coo_arr(dataframe: pd.DataFrame) -> coo_array:
    return coo_array(
        dataframe.astype(pd.SparseDtype("float", fill_value=np.nan)).astype(
            pd.SparseDtype("float", fill_value=0),
        )
    )
>>> df_float_to_coo_arr(df_filtered)
<6x4 sparse array of type '<class 'numpy.float64'>'
 with 5 stored elements in COOrdinate format>
inland grotto
#

I impressed ChatGPT, and impression is an emotion. Thus the bot has emotions

clever owl
#

In Pandas what's the best way to store a time period. e.g. 1/1/2020 - 3/1/2020. I'm not trying to get all the dates in between, I just want to store that period of time. Pandas has a pd.Period but it doesn't seem like what Im looking for. There's this SO thread https://stackoverflow.com/questions/62580625/in-pandas-is-there-any-way-to-create-a-time-span-between-two-dates but it's a bit inconclusive

arctic wedgeBOT
#

pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=_NoDefault.no_default, inclusive=None, **kwargs)```
Return a fixed frequency DatetimeIndex.

Returns the range of equally spaced time points (where the difference between any two adjacent points is specified by the given frequency) such that they all satisfy start <[=] x <[=] end, where the first one and the last one are, resp., the first and last time points in that range that fall on the boundary of `freq` (if given as a frequency string) or that are valid for `freq` (if given as a [`pandas.tseries.offsets.DateOffset`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html#pandas.tseries.offsets.DateOffset "pandas.tseries.offsets.DateOffset")). (If exactly one of `start`, `end`, or `freq` is *not* specified, this missing parameter can be computed given `periods`, the number of timesteps in the range. See the note below.)
clever owl
#

Oh damn yeah that's perfect thanks

clever owl
# serene scaffold !docs pandas.date_range

Actually nah I just want a period between two dates like 2023-01-28 - 2023-01-31 , not all the dates between them like this that date_range gives

[2023-01-28 00:00:00, 2023-01-29 00:00:00, 2023-01-30 00:00:00, 2023-01-31 00:00:00],
clever owl
#

Is there no cleaner object to store a time period like this than date_range?

serene scaffold
#

year_2017 = pd.Interval(pd.Timestamp('2017-01-01 00:00:00'),
pd.Timestamp('2018-01-01 00:00:00'),
closed='left')
pd.Timestamp('2017-01-01 00:00') in year_2017

year_2017.length
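
Applied to the dates from the question, a minimal sketch (`closed='both'` is my assumption that both end dates should count as inside the period):

```python
import pandas as pd

period = pd.Interval(
    pd.Timestamp('2023-01-28'),
    pd.Timestamp('2023-01-31'),
    closed='both',   # include both end dates
)

print(period.left, period.right)
print(period.length)                          # a Timedelta of 3 days
print(pd.Timestamp('2023-01-30') in period)   # True
```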

pallid ore
#

i know this image is fake but what's the names of these types of applications

#

can't find any other good example

clever owl
drifting ridge
# serene scaffold Try being more specific

I first look up some web and try to implement it on my data but it look a little bit crazy as it's parallel to test data. My point is I want to use it just to predict only the direction of stock price up or down but idk how to change from stock price to direction of stock price.

drifting ridge
mild dirge
#

Well you can just predict the next stock price and see if it is higher or lower than the current price
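
Going the other way, turning a price series into up/down labels to train on, could look like this minimal sketch (the closing prices are made up):

```python
import numpy as np

close = np.array([10.0, 10.5, 10.2, 10.2, 11.0])   # made-up closing prices

# 1 if the next close is higher than the current one, else 0.
direction = (close[1:] > close[:-1]).astype(int)
print(direction)  # [1 0 0 1]
```

The label array is one element shorter than the price array, since the last day has no "next" price, so features would be taken from `close[:-1]`.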

drifting ridge
mild dirge
#

"close"? What do you mean

#

I assume you have the stock price as feature?

drifting ridge
#

my data look like this i use close price as the feature

#

i have looked for RandomForest too but the precision_score is about half only

mint palm
#

can anyone give me a pro tip on how to understand CODE of complex architecture with high amount of abstraction?
I am new to transformers and understand the theoretical architecture, but I'm having problems incorporating my own changes/tweaks into the model

serene scaffold
#

@mint palm are you using pytorch or what

#

if you're using, for example, a BERT model with pytorch, you can look in and see what all the layers are

mint palm
serene scaffold
mint palm
#

oh, currently I don't understand how the data is being fed. The model uses 3 modalities and I am confused about how much of each is being fed

serene scaffold
#

yeah. they don't care if other people can run their code. they just want to get the results and get the paper out.

mint palm
#

i was able to run it but i get the point.

lapis sequoia
#

So guys I just trained my first model using pytorch to a degree where i deem it accurate, I have stopped training to prevent overfitting. I would like to use my model in a javascript website, does anyone know a good module to use and predict via javascript? Or should I setup a python api and do it like that. I'd prefer the first option as it will save me a huge amount of money.

dim palm
#

Hi everyone !
I am a French student from Polytech Nantes (France) in the third year of an engineering degree called "Data Engineering and Artificial Intelligence".
I am looking for a 9-week internship abroad in the area of data science for summer 2023, from July 3rd to September 1st.
I would be grateful for any opportunity! Please send me a message 🙂 my DMs are open

lapis sequoia
#

Is abroad a must?

#

while i don't represent any company i may know some where u could try but those are also from france however they have branches in different countries though

dim palm
lapis sequoia
#

Oh then nvm

dim palm
#

the company could be French but in another country, with English as the working language

lapis sequoia
#

are you fluent in english?

#

or at least can speak it very well

dim palm
lapis sequoia
#

Hmm if u seek in the Netherlands let me know i know some companies who will probably be happy to have you

#

Not sure how open they are to abroad students but at least one will probably let you

dim palm
#

I'm looking for this internship all over the European Union (easier for visa), so the Netherlands is a great opportunity

#

I would be very happy if you can connect me with these companies 🙂

lapis sequoia
#

You could try the ING (dutch bank), Dutch Police (Politie), Albert Heijn (dutch super market), Blender (3d editing program) and some more but i cant name them from the top of my head rn

dim palm
#

Yes I know ING, huge bank

lapis sequoia
#

Yes

#

they allow dutch students often

#

so you should definitely try

dim palm
#

Thanks for company names, I am going to add them in my list

dim palm
lapis sequoia
# dim palm Yeah thx

No problem make sure to not depend on one and just contact them all as it's not worth risking it for one company although they usually do what they say you really don't want to come to the Netherlands to find out something didn't work out

#

Get them to send you a digital contract or something on paper

dim palm
silent pendant
#

Could anyone explain why my MLPRegressor returns different results each time with a set random state & np seed? I'm super new to machine learning and I got somethin weird goin on

#

The regressor seems to be way off like 50% of the time, somewhat accurate maybe 40-45% of the time, and dead-on maybe 5-10% of the times I run the code

silent pendant
#

Ah no my bad, I didn't realize train_test_split() had a random_state parameter

heavy chasm
#
import os
import time
import playsound
import speech_recognition as sr
from gtts import gTTS

filename = "voice.mp3"  # assumed output file; the original snippet never defined it

def speak(text):
    # Remove any stale audio file before writing a new one
    if os.path.exists(filename):
        os.remove(filename)
    tts = gTTS(text=text, lang="en")
    tts.save(filename)
    playsound.playsound(filename)

def get_audio():
    r = sr.Recognizer()
    said = ""  # returned unchanged if recognition fails
    print("Listening")
    with sr.Microphone() as mic:
        audio = r.listen(mic)
        try:
            said = r.recognize_google(audio)
            print(said)
        except sr.UnknownValueError:
            print("Sorry I didn't get that")
    return said

get_audio()```


when i say "what's up" i get
```result2:
{   'alternative': [   {'confidence': 0.93134201, 'transcript': "what's up"},
                       {'transcript': 'WhatsApp'},
                       {'transcript': 'whats up'}],
    'final': True}
what's up```

while it works, i want to only have the output be "what's up" 
what do i do?
hasty mountain
#

Guys, why is a Residual Network able to extract features and classify an image better than a VGG model?
I mean...the VGG architecture is quite straightforward and intuitive: the initial layers deal with the complete image, then apply pooling to extract the most relevant features in the feature maps, and then the next layers try to create even more feature maps(thus, extracting even more features) from these relevant features. Until, in the last feature extraction layers, the model might be able to have encoded the features for almost all pixels in the initial image.

However, the Residual Network doesn't have this. It simply extracts feature from the input, then applies a skip connection, then try to somehow generate features based on the features extracted before while also considering the original input...
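
A toy sketch of what the skip connection does, assuming plain matrix multiplies in place of convolutions: the block outputs its input plus a learned correction F(x), so a block that has learned nothing yet still passes its features through rather than destroying them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)), where F is two small linear layers (toy stand-in for convolutions)."""
    f = relu(x @ w1) @ w2
    return relu(x + f)

x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))

# With all-zero weights F(x) = 0, so the block returns its (non-negative) input unchanged.
print(residual_block(x, zeros, zeros))
```

A plain VGG-style layer with the same zero weights would output all zeros, which is one intuition for why deep residual stacks are easier to optimise.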

#

It's a bit confusing...but quite efficient. At least, in my GAN, a Generator with a residual architecture(4 million parameters) is able to collapse a VGG discriminator with around 40 million parameters. I had to tune the learning rate and "nerf" the generator in order to make them converge pithink

bold timber
#

Hello guys, do you know why the score keeps changing when I run the model several times even though I've set a random seed?

grand swan
#

Dear data science and AI, I come to you humbly asking for your sincere inputs on a computer vision project. I am attempting to make a script that looks at a 2d floor plan, and can detect walls, windows, and the floor within the walls.

My first attempt was to use numpy and matplotlib to extract the RGB values of the image and create an array, but I'm struggling with what the optimal way would be for the machine to tell windows from walls. I was thinking of measuring thickness between pixels, maybe?

patent lynx
# bold timber Hello guys, do you know why the score keep changing when I run the model for sev...

https://stackoverflow.com/questions/60058588/tensorflow-2-0-tf-random-set-seed-not-working-since-i-am-getting-different-resul

In summary you can either:

  1. reset the kernel because there is an issue with running in sequence
  2. or pass a function to reset random seed into the layers:
import os
import random
import numpy as np
import tensorflow as tf

def reset_random_seeds():
   os.environ['PYTHONHASHSEED']=str(2)
   tf.random.set_seed(2)
   np.random.seed(2)
   random.seed(2)
arctic wedgeBOT
latent wedge
#

why do I get this error when trying to graph piecewise functions?

import numpy as np
import matplotlib.pyplot as  plt
f = lambda x: 0 if x<0 else 1
x = np.linspace(-1, 1, 1000)
f(x)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5732/2508726767.py in <module>
      1 f = lambda x: 0 if x<0 else 1
      2 x = np.linspace(-1, 1, 1000)
----> 3 f(x)

~\AppData\Local\Temp/ipykernel_5732/2508726767.py in <lambda>(x)
----> 1 f = lambda x: 0 if x<0 else 1
      2 x = np.linspace(-1, 1, 1000)
      3 f(x)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

​```
#

(it's in jupyter lab if that's relevant)

#

well i fixed it by doing

f = lambda x: 0 if x<0 else 1
x = np.linspace(-1, 1, 1000)
y = np.array([f(k) for k in x])
plt.plot(x, y)```
but I'd like to know why that error happens anyway
boreal gale
# latent wedge well i fixed it by doing ```py f = lambda x: 0 if x<0 else 1 x = np.linspace(-1...

sure.
you have f = lambda x: 0 if x<0 else 1
when applied to that numpy array, it contains this if condition within
if x<0
which is
if np.linspace(-1, 1, 1000)<0
which is
if bool(np.linspace(-1, 1, 1000)<0)
but
np.linspace(-1, 1, 1000) < 0 is numpy array of 1000 true or false, "bool(numpy array of 1000 true or false)" is ambiguous (that's what it meant in the error message - there is no good way to determine what you want), hence you need to use *.any() or *.all() as suggested - but looking at your fixed code, this isn't required at all.

latent wedge
#

thanks!

tidal bough
#

You want np.where(x<0, 0, 1) for this.
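
A short sketch showing both the failure and the `np.where` version on a 5-point grid:

```python
import numpy as np

x = np.linspace(-1, 1, 5)

try:
    bool(x < 0)            # what `if x < 0` effectively asks for
except ValueError as err:
    print('ValueError:', err)

y = np.where(x < 0, 0, 1)  # element-wise choice, no Python-level `if`
print(y)  # [0 0 1 1 1]
```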

wooden sail
#

you can do it a bit easier

#

x[x < some value] = some other value

#

or more easily x < 0

#

!e

import numpy as np
x = np.array([-1,-1,0,1,2])
print(x < 0)
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

[ True  True False False False]
wooden sail
#

booleans in python already behave as 0 and 1 and directly cast as such when doing math later on
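
Both of those ideas in one small sketch, reusing the array from the eval above:

```python
import numpy as np

x = np.array([-1, -1, 0, 1, 2])

step = (x >= 0) * 1        # True/False become 1/0 once arithmetic is applied
print(step)                # [0 0 1 1 1]

clipped = x.copy()
clipped[clipped < 0] = 0   # masked assignment: overwrite only the entries where the mask is True
print(clipped)             # [0 0 0 1 2]
```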

tender knot
#

[WinError 3] The system cannot find the path specified:

#

import os
import numpy as np
import cv2 as cv

p = ['Lee Nadine', 'Elon Musk']
DIR = r'C:\Users\HONG-ANH\Downloads\compvision\img'
haar_cascade = cv.CascadeClassifier("haarcascade_frontalface_default.xml")

# array of image for faces:
features = []
labels = []

def create_train():
    for person in p:
        path = os.path.join(DIR, person)
        label = p.index(person)
        # loop thru image in one folder
        for img in os.listdir(path):
            img_path = os.path.join(path, img)
            # read thru the images in that folder
            img_array = cv.imread(img_path)
            gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY)

            faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

            for (x, y, w, h) in faces_rect:
                faces_roi = gray[y:y+h, x:x+w]
                features.append(faces_roi)
                labels.append(label)

create_train()
print(f'The length of the features={len(features)}')
print(f'The length of the label={len(labels)}')

tender knot
#

although i did include the full path

#

can anyone explain to me why
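
`os.listdir` raises this error when the folder it is given does not exist, so one thing worth checking is whether every `DIR/person` folder is really there (spelling and spaces included). A small sketch; `missing_person_dirs` is a made-up helper and the call at the bottom uses placeholder arguments:

```python
import os

def missing_person_dirs(base_dir, people):
    """Return the per-person folders that do not exist under base_dir."""
    missing = []
    for person in people:
        path = os.path.join(base_dir, person)
        if not os.path.isdir(path):
            missing.append(path)
    return missing

# Hypothetical call; substitute the real DIR and list of names.
print(missing_person_dirs('img', ['Lee Nadine', 'Elon Musk']))
```

Any path printed here is one that `os.listdir(path)` would fail on.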

lapis sequoia
#

where can i start with AI?

spare mulch
#

look at the pins @lapis sequoia

lapis sequoia
#

so based on the first pinned message, i think i actually wanna do machine learning.

spare mulch
#

you can find some tutorials on yt on how to use a simple ML algorithm with python, with tensorflow or without, from scratch, difficult and easy

#

btw, ||chatgpt||

lapis sequoia
#

i find youtube tutorials really annoying. i think i'm gonna read the books in the pinned messages

atomic tide
# lapis sequoia where can i start with AI?

If you want to get straight into things, there are some nice short tutorials on Kaggle. But you still need to put time into studying the theory of the subject, and for that books are your best bet.

prime hearth
#

@lapis sequoia it also depends which area of AI you are interested and your objective. Is it to land a job or just a hobby? If you want a job you have to see which field of AI simply by looking at the jobs in your area what they look for. Then research the common tools the company is using and learn the related AI topics. But for any AI you need to learn programming basics, numpy and pandas, how to clean data and then go in depth into the stats/math that relate to the AI and algos. Lots of youtube and articles on medium that give like a small roadmap or where to start depending what interest you.

hasty mountain
#

Guys, considering that in Unsupervised Learning the model tries to categorize the data in a way that gives the smallest entropy possible... is it possible for an unsupervised learning model to overfit?
I'm trying to make a neural network with unsupervised learning + supervised fine-tuning, but, though the model is correctly following a logic for determining its output, it's not providing an output value close to what I desire (or close to my labels in fine-tuning). I was thinking about using more iterations for training it, but I'm afraid it might somehow "overfit", if this is even possible in unsupervised learning

#

I've read that "overfitting" in unsupervised learning would be the entropy of the pseudolabel being close to 0, thus the model is almost 100% sure of the pseudolabel it assigned to that data.
How can this be bad, though?

mild dirge
#

Because the model is overconfident in its decision, that could maybe indicate that it is able to fully store the examples, instead of generalizing.

#

What would you even base that unsupervised learning on?

#

Because just reducing entropy of output means that the model will fit to a single label (doesn't matter which) which doesn't seem very productive
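To make that concrete, here is a minimal sketch, assuming a three-class softmax-style output. The `collapsed_model` function and the input names are hypothetical stand-ins, not anyone's actual network. A model that ignores its input and always predicts the same class with full confidence already reaches the minimum possible entropy, so the entropy term alone cannot tell it apart from a useful model.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def collapsed_model(_x):
    # Hypothetical degenerate model: same one-hot output for every input.
    return [1.0, 0.0, 0.0]


inputs = ["img_a", "img_b", "img_c"]  # placeholder inputs

# Average prediction entropy over the "dataset" for the collapsed model.
mean_entropy = sum(entropy(collapsed_model(x)) for x in inputs) / len(inputs)

# For comparison: a maximally uncertain prediction over three classes.
uniform_entropy = entropy([1 / 3, 1 / 3, 1 / 3])
```

The collapsed model scores an entropy of zero on every input, which is why entropy minimization is usually paired with some other term (a consistency loss, a supervised loss, or a diversity constraint) rather than used on its own.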

#

And overfitting in unsupervised learning is def. a thing

hasty mountain
#

The reducing entropy is more for multi-label classification. At least most papers I see deal with multi-label classifiers
Though I admit I'm using a regressor model for the task. It's for labeling a dataset

hasty mountain
mild dirge
#

You could maybe make an auto-encoder for the unlabeled data

hasty mountain
#

The model "memorizes" the relation input - label

hasty mountain
mild dirge
#

That way it learns some kind of embedding that holds important information about the data

hasty mountain
#

For my personal model, I'm making it encode an input image and assign a pseudolabel to it(following a ResNet-like architecture).
Then I iterate over the same image again(the model has dropout layers, 0.35) and make it assign a pseudolabel again.
Then I use this MinEnt

#

Before I discovered this paper, I was simply applying a MSE(pseudolabelA, pseudolabelB) as a consistency loss.
Now I'm using this technique, but instead of Cross Entropy, I'm using MSE.

#

And it seems to work pretty well...even better than directly using MSE.

#

But the thing is...my labels for certain input images are like "22, 24, 27", and the pseudolabels the model is generating are "7, 7.55, 7.88".
So the model gets the idea, but it doesn't generate the output value I desire.
I was thinking about making it iterate more and more through the data, but, since overfitting is also a problem here, perhaps I shouldn't...
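A small sketch of why this can happen, under stated assumptions: the second-pass values in `pass_b` are made up for illustration, and `mse` is a plain mean squared error, not the loss from the paper. A consistency loss only compares two predictions for the same image, so it can be almost zero while both predictions sit far from the label scale.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


labels = [22.0, 24.0, 27.0]   # target values mentioned above
pass_a = [7.0, 7.55, 7.88]    # pseudolabels mentioned above
pass_b = [7.1, 7.5, 7.9]      # hypothetical second pass with dropout

# Consistency term: agreement between the two stochastic passes.
consistency_loss = mse(pass_a, pass_b)

# Supervised term: distance of the predictions from the real labels.
supervised_loss = mse(pass_a, labels)
```

Here the consistency term is tiny and the supervised term is large, which suggests that more unsupervised iterations alone would not move the outputs toward 22 to 27. Only a loss that sees the labels (the fine-tuning stage) ties the outputs to that scale.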

mild dirge
#

I'm not really familiar with that specific unsupervised learning process, so I don't have much to say on it, srr

hasty mountain
#

Oh, I thought the idea was more or less the same...decreasing entropy, capturing patterns...

#

At least, when I see unsupervised learning in neural networks, people usually compare it to KNN and PCA

#

Though I admit I don't remember how PCA works

sleek aurora
#

hey guys , I am a beginner who wishes to learn and explore more abt data science

#

is "kaggle" the website good for it ? If one could give an opinion about it...would be helpful perhaps

woven coral
vestal ocean
#

Hey there, is there anyone knowledgeable about NLP that I could ask a question?

serene scaffold
vestal ocean
#

Could someone possibly check my code?

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

you always want to ask complete questions that someone who knows the answer can jump right into.

vestal ocean
#
```py
def prob_classify(self, d: List[Any]) -> Dict[str, float]:
    """
    Compute the probability P(c|d) for all classes.
    :param d: A list of features.

    :return: The probability p(c|d) for all classes as a dictionary.
    """
    prob_classify = {}
    sum_of_probabilities = sum(self.prior.values())

    for individual_class in self.prior:
        prob_classify[individual_class] = self.prior[individual_class]
        for feature in d:
            if feature in self.vocab:
                prob_classify[individual_class] *= self.likelihood[individual_class][feature]
            else:
                prob_classify[individual_class] *= self.alpha / (len(self.vocab) * self.alpha)

    for individual_class in prob_classify:
        prob_classify[individual_class] = prob_classify[individual_class] / sum_of_probabilities

    return prob_classify
```
#

This is my function^
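One thing worth checking in a function like this is the normalization step. The code above divides by the sum of the priors, while P(c|d) is normally obtained by dividing each class score by the sum of the scores over all classes. Below is a minimal standalone sketch of that idea, not the original class. The `posterior` function, its arguments, and the fixed `unseen_prob` fallback are assumptions made for illustration.

```python
import math
from typing import Any, Dict, List


def posterior(prior: Dict[str, float],
              likelihood: Dict[str, Dict[Any, float]],
              features: List[Any],
              unseen_prob: float) -> Dict[str, float]:
    """Return P(c|d) for every class, normalized over the classes."""
    # Work in log space so long feature lists do not underflow to 0.0.
    log_joint = {}
    for c, p_c in prior.items():
        total = math.log(p_c)
        for f in features:
            # Assumed fallback: unseen features get a fixed small probability.
            total += math.log(likelihood[c].get(f, unseen_prob))
        log_joint[c] = total

    # Normalize by the sum of the joint scores, not by the sum of the priors.
    shift = max(log_joint.values())
    scores = {c: math.exp(v - shift) for c, v in log_joint.items()}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


# Toy numbers, made up for the example.
prior = {"pos": 0.5, "neg": 0.5}
likelihood = {
    "pos": {"good": 0.6, "bad": 0.1},
    "neg": {"good": 0.2, "bad": 0.5},
}
probs = posterior(prior, likelihood, ["good", "good", "bad"], unseen_prob=1e-3)
```

With this normalization the returned values sum to 1 across the classes, which is a quick sanity check for any `prob_classify` implementation.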

frozen marten
#

guys why do we use a 2d unet for an rgb image segmentation
different ppl give different reasons
im unable to say which one is legit
each one gives their own reason

#

ping me on reply

topaz bear
#

Hello everyone, does anyone know a python library for interacting with ChatGPT? Right now I am trying out ChatGPT Wrapper but it is extremely slow. If you know of any alternatives, that would be very helpful.

serene scaffold
lapis sequoia
#

List=[a,b,b,c,b,c,b,b,a,b,b]
How can I print the indexes of each 'b' in a single line in python.
( How can I print the indexes starting with 1 instead of 0)
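A minimal sketch of one way to do this, assuming the list elements are strings (the original snippet uses bare names `a`, `b`, `c`) and renaming the variable to `items` to avoid confusion with the built-in `list`. `enumerate` accepts a `start` argument, which handles the 1-based numbering.

```python
items = ["a", "b", "b", "c", "b", "c", "b", "b", "a", "b", "b"]

# enumerate(..., start=1) yields (position, element) pairs counted from 1.
positions = [i for i, item in enumerate(items, start=1) if item == "b"]

# Unpacking with * prints all positions on one line, separated by spaces.
print(*positions)
```

Dropping `start=1` gives the usual 0-based indexes instead.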

young granite
young granite
#

i tried to let him learn a bit lel

#

@deep spire are u a bot? ur info misleads me 😄

agile cobalt
young granite
#

she makes fun of us

agile cobalt
#

a link to an announcement
tl;dr chatgpt is banned from this server

gilded kestrel
#

is there any other community more geared towards reinforcement learning?

agile cobalt
#

the way you used it seems fine so far but I'm not 100% sure
not my call to make either way though shrug

gilded kestrel
#

i'm not looking for subreddits

young granite
#

@deep spire can u do math?

#

hehehehe

#

@deep spire generate 10000 different sine functions and assign features for each one which correlate to the function.

gilded kestrel
#

this is annoying

young granite
#

roast is fire tho

wheat snow
#

sup guys

#

im a bit confused with this error message

#
```
    s_edge = self.ax_pos[0] - 0.25 + self.lim_offset
IndexError: index 0 is out of bounds for axis 0 with size 0
```
pallid ore
#

just kinda bored

mild dirge
#

Two of those are models not libraries

#

Also pytorch is not "an implementation of faster r-cnn"?

#

Seems like a sussy gpt response 😛

#

Pytorch is a bit easier to install with cuda though imo, it gets installed in your env together with pytorch.

young granite
#

dont be political

#

🗿

#

i like nera for technical reasons

#

my head spins now

#

now i do dislike nera...

#

narcissistic

#

can u generate pictures?

wheat snow
#

why always da americaa

#

nah not fr