#data-science-and-ml
1 messages Β· Page 46 of 1
It will probably be easier to sort them out before taking the square root, as the two solutions will be conjugates (in the complex case) or just two real numbers. If they're complex conjugates, you could put all the solutions with positive imaginary component on one side, and all the solutions with negative imaginary component on the other side. If you have two real solutions, you could group together the solutions that are the smaller of their pair, and group together the solutions that are the larger of their pair.
Oh, cool. Is there any trusted guide on how to get acquainted with some NLP methods for Twitter tweet analytics? in like a week?
There are many guides available, I know. But I want one which people trust (since I got a project deadline to meet so need to utilise what time I get for it)
in your example
[1.50622964+650.25736311j, 449.3031039+25.95939333j]
[1.87818391 +631.29563062j, 789.98518552+34.33014745j]
[1402.82082129+84.79794406j, 2.40353116 +607.05689764j]
[1602.45701021+4146.32391044j, 3.18701564 +575.16495683j]
what would be your desired output?
[1.87818391 +631.29563062j, 789.98518552+34.33014745j]
[2.40353116 +607.05689764j, 402.82082129+84.79794406j]
[3.18701564 +575.16495683j, 1602.45701021+4146.32391044j]```
[ 5.96173114+1112.47599554j, 33.73591086 +181.08331466j]
[ 5.79370343+1108.15897559j, 40.55934614 +154.85239687j]
[ 51.96897045 +124.30033042j, 5.6327504 +1103.9652141j ]
[ 73.19468058 +90.8068841j , 5.47847438+1099.88955899j]
[106.66038373 +64.1440674j , 5.3305056 +1095.92714543j]```
would ideally be
```[ 6.13726168+1116.92173512j , 29.1556275 +203.88033415j]
[ 5.96173114+1112.47599554j , 33.73591086 +181.08331466j]
[ 5.79370343+1108.15897559j , 40.55934614 +154.85239687j]
[ 5.6327504 +1103.9652141j , 51.96897045 +124.30033042j]
[ 5.47847438+1099.88955899j, 73.19468058 +90.8068841j]
[ 5.3305056 +1095.92714543j, 106.66038373 +64.1440674j]```
interesting. the last entry's imag component is a pretty big jump from the previous
yeah there can still be legitimate big jumps
you can see here the big vertical line is where they have swapped incorrectly
that's with some conditions put in to check
this is what it looks like with no conditions
the second column should be relatively consistent though
okay, basically you are trying to optimise for minimal difference between successive entries's imaginary component (or maybe + real)
yeah i had (excuse the awfulness):
condition_1 = abs(item[0].imag - previous[1].imag) < abs(item[0].imag - previous[0].imag)
condition_2 = abs(item[0].real - previous[1].real) < abs(item[0].real - previous[0].real)
condition_3 = abs(item[1].imag - previous[0].imag) < abs(item[1].imag - previous[1].imag)
condition_4 = abs(item[1].real - previous[0].real) < abs(item[1].real - previous[1].real)
if (condition_1 and condition_2) or (condition_3 and condition_4):
swap()```
and it worked for one setup, but now i've made the variables a bit more complicated some rogues have slipped through the cracks
Hello, first time seing pandas in our huge python code base. It's used to output an html table. (quite a simple usecase I think). Question copied from python-general:
What does it mean to style a pandas dataframe, and why is the outputted html so disgusting? example:
<th id="T_c9383ca4_a7c2_11ed_9342_f02f74177e5elevel2_row56"
It's annotating every table cell with a unique id π²
here's a csv of the data before taking the square root if that helps
(all these hashes seem to change when the content of the cells change, with makes it really difficult to look at the changes of the outputted html with a plain diff / git diff)
oh, that actually is a lot better
you can see in row 27 a swap needs to happen
They even get an id even if it's not used for anything. T_T
Oh, even funnier. I thought the html output was excessively large, turns out the markdown output is twice the size π€£
import csv
from pathlib import Path
file = Path("ky_roots.csv")
res = []
with file.open() as fo:
reader = csv.DictReader(fo)
for line in reader:
a, b = map(complex, (line["0"], line["1"]))
if res:
last_a, last_b = res[-1]
dist = lambda a, b: abs(a - last_a) ** 2 + abs(b - last_b) ** 2
# change order if that makes it closer:
if dist(b, a) < dist(a, b):
a, b = b, a
res.append((a, b))
this works for me
import cmath
roots = [(cmath.sqrt(a), cmath.sqrt(b)) for a,b in res]
plt.figure()
plt.plot([a.real for a,b in roots])
plt.plot([a.imag for a,b in roots])
plt.plot([b.real for a,b in roots])
plt.plot([b.imag for a,b in roots])
plt.show()
makes the plots pretty continious I think
sadly that's VERY CLOSE but incorrect, it's supposed to be more like
two peaks need to be in the one array, with the second being much more stable
This doesn't even look the same - did I not take the root right or something?
this is for a different set of results that were sorted correctly, just a few different parameters but the first solution is definitely supposed to have two peaks
Hi does anyone know how to merge 2 datasets together into one in pytorch?
The problem after this point is that they actually are so close that they won't be swamped, but after that point they'll need to all be swapped because the first solution needs to have two peaks
So that's the issue I'm facing now of how to find a method that will preserve two peaks
how can i add each five rows into a new row in pandas
what are those new rows supposed to be?
sum of last 5 changes
!docs pandas.Series.rolling
Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method='single')```
Provide rolling window calculations.
though this is with a sliding window. if you want separate groups of 5, with no overlap, that's different.
anyone here experimented with html output of pandas? The id hash generation is not very stable, irritatingly
na i want the first ty for ref
I get diffs like this by just generating the same output again. My guess is that some hash is using the python id() of some object to create the id:
[-#T_e5c18203_a7c9_11ed_9342_f02f74177e5e-]{+#T_64ec984f_a7d1_11ed_9342_f02f74177e5e+}
Are you sure it's not just, like, a random uuid?
It seems to me like a straightforward solution to generate random unique ids for each cell.
!e the format matches, too:
import uuid
print(uuid.uuid4())
@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.
b16cc679-402a-4840-9533-9efb901a1d21
to me, just using an index would be much more straightforward and fool-proof (and more stable)
no idea why they needed unique ids for each cell anyway, tbh
ooh, I didn't know about uuid, thanks!
but if they used the index, that could conflict between dataframes, so maybe that's why they didn't.
sure, but then they could just put a discriminant per dataframe
oh well, I'll just live with it :3
i would propose again to do some sort of extrapolation, then
take the 3 previous values, fit a 2nd order polynomial to them, and try to predict the next value
then pick the order based on the point that minimizes the distance to the predicted value
this will enforce some amount of smoothness instead of being only pointwise
(they don't need it, the id:s are not used for anything in the generated output - but I guess they can be used as html anchors)
in my output I should say, there is probably some feature where the output actually uses the ids instead of just generating them as dead-weight :p
Are you using pandas.to_html here? Or what, jupyter's export to html?
@austere prawn Because I looked at the docs a tiny bit, and nbconvert only has a few options, but some of them are interesting: https://nbconvert.readthedocs.io/en/latest/usage.html#convert-html
maybe try these templates and see if they're better?
ah, I guess you're using pandas's to_html after all, which seems non-customizable
i mean I guess if you really wanted it, you could always, like, parse the output with bs4 and remove all the ids π₯΄
Ok I'm embarassed, I haven't looked at the code yet, and it doesn't have any call to to_html (investigation ongoing)
huh, so what are you actually looking at? how did you get the html file?
I executed a script in our repo
The script uses pandas and DataFrame, and to_markdown to generate a markdown report. But I haven't figured out how it's actually generating html. Perhaps html is the default output when doing string on a style.
The code has stuff like
df_specs.index = pd.MultiIndex.from_frame(
df_specs.spec_id.str.split("/", n=2, expand=True).fillna("")
)
df_specs.drop(columns=["spec_id"], inplace=True)
style = set_style(
df_specs.style.apply(paint_empty_red, colname="tested by", axis=1)
)
outfile.write(style.render(index=False).replace(", ", "<br>"))
and without ever having used pandas, it's hard to tell what it does. Although it's really smelly! π€’
π
oh, i guess perhaps render defaults to html. (brb, laundry time!)
magic is happening here I guess, lemme read docs more
it's also deprecated I think π₯΄
if this is the right class
deprecation doesn't face us, we are still at python 3.6 and pandas 1.1.5
it probably is, thanks! π
We have been trying to go up python version for over 2 years, but the US office is holding us back T_T
Hopefully we'll get python 3.9 or 3.10 any month now
Si! I've been out of school for awhile now. That might be trivial for you guys but it's a bit of a reach for me right now lol
I was talking elsewhere about this, but I understand a lot of these concepts on a fundamental level from prior knowledge and intuition, but the math really obfuscates a lot of these concepts in a way I'm having a hard time remedying.
Try increasing your batch size until it runs out of memory then bring it back a bit till it runs (then a bit more since you donβt want it to be just on the edge)

@queen cradle how did you go with this? any luck?
for my dataframe I can do something like for a,s in df.groupby("a"): ax.plot(s.bar, label=a)
but it's unclear how to accomplish the above if I do df.groupby(["a","b"]) any idea?
or is there a pandas specific discord I should use?
so you're trying to make separate plots for each group?
yeah
like entirely separate images?
nope, so the first example I'd get 1 plot (line) per a and a is like 0..10
but in the 2nd case instead of just groupby on a, I need to also aggregate by a,b
but then I'm stuck how to keep b as the index and plot 11 plots for a={0..10} like original
a,b,c,d
0,1,2,3
0,2,5,10
0,3,10,15
1,1,20,30
1,2,50,100
1,3,100,150
2,1,21,31
2,2,51,11
3,3,11,13"""
df = pd.read_csv(io.StringIO(t))
df```
for a,series in df.groupby("a"):
ax.plot(series.set_index("b")["c"], label=a)```
Having trouble with this: fig,ax = plt.subplots() for a,series in df.groupby(["a", "b"]): ax.plot(series.unstack()["c"], label=a)
hello, so im trying to make a jarvis like ai and now im working on a automated google search code. but everytime i ask her to search somthing she does it and it al lloks fine and woring but like a second after the google tab is opened it closes again.
can anyone help me ?
That's my code, sorry for screenshot
elif 'search' in command:
keyword = command.replace('search', '')
browser = webdriver.Chrome()
browser.get('https://google.com/search?q=' + keyword)
That's in Text
found out that you can do something like: for i,x in df.groupby("a"): ax.plot(x.groupby("b")["c"].sum(), ...) unsure if this is even ideal though
Your problem is that graph_objects.Volume is only able to render grids of points, and your data is not a complete grid. Try running:
import numpy as np
import pandas as pd
np.set_printoptions(linewidth=125, precision=4)
df = pd.read_csv("HygroGen_TempUniformity_PointCloud.csv")
xx = df['x'].to_numpy()
yy = df['y'].to_numpy()
zz = df['z'].to_numpy()
sort_idxs = np.lexsort((xx, yy, zz))
xx = xx[sort_idxs]
yy = yy[sort_idxs]
zz = zz[sort_idxs]
print(np.reshape(xx, (-1, 10))[:3])
print(np.reshape(yy, (-1, 10))[:3])
print(np.reshape(zz, (-1, 10))[:3])
and take a look at the output. These are your first 30 data points, in lexicographic order (by z first, then y, then x), in groups of ten. The first group has z equal to -47.641 and the second has z being -42.5167. For both of these, you have y in steps of 20 from -40 to +40. And for each of these y and z values, you have some positive x value and some negative x value; but the x values are different, so the data does not fit a grid. The last displayed group is actually really two: You have a single z value and evenly spaced y values, but only a single x value. Plotly does not know what to do with this.
You might have some luck with interpolating your data to a grid; but you will have to be careful and make sure that whatever behaviors you see are really in the data and not an artifact of the interpolation.
(Sorry for the delay, by the way. I only got a chance to look at it just now.)
i am using with cli and getting the following error,
: error: input types 'tensor<1x768x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
pytorch version is Version: 2.0.0.dev20230208
this is using pytorch with cli for openai's whisper
is anyone available to help me with a code?
You will get more help if you simply ask your question.
so I'm training a deep q learning agent to play a top down shooter game of my own creation and I'm having trouble teaching it to shoot at targets
I have two versions I've tried, one where the neural net is a CNN that takes the game window as input and uses that as the state
the other takes a vector of player position, enemy position, enemy velocity, player angle, etc.
the CNN has yet to converge to the optimal strategy, so maybe it just needs more training or maybe my hyperparameters need to be tweaked
.
but as for the other version, I realized that I haven't coded bullet position into its state vector
and I'm wondering how to do that considering there could be any number of bullets on screen and the neural net needs a fixed sized input vector
I thought about having a super long input vector with zeros representing bullets that don't exist yet
but then the neural net will store bullets as "bullet 0, bullet 1, etc" and it'll mess up the learning because each new bullet will have new weights associated with it that haven't been trained yet
how do I load this dataset? https://www.tensorflow.org/datasets/catalog/scientific_papers#scientific_paperspubmed there's no docs on the github or tensorflow site to tell me how to preprocess the data nor load it into a RNN model
do you not just do
import tfds.datasets.scientific_papers.Builder```
it says unresolved reference
nvm I found the method
dataset = tfds.builder_cls('scientific_papers')
hey everybody
I have coordinates in 2d space, which are actually gps coordinates in reality
Most of the points follow a good path, nothing crazy, but in some circumstances I get them like this.
Does anyone have an idea on what algorithm I should look into, if there's an easy or not so easy solution to this
do you have time associated with the points as well?
no, only the points
and the task is to recreate/determine what's the real path?
or the task is to remove points that are not on the real path?
I'm doing a bit of profiling of some numpy code, and one of the more expensive lines looks like this: x[:] = y[z] Is there anything that might speed this up - I tried using take: np.take(y, z, out=x) but this is actually significantly slower (for reasons I don't understand)
We'll need a bit more context, e.g. about the shape of your variables
and what is it for
x and z are matrices with about 10 million rows, and 15 columns, y is 1 d with about (700,)
It's part of the score calculation for a game.
Hmm, from my understanding, using fancy indexing is already quite performant
is there a way you can avoid operating on such large data?
I need to use this data - it's not prohibitively expensive, I'm just look around for potential improvements. (I don't get why take is more expensive tho')
Since the calculation for each row of the result is independent, there's potentially scope for parallelizing I guess - doing subsets in different threads
yeah, maybe you can try parallelizing the operation using Numba
the take thing is odd tho' because doing the fancy indexing and then assigning means allocating quite a big array, whereas take with out doesn't need anything allocating.
still not working sadly:(
was it any better at all?
Hello! Can anyone help me out? I'm getting a pretty bad r^2, negative in fact, but the graph where predicted values are plotted over test values looks very solid. How is this possible?
By the way, this is how Ridge regression graph looks in comparison to the Lasso one above:
does anybody know why we use sklearn.preprocessing.PolynomialFeatures for polynomial regression?
Because it allows you to perform linear regression methods on all possible polynomial features
@zealous badger
what's a polynomial feature though
If you have two features, a and b. then linear regression just uses features a and b. but if we want to look at quadratic features, then we also have (a + b)^2
So a, b and then a^2, b^2, ab (actually 2ab, but it doesn't matter for linear regression as long as we are consistent)
And also 1 to be able to have a bias
so we have 1, a, b, a^2, b^2, ab
ah i see
The same can be done for higher degrees
thank you :)
did you install the face recognition module?
getting error message on this df.groupby("offspring_recode").mean()
'DataFrameGroupBy' object has no attribute 'groupby'
i googled it but didn't find anything helpful
the error makes it look like df is already a groupby object
can you show more code?
okay! I tried a diff approach now and reloaded the code, not getting the error msg but something new: ```py
df.groupby("age")
def function (o):
if o == 1:
return o.mean()
df.groupby("offspring_recode").apply(function)```
The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I want to find the mean, and then plot the fraction of respondents who have kids (set it to equal one before) with respect to age.
but idk how to get there
i don't quite understand this, could you rephrase it?
I want to find the mean,
mean of what? do we need to group by some common attributes?
and then plot the fraction of respondents who have kids (set it to equal one before)
" set it to equal one before" - what is this?
with respect to age.
do you mean group by age?
there are two parts to the question. first one asks to calculate the average number of respondents within each age group who have children by calling .mean()
which is what i'm trying to do in the earlier codes
" set it to equal one before" - I created a column called offspring_recode where those with kids have a value of 1 and 0 otherwise
at this point it would be beneficial to show a subset of your data
this is the column offspring_recode: print(OKCupid_data["offspring_recode"]) 0 1.0 1 1.0 2 NaN 3 1.0 4 NaN ... 59941 1.0 59942 1.0 59943 1.0 59944 1.0 59945 NaN Name: offspring_recode, Length: 59946, dtype: float64
best to show OKCupid_data itself
this is column age: print(OKCupid_data["age"])
to me "calculate the average number of respondents" is just a weird request..
but anyway..
you want to filter the dataframe such that only rows with children remains, then group by the age group, do .mean aggregation after the group by.
Hey @dense yarrow!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
that's what i was trying to do and it gave me an error message
so i turned to function which didn't help either
it's calculating the average number of responded who have kids
hey can anyone advice me on how to plot a specific regression model from a list of models by writing a function? i posted my question here. https://discord.com/channels/267624335836053506/1073297973112479925
df[df['kids'] == 1].groupby('age_group')['num_respondents'].mean()
are you doing something like this?
df[df['kids'] == 1]
- filter the dataframe such that only rows with children remains,
df[df['kids'] == 1].groupby('age_group')
- then group by the age group
df[df['kids'] == 1].groupby('age_group')['num_respondents'].mean()
- do
.meanaggregation after the group by (on the number of respondents as requested)
thank you! what is num_respondents here?
i just assumed you have something like that..
you said
first one asks to calculate the average number of respondents within each age group who have children by calling .mean()
show me a screenshot of the dataframe, even if it's only the first few rows it will do, because i might have misunderstood what do you have
edit: i gotta head home now, so gonna be afk!
thank you for your help
is there pandas function that plots array with colors? π€
I mean, like in some spreadsheet, to make cells colorized
Anyone know how to imputate missing values? Basically what I understand is I have to use groupby and find the number of missing values in multiple columns but idk how the code should look
what you're asking about isn't plotting. take a look at this, though: https://pandas.pydata.org/docs/user_guide/style.html
well I don't run code in jupyter
i'd really appreciate any help on this. this is what i have but i know this is wrong OKCupid_data['missing_values'] = OKCupid_data.groupby('drugs')['drinks', 'smokes].transform('count').fillna(nasum)
hello, i would like to please ask would it be a good idea to do drop certain features from dataset and see if my SVM with rbf kernel performs better? The goal is to find the most important features
why is the index column showing less number of rows than what actually is
sorry for the late reply
the real path, already have it, but trying to remove the points that are noise
if you did something that caused some rows to be dropped, that doesn't change the numbering for the rows that were retained.
basically, dropping rows creates gaps in the numbering.
yeah i did exactly that
how do ireset index
.reset_index(drop=True)
.reset_index returns a new df
also, you should probably reset the index before doing an operation that depends on it
namely df.index % 3
true but reset didnt work
yes it did
I'll come back to this when my meeting ends
it might also be that you had duplicate numbers in the index before
yeah it worked actually
Anybody? :(
Yeah sure that's one of the ways to improve your model performance. You can use Variance Threshold method, Chi-square (SelectKBest), Recursive Feature Elimination (RFE) etc.
There's no best approach to imputation of missing value in a dataset. It all depends on the kind of dataset you are working with and the column that has missing data.
The code below will show you how many missing values you have in all the columns in your dataframe
df.isna().sum()
Now, how you decide to fill each column that has missing value is your prerogative. However, I hope the attached link would provide you more clarity
Hello, I never really did data analysis stuff before, but know databases, discordbots and a little matplotlib. I am interested in how raid performance in clash of clans is calculated. There is supposed to be a formula, but it is unknown. I have around 30k samples in my database with four input values and the actual performance I want to be able to compute. Can you recommend any concept, library, tool, guide or tutorial how to approach finding that formula or a close enough approximation?
Something like linear regression:
Y_performance = aX_var1 + bX_var2 + cX_var3 + dX_var4 + error
ah yes, regression looks like what I should dive into. Thanks π
Careful Sometimes some missing values are not NaN format, like they are filled with *, ?.
Hello I'm trying to get some help with reading a dataframe from a csv... I need to know how to reshape/pivot the table for columns in the field TITLE . Here's the link to the .csv - https://www2.census.gov/programs-surveys/acs/data/2021/CD118_Data_Profiles/ALL_CD by Nation/DP03_1yr_500.csv and some sample code I've tried. I tried marking up the screenshot for my desired pivot, but let me know if it's unclear. Note the column GEONAME isn't unique so I'd need to pivot it without making it the index...
i'm getting error message on this py OKCupid_data.groupby(['drugs', 'drinks', 'smokes]).isna().sum()
i want to know how many rows have non-NaN values in these columns. can anyone help?
please always show the error message, instead of saying that you got an error.
my bad. here's the error msg: EOL while scanning string literal
yep! just keep in mind for the future to never say that you "got an error" without showing the whole error message.
got it!
sorry, for some reason I thought "my bad" meant that you figured it out.
do you know what a string literal is?
no π¦
it's where you have the actual string right there in the code.
foo = "cake"
print(foo)
"cake" is a string literal, and foo is a variable that refers to it.
and if you forget the last quote mark, Python will just assume that your code that comes after it is part of the string
(you forgot the last quote mark.)
see?
ohhh
lemme fix it and try again
new error message: 'DataFrameGroupBy' object has no attribute 'isna'
try doing isna before the groupby
because isna is done on individual elements, regardless of what row or column they're in
or what other data is in their rows/columns
thanks! idk if it's right but the code ranπ
Would you guys reccomend split another set of data called the validation set? Assuming i have created a model from a train set and I want to fine tune it
after you collect the data, before you fit the model, you should always split it into three sets - one for training, one for testing, and one for checking how well it will perform with completely unseen data (the last one only to be opened for evaluating once, after you have the model 100% decided, with no changes after seeing how it performs on it)
in some cases, you can get by with just training&test sets, specially if you do not plan to actually put that model in the real world (e.g. part of a competition or the model is just for proof of concept)
at least splitting into training&test sets is completely vital for any supervised models though
tl;dr yes
comparing the model's score on training & test sets can help you diagnose a bunch of problems, specially underfitting / overfitting, as well as give you a slightly better notion of how well it'll perform on unseen data
(furthermore ; if you make too many tweaks to perform better on the test set, it'll end up """fit""" to the test set to some extent, which is why you may want that third set)
I feel like I knew how to do this but I can't figure it out,, i have to write a function to make new columns. here's the instructions: "if s is a Series, then s==<number> is another Series whose entries are True when entries of
s are that number and False otherwise"
here's my function : def sign_function(s) if s = True when s = 1
!e uh, I mean ```py
import pandas as pd
se = pd.Series(range(3, 8))
print(se)
is_equal_to_five = (se == 5)
print(is_equal_to_five)
@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | 0 3
002 | 1 4
003 | 2 5
004 | 3 6
005 | 4 7
006 | dtype: int64
007 | 0 False
008 | 1 False
009 | 2 True
010 | 3 False
011 | 4 False
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/tokuxerije.txt?noredirect
did you mean that or you're asking how to assign that to a df?
i think it's about assigning it to a df
adding a new column can be done the same way you would assign a value to a dictionary
(side note: when editing an existing column you may want to use .loc / .iloc to specify both which rows and which columns to edit)
create 12 new columns. The new columns should contain a 1 for observations with the corresponding astrological sign and a 0 otherwise. this is what i have to do
we didn't really talk about dummy variables in class. this is mostly for learning data cleaning methods and stuff
i think this is supposed to be really simple, i just can't rememeber how to write this function
Did you try mode='wrap' or mode='clip'? The documentation says that these have better performance than the default mode of 'raise'.
Also, depending on the next thing you're going to do with the data, you may be able to replace the assignment with the reduce method of a ufunc. Or you may be able to take out the assignment to x entirely.
If you already have the real path, then you could try measuring the distance from a point to the nearest location on the path. If that's too large then that suggests the point is noise.
can anyone help me with this? I want to create multiple columns using a for-loop. this is the function I have (not sure if it's written correctly):
def sign_function(s):
if s == 1:
return True
else:
return False```
here are the instructions: Loop through your dictionary of astrological signs (already have this). For each astrological sign, create a new column that checks whether sign_recode is the corresponding number. This new column will contain True and False entries, so vectorize int() over the Series to turn the boolean variables into 0s and 1s.
@dense yarrow this is with pandas, right? You're adding columns to a DataFrame?
yeah!
When you're using pandas, assume that the solution will not involve a loop and will not involve iteration
the instructions say to use a loop
: /
Who wrote them
my instructor
Can you drop the course and get a refund?
hahaha
I'm not joking though.
i've we've been using for-loops for a lot of things with pandas
What is the context for this course?
why is it not okay? I'm a beginner i'd love to know
we're learning data cleaning and basic data visualization
for non-coders
The point of pandas and numpy is to do operations on the whole data as one thing
ahh
so vectorize int() over the Series to turn the boolean variables into 0s and 1s.
this isn't how you would do that. you would doastype(int)
your instructor is doing you a disservice.
can you print the astrological signs dict and the sign_recode columns, and show the text?
@dense yarrow
yeah one sec
i cannot copy paste the result bc the file is too large
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
I will only accept text, so please put the text in the paste bin as soon as you can.
you have to save it and give the link
@dense yarrow my instructions were "can you print the astrological signs dict and the sign_recode columns, and show the text?" -- please do a new paste bin
thank you for being patient with me
so you want an aries column, and a taurus column, etc?
0 6.0
1 7.0
2 NaN
3 3.0
4 2.0
...
59941 NaN
59942 NaN
59943 NaN
59944 NaN
59945 NaN
idk what this is called, but let's assume it's df['sign_num']. do print(df['sign_num'] == 6) and look at what you get.
I guess it's df['sign_recode'], but idk what your df is called (df or what)
this is what i get https://paste.pythondiscord.com/evupojoxug
can you tell what the Trues and Falses mean?
true when the value is 6 and false when it's not?
ya
because pandas is about doing operations to the whole data. so with == 6, you get the answer for every element
ahh, yeah
There are two common ways to iterate over a dictionary in Python. To iterate over the keys:
for key in my_dict:
print(key)
To iterate over both the keys and values:
for key, val in my_dict.items():
print(key, val)
for key, val in my_dict.items():
print(key, val)
this part will come in useful, if you insist on looping.
also, adding a new column to a df is like assigning a value in a dict.
anyway, see how far you can get with this. I need to take a shower irl
(I was walking home from the gym when this started. gains.)
you need to use that astrology dict.
why are my Dense Outputs so large? It is causing issues when I try to put a large number through softmax and it tries e^1000 which returns NaN. My weights are initialized use a gaussian distribution, my biases are initialized at 0, and my inpuits are range -1 to 1
Dense output is just calculated by inputs * weights + biases
could i have some clue on how to solve this?
i might be wrong, but maybe something along the lines of: q learning optimizes the expected value of the total reward while SARSA only looks at the current state. so it makes sense for q learning to take a shorter path
hm i got to the same result but for a diff reason
sarsa looks @ the next action, which in e-greedy can sometimes take you into a red box
so it'll play it safe and go the long route
how would it know to "play it safe" though if it only has local knowledge
fair point
but
SARSA only looks at the current state
in the lecture i'm looking at, sarsa is updated like so:
Q(s, a) = (1- x)Q(s, a) + x(r + Q(s', a'))
(x is alpha)
well, 2 states. the current and where it would end up after taking an action
i mean it technically has map-wide knowledge through q though?
also uh q-learning is updated like this:
Q(s, a) = (1- x)Q(s, a) + x(r + max a' of Q(s', a'))
aren't the Q values the payoffs for each state action pair?
yeah
in the expressions you wrote, Q is evaluated at s, a. that means all you know is Q at one point, not the overall state
fair enough
going back to your argument, what does "expected value of the total reward" vs. "only looks at the current state"
could you possibly elaborate on that a lil?
i wouldn't know what else to say
q-learning considers the overall reward, so the average of the sum of all the Q values along it's path
did you uh see this?
we want the reward to be as big as possible, and in this example, all rewards are negative
so it picks a shorter path
looking at the 2 formulas i have, they both have access to essentially the same info
it's just the last a'
pls
i'm trying to look for an alternative explanation
so the big difference is the presence of that "max" there
the best action is not necessarily the one taken by the agent
that's the big difference
Can anyone tell me the use of face_locations and what output it give ??
and also what arguments cv2.rectangle takes
faceLoc = face_recognition.face_locations(img1)[0]
cv2.rectangle(img1 , (faceLoc[3] , faceLoc[0]) , (faceLoc[1] , faceLoc[2]))
(26, 187, 78, 135)
arr = df["change"].astype(str)
print(arr.dtype) # object```
how do i convert it to `str`
why are my Dense Outputs so large? It is causing issues when I try to put a large number through softmax and it tries e^1000 which returns NaN. My weights are initialized use a gaussian distribution, my biases are initialized at 0, and my inpuits are range -1 to 1
Try a lower standard deviation @violet gull
why
Because the weights will then be closer to zero, so the magnitude of the output will be lower too
But with only 4 nodes it shouldn't be this high anyways if inputs are between -1 and 1
So maybe something is wrong in the calculation
The softmax output is wrong too, since softmax should sum up to 1
it does sum to 1
For the second one they are all E-24 or less
no the 3rd slot is 1.0
there's a 1 in there π
so we dont know whats wrong with it
If you have inputs between -1* and 1, and weights gaussian with std 1.0, then in worst case scenario all weights are like 4, and all inputs are 1, so then you have an output of 16
the weights arent -1 to 1
-1, but same idea
they arent that either
Oh sht, meant inputs
ye
But basically that, it shouldn't really be possible to get those kinda outputs with that initialization and scaling of inputs
so what do i change
Did you make the dense layer yourself?
yeth
Probably miscalculated something I suppose?
i dont think so
You just do a dot product like weights @ inputs?
yeah i can find the actual code if needed
i just want this thing to work
Well that is literally just the whole forward pass, a dot product between the weights and the input vector
And then the activation
Which is softmax, but that seems to work fine in your case
Well like I said, if your inputs are between -1 and 1, and the weights are sampled from a gaussian with mean 0, std 1.0, it's basically impossible to get those outputs
So either the inputs are not scaled correctly, the weights are not initialized correctly, or the forward pass isn't just a dot product
Check all of those
i have test cases implemented for most of these and the tests all pass
π everyone is afraid of neural nets
Nah, there's plenty of people knowledgeble here.
It may take a bit for them to wake up though, it's 8 am in EU
im genuinly been asking this same question for days
you the first person who has responded
When I get home I could help if you haven't fixed it yet
ok i appreciate it β€οΈ
But that will be in a few hours, I really have to go
ok ping me whenever ur available
how about tflite?
also regarding your dense layer, what's the cost function you are optimizing them with? if you enforce no constraints, the values the dense layer takes can really be anything
catagorical cross entropy
aight. well yeah, there's no reason in general why the parameters would have to be small π
wym?
nothing inherent to the network or the optimization problem will prevent the parameters from becoming arbitrarily large
you either add extra constraints to the problem or try to circumvent the issue somehow
a nice trick is that you can subtract an arbitrary constant from the softmax argument
it should be the case that softmax(x) = softmax(x + c) for arbitrary c (x is a vector here and c is a scalar)
so you could subtract a c such that all the x values are negative
then you won't get float overflows, but instead some of the entries may be softmaxed down to 0
does that make sense?
do i need to change anything in back prop?
no
wat
hmm?
lemme make a minimum working example
!e
import numpy as np
def vanilla_softmax(x):
return np.exp(x), np.exp(x)/np.sum(np.exp(x))
def shifted_softmax(x):
c = np.max(x)
return np.exp(x-c), np.exp(x-c)/np.sum(np.exp(x-c))
x = np.array([10,60,100,4])
e, s = vanilla_softmax(x)
print(f"numerator: {e}")
print(f"softmax output: {s}")
e, s = shifted_softmax(x)
print(f"numerator: {e}")
print(f"softmax output: {s}")
@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | numerator: [2.20264658e+04 1.14200739e+26 2.68811714e+43 5.45981500e+01]
002 | softmax output: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
003 | numerator: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
004 | softmax output: [8.19401262e-40 4.24835426e-18 1.00000000e+00 2.03109266e-42]
oh
here you go @violet gull . see, the output of the softmax is the same, but this computation only involves numbers that are each <= 1
and this will allow me to put in any number of inputs into the dense?
that's because the + c can be factored out and cancels out
that has nothing to do with it
wat
this will just prevent the softmax from overflowing when the values are large
so yes
what does the number of inputs have to do with that?
the big numbers only exist when the number of inputs is large
that's not in general true
my results says it is
all you have is a small number of experiments
the only valid result here is computing upper and lower bounds for the cost function and its gradient
have you computed the lipschitz constant of the gradient of the cost?
no
or done any other sort of analysis on the function
no
ok
it is often the case that magnitudes of gradients and distances are larger in higher dimensions, but this does not necessarily translate into higher weights in the dense layer
that depends also on the properties of the cost function
Softmax output: [0.0, 2.145833396191962E-129, 1.0, 0.0, 6.681095513076006E-290]Loss: NaN```
this is with 1 million inputs ^
1 iteration
mhm, so?
Softmax output: [2.0499549890080238E-47, 0.9999999999966144, 1.2861755034666737E-52, 1.2367781133861323E-14, 3.3732857498361036E-12]
Loss: 3.385625113600146E-12```
this is 10000 inputs
the dense outputs in the large input are massive but the dense outputs in the smaller input are more reasonable
sure, in this particular case
thats the problem
still it's no proof or guarantee π i also can't say whether it will work for arbitrarily large inputs because that may bring other problems
all i can tell you is that specific evaluations of softmax won't overflow
im not understanding
you can either test and see if it works for you now, or do the math and show whether it works in general now
numerical results are not a proof
it shows a trend
but not a proof
that hasnt been wrong once
it is a fact that as I put in larger and larger amount of inputs, the dense outputs become larger
leading to e^really big number error
well i cannot address this part because it requires doing some analysis on the overall function
but this we can easily fix
with the softmax function yes
I have not really read this whole conversation, but consider changing your weight initialization scheme.
but that just seems like a bandaid on a lareger issue
multiple people have said the outputs im getting should be impossible
its gaussian distribution
what are we calling outputs here? the output of the dense layer?
yes
Are you using a well known named weight initialization scheme?
yes
normal distribution'
Does you initialization take into account the number of inputs / outputs?
no
That's probably the problem.
The mode didn't make a significant difference. I wonder whether the difference is to do with the memory layout of the out array
Choosing exactly which scheme will require analysis like Edd mentioned, but you can try something not that great like simply scaling by number of inputs to start.
why does rolling().apply() get me float altho the column is str
The next thing I do with the output is an argsort
Hello, can anyone help me out with this issue?
I have the following series'
s = pd.Series([0,1,'random',2,3,4])
s2 = pd.Series([5,6,7,8,9,10])
How can I use s.mask to return a series where every even number in s is replaced by s2, and elements in s that can't get evaluated per the condition get ignored (e.g. 'random')?
I tried this which gave an ValueError: Array conditional must be same shape as self
def is_even_if_is_number(x):
if isinstance(x, int):
return x % 2 == 0
return False
s.mask(lambda x: is_even_if_is_number(x), s2)
I want an output of this
0 5
1 1
2 random
3 8
4 3
5 10
can we use a combination of fixed and flexible backbones? for a model where i am working with 3 modalities(text, video, audio)?
i was thinking of making audio fixed
i also didn't know the answer to this originally.
error ValueError: Array conditional must be same shape as self is an interesting one.
my approach to this is to use pdb to inspect what's going wrong.
final bit of the stacktrace is
File ~/.virtualenvs/poly/lib/python3.9/site-packages/pandas/core/generic.py:9052, in NDFrame._where(self, cond, other, inplace, axis, level, errors)
9050 cond = np.asanyarray(cond)
9051 if cond.shape != self.shape:
-> 9052 raise ValueError("Array conditional must be same shape as self")
9053 cond = self._constructor(cond, **self._construct_axes_dict())
9055 # make sure we are boolean
ValueError: Array conditional must be same shape as self
upon entering pdb and checking what cond is, we see cond is just array(-1) which doesn't seem right, this plus the error message points to the first argument being wrong.
and reading more into the docs of mask, i see you are supposed to pass in a conditional series with the same length (duh once you re-read the error message)
so, the correct thing to write is probably s.mask(s.map(is_even_if_is_number), s2).
can some help me pld
Just ask the question
ahhh when you pass in a callable, x isn't each entry of the series, it's the whole series.
aye, what you have done is same as s.mask(is_even_if_is_number(s), s2), which is slightly wrong.
Hello. I need help.
So here's a citation of my codes:
for c in range(len(centroidIdX)): #catat rekaman posisi centroid sebelumnya centroidRecordX=centroidIdX.copy() centroidRecordy=centroidIdY.copy() x,y,w,h=cv2.boundingRect(cnt) cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2) #gambar centroid cx=int((x+x+w)/2) cy=int((y+y+h)/2) centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c]) if centroidDistance<20: centroidIdX[c-1]=cx centroidIdY[c-1]=cy else: centroidIdX.append(cx) centroidIdY.append(cy)
However, it returns IndexError:
~\AppData\Local\Temp\ipykernel_13912\3650250176.py in <module>
79 cx=int((x+x+w)/2)
80 cy=int((y+y+h)/2)
---> 81 centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c])
82 if centroidDistance<20:
83 centroidIdX[c-1]=cx
IndexError: list index out of range```
What do you think? Does variable 'c' try to access an index position that does not exist at all? Which variable then?
"In practice,
R2
will be negative whenever your modelβs predictions are worse than a constant function that always predicts the mean of the data."
Might be the case of overfitting. Adjust the parameters/feature selection on the model or the train test split distribution.
My guess is that centroidRecordX/Y and centroidIdX/Y are not in the same length.
Solved. Just a silly typo.
Yeah i know. But i was wondering how those 2 variable pairs have different length, that's all.
Oh right my bad you used copy, yeah one of the y should be in lowercase
Another question. Here's the code now:
#catat rekaman posisi centroid sebelumnya
centroidRecordX=centroidIdX.copy()
centroidRecordY=centroidIdY.copy()
x,y,w,h=cv2.boundingRect(cnt)
cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
#gambar centroid
cx=int((x+x+w)/2)
cy=int((y+y+h)/2)
cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
centroidDistanceX=centroidRecordX[c]-centroidIdX[c]
centroidDistanceY=centroidRecordY[c]-centroidIdY[c]
#centroidDistance = math.hypot(centroidRecordX[c]-centroidIdX[c],centroidRecordY[c]-centroidIdY[c])
if centroidDistanceX<20 and centroidDistanceY<20:
centroidIdX[c-1]=cx
centroidIdY[c-1]=cy
else:
centroidIdX.append(cx)
centroidIdY.append(cy)
car=len(centroidIdX)```
From this code, i'm expecting centroidDistanceX to finally show a non-zero value, but instead:
```Frame ke- 16
CentroidIdX: [256]
CentroidRecordX: [160]
CentroidDistanceX: 0```
What's happening here?
Isnt centroidIdX[c] and centroidRecordX[c] the same
No.
Because you make a centroidId a copy for centroidRecord. Thus, centroidIdX[c]-centroidRecordX[c] in the loop equals to zero? I feel there is missing code here
Yeah, but then...
...I should have assigned new values for centroidIdX/Y...
Oh right.
I'm missing codes.
hey can somebody help me w this?https://paste.pythondiscord.com/iserokilux its a faciel recognition code thing
I don't think you wrote this yourself, but can you explain what you think this is doing? Because... it's a whole lot of nothing if its even able to initialize your Webcam.
haha its a chat gbt project
should have mentioned that
its essentially supposed to be like a machine learning face recognition thing
@hidden mist
How would you define your Python knowledge?
Yeah, start with something a little lower reaching.
Don't be rude, I'm under no obligation to assist you.
ik
but u typing and it takes 4 ever
im a beginner and I got recommended chatbot by many ppl
same
I'm trying to think of a polite way to say that what you're doing and the things you're asking require an explanation in a depth that I don't think you'll understand-- so essentially it just boils down to 'I want someone to write this for me.'
haha yes i understand all im saying is that you were constatally typing for 4 mins to say like 7 words
I know this is a wild concept, but sometimes I consider the implications of what I'm typing before I press the enter key.
idk what that means lol
Start with the basics, hello world and the like.
Build up to implementing machine learning and modeling in TensorFlow.
Ah, well good luck then, hopefully someone can help you out! π
this a chatgbt thing
what coding language are you using for the chatgbt thing?
Hey guys
Im a beginner in machine learning so my question might be pretty dumb
I tried searching the solution online but i couldnt find anything useful
model = Sequential()
model.add(Dense(10, input_dim = 7, kernel_initializer = 'normal', activation = 'relu'))
model.add(Dense(6, kernel_initializer = 'normal', activation = 'relu'))
model.add(Dense(1, kernel_initializer = 'normal'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
model_history = model.fit(X_train, y_train, batch_size = 7, epochs = 100)
model.summary()```
I wrote this code and I got this error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).```
How do I solve this issue?
holy jesus im a beginner in python and this overhelms me
hey dont be rude
i know...
Who is that?
idk
That is actually mean...
to the person you are referring to
panda guy
like i never even created a numpy array how did i get this error im so confused
oh
bruh marlie can't even write a word in 1 minute π€£
but can you help me solve my question first
he wrote for so long and sent nothin
this one
im rly confused
u ppl get offended by being called "smelly"or meanie pants"
like discord ppl
i broke my best mates arm on purpose and he dident care lol
<@&831776746206265384>
this is an educational server and not a random server. I believe it is proper edicates to behave correctly in correct server. For eg., in a goofy ahh server be goofy ahh and in an educational server just be respectful to others
yh
umm can you pls help me
its not been"mean"tho
!mute 1010275447189287003
:incoming_envelope: :ok_hand: applied mute to @meager ocean until <t:1676046609:f> (1 hour).
!mute 1010275447189287003 
:x: According to my records, this user already has a mute infraction. See infraction #85703.
Thanks cheeki π
p e a c e
anyone?
Excuse me, it's quite vague - I know nothing about programming and want to learn python to become a data analyst - how do I start? I downloaded anaconda and got jupyter notebook, heard it's a good place to start
the indian youtube tutorials are a good start
they explain everything clearly and in a fun way
https://wesmckinney.com/book/
Start here.
oh this is good, thank you
I think you should post a help request in #1035199133436354600
can anybody pls tell me how to fix this
I didn't knew somethin like this existed
https://www.freecodecamp.org/learn/data-analysis-with-python/#data-analysis-with-python-course Someone recommended me this, any opinion on this?
freecodecamp is very good
not only for data analysis
Wes McKinney is the original author of pandas, and he explains things in a level that is excellent for beginners broaching the subject.
oh
but is it good for someone who has NO knowledge of programming at all? or should I start somewhere else?
Like I don't even know if I should use anaconda or pycharm, I'm at this level of knowledge
He gives you a precursory introduction to Python in Chapter 2, as well as making some recommendations himself on other resources.
You can audit Harvard's Data Analytics classes for free as well.
https://www.edx.org/course/cs50s-introduction-to-programming-with-python
https://www.edx.org/course/introduction-to-probability
Is there a difference between the open edition and the physical copy? I'm from Poland so it's about 4times as expensive as it is for an average american
Thank you
I have the physical book, it's almost a direct copy.
What a good guy, so he's offering it for free but also allows other people to support him by buying the book?
This is a copy paste from a slack post but if you google any of these you're likely to stumble upon the free edition
SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (This is free linked above.)
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications
are these all books written by him?
They're all O'Reily books, but no they're not all by Wes lol
Also last question please:
There are quite a lot of things to learn for data analysis
Python
sql
excel, kind of
probability and statistics
power bi/tableau
Would be pretty weird if Wes McKinney was writing R books, but I have no doubts he's familiar with the language.
excuse me pls help
it gets quite convoluted and I'm not sure where to start, does it matter which one I start learning first? Is it fine to start with python first?
My recommendation is to start with the Stats 110 (Introduction to Probability) course I linked above. Having a strong grasp of Statistics is going to be very beneficial to you.
https://www.youtube.com/watch?v=KbB0FjPg0mw&list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo
Here's a playlist of the course lectures on YouTube.
We introduce sample spaces and the naive definition of probability (we'll get to the non-naive definition later). To apply the naive definition, we need to be able to count. So we introduce the multiplication rule, binomial coefficients, and the sampling table (for sampling with/without replacement when order does/doesn't matter).
I'm having a statistics course at the moment so it's not like I know nothing about it, I just need more revision
my probability calculus sucks though
If you feel you have a strong understanding of statistics, I'd start with learning the more advanced actions in Excel (PivotTables, Macros, etc.) So you always have a 'final destination' for data where, regardless of transformations in Python/R, you can manipulate it into something workable.
Then maybe pick up some basic SQL syntax, you can practice on BigQuery with open datasets. https://cloud.google.com/bigquery
Then move onto either R or Python to learn how to actually manipulate your data in a programmatic sense.
There's a lot, but I kind of get the picture
Once you've nailed that, start looking into some of the more advanced concepts of PowerPoint, Tableau, Looker, etc. So you can actually turn your data insights into something valuable for stakeholders.
Remember your resources as well. Google Datasets https://datasetsearch.research.google.com/ and Kaggle https://www.kaggle.com/ will be your principle resources for education, information, and insight.
Kaggle is the worldβs largest data science community with powerful tools and resources to help you achieve your data science goals.
Kaggle is especially useful once you get some of the base concepts under your belt, they have competitions both currently and in the past that give you real problems to solve.
I heard a lot about kaggle, will definitely try that later down the line
But I'm really a baby bird at this point
Well I gave you some fantastic starting resources, dive in π
one more thing
wes kinney's book, is it good for someone who has 0 I mean 0 knowledge about python or do I need to learn some basic elsewhere first?
If you get to a point where the concepts are not making sense, he makes some recommendations in the book on building up that knowledge.
So... yes.
Even if the book itself doesn't address something, he's going to point you somewhere else that does.
Perfect, thank you so much
Feel free to poke me down the line if you find yourself lacking resources or a real direction to travel in. I gave you a pretty comprehensive roadmap on how I'd approach it, but it's not foolproof.
Good afternoon i just saw this channel which is very applicable to my current situation, I'm here with a 4090 and would like to utillize the true power of my GPU, it right now seems that programs are more or less depending on my CPU which is also strong but not as strong as my GPU ofcourse, I often have about 15 seconds of CPU 100% usage and then 5 seconds of GPU usage 100%, is there anyway to bring it all over to my GPU? I am currently using PyTorch.
If TensorFlow is better please also let me know as I'm still a beginner and want to learn more about what modules are good etc, i am personally coming from javascript but that's more or less a dead end with deep learning which is why I'm now learning python.
Excuse me could you help me too please
If you've got CUDA installed and properly configured, PyTorch and TensorFlow will both take advantage of your GPU to (generally) the best of their abilities, provided the functions you're calling are supported. I was under the impression that PyTorch wouldn't let you allocate a tensor and operate on it with CUDA if it was allocated to the CPU, but it seems that's incorrect, so perhaps you'll get a bit more information out of the documentation than I'm able to provide.
https://pytorch.org/docs/stable/notes/cuda.html
I have in fact properly installed it and it has confirmed with me that it is by using torch.cuda.is_available() which returns true
I first had some issues but I had found out cuda 11.8 was a requirement
That doesn't strike me as remarkably surprising for native behavior, and frankly I think whatever you ended up with falls in line with PyTorch's best practices. I'm not an expert by any means, but you're supposed to be device agnostic, so you should be addressing scenarios where a CUDA-device may not be available.
To resolve that to something that's a little more digestible, explicit optimizations need to be made regarding synchronization and the stream output from the CUDA device in general to ensure that your models are accurate when they're passed between environments. I wouldn't think that those optimizations are incredibly worthwhile for hobby applications-- there's a good chance that you're utilizing your GPU's CUDA cores for the demanding processes regardless by virtue of the way PyTorch is written.
Oh okay so I guess my CPU will just be utilized at its fullest aswell
Thank you
I'm going to be doing some more testing it's a lot of fun at least it is right now but my friends have told me the further you go the more painful it gets as more small issues occur which will be very hard to debug.
You can try asking again in a little bit, my knowledge on PyTorch/TF is tenuous at best, and attempting to explain something I don't have a comprehensive knowledge of is... difficult. lol
That isn't to say I think I mislead you, there just might be a different or easier to implement solution that I'm unaware of.
Ah yes no worries thanks a lot for your time and I will look further in the pytorch docs as they should say about anything I could need. My code right now is functioning, and it runs so I got the time to do other things while it's training and hopefully along the way I find a way to fix it but this may very well be the expected thing to happen and I just don't know, especially because my GPU is so powerful my CPU may not be able to catch up this quick. Will look into it and possibly upgrade where needed I guess as this is probably going to be a new chapter for me as developer.
I saw the resources you and some others sent above and I will also be taking those in use as I really am very interested in this and it seems to be the future.
It's kind of like this with GPU dropping to 10%~ every few seconds and CPU to 100% every few seconds
If someone knows if this is expected please do let me know
Hello. I need some help.
So i have been trying to construct a traffic-counting program using openCV object tracking method. Well, i'm still a beginner so don't expect this code to use some advanced functions like math.hypot().
The problem i encountered is, i successfully implemented object detection using contours, and i pretty much get a hang out of it. But the object tracking code i've been working on still has this general issue: The vehicle counter rapidly increases up to a damn 500 every second, whenever my program detects more than one instance of vehicle. Here's the code:
#pada frame pertama
if frCount==1:
#gambar bounding box
x,y,w,h=cv2.boundingRect(cnt)
cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
#gambar centroid
cx=int((x+x+w)/2)
cy=int((y+y+h)/2)
cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
#tambahkan id centroid dan lokasinya
centroidIdX.append(cx)
centroidIdY.append(cy)
else:
for c in range(len(centroidIdX)):
#catat rekaman posisi centroid sebelumnya
centroidRecordX = centroidIdX.copy()
centroidRecordY = centroidIdY.copy() x,y,w,h=cv2.boundingRect(cnt)
cv2.rectangle(frame,(x,y),(x+w,y+h),(0,255,0),2)
#gambar centroid
cx=int((x+x+w)/2)
cy=int((y+y+h)/2)
cv2.circle(frame,(cx,cy),4,(0,0,255),-1)
#posisi baru centroidId
centroidIdX[c]=cx
centroidIdY[c]=cy
centroidDistanceX = centroidRecordX[c] - centroidIdX[c]
centroidDistanceY = centroidRecordY[c] - centroidIdY[c]
if centroidDistanceX<60 and centroidDistanceY<60:
centroidIdX[c-1]=cx
centroidIdY[c-1]=cy
else:
centroidIdX.append(cx)
centroidIdY.append(cy)
car=len(centroidIdX)```
I've been watching many YouTube tutorial about object tracking fundamentals but none seemed to work. Maybe any of you can help me with this?
Hey @simple fossil!
It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Not familiar with this in general, so going to speak in the abstract, someone might have better intuition on how to implement this. But here goes:
What's happening is, since your centroid's are determined by your bounding boxes and you're drawing multiple bounding boxes, if those centroids are outside of the range (centroidDistanceX<60, etc) they're incrementing your counter. Since every instance of centroid being outside the bounding box will result in incrementing the counter, the centroids for subsequent vehicles also fall into that category.
Without more distinguishing information such as... looking at relative color/shapes observed between points, your solutions could be to rate limit them based on how far you think vehicles might travel in that time. It's not foolproof, but cars will never move more than once per frame, so there's no reason to do that calculation more than once per frame.
And you never need to calculate the distance between more centroids than were drawn in the previous frame. (That is, if I have two vehicles in one frame, and three in the subsequent frames, I can rate limit it by only ever counting the first two instances, which will now have a distance associated with each of them.)
If you're simply interested in parameterizing it as "I want to count vehicles." then job done. If you want to count moving vehicles and be a little more precise, you're looking at something like... looking at the first frame, looking at the second frame, looking at the third frame, and assigning a line between those three objects that indicates the path you anticipate that center point to be moving. If a centerpoint continues to move along that line within your bounding box, you can safely disregard any subsequent occurrences of that centerpoint until it's traveled off screen.
The longer you wait to increment the counter, the higher your confidence can be that you only counted one car-- if your line is of irregular shape as it would be if you were counting two opposing lanes of traffic (it would zig-zag) you can safely disregard that line, and your confidence increases that you've drawn the trajectory of the vehicle itself as you add more points to the line (two points could indicate that you drew a line from one lane of traffic or oncoming traffic to another and would be perpendicular or at least not parallel with the line of actual travel.)
Object tracking, as you've discovered, is very simple. Unique object tracking is incredibly difficult.
Another thing to consider is that a car will likely never go backward in a frame, if it's on a trajectory, it's probably going to remain on that trajectory. You can leverage this to your advantage when it comes to vehicles with similar or identical centerpoints traveling along the same axis.
Hey I am new to data science field. I am confused from where should I practice python questions for data science.
Can anyone help me?
And also if you have any pdf of python coding examples... please share
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Hi everyone i want ask to apply LSTM for predicting stock trend but im still confuse how. Thank you very much.
Hi everyone, any Vertex AI pros here?
Always ask your actual question. Don't ask to ask.
Try being more specific
I am partially on the same route
Guys, I'm trying to make a Reward Model to assign rewards to a Reinforcement Learning network. I think this reward model might be a good way to provide continuous rewards instead of dense rewards.
Question is...if each reward follows a mathematical logic(a math function), then might model will be able to deduce this function and output it without directly calculating it, right?
However...if my reward is something more subjective/relative(in situation A, such action could provide a reward +1. In situation B, the same action provides reward -3), then how could my reward model be affected?
I know that OpenAI even used a reward model that classifies actions from worse to best in order to provide a reward and mitigate subjectivies, but I don't think I'll be able to use this here...
Unless...maybe, if I used classes like "awful, bad, neutral, good, excellent" and its indices as rewards for each situation...
Anyone online rn? Need some help
Don't ask to ask. Just ask
(I'm in class though.)
Gotcha xD
So I got csv file which has 8columns
I want to calculate speed for it
Ik it's distance/time
I got the value of time it's 10
Now I want to calculate distance
How do I go upon doing that?
I got frame values
Position values
Euler angle values which all comprise of 8 columns
Any help would be helpful!
Show the csv as text
One sec
Has anyone here ever worked with point clouds and ICP algorithm ?
Rx Ry Rz are the euler angles
Tx Ty Tz are the position values
The frame is 10milliseconds apart
@serene scaffold π
I'm in class. But you have to ask a question to get help. No one will ever commit to answering if you just ask "is anyone online"
Same with "does anyone know about x"
This looks more like a math question than a programming question lol.
Mb
No no
This is a csv file given
I want to write a code in python
Like write a function to calculate speed of the vehicle
This is the data thats given
It's still a calculus question, from what I can tell. Which is fine
Ya just want to implement a code which let's me calculate the distance
Cause I have the time value =10
After getting the distance value i:ll just divide it by 10
But getting that distance values for x y z need some help in how to write it
Can you give a sample of the csv?
I... should've saw that coming. Hold on I'll just make it myself with dummy values lol
XD
@wanton stone please always do text as much as you can
Ya mb on that
Actually trying to implement this without a comprehensive understanding of the math behind it is proving to be a little more difficult than I anticipated.
You want to convert that into a pandas data frame.
import pandas as pd
isuckatnames = pd.read_csv('locationofthe.csv', skiprows=lambda x: x in [0,5])
# We stripped the column names off, let's add them back.
isuckatnames.columns = ['frame','subframe','rx','ry','rz','tx','ty','tz']
# We can probably index by frame now.
isuckatnames.set_index('frame')
print(isuckatnames.head(10))
From that point you can act on the data as you see fit, I can help you if you can write out what you want to do math wise to the entries themselves.
That's a strangely arbitrary restriction. Can you give me a hastebin of the csv file?
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
Is it possible to make a pyhton chat bot that can gain feelings ?
I can't devote 30 minute blocks of time to waiting for your replies to assist you.
You'll use numpy.loadtxt (docs here https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html#numpy.loadtxt) to pull your CSV into a matrix, and iterate through the index on that matrix starting at 1 instead of 0 so you don't out of bounds error when you subtract 1 from that value to get the previous value.
From my brief googling trying to understand what you were doing, this is a fairly regular question. StackOverflow here: https://stackoverflow.com/questions/20184992/finding-3d-distances-using-an-inbuilt-function-in-python
Hello I'm brand new to things like python etc so I'm sorry for what might seem like a dumb question to python and need to make a knn run in real time with a microphone is this possible?
Hey @deep spire!
It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Sorry about that
Was in class the whole time so was struggling to send a reply mb
I took and refined this from tutorials tho.
Hello folks I have a question for those that are AI, data scientist, machine learning engineers
no one will answer until you've asked your question.
What type of math is needed for AI, data science, machine learning? From what I know stats is essential
stats, calculus, linear algebra, and discrete math.
@serene scaffolddo you have a degree in mathematics?
my bachelors is in computer science; I'm pursuing a masters in it now.
(it being computer science)
ahh ok
The reason I asked this is because am not sure whether pursue a masters in CS or Mathematics/Stats/Applied Mathematics@serene scaffold
Hello guys, I have a question: Why I get a different result when I run the model with metrics parameters on compile the model and without metrics parameters?
There's a fair bit of overlap. If you want to lean into the analytics, a strong foundation in math is essential. If you're trying to build-a-bear your machine learning program, you'll end up utilizing a very good portion of both math and programming. If you're sticking more to pre-built ML packages like TF/PT/etc., or if you're looking to bark more up the Data Engineering tree, you'll end up relying a lot more on your CS/programming fundamentals.
At least that's my take.
Shouldn't we not necessarily set metrics if we have setting the loss parameter?
what is your bachelors in?
I agree.
The reason I have trouble deciding on one is because well I think anyone can learn CS on their own and this goes for mathematics as well but its much harder to learn mathematics on your own or at least for me it is (I'm not the brightest lol). In my particular case, I think I need guidance from professors or a program overall, not sure if this makes much sense lol
If you ask a programmer, they're going to say you need math. If you ask a mathematician, they're going to say you need programming.
I can say confidently that I'm so incredibly far out of my depth even having done college statistics when it comes to some of the raw math behind the modeling of machine learning.
I considered elaborating on that further but I'm going to leave my math-confidence alone and avoid kicking myself while I'm down 
lol exactly
@hidden mist I may not know much but if you know how to program in this life you can walk but if you know both programming and mathematics, you can basically fly
Yea I know, its a weird analogy lol
I'd say that's fairly accurate honestly. I switched majors from CS to Chemistry when I was 19, and really my only requirements for both were some basic math classes. Nothing as advanced as I think you genuinely need to take advantage of in ML applications. Now that I'm going back to school, I don't have to take any math classes for Cybersecurity. And while I realize the applications of CS and Cybersecurity are very different fundamentally, I can still see a lot of benefit in having a strong math background in any IT/CS or programmer-related field.
But why I get different result like this? I mean when I just set loss parameters (without metrics) the result is only showing for model 0
This is the code that I've created without metrics parameters @deep spire
I have a mini project in ML to do in one month. But Im new to this. Recommend me any best courses on the basics and intermediate levels on ML that you guys know of
But do you know the reason why for my case?
But when I only use metrics = ['mae'] the result is still same like this.
please help
i have a very basic polynomial regression program here
from sklearn.preprocessing import PolynomialFeatures
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv("hiring.csv")
model = PolynomialFeatures(degree=2)
model.fit([dataset.level], dataset.salary)
plt.scatter(dataset.level, dataset.salary)
plt.show()```
and i was just wondering how can i actually see the regression line on the graph
expected = torch.zeros(5)
expected[3] = 1
actual = torch.zeros(5)
actual[3] = 1
lossFunction = nn.CrossEntropyLoss()
print(lossFunction(actual, expected))```
tensor(0.9048)
it is lying to my face and i do not appreciate it
youre having an error for this right?
model = PolynomialFeatures(degree=2)
x = model.fit_transform(dataset.level)
y = mode.fit_transform(dataset.salary)
lr = LinearRegression()
lr.fit(x,y)
yhat = lr.predict(x)
sns.regplot(data=dataset, x = x, y= yhat)
plt.show()```
https://www.youtube.com/watch?v=0NJCq5Aa3lw&t=437s Bard to the moon??
2023 could be the start of Google's end or are they just getting started when Bard is launched? And how will it go against Chatgpt?
Comment below your thoughts....
Follow me on Twitter: https://mobile.twitter.com/SullyBillions
what do yall think do you agree ?
wtf is sully?
Can I have a look to learn please? though im still a newbie to DS and AIπ€£
Alright sure thank you!
Is there a way to easily filter a df by its index, like you would when filtering a column? Or do I always have to reset_index and treat the index as a column
e.g. I want to get the sums for col1 and col2 for all even indexes
df = pd.DataFrame({
'col1': [5,6,7,8,9,10,11],
'col2': [12,13,14,15,16,17,18]
})
df.index.name = 'index_name'
# This will give a key error since 'index_name' isnt a column name
df.loc[df['index_name'] % 2 == 0][['col1','col2']].sum()
Try with df.index
is there any way in pandas to read csv and excel file with one single function instead of read_csv or read_excel
ping me to reply, thanks
Did you figure it out?
I am using os to split filename and extension
then running a if case to check whether its excel
anyone know anything about fourier analysis using PyFFTW or SciPy FFTpack
import cv2 as cv
import numpy as np
blank= np.zeros((500,500), dtype='uint8')# numpy.zeros:Return a new array of given shape and type, filled with zeros.
cv.imshow('Blank', blank)
#draw a line:
cv.line(blank,(0,250),(0,255,0))
cv.waitKey(0)
hey this is my code
it saids that theres unexpected indent on line 5
im currently working on opencv rn
can anyone tell me why
can you send it in python format @tender knot
yes
Hey @tender knot!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
I dont see any indent?
what
How can I filter for all rows where a label is not present. e.g. I can select rows where the index is 'label', df.loc['label'], how can I get the inverse? All rows where the label is not present instead
does a free faster compiler than google colab with gpu exist?
Kaggle maybe?
id if it's bad etiquette but #1073962529291771964 if anyone has two mins to spare. Trying to learn ComputerVision and getting stuck on something that I honestly feel should be trivial, but I don't have the skills to fix it. I know the problem, idk how to tackle it
Hi everyone.
I have a question.
I have a group of 10 images and I want to compare that images with a new images to know from my base group what images is the most similar, so my question is what is the most easy way to dos this I mean I don't need to be super precise
what makes two images "similar", in this sense?
There are images from a building
i wanna know that how can i get into ai
So I just want to compare a zone from the building
the minor difference between the two images
right?
what kind of difference? color?
Pixels?
AIs that deal with images deal with those images as arrays of pixel values.
I don't know if this is the best way to solve this problem.
I didn't explain well
I have this group of images
So if I past a new image
like this one
the 2 most similar are the first ones
and at the end I need to select one from those 2, the most similar image
are you always comparing new images to that set of 10?
Yes
you could calculate the sum of the squared distances or look for the maximum value in the cross correlation matrix
I found this repository, but according to that code, the most similar i'ts the 6 imagehttps://github.com/bnsreenu/python_for_microscopists/blob/master/191_measure_img_similarity.py
https://www.youtube.com/channel/UC34rW-HtPJulxr5wp2Xa04w?sub_confirmation=1 - python_for_microscopists/191_measure_img_similarity.py at master Β· bnsreenu/python_for_microscopists
so for this, you might want to use a distance formula. in AI, "distance" is where a function has two parameters, and it returns 0 when the two are exactly the same, and it never returns a negative number.
Can you help where can I read how that distance formula work?
there's more than one kind of distance formula. it's just a general concept.
I'm not an image AI person, though, so all I can do is give you terms to Google.
Maybe this answer here helps as a start: https://datascience.stackexchange.com/a/48646
so if a try the sum of the squeared distances, how can a I do it in images?
I mean It works if a pass my images to vectors and do it in that way?
let me read it
If your image is stored as a numpy array, you can simply apply the squared distance calculation pointwise and perform the normalized sum at the end to get a scalar.
Anyone know why opencv isn't finding the sudoku puzzle in the middle? Thought it had to do with the line being too thin after blurring so I tried to make it bolder but it still doesn't find it
if I add some noise it does
def findBiggest(cont):
biggest = np.array([]) # Init empty array of biggest points
max_area = 0
for i in cont:
area = cv2.contourArea(i) # Get area of cont
print(area)
if area > 50:
# Is big enough to be considered
peri = cv2.arcLength(i, True)
approx = cv2.approxPolyDP(i, 0.02*peri, True) # Approximates perimeter
if area > max_area and len(approx) == 4: # if the segment we're considering is bigger than the current and it's rectangular/square
biggest = approx
max_area = area
return biggest, max_area
def order(points):
points = points.reshape((4,2))
pNew = np.zeros((4,1,2), dtype=np.int32)
add = points.sum(axis=1) # We sum all the values on the X axis finding which is higher
pNew[0] = points[np.argmin(add)]
pNew[2] = points[np.argmax(add)]
diff = np.diff(points, 1) # We subtract all values on the X axis to find which is lowest
pNew[1] = points[np.argmin(diff)]
pNew[3] = points[np.argmax(diff)]
return pNew
def contours(image_neg, img_orig):
# find all contours
contours, hierarchy = cv2.findContours(image_neg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # Find all external contours
print(contours)
#cv2.drawContours(img_orig, contours, -1, (0, 255, 0), 10) # contourIdx is neg so all cont are drawn
biggest, max_area = findBiggest(contours)
# print(biggest)
if biggest.size != 0:
biggest = order(biggest)
#print(biggest)
return cv2.drawContours(img_orig, biggest, -1, (0, 255, 0), 10) # contourIdx is neg so all cont are drawn``` the code
On another note - I have a pandas dataframe with a lot of unneeded entries,
therefore I want to convert it to a sparse array.
As an example, create some sparse data:
import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(seed=0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal(size=(6, 4)), index=dates, columns=list("ABCD"))
df_filtered = df[(df > 0.5)]
Now I can convert this to a scipy.sparse.coo_array object, but the conversion
is somewhat awkward, as I have to specify the astype conversion using
pd.SparseDtype multiple times. Is there a better way than what I'm doing?
from scipy.sparse import coo_array
def df_float_to_coo_arr(dataframe: pd.DataFrame) -> coo_array:
return coo_array(
dataframe.astype(pd.SparseDtype("float", fill_value=np.nan)).astype(
pd.SparseDtype("float", fill_value=0),
)
)
>>> df_float_to_coo_arr(df_filtered)
<6x4 sparse array of type '<class 'numpy.float64'>'
with 5 stored elements in COOrdinate format>
I impressed ChatGPT, impression is an emotion. Thus bot having an emotions
In Pandas what's the best way to store a time period. e.g. 1/1/2020 - 3/1/2020. I'm not trying to get all the dates in between, I just want to store that period of time. Pandas has a pd.Period but it doesn't seem like what Im looking for. There's this SO thread https://stackoverflow.com/questions/62580625/in-pandas-is-there-any-way-to-create-a-time-span-between-two-dates but it's a bit inconclusive
!docs pandas.date_range
pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=_NoDefault.no_default, inclusive=None, **kwargs)```
Return a fixed frequency DatetimeIndex.
Returns the range of equally spaced time points (where the difference between any two adjacent points is specified by the given frequency) such that they all satisfy start <[=] x <[=] end, where the first one and the last one are, resp., the first and last time points in that range that fall on the boundary of `freq` (if given as a frequency string) or that are valid for `freq` (if given as a [`pandas.tseries.offsets.DateOffset`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html#pandas.tseries.offsets.DateOffset "pandas.tseries.offsets.DateOffset")). (If exactly one of `start`, `end`, or `freq` is *not* specified, this missing parameter can be computed given `periods`, the number of timesteps in the range. See the note below.)
Oh damn yeah that's perfect thanks
Actually nah I just want a period between two dates like 2023-01-28 - 2023-01-31 , not all the dates between them like this that date_range gives
[2023-01-28 00:00:00, 2023-01-29 00:00:00, 2023-01-30 00:00:00, 2023-01-31 00:00:00],
What period
Not sure what the proper term for it is, but something like this '2023-01-28 - 2023-01-31'
Is there no cleaner object to store a time period like this than date_range?
What do you plan to do with that object once you have it
Looks like you can use this @clever owl https://pandas.pydata.org/docs/reference/api/pandas.Interval.html
year_2017 = pd.Interval(pd.Timestamp('2017-01-01 00:00:00'),
pd.Timestamp('2018-01-01 00:00:00'),
closed='left')
pd.Timestamp('2017-01-01 00:00') in year_2017
year_2017.length
i know this image is fake but whatβs the names of these types of applications
canβt find any other good example
I have to write to a column in an excel file that looks like this
PERIODE
1/1 - 3/1
1/1 - 31/1
5/1 - 31/1
9/1 - 31/1
13/1 - 31/1
Ill take a look at Interval
I first look up some web and try to implement it on my data but it look a little bit crazy as it's parallel to test data. My point is I want to use it just to predict only the direction of stock price up or down but idk how to change from stock price to direction of stock price.
i would love to know more about it
Well you can just predict the next stock price and see if it is higher or lower than the current price
Yepp really like Interval
thanks i will try it but if i can change the predict to only up and down Can it improve my prediction because i only use 1 data is "Close" to predict i have no idea to combine all OHLCV to predict
my data look like this i use close price as the feature
i have looked for RandomForest too but the precision_score is about half only
can anyone give me a pro tip on how to understand CODE of complex architecture with high amount of abstraction?
I am new to transformers, understand the theoritical architecture but having problem incorporating my own changes/tweaks to the model
@mint palm are you using pytorch or what
if you're using, for example, a BERT model with pytorch, you can look in and see what all the layers are
i am using pytorch, actually i am looking at CVPR research paper's code. I get confused often due to inheritence, cofig files, new fuction names
the code for research papers is usually incomprehensible garbage.
oh, Currently i dont understand how the data is being fed, the modal uses 3 madalities and i am confused how much of each is being fed
yeah. they don't care if other people can run their code. they just want to get the results and get the paper out.
i was able to run it but i get the point.
So guys I just trained my first model using pytorch to a degree where i deem it accurate, I have stopped training to prevent overfitting. I would like to use my model in a javascript website, does anyone know a good module to use and predict via javascript? Or should I setup a python api and do it like that. I'd prefer the first option as it will save me a huge amount of money.
Hi everyone !
I am a French student from Polytech Nantes (France) on third year of engineer degree called "Data Engineering and Artificial Intelligence".
I am seeking for a 9-week internship abroad in the area of data science for summer 2023 from July 3rd to September 1st.
I would be grateful pour any opportunity ! Please send me a message π my mp are open
Is abroad a must?
while i don't represent any company i may know some where u could try but those are also from france however they have branches in different countries though
It's required in France in order to become engineer
Oh then nvm
the company could be french but in another country with english like professional language
I speak english good for professional topics, but I'm not fluent
Hmm if u seek in the Netherlands let me know i know some companies who will probably be happy to have you
Not sure how open they are to abroad students but at least one will probably let you
I'm looking this internship in all over european union (easier for visa) so Netherlands is a great opportunity
I would be very happy if you can connect me with these companies π
You could try the ING (dutch bank), Dutch Police (Politie), Albert Heijn (dutch super market), Blender (3d editing program) and some more but i cant name them from the top of my head rn
Yes I know ING, huge bank
Thanks for company names, I am going to add them in my list
Yeah thx
No problem make sure to not depend on one and just contact them all as it's not worth risking it for one company although they usually do what they say you really don't want to come to the Netherlands to find out something didn't work out
Get them to send you a digital contract or something on paper
Yeah you're right
Anyway, the internship have to be approved by the school so they check many things including the contract
Could anyone explain why my MLPRegressor returns different results each time with a set random state & np seed? Iβm super new to machine learning and I got somethin weird goin on
The regressor seems to be way off like 50% of the time, somewhat accurate maybe 40-45% of the time, and dead-on maybe 5-10% of the times I run the code
Ah no my bad, I didn't realize train_test_split() had a random_state parameter
import time
import playsound
import speech_recognition as sr
from gtts import gTTS
def speak(text):
if os.path.exists(filename):
os.remove(filename)
tts = gTTS(text=text, lang="en")
tts.save(filename)
playsound.playsound(filename)
def get_audio():
r = sr.Recognizer()
print("Listening")
with sr.Microphone() as mic:
audio = r.listen(mic)
try:
said = r.recognize_google(audio)
said.remove
print(said)
except sr.UnknownValueError:
print("Sorry I didn't get that")
return said
get_audio()```
when i say "what's up" i get
```result2:
{ 'alternative': [ {'confidence': 0.93134201, 'transcript': "what's up"},
{'transcript': 'WhatsApp'},
{'transcript': 'whats up'}],
'final': True}
what's up```
while it works, i want to only have the output be "what's up"
what do i do?
image recognition?
Guys, why is a Residual Network able to extract features and classify an image better than a VGG model?
I mean...the VGG architecture is quite straightforward and intuitive: the initial layers deal with the complete image, then apply pooling to extract the most relevant features in the feature maps, and then the next layers try to create even more feature maps(thus, extracting even more features) from these relevant features. Until, in the last feature extraction layers, the model might be able to have encoded the features for almost all pixels in the initial image.
However, the Residual Network doesn't have this. It simply extracts feature from the input, then applies a skip connection, then try to somehow generate features based on the features extracted before while also considering the original input...
It's a bit confusing...but quite efficient. At least, in my GAN, a Generator with a residual architecture(4 million parameters) is able to collapse a VGG discriminator with around 40 million parameters. I had to tune the learning rate and "nerf" the generator in order to make them converge 
Hello guys, do you know why the score keep changing when I run the model for several times even though I've set random seed?
Dear data science and AI, I come to you humbly asking for your sincere inputs on a computer vision project. I am attempting to make a script that looks at a 2d floor plan, and can detect walls, windows, and the floor within the walls.
My first attempt was to use numpy and matplotlib to extract the rgb values of the image and create an array, but Im struggling on which will be the optimal way for the machine to differ windows from walls, i was thinking measuring thickness between pixels maybe?
In summary you can either:
- reset the kernel because there is an issue with running in sequence
- or pass a function to reset random seed into the layers:
def reset_random_seeds():
os.environ['PYTHONHASHSEED']=str(2)
tf.random.set_seed(2)
np.random.seed(2)
random.seed(2)
aright, thank you!
Hey @deep spire!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
why do I get this error when trying to graph piecewise functions?
import numpy as np
import matplotlib.pyplot as plt
f = lambda x: 0 if x<0 else 1
x = np.linspace(-1, 1, 1000)
f(x)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5732/2508726767.py in <module>
1 f = lambda x: 0 if x<0 else 1
2 x = np.linspace(-1, 1, 1000)
----> 3 f(x)
~\AppData\Local\Temp/ipykernel_5732/2508726767.py in <lambda>(x)
----> 1 f = lambda x: 0 if x<0 else 1
2 x = np.linspace(-1, 1, 1000)
3 f(x)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
β```
(it's in jupyter lab if that's relevant)
well i fixed it by doing
f = lambda x: 0 if x<0 else 1
x = np.linspace(-1, 1, 1000)
y = np.array([f(k) for k in x])
plt.plot(x, y)```
but I'd like to know why that error happens anyway
sure.
you have f = lambda x: 0 if x<0 else 1
when applied to that numpy array, it contains this if condition within
if x<0
which is
if np.linspace(-1, 1, 1000)<0
which is
if bool(np.linspace(-1, 1, 1000)<0)
but
np.linspace(-1, 1, 1000) < 0 is numpy array of 1000 true or false, "bool(numpy array of 1000 true or false)" is ambiguous (that's what it meant in the error message - there is no good way to determine what you want), hence you need to use *.any() or *.all() as suggested - but looking at your fixed code, this isn't required at all.
ohhh I see, it's running that inequality against the entire array and not each element, i didn't think about that
thanks!
You want np.where(x<0, 0, 1) for this.
you can do it a bit easier
x[x < some value] = some othe value
or more easily x < 0
!e
import numpy as np
x = np.array([-1,-1,0,1,2])
print(x < 0)
@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.
[ True True False False False]
booleans in python already behave as 0 and 1 and directly cast as such when doing math later on
[WinError 3] The system cannot find the path specified:
import os
import numpy as np
import cv2 as cv
p=['Lee Nadine', 'Elon Musk']
DIR = r'C:\Users\HONG-ANH\Downloads\compvision\img'
haar_cascade= cv.CascadeClassifier("haarcascade_frontalface_default.xml")
#array of image for faces:
features=[]
labels=[]
def create_train():
for person in p:
path=os.path.join(DIR, person)
label=p.index(person)
# loop thru image in one folder
for img in os.listdir(path):
img_path = os.path.join(path, img)
#read thru the images in that folder
img_array = cv.imread(img_path)
gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY )
faces_rect= haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for(x,y,w,h) in faces_rect:
faces_roi = gray[y:y+h, x:x+w]
features.append(faces_roi)
labels.append(label)
create_train()
print(f'The length of the features={len(features)}')
print(f'The length of the label={len(labels)}')
i got this error message in my code
although i did include the full path
can anyone explain to me why
where can i start with AI?
look at the pins @lapis sequoia
so based on the first pinned message, i think i actually wanna do machine learning.
you can find some tutorials on how to use a simple ML algorithm with python on yt, with tensorflow or without, from scracth difficult and easy
btw, ||chatgpt||
i find youtube tutorials really annoying. i think i'm gonna read the books in the pinned messages
If you want to get straight into things, there are some nice short tutorials on Kaggle. But you still need to put time into studying the theory of the subject, and for that books are your best bet.
@lapis sequoia it also depends which area of AI you are interested and your objective. Is it to land a job or just a hobby? If you want a job you have to see which field of AI simply by looking at the jobs in your area what they look for. Then research the common tools the company is using and learn the related AI topics. But for any AI you need to learn programming basics, numpy and pandas, how to clean data and then go in depth into the stats/math that relate to the AI and algos. Lots of youtube and articles on medium that give like a small roadmap or where to start depending what interest you.
Guys, considering that in Unsupervised Learning the model tries to categorize the data in a way that it provides the smallest entropy as possible...is it possible for a unsupervised learning model to overfit?
I'm trying to make a neural network with unsupervised learning + supervised fine-tuning, but, though the model is correctly following a logic for determining its output, it's not providing the output in a value close to what I desire(or close to my labels in fine-tuning). I was thinking about using more iterations for training it, but I'm afraid it might somehow "overfit", it this is even possible in unsupervised learning
I've read that "overfitting" in unsupervised learning would be the entropy of the pseudolabel being close to 0, thus the model is almost 100% sure of the pseudolabel it assigned to that data.
How can this be bad, though?
Because the model is overconfident in it's decision, that could maybe indicate that it is able to fully store the examples, instead of generalizing.
What would you even base that unsupervised learning on?
Because just reducing entropy of output means that the model will fit to a single label (doesn't matter which) which doesn't seem very productive
And overfitting in unsupervised learning is def. a thing
The reducing entropy is more for multi-label classification. At least most papers I see deal with multi-label classifiers
Though I admit I'm using a regressor model for the task. It's for labeling a dataset
Oh, I see....so it's a bit like what happens in supervised learning, afterall...
You could maybe make an auto-encoder for the unlabeled data
The model "memorizes" the relation input - label
That's more or less what I'm doing
That way it learns some kind of embedding that holds important information about the data
For my personal model, I'm making it encode an input image and assign a pseudolabel to it(following a ResNet-like architecture).
Then I iterate over the same image again(the model has dropout layers, 0.35) and make it assign a pseudolabel again.
Then I use this MinEnt
Before I discovered this paper, I was simply applying a MSE(pseudolabelA, pseudolabelB) as a consistency loss.
Now I'm using this technique, but instead of Cross Entropy, I'm using MSE.
And it seems to work pretty well...even better than directly using MSE.
But the thing is...my labels for certain input images are like "22, 24, 27", and the pseudolabels the model is generating are "7, 7.55, 7.88".
So the model gets the idea, but it doesn't generate the output value I desire.
I was thinking about making it iterate more and more through the data, but, since overfitting is also a problem here, perhaps I shouldn't...
I'm not really familiar with that specific unsupervised learning process, so I don't have much to say on it, srr
Oh, I thought the idea was more or less the same...decreasing entropy, capturing patterns...
At least, when I see unsupervised learning in neural networks, people usually compare it to KNN and PCA
Though I admit I don't remember how PCA works
hey guys , I am a beginner who wishes to learn and explore more abt data science
is "kaggle" the website good for it ? If one could give an opinion about it...would be helpful perhaps
Hey there, is there anyone that is knowledgeable about nlp that could I ask them a question?
be sure to always ask your actual question; don't ask for people to commit before you reveal the question.
Apologies, im trying to write a function which takes a list of features and returns the probability P (c|d) for all classes as a dictionary for a naive bayes classifier. I've computed p(c) and p(f|c) but i cant seem to get the correct answer for p(c|d)
Could someone possibly check my code?
in the same way, you should also show the code, instead of waiting for a commitment for someone to check it
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
you always want to ask complete questions that someone who knows the answer can jump right into.
def prob_classify(self, d: List[Any]) -> Dict[str, float]:
"""
Compute the probability P(c|d) for all classes.
:param d: A list of features.
:return: The probability p(c|d) for all classes as a dictionary.
"""
prob_classify = {}
sum_of_probabilities = sum(self.prior.values())
for individual_class in self.prior:
prob_classify[individual_class] = (self.prior[individual_class])
for feature in d:
if feature in self.vocab:
prob_classify[individual_class] *= self.likelihood[individual_class][feature]
else:
prob_classify[individual_class] *= self.alpha/(len(self.vocab)*self.alpha)
for individual_class in prob_classify:
prob_classify[individual_class] = prob_classify[individual_class]/sum_of_probabilities
return prob_classify ```
This is my function^
I understand thank you
guys why do we use a 2d unet for an rgb image segmentation
different ppl give different reasons
im unable to say which one is legit
each one gives their own reason
ping me on reply
Hello everyone, does anyone know a python library for interacting with ChatGPT? Right now I am trying out ChatGPT Wrapper but it is extremely slow, if you know of any alternatives that would very helpful.
a different library isn't going to make it faster, since ChatGPT throttles requests from non-payers.
you might be interested in this: https://huggingface.co/docs/api-inference/quicktour
List=[a,b,b,c,b,c,b,b,a,b,b]
How can I print the indexes of each 'b' in a single line in python.
( How can I print the indexes starting with 1 instead of 0)
if u want to recall only one position u can do so by list.index(b) do u see how u can do it for full list?
i tried to let him learn a bit lel
@deep spire are u a bot? ur info misleads me π
she makes fun of us
a link to an announcement
tl;dr chatgpt is banned from this server
is there any other community more geared towards reinforcement learning?
the way you used it seems fine so far but I'm not 100% sure
not my call to make either way though 
i'm not looking for subreddits
@deep spire can u do math?
hehehehe
@deep spire generate 10000 different sine functions and assign features for each one which correlate to the function.
this is annoying
roast is fire tho
sup guys
im a bit confused with this error message
s_edge = self.ax_pos[0] - 0.25 + self.lim_offset
IndexError: index 0 is out of bounds for axis 0 with size 0
just kinda bored
Two of those are models not libraries
Also pytorch is not "an implementation of faster r-cnn"?
Seems like a sussy gpt response π
Pytorch is a bit easier to install with cuda though imo, it gets installed in your env together with pytorch.
