#data-science-and-ml
And I dunno what to search on google on that so yeah
probably not, then. start with single variable calculus and linear algebra
i'd rather not, we can just talk here and i'm fairly busy 😛
Ok lol
what are you trying to do here?
Edd can I ping u any time?
hi, how do i please do a naive bayes classifier for tfidf? most tutorials show it for bag of words
i probably won't answer immediately, you might as well just ask stuff here and have other people help you too
ok, ive done that and im getting 0.008 accuracy
still not good
Ok
numpy already has an economy-size SVD function. as for what you're doing there, slice notation works the same in python as it does in matlab, except that indices start at 0 and you index arrays with [] instead of ()
So teach me single variable stuff
So I'm trying to convert an entire code from matlab to python for a uni project
The entire code is this
https://github.com/hli1221/imagefusion_Infrared_visible_latlrr
Nvm I will study from ur
I've been constantly googling the stuff
So like sigma_J = sigma_J[1:svp_J] - 1 / mu in python?
starting from 0 instead of 1
diag is a function in numpy, so depending on how you import numpy, you can do either diag() or np.diag()
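A rough sketch of the slicing and economy-size SVD points above, assuming numpy is imported as `np`. The matrix `X` and the values of `mu` and `k` are made up for illustration and are not taken from the repo:

```python
import numpy as np

# Hypothetical data, only for illustration.
X = np.arange(12, dtype=float).reshape(4, 3)
mu = 0.5
k = 2  # number of singular values to keep

# Economy-size SVD, roughly MATLAB's svd(X, 'econ').
# numpy returns the singular values as a 1-D array and V already transposed (Vh).
U, sigma, Vh = np.linalg.svd(X, full_matrices=False)

# MATLAB: sigma(1:k) - 1/mu   ->   Python: sigma[0:k] - 1/mu
shrunk = sigma[0:k] - 1 / mu

print(U.shape, sigma.shape, Vh.shape)  # (4, 3) (3,) (3, 3)
print(shrunk.shape)                    # (2,)
```

Since `sigma` is already a vector here, `np.diag(sigma)` builds the diagonal matrix, which is the opposite direction from MATLAB's `diag(S)` on the matrix `S`.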
@wooden sail do you have any idea on how I can improve my model?
this is an example of my input data
and are you sure the input shape is correct for each layer? notice that every time you do a convolution and pooling, the size of the image changes. what's more, keras automatically computes the sizes. this only needs to be specified for the first layer. can you try removing the input_shape from the other layers?
the result of copy pasting for you
ive done that and same thing
oh did I mention the non-zero accuracy is only for the first epoch?
I tried converting from matlab to python once, learning matlab was the easier thing to do.
yeah:```py
model = keras.Sequential()
model.add(keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(keras.layers.MaxPooling2D((4, 4)))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(4, 4))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(7, activation='softmax'))
model.summary()```
why are you pooling so hard btw?
you're keeping like 1 or 2 pixels per image
that certainly won't work well
replace the first pooling layer by a (2,2) one and remove the second pooling
and only 2 Dense layers to manage that?
why?
ok
@wooden sail the dense layers then have to manage 28K inputs
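The sizes being discussed can be checked by hand. This is only shape arithmetic, assuming the Keras defaults ('valid' padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper names are made up:

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1
    return size - kernel + 1

def pool_out(size, pool):
    # max pooling with stride equal to the pool size
    return size // pool

# Original model: 48 -> conv 3x3 -> pool 4 -> conv 3x3 -> pool 4, 64 filters at the end
s = pool_out(conv_out(pool_out(conv_out(48, 3), 4), 3), 4)
original_flat = s * s * 64

# Suggested model: 48 -> conv 3x3 -> pool 2 -> conv 3x3, no second pooling
s2 = conv_out(pool_out(conv_out(48, 3), 2), 3)
suggested_flat = s2 * s2 * 64

print(s, original_flat)    # 2 256
print(s2, suggested_flat)  # 21 28224
```

So the original stack leaves a 2x2 map per filter, and the suggested one gives the roughly 28K inputs to the first Dense layer.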
u sure thats not too much
my computer's lagging and I havent even fitted it
oh, also, the batch size ive kept it to 128, is that too much?
oops
i meant
LGBM gets 65% and XGB and RF get 56-57%
is this natural in some cases
yeah try reducing that to 32
but 28k isn't bad at all
uh oh
@wooden sail what do u think
I thought theyre similar enough to not make such a difference
no idea
Thing is, I have the code right there in the repo
Would kinda be pointless to reinvent the wheel
I'd just use the code itself if it wasn't a uni project
if svp_J>=1
sigma_J = sigma_J(1:svp_J)-1/mu;
else
svp_J = 1;
sigma_J = 0;
end
svp_J should be 0 in the python conversion, right?
hi guys
i'm running tensorflow
but facing these errors
any solution how to fix it
Please don't ask people to read screenshots of text. You can just copy and paste the text into the chat.
ans is giving
ok i will do that
2022-07-18 22:18:43.055182: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-18 22:18:43.063715: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-07-18 22:18:49.171567: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-18 22:18:49.193277: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
these ☝🏿
what dynamic library missing
i don't understand
or using pip tensor malfunction?
nvidia issues
xd
it's funny how people on windows and mac have just as many but totally different tensorflow issues
- screw tensorflow
- use pytorch and avoid all problems
that why im learning pt
latest version 3.10.5
do you have a gpu?
and you installed cudatoolkit and cudnn?
you need those if you want to use your gpu with tensorflow
nothing more
don't even try to understand how tensorflow works, they built it wrong
move to pytorch
it's from the ground up
really !
rather than using all of this weird ass code google made
it relies on more pure calculations and fewer modules
it's harder to learn at first, it will take days and days, but i can see it being less buggy
what's diff between tensor and pytorch
pytorch is more coded
and less behind the scenes functions
its lighterweight i think
ok get it
@pseudo pasture install pytorch and see what happens
if u still get this issue
bet u wont
ok i will install then
you dont ? i have a mac
and also on windows it can run in cpu
same as tensorflow
you don't if you don't want to use an nvidia gpu, sure
so i think his issue isnt related to that
tensorflow is saying there to just ignore the warnings if you don't have a gpu, it'll just run on cpu
it's there in the warning message they shared
both pytorch and tensorflow need cuda if you want to use an nvidia gpu
@pseudo pasture wats ur gpu
whats then dynamic file problem
it means you don't have the cuda libraries
^ install that if you're on nvidia
nothing of what supermoon just said applies to any of the problems you mentioned at all
next year im going to buy a 4080 pc shud blow everything out of the water including my mac pro m1
you just need extra libraries to use an nvidia gpu, regardless of which machine learning framework you use
im using tf for my thesis rn
still, it's true that pytorch might be better and easier to use. it's just not the issue at hand 😛
and im going crazyyy
I think at first its much much harder to use
alot of stuff has to be coded
its more complex at first def
ok ok
but leaves less room for actual library issues
for example it just relies on numpy and uses its own tensor, tf has massive compatibility demands on mac
ok let me read it
i use conda normally
vs code
miniforge user here
not sure why every tutorial tells me to use homebrew
when its not required at all
whats the benefit of using brew
don't know
@wooden sail any idea how to increase recall
i oversampled my imbalanced class
to fix the 1.0 recall issue
now im only at 59% acc
i don't know what any of that means
wdym
every data scientist knows what recall is
path1 = ['./source_images/IV_images/IR',num2str(index),'.png'];
path2 = ['./source_images/IV_images/VIS',num2str(index),'.png'];
fuse_path = ['./fused_images/fused',num2str(index),'_latlrr.png'];
What would the conversion be in python?
Is there like a function that does this?
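One way to write those three lines in Python is with f-strings, which do the job of the bracket concatenation plus `num2str`. The value of `index` is just an example:

```python
index = 1  # hypothetical image index

# MATLAB: ['./source_images/IV_images/IR', num2str(index), '.png']
path1 = f"./source_images/IV_images/IR{index}.png"
path2 = f"./source_images/IV_images/VIS{index}.png"
fuse_path = f"./fused_images/fused{index}_latlrr.png"

print(path1)      # ./source_images/IV_images/IR1.png
print(fuse_path)  # ./fused_images/fused1_latlrr.png
```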
i'm not a data scientist
what are you
lemme look up what that is
mathematician?
master in signal processing? and the degree i'm in rn is for a doctorate in engineering
interesting
i meant sensitivity
like
how you calculate ROC
one of my classes has awful recall
ah, like the true positives
yeah
and you are classifying images?
all right, similar enough
i minmaxxed scale my training set
i tuned params
i didnt scale testing set
i fit my tuned params to the training set
and predicted test x
i one hot encoded both train and test
i encoded before splitting to solve the issue of having too many cols
sadly most of those terms alone still mean nothing to me 😛 if you can show an example, i could possibly comment.
well lets say you had
500k rows and about umm
10 features or less
some of which numerical, some categorical
the class to be pred is either 0 or 1
whats your strategy to maximising your accuracy
each row of 10 cols is one thing you wanna classify?
a brief pipeline results in 57-58%
yes
had to resample
so used SMOTE to synthetically generate the underpopulated class
so that my model wont predict the more common one to get a 90% score
that put me from 96% acc to 58%
I have a model that does 64% BUT it has 1.0 recall and 1.0 precision for something so thats cheatin
oversampling was a mistake?
i thought its better
its about 18k of 500k that are class 1 and the rest class 0
so by resampling brings it to 700k+ rows
how does SMOTE generate synthetic data?
u tell me, its generally just a tool people blindly apply to fix class imbalance
so that you dont cut out data by removing rows from the popular class
I think it just uses randomised values that would usually be attributed to the class
my test size has also been 0.2
same as tuning cv
omg
i know what is the issue
ah i thought it'd be the case. it's taking convex combinations of the examples
ive used smote where my categorical values were treated as floats
how devastating could that be
actually that cannot be the case
it did put someones age as a decimal, but it seems my encoding still worked
v weird
well, you do want the categorical vars to have the same distance from each other
age can be treated that way?
smote would certainly generate "categorical" vars that are not valid and whose distances don't fall on the grid you'd like
i would guess age can be more easily left as a float, but you probably want an intermediate step to check that the vectors are on the correct manifold
the thing is though, i have ages like 58.0 etc and smote made one thats 45.72. however, other categories like ethnicity seem to have been kept as floats and not scuffed
so rounding floats and using a max() function on the categorical vars
would be worthwhile to print a few of these synthetic examples and see whether they look like true data
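To see why a convex combination breaks categorical columns, here is a simplified sketch of one SMOTE-style step in plain numpy. It is not imblearn's implementation, and the two rows and their column meanings are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical minority-class rows: [age, sex (0/1), ethnicity code]
a = np.array([58.0, 1.0, 2.0])
b = np.array([41.0, 0.0, 5.0])

# Simplified SMOTE-style step: pick a point on the segment between a sample
# and one of its neighbours, i.e. a convex combination of the two rows.
lam = rng.uniform(0.0, 1.0)
synthetic = a + lam * (b - a)

print(synthetic)
# Columns where the two rows differ land on fractional values
# unless lam happens to be exactly 0 or 1.
print(np.isclose(synthetic, np.round(synthetic)))
```

A column only stays valid when both neighbours share the same value, which may be why sex still looked like 1/0 in some rows. As far as I know imblearn also ships `SMOTENC`, which takes a `categorical_features` argument for mixed data, but check the docs before relying on that.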
im doing it rn
the fact that I only have 48 columns after get_dummies indicates that hasnt happened somehow
for example, sex are all still 1.0
1/0
how about something with more than 2 categories?
idk what "get dummies" is
encoding like
sex col to
two columns
sex1 and sex0
shudnt make a difference?
I’m baffled how so much data can result in sub 60 accuracy
what are you using for the classification? a generic deep neural network or something like svm?
Basically it’s records such as age sex ethnicity smoking status bmi and whether they got a stroke or not , I’ve tried xgboost and random forest and light boost but that one cheats and gets 1.0 recall for a class
64% acc has to be discarded for light gb
i'll have to admit that, although i've heard those names several times, i don't know how any of them work. lemme google it real quick
Xgb is a super go to- it’s proven better than neural nets on tabular data
No one uses neural nets on tabular data
I’m sure had I used SVM I’d get 55 or less
all data is tabular data
Ok you know what I mean
i don't
there's no difference
Generally for this type of data you don’t use a neural network
There’s a huge difference
guys I get a " ValueError: Data must be 1-dimensional" in pandas when I try to create a df with 2 columns... any idea?
I will humour you and use a simple neural net and see what score it gets
xgboost looks to be like a specific optimization method, underlying model notwithstanding
it's an optimization approach
and youd be arguing with google saying tabular data is the same
they made tabnet
i'll grab the original paper and take a look
for xgb?
mhm
Any ideas guys? code is on matlab
the paper mentions nothing of so-called "tabular data", which appears to be a widely accepted term for database-like data that might include text-based categories. i'd just call that "highly structured data", since valid data can only take special "shapes" and is presumably sparse in a well-chosen basis
its called tabular data across the entire community
not really, its just what they say in stats carried over to ds
data that is organised in rows and columns is tabular
an image is not tabular, you create a matrix from it
matrices are isomorphic to vectors in an appropriately sized vector space
there is no difference
so are higher dimensional tensors, too
I think there is a difference
only that they inherit a structure when vectorized
nothing else
"vector" is a very abstract term. matrices form vector spaces too, and so do tensors and other goodies
so you can "vectorize" many things and arrange them as matrices, netting you rows and columns
but they will never have anywhere near the same characteristics as business or medical data
this is why people dont use a neural network on it (mostly)
hmmm that's not really the case, but rather that you want to extract different things from them
you can also look at xgboost as a neural network, btw
why do you think people resort to xgb and not a traditional neural network
what advantage does it have
aha, if you put "traditional" in front we're getting somewhere
the difference is that xgboost is based on a model, not just black box
but why would that have better results on this type of data
EXCEPT for tabnet but thats another story
not for the data, for the task
mostly people skip even importing tensorflow
what you want to get out of the data
classification
or even regression
doesnt matter
i just want to work out why my performance sucks so bad man
ud think with hundreds of thousands of rows youd get something decent after a trivial halving grid search
what were the sizes of the original classes? how many of each
18k vs 485k
its binary
how many people ended up getting x disease
i suppose the truth is i dont have good enough predictors
of that disease
well, you can only get so many extra examples out of that 18k sample using the method you chose for generating synthetic data
i got enough so it balanced them totally
what it's doing is clustering the data and generating examples near the centroid
yeah, but those new examples don't have much extra info
yeah but at least it stopped the model cheating
i think you might get better results if you do a mix of the two things
generate some synthetic examples and downsample the overrepresented one
could you try that?
i could but that would only LOSE data
yep. but generating random data from convex combination doesn't actually give you new data
but how can downsampling increase performance with less info
it will take some time
ill need to work out how to do that to a specified number
imblearn afaik just throws everything to a min or max
oh they have different smote types
kmeans, svm, borderline ada
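Downsampling the big class to a chosen number of rows can be sketched in plain numpy. The function name and the toy data are made up. As far as I know the imblearn samplers also accept a `sampling_strategy` argument (a ratio or a per-class dict) for this, but that is from memory and worth checking in the docs:

```python
import numpy as np

def undersample_majority(X, y, n_majority, majority_label=0, seed=0):
    """Keep all other rows, but only n_majority randomly chosen majority rows."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    keep_maj = rng.choice(maj_idx, size=n_majority, replace=False)
    keep = np.sort(np.concatenate([keep_maj, other_idx]))
    return X[keep], y[keep]

# Toy data with a similar kind of imbalance (hypothetical numbers).
y = np.array([0] * 950 + [1] * 50)
X = np.arange(len(y), dtype=float).reshape(-1, 1)

X_small, y_small = undersample_majority(X, y, n_majority=200)
print(np.bincount(y_small).tolist())  # [200, 50]
```

The remaining gap could then be closed with a smaller amount of synthetic oversampling, applied to the training split only.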
What's the equivalent of MATLAB's imread in Python?
OpenCV's imread?
there's pillow and imageio that let you read images
Does it work for a replica of this code?
going by the first few lines, yes
lookie here https://pypi.org/project/imageio/
Thing is
X1 = image1;
[Z1,L1,E1] = latent_lrr(X1,lambda);
I feel like that X1 is an array
So there seems to be some sort of conversion already
Oh
that's nice I just read it converts the image to numpy array
you can reshape the array to a vector later if you want, but doesn't latent llr do an SVD and truncate the singular values based on the threshold lambda?
tbvh I kinda suck at computer vision and image processing
lol
I got this warning kinda thing
I did what it asked
But kinda curious what it will be
Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning dissapear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
do what it says and see if it does what you need still
i have no idea, use it and find out
Aight so I get to this code
image1 = rgb2gray(image1)
And google search tells me to use Pillow
But like
Is there any lib to do it alongside imageio
Probably matplotlibs imread
It also doesn't seem to have a rgb2gray function
But seems they recommend pil now
This function exists for historical reasons. It is recommended to use PIL.Image.open instead for loading images.
oh alright
yeah try pil instead, then, or do the transformation by hand
Scikit image maybe @scarlet siren
https://scikit-image.org/docs/stable/auto_examples/color_exposure/plot_rgb_to_gray.html
https://stackoverflow.com/questions/687261/converting-rgb-to-grayscale-intensity you could do some simple maths on the numpy array
So I have to use np.asarray on the PIL image?
I don't think I'd be able to convert ndarray to grayscale
an rgb image would make either an m x n x 3 or m x n x 4 array. the third dimension are the color slices
you can then scale each slice according to that color recommendation thing in the stackoverflow link and add them together
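Doing it by hand on the numpy array could look like this. The weights are the common BT.601 luma weights (0.299, 0.587, 0.114). Other libraries may use slightly different ones, so results will not match every `rgb2gray` bit for bit. The test image is invented:

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted sum of the R, G, B slices of an m x n x 3 (or m x n x 4) array."""
    rgb = img[..., :3].astype(float)      # drop an alpha slice if there is one
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights                  # result has shape (m, n)

# Hypothetical 2x2 RGB image: red, green, blue, white
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = rgb_to_gray(img)
print(gray.shape)  # (2, 2)
print(gray)
```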
What if I use imageio, get the images as ndarrays and then use scikit image to convert to grayscale?
get used to it with python
the reason people like python is that people have made/make cool libraries for it
each one is more or less specialized and you often have to use many
Use pil image convert('L') if you want to convert RGB to grayscale
I'll have to convert a ndarray to grayscale
Cause of this code
image1 = imread(path1);
image2 = imread(path2);
if size(image1,3)>1
image1 = rgb2gray(image1);
image2 = rgb2gray(image2);
end
PIL Image.fromarray, then PIL convert
you can read the image and make it into RGB with pil, then make it into numpy array
That works too yeah
What does size(image1,3) do?
that's like nparray.shape[2]
Why not directly use opencv, it loads it as a numpy array to begin with
Something like if number of channels is more than 1 convert to grayscale?
Can also convert from rgb to gray
yep
Or if RGB convert to grayscale?
Or I think even directly load in as gray (whether or not the image is gray)
So the index is 1 less in python?
python counts from zero, i think i mentioned it earlier
MATLAB does from 1?
yes
import cv2
im_gray = cv2.imread('gray_image.png', cv2.IMREAD_GRAYSCALE)
works for rgb and grayscale pretty sure
Didn't know
The grayscale has to be loaded on a condition
Expected type 'ndarray | Iterable | int | float', got 'Image' instead
aimage1 = asarray(image1)
What condition?
using that code got me this even tho the PIL doc says you should be able to
If the amount of channels is larger than 1 right?
This is the matlab code
Yeah
So why wouldn't this suffice?
This loads in grayscale (1 channel) as gray, and RGB (>1 channel) as gray
Instead of having to use pil, and then convert to ndarray (think PIL doesn't use numpy arrays, could be wrong)
all of these methods would work, you'd just do it in a different way
just pick one
we spoke of like 4 or 5 solutions
Yeah you're correct
pccamel's looks good, try that out
Alright good to know
Pil example
from PIL import Image
im = Image.open("file")
if im.mode == "RGB":
    im = im.convert("L")
That is also nice
I chose to use imageio and scikit
Going to the im2double function can't find an equivalent in them
Why do you want the image to be doubles?
So TLDR I'm translating a code from MATLAB to Python
And tbvh not sure what the code is exactly doing
But take a look
once you have the numpy array, you can use .astype(float)
Oh right, I remember
or when you numpy-ize it, np.array(the_thing, dtype=float)
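One thing to watch: `.astype(float)` only changes the type, while MATLAB's `im2double` also rescales integer images to the range [0, 1]. A rough stand-in, assuming unsigned integer images such as uint8 or uint16 (the function name is just borrowed from MATLAB):

```python
import numpy as np

def im2double(img):
    """Rough stand-in for MATLAB's im2double for uint8/uint16/float inputs."""
    if np.issubdtype(img.dtype, np.integer):
        # Unsigned integer images are divided by the largest value of their dtype.
        return img.astype(np.float64) / np.iinfo(img.dtype).max
    return img.astype(np.float64)

img = np.array([[0, 128, 255]], dtype=np.uint8)
print(im2double(img))         # values between 0.0 and 1.0
print(img.astype(float))      # values still between 0.0 and 255.0
```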
Can SciPy's Fisher's test be used to compare two numbers instead of the usual four? I'm analyzing specific DNA coordinates for their change in a trait, and each coordinate has two scores, which is the change in the trait after experimental treatment
Could I apply Fisher's test for these scores?
I gotta use the matplot lib for imshow function?
there are other plotting libs, but that's the common way of doing it, yeah
Ah ok gotcha
So the code also has figure
How do I implement that using matplotlib
plt.figure()
oh ok
ty
imwrite(F, fuse_path, 'png')
So I tried to write this last code in python
And it doesn't seem to work exactly like in matlab
So I tried to dig up the docs
The issue is
I don't see the array as the first parameter in examples
nvm the examples I saw were not that good
seems I first gotta put the path and then the array
So running the code
I got this error
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 17, in <module>
if image1.shape[2] > 1:
IndexError: tuple index out of range
Process finished with exit code 1
Even tho I converted this
if size(image1,3)>1
To this
if image1.shape[2] > 1:
seems in this library, if there is a single color slice, the array has only 2 axes instead of 3
you can add in a check that len(image.shape) == 3
check that condition before you check whether image.shape[2] > 1
if len(image1.shape) == 3:
    if image1.shape[2] > 1:
        image1 = rgb2gray(image1)
        image2 = rgb2gray(image2)
Like this?
hmm i guess you can put them in the same line, it'll escape before reaching the second check anyway, that was my bad
at any rate, what you shared just now will work
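The single-line version of that check, as a small sketch. The helper name and the array sizes are made up; `and` short-circuits, so `shape[2]` is never touched for a 2-D array:

```python
import numpy as np

def needs_gray_conversion(image):
    # A grayscale image loaded as a 2-D array has no third axis, so check
    # the number of axes before looking at shape[2].
    return image.ndim == 3 and image.shape[2] > 1

gray = np.zeros((496, 632))
rgb = np.zeros((496, 632, 3))
print(needs_gray_conversion(gray))  # False
print(needs_gray_conversion(rgb))   # True
```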
yeah
So I see this now
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
LatLLR:
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 22, in latent_llr
atx = X * transpose(X)
ValueError: operands could not be broadcast together with shapes (496,632) (632,496)
Does the * operator not work with ndarrays?
I thought it was overwritten
I thought there was a __mul__ in python
yeah well, numpy decided to use it for something else
yeah
Ah alright ty
or A.dot(B.dot(C)) or whatever way you wanna exploit associativity. some orders are faster
This code
E = max(0,temp - lambda/mu)+min(0,temp + lambda/mu);
mu is a number tho
that's a scalar operation then
so is lambda 💀
just use /
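For the `E = max(0, ...) + min(0, ...)` line, Python's built-in `max` and `min` do not work elementwise against an array, so `np.maximum` and `np.minimum` are the closer match. A sketch with invented values (`lambda` is a reserved word in Python, hence `lam`):

```python
import numpy as np

temp = np.array([[-3.0, 0.2], [0.5, 4.0]])  # hypothetical values
lam, mu = 1.0, 2.0

# MATLAB: E = max(0, temp - lambda/mu) + min(0, temp + lambda/mu);
E = np.maximum(0, temp - lam / mu) + np.minimum(0, temp + lam / mu)
print(E)
```

Entries with absolute value up to `lam / mu` become zero and the rest move toward zero by `lam / mu`.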
multiply is hadamard, just like *
matmul works like dot on matrices, but not on multidimensional arrays
Then dot should work
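A small sketch of the difference, with a made-up matrix much smaller than the 496 x 632 one in the traceback:

```python
import numpy as np

X = np.ones((4, 6))  # hypothetical stand-in for the image matrix

try:
    X * X.T                      # * is elementwise (Hadamard) on ndarrays
except ValueError as err:
    print("ValueError:", err)

atx = X @ X.T                    # matrix product, same as X.dot(X.T) for 2-D arrays
print(atx.shape)                 # (4, 4)
print(np.allclose(atx, np.matmul(X, X.T)))  # True
```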
ah btw you're gonna love this
the one thing where numpy is better than matlab
einstein notation: native support for tensor contractions without having to reshape into matrices
check out numpy.einsum()
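Two quick `einsum` examples to show the idea, with arbitrary small arrays:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# Matrix product written as an index contraction: C[i, k] = sum_j A[i, j] * B[j, k]
C = np.einsum("ij,jk->ik", A, B)

# Contract a 3-D array with a vector along its last axis, no reshape needed.
T = np.arange(24.0).reshape(2, 3, 4)
v = np.ones(4)
M = np.einsum("ijk,k->ij", T, v)

print(np.allclose(C, A @ B))  # True
print(M.shape)                # (2, 3)
```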
nice
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
LatLLR:
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 24, in latent_llr
inv_b = inv(A.dot(A) + eye(d))
ValueError: shapes (496,632) and (496,632) not aligned: 632 (dim 1) != 496 (dim 0)
How do I uh
fix this
you missed a transpose
in code translation?
A^2 only exists if A is a square matrix
oh yeah you're right
it's telling you you tried to multiply matrices where the # of cols of one does not match the number of rows of the other
dim 1 is cols, dim 0 is rows
and it showed you the sizes of the matrices
just take it easy and read the error messages, numpy's are pretty clear
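A sketch of the shape rule with small made-up sizes. Which of the two products the original MATLAB line uses is not shown here, so both are listed:

```python
import numpy as np

d, n = 3, 5                        # hypothetical sizes
A = np.arange(15.0).reshape(d, n)

# A.dot(A) fails: (d, n) times (d, n) is not aligned unless d == n.
# With a transpose the inner dimensions match:
inv_b = np.linalg.inv(A.dot(A.T) + np.eye(d))   # (d, d)
inv_a = np.linalg.inv(A.T.dot(A) + np.eye(n))   # (n, n)
print(inv_b.shape, inv_a.shape)    # (3, 3) (5, 5)
```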
Yeah you're correct
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
LatLLR:
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 25, in latent_llr
J = zeros(m, n)
TypeError: Cannot interpret '632' as a data type
Process finished with exit code 1
Where does it put 632 as datatype
ok that's a not clear one lol
lol
numpy takes in the dimensions for zeros and ones as a tuple
so it wants zeros((m,n))
yeah
in python, putting stuff in parentheses separated by commas makes a "tuple"
zeros expects that the first parameter is a singleton or a tuple, and the second parameter, if specified, is the datatype
e.g. zeros((m,n,w), dtype = complex)
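The same thing as a runnable snippet, with arbitrary small sizes:

```python
import numpy as np

m, n = 3, 4
J = np.zeros((m, n))                 # shape passed as one tuple
Z = np.zeros((m, n), dtype=complex)  # dtype given by keyword

try:
    np.zeros(m, n)                   # n lands in the dtype slot
except TypeError as err:
    print("TypeError:", err)

print(J.shape, Z.dtype)              # (3, 4) complex128
```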
👌
So I converted this code to this
svp_J = length(find(sigma_J>1/mu));
svp_J = max(where(sigma_J > 1 / mu).shape)
As I found out on SO
But I get this error
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
LatLLR:
initial
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 44, in latent_llr
svp_J = max(where(sigma_J > 1 / mu).shape)
AttributeError: 'tuple' object has no attribute 'shape'
I understand what it means
max doesn't return a ndarray
Do I have to use np.max?
i don't see how length(find()) is equivalent to max(where())
so the find function is supposedly the same as where in numpy
In [1]: import numpy as np
In [2]: x = np.array([1,2,3,4])
In [3]: comparison = x >= 3
In [4]: comparison
Out[4]: array([False, False, True, True])
In [5]: np.sum(comparison)
Out[5]: 2
In [6]: np.sum(x >= 3)
Out[6]: 2
this is how i would do it
And there's this
the last line is the same as the previous ones, just doing it all in one line after you've seen what's going on
lemme read what np.sum does exactly
you can directly compare all elements of a vector (or matrix) to a scalar by using ==, <, >, etc
that returns an array of bools
but bools behave like 0s and 1s, where 1 is true
np.sum by default adds up all elements of a multidimensional array
so it's equivalent to counting the number of elements that satisfy the boolean condition
does the length function in matlab do that too?
e.g. being larger than a threshold
the length function counts the number of elements in what np.where returns
but np.where returns (inside a tuple) an array with one index per element that satisfies the condition, so counting them is equivalent
hmm?
it's the same as len or shape[] here
oh ok
i'm just proposing an alternative that is imo easier to understand
I'm kinda understanding
so x >= 3 returns an array that has the condition
And if I add a len() behind it
I'm essentially doing the exact thing in matlab?
len(x >= 3)
x being a ndarray
if you add len(), it will still be the same length as the original vector. matlab would do the same, so this doesn't work in matlab or here
notice in my example, both the original array and x >= 3 have 4 elements
it's just that the ones that don't satisfy the condition are False
mhm
mhm
and then when you call sum, False booleans behave as 0, and True ones behave as 1
so this counts the number of Trues
so sum(x >= 3) basically does what I want?
yep
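Side by side, with invented singular values. This also shows where the `'tuple' object has no attribute 'shape'` error comes from, since `np.where` with only a condition returns a tuple of index arrays:

```python
import numpy as np

sigma = np.array([5.0, 2.0, 0.4, 0.1])  # hypothetical singular values
mu = 1.0

idx = np.where(sigma > 1 / mu)           # a tuple with one index array per axis
print(type(idx))                         # <class 'tuple'>
print(len(idx[0]))                       # 2, like MATLAB's length(find(...))

print(int(np.sum(sigma > 1 / mu)))       # 2
print(np.count_nonzero(sigma > 1 / mu))  # 2
```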
ty!
svp_S = length(find(sigma_S>1/mu));
if svp_S>=1
sigma_S = sigma_S(1:svp_S)-1/mu;
else
svp_S = 1;
sigma_S = 0;
end
so in my code
I translated it to this
svp_S = sum(sigma_S > 1 / mu)
if svp_S >= 0:
    sigma_S = sigma_S[0:svp_S] - 1 / mu
else:
    svp_S = 0
    sigma_S = 0
I removed one from the condition since I thought python arrays start with 0 so this also should go from 0
And so I changed others to 0 as well
thing is now with using this method you mentioned
I feel like the logic is a little bit different now
or am I just confused
has anyone ever worked with taxonomists
im going to have a meeting with a bunch of them on wednesday as stakeholders
and my boss bailed since hes on PTO

rip me
🕯️
almost. in the if statement, it should be svp_S >= 1. this is because slice notation in python includes the lower limit but excludes the upper one
that means the slice 0:0 is empty
the slice 0:1 contains 0
So I also have to do svp_J = 1?
In the if?
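A sketch of how the whole block could be translated, assuming `sigma` is the 1-D array of singular values that numpy's SVD returns. The function name is invented. Here `svp` is a count rather than an index, so it keeps the same values as in MATLAB and only the slice start changes:

```python
import numpy as np

def shrink_singular_values(sigma, mu):
    """Sketch of the MATLAB if/else block; sigma is a 1-D array."""
    svp = int(np.sum(sigma > 1 / mu))
    if svp >= 1:
        # 0:svp is the first svp entries, like MATLAB's 1:svp
        sigma = sigma[0:svp] - 1 / mu
    else:
        # Same as the MATLAB branch: one component with value zero.
        svp = 1
        sigma = np.array([0.0])
    return svp, sigma

print(shrink_singular_values(np.array([5.0, 2.0, 0.4]), mu=1.0))
print(shrink_singular_values(np.array([0.3, 0.2]), mu=1.0))
```

Returning a one-element array in the else branch, rather than a bare `0`, keeps later calls such as `np.diag(sigma)` working.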
/usr/bin/python /home/arshia/PycharmProjects/pythonProject/src/main.py
LatLLR:
initial
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 50, in latent_llr
J = U_J[:, 0:svp_J].dot(diag(sigma_J)).dot(transpose(V_J[:, 0:svp_J]))
File "<__array_function__ internals>", line 180, in diag
File "/home/arshia/.local/lib/python3.10/site-packages/numpy/lib/twodim_base.py", line 309, in diag
raise ValueError("Input must be 1- or 2-d.")
ValueError: Input must be 1- or 2-d.
Process finished with exit code 1
Also I got this
And while I understand that the array should be 1d or 2d
I literally translated this to that
what's the shape of what you called diag on
```py
temp_J = Z + Y2 / mu
U_J, sigma_J, V_J = svd(temp_J, full_matrices=False)
sigma_J = diag(sigma_J)
```
what's the shape of sigma_J
what's its shape lol
i'm not sure where the error is, then
i have to go sleep, maybe someone else can help out
good luck
alright ty for the help!
Well I think I know why
nvm it won't be that
oh yeah I think ik why
so
diag(sigma_J)
if it's already set to 0
will probably cause the issue
cause 0 is a number and not an array
right?
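That guess can be checked quickly: `np.diag` on a bare scalar raises the same error, while a one-element array works. The rest of the sketch rebuilds a low-rank product from numpy's SVD output with an invented matrix. numpy hands back the singular values as a vector and V already transposed, so the slicing differs from the MATLAB line:

```python
import numpy as np

try:
    np.diag(0)                      # a bare scalar is 0-d
except ValueError as err:
    print("ValueError:", err)

print(np.diag(np.array([0.0])))     # 1-D input with one element gives a 1x1 matrix

# Rebuilding a rank-svp product from numpy's SVD (hypothetical matrix).
temp = np.arange(12.0).reshape(4, 3)
U, sigma, Vh = np.linalg.svd(temp, full_matrices=False)  # sigma is already 1-D
svp = 2
# Take the first svp rows of Vh where MATLAB takes the first svp columns of V.
J = U[:, :svp] @ np.diag(sigma[:svp]) @ Vh[:svp, :]
print(J.shape)  # (4, 3)
```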
decided to see what my conv net is "seeing" while it processes an image, actually turned out pretty interesting
Is there any advantage of doing additional feature selection after some initial features are selected? (i.e. further narrowing down the number of features)
When applying cosine similarity formula to find similar objects in database, how can i optimize code??
like it has to loop over all other objects in database and find score(using cosine similarity formula), isnt it slow??
well, if you want to compare one item to all the others, you kinda have no choice
that's O(n) in general
you can speed it up only by using specialized routines that do it quickly, but you cannot avoid having to check all elements
i am using it in recommendation system to predict "User who viewed this also viewed"
can you explain a little more please
what i would note is that, if you have normalized columns in a database (after doing some sort of "embedding"), then the cosine of the angle is the same as taking a dot product
then you can compute all the cosines as a matrix-vector product, for which there are super efficient implementations
but this is probably already what any decent library would do
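A minimal sketch of that idea in numpy. The embeddings are random stand-ins, and the sizes and variable names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(1000, 16))   # hypothetical embeddings, one row per item

# Normalise each row once; afterwards a dot product equals the cosine.
unit = items / np.linalg.norm(items, axis=1, keepdims=True)

query = unit[0]
scores = unit @ query                 # cosine of item 0 against every item at once
top = np.argsort(-scores)[1:6]        # five most similar items, skipping item 0 itself
print(top)
```

It is still one pass over all items, but done as a single matrix-vector product instead of a Python loop.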
ok thank you for the insight, i think currently i have hardcoded cosine implementation, using this would definitely improve
hi I'm completely new to ML so I may need a lot of help, apologies in advance 😅
but does anyone here have experience with using wav files as input for neural networks? all of the beginner-friendly NN tutorials are image-based so I'm having a hard time navigating through audio
like a spectrogram?
you can arrange it as something like rows representing the time axis and columns representing audio channels
or sure, in frequency domain if you like
or stft domain where you have frequency in one axis and time in the other
I've been trying to pass each wavfile as a 2D array of frequency x time where each element is amplitude
and I used constant Q instead of stft
would there be anything wrong with this
sklearn.metrics.pairwise this is used currently, i think it isnt vectorised, right? its actually pairwise
i don't know what you mean by constant Q so i can't comment on that, but if it returns some sort of matrix and contains all the info about your wav file, it should be fine 😛
this should already compute it in an efficient manner as long as you call it on all of the data at the same time
if the data is too large to fit in memory, you have no choice but to loop in one way or another. maybe by chunking into submatrices
got it
I think I should use this dataset but it's 120 GB
def video_similarity(video1, video2):
    both_rated = {}
    for person in dataset.keys():
        if video1 in dataset[person] and video2 in dataset[person]:
            both_rated[person] = [dataset[person][video1], dataset[person][video2]]
    # print(both_rated)
    number_of_ratings = len(both_rated)
    if number_of_ratings == 0:
        return 0
    video1_ratings = [
        [dataset[k][video1] for k, v in both_rated.items() if video1 in dataset[k] and video2 in dataset[k]]]
    video2_ratings = [
        [dataset[k][video2] for k, v in both_rated.items() if video1 in dataset[k] and video2 in dataset[k]]]
    # print("{} ratings :: {}".format(video1,video1_ratings))
    # print("{} ratings :: {}".format(video2,video2_ratings))
    cs = cosine_similarity(video1_ratings, video2_ratings)
    return cs[0][0]
what's the standard protocol for using giant datasets?
I don't think it's just downloading all of it into RAM and feeding it unaltered from what I've heard
do it in batches maybe
if you're using tensorflow, check out the keras data generators
they essentially read chunks/batches of data from storage and only load a few parts at a time into memory
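to make the batching idea concrete, here's a rough sketch in plain numpy. this is not the keras generator API itself, just the same idea; `batch_generator`, the file name and the array sizes are made up for illustration. it memory-maps a `.npy` file so only the rows of the current batch get copied into RAM:

```python
import os
import tempfile

import numpy as np


def batch_generator(path, batch_size):
    """Yield consecutive batches from a .npy file without loading it all."""
    data = np.load(path, mmap_mode="r")  # memory-mapped: nothing is read yet
    for start in range(0, data.shape[0], batch_size):
        # np.array(...) copies just this slice into memory
        yield np.array(data[start:start + batch_size])


# toy stand-in for a dataset that would not fit in RAM
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "features.npy")
np.save(path, np.arange(1000, dtype=np.float32).reshape(100, 10))

batch_shapes = [batch.shape for batch in batch_generator(path, batch_size=32)]
print(batch_shapes)  # three batches of 32 rows and a final one of 4
```

the file still has to live on disk (or be streamed from somewhere), the generator only keeps the RAM usage down.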
I see, I am using tf so that's good to hear
but
doesn't that mean I still have to download the dataset locally?
there must be a more clever way to make it so that you don't call this function on 2 videos at a time, but rather on the whole set of videos. that way you can call the pairwise distance func on the whole dataset in one go, which would be faster than iterating through all elements yourself
yeah. i think there are also ways to get the data from online APIs, but that means that you need a really good internet connection, since you'd download the parts of the dataset you need as you go along, and then delete them again
also this is the current format: dict[user] -> {movie: rating, movie: rating, ...}
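one possible way to do the "whole set in one go" version, sketched with plain numpy on made-up toy data (the names `R`, `B` and the three users are hypothetical). it assumes a rating is never exactly 0, so 0 can mean "not rated", and it keeps the behaviour of the function above where only users who rated both videos count:

```python
import numpy as np

# hypothetical toy data in the dict-of-dicts layout: dataset[user][video] = rating
dataset = {
    "ann": {"v1": 5, "v2": 3, "v3": 4},
    "bob": {"v1": 4, "v2": 2},
    "cat": {"v2": 5, "v3": 1},
}

users = sorted(dataset)
videos = sorted({v for ratings in dataset.values() for v in ratings})
col = {v: j for j, v in enumerate(videos)}

# users x videos matrix, 0 where a user has not rated a video
R = np.zeros((len(users), len(videos)))
for i, user in enumerate(users):
    for video, rating in dataset[user].items():
        R[i, col[video]] = rating
B = (R != 0).astype(float)  # 1 where a rating exists (assumes ratings are never 0)

dot = R.T @ R                      # sum of r_i * r_j over users who rated both
norm_i = np.sqrt((R ** 2).T @ B)   # norm of video i over users who also rated j
norm_j = np.sqrt(B.T @ (R ** 2))   # norm of video j over users who also rated i
denom = norm_i * norm_j
similarity = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)

print(similarity[col["v1"], col["v2"]])
```

`similarity[i, j]` then holds the co-rated cosine similarity for every pair of videos at once, with 0 where no user rated both.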
hmm ok, so to confirm
I can't avoid using up storage, but RAM during loading dataset is the real bottleneck and that's what data generator helps with?
yeah. you really could do it from the internet if you have a great connection tho
then you're limited by your internet speed and the api's rate limit
my connection isn't that great so I think I'm just going to download
thanks for the help!
Hi
Anyone has an idea how to fix?
What I did was doing svp_J = np.array([0]) instead of svp_J = 0
Cause apparently [0] = 0 on matlab
Not sure if it was the right solution tho
Hey all, if interested this is a cool discussion with Travis Oliphant who developed NumPy, SciPy, Anaconda. https://www.youtube.com/watch?v=gFEE3w7F0ww
wish he hadn't developed anaconda.
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
u dont like anaconda? i dont even use it
I don't like that it's taught as the default assumption for data scientists, and I think its use cases are much fewer and further between than when it was first introduced.
i mainly use it because i don't like pip
what's wrong with pip?
updating all packages in a sensible way requires a lot of care and the results will vary depending on the order in which you go through the libraries as you update them
conda handles building a sensible dependency tree automatically
You can get a pipupgrade package
Aight so for the time being I made that change
so the code looks like this
```py
temp_J = Z + Y2 / mu
U_J, sigma_J, V_J = svd(temp_J, full_matrices=False)
sigma_J = diag(sigma_J)
svp_J = sum(sigma_J > 1 / mu)
if svp_J >= 1:
    sigma_J = sigma_J[0:svp_J] - 1 / mu
else:
    svp_J = 1
    sigma_J = array([0])
J = U_J[:, 0:svp_J].dot(diag(sigma_J)).dot(transpose(V_J[:, 0:svp_J]))
```
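two numpy details that differ from MATLAB here: `np.linalg.svd` returns the singular values as a 1-D vector (not a diagonal matrix), and its third output is already V transposed. so a closer translation might look like the sketch below. it assumes the MATLAB code is doing the usual singular value thresholding step with threshold `1 / mu`; the function name and the `tau` argument are made up for illustration:

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Shrink the singular values of M by tau and rebuild the matrix."""
    # numpy returns s as a 1-D vector and Vh = V^T (already transposed)
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    svp = int(np.sum(s > tau))  # number of singular values above the threshold
    if svp >= 1:
        s = s[:svp] - tau
    else:
        svp = 1
        s = np.array([0.0])
    # rows of Vh play the role of the columns of MATLAB's V
    return U[:, :svp] @ np.diag(s) @ Vh[:svp, :]


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))
J = singular_value_threshold(M, tau=0.5)
print(J.shape)  # (6, 4), same shape as the input
```

in the snippet above that would be called as `J = singular_value_threshold(Z + Y2 / mu, 1 / mu)`.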
For now it doesn't error ig
But now I get a new one
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 63, in latent_llr
Z = inv_a.dot((atx - transpose(X).dot(L).dot(X) - transpose(X).dot(E) + J + (transpose(X).dot(Y1) - Y2) / mu))
ValueError: operands could not be broadcast together with shapes (496,496) (632,632)
Process finished with exit code 1
The original code is
Z = inv_a*(atx-X'*L*X-X'*E+J+(X'*Y1-Y2)/mu);
%udpate L
L = ((X-X*Z-E)*X'+S+(Y1*X'-Y3)/mu)*inv_b;
And the py one is
Z = inv_a.dot((atx - transpose(X).dot(L).dot(X) - transpose(X).dot(E) + J + (transpose(X).dot(Y1) - Y2) / mu))
L = ((X - X * Z - E).dot(transpose(X)) + S + (Y1.dot(transpose(X)) - Y3) / mu).dot(inv_b)
Not sure if I mistranslated again
oh and yeah I gotta fix the second one
first one tho
i'll just recommend you do what i would if i were to try to debug it now 😛 take out a notebook and check the dimensions of all matrices by hand, and then check that the ones in the code match that
You should check if the shape of each resulting matrix is what you expect it to be
So it would be good to know what shape you expect the matrices to be
If you don't know this yourself, you should then also print them in matlab, and compare it with that
Yeah that would work
hi, sorry how can i implement naive bayes from scratch with bag of words or tfidf?
im confused as when doing naive bayes for text classification, we can have 500 features
"word","good","bad"......
if i were to approach this as other naive bayes problems it might not work as what happens if i want to predict new values, it wont match up 500 features vs a sentence with only 200 unique words
so shapes dont align
https://www.youtube.com/watch?v=nRSBaq3vAeY&t=1995s&ab_channel=InstituteofNoeticSciences this is real?
Dean Radin, PhD, chief scientist at the Institute of Noetic Sciences (IONS), speaking at the 2016 Science of Consciousness Conference in Tucson, AZ.
are there any resources on how to implement naive bayes from scratch for text classification?
i cant seem to find any most tutorials only show using sklearn
others show example.
I was thinking should i create a dictionary set to store all the words and their associated probability?
and store the mean, variance and prior
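a dictionary per class is a reasonable way to do it. mean and variance belong to gaussian naive bayes though; for word counts the usual choice is the multinomial version, where you store a prior per class and a smoothed probability per (class, word). the shape worry goes away because the vocabulary is fixed at training time: a new sentence is looked up against the same words, and anything unseen is skipped. a minimal sketch on made-up toy sentences (the function names `train_nb` and `predict_nb` are hypothetical):

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes on word counts with Laplace smoothing."""
    vocab = sorted({w for d in docs for w in d.split()})
    word_counts = defaultdict(Counter)   # class -> word -> count
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return vocab, log_prior, log_likelihood


def predict_nb(model, doc):
    vocab, log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # words outside the training vocabulary are simply skipped
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


docs = ["good great film", "great acting good plot",
        "bad boring film", "bad plot awful acting"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "good film with surprising twists"))  # pos
```

for tfidf the same structure can be used with the tfidf weights in place of the raw counts; sklearn's docs note that fractional counts like tfidf may also work with the multinomial model in practice.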
back again, okay i googled the api for the libraries and was able to find a way to do this but my tfidf matrix shows lots of zeros
should i still use this in my model or use bag of words?
tfidf i understand represents the weight but i have lots of weights of 0
@wooden sail Up until that error-giving line the debugger on Pycharm tracked the values as this
The exact moment it enters the formula
now use that to check where the error is
transpose(X).dot(E)
This is the exact location where the debugger goes nuts
E: (496, 632)
Wait this doesn't make sense
X: (496, 632)
Shouldn't X.T be (632, 496)?
The error explicitly says
Traceback (most recent call last):
File "/home/arshia/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/221.5080.212/plugins/python/helpers/pydev/pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/arshia/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/221.5080.212/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 65, in latent_llr
Z = inv_a.dot((atx - transpose(X).dot(L).dot(X) - transpose(X).dot(E) + J + (transpose(X).dot(Y1) - Y2) / mu))
ValueError: operands could not be broadcast together with shapes (496,496) (632,632)
python-BaseException
Process finished with exit code 1
check all of the sizes
The very exact location would be the transpose(X) function
mhm
atx is of size 496 x 496
either atx is the wrong size, or everything else is
go one by one and check every single one of the matrices
take out a piece of paper and check your math
Ironically I forgot how matrix dots worked 💀
well, time to pick up a book
Clarifying
I remember what the dot product is
I just forgot the final product size rule
if we dot A(a, b) and B(b, n)
The size would be
C(a, n)
right?
atx = X'*X;
This is the matlab definition
so
the right code would be
atx = transpose(X).dot(X)
I got it backwards 💀
You were right, atx was backwards it had to be X' * X but it was X * X' which is not the same in matrix rules
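for reference, the shape rule is quick to check in numpy with zero-filled arrays of the same shapes as in the traceback (the arrays here are placeholders, only the shapes matter):

```python
import numpy as np

X = np.zeros((496, 632))   # same shape as X in the traceback
E = np.zeros((496, 632))

xtx = X.T @ X              # (632, 496) times (496, 632)
xxt = X @ X.T              # (496, 632) times (632, 496)
xte = X.T @ E

print(xtx.shape)  # (632, 632)
print(xxt.shape)  # (496, 496)
print(xte.shape)  # (632, 632)
```

so with `atx = X.T @ X` every term in the `Z` update is (632, 632) and the subtraction no longer has to broadcast (496, 496) against (632, 632).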
We come to the next error now 💀
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 70, in latent_llr
E = max(0, temp - lambda_value / mu) + min(0, temp + lambda_value / mu)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Lemme pull out the debugger again
OK so
temp: (496, 632)
lambda_value = 0.8
mu = 1e-06
I feel like there's a logic difference in MATLAB and python
Cause the values are the same in the MATLAB code
the built in max and min don't work on vectors
there is, but you also don't need it here, since there are other ways to compute the same thing
What are those other ways?
are the matrices real-valued or complex-valued?
The matrices are based on images
I think real values
And looking at the debugger I also seem to only see real values
View as array on Z for example
signs = np.sign(temp)
E = np.abs(temp) - lambda_value / mu
E = E*signs*(E > 0)
is one way
otherwise, the direct equivalents are np.maximum and np.minimum
This does the same thing as the E = max(0, temp - lambda_value / mu) + min(0, temp + lambda_value / mu)?
soft thresholding, yeah?
So like
E = np.maximum(0, temp - lambda_value / mu) + np.minimum(0, temp + lambda_value / mu)
?
mhm
Tyvm
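a quick sanity check that the two versions of the soft thresholding agree, on random data (here `t` stands in for `lambda_value / mu` and is assumed to be non-negative):

```python
import numpy as np

rng = np.random.default_rng(0)
temp = rng.standard_normal((4, 5))
t = 0.5  # stands in for lambda_value / mu

# direct translation of the MATLAB line
E_direct = np.maximum(0, temp - t) + np.minimum(0, temp + t)

# sign-based version
shrunk = np.abs(temp) - t
E_sign = shrunk * np.sign(temp) * (shrunk > 0)

print(np.allclose(E_direct, E_sign))  # True
```

both set every entry with magnitude below `t` to zero and pull the remaining entries towards zero by `t`.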
i'm out for the day
Alright have fun!
max_l1 = max(max(abs(leq1)))
max_l2 = max(max(abs(leq2)))
max_l3 = max(max(abs(leq3)))
Gives me the same error
Using maximum on both gives me this
Traceback (most recent call last):
File "/home/arshia/PycharmProjects/pythonProject/src/main.py", line 31, in <module>
llr1 = latent_llr(X1, lambda_value)
File "/home/arshia/PycharmProjects/pythonProject/src/latent_llr.py", line 75, in latent_llr
max_l1 = maximum(maximum(abs(leq1)))
TypeError: maximum() takes from 2 to 3 positional arguments but 1 were given
can I dm someone a dataset? I need help interpreting it
Hi i need help too
anyone free?
https://paste.pythondiscord.com/filuzedaxu this is my source code , this is what im supposed to find:missing value has more than 5% of total number of samples- Replace
missing value with mean, median or mode (whichever is appropriate)
• If missing value has less than 5% of total number of samples- Remove rows
with the missing data.
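one way the 5% rule could be written with pandas, as a rough sketch on made-up data (the function name `handle_missing`, the column names and the choice of median for numeric columns and mode for everything else are assumptions, the task only says "whichever is appropriate"):

```python
import numpy as np
import pandas as pd


def handle_missing(df, threshold=0.05):
    """Impute columns with many missing values, drop rows for the rest."""
    df = df.copy()
    missing_share = df.isna().mean()  # fraction of missing values per column
    sparse_cols = []
    for col, share in missing_share.items():
        if share == 0:
            continue
        if share > threshold:
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                df[col] = df[col].fillna(df[col].mode().iloc[0])
        else:
            sparse_cols.append(col)
    # drop rows that are missing a value in a "less than 5%" column
    return df.dropna(subset=sparse_cols) if sparse_cols else df


# toy data: "age" is missing in 25% of rows, "city" in 2.5% of rows
df = pd.DataFrame({
    "age": [np.nan if i % 4 == 0 else 20.0 + i for i in range(40)],
    "city": [None if i == 7 else "A" for i in range(40)],
})
clean = handle_missing(df)
print(clean.shape, int(clean.isna().sum().sum()))
```

here `age` gets filled with its median and the single row without a `city` is dropped.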
im having a problem with opencv err is: File "c:\Users\90505\Desktop\serittakibi\main.py", line 66, in <module>
lines = cv2.HoughLinesP(kesik, 2, np.pi/180, 100, np.array([]), minLineLength=40, maxLineGap=5)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\core\src\matrix.cpp:246: error: (-215:Assertion failed) s >= 0 in function 'cv::setSize'
the code:
import cv2
from cv2 import cvtColor
import numpy as np
import matplotlib.pyplot as plt
def make_coordinates(image, line_parameters):
slope, intercept = line_parameters
y1 = image.shape[0]
y2 = int(y1*(3/5))
x1 = int((y1 - intercept)/slope)
x2 = int((y2 - intercept)/slope)
return np.array([x1, y1, x2, y2])
def avarage_slope_intercept(image, lines):
left_fit = []
right_fit = []
for line in lines:
x1, y1, x2, y2 = line.reshape(4)
parameters = np.polyfit((x1, x2), (y1, y2), 1)
slope = parameters[0]
intercept = parameters[1]
if slope < 0:
left_fit.append((slope, intercept))
else:
right_fit.append((slope, intercept))
left_fit_average = np.average(left_fit, axis=0)
right_fit_average = np.average(right_fit, axis=0)
left_line = make_coordinates(image, left_fit_average)
right_line = make_coordinates(image, right_fit_average)
return np.array([left_line, right_line])
def candy(image):
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
billur = cv2.GaussianBlur(gray,(3, 1), 6)
candy = cv2.Canny(billur, 50, 150)
return candy
def display_lines(image, lines):
line_image = np.zeros_like(image)
if lines is not None:
for line in lines:
x1, y1, x2, y2 = line.reshape(4)
cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)
return line_image
def merak(image):
height = image.shape[0]
polygons = np.array([
[(800, height ), (100, height), (400, 250)]
])
mask = np.zeros_like(image)
cv2.fillPoly(mask, polygons, 255)
maske = cv2.bitwise_and(image, mask)
return maske
image = cv2.imread('ang.jpg')
lane_image = np.copy(image)
serit = np.copy(image)
candy = candy(serit)
candy_image = candy[lane_image]
kesik = merak(candy_image)
sonuc = cv2.addWeighted(lane_image, 0.8, lane_image, 1, 1)
lines = cv2.HoughLinesP(kesik, 2, np.pi/180, 100, np.array([]), minLineLength=40, maxLineGap=5)
averaged_lines = avarage_slope_intercept(lane_image, lines)
line_image = display_lines(lane_image, averaged_lines)
cv2.imshow("result", sonuc)
cv2.waitKey(0)
sorry if its considired as spam
code block it
better to format the code
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
write code in these:
```py
#here goes code
```
im confused
edit your message of code, and put the code in those back ticks like bot says, it will be formatted like below
print('hello')
ok but i dont have nitro so i need to send in 2 separete messages
that is okay.
is it like
'''py
import cv2
from cv2 import cvtColor
import numpy as np
import matplotlib.pyplot as plt
def make_coordinates(image, line_parameters):
slope, intercept = line_parameters
y1 = image.shape[0]
y2 = int(y1*(3/5))
x1 = int((y1 - intercept)/slope)
x2 = int((y2 - intercept)/slope)
return np.array([x1, y1, x2, y2])
'''
think its not
'''
import cv2
from cv2 import cvtColor
import numpy as np
import matplotlib.pyplot as plt
def make_coordinates(image, line_parameters):
slope, intercept = line_parameters
y1 = image.shape[0]
y2 = int(y1*(3/5))
x1 = int((y1 - intercept)/slope)
x2 = int((y2 - intercept)/slope)
return np.array([x1, y1, x2, y2])
'''
doesent work
` not '
the one below esc key
`
oh ok
leme send the full cod
```py
import cv2
from cv2 import cvtColor
import numpy as np
import matplotlib.pyplot as plt

def make_coordinates(image, line_parameters):
    slope, intercept = line_parameters
    y1 = image.shape[0]
    y2 = int(y1*(3/5))
    x1 = int((y1 - intercept)/slope)
    x2 = int((y2 - intercept)/slope)
    return np.array([x1, y1, x2, y2])

def avarage_slope_intercept(image, lines):
    left_fit = []
    right_fit = []
    for line in lines:
        x1, y1, x2, y2 = line.reshape(4)
        parameters = np.polyfit((x1, x2), (y1, y2), 1)
        slope = parameters[0]
        intercept = parameters[1]
        if slope < 0:
            left_fit.append((slope, intercept))
        else:
            right_fit.append((slope, intercept))
    left_fit_average = np.average(left_fit, axis=0)
    right_fit_average = np.average(right_fit, axis=0)
    left_line = make_coordinates(image, left_fit_average)
    right_line = make_coordinates(image, right_fit_average)
    return np.array([left_line, right_line])

def candy(image):
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    billur = cv2.GaussianBlur(gray,(3, 1), 6)
    candy = cv2.Canny(billur, 50, 150)
    return candy

def display_lines(image, lines):
    line_image = np.zeros_like(image)
    if lines is not None:
        for line in lines:
            x1, y1, x2, y2 = line.reshape(4)
            cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)
    return line_image

def merak(image):
    height = image.shape[0]
    polygons = np.array([
        [(800, height ), (100, height), (400, 250)]
    ])
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, polygons, 255)
    maske = cv2.bitwise_and(image, mask)
    return maske

image = cv2.imread('ang.jpg')
lane_image = np.copy(image)
serit = np.copy(image)
candy = candy(serit)
candy_image = candy[lane_image]
kesik = merak(candy_image)
sonuc = cv2.addWeighted(lane_image, 0.8, lane_image, 1, 1)
lines = cv2.HoughLinesP(kesik, 2, np.pi/180, 100, np.array([]), minLineLength=40, maxLineGap=5)
averaged_lines = avarage_slope_intercept(lane_image, lines)
line_image = display_lines(lane_image, averaged_lines)
cv2.imshow("result", sonuc)
cv2.waitKey(0)
```
the err
lines = cv2.HoughLinesP(kesik, 2, np.pi/180, 100, np.array([]), minLineLength=40, maxLineGap=5)
cv2.error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\core\src\matrix.cpp:246: error: (-215:Assertion failed) s >= 0 in function 'cv::setSize'
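not 100% sure this is the whole story, but `candy_image = candy[lane_image]` looks like the likely culprit. that line doesn't crop or mask anything, it uses the pixel values of `lane_image` as indices into the edge image, which gives a 4-D array, and `HoughLinesP` expects a single-channel 8-bit image. a small numpy-only illustration with made-up tiny arrays standing in for the real images:

```python
import numpy as np

edges = np.zeros((4, 5), dtype=np.uint8)      # stand-in for the Canny output (H, W)
image = np.zeros((4, 5, 3), dtype=np.uint8)   # stand-in for the colour image (H, W, 3)

weird = edges[image]   # uses the pixel values of `image` as row indices into `edges`
print(weird.shape)     # (4, 5, 3, 5)
```

passing the Canny result straight on, i.e. `kesik = merak(candy)`, would probably be what was intended. also `candy = candy(serit)` overwrites the function `candy` with its result, so a different variable name is safer there.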
can somone help
me when val_loss < loss
funy right
dude im having a mental breakdown does anyone know how to fix the err
im suffering physical pain
have u tried changing whats in that line
hello for NLP, how to improve accuracy for sentiment analysis?
my accuracy was 80% when using bag of words then 85% for tfidf
the accuracy increase for bag of words to 85% after removing stop words such as "not"
but my best accuracy is 85%.
i not sure how else to improve,
i viewed some words which resulted in false predictions
but not sure what they have in common
my dataset is also large 50,000
i already cleaned the data too removing stopwords, lowercase, special characters removed, anything non english
should i try word cloud to better visualize?
Say, I have made a regression model that predicts the future profit depending on the price, date, season, and prior profit data. How can I find the price that maximizes the profit other than simply just input searching?
I think this is where things like MILP comes in... and I can't get my head around what the hell even they are
how would i use tensorflow to find if an image is similar to another one?
That's a really open question
To begin with, define similarity
does anyone know a reason as to why the pct_change function from pandas causes a large amount of NaNs to be generated and if there is a way to stop this
!docs pandas.Series.pct_change
Series.pct_change(periods=1, fill_method='pad', limit=None, freq=None, **kwargs)
Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by default. This is useful in comparing the percentage of change in a time series of elements.
it's going to depend on what the data is.
and where it's creating NaNs
basically, you should give an exact example.
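in the meantime, a few common sources of NaN in `pct_change`, shown on a made-up series: the first row has nothing before it, a missing value poisons its own row and the next one (when no filling is done), and 0 followed by 0 is 0/0:

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 110.0, np.nan, 121.0, 0.0, 0.0])
change = s.pct_change(fill_method=None)  # no forward-filling of missing values

print(change.isna().tolist())
# [True, False, True, True, False, True]
```

if the NaNs come from gaps or zeros in the data, cleaning those first (or dropping the affected rows with `dropna()` afterwards) is usually the way out; the first row will always be NaN.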
tried to make a GAN produce images of cats, didn't turn out great
It's camouflaged. Is sneaky kitty.
yeah
saw an example of a GAN that by default trained for 200 epochs
i was only training mine for 3
it takes like 8 minutes per epoch, i have to find some way to speed this up because i'm not waiting an entire day per test
Just use fastdup 🤪
https://github.com/visualdatabase/fastdup
i'm using a neural network in keras to solve a regression problem, but i'm wondering, should I scale/normalize my y data? it ranges from 1,000 -> 1,000,000,000+ and i'm wondering if that could cause issues with training and convergence. if I should, what's the best way to scale/normalize it?
a good starting point could be to scale the inputs to have mean 0 and a st deviation of 1 i think
the inputs are already scaled
I'm worried about the outputs
my main concern throughout training is how fast it'll converge and why keras is saying loss is nan
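with targets spanning six orders of magnitude, one common trick is to train on the log of `y` (optionally standardized) and undo the transform on the predictions. huge unscaled targets are also a plausible cause of the nan loss, though that's a guess without seeing the model. a small sketch on made-up values:

```python
import numpy as np

y = np.array([1e3, 5e4, 2e6, 7e8, 1e9])

# compress the range with a log, then standardize
log_y = np.log10(y)
mean, std = log_y.mean(), log_y.std()
y_scaled = (log_y - mean) / std

# invert the transform on model predictions to get back to original units
y_restored = 10 ** (y_scaled * std + mean)
print(np.allclose(y_restored, y))  # True
```

keep `mean` and `std` from the training set only and reuse them at prediction time. note that a loss on the log scale penalizes relative error rather than absolute error, which may or may not be what you want.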
hi, i have a question on Thompson Sampling. by definition, it says we sample from a distribution and then choose the best choice. isn't the idea of sampling and then choosing the best choice a contradiction?
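not a contradiction as far as i can tell: the sampling is over what you currently believe about each option, and "best choice" means best with respect to that one random draw. arms you are unsure about sometimes draw high and get explored, arms you know are bad almost never win the draw. a minimal Beta-Bernoulli sketch with made-up success rates:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.2, 0.5, 0.8])  # hypothetical arms, unknown to the agent
successes = np.ones(3)                   # Beta(1, 1) prior for each arm
failures = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(2000):
    # one random draw per arm from its current posterior
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))        # act greedily w.r.t. the sampled values
    reward = float(rng.random() < true_rates[arm])
    successes[arm] += reward
    failures[arm] += 1.0 - reward
    pulls[arm] += 1

print(pulls)  # most pulls should end up on the last arm
```

as the posteriors sharpen, the draws for the best arm win almost every round, so the randomness fades out on its own.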
guys i need help with this, i have a celeba dataset and i would like to create a face keypoints detection. i noticed that all the faces in that dataset are straight and not tilted in any weird way so it's not accurately detecting the keypoints, so for this i want to prepare my tensorflow dataset in the usual format: sets of images and labels (keypoints) in a tuple, batched
even if i try tf.data.Dataset.from_tensor_slices(images).map(transform).batch(128) i still want to get that same augmented new keypoints
so basically i need a augmented images and keypoints partitioned in a batch
im using albumentation module to augment the image and keypoints
I have a quick question regarding "quantitative metrics". Is F-Score the same as F1-Score the same as Dice-Score as Dice-Coefficient? It starts getting confusing.
F1-Score and F-Score seem to be the same. The Dice Coefficient is referring to F-Score, too. So I assume that those are the same?
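for binary labels they do come out the same: F1 is the F-score with beta = 1 (the general F-beta weights recall differently), and both F1 and the Dice coefficient reduce to 2·TP / (2·TP + FP + FN). a quick check on made-up labels:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0], dtype=bool)
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=bool)

tp = np.sum(y_true & y_pred)
fp = np.sum(~y_true & y_pred)
fn = np.sum(y_true & ~y_pred)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Dice: 2 * |A ∩ B| / (|A| + |B|) on the sets of positive elements
dice = 2 * tp / (y_true.sum() + y_pred.sum())

print(f1, dice)  # 0.75 0.75
```

where it gets confusing is mostly naming: "F-score" without a number usually means F1, and segmentation papers tend to say Dice for the same quantity computed over pixels.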
i HATE Ai now
how come following a tutorial on MNIST
i get barely 9% after 10 epochs but they and literally everyone else gets 97%?
can you post your code and the suggested code?
@wooden sail
i was using the wrong metrics
I dont think you remember I had this same issue yesterday
and you helped
i think I can fix it
i remember some, you had some issues with the input sizes of the layers
u've done it wrong thats why
i followed pytorch tutorial and got 97+
follow pytorch
the issue was I used some keras.Categorical_accuracy()
instead of 'accuracy'
whats the difference between those metrics
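as far as i know, the string 'accuracy' lets keras pick the variant that matches your loss and label format, while the categorical accuracy metric expects one-hot targets and compares the argmax of the targets with the argmax of the predictions. if the targets are plain integer labels, that comparison is meaningless. the sketch below only mimics the two computations in numpy on made-up data, it does not call keras:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_classes = 1000, 10
labels = rng.integers(0, n_classes, size=n)   # integer labels, like raw MNIST targets
probs = np.full((n, n_classes), 0.01)
probs[np.arange(n), labels] = 0.91            # a "perfect" model's softmax output

# sparse-style accuracy: compare the label with argmax of the prediction
sparse_acc = np.mean(labels == probs.argmax(axis=1))

# categorical-style accuracy expects one-hot targets and takes argmax of both;
# on an (n, 1) column of integer labels that argmax is always 0
cat_acc_wrong = np.mean(labels.reshape(-1, 1).argmax(axis=1) == probs.argmax(axis=1))

print(sparse_acc, cat_acc_wrong)
```

the second number lands around 1 / n_classes even though the predictions are perfect, which would explain a reported accuracy near 9-10% on a 10-class problem.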
u got 9% accuracy on clothing something isnt right
what metric are u using
Hello how can i please improve the accuracy for text classification? I cleaned my data, and excluded certain stopwords like not, but not sure how else to improve
I tried tfidf too and bag of words
i dont use nlp but sometimes you just cant, it all depends on ur data
@prime hearth there isn't a one size fits all answer for this. We'd need to know what the data is, what the classes are, and how it's performing currently.
Hi everyone, I have a few questions.
At the moment I have a laptop with the processor intel i5 3320M and planning to study artificial intelligence and data science.
1-) does it make sense to start the university with this laptop for the beginning or should I buy a new laptop? Can this laptop do what is needed at the beginning?
2-) also, is MacBook m1 or laptop with rtx 3070 better for artificial intelligence and data science?
I can't imagine how an rtx 3070 would fit into a laptop.
whether or not you want to use Mac is a religious question. I'm allergic to Apple products.
if you're doing an AI/DS focused degree program, and they want you to do model training, they will probably give you access to a high-performance computer for that.
M3070 or some such may be good
m1 pro is good but id wait for m2 pro
id also wait for 4070
im sure for ur programme it will be easy enough that u can use Colab for class work
for heavy projects theyll give u a cluster
If you like gaming, get the RTX 3070. When DL is not your thing you still got gaming to drown your sorrows.
ow yea or this
anyway ur current laptop is fine
theyre not gonna make u train massive neural networks on ur own laptop
Makes sense but the semester will start in around 2-3 months. Therefore, it the laptop I have does the job, then can wait for 4000s
it will be simple semester probably teaching u the tools
and theory
i wudnt waste the money unless it was REQUIRED for projects
because we are only 5 months away from m2 pro or nvidia gpu
ur buying in right at the end of a gen
you really shouldn't need to buy any GPU. I would be very surprised if they give you an assignment that can only be completed in a reasonable amount of time with GPU computation, but force you to use your own hardware.
i can only say good things about the m1 pro, after 2 years finally most things work on arm64
however...
theres still some kinks
ull prob have a more issueless experience on nvidia
But when the new gpus came out, will they be available for laptops?
for example... deep learning tools
you getting nvidia gpu use a desktop
otherwise, wait for the m2pro
m2pro will perform as well as a 3080
i bet you
I have a 3070 TI, and it's basically the size of a keyboard. and it has lots of space around it so the fans can blow over and under it. I don't think anyone makes laptops with these in them.
u CAN get mobile/laptop versions
but im not sure theyre as good as the normal ones
id recommend desktop for nvidia
in either case, @finite kayak, is there a reason you're interested to buy a GPU? because you almost certainly won't need one at all for your coursework. if you want one for gaming, that's a different question.
once the hard stuff starts it will be second semester and the new m2 pro will have released around then its the perfect play
No I want to buy a laptop either MacBook or windows
it will speed up ur training massively
i do agree there are times when ur using jupyter notebook for certain projects and not a cluster where the power will help u
its quick and easy to just open a laptop and open the app
rather than connect to the university cluster
Yet, I also have a laptop but it’s old. Therefore, I am unsure whether I should buy a laptop or not. If I should buy, then should I go for rtx or Mq
M1
like i said, wait until end of semester and get the m2 pro
it will be a massive improvement on m1
Okay then thank you a lot
what OS is the laptop that you currently have?
Windows
you can get a performance boost if you delete everything and install linux on it.
xD
Hey there homies
leq1 = xmaz - E
leq2 = Z - J
leq3 = L - S
max_l1 = max(max(abs(leq1)))
max_l2 = max(max(abs(leq2)))
max_l3 = max(max(abs(leq3)))
that's what I did with my school/gaming laptop for my last semester.
mac=linux(depends if u care about experience) > > windows
E, Z, J, S being numpy arrays
what shape are they
intersting indentation
Lemme show rq
just do print(Z.shape) and say what it is.
@finite kayak have u tried using ur current laptop to grid search or train a network
E: (496, 632)
Okay I will install Linux. Then, if I feel like I need a new laptop during my projects, I will wait for the next gen. Thanks a lot!
Z: (632, 632)
!sm set 3s
✅ The slowmode delay for #data-science-and-ml is now 3 seconds.
please say what you want to say in one message, so that questions stay on screen longer.
Alright
time to crack on coding my thesis
@scarlet siren this is all you should need to do
>>> arr = np.random.random((10, 10))
>>> np.abs(arr).max()
0.999
are you looking for the maximum element?
The equivalent of max(max(abs(leq1))) on matlab
for an array leq1 of arbitrary size/dimensionality, what does that chain of functions do? because we don't necessarily know matlab.
Neither do I tbh, but I gotta translate the code
a quick test on octave shows that max takes the maximum along the first axis, so on a matrix it returns one value per column. applying it twice first reduces the matrix to a row of column maxima and then reduces that row to a single number, i.e. the overall maximum. what stelercus proposed is a solution. alternatively, you can use np.amax(np.abs(my_array))
you should probably run it in matlab and observe the behavior, so that you know what the code you're translating is intended to do.
!docs numpy.amax
numpy.amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the maximum of an array or maximum along an axis.
In [1]: import numpy as np
In [2]: x = np.array([[1,2,3], [3,4,4]])
In [3]: np.amax(x)
Out[3]: 4
if you specify no axis, it takes the overall max over all axes. same as when you do np.sum without specifying an axis
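to tie this back to the MATLAB line, here is the two-step reduction next to the one-call version on a small made-up matrix (for a 2-D array; higher-dimensional input would need more nested `max` calls in MATLAB):

```python
import numpy as np

leq1 = np.array([[1.0, -7.0, 3.0],
                 [-2.0, 4.0, -5.0]])

# MATLAB-style: max over each column first, then max over the resulting row
two_step = np.abs(leq1).max(axis=0).max()

# numpy shortcut: no axis means "over all elements"
one_step = np.abs(leq1).max()

print(two_step, one_step)  # 7.0 7.0
```

so `max_l1 = np.abs(leq1).max()` should be a drop-in replacement for `max(max(abs(leq1)))`.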