#data-science-and-ml
What do you mean?
What?
stop there?
I'm so confused
im saying is it enough to learn up until vectors in calc 3 and stop
Calc 3 was multivariable calculus at my university
keep in mind the names of courses vary from uni to uni, as does their content, so saying "calc 3" doesn't mean much
oh here its 1: differential 2: integral 3: 3d or vectors
vector and multivariate calc is what you'll need
so in my case its calc 1 , 2 and half of 3
its actually an insanely big topic
and very hard on the brain at some parts
Interesting. Calc 1 for me was differentiation + integration, Calc 2 was integration by parts and trig substitution, Calc 3 was multivariable. Differential Equations and Linear Algebra were their own courses
well, saying "vectors" doesn't mean they'll learn linalg in there... hopefully
probably just gradients, jacobians, hessians, line integrals and vector fields
dammit i got a beginner question wrong
you don't get to multiply the integer by the root power, keeping the power inside the root?
hmm?
if you don't merge the 3 with the 1/5, for example, you need to use the chain rule with the power rule
if you do, then you just need the power rule
you get the same result
the scalar 10 doesn't participate in the derivative, since differentiation is linear
its first question of its kind i just used methods prior
so i looked at the 10
do you mean you only look at the closest transformation of x?
i saw it was 10* 1/5 x^3
what is 10 * 1/5 x^3?
10(5rootx^3) = 10(x^3 ^1/5)?
use parentheses
let me rewrite it lol
so basically, you do the root stuff first and then the 10 later instead of 10 first by the root power
to rephrase then, looking at a 5th root of x something
i looked at that and thought well thats 1/5 of that value
so i did the power rule with the 1/5 and 10 first
instead of 3
what???
mhm
i have no idea where you're getting multiplication there, it's clearly saying "take the 5th root"
power rule
and it has nothing to do with the 10
for example if we had 4x^5, id say 20x... in this case ^4
i tried to apply this logic
ah, you want to use the chain rule, you mean? because you're not saying anything nor using symbols, so i have no way of understanding what you're trying to say if you don't use words
is this chain rule?
yes, the first step of the chain rule there will give that result. but you have g(f(x)), where g is exponentiation to the 1/5
and f is raising to the 3rd power
so you need the chain rule
so its the chain rule even if ur just doing a simple two step power rule on one x
it's chain rule when you compose two functions
i thought u can just multiply 10 by 1/5
that you can do
the parser is probably wrong
that sounds correct to me
so i got 6x^2 instead of 6x^2/5
yeah the issue was not with the 10/5
but rather with differentiating the 1/5 power
but i thought we already got rid of that by multiply 1/5 by 10
and - 1 off the power i did forget to do
that's what you got wrong, yes
ohhhhhh
lol
!
basically what went wrong was i was under the impression that you only minus one off the power when its the power closest to x, for some reason
that's an odd assumption to make
this is not the chain rule though right? it seems simple power rule
oh but in a way, i suppose it is
chain rule because you do the power rule twice
you have composition of power functions
its sort of like saying d x^3 / something
if you do power rule twice, why is it dy and not d2y
so in this particular case what are these steps, substituted with my values
10 d (x^3)^(1/5) / dx= 10 [(1/5) (x^3)^(-4/5)] * [3(x^2)]
the first [] is g'(f(x)), the second is f'(x)
or maybe i did that wrong, i'm in a meeting rn
check my arithmetic with the exponents
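A quick numerical sanity check of the exponents above, as a minimal sketch: the chain-rule product simplifies to 6x^(-2/5), which is also what you get by merging the powers first (10x^(3/5)) and applying the power rule once. The function names and the test point x = 2.0 are only illustrative.

```python
def f(x):
    # 10 * fifth root of x^3, written as a composition
    return 10 * (x ** 3) ** (1 / 5)

def chain_rule_derivative(x):
    # g'(f(x)) * f'(x) with g(u) = u^(1/5) and f(x) = x^3
    return 10 * (1 / 5) * (x ** 3) ** (-4 / 5) * (3 * x ** 2)

def merged_power_derivative(x):
    # merge the exponents first: 10 * x^(3/5), then the power rule once
    return 6 * x ** (-2 / 5)

def central_difference(func, x, h=1e-6):
    # numerical derivative, used only as an independent check
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
print(chain_rule_derivative(x0), merged_power_derivative(x0), central_difference(f, x0))
```

All three printed values should agree to several decimal places, so both routes (chain rule, or merging the exponents first) lead to the same derivative.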
Hey, I'm running a yolov3 script for detection but getting this error.. can anyone help out?
Traceback (most recent call last):
File "number_plate.py", line 34, in <module>
ln = ln[i[0] - 1]
IndexError: invalid index to scalar variable.
This is the line of code
LABELS = open(LABELS_FILE).read().strip().split("\n")
np.random.seed(4)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
dtype="uint8")
net = cv2.dnn.readNetFromDarknet(CONFIG_FILE, WEIGHTS_FILE)
image = cv2.imread(INPUT_FILE)
(H, W) = image.shape[:2]
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
What am I doing wrong here. If I remove [0], the code runs but doesn't detect anything.
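This error usually shows up when `net.getUnconnectedOutLayers()` hands back plain integers instead of one-element arrays, which, as far as I know, changed in newer OpenCV releases (around 4.5.4). Flattening first works with both shapes. The sketch below uses stand-in arrays rather than a real network, and `output_layer_names` is a hypothetical helper name.

```python
import numpy as np

def output_layer_names(layer_names, out_layer_ids):
    # out_layer_ids are 1-based; flatten() handles both [[2], [4]] and [2, 4]
    ids = np.asarray(out_layer_ids).flatten()
    return [layer_names[int(i) - 1] for i in ids]

names = ["conv_0", "yolo_1", "conv_2", "yolo_3"]   # stand-in for net.getLayerNames()
old_style = np.array([[2], [4]])                    # shape (n, 1), older OpenCV
new_style = np.array([2, 4])                        # shape (n,), newer OpenCV

print(output_layer_names(names, old_style))
print(output_layer_names(names, new_style))
```

In the script that would be `ln = output_layer_names(net.getLayerNames(), net.getUnconnectedOutLayers())`. If the layer names come out right and nothing is detected, the cause is probably somewhere else in the script.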
I'm trying to read this machine learning book and am getting the feeling that I need a statistics crash course first ... I'm correct about that right? The code doesn't seem crazy ( in beginning ) but I've already seen quite a few stats terms that I don't fully understand
Statistics will help your understanding of a variety of topics in machine learning. Although not required to apply the methods, you will find yourself more efficient in optimizing or trying to find what to tune or add if you understand the math behind
you don't need to understand EVERYTHING, otherwise you'd need a whole bachelors and PhD in Statistics
but it's helpful to learn the basics
Stats is awesome start with medical statistics imo
How different tests work... how logistic regression works
Ok got it - I can definitely understand the super basic stuff of probability/conditional probability and some of the notation is familiar from calc. but when stuff like normal distribution / weighted sums / weighted regression gets introduced I only understand it on a very shallow level. I'm gonna look into a primer on stats for comp sci and go back at it. Thank you !
haha sounds good! for real tho which calc are you in? Have you heard of the professor leonard videos? I would come home absolutely lost from what my professor said until watching those videos and they just cracked it open so easily for me.
Guys is there any way to do data entry on a mobile app? (especially if the app does not have an API available)
Learning the stats behind the ML gives exponentially diminishing returns. I would say go learn the statistics crash course specialized on ML.
But the basics will benefit you very much and definitely help debug certain things (no loss, exploding gradients, unstable convergence)
For production settings you won't be able to do much other than just implementing ready-made models to your application, with no further improvement if you don't learn a lot about the frameworks and stats.

Right on - so the deeper I go into it the more I can leave to abstraction. I'm just confused at the basics so I think I will get to know some of the vocab and ML related stats math before I get back to the ML book. Thank you much!
I'm in calc 1
Exact same error I encountered yesterday with the same yolov3 python implementation
Just use yolov5 or v6
How do we go about choosing the correct model seeing as with the correct parameters a model can perform so much better?
by just picking several models that fit the use case and then performing tuning on them?
Hey, this is a Gaussian mixture model. But a little strange here.
Is there any idea what kind of occurrence could cause an odd binary between -1 and -0.5 to occur?
hey anyone help me on ai proctoring
I have many rows like this (Player, Year, Points). Like LeBron in the pic there is data for many years corresponding to each player.
How would I go about getting a linear regression coefficient per player for Year/Points? I'm not looking for a plot just something like
LeBron James, 0.88
Kevin Durant, 0.72
@rich gull if you have a function that can calculate that value given a sequence of numbers, you can do a group by player/year and do it.
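A minimal sketch of that group-then-fit idea in plain Python, assuming the coefficient wanted is the least-squares slope of Points against Year. The rows are made up, and `slope` is a hypothetical helper; with pandas the same thing would be a group-by on the player column followed by applying such a function.

```python
from collections import defaultdict

def slope(xs, ys):
    # ordinary least-squares slope of ys against xs
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# made-up rows in the (Player, Year, Points) layout
rows = [
    ("Player A", 2018, 20.0), ("Player A", 2019, 22.0), ("Player A", 2020, 24.0),
    ("Player B", 2018, 30.0), ("Player B", 2019, 29.0), ("Player B", 2020, 28.0),
]

# collect the years and points of each player
by_player = defaultdict(lambda: ([], []))
for player, year, points in rows:
    by_player[player][0].append(year)
    by_player[player][1].append(points)

slopes = {player: slope(years, pts) for player, (years, pts) in by_player.items()}
for player, coef in slopes.items():
    print(f"{player}, {coef:.2f}")
```

A player with only one season would make `sxx` zero, so those rows need to be filtered out or handled separately.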
Am using Gauss-Newton method for data fitting, anybody free to maybe check code? Getting everything I need from it but would love a second set of eyes!
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
I'm not volunteering to do a code review per se, but you're more likely to get one if the code to be reviewed is readily available.
I have it in a jupyter notebook, is there a way of sharing in that format here? Or will I just paste the code
if it's not in colab, you might have to use one of those tools to convert it to a py file.
Code review on Gauss Newton method of fitting if possible: https://paste.pythondiscord.com/fojecuqazo (missing some prints, as in jupyter format)
I can probably only make suggestions about how to make it more pythonic.
for example, your ri function should be one statement.
def ri(x_list,alp,bet,gamma):
    return np.array([np.exp(-1 * gamma * x) - alp * np.exp(-1 * bet * x ** 2) for x in x_list])
you can make this code a lot less verbose in general by using list comprehensions, as I've done here.
like with your get_multi_jacobian function
as for ri_multi
for i in range(len(alps)):
    expo_sum -= alps[i]*np.exp(-1*bets[i]*(x**2))
are alps and bets lists or (numpy) arrays?
!zip
The zip function allows you to iterate through multiple iterables simultaneously. It joins the iterables together, almost like a zipper, so that each new element is a tuple with one element from each iterable.
letters = 'abc'
numbers = [1, 2, 3]
# list(zip(letters, numbers)) --> [('a', 1), ('b', 2), ('c', 3)]
for letter, number in zip(letters, numbers):
    print(letter, number)
The zip() iterator is exhausted after the length of the shortest iterable is exceeded. If you would like to retain the other values, consider using itertools.zip_longest.
For more information on zip, please refer to the official documentation.
alphas = [1/N for i in range(int(N))]
betas = [3.0**(i) for i in range(int(N))]
so they are lists. you should probably turn them into numpy arrays.
!e
import numpy as np
N = 5
alphas = 1 / np.arange(N)
print(alphas)
betas = 3.0 ** np.arange(N)
print(betas)
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
001 | <string>:3: RuntimeWarning: divide by zero encountered in divide
002 | [ inf 1. 0.5 0.33333333 0.25 ]
003 | [ 1. 3. 9. 27. 81.]
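Note that `1 / np.arange(N)` is not the same thing as the list version: the list holds `1/N` repeated `N` times, while `np.arange(N)` starts at 0, hence the divide-by-zero warning and the `inf`. A minimal sketch of an array equivalent, assuming `N = 5` as in the eval:

```python
import numpy as np

N = 5
alphas = np.full(N, 1 / N)        # same values as [1/N for i in range(N)]
betas = 3.0 ** np.arange(N)       # same values as [3.0**i for i in range(N)]
print(alphas)
print(betas)
```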
Okay, cool thank you! I've implemented those there correctly i think: https://paste.pythondiscord.com/tumitudihi
That makes sense yep!
The real innovation is deep learning tbh
This is where all the calculus and algebra is required.. typical ML data science is just stats reborn 2.0
Think the only exceptional stand out development was xgboost
And other boosting algos
Having a small syntax issue (line 359) now I think after changing some of these things: https://paste.pythondiscord.com/sededasimo
Can't seem to spot my issue
Is this while loop ever executed?
def solve_system_via_iteration(x_list,input_alp,input_bet,gamma):
    alp=input_alp
    bet=input_bet
    error_alp=1.0
    error_bet=1.0
    while error_alp<= 0.001 and error_bet<= 0.001:
        count += 1
        ri_list=ri(x_list, alp,bet,gamma)
        jacobian=get_jacobian(x_list,alp,bet)
        lu,p=linalg.lu_factor(jacobian)
        b=-ri_list
        x=linalg.lu_solve((lu,p),b)
        error_alp,error_bet=x[0],x[1]
        alp+=error_alp
        bet +=error_bet
    return alp,bet
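As quoted, the loop body never runs: both errors start at 1.0, so `error_alp <= 0.001 and error_bet <= 0.001` is false on the first check (and `count` is never initialised). Below is a minimal sketch of the usual shape, which keeps iterating while the update is still larger than the tolerance and caps the number of iterations. It fits a toy model `a * exp(-b * x**2)` to noise-free synthetic data rather than the residual from the paste, and uses `np.linalg.lstsq` in place of the LU solve; `gauss_newton`, `tol` and `max_iter` are names made up here.

```python
import numpy as np

def gauss_newton(x, y, a, b, tol=1e-10, max_iter=50):
    for count in range(1, max_iter + 1):
        e = np.exp(-b * x ** 2)
        residual = a * e - y                               # r(a, b)
        jacobian = np.column_stack((e, -a * x ** 2 * e))   # [dr/da, dr/db]
        # least-squares solution of jacobian @ step = -residual
        step, *_ = np.linalg.lstsq(jacobian, -residual, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) <= tol:                    # stop once the update is tiny
            break
    return a, b, count

x = np.linspace(0.0, 2.0, 20)
y = 1.0 * np.exp(-1.0 * x ** 2)        # synthetic data generated with a = 1, b = 1
a_fit, b_fit, iterations = gauss_newton(x, y, a=0.9, b=1.1)
print(a_fit, b_fit, iterations)
```

Taking the absolute value of the step matters too: a negative update would otherwise count as "converged" no matter how large it is.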
Can someone please guide me how to extract key value pair from scanned invoices using LayoutLM model from Huggingface.
can anyone help me with this, I tried using background subtraction but some of the image isn't exactly the same so the subtraction process is flawed with imperfect results
so what I'm trying to achieve is making a bullet hole detection which I manage to achieve in the 4th image (right-most)
You managed to achieve it but you're trying to achieve it?
Have you tried labelling with masks to segment them so you don't need to do any of that
So what you do is take ur images and by hand label where the bullet holes are
Make the entire image black except for the holes
For a day or two until u have thousands
And any cnn should easily manage
that's ok, but there's no need to do that by hand
you can generate synthetic data sets where the hole number, location, shape, and size all vary randomly within sensible ranges, and where the crosshair also changes, by adding e.g. noise to a template of its basic shape
then you immediately know the ground truth and can generate examples on the fly without having to store them
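A rough sketch of such a generator, assuming a very crude template (concentric rings plus Gaussian noise) and holes drawn as flat grey discs. Every number in it (image size, ring width, radius range) is invented for illustration, and `make_synthetic_target` is a hypothetical name; a real template would be built from the actual target.

```python
import numpy as np

def make_synthetic_target(rng, size=128, max_holes=5, radius_range=(2, 5)):
    yy, xx = np.mgrid[0:size, 0:size]
    centre = (size - 1) / 2
    dist = np.hypot(yy - centre, xx - centre)
    # template: alternating rings, each 8 px wide, plus a little noise
    image = ((dist // 8) % 2).astype(float)
    image += rng.normal(0.0, 0.05, image.shape)

    mask = np.zeros((size, size), dtype=bool)      # ground truth: True where a hole is
    n_holes = int(rng.integers(1, max_holes + 1))
    for _ in range(n_holes):
        cy, cx = rng.integers(10, size - 10, size=2)
        r = rng.integers(radius_range[0], radius_range[1] + 1)
        mask |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    image[mask] = 0.5                               # holes drawn as flat grey discs
    return image, mask, n_holes

rng = np.random.default_rng(0)
image, mask, n_holes = make_synthetic_target(rng)
print(image.shape, n_holes, int(mask.sum()))
```

Because the mask comes out of the same call as the image, the ground truth is known exactly and new examples can be produced on the fly during training or testing.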
i would apply some traditional image processing techniques like edge detection methods before putting it into a neural network; that would vastly improve performance
yes I was hoping it to work on all cases
not just a single case
sorry for the late response and not clearly explained my issue
but I think I figure out a way to solve it
though also just adding a few convolutional layers at the beginning should achieve the same effect if you have enough examples
is the detection done with networks or classical methods?
Oh sorry I'm making a simple detection system with opencv nothing fancy like that
all righty
one thing you can do is make a mask slightly larger than the crosshair and apply it to all images, followed by inpainting
that's the problem I only have small amount of examples
What exactly do you mean
you can generate examples synthetically, the scenario is rather simple
A cnn can detect edges itself
i know, that's also what i said, and that's why i asked if they're even using networks in the first place. they aren't.
Well they should
there's no need, this one can be done deterministically
looks like a very nice inverse problem practice
And in my experience it's a hand label job I don't know a method of generating similar images with edges as masks
I'm not sure what is better here than cnn
define "better"
you can make an asymptotically unbiased estimator for which you have performance guarantees
Ah hold on, because the targets are symmetrical you can just look at pixels
All right thanks for the suggestion everyone I appreciate it.
Where they aren't at the usual rings
Imma go back to work on this
If you know which pixels in every target are rings u can just see what pixels are there that shudnt be
But it's more robust to cnn because you might be dealing with irregular photos or whatever
cnns aren't the only image processing method there is. not only are there tons of other things you could do, if you're familiar with other methods, you can also integrate them into your network as hybrid methods to get better performance
What would you do here, personally
i already explained above
Sometimes hand labelling is unavoidable
probably two or one step sparse recovery, no deep learning
You mean where the pattern of pixels is interrupted
You have rings that are technically constant
that they aren't constant is the reason their subtraction method doesn't work
the 2d image is expected to have very few parameters. if you approximately represent the holes as circles of an average radius, each circle is parameterized by its x,y coordinates
then there are 2x as many parameters as there are holes
but you have however many pixels there are in the image as data inputs
So you'd need a target without bullet holes first
if the number of parameters is much smaller than the number of inputs and the model is well structured, then you can recover the coordinates with good guarantees even if the number of samples is decreased to close to the number of parameters
since they have strong landmarks like rings it might be worth considering image registration
then what you can do is make a mask of where the crosshair is and ignore those pixels entirely, regardless of their value. registration may or may not be needed, depending on how well the crosshairs match from image to image already
cnn would also work
in particular, there exist cases where you can hit the trivial lower bound. say there are 5 circles in the image, each with an x,y coord. then in optimal conditions, you can use exactly 10 pixels from the image to find where all the holes are if you use the correct domain
yes, ofc, cnn will work if you can generate enough examples, synthetic or otherwise. but also just using a cnn is the same as not using any domain knowledge you might already have
that's brute force ML and has no merit
if you want to learn something yourself, you wouldn't do that
you can solve any problem that way with enough data, though
without ever bothering to understand what, why, or how
Is sparse recovery hard to code
pretty sure scikit learn and scipy have solvers for it already, so no. understanding it well, i guess yeah
lemme fish up the original paper
Suppose x is an unknown vector in ℝ^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than t...
As someone who doesn't know how to use this technique and I was given this task I'd simply have to use cnn as there is no time to learn "well" as you say
exactly what i mean. there are textbooks on traditional image processing. and a combination of this and neural networks lead to vastly superior performance rather than one or the other alone.
ofc you would
Whatever works
would you seriously spend months taking pictures and labeling them by hand for this?
you could make a synthetic data generator in a couple of hours and then use either ML or classical techniques and solve it in one go
True...
i don't think you could make thousands of images in just 2 days
Could
I'd say 5000 is more than enough to perfect it
Imagine with 2000 images it's enough
Would suck ass tho
anyway, this is beside the point. ML and deep learning are not always the best solution, and black box ml is, well, what i'd expect from someone that doesn't know much stuff
this is the same guy that lambasted me for trying to say DS should learn some basics of data engineering
I think u mistook me
you could get a phd and make a living out of classical image processing alone
Some basics, sure, not entire workflows that take a long time
Apache spark, yes
Automation yes
one of my profs did their postdoc in this and publishes in this
indeed
If I were to do a PhD it would probably be on imaging
but nowadays he combines it with ML
But there's nothing novel I could come up with
he taught our Feature Engineering class and Pattern Recognition class
nice
U guys have phds? Or masters
yeah some solid traditional image processing techniques and then the latter combined these techniques with ML
i have a masters and am about half way into a phd
i'm doing something similar...ish
oh nice!
it surprises me how much you can do with fourier transforms + ML
def yet another area of potential research
How is it compared to masters
I know some guy who's a bachelors math student but has done a lot of advanced deep learning projects for his age... must have coded since young, how to compete with such talent
Imagine having such a GitHub portfolio at age 19!
Even wrote some impressive papers on Ml for his age
the whole github portfolio thing matters depending on what you're aiming for and where you live
papers are what talks here
published
No one gets published before masters really
I may be a part of a publication soon but it's in my opinion a very weak project
So not sure if it would help when it comes to people who care
the coding part, you can learn whenever you want. seems you're already on the wagon. as for doing cool stuff with ML, i'll die on the hill that you can do cooler stuff, and do it better, if you understand what you're doing better
you always see "domain knowledge" in descriptions of jobs and whatnot involving data analysis, ML, signal processing, etc
that means, you need to know math, you need to be familiar with several different techniques and how they work, and you need a base level of technical knowledge regarding the thing you will apply all of this to
Yeah I can code but I'm 24 haha
The hard part is
I have an interview next week at a tech company
And it's gonna be DSA coding
Which is hard... not good at it as I only first learnt to code for my msc
I hope they don't ask more than east
Easy
I'd choke on mediums
i guess the immediate question is, why do you apply to a job for which you know you don't meet the requirements
Ehhh I do
I could do it
You don't use DSA questions as part of the job
For example, you code but you'll never need to do dfs or some shit on a tree
they're gonna ask you stuff to see if you understand what it means, i think people often stackoverflow during interviews
I can easily explain that but no
You have to code it
It's hard
Ur not meant to SOF
Basically it's like being given a leetcode question and asked to do it in front of the guy
And considering I didn't rote memorise the logic behind mediums I'll prob fail on the spot
But I'm definitely qualified for the job
Itās a grad job
Fingers crossed it's arrays and strings and not dsa
best of luck
Thanks
Hey guys, I have a some news; I developed an advanced mathematical solver(console) like Maple, Matlab, WolframAlpha. (name is "Mathpath console") But you know if you a student, you don't have money you don't have support about it. My motivation sending this message is feedback, introduce to people... Because my app not know by people yet.
What language
Using Gauss-Newton method for data fitting with LU sampling points and QR method, anybody able to help with code review? https://paste.pythondiscord.com/oqaciteyaj
I've cut most of the fat off the code but would love any help
I don't immediately see what the syntax issue is. If you got an error message please always show the error message
By the way, your code would be easier to read if you used spaces.
Q, R = linalg.qr(jaco, mode='economic')
b = -1 * np.array(ri_list)
y = np.matmul(Q.T, b)
x = np.linalg.solve(R, y)
also if you're working with matrices and vectors, Q.T.dot(b) does the same as matmul. that's just flavor though, i kinda prefer it
you could also do y = Q.T @ b. the cpython devs changed the grammar of python specifically for that.
(which I don't think was necessary, but yay options?)
(also I can't continue complaining about that, or I'll get angry that we don't have function composition.)
function composition how?
How should i go about adding multiple objectives to a bin packing problem with or tools?
I have a list of items each with a weight, volume, vendor, customer and port. I am easily able to constrain the bins on weight and volume but i want to alter the objective from least amount of bins to least amount of cost. Cost for a bin with 1 vendor and 1 customer is lower than a bin with 3 vendors, 3 ports and 3 customers for example.
where (a @ b @ c)(x) is a(b(c(x)))
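Python doesn't ship this, but `@` can be overloaded through `__matmul__`, so the idea can be sketched with a small wrapper. `Composable` is a hypothetical class written for this example, not anything from the standard library.

```python
class Composable:
    """Wraps a one-argument function so that `@` means composition."""

    def __init__(self, func):
        self.func = func

    def __call__(self, x):
        return self.func(x)

    def __matmul__(self, other):
        # (self @ other)(x) == self(other(x))
        return Composable(lambda x: self(other(x)))

a = Composable(lambda x: x + 1)
b = Composable(lambda x: x * 2)
c = Composable(lambda x: x ** 2)

print((a @ b @ c)(3))   # same as a(b(c(3)))
```

`@` is left-associative, so `a @ b @ c` is built as `(a @ b) @ c`, which still evaluates to `a(b(c(x)))` because composition is associative.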
I can paste my code but it is long
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
though this sounds like an #algos-and-data-structs question.
aha, that's what you mean. well... don't forget that multiplication is composition when working with matrices, but i see what you mean. at that point they may as well integrate this in general instead of just for matrix products though
Okay thanks. I wasn't sure which channel was appropriate.
Hello, I have a very annoying issue while plotting with seaborn using timeseries. I have a series of readings of temperature over different days, and I would like to plot them all together, similarly to what I'm doing here. The problem is that I have one day that starts at 14h, and that forces the plotting at 14h, rather than midnight, with the annoying result that at midnight the line switches to the wrong day.
If I try to use a full datetime object (date and all) as the x-axis, I can't filter by day anymore with hue...
Is there a good way of fixing the axis? I looked up the matplotlib locator/formatters but I wasn't able to find a better way of fixing it while using seaborn.
Dropping the entire set of incomplete measurements also works, but I'd rather not have to do that
try #web-development
+1 for transformer papers. hmm pre-transformer is a wide landscape, especially concerning NLP.
youre better served going through stanford's NLP class syllabus i believe
Yup
then you can choose which papers you want afterwards
Hey guys still stuck on a seemingly basic question... how does one pick the correct model? Seeing as param tuning can result in significantly better results, testing loads of untuned models doesnt seem great. For example on my dataset testing the untuned SVM classifier results in ~0.6 accuracy, while tuned its a top performer at ~0.81
ah i went through the original jurafsky course but that one should be even better since it includes more up-to-date NLP practices like transformers
@wooden sail u familiar w the fastest way to compute Fibonacci n
Wondering why it's faster for computer to split integers
hmm?
Kara what's his face
not familiar with that, idk what you mean
you still didn't use full sentences to explain what you mean, but i'm guessing the question goes in the direction of "why is it faster to use the naive recursion formula than to use the closed form expression"
Anyway, dynamic programming isn't the fastest solver to finding fib n, there's a linear algebra solution that I don't understand
taking a square root cannot be done in closed form. when you ask your computer for a square root, it does an iterative algorithm that stops after the approximation no longer improves much per iteration
the linalg approach you found is probably a matrix-vector representation of the difference equation that defines the fib numbers.
please use words instead of just sending links with no comments from your side lol
but yeah
Matrix and further is beyond me. I only know until second
the matrix is diagonalizable and therefore there is a nice way to take arbitrary matrix powers based on the eigenvalues
But we don't have fn so how do we use f n+1
this is equivalent to the closed form expression in the end, since the eigenvalues are precisely the pair of golden ratios
Oh
Ohhh
We use the f n we have already and then find the f 2n +1
But doesn't that skip in between
In fast doubling
Is this just used to compute up until a large value of n and then use a naive approach beyond that?
Matrix exponentiation would have to be used to find an exact
And that uses f n and fn -1 but I don't know how it computes faster than the previous method
the gains don't come from using the matrix approach itself, they're from a "fast exponentiation algorithm"
Do you know why karatsuba multiplication is so fast
you decompose the exponent in base two and observe that you can obtain the result from log n squarings
no, i'd never heard of it before
Do computers multiply by splitting a integer into its 10^n chunks
Oh it's like
Dividing a number up to multiply
Is somehow faster
Can the method be used to find fib n where n is not divisible by two?
Binary?
mhm
Who ever started numbers on computers is a genius
fast exponentiation is a very simple algorithm, really:
def power(x, p):
    if p==0: return 1
    elif p==1: return x
    elif p%2==0: return power(x**2, p//2)
    else:
        return x*power(x**2,p//2)
and for matrices it's exactly the same (just replace 1 with an identity matrix I guess).
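For the Fibonacci case, a minimal sketch of the matrix version in plain Python (exact integers, no floating point). It is written iteratively instead of recursively, and `mat_mul`, `mat_power` and `fib` are names chosen for this example.

```python
def mat_mul(A, B):
    # product of two 2x2 matrices stored as nested tuples
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )

def mat_power(M, p):
    # same squaring trick as power(), with the identity matrix in place of 1
    result = ((1, 0), (0, 1))
    while p > 0:
        if p % 2 == 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        p //= 2
    return result

def fib(n):
    # [[1, 1], [1, 0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
    return mat_power(((1, 1), (1, 0)), n)[0][1]

print([fib(n) for n in range(10)])
```

The loop runs once per binary digit of `n`, so it needs on the order of log n matrix products, and odd `n` is handled by the extra multiplication whenever the current bit is 1.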
I think a fraction can be represented in Binary
Nice
That simplifies it a lot
ofc, but you have to be careful regarding which numbers are irrational in base 2 vs base 10, for instance. you don't have that problem with integers
Hey, could someone explain the fifth line( data = data * srd_0 + mu_0), what's he doing?
This is a process to create a Gaussian
the fifth line is just a scaling and a shift
by default a gaussian distribution has mean 0 and standard deviation 1
so by multiplying it by 2, the standard deviation is now 2
and by adding it by 5, the mean is now 5
it's quite a good idea to brush up on what mean and standard deviation mean
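A small sketch of that scale-and-shift with the standard library only, assuming the data starts out as standard normal samples. `std_0 = 2` and `mu_0 = 5` follow the numbers used above, while the sample size and seed are arbitrary.

```python
import random
import statistics

random.seed(0)
mu_0, std_0 = 5.0, 2.0

data = [random.gauss(0.0, 1.0) for _ in range(50_000)]   # mean ~0, std ~1
scaled = [d * std_0 + mu_0 for d in data]                 # mean ~5, std ~2

print("orig mean/std:", statistics.fmean(data), statistics.pstdev(data))
print("new mean/std: ", statistics.fmean(scaled), statistics.pstdev(scaled))
```

Multiplying stretches the spread by `std_0`, and adding `mu_0` afterwards moves the centre without changing the spread.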
Oh, got it! So it would be better than just calculate the raw data without scaling, right?
By the way, so is it a common way to shift data?
Thanks for the explanation! @modest onyx I appreciate it
@charred egret :white_check_mark: Your eval job has completed with return code 0.
001 | orig std: 1.0003886578560521
002 | orig mean: 0.005016058783782292
003 |
004 | new std is: 2.0007773157121043
005 | new mean is: 5.010032117567565
it's not uncommon at all
but most commonly I've seen it used to normalize a dataset
meaning we shift it and scale it in such a way as to make the mean 0 and std 1, rather than the other way around
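The reverse direction, as a minimal sketch on made-up values: subtract the mean, then divide by the standard deviation.

```python
import statistics

raw = [3.0, 5.0, 7.0, 9.0, 11.0]          # made-up values
mean = statistics.fmean(raw)
std = statistics.pstdev(raw)

normalized = [(v - mean) / std for v in raw]
print(statistics.fmean(normalized), statistics.pstdev(normalized))
```

After this the mean is 0 and the standard deviation is 1, up to floating-point rounding.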
I don't quite get what you mean here
Oh! The dataset has already been normalized. So in this case, do I need to shift it?
I mean, before creating a Gaussian distribution, to normalize data (or shifting data) is kind of necessary, right?
Not unless you have a specific reason why you'd want to do that
this isn't a real dataset. In real life datasets never perfectly have mean 0 and std 1
I'm not exactly sure what the context is here maybe you can link me the page you're reading
Yah, that's true
But there isn't enough explanation
BTW, it's weird that my plot look like this
In a binary classification with class imbalance towards class 1, if the logistic regression confusion matrix failed to predict a single 0 class but Random Forest, XGBoost didn't have this issue, what can I say about logistic regression model? Is it because the dataset is not linear?
I've learnt python basics including data structures and v basics of libraries such as numpy, pandas, matplotlib etc. Can anyone guide me that how can i get started with ML/ AI? Pls
some maths would come in handy
probability and stats, multivar calc, and linear algebra are the basics
tensorflow
I keep getting this error: ValueError: Input 0 of layer conv2d_40 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (None, 1)
this is my model:
```python
from tensorflow import keras  # assumed import; the original message did not show it

inputs = keras.Input(shape=(48, 48, 1,))
x = keras.layers.Conv2D(16, 2, padding="same")(inputs)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(64, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(128, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(18, activation='relu')(x)
outputs = keras.layers.Dense(7, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
```
and also, what should be the shape of y_train
because rn, its a ndarray with values 0 - 6
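The error says the first Conv2D received a 2-D input of shape (None, 1), while it expects 4-D input laid out as (batch, height, width, channels), so the array passed as images is probably not shaped (N, 48, 48, 1). For the labels, a 1-D integer array with values 0 to 6 works with `sparse_categorical_crossentropy`, and a one-hot array of shape (N, 7) works with `categorical_crossentropy`. A minimal sketch with random stand-in data (no TensorFlow needed); `flat_pixels` is a hypothetical name for however the pixels are loaded.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_classes = 10, 7

flat_pixels = rng.integers(0, 256, size=(n_images, 48 * 48))   # stand-in for the real pixels
y_train = rng.integers(0, n_classes, size=n_images)            # 1-D labels, values 0..6

# images: (batch, height, width, channels), scaled to [0, 1]
x_train = flat_pixels.reshape(-1, 48, 48, 1).astype("float32") / 255.0

# option 1: keep y_train 1-D and use loss="sparse_categorical_crossentropy"
# option 2: one-hot encode and use loss="categorical_crossentropy"
y_one_hot = np.eye(n_classes, dtype="float32")[y_train]

print(x_train.shape, y_train.shape, y_one_hot.shape)
```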
Hi, Everyone
I didnot understand this question can someone explain it to me without giving answers.
that weighted sum part bounced over my head.
so... what don't you get?
weights part
the weighted sum is basically red * weight + blue * weight + green * weight
and that is 1 pixel
im pretty sure
?
can someone help?
so sum of those weights is 1
Also you need to do gray scale checking to make sure it not out of bounds in gray scale
so red * 1
to get red color we do this right
[:, :, 1]
?
no
i dont get you
so, for each pixel something like this
pixel = (red * weight) + (blue * weight) + (green * weight)
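A minimal sketch of that per-pixel weighted sum on a tiny made-up image, assuming RGB channel order and the commonly used luma weights (0.299, 0.587, 0.114). The weights in the actual exercise may differ.

```python
import numpy as np

# assumed weights for (red, green, blue); they sum to 1
weights = np.array([0.299, 0.587, 0.114])

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [255, 0, 0]       # pure red pixel
rgb[1, 1] = [255, 255, 255]   # white pixel

gray = rgb @ weights                       # one weighted sum per pixel, shape (2, 2)
gray = np.clip(gray, 0, 255)               # keep the result inside the valid range
red_channel = rgb[:, :, 0]                 # red is index 0 in RGB order, not 1
print(gray)
```

Each channel gets its own weight here; the formula above writes `weight` three times, but the three values are generally different.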
i have a question
what
how do you structure a y_train class
cuz what I have is a 1D array
i think it needs to be 2D
but how
oh I get it I think
I am a beginner so I don't know
depends on ytrain
is it like an image ?
if so it makes sense for it to be 2d
and in this case your model is generating an image, which is common with segmentation models
no its a classification
of emotions
there are 7 classes
from 0 - 6
and there are 28709 images
np.apply_along_axis is your friend for that one
i cant tell which emotion is which class god darn it
is it happy? sad? angry?
this is a major problem
im dumb enough that even after looking at the face, I still cant tell
initially you would have a 1D array for your labels (ytrain) which would look like so : ["sad", "angry", "happy" , ...]
no
this is the dataset: https://www.kaggle.com/datasets/ahmedmoorsy/facial-expression
ok
how would an angry person look like
can you help?
lol what is this, no description of dataset or anything xD
it looks like the classes are numbered
ik
each number representing a feeling
what does this girl look like?
you need to manually look into the labels and check which number is assigned to which feeling
That is a very unprofessional author that posted this on kaggle
yeah but he could have at least added a mapping to the numbers, 1 = happy, 2 = sad
But I guess you can name them as you perceive them yourself
and the model would still work
hehehehe look what I found: label_map = ['Anger', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
on someones code
yay
help plz: ```python
ValueError: Creating variables on a non-first call to a function decorated with tf.function.```
y
i did nothing: ValueError: Input 0 of layer conv2d_45 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (None, 1)
working?
your input should be 4 dimensions tho
(num_of_images, height, width, channels)
how is it only 2 dimensions?
it says in the epochs
oh ok
accuracy: 0.0000e+00
that is not normal
shouldn't it be like 64.203
64 should be the loss
loss is nan
accuracy on validation should be small but not 0
that means you are doing something wrong
also, I don't have any validation data
what are you using on output layer
okay
my model: ```python
tf.config.run_functions_eagerly(True)
inputs = keras.Input(shape=(48, 48, 1,))
x = keras.layers.Conv2D(16, 2, padding="same")(inputs)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(64, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(128, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(18, activation='relu')(x)
outputs = keras.layers.Dense(7, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()```
So your input is probably wrong, or the labels
my y_train is currently shaped (28709, 7)
print one of them
okay
thats y_train[2]
and one of the inputs?
the picture I showed you earlier
48x48
each pixel is between 0 - 255
do I need to normalize them?
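Scaling is not strictly required, but it usually helps training. A rough sketch, assuming the raw images come as a uint8 array of shape (N, 48, 48); the random `raw` array is just a stand-in for the real data:

```python
import numpy as np

# Stand-in for the real images: 5 grayscale 48x48 images with values 0-255.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(5, 48, 48), dtype=np.uint8)

# Scale pixel values to the 0-1 range.
x = raw.astype("float32") / 255.0

# Add the channel axis so the shape becomes (N, 48, 48, 1),
# which is the 4D input a Conv2D layer expects.
x = x[..., np.newaxis]
```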
wait is it rgb?
i want it to take input 1 image at a time
no
then you cannot train it one image at a time
?
I mean you could but why xD
but after its trained, it can do 1 img at a time right?
yes
ok fine
for test labels
so what am I missing
but for train you need to input all your train data
and run the model for X epochs
okay so on the 10th epoch it was giving you 0 accuracy?
i have no validation data, is that required?
im not on the 10th
im on 8th
this is my compile: model.compile( loss = "categorical_crossentropy", optimizer = keras.optimizers.Adam(learning_rate=0.01), metrics = [tf.keras.metrics.Accuracy()] )
this is my fit: run = model.fit( x_train, y_train, batch_size = 16, epochs = 10 )
I see, where did you get the architecture of your model
wdym?
The layers, like input, hidden layers (conv, maxpooling) and output layer
well, I made it up?
I would start with bigger filters in convolutional layers and then shrinking them down,
like 96 then 48 then 24 etc
because you're shrinking the image to 16x16 on your first layer, which IMO removes a lot of info from the input
this is interesting tho..I should try it once
I usually keep it at 32 and then another model where I double it in the next layers
so...... how can I fix my issues
can somebody help?
I get this error: ```python
ValueError: Shapes (16, 1) and (16, 7) are incompatible```
My model:```py
inputs = keras.Input(shape=(48, 48,1,))
x = keras.layers.Conv2D(16, 2, padding="same")(inputs)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(32, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.Conv2D(64, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Conv2D(128, 3, padding="same")(x)
x = keras.layers.LeakyReLU()(x)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(72, activation='relu')(x)
x = keras.layers.Dense(36, activation='relu')(x)
x = keras.layers.Dense(18, activation='relu')(x)
outputs = keras.layers.Dense(7, activation='softmax')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()```
Compile: model.compile( loss = "categorical_crossentropy", optimizer = keras.optimizers.Adam(learning_rate=0.01), metrics = [tf.keras.metrics.Accuracy()] )
Fit: run = model.fit( x_train, y_train, batch_size = 16, epochs = 10 )
Input: 48x 48 img with 1 channel from values 0 - 255
Output eg. [0, 0, 0, 1, 0 ,0, 0]
can be 1 of 7 classes
reshape the array
x train y train
u need match with ur input xtrain
ur model expecting 1 but got 7?
did u encode or something
Where do you get the error?
Show the full traceback
obviously inputting a 7 instead of 1
he just needs a reshape prob
why do u have 7 when thats meant to be the output
aka y train
The input is an image of 48x48x1 though, where is the 16x1 or 16x7?
Where he tried forcing x train into it
Something tells me that isn't the full error usually it tells more
ok 1 sec
Hey @gleaming osprey!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
i don't get you
(28709, 48, 48, 1)
(28709, 7) is y_train
I'm getting a new error
anything?
same as here
anybody?
Considering your batch size is 16, and the shape mismatch is about something with shape (16, 1) I assume you didn't one hot encode your output
You have something like:
y_batch = [4, 2, 0, 1, 6, 2, 3, 5, 3, 6, 4, 2, 1, 4, 5, 5]
whereas it should be:
y_batch = [
[0, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
... (12 more)
[0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0],
]
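One way to do that conversion with plain numpy, as a sketch. `NUM_CLASSES` and `one_hot` are names made up for the example:

```python
import numpy as np

NUM_CLASSES = 7  # emotions are labelled 0-6

# integer labels for one batch of 16 images
y_batch = np.array([4, 2, 0, 1, 6, 2, 3, 5, 3, 6, 4, 2, 1, 4, 5, 5])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels gives an array of shape (16, 7).
one_hot = np.eye(NUM_CLASSES, dtype="float32")[y_batch]
```

Alternatively the labels can stay as integers if the loss is switched to "sparse_categorical_crossentropy", which expects class indices instead of one-hot rows.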
@gleaming osprey
yeah
i fixed it by changing to categorical_crossentropy
now, i'm on epoch 1
but
I have an issue, it seems that the accuracy is decreasing?
That can be due to many many things
2% for 7 classes is worse than random guessing
I'm working with a pandas dataframe; how do I set the indices in this level to rangeindex like so? (Simply taking the last character from each index won't work for all of the indices I have.)
hello, please do print(df.to_dict()) and put the resultant text in the paste bin
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
Please ping me when you have done this. I will not use any screenshots
thanks, one moment
U don't need to encode output tho right? It's categorical
Seems like it's trying to do regression?
@timid hollow
# move the three levels of indexing out of the index
reset = df.reset_index()
# extract the numbers from the third level and overwrite
reset['level_2'] = reset['level_2'].str.extract(r'(\d+)', expand=False).astype(int)
# overwrite the original df variable
df = reset.set_index(['level_0', 'level_1', 'level_2'])
I got this at the end.
MultiIndex([( 'W', 'W', 1),
( 'W', 'W', 2),
( 'W', 'W', 3),
( '令', 'Ling', 1),
( '令', 'Ling', 2),
( '令', 'Ling', 3),
( '伊芙利特', 'Ifrit', 1),
( '伊芙利特', 'Ifrit', 2),
( '伊芙利特', 'Ifrit', 3),
( '假日威龙陈', 'Ch'en the Holungday', 2),
( '假日威龙陈', 'Ch'en the Holungday', 2),
( '假日威龙陈', 'Ch'en the Holungday', 2),
( '傀影', 'Phantom', 1),
( '傀影', 'Phantom', 2),
( '傀影', 'Phantom', 3),
( '凯尔希', 'Kal'tsit', 1),
( '凯尔希', 'Kal'tsit', 2),
( '凯尔希', 'Kal'tsit', 3),
( '刻俄柏', 'Ceobe', 1),
( '刻俄柏', 'Ceobe', 2),
( '刻俄柏', 'Ceobe', 3),
( '卡涅利安', 'Carnelian', 1),
( '卡涅利安', 'Carnelian', 2),
( '卡涅利安', 'Carnelian', 3),
( '史尔特尔', 'Surtr', 1),
( '史尔特尔', 'Surtr', 2),
( '史尔特尔', 'Surtr', 3),
( '卡涅利安', 'Carnelian', 3)],
names=['level_0', 'level_1', 'level_2'])
This doesn't work for all of the indices I have; I would like indices like (1, 2, 3), (1, 2, 3), (1, 2, 3), …
(Simply taking the last character from each index wonāt work for all of the indices I have.)
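If the goal is just a 1, 2, 3 counter inside each (level_0, level_1) group, a possible sketch is to number the rows per group instead of parsing the old labels. The tiny frame below is made-up stand-in data, and it assumes the rows are already in the desired order:

```python
import pandas as pd

# Stand-in data: the third level holds labels that cannot be parsed reliably.
index = pd.MultiIndex.from_tuples(
    [("W", "W", "a1"), ("W", "W", "a2"), ("W", "W", "a3"),
     ("X", "Ch'en", "b2"), ("X", "Ch'en", "b2x"), ("X", "Ch'en", "b2y")],
    names=["level_0", "level_1", "level_2"],
)
df = pd.DataFrame({"value": range(6)}, index=index)

# move the three levels of indexing out of the index
reset = df.reset_index()
# cumcount numbers the rows of each group 0, 1, 2, ...; add 1 to start at 1
reset["level_2"] = reset.groupby(["level_0", "level_1"]).cumcount() + 1
# put the three levels back
df = reset.set_index(["level_0", "level_1", "level_2"])
```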
Here is more of the dataframe that shows the issue
https://paste.pythondiscord.com/tasegakati
so you hadn't sent the whole dataframe originally? because that's what I asked for
I did use to_dict() on the entire dataframe, but the dict written to stdout was cut off because it was too long.
see if you can figure out how to solve it using the code example I gave you.
Your code example doesn't help because you're extracting the numbers, but that doesn't produce (1, 2, 3), (1, 2, 3), … for all indices. I will ask elsewhere.
Hi guys I have a question regarding dataframes in python
so I was given a Month and Year columns which are both integer and now I want to combine them
hello, please do print(df.head().to_dict()) and put the text in the paste bin
!paste
Hello
oh ok let me send the latest one
I have trained a roberta model and I want that model to read a document and give the results of the classification in a CSV file...how to do this can anyone help?
basically I have a problem with converting the year and month to string before doing month + "_" + year
it shows up as 2019\n4 as an example
You asked how to combine the Month and Year columns a certain way, but you have provided an example where they are already combined.
Anyway, take a look at this
In [7]: df['month']
Out[7]:
0 8
1 8
2 8
3 8
4 8
Name: month, dtype: int64
In [8]: df['year']
Out[8]:
0 2019
1 2019
2 2019
3 2019
4 2019
Name: year, dtype: int64
Does anything stick out to you here?
they're both integers right
yes. what code did you do for this?
I did the str(df['month']) + "_" + str(df['year']) in my case
df['Month_Year']= str(df['month'])+ "_" + str(df['year'])
ah, that's what the problem was
str(df['some_column']) makes one string that represents the whole column
it does not convert each element to a string.
oh what
if you want to go from an int Series to a str Series, you need to use .astype(str)
because for any object, str( ) will return one string.
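A small sketch of the difference, using a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"month": [8, 8, 9], "year": [2019, 2019, 2020]})

# str() on a Series gives ONE string: the printed form of the whole column,
# newlines and dtype footer included.
whole_column = str(df["month"])

# .astype(str) converts each element, so the + joins row by row.
df["Month_Year"] = df["month"].astype(str) + "_" + df["year"].astype(str)
```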
so i need to do that for both the data
even though pandas objects have special behaviors, they're still python objects as well.
yes, try that
let me know how it goes. there's another thing we should go over.
oh thanks a lot stelercus
the problem's solved
now it's displaying correctly now
I can finally make a plot out of this thanks to you man
no problem 

however, you should keep in mind that pandas has an actual time datatype
oh you mean obj and int64
and if you're trying to represent time, you should pretty much always use that
no, datetime
I'm a bit confused here can you explain a bit more about it
In [14]: time_strs
Out[14]:
0 2019_8
1 2019_8
2 2019_8
3 2019_8
4 2019_8
dtype: object
In [15]: pd.to_datetime(time_strs, format='%Y_%m')
Out[15]:
0 2019-08-01
1 2019-08-01
2 2019-08-01
3 2019-08-01
4 2019-08-01
dtype: datetime64[ns]
note the dtype
these have methods for dealing with those values as actual moments in time, rather than as strings.
well this is new to me
I guess this works best when the time variables are more than 2 when you add
like not just month_year but hour_minutes_second_day_month_year thing right
In [20]: times = pd.to_datetime(time_strs, format='%Y_%m')
In [21]: times.dt.month_name()
Out[21]:
0 August
1 August
2 August
3 August
4 August
dtype: object
things like this.
like more variables and need separation then it's better if I use datetime instead am I correct?
any time I'm dealing with time, I use datetime, and then use the .dt. methods to get whatever I need later.
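As a sketch of that workflow starting from the integer columns directly, assuming they are named `year` and `month` (the `day=1` column is added only because `pd.to_datetime` needs a day when it assembles dates from columns):

```python
import pandas as pd

df = pd.DataFrame({"month": [8, 9, 1], "year": [2019, 2019, 2020]})

# Assemble real datetimes from the year/month columns plus a dummy day.
df["date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Derive whatever representation is needed later via the .dt accessor.
df["month_name"] = df["date"].dt.month_name()
df["label"] = df["date"].dt.strftime("%m_%Y")
```

Sorting or plotting by `date` then follows calendar order, which string labels like "10_2019" would not.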
yay learning 
and crying 
Can anyone help me calculate year over year growth prev year
Q42017 : 0 , Q42018: 152305, year over year growth prev year 0
How it is calculated
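The usual definition is (current - previous) / previous, which is undefined when the previous value is 0, so a report may show 0 or a blank there by convention. A sketch under that assumption; the Q4 2019 figure is invented to have one row with a defined result:

```python
import numpy as np
import pandas as pd

sales = pd.Series(
    [0.0, 152305.0, 182766.0],
    index=["Q4 2017", "Q4 2018", "Q4 2019"],  # Q4 2019 is a made-up value
)

previous = sales.shift(1)  # same quarter, one year earlier
# Replace a zero base with NaN so the division does not produce inf.
growth = (sales - previous) / previous.replace(0, np.nan)
```

Here Q4 2018 comes out as NaN because its base (Q4 2017) is 0; how to display that case is a reporting choice.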
!paste
add ur data to that and highlight which part's the problem in first
I know this varies from job to job, but what does the modern data scientist "stack" look like?
The obvious ones are SQL, Python or R, classification and regression techniques, model deployment, etc.
And the not so obvious ones are might be deep learning, Big Data frameworks like Hadoop/PySpark, cloud computing like AWS or Azure
I'm trying to see if I'm missing something and fill in the gaps...
hello, for natural language processing sentiment analysis, whats the best way to remove unfinished responses from users in dataset? for example
if have sentence "a series" or "series" only, i know stopwords can help remove 'a" but what about single words that doesnt really count as good or bad?
i have one row for example:
"series" and it is consider positive (a. good review)
but not sure if this adds value
aaand I'm back with another question
so I want to convert this groupby into violin plot but I can't find a way to do so
is there someone with a clue of how to do so?
Stack overflow
I'm pretty sure if u google pandas groupby violin plot it's the first one
I read it before
nope can't find it
i do find those with dataframes
but I can't find the one with the groupby function that's the problem
U can prob apply that method to ur one
As it's functionally the same thing
A data frame
that'd work if it's not because of the order count I have tbh
basically I have a problem because I need to count the number of orders I have in the data
which I have to use groupby instead to create that
unless I create a completely new data with the groupby then maybe it'll work
i have an issue with my tensorflow model
my accuracy is 0.0000e+00
why
my loss is about 1
it classified all wrong?
Can anyone help me with leavepgroupsout split?
cuz u did smtn wrong
hello sorry
how does naive bayes classifier work with tfidf?
i did tfidf to my dataset and would like implement naive bayes to it but not sure how it works on paper
since most data we need to manually compute the probability
but tfidf already has the log prob
suppose i have like 3 features as simple example
and output is 1 or 0
what i have is like:
x1 x2. x3
0 0.56 0.57
0.56. 0 0.4
0.2. 9 0.9
would i do sum(x1) * sum(x2) * sum(x3)
okay after a little research would i sum instead of counting?
so i sum all x1 whereever p(x1|y=1)
and divide it by total sum?
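Roughly yes: with multinomial naive Bayes the tf-idf values can be treated as fractional counts, summed per class and normalized. A hand-rolled sketch with numpy, where the matrix, labels and `alpha` are made-up example values:

```python
import numpy as np

# rows = documents, columns = tf-idf features x1, x2, x3 (made-up values)
X = np.array([[0.00, 0.56, 0.57],
              [0.56, 0.00, 0.40],
              [0.20, 0.90, 0.90],
              [0.70, 0.10, 0.00]])
y = np.array([1, 0, 1, 0])
alpha = 1.0  # additive smoothing, so no feature gets probability 0

classes = np.unique(y)
# log P(y = c): share of documents in each class
log_prior = np.log(np.array([(y == c).mean() for c in classes]))
# sum each feature over the documents of a class (instead of counting words)
feature_sum = np.array([X[y == c].sum(axis=0) for c in classes])
smoothed = feature_sum + alpha
# log P(x_j | y = c): divide by the total sum for that class
log_likelihood = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))


def predict(x):
    """Pick the class with the highest log prior + weighted log likelihood."""
    scores = log_prior + log_likelihood @ x
    return classes[np.argmax(scores)]


prediction = predict(np.array([0.0, 0.8, 0.6]))
```

The per-feature probabilities are combined by adding logs weighted by the tf-idf value, not by multiplying the raw sums together.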
this would probably be a better question for #databases.
finally finished my convolutional neural network
turns out it's faster and better than my linear network, even with just 1 convolution
YAY
still some code cleanup and optimization to do, but overall turned out great
Tensorflow or Pytorch?
(and scipy for its correlate2d and convolve2d functions)
If you switch it to cupy, you might even get performance comparable to pytorch
i tried, but with the gpu i tried it on, it actually got ~5x slower
that would imply something is fishy in the way the GPU is leveraged
I know this finds number of entries per year, but I want to find missing values for each year
Any idea how to do this?
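One possible approach, assuming there is a `year` column to group on (the frame below is invented for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "name": ["a", None, "c", None, None],
    "price": [1.0, 2.0, np.nan, 4.0, 5.0],
})

# isna() marks missing cells as True; summing the booleans per year
# counts the missing values in every other column.
missing_per_year = df.drop(columns="year").isna().groupby(df["year"]).sum()
```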
I would usually do something like df.loc[f'{something.index[5]}','name']
but what
can somebody help
my accuracy starts at 40 then goes to 0
Gauss Newton - QR LS data fitting, anybody any help jazzing up this or even the plots? https://paste.pythondiscord.com/pafehebuce
Hello there guys I'm back again with more questions here
so originally, my goal's to convert the groupby into a violin plot but I couldn't find how on google
so I extracted both keys and values into 2 parts and group them into a new dataframe instead before finally make a violin plot of of that
somehow the data aren't distributed equally and it looks like this
It's impossible for us to know
What data is it
So I'm familiar with python and know a fair amount of stuff about the pandas library how can I get started with machine learning?
continue with scikit decision trees, random forests and xg boost
do you recommend any sources or should I just follow the official documentation of scikit learn?
personally I study through kaggle.com
I looked through it and I actually did their introductory machine learning course but when I moved on to the intermediate one I found it kind of hard to follow
I can help you with that
you just apply the algorithms to the predictions you want to make
the intermediate one is not as important as the feature engineering course
so should I just skip it or learn any specific stuff form it?
the most important thing to be able to get go and start working on projects is feature engineering.... it's the only part for a beginner where they have to use their brains
so should I skip the others and focus on this one or the others like deep learning are equally as important?
just give it a fast pace reading through those and focus more on their exercises
dude if you go through those courses and get to feature engineering you can team up with me to work on projects
im looking for teammates
ok thanks a lot!
sounds like a good idea!
A dataset on kaggle
Heyo
A while ago I advertised an AI chat bot that utilized a modified version of GPT-3, and I'm just letting you guys know I'm still looking for testers
Hello,
I am looking for an e-learning platform to learn python applied to data. I'm looking for something that has been proven to work. What would you recommend?
Thanks in advance.
Anybody able to help? Even just looking to switch lists to numpy arrays but getting errors
that's a bunch of code. what do you want help with
there's udemy and coursera. they sell online courses. be sure to read the reviews for the course before you purchase it.
note that, at least in coursera, you can apply for financial support. if you're a student, it's almost guaranteed that you'll get free access to courses
I can answer most pandas questions, but I won't look at screenshots of text/code.
if you decide to provide the code as a code block, ping me, and I'll see if I still have time to go over it by then.
Are there any alternatives to OpenCV because it feels so ancient and is poorly documented
quickfire question: eigenvectors can go in reverse of original right? on same span let ssay 1,0 goes to -1,0
its still eigen?
ahhh i found my answer, its YES
-1 is a perfectly valid eigenvalue, sure
: )
on the final week of linalg before calc
id still fail the exam qs : S
still feels kinda nice to start to see how it works though
but hell nah can i be assed to calculate this myself
there's little point in doing that by hand anyway
the concept is super important though
here's a fun toy task that often shows up
in numpy it makes sense
its hard and slow tho
to work out
idk, we have transform functions so why do it in pure matrices
there's no special reason. matrices are a representation of a linear transformation in a given basis for the domain and codomain
so they're the same thing
however, if you choose a basis super cleverly, then some operations become a lot easier
that's the whole point of learning about eigenvalues and eigenvectors
say you have a linear transformation and you want to compose it n times with itself
f(f(f(...f(x)...)))
if f is diagonalizable, then computing this is trivial
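A quick numpy sketch of why: if A = P D P^-1 with D diagonal, then A^n = P D^n P^-1, and D^n only needs each eigenvalue raised to the n-th power. The matrix `A` and `n` are arbitrary example values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so it is diagonalizable
n = 10

# Columns of P are eigenvectors; eigenvalues are the diagonal of D.
eigenvalues, P = np.linalg.eig(A)

# Powers of a diagonal matrix are just elementwise powers.
A_power = P @ np.diag(eigenvalues ** n) @ np.linalg.inv(P)
```

This gives the same matrix as multiplying A by itself n times (`np.linalg.matrix_power(A, n)`).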
stuff like PCA is the same: it can be seen as the problem of choosing a clever basis so that there's a super nice low rank representation of data
the idea behind linear algebra and using matrices and vectors for it is that "vector" is quite an abstract concept that applies to many objects. if you can show something behaves like a vector/vector space, then you can immediately disengage your brain and fall back on the well understood, mature toolset of matrices, or more generally the toolset of linalg
e.g. solving differential equations via diagonalization
@serene scaffold alright gimme a moment
https://paste.pythondiscord.com/efexexisas
aand off I'm back to you senpai...
and don't worry about the time because this is just me doing for self-improvement
the goal's to convert the groupby to DataFrame so that I can use seaborn violinplot to draw one
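A sketch of the groupby-to-DataFrame step, with invented column names (`order_id`, `Month_Year`, `product`) standing in for the real ones:

```python
import pandas as pd

# Stand-in for the raw data: one row per item, several rows can share an order.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4, 4, 5],
    "Month_Year": ["1_2019", "1_2019", "1_2019",
                   "2_2019", "2_2019", "2_2019", "2_2019"],
    "product": ["iphone", "cable", "iphone", "cable",
                "iphone", "cable", "headphones"],
})

# Count distinct orders per month, then turn the result back into a
# regular DataFrame with named columns.
order_counts = (
    orders.groupby("Month_Year")["order_id"]
    .nunique()
    .reset_index(name="order_count")
)

# With seaborn installed, the frame can be passed on like this:
# sns.violinplot(x="Month_Year", y="order_count", data=order_counts)
```

Note that this leaves a single number per month, so each violin has only one data point to draw a distribution from.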
@severe oriole I'll need a moment to figure it out
yup don't worry about the time man
I'm available all time
that's the problem lmao
if i can just make a normal plot then it's no big deal
but the question was to filter out data and make a violin plot out of that
so the docs have this figure
and the code to create it is sns.violinplot(x="day", y="total_bill", data=tips)
so we can infer that the x axis can be labels, and the y axis can be numbers
so using the sample data you gave yesterday, I did this: sns.violinplot(x=df['product'], y=df['total'])
and I got this terrible thing
but I suspect the problem is that I only have five data points. try it with your actual data.
@severe oriole
in this code, tips is a dataframe.
yup that's the problem here
that is the price x quantity of items
so therefore it's fixed
should I send the entire original csv file also?
because the original data does not include the number of orders tbh
oh alright then
this is the raw data i got to begin with
what figure did you get for sns.violinplot(x=df['product'], y=df['total']) when you used the whole dataset?
it's a fixed figure per product also
fixed with total price in that order per product
but I soon deleted that because the question's this
not sure what you mean by "fixed figure". also, you can change which columns you're using to create the figure. I just picked those two arbitrarily.
Plot violin plot between above respective 13 categories of unique year month combinations Month_Year ( 1_2019 , 2_2019 … 1_2020) along side the number of orders being placed.
Please suggest me some projects ideas based on AI/ML
I need it for my final year project
i mean like iphone is 700 and wired headphone's 23.98 at most else it's just the price per 1 item that's shown in the violin plot

i forgot how to adjust the size of the plot
My first interview assessment is gona be pandas or sql what shud I choose to use
Pandas is easier right
idk since it really depends which one you use more
I used pandas more
then go for pandas
I can't tell since mine's filtering data out
i just have to filter and analyze the information for my case
Mines gona be on the spotwith a interviewer
No help form discord haha
I may have to do what u are doing
I wud filter out just by group by
And use of iloc
hi i have a linear regression model and a dataset. the thing is i want to see in a graph the dots of G3 (the predicted label) and the line of the model. how do i do it in matplotlib? ```python
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("student-mat.csv", sep=";")
data = data[["G1", "G2", "G3", "studytime", "failures", "traveltime"]]
predict = "G3"
data = data.dropna(axis=0)
X = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])

# best score saved by a previous run
with open("score.txt", "r") as score:
    best = float(score.read())

for i in range(500):
    # re-split on every iteration, otherwise all 500 fits are identical
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
    linear = LinearRegression()
    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)
    if acc > best:
        print("acc: ", acc)
        best = acc
        with open("score.txt", "w") as score:
            score.write(str(best))
        with open("reg.pickle", "wb") as f:
            pickle.dump(linear, f)

# load the best model saved so far
with open("reg.pickle", "rb") as pickle_in:
    linear = pickle.load(pickle_in)

print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
predictions = linear.predict(x_test)
for x in range(len(predictions)):
    # pass the values as separate arguments; array + str would raise an error
    print("Predict: ", predictions[x], "Features: ", x_test[x], "Actual: ", y_test[x])
#plt.scatter(x_test,x_test)
#plt.plot(predictions, predictions)
#plt.show()```
almost like this but y isn't total sadly
y = number of orders which is equal to the number of ids recorded instead
the problem is that the order count doesn't exist in the original file so i used groupby and I got that order counts in that month
x = month and y = order counts in that month
number of items per order
like in 1 order you can order 2 iphones
but that's still just +1 in order count
man I love violin or catplot
basically it only works if I have the necessary columns right
@charred egret actually on second thought
is it possible to create a new dataframe and use violin plot on that
nvm i figured out now
yeah cuz the columns don't exist in original data
imma just make a new dataframe with the filtered data instead
kek why didn't I think of that first
nvm it doesn't work
but thanks melio
the question's wrong it seems
I managed to create the violin plot now
yeah it's alright
I think it's the question that's the problem, not my solution
because 3 people helped and it's still like this
so I'm pretty sure this violin plot works if the condition's different
would imbalanced classes be the reason why i have a recall of 1.0
for an uncommon class
thus 96% accuracy
and a recall of 0.0 for another
its just guessing one
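A toy illustration of that effect with made-up numbers: a model that always predicts the same class gets recall 1.0 for it, 0.0 for everything else, and an accuracy equal to that class's share of the data.

```python
import numpy as np

# 96 samples of class 0, 4 samples of class 1 (invented split)
y_true = np.array([0] * 96 + [1] * 4)
# a "model" that always guesses class 0
y_pred = np.zeros(100, dtype=int)

accuracy = (y_true == y_pred).mean()


def recall(label):
    """Fraction of the samples of `label` that were predicted as `label`."""
    mask = y_true == label
    return (y_pred[mask] == label).mean()


recall_guessed = recall(0)  # the class the model always predicts
recall_other = recall(1)    # the class it never predicts
```

So a high accuracy on imbalanced data says little on its own; per-class recall shows the problem.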
yup it should be vary tbh
like it works if there's a range per stuffs
like the default violin plot file given right
why would resampling for random forest fixing the single class guessing translate to better irl performance
incase you get another balance?
yup now I completely understand ur question now
i guess I'm better off trying with other questions instead then
thanks both Melio and Sterlecus for helping me today
you both did your best in helping me
It's 3x3 to 80x80, pillow won't be high enough quality
i hate google API
why is my accuracy 0.0000e+00?
I've been trying to find an exact replica of matlab econ svd in numpy/scipy
The one in numpy doesn't seem to support the 'econ' parameter
this is my model:```py
model = keras.Sequential()
model.add(keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(keras.layers.MaxPooling2D(2, 2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(576, activation='relu'))
model.add(keras.layers.Dense(576, activation='relu'))
model.add(keras.layers.Dense(256, activation='relu'))
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(64, activation='relu'))
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(7, activation='softmax'))
model.summary()```
this is the model summary: https://paste.pythondiscord.com/asidebobaz
this is the compile: model.compile( loss = "categorical_crossentropy", optimizer = keras.optimizers.Adam(learning_rate=0.001), metrics = [tf.keras.metrics.Accuracy()] )
this is the fit: run = model.fit( x_train, y_train, batch_size = 16, epochs = 10 )
https://mail.python.org/pipermail/scipy-user/2005-October/005627.html ps. i dont understand econ or svd or any of this.
set full matrices = false https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
however, google fu i can do
and this is how the fit output looks like:
Epoch 1/10
1795/1795 [==============================] - 45s 25ms/step - loss: 1.8332 - accuracy: 0.0000e+00
Epoch 2/10
1795/1795 [==============================] - 44s 25ms/step - loss: 1.6180 - accuracy: 0.0000e+00
Epoch 3/10
1795/1795 [==============================] - 44s 24ms/step - loss: 1.4600 - accuracy: 0.0000e+00
Epoch 4/10
1795/1795 [==============================] - 45s 25ms/step - loss: 1.3633 - accuracy: 0.0000e+00
Epoch 5/10
707/1795 [==========>...................] - ETA: 26s - loss: 1.3010 - accuracy: 0.0000e+00
why is the accuracy 0?
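A likely cause, though not confirmed in this thread: `tf.keras.metrics.Accuracy()` counts how often predictions are exactly equal to the labels, and softmax outputs are floats that essentially never equal a one-hot 0/1 row. The loss can go down while that metric stays at 0. A numpy imitation of the two ways of counting, with made-up predictions:

```python
import numpy as np

# one-hot labels for two samples
y_true = np.array([[0, 0, 1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 0]], dtype="float32")
# softmax-like outputs: first sample is right, second is wrong
y_pred = np.array([[0.05, 0.05, 0.60, 0.10, 0.10, 0.05, 0.05],
                   [0.10, 0.50, 0.10, 0.10, 0.10, 0.05, 0.05]], dtype="float32")

# exact element-wise equality: what an exact-match accuracy measures
exact_match = np.mean(y_true == y_pred)

# compare the most probable class instead
categorical = np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1))
```

In Keras the second behaviour is what `metrics=["accuracy"]` or `keras.metrics.CategoricalAccuracy()` gives for one-hot labels, so swapping the metric in `model.compile` is worth trying.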
Ty!
so, economy-size SVD means that, for an m x n matrix, only the first min(m, n) singular values and their corresponding left and right singular vectors are returned; the extra rows or columns of zeros are dropped. you get a square singular value matrix and, in general, rectangular singular vector matrices
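A small check of the shapes with numpy, using a random 6 x 3 matrix as example input:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

# full_matrices=False is numpy's counterpart of MATLAB's 'econ' option
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# U is (6, 3), s is a 1D array of 3 singular values, Vh is (3, 3)
reconstructed = U @ np.diag(s) @ Vh
```

Two differences from MATLAB to keep in mind: numpy returns the singular values as a 1D array rather than a diagonal matrix, and it returns `Vh` (V transposed), not V.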
what should the output look like
is the output eg. [0, 0, 1, 0, 0, 0, 0]?
i don't get it
*it is
wait, are your images size 80 x 80? why are your convolution filters so large?
almost everything they return will be close to zero
Yoo edd
no
what size is the input?
shape is (48, 48, 1)
ok
@wooden sail teach me AI??
imma try that
wait wait, i also misread, give me a second to settle down
yt tutorial?
or kaggle introduction
wait
Ya
I have a brilliant tutorial
i mixed up the (3, 3) with the number of filters, that's my bad, ignore what i said. the network looks mostly ok, the only things i can think of off the top of my head are that the learning rate is either too small or too large. try changing it in both directions and see how that changes
Ok
1 sec
i can help you with some math
ok
but i have a problem when I do that
when the learning rate is too small, the loss turns NaN
if you have specific questions right now, i can give you a hand for a few mins
the basics are probability and statistics, linear algebra, and multivariable calculus
@wooden sail I have tried what you said, and sometimes the accuracy starts at 40 then drops to 1
So let's start with multi variable calculus what ever that is
other times nothing happens
have you tried the rmsprop optimizer instead of adam?
have you already learned single variable calculus?
yep
and mean_squared_error
changes the learning rate
batch size
what if you make the network smaller, for starters?
[U_J,sigma_J,V_J] = svd(temp_J,'econ');
sigma_J = diag(sigma_J);
svp_J = length(find(sigma_J>1/mu));
if svp_J>=1
sigma_J = sigma_J(1:svp_J)-1/mu;
else
svp_J = 1;
sigma_J = 0;
end
I've tried to convert this to numpy but the sigma_J = sigma_J(1:svp_J)-1/mu; line I can't find an explanation for
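A possible numpy translation, as a sketch. The line in question keeps the first `svp_J` singular values (the ones larger than 1/mu, since they come sorted in descending order) and subtracts 1/mu from each, i.e. a soft threshold. `shrink_singular_values` is a made-up name, and returning the truncated U and Vh is an assumption about how the result gets used afterwards:

```python
import numpy as np


def shrink_singular_values(temp_J, mu):
    """Economy SVD followed by soft-thresholding of the singular values."""
    # numpy already returns sigma as a 1D array, so no diag() step is needed
    U, sigma, Vh = np.linalg.svd(temp_J, full_matrices=False)

    # number of singular values above the threshold 1/mu
    svp = int(np.count_nonzero(sigma > 1.0 / mu))

    if svp >= 1:
        # MATLAB's sigma_J(1:svp_J) is 1-based and inclusive -> sigma[:svp]
        sigma = sigma[:svp] - 1.0 / mu
    else:
        svp = 1
        sigma = np.zeros(1)

    # keep only the matching singular vectors (assumed follow-up step)
    return U[:, :svp], sigma, Vh[:svp, :]


temp_J = np.diag([5.0, 2.0, 0.1])
U, sigma, Vh = shrink_singular_values(temp_J, mu=1.0)
```

With mu = 1 the threshold is 1, so the singular values 5 and 2 survive and become 4 and 1, while 0.1 is dropped.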
ok