#data-science-and-ml
did you use keras, specifically?
I am trying and it showed this as a warning: "To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory."
I dont have that file
is this right? (second group of synapses omitted because im lazy)
forgive the ms paint drawing
where activation function could be anything (like ReLU or sigmoid)
and each synapse represents a weight and each neuron/node in a layer (excluding the input layer) has its own bias
Yeah that is the general idea
so just to confirm, each hidden layer has an "activation function" they apply before outputting to the next layer and the output layer has a "loss function" which calculates the error?
with softmax being a "loss function" that normalises the data between 1 and 0 with respect to the value range
so every layer apart from the input layer has an activation function?
you could more generally argue that all layers have activation functions, since the identity is a perfectly valid function. this lets you more easily draw parallels between classical algorithms and deep learning
"the identity" being a function that outputs the input?
yep
can you clarify what the loss function is? somewhere above in the chat it was said that the output layer doesn't use sigmoid but uses softmax instead (or equivalents)
btw, softmax doesn't just normalize the data. it has a hyperparameter in the exponent. if you've done any image processing before, you could compare it to the "gamma factor" they use there
let's see. what softmax does is approximate what the function argmax would do
for reference
def sigmoid(x: np.ndarray):
    return 1 / (1 + np.exp(-x))

def softmax(x: np.ndarray):
    exp_x = np.exp(x - x.max())
    return exp_x / exp_x.sum()
argmax (in its one-hot form) takes in a vector, and spits out another vector of the same size where all entries are 0, except for the entry corresponding to the largest value in the original vector (which is 1)
what softmax does is approximate this with a differentiable function. small values are made even smaller, large values are made even larger
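To make the argmax comparison concrete, here is a minimal sketch. The name `beta` for the hyperparameter in the exponent is an assumption for illustration, and `one_hot_argmax` is a hypothetical helper, not something from the chat.

```python
import numpy as np

def softmax(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    # beta (hypothetical name) scales the exponent; larger beta = sharper output
    exp_x = np.exp(beta * x)
    return exp_x / exp_x.sum()

def one_hot_argmax(x: np.ndarray) -> np.ndarray:
    # 1 at the position of the largest entry, 0 everywhere else
    out = np.zeros_like(x, dtype=float)
    out[np.argmax(x)] = 1.0
    return out

x = np.array([1.0, 2.0, 3.0])
for beta in (1.0, 5.0, 50.0):
    print(beta, softmax(x, beta).round(4))
print(one_hot_argmax(x))
```

As `beta` grows, the softmax output gets closer and closer to the one-hot vector, while staying differentiable for any finite `beta`.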
but this is just the output of the network, it tells you nothing of whether this output is correct
so you'd then have to compare the network's output to some reference value
that's where the cost function comes in. you'd take the softmax output, or more generally, the overall output of the network regardless of the activation func in the last layer, and compare it to a reference
the cost function should be chosen so that it correctly captures how good the output of the network is
this type of training has to be supervised or semi/selfsupervised
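As a rough illustration of comparing the output to a reference, here is a sketch using cross-entropy. Cross-entropy is an assumption here (the chat does not name a specific cost function); it is a common choice to pair with a softmax output.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, target_index: int) -> float:
    # negative log of the probability the network gave to the correct class
    return float(-np.log(probs[target_index]))

target = 1  # index of the correct class (the "reference value")
confident_right = np.array([0.05, 0.90, 0.05])
unsure = np.array([0.30, 0.40, 0.30])
confident_wrong = np.array([0.90, 0.05, 0.05])

for name, probs in [("confident_right", confident_right),
                    ("unsure", unsure),
                    ("confident_wrong", confident_wrong)]:
    print(name, round(cross_entropy(probs, target), 3))
```

The cost is small when the network puts most of its probability on the right class and grows quickly when it is confidently wrong.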
i'm not sure your softmax is right btw. the whole point of using softmax is to not use max at all
that was to lower the values for "numeric stability"
see https://stackoverflow.com/a/34969389
as soon as you use max you give up your differentiability*
how so?
if you will anyway use max, don't use softmax
max is not a differentiable function. we use softmax to avoid using max
so argmax is not a differentiable function so we use softmax to approximate it because softmax can be differentiated?
that's the idea
my implementation of the whole network is wrong sigh, i'll redo it and then try again
without the max i get runtime errors
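For reference, a small sketch of why subtracting the max matters numerically. The function names `softmax_naive` and `softmax_shifted` are made up for this example.

```python
import numpy as np

def softmax_naive(x: np.ndarray) -> np.ndarray:
    exp_x = np.exp(x)
    return exp_x / exp_x.sum()

def softmax_shifted(x: np.ndarray) -> np.ndarray:
    # subtracting a constant from every entry does not change the result,
    # but it keeps the largest exponent at 0 so np.exp cannot overflow
    exp_x = np.exp(x - x.max())
    return exp_x / exp_x.sum()

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)  # exp(1000) overflows to inf, inf / inf is nan

print(softmax_naive(small), softmax_shifted(small))
print(naive_large, softmax_shifted(large))
```

On small inputs both versions agree; on large inputs only the shifted version returns usable numbers.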
i havent got that far (currently implementing forward propagation) but i should be yes
ok, then it's ok
class Network:
    def __init__(self, *shape: int, labels: list = None, learn_rate=0.01) -> None:
        self.layers = [
            [
                np.random.uniform(-0.3, 0.3, (shape[i], shape[i - 1])),
                np.zeros(shape[i]),
                None,
            ]
            for i in range(1, len(shape))
        ]
        self.labels = labels or list(range(shape[-1]))
        self.learn_rate = learn_rate

    def propagate(self, input: np.ndarray):
        output = [
            input := sigmoid(layer[0] @ input + layer[1])
            for layer in self.layers[:-1]
        ]
        output.append(
            softmax(self.layers[-1][0] @ input + self.layers[-1][1])
        )
        return output

network = Network(784, 387, 387, 10)
this is my [incorrect] "stiff" implementation, ill upgrade it in the future to allow more modularity
using max in an automatic differentiator will yield unwanted results. if you do it by hand, then you can exploit the equivalence of the two expressions by taking the derivatives on paper. that's ok
argmax(softmax(x)) will give the same results as argmax(x)
indeed, but they have softmax(x - max x), which is the same as softmax(x), except that it'll trip up the computation graph of autodiffers
should the output of every neuron be in the 0 - 1 range?
it's not necessary, but it's good for huge networks
The 0-1 range has probabilistic interpretations
in the final layer yes, but in between it's just a nice-to-have so that the gradients don't explode. it's hard to know ahead of time how the gradients will behave
0-1 range does not mean bounded gradients though?
unless 0-1 range smooth function means bounded gradients, I admit I don't know this
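On that last question, a quick numerical sketch. The function `wiggly` is a made-up counterexample, not something from the chat: it is smooth and stays in the 0-1 range, yet its gradient grows without bound, whereas the sigmoid's gradient never exceeds 0.25.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # peaks at 0.25 when x = 0

def wiggly(x):
    # hypothetical counterexample: smooth, always within [0, 1]
    return 0.5 * (1 + np.sin(x ** 2))

def wiggly_grad(x):
    # derivative of wiggly: the factor x makes it unbounded
    return x * np.cos(x ** 2)

xs = np.linspace(-50, 50, 200001)
print(sigmoid_grad(xs).max())
print(wiggly(xs).min(), wiggly(xs).max())
print(wiggly_grad(np.sqrt(2 * np.pi * 100)))
```

So a 0-1 output range on its own does not bound the gradients; the sigmoid just happens to have both properties.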
so my output layer shouldn't use a sigmoid function but softmax instead allowing me to compare the output to a desired output?
```py
output = [
    input := sigmoid(layer[0] @ input + layer[1])
    for layer in self.layers[:-1]
]
output.append(
    softmax(self.layers[-1][0] @ input + self.layers[-1][1])
)
```
like so?
ah that looks confusing probably, layer[0] are the weights, layer[1] are the biases
I created a image similarity search engine using python!
Here is me searching for similar images to the one in the top left
it uses a neural network to create image embeddings that can later be searched
nice
But note that these probabilities are not necessarily calibrated https://arxiv.org/abs/1706.04599
performance is pretty good as well! I can search 1.5 million images in ~1.5ms. Inserting a new image takes 2-3ms
what is the model?
the slowest part is generating the embeddings, takes around 150ms on the cpu and 10 on a gpu
efficientnetv2 B1 in this case, because i wanted to run inference on a cpu
are they just pretrained imagenet embeddings?
but swapping the model is as easy as changing the model url!
yup, imagenet21k in this case
cool cool
all images im using are not from imagenet21k though
im still a little on the fence between using cosine similarity or L2 to find similar pictures
its hard to compare "similarity"
supervised imagenet embeddings do not necessarily have meaningful distances
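On the cosine-vs-L2 question, one fact that may help: if the embeddings are L2-normalized first, the two give the same ranking, because the squared distance between unit vectors equals 2 - 2 * cosine similarity. A sketch with random stand-in embeddings (the sizes and the `normalize` helper are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))  # stand-in for real image embeddings
query = rng.normal(size=64)

def normalize(v: np.ndarray) -> np.ndarray:
    # scale each vector to unit length
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

emb_n = normalize(embeddings)
q_n = normalize(query)

cos_sim = emb_n @ q_n                        # higher = more similar
sq_l2 = ((emb_n - q_n) ** 2).sum(axis=1)     # lower = more similar

top_by_cosine = np.argsort(-cos_sim)[:10]
top_by_l2 = np.argsort(sq_l2)[:10]
print(top_by_cosine)
print(top_by_l2)
```

Without normalization the two can disagree, since L2 is then also sensitive to the length of each embedding.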
maybe contrastive pretraining or more modern joint embedding architecture
thats also what i thought, but it works amazingly!
the problem with contrastive training is i dont have ground truth data for "similarity"
oh really?!
yeah read that paper i linked
We introduce Bootstrap Your Own Latent (BYOL), a new approach to
self-supervised image representation learning. BYOL relies on two neural
networks, referred to as online and target networks, that...
thank you so much!
but the latter two are for understanding, use vicreg in practice
do you think these kinds of networks can perform at production latency? The vicreg paper uses a very large network :/
i think you can get imagenet21k pretrained vicreg weights open source
oh wow
it works with any size network
yeah, but do you think the results are any good?
inference would be forward pass on a single model
so exactly the same latency as you currently use
they should be better
at least they are for us, we use this in prod
awesome!
I have to say that this field of computer vision is just amazing. even if i didnt do it at work i would still look into it!
also if you want to do search over images that arent in the imagenet distribution you can finetune this with no labels to include whatever image distribution you want
hey there! Anyone familiar with pandas and datetime? I have a problem that i haven't been able to fix for the last 3 days
personally I only use ml, rather than dev it. I've never really tried any architecture for CV before though
I've randomly tried BERT, for sentiment analysis
first, show the dataframe with print(df.head().to_dict('list')), without using screenshots
i have my favorite open image dataset V6 to work with haha, dont want to download another dataset
do you have any other tips for running this in prod?
my current stack is tf-serve for inference and redis for search
tensorrt optimizations + onnx
ideally implement in a framework that gives you xla compilation like jax
although probably i will switch to solr because its already widely used at work
do XLA and JAX help if inference is done on the CPU?
alright i'll show INPUT & OUTPUT
yep
they will fuse layers together through low level primitives effectively decreasing model size
**INPUT**
```py
import pandas as pd
input1 = pd.DataFrame({"Date1": ['31/10/2008', '03/01/2009', '10/04/2013'],
                       "Date2": ['01/03/2009', '10/04/2013', '03/07/2013'],
                       "1": [' ', ' ', ' '],
                       "2": [' ', ' ', ' '],
                       "3": [' ', ' ', ' ']})
print(input1)
```
**OUTPUT**
```py
import pandas as pd
# Here is the operation used to get the output results as dates in future...
# We want this operation to be applied in a for loop and make it for all columns except Date ones...
# operation = input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
output = pd.DataFrame({"Date1": ['31/10/2008', '03/01/2009', '10/04/2013'],
                       "Date2": ['01/03/2009', '10/04/2013', '03/07/2013'],
                       "1": ['16/07/2009', '04/03/2020', '15/11/2013'],
                       "2": ['19/06/2009', '09/06/2024', '07/02/2014'],
                       "3": ['01/10/2009', '05/05/2031', '23/06/2014']})
print(output)
```
**OPERATION USED FOR OUTPUT**
```py
# Here is the operation used to get the output results as dates in future...
# operation = input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
# so at first we want to do the difference between (input1['Date2'] - input1['Date1']) in term of days
# then we multiply those days by 1.618 ** input1[col][2:] (so multiply the number of days by 1.618 to the column power (1, 2, 3, etc)
# so far our operation should be just number of days, and it should be written like: ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
# then we want to add those calculated days to input1['Date2']
# final operation is: input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
```
**here the full operation:**
```py
x5 = pd.to_datetime(m['Date1'])
x6 = pd.to_datetime(m['Date2'])
operation = x6 + ((x6 - x5) * (1.618 ** m.columns[2:]))
operation
```
**HERE'S THE ERROR IT GIVES:**
```py
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1181823334.py in <module>
      9 float_list = list(map(float, x))
     10
---> 11 operation = x6 + ((x6 - x5) * (1.618 ** float_list))
     12 operation
TypeError: unsupported operand type(s) for ** or pow(): 'float' and 'list'
```
as always compute is a limited resource and i cant just request 8 A100s for inference hahah
i wouldnt worry about that part
just use tensorrt or something for quantization
play w quantization aware training too
have you done any testing as far as size of the embedding goes?
@worthy hollow I think you should open a help channel
this is fine.
since similar is such a broad term im guessing you have a lot of leeway
if youre using imagenet then you should use the embedding size from vicreg paper
yeah but it's so separated
@worthy hollow please say everything in one message.
i think they show ablations (single linear layer performance from different embedding sizes) and you can see performance implication of decreasing it so your knn is faster/in lower dimension
ok wait
how can I bud
don't worry about opening a help channel. just give all remaining details for your question in one message.
knn speed seems to be fine even for 4k dimensions, but i start to run out of ram.
but thank you so much for those papers, don't know why i hadn't thought of that
we use <4k dimensions
no. learning latin is fun.
what do you use for KNN?
ahhh super dead
Redis and Solr both use Hierarchical Navigable Small World (HNSW) graphs, that seems to be the way to go?
and probably slightly more useful. jk
on cpu use approximate knn from https://github.com/lmcinnes/pynndescent, on gpu use exact knn used in umap implementation from cuml library
there have been some great developments in the approximate nearest neighbors space
combined with vector search and its a great combo

alright i think its done
^
What's 1.618?
a constant, it stays fixed
thanks! two things immediately jump out at me:
- your dates are encoded as strings, which means that they have nothing to do with moments in time, any more than "foobar".
- never start with the assumption that the solution to your pandas problem involves a loop. assume that you can solve it without loops, and wait as long as you can to be proven wrong.
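A minimal sketch of the first point, using the sample rows from the question. The `%d/%m/%Y` format is an assumption (the strings look day-first, e.g. `31/10/2008`); passing it explicitly avoids pandas guessing the order.

```python
import pandas as pd

df = pd.DataFrame({
    "Date1": ["31/10/2008", "03/01/2009", "10/04/2013"],
    "Date2": ["01/03/2009", "10/04/2013", "03/07/2013"],
})

# Assumption: day/month/year. Being explicit avoids ambiguous parsing.
d1 = pd.to_datetime(df["Date1"], format="%d/%m/%Y")
d2 = pd.to_datetime(df["Date2"], format="%d/%m/%Y")

# Once these are real datetimes, subtraction gives timedeltas for the whole
# column at once, with no loop.
delta = d2 - d1
print(delta.dt.days.tolist())
```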
this might be better but i would benchmark against pynndescent
yes but i encoded them to datetime as u can see in the above code
wait lemme show u the error it gives me
I don't know what's the problem you can't fix from this
updated the error
^
btw I agree with the for loop
looks like you expected m.columns[2:] to be a pandas object, but it's a list.
yeah the problem is there, idk how can i ask my code to iterate over all columns ( ^ 1 then ^ 2 then ^ 3 then ^ 4 etc)
!e
import numpy as np
arr = np.ones((3, 4)) * 2
print(arr)
result = arr ** np.arange(1, 5)
print(result)
@serene scaffold :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | [[2. 2. 2. 2.]
002 | [2. 2. 2. 2.]
003 | [2. 2. 2. 2.]]
004 | [[ 2. 4. 8. 16.]
005 | [ 2. 4. 8. 16.]
006 | [ 2. 4. 8. 16.]]
@worthy hollow see what's happening here?
it's raising everything in the first column to the power of 1, then everything in the second column to the power of 2, etc.
you're gonna wanna spend some time looking at this https://numpy.org/doc/stable/user/basics.broadcasting.html since numpy broadcasting is great and does all your iterations for you at c level
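A small sketch of the same broadcasting idea applied to one value per row and one power per column. The numbers are made up; only the shapes matter.

```python
import numpy as np

days = np.array([[10.0], [20.0], [30.0]])  # shape (3, 1): one value per row
powers = np.arange(1, 5)                   # shape (4,): 1, 2, 3, 4
factors = 1.618 ** powers                  # shape (4,)

# (3, 1) against (4,) broadcasts to (3, 4): every row meets every power
table = days * factors
print(table.shape)
print(table.round(2))
```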
when using mse as loss it returns inf, but when i made a custom metric function using the sklearn mse function, it returns around 300-400 which while it's high, isn't anywhere close to infinity.
what should I do to troubleshoot this?
(but at the same time, what stelercus created is also a special case of a vandermonde matrix, which you should also look at https://numpy.org/doc/stable/reference/generated/numpy.vander.html)
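For comparison, a sketch showing that the broadcast power table and `np.vander` produce the same numbers. The input values are arbitrary.

```python
import numpy as np

x = np.array([2.0, 3.0, 1.618])
powers = np.arange(1, 5)

by_broadcast = x[:, None] ** powers        # columns are x**1 .. x**4

# increasing=True gives columns x**0 .. x**4; drop the leading column of ones
by_vander = np.vander(x, N=5, increasing=True)[:, 1:]

print(by_broadcast)
print(by_vander)
```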
how can i display it for columns going to 1 to 10?
you have to do (1, 11) instead of (1, 5)
the right bound isn't included.
**ok so i have this code now:**
```py
arr = np.ones((3, 4)) * 2
result = arr ** np.arange(1, 11)
x5 = pd.to_datetime(m['Date1'])
x6 = pd.to_datetime(m['Date2'])
x = m.columns[2:]
float_list = list(map(float, x))
operation = x6 + ((x6 - x5) * (1.618 ** result))
operation
```
**which brings this error:**
```py
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/578892415.py in <module>
      1 arr = np.ones((3, 4)) * 2
----> 2 result = arr ** np.arange(1, 11)
      3
      4 x5 = pd.to_datetime(m['Date1'])
      5 x6 = pd.to_datetime(m['Date2'])
ValueError: operands could not be broadcast together with shapes (3,4) (10,)
```
@wooden sail are you able to help?
not off the top of my head. sounds like an exploding gradients problem, so the first thing that comes to mind is normalization, but i can't guarantee that'll solve it
i've been normalizing my data via sklearn's RobustScaler
for the output, and the input is through StandardScaler
the only thing i could guess is an architecture issue, but that doesn't explain why sklearn doesn't return infinity
what solver are you using, then? it could be your initial step size is too big
honestly i dont even see which one is shape (3,4) in my dataframe thats weird --- here's an explaination on what i want to do
i'm using a sequential model through keras, with an input of 5125 neurons, a single hidden layer of 2500 dense neurons, and an output of one dense neuron
i want based off the Date1 & Date2 and the equation "operation", generate all those dates in the columns [1, 2, 3, 4, ..., 10]
like u can see in the output the expected dates
no not only 10 - i want it to generate all dates for all the rows and columns, there is 178 rows * 10 iterable columns so it should be 1780 dates total for the whole dataframe
I don't want to be that guy but I have to be.
This isn't a pandas problem - it's a programming one.
ah, thought it was pandas as it is in a df
then you want to broadcast this operation over all the rows. i think you took stelercus' example too literally
you don't need arr at all, if i understood the problem correctly. and you'll still have problems, idk how nicely numpy plays when doing arithmetic with dates
By that I mean you should figure out how to adapt stelercus' code to use with your own
would love to, sadly i'm lacking experience in both numpy and pandas to do so, this has been blocking me for 3 days hence why i joined the server and tried to find some help
but thanks a lot, you guys have figured out what the problem was much faster than me
**ok so this code:**
```py
Date1 = pd.to_datetime(m['Date1'])
Date2 = pd.to_datetime(m['Date2'])
power_cols = np.arange(1,11)
operation = Date2 + ((Date2 - Date1) * (1.618 ** power_cols))
operation
```
**gives me this error:**
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1852475972.py in <module>
5 power_cols = np.arange(1,11)
6
----> 7 operation = Date2 + ((Date2 - Date1) * (1.618 ** power_cols))
8 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in mul(self, other)
106 @unpack_zerodim_and_defer("mul")
107 def mul(self, other):
--> 108 return self._arith_method(other, operator.mul)
109
110 @unpack_zerodim_and_defer("rmul")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\timedeltas.py in mul(self, other)
512 # Exclude timedelta64 here so we correctly raise TypeError
...
--> 514 raise ValueError("Cannot multiply with unequal lengths")
515
516 if is_object_dtype(other.dtype):
ValueError: Cannot multiply with unequal lengths
reshape to size (1,10)
--> 514 raise ValueError("Cannot multiply with unequal lengths")
515
516 if is_object_dtype(other.dtype):
ValueError: Cannot multiply with unequal lengths
i've reshape power_cols right?
mhm
what shape are the dates
those probably have to have an extra axis explicitly, too
yeah they need to be size (x, 1)
whole dataframe shape
ahhh
how can i reshape a pandas column? it's not like with np right <-- here's the main problem (@serene scaffold)
i have no idea tbh, all of my pandas knowledge is based on the assumption that many numpy stuff has an equivalent, but i've never used it myself
let's see if stelercus or someone else materializes
alright thanks a lot for your help! And yeah lets see if someone has a clue thatd be great <@&267630620367257601> (for the above question in bold)
hi
do you guys think it's unnecessary to refer to Wx + b as affine and just call it linear?
apparently in data science and stats, they can be considered the same, since we can just append a 1 to the end of x and append b as an extra column on the right of our matrix
It's not unnecessary because the affine transformations are not necessarily linear transformations
but when I pointed that out to an article calling Wx + b linear transformations, somebody told me that in stats and ml they are considered the same
because of this reason
no, in general they're not the same
sites for learning data science
what you mentioned is proof in itself: you can make an isomorphic linear transformation, yes, but it requires a higher dimensional space to do so, as well as homogeneous coordinates
this effectively maps translations in n dimensional space to shears in n+1 dimensional space. translations are affine, shears are linear
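A minimal numerical sketch of that construction, with made-up values for `W`, `b` and `x`: appending a 1 to `x` and building the block matrix [[W, b], [0, 1]] turns the affine map into a single matrix product in one extra dimension.

```python
import numpy as np

W = np.array([[2.0, 0.0],
              [1.0, 3.0]])
b = np.array([5.0, -1.0])
x = np.array([1.0, 2.0])

affine = W @ x + b                      # the usual affine map in 2 dimensions

# Augmented (n+1) x (n+1) matrix: W with b as an extra column, then [0, 0, 1]
A = np.block([[W, b[:, None]],
              [np.zeros((1, 2)), np.ones((1, 1))]])
x_h = np.append(x, 1.0)                 # homogeneous coordinates: last entry is 1

lifted = A @ x_h                        # purely linear in 3 dimensions
print(affine)
print(lifted)
```

The first two entries of `lifted` match `affine`, and the last entry stays 1, which is the normalization of the extra dimension.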
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
interesting way to put it
never really thought about the intuition behind it but that's pretty interesting
Thanks bot
again, they're isomorphic and so in a very real sense "the same thing", but that extra dimension is there
but would you say that the author is at fault for calling Wx + b a linear transformation?
100%, yes
Not interesting, affine transformations requiring +1 dimension is basic linear algebra
Wx + b does not satisfy the definition of linearity
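The failed linearity check is easy to see numerically. A sketch with arbitrary values; the helper `f` is just Wx + b:

```python
import numpy as np

W = np.array([[2.0, 0.0],
              [1.0, 3.0]])
b = np.array([5.0, -1.0])

def f(v: np.ndarray) -> np.ndarray:
    return W @ v + b

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

# A linear map must satisfy f(x + y) == f(x) + f(y) and f(0) == 0.
print(f(x + y), f(x) + f(y))   # differ by exactly b
print(f(np.zeros(2)))          # equals b, not the zero vector
```

With `b` set to zero both checks pass, which is the sense in which only the `Wx` part is linear.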
it represents intersections of affine planes/flats, instead of planes crossing the origin
the geometric interpretation is different
and it's also not just any n+1 dimensional space, it's a projective geometry where the last dimension is normalized
some care is needed
very interesting
anyone ? <@&267630620367257601>
Use reshape
```py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3198534513.py in <module>
----> 1 m['Date1'].reshape(178, 1)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488
   5489     def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'reshape'
```
ahem
Make it an array first
```py
m['Date1'].reshape(178, 1)
```
to make it into an array i have to do .values right?
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1205480011.py in <module>
1 m['Date1'] = m['Date1'].values
----> 2 m['Date1'].reshape(178, 1)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'reshape'
You're calling reshape on a dataframe column, not on an array
.values gives you a NumPy array
From that column
But you assigned it back into m['Date1'], which makes it a Series again, so reshape the array itself instead
so how can i fix it?
is m a dataframe? because it doesn't make sense to put something into a dataframe, and then try to reshape that column individually
yes m is a dataframe
why do you want the Date1 column as an (n, 1) shaped array?
^ read from here
^
to be able to make the operation
then you would do m['Date1'].to_numpy().reshape(-1, 1)
and don't try to put it back in the dataframe after that. it's now an array that exists entirely separately from it.
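A tiny sketch of that step with stand-in data (the three dates are just sample values): the Series itself has no `reshape`, but the array returned by `to_numpy()` does.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(["2008-10-31", "2009-03-01", "2013-10-04"]), name="Date1")

col = s.to_numpy().reshape(-1, 1)   # -1 lets numpy work out the row count

print(type(col), col.shape, col.dtype)
```

`col` is a standalone NumPy array; keep it in its own variable rather than assigning it back to the dataframe column.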
Hello, I have a problem with cv2 when launching my exe script so I type the command "python3 -m pip install opencv-python==4.5.3.56" and I got a 2nd error
the photos are coming
well how comes? I need the dates1 & dates2 to be in the dataframe for the whole operation
i'm sorry but i'm really bad at this so i don't understand fully
you don't necessarily need them in the dataframe. what function are you trying to pass these to?
```py
operation = d2 + ((d2 - d1) * (1.618 ** power_cols))
```
please don't post screenshots of text. copy and paste the text as text. I won't look at any screenshots after this one.
what is the shape of power_cols?
(1,10)
and what about d2 and d1?
you could have copied this as text. Please stop posting screenshots.
sorry, too used to it, i will
Ok so this code works:
```py
Date1 = pd.to_datetime(m['Date1'])
Date2 = pd.to_datetime(m['Date2'])
d1 = m['Date1'].to_numpy().reshape(-1, 1)
d2 = m['Date2'].to_numpy().reshape(-1, 1)
power_cols = np.arange(1,10)
operation = d2 + ((d2 - d1) * (1.618 ** power_cols))
operation
```
it generates this output:
```py
C:\Users\PEGON\AppData\Local\Temp/ipykernel_17100/364403496.py:6: RuntimeWarning:
invalid value encountered in multiply
array([['2009-09-12T18:40:19.200000002', '2010-01-11T18:27:04.665600004',
'2010-07-26T12:45:58.308940816', ...,
'2018-10-12T15:37:18.511915712', '2024-09-21T11:15:36.312279616',
'2034-05-04T20:28:29.353268480'],
['2021-03-11T00:05:45.600000000', '2025-10-13T21:02:07.180800064',
'2033-03-20T16:10:44.978534528', ...,
'2147-02-16T15:07:08.289210880', '2229-07-21T00:50:47.371943936',
'NaT'],
['2012-03-30T14:26:52.800000000', '2011-09-01T14:49:58.310399992',
'2010-09-25T05:54:12.866227184', ...,
'1996-05-29T15:12:17.801535424', '1986-01-17T03:46:10.562884224',
'1969-04-11T06:08:49.970746624'],
...,
['2022-08-14T06:00:00.000000002', '2022-12-17T05:46:19.200000004',
'2023-07-07T11:24:11.145600016', ...,
'2031-12-31T18:31:06.231317888', '2038-02-20T08:27:31.562272384',
'2048-01-27T20:21:29.827756672'],
['2022-10-02T01:09:07.200000000', '2023-02-15T00:54:14.169600004',
'2023-09-23T01:39:16.446412816', ...,
'2032-12-16T02:29:02.459673856', '2039-08-22T00:45:18.499752320',
'2050-06-12T20:02:16.132599296'],
['2022-08-24T20:32:38.400000000', '2022-11-24T20:22:34.291200004',
'2023-04-22T16:38:55.243161608', ...,
'2029-07-20T18:37:17.546249952', '2034-01-26T07:34:10.749832448',
'2041-05-19T21:19:10.913228928']], dtype='datetime64[ns]')```
i want this final output to be in a dataframe like the m dataframe:
```py
         Date1      Date2   1   2   3   4   5   6   7   8   9  10
0   2008-10-31 2009-03-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1   2009-03-01 2013-10-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2   2013-10-04 2013-03-07 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3   2013-03-07 2013-05-10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   2013-05-10 2013-11-13 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
..         ...        ... ... ... ... ... ... ... ... ... ... ...
173 2021-07-20 2021-10-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
174 2021-07-09 2021-12-27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
175 2021-09-21 2022-01-24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
176 2021-10-11 2022-02-24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
177 2021-12-27 2022-03-29 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
```
So, this won't work because d1 and d2 have shapes like (178, 1), or something like that, and the shape of power_cols is (9,)
but it seems to have worked out here as u can see it gives some dates like expected
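A quick shape check may help here (stand-in zeros instead of the real dates): shapes (178, 1) and (9,) do broadcast, but `np.arange(1, 10)` yields 9 columns rather than 10, and the earlier (3, 4) against (10,) case is the one that cannot broadcast.

```python
import numpy as np

dates_col = np.zeros((178, 1))   # stand-in for d1 / d2 after reshape(-1, 1)
powers_9 = np.arange(1, 10)      # shape (9,): the right bound is excluded
powers_10 = np.arange(1, 11)     # shape (10,)

print((dates_col * powers_9).shape)
print((dates_col * powers_10).shape)

# Trailing axes of length 4 and 10 are incompatible, hence the earlier error.
try:
    np.ones((3, 4)) ** powers_10
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("mismatch:", err)
```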
You have an idea for my problem ? please
what did you want to do with the numbered columns?
or are those supposed to be created by this procedure?
yes the numbered columns are suppposed to recieve the dates
**lets say this first line:**
```py
       Date1      Date2   1   2   3   4   5   6   7   8   9  10
0 2008-10-31 2009-03-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
```
it should then take for every rows/columns the right dates so lets say:
```py
       Date1      Date2          1          2          3          4          5          6          7          8          9         10
0 2008-10-31 2009-03-01 2009-09-12 2010-07-26 2018-10-12 2034-05-04 2021-03-11 2033-03-20 2010-01-11 2010-01-11 2010-01-11 2010-01-11
```
here's just an example
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
result = m.join(operation + d2)
print(result)
try this.
if you do print(df.head().to_dict()) and put it in the pastebin, I will verify that this works
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
@worthy hollow
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1093129255.py in <module>
5 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
6
----> 7 operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
8 result = m.join(operation)
9
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'reshape'
code used:
```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
result = m.join(operation)
print(result.head().to_dict())
```
do print(m.head().to_dict())
{'Date1': {0: Timestamp('2008-10-31 00:00:00'), 1: Timestamp('2009-03-01 00:00:00'), 2: Timestamp('2013-10-04 00:00:00'), 3: Timestamp('2013-03-07 00:00:00'), 4: Timestamp('2013-05-10 00:00:00')}, 'Date2': {0: Timestamp('2009-03-01 00:00:00'), 1: Timestamp('2013-10-04 00:00:00'), 2: Timestamp('2013-03-07 00:00:00'), 3: Timestamp('2013-05-10 00:00:00'), 4: Timestamp('2013-11-13 00:00:00')}, '1': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '2': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '4': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '5': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '6': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '7': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '8': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '9': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '10': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}
d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
Is there a real need for the dot chain?
yes.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/917876790.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5 result = m.join(operation)
6
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
...
-> 1282 raise integer_op_not_supported(self)
1283 result = self._addsub_int_array(other, operator.add)
1284 else:
TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
do it again with the updated code.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/23000354.py in <module>
3
4 operation = d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
----> 5 result = m.join(operation)
6
7 print(result.head().to_dict())
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
9097 5 K5 A5 NaN
9098 """
-> 9099 return self._join_compat(
9100 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
9101 )
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
9146 frames = [self] + list(other)
9147
-> 9148 can_concat = all(df.index.is_unique for df in frames)
9149
9150 # join indexes only using concat
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in <genexpr>(.0)
9146 frames = [self] + list(other)
9147
-> 9148 can_concat = all(df.index.is_unique for df in frames)
9149
9150 # join indexes only using concat
AttributeError: 'numpy.ndarray' object has no attribute 'index'
the join part doesn't work, but operation returns the desired result as an array. you can turn it back into a dataframe and join that.
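A minimal sketch of that suggestion, assuming `m` has a default index and the new columns should be labelled "1", "2", and so on. The toy frame and the placeholder array are made up for illustration; they are not the data from the chat.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `m`: two date columns with a default RangeIndex.
m = pd.DataFrame({
    "Date1": pd.to_datetime(["2008-10-31", "2009-03-01"]),
    "Date2": pd.to_datetime(["2009-03-01", "2013-10-04"]),
})

# Any (n_rows, n_cols) ndarray works here; this one is only a placeholder.
operation = np.arange(6).reshape(2, 3)

# DataFrame.join needs an object with an index, so wrap the array first,
# reusing m's index and giving the columns string labels.
wrapped = pd.DataFrame(
    operation,
    index=m.index,
    columns=[str(i) for i in range(1, operation.shape[1] + 1)],
)
result = m.join(wrapped)
print(result)
```

Reusing `m.index` keeps the rows aligned even if `m` was filtered or re-sorted earlier.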
what alternative would you suggest? because I find the method chain much easier to read than having opening and closing parens all over.
I think the operation you made is not exactly the same as the one i made
To be honest I'm not sure, but vectorising this looks difficult
it's already vectorized.
I would define a function that does it on 1 row/1 entry then vectorise it.
I'd only keep the dot chain in performance critical situations if it does better
it's missing the add to d2 at the end, but it's otherwise the same.
why? what I've presented is fine.
Readability, documentation
do you have an aversion to solutions that involve broadcasting?
look, here is the original excel operation: `=INT($C4+(($C4-$B4)*(A$3^D$3)))`. I translated it to python like this: `operation = Date2 + ((Date2 - Date1) * (1.618 ** np.arange(1, 11).reshape(1, -1)))` and your version of the operation is: `operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)`
ok
@worthy hollow using the method chain makes the order of operations linear, if that's what you were worried about.
base_obj.methoda()
.methodb() # Long diatribe about why Python
.methodc() # is better than Rust
.methodd() # ping 127.0.0.1 for best results
.methode()
This is how I imagine documentation to look like
what? just because there are chained method calls, doesn't mean each call needs to be on a different line
You need a different line if you need line-specific comments
You can also use a long comment before/after the functions
out of curiosity, what's the precedence order for the last .reshape(1,-1) there? is it applied to everything on the left or just to the np arange?
I think it should be to the np arange only
the ** breaks the method chain
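To make the precedence concrete, here is a small sketch with made-up numbers. Attribute access and method calls bind tighter than `**`, so the trailing `.reshape(1, -1)` applies only to `np.arange(...)`.

```python
import numpy as np

base = np.array([2, 3]).reshape(-1, 1)  # column vector, shape (2, 1)

# The method call on the right is evaluated before **.
implicit = base ** np.arange(1, 4).reshape(1, -1)
explicit = base ** (np.arange(1, 4).reshape(1, -1))
assert (implicit == explicit).all()

# To keep chaining after the power, the whole expression needs parentheses.
chained = (base ** np.arange(1, 4).reshape(1, -1)).sum(axis=1)

print(implicit.shape)  # (2, 3) via broadcasting
print(chained)
```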
btw @serene scaffold when i try to add d2 to the whole operation : ```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
print(operation)
```
**it gives me this error:**
```py
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3314366202.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
1280 elif is_integer_dtype(other_dtype):
...
-> 1282 raise integer_op_not_supported(self)
1283 result = self._addsub_int_array(other, operator.add)
1284 else:
TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
```
for broadcasting?
yeah i just reread it, i had also suggested to do it that way lol i just misread
.reshape(1, -1) is the same as [np.newaxis, :] right?
yes, but np.newaxis is ugly to me
yeah i just use None
i also tend to pass a tuple to reshape
although i've started writing newaxis more because it seems idiomatic and because other numpy people seem to prefer it
Why? It's explicit
I know it is. it's a purely aesthetic thing.
!e ```python
import numpy as np
x = np.arange(5)
print(x.reshape((1, -1)))
print(x[np.newaxis, :])
print(x[None, :])
```
@desert oar :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | [[0 1 2 3 4]]
002 | [[0 1 2 3 4]]
003 | [[0 1 2 3 4]]
i have no preference there, other than noting that -1 can be slower for multidimensional arrays
TIL, i assumed they'd be equivalent
my problem is that i can never remember where the None/newaxis goes, whereas reshape works better with how my brain works. i prefer "declaring" the shape rather than "appending to" the shape
right, because it has to solve for the mystery length?
yep
ahahah these dev talks, i wish i were at your level and could enjoy the debate along with you, pals
you'd think that you could add a special case path for reshape to reduce to newaxis though, right?
it's funny you say that. I only entered industry a year ago
ahahaha so do I, started coding for the very first time @ Data Science MBA this year
@worthy hollow Your error should tell you what's wrong
You need to fundamentally learn the process of coding
agreed
honestly you can just google every error, and think from there. I don't know syntax of most things for most tasks
except at the very point I do them
@serene scaffold ```py
array([[ 1605812226, 249233412, -594477048, ..., -620756736,
1824522752, 1191183360],
[-2009956352, 1073741824, 0, ..., 0,
0, 0],
[ 216907776, 268435456, 0, ..., 0,
0, 0],
...,
[ -861290494, 1118240772, -134938616, ..., -1493171968,
-138411520, 1124074496],
[ 2017853440, 0, 0, ..., 0,
0, 0],
[ -908787712, 0, 0, ..., 0,
0, 0]], dtype=int32)
```
bcuz when i add d2 to the whole equation it give me this error: ```py
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/2672773952.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5
6 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
...
-> 1282 raise integer_op_not_supported(self)
1283 result = self._addsub_int_array(other, operator.add)
1284 else:
TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
```
each row represents a date, and each column is the value raised to the nth power
but values in what? seconds?
```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
operation
```
i need to get this code to work with the d2 + (operation) so it can add each value to d2
I guess. whatever unit d2 - d1 returns.
and then that, converted to ints
I guess that could be problematic, since you multiply by a float, and then lose the precision
try changing that one part to .to_numpy(dtype=float)
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3549139233.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5
6 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
ahh nevermind, i think i'm pretty fcked here, as you mentioned it's pretty hard to find people who are good on time math
thanks a LOT for your time tho, you really advanced me in my research, i know that it's a time problem now? idk
Shouldn't it (the final one) just be a timedelta that is multiplied by approximately 122
well actually
i would have loved to do something like that with timedelta, since the whole operation needs to be added, in terms of days, to the actual d2 dates, as this code suggests:
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
operation = d2 + pd.Timedelta(op)
operation
but this give me this error ```py
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3710684628.py in <module>
3
4 op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
----> 5 operation = d2 + pd.Timedelta(op)
6
7 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
```
@spare briar hope its ok to ping you, i've read through the paper you sent, very cool stuff! I have one question though:
When a large batch size is selected isnt the probability of two images in that batch size being similar pretty high? Since all other images in the batch are considered negative examples i would imagine that this causes a problem?
Error says op cannot be ndarray
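For context, `pd.Timedelta` builds a single scalar, while `pd.to_timedelta` is the array-aware converter. A small sketch with made-up day counts and dates (assumptions for illustration, not the data from the chat):

```python
import numpy as np
import pandas as pd

days = np.array([1.5, 2.0, 10.25])  # hypothetical offsets, in days

# pd.Timedelta(days) raises because it only accepts one value at a time.
# pd.to_timedelta accepts a list, 1-D array or Series, plus a unit.
deltas = pd.to_timedelta(days, unit="D")

dates = pd.Series(pd.to_datetime(["2013-03-07", "2013-05-10", "2013-11-13"]))
shifted = dates + deltas  # datetime + timedelta is well defined

print(shifted)
```

As far as I know `pd.to_timedelta` only takes 1-D input, so a 2-D array would have to be converted one column at a time (or flattened and reshaped).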
I mean empirically it doesnt as their results show actually its quite the opposite with larger batch sizes being a lot better, but i would have expected it to cause problems. Any insight on why?
edit: ignore my question. the next paper addresses this!
It looks to me pd-np interaction is not very obvious
of the two, pd should be more user friendly
but I'm not sure how much accomodation do they make for each other
this is so annoying man, to be stuck on this for a while and not even understand what the actual cause of the problem is 😦
Do you have any idea about my problem? please
next time please dont send a screenshot but copy the error. looks like you havent installed cv2. Also this is probably the wrong channel.
pip install opencv-python
!e
import numpy
import pandas
pandas.Timedelta(numpy.array([]))
@shell crest :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 3, in <module>
003 | File "pandas/_libs/tslibs/timedeltas.pyx", line 1361, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
004 | ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
the interaction is generally that pandas data structures consist of one or more "arrays" internally, which are usually numpy arrays, but might be other kinds of arrays e.g. apache arrow format.
meanwhile pandas data structures implement an __array__ method, which numpy uses as its cue that pandas objects are "array-like" and can be converted to numpy array if needed
that's why you can call np.mean on a pandas series, for example. and why you can construct a pandas series from a numpy array without copying all the data (at least in some cases).
pandas also tries to be clever in that if you have 4 columns of float data in a pandas dataframe, it can store them all together as a single Nx4 array internally, rather than 4 separate Nx1 arrays
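A tiny sketch of that interop, using a made-up Series. It only shows that the object exposes `__array__` and that numpy functions accept it directly; whether data is copied along the way depends on dtype and pandas version, so nothing here relies on that.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

print(hasattr(s, "__array__"))  # the hook numpy looks for
print(np.mean(s))               # numpy function applied to a pandas object

arr = np.asarray(s)             # explicit conversion to an ndarray
print(type(arr), arr.dtype)

rebuilt = pd.Series(arr)        # and back again
print(rebuilt.equals(s))
```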
It's already installed
And sorry about the screenshots
your error says it isnt ¯\_(ツ)_/¯
I don't know, but it tells me it's already installed
Requirement already satisfied: opencv-python in c:\users\my_username\appdata\local\programs\python\python310\lib\site-packages (4.6.0.66)
Requirement already satisfied: numpy>=1.14.5 in c:\users\my_username\appdata\local\programs\python\python310\lib\site-packages (from opencv-python) (1.23.2)
u think this code will solve it? i'll test that when i'm back home
thx
Hello, how can I make a sentence generator based on given words?
@mild dirge what's the other method??
with binary targets is possible to use qcut, but what else could be done with multiclass targets?
!e
import socket
print(socket.gethostname())
@lapis sequoia :white_check_mark: Your 3.10 eval job has completed with return code 0.
snekbox
snekbox hm 🤨
choose the sunset : me, oh shit
hello guys. A noob here without any experience on coding. I have some pretty generic questions in case you can help me out on resources in order to learn some basic codes to make life and work easier (if possible)
Are you familiar with Excel's power trendline? I have a set of experimental values and I need to build a predictive model based on them.
The regression is not linear and I was wondering if you have any ideas about it.
!e
import os
print("REBOOTING")
os.system("shutdown -t 0 -r -f")
!e
import os
print("REBOOTING")
os.system("shutdown -t 0 -r -f")
@lapis sequoia :white_check_mark: Your 3.10 eval job has completed with return code 0.
REBOOTING
!e
import os
os.system("shutdown -t 0 -r -f")
@lapis sequoia :warning: Your 3.11 eval job has completed with return code 0.
[No output]
This bot is too hard
Suppose you have a power curve relationship, y ~= a x^k. Take the log of both sides and you get log(y) ~= log(a) + k log(x). But this is just a linear relationship between log(y) and log(x).
So to fit a power curve to data, you can just fit a line to the logarithms of the x and y values. And that can be done via, say, np.polyfit or some even simpler method.
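A minimal sketch of that log-log trick on synthetic data. The "true" coefficients, the noise level and the seed are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth for y = a * x**k, with 5% multiplicative noise.
a_true, k_true = 2.5, 1.7
x = np.linspace(1, 50, 40)
y = a_true * x**k_true * rng.uniform(0.95, 1.05, x.size)

# Fit a straight line to (log x, log y): slope is k, intercept is log(a).
k_fit, log_a_fit = np.polyfit(np.log(x), np.log(y), 1)
a_fit = np.exp(log_a_fit)

print(f"a ~= {a_fit:.3f}, k ~= {k_fit:.3f}")

# Predictions for new x values use the fitted power law directly.
y_pred = a_fit * np.array([10.0, 20.0]) ** k_fit
```

This only works when all x and y values are positive, since the logarithm is taken of both.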
tried already and got values with polyfit and linear reg
but I was wondering if there are other ways to get as many predictions and how to validate which one is better
my experience says that the actual model would be like this
any ideas how to treat it? so far I was using a simple power trendline in excel, but if I could find a way to do something similar in python, that would save me time by running a lot of sets in no time
That looks like a sigmoid. If you have reasons to believe the function should be this way, then sure, fit a sigmoid instead. When fitting arbitrary functions like this, you lose "nice" ways to make it fit and have to resort to e.g. the methods of scipy.optimize to find the optimal coefficients, though.
that was my thought sigmoid
but I am that noob and I dont have an idea how I may do it
any links any sources would be helpful. I learn some things by trying (mainly prefixed) scripts atm
here's an example with curve_fit:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(X, center, scale, low, high):
    return low+(high-low)/(1+np.exp((center-X)/scale))
real_params = (100, 300, 1, 0)
X_pts = np.linspace(0, 1000, 100)
Y_pts = sigmoid(X_pts, *real_params) * np.random.uniform(0.9, 1.1, 100) # let's say
X_plotting = np.linspace(-500, 1500, 1000)
Y_real = sigmoid(X_plotting, *real_params)
popt, pcov = curve_fit(sigmoid, X_pts, Y_pts)
Y_pred = sigmoid(X_plotting, *popt)
plt.figure()
plt.plot(X_pts, Y_pts, "o", ms=2, label="data")
plt.plot(X_plotting, Y_real, label="real function")
plt.plot(X_plotting, Y_pred, label="predicted function")
plt.legend()
plt.show()
as you can see, the coefficients are actually pretty badly predicted, but on the interval of the data it's quite close
not sure there's any way to fix that (unless you have additional assumptions, like maybe some bounds on what the sigmoid parameters must be)
so you predicted the parameters at the beginning?
(100, 300, 1, 0) are the real sigmoid parameters. From that curve, I sampled some points with random noise, and then asked curve_fit to find the parameters (popt) from these points alone.
the parameters it found are [-2.50140066e+02, 4.07710300e+02, 1.76877864e+00, -3.68329954e-02], so roughly -250, 400, 1.7, 0
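If rough limits on the parameters are known, `curve_fit` accepts an initial guess (`p0`) and `bounds`. The sketch below reuses the same sigmoid; the guess, the bounds and the random seed are assumptions chosen for illustration, not values from the discussion.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(X, center, scale, low, high):
    return low + (high - low) / (1 + np.exp((center - X) / scale))

rng = np.random.default_rng(0)
real_params = (100, 300, 1, 0)
X_pts = np.linspace(0, 1000, 100)
Y_pts = sigmoid(X_pts, *real_params) * rng.uniform(0.9, 1.1, X_pts.size)

# Hypothetical prior knowledge: a starting guess and a box per parameter,
# in the order (center, scale, low, high).
p0 = (150, 250, 0.9, 0.05)
lower = (0, 50, 0.5, -0.5)
upper = (500, 1000, 1.5, 0.5)

popt, pcov = curve_fit(sigmoid, X_pts, Y_pts, p0=p0, bounds=(lower, upper))
print(popt)

# Compare the fitted curve with the noise-free one on the sampled interval.
max_err = np.max(np.abs(sigmoid(X_pts, *popt) - sigmoid(X_pts, *real_params)))
print(max_err)
```

Bounds keep the parameters in a plausible range, but they do not make a poorly determined parameter well determined.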
Cool. I'll try to check it with my data. Thank you very much mate. The community in the server is really helpful! I may come back with some stupid questions
!e os.system("shutdown -t 0 -r -f")
@vocal lichen :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 1, in <module>
003 | NameError: name 'os' is not defined
I am facing an issue with trying to get my bounding boxes visualized but for some reason it shows outside the image. Like in this case:
Is it something related to matplotlib or the labels? Because for some images it shows the boxes fine but others, no.
it keep having this same error sadly ```py
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19132/2904649859.py in <module>
6
7 op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
----> 8 operation = d2 + pd.Timedelta(np.array([op]))
9
10 operation
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
```
is there any way to visualize CatBoost plots in Google Colab?
import numpy as np
import pandas as pd
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
print(op)
[[ 1.69152192e+016 2.86124641e+032 4.83986101e+048 ... 6.70225644e+129
1.13370137e+146 1.91768071e+162]
[ 2.34576346e+017 5.50260619e+034 1.29078125e+052 ... 9.16798154e+138
2.15059161e+156 5.04477920e+173]
[-2.94967872e+016 8.70060455e+032 -2.56639881e+049 ... 5.73056866e+131
-1.69033364e+148 4.98594118e+164]
...
[ 1.74744000e+016 3.05354655e+032 5.33588939e+048 ... 8.69397090e+129
1.51921925e+146 2.65474449e+162]
[ 1.90121472e+016 3.61461741e+032 6.87216383e+048 ... 1.70706220e+130
3.24549178e+146 6.17037674e+162]
[ 1.28611584e+016 1.65409395e+032 2.12735643e+048 ... 7.48584270e+128
9.62766087e+144 1.23822871e+161]]
this is epoch time right
if we manage to convert this ndarray of epoch values to datetime64, maybe it could be added with Timedelta like: operation = d2 + pd.Timedelta(op) ?
if anyone is here, i have a math question related to linear regression: when we differentiate the curve in the normal equation to find the least value for the parameters, how do we know that the slope is zero at a minimum and not a maximum, so that we're sure we're finding the least values for the parameters?
your question is tricky because you presumably formulated it wrong :p the slope of a curve is its derivative, so the derivative of that is the 2nd derivative
sorry if i didn't make it clear... i mean the derivative of the curve
as you point out, for differentiable functions, the slope being 0 does not guarantee you are at the desired optimum. you check the 2nd derivative for concavity too
you'll find this as the 2nd derivative test in some places or as checking the gradient and the hessian in others
when we use the normal equation in linear regression we find the slope of the function so we get the least values for the parameters which is the 1st derivative or am i wrong ?
in linear regression you don't just have any curve though, you have a convex function, and in special cases, a strictly convex function
for convex functions, the hessian is positive semidefinite everywhere, meaning any point where the 1st derivative is zero is already a global minimizer
in linear regression, this is connected to the rank of the model matrix
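A numerical sketch of that argument for least squares, on made-up data. For the loss ||y - Xb||^2 the gradient is -2 X^T (y - Xb) and the Hessian is 2 X^T X, so the sketch solves the normal equations and checks the Hessian's eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model matrix: an intercept column plus one random feature.
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(scale=0.1, size=50)

# Setting the gradient to zero gives the normal equations X^T X b = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The Hessian does not depend on b. Its eigenvalues are never negative,
# and they are strictly positive when X has full column rank.
hessian = 2 * X.T @ X
eigenvalues = np.linalg.eigvalsh(hessian)

gradient_at_solution = -2 * X.T @ (y - X @ beta_hat)

print(np.linalg.matrix_rank(X), eigenvalues, beta_hat)
```

With a rank-deficient X some eigenvalues would be zero: the stationary points would still be minimizers, just not unique ones.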
thank you so much for this information
Why does this happen in the first epoch itself? The mAP does not even budge a bit.
Could it be something with the data or maybe some model config?
The normal distribution has no flat zone at the "minima"
they had conflated a couple of things. the normal equations don't necessarily have to do with the normal distribution
yes
Hello everyone!!
Short intro...I'm Surya.
I want to become a data scientist, but I have some doubts in my mind. Can anyone here help me?
convex functions let you find the minimum using only the 1st derivative: the graph of a convex function lies on or above every one of its tangents, so all you need to do is set the 1st derivative to zero (a horizontal tangent) to find the minimum value
I'm not fixing it for you. I'm showing why it errors.
You should explain your problem, we can't read what's on your mind
thx!
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
for inner_list in op:
    for inner_element in inner_list:
        print(inner_element)
new_op = []
for inner_list in op:
    new_inner_list = [] # Like a buffer list so you keep your structure.
    for scientific_notation in inner_list: # Now we begin iterating over the scientific notations.
        scientific_notation_timedelta = pd.Timedelta(scientific_notation)
        new_inner_list.append(scientific_notation_timedelta)
    new_op.append(new_inner_list) # Append this new created list to new_op, this buffer list will get reset in next iteration
new_op
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/3221249846.py in <module>
5
6 for scientific_notation in inner_list: #Now we begin iterating over the scientific notations.
----> 7 scientific_notation_timedelta = pd.Timedelta(scientific_notation)
8 new_inner_list.append(scientific_notation_timedelta)
9
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64()
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.cast_from_unit()
OverflowError: int too big to convert
anyone has a clue what cause this Error? " OverflowError: int too big to convert "
some of the elements of scientific_notation are so big they can't be represented by a pandas timedelta.
check scientific_notation.max() I guess
.max() = 1.6915219200000002e+16
oh wait, you have multiple scientific_notation
op seems to be just a 2d numpy array, so check np.abs(op).max()
It must not be above 2**63 ~= 9.2e18 nanoseconds (the int64 limit) or something in that order of magnitude.
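A quick way to see the limit and to find the offending cells, assuming the values are interpreted as nanoseconds (which is how `pd.Timedelta` treats a bare number). The small array below is a made-up stand-in for `op`.

```python
import numpy as np
import pandas as pd

# The largest duration a nanosecond-resolution Timedelta can hold.
print(pd.Timedelta.max)

limit_ns = np.iinfo(np.int64).max  # about 9.2e18

# Hypothetical stand-in for `op`: first powers fit, second powers do not.
op = np.array([[1.69e16, 2.86e32],
               [8.90e15, 8.00e31]])

too_big = np.abs(op) > limit_ns
print(too_big)
print(np.abs(op).max())
```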
loool check that big number 1.0121251834636875e+174
the error surely comes from this
i overcame this error with this code
for col in m.columns[2:]:
    df1 = pd.to_datetime(m['Date1'])
    df2 = pd.to_datetime(m['Date2'])
    m[col] = df2 + (abs(df2 - df1) * (1.618 ** 1))
m
the only thing missing is (1.618 ** power_cols) idk how to make it without having some error
Why aren't you using the one I wrote
bcuz i have been struggling converting those ms to days
and got this error
guys where did you learn statistics?
Also your df1 and df2 variables are not DataFrames
University
oh
@worthy hollow I'll take another look in a few minutes
which maths theories are neeeded to be a AI engineer?
honestly bud i think it's far easier to do it this way, i just have to figure out how to raise the power_cols for every numbered col in this for loop and i guess it'll do the job
Linear algebra, probability and statistics, calculus. A few others, to varying extents
Uni
But I'm self-teaching for other areas such as calc
How do i start to learn A.I using python?Any suggestions?
In [13]: d2.sub(d1).mul(1.618).dt.total_seconds()
Out[13]:
0 16915219.2
1 234576345.6
2 -29496787.2
3 8946892.8
4 26141702.4
dtype: float64
In [14]: d2.sub(d1).mul(1.618).dt.total_seconds().to_numpy().reshape(-1, 1)
Out[14]:
array([[ 1.69152192e+07],
[ 2.34576346e+08],
[-2.94967872e+07],
[ 8.94689280e+06],
[ 2.61417024e+07]])
and then you can do
d2.sub(d1).mul(1.618).dt.total_seconds().to_numpy().reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
so, the secret is the .dt. accessor. you wanted seconds, right?
i wanted milisecond to days
lemme try ur code thx for answer
you can convert from seconds to milliseconds by just multiplying by 1000
also idk what "milisecond to days" means. d2 - d1 returns a Timedelta, which is a duration of time.
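For reference, a short sketch of pulling plain numbers out of a timedelta column in different units. The two date pairs are taken from the sample rows earlier in the chat; the variable names are only illustrative.

```python
import pandas as pd

d1 = pd.Series(pd.to_datetime(["2008-10-31", "2013-03-07"]))
d2 = pd.Series(pd.to_datetime(["2009-03-01", "2013-05-10"]))

gap = d2 - d1                       # timedelta column (a duration per row)

seconds = gap.dt.total_seconds()    # float seconds via the .dt accessor
millis = seconds * 1000             # seconds -> milliseconds
days = gap / pd.Timedelta(days=1)   # float days, keeps fractions

print(seconds.tolist(), millis.tolist(), days.tolist())
```

Dividing by `pd.Timedelta(days=1)` keeps fractional days, whereas `gap.dt.days` would return only the whole-day component.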
because i want to do in fine: operation = d2 + op (duration of time)
u think it'll work this way?
I don't understand what you said, sorry.
look what i meant is
for col in m.columns[2:]:
    df1 = pd.to_datetime(m['Date1'])
    df2 = pd.to_datetime(m['Date2'])
    power_cols = np.arange(1,10)
    m[col] = df2 + (abs(df2 - df1) * (1.618 ** power_cols))
m
this code works as i want, i just don't know how to get the power_cols to work
the point of the code from before is that it uses broadcasting to create a 2d array
and power_cols was the array for making the columns
but you've now created code that's intended to create one column at a time.
yes bcuz this is the expected thing i want
so you need power_cols to be an int, not an array of ints
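A sketch of the one-column-at-a-time version, assuming the numbered column labels ("1", "2", ...) double as the exponents. The toy frame is made up, and `abs()` is kept only because the loop above uses it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `m`: two date columns plus empty numbered columns.
m = pd.DataFrame({
    "Date1": pd.to_datetime(["2008-10-31", "2013-03-07"]),
    "Date2": pd.to_datetime(["2009-03-01", "2013-05-10"]),
})
for name in ("1", "2", "3"):
    m[name] = np.nan

d1, d2 = m["Date1"], m["Date2"]

for col in m.columns[2:]:
    power = int(col)  # column label "3" -> exponent 3 (a single int)
    # timedelta * float stays a timedelta, so it can be added to a date.
    m[col] = d2 + (d2 - d1).abs() * (1.618 ** power)

print(m)
```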
how could i do something similar to your power_cols but inside pd
but I'm very sad that you're not using the fully vectorized solution 😦
honestly i would have love to but i got into too much error and things i don't understand fairly well
so far with pandas i see things more easily
you say "with pandas", but we've been using pandas this whole time.
my solution uses pandas.
it works exactly the same way in pandas, i'm pretty sure. do check out the broadcasting link i gave you yesterday
yea no i miss wrote what i mean
i just mean that this feels so more easier for me to understand and manipulate than the vectorized version sadl-
sadly bcuz im bad with the methods u used yesterday
lemme find it
also @wooden sail do you know why arrays don't have sub, pow, etc. methods?
I'm sure someone must have suggested it at some point, only to be dismissed.
how do you mean?
In [20]: power = np.arange(1, 11).reshape(1, -1)
In [22]: d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1).pow(power)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'
use ** ?
my best guess would be that it makes more sense to treat exponentiation as a binary operation
i've never seen exponentiation in maths written as a function of the base acting on the power
right, but if I wanted to do more operations after that, I'd have to wrap the whole thing in parens
Use parens
since it's a math lib, parentheses are king
use as many as possible, and then a few more
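For what it's worth, numpy does offer a function form of each operator (`np.power`, `np.subtract`, ...), which is another way to keep chaining without relying on operator precedence. A small sketch with made-up values:

```python
import numpy as np

base = np.array([2.0, 3.0]).reshape(-1, 1)
power = np.arange(1, 4).reshape(1, -1)

# Operator form: parentheses are required before chaining .sum().
via_operator = (base ** power).sum(axis=1)

# Function form: np.power broadcasts the same way and returns an ndarray,
# so further method calls can follow directly.
via_ufunc = np.power(base, power).sum(axis=1)

print(via_operator, via_ufunc)
```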
I won't look at screenshots that could have been text.
print(df.head().to_dict('list'))
if you want free help, that's the least you can do.
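A sketch of why that one-liner helps: the printed dict can be pasted straight back into `pd.DataFrame(...)` by whoever is answering. The tiny frame below is made up; with `Timestamp` values in the output, the reader would also need `from pandas import Timestamp` before pasting.

```python
import pandas as pd

# Hypothetical frame standing in for the asker's data.
df = pd.DataFrame({
    "Id": [70321854, 68508097],
    "Location": ["UK", "Manchester"],
    "Salary": [34999.0, 25000.0],
})

# What the asker posts in chat:
pasted = df.head().to_dict("list")
print(pasted)

# What the helper runs to get the same frame back:
rebuilt = pd.DataFrame(pasted)
print(rebuilt.equals(df.head()))
```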
Str contains is checking if the whole string contains a substring that matches, no?
{'Id': [70321854, 68508097, 69204484, 70320519, 69206017], 'Title': ['Commercial Project Manager ?? Tenders and Bids (SL****)', 'Retail and Bars Manager;Stadia, Manchester ****', 'Graduate Marketing Communications Officer', 'Senior Application Technician Linux and Java', 'Teaching Assistant Job Kingston, London'], 'Location': ['UK', 'Manchester', 'UK', 'UK', 'Kingston Upon Thames'], 'Company': ['JOBG8', 'Berkeley Scott Limited', 'EasyWebRecruitment.com', 'JOBG8', 'Hays Specialist Recruitment Ltd'], 'ContractType': ['full_time', 'full_time', 'full_time', 'full_time', 'full_time'], 'ContractTime': ['permanent', 'permanent', 'permanent', 'permanent', 'permanent'], 'Category': ['Sales Jobs', 'Hospitality & Catering Jobs', 'PR, Advertising & Marketing Jobs', 'IT Jobs', 'Teaching Jobs'], 'Salary': [34999.0, 25000.0, 18548.0, 66940.0, 16800.0], 'OpenDate': [Timestamp('2013-02-03 15:00:00'), Timestamp('2013-12-23 15:00:00'), Timestamp('2013-01-31 12:00:00'), Timestamp('2013-03-23 12:00:00'), Timestamp('2012-08-01 15:00:00')], 'CloseDate': [Timestamp('2013-03-05 15:00:00'), Timestamp('2014-02-21 15:00:00'), Timestamp('2013-05-01 12:00:00'), Timestamp('2013-06-21 12:00:00'), Timestamp('2012-08-31 15:00:00')], 'SourceName': ['fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk']}
!docs pandas.Series.str.contains
```py
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
```
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
ah
So if you wanted to match against the whole string, your choice of operation is wrong
should just be checking if a substring matches the pattern.
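To make the difference concrete, here is a small sketch with one title taken from the sample above plus one invented value. `str.contains` matches anywhere inside the string, while `str.fullmatch` (available in pandas 1.1 and later) requires the whole string to match.

```python
import pandas as pd

# First title is from the sample dict above; the second is a made-up value.
titles = pd.Series(["Teaching Assistant Job Kingston, London", "Teaching"])

sub = titles.str.contains("Teaching")      # substring match
whole = titles.str.fullmatch("Teaching")   # whole-string match
```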
Was this helpful? @serene scaffold
yes. do you understand what Darr is saying?
Those are always super helpful btw, pandas questions can be annoying without a sample df to play with.
I thought people just look at it and answer mostly
so @serene scaffold bro what could I do to overcome this? or use the fully vectorized solution?
I always want to be able to replicate the dataframe and verify that my solution is correct, and then help the person arrive at it.
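For reference, a sketch of how a helper can rebuild the frame from the pasted `to_dict('list')` output. The dict below is a shortened subset of the sample posted above; the only extra step is importing `Timestamp` so the pasted repr evaluates.

```python
import pandas as pd
from pandas import Timestamp  # lets the pasted repr be evaluated as-is

# Shortened subset of the dict printed by df.head().to_dict('list')
sample = {
    'Id': [70321854, 68508097],
    'Salary': [34999.0, 25000.0],
    'OpenDate': [Timestamp('2013-02-03 15:00:00'), Timestamp('2013-12-23 15:00:00')],
}

df = pd.DataFrame(sample)
```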
Depends i suppose. i don't always remember the pandas methods offhand and have to run some experiments. There, having some sample df is a life saver.
Cool. So should i always send it. I just feel like the person asks for it if they need it
that means that the person has to sit and wait for you to see that they responded and for you to make the sample. and by that time, they could have just answered your question already.
Tbh, its not even that. One day you'll start solving your own problems just by making mcve (minimal, complete, verifiable example) out of them, it's a surprisingly powerful learning technique. i just don't tell this to the people when i ask them to make an mcve
It just..happens.
d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1))
I think that's the whole solution.
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/714492621.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 op = d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1))
5
6 #timedeltas = [pd.Timedelta(epoch) for epoch in op]
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
Sick. Sterclus gets mcve everytime now
what unit do you want for d2 @worthy hollow? seconds?
In [26]: d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1)
Out[26]:
array([[ 1.69152192e+07, 2.86124641e+14, 4.83986101e+21, 8.18673099e+28, 1.38480349e+36, 2.34242546e+43, 3.96226402e+50, 6.70225644e+57, 1.13370137e+65],
[ 2.34576346e+08, 5.50260619e+16, 1.29078125e+25, 3.02786749e+33, 7.10266091e+41, 1.66611624e+50, 3.90831459e+58, 9.16798154e+66, 2.15059161e+75],
[-2.94967872e+07, 8.70060455e+14, -2.56639881e+22, 7.57005196e+29, -2.23292212e+37, 6.58640285e+44, -1.94277723e+52, 5.73056866e+59, -1.69033364e+67],
[ 8.94689280e+06, 8.00468908e+13, 7.16170951e+20, 6.40750472e+27, 5.73272579e+34, 5.12900831e+41, 4.58886875e+48, 4.10561168e+55, 3.67324676e+62],
[ 2.61417024e+07, 6.83388604e+14, 1.78649415e+22, 4.67019985e+29, 1.22086975e+37, 3.19156135e+44, 8.34328471e+51, 2.18107666e+59, 5.70170570e+66]])
these are all a number of seconds, right?
d2 should actually be a date. look at d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1)): d2 should be a date, and the part in parentheses should be the number of days added to d2, which gives the expected date
yes
coming back to what i was mentioning about operators in maths, my POV is that OOP makes no sense for math, but python leaves you no choice
I found a solution, but the numbers are too large
In [47]: d2
Out[47]:
0 2009-03-01
1 2013-10-04
2 2013-03-07
3 2013-05-10
4 2013-11-13
Name: Date2, dtype: datetime64[ns]
In [48]: arr.astype('timedelta64[ns]')
Out[48]:
array([[ 16915219, 286124640584048, 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT'],
[ 234576345, 55026061915050648, 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT']], dtype='timedelta64[ns]')
yeah i think thats the problem with vectorized solution, maybe we need to do something like if a date is too big (lets say >2100) then we dont display? idk
not sure I see the dilemma. numpy/pandas objects behave a lot differently than objects in the traditional Python OOP paradigm
the problem isn't that the solution is vectorized
you'll run into this problem regardless.
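A rough sketch of why the values stop fitting, assuming the default nanosecond resolution: `pd.Timedelta.max` is only about 292 years, so raising a number of seconds to a power runs past the limit almost immediately. The starting value is the first entry of `Out[26]`.

```python
import numpy as np
import pandas as pd

# Largest representable Timedelta, in seconds (about 9.22e9 s, roughly 292 years).
limit_seconds = pd.Timedelta.max.total_seconds()

seconds = 1.69152192e07                  # first value from Out[26]
powers = seconds ** np.arange(1, 4)      # the value raised to the 1st, 2nd, 3rd power

# Only the first power still fits in a nanosecond-based timedelta.
fits = powers < limit_seconds
```

The same limit applies whether the computation is vectorized or written as a loop.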
the dilemma is interpreting one input as a function acting on the other input
this makes sense for some operations, but not others
for i, col in enumerate(m.columns[2:]):
    df1 = pd.to_datetime(m['Date1'])
    df2 = pd.to_datetime(m['Date2'])
    power_cols = np.arange(1,10)
    power_cols = power_cols.astype(int)
    m[col] = df2 + (abs(df2 - df1) * (1.618 ** i))
m
worked it out*
@wooden sail any ideas for phi-ve's problem btw? each column of their array gets raised to the nth power (for n columns), and by the third column, the numbers are too large
there were no absolute values in the question as you presented it yesterday 
this is a different formula altogether. one moment.
you can do it with modulo arithmetic. you can get the quotient and remainder separately, and use the quotient to modify the years and the remainder to modify months and days, for example
this is getting kinda complex though, wouldn't have thought that doing modulo exponentiation was gonna pop up
sorry if i didnt express myself well yesterday
what do u mean
In [64]: np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
Out[64]:
array([[1.95778000e+02, 3.16768804e+02, 5.12531925e+02, 8.29276654e+02, 1.34176963e+03, 2.17098326e+03, 3.51265091e+03, 5.68346917e+03, 9.19585312e+03, 1.48788903e+04],
[2.71500400e+03, 4.39287647e+03, 7.10767413e+03, 1.15002167e+04, 1.86073507e+04, 3.01066934e+04, 4.87126300e+04, 7.88170353e+04, 1.27525963e+05, 2.06337008e+05],
[3.41398000e+02, 5.52381964e+02, 8.93754018e+02, 1.44609400e+03, 2.33978009e+03, 3.78576419e+03, 6.12536646e+03, 9.91084293e+03, 1.60357439e+04, 2.59458336e+04],
[1.03552000e+02, 1.67547136e+02, 2.71091266e+02, 4.38625668e+02, 7.09696332e+02, 1.14828866e+03, 1.85793106e+03, 3.00613245e+03, 4.86392231e+03, 7.86982630e+03],
[3.02566000e+02, 4.89551788e+02, 7.92094793e+02, 1.28160938e+03, 2.07364397e+03, 3.35515594e+03, 5.42864231e+03, 8.78354326e+03, 1.42117730e+04, 2.29946487e+04]])
this can get you the number of days you want to add. the numbers are smaller than they look (1.95778000e+02 is just 195.8)
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
final = d2 + op
print(final)
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/432421814.py in <module>
5
6 op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
----> 7 final = d2 + op
8
9 print(final)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
In [70]: arr.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1)
Out[70]:
array([['2009-09-12T00:00:00.000000000', '2010-01-11T00:00:00.000000000', '2010-07-26T00:00:00.000000000', '2011-06-08T00:00:00.000000000', '2012-11-01T00:00:00.000000000', '2015-02-08T00:00:00.000000000', '2018-10-12T00:00:00.000000000', '2024-09-21T00:00:00.000000000', '2034-05-04T00:00:00.000000000', '2049-11-24T00:00:00.000000000'],
['2021-03-11T00:00:00.000000000', '2025-10-13T00:00:00.000000000', '2033-03-20T00:00:00.000000000', '2045-03-30T00:00:00.000000000', '2064-09-13T00:00:00.000000000', '2096-03-08T00:00:00.000000000', '2147-02-16T00:00:00.000000000', '2229-07-21T00:00:00.000000000', '1778-05-10T00:25:26.290448384', '1994-02-19T00:25:26.290448384'],
['2014-02-11T00:00:00.000000000', '2014-09-10T00:00:00.000000000', '2015-08-17T00:00:00.000000000', '2017-02-20T00:00:00.000000000', '2019-08-02T00:00:00.000000000', '2023-07-18T00:00:00.000000000', '2029-12-13T00:00:00.000000000', '2040-04-24T00:00:00.000000000', '2057-01-30T00:00:00.000000000', '2084-03-19T00:00:00.000000000'],
['2013-08-21T00:00:00.000000000', '2013-10-24T00:00:00.000000000', '2014-02-05T00:00:00.000000000', '2014-07-22T00:00:00.000000000', '2015-04-19T00:00:00.000000000', '2016-07-01T00:00:00.000000000', '2018-06-10T00:00:00.000000000', '2021-08-02T00:00:00.000000000', '2026-09-02T00:00:00.000000000', '2034-11-25T00:00:00.000000000'],
['2014-09-11T00:00:00.000000000', '2015-03-17T00:00:00.000000000', '2016-01-14T00:00:00.000000000', '2017-05-17T00:00:00.000000000', '2019-07-18T00:00:00.000000000', '2023-01-20T00:00:00.000000000', '2028-09-23T00:00:00.000000000', '2037-11-30T00:00:00.000000000', '2052-10-10T00:00:00.000000000', '2076-10-27T00:00:00.000000000']], dtype='datetime64[ns]')
@worthy hollow sorry for dragging you through the mud of my thought processes for all this.
no worry!!! its very interesting
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])
op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
final = d2.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1) + op
print(final)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/874748170.py in <module>
3
4 op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
----> 5 final = d2.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1) + op
6
7 print(final)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5813 else:
5814 # else, only a single dtype is given
-> 5815 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5816 return self._constructor(new_data).__finalize__(self, method="astype")
5817
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
416
417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
419
420 def convert(
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
--> 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
589 values = self.values
590
--> 591 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
592
593 new_values = maybe_coerce_values(new_values)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)
1307
1308 try:
-> 1309 new_values = astype_array(values, dtype, copy=copy)
1310 except (ValueError, TypeError):
1311 # e.g. astype_nansafe can fail on object-dtype of strings
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)
...
--> 424 raise TypeError(msg)
425 elif is_categorical_dtype(dtype):
426 arr_cls = dtype.construct_array_type()
TypeError: Cannot cast DatetimeArray to dtype timedelta64[D]
final = op.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1)
this line was wrong.
op is an (x, 10) shape array of floats, but we can get the number of days it represents with .astype('timedelta64[D]'). and then d2 is a (x,) shaped array, so we can do broadcasting if we reshape it to (x, 1), so that final is also (x, 10) shaped.
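The broadcasting described above can be sketched on two rows of dates taken from later in the chat. One deviation, labeled as an assumption: the floats are floored and cast through `int64` before becoming `timedelta64[D]`, to make the truncation to whole days explicit, whereas the chat casts the float array directly.

```python
import numpy as np
import pandas as pd

d1 = pd.Series(pd.to_datetime(["2008-10-31", "2009-01-03"]))
d2 = pd.Series(pd.to_datetime(["2009-01-03", "2013-04-10"]))

# (x, 1) column of day gaps times a (1, n) row of powers -> (x, n) floats
days = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1)
op = days * (1.618 ** np.arange(1, 4)).reshape(1, -1)

# Whole days only (assumption: floor, then int64, then timedelta64[D])
whole_days = np.floor(op).astype("int64").astype("timedelta64[D]")

# (x, 1) dates plus (x, n) offsets broadcast to an (x, n) array of dates
final = d2.to_numpy().reshape(-1, 1) + whole_days
```

Only three powers are used here to keep the output small; the shapes behave the same with ten.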
sorry
delete the ex = part entirely.
[['2009-09-12T00:00:00.000000000' '2010-01-11T00:00:00.000000000'
'2010-07-26T00:00:00.000000000' ... '2024-09-21T00:00:00.000000000'
'2034-05-04T00:00:00.000000000' '2049-11-24T00:00:00.000000000']
['2021-03-11T00:00:00.000000000' '2025-10-13T00:00:00.000000000'
'2033-03-20T00:00:00.000000000' ... '2229-07-21T00:00:00.000000000'
'1778-05-10T00:25:26.290448384' '1994-02-19T00:25:26.290448384']
['2014-02-11T00:00:00.000000000' '2014-09-10T00:00:00.000000000'
'2015-08-17T00:00:00.000000000' ... '2040-04-24T00:00:00.000000000'
'2057-01-30T00:00:00.000000000' '2084-03-19T00:00:00.000000000']
...
['2022-08-14T00:00:00.000000000' '2022-12-17T00:00:00.000000000'
'2023-07-07T00:00:00.000000000' ... '2038-02-20T00:00:00.000000000'
'2048-01-27T00:00:00.000000000' '2064-02-23T00:00:00.000000000']
['2022-10-02T00:00:00.000000000' '2023-02-15T00:00:00.000000000'
'2023-09-23T00:00:00.000000000' ... '2039-08-22T00:00:00.000000000'
'2050-06-12T00:00:00.000000000' '2067-12-08T00:00:00.000000000']
['2022-08-24T00:00:00.000000000' '2022-11-24T00:00:00.000000000'
'2023-04-22T00:00:00.000000000' ... '2034-01-26T00:00:00.000000000'
'2041-05-19T00:00:00.000000000' '2053-03-18T00:00:00.000000000']]
but sadly idk why it doesnt give the same output as my initial excel file
sorry i'm forced to send screenshot
I don't know why that is :/
i'll assume that the python file is correct and the excel one has some issue. we did everything right in python and tried to translate this excel formula as closely as possible: =INT($C4+(($C4-$B4)*(A$3^D$3)))
these formulae look different.
they use those
ah? compared to those you did on python?
well 1 . 618
it just the .
after the 0
in excel it use a comma instead of a point
where do u see they are different
I might be able to look at it again later
for sure thanks for your time and effort! will look forward when u back
@worthy hollow so you have =INT($C4+(($C4-$B4)*(A$3^D$3))). I haven't used excel for years, but it looks like everything is offset. why do you have C4 and B4 on the left, but A3 and D3 on the right?
ok so
B4 = Date1
C4= Date2
A$3 = 1.618
D$3 = col numbered power
bcuz we want to doo the power
only to 1.618
thats (A$3^D$3) first, in other words: (1.618 ** np.arange(1, 11))
then (($C4-$B4)*(A$3^D$3)), in other words: (Date2 - Date1) * (1.618 ** np.arange(1, 11))
then $C4+(($C4-$B4)*(A$3^D$3)), in other words: Date2 + ((Date2 - Date1) * (1.618 ** np.arange(1, 11)))
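A hedged sketch of that translation for a single row, including the `INT` that the Python versions in the chat leave out. It assumes the dates carry no time of day, so `INT(C4 + x)` reduces to adding `floor(x)` whole days, and that the powers in row 3 start at 1.

```python
import numpy as np
import pandas as pd

date1 = pd.Timestamp("2008-10-31")   # B4
date2 = pd.Timestamp("2009-01-03")   # C4
ratio = 1.618                        # A$3
powers = np.arange(1, 4)             # D$3, E$3, F$3 (assumed to be 1, 2, 3)

delta_days = (date2 - date1).days    # C4 - B4

# INT(...) in Excel rounds down, so floor the offset before adding it.
offsets = np.floor(delta_days * ratio ** powers).astype(int)
results = [date2 + pd.Timedelta(days=int(n)) for n in offsets]
```

Without the floor the results keep a time of day, which may account for small differences from the spreadsheet.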
I'm surprised this is still ongoing, can I have an update on the latest problem?
here's all the code. they say the result is different than in their spreadsheet https://paste.pythondiscord.com/ociharepoh
let me try
yea we managed to fix the error thanks to @serene scaffold
i want to be sure the operation we did in python is the exact same as this one on excel
bcuz so far, we haven't found identical result on python and on excel, it should be the same tho
I'm getting weird calculations when I try it on python too lmao
yea thats weird
it is possible. (the use of "cant" in that sentence is confusing.)
we have no idea what error you got unless you tell us. so try showing the code and the error message.
ok I'm getting weird ISO-date conversions
Can you confirm the first 3 rows dates in English?
16/04/2009 - 04/03/2020 - 15/11/2013
31st October 2008, 3rd January 2009
3rd January 2009, 10th April 2013
10th April 2013, 3rd July 2013
those right (in black)
ohhh
yea
I'm sorry, because AMERICA date formats, all ISO date formats are screwed
UK dates are best
DdmmYyyy
yeah i use this format:
```py
'%d/%m/%Y'
```
@spare briar so i actually got around to training a neural network overnight. i followed the VICReg paper, which was straightforward enough :).
But it looks like the performance cost is relatively high, since the projection head has so many parameters. using effnetv2 as the backbone, even a small projection head of (2048-2048-512) has more parameters than the backbone :/
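For a sense of scale, a back-of-the-envelope count of the projection head's parameters. The 1280-dim input is an assumption about the backbone's embedding size, and the count covers plain linear layers only (no batch norm).

```python
def mlp_param_count(dims, bias=True):
    """Count weights (and optionally biases) of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(dims[:-1], dims[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total

# Assumption: a 1280-dim backbone embedding feeding the 2048-2048-512 head.
head_params = mlp_param_count([1280, 2048, 2048, 512])
```

Under these assumptions the head alone comes to roughly 7.9 million parameters, most of them in the 2048 x 2048 layer.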
i think for my use case using the default effnetv2 is best, even if it wasnt trained for this
you dont need the projector to be that big!
the vicreg paper used a 81924-81924-81924-2048 projector!
oh wow
yeah that big projector assumes a huge backbone
one layer? or multi layer?
like vit large
what backbone did you guys use?
we dont use it but have gotten resnet18 scale with 64 proj dim to work
@worthy hollow Still here? I think I can reproduce the excel result-ish
yeah bro
Well wait, I don't have it, unfortunately there's an issue with the timedelta in ns
i can send you my data and the exact code i have so you can try on your pc if thats more easier for u?
oh wow, thats pretty cool!
btw, feel free to check out my results:
https://search.technicallyruns.com/
uploaded images are not stored btw
i just have 150k images that i added that you can search through
used to have 1.6million but dropped that db by accident, you know how it is haha
software engineer?
yeah
@spare briar you'll be very happy to hear I'm learning calculus
@worthy hollow Well, unfortunately it seems standard pandas won't really have the tools you're looking for to complete what you need.
At least with what I'm doing
i've been working as a software engineer for a little bit over a year now, but currently studying something mostly unrelated
ahhh thats fucked up
Ok wait nvm
There is something, but it is so convoluted, now I understand why people use Excel LMAO
ahahah yeah for calculation sometimes it feels much more friendly
but python is greater than excel in a lot of things
Well one issue is that you're handling dates which don't make sense
If I'm looking at it right, you're looking at year 2537?
This is the maximum possible date in pandas:
https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.max.html
There's about a 300 year difference
So in short - it's possible to do what you want in Python, but you will need customised datatyping
honestly i don't mind for such big year
like
if it goes above 2322 it will return NaN
and i'm fine with that
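One way to get that behaviour, sketched with invented offsets: compare each offset against the room left before `pd.Timestamp.max` and blank out the ones that do not fit, so they come out as `NaT` instead of raising. This is an illustration, not a built-in pandas switch.

```python
import pandas as pd

d2 = pd.Series(pd.to_datetime(["2013-04-10", "2013-07-03"]))
add_days = pd.Series([200_000.0, 100.0])   # hypothetical offsets in days

# Whole days left between each date and the largest representable Timestamp
room = (pd.Timestamp.max - d2).dt.days

# Offsets that would overflow become NaN, which to_timedelta turns into NaT
safe = add_days.where(add_days <= room)
result = d2 + pd.to_timedelta(safe, unit="D")
```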
Going back to this, I don't think you're adding the date back in Python
But you can show me your final Python code
ok wait
so this code
```py
from datetime import datetime
m618 = m.copy()
for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
m618
```
give me this dataframe
Does it not produce the expected result?
You should check the months and day
Print out df2 - df1 only
You are looking for 64, 1558, 84.
Anything else means the dates are not encoded properly
```
0 121 days
1 1678 days
2 -211 days
3 64 days
4 187 days
...
173 83 days
174 171 days
175 125 days
176 136 days
177 92 days
Length: 178, dtype: timedelta64[ns]
```
using print(df2 - df1)
Your months and days are not properly interpreted in pandas
how comes? i dont understand
10/10/1990 is obviously 10th October 1990
But 12/1/1990 can be 12th January 1990 or 1st December 1990
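The ambiguity can be checked directly on the first row. Reading `03/01/2009` day-first gives the expected 64-day gap, while reading it month-first gives the 121 days printed above.

```python
import pandas as pd

d1 = pd.to_datetime("31/10/2008", format="%d/%m/%Y")

day_first = pd.to_datetime("03/01/2009", format="%d/%m/%Y")     # 3 January 2009
month_first = pd.to_datetime("03/01/2009", format="%m/%d/%Y")   # 1 March 2009

gap_day_first = (day_first - d1).days
gap_month_first = (month_first - d1).days
```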
i didnt involved it in the first code, but i think the day month and years are well set you can check here : ```py
from datetime import datetime
m618 = m.copy()
for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')
m618['Date2'] = m618['Date2'].dt.strftime('%d/%m/%Y')
m618['1'] = m618['1'].dt.strftime('%d/%m/%Y')
m618['2'] = m618['2'].dt.strftime('%d/%m/%Y')
m618['3'] = m618['3'].dt.strftime('%d/%m/%Y')
m618['4'] = m618['4'].dt.strftime('%d/%m/%Y')
m618['5'] = m618['5'].dt.strftime('%d/%m/%Y')
m618['6'] = m618['6'].dt.strftime('%d/%m/%Y')
m618['7'] = m618['7'].dt.strftime('%d/%m/%Y')
m618['8'] = m618['8'].dt.strftime('%d/%m/%Y')
m618['9'] = m618['9'].dt.strftime('%d/%m/%Y')
m618['10'] = m618['10'].dt.strftime('%d/%m/%Y')
m618```
i do .dt.strftime('%d/%m/%Y')
on every columns so i can get the right days/month/years
Well, don't ask me why
But it only takes 64 days from 31 October 2008 to 3 January 2009, not 121 days (about 4 months)
So the date-reading is doing something wrong
honestly i have not a single idea how thats possible - boring af
lol just fix it, and it should produce the same results
I can get the first row to match perfectly
DatetimeIndex(['2009-04-16 13:14:52.800012288',
'2009-06-19 13:07:52.550390784',
'2009-10-01 02:11:25.386559488',
'2010-03-17 15:00:57.755430144',
'2010-12-13 16:42:43.048285184',
'2012-02-25 06:55:40.612135936',
'2014-02-03 22:20:43.510441472',
'2017-03-28 03:10:43.999886592',
'2022-04-28 22:08:07.591830016',
'2030-07-21 19:49:52.123568128'],
how did u manage to get that
can u show me ur code to read them properly? i will try to adapt it to my code and see how it goes
that must be something rly silly that bring this error
I can't, because I'm creating it out of thin-air
aah
d1 = to_datetime(["2008/10/31", "2009/01/03", "2013/04/10"]).to_frame(index=False)
d2 = to_datetime(["2009/01/03", "2013/04/10", "2013/07/03"]).to_frame(index=False)
wait i will do something on the excel data i use for this operation, i think its coming from here
i changed my date to this
this now brings me this error ```py
OverflowError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3036/898974524.py in <module>
6 df1 = pd.to_datetime(m618['Date1'])
7 df2 = pd.to_datetime(m618['Date2'])
----> 8 m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
9
10 m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method
c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
...
-> 1112 raise OverflowError("Overflow in int64 addition")
1113 return arr + b
1114
OverflowError: Overflow in int64 addition
```
i think its the 2300+ years right?
Should be I think
Not sure if you can somehow ignore the OverflowError in pandas
but it is a pain
for pandas errors, it's sufficient to show this much of the call stack
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3036/898974524.py in <module>
6 df1 = pd.to_datetime(m618['Date1'])
7 df2 = pd.to_datetime(m618['Date2'])
----> 8 m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
9
10 m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')
OverflowError: Overflow in int64 addition
the parts of the call stack that are internal to pandas aren't that interesting.
ok
you scooped that right bcuz look now it give the exact right dates
yea bcuz i mispelled the date to 2103 instead of 2013
actually with the vectorized version
it seems to give the right one
wait lemme replace all the date in the .csv sheet to the right format as u did
Oo nice
Looking good generally?
Anyway you just taught me to not touch pd time objects within 10 metres
is there another way to switch %d/%m/%Y to %Y/%m/%d, than doing it manually on notepad++? it's so long to do
no, don't do it manually
You should be able to change the way it is read in python instead of changing the data input
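A sketch of what changing the way it is read could look like, using an in-memory CSV with two rows of the dates confirmed earlier. The column names match the chat; the inline CSV text is a stand-in for the real file.

```python
import io
import pandas as pd

# Stand-in for the real .csv file, with day/month/year dates
csv_text = "Date1,Date2\n31/10/2008,03/01/2009\n03/01/2009,10/04/2013\n"

m = pd.read_csv(io.StringIO(csv_text))
for col in ["Date1", "Date2"]:
    # An explicit format removes any guessing about day-first vs month-first
    m[col] = pd.to_datetime(m[col], format="%d/%m/%Y")

gaps = (m["Date2"] - m["Date1"]).dt.days.tolist()
```

With the format stated explicitly the gaps come out as the 64 and 1558 days mentioned earlier, and the source file stays untouched.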
```py
m['Date1'] = m['Date1'].dt.strftime('%Y/%m/%d')
```
sometimes i'm very dumb lol i've used it just after
lemme see how it look now
show the same now!!!!!
THANKS A LOT @serene scaffold @shell crest you guys are genius
grats
one last question
how can i delete anything after the %Y/%m/%d date?
Date1 Date2 1 2 3 4 5 6 7 8 9 10
0 2008-10-31 2009-01-03 2009-03-08 2009-04-23 20:21:07.200 2009-07-13 23:43:46.790400000 2009-12-01 12:35:16.000972796 2010-08-01 22:22:40.913684888 2011-09-27 12:21:31.502502224 2013-09-26 16:42:04.882333856 2017-03-14 08:51:45.896202240 2023-03-13 18:48:22.612222272 2033-08-01 13:56:54.524368896
change the print settings for pandas. I'm sure it's in there somewhere.
Seems to ask for a formatter
Or maybe just use strftime directly somehow
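Two ways to drop the time part, sketched on two values from the row above: `dt.normalize()` keeps the column as datetimes set to midnight, while `dt.strftime` turns it into plain strings.

```python
import pandas as pd

col = pd.Series(pd.to_datetime(["2009-04-23 20:21:07.200", "2009-07-13 23:43:46.790400"]))

as_dates = col.dt.normalize()            # still datetime64, time reset to 00:00:00
as_text = col.dt.strftime("%Y/%m/%d")    # strings, no longer usable for date arithmetic
```

If more date arithmetic follows, `normalize` is probably the safer choice, since string columns no longer support `.dt`.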
!docs pandas.DataFrame.to_string
```py
DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, ...)
```
Render a DataFrame to a console-friendly tabular output.
from datetime import datetime
m618 = m.copy()
for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
m618['Date1'] = pd.to_datetime(m618['Date1'])
m618['Date2'] = pd.to_datetime(m618['Date2'])
m618['Date1'] = m618['Date1'].dt.strftime('%Y/%m/%d')
m618['Date2'] = m618['Date2'].dt.strftime('%Y/%m/%d')
m618['1'] = m618['1'].dt.strftime('%Y/%m/%d')
m618['2'] = m618['2'].dt.strftime('%Y/%m/%d')
m618['3'] = m618['3'].dt.strftime('%Y/%m/%d')
m618['4'] = m618['4'].dt.strftime('%Y/%m/%d')
m618['5'] = m618['5'].dt.strftime('%Y/%m/%d')
m618['6'] = m618['6'].dt.strftime('%Y/%m/%d')
m618['7'] = m618['7'].dt.strftime('%Y/%m/%d')
m618['8'] = m618['8'].dt.strftime('%Y/%m/%d')
m618['9'] = m618['9'].dt.strftime('%Y/%m/%d')
m618['10'] = m618['10'].dt.strftime('%Y/%m/%d')
m618
Uhh can you not write the ['1'] and so on
huge thx lads
what do u mean
my_variable['1'] = my_variable['1'].method()
...
my_variable['1000'] = my_variable['1000'].method()
is the same as
for i in range(1, 1000+1):
    my_variable[str(i)] = my_variable[str(i)].method()
Condensing 1000 lines into 2
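Applied to the date formatting, the same idea might look like the sketch below. The tiny frame is invented for illustration. Note that `.dt` only works while a column is still a datetime, so the loop should run once, after all the date arithmetic is done.

```python
import pandas as pd

# Small made-up frame with the same column naming as m618
m618 = pd.DataFrame({
    "Date1": pd.to_datetime(["2008-10-31"]),
    "Date2": pd.to_datetime(["2009-01-03"]),
    "1": pd.to_datetime(["2009-03-08"]),
    "2": pd.to_datetime(["2009-04-23 20:21:07"]),
})

# One loop replaces a strftime line per column
for col in m618.columns:
    m618[col] = m618[col].dt.strftime("%Y/%m/%d")
```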
thanks a lot i will re use this code u just sent a lot
weirdly im starting to not having
the expected result with the range method
lol this has been such a headache i think i will let the 1000 lines
at least it worked out
can someone send me a good tutorial for pytorch?? like the coding not the math?
Hello I need some help here to decide
I am not from CS background, currently studying CS50's Web development course.
And I am also curious about Data Analysis.
Should I do this certification course of google on coursera.
google teaches data analysis with R.
https://www.coursera.org/professional-certificates/google-data-analytics?
or should I do this course of Finland's university of helsinki.
this course teaches Data analysis with python
https://dap-21.mooc.fi/
Offered by Google. This is your path to a career in data analytics. In this program, you'll learn in-demand skills that will have you ... Enroll for free.
An online course open to everyone at the University of Helsinki that teaches data analysis with Python. The course gives an overview of the different phases of the data analysis pipeline using Python. Participation in the course does not require prior knowledge of Python but it is assumed that you have good programming skills in some language. L...
Googles certification course I think is in depth which will make me job ready thats what they are saying
but its not free
Finlands course is free but not in depth
btw @shell crest @serene scaffold my project readme: https://github.com/amirlehmam/astrotool when I finish it i'll give you guys some account to access our web interface when everything is done (IF ONLY YOU ARE INTERESTED IN FINANCE) - this is the least i can do for thanking you guys
aah stock market price prediction
based on a range of natural laws and esoteric principles that 99% of traders doesnt even know
Hello guys need some help here...
I've read some articles on it, not sure if I ever wanna go down that road. But a lot of people are researching this tho.
yeah people do research
I did the Google one and can recommend it, but "job ready" feels like a bit of a stretch tbh. Should be enough for entry-level positions though.
it doesn't go very in depth in R at all, and is very focused on the analytics part and generic data stuff - that other one you linked seems to cover some Machine Learning?
Yes that covers machine learning
but not a lot excel on this matter trust me, the knowledge needed is so secretive and well hidden, only a few got access, anyway the purpose of the project will be to display in advance the major turning point of any stock market or crypto.. Above is an example on BTC.. You can clearly see how much our dates are correlated to the actual price action...
we get our dates from 10 to 100 years in advance from this calendar, that is generated from all our 20 different methods... Today the help i received was just for one specific method...
lol, it's easy to say that kind of stuff looking at historical data
if predicting huge changes to the market was that easy, big corporations would be doing it way sooner than any independent person
@agile cobalt
this one too
https://www.coursera.org/professional-certificates/ibm-data-science
man haven't you checked what i just said above
we have those PIVOT DATES far far in advance years before they do get printed by the market
sure, go get rich on your own and do not try to market it to others then cya
and trust me, the price action follows the dates this closely BECAUSE the market makers (banks, hedge funds, etc.) follow the same path, because they all use this kind of knowledge at the highest level of their management
ouch man you are pretty mad I ain't marketing shit lol I was just saying for the two dudes that helped me out there they'll get free access BECAUSE THEY DESERVE IT ahaha not like some others
cya
idk, doesn't sounds any better than the others at first glance
Okay
you can audit / try for free the first few days of each of them and stick to whichever one's format you like the most
Yes I will.
Do you have other recommendations?
i need some ai projects for a beginner
i want to make an ai chatbot
but i cant find any library or api that would work
@lapis sequoia download a kaggle dataset and try manipulating it with pandas to demonstrate some of its interesting properties
Chat bots sound cool, but they really aren't.
yeah, i know i just need to learn something with ai
More formally, what I was suggesting was that you practice exploratory data analysis.
AI doesn't lend itself well to beginner "projects" that have a finished product at the end, but there are still things you can do to start developing your skills.
ok
a good chatbot isnt a beginner project
thats like YoE level shit
I have read many times that unsupervised and self-supervised are the same. No exceptions?
what do they mean by saying "Don't use the dataset for commercial purposes"? do they mean selling the data itself, or using the model in applications that people pay for?
self supervised is a sub-field of unsupervised learning
They're not synonyms
self supervised is always unsupervised, but not the other way around
Right. Don't use the dataset for something where making money is involved
Read this #data-science-and-ml message
wow
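```py
# assumes `import numpy as np` and `from math import inf` at module level
def negamax(self, grid: np.ndarray, player: int, depth: int, alpha: float, beta: float, started: bool) -> tuple[int, int] | int:
    # Leaf: search horizon reached, or the previous move won the game.
    # A loss found with more depth remaining (i.e. sooner) scores worse for the side to move.
    if depth == 0 or self.check_win(grid=grid):
        return -depth
    moves = self.get_all_moves(grid=grid)  # , player=player)
    if not moves:
        return 0  # no legal moves left: scored as neutral
    value = -inf
    for move in moves:
        new_grid = np.copy(grid)
        self.make_move(grid=new_grid, move=move, player=player)
        # Negate the child's score and flip the (alpha, beta) window for the opponent.
        nmv = -self.negamax(
            grid=new_grid,
            player=self.next_turn(player=player),
            depth=depth-1,
            alpha=-beta,
            beta=-alpha,
            started=False,
        )
        if nmv > value:
            value = nmv
            best_move = move
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # alpha-beta cutoff
    # The root call returns the move; recursive calls return the score.
    return best_move if started else value
```
this is my negamax algorithm for my connect 4 ai, its for my discord bot so its current starting depth is 6,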
but now (i think this is why) when it sees a certain loss, it basically 'gives up' because it assumes the opponent plays perfectly,
but of course humans do not play perfectly at all, so how could i improve this to still choose moves that prevent loss at a low depth (i think that will solve it, i can be very wrong though please correct me if so)
How does it calculate
calculate what
Best move
when it returns -depth
How does it calculate that
well did you look at the algorithm
goes through all moves, checks if someone has won, if so, return -depth else do that again recursively
That isn't calculating a move
Lots of boardy type games ai are based off of like
AI ai
Like calculated optimal moves based off of a lot of potential moves
yeah thats basically what it does
I can't see any sort of neural network
but since the depth is only 6 and the board is a connect 4 board so 6x7 it reaches depth 0 pretty quickly
well its just recursion
its a negamax algorithm
figured this channel was the most suitable
Probably go to DSA
whats that D:
oh wait thats better yeah
That's for coding algos like yours
sorry, ill move there
This is the correct room. Search algorithms with extra memorization is how the term "machine learning" came to be used.
You can memorize certain states and know what to play from there without doing the actual search. This helps a lot if your search has low depth. The trade-off is more memory usage. You can have it more loosely match states so that you need less memorized, but how well that works depends on if similar looking states require similar (optimal) plays, etc. To this end, more advanced methods in machine learning including the current SOTA for board games, neural networks, can be applied. Sub-optimal play from the opponent can really throw off an agent with low depth search because it can easily end up in a branch that it did not search because it was not high priority (assumes opponent won't make ridiculous plays).
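To make the "memorize certain states" idea concrete, here is a minimal sketch of negamax with a lookup table (often called a transposition table). It uses a toy game as a stand-in, not Connect 4: players alternate taking 1 to 3 stones and whoever takes the last stone wins. The names `MOVES`, `negamax` and `table` are made up for this illustration, and alpha-beta pruning is left out to keep the table logic simple.

```python
# Toy stand-in game (an assumption, not Connect 4): players alternate taking
# 1-3 stones from a pile; whoever takes the last stone wins.
MOVES = (1, 2, 3)


def negamax(stones, depth, table):
    """Return (value, best_move) from the point of view of the player to move."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        # Losing with more depth left (i.e. sooner) scores lower.
        return -(depth + 1), None
    if depth == 0:
        return 0, None  # search horizon reached, no information

    key = (stones, depth)
    if key in table:
        # Memorized state: reuse the stored answer instead of searching again.
        return table[key]

    best_value, best_move = float("-inf"), None
    for move in MOVES:
        if move > stones:
            continue
        child_value, _ = negamax(stones - move, depth - 1, table)
        value = -child_value  # what is good for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, move

    table[key] = (best_value, best_move)
    return table[key]


table = {}
print(negamax(5, 6, table))  # (value, move) for a pile of 5 stones
print(len(table), "states memorized")
```

The table is keyed on `(stones, depth)` because the score here depends on how much depth is left. In a real Connect 4 bot the key would be some hashable encoding of the grid, and the memory cost grows with the number of stored positions, which is the trade-off mentioned above.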
ill look into that, thanks, helps a lot!
can targets be defined as actions too?
I LOVE

Would anyone be interested in trying out an AI chat bot I helped develop for discord? It utilizes a personalized version of GPT-3 and I'm looking for testers
is sentdex's ai playlist good enough to get the basics started? or is there any other tuts which yall would recommend?
I need help with pyspark:
I have a column containing such values in pyspark dataframe.
+--------------------------+
|value_ml_actuals_quarterly|
+--------------------------+
| {}|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
| {"stage4_stage5_v...|
| {"stage4_stage5_v...|
| {"stage4_stage5_v...|
| {"stage4_stage5_v...|
| {"stage4_stage5_v...|
| {"stage4_stage5_v...|
| {}|
| {}|
| {}|
| {}|
| {}|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
| {"stage1_stage2_v...|
+--------------------------+
{"stage1_stage2_value_q0":155377.25760193774,"stage1_stage2_value_q1":1.6324835169675915,"stage1_stage2_value_q2":1.2416516765040377,"stage1_stage2_value_q3":0.6989978731097944,"stage1_stage3_value_q0":153358.93874629532,"stage1_stage3_value_q1":1046.664551481815,"stage1_stage3_value_q2":1113.5050521549135,"stage1_stage3_value_q3":307.54128324443144,"stage1_stage4_value_q0":155332.70160937821,"stage1_stage4_value_q1":52.833406048086644,"stage1_stage4_value_q2":27.174443064288468,"stage1_stage5_value_q0":152331.90042767464,"stage1_stage5_value_q1":514.7405984413591,"stage1_stage5_value_q2":1187.6654328153859,"stage1_stage5_value_q3":1800.8622445981073,"stage1_stage6_value_q0":154394.15477047203,"stage4_stage7_value_q2":5343.860267413727}
I want to do such query over the dataframe like we do in sql
select count(*) from (select JSONExtractFloat(value_ml_actuals_quarterly, 'stage4_stage6_value_q0') as temp_q0, JSONExtractFloat(value_ml_actuals_overall, 'stage4_stage6_value_overall') as temp_overall from stored_table) where temp_q0 > temp_overall
Thought someone would try to call me out on that, but I still think in 2022 you're wasting your time coming to a data science room and not a DS and algorithms room for your search algorithm
Machine learning based on data is the current meta of this channel I'm pretty sure
And people would probably be of better help over there
This channel is so confusing
what confuses you about it?
Hi guys!
I just started my master's program where we will use python to see the water balance in a river. First day and we were given a difficult task (at least for me, who has never used python before).
I will be using Python in Jupyter notebook and will be handling Xlsx and kml files.
Is there anyone who would like to help me?
idk what you are talking about this is a classic ai problem
leads to ideas like https://en.wikipedia.org/wiki/A*_search_algorithm and eventually https://en.wikipedia.org/wiki/Monte_Carlo_tree_search which is used in modern ai systems like alphago/alphastar
i think their confusion stems from optimization falling square in the middle of AI/ML, but obviously overlapping with pretty much everything else, since optimization tasks are widespread in many disciplines
You might wanna explore NLU ( Natural Language Understanding) specifically, Intent and Entity extraction
you should ask a specific question once you have one. idk what a kml file is, but you can manipulate CSV data with pandas.
you'd want to look into intent classification. you can use information about the structure of the sentence as features. spaCy can help you with this.
So I have this task with several question but lets start with the first one. I will be looking at the Water balance of the Balkh River basin in Afghanistan. I have been given data from 6 individual streamflow stations in a table shows the Station attributes. Streamflow data, as well as time series of precipitation, reference evapotranspiration and air temperature for the contributing areas of all stations are provided on DTU Learn. Map layers containing river network, catchments and stations are distributed as google earth kml files on DTU Learn.
I have also been given Xlsx that I uploaded into Jupyter notebook that I will be using.
My question is how I should start to:
Plot precipitation, cumulative precipitation, reference ET, cumulative reference ET, air
temperature and discharge as a function of time for the 6 catchments ?
wat code?
this server has a channel dedicated literally to search algorithms, not sure what you are talking about
just because its applicable to ai in certain context doesnt mean that the algorithms/search channel wouldnt be more useful for someone wanting help on that
!epy
import pandas as pd
data = pd.DataFrame([["Billy", "Test 1", 90, 80], ["Billy", "Test 2", 70, 60], ["Tommy", "Test 1", 30, 40], ["Tommy", "Test 2", 45, 50]], columns=["Name", "Test Name", "Mid-Term", "End-Term"])
print(data)
Need some help trying to understand the sort of issues I might have building a model to predict scores of 100x students with 20x tests with a data set that has this sort of format
@elfin jungle :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | Name Test Name Mid-Term End-Term
002 | 0 Billy Test 1 90 80
003 | 1 Billy Test 2 70 60
004 | 2 Tommy Test 1 30 40
005 | 3 Tommy Test 2 45 50
Not all search is the same, you could argue that stochastic gradient descent performs search over NN model parameters, should all deep learning discussion be in the search channel?
all neural network architecture discussion is just choosing a family of functions to search over
all the code here
why are you dying on this hill when its literally someone trying to find help for his search algorithm when theres a dedicated channel of people whod be way more useful than us
look at his code and you will see that hed be better off there
id say our code is simpler than anywhere else
it looks difficult tho
how so?
its probably easier to code than any other area of this server
well wtf is this?:
select count(*) from (select JSONExtractFloat(value_ml_actuals_quarterly, 'stage4_stage6_value_q0') as temp_q0, JSONExtractFloat(value_ml_actuals_overall, 'stage4_stage6_value_overall') as temp_overall from stored_table) where temp_q0 > temp_overall
SQL ?
yes
its someone using sql
yes
not python x)
real hard sql is on another level
I dont want to know that lmao
yeah i rly suck at it personally
I now have newfound respect for experienced programmers and coders
why do you think you should be answering people's questions when you are such a goober noob
i think u shud shut your mouth tbh. ur literally the only person on this channel with their head stuck that far up their ass. I was trying to help that person find better help in a more relevent server
cut it out, what is even going on here
this guys consistently acting like this
when
That's not the way we treat other users. If you have an issue with another user, report it to us. Don't throw a tantrum
literally liek x3 times now youve been petty on that level
3x to me and multiple times to someone else
its not petty, you give bad advice, I don't know how you have such low self awareness -
feel free to look at my history
I'm looking at this current situation. And I'm saying you need to take a step back
im pretty sure asking someone to take algos to algos is good advice
you are massively overconfident and wrong
even if you want to die on the hill that search algos have a place in ML
im literally not confident tho
youre just shitting on other people because thats how you feel
didn't you start learning ml like a month ago, don't know even basics, a full decade from being hirable in a ml role
I'm literally not confident
this is the shit im talking about tho?
and no, i started my masters in this a year ago if that even matters
your comments are arrogant on another level, even if mine seem ignorant and im sure everyone has just been made aware of that from your prior comment
alright I've said my part
For the record, this comment was not in line with our #code-of-conduct either.
The questions that were asked about search methods seemed relevant enough to the conversation. If you felt that it was better that they ask or talk about it in #algos-and-data-structs (which isn't specifically a search channel, btw), there were much nicer ways you could have put it without attempting to shutdown the things that were already being discussed.
but if anyone wants to discuss this further, please send a message to @sonic vapor.
fair enough
would it be a gamble to guess that it would stop any negative values and keep the function relative scale?
because the squared terms have nicer properties. you can interpret what "nice" means in different ways
I didn't understand the instructor as follows it's squared bcas the negative and positive value has to be fit to approximate the output
Not exactly sorry if I'm wrong,
a sum of squares is what we call "positive semi definite", or in other words it is always >= 0. then it is easy to see that the goal is to set this equal to 0, which corresponds to "minimizing"
i'm not sure what you mean by this
As in the course he says
consider that error = estimate - measurement. then we want the error to be close to zero. if the error is 0, then estimate = measurement
there are many ways to try to solve estimate = measurement. one of them is by minimizing the absolute value of the error
absolute value is equal to squaring, then taking the square root (or taking the vector norm/l-2 norm depending on what you're more comfortable with). then we can square this because squaring a non negative number preserves ordering
there is also a probabilistic interpretation, if you assume a model with gaussian distributed noise [y = wx + \epsilon where \epsilon ~ N(0,1)] then this loss function gives you the maximum likelihood solution
see https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf 3.1.1 Maximum likelihood and least squares
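For reference, a short sketch of why the Gaussian noise assumption leads to the squared loss. It uses the same model y = wx + \epsilon with \epsilon ~ N(0,1), and assumes N independent data points (x_i, y_i); the index i and the count N are notation introduced here.

```latex
\begin{align*}
p(y_i \mid x_i, w) &= \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}\,(y_i - w x_i)^2\right) \\
\log \prod_{i=1}^{N} p(y_i \mid x_i, w) &= -\frac{N}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{N} (y_i - w x_i)^2 \\
\hat{w}_{\mathrm{ML}} &= \arg\max_{w} \, \log \prod_{i=1}^{N} p(y_i \mid x_i, w)
  = \arg\min_{w} \sum_{i=1}^{N} (y_i - w x_i)^2
\end{align*}
```

The first term of the log-likelihood does not depend on w, so maximizing the likelihood is the same as minimizing the sum of squared errors.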
we have .latex in this channel, if you like that
.latex $ y = wx + \epsilon $ where $ \epsilon \sim N(0,1) $
in this case you add a nonlinearity from the sigmoid which complicates things (the linear case actually has a closed form solution, no SGD needed) but it ideologically comes from the same place
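A minimal sketch of the closed form mentioned for the linear case, assuming the one-parameter model y = wx + noise from above (no intercept). The data is synthetic, and `fit_slope` and `true_w` are names invented for this example. It compares the hand-written normal-equation solution against `np.linalg.lstsq`.

```python
import numpy as np


def fit_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Least-squares slope for y ~ w * x (no intercept).

    Setting the derivative of sum((y - w*x)**2) to zero gives
    w = sum(x*y) / sum(x*x), so no iterative optimizer is needed.
    """
    return float(x @ y / (x @ x))


# Synthetic data (an assumption for illustration): true slope plus Gaussian noise.
rng = np.random.default_rng(0)
true_w = 2.5
x = rng.uniform(-1.0, 1.0, size=200)
y = true_w * x + rng.normal(0.0, 0.1, size=200)

w_closed = fit_slope(x, y)
# Same problem solved by NumPy's general least-squares routine.
w_lstsq = float(np.linalg.lstsq(x[:, None], y, rcond=None)[0][0])

print(w_closed, w_lstsq)
```

Both numbers should agree up to floating-point error and land close to `true_w`. Once a sigmoid is wrapped around wx, this shortcut is no longer available and an iterative method is used instead.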
.latex let $\bm{y} \in \mathbb{R}^m$ be given by
\begin{align*}
\bm{y} = \bm{Ax} + \bm{n},
\end{align*}
with $\bm{x} \in \mathbb{R}^n$, $\bm{n} \in \mathbb{R}^m$ and $\bm{n} \sim \mathcal{N}(\bm{0}, \bm{\Sigma})$. then the maximum likelihood estimator of $\bm{x}$ is the solution to the classical linear least squares expression
i guess i should've specified with n AWGN or \Sigma diagonal
Hello everyone, does anyone know how to solve the error,
job exception: 'XGBRegressor' object has no attribute 'XGBClassifier'
Trying to tune my model so I can make it as efficient and successful as possible, but when doing so, this pops up
Show code
Show import line and the tuning line
Copy, please hold
IMPORT
The error is coming in the second picture in the block of code under space
@steady basalt
Ur using XGBClassifier in ur function
I swear the image changed from RSE to accuracy
MSE rather
Sorry, I just swapped it from Classifier to Regressor, but it is the same error output
Did u also import classifier
Nope,
from xgboost import XGBRegressor is what I have
If u use classifier you need it imported
Yup, but I need Regressor, that is why I swapped Classifier out. I retyped my code and put that there by accident haha
You're using regression but with an accuracy metric?
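A small sketch of why that question matters, in plain Python with made-up numbers (the scores and both helper functions are illustrative only, not taken from the code in the screenshots). Accuracy counts exact matches, which almost never happen with continuous predictions, while mean squared error measures how far off the predictions are.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the target exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared distance between prediction and target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical regression targets and predictions (e.g. points scored).
y_true = [24.0, 31.0, 17.0, 28.0]
y_pred = [23.5, 30.0, 18.0, 28.5]

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print("accuracy:", acc)
print("mse:", mse)
```

Every prediction here is within one point of the target, yet none is an exact match, so accuracy says nothing useful about a regressor while MSE stays small.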
I am honestly new to this, so I need to figure out a lot of it, but I need to fix this whole no object error first
Ok wait
I am trying to design a sports model