#data-science-and-ml

marble ocean
#

do you have an idea?

serene scaffold
marble ocean
#

can keras do that?

marble ocean
#

I dont have that file

twin ravine
#

is this right? (second group of synapses omitted because im lazy)

#

forgive the ms paint drawing 😄

#

where activation function could be anything (like ReLU or sigmoid)

#

and each synapse represents a weight and each neuron/node in a layer (excluding the input layer) has its own bias

mild dirge
#

Yeah that is the general idea

twin ravine
#

so just to confirm, each hidden layer has an "activation function" they apply before outputting to the next layer and the output layer has a "loss function" which calculates the error?

#

with softmax being a "loss function" that normalises the data between 1 and 0 with respect to the value range

mild dirge
#

Softmax is not the loss function

#

It is an activation function

twin ravine
#

so every layer apart from the input layer has an activation function?

mild dirge
#

Yep

#

Normally

wooden sail
#

you could more generally argue that all layers have activation functions, since the identity is a perfectly valid function. this lets you more easily draw parallels between classical algorithms and deep learning

twin ravine
#

"the identity" being a function that outputs the input?

wooden sail
#

yep

twin ravine
#

can you clarify what the loss function is? somewhere above in the chat it was said that the output layer doesn't use sigmoid but uses softmax instead (or equivalents)

wooden sail
#

btw, softmax doesn't just normalize the data. it has a hyperparameter in the exponent. if you've done any image processing before, you could compare it to the "gamma factor" they use there

#

let's see. what softmax does is approximate what the function argmax would do

twin ravine
#

for reference

def sigmoid(x: np.ndarray):
    return 1 / (1 + np.exp(-x))

def softmax(x: np.ndarray):
    exp_x = np.exp(x - x.max())
    return exp_x / exp_x.sum()
wooden sail
#

argmax (in its one-hot form) takes in a vector, and spits out another vector of the same size where all entries are 0, except for a 1 at the position of the largest value in the original vector

#

what softmax does is approximate this with a differentiable function. small values are made even smaller, large values are made even larger
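A minimal sketch of that approximation, assuming the "hyperparameter in the exponent" is a divisor applied to the inputs (called `temperature` here, a name not used in the chat). As it shrinks, the softmax output moves toward the one-hot argmax vector:

```python
import numpy as np

def softmax(x: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    # Plain softmax with a (hypothetical) temperature knob in the exponent.
    exp_x = np.exp(x / temperature)
    return exp_x / exp_x.sum()

x = np.array([1.0, 2.0, 4.0])

# One-hot form of argmax: 1 at the largest entry, 0 elsewhere.
one_hot = np.eye(len(x))[np.argmax(x)]

# Lower temperatures push the softmax output toward the one-hot vector.
for t in (1.0, 0.5, 0.1, 0.01):
    print(t, softmax(x, t).round(4))
```

Every output still sums to 1, and softmax stays differentiable at every temperature above zero, which the one-hot argmax is not.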

#

but this is just the output of the network, it tells you nothing of whether this output is correct

#

so you'd then have to compare the network's output to some reference value

#

that's where the cost function comes in. you'd take the softmax output, or more generally, the overall output of the network regardless of the activation func in the last layer, and compare it to a reference

#

the cost function should be chosen so that it correctly captures how good the output of the network is
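As an illustration, cross-entropy is one common cost function to pair with a softmax output. That choice is an assumption here, since the chat does not name a specific cost. The sketch compares two made-up network outputs against a one-hot reference:

```python
import numpy as np

def cross_entropy(predicted: np.ndarray, target: np.ndarray, eps: float = 1e-12) -> float:
    # eps guards against log(0); target is a one-hot reference vector.
    return float(-np.sum(target * np.log(predicted + eps)))

target = np.array([0.0, 0.0, 1.0])         # the correct class is index 2
good = np.array([0.05, 0.05, 0.90])        # hypothetical confident, correct output
bad = np.array([0.70, 0.20, 0.10])         # hypothetical confident, wrong output

print(cross_entropy(good, target))
print(cross_entropy(bad, target))
```

The cost is small when the output puts most of its mass on the reference class and grows as that mass shrinks, which is the "how good is the output" signal described above.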

#

this type of training has to be supervised or semi/selfsupervised

#

i'm not sure your softmax is right btw. the whole point of using softmax is to not use max at all

twin ravine
wooden sail
#

as soon as you use max you give up your differentiability*

twin ravine
#

how so?

wooden sail
#

if you will anyway use max, don't use softmax

#

max is not a differentiable function. we use softmax to avoid using max

twin ravine
#

so argmax is not a differentiable function so we use softmax to approximate it because softmax can be differentiated?

wooden sail
#

that's the idea

twin ravine
#

my implementation of the whole network is wrong sigh, i'll redo it and then try again

#

without the max i get runtime errors
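The runtime errors are plausibly overflow warnings from `np.exp` (an assumption, since the actual message is not shown). Subtracting any constant from every entry leaves softmax unchanged, so the shifted form only changes the numerics. A small sketch comparing the two:

```python
import numpy as np

def softmax_naive(x: np.ndarray) -> np.ndarray:
    exp_x = np.exp(x)
    return exp_x / exp_x.sum()

def softmax_shifted(x: np.ndarray) -> np.ndarray:
    # Same function mathematically: exp(x - c) / sum(exp(x - c)) cancels the constant c.
    exp_x = np.exp(x - x.max())
    return exp_x / exp_x.sum()

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

print(np.allclose(softmax_naive(small), softmax_shifted(small)))

# exp(1000) overflows to inf, so the naive version ends up dividing inf by inf.
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)
print(naive_large)
print(softmax_shifted(large))
```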

wooden sail
#

well, let's take a step back

#

you're computing the derivatives yourself?

twin ravine
#

i havent got that far (currently implementing forward propagation) but i should be yes

wooden sail
#

ok, then it's ok

twin ravine
#
class Network:
    def __init__(self, *shape: int, labels: list = None, learn_rate=0.01) -> None:
        self.layers = [
            [
                np.random.uniform(-0.3, 0.3, (shape[i], shape[i - 1])),
                np.zeros(shape[i]),
                None,
            ]
            for i in range(1, len(shape))
        ]
        self.labels = labels or list(range(shape[-1]))
        self.learn_rate = learn_rate

    def propagate(self, input: np.ndarray):
        output = [
            input := sigmoid(layer[0] @ input + layer[1])
            for layer in self.layers[:-1]
        ]
        output.append(
            softmax(self.layers[-1][0] @ input + self.layers[-1][1])
        )
        return output

network = Network(784, 387, 387, 10)
this is my [incorrect] "stiff" implementation, ill upgrade it in the future to allow more modularity

wooden sail
#

using max in an automatic differentiator will yield unwanted results. if you do it by hand, then you can exploit the equivalence of the two expressions by taking the derivatives on paper. that's ok

mild dirge
#

argmax(softmax(x)) will give the same results as argmax(x)

wooden sail
#

indeed, but they have softmax(x - max x), which is the same as softmax(x), except that it'll trip up the computation graph of autodiffers

twin ravine
#

should the output of every neuron be in the 0 - 1 range?

wooden sail
#

it's not necessary, but it's good for huge networks

shell crest
#

The 0-1 range has probabilistic interpretations

wooden sail
#

in the final layer yes, but in between it's just a nice-to-have so that the gradients don't explode. it's hard to know ahead of time how the gradients will behave

shell crest
#

0-1 range does not mean bounded gradients though?

#

unless 0-1 range smooth function means bounded gradients, I admit I don't know this

twin ravine
#

so my output layer shouldn't use a sigmoid function but softmax instead allowing me to compare the output to a desired output?

#
```py
        output = [
            input := sigmoid(layer[0] @ input + layer[1])
            for layer in self.layers[:-1]
        ]
        output.append(
            softmax(self.layers[-1][0] @ input + self.layers[-1][1])
        )
```
like so?
#

ah that looks confusing probably, layer[0] are the weights, layer[1] are the biases

heavy crow
#

I created a image similarity search engine using python!
Here is me searching for similar images to the one in the top left 🙂

#

it uses a neural network to create image embeddings that can later be searched

spare briar
heavy crow
#

performance is pretty good as well! I can search 1.5 million images in ~1.5ms. Inserting a new image takes 2-3ms

spare briar
#

what is the model?

heavy crow
#

the slowest part is generating the embeddings, takes around 150ms on the cpu and 10 on a gpu

#

efficientnetv2 B1 in this case, because i wanted to run inference on a cpu

spare briar
#

are they just pretrained imagenet embeddings?

heavy crow
#

but swapping the model is as easy as changing the model url!

#

yup, imagenet21k in this case

spare briar
#

cool cool

heavy crow
#

all images im using are not from imagenet21k though 🙂

#

im still a little on the fence between using cosine similarity or L2 to find similar pictures

#

its hard to compare "similarity" 😦
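One thing that may narrow the choice: if the embeddings are normalized to unit length first, squared L2 distance equals 2 - 2 * cosine similarity, so both metrics give the same ranking. A sketch with random stand-in embeddings (the sizes are arbitrary, not taken from the project above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in embeddings, L2-normalized row by row.
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

query = emb[0]
cos_sim = emb @ query                          # higher means more similar
l2_sq = np.sum((emb - query) ** 2, axis=1)     # lower means more similar

identity_holds = np.allclose(l2_sq, 2 - 2 * cos_sim)
same_top10 = np.array_equal(np.argsort(-cos_sim)[:10], np.argsort(l2_sq)[:10])
print(identity_holds, same_top10)
```

Without normalization the two can disagree, because L2 distance also reacts to vector length while cosine similarity ignores it.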

spare briar
#

supervised imagenet embeddings do not necessarily have meaningful distances

#

maybe contrastive pretraining or more modern joint embedding architecture

heavy crow
heavy crow
spare briar
#

you dont need it 🙂

#

can be fully unsupervised

heavy crow
#

oh really?!

spare briar
#

yeah read that paper i linked

#
heavy crow
#

thank you so much!

spare briar
#

but the latter two are for understanding, use vicreg in practice

heavy crow
#

do you think these kinds of networks can perform at production latency? The vicreg paper uses a very large network :/

spare briar
#

i think you can get imagenet21k pretrained vicreg weights open source

heavy crow
#

oh wow

spare briar
#

it works with any size network

heavy crow
#

yeah, but do you think the results are any good?

spare briar
#

inference would be forward pass on a single model

#

so exactly the same latency as you currently use

#

they should be better

#

at least they are for us, we use this in prod

heavy crow
#

awesome!

#

I have to say that this field of computer vision is just amazing. even if i didnt do it at work i would still look into it!

spare briar
#

also if you want to do search over images that arent in the imagenet distribution you can finetune this with no labels to include whatever image distribution you want

worthy hollow
#

hey there! Anyone familiar with pandas and datetime? I have a problem that i havnt been able to fix for last 3 days

shell crest
#

personally I only use ml, rather than dev it. I've never really tried any architecture for CV before though

#

I've randomly tried BERT, for sentiment analysis

serene scaffold
heavy crow
#

do you have any other tips for running this in prod?

#

my current stack is tf-serve for inference and redis for search

spare briar
#

tensorrt optimizations + onnx

#

ideally implement in a framework that gives you xla compilation like jax

heavy crow
#

although probably i will switch to solr because its already widely used at work

#

do XLA and JAX help if inference is done on the CPU?

worthy hollow
spare briar
#

yep

#

they will fuse layers together through low level primitives effectively decreasing model size

worthy hollow
#

INPUT ```py
import pandas as pd

input1 = pd.DataFrame({"Date1": ['31/10/2008', '03/01/2009', '10/04/2013'],
"Date2": ['01/03/2009', '10/04/2013', '03/07/2013'],
"1": [' ', ' ', ' '],
"2": [' ', ' ', ' '],
"3": [' ', ' ', ' ']})

print(input1)
```
**OUTPUT** ```py
import pandas as pd

# Here is the operation used to get the output results as dates in future...

# We want this operation to be applied in a for loop and make it for all columns except Date ones...

operation = input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))

output = pd.DataFrame({"Date1": ['31/10/2008', '03/01/2009', '10/04/2013'],
"Date2": ['01/03/2009', '10/04/2013', '03/07/2013'],
"1": ['16/07/2009', '04/03/2020', '15/11/2013'],
"2": ['19/06/2009', '09/06/2024', '07/02/2014'],
"3": ['01/10/2009', '05/05/2031', '23/06/2014']})

print(output)
```
**OPERATION USED FOR OUTPUT** ```py
# Here is the operation used to get the output results as dates in future...

operation = input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
```
so at first we want to take the difference (input1['Date2'] - input1['Date1']) in terms of days

then we multiply those days by 1.618 ** input1[col][2:] (so multiply the number of days by 1.618 to the column power (1, 2, 3, etc))
so far our operation should be just a number of days, and it should be written like: ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))

then we want to add those calculated days to input1['Date2']
final operation is: input1['Date2'] + ((input1['Date2'] - input1['Date1']) * (1.618 ** input1.columns[2:]))
**here the full operation :** ```py
x5 = pd.to_datetime(m['Date1'])
x6 = pd.to_datetime(m['Date2'])

operation = x6 + ((x6 - x5) * (1.618 ** m.columns[2:]))
operation
```
**HERE'S THE ERROR IT GIVES:** ```py

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1181823334.py in <module>
9 float_list = list(map(float, x))
10
---> 11 operation = x6 + ((x6 - x5) * (1.618 ** float_list))
12 operation

TypeError: unsupported operand type(s) for ** or pow(): 'float' and 'list'
```

heavy crow
#

as always compute is a limited resource and i cant just request 8 A100s for inference hahah

spare briar
#

i wouldnt worry about that part

#

just use tensorrt or something for quantization

#

play w quantization aware training too

heavy crow
#

have you done any testing as far as size of the embedding goes?

shell crest
#

@worthy hollow I think you should open a help channel

serene scaffold
heavy crow
#

since similar is such a broad term im guessing you have a lot of leeway

spare briar
#

if youre using imagenet then you should use the embedding size from vicreg paper

shell crest
#

yeah but it's so separated

serene scaffold
#

@worthy hollow please say everything in one message.

spare briar
#

i think they show ablations (single linear layer performance from different embedding sizes) and you can see performance implication of decreasing it so your knn is faster/in lower dimension

worthy hollow
serene scaffold
heavy crow
#

knn speed seems to be fine even for 4k dimensions, but i start to run out of ram.

#

but thank you so much for those papers, don't know why i hadn't thought of that

spare briar
#

we use <4k dimensions

misty flint
#

"learning hadoop is like learning latin"

#

ahhh im dead

serene scaffold
misty flint
#

💀

heavy crow
misty flint
heavy crow
#

Redis and Solr both use Hierarchical Navigable Small World (HNSW) graphs, that seems to be the way to go?

misty flint
#

and probably slightly more useful. jk

spare briar
misty flint
#

there have been some great developments in the approximate nearest neighbors space

#

combined with vector search it's a great combo

shell crest
#

What's 1.618?

worthy hollow
#

a constant, it stays fixed

serene scaffold
# worthy hollow alright i think its done

thanks! two things immediately jump out at me:

  1. your dates are encoded as strings, which means that they have nothing to do with moments in time, any more than "foobar" does.
  2. never start with the assumption that the solution to your pandas problem involves a loop. assume that you can solve it without loops, and wait as long as you can to be proven wrong.
spare briar
worthy hollow
#

wait lemme show u the error it gives me

shell crest
worthy hollow
#

updated the error

serene scaffold
#

looks like you expected m.columns[2:] to be a pandas object, but it's a list.

worthy hollow
serene scaffold
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[2. 2. 2. 2.]
002 |  [2. 2. 2. 2.]
003 |  [2. 2. 2. 2.]]
004 | [[ 2.  4.  8. 16.]
005 |  [ 2.  4.  8. 16.]
006 |  [ 2.  4.  8. 16.]]
serene scaffold
#

@worthy hollow see what's happening here?

worthy hollow
#

honestly not really i'm kinda bad with numpy

#

can u explain a bit bro

serene scaffold
#

it's raising everything in the first column to the power of 1, then everything in the second column to the power of 2, etc.
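The snippet that produced the bot output is not shown in the log. A sketch that reproduces the same numbers, assuming a 3x4 array of 2s raised to the exponents 1 through 4:

```python
import numpy as np

arr = np.ones((3, 4)) * 2      # shape (3, 4), every entry is 2.0
exponents = np.arange(1, 5)    # [1 2 3 4]; the right bound 5 is excluded

# The (4,) exponent vector is broadcast across the rows of the (3, 4) array,
# so column j is raised to the power exponents[j].
result = arr ** exponents

print(arr)
print(result)
```

Broadcasting lines up the trailing axes, so the exponent vector must have as many entries as the array has columns (or exactly one).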

worthy hollow
#

ohhhh

#

greattttttt

wooden sail
tropic matrix
#

when using mse as loss it returns inf, but when i made a custom metric function using the sklearn mse function, it returns around 300-400 which while it's high, isn't anywhere close to infinity.

what should I do to troubleshoot this?

wooden sail
worthy hollow
serene scaffold
#

the right bound isn't included.

worthy hollow
#

**ok so i have this code now **: ```py
arr = np.ones((3, 4)) * 2
result = arr ** np.arange(1, 11)

x5 = pd.to_datetime(m['Date1'])
x6 = pd.to_datetime(m['Date2'])

x = m.columns[2:]

float_list = list(map(float, x))

operation = x6 + ((x6 - x5) * (1.618 ** result))
operation
```
**which brings this error:** ```py

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/578892415.py in <module>
1 arr = np.ones((3, 4)) * 2
----> 2 result = arr ** np.arange(1, 11)
3
4 x5 = pd.to_datetime(m['Date1'])
5 x6 = pd.to_datetime(m['Date2'])

ValueError: operands could not be broadcast together with shapes (3,4) (10,) ```

wooden sail
#

where's the array of size (3,4) coming from?

#

what do you hope to do with it

tropic matrix
wooden sail
tropic matrix
#

for the output, and the input is through StandardScaler

#

the only thing i could guess is an architecture issue, but that doesn't explain why sklearn doesn't return infinity

wooden sail
#

what solver are you using, then? it could be your initial step size is too big

worthy hollow
#

honestly i dont even see which one is shape (3,4) in my dataframe thats weird --- here's an explanation of what i want to do

tropic matrix
worthy hollow
#

i want based off the Date1 & Date2 and the equation "operation", generate all those dates in the columns [1, 2, 3, 4, ..., 10]

worthy hollow
wooden sail
#

do you want 10 dates to be generated?

#

and for how many rows?

worthy hollow
#

no not only 10 - i want it to generate all dates for all the rows and columns, there is 178 rows * 10 iterable columns so it should be 1780 dates total for the whole dataframe

shell crest
#

I don't want to be that guy but I have to be.
This isn't a pandas problem - it's a programming one.

worthy hollow
#

ah, thought it was pandas as it is in a df

wooden sail
#

then you want to broadcast this operation over all the rows. i think you took stelercus' example too literally

#

you don't need arr at all, if i understood the problem correctly. and you'll still have problems, idk how nicely numpy plays when doing arithmetic with dates

shell crest
worthy hollow
#

would love to sadly i'm lacking experience in both numpy and pandas to do so, this has been blocking me for 3 days hence why i joined the server and tried to find some help

#

but thanks a lot, you guys figured out what the problem was much faster than me

wooden sail
#

try letting powercols be just np.arange(1,11)

#

let's see if numpy likes dates

worthy hollow
#

**ok so this code: **```py
Date1 = pd.to_datetime(m['Date1'])
Date2 = pd.to_datetime(m['Date2'])

power_cols = np.arange(1,11)

operation = Date2 + ((Date2 - Date1) * (1.618 ** power_cols))
operation
```
**gives me this error:** ```py

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1852475972.py in <module>
5 power_cols = np.arange(1,11)
6
----> 7 operation = Date2 + ((Date2 - Date1) * (1.618 ** power_cols))
8 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in mul(self, other)
106 @unpack_zerodim_and_defer("mul")
107 def mul(self, other):
--> 108 return self._arith_method(other, operator.mul)
109
110 @unpack_zerodim_and_defer("rmul")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\timedeltas.py in mul(self, other)
512 # Exclude timedelta64 here so we correctly raise TypeError
...
--> 514 raise ValueError("Cannot multiply with unequal lengths")
515
516 if is_object_dtype(other.dtype):

ValueError: Cannot multiply with unequal lengths

wooden sail
#

reshape to size (1,10)

worthy hollow
# wooden sail reshape to size (1,10)
--> 514             raise ValueError("Cannot multiply with unequal lengths")
    515 
    516         if is_object_dtype(other.dtype):

ValueError: Cannot multiply with unequal lengths
#

i've reshaped power_cols right?

wooden sail
#

mhm

#

what shape are the dates

#

those probably have to have an extra axis explicitly, too

worthy hollow
#

date1 & date2 same shape

wooden sail
#

yeah they need to be size (x, 1)

worthy hollow
#

whole dataframe shape

worthy hollow
#

how can i reshape a panda column ? it's not like with np right <-- here's the main problem (@serene scaffold)

wooden sail
#

i have no idea tbh, all of my pandas knowledge is based on the assumption that many numpy stuff has an equivalent, but i've never used it myself

#

let's see if stelercus or someone else materializes

worthy hollow
#

alright thanks a lot for your help! And yeah lets see if someone has a clue thatd be great (for the above question in bold)

modest onyx
#

hi

#

do you guys think it's unnecessary to refer to Wx + b as affine and just call it linear?

#

apparently in data science and stats, they can be considered the same, since we can just append a 1 to the end of x and stack b to the right of our matrix

shell crest
#

It's not unnecessary because the affine transformations are not necessarily linear transformations

modest onyx
#

but when I pointed that out to an article calling Wx + b linear transformations, somebody told me that in stats and ml they are considered the same

wooden sail
hushed stag
#

sites for learning data science

wooden sail
#

what you mentioned is proof in itself: you can make an isomorphic linear transformation, yes, but it requires a higher dimensional space to do so, as well as homogeneous coordinates

#

this effectively maps translations in n dimensional space to shears in n+1 dimensional space. translations are affine, shears are linear
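A small numeric sketch of both points, with random `W`, `b`, and inputs chosen only for illustration: `Wx + b` fails additivity, while the augmented matrix acting on `x` with a 1 appended is an ordinary matrix product that gives the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)

def affine(x: np.ndarray) -> np.ndarray:
    return W @ x + b

x = rng.normal(size=3)
y = rng.normal(size=3)

# Linearity would require f(x + y) == f(x) + f(y); here the two sides differ by b.
is_additive = np.allclose(affine(x + y), affine(x) + affine(y))

# Homogeneous coordinates: stack b to the right of W and append a 1 to x.
W_aug = np.hstack([W, b[:, None]])   # shape (2, 4)
x_aug = np.append(x, 1.0)            # shape (4,)
matches = np.allclose(W_aug @ x_aug, affine(x))

print(is_additive, matches)
```

The augmented map is linear in `x_aug`, but only vectors whose last coordinate is 1 correspond to points of the original space, which is the extra care mentioned in this discussion.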

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

modest onyx
#

never really thought about the intuition behind it but that's pretty interesting

hushed stag
wooden sail
#

again, they're isomorphic and so in a very real sense "the same thing", but that extra dimension is there

modest onyx
#

but would you say that the author is at fault for calling Wx + b a linear transformation?

wooden sail
#

100%, yes

shell crest
wooden sail
#

Wx + b does not satisfy the definition of linearity

#

it represents intersections of affine planes/flats, instead of planes crossing the origin

#

the geometric interpretation is different

#

and it's also not just any n+1 dimensional space, it's a projective geometry where the last dimension is normalized

#

some care is needed

modest onyx
#

very interesting

worthy hollow
steady basalt
#

Use reshape

worthy hollow
# steady basalt Use reshape
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3198534513.py in <module>
----> 1 m['Date1'].reshape(178, 1)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'reshape' ```
lapis sequoia
#

ahem

steady basalt
#

Make it an array first

worthy hollow
#
m['Date1'].reshape(178, 1)``` to make it into an array i have to do .values right?
#
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1205480011.py in <module>
      1 m['Date1'] = m['Date1'].values
----> 2 m['Date1'].reshape(178, 1)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'reshape'
steady basalt
#

Ur calling a column on an array

#

Values creates a list as is

#

From that column

#

So stop trying to find a series from a list or array

worthy hollow
#

so how can i fix it?

serene scaffold
serene scaffold
worthy hollow
worthy hollow
serene scaffold
#

and don't try to put it back in the dataframe after that. it's now an array that exists entirely separately from it.
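A sketch of what that looks like, using a three-row stand-in for `m` (the real frame has 178 rows): convert the column to a NumPy array, reshape the array, and keep it in its own variable.

```python
import pandas as pd

# Stand-in for the real dataframe `m`.
m = pd.DataFrame({"Date1": pd.to_datetime(["2008-10-31", "2009-03-01", "2013-10-04"])})

# to_numpy() returns an ndarray, which does have reshape; -1 lets NumPy infer the row count.
d1 = m["Date1"].to_numpy().reshape(-1, 1)
print(type(d1).__name__, d1.shape)

# Assigning the array back into the frame wraps it in a Series again,
# and a Series has no reshape method, hence the AttributeError above.
m["Date1"] = m["Date1"].to_numpy()
print(type(m["Date1"]).__name__, hasattr(m["Date1"], "reshape"))
```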

north crystal
#

Hello, I have a problem with cv2 when launching my exe script so I type the command "python3 -m pip install opencv-python==4.5.3.56" and I got a 2nd error

#

the photos are coming

worthy hollow
#

i'm sorry but i'm really bad at this so i don't understand fully

north crystal
#

sorry if I take time to answer I speak bad English

serene scaffold
worthy hollow
#
operation = d2 + ((d2 - d1) * (1.618 ** power_cols))```
serene scaffold
#

please don't post screenshots of text. copy and paste the text as text. I won't look at any screenshots after this one.

worthy hollow
#

ok sorry

#

what i meant is

serene scaffold
worthy hollow
serene scaffold
#

you could have copied this as text. Please stop posting screenshots.

worthy hollow
#

sorry too used i will

#

Ok so this code works :```py
Date1 = pd.to_datetime(m['Date1'])
Date2 = pd.to_datetime(m['Date2'])

d1 = m['Date1'].to_numpy().reshape(-1, 1)
d2 = m['Date2'].to_numpy().reshape(-1, 1)

power_cols = np.arange(1,10)

operation = d2 + ((d2 - d1) * (1.618 ** power_cols))

operation

#

it generates this output:

#
C:\Users\PEGON\AppData\Local\Temp/ipykernel_17100/364403496.py:6: RuntimeWarning:

invalid value encountered in multiply

array([['2009-09-12T18:40:19.200000002', '2010-01-11T18:27:04.665600004',
        '2010-07-26T12:45:58.308940816', ...,
        '2018-10-12T15:37:18.511915712', '2024-09-21T11:15:36.312279616',
        '2034-05-04T20:28:29.353268480'],
       ['2021-03-11T00:05:45.600000000', '2025-10-13T21:02:07.180800064',
        '2033-03-20T16:10:44.978534528', ...,
        '2147-02-16T15:07:08.289210880', '2229-07-21T00:50:47.371943936',
                                  'NaT'],
       ['2012-03-30T14:26:52.800000000', '2011-09-01T14:49:58.310399992',
        '2010-09-25T05:54:12.866227184', ...,
        '1996-05-29T15:12:17.801535424', '1986-01-17T03:46:10.562884224',
        '1969-04-11T06:08:49.970746624'],
       ...,
       ['2022-08-14T06:00:00.000000002', '2022-12-17T05:46:19.200000004',
        '2023-07-07T11:24:11.145600016', ...,
        '2031-12-31T18:31:06.231317888', '2038-02-20T08:27:31.562272384',
        '2048-01-27T20:21:29.827756672'],
       ['2022-10-02T01:09:07.200000000', '2023-02-15T00:54:14.169600004',
        '2023-09-23T01:39:16.446412816', ...,
        '2032-12-16T02:29:02.459673856', '2039-08-22T00:45:18.499752320',
        '2050-06-12T20:02:16.132599296'],
       ['2022-08-24T20:32:38.400000000', '2022-11-24T20:22:34.291200004',
        '2023-04-22T16:38:55.243161608', ...,
        '2029-07-20T18:37:17.546249952', '2034-01-26T07:34:10.749832448',
        '2041-05-19T21:19:10.913228928']], dtype='datetime64[ns]')```
#

i want this final output to be in a dataframe like the m dataframe: ```py
Date1 Date2 1 2 3 4 5 6 7 8 9 10
0 2008-10-31 2009-03-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2009-03-01 2013-10-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2013-10-04 2013-03-07 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2013-03-07 2013-05-10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2013-05-10 2013-11-13 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
173 2021-07-20 2021-10-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
174 2021-07-09 2021-12-27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
175 2021-09-21 2022-01-24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
176 2021-10-11 2022-02-24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
177 2021-12-27 2022-03-29 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

serene scaffold
#

So, this won't work because d1 and d2 have shapes like (178, 1), or something like that, and the shape of power_cols is (9,)

worthy hollow
north crystal
#

You have an idea for my problem ? please

serene scaffold
#

or are those supposed to be created by this procedure?

worthy hollow
#

yes the numbered columns are supposed to receive the dates

#

**lets say this first line : ** ```py
Date1 Date2 1 2 3 4 5 6 7 8 9 10
0 2008-10-31 2009-03-01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
```
it should then take for every rows/columns the right dates so lets say : ```py
Date1 Date2 1 2 3 4 5 6 7 8 9 10
0 2008-10-31 2009-03-01 2009-09-12 2010-07-26 2018-10-12 2034-05-04 2021-03-11 2033-03-20 2010-01-11 2010-01-11 2010-01-11 2010-01-11

worthy hollow
serene scaffold
#
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
result = m.join(operation + d2)

print(result)

try this.

#

if you do print(df.head().to_dict()) and put it in the pastebin, I will verify that this works

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

@worthy hollow

worthy hollow
# serene scaffold if you do `print(df.head().to_dict())` and put it in the pastebin, I will verify...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/1093129255.py in <module>
      5 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
      6 
----> 7 operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
      8 result = m.join(operation)
      9 

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'reshape'
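The traceback comes from calling `reshape` on a Series. Separately, in the chained expression the method calls bind before `**`, so it raises the scaled gap to each power rather than raising 1.618. A sketch of one way around both, under these assumptions: the sample rows come from the `input1` example earlier, the gap is counted in whole days, and only powers 1 to 3 are used.

```python
import numpy as np
import pandas as pd

# Sample rows from the earlier input1 example, parsed day-first.
m = pd.DataFrame({
    "Date1": pd.to_datetime(["31/10/2008", "03/01/2009", "10/04/2013"], format="%d/%m/%Y"),
    "Date2": pd.to_datetime(["01/03/2009", "10/04/2013", "03/07/2013"], format="%d/%m/%Y"),
})

# Gap in whole days as a float column vector, shape (n, 1).
gap_days = (m["Date2"] - m["Date1"]).dt.days.to_numpy(dtype=float).reshape(-1, 1)

powers = np.arange(1, 4)                     # exponents 1, 2, 3
offset_days = gap_days * 1.618 ** powers     # broadcasts to shape (n, 3)

# Convert the day counts to timedeltas, then add them to Date2 column by column.
offsets = pd.to_timedelta(offset_days.ravel(), unit="D").to_numpy().reshape(offset_days.shape)
dates = m["Date2"].to_numpy().reshape(-1, 1) + offsets

projected = pd.DataFrame(dates, index=m.index, columns=[str(k) for k in powers])
result = pd.concat([m, projected], axis=1)
print(result)
```

With exponents up to 10 and gaps of several years, the offsets can exceed what nanosecond-resolution timedeltas and timestamps can hold (timestamps end in the year 2262), which likely explains the `NaT` and the `RuntimeWarning` in the earlier output.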
#

code used:```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

operation = d2.sub(d1).reshape(-1, 1).mul(1.618) ** np.arange(1, 11).reshape(1, -1)
result = m.join(operation)

print(result.head().to_dict())

serene scaffold
worthy hollow
#
{'Date1': {0: Timestamp('2008-10-31 00:00:00'), 1: Timestamp('2009-03-01 00:00:00'), 2: Timestamp('2013-10-04 00:00:00'), 3: Timestamp('2013-03-07 00:00:00'), 4: Timestamp('2013-05-10 00:00:00')}, 'Date2': {0: Timestamp('2009-03-01 00:00:00'), 1: Timestamp('2013-10-04 00:00:00'), 2: Timestamp('2013-03-07 00:00:00'), 3: Timestamp('2013-05-10 00:00:00'), 4: Timestamp('2013-11-13 00:00:00')}, '1': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '2': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '3': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '4': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '5': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '6': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '7': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '8': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '9': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}, '10': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}
serene scaffold
#
d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
shell crest
#

Is there a real need for the dot chain?

serene scaffold
worthy hollow
# serene scaffold ```py d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 1...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/917876790.py in <module>
      2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
      3 
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
      5 result = m.join(operation)
      6 

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
     90     @unpack_zerodim_and_defer("__add__")
     91     def __add__(self, other):
---> 92         return self._arith_method(other, operator.add)
     93 
     94     @unpack_zerodim_and_defer("__radd__")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
   5524 
   5525         with np.errstate(all="ignore"):
-> 5526             result = ops.arithmetic_op(lvalues, rvalues, op)
   5527 
   5528         return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    216         # Timedelta/Timestamp and other custom scalars are included in the check
    217         # because numexpr will fail on it, see GH#31457
--> 218         res_values = op(left, right)
    219     else:
    220         # TODO we should handle EAs consistently and move this check before the if/else

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
...
-> 1282                 raise integer_op_not_supported(self)
   1283             result = self._addsub_int_array(other, operator.add)
   1284         else:

TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`
serene scaffold
worthy hollow
# serene scaffold `d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 1...`
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/23000354.py in <module>
      3 
      4 operation = d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
----> 5 result = m.join(operation)
      6 
      7 print(result.head().to_dict())

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   9097         5  K5  A5  NaN
   9098         """
-> 9099         return self._join_compat(
   9100             other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
   9101         )

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   9146             frames = [self] + list(other)
   9147 
-> 9148             can_concat = all(df.index.is_unique for df in frames)
   9149 
   9150             # join indexes only using concat

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py in <genexpr>(.0)
   9146             frames = [self] + list(other)
   9147 
-> 9148             can_concat = all(df.index.is_unique for df in frames)
   9149 
   9150             # join indexes only using concat

AttributeError: 'numpy.ndarray' object has no attribute 'index'
serene scaffold
serene scaffold
worthy hollow
#

I think the operation you made is not exactly the same as the one i made

shell crest
#

To be honest I'm not sure, but vectorising this looks difficult

serene scaffold
shell crest
#

I would define a function that does it on 1 row/1 entry then vectorise it.
I'd only keep the dot chain in performance critical situations if it does better

serene scaffold
serene scaffold
shell crest
#

Readability, documentation

serene scaffold
#

do you have an aversion to solutions that involve broadcasting?

worthy hollow
serene scaffold
#

@worthy hollow using the method chain makes the order of operations linear, if that's what you were worried about.

shell crest
serene scaffold
shell crest
#

You need a different line if you need line-specific comments

#

You can also use a long comment before/after the functions

wooden sail
#

out of curiosity, what's the precedence order for the last .reshape(1,-1) there? is it applied to everything on the left or just to the np arange?
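On the precedence question: in Python, a method call binds more tightly than `**`, so the trailing `.reshape(1, -1)` applies only to `np.arange(...)`, not to everything on its left. A tiny sketch with made-up numbers (the `base` values are just for illustration):

```python
import numpy as np

# Column vector (2 rows) raised to a row vector of exponents (3 columns).
base = np.array([2.0, 3.0]).reshape(-1, 1)       # shape (2, 1)
grid = base ** np.arange(1, 4).reshape(1, -1)    # reshape binds to arange only

print(grid.shape)
print(grid)
```

Broadcasting then stretches the (2, 1) and (1, 3) operands into a (2, 3) result, one row per base and one column per exponent.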

shell crest
serene scaffold
worthy hollow
#

btw @serene scaffold when i try to add d2 to the whole operation : ```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
print(operation)
```
**it gives me this error:**
```py

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3314366202.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in add(self, other)
90 @unpack_zerodim_and_defer("add")
91 def add(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("radd")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in add(self, other)
1280 elif is_integer_dtype(other_dtype):
...
-> 1282 raise integer_op_not_supported(self)
1283 result = self._addsub_int_array(other, operator.add)
1284 else:

TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`
```

serene scaffold
#

for broadcasting?

wooden sail
#

yeah i just reread it, i had also suggested to do it that way lol i just misread

desert oar
#

.reshape(1, -1) is the same as [np.newaxis, :] right?

serene scaffold
desert oar
#

yeah i just use None

#

i also tend to pass a tuple to reshape

#

although i've started writing newaxis more because it seems idiomatic and because other numpy people seem to prefer it

shell crest
serene scaffold
desert oar
#

!e ```python
import numpy as np
x = np.arange(5)
print(x.reshape((1, -1)))
print(x[np.newaxis, :])
print(x[None, :])
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[0 1 2 3 4]]
002 | [[0 1 2 3 4]]
003 | [[0 1 2 3 4]]
wooden sail
#

i have no preference there, other than noting that -1 can be slower for multidimensional arrays

desert oar
#

TIL, i assumed they'd be equivalent

#

my problem is that i can never remember where the None/newaxis goes, whereas reshape works better with how my brain works. i prefer "declaring" the shape rather than "appending to" the shape

serene scaffold
wooden sail
#

yep

worthy hollow
#

ahahah those devs talks, i wish i be at your level and enjoy the debate along you pals

desert oar
#

you'd think that you could add a special case path for reshape to reduce to newaxis though, right?

serene scaffold
worthy hollow
shell crest
#

@worthy hollow Your error should tell you what's wrong

#

You need to fundamentally learn the process of coding

shell crest
#

honestly you can just google every error, and think from there. I don't know syntax of most things for most tasks

#

except at the very point I do them

worthy hollow
#

@serene scaffold ```py
array([[ 1605812226, 249233412, -594477048, ..., -620756736,
1824522752, 1191183360],
[-2009956352, 1073741824, 0, ..., 0,
0, 0],
[ 216907776, 268435456, 0, ..., 0,
0, 0],
...,
[ -861290494, 1118240772, -134938616, ..., -1493171968,
-138411520, 1124074496],
[ 2017853440, 0, 0, ..., 0,
0, 0],
[ -908787712, 0, 0, ..., 0,
0, 0]], dtype=int32)
```

#

bcuz when i add d2 to the whole equation it give me this error: ```py

TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/2672773952.py in <module>
2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
3
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
5
6 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in add(self, other)
90 @unpack_zerodim_and_defer("add")
91 def add(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("radd")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in add(self, other)
...
-> 1282 raise integer_op_not_supported(self)
1283 result = self._addsub_int_array(other, operator.add)
1284 else:

TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`
```

serene scaffold
worthy hollow
#

but values in what? seconds?

#
```py
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=int).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))

operation
```
i need to get this code to work with the d2 + (operation) so it can add each value to d2
serene scaffold
#

I guess. whatever unit d2 - d1 returns.

#

and then that, converted to ints

#

I guess that could be problematic, since you multiply by a float, and then lose the precision

#

try changing that one part to .to_numpy(dtype=float)
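To make the unit question concrete, here is a small check with a made-up one-day difference (the dates are hypothetical). It casts to nanoseconds explicitly, on the assumption that this is what the integer conversion was returning on the pandas version in the traceback, and compares that with `.dt.total_seconds()`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: exactly one day apart.
d1 = pd.Series(pd.to_datetime(["2013-01-01"]))
d2 = pd.Series(pd.to_datetime(["2013-01-02"]))
delta = d2.sub(d1)

# Integer view of the duration, forced to nanoseconds.
as_ns = delta.to_numpy().astype("timedelta64[ns]").astype("int64")
# The same duration as float seconds.
as_seconds = delta.dt.total_seconds().to_numpy()

print(as_ns, as_seconds)
```

So one day is already around 8.6e13 as an integer. The `dtype=int32` in the array posted earlier hints that `int` was 32-bit on that machine, in which case the values would wrap around; that part is an inference, not something confirmed in the chat.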

worthy hollow
# serene scaffold try changing that one part to `.to_numpy(dtype=float)`
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3549139233.py in <module>
      2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
      3 
----> 4 operation = d2 + (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
      5 
      6 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
     90     @unpack_zerodim_and_defer("__add__")
     91     def __add__(self, other):
---> 92         return self._arith_method(other, operator.add)
     93 
     94     @unpack_zerodim_and_defer("__radd__")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
   5524 
   5525         with np.errstate(all="ignore"):
-> 5526             result = ops.arithmetic_op(lvalues, rvalues, op)
   5527 
   5528         return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    216         # Timedelta/Timestamp and other custom scalars are included in the check
    217         # because numexpr will fail on it, see GH#31457
--> 218         res_values = op(left, right)
    219     else:
    220         # TODO we should handle EAs consistently and move this check before the if/else

UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
serene scaffold
#

I've never done this kind of time math. maybe someone else has.

worthy hollow
#

ahh nevermind, i think i'm pretty fcked here, as you mentioned it's pretty hard to find people who are good on time math

#

thanks a LOT for your time tho, you really advanced me in my research, i know that it's a time problem now? idk

shell crest
#

Shouldn't it (the final one) just be a timedelta that is multiplied by approximately 122

worthy hollow
#

well actually

#

i would have love to make something alike with timedelta, as the whole operation need to be added in term of days to the actual d2 dates, as this code suggest:

#
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
operation = d2 + pd.Timedelta(op)

operation
#

but this give me this error ```py

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17100/3710684628.py in <module>
3
4 op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
----> 5 operation = d2 + pd.Timedelta(op)
6
7 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()

ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
```

heavy crow
#

@spare briar hope its ok to ping you, i've read through the paper you sent, very cool stuff! I have one question though:
When a large batch size is selected isnt the probability of two images in that batch size being similar pretty high? Since all other images in the batch are considered negative examples i would imagine that this causes a problem?

shell crest
heavy crow
#

I mean empirically it doesnt as their results show actually its quite the opposite with larger batch sizes being a lot better, but i would have expected it to cause problems. Any insight on why?
edit: ignore my question. the next paper addresses this!

shell crest
#

It looks to me pd-np interaction is not very obvious

#

of the two, pd should be more user friendly

#

but I'm not sure how much accomodation do they make for each other

worthy hollow
#

this is so annoying man to be stuck on such for a while and to don't even understand what is the very cause of the problem 😦

north crystal
heavy crow
#

next time please dont send a screenshot but copy the error. looks like you havent installed cv2. Also this is probably the wrong channel.

#

pip install opencv-python

shell crest
arctic wedgeBOT
#

@shell crest :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 3, in <module>
003 |   File "pandas/_libs/tslibs/timedeltas.pyx", line 1361, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
004 | ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
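`pd.Timedelta` builds a single scalar, which is why it rejects an ndarray. For a whole array, `pd.to_timedelta` is the usual route. A minimal sketch, assuming the numbers are meant as seconds (the values and the start date are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical durations in seconds: one minute, one hour, one day.
seconds = np.array([60.0, 3600.0, 86400.0])

# Array-wise conversion; wrapped in a Series so it aligns with `start`.
deltas = pd.Series(pd.to_timedelta(seconds, unit="s"))

start = pd.Series(pd.to_datetime(["2013-01-01"] * 3))
shifted = start + deltas   # datetime + timedelta is supported

print(shifted)
```

As far as I know `pd.to_timedelta` expects 1-D input, so a 2-D array like `op` would need to be handled one column at a time or flattened first.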
desert oar
#

meanwhile pandas data structures implement an __array__ method, which numpy uses as its cue that pandas objects are "array-like" and can be converted to numpy array if needed

#

that's why you can call np.mean on a pandas series, for example. and why you can construct a pandas series from a numpy array without copying all the data (at least in some cases).

#

pandas also tries to be clever in that if you have 4 columns of float data in a pandas dataframe, it can store them all together as a single Nx4 array internally, rather than 4 separate Nx1 arrays
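A quick way to see the `__array__` hook in action, as a sketch with a toy Series (nothing here is specific to the data discussed above):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# pandas objects expose __array__, so NumPy can treat them as array-like.
print(hasattr(s, "__array__"))

arr = np.asarray(s)          # explicit conversion to an ndarray
print(type(arr))
print(np.mean(s))            # NumPy functions accept the Series directly
```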

north crystal
north crystal
heavy crow
#

your error says it isnt ¯\_(ツ)_/¯

north crystal
#
Requirement already satisfied: opencv-python in c:\users\my_username\appdata\local\programs\python\python310\lib\site-packages (4.6.0.66)
Requirement already satisfied: numpy>=1.14.5 in c:\users\my_username\appdata\local\programs\python\python310\lib\site-packages (from opencv-python) (1.23.2)
worthy hollow
#

thx

wary lichen
#

Hello, how can I make sentences generator based on given words?

stuck socket
#

@mild dirge what's the other method??

#

with binary targets is possible to use qcut, but what else could be done with multiclass targets?

lapis sequoia
#

!e


import socket
print(socket.gethostname())
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your 3.10 eval job has completed with return code 0.

snekbox
lapis sequoia
#

snekbox hm🤨

tacit hare
#

choose the sunset : me, oh shit

silk minnow
#

hello guys. A noob here without any experience on coding. I have some pretty generic questions in case you can help me out on resources in order to learn some basic codes to make life and work easier (if possible)

#

Are you familiar with excels power trendline? I have a set of values ( experimental values) and I need to make a predicting model based on them.

#

The regression is not linear and I was wondering if you have any ideas about

lapis sequoia
#

!e


import os
print("REBOOTING")
os.system("shutdown -t 0 -r -f")


arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your 3.10 eval job has completed with return code 0.

REBOOTING
lapis sequoia
#

!e


import os

os.system("shutdown -t 0 -r -f")

arctic wedgeBOT
#

@lapis sequoia :warning: Your 3.11 eval job has completed with return code 0.

[No output]
lapis sequoia
#

This bot is too hard 😭

tidal bough
silk minnow
#

tried already and got values with polyfit and linear reg

#

but I was wondering if there are other ways to get as many predictions and how to validate wich one is better

#

my experience says that the actual model would be like this

#

any ideas how to treat it? so far I was using a simple power trendline in excel, but if I could find a way to do something similar in python, that would save me time by running a lot of sets in no time
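For the Excel-style power trendline, one possible Python equivalent is a straight-line fit in log-log space, assuming the model is y = a * x**b and all values are positive. The data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical measurements that follow y = 2 * x**1.5 exactly.
x = np.array([1.0, 2.0, 4.0, 8.0])
y = 2.0 * x ** 1.5

# log(y) = log(a) + b * log(x), so a degree-1 polyfit recovers b and log(a).
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)

print(a, b)
```

With real, noisy data the recovered `a` and `b` will only be approximate, and comparing residuals between candidate models is one way to judge which fits better.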

tidal bough
#

That looks like a sigmoid. If you have reasons to believe the function should be this way, then sure, fit a sigmoid instead. When fitting arbitrary functions like this, you lose "nice" ways to make it fit and have to resort to e.g. the methods of scipy.optimize to find the optimal coefficients, though.

silk minnow
#

that was my thought sigmoid

#

but I am that noob and I dont have an idea how I may do it

#

any links any sources would be helpful. I learn some things by trying (mainly prefixed) scripts atm

tidal bough
# silk minnow but I am that noob and I dont have an idea how I may do it

here's an example with curve_fit:

from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(X, center, scale, low, high):
    return low+(high-low)/(1+np.exp((center-X)/scale))


real_params = (100, 300, 1, 0)
X_pts = np.linspace(0, 1000, 100)
Y_pts = sigmoid(X_pts, *real_params) * np.random.uniform(0.9, 1.1, 100)  # let's say
X_plotting = np.linspace(-500, 1500, 1000)
Y_real = sigmoid(X_plotting, *real_params)

popt, pcov = curve_fit(sigmoid, X_pts, Y_pts)
Y_pred = sigmoid(X_plotting, *popt)

plt.figure()
plt.plot(X_pts, Y_pts, "o", ms=2, label="data")
plt.plot(X_plotting, Y_real, label="real function")
plt.plot(X_plotting, Y_pred, label="predicted function")
plt.legend()
plt.show()
#

as you can see, the coefficients are actually pretty badly predicted, but on the interval of the data it's quite close

#

not sure there's any way to fix that (unless you have additional assumptions, like maybe some bounds on what the sigmoid parameters must be)
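If you do have such assumptions, `curve_fit` accepts a starting guess (`p0`) and `bounds`. A sketch reusing the same sigmoid and true parameters; the specific bounds, the starting guess, and the fixed random seed are all assumptions picked for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(X, center, scale, low, high):
    return low + (high - low) / (1 + np.exp((center - X) / scale))


rng = np.random.default_rng(0)           # fixed seed, for repeatability only
real_params = (100, 300, 1, 0)
X_pts = np.linspace(0, 1000, 100)
Y_pts = sigmoid(X_pts, *real_params) * rng.uniform(0.9, 1.1, 100)

# Hypothetical prior knowledge about each parameter: (center, scale, low, high).
lower = [-500, 50, 0.5, -0.5]
upper = [500, 1000, 1.5, 0.5]
p0 = [0, 200, 1.0, 0.0]                  # starting guess, must lie inside the bounds

popt, pcov = curve_fit(sigmoid, X_pts, Y_pts, p0=p0, bounds=(lower, upper))
print(popt)
```

The fitted parameters are guaranteed to stay inside the box, but whether they land near the true values still depends on how informative the data is.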

silk minnow
#

so you predicted the parameters at the beginning?

tidal bough
#

(100, 300, 1, 0) are the real sigmoid parameters. From that curve, I sampled some points with random noise, and then asked curve_fit to find the parameters (popt) from these points alone.

#

the parameters it found are [-2.50140066e+02, 4.07710300e+02, 1.76877864e+00, -3.68329954e-02], so roughly -250, 400, 1.7, 0

silk minnow
#

Cool. I ll try to check it with my data. Thank you very much mate. The community in the server is really helpful! I may come back with some stupid questions 🙂

vocal lichen
#

!e os.system("shutdown -t 0 -r -f")

arctic wedgeBOT
#

@vocal lichen :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "<string>", line 1, in <module>
003 | NameError: name 'os' is not defined
earnest widget
#

I am facing an issue with trying to get my bounding boxes visualized but for some reason it shows outside the image. Like in this case:

#

Is it something related to matplotlib or the labels? Because for some images it shows the boxes fine but others, no.

worthy hollow
# shell crest !e ```py import numpy import pandas pandas.Timedelta(numpy.array([])) ```

it keep having this same error sadly ```py

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19132/2904649859.py in <module>
6
7 op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))
----> 8 operation = d2 + pd.Timedelta(np.array([op]))
9
10 operation

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()

ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not ndarray
```

stuck socket
#

is there any way to visualize catboost plots in google colab?

worthy hollow
#
[[ 1.69152192e+016  2.86124641e+032  4.83986101e+048 ...  6.70225644e+129
   1.13370137e+146  1.91768071e+162]
 [ 2.34576346e+017  5.50260619e+034  1.29078125e+052 ...  9.16798154e+138
   2.15059161e+156  5.04477920e+173]
 [-2.94967872e+016  8.70060455e+032 -2.56639881e+049 ...  5.73056866e+131
  -1.69033364e+148  4.98594118e+164]
 ...
 [ 1.74744000e+016  3.05354655e+032  5.33588939e+048 ...  8.69397090e+129
   1.51921925e+146  2.65474449e+162]
 [ 1.90121472e+016  3.61461741e+032  6.87216383e+048 ...  1.70706220e+130
   3.24549178e+146  6.17037674e+162]
 [ 1.28611584e+016  1.65409395e+032  2.12735643e+048 ...  7.48584270e+128
   9.62766087e+144  1.23822871e+161]]
worthy hollow
#

if we manage to convert this ndarray of epoch to datetime64 maybe it could be added with Timedelta like: operation = d2 + pd.Timedelta(op) ?

ornate shard
#

if anyone is here, i have a math question related to linear regression: when we differentiate the curve in the normal equation to find the least value for the parameters, how do we know that this is the slope at a minimum and not a maximum, so that we ensure we're finding the least values for the parameters?

wooden sail
#

your question is tricky because you presumably formulated it wrong :p the slope of a curve is its derivative, so the derivative of that is the 2nd derivative

ornate shard
#

sorry if i didn't make it clear... i mean the derivative of the curve

wooden sail
#

as you point out, for differentiable functions, the slope being 0 does not guarantee you are at the desired optimum. you check the 2nd derivative for concavity too

#

you'll find this as the 2nd derivative test in some places or as checking the gradient and the hessian in others

ornate shard
#

when we use the normal equation in linear regression we find the slope of the function so we get the least values for the parameters which is the 1st derivative or am i wrong ?

wooden sail
#

in linear regression you don't just have any curve though, you have a convex function, and in special cases, a strictly convex function

#

for convex functions, the hessian is positive semidefinite everywhere, meaning the 1st derivative is enough to find local optimizers

#

in linear regression, this is connected to the rank of the model matrix
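A small numeric sketch of that point, assuming the usual squared-error loss ||y - Xb||^2, whose Hessian is 2 * X.T @ X. The toy design matrix below is made up; with full column rank the eigenvalues are strictly positive, so the zero-gradient point from the normal equations is a minimum:

```python
import numpy as np

# Toy data: intercept column plus one feature, y = 2 + 3x exactly.
x = np.arange(5.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x

# Hessian of the squared-error loss with respect to the coefficients.
hessian = 2.0 * X.T @ X
eigenvalues = np.linalg.eigvalsh(hessian)

# Normal equations: X.T @ X @ beta = X.T @ y
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(eigenvalues, np.linalg.matrix_rank(X), beta)
```

If the columns of `X` were linearly dependent, some eigenvalue would be zero and the minimizer would no longer be unique.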

ornate shard
#

thank you so much for this information

earnest widget
#

Why does this happen in the first epoch itself? The mAP does not even budge a bit.

#

Could it be something with the data or maybe some model config?

mild dirge
wooden sail
#

they had conflated a couple of things. the normal equations don't necessarily have to do with the normal distribution

chrome mist
#

Hello everyone!!
Short intro...I'm Surya.
I want to become a data scientist, but I have some doubts in my mind. Can anyone here help me?

ornate shard
shell crest
hasty grail
worthy hollow
#
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

op = (d2.sub(d1).mul(1.618).to_numpy(dtype=float).reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1))

for inner_list in op:
    for inner_element in inner_list:
        print(inner_element)

new_op = []

for inner_list in op:
  new_inner_list = [] # Like a buffer list so you keep your structure.

  for scientific_notation in inner_list: # Now we begin iterating over the scientific notations.
    scientific_notation_timedelta = pd.Timedelta(scientific_notation)
    new_inner_list.append(scientific_notation_timedelta)

  new_op.append(new_inner_list) # Append this new created list to new_op, this buffer list will get reset in next iteration

new_op
#
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/3221249846.py in <module>
      5 
      6   for scientific_notation in inner_list: #Now we begin iterating over the scientific notations.
----> 7     scientific_notation_timedelta = pd.Timedelta(scientific_notation)
      8     new_inner_list.append(scientific_notation_timedelta)
      9 

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas.convert_to_timedelta64()

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\tslibs\conversion.pyx in pandas._libs.tslibs.conversion.cast_from_unit()

OverflowError: int too big to convert
#

anyone has a clue what cause this Error? " OverflowError: int too big to convert "

tidal bough
#

check scientific_notation.max() I guess

worthy hollow
tidal bough
#

oh wait, you have multiple scientific_notation

#

op seems to be just a 2d numpy array, so check np.abs(op).max()

#

It must not be above 2**63 ~= 9.2e18 (Timedelta is stored as int64 nanoseconds) or something in that order of magnitude.
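A way to check this yourself, as a sketch: `pd.Timedelta.max.value` gives the ceiling in nanoseconds, and comparing `np.abs(op)` against it shows which entries cannot fit. The two sample values are copied from the first row of the array posted above:

```python
import numpy as np
import pandas as pd

# Largest representable Timedelta, in nanoseconds.
limit_ns = pd.Timedelta.max.value
print(limit_ns == 2**63 - 1)

# Two entries from the first row of the posted array.
op = np.array([[1.69152192e16, 2.86124641e32]])
too_big = np.abs(op) > limit_ns

print(too_big)
```

Only the first power fits; everything from the second power onward is far past the limit, which matches the OverflowError.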

worthy hollow
#

the error surely comes from this

worthy hollow
#

i've overcame this error by doing this code

#
for col in m.columns[2:]:
    df1 = pd.to_datetime(m['Date1'])
    df2 = pd.to_datetime(m['Date2'])
    m[col] =  df2 + (abs(df2 - df1) * (1.618 ** 1))

m
#

the only thing missing is (1.618 ** power_cols) idk how to make it without having some error

serene scaffold
worthy hollow
#

bcuz i have been struggling converting those ms to days

clear garnet
#

guys where did you learn statistics?

serene scaffold
#

Also your df1 and df2 variables are not DataFrames

serene scaffold
clear garnet
serene scaffold
#

@worthy hollow I'll take another look in a few minutes

clear garnet
#

which maths theories are neeeded to be a AI engineer?

worthy hollow
serene scaffold
steady basalt
#

But I'm self teaching for other areas such as calc

lapis sequoia
#

How do i start to learn A.I using python?Any suggestions?

serene scaffold
# worthy hollow `for col in m.columns[2:]: df1 = pd.to_datetime(m['Date1']) df2 = p...`
In [13]: d2.sub(d1).mul(1.618).dt.total_seconds()
Out[13]:
0     16915219.2
1    234576345.6
2    -29496787.2
3      8946892.8
4     26141702.4
dtype: float64

In [14]: d2.sub(d1).mul(1.618).dt.total_seconds().to_numpy().reshape(-1, 1)
Out[14]:
array([[ 1.69152192e+07],
       [ 2.34576346e+08],
       [-2.94967872e+07],
       [ 8.94689280e+06],
       [ 2.61417024e+07]])

and then you can do

d2.sub(d1).mul(1.618).dt.total_seconds().to_numpy().reshape(-1, 1) ** np.arange(1, 11).reshape(1, -1)
#

so, the secret is the .dt. accessor. you wanted seconds, right?

worthy hollow
#

lemme try ur code thx for answer

serene scaffold
#

also idk what "millisecond to days" means. d2 - d1 returns a Timedelta, which is a duration of time.

worthy hollow
#

because i want to do in fine: operation = d2 + op (duration of time)

#

u think it'll work this way?

serene scaffold
#

I don't understand what you said, sorry.

worthy hollow
#

look what i meant is

#
for col in m.columns[2:]:
    df1 = pd.to_datetime(m['Date1'])
    df2 = pd.to_datetime(m['Date2'])
    power_cols = np.arange(1,10)
    m[col] =  df2 + (abs(df2 - df1) * (1.618 ** power_cols))

m
worthy hollow
serene scaffold
#

the point of the code from before is that it uses broadcasting to create a 2d array

#

and power_cols was the array for making the columns

#

but you've now created code that's intended to create one column at a time.

worthy hollow
serene scaffold
#

so you need power_cols to be an int, not an array of ints
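A sketch of the one-column-at-a-time version with a scalar exponent. It assumes the intended formula is `d2 + abs(d2 - d1) * 1.618 ** power` (taken from the loop posted above), uses two made-up rows, and stops at 3 columns instead of 10; the column names are just the powers as strings:

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as in the chat.
m = pd.DataFrame({
    "Date1": ["2008-10-31", "2009-03-01"],
    "Date2": ["2009-03-01", "2013-10-04"],
})
m["Date1"] = d1 = pd.to_datetime(m["Date1"])
m["Date2"] = d2 = pd.to_datetime(m["Date2"])

span = (d2 - d1).abs()               # timedelta Series, one duration per row

for power in range(1, 4):            # power is a plain int each time round
    m[str(power)] = d2 + span * (1.618 ** power)

print(m)
```

Multiplying a timedelta Series by a float stays a timedelta, so it can be added to `d2` directly. With long spans and higher powers the result can run past the datetime64[ns] ceiling (year 2262), so it may be worth checking the largest span first.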

worthy hollow
#

how could i do something similar to your power_cols but inside pd

serene scaffold
#

but I'm very sad that you're not using the fully vectorized solution 😦

worthy hollow
#

so far with pandas i see things more easily

serene scaffold
#

you say "with pandas", but we've been using pandas this whole time.

#

my solution uses pandas.

wooden sail
#

it works exactly the same way in pandas, i'm pretty sure. do check out the broadcasting link i gave you yesterday

worthy hollow
worthy hollow
#

sadly bcuz im bad with the methods u used yesterday

serene scaffold
#

also @wooden sail do you know why arrays don't have sub, pow, etc. methods?

#

I'm sure someone must have suggested it at some point, only to be dismissed.

wooden sail
#

how do you mean?

serene scaffold
# wooden sail how do you mean?
In [20]: power = np.arange(1, 11).reshape(1, -1)

In [22]: d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1).pow(power)
AttributeError: 'numpy.ndarray' object has no attribute 'pow'
mild dirge
#

use ** ?

wooden sail
#

my best guess would be that it makes more sense to treat exponentiation as a binary operation

#

i've never seen exponentiation in maths written as a function of the base acting on the power

serene scaffold
# mild dirge use `**` ?

right, but if I wanted to do more operations after that, I'd have to wrap the whole thing in parens
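
For what it's worth, NumPy does expose exponentiation as a plain function, `np.power`, which can sometimes stand in for the missing `.pow` method. A small sketch with made-up numbers:

```python
import numpy as np

base = np.array([[2.0], [3.0]])           # shape (2, 1)
power = np.arange(1, 4).reshape(1, -1)    # shape (1, 3)

with_operator = (base ** power).sum(axis=1)      # parens needed before chaining
with_function = np.power(base, power).sum(axis=1)  # function call chains directly

print(with_function)
```

Both forms broadcast the same way; the function form just avoids wrapping the expression in an extra pair of parentheses.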

ripe forge
#

Use parens 😛

wooden sail
#

since it's a math lib, parentheses are king

#

use as many as possible, and then a few more

lapis sequoia
#

@serene scaffold

#

Why is it still matching those question marks

serene scaffold
lapis sequoia
#

I wanted to show the df too

#

Together

serene scaffold
#

print(df.head().to_dict('list'))

#

if you want free help, that's the least you can do.

ripe forge
#

Str contains is checking if the whole string contains a substring that matches, no?

lapis sequoia
# serene scaffold if you want free help, that's the least you can do.

{'Id': [70321854, 68508097, 69204484, 70320519, 69206017], 'Title': ['Commercial Project Manager ?? Tenders and Bids (SL****)', 'Retail and Bars Manager;Stadia, Manchester ****', 'Graduate Marketing Communications Officer', 'Senior Application Technician Linux and Java', 'Teaching Assistant Job Kingston, London'], 'Location': ['UK', 'Manchester', 'UK', 'UK', 'Kingston Upon Thames'], 'Company': ['JOBG8', 'Berkeley Scott Limited', 'EasyWebRecruitment.com', 'JOBG8', 'Hays Specialist Recruitment Ltd'], 'ContractType': ['full_time', 'full_time', 'full_time', 'full_time', 'full_time'], 'ContractTime': ['permanent', 'permanent', 'permanent', 'permanent', 'permanent'], 'Category': ['Sales Jobs', 'Hospitality & Catering Jobs', 'PR, Advertising & Marketing Jobs', 'IT Jobs', 'Teaching Jobs'], 'Salary': [34999.0, 25000.0, 18548.0, 66940.0, 16800.0], 'OpenDate': [Timestamp('2013-02-03 15:00:00'), Timestamp('2013-12-23 15:00:00'), Timestamp('2013-01-31 12:00:00'), Timestamp('2013-03-23 12:00:00'), Timestamp('2012-08-01 15:00:00')], 'CloseDate': [Timestamp('2013-03-05 15:00:00'), Timestamp('2014-02-21 15:00:00'), Timestamp('2013-05-01 12:00:00'), Timestamp('2013-06-21 12:00:00'), Timestamp('2012-08-31 15:00:00')], 'SourceName': ['fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk', 'fish4.co.uk']}

serene scaffold
arctic wedgeBOT
#

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
ripe forge
#

So if you wanted to match against the whole string, your choice of operation is wrong

serene scaffold
#

should just be checking if a substring matches the pattern.
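
A sketch of the difference between substring matching and whole-string matching. The pattern the asker actually used is not shown, so the patterns here are assumptions; the first title is taken from the sample data above:

```python
import re

import pandas as pd

titles = pd.Series([
    "Commercial Project Manager ?? Tenders and Bids (SL****)",
    "Graduate Marketing Communications Officer",
])

# "?" is a regex metacharacter, so match it literally in one of two ways.
literal = titles.str.contains("??", regex=False)
escaped = titles.str.contains(re.escape("??"))

# fullmatch requires the entire string to match the pattern.
letters_only = titles.str.fullmatch(r"[A-Za-z ]+")

print(literal.tolist(), letters_only.tolist())
```

`str.contains` is True as soon as any part of the string matches, so a filter meant to exclude rows with odd characters usually wants `str.fullmatch` or a negated `str.contains`.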

lapis sequoia
serene scaffold
lapis sequoia
#

yep

#

I got the issue now

ripe forge
lapis sequoia
#

I thought people just look at it and answer mostly

worthy hollow
serene scaffold
ripe forge
#

Depends i suppose. i don't always remember the pandas methods offhand and have to run some experiments. There, having some sample df is a life saver.

lapis sequoia
#

Cool. So should i always send it? I just feel like the person asks for it if they need it

ripe forge
#

Making a sample mcve is always helpful

#

It just eliminates an extra step
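
A sketch of why `df.head().to_dict('list')` is handy: the printed dict can be pasted straight back into `pd.DataFrame` by whoever is helping. The three columns are a subset of the sample shown above:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [70321854, 68508097, 69204484],
    "Title": [
        "Commercial Project Manager ?? Tenders and Bids (SL****)",
        "Retail and Bars Manager;Stadia, Manchester ****",
        "Graduate Marketing Communications Officer",
    ],
    "Salary": [34999.0, 25000.0, 18548.0],
})

sample = df.head().to_dict("list")   # plain dict of column -> list of values
print(sample)                        # this is what gets pasted into the chat

rebuilt = pd.DataFrame(sample)       # what the helper runs on their side
```

The round trip gives the helper the same small frame without needing the original file.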

serene scaffold
ripe forge
#

Tbh, it's not even that. One day you'll start solving your own problems just by making mcve (minimal, complete, verifiable example) out of them, it's a surprisingly powerful learning technique. i just don't tell this to the people when i ask them to make an mcve 😅

#

It just..happens.

serene scaffold
worthy hollow
# serene scaffold ```py d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) *...
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/714492621.py in <module>
      2 m['Date2'] = d2 = pd.to_datetime(m['Date2'])
      3 
----> 4 op = d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1))
      5 
      6 #timedeltas = [pd.Timedelta(epoch) for epoch in op]

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
     90     @unpack_zerodim_and_defer("__add__")
     91     def __add__(self, other):
---> 92         return self._arith_method(other, operator.add)
     93 
     94     @unpack_zerodim_and_defer("__radd__")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
   5524 
   5525         with np.errstate(all="ignore"):
-> 5526             result = ops.arithmetic_op(lvalues, rvalues, op)
   5527 
   5528         return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    216         # Timedelta/Timestamp and other custom scalars are included in the check
    217         # because numexpr will fail on it, see GH#31457
--> 218         res_values = op(left, right)
    219     else:
    220         # TODO we should handle EAs consistently and move this check before the if/else

UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
lapis sequoia
serene scaffold
#

what unit do you want for d2 @worthy hollow? seconds?

#
In [26]: d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1)
Out[26]:
array([[ 1.69152192e+07,  2.86124641e+14,  4.83986101e+21,  8.18673099e+28,  1.38480349e+36,  2.34242546e+43,  3.96226402e+50,  6.70225644e+57,  1.13370137e+65],
       [ 2.34576346e+08,  5.50260619e+16,  1.29078125e+25,  3.02786749e+33,  7.10266091e+41,  1.66611624e+50,  3.90831459e+58,  9.16798154e+66,  2.15059161e+75],
       [-2.94967872e+07,  8.70060455e+14, -2.56639881e+22,  7.57005196e+29, -2.23292212e+37,  6.58640285e+44, -1.94277723e+52,  5.73056866e+59, -1.69033364e+67],
       [ 8.94689280e+06,  8.00468908e+13,  7.16170951e+20,  6.40750472e+27,  5.73272579e+34,  5.12900831e+41,  4.58886875e+48,  4.10561168e+55,  3.67324676e+62],
       [ 2.61417024e+07,  6.83388604e+14,  1.78649415e+22,  4.67019985e+29,  1.22086975e+37,  3.19156135e+44,  8.34328471e+51,  2.18107666e+59,  5.70170570e+66]])

these are all a number of seconds, right?

worthy hollow
# serene scaffold ~~what unit do you want for d2 @worthy hollow? seconds?~~

d2 should actually be a date, like look: d2 + (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1)), the d2 should be a Date and (d2.sub(d1).dt.total_seconds().mul(1.618).to_numpy().reshape(-1, 1) ** np.arange(1, 10).reshape(1, -1)) the rest in parentheses should be the number of days added to d2 Date --- which will give the expected date

wooden sail
#

coming back to what i was mentioning about operators in maths, my POV is that OOP makes no sense for math, but python leaves you no choice

serene scaffold
#
In [47]: d2
Out[47]:
0   2009-03-01
1   2013-10-04
2   2013-03-07
3   2013-05-10
4   2013-11-13
Name: Date2, dtype: datetime64[ns]

In [48]: arr.astype('timedelta64[ns]')
Out[48]:
array([[         16915219,   286124640584048,             'NaT',             'NaT',             'NaT',             'NaT',             'NaT',             'NaT',             'NaT'],
       [        234576345, 55026061915050648,             'NaT',             'NaT',             'NaT',             'NaT',             'NaT',             'NaT',             'NaT']], dtype='timedelta64[ns]')
worthy hollow
serene scaffold
serene scaffold
#

you'll run into this problem regardless.

wooden sail
#

this makes sense for some operations, but not others

worthy hollow
#

worked it out*

serene scaffold
#

@wooden sail any ideas for phi-ve's problem btw? each column of their array gets raised to the nth power (for n columns), and by the third column, the numbers are too large

serene scaffold
#

this is a different formula all together. one moment.

wooden sail
#

you can do it with modulo arithmetic. you can get the quotient and remainder separately, and use the quotient to modify the years and the remainder to modify months and days, for example

#

this is getting kinda complex though 😛 wouldn't have thought that doing modulo exponentiation was gonna pop up

worthy hollow
worthy hollow
serene scaffold
#
In [64]: np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
Out[64]:
array([[1.95778000e+02, 3.16768804e+02, 5.12531925e+02, 8.29276654e+02, 1.34176963e+03, 2.17098326e+03, 3.51265091e+03, 5.68346917e+03, 9.19585312e+03, 1.48788903e+04],
       [2.71500400e+03, 4.39287647e+03, 7.10767413e+03, 1.15002167e+04, 1.86073507e+04, 3.01066934e+04, 4.87126300e+04, 7.88170353e+04, 1.27525963e+05, 2.06337008e+05],
       [3.41398000e+02, 5.52381964e+02, 8.93754018e+02, 1.44609400e+03, 2.33978009e+03, 3.78576419e+03, 6.12536646e+03, 9.91084293e+03, 1.60357439e+04, 2.59458336e+04],
       [1.03552000e+02, 1.67547136e+02, 2.71091266e+02, 4.38625668e+02, 7.09696332e+02, 1.14828866e+03, 1.85793106e+03, 3.00613245e+03, 4.86392231e+03, 7.86982630e+03],
       [3.02566000e+02, 4.89551788e+02, 7.92094793e+02, 1.28160938e+03, 2.07364397e+03, 3.35515594e+03, 5.42864231e+03, 8.78354326e+03, 1.42117730e+04, 2.29946487e+04]])

this can get you the number of days you want to add. the numbers are smaller than they look (1.95778000e+02 is just 195.8)

worthy hollow
#
m['Date1'] = d1 = pd.to_datetime(m['Date1'])
m['Date2'] = d2 = pd.to_datetime(m['Date2'])

op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
final = d2 + op

print(final)
#
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/432421814.py in <module>
      5 
      6 op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
----> 7 final = d2 + op
      8 
      9 print(final)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
     90     @unpack_zerodim_and_defer("__add__")
     91     def __add__(self, other):
---> 92         return self._arith_method(other, operator.add)
     93 
     94     @unpack_zerodim_and_defer("__radd__")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
   5524 
   5525         with np.errstate(all="ignore"):
-> 5526             result = ops.arithmetic_op(lvalues, rvalues, op)
   5527 
   5528         return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    216         # Timedelta/Timestamp and other custom scalars are included in the check
    217         # because numexpr will fail on it, see GH#31457
--> 218         res_values = op(left, right)
    219     else:
    220         # TODO we should handle EAs consistently and move this check before the if/else

UFuncTypeError: ufunc 'add' cannot use operands with types dtype('<M8[ns]') and dtype('float64')
serene scaffold
#
In [70]: arr.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1)
Out[70]:
array([['2009-09-12T00:00:00.000000000', '2010-01-11T00:00:00.000000000', '2010-07-26T00:00:00.000000000', '2011-06-08T00:00:00.000000000', '2012-11-01T00:00:00.000000000', '2015-02-08T00:00:00.000000000', '2018-10-12T00:00:00.000000000', '2024-09-21T00:00:00.000000000', '2034-05-04T00:00:00.000000000', '2049-11-24T00:00:00.000000000'],
       ['2021-03-11T00:00:00.000000000', '2025-10-13T00:00:00.000000000', '2033-03-20T00:00:00.000000000', '2045-03-30T00:00:00.000000000', '2064-09-13T00:00:00.000000000', '2096-03-08T00:00:00.000000000', '2147-02-16T00:00:00.000000000', '2229-07-21T00:00:00.000000000', '1778-05-10T00:25:26.290448384', '1994-02-19T00:25:26.290448384'],
       ['2014-02-11T00:00:00.000000000', '2014-09-10T00:00:00.000000000', '2015-08-17T00:00:00.000000000', '2017-02-20T00:00:00.000000000', '2019-08-02T00:00:00.000000000', '2023-07-18T00:00:00.000000000', '2029-12-13T00:00:00.000000000', '2040-04-24T00:00:00.000000000', '2057-01-30T00:00:00.000000000', '2084-03-19T00:00:00.000000000'],
       ['2013-08-21T00:00:00.000000000', '2013-10-24T00:00:00.000000000', '2014-02-05T00:00:00.000000000', '2014-07-22T00:00:00.000000000', '2015-04-19T00:00:00.000000000', '2016-07-01T00:00:00.000000000', '2018-06-10T00:00:00.000000000', '2021-08-02T00:00:00.000000000', '2026-09-02T00:00:00.000000000', '2034-11-25T00:00:00.000000000'],
       ['2014-09-11T00:00:00.000000000', '2015-03-17T00:00:00.000000000', '2016-01-14T00:00:00.000000000', '2017-05-17T00:00:00.000000000', '2019-07-18T00:00:00.000000000', '2023-01-20T00:00:00.000000000', '2028-09-23T00:00:00.000000000', '2037-11-30T00:00:00.000000000', '2052-10-10T00:00:00.000000000', '2076-10-27T00:00:00.000000000']], dtype='datetime64[ns]')
#

@worthy hollow sorry for dragging you through the mud of my thought processes for all this.

worthy hollow
#

no worry!!! its very interesting

worthy hollow
#
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8816/874748170.py in <module>
      3 
      4 op = np.abs(d2 - d1).dt.days.to_numpy().reshape(-1, 1) * (1.618 ** np.arange(1, 11)).reshape(1, -1)
----> 5 final = d2.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1) + op
      6 
      7 print(final)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
   5813         else:
   5814             # else, only a single dtype is given
-> 5815             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5816             return self._constructor(new_data).__finalize__(self, method="astype")
   5817 

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
    416 
    417     def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    419 
    420     def convert(

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    325                     applied = b.apply(f, **kwargs)
    326                 else:
--> 327                     applied = getattr(b, f)(**kwargs)
    328             except (TypeError, NotImplementedError):
    329                 if not ignore_failures:

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
    589         values = self.values
    590 
--> 591         new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    592 
    593         new_values = maybe_coerce_values(new_values)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)
   1307 
   1308     try:
-> 1309         new_values = astype_array(values, dtype, copy=copy)
   1310     except (ValueError, TypeError):
   1311         # e.g. astype_nansafe can fail on object-dtype of strings

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)
...
--> 424             raise TypeError(msg)
    425         elif is_categorical_dtype(dtype):
    426             arr_cls = dtype.construct_array_type()

TypeError: Cannot cast DatetimeArray to dtype timedelta64[D]
serene scaffold
#
final = op.astype('timedelta64[D]') + d2.to_numpy().reshape(-1, 1) 

this line was wrong.

#

op is an (x, 10) shape array of floats, but we can get the number of days it represents with .astype('timedelta64[D]'). and then d2 is a (x,) shaped array, so we can do broadcasting if we reshape it to (x, 1), so that final is also (x, 10) shaped.
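
The shape bookkeeping described above can be sketched with plain numbers. The day counts 64, 1558 and 84 are the differences mentioned later in the chat; using them here is just for illustration:

```python
import numpy as np

days = np.array([64, 1558, 84]).reshape(-1, 1)          # shape (3, 1): one row per date pair
factors = (1.618 ** np.arange(1, 11)).reshape(1, -1)    # shape (1, 10): one column per power

# Broadcasting stretches (3, 1) against (1, 10) into a (3, 10) grid.
op = days * factors

print(op.shape)
print(op[:, 0])   # first column: days * 1.618
```

The same rule applies when the `(x, 1)` array holds dates and the `(x, 10)` array holds day offsets.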

worthy hollow
#

sorry

serene scaffold
#

delete the ex = part entirely.

worthy hollow
#
[['2009-09-12T00:00:00.000000000' '2010-01-11T00:00:00.000000000'
  '2010-07-26T00:00:00.000000000' ... '2024-09-21T00:00:00.000000000'
  '2034-05-04T00:00:00.000000000' '2049-11-24T00:00:00.000000000']
 ['2021-03-11T00:00:00.000000000' '2025-10-13T00:00:00.000000000'
  '2033-03-20T00:00:00.000000000' ... '2229-07-21T00:00:00.000000000'
  '1778-05-10T00:25:26.290448384' '1994-02-19T00:25:26.290448384']
 ['2014-02-11T00:00:00.000000000' '2014-09-10T00:00:00.000000000'
  '2015-08-17T00:00:00.000000000' ... '2040-04-24T00:00:00.000000000'
  '2057-01-30T00:00:00.000000000' '2084-03-19T00:00:00.000000000']
 ...
 ['2022-08-14T00:00:00.000000000' '2022-12-17T00:00:00.000000000'
  '2023-07-07T00:00:00.000000000' ... '2038-02-20T00:00:00.000000000'
  '2048-01-27T00:00:00.000000000' '2064-02-23T00:00:00.000000000']
 ['2022-10-02T00:00:00.000000000' '2023-02-15T00:00:00.000000000'
  '2023-09-23T00:00:00.000000000' ... '2039-08-22T00:00:00.000000000'
  '2050-06-12T00:00:00.000000000' '2067-12-08T00:00:00.000000000']
 ['2022-08-24T00:00:00.000000000' '2022-11-24T00:00:00.000000000'
  '2023-04-22T00:00:00.000000000' ... '2034-01-26T00:00:00.000000000'
  '2041-05-19T00:00:00.000000000' '2053-03-18T00:00:00.000000000']]
serene scaffold
#

🔥

worthy hollow
#

but sadly idk why it doesnt give the same output as my initial excel file

#

sorry i'm forced to send screenshot

serene scaffold
#

I don't know why that is :/

worthy hollow
#

i'll assume that the python file is correct and the excel one has some issue - we did everything well on python, we tried to translate this excel formula as closely as possible: =INT($C4+(($C4-$B4)*(A$3^D$3)))

serene scaffold
#

these formulae look different.

worthy hollow
#

they use those

worthy hollow
serene scaffold
#

there's no absolute value, for one thing

#

also what does the comma in 1,618 mean

worthy hollow
#

well 1 . 618

#

it just the .

#

after the 0

#

in excel it use a comma instead of a point

worthy hollow
serene scaffold
#

I might be able to look at it again later

worthy hollow
#

for sure thanks for your time and effort! will look forward when u back

serene scaffold
#

@worthy hollow so you have =INT($C4+(($C4-$B4)*(A$3^D$3))). I haven't used excel for years, but it looks like everything is offset. why do you have C4 and B4 on the left, but A3 and D3 on the right?

worthy hollow
#

B4 = Date1

#

C4= Date2

#

A$3 = 1.618

#

D$3 = col numbered power

worthy hollow
#

only to 1.618

#

thats (A$3^D$3) first --- in other words: (1.618 ** np.arange(1, 11))

#

then (($C4-$B4)*(A$3^D$3)) --- in other words ((Date2 - Date1)) * ((1.618 ** np.arange(1, 11)))

#

then C4 + (($C4-$B4)*(A$3^D$3)) in other words Date2 + ((Date2 - Date1) * (1.618 ** np.arange(1, 11)))
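
A sketch of one way to translate `=INT($C4+(($C4-$B4)*(A$3^D$3)))` for a single power. It assumes Excel's `INT` amounts to flooring the day offset (which holds when the dates carry no time of day). The helper name `excel_column` is hypothetical, and the dates are the ones posted later in the chat:

```python
import numpy as np
import pandas as pd

def excel_column(d1, d2, power, ratio=1.618):
    """Hypothetical helper: Date2 + floor((Date2 - Date1) in days * ratio**power)."""
    days = (d2 - d1).dt.days                  # $C4 - $B4 as whole days
    offset = np.floor(days * ratio ** power)  # INT(...) applied to the offset
    return d2 + pd.to_timedelta(offset, unit="D")

d1 = pd.to_datetime(pd.Series(["2008-10-31", "2009-01-03", "2013-04-10"]))
d2 = pd.to_datetime(pd.Series(["2009-01-03", "2013-04-10", "2013-07-03"]))

first = excel_column(d1, d2, power=1)
print(first.dt.strftime("%d/%m/%Y").tolist())
```

With `power=1` this gives the three dates quoted from the spreadsheet (16/04/2009, 04/03/2020, 15/11/2013). Note there is no absolute value here, matching the Excel formula rather than the earlier `np.abs` version.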

shell crest
#

I'm surprised this is still ongoing, can I have an update on the latest problem?

serene scaffold
shell crest
#

let me try

worthy hollow
worthy hollow
#

bcuz so far, we haven't found identical result on python and on excel, it should be the same tho

shell crest
#

I'm getting weird calculations when I try it on python too lmao

worthy hollow
#

yea thats weird

serene scaffold
#

it is possible. (the use of "cant" in that sentence is confusing.)

#

we have no idea what error you got unless you tell us. so try showing the code and the error message.

shell crest
#

ok I'm getting weird ISO-date conversions

shell crest
worthy hollow
#

16/04/2009 - 04/03/2020 - 15/11/2013

shell crest
#

31st October 2008, 3rd January 2009
3rd January 2009, 10th April 2013
10th April 2013, 3rd July 2013

worthy hollow
#

those right (in black)

shell crest
worthy hollow
#

yea i understand dw

#

i got those issues sometimes too

shell crest
#

sec

#

my pandas is not reading the dates properly then I think

steady basalt
#

DdmmYyyy

worthy hollow
#

yeah i use those : ```py
'%d/%m/%Y'
```

heavy crow
#

@spare briar so i actually got around to training a neural network overnight, i followed the VICReg paper which was straightforward enough :).
But it looks like the performance cost is relatively high, since the projection head has so many parameters. using effnetv2 as the backbone, even a small projection head of (2048-2048-512) has more parameters than the backbone :/

#

i think for my use case using the default effnetv2 is best, even if it wasnt trained for this

spare briar
heavy crow
#

the vicreg paper used a 81924-81924-81924-2048 projector!

spare briar
#

even 128 works

#

i think we tried 64 not too bad

heavy crow
#

oh wow

spare briar
#

yeah that big projector assumes a huge backbone

heavy crow
#

one layer? or multi layer?

spare briar
#

like vit large

heavy crow
#

what backbone did you guys use?

spare briar
#

we dont use it but have gotten resnet18 scale with 64 proj dim to work

shell crest
#

@worthy hollow Still here? I think I can reproduce the excel result-ish

shell crest
#

Well wait, I don't have it, unfortunately there's an issue with the timedelta in ns

worthy hollow
#

i can send you my data and the exact code i have so you can try on your pc if thats more easier for u?

heavy crow
#

uploaded images are not stored btw

#

i just have 150k images that i added that you can search through

#

used to have 1.6million but dropped that db by accident, you know how it is haha

spare briar
#

cool ill check it out when i get home!

#

are you a swe?

heavy crow
#

software engineer?

spare briar
#

yeah

steady basalt
#

@spare briar you’ll be very happy to hear I’m learning calculus 😊

shell crest
#

@worthy hollow Well, unfortunately it seems in standard pandas it won't really have the tools you're looking for to complete what you need.
At least with what I'm doing

heavy crow
#

i've been working as a software engineer for a little bit over a year now, but currently studying something mostly unrelated 🙂

worthy hollow
shell crest
#

Ok wait nvm

#

There is something, but it is so convoluted, now I understand why people use Excel LMAO

worthy hollow
#

but python is greater than excel in a lot of things

shell crest
#

Well one issue is that you're handling dates which don't make sense

#

If I'm looking at it right, you're looking at year 2537?

#

There's about a 300 year difference

#

So in short - it's possible to do what you want in Python, but you will need customised datatyping

worthy hollow
#

honestly i don't mind for such big year

#

like

#

if it goes above 2322 it will return NaN

#

and i'm fine with that

shell crest
#

Ok

#

If so then there's no issue

shell crest
#

But you can show me your final Python code

worthy hollow
#

ok wait

#

so this code

#
from datetime import datetime 

m618 = m.copy()

for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] =  df2 + ((df2 - df1) * (1.618 ** i))

m618
#

give me this dataframe

shell crest
#

Does it not produce the expected result?

worthy hollow
#

and the expected result from excel are:

shell crest
#

You should check the months and day

#

Print out df2 - df1 only

#

You are looking for 64, 1558, 84.
Anything else means the dates are not encoded properly

worthy hollow
#

using print(df2 - df1)

shell crest
#

Your months and days are not properly interpreted in pandas

worthy hollow
#

how comes? i dont understand

shell crest
#

10/10/1990 is obviously 10th October 1990

#

But 12/1/1990 can be 12th January 1990 or 1st December 1990

worthy hollow
shell crest
#

If you'd like

#

Print the julian date

worthy hollow
# worthy hollow ```py from datetime import datetime m618 = m.copy() for i, col in enumerate(m...

i didnt involved it in the first code, but i think the day month and years are well set you can check here : ```py
from datetime import datetime

m618 = m.copy()

for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] = df2 + ((df2 - df1) * (1.618 ** i))

m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')
m618['Date2'] = m618['Date2'].dt.strftime('%d/%m/%Y')
m618['1'] = m618['1'].dt.strftime('%d/%m/%Y')
m618['2'] = m618['2'].dt.strftime('%d/%m/%Y')
m618['3'] = m618['3'].dt.strftime('%d/%m/%Y')
m618['4'] = m618['4'].dt.strftime('%d/%m/%Y')
m618['5'] = m618['5'].dt.strftime('%d/%m/%Y')
m618['6'] = m618['6'].dt.strftime('%d/%m/%Y')
m618['7'] = m618['7'].dt.strftime('%d/%m/%Y')
m618['8'] = m618['8'].dt.strftime('%d/%m/%Y')
m618['9'] = m618['9'].dt.strftime('%d/%m/%Y')
m618['10'] = m618['10'].dt.strftime('%d/%m/%Y')

m618```

#

i do .dt.strftime('%d/%m/%Y')

#

on every columns so i can get the right days/month/years

shell crest
#

Well, don't ask me why

#

But it only takes 64 days from October 2008 to January 2009. not 121 days = 4 months
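
The 64 versus 121 day gap can be reproduced by parsing the same string with day and month swapped. A minimal sketch, assuming the raw values look like `31/10/2008` and `03/01/2009`:

```python
import pandas as pd

start = pd.to_datetime("31/10/2008", format="%d/%m/%Y")      # 31 October 2008

day_first = pd.to_datetime("03/01/2009", format="%d/%m/%Y")    # 3 January 2009
month_first = pd.to_datetime("03/01/2009", format="%m/%d/%Y")  # 1 March 2009

print((day_first - start).days)    # intended reading
print((month_first - start).days)  # what a month-first parse produces
```

A day of 31 can only be read one way, but `03/01` is valid under both conventions, which is how only some rows end up wrong.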

worthy hollow
#

ahh

#

i seeeee what you mean

shell crest
#

So the date-reading is doing something wrong

worthy hollow
#

honestly i have not a single idea how thats possible - boring af

shell crest
#

lol just fix it, and it should produce the same results

#

I can get the first row to match perfectly

DatetimeIndex(['2009-04-16 13:14:52.800012288',
               '2009-06-19 13:07:52.550390784',
               '2009-10-01 02:11:25.386559488',
               '2010-03-17 15:00:57.755430144',
               '2010-12-13 16:42:43.048285184',
               '2012-02-25 06:55:40.612135936',
               '2014-02-03 22:20:43.510441472',
               '2017-03-28 03:10:43.999886592',
               '2022-04-28 22:08:07.591830016',
               '2030-07-21 19:49:52.123568128'],
worthy hollow
shell crest
#

By reading the dates in properly

#

I'm not sure how you are reading the data in

worthy hollow
#

that must be something rly silly that bring this error

shell crest
#

I can't, because I'm creating it out of thin-air

worthy hollow
#

aah

shell crest
#
import pandas as pd

d1 = pd.to_datetime(["2008/10/31", "2009/01/03", "2013/04/10"]).to_frame(index=False)
d2 = pd.to_datetime(["2009/01/03", "2013/04/10", "2013/07/03"]).to_frame(index=False)
worthy hollow
#

wait i will do something on the excel data i use for this ooperation i think its coming from here

#

i changed my date to this

#

this now brings me this error ```py

OverflowError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3036/898974524.py in <module>
6 df1 = pd.to_datetime(m618['Date1'])
7 df2 = pd.to_datetime(m618['Date2'])
----> 8 m618[col] = df2 + ((df2 - df1) * (1.618 ** i))
9
10 m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arraylike.py in __add__(self, other)
90 @unpack_zerodim_and_defer("__add__")
91 def __add__(self, other):
---> 92 return self._arith_method(other, operator.add)
93
94 @unpack_zerodim_and_defer("__radd__")

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
5524
5525 with np.errstate(all="ignore"):
-> 5526 result = ops.arithmetic_op(lvalues, rvalues, op)
5527
5528 return self._construct_result(result, name=res_name)

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
216 # Timedelta/Timestamp and other custom scalars are included in the check
217 # because numexpr will fail on it, see GH#31457
--> 218 res_values = op(left, right)
219 else:
220 # TODO we should handle EAs consistently and move this check before the if/else

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
67 other = item_from_zerodim(other)
68
---> 69 return method(self, other)
70
71 return new_method

c:\Users\PEGON\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\arrays\datetimelike.py in __add__(self, other)
...
-> 1112 raise OverflowError("Overflow in int64 addition")
1113 return arr + b
1114

OverflowError: Overflow in int64 addition
```

#

i think its the 2300+ years right?

shell crest
#

Should be I think

#

Not sure if you can somehow ignore the OverflowError in pandas

#

but it is a pain
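
One possible way around the overflow, sketched under the assumption that whole-day precision is enough: pandas timestamps at nanosecond resolution stop in the year 2262, while NumPy's day-resolution `datetime64[D]` covers a far wider range. The 1558-day span and the tenth power are taken from the chat:

```python
import numpy as np
import pandas as pd

# Upper bound of a nanosecond-resolution pandas Timestamp.
print(pd.Timestamp.max)

# Same arithmetic done in whole days with NumPy instead.
base = np.array(["2013-04-10"], dtype="datetime64[D]")
offset_days = int(np.floor(1558 * 1.618 ** 10))
far = base + np.timedelta64(offset_days, "D")

print(far)   # a date in the 26th century, no overflow
```

The result stays a NumPy array, so it would need converting to strings (for example with `far.astype(str)`) before going back into a DataFrame.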

serene scaffold
# worthy hollow this now brings me this error ```py --------------------------------------------...

for pandas errors, it's sufficient to show this much of the call stack

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3036/898974524.py in <module>
      6     df1 = pd.to_datetime(m618['Date1'])
      7     df2 = pd.to_datetime(m618['Date2'])
----> 8     m618[col] =  df2 + ((df2 - df1) * (1.618 ** i))
      9 
     10 m618['Date1'] = m618['Date1'].dt.strftime('%d/%m/%Y')

OverflowError: Overflow in int64 addition
#

the parts of the call stack that are internal to pandas aren't that interesting.

worthy hollow
#

ok

worthy hollow
shell crest
#

34429 seems wrong? hmm?

#

But the rest probably has no issue

worthy hollow
#

yea bcuz i mispelled the date to 2103 instead of 2013

#

actually with the vectorized version

#

it seems to give the right one

#

wait lemme replace all the date in the .csv sheet to the right format as u did

shell crest
#

Oo nice

#

Looking good generally?

#

Anyway you just taught me to not touch pd time objects within 10 metres

worthy hollow
#

is there another way to switch %d/%m/%Y to %Y/%m/%d, than doing it manually on notepad++? it's so long to do

shell crest
#

no, don't do it manually

#

You should be able to change the way it is read in python instead of changing the data input
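
A sketch of fixing the parsing at read time rather than editing the file. The inline CSV stands in for the real file, whose layout is assumed to be two `dd/mm/yyyy` columns named `Date1` and `Date2`:

```python
import io

import pandas as pd

# Stand-in for the real .csv file (assumed layout).
csv_text = (
    "Date1,Date2\n"
    "31/10/2008,03/01/2009\n"
    "03/01/2009,10/04/2013\n"
    "10/04/2013,03/07/2013\n"
)
m = pd.read_csv(io.StringIO(csv_text))

# State the format explicitly so pandas never has to guess day versus month.
for col in ["Date1", "Date2"]:
    m[col] = pd.to_datetime(m[col], format="%d/%m/%Y")

print((m["Date2"] - m["Date1"]).dt.days.tolist())
```

If the differences come out as 64, 1558 and 84, the dates were read the intended way.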

worthy hollow
#
m['Date1'] = m['Date1'].dt.strftime('%Y/%m/%d')
sometimes i'm very dumb lol i've used it just after
#

lemme see how it look now

#

show the same now!!!!!

#

THANKS A LOT @serene scaffold @shell crest you guys are genius

shell crest
#

grats

worthy hollow
#

one last question

#

how can i delete anything after the %Y/%m/%d date?

#
Date1    Date2    1    2    3    4    5    6    7    8    9    10
0    2008-10-31    2009-01-03    2009-03-08    2009-04-23 20:21:07.200    2009-07-13 23:43:46.790400000    2009-12-01 12:35:16.000972796    2010-08-01 22:22:40.913684888    2011-09-27 12:21:31.502502224    2013-09-26 16:42:04.882333856    2017-03-14 08:51:45.896202240    2023-03-13 18:48:22.612222272    2033-08-01 13:56:54.524368896
serene scaffold
worthy hollow
#

like those 2009-04-23 20:21:07.200

#

i want to keep only the date not the hours etc
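
A small sketch of dropping the time part, shown on one of the values from the table above with plain `datetime` (the variable names are illustrative). On a pandas datetime column the matching tools are `.dt.date`, `.dt.normalize()` and `.dt.strftime(...)`.

```python
from datetime import datetime

stamp = datetime.fromisoformat("2009-04-23 20:21:07.200")
date_only = stamp.date()            # drops hours, minutes and seconds
text = stamp.strftime("%Y/%m/%d")   # same idea, but as a formatted string
print(date_only, text)
```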

shell crest
#

Seems to ask for a formatter

#

Or maybe just use strftime directly somehow

serene scaffold
#

!docs pandas.DataFrame.to_string

arctic wedgeBOT
#
DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, ...)
Render a DataFrame to a console-friendly tabular output.
worthy hollow
#
from datetime import datetime 

m618 = m.copy()

for i, col in enumerate(m618.columns[2:]):
    df1 = pd.to_datetime(m618['Date1'])
    df2 = pd.to_datetime(m618['Date2'])
    m618[col] =  df2 + ((df2 - df1) * (1.618 ** i))

m618['Date1'] = pd.to_datetime(m618['Date1'])
m618['Date2'] = pd.to_datetime(m618['Date2'])

m618['Date1'] = m618['Date1'].dt.strftime('%Y/%m/%d')
m618['Date2'] = m618['Date2'].dt.strftime('%Y/%m/%d')
m618['1'] = m618['1'].dt.strftime('%Y/%m/%d')
m618['2'] = m618['2'].dt.strftime('%Y/%m/%d')
m618['3'] = m618['3'].dt.strftime('%Y/%m/%d')
m618['4'] = m618['4'].dt.strftime('%Y/%m/%d')
m618['5'] = m618['5'].dt.strftime('%Y/%m/%d')
m618['6'] = m618['6'].dt.strftime('%Y/%m/%d')
m618['7'] = m618['7'].dt.strftime('%Y/%m/%d')
m618['8'] = m618['8'].dt.strftime('%Y/%m/%d')
m618['9'] = m618['9'].dt.strftime('%Y/%m/%d')
m618['10'] = m618['10'].dt.strftime('%Y/%m/%d')

m618
shell crest
#

Uhh can you not write the ['1'] and so on

worthy hollow
#

huge thx lads

shell crest
#

Use a range

#

it really gets me

worthy hollow
#

what do u mean

shell crest
#
my_variable['1'] = my_variable['1'].method()
...
my_variable['1000'] = my_variable['1000'].method()

is the same as

for i in range(1, 1000+1):
    my_variable[str(i)] = my_variable[str(i)].method()

Condensing 1000 lines into 2
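
Applied to the date table, the same loop could look like the sketch below. A plain dict stands in for the DataFrame here (an assumption made so the example runs without pandas), and the loop walks over the names the table actually has, which avoids mismatches between `str(i)` and the real column names:

```python
from datetime import datetime

# Hypothetical stand-in for the DataFrame: column name -> list of datetimes.
table = {
    "Date1": [datetime(2008, 10, 31)],
    "Date2": [datetime(2009, 1, 3)],
    "1": [datetime(2009, 3, 8)],
    "2": [datetime(2009, 4, 23, 20, 21, 7)],
}

# Loop over the existing names instead of typing each one out.
for name in table:
    table[name] = [value.strftime("%Y/%m/%d") for value in table[name]]

print(table["2"])
```

With a real DataFrame the body of the loop would be the line already used above, `m618[col] = m618[col].dt.strftime('%Y/%m/%d')`, with `col` taken from `m618.columns`.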

worthy hollow
#

weirdly im starting to not having

#

the expected result with the range method

#

lol this has been such a headache i think i will let the 1000 lines

#

at least it worked out

kind herald
#

can someone send me a good tutorial for pytorch?? like the coding not the math?

ancient pendant
#

Hello I need some help here to decide
I am not from CS background, currently studying CS50's Web development course.
And I am also curious about Data Analysis.
Should I do this certification course of google on coursera.
google teaches data analysis with R.
https://www.coursera.org/professional-certificates/google-data-analytics?

or should I do this course of Finland's university of helsinki.
this course teaches Data analysis with python

https://dap-21.mooc.fi/ (edited)

Coursera

Offered by Google. This is your path to a career in data analytics. In this program, you’ll learn in-demand skills that will have you ... Enroll for free.

#

Googles certification course I think is in depth which will make me job ready thats what they are saying

#

but its not free

#

Finlands course is free but not in depth

worthy hollow
unique flame
#

aah stock market price prediction

worthy hollow
#

based on a range of natural laws and esoteric principles that 99% of traders doesnt even know 🙂

ancient pendant
unique flame
worthy hollow
#

yeah people do research

agile cobalt
ancient pendant
worthy hollow
worthy hollow
agile cobalt
ancient pendant
worthy hollow
#

we have those PIVOT DATES far far in advance years before they do get printed by the market 😉

agile cobalt
#

sure, go get rich on your own and do not try to market it to others then cya

worthy hollow
worthy hollow
#

cya

agile cobalt
ancient pendant
agile cobalt
#

you can audit / try for free the first few days of each of them and stick to whichever one's format you like the most

ancient pendant
#

Yes I will.
Do you have other recommendations?

kind herald
#

I can't find any good videos for coding in ML

#

I want to learn pytorch

steady basalt
#

Start with the PyTorch website

#

Good tips there

lapis sequoia
#

i need some ai projects for a begineer

#

i want to make a ai chatbot

#

but i cant find any library or api that would work

serene scaffold
#

@lapis sequoia download a Kaggle dataset and try manipulating it with pandas to demonstrate some of its interesting properties

#

Chat bots sound cool, but they really aren't.

lapis sequoia
serene scaffold
lapis sequoia
#

ok

steady basalt
#

thats like YoE level shit

mint palm
#

I have read many times that unsupervised and self-supervised are the same. No exceptions?

marble ocean
#

what do they mean by saying "Don't use the dataset for commercial purposes"? do they mean selling the data itself, or using the model in applications that people pay for?

mild dirge
#

They're not synonyms

#

self supervised is always unsupervised, but not the other way around

serene scaffold
marble ocean
#

what is the meaning of Semi-supervised training

#

is it a type of training

serene scaffold
lethal ridge
#
    def negamax(self, grid: np.array, player: int, depth: int, alpha: float, beta: float, started: bool) -> tuple[int, int] | int:

        if depth == 0 or self.check_win(grid=grid):
            return -depth

        moves = self.get_all_moves(grid=grid) # , player=player)
        if not moves:
            return 0

        value = -inf

        for move in moves:

            new_grid = np.copy(grid)
            self.make_move(grid=new_grid, move=move, player=player)

            nmv = -self.negamax(
                grid=new_grid,
                player=self.next_turn(player=player),
                depth=depth-1,
                alpha=-beta,
                beta=-alpha,
                started=False,
            )

            if nmv > value:
                value = nmv
                best_move = move

            alpha = max(alpha, value)
            if alpha >= beta:
                break

        return best_move if started else value

this is my negamax algorithm for my connect 4 ai, its for my discord bot so its current starting depth is 6,
but now (i think this is why) when it sees a certain loss, it basically 'gives up' because it assumes the opponent plays perfectly,
but of course humans do not play perfectly at all, so how could i improve this to still choose moves that prevent loss at a low depth (i think that will solve it, i can be very wrong though please correct me if so)
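
One detail in the snippet above that may matter: when a win is found exactly at `depth == 0`, `-depth` is `-0`, which is the same score as an undecided position. Below is a minimal sketch of depth-scaled terminal scoring on a much smaller game (players alternately take 1 to 3 stones and whoever takes the last stone wins). The game and the function are stand-ins chosen to keep the example short, not the original Connect 4 code. A loss is scored `-(depth + 1)`, so it never collapses to 0, and losses found later in the search score less badly than immediate ones. Returning a `(value, move)` pair at every level also avoids the mixed return type.

```python
from math import inf

def negamax(pile, depth, alpha=-inf, beta=inf):
    """Return (value, best_take) from the point of view of the player to move."""
    if pile == 0:
        # The previous player took the last stone, so the side to move has lost.
        # depth + 1 keeps the score nonzero even at the search horizon.
        return -(depth + 1), None
    if depth == 0:
        return 0, None  # horizon reached without a decided result
    best_value, best_take = -inf, None
    for take in (1, 2, 3):
        if take > pile:
            break
        child_value, _ = negamax(pile - take, depth - 1, -beta, -alpha)
        value = -child_value
        if value > best_value:
            best_value, best_take = value, take
        alpha = max(alpha, best_value)
        if alpha >= beta:
            break  # alpha-beta cutoff
    return best_value, best_take

print(negamax(5, 6))  # a winning position for the side to move
print(negamax(4, 6))  # a lost position: a move is still returned
```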
steady basalt
#

How does it calculate

lethal ridge
#

calculate what

steady basalt
#

Best move

lethal ridge
#

when it returns -depth

steady basalt
#

How does it calculate that

lethal ridge
#

well did you look at the algorithm

steady basalt
#

Yep

#

Doesn’t look like ai to me, based on large game data at least

lethal ridge
#

goes through all moves, checks if someone has won, if so, return -depth else do that again recursively

steady basalt
#

That isn’t calculating a move

lethal ridge
#

oh am I in the wrong channel

#

wdym as a move

#

moves is just a list of (y, x)

steady basalt
#

Lots of boardy type games ai are based off of like

#

AI ai

#

Like calculated optimal moves based off of a lot of potential moves

lethal ridge
#

yeah thats basically what it does

steady basalt
#

I can’t see any sort of neural network

lethal ridge
#

but since the depth is only 6 and the board is a connect 4 board so 6x7 it reaches depth 0 pretty quickly

#

well its just recursion

#

its a negamax algorithm

#

figured this channel was the most suitable

steady basalt
#

Is that how they chess bot?

#

U may want to go to DSA channel

lethal ridge
#

yeah this is the same as a chess bot

#

basically

steady basalt
#

Probably go to DSA

lethal ridge
#

whats that D:

steady basalt
#

We’re like ML ai here

#

Try the data structures and algorithms room

lethal ridge
#

oh wait thats better yeah

steady basalt
#

That’s for coding algos like yours

lethal ridge
#

sorry, ill move there

iron basalt
iron basalt
# lethal ridge ```py def negamax(self, grid: np.array, player: int, depth: int, alpha: floa...

You can memorize certain states and know what to play from there without doing the actual search. This helps a lot if your search has low depth. The trade-off is more memory usage. You can have it more loosely match states so that you need less memorized, but how well that works depends on if similar looking states require similar (optimal) plays, etc. To this end, more advanced methods in machine learning including the current SOTA for board games, neural networks, can be applied. Sub-optimal play from the opponent can really throw off an agent with low depth search because it can easily end up in a branch that it did not search because it was not high priority (assumes opponent won't make ridiculous plays).
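
A minimal sketch of the "memorize states" idea, using `functools.lru_cache` as the lookup table on a toy game (take 1 to 3 stones, last stone wins). The game and the `solve` function are assumptions made for illustration. A Connect 4 version would need a hashable board encoding as the key, and combining a cache with alpha-beta pruning additionally needs care because pruned values are only bounds.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def solve(pile):
    """Return (value, best_take) for the player to move; +1 is a win, -1 a loss."""
    if pile == 0:
        return -1, None  # the previous player took the last stone
    best_value, best_take = -2, None
    for take in (1, 2, 3):
        if take <= pile:
            value = -solve(pile - take)[0]  # repeated states come from the cache
            if value > best_value:
                best_value, best_take = value, take
    return best_value, best_take

print(solve(21))
print(solve.cache_info().currsize)  # one stored entry per distinct pile size
```

The trade-off mentioned above shows up directly: each distinct state costs one cache entry, in exchange for never searching it twice.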

lethal ridge
brave sand
#

can targets be defined as actions too?

serene scaffold
#

Yes, but those aren't a good first project

#

Or any project, really

lapis sequoia
#

I LOVE python logo_django2 gem_red

proper salmon
#

Would anyone be interested in trying out an AI chat bot I helped develop for discord? It utilizes a personalized version of GPT-3 and I'm looking for testers

worthy phoenix
#

is sentdex's ai playlist good enough to get the basics started? or is there any other tuts which yall would recommend?

hoary wigeon
#

I need help with pyspark:
I have a column containing such values in pyspark dataframe.

+--------------------------+
|value_ml_actuals_quarterly|
+--------------------------+
|                        {}|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
|      {"stage4_stage5_v...|
|      {"stage4_stage5_v...|
|      {"stage4_stage5_v...|
|      {"stage4_stage5_v...|
|      {"stage4_stage5_v...|
|      {"stage4_stage5_v...|
|                        {}|
|                        {}|
|                        {}|
|                        {}|
|                        {}|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
|      {"stage1_stage2_v...|
+--------------------------+

{"stage1_stage2_value_q0":155377.25760193774,"stage1_stage2_value_q1":1.6324835169675915,"stage1_stage2_value_q2":1.2416516765040377,"stage1_stage2_value_q3":0.6989978731097944,"stage1_stage3_value_q0":153358.93874629532,"stage1_stage3_value_q1":1046.664551481815,"stage1_stage3_value_q2":1113.5050521549135,"stage1_stage3_value_q3":307.54128324443144,"stage1_stage4_value_q0":155332.70160937821,"stage1_stage4_value_q1":52.833406048086644,"stage1_stage4_value_q2":27.174443064288468,"stage1_stage5_value_q0":152331.90042767464,"stage1_stage5_value_q1":514.7405984413591,"stage1_stage5_value_q2":1187.6654328153859,"stage1_stage5_value_q3":1800.8622445981073,"stage1_stage6_value_q0":154394.15477047203,"stage4_stage7_value_q2":5343.860267413727}

I want to do such query over the dataframe like we do in sql

select count(*) from (select JSONExtractFloat(value_ml_actuals_quarterly, 'stage4_stage6_value_q0') as temp_q0, JSONExtractFloat(value_ml_actuals_overall, 'stage4_stage6_value_overall') as temp_overall from stored_table) where temp_q0 > temp_overall
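
The logic of that query, sketched in plain Python on made-up rows so it can run without a Spark session. The sample values and the `extract_float` helper are assumptions, and rows with a missing key are simply skipped here, which may differ from how `JSONExtractFloat` treats them. In PySpark the extraction step is typically done with `pyspark.sql.functions.get_json_object(column, "$.key")` followed by a cast to double, then a filter and a count.

```python
import json

# Hypothetical rows: (value_ml_actuals_quarterly, value_ml_actuals_overall) as JSON strings.
rows = [
    ('{"stage4_stage6_value_q0": 12.5}', '{"stage4_stage6_value_overall": 10.0}'),
    ('{"stage4_stage6_value_q0": 3.0}', '{"stage4_stage6_value_overall": 7.5}'),
    ('{}', '{}'),
]

def extract_float(text, key):
    """Return the value stored under key as a float, or None when it is absent."""
    value = json.loads(text).get(key)
    return None if value is None else float(value)

count = 0
for quarterly, overall in rows:
    temp_q0 = extract_float(quarterly, "stage4_stage6_value_q0")
    temp_overall = extract_float(overall, "stage4_stage6_value_overall")
    if temp_q0 is not None and temp_overall is not None and temp_q0 > temp_overall:
        count += 1

print(count)
```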
steady basalt
#

Machine learning based on data is the current meta of this channel I’m pretty sure

#

And people would probably be of better help over there

round tusk
#

This channel is so confusing 😕

wooden sail
#

what confuses you about it?

round tusk
#

The code

#

its like brainf**k to me

quaint loom
#

Hi guys!

I just started my master's program where we will use python to look at the water balance in a river. First day and we were given a difficult task (at least for me, who has never used python before).

I will be using Python in Jupyter notebook and will be handling Xlsx and kml files.

Is there anyone who would like to help me?

spare briar
wooden sail
#

i think their confusion stems from optimization falling square in the middle of AI/ML, but obviously overlapping with pretty much everything else, since optimization tasks are widespread in many disciplines

odd meteor
#

You might wanna explore NLU ( Natural Language Understanding) specifically, Intent and Entity extraction

serene scaffold
#

you'd want to look into intent classification. you can use information about the structure of the sentence as features. spaCy can help you with this.

quaint loom
#

So I have this task with several questions, but let's start with the first one. I will be looking at the water balance of the Balkh River basin in Afghanistan. I have been given data from 6 individual streamflow stations, with a table showing the station attributes. Streamflow data, as well as time series of precipitation, reference evapotranspiration and air temperature for the contributing areas of all stations are provided on DTU Learn. Map layers containing river network, catchments and stations are distributed as google earth kml files on DTU Learn.

I have also been given Xlsx that I uploaded into Jupyter notebook that I will be using.

My question is how I should start to:

Plot precipitation, cumulative precipitation, reference ET, cumulative reference ET, air
temperature and discharge as a function of time for the 6 catchments ?
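
As a possible starting point for the "cumulative" part, a running total is all that is needed; the sketch below uses invented numbers, not the station data. In a notebook the usual route would be `pandas.read_excel` to load the sheet, `Series.cumsum()` for the running total and `DataFrame.plot()` for the time series, but the exact column names depend on the file.

```python
from itertools import accumulate

# Hypothetical daily precipitation in mm for one catchment (not real station data).
precipitation = [0.0, 2.5, 0.0, 7.1, 1.4]

# Running total: each entry is the sum of all values up to and including that day.
cumulative = list(accumulate(precipitation))

print(cumulative)
```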

steady basalt
steady basalt
#

just because its applicable to ai in certain context doesnt mean that the algorithms/search channel wouldnt be more useful for someone wanting help on that

elfin jungle
#

!epy
import pandas as pd
data = pd.DataFrame(
    [["Billy", "Test 1", 90, 80], ["Billy", "Test 2", 70, 60], ["Tommy", "Test 1", 30, 40], ["Tommy", "Test 2", 45, 50]],
    columns=["Name", "Test Name", "Mid-Term", "End-Term"],
)
print(data)

Need some help trying to understand the sort of issues I might have building a model to predict scores of 100x students with 20x tests with a data set that has this sort of format

arctic wedgeBOT
#

@elfin jungle :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 |     Name Test Name  Mid-Term  End-Term
002 | 0  Billy    Test 1        90        80
003 | 1  Billy    Test 2        70        60
004 | 2  Tommy    Test 1        30        40
005 | 3  Tommy    Test 2        45        50
spare briar
#

all neural network architecture discussion is just choosing a family of functions to search over

round tusk
steady basalt
#

look at his code and you will see that hed be better off there

steady basalt
round tusk
#

it looks difficult tho

steady basalt
#

its probably easier to code than any other area of this server

round tusk
#

well wtf is this?:

select count(*) from (select JSONExtractFloat(value_ml_actuals_quarterly, 'stage4_stage6_value_q0') as temp_q0, JSONExtractFloat(value_ml_actuals_overall, 'stage4_stage6_value_overall') as temp_overall from stored_table) where temp_q0 > temp_overall
round tusk
#

yes

steady basalt
#

its someone using sql

round tusk
#

yes

steady basalt
#

not python x)

round tusk
#

my brain hurts

#

most of the code makes my brain hurt

steady basalt
#

that sort of thing is really simple to learn

#

thats very basic sql

round tusk
#

yea

#

ig

#

ill try later

steady basalt
#

real hard sql is on another level

round tusk
steady basalt
#

yeah i rly suck at it personally

round tusk
#

I now have newfound respect for experienced programmers and coders

spare briar
#

why do you think you should be answering people's questions when you are such a goober noob

steady basalt
wooden sail
#

cut it out, what is even going on here

steady basalt
#

this guys consistently acting like this

wooden sail
#

yes, and you too

#

chill out, the two of you

spare briar
lucid hornet
steady basalt
#

literally liek x3 times now youve been petty on that level

#

3x to me and multiple times to someone else

spare briar
#

its not petty, you give bad advice, I don't know how you have such low self awareness -

steady basalt
lucid hornet
#

I'm looking at this current situation. And I'm saying you need to take a step back

steady basalt
spare briar
#

you are massively overconfident and wrong

steady basalt
#

even if you want to die on the hill that search algos ahve a place in ML

#

im literally not confident tho

#

youre just shitting on other people because thats how you feel

spare briar
#

didn't you start learning ml like a month ago, don't know even basics, a full decade from being hirable in a ml role

tribal arrow
steady basalt
#

and no, i started my masters in this a year ago if taht even matters

#

your comments are arrogant on another level, even if mine seem ignorant and im sure everyone has just been made aware of that from your prior comment

spare briar
#

alright I've said my part

serene scaffold
lucid hornet
#

The questions that were asked about search methods seemed relevant enough to the conversation. If you felt that it was better that they ask or talk about it in #algos-and-data-structs (which isn't specifically a search channel, btw), there were much nicer ways you could have put it without attempting to shutdown the things that were already being discussed.

serene scaffold
#

but if anyone wants to discuss this further, please send a message to @sonic vapor.

exotic thicket
#

Why it's squared in the equation??

#

As it says (predicted value - true value)

steady basalt
#

would it be a gamble to guess that it would stop any negative values and keep the function relative scale?

wooden sail
# exotic thicket

because the squared terms have nicer properties. you can interpret what "nice" means in different ways

exotic thicket
#

I didn't understand the instructor, who said roughly this: it's squared because the negative and positive values have to be fit to approximate the output

exotic thicket
wooden sail
#

a sum of squares is what we call "positive semi definite", or in other words it is always >= 0. then it is easy to see that the goal is to set this equal to 0, which corresponds to "minimizing"

wooden sail
exotic thicket
wooden sail
#

consider that error = estimate - measurement. then we want the error to be close to zero. if the error is 0, then estimate = measurement

#

there are many ways to try to solve estimate = measurement. one of them is by minimizing the absolute value of the error

#

absolute value is equal to squaring, then taking the square root (or taking the vector norm/l-2 norm depending on what you're more comfortable with). then we can square this because squaring a non negative number preserves ordering
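
Written out, the argument above might look like this (a sketch; $\bm{e}(w)$ denotes the vector of errors for a parameter $w$, a symbol introduced here for illustration):

```latex
\begin{align*}
\bm{e} &= \hat{\bm{y}} - \bm{y}, \qquad \|\bm{e}\|_2 = \sqrt{\textstyle\sum_i e_i^2}, \\
0 \le a \le b &\iff a^2 \le b^2, \\
\arg\min_{w} \, \|\bm{e}(w)\|_2 &= \arg\min_{w} \, \|\bm{e}(w)\|_2^2 = \arg\min_{w} \, \textstyle\sum_i e_i(w)^2 .
\end{align*}
```

The equivalence is for the Euclidean norm. Summing absolute values of the individual errors instead is a different objective (the l-1 norm) and generally has a different minimizer.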

spare briar
serene scaffold
#

.latex $ y = wx + \epsilon $ where $ \epsilon \sim N(0,1) $

strange elbowBOT
spare briar
#

in this case you add a nonlinearity from the sigmoid which complicates things (the linear case actually has a closed form solution, no SGD needed) but it ideologically comes from the same place
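
For the one-parameter model $y = wx + \epsilon$ mentioned earlier, the closed form is short enough to write directly. A minimal sketch; `fit_slope` and the toy data are illustrative, and the model has no intercept term:

```python
def fit_slope(xs, ys):
    """Least-squares slope w for y = w * x (no intercept): sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated as y = 2x with no noise

w = fit_slope(xs, ys)
print(w)
```

The formula comes from setting the derivative of $\sum_i (y_i - w x_i)^2$ with respect to $w$ to zero.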

wooden sail
#

.latex let $\bm{y} \in \mathbb{R}^m$ be given by
\begin{align*}
\bm{y} = \bm{Ax} + \bm{n},
\end{align*}
with $\bm{x} \in \mathbb{R}^n$, $\bm{n} \in \mathbb{R}^m$ and $\bm{n} \sim \mathcal{N}(\bm{0}, \bm{\Sigma})$. then the maximum likelihood estimator of $\bm{x}$ is the solution to the classical linear least squares expression

strange elbowBOT
wooden sail
#

i guess i should've specified with n AWGN or \Sigma diagonal
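
The step from the Gaussian model to least squares, sketched under the assumption that $\bm{\Sigma}$ is invertible:

```latex
\begin{align*}
p(\bm{y} \mid \bm{x}) &\propto \exp\!\left( -\tfrac{1}{2} (\bm{y} - \bm{Ax})^{\mathsf{T}} \bm{\Sigma}^{-1} (\bm{y} - \bm{Ax}) \right), \\
\hat{\bm{x}}_{\mathrm{ML}} &= \arg\min_{\bm{x}} \; (\bm{y} - \bm{Ax})^{\mathsf{T}} \bm{\Sigma}^{-1} (\bm{y} - \bm{Ax}), \\
\bm{\Sigma} = \sigma^2 \bm{I} \;\Rightarrow\; \hat{\bm{x}}_{\mathrm{ML}} &= \arg\min_{\bm{x}} \; \| \bm{y} - \bm{Ax} \|_2^2 .
\end{align*}
```

With white noise ($\bm{\Sigma} = \sigma^2 \bm{I}$) this is ordinary least squares. A diagonal $\bm{\Sigma}$ with unequal entries gives weighted least squares instead, where each residual is scaled by its own variance.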

rain horizon
#

Hello everyone, does anyone know how to solve the error,

job exception: 'XGBRegressor' object has no attribute 'XGBClassifier'

Trying to tune my model so I can make it as efficient and successful as possible, but when doing so, this pops up

steady basalt
#

Show import line and the tuning line

rain horizon
#

Copy, please hold

#

IMPORT

#

The error is coming in the second picture in the block of code under space

#

@steady basalt

steady basalt
#

Ur using XGBClassifier in ur function

#

I swear the image changed from RSE to accuracy

#

MSE rather

rain horizon
#

Sorry, I just swapped it from Classifier to Regressor, but it is the same error output

steady basalt
#

Did u also import classifier

rain horizon
#

Nope,

from xgboost import XGBRegressor is what I have

steady basalt
#

If u use classifier you need it imported

rain horizon
#

Yup, but I need Regressor, that is I why I swapped Classifier out, I retyped my code and put that there on accident haha

steady basalt
#

The image shows claaaifier

#

Classifier

rain horizon
#

I know, I swapped it out for Regressor

steady basalt
#

You’re using regression but using accuracy metric?
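
Why that combination is suspicious, in a small sketch with hand-written metrics (both functions and the numbers are illustrative, not the ones from a library): accuracy counts exact matches, which continuous predictions almost never produce, while mean squared error measures how far off they are.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the target exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 1.0, 4.0]
y_pred = [2.5, 1.0, 5.0]  # hypothetical regressor output

print(accuracy(y_true, y_pred), mse(y_true, y_pred))
```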

rain horizon
#

I am honestly new to this, so I need to figure out a lot of it, but I need to fix this whole no object error first

steady basalt
#

Ok wait

rain horizon
#

I am trying to design a sports model

steady basalt
#

Ok instead of saying xgb.regressor put regressor

#

Not sure if it helps tho

#

You imported XGBRegressor already
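
The screenshots are not available, so the cause can only be guessed, but an error of the form "'X' object has no attribute 'Y'" usually means an attribute was looked up on a model instance rather than on the module. One way that can happen is a variable such as `xgb` holding a model while later code still treats it as the module. A sketch with a stand-in class (`Regressor` and `Classifier` are hypothetical names, not the xgboost API):

```python
class Regressor:
    """Hypothetical stand-in for a model class such as XGBRegressor."""

    def __init__(self, **params):
        self.params = params

xgb = Regressor()  # the name xgb now holds a model instance, not a module

try:
    xgb.Classifier(max_depth=3)  # attribute lookup on the instance fails
except AttributeError as err:
    message = str(err)

print(message)
```

If that is what happened, calling the imported class directly and giving the instance a different name (for example `model`) avoids the clash.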