#data-science-and-ml


desert oar
#

('Quantity', 'sum') this is a tuple

loud cove
desert oar
#

my code does not include that

loud cove
#

.

desert oar
#

don't copy and paste. read, understand, and apply the knowledge to your own situation

loud cove
#

.

#

t_q = group by orders and get sum quantity.
drop duplicates from original df then merge it back with t_q to get the results i want, which is working fine.

cerulean stream
#

Hello, idk if this is the right channel to ask this but
how would I find the most common item in a 3d numpy array? bincount only works on 1D arrays. I have an array that's like [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]] and I want the most common "innermost list", not the numbers
obviously the actual array would be much bigger but in this case [1, 2, 3] would be returned

desert oar
#

the resulting dataframe is empty

#

and you get messed up data as a result

loud cove
#

my question is about the group by with the same exact code you evaluated here and not working.

#

nvm

wooden sail
loud cove
#

but yea doesn't matter im mainly interested on the group by thing

desert oar
#

@loud cove i am actually running your code now with your data... give me a bit

loud cove
wooden sail
#

possibly, if it takes nd arrays as valid objects

desert oar
#

your solution is pretty clever

serene scaffold
desert oar
#

@loud cove it's possible that groupby interacts badly with the missing values in the string data

wooden sail
#

alternatively, you can use an equivalence relation to do something similar. this should be better, on second thought. depending on what index gymnastics you are used to, you could keep the dimensions as is or reshape to a matrix, then use the outermost index and == my_array_at_this_iteration. again, summing over the resulting boolean array will give you the count you're after

loud cove
desert oar
#

well you'd have to merge anyway, so that solution is correct

loud cove
desert oar
#

however are you really trying to group and merge on all of these fields?

desert oar
loud cove
cerulean stream
#

Okay thanks everyone Ill try them

wooden sail
#

lemme set something up and show you an example

desert oar
loud cove
desert oar
#

oh, i see... you were trying to avoid merging

#

yeah just do the join/merge

#

apparently groupby + null is a bad mix

loud cove
#

it seems to be more about the duplicates

#

you see the dropping duplicates doesn't work

desert oar
#

i think it might be dropping the nulls when grouping

#

i don't think it has to do with duplicates

#
import pandas as pd

text_columns = [
    "Sale Code",
    "Order ID",
    "Store Name",
    "Player First Name",
    "Player Last Name",
    "Shipping First Name",
    "Shipping Last Name",
    "Shipping Address",
    "Shipping City",
    "Shipping State",
    "Shipping Zip",
    "Billing Phone",
    "Billing Email",
]

dtypes = {c: "string" for c in text_columns}
dtypes.update({"Quantity": pd.Int64Dtype()})

df = pd.read_csv(
    "order_report.csv",
    usecols=text_columns + ["Quantity"],
    dtype=dtypes,
)

total_quantity = df.groupby('Order ID')["Quantity"].sum().rename('Total Quantity')
df = df.join(total_quantity, on='Order ID')

print(df[['Sale Code', 'Order ID', 'Quantity', 'Total Quantity']])
loud cove
desert oar
#

note the use of proper null-supporting Int64 and string dtypes

#

that's still not the point

#

the point is that there are nulls in the columns you are grouping on

loud cove
#

yea im just saying trying to dedupe even doesn't work

desert oar
#

because it's irrelevant

#

that's the whole point of groupby - aggregating across duplicated values

#

you need to pass dropna=False to groupby
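A minimal sketch of what `dropna=False` changes, using a small made-up frame (the values below are hypothetical, not the real order report). By default `groupby` silently drops rows whose group key is null, so their quantities vanish from the totals:

```python
import pandas as pd

# Hypothetical data: one row has a missing "Order ID".
df = pd.DataFrame({
    "Order ID": ["A1", "A1", None, "B2"],
    "Quantity": [1, 2, 5, 4],
})

# Default behaviour: the row with the null key is dropped before aggregating.
default_totals = df.groupby("Order ID")["Quantity"].sum()

# dropna=False keeps the null key as its own group.
all_totals = df.groupby("Order ID", dropna=False)["Quantity"].sum()

print(default_totals)
print(all_totals)
```

If the totals from the default call look too small, comparing them against the `dropna=False` totals is a quick way to check whether null keys are the cause.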

loud cove
#

im talking about the groups, not the aggregations

wooden sail
# cerulean stream Okay thanks everyone Ill try them
In [24]: import numpy as np

In [25]: X = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]])

In [26]: test = np.array([1,2,3]) #iterate over these

In [27]: counts_intermediate_step = ( X.reshape(4,3,order='F') == test ).dot(np.ones(3))

In [28]: counts = counts_intermediate_step == 3

In [29]: counts
Out[29]: array([ True,  True, False, False])

In [30]: result = sum(counts)

In [31]: result
Out[31]: 2

there must be a more clever way, but this works and should be more or less efficient. equivalently, you could use the subtraction approach i mentioned earlier. idk what is faster in numpy, a broadcasted difference or a boolean comparison

#

ofc there is no need to print nor keep the intermediate result, so you can take what says In [28] and call sum on it directly or multiply by a vector of ones from the left

#

this should extend to arbitrary-sized innermost dimensions and arbitrarily many axes or ways or whatever you call it (here you have a 3d or 3way array) as long as you're careful in the reshaping

#

on further thought, this can be done in like 2 lines using einsum, but i don't think it's much faster
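For the original question (most common innermost row rather than the count of one known row), one possible alternative is `np.unique` with `axis=0`. This is a sketch of a different technique than the comparison approach above, assuming the innermost axis is the last one:

```python
import numpy as np

X = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]])

# Flatten every axis except the last, so each row is one "innermost list".
rows = X.reshape(-1, X.shape[-1])

# Unique rows and how often each one occurs.
unique_rows, counts = np.unique(rows, axis=0, return_counts=True)

# The row with the highest count (ties resolve to the first in sorted order).
most_common = unique_rows[np.argmax(counts)]
print(most_common, counts.max())
```

This avoids iterating over candidate rows, at the cost of a sort inside `np.unique`.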

desert oar
#

good old einsum

lapis sequoia
#

@runic raft

#

Hi mate. I just realised that the JACCARD SIMILARITY that you taught me doesn't have all booleans. And one column is the sex column with 0 as females and 1 as males. Is it still fine to put them in the Jaccard? Seems alright because it's a relative measure standard for all rows. But just confirming.

loud cove
#

So I have a data frame that I want to use to fill a pdf, anyone have a recommendation for a lib? I'll wrap it in a file and then turn it into an executable for non-python users to use.

stone pollen
loud cove
misty flint
#

have you guys noticed the websites with good search engines vs. those that have crappy search engines

#

makes me want to build my own sometimes

#

if anyone has good resources for that btw lmk ID_blurryeyes

dusty valve
#

How would i write a wordle solver using tensorflow and transformers?

serene scaffold
dusty valve
serene scaffold
dusty valve
#

and wow do you type fast

#

is that a steno keyboard i sense?

serene scaffold
serene scaffold
#

probably not that uncommon for a late millennial.

#

when you get to gen z, they probably have more experience with touchscreen keyboards than physical ones.

#

@dusty valve you might enjoy this deep dive: https://www.youtube.com/watch?v=v68zYyaEmEA

lapis sequoia
#
```py
model=DecisionTreeClassifier()
kfold_validation=KFold(10)

results=cross_val_score(model,X,y,cv=kfold_validation)
```
Can someone tell me the difference between this and
```py
model=DecisionTreeClassifier()
results=cross_val_score(model,X,y,cv=10)
```
serene scaffold
#
model = DecisionTreeClassifier() 
kfold_validation = KFold(10)
results = cross_val_score(model, X, y, cv=kfold_validation)
# vs
model = DecisionTreeClassifier()
results = cross_val_score(model, X, y, cv=10)

Please use spaces in your code, so that it's easier to read.

One moment.

#

!docs sklearn.model_selection.cross_val_score

arctic wedgeBOT
#

```py
sklearn.model_selection.cross_val_score(estimator, X, y=None, *, groups=None, scoring=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', error_score=nan)
```
Evaluate a score by cross-validation.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation).
serene scaffold
#

@lapis sequoia it appears that if you pass a KFold for cv=, then the KFold instance does the work of partitioning the dataset. Whereas if you pass an int, then cross_val_score decides how to partition the data into that many folds on its own.

lapis sequoia
#

So k-fold is the same as standard value of 10. Whereas you can send in different instances such as stratified k fold. Or stratified random?

serene scaffold
#

sounds right to me.

lapis sequoia
#

Or whatever the default way to split might be for cross_val_score. You just gotta put in the number of folds.

#

Sounds good

serene scaffold
#

or you could define your own generator that does it however you want
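A small sketch of the difference, assuming the iris dataset as stand-in data. Per the scikit-learn docs, an integer `cv` with a classifier and a binary or multiclass `y` uses `StratifiedKFold`, not plain `KFold`, so `cv=10` and `cv=KFold(10)` are not the same split here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration
model = DecisionTreeClassifier(random_state=0)

# An int: cross_val_score picks the splitter (stratified for classifiers).
scores_int = cross_val_score(model, X, y, cv=10)

# Explicit splitters: you decide how the folds are built.
scores_stratified = cross_val_score(model, X, y, cv=StratifiedKFold(10))
scores_kfold = cross_val_score(model, X, y, cv=KFold(10))

print(scores_int.mean(), scores_stratified.mean(), scores_kfold.mean())
```

On data sorted by class (like iris), unshuffled `KFold` can produce folds that contain only one class, which is one reason to pass a splitter explicitly, for example `KFold(10, shuffle=True, random_state=0)`.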

lapis sequoia
#

Also. You know you were right. Feature selection is usually worthless for decision trees

#

At each split it automatically does a sort of "feature selection" so it knows what's good for it.

serene scaffold
#

yay

lapis sequoia
#

Based on information gain or some other index.

#

Also. Could you tell me something on how can I compare 2 models. I only have the final accuracy scores of them. And I compared them a bit on that. Is there something else I can do?

serene scaffold
lapis sequoia
#

@serene scaffold no just a binary classifier

serene scaffold
#

so you'll know not only which one has a better accuracy score, but also if the worse one is doing poorly in terms of false positives or false negatives.

lapis sequoia
#

Because there's no combined confusion matrix available. Only one for each fold.

#

And wrote, no point looking at one for each fold. It's worthless

#

Well, not like it's actually worthless. But I was lazy to write the code. Since I was getting the validation score directly from cross_val_score without having to generate the folds 🤪

serene scaffold
lapis sequoia
#

Can we?

#

Shit

#

Not gonna do it now though. Gonna take up a lot of my brain power

#

Is the composite one the average of the values in each entry? Might look ugly
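One way to get a single combined confusion matrix, sketched here on a stand-in binary dataset: `cross_val_predict` returns one out-of-fold prediction per sample, so the matrix is built from pooled counts across folds rather than an average of per-fold matrices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary dataset
model = DecisionTreeClassifier(random_state=0)

# Each sample is predicted by the model trained on the folds that exclude it.
y_pred = cross_val_predict(model, X, y, cv=10)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, y_pred)
print(cm)
```

Running this for both models gives two matrices over the same samples, which makes false positives and false negatives directly comparable.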

serene scaffold
mint palm
#

when and how are custom loss functions made
i mean what are the symptoms they arent able to?
gradient descent not going down??
or irregularity or what?

#

@warm verge

warm verge
#

Ok basically

#

Sometimes you get functions that you need to optimise which are just a mess

#

The gradients are too erratic or discontinuous on a local level to even make sense of anything

#

Or potentially your data isn't well suited to have a conventional error function

#

This will happen a lot in some problem domains, so you make a new loss function based on some method of calculation

mint palm
#

got it thx

grave hare
#

I have a dataset that i am trying to forecast by the day using 3 indicators to do so. the dataset is a list of orders that have happened over the last 6 months. some customer/order combinations are repeated, some are repeated but taper off. I need to forecast what orders will fall on what day. forecasting would be no more than a month ahead. I'm thinking some type of time series modeling, but not sure how to go about it. any suggestions or directions?

scenic tulip
#

well, you could isolate the orders that you know are repeated. find which day they fall on, then just add a timedelta that would increment the month and generate a new spreadsheet with the forecasted data @grave hare

#

the ones that taper off you would have to find at what rate they taper off, find the day and decrement the difference from the running value and just add it to that day of the next month @grave hare

mint palm
#

i was wondering: as there are two or more approach to predict generally everything, then can you apply siamese network to every prediction model??

tranquil sage
#

Can anyone explain why need to split the training set into mini-batch? If my data size is around 3k, what is the recommended mini-batch size? Training Text CNN classifier

misty flint
#

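mini-batch training means each gradient update is computed on a small subset of the training set instead of the whole thing. it can help the model train faster and sometimes improve model accuracy. note that this is not the same thing as batch normalization. dunno the recommended size since this is usually empirical and you have to try a few values for that hyperparameter.

theres a seminal paper from google on batch normalization, which is a separate but related technique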
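A rough sketch of what splitting into mini-batches means in practice, using random stand-in data of about 3k samples. The helper name `iterate_minibatches` and the batch size of 32 are assumptions for illustration, not a recommendation:

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 8))     # stand-in features
y = rng.integers(0, 2, size=3000)  # stand-in labels

# One epoch: the model would take one gradient step per batch.
batches = list(iterate_minibatches(X, y, batch_size=32, rng=rng))
print(len(batches), batches[-1][0].shape)
```

Frameworks like Keras and PyTorch do this for you through their `batch_size` arguments or data loaders; the loop above only shows the idea.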

hallow patrol
#

Hello team, I am having issues understanding the below code related to Data Aggregates

Code#1

data = [1, 2, 3, 4, 5, 6]

for i in range(1, 6):
    data[i - 1] = data[i]

for i in range(0, 6):
    print(data[i], end=' ')

Output

2 3 4 5 6 6

Code# 2

data = [[0, 1, 2, 3] for i in range(2)]
print(data[2][0])

Output

0

For code #1 I am not sure what the data[i - 1] = data[i] code is doing and for code #2 I do not know if [2] is referring to the range code portion and then [0] is the index to be applied on the list.

Thank you for any feedback

misty flint
tacit basin
# hallow patrol Hello team, I am having issues understanding the below code related to Data Aggr...

data[i - 1] = data[i] shifts elements of 'data' list to the left so [1,2,3,4,5,6] becomes [2,3,4,5,6,6]

Code#2 creates list of lists

>>> data = [[0,1,2,3] for i in range(2)]
>>> data
[[0, 1, 2, 3], [0, 1, 2, 3]]
>>> data[2][0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

The first index data[1] refers to list [0,1,2,3] and second data[1][0] is index of element in that list, so 0.
In your case there are two lists in the generated list so data[2] throws an error, as there's no index 2 in that list

tacit basin
grand anvil
#

Hi anyone knows how to save multiple ML model to a single pickle file? Thanks
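One common approach is to put the models in a dict and pickle the dict. A minimal sketch, where the dict values are plain placeholder objects standing in for fitted models and an in-memory buffer stands in for a file:

```python
import io
import pickle

# Placeholders: in real use these would be fitted model objects.
models = {
    "model_a": {"kind": "stand-in", "weights": [0.1, 0.2]},
    "model_b": {"kind": "stand-in", "weights": [0.3]},
}

# Write the whole dict in one pickle (use open("models.pkl", "wb") for a file).
buffer = io.BytesIO()
pickle.dump(models, buffer)

# Read everything back and look models up by name.
buffer.seek(0)
restored = pickle.load(buffer)
print(sorted(restored))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.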

dusk tide
#

Anyone worked with song recommendation system?

rose agate
lone vortex
#

hey guys, does anyone know panda ?

fleet plover
arctic wedgeBOT
#

gdas.py line 396

```py
self.nodes[n-1].connections[ni].forward(x, types=types)  # Ltrain(w±, alpha)
```
tacit basin
summer oracle
#

hi Is there anyone using doccano for relation annotation?

#

Sequence Labeling(NER part) works fine and but 'relation' label function seems not working

lone vortex
fair nimbus
# lone vortex Ah I just started pandas and I have no idea what to do 😅

The examples in the official pandas docs are quite good. They're a good lesson in minimal reproducible examples. Every time I'm debugging something that has got too large to debug and too proprietary to share, it's worth breaking the problem down into smaller chunks, usually starting from the pandas examples themselves.

mild dirge
#

I have a dataset of letter images I want to train a CNN on. But for pre-training I also have a dataset of the same characters, but a different font. What would be the best way for pre-training? since the model architecture doesn't have to change at all.

mild dirge
#

Seems like an exam or hw

tacit basin
shut phoenix
#
# classifying algorithm
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

training_path = tf.keras.utils.get_file('iris_training', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv')
testing_path = tf.keras.utils.get_file('iris_testing', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv')
train = pd.read_csv(training_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(testing_path, names=CSV_COLUMN_NAMES, header=0)
training_y = train.pop('Species')
testing_y = test.pop('Species')

feature_column = []
for feature_name in train.keys():
  ...


why are we iterating over train

hollow hearth
#

hey i need some help with some signal processing applications

#

i have this signal that can be composed of multiple fractional frequencies

#

and i want to use DFT to find those

#

but it only returns the integer frequencies

#

so after a ton of searching i heard about this sinc interpolation thing that i can use to estimate the frequency between two bins

#

but i for the life of me cant find any resource to help me with that
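One simple way to look between bins, sketched under the assumption of a single dominant tone: zero-padding the signal before the FFT samples the same spectrum on a finer grid, which amounts to sinc-like interpolation between the original bins. The signal, the test frequency of 10.3 cycles per window and the padding factor are made up for illustration:

```python
import numpy as np

N = 256
true_freq = 10.3  # hypothetical fractional frequency, in cycles per window
n = np.arange(N)
x = np.sin(2 * np.pi * true_freq * n / N)

# A window reduces leakage; zero-padding refines the frequency grid.
window = np.hanning(N)
pad = 64
spectrum = np.abs(np.fft.rfft(x * window, n=N * pad))

# Convert the peak position back to units of the original bins.
estimate = np.argmax(spectrum) / pad
print(estimate)
```

With several tones closer together than roughly one original bin width, padding alone will not separate them, since it interpolates the spectrum without adding resolution.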

mint palm
#

doubt regarding siamese applicability:

#

i read that it need similar NN configuration and input types.....but

#

can i apply it onto something like following:
input 1 is lips

#

input 2 is nose

#

if i wanted to predict skin disease, and the features that indicate skin disease are similar but differ in visibility (how identifiable they are) and in magnitude of identifiability

#

i mean both inputs are different in look, but used to identify same thing

charred cedar
#

Hello, keen for some help around implementing linear multiple regression analysis in Python please.
I have the analysis working in terms of I can select my four independent variables and my one dependent variable, and execute the analysis.
The problem is I have a bunch of control variables such as gender, age, another variable, and a categorical variable. How do I control for these?

dim palm
#

You must create as many binary variables as modalities in your categorical variable

#

gender variable must be a binary (1 or 0) not a string

upper spindle
#

what courses do people recommend for me to learn deep learning/ML

serene scaffold
tidal bough
#

I heard something about that course being moved to python

#

definitely not done yet, though

upper spindle
mild dirge
#

It's specific to pytorch, but if you are a little bit familiar with most of the concepts of machine learning they do also explain some of the basics throughout

charred cedar
mild dirge
#

You want to 1 hot encode organisation @charred cedar

charred cedar
#

What does that mean?

mild dirge
#

otherwise you implicitly assume that organisation 1 and 2 are more similar to each other than 1 and 3, for example

#

So instead of having 1 number, let it be represented by 3 numbers

#

and 1 hot encoding means it's either [1, 0, 0] [0, 1, 0] or [0, 0, 1]

wooden sail
#

1hot means to make a vector whose dimension is equal to the number of categories, with a 1 at the corresponding category and 0 everywhere else

mild dirge
#

for organisation 1 2 or 3
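A minimal sketch of one-hot encoding with pandas, on a made-up frame (the column names `orgid` and `age` and the values are assumptions). `drop_first=True` is used here because a regression with an intercept usually needs one category left out as the baseline; otherwise the dummy columns sum to the constant column:

```python
import pandas as pd

# Hypothetical data with a three-level categorical control variable.
df = pd.DataFrame({
    "orgid": ["A", "B", "C", "A"],
    "age": [30, 41, 25, 52],
})

# One 0/1 column per category, minus the first one (the baseline).
dummies = pd.get_dummies(df["orgid"], prefix="orgid", drop_first=True, dtype=int)

# Design matrix: numeric controls plus the dummy columns.
X = pd.concat([df[["age"]], dummies], axis=1)
print(X)
```

With this encoding each dummy coefficient reads as the difference from the baseline organisation, holding the other variables fixed.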

charred cedar
#

So a true false for each orgid?

mild dirge
#

jup

wooden sail
#

pretty much

mild dirge
#

basically

charred cedar
#

Then feed all three in as independent variables?

mild dirge
#

yes

charred cedar
#

Alright let me code this real quick...

#
               coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.0455      0.315      6.490      0.000       1.425       2.666
nofeed        -0.1022      0.041     -2.473      0.014      -0.184      -0.021
notrain       -0.0940      0.042     -2.258      0.025      -0.176      -0.012
cont           0.3919      0.074      5.313      0.000       0.247       0.537
neur          -0.0182      0.047     -0.385      0.700      -0.111       0.075
gender        -0.0362      0.141     -0.258      0.797      -0.313       0.241
age            0.0194      0.007      2.919      0.004       0.006       0.032
orgidA         0.4694      0.145      3.235      0.001       0.184       0.755
orgidB         0.7681      0.141      5.460      0.000       0.491       1.045
orgidC         0.8080      0.146      5.524      0.000       0.520       1.096
#

So does this mean I have controlled for them in the analysis now? Forgive the dumb question please 😄

wild dome
#

in Pandas, how to group each red box into one column with 3 subcolumns?

#

this is what I want

#

a "parent" column, if that makes sense lol

serene scaffold
#

if you have those columns in a separate dataframe (with the same row indexing as the current dataframe), you can get that behavior
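A short sketch of that idea with made-up frames: passing a dict of DataFrames to `pd.concat` with `axis=1` turns the dict keys into a top "parent" level above the existing column names. The names `group_1` and `group_2` are placeholders:

```python
import pandas as pd

# Two hypothetical frames with the same row index and the same subcolumns.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
right = pd.DataFrame({"a": [7, 8], "b": [9, 10], "c": [11, 12]})

# Dict keys become the outer column level (a MultiIndex on the columns).
combined = pd.concat({"group_1": left, "group_2": right}, axis=1)
print(combined)

# Selecting a parent column returns its three subcolumns.
print(combined["group_2"])
```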

stone pollen
#

what should i learn before trying to hop into data science and ml (considering i have basic python skills)

serene scaffold
#

data science is applied statistics, in many ways. and then ML is a lot of statistical inference.

stone pollen
#

thanks

mint palm
cold saddle
#

I have 5000 documents of 5 different types, 1000 each. I want a model which will tell me which of the 5 kinds a document is.
Any advice on where to begin?

mild dirge
#

what is a document?

#

It's like the most general definition of some data that you could give 😛

#

Is it just text, or also images, is it in image form, or do you have raw text as well? @cold saddle

cold saddle
#

Sorry for delay I got rear ended lol. I’m gona take a step back and think about my problem.

#

Okay so I have invoices as images. They are scans but high quality ones. 600dpi I think. I only have 5 different ones so I don’t mind making a separate model for each one actually. Is there a recommendation on where to start for extracting information? I have tried pdf to text ocr solutions but I don’t think that’s the way forward as the formatting isn’t great.

mild dirge
#

What differentiates the documents

#

If its the content of the text then you want to use that method

#

If it's just the lay-out, a cnn might be good enough alrdy

spare briar
# mint palm does no one know siamese? 😢

Siamese networks are not a good model for your application. Siamese architecture is typically used with triplet or contrastive loss to compare the two input images. The model essentially learns an energy function that measures similarity.

In the case of your example a siamese architecture could model something like whether the nose and lips originate from the same person.

Since you just want to use two images as inputs for classification the simplest thing would be to have a CNN for each of them, concatenate the outputs then add a few dense layers

cold saddle
# mild dirge If it's just the lay-out, a cnn might be good enough alrdy

Im going to focus on one kind of document first. Invoices from one vendor. The reason I can’t just do OCR and regex is the invoices come from China and the layout is similar but not perfect. Sometimes they very obviously cut and glue stuff lol. I think I just need bounding boxes around the table with the lines and paragraphs. Then I can OCR and regex what I need

#

I think my best path is treating them as images and CNN object detection. Since the docs are relatively similar I think I can be more specific than paragraphs and table

misty flint
#

highly recommend streamlit for ML prototypes if you arent already using it

#

especially if you have to show your model or analyses to others

half jolt
#

who offers services to parallelize a genetic algorithm in gpu python?

serene scaffold
half jolt
#

I have the algorithm, I just want to parallelize

serene scaffold
half jolt
#

parallelize fitness, crossover, mutation

#

or whatever is possible in the code

serene scaffold
#

GPUs are ideal for lots of independent, element-wise operations

half jolt
#

I have tried with Cuda but I cannot understand very well, in that part I am still very new

serene scaffold
half jolt
#

I want to hire someone who can help me. u be available?

iron basalt
#

!rule 9

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

iron basalt
#

What is your task for the genetic algorithm?

serene scaffold
half jolt
half jolt
iron basalt
#

What are your performance bottlenecks? Have you profiled it?

half jolt
iron basalt
#

Also need a bit more information on what kind of genetic algorithm / how it's implemented. Can it even be made parallel? By a GPU?

half jolt
iron basalt
#

Ok, but it depends on how that is done / represented. There are multiple ways, and the GPU is only good at some things (a lot, but there are limits to what it can do well / at all).

#

If you are dealing with a bunch of numeric arrays (e.g. big contiguous numpy arrays), then it may be the type of problem to run on a GPU.

#

Linear algebra computations.

#

Have you parallelized the algorithm without the GPU (on CPU)?

half jolt
#

what library could i use?

iron basalt
#

My answer depends on the type of computations being done. Are you doing things with numpy arrays and that is what is taking the most time?

#

(or arrays of numbers in general)

lapis sequoia
#

hello, don't want to clog up the channel, but i was just wondering if I could get some help with a dataframe problem in pandas.

half jolt
median moat
iron basalt
lapis sequoia
iron basalt
# half jolt numpy

Ok, you can try using numba first, maybe simply telling numba to parallelize it will be fast enough (it can do CPU or GPU, but for now try just CPU (assuming your CPU has a decent number of cores/threads)).

#

After that, if you want it even faster, you can try using cupy if you are using an nvidia GPU, and pyopencl if not.

#

Or Pytorch and to device and all that. That can work too although it's a bit more than needed (it's a whole deep learning framework, not just for some generic computation on the GPU).

#

cupy basically gives you numpy on the GPU.

half jolt
iron basalt
#

(Or CPU in parallel)

#

(it also makes the numpy code run faster even without parallelism)

half jolt
iron basalt
#

I think I missed that, ok, so you tried pytorch.

half jolt
#

maybe my algorithm cannot be parallelized 😦

iron basalt
#

Yeah, but also could be how you are doing it.

#

IDK what to really say other than having to learn more about parallelization. It's too complicated of a topic, you often have to do some pretty big transformations on the algorithm to get it to parallelize well.

half jolt
iron basalt
#

Common ones are splitting the algorithm into multiple passes / phases (e.g. 1 for loop becomes 3 separate ones), flipping the data "touching" POV upside down (really hard to explain that one, it influences what synchronization is needed (if any)), removing branching (if statements).

#

Making local copies of data so that you don't need to have locks.

iron basalt
#

I guess a big one is make sure you are not constantly moving data back and forth from CPU to GPU and back. Do it all in one place (in batches).

iron basalt
iron basalt
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

half jolt
#

there it is until the part of the crossover

iron basalt
#

This has a lot of pure python stuff happening in it, including pure/plain Python loops. If you can vectorize it with numpy it will be a lot faster.

iron basalt
#

It seems you are finding the argmin.

#

In a plain Python loop.

#

A better understanding of how to vectorize things with numpy will go a long way.

half jolt
#

and with GPU?

iron basalt
#

You probably don't need the GPU. Not even multiple CPU cores.

#

Simply making proper use of numpy will give you a very large performance increase.

#

If it's still too slow, you can throw numba at it (after correctly applying numpy).

half jolt
iron basalt
#

Yes, probably, just from a short glance looks like it.

#

GPU probably too, but it is probably not needed, unless you really start scaling up really big.

half jolt
iron basalt
#

If you don't want to learn more about numpy and just want to write in this style (hand written loops), then it's time to switch to a language that does numeric computation better.

#

(For example, if you translated this to something like C++ (pretty much as directly as possible), it would just be fast already (no parallelization))

half jolt
iron basalt
#
fo = np.inf
index = 0
for i in parents:
  _fo = fitness(poblacionInicial[i], rendimiento_kg_m2_prod, precio_kg_t_suma)
  if _fo < fo:
    fo = _fo
    index = i
#
float fo = F32_INFINITY;
int index = 0;
for (int i = 0; i < num_parents; ++i) {
  float _fo = fitness(poblacionInicial[i], rendimiento_kg_m2_prod, precio_kg_t_suma);
  if (_fo < fo) {
    fo = _fo;
    index = i;  
  }
}
half jolt
#

sorry if i didn't express myself

iron basalt
#

The idea with numpy is to avoid Python loops.

#

You operate on entire arrays rather than individual elements.

#

So fitness does not work on a specific i, but rather all of them (so no [i]).

half jolt
iron basalt
#

(The reason why avoiding Python loops is important is because they are slow, and instead the looping happens inside numpy which is implemented in C so its loops are fast like in the C++ example).

iron basalt
#

(Because for example, cupy has many of the same functions as numpy, and so it can be pretty much one to one converted (so the code looks the same, just using the GPU))

half jolt
iron basalt
#

Computers like groups of the same type of thing. For speed, simplicity, etc.

half jolt
iron basalt
#

So your functions should take arrays as arguments, and apply array-level operations like argmin.
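A rough sketch of that shape, with a made-up fitness (a weighted sum per individual) standing in for the real one. The names `fitness_all`, `population` and `weights` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.integers(0, 5, size=(6, 4))  # hypothetical: 6 individuals, 4 genes
weights = np.array([1.5, 2.0, 0.5, 3.0])      # hypothetical per-gene weights

def fitness_all(population, weights):
    """Score every individual at once: one value per row."""
    return (population * weights).sum(axis=1)

# No Python loop: one array of scores, then one argmin.
scores = fitness_all(population, weights)
best_index = int(np.argmin(scores))
print(best_index, scores[best_index])
```

The loop version computes the same thing one row at a time; here numpy does the looping internally.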

#

Now you might get into a situation in which you don't know how to vectorize it / don't know which numpy functions to use and don't see a way to do it. That is where numba comes in, it lets you make your own functions like argmin that are just as fast / operate on numpy arrays. Numba is made to work with numpy to fill any gaps in numpy (missing functions). It can also just make it a lot faster (and even run on the GPU, but also can parallelize on CPU (don't worry about this yet)).

half jolt
#

Thanks for the help

half jolt
iron basalt
#

Try splitting individuo into two different arrays. Or you can do some fancy numpy datatype stuff.

#

(So fitness takes two args)

#

So you can store stuff like this in general:
```py
[(x, y), (x, y), (x, y), ...]
```

#

Or

#
[x, x, x, ...]
[y, y, y, ...]
#

1 array versus 2 arrays.

#

This part is a bunch of elementwise operations: rendimiento[i]*precio[i]*j

#

And the rest is a sum.

#

So say you have individuo_i and individuo_j.

#

rendimiento[individuo_i] gives you another array.

#

Numpy lets you use arrays with indices in them to index another array.

#

np.sum(rendimiento[individuo_i] * precio[individuo_i] * individuo_j)

#
>>> a = np.random.randint(10, size=10)
>>> a
array([6, 4, 2, 8, 8, 8, 3, 1, 7, 1])
>>> b = np.array([3, 1, 5])
>>> b
array([3, 1, 5])
>>> a[b]
array([8, 4, 8])
>>> 
#

So, it's basically the same thing, just no hand written loop, working at the array level, and that includes indexing at the array level.
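To make the equivalence concrete, here is a small sketch comparing a hand-written loop with the array-level expression. The array names follow the messages above; the values are made up:

```python
import numpy as np

rendimiento = np.array([2.0, 3.5, 1.2, 4.0])  # hypothetical values
precio = np.array([10.0, 8.0, 15.0, 6.0])     # hypothetical values
individuo_i = np.array([0, 2, 3])             # indices into the arrays above
individuo_j = np.array([5.0, 1.0, 2.0])       # one quantity per index

# Hand-written loop, one element at a time.
loop_total = 0.0
for i, j in zip(individuo_i, individuo_j):
    loop_total += rendimiento[i] * precio[i] * j

# Same computation at the array level, using index arrays.
vector_total = np.sum(rendimiento[individuo_i] * precio[individuo_i] * individuo_j)

print(loop_total, vector_total)
```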

half jolt
tacit basin
#

Could you share sample frames you want to join and code and output?

timid narwhal
#

does anyone know how to turn the to_numpy output into an array with separated indices?

#

like [-35.2210673 -9.0063682 'Delmiro Gouveia 774, Maceió, Alagoas'] and add the commas into this [-35.2210673, -9.0063682, 'Delmiro Gouveia 774, Maceió, Alagoas']

#

I tried to do np.char.split but it doesnt work with nonstrings

tacit basin
timid narwhal
#

yeah

tacit basin
#

convert numpy array to python list?

royal crest
#

list()
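A small sketch of the conversion, assuming the row is a numpy object array like the one quoted above. The missing commas are only how numpy prints arrays; the elements are already separate:

```python
import numpy as np

# Stand-in for one row from DataFrame.to_numpy() with mixed types.
row = np.array(
    [-35.2210673, -9.0063682, "Delmiro Gouveia 774, Maceió, Alagoas"],
    dtype=object,
)

# tolist() gives a plain Python list, which prints with commas.
as_list = row.tolist()
print(as_list)
print(len(as_list))
```

For a 2D array, `tolist()` returns a list of lists, whereas `list()` would only convert the outer level.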

lone vortex
#

Anyone can help with pandas, I am completely new to it

rose agate
lone vortex
#

Ok thanks

bold timber
#

What is the meaning of ord? whether ord=1 is for calculating manhattan distance?

wooden sail
#

the common vector norms you are familiar with are what is called l-p norms, which consist of the sum of the absolute values of the entries of the vector raised to the pth power, and then you take the pth root of the whole sum

#

ord = 1 means raised to the first power and taking the 1st root, i.e. the sum of absolute values. as you said, this is the manhattan distance

#

ord = 2 is the usual euclidean distance

#

infinity norms return the element with largest or smallest magnitude, and 'fro' is short for Frobenius, which is similar to the 2-norm (euclidean distance) but for matrices (a double sum instead of a single sum)
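A quick sketch of how those `ord` values behave in `np.linalg.norm`, on a made-up vector:

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, ord=1)         # sum of absolute values (manhattan)
l2 = np.linalg.norm(v)                # default ord: euclidean length
linf = np.linalg.norm(v, ord=np.inf)  # largest absolute entry

# Dividing by the 2-norm gives a unit vector, whose 2-norm is 1.
unit = v / l2
print(l1, l2, linf, np.linalg.norm(unit))
```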

bold timber
#

why the result of this code is one? can you elaborate to me by math?

wooden sail
#

pth as in ordinal. e.g. 1st, 2nd, 3rd, 4th, 5th, 6th, etc

#

i don't think this server has a latex bot so i can't just write the math

#

lemme find an image

bold timber
wooden sail
#

if you substitute what you have into the equation i shared, and note that by default norm takes p = 2

#

we get sqrt((4/5)^2 + (3/5)^2) = sqrt (16/25 + 9/25) = sqrt(25/25) = 1

bold timber
tacit basin
wooden sail
#

.latex $\left( \sum_{n=1}^N \vert x_n \vert^p \right)^\frac{1}{p}$

dusty valve
#

how would i fit a dataset from a text file into a language prediction model?

peak ridge
#

How much math is important for learning data science, data visualization and machine learning?
I mean, how much math is required to be a data scientist?

gray orchid
odd meteor
wooden sail
#

but the higher the level, the better

peak ridge
#

@gray orchid @wooden sail resources to learn plz!

gray orchid
#

Oh, and the most important

#

the ability to find resources

wooden sail
#

uni, spivak's calculus book, gilbert strang's linalg book

#

louis scharf's statistical signal processing book

#

and classics like randolph moses and petre stoica's spectral analysis of signals

wooden sail
#

that book predates ML and AI becoming such hot buzzwords btw, so you'll find no mention of them. nowadays, most topics of signal processing, statistical analysis and optimization fall under that umbrella though. they overlap like 99% or one is a subset of the other, pretty much

terse frigate
#

Ant colony optimization

#

can someone explain an approach

wooden sail
#

you can set up a system of equations. kinda have to make an assumption on the desired quantity, but this should be a scalar factor. you can assume the output quantity is 1[units] * desired_percentage

#

with that in mind, you want a linear combination of the given percentages that yields the desired percentage

#

with the restriction that the sum of quantities equals 1

#

sounds like linear programming

wooden sail
#

mhm

terse frigate
terse frigate
wooden sail
#

idk, try lagrange multipliers?

#

you have one equation and a constraint

#

the equation is convex (but not strictly so, there might be many solutions)

wooden sail
#

heh the solution is the same as for beamforming, but there's a pseudo inverse of a rank 1 matrix involved. i'll type it up after i eat

wooden sail
#

ok let's give this a shot

#

.latex let's start by calling the desired percentage $p$, the given percentages $\boldsymbol{x} \in \mathbb{R}^n$, and our target quantity $\boldsymbol{w} \in \mathbb{R}^n$, i.e. the amount of each of the ingredients

#

oh latex is still not allowed here

#

oof

#

lemme grab my tablet

wooden sail
#

underlined quantities are vectors, so that underlined 1 is a vector of 1s of size n @terse frigate

#

this should give you ONE solution. there are others, since xx^T is rank 1
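
Since the handwritten solution was shared as an image, the following is only a hedged sketch of the setup described above, not a transcription of it. The percentages `x` and the target `p` are made-up numbers. It stacks the two conditions (the weighted percentages give `p`, and the quantities sum to 1) into one small linear system and takes the minimum-norm solution with the pseudo-inverse.

```python
import numpy as np

x = np.array([0.10, 0.30, 0.50])   # hypothetical percentage of each ingredient
p = 0.25                           # hypothetical desired percentage of the mix

# two equations, n unknowns: x^T w = p and 1^T w = 1
A = np.vstack([x, np.ones_like(x)])
b = np.array([p, 1.0])

w = np.linalg.pinv(A) @ b          # one solution (minimum norm) among many
print(w)
print(A @ w)                       # reproduces [p, 1] up to rounding
```

Nothing in this sketch prevents a quantity from coming out negative for other inputs; that is where the linear programming formulation with bounds `w >= 0` would be needed.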

mint palm
#

in transfer learning we have two steps, right?

  1. training on a task with a simple supervised ANN
  2. using the pretrained model to further train the ANN to suit a similar but slightly different problem

but i read about transfer learning in two places:

  1. after the pretrained model (trained with a supervised ann) was deployed, it still learned and improved
  2. after the pretrained model (trained with a supervised ann) was trained again (with a supervised ann), it was deployed
    i mean, the 2nd approach doesn't improve after deploying
#

are both these transfer learning??

#

i wanna implement the first one where it improve after deploying....but i am getting tutorial for 2nd only

wooden sail
#

they're both transfer learning, since the idea behind that is to train a part of the network ahead of time, and then keep its parameters fixed while adding new, trainable parameters after the pre-trained network. how you do the training of the new part is a different matter

mint palm
wooden sail
#

i'm not sure what methods are used for that

charred cedar
#

Can anyone explain this stupid example code to me please?

>>> import statsmodels.api as sm
>>> import statsmodels.genmod.families.links as links
>>> probit = links.probit
>>> outcome_model = sm.GLM.from_formula("cong_mesg ~ emo + treat + age + educ + gender + income",
...                                     data, family=sm.families.Binomial(link=probit()))
>>> mediator_model = sm.OLS.from_formula("emo ~ treat + age + educ + gender + income", data)
>>> med = Mediation(outcome_model, mediator_model, "treat", "emo").fit()
>>> med.summary()
#

Specifically the string arguments... because it makes no sense...

#

This is meant to be an example implementation of a mediated regression analysis in Python with statsmodels

dusty valve
#

i got this code -

#
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
print('done')
sentences = set(open('users.txt').read().split('\n'))

vocab_size = 1000
embedding_dim = 16
max_length = 16
trunc_type = 'post'
padding_type = 'post'
oov_token = '<OOV>'
training_size = 20000

# reuse the constants above so the tokenizer, padding and Embedding layer agree
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(sentences)

sequences = pad_sequences(sequences, padding=padding_type,
                          truncating=trunc_type, maxlen=max_length)


model = keras.Sequential([keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
                          keras.layers.GlobalAveragePooling1D(),
                          keras.layers.Dense(6, activation='relu'),
                          keras.layers.Dense(1, activation='sigmoid'),])
# the model is only defined here: nothing is compiled, trained or printed after this point
#

when i run it, all it shows is
2022-05-26 10:42:40.631846: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-05-26 10:42:40.632428: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-05-26 10:42:59.610722: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-05-26 10:42:59.611401: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-26 10:42:59.623799: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-KDFNN9DK
2022-05-26 10:42:59.625071: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-KDFNN9DK
2022-05-26 10:42:59.626772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

#

it only loads tf up, doesn't run the rest of the code

leaden crow
#

hey, for some reason an image won;'t show up behind my data

wild dome
#

in Pandas, how to merge the red boxes into a single cell? how to have multirows of rows with same value, from an existing dataframe

serene scaffold
#

do you want to sum them, or what?

#

or are you trying to have cells that span multiple rows? because you can't do that

desert oar
#

you can make a cell that contains tuples, although usually you don't want to do that

lapis sequoia
#

what is the difference between LSTM and RNN?

tacit basin
charred cedar
#

Second would be what does ~ mean?

#

Third would be is all the +'s after for control variables?

charred cedar
tacit basin
#

What is data?

charred cedar
#

I am assuming a dataframe but I don't know, this is example code

tacit basin
#

I guess these are col names

charred cedar
#

Yes they should be column names

tacit basin
#

Ok so mystery solved? :)

charred cedar
#

No unfortunately

#

It doesn't answer the three questions that have me stumped

tacit basin
#

So first q. We assume these are col names

#

~ usually means negation in python, but in these formula strings it separates the outcome on the left from the predictors on the right

#

the +'s, i would guess, are for the features to include in the linear model
serene scaffold
tacit basin
charred cedar
#

So I have Neuroticism (which is the column I want to use as a mediator), Lack of Feedback (which is the independent variable), and Job Satisfaction (which is the dependent variable). I also have Age, Gender, OrgidA, OrgidB, and OrgidC which are variables to control for. So how do you think I format those columns into the correct arguments?

#
probit = links.probit
outcome_model = sm.GLM.from_formula("neur ~ nofeed + jsat + age + gender + orgidA + orgidB + orgidC",
                                     df, family=sm.families.Binomial(link=probit()))
mediator_model = sm.OLS.from_formula("nofeed ~ jsat + age + gender + orgidA + orgidB + orgidC", df)
med = Mediation(outcome_model, mediator_model, "jsat", "nofeed").fit()
med.summary()
#

This gets some error which tells you nothing helpful.

#
D:\Projects\Python\135 Code\Git\BSN414\.venv\lib\site-packages\statsmodels\stats\mediation.py:372: RuntimeWarning: invalid value encountered in true_divide
  self.prop_med_tx = self.ACME_tx / self.total_effect
#

All column names are correct

tacit basin
#

Never used it. Now on mobile hard to debug this . Sorry

charred cedar
#

All good, I appreciate the help either way. Do you think this is the way the string arguments are done though?

tacit basin
#

Let me check docs

charred cedar
#

That is the annoying part, docs are useless. Do you need the link again though?

tacit basin
#

These are R-style formulas I'm reading

charred cedar
#

Yes this Python package is probably based on R

charred cedar
#

I'll admit I don't understand this formula writing, and for a 3am read, these docs also aren't very clear.

#

None the less the code snippet should be correct for a mediated regression analysis.
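
For the formula strings themselves, here is a hedged sketch of how the columns described above might map onto the pattern of the quoted example. It assumes `nofeed` (lack of feedback) is the exposure, `neur` (neuroticism) the mediator and `jsat` (job satisfaction) the outcome; `mediation_formulas` is a hypothetical helper, not part of statsmodels. In an R-style formula, the name left of `~` is the variable being modelled and the names joined by `+` are its predictors.

```python
def mediation_formulas(outcome, mediator, exposure, controls):
    """Build the two R-style formula strings used in a mediation analysis."""
    rhs_controls = " + ".join(controls)
    # outcome model: the outcome explained by mediator, exposure and controls
    outcome_formula = f"{outcome} ~ {mediator} + {exposure} + {rhs_controls}"
    # mediator model: the mediator explained by exposure and controls
    mediator_formula = f"{mediator} ~ {exposure} + {rhs_controls}"
    return outcome_formula, mediator_formula

outcome_formula, mediator_formula = mediation_formulas(
    outcome="jsat",
    mediator="neur",
    exposure="nofeed",
    controls=["age", "gender", "orgidA", "orgidB", "orgidC"],
)
print(outcome_formula)
print(mediator_formula)
```

In the quoted example, `cong_mesg` plays the outcome role, `emo` the mediator and `treat` the exposure, so under the assumption above the last two arguments to `Mediation` would be `"nofeed"` and `"neur"`. `Mediation` lives in `statsmodels.stats.mediation`, which the quoted snippet does not import. Whether the binomial/probit family from the example suits a satisfaction score is a separate modelling question.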

haughty topaz
#
from sklearn.preprocessing import MultiLabelBinarizer

the_100_most_common_words = ['i', 'you', 'the', 'to', 'and', 'a', 'it', 'ross', 'monica', 'rachel', 'chandler', 'is', 'that', 'joey', 'phoebe', 'oh', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'you', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'what', 'na', 'out', "'re", 'at', 'yeah', 'no', 'so', 'scene', 'well', 'your', 'there', 't', 'hey', 'no', 'she', 'okay', 'ross', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'chandler', 'can', 'monica', 'joey', 'rachel', 'the', 'here', 'phoebe', 'm', 'it', 'uh', 'they', 'one', 'think', 'mean', 'did', 'so', 'all', 're', 'see', 'don', 'back', 'and', "'ll", 'from', 'he', 'okay', 'if', 'want', "y'know"]

mlb = MultiLabelBinarizer().fit([the_100_most_common_words])

sentence_to_transform = ["c'mon", ',', 'you', "'re", 'going', 'out', 'with', 'the', 'guy', '!']

vector = mlb.transform([sentence_to_transform])
print(vector)
print(len(vector[0]))
#

The length of the 100 most common words is 100
How come the length of the vector it creates is only 84?

charred cedar
#

Is that intended?

haughty topaz
#

no that's just a copy paste mistake

charred cedar
#

Did you confirm the length of that list?

haughty topaz
#

Yea yea it's 100

#

For sure

charred cedar
#

Well that is the dumb reasons checked off. I don't know enough about the sklearn package unfortunately.

#

Goodluck fixing it.

haughty topaz
serene scaffold
#

!e

print(len(set(['i', 'you', 'the', 'to', 'and', 'a', 'it', 'ross', 'monica', 'rachel', 'chandler', 'is', 'that', 'joey', 'phoebe', 'oh', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'you', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'what', 'na', 'out', "'re", 'at', 'yeah', 'no', 'so', 'scene', 'well', 'your', 'there', 't', 'hey', 'no', 'she', 'okay', 'ross', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'chandler', 'can', 'monica', 'joey', 'rachel', 'the', 'here', 'phoebe', 'm', 'it', 'uh', 'they', 'one', 'think', 'mean', 'did', 'so', 'all', 're', 'see', 'don', 'back', 'and', "'ll", 'from', 'he', 'okay', 'if', 'want', "y'know"])))
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

84
charred cedar
#

Guess we missed a dumb reason.
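
A quick way to spot this kind of problem before fitting anything is to count the labels. The list below is a shortened stand-in for the real word list, just to show the idea.

```python
from collections import Counter

# shortened stand-in for the "100 most common words" list
words = ["you", "the", "ross", "you", "what", "ross", "what", "no", "no"]

counts = Counter(words)
duplicates = {word: n for word, n in counts.items() if n > 1}

print(len(words), len(set(words)))   # raw length vs. number of distinct labels
print(duplicates)                    # labels that occur more than once
```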

#

😄

serene scaffold
#

😄

wooden sail
#

ok, let's give this another shot

#

.latex $\left( \sum_{n=1}^N \vert x_n \vert ^p \right)^\frac{1}{p}$ for the l-p norm

strange elbowBOT
wooden sail
#

aight, cool

serene scaffold
#

!docs pandas.DataFrame.groupby

arctic wedgeBOT
#

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
serene scaffold
#

groupby returns a grouped dataframe, whereon you can apply another operation, like mean

#

you would probably want to drop the name column before grouping, since it doesn't matter for this.

haughty topaz
#
from sklearn.preprocessing import MultiLabelBinarizer

the_100_most_common_words = ['you', 'the', 'to', 'and', 'a', 'it', 'is', 'that', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'na', 'out', "'re", 'at', 'so', 'your', 'there', 't', 'no', 'she', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'can', 'here', 'm', 'uh', 'they', 'one', 'think', 'mean', 'did', 're', 'see', 'don', 'back', "'ll", 'from', 'okay', 'if', 'want', "y'know", 'look', 'now', 'over', 'really', 'guys', 'guy', 'as', 'how', 'then', 'who', 'phone', '‘', 'by', 'ah', "'ve", 'would', 'when', 'thing', 'down', 'going', 'good', 'were', 'tell', 'had', 'off', 'apartment', 'door', 'something']

mlb = MultiLabelBinarizer().fit([the_100_most_common_words])

sentence_to_transform = ["c'mon", ',', 'you', "'re", 'going', 'out', 'with', 'the', 'guy', '!']

vector = mlb.transform([sentence_to_transform])
print(vector)
#
[[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0]]
#

Why does this give this vector bruh, I can't with this MultiLabelBinarizer

#

Wtf it sorts the classes?

serene scaffold
haughty topaz
#

Yeah but look at the classes I fit in the MultiLabelBinarizer

serene scaffold
haughty topaz
#

No it doesn't need to actually

#

but I didn't get why it did that

serene scaffold
#

there's a mean() method

haughty topaz
#

weird that it sorts the classes
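
On the ordering: when no `classes` argument is given, `MultiLabelBinarizer` learns the distinct labels and stores them sorted. A sketch of the difference, using a shortened stand-in vocabulary rather than the real list:

```python
import warnings
from sklearn.preprocessing import MultiLabelBinarizer

vocab = ["you", "the", "to", "with", "out"]            # shortened stand-in vocabulary
sentence = ["c'mon", "you", "out", "with", "the", "guy"]

sorted_mlb = MultiLabelBinarizer().fit([vocab])                # classes get sorted
ordered_mlb = MultiLabelBinarizer(classes=vocab).fit([vocab])  # classes keep the given order

print(list(sorted_mlb.classes_))
print(list(ordered_mlb.classes_))

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # labels outside the vocabulary only trigger a warning
    vector = ordered_mlb.transform([sentence])
print(vector)                         # columns follow the order of vocab
```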

serene scaffold
#

I would need to see dfAthletesClean.head().to_dict('list') as text to tell you.

#

but I'm not sure why you didn't just do dfAthletesClean.groupby(['sex', 'nationality'])['height'].mean()
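
As a sketch of that one-liner on a tiny made-up frame (the real `dfAthletesClean` was never posted as text, so the rows here are invented and only the column names follow the message above):

```python
import pandas as pd

# tiny made-up stand-in for dfAthletesClean
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "sex": ["female", "female", "male", "male"],
    "nationality": ["BRA", "BRA", "BRA", "USA"],
    "height": [1.70, 1.80, 1.90, 2.00],
})

# selecting ["height"] before mean() means the name column never needs dropping
mean_height = df.groupby(["sex", "nationality"])["height"].mean()
print(mean_height)
```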

quiet cloak
#

I just looked through this channel and I did not understand a thing that was written in here. Computer science major so flop right now 🤦‍♂️

haughty topaz
#

same

mint palm
#

has anyone used NS2 before??

serene scaffold
wooden sail
#

you could also do a phd in CS and never touch any of these topics

#

CS is pretty broad, and depending on your country, ranges from really software dev, to basically a branch of math

#

this stuff can fit somewhere in between

serene scaffold
#

so, you're doing mean imputation. the best solution I can think of involves pd.merge, and that might be confusing for you.
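
A hedged sketch of group-wise mean imputation, first via `pd.merge` as mentioned above and then via `transform`, which avoids the merge. The frame and its column names are assumptions borrowed from the earlier athlete example, since the actual data was only shown as a screenshot.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male"],
    "nationality": ["BRA", "BRA", "BRA", "USA", "USA"],
    "height": [1.70, np.nan, 1.80, 2.00, np.nan],
})
keys = ["sex", "nationality"]

# merge route: compute per-group means, join them back, then fill the gaps
group_means = (
    df.groupby(keys, as_index=False)["height"].mean()
      .rename(columns={"height": "group_mean"})
)
merged = df.merge(group_means, on=keys, how="left")
merged["height"] = merged["height"].fillna(merged["group_mean"])

# shorter alternative: transform returns the group mean aligned to the original rows
filled = df["height"].fillna(df.groupby(keys)["height"].transform("mean"))

print(merged)
print(filled)
```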

mint palm
#

if a model is to be built as a pretraining model, how should i evaluate its appropriateness?
i mean, should i evaluate it normally, using accuracy, f1 score, confusion matrix etc??

wild dome
#

but now I wanna take a different approach

serene scaffold
wild dome
#

consider this dataframe, and note how each color represent the same instance parameters, for example n=50, m=50, p=5, a=2 in red

I have multiple rows with these same parameters, and I want to group them by their average, so instead of having multiple rows, I want only one row of the same parameters but with the rest of the data being the average of the total rows

#

so the desired output is like

#

imagine the data from RGD to the right is the average of the original

agile cobalt
#

re: the original thing
depending on what you want to do, perhaps you could use a MultiIndex or just df.groupby(), but "multirows" does not make much sense to me

serene scaffold
wild dome
serene scaffold
#

I'm not completely sure how you'd achieve that when your columns are multiindexed

wild dome
serene scaffold
#

if you do print(df.head().to_dict('list')) and show the text, I can experiment. No screenshots.

scenic tulip
#

you could do groupby(['n', 'm', 'p', 'alpha']). pretty sure you can group by multiple columns eh? in sql you can

wild dome
# serene scaffold if you do `print(df.head().to_dict('list'))` and show the text, I can experiment...
{('instance', 'n'): [50, 50, 50, 50, 50],
 ('instance', 'm'): [50, 50, 50, 50, 50],
 ('instance', 'p'): [5, 5, 12, 12, 5],
 ('instance', 'alpha'): [2, 3, 2, 3, 2],
 ('RGD', 'OF'): [595, 824, 387, 595, 716],
 ('NI', 'OF'): [519, 626, 306, 358, 547],
 ('NI', 'time'): [0.2850522999999612,
  0.9070183999999699,
  0.2490571999999247,
  0.3609853000000385,
  0.35417499999994106],
 ('NI', 'improvement'): [12.77310924369748,
  24.02912621359223,
  20.930232558139537,
  39.831932773109244,
  23.60335195530726],
 ('FVS', 'OF'): [519, 626, 305, 438, 547],
 ('FVS', 'time'): [0.010051900000007663,
  0.04784549999999399,
  0.007143799999994371,
  0.005448199999818826,
  0.011858300000085364],
 ('FVS', 'improvement'): [12.77310924369748,
  24.02912621359223,
  21.188630490956072,
  26.386554621848738,
  23.60335195530726]}
#

ok I tried this code

df.groupby([("instance", "n"), ("instance", "m"), ("instance", "p"), ("instance", "alpha")]).mean()
#

the instance column is cursed lol now I'll try without top columns

serene scaffold
#
In [22]: poop.index.names
Out[22]: FrozenList([('instance', 'n'), ('instance', 'm'), ('instance', 'p'), ('instance', 'alpha')])

In [23]: poop.index.names = 'n m p alpha'.split()

In [24]: poop
Out[24]:
                  RGD     NI                          FVS
                   OF     OF      time improvement     OF      time improvement
n  m  p  alpha
50 50 5  2      655.5  533.0  0.319614   18.188231  533.0  0.010955   18.188231
         3      824.0  626.0  0.907018   24.029126  626.0  0.047845   24.029126
      12 2      387.0  306.0  0.249057   20.930233  305.0  0.007144   21.188630
         3      595.0  358.0  0.360985   39.831933  438.0  0.005448   26.386555
#

I had to think of a name for the resultant df, so I picked "poop" because I didn't like it.

#

but, uh, there you go
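
For reference, a sketch that rebuilds part of the posted dict and repeats the steps shown above: group by the four `instance` columns (addressed by their full tuple names), average, then replace the tuple index names with short ones. Only a subset of the columns is used to keep it short.

```python
import pandas as pd

# subset of the dict posted above; tuple keys become MultiIndex columns
data = {
    ("instance", "n"): [50, 50, 50, 50, 50],
    ("instance", "m"): [50, 50, 50, 50, 50],
    ("instance", "p"): [5, 5, 12, 12, 5],
    ("instance", "alpha"): [2, 3, 2, 3, 2],
    ("RGD", "OF"): [595, 824, 387, 595, 716],
    ("NI", "OF"): [519, 626, 306, 358, 547],
}
df = pd.DataFrame(data)

keys = [("instance", "n"), ("instance", "m"), ("instance", "p"), ("instance", "alpha")]
averaged = df.groupby(keys).mean()
averaged.index.names = ["n", "m", "p", "alpha"]   # short names instead of tuples
print(averaged)
```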

wild dome
#

thanks

#

I have a question about the index

#

I removed the top headers and ran this code

results50.groupby("n m p alpha".split()).mean()
#

and if I add reset_index I get the following

results50.groupby("n m p alpha".split()).mean().reset_index()
#

why in the first case I had 2 rows in the headers? is it a multiindex too?

#

now for context, I'm gonna write this DF to a latex table, that's why I'd prefer a multirow

#

so I like the first output, without .reset_index, but idk why there are 2 rows in the headers
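
The screenshots are not available, so this is a guess at what they showed: after `groupby(...).mean()` the grouping keys move into a row MultiIndex, and pandas prints the index names on their own line under the column header, which looks like a second header row. A sketch with made-up numbers that checks this:

```python
import pandas as pd

# made-up stand-in for results50 with the top header level removed
results50 = pd.DataFrame({
    "n": [50, 50, 50],
    "m": [50, 50, 50],
    "p": [5, 5, 12],
    "alpha": [2, 2, 2],
    "RGD": [595, 716, 387],
})

grouped = results50.groupby("n m p alpha".split()).mean()
print(grouped.columns.nlevels)   # header levels on the columns
print(grouped.index.names)       # the extra printed row is just these names

flat = grouped.reset_index()     # keys become ordinary columns again
print(list(flat.columns))
```

If the goal is a LaTeX table with merged key cells, keeping the MultiIndex (no `reset_index`) is probably the right starting point; `DataFrame.to_latex` has a `multirow` option worth checking in the pandas docs.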

serene scaffold
wild dome
serene scaffold
lapis sequoia
#

Can someone please share links to other big servers of data science and ml

serene scaffold
lapis sequoia
#

I tried joining it once. But didn't get entry access

#

The DS one

#

Stuck in quarantine

serene scaffold
#

that may be by design. the DS server tries to cater to a more knowledgeable crowd than we do.

lapis sequoia
#

But how did they find out that I am not knowledgeable 🤪

#

I am just stuck outside

serene scaffold
lapis sequoia
#

Lol

#

Now you are just making jokes

serene scaffold
#

did you read everything in the screenshot?

lapis sequoia
#

Maybe that's how they determine if I am smart or not

#

Can you help me cheat on this "exam"

serene scaffold
#

the "Hint:" part looks relevant

lapis sequoia
serene scaffold
lapis sequoia
#

Oh there's a question

#

Regarding other name to normal distribution

robust jungle
#

after augmenting my data (yale faces dataset) my loss actually went up and my accuracy went down, what am I doing wrong?

lapis sequoia
#

I broke out mate 😀

#

Smart kolv

serene scaffold
lapis sequoia
#

Well. I am very smart and ||googled it||

serene scaffold
median moat
#

Reading and the ability to use Google?!? Impossible.

misty flint
lapis sequoia
mighty relic
#

Hi guys, I wanted to showcase my forecasting package here. I mentioned it six months ago. I am a professional forecaster and felt like this was a gap when it comes to large scale enterprise forecasting.
https://github.com/alexhallam/tablespoon

GitHub - alexhallam/tablespoon: 🥄✨Time-series Benchmark methods that are Simple and Probabilistic

#

I will be online for about an hour if anyone has any questions about it.

#

If you click on "Open in Colab" you can run it in Google Colab.

austere steppe
#

Hey everyone I have a problem on an exercise if someone can help me thanks

#

I can't show the linear regression on my scatter plot

#

It uses pandas, matplotlib and scikit-learn

mighty relic
#

can you share your notebook link?

misty flint
meager portal
#

I've been stuck on this for several months now and no video really explained it well. What weights do I use for the partial derivative? Do I transpose the matrices and take their dot product? Do I multiply the derivatives of all the weights together with respect to the previous layer? What do I do?

main fox
barren wedge
#

does anyone implement torch.jit in Bert model?

serene scaffold
#

also, keep in mind that "implement" does not mean the same thing as "use".

barren wedge
wooden sail
# meager portal I've been stuck onto this for several months now and no video really explained i...

the example you show there does not appear to have any matrices at all, though you could generalize it to W_i being matrices and a_i and y being vectors. in the example you showed, all they have done is use the chain rule repeatedly, and the equations given are exactly what you would do to update the parameters: the gradients here are only products of the weights, and the only explanation really is "use the chain rule". as for the matrix case, there is no general expression for the derivative. some authors like expressing it all in einstein notation to hide the pain of the derivative being a 3-way tensor. you can also use some matrix unfoldings to turn it into a huge matrix. the easiest way is to find the expression component-wise and apply it that way, or use einsum to do the relevant operations

#

some things you can do are read about tensor unfoldings, einstein notation, and simply brush up your chain rule. since the weights and biases represent affine transformations, and the activation functions are usually "well-behaved", the derivatives are usually not very difficult to analyze component-wise if you write everything as a sum (or in einstein notation foregoing the sigmas)

#

you might find "the matrix cookbook" a useful read, although it presents some common matrix calculus results without any proof. the proofs follow from writing out the sum and doing it by hand 😛 not very difficult, but certainly tedious
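
To make "use the chain rule repeatedly" concrete, here is a minimal sketch for a two-layer network with scalar weights. The sigmoid activation, the squared-error loss and all numbers are assumptions for illustration (the original figure is not available); the hand-derived gradients are compared against finite differences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w1, w2, x):
    a1 = sigmoid(w1 * x)     # hidden activation
    a2 = sigmoid(w2 * a1)    # output activation
    return a1, a2

def loss(w1, w2, x, y):
    return (forward(w1, w2, x)[1] - y) ** 2

w1, w2, x, y = 0.5, -0.3, 1.5, 1.0
a1, a2 = forward(w1, w2, x)

# chain rule: one factor per step between the loss and the weight
dL_da2 = 2.0 * (a2 - y)
da2_dz2 = a2 * (1.0 - a2)                          # sigmoid'(z2)
da1_dz1 = a1 * (1.0 - a1)                          # sigmoid'(z1)
dL_dw2 = dL_da2 * da2_dz2 * a1
dL_dw1 = dL_da2 * da2_dz2 * w2 * da1_dz1 * x       # passes through w2 on the way back

# finite-difference check of both gradients
eps = 1e-6
num_dw1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
num_dw2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
print(dL_dw1, num_dw1)
print(dL_dw2, num_dw2)
```

The finite-difference comparison is a handy habit for the matrix case too: derive the expression component-wise, then check it numerically before trusting it.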

glass lark
#

is there a website for learning data science?

lapis sequoia
#

Sharing the best Pandas cheat sheet I have found yet. In case someone else might be interested. It's super intuitive and easy to understand!

bold timber
#

How do I grab the values 3, 5, 7?

drifting fjord
torpid cave
#

Hey @mighty relic, went through the package. Looks nice. You're only doing 3 methods though (maybe I saw it wrong)

#

Are you going to include more in the future?

bold timber
drifting fjord
bold timber
cerulean violet
#

Hello there,I am getting this error for my JARVIS AI well it aint workin

sleek tapir
#

for svm

#

can u tune C, gamma and kernel at the same time

#

kernel has [linear, rbf, sigmoid, poly]

#

or u cant do tat
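
One possible way to search over all three at once is scikit-learn's `GridSearchCV`. This is a sketch on the iris dataset with arbitrary candidate values, not a recommended grid. Passing a list of dicts keeps `gamma` out of the linear-kernel search, where it has no effect.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# one sub-grid per group of kernels; values here are arbitrary examples
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf", "sigmoid", "poly"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
]

search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```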

mint palm
#

are there any rules for limiting regularization usage while pre training a model before deployment

sleek tapir
#

hmmm

fading geyser
#

bro u were right

#

it was corrupted

dusty valve
#

how do i create a model and train it from a text file of sentences like

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
because all the tutorials i tried to find didn't show me exactly how they worked
gray orchid
young granite
#

i created 7 subplots on a grid and now want to call fig.update_layout, but only the first subplot is changed. how can i update all of them at the same time?:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# one 3D (surface) subplot per row; each one gets its own scene (scene, scene2, ...)
fig = make_subplots(rows=7, cols=1,
        specs=[[{'type': 'surface'}] for _ in range(7)])
count = 0
for group_name in data:
    define = "7a direct"
    if define in group_name:
        count += 1
        trace = group_name
        # keep=False drops every row whose name occurs more than once
        df = data[group_name].drop_duplicates(subset="name", keep=False)
        z = df.drop(["name"], axis=1)
        fig.add_trace(go.Surface(z=z,
                                 y=df["name"],
                                 x=df.columns[1:],
                                 name=trace,
                                ),
                      row=count,
                      col=1,
                     )
# fig.update_layout(scene=...) only touches the first scene;
# fig.update_scenes(...) applies the same settings to every 3D subplot
mint palm
#

what kind of notation is that?

#

"pi is represented by an
Artificial Neural Network (ANN), which is generated by
AI algorithms"

tidal bough
#

huh, this looks to me like they either meant it's a set of 3 things (in which case they probably mean there are 3 kinds of NNs), or (less likely) an array of 3 things (in which case I guess they can be meaning that it's an NN consisting of 3 separate NNs).

final field
#

can anyone help me with object detection with tensorflow?

dusty valve
#

im following this tutorial in tf - https://www.tensorflow.org/text/tutorials/text_generation
but im getting this error

Input 0 of layer "gru" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (100, 256)

Call arguments received by layer "my_model" (type MyModel):
  • inputs=tf.Tensor(shape=(100,), dtype=int64)
  • states=None
  • return_state=False
  • training=False
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

dusty valve
fierce pine
#

I'm learning machine learning and doing an internship. Can anyone please give me some project ideas

fierce pine
dusty valve
fierce pine
#

Can u give me some project ideas please

mint palm
#

Can it be?

brittle skiff
#

hello guys, can someone help me with bit strange question

"""
VAR 3

    min z = min(3x1 - 5x2 - 2x3 + 4x4)
    x1 + 7x2 + x3 + 7x4 <= 46
    3x1 - x2 + x3 + 2x4 <= 8
    2x1 + 3x2 - x3 + x4 <= 10
    xi >= 0, i = 1,2,3,4
"""

table: list = [[1, 7, 1, 7, 1, 0, 0],
               [3, -1, 1, 2, 0, 1, 0],
               [2, 3, -1, 1, 0, 0, 1],
               [-3, 5, 2, -4, 0, 0, 0],
               [46, 8, 10, 0]]

n, m = 4, 3


index_max_basis: int = table[-2].index(max(table[-2], key=abs))

max_basis_column: list = [column[index_max_basis] for column in table[:m+1]]
divided_basis: list = [float(f'{(i / j):.3f}') if j > 0 else 0 for i, j in zip(table[-1], max_basis_column)]

index_basis_row = divided_basis.index(min([x for x in divided_basis if x != 0]))

max_basis_row: list = table[index_basis_row][:]
max_basis_row.insert(0, table[-1][index_basis_row])

intersectDigit: int = max_basis_row[index_max_basis+1]

for column_id, column in enumerate(table):
    print(f"\nPART {column_id}\n")
    for row_id, row in enumerate(column):
        if len(max_basis_column) > column_id and row != max_basis_row[1:][row_id] and column != max_basis_column[column_id]:
            # print(f'{row} - ({max_basis_row[1:][row_id]} * {max_basis_column[column_id]}) / {intersectDigit}')
            print(row - (max_basis_row[1:][row_id] * max_basis_column[column_id]) / intersectDigit)
        elif row in max_basis_row and row == max_basis_row[row_id+1]:
            print(row / intersectDigit)
        else:
            print(row - (table[-1][index_basis_row] * max_basis_column[row_id]) / intersectDigit)

it's a linear programming task and i need to solve it using the simplex method. i mostly got it working but got stuck on the last if/else...
Every other column and row works perfectly except *x2* and *F(x1)*
Screenshot 1: simply look at the **B** column, and at what i got in *screenshot 2*. If i change AND to OR i get *screenshot 3*. But i need both of them.
Hope u can help me. I can explain if need.
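
While debugging a hand-written simplex, it can help to have a reference answer to compare against. This sketch feeds the same problem (objective and constraints copied from the docstring above) to `scipy.optimize.linprog`; it is only a cross-check, not a replacement for the exercise.

```python
from scipy.optimize import linprog

c = [3, -5, -2, 4]            # minimise 3x1 - 5x2 - 2x3 + 4x4
A_ub = [[1, 7, 1, 7],
        [3, -1, 1, 2],
        [2, 3, -1, 1]]
b_ub = [46, 8, 10]

# linprog's default bounds are x_i >= 0, matching the problem statement
res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.status, res.x, res.fun)
```

Working the problem by hand (and checking with the dual) gives z = -49.25 at x = (0, 4.75, 12.75, 0), so the final tableau of a correct simplex run should land on the same values.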
civic stone
#

Hello Everyone,

I am working on Clustering Documents
i used TF-IDF matrix for vectorization
is there any other clustering algorithms that can work with TF-IDF matrix except K-Means and HAC ?

Thanks

bold canopy
#

Hello,
i have the following code and i want to subtract the newly calculated gradient from my old weights, but instead of subtracting it from the 1 at the beginning, it replaces the weight with the gradient itself

self.calculate_gradient(self.X_train, self.y_train, self.weights)
new_weights = self.weights - self.alpha * self.gradient
self.send_new_weights(new_weights)

In the screenshot you can see what's happening, but i want the outcome of new weights to be
1 - loss and not -loss

wooden sail
#

the gradients are much too big, look at them

#

for practical purposes, that 1 may as well be 0 when you subtract a number 12 orders of magnitude larger

bold canopy
#

I have to calculate the squared loss

#

but i dont know really how to calculate the gradient of the squared loss i assumed its 2*(y_pred - y) . x.T

#

of this

#
y_pred = np.dot(x, weights)
diff = y_pred - y
self.gradient = 2 * (np.dot(x.T, diff))
wooden sail
#

what size are x, weights, and y?

bold canopy
#

x is 50000, 406, weights 406,1 and y 50000,1

wooden sail
#

ok

bold canopy
#

i don't know if the formula i'm using for the gradient is right

strange elbowBOT
wooden sail
#

.latex we have the model $y_{\text{pred}} = X w$ and the loss $\Vert y - X w \Vert_2^2$

strange elbowBOT
wooden sail
#

i wonder what the matter is, the log isn't super helpful

#

.latex we have the model $y_{pred} = X w$ and the loss $\Vert y - X w \Vert_2^2$

strange elbowBOT
wooden sail
#

i guess it didn't like the text box in the subscript, weird

#

anyway

#

.latex the gradient w.r.t. w is indeex $X^T (X w - y)$

strange elbowBOT
wooden sail
#

indeed* typo

#

and i missed a factor of 2, what's wrong with me today

serene scaffold
#

!otn a indeex

arctic wedgeBOT
#

:ok_hand: Added indeex to the names list.

wooden sail
bold canopy
#

Ok then i guess my alpha has to be much smaller so the loss isn't so big any more

wooden sail
#

for your info, the stability of gradient descent applied to linear least squares problems, if you keep your step size fixed, relies on the step size being SMALLER than 1/largest singular value squared of X

#

or equivalently, 1/largest eigenvalue of X^TX

bold canopy
#

Ok thank you very much

wooden sail
#

.latex though it seems you're working without the factor 1/2 in front, so revise that to $\frac{1}{2 \sigma_{\max}^2(X)}$

strange elbowBOT
bold canopy
#

which 1/2 factor ?

wooden sail
#

some people like putting a 1/2 in front of their least squares cost so that the factor 2 that pops up in the gradient cancels out

#

your gradient has that factor 2 in front, which means the lipschitz constant is also twice as big

bold canopy
#

but wouldnt it still be least squares when i put 1/n in front?

wooden sail
#

you can put whatever scalar factor you want in front. this changes the minimum value, but not the minimizer 😛 just be careful with the step size because you need to account for the actual size of the lipschitz constant when doing gradient descent. otherwise, the algorithm will converge slower than it could, or will diverge altogether

#

here without that factor 2 in the denominator of the step size, the alg would diverge
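
A minimal sketch of the step size rule above, assuming a small random problem in place of the real (50000, 406) data. The names `run_gd`, `safe_step`, and the problem sizes are made up for illustration. It uses the same gradient as the code earlier in this thread, `2 * X.T @ (X @ w - y)`, and compares a step of 1/(2 sigma_max^2) with one that is four times larger.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # small stand-in for the real data
w_true = rng.normal(size=(5, 1))
y = X @ w_true + 0.01 * rng.normal(size=(200, 1))

def gradient(w):
    # gradient of ||y - X w||_2^2 with respect to w
    return 2 * X.T @ (X @ w - y)

def run_gd(step, iters):
    w = np.zeros((5, 1))
    for _ in range(iters):
        w = w - step * gradient(w)
    return w

sigma_max = np.linalg.norm(X, 2)          # largest singular value of X
safe_step = 1.0 / (2.0 * sigma_max**2)    # 1 / Lipschitz constant of the gradient

w_safe = run_gd(safe_step, iters=500)
w_big = run_gd(4.0 * safe_step, iters=50) # step beyond 1 / sigma_max^2

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print("safe step, distance to least squares solution:", np.linalg.norm(w_safe - w_ls))
print("large step, norm of the iterate:", np.linalg.norm(w_big))
```

With the safe step the iterate should end up at the least squares solution, while the larger step makes the iterate grow with every update.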

bold canopy
#

ill test it

woven coral
#

hello

#

anyone working on transformer models ???

misty flint
#

vector databases are really cool

#

very good for semantic search and RecSys

woven coral
#

bert,albert anyone knows???

misty flint
lapis sequoia
#

Hi, I want to make an item-based recommendation system. I found some info on the internet and tried to rebuild their idea. They did the following: I always get this error...

#

fixed, thanks!

misty flint
#

there are different RecSys for different use cases and you can see the pros/cons of each

lapis sequoia
#

looks good indeed!

misty flint
mint palm
#

In transfer learning, after pretraining, when we deploy the architecture it learns through unsupervised methods right??

#

but if we talk about a classifier model, how does it know during fine tuning which cluster belongs to which class of the pretrained model

velvet plover
#

@spiral peak this is how it looks like

spiral peak
velvet plover
#

how can i do it sorry im relatively new to python

spiral peak
#

So for this, I would use numpy to select the values that exclude the first X amount and the last Y amount. Looking at your code I think that can be done when you define w=... and p=..., you can slice them further and only take the section you're interested in

velvet plover
#

alright im going to try it now

#

@spiral peak it unfortunately didnt work

#

it cuts off the curve but the pitch doesnt fit

civic stone
trim sapphire
trim sapphire
fierce pine
#

Where can i start with machine learning?

serene scaffold
worthy trail
#

Any recommendations on books that teach you stats in Python? My stats knowledge is very basic so I would like to get comfortable with advanced concepts like p-values, probability distributions, chi square testing etc through Python before jumping into ML. Been working at an AI company as a backend engineer (Python) so understanding what data science talks about/does everyday would be nice lol

worthy trail
bitter quarry
#

I’m hella stuck in my programming project zzzz my head is gonna burst can someone help me list the salary range and their total

serene scaffold
bitter quarry
serene scaffold
bitter quarry
#

I’m tryna get the total of job postings I aint get that yet

#

I’m not sure if I did everything else right

serene scaffold
#

the total of job posting. what does that mean?

#

the number of job postings?

bitter quarry
#

ya

serene scaffold
#

and that's not the number of rows?

bitter quarry
#

column

#

With different title

serene scaffold
#

the number of columns should be how many fields you have. not how many instances you have.

bitter quarry
#

did I do that wrongly

robust granite
#

Hi people!
i have a data set of states and cities across 5 years, plus some additional columns on which i'll perform analysis

#

But, the values of cities are changing across years. How do i manage that?

#

For example, lets say in 2011-13 it was New Yorrk but latter years it had name as New York

rose agate
# robust granite For example, lets say in 2011-13 it was New Yorrk but latter years it had name a...

My assumption is that the best way would probably be to do an iterative loop and check pairwise similarity between the city names. You could try something like LCS, the longest common subsequence, which I assume should work pretty well. You could check if the LCS is within 1 or 2 of the actual length, which would indicate a minor misspelling, then change the names to match. If the names are really messed up you might look at word similarity with spaCy or something, but that seems overkill to me. @serene scaffold might be able to give some better ideas.

rose agate
# robust granite For example, lets say in 2011-13 it was New Yorrk but latter years it had name a...

something like this

from functools import lru_cache

names = ['New York', 'New Yorkk', 'Los Vegas', 'Las Vegas', 'Hollywood']
print('before:', names)

def lcs(X, Y):
    # length of the longest common subsequence of X and Y;
    # the cache keeps the recursion from recomputing the same (m, n) pairs
    @lru_cache(maxsize=None)
    def rec(m, n):
        if m == 0 or n == 0:
            return 0
        if X[m-1] == Y[n-1]:
            return 1 + rec(m-1, n-1)
        return max(rec(m, n-1), rec(m-1, n))
    return rec(len(X), len(Y))

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        X = names[i]
        Y = names[j]
        if X != Y:
            # at most 2 characters of X fall outside the common subsequence
            if len(X) - lcs(X, Y) <= 2:
                print("Similar names found:", X, 'and', Y)
                names[j] = names[i]

print('after:', names)
shy mural
#

is there a way that i can find the gap of this door section

young granite
random peak
#

can i feed an array to a support vector machine?
or does it have to be a dataframe?

young granite
#

how can i adjust subplots in plotly? when i use fig.update only the last one is changed

shy mural
young granite
# shy mural thanks

its not what u want to hear i know but with a simple picture in that resolution u cant even approximate by functions

robust granite
rose agate
# robust granite So its like you are matching the length of common string.?

i was taking the difference between the length of string X and the LCS; if only a single character is wrong, that should just be 1. I did this instead of just checking whether the LCS is large, because I assume some name pairs could have a high LCS without actually being the same place. e.g. if there's a 'New Hampshire' and an 'Old Hampshire', the LCS would be 10 because they share ' Hampshire' (including the space), but we wouldn't want to classify them as the same place
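
As an alternative sketch of the same idea, the standard library's `difflib.SequenceMatcher` gives a similarity ratio between 0 and 1. This swaps LCS for a different matching technique, and the helper names `similarity` and `canonicalize` as well as the 0.85 threshold are assumptions picked for this toy list, not something from the thread.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # ratio() is 2 * matched characters / (len(a) + len(b)); 1.0 means identical
    return SequenceMatcher(None, a, b).ratio()

def canonicalize(names, threshold=0.85):
    # map every name onto the first earlier name it closely resembles
    canonical = []
    result = []
    for name in names:
        match = next((c for c in canonical if similarity(c, name) >= threshold), None)
        if match is None:
            canonical.append(name)
            match = name
        result.append(match)
    return result

names = ['New York', 'New Yorkk', 'Los Vegas', 'Las Vegas', 'Hollywood',
         'New Hampshire', 'Old Hampshire']
print(canonicalize(names))
```

Because the ratio is normalised by both lengths, 'New Hampshire' and 'Old Hampshire' stay separate here while the two misspelled pairs are merged. The threshold would need tuning on real city names.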

rose agate
shy mural
harsh nexus
#

Hey guys! I got an interesting problem in #help-pear about plotting a merged dataframe on 2 subplots with a shared Y axis, any help is welcome

dusty valve
#

so, i've trained a language prediction model, (Sequential), do i just save it with model.save()?

harsh nexus
#

what library did u use? Gensim?

desert oar
dusty valve
#

tf.keras.Sequential

austere swift
#

yeah you can save it with model.save('filename.h5')

#

thats for saving the whole model, including the optimizer state and architecture, if you wanna save just the weights you can use the save_weights method instead, it works the same

merry glacier
#

Where should I start with AI? I wanna try make some kind of text classification eventually but rn just need basics

austere swift
#

learn the math behind it first

#

its primarily linear algebra and some calculus concepts

hollow sentinel
#

wow

#

why did i not look at jason brownlee before this

harsh nexus
#

I asked my question already but I'll try to simplify it more: I got 2 dataframes, they contain the results of 2 different text files on an LDA model. I merged the two dataframes with an extra column ('Originated').

Next step: I want to visualise how each txt file scored on the LDA model. I make a figure with 2 subplots, a shared Y axis (with all the topics, IMPORTANT: they have some topics in common) and an inverted X axis (see image). Also: I'd like to color certain topics based on their category (which is also in the dataframe). It's really hard to succeed in this and I'm kinda stuck, 2 important things that won't work together: categorising by color AND getting the labels correct for BOTH subplots

serene scaffold
harsh nexus
#

So basically, a part of the Y axis is correct, generic economic language is the most important topic of Goodwin, its category is correct and it's on the correct spot

Problem: in the other txt file, there is ALSO a score for this topic, so they should be NEXT to each other, not like this. They need to share this label/tick in some way.

The categories do seem to work 'okay'? I think, It's hard to say this way. Main focus is to get to clearly compare these 2 subplots

#

color code seems fine for the first subplot, as that's the only part that I can (kind of) evaluate

craggy pier
#

What are the best free courses for basic programming in python, from zero up to database usage?

#

Can anyone give me a list at least?

#

good afternoon...

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

craggy pier
# tacit basin !resources

Thanks, but only found 1 source that teaches database, what about the paid courses, any private company course recommendation?

craggy pier
tacit basin
#

but i mean python is still python, so you can learn python using any of the books/courses from resource page, then add database to it

tacit basin
tacit basin
#

@craggy pier look at pinned messages in #databases channel

craggy pier
tacit basin
fierce pine
hard idol
#

in the initial population of the neat algorithm, is every input node connected to every output node?

worldly dawn
#

From the paper:

In contrast, NEAT biases the search towards minimal-dimensional spaces by starting out with a uniform population of networks with zero hidden nodes (i.e., all inputs connect directly to outputs).
hard idol
hard idol
worldly dawn
#

sounds about it

hard idol
#

oh alright

hard idol
#

just in the initial population

worldly dawn
hard idol
#

no connections??

worldly dawn
#

(random online articles do make some assumptions sometimes which turn out to go against the source code of the paper)

hard idol
#

oh okay thanks

worldly dawn
hard idol
#

and a few other possibilities too

#

@worldly dawn

worldly dawn
#

makes sense for a library. Interesting to see by default there is no connection.

hard idol
#

since in that case every organism in the initial population would be exactly identical

#

and it also wouldnt do anything at all

worldly dawn
hard idol
#

but partial would also do the same

worldly dawn
#

I don't think it would make or break it though

hard idol
#

but ig its different chances

#

yeah

worldly dawn
hard idol
#

unless the randomness for partial and mutation is different

worldly dawn
#

Starting with zero connections adds an extra step before evolution can try any, while giving everyone all the connections adds extra ones that evolution would have to figure out how to trim
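
To make the three starting points being compared concrete, here is a rough sketch. The function `initial_connections`, its scheme names, and the `prob` parameter are hypothetical and are not the API of any NEAT library. It only lists which input/output pairs get a connection in a genome with no hidden nodes.

```python
import random

def initial_connections(num_inputs, num_outputs, scheme="full", prob=0.5, rng=None):
    """Return (input, output) pairs for a starting genome with no hidden nodes."""
    rng = rng or random.Random()
    all_pairs = [(i, o) for i in range(num_inputs) for o in range(num_outputs)]
    if scheme == "none":
        return []                      # every genome starts unconnected
    if scheme == "full":
        return all_pairs               # every input wired to every output
    if scheme == "partial":
        # keep each possible connection independently with probability prob
        return [pair for pair in all_pairs if rng.random() < prob]
    raise ValueError(f"unknown scheme: {scheme}")

print(initial_connections(3, 2, "full"))
print(initial_connections(3, 2, "partial", prob=0.5, rng=random.Random(0)))
```

With "none" or "full" every genome in the first generation has the same topology, while "partial" already gives some variety before any mutation happens.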

hard idol
#

true

#

i think i might agree with partial

#

wait actually

worldly dawn
#

Comparing these starting points could be a fun project too, as a way to see which one could converge the fastest/most reliable way

hard idol
#

nvm neither partial or mutation would have any new nodes

#

yeah

#

i also wanna compare the percentages on a 3d graph

worldly dawn
#

I find quantiles useful too in these contexts

wind girder
#

I want to find the intersection of a horizontal line to a contour line in plotly.

I cannot find an implementation of it
One said to use skimage.find_contours to find the contour line but it changes units

short heart
#

How can I add custom augmentations to albumentations composition?

gray steppe
#
fig, ax = plt.subplots(2, 5, figsize=(8, 3))
centers = kmeans.cluster_centers_.reshape(10, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center, interpolation='nearest', cmap=plt.cm.binary)

whats this code doing?
wooden sail
#

the first line makes an image composed of several subplots. specifically, 2 rows with 5 columns each of subplots, of size (8,3) (i think this one is in inches, can't recall)

#

the second line seems to be doing some sort of kmeans clustering, i can't tell how exactly because i don't recognize the command. the result is reshaped into an array of size (10,8,8)

#

then, the axes (the object that contains the data to be plotted in each subfigure) are zipped together with the kmeans results. there are 10 subplots and 10 centers, so this iterates over them together

#

then the person removes the x and y ticks (the markings along the x and y axes)

#

and finally, in each of the subplots, an 8x8 image is displayed (of whatever it is that kmeans is returning here). since the image probably won't be 8x8 pixels (especially because of the size that was specified in the first line), they pick a flavor of interpolation to scale the figures up. 'nearest' essentially makes pixels bigger by just scaling them up, so the image will look blocky. cmap puts a colormap on the image. seems they just went for black and white
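
The zipped loop described above can be tried without matplotlib. In this small sketch the two lists are plain stand-ins (made up for illustration) for the subplot axes and the cluster centers.

```python
axes = ["subplot 0", "subplot 1", "subplot 2"]   # stand-ins for matplotlib axes
centers = [[0, 1], [2, 3], [4, 5]]               # stand-ins for cluster centers

pairs = []
for axi, center in zip(axes, centers):
    # each pass unpacks one (axis, center) pair into the two loop variables
    pairs.append((axi, center))
    print(axi, "shows", center)
```

`zip` stops at the shorter of the two sequences, so with 10 axes and 10 centers the loop body runs exactly 10 times.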

#

@gray steppe

gray steppe
gray steppe
wooden sail
#

i mean axes, i'm not talking about the variable names

#

what the notebook calls "ax" there is a list of axes

gray steppe
#

sorry, i am very poor in python.

#

so what's axi, center in this case?

wooden sail
#

axes is the plural of axis, like in x and y axis

#

axi is an element of the list of axes there

gray steppe
#

oh cool

wooden sail
#

it seems like you should look up how for loops work in python

gray steppe
#

so do you mind telling how's that for loop working?

wooden sail
#

you should ask in python general or in a help channel, i think

gray steppe
#

they won't answer

#

labels = np.zeros_like(clusters)
for i in range(10):
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]

i am confused with this one as well.
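
A small sketch of what that loop does, on made-up data: each cluster id gets replaced by the most common true label among the samples in that cluster. The arrays are invented for illustration, and `np.bincount(...).argmax()` is used in place of `scipy.stats.mode` so the example only needs NumPy.

```python
import numpy as np

clusters = np.array([0, 0, 0, 1, 1, 2, 2, 2])   # cluster id assigned to each sample
targets = np.array([7, 7, 3, 4, 4, 9, 1, 9])    # true label of each sample

labels = np.zeros_like(clusters)
for i in range(3):
    mask = (clusters == i)                       # True for the samples in cluster i
    # most frequent true label among the samples in this cluster
    labels[mask] = np.bincount(targets[mask]).argmax()

print(labels)
```

The boolean `mask` selects the same positions on both sides of the assignment, so every sample in cluster `i` receives that cluster's majority label.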
gray orchid
gray steppe
#

I was not expecting that.

gray orchid
#

Or what?

loud apex
#

Hello there

What is the roadmap to learn data science and ai? like should i learn data science then ai? and what are the libraries should i know? and if there are courses for beginners about ai that would be helpful

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

lapis sequoia
loud apex
#

i do know the basics of ai like naive classifier, NN and knn algorithm

#

thanks for the help

sour lynx
#

how can i fix this? does anyone know? pls help me

serene scaffold
odd meteor
sour lynx
#

but i used np method

#

is it wrong?

odd meteor
#

You need to vectorize your text and clean_text feature. It appears you didn't do that from your pics.

Use TfidfVectorizer or CountVectorizer + TfidfTransformer on those columns

lapis sequoia
#

I was learning recommender systems. And I have a question. The dataset basically had rows with user no, item no and the corresponding rating. We then form n user x m item matrix using this.

The teacher taught to do train test split in this data. And then use the train data to find similarity between users. And then predict the ratings that are available in the test matrix based on train matrix.
But my question is, why didn't we simply predict each value in the whole data by finding similar users to the user at hand?
I don't see any data leakage happening here.

bold timber
#

Hi, how do i select the rows where start_station_name contains the words 'San Francisco' in a dataframe?

serene scaffold
bold timber
#

if I have a plot like this, what type of integral do i use to calculate the area? definite or indefinite?

mint palm
#

are there architectures that are intelligent enough to extract portions of video which are relevant for a prediction and eliminate other portion.

serene scaffold
#

@bold timber do you know what definite and indefinite integrals are?

lapis sequoia
bold timber
#

Vice versa

eager wedge
#

I am doing a project regarding semantic segmentation. I am achieving 98 accuracy and 0 loss on first epoch? Why is it not working?

unet = models.Sequential()
unet.add(layers.Conv2D(64, (3,3), activation='relu', padding='same', input_shape=(i_size, i_size, 1)))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(128, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(256, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(512, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(1024, (3,3), activation='relu', padding='same'))
unet.add(layers.Conv2D(512, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(256, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(128, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(64, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))

unet.add(layers.Conv2D(1, 1, padding="same", activation = "sigmoid"))

unet.compile(optimizer='Adam', loss="categorical_crossentropy", metrics=["accuracy"])

model_history = unet.fit(x_train, y_train,
                         epochs=100,
                         verbose=1,
                         batch_size=32,
                         validation_data=(x_test, y_test))

unet.summary()
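
One possible explanation, not confirmed anywhere in this chat: the model ends in a single sigmoid channel but is compiled with categorical crossentropy. To my understanding that loss rescales the predictions to sum to 1 over the class axis, which with one channel turns every prediction into 1 and the loss into roughly 0. The NumPy functions below are simplified stand-ins written for illustration, not the Keras implementations.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # rescale predictions so they sum to 1 over the last (class) axis
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return -bce.mean(axis=-1)

y_true = np.array([[1.0], [0.0], [1.0]])   # one mask pixel per row, one channel
y_pred = np.array([[0.2], [0.9], [0.6]])   # sigmoid outputs, mostly wrong

cce = categorical_crossentropy(y_true, y_pred).mean()
bce = binary_crossentropy(y_true, y_pred).mean()
print("categorical:", cce, "binary:", bce)
```

If this is what is happening, the categorical loss stays near zero even for bad predictions, so switching to a binary crossentropy loss would be the first thing to try.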
misty flint
#

highly recommend eugene yan's content about RecSys

rose agate
#

maybe look for a dataset on a topic you're interested in, e.g. books, movies, sports, etc. you can then think of something you want to predict or explore more about

#

or make a bot that optimises a game, I've always wanted to do that

#

depends what type of game I guess. If it's a video game then it needs to interpret the image which is quite difficult. If it's something like chess/checkers/go then AI can do that for sure

iron basalt
#

World models are pretty cool. You can make your AI simulate various things, real or virtual, one cool experiment is having it mimic various applications by learning models of them (e.g. copy a text editor).

#

(input is the window's buffer (pixels / video) and the keyboard and mouse (it's also the outputs in this case))

barren wedge
#

Is it better to batch into BERT model or not?

lusty spear
# barren wedge Is it better to batch into BERT model or not?

Is it better to batch into BERT model or not?
Generally, yes. When the model processes inputs as a batch, the GPU handles them in parallel. But you're limited by the memory of your GPU, so if you're running out of GPU memory you'll need to decrease the batch size.
If you're using a huggingface pipeline, then AFAIK it's going to handle batching for you.
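
As a rough sketch of manual batching, assuming the inputs are a plain list of strings: the helper name `batched` and the batch size of 4 are made up for illustration, and the model call itself is left out.

```python
def batched(items, batch_size):
    # yield consecutive slices of at most batch_size items
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sentences = [f"sentence {i}" for i in range(10)]
batches = list(batched(sentences, batch_size=4))
for batch in batches:
    # in practice each batch would be tokenized and passed to the model in one call
    print(len(batch), batch[0])
```

Lowering `batch_size` is the knob to turn when the GPU runs out of memory.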

barren wedge
lapis sequoia
#

guys i need help , i can code into C++ but when i entered the AI and data world i needed to learn python so i don't know where to learn and practice it for datascience

next sphinx
#

What is the purpose of an activation function?
A. To decide whether a neuron will fire or not
B. To increase the depth of a neural network
C. To create connectivity among hidden layers
D. To normalize the inputs

flint mason
#

how to put a condition where the running tab is interrupted automatically if its about to exceed available ram python

arctic wedgeBOT
gray steppe
#

Hi guys, how can logistic regression be used as a classifier?

lime current
#

hello guys, how can I use machine learning to detect fraudulent transactions in a dataset.
which ML algorithm will be suitable for it?

sharp leaf
#

In order to test if the k-nn algorithm works properly for the selected parameters ( parameter k and metric) and sample database, an appropriate methodology should be used. One of them is 1 versus the rest.

How does this 1 versus rest method work with Knn? I want to implement this method into knn but I can't find any useful information that describes it.

light crescent
#

hi, not exactly Python related but I'm asking here since I couldn't find anything online. does anyone know of an algorithm to generate a realistic set of values for a line chart/bar chart? basically, a "smoothed" set of random values with no big changes in values and ideally it should keep the random values within a neutral trend

wooden sail
#

the easiest way would be to use either a gaussian distribution with a low variance around the true values, or a uniform distribution

light crescent
#

thank you 😁

wooden sail
#

aight

#

let me cook up a MWE

#
In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: xvals = np.arange(100)

In [4]: yvals = xvals + np.random.normal(size=100)

In [5]: plt.plot(xvals, yvals)
Out[5]: [<matplotlib.lines.Line2D at 0x23de33aa3a0>]

In [6]: plt.show()
#

just as an example. you can change the variance of the noise by multiplying it with a scalar. you could also low pass filter it if you wanted, to get it to look smoother
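
A possible follow-up to the example above, sketching both suggestions: scaling the noise with a scalar and low pass filtering it. The scale of 3.0, the window of 15, and the choice of a moving average as the filter are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
noise = 3.0 * rng.normal(size=n)            # the scalar sets the noise standard deviation

window = 15
kernel = np.ones(window) / window            # moving average = simple low pass filter
smooth = np.convolve(noise, kernel, mode="valid")

print("raw step size:     ", np.abs(np.diff(noise)).mean())
print("smoothed step size:", np.abs(np.diff(smooth)).mean())
```

The smoothed series changes much less from one point to the next, and since the noise has zero mean it keeps hovering around a flat trend.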

gloomy anvil
#

hello party people

#

I've got a question, that I posted in stackoverflow: https://stackoverflow.com/questions/72436420/lstm-always-predicts-1s-for-binary-classifications
I figured I might ask here as well if you have some ideas why my LSTM always returns 1s in binary classification

#

What else could I change about my model? I tried different configurations of nodes and hidden layers, different optimizers as well as learning rates

mint palm
#

are transfer learning based architectures "regularly" fine tuned after deployment?

mild dirge
#

Depends if the data distribution changes over time I'd think @mint palm

mint palm
#

i was expecting the same....

hollow sentinel
#

anyone know of websites to get data from besides kaggle?

#

i think kaggle is too clean

#

uci machine learning repo?

#

the problem is that companies don't really like putting their data out there anymore so it's difficult to come up with nice projects when the data isn't available

#

i wouldn't use a dataset from kaggle in a portfolio

half jolt
#

Hello, could someone give me a hand and guide me how I could do this in an array type and not in a list like the example?

ivory steppe
#

Can anyone please guide me on how to determine whether the arm movement is in clockwise/anticlockwise through computer vision?
Can anyone share some similar projects?

native rune
#

can anyone clarify whether offline handwriting recognition (OHR) and optical character recognition (OCR) are the same?

harsh nexus
serene scaffold
harsh nexus
#

Good one

sleek fjord
#

hii, i m getting cuda out of memory error, this is my GPU memory usage

#

how do i resolve the error?

#

RuntimeError: CUDA out of memory. Tried to allocate 396.00 MiB (GPU 0; 4.00 GiB total capacity; 3.05 GiB already allocated; 0 bytes free; 3.09 GiB reserved in total by PyTorch)

misty flint
#

i have opened up the data engineering can of worms

#

and there are a million dif ways to move data from point A to point B

#

ELT/ETL nightmare

#

i am starting to understand this space a little bit more

#

and why it needs its own role

serene scaffold
sleek fjord
#

I tried many different things, clearing the cache, reducing the batch size.. getting the memory usage, but still no luck

serene scaffold
#

@sleek fjord that last allocation is only about 400 MiB, but your GPU has 4 GiB in total and PyTorch has already reserved 3.09 GiB of it, so you might need to brace for the possibility that you simply can't accomplish this with your hardware

#

Do your tensors have a lot of zeros?

sleek fjord
serene scaffold
serene scaffold
sleek fjord
mild dirge
#

So how big would you estimate your batch is in memory?

#

@sleek fjord

#

How many floats in a batch?
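
A back-of-the-envelope sketch for that estimate, assuming float32 values at 4 bytes each. The helper `batch_mib` and the example batch of 32 RGB images at 512 x 512 are hypothetical, since the actual shapes were not given in the chat.

```python
def batch_mib(batch_size, *sample_shape, bytes_per_float=4):
    # float32 takes 4 bytes; multiply out every dimension of one batch
    count = batch_size
    for dim in sample_shape:
        count *= dim
    return count * bytes_per_float / 1024**2

# hypothetical batch: 32 RGB images of 512 x 512 pixels
print(batch_mib(32, 3, 512, 512), "MiB")
```

This only counts the input tensor. Activations, gradients, and optimizer state typically need far more memory than the inputs themselves.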

upper spindle
#

how good is the mit deep learning course by alexander amini

#

any opinions from anyone?

median dove
#

Hello. I have been scratching my head about advanced projects but nothing comes to mind.
I want to make a project to impress a college and make them want to enroll me. For that I will need an advanced project but literally nothing comes to my mind.
What is an advanced AI project I could develop during the next year to impress some people? Thanks 🙂

lapis sequoia
serene scaffold
#

@median dove what kind of AI do you want to do?

median dove
#

Well I’d like deep learning, I don’t have any real experience with AI but I would like to spend my highschool years on research and the development of an advanced model that colleges could like and offer me a place with them

serene scaffold
median dove
celest flax
#

im trying to get tensorflow to work

#

but

#

it just doesnt

#

is there any alternative

#

that works similarly

#

everytime i look up neural network and machine learning

#

it just shows tensorflow and tensorflow.keras

royal crest
#

pyTorch

celest flax
#

thank youuuu

#

so is that like

#

the same thing

#

ish

serene scaffold
# celest flax it just doesnt

you've already been informed that PyTorch is similar to tensorflow. and it is. but you should probably address why tensorflow "isn't working". because it's a very widely used library, and chances are, you're the one making the mistake.

celest flax
#

m1 chip mac

#

that's why

#

pip is up to date

#

mac is up to date

#

pycharm and python up to date

#

iirc google didnt get full access to develop on m1 chips, not sure tho

hollow oasis
#

Where should I start learning data-science and ai? like what are some good resources to start learning

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

half jolt
south condor
misty flint
wooden sail
pliant pewter
#

What are the best discord communities for data science and AI/ML/DL?

wooden sail
#

there's a data science and a math server that are part of the same network. both are good

pliant pewter
#

I'm on a math server, it's just called Mathematics, is that the one?

#

Has a wireframe torus logo

wooden sail
#

yeah that's the one. there's a channel on applied computational math

#

i'm pretty sure there's an AI-related server in their network, too

pliant pewter
#

Yeah, I found the AI one. Cool

gray steppe
#

hi guys, any help regarding this?

stable anchor
#

wtf is this

#

@celest vine u want me to learn this s**t

wooden sail
#

what are the usual regression model assumptions?

wooden sail
#

i guess they're referring to optimality of LS under AWGN. this isn't AWGN, and so the estimator should have a covariance matrix that accounts for this

#

or in other words, the estimator will depend on the inverse covariance of the income

#

it should become more or less clear if you go all the way back to the expression of the PDF of the data and formulate it as a maximum likelihood problem
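
The maximum likelihood route can be sketched like this, assuming a linear model with zero-mean Gaussian noise of known covariance $C$. The symbols $y$, $X$, $w$, $n$ and $C$ are assumptions here, since the original problem statement is not in the chat.

```latex
y = X w + n, \qquad n \sim \mathcal{N}(0, C)

p(y \mid w) \propto \exp\left( -\tfrac{1}{2} (y - X w)^T C^{-1} (y - X w) \right)

\hat{w} = \arg\min_w \, (y - X w)^T C^{-1} (y - X w)
        = (X^T C^{-1} X)^{-1} X^T C^{-1} y
```

For white noise, $C = \sigma^2 I$, the inverse covariance cancels and this reduces to the ordinary least squares estimator $(X^T X)^{-1} X^T y$.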

celest vine
celest flax
wooden sail
#

idk, i didn't see which error you got

celest flax
#

plus conda just doesnt wanna find the tensorflow deps

#

lmao i followed what the conda website says and it still dont work

#

okay i manually installed the latest release

#

still gives: Process finished with exit code 132 (interrupted by signal 4: SIGILL)

#

i did the entire apple instruction too