#data-science-and-ml


cloud maple
#

Getting set up for data science is easy in Pop!_OS because it has a one-line command to set up the NVIDIA and CUDA drivers, so that's the OS I use. Once that's done, it's just `sudo apt install tensorflowgpu` and `export CUDA_VISIBLE_DEVICES=0`

#

Or whatever your GPU number is. You can find it with `nvidia-smi`.
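A minimal sketch of setting the same variable from inside a Python script instead of the shell. The helper name `select_gpu` is hypothetical, and the index `0` is only an example of what `nvidia-smi` might list. The variable has to be in place before TensorFlow (or another CUDA library) initializes the GPU; setting it before the import is the simplest way to guarantee that.

```python
import os


def select_gpu(index):
    """Expose only one GPU (by its nvidia-smi index) to CUDA libraries.

    Call this before TensorFlow or another CUDA library initializes the GPU;
    setting it before importing them is the simplest way to ensure that.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(0)
# import tensorflow as tf  # import only after the variable is set
```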

karmic valley
#

oh hey got an issue.

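```py
import matplotlib.pyplot as plt

# image_t, file, xs, ys, source_start, source_end, i and OUT_IMAGE_DIR
# come from the surrounding script.
temp = image_t.numpy()
temp = temp[0, 0, ...]  # first element of the first two axes, e.g. (batch, channel) -> 2D image
fig = plt.figure(frameon=False)
ax = fig.add_axes([0, 0, 1, 1])  # axes fill the whole figure, no margins
ax.axis('off')
ax.imshow(temp, cmap='gray', vmin=-0.4916811, vmax=0.5)
# 256 - flow presumably converts the values to image rows (row 0 is the top)
ax.plot(xs, 256 - file.flow[source_start:source_end], "#04dff6", linewidth=0.5)
ax.plot(xs, ys, "r-", linewidth=0.5)
plot_path = OUT_IMAGE_DIR / "test" / "plot" / f"{i}.png"
plot_path.parent.mkdir(exist_ok=True, parents=True)
fig.savefig(plot_path, dpi=300)
```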


i want my figure to be 1024 by 256 pixels. how do i do that?
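A saved figure's size in pixels is its `figsize` (in inches) times the `dpi` used when saving, so the figure size can be derived from the target pixels. A minimal sketch, assuming a 2D grayscale array; the helper name `save_exact_size` is hypothetical. Choosing a dpi that divides both dimensions evenly (128 here) keeps the inch values exact and sidesteps any floating-point rounding.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, only needed for saving files
import matplotlib.pyplot as plt
import numpy as np


def save_exact_size(image, path, width_px=1024, height_px=256, dpi=128):
    """Save `image` filling the whole canvas at exactly width_px x height_px."""
    fig = plt.figure(figsize=(width_px / dpi, height_px / dpi), dpi=dpi, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1])  # axes span the full figure
    ax.axis("off")
    # aspect="auto" stretches the image to the axes instead of letterboxing it
    ax.imshow(image, cmap="gray", aspect="auto")
    fig.savefig(path, dpi=dpi)  # must match the dpi used to compute figsize
    plt.close(fig)
```

In the snippet above, that would mean creating the figure with `figsize=(1024 / 128, 256 / 128)` and saving with `dpi=128` instead of `dpi=300`; the `ax.plot` calls can stay after `imshow` as before.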
small orbit
#

@cloud maple: i actually think i just got it to work now. but thanks anyways

#

but it seems that i dont have enough gpu memory 😛

cloud maple
#

Yup. That's tough. Most stuff won't run on 4 or 6 GB of VRAM. I bought a cheap K80 with 24 GB.

small orbit
#

batch_size = 1 <-- actually worked. 😄

#

so, now i am looking into finding what the biggest batch_size is

#

5 was too much
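Instead of trying sizes one by one, a doubling-then-bisect search finds the largest batch size in a handful of attempts. A sketch, assuming a hypothetical `fits(batch_size)` callable that you write yourself, e.g. one that runs a single training step and returns False when it hits an out-of-memory error (TensorFlow raises `tf.errors.ResourceExhaustedError` for that). It also assumes that if a size fits, every smaller size fits too.

```python
def largest_batch_size(fits, start=1, limit=4096):
    """Return the largest batch size <= limit for which fits(size) is True (0 if none)."""
    if not fits(start):
        return 0
    lo, hi = start, start * 2
    # Grow exponentially until a size fails or the limit is passed.
    while hi <= limit and fits(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, limit + 1)  # hi is now a known (or assumed) failure
    # Binary search between the last success (lo) and the first failure (hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

With the numbers from this thread (1 works, 5 does not), the search would settle on whatever lies in between after only a few trial runs.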

cloud maple
#

Yes, but it's slow going. I went down that road too. What model are you running?

small orbit
#

nvidia tesla k80?

cloud maple
#

yes

small orbit
#

cool, how much?

cloud maple
#

I think I paid $130 a year ago.

small orbit
#

running keras bert

100,000 emails (350 MB)

cloud maple
#

Now they're $300

small orbit
#

oh, wow

cloud maple
small orbit
#

i ran the model on the cpu for 5 days, and it still showed 5 days left. i thought it would be better to test with a GPU

#

ah, cool!

cloud maple
#

I got tired of not being able to run stuff on my 1050ti.

small orbit
#

will tensorflow use the TPU as a GPU? or do you need to do anything special in relation to tensorflow?

#

i have a laptop with nvidia quadro T1000 😮

cloud maple
#

Yes, on the CPU you might have 8 threads, but even a cheap GPU has hundreds of cores.

small orbit
#

yeah, true

#

crazy price on tesla k80 yeah

#

are they using these for bitcoin mining also maybe?

desert oar
#

little did i know that gpu prices would go through the roof and i wouldn't be able to upgrade when i wanted

small orbit
#

@cloud maple: It seems by the estimated time to be running it in 2 days, instead of 10 days. quite a bit faster!

daring pilot
#

hi

mild dirge
#

Do you have any experience with AI at all?

daring pilot
#

well

#

that was kinda meant to rickroll somebody

serene scaffold
#

so you don't actually want to know? you just came here to rickroll us?

daring pilot
#

Sorry, but I'm not working on a bot right now.

serene scaffold
#

Please don't waste our time.

mild dirge
daring pilot
#

ok.

cloud maple
#

Sorry, my neighbor rang the door bell.

small orbit
#

@mild dirge: who r u asking?

mild dirge
small orbit
#

ah, ok 😛

#

@mild dirge: it seems to be a lot faster on gpu btw...

#

thanks

mild dirge
#

how much is the speedup?

#

using your own gpu or some cloud?

small orbit
#

"It seems by the estimated time to be running it in 2 days, instead of 10 days. quite a bit faster!"

mild dirge
#

ah that seems like a fair improvement

small orbit
#

i am using the laptop gpu

#

10 days = laptop cpu
2 days = laptop gpu

#

i tried running on a cloud cpu, but it didn't seem any faster.

mild dirge
#

ah too bad

small orbit
#

so, now i am trying to setup a cloud server to use with gpu

mild dirge
#

Our uni actually supplies a school computer with 220,000 CUDA cores and like 5k CPU cores or something

#

But there's a waiting list and all, so most of the times I use my own computer anyways

small orbit
#

just struggling with installing all this cuda crap in linux:P

mild dirge
#

ah yeah, it's a struggle on windows

small orbit
#

oOo

mild dirge
#

So i'd imagine it's worse on linux

small orbit
#

that is crazy

#

well, on windows i managed to do it

#

on linux, you really need to find a tutorial with the correct linux distro and version. at the same time correct tensorflow, correct cuda, correct cudnn...etc

mild dirge
#

yikes haha

small orbit
#

the cloud computer has an NVIDIA Tesla M60, 6 cores, 56 GB RAM, 380 GB storage

#

1.2 usd per hour

grave frost
#

@small orbit most cloud providers give you CUDA and all the libraries pre-installed

#

you just mess about with pip a fair bit for your own packages, then you're good to go

small orbit
#

@grave frost: yeah, if you can figure out how

grave frost
small orbit
#

i am in azure machine learning studio

#

how do i get the pre-installed libs?

grave frost
#

well yeah, it probably provides an image when you're setting up the VM

small orbit
#

its really not intuitive at all

grave frost
#

it won't be

small orbit
#

too bad, because azure is quite good

grave frost
#

Azure itself is quite bad

#

plus cloud is usually simple for most people. the danger is getting billed accidentally

small orbit
#

but you really need to read the M$ documentation to be able to understand it all, if you're not doing it like i am, learn by doing

grave frost
#

that's what the docs are for

small orbit
#

yeah, and the docs sux bigtime

#

they are using wording so that you dont really know what they are talking about

#

like, environment. what does that mean?

#

you can choose 😛

#

not intuitive

grave frost
#

thats basic terminology 🤷‍♂️

small orbit
#

yeah, but which environment are they talking about?

#

what kind of environment?

grave frost
#

virtual environment

small orbit
#

or python environment?

grave frost
#

perhaps you might be from a non-tech background? I recommend following some YT tutorial/course first

#

yes

#

it depends. it's either a virtualenv or a conda env
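To make "environment" concrete: a Python virtual environment is just a directory with its own interpreter and its own `site-packages`, so packages installed into it don't touch the system Python (a conda env is the same idea, managed by conda). A minimal sketch using the standard-library `venv` module; the helper name `make_env` is hypothetical.

```python
import os
import venv


def make_env(path):
    """Create a virtual environment at `path` and return its config file path.

    with_pip=False keeps this fast; pass with_pip=True to bootstrap pip
    into the new environment.
    """
    venv.create(path, with_pip=False)
    # Every venv has a pyvenv.cfg at its root pointing back at the base Python.
    return os.path.join(path, "pyvenv.cfg")
```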

small orbit
#

and, when you setup a "compute", you have a field where you can add "script"

#

is that script for docker? or is it bash?

#

python?

#

yeah, thats what i mean, i understand what the word means, but it can mean several things.

#

!= intuitive

grave frost
#

well, this isn't an ios app you're trying to use

#

its near-cutting edge

small orbit
#

i know

#

but wording isnt cutting edge

#

they could have made it a whole lot more intuitive

grave frost
#

that's what docs are like. you get used to it after some time of suffering

#

they never tell you everything. it's always "figure it out on your own"

#

ppl are too lazy smh. can't blame them either. I haven't ever written a single doc

#

boring AF

desert oar
#

writing docs helps you reason about your work, it's a good exercise

mild dirge
#

don't wanna brag or anything, but I've written quite a lot of readmes lemon_swag

misty flint
#

technical writing is important and underappreciated kekHands

#

good documentation can be hard to find at times kekHands

slender sand
#

Hey, can anyone recommend a good person-detection tool for python? My partner and I are trying to build a rudimentary image search engine but I need something that will take an image (in memory) and tell me if I'm looking at a person or not, without taking 30 seconds to process each image. I've tried one or two on GH but haven't found one I like.

pseudo wren
#

This may sound crazy based on what I maybe struggled with yesterday

#

But I am learning about model validation

#

I’m going to explain what I understand of model validation as best I can

#

And I’m hoping that you guys can tell me if what I think is actually correct

#

So the process of model validation is the process of making sure the features you want to evaluate are interacting with each other in a thorough way in order to yield the best results

#

We can hand validating and also k fold validate

#

We will take our x values and y values for whatever feature we have chosen to evaluate

#

(Let’s say it’s a housing market dataset and I want to see if price correlates to number of rooms)

#

I would do a correlation matrix to see if there’s indication of correlation

#

And then I would split my x’s and y’s into training and test sets

#

In k fold validation

#

I would have a training model and 3 test models

#

I will put them up against each other to see how they operate with the information given and calculate MSE or MAE. I would then iterate through all the values needed from the features selected

#

This will ensure my training model is well trained and is less thrown off by new information

#

How far off am I in this analysis?

serene scaffold
#

> So the process of model validation is the process of making sure the features you want to evaluate are interacting with each other in a thorough way in order to yield the best results
Model validation is measuring the performance of the model. It does not necessarily involve analysis of the features.

> We can hand validating and also k fold validate
I am not sure what you mean by "hand validate".

> We will take our x values and y values for whatever feature we have chosen to evaluate
I'm a bit confused by this statement. Features are part of the x data.

> I would have a training model and 3 test models
Did you mean train and test data? I have never heard of "train models" and "test models".

It sounds like you have learned a lot @pseudo wren, but are still confused about a few concepts.

pseudo wren
# serene scaffold > So the process of model validation is the process of making sure the features ...

So I got a little bit more clarity on some statements so I will try at a second attempt

  1. So model validation is just performance testing to actually fix our model if it is not performing well, would involve supplying it new data.

  2. When I say hand validate I meant iterate over it with a for loop. Weird terminology professor used.

  3. So yes training data and testing data will have to be compared up against each other, with the data interacting in a good mix so as to supply more accurate results as to how the model is performing

I think when I said features I meant like the features we choose to measure in a data set. Like comparing car seating to price in a car data set.

Thank you for correcting me as well!

serene scaffold
#
  1. So model validation is just performance ~~testing to actually fix our model if it is not performing well, would involve supplying it new data.~~ measuring

  2. so they meant "write the performance calculation code yourself". they were not introducing a new term.

  3. I don't understand this part. you're talking about a way to pick which features to use in the model?

#

@pseudo wren ^

pseudo wren
#

Yes the way to pick which features to use in the model

serene scaffold
#

anyway, the term "cross validation" is pretty widely used. I don't normally hear people say "model validation", but all the usages I can find of it are only referring to calculating/measuring the performance.

#

but that's just a matter of terminology, not whether the concepts they're trying to teach you are valid.

pseudo wren
serene scaffold
pseudo wren
#

What do I seem unclear on?

serene scaffold
#

you seem confused about how models and model training works in general, but that's completely normal/to be expected if you're taking an introductory course. while you seem to have a general idea, I don't want to give you a false sense of confidence by saying that you understand xyz, as I don't know exactly what you're going to be graded on.

Tell me this, can you explain in your own words what a feature is?

#

@pseudo wren

pseudo wren
#

Yes

#

So when I refer to features in a data frame, I am talking about values that I am going to be using for my training model

#

In the housing market example

sterile rivet
#

I have survey data with 10k entries (name is one of the columns in the df)
What piece of code should I use in order to check if there is any repetition of names?
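A minimal pandas sketch, assuming the column is literally called `name`; the helper name `duplicated_names` is hypothetical. `duplicated(keep=False)` flags every copy of a repeated value, not just the second and later ones, so all rows involved in a repeat come back.

```python
import pandas as pd


def duplicated_names(df, column="name"):
    """Return every row whose value in `column` appears more than once."""
    # Optional normalization so "Ann" and "ann " count as the same name;
    # use `names = df[column]` instead if they should stay distinct.
    names = df[column].astype(str).str.strip().str.lower()
    mask = names.duplicated(keep=False)  # True for all copies of a repeat
    return df.loc[mask]
```

For a quick yes/no, `df["name"].duplicated().any()` works, and `df["name"].value_counts()` shows how often each name occurs.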

pseudo wren
#

If I were to try and compare number of rooms to housing price

#

I believe those would be my features

#

From there

#

I would take that data and feed it to my model as x and y values, to see how the model interprets this data and how it makes predictions based on this data

desert oar
pseudo wren
#

This is what I think so far anyway

sterile rivet
pseudo wren
#

I could also perform regression on the data given by calculating the MSE and MAE and seeing what the loss is, or how close the model comes to the “truth”.
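MSE and MAE are metrics computed after a model has made predictions: they measure how far the predictions are from the true values, rather than being a way of doing the regression itself. A small NumPy sketch of both (scikit-learn has `mean_squared_error` and `mean_absolute_error` in `sklearn.metrics` that do the same):

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of squared differences (punishes big misses)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(err ** 2))


def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in the target's own units."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))
```

For example, true values `[3, 5, 2]` and predictions `[2, 5, 4]` miss by 1, 0 and 2, giving an MAE of 1.0 and an MSE of 5/3.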

misty flint
#

your "predictor variable"

#

tbh its really annoying how everything gets a different name in ML

serene scaffold
misty flint
pseudo wren
#

you guys probably arent wrong lol

#

so when we say features

#

we mean the independent variables

#

right

misty flint
#

that are being used in the model

pseudo wren
#

yeah

#

so in the analogy of the housing market

#

how would you guys structure a k-fold model

serene scaffold
#

well, there is no such thing as a k-fold model

pseudo wren
#

or the k-fold method. maybe i should just call it a method.

#

the k-fold method.

serene scaffold
#

if you're doing k-fold cross validation, k is an integer, and you're making that many models.

pseudo wren
#

but using the example, how would you carry out the steps

misty flint
pseudo wren
#

so i have a less abstract view on it

serene scaffold
#

@pseudo wren can you explain how k-fold cross validation works?

pseudo wren
#

sort of yes

#

so

#

and bear with me cuz i just learned this today

#

k-fold cross validation is a method of validating our model to make sure it has thoroughly had contact with all the data being provided in our model

#

this means having testing and training sets

#

you can do 3 testing to one training set

#

and calculate the results of that

misty flint
pseudo wren
#

and you'd do this multiple times to ensure your model has had exposure to everything

#

for each x and y variable

#

so you'd have like

#

x_training and y_testing

#

and then x_testing and y_training

serene scaffold
#

@pseudo wren

> to make sure it has thoroughly had contact with all the data being provided in our model

models do not provide data. there is data in the data set, and models can either be trained upon data or make predictions from data.

> you can do 3 testing to one training set
if you have 3 testing and one training set, what is k?

misty flint
#

yeah what is k

#

thats a good question to get at your understanding

pseudo wren
#

I don’t think I’m good at communicating ideas with the technical vocabulary yet

#

But thank you for correcting me Pope

serene scaffold
#

you are welcome praygeBlessed

pseudo wren
#

If you have 3 testing and one training I think k is the one with the lowest MSE?

#

This is a lot to learn in a day

serene scaffold
#

no

#

MSE is an unrelated concept

pseudo wren
#

What would K be then?

serene scaffold
#

3 + 1

pseudo wren
#

Yeah I sorta pulled that answer out of my ass I’ll admit

#

Ah that makes sense

#

3 training and 1 testing

#

Right?

serene scaffold
#

if you do k fold CV, you split the data into k groups ("folds"), and each group takes a turn being the test data

pseudo wren
#

K is the result of that… combination?

#

That makes sense

#

So in the example of the housing market

serene scaffold
#

and whichever group/fold gets to be the test data for that model, the rest get to be the training data. so every fold is part of the training data k - 1 times
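The splitting described above can be sketched in a few lines of NumPy. This only produces the index sets; `sklearn.model_selection.KFold` does the same job in scikit-learn, and the function name here is hypothetical.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) once per fold.

    The shuffled row indices are cut into k roughly equal folds; each fold
    takes one turn as the test set while the other k - 1 folds form the
    training set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx
```

With 10 rows and `k=4`, every row lands in a test set exactly once and in a training set exactly 3 (k - 1) times.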

pseudo wren
#

If I had a CV I’d split it up into 1/4th

#

And then each group gets their turn to be the training data

#

And the testing data

#

And this is to ensure our data is accounted for thoroughly

#

So the process would be

serene scaffold
#

for example, in my work, the data sets take thousands of hours to create.

pseudo wren
#
  1. Account for data frame information we are going to measure, and split it up into x and y values. Split it up further into training and testing groups.
#
  2. We then split the CV and test it up against each other with each group getting to be the training model and testing models at some point
#

Or sorry

#

Model is the wrong word

#

Im not being technical here

#

The training group and the testing group

misty flint
#

k=10 is also a common value you might come across. "10-fold cross validation"

pseudo wren
#
  3. The purpose of this is to evaluate how our model is performing
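Put together for the housing example, the loop looks roughly like this. A sketch, assuming a single feature `x` (number of rooms) and a target `y` (price), with a straight line fit by `np.polyfit` standing in for "the model"; the function name is hypothetical. The average of the per-fold MSEs is the cross-validated estimate of how the model does on data it hasn't seen.

```python
import numpy as np


def cross_validate_mse(x, y, k=4, seed=0):
    """k-fold CV for a one-feature linear model; returns one MSE per fold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the current one is training data.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)  # train
        pred = slope * x[test_idx] + intercept  # predict on the held-out fold
        scores.append(float(np.mean((y[test_idx] - pred) ** 2)))  # evaluate
    return scores
```

Note that k separate models are fit here, one per fold, which matches the point that k-fold CV is a way of evaluating rather than a kind of model.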
pseudo wren
#

I was given a small set to work with right now though

#

Do I have the general idea a little more correct?

misty flint
serene scaffold
#

would increase training speeds for each fold

pseudo wren
#

lmao maybe! sounds inconvenient though!

serene scaffold
#

training speed

pseudo wren
#

I think I understand a little better now

#

implementation will be another beast

lapis sequoia
#

what if i wrote ai to write ai better than me

serene scaffold
misty flint
pseudo wren
#

I do! if i'm gonna be an AI researcher I hope to know this stuff well!

misty flint
#

its a broad field, so i highly recommend trying to look for a specialty btw

serene scaffold
pseudo wren
#

implementation and learning scikit learn

pseudo wren
#

i thought it was a little silly

#

but it was a philosopher basically prophesying ai as the end

#

funny to think about when they seem so dumb right now

#

basically saying how ai would then reproduce and write new ai

serene scaffold
#

uhhhhhhhhhhhhhhhhhhhh. there are some problems that AI can do very well. and there are some problems where it performs well unexpectedly. but there are core human competencies that AI can't emulate currently.

pseudo wren
#

yeah but i don't know if it'll end us right?

#

not soon anyway

desert oar
#

not within our lifetimes

#

probably never

#

we will end us before AGI ends us

lapis sequoia
misty flint
pseudo wren
#

yeah we might fuck the earth before we hit that point

serene scaffold
#

you know how you're learning that models are things that learn stuff from data, and then predict one of the columns in the data? that's sort of the whole thing. there isn't a point at which the model becomes self-aware and tries to control society from your computer.

lapis sequoia
pseudo wren
#

they don't exactly possess neuroplasticity

serene scaffold
#

so to put what you're learning in context, you're learning about classifiers, which are models that assign labels to things. and that in itself is a huge part of what AI is

pseudo wren
#

hm yeah

#

so is it like

#

to build an AI

#

it's made up of a ton of classifiers like the models i'm learning?

#

like if i wanted to make some insurance AI software

#

I could include the housing model I trained

#

a car model

#

etc.

#

and package that into an artificial intelligence?

serene scaffold
#

depends on what you consider to be "an AI". but a lot of software that involves AI will probably involve classifiers in some way.

#

though I think insurance is a famous example of where AI probably shouldn't be used. if you're going to tell a customer that they're considered high-risk, you should have a specific reason, not "I entered your data into the model and it told me you were high-risk"

pseudo wren
#

Lmao yeah AI probably shouldn’t be involved in something like that

desert oar
pseudo wren
#

Would we package classifiers together from different models we trained to make up artificial intelligence software?

#

Or is that not part of the process

serene scaffold
desert oar
pseudo wren
#

That’s something else I just learned

#

Regression

#

Not good at it yet!

pseudo wren
#

Or cars

#

But I’m sure you guys would be able to say how successful that would be better than I could

safe elk
safe elk
# serene scaffold depends on what you consider to be "*an* AI". but a lot of software that involve...

Weapons of Math Destruction is a 2016 American book about the societal impact of algorithms, written by Cathy O'Neil. It explores how some big data algorithms are increasingly used in ways that reinforce preexisting inequality. It was longlisted for the 2016 National Book Award for Nonfiction but did not make it through the shortlist has been wi...

iron basalt
# pseudo wren yeah but i don't know if it'll _end_ us right?

There are many ways in which humanity can already end itself on purpose or not. There is not really any good prediction that can be made as to what will happen post wide-spread AGI (because nothing like it has ever happened (anywhere in the universe as far as we know)). There are certain classes of "safe" AI, but that won't stop the "non-safe" ones from being made. And anybody with enough computers can make them, so it's kind of unavoidable at this point (without some major setback like a nuclear war).

pseudo wren
iron basalt
#

Nor are most out there in use right now quines.

#

(PaLM actually might be, if it trained on its own source code too / read it)

keen dragon
#

highest_salaries = salary.sort_values(by='salary', ascending=False)
eighth_highest_salary = tenpaid.get['salary'].index[9]
eighth_player_name = tenpaid.get['name'].index[9]
print('Player:', eighth_player_name, '\nSalary:', eighth_highest_salary)

#

what is wrong with this code
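A few things stand out in the snippet: `tenpaid` is never defined (the sorted frame is called `highest_salaries`), `.get` is a method, so `.get['salary']` raises a `TypeError` instead of selecting a column, and `.index[9]` returns the index label of the tenth row rather than a salary. Positions are 0-based, so the eighth row is `iloc[7]`. A corrected sketch, assuming the columns are named `name` and `salary` as in the question; the helper name `nth_highest` is hypothetical.

```python
import pandas as pd


def nth_highest(df, n):
    """Return (name, salary) for the row with the n-th highest salary (1-based)."""
    ranked = df.sort_values(by="salary", ascending=False)
    row = ranked.iloc[n - 1]  # iloc is positional and 0-based: 8th row -> iloc[7]
    return row["name"], row["salary"]


# name, pay = nth_highest(salary, 8)
# print('Player:', name, '\nSalary:', pay)
```

If several players share a salary, this picks the eighth row by position, which is not necessarily the eighth distinct salary.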

graceful glacier
#

i have the following table

#

i want to group the columns into an umbrella 'measures' column

#

what pandas function can i use to make the columns multiindex?
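The table itself didn't survive, so this sketch uses hypothetical column names. Assigning a `pd.MultiIndex` built with `from_product` puts every existing column under one top-level label; `pd.concat({"measures": df}, axis=1)` is an equivalent one-liner, since the dict key becomes the outer level.

```python
import pandas as pd


def add_column_level(df, top="measures"):
    """Put every column of `df` under a single top-level label."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_product([[top], df.columns])
    return out
```

After this, `result["measures"]` selects the original columns again.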

sterile rivet
versed gulch
#

is there a way to save a 3D image? i.e. i converted my 2D arrays to images and saved them to files; is this possible when converting a 3D array into a 3D image?

true elk
#

Hey, any tips to improve pytesseract results? I've tried almost everything I found online (grayscale, threshold, psm modes, white margin, etc)

#

Is model training on Tesseract mandatory for a reliable/consistent result ?

#

(this is not a fancy font)

#

please @ me if you have tips. Model training is my last option. It's a licensed font, I can't easily train the model without manual data entry..

gilded kestrel
#

I'm experimenting with a dataset for a regression task. In short, I want to predict how long a user has to wait for x to happen (this is my target variable in seconds). I have various features, including a date column in a format yy:mm:dd:hh:mm:ss. My assumption is that yy:mm is not important but the time of the day is. E.g. I want my prediction to take into account the time of day. Should I be looking at time series or just a way to include time as a feature?
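One common option for including time of day as an ordinary regression feature, rather than switching to time-series modeling, is a cyclical encoding: seconds since midnight mapped onto a circle with sine and cosine, so 23:59 and 00:01 end up close together (a raw hour number puts them 23 apart). A sketch, assuming the column is called `date` (hypothetical) and holds strings in exactly the yy:mm:dd:hh:mm:ss layout described.

```python
import numpy as np
import pandas as pd


def add_time_of_day_features(df, column="date"):
    """Add tod_sin / tod_cos features computed from the time-of-day part only."""
    ts = pd.to_datetime(df[column], format="%y:%m:%d:%H:%M:%S")
    seconds = ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second
    angle = 2 * np.pi * seconds / 86400  # one full turn per day
    out = df.copy()
    out["tod_sin"] = np.sin(angle)
    out["tod_cos"] = np.cos(angle)
    return out
```

The year and month are parsed but simply not used, matching the assumption that only the time of day matters.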

hybrid mica
#

in general, does XGBoost give good results?

stark breach
#

Hey, i have a model i just made; it's a very small linear regression project. Will anyone be able to help me out by evaluating it and telling me where i can improve? You can send a DM if you are ready to help, and i will send you the notebook.

candid pollen
#

Hello! I'm trying to extract values from an ROI in an image. Is there any example or documentation I can read?
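Once an image is loaded as a NumPy array (OpenCV's `imread` returns one directly, and `np.asarray` works on a PIL image), a rectangular ROI is just a slice. Arrays are indexed `[row, column]`, i.e. `[y, x]`, which is the usual source of confusion. A sketch with a hypothetical helper that returns the region and its mean (a scalar for grayscale, one value per channel for color):

```python
import numpy as np


def roi_values(image, x, y, w, h):
    """Return the pixels of the w x h rectangle whose top-left corner is (x, y), and their mean."""
    roi = np.asarray(image)[y:y + h, x:x + w]  # rows first, then columns
    return roi, roi.mean(axis=(0, 1))
```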

hybrid mica
#

what is the best way to evaluate the performance of non-linear regression models?

steady basalt
frigid elk
#

ROI 😉

steady basalt
#

Area under the ROC curve is good for predictive performance

#

It takes into account both recall and specificity

#

For each prediction

#

Based on their own probability cut off

#

Although sometimes u may consider one more important than the other

candid pollen
tough frigate
chilly abyss
#

hi everyone, I want to replicate what's in this excel sheet in python.
The data structure in python is one-dimensional but contains monthly irradiation data for 2000 to 2021.

desert oar
#

you should use pandas for tables

#

so your data is in B2:O14, you can read exactly that range with pandas

true elk
#

Anyone has experience with pytesseract? How can I train the model without having the font?

chilly abyss
#

The thing is, the data will be downloaded from the cloud and will come in that 1D form, so the code I want to write has to work with the raw data. That is why I'm searching for ways to work with the 1D data. I initially used Excel just to learn and understand Monte Carlo simulation, which is what I am trying to achieve in Python.

desert oar
#

i see, how do you know what date is attached to each data point? is there a separate series for the dates?

#

i would still recommend using pandas, because it has good features for time series

#

i think you are thinking about this problem in the wrong way. "Monte Carlo simulation" is pretty easy, you just do something over and over. you need to learn "how to get data in and out of python" and "how to draw from random variables" and "how to compute summary statistics"

#

you won't find useful resources on "monte carlo simulation in python" that explain all that
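A minimal sketch of those three steps, assuming the 1D data has already been reshaped to one row per year and one column per month (shape `(n_years, 12)`). The sampling scheme, drawing each month's value from that month's own history (resampling with replacement), is just one simple choice; it ignores any correlation between months. Function names are hypothetical.

```python
import numpy as np


def monte_carlo_annual_totals(monthly_2d, n_sims=10_000, seed=0):
    """Simulate n_sims years and return each simulated year's total.

    For every simulated year and every month, pick a random historical year
    and take that month's value from it, then sum the 12 months.
    """
    data = np.asarray(monthly_2d, dtype=float)
    rng = np.random.default_rng(seed)
    year_idx = rng.integers(0, data.shape[0], size=(n_sims, 12))
    draws = data[year_idx, np.arange(12)]  # shape (n_sims, 12)
    return draws.sum(axis=1)


def summarize(totals):
    """Summary statistics of the simulated totals."""
    p5, p50, p95 = np.percentile(totals, [5, 50, 95])
    return {"mean": float(np.mean(totals)), "p5": float(p5),
            "median": float(p50), "p95": float(p95)}
```

Usage would be along the lines of `summarize(monte_carlo_annual_totals(data_2d))`.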

chilly abyss
#

ohkay...

#

@desert oar can I pm you? 🙂

desert oar
#

it's better if you keep asking questions here, i would rather not dm

chilly abyss
#

Alright, get it.

sullen edge
#

Hey there, I noticed that when an opencv imshow window is clicked and dragged, it blocks the execution of the code till it's released. Any idea what causes this and how it can be disabled?

chilly abyss
#

After Dec 2000, the next set of 12 values would be Jan 2001, Feb 2001, ... Dec 2001, and so on until the last set of data

desert oar
#

oh i see, every number is already an average

chilly abyss
#

No not an average

desert oar
#

oh sorry, the value in the table

#

so you have 132 numbers, corresponding to that table

#

and it's in a 1d array

chilly abyss
#

The numbers in the table are in a different unit, so there is a difference

lapis sequoia
#

Hello friends,
I hope you are doing well.
I have been selected to work on a problem: developing an ML model to predict cancer using HMM. The problem is that I am just a beginner in ML. Can you suggest how I should proceed with this problem?

desert oar
#

okay. but you can reshape it into the table with .reshape, then i recommend putting it into a pandas dataframe for further analysis

chilly abyss
#

The numbers in the table are 2D, and they are in different units from the numbers in the Jupyter notebook

desert oar
#

right, that's ok

#

we can get there

chilly abyss
#

values 1 - 12 represent data for Jan to Dec 2000; values 13 to 24 represent data for Jan to Dec 2001...

chilly abyss
desert oar
# chilly abyss ok, I will go read about this an implement it. 🙂
import numpy as np
import pandas as pd

# Import your data however
data_1d = np.loadtxt(...)

# Reshape the data to have 12 columns, automatically
# adjusting the number of rows (with "-1")
data_2d = data_1d.reshape((-1, 12))

# Months for the column labels
months = [
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
]

# Years for the row labels
years = [
    str(year) for year in range(2000, 2000 + data_2d.shape[0])
]

# Build a Data Frame
data = pd.DataFrame(data_2d, columns=months, index=years)

# Optionally, save the data frame to a file so
# you don't have to do this processing again
data.to_csv(...)

now you can do table-oriented operations on data

#

pandas is more or less the standard tool for tabular data analysis

#

another option is to load your data as 1d data, and then do "groupby" and "rolling" operations to find the averages, but that is more advanced usage

chilly abyss
#

👍🏾 great

desert oar
#

it would look like this:

import itertools  # included with python

import numpy as np
import pandas as pd

# Import your data however
data_1d = np.loadtxt(...)

# Months for the column labels
months = [
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',
]

# Years for the labels (12 values per year)
years = [
    str(year) for year in range(2000, 2000 + len(data_1d) // 12)
]

yearmon_dt = pd.to_datetime([
    f'{year} {month}' for year, month in itertools.product(years, months)
], format='%Y %b')

data = pd.Series(data_1d, index=yearmon_dt)
desert oar
#

that code will not work as-written. you will need to adapt it to your own data
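For the "groupby" and "rolling" route mentioned earlier, a sketch that builds the dates with `pd.date_range` instead of parsing strings. It assumes the flat series starts in January 2000 and runs month by month with no gaps; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_summaries(values, start="2000-01"):
    """Attach month-start dates to a flat monthly series, then summarize it.

    Returns the dated series, the average for each calendar month
    (1 = Jan ... 12 = Dec) across all years, and a 12-month rolling mean.
    """
    index = pd.date_range(start=start, periods=len(values), freq="MS")
    series = pd.Series(np.asarray(values, dtype=float), index=index)
    by_month = series.groupby(series.index.month).mean()
    rolling_12 = series.rolling(window=12).mean()  # NaN for the first 11 months
    return series, by_month, rolling_12
```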

chilly abyss
#

sure, will read the doc...

#

Thanks @desert oar , that was really helpful

mild dirge
#

you want us to sift through a script of 1766 lines to help you copy and change it?

fast dust
#

I'm trying to construct a Neural Network without ml libs
How should I construct weights?
a single float array
or
list[list[list[float]]]

mild dirge
#

There are weights between each pair of consecutive layers

#

And the weights are connections from each node in one layer to each node in the next layer

#

So you'd have n_layers-1 weight matrices

#

and each matrix's size is determined by the previous and next layers' sizes
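The layout described above maps directly onto the `list[list[list[float]]]` option: one matrix per pair of consecutive layers, where `weights[l][i][j]` connects node `i` of layer `l` to node `j` of layer `l + 1`. A pure-Python sketch with a sigmoid forward pass; the function names, the uniform(-1, 1) initialization and the zero biases are illustrative assumptions, not the only reasonable choices.

```python
import math
import random


def init_weights(layer_sizes, seed=0):
    """One (n_in x n_out) matrix per pair of consecutive layers, plus biases."""
    rng = random.Random(seed)
    weights = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights.append([[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)])
    biases = [[0.0] * n_out for n_out in layer_sizes[1:]]
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """Feed inputs through every layer; each node sums its weighted inputs."""
    activations = list(inputs)
    for w, b in zip(weights, biases):
        activations = [
            sigmoid(sum(a * w[i][j] for i, a in enumerate(activations)) + b[j])
            for j in range(len(b))
        ]
    return activations
```

For `layer_sizes=[3, 4, 2]` this gives two matrices, 3x4 and 4x2. The inner `sum` is the dot product that a numeric library would otherwise optimize.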

fast dust
#

thanks!

serene scaffold
fast dust
#

like optimized dot products

#

learning the underlying math of neural networks is really fun actually

#

in tutorials and articles about constructing a neural network from scratch, they don't write the code to be re-usable.
they hard code all the weights and biases by hand. I'm trying to make a more "OOP and re-usable" version of them

mild dirge
#

Some articles aren't fully correct btw

#

make sure to learn from multiple sources

tough briar
#

is it alright if i skip andrew ng's course because he's not teaching with python? I'm more inclined towards sentdex's series

mild dirge
#

Actually coded a NN from scratch, and the article I used had quite a lot of mistakes

tacit basin
tough briar
#

because andrew keeps referring to the other languages and thats hard to follow

karmic valley
#

hey need some help. i have 1024 (x, y) coordinates tracking some object in an image. i want to tell python to make the part of the image above those coordinates transparent. can someone help pls?
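One way to do this is to give the image an alpha channel and zero it wherever a pixel's row lies above the curve. A sketch, assuming a float image with values in [0, 1] and one y value per image column, already in image row coordinates (row 0 at the top, as with the `256 - ...` flip in the earlier plotting code). The helper name is hypothetical.

```python
import numpy as np


def make_transparent_above(image, ys):
    """Return an RGBA copy of `image` with every pixel above the curve transparent.

    `image` is (H, W), (H, W, 3) or (H, W, 4) with floats in [0, 1];
    `ys` has one row coordinate per column (len(ys) == W).
    """
    img = np.asarray(image, dtype=float)
    if img.ndim == 2:  # grayscale -> RGB
        img = np.stack([img, img, img], axis=-1)
    h, w = img.shape[:2]
    if img.shape[2] == 3:  # RGB -> RGBA, fully opaque
        img = np.concatenate([img, np.ones((h, w, 1))], axis=-1)
    else:
        img = img.copy()
    rows = np.arange(h)[:, None]  # shape (H, 1)
    curve = np.asarray(ys, dtype=float)[None, :]  # shape (1, W)
    # "Above" in image coordinates means a smaller row index.
    img[..., 3] = np.where(rows < curve, 0.0, img[..., 3])
    return img
```

`plt.imsave("out.png", result)` writes the alpha channel to a PNG. If the x coordinates are not exactly one per column, `np.interp(np.arange(W), xs, ys)` can resample them first.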

tacit basin
iron basalt
tough briar
sullen edge
iron basalt
modest mulch
iron basalt
#

*Some GUIs / apps use this method when they are too lazy to figure out how Windows actually works.

#

*It's not exactly well documented or understandable.

modest mulch
#

Most GUIs are blocking ones, and it makes sense why they are. It has nothing to do with understanding how the OS actually works

iron basalt
#

Multithreading makes your program harder to understand.

#

It's often not needed and used where one can just use non-blocking IO.

#

A GUI should be non-blocking without any need for threading.

#

*In some cases you may still need threading, but generally only when you actually need it, so it does not become unnecessarily complex.

tacit basin
lilac kindle
#

Hi all, does anyone know if it's even possible to debug pyspark locally with breakpoints? It always raises an error after it reaches an action method on dataframes. Thanks.

tawdry matrix
#

hello

#

i want to become a data scientist, so what can i do?

misty flint
lilac kindle
#

no worries I'll investigate further

#

running pyspark locally is still a pain in the butt

#

I mean it's not too bad

#

but not the best dev experience

rocky lantern
#

I have a netcdf file that has two variables lon and lat, that are 1d arrays. I want to merge them into a 2d meshgrid of tuples. The code I currently have is below

import netCDF4 as nc4
import numpy as np

nc =  nc4.Dataset('geodata.nc','r', format='NETCDF4')
#open netcdf dataset

lon = nc.variables['lon']
#lon.shape -> (541,)
           
lat = nc.variables['lat']           
#lat.shape -> (346,)

lons,lats = np.meshgrid(lon,lat)   
#lons.shape -> (346, 541), lats.shape -> (346, 541)

Is there a way to easily zip the data values of the lons, lats MaskedArrays together into one MaskedArray of tuples data?

rocky lantern
#

Oh, I think this might work for me.

>>> coords = np.dstack((lons,lats))
>>> coords.shape
(346, 541, 2)
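A small check of what `np.dstack` produces, with hypothetical stand-in values for the netCDF variables. The result keeps the meshgrid's (lat, lon) grid shape and adds a last axis holding the (lon, lat) pair; `np.stack(..., axis=-1)` gives the same array, and `np.ma.dstack` exists if the masks need to be carried along.

```python
import numpy as np

lon = np.array([10.0, 20.0, 30.0])  # hypothetical stand-ins for the netCDF variables
lat = np.array([1.0, 2.0])
lons, lats = np.meshgrid(lon, lat)  # both shape (len(lat), len(lon)) = (2, 3)
coords = np.dstack((lons, lats))  # shape (2, 3, 2): last axis is (lon, lat)
same = np.stack((lons, lats), axis=-1)  # equivalent, with the axis spelled out
lon_at, lat_at = coords[1, 2]  # row index follows lat, column index follows lon
```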
sullen edge
sullen edge
lilac kindle
iron basalt
#

Since it's only two threads, the main one being blocked for UI, and the rest, it should be fine.

modest mulch
iron basalt
#

Nor does it even need concurrency in the sense that it's some built in language feature or "fake" threads.

#

Using threading for this is not ideal, but everyone does it anyhow.

modest mulch
#

Ah, that's fair. I mean, even non-blocking IO does some multithreading/multiprocessing under the hood, at least that's the case in JS. I guess it's inevitable.

iron basalt
#

If you look up a solution to not blocking when dragging a window on Windows, you will often find that making another thread is recommended. This is widespread misinformation about how to properly program a GUI.

#

Under the hood it will of course multitask. Not multithread.

#

There is no need for the OS to wait for the entire file to finish reading. But it will wait unless the user's process tells it not to, because the user's code is a bit more complicated when it's not blocking.

#

It's simpler to just block, and often you may not care about a short block.

#

But when the block time is large, it can become a problem.

#

(Or potentially forever in the case of not handling the GUI drag events)

#

(Or network with no timeout)

modest mulch
#

True, but there's no other way if the language doesn't support non-blocking IO, I guess.

iron basalt
#

Most do: C, C++, Python, etc.

#

Python also has language level concurrency.

#

Coroutines too.

#

But you can also set IO to non-blocking directly.

#

Basically, no matter which language, you have to tell the OS you want non-blocking IO.
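A minimal standard-library sketch of telling the OS you want non-blocking IO in Python: `setblocking(False)` on a socket, plus the `selectors` module to wait for readiness with a timeout. `socket.socketpair()` stands in for a real network connection, and the function name is made up for illustration.

```python
import selectors
import socket


def nonblocking_read_demo() -> bytes:
    # socketpair() returns two already-connected sockets, so no real network is needed.
    sender, receiver = socket.socketpair()
    # Ask the OS for non-blocking behaviour on the receiving end.
    receiver.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(receiver, selectors.EVENT_READ)
    try:
        try:
            # Nothing has been sent yet: instead of waiting, recv() fails immediately.
            receiver.recv(1024)
        except BlockingIOError:
            pass  # a GUI would go back to processing its own events here
        sender.sendall(b"hello")
        # select() waits at most 1 second for the OS to report the socket as readable.
        for key, _events in sel.select(timeout=1.0):
            return key.fileobj.recv(1024)
        return b""
    finally:
        sel.close()
        sender.close()
        receiver.close()


print(nonblocking_read_demo())
```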

modest mulch
#

That's fair, I guess

iron basalt
modest mulch
#

I didn't actually know that most languages supported async programming

iron basalt
#

It makes your program way simpler than having another thread for the socket.

modest mulch
#

yea for sure

iron basalt
#

Or use the more advanced asyncio, which is probably the recommended approach in Python.
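A minimal asyncio sketch of the same idea: a stand-in "UI" coroutine keeps ticking while a slow IO operation is awaited, and everything runs on one thread. `slow_io` and `ui_heartbeat` are hypothetical placeholders (`asyncio.sleep` stands in for real IO), not any GUI toolkit's API.

```python
import asyncio
import threading


async def slow_io(delay: float) -> str:
    # Stand-in for a network or file read; awaiting hands control back to the event loop.
    await asyncio.sleep(delay)
    return "payload"


async def ui_heartbeat(thread_ids: list, stop: asyncio.Event) -> None:
    # Stand-in for a GUI event pump that must keep running while the IO is pending.
    while not stop.is_set():
        thread_ids.append(threading.get_ident())
        await asyncio.sleep(0.01)


async def main() -> tuple:
    thread_ids: list = []
    stop = asyncio.Event()
    ui_task = asyncio.create_task(ui_heartbeat(thread_ids, stop))
    result = await slow_io(0.1)  # the "UI" keeps ticking during this await
    stop.set()
    await ui_task
    return result, thread_ids


result, thread_ids = asyncio.run(main())
print(result, len(thread_ids), "ticks on", len(set(thread_ids)), "thread")
```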

modest mulch
#

Thread management can be a pain in the ass, especially when writing to a database/file.

iron basalt
#

Yeah, most don't seem to know about this; it's sadly widespread misinformation on how to program a GUI too. Everyone kind of just copies everyone else without fixing the bad parts, and over time it is seen as the "correct" way of doing it, and it even shows up as the first answer on Stack Overflow, etc.

#

It used to be the way people did it by default.

#

Somehow that knowledge was lost over time / generations.

#

(A lot of things in programming are just assumed to be the right way because everyone is doing it)

modest mulch
#

True

iron basalt
#

(It does not help that Windows, etc. are stupidly complex and have bad docs)

modest mulch
#

Does anyone know what could work for basketball court detection? I have tried using color-space info and k-means clustering, but these didn't work, and I don't have a dataset to train something like a GMM or an encoder-decoder.

mint palm
#

in this type of git repo

#

how do i download and run the code?

#

i see some update files as well

tacit basin
mint palm
#

I'm getting a CUDA not found error

#

but i used cuda last year

tacit basin
#

You need the GPU drivers as well as CUDA

#

nvidia-smi

#

What does this return?
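A small helper, assuming the NVIDIA driver puts `nvidia-smi` on your PATH (the function name is made up). It distinguishes "binary missing" from "binary present but failing". When it works, the header of the output shows the driver version and the highest CUDA version that driver supports.

```python
import shutil
import subprocess


def nvidia_smi_report() -> str:
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No binary on PATH usually means the NVIDIA driver is missing or not on PATH.
        return "nvidia-smi not found: check the NVIDIA driver installation"
    proc = subprocess.run([exe], capture_output=True, text=True)
    if proc.returncode != 0:
        # e.g. a driver/library version mismatch is reported here.
        return "nvidia-smi failed: " + (proc.stderr or proc.stdout).strip()
    return proc.stdout


print(nvidia_smi_report())
```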

pseudo wren
#
y = y.reshape(-1, 1)
x, y = np.array(x), np.array(y)
#np.any(np.isnan(mat))
x2 = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(x2, y)
r_sq = model.score(x2, y)
intercept, coefficients = model.intercept_, model.coef_
y_pred = model.predict(x2)
#

trying to perform a linear regression on my data

#

but i'm getting the error that says input contains NaN, infinity or a value too large

#

not sure what the best fix is for this

desert oar
pseudo wren
#

😔 yes

#

but i still need to work with the data

#

so i'm not sure how to manipulate it to fit

desert oar
# pseudo wren 😔 yes

you need to either impute values for them, or remove those rows. there are no other options for linear regression
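A NumPy sketch of both options on made-up data: drop every row where the feature or target is NaN/inf, or fill bad feature values with the mean of the good ones. `drop_nonfinite_rows` and `impute_feature_mean` are hypothetical helpers; scikit-learn's `SimpleImputer` (in `sklearn.impute`) is the library route for the second option. Imputing the target itself is generally avoided, so the sketch only imputes the feature.

```python
import numpy as np


def drop_nonfinite_rows(x, y):
    """Option 1: remove every row whose feature or target is NaN or +/-inf."""
    # reshape(-1, 1) turns the feature into a vertical column: n samples x 1 feature.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).ravel()
    keep = np.isfinite(x).all(axis=1) & np.isfinite(y)
    return x[keep], y[keep]


def impute_feature_mean(x):
    """Option 2: replace non-finite feature values with the mean of the finite ones."""
    x = np.asarray(x, dtype=float).reshape(-1, 1).copy()
    bad = ~np.isfinite(x)
    x[bad] = x[~bad].mean()
    return x


x = [1.0, 2.0, np.nan, 4.0, 5.0]
y = [1.0, 4.0, 9.0, np.inf, 25.0]
x_clean, y_clean = drop_nonfinite_rows(x, y)
# Degree-2 least-squares fit on the cleaned rows (plays the role of
# PolynomialFeatures(degree=2) + LinearRegression in the snippet above).
coeffs = np.polyfit(x_clean.ravel(), y_clean, deg=2)
print(x_clean.ravel(), y_clean, np.round(coeffs, 6))
```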

pseudo wren
#

hm okay

#

worked

karmic valley
#

i have 1024 (x, y) coordinates tracking some object on an image 1024 pixels wide. i want the code to make the part of the image above those coordinates transparent. could someone help pls
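A NumPy sketch, assuming `rgb` is a `(height, width, 3)` uint8 array and `ys` holds one y value per column, measured from the top of the image (if your y values are measured from the bottom, convert them with `height - y` first; float values are truncated to integers). It builds an alpha channel that is 0 above each y and 255 elsewhere. The helper name is hypothetical. With Pillow installed, `Image.fromarray(rgba).save("out.png")` should write the result as an RGBA PNG, which keeps the transparency.

```python
import numpy as np


def make_above_transparent(rgb: np.ndarray, ys) -> np.ndarray:
    """Return an RGBA copy of `rgb` where pixels above ys[col] in column col are transparent."""
    h, w = rgb.shape[:2]
    ys = np.asarray(ys, dtype=int)
    if ys.shape != (w,):
        raise ValueError("need exactly one y value per image column")
    rows = np.arange(h)[:, None]  # shape (h, 1): row index of every pixel
    # Row 0 is the top of the image, so "above the line" means row < y for that column.
    alpha = np.where(rows < ys[None, :], 0, 255).astype(np.uint8)
    return np.dstack([rgb.astype(np.uint8), alpha])


# Tiny 4x3 demo image; with real data rgb would be (256, 1024, 3) and ys 1024 values long.
rgb = np.full((4, 3, 3), 200, dtype=np.uint8)
rgba = make_above_transparent(rgb, ys=[0, 2, 4])
print(rgba[..., 3])
```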

fast dust
mild dirge
#

yeah that one's great

cloud maple
#

Do you watch NeuralNine?

cloud maple
#

How are you loading the data?

pseudo wren
#

using colab

frigid elk
#

any free resources to convert an address to lat/lon? i need to do a distance calculation in miles from a central point
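Once each address has been geocoded to (lat, lon), the distance part is plain arithmetic. A sketch using the haversine (great-circle) formula, which treats the Earth as a sphere of radius 3958.8 miles, so expect small errors compared with ellipsoidal methods. The central point is made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the sphere model is itself an approximation


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


center = (40.0, -100.0)  # hypothetical central point
print(round(haversine_miles(*center, 41.0, -100.0), 2))  # one degree of latitude north
```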

frigid elk
#

thanks, i'll take a look

cloud maple
pseudo wren
#

The error message points to line 8 in the cell I sent

#

Had no issue assigning the x and y values

cloud maple
#

Can you print them out?

#

They should be vertical columns.

#

In today's episode we are starting by talking about the first supervised learning algorithm which is linear regression.

Linear Regression Blog Post: https://www.neuralnine.com/linear-regression-from-scratch-in-python/

Website: https://www.neuralnine.com/
Instagram: https://www.instagram.com/neuralnine
Twitter: https://twitter.com/neuralni...

misty flint
#

Consider Double Thumbs Up as a way to fine-tune your recommendations to see even more series or films influenced by what you love. A Thumbs Up still lets us know what you liked, so we use this response to make similar recommendations. But a Double Thumbs Up tells us what you loved and helps us get even more specific with your recommendations. For example, if you loved Bridgerton, you might see even more shows or films starring the cast, or from Shondaland.

desert oar
#

you know, this makes a lot of sense

#

thumbsup is "not bad"

#

thumbsdown is "did not like"

#

double up is "very good"

#

i know people are going to say that this is backpedaling on their change away from the stars, but for all we know this has been a product roadmap for a very long time

next phoenix
misty flint
#

with stars you could do more quantitative analyses and probably get a better overall rating of something

#

but this double thumbs up is all about feeding this information into a RecSys that seems like it might be a deep learning RecSys

#

since it can pick up certain features, it seems like

#

and in the end, create a more personalized RecSys based off of very strong signals

safe elk
misty flint
desert oar
#

humans are bad at stars and 1-5

vagrant gust
#

int input

#

90

lapis sequoia
#

Can we run machine learning programs in Google Colab?

ocean swallow
lapis sequoia
quasi ether
#

guys, what is the difference between the activation kwarg and the Activation layer in TensorFlow?

here's an example :

activation kwarg :

model.add(Dense(64,activation="relu"))

Activation layer :

model.add(Dense(64))
model.add(Activation("relu"))

I'm getting different results and different accuracy rates
PS: I'm new to TensorFlow
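The two forms build the same function: `Dense(64, activation="relu")` is a dense layer with ReLU applied to its output, which is exactly what `Dense(64)` followed by `Activation("relu")` computes. Different accuracies between runs most likely come from random weight initialization and data shuffling. The NumPy sketch below is not TensorFlow code (`dense` and `init_weights` are hypothetical stand-ins); it shows the fused and separate forms agree when the weights are the same, while a different random initialization gives different outputs. Recent TensorFlow 2.x releases provide `tf.keras.utils.set_random_seed` to make two runs comparable.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def dense(x, w, b, activation=None):
    # Hypothetical stand-in for a dense layer: x @ w + b, optionally followed by an activation.
    z = x @ w + b
    return activation(z) if activation is not None else z


def init_weights(n_in, n_out, seed):
    # Random initialization: a different seed gives different starting weights.
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


x = np.random.default_rng(0).normal(size=(5, 8))
w, b = init_weights(8, 64, seed=42)

fused = dense(x, w, b, activation=relu)   # like Dense(64, activation="relu")
separate = relu(dense(x, w, b))           # like Dense(64) followed by Activation("relu")

w2, b2 = init_weights(8, 64, seed=7)      # same architecture, different initial weights
other = relu(dense(x, w2, b2))

print(np.array_equal(fused, separate), np.allclose(fused, other))  # True False
```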

pastel valve
unique flame
#

I'm using 180x180 and slowly going up to see the difference for a classification task. I was told to just experiment

digital folio
#

Hi Guys, I need some help

#

I want to detach 'Date' from df so that I can convert each column to a list

#

^ when I do date

#

but Date exists in the table

#

When I do data['Open'] or any other Column

#

date column is attached

true elk
#

Hey, I need some help with pytesseract. If someone has experience with training tesseract please @ me

desert oar
#

you can just use .tolist() on each column and the index will go away

#

that said, it's pretty rare that I need to actually convert a series to a list. so what are you trying to do?
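A guess at what is happening, assuming 'Date' is the DataFrame's index rather than a column (common for stock-price data): the index is printed next to every column you select, but `data['Date']` fails. The frame below is made up.

```python
import pandas as pd

# Hypothetical stand-in for the poster's data, with 'Date' as the index.
data = pd.DataFrame(
    {"Open": [10.0, 10.5, 11.0], "Close": [10.4, 10.9, 10.8]},
    index=pd.DatetimeIndex(["2022-01-03", "2022-01-04", "2022-01-05"], name="Date"),
)

print("Date" in data.columns)      # False: it is the index, so data['Date'] raises KeyError
open_list = data["Open"].tolist()  # values only, the index is dropped
dates = data.index.tolist()        # the dates themselves, as Timestamps
flat = data.reset_index()          # turns the index back into an ordinary 'Date' column
print(open_list, list(flat.columns))
```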

versed gulch
#

how do you stack a list of 3D arrays into 1 3D array?
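Two NumPy options, depending on what "one 3D array" should mean (the shapes are made up): `np.concatenate` joins along an existing axis and keeps the result 3D, while `np.stack` adds a new axis and returns a 4D array.

```python
import numpy as np

# Arrays that agree on every axis except the first can be joined along it: result stays 3D.
blocks = [np.zeros((2, 4, 5)), np.ones((3, 4, 5))]
merged = np.concatenate(blocks, axis=0)
print(merged.shape)  # (5, 4, 5)

# np.stack adds a NEW leading axis, so it needs identical shapes and returns a 4D array.
same_shape = [np.zeros((4, 5, 6)), np.ones((4, 5, 6))]
stacked = np.stack(same_shape, axis=0)
print(stacked.shape)  # (2, 4, 5, 6)
```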

lapis sequoia
#

Hello. I just fixed my PyTorch 2D self-driving car AI from spinning in circles, and now it's really dumb. How can I make it learn faster and better?

true elk
#
  1. Can I achieve good results without training tesseract? I've already tried most tips I found online (psm modes, white margin, threshold, etc.)
  2. If I really need to train tesseract, what is the best current tool? Most tools/articles I find are very old
    Also, I can't figure out how to compile on macOS:
    ../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
    This command takes 100% of my CPU for ages; I had to Ctrl+C it, as my PC had been frozen for more than 30 min.
    When I skip this line and run sudo make training-install, I get the following errors:
libtool: warning: 'libtesseract.la' has not been installed in '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/ambiguous_words /usr/local/bin/ambiguous_words
libtool: warning: 'libtesseract.la' has not been installed in '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/classifier_tester /usr/local/bin/classifier_tester
libtool: warning: 'libtesseract.la' has not been installed in '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/cntraining /usr/local/bin/cntraining
libtool: warning: 'libtesseract.la' has not been installed in '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/mftraining /usr/local/bin/mftraining
libtool: warning: 'libtesseract.la' has not been installed in '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/shapeclustering /usr/local/bin/shapeclustering
Is that related to the fact that I canceled ./configure?
#

Or any other suggestion for OCR?

#

There is a tesseract --with-training-tools install history but I can't make it work

#

Error: invalid option: --with-training-tools when trying to do brew install tesseract --with-training-tools

cinder matrix
#

can someone tell me what type of word embedding etc gpt2 uses

lapis sequoia
#

Hi,
Sometimes when you are using Python in the Colab environment, you may wonder how to get your webcam video stream into Colab so you can use it with your Python code (for your ML models, for example). Check out my new post if you are interested:
https://python.plainenglish.io/how-to-get-your-webcam-stream-in-colab-and-use-it-with-python-1f1d2c30df34?sk=f8723004313db0fc64ccc8cf4eac1f39

Medium

A step-by-step guide on getting your webcam stream in Colab and use it in your Python code

arctic wedgeBOT
#

Hey @loud flame!

It looks like you tried to attach file type(s) that we do not allow (, .ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

does anyone who knows python AI wanna have a dm conversation? i need to know how to make my AI, which i've coded with pytorch, smarter, as it doesn't seem to learn

serene scaffold
#

@lapis sequoia you're more likely to get help if you put your question in this chat

lapis sequoia
#

ok

true elk
#

Can docker be a solution for my current Tesseract training/MacOS issue?

#

I'm stuck in the compiling hell 😦

lapis sequoia
#

So basically, I stopped my self driving car ai from going in circles after 80 generations and then it no longer had any intelligence. how do i fix the AI?

desert oar
true elk
#

Is this linked to the anaconda project? tbh I had a terrible experience with Anaconda, won't install ever again lol

#

(but it was a long time ago, I may give it another chance)

#

Do you have experience with training tesseract?

#

or fine-tuning to be exact

rocky mason
#

does putting too many parameters into GridSearch return hyper-parameters that will overfit when i use them on my random forest classifier?

serene scaffold
mild dirge
#

You can make sure of that with a validation set

#

That's what you use for tuning hyper parameters

#

afterwards you can test your finished model on the test set, which will give a good estimate of how well it performs on new data (i.e. whether it generalizes rather than overfits)

serene scaffold
#

I see. I guess I should look into gridsearch

mild dirge
#

Grid search is just trying many combinations of hyper parameters

serene scaffold
#

oh

#

via brute force? or does it intelligently skip combinations that are unlikely to work well based on the performance of previous combinations?

mild dirge
#

grid search is just brute force

#

that's why it's a grid, you try every combination

#

and you give a range of values for every hyper-param
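A plain-Python sketch of that brute force: `itertools.product` enumerates every combination in the grid, and each one gets scored. `grid_search` and `toy_score` are hypothetical; in scikit-learn, `GridSearchCV` does this with cross-validated scores.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid and keep the one with the highest score."""
    names = list(param_grid)
    best_params, best_score, n_tried = None, float("-inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)  # in practice: mean validation score for these params
        n_tried += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_tried


# Hypothetical scoring function standing in for "train, then score on validation data".
def toy_score(max_depth, min_samples_leaf):
    return -abs(max_depth - 6) - 0.1 * min_samples_leaf


grid = {"max_depth": [2, 4, 6, 8, 16], "min_samples_leaf": [1, 5, 10]}
best, score, n_tried = grid_search(toy_score, grid)
print(best, score, n_tried)  # 5 x 3 = 15 combinations tried
```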

serene scaffold
#

I'm learning 😄

mild dirge
#

You work with ml right? I'm sure you've already used it without knowing what it was called then

serene scaffold
#

yes, though I know what I know and don't know what I don't BingShrug

rocky mason
#

is the validation set equivalent to your test set?

mild dirge
#

No, the test set is not used when training the model

#

It is kept completely separate all the way until you are done and have finished your model

rocky mason
#

is splitting the training set into train and validation sets right?

mild dirge
#

Then you use it only to test how good the model is; if you find out it is bad and go back to try other hyper-parameters, you might still overfit

mild dirge
rocky mason
#

because from what I know, grid search does return the best hyper-parameters available via best_params_, but the random forest seems to be overfitting on my test set: the validation accuracy is far off from the training accuracy, even with 10 folds in the grid search

mild dirge
#

yeah, that's why you keep the test set separate

#

How would you know it is overfitting though?

rocky mason
#

from what i am told, after the k-fold your train and validation accuracy should be quite close to one another

#

and also the grid search returned a depth of 16 for the decision trees, so im sure that's going to overfit as well

mild dirge
#

If you are using k-fold, you should only perform k-fold on the training data, which will split it into several folds, and then later test it on the test set when you are satisfied with the results
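A scikit-learn sketch of that workflow on synthetic data (the grid values are arbitrary): hold out a test set first, run `GridSearchCV` with k-fold CV on the training split only, compare the mean train and validation scores for the chosen parameters, and score on the test set once at the end. If the train/validation gap stays large, restricting the grid to shallower trees (`max_depth`) or larger `min_samples_leaf` is one lever to try.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data so the sketch runs anywhere; swap in your own X, y.
X, y = make_classification(n_samples=600, n_features=12, random_state=0)

# 1) Hold out the test set FIRST. The grid search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2) k-fold CV happens only inside the training split.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6, None], "min_samples_leaf": [1, 5]},
    cv=5,
    return_train_score=True,
)
search.fit(X_train, y_train)

# 3) Train vs validation scores for the chosen params: a big gap suggests overfitting.
i = search.best_index_
train_acc = search.cv_results_["mean_train_score"][i]
val_acc = search.cv_results_["mean_test_score"][i]  # "test" here means the validation folds

# 4) Touch the test set exactly once, at the end.
test_acc = search.score(X_test, y_test)
print(search.best_params_, f"train={train_acc:.3f} val={val_acc:.3f} test={test_acc:.3f}")
```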

pseudo wren
#

Thought this might be of interest to the people here

misty flint
tough frigate
#

i have a pickle file larger than 100mb. is there some way i can reduce the file size, so i can upload it to github for deployment?
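One option is compressing the pickle. A standard-library sketch using `gzip` (the helper names and demo object are hypothetical, and real model weights compress much less than this repetitive demo). GitHub rejects files over 100 MB, so if the compressed file is still too big, Git LFS or storing the model outside the repo are the usual alternatives. Only unpickle files you trust.

```python
import gzip
import os
import pickle
import tempfile


def save_compressed(obj, path):
    # gzip-compress the pickle stream while writing it.
    with gzip.open(path, "wb", compresslevel=9) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Demo object: very repetitive, so it compresses far better than real model weights will.
demo = {"weights": [0.0] * 100_000}
raw_size = len(pickle.dumps(demo, protocol=pickle.HIGHEST_PROTOCOL))
path = os.path.join(tempfile.mkdtemp(), "model.pkl.gz")
save_compressed(demo, path)
compressed_size = os.path.getsize(path)
restored = load_compressed(path)
print(raw_size, "->", compressed_size, "bytes")
```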

serene scaffold
bold timber
#

How do I read this data as a CSV? I tried, but I got an error like this:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 10
serene scaffold
bold timber
#

I simply use pd.read_csv('data_csv')

can you guide me on how the code should be used?

serene scaffold
#

!docs pandas.read_csv

arctic wedgeBOT
#
pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, ...)
Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for [IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
serene scaffold
#

you also need comment='#' in the read_csv function, since there are comments at the top of the CSV file. you could also delete the comments instead.
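A minimal sketch of `comment='#'`, with made-up file contents read from a string: lines that start with `#` are skipped entirely. Note that pandas also ignores everything after a `#` in the middle of a line, so this only fits files where `#` never appears inside the data; otherwise `skiprows` is the safer option.

```python
import io

import pandas as pd

# Hypothetical file contents: comment lines at the top, then the real table.
raw = """# exported by some tool
# units: kg
name,weight,height
alice,61,170
bob,75,182
"""

df = pd.read_csv(io.StringIO(raw), comment="#")
print(df)
```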

bold timber
arctic wedgeBOT
#
pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, ...)
Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for [IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
desert oar
#

the docs describe all the options

serene scaffold
#

I already linked to that

desert oar
#

oh i'm sorry

#

i should have scrolled up

serene scaffold
#

it's okay. ||I don't always scroll up||

#

also read_csv has a metric fuckton of parameters

#

I scrolled for quite a while before I control+f'ed comment

desert oar
#

docs should always have anchors on individual parameter descriptions as well as the function itself

#

that way you can link directly to the parameter you are talking about

serene scaffold
#

I suspect the pandas team can't put that on their plate unless someone volunteers to implement/maintain it

#

I heard they were struggling, but I don't remember the specifics

desert oar
#

my impression is that they're just overloaded, too many lines of code and not enough people to work on it

#

like many/most things in the python ecosystem. billions of dollars in global revenue generated on the back of an understaffed underfunded team

#

fwiw i don't think this should be a "pandas" problem, it should be part of sphinx

serene scaffold
#

is it not?

misty flint
fast rivet
#

I got this question from my data science class and I'm wondering what the confidence interval is. I think I have to look at the standard error of 4, but idk... I'm also not sure whether I got the average correct

misty flint
tough frigate
#

Well, I'll just deploy to Heroku directly

serene scaffold
fast rivet
#

nvm my question. I already found the formula
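For reference, a sketch of the usual large-sample formula, CI = mean ± z × SE with z ≈ 1.96 for 95%, assuming a normal approximation is appropriate (for small samples, a t critical value with n - 1 degrees of freedom replaces z). The mean of 50 is made up; the standard error of 4 is the one from the question.

```python
from statistics import NormalDist


def normal_ci(mean: float, standard_error: float, confidence: float = 0.95):
    """Large-sample (normal approximation) confidence interval: mean +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical sample mean of 50 with the standard error of 4.
low, high = normal_ci(50.0, 4.0)
print(round(low, 2), round(high, 2))
```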

serene scaffold
bold timber
serene scaffold
#

whoever made that file shouldn't have put more than one table in the same file. they sabotaged you.

raw ivy
#

anybody know how I can make a pandas DF column that contains only links clickable? They currently just export to Excel as raw text
current code:

            if data[key].startswith("http"):
                df.at[index, key] = data[key] # data[key] is a URL
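One approach is to write the URLs as Excel `HYPERLINK` formulas; as far as I know, both openpyxl and XlsxWriter store strings that start with `=` as formulas. A sketch (the helper and demo data are hypothetical, and actually writing the file needs an engine such as openpyxl installed). It uses a boolean mask on the whole column, so there is no need to look up each cell's row/column position. Very long URLs may hit Excel's limits on formula string length.

```python
import pandas as pd


def as_excel_hyperlinks(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy where every http(s) value in `column` becomes an Excel HYPERLINK formula."""
    out = df.copy()
    is_url = out[column].astype(str).str.startswith("http")
    # Excel formulas escape a literal double quote by doubling it.
    out.loc[is_url, column] = out.loc[is_url, column].map(
        lambda url: '=HYPERLINK("{}")'.format(url.replace('"', '""'))
    )
    return out


listings = pd.DataFrame({"Price": ["$450,000"], "URL": ["https://remax.com"]})
linked = as_excel_hyperlinks(listings, "URL")
print(linked.loc[0, "URL"])
# linked.to_excel("listings.xlsx", index=False)  # needs an Excel writer such as openpyxl
```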
bold timber
serene scaffold
bold timber
# serene scaffold *destroy them*

this dataset is part of my skills test to join the company; I can't change anything about it, but at the same time I don't understand what this is hahaha

#

but thank you for answering, it's very useful

dusk tide
#

Why do we always use MSE instead of MAE as the cost function?
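MSE isn't always the choice, but it is the more common default. Some usual reasons: it is differentiable everywhere and its gradient shrinks as the error shrinks, least-squares linear regression has a closed-form solution, and minimizing it corresponds to maximum likelihood under Gaussian noise. MAE's main selling point is robustness to outliers. A NumPy sketch of the difference on made-up residuals:

```python
import numpy as np


def mse(residuals):
    return np.mean(residuals ** 2)


def mae(residuals):
    return np.mean(np.abs(residuals))


def mse_grad(residuals):
    # d(r^2)/dr = 2r: the gradient shrinks smoothly as the error shrinks.
    return 2 * residuals


def mae_grad(residuals):
    # d|r|/dr = sign(r): same magnitude for tiny and huge errors (undefined at exactly 0).
    return np.sign(residuals)


clean = np.array([1.0, -1.0, 0.5, -0.5])
with_outlier = np.append(clean, 10.0)

# One outlier inflates MSE far more than MAE, because errors are squared.
print("MSE ratio", mse(with_outlier) / mse(clean))  # 32.8
print("MAE ratio", mae(with_outlier) / mae(clean))  # about 3.47
print(mse_grad(np.array([0.1, 5.0])), mae_grad(np.array([0.1, 5.0])))
```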

bold timber
#

and the both will following result based of your score of model

serene scaffold
#

!docs pandas.merge

arctic wedgeBOT
#

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
bold timber
rocky mason
#

what do you want to remove? NaN?

bold timber
rocky mason
#

whats the last row index for ur data

bold timber
rocky mason
#
data = data.drop(labels=range(11, 134))
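A pandas sketch on a made-up frame comparing that approach with `dropna`. Keep in mind `range` excludes its end value, so `range(11, 134)` drops labels 11 to 133; `dropna` avoids hard-coding positions at all.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"a": [1.0, 2.0, np.nan, np.nan], "b": [3.0, np.nan, np.nan, np.nan]})

# drop(labels=range(...)) removes index LABELS; range's end is exclusive,
# so range(2, 4) removes labels 2 and 3.
by_label = data.drop(labels=range(2, 4))

# dropna does not depend on knowing where the empty rows are.
no_empty_rows = data.dropna(how="all")  # only rows where every value is NaN
no_nan_rows = data.dropna()             # any row containing at least one NaN

print(by_label.index.tolist(), no_empty_rows.index.tolist(), no_nan_rows.index.tolist())
```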

hoary rover
#

If there are more NA cells in your DataFrame, they could affect your results: if you then run any diagnostics on that DataFrame, SciPy and R may count those NA cells as part of the sample, which will create a skew.

misty flint
#

oof. i think a small part of me died inside suggesting that

hoary rover
#

Ah yes. Excel. The glorified CSV reader.

raw ivy
#

when i click into a link cell once and click off, it turns into a clickable url

#

but i'd rather not have to click every entry like that

hoary rover
#

If you want something that might perform better, look into directly importing into Google sheets.

misty flint
#

also cute pfp cattohug

hoary rover
raw ivy
#

real-estate links atm

hoary rover
#

Sure, are they for yourself or your workplace? My advice is to leave the links as they are and not touch them, as they aren't really necessary for anything other than getting more information, unless you're performing an in-depth text analysis

raw ivy
raw ivy
queen torrent
raw ivy
#

not quite i'm new with pandas struggling to get the actual location to replace in cell @queen torrent but trying to figure it out ^^

#

yes smtp

#

i.e when i get

links = df.loc[:, "Link"]
# 0     https://www.remax.ca/ab/
# 1     https://www.remax.ca/ab/

but haven't found how to get the x/y to replace it :p

tropic matrix
#

I'm making a machine learning model, and the input data has a lot of string data
One column (item_id) has over 2000 unique strings, so I'm wondering which would be faster to preprocess and train:

  1. onehot encoding that + all other categories with strings would make more than 2500 new columns to the dataset, which takes a long time to preprocess, and i'm not sure if it would affect how long it takes to train and/or the accuracy of the model
  2. make a new model for each item_id, which would make it much more accurate and take less preprocessing work, but would create over 2000 models that need to be trained (i'm not sure if i can utilize multiprocessing for this, so it would have to be single threaded), and save a new model file for each item_id

is there anything else I could do?
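On option 1: `OneHotEncoder` returns a SciPy sparse matrix, so 2,000+ mostly-zero columns cost far less memory than a dense DataFrame would, and many (not all) scikit-learn estimators accept sparse input directly. A sketch on made-up ids:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 10,000 rows drawn from 2,000 distinct item_id strings.
rng = np.random.default_rng(0)
item_ids = rng.choice([f"item_{i}" for i in range(2000)], size=(10_000, 1))

encoder = OneHotEncoder(handle_unknown="ignore")  # unseen ids at predict time -> all-zero row
onehot = encoder.fit_transform(item_ids)          # SciPy sparse matrix, not a dense DataFrame

dense_bytes = onehot.shape[0] * onehot.shape[1] * 8  # what a dense float64 copy would need
print(onehot.shape, "non-zeros:", onehot.nnz, "dense would be ~", dense_bytes // 2**20, "MiB")
```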

desert oar
misty flint
#

sounds more sustainable, yeah CL5_FeelsBongoMan

dark phoenix
#
last_date_data["Rank"] = last_date_data["Target"].rank(ascending = False, method = 'first').astype(int) - 1

This throws a SettingWithCopyWarning, but the reference docs also suggest creating a new rank column in a similar way
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html
Is there a way to make that rank column and not get this warning?
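The warning is usually about how `last_date_data` was created, not about `rank()`: if it is a filtered slice of another DataFrame, pandas can't tell whether you meant to modify the original. Taking an explicit `.copy()` when filtering is the common fix. A sketch with made-up data, assuming `last_date_data` came from a boolean filter:

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the larger frame that last_date_data was filtered from.
df = pd.DataFrame({"Date": ["d1", "d1", "d2", "d2", "d2"],
                   "Target": [0.3, 0.1, 0.5, 0.9, 0.2]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes the filtered frame independent, so adding a column is unambiguous.
    last_date_data = df[df["Date"] == df["Date"].max()].copy()
    last_date_data["Rank"] = (
        last_date_data["Target"].rank(ascending=False, method="first").astype(int) - 1
    )

print(last_date_data)
print([w.category.__name__ for w in caught])  # no SettingWithCopyWarning expected
```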

finite coral
#

Hi there, I'm looking to add some empty data to the end of my dataframe. The data looks like this for example:

index      var-1  var-2 ... var-n
2022-01-01  1       2   ...   3
2022-01-01  1       2   ...   3
2022-01-01  1       2   ...   3
...
2022-01-31  1       2   ...   3
[eof]

I'm looking to add say all of February, but set the values to None; is there a simple way of doing this?

I know this is quite a noob question but I'm struggling bad peeps 💙
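A pandas sketch, assuming a `DatetimeIndex` (the frame is a small made-up stand-in): build an all-NaN frame indexed by February's dates and concatenate it. pandas stores missing numbers as `NaN` rather than `None`. `pd.concat` also works when the existing index has repeated dates, as in the sample above, whereas `reindex` refuses duplicate labels.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real frame: a DatetimeIndex ending on 2022-01-31.
df = pd.DataFrame(
    {"var-1": [1.0, 1.0], "var-2": [2.0, 2.0]},
    index=pd.to_datetime(["2022-01-30", "2022-01-31"]),
)

# One empty (all-NaN) row per day of February, with the same columns.
feb = pd.date_range("2022-02-01", "2022-02-28", freq="D")
padding = pd.DataFrame(np.nan, index=feb, columns=df.columns)

extended = pd.concat([df, padding])
print(extended.tail(3))
```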

finite coral
lapis sequoia
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

karmic valley
#

https://paste.pythondiscord.com/luhoyasewu can you help me with some different code? i want to work out the average pixel whiteness of an image. i think the library my code uses doesn't support transparency, and i have transparency in my image
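The pasted code isn't visible here, so this is a separate NumPy sketch, assuming the image is loaded as RGBA (with Pillow, `np.asarray(Image.open(path).convert("RGBA"))`). Brightness is averaged with each pixel weighted by its alpha, so fully transparent pixels are ignored. The helper name, and using the plain mean of R, G and B as "whiteness", are assumptions.

```python
import numpy as np


def mean_whiteness(rgba) -> float:
    """Alpha-weighted mean brightness (0 = black, 255 = white) of an RGBA image array."""
    rgba = np.asarray(rgba, dtype=float)
    rgb, alpha = rgba[..., :3], rgba[..., 3] / 255.0
    brightness = rgb.mean(axis=-1)  # simple average of R, G and B per pixel
    if alpha.sum() == 0:
        return float("nan")  # fully transparent image: nothing to measure
    # Transparent pixels get weight 0, half-transparent ones count half.
    return float((brightness * alpha).sum() / alpha.sum())


# 1x2 demo: an opaque white pixel and a fully transparent black pixel.
demo = np.array([[[255, 255, 255, 255], [0, 0, 0, 0]]], dtype=np.uint8)
print(mean_whiteness(demo))  # 255.0
```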

misty flint
#

use cases

ocean swallow
wraith tapir
#

Anyone familiar with Jupyter Notebook? Mine just deleted a bunch of core system files. I already verified that they were in fact deleted and reset my PC. I just want to know what happened and make sure this doesn't happen in the future.

dusk tide
hoary rover
# tropic matrix I'm making a machine learning model, and the input data has a lot of string data...

So initially, yes, you are right: a one-hot vector is one of the correct preprocessing techniques to apply to large-scale categorical data. With that many unique elements in your model, you may benefit more from dimensionality-reduction techniques such as PCA, or from entity embeddings.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792409/
http://www.stat.columbia.edu/~fwood/Teaching/w4315/Fall2009/pca.pdf
Please let me know if this answers your question.

:)

hoary rover
wraith tapir
# hoary rover Everyone here is familiar with jupyter notebook and no that is not at all possib...

so i use jupyter through anaconda. earlier I was trying to relaunch my network after it crashed, and then my desktop got deleted and I couldn't open any apps. I tried to restart my pc, but that didn't work. My documents folder, downloads folder, file explorer, and chrome were deleted along with other important stuff. I am using lucidrains' version of stylegan. Does this help? I'm not sure what else I can give.

hoary rover
#

🤨

#

What exactly were you running on the jupyter notebook that would be heavy enough to tank your entire laptop?

#

Typically GANs need a lot of GPU power. What exactly were you running?

wraith tapir
#

im training a GAN.

#

So I've been playing with the batch size and gradient accumulation to get the model to fit. I made some changes and tried to relaunch it. It gave me a permission error first, so i restarted the kernel and tried again. Then it started to delete files

hoary rover
#

The best mini-batch size to start with is generally 8; then increase from there, up to but never higher than 64. Did you raise it higher than this?

#

I can't think of a single reason why it would start deleting your entire PC.

#

What loss function are you using?

wraith tapir
#

I'll try to get the notebook opened so i can tell you what I have configured

hoary rover
#

Well despite your bug, you should use your industry knowledge to research a few loss functions and apply them to see which would better fit the model you are trying to train. Other than that I'm not sure how I can help you. My advice is to leave a ticket directly on the github page. I'm sure the author will find that hilarious.

misty flint
pseudo wren
#

Could a data scientist hypothetically become a software engineer?

#

(Not saying I want to change paths, but I’m wondering if there’s room to dabble in both)

#

Could be interesting and help me build out other parts of the artificial intelligence models I’m working on

agile cobalt
#

but depending on what you do currently and what you're thinking about doing, your experience so far could not help all that much, or it could already set you 95% of the way there

misty flint
queen torrent
agile cobalt
true elk
#

4th day of headache with Tesseract (I've even dreamed of it lol), I'm giving up

#

Any good OCR alternative? I've identified Calamari OCR, Keras-OCR and EasyOCR as candidates, any suggestions/comments are welcome!

#

I need it for two projects: a French one that I need to train on the Open Sans font, and another one for 7-segment digits (to read temperature)

#

Kraken seems to be only for historical documents. Keras-OCR seems not to be optimized for non-English languages, so I don't think I'll consider it a viable option

#

Is Ocropy still a good option in 2022?

calm nacelle
#

guys, can somebody tell me how i can solve a quadratic equation in pycharm?
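PyCharm is only the editor; a quadratic can be solved in plain Python with the quadratic formula. A sketch using `cmath`, so a negative discriminant gives complex roots instead of an error (the function name is hypothetical):

```python
import cmath


def solve_quadratic(a: float, b: float, c: float):
    """Roots of a*x**2 + b*x + c = 0 via the quadratic formula (complex if needed)."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    root_disc = cmath.sqrt(b * b - 4 * a * c)  # cmath handles a negative discriminant
    return (-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)


print(solve_quadratic(1, -3, 2))  # x**2 - 3x + 2 = 0 -> roots 2 and 1
print(solve_quadratic(1, 0, 1))   # x**2 + 1 = 0 -> roots 1j and -1j
```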

bronze spire
serene scaffold
# next phoenix Found this interesting. Important tips https://medium.datadriveninvestor.com/eff...

I wouldn't read anything this person writes.

NumPy arrays are homogeneous and provide a fast and memory efficient alternative to Python lists.NumPy arrays vectorization technique, vectorize operations so they are performed on all elements of an object at once which allows the programmer to efficiently perform calculations over entire arrays.

but then immediately after that, they write this:

import numpy as np
def reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0/values[i]
    return output
values  = np.random.randint(1,15,size=6)
reciprocals(values)

They clearly don't know what they're talking about.

#

In particular, their reciprocals function is not vectorized, and has the same efficiency as using regular Python lists. the vectorized alternative is simply 1 / values

#

!e

import numpy as np
values = np.arange(1, 6)
print(1 / values)
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[1.         0.5        0.33333333 0.25       0.2       ]
raw mountain
#

Hi, i am new to data science and i have an assignment to build a model for flower recognition. Can anyone suggest different models i can use to improve the accuracy of my model?

serene scaffold
raw mountain
#

classify specific kind of flowers

serene scaffold
#

and this is images, right?

raw mountain
#

yes

serene scaffold
#

alright. what data set are you using?

raw mountain
#

one provided by kaggle

serene scaffold
#

link?

raw mountain
#

wait

serene scaffold
#

so every image has a flower, and there's five types of flowers to classify. I don't know much about image classification, but I think this is enough information for someone who does to help.

raw mountain
#

yes bro

#

and i have to train different models and then compare their accuracy

strange crow
#

Try Aleph Alpha Magma model it's insane

serene scaffold
#

is that model practical for someone to train on a personal computer?

raw mountain
raw mountain
serene scaffold
# raw mountain will try

if the aleph alpha magma model is "insane", it may be that it requires an especially powerful computer to train

#

that's why I asked Zettelkasten that question

raw mountain
#

ohh i see

#

i have to train a basic model and compare accuracy

#

i have built one using a CNN that was demonstrated in a YouTube video
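A scikit-learn sketch of a train-several-models-and-compare loop, using the small built-in digits image dataset as a stand-in for the flower images (the model list is just an example). Classical models on raw pixels are only a baseline; for photos, a CNN like the one you built, or a pretrained network, will likely do better.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 8x8 grayscale images, flattened to 64 features

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
}

# Same 5-fold cross-validation for every model, so the accuracies are comparable.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```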

loud flame
#

Why do a model loaded from a pickle and the freshly trained model have different accuracies?
I can send the code and the datasets if needed

polar depot
#

It depends on how you evaluate your accuracy. Dropout randomness is not disabled, batches are sampled randomly, etc.

strange crow
#

It's a couple hundred billion parameters

loud flame
#

do you want me to send the 1 jupyter nb and dataset?

polar depot
#

No. If you want more help, you can make a Colab notebook and share a link on the channel.

#

Ok, I looked at your notebook. Basically, you leaked test data.

loud flame
polar depot
#

The 20% "test" data you used in the final evaluation is also used in training.

loud flame
loud flame
random sapphire
loud flame
#

oh yeah

#

I don't get 97% anymore

random sapphire
#

Hey all. I created a video about cross validation techniques for ML. Interested in getting feedback. https://youtu.be/-8s9KuNo5SA

In this video Rob Mulla discusses the essential skill that every machine learning practitioner needs to know: cross validation. Without cross validation it's easy to overfit your model and overstate its predictive power. This video is a must watch for anyone trying to learn machine learning.

Timeline:

00:00 Intro
01:37 Setup
03:41 The Datas...

polar depot
#

Try to remove train_test_split in the loop.
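A sketch of the difference, assuming the notebook re-split the data inside the loop (a hypothetical reconstruction): split once with a fixed `random_state` and the train and test indices never overlap, while re-splitting with new seeds lets almost every "test" row end up in training at some point.

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(1000)  # stand-in for the rows of the dataset

# Correct: split ONCE, outside any loop, with a fixed random_state.
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)
print("overlap:", np.intersect1d(train_idx, test_idx).size)

# Leaky pattern: re-splitting inside a loop with a different random_state each time,
# then reporting accuracy on a "test" set whose rows were trained on in other iterations.
seen_in_training = set()
for seed in range(5):
    loop_train, _ = train_test_split(indices, test_size=0.2, random_state=seed)
    seen_in_training.update(loop_train.tolist())
leaked = len(set(test_idx.tolist()) & seen_in_training)
print("test rows the model already trained on:", leaked)
```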

loud flame
#

yup

#

no 97%

#

but there's a missing row of output

#

will that matter?

#

oh nvm, its the same

#

thanks a lot @polar depot @random sapphire

#

🙏

random sapphire
#

Any time. Setting random state in cross validation is always a good idea.

topaz leaf
#

Has anyone here dealt with this issue as it pertains to computer vision before
" Termination Reason: Namespace TCC, Code 0
This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an com.apple.security.device.camera key with a string value explaining to the user how the app uses this data."

#

im not sure how to change the Info.plist for python IDLE, mac is dumb

serene scaffold
topaz leaf
topaz leaf
unique flame
frigid elk
#

is there a faster way to get the max of a column in pyspark than below?

    # Define Working Month
    ReportMonth = dfProduct.select(F.max('monthkey').cast('int').alias('max_monthkey')).collect()[0]['max_monthkey']  # noqa

i'm using ReportMonth as a filter in dependent tables to ensure all the data is in sync (product would be the last table in the month to be updated). ... thinking about this, it seems ReportMonth should not need to be returned to the driver and may be better kept in JVM memory. ... should i even be using a scalar for my filter, or would it be more performant to store max_monthkey in a DataFrame and broadcast join and filter?

raw ivy
#

It's a pandas DataFrame:

Price    Sq Footage    Bedroom(s)    Bathroom(s)    URL
$450,000    1,522    3    3    https://remax.com

i.e. trying to replace all URLs like that (several hundred atm)

queen torrent
queen torrent
raw ivy
pseudo wren
#

what is a way that i can visually display the MSE of my linear regression

#

the MSE, MAE and Huber loss
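A matplotlib sketch that plots the three losses against the residual (`y_true - y_pred`) to show their shapes: squared error grows quadratically, absolute error linearly, and Huber is quadratic near zero and linear beyond `delta`. `delta=1` and the output path are arbitrary choices. For a fitted model, the single MSE/MAE/Huber numbers are the mean of each curve's values over your actual residuals, e.g. `np.mean((y - y_pred) ** 2)` for MSE.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render straight to a file; leave this out in a notebook
import matplotlib.pyplot as plt
import numpy as np


def huber(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    abs_r = np.abs(r)
    return np.where(abs_r <= delta, 0.5 * r ** 2, delta * (abs_r - 0.5 * delta))


residuals = np.linspace(-3, 3, 301)
curves = {
    "squared error (MSE)": residuals ** 2,
    "absolute error (MAE)": np.abs(residuals),
    "Huber, delta=1": huber(residuals),
}

fig, ax = plt.subplots(figsize=(6, 4))
for label, values in curves.items():
    ax.plot(residuals, values, label=label)
ax.set_xlabel("residual (y_true - y_pred)")
ax.set_ylabel("loss")
ax.legend()
out_path = os.path.join(tempfile.mkdtemp(), "loss_curves.png")
fig.savefig(out_path, dpi=150)
```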

bold timber
#

why can't I change the type of the column?

serene scaffold
#

the values in the columns have to translate exactly to ints, or it won't work. if you can extract only the numeric part and round it up or down, then you can convert it to an int.
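A pandas sketch of that idea, with a made-up `size` column: strip thousands separators, extract the numeric part with a regex, convert, round, then cast to `int`. Rows without any digits would become `NaN` and make `astype(int)` fail, so they need handling first.

```python
import pandas as pd

# Hypothetical column mixing numbers with text and thousands separators.
df = pd.DataFrame({"size": ["1,522 sqft", "980 sqft", "2,048.6 sqft"]})

numeric = pd.to_numeric(
    df["size"]
    .str.replace(",", "", regex=False)                  # "1,522 sqft" -> "1522 sqft"
    .str.extract(r"(\d+(?:\.\d+)?)", expand=False)      # keep only the number
)
df["size_int"] = numeric.round().astype(int)  # round first, then the int cast is exact
print(df)
```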

#

btw, I wouldn't access individual columns with dot notation in production code. it's often viewed as sloppy.

serene scaffold
grave frost
arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

serene scaffold
#

you'll have to put it on colab or something

#

@lapis sequoia

#

or copy and paste the specific parts of the code that are of interest.

lapis sequoia
#

can i dm u ?

serene scaffold
# lapis sequoia would u help ?

I don't know what the question is going to be or if it's something I know about. I can't commit to anything until I see the question.

lapis sequoia
#

hello, I'm trying to implement gradient descent from scratch
but I didn't get the best fit line. I guess I made some logical error; can anyone help me out? It won't take much time

#
```py
import numpy as np
import matplotlib.pyplot as plt


def gradient_descent(x, y, learning_rate=0.00001, iterations=1000):
    m = b = 0.0
    n = len(x)

    for _ in range(iterations):
        y_pred = m * x + b
        cost = np.mean((y - y_pred) ** 2)          # mean squared error
        dm = -(2 / n) * np.sum(x * (y - y_pred))   # partial derivative w.r.t. m
        db = -(2 / n) * np.sum(y - y_pred)         # partial derivative w.r.t. b
        m -= learning_rate * dm
        b -= learning_rate * db
        print("m {}, b {}, cost {}".format(m, b, cost))

    # return the final parameters instead of copying them into the plot by hand
    return m, b


x = np.array([160, 163, 165, 168, 170, 173, 175, 178, 180, 185, 188, 190, 193, 195, 198, 200])
y = np.array([50, 58, 59, 70, 67, 70, 78, 77, 79, 81, 87, 83, 92, 88, 99, 100])
m, b = gradient_descent(x, y)

# Plot the line using the m and b that gradient descent actually found
y1 = m * x + b
plt.plot(x, y1)
plt.scatter(x, y, marker='o', color='red')
plt.xlabel("Height(cms)")
plt.ylabel("Weight(kgs)")
plt.title("Gradient Descent")
plt.show()
```
#

"not the best fit line"

serene scaffold
lapis sequoia
#

bad bot 👎 👺

serene scaffold
arctic wedgeBOT
#

Don't get mad at me lemon_sentimental I'm doing exactly what the staff programmed me to do!

lapis sequoia
#

🤔 the slope should be more

#

ig

#

idk, doing it first time 😭

serene scaffold
#

@lapis sequoia try making the learning rate larger, like 0.001 instead of 0.00001
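A hedged caveat with a sketch: on the raw heights (about 160 to 200 cm) the gradient for `m` scales with x², and a back-of-the-envelope stability limit for the learning rate here is about 2 / (2 · mean(x²)) ≈ 3e-5. That means 0.001 on the unscaled data makes the cost blow up instead of shrink, while 0.00001 is stable but barely moves `b` in 1000 steps. Standardizing `x` first (a swapped-in technique, not part of the original code) lets a much larger rate such as 0.1 converge, and the result can be mapped back to the original units:

```py
import numpy as np


def gradient_descent(x, y, learning_rate=0.1, iterations=1000):
    m = b = 0.0
    n = len(x)
    for _ in range(iterations):
        error = y - (m * x + b)
        m -= learning_rate * (-(2 / n) * np.sum(x * error))
        b -= learning_rate * (-(2 / n) * np.sum(error))
    return m, b


x = np.array([160, 163, 165, 168, 170, 173, 175, 178, 180, 185, 188, 190, 193, 195, 198, 200], dtype=float)
y = np.array([50, 58, 59, 70, 67, 70, 78, 77, 79, 81, 87, 83, 92, 88, 99, 100], dtype=float)

# standardize x so both parameters see gradients of a similar size
mu, sigma = x.mean(), x.std()
m_s, b_s = gradient_descent((x - mu) / sigma, y)

# undo the scaling: y = m_s * (x - mu) / sigma + b_s
slope = m_s / sigma
intercept = b_s - m_s * mu / sigma
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The recovered slope and intercept should match a closed-form least-squares fit such as `np.polyfit(x, y, 1)`.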

lapis sequoia
arctic wedgeBOT
#

Don't insult my creators like that lemon_angrysad they love me lemon_sentimental

lapis sequoia
lapis sequoia
agile cobalt
#

it might not be returning at all

serene scaffold
#

it will implicitly return None if it doesn't hit any other return statements
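A tiny illustration of that behaviour. `find_first_negative` is a hypothetical helper, not from the code being discussed:

```py
def find_first_negative(values):
    # hypothetical helper: returns the first negative number it finds
    for v in values:
        if v < 0:
            return v
    # falling off the end without hitting a return gives None implicitly


print(find_first_negative([3, -1, 2]))  # -1
print(find_first_negative([1, 2, 3]))   # None
```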

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

agile cobalt
#

wList is empty?

serene scaffold
#

this is how you should share code, as much as possible ^

#

there's probably a non-iterative approach to what you're trying to do

#

I would need an example with all variables defined to diagnose.

lapis sequoia
abstract cedar
#

Anyone know of a good tool or method for comparing two ML-generated transcripts of the same audio (two different models were used) and outputting a more accurate transcript?

serene scaffold
deft spire
#

Does model.fit(...) handle epochs automatically, or would I have to add for loops for better accuracy (sklearn)?

bold timber
serene scaffold
bold timber
#

can u give me a clue?

serene scaffold
bold timber
serene scaffold
bold timber
#

so, why?

serene scaffold
#

Reason*

bold timber
serene scaffold
#

if you had "20.33cm", the problem would be the same

thin palm
#

Is anybody in the consulting field able to spare me 10-15 minutes for some questions I have about analyzing a business problem we're trying to solve? I'd like to pick your brain for a little bit to see how consultants think when faced with a problem. Please let me know and I look forward to hearing from you.

serene scaffold
misty flint
#

facts

bold timber
serene scaffold
#

look into slicing pandas string Series with the `.str` accessor
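A sketch of that hint combined with the earlier advice (extract the numeric part, round, then cast to int, using bracket access instead of dot notation). The column name and values are hypothetical; converting "20.33cm" straight to int fails, which is why the suffix is sliced off first:

```py
import pandas as pd

# hypothetical column of lengths stored as text with a "cm" suffix
df = pd.DataFrame({"length": ["20.33cm", "15cm", "7.8cm"]})

# slice off the last two characters ("cm") with the .str accessor,
# then parse as float, round, and only then cast to int
df["length_cm"] = df["length"].str[:-2].astype(float).round().astype(int)
print(df["length_cm"].tolist())  # [20, 15, 8]
```

If the suffix length varies, `df["length"].str.extract(r"(\d+(?:\.\d+)?)")[0]` pulls out just the number instead.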

grand scaffold
#

Yo

#

I'm trying to learn ml

#

And I'm trying to make a simple neural network

#

So I have a problem with my code

#

It seems to have trouble making predictions

#

Ok nvm

#

It works

#

But the prediction is wrong

#

Lmao

normal latch
#

what are you predicting? (I mean, what kind of problem is it?)

grand scaffold
#
```py
import numpy as np

# Scale inputs to roughly [0, 1]; raw values like 70 push the sigmoid into
# saturation, where its derivative is ~0 and the weights stop updating
feature_set = np.array([[70, 70, 60], [30, 40, 50], [60, 50, 70], [40, 50, 70]]) / 100
labels = np.array([[1, 0, 1, 0]]).reshape(4, 1)

np.random.seed(42)
weights = np.random.rand(3, 1)
bias = np.random.rand(1)
lr = 0.05


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_der(x):
    return sigmoid(x) * (1 - sigmoid(x))


for epoch in range(20000):
    # feedforward step 1: weighted sum
    XW = np.dot(feature_set, weights) + bias

    # feedforward step 2: activation
    z = sigmoid(XW)

    # backpropagation step 1
    error = z - labels
    if epoch % 2000 == 0:
        print(error.sum())

    # backpropagation step 2: take the derivative at the pre-activation XW,
    # not at z (z is already sigmoid(XW))
    dcost_dpred = error
    dpred_dz = sigmoid_der(XW)
    z_delta = dcost_dpred * dpred_dz

    weights -= lr * np.dot(feature_set.T, z_delta)
    bias -= lr * z_delta.sum()

# scale the new point the same way as the training data
single_point = np.array([40, 40, 50]) / 100
result = sigmoid(np.dot(single_point, weights) + bias)
print(result)
```
normal latch
#

is this from Andrew Ng's course?

grand scaffold
#

Who's that?

#

So basically I'm trying to get how this works

abstract cedar
grand scaffold
#

And I'm having trouble

grand scaffold
#

Ok nvm

bronze flume
#

Hello guys, I'm new to ML, I need help please

serene scaffold
#

@bronze flume it looks like somewhere along the way you ended up with the string '?' where a number was supposed to be. can you think of how that would have happened?

For the rest of the time that I help you (and in the future) I won't look at any screenshots of text that could have been copied and pasted as text. I won't look at screenshots of dataframes, either, so I'll give you instructions if you need to share those.
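If the CSV itself uses '?' for missing values, here is a sketch of two ways to get numbers back, assuming pandas. The column names and rows are made up, not taken from jm1.csv:

```py
import io

import pandas as pd

# stand-in for a CSV where missing values were written as "?"
csv_text = "feature_a,feature_b,defects\n1.1,1.4,False\n?,2.0,True\n3.5,?,False\n"

# Option 1: tell read_csv that "?" means missing, so the columns parse as float
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(df.dtypes)

# Option 2: coerce an already-loaded column; unparseable strings become NaN
raw = pd.read_csv(io.StringIO(csv_text))
raw["feature_a"] = pd.to_numeric(raw["feature_a"], errors="coerce")

# then decide what to do with the NaNs, e.g. drop those rows
clean = df.dropna()
```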

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

^ This is how to show code on Discord

bronze flume
#

Oh sorry, thanks for ur help, i will check it out

serene scaffold
#

@bronze flume at some point along the way, did you replace missing data with '?'?

bronze flume
serene scaffold
#

@bronze flume do you ever do model.fit somewhere that isn't shown?

bronze flume
#

i did it, i get the same ValueError

serene scaffold
#

```py
code goes here on a new line
```

bronze flume
#
```py
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import ensemble
from sklearn.metrics import mean_absolute_error


data = pd.read_csv('jm1.csv')
data.head()

X = data.drop(columns=['defects'])
y = data['defects']

model.fit(X,y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```
serene scaffold
bronze flume
#

Okay, I understand

#
```py
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_train,y_train)
```
#

it should be like this
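A minimal end-to-end sketch of that order, assuming scikit-learn. The synthetic DataFrame stands in for jm1.csv, and `feature_a`/`feature_b` are hypothetical column names. The model also has to be created before `.fit()` is called:

```py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for jm1.csv: two numeric features and a boolean target
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
})
data["defects"] = data["feature_a"] + data["feature_b"] > 0

X = data.drop(columns=["defects"])
y = data["defects"]

# split first, so the test rows are never seen during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)  # create the model before fitting it
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"test accuracy: {accuracy:.2f}")
```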

strange zealot
#

can someone here read a partial autocorrelation function (PACF) plot?

rose agate
#

Hi, I've been tasked with using a transformer to do time-series forecasting. I've started using the python darts library, which has a transformer implementation. The dataset I'm working with has ~1000 times series, for a whole year recorded hourly. As expected, training is impossibly slow, so for the time being I've shortened the number of time series and length of the series.

  1. Are there any other transformer implementations, with optimizations like those in this paper [https://arxiv.org/abs/1907.00235], that would be better suited?
  2. Are there any other models I should consider using instead of transformers? I've been asked to look at transformers, but I'm sure that if there's anything better, it would be fine.
  3. My time series (energy readings) have hourly, daily, and monthly seasonalities, so it's obviously better to train using a whole year's worth of data, but that's expensive. Is it possible to train on only a few weeks of data and then use some other transform to account for the longer yearly seasonality?
loud flame
#

Can anyone help me improve my Random Forest Classifier model, which I made to predict which people would make a deposit at a bank based on some info? Adding GridSearchCV could work too.
If so, I'll send the notebook and datasets (both cleaned and uncleaned); I don't really know what to do with the test dataset either.

tropic matrix
#

In pandas, I have a dataframe of values but I'm trying to find a way to get a new dataframe from the original where each column has at least one of every unique value from the original

for example, if I have:
```
0 | 1 | 2
1 | 2 | 2
2 | 1 | 4
3 | 3 | 1
4 | 3 | 1
5 | 0 | 1
```

I want to get:
```
0 | 1 | 2
1 | 2 | 2
2 | 1 | 4
3 | 3 | 1
4 | 0 | 1
```

but I want to apply that same "algorithm" over a lot more rows (26), is there an efficient way to do this in pandas?

hollow shore
#

Hey, I'm having problems understanding how PySpark SQL works in real-world projects. Can anyone recommend a project that uses it, so I can look at the code and understand it?

rose agate
tropic matrix
tough frigate
tropic matrix
tough frigate
#

Documentation, and search drop duplicates lol I forgot the syntax

tropic matrix
#

thank you!

#

(it's just df.drop_duplicates())
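Applied to the example from the question, it looks like the following. The column names `a` and `b` are made up. `drop_duplicates` keeps the first occurrence of each distinct row by default, and `reset_index(drop=True)` renumbers the result 0..n-1 to match the desired output:

```py
import pandas as pd

# the example frame from the question (column names are made up)
df = pd.DataFrame({"a": [1, 2, 1, 3, 3, 0], "b": [2, 2, 4, 1, 1, 1]})

# keep the first occurrence of each distinct row, then renumber the index
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

`drop_duplicates(subset=[...])` restricts the comparison to chosen columns if only some of them should define a duplicate.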

tough frigate
#

Hahe yeah

jagged pewter
bold timber
#

What is the difference between Total Sum of Squares and Residual Sum of Squares?

odd meteor
odd meteor
#

TSS = RSS + ESS (this decomposition holds for an OLS fit that includes an intercept)

RSS (Residual Sum of Squares) = the sum of the squared residuals (errors).

While OLS isn't really considered an optimizer, it is a non-iterative method we use to fit a model such that the RSS between observed and predicted values is minimised.

So RSS = TSS - ESS

With this understanding, your loss function MSE, when further broken down, is given as

MSE = 1/N * (RSS)

You can look up the formulas in the attached image above to get more clarity
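A quick numeric check of the identity and of MSE = RSS / N, assuming made-up data and `np.polyfit` for the OLS line. The decomposition only holds exactly for least-squares fits that include an intercept:

```py
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # made-up data

slope, intercept = np.polyfit(x, y, 1)     # OLS line with an intercept
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)       # total: spread of y around its mean
ess = np.sum((y_hat - y.mean()) ** 2)   # explained: spread of predictions around the mean
rss = np.sum((y - y_hat) ** 2)          # residual: what the line fails to explain
mse = rss / len(y)

print(np.isclose(tss, ess + rss))
```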

teal mortar
#

hi guys, does anyone have experience running Jupyter Notebook/Lab in pytorch/pytorch Docker containers? How do you connect to them correctly? I bind my machine's port to the container port with -p 7000:7000 when I run the Docker image, so the ports and the IP are set, but for some reason I can't connect to the notebook when I start it from a bash terminal inside the container. If someone has experience doing this, please help. Thank you

deft spire
bold timber
#

Thank you @jagged pewter @odd meteor

jagged pewter
polar depot
#

This notebook has included save & restore code:

floral valley
#

has anyone got any experience using an ARIMA library, and more specifically identifying the p, d, q values?

serene scaffold
crisp flax
#

I asked in R discord and got no reply, so I'm wondering if I will get anything here:

Does anyone know how I could get R's built-in arima function to agree better with Python's statsmodels ARIMA? I am getting very different results. I have to say I don't have much expertise in ARIMA beyond the definitions, so understanding either R's or Python's implementation is a bit beyond me

arctic wedgeBOT
#

Hey @loud flame!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

loud flame
#

Hi, can anyone help me resolve this problem? It doesn't allow my code to be submitted since I don't have 22605 rows (a requirement), but my code is alright

serene scaffold
#

@loud flame are you showing the code in order? The order here doesn't look right. Order matters

#

Try pasting all the code in order as text

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

loud flame
#

but i've done it in different cells

serene scaffold
#

You can copy and paste each one individually.

loud flame
#

aight

serene scaffold
#

I'll be back in a few minutes to look at it

loud flame
#

if you can, please suggest ways in which I can improve my model's performance

loud flame
#

for discord's limit

serene scaffold
arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

loud flame
#

can I just link the kaggle notebook

#

also the error of 22605 was taken care of

serene scaffold
#

you can try, but McAfee is wrong about the paste bin.

loud flame
#

I had previously removed rows of data

#

in the name of "cleaning" it

serene scaffold
#

I see. do you still have a specific question, then?

loud flame
#

after removing the cells where I removed the rows

#

my accuracy went down to 89% from 90%

serene scaffold
serene scaffold
#

I'm not interested in doing that, though.

loud flame
#

its just 3 files, the code, train, and test data

loud flame
grand scaffold
#

Should I increase the possibilities of the training data so the machine has better accuracy?

#

So if I had data that's on a scale of 1 to 10 and each set of data results in a specific value

#

Should I use bigger numbers?

#

So instead I use a scale of 1 to 100 and have more accurate data?

#

So instead of inputting 5 I input 53 for example?

#

As long as the data is accurate?

hollow flare
#

How to start machine learning?

#

From where should I start learning

#

?

grand scaffold
#

This is the easiest beginner article I have read so far

prime hearth
#

@hollow flare in addition to the great article above, try watching Krish Naik's YouTube channel, especially his data science 2022 video. That will help you understand what's needed to get a job in ML or DS

rocky mason
#

I think you should learn supervised learning first

hollow flare
bronze spire
#

From where do I start learning about Data Science?

loud flame
#

getting infinite acc

#

if helping shall send code

misty flint
#

see where the differences are in implementation

#

they probably make different assumptions

crisp flax
#

I'd prefer not looking in the source code

#

But I suppose it is a last resort

misty flint
#

idk if anybody here is familiar with statsmodels' ARIMA so idk what to tell you

#

but maybe

bronze spire
#

Guys, what's a good free source to learn Data Science from?

rotund zenith
# bronze spire Guys, what's a good free source to learn Data Science from?

Udemy has some pretty helpful and informative courses. They're not free, unfortunately, but they have regular sales for around $10 - $20. I've found them far better than some of the college courses I've taken, and certainly better than any free courses out there, since they go really in depth

#

Anyone ever worked with Delta Lake and Spark?

#

I followed the quick start guide on the Delta Lake homepage, but I can't get past the first step because of this error

#
```
/opt/spark/python/pyspark/shell.py:42: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 38, in <module>
    spark = SparkSession._create_shell_session()  # type: ignore
  File "/opt/spark/python/pyspark/sql/session.py", line 553, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "/opt/spark/python/pyspark/sql/session.py", line 233, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "/home/faizififita/.local/lib/python3.7/site-packages/py4j/java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/faizififita/.local/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o30.sessionState.
: java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.TimeTravel can not implement org.apache.spark.sql.catalyst.plans.logical.LeafNode, because it is not an interface (org.apache.spark.sql.catalyst.plans.logical.LeafNode is in unnamed module of loader 'app')
```
#
```
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
        at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
        at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
        at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        at io.delta.sql.parser.DeltaSqlParser.<init>(DeltaSqlParser.scala:71)
        at io.delta.sql.DeltaSparkSessionExtension.$anonfun$apply$1(DeltaSparkSessionExtension.scala:78)
        at org.apache.spark.sql.SparkSessionExtensions.$anonfun$buildParser$1(SparkSessionExtensions.scala:239)
        at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
        at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:49)
        at org.apache.spark.sql.SparkSessionExtensions.buildParser(SparkSessionExtensions.scala:238)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder.sqlParser$lzycompute(BaseSessionStateBuilder.scala:124)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder.sqlParser(BaseSessionStateBuilder.scala:123)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:341)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1145)
```
#
```
        at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:159)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:155)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:152)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Thread.java:829)
```
lapis sequoia
#

hey

#

could someone explain me data science ?

misty flint
#

it looks like there might be an issue when translating Python code to Java..? idk. I suck at reading tracebacks tbh

lapis sequoia
#

k

rotund zenith
#

My guess is the error stems from this /opt/spark/python/pyspark/shell.py:42: UserWarning: Failed to initialize Spark session.

#

Is that familiar to anyone?

lapis sequoia
#

kinda

misty flint
#

yes but if you follow it, look at this line:

> : java.lang.IncompatibleClassChangeError: class org.apache.spark.sql.catalyst.TimeTravel can not implement org.apache.spark.sql.catalyst.plans.logical.LeafNode, because it is not an interface (org.apache.spark.sql.catalyst.plans.logical.LeafNode is in unnamed module of loader 'app')
lapis sequoia
#

hey

misty flint
#

so im curious if the reason the spark session is not initializing may be due to the Py4J library or not

lapis sequoia
#

Could someone suggest a project for me?

rotund zenith
lapis sequoia
#

please suggest

rotund zenith
lapis sequoia
#

Python of course

loud flame
lapis sequoia
#

just

#

suggest

rotund zenith
#

Practice clustering some data using different clustering algos
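A possible starting point for that exercise, assuming scikit-learn. It uses synthetic blobs with known groups so the algorithms can be compared against the truth with the adjusted Rand index. The parameter values (`eps=0.8`, three clusters) are arbitrary choices for this toy data:

```py
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# synthetic data with 3 known groups, so results can be compared to the truth
X, true_labels = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),  # no cluster count; eps needs tuning
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # 1.0 = perfect agreement with the true groups, around 0 = random labelling
    scores[name] = adjusted_rand_score(true_labels, labels)
    print(f"{name}: ARI = {scores[name]:.2f}")
```

Changing `cluster_std` or using non-spherical data (for example `sklearn.datasets.make_moons`) shows where each algorithm breaks down.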

lapis sequoia
#

hmm

#

k

loud flame
#

predict the various theories that we'd be able to prove and apply in the future, based on the theories and ideas we thought of years ago and have it well and running in the present day

#

such as AI, electric cars, better quality cameras, better infrastructure in cities (all of these are too basic)

#

@lapis sequoia

lapis sequoia
#

k

loud flame
#

+Landing Boosters

loud flame
#

the iPhone too

#

iPhone, Mac

#

everything

slate hollow
#

I've done some research, but I can't seem to find whether VS (not VS Code) 2022 is compatible with CUDA 11.2.2. So, is it?
I'm just trying to get TensorFlow set up, and from what I've seen the most recent version of TensorFlow only supports 11.2

topaz leaf
#

hey people, is anyone willing to answer some questions I have about my attempts at a computer vision project?

tropic matrix
#

Is there a way to use multiprocessing when training DNN models with Keras? I have to train nearly 2000 models, and each takes around 30 seconds to train. I was wondering if there's a way to train multiple models at a time to speed this up (assume that RAM isn't a concern)

serene scaffold
tropic matrix
serene scaffold
serene scaffold
# tropic matrix I have not

start by making a function that takes a float between 0 and 1 as a parameter (i.e. the fraction of the data that you want to use) and does everything else that you need inside the function

tropic matrix
#

ok, and i'm assuming that function is what's duplicated in multiple processes?

tropic matrix
#

should i worry or be concerned about anything else related to training? does tensorflow keras play nicely with multiprocessing?

serene scaffold
#

I don't know. What would happen under the hood is that anything you use in that function would get pickled, and then unpickled in a new process

#

(since CPython's GIL prevents threads from running Python code in parallel)
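A minimal sketch of that suggestion, assuming `multiprocessing.Pool`. `train_on_fraction` is a hypothetical name, and the least-squares fit inside it is a stand-in for building and fitting a Keras model so the sketch runs without TensorFlow. The worker must be defined at module top level because it is pickled by reference, and only small picklable results should be returned. With TensorFlow, people commonly suggest importing it and building the model inside the worker rather than in the parent process; treat that as something to verify on your setup:

```py
from multiprocessing import Pool

import numpy as np


def train_on_fraction(fraction):
    """Hypothetical stand-in for 'build a model, train it on this share of the
    data, return a score'. A least-squares fit keeps the sketch runnable."""
    rng = np.random.default_rng(0)  # each process builds its own data and model
    X = rng.normal(size=(1000, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)
    n = int(len(X) * fraction)
    coef, *_ = np.linalg.lstsq(X[:n], y[:n], rcond=None)
    return fraction, float(np.mean((X @ coef - y) ** 2))


if __name__ == "__main__":
    fractions = [0.1, 0.25, 0.5, 1.0]
    with Pool(processes=4) as pool:
        results = pool.map(train_on_fraction, fractions)
    for fraction, mse in results:
        print(f"{fraction:.2f}: MSE {mse:.4f}")
```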