#data-science-and-ml
this is not as helpful as the texit bot
.latex $\begin{bmatrix} u_1 & u_2 & \dots & u_N \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} = \sum{n=1}^N u_n v_n$
this is terrible, it doesn't update on edit. i'm sorry about the spam
.latex $\begin{bmatrix} u_1 & u_2 & \dots & u_N \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_N \end{bmatrix} = \sum_{n=1}^N u_n v_n$
@scenic tulip finally, here we go. i just did this several times. after you've seen it enough times, you can intuit the operations in your head for simple transformations
@wooden sail yeah that's sweet. I've never heard of latex but it allows you to post calculations in an image that is somehow formatted?
it's for formatting pdf documents in general, but it's famous for allowing you to nicely typeset equations and diagrams
It's good for formal stuff like research papers
wow i've never heard of this but yeah....wow that's awesome stuff
a lot of people think of latex as the "math formatting language", but it's really for general-purpose typesetting. think "microsoft Word but code"
like, you can even set variables and stuff.
it's like word, except you actually have an idea what's going on with your document
except for when you get unexpected behavior
I was using a macro that unexpectedly added exclamation points, and that isn't even what the macro is specified to do.
I actually learned just today that you're supposed to, in align, put & right before the alignment point, not after
I was aligning a ton of shit by spaces.
I thought the & was the alignment point
all I know is that if you do
x =& 5\\
y =& 10
the spaces after = get smaller than they should be
&= is the right way
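For reference, a minimal sketch of the placement described above, assuming the `amsmath` package is loaded. With `&` placed right before the relation, `align` keeps the normal spacing on both sides of `=`:

```latex
\begin{align}
  x &= 5 \\
  y &= 10
\end{align}
```

Writing `x =& 5` instead leaves the `=` at the end of its column with nothing after it, which is why the gap to the right of `=` shrinks.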

does anyone know why my tensorflow isn't detecting any gpus on my pc? I've got an rtx 3080
how did you install tensorflow
and how do you know it's not detecting your gpu
@plush jungle ?
also, when I say "how do you know it's not detecting your gpu", I'm not asking "are you sure that it's not ...".

Is there any AI developer community of python
hey everybody, is there an app or something that can fix your code while you're programming? I'm working on my project. I mean, does something like this already exist?
Sourcery extension to vscode for example can refactor the code for you
heyy guys //
i am working on a dataset .. but having some problem . please help me out
these are my datasets
https://colab.research.google.com/drive/1zTxsOwwDOkNYptUdQKYjKaJtA6ltelbJ?usp=sharing and this the link of my Notebook
and where is your problem
if we click the link, we have to request access. but it's easier if you create a minimal example of your problem and an explanation of what you want to have happen instead.
You are provided with the leads data of last year containing both direct and indirect leads. Each lead provides information about their activity on the platform, signup information and campaign information. Based on his past activity on the platform, you need to build the predictive model to classify if the user would buy the product in the next 3 months or not. ....... this is what i want to do
here only .. above msg
are you asking us to do it for you? what part are you having trouble with?
Hey @edgy agate!
It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Hello! I am thinking of an idea for research on the topic of parameter optimisation viewed as a language problem. Here is what I mean by that - There are already multiple big pre-trained language models such as CodeBERT which can generate good contextual embeddings for source code. So if they're used as a baseline and built upon, we can create a supervised learning pipeline that predicts code parameters which satisfy desired outcomes. For example if we have the function def f(x): return 2 + 2 * x - x*x we can ask the model to maximise it and to find that the desired x is 1. At the beginning we expect to be able to solve such simple optimisation problems, but with time we may derive methods which are able to solve for more parameters and complicated functions and probably even have such a model to optimise parameters for other ML models in the future. If achieved this approach may replace or work together with traditional hyper-parameter tuning solutions like Bayesian optimisation (which are computationally expensive since they require testing the function itself with multiple parameters).
One approach will be to take the problem purely as a language task and replace the desired parameter(s) with a masked token and then train a model (fine-tune pre-trained BERT-like model) to predict such tokens given desired outcomes.
Another approach will be to take advantage of the pre-trained NL-PL models to generate embeddings for the source code, but then use these representations in a separate regression model. In this case it might be a good idea to build some meta-learning environment to better generalise to different functions and then take a few-shot approach by first providing a few examples of input-result pairs and then asking for predicted parameters given a desired outcome.
What do you think about the idea as a whole and the proposed approaches? Do you think they're feasible and if not - why? Do you think such a study will be pointless and if so - do you have better ideas in this direction?
@fiery adder is there a tldr for this?
I think this whole setup seems kinda vague, trying to make a language model predict the outcome of some given formula seems like an inefficient and likely bad way to optimize parameters
How would the language model even know what good parameters are?
generating data to train this seems ghastly. either you need to check out basically all the machine learning everyone has ever done and the learned parameters, or you'd need to somehow make it self supervised and each example will involve solving a whole machine learning problem. or did you have some idea on how to circumvent this?
Hi, I'm doing KMeans clustering on article texts under the same category to get subcategories.
I'm only getting one major cluster, can someone tell me what I'm doing wrong?
I tried with lemmatizing and without,
with original text and with cleanup.
with max features at 8k and without setting max features.
https://github.com/MAmr21/EGYFWD/blob/main/KO/Article Classification/articles classifier.ipynb
can someone explain how the code for the gradient descent is theta = theta - alpha * (1/m) * (X' * ((X * theta)-y));
this is the formula
you want an explanation of the math or how to code the math?
i think you need to escape some asterisks in what you wrote with a \
but at any rate, it looks like the expression you wrote is in terms of matrices and vectors
matrix-vector multiplication is itself a sum of products, just like the image you showed
.latex \boldsymbol{Ax} = \begin{bmatrix} \boldsymbol{A_{1,:} x} \\ \boldsymbol{A_{2,:} x} \\ \vdots \\ \boldsymbol{A_{m,:} x} \end{bmatrix}
oof
here
you can think of a vector as an n x p matrix with p = 1
then you see the multiplication is indeed a sum following that definition
hmm makes sense
but for the cost function i had to use the sum function
J = (1/(2 * m)) * sum(((X * theta)-y).^2)
oh i get it
Thanks for the help
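To make the point above concrete, here is a rough NumPy sketch of the same update and cost (the chat's expressions are Octave/MATLAB style). The toy data, the function names `cost` and `gradient_step`, and the step size are assumptions made up for illustration. It also checks that `X.T @ residual` really is the per-parameter sum of products.

```python
import numpy as np

def cost(X, y, theta):
    """J = 1/(2m) * sum((X @ theta - y)**2), the same sum as in the chat."""
    m = len(y)
    residual = X @ theta - y
    return float(np.sum(residual ** 2)) / (2 * m)

def gradient_step(X, y, theta, alpha):
    """theta - alpha * (1/m) * X' * (X*theta - y), written with NumPy operators."""
    m = len(y)
    return theta - alpha * (1 / m) * (X.T @ (X @ theta - y))

# Made-up data on the line y = 1 + 2x; the column of ones carries the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

# X.T @ residual is one sum of products per parameter, written out explicitly here.
residual = X @ np.zeros(2) - y
explicit = np.array([sum(X[i, j] * residual[i] for i in range(len(y)))
                     for j in range(X.shape[1])])
same_as_matrix_form = np.allclose(explicit, X.T @ residual)

# Repeating the update drives theta towards the intercept and slope of the line.
theta = np.zeros(2)
for _ in range(5000):
    theta = gradient_step(X, y, theta, alpha=0.1)
final_cost = cost(X, y, theta)
```

The matrix product hides the summation in the update, while the cost needs an explicit `sum` because squaring is element-wise.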
in row/col 3/3 i want to plot 2 x axis but i dont know how i can achieve that in the grid, im able to plot a second y-axis but x doesnt work...
@wooden sail you on rn?
Maybe someone else knows this. So I'm writing out arrays of results, containing 20 elements to a file. When it writes the output comes out as this :
```
[[ 7 8 8 ... 2 -2 1]
 [ -7 -13 -14 ... -2 5 3]
 ...
 [ -1 -3 -2 ... -8 -6 2]
 [ 2 4 15 ... 8 3 0]
 [ -2 -2 -9 ... -1 -4 -2]]
```
How can I view all of the in between data
this shouldn't really matter, you never want to look at the entirety of a large matrix with your eyes (for the most part)
you could write the contents as a csv if you like
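A small sketch of both options, assuming the results are held in a NumPy array (the `...` in the output suggests so, but the log doesn't show the code). The array here is made up. NumPy shortens the printed form once an array has more than 1000 elements; raising the threshold or writing CSV keeps every value.

```python
import io
import numpy as np

# Made-up stand-in for the results: 60 rows of 20 integers.
results = np.arange(1200).reshape(60, 20) - 600

# Default string form of a large array is summarised with "...".
short_text = str(results)

# Raising the threshold to the array size prints every element.
full_text = np.array2string(results, threshold=results.size, max_line_width=200)

# Writing CSV instead; a StringIO buffer stands in for a real file path here.
buffer = io.StringIO()
np.savetxt(buffer, results, fmt="%d", delimiter=",")
csv_text = buffer.getvalue()
```

With a real file, `np.savetxt("results.csv", results, fmt="%d", delimiter=",")` does the same thing (the file name is only an example).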
Hi, I am going through DeepMind's RL slides by David Silver, and I have a question on the moving mean and how it forgets past data.
in chapter 4, for model-free RL, there is a topic on the Monte Carlo method that uses an incremental mean with a running average
V(St) ← V(St) + α (Gt − V(St))
here, α is supposed to be the one thing that represents a moving mean/running average. what I don't understand is how would the formula forget the past values of V(St) when we keep using it iteratively.
if you do a couple of iterations, it might become more clear. let's replace this with a simpler nomenclature first. say, y <- y + a(x - y)
we can rearrange that into (1 - a)y + ax. and you probably have a condition like 0< a < 1
at the next iteration, instead of x, we have some other value. let's call it z.
then we get (1-a) [(1-a) y + ax] + az
we expand into (1-a)^2 y + a(1-a)x + az
as the sequence continues, y will get multiplied by increasingly high powers of (1-a), and so will the previous update values Gt (but with a lower exponent than y)
since (1-a) is also between 0 and 1, the more you repeat this, the smaller the contribution of the original y, and also of the old updates
i wrote it that way so that you can kinda see that the algorithm produces a weighted sum at every iteration. the higher the iteration number, the smaller the weights of the older quantities
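The weighted-sum view can be checked numerically. This is only a sketch with made-up numbers; `running_update` and `unrolled_weights` are hypothetical helper names, not anything from the slides.

```python
def running_update(y, samples, a):
    """Apply y <- y + a * (x - y) once per sample, oldest sample first."""
    for x in samples:
        y = y + a * (x - y)
    return y

def unrolled_weights(n_samples, a):
    """Weight of the starting value, then the weight of each sample (oldest first)."""
    initial_weight = (1 - a) ** n_samples
    sample_weights = [a * (1 - a) ** (n_samples - 1 - k) for k in range(n_samples)]
    return initial_weight, sample_weights

a = 0.5
y0 = 10.0
samples = [2.0, 4.0, 8.0]

iterative = running_update(y0, samples, a)
w0, ws = unrolled_weights(len(samples), a)
weighted = w0 * y0 + sum(w * x for w, x in zip(ws, samples))
```

Both routes give the same number, the weights add up to 1, and the oldest quantities carry the smallest weights, which is the "forgetting".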
thanks for taking the time to answer Edd, just give me a minute to process this
for this part, is the condition 0 < a < 1 often the case?
is it because of the a(x-y)
it should be the case, yes
ohhh, i think im getting it
wait, is a less than 1 because of the idea of iterative mean?
like the formula before α was 1/N(t), but for non-moving average
i would have to see how your book defines this stuff, i would call it either "momentum" from the ML perspective or "convex combination" from the linalg standpoint
oh, im using the RL slides from deepmind, the 2015 one, should I share the link? i think im getting the idea tho
but the idea, if you look at V and G as vectors, is that G - V is a vector pointing from V to G, and V + alpha (G - V) starts at V and moves along it. this is the parametric equation of a line joining two points in N dimensional space. if alpha is equal to 0, you stay exactly at V
if alpha becomes 1, you move all the way to G
for values in between, you land on the line segment connecting them
setting alpha = 0 means "no change", while alpha = 1 means "forget the previous stuff entirely and just move to G"
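In symbols, as a short sketch: the first line is the convex-combination form of the update discussed above, and the second is the standard incremental mean, which is the same update with a step of 1/N instead of a constant alpha (here x_N is the N-th sample and mu_N the mean of the first N samples).

```latex
\begin{align}
  V + \alpha\,(G - V) &= (1 - \alpha)\,V + \alpha\,G, \qquad 0 \le \alpha \le 1 \\
  \mu_N &= \mu_{N-1} + \tfrac{1}{N}\,\bigl(x_N - \mu_{N-1}\bigr)
\end{align}
```

With the 1/N step every sample ends up weighted equally, so nothing is forgotten; a constant alpha below 1 is what makes older samples fade.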
thats a lot of linear algebra words
but im getting the idea, Ill have to dig deeper into it
thank you @wooden sail , I thought I would have to wait a while to get help
glad it helps. i'm not familiar with those slides, so if you could share the link, that'd be cool. it's not like i'm a mathematician or anything either, but i've learned most of the stuff this way thanks to uni
https://www.deepmind.com/learning-resources/introduction-to-reinforcement-learning-with-david-silver
its from this link, the chapter 4
for linear algebra, did you use the "mathematics for machine learning" ?
I have a copy but its just sitting there cause I thought I had just enough linear algebra
i've checked some of linear algebra done right by axler and linear algebra done wrong by treil, and also gilbert strang's linear algebra. just straight up math books
and then several papers and books on optimization, signal processing, etc
lots of really great tips, thank you kindly
i learned about machine learning as an application of maths, really very late into the game. i don't know most of the pop nomenclature
you have a very strong foundation tho, coming from math
i'm comfortable with mangling indices and wiping my tears while staring at a piece of paper, yes

my friend who also has a background in math is my go-to when i dont understand a new algorithm

hes also very good at solving problems irl too
i tried getting into linear algebra with 3b1b,
i guess that is way below the barrier
actually
this is my hot take, but 3b1b linalg is not good to learn from
it's GREAT to review concepts, but NOT to learn
what about khan academy?
it's presented from the standpoint that you already learned the concepts (somewhat) or have at least heard about them
i had a hard time learning from there
khan academy is usually solid for practicing concrete problems. grinding through a few can build intuition
really? I had a really hard time there, felt like the talk about determinants was different from 3b1b
i thought of trying gilbert strang but the 3b1b 16 video playlist looked more enticing.
did they hit you with a laplace expansion
they hit me with a basic 3 equation thingy
i think gilbert strang's book is pretty good. it won't go into more abstract stuff though
hmmm, im motivated now, ill try the video playlist first tho
oh, if you dont mind, I have another question on RL
about temporal difference
mhm?
V(St) ← V(St) + α (Rt+1 + γV(St+1) − V(St))
do you happen to know this formula?
for temporal difference, I have a question on it thats bothering me
looks familiar
its supposed to be used for model free RL, when we can't step into the state of next time step St+1
but the formula has the recursive V(St+1) in it,
wait, I think I am making the question more complicated
so, to restart, if we have the model, we could recursively call V(St+1) from V(St) which in turn calls V(St+2) from V(St+1)
thats what I got for a model based RL
but temporal difference is an algorithm thats used for model free
and it has the V(St+1) being used as a part of the formula to find V(St)
im confused on how a model free algorithm can do this
i'm not really sure, the nomenclature in the slides is all weird to me
im looking for the "pain" reaction lol
I guess that problem is for the tomorrow me
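One way to look at the question above, as a hedged sketch rather than the slides' own code: in TD(0) the agent does not compute S(t+1) from a model. It takes a step, observes which state it actually landed in, and looks up its current estimate for that state in a table. The five-state random walk, the names `step` and `td0`, and all the numbers below are assumptions for illustration.

```python
import random

N_STATES = 5   # non-terminal states 0..4; stepping off either end finishes the episode
GAMMA = 1.0

def step(state, rng):
    """The environment. The learner only ever sees its outputs, never its rules."""
    next_state = state + rng.choice((-1, 1))
    if next_state < 0:
        return 0.0, None, True      # fell off the left end, reward 0
    if next_state >= N_STATES:
        return 1.0, None, True      # fell off the right end, reward 1
    return 0.0, next_state, False

def td0(episodes, alpha, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES
    for _ in range(episodes):
        state = N_STATES // 2
        done = False
        while not done:
            # Sample one real transition: this replaces the model.
            reward, next_state, done = step(state, rng)
            # V(S_{t+1}) is just a table lookup for the state that was observed.
            bootstrap = 0.0 if done else V[next_state]
            V[state] += alpha * (reward + GAMMA * bootstrap - V[state])
            state = next_state
    return V

values = td0(episodes=20000, alpha=0.005)
# Known true values for this walk: 1/6, 2/6, ..., 5/6.
true_values = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]
```

So nothing recursive is called: the formula reuses the current guess for the next state, and the guesses improve together over many sampled steps.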
what kind of visualisation can I do to show this data
I am thinking of a scatterplot with equal distances on x axis for each country. With 2 coloured dots at each x denoting the value of administered vaccines for each date. With legend denoting colour of each date.
anyone familiar with sklego's RBF here?
@wooden sail writing as csv did it...thank you!!
cool
if you just need it stored for later but don't need to actually look at the matrix, consider also .npy or npz
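A quick sketch of the `.npy` / `.npz` route with a made-up array, using a temporary folder so it cleans up after itself. The file names are only examples.

```python
import os
import tempfile
import numpy as np

results = np.arange(1200).reshape(60, 20)   # made-up stand-in for the results

with tempfile.TemporaryDirectory() as folder:
    # .npy holds a single array and keeps its dtype and shape exactly.
    npy_path = os.path.join(folder, "results.npy")
    np.save(npy_path, results)
    restored = np.load(npy_path)

    # .npz bundles several named arrays into one (optionally compressed) file.
    npz_path = os.path.join(folder, "results.npz")
    np.savez_compressed(npz_path, results=results, doubled=results * 2)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        doubled = archive["doubled"]
```

Unlike CSV, these formats are binary, so they are not readable in a text editor, but loading them back needs no parsing or type guessing.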
what might be the issue?
why is a column of ones added to the data matrix after feature normalization?
You use boxplots to show how a numerical variable varies within a category.
Guys is standard matplotlib and seaborn enough for visualisations or should we know some advanced visualisation libraries like cufflinks
that obviously depends on how complicated you need stuff to be
But matplotlib can do a whole lot, have never been limited so far, except for maybe 3d stuff
I am also not sure how data can be generated efficiently. But it turns out that HPO has already been tested as a sequence problem with Transformers. https://arxiv.org/abs/2205.13320
So if the community here can suggest any feasible way of generating a dataset for the described approach or if data exist for something similar?
you see they discuss their usage of vast amounts of HPO data
which at google they certainly have. idk how easy it is to get that in the wild, though
you rely on people all over the world having solved enough problems to make this trainable
it literally depends on what your use case is like camel mentioned
but matplotlib/seaborn is pretty robust for quick visualizations
my personal favorite is plotly
theres also specific data viz software like tableau/powerBI/looker/etc.
but that tends to be more in the business context where you are creating something for business stakeholders
i.e. you need to create a dashboard showing X, Y, Z for someone in a specific business unit/function
if that is your world, then i highly recommend "storytelling with data" by cole knaflic

I'm relatively new to pycharm and pandas,
does anyone have a minute to help me figure out where to start and how to make assessments on trends?
helo friends, i am getting a warning in pandas, did some reading on stack overflow, unable to fully grasp it
ticker["candle"] = np.array(range(len(ticker)))%25 + 1
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
how to fix it?
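A common cause of this warning, assuming `ticker` was made by filtering or slicing a bigger DataFrame (the log doesn't show where it comes from): pandas can't tell whether the new column should also land in the original. Taking an explicit copy is one usual fix. The `prices` frame below is made up.

```python
import numpy as np
import pandas as pd

# Made-up parent frame; in the chat this would be the full downloaded stock data.
prices = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "AAA"],
    "close": [10.0, 10.5, 99.0, 10.2],
})

# .copy() makes the subset an independent DataFrame, so adding a column is unambiguous.
ticker = prices[prices["symbol"] == "AAA"].copy()
ticker["candle"] = np.arange(len(ticker)) % 25 + 1
```

If the column is meant to go into the original frame instead, the `.loc[row_indexer, col_indexer] = value` form from the warning text is the other route.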
Yeah but that's what my column is, isn't it?
Do you have NaN values?
i suggest using a map visual with these details.
and a bar chart for the top and bottom countries if you care about that.
anyone got an idea?
yeah
You need to drop those
hmm worked now
if anyone's interested in RecSys, there's a series by the great chip huyen this month; starting tomorrow at 10a PT!

Hello all. I am having a lot of fun messing around with pyplot and I need a bit of some help.
3.5 sessions, ending with a big RecSys ML System Design Session
Hey everyone! Can anyone confirm if we can change color of seaborn catplots based on conditional statements
I am trying to draw a line graph with formatted percentages on the y-axis. Currently, these are formatted strings. The formatted strings are not ordered correctly, trying to sort them gives me a squiggle.
I think what I would need to do is find a way to format the floating point numbers as they're displayed instead of converting them to a string and formatting that.
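That idea can be sketched with a tick formatter: keep the values as floats so they sort and plot numerically, and only format the labels. The numbers are made up, and the sketch assumes the estimates are already percentages like 8.4 rather than fractions like 0.084.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

years = [2017, 2018, 2019, 2020]
estimates = [9.1, 8.4, 10.2, 7.9]   # made-up values, kept as floats

fig, ax = plt.subplots()
ax.plot(years, estimates)

# Only the tick labels become strings; "%%" prints a literal percent sign.
formatter = FormatStrFormatter("%.1f%%")
ax.yaxis.set_major_formatter(formatter)

sample_label = formatter(8.4)
plt.close(fig)
```

For values stored as fractions, `matplotlib.ticker.PercentFormatter(xmax=1.0)` is an alternative that does the scaling as well.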
And bar_label doesn't seem to work with catplots either. Any Idea about it ?
okay thank you
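import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import (MultipleLocator,
                               FormatStrFormatter,
                               AutoMinorLocator)
# isolation is my own module; type_check_numeric_columns and
# compare_nulls_against_threshold are my own helpers too (import not shown)
from isolation import isolate_total_stub, isolate_age_stub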
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
cdc_data = cdc_data.drop(columns=['INDICATOR','FLAG','UNIT'])
# do you have good data?
data_types_valid = type_check_numeric_columns(cdc_data)
acceptable_null_threshold = compare_nulls_against_threshold(cdc_data)
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
fig, ax = plt.subplots()
ax.plot(x_axis, y_axis)
plt.show()
I am not using the ticker library imports at this time
oh sorry I thought you were talking to me. excuse me
I gave that a try and it did not work. I am now certain that my data is wacky. I have repeated values that are true for some year and false for other years. And I was plotting year-wise graphs from my data. Those values being true for some and false for others is toasting up the library. I might just break my data into separate files rather than them being in a single file. That should do the job. Thanks anyway!
What does the data look like
Hey I am a beginner ,
trying to automate data from MySQL database to spread sheet and I have all the basic libraries required, sheets api is also enabled.. created credentials for the same on GCP
Have given the right path to the credentials.json file and everything still I seem to go nowhere
Can someone please help me out ?
The debug log is
PS C:\Users\conta\OneDrive\Desktop\Workspace> & 'C:\Python310\python.exe' 'c:\Users\conta.vscode\extensions\ms-python.python-2022.6.3\pythonFiles\lib\python\debugpy\launcher' '51612' '--' 'c:\Users\conta\OneDrive\Desktop\Workspace\pyautomation\sheetsNew.py'
There is an Exception in credsLogin Function : 'module' object is not callable
Authentication DONE !
C:\Python310\lib\site-packages\pandas\io\sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy
warnings.warn(
          MID         PID                   merchant_name      locality          city
0   b'242307'  b'1418703'            b'Ruchi Curry Point'  b'Manikonda'  b'Hyderabad'
1   b'243056'  b'1418703'                b'Ruchi Curries'   b'Madhapur'  b'Hyderabad'
2   b'650871'  b'1418703'                b'Ruchi Curries'   b'Nizampet'  b'Hyderabad'
3  b'1235155'  b'1418703'            b'Ruchi Curry Point'   b'Nizampet'  b'Hyderabad'
4  b'1318633'  b'1418703'  b'Ruchi Curry Point, Nizampet'   b'Nizampet'  b'Hyderabad'
Deleting Google Sheet...
There is an Exception in clearGoogleSheet Function : Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Writing Google Sheet...
There is an Exception in writingGoogleSheet Function : Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Part 1 Completed !
Hi guys, i've built a model which takes keywords and generates narratives. However, i find that BLEU and ROUGE evaluation isn't appropriate for my case.
So instead i'm thinking of evaluating by how many of the user's input keywords are present in the generated text. Would this be a proper way of evaluating how much the keywords permeated the text? Does such a metric, or a better one, exist? If not, how would i proceed? Thanks, and please @ me when replying so i get notified
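A bare-bones sketch of the metric described above: the fraction of input keywords that appear as whole words in the generated text. `keyword_coverage` is a hypothetical name, and exact matching is an assumption; it will miss inflected forms (for example "storms" for "storm"), so stemming or lemmatizing both sides first may be worth trying.

```python
import re

def keyword_coverage(keywords, generated_text):
    """Return (fraction of keywords found, list of the keywords that were found)."""
    tokens = set(re.findall(r"[a-z0-9']+", generated_text.lower()))
    wanted = [k.lower() for k in keywords]
    hits = [k for k in wanted if k in tokens]
    score = len(hits) / len(wanted) if wanted else 0.0
    return score, hits

score, found = keyword_coverage(
    ["dragon", "castle", "storm"],
    "The dragon circled the castle at dawn.",
)
```

This only measures whether the keywords show up, not whether the narrative is any good, so it probably works best next to a fluency or human check rather than in place of one.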
i want to simulate transfer learning
how do i do it?
i have trained my model
now i wanna check how it will fine tune on deployment
https://stdworkflow.com/269/matplotlib-solves-the-problem-that-x-axis-values-are-not-sorted-by-array
What do you mean fine tune on deployment?
i mean normal fine tune
retraining sort of
Can someone please help me on how to setup my GPU for deep learning on tensorflow
Hey @wooden sail!
It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Hi, i'm trying to find the intersection point of two sets of data. Neither line can be defined by a mathematical function, and each has about 21450 values of x and y. Any ideas of functions or libraries i can use?
in what format is the data? arrays? dataframes?
they're series I read from a csv using pandas
Someone mentioned using shapely so I'm trying that now
you can see if there are two rows where the values are the same. or you can take the difference of the two Series and see which index has the smallest difference
Ah yea. That's true. Thanks for the idea
it doesn't seem like they have the same domain, so you'll have to do some padding. otherwise, that seems the easiest way (arg min (abs(diff)))
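A sketch of the arg-min-of-absolute-difference idea. Instead of padding, it interpolates both curves onto a shared grid over the x range they have in common, which is a swapped-in way of handling the different domains. It assumes each curve's x values are increasing (needed by `np.interp`); `approximate_crossing` and the two straight lines are made up for the example.

```python
import numpy as np

def approximate_crossing(x1, y1, x2, y2, n_points=2000):
    """Return (x, y) where the two curves are closest on their shared x range."""
    lo = max(x1.min(), x2.min())
    hi = min(x1.max(), x2.max())
    grid = np.linspace(lo, hi, n_points)
    a = np.interp(grid, x1, y1)   # curve 1 resampled on the shared grid
    b = np.interp(grid, x2, y2)   # curve 2 resampled on the shared grid
    idx = int(np.argmin(np.abs(a - b)))
    return grid[idx], a[idx]

# Made-up curves with different domains that cross at (5, 5).
x1 = np.linspace(0.0, 10.0, 50)
y1 = x1
x2 = np.linspace(2.0, 12.0, 70)
y2 = 10.0 - x2

x_cross, y_cross = approximate_crossing(x1, y1, x2, y2)
```

With pandas Series, passing `series.to_numpy()` for each argument should work the same way. If the curves cross more than once, this only reports the single closest point.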
is this lisp?

No, it's just math
no, that's just math
Lisp would have more parentheses
aurendil tried to joke with me before and also failed
!otn s lisp
⢠python-is-not-lisp
Are there any successful NLP joke/sarcasm detectors out there?
that's a notoriously difficult task, as sarcasm is sometimes difficult for humans to detect, and even when we can, it relies heavily on world knowledge
It's kind of hard just from a language point of view, yeah. But I've noticed that lots of animals seem to have a concept of play/joking, and you can see it in their facial expression. Probably just need more information than just words.
btw, if any of you are interested, i'm preparing this short intro to jax. specifically, looking at jit, vectorization, and automatic differentiation of functions f:C^n -> R^m (cr or wirtinger calc). the final example does something that could be understood as some form of "deep unfolding"/self supervised training/hyper parameter optimization or whatever you wanna call it. the target is undergrad people with knowledge of linalg and optimization https://github.com/3ddP/jax_example/blob/master/examples.ipynb
any comments and/or feedback are welcome. analytic solutions are used to corroborate the jax results, but the math isn't explained. it's expected the students will already know it
id be interested if you found anything

the twitter API gives you their own sentiment scores, if I remember correctly. what are you trying to do?
you want to get the sentiment score of individual words? I've never heard of that
sentiment scores will reflect the sentiment of the whole tweet
why does printing the head of the dataframe in thonny look like that
it looks gross lol
head is a method
yeah but is there a cleaner way to look at the dataframe
you can see that it says "bound method of ..."
did you try print(df.head()), where you call the method?
yep that's why
but you're just using pandas' native printing functionality. I don't know if thonny does anything like pycharm's dataframe viewer thing
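The difference in a few lines, with a made-up DataFrame. Without parentheses you get the method object itself, which is where the "bound method ..." text comes from.

```python
import pandas as pd

df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})

method_repr = repr(df.head)   # no call: this is the method, not the data
preview = df.head(3)          # calling it returns the first 3 rows
text = preview.to_string()    # plain-text table, handy for print()
```

`print(df.head())` or `print(df.head().to_string())` should look much cleaner in any editor, Thonny included.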
i can't even open anaconda-navigator on my mac anymore
soooo no more uploading ipynbs to my github
gonna stick out like a sore thumb
The column of dropoff_site have some label. How to do replacing the missing value in load_weight when dropoff_site is 'MRF'?
with fastai that would be as simple as using pretrained model and using fine_tune method on learner https://docs.fast.ai/callback.schedule.html#Learner.fine_tune
can you freelance as a data scientist?
Yeah y not
but i dont have the data for fine tuning
i want to generate that too
Bro you are amazing. Can I ask with which technology you visualised data in the form of dashboard. Do reply @misty flint
1-Is it possible to build a new programming language from scratch, as it is called 0101? Is there any knowledge currently available that helps to do that?
bro this isnt me. this is google's tensorflow
2- When I review some visual and read sources, all I find is a theoretical explanation of 0101's supposed work steps from the beginning, but if I can ask, how was 0101 introduced into the electronic circuit, using any technology and any knowledge?
Hi guys
For supervised training fine tuning you need labeled data
I've started too many courses tbh, i wish i finished half of them
rip. it helps if you have an end-goal. for me, i might use these concepts at work possibly creating a RecSys protoype. it also helps that it's only 3.5 sessions
3.5 hrs total. 10a PT on sundays
also im super interested so im def planning on completing this one
and this one is less of a lecturer-student style and more of a self-study group style where peeps share more of their experiences/learnings
so i like that format more since its interactive
anyone with a good knowledge of SARIMAX and ARIMAX models or resources on time series forecasting.
I'm working on a personal project which has to do with crypto price modelling, I want to use SARIMAX or ARIMAX before CNN to model
I will quit my job one day and do all these Udemy courses lol
Hi guys, I'm currently writing up a project on the use of neural networks in detecting football tactics and stumbled across a paper which I don't understand. Would anyone be willing to help? I'll dm you the pdf
You could ask the question here and if someone knows the answer they will help
I don't even know how to frame any questions because I don't understand what the paper is saying tbh. I have an understanding of how neural nets work but this is too complicated for me. Am I allowed to upload a file here?
Yup but can i somehow simulate it??
Head to http://brilliant.org/TinaHuang/ to get started for free with Brilliant's interactive lessons. The first 200 people will also get 20% off an annual membership.
NEWSLETTER: https://tinahuang.substack.com/
It's about learning, coding, and generally how to get your sh*t together c:
In this video, I talk about why you keep quitting you...

she has some good points
that i feel is very relevant for people studying the topics in this channel
only if its related to #data-science-and-ml
and sometimes people cant help you here either; it just depends on the problem
helo rex,, remember me?
ok. it is pandas related
hehe.
just ask it
Simulate what?
eww financial data 
using spyder IDE
i have extracted stock data and put it into a dataframe
For example, for making a model that allocates resources based on parameters, can we simulate those conditions??
i ask miwojo then
But i think now that if we can simulate manually then whats the need of nn @misty flint
Mistake sir, pardon my dust
Which conditions you want to simulate? Can you explain your goal a bit?
Network slicing
@tacit basin in this pic, last 3 columns are of interest
i have column "candle" and i need to calculate mean of values of Candle 10, 15, 20 etc only if they belong to same date
there are 22 different dates
what do?
We allocate embb mmtc or urllc based on speed quantity of data etc
helo melio. rex is bullying me. halp plez
Everything is cardinal
yeah melio, you can help stardust; idk how im bullying stardust tho 
im kidding frend :}
groupby by date and candle?
hory shitto. lemme try
groupby ftw
that
and json_normalize
pretty up there on my pandas fave functions

For example you have nn trained on imagenet 1 million images. Then you have labeled images from your domain. You fine tune (transfer learning) nn on these images
If pretraining is done on 500 000 example how big should fine tune data be
getting this error, this wasnt there before i added the groupby line
Yep depends on how similar the pretrained data is to your domain data
df.groupby(['feat1', 'feat2']).mean()
I thought you said mean
for example, make group by dates, so 22 groups, then i take candle number from candle column,
for that candle number, i need to take Candle "close" price from another column
Reread again. You said mean that's why I guess
ok. ill omit mean for now
groupby returns groups and if you specify an aggregate method it will calculate that on each group
ill need to read on aggregate
ticker.groupby("date",axis=1)
is this what i do, for date grouping
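A sketch of the date grouping with made-up data, since the real columns are only visible in a screenshot. The column names `date`, `candle` and `close` and the candle numbers are assumptions taken from the messages. Note that `axis=1` asks pandas to group columns rather than rows; for grouping rows by date it should be left out.

```python
import pandas as pd

# Made-up frame: two dates, four candles each.
ticker = pd.DataFrame({
    "date": ["2022-06-01"] * 4 + ["2022-06-02"] * 4,
    "candle": [5, 10, 15, 20, 5, 10, 15, 20],
    "close": [100.0, 102.0, 104.0, 106.0, 200.0, 198.0, 202.0, 204.0],
})

# Keep only the candles of interest, then aggregate within each date.
wanted = ticker[ticker["candle"].isin([10, 15, 20])]
mean_close_per_date = wanted.groupby("date")["close"].mean()

# Grouping by both keys gives one value per (date, candle) pair instead.
close_by_date_and_candle = wanted.groupby(["date", "candle"])["close"].mean()
```

Which of the two results is the right one depends on what "mean of candle 10, 15, 20 per date" is supposed to mean, which the chat leaves open.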
groupby didnt work, not suited here
i used numpy split, now need to find a way to perform operations on split portions of each dataframe
what is a good way to account for date while working on a model
the date is relevant to my dataset but it is in a format the python interpreter cannot understand
the date is important for me to keep because i need it to record trends in this dataset, but i'm not sure what the best way of separating this data is
Hello people, I am thinking about picking up either "An Introduction to Statistical Learning (with applications in R)" or "Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow"
Do you have any experience with these, which one would you recommend?
@errant onyx we're going to be partial to the second one, because those three things are python libraries. R is a separate language, ie not python
I know that's the case, it's also the reason I asked the question identically in the R discord
But I sort of wanted to know if you guys think it was good
I've never read either. The book I recommend to beginners is "data science from scratch"
I'm sort of an R person but it seems most ML things are done in Python in industry
That one seems good too
Saw it being recommended too
I work in the AI department of my company, and I don't know anyone who uses R. We just do everything in python
(srry to interrupt convo but how would u guys recommend starting learn AI with python?)
That's what I feared
I'm in academia so most things are done in R here
See the book I just recommended a few messages ago
The one R user I know is a linguistics post doc
haha, of course he/she's from academia
Oh, I know another. Also in academia. I could ask him for advice about how he switched to Python
this? :>
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
It's on there.
ty!
I've read Python for data analysis and python crash course, and done some coding in Python in general
But there's a world of books/resources, such a difficult choice
Do you work for a university currently?
Or attend one?
They might give you an O'Reilly subscription. In which case you can try any book without fear of commitment
I'm doing a phd, should ideally be finishing in 1.5 years
so I have that time to learn more data science basically
I'm not super worried about getting a book, I'll probably read it cover to cover in any case
Just you know, I wanna hit the sweet spot when it comes to a book
Not too basic, not too theoretical
anyway I'm gonna look up the data science from scratch book
thanks
What is your PhD in
BTW I'm at a wedding so I might disappear if someone makes me dance
But I don't wanna
Are you hoping to work as a data scientist in a medical related area?
quite a few times too
I think I would be able to contribute the most there, but I don't feel I should be constrained to only the medical field-
What I mean is that that is probably the best goal for me in a perfect world
but you never know the opportunities that might pop up
I work a bit with bioinformatics right now though
But I have poor knowledge of the underlying algorithms I'd say
I also am on week 5 of Andrew Ng's machine learning course
He's gonna replace it with a new course though, kind of typical
where do i find apis for data science?
Apis that do what
provide health diagnoses for patients
or i could do my first project with web scraping
oh i didn't mean to ping whoever that was
https://machinelearningmastery.com/start-here/ i'd recommend this guy's blog
i find it hard to think of projects that are personally applicable to me
so i get frustrated when i think of projects
it's tough
i think fitness might be an idea
- Is there an API call limit?
Yes, there are two rate limits per API: 4,000 requests per day and 10 requests per minute. You should sleep 6 seconds between calls to avoid hitting the per minute rate limit. If you need a higher rate limit, please contact us at code@nytimes.com.
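A minimal sketch of spacing calls 6 seconds apart, as the FAQ above suggests. The function name and the idea of passing in zero-argument callables are illustrative assumptions, not part of any official client.

```python
import time

def call_with_rate_limit(calls, min_interval=6.0, sleep=time.sleep, clock=time.monotonic):
    """Run each zero-argument callable, starting calls at least min_interval seconds apart."""
    results = []
    last_start = None
    for call in calls:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(call())
    return results
```

`sleep` and `clock` are parameters only so the pacing can be checked without actually waiting.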
up to you. as long as you find the problem interesting, youre more likely to finish it

i have a DS interview tomorrow. that company uses R + Microsoft tooling
lots of peeps in the bioinformatics/pharmaceutical space use R. CDC exclusively uses R as well
im still biased towards python tho since if youre going to deploy models, its going to be in python
sdk's for R are very uncommon
hahaha rip. i forgot SAS is also what they use
absolutely tragic
dance with me stel

is matplotlib a good thing to plot stuff with?
if i cant switch out the default matplotlib backend for it, i def have never heard of it

@errant onyx as I was going to say earlier, if you get a PhD in something that isn't data science in itself, but you can also do data science, I would say that puts you in a good position. Also my cousin's wife is very angry at me for refusing to dance with her.
oh wait jk
cairo is an option
why did you refuse
PhD in something that isn't data science in itself, but you can also do data science
that's me, and it really is valuable
that's the same for pretty much every type of engineer
Being able to get shit done, but being an expert in 1 or 2 areas
I am a dot
i'm trying to run stylegan2 ada
https://github.com/johndpope/stylegan2-ada
but I keep getting this error
RuntimeError: Could not find MSVC/GCC/CLANG installation on this computer. Check compiler_bindir_search_path list in "C:\python\stylegan2-ada-main\stylegan2-ada-main\dnnlib\tflib\custom_ops.py".
the file it's talking about has this code
```py
def _prepare_nvcc_cli(opts):
    cmd = 'nvcc ' + opts.strip()
    cmd += ' --disable-warnings'
    cmd += ' --include-path "%s"' % tf.sysconfig.get_include()
    cmd += ' --include-path "%s"' % os.path.join(tf.sysconfig.get_include(), 'external', 'protobuf_archive', 'src')
    cmd += ' --include-path "%s"' % os.path.join(tf.sysconfig.get_include(), 'external', 'com_google_absl')
    cmd += ' --include-path "%s"' % os.path.join(tf.sysconfig.get_include(), 'external', 'eigen_archive')
    compiler_bindir = _find_compiler_bindir()
    if compiler_bindir is None:
        # Require that _find_compiler_bindir succeeds on Windows. Allow
        # nvcc to use whatever is the default on Linux.
        if os.name == 'nt':
            raise RuntimeError('Could not find MSVC/GCC/CLANG installation on this computer. Check compiler_bindir_search_path list in "%s".' % __file__)
    else:
        cmd += ' --compiler-bindir "%s"' % compiler_bindir
    cmd += ' 2>&1'
    return cmd
```
is nvcc installed?
msvc is installed, I don't know about nvcc
is that part of cuda?
cause I already installed cuda-toolkit
sounds like it's trying to call nvcc
when I google "download nvcc" it just directs me to download cuda-toolkit
is it in your path?
I would recommend digging into the content of _find_compiler_bindir() to see what it is looking for
yeah I looked into that actually
there are actually two versions from two different github forks
```py
patterns = [
    'C:/Program Files (x86)/Microsoft Visual Studio//Professional/VC/Tools/MSVC//bin/Hostx64/x64',
    'C:/Program Files (x86)/Microsoft Visual Studio//BuildTools/VC/Tools/MSVC//bin/Hostx64/x64',
    'C:/Program Files (x86)/Microsoft Visual Studio//Community/VC/Tools/MSVC//bin/Hostx64/x64',
    'C:/Program Files (x86)/Microsoft Visual Studio */vc/bin',
    'C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat',
]
def _find_compiler_bindir():
    for compiler_path in patterns:
        if os.path.isdir(compiler_path):
            return compiler_path
    return None
```
this is one
this is the other
```py
def _find_compiler_bindir():
    hostx64_paths = sorted(glob.glob('C:/Program Files (x86)/Microsoft Visual Studio/*/Professional/VC/Tools/MSVC/*/bin/Hostx64/x64'), reverse=True)
    if hostx64_paths != []:
        return hostx64_paths[0]
    hostx64_paths = sorted(glob.glob('C:/Program Files (x86)/Microsoft Visual Studio/*/BuildTools/VC/Tools/MSVC/*/bin/Hostx64/x64'), reverse=True)
    if hostx64_paths != []:
        return hostx64_paths[0]
    hostx64_paths = sorted(glob.glob('C:/Program Files (x86)/Microsoft Visual Studio/*/Community/VC/Tools/MSVC/*/bin/Hostx64/x64'), reverse=True)
    if hostx64_paths != []:
        return hostx64_paths[0]
    vc_bin_dir = 'C:/Program Files (x86)/Microsoft Visual Studio 14.0/vc/bin'
    if os.path.isdir(vc_bin_dir):
        return vc_bin_dir
    return None
```
so I figured out that it's looking for the C compiler in visual studio
do any of these directories exist for you?
no. instead my MSVC is located here
```
C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64\x64
```
so i did this
```py
def _find_compiler_bindir():
    return 'C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64\x64'
    for compiler_path in patterns:
        if os.path.isdir(compiler_path):
            return compiler_path
    return None
```
and I got this
```
RuntimeError: NVCC returned an error. See below for full command line and output log:
nvcc "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\python\_pywrap_tensorflow_internal.lib" --gpu-architecture=sm_86 --use_fast_math --disable-warnings --include-path "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include" --include-path "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\protobuf_archive\src" --include-path "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\com_google_absl" --include-path "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\site-packages\tensorflow\include\external\eigen_archive" --compiler-bindir "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64d" 2>&1 "C:\python\stylegan2-ada-main\stylegan2-ada-main\dnnlib\tflib\ops\fused_bias_act.cu" --shared -o "C:\Users\Alex\AppData\Local\Temp\tmp2gk5m51p\fused_bias_act_tmp.dll" --keep --keep-dir "C:\Users\Alex\AppData\Local\Temp\tmp2gk5m51p"
'nvcc' is not recognized as an internal or external command,
operable program or batch file.```
so my current working theory is that nvcc is installed with cuda-toolkit but it's not in my path
yeah, sounds like it can't find nvcc
these are in my path
```
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp
```
but nothing about nvcc
is nvcc in either of these directories?
ok then that's weird
is it possible that by doing
def _find_compiler_bindir():
    return 'C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx64/x64'
i've given it a bad path somehow?
like that it can find nvcc but it's thrown off by the path i'm giving it?
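One thing worth checking in the path itself: the earlier hard-coded version ended in `Hostx64\x64`, and the nvcc command line in the error shows `Hostx64d`. In a normal Python string `\x64` is the escape sequence for the letter `d`. A short sketch with a shortened, made-up path:

```python
# 'C:/VS/bin' is a placeholder prefix, not a real install location.
bad = 'C:/VS/bin/Hostx64\x64'   # '\x64' is read as the character 'd'
raw = r'C:/VS/bin/Hostx64\x64'  # a raw string keeps the backslash
fwd = 'C:/VS/bin/Hostx64/x64'   # forward slashes need no escaping

print(bad)  # ends in Hostx64d
print(raw)  # ends in Hostx64\x64
```

That explains the odd path in the log, but it is separate from the "'nvcc' is not recognized" message, which is about nvcc itself not being found.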
then your assumption would be that the actual error does not match the error message
yeah
which is fair, but would have to be proven
you should be able to see how nvcc is called exactly and either see what is being returned or being able to call the same thing manually yourself
what if you type just "nvcc" ?
that's what I did
ok, I haven't used windows in years. But does the .exe matter at the end? like in nvcc vs nvcc.exe ?
no, typically you don't put the .exe on the end
ok, then something is wrong with your path or installation
you should at the very least get an nvcc error
not a system error about the executable
and the fact that just calling nvcc without arguments gives you such an error does mean that it's not about your compiler argument
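A quick way to check from Python whether the interpreter's process can see nvcc at all. This is a minimal sketch using only the standard library; `find_tool` is just an illustrative helper name.

```python
import shutil

def find_tool(name="nvcc"):
    """Return the full path of `name` if it is on PATH for this process, else None."""
    return shutil.which(name)

location = find_tool("nvcc")
print(location if location else "nvcc is not on PATH for this process")
```

If this prints a path in one terminal but not in another, the PATH change probably has not reached the already-running shell or IDE, since environment variables are read when a process starts.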
it's gotta be the path. someone on stackoverflow had the same issue in 2017
```
/Developer/NVIDIA/CUDA8.0.61/bin
As indicated in the install guide, the correct path is:
/Developer/NVIDIA/CUDA-8.0.61/bin
                      ^
```
but that's not what my path looks like in the year of our lord 2022
mine looks like this
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin
then get your path out of the matrix into the current year
hi guys, i am new to python programming and i am having trouble: i have this code that detects a plant and i am getting this error: IndexError: tuple index out of range. please help me
something is trying to reach something out of range
i can send you the code can you look at it
it's getting late here and I don't do DMs. Better to paste it here
ok
import cv2
import os
#Cascade
cascade = cv2.CascadeClassifier('./golden_pothos_cascade.xml')
#Reading Image
capture = cv2.VideoCapture(0)
while True:
    success, img =capture.read()
    #Converting to Gray Image
    gray_Image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    #Adding Gaussian Blur
    blur=cv2.GaussianBlur(gray_Image,(13,13),cv2.BORDER_DEFAULT)
    #Detecting Plant
    detection_result, rejectLevels, levelWeights =cascade.detectMultiScale3(blur, scaleFactor=1.0485258, minNeighbors=6, minSize=(30,30),outputRejectLevels = 1)
    greaterweightindex = 0
    currentweight = levelWeights[0]
    #Area with Highest Confidence
    for (weight) in levelWeights:
        if weight > currentweight:
            greaterweightindex = greaterweightindex+1
            currentweight = weight
    #Highest Confidence Area
    x = detection_result[greaterweightindex][0]
    y = detection_result[greaterweightindex][1]
    w = detection_result[greaterweightindex][2]
    h = detection_result[greaterweightindex][3]
    #Modifying Confidence
    confidence= round(currentweight[0], 2)
    finalconfidence= confidence * 100
    #Drawing Rectangle
    cv2.rectangle(img,(x,y), (x+w, y+h), (0,0,255), thickness=2)
    cv2.rectangle(img,(x,y-35), (x+w, y), (0,0,255), thickness=-1)
    #Adding Text
    cv2.putText(img, str(f"Golden Pothos {finalconfidence}%"), (x,y-5), cv2.FONT_HERSHEY_COMPLEX, 0.6, (255,255,255), thickness=2)
    #Displaying Image
    cv2.imshow("Detected Plant",img)
    #Adding Wait
    if cv2.waitKey(1) == 13:
        break
cv2.waitKey(1)
the error is at currentweight = levelWeights[0]
my theory is that it's not detecting anything, so it's returning an empty tuple or something
and that's why it's out of range
run the code again but before the line that throws the error put
```py
print(levelWeights)
```
@cursive walrus
this is what i am getting
yep, it's as I expected, an empty tuple
ok try this
```py
greaterweightindex = 0
if not levelWeights:
    continue
currentweight = levelWeights[0]
```
now i am getting this error
do this and tell me what it prints
```py
greaterweightindex = 0
if not levelWeights:
    continue
print(levelWeights)
currentweight = levelWeights[0]
```
[-1.06755358]
who wrote this code?
cause this looks like a mistake
```py
confidence= round(currentweight[0], 2)
```
i took it from github
currentweight isn't a list or a tuple, so of course this will throw an error
what happens if you do this
```py
confidence= round(currentweight, 2)
```
omg it worked thanks man you helped me a lot.
thank you
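One more thing that may be worth a look in the posted loop: `greaterweightindex` goes up by one each time a larger weight is seen, so it counts updates rather than recording where the largest weight sits. A minimal sketch of picking the best detection directly, assuming the weights are plain numbers as the printed output suggests. `best_detection` is a hypothetical helper, not part of OpenCV.

```python
def best_detection(boxes, weights):
    """Return (box, weight) for the largest weight, or None when nothing was detected."""
    # len() is used because truth-testing a NumPy array with several
    # elements raises an error, while len() works for tuples and arrays.
    if len(weights) == 0:
        return None
    scores = [float(w) for w in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return boxes[best], scores[best]
```

Inside the capture loop this would replace the index bookkeeping: `continue` when it returns `None`, otherwise unpack the box into `x, y, w, h`.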
I am the duck
hey @worldly dawn how did you get to be a helper?
do you have to defeat one in single combat?
it does involve some intense training
walk uphill both ways through the snow in the heat of summer while row reducing a matrix?
@plush jungle hello
yo
I am having a data set but it is in txt file. Idk how to load it and i want to do it using linear regression..
Also sorry for pinging you like this
the issue is just that it's in a txt file?
Yes but the data is also not properly arranged.
Thanks for the response. Appreciate it
Hey @fierce pine!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
!past
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
looks like you could read the file line by line, split the strings based on the spaces, and pick the columns you're interested in afterwards
or make a pandas dataframe and ask it for a column
yeah or you could use regex
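A minimal sketch of the line-by-line approach, assuming whitespace-separated columns. The column positions and the sample rows in the usage note are made up, since the real file layout is not shown here.

```python
def read_columns(lines, wanted):
    """Collect the numeric columns at positions `wanted` from whitespace-separated lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) <= max(wanted):
            continue  # skip blank or short lines
        try:
            rows.append([float(fields[i]) for i in wanted])
        except ValueError:
            continue  # skip headers or other non-numeric lines
    return rows
```

With a real file this would be called as `read_columns(open("data.txt"), [2, 3])`, where `data.txt` and the indices are placeholders.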
also, regression is kind of a loose term. do you mean fit a first order polynomial to a sequence of data? fit a general curve to a sequence of data? have the input be vector-valued?
Idk much i am at initial stage. What data should i predict using multiple regression? Any suggestions?
no clue
if you've never done any of this before, i'd see if you can predict a future value in one of the columns given the past values
plot the data of the column first to see if it has some behavior you can recognize, pick a model function based on that, and fit its parameters
Oh ohkkk, so in this data which column's value should i predict? I am thinking to do it using multiple variable linear regression model from pandas, but what shld i predict as per ur opinion?
Yessss pleasee and thankss
aight
we're gonna look at a short time window linear predictor
our assumption is that, over a reasonable small time period, the data behaves like a straight line. pretty much a loose form of taylor's theorem
so we wanna set up a model that captures this and learn its parameters
we first recall that a linear equation looks as follows (gonna write instead of tex in the end)
which we've conveniently written in matrix form on the right. notice we have 2 unknowns, m and b, because we observe y and x in the data
we need at least as many observations of y and x as the number of parameters we want to find
now, in your case, we don't just have one value x, but several measurements (temperatures and other stuff). and we want to use old data to predict those quantities, so we also don't have just one value of y
which we can all arrange into a single matrix vector equation
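The equation images did not survive in this log. One way to write what the messages above describe, with `A`, `b` and the bold vectors as assumed notation rather than the original author's symbols:

```latex
% one input, one output: two unknowns m and b
y = m x + b
  = \begin{bmatrix} x & 1 \end{bmatrix}
    \begin{bmatrix} m \\ b \end{bmatrix}

% n measurements per row: predict the next row y from the previous row x
\mathbf{y} = A \mathbf{x} + \mathbf{b}
  = \underbrace{\begin{bmatrix} A & \mathbf{b} \end{bmatrix}}_{M}
    \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix},
\qquad A \in \mathbb{R}^{n \times n},\quad \mathbf{b} \in \mathbb{R}^{n}
```

Written this way, `M` has n^2 + n entries to learn.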
that's for a single row of data. but we need several rows to compute all of the parameters in M (n^2 + n of them). each row pair gives n equations, so that means we need at least n + 1 different columns in x and y
and the whole point of this is: those columns are the rows of data in your file
the matrix M you get from this is a linear predictor of y
in particular, a predictor that only looks at the previous row of data. you can change this by changing the shape of M and giving X a block toeplitz structure
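A minimal sketch of fitting such a one-step linear predictor by least squares with NumPy. The function names are illustrative, and the data is assumed to already be a numeric array with one row per time step.

```python
import numpy as np

def fit_one_step_predictor(data):
    """data: (T, n) array, one row per time step. Returns M with shape (n, n + 1)."""
    data = np.asarray(data, dtype=float)
    past = data[:-1]     # rows 0 .. T-2
    future = data[1:]    # rows 1 .. T-1
    # Append a column of ones so the last column of M acts as the offset.
    X = np.hstack([past, np.ones((len(past), 1))])
    # Solve X @ M.T ~= future in the least-squares sense.
    M_T, *_ = np.linalg.lstsq(X, future, rcond=None)
    return M_T.T

def predict_next(M, row):
    """Predict the next row of data from the current one."""
    return M @ np.append(np.asarray(row, dtype=float), 1.0)
```

Non-numeric columns such as date strings would have to be dropped or converted before calling this.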
Import "tensorflow.keras.optimizers" could not be resolved
help
yes tensorflow is installed
ive tried both 2.8 and 2.9
2.7 apparently is non existent
ping me with response because this chat is so dead id get bored staring at it
you may need to import the base library tensorflow first
nop :C
yes
weird
already did
no
... or venv
Chances are, setting up a clean environment would resolve installation issues
idk how to make a venv right
and the tutorials are bad
the one i made was on wrong version of python
one simple workaround is to not import the optimizers like that and call them by the full name when you need them
i dont need workaround i need the intended way to work like its suppose to
and if the normal imports wont work then those wont work either
can you at least try? many people on google complain they get the same error you do, but it still works when importing tf and keras, and then calling keras.optimizers
can u give example of what u mean
import tensorflow as tf
optim = tf.keras.optimizers.Adam()
like so
i think that worked
other than that, people suggest to use tensorflow.python.keras.etc , with that extra python in the name
well if that works, that's good enough. seems to be an IDE problem
ok ty
i switched to a jupyter note book and the tensor imports are still broken
the devs of tensorflow deserve a cactus up their bum
and jupyter deserves cactus up bum for giving useless error messages
Hey @violet gull!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
https://paste.pythondiscord.com/tiqoyimevo this is the error message for the first "fix" from tensorflow.python.keras.model import Sequential
https://paste.pythondiscord.com/isiladubik heres one from just trying to install tensorflow
someone save me from this cringeness i just want to do coding and tensor flow makes me want to commit hate crimes its so terrible
Hi guys, i have a problem with exporting a .txt file to .csv using pandas, written in columns, can someone help me ?
i tried to read a .txt file and export it to .csv, splitting the lines on a delimiter into columns by category.
file.txt is like this
[groups]
admins = user1,user2,user3
users_network = user4,user5
users_m4s = user6,user7,user8,user9
and the .csv file should be
groups
user1 = admins
user2 = admins
user3 = admins
user4 = users_network
user5 = users_network
user6 = users_m4s ... for the rest of element of category line
``import pandas as pd
import numpy as np
df = pd.read_table("D:\GIT-files\Automate-Stats\SVN_sample_files\sample_svn_input.txt" , sep='=',engine='python')
print(df)
df.to_csv("D:\GIT-files\Automate-Stats\SVN_sample_files\sample_svn_input_update.csv" , index=None)
df = pd.read_table ("D:\GIT-files\Automate-Stats\SVN_sample_files\sample_svn_input_update.csv" , sep='=',engine='python')
print(df)``
but its not displaying and exporting right
practically, for the lines from the txt file, what's on the left of " = " is the group and what's after it are the elements of that group
i want to display for each element the group separatly
@somber burrow
[groups]
admins = user1,user2,user3
users_network = user4,user5
users_m4s = user6,user7,user8,user9
this is not a csv. csv is strictly comma-separated values on individual lines. you would need a more sophisticated parser for this.
you might need to write your own regular expression
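For a layout this regular, plain string splitting may already be enough. A minimal sketch using only the standard library; the output column names `user` and `group` are assumptions, and the result is written to an in-memory buffer instead of the real file path.

```python
import csv
import io

def user_group_rows(lines):
    """Turn 'group = user1,user2' lines into (user, group) pairs."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue  # skip blanks and section headers such as [groups]
        group, members = line.split("=", 1)
        for user in members.split(","):
            if user.strip():
                rows.append((user.strip(), group.strip()))
    return rows

sample = """[groups]
admins = user1,user2,user3
users_network = user4,user5
"""
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["user", "group"])
writer.writerows(user_group_rows(sample.splitlines()))
```

To write a real file, `buffer` would be replaced by `open(path, "w", newline="")`.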
hello, I'm a software engineer and I have been trying to specialize in AI for a year now. When I was in software, I was used to preparing for big tech interviews by practicing coding interviews and system design interviews. There are plenty of resources about that on the internet. But now that I'm into AI, I've been wondering what I need to prepare in order to do great at interviews for machine learning or AI positions. Are coding problems still relevant? How do I prepare for system design for AI? What do big tech companies ask for this kind of position? Thank you for your answers, I'm really grateful for being part of this discord community.
does your current company have AI-related positions, and would they support you in making a lateral move to that? because that's going to be the easiest way. also, for how long have you been a SWE?
when I interviewed for AI positions, I presented on research I had done for my university.
I'm sorry if I didn't explain as I should I have a software engineering degree and then got to a masters degree in data science. I'm not currently working in software. @serene scaffold
and I did work as a software engineer for like a year and a half but they were all part time jobs
I'm also working on a lot of personal projects in AI etc but I'm really trying to know if it make sense to get back at preparing coding interviews and if not what to prepare
I only have experience with interviews for career starters, so I should let someone else comment. but I would at least be prepared to talk about anything you worked on during your masters. did you publish?
What caused this to skip to the next column index?
also @cinder schooner try asking in #career-advice as well
I thought that was where we were 
show code
uploading
!code
Hey @gilded flame!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
I don't have time to dive into this, but hopefully someone can help.
```py
cursor = cnx.cursor()
cursor.execute(QUERY)
df = pd.DataFrame(cursor.fetchall())
if alldf is not None:
    if not df.empty:
        alldf = pd.concat([alldf,df],axis=0)
else:
    alldf = df
print(df)
field_names = [ i[0] for i in cursor.description]
print(field_names)
xlswriter = pd.ExcelWriter('{}/{}.xls'.format(type,loc),engine='openpyxl')
if not df.empty:
    df.columns = field_names
    df.to_excel(xlswriter,index=False)
    xlswriter.save()
else:
    cnx.close()
```
```py
def saveToExcel(query,filename):
    xlswriter = pd.ExcelWriter("%s.xls"%(filename),engine='openpyxl')
    queryDatas = executor(query)
    print(queryDatas)
    export = queryDatas
    export.to_excel(xlswriter)
    xlswriter.save()
    print("succes savetoExcel")
```
I'm using pandas.concat([], axis=0) to stack the dataframes vertically, but they won't stack vertically?
So which of my column is dependent dataset and which ones are independent? Can u tell by looking at the data i sent please
the way i wrote it, all columns are both dependent and independent, since the idea is to take a full row (data from all columns) and use it to try to predict the next full row of data. anything with numeric values, let's say
And it will predict next row using what? Date? Time? Precipitation?
all of it
it will use all the previous rows of data to predict the next row of data, as long as you can convert the data to numerical values in some way
i would say that, since the sensor data is gathered at a regular interval, you can ignore the date and time
hi
does anyone know sort of video classification
like using audio and image features for classification
what videos into what classes?
has anyone used the mysql workbench with a mac
i was thinking of doing some kind of exploratory data analysis project
with power BI
honestly why do that when python exists
Hello guys, so I am trying to learn Data Science from ground up
I have fairly decent amount of exposure to Python but don't know anything related to Data Science.
Are there any good sources, courses and/or YT channels which I can refer to for learning about Data Science
If anyone could help I would be grateful!
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Thank you!
adventures in overfitting
dunno if i'd call that overfitting
How would you call it?
that looks like underfitting instead, since it's not close to describing the data, let alone the noisy data
the model hasn't been trained enough or cannot represent the data correctly
With training, train gets better and valid gets worse. That's the definition of overfitting, isn't it?
ah wait, what is the plot showing
since the axes are not labelled, i assumed this was data and predictions
is it the loss?
if so, then yes
hello
quick question concerning the approx_fprime function
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html
if i have for instance a function with 3 parameters, which i want to approximate, how can i pass these 3 parameters into the approx_fprime function ?
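As far as the linked docs describe it, `approx_fprime` takes the point as a single 1-D array, so the three parameters go in together as one vector and the function unpacks them. A minimal sketch with a made-up function `f`:

```python
import numpy as np
from scipy.optimize import approx_fprime

def f(p):
    # p is a length-3 array holding the three parameters
    a, b, c = p
    return a**2 + 3.0 * b + np.sin(c)

point = np.array([1.0, 2.0, 0.0])
# Forward-difference estimate of the gradient at `point`, step size 1e-8.
grad = approx_fprime(point, f, 1e-8)
print(grad)  # roughly [2, 3, 1]
```

Extra fixed arguments that should not be differentiated can be passed after the step size and are forwarded to `f`.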
anyone would be willing to help me with a basic quiz in AI?
```py
torch.cuda.is_available()
```
is always returning false
the internet says to upgrade your nvidia drivers, so I did that and it's still happening
Hi I have a question about one of the debugging exercises. In the Arguments, Parameters, and Debugging section, 7. Debugging Functions - 1st screenshot
this is telling me there is a problem on line 21, but the actual problem is up on line 13- 2nd screenshot
Can anyone please explain to me how this debugging error would point me to find the "correct" error? Thanks.
Always read tracebacks from the bottom up, and pay attention to the ^, does it point out anything that might be missing?
there were two issues with the code...line 13 is missing the ":", and down in the 'def mean' section...'sm_list/len_list' - supposed to be sum_list.
Hi!
does anyone have any experience with shapley values?
In tensorflow, which metric tracks how confident a categorical CV model is with its predictions while training? Similar to accuracy, but I'm trying to see an average of how confident my model is with its predictions.
I'm basically looking for the mean confidence I guess?
What's the metric called for something like this? I'm using softmax activation on my output layer, if that matters.
are there any separate servers for image processing in python?
any vector p-norm with p >= 1 will do this. the larger the value of the norm, the more confidence the model has. the degenerate case is the infinity norm, which just takes the largest value of the vector. note that this tells you nothing about whether the predictions are correct. if you set p = 1, the output will always be 1 though, thanks to how softmax works. so pick p >= 2
I'll be blunt. I have no idea what you just said.
what i'm saying is "that's a bad metric if you use it alone" and "use mean squared error between the output of the softmax and a vector of zeros" (this second one is why the metric is bad)
"that's a bad metric if you use it alone"
I agree, that's not the intent though. Just learning, to be honest.
use mean squared error
I understand what you said about p-norms also, I took stats ^^ Thanks for the assistance.
oh, what was it you didn't understand then?
Which statistic metric to use. You clarified with "use mean squared error."
I understand most concepts, but I'm very poor with names (also reflects in human names, and just names in general).
So just takes me a bit to remember which thing is which lol
all right. MSE is the squared p norm with p = 2 of the difference between two vectors, divided by the number of elements. since all you want is to study the prediction vector, it's the same as MSE between the softmax output and a vector of zeros. you'd wanna maximize it.
Ok got it, thanks.
if you don't need it to be differentiable because you won't optimize with respect to this, all you need is to look at the maximum element in the softmax output. the closer this is to 1, the better
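A minimal sketch of both ideas in plain Python: the mean of the largest softmax probability over a batch, and the 2-norm of a single prediction. The function names are illustrative and this is not a built-in TensorFlow metric.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_confidence(batch_probs):
    """Average of the largest class probability over a batch of softmax outputs."""
    return sum(max(p) for p in batch_probs) / len(batch_probs)

def l2_norm(probs):
    """2-norm of one prediction: 1.0 for a one-hot output, smaller when spread out."""
    return math.sqrt(sum(p * p for p in probs))
```

Neither number says anything about whether the confident predictions are correct, so it only makes sense next to accuracy.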
can someone suggest me sources on where i can read about text recognition from an image
online sources would be highly useful
what are you looking to learn about, just how it works?
or are you looking for sources on how to do it in python?
yes
like using pytesseract and opencv
i have some use cases but i dont know how to implement them using codes
so i wanna learn about it
how to find a given sentence in the inputted image?
"you are good" in a image
is there any approach to solve this problem
pytesseract should return a string when run
you can use regex on that string to match with the text you want
text = pytesseract.image_to_string(img)
this will generate string of the text
but the thing is what if the text is complicated?
like some random characters installed in between due to foreign languages
regex can handle that
S) l\infected.html > @) Search, Pr @
Ā„
ka Mail - Knox Portal @iNinfected.htmi
You are infected!
om | O Jype here to search t F g A , AIC O Bl F va 4
ohh
what are you searching for in this string
what link is this?
rubular is a website that lets you test regexes in real time
so you don't have to run a python script every single time you want to tweak your regex
ohh
how the code in python looks like for using this regex?
how to comment these selected lines
at once
should we have to # all the time for each lines?
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
ohkkk
test = pytesseract.image_to_string(img)
if(test.find("You are infected!")!=-1):
    print("Match Found")
else:
    print("Match not Found")
i got the code without using regex
no b is just an iterable
yeah if you know there won't be any letters in between you don't need regex
but what regex can do is detect strings like this
```
you a8re in4fesecte$d
```
if you run into that problem, remember that regex is the solution
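A minimal sketch of what such a regex could look like in Python. `fuzzy_pattern` is a hypothetical helper that allows a couple of stray characters between the letters of each word; the limit of 2 is an arbitrary choice.

```python
import re

def fuzzy_pattern(phrase, max_noise=2):
    """Build a regex matching `phrase` with up to max_noise stray characters between letters."""
    gap = ".{0,%d}?" % max_noise
    words = [gap.join(re.escape(ch) for ch in word) for word in phrase.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)

pattern = fuzzy_pattern("you are infected")
noisy = "you a8re in4fesecte$d"
found = pattern.search(noisy) is not None
print(found)
```

The looser the pattern, the more likely it is to match text that only resembles the phrase, so `max_noise` is worth keeping small.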
but while doing image to text, why will some random letters come inbetween
ocr, like all machine learning, is probabilistic
the computer just makes educated guesses
sometimes those guesses are wrong
yeah that's a good source to learn regex
@plush jungle I have a followup qn too
how to find the coordinates of the box enclosing the sentence You are infected!
```py
import pytesseract
from pytesseract import Output
import cv2
img = cv2.imread('image.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
```
something like this
ohh i ll try once
that was quick. Did you have that ready?
that's my secret captain, I always copy paste from stackoverflow answers
(or copy/pasted from a sample?)
some people are pro in searching questions in stackoverflow
They are called senior engineers
i see
btw, recursive, I know this is off topic for this channel but do you have any idea why this is giving me strange values
```py
target = (math.cos(math.radians(self.angle)), math.sin(math.radians(self.angle)))
```
if self.angle is 90
it should give (0,1)
but I'm doing math.radians
but it's not even close
it's giving me
(6.123233995736766e-17, 1.0)
oh wait
that is close
In [8]: (math.cos(math.pi / 2), math.sin(math.pi / 2))
Out[8]: (6.123233995736766e-17, 1.0)
yeah, looks like a float representation issue
In [9]: (math.cos(math.pi), math.sin(math.pi ))
Out[9]: (-1.0, 1.2246467991473532e-16)
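Since values like `6.1e-17` are effectively zero, comparisons are usually done with a tolerance rather than `==`. A minimal sketch with the standard library; the tolerance `1e-9` is an arbitrary choice.

```python
import math

angle = 90
target = (math.cos(math.radians(angle)), math.sin(math.radians(angle)))

# Compare with a tolerance; abs_tol is needed when the expected value is 0.
is_up = math.isclose(target[0], 0.0, abs_tol=1e-9) and math.isclose(target[1], 1.0)

# Or round away the noise before using the vector.
clean = tuple(round(v, 9) for v in target)
print(is_up, clean)
```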
```py
self.x += self.target_vector[0]/100
self.x += self.target_vector[1]/100
```
I need to lay off the copy pasting
anyway, this is tangentially related to machine learning
cause I'm making a reinforcement learning bot
numerical precision does matter even in ml
by reverse engineering a flappy bird reinforcement learner
and retrofitting it for a top down pygame shooter
that would be an interesting blog post
yeah if I get it working I might do that
even if you do not.
Learning from failures is as valuable as (if not more valuable than) learning from success
amen to that
there is a demotivator about it too
(not a meme channel but https://despair.com/collections/demotivators/products/mistakes?variant=4376100306965)
can you send me the link of this stackoverflow site?
yeah I mean unironically it's something to be proud of if you think about it
marie curie discovered both radium and the fact that being around radium kills you
it's a bit too late here to get into these kind of debates
sorry, I tend to wax philosophical in the late hours of the night
np, it's still an interesting question
hi guys, so I am stuck in a problem, and found something that might help me out on stackoverflow
.
my question is, instead of integers, can I use a range?
like [100-105, 110-115, 120-125]
.
for some reason it gives me a FutureWarning when I do it on date
var = (df.set_index('date').groupby("user")).rolling('14D')
it doesn't throw the warning if I set the index to the date 🤷
@urban lance try giving a minimal reproducible example that can be copied and pasted exactly.
are there any generally reasonable ways to run a correlation analysis between 2 columns when there are already 100,000 rows to use?
If you gave the points like 20% opacity, it would be easier to visualize the density of them
But a priori, I do not see any strong correlations in that plot, lol!
it would be difficult to imagine a weaker correlation
Without better density information, it's hard to say. Maybe there's a very dense straight line in the middle, with lots of outliers
or, maybe there isn't 
there's really not a lot i can work with to make the density clearer, so would "there is no correlation between Age and the Final Score" suffice?
unless i do something like "check if age > insertNumberHere, use row". if that's possible, would that be fine too?
I mean, you can compute the correlation coefficient easily. It's gonna be small. And then you can say there's no useful (linear) correlation.
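a rough sketch of what computing the coefficient could look like. the data here is synthetic and made up (independent random ages and scores), and the `pearson_r` helper is written out by hand only to keep the example dependency-free; with a real DataFrame, `df["Age"].corr(df["Final Score"])` or `numpy.corrcoef` gives the same number.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic stand-in for the two columns (assumption: 100,000 unrelated rows).
random.seed(0)
ages = [random.randint(18, 60) for _ in range(100_000)]
final_scores = [random.uniform(0, 100) for _ in range(100_000)]

r = pearson_r(ages, final_scores)
print(f"Pearson r = {r:.4f}")
```

an r close to 0 only rules out a linear relationship; it says nothing about other kinds of dependence.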
This is a top-down view of my surface plot. The z axis shows the speed of fluid flow. The structure is a big vial full of water (big circle) with a tube going through the middle (small circle). I'm trying to figure out how to extract the velocity from the middle and find the peak velocity, ideally without having to manually label which pixels to grab the values from because I have a lot of these images and the position of these things in the x-y plane changes slightly. Any ideas?
Hi, has anyone here used SciPy before? What are some example projects that one can build with SciPy? I'm trying to get a better understanding of it. Also, what is the difference between SciPy and NumPy? Thank you!
scipy is mostly scientific routines (statistics, optimization, signal processing, linear algebra) that you can apply to numpy arrays
numpy is pretty much at the foundation of everything
that said, there aren't projects you can do specifically in terms of one data science library
it's not like "build a website with django".
Hi @serene scaffold ! Ok, thank you! So with SciPy it sounds like there are things you can do with the data? Like if you did some speech recognition stuff and get a text transcript back. Then are there things one can do with SciPy with that text?
> Like if you did some speech recognition stuff and get a text transcript back
no, you can't do that with scipy. scipy is for doing math.
> Then are there things one can do with SciPy with that text?
probably not. try spaCy.
I decided to not use rolling anymore
Does anyone have experience with Google's MT5 text model?
I have a df with session IDs, I'd like to group information by session to pass through a function but each group has to have the rows of previously processed groups as well. Is groupby able to do this?
pandas doesn't effectively support iterative operations where previous iterations matter.
remember to always ask your actual questions. don't try to filter people by what they think they know before you've said what you really need help with.
how can I use Shapley values to design utility and payoff for multi agent reinforcement learning?
how can I exactly predict tomorrows stock price? Trynna get a bag quick
what do you mean "trynna get a bag quick"?
it sounds like you might have unrealistic expectations. you can't predict the future, let alone exactly. you can only forecast it.
You can pass a class instead of a function to aggregate the groupby, that way you store the intermediate results in the class
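one possible sketch of carrying earlier groups forward. this uses a plain loop over the groups rather than the class-based aggregate suggested above, since it is the simplest way to show the idea. the column names `session` and `value`, the sample rows and the `summarize` function are all hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical data: "session" and "value" are assumed column names.
df = pd.DataFrame({
    "session": ["a", "a", "b", "c", "c"],
    "value": [1, 2, 3, 4, 5],
})

def summarize(rows: pd.DataFrame) -> int:
    """Placeholder for the real per-session function."""
    return int(rows["value"].sum())

results = {}
seen = []  # rows of every session processed so far
# sort=False keeps sessions in order of first appearance.
for session_id, group in df.groupby("session", sort=False):
    seen.append(group)
    cumulative = pd.concat(seen)  # this session plus all earlier ones
    results[session_id] = summarize(cumulative)
```

re-concatenating on every step is quadratic in the number of rows, so for large frames it may be better to keep running state instead of the raw rows.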
Hello everyone, may I ask, do you have any references about RecommenderNet algorithm?
it means he's tryna get a bag quick stel
well, if any of us could exactly predict the stock market, a few of us would be rich and we wouldn't hang out in this Discord 
any machine learning tool you have access to, wall street investment bankers also have access to. if there was a way to predict stocks accurately, they'd still be richer than you because they'd use the same tool but with better data and more expertise
and more seed capital
deep Q learning is short term, right? it only ever looks at which actions have immediate benefits given a current state?
so it's not going to be able to learn patterns that involve longer delays between the action and the reward?
I'm trying to repurpose this deep Q learning code that teaches a bot to play flappy bird and have it learn to play a top down shooter game
the blue dot tries to shoot the red dot by deciding to change the angle of its laser sight, do nothing, or shoot
it's 186,000 turns in, and it's really not getting noticeably better
the code that updates the neural net's weights is as follows:
```python
minibatch = random.sample(replay_memory, min(len(replay_memory), model.minibatch_size))
# unpack minibatch
state_batch = torch.cat(tuple(d[0] for d in minibatch))
action_batch = torch.cat(tuple(d[1] for d in minibatch))
reward_batch = torch.cat(tuple(d[2] for d in minibatch))
state_1_batch = torch.cat(tuple(d[3] for d in minibatch))
# get output for the next state
output_1_batch = model(state_1_batch)
# set y_j to r_j for terminal state, otherwise to r_j + gamma*max(Q)
y_batch = torch.cat(tuple(reward_batch[i] if minibatch[i][4]
                          else reward_batch[i] + model.gamma * torch.max(output_1_batch[i])
                          for i in range(len(minibatch))))
# extract Q-value
q_value = torch.sum(model(state_batch) * action_batch, dim=1)
# PyTorch accumulates gradients by default, so they need to be reset in each pass
optimizer.zero_grad()
# returns a new Tensor, detached from the current graph, the result will never require gradient
y_batch = y_batch.detach()
# calculate loss
loss = criterion(q_value, y_batch)
```
I don't entirely understand what y_batch and q_value are, but as far as I can tell, nothing in this does anything that would track the long term benefits of a move
which means if it takes 50 turns for a bullet to reach the target, the model will never learn how to aim
No.
I'm not sure I understand how it makes long term connections between an action (like firing a bullet) and a delayed reward (like the bullet hitting its target 50 moves later)
this minibatch code is the only part where it does gradient descent, so somewhere in the code I posted must be the long term learning you're talking about
could you give me a hint as to how this works?
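for reference, a plain-Python reading of what the `y_batch` line in the posted snippet computes for a single transition. the `td_target` helper and the numbers are made up for illustration; `gamma` stands in for `model.gamma`, and `next_q_values` for one row of `output_1_batch`.

```python
def td_target(reward, next_q_values, terminal, gamma=0.99):
    """Target y_j: r_j if the episode ended, else r_j + gamma * max_a Q(s', a)."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

# A step with zero immediate reward still gets a nonzero target
# when the network already rates the next state highly.
y = td_target(reward=0.0, next_q_values=[0.2, 0.9, 0.1], terminal=False)

# On a terminal step the target is just the reward.
y_terminal = td_target(reward=1.0, next_q_values=[0.2, 0.9, 0.1], terminal=True)
```

`q_value` in the snippet is the network's current estimate for the action that was actually taken, and the loss pulls it toward this target. the `max` over the next state's values is where information about later rewards enters.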
how many images are good for an ML database of dogs?
also, if im trying to detect something, do i need a database of stuff that is what im trying to detect and a database of stuff im not trying to detect?
if that makes any sense
depends on which model you are using, what is the purpose of the model, and which kind of pictures you'll feed it later and probably a few dozen other factors I do not even know
if you want to accurately identify all dog breeds, from any angle, and tell apart not-a-dog as well, that 1.000.000 joke might not even have been all that far-fetched
if you just want to tell if a picture of a front-facing dog is a Shiba Inu or a Chihuahua, a few dozen or a few hundred would suffice
Do some q-learning by hand with a q-table in a small simple maze (such as a T-maze).
"not a dog" can be literally anything, or just one specific kind of thing?
like if i send an image of a house it should say no dog detected
yk?
disclaimer: I have never personally worked with classifying images
you may be able to make it work using a HuggingFace or fast.ai pre-trained model and potentially fine-tune it on the kinds of dogs your data will actually include, but it might be trickier than it sounds
that said, if you want to do it with your own dataset, without using a pre-trained model, I don't really have any ideas of how to help other than "good luck"
I'm doing a time series model based off the collapse of WireCard
the model is based on the stock prices for that time
my graphs are looking a little fucked though
so i'm not totally sure what to do with it
unsure what i'm doing wrong, but the graph is real....wonky looking
Yall have any good books you recommend for DSP/data science? @ me since I turn off all notifications lol
I'm trying to understand the code and concepts behind Q learning, as explained here
but I'm stuck on how the Q learning algorithm predicts future payoffs, not just the payoffs that will occur at t+1
it uses replay memory, and randomly selects 32 previous examples of turns
but in that replay memory the only information is the state, action, reward and image
there's nothing linking any given turn to its future reward
It seems your goal is to understand Q-learning. Adding deep learning into it is trying to tackle two problems at the same time. Split up the problem into multiple sub problems and do those separately. In this case that is understanding Q-learning without deep learning, and then how deep learning comes into play.
Using tabular methods for Q-learning (RL in general) makes it really obvious, since they can even be done by hand for very simple toy problems.
because the reward is given based off of immediate success or not
well actually wait
with a bigger maze
Follow the Q-learning algorithm for a simple maze and see how the Q-table is updated.
ok so each state has a value associated with each action
and that makes up the table
so each square of the maze that the player could occupy is a state
and eventually the correct path is produced in the table
through rewards updating the table values
Yes. Although it may not take it exactly depending on the choice of exploration vs exploitation.
right, I think I understand that too
but it all falls apart when you go from like 100 states to millions
because in my code, the states are vectors representing the image
I want the agent to learn that firing the bullet will yield a powerful reward, but not immediately. the neural network that influences what action the agent chooses is trained on minibatches
You know what else is not immediate? The reward at the end of the maze. So how does the agent know, when all the way at the start, where to go?
(Not trained vs trained)
because each square in the maze receives a reward based on whether it hit a wall or how close it is to the goal, right?
No reward is given except at the goal state.
oh
so it works backwards then? the square before the goal gets a strong update to the weight for choosing the right action
and then the square behind that gets a stronger weight for the action that gets you to that state?
like because of exploration, eventually the agent will stumble its way to the end
the final square's action weight will be updated, but then what about final square - 1
if reward is only given at the end, how does final square -1 know to update the weight for the action that gets it to final square
since it won't receive a reward for doing so
the tile right before the goal makes sense to me, but "estimate of optimal future value" is the part that confuses me. for s_t-1 how does it calculate that future value?
s_t becomes s_t-1 when it moves to the goal.
They are the same thing.
s_t, s_t+1 or s_t-1, s_t
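the equation being discussed is not in the log, so this assumes it is the standard one-step Q-learning update, with α as the learning rate and γ as the discount factor:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

the term \max_{a} Q(s_{t+1}, a) is the "estimate of optimal future value". it is not computed by looking ahead, it is read from the current table entry for the state the agent just landed in.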
What's DSP
So you did the action that takes you to the goal state s_t+1
how does it know reward at time t+1
But what are you updating according to the equation now?
we just got from final square to goal? I guess we'd update Q?
right
You look things up in it.
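a small sketch of doing this by machine instead of by hand. it uses a 5-cell corridor rather than a T-maze to keep it short, with reward given only on reaching the goal. the layout, the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`) and the `train` function are assumptions for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (assumed layout)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def train(episodes=500, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy on the corridor."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore with probability EPSILON, or when the two values tie.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Walls clamp the move to the corridor.
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            # No reward anywhere except on entering the goal.
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # The goal row is never updated, so its max stays 0.
            best_next = max(q[next_state])
            q[state][action] += ALPHA * (reward + GAMMA * best_next - q[state][action])
            state = next_state
    return q

q_table = train()
right_values = [round(q_table[s][1], 3) for s in range(N_STATES - 1)]
print(right_values)
```

with these settings the "move right" values settle near γ³, γ², γ and 1 from the start cell to the cell next to the goal. only the last cell ever sees a reward directly; every other cell gets its value by looking up the value of the cell it moved into.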








