#data-science-and-ml
yes, you can use merge sort for example, but that's a question better fit for #algos-and-data-structs
I'd be interested in people's opinions on my reddit thread about pandas: https://www.reddit.com/r/Python/comments/kj36vp/type_hints_for_pandas_what_would_we_need_what_can/
huh, markdown didn't work
I have a pandas dataframe which contains spending and the month of spending, what would be the best way to create another dataframe which holds the total spent for each month??
@teal sluice group + sum maybe
or summarise
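A minimal sketch of the "group + sum" suggestion. The column names `month` and `spent` and the sample values are made up for illustration; swap in whatever the real dataframe uses.

```python
import pandas as pd

# Hypothetical spending data; "month" and "spent" are assumed column names.
spending = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "spent": [10.0, 5.5, 3.0, 7.0, 1.0],
})

# One row per month with the summed spending; sort=False keeps first-seen order.
monthly_total = spending.groupby("month", as_index=False, sort=False)["spent"].sum()
print(monthly_total)
```

`as_index=False` keeps `month` as a regular column, so the result is already a new dataframe rather than a Series indexed by month.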
Hi all, has anyone configured a Raspberry PI as a server/VM to run your own scripts via http requests?
I just need some quick guidance
can you show some examples?
ping me if you do.
I can give an opinion about the power of static type checking in such cases
👂
there's something called dependent typing
which basically means that the type of a value depends on another value
this can be used to encode, for example, an array's length in its type
which gives you stronger guarantees regarding whether any particular expression is well-formed
e.g. elementwise addition of unequally sized arrays
however, this is still (relatively) an academic thing
(and is also pretty complex)
so there are some things we probz can't do right now.
and this also raises the question: what do we want to encode in a dataframe's type?
at some point, it might be a question of whether this should be delegated to runtime property checking instead
property checking?
i.e. not encoded within a type parameter
for example
in most languages, division by zero is handled as a runtime error
not as a compile time type error, because there is no NonzeroNumber type
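To make the runtime-check idea concrete, here is a small sketch using the unequal-length addition example from above. The function name `elementwise_add` is made up for illustration. Python's type hints have no way to say "these two sequences have the same length", so the property is checked when the code runs.

```python
from typing import List, Sequence


def elementwise_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    # The hints say nothing about length, so a static checker accepts
    # unequal inputs; the length property is verified at runtime instead.
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return [x + y for x, y in zip(a, b)]


total = elementwise_add([1.0, 2.0], [3.0, 4.0])

try:
    elementwise_add([1.0, 2.0], [3.0])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(total, mismatch_error)
```

A dependently typed language could reject the second call before running it; here the same guarantee only shows up as an exception.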
there's also the question of axis alignment
if two dataframes have the same column names in a different order
are they the same type?
different column names, but same data types?
that's a point I alluded to in my post. Some use cases may depend on column order, others may not.
my current thinking is that there would be power in having a language for documenting what properties a dataframe has, even if the linter can ultimately only assume that if a function returns a SomeDataFrameType, that object is a valid argument for a function that takes a SomeDataFrameType.
I have another question. I have moved to colab since it has gpu acceleration (better than mine for sure) and i uploaded all my images to drive (1 hour it took). Now, i need to append all the images to an array, to give it as input to my nn. But colab takes like tooooooooooo long to append all the images. Any suggestion?
yup, precisely
but that would still be a type parameter
and it would be irrelevant for other functions
so if you wanted to spec this out
you would need to think about whether there's a practically viable type hierarchy
that can encode the necessary information
how are you doing it
```py
for pok in pokemons:
    path = os.path.join(datadir, pok)
    images = os.listdir(path)
    amount = len(images)
    for i in range(amount):
        print(f'Doing {pok}. {amount - i} remaining images')
        img_array = cv2.imread(os.path.join(path, images[i]), params['color_mode'])
        new_array = cv2.resize(img_array, params['dimensions'])
        if i < amount * params['percentage']:
            train_data.append(new_array / 255)
            train_label.append(pok)
        else:
            valid_data.append(new_array / 255)
            valid_label.append(pok)
```
i open the image with opencv, i resize it, and i append it to an array (input for later)
wait.
what is train_data?
where is it defined
show.
idk how to copy paste code from colab. it is on different cells. wait
```py
os.chdir('/content/drive/MyDrive/Colab Notebooks/Python ML/Pokeguesser')
train_data = []
train_label = []
valid_data = []
valid_label = []
data_dir = 'dataset'
pokemons = os.listdir(data_dir)
dimensions = (71, 71, 3)
batch_size = 126
num_epochs = 12
percentage = 0.8
```
```py
for pok in pokemons:
    path = os.path.join(data_dir, pok)
    images = os.listdir(path)
    amount = len(images)
    for i in range(amount):
        img_array = cv2.imread(os.path.join(path, images[i]), cv2.IMREAD_COLOR)
        new_array = cv2.resize(img_array, dimensions[:2])
        if i < amount * percentage:
            train_data.append(new_array)
            train_label.append(pok)
        else:
            valid_data.append(new_array)
            valid_label.append(pok)
```
it's not an array
it's a list
it's important to be clear on this
well, sorry if both are different. For me are the same ^^'
an array from java is a list on python
thats why sometimes i call them array
when you are working with numpy
numpy.ndarray is what is normally called an "array"
and because the semantics are different
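A quick sketch of why the list vs. array distinction matters. The zero-filled images are stand-ins for the resized OpenCV images, and the (71, 71, 3) shape is borrowed from the `dimensions` value shown earlier.

```python
import numpy as np

# Stand-ins for resized images; a plain Python list of ndarrays.
images = [np.zeros((71, 71, 3), dtype=np.uint8) for _ in range(4)]

# Stack the list into a single ndarray with a leading batch axis.
batch = np.stack(images)
scaled = batch.astype(np.float32) / 255.0

# Same operator, different semantics:
list_sum = [1, 2] + [3, 4]                        # lists concatenate
array_sum = np.array([1, 2]) + np.array([3, 4])   # arrays add elementwise

print(batch.shape, list_sum, array_sum)
```

Appending to a list inside the loop and converting once at the end is usually fine; the conversion step is where the list becomes something a model can consume as one array.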
okey okey
how many images?
26k
how big is each image?
mmm there are different sizes
1.14 gb
i think i am not explaining well, wait
not rlly, at least on local machine
hold on 1 sec
sorry for a gif, cant think of a different way to show
this is on my local computer
but it's very possible that there needs to be transfer over the wire
from Drive to Colab
which would make it much slower
here you can see
that loading is much slower
and
okay, simple way to show if this is true or not
oh
img_array = cv2.imread(os.path.join(path, images[i]), cv2.IMREAD_COLOR)
this is the line that loads the images
so colab doesnt actually have my images directly?
include a print before and after
to see how long it takes to load
IO should be the primary bottleneck here
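A sketch of the "print before and after" check, using a small timing helper. `fake_load` is a placeholder standing in for the `cv2.imread` call, and `timed` is a made-up helper name.

```python
import time


def timed(label, func, *args):
    # Measure wall-clock time around a single call.
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed


def fake_load(path):
    # Placeholder for cv2.imread(path, cv2.IMREAD_COLOR).
    time.sleep(0.01)
    return path


result, elapsed = timed("load", fake_load, "some/image.png")
```

If the load step dominates the resize step by a wide margin, that points at IO (here, reading from Drive) rather than at the Python loop itself.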
this is what I found after a quick search
It takes forever to copy files from Drive to Colab. While this is no problem when dealing with very small datasets, it’s very annoying when facing larger data, for example for image classification.
you said your data was in Drive
right?
yeah, but idk why i thought linking drive to colab would make like a copy on colab side
i will try that, one sec
idk if i fcked up but
!cp -r "{data_dir}" ~
will copy the folder on root?
cuz i am trying to avoid zipping the images and uploading again to drive
if this doesnt work i will do it tomorrow
not to distract from the help that's happening, but now I'm wondering: is the only runtime optimization for numpy that it does iterative operations in C, or can it also secretly run independent operations in parallel?
huh
they're run with SIMD
stuff like elementwise addition is run in parallel
with aforesaid SIMD
uh
btw. Does colab index files in a different way? my subdirectories are named like 001_name, 002_name, 003_name and so on
But when i do os.listdir it returns some weird sorted list
the first item is the 083
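On the ordering question: `os.listdir` returns entries in arbitrary order on any platform, so this is not specific to Colab. A minimal sketch with throwaway directories (the names are invented) showing that sorting the result gives a stable order:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create the folders out of order on purpose.
    for name in ["083_c", "001_a", "002_b"]:
        os.mkdir(os.path.join(root, name))

    # listdir order is not guaranteed, so sort explicitly.
    classes = sorted(os.listdir(root))

print(classes)
```

Zero-padded prefixes like `001_` sort correctly as plain strings, so `sorted()` is enough here.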
{
"server1":
[
"id":
[
"s1",
],
"channel1":
[
"c1",
],
],
"server2":
[
"id":
]
"s2"
],
"channel2":
[
"c2",
],
],
},
so i got this json
```py
import json

with open(r"D:\Heres\Bots\Messager\Files\saves.json") as f:
    data = json.load(f)
server1 = data["server1"]["id"]
channel1 = data["server1"]["channel1"]
server2 = data["server2"]["id"]
channel2 = data["server2"]["channel2"]
print(server1, channel1, server2, channel2)
```
and this py
and for some reason its not working
may someone help? i am doing lotsa stuff, may ya ping me if u can help 🙂
what is the error you're seeing?
either way, you can't access id and so on since the json data structure isn't a nested dict, you have a list
so you'll have to do data["server1"][0]
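A sketch of the difference, under an assumption: the layouts below are hypothetical rewrites of the file, not the original. If each server maps to an object (`{}`), key lookups like `["id"]` work; if it maps to a list (`[]`), you index by position first.

```python
import json

# Assumed layout 1: each server is a JSON object, so it loads as a dict.
raw = """
{
  "server1": {"id": ["s1"], "channel1": ["c1"]},
  "server2": {"id": ["s2"], "channel2": ["c2"]}
}
"""
data = json.loads(raw)
server1 = data["server1"]["id"]
channel2 = data["server2"]["channel2"]

# Assumed layout 2: each server is a JSON array, so it loads as a list
# and needs a positional index before any key lookup.
as_list = json.loads('{"server1": [{"id": ["s1"]}, {"channel1": ["c1"]}]}')
first_entry = as_list["server1"][0]

print(server1, channel2, first_entry)
```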
fixed the issue ty
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'numpy.ndarray'>"}), (<class 'list'> containing values of types {"<class 'int'>"})
Can someone help me fixing this error?
How much math is needed for Data Science?
calculus, linear algebra and statistics
Ty
hey guys, I'm looking for a startup idea on AI.. If you have some good ideas do tell me..
u could help me doing a nn that recognizes pokemons 😄
is AR a big thing in future?
I think yes it is
So do i
probably a noob question but i have a correlation matrix, how do i extract the highest pairs, as well as what that pair is? for example, that the correlation between x and y was .7? most of the methods I am seeing show the correlation number, not what the two variables are
basically i want to extract the highest values from a matrix and what the two variables are
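One possible approach, sketched with invented data: keep only the upper triangle of the correlation matrix (so each pair appears once and the diagonal is dropped), then stack it into a Series whose index holds the two variable names.

```python
import numpy as np
import pandas as pd

# Invented data; y is an exact multiple of x so their correlation is 1.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 1, 3, 2],
})
corr = df.corr()

# True above the diagonal only: each pair once, no self-correlations.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Index becomes (variable_1, variable_2); values are the correlations.
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(pairs)
```

`pairs.index[0]` then gives the names of the most correlated pair and `pairs.iloc[0]` its value. Sort on `pairs.abs()` instead if strong negative correlations should count too.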
Hi guys, I don't suppose anyone understands this and can help me get a solution out?
I've been looking at this for hours, inspecting it with debugging tools trying to find the relationship between the input and output
(base) C:\Users\siebe>conda install tensorflow-gpu
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
it keeps doing this. I also tried install only tensorflow (without the gpu)
I have an RTX 2070 Super GPU from Nvidia (MSI)
Hey guys, i'm a bit confused with jupyter notebook and anaconda, basically just wanted to know: can i use jupyter notebook as an independent desktop application, or does it only run in the browser or with anaconda?
Just if someone is interested in Data Distillation https://dasha.ai/en-us/blog/data-distillation
Mehn, I'm still struggling with my Tensorflow installation. I've uninstalled Python 3.8, I even created a virtual environment and stuff, installed just TF in it but I'm still getting issues.
It's saying, DLL runtime error: can't load Tensorflow runtime bla bla bla. But I figured out it has to do with my PC, it doesn't have a GPU, that's why. I successfully installed Tensorflow but it can't run without a GPU.
I'd try to get another PC January to resolve this.
Can anyone help me with just what algorithm or structure to use on this problem?
https://open.kattis.com/problems/bokforing
My answer is way too slow, although I use python I still know I don't have the right answer. I have one solution to it that would work I guess, but that would be kinda cheating and I want to solve it the right way, any thoughts how to speed up the solution?
sort of a dumb question, but I can't find the answer to this: how do I read an exponent in this format from mpl? 1e-12+9.9995833333 (other than 'really small' 😉 )?
read as in, interpret
use regex
I want to find out certain metrics in my model. I have binomial distribution of daily pattern and I have average rate of daily metric. How do I find out the rate at particular time ? ( Basically multiplying binomial curve with average should reflect the distribution of data for certain ranges ) Is there any utility to do such kind of analysis ?
How can i use ImageDataGenerator to fit my model after? It is complaining idk why
It's more likely you're missing something else. TF shouldn't require a GPU
Feelssadman I’m doing python at home on this christmas day :/
I've been battling this since yesterday. I directly installed it using pip install tensorflow. I just learned that there are different installation packages based on your needs. I would try installing the one that doesn't require GPU, just CPU installation.
Also, I just installed Pytorch and it's working fine.
If TF doesn't work on my PC, I'll just go with PyTorch.
Hello everyone,
I am trying to make a web scraper for the Fortune 500. I was thinking of using Scrapy but I can do well with BeautifulSoup.
When I make a get request (and print the soup itself) I end up with useless information named DNS Prefetch and no relevant info from the page. Any idea how I could get around it?
Thanks a lot!!
guys what's a better way to show lots of graphs in one chart?
below is what I did
this is so messed up
Each line graph shows a historical price of certain good for 5 years
I want them to be in one graph but is that even viable to make it look better than this :/
x is time, y is price btw
why do you want to show that many stuff in one graph
is it how it is normally done?
basically that graph is to compare housing price differences between cities
I have no other better thought
showing some of them makes no sense to me
Any advice is highly appreciated!
@lapis sequoia you're visualizing a lot of data
generally, line plot is the best one if you want to compare real estate prices
but this is surely not looking good
you can try to take the mean values of different cities and then try plotting a bar chart
where one bar will represent the mean price of house in that city in a given year
thanks for the advice, let me try the other way
Guys will heroku charge me for my add-ons like MySQL, i have my credit card info registered that's why
||sorry if offtopic||
I used LabelEncoder from sklearn to transform my labels into something valid for keras. But once i apply the label encoder, i get 1 list of train_data length
And i think keras needs a matrix
Yesterday i downloaded cifar10 dataset to see what is has. x_train was a ndarray of 50k of images (ndarrays too). But y_train.shape was (50k, 10) cuz 10 classes. I printed what was y_train[0] and it was a list full of zeros except one
On my case, y_train is just 1 list where y_train[i] is the class x_train[i] belongs to
But model.fit doesnt accept this
nvm, i fixed it
hello i need some help for groupby().get_group() : https://paste.pythondiscord.com/ivofaremos.rb
import pandas as pd
# dataframe "ind": returns since 1926, shape (1110, 30)
ind = pd.read_csv("ind30_m_vw_rets.csv", header=0, index_col=0)/100
ind.index = pd.to_datetime(ind.index, format="%Y%m").to_period('M')
ind.columns = ind.columns.str.strip()
# time series correlations over a 36 month rolling window: shape (33300, 30)
ts_corr = ind.rolling(window=36).corr()
ts_corr.index.names = ["Date", "Industry"]
ts_grby = ts_corr.groupby(level="Date")
ts_grby.get_group("2018-12")
KeyError Traceback (most recent call last)
<ipython-input-8-484dc0e2c324> in <module>
----> 1 ts_grby.get_group("2018-12")
~\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in get_group(self, name, obj)
808 inds = self._get_index(name)
809 if not len(inds):
--> 810 raise KeyError(name)
811
812 return obj._take_with_is_copy(inds, axis=self.axis)
KeyError: '2018-12'
I am using Xception to try transfer learning. But i am doing something wrong cuz val_acc is 0.007. I followed this link https://keras.io/guides/transfer_learning/ but idk. How can i know what is going wrong?
Hello!
I have a dataset with Quora questions and another dataset with individuals' preferences (ranked from 1 to 5) in which each row represents a different individual.
I would like to match each question with a group of individuals whose preferences may match the topics I obtained with lda modelling. The problem is that I don’t know how to exactly do this…
I don't know what answers I need to to look for... I don't know what to google to find out the magic key! Please help!
What do you think? What would you advise me to look for, or how do you think I should approach this?
Thanks so much in advance! And Merry Christmas!! 🎄
How to optimise the local pyspark so it would run the fastest on the work laptop? I need it to run tests, but they are taking waaay too long.
@lapis sequoia what does your quora dataset contain? Was it text responses to various questions?
This sounds like a question recommendation system based on individual preferences
Hello everyone
Yo guys,
So I first created a space invaders game with a friend and then we tried to add a neat ai which kind of worked but it's not really learning anything. If there is someone that might wanna hop into vc and look at the code and maybe help us make it more efficient and learning, that would be awesome :) if there is someone just DM me!
Not sure if I am right here or at #game-development
Wouldn't it be better if you just run it on AWS free tier?
Hi Guys, can somebody help me? Does anybody know how to get data from OSIsoft PI with R or python?
I have a pandas dataframe that looks like this, and I'm trying to explode each list into a new row (so I would have a shape of len(list1) * len(list2) * len(list3) rows x 377 cols. The code I'm using to do this is
for column in df.columns:
df[column].explode()
but this does literally nothing. Anyone know how this might be fixed? full code here: https://hastebin.com/pifoseripo.properties
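One likely reason the loop above has no visible effect: `explode` returns a new object and the result is never assigned back. A sketch with a tiny made-up frame (two list columns instead of the real 377-column one) that explodes column by column and keeps the result:

```python
import pandas as pd

# Tiny stand-in: one row, two list-valued columns.
df = pd.DataFrame({"a": [[1, 2]], "b": [[3, 4, 5]]})

exploded = df
for column in df.columns:
    # Assign the result back; explode does not modify in place.
    exploded = exploded.explode(column).reset_index(drop=True)

print(exploded)
```

Exploding one column at a time multiplies the row count each step, so this gives `len(list1) * len(list2)` rows (6 here), which matches the cartesian-product shape described in the question.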
Hi Tommy! The Quora dataset contains more than a million non-duplicate questions from Quora. You are totally right! The ultimate goal is indeed a question recommendation system based on individual preferences!
hey
anyone knows a tutorial to learn how to save a neat module? or some docs?
caue when i train my ai few hours for a game i would like to save it so it doesnt have to start from zero
@nova smelt hope this helps
Hey guys, I'd like to increase my knowledge about scientific python and also dangers that come with machine learning. For my university, I am asked to write a paper (it's going to be desk research and I want to state my thoughts on a topic that is controversial, so there is room for critical thinking). Therefore, I was wondering if you guys have any book recommendations? I don't need to get into a hands on how-to right away, but something that takes you by the hand and explains the depth of the scientific data world 🙂
Uh, thanks? I guess
Hi, How can I merge two data frames where
df1 has index "Key0"
df2 has indexes ["Key1", "Key2", "Key3"]
for each row ["Key1", "Key2", "Key3"] might contain "Key0"
I came up with a solution using apply but it is really slow...
My Solution
def matchMerge(x, key, df, keys):
    for key in keys:
        try:
            x.update(df.loc[x[key]])
        except:
            ...
df1.apply(matchMerge, key="Key0", df=df2, keys=["Key1", "Key2", "Key3"], axis=1)
is there a way to do this with merge?
pd.merge(df1, df2, left_on="Key0", right_on=["Key1", "Key2", "Key3"], how="outer")
# throws indexes must have same length
hi
so
uhh just learning it for now and create some projects
with it
and then prolly might use it for game deving
later
for making stuff like traffic in cities and stuff
oh
like not making cars and stuff crash with each other
oh
uhh
idk
e
idk much about groups and stuff
:c
uhh yeah
you can say that
e
e damn i was thinking impossible stuff them
alr
tru
yeah yr
alr
thanks
damn thanks a lot for ur time
alr
kk
oh
then it must be good
Hey guys. I am using Xception with weights of 'imagenet' as a pretrained model. I am freezing it according to https://keras.io/guides/transfer_learning/ but after training, my model val_acc is 0.007. Any idea of what could be wrong?
Hey, is anybody working here with PySpark ?
```py
soup = BeautifulSoup(r.content, 'html.parser')
find = soup.find_all('img')
```
output
<img alt="blablabla" data-src="linkhere.jpg" height="451" src="anotherlink.jpg" width="300"/>,
How i can specifically select "data-src"
How do you create an XPATH expression into a new HTML file that lives inside an iframe?
@trim oar It is meant to be eventually deployed in Azure ecosystem, but it doesn't solve the problem with local tests.
Guy what layers may i add to my pretrained model (Xception) if i wanna do transferlearning?
Like, this is what i have
```py
base_model = keras.applications.Xception(weights='imagenet',
                                         input_shape=dimensions,
                                         include_top=False)
base_model.trainable = False
inputs = keras.Input(shape=dimensions)
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(len(pokemons))(x)
model = keras.Model(inputs, outputs)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
```
But it seems not to be training at all
I want to use a supervised ML model for some grade prediction, so I have my train data
X: All 6 mock results
Y: Final grade
And all my train data has all the fields filled fine
But when a user wants to use the app, they may not have all 6 mock results, so can I predict their final grade on less such as 3 mocks?
(Plan on using scikitlearn)
(Please ping me if you reply!)
Hi guys, Im new to ML.
My question is, that if the model is trained on normalised or standardised data, we also need to normalise or standardise the data when the model is in production?
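On the normalisation question: the usual practice is to compute the scaling statistics on the training data only and reuse those same numbers on anything the model sees later. A minimal sketch with invented values (`standardise` is a made-up helper name):

```python
from statistics import mean, pstdev

# Invented training feature values.
train = [10.0, 20.0, 30.0]

# Statistics come from the training data only and are stored with the model.
mu, sigma = mean(train), pstdev(train)


def standardise(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


train_scaled = standardise(train, mu, sigma)

# In production, reuse the stored mu/sigma; do not recompute them on new data.
production_scaled = standardise([20.0, 40.0], mu, sigma)
print(train_scaled, production_scaled)
```

With scikit-learn the same idea is typically expressed by fitting a scaler on the training set and calling only its transform step at inference time.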
Hello guys, I am trying to improve my oop skills and ml skills
So I am trying to write ml algorithms but I kinda need some guidance
Do you know any resource that gives you steps for this kind of things
greetings all, as regards NLP what tool do you recommend to create text annotations, other than carving it by hand.
Yes.
Has anybody tried the Faster RCNN implementation
same question..?
@misty rivet where is the solution though?
Bro i don't know..😂, I'm new here...!
wait until someone experienced replies to us @soft salmon
If I get the following loss of 15813.8125 from an mse function do I need to find it's square root to know it's actual loss or is that the loss already
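that is the loss already. mse is the mean of the squared errors, so you only take the square root if you want the rmse, which is in the same units as your target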
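A small sketch of the MSE vs. RMSE relationship, with made-up predictions. The `mse` helper is written out by hand here rather than taken from any particular library.

```python
import math


def mse(predictions, targets):
    # Mean of the squared differences between predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


loss = mse([2.0, 4.0], [0.0, 0.0])   # (4 + 16) / 2
rmse = math.sqrt(loss)               # back in the units of the target

# The value from the question, converted the same way.
reported_rmse = math.sqrt(15813.8125)
print(loss, rmse, reported_rmse)
```

So an MSE of 15813.8125 corresponds to a typical error of roughly 126 in the target's own units.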
I have, with detectron2 from facebook
No I need it with Tensorflow
Actually my program is showing some bad outputs
also the mAP is about 23%
?
what kind of dataset?
playing with deep learning need at least 5000 images if you're working on image detection
yea
my train split have about 4300 images
in total its about 7800 images
actually this is the issue
wow thats some bounding box issues no wonder your mAP is quite low
did you try offline augmentation?
means ?
try to optimize your parameters, double check your ground truth, augment your training dataset so you will have more data
You can find great material on YouTube, Udemy, Coursera, something like https://www.udemy.com/course/datascience/, https://www.coursera.org/browse/data-science, https://m.youtube.com/watch?v=ua-CiDNNj30 the last link is awesome for beginners! @soft salmon @misty rivet
this is it
i'm trying to add another column to my dataframe
this is what i am currently doing
if first:
    first = False
    df = pd.DataFrame([stock, tempdf.iloc[:,3]])
else:
    print(stock)
    df[stock] = tempdf.iloc[:,3].tolist()
but it adds it as a row
how do i get it to add the 3rd column to the stock?
what is stock?
its a string
sorry should have specified it
the 3rd column of tempdf is integers
and the rows are indexed by datetimes
sorry wrong code, this is the only thing thats working
if first:
    first = False
    df = pd.DataFrame([stock, tempdf.iloc[:,3]])
else:
    print(stock)
    tempthing = tempdf.iloc[:,3].tolist()
    df[tempthing] = stock
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
okay maybe you can give me a bit more context
as to what you're trying to do
it's `, not '
right
ya same as tilda
yup
basically, i'm trying to put the closing prices for a bunch of different stocks at different dates into a pandas dataframe
i'm doing this by looping through a list of names (stock), and trying to add them one by one to the dataframe
probably not a good idea
every time you "add" a column
you in fact create a new DataFrame
what's the source?
of the data
you have other programming experience?
yes
it returns it in a pandas dataframe
so pandas has this concat function
yes, ive used it
oh it's a Python library wrapping the API?
so I'm guessing
you get a bunch of DataFrames from the API
and you want to combine subsets thereof
in a specified manner?
yes
okay so let me just get this right
you have, say, 10 DataFrames, and each has a 'output' column, and you want to take that column from each and combine them into one big DataFrame
is that right
pd.concat([df['output'] for df in dfs], axis=1)
thank you!
ya, i'll need to do a little restructuring of my code real quick
so dfs is the iterable of all your source DataFrames
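A sketch of the concat approach with made-up data. The ticker names `AAA` and `BBB` and the prices are invented; the `Close` column name is assumed from the later messages. Passing a dict lets the keys become column names, which avoids ending up with several columns all called `Close`.

```python
import pandas as pd

# Shared timezone-aware index; all frames should agree on tz-awareness.
index = pd.date_range("2020-12-21 00:27", periods=3, freq="min", tz="UTC")

# Hypothetical per-stock frames; BBB is missing the last timestamp.
frames = {
    "AAA": pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index),
    "BBB": pd.DataFrame({"Close": [10.0, 20.0]}, index=index[:2]),
}

# One column per stock, aligned on the index; gaps show up as NaN.
closes = pd.concat({name: frame["Close"] for name, frame in frames.items()}, axis=1)
print(closes)
```

Rows where a stock has no data come out as NaN because concat aligns on the index, which is one way a whole column can end up empty if its timestamps never match the others.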
i got a TypeError: Cannot join tz-naive with tz-aware DatetimeIndex
i used this instead: pd.concat([tempdf['Close'], df], axis=1)
what is df
the dataframe i am adding everything to
okay
so
to use the approach
above
you need to put all the individual DataFrames in a collection
ok
so i made a list of all the dataframes
and then ran your command
and it worked except one of the columns has a bunch of NaNs (i forgot what they are called, its not null is it?)
also thank you so much for helping me
yw!
what do you mean
one of the columns
like
check the DataFrame
that that column came from
most likely
the source data is bad
or its index is misaligned
Close Close Close
Datetime
2020-12-21 00:27:00+00:00 23526.640625 NaN 641.566772
2020-12-21 00:28:00+00:00 23486.863281 NaN 640.518188
2020-12-21 00:29:00+00:00 23493.597656 NaN 640.609680
2020-12-21 00:30:00+00:00 23497.607422 NaN 640.758362
2020-12-21 00:31:00+00:00 23550.359375 NaN 641.541931
... ... ... ...
2020-12-28 00:19:00+00:00 26493.246094 NaN 708.414062
2020-12-28 00:20:00+00:00 26520.251953 NaN 709.166321
2020-12-28 00:21:00+00:00 26509.263672 NaN 708.537170
2020-12-28 00:22:00+00:00 26530.599609 NaN 707.455750
2020-12-28 00:23:02+00:00 26558.570312 NaN 708.471008
ok seems to be working
just a scattering of NaNs somewhere
check dfs[1]
ok thanks
its fine
i think its just the api
and that one dataset
all the other ones are fine
thx
Have any of you worked with the mal api?
I’m trying to extract the user ids of the users on mal, I’ve tried mal,jikan but nothing seems to work
Is there no other way than to make a crawler and scrape the user ids?
Also I need to extract the rating given by each user to the anime
Could someone explain this paragraph to me? I've been replaying the video, but still don't understand it.
not the kind of reply you’re looking for, but may I ask about the guide? Looks cool, and is there any video tutorial for that?
In a very layman language a loss function is a way of telling a model how bad it is doing
So the less the loss is
It’s better
Coz that means it’s doing better
won't overfitting be the problem though
That’s when you train the model too much on one dataset
FreeCodeCamp Zero to GANS by Aakashn
It's on youtube and it's free
Alright thanks
Thanks m8
Np
Ok thank him he’s desperate for a thank you
Could you explain it in more detail?
I know this is late lol
there are different types of functions which are used to determine the loss
they basically see the difference between what your model is predicting
versus the prediction that should be
what does this mean
English isn't my first language so don't use too advanced words
Alright
versus what the model gave
I thought what the model gave is the prediction
From the tutorial it says that the predictions should be close to or equal to the targets
Looking at the first element in each tensor. The guy says that -4252.4780 is what happens when you differentiate with respect to the 0.2761
Correct me if I'm wrong.
And the value -4252.4780 is the derivative of the loss with respect to 0.2761?
is anyone here familiar with naive bayes?
I'm a self-taught programmer. I'm lucky enough to have a job where I get to use python every day as a data analyst. However I feel like I've hit a wall on my professional development. Internet bootcamps can only take me so far, I think what I'm missing is peer interactions and networking. Unfortunately I don't work with anyone else who codes in python. I'm considering taking a more rigorous online course, applying to a university or pouring time into an open source project.
Any advice?
Hey @radiant urchin!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
Hello, I'm having some issues with curve fitting using scipy.optimize.curve_fit.
I keep getting the following issues
ValueError: array must not contain infs or NaNs
```py
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def func(x, A, m, hf):
    return A * (x - hf)**m

ff = 'data.txt'
data = pd.read_csv(ff, skiprows=3, delimiter='\t', encoding="ISO-8859-1")
load = np.array(data.iloc[:, 1])
disp = np.array(data.iloc[:, 0])
# index of the maximum displacement; the fit uses the data from there on
istart = np.where(disp == max(disp))[0][0]
p0 = [0.001, 2, 250]
ulfit, pcov = curve_fit(func, disp[istart:], load[istart:], p0,
                        bounds=(0, [0.1, 5, max(disp)]))
```
I have a lot of similar curves, some work fine, and others give me errors depending on how I adjust p0.. (even though all the curves are similar) I can share a raw data file too if that helps
Did you try filling or dropping NaN values from data
the raw data has no NaN values
if I drop the initial values, then I get a different error:
RuntimeWarning: invalid value encountered in power
But it spits out a reasonable result anyways? not sure if this is an issue?
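A possible explanation for both messages, sketched below: in NumPy a negative base raised to a fractional power gives NaN (with the "invalid value encountered in power" warning). So whenever the optimizer tries an `hf` larger than some `x` values with a non-integer `m`, the model returns NaN even though the raw data has none. The clipped variant is only one option and rests on an assumption (treating points below `hf` as zero load); `func_guarded` is a made-up name.

```python
import numpy as np


def func(x, A, m, hf):
    return A * (x - hf) ** m


def func_guarded(x, A, m, hf):
    # Assumption: anything below hf contributes zero instead of NaN.
    return A * np.clip(x - hf, 0.0, None) ** m


x = np.array([240.0, 250.0, 260.0])

with np.errstate(invalid="ignore"):
    raw = func(x, 0.001, 1.5, 250.0)      # first entry: (-10) ** 1.5 -> nan

guarded = func_guarded(x, 0.001, 1.5, 250.0)
print(raw, guarded)
```

Tightening the upper bound on `hf` so it stays below the smallest fitted `x` would be another way to keep the base non-negative.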
can someone help me understand how the quantiles are calculated in pandas?
Im looking at the documentation on pandas and I see the example:
df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
                  columns=['a', 'b'])
df.quantile(.1)
a 1.3
b 3.7
Name: 0.1, dtype: float64
df.quantile([.1, .5])
a b
0.1 1.3 3.7
0.5 2.5 55.0
q = 0.1 should represent what the bottom ten percent of the data is below
so for column a, q=0.5 makes sense, there are 4 data points and an even number of values so you just take the average between them
I dont, however, understand how q=0.1 results in 1.3
how would I calculate this
q = 0.1, n = 4 ?
so according to that logic, the 0.1 quantile should be 0.1(4+1) which it isnt
or is that the position its found at
and I need to do some math to find what the value is
so that would get 0.5 is the position where 10% of the values are below
but I dont get where position 0.5 is
Have you tried to look up numpy percentile
quantile is basically the same as numpy percentile
I get how to use it in np, I just cant figure out where the numbers are coming from
when the data set is small
So I have an odd pandas question about how to best approach this. Essentially I have 3 columns with data and they're indexed on the time values. They do not, however, have data for the same time values. One column might be missing data in the beginning, one might be missing data at the end and the beginning, and the other might be missing data at the end.
What I want to do: Shift the columns so that their end data all occurs at the same time and back fill the values with NaN. I was going to use df.shift() and the number of NaNs to do the shift, but I can't with the column that also has data missing in the beginning. I'll overshift it. Any suggestions besides manually iterating and count through the NaN values from the back until I have a non-NaN for each column?
Hey guys, apology for a dumb question, Regression, Classification and Clustering can also be done in Deep Learning(like using Keras), or it can only be done in Machine Learning, Deep Learning is only for RNN, CNN etc...?
has anyone come up with problem while trying to debug a program running pyspark?
im currently using pycharm to debug it and this is the code
im taking a spark rdd (i believe) called tweets and taking the stopwords out of its "text" column
i can place a breakpoint on the last line and the debugger will work fine, but if i place it anywhere inside the remove_stopword function the debugger will disconnect
any one have an idea as to why? is it because of how spark works under the hood maybe?
does someone know how to make violin plots?
from lists
i have seen how to do it with csv file, but i just need to use list and its not working
if I have a time series as a feature (e.g. pitch over time for an audio file) while clustering, is it bad practice to use the mean of the time series as a feature instead to simplify it and avoid the curse of dimensionality?
More of a web scraping question but what libs can I use to parse this kind of data?
Returned from an HTTP request
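import urllib3

def helper(url):
    # Fetch the page and return the first whitespace-separated token containing 'href'
    http = urllib3.PoolManager()
    req = http.request('GET', url)
    respData = req.data.decode('utf-8', errors='replace')  # decode the bytes instead of str(bytes)
    for token in respData.split():
        if 'href' in token:
            return token

if __name__ == "__main__":
    url = "<enter url>"  # placeholder
    print(helper(url))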
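Splitting on whitespace is fragile for HTML. The standard library's `html.parser` can pull out `href` values properly, and third-party libraries such as BeautifulSoup or lxml are common choices for the same job. A minimal sketch using only the standard library; `LinkCollector` and `sample_html` are made-up names, and the HTML string stands in for the decoded response body:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# Stand-in for the decoded body of the HTTP response
sample_html = '<p><a href="https://example.com/a">A</a> <a class="x" href="/b">B</a></p>'

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```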
Hey all, does anyone have good resources for preparing for technical ML interviews? Currently an ML eng at big tech co. I've been using leetcode.com for coding prep for traditional data structures & algorithms, and datascienceprep.com for ML/stats questions, was wondering if anyone knew of others.
Is there a way to select values by an array that defines which column for every row I will select in numpy? (without iterating every row)
Example:
column_indexes = np.array([1, 0, 1, 1, 1])
values = np.array([[1981.5 , 1894. ],
[ 489.33333333, 492. ],
[1110. , 1110. ],
[ 197. , 197. ],
[ 301.66666667, 319. ]])
values_selected = array([1894. , 489.33333333, 1110. , 197. ,
319. ])
Thanks!
this code works, but I think there is a better way to do that
result = [row[pseudo_label_col] for row, pseudo_label_col in zip(values, column_indexes)]
np.array(result)
>>> np.take_along_axis(values, column_indexes[:, None], axis=1)
array([[1894. ],
[ 489.33333333],
[1110. ],
[ 197. ],
[ 319. ]])
I'm making a feature set where the features are based on an analysis of audio files of differing length. For example, I have audio files A and B and the feature is the loudness over time, but A is 2 times the length of time as B. As a result, the feature for A would be an array of 2x the length of the feature for B. What would the best way to cluster be when I have feature sets of differing length?
@velvet thorn thank you!
aggregate
or pad
@velvet thorn if values matrix have n_cols > 3, the method is still valid? the trick with column_indexes[:, None] will need to be rewritten, correct?
thanks
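On the n_cols question: as far as I can tell nothing needs rewriting. `column_indexes[:, None]` only turns the index vector into a column so it lines up with the rows, regardless of how many columns `values` has. A small sketch with a made-up 3x4 array, plus the equivalent fancy-indexing form:

```python
import numpy as np

# Made-up example with 4 columns instead of 2
values = np.array([[10, 11, 12, 13],
                   [20, 21, 22, 23],
                   [30, 31, 32, 33]])
column_indexes = np.array([3, 0, 2])  # one column index per row

# Same call as before; ravel() flattens the (n, 1) result to shape (n,)
picked_a = np.take_along_axis(values, column_indexes[:, None], axis=1).ravel()

# Equivalent: pair each row number with its chosen column
picked_b = values[np.arange(len(values)), column_indexes]

print(picked_a)  # [13 20 32]
print(picked_b)  # [13 20 32]
```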
I think you should ask this in a subreddit search for terms like "Python advice" or "advice", "cs advice"
Thanks!!
[1, 2, 3, 4]. There's a 3 element gap between first and last element. (n - 1).
q=0.1 which means it gets value of 3 * 0.1 elements after from first element (sorted)
so 1.3rd element => 0.7 * first_element + 0.3 * second_element => 1.3
Same for [1, 10, 100, 1000]
0.7 * 1 + 0.3 * 10 = 3.7
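The arithmetic above can be checked against NumPy. This is a sketch of the linear-interpolation rule described in these messages (NumPy's default); `linear_quantile` is a made-up helper name:

```python
import numpy as np

def linear_quantile(sorted_values, q):
    """Quantile by linear interpolation between the two neighbouring elements."""
    pos = q * (len(sorted_values) - 1)        # fractional position, e.g. 0.1 * 3 = 0.3
    lo = int(np.floor(pos))                   # element just below the position
    hi = min(lo + 1, len(sorted_values) - 1)  # element just above, clamped to the end
    frac = pos - lo
    return (1 - frac) * sorted_values[lo] + frac * sorted_values[hi]

small = linear_quantile([1, 2, 3, 4], 0.1)        # 0.7 * 1 + 0.3 * 2
large = linear_quantile([1, 10, 100, 1000], 0.1)  # 0.7 * 1 + 0.3 * 10

print(small, np.quantile([1, 2, 3, 4], 0.1))
print(large, np.quantile([1, 10, 100, 1000], 0.1))
```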
Please correct me where I'm wrong, I'm trying to clear my basic concepts:
Regression, Classification, Clustering, dimensionality reduction etc are some major algorithms in Machine Learning.
Machine Learning also has another set of special algorithms called Neural Networks.
Deep Learning is when Neural Networks has depth, i.e. with multiple Layers.
Deep Learning specialize in non-linearities, feature engineering is also done automatically.
RNN, CNN, GAN are some popular architectures of Deep Learning.
Neural Networks is the machine learning type called Reinforcement Learning.
neural networks can be used for reinforcement learning
but they're not the same
You are right. But it is AI branch
that's not what you originally said though
ok
Anyone is working on Data Engineering Platform?
bruh
whole ml comes under ai
AI>ML>DL in short
What is DL
Hi all, anyone who works with classes for your data pipes
Do you prefer long methods to do all the lifting, or many small methods which you can edit later
Deep learning

Hello, I'm a second year data science major at a state university. I have been disappointed with my curriculum thus far because my courses don't cover python for data science specifically and the Intro to R class was pretty basic. I'd like to become more familiar with both of these and reach a level in which I could comfortably apply for internships. I eventually want to build a good foundation on python to start with ML. My understanding is that projects are incredibly important. Does anyone have a list of resources, specific python and R libraries, projects, books, or websites I could use to reach my goals?
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
@fast vector use this
Wow that's awesome! Thanks @crisp gazelle!
No problem!
Hello, I have trained a model in https://teachablemachine.withgoogle.com that has 3 classes (Hand Raised, Thumbs Up and Neutral).
I exported the model as a .h5 Keras model, and I've managed to make some predictions from some testing data that i've gathered.
the predictions output looks like this:
[[9.9910396e-01 1.9197341e-05 8.7688433e-04]]
im not sure what to make of this, any help would be great
Hello! I have a question regarding matplotlib. How can I plot different lists in the same scale?
So, ideally, the dark blue line, should be within the boundaries of the green and red line, but it doesn't.
The dark blue line is a list, composed of several altitude values.
I asked in #help-grapes and they told me to adjust the number of samples of the dark blue line same as the red and green lines, which is 512
And I can't get anymore cause those values come from server request to Google Maps API, and the max number of samples is 512
On the other hand, the dark blue line comes from the path provided by A* on a csv file
It's like this:
I have a set of altitudes, imagine: 60, 100, 120
For those values I can only trace a path between 60+ and 120+, summed to those numbers, that is, I can only trace between 120-180, 160-220 and 180-240, cause those are the max limits for the drone, in this case.
But I have like 3000 samples or so
But even if I adjust the sample numbers to be the same, I still get the plots above
Is anyone preparing for Google Summer of code, or have experience with the same?
I was planning to participate in GSoC 2021, as I have done a bunch of Machine Learning, NLP and Data Science Projects, also have some entry-level experience with Open-source contribution and Git/GitHub.
Is there a way to get the underlying numpy array of a matplotlib plot. I want to apply a color map to 1D data points, and then use openCV to threshold the rgb image. The problem is I want to run k-means on the points inside the threshold, so they need to correspond exactly with the original. The way I am currently doing it, by saving the plot to a file, means that the size will depend on the DPI, and the pixels don't match.
how can i add list of columns to df
KILLA_LINE_Col = ['SR_NO', 'DISTRICT_N', 'TEHSIL_NAM', 'VILLAGE_NA',
'HB_NO', 'LAYER_NAM', 'DESCRIPTIO', 'LENGTH_MTR',
'LENGTH_KAR', 'AREA_SQMTR', 'DES_MEASUR']
i want to add this colmns to existing dataframe
KILLA_LINE_file_copy[:,KILLA_LINE_Col] = np.nan
TypeError: unhashable type: 'slice'
getting this error
anybody here for help
hey guys, im using the dog.ceo api and sometimes it'll give slightly mispelled names (like "Germanshepherd" or "Stbernard" instead of "German Shepherd" or "St. Bernard")
any way to return a "correct" dog breed or fix it? not sure if this is the correct channel
for i in KILLA_LINE_Col:
    df[f"{i}"] = np.nan
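About the error: `df[:, cols]` is NumPy-style indexing, and as far as I can tell a DataFrame treats the whole `(slice, list)` pair as a single column key, hence the complaint about an unhashable slice. Besides the loop, `DataFrame.reindex` can add all the missing columns in one call, filled with NaN. A minimal sketch with a made-up two-column frame and a shortened column list:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the existing dataframe
df = pd.DataFrame({"SR_NO": [1, 2], "OTHER": ["x", "y"]})
new_cols = ["SR_NO", "DISTRICT_N", "TEHSIL_NAM"]  # shortened list for the example

# Only add columns that are not there yet, so existing data is left untouched
missing = [c for c in new_cols if c not in df.columns]
df = df.reindex(columns=[*df.columns, *missing])  # new columns are filled with NaN

print(df)
```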
you can probably look for some specific words based on which you can change all the values to some standard values like:
GS - German Shepherd
SB - St. Bernard
since the spelling is right, you can strip the string and make it lowercase and then compare it with the spelling on the basis of which you can classify them as a particular breed
this is a somewhat complex problem
is there a finite list of misspellings?
Kinda, but the list is long soo i didnt wanna go through it, i fixed it by using a different api though
My original idea was to use wikipedias api to search using the mispelled word, and then use the suggested article's name for the correct breed name but it didnt work for edge cases
Or do a google search and use the first suggested wikipedia link's article name (so i did mispelled name + dog for the searcg query) but that took wayyy too long
It's fine now though, thanks
you can debug the inference source code you used for prediction
sup guys howre you going ?
I think it shows the confidence (a probability between 0 and 1) for each of the 3 classes you made (hand raised, thumbs up and neutral), but i'm not sure tho @dense nova
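To read that output, treat each number as the score for one class and take the largest. A minimal sketch using the array posted above; the class order is an assumption here and should be checked against the labels file that comes with the export:

```python
import numpy as np

# ASSUMPTION: this order matches the exported labels; verify before relying on it
class_names = ["Hand Raised", "Thumbs Up", "Neutral"]

predictions = np.array([[9.9910396e-01, 1.9197341e-05, 8.7688433e-04]])
probs = predictions[0]           # one row per input image; take the first (only) one
best = int(np.argmax(probs))     # index of the highest score

print(f"{class_names[best]}: {probs[best]:.2%}")
```

The three values add up to roughly 1, which is what you would expect from a softmax output layer.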
howdy, working on some NLP projects. anyone here can answer a question about annotations? I see this type of annotation framework: https://universaldependencies.org/format.html are there any other type of annotation standards, frameworks you know of? Thanks
I'm having an issue with pytorch
so what happens is whenever i try to import it in a python file i get this error
Traceback (most recent call last):
File ".\script.py", line 7, in <module>
import torch
File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\__init__.py", line 117, in <module>
raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\cufftw64_10.dll" or one of its dependencies.
but when i do it in repl its completely fine
i don't understand it lol
i tried reinstalling cuda/cudnn and reinstalling pytorch but neither worked
and i havent found anything on this error
also i verified that the cufftw64_10.dll file is there
python 3.8.6 pytorch 1.7.1 cuda 11.0 and cudnn 8.0.5 btw
and gpu drivers are the latest
@ me if you have an answer btw
So i just fully deleted the torch folder from site-packages as well as fully deleted the cuda folder and then reinstalled both and it worked now
configuring cuda, cudnn, and ML/DL framework on windows is such a pain
Sounds interesting
I might think of participating
When is it held?
The dates
March 2021
How can I change multiple labels from a value to another in a pandas dataframe. I have tried train_df[train_df['label']=='humor'].label = 'fake' and doesn't really work. .label is a column or Series of the train_df dataframe.
wdym label?
columns?
It's fine, I figured it out train_df.loc[train_df['label']=='humor', 'label'] = 'fake'
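For anyone hitting the same thing: the first attempt assigns to a filtered copy, so the original frame never changes, while `.loc` with a row mask and a column name writes into `train_df` itself. A small sketch with made-up labels:

```python
import pandas as pd

train_df = pd.DataFrame({"label": ["humor", "real", "humor", "fake"]})

# Row mask + column name in a single .loc call modifies train_df in place
train_df.loc[train_df["label"] == "humor", "label"] = "fake"

# An equivalent alternative for simple value swaps:
# train_df["label"] = train_df["label"].replace({"humor": "fake"})

print(train_df["label"].tolist())
```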
thanks man
aight 
I mean it totally depends on your use case. Go with scrapy
if I have a df like this, is it possible to add a row which is not related to these?
It would be on the 2. row and it would be: 'Weigthed number', and then for the next 4 columns it would have a formula, the 3. row* 0.6 + the 4. row*0.4
Hey guys, is there any better lib than matplotlib to plot 3d data? its not very interactive, I can only rotate :(
You mean adding a new column?
no, a new row @lapis sequoia
this is the end goal basically
but instead of the formula I want a number displayed in the 2. row ofc
you can create a new dataframe that calculates each column with the given formula, then concatenate the two dataframes.
Oh, that could work, thank you! @lapis sequoia
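A sketch of that suggestion, since the actual dataframe was only shown as an image: the row labels `Score A` / `Score B` and the columns `q1` / `q2` are invented stand-ins for the two source rows and the numeric columns.

```python
import pandas as pd

# Invented stand-in for the real table: two source rows, numeric columns
df = pd.DataFrame(
    {"q1": [10.0, 20.0], "q2": [30.0, 40.0]},
    index=["Score A", "Score B"],
)

# 0.6 * first source row + 0.4 * second source row, computed per column
weighted = (0.6 * df.loc["Score A"] + 0.4 * df.loc["Score B"]).rename("Weighted number")

# Turn the Series into a one-row frame and put it above the source rows
result = pd.concat([weighted.to_frame().T, df])
print(result)
```

The new row holds plain numbers, not a formula, so it has to be recomputed if the source rows change.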
you're welcome (:
@lapis sequoia ur name is very nice
hi
I am 14 year old and 9th grade student
I was interested in learning science
unfortunately almost all tutorials I've seen have a lot of complex math
they have weird symbols and terms I haven't even heard of
I wanted to ask that what all maths is required to learn data science
please ping me with reply
thank you
🤔
Statistics
Discrete Math
oh ok
I was doing that rn for my exam
but
they had those weird symbols
one looked like a mirrored e
which they call sigma
I am a math teacher
And math is not easy for me either
ok
I was thinking of just dropping it due to high level maths, but that would be quitting
so I thought I might study a math book or 2
Good idea, I recommend Khan Academy
thanks
so I need statistics and discrete math right?
that's all?
I do not think is all, it depends how much deep you want to go into data science
well i need enough for machine learning
Some of the higher-level math in data science is calculus and linear algebra too
yes, linear algebra, i remember one guy saying that
thanks for advice, I'll try to get a grip on these topics
ModuleNotFound
Lol

Is there any great DL tutorial video on youtube that you guys would recommend? I’ve been doing data analysis with pandas, and I want to dig into deep learning with tensorflow, but can’t seem to find a good tutorial for total beginners.
Word2vec, what use?
Just want to play with nlp a bit. Make a thesaurus, gpt suggested that.
Also, are colab tpus really free?
Right now I have a pandas series that looks like this:
time
2020-12-24 12:34:00-05:00 222.600
2020-12-24 12:35:00-05:00 222.480
2020-12-24 12:36:00-05:00 222.520
2020-12-24 12:37:00-05:00 222.510
2020-12-24 12:38:00-05:00 222.330
...
2020-12-30 12:51:00-05:00 222.510
2020-12-30 12:52:00-05:00 222.505
2020-12-30 12:53:00-05:00 222.565
2020-12-30 12:54:00-05:00 222.565
2020-12-30 12:55:00-05:00 222.535
Name: close, Length: 1000, dtype: float64
The time column is the index and i want to edit it to be numerical like 1, 2, 3, 4, 5, 6, 7, 8, 9.... Can someone help?
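One option is to replace the index with a `RangeIndex` that starts at 1. A minimal sketch on the first five values from the message above (timezone left out to keep the example short):

```python
import pandas as pd

idx = pd.date_range("2020-12-24 12:34", periods=5, freq="min")
close = pd.Series([222.600, 222.480, 222.520, 222.510, 222.330], index=idx, name="close")

numbered = close.copy()
numbered.index = pd.RangeIndex(1, len(numbered) + 1)  # 1, 2, 3, ...

print(numbered)
```

`close.reset_index(drop=True)` does the same thing but counts from 0.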
guys, i wanna use Xception as my model to train
Can i load it somehow and just train it from scratch?
how can i make a polygon from a linestring
my code is not working
import geopandas as gpd
from shapely.geometry import Polygon, mapping

def linestring_to_polygon(fili_shps):
    gdf = gpd.read_file(fili_shps)  # LINESTRING
    gdf['geometry'] = [Polygon(mapping(x)['coordinates']) for x in gdf.geometry]
    return gdf
LINESTRING Z (528736.796 3513075.750 0.000, 52...)
need help
Hi guys,
I would like to make a user interface in order to visualize stock data that is being webscraped in real time.
I was wondering what you would recommend as a simple user interface. Would something like HTML and CSS suffice to create a basic real-time UI locally? Or is that not ideal as you have to constantly refresh the page to get new data? Or is it easier to stick to something like tkinter or another python package. I'm new to this so I would appreciate any type of advice!!
Flask and Pusher
Nice!! Thanks a lot, @soft dock !!
Is Pusher some kind of online host?
more of an API
hey
anyone want to hop into vc and explain us how to use the neat Checkpointer class
https://neat-python.readthedocs.io/en/latest/_modules/checkpoint.html
we dont know what the diffrent parameters for save_checkpoint exactly are
@gentle wagon to answer your question about numpy: It's used for linear algebra, or just to do large numbers of computations in batches. Suppose you're tracking data about the daily temperature in a given city: the array will have 365 elements. If you have that data for ten years, you can stack all those arrays to get a (10, 365)-shaped matrix. And then if you want to get an array of the daily average, you just have to make an array that's the average of each column. Not linear algebra per se, but numpy makes this kind of math easy to do.
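The temperature example can be sketched in a few lines. The data here is random and only stands in for real measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 years x 365 days of made-up temperatures
temps = rng.normal(loc=15.0, scale=8.0, size=(10, 365))

daily_avg = temps.mean(axis=0)   # average of each column: one value per calendar day
yearly_avg = temps.mean(axis=1)  # average of each row: one value per year

print(daily_avg.shape, yearly_avg.shape)
```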
Hi guys. I want to learn data science and ml (including dl, rl and drl) but i don't think i have the necessary mathematical background for me to understand it properly. Which resources would you recommend to get me up to speed? And which resources would you recommend for learning data science and ml?
Pusher seems to be dependent on Visual Basic studio. Is there something I can do to prevent using that? I prefer to stick to PyCharm. But I keep getting this error whenever I try to install pusher:
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
pip needs the build tools when it needs to build some package from source (which probably means theres no whl file for your version of python)
what version of python are you on?
is it a bad idea to use feature agglomeration on a time series?
So I should do a pip install? OR do I need to go to the website of microsoft to download the installer?
hey guys
@lapis sequoia I did a pip install in an isolated virtual environment, as I normally do
Uh, I'm sorry but I guess it's a mistake.. I don't remember asking anything here '^'
wrong dean, apologies
No problem :D
guys, i wanna use Xception as my model to train
Can i load it somehow and just train it from scratch?
what do you mean by train it from scratch?
if you mean train it with your own dataset which is totally different categories,/classes, you can
The Keras (TensorFlow) library has a built-in Xception model architecture that can be trained from scratch.
https://keras.io/api/applications/xception/
well, i am trying and acc is 0.007
i made my own small model with 3 layers and 0.25
so idk
Can you provide more information on what you're doing?
i'm kinda lost here, what do you mean with "3 layers and 0.25"?
I'm guessing 3x conv2d and a dropout of 0.25
So the build library is only needed if you need to build the package since there is no whl file, if you can find the whl file somewhere for your Python install you can just use that
I’m not completely sure if it’s on there, but you can look up Christoph Gohlke; he made a repository of wheel files that you can download. Check if the one you need is on there
Make sure you get the right one, should have cp<Python version> and if you have 64 bit Python then you need the one that says 64
By cp I mean like if you have Python 3.8 it would be cp38, 3.9 would be cp39, etc
https://www.lfd.uci.edu/~gohlke/pythonlibs/ there’s the link
nvm i think i got it
this is going good, isnt it? @hasty grail
Does anyone know if there will be an M1 native version of Miniconda for the new Macs? I don't see anything on the install page https://docs.conda.io/en/latest/miniconda.html
Better visualize some examples of the model's predictions to be sure it's not just predicting the most common class or something
i am doing the predictions "manually"
def predict(path, dims, color):
    img = cv2.imread(path, color)
    img = cv2.resize(img, dims) / 255
    prediction = model.predict(img[np.newaxis, ...])
    print(np.argmax(prediction))
But for example, ive already found one image it fails
Anyway, i would like to know which is the other class most likely to be
like, the top 5 classes
idk if i am explaining
so rather than using the for loop, I have this, but it's just as slow:
temp = df.loc[right,:].set_index(pd.DatetimeIndex(left))
temp = temp.groupby(temp.index).apply(lambda x: x.ffill())
temp = temp[~temp.index.duplicated(keep="last")]
df.update(temp)
but you're saying i should be able to join on this rather than doing the group by?
did you figure out? what are you trying to do exactly?
like what kind of data are in these dataframes and what are you doing with it?
yeah i have some daily data for a bunch of columns, and there are some dates that are "bad" (holidays and weekends), but sometimes data comes in on those "bad" days. So what I want to do is update the data on the day before the "bad" day with the "bad" days data. So for the most part that's going to look like updating Friday's data with data that came in on Saturday and Sunday, if any
But I can't just backfill, because Sunday's data would overwrite Saturday's data
(on the off chance that data came in on both saturday and sunday)
so I ahve a function that gives me these date pairs, and the code above is what I have to solve the issue
so Friday gets the data for the next two days, even though those days are in the future?
correct
And what does it mean for Friday to "get" that data? Is this addition of numbers or something?
temp = temp.groupby(temp.index).apply(lambda x: x.ffill().iloc[-1])
df.update(temp)
no just overwrite
overwrite if not nan
ah
can you show me an example of what the dataframe looks like?
like if you print it?
I have a question on how to impute values given the contents of a different column. Like if colA=1 impute 2 into colB, if colA=2 impute 3 into colB. Anyone have an idea on how to do this?
can you show what the dataframe looks like?
It's the titanic training set. I want to impute average age of people within the same class/sex rather than the mean of the column.
so, conditional mean imputation?
that sounds right
Let's see if I still have that code.
great, thanks!
are you familiar with how you can do masks with dataframes?
I've done it I think, but it's been a while
@serene scaffold
2001-05-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2001-05-04 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2001-05-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2001-05-11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2001-05-18 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN```
basically
lol
um... @fervent flume
so you can have a mask like (passengers['age'] == n) & (passengers['class'] == 'first'). And that will give you a series of true or false values.
And then you can mask another column with that to only get the rows where those conditions are true in the other columns.
And take the mean of that
💥
Ok, thanks. I'll give it a shot
I can't tell what I'm looking at. Try putting it in our paste bin
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
https://paste.pythondiscord.com/owedilaqur.typescript @serene scaffold
it's very sparse
but you can imagine creating a df with like a 10 year daily frequency, and 1-3000 columns
then injecting random data with a 95% chance of being nan, and you'd have the dataframe
so each row is a day, and days that are Sunday or Saturday, you need to copy the non-nan values into the Friday row, and then delete the Sunday and Saturday rows?
the numbers you're trying to impute: you're replacing nan values, yes?
@serene scaffold yeah basically. I have a list of date pairs that I want to replace, Friday is just the most common example, there're other dates in general that i'd want to do this to. And I want to keep the last value if there's a value on both saturday/sunday
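One vectorized way to express "the last non-NaN value wins" is to relabel every bad date with its target date and then use `groupby(...).last()`, which keeps the last non-null value per column within each group. This is only a sketch under assumptions: `date_pairs` is a made-up dict standing in for the list of date pairs, and the toy frame covers one Friday-to-Monday stretch.

```python
import numpy as np
import pandas as pd

# 2021-01-01 is a Friday, so the rows are Fri, Sat, Sun, Mon
idx = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"])
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],     # Sunday value should replace Friday's
        "b": [1.0, 2.0, np.nan, 5.0],     # Saturday value should replace Friday's
        "c": [1.0, np.nan, np.nan, 6.0],  # nothing new, Friday keeps its value
    },
    index=idx,
)

# Hypothetical mapping: bad date -> date that should receive its data
date_pairs = {
    pd.Timestamp("2021-01-02"): pd.Timestamp("2021-01-01"),
    pd.Timestamp("2021-01-03"): pd.Timestamp("2021-01-01"),
}

# Relabel bad dates with their target, then keep the last non-NaN value per column
target = pd.DatetimeIndex([date_pairs.get(ts, ts) for ts in df.index])
collapsed = df.groupby(target).last()
print(collapsed)
```

The bad rows disappear as a side effect of the grouping, so there is no separate delete step.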
how much programming experience would you say you have?
I was just going to say that jupyter notebooks tend to be confusing for learners.
a bit from uni, but i haven't done much other projects
Jupyter notebooks can become convoluted since the cells can be executed in any order you'd like.
are you specifically only trying to replace NaNs?
did it work?
Someone ping me if they want me to come back.
I'm working on it now.
Sorry I wasn't looking, anyway you could use tf.math.top_k or take the last k elements of np.argsort
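The `np.argsort` route for the top 5 classes looks roughly like this. The scores are invented and stand in for one row of `model.predict(...)` output:

```python
import numpy as np

# Invented scores for 7 classes, shaped like a single-image prediction
prediction = np.array([[0.05, 0.40, 0.10, 0.02, 0.25, 0.03, 0.15]])
k = 5

# argsort is ascending, so take the last k indices and reverse them
top_k = np.argsort(prediction[0])[-k:][::-1]
top_k_scores = prediction[0][top_k]

print(top_k)         # class indices, best first
print(top_k_scores)  # matching scores
```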
you do transformers stuff?
or is that tensorflow?
the latter
I've used tensorflow but I still haven't got a clear picture of what it "is". Is it basically numpy on the gpu?
@serene scaffold I have the numbers (means) that I want to insert into the NaN points. I just don't know how to conditionally impute them. At first I was thinking something like if df['column'] == x & df['column'] == y, but don't know where to go from there.
I got the means with grouping
but if it's everything then I'll never wrap my head around it.
Also since it's integrated with Keras you don't have to write your own training loops
Makes training ML models so much more convenient
once you have the average of the non-nan values for that mask, you can use fillna to replace the nans.
what is graph execution?
Here I created a new df with fillna, but it was with the mean of the entire column.
train_num2 = train_num.fillna(train_num.mean().round(0))
ML training is usually done in graphs, while regular computation uses eager execution
Eager execution is essentially just Python logic
@serene scaffold no all values should be updated if there's a new non-nan value later
I think lazy execution is a little more accurate. e.g. spark
mask = (df['age'] == 40) & (df['class'] == 'first')
df['died'].fillna(np.nanmean(df['died', mask]))
I didn't look up the methods or anything for this so this is probably wrong, but I think something along these lines will work.
ok, I'll play with it. thanks again
might even be df['died'][mask]
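A working version of the mask idea, as a sketch on a made-up Titanic-like frame (column names and values are invented). Two details matter: each comparison needs its own parentheses because `&` binds tighter than `==`, and the assignment goes through `.loc` so it writes into the original frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "male"],
    "pclass": [1, 1, 1, 1, 3],
    "age": [40.0, np.nan, 30.0, np.nan, 22.0],
})

# Rows belonging to one group
mask = (df["sex"] == "male") & (df["pclass"] == 1)

# Mean age of that group (NaNs are skipped by default)
group_mean = df.loc[mask, "age"].mean()

# Fill only the missing ages inside the group
df.loc[mask & df["age"].isna(), "age"] = group_mean
print(df)
```

This handles one group at a time, so covering every sex/class combination means looping over the groups or switching to a groupby.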
Graph execution is a bit like lazy execution but not really. It involves compiling Python functions through tf.function which run as graphs during runtime, allowing the engine to perform optimizations such as parallelizing and merging operations.
Gotcha. That is similar to spark as well.
Can see it when looking at the "explain" for a given dataframe
This is also where a lot of the original notoriety of TensorFlow came from though. Originally, everything had to be done via graphs, which made it incredibly difficult to debug because breakpoints don't work in graph execution, as the code that is actually executed is dynamically generated elsewhere when the function is compiled.
also you had to write boilerplate code for the compile-run process
Was it TensorFlow 2.0 that added the ability to do stuff outside of graphs? I haven't really messed around with it for a long, long time (~2015-ish)
Yup.
Also even in graph mode you don't have to mess around with tf.Session anymore. You just use the tf.function decorator around whatever function you want to compile.
the first time the function is evaluated, it is automatically compiled
Have you tried using PyTorch? If so, what are your thoughts on it vs TensorFlow?
I used the old Torch package in Lua, but haven't touched the python version yet.
Only in passing, the thing I don't like is that you still have to define your training/evaluation loop explicitly whereas TF 2.0 already has a default implementation thanks to Keras
However, it is easier to debug because it uses eager execution all the way
I think I'm missing something, but can't you use something like a CrossValidator class (or variant thereof) to abstract away the training/evaluation part?
I don't think that's a built-in thing
If you wanted to, you could use supplementary libraries like https://github.com/mv1388/aitoolbox but it's extra work
microsoft's version was the best though. By far the most intuitive to understand and to use imo. the lack of explicit loops and the way recurrence was handled was also super nice.
too bad that died
can someone explain this error. Specifically what does "non-singleton dimension" mean?
RuntimeError: The size of tensor a (1338) must match the size of tensor b (5) at non-singleton dimension 1
@serene scaffold I couldn't get it to work with masking. I figured it out with a different groupby, defining a function to impute, then transform. titanic_tr is the training df
# impute age based on sex/pclass
# Create a groupby object: by_sex_class
by_sex_class = titanic_tr.groupby(['Sex', 'Pclass'])
# Write a function that imputes the median
def impute_median(series):
    return series.fillna(series.median())
# Impute age and assign to titanic_tr['Age']
titanic_tr['Age'] = by_sex_class['Age'].transform(impute_median)
looks like this was my solution when I did conditional mean imputation for homework
def conditional_mean_imputation(df: pd.DataFrame) -> pd.DataFrame:
    label_series = df[BIN_LABEL]
    df = df.groupby(BIN_LABEL).transform(lambda x: x.fillna(x.mean()))
    return df.join(label_series)
df[BIN_LABEL] was the column that identified the class for that row. Also worth noting that this is doing the imputation for every column, or something
say you have a tensor with shape (x, y, 1)
the last dimension is a singleton dimension
the first two are not
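The error follows from broadcasting: a dimension of size 1 can be stretched to match the other operand, any other size has to match exactly. The sketch below uses NumPy rather than PyTorch, on the assumption that the same broadcasting rule applies; the shapes mirror the 1338 and 5 from the error message.

```python
import numpy as np

a = np.zeros((4, 1338))
b = np.zeros((4, 5))   # dimension 1 has size 5: not 1 and not 1338
c = np.zeros((4, 1))   # dimension 1 is a singleton

ok = a + c             # the singleton dimension is stretched to 1338

try:
    a + b              # 1338 vs 5 at a non-singleton dimension
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

print(ok.shape, failed)
```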
@sleek fjord, no
go on
okay in general
don't post screenshots please
post code as text; it's easier to read and debug.
o sorry
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
Traceback (most recent call last):
File "/Users/airmac/Documents/NBA Python/Untitled.py", line 1, in <module>
from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
ImportError: No module named basketball_reference_scraper.teams
[Finished in 0.072s]
this is the error code
a = get_opp_stats('BOS', 1955, data_format='TOTAL')
print(a)
thats the code
what do you understand by this ImportError: No module named basketball_reference_scraper.teams
nothing
so you're trying to import something
i did the pip install basketball_reference_scraper
from a module (Python file) that it can't find
presumably either your install failed
or you're using the wrong Python installation
it didnt fail
how do i see if this is the case
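A quick way to check is to ask the interpreter itself where it lives and whether it can see the package. Run this with the same command or editor that runs the failing script, since `pip` may have installed into a different Python:

```python
import sys

print(sys.executable)  # path of the interpreter actually running this script
print(sys.version)     # its version string

try:
    import basketball_reference_scraper  # package name as used in the failing import
    found = True
except ImportError:
    found = False

print(found)
```

If the path or version printed here is not the Python that pip installed into, that mismatch is the likely cause.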
well
that seems to be the case, considering you can't import it
or the module name could be wrong
can i post links?
to the api thing?
its been updated recently
and I have the latest version of python
Python 2.7.16
this is the version
damn left on read
okay
python2 is deprecated
why
It was released a long time ago and the python community has moved on to python 3.
i think so
it might somehow solve it. The problem is that Python can't see the module you're referring to.
yes
you should have python 3 in either case. There's almost no point learning python 2 at this point because anyone who hasn't updated their project to 3 has probably abandoned that project.
btw @sleek fjord if u are scraping data u can use Parsehub it will help u
do i need to relaunch atom after i get the new version
scrape any web with free
I am not sure.
can you web scrape something that is hidden behind a login
hmm
i think so
I would check that you're allowed to scrape that website
yes
try adding /robots.txt to the end of the site's address
that will tell u what u can scrape
i just downloaded the new version of python and its still saying my version is 2.7.16
ah ok
try using python3 as your command instead of python
hmm
cool
and its saying that ive installed basketball-reference-scraper
if i do pip show
and its still coming up with the same error
File "/Users/airmac/Documents/NBA Python/Untitled.py", line 1, in <module>
from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
ImportError: No module named basketball_reference_scraper.teams
[Finished in 0.139s]
is the module name right?
yes
just to double check i downloaded the example of the offical github
and ran it
and that dont work
how did you download it?
just try from cmd
ah got it thank you
i pressed raw then save as
try pip install git+https://github.com/vishaalagartha/basketball_reference_scraper.git
im on mac
ok
that's fine
try the same command with python3 -m pip instead of just pip
yes
so
python3 -m pip https://github.com/vishaalagartha/basketball_reference_scraper.git
?
python3 -m pip install git+https://github.com/vishaalagartha/basketball_reference_scraper.git
yeah that's what i am sayin'
thank you
@gaunt heron no
no?
im going to relaunch atom
its coming up with the same problem
a = get_opp_stats('BOS', 1955, data_format='TOTAL')
print(a)
something wrong with my code?
any time there is something wrong with your code in the sense that there's an error message, please always share the whole error message.
'Traceback (most recent call last):
File "/Users/airmac/Documents/NBA Python/Untitled.py", line 1, in <module>
from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
ImportError: No module named basketball_reference_scraper.teams
[Finished in 0.12s]'
thats the entire error message
okay, and what was the terminal output when you ran that command from before?
I'm referring to this command. Did you run it?
yes
What happened?
This code is in a file. Where is that file?
Is that where your terminal is operating from?
Can you go there in the terminal?
Yes, that's the easiest way for us to help you debug
how do i do that on mac
if you use a UI, we'd have to have extensive knowledge about how that UI works.
cd is usually the command to change directories
and ls usually tells you what is in your current directory
'cd: string not in pwd: /Users/airmac/Documents/NBA'
when i do ls
'Applications Documents Library Music Public
Desktop Downloads Movies Pictures get-pip.py'
do cd Documents
when i write ls now it comes up with
'Excel NBA Python School'
should i do cd nba python
yes, but you might need to put "NBA Python" in quotes
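To illustrate why the quotes matter: a shell splits a command line on spaces, so an unquoted NBA Python reaches cd as two separate arguments. A rough sketch using Python's shlex module, which follows POSIX-style splitting rules and is only an approximation of what the actual Mac shell does:

```python
import shlex

# Unquoted: the folder name is split into two separate arguments.
unquoted = shlex.split('cd NBA Python')

# Quoted or backslash-escaped: the folder name stays in one piece.
quoted = shlex.split('cd "NBA Python"')
escaped = shlex.split(r'cd NBA\ Python')

print(unquoted)  # ['cd', 'NBA', 'Python']
print(quoted)    # ['cd', 'NBA Python']
print(escaped)   # ['cd', 'NBA Python']
```

So either cd "NBA Python" or cd NBA\ Python should hand the folder name to cd as a single argument.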
is Untitled.py the file that contains the code you referred to earlier?
yes
alright, do python3 Untitled.py
you're a fast typer
thxxx
it worked?
```
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/basketball_reference_scraper/teams.py", line 6, in <module>
from constants import TEAM_TO_TEAM_ABBR, TEAM_SETS
ModuleNotFoundError: No module named 'constants'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/airmac/Documents/NBA Python/Untitled.py", line 1, in <module>
from basketball_reference_scraper.teams import get_roster, get_team_stats, get_opp_stats, get_roster_stats, get_team_misc
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/basketball_reference_scraper/teams.py", line 10, in <module>
from basketball_reference_scraper.utils import remove_accents
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/basketball_reference_scraper/utils.py", line 4, in <module>
import unicodedata, unidecode
ModuleNotFoundError: No module named 'unidecode'
```
so their code (not yours) is broken.
@sleek fjord is your problem fixed?
no, the library they installed contains broken code.
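For what it's worth, the last line of that traceback names a module (unidecode) that the interpreter could not find. A small sketch for checking which modules are missing before running the script; the helper name missing_modules is hypothetical, the list contains only the name taken from the traceback, and the suggested pip command assumes the installable package has the same name as the module, which is not true for every library:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only the module named in the traceback; other dependencies may be missing too.
    needed = ["unidecode"]
    for name in missing_modules(needed):
        print("missing:", name, "-> try: python3 -m pip install", name)
```

Whether installing it is enough depends on what else the library expects to be present, so this is a first check and not a guaranteed fix.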