#data-science-and-ml | Python | Page 212

olive prairie Nov 26, 2019, 3:18 AM

#

example:

📎 unknown.png

#

I know how I could write javascript to fix this (draw a circle, then draw a number, through the set), but I'm using a Bokeh widget to filter the data that's working pretty nicely...

#

I've been reading through their code and it doesn't look like there's anything to do this, but just started looking into using their "Scatter" figure, and creating a second dataframe; one for the circles, one for the numbers, and interleaving the two together. But I'm also open to other libraries if there's one that does this better

lapis sequoia Nov 26, 2019, 3:28 AM

#

looks like peanuts

#

your goal is to visualize and you say you're open to things other than bokeh.. but you have to state what you intend to visualize better

olive prairie Nov 26, 2019, 3:29 AM

#

Haha - I've added borders to those circles since then

lapis sequoia Nov 26, 2019, 3:29 AM

#

I see different colors.. so I'm guessing those are some categories?

#

and it's a scatter plot

olive prairie Nov 26, 2019, 3:30 AM

#

Yeah, those are different categories, and the numbers I'm putting on top of them are the category numbers; because there are so many different ones, I don't want to have to refer to a long color legend

#

I have a hovertooltip which could include the categories, but I'd like people to be able to get a sense of how close or far away the different categories are from one another just by looking at it

lapis sequoia Nov 26, 2019, 3:32 AM

#

what's the number in the circles

#

do you need it

olive prairie Nov 26, 2019, 3:33 AM

#

The number is the category. This is a TSNE visualization of an LDA topic model - the numbers and colors are the topics

lapis sequoia Nov 26, 2019, 3:33 AM

#

you can't use colors and numbers to represent the same thing

olive prairie Nov 26, 2019, 3:33 AM

#

I think it could look good with the numbers

lapis sequoia Nov 26, 2019, 3:34 AM

#

use plotly

#

and plot graphs side by side.. one for each category

#

and label the graph with the number

#

https://plot.ly/python/plotly-express/

Plotly Express

Plotly Express is a terse, consistent, high-level API for rapid data exploration and figure generation.

olive prairie Nov 26, 2019, 3:35 AM

#

I tried plotly and found the same problem -

📎 unknown.png

#

I get the idea of having different graphs for each category, but there are 18 categories

#

And I'm trying to visualize their relationship/distance to one another

lapis sequoia Nov 26, 2019, 3:40 AM

#

distance together or distance one by one

#

just reduce the size

#

and add a filter

somber hamlet Nov 26, 2019, 6:48 AM

#

just remove the numbers?

lapis sequoia Nov 26, 2019, 7:20 AM

#

yeah I told him that..

plain turret Nov 26, 2019, 7:52 AM

#

Since it has colors, those numbers can be in the legend, yeah?

lapis sequoia Nov 26, 2019, 7:53 AM

#

that's what I said

plain turret Nov 26, 2019, 7:53 AM

#

I dunno if they represent something but a color gradient might be cool to use too if they make sense to use

#

If 1 is red and 14 is blue, you just need to put the gradient next to the graph. But this works only if the numbers are a measure of the same thing

lapis sequoia Nov 26, 2019, 7:55 AM

#

they're different categories

plain turret Nov 26, 2019, 7:55 AM

#

Ah yeah nono then

#

sweatcat

chilly geyser Nov 26, 2019, 11:25 AM

#

@lapis sequoia Pre-training the whole thing is too expensive obviously though

lapis sequoia Nov 26, 2019, 11:37 AM

#

you keep missing the point here

#

1. your problem/application 2. representation that suits your application 3. metric that fits your application best

#

is where the type of model comes in.. representing your word vectors in a suitable space..

#

there's lighter frameworks that are language specific.. including ones from BERT that'll help you do that

supple ferry Nov 26, 2019, 12:15 PM

#

Hey there! anyone knows the reason of such behavior? I have this dataframe (just several rows of it for reproduction purposes):

id    dpt    price    minutes
9710556    0    180.82    140
9710556    0    180.82    140
9710556    0    202.32    145
9710556    1    218.32    145
9710556    1    250.82    140

I am trying to find out the number of (price minutes) combos being strictly less than the other combos. And I try to find it for all of them. My data will be grouped by id and dept after all.
This is the function I came up with which gives me the correct output if I forget about grouping by dept for now:

def ranker(df):
  values = df[["price", "minutes"]].values
  result = values[:, None] < values
  return np.logical_and.reduce(result, axis = 2).sum(axis = 1)

And if I apply it to my data now, I get this:

small.groupby("id").apply(ranker)

Out[144]: 
id
9710556    [2, 2, 0, 0, 0]
dtype: object

Which means that, the first price minutes combination is exactly less (in both values) from 2 options within this dataset, and so on.
When i try to assign it back to dataframe, I get NaNs everywhere:

small["a"] = small.groupby("id").apply(ranker)

small.a
Out[147]: 
102    NaN
103    NaN
104    NaN
105    NaN
106    NaN
Name: a, dtype: object

How can I solve this? My overall goal is to run this function groupbing by id and dept in the end
EDIT: code

lapis sequoia Nov 26, 2019, 12:17 PM

#

what's small

supple ferry Nov 26, 2019, 12:18 PM

#

the name of the dataframe i gave it

#

as far as i know, groupby applies the function to every group separately, and each group is itself a dataframe
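
A likely reading of the NaNs above: `groupby("id").apply(ranker)` returns one array per group, labelled by `id` (9710556), while `small` is indexed 102 to 106, so the assignment finds no matching labels. A minimal sketch of one workaround, assuming the sample rows and index shown above. The `rank_within` helper is hypothetical, and keying each group's result by that group's own row labels is one illustrative choice rather than the only fix:

```python
import numpy as np
import pandas as pd

# Sample rows from the question, with the index seen in the NaN output.
small = pd.DataFrame(
    {
        "id": [9710556] * 5,
        "dpt": [0, 0, 0, 1, 1],
        "price": [180.82, 180.82, 202.32, 218.32, 250.82],
        "minutes": [140, 140, 145, 145, 140],
    },
    index=[102, 103, 104, 105, 106],
)


def ranker(df):
    # For each row, count the rows that are strictly larger in BOTH columns.
    values = df[["price", "minutes"]].values
    result = values[:, None] < values
    return np.logical_and.reduce(result, axis=2).sum(axis=1)


def rank_within(frame, keys):
    # Hypothetical helper: run ranker per group and label each result with
    # the group's original row index so it aligns on assignment.
    pieces = [
        pd.Series(ranker(group), index=group.index)
        for _, group in frame.groupby(keys)
    ]
    return pd.concat(pieces)


small["a"] = rank_within(small, "id")            # grouped by id only
small["b"] = rank_within(small, ["id", "dpt"])   # grouped by id and dpt
```

Because the pieces carry the original row labels, the assignment aligns row by row instead of producing NaN.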

chilly geyser Nov 26, 2019, 1:22 PM

#

@lapis sequoia I don't see your point

lapis sequoia Nov 26, 2019, 2:19 PM

#

exactly

barren bluff Nov 26, 2019, 2:30 PM

#

Hey, I'm just starting off with CNNs, working with the basic Fashion-MNIST dataset using tensorflow and keras. I am a bit stuck with two things, hoping someone can help me out! If the data is 2D, do I have to flatten it to a 1D array? Also, how does the layering work exactly and how do you set it up?

acoustic mural Nov 26, 2019, 3:10 PM

#

tf.keras has a flatten layer

#

but you could also get there through pooling

chilly geyser Nov 26, 2019, 3:25 PM

#

@lapis sequoia No, I mean that you're talking about different parts of solving the problem. I don't see your point in stating it. I was never interested in solving the problem.

I'm certainly not going to pretrain because I don't have that kind of data, nor is it my main project to do so.
My problem is primarily input: text, output: classification. It's as simple as that and details like sentiment, text type, etc. generally don't matter except that they are in English sentences.
Metric - I don't even see your point with this, most business applications would just put a dollar sign to everything that they can or care about. In either case, any and all categorical losses would be relevant to me, and I'm not particularly using or focussing on any

My problem (was) is very simple, the tokenizer from TF2Hub's Albert doesn't seem to produce expected things, and the scripts provided seem tricky (with stuff like FLAGS and TF2.0 migration in the way). That is about it.

#

@acoustic mural Pooling is probably better, or at least, in the standard help I see online

acoustic mural Nov 26, 2019, 3:27 PM

#

it depends on what you're doing with it, sometimes pooling all the way down to 1D loses too much information
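
To make the flatten-versus-pooling trade-off concrete, here is a shape-only sketch in plain NumPy rather than tf.keras (a deliberate swap so it runs without TensorFlow). The batch size and the 7x7x64 feature-map shape are made-up assumptions, not values from the conversation:

```python
import numpy as np

# 32 grayscale images of 28x28 pixels, the Fashion-MNIST image size.
batch = np.random.rand(32, 28, 28)

# Flattening keeps every pixel: each image becomes one long vector.
flat = batch.reshape(len(batch), -1)

# Hypothetical convolutional output: (batch, height, width, channels).
feature_maps = np.random.rand(32, 7, 7, 64)

# Global average pooling keeps one number per channel and discards
# where in the image each feature occurred.
pooled = feature_maps.mean(axis=(1, 2))

# Flattening the same feature maps keeps the spatial layout instead.
flat_features = feature_maps.reshape(len(feature_maps), -1)
```

Flattening the feature maps gives 7 * 7 * 64 = 3136 values per image while pooling gives 64, which is the information loss mentioned above.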

silent swan Nov 26, 2019, 3:53 PM

#

lol just use roberta+pytorch

#

albert is new enough that I think you'll get less help around it

#

or roberta+tf if you really want to use tf

chilly geyser Nov 26, 2019, 6:01 PM

#

Yeah I noticed the different levels of how involved the coder is for each different package. Anyway I think I accomplished my goal with benchmarks, it does seem that roBERTa works quite out-of-the-box and it's not really quite clear what hyperparameters are really good/important to change for any given problem

silent swan Nov 26, 2019, 6:49 PM

#

learning rate X num_epochs from my experience

#

how big is your dataset

lapis sequoia Nov 26, 2019, 8:24 PM

#

In python multithreading if you multi thread 2 threads on one class, then the variables within that class wont change by the other thread? Like duck = 0 in thread 1 And in thread 2 duck gets changed to duck = 1, now in thread 1 duck = 0 still right?

#

Also does the Same apply for calling Another function within that Same function with 2 threads?
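
On the question above: threads in one Python process share the same objects, so if both threads use the same instance, an attribute changed by one thread is visible to the other. Only local variables inside a function call are private to each thread. A minimal sketch, assuming a made-up `Pond` class with a `duck` attribute (all names here are hypothetical):

```python
import threading


class Pond:
    """Hypothetical class standing in for the one in the question."""

    def __init__(self):
        self.duck = 0


def writer(pond, changed):
    pond.duck = 1   # "thread 2" changes the shared attribute
    changed.set()   # signal that the change has happened


def reader(pond, changed, seen):
    changed.wait()            # "thread 1" waits for the change
    seen.append(pond.duck)    # and then reads the new value


def count_up(n, totals, name):
    total = 0   # local variable: every call in every thread has its own
    for _ in range(n):
        total += 1
    totals[name] = total


pond = Pond()
changed = threading.Event()
seen = []
totals = {}

threads = [
    threading.Thread(target=reader, args=(pond, changed, seen)),
    threading.Thread(target=writer, args=(pond, changed)),
    threading.Thread(target=count_up, args=(1000, totals, "t3")),
    threading.Thread(target=count_up, args=(2000, totals, "t4")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If each thread created its own `Pond()`, the two `duck` values would be independent. When several threads write to the same attribute, a `threading.Lock` is the usual way to keep updates consistent.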

distant inlet Nov 27, 2019, 2:43 AM

#

What is wrong here

📎 Screenshot_2019-11-27-08-13-24-328_com.android.chrome.png

twin hinge Nov 27, 2019, 2:44 AM

#

What Python version is that? What version was the as keyword added in?

distant inlet Nov 27, 2019, 2:45 AM

#

Python 3

#

Its working now

#

📎 Screenshot_2019-11-27-08-21-24-508_com.android.chrome.png

twin hinge Nov 27, 2019, 2:52 AM

#

@distant inlet: :)

distant inlet Nov 27, 2019, 2:58 AM

#

python

lapis sequoia Nov 27, 2019, 7:38 AM

#

this is pretty cool

#

http://cs231n.github.io/python-numpy-tutorial/

Python Numpy Tutorial

Course materials and notes for Stanford class CS231n: Convolutional Neural Networks for Visual Recognition.

#

covers basics for DS

quartz monolith Nov 27, 2019, 8:14 AM

#

Is there a lib to extract from a photo (Document) to text, tables and photo? Just saw photo to text with cv2 and pil

quartz stream Nov 27, 2019, 10:55 AM

#

https://github.com/cseas/ocr-table

GitHub

cseas/ocr-table

Extract tables from scanned image PDFs using Optical Character Recognition. - cseas/ocr-table

olive willow Nov 27, 2019, 3:38 PM

#

hey guys, do you have any good resources where you can find messy datasets to train data cleaning skills?

compact bluff Nov 27, 2019, 5:45 PM

#

i have a matrix in tensorflow and I want to create a heatmap of it using seaborn and log it using tf.summary.image. pretty much, I need to get pixel data from a seaborn plot. does anyone know how to do this?

#

I've only found tf.image.decode_png but I think it would be more efficient and accurate to directly get the pixel array from the seaborn plot itself
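
One way to get pixels without a PNG round trip is to render the figure on Matplotlib's Agg canvas and read its RGBA buffer. A sketch under assumptions: it uses `ax.imshow` so that it needs only Matplotlib, the matrix and figure size are made up, and the final batch axis is there because `tf.summary.image` expects a `[k, height, width, channels]` tensor:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

matrix = np.arange(12, dtype=float).reshape(3, 4)  # stand-in for the real matrix

fig, ax = plt.subplots(figsize=(4, 3), dpi=100)    # 400 x 300 pixels
ax.imshow(matrix, cmap="viridis")  # sns.heatmap(matrix, ax=ax) could draw here instead
fig.canvas.draw()                  # render so the pixel buffer is populated

rgba = np.array(fig.canvas.buffer_rgba())  # copy: (height, width, 4), uint8
rgb = rgba[..., :3]                        # drop the alpha channel
image_batch = rgb[np.newaxis, ...]         # add the leading batch axis
plt.close(fig)
```

`image_batch` could then be passed as the image tensor to `tf.summary.image`.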

native stag Nov 27, 2019, 10:13 PM

#

https://gluon.mxnet.io/chapter01_crashcourse/preface.html

#

love this resource

#

amazin

lapis sequoia Nov 28, 2019, 12:31 AM

#

if you want the summary, as in the shape.. why dont you use np @compact bluff

wraith basin Nov 28, 2019, 2:05 AM

#

@olive willow http://www.kdnuggets.com/datasets/index.html

Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/

Google Public Data Directory :http://www.google.com/publicdata/directory

Natural Earth Data : http://www.naturalearthdata.com/downloads/

Geocomm : http://data.geocomm.com/drg/index.html

Geonames data: http://www.geonames.org/

US GIS Data: Available from http://libremap.org/

KDnuggets

Datasets for Data Mining and Data Science - KDnuggets

See also Government, State, City, Local, public data sites and portals Data APIs, Hubs, Marketplaces, Platforms, and Search Engines. Data Mining and Data Science Competitions Google Dataset Search Data repositories Anacode Chinese Web Datastore: a collection of crawled Chines...

Google Public Data Explorer

The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your o...

Natural Earth

Nick

Downloads | Natural Earth

Data themes are available in three levels of detail. For each scale, themes are listed on Cultural, Physical, and Raster category pages. Stay up to date!

olive willow Nov 28, 2019, 7:04 AM

#

@wraith basin thanks!!!

pale thunder Nov 28, 2019, 7:43 AM

#

Is there a way to find the focal points of an ellipse based on its contour? numpy+mpl

maiden void Nov 28, 2019, 1:47 PM

#

hope you dont mind me doubleposting, just figured this was a better place to ask: im looking to change the structure of a dataset to this:

Country Year Debt Unemployment GDP
Afghanistan 1986 13 7 3456
Afghanistan 1987 12 8 3487
Afghanistan 1988 13 4 2356

#

so i have this:

📎 unknown.png

#

and i want this:

📎 unknown.png

#

anyone know how i could go about changing that?

olive willow Nov 28, 2019, 3:18 PM

#

@maiden void so I would try this https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html

#

it's a method that changes columns into individual rows

#

you would have to look around how to get it into your desired format tho, I've no clue

lapis sequoia Nov 28, 2019, 5:42 PM

#

Can anybody recommend me a website that mainly host contest for ml frequently.

split temple Nov 28, 2019, 5:44 PM

#

@lapis sequoia - like this? https://www.kaggle.com/competitions

Competitions | Kaggle

Download Open Datasets on 1000s of Projects + Share Projects on One Platform. Explore Popular Topics Like Government, Sports, Medicine, Fintech, Food, More. Flexible Data Ingestion.

errant venture Nov 28, 2019, 7:59 PM

#

Hey I'm trying to train a weather data set to tell me if it will rain tomorrow, but the column is "yes/no" and not binary like it's asking for

#

Anyone know a way around this? I can provide pictures if anyone is interested thanks!

true fiber Nov 28, 2019, 8:16 PM

#

Can you use pandas?

errant venture Nov 28, 2019, 8:19 PM

#

df['RainTomorrow'] = df['RainTomorrow'].map({'Yes': 1, 'No': 0}) Worked for me, now I have to fix the other errors 😦

#

I know this means very little without knowing other data, but is anyone able to translate what these errors are saying?

📎 errors.png

maiden void Nov 28, 2019, 8:25 PM

#

thanks @olive willow it did work to some degree, but unfortunately not to do what i wanted

#

going borderline mental here after spending the entire day on this single dataset

olive willow Nov 28, 2019, 8:28 PM

#

Hahaha, maybe you can get only the columns with the years, transpose them and then add them to the others?

#

@maiden void

maiden void Nov 28, 2019, 8:31 PM

#

thats actually what i started doing. so now i have:

📎 unknown.png

#

so all the Japan columns are actually variables

#

that go on for like 100 variables and then they continue for the next country

#

so what i should do is move the next country under this one

#

unfortunately, i dont know how to do that effectively

#

(could always copy paste, but that will take forever and also its better to do it in python or R so i can recreate it)
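
For this kind of reshape, pandas' `melt` followed by `unstack` is one option. A minimal sketch, assuming a simplified wide layout with one row per country and indicator and one column per year. The `Indicator` column name and the exact input layout are guesses, since the real frame is only visible in the screenshots:

```python
import pandas as pd

# Assumed wide layout: one row per (country, indicator), one column per year.
wide = pd.DataFrame(
    {
        "Country": ["Afghanistan"] * 3,
        "Indicator": ["Debt", "Unemployment", "GDP"],
        "1986": [13, 7, 3456],
        "1987": [12, 8, 3487],
        "1988": [13, 4, 2356],
    }
)

# Step 1: year columns become rows (one row per country, indicator, year).
long = wide.melt(id_vars=["Country", "Indicator"], var_name="Year", value_name="Value")

# Step 2: indicators become columns again, one row per country and year.
tidy = (
    long.set_index(["Country", "Year", "Indicator"])["Value"]
    .unstack("Indicator")
    .reset_index()
)
tidy.columns.name = None
tidy = tidy[["Country", "Year", "Debt", "Unemployment", "GDP"]]
```

The same two steps apply with more countries, because every country's rows end up stacked under the previous one automatically.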

true fiber Nov 28, 2019, 8:38 PM

#

@errant venture this error is very simple - "could not convert string to float" - the date string cannot be converted to a number. In fact, why should the probability that it rains tomorrow depend on the date today? It is possible that rain depends on pressure or temperature for several days earlier. If this is the case, then the table with the dates must be transformed so as to enter data for 1,2,3, ... the last few days.

errant venture Nov 28, 2019, 8:39 PM

#

@true fiber that makes sense, so if I were to remove the date column that would likely fix it?

true fiber Nov 28, 2019, 8:41 PM

#

@errant venture Yes, the error will disappear, but this does not mean that the prediction will make sense, can it still take into account the pressure over the past few days?

errant venture Nov 28, 2019, 8:41 PM

#

@true fiber yeah it has pressure/Wind etc

#

And location, will I have to remove all data sets that contain strings?

#

It throws an error for Wind direction and location since they aren't floats, does this mean I can't use them to train a data set

true fiber Nov 28, 2019, 8:45 PM

#

All strings need to be converted to numbers, but if there are several values, then they need to be binarized. If you simply delete the strings, important information is lost, but you can check how something works.

errant venture Nov 28, 2019, 8:45 PM

#

So say wind direction, that has say N W E S values, would converting them to 1, 2, 3, 4 be smart?

#

Thank you so much for answering by the way, you've already helped me understand this a lot

true fiber Nov 28, 2019, 8:51 PM

#

No, to encode categorical data you need to use https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

errant venture Nov 28, 2019, 8:52 PM

#

@true fiber Ah thanks for that, will look into it now!
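
As a quick illustration of one-hot encoding, here is a sketch that uses `pandas.get_dummies` instead of the scikit-learn `OneHotEncoder` linked above (a swap made only to keep the example short). The `WindDir` and `Pressure` column names and values are made up:

```python
import pandas as pd

# Made-up sample: one categorical column and one numeric column.
df = pd.DataFrame(
    {
        "WindDir": ["N", "W", "E", "S", "N"],
        "Pressure": [1012.0, 1008.5, 1015.2, 1001.3, 1010.0],
    }
)

# Each category gets its own indicator column, so no artificial
# ordering (N < W < E < S) is imposed on the model.
encoded = pd.get_dummies(df, columns=["WindDir"])
```

Mapping N, W, E, S to 1, 2, 3, 4 would instead tell the model that S is "four times" N, which is the problem the encoder avoids.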

true fiber Nov 28, 2019, 8:56 PM

#

But it is more interesting what to do with the cities. First you need to try to remove the city and look at the accuracy of the forecast. Then it may be necessary to take into account the existence of rainy cities and consider them separately. But probably for a simple task, they can be deleted.

errant venture Nov 28, 2019, 9:07 PM

#

@true fiber Yeah I'll try and use them, but for now I just want to train the rest of the data set just for now

#

I'm just working past an "Input contains NaN, infinity or a value too large for dtype('float64')." error

true fiber Nov 28, 2019, 9:10 PM

#

Yes, you must clear the table by deleting all missing data.
df.fillna(0) / df.dropna(axis = 1, thresh=3) / df.fillna(df.mean())

warped tangle Nov 28, 2019, 9:16 PM

#

pandas people, whats the difference between df[0] and df[df.columns[0]]

#

i normally do the first one to get the targets of a df but for some reason its not working on the df im working on rn

errant venture Nov 28, 2019, 9:18 PM

#

@true fiber I've already prepped the data and removed or replaced all NaN values, is there a check for infinite values?

silent swan Nov 28, 2019, 9:21 PM

#

basically df[blah] can be ambiguous. df[column_name] generally works to get a column

warped tangle Nov 28, 2019, 9:21 PM

#

wdym by ambiguous?

silent swan Nov 28, 2019, 9:23 PM

#

it can have different behavior depending on what "blah" is

#

actually don't worry about that

#

in any case, if you want to get a column, do df[column_name]

warped tangle Nov 28, 2019, 9:23 PM

#

k

#

thx
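
To spell out the difference asked about above: `df[0]` looks for a column whose label is literally `0`, while `df[df.columns[0]]` looks up whatever the first column is called. A small sketch with made-up column names:

```python
import pandas as pd

# Made-up frame whose columns have string labels.
df = pd.DataFrame({"target": [1, 0, 1], "feature": [0.5, 0.1, 0.9]})

first_label = df.columns[0]   # the label of the first column: 'target'
by_label = df[first_label]    # same as df['target']
by_position = df.iloc[:, 0]   # first column by position, whatever its label

try:
    df[0]                     # no column is labelled 0 here
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

`df[0]` only works when the columns are labelled with integers, for example when a frame is built without column names, which may be why it worked on earlier frames.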

errant venture Nov 28, 2019, 9:40 PM

#

np.isfinite(df.any()) returns true for every column, what does that mean?
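
A note on that check: `df.any()` first collapses each column to a single boolean, and `np.isfinite` of a boolean is always `True`, so the result says nothing about the data. A sketch of an element-wise check instead, using a made-up frame with one `inf` and one `NaN`:

```python
import numpy as np
import pandas as pd

# Made-up frame: column a hides an inf, column b hides a NaN, column c is clean.
df = pd.DataFrame(
    {
        "a": [1.0, np.inf, 3.0],
        "b": [1.0, 2.0, np.nan],
        "c": [1.0, 2.0, 3.0],
    }
)

misleading = np.isfinite(df.any())   # True for every column, even a and b

# Element-wise test: False wherever a value is NaN, inf or -inf.
finite = pd.DataFrame(np.isfinite(df.to_numpy()), index=df.index, columns=df.columns)

bad_columns = finite.columns[~finite.all()].tolist()     # columns with a bad value
bad_rows = finite.index[~finite.all(axis=1)].tolist()    # rows with a bad value

# One possible clean-up: treat infinities as missing, then drop those rows.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
```

Looking at `bad_columns` and `bad_rows` shows where the offending values are before deciding whether to drop or fill them.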

haughty vale Nov 28, 2019, 9:46 PM

#

Hey is anyone here open to giving me some guidance on this ML project I'm doing?

#

Oops sorry if I interrupted something

jovial river Nov 28, 2019, 9:57 PM

#

I have a dataframe that has the batting peformance of player who played in the world baseball classics. This is only played in certain years. The columns this dataframe has are playerID, yearID, BattingPerformance and Name. I want to calculate the average in their batting performance in the current year they played in the WBC, previous Non WBC year and following Non WBC year to see if the WBC had any effect in their performance. For example, player X in year 2006 had a batting performance of 0.3, -0.2 in 2005, and 0.01 in 2007. The average would be (0.3 + (-0.2) + 0.01) / 3. The yearID goes from 2005 to 2018. The WBC years are 2006, 2009, 2013, 2017.

The final output of this new dataframe should have the following columns

[Name] | [Average calculated from 2005,2006,2007] | [Average calculated from 2008,2009,2010] | [Average calculated from 2012,2013,2014] | [Average calculated from 2016,2017,2018]

What methods does pandas have that will help me achieve this?

lapis sequoia Nov 28, 2019, 10:14 PM

#

@jovial river https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html

#

using groupby and mean should do it

jovial river Nov 28, 2019, 10:28 PM

#

@lapis sequoia

So this is what I have so far.

def caculate_impact_score(row, batting_df):
    print(row)
    return 0
      
batting_impact_score = people_WBC_batting.groupby(['playerID', 'yearID']).apply(lambda row: caculate_impact_score(row, batting))

people_WBC_batting is a dataframe that contains all the players that played in a WBC year(yearID are 2006, 2009, 2013, 2017).

batting is the one that has the batting performance for a player in a particular year.

I can't really do mean() on the batting dataframe because it includes player that didn't play in WBC.

#

So I have to find a way to use the playerID and yearID in the WBC dataframe and associate them with the playerID and yearID in batting to get their batting performance.

lapis sequoia Nov 28, 2019, 10:32 PM

#

could you join the tables

#

merge them

#

or filter

#

bad_id_list = []
filtered_frame = batting[~batting['playerID'].isin(bad_id_list)]
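
The merge idea can be sketched as follows, assuming the two frames described above and made-up numbers (player `x` reuses the values from the question). Building one row per WBC appearance and neighbouring year is an illustrative approach, not necessarily what was intended:

```python
import pandas as pd

# Made-up batting data; player y never played in a WBC year.
batting = pd.DataFrame(
    {
        "playerID": ["x", "x", "x", "x", "y", "y", "y"],
        "yearID": [2005, 2006, 2007, 2008, 2005, 2006, 2007],
        "BattingPerformance": [-0.2, 0.3, 0.01, 0.5, 0.1, 0.2, 0.3],
    }
)
people_WBC_batting = pd.DataFrame({"playerID": ["x"], "yearID": [2006]})

# For each WBC appearance, list the previous, current and following year.
windows = pd.concat(
    [
        people_WBC_batting.assign(
            wbc_year=people_WBC_batting["yearID"],
            yearID=people_WBC_batting["yearID"] + offset,
        )
        for offset in (-1, 0, 1)
    ],
    ignore_index=True,
)

# Keep only the batting rows that fall inside one of those windows.
merged = windows.merge(batting, on=["playerID", "yearID"], how="inner")

# One row per player, one column per WBC year, holding the 3-year average.
averages = (
    merged.groupby(["playerID", "wbc_year"])["BattingPerformance"]
    .mean()
    .unstack("wbc_year")
)
```

A final merge on `playerID` with the names table would add the `Name` column.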

errant venture Nov 28, 2019, 10:35 PM

#

I keep gettiing:

Input contains NaN, infinity or a value too large for dtype('float64').

But my table has none of these issues

#

grrr

lapis sequoia Nov 28, 2019, 10:35 PM

#

are you converting types?

#

try pd.to_numeric

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html

errant venture Nov 28, 2019, 10:57 PM

#

@lapis sequoia There are only floats in the data set, I've removed any other data types

lapis sequoia Nov 28, 2019, 11:05 PM

#

still, i've had that issue before, and pandas' to_numeric helped resolve it

#

its good for number type conversions

jovial river Nov 29, 2019, 3:13 AM

#

Is there any way to use the diff function https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html to find the difference between the previous element and the next element?

#

has an example of doing previous row and following row but not both.

paper niche Nov 29, 2019, 4:07 AM

#

@jovial river not tested, but probably do a diff, then shift

#

diff of 2, then shift by -1
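
The suggestion above can be checked on a small made-up series: `diff(2)` gives `s[i] - s[i-2]`, and shifting by -1 moves each result up one row so it becomes `s[i+1] - s[i-1]`.

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25])   # made-up values

# Difference between the next element and the previous element.
centered = s.diff(2).shift(-1)
```

The first and last rows come out as NaN because they lack a previous or a next element.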

rancid slate Nov 29, 2019, 4:34 AM

#

When I run a command like this to start training the model

model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

I get an output like this: https://pastebin.com/9rdZb41s
Is there anyway to turn this into a graph form? I am using Jupyter so it would be cool to have it in live graph

Pastebin

Train on 46750 samples, validate on 5195 samples Epoch 1/10 4675...

quartz stream Nov 29, 2019, 6:09 AM

#

@rancid slate Yes

#

Here is a simple example to show that

#

https://paste.pythondiscord.com/fagonebusa.py

true fiber Nov 29, 2019, 8:27 AM

#

@errant venture No, I don’t think it’s a good idea to use this, as @lapis sequoia advised. You must understand what you are doing, so you need to find the erroneous element, for example, by narrowing it down through deleting columns or rows. You don’t have any infinite numbers; almost surely you just missed some empty spurious element.

df.dropna()

olive willow Nov 29, 2019, 1:30 PM

#

Hey guys can somebody help me with this

#

I've a dataframe that I want to modify

#

Country Name                   Afghanistan  ...                               Zimbabwe
Country Code                           AFG  ...                                    ZWE
Indicator Name  5-bank asset concentration  ...  Working capital financed by banks (%)
Indicator Code                  GFDD.OI.06  ...                             GFDD.AI.35
1960                                   NaN  ...                                    NaN
...                                    ...  ...                                    ...
2013                               79.6688  ...                                    NaN
2014                               86.6035  ...                                    NaN
2015                               72.1549  ...                                    NaN
2016                               71.9406  ...                                    5.8
2017                               73.6723  ...                                    NaN

#

this is a preview of it

#

I want the first 4 rows to become columns and their data points to become vertical, not horizontal, if that makes sense

tranquil rose Nov 29, 2019, 1:48 PM

#

how about the other rows for the years?

olive willow Nov 29, 2019, 1:49 PM

#

you know like with transpose, I've applied it to the dataset and the years should be the rows like the index rows

#

but the other columns like Country code, Name etc should become the columns

#

and all the countries should become the rows not the columns

#

like from top to bottom

serene plume Nov 29, 2019, 8:41 PM

#

So I'm trying the senet model (https://github.com/moskomule/senet.pytorch) which is supposed to yield better results than ResNet, but I keep getting 0% accuracy on my train set (throughout 10 epochs)
I'm using it as such:

model_senets = se_resnet20(num_classes=len(classes), reduction=16)
cuda = torch.device('cuda')
model = model_senets.to(cuda)

optimizer = homura_optim.SGD(lr=hp["lr"], momentum=0.9, weight_decay=1e-4)
scheduler = homura_lr_scheduler.StepLR(80, 0.1)
tqdm_rep = reporters.TQDMReporter(range(hp["epochs"]), callbacks=[callbacks.AccuracyCallback()])

trainer = homura_Trainer(model, optimizer, F.cross_entropy, scheduler=scheduler, callbacks=[tqdm
for _ in tqdm_rep:
    trainer.train(train_loader)
    trainer.test(test_loader)
    trainer.update_scheduler(scheduler)

Am I implementing it wrong? Why am I always getting 0 accuracy? 😕 (please tag me if you reply)

orchid geode Nov 30, 2019, 5:24 AM

#

anyone here can tutor R? Willing to pay, please slide into my dm pls. Thanks.

fallen anchor Nov 30, 2019, 5:52 AM

#

Hello

#

name,age,weight
mike,22,180.2
alexa,28,133.30
terry,56,
jordan,,
joey,82,138.90

#

I got a csv like that

#

I want to specify dtypes on import

#

but the missing data is screwing it up, I get an error ValueError: Integer column has NA values in column 1

#

how do I avoid that?

lapis sequoia Nov 30, 2019, 5:55 AM

#

use fillna

fallen anchor Nov 30, 2019, 5:57 AM

#

But what am I gonna fill it with?

#

I don't want to add random numbers

#

And filling with None doesn't work

sullen wing Nov 30, 2019, 6:12 AM

#

@fallen anchor df = pd.read_csv('test.csv').fillna(value=0) worked for me

fallen anchor Nov 30, 2019, 6:13 AM

#

hmm

#

but 0 is appropriate data

#

can I use something like -1000?

#

my actual data has temp and wind speed etc, so 0 makes sense

sullen wing Nov 30, 2019, 6:14 AM

#

You can use anything, sure

#

-1000 works as well

#

     name     age  weight
0    mike    22.0   180.2
1   alexa    28.0   133.3
2   terry    56.0 -1000.0
3  jordan -1000.0 -1000.0
4    joey    82.0   138.9
this is what it will look like with -1000

#

You can also do float('inf')

#

     name   age  weight
0    mike  22.0   180.2
1   alexa  28.0   133.3
2   terry  56.0     inf
3  jordan   inf     inf
4    joey  82.0   138.9

#

Infinite age and weight yes
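
An alternative to sentinel values, sketched under the assumption of a pandas version that supports the nullable integer dtype (spelled `"Int64"` with a capital I): the column stays integer and the gaps stay missing. The CSV text is the sample from above, read from a string instead of `test.csv`:

```python
import io

import pandas as pd

csv_text = """name,age,weight
mike,22,180.2
alexa,28,133.30
terry,56,
jordan,,
joey,82,138.90
"""

# "Int64" (nullable integer) can hold missing values, unlike plain int.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "age": "Int64", "weight": float},
)
```

Missing ages then show up as `<NA>` rather than as a made-up number such as -1000 that later calculations could mistake for real data.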

fallen anchor Nov 30, 2019, 6:15 AM

#

interesting

#

next question

#

time,temp
2019-11-20 00:56,5
2019-11-20 01:56,
2019-11-20 02:56,8
2019-11-20 03:56,
2019-11-20 04:56,4
2019-11-20 05:56,
2019-11-20 06:56,
2019-11-20 07:56,
2019-11-20 08:56,0

#

I want to interpolate the missing data

sullen wing Nov 30, 2019, 6:20 AM

#

Do you mean, filling data?

#

df = pd.read_csv('test.csv').fillna(method='ffill')

Will give

               time  temp
0  2019-11-20 00:56   5.0
1  2019-11-20 01:56   5.0
2  2019-11-20 02:56   8.0
3  2019-11-20 03:56   8.0
4  2019-11-20 04:56   4.0
5  2019-11-20 05:56   4.0
6  2019-11-20 06:56   4.0
7  2019-11-20 07:56   4.0
8  2019-11-20 08:56   0.0

fallen anchor Nov 30, 2019, 6:20 AM

#

interpolate

#

so lets say at 0 hours temp was 2c at 3 hour it was 7c, we can assume at 2hour it was around 4c

#

does that make sense?

sullen wing Nov 30, 2019, 6:26 AM

#

Yes

#

df = pd.read_csv('test.csv')
df = df.interpolate(method='linear', limit_direction='forward')

#

               time  temp
0  2019-11-20 00:56   5.0
1  2019-11-20 01:56   6.5
2  2019-11-20 02:56   8.0
3  2019-11-20 03:56   6.0
4  2019-11-20 04:56   4.0
5  2019-11-20 05:56   3.0
6  2019-11-20 06:56   2.0
7  2019-11-20 07:56   1.0
8  2019-11-20 08:56   0.0

#

Like this?

fallen anchor Nov 30, 2019, 6:26 AM

#

woah

#

that is built-in?

sullen wing Nov 30, 2019, 6:26 AM

#

It is, you can shorten it to df = pd.read_csv('test.csv').interpolate(method='linear', limit_direction='forward')

#

well pandas has a lot of cool stuff

#

The method is described here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html

fallen anchor Nov 30, 2019, 6:26 AM

#

that's awesome. thank you

#

what if I only want to do it for the temp column (even though this example only has temp I know)

#

Oh, I can use axis

sullen wing Nov 30, 2019, 6:29 AM

#

You can do this

#

df['temp'] = df['temp'].interpolate(method='linear', limit_direction='forward')

#

Test data

time,temp,test
2019-11-20 00:56,5,1
2019-11-20 01:56,2
2019-11-20 02:56,8,
2019-11-20 03:56,
2019-11-20 04:56,4,
2019-11-20 05:56,,
2019-11-20 06:56,,
2019-11-20 07:56,,
2019-11-20 08:56,0,3

Output

               time  temp  test
0  2019-11-20 00:56   5.0   1.0
1  2019-11-20 01:56   2.0   NaN
2  2019-11-20 02:56   8.0   NaN
3  2019-11-20 03:56   6.0   NaN
4  2019-11-20 04:56   4.0   NaN
5  2019-11-20 05:56   3.0   NaN
6  2019-11-20 06:56   2.0   NaN
7  2019-11-20 07:56   1.0   NaN
8  2019-11-20 08:56   0.0   3.0

fallen anchor Nov 30, 2019, 6:30 AM

#

but how can I do it within the interpolate() call? wouldn't that be cleaner

#

df = df.interpolate(method='linear', limit_direction='forward', axis='temp') throws an error

#

UnboundLocalError: local variable 'ax' referenced before assignment

sullen wing Nov 30, 2019, 6:31 AM

#

You cannot, axis only accepts 0, 1 or none

#

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
    Axis to interpolate along.

#

So you will need to split into 2

fallen anchor Nov 30, 2019, 6:32 AM

#

ahh, this works

#

df = df.interpolate(method='linear', limit_direction='forward', columns='temp')

sullen wing Nov 30, 2019, 6:33 AM

#

hmm it doesnt work for me

#

It still interpolates the extra column

fallen anchor Nov 30, 2019, 6:34 AM

#

huh

#

weird, same for me

#

what does it even do then

sullen wing Nov 30, 2019, 6:36 AM

#

Nothing haha

#

Well this is clean enough

#

df = pd.read_csv('test.csv')
df['temp'] = df['temp'].interpolate()

fallen anchor Nov 30, 2019, 6:37 AM

#

I will use that

#

ValueError: time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex

#

I get that error when I try to use the time method

#

df['temp'] = df['temp'].interpolate(method='time')

#

even though column 0 is time

#

I even added this df = df.set_index('time')

#

Fixed it

#

df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
df['temp'] = df['temp'].interpolate(method='time')

supple ferry Nov 30, 2019, 10:42 PM

#

@fallen anchor all such time-related operations require a DatetimeIndex, which can be set as you did with pd.to_datetime, or you can specify the time column during read_csv

fallen anchor Nov 30, 2019, 11:28 PM

#

how do I do the latter?

#

it would be a 2 in 1, convert to dt and set as index

supple ferry Dec 1, 2019, 4:15 AM

#

@fallen anchor look at the parse_dates argument in read_csv. Also, date_parser can be of help

#

They are both arguments of read_csv
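
A sketch of the "2 in 1" version, combining `parse_dates` with `index_col` (also a `read_csv` argument) so the frame arrives with a `DatetimeIndex`. The CSV text is the sample from earlier in the conversation, read from a string instead of `test.csv`:

```python
import io

import pandas as pd

csv_text = """time,temp
2019-11-20 00:56,5
2019-11-20 01:56,
2019-11-20 02:56,8
2019-11-20 03:56,
2019-11-20 04:56,4
2019-11-20 05:56,
2019-11-20 06:56,
2019-11-20 07:56,
2019-11-20 08:56,0
"""

# parse_dates converts the column to datetimes; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"], index_col="time")

# With a DatetimeIndex in place, time-weighted interpolation works directly.
df["temp"] = df["temp"].interpolate(method="time")
```

Because the readings are evenly spaced one hour apart, the result here matches the linear interpolation shown earlier.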

ruby vortex Dec 1, 2019, 6:22 AM

#

Hi everyone I am looking for some person whom I can work with on some data science/Machine learning project. If anyone is working on some data science project I would happy to be part of team. I am an undergrad student and want to gain skills and experience in deep learning. DM me if you have some project. Thanks

acoustic mural Dec 1, 2019, 4:29 PM

#

just saw a picture of the Two Minute Papers guy... not at all what I expected based on his voice

#

no other takeaways, just that i pictured him EXTREMELY different

#

to keep it on topic, it's in this (WILDLY INTERESTING/CONCERNING) video https://www.youtube.com/watch?v=38ZXwJj6j8k

YouTube

Two Minute Papers

All Hail The Mighty Translatotron!

❤️ Pick up cool perks on our Patreon page: https://www.patreon.com/TwoMinutePapers My talk and the full panel discussion at the NATO conference (I start at a...


fallen anchor Dec 1, 2019, 6:20 PM

#

what is a common thing to fillna with? np.nan ?

fallen anchor Dec 1, 2019, 9:19 PM

#

nevermind, np.nan is the default anyway, no need to set it
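
For reference, missing entries in a float column are already stored as NaN, so filling with np.nan leaves the data unchanged. A short sketch of a few common replacements, on made-up data (which one is appropriate depends on the dataset):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])  # made-up values

filled_zero = s.fillna(0)          # a constant
filled_mean = s.fillna(s.mean())   # the mean of the non-missing values (2.0 here)
filled_ffill = s.ffill()           # carry the last valid value forward

print(filled_zero.tolist(), filled_mean.tolist(), filled_ffill.tolist())
```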

lime cradle Dec 2, 2019, 12:13 AM

#

Hey guys, I kinda need help writing simple supervised machine learning code for my science project, where inputting an integer returns a win or a loss based on given data. If someone could help me out, that would be awesome. DM me.

tranquil rose Dec 2, 2019, 12:52 AM

#

@lime cradle it would be better to ask in the channel instead

#

ask a specific question

fallen anchor Dec 2, 2019, 12:53 AM

#

that is some confusing wording ^ @tranquil rose

lime cradle Dec 2, 2019, 12:54 AM

#

Well, I am very new to coding. I was doing some research and it seems like I am going to be using logistic regression, and I was wondering if there is an existing implementation that I could use for my project

fallen anchor Dec 2, 2019, 12:54 AM

#

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

#

TF and keras probably have them too

lime cradle Dec 2, 2019, 12:56 AM

#

So would I use the decision_function method from the link that you sent?

fallen anchor Dec 2, 2019, 12:57 AM

#

I've never actually used scikit

#

I just know it is widely popular
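
A minimal sketch of how the linked scikit-learn class is typically used for a win/loss problem like the one described, assuming a single integer feature (the training data below is invented). For a plain prediction, `fit` and `predict` are enough; `decision_function` only returns the raw score behind the prediction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one integer per game, label 1 = win, 0 = loss.
X = np.array([[1], [2], [3], [4], [10], [11], [12], [13]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict returns class labels, predict_proba the estimated class
# probabilities, decision_function the signed score (positive leans to class 1).
labels = model.predict(np.array([[2], [12]]))
print(["win" if label == 1 else "loss" for label in labels])
print(model.predict_proba(np.array([[12]])))
```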

lime cradle Dec 2, 2019, 12:57 AM

#

ok are you more familiar with tensor flow or keras

fallen anchor Dec 2, 2019, 12:57 AM

#

TF

lime cradle Dec 2, 2019, 12:57 AM

#

ok

fallen anchor Dec 2, 2019, 12:58 AM

#

but really I don't know TF well either

lime cradle Dec 2, 2019, 12:58 AM

#

Is there any way you could help me set up the program if I find the code to use?

#

well, if you can't, I just looked up a tutorial, so wish me luck

fallen anchor Dec 2, 2019, 12:59 AM

#

I wish I could, but my knowledge is limited

lime cradle Dec 2, 2019, 1:01 AM

#

ok thank you for the help

storm gate Dec 2, 2019, 6:22 PM

#

How does one get the mode of a groupby in pandas?

deft harbor Dec 2, 2019, 7:53 PM

#

.mode()?

#

📎 Screenshot_from_2019-12-02_12-53-00.png
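
One way to get a per-group mode, sketched on a made-up frame (the column names `team` and `score` are placeholders). `Series.mode()` can return several values when there is a tie, so this version keeps the first one:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b", "b"],
    "score": [1, 1, 2, 5, 7, 7],
})

# Apply Series.mode() inside each group and keep the first (smallest) mode.
mode_per_team = df.groupby("team")["score"].agg(lambda s: s.mode().iloc[0])
print(mode_per_team)
```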

lapis sequoia Dec 3, 2019, 12:21 AM

#

lol

#

dammit Gir

fading cloak Dec 3, 2019, 2:23 AM

#

does anyone know what queue model applies to a system that has a single queue with multiple processors, but where each job requires N processors?

#

I figured the first part would be M/M/c, but that only applies when each job uses 1 processor

#

I'm trying to make a simulation

obtuse skiff Dec 3, 2019, 4:45 AM

#

For anyone familiar with pyspark: I'm getting the error "Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to scala.collection.Seq"

First, using ALS, I'm turning my prediction values from a dataframe into an RDD, then passing that into RankingMetrics, then calling that evaluator's .meanAveragePrecision

The error occurs when calling .meanAveragePrecision

Any idea why this could be happening?

#

It's weird because I'm using Python, not Scala, so idk why it's trying to use Scala

deft harbor Dec 3, 2019, 4:50 AM

#

https://stackoverflow.com/questions/58355139/java-lang-classcastexception-java-lang-string-cannot-be-cast-to-scala-collectio

Stack Overflow

java.lang.ClassCastException: java.lang.String cannot be cast to s...

I am doing something like this

val domainList = data1.select("columnname","domainvalues").where(col("domainvalues").isNotNull).map(r => (r.getString(0), r.getListString.asScala.toList)).

#

have you looked at something like this?

obtuse skiff Dec 3, 2019, 4:55 AM

#

@deft harbor Whats confusing me, is that Im using python, not scala

deft harbor Dec 3, 2019, 5:00 AM

#

this is beyond me, but my guess is that spark itself is trying to call something

#

perhaps there is an issue with configuration or a particular package it is trying to call

#

sorry im not more of a help

obtuse skiff Dec 3, 2019, 5:04 AM

#

hmm, let me check the parameters for the rdd. looks like you're right, it calls a scala method

urban shore Dec 3, 2019, 5:46 AM

#

a friend and i are trying to get into the realm of data science/ml, any recommended videos to watch or simple projects to try?

obtuse skiff Dec 3, 2019, 5:47 AM

#

Is anyone familiar with RankingMetrics from mllib pyspark?

I have my results from ALS in dataframe form, but idk what values I'm supposed to pass into RankingMetrics when I create the RDD for it
I see something called predictionAndLabels, but I don't understand how those values relate to the values I get from transforming the test data with the ALS model
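
As far as I know, RankingMetrics expects one pair per user: (list of predicted items in ranked order, list of truly relevant items). Passing one scalar prediction per row instead would plausibly explain a "Long cannot be cast to Seq" error. The sketch below shows that shape, plus the averaged-precision idea, in plain Python rather than Spark; all rows and helper names are made up, and the metric is a simplified illustration rather than Spark's exact implementation:

```python
from collections import defaultdict

# Hypothetical rows: (user, item, predicted score) and (user, relevant item).
predicted_rows = [(1, "a", 0.9), (1, "b", 0.2), (1, "c", 0.7),
                  (2, "a", 0.1), (2, "d", 0.8)]
relevant_rows = [(1, "a"), (1, "b"), (2, "a")]

def ranked_items_per_user(rows):
    """Group scored rows by user and order each user's items by score."""
    by_user = defaultdict(list)
    for user, item, score in rows:
        by_user[user].append((score, item))
    return {u: [item for _, item in sorted(pairs, reverse=True)]
            for u, pairs in by_user.items()}

def relevant_items_per_user(rows):
    by_user = defaultdict(list)
    for user, item in rows:
        by_user[user].append(item)
    return dict(by_user)

def average_precision(predicted, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    relevant_set = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant_set:
            hits += 1
            total += hits / rank
    return total / len(relevant_set) if relevant_set else 0.0

ranked = ranked_items_per_user(predicted_rows)
relevant = relevant_items_per_user(relevant_rows)
# One (predicted list, relevant list) pair per user.
prediction_and_labels = [(ranked[u], relevant[u]) for u in sorted(ranked)]
mean_ap = sum(average_precision(p, r) for p, r in prediction_and_labels) / len(prediction_and_labels)
print(prediction_and_labels, mean_ap)
```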

#

@urban shore look up k-nearest neighbors (KNN) for movie reviews. TF-IDF is also used for it. It's one of the starter projects you can do. If you can get that to work with sklearn, you can do most other similar classification models

barren bluff Dec 3, 2019, 11:28 AM

#

Hey, I'm working on a final project for school where I have to build a CNN and plot some interesting results. I'm using Keras on the Fashion-MNIST dataset, but right now I only have a plot showing training loss and accuracy with and without regularization. Do any of you have good ideas for things I could plot to show interesting results?

lapis sequoia Dec 3, 2019, 11:47 AM

#

1. final project for school - somewhat relevant.. ok
2. build a cnn - for what?
3. need to know what you're applying it for, to tell you what an interesting result is

barren bluff Dec 3, 2019, 11:49 AM

#

Im trying to classify images in the Fashion-MNIST dataset by zalando using a convolutional neural network.

#

Dont know what else to say

lapis sequoia Dec 3, 2019, 11:50 AM

#

classifying images.. there you go

#

why do you think it's interesting to show training loss

barren bluff Dec 3, 2019, 11:50 AM

#

per epoch

lapis sequoia Dec 3, 2019, 11:50 AM

#

if your model was what you're showcasing, you can show how your metrics improve for different methods

barren bluff Dec 3, 2019, 11:51 AM

#

Well, I just wanted to show if the model is over or underfitting

#

oh so you mean like log the results after tweaking parameters?

lapis sequoia Dec 3, 2019, 11:53 AM

#

yes... compare the metrics

#

you could try things like data augmentation..

#

also can try sequencing the data (sequence of images) and trying to make a prediction on that..

barren bluff Dec 3, 2019, 11:54 AM

#

Not sure I understand the last two points 🙂

polar acorn Dec 3, 2019, 2:22 PM

#

@barren bluff If you are mostly interested in nice ways to present your results, you could showcase a confusion matrix of your results. Or visualise both training and validation error when training. Or if you have some time and some curiosity, you can look into something like shap and try to interpret why your model predicts a certain class for a certain picture. https://github.com/slundberg/shap

barren bluff Dec 3, 2019, 2:45 PM

#

yeah I actually just did that @polar acorn but it looks a bit funky for some reason

#

looks like this

#

📎 unknown.png

#

this is the code that generated the plot:

import itertools

import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Greens):
    # Normalize each row before drawing so the colors match the printed values.
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.yticks(tick_marks, classes)

    # Write each cell's value, using white text on dark cells for contrast.
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')

#

Any of you know how I can fix the block at the top and bottom?

#

row***

polar acorn Dec 3, 2019, 2:56 PM

#

Idk, try to set plt.ylim(cm.shape[0] - 0.5, -0.5)?

barren bluff Dec 3, 2019, 2:56 PM

#

yeah let me try that

#

inside of the for loop right?

#

that fixed it gosh darn!

#

Why would I even need that block @polar acorn ?

#

doesnt plt.yticks(tick_marks, classes) define that value already?

polar acorn Dec 3, 2019, 3:00 PM

#

It defines where to put the ticks, I think, not necessarily the axis limits. Anyhow, I think that example used to work without that line in a previous matplotlib version; something changed in an update and the example wasn't updated.
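
A small sketch of where that line goes, using a made-up 3x3 matrix and the Agg backend so it runs without a display. The limits only need to be set once, after the image is drawn, not inside the text loop:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

cm = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 1, 7]])  # made-up counts

fig, ax = plt.subplots()
ax.imshow(cm, interpolation="nearest", cmap=plt.cm.Greens)
ax.set_xticks(np.arange(cm.shape[1]))
ax.set_yticks(np.arange(cm.shape[0]))

# Each cell is centered on an integer, so the full grid spans -0.5 .. n - 0.5.
# The larger value comes first to keep row 0 at the top.
ax.set_ylim(cm.shape[0] - 0.5, -0.5)
print(ax.get_ylim())
```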

mystic ravine Dec 3, 2019, 6:45 PM

#

I'm still learning about nlp and ml, sorry for the newbie question, maybe wrong channel
I have a classified dataset with
Title (string), is_drama (bool), is_scifi (bool)

I'm looking for something that could suggest if it's drama or scifi, based on a Title input, but I have no idea of algorithm or method, it could be a link, video, or anything that could help, thanks in advance

sand lark Dec 3, 2019, 9:15 PM

#

@mystic ravine you can use an LSTM

#

http://cs231n.stanford.edu/syllabus.html, look for the slides on RNN (recurrent networks) and they also have a link to Goodfellow's DL book which has a section on RNNs

mystic ravine Dec 3, 2019, 9:16 PM

#

Thanks, I'll try it

oblique belfry Dec 3, 2019, 10:02 PM

#

Do you all use something like Luigi, Airflow, or some other DAG orchestrator for your data science/machine learning pipelines? Is there a reason to use one of these frameworks versus not using one? I am trying to evaluate the usefulness of them, and I am looking at potentially implementing this at work. I like standardization of processes, but I do not want to over-engineer.

silent swan Dec 3, 2019, 11:46 PM

#

better: use BERT/RoBERTa

lapis sequoia Dec 4, 2019, 1:51 AM

#

you use one of those.. usually airflow..

#

just depends how mature it is.. and how much you think it'll be maintained as you take the risk of deploying it over one framework..

oblique belfry Dec 4, 2019, 3:22 PM

#

https://github.com/quantumblacklabs/kedro
Seems like a competitive alternative to Luigi and Airflow.

GitHub

quantumblacklabs/kedro

A Python library that implements software engineering best-practice for data and ML pipelines. - quantumblacklabs/kedro

fallen anchor Dec 4, 2019, 5:49 PM

#

Any recommendations for an online course of series of courses for scienc

#

I was thinking edX or Coursera

#

Maybe even a stats course

#

Hopefully something comprehensive

trail island Dec 4, 2019, 8:26 PM

#

Is it normal to get 0.0 as a pvalue?

#

just seems a little unlikely to get it 4 hypothesis tests in a row

plain badger Dec 4, 2019, 8:38 PM

#

there's nothing inherently unreasonable about it

trail island Dec 4, 2019, 8:41 PM

#

even from a large real world data set?

plain badger Dec 4, 2019, 8:41 PM

#

sure

trail island Dec 4, 2019, 8:41 PM

#

ok

plain badger Dec 4, 2019, 8:42 PM

#

i mean depending what you're testing, the fact that it's a large dataset might make it way more likely to have a v small p value

#

like testing for normality when youve got a big dataset that's very not normal

trail island Dec 4, 2019, 8:42 PM

#

import scipy.stats as st

mean_elo_n_project_team = assigned_team_df['elo_n'].mean()
print("Mean Relative Skill of the assigned team in the years 1996 to 1998 =", round(mean_elo_n_project_team,2))

mean_elo_n_your_team = your_team_df['elo_n'].mean()
print("Mean Relative Skill of your team in the years 2013 to 2015  =", round(mean_elo_n_your_team,2))


# Hypothesis Test
# ---- TODO: make your edits here ----
test_statistic, p_value = st.ttest_ind(assigned_team_df['elo_n'], your_team_df['elo_n'])

print("Hypothesis Test for the Difference Between Two Population Means")
print("Test Statistic =", round(test_statistic,2)) 
print("P-value =", round(p_value,4))

#

im comparing two nba teams from two different time periods

#

so not testing for normality, I think

plain badger Dec 4, 2019, 8:44 PM

#

the same 2 nba teams?

trail island Dec 4, 2019, 8:44 PM

#

yes

plain badger Dec 4, 2019, 8:44 PM

#

you should be doing a paired differences test

trail island Dec 4, 2019, 8:45 PM

#

oh

#

how do you know that?

plain badger Dec 4, 2019, 8:46 PM

#

ttest_ind is a 2 sample t-test which is for 2 independent samples. paired differences is for testing a difference between 2 dependent samples

#

like the weight of your family in 2018 vs. their weights in 2019 is two dependent samples because it's the same family members
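
A hedged illustration of the two tests on invented data (the same 30 people weighed in two years). `ttest_ind` treats the samples as unrelated, while `ttest_rel` works on the per-person differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Made-up data: the same 30 people measured in two different years.
weights_2018 = rng.normal(70, 10, size=30)
weights_2019 = weights_2018 + rng.normal(0.5, 1.0, size=30)

# Independent-samples test: ignores which 2018 value belongs to which 2019 value.
t_ind, p_ind = stats.ttest_ind(weights_2018, weights_2019)

# Paired test: equivalent to a one-sample t-test on the differences.
t_rel, p_rel = stats.ttest_rel(weights_2018, weights_2019)
t_diff, p_diff = stats.ttest_1samp(weights_2018 - weights_2019, 0.0)

print(p_ind, p_rel)
```

For two different teams in two different eras, as turned out to be the case here, the independent test is the applicable one.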

chilly geyser Dec 4, 2019, 8:47 PM

#

0.0 as a pvalue
Are you 'allowed' to state this p-value? Because if not, it's probably safer to state that the p-value is below machine epsilon (usually 10^-16)

plain badger Dec 4, 2019, 8:48 PM

#

i dunno i've never seen 10^-16 lol. usually something like < 0.001

chilly geyser Dec 4, 2019, 8:48 PM

#

If values are assumed to follow either t or norm-dist, a p-value of zero indicates an unbounded difference.

#

Alternatively, find out the actual t or z-score, and use logarithmic scale

#

If the logarithmic scale also breaks, your problem's precision is really problematic (since we're talking in orders of magnitudes when talking logarithmic scales)

trail island Dec 4, 2019, 8:50 PM

#

oh I'm sorry Naarkie, I meant they are 2 different teams, I misunderstood your question.

#

Denver Nuggets 2013-2015
Chicago Bulls 1996-1998

chilly geyser Dec 4, 2019, 8:51 PM

#

Oh I realise you have this
print("P-value =", round(p_value,4))
Yes you should state p-value<0.00005 instead (That's with respect to your round function)

#

>>> round (0.00004,4)
0.0

plain badger Dec 4, 2019, 8:52 PM

#

yeah then that's fine

trail island Dec 4, 2019, 8:52 PM

#

oh

chilly geyser Dec 4, 2019, 8:53 PM

#

FWIW, physics uses 5-sigma, or p-value ~ 3 * 10^-7 (one-sided)

trail island Dec 4, 2019, 8:53 PM

#

print("P-value =", round(p_value, 10))

#

no wait

#

no matter what I round to, it comes out as 0.0

chilly geyser Dec 4, 2019, 8:55 PM

#

What happens when you try just
print(p_value)

trail island Dec 4, 2019, 8:59 PM

#

ooooooo

#

P-value = 1.604719099435058e-51

#

strange, idk why it would round to 0.0

chilly geyser Dec 4, 2019, 9:01 PM

#

Um bro, e-51 is extremely small

#

It means 0.[fifty or so zeros]1604719 ....
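
A short sketch of why rounding hides the value and two ways to report it instead, using the number printed above. The 0.001 threshold is just a common reporting convention, not something fixed by the test:

```python
p_value = 1.604719099435058e-51  # the value printed above

# Rounding to a fixed number of decimals discards everything for tiny values.
print(round(p_value, 10))          # 0.0

# Scientific notation keeps the magnitude visible.
print(f"P-value = {p_value:.3e}")  # P-value = 1.605e-51

# Or report it against a threshold.
report = "p < 0.001" if p_value < 0.001 else f"p = {p_value:.3f}"
print(report)
```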

trail island Dec 4, 2019, 9:02 PM

#

omg

chilly geyser Dec 4, 2019, 9:02 PM

#

Your choices are

claim p-value is that number - I'm too skeptical of machine floating point to do so
claim p-value below any usual significance level, which that p-value is definitely

trail island Dec 4, 2019, 9:03 PM

#

ok

#

i dont think i understand p-value like i thought

#

p-value is the probability that the hypothesis is true right?

chilly geyser Dec 4, 2019, 9:05 PM

#

I will quote wiki because it's very specific
probability of obtaining test results at least as extreme as the results actually observed during the test, assuming that the null hypothesis is correct

#

Your null hypo is that the means are the same

trail island Dec 4, 2019, 9:07 PM

#

i see

chilly geyser Dec 4, 2019, 9:07 PM

#

The at least as extreme as the results refers to how different your results are from this assumption - with an underlying assumption that the means follow gaussian distributions (if not, just central limit theorem)

#

So at least as extreme refers to mean differences very far from 0

#

The at least part means inequalities - and basically it means you can integrate from negative infinity up to that point, OR on the other side, from positive infinity down towards some positive-side number (2-sided)

#

Either way, choosing 2-sided or 1-sided changes the integral value by a factor of 2 (since the Gaussian is symmetric) and/or shifts the comparison values, which doesn't really matter with a p-value of that magnitude
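
The "integrate the tail" description can be checked numerically. This sketch, on made-up samples, recomputes the two-sided p-value of an equal-variance two-sample t-test as twice the one-sided tail area of the t distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=50)  # made-up samples
b = rng.normal(0.5, 1.0, size=50)

t_stat, p_two_sided = stats.ttest_ind(a, b)

# Degrees of freedom for the equal-variance two-sample t-test.
dof = len(a) + len(b) - 2
# sf is the survival function: the area under the t density beyond |t|.
one_tail = stats.t.sf(abs(t_stat), dof)
p_manual = 2 * one_tail

print(p_two_sided, p_manual)
```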

trail island Dec 4, 2019, 9:11 PM

#

holy shit

#

mmScaredMilk

#

thats a ton of information, i have more research to do i can see!
thanks so much for the help 😄

deft harbor Dec 5, 2019, 4:20 AM

#

take a basic stats class

native stag Dec 5, 2019, 4:25 PM

#

read The Elements of Statistical Learning on stanfords website

subtle glade Dec 5, 2019, 6:57 PM

#

Is there an R discord someone could send me a discord link for

polar acorn Dec 5, 2019, 7:10 PM

#

Suspect you'll have better luck with either slack or irc

rocky maple Dec 5, 2019, 8:28 PM

#

Is there anyone particularly experienced in Keras?

I'm trying to build essentially a deep learning hashing algorithm. I have a Keras model, and I'll feed it an image, and another version of the same image with noise/rotations/crops, whatever else I want it to be invariant to. I run both through the same autoencoder, and I train on the similarity between the two vectors, trying to get them as close as possible.

But, there's a problem with this approach. If all that you do is nudge similarity closer together, then all your vectors will end up looking the same no matter what. So, I'm also running the original through an autodecoder and training both models on that too.

I have two loss functions. One trains the autoencoder by comparing the Euclidean distance between the vectors of the original and the scrambled image, and the other trains both the autoencoder and the autodecoder on how well they can reconstruct the original image from the vector. Hopefully this combination of loss functions will yield a well-trained model.

The issue comes in implementation. This is actually my first project, and I'm not very familiar with setting up branching networks like this in Keras. If I was doing something sequential it would be easy, but I have some questions.

The docs say that you can use Models like Layers, which are really just tf Tensors. How do I get that to work with multiple outputs? Furthermore, if I incorporate one model into another and train it, does it train both?
Right now how I have it set up is I'm passing it two images. In my autoencoder Model I define convolutional and max pooling layers, then some dense layers, and apply them all on both images in the correct order. My model does the same thing twice. But in "production," I only want to give it one and have it tell me what the autoencoder says. How would I rewrite it to do so, and link up the loss functions correctly?

arctic wedgeBOT Dec 5, 2019, 8:39 PM

#

Hey @rocky maple!

It looks like you tried to attach a file type that we do not allow. We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv.

Feel free to ask in #community-meta if you think this is a mistake.

rocky maple Dec 5, 2019, 8:40 PM

#

So I can't attach notebooks. Alright.

acoustic mural Dec 5, 2019, 10:22 PM

#

@rocky maple
1)
a) each output is one input into the following layers
b) you can freeze layers so they aren't adjusted during training
2) huh? i don't follow sorry

rocky maple Dec 5, 2019, 10:24 PM

#

But Keras allows you to branch and share layers, correct?

#

Is isinstance(Model, Layer) true?

#

Also, I do absolutely want to train every single value, but only on specific things
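
A rough sketch of one way to wire this up with the tf.keras functional API. Everything concrete here is an assumption for illustration: the 32x32 grayscale image size, the layer sizes, the 16-value code, and the use of mean squared error on the code difference as a stand-in for the distance loss. Calling the same encoder model on both inputs shares its weights, and the encoder can be used on its own afterwards:

```python
import numpy as np
from tensorflow.keras import Model, layers

IMG_SHAPE = (32, 32, 1)  # hypothetical image size
CODE_DIM = 16            # hypothetical length of the hash vector

# Encoder: image -> code vector. Built once, reused for both inputs.
enc_in = layers.Input(shape=IMG_SHAPE)
x = layers.Conv2D(8, 3, padding="same", activation="relu")(enc_in)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
encoder = Model(enc_in, layers.Dense(CODE_DIM)(x), name="encoder")

# Decoder: code vector -> reconstructed image.
dec_in = layers.Input(shape=(CODE_DIM,))
y = layers.Dense(32 * 32, activation="sigmoid")(dec_in)
decoder = Model(dec_in, layers.Reshape(IMG_SHAPE)(y), name="decoder")

# Training graph: calling `encoder` twice shares its weights between branches.
original = layers.Input(shape=IMG_SHAPE, name="original")
distorted = layers.Input(shape=IMG_SHAPE, name="distorted")
code_a = encoder(original)
code_b = encoder(distorted)
reconstruction = decoder(code_a)
code_diff = layers.Subtract()([code_a, code_b])

trainer = Model([original, distorted], [reconstruction, code_diff])
# One loss per output: reconstruction error, and code difference pushed to zero.
trainer.compile(optimizer="adam", loss=["mse", "mse"])

images = np.random.rand(8, *IMG_SHAPE).astype("float32")  # made-up data
noisy = np.clip(images + 0.1 * np.random.randn(*images.shape), 0, 1).astype("float32")
zeros = np.zeros((8, CODE_DIM), dtype="float32")
trainer.fit([images, noisy], [images, zeros], epochs=1, verbose=0)

# "Production": the encoder alone takes a single batch of images.
codes = encoder.predict(images, verbose=0)
print(codes.shape, issubclass(Model, layers.Layer))
```

Training `trainer` updates the encoder and decoder weights, because they are the same objects; the `issubclass` check reflects that a Model is itself a Layer in tf.keras.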

silent swan Dec 6, 2019, 12:35 AM

#

I'm gonna be annoying and recommend pytorch as usual

polar acorn Dec 6, 2019, 12:49 AM

#

Deep self insight at least 😉

oblique belfry Dec 6, 2019, 12:50 AM

#

I wasn't a fan of Pytorch at first, but I really like the design concepts behind it now. My team uses Keras extensively, but I am going to reimplement a work project in Pytorch to show the team the difference.

The only thing that worries me about Pytorch is deployment in production. But, I am just waiting on more blog posts about that.

polar acorn Dec 6, 2019, 12:52 AM

#

I keep wanting to try PyTorch, but I feel I should properly learn tf 2.0 first to evaluate my options better.

wraith bluff Dec 6, 2019, 1:01 AM

#

anyone here worked with speech to text models/libraries before?

silent swan Dec 6, 2019, 1:04 AM

#

fair enough, if you have business needs

#

but if you need to learn just one library for your own use

#

learn pytorch

oblique belfry Dec 6, 2019, 1:06 AM

#

I really like Keras Callbacks. Pytorch doesn't have a native way to do that. There are some helper libraries like Pytorch-Lightning and Poutyne that give it a more "Keras-like" api.

#

Pytorch documentation is pretty dope though.

#

And, I seemingly never have weird Cuda issues.

rocky maple Dec 6, 2019, 4:01 AM

#

Holy shit, why did nobody ever tell me that PyTorch had a Java API?

#

Sign me up

rocky maple Dec 6, 2019, 4:37 AM

#

In Keras, do I have to output something to train on it for loss?

lapis sequoia Dec 6, 2019, 7:41 AM

#

ModuleNotFoundError: No module named 'modin.backends.pandas.parsers'

#

modin is a pain in the ass

lapis sequoia Dec 6, 2019, 11:23 AM

#

p value

lapis sequoia Dec 6, 2019, 1:33 PM

#

can you call a classification task as predictive modelling?

chilly shuttle Dec 6, 2019, 1:58 PM

#

yes

#

you're predicting the class

ruby vortex Dec 6, 2019, 2:08 PM

#

import tensorflow_datasets as tfds
dataset, info = tfds.load("imdb_reviews/subwords8k", with_info=True, as_supervised=True)

#

Can anyone execute these 2 lines and tell if he/she is getting any error. Thanks

deft harbor Dec 6, 2019, 5:31 PM

#

ModuleNotFoundError: No module named 'tensorflow_datasets'

#

longlemon

native stag Dec 6, 2019, 7:26 PM

#

so if you had limited data for something would you use an oversampling technique to resample the data? if not what would you do

vital cipher Dec 6, 2019, 7:52 PM

#

@wraith bluff was reading about it, as Qualcomm's new chipset has it as one of its features, so it was pretty interesting for me 🙂

ruby vortex Dec 7, 2019, 1:15 AM

#

@deft harbor thanks

vital cipher Dec 7, 2019, 4:40 PM

#

hi guys, just wanted to share ...trying out sentiment analysis with the Twitter APIs...any suggestions or ideas for a specific data analysis algorithm to learn more from...open to all suggestions 🙂

grand breach Dec 8, 2019, 3:16 AM

#

When creating a virtual env in anaconda, is it possible to import packages from the base dir into the virtual env, or do they need to be installed with pip every time?

grand breach Dec 8, 2019, 3:31 AM

#

Will copy pasting work?

silent swan Dec 8, 2019, 8:12 AM

#

yes, but depends on your configuration

#

why not just pip install as well

vital cipher Dec 8, 2019, 8:22 AM

#

I would suggest using pip to install them in your environment whenever you want!!

jolly briar Dec 8, 2019, 3:34 PM

#

@native stag ESL isn't appropriate advice to all surely? there's quite a lot of math

native stag Dec 8, 2019, 3:36 PM

#

I'm reading ISL right now, I would start with that. I have almost no math background and it's completely understandable to me and is fantastic, every data scientist should read it. I'm going to move to ESL after, and I may have to learn some calc/LA in between to fully understand ESL

jolly briar Dec 8, 2019, 3:37 PM

#

@native stag ISLR is a more suitable start yes

#

ESL, no

#

ISLR wouldn't be suitable to most without any maths either though

#

I don't know why you'd recommend a resource like that - but hey ho

native stag Dec 8, 2019, 3:38 PM

#

i have no maths and i'm doing fine it explains things well but ya whatever you wanna do

jolly briar Dec 8, 2019, 3:38 PM

#

good for you - i'm talking for most

native stag Dec 8, 2019, 3:38 PM

#

idk i wanted to incase people haven't heard of it sorry that it bothered you so much but gl to ya mate

jolly briar Dec 8, 2019, 3:39 PM

#

it's just not helpful for someone beginning to be recommended texts that aren't at a suitable level imo

deft harbor Dec 8, 2019, 5:46 PM

#

I started with ISLR and enjoyed it, found it really helpful

#

Pretty sure I wouldn't have been able to take as much away from it if it spent most of its time bashing me over the head with only linear algebra notation.

flint nest Dec 8, 2019, 6:34 PM

#

how hard is it to build an AI that can play a game like tic-tac-toe
in high school

oblique belfry Dec 8, 2019, 6:42 PM

#

Anyone willing to help me port a Keras model to Pytorch? I am doing a lot with Conv3d stuff.

oblique belfry Dec 8, 2019, 7:06 PM

#

Input shape is (1, 240, 320, 3)

model = Sequential()

    # Define model
    model.add(Conv3D(32, kernel_size=(3, 3, 3), input_shape=input_shape, padding="same", kernel_regularizer=l2(opt.l2), bias_regularizer=l2(opt.l2)))
    model.add(Activation('relu'))
    model.add(Conv3D(32, padding="same", kernel_size=(3, 3, 3),kernel_regularizer=l2(opt.l2), bias_regularizer=l2(opt.l2)))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
    model.add(Dropout(0.7))

    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3)))
    model.add(Activation('relu'))
    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
    model.add(Dropout(0.25))
    
    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3)))
    model.add(Activation('relu'))
    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
    model.add(Dropout(0.25))

    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3),  kernel_regularizer=l2(opt.l2), bias_regularizer=l2(opt.l2)))
    model.add(Activation('relu'))
    model.add(Conv3D(64, padding="same", kernel_size=(3, 3, 3),  kernel_regularizer=l2(opt.l2), bias_regularizer=l2(opt.l2)))
    model.add(Activation('relu'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
    model.add(Dropout(0.7))

    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.7))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(
        optimizer=RMSprop(lr=opt.learning_rate),
        loss='binary_crossentropy',
        metrics=['accuracy'])

    return model

#

Forget the regularizers and what not. I cannot seem to translate the Flatten layer to Pytorch correctly. I cannot seem to translate the same padding options.

silent swan Dec 9, 2019, 2:45 AM

#

flatten should be doable

#

but padding is not, without manually calling a padding layer

#

pytorch and TF for some reason have conv operations with fundamentally different padding

#

you might be able to find some code that auto computes the correct manual padding for you

#

that said, whether this is worth it depends on whether you're training from scratch or reusing a trained model

#

if it's the former, you can skip the exact replication
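
For the manual route, the amount of padding that TensorFlow-style "same" applies along one dimension can be computed directly. A small sketch (the helper name `same_padding` is made up), applied to the spatial dimensions (1, 240, 320) of the input shape posted above:

```python
import math

def same_padding(size, kernel, stride=1):
    """Return (pad_before, pad_after) that TensorFlow-style 'same' padding
    uses along one dimension."""
    out_size = math.ceil(size / stride)
    total = max((out_size - 1) * stride + kernel - size, 0)
    before = total // 2
    return before, total - before  # any odd leftover goes on the trailing side

# Conv3D with a 3x3x3 kernel and stride 1: symmetric, one element on each side.
conv_pad = same_padding(240, kernel=3)
print(conv_pad)  # (1, 1)

# MaxPooling3D with pool size 3 (Keras defaults the stride to the pool size).
pool_pads = [same_padding(n, kernel=3, stride=3) for n in (1, 240, 320)]
print(pool_pads)  # [(1, 1), (0, 0), (0, 1)]
```

The last entry is asymmetric, which a single symmetric `padding` argument cannot express; in PyTorch that case would need an explicit padding step (for example `torch.nn.functional.pad`) before the layer.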

oblique belfry Dec 9, 2019, 3:16 AM

#

Yeah....it kinda needs to be exact.

#

That is so frustrating.

silent swan Dec 9, 2019, 3:17 AM

#

if you're replicating then yea

#

worst case just go layer by layer and ensure that they match

#

also make sure to configure cudnn to be deterministic

oblique belfry Dec 9, 2019, 3:17 AM

#

I am trying to replicate the model.

silent swan Dec 9, 2019, 3:18 AM

#

the flip side is that padding is about the only hard thing to translate between the two frameworks, in terms of straightforward models

oblique belfry Dec 9, 2019, 3:19 AM

#

I am not sure how to configure the Conv3d layers properly. Keras lets you be pretty lazy, but pytorch makes you be explicit. I've tried 2 different ways to create a flatten layer, but each time I get a tensor that is so big, it doesn't fit in memory.

silent swan Dec 9, 2019, 3:20 AM

#

explain what shapes you're trying to go from/to

#

you'll come to love how explicit pytorch is

oblique belfry Dec 9, 2019, 3:25 AM

#

I don't think I understand your question. What about shapes?

#

Oh, I am loving pytorch, until this project. lol

#

Nah, I like it.

silent swan Dec 9, 2019, 3:26 AM

#

you're trying to flatten right? that's manipulating the shape of hidden activations

#

put another way, you should be able to find out the shape of your outputs at every stage of that model

#

that's important for translating between tf and pytorch

oblique belfry Dec 9, 2019, 4:05 AM

#

I am not sure exactly what the dimensions were, but out of the convolution block, I ran x.view() to flatten the layer. This was an example I found online to flatten the layer.

silent swan Dec 9, 2019, 4:06 AM

#

that should work

oblique belfry Dec 9, 2019, 4:18 AM

#

2019-12-08 22:17:20,646 - ERROR - root: [example.py:95] Given input size: (512x1x29x39). Calculated output size: (512x0x14x19). Output size is too small

#

https://gist.github.com/hammacktony/d365339bbf8db3fd36b2c21645a39341

Gist

test.py


#

Model file

silent swan Dec 9, 2019, 8:07 AM

#

which line? (your log is pointing to the line number in the full file I guess?)

#

it doesn't look like Flatten is the issue?
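
The numbers in the error are consistent with a pooling layer shrinking a depth of 1 down to 0. A quick check with PyTorch's floor-mode output-size rule, assuming a 2x2x2 pool with stride 2 and no padding (inferred from the sizes in the message, since the model file is not shown here):

```python
def pool_output_size(size, kernel, stride, padding=0):
    # Floor-mode output-size rule used by PyTorch pooling layers (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1

# Input (depth, height, width) from the error message: 1 x 29 x 39.
sizes = [pool_output_size(n, kernel=2, stride=2) for n in (1, 29, 39)]
print(sizes)  # [0, 14, 19]
```

Under that assumption, the depth dimension would need padding, or a pool size of 1 along depth, to survive the layer.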

supple ferry Dec 9, 2019, 9:08 AM

#

hey ! anyone tried new version of Spyder? 4.0 i presume it is.
any feedbacks ??

idle oracle Dec 9, 2019, 11:10 AM

#

Yo anyone familiar with pytorch, because i have a problem and cant seem to figure out what it is.
ahh

#

#Choose device for training
if torch.cuda.is_available():  # call the function; the bare attribute is always truthy
    device = torch.device("cuda:0")
    print("Running on GPU")
else:
    device = torch.device("cpu")
    print("running on CPU")

net.to(device)
net = Net().to(device)


# Print out training information, set epoch range to train

for epoch in range(50): # (n) full passes over the data # set to ridiculous amount if using accepted value
    for data in testset:  # `data` is a batch of data
        X, y = data  # X is the batch of features, y is the batch of targets.
        X, y = X.to(device), y.to(device)
        net.zero_grad()  # sets gradients to 0 before loss calc. You will do this likely every step.
        output = net(X.view(-1,784))  # pass in the reshaped batch (recall they are 28x28 atm)
        loss = F.nll_loss(output, y)  # calculate and grab the loss value
        loss.backward()  # apply this loss backwards thru the network's parameters
        optimizer.step()  # attempt to optimize weights to account for loss/gradients
    print(loss)  # print loss
    # Adding accepted value, Comment out if need be
    if loss <= (accploss):
        break

#

anyway, the loss i get remains constant and I'm either blind to the problem, or did something really dumb and don't know how to code for the life of me.

#

tensor(2.3093, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3093, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3093, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3093, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3093, device='cuda:0', grad_fn=<NllLossBackward>)

#

sample output
I'd appreciate any help whatsoever, thx in advance

oblique belfry Dec 9, 2019, 12:24 PM

#

maybe set net.train()

#

I'd also set the gradients on the optimizer to zero instead of the model.

#

# From the MNIST Pytorch example

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

lapis sequoia Dec 9, 2019, 2:25 PM

#

guys, I was looking into the functional API of Keras and came across a question: why do they use the input shape (64, 64, 1) when the input is just a 64x64 image?
this is the code

# Convolutional Neural Network
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
visible = Input(shape=(64,64,1))
conv1 = Conv2D(32, kernel_size=4, activation='relu')(visible)
pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
conv2 = Conv2D(16, kernel_size=4, activation='relu')(pool1)
pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat = Flatten()(pool2)
hidden1 = Dense(10, activation='relu')(flat)
output = Dense(1, activation='sigmoid')(hidden1)
model = Model(inputs=visible, outputs=output)
# summarize layers
print(model.summary())
# plot graph
plot_model(model, to_file='convolutional_neural_network.png')

#

i think i got it !!!
is it for mentioning the number of channels?

polar acorn Dec 9, 2019, 2:32 PM

#

Yep 👍

lapis sequoia Dec 9, 2019, 2:32 PM

#

im a genius

#

xD

oblique belfry Dec 9, 2019, 2:43 PM

#

Yep...channels last. Setting that properly is extremely important...speaking from experience
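
A quick way to see what the trailing 1 means, sketched with NumPy only. The blank array here is a made-up stand-in image, not anything from the Keras snippet above:

```python
import numpy as np

# Hypothetical 64x64 grayscale image with no channel axis yet.
gray = np.zeros((64, 64), dtype=np.float32)

channels_last = gray[..., np.newaxis]    # height, width, channels
channels_first = gray[np.newaxis, ...]   # channels, height, width
batch = channels_last[np.newaxis, ...]   # batch, height, width, channels

print(channels_last.shape, channels_first.shape, batch.shape)
```

Keras uses channels-last by default, so Input(shape=(64, 64, 1)) describes one such image without the batch axis.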

silent swan Dec 9, 2019, 7:07 PM

#

@idle oracle yes check the gradients after one backward pass

#

@oblique belfry did you figure it out?

oblique belfry Dec 9, 2019, 7:39 PM

#

nope

hardy crag Dec 9, 2019, 8:50 PM

#

@idle oracle as @oblique belfry already said, you need to call optimizer.zero_grad() instead of net.zero_grad()

silent swan Dec 9, 2019, 11:25 PM

#

the two are generally interchangeable unless you have a very esoteric training scheme

#

optimizer.zero_grad is recommended, but net.zero_grad will still zero out the gradients for the model

#

@oblique belfry well lemme know if I can help. I've had to port models back and forth between TF and pytorch multiple times, so I'm well aware of the pain points

lapis sequoia Dec 10, 2019, 1:11 AM

#

hey

#

trying to figure out how to parse OCR'd text that has arbitrary yet somewhat similar formatting

#

is there some kind of matching system that works well for messed up OCR

#

like for instance i might want to match "FURCHASE ORDER NO." with "PURCHASE ORDER NO." as my search criteria
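
One low-tech option is fuzzy string matching from the standard library. This is only a sketch: the label list and the 0.8 cutoff are assumptions, and best_match is a hypothetical helper name.

```python
import difflib

def best_match(ocr_text, candidates, cutoff=0.8):
    """Return the candidate most similar to ocr_text, or None if none reach cutoff."""
    matches = difflib.get_close_matches(ocr_text, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical list of field labels expected on the scanned form.
labels = ["PURCHASE ORDER NO.", "INVOICE DATE", "SHIP TO"]

print(best_match("FURCHASE ORDER NO.", labels))
print(best_match("TOTAL", labels))
```

The cutoff is a similarity ratio between 0 and 1, so it would need tuning against real OCR output: too low and unrelated labels start matching, too high and heavily garbled text is missed.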

idle oracle Dec 10, 2019, 4:59 AM

#

@hardy crag nah i tried, didn't work

#

it's most likely an issue with moving things with .to(device), because an older version works.

silent swan Dec 10, 2019, 5:11 AM

#

check the gradients! thx

idle oracle Dec 10, 2019, 5:14 AM

#

how would you go about doing that

#

i believe my grads are none

silent swan Dec 10, 2019, 5:15 AM

#

if you call net.parameters(), you should get a list of parameters

#

check the .grad on each one. It should be None or 0 at the start, and after a forward pass and loss.backward(), each .grad should be a tensor
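
A minimal sketch of that check. The tiny network and the random batch are stand-ins (assumptions), not the model from the pastebin, and grad_report is a hypothetical helper name:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for the real network: 784 inputs, 10 classes.
net = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))

def grad_report(model):
    """Map each parameter name to None (no gradient yet) or its gradient norm."""
    return {
        name: None if p.grad is None else p.grad.norm().item()
        for name, p in model.named_parameters()
    }

before = grad_report(net)        # taken before any backward pass

x = torch.randn(8, 784)          # fake batch of 8 flattened images
y = torch.randint(0, 10, (8,))   # fake integer class labels
loss = F.nll_loss(net(x), y)
loss.backward()

after = grad_report(net)         # taken after one backward pass
print(before)
print(after)
```

If an entry is still None after loss.backward(), that parameter is not connected to the loss. If the norms are all exactly 0, the gradients exist but carry no signal.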

#

let me know if you do or do not see that

#

actually I have another guess. Can you show me the code you're using including where you initialize the optimizer?

idle oracle Dec 10, 2019, 5:18 AM

#

yea

#

one sec

arctic wedgeBOT Dec 10, 2019, 5:18 AM

#

Hey @idle oracle!

It looks like you tried to attach a file type that we do not allow. We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv.

Feel free to ask in #community-meta if you think this is a mistake.

idle oracle Dec 10, 2019, 5:18 AM

#

ok....

#

https://pastebin.com/DkEJAtmD

#

@silent swan so now the loss values change, but...

#

tensor(2.2650, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.2917, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3413, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.2748, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3047, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.2935, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.2991, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.3095, device='cuda:0', grad_fn=<NllLossBackward>)
tensor(2.2907, device='cuda:0', grad_fn=<NllLossBackward>)

#

they're pretty stagnant

#

Ok, I think I have the problem identified, it must be a .to(device) issue

#

but i don't know how to fix it

silent swan Dec 10, 2019, 6:03 AM

#

try creating the optimizer after all the to(device) stuff

idle oracle Dec 10, 2019, 9:41 AM

#

yea imma stick to cpu

#

small problem anyway

#

stick a xeon to it and she'll be fine

silent swan Dec 10, 2019, 9:55 AM

#

lol we could just fix the issue

oblique belfry Dec 10, 2019, 9:59 AM

#

Unless your model is tiny, use the GPU.

idle oracle Dec 10, 2019, 12:51 PM

#

Honestly i now have a massive problem with weights

#

for some weird reason it seems like every time i reset the runtime and create a new model (code creates a fresh one) there is always a single weight unaccounted for, with a value of 0.000, whereas the others have massive negative values.

#

tensor([-23179.8730, -17778.3848, -14537.1084, -31701.2402, -27408.4082,
        -20759.7539, -24848.2812,      0.0000, -38601.3164, -40405.9219],
       grad_fn=<SelectBackward>)
tensor(7)

#

so now i have 7 with nothing in it

#

it goes , 0,1,2,3,4... etc

#

and the rest have massive neg values

#

i am going to sleep now cuz it's almost 12:00AM so i will chk back in the morning. about 7 hrs from now

#

i have a feeling its because my training set is like in colour

📎 unknown.png

#

and im testing on

📎 unknown.png

#

suggestions on how to change format?

oblique belfry Dec 10, 2019, 1:25 PM

#

You could grayscale the training data.

#

However.....a good neural net should be able to handle that.

#

Also, matplotlib's plots aren't always the most indicative of the data. The first image almost looks like a heatmap.

lapis sequoia Dec 10, 2019, 2:00 PM

#

I need some help

#

is it normal for a vm instance to idle after a heavy computation..

#

my cpu utilization near maxed out during training for a few hours.. now it's done but in the model exporting phase it's not doing anything and I barely see a blip on the utilization..

lapis sequoia Dec 10, 2019, 4:52 PM

#

📎 unknown.png

#

how can I understand the unit vector (r_a) on the right side and the orange block U_s

#

I tried googling but I do not know how this works. I want to understand it

silent swan Dec 10, 2019, 8:25 PM

#

is your input 28x28x3 or 28x28x1?

#

in any case, no, if the color scale is modified, you should have no prior expectation that the model should still work

#

would you mind posting your whole code again?

idle oracle Dec 10, 2019, 8:41 PM

#

yea ok

#

https://pastebin.com/bWu1k8TL

#

@silent swan what if my training set it inverted form what im testing

#

from*

#

i think it was sensitive to color

#

@silent swan it works now,

#

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
import PIL.ImageOps 

import numpy
im = Image.open("number.png")
img = PIL.ImageOps.invert(im)
num_img = numpy.array(img)
lum_img = num_img[:, :, 0]
plt.imshow(lum_img)

#

used this to apply the 'LUT' like the data set, inverted it, then it works with 99.6% accuracy

oblique belfry Dec 10, 2019, 9:13 PM

#

I use cv2 for image transformations. Found it to beat Pillow and scikit-image in terms of speed.

upper ginkgo Dec 10, 2019, 10:00 PM

#

I'm trying to make a family tree image like MarriageBot's for my own bot
Here's a family tree from MarriageBot:

📎 tree.png

#

I'd like to achieve something similar, except I'm using PIL to make a family tree with this design:

📎 tree.png

#

I've never made something like this, how would I do it while keeping the design?

silent swan Dec 10, 2019, 10:16 PM

#

inversion of inputs can break a model

oblique belfry Dec 10, 2019, 10:49 PM

#

My coworkers worked on this massive project for weeks and they were getting nowhere. Ends up they had the inputs reversed. They wasted weeks. "channels_first" and "channels_last" are probably the most important words in image recognition.

Harder to mess up in Pytorch though since you have to be so explicit. Keras lets you be a bit too lazy at times.

silent swan Dec 10, 2019, 11:11 PM

#

exactly!

#

too much automatic inference and things become hard to debug

oblique belfry Dec 10, 2019, 11:16 PM

#

For the most part, it is pretty cool.

#

I just am not a fan of Keras/TF exceptions. Curse of the static graph....

#

That alone might move me to Pytorch.

Also projects like Poutyne are pretty great. It gives a Keras-like interface to Pytorch. I can re-implement all my custom Keras callbacks and it will behave in a similar way as the original. I really didn't want to really configure all that again.

silent swan Dec 10, 2019, 11:19 PM

#

I've never gotten into the whole callback style of programming. For Keras, it really feels more like a hack because Keras owns all of your control flow, so you have to play by its rules

#

PyTorch is a little more verbose, but in return you basically control everything

#

but yes, there are definitely similar keras-like wrappers for pytorch

oblique belfry Dec 10, 2019, 11:22 PM

#

Saves me on boilerplate. I just like running a set of functions at the end. I really don't care about controlling all that. (For one project, I have a Socket.IO callback.) Just let that run at the end, I don't really care. I think the Callbacks are a nice abstraction. I think it makes the training code cleaner. But, that is more for my readability than true functionality.

#

I like how TF 2.0 essentially took everything people like about Pytorch and tried to implement that in their stuff.

silent swan Dec 10, 2019, 11:23 PM

#

well, I think they tried to at least

oblique belfry Dec 10, 2019, 11:24 PM

#

Instead of trying to bring in dynamic graphs (okay...technically it is called eager execution) in TF, they should have done an AngularJS and Angular 2 thing. Just do a rewrite and do it the right way.

#

Right now it is still a mess.

silent swan Dec 10, 2019, 11:24 PM

#

I think TF fundamentally targets a different goal though

oblique belfry Dec 10, 2019, 11:25 PM

#

I always have Cuda errors with TF. Seems like Pytorch always works with whatever version of Cuda I got. That is so nice.

silent swan Dec 10, 2019, 11:25 PM

#

I think that's mostly because of the TF/Google attitude

#

"you have to do things our way"

#

that's why the TF library contains everything, to force you into their eco system

#

PyTorch feels like it's trying to serve the users

#

TF feels like you need to follow their way

oblique belfry Dec 10, 2019, 11:26 PM

#

The next ML project I get, I might do the project in Pytorch.

The only thing about Pytorch is its "deployment strategy." TF/Keras does a good job at deploying models. Pytorch, to me, is a bit lagging in this area. But, this will improve with time and more people writing up articles.

silent swan Dec 10, 2019, 11:27 PM

#

fair enough. I'm on the research side so I don't do much with deployment

#

TF has a lot of good tools for that

#

but if you just want to train models, PyTorch is generally the far better option

oblique belfry Dec 10, 2019, 11:29 PM

#

Unless you have to do Conv3D...

#

https://tenor.com/view/sad-sadness-depress-frown-aw-man-gif-8171721

silent swan Dec 10, 2019, 11:29 PM

#

how so

oblique belfry Dec 10, 2019, 11:29 PM

#

Still can't configure the model architecture correctly. lol

silent swan Dec 10, 2019, 11:30 PM

#

that's because of the incompatibility of TF/PyTorch conv padding. Not that one or the other is more correct

#

if anything, PyTorch allows you to drop a debugger into wherever you're running into an error, TF throws you into magic C error space

oblique belfry Dec 10, 2019, 11:34 PM

#

I haven't touched a debugger since I was learning Visual Basic in undergrad. I am not the biggest fan of them. I need to get into them more. Might could help.

#

The padding is what is killing me.

silent swan Dec 10, 2019, 11:35 PM

#

show me your stack trace and the line that it's throwing an error on

oblique belfry Dec 10, 2019, 11:37 PM

#

I will tonight

idle oracle Dec 11, 2019, 9:25 AM

#

i have this image, is there a way to center it? Using np?

📎 number4.png

native rivet Dec 11, 2019, 11:52 AM

#

any machine learning engineer here?

oblique belfry Dec 11, 2019, 3:40 PM

#

@native rivet I’ll do my best. What’s up?

silent swan Dec 11, 2019, 5:17 PM

#

you can do something like: find the bounding box for non-white pixels, find the center of the bounding box, and then shift
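
Roughly, that could look like the sketch below. It assumes a 2-D grayscale array where the background is exactly 255; center_foreground is a hypothetical helper, and an RGB image or an off-white background would need a threshold instead of an exact comparison.

```python
import numpy as np

def center_foreground(img, background=255):
    """Shift the non-background pixels so their bounding box sits in the middle."""
    mask = img != background
    if not mask.any():
        return img.copy()                       # nothing to center

    rows = np.flatnonzero(mask.any(axis=1))     # rows containing foreground
    cols = np.flatnonzero(mask.any(axis=0))     # columns containing foreground
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    box_h, box_w = bottom - top + 1, right - left + 1

    # Top-left corner that centers the bounding box in the frame.
    new_top = (img.shape[0] - box_h) // 2
    new_left = (img.shape[1] - box_w) // 2

    out = np.full_like(img, background)
    out[new_top:new_top + box_h, new_left:new_left + box_w] = \
        img[top:bottom + 1, left:right + 1]
    return out

# Tiny demo: a 2x3 dark block stuck in the top-left corner of a 7x7 canvas.
demo = np.full((7, 7), 255, dtype=np.uint8)
demo[0:2, 0:3] = 0
centered = center_foreground(demo)
print(centered)
```

This centers the bounding box rather than the center of mass, which is the simpler of the two and matches the suggestion above.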

pulsar stag Dec 11, 2019, 5:37 PM

#

Dash-Bootstrap-Components How to Build Layered Dashboards with Python https://youtu.be/P-XYio7G_Dg

oblique belfry Dec 11, 2019, 9:08 PM

#

https://arxiv.org/abs/1912.04316
An interesting approach to action recognition. They use graph CNNs to solve the problem. Very novel. I have not read an approach like this before.

We don't have an ML channel, so sorry if it doesn't truly fit data science.

arXiv.org

STAGE: Spatio-Temporal Attention on Graph Entities for Video...

Spatio-temporal action localization is a challenging yet fascinating task
that aims to detect and classify human actions in video clips. In this paper,
we develop a high-level video understanding...

oblique belfry Dec 11, 2019, 10:53 PM

#

I am having to refactor our custom ML platform, and I just cannot seem to get started. it was built very project specific. Trying to abstract that is going to be fun. My boss is also a bit opinionated about things and so some of my changes may not even matter.

For the record: Comments are not a waste of time.

silent swan Dec 11, 2019, 11:54 PM

#

"eh, I'll just use kwargs for this"

oblique belfry Dec 12, 2019, 4:11 AM

#

Honestly, I would take that.

Had to convince him we should use classes for this one thing because it holds state, and we were just passing things through a million functions. I get that some people go a bit too far with polymorphism and inheritance, but there is a time and place. You know?

#

Currently working on an action recognition project, but we are trying to make this platform accessible for sequence data, tabular data, audio, etc.

lapis sequoia Dec 12, 2019, 4:27 AM

#

sounds fun

#

what do you mean platform though

oblique belfry Dec 12, 2019, 4:31 AM

#

It's composed of several parts. Mainly data labelling and deep learning "best practices" our company has found to be uberly successful. Essentially standardize things and allow an ML engineer to focus on building models, not software development.

#

If anything, it has been a killer in-house tool.

lapis sequoia Dec 12, 2019, 4:45 AM

#

sounds nice

#

how did you build it

#

I'm familiar with building models but not so much on serving..

oblique belfry Dec 12, 2019, 5:16 AM

#

Can't dive too deep into it. But, the python side is pretty simple. We found caching our data in a fast, binary format gave us 5x to 10x improvement on training time. So, after the traditional ETL process, prep your data into a high performance format, then load that into a Keras generator or Pytorch Dataloader. Kind of like ahead-of-time compiling code. Same principle.

Used React and Express to create a SPA for labeling and viewing model stats. We also found tensorboard annoying as hell to work with.

lapis sequoia Dec 12, 2019, 5:51 AM

#

labelling?

oblique belfry Dec 12, 2019, 5:52 AM

#

data labelling.

lapis sequoia Dec 12, 2019, 5:52 AM

#

I know what it is..

#

I'm wondering how you're doing it over the app

oblique belfry Dec 12, 2019, 5:53 AM

#

There weren't any "simple" data labelling solutions. And, we weren't interested in Sagemaker.

lapis sequoia Dec 12, 2019, 5:53 AM

#

how do you save and version models

oblique belfry Dec 12, 2019, 5:54 AM

#

Labeling videos for Action Recognition isn't the easiest. Gotta search per frame.

lapis sequoia Dec 12, 2019, 5:54 AM

#

ok then

oblique belfry Dec 12, 2019, 5:56 AM

#

Save it locally. Though it isn't hard to save stuff to S3 or Azure. For each run, we save weights, logs, performance stats, and other stuff.

lapis sequoia Dec 12, 2019, 5:56 AM

#

hmm ok

vital cipher Dec 12, 2019, 3:28 PM

#

has anyone set up metabase in ubuntu or any distro?

#

and used?

lapis sequoia Dec 12, 2019, 7:01 PM

#

I was reading about Decision trees on medium.. Can anyone tell me what does it mean by this paragraph 👇

#

📎 Capture.PNG

#

I dont understand what does it mean by saying pure

#

i even found the same text in the decision tree documentation

chilly geyser Dec 12, 2019, 7:15 PM

#

Pure means the leaf has only 1 category

#

To give an example, suppose you had a dataset of 6 things, of category "Blue" and "Red"

#

Suppose in your dataset you have some predictors

#

What the tree does is calculate for entropy changes ('information gain') by splitting using the predictors (if predictor <= some_value, classify in some way, else classify the other way)

#

I have terrible drawing skills but this should illustrate my point

📎 unknown.png

#

Suppose that basically of your 6 data points, 4 were "Red" and 2 were "Blue"

#

We assume for simplicity that your data is very good, so it just splits once, and then all the 2 "Blue" get together - it's now pure
The same happens for the 4 "Red" - the leaf is now pure.
Since the tree has split into pure leaves, the algorithm ends

#

The splitting and deciding of predictors is the core to the algorithm - use of Entropy/Information Gain isn't the only way - although there are reasons for doing so
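
To put numbers on the Red/Blue example, here is a small pure-Python sketch of entropy and information gain. The function names are made up for illustration and the split is the idealised single split described above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["Red"] * 4 + ["Blue"] * 2
left, right = ["Red"] * 4, ["Blue"] * 2   # both leaves are pure

print(entropy(parent))                     # impurity before the split
print(entropy(left), entropy(right))       # impurity of each leaf
print(information_gain(parent, [left, right]))
```

A pure leaf has entropy 0, so a split that produces only pure leaves recovers all of the parent's entropy as information gain. The tree picks whichever predictor and threshold gives the largest gain at each step.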

fast rain Dec 12, 2019, 7:36 PM

#

hey guys

#

can anyone help me with excel stuff?

lapis sequoia Dec 12, 2019, 7:41 PM

#

Ok i get it now..

#

thanks a lot

#

I have one more question

#

If we oversplit the data in decision tree what will happen?

#

Will we overfit the data then?

chilly geyser Dec 12, 2019, 7:53 PM

#

Yes you will overfit your training set, so you need a model selector/parameters/algorithm to decide if you really need a tree that always ends in pure leaves

#

What happens is that with the fully grown tree, you can prune it at somewhere in the middle, and this is something you can do after you grow the tree

#

It's going to be a hard problem to dynamically check for overfitting while growing the tree - so that's why that is done after

#

With pruning the tree becomes something that ends in 'impure' leaves, but what's important is if the tree is really useful and generalises to actual data or use cases you want to present it towards

lapis sequoia Dec 12, 2019, 8:02 PM

#

Ok got it.. thanks a lot

deft harbor Dec 12, 2019, 10:43 PM

#

thats a pretty tree you have there

#

@lapis sequoia look at bagging and boosting as well

twilit ore Dec 12, 2019, 11:12 PM

#

Is this the appropriate channel to discuss things related to Scrapy?

deft harbor Dec 13, 2019, 1:29 AM

#

I'm not against it, but I don't know the rules

chilly shuttle Dec 13, 2019, 6:27 AM

#

if you keep it general and not 'im scraping some site that has TOS saying not to' its probably fine

#

although it's also not data science

lapis sequoia Dec 13, 2019, 6:49 AM

#

@deft harbor Sure will

#

Also in Random forests if we used a large number of n_estimators or trees will we overfit the data?

silent swan Dec 13, 2019, 8:02 AM

#

eventually in some way yes

haughty pawn Dec 13, 2019, 5:51 PM

#

hi there

#

anyone heard of ai dungeon2? question related:
can you help me out by modifying that model in a way to shove the entire save and not just the prompt + 10 8 last phrases?

#

here's the original repo https://github.com/AIDungeon/AIDungeon

#

here's an example of a desired outcome:
you save your conversation, you load it via id and it shoves the contents of the save into the model, you continue your conversation in tact and the context is not broken

deft harbor Dec 13, 2019, 6:07 PM

#

I don't think I've seen a lot of people do the work for others here

haughty pawn Dec 13, 2019, 6:08 PM

#

sorry, but i'm not an ai(TF) programmer, so i can't really figure it out myself

twilit ore Dec 13, 2019, 8:05 PM

#

Anyone working in a project that requires scraping the web with Scrapy? I'm up to join you to learn it hands-on and eventually help you in exchange for some knowledge. I've some experience scraping and wrangling data using requests, selenium webdriver, beautifulsoup, regex, lxml, json...

If you've worked with Scrapy before but are not using it in any of your project and have some free time, I'd be interested in partnering up to tackle some whatever-payment freelance jobs using it if that pleases you.

Not sure if this is appropriate channel but seems like it's the most appropriate one.

idle oracle Dec 14, 2019, 5:36 AM

#

hey is it possible to create an AI that plays a game against you and gets better everytime?

oblique belfry Dec 14, 2019, 5:58 AM

#

That is the field of Reinforcement Learning.

native moon Dec 14, 2019, 6:26 AM

#

i have merged two csv files into a single dataframe, is there any way i can check if the row from the new merged dataframe is present in the second csv file or not?

paper niche Dec 14, 2019, 8:54 AM

#

plenty of ways. why not add a column called “source”, whose value is “first” for the first csv and “second” for the second csv, before merging them?
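
Something like this, assuming "merged" means the two files were stacked on top of each other. The small frames stand in for the real CSVs and the column names are invented:

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv("first.csv") and pd.read_csv("second.csv").
first = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
second = pd.DataFrame({"id": [3], "value": ["c"]})

# Tag every row with where it came from before combining.
first["source"] = "first"
second["source"] = "second"
merged = pd.concat([first, second], ignore_index=True)

# Rows that originated in the second file.
from_second = merged[merged["source"] == "second"]
print(from_second)
```

If the frames were joined on a key with merge() instead, the same tagging idea still applies, but the source columns of the two sides would need different names so they do not collide.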

haughty pawn Dec 14, 2019, 11:52 AM

#

alrighty then, since you either don't know the answer to the previous question or just don't want to answer here's another ~~two~~ three:

is ML available to general populus yet (was it dumbed down for anyone to take on it)?
what is the least resource (GPU/CPU) intensive but just as performant model (compared to GPT-2)?
what model is better than GPT-2 and supports i18n?
(pardon if i mix my terms up)

lyric kernel Dec 14, 2019, 12:24 PM

#

df1 = data[data.MESS_DATUM >= 19600101]
df1 = df1[df1.MESS_DATUM <= 19981231]

How do i get this in one line ?
"and" and "&" didn't work

paper niche Dec 14, 2019, 1:25 PM

#

data[data.MESS_DATUM.between(19600101, 19981231)]
@lyric kernel

lyric kernel Dec 14, 2019, 1:27 PM

#

crisp!

paper niche Dec 14, 2019, 1:31 PM

#

if you want to use &, you're gonna need to surround your conditions in parentheses, like

data[(data.MESS_DATUM >= 19600101) & (data.MESS_DATUM <= 19981231)]

I'm guessing that's why your attempt didn't work

worn stratus Dec 14, 2019, 5:16 PM

#

pipenv, venv, or conda for a machine learning project?

silent swan Dec 14, 2019, 6:19 PM

#

conda, always conda

quartz monolith Dec 14, 2019, 6:29 PM

#

has somebody used graphs database to recommend users based on keywords? The Keywords are extracted from a text and combined with the users

formal storm Dec 14, 2019, 8:55 PM

#

Hi, Is this the right place to ask for some Pandas help?

quartz monolith Dec 14, 2019, 9:25 PM

#

zes

#

yes

formal storm Dec 14, 2019, 9:35 PM

#

awesome

worn stratus Dec 14, 2019, 10:53 PM

#

@silent swan whats the advantage to conda?

slim fox Dec 14, 2019, 11:42 PM

#

imo venv would give you the most flexibility. Some packages might not be of the latest version in conda

silent swan Dec 15, 2019, 4:21 AM

#

conda installs all the scientific computing libraries properly

#

you get both the package installer and environment manager in one

oblique belfry Dec 15, 2019, 4:27 AM

#

I've always found conda to be annoying. And the only issues I have with any scientific computing packages is Tensorflow due to it not playing with certain versions of Cuda.

silent swan Dec 15, 2019, 6:04 AM

#

what issue have you had with conda

#

also tensorflow plays nicely with NO ONE lol

uneven harbor Dec 15, 2019, 7:44 AM

#

If anyone has extra time and would like to help with the creation of my Jojo bot you can help create stands here https://docs.google.com/document/d/1o4gkz4jmROzNSp79LvOwggQBW2sUI5gZ0Ft6MF2WEPo/edit?usp=sharing

Google Docs

Botaro Stands

Help create new stands for Botaro! The format is: [‘Name’, HP Stat, Attack Stat, Speed Stat, Range Stat] For reference 500 is the mid tier hp stat, 50 is the mid tier attack stat, 50 is the mid tier speed stat, and 2 is the mid tier range stat Please make every stat ex...

lapis sequoia Dec 15, 2019, 9:04 AM

#

📎 unknown.png

#

what does this mean

#

1:06:14<2412:56:12 this part

silent swan Dec 15, 2019, 9:13 AM

#

time so far < expected time to completion probably?

lapis sequoia Dec 15, 2019, 9:16 AM

#

2000 hrs?

#

wut

oblique belfry Dec 15, 2019, 2:43 PM

#

I can’t remember exactly. But I remember it’s just simpler to set up a virtual env and install what I need. I also don’t mind pipenv either.

I also don’t need all those dependencies in a project.

deft harbor Dec 15, 2019, 4:06 PM

#

But it comes with spyder 😶

oblique belfry Dec 15, 2019, 5:57 PM

#

....I don’t use Spyder 😬

silent swan Dec 15, 2019, 7:15 PM

#

That's what it says. 900 seconds per iteration, and you have 10000 iterations.
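
As a rough sanity check on that arithmetic, using the rounded figures above rather than the exact values from the screenshot (format_hms is a hypothetical helper):

```python
def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS, letting hours grow past 24."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

seconds_per_iteration = 900     # rounded rate quoted above
total_iterations = 10_000       # rounded iteration count quoted above

total = format_hms(seconds_per_iteration * total_iterations)
print(total)
```

That lands in the same ballpark as the 2412:56:12 shown after the "<", so the bar really is estimating a couple of thousand hours at the current rate.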

spare arch Dec 15, 2019, 8:27 PM

#

Does anyone know

#

where I can get started

#

learning how to make

#

AI learn how to play games?

oblique belfry Dec 15, 2019, 8:41 PM

#

@spare arch

https://gym.openai.com/

Gym: A toolkit for developing and comparing reinforcement learning...

spare arch Dec 15, 2019, 9:17 PM

#

dope

#

thanks @oblique belfry

jolly briar Dec 15, 2019, 9:18 PM

#

@oblique belfry i found conda confusing at first as it dumps stuff into your bashrc or something, but after using it for a bit it's fine really, haven't had issues since, perhaps i don't do a fat lot with it though

oblique belfry Dec 15, 2019, 9:21 PM

#

If I was on Windows, I'd take a second look at it. But I just haven't had a need good ole pip couldn't solve.

jolly briar Dec 15, 2019, 9:42 PM

#

@oblique belfry fair, i'm only using it because the team uses it, and if it wasn't for that it's unlikely i'd have got past the initial hiccup i expect

oblique belfry Dec 15, 2019, 9:42 PM

#

I get that.

silent swan Dec 16, 2019, 12:12 AM

#

worth reading https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/

#

if I'm not wrong conda also installs optimized versions of e.g. numpy libraries

oblique belfry Dec 16, 2019, 1:08 AM

#

I’ve read that before. Like I said, I don’t look down on conda. But there just hasn’t been a situation where I needed something else than pip to get all my ml packages going.

deft harbor Dec 16, 2019, 3:49 AM

#

You know you want spyder

twilit ore Dec 16, 2019, 3:50 AM

#

@oblique belfry Do you only use one version of python or you create venvs to replace conda?

oblique belfry Dec 16, 2019, 3:51 AM

#

Just venvs. I like the separation between projects.

twilit ore Dec 16, 2019, 3:52 AM

#

Then that's just fine.
Conda is a huge help when you need to run several python versions and libs versions without having to store python and libs versions over and over within each project folder

#

But if you're satisfied with storing libs and python for each project, that's a way to do it

paper niche Dec 16, 2019, 4:53 AM

#

it’s been a while since i last used conda, but in my experience it also takes really long for conda to resolve dependencies and when installing packages; whereas if i just want to spin up a quick virtualenv, venv+pip is usually much faster and i can get going quickly

#

i see the merits of conda as well though, but just saying it’s not for everybody and not necessarily a defacto for all ds projects

gleaming thorn Dec 16, 2019, 5:34 AM

#

how to speed up model training using tensorflow for object detection with YOLOv3 Darknet? it takes 5 days for 1000 iterations

silent swan Dec 16, 2019, 6:01 AM

#

faster GPU, larger batch sizes, or tweak the learning rate if you're okay with having slightly worse performance

gleaming thorn Dec 16, 2019, 6:05 AM

#

i already use a GPU and also ran it on a google colab GPU but it takes the same time

deft harbor Dec 16, 2019, 6:15 AM

#

That was a useful article sheemp

#

https://github.com/BMW-InnovationLab

GitHub

BMW InnovationLab

This organization contains open source software published by the developers and partners of the BMW InnovationLab - BMW InnovationLab

gleaming thorn Dec 16, 2019, 6:21 AM

#

@deft harbor thanks

lapis sequoia Dec 16, 2019, 1:39 PM

#

:<

📎 unknown.png

deft harbor Dec 16, 2019, 3:08 PM

#

What did you go with

oblique belfry Dec 16, 2019, 6:09 PM

#

I'd use the original Yolo Darknet that is written in C. It is more finicky to work with, but the performance time is impressive.

worn stratus Dec 16, 2019, 7:23 PM

#

Does anyone happen to have a neat example of calculating information gain in Python?

waxen topaz Dec 16, 2019, 8:27 PM

#

Hello everyone, I'm trying to write an essay on optical character recognition as implemented with machine learning. do you guys have any interesting or useful sources explaining the topic? looking for youtube videos or articles or academic papers.
As it's not for a CS course I want to explain what OCR and Machine learning are as well as what it's useful for.

acoustic mural Dec 17, 2019, 2:47 AM

#

what libraries do you all use to automate testing different neural architectures? i'm not going to be able to work interactively

#

not just regular hyperparameter tuning but also things like number, type, and size of layers

#

on top of keras*

#

although i'm sure more general solutions exist

native moon Dec 17, 2019, 4:04 AM

#

def fill_data(data_frame):
    for i in np.setdiff1d(unique_full_folder_path, data_frame["Full Folder Path"].values):
        data_frame.loc[data_frame.shape[0]] = [i, 0, *data_frame.iloc[0,2:].values]
    return data_frame

can anyone tell me what this does?

#

unique_full_folder_path = report_1_df["Full Folder Path"].append(report_2_df["Full Folder Path"]).unique()

silent swan Dec 17, 2019, 4:51 AM

#

testing in what sense

#

looks like it gets all the unique "Full Folder Path"s across both data frames

lapis sequoia Dec 17, 2019, 4:52 AM

#

anyone have experience with Rstudio and SQL?

#

I know this a python disc

lapis sequoia Dec 17, 2019, 6:03 AM

#

sql, you can ask in databases

vital cipher Dec 17, 2019, 10:56 AM

#

@lapis sequoia yeah working with rstudio too so can dm me 🙂

#

hope i can help you

native stag Dec 17, 2019, 2:15 PM

#

https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

#

worth a read

worn stratus Dec 17, 2019, 2:46 PM

#

Does anyone know of an example of a decision tree (preferably id3) implemented from scratch? I can find a couple on random githubs, but they seem to have issues

vital cipher Dec 17, 2019, 2:57 PM

#

@worn stratus you wanna implement decision tree from scratch or you wanna an example for it?
i do have an example and can share you...so do let me know

worn stratus Dec 17, 2019, 2:59 PM

#

Yeah, an example would be useful

#

my end goal is random forest from scratch

#

but the first step is a decision tree

vague merlin Dec 17, 2019, 7:44 PM

#

Hey I need to create a program that can check similarity between two images. Is machine learning the best way to solve this?

deft harbor Dec 17, 2019, 10:18 PM

#

are you looking for the SAME image?

#

if all you need to do is match the same picture to itself, then you dont need machine learning

#

you could just see if the pixels match

#

@vague merlin

wraith basin Dec 18, 2019, 4:39 AM

#

@worn stratus for information gain in Python check this out https://machinelearningmastery.com/information-gain-and-mutual-information/

Machine Learning Mastery

Jason Brownlee

Information Gain and Mutual Information for Machine Learning

Information gain calculates the reduction in entropy or surprise from transforming a dataset in some way. It is commonly used in the construction of decision trees from a training dataset, by evaluating the information gain for each variable, and selecting the variable that m...

#

Has worked examples

deft harbor Dec 18, 2019, 5:44 AM

#

Thanks for the read

lapis sequoia Dec 18, 2019, 7:15 AM

#

how is this different from stacking models

#

say I use catboost or something that works well with categories.. and then append predictions to support the next model on top of this

silent swan Dec 18, 2019, 8:20 AM

#

@worn stratus if you're good at reading code you can read the sklearn cython

#

actually maybe not, I don't think it's a good learning experience

worn stratus Dec 18, 2019, 11:38 AM

#

@silent swan at this point I'd happily look at it in the sklearn source, but I can't find it

#

nvm - got it

#

sorry for the ping

vague merlin Dec 18, 2019, 2:17 PM

#

@deft harbor It needs to be able to check similarity between two different pictures. It could be almost the same image but from a different angle or zoom level, etc.

deft harbor Dec 18, 2019, 3:21 PM

#

Ah, then yeah, you will most likely need some sort of NN for that. @vague merlin

#

How many different image classes will you have? For example, you know you will want to match pictures of a certain statue downtown, a bus stop and a specific building, that would be three.

#

If you want to match all possible similar images of anything, that's going to be a pretty large undertaking.

vague merlin Dec 18, 2019, 4:53 PM

#

@deft harbor ah okay,
it needs to check similarity between different rooms within a house, so if there are two images of the same kitchen but from a different angle it would still give it a high similarity score, the same goes for bedrooms, bathrooms and so on, preferably the outside of the house as well. Just enough to determine if it's the same house or not.

I assume that it would be quite a large project to make something like that work?

silent swan Dec 18, 2019, 5:51 PM

#

still, try some silly heuristic like pixel-level difference, maybe with image registration first

oblique belfry Dec 18, 2019, 6:58 PM

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate2d.html

#

Maybe try running an edge detector on the image first, and then running correlate2d.

#

https://ourcodeworld.com/articles/read/1006/how-to-determine-whether-2-images-are-equal-or-not-with-the-perceptual-hash-in-python

I've had some luck with pHash too.

Our Code World

How to determine whether 2 images are equal or not with the percep...

Learn how to generate the perceptual hash of an image using Python.
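
To make the hashing idea concrete, here is a rough sketch of an average hash, a simpler relative of pHash (which uses a DCT instead of plain averaging). It assumes the images are already loaded as 2-D grayscale NumPy arrays; `average_hash` and `hamming_distance` are hypothetical helper names, not functions from the linked article.

```python
import numpy as np


def average_hash(gray, hash_size=8):
    """Return hash_size * hash_size bits: True where a grid cell is brighter than average."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    # Crop so the image divides evenly into the grid, then average each cell.
    h, w = h - h % hash_size, w - w % hash_size
    cells = gray[:h, :w].reshape(hash_size, h // hash_size, hash_size, w // hash_size)
    small = cells.mean(axis=(1, 3))
    return (small > small.mean()).ravel()


def hamming_distance(hash_a, hash_b):
    """Number of differing bits; 0 means the two hashes are identical."""
    return int(np.count_nonzero(hash_a != hash_b))


# Synthetic test images: dark left half, bright right half.
image = np.zeros((64, 64))
image[:, 32:] = 255
dimmer = image * 0.5      # same scene, darker exposure
inverted = 255 - image    # bright and dark halves swapped

print(hamming_distance(average_hash(image), average_hash(dimmer)))    # 0
print(hamming_distance(average_hash(image), average_hash(inverted)))  # 64
```

A hash like this tolerates brightness and resolution changes, but it is unlikely to survive a change of camera angle, which is where feature-based methods come in.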

vague merlin Dec 18, 2019, 8:02 PM

#

thanks guys, I will try a few different solutions and see what results I get. https://en.wikipedia.org/wiki/Scale-invariant_feature_transform seems interesting as well

Scale-invariant feature transform

The scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision to detect and describe local features in images.
It was patented in Canada by the University of British Columbia and published by David Lowe in 1999.
Applications include object ...

uncut shadow Dec 18, 2019, 9:04 PM

#

Hey. What is the problem here? I visited tf errors page but I couldn't find this error.

#

Traceback (most recent call last):
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed with error code 3221225501

#

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/PC/PycharmProjects/TF/test.py", line 1, in <module>
    import tensorflow
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow\__init__.py", line 98, in <module>
    from tensorflow_core import *
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\__init__.py", line 40, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow\__init__.py", line 50, in __getattr__
    module = self._load()
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow\__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow

#

  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\PC\Anaconda3\envs\TF\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\PC\Anaconda3\envs\TF\lib\imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: DLL load failed with error code 3221225501


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

#

This error occurs even when I want to import it

#

I'm using windows 7 if that's needed to know

lapis sequoia Dec 19, 2019, 4:05 AM

#

@uncut shadow what do you want to run locally with TF

#

google colab can run TF without separately installing it

rigid summit Dec 19, 2019, 5:54 AM

#

Hey, anyone here use Anaconda? I can't seem to update my Spyder...

#

the "conda install spyder-4.0.0" doesn't work in the Anaconda Prompt, it says it's not a legit command

silent swan Dec 19, 2019, 6:05 AM

#

what's the error message

#

windows/unix?

lapis sequoia Dec 19, 2019, 6:13 AM

#

try spyder==4.0.0

#

@rigid summit

primal ravine Dec 19, 2019, 6:35 AM

#

Hey, I'm trying to give my matrix categories by first converting the words in a column into numbers, and then setting those numbers as categories. However, I keep getting an error message and I don't know what to do; my code is identical to the professor's.

📎 Screen_Shot_2019-12-19_at_1.29.25_AM.png

#

📎 Screen_Shot_2019-12-19_at_1.31.46_AM.png

#

Professor's Code:

#

📎 Screen_Shot_2019-12-19_at_1.36.01_AM.png

#

Any help would be greatly appreciated, im just starting out learning python for data science!

lapis sequoia Dec 19, 2019, 6:50 AM

#

you need to install those packages

#

are you using spyder

#

open anaconda prompt and run conda install numpy matplotlib

fallen anchor Dec 19, 2019, 8:22 AM

#

Hey

#

```
valid,value
2004-07-21 20:00:00,280
2004-07-21 21:00:00,020
```

#

df.iloc[1]['value2'] = 555

#

I am trying to create a new column value2 for the last row, but the above code doesn't do anything

#

I want it to look like

#

```
valid,value,value2
2004-07-21 20:00:00,280,
2004-07-21 21:00:00,020,555
```

#

how can I do that?
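
One possible fix, sketched under the assumption that the CSV is read with pandas: `df.iloc[1]['value2'] = 555` first pulls out the row as a temporary Series and assigns into that, so `df` itself never changes. A single `.loc` call with a row label and a column name writes into the frame and creates the column. Reading `value` as a string is an extra assumption here, made only to keep the leading zero in `020`.

```python
import io

import pandas as pd

csv_text = """valid,value
2004-07-21 20:00:00,280
2004-07-21 21:00:00,020
"""
# dtype=str for "value" is an assumption so that "020" keeps its leading zero.
df = pd.read_csv(io.StringIO(csv_text), dtype={"value": str})

# One .loc call: (label of the last row, new column name).
# The column is created on the fly and every other row gets NaN.
df.loc[df.index[-1], "value2"] = 555

print(df)
```

Writing the frame back out with `df.to_csv(..., index=False)` leaves the missing `value2` cell empty, which matches the desired layout above.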

lapis sequoia Dec 19, 2019, 8:34 AM

#

this is a csv?

oblique wyvern Dec 19, 2019, 10:53 AM

#

Hey everyone, I'm trying to figure out how to do the following with numpy:
There's an array a = [1, 2, 3, 4, 5] with data and an array b = [0, 1, 2, 0, 1] of row numbers.
How would I create a matrix using the data from a, placing each element in the row specified by b while keeping its column position from a?
Like:

[[1, 0, 0, 4, 0],
 [0, 2, 0, 0, 5],
 [0, 0, 3, 0, 0]]

paper niche Dec 19, 2019, 12:30 PM

#

with scipy you can use coo_matrix to construct a sparse matrix using row,col,data then call .todense() after

#

@oblique wyvern
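
As an alternative to the sparse-matrix route, plain NumPy integer-array indexing can also do this. A small sketch, assuming `b` holds valid non-negative row numbers and the output should have `b.max() + 1` rows:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([0, 1, 2, 0, 1])

# One row per distinct row number, one column per element of a.
out = np.zeros((b.max() + 1, a.size), dtype=a.dtype)

# Pair each row index in b with its column position 0..len(a)-1.
out[b, np.arange(a.size)] = a

print(out)
# [[1 0 0 4 0]
#  [0 2 0 0 5]
#  [0 0 3 0 0]]
```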

rigid summit Dec 19, 2019, 3:43 PM

#

Is anyone willing to help me troubleshoot my Anaconda installation? I can't update it OR my Spyder program using either: conda update anaconda OR conda install spyder=4.0.0

#

It's really frustrating, especially because uninstalling and installing seems to take so long

fallen anchor Dec 19, 2019, 4:08 PM

#

@lapis sequoia yes, a csv

fierce ravine Dec 19, 2019, 5:30 PM

#

Would using PCA for dimensionality reduction be a good method for a dataset with 600 features?

#

I’m going to use clustering on the dataset, but I feel as though I am getting really weird results

oblique belfry Dec 19, 2019, 5:35 PM

#

It won’t hurt. Try and see what happens.

fierce ravine Dec 19, 2019, 5:37 PM

#

I did, but I’m having some issues with plotting it. If I want to keep my variance at greater than 0.85, I still have over 100 features

#

If I reduce it to 0.59 i have 3 features which is nice to work with, but wouldn’t the data be extremely muddy?
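
To see how the number of components trades off against retained variance, here is a NumPy-only sketch. The data is synthetic and `components_for_variance` is a made-up helper name; a library PCA exposes the same information through its explained-variance ratios.

```python
import numpy as np


def components_for_variance(X, threshold):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches `threshold`."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, cumulative.size)
    return k, cumulative


# Synthetic data: 50 features driven by 3 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 50))

k, cumulative = components_for_variance(X, 0.85)
print(k, cumulative[:5].round(3))
```

One option worth trying is to cluster on however many components the 0.85 threshold needs and reduce to 2 or 3 dimensions only for the plot, since the plot does not have to use the same representation as the clustering.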

primal ravine Dec 19, 2019, 5:54 PM

#

Hey, I'm new to machine learning, and I'm learning about multiple linear regression, decision trees, support vector machines, and random forest classification.
The course seems slightly directed towards business, but I want to know if all this information is applicable to AI in robotics

#

for example programming a robot to avoid obstacles

#

or pick up certain objects

#

any help would be appreciated

oblique belfry Dec 19, 2019, 6:34 PM

#

Look into reinforcement learning.

primal ravine Dec 19, 2019, 6:36 PM

#

Okay

#

but

#

I would like to know if multiple linear regression, decision trees, support vector machines, random forest classification, and that sort of thing will help me apply machine learning to AI in, say, robotics

fierce ravine Dec 19, 2019, 6:53 PM

#

I think I’m just gonna run with 0.59 variance. It works RenShrugGif

primal ravine Dec 19, 2019, 7:32 PM

#

Can anyone confirm for me whether content such as multiple linear regression, decision trees, support vector machines, and random forest classification will help me apply machine learning to AI in, say, robotics?

silent swan Dec 19, 2019, 7:42 PM

#

you've got a long way to go to get to robotics

oblique belfry Dec 19, 2019, 7:42 PM

#

@primal ravine Potentially...but I don’t know of people currently doing that.

Those are good techniques to know about. But, neural networks are the new norm in that field. You need to learn a super complex non-Linear function. Only neural nets can capture that.

#

@silent swan Agreed.

silent swan Dec 19, 2019, 7:43 PM

#

those are the correct starting points, but it'll still be far away from your goal

#

but if this is like something you want to do over the course of like, 3 years, yes, that's where to start

#

@rigid summit post your error message, and are you using windows?

rigid summit Dec 19, 2019, 10:12 PM

#

thanks @silent swan , I posted the issue here: https://github.com/ContinuumIO/anaconda-issues/issues/11524

GitHub

Can't update Anaconda or anything in Anaconda · Issue #11524 · C...

Actual Behavior When I attempt to update Anaconda, conda, or Spyder using the Anaconda PowerShell Prompt: "conda update anaconda, conda update --all, conda install spyder=4.0.0", ...

#

I don't know if I'll get a response, there are 1000+ open issues apparently

#

I am using windows

lapis sequoia Dec 20, 2019, 4:02 AM

#

is anyone alive

#

📎 239848.png

#

I need help understanding this graph..

#

why does the % change keep decreasing

#

does that mean media is only part of digital ad spending?

lapis sequoia Dec 20, 2019, 4:48 AM

#

nvm I figured it out.. but I'm having trouble understanding difference between ARIMA and ARMA

primal ravine Dec 20, 2019, 5:57 AM

#

Hey, can someone explain to me how deep learning or neural networks are used in robotics? By that I mean, how do you actually let the robot make its own decisions using your code? Do they all require an Arduino? Is there a more powerful alternative?

oblique belfry Dec 20, 2019, 1:10 PM

#

Reinforcement learning. Lol

#

Not all of them have an Arduino. But they have some type of sensors that feed into some type of computer. It depends on what the task is. Object detection requires a different architecture than movement.

#

Robotics is a VERY big field. And each robotics problem can be subdivided into many miniature problems that might use multiple neural networks.

oblique belfry Dec 20, 2019, 5:50 PM

#

My coworker is doing some data transformation on Temple's eeg corpus.

Man...that is some gnarly code, all of it run in an IPython terminal. I am not a fan of running long jobs in IPython/Jupyter.

📎 image.png

jolly briar Dec 20, 2019, 6:28 PM

#

@oblique belfry looks like it's been copy pasted from a script no?

#

i use this workflow quite often

oblique belfry Dec 20, 2019, 6:51 PM

#

Eh.....knowing him....doubt it.

#

He trains all of his neural nets (ones that take over 5 hours to complete an epoch) through Jupyter. I have had too many kernels crap out on long computations. The above script will take a day and a half to complete. He must have better luck with Jupyter than I do.

fierce ravine Dec 20, 2019, 7:17 PM

#

I’ve been looking online and just want to make sure I’m right about this

#

For unsupervised, clustering methods we don’t need to split between training and testing data?

silent swan Dec 20, 2019, 8:16 PM

#

I'm a fan of doing whatever you want in notebooks, but once things move into code, unless it's explicitly a script, you gotta start cleaning things up

#

@fierce ravine that intuition is somewhat correct, but there can be more subtlety around it

#

certainly generally you don't really care about evaluating against anything

#

or in the case of visualization methods, you just want any decent representation of your data, so feel free to run over and over on the whole dataset

fierce ravine Dec 20, 2019, 8:19 PM

#

feelsthumbsup

alpine stream Dec 20, 2019, 10:20 PM

#

Hi! I have an NLP task. The data is text from telephone conversations. The speech has already been converted to text and divided into agent and customer paragraphs. I need to understand which approach is best for the following tasks:

1. Who is the customer and who is the agent?
2. Customer name
3. The topic of conversation
4. Promises made by the operator to the customer (for example, "I call back tomorrow")
5. Negative sentiment (if there is something in the conversation that the subscriber is not happy with)

I am just trying to understand how to handle it. Is it possible to create some kind of general approach for this? If yes, which packages (maybe BERT), publications, or books should I look at?

silent swan Dec 20, 2019, 11:52 PM

#

best to think in terms of "what is the output" for each task

#

e.g. 3 is "document" -> "topic" classification

#

1/5 are per-sentence classification

#

(some of these can be reframed but this is a starting point)

#

2 is sort of span prediction, potentially 4 can be framed the same way as well

#

after that, go look for which models are suitable for each

#

fwiw, BERT (with additional modules) is suitable for all of them

#

but BERT is also much more computationally intensive than simpler models

lapis sequoia Dec 21, 2019, 4:37 AM

#

I am running a GPT-2 text generation model that looks something like this

```
gpt2.generate(sess,
              model_name=model_name,
              prefix=pbuffer[0],
              return_as_list=True,
              length=120,
              temperature=flavorslider.value,
              top_p=0.9,
              truncate='<|e',
              nsamples=1,
              batch_size=1,
              )
```

#

where sess = tensorflow.compat.v1.Session

#

what I want to do is clear the session graph and variables each run to prevent memory leaks

#

so basically after this runs, collect the output, then close the session

#

and just before it runs the next time, open a fresh session

#

anyone know how to do this?

silent swan Dec 21, 2019, 4:50 AM

#

does session.close() not suffice? or using a context manager

lapis sequoia Dec 21, 2019, 4:50 AM

#

it does work, however returns an error about attempting to reuse closed tf session

#

not sure why, since its at the end of the code

silent swan Dec 21, 2019, 4:51 AM

#

do you create a new session each time?

lapis sequoia Dec 21, 2019, 4:51 AM

#

yeah thats what im trying to do

silent swan Dec 21, 2019, 4:51 AM

#

I guess I'm confused about the issue here

#

if you could post a code snippet?

lapis sequoia Dec 21, 2019, 4:51 AM

#

its rather large

#

i can send a link to the colab

silent swan Dec 21, 2019, 4:52 AM

#

sure

lapis sequoia Dec 21, 2019, 4:52 AM

#

```
def reset_session(sess, threads=-1, server=None):
    """Resets the current TensorFlow session, to clear memory
    or load another model.
    """

    tf.compat.v1.reset_default_graph()
    sess.close()
    sess = start_tf_sess(threads, server)
    return sess
```

#

this might be working

#

https://colab.research.google.com/drive/1Gx2gkRKVpjry62iWcEaEPL8bwUkRaRTb

Google Colaboratory

silent swan Dec 21, 2019, 4:52 AM

#

oh it's ai dungeon

lapis sequoia Dec 21, 2019, 4:52 AM

#

if you double click the "Enter Dungeon" header

#

its my own dungeon 👅

silent swan Dec 21, 2019, 4:53 AM

#

yea anyway you can do the above which sounds like it should work

#

all you want to do is either 1) close and create a new session each time

lapis sequoia Dec 21, 2019, 4:54 AM

#

yeah it wasnt working before but now im not getting an error message

silent swan Dec 21, 2019, 4:54 AM

#

or 2) use a context manager, which does that for you

lapis sequoia Dec 21, 2019, 4:54 AM

#

ok, that was what i thought originally, but the error messages had me confused

#

glad to know i was right

#

thank you

lapis sequoia Dec 21, 2019, 5:32 AM

#

📎 unknown.png

#

always this

#

```
gpt2.start_tf_sess(threads=-1, server=None)
  with sess:
    message = gpt2.generate(sess,
                model_name=model_name,
                prefix=pbuffer[0],
                return_as_list=True,
                length=120,
                temperature=flavorslider.value,
                top_p=0.9,
                truncate='<|e',
                nsamples=1,
                batch_size=1,
                )
    return
```

silent swan Dec 21, 2019, 6:13 AM

#

instead do

#

with gpt2.start_tf_sess(threads=-1, server=None) as sess:
    blahblah

lapis sequoia Dec 21, 2019, 6:14 AM

#

yes i have switched to that

#

was getting initialization errors so had to add this:

#

```
init_op = tf.global_variables_initializer()
```

silent swan Dec 21, 2019, 6:15 AM

#

yep, you need to reinitialize the global variables for a new session

lapis sequoia Dec 21, 2019, 6:15 AM

#

sess.run(init_op)

#

it runs, but i get crazy garbled output

#

trying a slightly different arrangement

oblique belfry Dec 21, 2019, 6:38 AM

#

That’s why I don’t like TF.

fallen pendant Dec 21, 2019, 1:36 PM

#

New to this. I just finished my first Python course and I have been practicing on these two apps. Which one do you think is better?

#

https://www.hackerrank.com/dashboard

HackerRank

Dashboard | HackerRank

Join over 7 million developers in solving code challenges on HackerRank, one of the best ways to prepare for programming interviews.

#

or https://www.codewars.com/dashboard

Codewars

Codewars is where developers achieve code mastery through challenge. Train on kata in the dojo and reach your highest potential.

granite marsh Dec 21, 2019, 2:23 PM

#

@fallen pendant First of all, congrats on completing your first Python course 👍
My order (by level of difficulty/complexity) would be:
HackerRank (improves your basics)

LeetCode (improves your knowledge through medium-complexity problems)

HackerEarth (gives you deeper knowledge, and you can also find internships or jobs)

SPOJ (very tough problems)

Many others exist, but practicing on these is more than enough.
If you are still having any doubts, you can contact me at:
https://www.linkedin.com/in/tejas-s-401ab4185

fallen pendant Dec 21, 2019, 2:52 PM

#

@granite marsh thanks so much

granite marsh Dec 21, 2019, 3:24 PM

#

Don't hesitate to ask me for help; reach me at:
https://www.linkedin.com/in/tejas-s-401ab4185

jolly briar Dec 22, 2019, 6:16 PM

#

anyone done parallel processing with R? I'm wondering whether it's more straightforward than in Python

#

https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html

#

seems fairly straightforward.

drifting hemlock Dec 22, 2019, 8:00 PM

#

I think this is more of an operational question instead of data science itself but I'm a bit curious, do you guys implement a workflow in your work as a data scientist?

#

I'm not talking about methodologies (osemn, crisp, asum, etc), more like a team workflow. I've heard of Agile, but that's complicated to implement in a data science team.

rigid summit Dec 22, 2019, 9:02 PM

#

Anybody a seasoned user of Anaconda? The tech support is terrible to non-existent so far for me. I have an issue described here:

https://stackoverflow.com/questions/59419880/how-do-i-change-the-directories-in-anaconda-having-issues-updating

Stack Overflow

How do I change the directories in Anaconda? (Having issues updating)

I can't update anything in Anaconda.

I've tried the Anaconda Powershell Prompt (conda update/install) and the Anaconda Navigator. I think it's because when I originally installed it, I used the

lapis sequoia Dec 22, 2019, 9:32 PM

#

Try deleting the sitepackages -> %USERPROFILE%/AppData/Roaming/ -> Python/../site-packages

drifting hemlock Dec 22, 2019, 9:34 PM

#

In which environment are you trying to update your packages? The base environment?

lapis sequoia Dec 22, 2019, 9:40 PM

#

Check with "conda info -e"

drifting hemlock Dec 22, 2019, 9:40 PM

#

Anaconda is not that great at handling path variables at install/uninstall. You should check them out and make sure your user path is not referenced for Anaconda:

📎 unknown.png

rigid summit Dec 22, 2019, 9:43 PM

#

Oh awesome, thanks let me check

#

It would be amazing if I can finally get this to work...

#

the base is c:\Anaconda

#

@drifting hemlock How do I pull that path up?

#

I mean, that screen...

drifting hemlock Dec 22, 2019, 9:46 PM

#

Type path in the search bar and select this one:

📎 unknown.png

#

Then click in PATH and click the Edit... button

rigid summit Dec 22, 2019, 9:50 PM

#

Alright, looks like the "Path" does reference my account/user (with the space) for Python

#

What should I do?

drifting hemlock Dec 22, 2019, 9:51 PM

#

Just change it to reference the folder in which you installed Anaconda, for example change:

C:\Users\Your Username\Anaconda3\Library\usr\bin

to

C:\Anaconda3\Library\usr\bin

#

Oh, and remember to close and reopen the console so it can refresh PATH

rigid summit Dec 22, 2019, 9:59 PM

#

Alright 👍 checking to see if that worked

#

shoot, no dice... I might have not done it properly - the Paths didn't say Anaconda, they said Python, so that might be one issue... also the user config file, populated config files, package caches, and envs directories all still point through my user name

drifting hemlock Dec 22, 2019, 10:14 PM

#

Then the issue is definitely your path. Unfortunately, Anaconda is bad at updating the PATH environment variable. You have two options, I believe:

1. Update PATH manually. This means removing the entries referencing Anaconda in your Path variable and then recreating them. That can be a pain in the ass.
2. Remove all the entries referencing Anaconda in your PATH variable and then reinstall Anaconda, making sure to select "Add Anaconda to my PATH environment variable".

Just full disclosure: playing with Path can lead to undesirable results, so have a backup of your path just in case.

rigid summit Dec 22, 2019, 10:15 PM

#

Thanks 🙂 ... how undesirable?

drifting hemlock Dec 22, 2019, 10:18 PM

#

Well, depends on what you have there, anyways you can easily make a backup and then restore it if anything goes wrong, you can just open regedit and go to Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment and then clicking in Path and saving the contents in a text file.
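
A scripted alternative to the registry route, as a rough sketch: since Python is already installed, the current PATH can be dumped to a text file before editing it. `backup_path` is a hypothetical helper name. Note the assumption that the merged, already-expanded PATH of the current process is good enough; this is not the raw registry value, so entries such as `%USERPROFILE%` will appear expanded.

```python
import os
from datetime import datetime
from pathlib import Path


def backup_path(dest_dir=".", path_value=None):
    """Write each PATH entry on its own line to a timestamped text file.

    Returns the pathlib.Path of the backup file.
    """
    if path_value is None:
        # PATH as seen by this process (system and user entries merged).
        path_value = os.environ.get("PATH", "")
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_file = Path(dest_dir) / f"path-backup-{stamp}.txt"
    backup_file.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return backup_file
```

Calling `backup_path()` from a fresh Anaconda Prompt writes `path-backup-<timestamp>.txt` into the current directory; restoring is then a matter of pasting the entries back into the PATH editor.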

#

It's not that scary honestly.

📎 unknown.png

rigid summit Dec 22, 2019, 10:21 PM

#

Alright, I'll give it a shot. Thanks very much for your help.

#

Oh, one thing before I get started - I accidentally deleted "Path" that included an entry with this %USERPROFILE% in it... the only other two were for python... I'm hoping if I restart my computer it will come back...

#

Should have made a backup, haha

drifting hemlock Dec 22, 2019, 10:23 PM

#

I really hope so too hahaha

#

Know what? Let me show you how to back it up easily without getting into the registry.

#

Hold on

drifting hemlock Dec 22, 2019, 10:43 PM

#

@rigid summit here you go https://www.youtube.com/watch?v=dKE1EpACl2E it's on low quality right now because it is still being processed

YouTube

Franccesco Orozco

How to backup Path

GPU: GeForce GTX 1080
CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Memory: 16 GB RAM (15.94 GB RAM usable)
Current resolution: 1920 x 1080, 60Hz
Operating system:


rigid summit Dec 22, 2019, 11:02 PM

#

Perfect!! Thanks. Crossing my fingers that this works

rigid summit Dec 22, 2019, 11:32 PM

#

Damnit. I'm still getting the same error when I try to install updates. The directories are all the same too (with my user name in them) according to conda info

#

The Paths are all correct now though.

drifting hemlock Dec 22, 2019, 11:59 PM

#

what environment are you using? Base? do conda env list

#

You should get something like:

PS C:\Users\Franccesco> conda env list
# conda environments:
#
base                  *  C:\Users\username\Anaconda3
getaltname               C:\Users\username\Anaconda3\envs\getaltname

rigid summit Dec 23, 2019, 12:18 AM

#

Just: C:\Anaconda

lusty trellis Dec 23, 2019, 8:33 AM

#

hey guys, has anyone here worked with Scrapy for scraping? I am trying to scrape an infinite-scrolling page but I don't know how to do it

lost sinew Dec 23, 2019, 10:14 AM

#

does anyone know how to get the historical 1-minute data from January 2018 for BitMEX without getting API banned?

oblique belfry Dec 23, 2019, 3:03 PM

#

https://www.nature.com/articles/d41586-019-03895-5
Good article. Nice to see fellow ML engineers frustrated by the lack of reproducibility. I am all for the push for submissions to include code when published.

This AI researcher is trying to ward off a reproducibility crisis

Joelle Pineau is leading an effort to encourage artificial-intelligence researchers to open up their code.

#

"Code submission is one of the elements I’m most impressed with. A year ago, 50% of accepted NeurIPS papers contained a link to code; this year, we’re at 75%."
Albeit, this is just for NeurIPS, but I hope this trend continues.

lapis sequoia Dec 23, 2019, 3:05 PM

#

I need help in creating a SVM model in pure numpy and python

oblique belfry Dec 23, 2019, 3:08 PM

#

bumpy? I hope you mean numpy.

#

I get that.

#

Our company tried to reproduce results from eeg papers with the given data (Temple's corpus) and we still couldn't achieve the results. We followed the specs of the papers (had the same LSTM/Convolution layers/hyperparameters), but the results didn't match.

#

Showing the algorithm can get us one step closer.

#

The way in which people manipulate their data before training can significantly alter the data to the point it isn't generalizable.

#

hmm....I never thought of doing it with different random seeds. I think that should be a requirement as well.

#

Yeah. I get why they might not be able to publish their dataset. But publishing their code can help mitigate issues like this.

#

Ha...use a random number generator to pick seeds for other random generators.

I love it.
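
The seed idea can be sketched with the standard library alone: one master generator picks the seeds for the individual runs, and the result is reported as a mean with a spread instead of a single number. `run_experiment` is a hypothetical stand-in for a real training run, and the 0.80 baseline and noise level are made up for illustration.

```python
import random
import statistics


def run_experiment(seed):
    """Hypothetical stand-in for one training run; returns a noisy 'accuracy'."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.02)


def evaluate_over_seeds(master_seed, n_runs=5):
    """Draw per-run seeds from one master generator, then summarize the scores."""
    master = random.Random(master_seed)
    seeds = [master.randrange(2**32) for _ in range(n_runs)]
    scores = [run_experiment(seed) for seed in seeds]
    return seeds, statistics.mean(scores), statistics.stdev(scores)


seeds, mean, spread = evaluate_over_seeds(master_seed=2019)
print(seeds)
print(f"accuracy {mean:.3f} +/- {spread:.3f}")
```

Publishing the master seed alongside the code is enough for someone else to regenerate the exact same list of per-run seeds.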

rigid summit Dec 23, 2019, 5:35 PM

#

Sorry for the repeat for those who have scanned over this before: Anybody a seasoned user of Anaconda? The tech support is terrible to non-existent so far for me. I have an issue described here:

https://stackoverflow.com/questions/59419880/how-do-i-change-the-directories-in-anaconda-having-issues-updating

Stack Overflow

How do I change the directories in Anaconda? (Having issues updating)

I can't update anything in Anaconda.

I've tried the Anaconda Powershell Prompt (conda update/install) and the Anaconda Navigator. I think it's because when I originally installed it, I used the

silent swan Dec 23, 2019, 10:34 PM

#

in the case of medical data, a lot of it comes down to preprocessing

#

that's why they're especially hard to reproduce (in addition to all the big datasets being unsharable)

#

if you're open to reinstalling

#

and you're still running into issues

#

you need to go track down wherever the bad paths are coming from

jolly briar Dec 23, 2019, 10:40 PM

#

@oblique belfry if there's code and no data though is it reproducible ?

oblique belfry Dec 23, 2019, 11:08 PM

#

It’s better than nothing.

You can at least check out their methods and validate the logic behind it.

#

I’ve also tried to replicate papers that used public datasets.

I think you need both data and code to replicate results. Data will be the hardest to get, code is pretty simple. I’d rather have something than nothing.

deft harbor Dec 24, 2019, 3:11 AM

#

What database should I study first?

#

SQL?

oblique belfry Dec 24, 2019, 4:07 AM

#

What type of job do you want?

#

I know the basics of SQL, but if I have a project that really utilizes it, I’ll use SQLAlchemy or Orator to query data. I mostly use Mongo at work. But, we do a lot of machine learning and AI.

#

Data science is a big field that honestly should become more separated. A data scientist at one company may look completely different at another company.

acoustic mural Dec 24, 2019, 4:33 AM

#

as someone in industry who sees some hiring decisions, SQL on the resume never hurts

#

(unless it's a lie)

deft harbor Dec 24, 2019, 5:34 AM

#

No specific job, more looking to expand my skill set for my own projects that might be useful later down the road. I was thinking SQL, and most of the tasks would be machine learning. At least at first, until I get a better hand on things.

oblique belfry Dec 24, 2019, 6:13 AM

#

A lot of positions I applied for wanted SQL experience. But, those were jobs that I wasn’t very interested in.

bright heron Dec 24, 2019, 3:50 PM

#

Anyone recommend any guides on how to interact with API using python?

deft harbor Dec 24, 2019, 4:05 PM

#

The api documentation?

fallen anchor Dec 24, 2019, 4:30 PM

#

which API?

jolly briar Dec 24, 2019, 4:45 PM

#

@oblique belfry what does SQL experience mean though? I'm never sure how much is typically expected to satisfy that. I guess it's "how long is a piece of string".

#

personally i've never needed anything beyond a join

#

no subtables, views, or whatever else

oblique belfry Dec 24, 2019, 5:13 PM

#

I don’t know either

drifting hemlock Dec 24, 2019, 7:09 PM

#

Has anyone noticed that in Data Science / Analysis there's a ton of information available to improve your skills, but not much information on the "operational" side of the industry? For example, how to work with teams, how to implement a workflow in a corporate environment that is scalable, how the different methodologies fit in a data science organizational team.

#

At least for me it's been difficult to get this kind of information on the internet.

deft harbor Dec 24, 2019, 7:56 PM

#

Management, Dev ops and "big data" covers a lot of that

drifting hemlock Dec 24, 2019, 8:11 PM

#

Yeah, but I've been feeling like we're kind of lost in that scenario. We know the process of gathering data, scrubbing, feature engineering, modeling and deployment, but we tend to forget that all of that needs a place where toolsets and teams in a corporate environment can co-exist.

#

I think that's harder for small teams though.

#

In my work environment we're still trying to figure this out. For example, say we have a binary classification task:

Where do we document it? Let's say Jira.
Where do we perform the EDA/feature engineering? Jupyter notebooks, right.
Where and with what do we build the model? sklearn.
Deployment? IBM Watson or a simple API in a cluster.

All of that comes at a price: depending on what tools you use, you're going to take on a lot of technical debt, or sacrifice collaboration if you do the development offline, or even sacrifice reproducibility.

#

I know there's no tried-and-true workflow in which all of these can be implemented because the industry is still very young, but it would be neat to have more direction. * rant over lol *