#data-science-and-ml | Python | Page 196

light cloud Mar 31, 2019, 8:22 AM

#

And that was my understanding initially. As I started reading on KNN there were several articles talking about how KNN could do forecasting. Same with RF, but it was predicting just the next day's weather and not doing a time series forecast.

#

This convo has helped clear some stuff up so I appreciate all the responses.

paper niche Mar 31, 2019, 8:35 AM

#

I think weather might be okay for RF since the temperatures don't typically exceed historic highs and lows.

pulsar stag Mar 31, 2019, 5:06 PM

#

Tutorial video on how to connect to the Binance API & break down the data with python and create dashboards & graphs! Hope you guys find it interesting & learn a thing or two: https://youtu.be/RHqEPNgpbzQ

📎 Screen_Shot_2019-03-31_at_2.14.05_AM.png

YouTube

Potluck Economics

Python Binance API, Dashboards & Data Science Tutorial

Have Questions check out our discord: https://discord.gg/rNc6xtP Python-Binance wrapper: https://python-binance.readthedocs.io/en/latest/market_data.html Git...

vague jetty Mar 31, 2019, 7:57 PM

#

Any recommendations for a science-orientated IDE? Spyder's debugging tools aren't great, PyCharm's science view isn't working for me (so I can't explore my data), and I'm not a fan of notebooks

reef bone Mar 31, 2019, 8:03 PM

#

JupyterLab should give a more sophisticated experience than plain notebooks, so maybe see if that works better for you. And there is a plug-in for Atom called Hydrogen that provides a notebook-like experience and I've seen people swear by it, but never used it personally

#

Otherwise I'm afraid there isn't that much, at least as far as I know

vague jetty Mar 31, 2019, 8:05 PM

#

I'm not a fan of notebook-style work. I'm doing scientific software development

#

So it's less pure data exploration and more software development

#

Hence why I want good debugging tools

reef bone Mar 31, 2019, 8:06 PM

#

In that case pycharm is probably your best bet

#

See if you can figure out why the science view thingy doesn't work for you, you might be able to find someone with the same problem and maybe a solution

vague jetty Mar 31, 2019, 8:07 PM

#

Yeah, I'll give it a shot. Thanks!

reef bone Mar 31, 2019, 8:08 PM

#

(I abandoned pycharm myself because of its awful support for notebooks), I'm aware that's not what you're looking for but afaik it's a well known problem that pycharm is a bit lacking in that department

#

Or at least used to be, apparently there was just an update

lunar rover Mar 31, 2019, 8:09 PM

#

Hello guys,
What would be the best way to create a web app (with Django or Flask) which would demonstrate:
1: Automatic Data Gathering for Stocks,
2: Automatic Datasheets for a variety of Stocks and
3: Stock prediction (approximately if possible)
If someone has good advice or a tutorial/instructions, I would be really grateful. I don't have any data science experience, so I am just looking for a good start.
Thank you in advance 😃

vague jetty Mar 31, 2019, 8:10 PM

#

Just got it working, I think? I used to be a big Spyder fan, but I've grown too dependent on the good debugging tools of JetBrains' stuff 😛

vague jetty Mar 31, 2019, 8:48 PM

#

If I want to call pyplot.axvspan to highlight a range in time-series data indexed by DateTimes, what do I put in for the data units of the range?

#

nvm
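For anyone landing on the same axvspan question: a minimal sketch, assuming the x axis is already date-based. In that case matplotlib seems happy to take plain datetime objects as the span limits. The series and the highlighted dates below are made up for illustration.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical time series indexed by dates
index = pd.date_range("2019-01-01", periods=90, freq="D")
series = pd.Series(np.arange(90.0), index=index)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)

# The span limits are given in the same units as the x data, i.e. datetimes
span = ax.axvspan(dt.datetime(2019, 2, 1), dt.datetime(2019, 2, 15), alpha=0.3)
```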

pearl tangle Mar 31, 2019, 11:29 PM

#

Hey guys just made the decision to dedicate my future to deep Learning but I have an issue.

#

My maths sucks. So I came up with a plan to learn the required maths alongside learning deep learning and was wondering if you guys could help me out with the order I should learn each topic

#

I’m starting off with Algebra 1&2

#

Moving onto Linear Algebra

#

Then finally calculus followed by multivariate calculus

#

Is this the right order to learn each of the topics?

lean ledge Mar 31, 2019, 11:51 PM

#

I'd do algebra 1&2, single variable calculus, linear algebra, multivariate calculus, probability theory

waxen vine Apr 1, 2019, 1:46 AM

#

self teaching myself calculus on top of all this programming 😛

lean ledge Apr 1, 2019, 1:50 AM

#

Calculus is great though

waxen vine Apr 1, 2019, 1:51 AM

#

I would agree, but I am not the best at math to start with, so learning it on my own is like ugh

feral jungle Apr 1, 2019, 6:30 AM

#

This really doesn't have to do with data science, aside that I'm attempting to learn numba

#

Is there any way I can use str() in a vectorize, or at the very least run a line of code off the gpu?

lapis sequoia Apr 1, 2019, 9:06 AM

#

im trying to make a program which simulates aerodynamics around a object using vectors can someone who has some experience with vectors please hmu 😃

tall drum Apr 1, 2019, 11:04 AM

#

I have a pandas dataframe consisting of about 10 columns (float values). How do I split the dataframe into subgroups where one column has for example values over 80?

#

I know of groupby() but I think I need something more because it will output a new dataframe for each row but I would like to split them into "sections" where consecutive values are over 80

dim beacon Apr 1, 2019, 11:12 AM

#

@tall drum

df[df['column name'] > 80]

tall drum Apr 1, 2019, 11:21 AM

#

Basically I would like to create a new dataframe for each of the circled sections:

#

📎 unknown.png

#

Do I just have to loop through the dataframe and store the start and end points for each section in a list or is there a better way?

dim beacon Apr 1, 2019, 11:30 AM

#

@tall drum i literally gave you the solution

tall drum Apr 1, 2019, 11:33 AM

#

Thanks I tried it but the output was only one dataframe.

#

I would like to create a dataframe separately for each of the "spikes" in the data. Your solution is one dataframe in which those spikes are concatenated .

paper niche Apr 1, 2019, 12:23 PM

#

@tall drum you can do it like:
0) ensure your indices are numbers (not dates, etc.); reset index if need be.
1) extract the indices which meet your criteria (>80).
2) split the indices (a numpy array) into blocks containing consecutive numbers
3) use these indices to extract from the df again

#

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((30, 2)), columns=['a','b'])

# find indices where the 'a' column has values > 0.5
idx = df[df['a']>0.5].index.values

# splits numpy array into blocks containing consecutive elements
idx_list = np.split(idx, np.where(np.diff(idx) != 1)[0]+1)

# use the indices to extract from df
for i in idx_list:
    print(df.loc[i,:])

#

a simple example

#

the top one is the original df, the split ones (where the 'a' column >0.5) are after that

📎 chrome_2019-04-01_20-24-29.png
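The same split can probably also be done without touching the index at all. A rough sketch of an alternative (not what was posted above): label each run of consecutive rows with a cumulative sum over the points where the mask flips, then group by that label. The column name and values are invented for the example.

```python
import pandas as pd

# Hypothetical data with two "spikes" above the threshold of 80
df = pd.DataFrame({"a": [10, 85, 90, 20, 30, 95, 81, 82, 5]})

above = df["a"] > 80

# A new run starts wherever the mask differs from the previous row,
# so the cumulative sum of those change points gives each run its own label.
run_id = (above != above.shift()).cumsum()

# Keep only rows above the threshold and collect one DataFrame per run.
sections = [section for _, section in df[above].groupby(run_id[above])]
```

Each element of `sections` is its own DataFrame, and this should work with a datetime index too since nothing relies on the index being consecutive integers.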

tall drum Apr 1, 2019, 1:15 PM

#

Thank you

lunar rover Apr 1, 2019, 1:30 PM

#

anyone used the Alpha Vantage API? or IEX?

distant inlet Apr 1, 2019, 2:40 PM

#

Hi guys

#

I'm new to python

#

And I want to start with data analytics

#

Which libraries should I learn

paper niche Apr 1, 2019, 2:46 PM

#

@distant inlet pandas and numpy are a must. as for plotting libraries, minimally matplotlib

distant inlet Apr 1, 2019, 2:46 PM

#

Thanks man !

#

Also I have no prior experience in data analytics

paper niche Apr 1, 2019, 2:47 PM

#

we all start somewhere

#

😃

distant inlet Apr 1, 2019, 2:47 PM

#

python 😍

paper niche Apr 1, 2019, 2:47 PM

#

oh and sklearn if you're doing machine learning

distant inlet Apr 1, 2019, 2:47 PM

#

Cool..

#

No ML...it's so freaky complex

paper niche Apr 1, 2019, 2:48 PM

#

keras and tensorflow/pytorch if you're doing deep learning

#

haha okay then

distant inlet Apr 1, 2019, 2:48 PM

#

:)

#

Any book that u can recommend for data analytics

paper niche Apr 1, 2019, 2:50 PM

#

I'm not that well-read, but I'm looking at "Machine Learning: A Probabilistic Perspective" now. seems like quite a heavy emphasis on the statistical side, so might still be relevant to you. you can consider

rocky moth Apr 1, 2019, 4:04 PM

#

idk if this is the right place to ask but I want to do a small study on analysing album covers to see whether or not I can train a neural network to recognize genres associated with the album.
does anyone have an idea where to start or experience with these kind of things? Currently thinking of using tensorflow, but that's as far as my thought process has gone

#

I have accumulated a dataset of ~4k labeled album covers

#

ill search for some more information and come back with better defined questions 😛

gaunt axle Apr 1, 2019, 4:19 PM

#

How does one calculate the maximum and minimum positive and negative numbers in IEEE 754?
I have the answers but...
Maximum positive: 1,111 1111 1111 1111 1111 1111 x 10^111 1111 (base 2)
Minimum positive: 1 x 10^-111 1110 (base 2)
Minimum negative: -1 x 10^-111 1110 (base 2)
Maximum negative: -1,111 1111 1111 1111 1111 1111 x 10^111 1111 (base 2)

desert cradle Apr 1, 2019, 4:39 PM

#

@gaunt axle the information for the system float type is present in sys.float_info, the information for numpy types is in numpy.finfo([type])

#

they use different names unfortunately, so they're not interchangeable
```py
>>> import sys, numpy
>>> sys.float_info.max
1.7976931348623157e+308
>>> sys.float_info.min
2.2250738585072014e-308
>>> -sys.float_info.min
-2.2250738585072014e-308
>>> -sys.float_info.max
-1.7976931348623157e+308
>>> numpy.finfo(numpy.float64).max
1.7976931348623157e+308
>>> numpy.finfo(numpy.float64).tiny
2.2250738585072014e-308
>>> -numpy.finfo(numpy.float64).tiny
-2.2250738585072014e-308
>>> numpy.finfo(numpy.float64).min
-1.7976931348623157e+308
```

gaunt axle Apr 1, 2019, 4:58 PM

#

Thanks @desert cradle but I was looking for the explanation behind it hehe : )

desert cradle Apr 1, 2019, 5:20 PM

#

well, you understand binary, right?

#

when you have a "decimal" [not actually decimal but there's no better name for it] point in binary, that means that just like in decimal it goes 1/10, 1/100, 1/1000, it goes 1/2, 1/4, 1/8

#

so 1,11111111111111111111111 is 1 + 1/2 + 1/4 + 1/8 + 1/16 ... + 1/8388608

#

which is 1.99999988079071044921875 if written in decimal

#

and then that number is multiplied by 2^127

#

so you get 340282346638528859811704183484516925440

#

which would be 0xffffff00000000000000000000000000 in hex
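The arithmetic above can be checked with nothing but the standard library. A small sketch, assuming the single-precision layout being discussed (1 sign bit, 8 exponent bits, 23 mantissa bits); the variable names are just for illustration.

```python
import struct

# Mantissa 1,111...1 in binary: the implicit 1 plus 23 fractional ones
mantissa = sum(2.0 ** -k for k in range(0, 24))  # 1 + 1/2 + ... + 1/8388608

# Largest finite single-precision value: mantissa times 2^127
largest = mantissa * 2.0 ** 127

# Cross-check against the raw bit pattern:
# sign 0, exponent 11111110, mantissa all ones -> 0x7f7fffff
from_bits = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]

as_integer = int(largest)
as_hex = hex(as_integer)
```

`largest` and `from_bits` should compare equal, and `as_hex` is six f digits followed by zeros, matching the value quoted above.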

#

@gaunt axle

#

is that what you're looking for? otherwise i'm not sure what explanation you want

gaunt axle Apr 1, 2019, 5:27 PM

#

That helped a lot! Now I must work it out with the IEEE 754 single-precision format (1 bit for sign, 8 for exponent, and 22 for mantissa(?))

desert cradle Apr 1, 2019, 5:27 PM

#

23

#

and there's the implicit 1 in the mantissa

gaunt axle Apr 1, 2019, 5:28 PM

#

Ohh, we were told we don't write those

desert cradle Apr 1, 2019, 5:28 PM

#

well it's 23 explicit bits, 24 including the 1

#

sign is the simplest, it's just 1 for negative 0 for positive

#

then the exponent is a bit complicated because of denormalized numbers, infinity, and NaN

gaunt axle Apr 1, 2019, 5:30 PM

#

^^^yeah that!!

#

I wonder if I have to memorize those for a test lol

desert cradle Apr 1, 2019, 5:30 PM

#

and the regular values are just offset by 127, so 1 [00000001] is -126, 127 [01111111] is 0, and 254 [11111110] is +127
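To see the offset in practice, here is a rough sketch that pulls the three fields out of a single-precision value with `struct`. The helper name `decode_float32` is made up for this example.

```python
import struct


def decode_float32(value):
    """Return (sign, biased exponent, mantissa bits) of value as a 32-bit float."""
    # Pack as big-endian float32, then reinterpret the same 4 bytes as an unsigned int
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, stored with an offset of 127
    mantissa = bits & 0x7FFFFF             # 23 explicit bits (the leading 1 is implicit)
    return sign, biased_exponent, mantissa


sign, biased, mantissa = decode_float32(1.0)
true_exponent = biased - 127  # only meaningful for biased values 1 to 254
```

For 1.0 the stored exponent is 127, so the true exponent comes out as 0, which lines up with the table above.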

gaunt axle Apr 1, 2019, 5:30 PM

#

I didn't understand the logic behind , why do we add 127 to the exponent

desert cradle Apr 1, 2019, 5:31 PM

#

because that way the negative and positive values are all in order

gaunt axle Apr 1, 2019, 5:31 PM

#

I just assumed it was like that bc, it said "excess by 127"

desert cradle Apr 1, 2019, 5:31 PM

#

the smallest one is the most negative value, and the largest one is the largest positive value

#

(00 for denormal works because it's smaller than any value with 01, and FF for infinity is larger than any other possible value)

#

in fact, the floating point bits for +0.0 end up being 0 00000000 00000000000000000000000

#

which is useful because it means when memory is allocated and filled with zero bytes, it can be used directly as a float initialized to zero without any special code to fill in the parts that are floats
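Both points (the all-zero pattern for +0.0, and the bit patterns being "in order") can be poked at with a short sketch. The sample values are arbitrary, and this only looks at non-negative floats, since negative ones use a sign bit rather than two's complement.

```python
import struct


def float32_bits(value):
    """Reinterpret value, stored as a big-endian float32, as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits


# +0.0 is stored as four zero bytes
zero_bytes = struct.pack(">f", 0.0)

# Increasing non-negative floats, including a denormal (1e-40) and infinity
samples = [0.0, 1e-40, 1e-10, 1.0, 2.5, 1e30, float("inf")]
patterns = [float32_bits(x) for x in samples]

# Because of the exponent offset, the integer patterns increase along with the values
is_sorted = patterns == sorted(patterns)
```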

gaunt axle Apr 1, 2019, 5:34 PM

#

I will try to do my hw now : ), thanks @tribal kindle

desert cradle Apr 1, 2019, 5:35 PM

#

lol you @'d the wrong lemon

#

this april fools joke is kind of annoying

gaunt axle Apr 1, 2019, 5:38 PM

#

Ty @desert cradle ^_^

hasty maple Apr 1, 2019, 5:56 PM

#

nu u

urban ibex Apr 1, 2019, 6:22 PM

#

is this channel free?

#

would anyone be able to help with this

#

A city council is planning the city’s bus routes. It has decided which places will have a bus stop (schools, cinemas, hospital, etc.). Each bus route will start from the train station, visit a number of bus stops, and then return to the station, visiting the same bus stops in reverse order. Each bus stop has to be served by at least one bus route. The council wants to minimise the total amount of time that all buses are on the road when following their routes.

chilly geyser Apr 1, 2019, 6:24 PM

#

@urban ibex Not really sure if this is the correct channel but I'm now even more convinced it's a multi-TSP while minimizing the highest route length.
An assumption is that route length is directly proportional to the time buses spend on the routes.

you need to find a way to visit all nodes. Even in the problem, the return trip I assume is simply the distance between the nodes again, so you have 2x of the distance for the same 'bus route' - effectively meaning this detail does not matter (except at the end when you need to show back answers/calculations relating to this)
Even if you don't really visit the home node after reaching the end of the trip, you can still transform the problem to an equivalent TSP whereby you visit a place that magically is able to teleport back to origin - essentially, this means it's a TSP.
Multiple vehicles means it's a multiple-TSP. The part where it says all vehicles essentially means you need to minimise the highest route cost among all routes

#

With this you should research into min-max TSP algorithms

urban ibex Apr 1, 2019, 6:24 PM

#

i think it wants me to use either BFS or DFS

chilly geyser Apr 1, 2019, 6:24 PM

#

Pretty sure that's not how state-of-the-art does it, but you could....

urban ibex Apr 1, 2019, 6:25 PM

#

how would it work with BFS or DFS and what would i need to adapt/change

#

the unit it's linked to focuses on DFS, BFS, Dijkstra's, priority queues and greedy algos

hardy crag Apr 1, 2019, 6:27 PM

#

what class is this for?

chilly geyser Apr 1, 2019, 6:27 PM

#

If given those I'd actually use a greedy algo

#

It's suboptimal but terminates in finite time IIRC
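As a very rough illustration of the greedy idea (a nearest-neighbour heuristic for a single route, not the assignment's intended answer and not a min-max multi-TSP solver): always drive to the closest stop that has not been visited yet. The stop names and travel times are invented.

```python
def nearest_neighbour_route(start, stops, travel_time):
    """Greedily build one route from start that visits every stop once."""
    route = [start]
    remaining = set(stops)
    one_way = 0
    while remaining:
        here = route[-1]
        # Pick the closest unvisited stop; ties are broken by name for determinism
        nxt = min(remaining, key=lambda stop: (travel_time[here][stop], stop))
        one_way += travel_time[here][nxt]
        route.append(nxt)
        remaining.remove(nxt)
    # The bus returns through the same stops in reverse order, so the time doubles
    return route, 2 * one_way


# Hypothetical symmetric travel times in minutes
travel_time = {
    "station": {"school": 4, "cinema": 7, "hospital": 3},
    "school": {"station": 4, "cinema": 2, "hospital": 6},
    "cinema": {"station": 7, "school": 2, "hospital": 5},
    "hospital": {"station": 3, "school": 6, "cinema": 5},
}

route, round_trip = nearest_neighbour_route(
    "station", ["school", "cinema", "hospital"], travel_time
)
```

It always terminates, because every pass removes one stop, but nothing guarantees the result is optimal.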

urban ibex Apr 1, 2019, 6:28 PM

#

this is the Q @hardy crag

chilly geyser Apr 1, 2019, 6:33 PM

#

Yes and I told you what optimisation problem it was

urban ibex Apr 1, 2019, 6:35 PM

#

yeah im just struggling to see how it would be implemented without the combination of another algo

hardy crag Apr 1, 2019, 6:38 PM

#

I reckon traveling salesperson is a good start might want to research algorithms that solve it

urban ibex Apr 1, 2019, 6:59 PM

#

i think youd have to use like DFS first and break the graph down then traverse them but im not sure

urban ibex Apr 2, 2019, 8:49 AM

#

Would anyone be able to help with this?

#

In graph theory, the number of nodes in a graph is called the order of the graph. The term ‘order’ is unrelated to sorting. Specify the problem of computing the order, as a UGraph operation.

Creator / Inspector / Modifier (delete as appropriate): order
Inputs:
Preconditions:
Outputs:
Postconditions:
Justify the kind of operation.

lapis sequoia Apr 2, 2019, 9:11 AM

#

do you know the difference between directed and undirected graphs

urban ibex Apr 2, 2019, 9:12 AM

#

yes I do @lapis sequoia

lapis sequoia Apr 2, 2019, 9:12 AM

#

so you know what a ugraph is?

urban ibex Apr 2, 2019, 9:13 AM

#

not 100%

lapis sequoia Apr 2, 2019, 9:14 AM

#

hmmm.. is this problem for python?

#

because if not.. its a complete tangent.. and I dont want to give you wrong directions..

urban ibex Apr 2, 2019, 9:15 AM

#

i just need to complete the spec

lapis sequoia Apr 2, 2019, 9:15 AM

#

there's a package in R called ugraph

#

well function actually..

urban ibex Apr 2, 2019, 9:16 AM

#

i think in this circumstance UGraph is just a name

lapis sequoia Apr 2, 2019, 9:16 AM

#

ok

#

undirected graph then..

urban ibex Apr 2, 2019, 9:16 AM

#

Consider an ADT for undirected graphs, named UGraph, that includes these operations:

nodes, which returns a sequence of all nodes in the graph, in no particular order
has_edge, which takes two nodes and returns true only if there is an edge between those nodes
edges, which returns a sequence of node-node pairs (tuples), in no particular order. Each edge only appears once in the returned sequence, i.e. if the pair (node1, node2) is in the sequence, the pair (node2, node1) is not.
How each node is represented is irrelevant. Because the graph is undirected, has_edge(node1, node2) and has_edge(node2, node1) return the same. You can assume the graph is connected and has no edge between a node and itself.

lapis sequoia Apr 2, 2019, 9:17 AM

#

they want you to describe how you're going to sort it?

urban ibex Apr 2, 2019, 9:18 AM

#

In graph theory, the number of nodes in a graph is called the order of the graph. The term ‘order’ is unrelated to sorting. Specify the problem of computing the order, as a UGraph operation.

lapis sequoia Apr 2, 2019, 9:19 AM

#

ok

urban ibex Apr 2, 2019, 9:21 AM

#

the big para is just background info

#

the small one is what this sub question wants answered

lapis sequoia Apr 2, 2019, 9:23 AM

#

ok

#

gimme a sec

urban ibex Apr 2, 2019, 9:23 AM

#

okay mate!

lapis sequoia Apr 2, 2019, 9:31 AM

#

from what I understand the sorting should be based on the degree and index value of the nodes..

urban ibex Apr 2, 2019, 9:32 AM

#

it doesnt need sorting does it?

lapis sequoia Apr 2, 2019, 9:34 AM

#

I think it does.. you would sort by descending order of degree, index

#

I mean.. expressing the order this way

urban ibex Apr 2, 2019, 9:35 AM

#

but this is the question...

#

In graph theory, the number of nodes in a graph is called the order of the graph. The term ‘order’ is unrelated to sorting. Specify the problem of computing the order, as a UGraph operation.

#

specify the problem of computing the order

#

the order is just the number of nodes

#

"unrelated to sorting"

lapis sequoia Apr 2, 2019, 9:41 AM

#

hmmm I'm not sure.. it's been a while since I did any graph theory.. but if you find how to order an undirected graph, you should get your answer..

wicked flare Apr 2, 2019, 9:41 AM

#

@lapis sequoia It specifically says "order" doesn't have anything to do with sorting

#

There's no sorting involved in this exercise

urban ibex Apr 2, 2019, 9:42 AM

#

the order of a graph is just the number of nodes

wicked flare Apr 2, 2019, 9:42 AM

#

@urban ibex What specifically stumps you about this?

#

Like, do you have trouble figuring out if this operation is a creator, inspector or modifier?

urban ibex Apr 2, 2019, 9:43 AM

#

right so i assume you read the overall sort of brief/background info @wicked flare

wicked flare Apr 2, 2019, 9:43 AM

#

Yeah

urban ibex Apr 2, 2019, 9:43 AM

#

okay so...

#

Are we in agreement that its an inspector

#

because it doesnt change or create

wicked flare Apr 2, 2019, 9:44 AM

#

Yeah

#

That makes sense

urban ibex Apr 2, 2019, 9:44 AM

#

it just inspects the order(how many nodes in ugraph)

wicked flare Apr 2, 2019, 9:45 AM

#

Is "UGraph" a data type from some specific library or code example or something? Or is it just an abbreviation for undirected graph?

urban ibex Apr 2, 2019, 9:45 AM

#

this is all i have to go off...

#

Consider an ADT for undirected graphs, named UGraph, that includes these operations:

nodes, which returns a sequence of all nodes in the graph, in no particular order
has_edge, which takes two nodes and returns true only if there is an edge between those nodes
edges, which returns a sequence of node-node pairs (tuples), in no particular order. Each edge only appears once in the returned sequence, i.e. if the pair (node1, node2) is in the sequence, the pair (node2, node1) is not.
How each node is represented is irrelevant. Because the graph is undirected, has_edge(node1, node2) and has_edge(node2, node1) return the same. You can assume the graph is connected and has no edge between a node and itself.

wicked flare Apr 2, 2019, 9:45 AM

#

Oh, wait, I missed that

#

Ok

#

Then I'm with you

urban ibex Apr 2, 2019, 9:45 AM

#

so yeh

#

undirected

wicked flare Apr 2, 2019, 9:46 AM

#

Ok, so, inputs

#

What's the input for this operation?

urban ibex Apr 2, 2019, 9:46 AM

#

just the graph?

wicked flare Apr 2, 2019, 9:46 AM

#

Yeah

#

I'd say so as well

urban ibex Apr 2, 2019, 9:46 AM

#

should i put UGraph?

wicked flare Apr 2, 2019, 9:47 AM

#

I guess? Do you have any example specification for the existing operations?

urban ibex Apr 2, 2019, 9:47 AM

#

nah

wicked flare Apr 2, 2019, 9:47 AM

#

Or an example specification for some other data type operation?

urban ibex Apr 2, 2019, 9:47 AM

#

yeah

wicked flare Apr 2, 2019, 9:48 AM

#

Just so we have an idea of what the expected format is

#

Can you paste that?

urban ibex Apr 2, 2019, 9:48 AM

#

Inspector: isEmpty
Inputs: theStack, a stack of objects (o1, o2, ..., on)
Preconditions: true
Outputs: a boolean empty
Postconditions: empty is true if n=0, otherwise false

wicked flare Apr 2, 2019, 9:48 AM

#

Ok, right, so you can just specify a UGraph as the input

urban ibex Apr 2, 2019, 9:49 AM

#

okay

wicked flare Apr 2, 2019, 9:49 AM

#

I don't know what they mean by "Preconditions: true"

urban ibex Apr 2, 2019, 9:49 AM

#

that the inputs are true i guess

wicked flare Apr 2, 2019, 9:49 AM

#

The input is a stack though. A stack isn't a boolean.

#

It can't be true or false.

#

I don't see any need for any preconditions for either isEmpty or order.

#

Maybe you can just leave that blank.

#

And then argue with your professor if they disagree.

#

I mean, a precondition is something that has to be true before the operation can be called.

urban ibex Apr 2, 2019, 9:51 AM

#

i think it just means that the input is correct

wicked flare Apr 2, 2019, 9:51 AM

#

There is no sense in which input can be correct or not.

#

The input is the input.

urban ibex Apr 2, 2019, 9:52 AM

#

of course there is

wicked flare Apr 2, 2019, 9:52 AM

#

But I mean, in general, an operation can require preconditions.

urban ibex Apr 2, 2019, 9:52 AM

#

if it said the input was integers

wicked flare Apr 2, 2019, 9:52 AM

#

Let me think of an example.

urban ibex Apr 2, 2019, 9:52 AM

#

and i entered letters

#

then the input is false

#

so the pre condition is that the input is true(correct)

wicked flare Apr 2, 2019, 9:52 AM

#

No, the input would be invalid. But you are already specifying in the input specification that the input is a number.

#

In a concrete programming language, this would be handled by specifying the data type.

#

Or possibly validating the data type in the case of a weakly typed language.

#

In the case of isEmpty, you are specifying that the input is a stack. It can't be anything else.

urban ibex Apr 2, 2019, 9:54 AM

#

heres what it means

#

The precondition true specifies the operation is valid for any state of a stack: there are no preconditions. An ADT invariant specifies conditions that are True for any instance and remain True throughout its lifetime irrespective of the operations carried out on it

wicked flare Apr 2, 2019, 9:55 AM

#

Ah, yeah, I was gonna say that.

#

If you think of the precondition as a condition, i. e. a boolean statement, then "true" would signify that all previous conditions are valid.

#

Fine.

#

So that should be the case for your order operation too.

#

Because you can always ask a graph for its order.

#

Just as you can always ask a stack if it's empty.

urban ibex Apr 2, 2019, 9:57 AM

#

so precon of True then?

wicked flare Apr 2, 2019, 9:57 AM

#

Right.

#

Ok, so, outputs.

#

What's your output?

urban ibex Apr 2, 2019, 9:58 AM

#

the order (number of nodes in UGraph)

wicked flare Apr 2, 2019, 9:58 AM

#

Right.

urban ibex Apr 2, 2019, 9:58 AM

#

how should i phrase that though

wicked flare Apr 2, 2019, 9:59 AM

#

Or well, if we look at the specification for isEmpty, what they seem to want is that you just specify the data type in outputs, and describe the contents of the output in the post-condition

urban ibex Apr 2, 2019, 9:59 AM

#

so an integer

#

because you never have like half a node

wicked flare Apr 2, 2019, 9:59 AM

#

Right

urban ibex Apr 2, 2019, 9:59 AM

#

would you just call it order?

wicked flare Apr 2, 2019, 10:00 AM

#

Yeah

#

Sounds like the most straightforward name

urban ibex Apr 2, 2019, 10:01 AM

#

an integer, order (the number of nodes in the graph)

wicked flare Apr 2, 2019, 10:01 AM

#

Well, I would put what's inside the brackets in the post-condition section

#

Since that's how they phrased it in the isEmpty specification

urban ibex Apr 2, 2019, 10:02 AM

#

okay

wicked flare Apr 2, 2019, 10:02 AM

#

Like, post-condition: order is the number of nodes in the graph

urban ibex Apr 2, 2019, 10:02 AM

#

should order be > 0

wicked flare Apr 2, 2019, 10:02 AM

#

Well, there is such a thing as an empty graph

#

So order could be 0

urban ibex Apr 2, 2019, 10:02 AM

#

really?

wicked flare Apr 2, 2019, 10:02 AM

#

Yeah

#

Most data structures in CS can have size 0

#

Empty sets, empty strings, etc.

urban ibex Apr 2, 2019, 10:03 AM

#

but they arent graphs though unless im missing something

wicked flare Apr 2, 2019, 10:03 AM

#

Yeah, they are

urban ibex Apr 2, 2019, 10:03 AM

#

??

wicked flare Apr 2, 2019, 10:03 AM

#

An empty set is still a set

#

An empty string is still a string

urban ibex Apr 2, 2019, 10:04 AM

#

hmm

wicked flare Apr 2, 2019, 10:04 AM

#

Like, you can have a string split operation, which takes a string to split and a separator string to split on. If you supply the empty string as the separator, it will split inbetween every character

#

Like "abc".split("") == ["a", "b", "c"] in JavaScript (Python's str.split raises ValueError on an empty separator, though)

#

Makes total sense

urban ibex Apr 2, 2019, 10:05 AM

#

alright well what would you have as post con then?

wicked flare Apr 2, 2019, 10:05 AM

#

There are similar things that might make sense for empty instances of other data types

#

post-condition: order is the number of nodes in the input graph, perhaps?
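If it helps to see the spec as code, here is one possible sketch. The exercise only describes the ADT, so this `UGraph` class is a stand-in built from edge pairs (an assumption), and `order` is written purely in terms of the `nodes` operation.

```python
class UGraph:
    """Stand-in undirected graph offering the three operations from the exercise."""

    def __init__(self, edge_pairs):
        # Store each edge as an unordered pair so (a, b) and (b, a) are the same edge
        self._edges = {frozenset(pair) for pair in edge_pairs}
        self._nodes = {node for pair in self._edges for node in pair}

    def nodes(self):
        return list(self._nodes)

    def has_edge(self, node1, node2):
        return frozenset((node1, node2)) in self._edges

    def edges(self):
        return [tuple(pair) for pair in self._edges]


def order(graph):
    """Inspector: return the number of nodes; the graph itself is not changed."""
    return len(graph.nodes())


graph = UGraph([("a", "b"), ("b", "c")])
graph_order = order(graph)
empty_order = order(UGraph([]))
```

Precondition true means there is nothing to check before calling it, and the result for an empty graph is simply 0.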

urban ibex Apr 2, 2019, 10:06 AM

#

i guess im not 100% on this one though

wicked flare Apr 2, 2019, 10:06 AM

#

What part?

urban ibex Apr 2, 2019, 10:06 AM

#

post con

wicked flare Apr 2, 2019, 10:06 AM

#

I'm just following the format in the isEmpty example

urban ibex Apr 2, 2019, 10:06 AM

#

guess you could also say singular integer

wicked flare Apr 2, 2019, 10:07 AM

#

In that one, in output, they just say that it's a boolean

#

And in post-con, they say what the value of the boolean should be

#

True if n = 0, else false

urban ibex Apr 2, 2019, 10:07 AM

#

should i say singular integer?

wicked flare Apr 2, 2019, 10:07 AM

#

I don't know what you mean by "singular"

urban ibex Apr 2, 2019, 10:08 AM

#

just 1 integer

wicked flare Apr 2, 2019, 10:08 AM

#

If you say "an integer", that implies that it's just one

#

That's what the word "an" means

urban ibex Apr 2, 2019, 10:08 AM

#

yeah just realised i have that

#

📎 unknown.png

wicked flare Apr 2, 2019, 10:09 AM

#

I don't think you need to repeat the data type if you already mention it in the outputs section

urban ibex Apr 2, 2019, 10:10 AM

#

hmm

#

cant really half to though i guess

#

*harm

wicked flare Apr 2, 2019, 10:10 AM

#

It's like when you're programming. You only need to specify the data type when you declare a variable. You don't need to repeat it every time you use it.

#

No, probably not, but why be redundant when you don't need to?

urban ibex Apr 2, 2019, 10:11 AM

#

i guess, i think ill leave it though

wicked flare Apr 2, 2019, 10:11 AM

#

Up to you.

urban ibex Apr 2, 2019, 10:11 AM

#

would you be able to take a look at something im stuck on for me in a moment?

#

its related to this Q still

wicked flare Apr 2, 2019, 10:12 AM

#

Perhaps. I'm at work, but if I have free time.

urban ibex Apr 2, 2019, 10:12 AM

#

yeah thats alright no rush im just finishing something off and then ill @ you

lapis sequoia Apr 2, 2019, 10:24 AM

#

hey. I have a dataset where each row looks like this [0,1,2,4] where the columns are [Outcome,Person1,Person2,Person3] I want to transform it into [0,1,1,0,1,0] where Outcome stays the same but Person1-3 changes to reflect 1 or 0. For example Person3 is 4 so index 4 is 1, Person1 is 1 so index 1 is 1, no person is 5 or 3, so index 3 and 5 are 0

#

I want to use it for logistic regression so for something like sklearn or similar would be cool

#

I've looked around and i am not sure how to proceed. The best I could think of would be to simply make an entirely new dataset and just load that but I thought there might be a better solution.

supple ferry Apr 2, 2019, 10:41 AM

#

@lapis sequoia what you need is pd.get_dummies from Pandas. It will convert one column with categories to multiple boolean ones

#

>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  1  0  0

#

from their documentation

#

In [11]: df = pd.DataFrame(["a", "b", "c", "d"], columns= ["People"])

In [12]: df
Out[12]:
  People
0      a
1      b
2      c
3      d

In [13]: pd.get_dummies(df)
Out[13]:
   People_a  People_b  People_c  People_d
0         1         0         0         0
1         0         1         0         0
2         0         0         1         0
3         0         0         0         1

#

this is example dataframe from your use case

lapis sequoia Apr 2, 2019, 10:57 AM

#

@supple ferry thanks a lot. that's perfect! Is there a way to do it for multiple columns? Like if coulm2 is People2 and had index 0 = b, would pd.get_Dummies write the first row as 1,1,0,0 in your example?

#

column2*

supple ferry Apr 2, 2019, 11:01 AM

#

@lapis sequoia You can pass the column names you want dummies for through the columns argument of that function. Because I had just one column, I did not use it. You can also set a custom prefix for the created columns: instead of People_b you get Person_b if you pass prefix="Person" (the "_" separator is added automatically)
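One caveat worth a sketch: with several columns, get_dummies creates a separate block of indicator columns per input column, so it will not merge Person1 to Person3 into one shared set of flags by itself. A minimal numpy alternative for the layout asked about earlier, assuming person IDs run from 1 to 5 (the second row is invented):

```python
import numpy as np

# Columns: Outcome, Person1, Person2, Person3
rows = np.array([
    [0, 1, 2, 4],
    [1, 3, 5, 2],
])

outcome = rows[:, 0]
persons = rows[:, 1:]
n_people = 5  # assumed number of distinct person IDs (1..5)

# One indicator column per person ID, set to 1 if that person appears in the row
multi_hot = np.zeros((len(rows), n_people), dtype=int)
row_index = np.arange(len(rows))[:, None]
multi_hot[row_index, persons - 1] = 1

# Outcome first, followed by the indicator columns
encoded = np.column_stack([outcome, multi_hot])
```

The resulting array can be split back into `encoded[:, 0]` as the target and `encoded[:, 1:]` as the features for a logistic regression.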

lapis sequoia Apr 2, 2019, 11:13 AM

#

ok thats really perfect. thank you so much

tiny comet Apr 2, 2019, 9:09 PM

#

hi im new here, in the community

i try to convert some xml 3d data from one dcc package to another using numpy.
something like this.

def seup_lod_arrays(self, point_num_of_lods = [] , triangles_of_lods =[] , quads_of_lods =[]):
    point_data = []
    for iter, value in enumerate(point_num_of_lods):
        point_data.append(np.empty([value,14]))
        point_data.append(np.empty([triangles_of_lods[iter],1]))
        point_data.append(np.empty([quads_of_lods[iter],1]))

    self.data = point_data

i have just one question, does this make sense? (-:
Or does using a list of numpy arrays destroy the speed of the arrays?
Is the list just pointing to the arrays?

help would be awesome

supple ferry Apr 3, 2019, 6:52 AM

#

Hey. First things first, it is good if you format your code using backtick notation. With this code it is not clear what you want to do. Maybe more code and some example output you want would be helpful. np.empty simply creates an array with the given shape without initializing it, so it holds arbitrary values; if you want the array to contain specific values, it does not do that for you
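On the "is the list just pointing to the arrays" part, a small sketch that suggests the answer is yes: a Python list only stores references, so the arrays themselves stay ordinary contiguous numpy arrays. The shapes and names below are placeholders, and `np.zeros` is used instead of `np.empty` so the contents are predictable.

```python
import numpy as np

# np.zeros fills with 0.0; np.empty would leave whatever was in memory
points = np.zeros((4, 14))
triangles = np.zeros((2, 1))

# The list holds references to the two arrays, not copies of their data
lod_data = [points, triangles]

# Writing through the original name is visible through the list entry
points[0, 0] = 1.5
same_object = lod_data[0] is points
seen_through_list = lod_data[0][0, 0]
```

Vectorised operations on each array should be as fast as before; only looping over the list itself happens at Python speed.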

lapis sequoia Apr 4, 2019, 7:45 AM

#

I need to superimpose transparency onto a matplotlib heatmap

#

Could anybody help me there?

paper niche Apr 4, 2019, 8:07 AM

#

post your code, if someone can help, they will respond

finite solar Apr 4, 2019, 12:31 PM

#

Any idea why

val = 1 / np.sqrt(2 * np.pi) * integrate.quad(lambda t: np.exp(-t ** 2 / 2), -np.inf, z)

raises TypeError: can't multiply sequence by non-int of type 'numpy.float64'

#

where integrate is import scipy.integrate as integrate

#

and z = (mean - value) / stddev

#

er

#

yeah that's right

lapis sequoia Apr 4, 2019, 12:32 PM

#

probably because you're multiplying a sequence by a numpy float?...

finite solar Apr 4, 2019, 12:33 PM

#

oh yeah integrate.quad returns 2 items

#

aaaaaa

#

confusing as heck error because it points to the line with z

#

cool I guess it works now
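For reference, a sketch of the fix described above: integrate.quad returns a tuple of (integral estimate, absolute error estimate), so multiplying the result directly multiplies a tuple by a numpy float. Unpacking first avoids the TypeError. The value of z here is just an example.

```python
import numpy as np
import scipy.integrate as integrate

z = 0.0  # example value; in the chat z comes from (mean - value) / stddev

# quad returns (integral estimate, absolute error estimate); unpack before multiplying.
integral, abs_err = integrate.quad(lambda t: np.exp(-t ** 2 / 2), -np.inf, z)
val = 1 / np.sqrt(2 * np.pi) * integral
print(val, abs_err)
```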

lapis sequoia Apr 4, 2019, 12:36 PM

#

python's instruction to line number tables can't go backwards, even if that means errors make no sense at all

desert cradle Apr 4, 2019, 7:32 PM

#

i wonder if it'd make more sense in debug mode

#

that turns off some optimizations

#

wait, no, debug mode is default

lapis sequoia Apr 4, 2019, 7:36 PM

#

here we go, boys

#

https://youtu.be/v1PvpKN6uwc

YouTube

Anaconda, Inc.

AI World - AnacondaCON 2019

When everything is artificial intelligence...is anything intelligible at all?

lapis sequoia Apr 4, 2019, 8:08 PM

#

hi

#

I am trying to create a histogram but it doesn't look right

#

📎 Screen_Shot_2019-04-04_at_10.06.14_PM.png

#

do you think it is accurate?

#

how can i make it look more like a histogram, for example, branching out to the left and right etc

waxen vine Apr 4, 2019, 8:59 PM

#

does that only have one plot point?

lapis sequoia Apr 4, 2019, 10:00 PM

#

hi @supple ferry I just want to thank you for helping me out this tuesday. It is working flawlessly 👌

#

https://github.com/ForrestKnight/open-source-cs

GitHub

ForrestKnight/open-source-cs

Video discussing this curriculum:. Contribute to ForrestKnight/open-source-cs development by creating an account on GitHub.

#

https://www.youtube.com/watch?v=NyOvFSP_IpQ

YouTube

ForrestKnight

The Open Source Computer Science Degree

This is my curated list of free courses from reputable universities like MIT, Stanford, and Princeton that satisfy the same requirements as an undergraduate ...

lapis sequoia Apr 4, 2019, 10:35 PM

#

can someone please please please pin this video ^

#

I love it

supple ferry Apr 4, 2019, 11:53 PM

#

@lapis sequoia you have just one point. Make sure all of your points are included in the data you plot.

lean ledge Apr 5, 2019, 12:17 AM

#

First of all, not data science, also already been overdone to death to various extents

#

https://github.com/ossu/computer-science

GitHub

ossu/computer-science

:mortar_board: Path to a free self-taught education in Computer Science! - ossu/computer-science

#

Is pretty good

lapis sequoia Apr 5, 2019, 1:23 AM

#

@lean ledge hmm

#

guess im out of the loop

#

which do you think is the best of these kinds?

waxen vine Apr 5, 2019, 5:15 AM

#

I wish the courses in that link did not have set dates

blissful cedar Apr 5, 2019, 12:39 PM

#

I have done the Berkeley AI course, pretty nice too

#

my other massege didn't go

#

message*

#

I said: "Thanx a lot guys, those gits have a lot of valuable information"

waxen vine Apr 5, 2019, 5:42 PM

#

You guys know if python can rip all images from a website, store them, then add metadata based off image analysis of them, and then sort them into categories?

serene veldt Apr 5, 2019, 11:05 PM

#

does anyone know a kfold method for python besides sklearn?

#

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

#

this one splits into test and train

#

i was looking for one that just split into train

paper niche Apr 6, 2019, 12:42 AM

#

@serene veldt what would that look like? let's say 5-fold, so you want something that splits your dataset into 4+1, but both used for training?

#

anyway there's nothing stopping you from using the outputs from kfold as both training sets.. can you explain a little better about your use case?
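If the goal is only to cut the data into k disjoint chunks without the train/test pairing, one possible sketch without scikit-learn is below. It assumes the rows can be addressed by integer position; the dataset size and the seed are made up.

```python
import numpy as np

n_samples = 10  # hypothetical dataset size
k = 5

# Shuffle the row positions once, then cut them into k nearly equal chunks.
rng = np.random.default_rng(0)
indices = rng.permutation(n_samples)
folds = np.array_split(indices, k)

for i, fold in enumerate(folds):
    print(i, fold)
```

Each entry of folds is an index array that can be used to select rows, for example X[folds[0]] for a numpy array X.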

still abyss Apr 6, 2019, 12:51 AM

#

Hey guys, I have a pandas related question.

paper niche Apr 6, 2019, 12:53 AM

#

sure, just ask

still abyss Apr 6, 2019, 12:54 AM

#

Okay, I'm trying to select rows from a DF but I keep getting a max recursion depth error. The code is simple and I know I've done this thing last semester.

df[df["W%"] > conf]

#

"RecursionError: maximum recursion depth exceeded while calling a Python object"

paper niche Apr 6, 2019, 12:55 AM

#

hmm can you show df.head(), and also what's conf?

still abyss Apr 6, 2019, 12:55 AM

#

W% Var
0 0.608696 0.706
1 0.426829 0.714
2 0.317073 0.754
3 0.207317 0.781
4 0.256098 0.741

#

conf is just a float.

#

conf = mean + (1.96 * sd)
conf
0.8041728975013317

paper niche Apr 6, 2019, 12:57 AM

#

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10, 2)), columns=['W%','b'])
df[df['W%']>0.5]

does this run for you?

#

it runs perfectly fine on mine.

still abyss Apr 6, 2019, 12:57 AM

#

Yes.

#

Which is why I don't get why I'm having issues.

paper niche Apr 6, 2019, 12:58 AM

#

try changing conf to 0.8 (the number)?

#

where are you doing this by the way? notebook or py file?

still abyss Apr 6, 2019, 12:58 AM

#

Jupyter notebook.

paper niche Apr 6, 2019, 12:59 AM

#

hmmm

still abyss Apr 6, 2019, 12:59 AM

#

df[df["W%"] > 0.8] is still max recursion.

paper niche Apr 6, 2019, 1:00 AM

#

what does df["W%"]>0.8 return?

still abyss Apr 6, 2019, 1:00 AM

#

Also a max recursion.

paper niche Apr 6, 2019, 1:00 AM

#

just df["W%"]?

#

is there max recursion, I mean. I don't really need to see the output

still abyss Apr 6, 2019, 1:01 AM

#

TypeError: cannot concatenate object of type "<class 'numpy.ndarray'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

#

And the error is suppppppppppppper long.

paper niche Apr 6, 2019, 1:02 AM

#

can you show the full stacktrace?

#

if it's long, use hastebin

#

https://paste.pydis.com/

#

what comes before this erroneous line? how are you setting up your df?

still abyss Apr 6, 2019, 1:05 AM

#

Hmmm... let me try rerunning jupyter. I just got an error on df.tail()

#

df = pd.DataFrame(data={"W%":data["W%"], "Var":data["FT%"]})

paper niche Apr 6, 2019, 1:07 AM

#

and data is another dataframe? why create a new dataframe when you can just index the cols out?

#

though I'm not sure if that's the issue

#

did restarting the kernel help?

still abyss Apr 6, 2019, 1:08 AM

#

Because I'm ultimately trying to index through and compare each variable to see if Win % is > the confidence interval.

#

Okay, restarting seems to have fixed it.

paper niche Apr 6, 2019, 1:10 AM

#

okay 👍

still abyss Apr 6, 2019, 1:10 AM

#

df[df["W%"] > conf]
W% Var
19 0.817073 0.696
40 0.833333 0.797
115 0.804878 0.771
137 0.817073 0.794
366 0.841463 0.747
367 0.878049 0.746
371 0.817073 0.744
394 0.804878 0.757
489 0.804878 0.754
826 0.817073 0.788
827 0.890244 0.763
828 0.817073 0.768
980 0.817073 0.805
1067 0.817073 0.803

#

Thank you for the help.

paper niche Apr 6, 2019, 1:11 AM

#

np

vague jetty Apr 6, 2019, 4:06 PM

#

Has anyone had any issues using np.split or np.asarray on a list of dataframes? I'm getting '''ValueError: cannot copy sequence with size 11122 to array axis with dimension 88''', so I assume numpy is trying to operate on the dataframes inside the list. I just want to split the list and not operate on the dataframes.

#

The dataframes inside the list are of shape (11122, 88), hence why I think numpy is trying to operate on the dataframes inside the list.

#

Ended up posting to SO

#

https://stackoverflow.com/questions/55551068/numpy-split-not-working-a-list-of-dataframes

Stack Overflow

Numpy.split not working a list of DataFrames

I have a list of 31 dataframes, days, of shapes (11122, 88) to (22123, 88) stored in a standard Python list (of overall shape (31, 1)). I want to call train_x, test_x = np.split(days, [int(0.8*len(...
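One way to sidestep the problem, sketched under the assumption that only the list needs splitting: np.split appears to convert its input to an array first, which is where numpy tries to look inside the DataFrames. Plain list slicing never touches the elements. The frames below are tiny hypothetical stand-ins for the real ones.

```python
import pandas as pd

# Hypothetical stand-ins for the 31 daily frames (the real ones are much larger).
days = [pd.DataFrame({"x": range(3)}) for _ in range(31)]

# Slice the list itself; the DataFrames inside are left untouched.
cut = int(0.8 * len(days))
train_x, test_x = days[:cut], days[cut:]
print(len(train_x), len(test_x))
```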

lusty abyss Apr 6, 2019, 5:20 PM

#

Hi guys, I'm planning on creating a crawler for a company and it involves AI (NLP) and Big Data. I was wondering, what are the best Python libraries for both crawling and NLP that can help me create something that can scale?

void anvil Apr 6, 2019, 5:35 PM

#

spaCy for nlp is a solid starting point

dense rose Apr 6, 2019, 6:01 PM

#

Not really a data science question but I'm having difficulties with Jupyter.

chilly geyser Apr 6, 2019, 6:31 PM

#

What kind of difficulties Arc

hardy crag Apr 6, 2019, 6:46 PM

#

@lusty abyss maybe scrapy for the crawling part

#

@dense rose please do tell

dense rose Apr 6, 2019, 6:57 PM

#

Oh it's a stupid issue, I think it's just a problem with the paths Jupyter is using.

#

But so I do pip install Jupyter and it installs fine

#

And python -m jupyter works

#

But python -m jupyter notebook gives Error executing Jupyter command 'notebook': [Errno 'jupyter-notebook' not found] 2

#

I should mention Win10 I suppose.

lusty abyss Apr 6, 2019, 7:33 PM

#

@hardy crag I'll be reading about Scrapy more

#

@void anvil Is it multi language? The only one I found that supports another language other than english is NLTK

void anvil Apr 6, 2019, 7:48 PM

#

what do you mean?

#

https://spacy.io/

spaCy · Industrial-strength Natural Language Processing in Python

spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.

#

there's nothing really better out of the box

lusty abyss Apr 6, 2019, 8:06 PM

#

@void anvil for example, for sentiment analysis

hardy crag Apr 6, 2019, 8:09 PM

#

@dense rose windows or linux/mac? system wide python or conda/virtual env?

dense rose Apr 6, 2019, 8:20 PM

#

Win10 just system wide.

hardy crag Apr 6, 2019, 8:30 PM

#

have you tried reinstalling jupyter?

#

pip install --upgrade --force-reinstall --no-cache-dir jupyter

#

also maybe checkout this. https://github.com/jupyter/notebook/issues/4195

GitHub

jupyter.exe isntalled in the wrong location on windows · Issue #4...

After running > python -m pip install jupyter I was unable to run > jupyer notebook as per the instructions. I did some searching, my python environment has the following scripts folder which...

dense rose Apr 6, 2019, 8:42 PM

#

Yes.

#

And that link is an open question with no answer.

hardy crag Apr 6, 2019, 8:48 PM

#

well no he said its installed in the wrong location, so you would have to use that exe instead of the command line command

#

it might also be worth considering anaconda

dense rose Apr 6, 2019, 9:26 PM

#

Oh my b I just read the first part and saw a similar question and skipped to the (non-existent) answers.

#

Yeah that's weird.

lyric sedge Apr 7, 2019, 7:27 AM

#

So this is a basketball data visual.
It's about three-point shooting.
The axes are attempts and makes.
The legend is the player positions.

I want to ask what kind of person would use things like this?

https://www.reddit.com/r/dataisbeautiful/comments/b2ysmn/basketball_stats_on_the_three_pointers_totals/

r/dataisbeautiful - Basketball Stats on the three pointers totals ...

5 votes and 4 comments so far on Reddit

vague jetty Apr 7, 2019, 3:11 PM

#

I'm working on an LSTM operating on time-series data. The lengths of my input sequences vary from 968 to 6244 measurements. What's the best way to normalize their lengths? I could just take the median number of measurements, throw away sequences with less, and chop ones with more. Or I could work on interpolating the data, but I wouldn't want to interpolate 968 measurements to 6244. I know this is dependent on what my data actually is (it's all measurement data from several sensors), but are there any rules-of-thumb or guidelines to normalizing sequence length?

#

I guess another alternative would be to feed in sliding window subsequences (say of length 200) of each sequence rather than the whole sequence itself?

vague jetty Apr 7, 2019, 3:50 PM

#

Nvm, I've looked more into it and looks like the norm is just use 250-500 steps as inputs, so I'll just make windows using that
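A rough sketch of the windowing idea, assuming each sequence is a 2-D array of shape (time steps, sensors). The window length, step size and sensor count below are placeholders chosen for illustration.

```python
import numpy as np

def sliding_windows(sequence, window, step):
    """Cut one sequence into overlapping windows.

    Returns an array of shape (n_windows, window, n_features).
    Trailing samples that do not fill a whole window are dropped.
    """
    starts = range(0, len(sequence) - window + 1, step)
    return np.stack([sequence[s:s + window] for s in starts])

# Hypothetical sequence: 968 time steps from 3 sensors.
seq = np.zeros((968, 3))
windows = sliding_windows(seq, window=250, step=125)
print(windows.shape)
```

Because every window has the same length, sequences of 968 and 6244 steps simply yield different numbers of windows, and no interpolation or truncation to a common length is needed.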

vague jetty Apr 7, 2019, 4:24 PM

#

Does anyone know if dataframes have any built in methods for deque-like behavior? Specifically, I want to have the dataframe pop an entry when a new entry is added once the dataframe hits a certain length. Writing code to do this wouldn't be hard, but I was curious if any of that is built-in already.

paper niche Apr 7, 2019, 4:30 PM

#

https://stackoverflow.com/q/29609118 if you havent yet seen it

Stack Overflow

How to specify the number of rows a pandas dataframe will have?

I have a Pandas dataframe and I am continually appending a row of data each second as below.

df.loc[time.strftime("%Y-%m-%d %H:%M:%S")] = [reading1, reading2, reading3]

df
...

vague jetty Apr 7, 2019, 5:07 PM

#

Thanks! That looks about like what I was going to write manually
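A hand-rolled version might look like the sketch below. This is not the code from the linked answer; MAX_ROWS and the push helper are hypothetical names introduced for illustration.

```python
import pandas as pd

MAX_ROWS = 3  # hypothetical cap on the number of rows kept

def push(df, row):
    """Append one row, then keep only the newest MAX_ROWS rows."""
    df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    return df.tail(MAX_ROWS).reset_index(drop=True)

df = pd.DataFrame({"reading": [0]})
for value in range(1, 5):
    df = push(df, {"reading": value})
print(df["reading"].tolist())
```

Rebuilding the frame on every append is slow for high-rate data. If only a rolling buffer of rows is needed, collections.deque(maxlen=n) from the standard library evicts old entries on its own, and a DataFrame can be built from it when required.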

mossy dragon Apr 8, 2019, 7:30 AM

#

Hello!

#

anybody here actually works as a data scientist?

gilded dagger Apr 8, 2019, 7:31 AM

#

Small question: how good is Dash?

#

https://dash.plot.ly/getting-started

Dash User Guide and Documentation - Dash by Plotly

Dash User Guide and Documentation. Dash is a Python framework for building analytical web apps in Python.

#

I'm looking for a way to create interactive dashboards for my coworkers (not necessarily using Python) and came across it. Is it popular?

#

I usually use seaborn for my plots but they're kinda hard to automatically export to create browser-compatible dashboards, right?

mossy dragon Apr 8, 2019, 7:34 AM

#

I think Tableau is popular

#

not python though

gilded dagger Apr 8, 2019, 7:34 AM

#

Tableau is 80$/month though lol

#

I used it at my previous job but I'm working for a smaller company now and I'd like something that's open source 😄

#

But I don't want to spend hours understanding the Dash framework if it's not really popular/maintained

mossy dragon Apr 8, 2019, 7:39 AM

#

i see

#

if you knew R you could just use Shiny

#

I hear it's great

gilded dagger Apr 8, 2019, 7:45 AM

#

Mmmmmh that's true. Never used R but it can't be that hard.

#

I'm doing all my analysis with Python though so ideally I'd like to work with it

#

So I can re-use my existing library

#

And my sql-alchemy integration

lean ledge Apr 8, 2019, 8:03 AM

#

Plotly is decent for making dashboards in general

#

plotly.dashboard_objs

gilded dagger Apr 8, 2019, 8:06 AM

#

Is it really popular? I don't see many people talking about it (compared to seaborn for example)

lapis sequoia Apr 8, 2019, 10:42 AM

#

R isn't used in production at most major companies..

#

not enough support.. plus R studio license cost..

supple ferry Apr 8, 2019, 10:59 AM

#

You can use plotly but it won't be as versatile as R's Shiny

lyric canopy Apr 8, 2019, 11:42 AM

#

R's still very popular in academics (more so than Python, although Python is gaining traction). That also means that newer statistical methods may be implemented in R before they are implemented in Python. My guess is that this applies less to machine learning, but that's not my field. I do know my research group still mainly uses R and pushes out a lot of R-packages with the new developments.

#

Stuff like newly published methods for Prediction Rule Ensembles and multiple imputation.

lean ledge Apr 8, 2019, 12:02 PM

#

Definitely correct, R is the preferred language in a lot of science, not just pure statistics

#

While ML stuff is generally python

hardy crag Apr 8, 2019, 5:02 PM

#

@gilded dagger I think Dash is exactly what you are looking for.

light cloud Apr 8, 2019, 5:30 PM

#

I asked this in #career-advice but I thought I would ask here as well.

I am hiring a couple data analysts/scientists/ML dudes for the summer. Got some promising candidates.

Outside of some technical stuff, are there any good questions to ask people through a case study. I am interested in how they deal with problems, their methodology and those things more than their technical acumen in this particular instance.

void anvil Apr 8, 2019, 10:10 PM

#

Give them a data analysis project that'll take 2 hours to do something easy on (clean data set, maybe a couple holes and some irrelevant data) and ask them to give a 10 minute presentation on their findings

lean ledge Apr 9, 2019, 2:51 AM

#

Doesn't really help with data science and ML people

#

DS and ML is more about being able to understand high dimensional data and the maths and behaviour behind it all. I'm not sure how you can measure that

inland garnet Apr 9, 2019, 10:07 AM

#

Hello all

#

I'm very new to using Python for data science, and was wondering if anyone would be able to shine some light on an issue I'm having

#

I am trying to use statsmodels to forecast call data, using this guide: https://medium.com/datadriveninvestor/how-to-build-exponential-smoothing-models-using-python-simple-exponential-smoothing-holt-and-da371189e1a1

Medium

How to Build Exponential Smoothing Models Using Python: Simple Exp...

How many iPhone XS will be sold in first 12 months?

#

I have a month's worth of call data in a pandas dataframe

#

it's a simple 1D array with date as the index and calls as data

#

Simple exponential smoothing seems to work:

#

📎 unknown.png

#

but if I try the full ExponentialSmoothing function I just get this error:

#

📎 unknown.png

#

WHen I'm using the exact same dataframe as before

#

📎 unknown.png

#

📎 unknown.png

#

cleared up some clutter

inland garnet Apr 9, 2019, 10:30 AM

#

seems to only happen with the mul method

supple ferry Apr 9, 2019, 11:24 AM

#

@inland garnet, I'm no expert in statsmodels, but this error suggests that a boolean test is not being "translated" properly for pandas. Seems like a statsmodels bug to me (maybe).
If you look at traceback line 391, you will see something like this:
(condition) and (condition)
This is the plain-Python way to do it, which pandas does not accept for element-wise tests.
In pandas, boolean indexing has to be done with the | and & operators.

#

please, refer to
http://pandas-docs.github.io/pandas-docs-travis/user_guide/indexing.html#boolean-indexing

#

or to this Stack Overflow answer
https://stackoverflow.com/a/54358361/10943886

Stack Overflow

Logical operators for boolean indexing in Pandas

I'm working with boolean index in Pandas.
The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]
works fine whereas

a[(a['some_co...
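A small illustration of the pandas rule being described, on made-up data. It shows only the general difference between & and and; it does not reproduce the statsmodels error from the screenshots.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})

# Element-wise AND: parenthesize each comparison, then combine with &.
both = a[(a["x"] >= 2) & (a["y"] >= 2)]
print(both)

# `and` asks Python for a single truth value of a whole Series,
# which pandas refuses because the answer is ambiguous.
try:
    a[(a["x"] >= 2) and (a["y"] >= 2)]
except ValueError as err:
    print("ValueError:", err)
```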

lapis sequoia Apr 9, 2019, 11:52 AM

#

you're better off implementing your own functions.. but where are you importing statsmodels from

inland garnet Apr 9, 2019, 12:22 PM

#

Right

#

I just installed statsmodels via pip

#

and then i'm importing from statsmodels.tsa.api

#

Thanks @supple ferry

#

I think I'm in over my head

#

Every guide to exponential smoothing in python seems to use pandas, so it seems strange that statsmodels isn't working. my data is extremely simple and 1D

lime lava Apr 9, 2019, 8:04 PM

#

in pandas, i want to check for duplicates for all columns except one

#

so i made a list of columns with dataf.columns, removed "sku" from every list, then did dataframe.duplicated(subset=listwithoutsku)

#

but the output is a list of every row where there is at least one duplicated column from that list

#

how can I make it so it only counts full duplicates from the subset?
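For what it's worth, DataFrame.duplicated with a subset should already compare the subset columns together, so a row is flagged only when all of them match another row. A sketch on made-up data (the column names other than sku are hypothetical):

```python
import pandas as pd

dataf = pd.DataFrame({
    "sku":   ["s1", "s2", "s3", "s4"],
    "name":  ["cup", "cup", "cup", "mug"],
    "color": ["red", "red", "blue", "red"],
})

# Every column except sku.
cols = [c for c in dataf.columns if c != "sku"]

# keep=False marks every member of a duplicate group, not only the later ones.
mask = dataf.duplicated(subset=cols, keep=False)
print(dataf[mask])
```

Here only s1 and s2 are flagged, since they agree on both name and color. If the real output flags rows that share just one column, it may be worth checking what the subset list actually contains.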

lapis sequoia Apr 10, 2019, 3:06 AM

#

could you rephrase your question..

#

unable to follow

dense rose Apr 10, 2019, 5:28 AM

#

I'm in an introductory ML class and the final project is one of the projects on Kaggle (mostly free to choose). Any suggestions on some that are not too hard but still interesting?

lean ledge Apr 10, 2019, 5:29 AM

#

Pokemon one can be amusing

gilded dagger Apr 10, 2019, 8:18 AM

#

Anybody here has any experience with Dash apps?

#

I'm having a lot of trouble just making a very basic CSS

#

Look at that date picker LOL

📎 wtf.mov

river plume Apr 10, 2019, 9:15 AM

#

hello guys, i have installed the cpu version of keras

#

how do i change it to the gpu one?

#

didn't find it on Stack Overflow

lean ledge Apr 10, 2019, 9:16 AM

#

you should be installing tensorflow-gpu rather than just tensorflow

river plume Apr 10, 2019, 9:17 AM

#

so shall i uninstall tf and then install tf-gpu?

lean ledge Apr 10, 2019, 9:17 AM

#

yes, most likely

#

personally i use the keras built into tensorflow anyway, so

river plume Apr 10, 2019, 9:18 AM

#

yeah even im using keras

lean ledge Apr 10, 2019, 9:19 AM

#

there's a separate keras library and then there's tensorflow.keras

#

i use the latter

river plume Apr 10, 2019, 9:19 AM

#

but my cpu usage was at 95% and gpu usage was 5%

#

looks like i made a mistake installing keras-cpu

lean ledge Apr 10, 2019, 9:20 AM

#

keras-cpu is not a thing, keras is only an API. it doesnt handle hardware on its own

#

it leaves that to the backend

shrewd phoenix Apr 10, 2019, 10:00 PM

#

I have a script I made for turning some geographic data points into a heatmap-style video. It uses scipy.interpolate.griddata() to do the interpolating, which works pretty well (see image).

📎 griddata.PNG

#

However, I'd like for it to do some extrapolating outside the boundary created by available data points. I've tried using scipy.interpolate.Rbf for this with extremely poor results. More recently, I tried using scipy.interpolate.interp2d, which has also given extremely poor results (see image).

📎 interp2d.PNG

#

Can anyone explain why it's like this?

misty sonnet Apr 10, 2019, 10:02 PM

#

Try sending your code. It often makes issues like this easier https://paste.mcadesigns.co.uk

shrewd phoenix Apr 10, 2019, 10:06 PM

#

Give me a second to redact a couple things and I'll put it up. I'll put up a slightly different script than the one that made those screenshots, but it showed the same problems

#

https://paste.mcadesigns.co.uk/ujubotomak.py

#

^ Line 113 works, but if I comment it and uncomment Line 114 that creates bad images

#

Example from pasted code using interp2d:

📎 unknown.png

shrewd phoenix Apr 10, 2019, 11:11 PM

#

Well, if anyone has any insight please ping me. I have this server muted, so I won't notice otherwise.

analog helm Apr 11, 2019, 3:39 AM

#

would asking questions about OpenCL and/or coherent noise be within the realm of this channel, or not really?

lean ledge Apr 11, 2019, 4:02 AM

#

@analog helm it's python specific so probably not but asking in the offtopic channels would be fine

rich chasm Apr 11, 2019, 4:30 AM

#

I found a great site to learn anything data science for free https://courses.cognitiveclass.ai

Cognitive Class

admin

Data Science and Cognitive Computing Courses - Cognitive Class

Free Courses in Data Science, AI, Cognitive Computing, Blockchain and more

lapis sequoia Apr 11, 2019, 5:29 AM

#

it's been there a while.. really elaborate..

#

too bad people don't use it much

#

wait a minute.. are you from IBM

lapis sequoia Apr 11, 2019, 2:32 PM

#

@rich chasm i love u

analog helm Apr 11, 2019, 10:18 PM

#

Does anyone have any recommendations for someone wanting to produce and manipulate (coherent) noise in Python? I've only found two libraries which do such, both are ports of the C++ libnoise library. One of the ports has some random Visual Studios dependency, and I can't find a prebuilt wheel, official or not. The other has a dependency on (Py)OpenCL which is a holy nightmare of its own. I checked the original libnoise library, and there is nothing in it to do with OpenCL, so why the author of the port decided to make the baffling, inane, and stupid decision to weld that crap on without any choice on the user's part is completely beyond me. But now I'm just ranting!

Is there a library or something I'm missing which either ports libnoise or has equivalent functionality, and "just works" without necessitating any ridiculous proprietary runtimes, or exotic drivers?

#

It's gotten to point where I'm seriously considering just porting the original libnoise myself, but obviously I'd like to avoid that if i can

#

Oh right, the ports i found are noisepy (the one with the VS dependency) and PyNoise (the one with the OpenCL dependency)

lean ledge Apr 11, 2019, 10:24 PM

#

@analog helm What kind of noise are you looking for

analog helm Apr 11, 2019, 10:39 PM

#

The noise itself isn't really the issue. I can implement perlin, simplex, voronoi, etc in 10 minutes each. The main convenience of using libnoise is its full set of features and automation in regards to multi-dimensional containers, multiple 'layers' of modules which modify the noise values, and the built-in visualization system

lean ledge Apr 11, 2019, 10:46 PM

#

You can probably implement a lot of that relying on standard python data science libraries without much effort.

serene veldt Apr 12, 2019, 10:19 AM

#

Need some help with scikit learn

#

using the naive bayes classifiers

serene veldt Apr 12, 2019, 11:15 AM

#

sklearn.naive_bayes.BernoulliNB

#

it has a binarize parameter

#

its a threshold

#

so i would assume it goes from [0,1]

#

however, using values above 1 produces different results, some better some worse

#

so i cant really understand what that threshold means

lyric canopy Apr 12, 2019, 1:56 PM

#

@serene veldt It's used to convert a floating point number to a binary (boolean) value

#

So, you need a boundary to determine in which category something belongs

#

The default is 0.0 (so all negative numbers go into the first category, all positive numbers go into the second; I don't know how it treats the boundary itself)

#

Since those floating point numbers can have any value (well... you know what I mean), the boundary can be "anywhere"

#

So, say that I have numbers ranging from 0 to 100 and I want the boundary to be 90, I can use 90 for the binarize parameter

serene veldt Apr 12, 2019, 1:58 PM

#

so all below 90 turn 0 and above 100 become 1?

#

or the opposite

lyric canopy Apr 12, 2019, 1:59 PM

#

I don't know, but does it matter? I'm not familiar with this model, but it could just be two groups without any significant meaning or order

serene veldt Apr 12, 2019, 2:00 PM

#

i would just like to understand how the threshold works, since they are not specific at all

#

to run some tests

#

since, imo, they don't properly explain how to work with it and how it binarizes

#

but i appreciate the help

lyric canopy Apr 12, 2019, 2:04 PM

#

It's here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.binarize.html

#

That's the function it calls to binarize it

#

According to the source code: https://github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/naive_bayes.py#L920

GitHub

scikit-learn/scikit-learn

scikit-learn: machine learning in Python. Contribute to scikit-learn/scikit-learn development by creating an account on GitHub.

#

@serene veldt

#

So it's 1 above the threshold and 0 below or equal
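The rule stated above, written out with plain numpy rather than a call into scikit-learn, so the feature values and the threshold of 90 are just examples:

```python
import numpy as np

X = np.array([10.0, 90.0, 90.5, 100.0])
threshold = 90  # the value that would be passed as binarize=90

# Strictly greater than the threshold becomes 1; less than or equal becomes 0.
binary = (X > threshold).astype(int)
print(binary)
```

This also shows why values above 1 are legal: the threshold lives on the scale of the input features, not on a [0, 1] probability scale.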

serene veldt Apr 12, 2019, 2:14 PM

#

Much appreciated!

#

That really helps

balmy geyser Apr 12, 2019, 7:48 PM

#

Are there any folks here that know how to display multiple plots?

#

in matplotlib?

lean ledge Apr 12, 2019, 10:25 PM

#

plt.subplot()
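A minimal sketch using the closely related plt.subplots, which creates the figure and all axes in one call. The data and titles are made up, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt

# One row, two columns of axes in a single figure.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot([0, 1, 2], [0, 1, 4])
axes[0].set_title("left")

axes[1].bar(["a", "b"], [3, 5])
axes[1].set_title("right")

fig.tight_layout()
# Call plt.show() or fig.savefig("plots.png") to see the result.
```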

balmy geyser Apr 12, 2019, 10:50 PM

#

thanks @lean ledge

pine yoke Apr 13, 2019, 4:00 AM

#

Could someone point out how I can get my regex working? https://regex101.com/r/RlR4HO/1/
I'm trying to match just the fish name, then the amount, then the price. I've gotten pretty far, but for some reason I can't get specieweightprice knocked off my group 1 match, even if I add an non-capturing group that captures that exact phrase.

Regex101 - online regex editor and debugger

Regex101 allows you to create, debug, test and have your expressions explained for PHP, PCRE, Python, Golang and JavaScript. The website also features a community where you can share useful expressions.

#

Here's something that allows me to capture it like I want to: https://regex101.com/r/RlR4HO/2


#

But as soon as I add '?' the rest of group one steals it from the non capturing group.

lapis sequoia Apr 13, 2019, 6:11 AM

#

what pattern do you want to catch

balmy geyser Apr 13, 2019, 6:54 AM

#

Bit of a long shot, would anyone be up for looking at my implementation of a limiter algorithm and help me figure out why it doesn't work as described in the paper it's based upon?

#

the paper is here: https://users.aalto.fi/~hamalap5/dafx2002/dafx_hamalainen.pdf

lapis sequoia Apr 13, 2019, 7:33 AM

#

sure.. post your code.. I can get back by tomorrow..

frigid portal Apr 13, 2019, 11:05 AM

#

hi, i need to create a data table from a JSON URL and then sort it, all in Flask. how do i sort the table?

#

like if u click on number sort by min/max

balmy geyser Apr 13, 2019, 3:44 PM

#

hey @lapis sequoia

#

let me know if you're still interested.

slate orchid Apr 13, 2019, 8:32 PM

#

hey y'all

#

i need someone to push me in the right direction for a thing

#

i understand the basics of machine learning, but i've never actually implemented it into anything

#

here's the gist of what i'd like to do: i want to be able to train an algorithm to judge a section of text on whether it most displays one of four different traits

#

now, i've been told the way to do this is to make four models, one for each of the traits, and see which confidence appears highest

#

so, if this is right, what libraries would i want to be using, what kind of stuff should i be googling

#

if this is wrong, then how should i be going about this?

#

and then the previous question i guess

#

thanks y'all

torn musk Apr 13, 2019, 8:38 PM

#

@slate orchid i made a library which judges text based on different traits

slate orchid Apr 13, 2019, 8:38 PM

#

that sounds... extremely relevant and useful?

torn musk Apr 13, 2019, 8:38 PM

#

for sake of visualization this website shows images depicting what it does

#

the model i used was.. i used a Word2Vec implementation and just plugged in the traits i wanted

#

for each word i found the 'correlation value' and averaged it over the entire section of text

slate orchid Apr 13, 2019, 8:40 PM

#

huhh

torn musk Apr 13, 2019, 8:40 PM

#

for example

#

unfriendliness = model.most_similar(
    positive=['hostile', 'hurtful', 'unfriendly', 'mean'],
    negative=['friendly', 'affectionate', 'loving', 'kind'],
    topn=100000
)
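To make the averaging idea concrete without loading a real model, here is a toy sketch with hand-written 2-d vectors. It only loosely mirrors the approach described above: the vectors, the word list and the score helper are all hypothetical, and real embeddings would come from a trained Word2Vec model.

```python
import numpy as np

# Toy 2-d "embeddings"; made up for illustration only.
vectors = {
    "hostile":  np.array([1.0, 0.1]),
    "mean":     np.array([0.9, 0.2]),
    "friendly": np.array([-1.0, 0.1]),
    "kind":     np.array([-0.9, 0.3]),
    "table":    np.array([0.0, 1.0]),
}

def unit(v):
    """Scale a vector to length 1."""
    return v / np.linalg.norm(v)

# Trait direction: mean of the positive anchors minus mean of the negative ones.
trait = unit(
    np.mean([unit(vectors[w]) for w in ("hostile", "mean")], axis=0)
    - np.mean([unit(vectors[w]) for w in ("friendly", "kind")], axis=0)
)

def score(text):
    """Average cosine similarity between the known words and the trait direction."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return 0.0
    return float(np.mean([unit(vectors[w]) @ trait for w in words]))

print(score("hostile mean table"), score("friendly kind table"))
```

With one such direction per trait, the trait with the highest average score would be the predicted label for a piece of text.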

#

are you familiar with Word2Vec?

slate orchid Apr 13, 2019, 8:41 PM

#

nope, looking it up now

#

is that basically step one for all 'text reading' machine learning stuff?

torn musk Apr 13, 2019, 8:41 PM

#

word2vec yes

#

its so simple and does so much

slate orchid Apr 13, 2019, 8:42 PM

#

sure, that sounds like a thing i want

torn musk Apr 13, 2019, 8:42 PM

#

https://radimrehurek.com/gensim/models/word2vec.html

gensim: topic modelling for humans

Efficient topic modelling in Python

slate orchid Apr 13, 2019, 8:42 PM

#

so how and what would i want to plug that into?

torn musk Apr 13, 2019, 8:43 PM

#

from gensim.models import KeyedVectors

slate orchid Apr 13, 2019, 8:43 PM

#

for distinguishing the four traits

#

(i'll train it on text in which i've established which the dominant trait is)

torn musk Apr 13, 2019, 8:43 PM

#

there are 2 ways to do it : pretrained and posttrained

#

depending on how fast your application needs to be

#

mine is pretrained because i wanted it fast

#

by posttrained i mean 'train on the fly'

slate orchid Apr 13, 2019, 8:44 PM

#

ah, sure

torn musk Apr 13, 2019, 8:44 PM

#

or more like 'compute on the fly' which is slow

slate orchid Apr 13, 2019, 8:44 PM

#

can you do both?

torn musk Apr 13, 2019, 8:44 PM

#

anything is possible

slate orchid Apr 13, 2019, 8:44 PM

#

woah

torn musk Apr 13, 2019, 8:45 PM

#

print("Loading GoogleNews-vectors into word2vec (~30 seconds)")
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin.gz',
    binary=True,
    limit=500000
)

#

the problem with using gensim directly is that everytime you run the script it takes 30 seconds to load the model

#

maybe theres a way to do it faster but i dont know that

slate orchid Apr 13, 2019, 8:46 PM

#

haha, clearly i have a lot to google here

torn musk Apr 13, 2019, 8:46 PM

#

gensim is fast but loading the googlenews into it takes time

slate orchid Apr 13, 2019, 8:47 PM

#

gensim is... a library?

torn musk Apr 13, 2019, 8:47 PM

#

yes

#

https://www.pydoc.io/pypi/gensim-3.2.0/autoapi/models/word2vec/index.html

slate orchid Apr 13, 2019, 8:48 PM

#

okay, in dumb terms for me: it inputs your word2vec stuff, and outputs...

#

(once trained)

torn musk Apr 13, 2019, 8:48 PM

#

in dumb terms Google already trained it on Google News

#

but the file is like 2GB so its slow to load

slate orchid Apr 13, 2019, 8:48 PM

#

oh right

#

wait, what does it being trained on google news mean?

#

like, trained for what

torn musk Apr 13, 2019, 8:49 PM

#

it trained on which words occurred frequently together

#

for example if 'neko' and 'cat' appeared frequently together, their vectors would be closer

slate orchid Apr 13, 2019, 8:50 PM

#

ah, okay

#

not sure that's what i want? honestly i can't tell

torn musk Apr 13, 2019, 8:50 PM

#

you can play with it

#

https://projector.tensorflow.org/

Embedding projector - visualization of high-dimensional data

Visualize high dimensional data.

#

look for your traits

iron latch Apr 13, 2019, 8:50 PM

#

confusing

slate orchid Apr 13, 2019, 8:50 PM

#

woah

#

words

torn musk Apr 13, 2019, 8:51 PM

#

just type your traits in there

#

search box on the right side of page

slate orchid Apr 13, 2019, 8:51 PM

#

ah, okay, here's the bit i'm explaining crap

#

i want to have four traits that are basically arbitrary

#

let's say sweet, salty, bitter, and whatever the last one is

#

sour

#

that

#

if i gave it like 100 examples saying 'this text is sweet, this text is salty', could i then give it some text and it tell me which of those four it is?

torn musk Apr 13, 2019, 8:53 PM

#

yes

slate orchid Apr 13, 2019, 8:53 PM

#

so... is this this thing then?

torn musk Apr 13, 2019, 8:53 PM

#

yes

#

it works surprisingly well for this purpose

slate orchid Apr 13, 2019, 8:55 PM

#

somehow your certainty makes me confused but awesome

torn musk Apr 13, 2019, 8:55 PM

#

because thats what i've been working on

#

so i tested it and stuff

#

i did have to choose the parameters though, so i chose limit=500000 and topn=10000 because they gave better values

#

limit meaning we use 500,000 words from googlenews and 10,000 for each trait but that was just for the specific use case i was using

#

the other 'optimization' was that i used a bunch of synonyms so that it would reflect the meaning better for example

#

with antonyms too

dominance = model.most_similar(
    positive=['dominant', 'assertive', 'capable', 'important'],
    negative=['submissive', 'apologetic', 'meek', 'passive'],
    topn=100000
)

#

as opposed to

dominance = model.most_similar(
    positive=['dominant'],
    topn=100000
)

slate orchid Apr 13, 2019, 8:58 PM

#

okay, so what you're doing is trying to find phrases and such related to words that exist

#

i think?

torn musk Apr 13, 2019, 8:58 PM

#

i rank phrases

slate orchid Apr 13, 2019, 8:58 PM

#

what... does that mean

#

i'm really sorry you're trying to be helpful and i'm pretty useless

torn musk Apr 13, 2019, 8:59 PM

#

"Mitsuki is really kind and sweet" -> friendliness: 3/10, dominance: -2/10

#

ranks the phrase based on traits

slate orchid Apr 13, 2019, 8:59 PM

#

ah, gotcha

#

so you're inputting 'friendliness' into the trained thing, right?

torn musk Apr 13, 2019, 8:59 PM

#

yes

#

  friendliness = model.most_similar(
      positive=['friendly', 'affectionate', 'loving', 'kind'],
      negative=['hostile', 'hurtful', 'unfriendly', 'mean'],
      topn=100000
  )

slate orchid Apr 13, 2019, 9:00 PM

#

okay, i'm trying to do something kind of different i think

torn musk Apr 13, 2019, 9:01 PM

#

"oranges are really heavy on the acid and vitamin c" -> sweet:-2, sour:+8, bitter:-4

slate orchid Apr 13, 2019, 9:01 PM

#

ah, sorry, my example was really confusing

#

those were meant to just be arbitrary phrases

torn musk Apr 13, 2019, 9:02 PM

#

let me guess you want a one hot encoding :
"oranges are really heavy on the acid and vitamin c" -> [sweet,sour,bitter]=[0,1,0]

#

f("oranges are really heavy on the acid and vitamin c" ,[sweet,sour,bitter]) -> [0,1,0]

slate orchid Apr 13, 2019, 9:02 PM

#

in plain terms: i want a computer to tell me whether some text is more 'bleep' or 'bloop', having given the computer a bunch of phrases and told them whether they were 'bleep' or 'bloop'

torn musk Apr 13, 2019, 9:02 PM

#

choose_trait("oranges are really heavy on the acid and vitamin c" ,[sweet,sour,bitter]) -> sour
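
Going from per-trait scores to a single answer, or to the one-hot form shown above, could look something like this. It is only a sketch: `choose_trait` here takes already-computed scores rather than raw text, and `one_hot` is an illustrative helper name.

```python
def choose_trait(scores):
    """Return the trait with the highest score."""
    return max(scores, key=scores.get)


def one_hot(scores, traits):
    """Return a 0/1 list over `traits` with a 1 at the top-scoring trait."""
    best = choose_trait(scores)
    return [1 if t == best else 0 for t in traits]


# Scores taken from the orange example in the chat
scores = {"sweet": -2, "sour": 8, "bitter": -4}
print(choose_trait(scores))
print(one_hot(scores, ["sweet", "sour", "bitter"]))
```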

slate orchid Apr 13, 2019, 9:02 PM

#

the computer has to figure out in the training bit what bleep or bloop actually mean

torn musk Apr 13, 2019, 9:03 PM

#

oh

slate orchid Apr 13, 2019, 9:03 PM

#

so, what kinds of words they're associated with

torn musk Apr 13, 2019, 9:03 PM

#

well if you have like a lot of phrases

#

you can use linear regression or something

slate orchid Apr 13, 2019, 9:03 PM

#

yeah, i think that's what i want

torn musk Apr 13, 2019, 9:03 PM

#

maybe even a neural network

slate orchid Apr 13, 2019, 9:03 PM

#

YEAH THAT

#

that thing

torn musk Apr 13, 2019, 9:03 PM

#

supervised learning

slate orchid Apr 13, 2019, 9:03 PM

#

that's the thing i want i think

torn musk Apr 13, 2019, 9:04 PM

#

the question is how much data are you feeding it

#

is it 10^2, 10^5 or 10^9

slate orchid Apr 13, 2019, 9:05 PM

#

all data would have to be sorted by hand, so at best like 10^3

torn musk Apr 13, 2019, 9:05 PM

#

more data means less transfer learning and more layers of neural networks

#

i see

#

maybe a 2 layer DNN

slate orchid Apr 13, 2019, 9:05 PM

#

like, i'm trying to make a crappy experiment, nothing's riding off of this being perfect

#

but i want it to mean SOME kind of thing

torn musk Apr 13, 2019, 9:05 PM

#

ok

slate orchid Apr 13, 2019, 9:05 PM

#

DNN?

torn musk Apr 13, 2019, 9:05 PM

#

deep neural network

slate orchid Apr 13, 2019, 9:06 PM

#

ah gotcha

torn musk Apr 13, 2019, 9:06 PM

#

but deep here will be just 1 or 2 layers

slate orchid Apr 13, 2019, 9:06 PM

#

so a 'd'nn then

#

'''d'''nn

torn musk Apr 13, 2019, 9:06 PM

#

or even sequential neural network

#

dnn is just the common phrase

slate orchid Apr 13, 2019, 9:06 PM

#

okay that's a googlable thing, thank you so much

torn musk Apr 13, 2019, 9:06 PM

#

np

slate orchid Apr 13, 2019, 9:06 PM

#

what kinds of libraries do i want?

torn musk Apr 13, 2019, 9:06 PM

#

the easy way would be using keras

#

in tensorflow 2.0 keras is a part of tensorflow , but its pretty recent and you could find more examples of the 'old' keras

slate orchid Apr 13, 2019, 9:08 PM

#

is keras an abstraction on top of tensorflow?

torn musk Apr 13, 2019, 9:08 PM

#

https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/

Machine Learning Mastery

Develop Your First Neural Network in Python With Keras Step-By-Step

Keras is a powerful easy-to-use Python library for developing and evaluating deep learning models. It wraps the efficient numerical computation libraries Theano and TensorFlow and allows you to define and train neural network models in a few short lines of code. In this post...

#

yes

slate orchid Apr 13, 2019, 9:09 PM

#

oh is it just easier tensorflow then

torn musk Apr 13, 2019, 9:09 PM

#

yes

#

a lot easier

slate orchid Apr 13, 2019, 9:09 PM

#

okay easy is great

#

thank you so so much, this looks like a great place to start

torn musk Apr 13, 2019, 9:09 PM

#

yay anytime 😃

slate orchid Apr 13, 2019, 9:10 PM

#

by the way your project looks awesome

torn musk Apr 13, 2019, 9:10 PM

#

thanks!!!

slate orchid Apr 13, 2019, 9:10 PM

#

have a good whatever the time is wherever you are

#

night?

torn musk Apr 13, 2019, 9:11 PM

#

afternoon

slate orchid Apr 13, 2019, 9:11 PM

#

that

#

have an excellent that

torn musk Apr 13, 2019, 9:11 PM

#

you too

slate orchid Apr 13, 2019, 9:11 PM

#

:)

grave patrol Apr 14, 2019, 12:33 AM

#

This is a nooby question, but i can't figure out how to label my data. it's a dataset with 140 features per sample in a 2d array, with the last cell on each row being the class it belongs to (binary)

#

do i have to read the labels into a second array

#

using sklearn*

supple ferry Apr 14, 2019, 6:15 AM

#

You can slice that array, transpose it and concatenate with the main array but vertically. Then you will get an array with 141 columns
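
For scikit-learn, the usual pattern is to keep the features and the labels in two separate arrays. A minimal sketch, assuming the data is a NumPy array with the label in the last column (so 141 columns in total if there are 140 features); the small random array below is only a stand-in for the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 5 samples, 140 features plus 1 binary label column
data = np.hstack([rng.normal(size=(5, 140)), rng.integers(0, 2, size=(5, 1))])

X = data[:, :-1]             # every column except the last: features
y = data[:, -1].astype(int)  # last column: class labels

print(X.shape, y.shape)
```

`X` and `y` can then be passed to an estimator's `fit(X, y)`.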

mighty vector Apr 15, 2019, 5:55 AM

#

I'm pretty novice to python and data science. But I have some code that my organization wants to make available to everyone in our 30+ offices. What is the best way to package my product and make/push future updates?

chilly shuttle Apr 15, 2019, 7:51 AM

#

@mighty vector github

#

Or just host it on an internal wiki like confluence

lapis sequoia Apr 15, 2019, 10:26 AM

#

yo anyone got a clue about this neural net, seems to keep oscillating around 50/50
```py
import numpy as np
import pandas as pd

inputs = []
outputs = []
train = pd.read_csv('traindata.csv')

def preprocessing(x):
    input = x.iloc[:, 0:-2]
    output = x.iloc[:, 6:8]
    for i in range(1, 12):
        inp = input.iloc[i - 1]
        inputs.append(inp.to_numpy())
    for i in range(1, 12):
        out = output.iloc[i - 1]
        outputs.append(out.to_numpy())

def sigmoid(x):
    return 1./(1. + np.exp(-x))

def sigmoid_prime(x):
    return x * (1. - x )

class StroopNetwork:
    def __init__(self, x, y_hat, epsilon):
        self.input = x
        self.theta1 = np.random.randn(4, 6)*np.sqrt(2/6)
        self.theta2 = np.random.randn(2, 4)*np.sqrt(2/4)
        self.expectation = y_hat
        self.output = np.zeros((2, 1))
        self.epsilon = epsilon

    def forward(self, i):
        self.layer1 = sigmoid(np.dot(self.theta1, self.input[i]))
        self.output = sigmoid(np.dot(self.theta2, self.layer1))

    def backward(self, i):
        delta_2 = (self.expectation[i] - self.output) * sigmoid_prime(np.dot(self.theta2, self.layer1))
        delta_1 = (delta_2 @ self.theta2) * sigmoid_prime(np.dot(self.theta1, self.input[i]))

        d_theta2 = self.output @ delta_2
        d_theta1 = self.layer1 @ delta_1

        self.theta1 += d_theta1
        self.theta2 += d_theta2

preprocessing(train)

bob = StroopNetwork(inputs, outputs, 0.12)

for z in range(1000):
    for i in range(1, 12):
        bob.forward(i-1)
        bob.backward(i-1)
        print(bob.expectation[i-1], bob.output, z)
```

#

i'm 🅱 retty new to this so it should be a rookie mistake lel

#

https://cdn.discordapp.com/attachments/448711580377546752/567286035709427713/Screen_Shot_2019-04-15_at_7.44.04_pm.png

#

^ bit of sample output

hasty maple Apr 15, 2019, 10:32 AM

#

def sigmoid_prime(x):
    return x * (1. - x )
should be like
def SigmoidGradient(Z):
    
    return Sigmoid(Z) * (1 - Sigmoid(Z)) no?

#

https://git.io/fhUGE @lapis sequoia this might help, been like a year since I wrote those codes though, so don't remember much >.>

GitHub

M-e-r-c-u-r-y/Machine-Learning

Machine Learning codes. Contribute to M-e-r-c-u-r-y/Machine-Learning development by creating an account on GitHub.

lapis sequoia Apr 15, 2019, 10:42 AM

#

sigmoid(z) = a

#

so yea that is correct but it doesn't make a difference @hasty maple
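
Both forms of the derivative are valid, but they expect different arguments: `x * (1 - x)` is only correct when `x` is already the activation `a = sigmoid(z)`. In the `backward` method posted above, `sigmoid_prime` appears to be called on `np.dot(...)`, which is the pre-activation `z`, so the two would not agree there. A small sketch of the difference (the function names are illustrative):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime_from_activation(a):
    """Derivative written in terms of a = sigmoid(z)."""
    return a * (1.0 - a)


def sigmoid_prime_from_z(z):
    """Derivative written in terms of the pre-activation z."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = 0.7
a = sigmoid(z)
print(sigmoid_prime_from_activation(a), sigmoid_prime_from_z(z))  # same value
print(sigmoid_prime_from_activation(z))  # different: z was passed where a is expected
```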

#

idk it's really weird

hasty maple Apr 15, 2019, 10:47 AM

#

strange indeed, I would suggest checking the shapes. usually numpy doesn't cry but works with whatever shape you throw at it, like for a multiplication, A(m*n) should have B(n*k), but if it's A(m*n) and B(n) it would still work and give an output but it won't be the correct one

lapis sequoia Apr 15, 2019, 10:48 AM

#

sorry i don't quite understand, what do you mean by check for the shapes?

hasty maple Apr 15, 2019, 10:51 AM

#

self.layer1 = sigmoid(np.dot(self.theta1, self.input[i]))
self.output = sigmoid(np.dot(self.theta2, self.layer1))

Output the shapes of self.theta1, self.input[i], self.theta2, self.layer1, self.output
When I coded it up last year, the shapes caused some trouble in the math, the self.input[i] might have caused the shapes to not be correct for the matrix multiplication operation
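
A quick way to see what "check the shapes" means in practice. The arrays below are placeholders with the same sizes as `theta1` in the posted network, not the real data:

```python
import numpy as np

theta = np.ones((4, 6))
x = np.ones(6)           # 1-D vector, shape (6,)
col = np.ones((4, 1))    # column vector, shape (4, 1)

hidden = theta @ x       # (4, 6) @ (6,) -> shape (4,)
print(hidden.shape)

# Elementwise product of (4, 1) and (4,) silently broadcasts to (4, 4)
print((col * hidden).shape)

# An assert at each step turns a silent shape bug into an immediate error
assert hidden.shape == (4,)
```

Mixing 1-D vectors with explicit column vectors is a common source of this kind of silent broadcasting, so printing `.shape` after each step in `forward` and `backward` is usually worth the noise.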

lapis sequoia Apr 15, 2019, 10:55 AM

#

ok ty i'll check

hasty maple Apr 15, 2019, 10:55 AM

#

👍

lapis sequoia Apr 15, 2019, 11:04 AM

#

the multiplication seems fine but it still doesn't work

#

ugh what is this

hasty maple Apr 15, 2019, 11:15 AM

#

ah you aren't using epsilon

#

the network might be jumping back and forth due to the large steps

lapis sequoia Apr 15, 2019, 11:19 AM

#

it does the exact same thing when i'm using epsilon

#

ty for all the help btw

hasty maple Apr 15, 2019, 11:22 AM

#

try running it again with epsilon, use small values of epsilon, 0.001,0.0001, etc
Also maybe check for the gradients as well, compute em numerically
You're welcome :)

lapis sequoia Apr 15, 2019, 11:22 AM

#

how would i compute them numerically?

#

ok will try that 👌

hasty maple Apr 15, 2019, 11:42 AM

#

I can't seem to find the pdf for it, but it's available in Andrew NG's course on coursera, he shows how to calculate it in matlab, you can code it in python as well
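
Gradient checking compares the gradients from backpropagation against a finite-difference estimate. A minimal Python sketch of the finite-difference side, using central differences; `numerical_gradient` and the toy `loss` are illustrative names and are not taken from the course material.

```python
def numerical_gradient(f, params, h=1e-5):
    """Approximate df/dp_i for each parameter with central differences."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += h
        minus[i] -= h
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


# Toy loss with a known gradient: d/dx = 2x, d/dy = 3
def loss(p):
    x, y = p
    return x ** 2 + 3 * y


print(numerical_gradient(loss, [2.0, -1.0]))  # close to [4.0, 3.0]
```

For a network, `f` would be the loss as a function of the flattened weights, and the result would be compared entry by entry with the backprop gradients.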

lapis sequoia Apr 15, 2019, 11:42 AM

#

ok ty i'll give it a search

turbid bay Apr 15, 2019, 1:56 PM

#

im trying to make a digit recogniser. I dont really know what to do. Any help would be great. Thankyou

vague jetty Apr 15, 2019, 2:21 PM

#

There are plenty of tutorials online for that. Do you want help finding a good one? Also, do you understand generally how digit recognizers work? If not, do you care about learning the fundamentals or do you just want to code something?

#

Also, does anyone know what InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 250 [[{{node metrics_5/acc/Squeeze}}]] means when calling model.fit in TF keras? It backtraces to some nondescript function calls: https://pastebin.com/0bY4TswX

Pastebin

------------------------------------------------------------------...

paper niche Apr 15, 2019, 2:30 PM

#

does this help? https://stackoverflow.com/q/49083984

Stack Overflow

ValueError: Can not squeeze dim[1], expected a dimension of 1, got...

I tried to replace the training and validation data with local images. But when running the training code, it came up with the error :
ValueError: Can not squeeze dim[1], expected a dimension o...

vague jetty Apr 15, 2019, 2:31 PM

#

I looked into that yesterday, and it didn't really help. That user is getting the error from a different function.

paper niche Apr 15, 2019, 2:34 PM

#

you have 250 nodes in your output layer right?

vague jetty Apr 15, 2019, 2:35 PM

#

Oh, no. I'm new to ML, so I'm still wrapping my head around the architecture.

paper niche Apr 15, 2019, 2:35 PM

#

https://stackoverflow.com/q/55634133 this looks similar to what you're getting?

Stack Overflow

Can not squeeze dim[1], expected a dimension of 1, got 2

I have very simple input: Points, and I am trying to classify whether they are in some region or not. So my training data is of the shape (1000000, 2), which is an array of the form:[ [x1,y1], [x2,...

vague jetty Apr 15, 2019, 2:37 PM

#

Not exactly, but I think I know where I'm getting tripped up now.

#

Not really related to the last question, but I have an architecture question, too.

My X shape is (17042, 250, 87), so 17042 sequences of length 250, each with 87 features. My Y shape is a vector of length 250 containing 1s and 0s, denoting whether a point in the input is important or not.

The last layer in the network should be an LSTM with units=250 and return_sequences=True, right?

Edit: NVM above question.

distant wraith Apr 15, 2019, 3:12 PM

#

Anyone have a nice presentation on why Python for data science over BI tools (Splunk, Tableau, etc.) that I can show senior management?

turbid bay Apr 15, 2019, 3:28 PM

#

@vague jetty i found something online. using the mnist digit dataset. it says its called a Multi-Layer Perceptron. However when i input my own data to test the model it is incredibly wrong

#

https://gogul09.github.io/software/digits-recognition-mlp

Gogul Ilango

Handwritten Digit Recognition using Deep Learning, Keras and Python

Learn how to recognize handwritten digit using a Deep Neural Network called Multi-layer Perceptron (MLP).

#

would it be viable for me to relearn neural networks? and make it from scratch?

vague jetty Apr 15, 2019, 3:45 PM

#

I always advocate building things from scratch over copying code. If you have the time, you should absolutely relearn NNs and build one from scratch. I don't know where you are competency-wise, but you should be able to figure out what you do and don't know

reef bone Apr 15, 2019, 3:54 PM

#

@turbid bay Multi-layer perceptron is mostly synonymous with neural network (although some literature makes a slight distinction between them afaik), so don't get confused by the fancy terminology. The MNIST digit recognition task is absolutely canonical and you will find a massive amount of resources dedicated to the problem by just googling - Geoffrey Hinton recognized it as the drosophila of machine learning, meaning it's an extensively studied problem and a good place to start your ML journey

#

And I would also advocate for trying to build your own NN to solve the task - and after you're done, take a look at the state-of-the-art networks for the problem (Kaggle is a good place for this), see what they do differently, and try to understand why

#

Keras provides an extremely modular and straight-forward API so you don't really need extensive knowledge beyond what layers are and how they work, although more knowledge always helps

turbid bay Apr 15, 2019, 3:57 PM

#

yh i will try. i learnt before how to do some of it. But it was in octave and i found octave very hard to learn so hopefully will understand it when trying to do it in python

#

i will work on making my own NN. But be prepared for many questions XD

reef bone Apr 15, 2019, 3:58 PM

#

There are some very smart and educated people in this chat so don't be afraid to ask for help

turbid bay Apr 15, 2019, 3:58 PM

#

will do thanks

vague jetty Apr 15, 2019, 4:08 PM

#

Alright, I fixed some of the layers in the model I posted earlier. Now I'm getting AttributeError: 'builtin_function_or_method' object has no attribute 'shape' from inside some of keras's helper libraries. Here's the Google Colab with the whole code: https://colab.research.google.com/drive/1yvIYYiBVtqQgVGER9rZr9cnxzlTwyiRz

Google Colaboratory

hardy crag Apr 15, 2019, 4:28 PM

#

I get an error for that colab

vague jetty Apr 15, 2019, 4:32 PM

#

As in the error I mentioned?

#

Or can you not access it?

hardy crag Apr 15, 2019, 4:40 PM

#

cannot load it

#

Error loading https://apis.google.com/js/client.js
Error: Error loading https://apis.google.com/js/client.js
at HTMLScriptElement.k.onerror (https://colab.research.google.com/v2/external/gapi_loader.js:9:415)

#

wait fixed it

#

my ad blocker's fault 😦

vague jetty Apr 15, 2019, 4:40 PM

#

Haha, it happens.

hardy crag Apr 15, 2019, 4:42 PM

#

It's read only though

vague jetty Apr 15, 2019, 4:43 PM

#

Can you not see the error in the last cell? I've never shared a Colab before.

hardy crag Apr 15, 2019, 4:43 PM

#

yes. I see your output but can't change anything

#

(may be for the better now that I think about it

#

)

#

can you check if in the lstm function the data is numpy arrays?

vague jetty Apr 15, 2019, 4:51 PM

#

found the error lol

hardy crag Apr 15, 2019, 4:52 PM

#

lstm function call

#

y_train.astype

#

?

vague jetty Apr 15, 2019, 4:52 PM

#

Bingo.

hardy crag Apr 15, 2019, 4:52 PM

#

yeah

#

noticed it just now :p

slate orchid Apr 15, 2019, 9:22 PM

#

yo

#

i'm putting some tweets into keras text classification

#

now bear in mind i have no clue what i'm doing

#

say i were to replace the twitter image link with something like TWITTERIMAGELINKHERE

#

would that be okay to be added to the 'vocabulary', and then having an image is counted as relevant to the classification?

heady bone Apr 15, 2019, 11:38 PM

#

I would say yes, if you were using some type of bag of words model
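
Swapping the links for a single token before tokenizing could be sketched like this. The regular expression is an assumption about how image links look in the tweet text (`pic.twitter.com/...`) and would need adjusting to the actual data:

```python
import re

# Assumed link format; adjust to whatever the real tweets contain
LINK_PATTERN = re.compile(r"(?:https?://)?pic\.twitter\.com/\S+")


def replace_image_links(tweet, token="TWITTERIMAGELINKHERE"):
    """Swap every matching link for a single placeholder token."""
    return LINK_PATTERN.sub(token, tweet)


print(replace_image_links("look at my owl pic.twitter.com/abc123"))
```

Because every link collapses to the same token, a bag-of-words model sees "has an image" as one vocabulary entry instead of thousands of unique URLs.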

slate orchid Apr 15, 2019, 11:41 PM

#

yup

#

nice

#

can classifications be given actual names?

#

or are they just integers normally

heady bone Apr 15, 2019, 11:47 PM

#

you would usually have a classification for each integer, no? like 0 = positive, 1 = negative, 2 = whatever, etc.
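
In practice the names live in a lookup table next to the model, which itself only ever sees the integers. A small sketch with example class names:

```python
class_names = ["positive", "negative", "neutral"]  # example names only

# name -> integer for training, integer -> name for reading predictions
name_to_id = {name: i for i, name in enumerate(class_names)}
id_to_name = {i: name for name, i in name_to_id.items()}

labels = ["negative", "positive", "negative"]
encoded = [name_to_id[name] for name in labels]
print(encoded)
print([id_to_name[i] for i in encoded])
```

The order of `class_names` is arbitrary; it only has to stay the same between training and prediction.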

slate orchid Apr 15, 2019, 11:47 PM

#

okay then i'm gonna need to find a solid answer on the correct order of the hogwarts houses

#

i've got a weird project on

void star Apr 16, 2019, 1:08 AM

#

Should I get started with tf, keras tf, keras, tf2 ? Which is best? Just trying to learn so I have no specific objective. Just trying to start from what's more logical

#

Mention me if u answer pls

sand lark Apr 16, 2019, 4:48 AM

#

@void star I'd learn tf and/or pytorch. Pytorch is nicer in my opinion but I kind of do both since for some jobs they want tf and for others they want pytorch. They all do the same thing

hasty maple Apr 16, 2019, 10:21 AM

#

https://www.reddit.com/r/datascience/wiki/frequently-asked-questions

reddit: the front page of the internet

r/datascience: A place for data science practitioners and professionals to discuss and debate data science career questions.

hardy crag Apr 16, 2019, 10:23 AM

#

if you do tf, do tf2

hasty maple Apr 16, 2019, 10:28 AM

#

@lyric canopy could you go through the above sub reddit and see if it's pin worthy for this channel

hardy crag Apr 16, 2019, 10:29 AM

#

I agree it is a good place to start

slate orchid Apr 16, 2019, 10:46 AM

#

hey, if i want to make text training data for keras, can i go with a csv file formatted like

#

"text stuff text stuff text stuff", 0
"more text things, other text things", 1

hardy crag Apr 16, 2019, 10:47 AM

#

sure

#

is your dataset too big to keep in memory?

slate orchid Apr 16, 2019, 10:48 AM

#

i'm making the dataset out of some survey stuff

#

so i'm getting other people to classify stuff for me

#

i need a way for them to send stuff back to me

#

it's not gonna be that big

hardy crag Apr 16, 2019, 10:49 AM

#

okay. So you're probably gonna load it from that csv anyway before using it, which means the format is whatever you want it to be and you just parse it accordingly
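
Parsing a file in the format proposed above takes only the standard `csv` module. A minimal sketch; the in-memory string stands in for a real file, and the file name in the comment is hypothetical:

```python
import csv
import io

# Stand-in for open("data.csv", newline=""); the file name is hypothetical
raw = io.StringIO(
    '"text stuff text stuff text stuff", 0\n'
    '"more text things, other text things", 1\n'
)

texts, labels = [], []
# skipinitialspace drops the space that follows each comma
for text, label in csv.reader(raw, skipinitialspace=True):
    texts.append(text)
    labels.append(int(label))

print(texts)
print(labels)
```

Quoting the text column matters here, since the second row contains a comma inside the text itself.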

slate orchid Apr 16, 2019, 10:51 AM

#

nice

#

thank you!

#

oh, one more thing

#

how important is it for the dataset to be balanced?

#

so, i have four classifications

#

how important is it that i get a roughly 1:1:1:1 dataset

hardy crag Apr 16, 2019, 10:54 AM

#

well, it would be easier but there are ways to deal with class imbalance

#

you could modify your loss function to account for class frequency, or sample your batches so that they contain roughly equal parts of each class even though the actual dataset does not,

#

or you could just randomly select a subset of each class for your training set
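
The first option, weighting the loss by class frequency, usually starts from per-class weights. One common heuristic is `n_samples / (n_classes * class_count)`, sketched here in plain Python; the label list is made up and imbalanced on purpose.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]  # made-up, imbalanced on purpose
print(balanced_class_weights(labels))
```

Keras's `fit` accepts a dictionary like this through its `class_weight` argument, so rare classes contribute more to the loss per sample.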

#

Not sure what the goto way is in NLP

slate orchid Apr 16, 2019, 10:57 AM

#

don't know anything about loss functions yet haha

#

anyway, thank you so much

hardy crag Apr 16, 2019, 10:58 AM

#

Uh there is another one (which is fun, but probably not viable): You could try to generate synthetic samples of your smaller classes :p

#

yw. This is what this discord is for 😃

#

(or maybe you can rethink your class definitions in a way that make them more balanced)

slate orchid Apr 16, 2019, 11:01 AM

#

uhhh

#

my class definitions are a little...

#

fixed

#

(i'm trying to make something that classifies by hogwarts house)

#

(don't ask)

hardy crag Apr 16, 2019, 11:03 AM

#

well, I'm not in the replace-talking-hats business but I reckon the houses should be pretty equally sized no?

slate orchid Apr 16, 2019, 11:04 AM

#

depends on how my friends decide to classify stuff

#

i'll tell them to try to keep things roughly equal-ish

hardy crag Apr 16, 2019, 11:07 AM

#

a little bit of imbalance is fine I reckon. I'd just test it and you will probably see if your network tends to ignore one class or prefer one

slate orchid Apr 16, 2019, 11:08 AM

#

sure, that works

#

thank you!

lapis sequoia Apr 16, 2019, 12:00 PM

#

is anyone familiar with tensorflow

#

can you tell me about tf graph and tf session real quick

#

not sure if I need to initialize a new table for every session

#

and whether I can have my for loop of items outside tf session or inside

void star Apr 16, 2019, 12:04 PM

#

@sand lark great thank you.

hardy crag Apr 16, 2019, 12:14 PM

#

@lapis sequoia maybe try tf2. eager execution ftw

lapis sequoia Apr 16, 2019, 12:14 PM

#

I havent learnt it yet

hardy crag Apr 16, 2019, 12:15 PM

#

it's very similar, but imho much easier to grasp

#

you don't need to build a graph anymore with said eager execution

#

also it has built in keras for "common" architectures

lapis sequoia Apr 16, 2019, 12:18 PM

#

can you share some example code

hardy crag Apr 16, 2019, 12:19 PM

#

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/advanced.ipynb

Google Colaboratory

#

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/beginner.ipynb

Google Colaboratory

lapis sequoia Apr 16, 2019, 12:20 PM

#

im doing something like this

#

what do you think

#

for query in some_queries:
  with tf.Graph().as_default():
    with tf.Session() as sess:

hardy crag Apr 16, 2019, 12:20 PM

#

does your for loop need to be outside of the session?

lapis sequoia Apr 16, 2019, 12:20 PM

#

I tried keeping it inside the session, but got an error saying my table was already initialized

hardy crag Apr 16, 2019, 12:21 PM

#

are you only doing inference?

lapis sequoia Apr 16, 2019, 12:21 PM

#

it needs a new table initialized for every query in the for loop..is there a different way

#

Im getting embeddings.. and comparing them to other embeddings

#

this is my embd function

hardy crag Apr 16, 2019, 12:22 PM

#

hmm. gotta be honest, never run into that case. From the docs I guess that you need to initialize the tables for the graph once

lapis sequoia Apr 16, 2019, 12:22 PM

#


def embd(inputs, module, placeholder, sess):
  embeddings_tensor = module(placeholder)
  
  sess.run(tf.global_variables_initializer())
  sess.run(tf.tables_initializer())
  message_embeddings = sess.run(embeddings_tensor,
                                feed_dict={placeholder: inputs})
    
  return message_embeddings

#

so I run this function for every query.. in a new session

hardy crag Apr 16, 2019, 12:23 PM

#

so you would need to maybe put the for loop inside the graph context and call the session inside the loop?

lapis sequoia Apr 16, 2019, 12:23 PM

#

I timed that.. it took longer

#

than If I kept it outside the graph

#

like 9 seconds longer

#

I cant afford this.. it takes too long..just trying to get it done faster..

#

like 2 queries takes 31 seconds

#

that's days.. if I want to run thousands of queries

hardy crag Apr 16, 2019, 12:27 PM

#

how many different graphs do you need to loop over?

lapis sequoia Apr 16, 2019, 12:27 PM

#

what does that mean

#

I think it's the same graph

#

im not really sure about graph and sessions..still new

hardy crag Apr 16, 2019, 12:28 PM

#

okay different approach :p

#

I'm guessing by embedding you mean you have a vector that contains some kind of information?

#

in your embd() function, inputs is a datapoint, module is a neural net?

lapis sequoia Apr 16, 2019, 12:30 PM

#

yes.. its an encoder..

hardy crag Apr 16, 2019, 12:32 PM

#

and you want to compare different encodings?

lapis sequoia Apr 16, 2019, 12:32 PM

#

I have a set of embeddings which I want to compare with the embedding gotten from the queries list

#

I just changed my code

#

now it runs at half time

#

woohoo

hardy crag Apr 16, 2019, 12:33 PM

#

👍

lapis sequoia Apr 16, 2019, 12:33 PM

#

Necessity is the mother of invention.. saving my ass

#

thanks man

#

it really helped talking it out with you

hardy crag Apr 16, 2019, 12:34 PM

#

happy to be the rubber duck 😃

lapis sequoia Apr 16, 2019, 12:39 PM

#

what I did was this

#

Initialized graph and session..

#

loading the module..

#

placeholders..

#

then sess run.. initialized variables and tables..

#

then in loop, I called for embeddings for each query

hardy crag Apr 16, 2019, 12:41 PM

#

sounds like the way to go

#

doing the initializing beforehand should save loads of time

lapis sequoia Apr 16, 2019, 12:46 PM

#

yeah before I was doing all of this with a function call for each query..

#

which was a waste of time

midnight atlas Apr 16, 2019, 2:42 PM

#

anyone familiar with using cloud computing to train neural nets using a python script? pretty confused between several options

chilly shuttle Apr 16, 2019, 3:43 PM

#

@midnight atlas that's a pretty nebulous and broad question, wanna narrow it down?

midnight atlas Apr 16, 2019, 4:05 PM

#

Well essentially it's two parts, which platform is most intuitive to setup and then how can I do so?

#

I want to run a training script that runs over a 50gb database

devout ridge Apr 16, 2019, 6:29 PM

#

I am using MultiLabelBinarizer() with np.array(), and I got this error: TypeError: 'numpy.float64' object is not iterable

#

📎 python_error.png

#

Then I got TypeError: 'numpy.int64' object is not iterable

#

Anyone?

lyric canopy Apr 16, 2019, 6:33 PM

#

Which line gives you that error?

#

You should be able to find which variable is assigned to a single number instead of an iterable

#

Trace it back to the source and then try to understand why it's not what you think it is

misty imp Apr 16, 2019, 6:35 PM

#

def:

#

sorry, was a test.

devout ridge Apr 16, 2019, 6:36 PM

#

@lyric canopy new_array_of_labels = mlb.fit_transform(array_of_labels)

lyric canopy Apr 16, 2019, 6:36 PM

#

So, array_of_labels probably isn't what you think it is

#

Did you try printing it just before this line?

#

Oh

#

It should be an iterable of iterables, not an array of single numbers

#

According to the docs

devout ridge Apr 16, 2019, 6:38 PM

#

this is array_of_labels - [0 0 0 ... 5 5 5]

lyric canopy Apr 16, 2019, 6:38 PM

#

Ah, yes, so the elements are single numbers

#

But it actually wants an array of iterables (other arrays, tuples, ...)

#

See https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer.fit_transform

devout ridge Apr 16, 2019, 6:38 PM

#

I see

#

thank you very much

misty imp Apr 16, 2019, 6:42 PM

#

"""def:

devout ridge Apr 16, 2019, 7:24 PM

#

I tried to run this: H = model.fit_generator(aug.flow(trainX, trainY, batch_size=BS),validation_data=(testX, testY),steps_per_epoch=len(trainX),epochs=EPOCHS, verbose=1)

#

And I ended up with this at the end: ValueError: Error when checking target: expected activation_7 to have shape (6,) but got array with shape (1,)

#

The data has 6 classes.

void anvil Apr 16, 2019, 7:38 PM

#

@midnight atlas you can just run it off a single AWS instance

devout ridge Apr 16, 2019, 7:39 PM

#

Hello?

void anvil Apr 16, 2019, 7:40 PM

#

sorry it's the p2.8xlarge or p2.16xlarge

#

you can probably get away with g3 instances as well

hoary terrace Apr 16, 2019, 7:44 PM

#

how would you guys describe the difference between a classifier and regression model in simple terms?

devout ridge Apr 16, 2019, 7:54 PM

#

Hello? I need help.

heady bone Apr 16, 2019, 8:04 PM

#

What are your layers like?

devout ridge Apr 16, 2019, 8:06 PM

#

📎 layers.png

heady bone Apr 16, 2019, 8:09 PM

#

oh wait

#

is your generator giving only a 1d array?

#

it should look like this [[0, 1, 0, 0, 0, 0]]

#

instead of [0, 1, 0, 0, 0, 0]

#

hold on, i just woke up, brb 😛

heady bone Apr 16, 2019, 8:44 PM

#

Okay, another problem might be your training outputs might just be a number. Have you converted them into onehot labels? Your model expects something like this [1, 0, 0, 0, 0, 0] for trainY.
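
A rough sketch of the one-hot conversion suggested here, using plain NumPy. The label values and the name `train_y` are hypothetical stand-ins for `trainY`; the class count of 6 is taken from the thread.

```python
import numpy as np

NUM_CLASSES = 6  # the thread states the data has 6 classes

# Hypothetical integer labels standing in for trainY
train_y = np.array([0, 1, 5, 3])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array yields one one-hot row per sample
train_y_onehot = np.eye(NUM_CLASSES, dtype="float32")[train_y]

print(train_y_onehot.shape)  # (number of samples, NUM_CLASSES)
print(train_y_onehot[1])     # one-hot row for class 1
```

Keras also ships `keras.utils.to_categorical`, which performs the same conversion. Compiling the model with the `sparse_categorical_crossentropy` loss is another route that keeps the integer labels as they are.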

devout ridge Apr 16, 2019, 8:45 PM

#

I will check it out.

reef bone Apr 16, 2019, 9:52 PM

#

@hoary terrace Classification produces discrete values, regression produces continuous values. If you want to determine whether an image shows a cat or not, and nothing in between, those are 2 discrete classes with no overlap (nocat/cat, or 0/1). If you instead wish to predict how much something looks like a cat, and therefore get an output anywhere between 0 (absolutely not a cat) and 1 (absolutely a cat), for example 0.75 for "wow that looks a lot like a cat, but not entirely", that would be regression. Naturally, the techniques that you would use for either can overlap, for example classification with neural networks - imagine you're trying to decide which of 10 discrete animal classes is shown in a picture - is it a cat, dog, or armadillo? The neural network would likely end up having 10 output neurons where each represents one of your classes. After the input is propagated through the network, each of the neurons has a certain value - we can understand each neuron's output to be the probability that the image shows its designated animal. We would use a softmax activation to get a normalized probability distribution, and then simply choose the most probable animal as the output. If the input is a cat, the cat's neuron would maybe have a value of 0.4, a dog would have 0.3 because it looks similar, but the armadillo would only get 0.05 because it looks nothing alike. However, we are still performing classification because the output is a discrete class - a cat.
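
The softmax-then-pick-the-largest step can be sketched roughly as follows. Only three of the classes are shown and the raw output values are invented for illustration; a real network would produce them.

```python
import numpy as np

def softmax(logits):
    """Turn raw output values into a probability distribution."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

classes = ["cat", "dog", "armadillo"]
logits = np.array([2.0, 1.5, -0.5])  # made-up raw outputs of the last layer

probs = softmax(logits)                      # continuous values that sum to 1
predicted = classes[int(np.argmax(probs))]   # discrete class, so classification

print(dict(zip(classes, probs.round(3))))
print(predicted)
```

The probabilities are continuous, but the final answer is one label out of a fixed set, which is what makes this classification rather than regression.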

lapis sequoia Apr 16, 2019, 11:19 PM

#

yo anyone know what be going wrong with this? https://pastebin.com/arjx0LLQ

Pastebin

[Python] bob - Pastebin.com

#

📎 unknown.png

#

just seems to be oscillating around 50/50

midnight atlas Apr 16, 2019, 11:34 PM

#

@void anvil how can I upload a large dataset to a VM? struggling with this part as SFTP type solutions can't handle the 50 gb easily

lapis sequoia Apr 17, 2019, 2:00 AM

#

I have a pandas column that is a list of lists

#

how do I split that column into separate columns for each item

lapis sequoia Apr 17, 2019, 5:13 AM

#

I didn't take a direct approach, this is what I did:

#

1. make dataframe using dict, where values are list of lists
2. assigning column names
3. create separate concatenated df where I do a .apply(pd.Series) on the columns containing lists
4. reset index (because up until now, the dict key is the index)
5. Assign column names to the new df..

chilly shuttle Apr 17, 2019, 5:47 AM

#

check out unstack

#

oh hang on, you mean a column that contains lists

#

not a grouped column

lapis sequoia Apr 17, 2019, 5:49 AM

#

yeah

chilly shuttle Apr 17, 2019, 5:49 AM

#

https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns

Stack Overflow

Pandas split column of lists into multiple columns

I have a pandas dataFrame with one column that looks like the following:

`
In [207]:df2.teams
Out[207]:
0 [SF, NYG]
1 [SF, NYG]
2 [SF, NYG]
3 [SF, NYG]
4 [...

lapis sequoia Apr 17, 2019, 5:50 AM

#

yeah so this works.. but requires that the split column be assigned to a new dataframe

#

then I have to concat..

#

similar to what I'm doing with apply and pd.Series
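
One possible way to keep this to a single chained expression, assuming every list in the column has the same length. The frame and the column names `game`, `team1`, and `team2` are made up; `teams` mirrors the linked Stack Overflow question.

```python
import pandas as pd

# Hypothetical frame with a column whose cells are equal-length lists
df = pd.DataFrame({
    "game": [1, 2, 3],
    "teams": [["SF", "NYG"], ["SF", "NYG"], ["SF", "NYG"]],
})

# Build the expanded columns on the same index, then join them back
expanded = pd.DataFrame(
    df["teams"].tolist(), index=df.index, columns=["team1", "team2"]
)
df = df.drop(columns="teams").join(expanded)

print(df)
```

Building the frame from `.tolist()` is usually much faster than `.apply(pd.Series)` on larger data, and `join` aligns on the index so no explicit `concat` is needed.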

lapis sequoia Apr 17, 2019, 9:37 AM

#

I want to filter my dataframe, based on values in a column.. whether they contain a certain text or not

#

I can do conditionals for whole match.. but not sure how to do it for contains

#

I did str.contains.. and it seems to work..

#

but not sure if it's right..

#

hmmm

chilly shuttle Apr 17, 2019, 10:05 AM

#

str.contains is the best way to do it if it's sufficient for what you want

#

otherwise you need to map
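
A small sketch of both options mentioned here. The column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical data, including a missing value
df = pd.DataFrame({"name": ["red apple", "green pear", None, "Apple pie"]})

# str.contains returns a boolean mask; na=False treats missing values as
# "no match", regex=False does a plain substring search
mask = df["name"].str.contains("apple", case=False, na=False, regex=False)
filtered = df[mask]

# For conditions str.contains cannot express, map a custom predicate
custom_mask = df["name"].map(
    lambda s: isinstance(s, str) and s.lower().endswith("apple")
)
custom_filtered = df[custom_mask]

print(filtered)
print(custom_filtered)
```

Without `na=False`, missing values produce `NaN` in the mask, which cannot be used for boolean indexing.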

lapis sequoia Apr 18, 2019, 9:14 AM

#

is anyone alive

#

I want to left join two dataframes..

#

do I mention which columns.. in left_on and right_on.. what if I needed one more column from the right

#

I got it..

#

I just had to mention the columns to join on..

#

didn't need to mention the others I wanted to add.. it did it anyway
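
The behaviour described above can be sketched like this. Both frames and all column names are invented for illustration.

```python
import pandas as pd

# Hypothetical left and right frames with differently named key columns
orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": ["a", "b", "z"]})
customers = pd.DataFrame({
    "customer_code": ["a", "b"],
    "city": ["Oslo", "Lima"],
    "tier": ["gold", "basic"],
})

# Only the key columns are named; every other right-hand column comes along
merged = orders.merge(
    customers, how="left", left_on="cust", right_on="customer_code"
)

# To bring over only some right-hand columns, subset before merging
merged_city_only = orders.merge(
    customers[["customer_code", "city"]],
    how="left", left_on="cust", right_on="customer_code",
)

print(merged)
```

Rows on the left with no match on the right (customer "z" here) are kept, with `NaN` in the right-hand columns.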

#

how do I convert a header to the first row of dataframe?

supple ferry Apr 18, 2019, 9:47 AM

#

header to a first row ?

#

when reading?

lapis sequoia Apr 18, 2019, 9:53 AM

#

not when reading

#

like after assigning column names

#

adding the same header as row

supple ferry Apr 18, 2019, 9:54 AM

#

didn't understand it. Can you give a small example?

lapis sequoia Apr 18, 2019, 10:08 AM

#

consider you have a header for a dataframe..

#

which has column names A, B

#

you want the first row of the dataframe to be A, B as well

#

and move the rest of the dataframe contents one cell down.. to accommodate this

supple ferry Apr 18, 2019, 10:11 AM

#

You can use pd.DataFrame.shift with the optional fill_value argument and give it your values
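
Note that shifting a frame down by one row pushes its last row out. As an alternative sketch (a different technique from `shift`), the header can be prepended as a row with `pd.concat`. The example frame is made up and uses the column names A and B from the question.

```python
import pandas as pd

# Hypothetical frame with the column names from the question
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Build a one-row frame holding the column names, then stack it on top
header_row = pd.DataFrame([df.columns.tolist()], columns=df.columns)
with_header = pd.concat([header_row, df], ignore_index=True)

print(with_header)
```

Because the new first row holds strings, the numeric columns become `object` dtype after this step.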

lapis sequoia Apr 18, 2019, 10:13 AM

#

thanks