#How to get the prediction for flight delay and its reason?


cobalt sonnet
#

I'm currently building a prediction application with Streamlit. The idea is to use model 1 to predict whether a flight will be delayed and, if it is predicted to be delayed, to use model 2 to predict the cause of the delay. The data comes from a database I created using pgAdmin 4.

The app is almost finished, but the prediction isn't working yet: for any new set of inputs, it always predicts that the flight will not be delayed and will run as scheduled. Even when I enter inputs that should be predicted as delayed, it still predicts no delay. I'm still trying to figure out the cause of the problem.

bleak iglooBOT
#

@cobalt sonnet

f.syaheerah Uploaded Some Code

this is the function to predict the user inputs, predict_flight_delay()

Uploaded these files to a Gist
stoic hinge
#

How are you making the predictions?

cobalt sonnet
bleak iglooBOT
#

@cobalt sonnet

f.syaheerah Uploaded Some Code

this is how the complete code looks like for the predictions

Uploaded these files to a Gist
stoic hinge
#

That's a lot of code that I won't be able to read through

cobalt sonnet
#

the prediction is made from the user input, i.e. the set of input features entered on the page

stoic hinge
#

What is your label class distribution for your training/validation/testing data?

cobalt sonnet
#

the label is y['Delayed'], and the classes are 0 (no delay) and 1 (delayed)

#

the class distribution from df['Delayed'].value_counts() is:
1 2245
0 2227
Name: Delayed, dtype: int64

#

which is fairly balanced, hence why I'm confused that the prediction is always no delay

stoic hinge
#

That is interesting

#

Is there any indication you found of underfitting or overfitting?
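
A quick way to check, as a rough sketch rather than your actual setup: compare training and test accuracy, and count how often each class appears in the labels and in the predictions. The synthetic data and the `DecisionTreeClassifier` below are assumptions, since the real features and model aren't shown here.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flight data (hypothetical): 4 numeric features
# and a roughly balanced binary label, similar in spirit to df['Delayed'].
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
noise = rng.normal(scale=0.5, size=1000)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Stratify so each split keeps the same class balance as the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed model; swap in whatever model 1 really is.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

test_pred = model.predict(X_test)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, test_pred)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Counts per class [n_class_0, n_class_1] for labels and for predictions.
train_counts = np.bincount(y_train, minlength=2)
test_counts = np.bincount(y_test, minlength=2)
pred_counts = np.bincount(test_pred, minlength=2)
print("train labels:", train_counts)
print("test labels:", test_counts)
print("test predictions:", pred_counts)
```

A training accuracy far above the test accuracy points towards overfitting, and both being low points towards underfitting. If the test predictions contain both classes but the app still always says "no delay", the problem is more likely in how the app builds its input than in the model itself.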

cobalt sonnet
#

so far I haven't found any sign of over/underfitting. I'm afraid the function itself may be what is causing the problem

#

I'm still not confident about this part of the code:
input_df = pd.DataFrame({
    'Day': [int(day)],
    'Weekday': [weekday_num],
    'AirlineName': [airline_name],
    'DepartureCityName': [departurecity],
    'DepTimeCategory': [deptimecategory],
    'DestCityName': [destinationcity],
    'ArrTimeCategory': [arrrimecategory],
    'CarrierDelay': [0],
    'WeatherDelay': [0],
    'NASDelay': [0],
    'SecurityDelay': [0],
    'LateAircraftDelay': [0]
})

this is the input DataFrame holding the feature values the user entered, which model 1 uses to predict whether a flight will be delayed. Since I also want to predict the reason for the delay, model 2 needs 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay' and 'LateAircraftDelay', but I'm not sure what values to put for them. Should I set them to [0]?

stoic hinge
#

Again, this seems like quite a big project that I don't really have the time to unravel and understand completely. If you're able to help isolate or focus on specific parts of the project then I may be able to help
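
One way to isolate the `input_df` question is a small, self-contained version of the two-stage flow. This is only a sketch under assumptions: the data is synthetic, the `DecisionTreeClassifier`, `make_model` and `predict_flight` are hypothetical stand-ins, and only a few of the original columns are used. The idea it illustrates is that the five delay columns are only known after a flight has happened, so neither model takes them as input; they are used to derive the target for model 2 instead. If model 1 was trained with those columns as features, feeding zeros at prediction time could look like an on-time flight, which is one possible explanation for the constant "no delay" result, though nothing shown here confirms it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

DELAY_COLS = ["CarrierDelay", "WeatherDelay", "NASDelay",
              "SecurityDelay", "LateAircraftDelay"]
CAT_COLS = ["AirlineName", "DepTimeCategory"]
FEATURES = ["Day", "Weekday"] + CAT_COLS  # known before departure

# Hypothetical synthetic data; real rows would come from the database.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Day": rng.integers(1, 29, n),
    "Weekday": rng.integers(0, 7, n),
    "AirlineName": rng.choice(["A", "B", "C"], n),
    "DepTimeCategory": rng.choice(["Morning", "Evening"], n),
})
# Made-up rules so the example has something to learn.
df["Delayed"] = (df["DepTimeCategory"] == "Evening").astype(int)
for col in DELAY_COLS:
    df[col] = 0
late_mask = df["Delayed"] == 1
df.loc[late_mask & (df["AirlineName"] == "A"), "CarrierDelay"] = 30
df.loc[late_mask & (df["AirlineName"] != "A"), "WeatherDelay"] = 45


def make_model():
    """One-hot encode the categorical columns, pass the rest through."""
    encode = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), CAT_COLS)],
        remainder="passthrough",
    )
    return make_pipeline(encode, DecisionTreeClassifier(random_state=0))


# Model 1: delayed or not, trained on pre-departure features only.
model_1 = make_model().fit(df[FEATURES], df["Delayed"])

# Model 2: trained on delayed flights only; the target is the delay
# column with the largest value in each row.
late = df[late_mask]
reason = late[DELAY_COLS].idxmax(axis=1)
model_2 = make_model().fit(late[FEATURES], reason)


def predict_flight(input_df):
    """Return (delayed, reason or None) for a one-row DataFrame."""
    if model_1.predict(input_df[FEATURES])[0] == 0:
        return False, None
    return True, model_2.predict(input_df[FEATURES])[0]


new_flight = pd.DataFrame({
    "Day": [12], "Weekday": [4],
    "AirlineName": ["A"], "DepTimeCategory": ["Evening"],
})
print(predict_flight(new_flight))
```

With this layout the input DataFrame never needs placeholder values for the delay columns, because they are not features of either model.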