A. Dataset:

We use two datasets: Singapore Changi Airport passenger volume and GDP.

Our goal is to forecast monthly passenger traffic and aircraft arrivals and departures at Singapore Changi Airport, and to see how GDP correlates with the number of passengers at the airport.

Link to original dataset: Civil Aircraft Arrivals, Departures, Passengers And Mail, Changi Airport, Monthly-Data.gov.sg

Raw Data:

Singapore Changi Airport Dataset

originalpassenger.csv

GDP dataset

originalgdp.csv

The Changi Airport dataset has 975 rows and 3 columns.

It contains the total number of aircraft arrivals and departures and the total number of passengers for each month from January 1980 to July 2020.

However, each month appears twice, because the second column holds both measures: one row for aircraft arrivals and departures and one row for passengers.

→ Therefore, we need to extract "total passengers" into a new column.

The GDP dataset shows the GDP of different countries from 1960 to 2020.

But we only need Singapore's GDP, so we will extract that data into a new CSV file.

Processed Data:

Passengers.csv

As mentioned above, we extracted each month's passenger count into a new column. We did this with an Excel formula that copies the value only for total-passenger rows and leaves an NA value in rows that list total aircraft arrivals and departures. We then used R to remove the middle columns and the NA values.
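The same extraction can also be scripted instead of done by hand. The following is a minimal Python sketch, not the Excel and R workflow used in this report. It assumes a long layout with one row per month and measure; the column names `month`, `level_1`, and `value`, the label strings, and the numbers are all hypothetical placeholders.

```python
import csv
import io

# Hypothetical excerpt: one row per (month, measure) pair.
# Column names, labels, and values are made up for illustration only.
RAW = """month,level_1,value
1980-01,Total Passengers,100
1980-01,Total Aircraft Arrivals And Departures,10
1980-02,Total Passengers,120
1980-02,Total Aircraft Arrivals And Departures,12
"""

def extract_passengers(csv_text, measure="Total Passengers"):
    """Keep only the rows for one measure and return (month, value) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["month"], int(row["value"]))
        for row in reader
        if row["level_1"] == measure
    ]

passengers = extract_passengers(RAW)
print(passengers)
```

Filtering on the label column leaves exactly one row per month, so no NA placeholders are created and nothing needs to be removed afterwards.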


The second dataset that we are analyzing is Singapore GDP.

GDP.csv

As mentioned above, we only needed Singapore's GDP, so we copied the Singapore data into a new Excel sheet, dropping every other country's data, and saved it as "GDP.csv".

It has 62 rows, showing GDP from 1960 to 2020.

We also transposed the rows and columns, which makes the data easier to load into R for analysis.
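The selection and transpose steps can be sketched in a few lines of Python. This is only an illustration, not the Excel procedure described above. It assumes a wide layout with one row per country and one column per year; the `Country Name` column and all values are hypothetical.

```python
import csv
import io

# Hypothetical wide layout: one row per country, one column per year.
# The column name and the numbers are invented for this example.
WIDE = """Country Name,1960,1961,1962
Malaysia,1.0,1.1,1.2
Singapore,2.0,2.5,3.0
"""

def country_series(csv_text, country="Singapore", name_col="Country Name"):
    """Return one country's row as a list of (year, value) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row[name_col] == country:
            # Turn the year columns into rows, skipping empty cells.
            return [
                (int(year), float(value))
                for year, value in row.items()
                if year != name_col and value != ""
            ]
    raise ValueError(f"{country!r} not found")

gdp = country_series(WIDE)
print(gdp)
```

The result has one (year, value) pair per row, which is the shape a time-series function expects.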

B. Passenger Dataset:

The first thing we did after loading the data into R was convert it to a time series. This let us plot the data and check for seasonality, trends, and outliers. We found that the data has both seasonality and an overall upward trend. We also found a large dip in the number of airport passengers around 2005 and another in 2020, the latter caused by COVID-19.

After converting the data into a time series, we fitted several models: ANN, MNN, AAN, MAN, MMN, AAA, MAA, MAM, and MMM. Each three-letter code gives the error, trend, and seasonal components, where A is additive, M is multiplicative, and N is none. Comparing the models on AIC, the best was AAA, or Holt-Winters additive. Selecting on MAE instead, the best was Holt-Winters multiplicative. Both models performed well, but the Holt-Winters multiplicative model performed better than the additive one. The chart below plots both forecasts.

[Figure: Holt-Winters additive and multiplicative forecasts (Screen Shot 2021-11-27 at 3.42.55 PM.png)]
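To show how the additive and multiplicative Holt-Winters variants differ, here is a minimal pure-Python sketch of the standard update equations. It is not the R code used for the forecasts above. The smoothing parameters `alpha`, `beta`, and `gamma` are fixed, assumed values rather than fitted ones, the initial state is a simple estimate from the first two seasons, and the input series is a toy example.

```python
def holt_winters(y, m, horizon, alpha=0.3, beta=0.1, gamma=0.1,
                 multiplicative=False):
    """Forecast `horizon` steps ahead with Holt-Winters smoothing.

    y: observations, m: season length (12 for monthly data).
    The smoothing parameters are assumed defaults, not fitted values.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")

    # Simple initial state from the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    if multiplicative:
        season = [v / first for v in y[:m]]   # seasonal ratios
    else:
        season = [v - first for v in y[:m]]   # seasonal offsets

    for t in range(m, len(y)):
        s_old = season[t % m]
        base = level + trend  # expected level before seeing y[t]
        if multiplicative:
            new_level = alpha * (y[t] / s_old) + (1 - alpha) * base
            season[t % m] = gamma * (y[t] / base) + (1 - gamma) * s_old
        else:
            new_level = alpha * (y[t] - s_old) + (1 - alpha) * base
            season[t % m] = gamma * (y[t] - base) + (1 - gamma) * s_old
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level

    n = len(y)
    forecasts = []
    for h in range(1, horizon + 1):
        s = season[(n + h - 1) % m]
        trend_part = level + h * trend
        forecasts.append(trend_part * s if multiplicative else trend_part + s)
    return forecasts

# Toy series: three identical seasons of length 4, no trend.
toy = [10, 20, 30, 40] * 3
additive = holt_winters(toy, m=4, horizon=4)
multiplicative = holt_winters(toy, m=4, horizon=4, multiplicative=True)
print(additive)
print(multiplicative)
```

The only difference between the two variants is whether the seasonal component is added to or multiplied by the level and trend. The multiplicative form lets seasonal swings grow as the overall level grows.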

After creating these forecasts, we ran an error analysis on both. The MAE supports the conclusion that Holt-Winters multiplicative performed better: the additive model had an MAE of 214440.1 and the multiplicative model an MAE of 179400.6. The RMSE agrees: the multiplicative model scored 423246.3, compared with 480556.3 for the additive model.
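For reference, both error measures are simple to compute by hand. The sketch below is a small Python illustration of the definitions, not the R code that produced the figures above; the `actual` and `forecast` values are invented toy numbers.

```python
import math

def mae(actual, forecast):
    """Mean absolute error: the average size of the forecast errors."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error: like MAE, but large errors weigh more."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))

# Toy values for illustration only.
actual = [100, 200, 300]
forecast = [110, 190, 330]
toy_mae = mae(actual, forecast)
toy_rmse = rmse(actual, forecast)
print(toy_mae, toy_rmse)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same forecasts, which is consistent with the RMSE values above being larger than the MAE values for both models.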