By Nael Shiab, data journalist for CBC/Radio-Canada.
Contact: nael.shiab@radio-canada.ca (PGP key)
The story based on this analysis was published on May 13, 2021. Click here to read it.
A French version of this analysis is also available.
With planes moving all around the planet, the SARS-CoV-2 virus travelled quickly. The first Canadian case of COVID-19 was reported on Jan. 25, 2020. It was a Toronto man who had recently travelled to Wuhan, China.
On March 13, 2020, the federal government asked Canadians to cancel or postpone any non-essential travel plans. It wasn’t until Feb. 22, 2021, that international travellers had to take a COVID-19 molecular test when they arrived in Canada. Since then, hundreds of tests have come back positive. Meanwhile, domestic travellers can fly across the country without tests or quarantine.
I obtained detailed data for all planes that landed in Canada between Jan. 1, 2019, and April 30, 2021. The dataset was provided by Flightradar24 and contains more than 3.6 million rows and 20 columns.
In the dataset, we have the airport of origin and the airport of destination. To retrieve the latitude and longitude of the airports, I wrote a Node.js script that connected to Flightradar24’s website.
Under the Flightradar24 terms of use, I am not allowed to publish the raw data, but only data visualizations or aggregated data. For that reason, I am publishing my code here to be fully transparent in my journalistic process, but I don’t give access to the data I used.
In this analysis, we focus only on commercial flights landing in Canada. We don’t cover the planes leaving the country.
The coverage of the data is close to 100 per cent in Canada for commercial flights, according to Flightradar24. However, we only have the last place where a plane took off before landing in Canada. It’s possible that the plane made previous stops or that travelers took several flights.
We excluded airlines specialized in cargo flights and Air Canada’s flights reserved for goods transport. However, it’s possible that some flights of this kind escaped our filters.
We want to answer these questions:
Q1: How many planes usually land in Canada and how did the pandemic change this trend? How large is the difference between international air traffic and domestic air traffic?
Q2: What did domestic air traffic look like during the pandemic? During the peak of each province’s waves, how many planes took off from there and landed elsewhere in the country?
While much attention is given to international air traffic, 85 per cent of air traffic in Canada has been domestic during the pandemic. The volume of these flights increased during the summer and winter of 2020. In April and May 2020, the number of domestic flights was a fifth of what it was in the same months of 2019. In December 2020, when Canada was fighting its second wave, the number of flights was back to almost half the level of December 2019.
Between April 1, 2020, and April 30, 2021, 185,190 planes took off from a Canadian city to travel to another Canadian city. That’s an average of 469 planes every day. We found 2,865 unique connections between Canadian airports during the pandemic.
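The daily average quoted above is simple arithmetic. A minimal sketch in base R, assuming both the first and the last day of the period are counted:

```r
# Number of days from April 1, 2020 to April 30, 2021, counting both endpoints
n_days <- as.integer(as.Date("2021-04-30") - as.Date("2020-04-01")) + 1

# Domestic flights reported above for that period
n_flights <- 185190

# Average number of domestic flights per day
avg_per_day <- n_flights / n_days
round(avg_per_day)
```

With 395 days in the period, the result rounds to the 469 planes per day mentioned in the text.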
According to data released by airports, between April 1, 2020, and February 28, 2021, more than 8 million passengers in total boarded or deplaned in Montreal, Toronto, Calgary, and Vancouver.
For these travellers, no quarantine or mandatory tests were required.
And it’s not because they don’t get sick. From April 26 to May 8, 2021, an average of 17 flights per day carried at least one passenger who later tested positive for COVID and travelled during a period when they may have been infectious, according to a “non-exhaustive” list from the federal government.
In April 2021, the pandemic was raging in Alberta, with a record average of 1,413 new cases per day (32 per 100,000 ppl.). But it didn’t stop 49 planes on average from taking off from the province every day. During this month, 331 planes left for Vancouver, 305 for Toronto, 177 for Kelowna, 88 for Montreal, 65 for Saskatoon, and 61 for Winnipeg.
In BC, there was an average of 981 new cases per day in April 2021 (19 per 100,000 ppl.). During the same month, a daily average of 43 flights took off from the province for elsewhere in Canada. From April 1, 2021, to April 30, 2021, 616 planes left BC to land in Calgary, 211 in Edmonton, 181 in Toronto, and 85 in Montreal.
In Ontario, the average number of daily new cases was at a record high of 3,782 in April 2021 (26 per 100,000 ppl.), while the number of flights leaving the province was on average 39 daily. From April 1, 2021, to April 30, 2021, a total of 272 flights went to Montreal, 201 to Calgary, 173 to Vancouver, 143 to Edmonton, and 123 to Winnipeg.
In Quebec, the worst month of the pandemic in terms of COVID cases was December 2020, with an average of 1,944 cases a day. But the province experienced a third wave in April 2021, with 1,289 new daily cases on average (15 per 100,000 ppl.). That month, an average of 25 planes a day took off from the province: 176 planes left Quebec to land in Toronto, 84 in Vancouver, 78 in Ottawa, 69 in Calgary, 66 in Halifax, and 65 in Moncton.
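Counts like the provincial ones above can be derived from a per-flight table. The code that produced them is not shown in this section, so the following is only a sketch on hypothetical toy rows, reusing the column names built later in this analysis (`origin_province`, `destination_province`, `destination`, `date_landing`):

```r
library(dplyr)

# Hypothetical toy flights, not real data
toy <- tibble(
  origin_province      = c("AB", "AB", "AB", "BC", "AB"),
  destination_province = c("ON", "BC", "AB", "AB", "ON"),
  destination          = c("Toronto", "Vancouver", "Edmonton", "Calgary", "Toronto"),
  date_landing         = as.Date(c("2021-04-02", "2021-04-02", "2021-04-03",
                                   "2021-04-10", "2021-05-01"))
)

# Flights that left Alberta in April 2021 and landed in another province
leaving_ab <- toy %>%
  filter(
    origin_province == "AB",
    destination_province != origin_province,
    date_landing >= as.Date("2021-04-01"),
    date_landing <= as.Date("2021-04-30")
  )

# Flights per destination, most frequent first
by_destination <- count(leaving_ab, destination, sort = TRUE)

# Daily average over the 30 days of April
daily_avg <- nrow(leaving_ab) / 30
```

Only the first two toy rows survive the filter: the third stays within Alberta, the fourth leaves from BC, and the fifth lands in May.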
Ontario also decided to close its borders on April 19, 2021. Quebec did the same (but only for its border with Ontario). However, the provinces do not have jurisdiction over airports. There was no decrease in the number of domestic flights landing in these provinces compared to March and the previous weeks of April. And neither of them imposed quarantine measures on people landing in their territory.
The Atlantic bubble (New Brunswick, Nova Scotia, Prince Edward Island, and Newfoundland and Labrador), however, did impose quarantine measures on anyone from outside of these provinces. The number of planes coming from outside of the bubble stayed low during that period, while the number of flights within it bounced back a little more than elsewhere in the country.
I received four CSV files. We combined the files and displayed the first rows of the dataset, with the columns. After deleting duplicate rows, we had a total of 3,693,868 different flights.
library(tidyverse)  # read_csv, dplyr verbs, stringr
library(lubridate)  # ymd, month, year
library(jsonlite)   # fromJSON
library(sf)         # spatial join with the provinces
library(geosphere)  # gcIntermediate
library(maps)       # world basemap
data_2019 <- read_csv("../data/Canada_2019.csv", col_types = cols(.default = col_character()))
data_2020 <- read_csv("../data/Canada_2020.csv", col_types = cols(.default = col_character()))
data_2021 <- read_csv("../data/Canada_2021.csv", col_types = cols(.default = col_character()))
data_april2021 <- read_csv("../data/Canada_APR2021.csv", col_types = cols(.default = col_character()))
raw_data <- rbind(data_2019, data_2020, data_2021, data_april2021) %>% unique()
head(raw_data, 5)
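To illustrate the combine-then-deduplicate step, here is a small sketch with two hypothetical extracts that share one identical row. It uses `bind_rows()` and `distinct()`, the dplyr counterparts of the `rbind()` and `unique()` calls above. The flight numbers and dates are invented.

```r
library(dplyr)

# Two hypothetical extracts that share one identical row
part_a <- tibble(flight = c("AC101", "WS202"),
                 date_landing = c("2021-03-31", "2021-03-31"))
part_b <- tibble(flight = c("WS202", "AC303"),
                 date_landing = c("2021-03-31", "2021-04-01"))

combined <- bind_rows(part_a, part_b)  # 4 rows, one of them repeated
deduped  <- distinct(combined)         # keeps the first copy of each row
nrow(deduped)
```

Rows count as duplicates only when every column matches, which is why all columns are read as character before combining.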
We also imported the geographic coordinates of the airports. We made a spatial join with a geographic map of the provinces from Statistics Canada to assign a province to each airport located in Canada.
airports_coords <- fromJSON("../data/airports_coords.json")
canada_sf = read_sf("../data/lpr_000b16a_e/lpr_000b16a_e.shp") %>%
st_transform(crs = 4326)
airports_sf = st_as_sf(select(airports_coords, lat, lon), coords = c('lon', 'lat'), crs = st_crs(canada_sf), remove=FALSE)
airports_sf_joinded <- airports_sf %>% mutate(
intersection = as.integer(st_intersects(geometry, canada_sf)),
province = ifelse(is.na(intersection), "No prov", canada_sf$PREABBR[intersection])
)
airports_coords <- left_join(airports_coords, select(airports_sf_joinded, lat, lon, province), by=c("lat", "lon")) %>% select(!geometry) %>% arrange(province)
head(airports_coords, 5)
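The point-in-polygon logic can be tried on a toy example. The sketch below uses `st_join()`, which is an alternative to the `st_intersects()` approach above, not the code used for this analysis. The square "province" and the two airports are hypothetical.

```r
library(sf)
library(dplyr)

# Hypothetical square "province" (counter-clockwise ring, lon/lat in WGS84)
ring <- rbind(c(-80, 40), c(-70, 40), c(-70, 50), c(-80, 50), c(-80, 40))
toy_prov <- st_sf(
  PREABBR  = "Toy",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
)

# Two hypothetical airports: one inside the square, one outside
toy_airports <- tibble(iata = c("AAA", "BBB"), lon = c(-75, 0), lat = c(45, 0)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326, remove = FALSE)

# Left spatial join: airports outside every polygon get NA, which we relabel
joined <- st_join(toy_airports, toy_prov, join = st_intersects) %>%
  mutate(province = ifelse(is.na(PREABBR), "No prov", PREABBR))

joined$province
```

The first airport falls inside the square and inherits its label, while the second gets "No prov", the same placeholder used above for airports outside Canada.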
For our analysis of commercial flights, we needed an identification code for the airport of origin, an identification code for the airport of destination, a landing date, and an operator_code. We discarded the flights that didn’t have this information. We also made sure that we only used planes that landed in Canada, and we removed flights that took off and landed in the same place.
We also discarded the columns that we didn’t need for the analysis, but added columns for months, years, and type of traffic (domestic, international), latitude/longitude/province of origin, and latitude/longitude/province of destination.
data <- raw_data %>%
mutate(
date_landing = ymd(date_landing),
month_landing = month(date_landing),
year_landing = year(date_landing),
type = case_when(
origin_country == "CANADA" ~ "domestic",
TRUE ~ "international"
)
) %>%
filter(
destination_country == "CANADA",
!is.na(origin_iata),
!is.na(destination_iata),
!is.na(date_landing),
!is.na(operator_code),
origin != destination
) %>%
select(airline, flight, origin, origin_country, origin_iata, destination, destination_country, destination_iata, date_landing, month_landing, year_landing, type) %>%
left_join(select(airports_coords, iata, lat, lon, province), by=c("origin_iata" = "iata")) %>%
rename(origin_lat = lat, origin_lon = lon, origin_province = province) %>%
left_join(select(airports_coords, iata, lat, lon, province), by=c("destination_iata" = "iata")) %>%
rename(destination_lat = lat, destination_lon = lon, destination_province = province)
head(data,5)
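One detail of the classification is worth seeing on a small example: in `case_when()`, a missing `origin_country` does not match the first condition, so the flight falls through to "international". The rows below are hypothetical, and the spelling of country names other than "CANADA" is an assumption.

```r
library(dplyr)

# Hypothetical toy rows, not real data
toy <- tibble(
  origin_country = c("CANADA", "UNITED STATES", NA),
  origin         = c("Toronto", "New York", "Unknown"),
  destination    = c("Toronto", "Montreal", "Calgary")
)

classified <- toy %>%
  mutate(type = case_when(
    origin_country == "CANADA" ~ "domestic",
    TRUE ~ "international"  # also catches a missing origin_country
  ))

# Drop flights that took off and landed in the same place
kept <- filter(classified, origin != destination)
```

The first toy row is labelled domestic but is removed by the last filter because it takes off and lands in Toronto.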
To ensure we only have commercial flights, we excluded cargo or emergency airlines and flights without a declared airline. Our process was simple: we searched the website of the airline and kept flights we could either book directly or that were obviously transporting travellers. We kept private jet airlines but excluded companies using special aircraft, like helicopters.
According to Flightradar24, Air Canada’s flights with AC7XXX as flight numbers are cargo planes. We excluded them as well. We asked Air Canada about this, but the airline didn’t give us a list of flight numbers for cargo planes.
data <- data %>%
filter(
!is.na(airline),
airline %in% c("FedEx", "Cargojet Airways", "Canlink Aviation", "CanWest Air", "Ornge Air", "Airsprint", "Helijet International", "Nolinor Aviation", "North Caribou Flying Service", "UPS Airlines", "SkyLink Express", "Keewatin Air", "Hydro Quebec", "Transport Canada", "Kelowna Flightcraft Air", "Kenn Borek Air", "Air Nunavut", "Syncrude Canada", "DHL", "Voyageur Airways", "Gama Aviation", "Omni Air International", "Aeronaves TSM", "USA Jet Airlines", "21 Air", "AeroLogic", "Conair", "Cargolux", "Kalitta Charters", "Kalitta Air", "Buffalo Airways", "Berry Aviation", "GetJet Airlines", "IFL Group", "Delta Private Jets", "Volga-Dnepr Airlines", "Cavok Air", "Cavok Air", "Atlas Air", "Western Global Airlines", "Priority Air Charter", "Hi Fly", "Lufthansa Cargo", "Sky Lease Cargo", "Antonov Design Bureau", "ExpressJet", "Airnet", "Swiss Air-Ambulance", "Titan Airways", "Bluebird Nordic", "Sierra West Airlines", "AirBridgeCargo Airlines", "Everts Air Alaska", "Civil Air Patrol", "Lynden Air Cargo", "Air Atlanta Icelandic", "National Airlines", "Ameriflight", "Fltplan", "MasAir Cargo Airline", "United Nations", "Western Aircraft", "CargoLogicAir", "Amerijet International", "Bemidji Airlines", "Boeing", "Jet Logistics", "LATAM Cargo Chile", "Nippon Cargo Airlines", "Strait Air", "Gama Aviation (UK)", "Aloha Air Cargo", "Embraer", "Longtail Aviation", "Northern Air Cargo", "Silk Way West Airlines", "Suparna Airlines", "ABX Air", "Air Cargo Carriers", "Airbus", "Estafeta Carga Aerea", "Hangar 8", "Tyrol Air Ambulance", "Cobham Aviation Services Australia", "") == F,
str_detect(flight, "AC7") == F
)
data %>%
group_by(airline) %>%
summarize(count=n()) %>%
arrange(desc(count)) %>%
mutate(perc=round(count/sum(count)*100, 2))
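The pattern `"AC7"` used in the filter above matches any flight number containing those three characters, which is broader than the AC7XXX format described in the text. The sketch below compares it with a stricter pattern on hypothetical flight numbers. Whether AC7XXX means exactly three digits after "AC7" is an assumption, and the figures in this analysis come from the broader pattern.

```r
library(stringr)

# Hypothetical flight numbers, for illustration only
flights <- c("AC7254", "AC792", "AC101", "WS7001")

# Pattern used in the filter above: "AC7" anywhere in the flight number
broad <- str_detect(flights, "AC7")

# Stricter alternative: "AC7" followed by exactly three digits
strict <- str_detect(flights, "^AC7\\d{3}$")

data.frame(flights, broad, strict)
```

In this toy set, only "AC792" is treated differently: the broad pattern flags it, the strict one does not.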
We received the last data points on May 3. To avoid an incomplete month, we kept only flights with a landing date earlier than May 1, 2021.
We ended up with 1,050,452 rows and 18 columns.
data <- data %>% filter(date_landing < ymd("2021-05-01"))
head(data,5)
We also created some variables with specific dates that we will work with later.
firstCaseDate <- ymd("2020-01-25")
pandemicDate <- ymd("2020-03-11")
postponeDate <- ymd("2020-03-13")
aprilFirst2020 <- ymd("2020-04-01")
mandatoryTestDate <- ymd("2021-02-22")
britishVariantDate <- ymd("2020-12-26")
southAfricanVariantDate <- ymd("2021-01-12")
brazilianVariantDate <- ymd("2021-02-07")
Just to have an overview of our data, let’s create a map showing all the unique flows from one place to another. We have 6,331 unique combinations coming from all over the planet.
dataMap <- data %>%
select(origin_lon, origin_lat, destination_lon, destination_lat) %>%
unique() %>%
filter(
!is.na(origin_lon) &
!is.na(origin_lat) &
!is.na(destination_lon) &
!is.na(destination_lat)
)
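# Draw one great-circle route between a departure point and an arrival point
plot_my_connection <- function(dep_lon, dep_lat, arr_lon, arr_lat, ...) {
# 50 intermediate points along the great circle, plus both endpoints
inter <- gcIntermediate(c(dep_lon, dep_lat), c(arr_lon, arr_lat), n=50, addStartEnd=TRUE, breakAtDateLine=FALSE)
inter <- data.frame(inter)
diff_of_lon <- abs(dep_lon) + abs(arr_lon)
if (diff_of_lon > 180) {
# Draw the eastern and western parts separately to avoid a line across the whole map
lines(subset(inter, lon>=0), ...)
lines(subset(inter, lon<0), ...)
} else {
lines(inter, ...)
}
}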
par(mar=c(0,0,0,0))
map('world',
col="#505050", fill=TRUE, bg="black", lwd=0.05,
mar=rep(0,4),border=0, ylim=c(-80,80)
)
for (i in 1:nrow(dataMap)) {
plot_my_connection(dataMap$origin_lon[[i]], dataMap$origin_lat[[i]], dataMap$destination_lon[[i]], dataMap$destination_lat[[i]], col=rgb(1,1,0, alpha = 0.1), lwd=1)
}
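To see what `plot_my_connection()` works with, here is a sketch that computes the great-circle points for a single route without drawing anything. The coordinates are approximate and only illustrative.

```r
library(geosphere)

# Approximate, illustrative coordinates (lon, lat)
montreal <- c(-73.74, 45.47)
tokyo    <- c(140.39, 35.77)

# 50 intermediate points plus both endpoints along the great circle
inter <- data.frame(gcIntermediate(montreal, tokyo, n = 50,
                                   addStartEnd = TRUE, breakAtDateLine = FALSE))

# Same heuristic as plot_my_connection(): when the absolute longitudes add up
# to more than 180, split the route into its eastern and western parts
if (abs(montreal[1]) + abs(tokyo[1]) > 180) {
  east <- subset(inter, lon >= 0)
  west <- subset(inter, lon < 0)
}
```

Drawing `east` and `west` as two separate lines avoids a horizontal streak across the map where the route wraps from one edge to the other.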