Introduction

Ontario colleges have been in the news frequently. Concerns have grown that they have set aside their core mandate of training skilled workers in favor of attracting international students, who pay much higher tuition. Questions have also been raised about the quality of education students now receive and about the incentives of college administrators. In this analysis, I focus narrowly on the salaries paid by Ontario colleges.

This analysis uses the Ontario government’s Sunshine List, which names public-sector employees who make over $100,000 annually. (The focus is on Ontario colleges, which are distinct from Ontario universities.)

The first section describes the data from the Sunshine List, focusing on the years between 2012 and 2023. The following section builds OLS, fixed effects, and between-effects models for inference about the potential causes of salary increases. These models will help explain the factors related to changes in college workers’ salaries.

Because little public information is available about each worker, more advanced predictive models then build on these insights to predict workers’ salaries. To test the quality of these models, the data is split into a training set used to fit the models and a test set used to verify the results. The quality of the predictions and areas for improvement are discussed.

Data

Accounting for Inflation in Salaries

I account for inflation by using the Consumer Price Index (CPI) to express each year’s salaries in the dollars of a single base year. This makes salaries comparable across time.

It is also important to remember that the Sunshine List only contains workers who make more than $100,000. This cut-off is nominal, so as prices rise and the purchasing power of $100,000 falls, more people appear on the list over time. To partially account for this, I have restricted the analysis to 2013 onwards, and I use 2013 as the base year for the inflation adjustment. After the adjustment, some workers in later years fall below $100,000 in 2013 dollars. These workers remain in the panel.
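The adjustment described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the CPI values are made-up placeholders rather than published figures, and the function name is hypothetical.

```python
# Hypothetical CPI index values, for illustration only (not published figures).
cpi = {2013: 100.0, 2018: 108.0, 2023: 125.0}
BASE_YEAR = 2013


def to_base_year_dollars(nominal_salary: float, year: int) -> float:
    """Deflate a nominal salary into BASE_YEAR dollars using the CPI ratio."""
    return nominal_salary * cpi[BASE_YEAR] / cpi[year]


# With these placeholder values, a nominal $100,000 in 2023 is worth
# $80,000 in 2013 dollars, which is below the Sunshine List cut-off.
real_2023 = to_base_year_dollars(100_000, 2023)
print(round(real_2023))
```

The example also shows why a fixed nominal threshold pulls more people onto the list each year: the same $100,000 buys less as the index rises.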

Data collection

I’ve created a panel of observations about workers at Ontario colleges. To accomplish this, I combined a series of annual Sunshine List data files into a panel of employees, which I followed over time. Observations of individuals are linked over time based on a common employer and the person’s first and last name.

The shortcomings of this approach are that I cannot easily capture people who have changed jobs between different colleges and whose salary in those transition years is below $100,000. It also potentially misses women who have changed their last name after marriage.
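A rough sketch of this linking step is shown below. The records and field names are invented for illustration and are not the actual Sunshine List column names; the key is simply employer plus first and last name, as described above.

```python
from collections import defaultdict

# Hypothetical records; field names are assumptions, not the real file layout.
records = [
    {"year": 2021, "employer": "Humber", "first": "Jane", "last": "Doe", "salary": 110_000},
    {"year": 2022, "employer": "Humber ", "first": "JANE", "last": "Doe", "salary": 114_000},
    {"year": 2022, "employer": "Seneca", "first": "John", "last": "Roe", "salary": 105_000},
]


def person_key(rec: dict) -> tuple:
    """Build a person ID from employer, first name, and last name."""
    return tuple(rec[field].strip().lower() for field in ("employer", "first", "last"))


# Group yearly observations under each person ID to form the panel.
panel = defaultdict(list)
for rec in records:
    panel[person_key(rec)].append((rec["year"], rec["salary"]))

for history in panel.values():
    history.sort()  # order each person's observations by year
```

Because the employer is part of the key, a person who moves to another college, or whose name changes, would appear as a new individual, which is exactly the shortcoming noted above.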

Salaries at Ontario Colleges

The box plot below shows the distribution of salaries at Ontario colleges. The colored bands represent the mass of the distribution, and the dots are ‘outlying’ points. In this case, the dots represent employees who make significantly more than others in the organization.

Note that I use 2013 as the inflation-adjusted year. Many workers who make nominally more than $100,000 in later years are making less than $100,000 in 2013 inflation-adjusted dollars. These workers are kept in our panel because they still provide helpful salary information.

Box Plot of Salaries

Changes to the Data

Salaries in the data have a long right tail: a small number of college administrators earn very large amounts, while the mass of the distribution is made up of lower-paid teaching faculty and administrators. A common way to normalize such a distribution is to log-transform salary. This transformation tends to improve the fit of the models we use, and the results can easily be exponentiated to express them in dollars again.
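As a small illustration of the transformation, using invented salaries rather than values from the panel:

```python
import math

# Hypothetical salaries with one large outlier in the right tail.
salaries = [101_000, 104_000, 110_000, 125_000, 450_000]

# Natural-log transform, as used for the ln_salary variable.
ln_salaries = [math.log(s) for s in salaries]

# Exponentiating recovers the original dollar amounts.
recovered = [math.exp(v) for v in ln_salaries]

# On the log scale the outlier sits much closer to the rest of the data:
# the gap between the top salary and the median shrinks from a ratio of
# about 4 in dollars to a difference of about 1.4 in logs.
gap_in_logs = ln_salaries[-1] - ln_salaries[2]
```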

I have also merged data about each college’s student demographics to enrich the data. In particular, I focus on the two largest groups – Canadian students and students from India enrolled in Ontario colleges.

The final data transformation concerns job titles. The Sunshine List contains the job titles given by colleges, but those titles are not consistent across institutions. To reduce the variation, I searched for common positions. For example, if the position title contains the word ‘professor,’ ‘lecturer,’ or ‘teacher,’ the person is given the simplified title of ‘Professor.’ If the title contains ‘vice-president of …’, they are categorized as a ‘VP,’ and so on.
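A keyword-matching approach along these lines might look like the sketch below. Only the ‘Professor’ and ‘VP’ rules are described in the text; the remaining keywords and the order in which rules are checked are assumptions made for illustration.

```python
def simplify_title(raw_title: str) -> str:
    """Map a free-text job title to a simplified category (illustrative rules)."""
    t = raw_title.lower()
    if any(word in t for word in ("professor", "lecturer", "teacher")):
        return "Professor"
    # Check vice-president before president, since the former contains the latter.
    if "vice-president" in t or "vice president" in t:
        return "VP"
    if "president" in t:
        return "President"
    # The keywords and ordering below are assumptions, not the author's rules.
    for keyword, label in (("dean", "Dean"), ("chair", "Chair"),
                           ("director", "Director"), ("officer", "Officer")):
        if keyword in t:
            return label
    return "Other"


print(simplify_title("Vice-President of Finance"))  # VP
```

Rule order matters with substring matching, so any real implementation would need to decide how to treat compound titles such as an associate dean who is also a director.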

College Enrollments

I have also merged data about the number of enrollments in each college over time. My prior analysis found that Canadian and Indian students make up the two largest groups of students at these institutions.

Summary Statistics

Below is a summary of the variables that are available for examination.

  • title2 is the simplified title given to a college worker that denotes their approximate position in the organization.
  • employer is the name of the college that a given employee works for.
  • year is the financial year in which a given Ontario college worker’s name, salary, and employer were recorded on the Sunshine List. Years are used mainly as factors in this analysis but also as a trend measure for some earlier models.
  • student_count is the number of students enrolled in the college for a given year. For simplicity, this is the sum of all Canadian and Indian students. Together, these two groups make up the vast majority of students during 2012-2023.
  • enroll_canadian and enroll_indian are the number of students listed as Canadian or Indian as recorded by the college enrollment data.
  • prop is the proportion of Indian international students enrolled. This variable is the number of enrolled Indian international students divided by the sum of Indian and Canadian students enrolled.
  • is_prof is an indicator set for each observation where the person is listed as a lecturer, professor, or teacher.
  • faculty_num is the total number of faculty listed on the Sunshine List for a given college (i.e., the sum of workers listed as professors for that college). This can be used as a rough guide to the school’s teaching faculty size.
  • exp captures a person’s experience, measured as the number of years they have been employed by their college(s).
  • ln_salary is the natural log-transformed salary of a college employee.
  • salary is the annual salary paid to a college employee.
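The two derived enrollment variables follow directly from the definitions above. A minimal sketch, using invented enrollment numbers and a hypothetical helper function:

```python
def enrollment_features(enroll_canadian: int, enroll_indian: int) -> dict:
    """Derive student_count and prop from the two enrollment counts."""
    student_count = enroll_canadian + enroll_indian
    # Guard against division by zero for a college-year with no recorded students.
    prop = enroll_indian / student_count if student_count else 0.0
    return {"student_count": student_count, "prop": prop}


# Hypothetical college-year: 8,000 Canadian and 2,000 Indian students.
features = enrollment_features(8_000, 2_000)
print(features)  # {'student_count': 10000, 'prop': 0.2}
```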

Table of Salaries in Ontario Colleges, 2022

Salaries by College, 2022 (in 2013 inflation-adjusted dollars)
Employer        Median     SD         Max        N
Humber          $94,620    $22,366    $449,086   796
Sheridan        $95,170    $18,921    $373,444   687
George Brown    $94,702    $21,678    $368,889   609
Seneca          $95,192    $18,808    $342,789   764
Conestoga       $95,082    $21,201    $332,908   577
Algonquin       $95,082    $16,618    $274,107   597
La Cité         $95,081    $19,947    $265,873   173
Sir Sandford    $94,880    $16,679    $249,764   230
Centennial      $95,155    $17,786    $247,264   545
Durham          $94,862    $16,867    $245,880   340
Fanshawe        $94,841    $15,207    $244,277   503
St Clair        $95,096    $17,314    $243,781   253
Sault           $95,171    $19,413    $233,358   109
Niagara         $95,082    $16,409    $231,887   352
Mohawk          $95,115    $17,080    $223,765   426
St Lawrence     $93,623    $18,411    $222,775   215
Lambton         $95,118    $21,386    $214,770   132
Northern        $95,269    $19,737    $212,716   80
Georgian        $94,821    $17,566    $212,587   340
Boréal          $95,095    $19,861    $212,430   103
Cambrian        $94,717    $16,905    $207,437   173
Canadore        $95,115    $15,662    $199,430   108
Loyalist        $97,085    $18,443    $187,978   124
Confederation   $94,517    $14,644    $184,300   132

Correlations

The correlation plot shows the degree to which variables move together. Highly correlated variables should not appear together in the same inferential regression model: multicollinearity inflates standard errors and makes the size and significance of the individual coefficients unstable and hard to interpret.

There are expected relationships between the previously listed variables because of how they are constructed or, in some cases, because they measure similar phenomena. For example, the number of Canadian students enrolled is correlated with the total count of all students. Similarly, the proportion of Indian students positively correlates with the time trend. We know from other analyses that Indian international students only began to arrive in Ontario colleges towards the end of our panel.

It’s also unsurprising that the number of faculty positively correlates with the number of students enrolled. Of course, as a college gains more students, it requires more teachers.
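For reference, the pairwise correlations behind such a plot are Pearson coefficients. The sketch below computes one by hand for two of the variables, using invented college-level values rather than figures from the panel.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four colleges: larger schools list more faculty.
student_count = [5_000, 10_000, 15_000, 20_000]
faculty_num = [100, 210, 290, 400]

r = pearson(student_count, faculty_num)
```

A coefficient this close to 1 would be a signal not to include both variables in the same inferential specification.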

Creating Training and Testing Data

The full dataset is divided into train and test data. The train data is the part of the data I use to build models. The remaining data is the ‘test’ data. The test data is used to test the accuracy of predictive models based on the training data.

As previously noted, individual IDs are assigned based on the individual’s first and last name and employer. As a result, we cannot track people who switch between employers or change their names (e.g., after marriage).

To create valid test and train data, I sample from all available person IDs and assign a portion to the train data set and a portion to test for final verification.
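Splitting by person ID rather than by row keeps all of a person’s yearly observations on the same side of the split, so the test data never contains someone the models have already seen. A minimal sketch follows; the 20% test share and the seed are arbitrary assumptions, not the values used in the analysis.

```python
import random


def split_ids(person_ids, test_fraction=0.2, seed=42):
    """Randomly assign whole person IDs to either the train or the test set."""
    ids = sorted(set(person_ids))      # de-duplicate; sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])   # (train_ids, test_ids)


# Hypothetical person IDs.
all_ids = [f"p{i}" for i in range(10)]
train_ids, test_ids = split_ids(all_ids)

# Rows are then routed by membership, e.g. row["person_id"] in train_ids.
```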

Job Movers - Looking For People Who Moved Between Colleges

Some code was written to capture cases of people who left a job at one college and took a job at another. However, some moves will be missed because the person’s salary in the transition year does not meet the $100,000 threshold for inclusion in the Sunshine List.
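One way such a check could work is sketched below. This is an illustration under stated assumptions, not the code used in the analysis: it matches on first and last name only, so two different people who share a name would be flagged incorrectly. The rows are invented.

```python
from collections import defaultdict

# Hypothetical (year, employer, first name, last name) rows.
rows = [
    (2017, "Humber", "Jane", "Doe"),
    (2018, "Humber", "Jane", "Doe"),
    (2019, "Seneca", "Jane", "Doe"),
    (2018, "Durham", "Sam", "Lee"),
    (2019, "Durham", "Sam", "Lee"),
]

# Collect each name's employment history across all colleges.
by_name = defaultdict(list)
for year, employer, first, last in rows:
    by_name[(first.lower(), last.lower())].append((year, employer))

# Flag a move whenever the employer differs from the previous observation.
movers = {}
for name, history in by_name.items():
    history.sort()
    changes = [
        (history[i][0], history[i - 1][1], history[i][1])
        for i in range(1, len(history))
        if history[i][1] != history[i - 1][1]
    ]
    if changes:
        movers[name] = changes
```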

Inferential Models

Manual and automated testing with OLS models showed that a relatively small subset of variables provided the most explanatory power. For simplicity, this work will focus on variations of these explanatory variables.

OLS Model Specifications

We start with OLS models because of their flexibility and ease of interpretation. The baseline specification uses dummies for each simplified title commonly used at colleges. The year is also entered as a set of dummy variables because we have no reason to believe that salaries increase linearly. Instead, with the worst of the pandemic years included in our panel, significant year-to-year salary changes seem quite likely. We also include ‘exp,’ which represents the worker’s years of experience at the college.

Two additional explanatory variables are tested. The first is the proportion (prop) of Indian international students enrolled. The other is a count of the number of students enrolled in the college.

Sometimes, the most straightforward specification is the best, and adding more explanatory variables does not add to the overall explanatory power of a model. To determine which of the possible specifications is best, I use likelihood-ratio tests to compare the nested models.

The series of likelihood-ratio tests indicates that the full model provides the best explanation of the variation in the data.
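To make the comparison concrete, here is a minimal sketch of a likelihood-ratio test for the simplest nested pair: an intercept-only model against a model with one added regressor. For Gaussian linear models the statistic reduces to n·ln(RSS_restricted / RSS_full), which is compared with a chi-square distribution with one degree of freedom. The data are invented, and the analysis itself would have relied on a statistics package rather than hand-rolled code.

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares for the restricted (intercept-only) model."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_ols(x, y):
    """Residual sum of squares for y = a + b*x fitted by least squares."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def lr_test_one_restriction(rss_restricted, rss_full, n):
    """LR statistic and p-value when the models differ by one parameter."""
    lr = n * math.log(rss_restricted / rss_full)
    p_value = math.erfc(math.sqrt(lr / 2))  # chi-square(1) upper tail
    return lr, p_value


# Hypothetical years of experience and log salaries.
exp_years = [1, 3, 5, 7, 9, 11]
ln_salary = [11.52, 11.55, 11.56, 11.60, 11.61, 11.64]

lr, p_value = lr_test_one_restriction(
    rss_intercept_only(ln_salary), rss_simple_ols(exp_years, ln_salary), len(ln_salary)
)
```

A small p-value favors the larger model. The chi-square approximation is asymptotic, so with a toy sample this size the result is only indicative.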

The following visualization shows how the model coefficients change with the addition of new variables. We can see that the models are relatively consistent – which is a good thing.

Ordinary least squares (OLS) models, which we have just looked at, can be specified to control for within variation by adding time-varying dummy variables. However, they still contain some between-variation (more on this below). OLS models are generally used with cross-sectional data. When they are used with panel data, they are referred to as pooled OLS. One needs to be aware of the risk of bias in misspecified OLS models and of the risk that the standard errors of pooled OLS models fitted to panel data are biased downward (i.e., coefficients are reported as significant when they really are not). This is why we often employ specialized models when developing inferential models with panel data.

OLS Model Visualizations

The differently shaped icons show how the model coefficients differ. Model 1 is the baseline model. Model 4 is the optimal model that contains the proportion of Indian students in the specification in addition to the same explanatory variables in Model 1.

The visualization shows that the simplified title is the single strongest indicator of salary. All the salaries are compared to those of a professor (teaching staff), which is the reference category. The catch-all category ‘Other’ includes admin staff and those job categories that were not obviously classifiable. Chairs and Directors make much more, followed by Deans and Officers of the college. The highest-paid positions belong to VPs and Presidents. The dummy variables used for years show some annual non-linear variation. Around the pandemic, we see a drop in average salary levels.

The expected annual salary increases with each additional year of worker experience (exp). This is a small effect, but it is multiplied by the number of years of service.
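Because the dependent variable is the log of salary, a coefficient b translates into a percentage difference of (exp(b) - 1) × 100. The short sketch below applies this to two coefficients from the full results table; the helper name is hypothetical, and because the table rounds to two decimals the results are approximate.

```python
import math


def pct_effect(coef: float) -> float:
    """Percent change in salary implied by a coefficient in a log-salary model."""
    return (math.exp(coef) - 1) * 100


# title2President is about 0.84 in Models 2 to 5: roughly 132% above a professor.
president_premium = pct_effect(0.84)

# exp is about 0.01 per year, so ten years of service compounds to about 10.5%.
ten_years_experience = pct_effect(0.01 * 10)
```

For small coefficients the percentage is close to 100 × b, which is why the experience effect reads as roughly 1% per year.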

The size of the school, as measured by student count in thousands, does not represent a large change in any of the models despite being statistically significant. In other words, colleges with larger enrollments do not appear to pay much better than their smaller peers. Remember that regression models are interpreted as ‘holding everything else constant.’ So, people may still choose to work at larger colleges for other professional reasons – such as promotion opportunities – that are not available at smaller institutions.

The coefficient on the proportion of Indian international students is negative, indicating that institutions with larger cohorts of Indian international students compared to Canadian students tend to pay people less. This could be interpreted as a desire by these institutions to maximize profits regardless of learning outcomes. But of course, we can only speculate why colleges with higher international enrollments tend to pay employees less, all other things being equal.

OLS Model Full Results

                  Model 1     Model 2     Model 3     Model 4     Model 5
(Intercept)       11.56 ***   11.53 ***   11.52 ***   11.53 ***   11.52 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Chair       0.13 ***    0.14 ***    0.14 ***    0.14 ***    0.14 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Dean        0.24 ***    0.23 ***    0.23 ***    0.23 ***    0.23 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Director    0.17 ***    0.17 ***    0.17 ***    0.17 ***    0.17 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Officer     0.46 ***    0.44 ***    0.45 ***    0.44 ***    0.45 ***
                  (0.01)      (0.01)      (0.01)      (0.01)      (0.01)
title2Other       0.09 ***    0.09 ***    0.09 ***    0.09 ***    0.09 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2President   0.88 ***    0.84 ***    0.84 ***    0.84 ***    0.84 ***
                  (0.01)      (0.01)      (0.01)      (0.01)      (0.01)
title2VP          0.49 ***    0.48 ***    0.48 ***    0.48 ***    0.48 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2014          -0.00       -0.01 *     -0.01 **    -0.01 *     -0.01 **
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2015          -0.01 ***   -0.02 ***   -0.02 ***   -0.02 ***   -0.02 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2016          -0.01 ***   -0.02 ***   -0.02 ***   -0.02 ***   -0.02 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2017          0.00        -0.01 **    -0.01 ***   -0.01 **    -0.01 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2018          -0.00       -0.02 ***   -0.02 ***   -0.01 ***   -0.02 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2019          -0.01 ***   -0.03 ***   -0.03 ***   -0.03 ***   -0.03 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2020          0.00        -0.02 ***   -0.02 ***   -0.02 ***   -0.02 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2021          -0.01 ***   -0.04 ***   -0.04 ***   -0.04 ***   -0.04 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2022          -0.02 ***   -0.05 ***   -0.06 ***   -0.05 ***   -0.05 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
year2023          -0.01 *     -0.03 ***   -0.04 ***   -0.03 ***   -0.04 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
exp                           0.01 ***    0.01 ***    0.01 ***    0.01 ***
                              (0.00)      (0.00)      (0.00)      (0.00)
student_count                             0.00 ***                0.00 ***
                                          (0.00)                  (0.00)
prop                                                  -0.02 ***   -0.02 ***
                                                      (0.00)      (0.00)
N                 31801       31801       31801       31801       31801
R2                0.62        0.65        0.65        0.65        0.65
Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05.

Fixed-Effects Model Specifications

Fixed-effects models are commonly used in panel data studies because they are consistent and don’t have the same bias that misspecified pooled OLS models can have. The trade-off is that fixed-effect models generally have less explanatory power. Still, they remain the gold standard in inferential panel data analysis.

All models attempt to explain changes in variation. Fixed-effects models focus on the variation within groups of observations. In contrast, regression models called between-effects regressions examine how the variation differs on average between groups.
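The distinction can be illustrated with the two transformations that underlie these estimators. The within (fixed-effects) estimator works with each observation’s deviation from its own group mean, while the between estimator works with the group means themselves. The sketch below uses invented log salaries and assumes the grouping unit is the individual person.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (person_id, ln_salary) observations over several years.
observations = [
    ("A", 11.50), ("A", 11.54), ("A", 11.58),
    ("B", 11.90), ("B", 11.94),
]

groups = defaultdict(list)
for person_id, value in observations:
    groups[person_id].append(value)

# Between transformation: one average per person.
between = {person_id: mean(values) for person_id, values in groups.items()}

# Within transformation: each observation minus that person's own average.
within = [(person_id, value - between[person_id]) for person_id, value in observations]
```

Demeaning removes everything that is constant for a person, such as a stable salary premium, which is why within coefficients on job titles are identified only from people whose title changes during the panel.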

Below are four fixed effects (within) models and one between effects model. The similarities and differences are discussed below. (A Hausman test indicated that a random-effects model would not be justified.)

The model interpretation is quite similar to that of the OLS models. The simplified titles explain much of the variation in salaries. Experience also has a small but significant impact on salary. Interestingly, the proportion of Indian students is also negatively related to employee salary in the within models. The number of college students has only a small relationship with employee salary, and in the full results it is statistically significant only in the between model.

The between model (Model 5) focuses on the average differences between groups. In this model, the President and VP titles are associated with much larger salary differences than in the within models. Similarly, the between model emphasizes the average difference in salaries between employees at colleges with larger Indian international student cohorts and those at colleges without.

Fixed-Effects Visualizations

Fixed Effects Full Results

                  Model 1     Model 2     Model 3     Model 4     Model 5
title2Chair       0.02 **     0.02 **     0.02 **     0.02 **     0.13 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.01)
title2Dean        0.07 ***    0.07 ***    0.07 ***    0.07 ***    0.22 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Director    0.05 ***    0.06 ***    0.05 ***    0.05 ***    0.15 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2Officer     0.21 ***    0.21 ***    0.21 ***    0.21 ***    0.39 ***
                  (0.01)      (0.01)      (0.01)      (0.01)      (0.01)
title2Other       0.04 ***    0.04 ***    0.04 ***    0.04 ***    0.07 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
title2President   0.18 ***    0.18 ***    0.18 ***    0.18 ***    0.74 ***
                  (0.01)      (0.01)      (0.01)      (0.01)      (0.01)
title2VP          0.16 ***    0.16 ***    0.16 ***    0.16 ***    0.48 ***
                  (0.01)      (0.01)      (0.01)      (0.01)      (0.01)
exp               0.00 ***    0.00 ***    0.00 ***    0.00 ***    0.01 ***
                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
prop                          -0.01 **                -0.02 **    -0.05 ***
                              (0.00)                  (0.00)      (0.01)
student_count                             -0.00       0.00        0.00 **
                                          (0.00)      (0.00)      (0.00)
(Intercept)                                                       11.51 ***