Recently I organized sessions at two strikingly different conferences: (1) the Society for Mathematical Biology (SMB) meeting and (2) the International Congress for Conservation Biology (ICCB). Both featured quantitative approaches, but presentation styles and modeling philosophies differed remarkably between the two. You might be surprised to find out that the conservation scientists at ICCB were, on average, using more “complex” mathematical models than the mathematicians at SMB. How can this be? Shouldn’t mathematicians be analyzing more complicated equations? Answer: no, and it has to do with a trade-off between model “complexity” and model “transparency/tractability,” which I will explain below using the rating systems of YouTube videos and Uber drivers.
Uber uses a 5-star rating system. When you take a trip in an Uber vehicle, you rate the experience from one to five stars (five being good and one being terrible). Uber then displays the mean rating across the driver’s most recent 1000 trips. Drivers rate passengers the same way. These ratings are then used to determine whether drivers and passengers are allowed to remain on Uber.
YouTube’s rating system is quite different. It simply asks you to rate a video “thumbs up” or “thumbs down.” YouTube then displays the number of people who chose each option.
Why are these two rating systems so different, and what do they have to do with model “complexity” and “transparency/tractability”? Uber’s model of a driver’s performance is complex: there are five choices for how to rate your driver, meaning each rating provides more detailed information than the like/dislike approach employed by YouTube. This creates a new challenge: Uber must now present more complex data to its customers. It opts for taking the mean, which removes information. Surely, a driver with a 1-star average over 1,000 trips means something very different from a driver with a 1-star average over only one trip. But displaying the mean makes it easier for riders to digest the meaning of the rating. One could instead opt for a richer display that doesn’t remove such information, e.g., a frequency distribution (histogram) of star ratings for each driver. Some websites, such as Yelp and Amazon, provide this type of information, but in general this approach is avoided because a distribution is more difficult to understand than a single number such as a 4.6-star rating.
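To make the information loss concrete, here is a minimal Python sketch. The three drivers and their ratings are invented for illustration, and `summarize` is a hypothetical helper, not anything Uber is known to use. It shows that very different rating histories can collapse to the same mean, while a histogram keeps them apart.

```python
from collections import Counter
from statistics import mean


def summarize(ratings):
    """Return (mean, histogram) for a list of 1-5 star ratings."""
    counts = Counter(ratings)
    # Fill in zero counts so every star level from 1 to 5 appears.
    histogram = {stars: counts.get(stars, 0) for stars in range(1, 6)}
    return mean(ratings), histogram


# Hypothetical rating histories (made-up data, for illustration only).
steady = [4] * 10                # ten trips, all rated 4 stars
polarizing = [5] * 15 + [1] * 5  # twenty trips, loved or hated
newcomer = [4]                   # a single trip rated 4 stars

for name, ratings in [("steady", steady),
                      ("polarizing", polarizing),
                      ("newcomer", newcomer)]:
    avg, hist = summarize(ratings)
    print(f"{name}: mean={avg}, trips={len(ratings)}, histogram={hist}")
```

All three drivers display the same mean, yet the histograms and trip counts tell three different stories. That is the trade-off in miniature: the single number is easy to read, and the distribution is more faithful.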
XKCD comic on the 5-star rating system https://xkcd.com/1098/
The added complexity of the star rating system creates another issue. What exactly does a 4-star experience entail? One person might give 1 star for a small mistake. Others might reserve a one-star rating for things as extreme as physical violence. Model complexity introduces a new form of ambiguity, undermining the true meaning of a displayed star rating.
YouTube started with a 5-star rating system, but then realized that the vast majority of people were rating a video either 1 or 5 stars anyway. So it ditched the 5-star rating system for the thumbs up, thumbs down approach. Netflix and other companies have followed suit. So what makes this system so great? (1) The viewer of the rating gets complete information, and (2) there is less ambiguity as to what the rating means.
You might be thinking at this point, “Why would Uber opt for this more complex and less transparent rating system?” I can’t answer this question, but I can take a guess based on my observations of conservation scientists and mathematicians at my last two conferences. Mathematicians are obsessed with understanding “why” things are true, while conservation scientists are obsessed with projecting the consequences of our actions onto the future state of the environment. For a decision maker, “understanding” only really matters when it leads to better decisions. A more complex model may in some cases predict better, even if the model is too complex to understand completely.
Since Uber uses its ratings to determine the fate of a driver’s employment, it is likely interested in predicting who will be a good long-term driver. A fine-grained rating system might be required to make these predictions well (and there may be far more sophisticated things going on behind the scenes than calculations of mean star ratings). At Uber, users can’t select drivers based on their rating, so transparency to users may not be so important anyway.
To summarize, Uber may be more like a conservation scientist, and YouTube may be more like a mathematician.