2021-04-28

By Bronwyn Howell

Like computers and the internet before them, Big Data and artificial intelligence (AI) are being hailed as technologies set to revolutionize medicine, finance, business operations, media, and more. Others argue they are dangerous new technologies set to disrupt society like nothing before. Most notably, they sit at the center of claims that jobs, particularly those of low-skilled workers, are doomed as cheaper and more reliable algorithmically programmed machines (fueled by an ever-growing tsunami of Big Data) displace them. What should we make of these seemingly conflicting claims?

People will continue to have jobs

First, it is far from clear that jobs are doomed. Technological change has always led to changes on the margin. But as New Zealand’s Productivity Commission recently found, total employment has historically grown as computerization has expanded.

Rather than facing John Maynard Keynes’ 15-hour work week, it seems we are busier than ever, albeit doing different jobs than before. Indeed, the intriguing question in the productivity data over the past 15 years or so is why, despite the explosion of internet use and increased applications of Big Data and algorithmic decision-making, productivity growth internationally has been so sluggish compared to earlier times.

Limits to algorithms

Second, it is unclear that Big Data and AI concepts are all that new. Sure, decreased costs of computing power, storage, and data transportation have combined with widespread use of internet applications to allow vast amounts of data to be captured. These data can be interrogated using data models (algorithms) to establish patterns and make predictions. More-sophisticated algorithms can be programmed to update their parameters as subtle changes in the data emerge. The latter feature has led to them being dubbed “artificial intelligence,” since humans update their decision-making frames as new patterns are observed.

Yet mathematical (algorithmic) decision-making was based on these same principles even before computers existed. Statistical techniques such as linear regression use errors between a prediction and an observation to refine the parameters. Time series forecasting methods such as exponential smoothing continually (and automatically via the algorithm) update the next period’s forecast using the error in the last period’s forecast. Most AI algorithms operating on Big Data are based on mathematical and statistical techniques such as these. So they will be as good as, or limited by, the same features that have characterized these statistical techniques in decision-making in the noncomputerized world.
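
To make the error-correction idea concrete, here is a minimal Python sketch of simple exponential smoothing. The function name, the smoothing constant `alpha`, and the choice to seed the first forecast with the first observation are illustrative assumptions, not anything prescribed by a particular library or by the techniques' original authors.

```python
def exponential_smoothing(observations, alpha=0.3):
    """Return one-step-ahead forecasts using simple exponential smoothing.

    forecasts[t] is the prediction made for observations[t]; the final
    element is the forecast for the period after the data ends.
    """
    if not observations:
        return []
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    # Assumption: the first forecast is seeded with the first observation.
    forecasts = [observations[0]]
    for actual in observations:
        # The error in the latest forecast drives the next forecast.
        error = actual - forecasts[-1]
        forecasts.append(forecasts[-1] + alpha * error)
    return forecasts


# Hypothetical data, purely for illustration.
print(exponential_smoothing([10, 12, 11, 13], alpha=0.5))
```

A larger `alpha` makes the forecast react faster to recent errors; a smaller one smooths them out. Either way, the rule is plain arithmetic that could be, and was, carried out by hand.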

Algorithms are essentially models of a situation being investigated. Complex situations with many variables are difficult to model, and precisely how these variables interact is not always known. Good models are parsimonious; they include just enough variables to identify salient features, but not so many that the model is not widely applicable. The availability of a lot of data from many variables does not necessarily lead to “better” models.

Also, what the data represent may matter more than the number of variables included in the model. So too does some understanding of the situation in which the model is being applied. Pairing the number of children born annually with the number of storks in each of a set of Eurasian countries will surely find a relationship between storks and children born. (This is because they are positively correlated: Large countries have large numbers of both storks and children born.) The relationship is driven by a third variable that is not in the model: the size of the countries.
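
The stork example can be sketched with simulated data. Everything below is hypothetical: the country sizes, the coefficients, and the noise levels are invented for illustration and are not real stork or birth statistics. The sketch computes the raw correlation between storks and births, then the partial correlation once country size is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after controlling for a third variable zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_countries(n=200, seed=0):
    """Hypothetical countries: storks and births both depend only on size."""
    rng = random.Random(seed)
    sizes = [rng.uniform(1, 100) for _ in range(n)]
    storks = [3 * s + rng.gauss(0, 10) for s in sizes]
    births = [50 * s + rng.gauss(0, 200) for s in sizes]
    return sizes, storks, births


sizes, storks, births = simulate_countries()
print("storks vs births:", round(pearson(storks, births), 3))
print("controlling for size:", round(partial_correlation(storks, births, sizes), 3))
```

In this simulation storks and births never influence each other, yet the raw correlation is strong. Once size is controlled for, the association should largely vanish, which is the kind of check that requires someone to know that size belongs in the model in the first place.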

Associations require interpretation. AI models tend to be constructed by data specialists with little knowledge of the circumstances in which the results will be used. They are no more immune to mistaken modeling and misinterpretation than the data technicians of the past who churned out senseless regression models just because the relationships looked statistically good. It has always been incumbent on those applying the models to assess their logic and reasonableness before applying them to real-life (and often high-stakes) decisions.

Furthermore, all data models using historic data are useful only to the extent that the past is a good predictor of the future. If something truly unexpected happens (e.g., based on a variable for which data are not being collected or are not included in the model), then errors will occur. The (in)famous Google Flu Trends model, which captured the pattern that the flu mostly occurs in winter, was unable to predict a flu outbreak in summer.
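
A toy illustration of this limitation, using invented numbers: the sketch below builds a forecaster that simply averages past counts for each calendar month, then confronts it with a year containing an out-of-season outbreak. It is not a reconstruction of how Google Flu Trends worked; the model, the function name, and all case counts are hypothetical.

```python
def seasonal_average_model(history):
    """Forecast each month as the average of that month across past years.

    history is a list of years, each a list of 12 monthly counts.
    """
    n_years = len(history)
    return [sum(year[m] for year in history) / n_years for m in range(12)]


# Hypothetical monthly case counts (January to December) with a winter peak.
history = [
    [90, 80, 50, 30, 15, 10, 10, 10, 20, 40, 60, 85],
    [95, 75, 55, 25, 15, 10, 10, 12, 18, 45, 65, 90],
]
forecast = seasonal_average_model(history)

# Hypothetical new year with an outbreak starting in July (index 6).
new_year = [92, 78, 52, 28, 15, 10, 70, 40, 20, 42, 62, 88]

errors = [actual - predicted for actual, predicted in zip(new_year, forecast)]
worst_month = max(range(12), key=lambda m: abs(errors[m]))
print("largest miss in month index", worst_month, "error", errors[worst_month])
```

The model tracks the familiar months closely and misses badly exactly where the new year departs from the historical pattern, because nothing in its inputs could have signaled the change.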

On biases

Policymakers have expressed concerns about biases in AI models. Ultimately, the models themselves cannot be biased, as all they do is manipulate the data they are provided with. If the data themselves contain biases, then these will manifest themselves in the model. But all data collection is subject to biases tied to how and why the data were collected. Data collected for one purpose can be used for another, but they will necessarily reflect any biases related to their collection for the first purpose. This applies to all data-modeling exercises, not just those associated with AI. While AI applications may highlight this, it is something that should always have been considered when assessing the use of models for decision-making. Caution is indicated for AI, but it is equally indicated for all use of models. This is not new.

Bronwyn Howell teaches Decision Making in the Victoria University of Wellington School of Management. The comments here relate to mathematical and statistical models. The author notes that AI models based on pictorial and linguistic pattern recognition are outside her scope of expertise — hence the title’s emphasis on “algorithmic” and not the broader “artificial” intelligence.

