Using AI to improve genetic selection

No, this is not an article about artificial insemination – I’m talking about the other “AI.” Artificial intelligence (AI) is a phrase that is hard to avoid.

Matt Spangler

Associate Professor / Beef Genetics Extension Specialist / University of Nebraska – Lincoln

From Google searches and driverless cars to Netflix suggestions on what to watch and TV commercials recommending solutions to the world’s largest problems, artificial intelligence affects most of us on a daily basis. Even fraud protection uses artificial intelligence to help determine whether a charge was likely mine or the result of a stolen credit card number.

Machine learning is a subfield of artificial intelligence that includes the study of algorithms to make predictions and inferences. If you think about it, the routine genetic evaluations that have been conducted for decades make use of machine learning – using statistical models and algorithms to make predictions of genetic merit and infer which animals are the better candidates for selection.

I would contend the U.S. beef industry has been making use of machine learning since 1971, when the analysis for the first Simmental Sire Summary was conducted at Boeing. (That’s where the computers were in those days.) However, there has been recent interest in investigating more complex machine learning algorithms for the purpose of predicting genetic merit estimates (expected progeny differences [EPDs] in beef cattle).

Across a multitude of studies, spanning several livestock and plant species, the general consensus of the scientific literature is: These more complex machine learning algorithms do not provide better predictions than the statistical framework currently being used in beef cattle genetic evaluation systems. This is largely because we focus on additive genetic merit, and thus linear models are appropriate to use. So is there a place for complex machine learning algorithms in beef cattle genetic evaluations?

Earlier, I mentioned the use of artificial intelligence for fraud protection. If we think about fraud as “unexpected occurrences,” then we can imagine that some of the data turned in to beef breed associations might deviate from what would be expected. More specifically, if we think about an EPD trait such as birthweight (BW), then the distribution of birthweights within a contemporary group (a group of animals born in the same herd in the same year, as an example) might deviate from what we would typically expect.

The birthweights of calves and the distribution of birthweights within a contemporary group are impacted by several factors including genetics, age of the dam, sex of the calf, etc. Moreover, the distribution could be impacted by the method used to collect the weight. Some people might use a digital scale (preferred), while others might use hoof tape or their eyes (i.e., guess).

The distributional properties of weights collected using various methods could be different. In other words, if you plotted weights collected by a scale, hoof tape and those by guess, the plots would likely look different in terms of the average, the number of observations on the heavy and light ends and the amount of variation in birthweight. This is a potential use of machine learning for routine genetic evaluations – categorizing data based on quality.
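To make the idea concrete, here is a minimal sketch in Python that simulates one contemporary group under each collection method and summarizes the resulting distributions. Every number in it (the mean of 80 pounds, the standard deviation of 10 pounds, how much a tape or a guess compresses the extremes, and rounding guesses to the nearest 5 pounds) is an assumption chosen for illustration, not an estimate from real data, and the function names are hypothetical.

```python
import random
import statistics


def simulate_group(method, n=200, seed=1):
    """Simulate birthweights (lb) for one contemporary group.

    All parameters are illustrative assumptions, not estimates from data.
    """
    rng = random.Random(seed)
    true_weights = [rng.gauss(80, 10) for _ in range(n)]
    if method == "scale":
        # Assumed: a digital scale records the true weight to the nearest pound.
        return [round(w) for w in true_weights]
    if method == "tape":
        # Assumed: a hoof tape compresses the extremes and adds some noise.
        return [round(80 + 0.7 * (w - 80) + rng.gauss(0, 3)) for w in true_weights]
    if method == "guess":
        # Assumed: guesses shrink toward the average and land on multiples of 5.
        return [5 * round((80 + 0.5 * (w - 80)) / 5) for w in true_weights]
    raise ValueError(f"unknown method: {method}")


def summarize(weights):
    """Summaries that could separate groups by collection method."""
    return {
        "mean": statistics.mean(weights),
        "sd": statistics.stdev(weights),
        "distinct_values": len(set(weights)),
        "share_multiple_of_5": sum(w % 5 == 0 for w in weights) / len(weights),
    }


if __name__ == "__main__":
    for method in ("scale", "tape", "guess"):
        print(method, summarize(simulate_group(method)))
```

Under these assumptions, the guessed weights show less variation, fewer distinct values and a much larger share of weights on multiples of 5 than the scale weights do. Summaries like these are the kind of signal a classifier could learn from.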

To investigate this, we used simulated data to train a deep neural network. A deep neural network can be thought of as a machine learning algorithm capable of contemplating complex interactions, much the same as the neurons in our brains do. In this case, the factors included those known to impact the distribution of birthweights within a contemporary group. We then applied the trained network to real beef cattle data to determine if we could reliably assign contemporary groups to categories based on perceived methods of data collection, and to assess the impact this might have on resulting EPDs for calving ease.

Contemporary groups were predicted to be in one of four categories:

Real (digital scale)
Tape (use of a hoof tape)
Dirty (think of peaks of observations at 70 pounds, 80 pounds, etc.)
Potentially fabricated
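The “dirty” category is the easiest one to picture in code. The sketch below is not the deep neural network described above; it is a much simpler stand-in that measures how strongly recorded weights pile up on multiples of 10 pounds. The function names and the threshold of 3.0 are hypothetical choices for illustration only.

```python
def heaping_ratio(weights, base=10):
    """Share of weights on multiples of `base`, relative to the share
    expected (1 / base) if final digits were spread evenly.

    A ratio near 1 suggests no heaping; larger values suggest peaks at
    round numbers such as 70, 80 and 90 pounds.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    on_multiple = sum(round(w) % base == 0 for w in weights)
    return on_multiple * base / len(weights)


def looks_heaped(weights, base=10, threshold=3.0):
    """Flag a contemporary group whose weights cluster on round numbers.

    The threshold is an arbitrary illustrative value, not a validated cutoff.
    """
    return heaping_ratio(weights, base) >= threshold


if __name__ == "__main__":
    # Made-up weights (lb) for two small hypothetical groups.
    evenly_spread = list(range(71, 91))
    heaped = [70, 70, 80, 80, 80, 90, 75, 82, 80, 70]
    print(heaping_ratio(evenly_spread), looks_heaped(evenly_spread))
    print(heaping_ratio(heaped), looks_heaped(heaped))
```

A single rule like this can only catch one pattern. The appeal of a neural network is that it can weigh many such signals at once, along with factors such as sex of the calf and age of the dam.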

Admittedly, “fabricated” is a harsh word, but groups placed in this category did not follow biological expectations relative to the distribution of calf birthweights within the group. Results showed that correlations between EPDs when fabricated groups were removed from the genetic evaluation, compared to when they were retained, were high for both calving ease direct and calving ease total maternal (0.91 and 0.86, respectively).

However, the impacts are more likely to be noticed by individual herds and sires than as large shifts in the overall population. Animals were more likely to have large changes in their calving ease EPD if they were at moderate levels of accuracy (animals with some progeny), simply because some of these progeny could be removed from the analysis. This method of using machine learning to classify data before entry into genetic evaluations is currently being used by International Genetic Solutions, and this group has further demonstrated through validation that removing contemporary groups classified as fabricated leads to more accurate predictions of BW.
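The comparison described above has two parts: a population-level correlation between EPDs from the two evaluations, and a look at which individual animals moved the most. Here is a minimal sketch of both, using made-up EPD values and hypothetical animal IDs. None of these numbers come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def largest_changes(ids, retained, removed, top=3):
    """IDs of the animals whose EPD changed most between the two runs."""
    changes = sorted(
        ((abs(r - k), animal) for animal, k, r in zip(ids, retained, removed)),
        reverse=True,
    )
    return [animal for _, animal in changes[:top]]


if __name__ == "__main__":
    # Hypothetical calving ease EPDs: all groups retained vs. flagged groups removed.
    ids = ["A", "B", "C", "D", "E", "F"]
    retained = [4.0, 6.5, 8.0, 10.0, 12.5, 14.0]
    removed = [4.2, 6.4, 11.0, 10.1, 12.3, 14.2]
    print("correlation:", round(pearson(retained, removed), 2))
    print("largest change:", largest_changes(ids, retained, removed, top=1))
```

The point of pairing the two summaries is that a high correlation across the whole population can still leave room for a sizable change in one animal’s EPD.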

This represents a practical example of how emerging technologies such as artificial intelligence can be used to make genetic evaluations more accurate. In addition to classifying data based on how informative it might be, machine learning could also be helpful as we advance beyond genetic prediction toward predicting phenotypic outcomes. For example, predicting the optimal time on feed for a group of cattle requires combining the effects of genetics, prior management history, planned management practices and changing weather data. Where artificial intelligence could be helpful is in making use of the complex, hidden interactions among all of these factors to enable better management and marketing decisions.