The 2017 AI Challenge

Background

Food Security

Our world is running out of cropland. We’ll add 2 billion more people by the year 2050 [1], but we’re currently using our arable land and water 50 percent faster than the planet can sustain [2]. At the same time, the crops farmers plant face an unprecedented set of obstacles due to increasingly limited growing conditions and climate change.

How will we be able to grow enough food to meet world demand?

Today, the agriculture industry works to optimize the amount of food we gain from each plant by breeding varieties with the strongest, highest-yielding genetics. Scientists at research and development organizations like Syngenta create stronger plants by crossing two plant varieties as parents, and then selecting the best offspring over time to provide to farmers.


The current breeding process, however, is highly technical and cumbersome. One cycle takes about nine years, requires vast testing resources and results in only moderate yield increases (called genetic gain) in crops. It includes many failures along the way.

We believe data-driven strategies can help our industry breed better seeds, faster. Developing models that identify robust patterns in seed genetic data may help us more accurately choose seeds that increase the genetic gain of the crops we plant – and will help us address the growing global food demand.

Research Problem

Each seed variety of any plant has a unique genetic composition and must pass through a series of “stage gates” in order to be selected by scientists to breed (Figure 1). Each year, after the data from yield tests are analyzed, breeders decide whether to continue testing the variety or discard it. At the final stage gate is the decision to offer the seed variety to growers.

Figure 1: Testing and selection scheme for the class of 2014 seeds. Several hundred experimental soybean varieties were evaluated at up to 10 locations in 2012. After the experiments were harvested and the yield data collected, 15% of the varieties were selected to advance to the next year of testing, while the rest were discarded. In 2013, the selected varieties were evaluated at up to 30 locations, and the top-performing 5% were selected for the final year of evaluation. Following testing in 2014, the top-performing 5% of varieties were selected to become commercially available for farmers to buy.
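A single stage gate from Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ranking by mean yield across locations, the rounding rule, the function name advance_top_fraction, and the sample yields are all hypothetical and are not taken from the actual selection procedure.

```python
from statistics import mean


def advance_top_fraction(yields_by_variety, fraction):
    """Keep the top `fraction` of varieties, ranked by mean yield across locations.

    Assumption: breeders rank on mean yield; the real criterion may differ.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(
        yields_by_variety,
        key=lambda variety: mean(yields_by_variety[variety]),
        reverse=True,
    )
    # Assumption: round to the nearest count, but always keep at least one variety.
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical yields (arbitrary units) for 20 varieties at 3 locations each.
stage_one = {f"V{i:02d}": [50 + i, 48 + i, 52 + i] for i in range(20)}

# Apply the 15% stage-one gate described in Figure 1.
advanced = advance_top_fraction(stage_one, 0.15)
print(advanced)
```

The same function could be called again with 0.05 for each of the later gates.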

Although this is one way to select varieties, it does not show a variety’s true fitness once it is planted. Many varieties prove unsuccessful (non-elite) after they become commercial. We consider this a Type I error, that is, a false positive.
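To make the Type I error concrete, the short Python sketch below computes the share of commercialized varieties that turned out to be non-elite. The function name type_i_share and the sample outcomes are hypothetical and serve only to illustrate the definition.

```python
def type_i_share(elite_flags):
    """Share of commercialized varieties that proved non-elite in the field.

    `elite_flags` holds one boolean per commercialized variety (True = elite).
    """
    if not elite_flags:
        raise ValueError("need at least one commercialized variety")
    errors = sum(1 for elite in elite_flags if not elite)
    return errors / len(elite_flags)


# Hypothetical field outcomes for five commercialized varieties.
outcomes = [True, False, False, True, False]
share = type_i_share(outcomes)
print(share)
```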

                      Class of 2011   Class of 2012   Class of 2013   Class of 2014
Stage One Testing     2009            2010            2011            2012
Stage Two Testing     2010            2011            2012            2013
Stage Three Testing   2011            2012            2013            2014
Field Evaluations     2012-2013       2013-2014       2014-2015       Predict 2015-2016

Table 1. Data structure: commercialization year (class) and year of testing at each stage. The 2011 to 2013 classes can be used to train a model to make predictions for the class of 2014.



[1] United Nations Report, 2013
[2] Global Footprint Network