Our world is running out of cropland. We'll add 2 billion more people by the year 2050,¹ yet we are already using arable land and water 50 percent faster than the planet can sustain.² At the same time, the crops farmers plant face an unprecedented set of obstacles from increasingly limited growing conditions and climate change.
How will we be able to grow enough food to meet world demand?
Today, the agriculture industry works to optimize the amount of food we gain from each plant by breeding varieties with the strongest, highest-yielding genetics. Scientists at research and development organizations like Syngenta create stronger plants by crossing two plant varieties as parents, and then selecting the best offspring over time to provide to farmers.
The current breeding process, however, is highly technical and cumbersome. One cycle takes about nine years, requires vast testing resources, involves many failures along the way, and results in only moderate yield increases in crops (called genetic gain).
We believe data-driven strategies can help our industry breed better seeds, faster. Models that identify robust patterns in seed genetic data may help us more accurately choose the seeds that increase the genetic gain of the crops we plant, and in turn help us meet growing global food demand.
Each experimental seed variety has a unique genetic composition, and before scientists select it, it must pass through a series of "stage gates" (Figure 1). Each year, after the yield-test data are analyzed, breeders decide whether to continue testing a variety or discard it. The final stage gate is the decision to offer the variety to growers.
Figure 1: Testing and selection scheme for the class of 2014 seeds. Several hundred experimental soybean varieties were evaluated at up to 10 locations in 2012. After the experiments were harvested and the yield data collected, 15% of the varieties were selected to advance to the next year of testing, while the rest were discarded. In 2013, the selected varieties were evaluated at up to 30 locations, with the top-performing 5% selected for the final year of evaluation. Following testing in 2014, the top-performing 5% of varieties were selected to become commercially available for farmers to buy.
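Each stage gate in Figure 1 is, in effect, truncation selection: rank the varieties on their trial results and advance a fixed top fraction. The Python sketch below assumes the ranking uses only mean yield across test locations; the function name `stage_gate`, the variety names and the yield values are hypothetical, and real breeding decisions weigh more than mean yield.

```python
from statistics import mean


def stage_gate(yields_by_variety, keep_fraction):
    """Advance the top `keep_fraction` of varieties ranked by mean yield.

    yields_by_variety maps a (hypothetical) variety name to the yields it
    produced at each test location where it was grown.
    """
    # Rank varieties from highest to lowest mean yield across locations.
    ranked = sorted(
        yields_by_variety,
        key=lambda v: mean(yields_by_variety[v]),
        reverse=True,
    )
    # round() goes to the nearest integer (halves round to even); keep at
    # least one variety so a small trial never empties the pipeline.
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Hypothetical yields for four varieties, each tested at three locations.
trial = {
    "V1": [52.0, 55.5, 50.1],
    "V2": [61.2, 58.9, 60.4],
    "V3": [47.3, 49.0, 45.8],
    "V4": [57.7, 59.1, 56.2],
}
print(stage_gate(trial, 0.5))  # ['V2', 'V4']
```

Running the gate once per year on the varieties that survived the previous gate mirrors the multi-year funnel in Figure 1; the `max(1, ...)` guard is a design choice so that a tiny trial still advances its best variety.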
This is one way to select varieties, but yield trials do not reveal a variety's true fitness once it is planted commercially. Many varieties that pass every stage gate turn out to be unsuccessful (non-elite) after they become commercial. We consider this a Type I error: a false positive, in which a non-elite variety is selected.
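To make the Type I error concrete, the sketch below counts selection outcomes against later field performance. It assumes we know, for each variety, whether it was selected and whether it later proved elite; the function name `type_i_error_rate` and the example labels are hypothetical. The rate computed is the standard false-positive rate, FP / (FP + TN), the share of non-elite varieties that were nonetheless selected.

```python
def type_i_error_rate(selected, elite):
    """False-positive (Type I) rate of a selection decision.

    selected[i] is True if variety i was advanced or commercialized;
    elite[i] is True if variety i later proved elite in the field.
    """
    pairs = list(zip(selected, elite))
    # Type I error: selected, but not elite.
    false_pos = sum(s and not e for s, e in pairs)
    # Correct rejection: not selected, and not elite.
    true_neg = sum((not s) and (not e) for s, e in pairs)
    non_elite = false_pos + true_neg
    return false_pos / non_elite if non_elite else 0.0


# Hypothetical outcomes for six varieties.
selected = [True, True, True, False, False, False]
elite = [True, False, False, False, False, True]
print(type_i_error_rate(selected, elite))  # 0.5
```

In this toy data, the last variety is elite but was discarded; that is the opposite mistake, a Type II error (false negative), which this rate does not count.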
| | Class of 2011 | Class of 2012 | Class of 2013 | Class of 2014 |
|---|---|---|---|---|
| Stage One Testing | 2009 | 2010 | 2011 | 2012 |
| Stage Two Testing | 2010 | 2011 | 2012 | 2013 |
| Stage Three Testing | 2011 | 2012 | 2013 | 2014 |
| Field Evaluations | 2012-2013 | 2013-2014 | 2014-2015 | 2015-2016 (to be predicted) |
Table 1. Data structure: commercialization year, class, and year of testing. The 2011 to 2013 classes can be used to train a model to make predictions for the class of 2014.