The 2017 AI Challenge

It is November 2014 and you are a breeder at the end stage of seed selection. You are responsible for selecting varieties for commercial release (Table 1). You have data from the current testing year (2014) and variety performance data from previous years.

Goal of the Challenge: To develop a model that could be used to help scientists analyze large amounts of seed data more efficiently and effectively, leading to improvements in the world’s ability to grow more without using more resources.

Research Question: Which soybean varieties will perform better in farmers’ fields in 2015 & 2016?


Submit your solution in the following four parts:


  1. Design a model that predicts the 2015-2016 yield (a proxy for true fitness) of the seed varieties from the class of 2014. Provide your yield predictions as a full codalab.org software/model pipeline that operates on the dataset provided and outputs a file with one prediction per line:

    VARIETY_ID, PREDICTION
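
As an illustration of the output format above, here is a minimal Python sketch of the last step of such a pipeline. The function name `write_predictions`, the variety IDs, and the yield values are hypothetical; the two-decimal rounding and the sort order are assumptions too, since the challenge only specifies one `VARIETY_ID, PREDICTION` pair per line.

```python
import io


def write_predictions(predictions, out):
    """Write one 'VARIETY_ID, PREDICTION' line per variety to a text stream.

    predictions: dict mapping a variety ID (str) to a predicted yield (float).
    out: any writable text stream, such as an open file or io.StringIO.
    """
    # Sorting by ID is an assumption; it only makes the output reproducible.
    for variety_id, predicted_yield in sorted(predictions.items()):
        out.write(f"{variety_id}, {predicted_yield:.2f}\n")


# Hypothetical variety IDs and predicted yields, for illustration only.
example = {"V000123": 61.5, "V000045": 58.25}
buffer = io.StringIO()
write_predictions(example, buffer)
output_text = buffer.getvalue()
print(output_text, end="")
```

In a real submission the stream would be a file opened for writing, and the dictionary would come from the model's predictions for the class of 2014.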

In a 5- to 20-page scientific write-up, address the following:

  2. Predict and explain which seed varieties tested for commercialization in 2014 were truly “elite”
  3. Use information from previous years to provide estimates of Type I errors, and provide recommendations to reduce them
  4. Identify patterns in the genetic information that predict whether a variety is “elite” and support how you arrived at your conclusion
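
One possible way to frame the Type I error estimate requested above is sketched here in Python. It assumes that a Type I error means a variety that was selected as elite but later proved not to be elite, and it reports the share of non-elite varieties that were nevertheless selected. That definition, the function name `type_i_error_rate`, and the five example outcomes are all assumptions for illustration and are not taken from the challenge data.

```python
def type_i_error_rate(selected, truly_elite):
    """Fraction of non-elite varieties that were nevertheless selected.

    selected, truly_elite: equal-length sequences of booleans,
    one entry per variety from an earlier class.
    """
    if len(selected) != len(truly_elite):
        raise ValueError("inputs must have the same length")
    # False positives: selected as elite, but not truly elite.
    false_positives = sum(s and not t for s, t in zip(selected, truly_elite))
    # All varieties that were not truly elite, selected or not.
    negatives = sum(not t for t in truly_elite)
    if negatives == 0:
        raise ValueError("no non-elite varieties; the rate is undefined")
    return false_positives / negatives


# Hypothetical outcomes for five varieties, for illustration only.
selected = [True, True, False, True, False]
truly_elite = [True, False, False, False, True]
rate = type_i_error_rate(selected, truly_elite)
print(f"Estimated Type I error rate: {rate:.2f}")
```

Other definitions are defensible, for example the share of selected varieties that were not elite, so the write-up should state which one it uses.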

In order to help promote generalizable models, we plan to release training data for the Syngenta AI Challenge in stages. The following schedule will apply:


Stage 1 - January 30, 2017

Participants will receive data for the year 2012 describing all varieties that were tested in the 2014 class, along with the experimental yield data for those varieties and their geographic, soil, and genetic characteristics.

Stage 2 - March 1, 2017

Participants will receive data describing varieties of the 2014 class, covering all experimental trials for the year 2013 and indicating which of the Stage 1 varieties advanced to the following stage.

Stage 3 - May 1, 2017

Participants will receive data describing varieties of the 2014 class, covering all experimental trials for the year 2014 and indicating which of the Stage 1 and Stage 2 varieties advanced to the final stage.