This news blog provides updates on the e-IRG and related e-Infrastructure topics.


ECMWF's scalability programme and co-design effort result in proposal for "Extreme Earth" Flagship Initiative

During the recent e-IRG Open Workshop in Sofia, Bulgaria, Dr. Peter Bauer, Deputy Director of Research at the European Centre for Medium-Range Weather Forecasts (ECMWF), gave a talk on the need for co-design in supercomputers for weather prediction. Afterwards, we met with Peter Bauer, who was happy to join the e-IRG Workshop in Sofia. He described it as a well-organized event that brought together national representatives involved in high performance computing (HPC) strategy and in planning how to handle Big Data in the future with application owners such as ECMWF.

ECMWF is an international, independent organisation based in Reading, in the United Kingdom. It was founded by its Member States in 1975 to concentrate scientific excellence and resources in a single place and to provide medium-range forecasts much better than any national weather service could produce on its own. Since then, the concept has proven successful: ECMWF is the leader in medium-range global weather forecasting.

Despite this success over 40 years, ECMWF is now reaching a period in which meeting the ever increasing forecast skill requirements of its Member States and of the global community on future computers becomes increasingly difficult, because the codes become more and more complex. ECMWF wants to invest in spatial resolution, which allows its researchers to represent physical processes with more and more accuracy. ECMWF also wants to invest in complexity, because eventually the scientists have to represent the entire complexity of Earth system processes in a single forecasting system: atmospheric processes, and not only the physics but also atmospheric chemistry, ocean processes, sea ice, land surface processes, and the interaction of all these components in the system.

ECMWF also invests a great deal in ensemble prediction. Ensemble prediction was developed by the meteorological community to forecast not only a single value of temperature or precipitation at a place, but also the uncertainty of that prediction, which varies a lot depending on how good the model and the initial conditions are, of course, but also on the weather regime. In certain regimes, forecasting temperatures in Sofia for the next ten days is easier than in others, and the researchers need to assess that to give their users a feel for the probability or likelihood of events and confidence in their decision making.
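As a rough illustration of what ensemble prediction delivers, here is a minimal Python sketch. The member values and the 22-degree threshold are made up for this example; a real ensemble such as ECMWF's comprises around 50 perturbed forecasts. The spread of the members serves as a measure of how predictable the current regime is, and the fraction of members above a threshold gives the probability of an event.

```python
# Minimal sketch of ensemble-based uncertainty estimation (illustrative values only).
import numpy as np

# 2-metre temperature forecasts (degrees C) for one location and one lead time,
# one value per ensemble member (hypothetical numbers)
members = np.array([21.3, 22.1, 20.8, 23.0, 21.7, 22.4, 20.5, 21.9, 22.8, 21.1])

ens_mean = members.mean()               # best single estimate
ens_spread = members.std(ddof=1)        # uncertainty: small spread = predictable regime
p_above_22 = (members > 22.0).mean()    # probability of exceeding a (hypothetical) threshold

print(f"ensemble mean:   {ens_mean:.1f} C")
print(f"ensemble spread: {ens_spread:.1f} C")
print(f"P(T > 22 C):     {p_above_22:.0%}")
```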

All these investments in resolution, complexity and ensembles require enhanced computing. The models become more and more complex and more and more difficult to run on the current HPC infrastructures. This is why ECMWF needs a concerted investment: improving the models, investing at the same time in computer code development, and working together with the high performance computing industry and experts to also improve the hardware side of this coin. This is the real co-design effort that ECMWF needs to invest in to be able to run its models with much enhanced prediction skill, but at an affordable cost, even in ten years.

How are you working on that? Are you starting new projects?

ECMWF understands the challenge as something that cannot rely only on individual, small research projects but that actually needs a comprehensive effort. For this purpose, ECMWF founded a so-called scalability programme about five years ago. The researchers decided that it is not just a small-scale effort looking at one particular aspect of the forecasting model to make it run faster, but a look at every individual aspect of the entire chain: receiving observational data and dealing with increasing volumes of satellite observations, for example, making the model faster, but also producing the forecast products faster for the Member States and users to use.

This entire chain needs to be improved. It requires mathematical changes and coding changes. It requires workflow changes and a tight collaboration between mathematical or algorithmic developments and workflows, but in the end the researchers also have to put this on computers whose architectures are changing right now. So ECMWF is aiming at moving targets: it needs to make sure that it can run these codes on today's technology but also on the technologies that are expected in 10 or 20 years.

Do you expect that there will be different kinds of architectures for different kinds of applications in the future?

There is right now a discussion between general-purpose computing and domain-specific computing. General-purpose computing is what we had in the past: single-type architectures such as x86 CPUs that served everybody sufficiently well and that scaled, following Moore's Law, with the increasing requirements of most applications. That scaling appears to be coming to a halt now, and new technologies are appearing: processors aimed at different types of users, such as GPUs, FPGAs and other types of CPUs. We need to make sure that we can exploit those. Some of them promise the same computational performance, or even better, at much lower power cost, and this is something that ECMWF needs to exploit given its specific requirements. The investments needed to make ECMWF's codes work on these types of architecture in turn require research investment in programming models.

One thing is to develop the physics that you put in code and that you want to run on any type of machine, but some of these architectures require specific programming models to exploit their particular features, and we need to invest in that. This is not something an application owner can do on its own; it needs to be done in a concerted effort, a so-called co-design.
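To make the separation concrete, here is a minimal, hypothetical sketch in Python of the "same physics, different programming model" idea: the kernel is written once against an array interface, and the backend module (NumPy on CPUs, CuPy on NVIDIA GPUs) is selected at run time. ECMWF's actual forecasting codes are not Python; this only illustrates why the physics and the architecture-specific layer are developed as separate concerns.

```python
# Hypothetical illustration: one "physics" kernel, two backends.
# NumPy runs the computation on the CPU; CuPy, if installed, exposes the same
# array API on NVIDIA GPUs, so the kernel itself does not need to change.
import numpy as np

try:
    import cupy as xp          # GPU backend (assumed available)
    backend = "GPU (CuPy)"
except ImportError:
    xp = np                    # fall back to the CPU backend
    backend = "CPU (NumPy)"

def saturation_vapour_pressure(t_celsius):
    """Toy pointwise kernel: Magnus-type formula for saturation vapour pressure (hPa)."""
    return 6.112 * xp.exp(17.62 * t_celsius / (243.12 + t_celsius))

temperatures = xp.linspace(-40.0, 40.0, 1_000_000)   # a large field of temperatures
es = saturation_vapour_pressure(temperatures)

print(backend, "| mean saturation vapour pressure:", float(es.mean()), "hPa")
```

In practice the architecture-specific layer is rarely this thin: exploiting GPUs or FPGAs well usually means restructuring memory layouts and parallelism, which is exactly the kind of work that has to be done jointly with hardware experts.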

The scalability programme started in 2013 and has gained significant momentum since then. It is strongly supported by ECMWF's Member States, and ECMWF has succeeded in acquiring a number of externally funded projects, in which it participates and some of which it coordinates. These projects are funded by the European Commission's Horizon 2020 programme and focus on many different aspects such as algorithms, programming models and workflows.

What the researchers realized when they discussed this entire complex of issues within their community is that really redesigning forecasting systems across an entire community requires a much larger effort. ECMWF has formulated this in a proposal to the European Commission for a future Flagship Initiative. Flagships are very ambitious science projects supported by the European Commission: 10-year projects that try to unite entire communities to perform fundamental scientific research and bring about structural changes in Europe that serve European society. Examples are the Human Brain Project, the Graphene Flagship and the Quantum Technologies Flagship.

Recently, the European Commission drafted another Call for ideas for a new generation of Flagships. The weather forecasting community has responded to it and formulated an idea called "Extreme Earth", which focuses on the complete revision and renovation of forecasting systems, with a focus on extremes in the Earth system, hence the name. It focuses on forecasting extremes because extremes are very impactful. Just think of the hurricanes that affected large parts of the southern United States last Summer and Autumn: just two of them, Harvey and Irma, caused some 300 billion worth of damage. Extremes like these, but also droughts, earthquakes and volcanoes, are extremes of the Earth system that we need to predict with much enhanced reliability in the future. The Earth system aspect of Extreme Earth brings together what we do in the atmosphere, ocean and sea ice with the solid Earth community; this is reflected in the volcano and earthquake aspects of the initiative.

This idea brings together these types of applications with high performance computing as the main enabling technology. High performance computing here means not only the actual computing aspects and the technology mentioned earlier, but also the data issues: dealing with large volumes and very diverse types of data from the entire Earth system in the future, how to manage them, and how to make this information available so that users across the community, as well as downstream communities such as the energy and food sectors, risk assessment and risk management, and national agencies in that context, can use the data in a much more efficient way.

Thank you very much for this interview.