One of the keynote speakers at ISC'17 in Frankfurt, Germany, was Peter Bauer from the European Centre for Medium-Range Weather Forecasts (ECMWF). ECMWF is a European intergovernmental organisation, not to be confused with the European Union, although its Member States largely overlap. It was founded 40 years ago with the idea of centralizing scientific excellence and HPC power in one place. According to Peter Bauer, it has been very successful ever since: for 40 years it has been the world-leading centre for medium-range weather prediction, making it a very successful European project.
Peter Bauer promised a light presentation for people who are not familiar with weather forecasting or climate prediction, so that they would get some insight into what it actually takes. What lies behind weather forecasting and climate prediction is very complex and will eventually be an exascale problem, even though it is not quite one yet.
Before he went on, Peter Bauer thanked a number of people, mostly from ECMWF but not exclusively: Bjorn Stevens, a professor at the Max Planck Institute in Hamburg; Willem Deconinck of ECMWF; Peter Messmer of NVIDIA; and Thomas Schulthess of CSCS in Lugano. The team participates in HPC-related projects such as ESCAPE, ESiWACE, and NEXTGenIO.
Peter Bauer gave the audience a bit of history on the first attempts at weather prediction. The idea that weather needs to be, and most likely is, predictable stems from Vice-Admiral Robert FitzRoy (1805-1865), who captained HMS Beagle on Charles Darwin's voyage to South America. The two did not get along very well, but both excelled in their respective disciplines. FitzRoy wrote a number of letters to The Times, a well-regarded journal at the time, from which the audience could see some quotes. The letters were addressed to "those whose hats have been spoiled by umbrellas having been omitted".
Basically, FitzRoy's attempts to measure pressure and, from that, to predict the weather to a certain extent stemmed from the need to protect sailors' lives and to give storm warnings. Later, the field became more physical and more scientific. There were parallel efforts in the US and Europe, with Cleveland Abbe (1838-1916) in the US and Vilhelm Bjerknes (1862-1951) in Europe formulating the first set of equations that are still used today in weather prediction. They are mostly based on fluid dynamics, but of course they included simplifications, because at that time the knowledge was not there to cast these equations in full form with all the prognostic variables available today, Peter Bauer explained.
The person who really advanced weather prediction was Lewis Fry Richardson (1881-1953). He tried to cast the analytic equations into a framework for solving them. His first attempt was not very good: the outcome was wrong, and it was only later understood why. Nonetheless, the methodology was correct, and it was the first time this had been done. It was the kickoff for numerical weather prediction, and this is where the term "numerical" comes from.
Computers only became involved later; in Richardson's time it was a manual process. Where today's weather forecast runs on thousands of cores, back then it would have taken thousands of scientists performing the calculations manually and communicating by shouting. Later, electronic computers became more and more involved. Peter Bauer cited John von Neumann's projects investing in electronic computing, one of whose sub-projects was on weather forecasting. It was headed by Jule Charney (1917-1981), who refined the equations even further towards those we use today.
One of the first calculations, published in 1950, was performed on the Electronic Numerical Integrator and Computer (ENIAC), a machine set up in Maryland. It was quite a beast: 30 tons, built with about 18,000 thermionic valves. The first forecasts were done in a very simplified way, basically just covering the US with a 16-by-19 grid of points, not equally spaced, at a resolution of the order of 400 to 700 kilometers. A single 24-hour forecast needed 24 hours of compute time, but it was quite a demonstration, Peter Bauer told the audience. In 2006, some students put the same calculation on a Nokia phone, many mobile-phone generations ago, and the same forecast took only one second, which tells you something about the evolution of computing.
Another very important milestone was the introduction of the concepts of uncertainty and probability. For any prediction that is not exact, that has to be initialized somewhere, and whose conditions are imperfectly known, you need to quantify uncertainty. Peter Bauer showed a graph and explained that scientists need to initialize their models not from a single point of exact knowledge of the current state of the global atmosphere, but from an uncertain region, illustrated by a circle in the graph. The scientists launch a whole bunch of forecasts with slightly different initial conditions and obtain a range of outcomes that is narrower than climatology; after all, they want a forecast for tomorrow that is more precise than the long-term average. They narrow the space of predictions by running 50 or 100 individual forecasts, called ensemble forecasts. Peter Bauer showed that this gives the scientists 50 different scenarios of likely weather evolutions over 4 to 10 days. This is what scientists do today.
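The ensemble idea can be sketched with a toy chaotic system. Below is a minimal illustration in Python, assuming the classic Lorenz-63 equations as a stand-in for the atmosphere (this is not ECMWF's model, and the function names are invented for this sketch): many runs from slightly perturbed initial conditions fan out into a distribution of plausible outcomes.

```python
import random

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 system, a toy chaotic 'atmosphere'."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)

def ensemble_forecast(initial, n_members=50, perturbation=1e-3, n_steps=1500):
    """Run n_members forecasts from slightly perturbed initial conditions."""
    rng = random.Random(42)
    members = []
    for _ in range(n_members):
        # Perturb each initial variable by a small random amount.
        state = tuple(v + rng.gauss(0.0, perturbation) for v in initial)
        for _ in range(n_steps):
            state = lorenz63_step(state)
        members.append(state)
    return members

# 50 'scenarios' that started almost identically end up spread out:
members = ensemble_forecast((1.0, 1.0, 1.0))
xs = [m[0] for m in members]
print(f"ensemble spread in x after 1500 steps: {max(xs) - min(xs):.2f}")
```

The spread of the ensemble members is exactly the probability information that a single deterministic forecast cannot provide.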
The concept originated with Ed Lorenz (1917-2008); most people know the butterfly graph. Lorenz is the founder of chaos theory, which is still used today at all scales. Instead of predicting only a single state of the atmosphere, scientists predict a probability distribution of states. That is very important for predicting extremes, where you look at the probability distribution and at the tails of these distributions, Peter Bauer explained.
As for today's skill: the good thing about weather prediction is that you can verify it, and meteorologists do so every day. They use about 40 million observations per day in the analysis system that creates the initial conditions, and roughly the same number of observations is available to verify every forecast every day. As an application, it is quite unique in that respect. In the graph Peter Bauer showed, the audience could see a time series of the evolution of skill, measured by a particular skill score; it does not matter which one, but higher is better. Different colours marked different weather parameters such as surface pressure, geopotential (an incarnation of large-scale dynamics), lower-level temperature, 2-meter temperature, thermal wind, and cloud cover. Most of these are parameters we can all relate to, and they are all important to the experts; getting the right ones right matters, according to Peter Bauer.
He showed that they all go up: there is a steady improvement in predictive skill, for many reasons. But they do not go up in the same way, because they depend on different things in the model: on how you change the resolution, or on how you improve a physics scheme of one kind or another. You then get improvements in one of these parameters, but it is not straightforward to enforce improvements everywhere all the time. That is one important point, but the main message is that the experts can verify their predictions every day.
Next, Peter Bauer spoke about an area where Europe is world-leading. Taking one of these measures as a time series of skill, one can see that the European models, the blue and the red curves, are leading the pack compared to other models. Peter Bauer found this quite stunning. It also proves that the concept of centralizing excellence and HPC in one place pays off.
In the future, Peter Bauer expects improvements to come from a number of areas. The obvious one is, of course, model resolution. Another is model complexity: one has to understand that this becomes increasingly important at the global range, because the Earth is not just an atmosphere; the atmosphere is affected by the ocean, the land surface, sea ice, perhaps the higher atmosphere, and so forth. The experts have to add model complexity to enhance skill. Peter Bauer also mentioned the ensemble component, where ideally the experts use the same type of models to run long integrations over many time steps. In terms of predictive skill, resolution buys accuracy, as does investment in ensemble prediction to predict the probability distribution well, while complexity buys range, because many processes in the ocean are slower than in the atmosphere, so coupling with the ocean gives a better grip on the longer time scales.
In terms of cost, ensembles are just copies of the model, so their cost is linear in HPC terms. Complexity, Peter Bauer said, is worse than linear, because it is not just a linear superposition of added prognostic variables. Resolution is about cubic: two dimensions in the horizontal plus the time step. These factors need to be traded off against each other, depending on what you actually want to do with your forecast model.
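These scaling relationships can be made concrete with a back-of-the-envelope calculation. The following sketch assumes the exponents quoted in the talk (cubic in resolution, linear in ensemble size); the function is invented for illustration, not an official ECMWF cost formula.

```python
def cost_factors(refinement, ensemble_members=1, complexity_factor=1.0):
    """Rough cost multipliers for a forecast system upgrade.

    - resolution: ~cubic in the refinement factor (two horizontal
      dimensions plus a proportionally smaller time step),
    - ensembles: linear (copies of the model),
    - complexity: at least linear (taken as a plain factor here).
    """
    resolution_cost = refinement ** 3
    return resolution_cost * ensemble_members * complexity_factor

# Refining the grid spacing from 10 km to 1 km (a factor of 10):
print(cost_factors(10))                        # 1000.0
# The same refinement plus a 50-member ensemble on top:
print(cost_factors(10, ensemble_members=50))   # 50000.0
```

This is why the trade-off between resolution, complexity, and ensemble size matters so much: the cubic resolution term dominates quickly.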
Since the experts are limited by resolution, and given the cubic cost factor Peter Bauer mentioned, they need to think carefully about whether to invest in resolution, because the cost growth is huge. Obviously, you do not resolve all scales: some, like small clouds, convection, and some radiation effects at the land surface, are not resolved and not accurately represented in models. In the equations Peter Bauer showed, the black terms are resolved, so they are described very accurately in the model as long as there is enough resolution. The problem lies with the red terms, which come from parameterizations, and they matter: if the parameterizations are very crude, they strongly affect the accuracy of the predictions. Ideally, you resolve at a fine enough resolution to need as few parameterizations as possible, but again, that comes at a cost.
If you want to get rid of these red terms, the only solution is to drop the parameterizations and crank up your resolution. Peter Bauer pointed to the one big factor in the middle, convection: the deep convective clouds you see in the tropics, for example, or storm clouds. This is parameterized in all models and is a large source of error. Even though these systems develop quickly in the tropics, with life cycles of half a day, they affect the large-scale circulation, and experts believe that the link between the tropics and the high latitudes depends a lot on how accurately convection can be represented. Ideally, the experts would not need a convection parameterization at all; ideally, they would resolve it, but that requires scales of about one kilometer, while right now they are at about 10 kilometers. You could play this game further down: the further you go, the more parameterizations you lose and the more accurate the model becomes, but of course there are limitations in computing.
In any case, Peter Bauer wanted to make one point: what his team now formulates as its leading key science case is global capability at one kilometer in their Earth system model, at a calculation rate of about one to five simulated years per day. They are not there yet; the current rate is about 240 forecast days per day, roughly three quarters of a year. If you multiply out the grid points, the layers, the variables, and the ensembles, that is an exascale problem, Peter Bauer stated. It is a very important science case; his team can formulate it and they all agree on it.
Peter Bauer and his team recently produced some scaling graphs, based on the ECMWF model, mostly in single precision. He showed the forecast-days-per-day figure. The team is aiming at one to five years per day; at the centre of the graph the audience saw about 240 days per day, which translates to completing a 10-day forecast in one hour, something the team does twice a day. Today, the team runs a single forecast at 9 kilometers; in about five years, it would like to run this as an ensemble, which is already bigger than the single cluster ECMWF has. Going to higher resolutions of 5 kilometers, 2.5 kilometers, or 1 kilometer multiplies the roughly 35,000 cores by a factor of 15, and even if everything scales well, you still have to multiply by the ensemble size on top, so the team is still far away from it.
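The throughput figures quoted here are easy to relate to one another. A small check (the function name is invented for illustration):

```python
def forecast_days_per_day(forecast_length_days, wallclock_hours):
    """Sustained model speed: simulated forecast days produced per wall-clock day."""
    return forecast_length_days * (24.0 / wallclock_hours)

# A 10-day forecast completed in one hour of wall-clock time:
print(forecast_days_per_day(10, 1))  # 240.0, the current operational rate
# The 1 km science case targets one to five simulated years per day,
# i.e. roughly 365 to 1825 forecast days per day, at vastly higher resolution.
```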
In one sense, more FLOPS will make the team happy, because all these curves keep going up to a certain degree, even though the model is clearly not efficient enough to benefit fully. In another sense they will not, because if you look at the history of computing at ECMWF, the sustained fraction of peak performance has gone down to 5%, something all Earth system models suffer from. The models are very inefficient, and that is something the team has to work on, Peter Bauer explained.
So, what is wrong with weather and climate codes, Peter Bauer asked. To answer this, one needs to look at the problem that is integrated at every time step. Taking one of the equations, a prognostic equation for wind in this case, there are terms that belong to the resolved part, the black terms, and terms that relate to parameterizations, the red terms, as well as coupling, which here means coupling the atmosphere model to the ocean model. As the dynamic motions in the atmosphere evolve, each of these makes a different contribution at every time step, and these individual contributions are added up at every time step.
At ECMWF, the experts run a global spectral model. At each of the sequential time steps, the team transforms the spectral state into grid-point space; applies advection and the physics, with their expensive calculations; couples to the ocean-wave model; performs iterations of the 3D solver; and finally transforms the results back into spectral space. This is done at every time step, and all these parts have very different computational issues, Peter Bauer explained. The transforms are global and thus very large: communication bandwidth, networking, and scalability are the issues there. The physics is usually single-column, meaning it is independent per column, so the team can domain-decompose it and make it scalable, but it is usually very expensive; the radiation scheme in the ECMWF configuration represents 15 percent of the total model cost, so the team only uses every third grid point and only runs it once an hour.
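The sequence of operations within one time step can be sketched as a loop. All function names below are illustrative stand-ins, not ECMWF's actual code; each trivial stub marks where a large, expensive component with its own scaling issues would run.

```python
# Trivial stand-ins so the loop structure can be executed as-is.
def inverse_spectral_transform(state):  # global transform: communication-heavy
    return state

def apply_advection(grid):              # transport of fields on the grid
    return grid

def apply_physics(grid):                # independent columns, costly (e.g. radiation)
    return grid

def couple_ocean_waves(grid):           # exchange with the ocean-wave model
    return grid

def solve_3d(grid):                     # iterations of the 3D solver
    return grid

def forward_spectral_transform(grid):   # back to spectral space: global again
    return grid

def run_forecast(state, n_steps):
    """Sequential time-step loop of a global spectral model (sketch).

    The loop itself cannot be parallelized: step N+1 needs the
    result of step N, which is the number one scalability problem.
    """
    for _ in range(n_steps):
        grid = inverse_spectral_transform(state)
        grid = apply_advection(grid)
        grid = apply_physics(grid)
        grid = couple_ocean_waves(grid)
        grid = solve_3d(grid)
        state = forward_spectral_transform(grid)
    return state

# A 10-day ECMWF forecast corresponds to about 2,000 such steps.
final_state = run_forecast("initial spectral state", 2000)
```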
There is quite a range of challenges that ECMWF has to deal with in the model, and the experts have to address them somewhat individually, improving individual aspects, before they can reassemble the entire model. The number one problem is that time stepping is inherently sequential: for a complex system like this, with so many degrees of freedom, there is no way to perform the time integration itself in parallel. The more time steps you need, the worse it gets; for a 10-day forecast, the team uses 2,000 time steps, so the larger you can make your time step, the better.
Peter Bauer showed how important it is to work on everything together. At a certain point, MeteoSwiss was faced with the challenge of improving its grid resolution. On top of that came ensemble prediction and data assimilation, which created a certain cost growth on the same budget. The team had to pull the cost curve back to where it could afford to run these simulations. They tried a number of factors, but the biggest investment they had to make was in software development.
Peter Bauer told the audience about the ESCAPE project, which stands for Energy-efficient SCalable Algorithms for weather Prediction at Exascale. In ESCAPE, the team does something similar: starting from the global model and its sequence of time steps, they extract the individual parts, such as the physics and the transforms, to deal with their problems individually before putting them back together. The team extracts so-called model dwarfs and adapts them to the hardware types available in the project, then explores alternative numerical algorithms better suited to one or the other type of hardware, and eventually reassembles the model.
By profiling the dwarfs, the team achieved quite a number of improvements in different areas, both in data intensity and in the number of flops. Of course, this needs to be assessed for all dwarfs, architectures, and programming models, and for higher-resolution problems you can play with this across nodes. These are time-critical applications, so the experts need to fit their forecast into an hour: the time spent per time step matters, as does the energy per time step, and the team has to trade off energy against time to solution.
Peter Bauer insisted on the importance of investing in DSLs, data structures, and programming models that separate the hardware-dependent part of a code from the actual science code. ESCAPE invests in this at two different ends. One is the Atlas library, which allows you to represent all kinds of grids and meshes on a sphere, to perform domain decomposition, and to implement a variety of discretizations, solutions, and formulations of your equations, depending on what you want to do. This builds in scientific flexibility: depending on the choices you want to make now or in ten years, you want some flexibility rather than fixing yourself on one solution you may regret in the future. It also lets you handle all the fields that these solutions require.
The second investment is a highly optimized kernel library, a Swiss development that interfaces with the Atlas library and allows you to perform all kinds of operations on different types of grids, with different types of architectures as backends. The idea, in the end, is to separate that part from the science part so that the scientist does not have to deal with it. For weather and climate applications, investment in DSLs might be the only way to catch up with HPC developments fast enough to be efficient and portable at the same time, according to Peter Bauer.
There are all kinds of research directions at the interface between weather and climate on the one hand and computing on the other. A few of these are reduced precision; hardware faults, and thus the need for resilient systems, which links to reduced-precision floating-point hardware that can bring efficiency gains of its own; and programming models. In fact, a whole area of research projects is being started in the vicinity of weather and climate to make it more computable.
A year ago, Peter Bauer had a conversation with Peter Messmer from NVIDIA, who asked: "Can't we do weather forecasting with neural networks?" Peter Bauer answered: "No way." But then he came across publications showing big companies investing heavily in this area and producing weather forecasts, some made completely available, some based on their own observations used to gain an advantage over competitors in this field. There are, however, questions to be asked at three levels. The top-level question is: can you actually replace the full dynamical model, with its resolution and equations, by a neural network or deep learning concept? If that does not work, you can go one level down and ask: can you replace one of the components? Lastly, and this is mostly where Peter Bauer thinks these companies are aiming, there is short-range forecasting based on data analytics, bringing together observations and model data from various sources, tailored to a specific user and location, with some impact modelling attached.
Addressing the first question, Peter Bauer decided to give it a try and test whether neural networks could replace weather forecast models based on physical principles. In the context of the foundation of chaos theory, Ed Lorenz published in 1996 a simplified model that captures chaotic, non-linear interaction between scales, just like in the real atmosphere. It demonstrated the very non-linear behaviour and the strong dependence of the forecast on its initial conditions. Lorenz thereby introduced the concept of how important accurate initial conditions are for a highly non-linear problem, and how chaotically the system can respond if you do not know them.
The model has three levels, X, Y, and Z, with a number of parameters at each level representing the large, medium, and small scales. Peter Bauer used that model set-up to test what can be replaced and what cannot. Being no expert in neural networks, he took something off the shelf and ran the model for all three parameters, all three scales, and all three equations. Basically, he then used one equation and neglected the small scales. This causes an error, which is normal in weather forecasting, where you never have perfect resolution; it is what makes weather prediction inexact. He trained the neural network on the truth run, then tried to predict X, just as in the real world, from new initial conditions, and compared both against the truth.
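The single-level version of the Lorenz (1996) model used in such experiments is simple to write down. Below is a minimal sketch with forward-Euler time stepping; the parameter values are the classic chaotic setting (F = 8, 40 variables), not necessarily those used in the experiment described in the talk.

```python
def lorenz96_step(x, forcing=8.0, dt=0.001):
    """One forward-Euler step of the single-level Lorenz-96 model:

        dX_k/dt = (X_{k+1} - X_{k-2}) * X_{k-1} - X_k + F

    with cyclic indices, mimicking advection, damping, and forcing.
    """
    n = len(x)
    dxdt = [(x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing
            for k in range(n)]
    return [x[k] + dt * dxdt[k] for k in range(n)]

def integrate(x0, n_steps):
    x = list(x0)
    for _ in range(n_steps):
        x = lorenz96_step(x)
    return x

# Sensitivity to initial conditions: the uniform state x = F is a fixed
# point and stays put, while a tiny perturbation grows chaotically.
base = integrate([8.0] * 40, 10000)
perturbed_ic = [8.0] * 40
perturbed_ic[0] += 0.01
perturbed = integrate(perturbed_ic, 10000)
divergence = max(abs(a - b) for a, b in zip(base, perturbed))
print(f"max divergence after 10000 steps: {divergence:.2f}")
```

The truth run from such an integration is what a neural network would be trained on in the experiment described above.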
Peter Bauer showed the result as a plot of forecast error against forecast time, with two coloured curves, blue and red. Blue relates to the highest resolution, where you know the starting point and the initial conditions for X and Y, so you have two different scales and you try to predict X over different ranges. The difference between blue and red is that red uses less training data. For the short range, the neural network tracks the truth run reasonably well: as long as there is enough training data, it does not behave badly for such a simple approach. With increasing forecast range, however, it deviates, and the impact of having less training data is strong. Peter Bauer admitted that it is just a demo, but he found it quite interesting and wants to continue research in this area.
The second example was about replacing a forecast model component, namely the radiation scheme, one of the most critical and costly parts of forecasting and thus a prime candidate for replacement by a neural network. Peter Bauer told the audience that the team invested in this some time ago. The person who performed the test needed a network for each layer and a large number of input variables. In the end, he produced forecasts, and for the first five days the predictions were actually quite good. The network needs to be trained on a vast number of atmospheric profiles, however, and retrained every time the model changes, which is twice a year. On the other hand, it calculated 8 times faster.
The last example addressed data analytics, post-processing, and impact modelling, mostly used for very short-range forecasting. It focuses on a few key parameters, like temperature and precipitation, combining observations with model output. Access to information that nobody else has, in terms of observations or models, of course helps to make a better product, which is usually downscaled for local forecasts.
Peter Bauer also said a word or two about data. He showed the production chain from data acquisition through running the forecast and product generation to dissemination: you need to get the data out as fast as you can. In parallel to the science challenge, there is a data challenge, which is to post-process that amount of data in real time every day. As the model advances through its time steps, the post-processing runs at the same speed, so it does not have to be done sequentially afterwards. This saves about half an hour in the daily routine, half an hour the experts can spend on a more complicated model.
One aspect of data is that benchmarking has so far focused mostly on computing: the experts hand different incarnations of the forecast model to vendors, who run it, and the fastest one wins. There is never any data or I/O involved. It is very difficult to convince vendors to embark on data benchmarking and competitions, and it is also very hard for the experts themselves. There is now a new development in the context of the NEXTGenIO project, another project funded by the European Commission, in which the partners try to simulate the I/O workload of their systems. It runs dummy routines that reproduce the same profile of disk reads and writes, MPI communication, and data volumes as the real thing, which makes it lightweight, easily scalable, and easily portable to other applications. It is called Kronos.
In conclusion, Peter Bauer showed a graph of the biggest challenges along two axes: resolution, going from coarser to finer scales, and complexity, going from simple models to more complex coupled problems, with processes located mostly at the smaller scales and spanning time scales from seconds to years. The blocks on the graph identified the challenges: parameterization, and next to it coupling. When you go to finer-resolution models, that is where computing hits you the most. Computing and data handling are the number one challenges for weather and climate prediction in the next ten years, which makes weather and climate prediction a perfect candidate for assessing exascale HPC, Peter Bauer insisted.
He showed an animation of a 2.5-kilometer simulation over the tropical Atlantic, performed at the Max Planck Institute. The audience could see incredible detail that no operational model can resolve, since about 10 kilometers is the limit of what can be done today. Peter Bauer explained that this kind of detail is where scientists want to go, and even this is only half way to where they want to be, because they would like to do it globally, at one kilometer, fully coupled, and fast enough.
ECMWF launched the Scalability Programme partnership, in which experts look at topics such as data, data assimilation and observations, modelling, and deep learning. Peter Bauer showed a matrix spanning three directions: the Member States, industry, and academia together with HPC centres. ECMWF collaborates with all of them, and the list is still growing. It is a very important project, certainly larger than ECMWF and probably larger than Europe; it is a global challenge.
The question, then, given the partners' funding context, is whether the European Programme on Extreme Computing, Weather and Climate (EPECC) should become a FET Flagship. Game-changing prediction capability has enormous societal relevance, and it is good for Europe because it adds technological and impact leadership to a science area where Europe is already a world leader. One should also look wider than climate, because there is a value chain from weather to climate to energy, food, and water, and a second value chain from science to technology to services. The entire weather and climate challenge, and where it needs to go, probably requires investment not just at the European scale but internationally, concluded Peter Bauer.