During the annual SURFsara Super D Event on December 12, 2017 in Amsterdam, The Netherlands, we had the pleasure of talking with Thomas Schulthess, Director of the Swiss National Supercomputing Centre (CSCS) in Lugano, who gave a keynote presentation at Super D about European exascale developments and what is required to achieve exascale. Thomas Schulthess is a professor of Computational Physics at ETH Zurich. He leads the Swiss National Supercomputing Centre and the Swiss High-Performance Computing Initiative, which until now has focused on building applications that sustain petaflop performance. Now, Thomas Schulthess and his colleagues are thinking about the future: the so-called exascale.
We observed that exascale is really being taken seriously in Europe now, with the initiatives of the European Commission, the European Union and the Member States. There is also the EuroHPC initiative, of which Switzerland is a member.
Thomas Schulthess replied that he would actually argue that supercomputing has been taken seriously in Europe for many years. Europe has some of the most performant systems on the planet, if one counts Switzerland as part of Europe. There is a big push all around the world to take the next steps. These steps will be challenging because we are no longer in a time when performance increases are guaranteed. Moore's Law is tapering off and one has to deal with this reality, but this also makes it interesting. There are initiatives in the US, Japan, China, and now also in Europe. The good thing about Europe is that it combines a top-down initiative, where the European Commission manages the effort, with bottom-up initiatives. Europe is rather strong in the bottom-up initiatives, which are sometimes science-driven. Thomas Schulthess feels pretty good about this position.
We noted that one of the goals now is to reach exascale, but wanted to know what exactly is meant by exascale.
Thomas Schulthess explained that in the past one would go to terascale, meaning teraflops, and then to petascale, meaning petaflops. In perhaps the last five years, we have seen a disconnect between application requirements and peak-flops requirements. This makes it a little more difficult to define the goal. There are, of course, different views. The view that Thomas Schulthess and some of his colleagues advocate is to go after a scientific problem that is very ambitious and lies beyond the horizon of any individual group, but that represents great value for society, and then to build systems that solve this problem. The problem Thomas Schulthess talked about in his keynote presentation is high-resolution weather and climate simulation.
The idea is to push the current simulations, which have a lateral resolution of around ten kilometers, to one kilometer. This requires about a factor of 1000 more performance, which is not trivial to reach, but the payback will be tremendous because the models will resolve convective clouds, such as those in thunderstorms. By making the models computationally more demanding, you make them physically simpler, and they then become more predictive. This can really push the limits of weather and climate prediction, from the days to weeks possible today towards predictions for the next season. This could have a big impact. It also allows scientists to better understand how extremes develop and helps society mitigate their effects.
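A common back-of-envelope estimate shows where a factor of about 1000 can come from. It rests on assumptions that are ours rather than stated in the interview: the number of vertical levels stays the same, and the time step must shrink in proportion to the horizontal grid spacing, as a CFL-type stability limit typically requires. Here C stands for the computational cost of simulating a fixed period of time at a given resolution.

```latex
% Assumed: fixed vertical levels; time step proportional to grid spacing (CFL-type limit)
\frac{C_{1\,\mathrm{km}}}{C_{10\,\mathrm{km}}}
\approx
\underbrace{\left(\frac{10\,\mathrm{km}}{1\,\mathrm{km}}\right)^{2}}_{\text{more horizontal grid points}}
\times
\underbrace{\frac{10\,\mathrm{km}}{1\,\mathrm{km}}}_{\text{more, shorter time steps}}
= 10^{2} \times 10 = 10^{3}
```

Real models deviate from this idealized scaling, so the estimate should be read as an order of magnitude only.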
It is a very good subject for Europe because Europe is in the leading position. The European Centre for Medium-Range Weather Forecasts (ECMWF) is recognised as a world leader in this field, and the Americans use its forecasts when preparing for hurricanes. If you take a problem where Europe is in the lead, Europe can also define today's baseline, and it is always easier to be the first to solve a problem when you are already in the leading position. On the other hand, weather and climate, and the development of extremes, whether human-made or not, are things that need to be understood. This is not a European problem; it is a global problem. Thomas Schulthess hopes that by combining European know-how on the model and software side with collaboration with system developers in the US, Japan and China, this scientific goal will be reached in the early to mid part of the next decade.
We remarked that this is one scientific goal and wondered whether there should be more, or whether this one is already representative enough.
Thomas Schulthess thought that there are other problems with a similar combination: a scientific challenge that is really difficult but still doable (not beyond what one can ever reach) and that requires an extraordinary effort, together with a value to society that merits this effort. He is sure one could formulate similar problems in molecular biophysics, peptide simulations, drug design, and materials design. Thomas Schulthess is convinced, although he knows some of his colleagues will disagree, that one needs to have a problem that one can solve. Different problems may need different architectures. One should not pick 25 different problems and say: "Now, we are going to build a machine that solves all of them." This is not going to work in the future.
We observed that this is different from what has been done until today, where the reasoning was: "Let's build a general-purpose supercomputer, because it is expensive, and then we can solve a lot of problems with it."
Thomas Schulthess agreed and said that one can see this in the example of MeteoSwiss and COSMO that he mentioned in his presentation. If one takes today's code for a model that is well implemented and runs it on a general-purpose machine, like the machines at the German weather service and ECMWF, the machine ends up being about a factor of 10 bigger. If one instead refactors the software and designs a computer specifically for this problem, still with off-the-shelf processors but different ones, such as graphics processors, one builds a machine to solve the problem rather than taking one from the vendor's roadmap. They could show that the difference in efficiency is a factor of 10. This means that the investment put into the software is already amortised by the fact that one buys a smaller machine. The computer vendors may not be perfectly happy, but the customer is, because the overall cost of production is lower. This is part of innovation, and that is always good.
We joked that the vendors might say: "You need to buy 10 different machines for 10 different applications", and continued by asking whether it is important for Europe to be in the top 3 with European technology, as European Commission President Juncker has stated, or whether the quality of the new technology is more important.
Thomas Schulthess replied that the latter is important. It makes sense for Europe to want to be in the technology game. The question is which top 3 one means. If it is the top 3 in the TOP500, Thomas Schulthess thinks it makes no sense. If it is about being the top in solving this weather problem, he would be more ambitious than President Juncker and say: "We just want to be the best." Where there is currently some disagreement is how to get to European technology. Do you start from scratch, or do you work through collaborations and gradually adopt more and more European technology? Thomas Schulthess would favour the latter approach.
The approach in EuroHPC that some promote (they are not all in agreement), as in France, for example, is to embrace all-European or all non-American technology, so that they are not limited by ITAR rules and so on. It is the riskier approach, and in the end it will take longer. That is why Thomas Schulthess would argue that with the more continuous approach, in which you adopt more and more European technology, you will end up building European systems faster. There is one thing you should not forget: you have to build a market and a business case around the system. It is not enough to invest a few billion to create a one-off system; you actually want a business case. That is what makes the Americans so strong: not everything they do is driven by government programmes, it is driven by an economic case. The government programmes may pick some of the cherries to accomplish certain goals, but the US does not have Silicon Valley because of the US government. It has it because there is an economic case for what is being done there. Then it becomes self-funded and self-sustaining.
We wanted to know how the discussions now taking place at the governance level in EuroHPC, in which the Swiss government takes part, are evolving.
Thomas Schulthess thought that the discussions are progressing. A real effort is being made to create the European initiatives and to motivate other governments to invest. Some, like the German Federal Ministry of Education and Research (BMBF), have already invested a lot and have a very healthy programme. Given the size of the country, Switzerland has also invested a lot and accomplished something. There are a few places in Europe with very substantial programmes. The EuroHPC initiative is trying to motivate others to do the same, and that is a good thing. Of course, one has to find consensus among many members, which is not a bad idea. The Swiss are well used to such consensus-finding, albeit at a much smaller scale. This is generally healthy, although it may slow things down a little. That is why Thomas Schulthess thinks one needs both top-down initiatives like EuroHPC and bottom-up, science-driven initiatives. With both, Europe will stay healthy.
We summarized that Thomas Schulthess had talked about the importance of having an application that is crucial for the whole world, like climate research, and using it as a far-reaching but not unreachable goal. That is nice, but how does one "get rid" of the TOP500 and HPCG benchmarks, which now tend to dominate the discussions?
Thomas Schulthess replied that you can call it the carrot of the scientific challenge. You have to think about what you need to do to reach that carrot. It is not just about building a computer. One of the most significant investments is changing the way one develops software and implements the models. This is where many areas of scientific computing are out of date: they still live in the 1980s or have maybe arrived in the 1990s. Other areas, like machine learning and data science, which are emerging now, are way ahead. What Thomas Schulthess and his team have demonstrated in their work with COSMO and MeteoSwiss is that modern approaches can be used to build a software framework that gives them more flexibility with respect to the hardware.
The ideas behind how they built the software framework will fix a huge problem: the productivity gap that exists today. Once that productivity gap is closed, people will suddenly realize that today's approach, in which one builds a machine for the TOP500 and then shoehorns an application onto it, is not a reasonable way to do computing. This is what is happening now with the systems in China and the Gordon Bell prizes they receive, but with a performance that is embarrassingly low. If there is a very substantial alternative to the current approach around the TOP500 (and Thomas Schulthess thinks weather and climate, molecular dynamics and materials science all have a large enough base), people will simply go there and, hopefully, ignore what Thomas Schulthess believes is not the right path. He is quite hopeful and suggested revisiting this in five or ten years.
As a last topic, we wanted to know how Thomas Schulthess sees the relationship with industry in Europe when it comes to HPC.
Thomas Schulthess replied that there is a lot of discussion around industry. He thought there are different models for how this relationship should be implemented. In Switzerland, and at ETH in particular, they follow a relatively liberal model, liberal in the economic sense. Institutions like ETH Zurich have clear roles in education and innovation, and the federal government invests in innovation and education. There is a clear transfer step from the innovative work to the commercial world, whether by creating spin-offs or start-ups, or simply by educating people who then go into the commercial world and become productive there.
According to Thomas Schulthess, a computer centre like CSCS should work in partnership with industry to help create such infrastructure on the industrial side. This is where Switzerland differs a little from some European approaches. Not everywhere in Europe, because many countries, such as The Netherlands, have a very similar model to Switzerland, but some countries think that publicly funded infrastructure should be used to help industry more directly. Switzerland does things a bit differently: there, it is about knowledge transfer into industry and creating successful projects, initially in partnership, but then you want things to happen at the commercial level without subsidies. If something is commercially successful, you have to scale it, making it ten or a hundred times bigger than CSCS could ever make it.
Today, this model is implemented in the form of hyperscale clouds. There are many of them in the US, and US companies operate them in Europe as well. The academic world that Thomas Schulthess lives and operates in is more or less capped: even the Swiss science budget is not going to grow by a factor of 10 in the next five years. If there are good ideas that can be commercialized, you want them to scale by a factor of 10, and if they are commercial, industry also has the capital to do it. The infrastructure of the hyperscale clouds is already in place, which is quite good.