Keynote Sessions

Monday, September 27, 2021, 9:00-11:00 MSK
"Sokolniki" Hall

Supercomputing Technologies: perspectives, society, people
Vladimir Voevodin, Moscow State University, Russia

Data-Centric Python – Productivity, portability and all with high performance!
Torsten Hoefler, ETH Zürich, Switzerland

Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. In this work, we present a workflow that retains Python’s high productivity while achieving portable performance across different architectures. The workflow’s key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes.

OS for HPC in Exascale Era
Ruibo Wang, National University of Defense Technology, China

Operating systems for HPC have been evolving fast to fit between the underlying hardware and the applications above. Entering the Exascale Era, HPC manufacturers and researchers are exploring different hardware architectures, aiming to push the performance boundary forward. At the same time, a widening range of applications (big data, AI, etc.) calls for HPC support. The contradictory goals of performance and application compatibility pose new challenges for operating systems. This talk reviews previous work and explores the design space and concerns for Exascale HPC operating systems.

Bitwise Reproducibility in Computational Climate Science
Thomas Ludwig, German Climate Computing Centre & Universität Hamburg, Germany

Reproducibility of scientific results has become a heavily debated issue in recent years. Looking closer into that field, we find many different levels of concern. In computational sciences, where we conduct experiments in silico and receive results from computer simulations, we are faced with special aspects of reproducibility. Can we run third-party code and reproduce the result that formed the basis for a publication? Following the rules of good scientific practice: can we even reproduce our own results after a certain time, with perhaps a modified computer infrastructure? Finally, if we do our best, what is the influence of the hardware and the processor itself? Can we achieve a bitwise reproduction of a single program run? And if so, how is this connected to reproducibility of findings in science? The talk will discuss several of these critical issues and concentrate on aspects of programming and hardware issues.
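
As a small illustration of why bitwise reproduction is hard (this sketch is not taken from the talk), floating-point addition is not associative, so a sum reduced in a different order can give a different result. The function names below are hypothetical, and the chunked reduction only mimics how a parallel run might combine partial sums.

```python
def reduce_in_order(values):
    """Sum the values strictly left to right, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_in_chunks(values, chunk_size):
    """Sum each chunk separately, then sum the partial results.

    This mimics (under a simplifying assumption) a parallel reduction in
    which each worker owns one contiguous chunk of the data.
    """
    partials = [
        reduce_in_order(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return reduce_in_order(partials)


# Same numbers, two summation orders.
data = [1e16, 1.0, -1e16, 1.0]
serial = reduce_in_order(data)
chunked = reduce_in_chunks(data, 2)

# The two results are mathematically equal but differ in IEEE 754 doubles.
print(serial, chunked, serial == chunked)
```

The same input therefore yields different bits depending only on how the work is partitioned, which is one reason a change in node count or hardware can alter simulation output.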

Using supercomputers to study molecular dynamics of DNA-protein complexes at multi-microsecond time scale
Alexey Shaytan, Faculty of Biology, Moscow State University, Russia

The complexity and power of modern computer information systems is astonishing, rivalling the complexity of living organisms - the most sophisticated entities nature has created. Yet our understanding of how living systems function at the molecular level remains elusive. The information content of the genetic programs encoded by the DNA in the human genome is less than 900 megabytes and has been known for two decades. However, understanding the molecular mechanisms underlying the interpretation of these programs is a major challenge of this century. The growing capabilities of modern computer systems and algorithms are an essential component for addressing this challenge. In this talk I will showcase how supercomputer simulations allow us to gain insights into the molecular mechanisms of DNA functioning in living systems by understanding its dynamic interactions with proteins.
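
The "less than 900 megabytes" figure can be checked with a back-of-the-envelope estimate. The sketch below is not from the talk: it assumes a haploid genome of roughly 3.1 billion base pairs and 2 bits per base (four possible nucleotides), and the function name is hypothetical.

```python
def genome_megabytes(base_pairs, bits_per_base=2):
    """Raw storage, in megabytes (10**6 bytes), for a sequence of bases."""
    total_bits = base_pairs * bits_per_base
    return total_bits / 8 / 1e6


# Assumption: about 3.1e9 base pairs in one copy of the human genome.
ASSUMED_BASE_PAIRS = 3.1e9

size_mb = genome_megabytes(ASSUMED_BASE_PAIRS)
print(f"{size_mb:.0f} MB")
```

Under these assumptions the estimate comes to a few hundred megabytes, consistent with the bound quoted in the abstract.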

Monday, September 27, 2021, 11:30-13:45 MSK
"Sokolniki" Hall

Keynote by NVIDIA
Anton Dzhoraev, Boris Neiman, NVIDIA

Keynote by RSC
Alexander Moskovsky, RSC

The RISC-V vector in EPI
Jesus Labarta, Barcelona Supercomputing Center, Spain

The New Era of Hybrid-Computing on and with SX-Aurora TSUBASA: Vector-Scalar to Vector-Digital Annealing, to Vector-Quantum Annealing
Hiroaki Kobayashi, Tohoku University, Japan

China National Supercomputer Centers based on Ascend
Valery Cherepennikov, Huawei

Intel hardware and software architecture and tools to build HPC systems of Exascale level
Nikolay Mester, Intel

Keynote by AMD
Pavel Stanavov, AMD

Neptune, Or how to keep the hot stuff cool
Andrey Sysoev, Lenovo

Keynote by Dell
Nikita Stepanov, Dell Technologies

Monday, September 27, 2021, 17:00-18:15 MSK
"Sokolniki" Hall

Digital Convergence and Human Intelligence
Michael Resch, University of Stuttgart, Germany

Digital technologies are converging rapidly, opening new spaces of solutions and problems. The talk will present the impact of digital convergence. It will also look into the new possibilities that human intelligence has when making the best use of digital convergence.

HPC: Where We Are Today And A Look Into The Future
Jack Dongarra, University of Tennessee, Oak Ridge National Laboratory, and University of Manchester, USA

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our numerical scientific software. A new generation of software libraries and algorithms is needed for the effective and reliable use of (wide area) dynamic, distributed and parallel environments. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.

Foundational Principles and Examples of non von Neumann Architectures beyond Moore’s Law
Thomas Sterling, Indiana University, USA

The von Neumann architecture has dominated the mainstream of computing, including HPC and AI, for seven decades. Its success has been driven by incremental improvements to hardware designs, yielding many generations of progress in capability up to and including nano-scale technologies. With Moore’s Law of semiconductor technology driving the tick-tock cadence of microprocessor and DRAM development, innovative concepts could not challenge the industrial economy of scale offered by COTS von Neumann derivatives; that is, until the end of Moore’s Law. In the absence of continuing performance gain through technology advances, alternative approaches may rely on significant innovations in computer architecture instead. This presentation will expose underlying assumptions incorporated in conventional processor cores that impede the scalability and efficiency of future exascale HPC systems. A brief review of prior art will be presented, with examples of past non von Neumann computing as well as approaches currently being pursued. This fast-paced talk will serve as a direct segue to a related follow-on presentation of one potentially revolutionary strategy of memory-centric non von Neumann parallel architectures for scalable processing of time-varying graphs and AI applications.

Tuesday, September 28, 2021, 17:00-18:00 MSK
"Sokolniki" Hall

Active Memory Architecture: a non von Neumann Memory-centric Paradigm for Exascale AI, ML, and Data Analytics
Thomas Sterling, Indiana University, USA

With the end of Moore’s Law, new architecture concepts may enable orders-of-magnitude performance gains for those AI and Machine Learning applications heavily dependent on time-varying irregular graphs. Such computations are data-centric, exhibiting little or no temporal or spatial locality due to lack of data reuse. New objective functions may replace conventional practices of optimizing FPU utilization. Emphasizing memory access latency and bandwidth can be combined with automatic discovery of intrinsic parallelism exposed by graph meta-data to yield efficiency of semiconductor die area and energy. The Active Memory Architecture (AMA) is under development as an experimental prototype to test the opportunity of a high-density memory-centric compute-cell hardware architecture in support of ultra-high-performance intelligent computing applications. The presentation will describe the concepts and major functionality of the AMA for future exascale unsupervised deep learning, among other AI applications.

Conference Close
Vladimir Voevodin, Moscow State University, Russia