RoMoL 2016: Technical Presentations

Here is the current list of the technical presentations we will enjoy during RoMoL 2016. Take a look at the Invited Speakers' biographies, too, and enjoy the Workshop Program.

  • The Run Time System and the end of Moore's Law: Why now? By who?, Yale Patt, U. T. Austin
    • It is the job of the run time system to decide how to manage available resources. With billions of transistors already on a chip, and Moore's Law predicting that number will grow to 50 billion in a few years, there are and will be plenty of resources to manage. Whose job is it to manage them? Historically, the run time system has been part of the o/s. But the o/s is far from the action. The research literature has plenty of hardware structures that can help, but on-chip hardware has a small window of observation. This situation has been around for quite a while. What is different now? Answer: performance benefits have continued to grow exponentially in response to the shrinking size of transistors, which has allowed switching times to decrease while at the same time providing more transistors for microarchitects to use. But that shrink (the basis of Moore's Law) is certainly going to end, and with it the shorter switching times and doubling of transistors on a chip. With this ending, how do we continue performance improvements? The answer has to include: by better utilizing the chip's resources – a legitimate task for the run time system. But where should the run time system reside? And who needs to be part of the solution?
  • The OmpSs Programming Model, Jesus Labarta, Barcelona Supercomputing Center / UPC
    • The talk will present a vision of how parallel computer architectures are evolving and some of the research being done at the Barcelona Supercomputing Center (BSC) driven by that vision. We consider that the evolution towards increasing complexity, scale and variability in our systems makes two technologies play a very important role in future parallel computing, which with the advent of multicores means computing in general. On one side, performance analysis tools with very detailed analytics capabilities are key to understanding the actual behavior of our systems. On the other, programming models that hide the actual complexity of the underlying hardware are needed to provide the programming productivity and performance portability required for the economic sustainability of the programming effort. We will present the OmpSs programming model developed at BSC, a task-based model for homogeneous and heterogeneous systems which acts as a forerunner for OpenMP (a minimal task example appears after this list). OmpSs targets multicores, accelerators and clusters in a uniform way. We will describe features of the Nanos++ runtime on which OmpSs is implemented, focusing on its dynamic scheduling capabilities and load balancing support. We will also present the BSC tools environment, including trace visualization capabilities and specific features to understand the actual behavior of the Nanos++ runtime and OmpSs programs.
  • The Galois runtime system for parallel graph analytics workloads, Keshav Pingali, U. T. Austin
  • A Quantitative Analysis of Runtime System Software for Asynchronous Multi-Tasking Computation, Thomas Sterling, Indiana University
  • Runtime Resource Management, Burton Smith, Microsoft
    • Computing resources in emerging architectures are unmanageable by today’s operating systems and runtimes, making humans responsible for the configuration decisions. As heterogeneous processors, memory-based synchronization and late compilation become commonplace, the configurations explode. Moreover, cloud computing changes performance criteria from “as fast as possible” to those based on adequate response times, and energy has become a component of performance. These factors are changing resource management from an ad-hoc, heuristic-based bag of tricks into an optimal control problem (one possible formulation is sketched after this list) that can only be solved by cooperation between the operating system and the runtime.
  • Significance-Driven Runtime Systems, Dimitrios S. Nikolopoulos, Queen's University of Belfast
    • In a number of data-intensive application domains, not every part of the code or every piece of data accessed by the code is equally critical for producing program output with acceptable correctness. This talk explores how we can express and exploit the significance of code and data in applications in order to improve their execution efficiency on current and future computing systems. We will specifically look into a runtime system, mechanisms and policies for dynamically reducing application resource usage while sustaining output quality (a hypothetical sketch of this idea appears after this list).
  • Intel HPC Co-Design Activities in Europe, Hans-Christian Hoppe, Intel
    • Intel works closely with a range of academic partners in Europe to help solve the well-known challenges in making HPC applications and systems more efficient, more scalable, and easier to deploy and use. Co-design between all elements of the SW stack is a key requirement, in addition to the more traditional HW/SW co-design. The talk will highlight the scope of these engagements, discuss the approaches used and present recent results from a subset of projects.
  • Runtime Aware Architectures, Miquel Moretó, BSC/UPC
    • In the last few years, the traditional ways of keeping the increase of hardware performance at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined Instruction Set Architecture (ISA). This simple interface allowed developers to write applications without worrying too much about the underlying hardware, while hardware designers were able to aggressively exploit instruction-level parallelism (ILP) in superscalar processors. Current multi-cores are designed as simple symmetric multiprocessors (SMP) on a chip. However, we believe that this is not enough to overcome all the problems that multi-cores face. The runtime has to drive the design of future multi-cores to overcome their restrictions in terms of power, memory, programmability and resilience. In this talk, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective.
  • Efficient Computing Beyond the Validity of Moore's Law, Per Stenstrom, Chalmers University of Technology
  • Building Larger Shared Memory Multiprocessors, Trevor Mudge, University of Michigan, Ann Arbor
    • In this talk we will discuss the barriers to building shared memory machines. We will then show how it is possible to build larger shared memory systems that support fast coherence and low latency memory access.
  • Just-enough-computing: Doing more with less, Chris Adeniyi-Jones, ARM
    • The challenge of enabling Exascale-level computing provides an opportunity to examine the assumptions we make when designing new hardware and software for high-performance computing. With a focus on pushing for maximum “top-right” performance, the traditional route for building high-performance computing systems results in massive over-provisioning in the capability of the hardware. On the software side, overhead is introduced throughout the software stack as we favour a general-purpose approach over targeted, specific solutions (again an over-provisioning of capability). These two trends have consequences: the theoretical peak performance of a system is unattainable when executing real-world applications, and the energy-efficiency of the system can be very disappointing when compared to the energy-efficiency of the individual components. This state of affairs raises some questions: what could be achieved with a focus on first improving software efficiency and then reducing the overshoot in hardware capability? How could such a system be built? What are the implications for programming models on such a system? What types of application would demonstrate the validity of this approach? In this talk I will present suggestions for alternative system architectures and software aimed at addressing these questions.
  • Application-Driven Runtime Adaptation for Efficient Resilience, Pradip Bose and Alper Buyuktosunoglu, IBM T. J. Watson Research Center
    • The multi- (and, now many-) core architectural design paradigm has been in full flight, as a response to the so-called “power (or power-density) wall.” However, as we scale forward to post-14nm CMOS technologies, the many-core paradigm is encountering new yield- and/or reliability-related challenges. These new challenges make it even harder to preserve sustained growth in chip throughput (balanced against single-thread growth), while maintaining current levels of system resilience and power density. This talk will examine opportunities to meet this next-generation challenge by exploiting run-time adaptation knobs. The techniques explored are all driven by the user application (workflow), but the implementation of the underlying sense-and-control actuation loop could be in hardware, software or a combination – depending on the particular context and associated practicalities.
  • Architecting 3D Memory Systems, Moinuddin Qureshi, Georgia Institute of Technology
  • Designing next-gen task-based runtime systems: challenges and opportunities, Raymond Namyst, Université de Bordeaux, INRIA, CNRS
    • To fully tap into the potential of heterogeneous manycore machines, the use of runtime systems capable of dynamically scheduling tasks over the pool of underlying computing resources has become increasingly popular. Such runtime systems expect applications to generate a graph of tasks of sufficient “width” to keep every processing unit busy. However, not every application can exhibit enough task-based parallelism to occupy the tremendous number of processing units of upcoming supercomputers. Exploiting the inner parallelism of tasks, and co-scheduling different codes simultaneously, are two ways of greatly increasing hardware occupancy (a small sketch of nested task parallelism appears after this list). Although these techniques may seem unrelated, they rely on the same set of runtime mechanisms: hierarchical scheduling and resource negotiation. This talk will give some insights into how such features could be better implemented with some hardware support.
  • SGL: An Approach for Future Exascale Graph Processing, Lawrence Rauchwerger, Texas A&M University
    • The STAPL Graph Library (SGL) is a high-level framework that shields users from the details of parallelism management and allows them to concentrate on parallel graph algorithm development. In this talk, we present SGL's scalable design and some techniques to control and increase asynchrony, reduce communication at the algorithm level, and manage out-of-core graph computation for extreme-scale data sets. We demonstrate the scalable performance of our graph framework by evaluating fundamental graph algorithms on up to 131,072 processes, and show results for our particle transport application on up to 1.5 million processes on a Blue Gene/Q.
  • Codesign - from Devices to Hyperscale Datacenters, Marc Tremblay, Microsoft
    • This talk will cover the co-design of two devices from the silicon, system and software standpoints, in the context of a fully integrated design team. The concept is also applied to optimizing hyper-scale datacenters running internal cloud workloads as well as hundreds of thousands of customer workloads running on virtual machines. Simulation results based on these workloads and other benchmarks are presented to improve our understanding of the impact of technologies such as large L4 caches and/or high-bandwidth memory.
  • New Architecture Avenues in a Big Data Environment, Uri Weiser, Technion IIT
  • Big Cores versus Medium Cores, Roger Espasa, Broadcom
    • In this talk we will discuss the two major trends in high-end processors, big cores used in server environments and medium cores used in HPC environments, and reflect on the possible evolution paths of both types of processors.
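
A few illustrative sketches related to the abstracts above follow.

For readers unfamiliar with task-based models such as OmpSs (see Jesus Labarta's abstract), here is a minimal sketch of what task annotations look like. This is our illustration, not code from the talk: real OmpSs programs are compiled with BSC's Mercurium compiler and run on the Nanos++ runtime, and details of the clause syntax may differ.

<code c>
/* Minimal OmpSs-style task sketch (illustrative only).  Unlike plain
 * OpenMP, OmpSs needs no parallel region: main() runs as an implicit
 * task, and the runtime builds a dependency graph from the in/out
 * annotations and schedules ready tasks dynamically. */
#include <stdio.h>

#define N 4

void  init(float *v)           { for (int i = 0; i < N; i++) v[i] = 1.0f; }
void  scale(float *v, float f) { for (int i = 0; i < N; i++) v[i] *= f; }
float sum(const float *v)      { float s = 0; for (int i = 0; i < N; i++) s += v[i]; return s; }

int main(void) {
    float a[N], b[N];

    #pragma omp task out(a)      /* producer of a         */
    init(a);
    #pragma omp task out(b)      /* producer of b         */
    init(b);
    #pragma omp task inout(a)    /* must wait for init(a) */
    scale(a, 2.0f);

    #pragma omp task in(a, b)    /* waits for both chains */
    printf("sum = %f\n", sum(a) + sum(b));

    #pragma omp taskwait         /* join before returning */
    return 0;
}
</code>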
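To make the “optimal control” framing in Burton Smith's abstract concrete, one textbook-style formulation (our illustration, not necessarily his model) allocates a shared resource budget R among n jobs so as to maximize total utility, where each job's utility u_i(r_i) can encode response-time targets and energy cost:

  \max_{r_1,\dots,r_n} \sum_{i=1}^{n} u_i(r_i) \quad\text{subject to}\quad \sum_{i=1}^{n} r_i \le R,\ \ r_i \ge 0.

The operating system would solve this outer allocation problem, while each runtime solves the inner problem of making the best use of its own allocation r_i; the two must cooperate because neither sees the whole picture.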
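The significance-driven idea in Dimitrios Nikolopoulos's abstract can be pictured with a small, entirely hypothetical C sketch (the names and the policy below are ours, not the runtime from the talk): each task carries a significance value, and the runtime picks the exact or the cheaper approximate implementation against a quality threshold.

<code c>
/* Hypothetical significance-driven execution sketch (illustrative only). */
#include <stdio.h>

typedef struct {
    double significance;     /* 0.0 (least) .. 1.0 (most critical)   */
    void (*exact)(void);     /* full-quality implementation          */
    void (*approx)(void);    /* cheaper, lower-quality alternative   */
} task_t;

/* Trivial policy: run the exact version only for tasks whose
 * significance exceeds the current quality threshold. */
void run_task(task_t *t, double quality_threshold) {
    if (t->significance >= quality_threshold)
        t->exact();
    else
        t->approx();
}

void exact_kernel(void)  { puts("exact kernel"); }
void approx_kernel(void) { puts("approximate kernel"); }

int main(void) {
    task_t tasks[] = {
        { 0.9, exact_kernel, approx_kernel },  /* critical for output */
        { 0.2, exact_kernel, approx_kernel },  /* tolerant to error   */
    };
    double threshold = 0.5;
    for (int i = 0; i < 2; i++)
        run_task(&tasks[i], threshold);
    return 0;
}
</code>

Raising the threshold saves resources at the cost of output quality; a real runtime would adjust it dynamically from energy, time or accuracy budgets.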
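Finally, the “inner parallelism of tasks” mentioned in Raymond Namyst's abstract can be illustrated with plain OpenMP tasks (a generic sketch, not StarPU or any particular runtime): when the outer task graph is too narrow to occupy a manycore machine, each coarse task spawns its own finer-grained subtasks.

<code c>
/* Generic nested-task sketch (compile with: gcc -fopenmp nested.c). */
#include <stdio.h>
#include <omp.h>

void inner_work(int outer, int inner) {
    printf("outer %d / inner %d on thread %d\n",
           outer, inner, omp_get_thread_num());
}

/* A coarse task that exposes its inner parallelism as subtasks. */
void coarse_task(int id) {
    for (int j = 0; j < 4; j++) {
        #pragma omp task firstprivate(id, j)
        inner_work(id, j);
    }
    #pragma omp taskwait   /* wait only for this task's children */
}

int main(void) {
    #pragma omp parallel
    #pragma omp single
    {
        /* Only two coarse tasks: far too few to fill a manycore
         * machine unless their inner parallelism is exploited too. */
        for (int i = 0; i < 2; i++) {
            #pragma omp task firstprivate(i)
            coarse_task(i);
        }
    }   /* implicit barrier joins all remaining tasks */
    return 0;
}
</code>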