        * The challenge of enabling Exascale-level computing provides an opportunity to examine the assumptions we make when designing new hardware and software for high-performance computing. With a focus on pushing for maximum “top-right” performance, the traditional route for building high-performance computing systems results in massive over-provisioning in the capability of the hardware. On the software side, overhead is introduced throughout the software stack as we favour a general-purpose approach over targeted, specific solutions (again an over-provisioning of capability). These two trends have consequences: the theoretical peak performance of a system is unattainable when executing real-world applications, and the energy efficiency of the system can be very disappointing when compared to the energy efficiency of the individual components. This state of affairs raises some questions: what could be achieved with a focus on first improving software efficiency and then reducing the overshoot in hardware capability? How could such a system be built? What are the implications for programming models on such a system? What types of application would demonstrate the validity of this approach? In this talk I will present suggestions for alternative system architectures and software aimed at addressing these questions.
  
   * //**Application-Driven Runtime Adaptation for Efficient Resilience**//, **__Pradip Bose__** and **__Alper Buyuktosunoglu__**, IBM T. J. Watson Research Center
        * The multi-core (and now many-core) architectural design paradigm has been in full flight as a response to the so-called “power (or power-density) wall.” However, as we scale forward to post-14nm CMOS technologies, the many-core paradigm is encountering new yield- and/or reliability-related challenges. These new challenges make it even harder to preserve sustained growth in chip throughput (balanced against single-thread growth) while maintaining current levels of system resilience and power density. This talk will examine opportunities to meet this next-generation challenge by exploiting run-time adaptation knobs. The techniques explored are all driven by the user application (workflow), but the implementation of the underlying sense-and-control actuation loop could be in hardware, software, or a combination, depending on the particular context and associated practicalities.
  
        * The STAPL Graph Library (SGL) is a high-level framework that shields users from the details of parallelism management and allows them to concentrate on parallel graph algorithm development. In this talk we present SGL's scalable design and techniques to control and increase asynchrony, reduce communication at the algorithm level, and manage out-of-core graph computation for extreme-scale data sets. We demonstrate the scalable performance of our graph framework by evaluating fundamental graph algorithms on up to 131,072 processes, and show results for our particle transport application on up to 1.5 million processes on a Blue Gene/Q. (A generic sketch of one such fundamental graph algorithm appears after this list.)
  
   * //**Codesign - from Devices to Hyperscale Datacenters**//, **__Marc Tremblay__**, Microsoft
        * This talk will cover the co-design of two devices from the silicon, system, and software standpoints, in the context of a fully integrated design team. The concept is also applied to optimizing hyper-scale datacenters running internal cloud workloads as well as hundreds of thousands of customer workloads running on virtual machines. Simulation results based on these workloads and other benchmarks are presented to improve our understanding of the impact of technologies such as large L4 caches and/or high-bandwidth memory.
  
   * //**New Architecture Avenues in a Big Data Environment**//, **__Uri Weiser__**, Technion IIT
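
The SGL abstract above mentions evaluating "fundamental graph algorithms" at extreme scale. As a purely illustrative aid, here is a minimal sketch of one such algorithm, a level-synchronous breadth-first search, in plain sequential C++. This is not SGL code and uses none of SGL's API; all names below (bfs_levels, adj, level) are our own, and in SGL each frontier would instead be processed in parallel over a distributed graph.

<code cpp>
// Illustrative sketch only: a sequential level-synchronous BFS, the classic
// structure behind many fundamental graph algorithms. Not SGL code; SGL would
// distribute the graph and process each frontier in parallel.
#include <cstddef>
#include <iostream>
#include <queue>
#include <vector>

// Compute the BFS level of every vertex reachable from `source`,
// given the graph as an adjacency list. Unreachable vertices stay at -1.
std::vector<int> bfs_levels(const std::vector<std::vector<std::size_t>>& adj,
                            std::size_t source) {
    std::vector<int> level(adj.size(), -1);
    std::queue<std::size_t> frontier;
    level[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        std::size_t v = frontier.front();
        frontier.pop();
        for (std::size_t u : adj[v]) {
            if (level[u] == -1) {        // first visit assigns the level
                level[u] = level[v] + 1;
                frontier.push(u);
            }
        }
    }
    return level;
}

int main() {
    // Small undirected example graph: edges 0-1, 0-2, 1-3, 2-3, 3-4.
    std::vector<std::vector<std::size_t>> adj = {
        {1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    std::vector<int> level = bfs_levels(adj, 0);
    for (std::size_t v = 0; v < level.size(); ++v)
        std::cout << "vertex " << v << " level " << level[v] << '\n';
    return 0;
}
</code>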