Funded PhD position: Energy-aware job scheduling and feedback

Keywords

High Performance Computing, energy-aware scheduling, CO2 impact, energy efficiency

Context

The use of High Performance Computing is growing in fields ranging from climate science to chemistry. The increasing impact of these computations opens a field of research on how to manage and reduce their energy consumption. The PhD takes place in the context of the NumPEx project, which aims at developing state-of-the-art skills and infrastructures in the field of exascale computing. One of the pillars of NumPEx focuses on making exascale computing sustainable.

HPC (High Performance Computing) systems are usually managed by an RJMS (Resource and Job Management System), which decides when and on which servers to execute the applications submitted by users. The RJMS is crucial, as its quality directly impacts the performance of the whole infrastructure. This performance is usually measured by the number of tasks completed per day or by their execution time. In this PhD, we will also optimize the CO2 impact of these tasks, for example by scheduling them when their carbon footprint is minimal. We will also leverage the ability of applications to run on different numbers of servers to optimize our metrics.

The target HPC infrastructures are exascale supercomputers that monitor resource usage and energy consumption.

HPC applications are usually described as a DAG (Directed Acyclic Graph) of tasks. Another lever is that different tasks have different power and resource requirements: running the applications that consume less power when electricity generation emits a lot of CO2 reduces the overall carbon footprint.
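As a purely illustrative sketch of this idea (not the algorithm developed in the project), the Python snippet below picks the start hour of a single non-interruptible job that minimizes its emissions, given an hourly carbon-intensity forecast. The forecast values, the constant power draw and the function name best_start are hypothetical assumptions.

```python
# Hypothetical hourly carbon-intensity forecast in gCO2/kWh (illustrative values only).
forecast = [420, 380, 150, 120, 300, 450]

def best_start(forecast, duration_h, power_kw):
    """Return (start_hour, emissions_g) minimizing the emissions of one job.

    Assumes (hypothetically) that the job runs for duration_h whole hours,
    draws a constant power_kw, and cannot be interrupted once started.
    Returns None if the job does not fit in the forecast horizon.
    """
    best = None
    for start in range(len(forecast) - duration_h + 1):
        window = forecast[start:start + duration_h]
        # Each hour consumes power_kw * 1 h = power_kw kWh, weighted by that hour's intensity.
        emissions = sum(ci * power_kw for ci in window)
        if best is None or emissions < best[1]:
            best = (start, emissions)
    return best

start, grams = best_start(forecast, duration_h=2, power_kw=10.0)
print(start, grams)
```

With these made-up numbers, a 2-hour, 10 kW job is placed in the low-intensity hours 2 and 3. A real scheduler would also have to trade this off against waiting time, task dependencies in the DAG and the resources shared by many jobs.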

User feedback (metrics showing users the impact of their execution requests) is increasingly needed to make these platforms sustainable.

The PhD student will focus on studying energy-aware scheduling of HPC applications.

Objectives

The objectives of the PhD are the following:

  • Multi-objective task scheduling (energy, time), taking as inputs predictions of the energy mix, of the available renewable energy and of the energy price
  • Using AI predictions of the energy consumption of applications. These predictions will be used by the scheduling algorithms to optimize the efficiency of the platform (cf. https://hal.science/hal-04566184/document)
  • Evaluating the impact of different levers (the ability to adapt the resources allocated to applications, as in https://hal.science/hal-02964970, the heterogeneity of application requirements, the heterogeneity of resources (CPU/GPU)…)
  • Proposing different forms of user feedback, as in https://www.sciencedirect.com/science/article/pii/S0167739X24000219
  • Experiments on actual HPC infrastructure (using the OAR Resource and Job Management System, RJMS)

The PhD will be structured as follows:

  • State of the art of HPC scheduling
  • Design of multi-objective task scheduling algorithms (energy, time)
  • Experiments to validate the algorithms and compare them with the state of the art
  • A demonstrator using OAR and the proposed scheduling algorithms to reduce the power consumption of HPC platforms

Monitoring software such as MojitO/S will be used during the PhD, and the student may contribute to these tools. Experiments will be run on a large-scale experimental platform (Grid'5000).

Expected skills and profile

  • Required: Master’s in computer science.
  • Strongly recommended: a taste for experimental approaches; C or Rust programming; data analysis with Python or R.
  • Language: English. Basic French is a plus.

At the end of the PhD, the student will have acquired the following skills: scientific methodology and experimentation, expertise in HPC and sustainable computing, long-term project planning, scientific writing.

This project involves many academic partners in France; the PhD will therefore take place in a national collaborative context. Three annual meetings are planned in other French cities, during which English will be used.

If they wish, the student will have the opportunity to teach, in English or French.

Practical details

The PhD will take place at IRIT, the largest computer science research institute in Toulouse, France. Our team, SEPIA, works on resource management in various distributed systems (cloud datacenters, HPC centers, edge architectures, IoT…) and is especially interested in the ecological transition, notably in reducing energy consumption and CO2 emissions and in using renewable energy.

The PhD will be supervised by Georges Da Costa and Patricia Stolf in a convivial atmosphere :).

The PhD will be funded by the NumPEx collaborative project. The monthly gross salary will be 2100 €.

You can send us your application (cover letter + resume / short curriculum vitæ + transcript of records for the full master's degree) by email to georges.da-costa@irit.fr and patricia.stolf@irit.fr.