dask_jobqueue.SLURMCluster

class dask_jobqueue.SLURMCluster(queue=None, project=None, walltime=None, job_cpu=None, job_mem=None, job_extra=None, **kwargs)

Launch Dask on a SLURM cluster

Parameters:
queue : str

Destination queue for each worker job. Passed to the #SBATCH -p option.

project : str

Accounting string associated with each worker job. Passed to the #SBATCH -A option.

walltime : str

Walltime for each worker job.

job_cpu : int

Number of CPUs to book in SLURM. If None, defaults to worker threads * processes.

job_mem : str

Amount of memory to request in SLURM. If None, defaults to worker processes * memory.

job_extra : list

List of other SLURM options, for example ['--exclusive']. Each option will be prepended with the #SBATCH prefix; see the job script sketch after this parameter list.

name : str

Name of Dask workers.

cores : int

Total number of cores per job

memory : str

Total amount of memory per job

processes : int

Number of processes per job

interface : str

Network interface like ‘eth0’ or ‘ib0’.

death_timeout : float

Seconds to wait for a scheduler before closing workers

local_directory : str

Dask worker local directory for file spilling.

extra : list

Additional arguments to pass to dask-worker

env_extra : list

Other commands to add to the job script before launching the worker.

python : str

Python executable used to launch Dask workers.

kwargs : dict

Additional keyword arguments to pass to LocalCluster
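
The SLURM-specific parameters above are rendered into #SBATCH directives in the generated submission script. A minimal sketch, assuming a site with a 'regular' partition and a 'myaccount' accounting string (both placeholders), that inspects the result with job_script():

>>> cluster = SLURMCluster(queue='regular', project='myaccount',
                           walltime='01:00:00', cores=8, memory="16GB",
                           job_extra=['--exclusive'])
>>> print(cluster.job_script())  # shows the resulting #SBATCH lines and the dask-worker command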

Examples

>>> from dask_jobqueue import SLURMCluster
>>> cluster = SLURMCluster(processes=6, cores=24, memory="120GB",
                           env_extra=['export LANG="en_US.utf8"',
                                      'export LANGUAGE="en_US.utf8"',
                                      'export LC_ALL="en_US.utf8"'])
>>> cluster.scale(10)  # this may take a few seconds to launch
>>> from dask.distributed import Client
>>> client = Client(cluster)
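
Once a client is connected, ordinary Dask computations run on the SLURM-backed workers. A minimal sketch using dask.array (the array shape and chunking are arbitrary):

>>> import dask.array as da
>>> x = da.random.random((10000, 10000), chunks=(1000, 1000))
>>> x.mean().compute()  # executed on the SLURM workers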

This also works with adaptive clusters. Adaptive mode automatically launches and kills workers based on load.

>>> cluster.adapt()
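
adapt() forwards its keyword arguments to the underlying adaptive deployment, so the cluster size can typically be bounded; the bounds below are arbitrary, illustrative values:

>>> cluster.adapt(minimum=2, maximum=20)  # keep between 2 and 20 workers depending on load
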
__init__(queue=None, project=None, walltime=None, job_cpu=None, job_mem=None, job_extra=None, **kwargs)

Methods

__init__([queue, project, walltime, …])
adapt(**kwargs) Turn on adaptivity
close() Stops all running and pending jobs and stops scheduler
job_file() Write job submission script to temporary file
job_script() Construct a job submission script
scale(n) Scale cluster to n workers
scale_down(workers) Close the workers with the given addresses
scale_up(n, **kwargs) Brings total worker count up to n
start_workers([n]) Start workers and point them to our local scheduler
stop_all_jobs() Stops all running and pending jobs
stop_jobs(jobs) Stop a list of jobs
stop_workers(workers) Stop a list of workers
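
A minimal sketch of the most common calls, assuming a cluster created as in the examples above (the worker count is arbitrary): inspect the generated submission script, scale the pool, then shut everything down:

>>> print(cluster.job_script())   # review the sbatch script that will be submitted
>>> cluster.scale(4)              # bring the pool up to 4 workers
>>> cluster.close()               # stop all running and pending jobs and the scheduler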

Attributes

cancel_command
dashboard_link
finished_jobs Jobs that have finished
job_id_regexp
pending_jobs Jobs pending in the queue
running_jobs Jobs with currently active workers
scheduler The scheduler of this cluster
scheduler_address
scheduler_name
submit_command
worker_threads
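
The job-tracking attributes can be used to monitor submissions; a small sketch (the values returned depend on your queue state):

>>> cluster.pending_jobs    # jobs still waiting in the SLURM queue
>>> cluster.running_jobs    # jobs with currently active workers
>>> cluster.finished_jobs   # jobs that have finished
>>> cluster.dashboard_link  # address of the scheduler's diagnostic dashboard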