Configuration

Dask-jobqueue should be configured for your cluster so that it knows how many resources to request of each job and how to break up those resources. You can specify configuration either with keyword arguments when creating a Cluster object, or with a configuration file.

Keyword Arguments

You can pass keyword arguments to a Cluster object to describe how Dask-jobqueue should construct a single job:

from dask_jobqueue import PBSCluster

cluster = PBSCluster(
     # Dask-worker specific keywords
     cores=24,             # Number of cores per job
     memory='100GB',       # Amount of memory per job
     shebang='#!/usr/bin/env zsh',   # Interpreter for your batch script (default is bash)
     processes=6,          # Number of Python processes to cut up each job
     local_directory='$TMPDIR',  # Location to put temporary data if necessary
     # Job scheduler specific keywords
     resource_spec='select=1:ncpus=24:mem=100GB',
     queue='regular',
     project='my-project',
     walltime='02:00:00',
)

Note that the cores and memory keywords above correspond not to your full desired deployment, but rather to the size of a single job, which should be no larger than a single machine in your cluster. Separately, you will specify how many jobs to deploy using the scale method.

cluster.scale(12)  # launch 12 workers (2 jobs of 6 workers each) of the specification provided above
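If you prefer to reason in whole jobs, or you want the deployment to grow and shrink with your workload, recent dask-jobqueue releases also accept a jobs= keyword to scale and support adaptive scaling. The sketch below is illustrative; the worker counts are arbitrary, and a Client is attached so that computations actually run on the launched workers:

from dask.distributed import Client

cluster.scale(jobs=2)                   # the same request expressed in jobs rather than workers
# cluster.adapt(minimum=0, maximum=12)  # or scale adaptively between 0 and 12 workers

client = Client(cluster)                # connect a client so Dask computations use these workers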

Configuration Files

Specifying all parameters to the Cluster constructor every time can be error-prone, especially when sharing this workflow with new users. Instead, we recommend using a configuration file like the following:

# jobqueue.yaml file
jobqueue:
  pbs:
    cores: 24
    memory: 100GB
    processes: 6
    shebang: "#!/usr/bin/env zsh"

    interface: ib0
    local-directory: $TMPDIR

    resource-spec: "select=1:ncpus=24:mem=100GB"
    queue: regular
    project: my-project
    walltime: 00:30:00

See Configuration Examples for real-world examples.

If you place this file in your ~/.config/dask/ directory, then Dask-jobqueue will use these values by default. You can then construct a cluster object without any keyword arguments, and these parameters will be picked up automatically.

cluster = PBSCluster()
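To check which values were actually picked up, you can inspect Dask's merged configuration with dask.config.get. This is a small sketch assuming the file above has been placed in ~/.config/dask/; the jobqueue.pbs keys correspond to the YAML section shown earlier:

import dask

dask.config.get("jobqueue.pbs.cores")     # 24, taken from jobqueue.yaml
dask.config.get("jobqueue.pbs.walltime")  # '00:30:00'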

You can still override individual configuration values with keyword arguments:

cluster = PBSCluster(processes=12)
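If you only want to change a value for a particular run without editing the file, one option (a sketch, not required by Dask-jobqueue) is to set it temporarily with dask.config.set before constructing the cluster:

import dask

# Values set inside the context apply to clusters constructed within it.
with dask.config.set({"jobqueue.pbs.processes": 12}):
    cluster = PBSCluster()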

The first time you import dask_jobqueue, a blank jobqueue.yaml is automatically added at ~/.config/dask/jobqueue.yaml. Use the section of that configuration file that corresponds to your job scheduler. Above we used PBS, but other job schedulers operate the same way. You should be able to share these files with colleagues. If you can convince your IT staff, you can also place such a file in /etc/dask/ and it will apply to everyone on the cluster automatically.
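For example, on a SLURM system the workflow is identical, except that values are read from the jobqueue.slurm section instead; the values below are purely illustrative:

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()                     # uses defaults from the jobqueue.slurm section
cluster = SLURMCluster(walltime='01:00:00')  # keyword arguments still take precedence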

For more information about configuring Dask, see the Dask configuration documentation.