DAN System User Guide
Index
- About DAN System
- Rules and Conducts
- Login to DAN System
- Filesystem (Directory) Structure
- Running Jobs
- Software Library
- conda Cheat Sheet on DAN System
- Jupyter and RStudio on DAN System
- nf-core configuration on DAN System
- Reference Genomes on DAN System
About DAN System
DAN System is an HPC system owned by reNEW (The Novo Nordisk Foundation Center for Stem Cell Medicine, SUND, KU) for computational experts who need to connect large datasets stored on KU-IT storage to compute resources. It is primarily aimed at genomics and image data analysis. The system includes 1 GPU computing node and 2 CPU computing nodes contributed by collaborators of reNEW (CGEN -- Center for Gene Expression, and CPR -- Novo Nordisk Foundation Center for Protein Research) across the faculty of KU.
System Specification:
Head Node -- danhead01fl
- CPU: 6 virtual CPUs
- RAM: 12GB
- OS: Red Hat Enterprise Linux 8, with SLURM Workload Manager
- ROLE: Login/control node
GPU Node -- dangpu01fl
- CPU: 4 × Intel Xeon Platinum 8280 @ 2.70GHz, 224 cores/threads (Hyper-threading)
- GPU: 4 × NVIDIA Quadro RTX 8000
- RAM: 4TB
- OS: Red Hat Enterprise Linux 8, with SLURM Workload Manager
- ROLE: Compute node, Jupyter/RStudio server
CPU Node 01 -- dancmpn01fl
- CPU: 2 × AMD EPYC 7763 @ 2.45GHz, 256 cores/threads (Hyper-threading)
- RAM: 4TB
- OS: Red Hat Enterprise Linux 8, with SLURM Workload Manager
- ROLE: Compute node
CPU Node 02 -- dancmpn02fl
- CPU: 2 × AMD EPYC 9454 @ 2.75GHz, 192 cores/threads (Hyper-threading)
- RAM: 768GB
- OS: Red Hat Enterprise Linux 8, with SLURM Workload Manager
- ROLE: Compute node
Please contact Sen Li to get access to DAN System. You are also welcome to join our Slack channel.
Rules and Conducts
- DO NOT run any program on Head Node directly except basic file operation commands e.g. cp, mv, rsync, etc.
- DO NOT run any job without slurm. Please read the section 'Running Jobs' for more details.
- DO NOT save data or install programs under your homedir. Please read the section 'Filesystem Structure' for more details.
- DO NOT run any job under the folder '$HOME/ucph'. Please read the section 'Filesystem Structure' for more details.
- During working hours, the maximum total CPU usage of all running jobs per user is 32 cores (one extra slurm interactive session with max. 8 cores is allowed); jobs exceeding this limit will be killed without notice.
- Please notify other users in advance if you are planning to run a heavy job (> 40 CPUs).
- For any special concerns about your job, please contact Sen Li.
Login to DAN System
DAN System only accepts the login via your own KU credentials.
- Launch Cisco AnyConnect (shipped with your KUComputer, or visit KU VPN Service for the guide) and connect to the VPN. OPTIONAL: you can skip this step if you use the cable connection in the KU reNEW offices.
- Open a Terminal (for Windows users, please download/install PuTTY).
Login via ssh with your own KUID (server address: danhead01fl.unicph.domain):
ssh YOUR_KUID@danhead01fl.unicph.domain ## Note: Please replace **YOUR_KUID** with your own one.
Enter your KU password.
Run the command below once when you log in to the server for the first time.
/projects/dan1/apps/etc/init_dangpu_env.sh
Then, start a new bash session or simply logout and login back again.
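Optionally, once you are logged back in, you can check that the DAN module environment is configured. This is only a quick sanity check; the exact list of available modules will differ over time:
echo $MODULEPATH    # should include /maps/projects/dan1/apps/.modules
module avail        # lists the software modules described in the 'Software Library' section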
Filesystem (Directory) Structure
- '/home/[YOUR_KU_ID]' = 'homedir': Your home directory and the path where you land when you log in to DAN System. Note that the homedir has a quota of 100GB, so please DO NOT save your data or install programs in homedir; use it only for config files.
- '/projects/dan1/people/[YOUR_KU_ID]' = 'datadir': The directory where you can save your personal data and run your jobs. It is highly recommended to make a symbolic link to it in your homedir:
ln -s /projects/dan1/people/[YOUR_KU_ID] $HOME/datadir
Keep in mind that we may need to pay for this storage as a whole group in the future, so please keep your usage as minimal as possible, e.g. move/delete raw data, intermediate files and results to kudrives or ERDA for long-term archiving.
- '/projects/dan1/data' = 'sharedir': This is where we keep data shared across the whole group, such as shared databases, reference genomes, etc.
- '/home/[YOUR_KU_ID]/ucph' = 'kudrives': This is where the KU network drives you have access to are mounted, such as your personal H-drive and group-shared N-drives. Please keep in mind: NEVER NEVER NEVER launch any job under this directory or its sub-directories, except to copy/move/delete files.
- '/datasets/[YOUR_DATASET_NAME]' = 'dataset': Compute Dataset, similar to a KU N-drive but with high performance; you can run your jobs directly under this folder. Please contact Sen Li if you need to request a dataset.
- '/scratch': the local drive used for temporary files and caches. If a program asks for a value for 'TMP_DIR', 'TMP', etc., please set it to '/scratch' (see the example after this list).
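For example, many command-line tools read the temporary directory from an environment variable or an explicit option; a minimal sketch, where the tool name and its '--tmp-dir' option are placeholders for illustration:
export TMPDIR=/scratch              # many programs honour TMPDIR for temporary files
some_tool --tmp-dir /scratch ...    # others take an explicit option; check the tool's manual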
Running Jobs
DAN System is managed by 'slurm', a workload manager, and 'danhead' is the slurm control node. Since 'danhead' has very limited system resources and is shared by all users, the MOST IMPORTANT rule of working on DAN System is: DO NOT run any job on 'danhead' (except copying/moving files); e.g. executing your python/R scripts directly in the terminal after login, or in a screen session, is absolutely prohibited. Please read the instructions below carefully before starting your first job on DAN System.
Note: Any process other than basic file operation commands on 'danhead' will be killed by the system.
Interactive slurm job
If you want to submit a slurm job, test your script, or run some short-lived tasks, you can launch an interactive slurm job session with:
srun -c 2 --mem=8gb --time=0-00:05:00 --pty bash
Options you may want to change (a GPU example follows this list):
- '-c 2': number of processors/cores, e.g. 2 cores here
- '--mem=8gb': total requested RAM, e.g. 8GB here (if no unit is given, the default is MB, i.e. 8000 = 8000MB)
- '--time=0-00:05:00': max. running time of the session, format = D-HH:MM:SS
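If you need to test something on the GPU node, you can request an interactive session in the 'gpuqueue' partition with one GPU attached. This is a sketch using the same partition and '--gres' options as in the job-script example further below; adjust the resources to your needs:
srun -p gpuqueue --gres=gpu:1 -c 4 --mem=32gb --time=0-01:00:00 --pty bash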
'screen/tmux' session + interactive slurm job
GNU Screen and tmux are terminal multiplexers, which allow you to send a running terminal session to the background, where it keeps running even if you are disconnected from the server. Please follow one of the tutorials if you need to learn the usage of screen (tutorial) or tmux (tutorial).
Since screen/tmux lets you run jobs in the background and the running processes are not terminated even if you get disconnected, launching an interactive slurm job inside a screen/tmux session and then detaching from the session is the recommended way to run tasks (again, please DO NOT run any job directly in a screen/tmux session without slurm).
Workflow on DAN System
Important note: Due to the lack of Kerberos authentication support in slurm, you lose access to the KU H/N-drives in any slurm session on DAN System. If you need to transfer data from an H/N-drive, it must be done before launching a slurm session. A command-level sketch of the full workflow follows the list below.
- Start a screen/tmux session
- Launch a slurm interactive job
- Run your script or submit slurm jobs to the queue
- Detach from screen/tmux
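A minimal sketch of this workflow (the session name, script name and resource numbers are only examples):
screen -S myjob                                    # 1. start a named screen session
srun -c 2 --mem=8gb --time=0-02:00:00 --pty bash   # 2. launch an interactive slurm job
bash my_analysis.sh                                # 3. run your script, or submit job scripts with sbatch
# 4. detach with 'Ctrl-a' then 'd'; later, reattach with: screen -r myjob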
Submitting a slurm job script
A slurm job script consists of 2 sections: 'sbatch (slurm)' options and your own scripts. Here is a simple example of a script with the minimum required options.
Use your favorite (command-line) text editor to create a file. Here we take 'nano' as an example.
nano slurm_job_example.sh
Write the text below to the file.
#!/bin/bash
### slurm job options: lines must begin with '#SBATCH'
#SBATCH --job-name=any_name # job name
#SBATCH --mail-type=END,FAIL # mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your.name@sund.ku.dk # email address to receive the notification
#SBATCH -c 2 # number of requested cores
#SBATCH --mem=8gb # total requested RAM
#SBATCH --time=0-00:05:00 # max. running time of the job, format in D-HH:MM:SS
#SBATCH --output=any_name_%j.log # standard output and error log, '%j' gives the job ID
##SBATCH -p gpuqueue # if you need to use GPU node, copy this line and delete one '#'
##SBATCH --gres=gpu:1 # request 1 gpu
##SBATCH -w dancmpn01fl # submit job to a certain node only
##SBATCH -x dancmpn02fl # exclude a certain node
### write your own scripts below
module load dangpu_libs python/3.7.13
echo "I am running a job with the slurm job script"
python /this/is/a/fake/path/empty_python_script.py
Press 'Ctrl' + 'x' (same on both PC and Mac) and confirm to save and quit editing. Then you can submit the job script to the slurm queue.
sbatch slurm_job_example.sh
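After submission, sbatch prints the job ID; you can then follow the job with the standard slurm commands and the log file defined by '--output' (the job ID below is only a placeholder):
squeue -u $USER                 # check whether the job is pending (PD) or running (R)
tail -f any_name_12345678.log   # follow the log; replace 12345678 with your actual job ID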
Note:
- Only lines beginning with '#SBATCH' are treated as sbatch options; similar-looking lines like '##SBATCH' or '#[space]SBATCH' are just comments.
- Be kind to your colleagues. When you specify the number of CPUs or other resources, please keep in mind that you are not the only user on the server. If you are planning to run a job that requires more than 32 CPUs, please talk to Sen Li first.
- Always specify an estimated running time, even though it does not need to be precise. It helps the admin schedule maintenance.
- As dancmpn02fl has a smaller RAM size (but high-performance CPUs), if your job needs much more RAM than the '4GB/CPU' ratio, e.g. 4 CPUs + 64GB RAM, it is recommended to exclude this node from the default queue or use gpuqueue instead. To exclude the node, add '-x dancmpn02fl' to the sbatch/srun/salloc command options or state '#SBATCH -x dancmpn02fl' in the slurm job script (see the example after these notes).
- If your job failed and you need help, please attach the job log and contact Sen Li.
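For example, to exclude 'dancmpn02fl' when submitting the script above without editing it (the same '-x' option also works for srun and salloc):
sbatch -x dancmpn02fl slurm_job_example.sh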
Job Control
Check the usage and status of all compute nodes:
xsload
Check the status of your submitted jobs:
squeue -u [YOUR_KU_ID]
or xjobs
Check all running and queuing jobs on the server:
squeue
or xpinfo
Cancel your running or queuing job:
scancel [JOB_ID]
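To cancel all of your own jobs at once (a standard scancel option, not specific to DAN System):
scancel -u [YOUR_KU_ID]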
Software Library
DAN System uses Environment Modules to manage the software library. To use the DAN-specific module files (if you have already run the 'init_dangpu_env.sh' script, please skip the next 2 lines), you must add the line below to your '$HOME/.bashrc' file and reload the bash profile:
export MODULEPATH=/maps/projects/dan1/apps/.modules:${MODULEPATH}
Then you can run:
module avail
to show the list of installed software.
To load a certain software (e.g. 'cutadapt'), run:
module load dangpu_libs python/3.7.13 cutadapt/4.1
Note:
- Modules 'dangpu_libs' and 'python/3.7.13' are prerequisites of 'cutadapt'. All prerequisite modules must be loaded before loading the one you want to use.
- If you don't know which modules are prerequisites, just try to load the one you want first; you will see an error message with a 'HINT' listing the prerequisites.
To check loaded modules in the current environment, run:
module list
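A typical module session might look like this (module versions are taken from the 'cutadapt' example above; 'cutadapt --version' is just a quick check that the tool is on your PATH):
module load dangpu_libs python/3.7.13 cutadapt/4.1   # load a tool together with its prerequisites
cutadapt --version                                    # confirm the tool is available
module list                                           # show what is currently loaded
module purge                                          # unload all modules when you are done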
If you need to install new software, please send your request to Sen Li with a link to the software.
conda Cheat Sheet on DAN System
miniconda has been installed on DAN System. Here are a few tips on how to work with conda on DAN System.
Load conda module
If you have configured the MODULEPATH as instructed above, it is highly recommended to load conda with:
module load miniconda/latest
which uses the miniconda installed locally on DAN System.
Configure your own conda root directory
By default, conda uses '${HOME}/.conda' as your personal root directory. However, as the quota of the homedir is only 100GB, it is smart to change it, e.g. to your datadir:
mkdir -p /maps/projects/dan1/people/${USER}/.conda
Or, if you already have the folder in your homedir, move it:
mv ${HOME}/.conda /maps/projects/dan1/people/${USER}/.conda
Then create a symbolic link in your homedir:
ln -s /maps/projects/dan1/people/${USER}/.conda $HOME/.conda
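You can verify that conda now resolves its directories through the symlink (a quick check; the exact output depends on your conda version):
ls -ld $HOME/.conda   # should point to /maps/projects/dan1/people/${USER}/.conda
conda info            # the envs directories and package cache should resolve under your datadir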
Other conda configurations
Specify your personal conda environments/packages directories:
conda config --add envs_dirs $HOME/.conda/envs
conda config --add pkgs_dirs $HOME/.conda/pkgs
Add the bioconda channel to your condarc file (the file that stores your personal configs):
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
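With the channels configured, you can create and use environments under the relocated conda root as usual; a sketch where the environment name and package are only examples:
conda create -n mytools samtools   # installed from bioconda; the env lands in $HOME/.conda/envs
conda activate mytools             # if activation fails, run 'conda init bash' once and re-login
samtools --version
conda deactivate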
Jupyter and RStudio on DAN System
DAN System hosts Jupyter and RStudio via the hub on the GPU node 'dangpu01fl'; you can access it by:
The independent entry to RStudio (http://dangpu01fl:8787) is still available at the moment; however, it will eventually be closed.
nf-core configuration on DAN System
All nf-core pipelines have been pre-configured for DAN System. Please read the instructions at the link below:
Reference Genomes on DAN System
DAN System provides local support and maintenance of common reference genomes (e.g. human, mouse, etc.) and pre-built indices for various programs (e.g. star, bowtie, etc.).
The genome library is managed by a program called refgenie. It is a command-line tool that simplifies finding local resources, and it can be easily scripted and integrated into your own code (see the example after the commands below).
To load the refgenie on DAN System:
module load python/3.8.16 refgenie/0.12.1a
List all local genomes:
refgenie list
List all assets of a certain genome (e.g. GRCh38_ensembl):
refgenie list -g GRCh38_ensembl
Locate the path of the index of a program (e.g. star, version 2.7.11b):
refgenie seek GRCh38_ensembl/star_index:2.7.11b
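Because 'refgenie seek' prints a plain path, it is easy to use inside your own scripts. A minimal sketch, where the STAR call and its input files are placeholders for illustration:
STAR_INDEX=$(refgenie seek GRCh38_ensembl/star_index:2.7.11b)   # resolve the index path
STAR --genomeDir "$STAR_INDEX" \
     --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz --readFilesCommand zcat \
     --runThreadN 8 --outSAMtype BAM SortedByCoordinate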
Note:
- The DAN Reference Genomes library does NOT include/support genomes tailored for personal use. However, please contact Sen if you want to have a registry for your project or group.
- The format of a refgenie registry: [GENOME_NAME]/[ASSET_NAME]:[VERSION]
- If you want to manage a personal refgenie library yourself, please refer to refgenie documents or contact Sen Li.