Life Science Applications
AlphaFold3
AlphaFold3 is a state-of-the-art tool developed by DeepMind for predicting the 3D structures of proteins and protein complexes, including their interactions with other biomolecules such as ligands and nucleic acids. It applies machine learning to a broad range of biological systems, which makes it useful for structural biology, drug discovery, and systems biology. AlphaFold3 can model large multimeric assemblies and heterogeneous interactions with high accuracy, providing structural insights that are often difficult to obtain experimentally. Because it predicts structures for proteins from a wide range of organisms, it has become a widely used tool in computational biology.
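AlphaFold3 is provided as a Singularity container for ease of use and reproducibility in the HPC environment. The container includes all necessary software dependencies and supports GPU acceleration; the databases and the model parameters are not part of it and are described in notes [1] and [2] below. To start an interactive session on a GPU node and display the available options, use the following commands: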
marie@login$ module load container/all
marie@login$ module load Alphafold3/3.0.1
marie@login$ srun --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=10G --gres=gpu:1 --time=01:00:00 --pty bash
marie@compute$ run_alphafold --help
The AlphaFold3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/
Example: AlphaFold3 job script
The following job script runs on cluster Capella using 1 GPU, 13 CPUs and 100 GB of memory. The basic parameters that have to be provided are:
--db_dir: the directory containing the AlphaFold3 database files
--model_dir: the directory containing the model parameters
--output_dir: the output directory
--json_path: the input JSON file
In this example, we set --jackhmmer_n_cpu=3 (the number of CPUs per jackhmmer process). The input sequence is queried against 4 databases at the same time (uniref90_2022_05.fa, mgy_clusters_2022_05.fa, bfd-first_non_consensus_sequences.fasta and uniprot_all_2021_04.fa), so 4 jackhmmer processes run in parallel and together use 4 × 3 = 12 of the 13 requested CPUs.
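Instead of hard-coding the value, the per-process CPU count can be derived from the CPUs Slurm actually granted. The sketch below is not part of the AlphaFold3 module; it assumes that one CPU is kept free for the main run_alphafold process and that the remaining CPUs are split evenly over the 4 databases searched in parallel, which reproduces the 13 CPU / 3 CPU setting used here.

```shell
# Sketch: derive --jackhmmer_n_cpu from the Slurm CPU allocation.
# Assumption: 1 CPU stays free for the main process, the rest is split
# evenly over the 4 protein databases searched with jackhmmer in parallel.
N_DATABASES=4
# SLURM_CPUS_PER_TASK is set by Slurm inside the job; fall back to 13 otherwise
TOTAL_CPUS=${SLURM_CPUS_PER_TASK:-13}
# Integer division rounds down, so the CPUs are never oversubscribed
JACKHMMER_N_CPU=$(( (TOTAL_CPUS - 1) / N_DATABASES ))
# Never go below one CPU per jackhmmer process
if [ "$JACKHMMER_N_CPU" -lt 1 ]; then
    JACKHMMER_N_CPU=1
fi
echo "--jackhmmer_n_cpu=${JACKHMMER_N_CPU}"
```

Inside the job script, the computed value could then be passed as --jackhmmer_n_cpu="${JACKHMMER_N_CPU}" in place of the fixed 3, so that changing --cpus-per-task automatically adjusts the jackhmmer setting.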
#!/bin/bash
#SBATCH --job-name=AF3_prediction_test
#SBATCH --output=log_%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=13
#SBATCH --time=01:00:00
#SBATCH --mem=100G
#SBATCH --gres=gpu:1
#SBATCH --partition=capella
module purge
module load container/all
module load Alphafold3/3.0.1
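# [1] --db_dir: shared copy of the AlphaFold3 databases on the cluster
# [2] --model_dir: directory holding your own af3.bin.zst from DeepMind
# [3] --json_path: your input file, e.g. 1YU9.json below
run_alphafold --db_dir=/data/cat/shared/AlphaFold3/databases \
    --model_dir=/path/to/directory/containing/af3.bin.zst \
    --output_dir=/path/to/your/working_directory \
    --json_path=/path/to/your/input/json_file \
    --jackhmmer_n_cpu=3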
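Because a wrong path is only detected once the job has left the queue, it can help to check the user-supplied paths on the login node first. The function below is a hypothetical helper, not part of the AlphaFold3 module; it assumes the model parameters are kept under their downloaded name af3.bin.zst.

```shell
# Hypothetical pre-flight check for the paths passed to run_alphafold.
# Assumption: the model parameters are stored as af3.bin.zst in model_dir.
check_af3_inputs() {
    local model_dir="$1"
    local json_path="$2"
    local output_dir="$3"
    local status=0
    # Model parameters must be downloaded by each user (see note [2])
    if [ ! -f "${model_dir}/af3.bin.zst" ]; then
        echo "missing: ${model_dir}/af3.bin.zst" >&2
        status=1
    fi
    # The input JSON must exist and be readable
    if [ ! -r "$json_path" ]; then
        echo "missing or unreadable: ${json_path}" >&2
        status=1
    fi
    # Create the output directory if it does not exist yet
    if ! mkdir -p "$output_dir"; then
        echo "cannot create: ${output_dir}" >&2
        status=1
    fi
    return "$status"
}
```

For example, check_af3_inputs "$HOME/af3_models" 1YU9.json results && sbatch af3_job.sh would submit only if all checks pass; af3_models and af3_job.sh are hypothetical names for your parameter directory and the job script above.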
[1] The AlphaFold3 databases must be available for the program to run. AlphaFold3 provides a script, fetch_databases.sh, for downloading them; it is accessible from within the container at /app/alphafold3/fetch_databases.sh. You can enter the container with the enter_container wrapper command after loading the AlphaFold3 module. The uncompressed files require ~900 GB of space. To save users time and space, we have downloaded the database files to the public directory /data/cat/shared/AlphaFold3/databases.
Note: If a new AlphaFold3 release requires databases that we have not yet provided, please open a ticket to request an update.
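A quick way to confirm that a database directory is usable before pointing --db_dir at it is to check for the files searched by jackhmmer. This hypothetical function only looks for the four protein databases named in this section and assumes they sit directly in the database directory under those names; it does not check the remaining AlphaFold3 databases.

```shell
# Hypothetical check for the jackhmmer protein databases named in this section.
# Assumption: the files lie directly in db_dir under exactly these names.
check_af3_databases() {
    local db_dir="${1:-/data/cat/shared/AlphaFold3/databases}"
    local missing=0
    local f
    for f in uniref90_2022_05.fa mgy_clusters_2022_05.fa \
             bfd-first_non_consensus_sequences.fasta uniprot_all_2021_04.fa; do
        # -s: the file must exist and be non-empty
        if [ ! -s "${db_dir}/${f}" ]; then
            echo "missing or empty: ${db_dir}/${f}" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 only if every file was found
    [ "$missing" -eq 0 ]
}
```

Called without an argument, it checks the shared copy; pass a path to check a copy you downloaded yourself with fetch_databases.sh.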
[2] We cannot provide the AlphaFold3 model parameters globally on our system because of license restrictions. To request access to them, follow the process described in the AlphaFold documentation and download the file af3.bin.zst. Pass the directory that contains this file to --model_dir.
[3] A test input file, 1YU9.json, is shown below. With the Slurm settings above, the prediction completed in less than 10 minutes on cluster Capella.
1YU9.json
{
  "name": "1YU9",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {
      "id": "A",
      "sequence": "GPLGSETYDFLFKFLVIGNAGTGKSCLLHQFIEKKFKDDSNHTIGVEFGSKIINVGGKYVKLQIWDTAGQERFRSVTRSYYRGAAGALLVYDITSRETYNALTNWLTDARMLASQNIVIILCGNKKDLDADREVTFLEASRFAQENELMFLETSALTGEDVEEAFVQCARKILNK"
    }}
  ],
  "dialect": "alphafold3",
  "version": 1
}
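To predict a different single protein chain, the same JSON layout can be written from the shell. The function below is a hypothetical helper that only reproduces the fields of 1YU9.json (one seed, one protein chain); it assumes the name, chain ID and sequence contain no double quotes or line breaks, since those would make the JSON invalid.

```shell
# Hypothetical helper: write a single-chain AlphaFold3 input file with the
# same fields as 1YU9.json. Assumption: arguments contain no quotes/newlines.
make_af3_input() {
    local name="$1"
    local chain_id="$2"
    local sequence="$3"
    local out="$4"
    cat > "$out" <<EOF
{
  "name": "${name}",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {
      "id": "${chain_id}",
      "sequence": "${sequence}"
    }}
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

For instance, make_af3_input my_protein A "$SEQ" my_protein.json writes an input file for the sequence stored on a single line in the hypothetical variable SEQ; if Python is available, python3 -m json.tool my_protein.json is a quick syntax check before submitting.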