Life Science Applications

AlphaFold3

AlphaFold3 is a state-of-the-art tool developed by DeepMind for predicting the 3D structures of proteins, protein complexes, and their interactions with other biomolecules such as ligands and nucleic acids. It uses advanced machine learning methods to support a broad range of biological systems, making it particularly useful for applications in structural biology, drug discovery, and systems biology. AlphaFold3 can model large multimeric assemblies and heterogeneous interactions with high accuracy, providing insights that are often difficult to obtain experimentally. Its ability to predict structures across a wide range of organisms and conditions has made it a transformative tool in computational biology.

AlphaFold3 is provided as a Singularity container for ease of use and reproducibility in the HPC environment. It includes all necessary dependencies and model data, and is compatible with GPU-accelerated computing. It can be used with the following commands:

marie@login$ module load container/all
marie@login$ module load Alphafold3/3.0.1
marie@login$ srun --nodes=1 --ntasks=1 --cpus-per-task=1 --mem=10G --gres=gpu:1 --time=01:00:00 --pty bash
marie@compute$ run_alphafold --help

The AlphaFold3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/

Example: Running an AlphaFold3 job script

Here we give an executable example on the cluster Capella using 1 GPU, 13 CPUs, and 100 GB of memory. To run the program, the following basic parameters must be provided:

--db_dir: the AlphaFold 3 database files
--model_dir: the model parameters
--output_dir: the output directory
--json_path: the input file

In this example, we set --jackhmmer_n_cpu=3 (the number of CPUs per jackhmmer process) so that four jackhmmer processes can run efficiently in parallel. The input sequence is queried against four databases at the same time (uniref90_2022_05.fa, mgy_clusters_2022_05.fa, bfd-first_non_consensus_sequences.fasta, and uniprot_all_2021_04.fa), which occupies 12 of the 13 requested CPUs.

#!/bin/bash
#SBATCH --job-name=AF3_prediction_test
#SBATCH --output=log_%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=13
#SBATCH --time=01:00:00
#SBATCH --mem=100G
#SBATCH --gres=gpu:1
#SBATCH --partition=capella

module purge
module load container/all
module load Alphafold3/3.0.1

# Notes [1], [2] and [3] below explain --db_dir, --model_dir and --json_path.
run_alphafold --db_dir=/data/cat/shared/AlphaFold3/databases \
    --model_dir=/path/to/downloaded/af3.bin.zst/from_Deepmind \
    --output_dir=/path/to/your/working_directory \
    --json_path=/path/to/your/input/json_file \
    --jackhmmer_n_cpu=3
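
The CPU request in the script above follows from the jackhmmer settings. As a minimal sketch of that arithmetic, the value for --cpus-per-task could be derived as shown here. The variable names are illustrative, and the one extra CPU for the main process is our reading of the 13 CPUs in the example, not a documented rule.

```shell
#!/bin/bash
# Sketch: derive --cpus-per-task from the jackhmmer settings.
n_databases=4        # databases searched in parallel by jackhmmer (see text above)
jackhmmer_n_cpu=3    # value passed to --jackhmmer_n_cpu
extra_cpus=1         # assumption: one spare CPU for the main process

cpus_per_task=$(( n_databases * jackhmmer_n_cpu + extra_cpus ))
echo "cpus-per-task: ${cpus_per_task}"
```

With the values from this example, the result matches the 13 CPUs requested in the job script. If you change --jackhmmer_n_cpu, adjust --cpus-per-task accordingly.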

[1] AlphaFold3 needs its databases to run properly. AlphaFold3 provides a script, fetch_databases.sh, for downloading these files; it is available inside the container at /app/alphafold3/fetch_databases.sh. Users can enter the container with the enter_container wrapper command after loading the AlphaFold3 module. The uncompressed files require about 900 GB of space. To save users time and space, we have downloaded the database files to the shared location /data/cat/shared/AlphaFold3/databases. Note: if a new AlphaFold3 release or new datasets become available that we are not yet aware of, please open a ticket and ask for an update.

[2] We cannot provide the AlphaFold3 model parameters globally on our system because of license restrictions. To request access to the model parameters, follow the process set out in the AlphaFold documentation and download the file af3.bin.zst.

[3] A test input file, 1YU9.json, is shown below. With the Slurm settings given above, the calculation took less than 10 minutes on the cluster Capella.
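
Because a mistyped path only shows up after the job has waited in the queue, it can help to check the four parameters before calling run_alphafold. The following is a minimal sketch, not part of the AlphaFold3 module: check_af3_inputs is a hypothetical helper, and creating the output directory up front is our own choice to fail early.

```shell
#!/bin/bash
# Sketch: verify the run_alphafold paths before starting a prediction.
# check_af3_inputs is a hypothetical helper, not provided by the module.
check_af3_inputs() {
    local db_dir="$1" model_dir="$2" output_dir="$3" json_path="$4"
    local status=0

    # The database location must be an existing directory.
    [ -d "$db_dir" ] || { echo "missing database directory: $db_dir" >&2; status=1; }
    # Only existence is checked for the model parameters path.
    [ -e "$model_dir" ] || { echo "missing model parameters: $model_dir" >&2; status=1; }
    # The input JSON file must be readable.
    [ -r "$json_path" ] || { echo "cannot read input file: $json_path" >&2; status=1; }
    # Assumption: we create the output directory ourselves to fail early.
    mkdir -p "$output_dir" || { echo "cannot create output directory: $output_dir" >&2; status=1; }

    return "$status"
}
```

In a job script, the helper could be called right before run_alphafold with the same four paths, followed by `|| exit 1` so that the job stops immediately when a path is wrong.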

1YU9.json

{
  "name": "1YU9",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {
      "id": "A",
      "sequence": "GPLGSETYDFLFKFLVIGNAGTGKSCLLHQFIEKKFKDDSNHTIGVEFGSKIINVGGKYVKLQIWDTAGQERFRSVTRSYYRGAAGALLVYDITSRETYNALTNWLTDARMLASQNIVIILCGNKKDLDADREVTFLEASRFAQENELMFLETSALTGEDVEEAFVQCARKILNK"
    }}
  ],
  "dialect": "alphafold3",
  "version": 1
}
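
JSON strings cannot contain raw line breaks, so a sequence copied from a wrapped FASTA file has to be joined into a single line before it goes into the input file. As a minimal sketch, a single-chain input file with the same fields as 1YU9.json could be generated like this. write_af3_json is a hypothetical helper, and it assumes that the name and the sequence contain no quotes or backslashes.

```shell
#!/bin/bash
# Sketch: write an AlphaFold3 input file for a single protein chain.
# write_af3_json is a hypothetical helper; the field layout copies 1YU9.json.
write_af3_json() {
    local name="$1" sequence="$2" out_file="$3"

    # Remove spaces, tabs and line breaks left over from a wrapped sequence.
    sequence=$(printf '%s' "$sequence" | tr -d '[:space:]')

    cat > "$out_file" <<EOF
{
  "name": "${name}",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {
      "id": "A",
      "sequence": "${sequence}"
    }}
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

For example, `write_af3_json my_protein "$(cat my_sequence.txt)" my_protein.json` would write the file that is then passed to --json_path; the file names here are placeholders.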