ColabFold Downloads

  1. uniref30_2103.tar.gz
    MD5 Hash
    Byte Size
  2. bfd_mgy_colabfold.tar.gz
    MD5 Hash
    Byte Size
  3. colabfold_envdb_202108.tar.gz
    MD5 Hash
    Byte Size
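
Each archive above is published with an MD5 hash and a byte size, which can be used to check a download before unpacking it. The following is a minimal sketch of such a check, not an official ColabFold tool: the function name verify_download and its argument order are hypothetical, and the expected values must be copied from the MD5 Hash and Byte Size links above.

```shell
# verify_download FILE EXPECTED_MD5 EXPECTED_BYTES   (hypothetical helper)
# Returns 0 only if both the byte size and the MD5 hash match.
verify_download() {
    file=$1
    expected_md5=$2
    expected_bytes=$3

    [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }

    # Compare the size first: it is cheap and catches truncated downloads.
    actual_bytes=$(wc -c < "$file" | tr -d '[:space:]')
    if [ "$actual_bytes" != "$expected_bytes" ]; then
        echo "size mismatch for $file: got $actual_bytes, expected $expected_bytes" >&2
        return 1
    fi

    # md5sum is the GNU coreutils tool; macOS/BSD ship `md5 -q` instead.
    if command -v md5sum >/dev/null 2>&1; then
        actual_md5=$(md5sum "$file" | cut -d ' ' -f 1)
    else
        actual_md5=$(md5 -q "$file")
    fi
    if [ "$actual_md5" != "$expected_md5" ]; then
        echo "MD5 mismatch for $file: got $actual_md5, expected $expected_md5" >&2
        return 1
    fi

    echo "OK: $file"
}
```

A call would look like: verify_download uniref30_2103.tar.gz "<published MD5>" "<published byte size>", with both placeholders replaced by the published values.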

Database information

ColabFold databases are MMseqs2 expandable profile databases used to generate diverse multiple sequence alignments for protein structure prediction. They are the backend of our ColabFold MMseqs2 searches. Here you can download three databases: (1) UniRef30, (2) BFD/Mgnify and (3) ColabFold DB.

  • (1) UniRef30 is a 30% sequence identity clustered database based on UniRef100.
  • (2) BFD/Mgnify is a combination of BFD and Mgnify (2019_05). We merged both databases by searching the Mgnify sequences against the BFD cluster representative sequences. Each Mgnify sequence with a sequence identity higher than 30% and a local alignment that covers at least 90% of its length is assigned to the matching BFD cluster. All remaining sequences are clustered at 30% sequence identity and 90% coverage (--min-seq-id 0.3 -c 0.3 --cov-mode 1 -s 3) and merged with the BFD clusters, resulting in 182 million clusters. For each cluster we keep only the 10 most diverse sequences (filterresult --diff 100).
  • (3) ColabFold DB is constructed similarly to BFD/Mgnify. It contains BFD/Mgnify, MetaEuk (Levy Karin et al.), SMAG (Delmont et al.), TOPAZ (Alexander et al.), MGV (Nayfach et al.), GPD (Camarillo-Guerrero et al.) and MetaClust2.
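
The assignment rule described in (2), identity higher than 30% and at least 90% of the sequence covered by the alignment, can be illustrated with a small filter. This is only a sketch of the rule, not the pipeline used to build the databases. It assumes a hypothetical tab-separated hit table with four columns (query, target, fractional identity, fractional query coverage), and it takes the first qualifying hit per query in file order, which is also an assumption.

```shell
# assign_to_clusters HITS_TSV   (hypothetical helper)
# Input columns (assumed): query <TAB> target <TAB> identity (0-1) <TAB> query coverage (0-1)
# Prints "query <TAB> target" for the first hit of each query that passes both thresholds.
assign_to_clusters() {
    awk -F '\t' '
        # identity strictly above 30%, coverage of at least 90%
        $3 > 0.3 && $4 >= 0.9 && !($1 in seen) {
            seen[$1] = 1
            print $1 "\t" $2
        }' "$1"
}
```

Queries that never appear in the output correspond to the "remaining sequences" that would be clustered separately.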

Setup ColabFold Search

To set up the ColabFold MMseqs2 search you need an MMseqs2 version of commit or newer. Convert UniRef30 and either BFD/Mgnify or ColabFold DB using tsv2exprofiledb.

Build database

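mkdir database
cd database
# unpack and convert UniRef30 (the downloaded archives are assumed to be in this directory)
tar xzvf uniref30_2103.tar.gz
mmseqs tsv2exprofiledb uniref30_2103 uniref30_2103_db
# unpack and convert the ColabFold environmental database
tar xzvf colabfold_envdb_202108.tar.gz
mmseqs tsv2exprofiledb colabfold_envdb_202108 colabfold_envdb_202108_db
# precompute the search indices
mmseqs createindex uniref30_2103_db tmp
mmseqs createindex colabfold_envdb_202108_db tmp
cd ..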

Run the search script

Our ColabFold search script is here

# run search
chmod +x
./ mmseqs "query.fasta" "database/" "result/" "uniref30_2103_db" "" "colabfold_envdb_202108_db" "1" "0" "1"

The results are written to the result folder, which contains a uniref.a3m and a bfd.mgnify30.metaeuk30.smag30.a3m file. The query.fasta can contain multiple queries. Each query is separated by a null byte.
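
If the null-byte separation applies to the per-query alignments inside the .a3m result files (an assumption; the text above does not say which file it refers to), the entries can be split into one file per query. The sketch below is a hypothetical helper, not part of ColabFold: the name split_a3m and the 0.a3m, 1.a3m, ... output naming are illustrative choices.

```shell
# split_a3m INPUT OUTDIR   (hypothetical helper)
# Writes each NUL-separated entry of INPUT to OUTDIR/0.a3m, OUTDIR/1.a3m, ...
split_a3m() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1

    # awk implementations differ in how they handle NUL bytes, so map NUL
    # to the control character \001 first and split on that instead.
    tr '\000' '\001' < "$input" |
        awk -v outdir="$outdir" '
            BEGIN { RS = "\001"; n = 0 }
            length($0) > 0 {
                file = outdir "/" n ".a3m"
                n++
                printf "%s", $0 > file
                close(file)
            }'
}
```

For example, split_a3m result/uniref.a3m result/uniref_split would write one alignment file per query, in input order.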