IdentifySpuriousGenesUnclassifiedCDS - Tutorial

Objective

This tutorial will guide you through the process of using the IdentifySpuriousGenes module to identify potential new loci or spurious alleles.

Prerequisites

Procedure

  1. Open the terminal

  2. Modify and run the following command to evaluate a set of unclassified CDSs:

SR IdentifySpuriousGenes -s '/path/to/tutorial_schema/schema_seed' -a '/path/to/Allele_calling' -o '/path/to/files/output_folder/IdentifySpuriousGenesUnclassifiedCDS' -m unclassified_cds --t 11 -c 6

Important

Replace the /path/to/ placeholders in the command with the actual paths to your schema seed, allele calling results, and output folder.
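
If you prefer to assemble the command from a script, the following is a minimal sketch rather than part of the module itself. It reuses the arguments exactly as they appear in the command above; the helper names build_command and missing_inputs are hypothetical, and the paths are still placeholders that you must replace.

```python
import shlex
from pathlib import Path

# Placeholder paths copied from the tutorial command; replace them with real ones.
SCHEMA_SEED = "/path/to/tutorial_schema/schema_seed"
ALLELE_CALLING = "/path/to/Allele_calling"
OUTPUT_DIR = "/path/to/files/output_folder/IdentifySpuriousGenesUnclassifiedCDS"


def build_command(schema_seed, allele_calling, output_dir):
    """Return the tutorial command as an argument list (hypothetical helper)."""
    return [
        "SR", "IdentifySpuriousGenes",
        "-s", schema_seed,
        "-a", allele_calling,
        "-o", output_dir,
        "-m", "unclassified_cds",
        "--t", "11",
        "-c", "6",
    ]


def missing_inputs(*paths):
    """List the input paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not Path(p).exists()]


cmd = build_command(SCHEMA_SEED, ALLELE_CALLING, OUTPUT_DIR)

# Report inputs that still point at placeholders or at missing folders.
for path in missing_inputs(SCHEMA_SEED, ALLELE_CALLING):
    print(f"Input not found: {path}")

# Print a shell-quoted version of the command for copying into the terminal.
print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch safe to try; once the paths are valid, the same argument list could be passed to subprocess.run.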

  3. The output directory contains files with the groups of similar alleles that were identified. The first lines of the final clusters file should look like:

Locus       Action  Class
GCA-000831145-protein1681   Join    1a
GCA-000831105-protein622    Join    1a
GCA-000012705-protein568    Join    1a
GCA-001275545-protein1163   Join    1a
GCA-000007265-protein582    Join    1a
GCA-000730215-protein582    Join    1a
GCA-000730255-protein607    Join    1a
GCA-000427075-protein661    Choice  3b
#
GCA-000427055-protein712    Join    1a
GCA-000730255-protein664    Join    1a
#
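
To inspect the groups programmatically, the listing above can be split into clusters. The sketch below is an illustration only: it assumes, based on the sample shown, that columns are separated by whitespace and that a line containing only # closes a group. The function name parse_clusters is hypothetical, and the sample text is an excerpt of the lines above.

```python
from collections import Counter


def parse_clusters(text):
    """Split a clusters listing into groups of (locus, action, class) tuples.

    Assumption taken from the sample output: a line holding only '#'
    marks the end of a group, and columns are whitespace-separated.
    """
    groups, current = [], []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0] == "Locus":
            continue  # skip blank lines and the header row
        if fields == ["#"]:
            if current:
                groups.append(current)
            current = []
            continue
        locus, action, cls = fields
        current.append((locus, action, cls))
    if current:
        groups.append(current)  # keep a trailing group with no closing '#'
    return groups


# Excerpt of the sample output shown in this tutorial.
sample = """\
Locus       Action  Class
GCA-000831145-protein1681   Join    1a
GCA-000831105-protein622    Join    1a
GCA-000427075-protein661    Choice  3b
#
GCA-000427055-protein712    Join    1a
GCA-000730255-protein664    Join    1a
#
"""

groups = parse_clusters(sample)
for number, group in enumerate(groups, start=1):
    actions = Counter(action for _, action, _ in group)
    print(f"Group {number}: {len(group)} loci, actions: {dict(actions)}")
```

To run it on your own results, read the final clusters file from the output directory and pass its contents to parse_clusters in place of the sample text.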

Example Output Structure

To see the expected output structure, refer to the “Outputs” section in the IdentifySpuriousGenes documentation.

Conclusion

You have successfully evaluated a set of unclassified coding sequences (CDSs) using the IdentifySpuriousGenes module.