IdentifySpuriousGenesSchema - Tutorial
Objective
This tutorial guides you through using the IdentifySpuriousGenes module to identify spurious genes in a schema.
Prerequisites
Download the schema from chewBBACA’s tutorial.
Procedure
Open a terminal.
Modify and run the following command to identify spurious genes in the schema:
SR IdentifySpuriousGenes -s '/path/to/tutorial_schema/schema_seed' -a '/path/to/Allele_calling' -o '/path/to/files/output_folder/IdentifySpuriousGenesSchema' -m schema -pm alleles_vs_alleles -t 11 -c 6
Important
Replace each /path/to/ placeholder in the command with the actual path to the corresponding files on your system.
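If you run the tutorial from a script, a minimal Python sketch like the one below keeps the placeholder paths in one place and passes them as an argument list, so paths containing spaces need no extra quoting. `build_command` and `run` are hypothetical helpers written for illustration, not part of SchemaRefinery. The flag values are copied from the command above; writing the option set to 11 as `-t` is an assumption, so check it against the module's help output.

```python
import shlex
import subprocess
from typing import List


def build_command(schema_dir: str, allele_call_dir: str, output_dir: str,
                  t_value: int = 11, c_value: int = 6) -> List[str]:
    """Return the tutorial's IdentifySpuriousGenes call as an argument list.

    Flag values mirror the tutorial command; the '-t' spelling for the
    option set to 11 is an assumption (confirm with the module's help).
    """
    return [
        "SR", "IdentifySpuriousGenes",
        "-s", schema_dir,           # schema seed folder
        "-a", allele_call_dir,      # allele calling results folder
        "-o", output_dir,           # folder where results are written
        "-m", "schema",
        "-pm", "alleles_vs_alleles",
        "-t", str(t_value),
        "-c", str(c_value),
    ]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    command = build_command(
        "/path/to/tutorial_schema/schema_seed",
        "/path/to/Allele_calling",
        "/path/to/files/output_folder/IdentifySpuriousGenesSchema",
    )
    run(command)  # dry run: replace the placeholders, then pass dry_run=False
```

The dry run only prints the command, which makes it easy to inspect the paths before anything is executed. `shlex.join` requires Python 3.8 or later.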
Check the output directory for the list of identified spurious genes. The first lines of the file containing the clusters of potential spurious loci should look like this:
Locus Action Class
GCA-000730255-protein547 Join 1a
GCA-000427055-protein583 Join 1a
GCA-000730215-protein2131 Choice 4b
GCA-000196055-protein1223 Choice 1c
GCA-000007265-protein1233 Choice 1c
GCA-000007265-protein534 Choice 1c
GCA-000012705-protein1877 Choice 4b
#
GCA-000730215-protein1962 Join 1a
GCA-000007265-protein1932 Join 1a
GCA-000196055-protein485 Choice 1c
GCA-000196055-protein1146 Choice 1c
GCA-000196055-protein398 Choice 1c
GCA-000427075-protein1286 Choice 1c
#
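To work with this file programmatically, the following minimal Python sketch groups its rows into clusters and counts the actions in each one. It assumes, based only on the example above, a `Locus Action Class` header, whitespace-separated columns, and a line holding only `#` after each cluster. The file name is not given in this tutorial, so pass the path of the clusters file from your output folder to `load_clusters`. `parse_clusters` and `load_clusters` are hypothetical helpers, not part of SchemaRefinery.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, List

# One row of the clusters file, e.g. {"locus": "...", "action": "Join", "class": "1a"}
Record = Dict[str, str]


def parse_clusters(text: str) -> List[List[Record]]:
    """Group the rows of the clusters file into one list per cluster.

    Assumed layout (inferred from the tutorial example, not from a spec):
    a 'Locus Action Class' header, whitespace-separated columns, and a
    line holding only '#' after each cluster.
    """
    clusters: List[List[Record]] = []
    current: List[Record] = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields == ["Locus", "Action", "Class"]:
            continue  # skip blank lines and the header row
        if fields == ["#"]:
            if current:
                clusters.append(current)  # '#' closes the current cluster
            current = []
            continue
        locus, action, cls = fields[:3]
        current.append({"locus": locus, "action": action, "class": cls})
    if current:
        clusters.append(current)  # tolerate a missing final '#'
    return clusters


def load_clusters(path: str) -> List[List[Record]]:
    """Read and parse a clusters file from disk (path supplied by the user)."""
    return parse_clusters(Path(path).read_text())


if __name__ == "__main__":
    # First rows of the tutorial example, used here as inline sample data.
    sample = """Locus Action Class
GCA-000730255-protein547 Join 1a
GCA-000427055-protein583 Join 1a
GCA-000730215-protein2131 Choice 4b
#
GCA-000730215-protein1962 Join 1a
GCA-000196055-protein485 Choice 1c
#
"""
    for number, cluster in enumerate(parse_clusters(sample), start=1):
        actions = Counter(row["action"] for row in cluster)
        print(f"cluster {number}: {len(cluster)} loci, actions {dict(actions)}")
```

Splitting on any whitespace lets the same parser handle both tab- and space-separated columns.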
Example Output Structure
To see the expected output structure, refer to the “Outputs” section in the IdentifySpuriousGenes documentation.
Conclusion
You have successfully identified spurious genes in a schema using the IdentifySpuriousGenes module.