
Example using the default Bayesian method on the example file

This example data is a subset of the dataset from @Labouyrie2023-yc.

First, the genome sizes are predicted from the taxa:

  results = estimate_genome_size(example_input_file, sep='\t', match_column='TAXID', output_format='input', ci_threshold = 0.3)
  
  #############################################################################
  # Genome size estimation summary:
  #
  #  22.22222 % estimations achieving required precision
  #
       Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    3007721   5408472  16980834  23969767  41811396 143278734 
  
  # Estimation status:
  Confidence interval to estimated size ratio > ci_threshold    OK 
                                                         140    40 
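
The percentage in the summary can be checked against the status counts: 40 of the 180 estimates are flagged `OK`. The sketch below redoes that arithmetic and illustrates what a precision check against `ci_threshold` might look like. The helper `ci_ratio` and its formula (confidence interval width divided by the point estimate) are assumptions made for illustration, not the package's documented implementation, and the numbers passed to it are invented.

```r
# Status counts copied from the summary printed above.
status_counts = c(
  "Confidence interval to estimated size ratio > ci_threshold" = 140,
  "OK" = 40
)

# Share of estimates flagged OK, as a percentage.
pct_ok = 100 * status_counts[["OK"]] / sum(status_counts)
print(pct_ok)

# Hypothetical helper (assumed formula, not taken from the package):
# width of the confidence interval relative to the point estimate.
ci_ratio = function(estimate, ci_lower, ci_upper) {
  (ci_upper - ci_lower) / estimate
}

# Invented values: an estimate of 5 Mbp with an interval of 4 to 7 Mbp.
ratio = ci_ratio(5e6, 4e6, 7e6)
exceeds_threshold = ratio > 0.3
print(exceeds_threshold)
```

Under this assumed definition, an estimate whose ratio exceeds `ci_threshold` would fall into the first status category rather than `OK`.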

Plot genome size histogram per sample

The results can then be visualized with the package's plotting functions. This histogram shows the distribution of estimated genome sizes in each sample.

  plotted_df = plot_genome_size_histogram(results)

Plot genome size histogram for one sample

  plotted_df = plot_genome_size_histogram(results, only_sample='16S_1')

Plot genome size boxplot per sample

This boxplot shows the estimated genome sizes for each sample:

  plotted_df = plot_genome_size_boxplot(results)

Plot genome size boxplot for one sample

  plotted_df = plot_genome_size_boxplot(results, only_sample='ITS_1')

Plot simplified taxonomic tree with colour-coded estimated genome sizes

This tree shows the taxonomic relationships together with the estimated genome sizes. The difference between the genome size distributions of bacteria (16S marker) and fungi (ITS marker) is visible.

  plotted_df = plot_genome_size_tree(results, refdata_archive_path)
## Untarring reference data
## Using reference data in: /tmp/RtmpzO6rdM/refdata
## Untarring taxonomy
## Using taxonomy: /tmp/RtmpzO6rdM/taxdump
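
The contrast between the two markers could also be summarised numerically. The sketch below is a minimal illustration on a toy data frame: the column names `sample` and `estimated_genome_size`, the sample names other than `16S_1` and `ITS_1`, and all sizes are hypothetical and are not taken from the package output. It assumes the marker can be read from the sample name prefix.

```r
# Toy data: column names and sizes are invented for illustration only.
toy = data.frame(
  sample = c("16S_1", "16S_1", "16S_2", "ITS_1", "ITS_1", "ITS_2"),
  estimated_genome_size = c(4.1e6, 5.6e6, 3.2e6, 3.9e7, 4.4e7, 6.0e7)
)

# Derive the marker from the sample name prefix ("16S_1" becomes "16S").
toy$marker = sub("_.*$", "", toy$sample)

# Median estimated genome size per marker.
median_by_marker = tapply(toy$estimated_genome_size, toy$marker, median)
print(median_by_marker)
```

With real results, the same grouping would need whichever sample and size columns `estimate_genome_size` actually returns.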