Estimate genome sizes
estimate_genome_size.Rd
This function loads a query file or table, together with an archive containing reference databases and Bayesian models, and predicts genome sizes for the queries.
Usage
estimate_genome_size(
queries,
refdata_path,
format = "csv",
sep = ",",
match_column = NA,
match_sep = ";",
output_format = "input",
method = "bayesian",
ci_threshold = 0.2,
n_cores = "half"
)
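A call might look like the following minimal sketch. The query table, its column names, the use of NCBI taxids as match values, and the archive path are all hypothetical placeholders chosen for illustration. The call itself is guarded so that the snippet runs without error when the function is not loaded or the archive is not present.

```r
# Hypothetical query table: one row per query, with one or several
# matches per row separated by ";" (the default match_sep).
# Assumption: matches are given as NCBI taxids.
queries <- data.frame(
  query_id = c("q1", "q2"),
  matches  = c("562;623", "1423"),
  stringsAsFactors = FALSE
)

# Write the table to a temporary CSV file, matching format = "csv" and sep = ",".
query_file <- tempfile(fileext = ".csv")
write.csv(queries, query_file, row.names = FALSE)

# Placeholder path: replace with the location of the downloaded archive.
refdata_path <- "path/to/reference_archive"

# Run only if the function is available and the archive exists.
if (exists("estimate_genome_size") && file.exists(refdata_path)) {
  results <- estimate_genome_size(
    query_file,
    refdata_path,
    format = "csv",
    sep = ",",
    match_column = "matches",
    match_sep = ";",
    method = "bayesian",
    ci_threshold = 0.2,
    n_cores = 1
  )
  print(head(results))
}
```

Setting n_cores to 1 here is only a conservative choice for a small example; the default 'half' uses half of the available cores.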
Arguments
- queries
Queries: path to a CSV or BIOM file, or the variable name of a table object
- refdata_path
Path to the downloadable archive containing the reference databases and the Bayesian models
- format
Query format: csv/data frame format ('table'), taxonomy table format as used in e.g. phyloseq ('tax_table'), or BIOM format ('biom'). The signature under Usage sets the default to "csv".
- sep
Column separator (table format only)
- match_column
Column containing the match information, with one or several matches per row (table format only)
- match_sep
Separator between matches when the match column holds several matches (table format only)
- output_format
Format of the output. "input" (default): a data frame with the same columns as the input, plus the added columns "TAXID", "estimated_genome_size", "confidence_interval_lower", "confidence_interval_upper", "genome_size_estimation_status" and "model_used", as well as taxids at all ranks. "data.frame": a data frame containing only the added columns listed above, without the taxid columns.
- method
Method to use for genome size estimation, 'bayesian' (default), 'weighted_mean' or 'lmm'
- ci_threshold
Threshold for the confidence interval, expressed as a proportion of the predicted size (e.g. 0.2 means that estimates whose confidence interval represents more than 20% of the predicted size are discarded)
- n_cores
Number of CPU cores to use (default is 'half': half of all available cores)
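The ci_threshold rule described above can be illustrated on a toy table. This is a sketch of one plausible reading, in which the confidence interval is measured by its width (upper minus lower bound) relative to the predicted size; the numbers are made up and the code is not the function's internal implementation.

```r
# Toy estimates (made-up numbers), not output of estimate_genome_size().
est <- data.frame(
  estimated_genome_size     = c(5.0e6, 4.0e6, 2.0e6),
  confidence_interval_lower = c(4.8e6, 3.0e6, 1.9e6),
  confidence_interval_upper = c(5.2e6, 5.0e6, 2.1e6)
)
ci_threshold <- 0.2

# Assumed reading: interval width as a proportion of the predicted size.
rel_width <- (est$confidence_interval_upper - est$confidence_interval_lower) /
  est$estimated_genome_size

# Estimates whose relative width exceeds the threshold would be discarded.
est$kept <- rel_width <= ci_threshold
print(est)
```

Under this reading, only the second row (relative width 0.5) exceeds the 0.2 threshold; a larger ci_threshold keeps more estimates at the cost of wider intervals.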