About the package
This R package applies statistical modelling to data from NCBI databases and provides three methods for predicting the genome size of a given taxon or group of taxa.
A straightforward weighted mean method identifies the closest taxa with available genome size information in the taxonomic tree and averages their genome sizes using weights based on taxonomic distance. A frequentist random effects model uses nested genus and family information to output genome size estimates. Finally, a third option provides predictions from a distributional Bayesian multilevel model that uses taxonomic information from genus all the way up to superkingdom, and can therefore provide estimates and uncertainty bounds even for under-represented taxa.
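To make the weighted mean idea concrete, here is a minimal sketch in base R. It is not the package's implementation: the function name `weighted_mean_genome_size`, the example taxa, and the inverse-distance weighting are all assumptions chosen for illustration, since the exact weighting scheme is not described here.

```r
# Hypothetical helper (not part of genomesizeR): averages the genome sizes of
# neighbouring taxa, giving closer taxa a larger weight.
# ASSUMPTION: weights are 1 / taxonomic distance; the package may weight differently.
weighted_mean_genome_size <- function(genome_sizes, distances) {
  stopifnot(
    length(genome_sizes) == length(distances),
    all(distances > 0)
  )
  weights <- 1 / distances
  weighted.mean(genome_sizes, w = weights)
}

# Made-up reference taxa close to a query, with genome sizes in base pairs
# and their taxonomic distance to the query (number of rank steps).
neighbours <- data.frame(
  taxon       = c("species_A", "species_B", "species_C"),
  genome_size = c(4.0e6, 5.0e6, 8.0e6),
  distance    = c(1, 1, 2)
)

estimate <- weighted_mean_genome_size(neighbours$genome_size, neighbours$distance)
print(estimate)
```

In this toy example the two closest taxa dominate the estimate, while the more distant one contributes with half their weight.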
All three methods use:
- A list of queries, where each query is a taxon or a list of several taxa.
- A reference database containing all the known genome sizes, built from the NCBI databases, with associated taxa.
- A taxonomic tree structure as built by the NCBI.
genomesizeR retrieves the taxonomic classification of input queries, estimates the genome size of each query, and provides 95% confidence intervals for each estimate.
How to install
Prerequisites: R, with already installed packages up to date, and git.
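As a quick sanity check before installing, the prerequisites can be inspected from an R console. This is only a sketch using base R functions; the variable names are illustrative, and the package update step is left commented out because it needs network access.

```r
# Report the version of the running R session.
r_version <- R.version.string
message("Running ", r_version)

# Sys.which() returns an empty string when the executable is not on the PATH.
git_path <- unname(Sys.which("git"))
has_git <- nzchar(git_path)

if (has_git) {
  message("git found at: ", git_path)
} else {
  message("git was not found on the PATH; install it before continuing.")
}

# Uncomment to bring already installed packages up to date:
# update.packages(ask = FALSE)
```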
Run one of the commands below in an R console to install the package. We include four different installation methods, as some setups (for example, corporate networks) may block specific download mechanisms:
install.packages("remotes")
remotes::install_github("https://github.com/ScionResearch/genomesizeR")
- OR -
install.packages("remotes")
remotes::install_git("https://github.com/ScionResearch/genomesizeR")
- OR -
install.packages("devtools")
devtools::install_github("ScionResearch/genomesizeR")
- OR -
install.packages("pak")
pak::pkg_install("git::https://github.com/ScionResearch/genomesizeR")
You also need to download the archive containing the reference databases and the Bayesian models from zenodo.org, using the inborutils package. You can change the path option to choose where the archive is downloaded (the default is the current directory, '.'):
remotes::install_github("inbo/inborutils")
inborutils::download_zenodo("10.5281/zenodo.13733183", path=".")