Title: | Metabolomics-Based Models for Imputing Risk |
---|---|
Description: | Provides an intuitive framework for ad-hoc statistical analysis of 1H-NMR metabolomics data by Nightingale Health. It makes it easy to explore new metabolomics measurements assayed by Nightingale Health and to compare their distributions with those of a large consortium (BBMRI-nl); to project previously published metabolic scores [<doi:10.1016/j.ebiom.2021.103764>, <doi:10.1161/CIRCGEN.119.002610>, <doi:10.1038/s41467-019-11311-9>, <doi:10.7554/eLife.63033>, <doi:10.1161/CIRCULATIONAHA.114.013116>, <doi:10.1007/s00125-019-05001-w>]; and to calibrate the metabolic surrogate values to a desired dataset. |
Authors: | Daniele Bizzarri [aut, cre], Marcel Reinders [aut, ths], Marian Beekman [aut], Pieternella Eline Slagboom [aut, ths], Erik van den Akker [aut, ths] |
Maintainer: | Daniele Bizzarri <[email protected]> |
License: | GPL-3 |
Version: | 1.4 |
Built: | 2024-11-17 04:41:35 UTC |
Source: | https://github.com/danielebizzarri/mimir |
Accuracy of the Leave One Biobank Out Validation of the surrogate metabolic models performed in BBMRI-nl
data("acc_LOBOV")
An object of class list
of length 20.
Dataframe containing the accuracy obtained during the Leave One Biobank Out Validation of the surrogate metabolic models in BBMRI-nl.
The method is described in: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
data("acc_LOBOV")
The coefficients used to compute the T2Diabetes score by Ahola-Olli et al.
data("Ahola_Olli_betas")
An object of class data.frame
with 7 rows and 3 columns.
Dataframe containing the abbreviations of the metabolites, the metabolite names and the coefficients used to compute the T2Diabetes score
Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
data("Ahola_Olli_betas")
Distributions of the Nightingale Health metabolic features in BBMRI-nl
data("BBMRI_hist")
An object of class list
of length 57.
List containing the histograms of the metabolomics-features in BBMRI-nl.
data("BBMRI_hist")
Function to plot the ~60 metabolites used for the metabolomics-based scores and compare them to their distributions in BBMRI-nl
BBMRI_hist_plot(dat, x_name, color = MiMIR::c21, scaled = FALSE, datatype = "metabolite", main = "Comparison with the metabolites measures in BBMRI")
dat | data.frame or matrix with the metabolites |
x_name | string with the name of the selected variable |
color | colors selected for all the variables |
scaled | logical to z-scale the variables |
datatype | a character vector indicating what data type is being plotted |
main | title of the plot |
This function plots the distribution of a metabolic feature in the uploaded dataset, compared to its distribution in BBMRI-nl. The features available are those used by the metabolic scores.
plotly image with the histogram of the selected variable compared to the distributions in BBMRI-nl
The selection of metabolic features available is the one selected by the papers:
Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9
Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
van den Akker Erik B. et al. (2020) Metabolic Age Based on the BBMRI-NL 1H-NMR Metabolomics Repository as Biomarker of Age-related Disease. Circulation: Genomic and Precision Medicine, 13, 541-547, doi:10.1161/CIRCGEN.119.002610
library(plotly)
library(MiMIR)
# Load the metabolites dataset
metabolic_measures <- synthetic_metabolic_dataset
BBMRI_hist_plot(metabolic_measures, x_name = "alb", scaled = TRUE)
Z-scaled distributions of the Nightingale Health metabolic features in BBMRI-nl
data("BBMRI_hist_scaled")
An object of class list
of length 57.
List containing the histograms of the scaled metabolomics-features in BBMRI-nl.
data("BBMRI_hist_scaled")
Helper function to binarize the phenotypes used to calculate the metabolomics-based surrogates by Bizzarri et al.
binarize_all_pheno(data)
data | phenotypes data.frame containing some of the following variables (with the same nomenclature): "sex", "diabetes", "lipidmed", "blood_pressure_lowering_med", "current_smoking", "metabolic_syndrome", "alcohol_consumption", "age", "BMI", "ln_hscrp", "waist_circumference", "weight", "height", "triglycerides", "ldl_chol", "hdlchol", "totchol", "eGFR", "wbc", "hgb" |
Bizzarri et al. built multivariate models, using 56 metabolic features quantified by Nightingale, to predict the 19 binary characteristics of an individual. The binary variables are: sex, diabetes status, metabolic syndrome status, lipid medication usage, blood pressure lowering medication, current smoking, alcohol consumption, high age, middle age, low age, high hsCRP, high triglycerides, high LDL cholesterol, high total cholesterol, low HDL cholesterol, low eGFR, low white blood cells, low hemoglobin levels.
The phenotypic variables binarized following the thresholds used in the metabolomics surrogates by Bizzarri et al.
This function was made to binarize the variables following the same rules indicated in the article: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
pheno_barplots
library(MiMIR)
# Load the phenotypes dataset
phenotypes <- synthetic_phenotypic_dataset
# Binarize the phenotypes
binarized_phenotypes <- binarize_all_pheno(phenotypes)
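As a rough illustration of what binarizing a phenotype involves, the base-R sketch below turns continuous columns into 0/1 indicators. The helper `binarize_high`, the cut-off values and the output column names are placeholders chosen for illustration; they are not the thresholds published by Bizzarri et al. nor the internals of `binarize_all_pheno`.

```r
# Toy phenotype table; input column names follow the nomenclature listed above.
pheno <- data.frame(
  age = c(34, 52, 71, NA),
  triglycerides = c(0.9, 2.4, 1.6, 3.1)
)

# Hypothetical helper: 1 if the value is at or above the cut-off, 0 otherwise.
# Missing values stay missing.
binarize_high <- function(x, cutoff) {
  as.integer(x >= cutoff)
}

# Placeholder cut-offs, NOT the published ones.
binarized <- data.frame(
  high_age = binarize_high(pheno$age, cutoff = 65),
  high_triglycerides = binarize_high(pheno$triglycerides, cutoff = 2)
)
print(binarized)
```

Keeping `NA` for missing inputs, rather than coercing it to 0, avoids silently counting unmeasured samples as controls.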
Function created to calculate: 1) BMI using height and weight; 2) LDL cholesterol using HDL cholesterol, triglycerides and total cholesterol (totchol); 3) eGFR using creatinine levels, sex and age.
BMI_LDL_eGFR(phenotypes, metabo_measures)
phenotypes | data.frame containing height and weight, HDL cholesterol, triglycerides, totchol, sex and age |
metabo_measures | numeric data-frame with Nightingale metabolomics quantifications containing creatinine levels (crea) |
phenotypes data.frame with the addition of BMI, LDL cholesterol and eGFR
This function is constructed to calculate BMI, LDL cholesterol and eGFR as in the following papers:
BMI: Flint AJ, Rexrode KM, Hu FB, Glynn RJ, Caspard H, Manson JE et al. Body mass index, waist circumference, and risk of coronary heart disease: a prospective study among men and women. Obes Res Clin Pract 2010; 4: e171-e181, doi:10.1016/j.orcp.2010.01.001
LDL-cholesterol: Friedewald WT, Levy RI, Fredrickson DS. Estimation of the Concentration of Low-Density Lipoprotein Cholesterol in Plasma, Without Use of the Preparative Ultracentrifuge. Clin Chem 1972; 18: 499-502, doi:10.1093/clinchem/18.6.499
eGFR: Carrero Juan Jesus, Andersson Franko Mikael, Obergfell Achim, Gabrielsen Anders, Jernberg Tomas. hsCRP Level and the Risk of Death or Recurrent Cardiovascular Events in Patients With Myocardial Infarction: a Healthcare-Based Study. J Am Heart Assoc 2019; 8: e012638, doi:10.1161/JAHA.119.012638
library(MiMIR)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Calculate BMI, LDL cholesterol and eGFR
phenotypes <- BMI_LDL_eGFR(phenotypes, metabolic_measures)
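For orientation, the BMI and Friedewald LDL calculations can be sketched in base R as follows. The units are assumptions made for this sketch (weight in kg, height in cm, lipids in mmol/L); the unit handling inside `BMI_LDL_eGFR` is not documented here, and the eGFR formula is left out because its exact form is not given in this page.

```r
# Toy data; units are ASSUMED: weight in kg, height in cm, lipids in mmol/L.
pheno <- data.frame(
  weight = c(70, 85),
  height = c(175, 180),
  totchol = c(5.2, 6.1),
  hdlchol = c(1.4, 1.1),
  triglycerides = c(1.1, 2.2)
)

# BMI: weight (kg) divided by squared height (m).
pheno$BMI <- pheno$weight / (pheno$height / 100)^2

# Friedewald estimate of LDL cholesterol in mmol/L:
# LDL = total cholesterol - HDL cholesterol - triglycerides / 2.2
# (the divisor is 5 when working in mg/dL).
pheno$ldl_chol <- pheno$totchol - pheno$hdlchol - pheno$triglycerides / 2.2

print(round(pheno[, c("BMI", "ldl_chol")], 2))
```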
Colors attributed to each metabolomics-based model in MiMIR
data("c21")
An object of class character
of length 21.
data("c21")
Function to compute the surrogate scores by Bizzarri et al. from the Nightingale metabolomics matrix
calculate_surrogate_scores(met, pheno, PARAM_surrogates, bin_names = c("sex", "diabetes"), Nmax_miss = 1, Nmax_zero = 1, post = TRUE, roc = FALSE, quiet = FALSE)
met | numeric data-frame with Nightingale-metabolomics |
pheno | phenotypic data.frame including these clinical variables (with the same nomenclature): "sex", "diabetes", "lipidmed", "blood_pressure_lowering_med", "current_smoking", "metabolic_syndrome", "alcohol_consumption", "age", "BMI", "ln_hscrp", "waist_circumference", "weight", "height", "triglycerides", "ldl_chol", "hdlchol", "totchol", "eGFR", "wbc", "hgb" |
PARAM_surrogates | list containing the parameters to compute the metabolomics-based surrogates |
bin_names | vector of strings containing the names of the binary variables |
Nmax_miss | numeric value indicating the maximum number of missing values allowed per sample (suggested value: 1) |
Nmax_zero | numeric value indicating the maximum number of zeros allowed per sample (suggested value: 1) |
post | logical to indicate if the function should calculate the posterior probabilities |
roc | logical to plot ROC curves for the metabolomics surrogates (available only for the phenotypes included) |
quiet | logical to suppress the messages in the console |
Bizzarri et al. built multivariate models, using 56 metabolic features quantified by Nightingale, to predict the 19 binary characteristics of an individual. The binary variables are: sex, diabetes status, metabolic syndrome status, lipid medication usage, blood pressure lowering medication, current smoking, alcohol consumption, high age, middle age, low age, high hsCRP, high triglycerides, high LDL cholesterol, high total cholesterol, low HDL cholesterol, low eGFR, low white blood cells, low hemoglobin levels.
If pheno is not available: list with the surrogates and the Nightingale metabolomics matrix after QC. If pheno is available: list with the surrogates, ROC curves, phenotypes, binarized phenotypes and the Nightingale metabolomics matrix after QC.
This function was made to visualize the binarized variables calculated following the rules indicated in the article: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
QCprep_surrogates
require(MiMIR)
require(foreach)
require(pROC)
# Load the dataset
m <- synthetic_metabolic_dataset
p <- synthetic_phenotypic_dataset
# Apply the surrogates
sur <- calculate_surrogate_scores(met = m, pheno = p, PARAM_surrogates = MiMIR::PARAM_surrogates, bin_names = c("sex", "diabetes"))
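The `Nmax_miss` and `Nmax_zero` arguments bound the number of missing values and zeros allowed per sample. A per-sample filter of that kind might look like the base-R sketch below; the toy values are invented, and treating the limits as inclusive is an assumption, since the exact QC steps applied by `calculate_surrogate_scores` are not spelled out here.

```r
# Toy metabolomics matrix: rows are samples, columns are metabolic features.
met <- data.frame(
  alb = c(0.09, NA, 0.08, 0.10),
  crea = c(0.07, NA, 0.00, 0.06),
  glc = c(5.1, 4.8, 0.00, 5.5),
  row.names = paste0("sample", 1:4)
)

Nmax_miss <- 1
Nmax_zero <- 1

# Count missing values and zeros per sample (row).
n_miss <- rowSums(is.na(met))
n_zero <- rowSums(met == 0, na.rm = TRUE)

# Keep samples that do not exceed either limit (ASSUMED to be inclusive).
keep <- n_miss <= Nmax_miss & n_zero <= Nmax_zero
met_qc <- met[keep, , drop = FALSE]
print(rownames(met_qc))
```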
Function to compute the COVID severity score made by Nightingale Health UK Biobank Initiative et al. on Nightingale metabolomics data-set.
comp_covid_score(dat, betas = MiMIR::covid_betas, quiet = FALSE)
dat | numeric data-frame with Nightingale-metabolomics |
betas | data.frame containing the coefficients used for the regression of the COVID-score |
quiet | logical to suppress the messages in the console |
Multivariate model predicting the risk of severe COVID-19 infection. It is based on 37 metabolic features and was trained using LASSO regression on 52,573 samples from the UK Biobank.
data-frame containing the value of the COVID-score on the uploaded data-set
This function is constructed to be able to apply the COVID-score as described in: Nightingale Health UK Biobank Initiative et al. (2021) Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife, 10, e63033, doi:10.7554/eLife.63033
prep_data_COVID_score, covid_betas, comp.mort_score
library(MiMIR)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Compute the COVID score
covidScore <- comp_covid_score(dat = metabolic_measures, quiet = TRUE)
Function to compute the CVD-score made by Peter Wurtz et al. on a Nightingale metabolomics data-set.
comp.CVD_score(met, phen, betas, quiet = FALSE)
met | numeric data-frame with Nightingale-metabolomics |
phen | data-frame containing phenotypic information of the samples (specifically: sex, systolic_blood_pressure, current_smoking, diabetes, blood_pressure_lowering_med, lipidmed, totchol, and hdlchol) |
betas | The betas of the linear regression composing the CVD-score |
quiet | logical to suppress the messages in the console |
data-frame containing the value of the CVD-score on the uploaded data-set
This function is constructed to be able to apply the CVD-score as described in: Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
prep_met_for_scores, CVD_score_betas, comp.T2D_Ahola_Olli, comp.mort_score
library(MiMIR)
# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset
# Compute the CVD score
CVDscore <- comp.CVD_score(met = met, phen = phen, betas = MiMIR::CVD_score_betas, quiet = TRUE)
Function to compute the mortality score made by Deelen et al. on Nightingale metabolomics data-set.
comp.mort_score(dat, betas = mort_betas, quiet = FALSE)
dat | numeric data-frame with Nightingale-metabolomics |
betas | data.frame containing the coefficients used for the regression of the mortality score |
quiet | logical to suppress the messages in the console |
This multivariate model predicts all-cause mortality at 5 or 10 years better than clinical variables normally associated with mortality. It consists of 14 metabolic features quantified by Nightingale Health. It was originally trained using a stepwise Cox regression analysis in a meta-analysis of 12 cohorts comprising 44,168 individuals.
data-frame containing the value of the mortality score on the uploaded data-set
This function is constructed to be able to apply the mortality score as described in: Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9
prep_met_for_scores, mort_betas, comp.T2D_Ahola_Olli, comp.CVD_score
library(MiMIR)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Compute the mortality score
mortScore <- comp.mort_score(metabolic_measures, quiet = TRUE)
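A linear score of this kind is, at its core, a weighted sum of preprocessed metabolite levels. The base-R sketch below shows that one step with invented feature names and coefficients; the log transform and z-scaling are assumptions made for illustration, and none of this reproduces the contents of `mort_betas` or the preprocessing inside `comp.mort_score`.

```r
# Toy data: rows are samples, columns are (hypothetical) metabolic features.
met <- data.frame(
  featA = c(1.0, 2.0, 4.0),
  featB = c(3.0, 3.0, 9.0)
)

# Made-up coefficients for illustration only.
betas <- c(featA = 0.5, featB = -0.2)

# ASSUMED preprocessing: natural log, then z-scaling of each feature.
prepped <- scale(log(as.matrix(met[, names(betas)])))

# The score is the weighted sum of the preprocessed features.
score <- as.vector(prepped %*% betas)
print(round(score, 3))
```

Selecting the columns by `names(betas)` keeps features and coefficients aligned even if the input table has extra or reordered columns.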
Function to compute the T2D score made by Ahola-Olli et al. on a Nightingale metabolomics data-set.
comp.T2D_Ahola_Olli(met, phen, betas, quiet = FALSE)
met | numeric data-frame with Nightingale-metabolomics |
phen | data-frame containing phenotypic information of the samples (in particular: sex, age, BMI and the clinically measured glucose) |
betas | The betas of the linear regression composing the T2D-score |
quiet | logical to suppress the messages in the console |
This metabolomics-based score by Ahola-Olli et al. is associated with incident type 2 diabetes. It is constructed using phe, l_vldl_ce_percentage and l_hdl_fc quantified by Nightingale Health, and some phenotypic information: sex, age, BMI and fasting glucose. It was trained using a stepwise logistic regression on 3 cohorts.
data-frame containing the value of the T2D-score on the uploaded data-set
This function is constructed to be able to apply the T2D-score as described in: Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
prep_met_for_scores, Ahola_Olli_betas, comp.mort_score, comp.CVD_score
library(MiMIR)
# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset
# Compute the T2D score
T2Dscore <- comp.T2D_Ahola_Olli(met = met, phen = phen, betas = MiMIR::Ahola_Olli_betas, quiet = TRUE)
Function to calculate the correlation between 2 matrices
cor_assoc(dat1, dat2, feat1, feat2, method = "pearson", quiet = FALSE)
dat1 | matrix 1 |
dat2 | matrix 2 |
feat1 | vector of strings with the names of the selected variables in dat1 |
feat2 | vector of strings with the names of the selected variables in dat2 |
method | indicates which correlation method to use |
quiet | logical to suppress the messages in the console |
correlations of the selected variables in the 2 matrices
plot_corply
library(stats)
library(MiMIR)
# Load the dataset
m <- as.matrix(synthetic_metabolic_dataset)
# Compute the Pearson correlation between all the MET63 variables in m
cors <- cor_assoc(m, m, MiMIR::metabolites_subsets$MET63, MiMIR::metabolites_subsets$MET63)
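The underlying idea, correlating selected columns of one matrix with selected columns of another, can be sketched with `stats::cor` alone. The data below are simulated and the feature names are hypothetical; this is not the implementation of `cor_assoc`, only an illustration of the shape of its inputs and result.

```r
set.seed(1)
# Two toy matrices with the same samples (rows) and named features (columns).
dat1 <- matrix(rnorm(40), nrow = 10, dimnames = list(NULL, c("a", "b", "c", "d")))
dat2 <- cbind(x = dat1[, "a"] * 2 + 1, y = rnorm(10))

feat1 <- c("a", "b")
feat2 <- c("x", "y")

# Pairwise Pearson correlations: rows index feat1, columns index feat2.
cors <- cor(dat1[, feat1], dat2[, feat2], method = "pearson",
            use = "pairwise.complete.obs")
print(round(cors, 2))
```

Because `x` is a linear function of `a`, the entry for that pair is 1, which makes a convenient sanity check.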
The coefficients used to compute the COVID score by Nightingale Health UK Biobank Initiative et al.
data("covid_betas")
An object of class data.frame
with 25 rows and 3 columns.
Dataframe containing the abbreviations of the metabolites, the metabolite names and the coefficients used to compute the COVID score
Nightingale Health UK Biobank Initiative et al. (2021) Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife, 10, e63033, doi:10.7554/eLife.63033
data("covid_betas")
The coefficients used to compute the CVD score by Wurtz et al.
data("CVD_score_betas")
An object of class data.frame
with 12 rows and 3 columns.
Dataframe containing the abbreviations of the metabolites, the metabolite names and the coefficients used to compute the CVD score
Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
data("CVD_score_betas")
Function to translate Nightingale metabolomics alternative metabolite names to the ones used in BBMRI-nl
find_BBMRI_names(names)
names | vector of strings with the metabolic features names to be translated |
data.frame with the uploaded metabolites names on the first column and the BBMRI names on the second column.
This is a function originally created for the package ggforestplot and modified ad hoc for our package (https://nightingalehealth.github.io/ggforestplot/articles/index.html).
library(MiMIR)
library(purrr)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Find the metabolite names used in BBMRI-nl
nam <- find_BBMRI_names(colnames(metabolic_measures))
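Conceptually, a name translation like this is a table lookup. The base-R sketch below uses an invented translator table with hypothetical column names (`alternative`, `target`); it does not reproduce the contents or layout of `metabo_names_translator`, nor the matching rules of `find_BBMRI_names`.

```r
# Hypothetical translator table: one row per feature, with an alternative
# spelling and the target name. These entries are invented for illustration.
translator <- data.frame(
  alternative = c("Albumin", "Glucose", "Creatinine"),
  target = c("alb", "glc", "crea"),
  stringsAsFactors = FALSE
)

uploaded <- c("Glucose", "Albumin", "UnknownFeature")

# Case-insensitive lookup; names without a match come back as NA.
idx <- match(tolower(uploaded), tolower(translator$alternative))
result <- data.frame(
  uploaded = uploaded,
  BBMRI_names = translator$target[idx],
  stringsAsFactors = FALSE
)
print(result)
```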
Function to plot the histograms for all the variables in dat
hist_plots(dat, x_name, color = MiMIR::c21, scaled = FALSE, datatype = "metabolic score", main = "Predictors Distributions")
dat | data.frame or matrix with the variables to plot |
x_name | string with the names of the selected variables in dat |
color | colors selected for all the variables |
scaled | logical to z-scale the variables |
datatype | a character vector indicating what data type is being plotted |
main | title of the plot |
plotly image with the histograms of the selected variables
require(MiMIR)
require(plotly)
require(matrixStats)
# Load the metabolites dataset
m <- synthetic_metabolic_dataset
# Apply the surrogate models without plotting the ROC curves
surrogates <- calculate_surrogate_scores(m, PARAM_surrogates = MiMIR::PARAM_surrogates, roc = FALSE)
# Plot the histogram of the scaled surrogate sex values
hist_plots(surrogates$surrogates, x_name = "s_sex", scaled = TRUE)
Function to plot the histogram of the mortality score separated for different age ranges as a plotly image
hist_plots_mortality(mort_score, phenotypes)
mort_score | data.frame containing the mortality score |
phenotypes | data.frame containing age |
plotly image with the histogram of the mortality score separated in 3 age ranges
library(MiMIR)
library(plotly)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Compute the mortality score
mortScore <- comp.mort_score(metabolic_measures, quiet = TRUE)
# Plot the mortality score histogram at different ages
hist_plots_mortality(mortScore, phenotypes)
Function that creates a Kaplan-Meier plot comparing the first and last tertile of a metabolic score
kapmeier_scores(predictors, pheno, score, Eventname = "Event")
predictors | The data.frame containing the predictors |
pheno | The data.frame containing the phenotypes |
score | a character string indicating which predictor to use |
Eventname | a character string with the name of the event to print on the plot |
plotly image with a Kaplan-Meier plot comparing the first and last tertile of a metabolic score
require(MiMIR)
require(plotly)
require(survminer)
require(ggfortify)
require(ggplot2)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Compute the mortality score
mortScore <- comp.mort_score(metabolic_measures, quiet = TRUE)
# Plot a Kaplan-Meier curve
kapmeier_scores(predictors = mortScore, pheno = phenotypes, score = "mortScore")
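The tertile split that precedes such a comparison can be sketched in base R with `quantile` and `cut`. The score values are invented, and how `kapmeier_scores` handles ties or boundary values is not documented here, so the choices below (default quantile type, right-closed intervals) are assumptions.

```r
# Toy score for 9 samples.
score <- c(0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, 2.9, 3.3)

# Tertile boundaries, then assign each sample to T1, T2 or T3.
breaks <- quantile(score, probs = c(0, 1/3, 2/3, 1))
tertile <- cut(score, breaks = breaks, labels = c("T1", "T2", "T3"),
               include.lowest = TRUE)

# Keep only the first and last tertile for the comparison.
extremes <- data.frame(score, tertile)[tertile %in% c("T1", "T3"), ]
print(table(droplevels(extremes$tertile)))
```

The resulting two-level grouping is what a survival comparison would then be stratified on.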
Function created to visualize the accuracies in the current dataset compared to the accuracies in the Leave One Biobank Out Validation in Bizzarri et al.
LOBOV_accuracies(surrogates, bin_phenotypes, bin_pheno_available, acc_LOBOV)
surrogates | numeric data.frame containing the surrogate values by Bizzarri et al. |
bin_phenotypes | numeric data.frame with the binarized phenotypes output of binarize_all_pheno |
bin_pheno_available | vector of strings with the available phenotypes |
acc_LOBOV | accuracy of LOBOV calculated in Bizzarri et al. |
Comparison of the AUCs of the surrogates in the uploaded dataset with the results of the Leave One Biobank Out Validation made in BBMRI-nl.
Boxplot with the accuracies of the LOBOV
This function was made to visualize the binarized variables calculated following the rules indicated in the article: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
require(pROC)
require(plotly)
require(MiMIR)
require(foreach)
require(ggplot2)
# Load the dataset
m <- synthetic_metabolic_dataset
p <- synthetic_phenotypic_dataset
# Calculate the binarized phenotypes
b_p <- binarize_all_pheno(p)
# Apply the surrogate models
sur <- calculate_surrogate_scores(m, p, MiMIR::PARAM_surrogates, bin_names = colnames(b_p))
# Compare the accuracies for the first five binarized phenotypes
p_avail <- colnames(b_p)[c(1:5)]
LOBOV_accuracies(sur$surrogates, b_p, p_avail, MiMIR::acc_LOBOV)
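The accuracy compared here is an AUC per surrogate. As a dependency-free illustration, the AUC of one surrogate against its binarized phenotype can be computed with the rank-sum (Mann-Whitney) identity. The helper `auc_rank` and the toy values are hypothetical; the example above loads pROC, so this sketch should not be read as how `LOBOV_accuracies` computes its values.

```r
# Toy data: a binarized phenotype and a surrogate value per sample.
bin_pheno <- c(0, 0, 1, 1, 0, 1)
surrogate <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)

# AUC via the rank-sum identity: the probability that a randomly chosen case
# has a higher surrogate value than a randomly chosen control.
auc_rank <- function(labels, scores) {
  ok <- !is.na(labels) & !is.na(scores)
  labels <- labels[ok]
  scores <- scores[ok]
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks handle ties
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(auc_rank(bin_pheno, surrogate))
```

In this toy case 8 of the 9 case-control pairs are ordered correctly, so the AUC is 8/9.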
Translator of the names of the metabolomics-features to the ones used in BBMRI-nl
data("metabo_names_translator")
An object of class data.frame
with 228 rows and 9 columns.
This is a list originally created for the package ggforestplot and modified ad-hoc for our package (https://nightingalehealth.github.io/ggforestplot/articles/index.html).
data("metabo_names_translator")
List containing all the subsets of the metabolomics-based features used for our models
data("metabolites_subsets")
An object of class list
of length 8.
The selection of metabolic features available is the one selected by the papers:
Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9
Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
van den Akker Erik B. et al. (2020) Metabolic Age Based on the BBMRI-NL 1H-NMR Metabolomics Repository as Biomarker of Age-related Disease. Circulation: Genomic and Precision Medicine, 13, 541-547, doi:10.1161/CIRCGEN.119.002610
data("metabolites_subsets")
Function to run a Metabolome-Wide Association Study (MetaboWAS)
MetaboWAS(met, pheno, test_variable, covariates, img = TRUE, adj_method = "BH")
met | numeric data.frame with the metabolomics features |
pheno | data.frame containing the phenotype of interest |
test_variable | string vector with the name of the phenotype of interest |
covariates | string vector with the name of the variables to be added as a covariate |
img | logical indicating if the function should plot a Manhattan plot |
adj_method | multiple testing correction method |
This function computes a linear regression model individually for each variable in the first data.frame with the test variable, adjusted for the selected covariates (potential confounders). The user can select the test variable and the covariates from the variables in the phenotypic input. The results of the associations are reported in a Manhattan plot.
The p-values of the associations are corrected for multiple testing with the method given in adj_method (Benjamini-Hochberg False Discovery Rate by default). Finally, plotly is used to draw a Manhattan plot, which reports on the x-axis the metabolites reported by Nightingale Health, divided in groups, and on the y-axis the -log (adjusted p-value).
res = the results of the MetaboWAS; manhplot = the Manhattan plot made with plotly; N_hits = the number of significant hits
This method is also described and used in: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
require(MiMIR)
require(plotly)
require(ggplot2)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Compute a MetaboWAS for age corrected by sex
MetaboWAS(met = metabolic_measures, pheno = phenotypes, test_variable = "age", covariates = "sex")
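The per-metabolite regression followed by multiple-testing correction can be sketched with `lm` and `p.adjust` from base R. Everything below is simulated, and treating each metabolite as the outcome is an assumption of this sketch; the model fitting inside `MetaboWAS` may differ.

```r
set.seed(123)
n <- 100
# Toy phenotypes and metabolite matrix; "met1" is simulated to depend on age.
pheno <- data.frame(age = rnorm(n, 55, 10), sex = rbinom(n, 1, 0.5))
met <- data.frame(
  met1 = 0.2 * pheno$age + rnorm(n),
  met2 = rnorm(n),
  met3 = rnorm(n)
)

# One linear model per metabolite: metabolite ~ test variable + covariate.
# (Treating the metabolite as the outcome is an ASSUMPTION of this sketch.)
res <- do.call(rbind, lapply(names(met), function(m) {
  fit <- lm(met[[m]] ~ age + sex, data = pheno)
  est <- summary(fit)$coefficients["age", ]
  data.frame(metabolite = m, estimate = est[["Estimate"]],
             pval = est[["Pr(>|t|)"]])
}))

# Benjamini-Hochberg correction across all metabolites.
res$pval.adj <- p.adjust(res$pval, method = "BH")
N_hits <- sum(res$pval.adj < 0.05)
print(res)
```

The adjusted p-values are what a Manhattan plot would show on its y-axis after a negative log transform.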
The coefficients used to compute the mortality score by Deelen et al.
data("mort_betas")
An object of class data.frame
with 14 rows and 3 columns.
Data frame containing the abbreviations of the metabolites, the metabolite names and the coefficients used to compute the mortality score.
Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9
data("mort_betas")
Function to plot the histograms for all the variables in dat
multi_hist(dat, color = MiMIR::c21, scaled = FALSE)
dat |
data.frame or matrix with the variables to plot |
color |
colors selected for all the variables |
scaled |
logical to z-scale the variables |
plotly image with the histograms for all the variables in dat
library(plotly)
library(MiMIR)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
# Plot the z-scaled histograms of the MET14 metabolite subset
multi_hist(metabolic_measures[, MiMIR::metabolites_subsets$MET14], scaled = TRUE)
The coefficients used to compute the MetaboAge by van den Akker et al.
data("PARAM_metaboAge")
An object of class list
of length 8.
List containing all the information to pre-process and compute the MetaboAge.
van den Akker Erik B. et al. (2020) Metabolic Age Based on the BBMRI-NL 1H-NMR Metabolomics Repository as Biomarker of Age-related Disease. Circulation: Genomic and Precision Medicine, 13, 541-547, doi:10.1161/CIRCGEN.119.002610
data("PARAM_metaboAge")
The coefficients used to compute the metabolomics-based surrogate clinical variables by Bizzarri et al.
data("PARAM_surrogates")
An object of class list
of length 6.
List containing all the information to pre-process and compute the surrogate clinical variables.
Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
data("PARAM_surrogates")
Function created to visualize the binarized phenotypes used to calculate the metabolomics-based surrogates made by Bizzarri et al.
pheno_barplots(bin_phenotypes)
bin_phenotypes |
phenotypes data.frame containing some of the following variables (with the same nomenclature): "sex", "diabetes", "lipidmed", "blood_pressure_lowering_med", "current_smoking", "metabolic_syndrome", "alcohol_consumption", "age", "BMI", "ln_hscrp", "waist_circumference", "weight", "height", "triglycerides", "ldl_chol", "hdlchol", "totchol", "eGFR", "wbc", "hgb" |
Bizzarri et al. built multivariate models, using 56 metabolic features quantified by Nightingale, to predict the 19 binary characteristics of an individual. The binary variables are: sex, diabetes status, metabolic syndrome status, lipid medication usage, blood pressure lowering medication, current smoking, alcohol consumption, high age, middle age, low age, high hsCRP, high triglycerides, high LDL cholesterol, high total cholesterol, low HDL cholesterol, low eGFR, low white blood cells, low hemoglobin levels.
The phenotypic variables binarized following the thresholds used in the metabolomics surrogates made by Bizzarri et al.
This function was made to visualize the binarized variables calculated following the rules indicated in the article: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
binarize_all_pheno
require(MiMIR)
require(foreach)
# Load the phenotypes dataset
phenotypes <- synthetic_phenotypic_dataset
# Binarize the phenotypes
binarized_phenotypes <- binarize_all_pheno(phenotypes)
# Plot the variables
pheno_barplots(binarized_phenotypes)
List containing all the subsets of phenotypic variables used in the app
data("phenotypes_names")
An object of class list
of length 5.
data("phenotypes_names")
Function that calculates the Platt Calibrations
plattCalibration(r.calib, p.calib, nbins = 10, pl = FALSE)
r.calib |
observed binary phenotype |
p.calib |
predicted probabilities |
nbins |
number of bins to create the plots |
pl |
logical indicating if the function should plot the Reliability diagram and histogram of the calibrations |
Many popular machine learning algorithms produce inaccurate predicted probabilities, especially when applied to a dataset different from the training set. Platt (1999) proposed an adjustment in which the original probabilities are used as the predictor in a single-variable logistic regression to produce more accurate adjusted predicted probabilities. The function also helps to evaluate the calibration by plotting reliability diagrams and the distributions of the calibrated and non-calibrated probabilities. A reliability diagram plots the mean predicted value within a certain range of posterior probabilities against the fraction of accurately predicted values. Finally, accuracy measures are reported for the calibrations: the Expected Calibration Error (ECE), the Maximum Calibration Error (MCE) and the Log-Loss of the probabilities before and after calibration.
list with samples, responses, calibrations, ECE, MCE and calibration plots if pl = TRUE
This function was originally created for the package eRic, under the name prCalibrate, and modified ad hoc for our purposes (Github)
J. C. Platt, 'Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods', in Advances in Large Margin Classifiers, 1999, pp. 61-74.
library(stats)
library(plotly)
library(MiMIR)
# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset
# Binarize the phenotypes
b_phen <- binarize_all_pheno(phen)
# Apply the surrogate models
surr <- calculate_surrogate_scores(met, phen, MiMIR::PARAM_surrogates, bin_names = colnames(b_phen))
# Calibrate the surrogate for sex
real_data <- as.numeric(b_phen$sex)
pred_data <- surr$surrogates[, "s_sex"]
plattCalibration(r.calib = real_data, p.calib = pred_data, nbins = 10, pl = TRUE)
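The adjustment proposed by Platt and the calibration measures mentioned above can be sketched in base R as follows. This is an illustration on simulated data rather than the code of plattCalibration: the helper names log_loss and calib_error are hypothetical, and the equal-width binning is an assumption.

```r
# Sketch only: simulated data and hypothetical helpers, not MiMIR internals.
set.seed(42)
n <- 1000
y <- rbinom(n, 1, 0.4)
# Raw (possibly miscalibrated) predicted probabilities in (0, 1)
p_raw <- plogis(3 * (y - 0.5) + rnorm(n, sd = 2))

# Platt scaling: single-variable logistic regression with the raw
# probabilities as the only predictor
fit <- glm(y ~ p_raw, family = binomial())
p_cal <- as.numeric(fitted(fit))

# Log-Loss with clipping to avoid log(0)
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Expected (ECE) and Maximum (MCE) Calibration Error over equal-width bins
calib_error <- function(y, p, nbins = 10) {
  bins <- cut(p, breaks = seq(0, 1, length.out = nbins + 1),
              include.lowest = TRUE)
  gap <- abs(as.numeric(tapply(p, bins, mean)) -
             as.numeric(tapply(y, bins, mean)))
  w <- as.numeric(table(bins)) / length(p)
  c(ECE = sum(gap * w, na.rm = TRUE), MCE = max(gap, na.rm = TRUE))
}

before <- c(calib_error(y, p_raw), LogLoss = log_loss(y, p_raw))
after  <- c(calib_error(y, p_cal), LogLoss = log_loss(y, p_cal))
rbind(before, after)
```

Because the logistic regression includes an intercept, the mean of the calibrated probabilities matches the observed event rate in the calibration set, which is one reason the reliability diagram tends to move towards the diagonal.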
Function plotting the correlations between two datasets, dat1 x dat2, on the basis of (partial) correlations
plot_corply( res, main = NULL, zlim = NULL, reorder.x = FALSE, reorder.y = reorder.x, resort_on_p = FALSE, abs = FALSE, cor.abs = FALSE, reorder_dend = FALSE )
res |
associations obtained with cor.assoc |
main |
title of the plot |
zlim |
max association to plot |
reorder.x |
logical indicating if the function should reorder the x axis based on clustering |
reorder.y |
logical indicating if the function should reorder the y axis based on clustering |
resort_on_p |
logical indicating if the function should reorder x and y axis based on the pvalues of the associations |
abs |
logical indicating if the function should reorder based on the absolute values |
cor.abs |
logical indicating if the function should reorder the plot based on the absolute values |
reorder_dend |
logical indicating if the function should reorder the plot based on the dendrogram |
heatmap with the results of cor.assoc
cor_assoc
library(stats)
library(MiMIR)
# Load the dataset
m <- as.matrix(synthetic_metabolic_dataset)
# Compute the Pearson correlations among the MET63 metabolites
cors <- cor_assoc(m, m, MiMIR::metabolites_subsets$MET63, MiMIR::metabolites_subsets$MET63)
# Plot the correlations
plot_corply(cors, main = "Correlations metabolites")
Function plotting information about missing & zero values on the indicated matrix.
plot_na_heatmap(dat)
dat |
The matrix or data.frame |
This heatmap shows the available values in grey and the missing or zero values in white. Two bar plots are placed on the sides: one shows the number of missing or zero values per row and the other the number per column.
Plot with a central heatmap and two histograms on the sides
library(graphics)
library(MiMIR)
# Load the metabolites dataset
metabolic_measures <- synthetic_metabolic_dataset
# Plot the missing values in the metabolomics matrix
plot_na_heatmap(metabolic_measures)
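The quantities summarized by the heatmap and its side bar plots can be reproduced with a few lines of base R. The sketch below uses a small made-up data.frame and hypothetical object names; it only illustrates the counting, not the plotting done by plot_na_heatmap.

```r
# Sketch only: toy data and hypothetical names.
dat <- data.frame(
  a = c(1, 0, NA),
  b = c(2, 3, 4),
  c = c(NA, NA, 5)
)

# TRUE where a value is missing or zero (the white cells of the heatmap)
unavailable <- is.na(dat) | dat == 0

# Counts shown by the two side bar plots
per_row <- rowSums(unavailable)  # missing or zero values per sample
per_col <- colSums(unavailable)  # missing or zero values per variable
```

Combining the two conditions with `|` is safe here because `is.na(dat)` is TRUE exactly where `dat == 0` evaluates to NA.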
Helper function to pre-process the Nightingale Health metabolomics data-set before applying the COVID score.
prep_data_COVID_score( dat, featID = c("gp", "dha", "crea", "mufa", "apob_apoa1", "tyr", "ile", "sfa_fa", "glc", "lac", "faw6_faw3", "phe", "serum_c", "faw6_fa", "ala", "pufa", "glycine", "his", "pufa_fa", "val", "leu", "alb", "faw3", "ldl_c", "serum_tg"), quiet = FALSE )
dat |
numeric data-frame with Nightingale-metabolomics |
featID |
vector of strings with the names of metabolic features included in the COVID-score |
quiet |
logical to suppress the messages in the console |
The Nightingale-metabolomics data-frame after pre-processing (checked for zeros, z-scaled and log-transformed) according to what has been done by the authors of the original papers.
This function is constructed to be able to follow the pre-processing steps described in: Nightingale Health UK Biobank Initiative et al. (2021) Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife, 10, e63033, doi:10.7554/eLife.63033
prep_met_for_scores, covid_betas, comp_covid_score
require(MiMIR)
require(matrixStats)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Prepare the metabolic features for the COVID score
prepped_met <- prep_data_COVID_score(dat = metabolic_measures)
Helper function to pre-process the Nightingale Health metabolomics data-set before applying the mortality, Type-2-diabetes and CVD scores.
prep_met_for_scores(dat, featID, plusone = FALSE, quiet = FALSE)
dat |
numeric data-frame with Nightingale-metabolomics |
featID |
vector of strings with the names of metabolic features included in the score selected |
plusone |
logical to determine if a value of 1.0 should be added to all metabolic features (TRUE) or only to the ones featuring zeros before log-transforming (FALSE) |
quiet |
logical to suppress the messages in the console |
The Nightingale-metabolomics data-frame after pre-processing (checked for zeros, z-scaled and log-transformed) according to what has been done by the authors of the original papers.
This function is constructed to be able to follow the pre-processing steps described in: Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9.
Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
comp.mort_score, mort_betas, comp.T2D_Ahola_Olli, comp.CVD_score
library(MiMIR)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Prepare the metabolic features for the mortality score
prepped_met <- prep_met_for_scores(metabolic_measures, featID = MiMIR::mort_betas$Abbreviation)
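The role of the plusone argument can be illustrated with a small base-R sketch. It is written under stated assumptions and is not the code of prep_met_for_scores: the helper prep_sketch is hypothetical, and the order of the steps (adding 1.0, then log-transforming, then z-scaling on the input data) is assumed rather than taken from the package.

```r
# Sketch only: hypothetical helper, assumed order of operations.
prep_sketch <- function(dat, plusone = FALSE) {
  dat <- as.matrix(dat)
  # Features containing at least one zero cannot be log-transformed directly
  has_zero <- colSums(dat == 0, na.rm = TRUE) > 0
  # plusone = TRUE: add 1.0 to every feature;
  # plusone = FALSE: add 1.0 only to the features featuring zeros
  add_one <- if (plusone) rep(TRUE, ncol(dat)) else has_zero
  dat <- sweep(dat, 2, as.numeric(add_one), "+")
  # Log-transform, then z-scale each feature (assumed order)
  scale(log(dat))
}

toy <- data.frame(alb = c(40, 42, 38, 45), lac = c(0, 1.2, 0.8, 1.5))
out <- prep_sketch(toy, plusone = FALSE)
```

With plusone = FALSE only `lac` is shifted in this toy example, since it is the only feature containing a zero; in both settings the result has no infinite values and each column has mean 0 and standard deviation 1.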
Helper function to pre-process the Nightingale Health metabolomics data-set before applying the MetaboAge score by van den Akker et al.
QCprep(mat, PARAM_metaboAge, quiet = TRUE, Nmax_zero = 1, Nmax_miss = 1)
mat |
numeric data-frame NH-metabolomics matrix. |
PARAM_metaboAge |
list containing all the parameters to compute the metaboAge (metabolic features list,BBMRI-nl means and SDs of the metabolic features, and coefficients) |
quiet |
logical to suppress the messages in the console |
Nmax_zero |
numeric value indicating the maximum number of zeros allowed per sample (Number suggested=1) |
Nmax_miss |
numeric value indicating the maximum number of missing values allowed per sample (Number suggested=1) |
Nightingale-metabolomics data-frame after pre-processing (checked for zeros, missing values, samples>5SD from the BBMRI-mean, imputing the missing values and z-scaled)
This function is constructed to be able to follow the pre-processing steps described in: van den Akker Erik B. et al. (2020) Metabolic Age Based on the BBMRI-NL 1H-NMR Metabolomics Repository as Biomarker of Age-related Disease. Circulation: Genomic and Precision Medicine, 13, 541-547, doi:10.1161/CIRCGEN.119.002610
apply.fit
library(MiMIR)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Pre-process the metabolic features
prepped_met <- QCprep(as.matrix(metabolic_measures[, metabolites_subsets$MET63]), PARAM_metaboAge)
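The quality-control steps listed in the returned value (zeros, missing values, samples more than 5 SD from the reference mean, imputation and z-scaling) can be sketched in base R as below. This is only an approximation under assumptions: the helper qc_sketch and its arguments ref_mean, ref_sd and max_sd are hypothetical, outlying samples are assumed to be removed, and the remaining missing values are assumed to be imputed with the reference mean (0 on the z-scale).

```r
# Sketch only: hypothetical helper, not the implementation of QCprep.
qc_sketch <- function(mat, ref_mean, ref_sd,
                      Nmax_zero = 1, Nmax_miss = 1, max_sd = 5) {
  mat <- as.matrix(mat)
  # 1. Drop samples with too many zeros or too many missing values
  keep <- rowSums(mat == 0, na.rm = TRUE) <= Nmax_zero &
    rowSums(is.na(mat)) <= Nmax_miss
  mat <- mat[keep, , drop = FALSE]
  # 2. z-scale with the reference (e.g. biobank) means and SDs
  z <- sweep(sweep(mat, 2, ref_mean, "-"), 2, ref_sd, "/")
  # 3. Drop samples with any value beyond max_sd SDs (assumed behaviour)
  outlier <- rowSums(abs(z) > max_sd, na.rm = TRUE) > 0
  z <- z[!outlier, , drop = FALSE]
  # 4. Impute remaining missing values with the reference mean (assumption)
  z[is.na(z)] <- 0
  z
}

mat <- rbind(
  s1 = c(10, 5, 1),    # clean sample
  s2 = c(12, NA, 1),   # one missing value: kept and imputed
  s3 = c(0, 0, 1),     # two zeros: removed
  s4 = c(NA, NA, 1),   # two missing values: removed
  s5 = c(30, 5, 1)     # 10 SD from the reference mean: removed
)
colnames(mat) <- c("a", "b", "c")
ref_mean <- c(a = 10, b = 5, c = 1)
ref_sd <- c(a = 2, b = 1, c = 0.5)

out <- qc_sketch(mat, ref_mean, ref_sd)
```

Scaling with reference statistics rather than with the statistics of the input data keeps the pre-processed values comparable with the dataset on which the model was trained.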
Helper function to pre-process the Nightingale Health metabolomics data-set before applying metabolomics-based surrogates by Bizzarri et al.
QCprep_surrogates( mat, PARAM_surrogates, Nmax_miss = 1, Nmax_zero = 1, quiet = FALSE )
mat |
numeric data-frame Nightingale metabolomics matrix. |
PARAM_surrogates |
is a list holding the parameters to compute the surrogates |
Nmax_miss |
numeric value indicating the maximum number of missing values allowed per sample (Number suggested=1) |
Nmax_zero |
numeric value indicating the maximum number of zeros allowed per sample (Number suggested=1) |
quiet |
logical to suppress the messages in the console |
Bizzarri et al. built multivariate models, using 56 metabolic features quantified by Nightingale, to predict the 19 binary characteristics of an individual. The binary variables are: sex, diabetes status, metabolic syndrome status, lipid medication usage, blood pressure lowering medication, current smoking, alcohol consumption, high age, middle age, low age, high hsCRP, high triglycerides, high LDL cholesterol, high total cholesterol, low HDL cholesterol, low eGFR, low white blood cells, low hemoglobin levels.
Nightingale-metabolomics data-frame after pre-processing (checked for zeros, missing values, samples>5SD from the BBMRI-mean, imputing the missing values and z-scaled)
This function was made to pre-process the metabolic features following the rules indicated in the article: Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
binarize_all_pheno
library(MiMIR)
# Load the Nightingale metabolomics dataset
metabolic_measures <- synthetic_metabolic_dataset
# Pre-process the metabolic features
prepped_met <- QCprep_surrogates(as.matrix(metabolic_measures), MiMIR::PARAM_surrogates)
Function that creates a ROC curve of the selected metabolic surrogates as a plotly image
roc_surro(surrogates, bin_phenotypes, x_name)
surrogates |
numeric data.frame of metabolomics-based surrogate values by Bizzarri et al. |
bin_phenotypes |
logical data.frame of binarized phenotypes |
x_name |
vector of strings with the names of the selected binary phenotypes for the roc |
plotly image with the ROC curves for one or more selected variables
require(pROC)
require(plotly)
require(foreach)
require(MiMIR)
# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset
# Binarize the phenotypes
b_phen <- binarize_all_pheno(phen)
# Apply the surrogate models
surr <- calculate_surrogate_scores(met, phen, MiMIR::PARAM_surrogates, colnames(b_phen))
# Plot the ROC curve for sex
roc_surro(surr$surrogates, b_phen, "sex")
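For readers who want to see what a ROC curve summarizes, the sketch below computes the ROC points and the area under the curve in base R for a continuous score and a binary label. The package example relies on pROC for this; the helpers roc_points and auc_sketch are hypothetical and assume scores without ties.

```r
# Sketch only: hypothetical helpers, not the code used by roc_surro.

# ROC points: sweep the threshold from the highest to the lowest score
roc_points <- function(score, label) {
  label <- as.logical(label)
  ord <- order(score, decreasing = TRUE)
  data.frame(
    fpr = c(0, cumsum(!label[ord]) / sum(!label)),
    tpr = c(0, cumsum(label[ord]) / sum(label))
  )
}

# AUC through the rank-sum (Mann-Whitney) identity
auc_sketch <- function(score, label) {
  label <- as.logical(label)
  n_pos <- sum(label)
  n_neg <- sum(!label)
  r <- rank(score)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.9, 0.4, 0.6, 0.1)
label <- c(1, 1, 0, 0)
pts <- roc_points(score, label)
auc <- auc_sketch(score, label)
```

The AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score; in the toy example three of the four pairs are ordered correctly.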
Function that plots the ROCs of the surrogates of all the available surrogate models as plotly sub-plots
roc_surro_subplots(surrogates, bin_phenotypes)
surrogates |
numeric data.frame containing the surrogate values by Bizzarri et al. |
bin_phenotypes |
numeric data.frame with the binarized phenotypes output of binarize_all_pheno |
plotly image with all the ROCs for all the available clinical variables
library(pROC)
library(plotly)
library(MiMIR)
# Load the dataset
met <- synthetic_metabolic_dataset
phen <- synthetic_phenotypic_dataset
# Binarize the phenotypes
b_phen <- binarize_all_pheno(phen)
# Apply the surrogate models
surr <- calculate_surrogate_scores(met, phen, MiMIR::PARAM_surrogates, colnames(b_phen))
# Plot the ROC curves of all the available surrogates
roc_surro_subplots(surr$surrogates, b_phen)
Function to visualize a scatter-plot comparing two variables
scatterplot_predictions(x, p, title, xname = "x", yname = "predicted x")
x |
numeric vector |
p |
second numeric vector |
title |
string vector with the title |
xname |
string vector with the name of the variable on the x axis |
yname |
string vector with the name of the variable on the y axis |
plotly image with the scatterplot
library(plotly)
library(MiMIR)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Pre-process the metabolic features
prepped_met <- QCprep(as.matrix(metabolic_measures), MiMIR::PARAM_metaboAge)
# Apply the MetaboAge model
metaboAge <- apply.fit(prepped_met, FIT = PARAM_metaboAge$FIT_COEF)
age <- data.frame(phenotypes$age)
rownames(age) <- rownames(phenotypes)
scatterplot_predictions(age, metaboAge, title = "Chronological Age vs MetaboAge")
Start the application MiMIR.
startApp(launch.browser = TRUE)
launch.browser |
TRUE/FALSE |
This function starts the R-Shiny tool called MiMIR (Metabolomics-based Models for Imputing Risk), a graphical user interface that provides an intuitive framework for ad-hoc statistical analysis of Nightingale Health's 1H-NMR metabolomics data and allows for the projection and calibration of 24 pre-trained metabolomics-based models, without any pre-required programming knowledge.
Opens the application. If launch.browser = TRUE, it opens in the default web browser.
Deelen,J. et al. (2019) A metabolic profile of all-cause mortality risk identified in an observational study of 44,168 individuals. Nature Communications, 10, 1-8, doi:10.1038/s41467-019-11311-9
Ahola-Olli,A.V. et al. (2019) Circulating metabolites and the risk of type 2 diabetes: a prospective study of 11,896 young adults from four Finnish cohorts. Diabetologia, 62, 2298-2309, doi:10.1007/s00125-019-05001-w
Wurtz,P. et al. (2015) Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts. Circulation, 131, 774-785, doi:10.1161/CIRCULATIONAHA.114.013116
Bizzarri,D. et al. (2022) 1H-NMR metabolomics-based surrogates to impute common clinical risk factors and endpoints. EBioMedicine, 75, 103764, doi:10.1016/j.ebiom.2021.103764
van den Akker Erik B. et al. (2020) Metabolic Age Based on the BBMRI-NL 1H-NMR Metabolomics Repository as Biomarker of Age-related Disease. Circulation: Genomic and Precision Medicine, 13, 541-547, doi:10.1161/CIRCGEN.119.002610
Data.frame containing a synthetic version of the Nightingale Metabolomics dataset, created with the package synthpop from the LLS_PAROFF dataset.
data("synthetic_metabolic_dataset")
An object of class data.frame
with 500 rows and 229 columns.
M. Schoenmaker et al., 'Evidence of genetic enrichment for exceptional survival using a family approach: the Leiden Longevity Study', Eur. J. Hum. Genet., vol. 14, no. 1, Art. no. 1, Jan. 2006, doi:10.1038/sj.ejhg.5201508
B. Nowok, G. M. Raab, and C. Dibben, 'synthpop: Bespoke Creation of Synthetic Data in R', J. Stat. Softw., vol. 74, no. 1, Art. no. 1, Oct. 2016, doi:10.18637/jss.v074.i11
data("synthetic_metabolic_dataset")
Data.frame containing a synthetic phenotypic dataset, created with the package synthpop from the LLS_PAROFF dataset.
data("synthetic_phenotypic_dataset")
An object of class data.frame
with 500 rows and 24 columns.
M. Schoenmaker et al., 'Evidence of genetic enrichment for exceptional survival using a family approach: the Leiden Longevity Study', Eur. J. Hum. Genet., vol. 14, no. 1, Art. no. 1, Jan. 2006, doi:10.1038/sj.ejhg.5201508
B. Nowok, G. M. Raab, and C. Dibben, 'synthpop: Bespoke Creation of Synthetic Data in R', J. Stat. Softw., vol. 74, no. 1, Art. no. 1, Oct. 2016, doi:10.18637/jss.v074.i11
data("synthetic_phenotypic_dataset")
Function that creates a boxplot with a continuous variable split using the binary variable
ttest_scores(dat, pred, pheno)
dat |
The data.frame containing the 2 variables |
pred |
character indicating the y variable |
pheno |
character indicating the binary variable |
plotly boxplot with the continuous variable split using the binary variable
library(MiMIR)
library(plotly)
# Load the dataset
metabolic_measures <- synthetic_metabolic_dataset
phenotypes <- synthetic_phenotypic_dataset
# Compute the mortality score
mortScore <- comp.mort_score(metabolic_measures, quiet = TRUE)
# Combine the score with the binary variable; the column names must match pred and pheno
dat <- data.frame(predictor = mortScore, pheno = phenotypes$sex)
colnames(dat) <- c("mortScore", "sex")
ttest_scores(dat = dat, pred = "mortScore", pheno = "sex")
Function that calculates a t-test and a plotly image of the selected surrogates
ttest_surrogates(surrogates, bin_phenotypes)
surrogates |
numeric data.frame containing the surrogate values by Bizzarri et al. |
bin_phenotypes |
numeric data.frame with the binarized phenotypes output of binarize_all_pheno |
Barplot and t-test indicating whether the surrogate variables separate the samples according to the real values of the binary clinical variables.
plotly image with all the ROCs for all the available clinical variables
require(pROC)
require(plotly)
require(MiMIR)
require(foreach)
# Load the dataset
m <- synthetic_metabolic_dataset
p <- synthetic_phenotypic_dataset
# Binarize the phenotypes
b_p <- binarize_all_pheno(p)
# Apply the surrogate models
surr <- calculate_surrogate_scores(met = m, pheno = p, MiMIR::PARAM_surrogates, bin_names = colnames(b_p))
# Compute the t-tests and plot the surrogates
ttest_surrogates(surr$surrogates, b_p)