关于剑灵私服dat结构的信息
今天给大家复现一篇4.4分的免疫细胞孟德尔随机化文章,这篇文章发表在如下期刊上:
这种孟德尔随机化跟肠道菌群孟德尔随机化非常类似,暴露细分成上百种,工作量非常大,例如这里的免疫细胞就有731种,工作量肯定是OK的,哪怕你是做大论文的,都没有啥问题。
现在讲一下如何复现这篇文章,作者研究的是免疫细胞与精神分裂症的关联性,那首先就要下载这731种免疫细胞的GWAS数据,这些数据就放在GWAS Catalog数据库上,可以直接在这个网址上下载:
http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/,找到编号GCST90001391 到 GCST90002121的数据就是这731种免疫细胞的GWAS数据。
把对应的数据下载下来即可,然后就找到结局精神分裂症的GWAS数据,剩下就是普通双样本孟德尔随机化的事情了,具体代码如下:
install.packages("devtools")
展开全文
devtools::install_github("MRCIEU/TwoSampleMR")
#install.packages("ggplot2")
#library
library(TwoSampleMR)
library(ggplot2)
setwd("")
###本地读取暴露与结局数据
expo_rt<- read_exposure_data(
filename = "GCST9001391.tsv",
sep = "\t",
snp_col = "SNP",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value"
#data filter
expo_rt<- expo_rt[expo_rt$pval.exposure < 1e-5,]
expo_rt <- clump_data(expo_rt,clump_kb = 10000,
clump_r2 = 0.001)
#本地读取结局数据
outc_rt <- read_outcome_data(
snps = expo_rt$SNP,
filename = "schizophrenia.txt",
sep = "\t",
snp_col = "rsids2",
beta_col = "beta",
se_col = "standard_error",
effect_allele_col = "effect_allele",
other_allele_col = "other_allele",
eaf_col = "effect_allele_frequency",
pval_col = "p_value")
#harmonise and merge data
harm_rt <- harmonise_data(
exposure_dat = expo_rt,
outcome_dat = outc_rt,action=2)
write.table(harm_rt, "harmonise.txt",row.names = F,sep = "\t",quote = F)
#mendelian randomization(MR) analysis
mr_result<- mr(harm_rt)
View(mr_result)
OR=generate_odds_ratios(mr_result)
write.table(OR[,5:ncol(OR)],"OR.txt",row.names = F,sep = "\t",quote = F)
#select relevant statistical methods
mr_method_list()
my_mr_result<- mr(harm_rt,method_list = c("mr_ivw","mr_egger_regression"))
View(my_mr_result)
OR2=generate_odds_ratios(my_mr_result)
write.table(OR2[,5:ncol(OR2)],"OR2.txt",row.names = F,sep = "\t",quote = F)
#heterogeneity test
mr_heterogeneity(harm_rt)
#outlier test
run_mr_presso(harm_rt,NbDistribution = 1000)
#pleiotropy test
mr_pleiotropy_test(harm_rt)
#obtain the beta for each SNP
singlesnp_res<- mr_singlesnp(harm_rt)
View(singlesnp_res)
singlesnpOR=generate_odds_ratios(singlesnp_res)
write.table(singlesnpOR,"singlesnpOR.txt",row.names = F,sep = "\t",quote = F)
#sensitivity analysis
sen_res<- mr_leaveoneout(harm_rt)
View(sen_res)
#Scatter plots of several statistical methods
p1 <- mr_scatter_plot(my_mr_result, harm_rt)
p1[[1]]
ggsave(p1[[1]], file="scatter.pdf", width=8, height=8)
#forest plot
p2 <- mr_forest_plot(singlesnp_res)
p2[[1]]
ggsave(p2[[1]], file="forest.pdf", width=8, height=8)
#sensitivity analysis plot
p3 <- mr_leaveoneout_plot(sen_res)
p3[[1]]
ggsave(p3[[1]], file="sensitivity analysis.pdf", width=8, height=8)
#funnel plot
res_single <- mr_singlesnp(harm_rt)
p4 <- mr_funnel_plot(singlesnp_res)
p4[[1]]
ggsave(p4[[1]], file="funnel plot.pdf", width=8, height=8)
然后可以再做一下反向孟德尔随机化即可,也就是暴露与结局互换位置,代码也是上面的代码,就是代码中暴露与结局的位置互换,这里就不重复写上来了。
最后把阳性结局汇总起来的,直接绘制森林图即可,森林图可以使用下面的代码,当然你使用其他的也可以。
library(grid)
library(forestploter)
setwd("")
mydata=read.table("allresult.txt",header = T,sep = "\t")
mydata$`OR (95% CI)` <- ifelse(is.na(mydata$or), "",sprintf("%.4f (%.4f - %.4f)",
mydata$or, mydata$or_lci95,
mydata$or_uci95))
forest(mydata[,c(1:4,8:9)],
est = mydata$or,
lower =mydata$or_lci95,
upper = mydata$or_uci95,
sizes =0.3,
ci_column =5 ,
ref_line = 1,
xlim = c(0.05, 3),
theme = tm3)
如果直接这样做就太累了,毕竟这里有731种免疫细胞,就下载数据、转换id都要很久很久很久,如果直接写循环批量运行的话,普通电脑直接卡死,甚至CPU直接冒烟都有可能,没有大型服务器根本就行不通。当然还有更加简便的方法,也就是普通电脑也可以做的方法,大家可以自行摸索一下。
版权声明
本文仅代表作者观点,不代表百度立场。
本文系作者授权百度百家发表,未经许可,不得转载。