NGS-based methods and Data Science

【NGS with Data Science】Use bioinfokit to make a volcano plot

In this article, I will explain how to use Python to draw a Volcano plot. The Volcano plot was a standard tool for illustrating the differential analysis of gene expression analysis during the downstream of quantitative analysis, which was a special type of scatter plot. The x-axis of the Volcano plot was the log2 Fold change of two conditions, and the y-axis was the negative value of the negative logarithm base 10 of the p-value. Each data point represents a gene or other biological feature being tested. In RNA-Seq, each data point would be the genes.

這裡介紹使用Python的套件bioinfokit來繪製火山圖,首先我們需要下載範例資料以及安裝bioinfokit :

pip install bioinfokit

wget https://zenodo.org/record/2529117/files/limma-voom_luminalpregnant-luminallactate
wget https://zenodo.org/record/2529117/files/volcano_genes

稍微看一下資料的情況

head limma-voom_luminalpregnant-luminallactate

輸出:

ENTREZID	SYMBOL	GENENAME	logFC	AveExpr	t	P.Value	adj.P.Val
12992	Csn1s2b	casein alpha s2-like B	-8.603611114762	3.56295004142591	-43.7964980711435	3.83064977005569e-15	6.05395889659601e-11
13358	Slc25a1	solute carrier family 25 (mitochondrial carrier, citrate transporter), member 1	-4.12417532129173	5.77969894403042	-29.9078492752674	1.75859473379618e-13	1.38964155864574e-09
11941	Atp2b2	ATPase, Ca++ transporting, plasma membrane 2	-7.38698638678659	1.28214314739647	-27.8194991746381	4.83636254056037e-13	2.43279979019347e-09
20531	Slc34a2	solute carrier family 34 (sodium phosphate), member 2	-4.17781242057656	4.27862903538987	-27.0727230566646	6.15742796809282e-13	2.43279979019347e-09
100705	Acacb	acetyl-Coenzyme A carboxylase beta	-4.3143199499725	4.4409137064501	-25.2235657685746	1.4999774593805e-12	4.74112875360987e-09
13645	Egf	epidermal growth factor	-5.36266382024579	0.73590465313728	-24.5993025952199	2.11624444834827e-12	5.57418787694935e-09
230810	Slc30a2	solute carrier family 30 (zinc transporter), member 2	-3.20311813582619	2.69581147606106	-23.80427759327	3.02466813499713e-12	6.82883645792781e-09
68801	Elovl5	ELOVL family member 5, elongation of long chain fatty acids (yeast)	-2.86330403668687	6.45520453127796	-22.3535751591657	6.5987442593808e-12	1.30358192844068e-08
19659	Rbp1	retinol binding protein 1, cellular	5.44304402150002	6.10703318183881	21.0523576693295	1.47914323855668e-11	2.36474559807046e-08

limma-voom_luminalpregnant-luminallactate是一個tsv格式的下游分析結果。縱向是基因,橫向則是每個基因的變化量(這個案例是logFC),以及P值和校正後P值(adj.P.Val)

再看一下volcano_genes

head volcano_genes
GeneID
Mcl1
Hbegf
Tgfb2
Cxcl16
Csf1
Pdgfb
Edn1
Lif
Kitl

它是等一下要標註上名稱的候選基因名稱的列表

接著我們使用pandas把tsv讀成dataframe,並傳入visuz.GeneExpression.volcano這個函式就可以了

from bioinfokit import analys, visuz
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("limma-voom_luminalpregnant-luminallactat", sep='\t')

with open('volcano_genes') as f:
    genes = f.read().splitlines()


df.dropna(inplace=True)
visuz.GeneExpression.volcano(df=df, lfc='logFC', pv='adj.P.Val', lfc_thr=(1, 1), sign_line=True, plotlegend=True, legendpos='upper right', legendanchor=(
    1.46, 1), dotsize=0.1, gfont=7, xlm=(-10.0, 10.0, 1), pv_thr=(0.05, 0.05), genenames=tuple(genes), geneid="SYMBOL")

執行之後會在程式得得目錄下輸出圖檔

以上就是繪製火山圖的介紹。

Leave a Reply

Your email address will not be published. Required fields are marked *

en_USEnglish