DeepArk

What is DeepArk?

DeepArk is a set of deep learning models that predict regulatory activity (e.g. transcription factor binding) from genomic sequences. DeepArk uses a distinct neural network for each of four model species: mouse (Mus musculus), fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and zebrafish (Danio rerio). By comparing the predicted regulatory activity of two different sequences, DeepArk can also be used to predict how genomic variants may alter regulatory activity. Examples and advice on how to use the DeepArk web server are available on the help page. Additional information about DeepArk's various applications can be found there and in our publication on DeepArk.
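The variant comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DeepArk's code: `variant_effect`, `apply_snv`, and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in scorer (GC fraction) used so the example runs without a trained network.

```python
def apply_snv(seq, pos, alt_base):
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + alt_base + seq[pos + 1:]


def variant_effect(predict, ref_seq, alt_seq):
    """Per-feature difference between alternate and reference predictions."""
    ref_scores = predict(ref_seq)
    alt_scores = predict(alt_seq)
    return [alt - ref for ref, alt in zip(ref_scores, alt_scores)]


def toy_predict(seq):
    # Hypothetical stand-in for a trained model: a single "feature"
    # equal to the GC fraction of the sequence. NOT a DeepArk network.
    return [sum(base in "GC" for base in seq) / len(seq)]


ref_seq = "ACGTACGTAC"
alt_seq = apply_snv(ref_seq, 0, "G")  # A>G at the first position
effect = variant_effect(toy_predict, ref_seq, alt_seq)
print(alt_seq, effect)  # positive values mean higher predicted activity for the variant
```

In practice `predict` would be replaced by a call to one of the species-specific networks, and the returned list would hold one difference per predicted regulatory feature.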

Citation

If you use DeepArk in your publication, please cite it as follows:

Cofer, E.M., Raimundo, J., Tadych, A., Yamazaki, Y., Wong, A.K., Theesfeld, C.L., Levine, M.S., & Troyanskaya, O.G. DeepArk: modeling cis-regulatory codes of model species with deep learning. Genome Research (2021).

Network architecture

DeepArk uses four separate convolutional neural networks to make predictions for mouse, fly, worm, and zebrafish. While the learned weights in each network are different, the network architecture is largely the same. The basic unit of the DeepArk network is the "convolutional block", which includes a convolutional layer with batch normalization, a rectified linear unit (ReLU) activation function, and a channel-wise spatial dropout layer. These convolutional blocks can extract complex motifs from the input sequence, and also identify interactions among motifs. The convolutional layer has C_in input channels, C_out output channels, and a kernel size of K. A diagram of the convolutional blocks is shown below.

a visualization of the convolutional block used in DeepArk
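As a rough illustration, the convolutional block described above could be written in PyTorch as follows. This is a minimal sketch under stated assumptions, not the published implementation: the class name `ConvBlock`, the dropout probability, the absence of padding, and the example channel sizes are all assumptions, and `nn.Dropout1d` requires PyTorch 1.12 or later.

```python
import torch
from torch import nn


class ConvBlock(nn.Module):
    """Conv1d -> BatchNorm1d -> ReLU -> channel-wise dropout (hypothetical sketch)."""

    def __init__(self, c_in: int, c_out: int, k: int, p_drop: float = 0.1):
        super().__init__()
        self.layers = nn.Sequential(
            # Assumption: no padding, so the length shrinks by k - 1.
            nn.Conv1d(c_in, c_out, kernel_size=k),
            nn.BatchNorm1d(c_out),
            nn.ReLU(),
            # Zeroes entire channels during training; p_drop is an assumed value.
            nn.Dropout1d(p=p_drop),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


# Assumed input: a batch of 2 one-hot DNA sequences (4 channels: A, C, G, T)
# of length 100. Here every position is "A" just to have valid input.
block = ConvBlock(c_in=4, c_out=16, k=5)
block.eval()  # disable dropout and use running batch-norm statistics
x = torch.zeros(2, 4, 100)
x[:, 0, :] = 1.0
with torch.no_grad():
    y = block(x)
print(tuple(y.shape))  # (2, 16, 96)
```

With no padding, a kernel of size K shortens a length-L input to L - K + 1 positions, which is why the 100-position input above yields 96 output positions.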

Interleaved with the convolutional blocks are max pooling layers, which shorten the sequence representation passed to later layers and add a degree of spatial invariance. The pooling layers have a kernel size of K and stride of S. A diagram of the pooling block is shown below.

a visualization of the pooling block used in DeepArk
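The effect of K and S on the output can be sketched in plain Python. The helper names and the example values below are illustrative assumptions (no padding is assumed), not settings taken from DeepArk.

```python
def pooled_length(length, k, s):
    """Output length of a pooling layer with kernel k and stride s, no padding."""
    return (length - k) // s + 1


def max_pool_1d(values, k, s):
    """Maximum over each window of size k, moving s positions at a time."""
    return [max(values[i:i + k]) for i in range(0, len(values) - k + 1, s)]


# Made-up activations for a single channel.
signal = [0.1, 0.9, 0.2, 0.0, 0.0, 0.7, 0.3, 0.3]
pooled = max_pool_1d(signal, k=4, s=4)

# Shift the signal one position to the right: the peaks stay inside the
# same windows, so the pooled output is unchanged.
shifted = [0.0] + signal[:-1]
pooled_shifted = max_pool_1d(shifted, k=4, s=4)

print(pooled, pooled_shifted, pooled_length(len(signal), 4, 4))
```

The eight input positions are reduced to two, and the one-position shift leaves the pooled values unchanged, which is the kind of small-shift invariance that pooling provides.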

The last fully-connected layer uses a sigmoid activation function to convert the output predictions to probabilities between zero and one. A diagram of the complete DeepArk architecture is shown below.

a visualization of the DeepArk architecture
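The role of the final sigmoid can be shown with a small example. The raw output values below are made up for illustration; only the sigmoid function itself is taken from the description above.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between zero and one."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw outputs of the last fully-connected layer,
# one per predicted regulatory feature.
raw_outputs = [-2.0, 0.0, 3.0]
probabilities = [sigmoid(z) for z in raw_outputs]
print(probabilities)
```

Because the sigmoid is applied to each output independently, the probabilities for different features need not sum to one, so a sequence can score highly for several features at once.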