DeepArk

Upload a FASTA file for regulatory feature prediction. FASTA format:

  • Each record starts with a header line consisting of ">" followed by the sequence name.
  • Each sequence name must be unique.
  • The sequence must have a length of 4095 bp, as this is the input sequence length for DeepArk.
  • The sequence may contain only the letters "A", "C", "T", "G", "N", "a", "c", "t", "g", and "n".
Additional help and example files.
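
The FASTA rules above can be checked locally before uploading. The following is a minimal sketch, not DeepArk's own validation code: the function name validate_fasta is hypothetical, and it assumes that a sequence may be wrapped across several lines and that an empty sequence name is an error, neither of which is stated above.

```python
INPUT_LENGTH = 4095            # input sequence length for DeepArk, as stated above
ALLOWED = set("ACGTNacgtn")    # letters permitted in a sequence


def validate_fasta(text):
    """Return {name: sequence}; raise ValueError on the first rule violation."""
    records = {}
    name = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored (assumption)
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError(f"line {line_no}: empty sequence name")
            if name in records:
                raise ValueError(f"line {line_no}: duplicate name {name!r}")
            records[name] = ""
        elif name is None:
            raise ValueError(f"line {line_no}: sequence data before any '>' line")
        else:
            bad = set(line) - ALLOWED
            if bad:
                raise ValueError(f"line {line_no}: invalid letters {sorted(bad)}")
            records[name] += line  # wrapped sequence lines are joined (assumption)
    for name, seq in records.items():
        if len(seq) != INPUT_LENGTH:
            raise ValueError(f"{name!r}: length {len(seq)}, expected {INPUT_LENGTH}")
    return records
```

Calling validate_fasta(open("input.fa").read()) on a hypothetical file input.fa either returns the parsed records or reports the first problem it finds.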

Upload a BED file for regulatory feature prediction. BED format:

  • Each line has at least three tab-separated fields.
  • The first column is the chromosome name.
  • The second column is the start position on the chromosome.
  • The third column is the end position on the chromosome.
  • The difference between the start and end positions must be 4095 bp, as this is the input length for DeepArk.
  • An optional fourth column gives the name of the entry on that line. If the name is empty or ".", the entry is treated as unnamed; any name that is provided must be unique.
Additional help and example files.
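
The BED rules above can be sketched as a small checker. This is an illustration under stated assumptions rather than DeepArk's implementation: the function name validate_bed is hypothetical, blank lines are skipped, and start and end are assumed to be plain integers.

```python
INPUT_LENGTH = 4095  # input length for DeepArk, as stated above


def validate_bed(text):
    """Return a list of (chrom, start, end, name); name is None if unnamed."""
    entries = []
    seen_names = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored (assumption)
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: fewer than three tab-separated fields")
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            raise ValueError(f"line {line_no}: start and end must be integers") from None
        if end - start != INPUT_LENGTH:
            raise ValueError(
                f"line {line_no}: end - start is {end - start}, expected {INPUT_LENGTH}"
            )
        # An empty or "." fourth column means the entry is unnamed.
        name = fields[3].strip() if len(fields) > 3 else ""
        if name in ("", "."):
            name = None
        elif name in seen_names:
            raise ValueError(f"line {line_no}: duplicate name {name!r}")
        else:
            seen_names.add(name)
        entries.append((chrom, start, end, name))
    return entries
```

Several unnamed entries may coexist, since only provided names are checked for uniqueness.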

Upload a VCF file for variant effect prediction. VCF format:

  • Each line represents a single variant.
  • Each line has at least five tab-separated fields.
  • The first column is the chromosome name.
  • The second column is the start position of the variant.
  • The third column is the variant name. A variant whose name is "." is treated as unnamed; all other variant names must be unique.
  • The fourth column is the reference allele.
  • The fifth column is the variant allele.
  • The reference and variant alleles may contain only the letters "A", "C", "T", "G", "a", "c", "t", and "g".
Additional help and example files.
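
The VCF rules above can be sketched in the same way. This is a hedged illustration, not DeepArk's parser: the function name validate_vcf is hypothetical, lines starting with "#" are skipped as header lines (standard in VCF but not mentioned above), and empty alleles are rejected as an added assumption.

```python
ALLELE_LETTERS = set("ACGTacgt")  # letters permitted in alleles, as stated above


def validate_vcf(text):
    """Return a list of (chrom, pos, name, ref, alt); name is None if unnamed."""
    variants = []
    seen_names = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' header lines (assumption)
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: fewer than five tab-separated fields")
        chrom, pos_text, name, ref, alt = fields[:5]
        try:
            pos = int(pos_text)
        except ValueError:
            raise ValueError(f"line {line_no}: position must be an integer") from None
        # A name of "." marks the variant as unnamed.
        if name == ".":
            name = None
        elif name in seen_names:
            raise ValueError(f"line {line_no}: duplicate variant name {name!r}")
        else:
            seen_names.add(name)
        for label, allele in (("reference", ref), ("variant", alt)):
            if not allele or set(allele) - ALLELE_LETTERS:
                raise ValueError(f"line {line_no}: invalid {label} allele {allele!r}")
        variants.append((chrom, pos, name, ref, alt))
    return variants
```

Note that "N" is not accepted here, matching the allele rule above, even though it is allowed in FASTA sequences.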

Upload a FASTA file for regulatory activity profiling. FASTA format:

  • Each record starts with a header line consisting of ">" followed by the sequence name.
  • Each sequence name must be unique.
  • The sequence must have a length of 4095 bp, as this is the input sequence length for DeepArk.
  • The sequence may contain only the letters "A", "C", "T", "G", "N", "a", "c", "t", "g", and "n".
Additional help and example files.

Upload a BED file for regulatory activity profiling. BED format:

  • Each line has at least three tab-separated fields.
  • The first column is the chromosome name.
  • The second column is the start position on the chromosome.
  • The third column is the end position on the chromosome.
  • The difference between the start and end positions must be 4095 bp, as this is the input length for DeepArk.
  • An optional fourth column gives the name of the entry on that line. If the name is empty or ".", the entry is treated as unnamed; any name that is provided must be unique.
Additional help and example files.