BioJava: an open-source foundation library for bioinformatics

zhaozj, 2021-02-08

Self-awake point of view autoasm.blog-city.com

Alex Dou

Autoasm@yahoo.com

What is bioinformatics?

Bioinformatics: it is indeed a cool name.

Taken literally, it is a discipline that combines two of today's hottest fields, the life sciences and information science.

What is bioinformatics? Regrettably, I find it hard to give a precise definition of a discipline that is so new and still changing so quickly.

Generally speaking, bioinformatics deals with the acquisition, processing, storage, distribution, analysis, and interpretation of the biological information related to genome research. This definition has two layers of meaning. One is to collect, organize, and serve massive amounts of data, that is, to manage the data. The other is to discover new laws from it, that is, to use the data.

Specifically, bioinformatics takes genomic DNA sequence information as its starting point and looks for the coding regions in a genomic sequence that represent proteins and RNA genes. At the same time, it tries to clarify the nature of the information carried in the large number of non-coding regions of the genome and to decipher the rules of the genetic language hidden in DNA sequences. On this basis, it summarizes and organizes the transcriptome and proteome data related to the release and regulation of genomic genetic information, and so helps us understand the laws of metabolism, development, differentiation, and evolution.

The development and application of information technology has benefited almost everyone, and the molecular biologists who study DNA, RNA, and proteins are no exception. It is hard to believe that molecular biologists could have completed the sequencing of the human genome without information technology (in fact, the shotgun method used in large-scale sequencing depends heavily on computing). Likewise, without information technology and the theoretical support of computational molecular biology, the researchers studying the SARS virus could not have determined what type of virus it was in such a short time (unless they could spot regular patterns by eye among several million A, G, C, and T letters).

In essence, bioinformatics exists to support research and development in the life sciences.

Introduction to BioJava

Bioinformatics faces many challenges in both theory and engineering. Developing a complex biological sequence analysis system requires some foundation libraries, and BioJava is one such library.

BioJava is a foundation library written in Java for representing and analyzing biological sequences such as DNA, RNA, and proteins. It provides biological sequence processing (such as transcription and translation), file format conversion, and some simple scientific computing (such as hidden Markov models).

Readers can find out more about BioJava at http://www.biojava.org. In addition, I would like to thank Wu Xin (transliteration) of the Peking University Center for Bioinformatics, who translated BioJava's introductory documentation into Chinese.

Transcription - a simple example

Most organisms use DNA to carry their genetic information (some viruses, such as the SARS virus and HIV, use RNA instead). However, the molecule that directly guides protein synthesis is messenger RNA. In molecular biology, the process of producing RNA from DNA is called transcription. The opposite process, from RNA to DNA, is called reverse transcription, and it usually occurs during the replication of RNA viruses such as HIV. See Figure 1.

Figure 1. Transcription and translation
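
The translation step named in Figure 1 (RNA to protein) can be illustrated with a short plain-Java sketch. It does not use BioJava. The class name TranslationSketch and the method translate are hypothetical names chosen for this illustration, and the codon table is the standard genetic code written as a 64-letter string. The sketch reads only the first reading frame, writes stop codons as '*', and ignores a trailing incomplete codon, all of which are simplifying assumptions.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of BioJava.
final class TranslationSketch {

    // Base order used to index the codon table.
    private static final String BASES = "UCAG";

    // Standard genetic code: index = 16 * first + 4 * second + third,
    // where each base is numbered by its position in BASES.
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W"
          + "LLLLPPPPHHQQRRRR"
          + "IIIMTTTTNNKKSSRR"
          + "VVVVAAAADDEEGGGG";

    // Translates an RNA string codon by codon, starting at position 0.
    // Stop codons become '*'; leftover bases at the end are ignored.
    static String translate(String rna) {
        String s = rna.toUpperCase(Locale.ROOT);
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= s.length(); i += 3) {
            int index = 0;
            for (int j = 0; j < 3; j++) {
                int base = BASES.indexOf(s.charAt(i + j));
                if (base < 0) {
                    throw new IllegalArgumentException(
                            "Not an RNA symbol: " + s.charAt(i + j));
                }
                index = index * 4 + base;
            }
            protein.append(AMINO_ACIDS.charAt(index));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // Codons: aug ccg aau cgu, with "aa" left over.
        System.out.println(translate("augccgaaucguaa"));
    }
}
```

For the RNA sequence augccgaaucguaa the sketch prints MPNR, using the one-letter amino acid codes.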

The following code uses the BioJava library to get the RNA sequence corresponding to a DNA sequence.

    import org.biojava.bio.symbol.*;
    import org.biojava.bio.seq.*;

    public class TranscribeDNAtoRNA {
        public static void main(String[] args) {
            try {
                // Make a DNA SymbolList
                SymbolList symL = DNATools.createDNA("atgccgaatcgtaa");

                // Transcribe it to RNA
                symL = RNATools.transcribe(symL);

                // Just to prove it worked
                System.out.println(symL.seqString());
            }
            catch (IllegalSymbolException ex) {
                // This will happen if you try to make the DNA sequence using non-IUB symbols
                ex.printStackTrace();
            }
            catch (IllegalAlphabetException ex) {
                // This will happen if you try to transcribe a non-DNA SymbolList
                ex.printStackTrace();
            }
        }
    }

In the code above, we first create a DNA sequence object, atgccgaatcgtaa. We then call the RNATools.transcribe() method to obtain the transcribed RNA sequence. Because the DNA and RNA sequences correspond position by position during transcription, and the symbol mapping between the two alphabets is fixed, the implementation of this method is very simple. Of course, BioJava also provides more complex features, such as HMMER support, and interested readers can refer to the BioJava website.
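
Because the mapping is fixed, the idea behind transcription can be sketched in a few lines of plain Java. This is not BioJava's implementation. TranscriptionSketch and its transcribe method are hypothetical names, and the sketch assumes that the input is the coding strand, so each t simply becomes u and every other base is copied unchanged.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of BioJava.
final class TranscriptionSketch {

    // Maps a DNA string to RNA symbol by symbol (assumes the coding strand).
    // Rejects anything other than a, c, g, t.
    static String transcribe(String dna) {
        String s = dna.toLowerCase(Locale.ROOT);
        StringBuilder rna = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'a':
                case 'c':
                case 'g':
                    rna.append(c);   // unchanged between the two alphabets
                    break;
                case 't':
                    rna.append('u'); // thymine is written as uracil in RNA
                    break;
                default:
                    throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        System.out.println(transcribe("atgccgaatcgtaa"));
    }
}
```

For the sequence used in the article, atgccgaatcgtaa, the sketch prints augccgaaucguaa. A library such as BioJava adds typed alphabets and checked exceptions on top of this basic mapping.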

Related reading materials

If you are a software engineer interested in bioinformatics, you may need to learn more about molecular genetics. "Genomes", published by Science Press, is a good choice. The photocopied edition of "Molecular Biology" is also quite good, provided you are prepared for the professional vocabulary. The American classic "Molecular Biology and Cell Biology" is a good reference book.

If you have a background in the life sciences, you may need to master more computer science skills. "Computer Technology in Bioinformatics" is a good introductory text.

In addition, "Computational Molecular Biology" is a fairly classic theoretical work, but its readers need some knowledge of theoretical computer science.

