Center for Bioinformatics

Pipeline to construct RLEdb

Constructing the rate-limiting enzyme regulation database for human, rat, mouse, yeast and E. coli involves four main steps: curating rate-limiting enzymes from the published literature; mapping enzymes to genes and proteins to confirm that each enzyme exists; curating regulatory information from the published literature; and automatically annotating and updating entries from IntEnz, KEGG, UniProt, Entrez Gene and other related resources.
The primary aim of the database is to support biochemical research by maintaining a high-quality rate-limiting enzyme regulation database that serves as a comprehensive, fully classified, richly and accurately annotated enzyme regulation knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community.

Curation of rate-limiting enzymes from literature:

Curation of rate-limiting enzymes involves five steps before final submission to RLEdb: exhaustively searching Entrez for relevant abstracts using all enzyme codes and names; grouping the downloaded abstracts by topic; curating the descriptions of rate-limiting enzymes; mapping each enzyme name to the correct enzyme code; and recording the organism information. Because the enzyme name and organism information are curated separately, there are multiple opportunities to verify the accuracy of the final information.


exhaustive search:
We first searched PubMed with the keyword "rate-limiting enzyme" and retrieved 15688 abstracts on 22 November 2006. Next, all enzyme names and enzyme codes were extracted from the IntEnz database, and an exhaustive search was performed for each enzyme using its enzyme name, its enzyme code and "rate-limiting". We then combined all the results and downloaded all the abstracts in Medline format for parsing.
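As a rough illustration of how the per-enzyme queries could be assembled, the sketch below builds one query string per enzyme code. The function name `build_queries`, the input shape and the exact Boolean syntax are assumptions made for this example; they are not taken from the RLEdb pipeline itself.

```python
def build_queries(enzymes, keyword="rate-limiting"):
    """Return one PubMed-style query string per enzyme.

    `enzymes` maps an EC code to a list of enzyme names. This input shape
    is hypothetical; the pipeline extracts names and codes from IntEnz.
    """
    queries = {}
    for ec, names in enzymes.items():
        # Quote every name and the EC code so multi-word terms stay intact.
        terms = [f'"{name}"' for name in names] + [f'"EC {ec}"']
        joined = " OR ".join(terms)
        queries[ec] = f'({joined}) AND "{keyword}"'
    return queries


# Illustrative input: EC 1.1.1.34 with two of its names.
example = {
    "1.1.1.34": [
        "hydroxymethylglutaryl-CoA reductase (NADPH)",
        "HMG-CoA reductase",
    ]
}
queries = build_queries(example)
print(queries["1.1.1.34"])
```

Keeping query construction separate from the actual search makes it easy to review the generated expressions before any abstracts are downloaded.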


group abstracts:
All downloaded abstracts are grouped by topic according to the related articles provided by Entrez. This lets us assess quickly and easily whether, and how strongly, a searched enzyme code is related to rate-limiting enzymes. It also lets us assess whether, and how, a reference relates to other well-confirmed references describing rate-limiting enzymes.


curation:
In this step we read the abstracts, assess the information given, and add relevant comments and features to the entry. Reading and analyzing the abstract of a paper often shows that the described enzyme is rate-limiting. In these cases, care is taken to check other references about the same enzyme from the same organism. The description line for the rate-limiting role is then added to the new entry. If any negative report about the enzyme exists for a given organism, the enzyme is regarded as not rate-limiting in that organism. Take abstract 2546326 as an example: "The oxidation of mitochondrial GSH, induced by 7-hydroxymethyl-12-MBA, was not dramatically enhanced by the inactivation of GSH reductase, indicating that this enzyme was not rate-limiting in the regeneration of GSH".
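The negative-report rule can be expressed as a small decision function. This is only a sketch of the rule as stated in the text: the function name, the `(pmid, supports)` tuple layout and the placeholder identifier `PMID_A` are assumptions, and the real decision is made by a human curator.

```python
def is_rate_limiting(reports):
    """Decide the status of one enzyme in one organism.

    `reports` is a list of (pmid, supports) tuples, where `supports` is
    True for a report describing the enzyme as rate-limiting and False
    for a negative report. The layout is hypothetical.
    """
    if not reports:
        # No evidence at all: nothing to submit.
        return False
    # A single negative report is enough to reject the enzyme.
    return all(supports for _pmid, supports in reports)


# "PMID_A" is a placeholder, not a real reference; 2546326 is the
# negative report quoted in the text.
reports = [("PMID_A", True), ("2546326", False)]
print(is_rate_limiting(reports))
```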


enzyme name:
An important step in curating an article is choosing an enzyme code based on the enzyme name description; this code serves as the starting point for collecting gene and protein information. Enzyme codes are often provided by the MeSH annotation of an abstract. Particular care is taken with deleted or transferred enzyme codes in the MeSH annotation; for example, 1.14.14.5 has been transferred to 1.14.14.1 in the current IntEnz database. If no enzyme code is provided, we search for the name exhaustively in the IntEnz, KEGG LIGAND, BRENDA and Swiss-Prot databases and assign the appropriate enzyme code.
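Handling of transferred or deleted enzyme codes could look like the following minimal sketch. The lookup tables are toy stand-ins for what would be loaded from IntEnz; the only real entry is the 1.14.14.5 to 1.14.14.1 transfer quoted in the text, and the function and variable names are hypothetical.

```python
# Toy tables; in practice these would be loaded from an IntEnz release.
TRANSFERRED = {"1.14.14.5": "1.14.14.1"}  # example taken from the text
DELETED = set()                           # no deleted codes in this toy set


def resolve_ec(ec, transferred=TRANSFERRED, deleted=DELETED, max_hops=10):
    """Follow transfer links until a current EC code is reached.

    Returns the current code, or None if the code has been deleted.
    `max_hops` guards against accidental cycles in the transfer table.
    """
    for _ in range(max_hops):
        if ec in deleted:
            return None
        if ec not in transferred:
            return ec
        ec = transferred[ec]
    raise ValueError("transfer chain too long, possible cycle")


print(resolve_ec("1.14.14.5"))
```

Codes that resolve to None would still need manual review, since a deleted code gives no hint about which current entry the abstract refers to.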


Organism:
We collect rate-limiting enzymes only for the organisms in which they were reported, because a rate-limiting role cannot be directly projected from one organism to another. For example (9373878), cyclooxygenases are reported not to be rate-limiting in guinea pigs, although thousands of publications support cyclooxygenases as rate-limiting enzymes in human. For the same reason, we collect regulation information only for the reported organism and do not project it directly onto other organisms. Another example can be found in 3021448.

Gene and protein mapping:

Gene and protein mapping is performed with the KEGG LIGAND and IntEnz databases after the rate-limiting enzymes have been collected from the five model organisms. This step has two aims: to validate the existence of the reported rate-limiting enzymes, and to obtain the references mapped to rate-limiting enzyme genes in preparation for collecting regulation information from the literature.

For enzymes with no gene or protein evidence, care is taken to confirm their existence. For example, 6.3.5.5 from rat is reported to be rate-limiting, but we could not find any UniProt protein or gene for it in the IntEnz and KEGG LIGAND databases.
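The existence check can be sketched as a simple partition of the curated enzymes into those with and without mapped genes or proteins. The names below and the `(EC code, organism)` key are assumptions for illustration, and `GENE_PLACEHOLDER` is not a real identifier.

```python
def split_by_evidence(enzymes, gene_map):
    """Partition (EC code, organism) pairs by whether any gene or
    protein identifier is mapped to them.

    `gene_map` maps (EC code, organism) to a list of identifiers, as it
    might be assembled from KEGG LIGAND and IntEnz (hypothetical shape).
    """
    mapped, unmapped = [], []
    for key in enzymes:
        # A missing key and an empty list both count as "no evidence".
        (mapped if gene_map.get(key) else unmapped).append(key)
    return mapped, unmapped


# Illustrative data: the second pair mirrors the example in the text.
enzymes = [("1.1.1.34", "rat"), ("6.3.5.5", "rat")]
gene_map = {("1.1.1.34", "rat"): ["GENE_PLACEHOLDER"]}
mapped, unmapped = split_by_evidence(enzymes, gene_map)
print(unmapped)
```

Entries that end up in `unmapped` are the ones that call for the extra manual care described above.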

GeneRIFs are gene annotations from the literature provided by the staff of the National Library of Medicine's Index Section, who have advanced degrees in the life sciences. The related GeneRIF records are collected for each rate-limiting enzyme according to its Entrez Gene entry.

Regulatory information collection:

Three main types of regulatory information are collected: upstream transcription factors, phosphorylation regulation and inhibitors. An enzyme may have more than one regulation record. Each regulation record comes from a separate piece of evidence from the literature or a database.


Transcription-relevant regulatory information:

Although there are many transcription factor databases and enzyme databases, the relations between transcription factors and their target enzymes have not been systematically collected. As it has been assumed that the enzymes with the lowest velocities are regulatory, we focus on reported rate-limiting enzymes and collect their upstream transcription factors. Only experimentally validated transcription factors are recorded as regulatory information of type "transcription factor". Regulation records based on computational promoter analysis or transcription factor binding site analysis are assigned only "transcription level". "interact with TF" is assigned to interaction pairs of rate-limiting enzymes and transcription factors from high-throughput protein-protein interaction data (data sources: 16189514, 16169070, 17353931, 16429126, 11799066, 15782160, 11805837, 10688190). In practice, for each enzyme, we search Entrez using the expression (HNF4alpha or hepatocyte nuclear factor or peroxisome proliferator-activated receptor gamma or PPARgamma peroxisome proliferator-activated receptor alpha or PPARalpha or hepatocyte nuclear factor 1 alpha or HNF1alpha or TNF or myc or myb or CREBP or CEBP or CEBP or ChEBP or REST or Sp1 or Sp3 or TGFbeta or NF-kappaB or C/EBPbeta or C/EBPalpha or enhancer binding proteins beta or PGC-1alpha or Nuclear receptor or STAT1 or USF or jun). The listed transcription factors are commonly involved in the regulation of metabolic networks.
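The three labels described above depend only on the kind of evidence behind a record, so the assignment can be sketched as a lookup. The evidence keys (`experimental`, `promoter_analysis` and so on) are hypothetical names chosen for this example; only the three output labels come from the text.

```python
# Hypothetical evidence categories mapped to the labels used in the text.
EVIDENCE_TO_LABEL = {
    "experimental": "transcription factor",
    "promoter_analysis": "transcription level",
    "binding_site_analysis": "transcription level",
    "high_throughput_ppi": "interact with TF",
}


def classify_tf_evidence(evidence_type):
    """Return the regulation label for one piece of transcriptional
    evidence, or raise if the evidence category is unknown."""
    try:
        return EVIDENCE_TO_LABEL[evidence_type]
    except KeyError:
        raise ValueError(f"unknown evidence type: {evidence_type!r}") from None


print(classify_tf_evidence("promoter_analysis"))
```

Raising on an unknown category, rather than defaulting to a label, keeps unclassified evidence visible to the curator.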


phosphorylation or post-translational modification relevant regulatory information:

For phosphorylation regulation, we first collect descriptions of phosphorylation, post-translational regulation, post-translational modification or reversible covalent regulation from the published literature using the keywords "phosphorylation", "post-translational regulation", "post-translational modification" and "reversible covalent regulation". Phosphorylation information from UniProt and PhosSitedb is also integrated into our database during the annotation stage to complement the information from the literature. If an enzyme is reported to be regulated by phosphorylation, or if any of its proteins contains an experimentally validated phosphorylation site in a database, we assign "phosphorylation" to the regulation record. If only "post-translational modification" or "reversible covalent modification" is reported, we assign "post-translational modification" to the regulation record.
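The labelling rule in this paragraph can be sketched as follows. The function name and arguments are assumptions for illustration: `mentions` stands for the regulatory terms found in an abstract, and `has_validated_site` for the presence of an experimentally validated phosphorylation site in a database.

```python
# Terms that only justify the generic label (taken from the text).
GENERIC_TERMS = {
    "post-translational regulation",
    "post-translational modification",
    "reversible covalent regulation",
    "reversible covalent modification",
}


def ptm_label(mentions, has_validated_site=False):
    """Return "phosphorylation", "post-translational modification",
    or None when neither kind of evidence is present."""
    found = {m.strip().lower() for m in mentions}
    # Phosphorylation evidence takes priority over the generic terms.
    if "phosphorylation" in found or has_validated_site:
        return "phosphorylation"
    if found & GENERIC_TERMS:
        return "post-translational modification"
    return None


print(ptm_label(["Reversible covalent modification"]))
```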


inhibitor-relevant regulatory information:

Feedback and allosteric regulation are the two main short-term regulatory mechanisms. We do not search exhaustively for these two types of information, but we extract them whenever they are mentioned in the literature.

All enzyme inhibitor information was extracted from the BRENDA database (version 7.1), in which organism-specific inhibitors are recorded under a given EC code. A semi-automatic method similar to that described in the previous study was used to convert free-text inhibitor information into KEGG compound identifiers. For each enzyme, if an inhibitor description from BRENDA exactly matched a KEGG compound name, we assigned that KEGG compound to the description. We then grouped all assigned KEGG compounds by KEGG compound ID and checked all the mapping results manually. However, many man-made inhibitors, such as EDTA, cannot be produced in vivo. We therefore restricted the organism-specific inhibitor dataset to compounds that are in vivo enzyme products of each organism. Some inhibitors that are enzyme products inhibit only other proteins, not metabolic enzymes; we excluded these from the final dataset as well, because they have no inhibitory effect in the metabolic network.
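The automatic part of this procedure (exact name matching, grouping by compound ID and the in vivo product filter) can be sketched as below. All names and input shapes are assumptions for illustration; the match here ignores case and surrounding whitespace, which may be looser than the original method; `C_EDTA` is a placeholder rather than a real KEGG identifier; and the manual check and the exclusion of inhibitors acting only on non-metabolic proteins are not modelled.

```python
from collections import defaultdict


def map_inhibitors(inhibitors, kegg_names, in_vivo_products):
    """Map free-text inhibitor names to KEGG compound IDs.

    inhibitors       -- list of (EC code, inhibitor text) pairs
    kegg_names       -- dict: lower-cased compound name -> compound ID
    in_vivo_products -- set of compound IDs produced by the organism's
                        own enzymes
    Returns a dict: compound ID -> sorted list of inhibited EC codes.
    """
    grouped = defaultdict(set)
    for ec, text in inhibitors:
        compound_id = kegg_names.get(text.strip().lower())
        if compound_id is None:
            continue  # no exact name match; left for manual curation
        if compound_id not in in_vivo_products:
            continue  # e.g. man-made inhibitors not produced in vivo
        grouped[compound_id].add(ec)
    return {cid: sorted(ecs) for cid, ecs in grouped.items()}


# Toy data for one enzyme (EC 2.7.1.11, 6-phosphofructokinase).
inhibitors = [("2.7.1.11", "ATP"), ("2.7.1.11", "citrate"), ("2.7.1.11", "EDTA")]
kegg_names = {"atp": "C00002", "citrate": "C00158", "edta": "C_EDTA"}
in_vivo_products = {"C00002", "C00158"}
result = map_inhibitors(inhibitors, kegg_names, in_vivo_products)
print(result)
```

Grouping by compound ID before the manual check means each compound is reviewed once, however many enzymes it inhibits.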


other regulatory information:

"Key enzyme" and "regulatory enzyme" are concepts closely related to "rate-limiting enzyme"; all three describe essential enzymes in a metabolic pathway. Weber (1974) introduced the term "key enzyme" for pathways and listed several features of such enzymes, including low activity, a rate-limiting role, catalysis of irreversible reactions, and allosteric regulation by inhibitors or feedback. The concept of "regulatory enzyme" emphasizes the influence of effectors. Although neither concept has an exact definition, descriptions that use them provide regulatory information in certain cases.

Although direct regulators are difficult to discover, upstream signal transduction pathways are often described in studies of enzyme regulation; we record such information as "signal pathway".


For each enzyme, regulatory information is first extracted from the GeneRIF annotation. Extensive searches for additional upstream transcription factors and reversible phosphorylation regulation are then performed in Entrez as described above.

For every abstract obtained from GeneRIF or Entrez searches, we first confirm the enzyme and organism information. Rigorous control of enzyme name and organism information during curation ensures that the regulatory information is precise.

After the enzyme information has been confirmed, the article is read to extract the following classes of information:

  1. "transcription factor": upstream transcription factors that directly regulate the enzyme

  2. "transcriptional level": descriptions such as promoter site analysis

  3. "Phosphorylation": phosphorylation-relevant descriptions

  4. "post-translational modification": descriptions relevant to post-translational modification or reversible covalent modification

  5. "inhibitor": inhibitors from BRENDA

  6. "allosteric": descriptions of allosteric regulation

  7. "feedback": descriptions of feedback regulation

  8. "regulatory enzyme": descriptions of the enzyme as a regulatory enzyme

  9. "key enzyme": descriptions of the enzyme as a key enzyme in a certain pathway

  10. "signal pathway": upstream signal transduction pathways

  11. "epigenetic": descriptions of epigenetic-level regulation

  12. "others": other regulatory information
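One way to represent these twelve classes in code is an enumeration plus a small record type, as in the sketch below. The class and field names are assumptions for illustration and do not describe the actual RLEdb schema; the label strings are copied from the list above, and the sample record uses placeholder values rather than real curated data.

```python
from dataclasses import dataclass
from enum import Enum


class RegulationType(Enum):
    TRANSCRIPTION_FACTOR = "transcription factor"
    TRANSCRIPTIONAL_LEVEL = "transcriptional level"
    PHOSPHORYLATION = "Phosphorylation"
    POST_TRANSLATIONAL_MODIFICATION = "post-translational modification"
    INHIBITOR = "inhibitor"
    ALLOSTERIC = "allosteric"
    FEEDBACK = "feedback"
    REGULATORY_ENZYME = "regulatory enzyme"
    KEY_ENZYME = "key enzyme"
    SIGNAL_PATHWAY = "signal pathway"
    EPIGENETIC = "epigenetic"
    OTHERS = "others"


@dataclass(frozen=True)
class RegulationRecord:
    """One regulation record; each record rests on a single piece of
    evidence (a literature reference or a database)."""
    ec: str
    organism: str
    regulation_type: RegulationType
    evidence: str          # e.g. a PubMed ID or a database name
    description: str = ""


# Placeholder record, not real curated data.
record = RegulationRecord(
    ec="2.7.1.11",
    organism="human",
    regulation_type=RegulationType("inhibitor"),
    evidence="BRENDA",
)
print(record.regulation_type.value)
```

Constructing the type from its label, as in `RegulationType("inhibitor")`, rejects any label outside the twelve classes at load time.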

Automatic annotation and update from IntEnz, KEGG, UniProt, Entrez Gene and other resources:

We will maintain and update RLEdb regularly as more data and information become available. In addition, an automatic pipeline for database annotation and updating has been constructed to integrate a pathway-centric set of databases, including IntEnz and KEGG LIGAND, together with UniProt, Entrez Gene, NCBI Taxonomy and the Gene Ontology.

After obtaining the enzyme codes of all rate-limiting enzymes, we first annotate the enzymes at the enzyme level with the IntEnz and KEGG LIGAND databases, and then extract their corresponding genes and proteins. Following the cross-links to Entrez Gene and UniProt, we extract gene- and protein-level annotation, including KEGG pathway maps, tissue expression, subcellular localization and chromosome number. Automating the whole process lets us focus more on the curation of rate-limiting enzymes.

At this stage we collect data for the five model organisms first. In the future, we plan to extend our data to other organisms and integrate them. For example, many rate-limiting enzymes play the same role in all mammals, which offers a good means of cross-organism validation.

In the current database, we collect data in a semi-supervised way and have not automated everything from the start. In the future, we plan to add some text-mining results.



  Copyright 2009, Center for Bioinformatics 
  Last Modified: 2009-03-24  
  Design by Zhao Min