About the Gadgets & Data

Gadgets

Name	Entity Linker
Description	The Entity Linker gadget performs the natural language processing task of Entity Linking on biomedical text. Entity Linking connects keywords found in a text to their corresponding entries in a knowledge base. Given English text as input (e.g., the abstract of an academic paper), the Entity Linker outputs the keywords it finds in the text in the form in which they are registered in the knowledge base.
Reference	http://prm-ezcatdb.cbrc.jp/entity_linking/
Institute	National Institute of Advanced Industrial Science and Technology (AIST)
Contributors	Masami Ikeda & Hiroya Takamura
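
The gadget's own linking method is not described here. As a rough illustration of the Entity Linking task described above, the sketch below links mentions to canonical entries using plain dictionary lookup. The knowledge base `KNOWLEDGE_BASE` and the function `link_entities` are hypothetical. A real knowledge base is far larger, and practical linkers also use context to resolve ambiguous mentions.

```python
import re

# Hypothetical toy knowledge base: each canonical entry lists surface forms
# (synonyms, abbreviations) that may appear in free text.
KNOWLEDGE_BASE = {
    "Acetaminophen": ["acetaminophen", "paracetamol"],
    "Lung adenocarcinoma": ["lung adenocarcinoma", "luad"],
}


def build_index(kb):
    """Map each lower-cased surface form to its canonical knowledge-base entry."""
    return {alias.lower(): canonical for canonical, aliases in kb.items() for alias in aliases}


def link_entities(text, kb=KNOWLEDGE_BASE):
    """Return (mention, canonical entry) pairs in order of appearance in the text."""
    index = build_index(kb)
    # Try longer aliases first so a multi-word name wins over a shorter overlapping alias.
    aliases = sorted(index, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b", re.IGNORECASE)
    return [(m.group(0), index[m.group(0).lower()]) for m in pattern.finditer(text)]
```

For example, `link_entities("Paracetamol overdose was studied in LUAD patients.")` returns `[("Paracetamol", "Acetaminophen"), ("LUAD", "Lung adenocarcinoma")]`: each mention is reported in the form registered in the knowledge base.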

Name	NamedEntityRecognizer
Description	Named Entity Recognizer is a gadget that performs the natural language processing task of Named Entity Recognition (NER) on scientific literature. NER extracts and classifies keywords found in texts, such as disease names, cell names, and pharmacological substances. Given English text as input, the gadget finds keywords that match one or more of 37 predefined categories (including names of diseases, cells, pharmacological substances, and other proper nouns relevant to drug discovery).
Reference	http://prm-ezcatdb.cbrc.jp/named_entity_recognition/
Institute	National Institute of Advanced Industrial Science and Technology (AIST)
Contributors	Masami Ikeda & Hiroya Takamura
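
To show the input and output shape of NER, here is a minimal sketch that assumes a hypothetical gazetteer (term list) with three illustrative types. It does not reproduce the gadget's 37 categories or its recognition method. Each match is returned with character offsets and a type label.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the mention as written in the input
    label: str  # entity type


# Hypothetical gazetteer with three illustrative types (not the gadget's 37 categories).
GAZETTEER = {
    "lung squamous cell carcinoma": "Disease",
    "hepatocyte": "Cell",
    "acetaminophen": "Substance",
}


def recognize(text, gazetteer=GAZETTEER):
    """Return non-overlapping, case-insensitive gazetteer matches from left to right."""
    lookup = {term.lower(): label for term, label in gazetteer.items()}
    terms = sorted(lookup, key=len, reverse=True)  # prefer the longest term at each position
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return [
        Entity(m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Unlike entity linking, NER stops at locating and typing each mention. It does not map the mention to a knowledge-base entry.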

Name	JaMIE
Description	Relation extraction is the task of extracting semantic relations between keywords in a text. Given Japanese medical text as input (e.g., findings from a CT image reading report), this gadget outputs the relations between keywords in the text and their associated keywords in the knowledge base.
Reference	https://github.com/racerandom/JaMIE
Institute	Kyoto University
Contributors	Fei Cheng & Sadao Kurohashi

Name	Semantic Search
Description	This is a document retrieval system that, given a medical document such as an electronic medical record or radiological findings, presents similar documents. The search target is a collection of medical documents annotated by the PRISM project. An example application is searching for similar past cases within a hospital.
Reference	https://aoi.naist.jp/prism-search/
Institute	Nara Institute of Science and Technology (NAIST)
Contributors
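
The retrieval model behind the Semantic Search gadget is not described here. As a stand-in, the sketch below ranks documents by cosine similarity of TF-IDF vectors, a classic bag-of-words baseline. The example documents and function names are hypothetical, and a modern system would more likely compare learned embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + f)) + 1.0 for t, f in df.items()}  # smoothed IDF
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts], idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, docs, top_k=3):
    """Return indices of the top_k documents most similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Terms never seen in the collection carry no IDF weight and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:top_k]
```

The smoothed IDF, `log((1 + n) / (1 + df)) + 1`, keeps terms that occur in every document from receiving zero weight.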

Name	HeaRT
Description	Given medical documents such as electronic medical records and findings, this gadget creates a Gantt-like chart that presents their information in chronological order. This can help health professionals share information.
Reference	https://aoi.naist.jp/prism-heart/
Institute	Nara Institute of Science and Technology (NAIST)
Contributors

Name	SFM, bST
Description	Space-efficient feature maps for string alignment kernels (SFMEDM) takes a set of input strings and outputs a set of feature vectors. A support vector machine (SVM) trained on these feature vectors can then perform predictive tasks such as string classification and regression. One example application is prediction tasks that use amino acid sequences as training data. Because the strings are mapped into a feature space that approximates a nonlinear string alignment kernel, prediction with SFMEDM can be highly accurate while remaining memory efficient.
Reference	https://github.com/tb-yasu/SFMEDM https://github.com/kampersanda/integer_sketch_search
Institute	RIKEN
Contributors	Yasuo Tabei
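
SFMEDM's own feature maps are specific to string alignment kernels (see the repository above). The general idea is to replace a nonlinear kernel with explicit feature vectors so that a linear model, such as a linear SVM, can be trained on them. The sketch below illustrates this with random Fourier features for a Gaussian kernel on numeric vectors. This is a simpler, swapped-in technique, not SFMEDM itself. `make_rff`, `dot`, and their parameters are hypothetical names.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Build a random Fourier feature map z with z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # For this Gaussian kernel, frequencies are drawn from N(0, 2 * gamma) per coordinate.
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        """Map a dim-dimensional vector to an n_features-dimensional feature vector."""
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def dot(u, v):
    """Plain inner product; on mapped vectors it approximates the kernel value."""
    return sum(a * b for a, b in zip(u, v))
```

With enough random features, `dot(phi(x), phi(y))` closely approximates the exact kernel value. A linear classifier trained on `phi(x)` therefore approximates a kernel SVM while working only with fixed-length feature vectors.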

Name	kGCN Network Prediction
Description	Graph convolutional neural networks (GCNs) take the structure of small-molecule compounds as graph input and have been reported to perform well on many types of prediction tasks. kGCN is an open-source, GCN-based gadget that provides the preprocessing needed to build prediction models. It also offers Bayesian optimization for model tuning and atom-level visualization to aid interpretation of the results. Given a dataset, a set of nodes, and a trained model, this gadget predicts and outputs new links that may exist between the nodes.
Reference	https://github.com/clinfo/kGCN
Institute	Kyoto University
Contributors	Ryosuke Kojima & Yasushi Okuno
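
kGCN's link prediction model is documented in its repository and is not reproduced here. As a generic illustration, the sketch below applies one graph-convolution step with the common symmetric normalization D^-1/2 (A + I) D^-1/2. It then scores a candidate link with an inner-product decoder, sigmoid(h_i · h_j). The toy graph and the untrained identity weights are hypothetical, and a real model learns its weight matrix from data.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the added self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in z]


def link_score(emb, i, j):
    """Inner-product decoder: sigmoid(h_i . h_j), read as the probability of an edge i-j."""
    d = sum(a * b for a, b in zip(emb[i], emb[j]))
    return 1.0 / (1.0 + math.exp(-d))
```

On the path graph 0-1-2-3 with identity features and weights, the existing neighbor pair (0, 1) scores higher than (0, 2), which scores higher than the distant pair (0, 3).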

Name	Molenc
Description	One approach to using information about a compound in machine learning is to generate fingerprints: vectors that indicate how many times specific substructures occur in the compound. Fingerprints can be generated in various ways; this gadget generates Signature Molecular Descriptors (SMDs), originally published by J.L. Faulon et al. in 2003. Given a list of SMILES strings for the compounds of interest, the gadget generates and outputs a correspondence table between features (substructures) and SMD fingerprints. To use a preexisting correspondence table, tick the "encoding.dix" checkbox and press Run. If you do not have a correspondence table, make sure that this checkbox is not ticked before pressing Run.
Reference	https://github.com/UnixJunkie/molenc
Institute	Kyushu Institute of Technology
Contributors	Francois Berenger & Yoshihiro Yamanishi
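
Computing Signature Molecular Descriptors from SMILES requires a cheminformatics toolkit. The sketch below therefore starts from already-extracted substructure features and shows only the role of the correspondence table: assigning each feature an index and counting its occurrences per compound. The feature strings, compound names, and function names are hypothetical placeholders. The sketch does not reproduce molenc's encoding.dix file format.

```python
from collections import Counter


def build_table(compounds):
    """Assign each distinct substructure feature an integer index, in order of first appearance."""
    table = {}
    for features in compounds.values():
        for feature in features:
            table.setdefault(feature, len(table))
    return table


def encode(compounds, table):
    """Return sparse count fingerprints {feature index: count} per compound.

    Features that are missing from the table are skipped.
    """
    fingerprints = {}
    for name, features in compounds.items():
        counts = Counter(table[f] for f in features if f in table)
        fingerprints[name] = dict(sorted(counts.items()))
    return fingerprints
```

Reusing an existing table keeps feature indices consistent across datasets, which is the purpose of the "encoding.dix" case above. In this sketch, features absent from a reused table are simply skipped.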

Name	Vanishing Ranking Kernels
Description	This gadget performs ligand-based virtual screening. It learns a classification model of activity strength from a set of compounds and uses it to predict the activity of compounds whose activity is unknown, based on vanishing kernels and the intermolecular Tanimoto coefficient. The resulting model defines an applicability domain (AD) for the activity, which is used to improve screening efficiency. The input file contains the features (descriptors) of each compound rather than its structure. For details, see https://github.com/UnixJunkie/rankers.
Reference	https://github.com/UnixJunkie/rankers
Institute	Kyushu Institute of Technology
Contributors	Francois Berenger & Yoshihiro Yamanishi
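
The exact vanishing kernels and applicability-domain procedure are defined in the rankers repository. This simplified sketch only illustrates the two ingredients named above. The first is the Tanimoto coefficient between binary fingerprints, represented as sets of feature IDs. The second is a kernel that becomes exactly zero beyond a cutoff. The triangular kernel, the cutoff value, and the function names are assumptions chosen for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of feature IDs (1.0 by convention if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def triangular_kernel(distance, cutoff):
    """A kernel that decreases linearly and vanishes (is exactly 0) at and beyond the cutoff."""
    return max(0.0, 1.0 - distance / cutoff)


def score(query, actives, cutoff=0.5):
    """Sum kernel contributions from known actives, using 1 - Tanimoto as the distance."""
    return sum(triangular_kernel(1.0 - tanimoto(query, act), cutoff) for act in actives)
```

A query whose distance to every known active exceeds the cutoff scores exactly 0. This loosely mirrors how a vanishing kernel restricts predictions to a region near the training data.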

Name	Modified Diet Networks
Description	With ultra-high-dimensional data (n ≪ p, i.e., far fewer samples than features) such as genomic data, overfitting is difficult to avoid even with regularization and similar methods. Diet Networks is a deep learning method designed for training on high-dimensional data, and Modified Diet Networks is an improved version that provides stable and accurate predictions. This gadget includes a pre-trained Modified Diet Networks model that classifies lung cancer patients as having lung adenocarcinoma (LUAD) or lung squamous cell carcinoma (LUSC) based on their somatic mutation profiles. Each patient is provided as a vector of somatic mutation counts per gene (multiple patients can be entered as a matrix), and the model predicts whether each patient has LUAD or LUSC.
Reference	https://www.mdpi.com/2218-273X/10/9/1249
Institute	National Cancer Center
Contributors	Ken Asada & Ryuji Hamamoto
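
The gadget's exact input file format and gene list are not specified here. The sketch below assumes one row per patient and one column per gene. It shows how per-mutation records could be aggregated into the count matrix described above. The function name, the record format, and the gene order are hypothetical.

```python
from collections import Counter


def mutation_count_matrix(records, genes):
    """Build a patients x genes matrix of somatic mutation counts.

    records: iterable of (patient_id, gene_symbol) pairs, one per observed mutation.
    genes: the fixed gene (column) order the model expects; other genes are ignored.
    """
    counts = Counter(records)  # (patient, gene) -> number of mutations
    patients = sorted({patient for patient, _ in counts})
    matrix = [[counts[(p, g)] for g in genes] for p in patients]
    return patients, matrix
```

The column order must match the order used when the model was trained. Genes outside that list are ignored in this sketch.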

Name	Multiomics Analyzer
Description	In recent years, multi-omics data analysis has attracted attention for various applications, but its methodology has not yet been fully established. One of the biggest challenges in omics data analysis is how to handle high-dimensional data. The Multi-omics Analyzer is equipped with models created by applying unsupervised deep learning methods to miRNA and mRNA data acquired from lung cancer patients in The Cancer Genome Atlas (TCGA). When miRNA/mRNA data is input, it outputs feature vectors with reduced dimensionality. The feature vectors obtained from the Multi-omics Analyzer can be used for prediction tasks such as classification and regression.
Reference	https://www.mdpi.com/2218-273X/10/4/524/htm
Institute	National Cancer Center
Contributors	Kazuma Kobayashi & Ryuji Hamamoto

Name	Subset Binder
Description	This gadget uses an algorithm called subset binding to find groups of items that are linked across two datasets. For example, given medical information and omics data, the gadget outputs a group of molecules that fluctuate together, paired with a corresponding group of medical information items that change in conjunction with them. For operation checks, Miné includes two sample datasets, hepatotoxicity phenotype data and gene expression profiles, that reflect hepatotoxicity in rats administered high concentrations of acetaminophen. Because subset binding is based on association rule mining, its parameters are generally the same as those used in association rule mining.
Reference	https://www.researchsquare.com/article/rs-405195/latest.pdf
Institute	National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN) & RIKEN
Contributors	Yayoi Natsume-Kitatani & Naonori Ueda
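
Subset binding itself is described in the referenced preprint. Because its parameters largely follow association rule mining, the sketch below shows two standard measures from that field, support and confidence, computed over transactions represented as sets of items. Whether subset binding uses exactly these measures is an assumption. The item labels and function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A ∪ C) / support(A)."""
    s_a = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s_a if s_a else 0.0
```

In the test data, the rule {geneA_up, geneB_up} → {liver_injury} has support 0.5 and confidence 1.0. Every sample in which both genes are up also shows liver injury.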

Name	RPPA
Description	This gadget is a prognostic system for lung squamous cell carcinoma and lung adenocarcinoma based on a deep autoencoder. It can predict prognosis using reverse phase protein array (RPPA) data alone or using six types of omics data (RNA sequencing data, miRNA sequencing data, DNA methylation data, copy number variation data, somatic mutation data from DNA sequencing, and RPPA data).
Reference	https://www.mdpi.com/2218-273X/10/10/1460
Institute	National Cancer Center
Contributors	Ken Asada, Ryuji Hamamoto

Name	PathoGN
Description	Given mutation information and gene relationship networks, this gadget presents information such as the pathogenicity of each mutation and the predicted relevant genes in the biomolecular network. It is a novel method implemented as an extension of kGCN.
Reference
Institute	Kyoto University
Contributors	Ryosuke Kojima, Yasushi Okuno

Name	INGOR
Description	INGOR is an implementation of a Bayesian network estimation algorithm. When provided with measured biomolecular profiles, it creates a causal network between biomolecules such as genes and proteins. This gadget can be used to elucidate intermolecular regulatory causal mechanisms and to search for new drug target candidates.
Reference	https://ytlab.jp/clinfo/ingor/tutorialja.html
Institute	Kyoto University
Contributors	Yoshinori Tamada, Yasushi Okuno

Name	INGOR ECv
Description	Given a measured biomolecular profile and a Bayesian network estimated with INGOR, INGOR ECv outputs the edge contribution value (ECv) for each edge in the network, along with the partial network extracted using these values. This tool is for academic use only.
Reference	https://doi.org/10.1038/s41598-021-02394-w
Institute	Kyoto University
Contributors	Nakazawa, M.A., Tamada, Y., Tanaka, Y., Ikeguchi, M., Higashihara, K., Okuno, Y.

Name	INGOR RC
Description	Given measured biomolecular profiles and a Bayesian network estimated by INGOR, INGOR RC outputs the relative contribution value (RC) of each edge, which is needed to visualize the network for each individual sample. This tool is for academic use only.
Reference	https://doi.org/10.1038/s41598-021-90556-1
Institute	Kyoto University
Contributors	Tanaka, Y., Higashihara, K., Nakazawa, M.A., Yamashita, F., Tamada, Y., Okuno, Y.

Name	INGOR Network
Description	Given measured biomolecular profiles, this gadget uses Bayesian network estimation to infer causal networks among biomolecules such as genes and proteins, computes ECv, and stratifies (groups) samples based on these results. It can be used to elucidate intermolecular regulatory causal mechanisms, search for novel drug target candidates, and stratify patients. This tool is for academic use only.
Reference
Institute	Kyoto University
Contributors	Yoshinori Tamada, Yasushi Okuno

Name	TargetMine
Description	TargetMine is a data warehouse that integrates more than 30 widely used public data sources to support early-stage drug discovery research, especially target discovery, by enabling efficient knowledge discovery. It covers a wide range of data, from genes, proteins, and pathways to 3D structures and interactions with compounds. At present, TargetMine's data focuses primarily on the organisms most studied in drug discovery: human, rat, and mouse.
Reference	https://doi.org/10.1093/bioinformatics/btac507
Institute	National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Contributors	Yi-An Chen, Kenji Mizuguchi

Data

There is no data available for release.