DataServicesOverview

CVRG Data Services Overview

The current CVRG data services were developed to meet the needs of the initial Driving Biomedical Project, the D. W. Reynolds study of Sudden Cardiac Death in the setting of coronary artery disease (see http://www.reynolds.jhmi.edu/index.html). Each service is described below, with links to its installation guidelines.


Waveform 3

WaveformECG is a web-based tool for managing and analyzing ECG data. Users can browse their files and upload ECG data in GE MUSE XML (versions 7+), Philips ECG XML (versions 1.03+), and WaveForm DataBase (WFDB) formats. WaveformECG extracts and stores information from GE and Philips file headers, including analysis results. Digital ECG data is extracted and stored as a time series. Users can select, view, and scroll through individual digital ECG waveforms and lead signals, formatted to look like paper chart recordings. Points and time intervals can be annotated using ontology terms from the BioPortal ontology server, and the annotations are stored with the waveforms. Users can select groups of ECGs for analysis; these ECGs are then passed to user-selected analysis algorithms. Analyses can be distributed across multiple CPUs to decrease processing time. Analysis results can be viewed online or downloaded in CSV format. With the wiki pages listed below, developers can configure a working instance of Waveform 3 in a Linux environment and set up an Amazon EC2 Cloud implementation.

CVRG Waveform Development Environment Configuration

CVRG Waveform Server Configuration

CVRG Waveform Amazon EC2 Cloud Implementation

CVRG Waveform Developer Documentation


Electrophysiology Data Discovery Index

The purpose of the Electrophysiology Data Discovery Index (EDDI) is to provide cardiac electrophysiologists with a repository to share their data. Through the use of DSpace and Globus, EDDI allows for the annotation, publication and discovery of electrophysiology time-series data. The page listed below has detailed information on deploying an instance of DSpace with EDDI-specific modifications.

CVRG EDDI Server Configuration


OpenTSDB Standalone

OpenTSDB is a time series database which provides a mechanism for storing time series data such as EKGs. The database is built on an HBase/Hadoop 2 platform and is configured to run on 3+ nodes in a cluster.
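As a rough illustration of how EKG samples might be shaped for storage in OpenTSDB, the sketch below converts one lead of samples into the JSON data points accepted by the OpenTSDB 2.x HTTP API endpoint `/api/put`. The metric name `ecg.uv`, the tag names `lead` and `record`, and the helper function itself are hypothetical and are not taken from the CVRG configuration; the CVRG deployment may organize its metrics and tags differently.

```python
import json


def ecg_to_opentsdb_points(samples, start_ms, sample_rate_hz, lead, record_id,
                           metric="ecg.uv"):
    """Build OpenTSDB data points for one ECG lead.

    The metric and tag names are illustrative assumptions, not CVRG's schema.
    """
    points = []
    for i, value in enumerate(samples):
        # Millisecond timestamp of sample i, derived from the sampling rate.
        timestamp_ms = start_ms + round(i * 1000 / sample_rate_hz)
        points.append({
            "metric": metric,
            "timestamp": timestamp_ms,
            "value": value,
            "tags": {"lead": lead, "record": record_id},
        })
    return points


# Four samples of lead II recorded at 500 Hz (one sample every 2 ms).
points = ecg_to_opentsdb_points(
    [12, 15, 11, -4],
    start_ms=1400000000000,
    sample_rate_hz=500,
    lead="II",
    record_id="demo001",
)

# JSON array that could be sent as the body of a POST to /api/put.
payload = json.dumps(points)
```

Note that at sampling rates whose period is not a whole number of milliseconds, rounding can map two samples to the same timestamp, so a real loader would need a strategy for that case.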

Installing and Configuring an OpenTSDB Standalone Client

OpenTSDB Cluster

OpenTSDB is a time series database which provides a mechanism for storing time series data such as EKGs. The database is built on an HBase/Hadoop 2 platform and is configured to run on 3+ nodes in a cluster.

Installing and Configuring an OpenTSDB Cluster


SNP Data Service

The SNP Data Service provides grid-enabled access to SNP data. The service uses a MySQL version 5.0.22 database and a data model developed in caCORE 3.2.1. The data model is based upon information captured by the JHU Institute for Genetic Medicine, on behalf of the Reynolds investigators. The data model associates subject information with biomarkers derived from primary SNP calls stored in SNP arrays.

Installation and Configuration Instructions for SNP Data Services


Protein DB Data Service

Protein DB Version 2.0 is a full-featured proteomics data service based on the PSI-OM. Protein DB has been harmonized with the caBIG caDSR and can be viewed in the NCI UML Model Browser. A data model is available on the JHU Proteomics Center site. Source code and system configuration information are available on the CCBM Protein DB page.


DICOM Image Data Service

The DICOM image data service has been instantiated at JHU. It leverages an Open-Source Clinical Image Management system (DCM4CHE). Access to the system is currently available through the use of virtualPACS.

CVRG DICOM Data Service Installation Instructions

VirtualPACS Installation Instructions


DCM4CHEE - Clinical Image and Object Management

dcm4che is a collection of open source applications and utilities for the healthcare enterprise. These applications have been developed in the Java programming language for performance and portability, supporting deployment on JDK 1.5 and up.

At the core of the dcm4che project is a robust implementation of the DICOM standard. The dcm4che-1.x DICOM toolkit is used in many production applications across the world, while the current (2.x) version of the toolkit has been re-architected for high performance and flexibility.

Also contained within the dcm4che project is dcm4chee (the extra 'e' stands for 'enterprise'). dcm4chee is an Image Manager/Image Archive (according to IHE). The application contains the DICOM, HL7 services and interfaces that are required to provide storage, retrieval, and workflow to a healthcare environment. dcm4chee is pre-packaged and deployed within the JBoss application server. By taking advantage of many JBoss features (JMS, EJB, Servlet Engine, etc.), and assuming the role of several IHE actors for the sake of interoperability, the application provides many robust and scalable services.

For general information, go to http://www.dcm4che.org.

CVRG DCM4CHEE Installation Instructions


ECG HL7aECG Data Service

In 2004, the Food & Drug Administration announced its intent to accept annotated ECG waveform data in XML following the Health Level 7 (HL7) Annotated ECG Waveform Data Standard (aECG). The HL7aECG data service leverages the XML Data Service, developed jointly by the caBIG In Vivo Imaging workspace and the CVRG. The HL7aECG files are stored in an open-source, embeddable Oracle Berkeley DB XML database.

ECG HL7aECG Data Service Installation Instructions


ECG Physionet Data Service

PhysioNet was established in 1999 as the outreach component of the Research Resource for Complex Physiologic Signals. Its PhysioBank is a large and growing archive of well-characterized digital recordings of physiologic signals and related data for use by the biomedical research community. The Physionet data service leverages the XML Data Service, developed jointly by the caBIG In Vivo Imaging workspace and the CVRG. Through the XML Data Service, metadata about the WFDB files are stored in an open-source embeddable Oracle Berkeley DB XML database. The WFDB files themselves are stored in a referenced location on the CVRG file system.
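To give a sense of the kind of metadata that can be read from a WFDB record, the sketch below parses the record line of a WFDB header (`.hea`) file into a small dictionary. It is a minimal illustration that handles only the common single-segment layout (record name, number of signals, sampling frequency, number of samples); the function name and the returned keys are hypothetical, and this is not the code used by the CVRG data service.

```python
def parse_wfdb_record_line(header_text):
    """Extract basic metadata from the record line of a WFDB header.

    Handles only the simple layout:
        record_name n_signals [sampling_freq [n_samples ...]]
    """
    for line in header_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines, which start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        return {
            "record_name": fields[0].split("/")[0],
            "n_signals": int(fields[1]),
            # WFDB assumes 250 Hz when the sampling frequency is omitted.
            "sampling_hz": (float(fields[2].split("/")[0])
                            if len(fields) > 2 else 250.0),
            "n_samples": int(fields[3]) if len(fields) > 3 else None,
        }
    raise ValueError("no record line found in header")


# Sample header text used only for illustration.
sample_header = """# example header
100 2 360 650000
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
"""

info = parse_wfdb_record_line(sample_header)
```

A dictionary like `info` is the sort of record-level metadata that could be stored separately from the signal files, which remain in their referenced location on the file system.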

WFDB Data Service Installation Instructions


OpenClinica Data Service

The OpenClinica Data Service provides grid-enabled, secure access to clinical information. The service consists of three main components:

  1. A PostgreSQL version 8.1.11 database as the backend system. This database is built using the OpenClinica relational database schema.
  2. A server application generated with caCORE SDK 4.0. The object model for this application was developed in UML using the Enterprise Architect system. Version 1.1 of the object model allows queries against Study and Subject information.
  3. A caGrid-compatible data service. This service allows grid-enabled, secure access to data serviced through the object model.

The new version (ver 1.2) of the OpenClinica Data Service supports querying of Case Report Form information as well as Study and Subject information. More detailed instructions for building and deploying the OpenClinica Data Service with the new object model can be found on the page listed below.

How to Build and Deploy OpenClinica Data Service.


caArray Data Service

caArray is an open-source, web- and programmatically accessible mRNA array data management system developed as part of the cancer Biomedical Informatics Grid (caBIG). caArray supports the annotation and exchange of mRNA array data as part of a grid. caArray is not currently deployed on the CVRG. A cross-registration of caBIG and CVRG services is underway, in which the CVRG will leverage caBIG's caArray and caBIG will leverage the CVRG's ProteinDB and ECG services.

CVRG caBIG Collaboration Page on caBIG Knowledge Center Website


Berger Algorithm Parameter Data Service Information

The Berger Algorithm Parameter Data Service is a persistent storage mechanism for the parameter files used by the Berger Algorithm, which requires a parameter file for each ECG file that it processes. The Berger Algorithm was developed by Dr. Ron Berger at Johns Hopkins University. See Berger, R.D., Kasper, E.K., Baughman, K.L., Marban, E., Calkins, H., Tomaselli, G.F. (1997). Beat-to-beat QT interval variability: novel evidence for repolarization lability in ischemic and nonischemic dilated cardiomyopathy. Circulation. 96(5): 1557-1565, and Berger, R.D. (2003). QT Variability. J. Electrocardiol. 36: 83-87.

Berger Algorithm Parameter Data Service Installation Instructions


Berger Algorithm Results Data Service Information

The Berger Algorithm Results Data Service is a persistent storage mechanism for the output files produced by the Berger Algorithm.

Berger Algorithm Results Data Service Installation Instructions


Physionet QT Algorithm Results Data Service Information

The Physionet QT Algorithm Results Data Service is a persistent storage mechanism for the output files produced by the Physionet QT Algorithm.

Physionet QT Algorithm Results Data Service Installation Instructions


General Purpose XML Data Service

A general purpose XML data service has been developed as part of the CVRG Project. Further development of this service is now a caGrid Community Project.

XML Data Service Documentation is available here.
