IT Requirements
1. Data Ingestion
Data that is intended to be loaded into the BioGrid system will generally be loaded onto a Local Research Repository (LRR). This is a computer that resides at your site, or is shared with another BioGrid site.
A copy of your database is created on the LRR by the BioGrid team in a standardised format, and a regular automatic process is set up to copy data from your database to the LRR. This data will contain patient identifying details as well as clinical information. However, the LRR remains behind your site’s firewall and under the control of your site’s security arrangements.
BioGrid must assign a Unique Subject Identifier (USI) to each patient. This will either be an existing USI if your patient already exists elsewhere in the BioGrid platform (possibly at another site), or a new USI if the patient does not yet have any information in BioGrid.
To make this determination, an automated process normally copies only the identifying information for each patient to the central BioGrid demographic repository, also known as the linkage key server. There it attempts to match each patient’s identifying details to those of other patients already in BioGrid. Approval for this copying of your patients’ identifying details, and for their use in the matching process, will have been obtained as part of the legal review and ethics application at your site when you joined BioGrid.
This process is known as probabilistic matching, and results in either a new USI or an existing USI being written back in encrypted form alongside your data on your site’s LRR. It is this number that is used to link patient data between databases at different sites: a researcher will only see the USI and no identifying details for a patient.
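As a rough illustration of the idea only, the sketch below scores a candidate patient against patients already known and reuses an existing USI when the score passes a threshold. The field names, weights, threshold and USI format are all hypothetical and the sample records used with it are invented; this is not BioGrid's actual matching algorithm.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical agreement weights per field; a real system would estimate these.
WEIGHTS = {"surname": 4.0, "given_name": 2.0, "dob": 5.0, "postcode": 1.0}
MATCH_THRESHOLD = 9.0  # hypothetical cut-off for declaring a match


@dataclass
class Patient:
    surname: str
    given_name: str
    dob: str          # ISO date string, e.g. "1980-02-14"
    postcode: str
    usi: Optional[str] = None


def normalise(value: str) -> str:
    """Remove trivial formatting differences before comparing."""
    return value.strip().lower()


def match_score(a: Patient, b: Patient) -> float:
    """Sum the weights of every field on which the two records agree."""
    return sum(
        weight
        for field_name, weight in WEIGHTS.items()
        if normalise(getattr(a, field_name)) == normalise(getattr(b, field_name))
    )


def assign_usi(candidate: Patient, known: List[Patient]) -> str:
    """Return the USI of the best match, or a new USI if nothing scores high enough."""
    best = max(known, key=lambda p: match_score(candidate, p), default=None)
    if best is not None and best.usi and match_score(candidate, best) >= MATCH_THRESHOLD:
        return best.usi
    return f"USI-{uuid.uuid4().hex[:12]}"  # hypothetical format for a new USI
```

With these weights, a record that agrees on surname, given name and date of birth but has a different postcode scores 11 and is treated as an existing patient, while a record that agrees on postcode alone scores 1 and receives a new USI.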
There is a second method of matching patients that can be used where a site’s ethics requirements do not allow the release of identifying details as described above. This is known as hashing, and involves the use of a non-reversible algorithm to calculate a hash value for each of your patients and for each patient already in BioGrid. The hashes are then compared to determine whether each patient is new or existing, and a new or existing USI is again assigned.
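The hashing approach might look something like the sketch below. It assumes a keyed SHA-256 hash (HMAC) over normalised identifying fields; the choice of algorithm, the fields used, the shared key and the function name are assumptions made for illustration, not a description of what BioGrid actually uses.

```python
import hashlib
import hmac

# Assumption: a secret key agreed between participating sites, so that hashes
# cannot be reproduced by anyone who simply guesses names and birth dates.
SHARED_KEY = b"hypothetical-shared-secret"


def linkage_hash(surname: str, given_name: str, dob: str,
                 key: bytes = SHARED_KEY) -> str:
    """Return a non-reversible hex digest for one patient's identifying details."""
    # Normalise first so that spacing or capitalisation does not change the hash.
    parts = [surname.strip().lower(), given_name.strip().lower(), dob.strip()]
    message = "|".join(parts).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

Only the digest would leave the site. Two records that produce the same digest are treated as the same patient, so an exact-match lookup of the digest against those already held decides whether an existing USI is reused or a new one is issued.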
In addition to the data itself, BioGrid will work with you to create a data dictionary for all items in your database. This describes what kind of data is in the data set and who the owner of the data is. This metadata is freely available on the BioGrid web site.
Your data will now be stored in your local LRR with a Unique Subject Identifier which allows it to be linked to data from other sites. Researchers who wish to use the data can browse the BioGrid data dictionary and determine which data set they would like to access.
To be given access to your data, they must first seek your permission (via the BioGrid Access Request System) and obtain approval from an accredited human research ethics committee. If this is obtained, they are given access to patient data from different sites that is linked by the USI; they do not obtain or see any patient identifiers. Access is provided directly to the data that is stored on the LRR at your site. The BioGrid model is a 'federated database' rather than a central data repository; no clinical data is stored centrally in BioGrid.
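To make the federated model concrete, the following sketch groups de-identified rows returned by separate sites under their shared USI. The row layout, the column name `usi` and the function name are hypothetical, and in practice the rows would come from queries against each site's LRR rather than from in-memory lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def link_by_usi(*site_extracts: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group rows from any number of sites by USI, preserving site order."""
    linked: Dict[str, List[dict]] = defaultdict(list)
    for extract in site_extracts:
        for row in extract:
            # Rows carry a USI but no names, addresses or other identifiers.
            linked[row["usi"]].append(row)
    return dict(linked)


# Invented example rows: each site holds its own clinical data.
site_a = [{"usi": "USI-0001", "site": "A", "diagnosis": "colorectal cancer"}]
site_b = [{"usi": "USI-0001", "site": "B", "treatment": "surgery"}]

combined = link_by_usi(site_a, site_b)
# combined["USI-0001"] now holds one row from each site for the same patient.
```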
2. Acquire a node
As mentioned above, adding your data will normally require a Local Research Repository (LRR) to be available at your site. If your data is being added to an existing BioGrid node, this LRR will already be present and can be used for your data as well as any existing data.
However, if your site is new to BioGrid, an LRR will need to be installed. This process will normally be handled by your Information Technology (IT) department. BioGrid will provide specifications for a standard virtual server which will act as the LRR, as well as information about software and communications.
Minimal specifications for a single-site LRR are:
- Processor: 2 logical processors allocated from the virtual host
- Memory: 4 GB
- Hard Disk: System (C:) 100 GB; Data (D:) 50 GB; Log (E:) 50 GB
- Operating System: Microsoft Windows Server 2012 R2 Standard Edition (minimum)
- Database System:
  - Minimum: MS SQL Server 2012 Standard Edition, per-processor or per-core licensing, with SSIS and SSRS enabled
  - Maximum: MS SQL Server 2016 Standard Edition, with SSIS and SSRS enabled
- Deployment: a virtual machine is recommended for the LRR
Note: if your site has difficulty provisioning these minimum requirements, BioGrid is open to discussing alternative technology options for your site.
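An IT department that wants to compare a proposed virtual server against the hardware minimums listed above could use a small check along these lines. The class and function names are hypothetical; only the numeric minimums are taken from the list, and the operating system and database requirements are not covered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerSpec:
    logical_processors: int
    memory_gb: int
    system_disk_gb: int   # System volume (C)
    data_disk_gb: int     # Data volume (D)
    log_disk_gb: int      # Log volume (E)


# Minimum single-site LRR hardware figures from the specification list.
LRR_MINIMUM = ServerSpec(
    logical_processors=2,
    memory_gb=4,
    system_disk_gb=100,
    data_disk_gb=50,
    log_disk_gb=50,
)


def shortfalls(proposed: ServerSpec, minimum: ServerSpec = LRR_MINIMUM) -> List[str]:
    """List every resource where the proposed server is below the minimum."""
    problems = []
    for field_name, required in vars(minimum).items():
        actual = getattr(proposed, field_name)
        if actual < required:
            problems.append(f"{field_name}: {actual} < {required}")
    return problems
```

An empty list means the proposed server meets every hardware minimum; any entries returned identify what would need to be raised or discussed with BioGrid.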
3. Piggyback node
In some cases, it may be possible for a new site to 'piggyback' on an existing LRR. This is especially the case for smaller sites, or for sites where there is an existing arrangement for the sharing of IT infrastructure. If this is the case, BioGrid will explore the options with your IT department, and determine if the existing LRR needs to be upgraded to handle the additional workload.
Note that your full data set, including both identifying data and clinical data, must still be copied to the LRR. If the LRR is at another site, the ethics application at your site must clearly acknowledge and approve this process.
4. IT connection
In order to link an LRR to BioGrid, a Virtual Private Network (VPN) must be established between your site and BioGrid. Your IT department will handle this.
In addition, researchers who wish to access data remotely through BioGrid need to ensure that their PCs support remote desktop connections.
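One quick way to check whether a PC can reach a remote desktop service is to test the TCP port from that PC. The sketch below assumes the service listens on 3389, the default port for Microsoft Remote Desktop; the port and host name that apply to your site are not stated here and should be confirmed with the BioGrid technical team.

```python
import socket

RDP_DEFAULT_PORT = 3389  # Microsoft Remote Desktop default; an assumption for BioGrid


def can_reach(host: str, port: int = RDP_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `can_reach("remote.example.org")` (a placeholder host name) returns `False` if a firewall blocks the port or the VPN is not connected. A successful connection only shows that the port is open, not that a login will succeed.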
5. Research Tools
BioGrid has a multi-user licence for use of the SAS statistical analysis system for sites which are members of BioGrid. SAS Enterprise Guide and SAS Visual Analytics are the standard tools used to access the data. BioGrid provides regular complimentary courses in the use of SAS to member sites.
For further information on IT requirements, contact our technical team.