Aim: to develop a trusted health data integration service delivering better health and research outcomes through novel data linkage mechanisms.

Health Data Integration

Research into the management and delivery of healthcare is critically dependent on access to data. However, much of this data is spread across many repositories and organisations, and is often highly protected and private. Australia has a rich collection of health and community data repositories that could potentially be linked to help find answers to important health and social questions. Bringing these data repositories together would greatly enhance our ability to tackle diseases and understand complex issues.

HDI is a data integration tool developed at the Australian e-Health Research Centre (AEHRC) to provide private and secure access to an integrated virtual data repository, enabling research and analysis on a larger scale than would otherwise be possible. It provides sophisticated infrastructure for publishing, locating, retrieving and analysing data in large-scale, diverse and distributed information systems.


HDI will lead to benefits for all Australians by enabling:

  • policy makers in government to develop forecasts on which to base health policies, plans and budgets, leading to better use of national health resources
  • the health research community to conduct more focused and relevant research through access to higher quality, more accurate and up-to-date data and data products
  • provision of better health services based on the integrated analysis of national health data

Through HDI, CSIRO is enabling a network-based research infrastructure to develop the equivalent of a major virtual data repository that links individual data repositories. This will:

  • help researchers to rapidly locate appropriate data repositories and provide them with better tools to exploit the data
  • protect the privacy of individuals
  • enable data custodians to maintain ownership and control of their data resources
  • restrict access to data to authorised researchers
  • help create master patient indexes across institutions and databases
  • enable researchers to find answers to real life problems to improve the health and well-being of all Australians


One tool, many data sources

Clean, linked data can be extracted, analysed and delivered from disparate databases, with all the functionality in a single, easy-to-use tool.

Data privacy, integrity & security

An overarching privacy policy provides declarative custodian-controlled policy statements which are enacted throughout the distributed network, allowing data custodians to vet and disseminate their data. Custodians have real-time control over access to and use of their data and any generated data products.

Virtual data repository

A virtual repository of data, rather than the warehousing of data, ensures that control of data stays with the data custodian.


A service-oriented approach that builds on leading-edge web services technology.

Industry standards-based

Industry standards-based protocols, algorithms and software.

Platform independent

Implemented in Java for platform independence.

Security & Privacy

Increasing concerns over privacy and confidentiality, coupled with a growing body of legislation and codes of practice governing the use of personal and health data, mean that sharing health data for research purposes across health data custodian boundaries poses technical, organisational and ethical challenges.

HDI uses privacy-preserving linking algorithms to protect identity and personal information.

Integration & Linking

HDI supports the networking of health data repositories by:

  • assembling data from existing multiple health data repositories to enable insight and to discover complex relationships
  • enabling data sharing in a secure, privacy-protecting networked environment
  • ensuring custodial control of data repositories, by providing a virtual data repository with local control of resources

HDI takes a federated approach to integrating data repositories: web services query the individual repositories and retrieve results for further analysis and reporting. Metadata describes the data repositories, presents a view of the data to users, and is used to plan and execute complex queries across the data services.
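The federated pattern described above can be illustrated with a small sketch. It is not HDI's implementation: the `Repository` class, the `plan` and `federated_query` functions, and the in-memory stand-in for a web service call are all hypothetical names chosen for illustration. The sketch assumes each repository publishes metadata listing the fields it holds, and that a query is planned by selecting only the repositories whose metadata covers the requested fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class Repository:
    """A remote data repository as seen by the federation layer (hypothetical)."""
    name: str
    fields: Set[str]                          # metadata: fields the custodian publishes
    query: Callable[[Predicate], List[Row]]   # stand-in for a web service call


def in_memory(rows: List[Row]) -> Callable[[Predicate], List[Row]]:
    """Simulate a repository's query service over a local list of rows."""
    return lambda predicate: [row for row in rows if predicate(row)]


def plan(repositories: List[Repository], required_fields: Set[str]) -> List[Repository]:
    """Use metadata to keep only repositories that can answer the query."""
    return [repo for repo in repositories if required_fields <= repo.fields]


def federated_query(repositories: List[Repository],
                    required_fields: Set[str],
                    predicate: Predicate) -> List[Row]:
    """Send the query to each planned repository and merge the results."""
    results: List[Row] = []
    for repo in plan(repositories, required_fields):
        for row in repo.query(predicate):
            projected = {name: row[name] for name in sorted(required_fields)}
            results.append({"source": repo.name, **projected})
    return results
```

In this sketch the data never leaves its repository except as query results, which mirrors the idea of a virtual repository rather than a central warehouse.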

Linking of patient records in different data repositories, while maintaining the privacy of patients, is core HDI functionality. Matching of patients across these data repositories is made possible using encrypted demographic data, meaning identifying information remains protected.
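As a rough illustration of matching on encrypted demographic data, the sketch below derives a keyed hash (HMAC-SHA256) from normalised demographic fields, so that two custodians sharing a secret key can compare keys without exchanging names or dates of birth. This is an assumption for illustration only: the page does not say which algorithm HDI uses, and the function names, the choice of fields and the exact-match strategy are hypothetical. Exact matching on a hash does not tolerate spelling variation, which practical linkage schemes usually need to handle.

```python
import hashlib
import hmac
import unicodedata
from typing import Dict, List, Tuple


def normalise(value: str) -> str:
    """Lower-case, strip accents and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def linkage_key(secret: bytes, surname: str, given_name: str, dob: str, sex: str) -> str:
    """Derive a keyed hash from demographics; the inputs cannot be read back from it."""
    fields = [normalise(surname), normalise(given_name), normalise(dob), normalise(sex)]
    message = "|".join(fields).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def link(records_a: List[Tuple[str, str]],
         records_b: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair local record IDs from two repositories whose linkage keys are equal.

    Each record is a (local_id, linkage_key) tuple; no demographics are exchanged.
    """
    index: Dict[str, List[str]] = {}
    for local_id, key in records_a:
        index.setdefault(key, []).append(local_id)
    return [(a_id, b_id) for b_id, key in records_b for a_id in index.get(key, [])]
```

Each custodian would compute `linkage_key` locally and share only the resulting keys and local record IDs, so identifying information stays within the originating repository.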

Last Updated on Thursday, 20 October 2011 08:57



Dr Michael Lawley

 +61 7 3253 3609