Linkage findings
Scope of the data
The first version of the data was made available to approved researchers in December 2022 and had around 250,000 records linked to a range of administrative data sets. The current version (Version 2) expands coverage to more than 6 million linked records, with more recent case data (New South Wales), more jurisdictions (Victoria and Queensland) and more data sets, as listed in Table 1 below. The linked data will continue to be updated regularly, with the aim of including all jurisdictions and additional sources of information in future versions.
Please refer to the data variables list for the temporal scope of each data set and how it differs between versions.
| Version 1 (released in December 2022); number of linked records = 250,821 | Version 2 (released in July 2023); number of linked records = 6,415,740 |
|---|---|
| State/territory notifiable disease data on COVID-19 cases from: | State/territory notifiable disease data on COVID-19 cases from: |
| Australian Immunisation Register (AIR) – whole of population | Australian Immunisation Register (AIR) – whole of population |
| Medicare Benefits Schedule (MBS) – cases only | Medicare Benefits Schedule (MBS) – whole of population |
| Medicare Consumer Directory (MCD) – whole of population | Medicare Consumer Directory (MCD) – whole of population |
| National Death Index (NDI) – whole of population | National Death Index (NDI) – whole of population |
| Pharmaceutical Benefits Scheme (PBS, including Repatriation Schedule of Pharmaceutical Benefits (RPBS) information) – cases only | Pharmaceutical Benefits Scheme (PBS, including Repatriation Schedule of Pharmaceutical Benefits (RPBS) information) – whole of population |
| National Notifiable Disease Surveillance System (NNDSS) – cases only | National Notifiable Disease Surveillance System (NNDSS) – cases only |
| National Hospitals Morbidity Database (NHMD) – cases only | National Hospitals Morbidity Database (NHMD) – whole of population |
| National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD) – cases only | National Non-Admitted Patient Emergency Department Care Database (NNAPEDCD) – whole of population |
| National Aged Care Data Clearinghouse (NACDC) – cases only | National Aged Care Data Clearinghouse (NACDC) – whole of population |
| | Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) – whole of population |
| | Australian and New Zealand Paediatric Intensive Care Registry (ANZPICR) – whole of population |
Linkage rates by jurisdiction
Generally, linkage results depend on the accuracy and completeness of the linkage variables provided to AIHW: more accurate and complete data result in better linkage rates. For more information on how the data is linked, please refer to the above section on Data and methods.
Figure 2 shows, by state and territory, the number of records that were linked and the number that could not be linked. Linkage rates have generally remained the same or improved slightly across jurisdictions, with over 90% of the records supplied for the project linked in both Version 1 and Version 2. The number of records supplied from New South Wales increased notably, with over 3 million records linked in Version 2 compared with just over 70,000 in Version 1, while the linkage rate for New South Wales remained similar to Version 1 at 98%. Version 2 also expanded data coverage to two more jurisdictions (Victoria and Queensland), for which over 95% of the data supplied were linked. The lower linkage rate in the Northern Territory (93%) may be due to the limited address information provided with the case data, and the AIHW is working with the Northern Territory to improve this rate. New data supply for South Australia, Tasmania, the Australian Capital Territory and the Northern Territory is still ongoing, so the linkage rates for these jurisdictions are unchanged in Version 2.
Figure 2: Number of records and percentage linked by jurisdictions
The segmented horizontal bar chart compares the linkage rates for participating jurisdictions in Version 1 and Version 2. In both versions, all jurisdictions have over 90% of records linked. Tasmania has the highest percentage of linked records (99%), followed by New South Wales and Victoria (98%), while the Northern Territory has the lowest (93%).

Linkage rates by population groups
Table 2 describes the linkage rates by age group and sex/gender. Linkage rates can differ between population groups, and it is important to consider this when analysing linked data. For example, individuals who change address while renting may be under-represented in linkage studies. Table 2 shows that the linkage rate largely improved in Version 2 compared with Version 1, and remains well over 90% for all groups except the ‘Other’ sex/gender category. Sex is one of the key variables used to link records, so linkage rates are lower where sex is not reported consistently, or is reported as neither male nor female (‘Other’ in Table 2 below). The linkage rate for ‘Other’ improved considerably, from 3% in Version 1 to about 77% in Version 2, though it remains lower than for males or females. There were no other large differences in linkage rates across the age groups.
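The percentages in Table 2 appear to be each group's linked count divided by the total records supplied for that group (linked plus not linked). The following is a minimal Python sketch of that arithmetic using the sex/gender counts from Table 2. The function name `linkage_rate` and the dictionary layout are illustrative assumptions, not part of any AIHW tooling.

```python
def linkage_rate(linked: int, not_linked: int) -> float:
    """Return the percentage of supplied records that were linked."""
    total = linked + not_linked
    if total == 0:
        raise ValueError("no records supplied")
    return 100 * linked / total


# (linked, not linked) counts copied from the sex/gender rows of Table 2.
table2_sex_counts = {
    ("Version 1", "Male"): (125_673, 4_689),
    ("Version 1", "Female"): (125_075, 3_553),
    ("Version 1", "Other"): (73, 2_353),
    ("Version 2", "Male"): (3_020_677, 72_564),
    ("Version 2", "Female"): (3_382_173, 75_163),
    ("Version 2", "Other"): (13_765, 4_031),
}

for (version, group), (linked, not_linked) in table2_sex_counts.items():
    rate = linkage_rate(linked, not_linked)
    print(f"{version} {group}: {rate:.1f}% linked")
```

Rounded to one decimal place, these rates match the percentages shown in the table, including the rise for ‘Other’ from 3.0% to 77.3%.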
| | Version 1¹ No. of records linked (%) | Version 1¹ No. of records not linked (%) | Version 2 No. of records linked (%) | Version 2 No. of records not linked (%) |
|---|---|---|---|---|
| Sex/gender² | | | | |
| Male | 125,673 (96.4%) | 4,689 (3.6%) | 3,020,677 (97.7%) | 72,564 (2.3%) |
| Female | 125,075 (97.2%) | 3,553 (2.8%) | 3,382,173 (97.8%) | 75,163 (2.2%) |
| Other³ | 73 (3.0%) | 2,353 (97.0%) | 13,765 (77.3%) | 4,031 (22.7%) |
| Age group⁴ | | | | |
| 0-15 | 47,241 (96.6%) | 1,675 (3.4%) | 1,141,652 (97.2%) | 33,225 (2.8%) |
| 16-29 | 73,074 (95.1%) | 3,739 (4.9%) | 1,463,851 (97.2%) | 42,122 (2.8%) |
| 30-49 | 79,326 (95.9%) | 3,422 (4.1%) | 2,104,378 (98.3%) | 35,554 (1.7%) |
| 50-69 | 39,433 (96.9%) | 1,252 (3.1%) | 1,253,801 (98.8%) | 15,343 (1.2%) |
| 70+ | 11,747 (95.9%) | 506 (4.1%) | 452,888 (94.7%) | 25,205 (5.3%) |
1. Results for Version 1 (released on 16 December 2022) are based on the participating states and territories detailed in Figure 2 and are not directly comparable with the figures in the previously released web report ‘Establishing a COVID-19 linked dataset’, which also includes Victoria.
2. As reported by the state and territory.
3. Other includes records where sex or gender is not reported, or sex is reported as neither male nor female.
4. Age group is based on age as at 31 December 2022. Records with missing information on birth date are excluded. Person IDs with more than one year of birth and/or sex were restricted to the most recent notification date (only a small number of records were affected). Where the notification dates were equal, a random record was used.
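The rule in the age group note for Table 2 (conflicting year of birth or sex resolved by the most recent notification date, with ties broken at random) can be sketched as follows. This is an illustration under stated assumptions, not the AIHW's implementation: the `CaseRecord` fields, the function name `resolve_demographics` and the use of Python's `random` module are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseRecord:
    # Hypothetical field names, for illustration only.
    person_id: str
    notification_date: date
    birth_year: int
    sex: str


def resolve_demographics(records, rng=None):
    """Map each person ID to a single (birth_year, sex) pair."""
    if rng is None:
        rng = random.Random()

    by_person = defaultdict(list)
    for rec in records:
        by_person[rec.person_id].append(rec)

    resolved = {}
    for person_id, recs in by_person.items():
        distinct = {(r.birth_year, r.sex) for r in recs}
        if len(distinct) > 1:
            # Conflicting values: restrict to the most recent notification date.
            latest = max(r.notification_date for r in recs)
            recs = [r for r in recs if r.notification_date == latest]
        # If several candidates remain (equal dates), one is picked at random.
        chosen = rng.choice(recs)
        resolved[person_id] = (chosen.birth_year, chosen.sex)
    return resolved


# Made-up example records, not real data.
records = [
    CaseRecord("A", date(2022, 1, 10), 1980, "M"),
    CaseRecord("A", date(2022, 6, 2), 1980, "F"),  # most recent record is used
    CaseRecord("B", date(2022, 3, 5), 1995, "F"),
    CaseRecord("C", date(2022, 4, 1), 1970, "M"),
    CaseRecord("C", date(2022, 4, 1), 1971, "M"),  # equal dates: random pick
]
result = resolve_demographics(records, rng=random.Random(0))
print(result)
```

Passing a seeded `random.Random` makes the tie-break repeatable between runs, which is a design choice of this sketch and not something the note specifies.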