Summary

In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a common statistical linkage key (SLK-581). The key enables client-level data to be derived and shows which clients use more than one program. To gauge the quality of SLK-based linkage, an SLK-based linkage strategy was compared directly with a name-based linkage strategy. The scope of the comparison was limited to a single stage of the Pathways in Aged Care (PIAC) study, which linked Aged and Community Care Management Information System (ACCMIS) data with National Death Index (NDI) data using an SLK-based linkage strategy. The purpose of this report is to examine the accuracy of the SLK-based strategy and the utility of the resulting matched data, using the name-based strategy as the reference standard.

Methods

Both the ACCMIS and NDI data sets have full name information available as well as the data required for the keys used in the PIAC SLK-based data linkage. In the PIAC linkage, a detailed stepwise deterministic record linkage algorithm was developed to link data sets. The strategy used a general person identifier (SLK-581) in conjunction with additional data items (e.g. region and date of death). Measures of likely match accuracy were used to select match keys and ensure match quality.
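
The stepwise deterministic approach described above can be illustrated with a minimal sketch. It assumes the commonly documented SLK-581 layout (2nd, 3rd and 5th letters of the family name, 2nd and 3rd letters of the given name, date of birth as DDMMYYYY, and a one-digit sex code, with '2' standing in for missing letters). The field names, the order of the match keys and the sample records are hypothetical; the actual PIAC match keys and their selection criteria are not given in this summary.

```python
from datetime import date


def slk581(family_name, given_name, dob, sex_code):
    """Build an SLK-581 string (assumed layout; see the lead-in above)."""
    def pick(name, positions):
        # Keep letters only; pad with '2' where the name is too short.
        letters = [c for c in name.upper() if c.isalpha()]
        return "".join(
            letters[p - 1] if p <= len(letters) else "2" for p in positions
        )

    return (
        pick(family_name, (2, 3, 5))
        + pick(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + str(sex_code)
    )


def _index(pool, fields):
    """Group record ids by the tuple of values in the given fields."""
    idx = {}
    for rec_id, rec in pool.items():
        idx.setdefault(tuple(rec[f] for f in fields), []).append(rec_id)
    return idx


def stepwise_link(left, right, key_steps):
    """Link records step by step, strictest match key first.

    A link is accepted only when a key value is unique on both sides.
    Linked records are removed before the next, looser step is tried.
    """
    left_pool, right_pool = dict(left), dict(right)
    links = []
    for fields in key_steps:
        left_idx = _index(left_pool, fields)
        right_idx = _index(right_pool, fields)
        for key, left_ids in left_idx.items():
            right_ids = right_idx.get(key, [])
            if len(left_ids) == 1 and len(right_ids) == 1:
                links.append((left_ids[0], right_ids[0], fields))
                del left_pool[left_ids[0]]
                del right_pool[right_ids[0]]
    return links


# Hypothetical records: ids, regions and dates are invented for illustration.
smith = slk581("Smith", "Jane", date(1930, 5, 17), 2)
ng = slk581("Ng", "Li", date(1925, 1, 1), 1)
care = {
    "A1": {"slk": smith, "region": "NSW", "death_date": "2004-03-01"},
    "A2": {"slk": ng, "region": "VIC", "death_date": "2005-07-09"},
}
deaths = {
    "N1": {"slk": smith, "region": "NSW", "death_date": "2004-03-01"},
    "N2": {"slk": ng, "region": "QLD", "death_date": "2005-07-09"},
}
# Hypothetical key order: all items first, then progressively fewer.
steps = [("slk", "region", "death_date"), ("slk", "death_date"), ("slk",)]
links = stepwise_link(care, deaths, steps)
```

In this sketch the first pair links on the strictest key, while the second pair, whose recorded regions differ, is picked up only at the second step. This is the kind of additional link that a single-step match on the full key would miss.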

Both a name-based linkage strategy and the PIAC SLK-based linkage strategy were applied to link the data sets. The name-based strategy was probabilistic: it ran a series of passes that allowed for variation in name and demographic data, and used clerical review to identify matches. Matches made under the two strategies were directly compared using the name-based strategy as the reference standard. The name-based strategy made 172,776 links and the SLK-based strategy made 170,928 links.
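
A comparison of this kind is usually summarised by two measures: positive predictive value (the share of the tested strategy's links that also appear in the reference standard) and sensitivity (the share of reference links that the tested strategy found). The sketch below shows the calculation on a toy set of record pairs. The pair identifiers are invented for illustration and are not drawn from the study data, and the function name is hypothetical.

```python
def compare_links(test_links, reference_links):
    """Compare two sets of (record_a, record_b) links.

    The reference set is treated as the truth, so any error in the
    reference links carries through to both measures.
    """
    test, ref = set(test_links), set(reference_links)
    true_pos = len(test & ref)   # links made by both strategies
    false_pos = len(test - ref)  # links made only by the tested strategy
    false_neg = len(ref - test)  # reference links the tested strategy missed
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "ppv": true_pos / len(test) if test else None,
        "sensitivity": true_pos / len(ref) if ref else None,
    }


# Toy example with invented identifiers.
reference = [("A1", "N1"), ("A2", "N2"), ("A3", "N3"), ("A4", "N4"), ("A5", "N5")]
tested = [("A1", "N1"), ("A2", "N2"), ("A3", "N3"), ("A6", "N9")]
result = compare_links(tested, reference)
```

Here three of the four tested links agree with the reference (PPV of 0.75) and three of the five reference links were found (sensitivity of 0.6).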

Results

Overall, the study confirms that the utility of the SLK-based linkage strategy (which used SLK-581 in conjunction with other common data items) is comparable to that of the name-based linkage strategy. More specifically, the study showed that:

  • the SLK-based strategy was highly effective in identifying matches, with a positive predictive value (PPV) of 99.7% and a sensitivity of 98.5%.
  • the name-based strategy was not infallible. A very small number of name-based matches were identified as false. Also, detailed comparisons showed that one-third of the small number of matches made only by the SLK-based strategy were identified as true matches after close clerical review.
  • some minor improvements could be made to the stepwise SLK-based linkage process. 
  • the SLK-based linkage strategy resulted in linked data that largely reflected the name-based linkage strategy in terms of the distributions across key variables.

Furthermore, the detailed stepwise SLK-based linkage process, which uses additional common data items, was justified when compared with a single-step SLK-581 linkage: it identified an extra 10% of all name-based links (sensitivity of 98.5% compared with 88.4%).