Technical notes

Attributing the number of clients to a set of missing SLK records

The AODTS NMDS collects information at the service record level. Service records are associated with individual clients through an SLK. A number of records have missing or invalid SLK data and therefore cannot be attributed to a client. This leads to under-reporting of the total number of clients using the services, because some (but not all) of these records belong to clients who are not observed via any record with a valid SLK.

This document describes a method that uses the available data, together with several assumptions about the behaviour of the whole population, to estimate the total number of clients.

Imputation groups

Imputation groups were formed to improve the performance of the estimates. Service records were grouped according to properties thought to influence the behaviour of clients and the quality of SLK data, and the imputation was then performed separately within each imputation group.

Possible properties used to develop groups include location, provider size (measured by number of service records) and service type. The data are also grouped according to any subpopulations that are going to be reported upon, such as jurisdiction.

The final imputation groups were formed by balancing the often-competing priorities of having homogeneous groups and having groups large enough to ensure that the imputation is robust.

Assumptions and approximations

Assumption 1: randomness and independence

This imputation method assumes that the service provider a client attends for each instance of service is random and independent of any other instances of service the client may have. It is further assumed that the validity of the SLK recorded on each service record is random, and independent of both the client and the service provider with which the record is associated.

Assumption 2: distribution of the number of service records per client

This method also assumes that the distribution of the number of records per client for all clients is similar to that observed using the subset of records with valid SLKs.

Approximation 1: no client has more than 10 service records

This imputation method uses the approximation that no client has more than 10 service records.

In order to implement this approximation, any clients observed to have more than 10 service records were treated as if they had only 10, and the proportion of clients with 10 service records calculated accordingly.
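
As a minimal sketch of how the cap in Approximation 1 might be applied, the following Python function turns a list of per-client record counts into the proportions of clients with 1, 2, ... 10 records. The function name, argument names and example counts are hypothetical and are not taken from the code used to produce the estimates.

```python
from collections import Counter

MAX_RECORDS = 10  # cap from Approximation 1


def observed_proportions(records_per_client, cap=MAX_RECORDS):
    """Return [P_1, ..., P_cap], the share of clients with i records.

    Clients with more than `cap` records are counted as having `cap`.
    """
    if not records_per_client:
        raise ValueError("at least one client is required")
    capped = Counter(min(n, cap) for n in records_per_client)
    total = len(records_per_client)
    return [capped.get(i, 0) / total for i in range(1, cap + 1)]


# Hypothetical counts of valid-SLK records for eight observed clients.
example_counts = [1, 1, 1, 2, 2, 3, 12, 25]
example_props = observed_proportions(example_counts)
```

In this example the two clients with 12 and 25 records are both counted in the final category, so the proportion for 10 records reflects every client at or above the cap.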

Notation

The definition of the notation used in this document is as follows:

Nt: the (unknown) total number of clients
N't: the imputed total number of clients
NSLK1: the number of clients observed using the records with a valid SLK
PSLK1: the proportion of clients with at least 1 service record with a valid SLK
PNi: the (unknown) proportion of clients with i service records
P'Ni: the imputed proportion of clients with i service records
PNi,SLK1: the proportion of clients with i service records as observed using records with valid SLKs
nt: the total number of service records
nt|Nt,PNi: the number of service records given the total number of clients and the proportions of clients with i service records, i = 1,2, ... 10
nSLK1: the number of service records with a valid SLK
nSLK0: the number of service records with an invalid SLK
pSLK0: the proportion of service records with an invalid SLK

Methodology

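Given Assumption 1 and Approximation 1, a client with i service records is unobserved only if all i of those records have an invalid SLK, which occurs with probability pSLK0 to the power i. The proportion of clients who have at least 1 service record with a valid SLK is therefore:
\[ P_{SLK1} = \sum_{i=1}^{10} P_{N_i} \left(1 - p_{SLK0}^{\,i}\right) \]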
Now:
\[ N_{SLK1} = P_{SLK1} \times N_t \]
so it follows that the total number of clients is:
\[ N_t = \frac{N_{SLK1}}{P_{SLK1}} \]
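
The relationship between the observed and total client counts can be sketched in Python as follows. This is an illustration only: it assumes, following Assumption 1, that a client with i records is missed with probability pSLK0 to the power i, and the function names and example figures are hypothetical.

```python
def prob_observed(proportions, p_invalid):
    """P_SLK1: share of clients with at least one valid-SLK record.

    `proportions[i - 1]` is the share of clients with i records and
    `p_invalid` is the share of records with an invalid SLK (pSLK0).
    """
    return sum(
        p * (1.0 - p_invalid ** i)
        for i, p in enumerate(proportions, start=1)
    )


def total_clients(n_observed_clients, proportions, p_invalid):
    """N_t = N_SLK1 / P_SLK1."""
    return n_observed_clients / prob_observed(proportions, p_invalid)


# Hypothetical population: half the clients have 1 record, half have 2,
# and 20% of all records carry an invalid SLK.
props = [0.5, 0.5] + [0.0] * 8
p_slk1 = prob_observed(props, 0.2)    # 0.5 * 0.8 + 0.5 * 0.96
n_t = total_clients(880, props, 0.2)  # 880 observed clients
```

In this hypothetical case 88% of clients are expected to be observed, so 880 observed clients correspond to about 1,000 clients in total.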

To solve this equation for Nt, the values of PNi are required. These are unknown, because the records with invalid SLK values make it impossible to observe the whole population. This method imputes the unknown PNi using numerical methods, then uses these values to impute Nt.

The process starts with the distribution of the number of records per client observed using the records with valid SLKs (PNi,SLK1). These values are then adjusted so that the following conditions are met.

Constraint 1

The sum of the imputed proportions is equal to 1. That is:
\[ \sum_{i=1}^{10} P'_{N_i} = 1 \]

Constraint 2

The imputed proportion of clients with 1 service record is less than or equal to the observed equivalent proportion among clients with records with valid SLKs. That is:
\[ P'_{N_1} \le P_{N_1,SLK1} \]

This constraint is used because some of the clients observed to have only 1 record will, in fact, have additional records with invalid SLKs. It is unlikely that the true proportion of clients with 1 service record is higher than that observed using records with valid SLKs.

Constraint 3

The total number of service records that the imputed total number of clients and the imputed distribution of records per client imply is equal to the observed number of service records.

That is:
\[ n_{t|N'_t,P'_{N_i}} = N'_t \sum_{i=1}^{10} i \, P'_{N_i} = n_t \]

This constraint is used to ensure that the imputed values are consistent with the observed number of records.

Penalty function

Under Assumption 2 we want to limit how much the imputed proportions differ from the proportions observed via the records with valid SLK data. To achieve this we use a penalty function that increases as the distance between the imputed and observed proportions increases. This function is defined to be:
equation

Using numerical methods, the P'N1, P'N2, ... P'N10 are chosen so that the penalty function is minimised, subject to the 3 constraints.

The final step is to use the imputed proportions to calculate the imputed total number of clients:
\[ N'_t = \frac{N_{SLK1}}{\sum_{i=1}^{10} P'_{N_i} \left(1 - p_{SLK0}^{\,i}\right)} \]

The resulting number is then rounded to the nearest integer.
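
The overall procedure can be sketched in Python under two loudly stated assumptions. First, the penalty is taken to be the sum of squared differences between imputed and observed proportions; this is a stand-in chosen for illustration, not necessarily the penalty function used for the published estimates. Second, the sketch solves only for Constraints 1 and 3 and then checks Constraint 2 and non-negativity afterwards; if that check fails, a general constrained solver would be needed. It relies on the observation that, once N't is written as NSLK1 divided by the imputed proportion of clients observed, Constraint 3 becomes linear in the imputed proportions. All names and input figures are hypothetical.

```python
def impute_clients(p_obs, n_obs_clients, n_records, p_invalid, tol=1e-12):
    """Least-squares adjustment of p_obs subject to Constraints 1 and 3.

    Returns (imputed proportions, imputed client total before rounding,
    flag that Constraint 2 and non-negativity hold).
    """
    k = len(p_obs)
    idx = range(1, k + 1)
    # Constraint 3, N' * sum(i * p_i) = n_t with N' = N_SLK1 / sum(p_i * (1 - q**i)),
    # rearranges to the linear condition sum(c_i * p_i) = 0.
    c = [n_obs_clients * i / n_records - (1.0 - p_invalid ** i) for i in idx]
    r1 = sum(p_obs) - 1.0                           # Constraint 1 residual
    r2 = sum(ci * pi for ci, pi in zip(c, p_obs))   # Constraint 3 residual
    s1 = sum(c)
    s2 = sum(ci * ci for ci in c)
    det = k * s2 - s1 * s1
    if abs(det) < tol:
        raise ValueError("constraints are degenerate for these inputs")
    # Lagrange multipliers of the equality-constrained least-squares problem.
    lam1 = (s2 * r1 - s1 * r2) / det
    lam2 = (k * r2 - s1 * r1) / det
    p_imp = [pi - lam1 - lam2 * ci for pi, ci in zip(p_obs, c)]
    # Constraint 2 and non-negativity are checked here, not enforced.
    feasible = min(p_imp) >= -tol and p_imp[0] <= p_obs[0] + tol
    p_seen = sum(pi * (1.0 - p_invalid ** i) for i, pi in zip(idx, p_imp))
    return p_imp, n_obs_clients / p_seen, feasible


# Hypothetical imputation group: 1,000 observed clients, 2,275 service
# records in total, and 20% of records with an invalid SLK.
p_observed = [0.6, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01, 0.005, 0.0, 0.005]
p_imputed, n_imputed, ok = impute_clients(p_observed, 1000, 2275, 0.2)
n_clients = round(n_imputed)  # final step: round to the nearest integer
```

The squared-distance penalty was chosen here only because it has a closed-form solution; a different penalty would generally give a differently shaped imputed distribution.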

Discussion

This imputation technique uses available information to impute the total number of clients. The methodology takes into account the proportion of records with invalid SLK data and the distribution of the number of service records per client, as observed via the records with valid SLK data. It is apparent that the assumptions made do not hold for every client or service record. It is reasonable to expect that a client’s attendance at a service provider will be affected by location and any prior contact they had with a provider. It should also be noted that some service providers failed to collect SLK for any service record during the reference period.

Despite the known cases where Assumption 1 does not hold, the assumption is expected to be a reasonable representation of the populations of clients and service records as a whole.

It is believed that the impact of Approximation 1 will be small because, given Assumption 1, the chance that a client with more than 10 service records is not observed via a record with a valid SLK is extremely small. This chance diminishes as the proportion of records with an invalid SLK decreases, and the highest such proportion observed across jurisdictions is about 0.3. It should also be noted that the largest proportion of clients with 10 or more service records observed in the data at the jurisdiction level was only 0.007.
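
As a rough illustration of the scale involved: under Assumption 1, a client with i service records goes unobserved only if all i records carry an invalid SLK, which happens with probability pSLK0 to the power i. At the highest observed invalid proportion of about 0.3, this gives:

```latex
\[
p_{SLK0}^{10} \approx 0.3^{10} \approx 5.9 \times 10^{-6},
\qquad
p_{SLK0}^{11} \approx 0.3^{11} \approx 1.8 \times 10^{-6}
\]
```

Under these assumptions, fewer than 6 in every million clients with 10 or more service records would be expected to be missed, even in the jurisdiction with the highest proportion of invalid SLKs.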

There are many different penalty functions that could be used in this imputation. The function used was chosen because, compared with the other penalty functions investigated, it produced imputed proportions that were generally at least as close to the observed proportions. It also most consistently resulted in a distribution that was similar in shape to the observed distribution of the number of records per client.