Auditable Multi-Party Computation

blog
This post is the first half of a two-part series explaining the PRIViLEDGE project use case for applying privacy-preserving distributed ledger technologies to outcome-based contracting in health care. In this first part we will focus on what can be done with the current state of the art. The second part will highlight how PRIViLEDGE research allows us to go beyond that.

Outcome-based contracting

A healthcare system can be roughly modeled as interactions between three kinds of parties: patients, care providers, and payers. The payers are typically either insurance companies or national sick funds.

Currently, the prevalent type of contract between a payer and a care provider is fee for service, where the provider gets a pre-agreed fee for each treatment performed on an insured patient. The drawback of this model is that it incentivizes the provider to prefer treatments with higher profit margins and sometimes even perform treatments not indicated for the patient's condition.

At the other end of spectrum, in particular in primary care and nursing sectors, the capitated payment model is used, where the provider gets a fixed income stream per patient under its care irrespective of how much treatment the patient really needs or receives. The downside of this model is that the care providers are incentivized to avoid performing treatments.

To motivate the providers to focus on best results for patients while simultaneously keeping costs under control, payers are increasingly interested in outcome-based contracting, also called pay for performance. In this model, the payments that providers receive depend on achieving certain measurable outcomes, such as reducing the mean time off work for a certain diagnosis or lowering the number of patients with blood pressure exceeding a given threshold.

To be able to affect such broad outcomes, care providers with multiple different specializations typically join into an accountable care organization (ACO) to share the responsibilities as well as the financial rewards and risks. In practice, the ACO contracts are not pure outcome-based, but combine different payment models in various proportions.

Privacy-preserving reporting

Both coordination of work within an ACO and reporting to the payer on meeting the objectives require sharing of data. It is usually easy to get consent to share the medical records when a patient is referred to another provider in the ACO. But this is not the case when reporting to the payer is concerned, as patients (justifiably) fear that sharing detailed medical records might cause discrimination in future insurance offerings.

Also, a provider may participate in multiple contracts with different payers. Thus, even providers within the same ACO may not want (or even be allowed) to share all of their records (neither with any one payer nor with other members of the ACO).

We describe an architecture for answering questions of the form "how many patients satisfy condition X?", where the data needed to evaluate the condition X may reside with different members of the ACO (called data sources in the following). The questions could be something like "how many patients using drug Z for indication S ended up in emergency room with diagnosis T?", or "how many patients using drug Z had a biomarker D with a value below E or above F?", and could be the basis for getting rebates from the manufacturer of drug Z for those cases where it did not work.

On one hand, the detailed patient data is sensitive, thus cannot be sent out or shared with other providers without privacy-preserving measures. On the other hand, the result affects the cost of the drug for the payer, so it must be possible to verify the correctness of the report. Yet such a system can be built and be auditable using existing technology.

General architecture

The solution is to break the question down into sub-questions "how many patients satisfy A?" and "how many patients satisfy B?", that can each be answered from a single data source and X can be expressed as their logical combination ("A and B", "A or B", "A but not B", etc). The system then gets the answers to these sub-questions from the respective data sources and uses a private set intersection (PSI) protocol, a special case of the more general concept of secure multi-party computation (MPC), to combine them into the desired result without leaking any sensitive information.

The overall architecture of the system is shown in Figure 1. Each data source operates an MPC service node that participates in the MPC on their behalf. The MPC nodes of all data sources communicate with the coordinator node to jointly execute the PSI protocol.

Figure 1: Overall architecture of the MPC-based solution.

Each MPC node has a locally generated encryption key. For each reporting period, the MPC node receives from the respective data source the list of patient identifiers answering a pre-agreed sub-question of the form "which patients satisfy A?" or "which patients satisfy B?". The MPC node then encrypts the list with its encryption key and only the encrypted identifiers participate in the MPC protocol.

Note that each MPC node is hosted and operated by the respective data source. As each MPC node encrypts the list of patient identifiers, no cleartext of the patient data ever leaves the premises of the data source.

Private set intersection from commutative encryption

In general, when a value is encrypted repeatedly with multiple keys, the result depends on the order in which the keys are applied. When a value V is doubly-encrypted with two keys k1 and k2, the result of applying k1 first and k2 second (which we can denote as Vk1,k2) is generally different from the result of applying k2 first and k1 second (Vk2,k1). An encryption scheme where it is guaranteed that the result is the same in both cases (that is, Vk1,k2 always equals Vk2,k1) is called commutative. As an example, the Pohlig-Hellman scheme is commutative.

The PSI protocol based on a commutative encryption scheme works as follows:

  • First, each MPC node encrypts the list of identifiers from its data source with its encryption key and sends the result to the coordinator. For example, if {ID1} denotes the list from the first data source, then its MPC node sends {ID1}k1 to the coordinator.
  • The coordinator sends each list through other nodes in sequence. The other nodes each apply their own encryption to the already encrypted list. For example, {ID1}k1 from the first data source becomes {ID1}k1,k2 and then {ID1}k1,k2,k3.
  • Finally, all lists will have an encryption layer from each MPC node, as shown in Figure 2. The commutative encryption scheme ensures that an identifier that appears in multiple lists will have the same encrypted form in all the lists it appears in.
  • This enables the coordinator to find the size of the intersection or union of the lists (the number of values that appear in all lists, or the total number of unique values in all the lists combined), and report this as the result of the computation. However, neither the coordinator nor any of the MPC nodes will be able to recover the cleartext identifiers making up any of the lists.

Figure 2: Multiply-encrypted lists representing answers to the sub-queries.

Distributed auditing

The system can also support a distributed auditing mechanism to allow the behavior of each party to be verified separately and the global correctness of the process to be deduced from these local audits. This removes the need to have one auditor that everyone must trust to look at their confidential data. (If such a trusted party existed, all data sources could just send their inputs to that party in cleartext and have them announce the result.)

The audits are supported by a bulletin board seen by all the parties (Figure 3). Before a party sends out an encrypted identifier list, it posts a commitment of the list on the board. The recipient only accepts the message if it matches the commitment. The commitments are computed using one-way hash functions, so the original data cannot be recovered from them. Yet, the commitments are binding: it is infeasible to come up with two different data sets to match the same commitment.

To remove the need to trust the bulletin board to operate correctly, two mechanisms are used. First, all updates to the board are cryptographically time-stamped, so that anything posted cannot be changed after the fact. Second, in addition to encryption key, each node also has a signing key and uses it to digitally sign the commitments before posting them to the board, so that no other parties can post commitments in the name of that node.

These features allow the board to be used as a common reference point that links the local audits of all parties into an integrated whole and allows concluding the correctness of the global process from the consistency of all local audits, and also provides hard evidence in case some party has misbehaved.


Figure 3: Shared bulletin board to support distributed auditing.


Summary

We have described a secure multi-party computation protocol that allows queries useful for outcome-based contracting to be executed and also audited in a privacy-preserving manner. However, anyone relying on the outcome of the protocol to be correct will still have to trust the auditors, and also the data sources will have to trust the auditors to maintain the confidentiality of the patient records. In the second post in the series (to appear towards the end of the year) we will look at ways to reduce those trust assumptions.


Written by Ahto Truu, Guardtime