Approach // the method

The exposure lives in the data flow.

AI compliance is usually treated as a policy exercise. The real risk is architectural: it is the set of places personal information moves when a model runs. Custos works at that layer, then maps it to the regulation.

[ 01 ]

Map where personal information moves

A typical AI pilot touches member or customer data in four places: primary storage, the vector store, prompt history, and inference state. Most architectures account for the first and quietly leak through the other three. The vector store keeps embeddings derived from personal information. Prompt history is logged, often to a foreign region. Inference itself generates transient state, and sending that state to an offshore endpoint can still constitute a transfer under POPIA.

The first deliverable on any engagement is a complete map of these flows. You cannot design a sovereign architecture for data you have not located.

PII flow · typical LLM pilot

Storage · Vector embeddings · Prompt history · Inference state

Three of four touchpoints leak personal information past the boundary most teams think they have secured.
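
As an illustration of what a flow map can look like in practice, here is a minimal Python sketch. It is not Custos tooling: the system names, region labels, and the rule that anything outside "ZA" counts as exposed are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Touchpoint(Enum):
    STORAGE = "storage"
    VECTOR_EMBEDDINGS = "vector embeddings"
    PROMPT_HISTORY = "prompt history"
    INFERENCE_STATE = "inference state"


@dataclass(frozen=True)
class Flow:
    touchpoint: Touchpoint
    system: str                 # hypothetical system name
    region: str                 # hypothetical label for where the data lands
    holds_personal_info: bool


HOME_REGION = "ZA"  # assumption: any other label is treated as offshore


def exposed(flows: list[Flow]) -> list[Flow]:
    """Flows that carry personal information outside the home region."""
    return [f for f in flows if f.holds_personal_info and f.region != HOME_REGION]


def unmapped(flows: list[Flow]) -> set[Touchpoint]:
    """Touchpoints with no flow recorded yet."""
    return set(Touchpoint) - {f.touchpoint for f in flows}


# Hypothetical pilot: only primary storage stays in-country.
pilot = [
    Flow(Touchpoint.STORAGE, "member-db", "ZA", True),
    Flow(Touchpoint.VECTOR_EMBEDDINGS, "vector-store", "EU", True),
    Flow(Touchpoint.PROMPT_HISTORY, "llm-logs", "US", True),
    Flow(Touchpoint.INFERENCE_STATE, "hosted-llm", "US", True),
]

for flow in exposed(pilot):
    print(f"{flow.touchpoint.value}: {flow.system} -> {flow.region}")
```

Keeping `unmapped` separate from `exposed` reflects the point above: a touchpoint with no recorded flow is unknown, not safe.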

[ 02 ]

The regulatory frame Custos designs against

POPIA S72

Cross-border transfers

Personal information may leave South Africa only on a section 72 ground: most commonly that the recipient is bound by a law, binding corporate rules, or a binding agreement providing adequate protection, or that the data subject has consented. An LLM inference call to a US-hosted endpoint is a transfer event, not just a network request.
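
One way to make "an inference call is a transfer event" concrete is to put a guard in front of the call. The sketch below assumes a hypothetical `Endpoint` record and a hypothetical `RECORDED_BASIS` register maintained by a compliance team. It only checks that a basis has been recorded; whether that basis is legally sufficient is a separate assessment that code cannot make.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str      # hypothetical endpoint label
    country: str   # where inference actually runs, e.g. "ZA" or "US"


class TransferBlocked(Exception):
    """A call would move personal information offshore with no recorded basis."""


HOME_COUNTRY = "ZA"

# Hypothetical register: destination country -> transfer basis recorded by
# the compliance team (for example a reference to a signed agreement).
RECORDED_BASIS: dict[str, str] = {}


def check_transfer(endpoint: Endpoint, has_personal_info: bool) -> str:
    """Classify an inference call before any payload leaves the process."""
    if not has_personal_info:
        return "no personal information"
    if endpoint.country == HOME_COUNTRY:
        return "domestic"
    basis = RECORDED_BASIS.get(endpoint.country)
    if basis is None:
        raise TransferBlocked(
            f"{endpoint.name}: no recorded basis for {endpoint.country}"
        )
    return basis
```

Calling `check_transfer(Endpoint("hosted-llm", "US"), True)` with an empty register raises `TransferBlocked`, so the failure happens before the request is sent rather than in a later audit.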

POPIA S26

Special personal information

Health and biometric information carry a higher processing bar. This is the default condition for medical aid and life insurance AI, and it changes what counts as an acceptable architecture.

POPIA S19

Security safeguards

Reasonable technical and organisational measures are mandatory. For AI workloads that means controls over logging, vector stores, and inference state, not just access control on a database.
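
A control over logging can be as small as a filter that scrubs identifiers before a prompt reaches any log handler. The sketch below uses Python's standard `logging.Filter`. The two regular expressions (a 13-digit run standing in for an SA ID number, and a simple email pattern) are illustrative assumptions; pattern matching alone will miss names, addresses, and free-text health details.

```python
import logging
import re

# Illustrative patterns only; real coverage needs more than regexes.
SA_ID = re.compile(r"\b\d{13}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Replace matched identifiers with fixed placeholders."""
    text = SA_ID.sub("[ID]", text)
    return EMAIL.sub("[EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True


logger = logging.getLogger("prompts")  # hypothetical logger name
logger.addFilter(RedactingFilter())
```

With the filter attached, `logger.warning("Member %s asked about a claim", "8001015009087")` is emitted as `Member [ID] asked about a claim`. The same idea applies to whatever is written to a vector store or cached as inference state.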

SARB Directive 3 of 2018

Cloud computing

Banks must assess data sovereignty and concentration risk for cloud deployments. AI inference through a foreign-controlled provider falls squarely within the cases this directive asks banks to assess.

SAM Pillar 1 & 2

Insurance model risk

Solvency Assessment and Management treats AI and model risk under operational risk and governance. The emerging interpretation is that AI needs its own architectural controls, not a footnote in a risk register.

[ 03 ]

Three sovereign deployment patterns

On-prem GPU

Inference on owned hardware inside the client network. Maximum control, highest capital cost. Right for the most sensitive workloads.

Sovereign cloud

Private inference on SA-sovereign infrastructure such as Cassava AI Factory. Local compute at scale without the capital outlay of owning a cluster.

Hybrid retrieval

Sensitive retrieval stays local; only de-identified context reaches an offshore model. The right pattern depends on data classification and latency.

None of these is "default OpenAI." The choice between them is driven by data classification and decision latency, and it is the heart of the architecture work.
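
To make the hybrid retrieval pattern above more tangible, here is a minimal Python sketch of the local side: known identifiers are swapped for placeholders before context leaves the boundary, and the mapping needed to restore them never leaves. The function names, the `<ID_n>` placeholder format, and the stand-in `offshore_model` are assumptions for illustration. Whether the resulting text meets POPIA's de-identification standard is a separate legal question, since the remaining context may still identify a person.

```python
from __future__ import annotations


def pseudonymise(text: str, identifiers: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each known identifier for a placeholder.

    Returns the rewritten text and a placeholder -> original mapping
    that is meant to stay on local infrastructure.
    """
    mapping: dict[str, str] = {}
    for i, value in enumerate(identifiers, start=1):
        token = f"<ID_{i}>"
        if value in text:
            text = text.replace(value, token)
            mapping[token] = value
    return text, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore the original identifiers in a response, locally."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def offshore_model(prompt: str) -> str:
    """Stand-in for a remote model call; hypothetical, it just echoes."""
    return f"Summary: {prompt}"


context = "Thandi Nkosi, member 8001015009087, asked about a claim."
safe, mapping = pseudonymise(context, ["Thandi Nkosi", "8001015009087"])
answer = reidentify(offshore_model(safe), mapping)
```

Here `safe` is the only string that crosses the boundary, and it reads `<ID_1>, member <ID_2>, asked about a claim.` The identifier list is supplied by the caller in this sketch; in a real system, finding those identifiers reliably is the hard part.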