Approach // the method
The exposure lives in the data flow.
AI compliance is usually treated as a policy exercise. The real risk is architectural: it is the set of places personal information moves when a model runs. Custos works at that layer, then maps it to the regulation.
Map where personal information moves
A typical AI pilot touches member or customer data in four places. Most architectures secure the first, the source database, and quietly leak through the other three. The vector store keeps embeddings of personal information. Prompt history is logged, often to a foreign region. Inference itself creates transient state on the model host, and when that host is offshore, the call still constitutes a cross-border transfer under POPIA.
The first deliverable on any engagement is a complete map of these flows. You cannot design a sovereign architecture for data you have not located.
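A flow map can start as something as plain as a list of touchpoints, each tagged with whether it holds personal information and where that data physically lands. The sketch below is illustrative only. The `Touchpoint` type, the touchpoint names, the country codes, and the rule that "inside the boundary" means "hosted in South Africa" are all assumptions, not a Custos deliverable format.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "inside the boundary" means the data lands in South Africa.
SOVEREIGN_REGIONS = {"ZA"}


@dataclass(frozen=True)
class Touchpoint:
    name: str
    holds_pii: bool
    region: str  # ISO 3166 country code where the data physically lands


def exposed(touchpoints: List[Touchpoint]) -> List[Touchpoint]:
    """Return touchpoints that hold personal information outside the boundary."""
    return [t for t in touchpoints if t.holds_pii and t.region not in SOVEREIGN_REGIONS]


# Hypothetical pilot: only the source database is hosted locally.
pilot = [
    Touchpoint("source database", True, "ZA"),
    Touchpoint("vector store", True, "US"),
    Touchpoint("prompt log", True, "IE"),
    Touchpoint("inference endpoint", True, "US"),
]

if __name__ == "__main__":
    leaks = exposed(pilot)
    print(f"{len(pilot) - len(leaks)} of {len(pilot)} inside boundary, {len(leaks)} exposed")
    for t in leaks:
        print(f"  {t.name} -> {t.region}")
```

The point of the structure is that every touchpoint has to be placed somewhere; a touchpoint that cannot be given a region is itself a finding.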
PII flow · typical LLM pilot
Three of four touchpoints leak personal information past the boundary most teams think they have secured.
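Prompt history is the touchpoint that leaks most quietly, because logging is usually configured long before anyone treats it as a store of personal information. A minimal sketch of one control, assuming Python's standard `logging` module sits in front of the prompt log, is a filter that masks 13-digit South African ID numbers and email addresses before a record reaches any handler. The patterns and the `RedactingFilter` name are illustrative assumptions; real redaction needs far broader coverage (names, addresses, policy numbers) and should be tested against actual prompt data.

```python
import logging
import re

# Illustrative patterns only: 13-digit SA ID numbers and simple email addresses.
SA_ID_NUMBER = re.compile(r"\b\d{13}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask ID numbers and email addresses before text is persisted."""
    text = SA_ID_NUMBER.sub("[SA_ID]", text)
    return EMAIL.sub("[EMAIL]", text)


class RedactingFilter(logging.Filter):
    """Replaces each record's formatted message with a redacted copy."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


if __name__ == "__main__":
    prompt_log = logging.getLogger("prompt_history")
    # Attached to the logger, not a handler, so every handler sees the redacted text.
    prompt_log.addFilter(RedactingFilter())
    prompt_log.addHandler(logging.StreamHandler())
    prompt_log.warning("prompt from %s about ID %s", "thandi@example.com", "8001015009087")
```

Filtering at the logger rather than at one handler matters here: a second handler added later (say, shipping logs to an offshore observability service) would otherwise receive the unredacted record.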
The regulatory frame Custos designs against
Cross-border transfers
Personal information may leave South Africa only on limited grounds, chiefly where the receiving country offers adequate protection or a binding contract imposes equivalent terms. An LLM inference call to a US-hosted endpoint is a transfer event, not just a network request.
Special personal information
Health and biometric information are special personal information under POPIA and carry a higher processing bar. This is the default condition for medical aid and life insurance AI, and it changes what counts as an acceptable architecture.
Security safeguards
Reasonable technical and organisational measures are mandatory. For AI workloads, that means controls over logging, vector stores, and inference state, not just access control on a database.
Cloud computing
Banks must assess data sovereignty and concentration risk for cloud deployments. AI inference through a foreign-controlled provider is exactly the case this directive was written to surface.
Insurance model risk
Solvency Assessment and Management (SAM) treats AI and model risk under operational risk and governance. The emerging interpretation is that AI needs its own architectural controls, not a footnote in a risk register.
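Treating an inference call as a transfer event, as the cross-border rule requires, can be made concrete with a guard that runs before any request leaves the network. The sketch below assumes a hypothetical registry mapping endpoint hostnames to hosting countries, plus a recorded transfer basis mirroring the two grounds named under cross-border transfers. The hostnames, the `TransferBasis` values, and `authorise_inference_call` are all illustrative. Whether a recorded basis actually holds remains a legal judgement, not something code can decide.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class TransferBasis(Enum):
    """Hypothetical record of why an offshore transfer is permitted."""
    ADEQUATE_PROTECTION = "receiving country offers adequate protection"
    EQUIVALENT_TERMS = "contract imposes equivalent terms"


# Hypothetical registry: endpoint hostname -> country where inference runs.
ENDPOINT_REGIONS = {
    "llm.internal.example.co.za": "ZA",
    "api.offshore-llm.example.com": "US",
}


class TransferBlocked(Exception):
    """Raised when an inference call would move personal information offshore without a basis."""


def authorise_inference_call(url: str, basis: Optional[TransferBasis] = None) -> str:
    host = urlparse(url).hostname
    region = ENDPOINT_REGIONS.get(host)
    if region is None:
        # An endpoint nobody has placed on the map is treated as a leak, not a default.
        raise TransferBlocked(f"unmapped endpoint: {host}")
    if region == "ZA":
        return "local: no cross-border transfer"
    if basis is None:
        raise TransferBlocked(f"{host} runs in {region}; no transfer basis recorded")
    return f"transfer to {region}: {basis.value}"
```

Failing closed on unmapped endpoints is the design choice that matters: a new SDK default or a vendor region change then surfaces as an error rather than a silent transfer.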
Three sovereign deployment patterns
On-prem GPU
Inference on owned hardware inside the client network. Maximum control, highest capital cost. Right for the most sensitive workloads.
Sovereign cloud
Private inference on SA-sovereign infrastructure such as Cassava AI Factory. Local compute at scale without the capital outlay of owning a cluster.
Hybrid retrieval
Sensitive retrieval stays local, and only de-identified context reaches an offshore model.
None of these is "default OpenAI." The choice between them is driven by data classification and decision latency, and it is the heart of the architecture work.
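The hybrid retrieval pattern can be sketched as three local steps around one offshore call: retrieve locally, pseudonymise what leaves, and re-identify the answer on return. This is a minimal sketch under stated assumptions. Keyword overlap stands in for a local vector store, the list of known names is supplied by the caller, and the offshore model call itself is left out, so the code only builds the payload that would be sent. All function names are hypothetical. Pseudonymisation with a local lookup table is weaker than full de-identification, since free-text fields can still point to a person.

```python
import re
from typing import Dict, List, Tuple

ID_NUMBER = re.compile(r"\b\d{13}\b")  # illustrative: 13-digit SA ID numbers


def pseudonymise(text: str, known_names: List[str], mapping: Dict[str, str]) -> str:
    """Swap known names and ID numbers for placeholders; the mapping never leaves the network."""

    def token_for(value: str, kind: str) -> str:
        if value not in mapping:
            mapping[value] = f"<{kind}_{len(mapping) + 1}>"  # one counter across all kinds
        return mapping[value]

    for name in known_names:
        if name in text:
            text = text.replace(name, token_for(name, "PERSON"))
    return ID_NUMBER.sub(lambda m: token_for(m.group(0), "ID"), text)


def retrieve_local(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Toy keyword-overlap ranking standing in for a local vector store."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)[:k]


def build_offshore_payload(query: str, docs: List[str], known_names: List[str]) -> Tuple[dict, Dict[str, str]]:
    """Return (payload safe to send offshore, local re-identification table)."""
    mapping: Dict[str, str] = {}
    context = [pseudonymise(d, known_names, mapping) for d in retrieve_local(query, docs)]
    question = pseudonymise(query, known_names, mapping)
    return {"context": context, "question": question}, mapping


def reidentify(text: str, mapping: Dict[str, str]) -> str:
    """Restore original values in the offshore model's answer, locally."""
    for original, token in mapping.items():
        text = text.replace(token, original)
    return text
```

The re-identification table is the sensitive artefact here; it stays with the retrieval layer, which is why that layer has to sit on the local side of the boundary.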