What is a Data Owner, Really?
What does it mean to take ownership? There are a lot of different ways the term “owner” is used in our society and within our organization. Often industry experts in Security and Data Governance texts will divide ownership up into three different subsets: ownership, stewardship and custodianship. I believe that this approach is left over from land use terminology and oversimplifies how we might understand data today.
I’d like to dive in and look at data ownership through a number of different lenses. (We are Pluralsight, after all.) I did some homework on ownership to understand the terms as well as their legal and social implications. What I discovered (big surprise) is that ownership is complex.
Ownership
Ownership is a large and often contested concept. It is a multi-variant, changing, and interrelated set of individual and group relations, but it generally implies forms of granted rights, agreements, and behaviors. Rights imply both duty and entitlement. These can be broken down across a certain set of qualities:
- Possession: concepts around physical custody and control
- Accountability: concepts around responsibility, answerability, and liability
- Execution: concepts around tasks being performed
- Production: concepts around value, profit and reward systems
Ownership implies that the owner is accountable for and to the economic benefits and costs of the property in some way. Ultimately, ownership touches on a state of mind.
The rights of an owner include the ability to grant rights and access privileges to others and the rights might be subjective, objective or both depending on the context.
For example, users own a certain amount of their data, which is why we have protections for PII and regulations such as GDPR. Pluralsight owns data generated by our system. PXTs might “own” and be responsible for generating the data within a bounded context, and could grant other entities access to view or upsert that data through their services.
Generally, as an owner, you have responsibilities around production and consumption regarding the property. Owners apply a certain lens to solve problems around the property involved. As an owner, you can be both a grantor and a grantee. Owners are granted authority within a larger system, and can grant systems of rules, behaviors, and responsibilities to others regarding their properties. This can apply to software just like it does with more concrete property.
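The grantor/grantee relationship described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `BoundedContext` class, the team names, and the `view`/`upsert` action labels are hypothetical and do not describe any existing Pluralsight system.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedContext:
    """Hypothetical model of a bounded context with a single owning team."""
    name: str
    owner: str
    # grantee -> set of permitted actions, e.g. {"view", "upsert"}
    grants: dict = field(default_factory=dict)

    def grant(self, grantor, grantee, action):
        # Only the owner may act as grantor for this context.
        if grantor != self.owner:
            raise PermissionError(f"{grantor} does not own {self.name}")
        self.grants.setdefault(grantee, set()).add(action)

    def can(self, party, action):
        # The owner can always act; others need an explicit grant.
        return party == self.owner or action in self.grants.get(party, set())


# Hypothetical teams and context names.
ctx = BoundedContext("course-progress", owner="progress-team")
ctx.grant("progress-team", "analytics-team", "view")

print(ctx.can("analytics-team", "view"))    # True
print(ctx.can("analytics-team", "upsert"))  # False
```

The point of the sketch is only that an owner is both a holder of rights and a source of them: the same party that can act on the data is the one that can extend a narrower set of rights to others.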
Data
Data in its lowest form is the facts, and nothing but the facts. It’s the ones and zeroes - the quantities, characters, and symbols. Data is our basis of reasoning and calculation, but it does not necessarily include the context in which it was generated. Data is stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
Operations are performed on data by a system. Data is assigned and operated upon using different models and contexts. Data is only relevant inasmuch as it is related to and associated with a larger system and other distinct pieces of information. Within its system it is formatted in a certain, agreed-upon (standardized) way. By this definition, where we standardize and which definitions we standardize become important. Data is tightly coupled to the system in which it resides.
Data is Part of a Larger System
A bounded context is a mental (linguistic) model for drawing definitions around these facts (entities, events, functions, behaviors, and attributes) and their formats. Having custody and control, liability, tasks, and value defined for data in these models is essential for ownership to be clearly defined.
Owning a bounded context means that a team holds conceptual knowledge about the data, its interrelationships and dependencies, and the goal of the system. Anyone who does not fully understand the concepts that the bounded context was built around could misuse or misunderstand the data in the system.
As we standardize and change, the scope and reach of the data shifts, as does the inertia of the system. Standardization has benefits and drawbacks and must be weighed carefully by the people it impacts. In order to do that you have to know who it impacts. Having tools and conventions that help us understand that dynamically is part of the reason for this blog post.
In other words, data could be considered signals generated by a system, but generally must be associated with the context of the system in order to be utilized. The system that generates data gives context and generally grants the rights of ownership to the data and, importantly, to the people associated with it.
Human-Centered Design
Human-Centered Design processes can include working out which tests (e.g. fitness functions, unit tests, integration tests) are needed to determine the success and permissions of a system and its data as we build out a bounded context. Tests are an integral part of discovery and design processes. When we determine ownership in the business model of a bounded context (Domain-Driven Design), the data is tied to the people who are responsible for generating, collecting, and reporting it. Essentially, we can use our current user-centered and directed discovery processes to better define and understand types of data ownership in a bounded context.
Formulating and answering questions during directed discovery that center on data can help us build better tools. When we can answer questions about the data model we will have an easier time defining the questions around ownership as well.
I think helping teams understand how to collect data, what event sourcing entails and when to use it, and what kinds of data are relevant for any given experience, along with linking data to user/consumer and owner/producer behavior, will go a long way toward solving many of the cross-cutting concerns we have as we grow our business. Data is an element of an interrelated system. Data can only be understood in relationship to the domain it exists in and to the people who hold the institutional knowledge about it.
Given all that…
…I posit that, in any given system, data ownership is shared in a number of ways around the qualities I mentioned above. If we want to understand ownership, the concept needs to be carved into sets of common behaviors and contextual understanding, shared as a convention in the meta-data, so that we can adjust and make changes and decisions quickly and completely.
RAPID is a decision-rights framework created by Bain & Company, similar in spirit to a Responsibility Assignment Matrix (a RAM, such as RACI). It can be used to delineate behaviors and conventions for distributing responsibilities and decision-making power among people and their roles. At Pluralsight our engineers use RAPID to speed up decision-making processes and determine responsibilities around certain high-level and cross-cutting concerns.
Conventions and matrices delineating data ownership have the potential to help organizations make better decisions, communicate more clearly, and define behaviors and mental models more quickly.
Enter the Matrix
Systems must be observable to owning parties. Proximity to the owners, the creators and consumers needs to be tracked and communicated. The sociotechnical system must be visible and people must be notified of changes in their responsibilities.
I suggest creating a Responsibility Matrix for Data Ownership broken down along the lines I delineated above. We can iterate on this and improve it if it works well. I see many broad implications for systems that use this to go beyond what a traditional SDL allows.
- Possession
- Accountability
- Execution
- Production
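As a rough illustration of what such a matrix might look like in code, here is a minimal sketch. The `Quality` enum mirrors the four qualities listed above; the `OwnershipMatrix` class, its method names, the responsibility labels, and the team names are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Quality(Enum):
    POSSESSION = "possession"
    ACCOUNTABILITY = "accountability"
    EXECUTION = "execution"
    PRODUCTION = "production"


@dataclass
class OwnershipMatrix:
    """Hypothetical responsibility matrix for one bounded context."""
    bounded_context: str
    # quality -> {responsibility: [owning teams or roles]}
    assignments: dict = field(default_factory=dict)

    def assign(self, quality, responsibility, owner):
        owners = self.assignments.setdefault(quality, {}).setdefault(responsibility, [])
        if owner not in owners:
            owners.append(owner)

    def owners_of(self, quality, responsibility):
        return list(self.assignments.get(quality, {}).get(responsibility, []))

    def unassigned_qualities(self):
        # Qualities for which nobody has been named yet.
        return [q for q in Quality if not self.assignments.get(q)]


# Hypothetical context and teams.
matrix = OwnershipMatrix(bounded_context="course-progress")
matrix.assign(Quality.POSSESSION, "controls access permissions", "progress-team")
matrix.assign(Quality.ACCOUNTABILITY, "fixes problems with the data", "progress-team")
matrix.assign(Quality.EXECUTION, "collects the data", "player-team")

print(matrix.owners_of(Quality.POSSESSION, "controls access permissions"))
print([q.value for q in matrix.unassigned_qualities()])
```

In this sketch, ownership is deliberately a list per responsibility rather than a single name, since the argument here is that ownership is shared. The `unassigned_qualities` helper shows one way a gap (here, nobody named under Production) could be made visible.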
Whenever we choose to share information beyond the bounds of a system a Responsibility Matrix could be included to help build stronger communication lines and create greater technical understanding of the system.
Individuals could use a matrix like this much like an old-fashioned calling tree to assist with access, collection, translation, and knowledge sharing about the data, networks, and systems around them without being owners of the data. The mental models (information), knowledge (behaviors and goals), and patterns (wisdom) of the systems they are accessing would have master ownership data associated with them as part of their meta-schema.
Having processes, documents, and systems that are observable around accessing data, modeling information, and goal-setting will help us make decisions and changes more quickly.
In order to understand data ownership we need to be consciously thinking about data as a product separate from the technology of the system when building our business (domain) models. We must understand what data factors into the success and goals of the system and how that is intimately related to the people owning it.
This post is meant to spark discussion around data ownership and to offer a potential solution that could impact data governance, what that means and how data is related to people that use it. So, please - Discuss!
Questions of Ownership
Possession
- Who is generating the data?
- Who is responsible for producing the data?
- Who controls the publication of the data?
- Who controls the access permissions of the data?
- Who is consuming the data?
- Who can access the data?
- Who can store the data?
Accountability
- Who is responsible for data loss?
- Who is liable if data is corrupted?
- Who has an understanding of the purpose/goal of the data?
- Who fixes problems with the data?
- Who can ultimately verify the data is correct?
Execution
- Who decides what data needs to be collected?
- Who is responsible for collecting the data?
- Who decides what roles and organizations can access the data?
- Who can add, change, and remove data from the system?
Production
- Who can benefit from the data?
- Who decides what the data is worth and sets pricing?
- Who benefits when the data is sold?
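The questions above could also be treated as a checklist during discovery. The following is a minimal sketch under stated assumptions: it uses only a subset of the questions, and the `unanswered` helper and the team names are hypothetical.

```python
# Hypothetical checklist: a subset of the questions above, grouped by quality.
QUESTIONS = {
    "possession": [
        "Who is generating the data?",
        "Who controls the access permissions of the data?",
    ],
    "accountability": [
        "Who is responsible for data loss?",
        "Who fixes problems with the data?",
    ],
    "execution": [
        "Who is responsible for collecting the data?",
    ],
    "production": [
        "Who can benefit from the data?",
    ],
}


def unanswered(answers):
    """Return {quality: [questions]} for every question with no named owner."""
    gaps = {}
    for quality, questions in QUESTIONS.items():
        missing = [q for q in questions if not answers.get(q)]
        if missing:
            gaps[quality] = missing
    return gaps


# Hypothetical answers gathered during a discovery session.
answers = {
    "Who is generating the data?": ["player-team"],
    "Who controls the access permissions of the data?": ["progress-team"],
    "Who is responsible for data loss?": ["progress-team"],
    "Who is responsible for collecting the data?": ["player-team"],
}

for quality, questions in unanswered(answers).items():
    for question in questions:
        print(f"{quality}: {question}")
```

Run against these sample answers, the sketch reports one open question under accountability and one under production, which is the kind of gap a team would want to see before sharing the data more widely.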
Example Matrix
Our data design process could include creating an initial responsibility matrix that lays out ownership behaviors associated with roles inside the system. The roles, responsibilities, and questions may change depending on the platform the data is shared on, the producer of the data, and the audience it is intended for, as well as any other domains it concerns. A document like this could be included with the meta-data/headers of the streams and/or APIs. The strength of this approach is that it is flexible and only likely to change when the structure of the data changes. It could be included as a part of our initiatives to build self-documenting processes and make data easier to interpret.
https://docs.google.com/spreadsheets/d/1cOVFPNWl0FGLbbdTxNCTSfuuBz9lv8OtqC89DvkVNfA/edit#gid=0
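To make the idea of shipping a matrix with the meta-data of a stream or API concrete, here is a minimal sketch using only Python's standard `json` module. The envelope shape (`metadata` / `data`), the `with_ownership` helper, the field names, and the team names are assumptions for illustration; they are not taken from the linked spreadsheet or from any existing API.

```python
import json


def with_ownership(payload, ownership):
    """Wrap a payload in an envelope whose metadata carries the ownership matrix."""
    return {
        "metadata": {"ownership": ownership},
        "data": payload,
    }


# Hypothetical matrix: quality -> {responsibility: [owners]}.
ownership = {
    "possession": {"controls access permissions": ["progress-team"]},
    "accountability": {"fixes problems with the data": ["progress-team"]},
    "execution": {"collects the data": ["player-team"]},
    "production": {"benefits from the data": ["analytics-team"]},
}

# Hypothetical event payload.
event = with_ownership({"courseId": "c-42", "percentComplete": 40}, ownership)

# Serialize as it might travel on a stream, then read it back as a consumer would.
encoded = json.dumps(event, sort_keys=True)
decoded = json.loads(encoded)

print(decoded["metadata"]["ownership"]["accountability"])
```

A consumer who receives this message does not need to be an owner to find out whom to ask about problems with the data, which is the calling-tree behavior described earlier.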
Additional Links for Thought:
- https://blog.acolyer.org/2019/10/30/corels/, https://blog.acolyer.org/2019/10/28/interpretable-models/
- https://www.rti.com/resources/tech-talk-principles-of-dds-the-databus-and-data-centricity
- https://www.iiconsortium.org/pdf/IIC_PUB_G5_V1.01_PB_20180228.pdf