What is data completeness?
Data completeness is, as the phrase would suggest, a state in which a dataset is whole, with no traces of data loss or corruption, and able to present a full picture of an ecosystem or infrastructure.
In the context of business communications, data completeness refers to a state where all business communication data, from every communication channel across an organization, is captured and stored in its entirety – from the content of a message to the associated metadata (such as timestamps and sender and recipient details).
As another example, in the context of trade reconstruction, data completeness means the aggregation of all trading and order data across all venues – from the beginning of a transaction to the end.
Meeting benchmarks for data completeness requires that all relevant data, including valuable metadata, is captured, stored, and kept readily accessible. Data completeness can be used as a catch-all phrase for all data within an organization, or – more commonly – may refer to the completeness of data within a particular business function, such as recordkeeping, trade surveillance, or communications surveillance.
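To make the idea of a "complete" record concrete, here is a minimal sketch in Python that checks whether each captured message carries both its content and its core metadata. The field names (content, timestamp, sender, recipient) and the sample records are assumptions for illustration only; a real archive schema will differ.

```python
# Hypothetical required fields; substitute the schema your archive actually uses.
REQUIRED_FIELDS = ("content", "timestamp", "sender", "recipient")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def completeness_ratio(records: list[dict]) -> float:
    """Share of records that carry every required field.

    An empty batch is treated as complete, since nothing is missing from it.
    """
    if not records:
        return 1.0
    complete = sum(1 for record in records if not missing_fields(record))
    return complete / len(records)


# Illustrative sample data, not real messages.
sample_records = [
    {"content": "Order filled", "timestamp": "2024-03-01T09:30:00Z",
     "sender": "alice", "recipient": "bob"},
    {"content": "Call me", "timestamp": "",
     "sender": "carol", "recipient": "dave"},
]

for record in sample_records:
    print(record["sender"], "is missing:", missing_fields(record))
print("Completeness ratio:", completeness_ratio(sample_records))
```

A check like this only confirms that fields are populated. It says nothing about whether every message was captured in the first place, which is a separate question.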
Having access to complete datasets allows all functions within a business to work from the same trustworthy data source, and unlocks multiple business benefits, including more accurate eDiscovery and faster searches and audits. However, the other side of the data completeness coin is that organizations with incomplete datasets may fall foul of regulatory recordkeeping rules, and may also suffer weaker risk detection, weaker risk management, and compromised decision making.
What are the benefits of data completeness?
- Regulatory obligation: First and foremost, a range of regulatory rules require firms to evidence high standards of data completeness, covering communications data (FINRA Rule 4511, SEC Rule 17a-4), business information (SEC Rule 17a-3, DOJ ECCP), and marketing and advertising data (SEC Marketing Rule). Meeting these rules is essential to avoid regulatory censure and potential fines.
- Effective risk assessments: When something appears to have gone wrong, be it a potential bullying or harassment incident between colleagues or suspected market abuse, having access to complete data is vital to understanding whether a risk is genuine or not. Context can make all the difference, as can being able to piece together a full audit trail or timeline of events.
- More accurate analysis: Spotting trends and patterns within your organization is essential to optimizing business strategy to reach desired goals. When a business does not have access to complete data, vital trends might get missed, or the data to hand might not tell the full story – potentially leading to the wrong decisions being made.
- Operational efficiency: A complete overview of data gives clearer oversight of where you can drive efficiency and cut costs, from seeing that a particular communications channel or trading venue is barely used to spotting duplicative processes or lengthy, data-intensive workflows that can be streamlined.
The perils of incomplete data
Financial institutions across jurisdictions are obliged to capture, retain, and monitor business communication channels to meet a range of recordkeeping regulations, from Securities and Exchange Commission (SEC) Rule 17a-4 to MiFID II and Financial Conduct Authority (FCA) Handbook SYSC 9.1.
Since the SEC fined 16 Wall Street firms for widespread recordkeeping failures in 2022, we have seen a steady stream of firms facing fines that now total well into the billions of dollars. Many of these cases involved substantial off-channel communications use – meaning regulators were deprived of complete records of business communications. Regulators cannot assess what they cannot see, so when communications data is missing, they are likely to assume that conversations are being taken “off channel” (and outside the scope of compliance supervision) because there is something to hide.
While the tempo of these enforcements may have slowed, firms are still being fined for recordkeeping infractions relating to off-channel communications and SEC Marketing Rule violations. Should regulators come knocking, firms need to be able to account for all of their communications and business data and have this ready to hand over – or face the consequences.
And it isn’t just communications data firms need to be conscious of capturing. In March 2024, JPMorgan Chase & Co. faced penalties totaling nearly $350 million due to deficiencies in its trade surveillance data capture procedures. The Office of the Comptroller of the Currency (OCC) imposed a $250 million civil penalty, citing the bank’s operation with “gaps in trading venue coverage and without adequate data controls required to maintain an effective trade surveillance program.” Concurrently, the Federal Reserve Board levied an additional $98.2 million fine, highlighting JPMorgan’s failure to monitor billions of instances of trading activity across more than 30 global venues between 2014 and 2023.
Four steps to maximize data completeness
1) Ensure data from all sources is captured – legitimately
Regulatory rules mandate that firms capture business and communications data comprehensively and hold it in its entirety, so capturing data from all sources is now a necessity. The first step is taking stock of which trading venues and communications channels are used across your business, including the social media channels your marketing and sales teams may use.
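As a rough illustration of that stock-take, the sketch below compares an inventory of channels in use against the channels actually being captured. The channel names and both sets are hypothetical; in practice they would come from your own inventory and your capture platform's configuration.

```python
def uncaptured_channels(in_use: set[str], captured: set[str]) -> set[str]:
    """Channels the business uses that are not being captured."""
    return in_use - captured


def unlisted_channels(in_use: set[str], captured: set[str]) -> set[str]:
    """Channels being captured that do not appear in the inventory."""
    return captured - in_use


# Hypothetical inventory and capture configuration.
channels_in_use = {"email", "teams", "whatsapp", "bloomberg_chat"}
channels_captured = {"email", "teams", "bloomberg_chat", "legacy_im"}

coverage_gaps = uncaptured_channels(channels_in_use, channels_captured)
inventory_gaps = unlisted_channels(channels_in_use, channels_captured)

print("Used but not captured:", sorted(coverage_gaps))
print("Captured but not in inventory:", sorted(inventory_gaps))
```

The second check is worth keeping: a channel that is captured but missing from the inventory suggests the inventory itself is out of date.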
2) Select a single-vendor solution to mitigate third-party data risk
The more links there are in a chain, the higher the likelihood of a weak link. While outsourcing services like data capture and archiving to a third party is increasingly common, regulators are focusing on due diligence around these “critical third-party” relationships. The weekly news cycle now contains at least one example of a data breach or outage where a third (or even fourth) party has been the point of failure.
Relying on a patchwork of vendor solutions heightens the likelihood of data issues, because every handoff between vendors is a chance for data to be lost or corrupted in transit – a costly game of “telephone.” Each vendor touchpoint also presents a potential ingress point for bad actors. Relying on a single-vendor solution that gives you control over the entirety of your data lifecycle, from ingestion to archiving to reporting, minimizes points of failure, lets data travel without friction, and gives you a single point of contact and accountability.
3) Break down data siloes
While all functions within a business work towards the same common goal, many will work independently of one another day to day. This data siloing can lead to situations where a lack of holistic overview or communication between teams results in risk.
Recordkeeping functions are, for example, “data owners” and responsible for data ingestion and archiving. But surveillance teams, who are responsible for reviewing communications, are often working from different datasets or don’t have full oversight of all venues. This can lead to two business functions looking at different data and reaching different conclusions. Should a data stream stop being captured, or capture be interrupted, and this not be effectively communicated, data completeness is compromised – and regulators won’t care which team was ultimately responsible.
Ensuring teams have access to the same consistent, comprehensive, and reliable pool of data is essential to breaking down these data siloes and minimizing the risks from miscommunication or unclear roles and priorities.
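One simple way to make capture interruptions visible to every team is to look for unusually long silences in a stream. The sketch below is a minimal illustration, assuming you can export capture timestamps per channel; the two-hour threshold and the sample timestamps are arbitrary assumptions, and a real check would need to account for weekends, holidays, and quiet channels.

```python
from datetime import datetime, timedelta


def find_capture_gaps(
    timestamps: list[datetime], max_silence: timedelta
) -> list[tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where the silence between two
    consecutive captures exceeds max_silence."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_silence
    ]


# Hypothetical capture times for a single channel.
captured_at = [
    datetime(2024, 3, 1, 9, 0),
    datetime(2024, 3, 1, 9, 20),
    datetime(2024, 3, 1, 13, 5),
    datetime(2024, 3, 1, 13, 30),
]

silences = find_capture_gaps(captured_at, max_silence=timedelta(hours=2))
for last_seen, next_seen in silences:
    print(f"No captures between {last_seen} and {next_seen}")
```

A flagged silence is not proof of data loss. It is a prompt for recordkeeping and surveillance teams to confirm together whether the channel was genuinely quiet.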
4) Reconciliation, reconciliation, reconciliation
Working with a trustworthy vendor gives considerable peace of mind when it comes to knowing your data is captured completely, but reconciliation will always be a vital part of the data lifecycle. Validating each captured version of a message against the version at source is crucial to detecting gaps or capture failures that would leave data incomplete.
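As a minimal sketch of what validating captured messages against the source can look like, the example below compares SHA-256 digests of each source message with its archived copy and reports anything missing or altered. The message IDs and contents are made up, and the assumption that both sides can be keyed by a shared message ID will not hold for every source system. This is an illustration of the principle, not a description of any particular vendor's reconciliation process.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 hex digest of a message body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reconcile(source: dict[str, str], archive: dict[str, str]) -> dict[str, list[str]]:
    """Compare source messages with archived copies, keyed by message ID.

    Returns the IDs missing from the archive and the IDs whose archived
    content no longer matches the source.
    """
    missing = sorted(mid for mid in source if mid not in archive)
    altered = sorted(
        mid
        for mid in source
        if mid in archive and digest(source[mid]) != digest(archive[mid])
    )
    return {"missing": missing, "altered": altered}


# Hypothetical source and archive contents.
source_messages = {"m1": "Buy 100 XYZ", "m2": "Confirmed", "m3": "Cancel order"}
archived_messages = {"m1": "Buy 100 XYZ", "m2": "Confirm"}

report = reconcile(source_messages, archived_messages)
print("Missing from archive:", report["missing"])
print("Content mismatch:", report["altered"])
```

Comparing digests rather than raw text is useful when the source system can only share hashes, or when message bodies are too sensitive to move between systems for reconciliation.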
Leveraging a solution that provides Constant Integrity Check (CIC) capabilities means data is constantly and automatically scanned and validated against audit events of every message’s lifecycle to verify that it is present, viable, and accessible throughout a retention term. This verification allows you to confirm that messages have been captured and stored completely, with all relevant data and metadata in place, and reduces lengthy manual reconciliation reviews.
The data completeness benchmark has been set, and regulators expect firms to meet it. Ensuring your business communications data is compliantly captured and archived across every channel is now a necessity – because firms that provide regulators with incomplete datasets will face the consequences.