Chatbots Behaving Badly™

From SOC 2 to True Transparency - Navigating the Ethics of AI Vendor Data

By Markus Brinsa  |  August 3, 2025


At a glance, today’s AI purchasing climate seems almost utopian: vendors promise state-of-the-art models, seamless integrations, and security standards you can supposedly take to the bank. The industry favorite in this area? The SOC 2 attestation (often loosely called a “certification”)—a gold star for security, availability, processing integrity, confidentiality, and privacy. Marketers shout about it, procurement teams breathe a little easier, and compliance departments sleep better with it in their vendors’ portfolios. But in the slick race to adopt generative tools, recommendation systems, and predictive engines, a new and thornier question has emerged—one that SOC 2 can’t answer: Where does the data really come from, and is it ethical?

Let’s pull back the curtain and investigate why those shiny compliance checkmarks might not be enough, especially when it comes to AI—and how to know if a vendor’s data backbone is truly upright, or just another case of smoke and mirrors.

The Comfort and the Limitations of SOC 2

Vendors parade their SOC 2 reports the way bakeries display their health grades. After all, the badge means an independent auditor has scrutinized their systems and found the controls suitably designed and operating effectively—which, for many clients, is shorthand for trustworthy. SOC 2, created by the American Institute of Certified Public Accountants (AICPA), is explicitly designed to give assurance about how a company guards your data from loss, leaks, and mishandling. Its five Trust Services Criteria sound reassuring: security, availability, processing integrity, confidentiality, and privacy. For any company handling sensitive information, these are real concerns; neglecting them can be career-ending.

So, if a vendor can produce their sparkly SOC 2, shouldn’t that be enough?

Well, if all you want is confidence that your vendor won’t fumble your data off the back of a digital truck, perhaps. But if you want to feel confident that their AI isn’t built on a foundation of stolen, biased, or exploitative data—if you want to ensure that their definition of “responsible AI” matches your own—you’ll need to dig much, much deeper.

Beyond the Audit: The Murky World of AI Data Sourcing

Here’s where the plot thickens. SOC 2, for all its strengths, is a map for navigating a landscape of well-understood risks—data breaches, downtime, failed backups. But the world of AI data is wonderfully, terrifyingly vast and largely unregulated when it comes to the finer points of ethics. Imagine an AI vendor—a luminary in the marketing analytics world—declaring that its model has been “trained on millions of real-world customer interactions.” What’s left unsaid: Were those interactions gathered with informed consent? Did customers know their chats, emails, or images would train the algorithm? Was any attempt made to balance gender, race, language, or geography, or did the model simply harvest the nearest (and cheapest) available data?

SOC 2 doesn’t ask; it doesn’t even know to ask. Instead, these questions form the front lines of a new kind of due diligence—a discipline as much about investigative research as about checklists.

The Rise of AI Audits and the New Procurement Detective

Let’s be honest: verifying ethical sourcing isn’t about paperwork. It’s about exposure, curiosity, and sometimes, getting proof that a vendor is walking the talk. In this era, the best AI buyers don’t just review the SOC 2 binder—they ask for the origin story behind the data. They demand receipts, insist on transparency, and refuse to accept “proprietary reasons” as a free pass to opaqueness.

When faced with these questions, vendors will fall into two camps. The first: those with nothing to hide, who gladly unspool documentation on data pedigree, consent protocols, and audit trails. They talk about “data cards” or “model cards”—detailed records showing which data was used, when, how, and under what licenses. They get specific, not just on compliance but on representation: who was in the training set, and who was left out? Was the data “debiased” or just “collected at scale”? They aren’t insulted by questions about bias audits or fairness metrics—they expect them.
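What does such a record look like? There is no single mandated schema, but even a lightweight structured record beats a verbal assurance. Below is a minimal sketch of a data card in Python; every field name and value here is an illustrative assumption, not an industry standard:

```python
from dataclasses import dataclass, field

@dataclass
class DataCard:
    """Minimal provenance record for one training dataset (illustrative fields)."""
    name: str                 # human-readable dataset name
    source: str               # where the data came from (URL, partner, internal system)
    license: str              # e.g., "CC-BY-4.0" or "proprietary-consented"
    consent_basis: str        # e.g., "explicit opt-in for model training"
    collection_period: str    # e.g., "2023-01 to 2023-12"
    demographics_notes: str   # known skews: language, region, gender balance
    known_gaps: list[str] = field(default_factory=list)  # who is underrepresented

card = DataCard(
    name="support-chat-2023",
    source="in-product chat transcripts (opted-in accounts only)",
    license="proprietary-consented",
    consent_basis="ToS clause 7.2: explicit consent for AI training",
    collection_period="2023-01 to 2023-12",
    demographics_notes="~80% English, predominantly North American users",
    known_gaps=["non-English speakers", "accessibility-tool users"],
)
```

A first-camp vendor can produce something like this, with real entries, for every training corpus on request.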

The second camp? They obfuscate. They wave their SOC 2 report as a magic shield. They use phrases like “industry-standard hygiene” or “security best practices” and dodge provenance questions by citing competitive secrecy. Sometimes, they give you a compliance officer’s direct line and hope you won’t call. These vendors may be secure in the SOC 2 sense, but ethically sourced data? That’s another story.

This new AI procurement era therefore isn’t about checking off boxes—it’s about being a data detective. It takes a mindset of relentless inquiry, fact-checking, and, sometimes, a willingness to walk away when documentation is thin.

Data Law: Necessary, but Not Sufficient

It would be lovely if we could simply ask “Are you GDPR compliant?” and call it a day. Surely the rigor of European regulators guarantees both legal and ethical sourcing? Not quite. While the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and a smattering of other frameworks protect privacy, they don’t spell out a moral code for what data can be used to train AI. GDPR may demand a lawful basis—often explicit consent—for processing personal data, but nothing in the law says a vendor must, say, ensure its training set doesn’t perpetuate old biases or source only copyright-cleared images for generative models.

That’s why those searching for ethical AI must always go further: demanding proof that data was sourced with meaningful consent, and that every photo or text snippet was public domain, properly licensed, or volunteered with full context about how it would be used—not just for a direct service, but for AI model training. That is a critical leap, and many companies haven’t made it.
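What would that leap look like inside an actual pipeline? One concrete pattern is a hard gate at ingestion: any record without an acceptable license and an explicit training-consent flag never reaches the training store. A minimal sketch, with hypothetical field names and license labels:

```python
# Hypothetical ingestion gate: a record passes only if both its license
# and its consent basis explicitly cover model training.
ALLOWED_LICENSES = {"public-domain", "CC0", "CC-BY-4.0", "licensed-for-training"}

def eligible_for_training(record: dict) -> bool:
    licensed = record.get("license") in ALLOWED_LICENSES
    consented = record.get("training_consent") is True  # explicit, never implied
    return licensed and consented

raw_records = [
    {"text": "...", "license": "CC0", "training_consent": True},
    {"text": "...", "license": "scraped-unknown", "training_consent": False},
]
corpus = [r for r in raw_records if eligible_for_training(r)]
print(f"admitted {len(corpus)} of {len(raw_records)} records")  # admitted 1 of 2
```

The point is less the code than the posture: eligibility is decided before training, is logged, and is auditable after the fact.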

Bias: The Hidden Hitchhiker

The most insidious risk in the AI vendor ecosystem isn’t malice—it’s indifference. Left unchecked, AI systems trained on “naturally occurring” data replicate every flaw and prejudice of society—sometimes amplifying them. If the training pool is skewed (say, mostly men, mostly North Americans, or mostly English speakers), the model’s recommendations, predictions, and generated outputs inherit those skews. Yet even vendors proudest of their security posture may never have tested their models for disparate impact or for underrepresentation of minority groups.

When evaluating vendors, listen for their approach to bias. Do they conduct regular bias and fairness audits? Do they employ fairness-aware learning techniques and measure demographic effects? Have they ever had to pull a dataset because it didn’t meet their own standards? If all you get are blank stares or platitudes, look elsewhere.
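If a vendor claims to run fairness audits, ask what they actually compute. One common yardstick is the four-fifths rule borrowed from U.S. employment guidance: a group’s selection rate should be at least 80% of the most-favored group’s rate. A minimal sketch of that check (the data and threshold here are illustrative):

```python
from collections import defaultdict

def disparate_impact(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate.

    `outcomes` pairs a group label with whether the model selected that
    person; ratios below 0.8 flag potential disparate impact under the
    common four-fifths rule.
    """
    totals, selected = defaultdict(int), defaultdict(int)
    for group, picked in outcomes:
        totals[group] += 1
        selected[group] += picked
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Toy example: a model approving applications at different rates per group.
results = [("A", True)] * 60 + [("A", False)] * 40 \
        + [("B", True)] * 35 + [("B", False)] * 65
print(disparate_impact(results))  # {'A': 1.0, 'B': 0.58...} -> group B fails 0.8
```

A vendor who audits for fairness can show you numbers like these, per release, along with what they did when a ratio fell short.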

Remember: being “secure” or “compliant” under SOC 2 is not the same as being fair or unbiased.

Privacy by Design: More than Encryption

SOC 2 does examine privacy, but in a mechanical sense—access controls, encryption, retention schedules. Considerably more nuanced, though, is the discipline known as “privacy by design.” Ethical AI vendors build mechanisms to anonymize input data, strip identifiers, and enforce privacy processes throughout the lifecycle, not merely at the perimeter. When considering a vendor, ask for concrete examples. Have they implemented systematic anonymization for all training data? Are synthetic or federated data sets in use? Is there clear evidence of privacy risk assessment beyond routine access control?
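“Systematic anonymization” should be demonstrable in the pipeline itself, not just in a policy document. Here is a minimal sketch that redacts a few common identifiers before text ever reaches a training store; the patterns are deliberately narrow for illustration, and real deployments use dedicated PII-detection tooling that covers far more categories:

```python
import re

# Illustrative patterns only; production systems use trained PII detectors
# and handle names, addresses, account IDs, and much more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (555) 123-4567."))
# -> "Reach me at [EMAIL] or [PHONE]."
```

Ask to see where a step like this sits in the vendor’s lifecycle, how its misses are measured, and who reviews the failures.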

You want to hear about practices like persistent monitoring, ongoing consent management, and the ability to honor “right to be forgotten” requests—even for model training history. Data privacy for AI isn’t a switch; it’s a continual commitment.

The Vendor “Story”: Attestations, References, and Third-Party Trust

Here’s a trade secret: some of the best due diligence comes through references. Before you sign on, ask who else trusts this vendor, for what, and how they’ve upheld ethical standards under scrutiny. What certifications or third-party validations can they show for AI governance or data ethics—ISO/IEC 42001, for example, or seals from industry consortia working on AI governance? Has anyone audited their AI for bias, explainability, or ethical compliance beyond standard IT risks?

Press for answers. The best vendors will introduce you to real customers and share audit summaries or certificates. They will have undergone independent scrutiny—voluntarily or at the behest of their most demanding clients.

Contracts: Ethics in Black-and-White

It’s easy to get swept up in the pitch, but the ultimate test comes when pen meets paper. Your contract isn’t just a price tag—it’s a tool for accountability. For truly ethical AI deployment, insist on clear clauses mandating legal data sourcing, explicit processes for bias identification and remediation, and continuous obligations for transparency. Set out what constitutes a material data breach (hint: not just leaks, but unethical or unconsented use), and build in the right to audit or request ongoing documentation.

If a vendor balks at this level of specificity, perhaps they’re not truly committed—just compliant.

Conclusion: Toward a More Responsible AI Marketplace

Buying AI is unlike any other procurement process. What you’re “buying” isn’t just lines of code and guarantees of uptime, but a worldview encoded in data, algorithms, and practices. SOC 2 is vital and laudable. Without it, your vendor might be a privacy time bomb. But with it—and only it—you risk missing the deeper, subtler hazards that arise when models are built on ethically shaky ground.

The next generation of AI adopters—those who will distinguish themselves as trusted, responsible innovators—are already going further. They’re asking the uncomfortable questions, seeking receipts, demanding proof that AI is as ethical as it is secure.

So the next time a vendor hands you a shiny SOC 2 report, be grateful—but take it as only the beginning. In the world of AI, what happens behind the curtain matters every bit as much as what is on stage. Always look for the data story, demand transparency, and let your company’s principles—not just compliance—be your north star. That’s how you build AI you, your customers, and the world can trust. 

And if you’d like to work with people who actually understand what Ethical Data Sourcing and AI Audits mean in AI strategy, drop me a note at ceo@seikouri.com or swing by seikouri.com.

About the Author

Markus Brinsa is the Founder and CEO of SEIKOURI Inc., an international strategy consulting firm specializing in early-stage innovation discovery and AI Matchmaking. He is also the creator of Chatbots Behaving Badly, a platform and podcast that investigates the real-world failures, risks, and ethical challenges of artificial intelligence. With over 15 years of experience bridging technology, business strategy, and market expansion in the U.S. and Europe, Markus works with executives, investors, and developers to turn AI’s potential into sustainable, real-world impact.

© 2025 Markus Brinsa | Chatbots Behaving Badly™