Sony’s new AI benchmark fights bias with consent

According to TheRegister.com, Sony AI has released the Fair Human-Centric Image Benchmark (FHIBE), which contains 10,318 consensually sourced, richly annotated images of 1,981 unique subjects from more than 81 countries. The benchmark is designed to test fairness across various computer vision tasks and to surface biases such as models misclassifying female doctors as nurses or associating certain demographics with criminal activity. Alice Xiang, Sony’s global head of AI Governance, explained that most existing computer vision datasets were collected without consent, and that seven well-known datasets have since been revoked by their authors. Sony business units are already using FHIBE in their AI ethics review processes, and the researchers have found issues such as models being less accurate for people using “She/Her/Hers” pronouns due to hairstyle variability.
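
To make that last point concrete, here is roughly what "testing fairness" on an annotated benchmark looks like in practice: break a model's accuracy down by demographic group and look at the gap. This is a minimal sketch in Python with made-up column names and toy data, not FHIBE's actual schema or tooling.

```python
# A minimal sketch of the kind of check a fairness benchmark enables:
# per-group accuracy on an attribute-annotated evaluation set.
# Column names and data are illustrative, not FHIBE's actual format.
import pandas as pd

# Toy evaluation results: one row per image, with the model's prediction,
# the ground-truth label, and a self-reported pronoun annotation.
results = pd.DataFrame({
    "pronoun":    ["She/Her/Hers", "He/Him/His", "She/Her/Hers", "He/Him/His"],
    "true_label": ["doctor",       "doctor",     "doctor",       "nurse"],
    "predicted":  ["nurse",        "doctor",     "doctor",       "nurse"],
})

results["correct"] = results["true_label"] == results["predicted"]

# Accuracy broken down by annotated group, plus the gap between the
# best- and worst-served groups, which is the kind of disparity a
# benchmark like this is meant to surface.
per_group = results.groupby("pronoun")["correct"].mean()
print(per_group)
print("accuracy gap:", per_group.max() - per_group.min())
```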

Here’s the thing about most AI training data – it’s basically stolen. Xiang points out that “the vast majority of computer vision benchmark datasets were collected without consent,” which explains why we’re seeing all those AI copyright lawsuits piling up. And it’s not just about legal exposure – when you train models on data scraped without permission, you’re building on foundations that might disappear overnight. Remember when seven major datasets got pulled by their own creators? That’s the risk everyone’s running.

What’s fascinating is how Sony approached this differently. They actually paid people and got proper consent, which sounds like common sense but is apparently revolutionary in AI circles. They’ve created a detailed benchmark that others can use to test their own models. But here’s my question: if consent-based data collection is so obviously better, why isn’t everyone doing it already?

When bias becomes more than academic

The examples Xiang shares are genuinely concerning. In China, facial recognition systems have mistakenly allowed family members to unlock phones and make payments – imagine your teenager accidentally draining your bank account because the system can't reliably tell family members' faces apart. We've seen similar issues documented in studies showing that facial recognition performs worse on Black faces.

And it’s not just about accuracy – it’s about harmful stereotypes getting baked into systems. When a model sees a person and immediately associates them with criminal activity based on demographics, we’re basically automating prejudice. That’s not just bad AI – that’s dangerous AI.

Data nihilism meets ethics

Xiang uses this great term – “data nihilism” – to describe the industry attitude that ethical data collection is impossible if we want cutting-edge AI. Basically, the thinking goes that you can’t scale consent, so we should just give up and scrape everything. But is that really true? Or are we just not trying hard enough?

Look, I get it – building massive datasets ethically is expensive and complicated. But when we're talking about systems that could determine who gets hired, who gets loans, or even who gets arrested, shouldn't we be willing to invest in getting the foundation right?

The billion-dollar question

Here’s where it gets tricky though. FHIBE is great for testing, but it’s only around 10,000 images. Training modern vision models requires millions, sometimes billions of images. So how do we scale ethical data collection to that level? Xiang admits this is the “scalability issue” they haven’t solved yet.

But maybe that’s the wrong question. Instead of asking how we can ethically collect billions of images, maybe we should be asking whether we actually need billions of images. There’s growing evidence that better curated, higher quality datasets can outperform massive scraped collections. And with regulations like the EU AI Act pushing for bias assessments in high-risk applications, the business case for ethical data is getting stronger every day.

The real test will be whether other companies follow Sony’s lead. Meta has its FACET benchmark but disbanded its Responsible AI team – which tells you something about priorities. Meanwhile, as Xiang notes, the US federal government’s AI plan doesn’t even mention ethics. So we’re left with a patchwork of voluntary efforts while the technology races ahead. Not exactly comforting, is it?
