Founding Dean: Luo Weiguo
When AI Can Fake Reality...
Published: 2024-02-26

AI Can Fake Reality: Is Seeing Believing?

Source: TED + 《英语世界》 + Ming Anlin

Luo Weiguo: Worth studying both for how AI is discussed in English and for its message about social responsibility.

Technologist and human rights advocate Sam Gregory says that a flood of hyper-realistic deepfakes will lead us to question reality ever more. What happens to democracy when people can no longer trust what they see? Learn three key steps for telling the synthetic from the real, and why fortifying our perception of the truth is vital to the future of the AI era.

It's getting harder, isn't it, to spot real from fake, AI-generated from human-generated. With generative AI, along with other advances in deep fakery, it doesn't take many seconds of your voice, many images of your face, to fake you, and the realism keeps increasing.

I first started working on deepfakes in 2017, when the threat to our trust in information was overhyped, and the big harm, in reality, was falsified sexual images. Now that problem keeps growing, harming women and girls worldwide.

But also, with advances in generative AI, we're now also approaching a world where it's broadly easier to make fake reality, but also to dismiss reality as possibly faked.

Now, deceptive and malicious audiovisual AI is not the root of our societal problems, but it's likely to contribute to them. Audio clones are proliferating in a range of electoral contexts. "Is it, isn't it" claims cloud human-rights evidence from war zones, sexual deepfakes target women in public and in private, and synthetic avatars impersonate news anchors.

I lead WITNESS. We're a human-rights group that helps people use video and technology to protect and defend their rights. And for the last five years, we've coordinated a global effort, "Prepare, Don't Panic," around these new ways to manipulate and synthesize reality, and on how to fortify the truth of critical frontline journalists and human-rights defenders.

Now, one element in that is a deepfakes rapid-response task force, made up of media-forensics experts and companies who donate their time and skills to debunk deepfakes and claims of deepfakes. The task force recently received three audio clips, from Sudan, West Africa and India. People were claiming that the clips were deepfaked, not real.

In the Sudan case, experts used a machine-learning algorithm trained on over a million examples of synthetic speech to prove, almost without a shadow of a doubt, that it was authentic.
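
As a rough illustration of the kind of classifier described here (the task force's actual forensic pipeline is not public), below is a minimal Python sketch: extract acoustic features from labeled real and synthetic clips, fit a binary classifier, and score a questioned clip. The file names, feature choice and model are assumptions for illustration only.

```python
# Minimal sketch of audio authenticity classification -- not the task
# force's actual pipeline. Assumes labeled real/synthetic training clips.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(path: str) -> np.ndarray:
    """Summarize a clip as the mean of its MFCC frames."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical file lists; a real system would train on large corpora,
# like the "over a million examples of synthetic speech" mentioned above.
real_paths = ["real_clip_01.wav", "real_clip_02.wav"]
synth_paths = ["synth_clip_01.wav", "synth_clip_02.wav"]

X = np.array([clip_features(p) for p in real_paths + synth_paths])
y = np.array([0] * len(real_paths) + [1] * len(synth_paths))  # 1 = synthetic

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Probability that a questioned clip is synthetic.
questioned = clip_features("questioned_clip.wav").reshape(1, -1)
print(clf.predict_proba(questioned)[0, 1])
```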

In the West Africa case, they couldn't reach a definitive conclusion because of the challenges of analyzing audio from Twitter, and with background noise.

The third clip was leaked audio of a politician from India. Nilesh Christopher of “Rest of World” brought the case to the task force. The experts used almost an hour of samples to develop a personalized model of the politician's authentic voice. Despite his loud and fast claims that it was all falsified with AI, experts concluded that it at least was partially real, not AI.

As you can see, even experts cannot rapidly and conclusively separate true from false, and the ease of calling "that's deepfaked" on something real is increasing. The future is full of profound challenges, both in protecting the real and detecting the fake.

We're already seeing the warning signs of this challenge of discerning fact from fiction. Audio and video deepfakes have targeted politicians, major political leaders in the EU, Turkey and Mexico, and US mayoral candidates.

Political ads are incorporating footage of events that never happened, and people are sharing AI-generated imagery from crisis zones, claiming it to be real.

Now, again, this problem is not entirely new. The human-rights defenders and journalists I work with are used to having their stories dismissed, and they're used to widespread, deceptive, shallow fakes, videos and images taken from one context or time or place and claimed as if they're in another, used to share confusion and spread disinformation.

And of course, we live in a world that is full of partisanship and plentiful confirmation bias. Given all that, the last thing we need is a diminishing baseline of the shared, trustworthy information upon which democracies thrive, where the specter of AI is used to plausibly believe things you want to believe, and plausibly deny things you want to ignore.

But I think there's a way we can prevent that future, if we act now; that if we "Prepare, Don't Panic," we'll kind of make our way through this somehow. Panic won't serve us well. [It] plays into the hands of governments and corporations who will abuse our fears, and into the hands of people who want a fog of confusion and will use AI as an excuse.

How many people were taken in, just for a minute, by the Pope in his dripped-out puffer jacket? You can admit it. More seriously, how many of you know someone who's been scammed by an audio that sounds like their kid? And for those of you who are thinking "I wasn't taken in, I know how to spot a deepfake," any tip you know now is already outdated. Deepfakes didn't blink, they do now. Six-fingered hands were more common in deepfake land than real life — not so much.

Technical advances erase those visible and audible clues that we so desperately want to hang on to as proof we can discern real from fake. But it also really shouldn’t be on us to make that guess without any help. Between real deepfakes and claimed deepfakes, we need big-picture, structural solutions.

We need robust foundations that enable us to discern authentic from simulated, tools to fortify the credibility of critical voices and images, and powerful detection technology that doesn't raise more doubts than it fixes. There are three steps we need to take to get to that future. Step one is to ensure that the detection skills and tools are in the hands of the people who need them.

I've talked to hundreds of journalists, community leaders and human-rights defenders, and they're in the same boat as you and me and us. They're listening to the audio, trying to think, "Can I spot a glitch?" Looking at the image, saying, "Oh, does that look right or not?" Or maybe they're going online to find a detector.

And the detector they find, they don't know whether they're getting a false positive, a false negative, or a reliable result. Here's an example. I used a detector, which got the Pope in the puffer jacket right. But then, when I put in the Easter bunny image that I made for my kids, it said that it was human-generated. This is because of some big challenges in deepfake detection.

Detection tools often only work on one single way to make a deepfake, so you need multiple tools, and they don't work well on low-quality social media content. Confidence score, 0.76-0.87, how do you know whether that's reliable, if you don't know if the underlying technology is reliable, or whether it works on the manipulation that is being used?
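
To make concrete why a bare confidence score can mislead, here is a small worked example with assumed numbers (they are not from the talk): even a detector with a 90 percent true-positive rate mostly flags genuine media when deepfakes are rare in the stream being scanned.

```python
# Worked example with assumed numbers: base rates matter as much as the
# detector's headline accuracy.
tpr = 0.90         # detector catches 90% of actual deepfakes
fpr = 0.10         # ...but also flags 10% of genuine items
prevalence = 0.01  # assume 1 in 100 scanned items is actually fake

p_flag = tpr * prevalence + fpr * (1 - prevalence)
p_fake_given_flag = (tpr * prevalence) / p_flag
print(f"P(fake | flagged) = {p_fake_given_flag:.2f}")  # ~0.08
```

Under these assumptions, more than nine out of ten flagged items are actually genuine, which is one reason a score of 0.76-0.87 on its own tells you little without knowing the detector's error rates and the context it was built for.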

And tools to spot an AI manipulation don't spot a manual edit. These tools also won't be available to everyone. There's a trade-off between security and access, which means if we make them available to anyone, they become useless to everybody, because the people designing the new deception techniques will test them on the publicly available detectors and evade them.

But we do need to make sure these are available to the journalists, the community leaders, the election officials, globally, who are our first line of defense, thought through with attention to real-world accessibility and use. Though in the best circumstances, detection tools will be 85 to 95 percent effective, they have to be in the hands of that first line of defense, and they're not, right now.

So for step one, I've been talking about detection after the fact. Step two — AI is going to be everywhere in our communication, creating, changing, editing. It's not going to be a simple binary of "yes, it's AI" or "phew, it's not." AI is part of all of our communication, so we need to better understand the recipe of what we're consuming.

Some people call this content provenance and disclosure. Technologists have been building ways to add invisible watermarking to AI-generated media. They've also been designing ways — and I've been part of these efforts — within a standard called the C2PA, to add cryptographically signed metadata to files.

This means data that provides details about the content, cryptographically signed in a way that reinforces our trust in that information. It's an updating record of how AI was used to create or edit it, where humans and other technologies were involved, and how it was distributed. It's basically a recipe and serving instructions for the mix of AI and human that's in what you're seeing and hearing.
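
C2PA defines its own manifest format; purely to illustrate the underlying idea of cryptographically signed provenance metadata, here is a sketch that signs a JSON "recipe" with an Ed25519 key. The field names are invented for this example and are not the C2PA schema.

```python
# Illustration of cryptographically signed provenance metadata.
# The JSON fields are invented; real C2PA manifests use their own schema.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

recipe = {
    "asset": "interview.mp4",
    "steps": [
        {"tool": "camera-app", "action": "captured"},
        {"tool": "gen-ai-editor", "action": "background replaced"},
    ],
    # The "how", not the "who": no creator identity is recorded here,
    # which matters for the anonymity concerns discussed below.
}

signing_key = Ed25519PrivateKey.generate()
payload = json.dumps(recipe, sort_keys=True).encode()
signature = signing_key.sign(payload)

# Verification raises InvalidSignature if the recipe was altered after
# signing -- this is what lets the metadata reinforce trust.
signing_key.public_key().verify(signature, payload)
print("recipe intact and signed")
```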

And it's a critical part of a new AI-infused media literacy. And this actually shouldn't sound that crazy. Our communication is moving in this direction already. If you're like me — you can admit it — you browse your TikTok “For You” page, and you're used to seeing videos that have an audio source, an AI filter, a green screen, a background, a stitch with another edit.

This, in some sense, is the alpha version of this transparency in some of the major platforms we use today. It's just that it does not yet travel across the internet, it’s not reliable, updatable, and it’s not secure. Now, there are also big challenges in this type of infrastructure for authenticity.

As we create these durable signs of how AI and human were mixed, that carry across the trajectory of how media is made, we need to ensure they don't compromise privacy or backfire globally.

We have to get this right. We can't oblige a citizen journalist filming in a repressive context or a satirical maker using novel gen-AI tools to parody the powerful ... to have to disclose their identity or personally identifiable information in order to use their camera or ChatGPT.

Because it's important they be able to retain their ability to have anonymity, at the same time as the tool to create is transparent. This needs to be about the how of AI-human media making, not the who.

This brings me to the final step. None of this works without a pipeline of responsibility that runs from the foundation models and the open-source projects through to the way that is deployed into systems, APIs and apps, to the platforms where we consume media and communicate.

I've spent much of the last 15 years fighting, essentially, a rearguard action, like so many of my colleagues in the human rights world, against the failures of social media. We can't make those mistakes again in this next generation of technology. What this means is that governments need to ensure that within this pipeline of responsibility for AI, there is transparency, accountability and liability.

Without these three steps — detection for the people who need it most, provenance that is rights-respecting and that pipeline of responsibility, we're going to get stuck looking in vain for the six-fingered hand, or the eyes that don't blink. We need to take these steps. Otherwise, we risk a world where it gets easier and easier to both fake reality and dismiss reality as potentially faked.

And that is a world that the political philosopher Hannah Arendt described in these terms: "A people that no longer can believe anything cannot make up its own mind. It is deprived not only of its capacity to act but also of its capacity to think and to judge. And with such a people you can then do what you please." That's a world I know none of us want, that I think we can prevent.
