So I'll thank Jaka for that lovely introduction, and thank you for sharing your lunch hour with me. I'm going to jump right into the talk, and if you have procedural questions or questions about methods, I'm happy to take those now; otherwise we'll just wait till the end. So lots of
tech companies are trying to figure out
how to detect emotion by reading facial
expressions. It's a really exciting time because the technology is advancing really fast (in fact, the pace seems to me to be speeding up), and there's a growing economy of emotion-reading gadgets and apps and algorithms.
But the question I want to start today
with is: can we really expect a machine
to read emotion in a face? There are
plenty of companies who are claiming to
have already done it, and their claims
are based on some fundamental
assumptions that we're going to
systematically examine today. And I'll just warn you: I'm going to suggest some things that some people might find a little provocative, things that might challenge your deeply-held beliefs. My message today is not that machines can't perceive emotion, but that companies currently seem to be going about this question in the wrong way, because they fundamentally misunderstand the nature of emotion. And as a consequence, they're missing what I would think of as a game-changing opportunity to really transform the science and its application to everyday problems. So
emotion-reading technology usually
starts with the assumption that people
are supposed to smile when they're happy,
frown when they're sad, scowl when
they're angry, and so on, and that
everyone around the world should be able
to recognize smiles and frowns and
scowls as expressions of emotion. And
it's this assumption that leads
companies to claim that detecting a
smile with computer vision algorithms is
equivalent to detecting an emotion like
joy. But I want you to consider this evidence. Here on the x-axis are the presumed expressions for the various emotions: anger, disgust, fear, happiness, sadness, and surprise. And now
we're going to look at some evidence
from meta-analyses, statistical summaries
of experiments, to answer the question of
how often people actually make these
faces during emotion. And the answer is:
not so much. The y-axis represents the
proportion of times that people actually
make these facial expressions during
actual emotional events. So in real life,
for example, people only make a wide-eyed
gasping face during an episode of fear nine
percent of the time
across 16 different studies. And in fact,
that face, if you were in Papua New
Guinea in the Trobriand Islands, would be
considered an anger face. It's a threat
face. It's a face that you make to
threaten someone. So in real life, people
are moving their faces in a variety of
ways to express a given emotion. They
might scowl in anger about 30 percent of
the time, but they might cry in anger.
They might have a stone-faced stare in
anger. They might even smile in anger. And
conversely, people often make these faces
when they're not emotional at all. For
example, people often produce a full-on facial scowl when they're just concentrating really hard.
Nevertheless, there are hundreds of studies where subjects are shown posed expressions like these and then asked to identify the emotion being portrayed. Again, the proportions are on the y-axis, and you can see there's quite a difference: even though people only make a wide-eyed gasping face about 9% of the time, test subjects identify that face as a fear expression 68% of the time, and so on, and so forth.
So, which data are the companies using as the basis of their development? They're using the blue bars. So when software detects that someone is scowling, they infer that the person is angry, and in fact, you'll hear companies refer to a scowl as an "anger expression," as if there's a one-to-one correspondence, and a frown as a "sadness expression," and so on. And so the question is: if people only sometimes make these faces to express the presumed emotion, why are test subjects, as perceivers, identifying emotions in these faces so frequently? Why are the blue bars so much higher than the white bars? And now,
I'm going to show you the answer. Here's
the kind of experiment that is almost
always used in the sorts of studies that
generate the data for those blue bars.
Test subjects are shown a posed face
like this, and then they're shown a
small set of words, and then they're
asked to pick the word that matches the
face. So, which word matches this face?
Good job. When test subjects choose the
expected word from the list, it's called
"accuracy" even though this person is not
angry.
In fact, she's just posing a face, and most of the faces used in these experiments are posed. So it's not really accuracy; it's more like how much you agree with the experimenter's expectations. But it's called "accuracy," so that's what we're going to call it today too.
So, hundreds of studies show
pretty high accuracy using this method.
This is on average, so this is a
meta-analytic average across hundreds of
studies. And emotion perception, you know,
feels as easy as reading words on a page,
because in fact, that's actually what's
happening in these experiments. And when
you remove the words, and you merely ask
test subjects to freely label the faces,
accuracy drops precipitously. And for
some emotions like contempt and shame
and embarrassment, the rates actually
drop to chance levels, which is about
17% in most of these studies. And here's
what happens when we add a little bit of
diversity into the picture. So, things get
a little more interesting. So, we tested a
group of hunter-gatherers in Tanzania
called the Hadza. The Hadza have been
hunting and gathering continuously as a
culture since the Pleistocene. They don't live in exactly the same circumstances as ancient humans, but they are hunting and gathering on the African savannah, so they are living a lifestyle similar to the conditions that some evolutionary psychologists believe gave rise to these "universal" expressions. So they're a great population to test. There actually aren't that many of them left, and it's really hard to get access to this group; you have to have special research permits, and so on. With the help of an anthropologist collaborator, we got access to the Hadza, who were very generous with their time and labeled some faces for us. We showed them a set of
faces, and we asked them to do exactly
what we asked other test subjects to do,
and accuracy actually dropped even
further. And this number is actually a
little high, because what the Hadza were very good at was distinguishing a smile from all the other faces, which depicted negative emotions. When you just look at the accuracy for labeling scowls and pouts and things like that, the negative depictions, the rate dropped even further, pretty much to chance levels. And so, this is
what happens when you remove the secret
ingredient from these experiments: the
evidence for universal emotional
expressions vanishes. Now, I'm not saying that faces carry no information, or that we can't look at a face and make a reasonable guess about how someone feels. But what I am
telling you is that human brains are
doing more than just looking at a face
when they make such judgments. That is,
right now, when you're looking at me, or
when I'm looking at you, some of you are
smiling and nodding - thank you very much.
Others are, you know, maybe looking a
little more skeptical, or at least that's
the guess that my brain is making, and my
brain isn't just using your face. There's a whole context around us. But in these experiments, the experimenters were looking only for the signal value in the face alone, stripped of all context; except the context that they, unbeknownst to them, had actually provided to the subjects: the words. So, just to confirm
that the experimental context itself could make ANY emotion look universal, we decided to test this by going back to the original experimental method. And we identified
six emotions from different cultures
that have never been identified as
universal, that can't be translated into
English with a single word, which is
important because all of the presumed
universal emotions happen to be English
categories, and they also don't exist in
the language that is spoken by the Hadza,
which is Hadzane. And then, what
we did is we invented expressions
for these emotions - we just made them up
- and in this case, we were using
vocalizations, although we have a version
with faces. We used vocalizations because (it's a complicated story) we were basically replicating another experiment and critiquing it. So for example, the
category Gigil is the overwhelming urge
to squeeze or pinch something that's
very cute; when you see something cute and you just want to squeeze the cheeks of a baby. That's the emotion. And
so, we made up a vocalization to go with
that, which sounds something like this: eee!
OK, so, we made that sound, and then we asked our Hadza test subjects to match each sound with a little story that we told about the emotion, because in small-scale societies that are very remote from Western cultures, these experiments are typically done differently: you don't give subjects a list of words. You tell them a
little story about the emotion that
contains the emotion word, and then you
give them two faces or two vocalizations
and you ask them to basically pick the
expression that matches. So, that's what
we did. And then the average accuracy
actually was pretty high. And if you look
at the individual emotions, five of the
six of them look universal. And in fact,
these accuracy rates are pretty similar to what you see in many studies for anger, sadness, fear, and so on.
So this is where the blue bars come from.
Scientists have been using, really since
the 1960s, an experimental method that
doesn't discover evidence for universal
expressions of emotion, but it
manufactures that evidence. This method
of providing test subjects with
linguistic cues is responsible for the
scientific belief that a scowl
expresses anger and only anger, that a
smile expresses happiness and only
happiness, and so on. And so, if you're a company that wants to build AI to perceive emotions in humans by measuring their facial movements, then it's probably important to realize that these famous configurations don't consistently display disgust, anger, fear, and so on, and that it's a mistake to infer that someone who is scowling is angry. In fact, it's a mistake to call a scowl an "anger expression," because a scowl is only sometimes indicative of anger. Instead, what we see when we look at the data is that variation is the norm. I'll
just show you what I mean. So, if you were
looking at this person's face, how does she look to you? What emotion does she seem to be expressing? Sadness. She's sneezing (not even an emotion at all). Smelling something good. Usually people see her as tired, or grieving, or about to cry. Actually, this is my daughter Sophia experiencing what I can only describe to you as a profound and deep sense of pleasure, at the Chocolate Museum in Cologne, Germany. And this
little sweetheart is also experiencing a
profound sense of pleasure. And the
lesson here is that people move their
faces in many different ways
during the same emotion. Now, if we were to look only at this little guy's eyebrows, eyes, and nose, these facial actions are actually very reminiscent of the presumed expression for anger. So for
example, this face is often seen as angry.
Does anybody actually know who this is?
Jim Webb. This is actually Jim Webb when
he won the senatorial race in Virginia,
which returned the Senate to Democratic
control. This victory returned the Senate
to Democratic control. Sorry, I was just
having a moment there. And so, without
context, we see his face as communicating
anger because actually this face
symbolizes anger in our culture. So
people don't just move their faces in
different ways during the same emotion,
they also move their faces in the same
way during different emotions. So in
real life, a face doesn't speak for
itself, when it comes to emotion, right.
People usually see this face as smug, or proud, or confident. Actually, it's the supposed universal expression for disgust. And
what's really interesting is that when
you stick the presumed expression for disgust on a body, or in any kind of context that suggests a different emotion, perceivers actually track the face differently. Their scanning of the face has a completely different pattern, suggesting they're making different meaning of that face by virtue of the context. And I'll just tell you as an aside: in every study (maybe that's an exaggeration), in most studies where you pit a face against the context, the face always loses. Faces are
inherently ambiguous without context to
make them meaningful. So, what's up with
these expressions? Where did they come
from? Well, it turns out that they were
not discovered by actually observing
people as they moved their faces
expressing emotions in real life. In fact,
these are stipulated expressions. So, a
handful of scientists just anointed
these as
the expressions of emotion, as universal
truths, and then people built a whole
science around it. So basically, they're stereotypes. What we have is a science of stereotypes, or, you know, emojis, which by themselves, I should tell you, also turn out to be highly ambiguous without context. So obviously, we don't want to build a science of artificial intelligence on stereotypes. We want to build it on emotional episodes as they occur in real life. And in real life,
an emotion is not an entity, right. It's a
category that's filled with variety. When
you're angry, your face does many things,
and your body does many things, and it
turns out your brain also does different
things depending on the context that
you're in. Now, for those of you who build classification systems, you know about categories, right? For example, if you were building a cat recognition system, you would develop a classifier that could learn the features that cats have in common, the ones that distinguish them from other animals like dogs and birds and fish. And this CAT-egory... get it? My daughter made me say that, OK? This category (thank you for laughing; now I can tell her that you thought it was funny) is a collection of instances that share similar features. But there's actually
plenty of variation in the instances
of this category, too, right? Some cats are big, some cats are small, cats have different eye colors, some have long fur, some have short fur, some have no fur. But the human brain tends to ignore this variation in favor of what cats have in common.
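To make that concrete, here is a minimal sketch of such a feature-based classifier; the features and data are invented for illustration:

```python
# A toy "cat recognition" classifier: it learns which physical
# features the positive instances (cats) tend to share.
# Features and data are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [has_whiskers, retractable_claws, barks, body_mass_kg]
animals = [
    [1, 1, 0, 4.0],   # house cat
    [1, 1, 0, 6.5],   # another cat: different size, same category
    [1, 0, 1, 30.0],  # dog
    [0, 0, 0, 0.3],   # bird
]
is_cat = [1, 1, 0, 0]

clf = LogisticRegression().fit(animals, is_cat)
print(clf.predict([[1, 1, 0, 5.2]]))  # a new cat-like animal -> [1]
```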
And the interesting thing is that humans also have the capacity to make other kinds of categories: categories where there are no physical similarities, where the category is not based on physical similarities among the instances. And this
is something we do all the time. For
example, here's a category, one that I'm sure everyone in this room knows. Want to take a guess what it is? Human-made objects? I suppose if you treat the elephant like a picture of an elephant, that would be true, yeah. OK, well, these are
all objects that you can't bring through
airport security. Actually, the last time
I did this, one clever person actually
said they're all instances of things
that you can squirt water out of. And I
thought, well, actually, yeah, if you think of the gun as a water pistol, then that could work, right? This is a category
that's not made of instances that share
physical features. Instead, they share a
common function, in this case, squirting
water through them, or not being able to
take them through airport security. This
category, though, exists inside our heads
and in the head of every adult who has
ever flown on an airplane. It's a
category of social reality. Objects belong to this category not because they all share the same physical features, but because we impose a similar function on them by collective agreement. We've all agreed that it is not OK to take a water bottle through security, or a gun, or an elephant. And in fact, it turns out that
most of the categories that we deal with
in civilization are categories of social
reality, whose instances don't
necessarily share physical features, but
we've imposed the same function on those
features by collective agreement. Can you think of any examples? Things that we treat as similar but whose physical features actually vary quite a bit? Money. Exactly. Money is a
great example. So, throughout the course
of human history, and actually even right
now, there's nothing about what humans
have used as currency that defines those
instances as currency. It's just that a
group of people decide that something
can be traded for material goods, and so it can: little pieces of paper, pieces of plastic, shells, salt, big immovable rocks in the ocean, mortgages. And when we remove our collective agreement, those things lose their value, right? So one way of thinking about the mortgage bubble is that the value of mortgages is based on collective agreement, and some people removed their agreement. Anything else? Yeah, that's true, you have to work really hard to accept the collective agreement of driving on the wrong side of the road. Oh, come on. Beauty.
How about citizenship of a country? How
about a country itself, right? If you look at a map from the 1940s or before, the map of the world looks very different. The map of the earth is pretty much the same; the physical features of the earth are more or less the same; but the countries that are drawn are different. So we
could go on and on like this. We could
talk about social rules, like being married; marriage, it turns out, is also a category of social reality. Or the presidency of any country:
people don't have power because they're
endowed by nature with power. They have
power because we all agree that certain
positions give you power. And if we
revoked our agreement, then they wouldn't
have power anymore.
That's called a revolution. So,
emotion categories are categories like
this. Anger and sadness and fear and so
on are categories that exist because
of collective agreement, just in the same
way that we had to impose a function on
the elephant that wasn't there
before, in order for it to belong to
this category. We also impose meaning on
a downturned mouth, a scowl. We impose
meaning on that scowl as anger, right. So
a scowl isn't inherently meaningful as
anger. In this culture, we've learned to
impose that meaning based on our shared
knowledge of anger. In the Trobriand Islands, they impose that meaning on a different face: their stereotype of anger is a wide-eyed, gasping face. And this
is also what allows us to see other
expressive movements as anger, right? So
what we're doing is imposing meaning on
a smile or on a stone-faced stare, or on a
cry as anger in a particular situation.
It transforms mere physical movements
into something much more meaningful,
which allows us to predict what's going
to happen next. So, if you want a machine
to perceive emotions in a human, then it
has to learn to construct categories on
the fly. Perceiving emotions is not a
clustering problem - it's a category
construction problem. And it's a category
construction problem whether you're
measuring facial movements, or bodily
movements, or the acoustics of someone's
voice, or whether you're measuring the
changes in their autonomic nervous
system, or even in
the neural activity of the brain, or even
all of those, right? All of these things
are physical changes that aren't
inherently meaningful as emotions.
Someone or something has to impose
meaning on them to make them meaningful,
right? So an increase in heart rate is not inherently fear, but it can become fear when it's pressed into service to serve a particular function in a particular situation. So, emotions are not built into your brain from birth. They are just built as you need them.
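To sketch the difference in code: clustering groups signals by physical similarity alone, while category construction assigns meaning to the same signal conditioned on context. A toy illustration (the contexts and mappings are invented, not a real model):

```python
# Category construction, sketched: the same physical signal
# (elevated heart rate) gets a different meaning in different
# contexts. The contexts and mappings are invented for illustration.
def categorize(heart_rate_bpm: int, context: str) -> str:
    if heart_rate_bpm < 90:
        return "calm"
    # The signal alone is ambiguous; context does the work.
    meanings = {
        "dinner table": "hunger",
        "doctor's office": "anxiety",
        "finish line": "excitement",
    }
    return meanings.get(context, "unknown: need more context")

print(categorize(110, "doctor's office"))  # anxiety
print(categorize(110, "finish line"))      # excitement
```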
And this is really hard to understand
intuitively since, you know, your brain
categorizes very automatically and very
effortlessly without your awareness. So we need special examples to reveal what our brains are doing, categorizing continuously and effortlessly. What I'd like to do right now is go through one of these examples so I can explain it to you. Here's a bunch of black and white blobs. Tell me what you see. Sorry, a person? A person kicking a soccer ball. Mm-hmm. An octopus.
One-eyed octopus. So, right now, what's
happening in each of your brains is that
billions of your neurons are working
together to try to make sense of this, so
that you see something other than black
and white blobs. And what your brain is
actually doing is it's searching through
a lifetime of past experience, issuing
thousands of guesses at the same time,
weighing the probabilities, trying to
answer the question, "what is this like?"
Not "what is this?" but "what is this like?"
"How similar is this to past experiences?"
And this is all happening
in the blink of an eye. Now, if you are
seeing merely black and white blobs, then
your brain hasn't found a good match, and
you're in a state that scientists call
experiential blindness. So now I'm gonna
cure you of your experiential
blindness. This is always my favorite
part of any talk. Should I do that again?
Now many of you see a bee. And the reason
why is that now, as your brain is
searching through past experiences,
there's new knowledge there from the
color photograph that you just saw. And
the really cool thing is that what you
just saw a moment or two ago, that
knowledge is actually changing how you
experience these blobs right now. So your
brain is now categorizing this visual
input as a member of the category "Bee." And
as a result, your brain is filling in
lines where there are no lines. It's
actually changing the firing of its own
neurons so that you see a bee where there
is actually no bee present. This kind of
category-induced hallucination is pretty
much business as usual for your brain.
This is just how your brain works. And
your brain also constructs emotions
in exactly this way. And here's why it happens: your brain is entombed in a dark, silent box called your skull, and it has to learn what is going on around it in the world via scraps of information that it gets through the sensory channels of the body. Now, the brain is trying to figure
out the causes of these sensations, so
that it understands what they mean and
it knows what to do about them to keep
you alive and well. And the problem is
that the sensory information from the
world is noisy. Ambiguous. It's often
incomplete, like we saw with the blobby bee
example, and any given sensory input like
a flash of light can have
many different causes. So your brain
has this dilemma. And it doesn't just
have this dilemma based on sensory
inputs from the world. It also has this
dilemma to solve regarding the sensory
inputs from your body. So, there are
sensations that come from your body, like
your lungs expanding and contracting and
your heart beating, and there are
sensations from moving your muscles and
from metabolizing glucose, and so on and
so forth. And the same kind of problem that we face in having to make sense of information from the world, we also face in having to make sense of our own bodies, which are largely a mystery to the brain. So, an
ache in your gut, for example, could be
experienced as hunger if you were
sitting at a dinner table. But if you
were in a doctor's office waiting for
test results, that ache in
your gut would be experienced as anxiety.
And if you were a judge in a courtroom,
that ache would be experienced as a gut
feeling that the defendant can't be
trusted. So, your brain is basically
constantly trying to solve a reverse
inference problem, because it has to
determine the causes of sensations when
all it actually has access to are the
effects. And so, how does the brain resolve this reverse inference problem? The answer is: by remembering past experiences that are similar in some way, experiences where physical changes in the world and in the body were functionally similar to the present conditions. It's creating, basically, categories. So,
your brain is using past experience to
create ad hoc categories to make sense
of sensory inputs, so that it knows what
they are and what to do about them. And
these categories represent the causal
relationships between the events in the
world and in the body, and the
consequences, which is what the brain
actually detects. And this is how your brain is wired to work; it's metabolically efficient to work this way. And this is how your brain constructs
all of your experiences and guides all
of your actions. Your brain begins with
the initial conditions in the body and
in the world, and then it predicts
forward in time, predicting what's about
to happen next,
by creating categories that are
candidates to make sense of incoming
sensory inputs. To make them meaningful,
so that your brain knows what to do next.
And the information from the world and from the body either confirms those categories, or it prompts the brain to learn something and try again: it updates, and then makes another attempt at categorization.
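The loop described here can be caricatured in a few lines; the categories, inputs, and similarity measure are all invented for illustration:

```python
# A minimal sketch of the predict-compare-update loop described above.
# Categories, inputs, and the similarity measure are all invented
# for illustration; real brains do this over millions of signals.
def similarity(a, b):
    # Toy overlap measure between two feature sets.
    return len(set(a) & set(b)) / len(set(a) | set(b))

def perceive(sensory_input, past_experiences):
    # 1. Issue candidate categories: "what is this like?"
    candidates = sorted(
        past_experiences,
        key=lambda exp: similarity(exp["pattern"], sensory_input),
        reverse=True,
    )
    best = candidates[0]
    # 2. If the best prediction fits the input well enough, it stands.
    if similarity(best["pattern"], sensory_input) > 0.8:
        return best["label"]
    # 3. Otherwise learn: store the input and try again later.
    past_experiences.append({"pattern": sensory_input, "label": "unlabeled"})
    return "prediction error: updating"

memory = [{"pattern": {"ache", "dinner table"}, "label": "hunger"}]
print(perceive({"ache", "dinner table"}, memory))  # hunger
```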
So, emotions are not reactions to the world.
They are actually your constructions of
the world. It's not like something
happens in the world and then you react
to it with an emotion. In fact, what's
happening is that your brain is
constructing an experience, an episode, or
an event, where what it's trying to
do is make sense of or categorize what
is going on inside your own body, like an
ache, in relation to what's happening in
the world, like being in a doctor's
office. So, emotions are basically brain
guesses that are forged by billions of
neurons working together. And so, the
emotions that seem to happen to you are
actually made by you.
And categorization is also how your
brain allows you to see emotions in
other people. So, your brain remembers
past experiences from similar situations
to make meaning of the present, you know,
to make meaning of the raise of an
eyebrow, or the movement of the mouth, and
so on. So, to perceive emotion in somebody
else, what your brain is actually doing
is it's categorizing that person's
facial movements, and their body
movements, and the acoustics of their
voice, and the surrounding context, and
actually stuff that's happening inside
their own bodies, all conditioned on past
experience. So, even though when we're talking to each other we're mainly looking at each other's faces, aware of the movements of each other's faces and maybe of the tone of voice, our attention is not given to the rest of the sensory array that the brain has available, including what's going on inside your own body. The inside of your own body is a context that you carry around with you everywhere you go, one that is involved in every single action and experience that your brain creates, and you are largely unaware of it. And this is how a scowl can
become anger or confusion or
indigestion or even amusement;
so the emotions that you seem to detect in other people are partly made inside your own head. So, when one human
perceives emotion in another person, she
is not "detecting" emotion. Her brain is
guessing by creating categories for
emotion in the moment. And this is how a
single physical feature can take on
different emotional meanings in
different contexts. So, for a machine to
perceive emotion, it has to be trained on
more than stereotypes. It has to capture the full high-dimensional detail of the context, not just measure a face, or a face and a body, which are inherently ambiguous without the context. So, perceiving
emotion means learning to construct
categories using the features from
biology, like faces and bodies and brains,
but in a particular context. And I want to point out that I'm using the word "context" pretty liberally here, because context often includes the actions of other people. We are social animals, and other humans make up important parts of our context, which suggests that when you want to build AI to detect emotion in a person, you might consider also measuring the physical changes in the people who are around that person, because those can give you a clue about what the physical changes in the target person really mean. Measuring the features of other people, that is, their physical changes and actions that are contingent on the biological changes in the person of interest, is an extension of the idea of context which is really important.
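As an illustration of how wide that net might be cast, here is a hypothetical multimodal observation record; every field name is invented for illustration, not a real API:

```python
# A hypothetical multimodal observation for emotion perception.
# Every field is invented for illustration; the point is that the
# face is one small slice of a much higher-dimensional context.
from dataclasses import dataclass, field

@dataclass
class Observation:
    face_action_units: dict   # e.g. {"AU4_brow_lowerer": 0.7}
    voice_acoustics: dict     # e.g. {"pitch_hz": 180, "energy": 0.4}
    body_posture: str         # e.g. "leaning forward"
    scene: str                # e.g. "doctor's office"
    heart_rate_bpm: int       # an interoceptive proxy
    bystander_reactions: list = field(default_factory=list)  # other people as context

obs = Observation(
    face_action_units={"AU4_brow_lowerer": 0.7},
    voice_acoustics={"pitch_hz": 180, "energy": 0.4},
    body_posture="leaning forward",
    scene="exam room",
    heart_rate_bpm=105,
    bystander_reactions=["nurse steps closer"],
)
```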
And in the last few minutes, I'm going to switch gears, from perceiving emotion to asking whether it's possible to build machines that can actually experience emotion the way that humans do. This is a question that interests people who work in AI, often because they're interested in questions about empathy. And so, if
emotions are made by categorizing
sensations from the body and from the
surrounding context using past
experience, then machines would need all
three of these ingredients, or something
like them. And so, we're going to just
take this really quickly one at a time.
So, the first is past experience. Can
machines actually recall past experience?
Well, machines are really great at
storage and retrieval.
Unfortunately, brains don't work like a file system. Memories aren't retrieved like files; they are dynamically constructed in the moment, and brains have this amazing capacity to combine bits and pieces of the past in novel ways. Brains are generative. They are information-gaining structures: they can create new content, not merely reinstate old content, which is necessary for constructing categories on the fly. To my knowledge (which might be out of date), there are no computing systems powered by dynamic categorization, systems that can create abstract categories by grouping together things that are physically dissimilar because, in a particular situation, they all serve a similar function. So, an important challenge for machines to experience emotion is developing computing systems with that capability.
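To caricature the distinction between file retrieval and generative remembering in code (a toy sketch, not a claim about any real memory system):

```python
import random

# File-system-style memory: retrieval returns exactly what was stored.
store = {"episode_42": ["scowl", "deadline", "tight chest"]}
recalled = store["episode_42"]  # a verbatim copy, nothing new

# Brain-style remembering, caricatured: recombine fragments of many
# past episodes into a construction that was never stored as such.
episodes = [
    ["scowl", "deadline", "tight chest"],
    ["smile", "finish line", "racing heart"],
    ["frown", "funeral", "heavy limbs"],
]
reconstruction = [random.choice(ep) for ep in episodes]
print(reconstruction)  # e.g. ['deadline', 'racing heart', 'frown']
```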
The second ingredient is context. Computers are getting better and better at sensing the world; there are advances in computer vision, speech recognition, and so on.
But a system doesn't just have to detect
information in the world. It also has to
decide which information is relevant, and
which information is not, right? This is
the "signal vs. noise" problem. And this
is what scientists call "value." Value is not something that's detectable in the world; it's not a property of sights and sounds, or of the information from the world that creates them. Value is a function of that information in relation to the state of the organism or system that's doing the sensing. So,
if there's a blurry shape in the
distance, does it have value for you as
food, or can you ignore it? Well, partly
that depends on what the shape is, but it
also depends on when you last ate, and
even more importantly, the value also
depends on whether or not that shape
wants to eat you. And in terms of brain evolution, brains didn't start off with systems that allow creatures to compute value; those evolved in concert with sensory systems, with the ability to see and hear and so on, for exactly this reason. Evolution basically gave us brain circuitry that allows us to compute value, and that circuitry also gives us our mood, or what scientists call "affect": simple feelings of
feeling pleasant, feeling unpleasant,
feeling worked up, feeling calm. Affect, or mood, is not emotion. It's just a quick summary of what's going on inside your own body, like a barometer. Affect is a signal that something is or isn't relevant to your body, that it does or doesn't have value to you. And so, for a machine to experience emotion, it also needs something that allows it to estimate the value of things in the world in relation to a body.
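A toy sketch of value as a relation between a stimulus and the internal state of the sensing system, rather than a property of the stimulus itself (everything here is invented for illustration):

```python
# Value is computed relative to the state of the system doing the
# sensing, not read off the stimulus itself. Invented toy example.
def value_of_shape(shape: str, hours_since_meal: float, is_predator: bool) -> str:
    if is_predator:
        return "high negative value: avoid"      # it wants to eat you
    if shape == "fruit tree" and hours_since_meal > 6:
        return "high positive value: approach"   # you're hungry
    return "low value: ignore"

# The same blurry shape, different internal states, different value.
print(value_of_shape("fruit tree", hours_since_meal=8, is_predator=False))
print(value_of_shape("fruit tree", hours_since_meal=1, is_predator=False))
```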
Which brings us to the third ingredient: brains evolved for the purpose of controlling and balancing the systems of a body.
Brains didn't evolve so that we could
see really well, or hear really well, or
feel anything. They evolved to control
the body, to keep the systems of the body
in balance. And the bigger the body gets, the more systems it has, the bigger the brain gets. So, a disembodied brain has no
bodily systems to balance. It has no
bodily sensations to make sense of. It
has no affect to signal value. So a
disembodied brain would not experience
emotion. Which means that for a machine
to experience emotion like a human does,
it needs a body, or something LIKE a body:
a collection of systems that it has to
keep in balance, with sensations that it
has to explain. And to me, I think this is
the most surprising insight about AI and
emotion. I'm not saying that a machine
has to have an actual flesh-and-blood
body to
experience emotions. But I am suggesting
that it needs something like a body, and
I have a deep belief that there are
clever engineers who can come up with
something that is enough like a body to
provide this necessary ingredient for
emotion. Now, these ideas and the science behind them can be found in my book, "How Emotions Are Made: The Secret Life of the Brain," and there's also additional information on my website. And even though, strictly speaking, I'm not throwing tons of data at you, I do always like to thank my lab at the end of talks. They're the ones who actually do all the really hard work; scientists like me just get to stand up here and talk about it. So I want to thank them as well, and thank you for your attention, and I'll take questions.

I am wondering how someone who is, say, blind from birth perceives emotion, because they cannot depend on visual cues, whether facial expression or body language. I'm guessing they usually go off vocal tones or lack thereof. Have you looked into their accuracy at perceiving emotions, and is it better or worse than that of people who rely on visual cues?

So people who are
born congenitally blind have no difficulty experiencing emotion, and they have no difficulty perceiving emotion through the sensory channels they have access to, because their brains work largely in the same way that a sighted person's brain works. At birth, the brain is collecting statistical patterns; it's just that vision isn't part of the pattern. Someone who is congenitally blind is learning patterns that include changes in sound pressure that become hearing, changes in pressure on the skin that become touch; they have taste; they have sensations from the body which become affect. So they can do multimodal learning just like the rest of us, and they can learn to experience and express emotion and perceive it through the channels they have access to. What's
really interesting is that when adults who are blind because of, say, cataracts have those cataracts removed, for the first time they can see, or they should be able to see, but it actually takes them a while to learn to see. And when they finally do, what they say, if you talk to these people, is that they feel like they're always guessing what faces mean and what body postures mean. They find faces in particular hard. For example, there's one person, Michael May, who's been studied really extensively over a number of years; even a couple of years after his corneas were replaced (he had corneal damage), he was still consciously guessing at whether a face was male or female before someone spoke. It was really hard for him to do, and he experienced his vision as separate from everything else, like a second language that he was learning to speak, which had no affect to it. But to answer your question: we could ask, do people who are congenitally blind actually make facial expressions the way that a sighted person does? And the answer is, they don't make the stereotypic expressions when they're angry or sad or whatever, but they do learn to make deliberate movements in a particular way. For example, there are studies showing that when congenitally blind athletes win an award and they know they're being filmed, they will make body movements that indicate being really thrilled; but they're doing it because they've learned it, in the same way that if you test a congenitally blind person on the meaning of color words, their mapping of color words is largely the same as a sighted person's, because they've learned from the statistical regularities in language which words are more similar to each other and which ones aren't. So their abilities at emotion perception and emotion expression largely look the same as a sighted person's, without the visual component. What's really interesting is that people who are congenitally deaf, who tend to learn mental-state language and develop concepts for mental states later, are also delayed in their ability to perceive emotion in other people. So that literature suggests a coupling between emotion words and the ability to learn to form emotion categories in childhood.
So you said an essential component in recognizing an emotion is the context. (I would never say "recognizing," but yeah.) If we didn't have the context, but we could monitor whatever is happening inside a person's body and brain really well, would we be able to recognize emotions? What specifically would it take? What would we want to monitor?
Yeah, so it's interesting. When I was originally thinking about giving this talk, I thought I might start with machine learning attempts to identify emotion from patterns of neural activity. And it turns out that in a given study, if you show people films, say, and try to evoke emotions that way, you can actually build a pattern classifier that can distinguish anger from sadness from fear, meaning you can distinguish when people are watching films that presumably evoke anger versus sadness versus fear. The problem with those studies is that the classifier can't be used in another study; it doesn't generalize. What you're building is a set of classifiers that work in a specific context. Let me say it this way: if you have the same subjects in the same study watch movies, so you evoke anger by a movie and you also evoke anger by having them, say, remember a prior anger episode, you can classify the emotions and distinguish them from one another, and you get a little bit of carryover from one modality of evoking to another. But if you go to a completely separate study, the patterns look completely different, and this is true across hundreds of studies. For example, I published a pattern classification paper where we used 400 studies and developed classifiers based on this meta-analytic database, and those classifiers are not successful at classifying any new set of instances. They show really good performance in-sample; we used a leave-one-out method, a multivariate Bayesian approach; there are no problems with the statistics. The issue is that when scientists do this, they believe that what they're discovering in these patterns is a literal brain state for the emotion, the literal brain state for anger, and then they think it should generalize to every brain, to every instance of anger, and the patterns usually don't generalize outside of their own studies. This is also true for physiology: we just published a meta-analysis where we examined the physiological changes in people's bodies, like heart rate changes, breathing, skin conductance, and so on, and you see that these physical measures can sometimes distinguish one emotion category from another in a study, but they don't generalize across studies, and in fact the patterns themselves really change from study to study. When you look at it in a meta-analytic sense, it looks like for all emotions, heart rate could go up or go down or stay the same depending on what the situation is. So far, nobody has made a really high-dimensional attempt at this, meaning they haven't tried to measure the brain and the body and the face and aspects of the context together. That's actually what I think needs to be done. I think this is a solvable problem; we just have not been going about it in the right way, and I think this is a real opportunity for any company that is serious about doing this.
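As a toy illustration of that failure mode, with synthetic data and invented pattern shifts (not her actual analysis): a classifier trained on one "study" scores well within it and collapses on another study where the emotion was evoked differently:

```python
# Toy illustration of classifiers that work within a study but fail
# to generalize across studies. Data are synthetic; the "patterns"
# for each emotion are invented and shift between studies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def make_study(anger_mean, sadness_mean, n=200):
    X = np.vstack([
        rng.normal(anger_mean, 1.0, size=(n, 5)),
        rng.normal(sadness_mean, 1.0, size=(n, 5)),
    ])
    y = np.array([0] * n + [1] * n)  # 0 = anger, 1 = sadness
    return X, y

# Each study evokes emotion differently, so the signatures differ.
X_a, y_a = make_study(anger_mean=+1.0, sadness_mean=-1.0)
X_b, y_b = make_study(anger_mean=-1.0, sadness_mean=+1.0)

clf = LogisticRegression().fit(X_a, y_a)
print("within-study accuracy:", clf.score(X_a, y_a))  # high
print("cross-study accuracy:", clf.score(X_b, y_b))   # near zero
```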
I love the way you mentioned in the book, and in the talk as well, how we perceive emotions based on context: we look at the context and then we infer emotion. One of the examples you have in the book, and here as well, was Serena Williams winning a Grand Slam.

And you have Jim Webb. I switched it out; people were starting to say, oh, that's Serena Williams.

Oh, okay. Well, that's right, yeah. But there's something that is troubling to me, at least in that example.

Well, I think that's certainly possible, and what I would say to that, though, is that
there are studies, particularly by Hillel Aviezer, who's actually done work on this. I published the picture of Serena Williams in 2007, I published an example, and Hillel came out with a great set of experiments in 2008 and then again in 2012, and he has continued since. He has people's reports of their subjective experience and he has their facial movements; in fact, there are meta-analyses which have the subjective reports of people and their facial movements, and sometimes also the reports of people interacting with the people whose faces were measured, and there's no evidence that the variability is due to a series of quick emotions being evoked over time. But I want to back up one step and say this. When you ask, well, maybe Serena Williams really is in a state of anger in that moment (or in that case it actually looks more like fear, or terror), when you say "really," that implies that there's some objective criterion you could use to measure the state Serena Williams is really in, and there is no objective criterion for any emotion that's ever been studied, ever. What scientists use is agreement, collective agreement, essentially. You can ask: does the face match her report? Does the face match somebody else's report? Do two people agree on what they see? You're using all kinds of basically perceiver-based agreement, which is basically consensus, because no one has ever found a ground truth for emotion that replicates from study to study. So part of me wants to say I can't even answer your question, because I think it's not even a scientific question that's answerable; but we can answer it in other ways, by looking at various forms of consensus. And while I can't say anything about Serena Williams and what she experienced, I can say that in other studies it's very clear that people absolutely scowl when they are not angry. My husband (this is my husband Dan Barrett, who works for Google; sorry, honey, I'm gonna out you) gives a full-on facial scowl when he is concentrating really hard. It was only after I learned that, when I was telling my students, like, can you believe it, that they said: can we believe it? We experience it every time we give a presentation in front of you. So I'm sitting there paying a lot of attention to every single thing they say, and they think, oh my god, she hates it. The whole emotional climate in my lab changed the moment I realized that. So that's an anecdote, but it's an anecdote that reflects what is in the literature, which is that people are making a variety of facial movements. I'm not saying it's random; I'm saying there are patterns there that we haven't really yet detected, and I think it's in part because we are measuring individual signals. We think we're doing really well if we measure the face and the body, or the face and acoustics, or the face and maybe heart rate, but we pick up two channels instead of doing something really high-dimensional. I'm not saying there's no meaning there; if that were true, you and I couldn't have a conversation right now. I'm saying that it's probably something high-dimensional, and it might be quite idiographic, meaning different brains may have the capacity to make a different number of categories. And that's also something I discuss in my book, actually.

So when you listed all
the prerequisites for emotion forming, I was thinking: a lot of vegetarians say all animals have feelings, have this ability to feel emotion, and a lot of meat eaters are like, no, no, that's impossible, they don't. Do you have any opinions?

Oh yes, here's my
opinion. I think everybody has to stop calling affect emotion. Many, many problems disappear, they just completely dissolve, when we understand that every waking moment of our lives there's stuff going on inside our bodies, and we don't have access to every little change in our bodies that sends sensory information to the brain. If we did, we would never pay attention to anything outside our own skins ever again. So instead, evolution has given us affect: we sense what's going on inside our own bodies by feeling pleasant or unpleasant, feeling worked up or feeling kind of calm, feeling comfortable or uncomfortable. That's not emotion; that's affect, or mood. It's with us always; every waking moment of your life you have some affect, there are affective features to your experience. And it's very likely also true of non-human animals. The circuitry is similar enough that I think you could go all the way down to certainly all vertebrates, and I would even guess that some invertebrates, maybe all invertebrates, I don't know, even insects potentially could have affect, although I might draw the line at, like, flies or something; but recently there was a study that came out suggesting maybe they do have affect. So my feeling about this is twofold. One is, I think we have to stop conflating affect and emotion. Affect is just with you, always, even when you experience yourself as being rational, even when you experience yourself as just thinking or just remembering; it's just that when affect is super strong, our brains explain it as emotion. Once we make that distinction (maybe you could think of affect as a feature of emotion; it's actually a feature of consciousness), then I think we can say without hesitation: we don't know for sure whether non-human animals feel affect, but they probably do, and we should probably treat them as if they do. That solves a lot of problems. It actually doesn't really matter, from a moral standpoint, whether an animal feels emotion; it matters whether they can feel pleasure and pain. That's enough, actually. It's an interesting scientific question whether or not their brains can create emotion, but that's a whole different conversation. I think the answer to your question isn't really about emotion; it's about affect. And there, I think it's really obvious: the smart thing is to do things where you do the least amount of damage if you're wrong. And that means including animals in our moral circle: if you just assume they can feel pleasure and pain, that solves a lot of problems.

Yep. Thank you so much for
your time. So you mentioned at the end, to answer the question of whether machines can experience emotion, three things, and the body was one of them. And earlier on you also mentioned that one purpose of creating emotions is to know what to do next. So my question is: does a being without a body, like a machine, really need that body element if the purpose of that being is different from just knowing what to do next? Can we take the body out of the three requirements based on a different purpose of that being?

That's a great question. That is a great question.
So, can you give me an example, to help me?

I don't have an example. By the way, I'm thinking, when I hear "machines," you're modeling all of this based on humans: we have a purpose for our emotions, we are creating them, maybe at the beginning for survival, maybe for different social elements. But if you take the body away, we can still have a brain, an artificial intelligence without a body, which is a different kind of being. Therefore I'm questioning that model of three things needed to create emotion.

Yep. Well, here's something you would need to have: something that could tell the machine what it needs to pay attention to in the world and what it can ignore; so, value. Let me back up
and say it this way: I don't know how else to think about it except in organic terms. If you look at brain evolution, and if I say it really simply, super simplistically, glossing over a ton of detail: organisms first developed a rudimentary motor system with just a tube for a gut, and that's it. They used to just float around in the sea and filter food. It wasn't until the Cambrian explosion, when there was a lot of oxygen and other things, and an explosion of diversity of life, that predation developed. And predation was a selection pressure for the development of two things. First, sensory systems: these little floating tubes had no visual system, no auditory system, no sensory systems at all; they didn't really need them. And they also had no internal nervous system to control any systems inside, because they didn't really have any systems inside except a motor system and a gut. So they had to develop sensory systems, and whether they were predator or prey (most predators are also prey, right?), they had to develop distance senses so they could detect what was out there. But they also had to figure out what was meaningful and what wasn't, what was valuable, because it's expensive to run a system: the two most expensive things that any organic system can do are move and learn. And so the development of the systems of the body served the purpose of helping to determine value to the organism. Now, it turned out along the way that those systems also developed the capacity to send sensations to the brain, which had to be made sense of. If you completely demolish that, and you say, okay, you have a machine whose purpose isn't to sense things in the world and make sense of them so that it can predict what to do next, then maybe you don't need a body. But then I don't even know what you're talking about; you'd have to give me an example for me to reason through it in terms of the energetics.

I wonder
if maybe "body" is throwing me off, because an AI's purpose can also be to survive, to exist, and, saying it very simplistically, it needs electricity, or its connection to the cloud or something, but that can be its body.

What's its function? What does it do?

What do you mean?

You plug it in, and it doesn't just sit there; it does something. What's its function? What does it do?
Do you mean, what does artificial intelligence do?

Well, you're saying, okay, it gets its energy from a plug; I get that. But what is it actually attempting to recognize or do? What's its function?

We can use it for, I don't know, some industrial application, or maybe a self-driving-car AI, for example.

OK, so it's driving a car. It's driving a car for you; it has to sense things in the world, right?

And then the question is, can it experience emotion? In your three-part model I agree with two of the three, and I was questioning the body; maybe the body is its survival, to create that value.

Here's what I would
say: okay, let's take this as an example (I don't know, I'm just making this up off the top of my head). Sure, you can just plug it in and it can get its energy from an electrical outlet. But you still want an efficient machine that uses electricity efficiently; otherwise it would be more expensive than it needs to be. And that means you'd want it to do things predictively, because that's actually more energy-efficient. That's why brains are structured to work this way; it's not because the energy source happens to be glucose and other organic sources. It's the same principle. In fact, the whole idea of predictive coding, which is what I'm talking about, came from cybernetics, and then researchers who study humans were like, oh wait a minute, that could actually be really useful for explaining things here. So you'd still want it to be super efficient. Presumably, if it's driving a car, it has to determine what it has to pay attention to and what it doesn't; it can't be frivolous in its energy use. So it's got to be predictive, it has to not pay attention to some things, and it probably has a bunch of systems that it has to keep in balance so that it's working efficiently. So far, there's nothing in there that violates anything I've said. I was trying to be really careful to say that when I say a body, I don't literally mean a flesh-and-blood body. I mean that one of your brain's basic jobs is to keep the systems of your body in balance, and that requirement, which is called allostasis, forces a lot of other things to be true about how the system works. If you want AI to do anything like a human, it has to be put under the same selection pressures as a human, not literally with flesh and blood. If, however, you're talking about a function that a human can't do, or that isn't relevant to humans, then nothing I've said is probably relevant to you at all, because we're only talking about humans. But could a computer that drives a car feel emotion? Maybe, if it had sensory inputs that it had to make sense of. But I don't know that I would call that emotion, because for humans, the brain makes a distinction between the sensations from the body and the sensations from the world. If you didn't have sensations from your body, you wouldn't have affect, and so it just wouldn't be the same. But I don't know; maybe I can't really answer.

Can I try to
offer one idea that might combine you
both?

Yeah. So what if emotion is just a kind of heuristic for how your body feels? You don't have enough computing power to process everything, so you summarize it, and a machine in that regard would need the same heuristics. A heuristic like "something has gone wrong and my views are off" could in a way be seen as an emotion, and for us it would be like: something is off, I feel pain; you don't really know where the pain is, but it's a signal for deeper investigation. And one of the causes might be, I mean, I don't believe our brain is a pinnacle of engineering. Correct me if I'm wrong, but let's say the firing rate of our neurons is like a hundred hertz, so the bandwidth is really limited, and the only thing that saves us is that you have like a hundred billion of them. A machine might not need that, because its frequencies are higher, I guess.

Right, but I think, maybe I'm wrong, it comes down to a philosophical question. OK, so a machine that is driving a car would have sensory
inputs that it has to make sense of, and it would have to do it predictively and all of that. It would have to do ad hoc categorization (well, maybe not, but I think that would probably be the efficient way to do it). So it's making categories and it's perceiving things; but when does that become an emotion and when doesn't it? You could also ask that of humans, right? Nobody asked me this question, but: what is the difference between an emotion category and any other kind of category that a human can develop, any other category of this sort, which is ad hoc and of social reality? And the answer is: nothing. Nothing is different. So in some ways it's not a question I think science can answer, because in this culture we've drawn a boundary and said, well, these things are emotions and those things aren't, they're thoughts; and in half the world people don't make that distinction. So is it possible to develop categories, to do ad hoc categorization, to do it predictively, to make sense of sensory input from the world, without a body? Sure, you could do it without a body, but then it probably wouldn't be what we would call human emotion, or what feels to us like human emotion. Of course, it could be similar, I guess, to the experiences that our brains can make. But I don't know; I have to think about it more, actually. My iPad is speaking to me.

Thanks, Lisa, for a great
presentation. I had a follow-up question. Do we believe that if the human brain and consciousness could process all of the interoceptive signals, everything from the world, all the percepts, in real time, so there's no bandwidth issue, suppose the human brain could just process everything: the first question is, do you believe we would still have affect, that sort of simple summary of the state of where we are, if we could represent every piece of information coming in? And the follow-up question, depending on the answer, is how that relates to our notion of emotional experience.

Let me think about this for a second. So suppose we could sense everything, so our wiring changed, because the reason we can't is that we don't have the wiring to do it. Would we still have affect? I think yes, I think we still would, and
I'll tell you why: because of the way the brain is structured. It's structured to do dimension reduction, compression of information. If you were to take the cortex off the subcortical parts of the brain, lift it off and stretch it out like a napkin, and look at it in cross-section, what you would see is that you go from primary sensory regions, like primary visual cortex or primary interoceptive cortex (which is where the information from the body goes), where there are a lot of little pyramidal cells with few connections, and the information cascades to the front of the brain, where there are fewer cells which are much bigger, with many, many connections. What the brain is doing with all sensory inputs is compression, dimension reduction. That's how multimodal learning happens; that's how really all learning happens, essentially. It happens in vision, it happens in audition. So I think even if we had higher-dimensional access to the sensory changes in the body, given the way the cortex is structured, we would still have affect, which is basically just a low-dimensional representation of the stuff going on inside your body.
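A minimal sketch of that last idea, treating affect as a low-dimensional summary of many bodily signals (the random data and the choice of PCA are illustrative, not a claim about cortical wiring):

```python
# Affect as dimension reduction, caricatured: compress many noisy
# bodily signals into a 2-D summary (think valence/arousal).
# The signals and the choice of PCA are illustrative only.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 1000 time points x 50 bodily channels (heart, lungs, gut, ...)
body_signals = rng.normal(size=(1000, 50))

affect = PCA(n_components=2).fit_transform(body_signals)
print(affect.shape)  # (1000, 2): a low-dimensional running summary
```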
[Applause]