>>My name is Elie Bursztein and today we're going to talk about SHA-1. It has been a
joint collaboration with many people. The 5 people who were helping were Marc
Stevens from CWI, Pierre Karpman from INRIA, Ange Albertini, Yarik Markov and Alex
Petit-Bianco, who are also from my team. Today I'm going to tell you about how we created
this SHA-1 collision. Before that I will tell you a little bit of the story of
how you do cryptanalysis and break hash functions. And then we'll talk about what happens in
the post-collision world. We had a little bit of a problem with the calibration of the video
projector so I tried to shrink the slides down a little bit. Can everyone see them correctly?
Yes? [off-mic comment] All right. Awesome. So let's get started. To make sure everyone is
on the same page, let me briefly recap what a hash function is. A hash function, or
cryptographic hash function, has 2 unique properties. The first is that when you have 2 files,
if they are different, they hash to different digests, or short hashes. So basically a
short hash is a fixed-length string which compresses your arbitrarily long file into
this unique identifier, and people use that as a unique identifier for files. The other
thing, which is obvious for many of you, is that the hash function is one-way. You can hash
a thing but you cannot go back. And you cannot learn anything about the file by just looking at
the hash. Those are the 2 main properties of a hash function and a lot of people depend on them.
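As a small illustration of the two properties just described, here is a minimal Python sketch using the standard `hashlib` module. The helper name `sha1_hex` and the sample inputs are assumptions made for this example, not something shown in the talk.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


# Inputs of very different sizes all map to a fixed-length identifier.
short_digest = sha1_hex(b"abc")
long_digest = sha1_hex(b"abc" * 1_000_000)

# Changing a single character gives an unrelated-looking digest.
near_digest = sha1_hex(b"abd")

if __name__ == "__main__":
    print(short_digest)
    print(long_digest)
    print(near_digest)
```

The digest is always 160 bits, whatever the input size, which is why it is convenient as a file identifier. Nothing in the digest lets you read the input back out.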
For example, hash functions are used in document and software signing. They are used,
for example, for Windows Update, Firefox updates and Safari updates, to
make sure we can verify the signature. They are also used in the same way for HTTPS
certificates. They are involved in ensuring that SSL certificates are
signed by a certificate authority and are therefore valid. They are used in version control, as you will
see later in the talk, and finally they are also used for backup integrity, to check which
files have been backed up and whether a file has been backed up correctly. They're also used in a slew of
other software, including databases, some file systems and so forth. And this is a huge
variety of software, which makes it very hard to understand all the consequences of what
happens when you have a collision. I'll give you an example later in the talk of unforeseen
consequences when you try to play with collisions. So, what we're going to talk about today.
We're going to talk about 3 things. First, as I said, we're going to look at how we
attack hash functions. Then we're going to discuss how we found the collision,
what worked and what did not. And finally, we'll discuss the post-collision world, and more
specifically how you deal with legacy software which is using SHA-1, and what
the future holds for hash functions, where we're going. So, before getting started,
one thing which is very important to mention is that the SHA-1 collision is just the final
straw of a very, very long line of research, and we are standing on the shoulders of giants. Among all
the people who contributed, and there are a lot of people who did over the last 15 years, 2
stand out as giants, and they are not me. They are Wang, who was the first one who, in 2005, came
up with a way to attack both MD5 and SHA-1. She's the one who devised the method
that has been refined and finally led to the collision. The second one is my coauthor, Marc
Stevens, who for the last 10 years was actually the one who kept going at it and, despite
other people dropping out, decided to hold strong and basically carry out all the
work until the end. So they are the 2 who deserve most of the credit for the work. I was
just lucky to be there and be able to give them a helping hand. Actually, a
funny anecdote: this was my second attempt at SHA-1. The first time, back in
2009, 2010, we looked briefly at the idea of trying to break SHA-1, in which case we
were like, no way, that's way too complicated, it's too resource intensive, we can't do it,
and we gave up. And I think a lot of people gave up as well, because it's,
except to Marc, an impossible task. So I'm glad we were able, as Google, to help provide
the resources. So if you take one thing from the talk, it's that Marc Stevens is an
awesome guy and Wang was the genius behind the attack that got it all started. So
attacking hash functions: there is quite a bit of confusion about what the
different types of attack are. So I'm going to sum them up. But before that, a quick show of
hands: how many of you know what a collision attack is? Quite a few, okay, okay. How many of
you know what a preimage attack is? Way fewer people, right,
okay. How about a second preimage attack? Okay, so to make sure we are all on the same
page we're going to briefly recap and see a bunch of diagrams; it's going to be
very quick. So a collision attack, which is the topic of today, is basically this: the attacker has both
files under control, and the goal is to create 2 files which hash to the same thing. So basically,
you are violating the uniqueness principle, which is that 2 different files should hash to different
values. That is a collision attack. We also have another type of attack which is called a
preimage attack. In that case there is an unknown file and you know the hash, and
the goal of the attacker is to be able to engineer a second file which would hash to the same
thing. Then the second preimage attack, which is the one that confuses most people, is where
you know a file, let's say an SSL certificate, and as an attacker you create a file which is a
doppelganger in the sense that it actually has the same hash. So far so good? Yes. Awesome.
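To make the difference between these attack goals concrete, here is a toy Python sketch. It is not an attack on SHA-1: it truncates SHA-1 to 16 bits (the hypothetical `tiny_hash`) so that plain guessing finishes in well under a second. The function names and message formats are assumptions made for this illustration.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of SHA-1. Far too short to be secure."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits: int = 16):
    """Collision: the attacker picks BOTH messages freely."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(target: bytes, bits: int = 16):
    """Second preimage: one message is fixed in advance, the attacker must match it.

    A preimage attack is the same search when only tiny_hash(target) is known.
    """
    goal = tiny_hash(target, bits)
    for i in count():
        msg = b"forged-%d" % i
        if msg != target and tiny_hash(msg, bits) == goal:
            return msg, i + 1


if __name__ == "__main__":
    a, b, tries = find_collision()
    print("collision after", tries, "tries:", a, b)
    forged, tries = find_second_preimage(b"original document")
    print("second preimage after", tries, "tries:", forged)
```

For an n-bit hash, the collision search is expected to need on the order of 2^(n/2) guesses, while matching a fixed target needs on the order of 2^n. That gap is why collisions are the first thing to fall in practice.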
Okay, so how do you create a collision attack? That's the one where basically we have a lot of
practical results. Preimage and second preimage attacks don't have practical results for the
main hash functions, SHA-1 and also MD5, and of course not for newer ones like SHA-256. So we're going
to focus on the collision attack, which we can do in practice. So, collision attacks are not
brute-force attacks. That's the first thing which is important to understand: you cannot brute
force your way to a collision. It's impractical. Even if you were to use GPUs, and
even if you were to use a ton of GPUs, you can't. To give you an idea, we think it's about 12
million years of computation with a GPU to get to a collision. Not going to happen.
So, if you can't do that, what are you going to do? You are going to use something which
is called cryptanalysis. The idea of cryptanalysis is that we study how the function
diffuses the bits and how it scrambles them, in order to find biases we can exploit to reduce
the size of the computation to the point where we can finally do it. So, I'm going to give you a
primer on cryptanalysis. I'm going to skip a ton of detail. It's just for you to get an
idea; if you want more detail, read the research paper. Not me. [laughter] All right, so here is a
view of, I'm missing a slide here. Okay, okay, we're missing a slide, that's fine. Okay. So
here is the unrolled version of the SHA-1 compression function. This SHA-1 compression function
is not the one you're going to find on Wikipedia. If you go to the Wikipedia page you get the
simpler version. The reason why we use this one for the presentation is because when we do
cryptanalysis we need to understand very precisely each of the steps of the function.
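For readers who want to see the 80 steps in code, here is a minimal Python sketch of the textbook SHA-1 compression function, chained over 64-byte blocks with the standard padding. It follows the public SHA-1 specification, not the slide shown in the talk; the names `compress`, `rotl`, `IV` and `sha1` are choices made for this example.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(state, block: bytes):
    """One call of the SHA-1 compression function: 80 steps over a 64-byte block."""
    assert len(block) == 64
    # Message expansion: 16 input words are stretched to 80 words.
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        # Non-linear function f plus modular additions do the scrambling.
        temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
        a, b, c, d, e = temp, a, rotl(b, 30), c, d

    # Feed-forward: add the step output back into the incoming state.
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1(message: bytes) -> str:
    """Pad the message, then chain the compression function block by block."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = IV
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return "".join(f"{v:08x}" for v in state)


if __name__ == "__main__":
    print(sha1(b"abc"), hashlib.sha1(b"abc").hexdigest())
```

Because each block's output state is the only thing passed to the next block, two different inputs that reach the same state will stay identical for whatever follows.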
This is what happens when you have a block. So you chop up your file into small blocks
and you hash them, and then you pass the result to the next one, and the next. It's something called a
Merkle-Damgard construction. So basically, this is the function which does the
scrambling. All right, this thing is 80 steps long and you see them on the screen. Here is a
representation of them, and we study how the bits flow through the thing, and the things which
provide the scrambling are the little purple boxes: F is a non-linear function, plus
is a modular addition, and so forth. So, how does cryptanalysis work? The idea
is that we study message differential paths, so we try to understand, for a
group of messages, how the bits are moving and how they are being swapped around
or flipped and so forth. And those are the red dots you see on the screen. So when we have this
understanding, it gives rise to what we call an equation system, and for SHA-1 we know
exactly what happens for the first 16 steps. Remember, there are 80 of those, so for the first 16
it's basically solved and we understand what happens. So we can understand how to reduce the
computation. Now we have reduced the computation by 16 steps. Then we use a bunch of
tricks, including boomerangs, neutral bits and so forth, which basically try to
counteract the chaos and the entropy creeping up, to the point of pushing the
computation down by another 8 steps, which is up to step 24. And then, well, we
don't know. And the other part is this huge computation,
the part from 24 to 80, where we cannot really understand what happens
and the equation system is too large, so that's where we use
GPUs to actually compute all the possibilities, or a huge fraction of those, because we cannot
understand what happens there. So, how do you create a collision? Well, we use the idea
from Wang, which is the idea of using 2 blocks. So, you have the
beginning of the file, which we call the prefix, and this is whatever you want, and then we
have a first block pair which we call a near-collision. So, what is a near-collision? A near-collision
is 2 blocks which basically are very close to each other, for some definition of closeness.
Basically, imagine that those 2 are good candidates: they are different but they are
sufficiently close that we have a good belief that the difference is
going to be resolved with a second block. So basically we take the output of those 2
blocks, and then we have the second block, which basically cancels out the difference between those 2.
That's the collision, and you wind up with the same output. So at that point the
left and the right will have exactly the same output and you can put whatever
you want after that, because the collision has been resolved. That is the basic idea of a
collision, and the technical part is that we use 2 blocks. All you have to really remember is we
have a prefix, which is basically something you have to choose in advance, and then you have the
near-collision, then the collision, and after that it's all the same. From a hash perspective, it's all the
same because there is no dependency on the past in how you compute the hash function.
So at that point it's identical and you can put whatever you want for the
suffix. So, okay, I described this one already; this slide should have come before. I don't know,
I'm not sure what happened. So how do you exploit a collision? The one
we do for SHA-1 is what we call the fixed-prefix attack, which means we had to create a
prefix, so choose the green box, before doing the computation, because we can't change it
afterward, because it feeds into the collision. So we selected the prefix, and you have to select it
in a smart way, and then we have the collision blocks, and after that you have what
we call an arbitrary suffix, which is: you can jam in whatever you want after
that, and then you can create many, many documents which have a collision in them, and we'll show
you how we can exploit that in a minute. So, here's how it works. It doesn't seem very
powerful, right? Because you have to choose everything. Everything is pre-computed, that is true.
But the prefix you choose will actually influence how much you can do. In our case we used PDF
and JPEG. You can think of other types of files which
have flexibility, but PDF for us was the best one. And so the idea is we use the collision blocks
to change the length of a specific field, a JPEG comment, and that way what we display
on the screen will be different: in file 1 it's going to point to one part of the
suffix, and in the other file it's going to point to another part of the suffix. And that way, even
though we have the same suffix, we're able to display different views of the world, because the
collision blocks themselves are used to control some of the display function. I'll show you a
practical example later on in the talk. All right, chosen-prefix attack. So that is
the one which works with MD5; this is for reference. MD5 is way weaker, so in that specific
case you don't have to worry about choosing a specific prefix, you can use the ones you
want. And that gives you way more flexibility. I'm going to show you a practical attack
using the MD5 attack, just to illustrate the point. All right, enough theory. Now
we're going to go to the real-world attacks and I'm going to show you exactly how to use
these things in practice, and we're going to start with MD5 because that's the one which happened
about 8 years ago, and so we have a lot of hindsight and real attacks to show you.
SHA-1 is too new to have that kind of stuff. It gives you a sense of what you can do when
you have those collisions. So, the first attack which was created using the
MD5 collision in practice was creating a rogue SSL certificate. If you are able to
manipulate the signature of a certificate, it turns out you can create a
signing certificate for everything you want which is valid, and then you can impersonate
any website on the planet. This is what Marc did with Alex [indiscernible] and a bunch of
other people, back in 2009. And so the way it works is as follows. This is what a normal
certificate looks like: you have a bunch of fields, which are the serial number and validity
period, which come from, how should I put it, the registrar, and then you have the
real cert name, which is basically which domain your certificate is for. And
then below that you have something very important, which is the X.509 extension CA =
FALSE. What it means is your certificate cannot be used to sign other certificates, because
obviously you're not a CA. So the whole goal when you forge a
certificate is to swap those 2 fields. So how do you do that with a collision? Here's how you do
it. First, obviously, you rewrite the cert name to say I want to be able to sign
for everything. So I remove the real domain name and put a star instead. Then I put in my own
public key, obviously, so I can actually do the signing. And then I say, well, I'm a CA
certificate, right. And then after that you have to hide the real public key,
and you do that by hiding it in what we call the Netscape comment extension. So
SSL certificates have a lot of fields and one of them gives the ability to jam a comment in.
You can use that as a neat trick to hide the previous public key, and you use this
space to create the collision. Then you lift the signature, you copy-paste it to your
brand new fake certificate, and then you have a signing certificate. That's basically
how you create a fake but valid SSL certificate. If you think that's an
academic attack, it's not. It turns out that in 2012 a piece of malware was discovered which was
mainly spying on Iranian computers. The malware was called Flame. I'm not sure
you remember it. This malware was fairly unique in the sense that it was using a colliding
certificate to push fake Windows updates. What happened is someone decided to
create a collision by lifting the signature of a Windows Terminal
Server certificate, which should not have been possible, and removing the restrictions. Those certificates were
only used for compatibility and they had a bunch of restrictions, so what they did is they took
a certificate, changed the name and then removed the part that says it's only for Windows XP. They said,
no, no, it's for everything. They repackaged it and used it to deliver malware in the form of a
Windows update, which was signed with this certificate, to attack Iranian computers.
What's very interesting about this attack is that, because it used a collision, Marc
and his team were able to reverse engineer how this malware worked and were able to
figure out how the collision was created. And lo and behold, it didn't use any of the
techniques which were discussed in academia; it was using a completely different technique which
was using a 4-block collision. If you remember, I told you a 2-block collision is the way we do
it because it's the most efficient. So the people who created the Flame fake
certificate were not using 2 blocks, they were using 4 blocks, which is less efficient. You can
still do it, and they were using a variant which was not one discussed in academia. So
someone in the world had enough cryptographic knowledge and enough resources to pull off
this kind of attack, which led a lot of people to attribute it as state-sponsored
malware. And that's the explanation behind this. If you want to know more about it, there
is a very, very neat paper by Marc which is called Reverse-Engineering of the
Cryptanalytic Attack Used in the Flame Super-Malware, which explains all of this in detail,
but that's the bottom line. So, basically you can weaponize collisions to create attacks, and
it has been done in the past, and that's why taking collisions seriously is so important. So,
where do we stand today? Well, the old functions are all dead. MD4 is dead, and has been for a very, very
long time. MD5 is so dead that you can do it on your smartphone; no kidding, you can
literally create a collision on your smartphone. And then SHA-1 is the new attack we
created, and it took us 2^63 computations, which is pushing the limit of what we can do.
It's one of the largest computations in the world as far as we know. So
how did we find this collision? Well, we started in 2015 and, as I said, the prefix
has to be fixed for SHA-1, so we had to create a clever prefix. Ange Albertini was the one
who looked at this, because he is very, very well known for doing weird things to
craft interesting documents. And so he came up with the idea of using PDF and
JPEG so we can actually have a prefix which gives us the malleability to show one image
or another and make it very, very easy. I'll show you a demo in a minute. And then I
ran the first computation between 2015 and 2016; it took me about 9 months and 300,000
computers. Then the excitement began. We waited for the result, and then we had to develop the
full attack, and we had quite a few scares [inaudible] as well. And finally we did the second
large computation in early 2017, and then we issued the press release in
February 2017. So, what Ange came up with to be able to use the collision to its full extent
was to embed a JPEG into a PDF header to create a collision which would still be a
valid PDF document. And the trick is that JPEG has something called a comment, and the idea is
you use the collision as a boundary for the length of the comment. So both files,
while having the same hash, will actually display different things, because in one case the
length will point to the first image, and in the other case, by rebounding on a comment of a comment,
it will rebound to another image. So while the content of the 2 files is the same, if someone is
to open them they will see 2 different images. That's the power of creating a collision. Of course
we can use whichever images you like. So now you can create any pair of doppelganger files.
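The comment-length trick can be mimicked with a toy parser. The sketch below is not a JPEG decoder and does not involve a real collision: it uses a simplified segment format modeled on JPEG's comment segment (marker `FF FE` followed by a 2-byte big-endian length that counts itself), and it simply flips the one length byte that, in the real attack, sits inside the collision block. The segment type `IMAGE`, the payloads and the helper names are all assumptions made for this illustration; the two length values `0x73` and `0x7F` are the ones read out later in the talk.

```python
import struct

COMMENT = 0xFE  # toy comment segment: skipped by the viewer
IMAGE = 0xDA    # toy image segment: what the viewer displays (hypothetical type)


def segment(kind: int, payload: bytes) -> bytes:
    """Marker (0xFF, kind), then a 2-byte length that counts itself, then payload."""
    return bytes([0xFF, kind]) + struct.pack(">H", len(payload) + 2) + payload


def first_image(stream: bytes) -> bytes:
    """Walk the segments, skipping comments, and return the first image payload."""
    pos = 0
    while pos + 4 <= len(stream):
        if stream[pos] != 0xFF:
            raise ValueError("lost segment synchronization")
        kind = stream[pos + 1]
        (length,) = struct.unpack(">H", stream[pos + 2:pos + 4])
        if kind == IMAGE:
            return stream[pos + 4:pos + 2 + length]
        pos += 2 + length  # jump over the whole comment
    raise ValueError("no image segment found")


def build(length_low_byte: int) -> bytes:
    """Same bytes everywhere except the low byte of the first comment length."""
    image_a = segment(IMAGE, b"CAT")
    image_b = segment(IMAGE, b"TIGER")
    # A second comment that swallows 8 filler bytes plus image A.
    skip_a = segment(COMMENT, b"\x00" * 8 + image_a)
    header = b"\xff\xfe\x01" + bytes([length_low_byte])
    padding = b"\x00" * (0x173 - 2)  # the short comment ends exactly at skip_a
    return header + padding + skip_a + image_b


file_short = build(0x73)  # lands on skip_a, which hides image A
file_long = build(0x7F)   # lands 12 bytes later, directly on image A

if __name__ == "__main__":
    print(first_image(file_short), first_image(file_long))
```

With the shorter length the parser lands on a second comment that swallows the first image, so the second image is shown. With the longer length it lands 12 bytes further, directly on the first image. Everything after the length byte is byte-for-byte identical in both files.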
So, try to imagine 2 insurance contracts: one says you only have to pay 1,000 dollars and the
other one says, well, no, actually you have to pay 1 million dollars, and those would hash the
same. That's the kind of stuff you can do with that. So, then I took Marc's code and tried to
scale it up; I decided to use about 300,000 cores in multiple data centers around the
world. And most of the complexity was to find the right balance between resilience
and performance. Every time you dispatch a new job you have to copy stuff in memory and
move things between computers, so there is a lot of overhead and you want to run as
long as possible. At the same time, computers fail and we have to restart a job, or the job has
been preempted, and so forth. So we settled, after a few tries, on about 1-hour batches,
and we ran quite a few of those, like millions of those. And then we also refactored the
code to be stateless. One of the difficult things when you do this large computation is when
you have a database where you have to put all the jobs. Instead of doing that, we made
each job stateless so you can restart them, and we were basically using a random
generator to decide which shard to use instead of having synchronization, because
otherwise you have to deal with failover and replication of the master and it becomes an even
bigger hell to manage. And last but not least, I made a huge mistake. MapReduce
is a very famous computation framework, and the idea is you map your computation and you're
going to reduce to get all the solutions. I was like, that's perfect, that seems to be the
great tool to do the job. Well, it's actually the wrong tool for the job, because the map is very,
very complex, you look through the entire space, but the reduce, the part where you aggregate the
solutions, is very small. It's only a few bytes. And so we were spending a lot of time waiting
for the last part of the computation to finish. I was running batches of, I think,
50,000 jobs at a time. So 50,000 hours by 50,000 hours, and as you can see on the graph,
almost half of the time of the job was used basically waiting for the last bits to finish. So
for the second computation we actually moved away from this to using a simple
job system where we refactored it into a bunch of jobs and made them all independent, which
was way more efficient. So lesson learned, don't choose MapReduce for this job. We
learned that the hard way. Then comes the part which was the most scary. We finished the
first computation some time in, I believe, January, and then in the middle of the night
my phone rang and I knew we had found the first collision. I sent an email to
Marc saying, hey, we found it. Great. Okay, and then he started to look at it. I'm like, can we
get to it? And I'm waiting, waiting, waiting while he's looking at it, and then it turned
out that at first glance what we found was not solvable. Which means that
we built an equation system to find the second block but
the equation system had no solution. And then we were very, very scared for quite a bit
of time that we couldn't do it. But then Marc and Pierre figured out how to add extra
conditions and fix the solvability using that technique to actually make it work again.
And then we were able to find more speed-ups to accelerate the computation, and then we were able
to run the computation, and then Yarik, from my team, actually ran the second set of
computations on GPU. So GPU is very interesting; a lot of people say GPU is great
for crypto. That's true. Except the architecture is very different from a CPU. So you
have to rethink how you do things. In particular, memory transfer is very expensive, so we
couldn't do load, unload, load, unload and that kind of stuff. You have to allocate memory, do
your computation, allocate [inaudible]. So what we decided to do, and that's Pierre's idea,
is to use a feed forward, where we compute the base solutions on CPU and then we move on to the
first steps, between 14 and 26, and we compute a ton of potential solutions, using the GPU to
compute all the next steps. As long as we have enough solutions we do one step and move one step,
move one step; when we don't have enough solutions we go back, and so back and forth. The reason why
that works is because at the first few steps you have very, very few solutions. This is a
graph in logarithmic scale, right? So it means that at the first step you do maybe
100,000 computations. For the ones in the middle you have up to 100 trillion solutions. So
it quickly ramps up. So what we do is we saturate the memory with as many potential
next steps as possible, do them all at once, and if we don't find a solution, backtrack, generate more
solutions, move back and forth and so forth. So it's a completely different way to
approach cryptography. So while GPU is great for crypto, it also requires a new way of
distributing your computation because you don't have the same memory, or in
general the same memory manipulation, as you have on CPU. All right, and then we spent 110 years of
GPU, a far cry from the 12 million we would have spent if using brute force, and we succeeded.
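The two figures quoted here can be sanity-checked against the 2^63 work factor mentioned earlier in the talk. The short Python sketch below is only rough arithmetic: it assumes both figures are in the same GPU-year unit and that brute force means the generic birthday bound of 2^80 hash evaluations for a 160-bit digest. The variable names are choices made for this example.

```python
import math

BRUTE_FORCE_GPU_YEARS = 12_000_000  # figure quoted in the talk
ATTACK_GPU_YEARS = 110              # figure quoted in the talk
DIGEST_BITS = 160                   # SHA-1 output size

# Generic birthday bound: about 2^(n/2) evaluations to find a collision.
birthday_exponent = DIGEST_BITS / 2

# How many times cheaper the attack is than brute force, as a power of two.
speedup = BRUTE_FORCE_GPU_YEARS / ATTACK_GPU_YEARS
speedup_exponent = math.log2(speedup)

# Work factor implied by the two GPU-year figures.
implied_exponent = birthday_exponent - speedup_exponent

if __name__ == "__main__":
    print(f"speed-up: about {speedup:,.0f}x, i.e. 2^{speedup_exponent:.1f}")
    print(f"implied attack cost: about 2^{implied_exponent:.1f} evaluations")
```

Under those assumptions the implied cost lands close to 2^63, so the GPU-year figures and the stated complexity tell the same story.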
That being said, before we succeeded, we had a funny incident. I mean, I didn't find
it funny at that time but now it's okay. So Yarik, from my team, finished a computation
and sent an email to Marc saying, hey, we found the collision, and we were ready to
celebrate, you know, the champagne is there in the fridge, we're like, we're going to
celebrate something awesome. And Marc says, doesn't work. And we're like, what? Doesn't work.
What do you mean it doesn't work? And he's like, doesn't work. And it turns out that, because of the way we
did the computation, we had done it in big endian instead of little endian, and he was looking at it
the other way. [laughter] Yep. So, we fixed it. [laughter] It was sorted out in like 1 exchange, and
then all was fine. But I'm telling you, we were wiped. Anyways. And then we were
happy, we were able to create those colliding PDFs; you might have seen them online.
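A pair of files like these can be checked with a few lines of Python. This is a minimal sketch: the helper names are assumptions for this example, and the file names in the usage note are hypothetical placeholders for wherever you saved the two PDFs.

```python
import hashlib


def fingerprints(data: bytes) -> dict:
    """Return the SHA-1 and SHA-256 hex digests of `data`."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def sha1_only_collision(a: bytes, b: bytes) -> bool:
    """True when two different inputs share a SHA-1 digest but not a SHA-256 digest."""
    fa, fb = fingerprints(a), fingerprints(b)
    return a != b and fa["sha1"] == fb["sha1"] and fa["sha256"] != fb["sha256"]


def check_files(path_a: str, path_b: str) -> bool:
    """Read two files from disk and apply the check above."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return sha1_only_collision(fa.read(), fb.read())
```

For example, `check_files("first.pdf", "second.pdf")` should return `True` for a genuine colliding pair and `False` for any two ordinary files. Comparing with a second, unbroken hash is also the simplest way to tell such files apart.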
So basically, you have the same SHA-1 and a different SHA-256, and as
you see, because we use the comment-of-comment trick, one is blue and the other one is red. That's the
idea we had, and then we were like, okay, we're going to give the world 90 days before we
release the code to create 2 PDFs which are doppelgangers. Well, trust the
internet to do it in 2 days. They have better tools than ours, and I hear they released
them on GitHub, so I'm using their tool; thank you for creating it. And so
basically it's as easy as this. So we have 2 PDFs which have different images, a cat and a
tiger, and what you do is you basically use a collider script, and the collider
script will automatically merge both of them into one PDF and align the comments, one pointing to one image
and one to the other, and that's how fast it is. Literally real time, and then you can see, boom, you have
2 PDFs which have the same SHA-1. [applause] Thank you. Okay, and then you have the
SHA-256. All right, so here is the explanation behind the scenes. You have the fixed prefix I
promised you, which is the PDF header, and then below it we have the JPEG start, as
you see, and then following that we have the JPEG comment. And then at the border of the JPEG
comment we have the collision block, right? And you can see it here. Let me try to do this.
You see it here, right. You have the last byte of the comment length, which is inside the
collision block. So it will be different in the two cases, right? And so one of the comment lengths is 01
73, the other one is 01 7F. So as a result you create a desynchronization, and in one of
the files the image starts way lower and is inside a comment. In the other file, well, until we
reach it it's a JPEG comment, and then in that case it's a real image. And that's how it works,
right. So a lot of people were confused when we showed that this is how we can create as many
as we want. Try to explain that to the press. They're like [laughter]. And that's what
it is. I also want to give big props to Hector Martin, who is the guy who actually created
the visualization. Ange had created one and he did a way better one, so I stole, with his
permission, his version, because I think it's the clearest version we have. So thank you
to him for doing that. So, the post-collision world. What happens when you create a
collision? First thing, and that's the thing which actually was the goal: we
released the collision a few days before, or I think even on the day of, a meeting of the CA/Browser Forum,
which is a consortium to decide the future of web browsers, and it
was about to have a discussion about prolonging the lifetime of
SHA-1. It was supposed to be sunset, but from my
understanding some people had complained. But in light of the attack,
Firefox finally gave up and said, fine, we are going to stop it immediately, ahead of schedule,
which was exactly our intention. We did not engage in such a long computation
for the fun, although it was very fun; it was because we really wanted to push the world
to stop using SHA-1 once and for all, and, as with many cryptographic things, unless you
really show people a real attack, they tend to delay it for another year. Right,
you know, out of sight, out of mind, so we decided to put it in front of you so you have to do
it. And I think for us that was the most important thing: we did break SHA-1 and as a result
users will be a lot safer, so that's really the goal of the entire exercise and it was
well worth the investment. Microsoft also did the same in May. So now I believe
every browser has deprecated SHA-1, so now we are in a better world, a better place.
We got leaked. Someone put a bet that SHA-1 would be broken in 2017, literally an hour
before we were about to release. I have no idea who it is but the guy made 80 coins and I want my
share. [laughter] Also, I saw Marc, so the release was about 5am PST, because
he's in the Netherlands, and so I saw him a few hours before, scrambling to try to get the
bug bounty in Bitcoin. There are people who are giving out a little bit
of Bitcoin money to whoever breaks SHA-1, as an incentive to do it, and I see him
wrestling to get the data and the money, and he did. So he did claim the bounty just in
time, so Marc got the bounty, that's fair. Hopefully nobody stole it from us. And then
I spoke about unforeseen situations, right. And I think the most
impressive one, which completely took us aback, was this: Git has a problem, and
I'm going to talk about it in a minute, but we missed SVN, because none of us had used SVN for a
very long time, so we didn't think of it. But then a guy from WebKit, WebKit which is
the web renderer for Safari, was like, oh, we use SHA-1, let me make sure
we don't [indiscernible] for collision, which is a great idea. And so he pushes the two
test files to the SVN of WebKit and the WebKit SVN dies. It's not like you can revert, it just
died. And for 8 hours you see all the WebKit engineers completely freaking out, like,
did you try this command? Did we do this? And ultimately they were able to recover it, but it
literally stopped the entire SVN from working for 8 hours. And then Apache did
issue an SVN emergency patch and said, please do not test collisions on SVN, we know
it's broken. Don't do this. So that's an unforeseen
situation. If we had known, we would have given them advance notice, but we did not. We did
have the test vectors, but we did not know it would break anything. But we broke SVN, which means
there is other software somewhere which is broken by SHA-1 collisions, so you have to find a
way to mitigate those. And that's going to be the next part of the talk, which is: what do you
do when you think there is SHA-1 software lurking in your network? Right. What do you do?
Well, we have a great example: Git. So, despite early warning, and there was a long thread, and
I put the link there on the slide, about people trying to warn Linus to not use
SHA-1. Like, no, no, that's fine, I'm going to use SHA-1. Don't do it. No, no, I'm fine, I'm going to
do it. So at the end of the day Git uses SHA-1 and it's making so many assumptions that it's
really hard for them to move away from SHA-1. They are moving away from it finally,
but it took a long, lot of discussion and it's still not there. So now we have SHA, we
have a very well used software. How many of you use Git? Yeah. So, all of you use something
which is vulnerable directly to SHA-1 collisions, where we can have 2 blobs which share the
same hash and you end up with 2, uh, 2 different views of the world for the same Git
repository. Not good, right? So how do you mitigate that? Well, turns out we have a tool. And
actually Marc invented this tool. Um, he, it's for, uh, he invented this tool and applied
it, as I said, to Flame, but it still works for SHA-1. The idea is to use counter-cryptanalysis.
It's a very mouthful word, but the idea is, because the way you create collisions, uh, creates
unique properties in the file, mostly those tr-, trivial differences due to the way
we use differential paths, you can, with a single file, uh, detect if this file is part of a
collision. It is very, very high precision. It's like, um, zer-, uh, the false positives are zero point
zero, zero, zero, zero, zero, like 17 zeros, 1. So you can run it in a production system, it
has a little bit of overhead but it's okay. And it also tells you, which is very important to
understand if you see a collision, how it has been constructed. So you know if it's
ours or if it's a new one we've never seen, so that's how you can figure out if it's the same one. So,
counter-cryptanalysis, Marc improved the version he had and pushed it on GitHub, so if you
have software using SHA-1, please use counter-cryptanalysis. We do. And we
did that, so someone from Google, Shawn [indiscernible], um, fixed, uh, JGit to basically do a counter-cryptanalysis
check when you submit. So JGit will refuse any colliding blob. The other
one who did the same thing is GitHub. GitHub in March announced on their blog, well,
finally we're going to use counter-cryptanalysis. So if you use GitHub, how many of you do
use GitHub? That's it? I'm kidding. A lot of people use it, right? Um, uh, you are protected
as well because they did put in place a check when you push your, your commit to actually
do this. So the way you actually deal with legacy software is by doing detection and
proactive, uh, mitigation. Um, and even Git got onto the counter-cryptanalysis
bandwagon, actually it was deployed in 2012, 2. So now we have the same. Uh, we have, uh,
counter-cryptanalysis also for every Gmail document. Uh, and Google Drive documents, for good
measure. Uh, some of our users did test it on the day of the release to see if it's working
as intended. It does. Uh, the reason why we do that is because we are concerned with people
using old, uh, client software that we don't know of, and they might actually store them in
backup software we don't know of. A lot of corporate users as well, so we don't know where
their backup systems are, so we basically prevent collisions that way and you can do the same. Um,
as I said, uh, we are concerned that it could crash legacy software like SVN, um, we might have colliding
documents with different terms, like contracts, leases, um, powers of attorney and so forth. And of
course, black swans, unforeseen consequences that we don't know of. And we really don't want to
know of. And, uh, so, how much does it cost us? About 4.5 percent, uh, of extra
computation, for e-, email attachments that we scan that way. Uh, that's based on the 1
billion dataset we tested, uh, after we released it. So we believe that's well worth the
investment. It's not a huge number. It's significant, it's not like impossible to do. So
where does that leave us, right? Well, the picture is bright. Uh, it's awesome. We have a lot of new hash
functions coming online. Uh, we have SHA-256 and the SHA-2 family and we also have SHA-3
and BLAKE. And what's great about all of them is they are using different constructions,
which means that if one breaks it's unlikely it's going to break all of them at the same
time, so we have a lot of diversity to choose from. We all have different perspectives so
we're in a very good place. So hopefully you won't have that kind of talk for the next 10
years, hopefully. Uh, and yeah, so takeaway: SHA, SHA-1 is finally dead. Uh, long live
SHA-256. Counter-cryptanalysis is a really cool tool, and I, I hope I inspired you to check it
out. And the future is good, so thank you very much. I'm going to take any questions you would
have. Uh, thank you very much. [applause]
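
[Editor's sketch, not part of the talk] The closing point about having several hash families to choose from can be illustrated with a few lines of Python. This is a minimal sketch, assuming Python's standard hashlib module; the function name digests and the choice of algorithms to compare are illustrative assumptions, not anything shown on the slides. It hashes the same bytes with SHA-1 and with the SHA-2, SHA-3 and BLAKE families mentioned above. It does not perform counter-cryptanalysis; for that, see the detection library that Marc published on GitHub, as mentioned in the talk.

```python
import hashlib

# Algorithm names as exposed by hashlib. Which ones to compare is an
# assumption made for illustration; SHA-1 is listed only as the legacy baseline.
ALGORITHMS = ("sha1", "sha256", "sha3_256", "blake2b")


def digests(data):
    """Return a dict mapping each algorithm name to the hex digest of data."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


if __name__ == "__main__":
    # Each hex character encodes 4 bits, so len(value) * 4 is the digest size.
    for name, value in digests(b"hello world").items():
        print(f"{name:9} {len(value) * 4:4d} bits  {value}")
```

Because each family uses a different construction, a break in one does not automatically carry over to the others, which is the diversity argument made above. In practice, new code would pick one of the non-SHA-1 entries and keep SHA-1 only where a legacy format forces it.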