Kevin Fang
United States
Joined Aug 3, 2019
How a Leap Day Took Down Microsoft
A look into one of the largest leap day bugs in history, as well as how Microsoft Azure's compute platform works (well, worked - it has been over 10 years. Though the fundamentals likely remain the same).
Sources:
azure.microsoft.com/en-us/blog/summary-of-windows-azure-service-disruption-on-feb-29th-2012/
Chapters:
0:00 Intro
0:43 Cloud Stuff
3:41 Azure VM Stuff
5:10 The Incident
7:59 Mitigation
10:23 Aftermath
Corrections:
-
Music:
- Ubiquitous by Diamond Ortiz
- Jane Street by Track Tribe
- Blackout by LEMMiNO (ua-cam.com/video/RsVVcsVDt-s/v-deo.html)
- Cipher by LEMMiNO (ua-cam.com/video/b0q5PR1xpA0/v-deo.html)
- Funk Game Loop by Kevin Macleod
Views: 132,499
Videos
How A Steam Bug Deleted Someone’s Entire PC
832K views, 3 months ago
A deeper look into a steam-for-linux GitHub issue (github.com/valvesoftware/steam-for-linux/issues/3671) investigating how a steam script was able to delete the entire contents of someone's root directory. While the direct cause of the rm -rf is fairly obvious, how it was triggered in the original bug report is not, and may forever remain a mystery... Sources: www.opensuse-forum.de/thread/10620...
Polish Amazon Offers Deal So Good Their Servers Implode
210K views, 6 months ago
A look into how Allegro, the Polish version of Amazon, handled a sudden surge of traffic due to a one-of-a-kind deal. Sources: blog.allegro.tech/2018/08/postmortem-why-allegro-went-down.html android.com.pl/sponsorowany/138693-honor-allegro-promocja/ www.gsmmaniak.pl/868392/honor-7c-promocja-zlotowka-allegro/ antyweb.pl/honor-7c-za-1-zl-allegro-nie-dziala Chapters: 0:00 Intro 1:17 Front-end 3:01...
How Not To Secure Your Company (Target Data Breach)
402K views, 8 months ago
A look into how hackers stole 40 million credit and debit cards over a period of 2 weeks from Target in 2013. Sources: www.commerce.senate.gov/services/files/24d3c229-4f2f-405d-b8db-a3a67f183883 people.cs.vt.edu/danfeng/papers/Target-Yao-unpublished.pdf aroundcyber.files.wordpress.com/2014/09/aorato-target-report.pdf krebsonsecurity.com/2015/09/inside-target-corp-days-after-2013-breach/ krebson...
Cloudflare Deploys Really Slow Code, Takes Down Entire Company
589K views, 9 months ago
Cloudflare is back at it again with more regex and state machines. Previously on Cloudflare: ua-cam.com/video/GEbn3nHyKnA/v-deo.html Sources: blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/ blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/ swtch.com/~rsc/regexp/regexp1.html www.regular-expressions.info/catastrophic.html cyberzhg.githu...
How GitHub's Database Self-Destructed in 43 Seconds
906K views, 11 months ago
A brief maintenance accident takes a turn for the worse as GitHub's database automatically fails over and breaks the website. Sources: github.blog/2018-10-30-oct21-post-incident-analysis/ github.blog/2016-12-08-orchestrator-github/ github.blog/2018-06-20-mysql-high-availability-at-github/ news.ycombinator.com/item?id=18272928 www.reddit.com/r/programming/comments/9q94am/github_major_service_outage/ hu...
The Worst Website Launch of All Time
334K views, 11 months ago
With a $464,000,000 budget and a timeline of three years, CMS and company attempt to build a website. Unfortunately, it does not go as planned. Sources: oig.hhs.gov/oei/reports/oei-06-14-00350.pdf Chapters: 0:00 Intro 0:22 Part 1: Optimistic Beginnings 1:08 Part 2: The Dumpster Fire Begins 4:49 Part 3: Countdown to Launch 7:48 Part 4: The Last 40 Days 10:35 Part 5: Launch and Aftermath Notes: -...
Dev Deletes Entire Production Database, Chaos Ensues
2.5M views, 1 year ago
If you're tasked with deleting a database, make sure you delete the right one. Sources: about.gitlab.com/blog/2017/02/10/postmortem-of-database-outage-of-january-31/ about.gitlab.com/blog/2017/02/01/gitlab-dot-com-database-incident/ Notes: 1:05 - The middle bullet point about the account that had 47,000 IPs was never mentioned in the postmortem (there was an initial report the day of and a more...
Capital One's $200M Cloud Data Breach
471K views, 1 year ago
How a random ex-AWS employee managed to get into the AWS account of Capital One unnoticed using a fairly low-skill attack. Sources: www.justice.gov/media/1019711/dl?inline blog.appsecco.com/an-ssrf-privileged-aws-keys-and-the-capital-one-breach-4c3c2cded3af krebsonsecurity.com/2019/08/what-we-can-learn-from-the-capital-one-hack/ www.researchgate.net/publication/361860348_A_Systematic_Analysis_o...
How This SQL Command Blew Up a Billion Dollar Company
616K views, 1 year ago
A story of the Heartland Payment Systems breach from 2007-2009, the world's largest at the time. The specific details of how everything went down are unknown, so this is built on top of the USSS/FBI advisory and various articles. The FBI advisory (see the third source) covered dozens of breaches that occurred in the late 2000s, all of which had the same attack pattern (Windows, SQL Server, xp_c...
How One Line of Code Almost Blew Up the Internet
1.9M views, 1 year ago
Sources: blog.cloudflare.com/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/ blog.cloudflare.com/quantifying-the-impact-of-cloudbleed/ bugs.chromium.org/p/project-zero/issues/detail?id=1139 asamborski.github.io/cs558_s17_blog/2017/04/08/cloudbleed.html www.colm.net/open-source/ragel/ "[CloudFlare] A Day at the CloudFlare Office" ua-cam.com/video/_ttI4eWuQU4/v-deo.html Assumption...
Can ChatGPT solve the world's hardest puzzles?
54K views, 1 year ago
ChatGPT tries to solve some problems from www.janestreet.com/puzzles/ Puzzle 1: www.janestreet.com/puzzles/the-hidden-warning-index/ Puzzle 2: www.janestreet.com/puzzles/robot-tug-of-war-index/ Puzzle 3: www.janestreet.com/puzzles/single-cross-index/ Chapters: 0:00 Intro 0:21 Easy riddles 0:58 Jane Street puzzles intro 1:27 Puzzle 1: The Hidden Warning 3:30 Puzzle 2: Robot Tug of War 5:57 Puzzl...
Real day in the life of a Twitter software engineer
29K views, 1 year ago
A normal day in the life of a Twitter SWE
Software Engineer Interview Simulator
21K views, 2 years ago
This is a very realistic software engineer interview simulator.
How many ways can this puzzle be solved?
11K views, 2 years ago
Attempt at making an "educational" video... The only snippet about the history of this puzzle that I could find on the internet: "90‘s in the last century, a puzzle toy called "Drop of Cleverness Bauble Spelling Tray" was popular in Japan. The toy is composed of a tray and 12 Baubles. Player's goal is to use these baubles to fill the pan gap acording to different patterns. [...] It is more inte...
seth everman plays toto - africa in different genres but it gets more and more out of tune over time
9K views, 3 years ago
seth everman plays toto - africa in different genres but it gets more and more out of tune over time
how to pass every coding interview (animation)
14K views, 3 years ago
how to pass every coding interview (animation)
the jazz riff that Davie504 plays a lot
4.4K views, 4 years ago
the jazz riff that Davie504 plays a lot
gus johnson freestylin' in the studio (remixed)
110K views, 4 years ago
gus johnson freestylin' in the studio (remixed)
5:00 he just triggered all of the gd players with this music
Great code review picking up on that error ahead of time.
too many ways to pronounce "Azure"
Hey I don't speak computer, can I get an explanation in fortnite terms?
How to prevent this: Constantly run and monitor a canary cluster using current UTC plus one day.
If you get a call on a Friday, it means you're gonna work on the weekends too.
Dude, your channel is so underrated
Language models are impressively bad with anything related to math. I once gave one a string and asked it to count the characters and it failed in the most spectacularly impressive ways.
Steam must not have had unix dev - 'dirname $0' - and then protect any rm -rf invocations from invocation without args, because scripted rm -rf is never not scary.
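As a rough illustration of the guard this comment asks for, here is a minimal Python sketch. It is not Steam's actual script (that was a shell script); the function name `safe_rmtree` and its specific checks are hypothetical. The idea is simply to refuse a recursive delete when the target is empty or resolves to a filesystem root, which is what an unset variable in front of a trailing `/` would produce.

```python
import shutil
from pathlib import Path


def safe_rmtree(target: str) -> None:
    """Recursively delete `target`, refusing obviously dangerous paths.

    Hypothetical helper for illustration only.
    """
    # An empty string would otherwise resolve to the current directory.
    if not target or not target.strip():
        raise ValueError("refusing to delete: empty path")

    resolved = Path(target).resolve()

    # A path equal to its own anchor ("/" on POSIX, "C:\\" on Windows)
    # is a filesystem root.
    if resolved == Path(resolved.anchor):
        raise ValueError(f"refusing to delete filesystem root: {resolved}")

    shutil.rmtree(resolved)
```

Calling `safe_rmtree("")` or `safe_rmtree("/")` raises `ValueError` instead of deleting anything; a real script would likely also want an allow-list of directories it owns.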
This is one of the funniest bugs ever
You know what, props to them though. It was definitely a learning moment
The Video: Surprised Pikachu- i mean Microsoft Face.
ds why ya use windawos
I don't get anything he is saying
How's the house fly gonna get to the 💩 pile w/ o increment operators?
8:21 I would assume it's down to the code having to go through the QA pipeline. It's irrelevant the size of the fix even a "one liner" code fix will still incur the overhead of automated testing. As it should. Who's to say the fix doesn't cause some other issue via an unintended consequence. This would be standard for an enterprise level DevOps and I would go out on a limb and say MS Azure infrastructure probably has some long ass complex deployment process that involves running 100's if not 1000's of automated tests. 5 hours sounds about right to me. Just my professional opinion.
9:28 And that's why you don't rush a fix.
It reminds me of the famous Bumblebee issue 123: rm -rf /usr
The Onosecond
0:47 made me laugh way too hard
This is why you leave the DateTime functions to smarter people that work at [Insert prestigious company here] oh wait...
You sound like fireship
"I accidentally deleted prod db" need some big balls to type that ngl
Where is the orchestrator gui you mentioned
i'd rather end up on Kevin Fang's channel myself as a developer of anything infrastructure related involving servers rather than be a victim of a fundamental issue resulting in a catastrophic failure that i myself could not fix for over a day
Time is the bane of most developers. Few in the world are aware of just how many kinds of time we deal with and how many problems that can cause.
In hindsight, I think 40 minutes of west coast writes was a more than fair sacrifice 🤷‍♂️
0:06 they call it Ash'er over here..
This is my greatest fear whenever I have to do anything more complicated than commit, push, merge...
Fantastic delivery as always
And this is the government that controls vaccine ingredients?
The original incarnation of regular expressions was very efficient and fast. But then people started adding features to it - features people liked, so they've stuck around. But - they open the door to VERY inefficient expressions. It doesn't take a lot of study to understand why some of these things break your performance.
love how you always find new ways to present information!
Is there a terminal emulator that requires a second person to review every single command?
I always heard it pronounced Ashure none of the ones you said
This is the sort of bug I would expect from a beginner programmer, not the experienced ones that Microsoft would have working on Azure. But I guess anyone can do it, and that would be why Microsoft is updating their C++ compiler to detect some leap year bugs.
Please stop poorly faking Fireship's voice and use your own
Now my brain is HI
Steam doesn't work on my Linux. I'm working on changing Proton to use OpenGL, but my gosh, compatibility on Linux is horrible.
I’m at 4:01 and I have to pause the video for a few minutes. Once I’ve had some time to rest and restore my courage, I’ll continue.
Please do the Kaseya hack!
Where do I get the tool that you used at 7:24 for debugging regex?
Stop adding those moronic sounds - this is not tiktok...
It cannot judge itself to be flawed? I am impressed that this needed a leap year bug.
Granted this does not matter if it cannot stop new creation requests from entering. I am guessing this gets into a larger discussion outside the scope of this video. A failure killing a server, cluster, site, etc. is kind of magic. How many servers does it try within a cluster? How many clusters does it try? How many sites does it try? What are the odds of those failures? At what point does the cost of the attempts invoke an alarm response?
Hard to believe this was over 12 years ago. I remember it like it was yesterday. I was working for Microsoft at the time and my service got impacted by this. It was a long 12 hours. There was no laughing at that rookie mistake!
Boooh! Scary!!! Anyways, here's rm -rf ""
'Is leap year? - take days since epoch and add 366, else add 365' - boom. Fixed. Adds a bit of a nuanced behavior but over years the expired date doesn't drift more than the one day.
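A minimal Python sketch of the suggestion in the comment above, next to one common alternative. The function names are hypothetical and this is not Azure's code; it only illustrates why bumping the year field fails on Feb 29 and how the two workarounds differ.

```python
import calendar
from datetime import date, timedelta


def add_year_naive(d: date) -> date:
    """Bump the year field only. Raises ValueError for Feb 29."""
    return d.replace(year=d.year + 1)


def add_year_by_days(d: date) -> date:
    """The comment's idea: add 366 days in a leap year, else 365."""
    days = 366 if calendar.isleap(d.year) else 365
    return d + timedelta(days=days)


def add_year_clamped(d: date) -> date:
    """Keep the calendar day, falling back to Feb 28 when Feb 29 is missing."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Feb 29 has no counterpart in the following year.
        return d.replace(year=d.year + 1, day=28)


leap_day = date(2012, 2, 29)
print(add_year_by_days(leap_day))  # 2013-03-01
print(add_year_clamped(leap_day))  # 2013-02-28
```

The day-count version always yields a valid date, but it can land one day off the anniversary (for example, Mar 15, 2012 becomes Mar 16, 2013), which is the "nuanced behavior" the comment mentions. The clamped version keeps the calendar day except on Feb 29.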
It seems government software sucks in every country.
5:38 never thought I'd show up in a video like this, that was probably an HLX scale, which runs Windows Embedded on an AMD Geode
6:00 I remember the "FILES" arg in MS-DOS. Glad I don't have to even think about it anymore.