The Dailies – December 5, 2018

Word of the Day

Swig (n./v., SWIG)

A big gulp of something, or to take a big gulp, like an anglerfish or a sailor on leave.

Gif of the Day

Tags: Animals; Dogs; Stairs; Unorthodox transportation methods; Four legs good, two legs better; Look ma, no feet?

Link of the Day

Why should you wait a few seconds before ejecting a flash drive?

A thorough, helpful answer to a common tech problem can be found over on MetaFilter:

From the computer's point of view, a USB flash drive appears to be a large array of 512-byte logical blocks (LBs), each one having a logical block address or LBA that's just an integer between 0 and some upper bound that depends on the size of the drive. The computer is able to read or write any one of these blocks at any time, just by handing its LBA to the drive and then reading or writing the associated 512 bytes of data. This is the same data addressing scheme that has long been used by hard drives, which USB flash drives are designed to emulate.
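To make the addressing scheme concrete, here is a minimal sketch of the computer's view of the drive: a flat array of 512-byte blocks indexed by LBA. The class name `BlockDevice` and its methods are made up for illustration; no real driver looks like this.

```python
BLOCK_SIZE = 512  # bytes per logical block, as described above


class BlockDevice:
    """Toy model of what the computer sees: a flat array of 512-byte blocks."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _check(self, lba):
        # Valid LBAs run from 0 up to a bound set by the drive's size.
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"LBA {lba} out of range")

    def read(self, lba):
        self._check(lba)
        start = lba * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write(self, lba, payload):
        self._check(lba)
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = lba * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = payload
```

In this model any block can be overwritten in place at any time, which is exactly the assumption the rest of the answer goes on to take apart.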

As you correctly point out, the computer does decide on the filesystem format: the choice of which particular set of LBAs to store any given file at is entirely the computer's, and when you tell the computer to defrag a drive, it will typically rewrite lots of LBs from lots of files to new LBAs so as to make any given file occupy LBs with a contiguous range of LBAs.

But that's not the whole story, because it turns out that the flash memory chips inside a USB flash drive can't actually write 512 bytes to any arbitrary logical block whenever the computer asks for that to happen. They simply don't work that way.

Internally, a flash memory write operation can only ever write zero-bits. The hardware is simply not capable of writing one-bits. The only way you get a one-bit out of a flash memory chip is by reading a bit that has never had a zero written to it since it was erased.

Erasure, for a flash memory chip, is not the same thing as writing. Erasure happens to a huge chunk of flash memory all at once; a typical size for that chunk, in a modern flash chip, is 8 megabytes. During erasure, every single bit inside that 8 megabyte erase block is set to the value 1.
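The write-zeroes-only rule can be modeled as a bitwise AND: programming can clear bits but never set them, and only an erase brings them back to 1. This is a rough sketch with a tiny block standing in for a real erase block; the function names `erase` and `program` are illustrative, not a real flash API.

```python
ERASED = 0xFF  # every bit in an erased byte reads as 1


def erase(block: bytearray) -> None:
    """Erase the whole block at once: every bit becomes 1."""
    for i in range(len(block)):
        block[i] = ERASED


def program(block: bytearray, offset: int, data: bytes) -> None:
    """Program data into the block. Bits can only go from 1 to 0,
    so the stored result is the AND of the old and new contents."""
    for i, byte in enumerate(data):
        block[offset + i] &= byte
```

Programming `0x5A` into a freshly erased byte stores `0x5A`, but programming `0xA5` on top of it leaves `0x00` rather than `0xA5`, because the two values have no 1-bits in common. That is why a rewrite has to go somewhere else.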

So in order to write a 512-byte chunk of arbitrary data to some specified LBA, the flash drive's internal controller has to find a 512-byte block inside the flash memory chip that has not already been written since the 8MiB erase block containing it was last erased. If you write 512 bytes of arbitrary data to some LBA, and then re-write that same LBA with some other 512 bytes, the flash memory chip inside your USB drive does not see a request to re-write the same 512-byte region it wrote the first time. Instead, the internal controller chip finds another blank 512-byte region, and that's where the re-write actually goes.

And that means that the internal controller needs to keep track of a mapping between the LBAs the computer thinks it's reading and writing, and the real addresses inside the internal flash memory chips where the LB addressed by each of those LBAs is actually stored. The computer thinks it can just rewrite any LB to any LBA at any time, because that's how disk drives have always worked; it knows nothing about 8MiB erase blocks and the need for every write operation to hit a pre-erased region inside the flash memory chip.

The result of all these shenanigans is that writes progressively use up space inside the flash memory chip, even if they're all going to the same LBA from the computer's point of view. Every time a given LBA is rewritten, a 512-byte region inside the flash memory chip that used to hold its contents becomes "stale" and no longer addressable via any LBA over USB, as the controller's internal mapping is updated to map that LBA to the new spot where its data now lives.
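Here is a small sketch of that bookkeeping, assuming a drastically simplified controller that hands out erased pages in order. `ToyFTL` and its fields are invented for this example; real controllers are far more elaborate and their internals are not described in the thread.

```python
class ToyFTL:
    """Toy LBA-to-physical mapping: every write lands on a fresh erased page."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages  # None means "still erased"
        self.mapping = {}                # LBA -> physical page index
        self.stale = set()               # physical pages holding superseded data
        self.next_free = 0               # next erased page to hand out

    def write(self, lba, data):
        if self.next_free >= len(self.pages):
            raise RuntimeError("no erased pages left; garbage collection needed")
        phys = self.next_free
        self.next_free += 1
        self.pages[phys] = data
        old = self.mapping.get(lba)
        if old is not None:
            self.stale.add(old)          # old copy is no longer reachable by LBA
        self.mapping[lba] = phys

    def read(self, lba):
        return self.pages[self.mapping[lba]]
```

Writing the same LBA twice consumes two physical pages: the mapping points at the second, and the first is marked stale even though the computer thinks it only ever touched one block.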

And inevitably, at some point the flash memory is going to get full of erase blocks with a mixture of stale (superseded) and live (still addressable via some LBA) data, and a nice fresh 512 byte erased region to put the next write arriving over USB will simply not exist. And that's when the flash drive's internal controller has to run something very much like a defrag, where it copies all the still-live data out of some mixed stale+live erase block into a spare, clean erase block reserved for that purpose, then updates the LBA mappings, then erases what is now a completely stale block to make it able to accept more writes.

Now, in order to avoid gross amounts of data loss due to any unexpected power removal, the copy phase of that has to happen before the re-mapping phase and the re-mapping phase has to happen before the block-erase phase. But even with that ordering, if the power ends up getting cut right after the re-mapping is done but before the block erasure has had time to complete (erasing a block is relatively slow), then the chip can end up with a block that might well read as if erased but that has not in fact been fully erased. And when that block subsequently gets stuff written to it when the drive is next plugged in, some of the 1-bits inside it might be a bit marginal, and might actually read back as zeroes when the LB is read back out again, and that's data corruption.
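The copy, re-map, erase sequence from the last two paragraphs might look roughly like this. It is a sketch under loud assumptions: blocks are plain Python lists, each written page stores its own LBA next to the data, and the spare block is already erased and large enough. The function `garbage_collect` and the data layout are hypothetical.

```python
def garbage_collect(blocks, mapping, victim, spare):
    """Reclaim `victim` by moving its live pages into `spare`.

    blocks:  dict of block name -> list of pages; a page is None (erased)
             or a (lba, data) tuple.
    mapping: dict of lba -> (block name, page index) for the live copy.
    """
    # Phase 1: copy every still-live page into the spare block.
    moved = {}
    dest = 0
    for idx, page in enumerate(blocks[victim]):
        if page is None:
            continue
        lba, _data = page
        if mapping.get(lba) == (victim, idx):  # live only if the mapping points here
            blocks[spare][dest] = page
            moved[lba] = (spare, dest)
            dest += 1

    # Phase 2: re-map, only after all the copies exist.
    mapping.update(moved)

    # Phase 3: erase the now completely stale block. This is the slow step,
    # and the one that losing power can interrupt.
    blocks[victim] = [None] * len(blocks[victim])
```

If power is lost before phase 2 the old mapping still points at intact data, and if it is lost after phase 2 the new copies are already in place. The danger described above comes from phase 3 being cut short, which this toy model cannot show because its erase is instantaneous.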

Doing the recommended Safely Remove Hardware or Eject Device dance before physically removing the drive means that it will always have had enough time to complete any internal garbage collection and internal block erasure that it might still have been doing even after the computer has finished any given write operation. And that means you won't subsequently end up writing data back to partially erased memory, and that means it probably will come back out the same as it was when it went in, and that's a Good Thing.

Cool stuff. If you want to see other explanations, go check out the entire MetaFilter thread. And our thanks to user flabdablet for making this clear to us.

Tags: So you know; Technology; Safely remove hardware; Read and write; Don't break the universe; But what would happen if you removed the Comcast building from the Philly skyline?