Tuesday, May 7, 2013

The Quick Guide to GUIDs

http://betterexplained.com/articles/the-quick-guide-to-guids/

Our world is numbered. Books have ISBNs and products have barcodes. Cars have VINs, and even people have Social Security numbers.

Numbers help us reference items unambiguously. "John Smith" may be many people, but Social Security Number 123-45-6789 refers to one person exactly.

A GUID (globally unique identifier) is a bigger, badder version of this type of ID number. You may see the term UUID tossed about (universally unique identifier), a nitpicky word for those whose numbers are unique not only within the globe, but throughout the entire universe.

Any way you title it, GUIDs or UUIDs are just big, gigantic ID numbers.

The Problem With Counting

"We don't need no stinkin' GUIDs," you may be thinking between gulps of Top Ramen, "I'll just use regular numbers and start counting up from 1."

Sure, it sounds easy. Just start with ISBN #1 and add one for each new book. But problems arise:

  • Who does the counting? A central authority?
  • Who handles simultaneous requests and eliminates duplicates?
  • Can IDs be shared between products? Is Social Security Number 1 different from ISBN 1?
  • Can people guess what the next ID will be? How many IDs have been issued?

The problem with counting is that we want to create ID numbers without the management headache.

GUIDs to the Rescue

GUIDs are large, enormous numbers that are nearly guaranteed to be unique. They are usually 128 bits long and look like this in hexadecimal:

30dd879c-ee2f-11db-8314-0800200c9a66

The format is a well-defined sequence of 32 hex digits grouped into chunks of 8-4-4-4-12. This gives us 2^128 or about 10^38 numbers.
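
To make the format concrete, here is a small sketch that takes the example GUID above apart using Python's standard uuid module. The variable names are just for illustration.

```python
import uuid

# The example GUID from the text above.
guid = uuid.UUID("30dd879c-ee2f-11db-8314-0800200c9a66")

# Split the canonical text form into its dash-separated chunks.
groups = str(guid).split("-")
group_lengths = [len(g) for g in groups]   # hex digits per chunk
total_hex_digits = sum(group_lengths)
total_bits = total_hex_digits * 4          # each hex digit carries 4 bits

print(group_lengths)        # [8, 4, 4, 4, 12]
print(total_hex_digits)     # 32
print(total_bits)           # 128
print(len(str(2**128)))     # 39 decimal digits in the largest count
```

The same object also exposes the whole value as one integer (guid.int), which is always below 2^128.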

Here's the thinking behind GUIDs:

  • If you pick a huge random number (39 digits long), it's really unlikely that someone will pick the same one.
  • GUIDs are not tied to a product. A GUID can be used for people, cars, files, webpages, colors, anything. With regular registration numbers, you start counting at 1 and numbers can overlap. Social Security Number 123-45-6789 is different from ISBN 123456789 which is different from barcode 123456789. This isn't an issue with GUIDs.
  • It's up to the person reading the GUID to figure out the context of the GUID. There are so many GUIDs that you can use them to number everything and not run out.

GUIDs give you a unique serial number that can be used on any item in the universe.

The Great GUID Shortage

When learning about GUIDs, it feels like 39 measly digits aren't enough. Won't we run out if people get GUID-crazy, assigning them for everything from their pets to their favorite bubble gum flavor?

Let's see. Think about how big the Internet is: Google has billions of web pages in its index. Let's call it a trillion (10^12) for kicks. Think about every Wikipedia article, every news item on CNN, every product on Amazon, every blog post from any author. We can assign a GUID to each of these documents.

Now let's say everyone on Earth gets their own copy of the internet, to keep track of their stuff. Even crazier, let's say each person gets their own copy of the internet every second. How long can we go on?

Over a billion years.

Let me say that again. Each person gets a personal copy of the internet, every second, for a billion years.

It's a mind-boggling amount of items, and it's hard to get our heads around it. Trust me, we won't run out of GUIDs anytime soon. And if we do? We'll start using GUIDs with more digits.
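
If you'd rather check the arithmetic than trust me, here is a rough back-of-the-envelope sketch. The trillion-document figure comes from the text above; the population of 7 billion and the 365-day year are assumptions made for the estimate.

```python
TOTAL_GUIDS = 2**128                    # every possible 128-bit value
DOCS_PER_INTERNET = 10**12              # the "trillion documents" figure from the text
PEOPLE = 7 * 10**9                      # assumption: roughly 7 billion people
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: ignore leap years

# Everyone gets a fresh copy of the internet every second.
guids_per_second = DOCS_PER_INTERNET * PEOPLE
years = TOTAL_GUIDS // (guids_per_second * SECONDS_PER_YEAR)

print(f"{years:,} years")   # on the order of 1.5 billion
```

Change the assumptions and the answer moves, but it stays comfortably "over a billion years."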

Using GUIDs

If you want to create GUIDs, try the

There are several ways to create GUIDs (RFC 4122 describes the conventions), but you want to avoid that mess and use a library. The general types of GUIDs are:

  • Random: Just use the system's random-number generator to create a 128-bit number.
  • Time-based: Create a GUID based on the current time.
  • Hardware-based: Make a GUID with certain portions based on hardware features, such as the MAC address of a network card. This isn't great because the GUID isn't "anonymous" and can be partially traced to the machine that created it.
  • Content-based (MD5 or SHA-1 hash of data): Create a GUID based on a hash of the file contents. Files with the same contents will get the same GUID. You can also seed the hash with a unique namespace (like your URL).

You can mix and match the techniques above. If you want duplicate files to have the same GUID, then use GUIDs based on the contents. If you want GUIDs to be unique, even if the contents are the same, then create them randomly or with a combination of file contents and a random number.
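
As one example of "use a library," here is a minimal sketch with Python's standard uuid module, which follows the RFC 4122 conventions. The namespace URL and the sample strings are made up for the demonstration.

```python
import uuid

# Random: version 4 GUIDs come from the system's random-number generator.
random_guid = uuid.uuid4()

# Time-based plus hardware-based: version 1 mixes a timestamp with a node ID,
# which is usually the network card's MAC address.
time_guid = uuid.uuid1()

# Content-based: version 5 hashes a namespace plus a name with SHA-1
# (uuid.uuid3 does the same with MD5). This namespace URL is hypothetical.
namespace = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.com/files")
guid_a = uuid.uuid5(namespace, "hello world")
guid_b = uuid.uuid5(namespace, "hello world")
guid_c = uuid.uuid5(namespace, "hello world!")

print(random_guid.version, time_guid.version, guid_a.version)  # 4 1 5
print(guid_a == guid_b)   # True: same contents, same GUID
print(guid_a == guid_c)   # False: different contents
```

Here the "contents" are short strings; for real files you would feed in whatever representation of the contents you want duplicates to share.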

GUID Examples

Here are a few things you can do with GUIDs:

The Tradeoffs with GUIDs

Like most things in life, GUIDs have benefits and drawbacks. Weigh the features to see if they make sense:

Pros:

  • No central authority: You avoid the need for management, but can't keep track of what's been assigned. A compromise is to generate GUIDs internally and then hand them out.
  • Easily combined: You can merge GUIDs from different data sources very easily with a low chance of conflict.

Cons:

  • Appear random: Users cannot easily guess the ID for an object they don't know. This is good for security, difficult for debugging.
  • GUID overhead: GUIDs are an example of the time-space tradeoff. You save time in merging but have to use space to store the large (16-byte) GUID. It may not make sense to have a 16-byte GUID keeping track of a 4-byte item in your database.
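
To put a number on the overhead, this short sketch compares the raw size of a GUID with its usual text form, again using Python's standard uuid module. How your database actually stores it is a separate question.

```python
import uuid

guid = uuid.UUID("30dd879c-ee2f-11db-8314-0800200c9a66")

binary_size = len(guid.bytes)   # raw 128-bit value
text_size = len(str(guid))      # 32 hex digits plus 4 dashes

print(binary_size)   # 16
print(text_size)     # 36
```

Storing GUIDs as text more than doubles the cost, so a binary column is worth considering when space matters.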

GUIDs are not a GUARantee

There's one giant caveat for GUIDs: collisions are still possible.

First, the birthday paradox tells us that the chance of a collision grows as more GUIDs are used. It's very, very unlikely that any two GUIDs will collide, but every new GUID has to avoid all the ones already assigned, so the odds creep up with each one handed out.
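
Here is a rough sketch of how small that chance is, using the standard birthday approximation p ≈ 1 - e^(-n(n-1)/2N). It assumes purely random (version 4) GUIDs, which carry 122 random bits because 6 of the 128 bits are fixed. The function name is just for illustration.

```python
import math

def collision_probability(n, random_bits=122):
    """Approximate chance that at least two of n random IDs collide."""
    space = 2 ** random_bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))

# A trillion random GUIDs: still a vanishingly small chance of any collision.
print(collision_probability(10**12))

# Around 2^61 GUIDs, the chance finally becomes substantial (close to 40%).
print(collision_probability(2**61))
```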

Second, a malicious user could try hijacking GUIDs that they know will be used (assuming users can assign their own GUIDs), or resubmitting different content to a previous GUID (submitting file A under the hash of file B).

If you are writing software, program defensively and detect cases where the GUID already exists. Give the user an error or, even better, recover: create a new GUID on the server side and try again. GUIDs are great, but they aren't a magic bullet.
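
A minimal sketch of that defensive pattern follows. The function store_with_new_guid, the DuplicateGuidError exception, and the plain dict standing in for a database are all hypothetical; a real system would rely on something like a unique constraint so the check and the write happen atomically.

```python
import uuid

class DuplicateGuidError(Exception):
    """Raised when no unused GUID could be found (hypothetical error type)."""

def store_with_new_guid(store, item, make_guid=uuid.uuid4, max_attempts=3):
    """Save item under a fresh GUID, retrying if the GUID is already taken."""
    for _ in range(max_attempts):
        guid = make_guid()          # generated on the server side
        if guid not in store:       # detect the (unlikely) collision
            store[guid] = item
            return guid
    raise DuplicateGuidError(f"no free GUID after {max_attempts} attempts")

# Usage: force a collision to show the retry path.
store = {}
taken = uuid.UUID(int=1)
candidates = iter([taken, taken, uuid.UUID(int=2)])
first = store_with_new_guid(store, "file A", make_guid=lambda: next(candidates))
second = store_with_new_guid(store, "file B", make_guid=lambda: next(candidates))
print(first == taken, second == uuid.UUID(int=2))   # True True
```

The make_guid parameter exists only so the collision can be simulated; in normal use the default random generator is enough.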

As always, we're never done learning. Read more about GUIDs here:


IPv6 (which is very similar to GUIDs, since it's also 128 bits in length) is capable of assigning approx. 1000 addresses per square meter of surface on Earth. They also say that every grain of sand in the Sahara could have its own IP address, but the question is still open whether there'd be any left for the rest of us.

Remember IPv4, which should allow almost every person on Earth (6-7 billion and growing fast) to have an address, but with everyone having a dozen gadgets on them this number seems less than enough. Of course this is kind of an overstatement, but it has some truth in it. To date almost half of the IPv4 addresses have been used, and at the current rate we are running out of them exponentially!

In contrast with GUIDs, IPv6 will have a central authority managing the addresses (or at least address ranges), so collisions are out of the question, and still we could have the same number of both IDs. Though I wouldn't assign an address to all my teddy bears and goldfish; it'd be a challenge for them to interpret all the protocols.

Perhaps you should write an article about IPv6 too. It's at least as interesting as GUIDs, and I like your style much better than my own.
