After I made the WP_MonsterID WordPress plugin, which creates a random monster avatar for each commenter from an assortment of parts (based on other people’s code), fruityoaty asked, “This looks nifty, but how many monster images are available for assigning?”

I’d been meaning to calculate this anyway so I did the math and posted it in the comments:

The current part totals are: 17 eyes, 8 hairs, 12 mouths, 15 bodies, 10 legs. That is 244,800 possible combinations. In addition, the body color can range between 20-235 for red, green, and blue. If we count that as 20 distinguishable values each for red, green, and blue, that adds 8,000 possible colors and brings the unique monster count to about 2 billion. The only problem is that the algorithm uses only the first 6 digits of the md5 hash of the email, which provides only about 16 million possible combinations. So I guess the answer is 16 million monsters currently, and in the next release I’ll use a few more digits of the hash and increase it to a billion or so.

Edit: I did change this so in version 0.3 and later there should be a couple billion possibilities.
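For anyone who wants to check the arithmetic, here is a small Python sketch of the counts quoted above. The variable names are mine, and the "20 distinguishable values" per color channel is the same rough assumption made in the comment.

```python
from math import prod

# Part counts quoted in the comment above.
parts = {"eyes": 17, "hair": 8, "mouths": 12, "bodies": 15, "legs": 10}

shape_combinations = prod(parts.values())
# Assumption from the post: about 20 distinguishable levels per color channel.
color_combinations = 20 ** 3
total_monsters = shape_combinations * color_combinations

# Six hexadecimal digits of an md5 hash can take only 16**6 distinct values,
# which caps how many different monsters can actually be produced.
hash_values = 16 ** 6

print(f"{shape_combinations:,} part combinations")
print(f"{total_monsters:,} monsters including color")
print(f"{hash_values:,} values from 6 hex digits")
print(f"upper bound on distinct monsters: {min(total_monsters, hash_values):,}")
```

The last line is only an upper bound: it says how many distinct inputs the generator can receive, not that every one of them draws a different picture.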

Calculating this got me wondering how many unique users it would take before there was likely to be a duplicate monster. For two users it was easy (1 out of 2 billion), but as the number of users increased, things got messy, since each new monster could match any of the prior monsters. Luckily, I remembered enough of my stats class to google for something on calculating the chance of people in a group sharing a birthday.

If you’ve never heard of this problem, stop and take a quick guess at how many people you think it would take for the odds of two people sharing a birthday to be better than 50%. Or, as my statistics professor put it, “There are twenty-five people in the room. Will you bet me that no one shares the same birthday?”

…

…

Guessed?

Now, I know just enough statistics to know that betting against a statistics professor is a bad idea, but I have to say that, at the time, I thought it would have been a fairly good bet. It turns out that I, like most people who aren’t statistics professors, underestimated the chance of any two people in a group sharing the same birthday. In fact, if there are 23 people in a room, there is a greater than 50% chance that at least two will share a birthday. If there are 47 people in a room, there is a 95% chance that at least one pair shares a birthday. The probability climbs this quickly because, like the monsters, each person added to the room can match any of the previous people (the 5th person can match person 1, 2, 3, or 4; the 25th person can match person 1, 2, 3, 4, 5, 6, …, 24; and so on).
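The "each new person can match anyone already there" reasoning can be turned into an exact calculation. The Python sketch below multiplies together the chance that each arriving person misses every birthday already taken. It assumes 365 equally likely birthdays and ignores leap years, and the function name is my own.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact chance that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # more people than days forces a match
    all_distinct = 1.0
    for already_present in range(people):
        # The next person must avoid every birthday already taken.
        all_distinct *= (days - already_present) / days
    return 1.0 - all_distinct


for group_size in (5, 22, 23, 25, 47):
    print(group_size, round(shared_birthday_probability(group_size), 3))
```

Under these assumptions, 22 people come in just under even odds and 23 just over, which is where the usual "23 people" answer comes from.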

All this was interesting for understanding the problem but didn’t really get me any closer to finding the probability. Luckily, Wikipedia provides an approximation for the number of people $n$ needed to reach a given probability of overlap $p$:

$$n(p) \approx \sqrt{2 \cdot 365 \cdot \ln\left(\frac{1}{1-p}\right)}$$

Substituting 2 billion for 365 gives the following probabilities of overlap:

Even with 2 billion possible monsters, there is still a 50% chance of overlap with only 52,000 monsters and a 1 in 10 chance of overlap with only 20,000 monsters. Most unintuitive to me is that there’s a 99% chance of overlap with only 135,000 monsters. The chances of an overlap really do pile up as the number of monsters already present grows. On the plus side, most normal-sized blogs should be safe from monster overlap, with only a 0.1% chance of overlap even with 2,000 commenters.

So what does all this mean? Well, besides not getting suckered in any birthday betting, it’s a good reminder to be careful about assuming uniqueness within a group just because the chance of any single match is small. For example, if some application assigned each user a random 4-digit key (10,000 possible combinations), there would be a greater than 50% chance of overlap after only about 1% of the keys (roughly 118 users) had been assigned.

If anyone feels like messing around with the calculations themselves, here’s the function in R to calculate the minimum number of assignments needed to reach a certain probability of overlap, given a total number of possible combinations. I’m sure it would be trivial to convert to any other language. Note that `log` is the natural log, not log10. `number_assignments=function(total_number,probability_overlap){sqrt(2*total_number*log(1/(1-probability_overlap)))}`
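As one example of such a conversion, here is the same approximation as a Python sketch. It keeps the R function's name and arguments, and the loop simply re-derives the figures quoted earlier in the post using the rounded 2 billion total.

```python
from math import log, sqrt


def number_assignments(total_number: float, probability_overlap: float) -> float:
    """Approximate number of assignments needed to reach a given chance of overlap.

    Mirrors the R one-liner; math.log is the natural logarithm.
    """
    return sqrt(2 * total_number * log(1 / (1 - probability_overlap)))


monsters = 2_000_000_000
for p in (0.001, 0.1, 0.5, 0.99):
    needed = number_assignments(monsters, p)
    print(f"{p:>6.1%} chance of overlap at about {needed:,.0f} commenters")

print(f"4-digit keys, 50%: about {number_assignments(10_000, 0.5):.1f} users")
```

Because this is an approximation, the results are best read as rough thresholds, not exact counts.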

fruityoaty | 25-Jan-07 at 10:50 am

Wow, that’s such a thorough answer. Excellent. Thanks.

ScottS-M | 25-Jan-07 at 3:42 pm

Maybe a bit too thorough but it was pretty interesting once I started looking into it.

Jeff A | 27-Jan-07 at 3:15 am

Hmm, that made my head hurt! But it is interesting though.

Now for my next question. I am very new to PHP, so I know it would be impossible for me, but how hard would it be to set up a check routine that makes sure there won’t be a duplicate by, say, changing part of the substring to something else and then creating the monster?

The more I look at this, though, I realize that you would have to create a table in the DB to handle this, or it would probably slow your site some once you had a lot of commenters.

ScottS-M | 27-Jan-07 at 12:03 pm

@Jeff A

That is a good idea to prevent overlaps. Unfortunately it might not work with the code as is.

Right now the code makes some random characters out of the email address let’s say “lkjhdfg”. It then checks for

`lkjhdfg.png`

and if it exists, returns the file path; otherwise it generates a new monster with randomness generated by “lkjhdfg” and saves it to that file path. So there are two chances for overlap: one if the random characters generated from the email overlap, and one if the monster-generating randomness overlaps. I think it would be pretty hard to keep track of both.

On the good side, I believe this makes it impossible for someone to obtain email addresses from the pictures. Also, I think most blogs (unless they have thousands of commenters) should have a pretty low (less than 1 in 1,000) chance of overlap.
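The check-then-generate flow described above might look roughly like the following Python sketch. This is not the plugin's PHP code. The function name, the use of the first six md5 hex digits as the key, and the stand-in "drawing" step that just writes the chosen part numbers to the file are all assumptions made for illustration.

```python
import hashlib
import random
import tempfile
from pathlib import Path

# Part counts from the post; how the plugin really picks parts is not shown there.
PART_COUNTS = (("eyes", 17), ("hair", 8), ("mouths", 12), ("bodies", 15), ("legs", 10))


def monster_path(email: str, cache_dir: Path, digits: int = 6) -> Path:
    """Return the cached avatar path for an email, creating the file if needed."""
    # Stage 1 (hypothetical): derive a short key from the email.
    key = hashlib.md5(email.encode("utf-8")).hexdigest()[:digits]
    path = cache_dir / f"{key}.png"
    if not path.exists():
        # Stage 2: seed the part choices from the key so the same key
        # always produces the same monster.
        rng = random.Random(key)
        chosen = {name: rng.randrange(count) for name, count in PART_COUNTS}
        # Placeholder for real image drawing: store the chosen part numbers.
        path.write_bytes(repr(chosen).encode("utf-8"))
    return path


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = monster_path("someone@example.com", cache)
    again = monster_path("someone@example.com", cache)
    print(first.name, first == again)
```

The two stages in the sketch correspond to the two places an overlap can happen: two emails can share a key, and two different keys can still pick the same parts.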

icoguo | 07-Feb-07 at 8:04 am

ｔｅｓｔ

reading_is_dangerous | 15-Feb-07 at 9:33 pm

…what do you think: Is that equation included in Mother Nature’s provisions for unique individuals? At any level?

ScottS-M | 16-Feb-07 at 12:20 am

@reading

That’s an interesting point. I hadn’t really connected it with fingerprints or DNA sequences. I think a conservative estimate of the number of genes in the human genome is 20,000. If even 100 of those had 2 distinct possibilities then there would be 10^30 possible combinations. That puts my 10^9 monsters (and the world population) to shame even with the shared birthday problem. Running it through the estimate it looks like there would be a .01% chance of overlap with 10 trillion people. Just back of the napkin calculations but still fun to think about.
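A quick Python sketch of that back-of-the-napkin estimate, using the same square-root approximation as the post. The variable names are mine, and the "100 genes with 2 variants each" figure is the rough assumption from the comment, not real genetics.

```python
from math import log1p, sqrt

combinations = 2 ** 100        # 100 genes with 2 variants each, on the order of 10^30
probability_overlap = 0.0001   # a .01% chance

# Same approximation as the R function; -log1p(-p) equals log(1 / (1 - p))
# but keeps its precision when p is tiny.
people = sqrt(2 * combinations * -log1p(-probability_overlap))

print(f"{combinations:.3g} combinations")
print(f"about {people:.2e} people for a {probability_overlap:.2%} chance of overlap")
```

The result lands in the low tens of trillions, the same order of magnitude as the figure quoted above.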

Yardsnacker | 05-Jun-08 at 4:42 am

hi

Jonas | 08-Jun-08 at 6:09 pm

I’m just another tester.

Bjorn | 26-Jun-08 at 9:36 am

Just testing, this looks pretty nifty. Oh, and the birthday sharing things is one of my favourites ;)

Saarthak | 10-Aug-08 at 6:36 pm

Lol, this is fascinating. You’re a genius..

If I use this plugin in my blog, will the previously posted comments automatically change to these avatars?

ScottS-M | 10-Aug-08 at 10:12 pm

@Saarthak

Glad you like it. It should work fine with previous comments.

t e s t a g a i n | 08-Oct-08 at 7:57 pm

i l i k e s p a c e s .

i l i k e m o n s t e r s .

i l i k e s p a c e m o n s t e r s .

Dan | 08-Jan-09 at 6:06 am

what does my inner monster look like?

Dan | 08-Jan-09 at 6:08 am

This monster reminds me of the onion flavored bagel character at Fred Meyer stores in the Northwest USA.

Host | 04-Apr-09 at 2:44 pm

Thanks So Much

Mario Da Costa e Sil | 17-Jul-09 at 12:30 pm

I’ll get my monster just filling this? Anyway, what a wonderful idea! thx!

Mario Da Costa e Sil | 17-Jul-09 at 12:32 pm

Hey! what coincidence! I use glasses…

bucabay | 13-Oct-09 at 9:08 am

What is the chance that two people reading this post had stumbled upon the birthday paradox before? I did and that proves it for me.. lol.

comptonkid | 15-Nov-09 at 2:13 am

Oh and by the way I just love monsters…

LowRyderz | 25-Nov-09 at 3:50 am

You took this idea and ran away with it. Power to you. Very smart and ingenious.

Sebastian Wilson | 12-Jun-10 at 6:30 pm

Hi! I think I’ve got an idea to help you solve the problem of the uniqueness, or at least to reduce the probabilities of overlapping.

If you assign numbers to the possible characters you can enter in an email address, say 0=0, 1=1, …, 9=9, a=10, b=11, …, y=34, z=35, @=36, .=37, _=38, -=39, you can convert any email address to a unique number using base 40 (like base 2, 8, 10 or 16, but in this case with the number of different characters an email address can have).

You may also apply a similar procedure to define which number goes with each monster, and then you get which monster goes with each email address. Of course, because there are more possible email addresses than monster combinations, you could use a modulo operator…

hope it helps!
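A rough Python sketch of this suggestion, for illustration only. The 40-character alphabet follows the mapping in the comment, the monster total is the count worked out in the post, and the function names are hypothetical.

```python
# 40 symbols with values 0..39, in the order given in the comment.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz@._-"
BASE = len(ALPHABET)

# Part and color count from the post: about 2 billion monsters.
TOTAL_MONSTERS = 17 * 8 * 12 * 15 * 10 * 20 ** 3


def email_to_number(email: str) -> int:
    """Read an email address as a base-40 number, one digit per character."""
    number = 0
    for char in email.lower():
        # str.index raises ValueError for characters outside the alphabet.
        number = number * BASE + ALPHABET.index(char)
    return number


def monster_index(email: str) -> int:
    """Fold the (possibly huge) email number into the available monsters."""
    return email_to_number(email) % TOTAL_MONSTERS


print(email_to_number("a@"), monster_index("someone@example.com"))
```

Two caveats with this sketch: addresses that differ only by leading `0` characters map to the same number, and the modulo step still folds every address into about 2 billion slots, so the overlap odds discussed in the post would presumably still apply.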

uhm | 27-Oct-10 at 8:04 am

I just wanna see my monster!

uhm | 27-Oct-10 at 8:06 am

So i got like a giant blue sad squid, with bat-like wings, a tail (with fur?) and some curly hair.

Wonderful.

lol

Neilandio | 18-May-11 at 1:45 pm

just trying my monster id

Daniel Miller | 30-Jun-11 at 2:31 pm

I hate statistics, but i love monsters!

MoSka | 25-Jul-11 at 2:44 pm

Great Job!

Vilius | 29-Nov-11 at 11:07 am

and.. that’s me

pimbo | 28-Jan-12 at 5:16 am

nice!

Senzahl | 01-Mar-12 at 2:30 pm

mirror, mirror

anita cripps | 13-Mar-12 at 3:44 pm

I want a monster!!!

anita cripps | 13-Mar-12 at 3:45 pm

monster

anita cripps | 13-Mar-12 at 3:45 pm

monster!!!

Анатолий Ухванов | 01-Jun-12 at 6:20 pm

I heard, monsters are given for free here…

Give me one too!

Анатолий Ухванов | 01-Jun-12 at 6:21 pm

ugh! What a MONSTER! :)

so cool | 15-Apr-13 at 4:29 am

just testing!

so cool | 15-Apr-13 at 4:30 am

just testing

so cool | 15-Apr-13 at 4:33 am

this is neat!

Whats mine | 21-Jun-13 at 4:45 am

Test

Whats mine | 21-Jun-13 at 4:46 am

Test 2

Bigue Nique | 10-Oct-13 at 1:02 am

Better know what your unique monster look like!

what does it looks | 21-Nov-13 at 9:21 am

Hello

Shivanand | 28-Dec-13 at 9:37 am

I commented just to see what my monster looks like :D :)

Warren | 29-Aug-14 at 1:03 am

Just testing …cool like it

spyrius | 23-Nov-14 at 5:01 pm

nice monsters :B

Happy Derp | 28-Oct-15 at 7:51 am

Testing to see my monsterid :D

edit. Hahahaha nice. Almost poro with moustache :D blue poro lol

zikey | 20-Jan-17 at 2:02 pm

Are there any cases where two unique hashes can produce the same icon? Basically, a situation where `7d2e` and `88d9` would produce the same icon.

I’d like to avoid those situations completely and have the algorithm be as strong as the hash algorithm used.

The icon generation should use all bits of the hash, and even a single-bit difference should produce a unique result, even if it’s just a color or shape/graphic.