
After I made the WP_MonsterID WordPress plugin, which creates a random monster avatar from an assortment of parts for each commenter (based on other people’s code), fruityoaty asked, “This looks nifty, but how many monster images are available for assigning?”
I’d been meaning to calculate this anyway, so I did the math and posted it in the comments:
The current part totals are 17 eyes, 8 hairs, 12 mouths, 15 bodies, and 10 legs, which gives 244,800 possible combinations. In addition, each of the red, green, and blue channels of the body color can range from 20 to 235. If we count that as 20 distinguishable values per channel, that adds 8,000 (20³) possible colors and brings the unique monster count to about 2 billion. The only problem is that the algorithm uses only the first 6 hexadecimal digits of the MD5 hash of the email address, which provide only 16⁶, or about 16 million, possible combinations. So I guess the answer is 16 million monsters currently, and in the next release I’ll use a few more digits of the hash and increase it to a billion or so. Edit: I did change this, so in version 0.3 and later there should be a couple billion possibilities.
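The arithmetic in that comment is easy to double-check. A minimal Python sketch, using only the part counts and the 20-levels-per-channel assumption from the comment (the variable names are illustrative):

```python
import math

# Part counts quoted in the comment above.
part_counts = {"eyes": 17, "hairs": 8, "mouths": 12, "bodies": 15, "legs": 10}

part_combinations = math.prod(part_counts.values())  # 17 * 8 * 12 * 15 * 10
color_combinations = 20 ** 3                          # 20 distinguishable levels each for R, G, B
total_monsters = part_combinations * color_combinations

# Six hexadecimal digits of an MD5 hash can take 16**6 distinct values.
six_hex_digit_values = 16 ** 6

print(f"{part_combinations:,}")     # 244,800
print(f"{color_combinations:,}")    # 8,000
print(f"{total_monsters:,}")        # 1,958,400,000
print(f"{six_hex_digit_values:,}")  # 16,777,216
```

The 6-digit hash, not the part count, was the bottleneck: about 16.8 million seeds versus nearly 2 billion possible monsters.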
Calculating this got me wondering how many unique users it would take before a duplicate monster became likely. For two users it was easy (1 in 2 billion), but as the number of users increased things got messy, since each new monster could match any of the prior monsters. Luckily I remembered enough of my stats class to google for something on calculating the chance of people in a group sharing a birthday.
If you’ve never heard of this problem, stop and take a quick guess at how many people it would take for the odds of two of them sharing a birthday to be better than 50%. Or, as my statistics professor put it, “There are twenty-five people in the room. Will you bet me that no one shares the same birthday?”
…
…
Guessed?
Now, I know just enough statistics to know that betting against a statistics professor is a bad idea, but I have to say that, at the time, I thought it would have been a fairly good bet. It turns out that I, like most non-statistics professors, underestimated the chance of any two people in a group sharing the same birthday. In fact, if there are 23 people in a room, there is a greater than 50% chance that at least two share a birthday. If there are 47 people in a room, there is a 95% chance that at least one pair shares a birthday. The probability climbs this quickly because, as with the monsters, each person added to the room can match any of the people already there (the 5th person can match persons 1, 2, 3, or 4; the 25th person can match any of persons 1 through 24; and so on).
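The “each new person can match anyone already in the room” reasoning turns directly into an exact calculation: multiply together the chances that each new person misses every birthday already taken, then subtract from one. A minimal Python sketch, assuming 365 equally likely birthdays (no leap days, no seasonal bias) and independent people; the function name is illustrative:

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a value,
    assuming each of `days` values is equally likely and independent."""
    no_match = 1.0
    # The (k+1)-th person must avoid the k birthdays already taken.
    for k in range(people):
        no_match *= (days - k) / days
    return 1.0 - no_match


print(round(shared_birthday_probability(23), 3))  # 0.507
print(round(shared_birthday_probability(47), 3))  # 0.955
```

With 22 people the chance is still just under 50%, and with 46 it is just under 95%, so 23 and 47 are the smallest rooms that cross those lines.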
All this was interesting for understanding the problem, but it didn’t really get me any closer to finding the probability. Luckily, Wikipedia provides an approximation for the number of people n needed to reach a given probability p of overlap when there are d equally likely days:
$$n(p; d) \approx \sqrt{2d \ln\left(\frac{1}{1-p}\right)}$$
Substituting 2 billion for 365 shows how quickly the probability of overlap grows with the number of monsters.
Even with 2 billion monsters, there is still a 50% chance of overlap with only 52,000 monsters and a 1 in 10 chance of overlap with only 20,000 monsters. Most unintuitive to me is that there’s a 99% chance of overlap with only 135,000 monsters. The chances of an overlap really do pile up as the number of existing monsters grows. On the plus side, most normal-sized blogs should be safe from monster overlap, with only a 0.1% chance of overlap even with 2,000 commenters.
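Solving the approximation n ≈ √(2d ln(1/(1 - p))) for p gives p ≈ 1 - exp(-n²/(2d)), which is handy for checking the numbers above. A short Python sketch (the function and constant names are illustrative), taking d as the 1,958,400,000 part-and-color combinations rather than a rounded 2 billion:

```python
import math

TOTAL_MONSTERS = 17 * 8 * 12 * 15 * 10 * 20 ** 3  # 1,958,400,000 part and color combinations


def overlap_probability(assigned: int, total: int) -> float:
    """Approximate chance of at least one repeat after `assigned` random picks
    from `total` equally likely values: 1 - exp(-n^2 / (2d))."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-assigned ** 2 / (2 * total))


for commenters in (2_000, 20_000, 52_000, 135_000):
    print(commenters, f"{overlap_probability(commenters, TOTAL_MONSTERS):.2%}")
```

It reports roughly 0.1%, 10%, 50%, and 99% for 2,000, 20,000, 52,000, and 135,000 monsters respectively.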
So what does all this mean? Well, besides not getting suckered into any birthday bets, it’s a good reminder to be careful about assuming uniqueness among a group just because the chance of any particular match is small. For example, suppose each user of some application were assigned a random 4-digit key (10,000 possible combinations). There would be a greater than 50% chance of overlap after only about 1% of the keys (around 120 users) had been assigned.
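A quick simulation makes the 4-digit-key example concrete. This sketch assumes keys are drawn uniformly and independently from 0000 to 9999 (modelled as the integers 0 to 9,999); the helper name and the seed are arbitrary. It records which user receives the first repeated key, and the median over many runs should land near the 50% point, a little under 120 users:

```python
import random
import statistics


def users_until_first_duplicate(key_space: int, rng: random.Random) -> int:
    """Hand out random keys until one repeats; return how many users got keys."""
    seen = set()
    while True:
        key = rng.randrange(key_space)
        if key in seen:
            return len(seen) + 1  # count the user who received the repeated key
        seen.add(key)


rng = random.Random(2007)  # fixed seed so the run is repeatable
trials = [users_until_first_duplicate(10_000, rng) for _ in range(2_000)]
print(statistics.median(trials))
```

Running many trials and taking the median avoids reading too much into any single lucky or unlucky run.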
If anyone feels like messing around with the calculations themselves, here’s the function in R to calculate the minimum number of assignments needed to reach a certain probability of overlap, given the total number of possible combinations. I’m sure it would be trivial to convert to any other language. Note that R’s log() is the natural log, not log10.

```r
number_assignments <- function(total_number, probability_overlap) {
  sqrt(2 * total_number * log(1 / (1 - probability_overlap)))  # approximate; not rounded
}
```
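As an example of that conversion, here is a Python port of the R function. It is a sketch that keeps the same name and formula; the only thing added is rounding up in the usage lines, since assignments come in whole numbers:

```python
import math


def number_assignments(total_number: float, probability_overlap: float) -> float:
    """Approximate number of random assignments needed before the chance of at
    least one overlap reaches probability_overlap (same formula as the R version)."""
    # math.log is the natural log, matching R's log().
    return math.sqrt(2 * total_number * math.log(1 / (1 - probability_overlap)))


print(math.ceil(number_assignments(365, 0.5)))     # 23
print(math.ceil(number_assignments(10_000, 0.5)))  # 118
```

The approximation can run slightly low for small cases: for 10,000 keys it gives 118, while the exact product first passes 50% at 119 users.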
fruityoaty | 25-Jan-07 at 10:50 am
Wow, that’s such a thorough answer. Excellent. Thanks.
ScottS-M | 25-Jan-07 at 3:42 pm
Maybe a bit too thorough but it was pretty interesting once I started looking into it.
Jeff A | 27-Jan-07 at 3:15 am
Hmm, that made my head hurt! But it is interesting though.
Now for my next question. I am very new to PHP, so I know it would be impossible for me. But how hard would it be to set up a check routine that makes sure there won’t be a duplicate, say by changing part of the substring to something else and then creating the monster?
The more I look at this, though, the more I realize that you would have to create a table in the DB to handle this, or it would probably slow your site down once you had a lot of commenters.
ScottS-M | 27-Jan-07 at 12:03 pm
@Jeff A
That is a good idea to prevent overlaps. Unfortunately it might not work with the code as is.
Right now the code makes some random characters out of the email address, let’s say “lkjhdfg”. It then checks for lkjhdfg.png, and if it exists, returns the file path; otherwise it generates a new monster with randomness generated from “lkjhdfg” and saves it to that file path. So there are two chances for overlap: one if the random characters generated from two email addresses are the same, and one if the monster-generating randomness produces the same monster. I think it would be pretty hard to keep track of both.
On the good side, I believe this makes it impossible for someone to obtain email addresses from the pictures. Also, I think most blogs (unless they have thousands of commenters) should have a pretty low (less than 1 in 1,000) chance of overlap.
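To make the two overlap points concrete, here is a minimal Python sketch of the cache-or-generate flow described above. It is not the plugin’s PHP code: the function names are hypothetical, the seed is assumed to be the first 6 hex digits of the MD5 hash of the address (as in the post), a dictionary stands in for the check for an existing .png file, and Python’s random.Random stands in for whatever generator the plugin actually uses.

```python
import hashlib
import random

# Part counts from the post; purely illustrative here.
PART_COUNTS = {"eyes": 17, "hairs": 8, "mouths": 12, "bodies": 15, "legs": 10}


def monster_seed(email: str, digits: int = 6) -> str:
    """Hypothetical: first `digits` hex characters of the MD5 of the email."""
    return hashlib.md5(email.encode("utf-8")).hexdigest()[:digits]


def build_monster(seed: str) -> dict:
    """Hypothetical: pick parts and a body color using randomness derived only from `seed`."""
    rng = random.Random(seed)  # same seed string -> same sequence of choices
    monster = {part: rng.randrange(1, count + 1) for part, count in PART_COUNTS.items()}
    monster["color"] = tuple(rng.randint(20, 235) for _ in range(3))
    return monster


def monster_for(email: str, cache: dict) -> dict:
    """Return the cached monster for this email's seed, generating it on first use."""
    seed = monster_seed(email)
    if seed not in cache:  # stands in for checking whether "<seed>.png" already exists
        cache[seed] = build_monster(seed)
    return cache[seed]
```

Two addresses collide at the first step if their seeds match, and two different seeds collide at the second step if they happen to draw the same parts and color; only the seed, never the address, is stored.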
icoguo | 07-Feb-07 at 8:04 am
test
reading_is_dangerous | 15-Feb-07 at 9:33 pm
…what do you think: Is that equation included in Mother Nature’s provisions for unique individuals? At any level?
ScottS-M | 16-Feb-07 at 12:20 am
@reading
That’s an interesting point. I hadn’t really connected it with fingerprints or DNA sequences. I think a conservative estimate of the number of genes in the human genome is 20,000. If even 100 of those had 2 distinct variants, there would be 2^100, or about 10^30, possible combinations. That puts my 10^9 monsters (and the world population) to shame, even with the shared-birthday problem. Running it through the estimate, it looks like there would be less than a 0.01% chance of overlap with 10 trillion people. Just back-of-the-napkin calculations, but still fun to think about.
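The napkin math can be reproduced with the same approximation used in the post, p ≈ 1 - exp(-n²/(2d)). A sketch in Python, assuming (as the comment does) 100 genes with 2 variants each, so d = 2^100, and n = 10 trillion people; the variable names are illustrative:

```python
import math

genotype_combinations = 2 ** 100  # 100 genes with 2 variants each, about 1.27e30
people = 10 ** 13                 # 10 trillion

# p ≈ 1 - exp(-n^2 / (2d)); -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
p_overlap = -math.expm1(-people ** 2 / (2 * genotype_combinations))
print(f"{p_overlap:.4%}")  # roughly 0.004%
```

That comes out around 0.004%, comfortably under 0.01%.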
Yardsnacker | 05-Jun-08 at 4:42 am
hi
Jonas | 08-Jun-08 at 6:09 pm
I’m just another tester.
Bjorn | 26-Jun-08 at 9:36 am
Just testing, this looks pretty nifty. Oh, and the birthday sharing things is one of my favourites ;)
Saarthak | 10-Aug-08 at 6:36 pm
Lol, this is fascinating. You’re a genius.
If I use this plugin in my blog, will the previously posted comments automatically change to these avatars?
ScottS-M | 10-Aug-08 at 10:12 pm
@Saarthak
Glad you like it. It should work fine with previous comments.
t e s t a g a i n | 08-Oct-08 at 7:57 pm
i l i k e s p a c e s .
i l i k e m o n s t e r s .
i l i k e s p a c e m o n s t e r s .
Dan | 08-Jan-09 at 6:06 am
what does my inner monster look like?
Dan | 08-Jan-09 at 6:08 am
This monster reminds me of the onion flavored bagel character at Fred Meyer stores in the Northwest USA.
Host | 04-Apr-09 at 2:44 pm
Thanks So Much
Mario Da Costa e Sil | 17-Jul-09 at 12:30 pm
Will I get my monster just by filling this in? Anyway, what a wonderful idea! thx!
Mario Da Costa e Sil | 17-Jul-09 at 12:32 pm
Hey! what coincidence! I use glasses…
bucabay | 13-Oct-09 at 9:08 am
What is the chance that two people reading this post had stumbled upon the birthday paradox before? I had, and that proves it for me. lol.
comptonkid | 15-Nov-09 at 2:13 am
Oh, and by the way, I just love monsters…
LowRyderz | 25-Nov-09 at 3:50 am
You took this idea and ran away with it. Power to you. Very smart and ingenious.
Sebastian Wilson | 12-Jun-10 at 6:30 pm
Hi! I think I’ve got an idea to help you solve the problem of uniqueness, or at least to reduce the probability of overlapping.
If you assign numbers to the possible characters you can enter in an email address, say 0=0, 1=1, …, 9=9, a=10, b=11, …, y=34, z=35, @=36, .=37, _=38, -=39, you can convert any e-mail address to a unique number using base 40 (like base 2, 8, 10 or 16, but in this case with the number of different characters an e-mail address can have).
You may also apply a similar procedure to define which number goes with each monster, and then you get which monster goes with each e-mail address. Of course, because there are more possible email addresses than monster combinations, you could use a modulo operator…
hope it helps!
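Sebastian’s encoding can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the plugin: the names are hypothetical, addresses are lowercased first (the proposed alphabet has no capital letters), and characters outside the proposed alphabet are rejected. Once the number is reduced modulo the number of monsters, different addresses can again share a monster, so the birthday estimates in the post still apply.

```python
import string

# Digit values from the comment: 0-9 -> 0-9, a-z -> 10-35, then @ . _ - -> 36-39.
ALPHABET = string.digits + string.ascii_lowercase + "@._-"
DIGIT_VALUE = {ch: i for i, ch in enumerate(ALPHABET)}
BASE = len(ALPHABET)  # 40 symbols

TOTAL_MONSTERS = 1_958_400_000  # part and color combinations from the post


def email_to_number(email: str) -> int:
    """Read the address as a base-40 numeral, most significant character first."""
    value = 0
    for ch in email.lower():  # assumption: addresses are treated case-insensitively
        value = value * BASE + DIGIT_VALUE[ch]  # raises KeyError for characters outside the list
    return value


def monster_index(email: str) -> int:
    """Fold the (huge) address number into the range of available monsters."""
    return email_to_number(email) % TOTAL_MONSTERS
```

Python’s integers are arbitrary precision, so even long addresses convert exactly before the modulo step.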