Yesterday, I wrote about the World Cup pick’em site and doing some prediction “math” to figure out the likely group finishing positions of each team in a group. Today, I planned to rewrite parts of it to speed it up.
My naive approach was to use transactions to create the DB records needed so that a team could compute its group record, then roll those transactions back before moving on to the next possible set of game results. This was SLOW, and I knew it was going to be slow. I began the rewrite by changing the way the group record is calculated. Instead of relying solely on DB records, the method now accepts an optional array of these records, and only looks them up in the DB if that array isn’t passed in. This lets me create instances of models without saving them to the DB, and just pass those into the Team.group_record method. The change dropped the number of individual database accesses needed to get the group record from about 16 to 2. The savings are noticeable.
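Roughly, the optional-array idea looks like the sketch below. This is a simplified stand-in, not the site’s actual code: the Game struct, the games_from_db helper, the shape of the returned hash, and the scores are all assumptions made up for illustration, and plain Ruby objects take the place of the ActiveRecord models.

```ruby
# Hypothetical stand-in for the ActiveRecord game model (names are assumptions).
Game = Struct.new(:home, :away, :home_goals, :away_goals)

class Team
  attr_reader :name

  def initialize(name, stored_games)
    @name = name
    @stored_games = stored_games # stands in for rows already saved in the DB
  end

  # Stand-in for the DB query the real method would fall back to.
  def games_from_db
    @stored_games
  end

  # Uses the games passed in; only "hits the DB" when none are given.
  def group_record(games = nil)
    games ||= games_from_db
    record = { wins: 0, draws: 0, losses: 0, points: 0 }
    games.each do |g|
      next unless [g.home, g.away].include?(name)
      ours, theirs = g.home == name ? [g.home_goals, g.away_goals] : [g.away_goals, g.home_goals]
      if ours > theirs
        record[:wins] += 1
        record[:points] += 3
      elsif ours == theirs
        record[:draws] += 1
        record[:points] += 1
      else
        record[:losses] += 1
      end
    end
    record
  end
end

played = [Game.new("USA", "ENG", 1, 1)]                # illustrative score only
usa = Team.new("USA", played)
hypothetical = played + [Game.new("USA", "ALG", 2, 0)] # built in memory, never saved

puts usa.group_record[:points]               # 1
puts usa.group_record(hypothetical)[:points] # 4
```

The point of the default argument is that existing callers keep working unchanged, while the prediction code can hand in unsaved games and skip the lookups entirely.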
After making this change (and various other necessary changes based on it), I had the tools to rewrite the prediction methods. That is when I discovered a major logical problem with the way I had written it yesterday. It would generate all the possible scores 0..5 for both sides, but then use that same scoreline for BOTH games. This meant that I was only checking 6^2 = 36 possibilities, when there really are 6^4 = 1296 possible combinations of scorelines across the two games.
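A quick way to sanity-check those counts is to build both sets with Array#product. The variable names here are just for illustration:

```ruby
scores = (0..5).to_a

# Every [home, away] scoreline for a single game.
one_game = scores.product(scores)

# An independent scoreline for each of the two games.
two_games = one_game.product(one_game)

puts one_game.size  # 36
puts two_games.size # 1296
```

Reusing one scoreline for both games only ever visits the 36 entries where the two games are identical, which is a small slice of the 1296 real possibilities.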
Luckily I had stumbled across a neat collection of Ruby mixin methods for Arrays to do things like permutations, combinations, and (what I used) repeating permutations. I dropped this code into config/initializers/array.rb:
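class Array
  # All orderings of the array's elements.
  def permutations
    return [self] if size < 2
    perm = []
    each { |e| (self - [e]).permutations.each { |p| perm << ([e] + p) } }
    perm
  end

  # Yields every length-n sequence of elements, with repetition allowed.
  def rep_perm_block(n)
    if n < 0
      # nothing to yield for a negative length
    elsif n == 0
      yield([])
    else
      rep_perm_block(n - 1) do |x|
        each do |y|
          yield(x + [y])
        end
      end
    end
  end

  # Collects the sequences from rep_perm_block into an array.
  def rep_permutations(n)
    ret = []
    rep_perm_block(n) do |x|
      ret << x
    end
    ret
  end
end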
I had a bit of trouble converting from yield-style syntax to normal “build an array and return it” syntax, hence the helper method at the bottom. Rails automatically loads anything in config/initializers/, so this mixin is always loaded for any Rails environment. To generate all possible scorelines, I do this:
>> (0..5).to_a.rep_permutations(4)
=> [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 2], [0, 0, 0, 3],
[0, 0, 0, 4], [0, 0, 0, 5], [0, 0, 1, 0], ...
Then I just pair them up: each four-element array splits into two [x, y] pairs, one scoreline for each of the two games.
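The pairing step can be sketched with each_slice. To keep the snippet runnable without the initializer above, it swaps in Ruby’s built-in Array#repeated_permutation (available since 1.9.2) in place of rep_permutations. The built-in makes no promise about ordering, so the example only checks size and membership:

```ruby
# Built-in equivalent of (0..5).to_a.rep_permutations(4), swapped in so
# this runs without the Array mixin loaded.
flat = (0..5).to_a.repeated_permutation(4)

# Split each [a, b, c, d] into [[a, b], [c, d]]: one scoreline per game.
scorelines = flat.map { |s| s.each_slice(2).to_a }

puts scorelines.size                         # 1296
puts scorelines.include?([[2, 1], [0, 3]])   # true
```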
This results in some improved numbers for USA’s chances:
=> [{"USA"=>[0.969135802469136, 0.0308641975308642, 0.0, 0.0]},
{"ENG"=>[0.0308641975308642, 0.529320987654321, 0.266203703703704, 0.173611111111111]},
{"SVN"=>[0.0, 0.335648148148148, 0.259259259259259, 0.405092592592593]},
{"ALG"=>[0.0, 0.104166666666667, 0.474537037037037, 0.421296296296296]}]
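Each array holds the fraction of the 1296 scoreline combinations in which that team finishes 1st, 2nd, 3rd, or 4th. The sketch below shows one way such a tally could be computed. It is not the site’s code: the team names, starting points, and fixtures are made up, and ties on points are broken only by goal difference in the simulated games and then by name, which is a big simplification of the real tiebreakers.

```ruby
# Made-up group state for illustration; not the site's data.
START_POINTS = { "A" => 4, "B" => 2, "C" => 2, "D" => 1 }.freeze
FIXTURES = [%w[A B], %w[C D]].freeze

def finishing_chances(start, fixtures)
  counts = start.keys.to_h { |team| [team, [0, 0, 0, 0]] }
  total = 0

  (0..5).to_a.repeated_permutation(fixtures.size * 2) do |flat|
    points = start.dup
    goal_diff = Hash.new(0)

    # Apply one [home_goals, away_goals] pair to each fixture.
    fixtures.zip(flat.each_slice(2).to_a).each do |(home, away), (hg, ag)|
      goal_diff[home] += hg - ag
      goal_diff[away] += ag - hg
      if hg > ag
        points[home] += 3
      elsif hg < ag
        points[away] += 3
      else
        points[home] += 1
        points[away] += 1
      end
    end

    # Simplified ranking: points, then goal difference, then name.
    ranked = points.keys.sort_by { |t| [-points[t], -goal_diff[t], t] }
    ranked.each_with_index { |team, pos| counts[team][pos] += 1 }
    total += 1
  end

  # Convert counts into fractions of all simulated outcomes.
  counts.transform_values { |c| c.map { |n| n.to_f / total } }
end

chances = finishing_chances(START_POINTS, FIXTURES)
chances.each { |team, probs| puts "#{team}: #{probs.inspect}" }
```

Every scoreline is weighted equally here, so a 5-5 draw counts as much as a 1-0 win. That is the same assumption behind the numbers above, and it is worth keeping in mind when reading them.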
There are plenty more places this can be taken. For instance, when a team has an extremely low chance of finishing in a given spot, let’s say less than 2.5% or so, I could get a list of all the game scorelines that would produce that position, then find what they have in common so it can be displayed in English, e.g. “Egypt must win by 5 goals and Scotland must lose” or something like that. There’s a lot more to do on this application though, not just this prediction model…