Thursday, June 18, 2009

Cool Use of CTEs to create number table

Many developers have needed to create a numbers table for matching data to ranges, etc., and this article shows various ways to do it, including multiple uses of CTEs: http://www.projectdmx.com/tsql/tblnumbers.aspx

Poker DW: Stacking the Deck Part 2

Earlier we showed how to construct a list of all possible 5 card poker hands, and verified the result in Wikipedia. What I want to create is a table of those hands, all 2,598,960 of them, with their high- and low-hand ranks also. I want to be able to analyze the odds of one hand winning over another with n cards to come, which means that I'll need to create a temp copy of the deck, pull out the known cards from it, and make a hand with every card left and evaluate those hands against my table of all possible.

Now how will I represent the cards in the table of poker hands? The most obvious way is to create a five column key consisting of the 5 CardId values that make that hand. Question: does the order of those CardIds matter? Remember what we talked about last entry, the order should not matter. But we have to put them in some sort of order - if we have five columns, say CardId1, CardId2, CardId3, CardId4, and CardId5, something is going to have to go somewhere. Let's say that we arbitrarily enter the CardIds into the columns in no particular order - how will we now query them? Let's make a trivial example of querying for two cards. Our WHERE clause of such a query would look like:

WHERE CardId1 = @CardId1 AND CardId2 = @CardId2
OR CardId1 = @CardId2 AND CardId2 = @CardId1



We have to match every permutation of variables to columns. With three cards:

WHERE
CardId1 = @CardId1 AND CardId2 = @CardId2 AND CardId3 = @CardId3
OR CardId1 = @CardId1 AND CardId2 = @CardId3 AND CardId3 = @CardId2 OR CardId1 = @CardId2 AND CardId2 = @CardId1 AND CardId3 = @CardId3 OR CardId1 = @CardId2 AND CardId2 = @CardId3 AND CardId3 = @CardId1 OR CardId1 = @CardId3 AND CardId2 = @CardId1 AND CardId3 = @CardId2 OR CardId1 = @CardId3 AND CardId2 = @CardId2 AND CardId3 = @CardId1


Going back to our research on permutations, the number of permutations of n elements is n!, which is also equal to n(n + 1)/2. With five cards we're looking at a WHERE clause that is 5(5+1)/2 = 5*6/2 = 15 lines long. The coding for that isn't so bad (try not to make a mistake - you'll be matching 5 variable/column pairs per line for 15 lines, for a total of 75 equality checks), but think of how slowly that would perform! And that's just to evaluate one hand - imagine the gears grinding away to find all possible 5 card hands with two cards to come - if you're on the flop, and you want to evaluate your chances to the river, you have "47 choose 2" =
1081 possible outcomes.

What I came up with is a solution using prime numbers that I learned while studying Gödel's incompleteness theorems. We assign every card in the deck a unique prime number; the first card gets 2, the second card 3, all the way up to the last card, which gets prime number 239. Now what happens if we want to look at a two-card hand and match it to a table of all possible two-card hands? If we multiply the prime numbers corresponding to those cards, we will get a number that is unique to those two cards (the primes of any other two cards will result in a different number when multiplied). Obviously it doesn't matter which order the primes are multiplied, so we have just found the perfect primary key for our poker hands table. When we want to evaluate a hand, we multiply the primes corresponding to the cards and match the result to the primary key.

We have an updated Dim_Deck creation script that adds a "PrimeFactor" column to it. Now I'm working on a creating the table of all possible hands.

Poker DW: Stacking the Deck

Returning to the Poker Data Warehouse, I dove into the text-parsing process after setting up a sketch of an initial database design, with the idea of polishing out the design at a later date. I recently completed the parsing of a sample hand history, and man was it more work than I expected! Fortunately I love writing text-parsing routines (perhaps from early career work in merge-purge duplicate removal), so the 750 line stored procedure was more a labor of love than a tedious chore. Getting into the dirty details of the source data made me think more about the 30,000 ft overview also.


I created a representation of a 52 card deck of cards in the Poker DW, and I started thinking about how to evaluate 5 card poker hands (i.e., determining what a player had at the end of the hand). What I really want is to be able to evaluate the odds of making the best hand on the next card or the river, which would ultimately allow me to judge whether a player made the right decision. This result would be similar to the "% to win" stats that you see on TV.


After I created my deck of cards, I started playing around with representing a hand of 5 cards. How many possible 5 card hands are there? Easy - think of it like this. Take 1 card from the deck, there's 52 cards to choose from. Take another card, there's 51 to choose from. Keep picking until you have 5 cards in your hand, that leaves 52 * 51 * 50 * 49 * 48 = 311,875,200 possible 5 card hands.


The problem with this method is that I'm picking permutations of 5 card hands, rather than combinations. Let's reduce my example above to picking two cards rather than five. According to that math, there are 52 * 51 = 2,652 possible two card hands. Using the card deck created above, this query will return that count, 2652 rows:


;WITH Draw1 AS (
SELECT Card1 = CardId
FROM Dim_Deck
),
Draw2 AS (
SELECT
Card1,
Card2 = CardId
FROM Dim_Deck D2
JOIN Draw1 D1
ON D1.Card1 <> D2.CardId
)
SELECT COUNT(*) FROM Draw2



Note the use of the recursive CTE to create the second draw, Draw2. So let's say that I picked the five of clubs first, and the four of hearts second. That is one of the 2,652 possible events. But the reversal of that order is also one of the possible events (picking the four of hearts first, and the five of clubs second). But I really don't care which order the two cards come in (the permutation), I only care about the set of cards that results.


Looking at an even simpler example of a deck of 5 cards, ace to five, how many ways are there to pick two? Here's a simple matrix:



The code above will pick everything except the diagonal that shows pairs:




but what we really want is this:



And in order to get it, we change the "<>" operators to ">":


;WITH Draw1 AS (
SELECT Card1 = CardId
FROM Dim_Deck
),
Draw2 AS (
SELECT
Card1,
Card2 = CardId
FROM Dim_Deck D2
JOIN Draw1 D1
ON D1.Card1 > D2.CardId
)
SELECT COUNT(*) FROM Draw2

and we obtain the correct result, 1326 rows.

Monday, June 15, 2009

StackOverflow

I started answering questions on StackOverflow, and came across a couple of interesting sites. The first is Vyas code page, a small library of useful SQL code. The other is an add-on toolkit for SSMS.

Friday, June 12, 2009

T-SQL to Export Table Structure to a script

Good script for scripting out SQL tables via T-SQL rather than Enterprise Manager/Dev Studio: http://www.sqlservercentral.com/scripts/Miscellaneous/30730/