Wednesday, May 27, 2009

CRM Import - Importing Into Drop-Down Combobox Targets

During the initial attempt to load accounts into MS-CRM 4.0, my source file failed for columns that had drop-down combobox columns as targets. A little research led to this forum post, which advises using the GUIDs from the target system. I was a little misled by that post, since it actually applies to lookups, not drop-down combobox values. The solution I needed involved converting the source values to their AttributeValue equivalents in the StringMap table of the MS-CRM database.
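
A rough sketch of the lookup I mean (the attribute name below is only an example, and the StringMap column names should be verified against your own MSCRM database):

SELECT sm.Value,            -- display label shown in the drop-down combobox
       sm.AttributeValue    -- integer value the import file actually needs
FROM StringMap sm
WHERE sm.AttributeName = 'industrycode'   -- example picklist attribute
ORDER BY sm.AttributeValue

Replacing the labels in the source file with the matching AttributeValue integers let the import go through.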

Thursday, May 21, 2009

BCP out Temp Tables

Hit a little snag today trying to output, via BCP, the results of a stored procedure that created and dropped a local temp table. The spd would run fine in Query Analyzer, but when run from the DOS prompt I got an error saying the temp table didn't exist. Perhaps the problem is that BCP first runs the spd with SET FMTONLY ON to pick up the result's metadata, so the CREATE TABLE for the temp table never actually executes? Anyway, a little googling turned up the workaround of keeping the table in TempDb rather than creating it as a true temp table. That works as long as the table is not created and dropped within the spd that BCP calls.
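
A rough sketch of what that ends up looking like (all of the names here are made up):

-- Create the staging table in TempDb once, outside the stored procedure
CREATE TABLE TempDb.dbo.BcpResults (CustomerId INT, Amount MONEY)
GO
-- The spd fills and returns the staging table instead of creating
-- and dropping a #temp table of its own
CREATE PROCEDURE dbo.GetBcpResults
AS
BEGIN
    TRUNCATE TABLE TempDb.dbo.BcpResults

    INSERT INTO TempDb.dbo.BcpResults (CustomerId, Amount)
    SELECT CustomerId, Amount
    FROM dbo.Orders          -- hypothetical source table

    SELECT CustomerId, Amount
    FROM TempDb.dbo.BcpResults
END
GO
-- Then, from the command prompt, something like:
-- bcp "EXEC MyDatabase.dbo.GetBcpResults" queryout results.txt -c -T -S MyServer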

Monday, May 18, 2009

Case Study: Poker DW: Reporting Questions

After reviewing the entities and their relationships, we next want to look at some of the questions we might want to answer from the data in our DW. Some possible questions:
  • Which players will call a big checkraise on the flop with an overpair?
  • What is the actual expected outcome of reraising on the button with a suited connector and bluffing the flop?

Next: ???

Case Study: Poker DW: Entities

Continuing from the introduction, one of the first things we'll have to do in designing our Poker DW (Data Warehouse) is to identify all of the entities of interest. Taking a look at our sample hand history file (ignore the last two lines of HTML code; the hosting company stamped them when I uploaded the file), the first line starts with a sequence number, the format of the game, and the datetime stamp of when the hand took place. The second line indicates the name of the table, whether it is real or play money, and which seat is the dealer. The next ten or so lines tell us who is in which seat and what their stack size is (at the beginning of the hand), followed by a line for each player who posts a blind (small, big, or other). So up to now, we've seen such entities as Time, Table, Player, Seat, and Money (stack and pot size).

The next section begins with the header line "*** POCKET CARDS ***". Here we have such information as the hole cards dealt to the player, and all of the preflop action (fold, check, call, or raise). We can identify three more entities here: Betting Stage, Cards and Actions. The next section, "*** FLOP *** [10s 9d 3h]", contains the same entities, but this time we have community cards. At each step in these sections, we can calculate the pot size and stack sizes for each player. Two more sections, "Turn" and "River", provide similar info.

Special consideration should be given to the next section, "*** SHOW DOWN ***", as it will show us exactly what cards other players held during the hand, allowing us to "backfill" that info for earlier rounds of betting. This will help us answer some important questions in the hand histories. The final section, "*** SUMMARY ***", provides info such as the rake, the Hi hand (and Low if this is a hi/lo split game), and the final pot size (which we can use to verify our "running" pot size throughout the hand).

So let's summarize our entities and their relationships. Central to everything is the Hand. Hands occur at certain Times at a particular Table, which has Seats. Players make Actions with Money based on the Cards appearing at each Betting Stage.
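
To make that a little more concrete, here is one possible star-schema sketch. All of the table and column names below are my own placeholders, not the final design:

CREATE TABLE DimPlayer (PlayerKey INT IDENTITY PRIMARY KEY, PlayerName NVARCHAR(50))
CREATE TABLE DimTable  (TableKey  INT IDENTITY PRIMARY KEY, TableName NVARCHAR(50), IsPlayMoney BIT)
CREATE TABLE DimBettingStage (StageKey INT IDENTITY PRIMARY KEY, StageName VARCHAR(20))  -- Preflop, Flop, Turn, River, Showdown
CREATE TABLE DimCard   (CardKey   INT IDENTITY PRIMARY KEY, CardRank CHAR(2), CardSuit CHAR(1))
-- hole and community cards would tie hands/stages to DimCard, probably via a bridge table

-- One row per action by one player in one hand at one betting stage
CREATE TABLE FactAction (
    HandId         BIGINT,        -- sequence number from the hand history
    HandTime       DATETIME,
    TableKey       INT REFERENCES DimTable (TableKey),
    PlayerKey      INT REFERENCES DimPlayer (PlayerKey),
    SeatNumber     TINYINT,
    StageKey       INT REFERENCES DimBettingStage (StageKey),
    ActionType     VARCHAR(10),   -- post, fold, check, call, raise
    Amount         MONEY,         -- money this action puts into the pot
    PotSizeAfter   MONEY,         -- running pot size after the action
    StackSizeAfter MONEY          -- player's stack after the action
)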

Friday, May 15, 2009

Case Study: Data Warehouse for Poker (Intro)

A friend of mine plays a good deal of online poker, and wants to improve his game by studying the hands he has played. I suggested creating a data warehouse from the hand histories held in the text files that the app saves, and using that data to identify winning and losing trends in his game. This entry is the first in a series covering the steps we'll take to develop it.


Thursday, May 14, 2009

Grouping Datetimes to Identify Sessions

Let's say I have a record of events, each unique by a datetime "timestamp" marker, and that these events can be grouped into sessions: every event that occurs within a certain period of time of the preceding or subsequent event is considered part of the same session. For example, suppose we are examining cars driving past a traffic counter, where each passing car is recorded as an event, and we want to organize the events into sessions so that any two cars passing within 5 seconds of one another belong to the same session. The events "chain" together, so if four cars each pass within 2 seconds of the one before, the first and last cars may be more than 5 seconds apart, yet all four events are still part of the same session.

Now, if I only have the events, how do I create sessions around them?
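
Here's one way I might sketch it in T-SQL, flagging the session starts and then keeping a running count of them (the Events table and EventTime column are made-up names, and the threshold is the 5 seconds from the example):

-- A sketch only: assumes a table Events with a datetime column EventTime
WITH Ordered AS
(
    SELECT EventTime,
           ROW_NUMBER() OVER (ORDER BY EventTime) AS RowNum
    FROM Events
),
Flagged AS
(
    SELECT Cur.EventTime,
           Cur.RowNum,
           -- 1 marks the start of a new session: no preceding event within 5 seconds
           CASE WHEN DATEDIFF(SECOND, Prev.EventTime, Cur.EventTime) <= 5
                THEN 0 ELSE 1 END AS IsSessionStart
    FROM Ordered Cur
    LEFT JOIN Ordered Prev
        ON Prev.RowNum = Cur.RowNum - 1
)
SELECT F.EventTime,
       -- running total of session starts numbers the sessions in time order
       (SELECT SUM(F2.IsSessionStart)
        FROM Flagged F2
        WHERE F2.RowNum <= F.RowNum) AS SessionId
FROM Flagged F
ORDER BY F.EventTime

Every event whose gap from the previous event exceeds the threshold starts a new session; the running total of those flags becomes the session number, so chained events all end up in the same session.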

Wednesday, May 6, 2009

Another Version for Calculating Median

Joe Celko published a "history" of calculating the median in SQL, along with a final version that seems to work similarly to mine: http://www.simple-talk.com/sql/t-sql-programming/median-workbench/.
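
For comparison, a generic ROW_NUMBER-based median looks something like this (not necessarily the version from the article; the Sales table and Amount column are just placeholders):

SELECT AVG(1.0 * Amount) AS Median
FROM (
    SELECT Amount,
           ROW_NUMBER() OVER (ORDER BY Amount) AS RowNum,
           COUNT(*) OVER () AS Cnt
    FROM Sales
) AS X
WHERE RowNum IN ((Cnt + 1) / 2, (Cnt + 2) / 2)   -- one middle row for odd counts, two for even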

Tuesday, May 5, 2009

Querying Sys.Columns & Sys.Types

If you want to query a table's structure, including column names and data types, you have to join the catalog views Sys.Columns and Sys.Types. There are some caveats to this. If any of your columns are defined as nvarchar or as user-defined data types, you must be careful how you match rows in Sys.Types. When you add a user-defined data type, it is entered into Sys.Types with a reference to its native data type. In the AdventureWorks database, for example, the data type "AccountNumber" is defined as nvarchar and shows up in Sys.Types with system_type_id = 231, which points back to "nvarchar" (the row where system_type_id = 231 and user_type_id = 231).


Running the query:


SELECT
    Tb.Name,
    C.Name,
    Tp.Name
FROM Sys.Tables Tb
JOIN Sys.Schemas Sch
    ON Sch.Schema_Id = Tb.Schema_Id
JOIN Sys.Columns C
    ON C.Object_Id = Tb.Object_Id
JOIN Sys.Types Tp
    ON Tp.System_Type_Id = C.System_Type_Id
WHERE Tb.Name = 'Address'
ORDER BY Tb.Name, C.Name, Tp.Name



produces some surprising results: each nvarchar column in the table, 'AddressLine1' among them, comes back six times, each time paired with a different data type name.

Weird, huh? Why did 'AddressLine1' show up six times with six different data types? The reason is two-fold. First, 'AddressLine1' is defined as nvarchar(60), and our join matches Sys.Types on System_Type_Id alone, so the column also shows up under the 'sysname' data type (think of 'sysname' as Microsoft's built-in user-defined data type).


Second, take a look at the results of the query below. It shows that, including nvarchar itself, six different data types are based on nvarchar! That's why 'AddressLine1' showed up six times in the query above.

SELECT Name FROM Sys.Types Tp
WHERE System_Type_Id = 231


Name
-------------------
nvarchar
sysname
AccountNumber
Name
OrderNumber
Phone
(6 row(s) affected)


So let's change our query to join on this 'User_Type_Id' column instead:

SELECT
    Tb.Name,
    C.Name,
    Tp.Name
FROM Sys.Tables Tb
JOIN Sys.Schemas Sch
    ON Sch.Schema_Id = Tb.Schema_Id
JOIN Sys.Columns C
    ON C.Object_Id = Tb.Object_Id
JOIN Sys.Types Tp
    ON Tp.User_Type_Id = C.User_Type_Id
WHERE Tb.Name = 'Address'
ORDER BY Tb.Name, C.Name, Tp.Name

This produces the results we want: each column of the 'Address' table appears exactly once, matched to a single data type name.