Pages

Tuesday, September 17, 2013

How to: Load DATE Type Data by SSIS

We have discussed Implicit conversion from varchar to datetime by SQL Server, today I will show you how to take advantage of this kind of implicit conversion when loading date type data using SSIS. 

Problem

Every business needs time dimension to measure its profit or growth. Date-related columns can be birth date of persons, order dates or ship dates of products, prescription date, etc. However, date-related columns can be tricky to load because there seems always some data issues with them. For example, suppose we have a source file with PRESCRIPTION_DATE column in format of DD-MMM-YYYY  in source files. Some values are just empty or set as a dot (that might come from SAS) while the destination table defines this column as date type or datetime type.

How to load date-type columns?

Step 1. Define all date-related columns as VARCHAR in flat file connection manager and load all data  as VARCHAR first in Flat File Source.

Step 2.  Use derived column to clean up those dot values or other issues and to "transform" the data.


      Solution 1: Define a new derived column [Conv_DateColumn] with expression set as below:
 LEN(TRIM(PRESCRIPTION_DATE)) < 2 ? NULL(DT_DBDATE) : (DT_DBDATE)(SUBSTRING(PRESCRIPTION_DATE,1,2) + "-" + SUBSTRING(PRESCRIPTION_DATE,3,3) + "-" + SUBSTRING(PRESCRIPTION_DATE,6,4))

      Solution 2: Define a new derived column [Conv_DateColumn] as a (DT_STR,10,1252), in format of "YYYY-MM-DD".
             

Step 3. When you do Mappings for OLE DB Destination Editor, map the [Conv_DateColumn] to the dateColumn with date or datetime as column type.

That's it!

Magic Windows Functions: Row Number

ROW_NUMBER() returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

ROW_NUMBER ( )     OVER ( [ PARTITION BY value_expression , ... [ n ] ] order_by_clause )

Below is an example about usage:

SELECT Emp_Name, Company, Join_Date, Resigned_Date,
ROW_NUMBER() OVER(PARTITION BY Emp_Name, Company ORDER BY Join_Date ) AS Row
FROM Emp_Details
ORDER BY Emp_Name;


Another good example is to help delete duplicate records in a table:

 WITH CTE (Emp_Name, Company, DuplicateEmp)
AS
(
SELECT Emp_Name, Company,
ROW_NUMBER() OVER(PARTITION BY Emp_Name, Company ORDER BY Join_Date) AS DuplicateEmp
FROM Emp_Details
)
DELETE
FROM CTE
WHERE DuplicateEmp> 1
GO

Sunday, September 15, 2013

Implicit Conversion From Varchar to Datetime

Sometimes a simple little thing may bring us some sights. Today I'd like to share with you something related to implicit conversion in sql server.


Problem

A colleague brought me this scenario: when she tried to update some table by joining with other tables, She got the following error:

Msg 242, Level 16, State 3, Line 1
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
The statement has been terminated.

So the action to work around this is to change data type of the related column from varchar(10) to datetime by just using ALTER COLUMN function. However, another error jumped out:

Msg 241, Level 16, State 1, Line 1
Conversion failed when converting date and/or time from character string.
The statement has been terminated.


What? This is kind of common practices for me as well. What did the error come from?

So I went ahead to dig into the date column that was defined as varchar(10). The moment I saw the data, I got some hint. In that column, the date was displayed as "DD-MM-YYYY". 

Solution

After changing the format from "DD-MM-YYYY" to "YYYY-MM-DD" by the following script:

UPDATE schemaName.tableName
SET datecolumn = SUBSTRING(datecolumn,7,4)+'-'SUBSTRING(datecolumn,4,2)+'-' +SUBSTRING(datecolumn,1,2)

there is no more problem to use ALTER COLUMN to update that column's data type from Varchar to Datetime. 


Conclusion

 As a summary, SQL Server only allows implicit conversion from Varchar to Datetime when the date format is "YYYY-MM-DD". If you want to learn more about implicit/explicit data conversion, please read technet's article: Data Type Conversion.

Monday, September 9, 2013

How to: Install PowerPivot and Data Mining Add-ins for Excel 2010

For data analysts and BI developers, there are two add-ins for Excel 2010 that might be interesting to be installed for a tryout:
  • PowerPivot add-in for Excel 2010
  • Data mining add-in for Excel 2010

Where to download?

Microsoft SQL Server 2012 SP1 PowerPivot for Microsoft Excel 2010 can be downloaded at
http://www.microsoft.com/en-us/download/details.aspx?id=29074

Moreover, SQL Server 2012 Data Mining Add-ins for Office 2010 can be downloaded at http://www.microsoft.com/en-us/download/details.aspx?id=29061

For both add-ins, you need to decide which version you need since both a 32-bit and 64-bit version are available now. If you have a 32-bit version of Excel,  you need to choose 32-bit version of PowerPivot: ENU\x86\SQL_AS_DMAddin.msi. (Don't get confused with the machine version.) If you don't know which version your Excel is, go to File Tab and click Help. Then on the right of the screen, you will find out detailed information about your Excel including versions.

Memory Considerations

Note that for 32-bit version of Excel, the limit for memory usage is 2GB.  If you need to use more than 2GB of memory to work with your mining model, you must use the 64-bit version of Excel with the 64-bit version of the PowerPivot add-in. For more info, please refer to the article on Technet:  How to Use the SQL Server Data Mining Add-ins with PowerPivot for Excel

Wednesday, September 4, 2013

Which Kind of Table Expression to Use: Common Table Expressions, Views, TVFs, or Derived Tables?

Table expressions in T-SQL are a very convenient tools that make your codes more readable and easier to maintain. There are four kinds of table expressions:
  1. Derived tables
  2. Common table expressions (CTEs)
  3. Views
  4. Inline table-valued functions (TVFs)

So which kind of table expression to Use?


Derived tables and CTEs are only valid when the outer query is executed. That is to say, their scope is within the outer query. CTEs can be referred multiple times in the execution of the one SELECT, DELETE, UPDATE, or INSERT statement; while derived tables can only be referred once. On the other hand, views and TVFs are stored as database objects; and thus they are reusable as database objects (such as control access with permissions include SELECT, INSERT, UPDATE, and DELETE permissions). The table below lists the main differences between these four kinds of table expressions. Now will you be able to answer the question at the beginning?



Scopes Referred Multiple Times Reusable Support Parameters
Derived TablesSingle StatementNoNoNo
CTEsSingle StatementYesNoNo
ViewsDB ObjectsYesYesNo
TVFsDB ObjectsYesYesYes

Here is another little quiz: when you need a reusable table expression that need to have input parameters, which one should you use?

Bingo! Table-valued functions!

If you want to dive in more, here are more scenarios to learn:

  1. If you need to find hierarchy relationships among the data, you definitely need a recursive CTEs

  2. If you need pivot data to help you present the aggregation results (such as running totals, year-to-date calculations) or crosstab queries, CTEs shown below help you not intimidated by the complexity of PIVOT operator in T-SQL.
    ;WITH SaleTotalByYear AS
    (
    SELECT
    [SalesPersonID],                          -- grouping column
    YEAR([OrderDate]) as OrderYear,           -- spreading column
    [SubTotal]                                -- aggregation column
    FROM [AdventureWorks2008R2].[Sales].[SalesOrderHeader]
    where [SalesPersonID] IS NOT NULL
    )
    SELECT SalesPersonID, [2005],[2006],[2007]
    FROM SaleTotalByYear
    PIVOT(SUM(SubTotal) FOR OrderYear IN ([2005],[2006],[2007]) ) AS P
    ORDER BY SalesPersonID;

  3. If you need a table expression that work with APPLY operator, table-valued functions are the right choice. The well-known script shown below is to find the scripts and its query plan for all queries currently running on your server:  
    SELECT 
        er.session_id,
        db.name database_name,
        er.status,
        er.wait_type,
        eqp.query_plan,
        est.text
    FROM sys.dm_exec_requests er
        JOIN sys.databases db ON (er.database_id = db.database_id)
        CROSS APPLY sys.dm_exec_query_plan(er.plan_handle) eqp
        CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) est 


Better Performance?


Last not the least, you may wonder whether table expressions will bring better performance. The book of "Microsoft SQL Server 2012 T-SQL Fundamentals" mentioned that "Generally speaking, table expressions have neither positive nor negative performance impact". However,  The Seven Sins against TSQL Performance studied performance effects when using functions and nested views. So be cautious about performance impacts when using these kinds of table expressions.