Pages

Sunday, December 29, 2013

Predictions for Big Data and Cloud Computing in 2014

Now it is the time to review the past year and make resolutions for the coming year. Although prediction is very difficult – especially of the future, you can find lots of predictions for 2014. Here are predictions for big data and cloud computing – two of the 2013 big buzzwords:

In these articles, we are looking forward to seeing that Hadoop will adapt to the volume, velocity and variety of data in 2014. That is, “Hadoop will power real-time tailored applications”, as well as techniques for “deeper security, deeper operational management, deeper resource management and multi-site replication.” Businesses are going to “leveraging the digital breadcrumb trail customers leave as they interact with products and services online”. Imagine how these data about customers can be magically turned into value:  analysis of customers behavior to provide more personalized services, fraud detection, anomaly detection, and trend analysis…

On the other hand, Cloud computing will become even more fundamental to business strategy. Clouds will be more specialized in terms of needs or verticals in order to differentiate from each other.

Open Source, such as R, may move from an alternative role to a leading role. As for this, it seems to make sense that when none of mainstream can provide a unified framework to solve problems, who will be willing to spend money on expensive licenses? So open source deserves as a playground for implement new ideas.

It was also predicted the move from the Internet of Things to the Web of Things.

So what is your plan to prepare yourself well for the coming year?

Thursday, December 26, 2013

How to: Run SSIS in 32 bit Mode

SSIS is an excellent tools for extract, transform, and loading data. Nowadays, more and more machines that SQL Server runs on are 64 bit. However, there are some conditions that require SSIS to be run in 32 bit mode, such as

1. when SSIS tries to interact with Microsoft Excel or Access files that only support 32 bit.

2. ActiveX script task or Execute DTS 2000 package task.

3. Debug script tasks.

Sometimes if it is not set in a right runtime mode, you will get error messages like:

Error: 0xC00F9304 at Package, Connection manager "Excel Connection Manager": SSIS Error Code DTS_E_OLEDB_EXCEL_NOT_SUPPORTED: The Excel Connection Manager is not supported in the 64-bit version of SSIS, as no OLE DB provider is available.

Or even without error messages when you try to get your script task pause at a breakpoint inside the script. It seems never hit the breakpoint and does not pause as expected.

How to set 32bit runtime mode at PROJECT Level?

After the project is opened in BIDS, right-click the project on the Solution Explorer panel and then click Properties.

Then on the Property Pages, first select Debugging under “Configuration Properties” on the left. Then change the property for Run64BitRuntime to be FALSE. (The default value for Run64BitRuntime is True.)

TurnOff64Bit1 TurnOff64Bit2

Now every package under this Demo Debug Project is set to run in 32 bit.

How to set 32bit runtime mode at PACKAGE Level?

To set 32bit runtime mode at PACKAGE Level means that you can run just one package in 32 bit. This can be accomplished through SQL Agent job settings.

It is required to have X86 DTExec.exe installed in order to have SQL Agent job to execute in 32 bit mode. 32 bit version of DTExec.exe is often installed at

C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\DTExec.exe (for SQL Server 2008)

C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\DTExec.exe (for SQL Server 2012)

As shown below, when you schedule jobs, you can set the Execution Options in job steps to “Use 32 bit runtime”.

TurnOff64Bit-sqlAgentJob

If you execute the package using DTEXEC via command-line scheduling process, you need to specify the 32-bit version of DTEXEC by explicitly running the DTEXEC.EXE from the right folder as shown:

"C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\DTExec.exe" /f C:\PackageFolder\Package.dtsx /Conf C:\ConfigFile.dtsConfig

Tuesday, December 24, 2013

Quality Measures in Healthcare

What is high quality health care? The answer can be “doing the right thing, at the right time, in the right way – and having the best possible results”. It is often measured in terms of cost, quality, and access as the so-called Iron Triangle of Health Care. Many healthcare programs have been set up to reward the quality of healthcare services, and to help consumers and payers make decision for the best use of their healthcare dollars. Today I would like to give a brief introduction about two common quality measures in healthcare that I learned through my work. One is HEDIS (Healthcare Effectiveness Data and Information Set) measures and the other is Physician Quality Reporting System (PQRS) . These are two main measures used for quality reporting.

HEDIS measures

HEDIS measures are developed, managed, and updated annually by the National Committee for Quality Assurance (NCQA). (NCQA is often referred as the “gold standard” in health plan accreditations. NCQA accredited health plans covers about 70.5% of all Americans enrolled in private health insurance according to NCQA’s own report.)

HEDIS measures address two types of care: preventive health care (children and adolescents, women and adolescent girls, adults, and seniors) and condition-specific care. Condition-specific care can be both chronic conditions and acute conditions. Sometimes HEDIS measures are categorized as chart review measures and claim-derived measures.

Patient-Centered Medical Home (PCMH)

NCQA provides PCMH program with the mission of improving the quality of healthcare. It includes clinical programs such as:

– Diabetes Recognition Program (DRP)

– Heart/Stroke Recognition Program (HSRP)

– Back Pain Recognition Program (BPRP)

Therefore, PCMH quality measures utilize the HEDIS measure set.

Examples of HEDIS measures

Johns Hopkins Medicine’s HEDIS tip sheet of September 2013 demonstrates how HEDIS measures are used by providers. You can see that CPT, ICD-9 codes, and UB Revenue Codes are used for billing purposes.

PQRS measures

On the other hand, Medicare has developed PQRS for its specially covered populations: primarily elder people over the age of 65 or younger with chronic disabilities. You can learn more detailed about PQRS from CMS here.

Tuesday, December 17, 2013

How to: Query BIT Data Type in SQL Server and SSIS

If you have ever met problems when trying to export BIT data type columns to files or use them in SSIS, here are the tips for you.

The bit data type is an integer data type that can take a value of 1, 0, or NULL. Thus it is often used to represent Boolean type values such as Yes/No, True/False, and On/Off. However, BIT columns might be interpreted differently by different providers. When you query BIT columns in SSMS, it displays as 1 and 0 as its value. However, when you either query BIT columns in SSIS or export them to text files, it won’t be 1 or 0 any more. Below will show you some tricks to handle BIT columns.

How SSIS handle BIT columns?

In SSIS, BIT columns are interpreted as Boolean [DT_BOOL]. How to tell this?

You need right click the OLE DB Source component and choose “Show Advanced Editor”. Then click the “Input and Output Properties” Tab. Expand “Output Columns” under OLE DB Source Output. You will notice that the BIT column is Boolean type as shown below. Here column IsSHP is the BIT data type column.

bitTypeSSIS

So when you need to query BIT column, you need to directly use TRUE or FALSE. Below shows that how you can query BIT columns in the conditional split task.


bitTypeSSISv0
Note that there is no quote around true in the expression.

How to export BIT columns to files as integers in SSIS?

Solution 1: Using Derived Columns
bitTypeSSISDerivedColumn
Or if you want to store it with less storage, you can use DT_UI1 data type.
IsSHP_bit = [IsSHP] ? (DT_UI1)1 : (DT_UI1)0


Solution 2: Tweak OLE DB Source

First of all, you need to convert BIT columns in OLE DB connection manager from OLE DB Source Editor,
select convert(int, IsSHP) as IsSHP

Then change the DataType in the OLE DB Source to integer type DT_I4 as shown below.


bitTypeSSIS2
Now you will get 1 or 0 as output for BIT columns.


That’s it and hope it will help!

Tuesday, December 10, 2013

Earn Certificates to Differentiate Yourself?

Becoming a certified DBA/developer won’t make you an exceptional DBA/developer ; being an exceptional DBA/developer doesn’t require any certification. But certifications can definitely add a plus if you have another competitor with the same other qualifications but without certifications. Below is the info about Microsoft SQL Server certifications for recent versions of SQL Server.

SQL Server 2012 related certifications

Microsoft Certified Solutions Associate (MCSA)

MCSA: SQL Server 2012 certification. It requires to pass three exams:

  1. Exam 70-461: Querying Microsoft SQL Server 2012
  2. Exam 70-462: Administering Microsoft SQL Server 2012 Databases
  3. Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012

Microsoft Certified Solutions Expert (MCSE)

There are two paths for expert level certificates. The prerequisite for taking expert level exams is to earn the above MCSA.

MCSE: Data Platform

  1. Exam 70-464: Developing Microsoft SQL Server 2012 Databases
  2. Exam 70-465: Designing Database Solutions for SQL Server 2012

MCSE: Business Intelligence

  1. Exam 70-466: Implementing Data Models and Reports with Microsoft SQL Server 2012
  2. Exam 70-467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

What will be the impact of SQL Server 2014?

From the article: How will the updates for SQL Server 2014 impact official Microsoft training and certifications? It seems that SQL Server 2014 won’t affect MCSA for now but it will definitely impact MCSE exams taken after 2014 March. So if you want to complete your expert level exam for SQL Server 2012, please be sure to finish it by March.

Reference: Microsoft SQL Server certifications

Friday, December 6, 2013

SSIS Validation Status “VS_NEEDSNEWMETADATA”

Problem                                                 SSISdebug

Here is the error message:                                                   

Error on Component:[DFT_xxx]: "component "OLE DB Destination" (2327)" failed validation and returned validation status "VS_NEEDSNEWMETADATA".

It looks like some data mapping for columns in the SSIS package were lost after being deployed to the production. Both dev and production have the same number of columns. Also the name looks the same!

Solution

Check the columns to see whether they have the same case in the production as in the development. SSIS is CASE sensitive and it cannot automatically recognize those columns if the only difference is the case.

How to update column name?

You need to use sp_RENAME to update column name as below.

sp_RENAME '[dbo].[tableName].columnname', 'ColumnName' , 'COLUMN'
GO

Hope it will help you debug faster!

Friday, November 29, 2013

Quotes about Making Predictions / Data Mining

As the saying “prediction is very difficult – especially of the future” is one of golden rules among economists, below are some of the quotes about making predictions with data:  crystal-ball-300x240

  • “The art of prophecy is very difficult, especially about the future.”Mark Twain
  • “Any time there's a 50% chance of getting something right, there's a 90% chance you'll get it wrong.”Yogi Berra
  • “No one can possibly know what is about to happen: it is happening, each time, for the first time, for the only time.”James Baldwin
  • “Occurrences in this domain are beyond the reach of exact prediction because of the variety of factors in operation, not because of any lack of order in nature.”Albert Einstein
  • “The best way to predict your future is to create it.”Abraham Lincoln
  • “Prediction is very difficult, especially about the future.”Niels Bohr
  • “To write it, it took three months; to conceive it three minutes; to collect the data in it all my life. ”F. Scott Fitzgerald, American author
  • “Where is the Life we have lost in living?
    Where is the wisdom we have lost in knowledge?
    Where is the knowledge we have lost in information?”
    Thomas Stearns Eliot, poet, 1888-1965
  • “Errors using inadequate data are much less than those using no data at all.”Charles Babbage, British Mathematician and Inventor, 1791-1871
  • “The goal is to transform data into information, and information into insight.” — Carly Fiorni, former president of Hewlett Packard
  • “Big data is mostly about taking numbers and using those numbers to make predictions about the future. The bigger the data set you have, the more accurate the predictions about the future will be.”Anthony Goldbloom, the founder and CEO of Kaggle
  • “The world is now awash in data and we can see consumers in a lot clearer ways.”Max Levchin, PayPal co-founder.
  • “Data is a precious thing and will last longer than the systems themselves.”Tim Berners-Lee, inventor of the World Wide Web.
  • Drowning in Data yet Starving for Knowledge” — Anonymous
Image courtesy of KATIE DRUMMOND.

Sunday, November 24, 2013

Script Task: Find the Day of Week in SSIS

Problem

How to find out the day of week in SSIS to achieve the same result as DATENAME( dw, datecolumn ) as in TSQL? For example, we need to find out “2013-11-24” is Sunday or Saturday.

Solution

Suppose FROM_SERVICE_DATE is the date column to derive the info about day of week. Below are two options. Using “Script Task” seems more complicate than directly using “Derived Column”. But it is more readable, isn’t it? Moreover, from this example, you can learn to use “Script Task” to handle multiple CASE situations.

Using “Derived Column”

Below is the expression to input in “Derived Column” task by using nested Conditional Operator in SSIS.

DATEPART("dw",FROM_SERVICE_DATE)==1?"Sunday":DATEPART("dw",FROM_SERVICE_DATE)==2?"Monday":DATEPART("dw",FROM_SERVICE_DATE)==3?"Tuesday":DATEPART("dw",FROM_SERVICE_DATE)==4?"Wednesday":DATEPART("dw",FROM_SERVICE_DATE)==5?"Thursday":DATEPART("dw",FROM_SERVICE_DATE)==6?"Friday":DATEPART("dw",FROM_SERVICE_DATE)==7?"Saturday":""
 
Using “Script Task”

Step 1: Derive a new column with "dayNoOfWeek” as Name for the numeric number of day of week, such as 1st or 6th of the week.
SSIS_ScriptTask_WeekofDay_1_InputColumn
Step 2: Drag and drop Script Component task to the Data Flow Task (DFT) pane, you will be prompted to select script component type as shown below. Check “Transformation”. Then connect it with the above Derived Column Task.


SSIS_ScriptTask_WeekofDay_0_scriptTrans
Step 3: Under “Input Column” page, check “dayNoOfWeek” as shown below.

SSIS_ScriptTask_WeekofDay_2_InputColumn
Step 4:
Under “Inputs and Outputs” page, expand “Output0” and select “Output Columns”. Click “Add Column” button. A new column will appear. You can rename it to “NameOfWeekDay”.
Next, modify the column properties on the right. Set the new column to be a string with length of 10.


SSIS_ScriptTask_WeekOfDay_3a
Now go back to “Script” page on the top, select the “ScriptLanguage” to be C# related. Then click “Edit Script” button at the low right corner.


SSIS_ScriptTask_WeekOfDay_Script2
Enter the following codes in the Sub function Input0_ProcessInputRow.

public override void Input0_ProcessInputRow(Input0Buffer Row)
    {

        switch (Row.dayNoOfWeek) {             
            case 1: Row.NameOfWeekDay = "Sunday";       break;
            case 2: Row.NameOfWeekDay = "Monday";       break;
            case 3: Row.NameOfWeekDay = "Tuesday";      break;
            case 4: Row.NameOfWeekDay = "Wednesday";    break;
            case 5: Row.NameOfWeekDay = "Thursday";     break;
            case 6: Row.NameOfWeekDay = "Friday";       break;
            case 7: Row.NameOfWeekDay = "Saturday";     break;
        }
   }

If you prefer to Visual Basic as your script language, add the following code in the sub function of Input0_ProcessInputRow.       

SELECT CASE Row.dayNoOfWeek
Case 1
Row.NameOfWeekDay = "Sunday"
Case 2
Row.NameOfWeekDay = "Monday"
Case 3
Row.NameOfWeekDay = "Tuesday"
Case 4
Row.NameOfWeekDay = "Wednesday"
Case 5
Row.NameOfWeekDay = "Thursday"
Case 6
Row.NameOfWeekDay = "Friday"
Case 7
Row.NameOfWeekDay = "Saturday"
END SELECT
Next, build and close the Script Edit window.

Tuesday, November 19, 2013

Is Truncate DDL or DML?

Recently as I referred to the book of Microsoft SQL Server 2012 T-SQL Fundamentals by Ben-Gan, Itzik for DML definitions, there is a conclusion about TRUNCATE statement that surprised me at first:
"Another common misunderstanding is that TRUNCATE is a DDL statement, but in fact it is a DML statement."
Yes. I had always had that “misunderstanding” to regard TRUNCATE as DDL because TRUNCATE statement removes the data by deallocating pages used to store the table and records only the page deallocations in the transaction log. It is not simply row removal as DELETE statement.

After further discussion with the author (I really appreciated that Itzik spent time to help me understand his conclusion.), it turned out TRUNCATE is listed under Data Manipulation Language (DML) in the standard document ISO_9075_02_Foundation_2011_E. Also, the Standard ISO/ANSI SQL deals with only the logical aspects of the code, and logically TRUNCATE statement deletes all rows from a table.

At the same time, Microsoft listed TRUNCATE TABLE as Data Definition Language (DDL) Statements together with ALTER, CREATE, DISABLE/ENABLE TRIGGER, DROP, and UPDATE STATISTICS.

Isn’t it interesting to know about that?

Wednesday, November 13, 2013

How to: Set Row Versioning-based Transaction Isolation Levels


Row versioning-based transaction isolation levels were introduced since SQL Server 2005. They use row versioning for concurrency controls instead of locks. Therefore they are very attractive because no more blocking issues with usage of row versioning. However, since row versioning-based isolation levels are related to both database-level settings and transaction-level settings, it is very easy to get confused about how to set it right. I hope you will get a clear idea after reading this.


Two types, three commands?


There are two types of transaction isolation levels based on row versioning. One is READ COMMITTED using row versioning. (It is also known as READ COMMITTED SNAPSHOT isolation or RCSI. ) The other is SNAPSHOT isolation. The difference between the two is the "cutoff" time for the latest row versions they will use.  RCSI uses latest row versions when the statement starts while SNAPSHOT uses the latest committed row when the transaction starts.
On the other hand, there are three commands involved in order to set RCSI and SNAPSHOT isolation levels:
  1. ALTER DATABASE <current_database> SET READ_COMMITTED_SNAPSHOT ON;
  2. ALTER DATABASE <current_database>  SET ALLOW_SNAPSHOT_ISOLATION ON;
  3. SET TRANSACTION ISOLATION LEVEL SNAPSHOT
The first two commands are database-level settings and can be turned on at the same time. So the question is:

 

What is the right combination?


First of all, let us do some experiments. Create a table with two rows:

create table tranTest (id int primary key, price int)
insert tranTest values (1, 10), (2, 20) 

Suppose both ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT are set OFF. You can double check the isolation levels by following the steps in How to Check Transaction Isolation Levels.

Experiment 1: ALLOW_SNAPSHOT_ISOLATION is OFF with READ_COMMITTED_SNAPSHOT is ON;

Run the following code in session 1:
ALTER DATABASE DemoSQL2012 SET ALLOW_SNAPSHOT_ISOLATION ON;
Next, run the following code in session 2:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT

Ok, it seems fine with no error message appears. 
However, when you begin a new transaction in the same session that you have tried setting SNAPSHOT isolation, it will give the following error:


Msg 3952, Level 16, State 1, Line 3
Snapshot isolation transaction failed accessing database 'xxx' because snapshot isolation is not allowed in this database. Use ALTER DATABASE to allow snapshot isolation.

 

That means that in order to start a SNAPSHOT transaction, you must have ALLOW_SNAPSHOT_ISOLATION to be turned on. Only when ALLOW_SNAPSHOT_ISOLATION database option is set ON, the support for snapshot isolation transaction is activated. Moreover, the database engine does not generate row versions for modified data until all active transactions that have data modified complete. Interestingly, it does not matter whether READ_COMMITTED_SNAPSHOT is ON or OFF.
After you set ALLOW_SNAPSHOT_ISOLATION option on, run the following codes:

---- EXPERIMENT for SNAPSHOT ISOLATION
-- Connection 1
BEGIN TRAN;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
  UPDATE dbo.tranTest
    SET price += 1.00
  WHERE id = 2;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;

-- Connection 2
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRAN;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
-- 20
-- Connection 1
COMMIT TRAN;

-- Connection 2
  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
--- 20
COMMIT TRAN;

BEGIN TRAN;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
--- 21
COMMIT TRAN;

-- Connection 1
-- Cleanup
UPDATE dbo.tranTest
  SET price = 20.00
WHERE id = 2;


Notice that in session 2, we got same results for reading in the first transaction but a different result for the later transaction. That proves that SNAPSHOT isolation level always read the last committed row version when the transaction starts. The cutoff for the row to be read is when transaction starts.


Experiment 2: ALLOW_SNAPSHOT_ISOLATION is ON with READ_COMMITTED_SNAPSHOT is ON;


In order to enable the database option for READ_COMMITTED_SNAPSHOT, please only have one active session connected to the related database. Then run the command:

ALTER DATABASE DemoSQL2012 SET READ_COMMITTED_SNAPSHOT ON;


Then test the isolation level by running these codes:
--- EXPERIMENT FOR READ_COMMITTED_SNAPSHOT
-- Connection 1
BEGIN TRAN;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;

  UPDATE dbo.tranTest
    SET price += 1.00
  WHERE id = 2;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;

-- Connection 2
BEGIN TRAN;

  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
-- 20
-- Connection 1
COMMIT TRAN;

-- Connection 2
  SELECT id, price
  FROM dbo.tranTest
  WHERE id = 2;
--- 21
COMMIT TRAN;

-- Connection 1
-- Cleanup
UPDATE dbo.tranTest
  SET price = 20.00
WHERE id = 2;


If you get the same results in session 2 as shown above, congratulations! The difference in the results demonstrates that READ_COMMITTED_SNAPSHOT always get the last committed row version right before the statement starts.

Conclusion


In order set SNAPSHOT transaction isolation, you need to first turn on the ALLOW_SNAPSHOT_ISOLATION option at database level. Then at the related session, run "SET TRANSACTION ISOLATION LEVEL SNAPSHOT" to set isolation levels at transaction level.

On the other hand, for RCSI, you only need to have READ_COMMITTED_SNAPSHOT turned on at database level.