Pages

Tuesday, August 27, 2013

How to: Send Emails by Script Task in SSIS

Sending emails are common tasks in control flow of SSIS packages. We need to inform users when packages run completely or fail. Script Task in SSIS provides a more simple but more flexible approach than the 'Send Mail Task'. Using 'Script Task', we can customize our sending email tasks. That is to say, with 'Script Task', we can configure senders, recipients, subjects, and bodies of an e-mail message. OK, let's get started!


First, you need to add the following for the header of your script.


Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Net.Mail
Imports System.Net


Then, define variables:


Dim myHtmlMessage As MailMessage
Dim mySmtpClient As SmtpClient

The sender, recipient, subject, and body of an e-mail message may be specified as parameters when a MailMessage is used to initialize a MailMessage object.

Below is the whole script body:


Imports System
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Net.Mail
Imports System.Net


Public Class ScriptMain
  Public Sub Main()
    Dim myHtmlMessage As MailMessage
    Dim mySmtpClient As SmtpClient


    myHtmlMessage =
    New MailMessage(sender, recipient, "Subject", "body1." + vbCrLf + vbCrLf + "body2:" + Dts.Variables("System::MachineName").Value.ToString + vbCrLf + "Execution by:" + Dts.Variables("System::UserName").Value.ToString + vbCrLf + "Package StartTime: " + Dts.Variables("System::StartTime").Value.ToString + vbCrLf + vbCrLf + strSuccessMessage)

   mySmtpClient =
    New SmtpClient(Dts.Variables("cvarSMTPServer").Value.ToString())
  mySmtpClient.Send(myHtmlMessage)

  Dts.TaskResult = Dts.Results.Success
  End Sub
End Class


Now you can send emails by executing the task!

More freebies to addon


If you would like to achieve even more, keep reading.

  1. Attachments 

    In order to add attachment to emails, you need use
        myHtmlMessage..Attachments.Add(New Attachment("c:\textfile1.txt"))

      

  2. Users and Password

    The credentials returned by DefaultNetworkCredentials represents the authentication credentials for the current security context in which the application is running. mySmtpClient.Credentials = CredentialCache.DefaultNetworkCredentials

    If you don't want to use windows authentication to connect to your SMTP host and want to specify a different username and password, you can then use NetworkCredential class as shown below

     mySmtpClient.Credentials = New NetworkCredential(UserName, Password, Domain)

Wednesday, August 14, 2013

Stairway to Integration Service: Script Task

Another effective way to make SSIS packages dynamic is to write custom code that perform tasks you cannot perform with the built-in components. Here is a cookbook/cheatsheet that will speed up your learning of customized script tasks or components.


There are two important features you need to determine before you click "Edit Scripts" button. First of all, you need to add existing variables to the ReadOnlyVariables and ReadWriteVariables lists in the Script Transformation Editor or Script Task Editor to make them available to the custom script. Secondly, you need to determine which languages you would like to use: C# or Visual Basic.
Besides these, you need to add necessary libraries at the beginning of scripts. You can refer the article of Adding the Script Task to Your SSIS Packages for detailed steps.

Note that the following examples are all for 2008 version. This is not an exhaustive list. Please stay tuned for more tasks accomplished by Script Task in SSIS:


ProblemsExpression
Get files' name from readable variables into String sfilePathname.Solution 1:
Dim sfilePathname As String
sfilePathname = Dts.Variables("vFilePathName").Value.ToString

Solution 2:
Dim sfilePathname As String = Dts.Variables("vFilePathName").Value
Get the date when the file was created.Dts.Variables("vFileDate").Value = File.GetLastWriteTime(sfilePathname)
Get dateStamp from filename if it contains date info at last 6 strings.Dim dateStamp As String = sfilePathname.Substring(sfilePathname.LastIndexOf(("_")) + 1, 6)
Output value of dateStamp to ReadWriteVariables vYrMon.Dts.Variables("vYrMon").Value = CInt(dateStamp)

Get substring of dateStamp.Dim sYear As String = dateStamp.Substring(0, 4)
Get the system information from system variables.strSystemErrorEmailFrom = Trim(Dts.Variables("System::MachineName").Value.ToString())
Send SMTP mails.// Add "Imports System.Net.Mail" at the top
Dim myHtmlMessage As MailMessag 
Dim mySmtpClient As SmtpClient

myHtmlMessage=New MailMessage(strSystemErrorEmailFrom, strSystemErrorEmailTo, "Subject", "body")
    mySmtpClient =  New SmtpClient( Dts.Variables("cvarSMTPServer").Value.ToString() )
    mySmtpClient.Credentials = CredentialCache.DefaultNetworkCredentials
    mySmtpClient.Send(myHtmlMessage)
    Dts.TaskResult = Dts.Results.Success
Convert text to proper case
(ex. 1st letter in each word is uppercase)
Row.OutputName = StrConv(Row.InputName, VBStrConv.ProperCase)
Build Event Handler for OnError Event.Public Sub Main()
        Dim arrErrorMessages As Collections.ArrayList
        Try
            arrErrorMessages = CType(Dts.Variables("objErrorMessages").Value, Collections.ArrayList)
        Catch ex As Exception
            arrErrorMessages = New Collections.ArrayList()
        End Try
        arrErrorMessages.Add("Error on Component:[" + Dts.Variables("SourceName").Value.ToString() + "]: " + Dts.Variables("ErrorDescription").Value.ToString())
        Dts.Variables("objErrorMessages").Value = arrErrorMessages
        Dts.TaskResult = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
End Sub
Get the filename from a full file path.Solution 1:
Dim fileName As String = "C:\mydir\test.txt"

Dim result As String = Path.GetFileName(fileName)
Console.WriteLine("GetFileName('{0}') returns '{1}'", fileName, result)

Solution 2:
RIGHT( @[User::FileName], FINDSTRING( REVERSE( @[User::FileName] ), "\\", 1 ) - 1 )
//Results: GetFileName('C:\mydir\test.txt') returns 'test.txt'
Dynamically pre-truncate table.sSQLScript = "DELETE FROM " + sTableName + vbCrLf
+ " WHERE " + sPKName + " > 0"
Dts.Variables("SQLScript1").Value = sSQLScript
Determine a varible's value depending on a file name If strFileName.Contains("BROUGHT").ToString Then
        intFacilityId = "1"
Else
        intFacilityId = "0"
End If
Get the filename without extension from a full file path.Dts.Variables("vFileName").Value = Path.GetFileNameWithoutExtension(Dts.Variables("vFilePath").Value.ToString) 
Split strings that delimited by "|".Dim sAry AsString() = col1.Split("|")
Check file exists or not.Solution 1:
Dim directory As DirectoryInfo = New DirectoryInfo("c:\")
Dim file As FileInfo() = directory.GetFiles("*.txt")
If file.Length > 0 Then
   Dts.Variables("User::FileExists").Value = True
Else
   Dts.Variables("User::FileExists").Value = False
End If
Solution 2:
If (System.IO.File.Exists("C:\mydir\test.txt")) Then
      Dts.TaskResult = ScriptResults.Success
    Else
      Dts.TaskResult = ScriptResults.Failure
End If

Friday, August 9, 2013

How to: Pad a Number with Leading Zeros

As mentioned in the previous post "What do NDC, NCPDP, NABP, NPI mean? ", sometimes a column requires a fixed length, such as NDC is a 11-digit number. If the source file does not provide a fixed-length number and the data is already loaded to your database, you need to message the column by padding it with leading zeros.

Here are two solutions by using SQL scripts or by using user-defined functions (UDFs).

  • Solution 1:
             UPDATE [tableName]
             SET NDC = RIGHT(REPLICATE('0', 11) + CONVERT(VARCHAR,NDC),11)

  • Solution 2:
    • Be careful that length of NULL string is NULL.
    • CREATE FUNCTION dbo.PadLeft
      (@str INT, @PadWidth INT, @PadChar CHAR(1))
      RETURNS VARCHAR(255)
      AS
      BEGIN
      RETURN RIGHT(REPLICATE(@PadChar, @PadWidth) + CONVERT(VARCHAR,@str),@PadWidth)
      END
      GO
    • After creating the function above, you can test it with "select dbo.PadLeft(12345, 11, '0')". The result should be "00000012345".


Thursday, August 8, 2013

What do NDC, NCPDP, NABP, NPI mean?

Today I will briefly talk about key terms in healthcare industry that I learned over time.

NDC: National Drug Code

The National Drug Code  (NDC) is a unique product identifier used in the United States for human drugs to identify the vendor (manufacturer), product and package size of all drugs and biologics recognized by the Food and Drug Administration (FDA). It is 10-digit numeric code with 3-segment numeric identifier assigned to each medication listed under Section 510 of FDCA. There are three kinds of combinations for NDC:  4-4-2, 5-3-2, or 5-4-1. 
  • The first segment, known as the labeler code, is assigned by FDA. A labeler is any firm that manufactures, repacks, or distributes a drug product.
  • The second segment, known as the product code, identifies a specific drug, strength, and dosage form of that drug.
  • The third segment, known as the package code, identifies the package size. 
The NDC can be found on the drug container (i.e., vial, bottle, or tube).
 
Pseudo-NDC
NDC derived by CMS (Centers for Medicare and Medicaid Services) is a 11-digit numeric code with a fixed length segmentation: 5-4-2. Since it is different from NDC by FDA, it is sometimes called pseudo-NDC. The NDC examples for diabetic supply list are shown below:


Since July 2013, all outpatient drug claims billed are required to include the J code and a valid NDC (National Drug Code), NDC quantity and NDC Unit of Measure (UOM). The NDC quantity is in the format of 9999.99 while the NDC UOM codes can be
  • F2: International Unit
  • GR: Gram
  • ME: Milligram
  • ML: Millilitre
  • UN: Unit

The NDCs on claims usually don't contain hyphens or spaces between segments.

NCPDP and NABP

A NCPDP Provider Identification number (NCPDP Provider ID) formerly known as the Pharmacy NABP number, is 7-digit code a to provide pharmacies with a unique, national identifier that would assist pharmacies in their interactions with pharmacy payers and claims processors.

NPI

A National Provider Identifier or NPI is a unique 10-digit identification number issued to health care providers in the United States by the Centers for Medicare and Medicaid Services (CMS) NPPES.

Reference: 
  1. CMS NDC 
  2. FDA NDC
  3. Anatomy Of The National Drug Code 

Monday, August 5, 2013

Concepts in Microsoft Association Rules: Lift, Support, Importance, and Probability

       You must have heard of the famous diapers and beer story that illustrates the correlation in shoppers' baskets. Market basket analysis by association rule mining has been widely used by retailers since 1990s to adjust store layouts, and to develop cross-promotion plans and catalogs. Nowadays instant recommendations with association rules becomes a hot spot for research. Microsoft Association Rules algorithm is a common algorithm to create association rules, which can be used in a market basket analysis. It supports several parameters that affect the behavior, performance, and accuracy of the resulting mining model. Therefore, it is important to have a clear understanding about these following concepts.      


LIFT


In data mining and association rule learning, lift is a measure of the performance of a model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. For example, suppose that 5% of the customers mailed a catalog without using the model would make a purchase. But a certain model (or rule) has identified a segment with a response rate of 15%. Then that segment would have a lift of 3.0 (15%/5%). Lift indicates how well the model improved the predictions over a random selection given actual results.

SUPPORT


Support is the probability of a transaction contains targeted item or itemset. The larger Support is, the larger number of cases that contain the targeted item or combination of items the model has. You can use parameter MINIMUM_SUPPORT and MAXIMUM_SUPPORT to define the thresholds. By default, MINIMUM_SUPPORT is 0.0 and MAXIMUM_SUPPORT is 1.0.

RULEs


The Rules tab in Microsoft Association Rules Viewer displays Probability, Importance, Rule that are related to rules that the mining algorithm finds.

Rule: A rule is a description of the presence of an item in a transaction based on the presence of other items.

Probability: The likelihood of a rule, defined as the probability of the right-hand item given the left-hand side item. By default, MINIMUM_PROBABILITY is 0.4. However, probability sometimes is misleading. For example, if every transaction contains a gift bag--perhaps the gift bag is added to each customer's cart automatically as a promotion, a rule predicting that gift bag has a probability of 1. It is accurate but not very useful. To flag the usefulness of a rule, Importance is the right measure to use.

Importance: A measure of the usefulness of a rule. A greater value means a better rule. The importance of a rule is calculated by the log likelihood of the right-hand side of the rule, given the left-hand side of the rule. For example, in the rule of If {A} then {B}, the importance is Log( Pr(A&B)/  Pr(B without A) ) .

Thursday, August 1, 2013

How to: Extract the Name from a Combination of Name and Title

Data transformation often requires to extract info/substring set from part of strings. If this substring set is a fixed length, it is easy to implement by functions of Substring in SSIS. However, when the substring set is in various length, you have to find out the rule first.

For example, if a column NameTitle is a combination of name and title, we want to separate name and title. Suppose that all members has only one string as titles, i.e., we have NameTitle = "Andrew S MD", how to separate "Andrew S" and "MD"?

SSIS expressions makes it easy to implement that kind of data transformation. For introduction of SSIS expressions, you can read Stairway to Integration Service: SSIS Expressions. Now go ahead to drag a Data Flow Task and get the source column. In Derived Column Transformation, define a column that "Add as a New Column", then follow the steps below to get its SSIS expression:

  1. Use LEN and REPLACE functions to find out the numbers of spaces NameTitle column contains. Replace all spaces with empty strings first. Then substract the length of original strings with the modified one. Pay attention to match parentheses in same colors.
    LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", ""))
  2. Determine the location of the space right before titles (or space that separates name and title) by FINDSTRING function.
    FINDSTRING( [NAMETITLE], " ", LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", "")) )
  3. Use the SUBSTRING function to retrieve names from NameTitle.
    SUBSTRING( [NAMETITLE], 1, FINDSTRING( [NAMETITLE], " ", LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", "")) )-1 )
BTW, in Excel, you can use LEFT(A2, LEN(A2)- LEN(RIGHT(A2, LEN(A2)-SEARCH("@", SUBSTITUTE(A2, " ", "@", LEN(A2)_LEN(SUBSTITUTE(A2, " ", ""))))))-1) to extract names from the combination, if A2 is the location of NameTitle.