Pages

Thursday, August 1, 2013

How to: Extract the Name from a Combination of Name and Title

Data transformation often requires to extract info/substring set from part of strings. If this substring set is a fixed length, it is easy to implement by functions of Substring in SSIS. However, when the substring set is in various length, you have to find out the rule first.

For example, if a column NameTitle is a combination of name and title, we want to separate name and title. Suppose that all members has only one string as titles, i.e., we have NameTitle = "Andrew S MD", how to separate "Andrew S" and "MD"?

SSIS expressions makes it easy to implement that kind of data transformation. For introduction of SSIS expressions, you can read Stairway to Integration Service: SSIS Expressions. Now go ahead to drag a Data Flow Task and get the source column. In Derived Column Transformation, define a column that "Add as a New Column", then follow the steps below to get its SSIS expression:

  1. Use LEN and REPLACE functions to find out the numbers of spaces NameTitle column contains. Replace all spaces with empty strings first. Then substract the length of original strings with the modified one. Pay attention to match parentheses in same colors.
    LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", ""))
  2. Determine the location of the space right before titles (or space that separates name and title) by FINDSTRING function.
    FINDSTRING( [NAMETITLE], " ", LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", "")) )
  3. Use the SUBSTRING function to retrieve names from NameTitle.
    SUBSTRING( [NAMETITLE], 1, FINDSTRING( [NAMETITLE], " ", LEN([NAMETITLE])-LEN(Replace([NAMETITLE], " ", "")) )-1 )
BTW, in Excel, you can use LEFT(A2, LEN(A2)- LEN(RIGHT(A2, LEN(A2)-SEARCH("@", SUBSTITUTE(A2, " ", "@", LEN(A2)_LEN(SUBSTITUTE(A2, " ", ""))))))-1) to extract names from the combination, if A2 is the location of NameTitle.

No comments:

Post a Comment