Formatting Telephone Numbers

I created this simple PowerCenter mapplet to format phone numbers. Informatica Developer (IDQ) may have a different or easier way to do it but I wanted something I could use in PowerCenter without the hassle of maintaining a separate object in IDQ and having to import it into PowerCenter. You can alter the mapplet to format other data like postal codes, SSNs, etc.

Download the mapplet and give it a try.

Mapplet Overview

My approach is a simple one, take the phone number, remove all special characters, then reformat the number with dashes as 999-999-9999. This approach allows me to ignore how the phone number comes into the mapplet, for example partially formatted , not formatted or formatted correctly but not how I need it.

The mapplet's input is the telephone number without a country identifier. In addition to the reformatted phone number, the mapplet also outputs:

  • a Pattern (example xxx-xxx-xxx) based on the input number (this could be used for a quick/dirty data profiling)
  • the phone number parsed into its parts, area code, central office
  • the length of the input phone number

Read further to see detail on the mapplet works.

Telephone Pattern
Reg_Replace(Telephone, '[0-9]' , 'x'  )

I like this bit of code because you can use it to build patterns then use the patterns to find outliers. Just output the pattern(s) to a table or spreadsheet and DISTINCT the column to see the quality of your data. You could do the same thing for postal codes, SSNs, ext.

Telephone Number, without formatting, numbers only
REPLACECHR(0, Telephone_in, REPLACECHR(0, Telephone_in, '0123456789', ''), '')

If you want to reformat a telephone number, first remove any special characters from it's current formatting. Here I use two REPLACECHR functions, the inner REPLACECHR strips the numbers from Telephone, the result (which should be any special characters) is used as the character set for the outer REPLACECHR.

Area Code
IIF(LENGTH(Telephone_NbrOnly_v)>7,  SUBSTR(Telephone_NbrOnly_v,1,3) ,'')

I use the "numbers only" version of the telephone number to get the area code. If the telephone number is more than 7 numbers, I make the assumption the first 3 characters are the area code. If the telephone number is is 7 characters or less, there is no area code.
I also break out the Telephone number into separate fields, area code, central office, and extension. Then to output the formatted telephone number. I concatenate the telephone number with - (dash).

Decode(TRUE,
LENGTH( Telephone_NbrOnly_v)>7,  
Telephone_AreaCode_v || '-' || Telephone_CentralOffice_v || '-' || Telephone_LineNumber_v,
LENGTH( Telephone_NbrOnly_v)=7,  
Telephone_CentralOffice_v || '-' || Telephone_LineNumber_v,
'')


Thank you for your support
 or  PayPal


IMPORTANT,  YOU are responsible for backing up your systems before trying or implementing any suggestions from this blog. I do not guarantee 100% accuracy of any code examples. I do not presume to know your system environment(s) or Security requirements; all code examples from this blog should be thoroughly tested before any attempted use on a production system.