Archive for data quality management

Planes, Trains, Automobiles… and Meetings

Each Labor Day weekend, analysts discuss the impact of gasoline prices and other factors on travel driven by an extra day off and the symbolic end of summer. Some estimates indicate that a larger number of car travelers saved a collective $1.4 billion this year, much of which was spent elsewhere during their trips.
Deciding if and when to travel greater distances is a more complicated process since it often requires relatively expensive air travel with a commitment in advance. The price of airline tickets is significantly determined by the timing of purchase relative to travel. Analytics indicate that the optimal time to purchase tickets is generally between three and eight weeks prior to travel. The specific timing changes based on the time of year and also varies significantly by departure and arrival airports. It unfortunately does not require advanced analytics to observe that prices tend to rapidly increase beginning two weeks prior to travel.
Associations effectively create similar opportunities for customers through meetings and events. Associations spend considerable time and effort on planning and marketing activities based on a range of customer data including prior event participation, topical interests, individual engagement, meeting sessions, organization characteristics, continuing education requirements, and demographics. Registration patterns are closely monitored against the prior year’s activity and current goals. Sometimes these patterns are not consistent with historical data. In some cases, a trend of early registrants fails to continue; in other cases, a late surge of registrants follows a period of marketing efforts.
Associations can supplement their AMS data with external sources to help explain such patterns. The U.S. Census Bureau makes very useful geographic data available including zip codes with latitude/longitude mapped to several other data points. The Census Bureau also offers an application programming interface to obtain detailed geographic data for specific addresses. This data can be used to calculate the distance between two points (it’s kind of a long story that involves the trigonometry I never thought I would use such as sine, cosine, and inverse tangent). Tableau data visualization software provides out-of-the-box geographic visualization capabilities based on data including zip code, area code, city, metropolitan/micropolitan statistical area, and even congressional district.
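For readers curious about that long story, here is a minimal sketch of the haversine formula, one common way to estimate the straight-line (great-circle) distance between two latitude/longitude points. The function name and the Earth radius constant are my own choices for illustration, and the result is an approximation because it treats the Earth as a sphere.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine term: uses the sine and cosine mentioned above
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # ...and the inverse tangent
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_MILES * c
```

The inputs would typically be the latitude/longitude of a customer's zip code and of the meeting location.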
Various surveys and other analytics estimate that a common comfortable distance for car travel is around 200 miles. Here is a map created using Tableau showing the population density by zip code within a 200-mile radius of Phoenix:
[Map: Phoenix]
Here is a similar map based on the same color scale for Chicago:
[Map: Chicago]
A far greater number of individuals are likely to consider driving to a meeting in Chicago than to a meeting in Phoenix. Note that this straight-line distance is only a rough estimate: encountering a large body of water such as Lake Michigan would challenge even the most dedicated meeting attendee. Fortunately, you can leverage more advanced data sources such as the Google Maps API to estimate driving distances.
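The Phoenix versus Chicago comparison can also be approximated as a single number: the total population living within a driving radius of the meeting city. The sketch below assumes you already have rows of zip code, latitude, longitude, and population. The sample rows are made up for illustration and are not real Census figures, and the function names are hypothetical.

```python
import math


def miles_between(lat1, lon1, lat2, lon2, radius=3958.8):
    """Haversine great-circle distance in miles (spherical approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def population_within(center, zip_rows, max_miles=200):
    """Sum the population of zip codes whose point lies within max_miles of center."""
    lat0, lon0 = center
    return sum(pop for (_zip, lat, lon, pop) in zip_rows
               if miles_between(lat0, lon0, lat, lon) <= max_miles)


# Hypothetical rows: (zip code, latitude, longitude, population)
sample_rows = [
    ("00001", 33.45, -112.07, 1000),  # at the meeting city
    ("00002", 34.45, -112.07, 500),   # one degree of latitude north, about 69 miles
    ("00003", 38.45, -112.07, 2000),  # five degrees north, about 345 miles
]
```

Running `population_within` for each candidate city against the same zip code table gives a rough, comparable measure of the drive-in market. It is still a straight-line estimate, so the Lake Michigan caveat applies.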
This data provides associations with great analytics opportunities to drive meeting participation by aligning with customer travel decisions. For example, marketing campaigns can consider travel decision timing and target specific airport markets. Registration pricing deadlines for individual events can incorporate location analytics. Your association can also create opportunities for strong customer engagement through efforts to help groups of potential meeting attendees organize buses together or even carpool.
Deriving travel distance analytics from customer location data is an example of gaining value from data. This demonstrates why it is important to consider analytics when designing business processes throughout the organization and not just as part of analysis after the fact. Increased meeting attendance often contributes to other benefits including member retention, publication revenue, engagement, and membership recruitment.
Understanding the impact of customer travel scenarios is a great way to leverage association analytics to create the future and grow your number of happy conference attendees.
 

How Can Associations Use SQL 2012 Data Quality Services (DQS)?

This is what bad data is like…

Data Quality

How valuable is the Ford Pinto brand? How about a patent for a cat scuba suit? Like other intangible assets, the value of data is rooted in its quality. Nonprofits have an even greater motivation to maintain the integrity of their information, because their organizational success depends on communicating effectively via membership data. One of the data quality management tools we deploy at DSK Solutions is SQL Server 2012 Data Quality Services. DQS is especially valuable because it offers an efficient, semi-automated means for associations to create a data quality foundation on which to build their enterprise analytics. By identifying data attributes and functional dependencies, DQS can effectively correct bad entries (cleanse) and eliminate duplicate records (match).
 

The Benefits of Data Quality Services

First, DQS has the powerful ability to automatically discover knowledge about your data. Even with only a sample of the larger data set, DQS can identify inconsistent, incomplete, and invalid data. For example, using Term-Based Relations (TBRs), DQS can identify strings that are inconsistent with the rest of the entries in that column. So, if ninety-nine of your entries use “123 Oak St” as the street address and one uses “124 Oak St,” DQS will correct the odd entry to be consistent. Additionally, developers can build domain rules that define the correct format or value. For example, if a user email does not follow the pattern “something@something.com”, DQS can either mark the entry as invalid for later review or automatically update it with the missing characters.
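To make the idea of a domain rule concrete, here is a minimal sketch in Python. It is not the DQS interface (DQS rules are defined in its own client tool); it only illustrates the logic of a format rule. The pattern is deliberately simple and the function name is hypothetical.

```python
import re

# Deliberately simple "something@something.tld" pattern (an assumption for
# illustration); full email validation is considerably more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def check_email(value):
    """Return 'valid' or 'invalid' so failing entries can be queued for review."""
    if value and EMAIL_PATTERN.match(value.strip()):
        return "valid"
    return "invalid"
```

Marking an entry as invalid for later review is usually the safer of the two options, since automatically "fixing" an address can silently introduce a wrong one.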
Next, DQS can check for consistency across the whole record. Using third-party reference tools or user-defined rules, associations can validate that data is logical. For example, if an entry lists a member city as “Chicago” and member state as “DC,” DQS can identify the inconsistency and either mark it as invalid or correct it to “IL.” Another valuable feature is that users can develop matching rules to identify duplicate entries. For example, if two records are 95% similar (again, based on user-defined rules), DQS can eliminate duplicate rows and consolidate the data into one unique entry.
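The two checks described above can be sketched in a few lines of Python. Again, this is an illustration of the logic rather than how DQS itself is configured: the city-to-state lookup is a tiny hypothetical stand-in for a reference data source, and the similarity score comes from Python's standard `difflib` module, which is my substitution for whatever matching algorithm DQS applies.

```python
from difflib import SequenceMatcher

# Hypothetical reference data; a real project would use a maintained source.
CITY_STATE = {"chicago": "IL", "phoenix": "AZ"}


def state_is_consistent(city, state):
    """False only when the city is known and the state disagrees with it."""
    expected = CITY_STATE.get(city.strip().lower())
    return expected is None or expected == state.strip().upper()


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_duplicate(rec_a, rec_b, threshold=0.95):
    """Compare the fields chosen for matching against a user-defined threshold."""
    fields = ("name", "street", "city")
    key_a = " ".join(rec_a[f] for f in fields)
    key_b = " ".join(rec_b[f] for f in fields)
    return similarity(key_a, key_b) >= threshold
```

Which fields to compare and how high to set the threshold are exactly the user-defined decisions the paragraph above refers to. A lower threshold catches more duplicates but also produces more false matches for someone to review.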

Two types of data profiling


Finally, DQS has an effective user-interface for controlling the discovery and cleansing process.  A DQS project steps through mapping the fields to rule domains, creating results that rate data on completeness and accuracy, and managing the project results.
Unfortunately, DQS is not a magic bullet. There are some challenges to implementing DQS for large databases. For example, the implementation of DQS for an AMS/CRM involves several important steps. First, analysts, such as those at DSK Solutions, consolidate problem data into a single table or view (DQS transformations work on one table, not entire databases). Next, associations cleanse and match the data using a combination of the DQS SSIS transformation and manual data verification. Finally, data experts reintegrate the groomed data into the original table structure (including considerations for timing, normalizing, and other SQL scripting).
DQS Implementation for netFORUM


To conclude, it is important to note that DQS is knowledge-driven, meaning that it will take data-oriented managers to develop a strategy for a final asset. As the non-profit world embraces data, DQS will play a pivotal role in creating the level of quality necessary to build an effective business intelligence infrastructure.

Data is an Asset

Data is one of the most important assets an association has because it defines each association’s uniqueness. You have data on members and prospects, their interests and purchases, your events, speakers, your content, social media, press, your staff, budget, strategic plan, and much more. But is your data accurate and are you using it fully? Your data is an asset and should be carefully cultivated, managed and refined into information which will allow you to better serve your community and ensure you remain viable in today’s competitive landscape.
Although data is one of the most important ‘raw materials’ of the modern world, most organizations do not treat it that way. In fact, according to The Data Warehousing Institute, the cost of poor data quality in America is six hundred billion dollars every year. Data quality issues are also the cause of many failed IT projects.
Your Data is Talking to You. Are You Listening?
Associations have known for a long time that data is essential for market segmentation. However, there is so much more that can be done to harness data and use it as a strategic asset. Hidden within your data are stories about which members are at risk of not renewing, which prospects are likely to join, who might make a good speaker, where the best location for your next event is, the level to which you can raise rates without a decrease in member count, your best strategy for global expansion, and much more. We would be wise to listen to the stories our data is telling us, and to make sure the data on which they are based is accurate.
The insights you glean from your data are only as good as the underlying data itself. It’s obvious that if the input is flawed, the output will be misleading. When it comes to data, there is a direct correlation between the quality of the data and the accuracy of the analysis. I’m no longer surprised at the high number of duplicate records, and the high percentage of incomplete, inaccurate and inconsistent data we find when we begin to analyze an association’s data. Because it is difficult to quantify the value of data in the same way we can measure cash, buildings and people, the activities designed to manage and protect data as an asset are often low on the priority list. That is, until a business intelligence or analytics project is undertaken. Then suddenly data quality management (DQM) takes center stage.
DQM is a Partnership between Business and IT
Business responsibilities include: 1) Determining and defining the business rules that govern the data, and 2) Verifying the data quality. IT responsibilities include: 1) Establishing architecture, technical facilities, systems, and databases, and 2) Managing the processes that acquire, maintain, and disseminate data.
DQM is a Program, Not a Project
DQM is not really a “project” because it doesn’t “end”. Think of DQM as a program consisting of the following activities:

  • Committing to and managing change – are the benefits clear and is everyone on board?
  • Describing the data quality requirements – what is the acceptable level of quality?
  • Documenting the technical requirements – how exactly will we clean it up and keep it clean?
  • Testing, validating, refining – is our DQM program working?  How can we make it better?

DQM is Proactive and Reactive
The proactive aspects of DQM include: establishing the structure of a DQM team, identifying the standard operating procedures (SOPs) that support the business, defining “acceptable quality”, and implementing a technical environment.  The reactive aspects include identifying and addressing existing data quality issues. This includes missing, inaccurate or duplicate data. For example:

  1. Important data may be missing because you have never collected it. The information you have on a member may allow you to send a renewal, but it’s not enough for you to determine their level of interest in the new programs you are rolling out in the coming year. Or the information you have on your publications is enough to be able to sell them online, but because the content is not tagged in a way that matches customer interest codes, you can’t serve up recommendations as part of the value your association offers. Associations need not only accurate data but also more data in order to fully understand the contextual landscape in which their members and prospects operate.
  2. When organizations merge, data from the two separate organizations needs to be combined, and it can often be very time-consuming to determine which aspects of the record to retain and which to retire. A determination must also be made about how to handle the historical financial transactions of the merged company.
  3. Now that visitors can create their own records online, duplicate records are on the rise. The Jon Smith who purchased a publication online is really the same Jonathan Smith who attended the last three events and whose membership is in the grace period. Because he used a different email address, a duplicate record was created, and you missed the opportunity to remind him of the value of his membership when he registered for the event.
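The Jon/Jonathan Smith case in the third item is one where matching on email alone fails. One simple way to surface such pairs for human review is sketched below. The rule used here (same last name, and one first name is a prefix of the other) is a heuristic I am assuming for illustration, and the record layout is hypothetical. It will miss nicknames such as Bill/William and will flag some pairs that are genuinely different people.

```python
from itertools import combinations


def name_key(record):
    """Normalized (first, last) name pair for comparison."""
    return record["first"].strip().lower(), record["last"].strip().lower()


def may_be_same_person(a, b):
    """Heuristic: same last name and one first name is a prefix of the other."""
    (first_a, last_a), (first_b, last_b) = name_key(a), name_key(b)
    if not first_a or not first_b or last_a != last_b:
        return False
    return first_a.startswith(first_b) or first_b.startswith(first_a)


def candidate_duplicates(records):
    """Return pairs of record ids that deserve a manual look."""
    return [(a["id"], b["id"])
            for a, b in combinations(records, 2)
            if may_be_same_person(a, b)]
```

The output is a review list, not a merge list: a person still decides whether two records really describe the same member.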

Sometimes it’s not until a data quality issue surfaces in a publicly embarrassing way that an organization decides to truly tackle the problem: a board report has erroneous data, an association cannot reply quickly to a request for statistics from an outside source and loses the PR opportunity, or the CEO cannot determine the primary contact for an organization in the system. It’s usually only after several situations like these that DQM receives serious attention, and unfortunately it often starts with a search for blame. This engenders fear, which threatens the success of a DQM initiative. It is essential that DQM programs begin with acceptance of the current state and commitment to a better future. A promise of “amnesty” with regard to what happened in the past can go a long way toward fostering buy-in for the program.
How do you Eat an Elephant?
The easiest way to start a DQM program is to start small. Identify an area that requires attention and focus first on that. In order to obtain support from key stakeholders, show how the program ties in with the association’s strategic plan. After you identify the primary focus (for some associations it might be company names; for others, demographics), set an initial timeframe (such as 3 months). Make the first project of the program manageable so you can obtain a relatively quick win and work the kinks out of your program.
Steps for your First DQM Initiative:

  1. Create a new position or assign primary DQM responsibilities to an individual
  2. Build a cross functional team and communicate the value of the program
  3. Decide how to measure quality (for example, the number of records reviewed or cleaned)
  4. Set a goal (a target number of records)
  5. Reference the goal in the performance evaluation of the individuals on the team
  6. Evaluate progress
  7. Revise
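Steps 3, 4, and 6 above can be tracked with very simple measures. The sketch below shows two possibilities: the share of records in which all required fields are filled in, and progress toward a records-cleaned goal. The function names, field names, and the choice of "completeness" as a quality measure are assumptions for illustration. Your team should pick the measures that fit its own definition of acceptable quality.

```python
def is_filled(value):
    """A field counts as filled when it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def completeness(records, required_fields):
    """Share of records (0.0 to 1.0) in which every required field is filled."""
    if not records:
        return 0.0
    complete = sum(
        all(is_filled(rec.get(field)) for field in required_fields)
        for rec in records
    )
    return complete / len(records)


def progress_toward_goal(records_cleaned, goal):
    """Fraction of the goal reached, capped at 1.0."""
    if goal <= 0:
        return 0.0
    return min(records_cleaned / goal, 1.0)
```

Reporting the same two numbers at each evaluation point makes it easy to see whether the program is working and where to revise.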

Data is one of the most important assets an association has because it is unique in its detail and context and can be used strategically to ensure we remain relevant and viable. When analyzed and leveraged properly, it can provide a competitive advantage in attracting and retaining members and creating new sources of non-dues revenue. It is important that the underlying data is accurate and complete, and a well-organized DQM program should be considered essential. Worldwide attention is being given to the importance of making “data-driven decisions”. Private industry and government have been using data to guide their decisions for many years, and now is the time for associations to recognize data as the valuable asset it is.

Balanced Scorecard and Business Intelligence

Sometimes when an organization begins a business intelligence (BI) initiative, they are so excited about data visualization and data transparency in the form of dashboards that the first thing they want to do is start measuring everything. I believe that strategy comes before measures, and that organizations that thoughtfully and purposefully align what they are measuring to their strategic plan achieve more meaningful long-term results from their BI initiative.

The Balanced Scorecard is a performance management system designed to align, measure, and communicate how well an organization’s activities are supporting the strategic vision and mission of the organization.
It was originated by Drs. Robert Kaplan (Harvard Business School) and David Norton as a performance measurement framework that added strategic non-financial performance measures to traditional financial metrics to create a more ‘balanced’ view of organizational performance.  Four strategic perspectives are addressed within the Balanced Scorecard framework:
  1. Customer 
  2. Financial 
  3. Internal Processes – commonly includes technology, systems, etc.
  4. Learning and Growth (aka “Organization Capacity”) – commonly includes people, training, etc.
Objectives (goals) are set for each perspective. Measures (numbers) that represent the things to be measured (such as sales, customers, returns) are identified and can then be transformed into ratios or counts, which serve as Key Performance Indicators (KPIs). Initiatives (projects) are undertaken in order to “move the needle” in a positive direction on the KPI gauge for that measure.
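As a small illustration of turning measures into a KPI, the sketch below models a ratio-style KPI tied to one of the four perspectives and compares it with a target. The class name, fields, and the example figures in the comment are hypothetical. The Balanced Scorecard framework does not prescribe any particular data structure.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    perspective: str    # one of the four Balanced Scorecard perspectives
    numerator: float    # measure, e.g. members who renewed
    denominator: float  # measure, e.g. members eligible to renew
    target: float       # the ratio the objective calls for

    @property
    def value(self):
        """Ratio of the two measures; 0.0 when there is nothing to divide by."""
        return self.numerator / self.denominator if self.denominator else 0.0

    @property
    def on_target(self):
        """True when the KPI meets or exceeds its target."""
        return self.value >= self.target


# Hypothetical example: 850 of 1,000 eligible members renewed, target 80%
retention = KPI("Member retention", "Customer", 850, 1000, 0.80)
```

Keeping the perspective and the target on each KPI is one way to make sure every number on a dashboard traces back to an objective in the strategic plan.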
 
Balanced Scorecard dashboards include both leading and lagging indicators.  For example, customer and financial KPIs are traditionally lagging indicators – the numbers indicate what has already happened.  KPIs for the two perspectives of internal processes and learning/growth are leading indicators.  This is because positive results achieved with respect to internal processes and learning/growth initiatives should lead to a positive result in the customer and financial KPIs.
 
Gartner is a leader in the field of information technology research, and they organize BI capabilities into three main categories: analysis, information delivery, and integration. The concept of “scorecards” fits into their BI analysis category. Gartner recognizes that tying the metrics displayed in a dashboard to an organization’s strategy map ensures that the most important things are being measured, because each measure on a scorecard is tied to the organization’s strategic plan. Sounds obvious, right? But it’s still relatively rare, and that’s a subject for another post.