
How to Determine What Data to Combine

There is a lot of value in combining data from one business area with data from another business area. Similar to a jigsaw puzzle, when we combine data sets and put the pieces together, we get a complete picture of customers, events, and activities. But how do you know what data to combine?

Take Inventory of What You Have

To get started, take inventory of the data you already have available in your business area. Take membership, for example. Membership teams often require a high level of granularity, and they have years of membership data that can be leveraged. That data may be stored in their Association Management System (AMS), Customer Relationship Management (CRM) system, and Financial Management System.
After identifying the data sources, consider what data is stored in each one. Identify the file type and how the data can be extracted or integrated with other systems.
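To make the inventory concrete, here is a minimal Python sketch that records each system, what it stores, its file type, and how its data is extracted. The `DataSource` class, the helper functions, and every entry in `inventory` are hypothetical placeholders, not a description of any particular AMS, CRM, or financial system.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One row of a data inventory (the fields are illustrative, not a standard)."""
    system: str                                         # e.g. AMS, CRM, financial system
    contents: list[str] = field(default_factory=list)   # kinds of data stored
    file_type: str = "unknown"                          # e.g. database table, CSV export
    access_method: str = "unknown"                      # how data is extracted or integrated


# Hypothetical inventory for a membership team; all details are placeholders.
inventory = [
    DataSource("Association Management System",
               ["join date", "membership type", "dues payments"],
               "database table", "scheduled export"),
    DataSource("Customer Relationship Management system",
               ["contact information"], "CSV export", "manual export"),
    DataSource("Financial Management System", ["payments"], "CSV export"),
]


def systems_holding(inventory: list[DataSource], item: str) -> list[str]:
    """Which systems store a given kind of data?"""
    return [s.system for s in inventory if item in s.contents]


def unknown_access(inventory: list[DataSource]) -> list[str]:
    """Systems whose extraction/integration method still needs to be documented."""
    return [s.system for s in inventory if s.access_method == "unknown"]


print(systems_holding(inventory, "dues payments"))
print(unknown_access(inventory))
```

Even a list this simple makes gaps visible, such as systems whose extraction method nobody has documented yet.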

Consider What’s Missing

To determine what data could augment your existing data source, think about the aspects of the customer or activity that you care about.
What information could help answer your business question? If you don’t have a business question, what information would provide additional insight on customer behavior?
The membership team typically has data on when a member joined, membership type, length of membership, contact information, and dues payments. What other information would help them understand members? It may be helpful to combine membership data with components from other areas such as the number of events attended in the past two years, the last meeting attended, age, member status, tenure in the industry, and total spending in the past year.

Combine and Analyze

Combine the data and analyze it. Look for trends and relationships. Distill the information so that each component of activity you are interested in is presented as an attribute of that person.
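One way to distill combined data into per-person attributes is sketched below in Python, assuming every system shares a member ID. The record layouts, the `member_profiles` function, and the sample values are hypothetical; the attributes mirror several from the membership example (events attended in the past two years, last meeting attended, spending in the past year, and tenure), and the time windows are approximated as 730 and 365 days.

```python
from datetime import date, timedelta

# Hypothetical inputs keyed by a shared member ID: member records (e.g. from
# an AMS) plus event and purchase activity from other systems.
members = {
    101: {"join_date": date(2010, 3, 1), "member_type": "Regular"},
    102: {"join_date": date(2019, 6, 15), "member_type": "Student"},
}
event_attendance = [(101, date(2023, 5, 10)), (101, date(2024, 4, 2)),
                    (102, date(2024, 9, 20))]          # (member_id, event_date)
purchases = [(101, date(2024, 1, 12), 250.0), (102, date(2024, 2, 3), 40.0),
             (102, date(2022, 2, 3), 99.0)]            # (member_id, date, amount)


def member_profiles(members, events, purchases, as_of):
    """Distill activity so each component of interest is an attribute of the member."""
    profiles = {mid: dict(rec, events_past_2y=0, last_event=None, spend_past_year=0.0)
                for mid, rec in members.items()}
    two_years_ago = as_of - timedelta(days=730)   # approximate two-year window
    one_year_ago = as_of - timedelta(days=365)    # approximate one-year window
    for mid, when in events:
        p = profiles.get(mid)
        if p is None:
            continue  # activity for someone not in the member file
        if when >= two_years_ago:
            p["events_past_2y"] += 1
        if p["last_event"] is None or when > p["last_event"]:
            p["last_event"] = when
    for mid, when, amount in purchases:
        if mid in profiles and when >= one_year_ago:
            profiles[mid]["spend_past_year"] += amount
    for p in profiles.values():
        p["tenure_years"] = (as_of - p["join_date"]).days // 365  # approximate
    return profiles


profiles = member_profiles(members, event_attendance, purchases, as_of=date(2024, 12, 31))
for mid, attrs in profiles.items():
    print(mid, attrs)
```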
 
The table below shows some combined information as it relates to the Top 10 and Bottom 10 thread topics from an association’s online community. Using this information, we can see whether a correlation may exist between a person’s attributes and the most active threads. From the data below, it looks like the top threads, compared with the bottom threads, were started by younger authors and drew replies and posts from younger individuals with less membership tenure and less professional development. Perhaps the association could target younger members with messaging that encourages them to author and respond to community posts and explains the benefits of doing so.
[Table: thread statistics for the Top 10 and Bottom 10 thread topics]
Once you combine data, you can determine if there is actually a relationship between two data sets. You can also see if you need additional data to augment your analysis. Using business intelligence tools, like Tableau, allows you to easily connect data sets and experiment.
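A quick numeric check can complement the visual one. This Python sketch computes a Pearson correlation coefficient between two attributes of combined data; the function is a plain implementation of the standard formula, and the sample ages and reply counts are made up for illustration, not taken from the association's table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / (sx * sy)


# Hypothetical combined data: one row per community participant.
ages = [24, 31, 38, 45, 52, 60]
replies_posted = [14, 11, 9, 6, 4, 2]
print(round(pearson(ages, replies_posted), 2))
```

A coefficient near +1 or -1 suggests a strong linear relationship, and one near 0 suggests little linear relationship; either way, correlation alone does not show cause and effect.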

Planes, Trains, Automobiles… and Meetings

Each Labor Day weekend, analysts discuss how gasoline prices and other factors affect travel driven by an extra day off and the symbolic end of summer. Some estimates indicate that a greater number of travelers by car collectively saved $1.4 billion this year, much of which was spent elsewhere during their trips.
Deciding whether and when to travel greater distances is more complicated, since it often requires relatively expensive air travel booked in advance. The price of airline tickets depends heavily on when they are purchased relative to the travel date. Analytics indicate that the optimal time to purchase tickets is generally between three and eight weeks before travel, though the specific timing changes with the time of year and varies significantly by departure and arrival airports. Unfortunately, it does not take advanced analytics to observe that prices tend to rise rapidly starting two weeks before travel.
Associations create similar travel decisions for their customers through meetings and events. Associations spend considerable time and effort planning and marketing meetings based on a range of customer data, including prior event participation, topical interests, individual engagement, meeting sessions, organization characteristics, continuing education requirements, and demographics. Registration patterns are closely monitored against prior years’ activity and current goals. Sometimes these patterns are not consistent with historical data: in some cases, a trend of early registrations fails to continue, and in other cases, a late surge of registrations follows a period of marketing efforts.
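Monitoring registration pace against the prior year can be sketched as a comparison of cumulative counts. This Python example assumes weekly registration counts aligned by weeks before the event; the function names, the 90% threshold, and the counts are hypothetical.

```python
from itertools import accumulate


def pace_vs_prior(current_weekly, prior_weekly):
    """Compare cumulative registrations with the prior year, week by week.

    Both lists hold new registrations per week, aligned so that the same index
    means the same number of weeks before the event. Returns tuples of
    (week_index, current_total, prior_total, percent_of_prior).
    """
    rows = []
    for week, (cur, prev) in enumerate(zip(accumulate(current_weekly),
                                           accumulate(prior_weekly))):
        pct = round(100 * cur / prev, 1) if prev else None
        rows.append((week, cur, prev, pct))
    return rows


def weeks_behind(rows, threshold_pct=90.0):
    """Week indexes where cumulative registrations fall below the threshold."""
    return [week for week, _, _, pct in rows if pct is not None and pct < threshold_pct]


# Hypothetical weekly registration counts for this year and last year.
rows = pace_vs_prior([10, 5, 5], [8, 8, 8])
for row in rows:
    print(row)
print("Behind pace in weeks:", weeks_behind(rows))
```

Here an early lead fades into a shortfall, which is the kind of inconsistent pattern worth investigating.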
Associations can supplement their AMS data with external sources to help explain such patterns. The U.S. Census Bureau makes very useful geographic data available, including zip codes with latitude/longitude mapped to several other data points. The Census Bureau also offers an application programming interface (API) to obtain detailed geographic data for specific addresses. This data can be used to calculate the distance between two points (it’s kind of a long story that involves trigonometry I never thought I would use, such as sine, cosine, and inverse tangent). Tableau data visualization software provides out-of-the-box geographic visualization capabilities based on data including zip code, area code, city, metropolitan/micropolitan statistical area, and even congressional district.
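The post doesn't say which formula it uses, but one common choice for the straight-line distance between two latitude/longitude points is the haversine formula, which relies on the sine, cosine, and inverse tangent mentioned here. Below is a minimal Python sketch; the city coordinates are approximate values assumed for illustration, and in practice the points would come from zip code or address coordinates such as the Census data.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; results are approximate


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in miles between two lat/long points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))  # central angle via the inverse tangent
    return EARTH_RADIUS_MILES * c


# Approximate downtown coordinates (assumed for illustration).
phoenix = (33.4484, -112.0740)
chicago = (41.8781, -87.6298)
print(round(haversine_miles(*phoenix, *chicago)))
```

Because it treats the Earth as a sphere, the result is an approximation, which is plenty for a 200-mile rule of thumb.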
Various surveys and other analytics estimate that a common comfortable distance for car travel is around 200 miles. Here is a map created using Tableau showing the population density by zip code within a 200-mile radius of Phoenix:
[Figure: Tableau map of population density by zip code within 200 miles of Phoenix]
Here is a similar map based on the same color scale for Chicago:
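[Figure: Tableau map of population density by zip code within 200 miles of Chicago, on the same color scale]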
A far greater number of individuals are likely to consider driving to a meeting in Chicago than to a meeting in Phoenix. Note that straight-line distance is only a rough estimate: encountering a large body of water such as Lake Michigan would challenge even the most dedicated meeting attendee. Fortunately, you can leverage more advanced data sources, such as the Google Maps API, to estimate driving distances.
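Counting how many people live within driving range of a meeting city is roughly what the Phoenix and Chicago maps show visually. This Python sketch sums the population of ZIP areas whose centroids fall within 200 straight-line miles of a meeting location; the ZIP identifiers, coordinates, and populations are hypothetical, and the haversine formula is an assumed choice of distance calculation. It does not estimate driving distance, which would require a routing service.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return EARTH_RADIUS_MILES * 2 * atan2(sqrt(a), sqrt(1 - a))


def population_within(center, zip_areas, radius_miles=200):
    """Total population of ZIP areas whose centroid lies within radius_miles of center."""
    lat0, lon0 = center
    return sum(pop for lat, lon, pop in zip_areas.values()
               if haversine_miles(lat0, lon0, lat, lon) <= radius_miles)


# Hypothetical ZIP centroids and populations: {zip: (lat, lon, population)}.
zip_areas = {
    "ZIP_A": (41.88, -87.63, 50_000),   # at the meeting city
    "ZIP_B": (42.88, -87.63, 20_000),   # roughly 69 miles north
    "ZIP_C": (45.88, -87.63, 10_000),   # roughly 276 miles north
}
print(population_within((41.88, -87.63), zip_areas))
```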
This data provides associations with great analytics opportunities to drive meeting participation by aligning with customer travel decisions. For example, marketing campaigns can consider travel decision timing and target specific airport markets. Registration pricing deadlines for individual events can incorporate location analytics. Your association can also create opportunities for strong customer engagement through efforts to help groups of potential meeting attendees organize buses together or even carpool.
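Aligning campaign timing with airfare behavior can start with a few key dates counted back from the meeting. This Python sketch uses the rule of thumb described earlier in this post (buy three to eight weeks out, with prices climbing in the last two weeks); the function name, the milestone labels, and the meeting date are assumptions, and the defaults should be adjusted for the season and the airport markets you target.

```python
from datetime import date, timedelta


def airfare_milestones(meeting_date, earliest_weeks=8, latest_weeks=3, surge_weeks=2):
    """Key dates for travel-aware marketing, counted back from the meeting date.

    Defaults follow the 3-8 week purchase window and the 2-week price rise
    described in the text; actual timing varies by season and airport.
    """
    return {
        "window_opens": meeting_date - timedelta(weeks=earliest_weeks),
        "window_closes": meeting_date - timedelta(weeks=latest_weeks),
        "prices_rising": meeting_date - timedelta(weeks=surge_weeks),
    }


# Hypothetical meeting date.
for name, when in airfare_milestones(date(2025, 10, 15)).items():
    print(f"{name}: {when.isoformat()}")
```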
Deriving travel distance analytics from customer location data is an example of gaining value from data. This demonstrates why it is important to consider analytics when designing business processes throughout the organization and not just as part of analysis after the fact. Increased meeting attendance often contributes to other benefits including member retention, publication revenue, engagement, and membership recruitment.
Understanding the impact of customer travel scenarios is a great way to leverage association analytics to create the future and grow your number of happy conference attendees.
 

How to Choose the Right Visualization for your Association

Choosing the right data visualization is as important as choosing the right outfit to wear to an important meeting. Although your alma mater’s sweatshirt is perfect for the ball game, a suit and tie is more appropriate when trying to convince your board to increase your budget. Then again, you are going to catch some flak for showing up to the game in a suit and tie! Choosing the right visualization for your audience is similar to choosing the right outfit for the function.
It is often said that the human brain processes images three times faster than text. From our primitive beginnings, we’ve depended on our brain’s ability to detect subtle patterns and interpret meaning. So, how do you choose the right visualization? Let’s take a look at some common types of visualization and when to use them to effectively communicate the story your data is telling.

Tabular

[Figure: table example]

  • Best used when exact quantities of numbers must be known.
  • Numbers are presented in rows and columns and may contain summary information, such as averages or totals.
  • This format is NOT well suited to finding trends or comparing sets of data, because scanning rows of numbers is hard and the presentation becomes cumbersome with larger data sets. Working memory is commonly estimated to hold about seven items, which means you can keep only about seven pieces of information (like numbers) in your brain’s “RAM” at once. If you build a table with financial information for each month of the year for different areas of your association, it becomes difficult to find outliers or even the most profitable month.
  • This kind of visualization is likely what many association staff are accustomed to (think of all those Excel spreadsheets floating around your office), so you may need to use a tabular format in conjunction with one of the other types listed below to convey the information.
  • A variation of the tabular chart is the highlight table, which colors each cell based on its value. The use of color can make outliers stand out.
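The idea behind a highlight table can be sketched in a few lines of Python: scale each value between the table’s minimum and maximum and map it to a color level. The three levels, the function names, and the revenue figures are hypothetical; BI tools such as Tableau handle this coloring for you.

```python
def shade(value, lo, hi, levels=("light", "medium", "dark")):
    """Map a value to a color level by where it falls between lo and hi."""
    if hi == lo:
        return levels[0]
    fraction = (value - lo) / (hi - lo)
    return levels[min(int(fraction * len(levels)), len(levels) - 1)]


def highlight_table(table):
    """table: {row: {column: value}} -> same shape with color levels instead of values."""
    values = [v for cols in table.values() for v in cols.values()]
    lo, hi = min(values), max(values)
    return {row: {col: shade(v, lo, hi) for col, v in cols.items()}
            for row, cols in table.items()}


# Hypothetical monthly net revenue (in $1,000s) by business area.
revenue = {"Meetings": {"Jan": 10, "Feb": 50}, "Dues": {"Jan": 90, "Feb": 30}}
print(highlight_table(revenue))
```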

Line Charts

[Figure: line chart example]

  • Best used when trying to visualize continuous data over time.
  • Line charts use a common scale and are ideal for showing trends in data over time.
  • Example: membership or registrant counts throughout the year compared to previous years.
  • Trend lines and goal lines can also be added to compare actual counts with certain benchmarks.
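A trend line is often a least-squares fit, and a goal line is simply a constant to compare against. This Python sketch computes both comparisons for a series of monthly counts; the function names, the goal of 140, and the counts are hypothetical, and BI tools typically add trend lines without any code.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_below_goal(values, goal):
    """Indexes of periods where the actual count is under the goal line."""
    return [i for i, v in enumerate(values) if v < goal]


# Hypothetical monthly registrant counts.
counts = [120, 135, 128, 150, 162, 170]
slope, intercept = linear_trend(counts)
print(f"trend: {slope:.1f} registrants per month")
print("below goal of 140:", months_below_goal(counts, 140))
```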

Bar Charts

[Figure: bar chart example]

  • Best used when showing comparisons between categories.
  • The bars are proportional to the values they represent and can be shown either horizontally or vertically. One axis shows the categories being compared, and the other axis shows the measured values.
  • Example: Bar charts can be helpful when looking at certain segments of your customers, registrants or members.
  • Goal lines can also be added to compare the actual counts with your benchmarks.
  • A variation of the bar chart is the stacked bar chart. This incorporates the use of color to visually show how certain segments add up to the total. In the example above, it’s easy to see that while 2010 Conference attendance counts are higher, the number of Paid attendees actually decreased from the previous year.
  • Another variation of the bar chart is called a bullet chart. This chart allows you to take a single measure (for example, revenue) and compare it to another measure (for example, revenue goal). It also can display percentiles.

[Figure: bullet chart example]

Pie Charts

[Figure: pie chart example]

  • Best used to compare parts to the whole.
  • Pie charts make it easy for an audience to understand the relative importance of values.
  • Using this format for more than 5 sections is not recommended as it can become difficult to compare the results. Too many sections make interpretation difficult because the difference between the sections can become too narrow to effectively interpret.
  • Often, even when wanting to compare parts to the whole, a bar chart can be more effective.
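When a breakdown has more than five categories, one common workaround is to keep the largest slices and fold the rest into an “Other” slice. This Python sketch does that; the function name, the five-slice default (taken from the guideline in this section), and the revenue shares are hypothetical.

```python
def pie_slices(shares, max_slices=5):
    """Keep the largest categories and fold the rest into "Other".

    Returns at most max_slices (label, value) pairs, largest first.
    """
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_slices:
        return ranked
    keep = ranked[: max_slices - 1]
    other = sum(value for _, value in ranked[max_slices - 1:])
    return keep + [("Other", other)]


# Hypothetical revenue share (%) by source.
revenue_share = {"Dues": 40, "Meetings": 25, "Publications": 15,
                 "Sponsorship": 10, "Certification": 6, "Advertising": 4}
for label, value in pie_slices(revenue_share):
    print(label, value)
```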

In addition to different chart types, filters and sorting are important for helping association staff explore the data in more detail.
The goal of any visualization should be to communicate the information in the most concise and impactful way by using the appropriate visualization for your data. Effective visualizations help your audience quickly understand the story in the data and help association staff reach key insights faster.

Data Discovery – a “Bicycle for the Mind”

Steve Jobs said that computers are like a “bicycle for the mind,” allowing us to go further, faster, with less effort. Within the field of business intelligence, I believe data discovery should be the same, amplifying our intelligence and creativity and allowing us to see patterns and insights that we can use to create the future of our organizations.

Data Visualization is Key

A key feature of data discovery is the ability to present data visually in order to quickly convey the information to our brains.  The goal is to stimulate our minds and then to allow us to interact and have a direct “conversation” with the data.  We “ask” follow-on questions by clicking through and moving elements on the screen, “re-presenting” the data many ways.
We are inspired when we see data visually and notice how it changes based upon the questions we ask.  We can see the power as we interact with the data and the analysis process becomes a natural extension of the activity of thought, enabling us to drill down, drill up, filter, bring in more data sources, or create multiple visual interpretations.  Interactivity supports visual thinking.  We can work with the data visually at the speed of thought, rather than writing queries to the databases.  We can learn and reach insights faster.  When we interact with the data visually, we are participating in the data discovery process and data becomes our partner in gleaning the information that becomes business intelligence.

How Does your Association Make Decisions Today?

An MIT study shows that most organizations still rely on instinct, intuition, politics and tradition.  But according to Harvard Business Review, top performing organizations are 5x more likely to use data to make decisions.  If we know that decisions based on data tend to be better decisions, why don’t we use it all the time?  Research says that only 15-20% of organizations believe they have access to the data they need in order to make good decisions.  Why is this?
[Figure: Monet’s painting of Venice at twilight]
I once heard it explained this way: imagine you are Monet, and you are explaining to a friend how to create a painting of Venice at twilight, like this one. You might say, start with a brownish church steeple on the left, surround it with golden light, and put deep blue in the sky. Oh yes, and maybe add some little purple/violet touches in the water to the right. Do you think your friend would create a painting that looks like this one? Probably not. And yet that is what happens daily between business and IT. Business tries to describe in words the data it wants to see, and IT tries its best to represent in reports and dashboards what it heard the business say.
Typically, you have a question and ask IT to create a query or report. When they deliver it, you usually have another question, want another piece of data, or want it grouped a different way. Traditionally, this kind of change request has been considered a BAD thing! And then you have to wait for IT to revise it. At least that’s what used to happen to me when I worked for an association. This process doesn’t really work well for either the IT department or the business users in the association who are trying to make decisions.

Change the Ending

Association staff want access to their data!  They don’t know their questions until they start to see the data.  Data discovery enables them to have a conversation with the data directly.  Data discovery allows all of us to shift perspective quickly and change the way we look at a problem.  We can cycle through different views deliberately trying to get the insight to pop out!
This is how we learn to understand the story our data is telling.  And once we understand the story, we can change the ending.

The Value of Data Discovery for Associations

The Magic Quadrant

In February 2013, Gartner Inc. released an important report entitled Magic Quadrant for Business Intelligence and Analytics Platforms which details the current state of the business intelligence (BI) market and evaluates the strengths and weaknesses of several of the top vendors. It’s interesting to note that in this report, Gartner emphasized the emergence of data discovery into the “mainstream business intelligence and analytics architecture”, something we have been highlighting at DSK Solutions for years.
What is data discovery? Associations and nonprofits are sitting on large quantities of data and don’t always realize the value of this powerful asset. The old days of spray and pray are gone. Remember direct mailing blasts? How ineffective! Associations were shooting in the dark and wasting resources that could have been allocated to better serve members. Unfortunately, some associations still rely on this marketing approach, but there is a better way: segmented target marketing based on data.
All of your data, including CRM or AMS (customer data), general ledger and budget (financial data), and Google Analytics (Web data), can be pooled together to illuminate your member strategy. Think of each data source as a small flashlight that reveals a little bit of the path in front of you. When your data sources are pulled together, the path becomes much clearer. Analyzing your data with data discovery makes it possible to discover things you did not know before.

Necessary Steps

Clients frequently come to us seeking guidance on how to begin the task of leveraging their data to inform better decision making. Before you can embark on data discovery, you have to do two things:

  • Ask the right questions.  What is meaningful to your organization? What are you trying to find out about your members, prospects, products, services and profit?
  • Clean your data. If your data is filled with duplicates, inaccuracies, inconsistencies and other forms of noise, your analysis will be flawed. Remember: Garbage in, garbage out.  Quality data as an input allows for accurate analysis as an output, which results in the improved ability to make good decisions.

These two steps form the foundation of the data discovery process. Almost always, the answers you derive from your data will lead to more questions. It’s okay to ask why. In fact, you should be asking why! Start by asking questions like these:

  • How dependent is your association on dues revenue?
  • What is the price elasticity of membership (Full Rate vs. Discounted Rate)?
  • Which members are at risk for not renewing?
  • How far (in miles) will registrants travel to attend a meeting?
  • Which products or services have the highest profit?
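For the price elasticity question, one standard calculation is the midpoint (arc) elasticity: the percent change in quantity divided by the percent change in price, each measured against the average of the two observations. This Python sketch applies it to hypothetical full-rate and discounted-rate membership counts; treating those two groups as points on a single demand curve is a simplifying assumption, since members who qualify for a discount may differ from full-rate members.

```python
def arc_elasticity(q1, p1, q2, p2):
    """Midpoint (arc) price elasticity between two (quantity, price) observations."""
    if p1 == p2:
        raise ValueError("the two prices must differ")
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


# Hypothetical figures: members at a full rate vs. a discounted rate.
e = arc_elasticity(q1=1000, p1=300, q2=1400, p2=200)
print(f"elasticity: {e:.2f}")  # |e| < 1 suggests demand is relatively inelastic
```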

Then start asking “why”. Remember the Ishikawa (or fishbone) diagram? It’s an easy and useful way to begin thinking in terms of cause and effect, and it pairs well with the “5 Whys” technique, where you ask “why” five times until you arrive at the root cause of an effect. Now, with interactive data discovery, you can ask these questions directly by interacting with the data in a visual way! At DSK we describe it as “having a conversation with your data”. For example, the certification department of an association wanted to look at the pass/fail ratio for an exam. Using data discovery, they found that many more college-aged people than in the past were registering and doing poorly. In the process of asking “why” the failure rate was increasing, they discovered an opportunity not only to publish a new study guide, but also to reach an entirely new source of prospective members, and they created a new membership type to serve the college market.
Data discovery is an iterative process in which you ask questions of your data in an interactive way. Drilling down both vertically and horizontally into your data allows you not only to answer the questions you know you have, but also to shed light on the unknown unknowns, enabling associations to make better decisions.


How Can Associations Use SQL 2012 Data Quality Services (DQS)?

[Figure: scuba cat]

This is what bad data is like…

Data Quality

How valuable is the Ford Pinto brand? How about a patent for a cat scuba suit? Like other intangible assets, the value of data is rooted in its quality. Nonprofits have an even greater motivation to maintain the integrity of their information, because their organizational success depends on communicating effectively using membership data. One of the data quality management tools we deploy at DSK Solutions is SQL Server 2012 Data Quality Services (DQS). DQS is especially valuable because it offers an efficient, semi-automated means for associations to create a data quality foundation on which to build their enterprise analytics. By identifying data attributes and functional dependencies, DQS can effectively correct bad entries (cleanse) and eliminate duplicate records (match).
 

The Benefits of Data Quality Services

First, DQS has the powerful ability to automatically discover knowledge about your data. Even with only a sample of the larger data set, DQS can identify inconsistent, incomplete, and invalid data. For example, DQS can identify strings that are inconsistent with the rest of the entries in a column: if ninety-nine of your entries use “123 Oak St” as the street address and one uses “124 Oak St,” DQS can flag the odd entry and suggest a correction. Term-Based Relations (TBRs) go further by standardizing terms within values, such as replacing an abbreviation with its full form. Additionally, developers can build domain rules that define the correct format or value. For example, if a user email does not follow the pattern “something@something.com”, DQS can either mark the entry as invalid for later review or automatically update it with the missing characters.
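DQS domain rules are configured in the product itself rather than written as code, so the following is only a Python analogue of the idea: a simple format rule for email addresses that splits values into valid entries and entries to mark for review. The regular expression, function name, and sample values are assumptions for illustration, and the rule is far looser than full email validation.

```python
import re

# A deliberately simple format rule: something@something.something
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def apply_domain_rule(values, rule=EMAIL_RULE):
    """Split values into (valid, invalid) lists; invalid ones go to manual review."""
    valid, invalid = [], []
    for value in values:
        (valid if rule.fullmatch(value) else invalid).append(value)
    return valid, invalid


valid, invalid = apply_domain_rule(["jane@example.org", "jane@exampleorg",
                                    "jane.example.org", "j smith@example.org"])
print("valid:", valid)
print("needs review:", invalid)
```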
Next, DQS can check for inconsistencies across the fields of a record. Using third-party reference data or user-defined rules, associations can validate that data is logical. For example, if an entry lists a member city as “Chicago” and member state as “DC,” DQS can identify the inconsistency and either mark it as invalid or correct it to “IL.” Another valuable feature is that users can develop matching rules to identify duplicate entries. For example, if two records are 95% similar (again, based on user-defined rules), DQS can eliminate the duplicate rows and consolidate the data into one unique entry.
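The two checks in this paragraph can also be sketched outside DQS. This Python example assumes a small hypothetical city-to-state reference table for the cross-field check, and uses the standard library’s `difflib.SequenceMatcher` similarity ratio as a stand-in for DQS matching rules, with the 95% threshold from the example. It is not how DQS computes similarity, and it only flags candidate pairs rather than consolidating them.

```python
from difflib import SequenceMatcher

# Hypothetical reference data mapping known cities to their states.
CITY_STATE = {"Chicago": "IL", "Washington": "DC"}


def check_city_state(record, reference=CITY_STATE):
    """Cross-field check: does the state agree with the city?"""
    expected = reference.get(record["city"])
    if expected is None:
        return "unknown"  # city not in the reference data
    return "valid" if record["state"] == expected else f"invalid (expected {expected})"


def similarity(a, b):
    """Similarity ratio between 0 and 1 (difflib's ratio, not DQS's algorithm)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, fields, threshold=0.95):
    """Index pairs of records whose combined fields are at least `threshold` similar."""
    keys = [" ".join(r[f] for f in fields) for r in records]
    return [(i, j)
            for i in range(len(keys)) for j in range(i + 1, len(keys))
            if similarity(keys[i], keys[j]) >= threshold]


records = [
    {"name": "Jane Smith", "address": "123 Oak St", "city": "Chicago", "state": "IL"},
    {"name": "Jane Smith", "address": "123 Oak St.", "city": "Chicago", "state": "DC"},
    {"name": "John Doe", "address": "9 Elm Ave", "city": "Washington", "state": "DC"},
]
print([check_city_state(r) for r in records])
print(likely_duplicates(records, ["name", "address"]))
```

Comparing every pair grows quickly with the number of records, so on a full AMS table you would typically narrow the candidates first, for example by comparing only records that share a zip code.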

[Figure: Two types of data profiling]


Finally, DQS has an effective user interface for controlling the discovery and cleansing process. A DQS project steps through mapping fields to domains, producing results that rate the data on completeness and accuracy, and managing the project results.
Unfortunately, DQS is not a magic bullet. There are some challenges to implementing DQS for large databases, and implementing it for an AMS/CRM involves several important steps. First, analysts (such as our team at DSK Solutions) consolidate problem data into a single table or view, because DQS transformations work on one table, not entire databases. Next, associations cleanse and match the data using a combination of the DQS SSIS transformation and manual data verification. Finally, data experts reintegrate the groomed data back into the original table structure (including considerations for timing, normalization, and other SQL scripting).
[Figure: DQS Implementation for netFORUM]


To conclude, it is important to note that DQS is knowledge-driven, meaning that it takes data-oriented managers to develop the knowledge and strategy needed to turn raw data into a finished asset. As the nonprofit world embraces data, DQS will play a pivotal role in creating the level of quality necessary to build an effective business intelligence infrastructure.