Archive for Big Data

How Associations Are Successfully Using Artificial Intelligence

With AI no longer science fiction, associations are using advanced technologies to convert mountains of data into actionable insights.

At the recent EMERGENT event, hosted by Association Trends, we had the opportunity to jointly present case studies with ASAE’s Senior Director of Business Analytics, Christin Berry.

These success stories include how ASAE has:

Combined artificial intelligence and text analytics to enhance customer engagement, understand evolving trends, and improve product offerings

Doubled online engagement, measured by unique open and click-to-open rates, by using AI to personalize newsletters

Reduced the need for surveys, identified what’s trending, and measured through Community Exploration

Leveraged Expertise Search and Matching to better identify experts and bring people with similar interests together

I’m Matt Lesnak, VP of Product Development & Technology at Association Analytics, and I hope to demystify these emerging technologies to jumpstart your endeavors in association innovation.

Text and Other Analytics

Associations turn to analytics and visual discovery for answers to common questions including:

  • How many members do we have for each member type?
  • How many weeks are people registering in advance of the annual meeting?
  • How much are sales this year for the top products?

Questions about text content can be very different, and less specific.  For example:

  • What is it about?
  • What are the key terms?
  • How can I categorize the content?
  • Who and where is it about?
  • How is it like other content?
  • How is the writer feeling?

It is widely estimated that 70% of analytics effort is spent on data wrangling.

This high proportion is no different for text analytics but can be well worth the effort. Text analytics involves unique challenges including:

  • Term ambiguity: Bank of a river vs. a bank with money vs. an airplane movement
  • Equivalent terms: Eat vs. ate, run vs. running
  • High volume: Rapidly growing social data
  • Different structure: Doesn’t really have rows, columns, and measures
  • Significant data wrangling: Must be transformed into usable format

Like the ever-growing data from association source systems that might flow to a data warehouse, text content of interest might include community discussions, articles or other publications/books, session/speaker proposals, journal submissions, and voice calls or messages.

Possible uses include enhancing your content strategy, providing customized resources, extracting trending topics for CEOs, and identifying region-specific challenges.

Personalized Newsletter

ASAE is working with rasa.io to automatically identify topics of newsletter content as part of a pilot that significantly improved user engagement.  ASAE and rasa.io first tracked newsletter interactions over time to understand individual preferences and trending topics.  Individuals then received personalized newsletters based on demonstrated preferences.

The effort has been very successful: unique open and click-to-open rates have more than doubled for the personalized newsletters.

Underlying technology includes Google, IBM Watson, and Amazon Web Services, combined with other machine learning tools developed by rasa.io.


Community Exploration

ASAE leverages a near-real-time integration with over 10 million community data points combined with an enterprise data warehouse to analyze over 50,000 pieces of discussion content and over 50,000 site searches.  The integration is offered as part of the Association Analytics Acumen product through a partnership with Higher Logic.

Information extracted includes named entities, key phrases, term relevancy, and sentiment analysis.  This capability provides several impactful benefits.

Quick wins:

  • Visualize search terms
  • What’s trending
  • Staff and volunteer use
  • Reduce need for surveys

Longer-term opportunities:

  • Aboutness of posts as content strategy
  • Identifying key expertise areas
  • Connecting like-minded individuals

Underlying technology includes AWS Comprehend, Python, and Hadoop with Mahout.


Expertise Search and Matching

Another application of text analytics that we’ve implemented involves enabling associations to better identify experts and bring together people with similar interests.  In addition to structured data from multiple sources, text from content including meeting abstracts and paper manuscripts provides insights into potential individual interests and expertise.

This incorporates data extracted from content using approaches including content similarity, term relevancy, validation of selected tags, and identifying potential collaborators.

Underlying technology includes Python and Hadoop with Mahout.


Approaches and Technology

We’ve written extensively about the importance of transforming data into a format optimized for analytics, such as a dimensional data model implemented as a data warehouse.

Thinking back to the common association questions involving membership, event registration, and product sales, these are based on discrete data such as member type, event, and day.

Text data is structured for analysis using a different but fundamentally similar approach: each term becomes a field, much as member type is a field in a table.

Picture a matrix with each document as a row and each term as a column.

This is referred to as “vector space representation”.  With thousands of commonly used words in the English language, that can be a big matrix.  Fortunately, we have ways to reduce this size and complexity.
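To make the matrix idea concrete, here is a minimal sketch in plain Python. The function name `document_term_matrix` and the two sample documents are our own illustrations, not part of any product mentioned here, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter

def document_term_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    # Assumption: lowercase and split on whitespace; real tokenization is richer.
    tokenized = [doc.lower().split() for doc in documents]
    # One column per distinct term, in alphabetical order.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document; terms that are absent get a count of 0.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Hypothetical sample documents.
docs = [
    "members renew membership",
    "members register for the annual meeting",
]
vocabulary, matrix = document_term_matrix(docs)
for row in matrix:
    print(row)
```

Even with two short documents, most cells are zero, which is why the size-reduction steps that follow matter.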

First, some basic text preparation:

  • Tokenization – splitting into words and sentences
  • Stop Word Removal – removing words such as “a”, “and”, “the”
  • Stemming – reduction to root word
  • Lemmatization – morphological analysis to reduce words
  • Spelling Correction – like common spell-checkers
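The first three preparation steps above can be sketched in a few lines of plain Python. This is only an illustration: the stop word list is a tiny hypothetical sample, and `crude_stem` is a deliberately naive suffix stripper, not a real stemming algorithm such as Porter's.

```python
import re

# Assumption: a tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are"}
# Assumption: strip only these suffixes, checked in this order.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Strip one known suffix, keeping at least three characters of the root."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def prepare(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]

tokens = prepare("The members recommended the annual meeting and are recommending it")
print(tokens)
```

Note how “recommended” and “recommending” collapse to the same root, so they share one column in the matrix instead of two.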

Another classic approach is known as “Term Frequency–Inverse Document Frequency (TF-IDF)”.  We use TF-IDF to reduce the data to include the most important terms using the calculated scores.  TF-IDF is different from many other techniques as it considers the entire population of potential content as opposed to isolated individual instances.
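As a rough sketch of how the score works, the example below uses one common weighting: term frequency as a share of the document, multiplied by the natural log of (number of documents / documents containing the term). Several variants of this formula exist, and the function name `tf_idf` and the sample documents are our own assumptions.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Return one {term: score} dict per document."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of this document; idf = ln(N / document frequency).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical, already-prepared documents.
docs = [
    ["annual", "meeting", "registration"],
    ["annual", "membership", "renewal"],
    ["annual", "journal", "submission"],
]
scores = tf_idf(docs)
print(scores[0])
```

Because “annual” appears in every document, its score is zero, while terms unique to one document score highest. That is the sense in which TF-IDF considers the whole population of content.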

Other key foundational processing:

  • Part-of-Speech Tagging: Noun, verb, adjective
  • Named Entity Recognition: Person, place, organization
  • Structure Parsing: Sentence component relationships
  • Synonym Assignment: Discrete list of synonyms
  • Word Embedding: Words converted to numbers

The use of Word Embedding, also referred to as Word Vectors, is particularly interesting.  For example, the word embedding similarity of “question” and “answer” is over 0.93.  This isn’t necessarily intuitive and it is not feasible to manually maintain rules for different term combinations.

A team of researchers at Google created a group of models known as Word2vec that is implemented in development languages including Python, Java, and C.

Here are common analysis techniques:

  • Text Classification: Assignment to pre-defined groups, which generally requires a set of already-classified content
  • Topic Modeling: Derives topics from text content
  • Text Clustering: Separating content into similar groups
  • Sentiment Analysis: Categorizing opinions with measures for positive, negative, and neutral


Finding and Measuring Results

With traditional data queries and interactive visualizations, we generally specify the data we want by selecting values, numeric ranges, or portions of strings.  This is very binary – either the data matches the criteria, or it does not.

We filter and curate text using similarity measures that estimate “distance” between text content.  Examples include point-based Euclidean distance, vector-based cosine distance, and set-based Jaccard similarity.
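For readers who want to see the three measures side by side, here is a minimal sketch in plain Python. The vectors stand in for rows of a document-term matrix and the term sets are hypothetical; cosine distance is taken here as one minus cosine similarity.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points (term-count vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here for all-zero vectors.
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Assumption: distance defined as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

def jaccard_similarity(a, b):
    """Shared terms divided by all distinct terms across both sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # Convention chosen here for two empty sets.
    return len(set_a & set_b) / len(union)

# Hypothetical term-count vectors: doc_b is doc_a repeated twice.
doc_a = [1, 1, 0, 2]
doc_b = [2, 2, 0, 4]
terms_a = {"annual", "meeting", "registration"}
terms_b = {"annual", "meeting", "speaker"}

print(euclidean_distance(doc_a, doc_b))
print(cosine_distance(doc_a, doc_b))
print(jaccard_similarity(terms_a, terms_b))
```

The two vectors are far apart by Euclidean distance but essentially identical by cosine distance, because cosine ignores document length and looks only at the mix of terms.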

Once we identify desired content, how do we measure overall results?  This is referred to as relevance and is made up of measures known as precision and recall.  Precision is the fraction of relevant instances among the retrieved instances, and recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.  The balance between these measures is based on a tradeoff between ensuring all content is included and only including content of interest.  This should be driven by the business scenario.
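The two definitions translate directly into code. In this small sketch the post identifiers are hypothetical, and `precision_recall` is simply our name for the helper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # Retrieved items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: four posts returned, three posts actually relevant.
retrieved = {"post-1", "post-2", "post-3", "post-4"}
relevant = {"post-2", "post-3", "post-5"}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here half of what was returned is relevant (precision), and two of the three relevant posts were found (recall). Returning more posts would tend to raise recall and lower precision.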

This overall approach to text analytics is like that used for recommendation engines based on collaborative filtering driven by preferences of “similar” users and “similar” products.


APIs to the Rescue

Fortunately, there are web-based Application Programming Interfaces (APIs) that we’ve used to help you get started.  Here are online instances from Amazon and IBM for interactive experimenting:

This is a lot of information, but the takeaways are that there are big opportunities for associations to mine their trove of text data and that it is easy to get started using web-based APIs to rapidly provide valuable insights.

Matt Lesnak, VP of Product Development & Technology
Association Analytics

Columns Aren’t Just for Advice and Holding Up Buildings: They Can also Help Your Analytics

Traditional databases, like an Association Management System, are designed to handle frequent transactions and store data. This is very different from dimensional data models that are specifically designed for analysis while aligning with the analytical workflow.

Data, Files, and Blocks

“Cloud computing” is a bit of a misnomer. Data is still stored in files made up of blocks on a computer.
To increase efficiency, databases store entire table rows of data in the same block. For example, all of a customer’s attributes such as name, address, member type, and previous event attendance are stored in a single block for fast retrieval. In this scenario, each “row” represents an individual customer while each “column” represents their different attributes.
If you think about it, most analytics involves aggregating data such as sums, counts, and averages that span many rows. Your exploration might eventually lead you to detailed individual records, but it will likely take several steps to identify these records. This means that if you are looking at, say, average event revenue, the database will need to retrieve entire records from several blocks just to get the revenue field for the eventual calculation. Imagine having to individually navigate many shelves from left to right when you could just quickly create a stack of what you need!

Columns and Rows

Similar to the goal of a dimensional data model, database technologies can further optimize analytics by primarily storing data in columns instead of rows. For this scenario involving average event revenue, the database simply accesses a single block with all of the data for the revenue column across all rows.
These columnar databases significantly improve performance and storage while providing several other key benefits.

  • Data compression: Since columns are generally the same data type, compression methods best suited for the type of data can be applied to reduce needed storage. In addition, aggregations can sometimes be performed directly on compressed data.
  • Advanced analytics: Many of the algorithms underlying advanced analytics leverage vector and matrix structures that are easily populated by single columns of data.
  • Change tracking: Some technologies track changes at the column level, so you can maintain granular history without having to unnecessarily repeat other data.
  • Sparse data storage: For columns that hold valuable but infrequently populated data, such as specific product purchases, traditional database technologies need to maintain “NULL” values, while column-based databases avoid this storage.
  • Efficient distributed processing: Similar to managing file blocks, column-based technologies can distribute data across machines based on column to rapidly process data in parallel.
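A toy illustration of two of these ideas, column layout and compression, is sketched below in plain Python. The customer records are hypothetical, and run-length encoding is just one simple compression method we picked for the example; real columnar databases choose among many.

```python
from itertools import groupby

# Hypothetical row-oriented records: one dict per customer.
rows = [
    {"customer": "A", "member_type": "Regular", "event_revenue": 500.0},
    {"customer": "B", "member_type": "Regular", "event_revenue": 750.0},
    {"customer": "C", "member_type": "Student", "event_revenue": 250.0},
    {"customer": "D", "member_type": "Student", "event_revenue": 300.0},
]

def to_columns(rows):
    """Pivot row records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def run_length_encode(values):
    """Compress consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows)

# The average needs only the revenue column, not the whole records.
revenue = columns["event_revenue"]
average_revenue = sum(revenue) / len(revenue)

# Repetitive columns compress well, and counts can be read
# straight from the compressed form without expanding it.
encoded_types = run_length_encode(columns["member_type"])
type_counts = {}
for value, run in encoded_types:
    type_counts[value] = type_counts.get(value, 0) + run

print(average_revenue, encoded_types, type_counts)
```

The member counts come directly from the run lengths, which is a small-scale version of aggregating on compressed data.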

Potential Options

Examples of columnar database technologies include Apache HBase, Google BigQuery, and Amazon Redshift. HBase is part of the open-source Hadoop ecosystem, BigQuery is a cloud-based service based on technology that served as a precursor to Hadoop, and Amazon Redshift is a cloud-based service that is part of the popular Amazon Web Services (AWS) offering.
Speaking of holding up buildings, our friends at the National Council of Architectural Registration Boards created some great visualizations based on Amazon Redshift using Tableau Public. Analytics tools such as Tableau and Microsoft Power BI offer native connectors to Amazon Redshift and other big data technologies.  These technologies are another way that you can enhance your analytics using data and tools that you already have with cloud services to rapidly make data-guided decisions for your association.

Words with (Association) Friends

Associations define the future through the exploration, analysis, and visualization of data. This generally involves using existing data to consistently describe key business events like event attendance, member engagement, training course popularity and website traffic. We can tell great stories with this data, but actual language can be the best form of communication. There are a lot of opportunities to use text analytics to help associations make even more confident data-guided decisions.

Taming Big Data

Text analytics is often viewed as within the realm of big data. This makes sense as it generally aligns with the volume, velocity, and variety characteristics commonly used to define big data.
Like other forms of data, text can be used to discover structure, meaning and relationships and provide context to other values. In the case of text, the data shows if a word is in content such as documents, comments and social media posts.
Picture a giant spreadsheet with one column for each of the nearly 10,000 commonly used words in the English language. That’s quite a bit of data for even the savviest Excel user or AMS application. Measures represented by the intersections between the rows and columns might include counts of words in documents and how close words are to one another.
Fortunately, several proven methods exist to make text data much more manageable. They include:

  • Removing “stop words” such as “a”, “and”, and “the” that are not likely to be studied.
  • Using frequency thresholds that include counts and how unique words are in the content.
  • Using stemming to group similar words with different suffixes, like “recommend,” “recommended” and “recommending.”
  • Applying the statistical technique of factor analysis that groups words by ideas and themes.
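The frequency-threshold idea in the list above can be sketched as follows. The parameter names `min_docs` and `max_share`, their default values, and the sample documents are assumptions chosen for illustration.

```python
from collections import Counter

def filter_by_document_frequency(tokenized_docs, min_docs=2, max_share=0.9):
    """Keep terms that are neither too rare nor too common across documents."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # Count each term once per document.
    return sorted(
        term for term, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_share
    )

# Hypothetical, already-prepared documents.
docs = [
    ["event", "registration", "hotel"],
    ["event", "registration", "speaker"],
    ["event", "membership", "renewal"],
    ["event", "membership", "dues"],
]
kept = filter_by_document_frequency(docs)
print(kept)
```

“event” is dropped because it appears everywhere and so does not distinguish documents, and the one-off terms are dropped as too rare, leaving a much smaller set of columns.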

We don’t use these techniques just to address the volume challenges posed by big data. More concise data significantly improves the value of all advanced analytics.

Context is key

Text analytics data is much more valuable in conjunction with other internal and external information like index terms in documents, tags assigned to social content, survey questions accompanying free-form text comments, and characteristics of individuals generating content. Seemingly basic categorizations – like comments tagged as high quality by customers or those made by individuals with a high level of engagement – can significantly impact the analysis and help perform predictive analytics against new data.
You can also provide meaning to text through ontologies, which assign relationships similar to association business processes. For example, an “attendee” is associated with an “event.” They can be defined as part of the text analytics process, or obtained from third-party sources.

The usual models

Once our text is structured in a usable and manageable way, we can apply advanced analytics and statistical methods. We use techniques tailored to this form of data, including categorizing and grouping documents and words. These include:

  • Clustering – Grouping things determined to be similar, like words that often occur together or have similar meanings.
  • Classification Trees – Assigning documents to categories based on hierarchical rules. A document with the word “event” might be assigned a more detailed category, like “Detroit” or “annual conference.”
  • Graphs – Showing how variables are interconnected and influence one another. These are part of a broader category and are better used for scenarios such as modeling social networks.

There are two ways to categorize these approaches. A supervised approach means the goal is known, like assigning documents to a list of topics. In an unsupervised study, techniques like clustering are used to find similar documents – but without first identifying specific criteria. As with other types of advanced analytics, the modeling process is iterative and requires some manual validation.
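To show what hierarchical rules look like, here is a tiny hand-written tree based on the “event” example. This is only a sketch: the rules are typed in by hand, whereas a real classification tree would learn them from documents that have already been categorized, which is what makes it a supervised approach.

```python
def classify(tokens):
    """Walk hand-written hierarchical rules and return the category path."""
    words = set(tokens)
    if "event" in words:
        # Second level: refine the broad "event" category.
        if "detroit" in words:
            return ["event", "Detroit"]
        if "annual" in words and "conference" in words:
            return ["event", "annual conference"]
        return ["event"]
    return ["other"]

# Hypothetical comment, lowercased and split on whitespace.
path = classify("the annual conference was a great event".split())
print(path)
```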

What are they saying?

Sentiment analysis of social network content, or looking at positive and negative feelings, is a popular goal of text analytics. Deriving sentiment is more challenging than other applications of text analytics because of nuances in language and difficulty in understanding tone. Many suggested word lists are available to assist.
These two sentences both could indicate a person’s opinion about an event:

  • “I really got a lot of great information from this event!”
  • “There was much more great information presented at the prior events.”

Another potential pitfall of sentiment analysis is from whom the data comes. Are individuals with negative experiences more likely to voice their opinion than those with positive feedback?
Sentiment analysis underscores the importance of making data-guided decisions, as observations should be investigated and measured over time before drawing definitive conclusions.
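A simple word-list scorer shows why tone is hard to capture. The positive and negative lists below are tiny hypothetical samples, not a published lexicon, and the scoring rule (positive matches minus negative matches) is the simplest one we could choose.

```python
import re

# Assumption: small illustrative word lists; published lists are far larger.
POSITIVE = {"great", "good", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "boring", "disappointing"}

def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

first = sentiment_score(
    "I really got a lot of great information from this event!"
)
second = sentiment_score(
    "There was much more great information presented at the prior events."
)
print(first, second)
```

Both sentences receive the same positive score because each contains “great”, even though the second is an unfavorable comparison with earlier events. That gap is exactly the nuance that word lists alone miss.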

Applications for Associations

Associations can gain valuable information from a variety of common business scenarios.

  • Social media and collaborative platforms – Assigning categories and finding other similar comments.
  • Event surveys – Understanding specific feedback beyond discrete questions.
  • Meeting abstracts – Automatically assigning topics.
  • Document similarity – Recommending similar documents and identifying expertise.
  • Customer bios – Identifying individual areas of expertise.
  • Customer service contacts – Interpreting the reason for the contact.

A range of enterprise and other software tools, including the popular (and free) R programming language, are available to implement text analytics.  You can also visualize the results of text analytics using leading tools such as Tableau to create visualizations such as heat maps, document clusters, and word clouds.
Your association analytics can include true customer conversations and engagement detail available from text using these approaches and tools that are part of our proven 5 step methodology.

Moving Into a Data Guided Culture Means Abandoning the 35mm Camera Mentality

Some of you might have seen our CEO, Debbie King, speaking at ASAE Annual in Detroit this week.  If not, you missed an inspiring presentation about building data analytics into your strategic plans and investing in a data guided culture.  Debbie was joined by Frank Krause, Chief Operating Officer of the American Geophysical Union (AGU), who offered brilliant insight and practical advice on making sure that association analytics are informative and actionable, not merely interesting.  At one point the evolution from static reports to data visualizations is compared to advances from film to digital photography.  This is a fitting analogy that makes the benefits hard to deny.
Think back to when you packed a bag for vacation and you took a 35 mm camera and 5 rolls of film.  Wait, that might not be enough, better bring 7.  You had to stop and think because you knew for a fact every roll is 36 shots, maybe 37 if you get the first 2 threads to catch.  Then you get to the end of the trip and you’ve loaded up the last roll and you want to make sure those shots count.  As a result, you realize you could miss a priceless moment.  So you consider the high cost of stopping at a local shop to pick up some more rolls of film, but then you think of the cost to develop them later. You decide to take a chance that you’ll remember to capture these special moments.
Believe it or not, there are a lot of similarities between film and having a traditional static report built for your association’s analysis needs.  In both cases you have the initial investment cost, time waiting on either IT for your report or the drugstore to process the film, and then determining what you’ll do with the output.  You can create a beautiful picture album to share with all your friends or you can toss the pictures in a shoebox.  The report will have actionable information like showing you event registration numbers are down over the past two years, but will it be too late by the time you discover the information?
And this is where we come to the digital era of instant gratification and why it can be advantageous.  First the digital camera, then nearly equal or better power in your smart phone.  Do you think twice before you turn on burst mode and capture 16 pictures in 3 seconds?  Get two or three bad ones, simply delete them and take more.  There is no upfront investment or limit to worry about and you could go so far as to say it is iterative; this is like an agile data analytics initiative.  You are able to build and rebuild a data visualization on the fly.  Sure, you can have instant access to ask and answer new business questions.  You can even test and experiment on the fly, something you’d never try with various lighting and a film camera.  Building and editing data visualizations is at your fingertips.
The bottom line is that everyone wants and expects answers quickly. This allows us to maintain our connection to ideas and helps focus our thoughts to ask better questions.  So it’s time to put away your 35 mm camera and shift your focus to a digital data guided culture.

It Doesn’t Have to be Big Data to be Important!

Everyone is talking about BIG DATA!  It sounds so important and exciting that the phrase is now used commonly. From small associations to large businesses, even socially and on Prime Time TV advertising we hear the buzzwords!  These are very exciting times for data analysts because on the continuum of data analytics, big data is very advanced in many ways but basic in others.  To learn more about the definition of Big Data – see our post: What’s the Big Deal about Big Data for Associations?
For the data analyst, working with Big Data involves the following:

  • Business intelligence (BI) and business analytics (BA) involve important processes for analyzing data across all functional areas in order to guide:
    • Decision-making
    • Strategy development
    • New opportunity creation
  • Business intelligence (BI) involves turning data into information and then using dashboards and scorecards to present.
  • Business analytics (BA) creates value and transforms information into knowledge using statistical methods for explanatory and predictive modeling.
  • Turning data into information requires:
    • Technical infrastructure
    • Data collection tools
    • Mining and analytical software
    • Data visualization

Where’s the data in an association?

  • In CRM, AMS, LMS and other software systems
  • In spreadsheets
  • In the cloud
  • On the Internet
  • In data warehouses
  • Data aggregators and third party vendors

Analytical Process

  • The project and amount of data available determine the statistical methods used:
    • “Big Data” requires a test and learn protocol and this can be expensive. Many different techniques exist depending on the type of data and desired outcome.  Often it’s best to start with your “small data” and augment it with big data, such as census or social media data.
    • Small data should be collected from association systems and often individual spreadsheets.
    • Data should be cleaned and merged at some level before beginning the analytical process.
    • Purchased data from data aggregators and other third-party systems should be integrated into a working data file or data mart.

Associations can be data-guided and not be using big data yet.  Many associations are starting with basic aspects of data management, such as cleaning, verifying or reconciling data across the organization.

What’s the Big Deal about Big Data for Associations?

The term “Big Data” has become the latest media darling and with all the hype about it, it’s no wonder people are confused about what it means.  When I am asked to give presentations about big data, what I find is the most common misconception is that “big data” means “all data”.
The data science purist describes “big data” as consisting of both structured data (such as relational databases) and unstructured data (text, video, audio files, etc.).  They would describe big data as data generated at a high velocity, amounting to an enormous volume, which is stored in a variety of formats.  IBM is famous for adding a 4th “v” to the description, which is data whose veracity is in question.
But the most important thing to understand about “big data” is that in the future it can and will be used to answer business questions for associations based on the actual behavior of individuals (and organizations), in a way that surveys, focus groups or traditional reporting never could.
New streams of data are being generated at an unprecedented rate and existing structured data sets can be combined in novel ways with Big Data analytics to uncover connections that will enable associations to not only remain viable but to thrive in the coming decades.  This is the definition of Big Data that matters more than the three “Vs” which are just constructs.
One thing I know is that data comes in a lot faster than analysis!  And with Big Data it is essential to be able to visualize the patterns and extract the meaning, because without visualizations it is impossible to absorb the raw data and draw meaningful conclusions.  We find Tableau does a great job handling Big Data visualizations.
The tipping point for Big Data analytics for associations is when business users have access to tools that let them combine what they are good at (asking the right questions and interpreting the results) with what machines are good at: computation, analysis, and statistics using large datasets.
Big Data is changing everything we thought we knew about information in the way the internet and mobile devices are changing everything we thought we knew about how people work, play and live.