Archive for R

Tableau’s R Integration

Ever wondered how data scientists and data analysts use Tableau for predictive analytics? The ability to integrate R into Tableau is a powerful feature, but even for those familiar with R, it can be tricky to get started. Here’s how to set up the R integration.

Step 1. Set Up R on Your Computer

First, you will need R and a user interface for it on your computer. We recommend RStudio Desktop.

Step 2. Install Rserve Package

Next, you will need to install the Rserve package. To do this in RStudio, click on Packages -> Install. Then, type in Rserve and it will find the package for you to install. Alternatively, run install.packages("Rserve") in the R console.

Step 3. Set Up Rserve Connection

Now you will need to run the following code to start up the Rserve connection:
library(Rserve)
Rserve()

Step 4. Set Up the External Connection in Tableau

There is one more thing to do before writing R in Tableau, and for this you will need to switch over to Tableau. Tableau needs the external connection set up in order to run R. Go to Help -> Settings and Performance -> Manage External Connections.
In the pop-up, type in localhost for the Server name. Click on Test Connection to verify it is now connected.

Step 5. Start Using R Integration

At this point, we can start taking advantage of the R integration. The integration uses calculated fields to pass R code. There are four types of calculations used in the R integration:

  1. SCRIPT_BOOL
  2. SCRIPT_INT
  3. SCRIPT_REAL
  4. SCRIPT_STR

Which one you use depends on the type of value you expect your R code to return. Use SCRIPT_BOOL for a TRUE/FALSE value, SCRIPT_INT for an integer, SCRIPT_REAL for a numeric (decimal) value, and SCRIPT_STR for a string.
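As a rough illustration of how R result types line up with the four functions, here is a small sketch in plain R. The sample values are made up, and the pairing shown is a rule of thumb rather than Tableau's exact conversion rules.

```r
# Illustrative R values and the SCRIPT_ function that matches each result type.
results <- list(
  SCRIPT_BOOL = c(10, -5) > 0,           # logical: TRUE / FALSE
  SCRIPT_INT  = length(c(10, -5, 3)),    # integer count
  SCRIPT_REAL = mean(c(10, -5, 3)),      # numeric (double)
  SCRIPT_STR  = paste("Profit", "tier")  # character string
)

# Show the R class behind each result.
print(sapply(results, class))
```

Checking class() on a result in the R console before wiring it into Tableau is a quick way to pick the right function.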
The basic setup of any R calculated field is as follows:
SCRIPT_REAL (
"R code",
Tableau fields being passed in
)
The R code is enclosed in quotation marks, and the parentheses enclose both the R code and any Tableau measures or dimensions used inside it. You can pass in multiple Tableau fields; just separate the field names with commas.
There are two important things to know. First, inside the R code you do not use the Tableau field names. Instead, you refer to the fields by position as .arg1, .arg2, and so on. Second, you cannot mix aggregate and non-aggregate arguments. Here is an example below.
[Screenshot: SCRIPT_BOOL calculated field example]
Within my R code, I would need to refer to SUM([Profit]) as .arg1 and ATTR([Department]) as .arg2. Also, I made Department an attribute (ATTR) so that it is aggregated and can be used alongside SUM([Profit]).
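To make the .arg convention concrete, here is a minimal sketch in plain R that mimics what the R session receives. The sample values and the test itself are assumptions for illustration, not the calculation from the original example. Tableau sends each field as a vector, so the expression works on whole vectors at once.

```r
# Hypothetical values standing in for what Tableau would send:
# .arg1 plays the role of SUM([Profit]), .arg2 of ATTR([Department]).
.arg1 <- c(120.5, -30.0, 48.2)
.arg2 <- c("Furniture", "Technology", "Office Supplies")

# The R code inside a SCRIPT_BOOL field sees only .arg1 and .arg2,
# never the Tableau field names. An illustrative calculated field:
#   SCRIPT_BOOL(".arg1 > 0 & .arg2 != 'Technology'",
#               SUM([Profit]), ATTR([Department]))
is_flagged <- .arg1 > 0 & .arg2 != "Technology"
print(is_flagged)
```

Trying the expression in the R console with made-up vectors like these is an easy way to debug it before pasting it into a calculated field.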

Example of R and Tableau in Action

Now that you have the basics of the calculated field, here’s a real-life example using the Superstore dataset. We’ll be looking at the correlation between Profit and Discount. The returned value will be numeric, so I will be using SCRIPT_REAL.
[Screenshot: SCRIPT_REAL calculated field]
Now, use that field to visualize the correlation coefficient between Profit and Discount across Customer Segment and Supplier. A value close to -1 indicates a negative linear relationship between the variables. A value close to +1 indicates a positive linear relationship between the variables.
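The calculated field itself is not reproduced in this text, so here is a sketch of the kind of expression that fits this description. It assumes R’s built-in cor() function, and the numbers are made up rather than taken from Superstore.

```r
# Made-up values standing in for SUM([Profit]) and SUM([Discount]).
.arg1 <- c(200, 150, 90, 40, -10)         # profit
.arg2 <- c(0.00, 0.05, 0.10, 0.20, 0.30)  # discount

# Assumed form of the calculated field (not taken from the original screenshot):
#   SCRIPT_REAL("cor(.arg1, .arg2)", SUM([Profit]), SUM([Discount]))
r <- cor(.arg1, .arg2)  # Pearson correlation by default
print(r)
```

With these invented numbers, profit falls as discount rises, so the coefficient comes out negative.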
[Figure: correlation matrix]
This is just a starting point for using the R integration. Hopefully, it will help you get started at your own association. If you need help developing predictive models or using R, contact us.

How to Harness the Power of Recommendation

Taking a customer-focused approach to data analytics helps provide optimal value, enhance engagement and understand the overall customer journey. Individuals’ actions provide valuable information that goes further than what is collected with surveys and online profiles. Additionally, actions uncover hidden patterns that can be used to build a recommendation system to guide customers toward other interests.
Here are the most common approaches to creating recommendation systems:

  • Collaborative filtering: Based on data about similar users or similar items. It includes these techniques:
    • Item-based: Recommends items that are most similar to the user’s activity
    • User-based: Recommends items that are liked by similar users
  • Content-based filtering: Makes suggestions based on user profiles and similar item characteristics
  • Hybrid filtering: Combines different techniques

Recommendation system results resemble the suggestions on sites that recommend products and people, like Amazon and LinkedIn. Collaborative filtering leads to more of a self-learning process, since it is based entirely on actual activity and not on data provided by users. There are scenarios where the other approaches are more appropriate, which we’ll address soon.
Similarity between users or items is measured by “distance” calculations from those long-ago geometry and trigonometry classes. You can use the results with a visualization tool such as Tableau, creating a similarity matrix and quickly identifying relationships.
[Figure: correlation matrix]
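One common “distance” style measure is cosine similarity. The sketch below computes it between items in base R; the activity matrix is invented, and cosine similarity is only one of several measures you might choose.

```r
# Made-up 0/1 activity matrix: rows are members, columns are items.
activity <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("member", 1:4),
                  c("webinar", "journal", "conference"))
)

# Cosine similarity between item columns: dot product divided by
# the product of the column lengths (Euclidean norms).
dot_products <- crossprod(activity)   # t(activity) %*% activity
norms <- sqrt(diag(dot_products))
item_similarity <- dot_products / outer(norms, norms)

print(round(item_similarity, 2))
```

The result is a square item-by-item matrix that could be exported and drawn as a heat map in a visualization tool.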
It is sometimes helpful to group individuals and items into categories, which can be done by combining similarity scoring with data mining techniques like cluster analysis and decision trees.
Recommendation systems generally require data structured by columns instead of the row-based data that is best for interactive data discovery. Similar to text analytics, the items themselves (meetings, publications, donations, and content) become a large number of columns. This column-based structure is what specialized R packages expect for the recommendation system features described in this book.
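Here is a minimal sketch of that reshaping step in base R, turning a row-based log into a member-by-item matrix. The data frame and its column names are hypothetical.

```r
# Made-up transaction log in row-based ("long") form.
purchases <- data.frame(
  member = c("A", "A", "B", "C", "C", "C"),
  item   = c("webinar", "journal", "journal",
             "webinar", "journal", "conference")
)

# Cross-tabulate into a member-by-item matrix: one column per item.
member_item <- unclass(table(purchases$member, purchases$item))
print(member_item)
```

Each cell counts how many times a member interacted with an item, and zero means no recorded activity.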
These algorithms generally need binary values, like a “yes” if someone purchased an item and “no” if they did not. But if users can rate items on a scale of 1-5, what does a score of 3 mean? Normalizing scores based on individual and overall ratings is a good way to answer this question.
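A sketch of one way to normalize ratings follows. It centers each member’s scores on that member’s own average only; adjusting for overall item ratings would be a further step. The ratings and the “above own average” rule for the binary value are assumptions for illustration.

```r
# Made-up ratings on a 1-5 scale; NA means the member never rated the item.
ratings <- matrix(
  c( 5, 3, NA,
     2, 3,  1,
    NA, 3,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("webinar", "journal", "conference"))
)

# Center each member's ratings on that member's own average, so a 3 from
# a tough grader and a 3 from an easy grader are no longer treated alike.
member_means <- rowMeans(ratings, na.rm = TRUE)
centered <- sweep(ratings, 1, member_means)

# One possible binary encoding (an assumption): "yes" when a rating is
# above the member's own average, "no" otherwise, including unrated items.
liked <- !is.na(centered) & centered > 0
print(centered)
print(liked)
```

In this toy data, all three members gave the journal a 3, yet it is below average for member A and above average for member B.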
The data requirements are really not as onerous as they may sound. Once data is in the right format for the R analysis tools, your imagination can take over to drive actionable association analytics. Content-based filtering works well for new users, and a hybrid approach can help prevent a “filter bubble” where some people get a too-narrow set of interests from similar recommendations.
Data from meeting registrations, membership history, donations, publication purchases, content interaction, web navigation, survey responses and profile characteristics can be used to guide association customers. Additionally, recommendations can bring people with common interests together. This new insight can be used to enhance all customer interactions, ranging from email marketing to dynamic website presentation to event sessions.

A Beginner’s Guide to Analysis with R

Many associations want to do more advanced analytics projects using R — a programming language used for statistics — but are not sure how to start.
Before starting this kind of analysis, you need to define the goal. It is best to make this a S.M.A.R.T. goal, which means it is Specific, Measurable, Attainable, Relevant, and Time-Bound.
The detailed S.M.A.R.T. goal will become your dependent variable, which is what you are trying to measure in your analysis. Here’s what a transformed basic goal looks like:
Basic goal: Increase membership retention.
S.M.A.R.T. goal: Determine what program changes will increase next year’s membership retention for first-year members by 10 percent, compared to the two previous years.
After defining the dependent variable, you need to determine the independent variables you are measuring. These are the factors you think may be influencing whether you reach your detailed goal. In this type of analysis, you will have multiple independent variables. In fact, the more independent variables, the better.
As you analyze the data, you will be able to narrow down the independent variables to those that have the highest impact on your goal. For example:

  • Dependent variable
    • Renewal (Did the member renew or not?)
  • Independent variables
    • Participation in chapter events
    • Is the member at a university
    • Participation in committees
    • Gender
    • Age
    • Workplace type and size
    • Location
    • Number and type of events attended

After determining your goal and what may be influencing it, you need to figure out what pool of data you will examine to look for answers. For our example, we would need to start with first-year members who could renew.
However, you may need to filter your data more. For example, if you know that there was a huge change in the renewal process in the middle of the year, you may want to remove people who joined before then. Or maybe you have free memberships that automatically renew each year, so these people should not be included in your pool.
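As a sketch of this filtering step, the snippet below keeps only paid members who joined after an assumed process change. The data frame, column names, and cutoff date are all hypothetical.

```r
# Hypothetical member records; column names and dates are invented.
members <- data.frame(
  id          = 1:5,
  join_date   = as.Date(c("2023-02-10", "2023-05-01", "2023-08-15",
                          "2023-09-30", "2023-11-20")),
  member_type = c("paid", "free", "paid", "paid", "free")
)

# Assumed date on which the renewal process changed.
process_change <- as.Date("2023-07-01")

# Keep paid members who joined on or after the change.
pool <- subset(members, join_date >= process_change & member_type == "paid")
print(pool)
```

Writing the filter as code also documents exactly who was excluded, which helps when you revisit the analysis later.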
Later in the blog, we will talk about preparing your data and how to run and interpret descriptive statistics in R.