Using R and JavaScript - Displayr https://www.displayr.com/category/r/ Displayr is the only BI tool for survey data. Mon, 28 Jun 2021 03:26:01 +0000
Using R in Displayr Video Series https://www.displayr.com/using-r-in-displayr-video-series/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/using-r-in-displayr-video-series/#respond Mon, 08 Jun 2020 22:10:20 +0000 https://www.displayr.com/?p=23882 ...]]> R is one of the most powerful languages for analyzing data. It's used by millions of people across the globe and is free. Here at Displayr, we've integrated R with our software so that those with custom requirements or analysis needs can implement them alongside our standard features. The result is a one-stop shop for point-and-click features as well as more advanced custom coding. For those who have never coded before, or who are not familiar with R, getting up to speed may feel like a daunting task. For this reason, we've created a series of videos that introduce you to coding in R and walk through practical examples of how to use R to further customize your reporting and dashboards.

Links to the videos and the documents they review are below. If you're using our sister software Q, you can download the QPack version to follow along. The videos generally start with the basics and move on to more advanced topics.

Name / Content / Link to Displayr Document / Link to Video

Overview
  • How does R work with Displayr
  • How do I get help with R?
  • Other tips?
  Displayr doc, QPack

Primer
  • Referencing Data
  • Data Types
  • Data Structures
  • Functions
  Same document as Overview above

Simple Tables
  • Table subsetting/indexing
  • Combining tables
  • Table calculations
  • Sorting/ordering
  • Renaming rows/columns
  • Blanking cells with small values
  • Removing rows/cols with small samples
  • Renaming things and formatting
  • Building a brand funnel
  Displayr doc, QPack

R Variables
  • Creating a combo box filter simple & advanced
  • Filtering and deleting observations
  • Banding and re-categorizing variables
  • Checking if "any of" some variables have a particular value
  • Splitting and combining text strings
  • Using apply() to apply an action to each row or column
  Displayr doc, QPack

Custom R Outputs
  • Exploring outputs
  • Error handling
  • Updating/customizing text
  • Logos and links
  Displayr doc, QPack

Advanced Tables
  • Working with nested banners
  • Merging tables that don't match
  • Customizing cell formatting
  • Adding spans
  • Adding statistical test results
  Displayr doc, QPack

Troubleshooting
  • Tips
  • Useful functions
  • Common errors/examples
  Displayr doc, QPack

]]>
https://www.displayr.com/using-r-in-displayr-video-series/feed/ 0
How to use the Displayr Cloud Drive https://www.displayr.com/how-to-use-the-displayr-cloud-drive/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-use-the-displayr-cloud-drive/#respond Fri, 08 May 2020 05:36:28 +0000 https://www.displayr.com/?p=23271 ...]]> What can be saved to the Displayr Cloud Drive?

Displayr's Cloud Drive can be used for saving a variety of files. These files include images, company logos and client data sets. You can even share raw data files or R tables and charts between your company documents.

Once the Cloud Drive has been enabled on your account (contact support@displayr.com to enquire), you can access it from your document via your Profile icon > Displayr cloud drive.

You will be presented with a table that lists all the files stored within your company's cloud drive. The table includes audit information, such as when each file was last modified and which company document last updated or called it.

Saving to the Cloud Drive

To upload files to the Cloud Drive, simply click the Upload button and select the files you wish to upload. In this example, we have uploaded a logo and a data set that we want to use in our document.

Loading from the Cloud Drive

Open a document, then load the data by clicking New Data Set > Displayr Cloud Drive and choosing the data file we uploaded.

At the top of the Select a data file screen, there is a setting for automatically refreshing the data set. With this enabled, whenever you update the data file, the document's data updates automatically. In this example, the automatic refresh interval has been set to 12 hours.

You will now be able to see the data file variables under Data Sets in the bottom left corner.

If you wish to manually update the data ahead of the automatic refresh, simply click the data set folder and then click Update > Displayr Cloud Drive > OK.

Next, we can add the saved logo to the first page via Insert > Image > Displayr Cloud Drive, choosing the previously uploaded image file.

Sharing R outputs between documents

If you have any tables or visualizations (created via Insert > Visualization) which you want to share with other documents, you can do this easily by selecting the output, clicking Export > Displayr Cloud Drive, naming your file and then pressing Export.

Tables are saved as R files (*.rds) and visualizations are saved as R-rendered HTML widgets without the underlying data included.

Here, we have saved the income table as an R output called test.

Connecting to the Cloud Drive using R code

An alternative method of exporting to the Cloud Drive is to use R code directly in an R output (via Insert > R Output). Here, we will use the QSaveData function from the flipAPI package:

library(flipAPI)
# Save the table named table.Income to the Cloud Drive as test.rds
QSaveData(table.Income, "test.rds")

If you wish to then import this, or any other Cloud Drive file, into any document in your account, you can use the corresponding QLoadData function:

QLoadData("test.rds")

This process also works with .csv files. You just need to specify the correct extension in the file name passed to the function. For further information please see the Displayr Cloud Drive R API documentation.

In order to create a workflow that automatically imports and exports updated files, you can additionally add a flipTime function such as UpdateEvery or UpdateAt to set a timer. Below we have set it to run every 3 hours:

library(flipTime)
UpdateEvery(3, "hours", options = "wakeup")

You can find further information on automatic updating here.

 

]]>
https://www.displayr.com/how-to-use-the-displayr-cloud-drive/feed/ 0
How to Customize the Sample Size Description Widget https://www.displayr.com/how-to-customize-the-sample-size-description-widget/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-customize-the-sample-size-description-widget/#respond Tue, 18 Feb 2020 20:57:22 +0000 https://www.displayr.com/?p=20557 ...]]> Displayr has a built-in Sample Size Description widget (under Insert > More > Data > Sample Size Description) that you can use to describe the data being displayed, as outlined in this post. But what if the default text isn't quite what you want? This post explains how to easily customize the text to your liking. You will see how to change, reorder, and remove elements of the description, as well as modify it to reference filters selected in a combo box.

Breaking down the fields

The text in the sample size description output has 5 parts: Initial text, Sample description, Sample size description, Sample size, and Final text.

You can use the Object Inspector (as seen below) to customize some aspects of the Sample Size Description output: Initial text, Sample size description, and Final text. If your data is not filtered, the text in the Total sample description field will be shown for Sample description.

The other parts of the output, Sample description and Sample size, are determined in the underlying R code of the output. The Sample size field displays the number of cases specified by the Complete data variable, including any filters applied. The Sample description field displays the name or names of any filters applied or the text in the Total sample description field.

Changing the basic fields in the Object Inspector allows for some customization. For deeper customization, you need to edit the R code.

Deeper Customization with R

Basic edits to the sample size description text output using R aren't as difficult as they sound. Changes such as reordering or removing the fields involve editing a single line of code.

To view the R code behind the sample size description widget, go to Properties > R CODE in the Object Inspector. The important line of code that controls the output text is the last line: paste0(formInitial, base, formN, n, formFinal). Each of the fields inside the parentheses in the code corresponds to one of the text fields of the sample size description.

  1. Reordering the text: To swap the order of the fields, change the order of the text inside the paste0() function, such as: paste0(formN, n, formInitial, base, formFinal)
  2. Removing text: To remove a field, delete it from inside the parentheses. The following example removes "Base: total sample;" from the output: paste0(formN, n, formFinal)
  3. Adding custom text: To add custom text to the description, add it to the function in quotation marks, like so: paste0(formInitial, n, " respondents ", base).

Advanced Customization: Dynamic updating with Combo or List Boxes

If an R variable filter is connected to a combo box or list box, the Sample Size Description output updates as the filters change, but the underlying R code needs further editing in order to show the actual selections in the control. To learn how to connect a filter to a combo or list box, please see this blog post. Like other charts or visualizations, the Sample Size Description must first be connected to the filter used in the combo box. In the Object Inspector of the Sample Size Description, select the same filter variable used in the combo box under Inputs > FILTERS & WEIGHT > Filter(s).

 

 

After selecting the filter variable, the text updates to reflect the new Sample size. However, it displays only the name of the combo box filter (Gender), not whether Male or Female is selected. To enable that, we need to edit the R code.

To make the sample description field react to the combo box selection, change the first line of code from base <- attr(QFilter, "label") to base <- toString(Combo.box) where "Combo.box" is the name of your combo box or control used with your filter variable. That small change means the sample description field will update as the selections in the combo box change. The text will update to include all selections in the combo box, so if the combo box allows multiple selections, the text may become rather large.

If all the filter categories are selected, the output will list every included category rather than the text in the Total sample description field. To show the Total sample description text in that case, we need to make a few more edits to the R code. Replace base <- attr(QFilter, "label") with the code below, changing Combo.box to the name of your combo box control and d3 to the name of the variable set the combo box is based on.

# Number of categories available in the underlying variable
available.items <- nlevels(d3)
# Number of categories currently selected in the combo box
selected.items <- length(Combo.box)
all.selected <- selected.items == available.items
base <- toString(Combo.box)

If you are using multiple response data in your filter, use the R code below. It counts the possible selections in the question and excludes the NET.

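# Number of selectable columns in the multiple response question, excluding NET
available.items <- ncol(subset(d3, select = -c(NET)))
# Number of categories currently selected in the combo box
selected.items <- length(Combo.box)
all.selected <- selected.items == available.items
base <- toString(Combo.box)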

This code compares the number of possible selections from the underlying question to the number of selections in the combo box. If they're the same, it stores TRUE in all.selected. Then, replace base in the final paste0() line with ifelse(all.selected, formTotalSample, base).

Rather than showing just the selections in the combo box, if the number of selections matches the number of variables in the underlying question, the widget will display the text in the Total sample description field just as it does when not using a filter connected to a combo box.

]]>
https://www.displayr.com/how-to-customize-the-sample-size-description-widget/feed/ 0
Computing Willingness-To-Pay (WTP) in Displayr https://www.displayr.com/computing-willingness-to-pay-wtp-in-displayr/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/computing-willingness-to-pay-wtp-in-displayr/#respond Thu, 30 Jan 2020 15:06:15 +0000 https://www.displayr.com/?p=21705 ...]]> This post explains the basics of computing willingness-to-pay (WTP) for product features in Displayr.

Step 1: Estimate a choice model with a numeric price attribute

The starting point is to estimate a choice model (Displayr: Insert > More > Conjoint/Choice Modeling > Hierarchical Bayes; Q: Automate > Browse Online Library > Conjoint/Choice Modeling > Hierarchical Bayes). When doing this, the price attribute needs to be set up as a numeric attribute. If you haven't done this before, be aware that the scale of the price attribute is not readily comparable to that of the other attributes. In the example below, note that the price attribute seems to have very little variability compared to the other attributes. This is because, for a numeric attribute, the distribution shown is that of its coefficient (don't be concerned if you don't understand this; the key point is that it is OK for its distribution to appear much narrower).

Step 2: Save the utilities

Add new variables to the data set using Insert > More > Conjoint/Choice Modeling > Save Variable(s) > Individual-level Coefficients (in Q: Automate > Browse Online Library > Conjoint/Choice Modeling > Save Variable(s) > Individual-level Coefficients).

Step 3: Modify the R code of the utilities

When you click on one of the variables created in step 2, you can see the underlying R code, which will look something like this (in Q, right-click on the variable and select Edit R Variable):

input.choicemodel = choice.model
if (!is.null(input.choicemodel$simulated.respondent.parameters)) stop()
flipChoice::RespondentParameters(input.choicemodel)

It can be changed to compute WTP with a simple modification of the last line and addition of a fourth line:

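input.choicemodel = choice.model
if (!is.null(input.choicemodel$simulated.respondent.parameters)) stop()
# One row per respondent, one column per coefficient
x = flipChoice::RespondentParameters(input.choicemodel)
# Divide each respondent's coefficients by the negative of their Price coefficient
sweep(x, 1, -x[, "Price"], "/")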
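
In symbols, the last line can be read as follows. This is only a sketch of what the sweep call computes, assuming x has one row per respondent and one column per coefficient; the symbols \beta, i, and k are introduced here for illustration and do not appear in the Displayr code.

```latex
\mathrm{WTP}_{ik} \;=\; \frac{\beta_{ik}}{-\,\beta_{i,\mathrm{Price}}}
```

Here \beta_{ik} is respondent i's coefficient for attribute level k and \beta_{i,\mathrm{Price}} is the same respondent's price coefficient. For the Price column itself the result is always -1, so that column carries no information.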

Step 4: Creating tables or visualizations

To create a table summarizing WTP for each attribute level, drag the variable set onto a page, and then using STATISTICS > Cells select Median and remove Average (as the mean can be a bit misleading with WTP data). Then, hide the Price attribute by selecting the row and using Data Manipulation > Hide in the ribbon. An example is shown below. You can then plot this if you wish.

]]>
https://www.displayr.com/computing-willingness-to-pay-wtp-in-displayr/feed/ 0
Creating and Working with JavaScript Variables https://www.displayr.com/creating-and-working-with-javascript-variables/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/creating-and-working-with-javascript-variables/#respond Mon, 16 Dec 2019 16:23:31 +0000 https://www.displayr.com/?p=21118 ...]]> It’s easier than you think to use basic JavaScript to create new variables! The following worked example in Displayr shows you how to combine, split, and transform data into new variables using conditional ‘if’ statements and Boolean logic (“and”, “or”).

When to create JavaScript variables instead of R variables in Displayr

Just like R variables, JavaScript variables are updated automatically in the project, and you can easily go back and edit them if you want to tweak the code. There are two key differences between JavaScript and R to keep in mind. JavaScript code runs in your browser and was originally designed as a scripting language for websites. R code requires data to be packaged up and processed on our R servers, and R was designed specifically for basic as well as advanced analysis. Usually there is no noticeable difference between creating a basic variable in JavaScript and in R, but if your data is quite large (think thousands of cases) and you're doing a simple manipulation on it, JavaScript may be faster. If your new variable requires more advanced calculations and coding, it may be easier, or only possible, to create it using R.

How to create JavaScript variables in Displayr

Create a JavaScript variable via the Ribbon in Displayr (Insert > JavaScript > Numeric Variable). You have a choice of creating a Numeric or Text variable. For simplicity, let’s consider only numeric variables for now.

A new variable will appear in the Data Sets panel, and the JavaScript expression can be entered into the JAVASCRIPT CODE box in the Object Inspector. An example is shown below:

You need to write (or paste) a JavaScript expression using variable Names (not labels). Hovering over a variable in the Data Sets panel will show the variable name. You can also click-and-drag the variable into the text box to paste the variable name.

A worked example

In our example, we have collected consumption data on six different sub-types of cola. This forms six different numeric variables. The following JavaScript expression finds the sum of the variables with JavaScript: q2a_1 + q2a_2 + q2a_3 + q2a_4 + q2a_5 + q2a_6.

These are referred to as the input variables. You can see a preview of the new variable beside the JavaScript Code box and check that your expression is working correctly.

You can manually check the first few lines to confirm that your JavaScript expression works as expected.

Banding a numeric variable

Suppose the expression in the above example resulted in a new numeric variable that we called “sumCola”. Now we want to create an additional variable that allocates people to one of three categories based on their total cola consumption (Light Drinkers – fewer than 5 colas, Medium Drinkers – between 5 and 10 colas, and Heavy Drinkers – more than 10 colas).

The if command allows you to do this:

if (sumCola < 5) 1;
else if (sumCola >= 5 && sumCola <= 10) 2;
else if (sumCola > 10) 3;
else NaN;

 

The code can be interpreted as follows:

if (sumCola < 5) 1;

If the variable sumCola is less than 5, return a value of 1.

else if (sumCola >= 5 && sumCola <= 10) 2;

If the first expression above does not apply, and if the variable sumCola is greater than or equal to 5 and (represented by &&) less than or equal to 10 (i.e., falls between 5 and 10), return a value of 2.

else if (sumCola > 10) 3;

If the first two expressions above do not apply, and if the variable sumCola is greater than 10, return a value of 3.

else NaN;

Consider any other values as missing. NaN stands for Not-A-Number (i.e., missing or blank).
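
To check the banding logic outside Displayr, the same rules can be wrapped in an ordinary function and run against a few boundary values. This is a minimal sketch in plain JavaScript; the function name bandColaConsumption and the test values are hypothetical, and a Displayr JavaScript variable would use the expression form shown above rather than a function with return statements.

```javascript
// Hypothetical helper that mirrors the banding expression above.
function bandColaConsumption(sumCola) {
  // Anything that is not a usable number is treated as missing.
  if (typeof sumCola !== "number" || Number.isNaN(sumCola)) return NaN;
  if (sumCola < 5) return 1;   // Light Drinkers: fewer than 5 colas
  if (sumCola <= 10) return 2; // Medium Drinkers: 5 to 10 colas inclusive
  return 3;                    // Heavy Drinkers: more than 10 colas
}

// Boundary values are the ones most likely to expose a wrong comparison.
const exampleTotals = [0, 4.5, 5, 10, 10.5, NaN];
const bands = exampleTotals.map(bandColaConsumption);
console.log(bands);
```

Testing the values 5 and 10 explicitly is worthwhile because a swapped comparison operator typically shows up exactly at the category boundaries.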

The newly created variable will have the values of 1, 2, or 3 (and perhaps NaN). Because we want this to act as a categorical variable, we have to add two extra steps. Click on the variable and make the following changes in the Object Inspector:

  1. Change the Structure to Nominal: Mutually exclusive categories.
  2. Click the Labels button and give the variable more descriptive labels (e.g., 1 = “Light Drinker”, 2 = “Medium Drinker” and 3 =  “Heavy Drinker”).

You can create the same bandings by merging columns in a table. However, when you import new data, new values may need to be manually merged into one of the three categories. The beauty of using the JavaScript variable is that it automatically allocates new results to the appropriate category.

Instead of having to create a separate sumCola variable just to be used in your JavaScript variable, you can also combine the two into a single variable. This can be done by creating variables within the JavaScript variable code. In the following example, we create a new variable within the JavaScript variable called x, and use it in our if-else logic:

var x = q2a_1 + q2a_2 + q2a_3 + q2a_4 + q2a_5 + q2a_6;
if (x < 5) 1;
else if (x >= 5 && x <= 10) 2;
else if (x > 10) 3;
else NaN;

Combining different weighting variables

In the above examples, the JavaScript variables are returning constant values. But JavaScript variables can return the values of other variables as well. A common application is the need to combine the weights for various segments into one variable.

Consider a situation where we’ve collected data for four countries: UK, France, Australia, and Japan. The data for each country was held in four separate data files and merged into a final file. Membership of a particular segment (market) is defined by a variable called country. Each country’s data file had a specific weighting variable, so we now have four different weighting variables in the merged data file (weight_UK, weight_FR, weight_AU, weight_JP). When we do our analysis, we only want to deal with one weighting variable. So let’s combine them using JavaScript:

 
if (country == 1) weight_UK; 
else if (country == 3) weight_FR;  
else if (country == 7) weight_AU;
else if (country == 4) weight_JP; 
else NaN; 

 

You’ll notice that there are two consecutive equals signs in the above expression. You need both of them because a single equals sign has a different role in JavaScript. In short, a single equals sign is used for assignment while the double-equals sign tests for equality.

When setting a variable as something (also called assigning a value), as in the example above with the variable called x, use the single equals sign. When testing for equality, as in the example above with the expression country == 3, use the double-equals sign.

You may also notice that the values of the country variable are not sequential. That’s because, in our hypothetical example, the variable country has values of 1 = UK, 2 = Germany, 3 = France, 4 = Japan, etc.
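
The same mapping can also be written as a lookup from country code to weighting variable, which is easier to extend when more countries are added. The sketch below is plain JavaScript run outside Displayr; the function name combinedWeight and the respondent objects are hypothetical stand-ins for the variables that Displayr makes available by name.

```javascript
// Country codes from the example above, mapped to the name of each weighting variable.
const weightFieldByCountry = new Map([
  [1, "weight_UK"],
  [3, "weight_FR"],
  [7, "weight_AU"],
  [4, "weight_JP"],
]);

// Hypothetical helper: returns the weight that applies to one respondent,
// or NaN when the country has no weighting variable.
function combinedWeight(respondent) {
  const field = weightFieldByCountry.get(respondent.country);
  return field === undefined ? NaN : respondent[field];
}

// Made-up respondents for illustration only.
const respondents = [
  { country: 1, weight_UK: 1.2 },
  { country: 7, weight_AU: 0.8 },
  { country: 2 }, // Germany: not one of the four weighted countries
];
const weights = respondents.map(combinedWeight);
console.log(weights);
```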

Making a date variable from a categorical variable

In Displayr, JavaScript (like R) has a suite of ready-made functions that can help you do great things. One such function is Q.EncodeDate(). It will enable you to return a number in a format that Displayr can, in turn, recognize as a date question. When used in conjunction with conditional statements, you can turn a categorical variable (e.g. period or wave variable) into a date question. Date questions have numerous benefits in Displayr, such as automatic aggregation of time, testing (significance) against the previous period, specific filter functions, and more.

Suppose you have a variable in your data file called week that encodes the week of the survey in categorical format. That is, it just holds the values 1 = week one, 2 = week two, 3 = week three, and so on. You can turn this into a date question, starting the first week on 1 January 2018, using an expression like the one below. Notice that Q.EncodeDate takes its arguments in year, month, day order.

 
// Arguments are year, month, day, written without leading zeros
// (JavaScript can misread numbers with a leading zero as octal).
if (week == 1) Q.EncodeDate(2018, 1, 1);
else if (week == 2) Q.EncodeDate(2018, 1, 8);
else if (week == 3) Q.EncodeDate(2018, 1, 15);
else if (week == 4) Q.EncodeDate(2018, 1, 22);
else if (week == 5) Q.EncodeDate(2018, 1, 29);
else if (week == 6) Q.EncodeDate(2018, 2, 5);
else NaN;

 

Then change the variable structure to Date/Time to use it as a date variable set.
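
With many waves, typing one line per week becomes error-prone. A possible alternative is to compute each week's start date from the first one. The sketch below uses only the standard JavaScript Date object and runs outside Displayr; the function name weekStart is hypothetical, and whether the resulting year, month, and day can be handed to Q.EncodeDate in place of literal numbers is an assumption that should be checked in your own document.

```javascript
// Hypothetical helper: returns [year, month, day] for the first day of a survey week,
// where week 1 starts on the given start date and each week lasts 7 days.
function weekStart(week, startYear, startMonth, startDay) {
  // Anything other than a positive whole number is treated as missing.
  if (!Number.isInteger(week) || week < 1) return null;
  // Date.UTC uses zero-based months; days past the end of a month roll over automatically.
  const date = new Date(Date.UTC(startYear, startMonth - 1, startDay + 7 * (week - 1)));
  return [date.getUTCFullYear(), date.getUTCMonth() + 1, date.getUTCDate()];
}

const week5 = weekStart(5, 2018, 1, 1);
const week6 = weekStart(6, 2018, 1, 1);
console.log(week5, week6);
```

For weeks 5 and 6 this gives 29 January 2018 and 5 February 2018, matching the hand-written expression above.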

Working with JavaScript text variables

JavaScript text variables are useful if you want to join the results of different open-ended variables. The following expression joins three spontaneous brand awareness questions into a single text variable and inserts punctuation. It is akin to the concatenate function in Excel.

q1a_1 + ", " + q1a_b + ", " + q1a_c + "."
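
One limitation of plain concatenation is that a respondent who named only one brand ends up with dangling commas. A possible refinement is to drop blank answers before joining. This is a minimal sketch in plain JavaScript run outside Displayr; the function name joinResponses and the sample answers are hypothetical.

```javascript
// Hypothetical helper: joins the non-blank answers with commas and ends with a period.
function joinResponses(responses) {
  const mentioned = responses
    .filter((answer) => typeof answer === "string") // skip missing values
    .map((answer) => answer.trim())                 // ignore surrounding spaces
    .filter((answer) => answer.length > 0);         // skip blank answers
  // Return an empty string when nothing was mentioned at all.
  return mentioned.length > 0 ? mentioned.join(", ") + "." : "";
}

const twoBrands = joinResponses(["Coke", "", "Pepsi"]);
const noBrands = joinResponses(["", "   ", undefined]);
console.log(twoBrands, noBrands);
```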

If you aim to create a categorical variable, as in the first example with Light-Heavy drinkers, then you can use a text JavaScript variable to generate the labels. Displayr automatically sets up value labels when converting a text variable to a categorical variable.

If you create a numeric variable, you may need to manually assign labels to the values (e.g., 1 = Segment A, 2 = Segment B) as per the first example in this post. The banding example (based on the summed colas) can be rewritten as a text variable with the following:

if (sumCola < 5) "Light Drinker";
else if (sumCola >= 5 && sumCola <= 10) "Medium Drinker";
else if (sumCola > 10) "Heavy Drinker";
else "";

Tips when working with code

Writing in computer code needs to be exact – you need to pay attention to:

  • The presence or absence of semi-colons: they signal that you’re ready for the next bit of the expression.
  • The capitalization of letters: case-sensitivity means that “if” does not mean the same as “If” or “IF”.
  • The right number of brackets (and where they open/close)
  • Whether brackets are (round) or {curly} or [square].
  • When symbols are doubled: double ampersands (&&) for “AND”, double-equal signs (==) for equality, and double pipes (||) for "OR".
  • Missing values in your input variables: they can lead to missing results if not handled correctly. There is more elaborate code you can use to write conditional statements around missing values.
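
As an illustration of the last point, adding NaN to any number gives NaN, so a single missing input makes a whole sum missing. The sketch below shows one possible way to handle this in plain JavaScript run outside Displayr; it is not the more elaborate code referred to above, and the function name sumIgnoringMissing is hypothetical.

```javascript
// Hypothetical helper: sums the values that are present and
// returns NaN only when every input is missing.
function sumIgnoringMissing(values) {
  const present = values.filter((v) => typeof v === "number" && !Number.isNaN(v));
  if (present.length === 0) return NaN;
  return present.reduce((total, v) => total + v, 0);
}

const naiveSum = 1 + NaN + 3;                       // a plain sum becomes NaN
const robustSum = sumIgnoringMissing([1, NaN, 3]);  // missing input is skipped
const allMissing = sumIgnoringMissing([NaN, NaN]);  // nothing to sum
console.log(naiveSum, robustSum, allMissing);
```

Whether a missing input should be skipped or should make the result missing depends on the analysis, so this choice is worth making deliberately.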
]]>
https://www.displayr.com/creating-and-working-with-javascript-variables/feed/ 0
Creating Demand Curves Using Conjoint Studies https://www.displayr.com/creating-demand-curves-using-conjoint-studies/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/creating-demand-curves-using-conjoint-studies/#respond Mon, 09 Dec 2019 20:55:46 +0000 https://www.displayr.com/?p=20954 ...]]> A demand curve shows how likely people are to make purchases at different price points. There are lots of different ways of estimating demand curves. In this post, I explain the basics of doing so from a conjoint study using Displayr.

Example demand curve

Below is a demand curve from a choice-based conjoint study of the chocolate market. It shows preference share for a 2-ounce Hershey milk chocolate bar.

Preparation: Creating the model and simulator

Before computing the demand curve you need a simulator. The most straightforward way of doing this is to create a model using Insert > More > Conjoint/Choice Modeling > Hierarchical Bayes, followed by Insert > More > Conjoint/Choice Modeling > Simulator.

Manually creating the demand curve

The simplest way to create a demand curve is to manually run each scenario of interest in your simulator. Let's say we wanted to create the demand curve for Hershey. We would set each of the alternatives to the desired attribute levels, with Hershey at the lowest price point, and make a note of Hershey's market share. Then, we would increase Hershey's price to the next price point and make a note of that share, and so on. You can then use Home > Enter Table to create a table of these data points (with price in the first column and market share in the second) and hook it up to a visualization.

Code-based creation of a demand curve

There are several situations where manually creating the demand curve is a poor solution, including:

  • When you want to create the demand curve in a dashboard so that it automatically updates when the user filters the data or changes the attribute levels of the alternatives.
  • Where there are a large number of alternatives to be simulated (e.g., models of SKUs).
  • Where there is a numeric price attribute, and you want to test lots of price points.

In such situations, it is often better to use code to create the demand curve.

Step 1: Duplicating the code used to create the simulator

When you create a simulator automatically in Displayr, it creates an R Output below the simulator that contains the underlying code that calculates the preference shares. In the screenshot below, I've selected it (hence the outline). Step 1 is to click on it and press Home > Duplicate to create a copy of the R Output.

Step 2: Modifying the code

Inspecting the code

You can inspect the underlying code in the copied R Output by viewing Properties > R CODE in the Object Inspector. It will have a structure like the code below. In this example:

  • Lines 1 to 4 describe the scenario that is being simulated, with one row for each alternative, and all four alternatives grouped as a list within a scenario list.
  • Looking at Alternative 1, we can see that the level for Brand is set to cBrand.1, with the blue shading telling us that this is the name of something else in the project. In this case, the something else is the control on the page where the user selects the level of the brand attribute.

If you hover your mouse over any of the references to the controls, a box will appear to the left telling you the current selection. In the example below, we can see that the first alternative's price has been set to "$0.99".

Modifying the code

We can modify the code to insert other attribute levels. For example, if we replaced cPrice.1 with "$0.99", we would get the same result as selecting $0.99 in the price control. However, once the R code is changed to "$0.99", it no longer uses the price control and instead always uses $0.99 as the price for alternative 1.

The code below is a modification of the code above, but it computes the demand curve. The key aspects of the code are:

  • Lines 1 to 4 are identical to those that have been automatically created by the simulator, apart from changing the alternative list parameters to c.
  • You can copy and modify Lines 5 to 13 as described in the remaining steps.
  • The prices for the simulator are in line 5.
  • In lines 10 and 11, replace "Alternative 3" with the name of the alternative that you want to compute demand for. As shown in the screenshot below, in this case study, Hershey is Alternative 3.
  • Replace hershey in line 13 with the name of the brand you are interested in.

Step 3: Creating the Visualization

You can now hook up your new table to a visualization from the Insert > Visualization menu.  To create the area chart from my example above, click Insert > Visualization > Area and select your R table in the Inputs > DATA SOURCE > Output in 'Pages' drop-down in the Object Inspector.

 

]]>
https://www.displayr.com/creating-demand-curves-using-conjoint-studies/feed/ 0
Creating R Variables from Multiple Input Variables Using Code https://www.displayr.com/creating-r-variables-from-multiple-input-variables-using-code/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/creating-r-variables-from-multiple-input-variables-using-code/#respond Tue, 30 Jul 2019 04:33:00 +0000 https://www.displayr.com/?p=17674 ...]]> Numeric variables

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables.

For example, to add two numeric variables called q2a_1 and q2b_1, select Insert > New R > Numeric Variable (top of the screen), paste in the code q2a_1 + q2b_1, and click CALCULATE. That will create a numeric variable that, for each observation, contains the sum of the values of the two variables. Similarly, the following code computes a proportion for each observation: q2a_1 / (q2a_1 + q2b_1).

To see the name of a variable, hover over it in the Variable Sets tree. Or, drag the variable into the R CODE box.

Vector arithmetic

One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1). This tells R to divide each value of q2a_1 by the sum of the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the data set. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2a_1 by itself.

Similarly, if we wished to standardize q2a_1 to have a mean of 0 and a standard deviation of 1, we can use (q2a_1 - mean(q2a_1)) / sd(q2a_1).

In these two examples, there are also specialist functions we can use: q2a_1 / sum(q2a_1) is equivalent to writing prop.table(q2a_1), and (q2a_1 - mean(q2a_1)) / sd(q2a_1) is equivalent to scale(q2a_1).
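To make the difference between per-observation and whole-variable calculations concrete, here is a minimal sketch you can run in any R session. The values of q2a_1 and q2b_1 are made up for illustration; in Displayr they would come from your data set.

```r
# Hypothetical stand-in data: cola counts for five respondents
q2a_1 <- c(2, 0, 5, 1, 2)
q2b_1 <- c(1, 3, 0, 1, 2)

# Element-wise arithmetic: one result per observation
total <- q2a_1 + q2b_1
share_within_person <- q2a_1 / (q2a_1 + q2b_1)

# sum() collapses the whole variable to a single number, which is then
# used as the denominator for every observation
share_of_total <- q2a_1 / sum(q2a_1)

# Standardize to a mean of 0 and a standard deviation of 1
z <- (q2a_1 - mean(q2a_1)) / sd(q2a_1)

# The specialist functions give the same values
same_share <- isTRUE(all.equal(share_of_total, prop.table(q2a_1)))
same_z <- isTRUE(all.equal(z, as.numeric(scale(q2a_1))))
```

Note that scale returns a one-column matrix rather than a plain vector, which is why the sketch wraps it in as.numeric before comparing.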

rowSums and rowMeans

As shown in the previous section, sum will add up all the observations in a variable. If we want to calculate the average of a set of variables, resulting in a new variable, we do so as follows:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Where:

  • cbind groups the variables together in a table with one row for each observation and one column for each variable
  • rowMeans computes the mean of each row in the table.
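The two steps above can be sketched with a small made-up example. The variable names match the ones in the text, but the values are invented and only three variables are used to keep it short.

```r
# Hypothetical stand-in variables (three respondents, three colas)
q2a <- c(1, 4, 0)
q2b <- c(3, 4, 2)
q2c <- c(2, 4, 7)

# cbind builds a table: 3 rows (observations) x 3 columns (variables)
tab <- cbind(q2a, q2b, q2c)

row_avg <- rowMeans(tab)  # one mean per respondent
row_tot <- rowSums(tab)   # one total per respondent
```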

Missing values in vector arithmetic

Most built-in R functions, such as sd, mean, sum, rowMeans, and rowSums, will return a missing value if the vector (a variable, in this case) passed to them contains any missing values. In most cases, the trick is to use na.rm = TRUE. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table cannot deal with missing values, and scale automatically removes them.
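Here is a minimal sketch of how missing values propagate and how na.rm = TRUE changes the result. The vector x is invented for illustration.

```r
# Hypothetical variable with one missing value
x <- c(2, NA, 5, 1)

mean_default <- mean(x)                # NA, because one value is missing
mean_no_na   <- mean(x, na.rm = TRUE)  # mean of 2, 5 and 1

# Standardizing while ignoring the missing value; the NA stays NA
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# rowMeans and rowSums accept na.rm as well
row_avg <- rowMeans(cbind(x, c(1, 1, NA, 1)), na.rm = TRUE)

# prop.table has no na.rm argument, so write the division out by hand
share <- x / sum(x, na.rm = TRUE)
```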

Variable sets

The data file used in this post contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. When Displayr imports this data, it automatically works out that these variables belong together (based on their having consistent metadata). The variables are then automatically grouped together as a variable set, which is represented in the Data Sets tree, as shown below.

When your mouse pointer is positioned over the variable set, it shows the raw data for the variables. In addition to showing the 12 variables, you can also see nine automatically constructed additional variables:

  • One variable which shows the sum of the variables, called SUM, SUM. This is the right-most of the variables.
  • Six showing the sum of each of the cola brands: "Coca-Cola, SUM", "Diet Coke, SUM", etc.
  • Two showing the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home'.

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

Note that the denominator has two aspects:

  • The Label of the variable set, which is surrounded by backticks (the key that looks a bit like an apostrophe but isn't; on my keyboard it's above the Tab key, but this can vary depending on your keyboard's region).
  • [,"SUM, SUM"] which means to take the column SUM, SUM.
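The backtick and column-name syntax can be tried outside Displayr with a stand-in object. This is only a sketch: a plain matrix plays the role of the variable set, the column labels and values are invented, and the SUM, SUM column is built by hand rather than automatically.

```r
# Hypothetical stand-in for the variable set: a matrix with one named
# column per variable (only two brands here, to keep it short)
colas <- cbind(
  "Coca-Cola, 'out and about'" = c(2, 0, 4),
  "Coca-Cola, 'at home'"       = c(1, 0, 4),
  "Pepsi, 'out and about'"     = c(1, 3, 0),
  "Pepsi, 'at home'"           = c(0, 3, 0)
)

# Append a total column, mimicking the automatically constructed SUM, SUM
`Q2 - No. of colas consumed` <- cbind(colas, "SUM, SUM" = rowSums(colas))

q2a_1 <- colas[, 1]
q2a_2 <- colas[, 2]

# Backticks let the name contain spaces and punctuation;
# [, "SUM, SUM"] picks the column with that label
share <- (q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[, "SUM, SUM"]
```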

At first glance, this may seem somewhat strange and unguessable. However, if you create a table with the variable set, you can get a better understanding of what is happening and why. The table below shows the variable set, and you can see that the SUM variables correspond to the totals. With categorical variable sets, NET appears instead of SUM. And, if you delete these categories from the table, it will also delete them from the data set itself.

The apply function

R has a super-cool function called apply. It is a little tricky to get your head around it if you're new to writing R code, so if your head is already swimming, skip this section!

Earlier we looked at rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f)). We can rewrite this as apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, mean). This is doing exactly the same thing, except that:

  • We are telling R to compute the average with the mean argument
  • The 1 tells R to perform the calculation by rows. If we instead had a 2, we would instead compute the mean of the columns.

The useful thing about apply is that we can add in any function we want. For example, to compute the minimum, we replace mean with min:

apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, min)

And, we can even write custom functions to apply for each row. The example below identifies flatliners (also known as straightliners), who are people with the same answer to each of a set of variables:

apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, function(x) length(unique(x)) == 1)

The way it works is that:

  • The function(x) part is boilerplate, telling R that you are going to be creating a custom function, and to represent each row as x
  • unique identifies all the unique values in x (i.e., each row)
  • length(unique(x)) counts the number of unique values for each row
  • length(unique(x)) == 1 returns a TRUE for each row that contains only one unique value (i.e., flatlining) and a FALSE otherwise
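The pieces above can be checked on a tiny made-up table. The ratings below are invented; the first and third respondents give the same answer to every question.

```r
# Hypothetical ratings: three respondents (rows), three questions (columns)
ratings <- cbind(q2a = c(3, 1, 5),
                 q2b = c(3, 2, 5),
                 q2c = c(3, 4, 5))

row_mean <- apply(ratings, 1, mean)  # same result as rowMeans(ratings)
row_min  <- apply(ratings, 1, min)   # smallest answer per respondent
col_mean <- apply(ratings, 2, mean)  # 2 = by columns instead of rows

# Custom function: TRUE when a row contains only one distinct value
flatliner <- apply(ratings, 1, function(x) length(unique(x)) == 1)
```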

We can make the code simpler by referring to variable set labels rather than variable names, as done below. But, when doing this, keep in mind that any automatically constructed SUM or NET variables will be in the calculation. This is fine for working out flatlining (as in this example), but will lead to double-counting in other situations (e.g., if computing a sum or average).

apply(`Q2 - No. of colas consumed`, 1, function(x) length(unique(x)) == 1)

Categorical variables

This section returns to basics and looks at all the steps that go into recoding a numeric variable into a categorical variable. In this example, we will illustrate various aspects of how the program works by recoding age into a new variable with four categories. If all you are really wanting to do is recode, there is a much better way: see How to Recode into Existing or New Variables.

  1. Create a table by dragging the variable onto the page. This shows us the labels that we need to reference in our code.
  2. Insert > New R > Numeric Variable, which will cause a new variable to appear in the Data Sets tree on the left side of the screen.
  3. Type or copy and paste the code shown below into INPUTS > R CODE (on the right of the screen) and click CALCULATE (at the top-right of the screen).
  4. Check the new variable by cross-tabbing it with the original variable. That is, drag the new variable (probably called newvariable) over the original table, releasing it in the Columns slot.  You will see the values that have been recoded to each of the categories, showing as averages.
  5. Click back on the new variable in the Data Sets tree, and give it an appropriate Label and Name (top-right of the screen; e.g., Age groupings, and age, respectively).
  6. Optional: change the structure of the data so that it is categorical, by setting INPUTS > Structure to Nominal: Mutually exclusive categories (at the bottom) and set the labels by clicking DATA VALUES > Labels.

Looking at the code above, note that:

  • For a single category, we use the == operator.
  • For multiple categories, we list them surrounded by c() and use the %in% operator.
  • The values are assigned at the end of the line, after a ~.
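The original code is not reproduced in this text, so the following is only a sketch of the pattern those three points describe, using case_when from the dplyr package (assumed to be installed). The Age values, the category labels other than "18 to 24" and "25 to 29", and the four groupings are all invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for a categorical age variable
Age <- factor(c("18 to 24", "25 to 29", "30 to 34", "40 to 44", "65 or more"))

age_group <- case_when(
  Age == "18 to 24"                    ~ 1,  # single category: ==
  Age %in% c("25 to 29", "30 to 34")   ~ 2,  # several categories: %in% c()
  Age %in% c("40 to 44", "45 to 49")   ~ 3,
  Age %in% c("65 or more")             ~ 4   # the value comes after the ~
)
```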

Automatic updating: benefits and gotchas

When your original data updates, the code is automatically re-run. This is mainly a good thing. However, if you merge the categories of the input age variable, the code will no longer match the labels it refers to, and the new variable will break. Here are two ways to avoid this:

  • Duplicate the original variable (Home > Duplicate) and merge its categories.
  • Modify the code to use the label of the merged categories.

Not (!)

In R, the way you write "not" (as in, "not under 40") is to use an exclamation mark (!). So, we can write:

Variable labels containing punctuation

Rather than typing variable labels, we can drag them from the data set into the R code. Where the variable label contains punctuation, it will be surrounded by backticks, which look a bit like an apostrophe. On my keyboard, the backtick key is above the Tab key.

Using variable names

When you hover over a variable in the Data Sets tree, you will see a preview which includes its name. In my data set, "living arrangement" has a variable name of d4, and we can refer to that in the code as well in place of the label.

Or (|)

You can also use the or operator, which is a pipe (i.e., a single vertical line). On my keyboard, I hold down the Shift key and press the key above Enter to get the pipe.

In this example, note that I've used parentheses around the expression that is preceded by the not operator (!), as otherwise it would be read as "not living with partner and children or living with children only", rather than "not(living with partner and children or living with children only)."

Other (TRUE)

In the example above, line 3 is a very verbose way of writing "everybody else". We can instead use the code snippet below. The case_when function evaluates each expression in turn, so when it gets to line 3, R reads this as "everybody else" or "other".
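The snippets for the not, or, and TRUE sections are not reproduced in this text. Here is a rough sketch of how the three ideas fit together, assuming dplyr is installed. The living variable and its values are invented, and the labels are written out only for illustration.

```r
library(dplyr)

pc <- "Living with partner and children"
co <- "Living with children only"

# Hypothetical living arrangement data for four respondents
living <- c(pc, "Living alone", co, "Living with partner only")

# Verbose version: the third line spells out "everybody else" with !( | )
verbose <- case_when(
  living == pc                   ~ 1,
  living == co                   ~ 2,
  !(living == pc | living == co) ~ 3
)

# Shorter version: TRUE catches every row not matched by an earlier line
shorter <- case_when(
  living == pc ~ 1,
  living == co ~ 2,
  TRUE         ~ 3
)

# Why the parentheses matter: without them, ! applies only to the first test
with_parens    <- !(living == pc | living == co)
without_parens <- !living == pc | living == co
```

In this sketch the two logical vectors differ for the respondent living with children only, which is exactly the misreading the parentheses prevent.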

Missing values (NA)

If our categories are not exhaustive, we will end up with missing values. For example, this code creates a variable with a 1 for people with children and missing values for others.
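As a minimal sketch of non-exhaustive categories (again assuming dplyr, with invented data and labels), any row that matches none of the conditions comes back as NA:

```r
library(dplyr)

# Hypothetical living arrangement data
living <- c("Living with partner and children",
            "Living alone",
            "Living with children only")

# Only one condition is listed, so everyone else is left as NA
has_children <- case_when(
  living %in% c("Living with partner and children",
                "Living with children only") ~ 1
)
```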

Recoding after creating the R variable

It might look like the missing values produced by the example above are a mistake. But it can be an efficient way to work because you can later recode the variable using Displayr's GUI. Simply click DATA VALUES > Values, change the Missing data in the Missing Values setting to Include in analyses, and set your desired value in the Value field.

And (&)

The example below uses the and operator, &, to compute a respondent's family life stage. The green bits, preceded by a #, are optional comments which help make the code easier to understand.

Temporary variables within the code used to create a variable

A much nicer way of computing a household structure variable is shown in the code below. This approach initially creates four variables as inputs to the main variable of interest, and these variables are not accessible anywhere else in Displayr. They exist for the sole purpose of computing household structure.

Line 1 computes a variable that contains TRUE and FALSE values for each row of data, as do lines 2 through 4. Then, case_when evaluates these using standard boolean logic for each row of data.

What makes this better code? It improves on the earlier example because:

  • Calculations are performed once. In the earlier example, the definition of younger appeared six times, but in this example, it only appears once.
  • It is simpler to read
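The code for this section is not reproduced in this text, so the following is only a sketch of the temporary-variable pattern, assuming dplyr is installed. The four helper variables, the data, and the life-stage codes are all invented; only the name younger comes from the description above.

```r
library(dplyr)

# Hypothetical inputs for five respondents
age    <- c(25, 52, 33, 61, 45)
living <- c("Living alone",
            "Living with partner only",
            "Living with partner and children",
            "Living alone",
            "Living with children only")

# Temporary variables: each is a TRUE/FALSE value per row, defined once
younger <- age < 40
partner <- living %in% c("Living with partner only",
                         "Living with partner and children")
kids    <- living %in% c("Living with partner and children",
                         "Living with children only")
single  <- !partner & !kids

life_stage <- case_when(
  younger & single  ~ 1,  # younger, living alone
  !younger & single ~ 2,  # older, living alone
  partner & !kids   ~ 3,  # couple without children
  kids              ~ 4   # any household with children
)
```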

ifelse

Earlier we looked at this example:


A much shorter way of writing it is to use ifelse:


You can nest these if you wish, as shown below. The use of two lines and the spacing is a matter of personal preference; they are not required.
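Here is a minimal sketch of ifelse on its own and nested, using an invented numeric age vector and invented cut-offs:

```r
# Hypothetical numeric ages
age <- c(22, 38, 40, 57, 71)

# ifelse(test, value if TRUE, value if FALSE), applied to every row
two_groups <- ifelse(age < 40, 1, 2)

# Nested: the "else" part is itself another ifelse
three_groups <- ifelse(age < 40, 1,
                ifelse(age < 65, 2, 3))
```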

Using the numeric values of variables in computations

It can be more convenient to refer to values rather than labels when doing computations. But there's a good way and a bad way to do this. I'm going to start with the bad way because it is an obvious (but not the smartest) approach for many people new to writing code using R (particularly those used to SPSS).

Bad approach

The example below uses as.numeric to convert the categorical data into numeric data. A value of 1 is automatically assigned to the first label, a value of 2 to the second, and so on. These values will not necessarily match the values that have been set in the raw data file. For example, if the data file contains values of 1 Male and 2 Female, but no respondent selected male, then the value of 1 would be assigned to Female.
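The pitfall is easy to reproduce in plain R. In this sketch the gender data is invented, and the way the factor is built is an assumption made for illustration; how Displayr itself constructs the categorical variable is not shown here.

```r
# Hypothetical sample in which nobody selected Male
answers <- c("Female", "Female", "Female")

# Levels inferred from the data: Female is the first (and only) label
gender_observed <- factor(answers)
codes_observed <- as.numeric(gender_observed)  # every value is 1

# Levels declared up front: Female keeps its second position
gender_declared <- factor(answers, levels = c("Male", "Female"))
codes_declared <- as.numeric(gender_declared)  # every value is 2
```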

Better approach

The safer way to work is to click on the variable set, and then select a numeric structure from Inputs > Structure (on the right side of the screen). For example, you would change the age variable to a structure of Numeric. Or, better yet, first duplicate the variable (Home > Duplicate), and then change the structure of the duplicate so that the original variable remains unchanged.

In my example, the age variable in the data has midpoints assigned to each category (e.g., 21 for 18 to 24, 27 for 25 to 29, etc.). You can see these by clicking on the variable and selecting DATA VALUES > Values on the right of the screen.

Subscripting

An alternative approach to recoding is to use subscripting, as done below. Why this works is actually a little complex -- but it does work!
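The original snippet is not reproduced in this text. One common form of recoding by subscripting, offered here only as a sketch with invented data, is to start with a vector of missing values and fill in the rows that meet each condition:

```r
# Hypothetical numeric ages
age <- c(22, 38, 40, 57, 71)

# Start with all-missing, then assign to the rows selected by each condition
age_group <- rep(NA_real_, length(age))
age_group[age < 40]  <- 1
age_group[age >= 40] <- 2
```

The expression inside the square brackets is a TRUE/FALSE vector, and the assignment only touches the rows where it is TRUE.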


Mathematical operations on categorical variables

This next approach is a wonderful time saver, but is a little harder on the brain.

Earlier we looked at recoding age into two categories in a few different ways, including via an ifelse:


The code below does the same thing. Let's unpack it:

  • `Age 2` is the numeric version of age, created in the way described in the previous section.
  • `Age 2` >= 40 creates a variable with a TRUE value for people with an age of 40 or more, and FALSE for people under 40.
  • + 1 adds a 1 to the TRUE and FALSE values. This may seem odd, but it is a standard thing in computing: when you use a TRUE or a FALSE in calculations, the TRUE is treated as a 1 and the FALSE as a 0.
  • The parentheses tell us to first compute the TRUE and FALSE. Without them, the analysis would then be checking to see who is aged 41 or more.
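The points above can be checked directly. In this sketch the values of `Age 2` are invented midpoints:

```r
# Hypothetical numeric version of age (category midpoints)
`Age 2` <- c(21, 27, 42, 57)

# TRUE/FALSE first, then + 1: FALSE becomes 1 and TRUE becomes 2
two_groups <- (`Age 2` >= 40) + 1

# Without the parentheses, 40 + 1 is computed first,
# so this tests who is aged 41 or more
aged_41_plus <- `Age 2` >= 40 + 1
```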

This next example can be particularly useful. This code creates 18 categories representing all the combinations of age and gender, where:

  • as.numeric(Age) converts the categorical variable into numeric values, as described above in the "bad approach" sub-section. This means that the youngest category gets a value of 1, the second as 2, etc.
  • max(as.numeric(Age)) * (as.numeric(Gender) - 1) assigns a value of 0 to Males and 9 to Females, where the 9 is the number of age categories.
  • By adding the two together, we get values of 1 through 9 for the age categories of males, and 10 through 18 for females.
  • If your goal is to create a new variable to use in tables, a better approach is Insert > New Banner.
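Here is a small sketch of the age-by-gender calculation described above. It uses three invented age categories instead of nine, so the result runs from 1 to 6 rather than 1 to 18. The nlevels variant at the end is my own suggestion rather than something from the original post.

```r
# Hypothetical categorical data for four respondents
age_levels <- c("18 to 24", "25 to 29", "30 to 34")
Age    <- factor(c("18 to 24", "30 to 34", "25 to 29", "30 to 34"),
                 levels = age_levels)
Gender <- factor(c("Male", "Male", "Female", "Female"),
                 levels = c("Male", "Female"))

# Males get an offset of 0, females an offset equal to the number of age bands
combo <- as.numeric(Age) + max(as.numeric(Age)) * (as.numeric(Gender) - 1)

# max() relies on the oldest band appearing in the data;
# nlevels() counts the bands even when one of them is empty
combo_safe <- as.numeric(Age) + nlevels(Age) * (as.numeric(Gender) - 1)
```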

Returning to our household structure example, we can write it as:

Debugging

When you insert an R variable, you get a preview of the resulting values whenever you click CALCULATE. However, if doing anything remotely complicated, it is usually a good idea to:

  • First check the code by creating an R OUTPUT (Insert > R Output), as these are better for debugging.
  • Click on the R Output and check Inputs > OUTPUT > Show raw R Output, which will show all the steps in processing the code, line by line
  • Use R functions like summary and table to show the values of intermediate calculations, as shown in the example below.
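The original example is not reproduced in this text. As a rough sketch with invented data, summary and table can be used to inspect each intermediate step before trusting the final result:

```r
# Hypothetical ages, including one missing value
age <- c(22, 38, NA, 57, 71)

# Inspect the input: range, mean, and the number of NAs
age_summary <- summary(age)

# Inspect an intermediate TRUE/FALSE variable, counting NAs too
younger <- age < 40
younger_counts <- table(younger, useNA = "ifany")

# Inspect the final result the same way
band <- ifelse(younger, 1, 2)
band_counts <- table(band, useNA = "ifany")

print(age_summary)
print(younger_counts)
print(band_counts)
```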

 

How to Band Numeric Variables in Displayr
https://www.displayr.com/how-to-band-numeric-variables-in-displayr/
Tue, 25 Jun 2019 02:13:52 +0000

Let's say you are asking survey respondents for an absolute number (eg: how many colas have you consumed in the past week?) or a point on a set scale (eg: what proportion of staff are female? Please type in a number from 0 to 100). It’s not uncommon to want to band up the range of potential inputs into categories (eg: 0-5, 6-10, 11+) for analysis purposes.

Numeric to banded pic

The purpose of this article is to show you the options you have for creating a banded (categorical) version of a variable, using both drag-and-drop and code methods (R and JavaScript).

Checking the Variables: Structure and Values

These variables I just described are normally read into Displayr as numeric variables. That is what you would expect of a good data collection platform that only accepts a numeric input. For numeric data, the values and labels are one and the same. A variable’s structure is indicated by the icon next to the variable in the Data tree, but also in its Object Inspector under INPUTS > Structure.

Structure

Displayr reads these variables as nominal or ordinal if there is text involved with the value label (eg: 0 – Not at all satisfied and 10 – Extremely Satisfied are the endpoints of your scale). If that is the case, it is prudent to check the Values so that they align correctly with the labels. The Values button is just under the Structure dropdown in the Object Inspector as per the picture above. You don’t want a value of 1 ascribed to 0-Not at all satisfied and so forth (it should be a value of 0). You should change it so that the correct value aligns with the label.

Displayr will interpret these variables as text if there are spaces or other characters involved in the data. This is one key reason why Excel/CSV files are a poor file format for survey data. If you have it as a text variable, you can change the variable structure to numeric, but it can’t be guaranteed that all non-numeric information will be correctly converted into numeric values. So you may need to manually format and clean your text variable in the Excel/CSV file (ie: remove all non-numeric characters that could be ‘polluting’ the variable).

Banding by drag-and-drop

Banding via drag-and-drop is the easiest way to band your variable and makes the most sense if you are unlikely to update your data file with new data.

If your variable is numeric, use Home > Duplicate to create a copy of the variable and then change the copied variable set structure to be nominal (or ordinal). As per the picture above, you change the structure in Object Inspector > INPUTS > Structure.

Drag your categorical variable on to the page to make a table. Then select all the categories you want to band together (using Ctrl or Shift) and use Data Manipulation > Merge to merge them into a category. At this point, it’s prudent to use Data Manipulation > Rename to give the banding a correct label.

And that’s it! The main drawback to using this method is that if you update your data file with fresh data (eg: more respondents) then you may end up with a category that is not in a band. For example, if you band up 1,2,3,4,6,7,9, and 10 into a band "1-10" using drag-and-drop and then update your data file, you may have new cases which provide a 5 or 8 score. These new values have not been included in the "1-10" band, and you’ll have to manually merge them in (by repeating the process above). To get around this, use one of the code-based methods below.

Banding via R variable

Banding with code is a sure-fire way to ensure that all potential values within a range end up in the correct bands. It’s very simple code to implement and doesn't require extensive R knowledge (just copy the template below). You can flexibly change the bands later by tweaking the code.

Suppose your variable has the label Q2 - No. of Coca-Cola consumed. It will also have a variable name (Q2_a). You can use either the variable label or name in Displayr. The variable name is revealed in the Object Inspector > Properties > General and also by hovering over the variable in the Data Set tree.

Insert > R Variable

Over in the R CODE window in the Object Inspector you can write a simple IF and ELSE IF statement. What I like about R CODE is that you can drag a variable from the Data Sets tree directly into the R Code box, and give it the convenient label ‘x’ (or whatever). So the first line of code looks like: x = `Q2 - No. of Coca-Cola consumed`

Then you can easily set up your bands by referring to ‘x’, like the below:

x = `Q2 - No. of Coca-Cola consumed`
x[x == 0] = 0              # == tests for equality; a single = does not
x[x >= 1 & x <= 5] = 1
x[x >= 6 & x <= 10] = 2
x[x > 10] = 3
x                          # the last line returns the banded variable

You can augment and adjust the code above to suit your banding needs, adding as many lines as you like. See here for a guide to IF and ELSE IF statements in R.
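To try the banding logic outside Displayr, here is a small sketch on invented counts. It tests the conditions against the raw values while writing into a separate copy, so an already-recoded value can never be picked up by a later line. It also shows base R's cut function as an alternative; that is a different technique from the template above, included only for comparison.

```r
# Hypothetical raw counts
x <- c(0, 3, 5, 6, 10, 11, 25)

# Subscripting: conditions refer to the raw x, results go into band
band <- x
band[x == 0] <- 0
band[x >= 1 & x <= 5] <- 1
band[x >= 6 & x <= 10] <- 2
band[x > 10] <- 3

# Alternative: cut() assigns each value to an interval in one step.
# With the default right = TRUE the intervals are (-Inf,0], (0,5], (5,10], (10,Inf]
band_cut <- cut(x,
                breaks = c(-Inf, 0, 5, 10, Inf),
                labels = c("0", "1-5", "6-10", "11+"))
```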

In the above, I've given the new bands values of 0,1,2, and 3 respectively. You could make these whatever you want. Once you've made the variable you will need to adjust the label for each Value, which you can do by going to the Values Attributes window using the Values button in the Object Inspector.

Values box

Banding via JavaScript

For those of you who prefer to use JavaScript, the process is very similar to the R variable, except of course you use JavaScript, so the code is a little different. It uses IF and ELSE IF statements. With R, you can drag and drop the variable and/or use the variable label in the code. With JavaScript, you need to use the variable name, as per the first line of the code below. The variable name is revealed in the Object Inspector of the variable (under Properties) and by hovering your mouse over the variable in the Data Set tree. So your code could look like this:

x = q2a_1
if (x == 0) 1;
else if (x >= 1 && x <= 5) 2;
else if (x >= 6 && x <= 10) 3;
else if (x > 10) 4;

Try for yourself

The examples in this post are in this Displayr document. The variables are at the top of the Data tree.

How to link images to a visualization in Displayr
https://www.displayr.com/how-to-link-images-to-a-visualization-in-displayr/
Sun, 23 Jun 2019 23:29:07 +0000

Consider the bar charts in the image below. When the data is set to automatically sort rows by decreasing order, brand orders can change when a filter is applied to the bar chart. The point of interest here is that the logos (which are six separate objects that sit adjacent to the visualization) also dynamically update to reflect the order. For example, the second field updates to switch from the Coke Zero logo to the Pepsi Max logo. I'll use this as a worked example in this post.

 

Preparation: Getting a URL for each image

We are not going to be working with images you insert via Insert > Images from the Ribbon because those are static images. Responsive images actually sit within an R Output (Insert > R Output) that can only read images from a URL location. So the first step is to get the URLs for all images.

There are several ways to do this. Some Displayr users might have a network or shared server that can generate URLs. You can use a cloud-based app like Google Photos or Dropbox to generate links.

For instance, I used the image hosting service imgur to make my URL, which generated a URL that looks like this: https://i.imgur.com/mrIpR63.png.

In Displayr, it's useful to have a pasted table of all your URLs alongside the brand name. We will later refer to that table when we write the code. As this table will be a reference page, I suggest putting the table on a new page and hiding it (from Viewers). The following steps will do this for you:

  • Insert a new page by going to Insert > New Page > Title Only from the Ribbon
  • Insert a table by going to Insert > Enter Table from the Ribbon
  • Select the blank table object on the page, and go to Object Inspector > Inputs > DATA SOURCE > Paste or Type Data 
  • Enter your table in the spreadsheet with your URLs. The one I did in my example looks like the below:
  • Click OK (Note: the name of the table in this example is table.output).
  • Optional: Give the page a title like "Image Reference," and then select the page in the Pages Tree and hide it by going to Appearance > Hide from the Ribbon.

Create a merged table of data and image URLs

The next step is to line up the data that will feed into the visualization alongside the image. We ultimately want to achieve a merged table, like the one in the image below. The table has the data for the visualization in the first column (the %'s in this case) alongside the corresponding image URL. There are several ways to do this, and my example is just one method. You may like to write your own R code to make a custom table. What matters is that you have matched the item (brand), the data (% in this case), and the correct image URL.

First, I made a table of my source data, which was created by dragging the Preferred Cola variable onto the page to generate a standard summary table. The table is named table.Q3.Preferred.cola.

I created a merged table of my source data and images and matched it up by brand. To do this, I used Home > Tables > Merge Two Tables. Over in the Object Inspector, I nominated the tables to merge. In this case, it was table.Q3.Preferred.cola and table.output. I selected Side-by-Side and Matching Only as my options. Because I want the R Output to sort the data from highest to lowest, I added some lines of code to the merged object. I did this by selecting the merged table, going to Properties > R CODE in the Object Inspector, and adding this final line:  merged[order(merged[,1], decreasing = TRUE),]. If you're not familiar with this process, I recommend reading the blog post How to sort your data with R in Displayr.
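To see what that final sorting line does, here is a sketch on a stand-in for the merged table. The data frame, the percentages, and the example.com URLs are all invented; in Displayr the merged object comes from Merge Two Tables and may be stored differently.

```r
# Hypothetical merged table: one row per brand, data first, URL second
merged <- data.frame(
  `%` = c(12.5, 40.1, 8.3),
  Src = c("https://example.com/coke-zero.png",
          "https://example.com/coca-cola.png",
          "https://example.com/pepsi-max.png"),
  row.names = c("Coke Zero", "Coca-Cola", "Pepsi Max"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# order() returns the row positions from highest to lowest value of column 1;
# indexing the rows with it re-arranges the whole table, URLs included
merged <- merged[order(merged[, 1], decreasing = TRUE), ]

top_brand <- rownames(merged)[1]
top_url   <- merged[1, "Src"]
```

Because whole rows are re-ordered, each brand's URL stays attached to its percentage.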

Finally, in my example, I extracted the first column of the merged table because the visualization needs it separate from the URL text. I did that with the following code within another R Output (Insert > R Output).  The name of this output is by default sorteddata.


sorteddata = as.numeric(merged[,1])        # first column: the values to plot
sorteddata = as.data.frame(sorteddata)
rownames(sorteddata) = rownames(merged)    # keep the brand names as row labels
sorteddata

Create your visualization

Now insert your visualization and link it up to the data table. In my example,  I selected Insert > Visualization > Bar Chart. I hooked it up to the R Output sorteddata. If you're not sure how to set up a visualization, I recommend reading How to Create a Bar Chart in Displayr (noting the Visualization section is the relevant section, not the Charts section).

Create holders for your images

Now, create another R Output, and then use the following code. You will need to change the references in the first three lines. The first line sets the item's position in the sorted table (in this example, it is the first). The second and third lines reference the merged R Output, including the column that has the URL. If you're not familiar with table subsetting and referencing with R Output, I recommend reading How to do Simple Table Manipulations with R in Displayr.

item = 1
src = merged[item,"Src"]
alt = rownames(merged)[item]
text = paste0(
    '<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.responsive {
  width: 100%;
  height: auto;
}
</style>
</head><body><img src="', src, '" alt="', alt, '" class="responsive">
</body>')

rhtmlMetro::Box(text,text.as.html = TRUE)

Next, resize your R Output. The image is set to responsively adjust in size to fill the R Output container (that's why there is so much HTML within the R code above). Then duplicate your R Output (Home > Duplicate) and adjust the first line as necessary (eg: item = 2, item = 3, etc) in each successive R Output. Finally, align them next to the visualization. To see the dynamic behavior in action, select the original data table, table.Q3.Preferred.cola, and apply a filter to see the visualization and linked images update accordingly.

Try for yourself

The worked example can be found in this Displayr document.

How to Dynamically Change a Question Based on a Control Box
https://www.displayr.com/how-to-dynamically-change-a-question-based-on-a-control-box/
Wed, 19 Jun 2019 01:11:27 +0000

The two main types of control boxes are the combo and the list box. Typically they are used for changing how the data is filtered, as discussed in this post. But you can also use a control box to change the actual question in a table (or chart, visualization, etc.). You can also use control boxes to change the weighting you want to apply.

For example, the image below shows a question (Preferred Cola) that I've chosen to split by income brackets, using the selection in the control box.

Income selected in control box

If I change the control box option to Age, it becomes:

Control box selection Age

You can do this with an R variable. The R variable dynamically updates when the selection in the control box changes. The purpose of this post is to show, via example, how you can do this.

Set up your control box with your options

Use Insert > Control and then choose either a Combo or List box. Over in the Object Inspector, list your questions in CONTROL > Item List (which can be labeled however you like). In this example, I entered 4 possible options for a combo box:

Control options

I set the Selection mode to be "Single selection," and When item list changes to be "Select first."

Be sure to take note of the control box’s name under PROPERTIES > GENERAL > Name, because we’re about to use this in the R variable.

Changing single-variable questions via your control box

Next, you will need to create an R variable with conditional statements that link to the questions via Insert > R > Numeric variable. This will make a new numeric variable under Data Sets, creatively called “newvariable” by default. Displayr will reveal in the Object Inspector a blank box where you can put in the R CODE:

Code for control box

As per the picture above, you enter simple conditional statements with R. Basically, it references the control box (called Combo.box in this example) and then each of the 4 options. The four variable names -- d1, d2, d3, and d4 -- pertain to each of the single-variable questions to use in the table. The code consists of very straightforward "IF and ELSE IF" statements.
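Since the code itself only appears in the picture, here is a sketch of what such conditional statements can look like. It is wrapped in a hypothetical helper function, pick_variable, purely so it can be run outside Displayr; in an R variable you would write the if/else chain directly against Combo.box. Which option label maps to which of d1 to d4 is an assumption made for illustration.

```r
# Hypothetical helper: returns the variable matching the control selection
pick_variable <- function(selection, d1, d2, d3, d4) {
  if (selection == "Age") d1 else
  if (selection == "Gender") d2 else
  if (selection == "Income") d3 else
  if (selection == "Living arrangement") d4
}

# Stand-in data: two respondents, four single-variable questions
d1 <- c("18 to 24", "25 to 29")
d2 <- c("Male", "Female")
d3 <- c("Under $15,000", "$15,001 to $30,000")
d4 <- c("Living alone", "Living with partner only")

# Stand-in for the control box's current selection
Combo.box <- "Income"
selected <- pick_variable(Combo.box, d1, d2, d3, d4)
```

If the selection matches none of the listed options, the chain has no final else and the result is NULL, so the option text in the code needs to match the control's item list exactly.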

Be sure to change the variable Structure to be nominal or ordinal (if you intend for the question to be categorical). This is done under INPUTS > Structure in the Object Inspector for the R variable (in the picture above at the very bottom under the code).

And that’s it! From there you can use your R variable in a table, directly in a visualization, or in another analysis. It will change dynamically as you alter the selection in the control box.

Changing multiple-variable questions via your control box

When working with multiple-variable questions, it may be possible to use the same approach of using 'if/else' code for each variable in your variable set, but there are some provisos:

  • Your variables must be set together as either a Binary – Multi or Number – Multi, as applicable.
  • You should have the same number of variables for the questions that are to be substituted.
  • The variable labels should be applicable for all questions, as these can't dynamically change.

When the number of variables and/or variable labels are different between the questions you wish to dynamically change via a control box, it is better to substitute tables instead. The steps are as follows:

  • Create separate tables for each of the questions listed in your control box, drag them off your page and select Appearance > Hide from the ribbon.
  • Create an R output via Insert > R Output that selects which table to choose based on the table name (found under PROPERTIES > GENERAL > Name) and the control box selection:
if (Combo.box == "Awareness") table.D1.Age.by.Awareness else
if (Combo.box == "Preference") table.D1.Age.by.Preferred.cola
  • In the above example I have 2 control options that switch between 2 tables, one 'Age by Awareness', the other 'Age by Preferred Cola'. As the final output is a visualization, I've also hidden this R output and dragged it off the page.
  • Once you update the visualization's output reference under Inputs > DATA SOURCE > Outputs in 'Pages' to this R output, you will then be able to dynamically control the data shown:

multiple-variable dynamic filtering

Changing the weighting dynamically with an R variable

You can apply the same technique to dynamically change the weighting. You essentially reference different weighting variables in the R code based on your selection in the control. For example:

if (Combo.box == "USA") weight_us else
if (Combo.box == "France") weight_fr else
if (Combo.box == "UK") weight_uk

Then make sure the R variable has the Usable as weight box checked in the Object Inspector. You can then apply that to a table (or chart or whatever) as your weighting variable.
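To see what the weight switch does to a result, here is a small sketch with made-up respondent weights. The named list is just one possible way to organize the choices; weight_us, weight_fr and weight_uk would be real variables in your data set, and Combo.box would come from the control:

```r
# Toy respondent-level weights (made-up values), one vector per country.
weight_us <- c(1.2, 0.8, 1.0, 1.0)
weight_fr <- c(0.5, 1.5, 1.0, 1.0)
weight_uk <- c(1.0, 1.0, 0.7, 1.3)

Combo.box <- "France"  # set by the control in Displayr

# Collect the candidates in a named list and pick one by label.
weights <- list(USA = weight_us, France = weight_fr, UK = weight_uk)
stopifnot(Combo.box %in% names(weights))
selected.weight <- weights[[Combo.box]]

# Effect of the choice on a weighted mean of some toy scores.
scores <- c(10, 20, 30, 40)
weighted.mean(scores, selected.weight)
```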

Try for yourself

The above example is captured in this Displayr document. The R variables are the first two variables in the Data Set.

Get started!
How to Switch Logos and Images Based on User Selections https://www.displayr.com/how-to-switch-logos-and-images-based-on-user-selections/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-switch-logos-and-images-based-on-user-selections/#respond Mon, 27 May 2019 01:30:28 +0000 https://www.displayr.com/?p=17760 ...]]> Displayr's Conditional Image visualization allows you to add images to your document which change when the data changes in response to filters or other interactive components on your page. For instance, you could display a thumbs-up when your result is higher than expected, or a thumbs-down when your results are lower than expected. In this article, we are going to use this same tool to switch between different brand logos based on your viewer's selection.

For this to work, you need three components:

  1. An interactive menu on your page which lets you choose the brand
  2. Some R Code which translates the menu selection into a number
  3. A Conditional Image which changes logos based on the numbers

It's important to remember that all of these elements need to be on the same page. Displayr's interactive features work on a page-by-page basis when your document is published as a web page.

Step 1 - Create your menu

If you've been designing a dashboard which can show results for one of several brands based on a menu selection, then you probably already have this. If that's the case, then jump to Step 2 below.

If you are starting out from scratch, you'll need to add a combo box or a list box to your page. These two types of menus both do the same job; the difference between them is how they look on your page.

  1. Select Insert > Control (More) > Combo Box or List Box.
  2. Click into Control > Item list in the Object Inspector on the right of the screen, and enter the list of brands that you want to be able to switch between. Each item should be separated by a semi-colon (;).
  3. (Optional) Change the formatting options in the Control section (e.g. fonts, colors, etc).
  4. Click into Properties > General > Name, and change it to "brand.switch". It doesn't matter what you call it, so long as you remember this name for the next step.

In this example I will use three brands: Coke, Diet Coke, and Pepsi.

Step 2 - Translate your brands into numbers

The conditional image visualization tool chooses which image to display based on a numerical value, so all we need to do is choose a number for each brand so that the visualization knows which image to show.

  1. On the same page, select Insert > R Output.
  2. Paste in the code below.
  3. Click Calculate.
  4. Select Appearance > Hide. This prevents the number from showing up when the document is published.

The code you need to use is like this:

brands = c("Coke", "Diet Coke", "Pepsi")
current.brand = match(brand.switch, brands)

The first line of the code lists the brands in the same order as they appear in your menu from Step 1. The second line of code looks up the position of the selected brand in the list. So if the user selects "Coke" the value will be 1, if they select "Diet Coke" the value will be 2, and so on.

Now, if you change the menu selection, you should see the number in the output update itself.
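If you want to see how match() behaves before wiring it up, the sketch below runs on its own in any R session. The "Fanta" selection and the nomatch fallback are my own additions for illustration and are not part of the original setup:

```r
brands <- c("Coke", "Diet Coke", "Pepsi")

# match() returns the position of each value in `brands`.
match("Diet Coke", brands)   # 2
match(brands, brands)        # 1 2 3

# A label that is not in the list gives NA, for example after a typo
# or when the menu items and the vector get out of sync.
match("Diet coke", brands)   # NA (match is case-sensitive)

# Optional guard: fall back to the first brand when nothing matches.
brand.switch <- "Fanta"      # hypothetical selection not in the list
current.brand <- match(brand.switch, brands, nomatch = 1L)
current.brand
```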

Step 3 - Create your image

The final stage is to create the conditional image visualization, and then connect it to the number above in Step 2. Importantly, your images must be specified by URLs. That is, they need to be hosted on the web somewhere, and you need to copy in the links.

  1. Select Insert > Visualization > Conditional Image.
  2. Change Inputs > DATA SOURCE > Data source to Use an Existing R Output.
  3. Click into Inputs > DATA SOURCE > Input data and select the R Output that you created in Step 2 above (in this example it is called current.brand).
  4. Change Inputs > OUTPUT > Image type to Custom Images.
  5. Paste the URL of the image for the first brand into Default image.
  6. Change Threshold 1 to the number 2.
  7. Paste the URL for the second brand logo into Image 1.
  8. Change Threshold 2 to the number 3.
  9. Paste the URL for the third logo into Image 2.

Here is the appearance of my settings for this example:

To see a very basic example of the finished product, click this link. To get a copy of the original document so that you can see the code and other options that I have used in this post, click here.
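One way to think about the threshold settings in Step 3 is sketched below in plain R. This is only my reading of the settings, assuming an image takes over once the value reaches its threshold; it is not Displayr's actual implementation, and the image labels are placeholders for your URLs:

```r
thresholds <- c(2, 3)   # Threshold 1 and Threshold 2 from Step 3
images <- c("Default image (Coke)", "Image 1 (Diet Coke)", "Image 2 (Pepsi)")

# findInterval() counts how many thresholds the value has reached,
# so adding 1 gives the position of the image to show.
pick.image <- function(value) images[findInterval(value, thresholds) + 1]

pick.image(1)
pick.image(2)
pick.image(3)
```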

How to Remove a Row or Column using R in Displayr https://www.displayr.com/how-to-remove-a-row-or-column-using-r-in-displayr/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-remove-a-row-or-column-using-r-in-displayr/#respond Mon, 27 May 2019 00:30:34 +0000 https://www.displayr.com/?p=15623 ...]]> In doing so, you end up with tables within R Outputs. These R tables cannot be manipulated with the Data Manipulation techniques in the Ribbon, as these buttons are designed for tables that you build from variables in your Data Sets. Tables you make with R you will need to manipulate with R.

Consider the table below that is within an R Output. (It has been generated by subtracting the scores in one source table from those in another: the scores for males in one table minus the scores for females in a similar table.) What's important here is that in the output the “None of these” row and the NET row/column have carried over. We may want to remove them in our final R Output:

Table remove rows

One way to accomplish this is to go back to the source tables, and remove them there (without the need to fiddle with any R). But there are situations where you don’t want to change the variable set and/or perhaps your scenario is such that you can’t change it. The good news is that removing a row or column from your R outputs is very easy to do with just 1-2 lines of additional code. In this post, I’ll demonstrate how you can use some code to do this two ways:

  • Specifying the rows/columns to remove by index
  • Specifying the rows/columns to remove by name

The second one is likely the more useful of the two because we often want to remove a particular row/column by its label rather than the 1st, 8th or last row/column.

Note: If the terms subsetting and index are unfamiliar to you, I suggest reading this introductory post: How to do Simple Table Manipulations with R Using Displayr. In all of the below, the name of the R Output we're referring to is "table".

Specifying the rows/columns to remove by index

Let’s say you wanted to remove the “None of these” and the “NET” row. A simple way to do it (provided the order of your rows isn’t likely to change) is to just specify the rows you want to keep:

 table[1:6,] 

But you could also use a minus sign (-) and then specify the rows you don’t want to keep. So in this alternative, we’re saying we “don’t want the 7th and 8th row".

 table[-(7:8),] 

This is all very well and good, but it becomes a bit problematic if the ordering of your rows changes. With an update to the data, the NET suddenly becomes the 9th row. Perhaps then you’re better to specify the labels for the rows, as per the next section.

But there is one more trick you can do when specifying by index: you can get it to remove the last couple of rows. In this case, the code is as simple as:

n = nrow(table)
table[-((n-1):n),]

Here we’re getting the code to first calculate the number of rows, and storing that as n. Then, in the subset on the next line, we’re asking it to NOT return the second-last to the last row (i.e., remove the last 2 rows). I could have put the above all on one line, but I think it's easier to see what's going on with the n.
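To try the three index-based variants outside Displayr, here is a small sketch on a made-up 8-row table with the same row layout as the example. The brand labels and values are invented for illustration:

```r
# Toy 8 x 2 table (made-up values) with the same row layout as the example.
table <- matrix(1:16, nrow = 8, dimnames = list(
    c("Brand A", "Brand B", "Brand C", "Brand D", "Brand E", "Brand F",
      "None of these", "NET"),
    c("Score", "NET")))

keep.first.six <- table[1:6, ]
drop.by.index  <- table[-(7:8), ]
n <- nrow(table)
drop.last.two  <- table[-((n - 1):n), ]

# All three give the same six brand rows.
identical(keep.first.six, drop.by.index)
identical(drop.by.index, drop.last.two)

# Caveat: subsetting down to one row or column returns a plain vector
# unless drop = FALSE is added.
dim(table[1:6, "Score"])                 # NULL: no longer a table
dim(table[1:6, "Score", drop = FALSE])   # 6 1
```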

Specifying the rows/columns to remove by name

If you change the source tables (e.g. by updating the data to add, subtract, or sort rows/columns), then the ordering of the R Output may be out-of-date, and so we could end up removing the wrong row. I want to be confident the updates to my R Outputs will be accurate and correct. For that reason, I prefer to specify the names of the rows or columns I'd like to remove. To do this, I use the function setdiff() which figures out what to retain (i.e. what remains after you specify what to drop).

x = setdiff(rownames(table),c("None of these","NET"))
y = setdiff(colnames(table),"NET")
table[x,y]

Let me break it down for you:

  1. On the first line, the setdiff() function calculates the difference between all the row names in the original table and the array of labels I’ve specified using the combine function c(). So the remainder is just the six brands. I’ve stored this array of 6 brands as x.
  2. Likewise, I’ve done the same for the columns, storing it as y. Because there’s only one ("NET") I didn’t need to use the combine function c() when inputting it into the setdiff() function.
  3. And then on the third line, I’ve asked the R Output to subset the table by x and y respectively.
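The three lines above can be tried on a toy table like the one below (labels and values are made up). The drop = FALSE argument and the misspelling check are my own additions, included to show two things worth watching for:

```r
table <- matrix(1:9, nrow = 3, dimnames = list(
    c("Brand A", "None of these", "NET"),
    c("Male", "Female", "NET")))

x <- setdiff(rownames(table), c("None of these", "NET"))
y <- setdiff(colnames(table), "NET")
result <- table[x, y, drop = FALSE]   # drop = FALSE keeps it a table
result

# setdiff() ignores labels that are not present, so a misspelled label
# is silently kept rather than raising an error.
setdiff(rownames(table), "Net")   # all three rows remain
```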

Try for yourself

The examples above can be found in this Displayr document.

How to Set the Initial Zoom and Position of Geographic Maps https://www.displayr.com/how-to-set-the-initial-zoom-and-position-of-geographic-maps/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-set-the-initial-zoom-and-position-of-geographic-maps/#respond Tue, 23 Apr 2019 03:12:05 +0000 https://www.displayr.com/?p=17103

Create a geographic map to your liking using the leaflet map package in Displayr

  1. Follow instructions on How to Make a Geographic Map in Displayr to create a map and hook up your data to the map. The example I will walk through maps the percentage of food inspections that passed in each Chicago zip code between 2010 and June 2018 (original data found here).
  2. Select a few more settings to make the map look nice. For my map I am also selecting the following in the Object Inspector:
    • Use the leaflet map package using Chart > APPEARANCE > Map package > leaflet.
    • Only show "Pass" rates by setting Inputs > COLUMN MANIPULATIONS > Number of columns from left to show as 1.
    • Show background map for context by checking Chart > APPEARANCE > Background map.
    • Set the color of missing regions to transparent by selecting Chart > APPEARANCE > Color of NA values > More colors > white box with the red X.
    • Set the shading so that darker reds mean a lower pass rate using Chart > DATA SERIES > Color palette > Reds, dark to light.

The map below is created using the steps above.

You can't see any of the data for Chicago in the map initially. To help the viewer see where the data is, we will zoom the map into Chicago automatically for them.

How to customize the default area shown on the geographic map

You'll need four bits of information before we get started:

Placeholder in the code | Description | Value in example
YourVisualizationName | The Name of your map in Displayr. You can find this by going to Properties > GENERAL > Name | chart.5
TheLongitude | The longitude for the center of the map | -87.6298
TheLatitude | The latitude for the center of the map | 41.8781
TheZoomLevel | The zoom level from 0 (the world) to 18 (street-level) | 9

 

We will use this information in our R code to customize the initial appearance of the map.

  1. Access the R code underlying the map via Properties > R CODE.
  2. At the bottom of the R code, load the leaflet R package with library(leaflet) to get access to the full set of functions for customizing the leaflet map.
  3. Add another line after that, to set the initial position and zoom of the map using the setView function from the leaflet R package. The syntax is as follows: YourVisualizationName <- setView(map = YourVisualizationName$htmlwidget, lng = TheLongitude, lat = TheLatitude, zoom = TheZoomLevel). For this example, this will be:

    [sourcecode language="r"]
    library(leaflet)
    chart.5 <- setView(chart.5$htmlwidget, lng = -87.6298, lat = 41.8781, zoom = 9)
    [/sourcecode]

After your map recalculates, it will now automatically be zoomed into your desired area. The final map from my example is shown below.

Tips for positioning the initial map

  • You can tweak your initial zoom level by using decimals, such as 1.5.
  • To find the latitude and longitude to center your map you can:
    1. Click on a blank spot on a Google map. A small window at the bottom of the screen will display the coordinates.
    2. Google "YourCity/State/Country lat long", and use coordinates from the results.
    3. Use a website such as https://www.latlong.net/ to find the coordinates.

Try creating one yourself with our interactive Geographic Maps tutorial

Check it out here
Using the API to Create a Regression and Save Values as a JavaScript Variable https://www.displayr.com/using-the-api-to-create-a-regression-and-save-values-as-a-javascript-variable/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/using-the-api-to-create-a-regression-and-save-values-as-a-javascript-variable/#respond Wed, 10 Apr 2019 06:43:55 +0000 https://www.displayr.com/?p=17069 ...]]> Step 1: Do everything in Getting Started with the Displayr API

Once you have done this, open up the document and it should look like this:

Step 2: Obtain the Document secret

To modify a document using the API we need to know its Document secret. This is found by following these steps:

  1. Go to the document's settings page (if in the document, click on the cog at the top right of the screen and press Document Settings)
  2. Expand out the Properties section.
  3. The document secret is located in the bottom-right corner.

Step 3: Download the regression.zip file

  1. Click here to download the zip file
  2. Double-click on it to open it
  3. Save its contents somewhere on your computer or network

The zip file contains:

  • A file called regression.QScript which contains a QScript for running the regression and creating a new variable with the predicted values
  • A file called regression.py which contains a Python script for running the QScript

Step 4: Edit and run the regression.py file

  1. Open the file in a text editor
  2. On line 20, replace insert-document-secret with the document secret (as described above)
  3. Save the file
  4. Run the regression.py script using the process in Step 6 of Getting Started with the Displayr API
  5. Check out the regression model, which has been added as a new page in your document. The variable at the top of the list under Data Sets contains the predicted values from the model.
How to Fit a Structural Equation Model in Displayr https://www.displayr.com/how-to-fit-a-structural-equation-model-in-q/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-fit-a-structural-equation-model-in-q/#respond Thu, 21 Mar 2019 07:09:20 +0000 https://www.displayr.com/?p=16732 ...]]> In this post I am going to walk through the steps of fitting a structural equation model (SEM) in Displayr. The post assumes that you already know what a SEM is and how to interpret it.

Case study

In the post I am going to analyze Bollen's famous Political Democracy data set (Kenneth Bollen (1989), Structural Equations with Latent Variables, Wiley.)

Step 1: Load the data

Typically data sets are loaded into Displayr from raw data files. But, in this case we will load some data that is stored in an R package.

  • Insert > New Data Set > R
  • Name: BollenPoliticalDemocracy
  • Paste in the code below into the R CODE box
  • Click OK

Step 2: Fit the model

The hard step is fitting the model, as this requires you to specify the measurement model, the relationships to be tested (i.e., the regressions), and the correlation structure of the model. For more information about this, please check out the lavaan website.

To do this:

  • Insert > R Output
  • Paste in the code below
  • Press Calculate

Step 3: Review the path diagram

In order to check that the model has been correctly specified it's a good idea to review the path diagram.

  • Insert > R Output
  • Paste in the code below
  • Press Calculate

Step 4: Extract the summary statistics

  • Insert > R Output
  • Paste in the code below
  • Press Calculate
  • In the Object inspector, on the right of your screen, click Properties > OUTPUT > Show as > Text
  • To align the text neatly, go to Properties > APPEARANCE and set the font to Courier New.

How to Chart Web Traffic using Google Analytics and Displayr https://www.displayr.com/how-to-chart-web-traffic-using-google-analytics-and-displayr/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-chart-web-traffic-using-google-analytics-and-displayr/#respond Mon, 28 Jan 2019 23:34:10 +0000 https://www.displayr.com/?p=13438 ...]]> The R package googleAnalyticsR has been built specifically for R users using the Google Analytics Reporting API v4. I have previously outlined the best authentication process between Displayr and the API (see How to connect Displayr to the Google Analytics API for more details), but will do a quick re-cap. Essentially what we need to do is log into Google Analytics and set up a Google project and service account then download a secret JSON key containing authentication credentials which we push through to the API via R code.

Once authentication has been set up in your R Output via Insert > R Output (Analysis Group), the next thing we need is the View ID of the website you want to pull data from. To determine your View ID, ensure you are logged into your Google Analytics account under the specific website you want to view (if you have multiple sites monitored), click Admin on the bottom left, go to the View column and click View Settings. The View ID will be visible under Basic Settings.

Call the API

In the below example, I will call four different metrics - users, new users, sessions and page views – from my website for all records last quarter split by date:

library(googleAnalyticsR)

view_id = XXXXXX # replace this with your View ID

df = google_analytics(view_id,
            date_range = c("2018-07-01", "2018-09-30"),
            metrics = c("users", "newUsers", "sessions", "pageViews"),
            dimensions = c("date"),
            max = -1)

Here I have used max = -1 so that it will pull all the data, but you can also cap this at a specific number if you wish. Once you press Calculate, you will see a result with a structure like this:

If you also want to make this call take place daily at a specific time (e.g. 9am), we can add the below lines to the top of the R Output:

library(flipTime)
UpdateAt("01-11-2018 09:00", units = "days", frequency = 1, options = "wakeup")

Data Sampling

It's important to note that for standard Google Analytics accounts, data sampling occurs when the specified date range exceeds the limit of 500k sessions at the property level (see data sampling). Google does this in order to return results faster. This means that if your API call covers more than 500k sessions, some of the returned values will be estimates rather than measured values. Of course, the number of sessions covered by an API call will depend on the popularity of your site, the specified date range, and other factors.

If you are using a wide date range it may be prudent to split the date ranges into separate calls so as to avoid hitting the session sampling threshold and then combine them together later using a simple rbind command, for example. You can easily compare the outputs with those produced by Google Analytics to ascertain the correct split.
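Here is a rough sketch of the split-and-rbind idea. So that it runs without credentials, fetch_range() is a hypothetical stand-in that fabricates one row per day; in practice its body would be a google_analytics() call with date_range = c(start, end). The monthly split is also just an assumption; choose whatever interval keeps each call under the sampling threshold:

```r
# Hypothetical stand-in for the API call so the sketch runs offline.
# In practice the body would be a google_analytics() call using
# date_range = c(start, end).
fetch_range <- function(start, end) {
    days <- seq(as.Date(start), as.Date(end), by = "day")
    data.frame(date = days, sessions = seq_along(days))
}

# Split the quarter into calendar months.
starts <- seq(as.Date("2018-07-01"), as.Date("2018-09-01"), by = "month")
ends   <- c(starts[-1] - 1, as.Date("2018-09-30"))

# One call per month, then stack the pieces into a single data frame.
pieces <- Map(fetch_range, format(starts), format(ends))
df <- do.call(rbind, pieces)
nrow(df)
```

Each date appears exactly once in the combined result because every range ends the day before the next one starts.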

Another option is to use the anti_sample = TRUE setting in your API call, but it won't work in every situation. If you click Show raw R output under OUTPUT on the Object Inspector when using this option, you can read logs outlining how much sampling is taking place. By default, anti-sampling already exports all records so you don't need to set a value of max. Not using anti-sampling will also allow you to use date shortcuts such as "90daysAgo" or "yesterday" for both start and end date. Otherwise, you will need to specify the exact dates. For a list of all the metrics and dimensions you can call via the API, see API names.

Visualize your data

Now that we have the data as a table, we can hook this up to one of Displayr's cool visualizations. I have chosen the area chart (Insert > Visualization (Analysis group) > Area Chart) which is essentially a line chart with the background colored in. I just need to select the R output under DATA SOURCE > Output in 'Pages' on the Inputs tab of the Object Inspector and change some settings.

First, I will tick Show as small multiples (panel chart) to split this into separate charts for each metric, then I will add a smoother line on the Chart tab under TREND LINES > Line of best fit. I've chosen Friedman's super smoother, changed Line type to dot and ticked Ignore last data point.

You can make an area chart for free using Displayr's area chart maker! Plus now that you know how to link up Google Analytics to Displayr, you can use your own website data! 

 

How to Blank Cells with Small Sample Sizes using R in Displayr https://www.displayr.com/how-to-blank-cells-with-small-sample-sizes-using-r-in-displayr/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-blank-cells-with-small-sample-sizes-using-r-in-displayr/#respond Fri, 18 Jan 2019 04:35:03 +0000 https://www.displayr.com/?p=14843 ...]]> In this post, I explain how you can automatically modify the contents of tables using a secondary R Output. In doing so, we give you a template for some simple R code that you can flexibly use whatever your scenario.

Cell modification with R, a recap

In "How to Blank and Cap Cells of Tables Using R in Displayr", I explained how you can modify the cells of a table in an R Output by using a condition. The condition then becomes the subset of the table you are modifying. It works like this:

table[condition] = value

In English, the square brackets specify a subset of a table. When the condition evaluates to TRUE, then we're manipulating just that subset of the table. Using the equals sign, it sets that subset to be equal to a new value. In the case of blanking cells, that value is NA (which stands for a missing value).

Note: In either case, you need to put in an extra line of code, which is just the name of the table (‘table’ here). This returns the final table with the substituted values (and not just the value). This line is included as the last line of code in the examples below.
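A tiny worked example of table[condition] = value, using a made-up table and a simple "less than 10" condition, shows what the condition looks like on its own and why the last line is needed:

```r
table <- matrix(c(12, 45, 3, 60, 8, 30), nrow = 2,
    dimnames = list(c("Male", "Female"), c("Coke", "Pepsi", "Fanta")))

# The condition is a logical matrix of the same shape as the table.
condition <- table < 10
condition

# Only the cells where the condition is TRUE are overwritten.
table[condition] = NA

# Final line returns the whole modified table, not just the value.
table
```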

How to blank cells with small sample sizes

Now, to get R to blank the cells of a table with small sample sizes, the code needs to reference the sample size for each figure. There are a couple of different ways to give this information to R. I cover one way below and describe an alternative at the end of the post.

I like to have a source table that has both the values and the sample size within each cell. In the grid summary table below, I’ve specified both % and Base n as statistics.

Q5 Brand Image table with sample size

This table is named table.Q5. Putting the following code in an R Output (Insert > R Output) will blank all the cells with a base n less than 75.

x = table.Q5
y = 75
values_tab = x[,,"%"]
base_tab = x[,,"Base n"]
values_tab[base_tab < y] = NA
values_tab

The first line is specifying the source table. The second line is specifying our threshold for small sample size. The third line creates a table that only has the values (% in this case). The fourth line produces a table of just the base. This is the basis of the condition (next line). The fifth line is the key that pulls it all together. It basically says "if the base is less than the threshold of 75 in the table, then substitute with a missing value (NA)". The sixth line just returns the new table of values (freshly substituted). So the end result is the below:

Blanked cells table
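If you'd like to see the shape of a table that carries two statistics per cell, the sketch below builds a small made-up stand-in for table.Q5 as a three-dimensional array (rows, columns, statistics) and runs the same logic on it. The labels and numbers are invented:

```r
# Toy 2 x 2 x 2 array: rows, columns, then statistics (made-up values).
x <- array(c(40, 55, 30, 65,      # "%" slice
             120, 60, 80, 200),   # "Base n" slice
    dim = c(2, 2, 2),
    dimnames = list(c("Coke", "Pepsi"), c("Fun", "Healthy"),
                    c("%", "Base n")))
y <- 75

values_tab <- x[, , "%"]
base_tab <- x[, , "Base n"]
values_tab[base_tab < y] <- NA
values_tab
```

Only the Pepsi/Fun cell has a base under 75 in this toy data, so it is the only one blanked.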

Adapting the code - having a separate table of values and base size

If you’re borrowing the above code, be sure that you’ve got the correct statistics in the source table. For example, the base n in a cross-tab is different from the column n. The column n is what you use to derive column-%’s. Remember, in multi-variable questions (such as a Pick Any), the base n or column n could vary by row (or column). In the worked example above, each % in the cells of the source table was a separate binary variable (grouped into a Pick Any - Grid), so had its own base n.

You don’t have to use just one source table to house all your reference statistics. You could have the statistics in separate source tables, but you’d need to adjust the code accordingly, a bit like the below (where lines 1 and 2 refer to different tables in the document).

values = table.Q5
base = table.Q5.base
y = 75
values[base < y] = NA
values

Be aware that the tables need to overlap exactly in terms of the order of their rows and columns. That’s why I prefer to use just the one source table (and extract what you need from that) wherever possible.
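If you do go with two tables, one possible safeguard (my own suggestion, sketched with made-up data) is to reorder the base table by label and check that the labels line up before applying the condition:

```r
values <- matrix(c(40, 55, 30, 65), nrow = 2,
    dimnames = list(c("Coke", "Pepsi"), c("Fun", "Healthy")))
# Same labels as `values`, but rows and columns are in a different order.
base <- matrix(c(200, 80, 60, 120), nrow = 2,
    dimnames = list(c("Pepsi", "Coke"), c("Healthy", "Fun")))

# Reorder the base table by label so each cell lines up with `values`.
base <- base[rownames(values), colnames(values)]
stopifnot(identical(dimnames(values), dimnames(base)))

y <- 75
values[base < y] <- NA
values
```

If a label in the values table is missing from the base table, the reordering step stops with a subscript error, which is usually preferable to blanking the wrong cells.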

And of course, you can fiddle with the code to produce a different outcome. For instance, you can set all the cells to 0 instead of NA if you prefer.

Try it yourself

The worked example is in this Displayr document, so you can see the code in action.

How to Sort your Data with R in Displayr https://www.displayr.com/how-to-sort-your-data-with-r-in-displayr/?utm_medium=Feed&utm_source=Syndication https://www.displayr.com/how-to-sort-your-data-with-r-in-displayr/#respond Tue, 15 Jan 2019 06:07:53 +0000 https://www.displayr.com/?p=13940 ...]]> But there may be situations when custom automatic sorts will require you to fiddle with the underlying R CODE. Below, we discuss a couple of examples showing how you can add a line of R code to your R Outputs to get them sorting automatically. We hope to shed light on the one line of code needed, so you can then adapt it to your needs. Make sure you check out "How to do Simple Table Manipulations with R in Displayr" if you haven't yet, as this post assumes some knowledge from that post.

How you can sort data in Displayr without touching any code

For many of the R-based features in the Insert menu (mainly Visualizations), we’ve actually got the option to sort rows within the Inputs panel of the Object Inspector. So, the R Output interprets the source table as though it’s being sorted before the output is actually drawn.

Sorting rows

When you may like to sort data via R code

One scenario where you may need to get into the R CODE to do the sorting is when you’re making your own custom table in an R Output. Examples might include a table that’s a KPI summary, a brand index matrix or any calculation/compilation. You only need to add a line of code at the end to keep the table sorted automatically. For example, consider the table below, which is the brand funnel built by R Code (as explained in this post).

Brand funnel

By including line 7 in the code used to build the table, it will sort automatically.

Sorting rows

Another scenario is that you’ve used one of Displayr's built-in tools for joining tables (such as Home > Tables > Merge Two Tables), and you want to sort the final output. You can do that by going to Properties > R CODE in the Object Inspector of the output. For example, the table below was created using the menu item Insert > Tables > Merge Two or More Tables:

MergedTables

And then by going into Properties > R CODE in the Object Inspector, I added line 5 below. Notice what happens to the output:

Sorting

Understanding the magic line of R Code

The R Code looks complicated, but once you break it down, the logic of it isn’t that hard to get your head around. It just looks convoluted. The basic example (which you can use as a template) for a crosstab looks like this:

table[order(table[,column], decreasing = TRUE),]

Note that “table” is the name of the table (data frame or matrix in R lingo) you wish to sort within the R Output and “column” is the column you’re referencing. I put them in blue so they stand out as the key bits you need to adapt.

The first bit to understand is that you can give an array of indexes to R via the square brackets and it will sort the table for you. Let’s say, I had the following which is from a table with a reference name of tabQ3:

Table Q3

The order of indexes of the rows from highest to lowest is 7,1,3,6,2,4,5.

We feed that as an array in a table subset (with square brackets). I use the c() combining function to put the numbers together.

table = tabQ3
table[c(7,1,3,6,2,4,5)]

Table3 sorted

So how then do we get that list of indexes without doing it manually as I did above? With the order() function. The combining function c(7,1,3,6,2,4,5) is the same as writing order(table, decreasing = TRUE). Putting that into the table subset, it then becomes table[order(table, decreasing = TRUE)]. Yes, I know there are brackets within brackets of different types.  You need the decreasing = TRUE bit otherwise R will sort in ascending order (which you may want).
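To see order() produce the list of indexes for you, here is a small sketch on a made-up one-dimensional table (a named vector). The brands and percentages are invented:

```r
# Toy one-dimensional table (named vector) with made-up percentages.
table <- c("Coke" = 30, "Diet Coke" = 12, "Pepsi" = 25, "Fanta" = 8)

idx <- order(table, decreasing = TRUE)
idx            # positions of the rows from highest to lowest
table[idx]     # same as table[c(1, 3, 2, 4)]
```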

The above example is with a single-column table, so it's one dimensional. If you have two dimensions, then you need an extra comma when you reference the table (if that doesn't make sense, then check out this introductory post). The below sorts a crosstab of Preferred Cola (rows) by Age (columns) on the first age category. The first line of the code is simply to store the reference as an object called 'table' within the R Output.

table = table.Q3.Preferred.cola.by.D1.Age
table[order(table[,"18 - 29"], decreasing = TRUE),]

Crosstab sorted

As I mentioned earlier, to someone new at R, line 2 of the code seems convoluted. But hopefully, my step-by-step explanation of subsetting a table by means of an array of indices untangles this for you. Remember, you can source this line of code and adapt it to your context.

Test yourself: how would you sort the same crosstab above by rows instead? Say by Coca-Cola?

(Answer = table[,order(table["Coca-Cola",], decreasing = TRUE)])
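Both directions of the sort can be checked on a toy crosstab like the one below. The brands, age bands and values are made up for illustration:

```r
table <- matrix(c(20, 35, 10,    # "18 - 29"
                  30, 15, 25),   # "30 - 49"
    nrow = 3, dimnames = list(
        c("Coca-Cola", "Pepsi", "Fanta"),
        c("18 - 29", "30 - 49")))

# Sort rows by the "18 - 29" column, highest first.
by.column <- table[order(table[, "18 - 29"], decreasing = TRUE), ]
rownames(by.column)

# Sort columns by the "Coca-Cola" row, highest first.
by.row <- table[, order(table["Coca-Cola", ], decreasing = TRUE)]
colnames(by.row)
```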

Have a look for yourself

In this Displayr document, I’ve got the worked examples from above. So you can go in and have a look (and a play!)
