JavaScript How To... - Displayr

Creating and Working with JavaScript Variables

Kris Tonthat — Mon, 16 Dec 2019 16:23:31 +0000

It’s easier than you think to use basic JavaScript to create new variables! The following worked example in Displayr shows you how to combine, split, and transform data into new variables using conditional ‘if’ statements and Boolean logic (“and”, “or”).

When to create JavaScript variables instead of R variables in Displayr

Just like R variables, JavaScript variables are updated automatically in the project, and you can easily go back and edit them if you want to tweak the code. There are two key differences between JavaScript and R to keep in mind. JavaScript code is run in your browser, and the language was originally designed for scripting websites. R code requires data to be packaged up and processed on our R servers, and the language was specifically designed for basic as well as advanced analysis. Usually there isn't a noticeable difference between creating a basic JavaScript variable and creating the same variable in R. However, if your data is quite large (think thousands of cases) and you're doing a simple manipulation on it, JavaScript may be faster. If your new variable requires more advanced calculations and coding, it may be easier, or only possible, to create it using R.

How to create JavaScript variables in Displayr

Create a JavaScript variable via the Ribbon in Displayr (Insert > JavaScript > Numeric Variable). You have a choice of creating a Numeric or Text variable. For simplicity, let’s consider only numeric variables for now.

A new variable will appear in the Data Sets panel, and the JavaScript expression can be entered into the JAVASCRIPT CODE box in the Object Inspector.

You need to write (or paste) a JavaScript expression using variable Names (not labels). Hovering over a variable in the Data Sets panel will show the variable name. You can also click-and-drag the variable into the text box to paste the variable name.

A worked example

In our example, we have collected consumption data on six different sub-types of cola. This forms six different numeric variables. The following JavaScript expression finds the sum of the variables with JavaScript: q2a_1 + q2a_2 + q2a_3 + q2a_4 + q2a_5 + q2a_6.

These are referred to as the input variables. You can see a preview of the new variable beside the JavaScript Code box and check that your expression is working correctly.

You can manually check the first few lines to confirm that your JavaScript expression works as expected.
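The same manual check can be reproduced outside Displayr. The sketch below is plain JavaScript with invented respondent rows (the values are hypothetical and not taken from the example data); inside Displayr only the one-line sum expression is needed, and the function wrapper is there purely so the code runs on its own.

```javascript
// Hypothetical respondent rows. The q2a_* names follow the article;
// the values are made up for illustration.
const rows = [
  { q2a_1: 2, q2a_2: 0, q2a_3: 1, q2a_4: 0, q2a_5: 3, q2a_6: 0 },
  { q2a_1: 5, q2a_2: 4, q2a_3: 0, q2a_4: 2, q2a_5: 0, q2a_6: 1 },
  { q2a_1: 0, q2a_2: 0, q2a_3: NaN, q2a_4: 0, q2a_5: 0, q2a_6: 0 },
];

// Same arithmetic as the Displayr expression, applied to one row.
function sumCola(r) {
  return r.q2a_1 + r.q2a_2 + r.q2a_3 + r.q2a_4 + r.q2a_5 + r.q2a_6;
}

const sums = rows.map(sumCola);
console.log(sums); // [ 6, 12, NaN ]
```

The third row shows that a single missing input (NaN) makes the whole sum missing.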

Banding a numeric variable

Suppose the expression in the above example resulted in a new numeric variable that we called “sumCola”. Now we want to create an additional variable that allocates people to one of three categories based on their total cola consumption (Light Drinkers – fewer than 5 colas, Medium Drinkers – 5 to 10 colas, and Heavy Drinkers – more than 10 colas).

The if command allows you to do this:

if (sumCola < 5) 1;
else if (sumCola >= 5 && sumCola <= 10) 2;
else if (sumCola > 10) 3;
else NaN;

The code can be interpreted as follows:

if (sumCola < 5) 1;

If the variable sumCola is less than 5, return a value of 1.

else if (sumCola >= 5 && sumCola <= 10) 2;

If the first expression above does not apply, and if the variable sumCola is greater than or equal to 5 and (represented by &&) less than or equal to 10 (i.e., falls between 5 and 10), return a value of 2.

else if (sumCola > 10) 3;

If the first two expressions above do not apply, and if the variable sumCola is greater than 10, return a value of 3.

else NaN;

Consider any other values as missing. NaN stands for Not-A-Number (i.e., missing or blank).
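To check the band edges, the same logic can be wrapped in a function and run outside Displayr. This is a minimal sketch: the function name bandCola and the test values are assumptions made for illustration, and the return statements are only needed because the code sits inside a function.

```javascript
// The article's if / else if chain, wrapped in a function so it can be
// run outside Displayr (bandCola is a hypothetical name).
function bandCola(sumCola) {
  if (sumCola < 5) return 1;                          // Light Drinker
  else if (sumCola >= 5 && sumCola <= 10) return 2;   // Medium Drinker
  else if (sumCola > 10) return 3;                    // Heavy Drinker
  else return NaN;                                    // missing input
}

// Check the edges of each band, plus a missing value.
const checks = [0, 4, 5, 10, 11, NaN].map(bandCola);
console.log(checks); // [ 1, 1, 2, 2, 3, NaN ]
```

Every comparison involving NaN is false, so a missing sumCola falls through all three tests and reaches the final else.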

The newly created variable will have the values of 1, 2, or 3 (and perhaps NaN). Because we want this to act as a categorical variable, we have to add two extra steps. Click on the variable and make the following changes in the Object Inspector:

Change the Structure to Nominal: Mutually exclusive categories.
Click the Labels button and give the variable more descriptive labels (e.g., 1 = “Light Drinker”, 2 = “Medium Drinker” and 3 = “Heavy Drinker”).

You can create the same bandings by merging columns in a table. However, when you import new data, new values may need to be manually merged into one of the three categories. The beauty of using the JavaScript variable is that it automatically allocates new results to the appropriate category.

Instead of having to create a separate sumCola variable just to be used in your JavaScript variable, you can also combine the two into a single variable. This can be done by creating variables within the JavaScript variable code. In the following example, we create a new variable within the JavaScript variable called x, and use it in our if-else logic:

var x = q2a_1 + q2a_2 + q2a_3 + q2a_4 + q2a_5 + q2a_6;
if (x < 5) 1;
else if (x >= 5 && x <= 10) 2;
else if (x > 10) 3;
else NaN;

Combining different weighting variables

In the above examples, the JavaScript variables are returning constant values. But JavaScript variables can return the values of other variables as well. A common application is the need to combine the weights for various segments into one variable.

Consider a situation where we’ve collected data for four countries: UK, France, Australia, and Japan. The data for each country was held in a separate data file, and the four files were merged into a final file. Membership of a particular segment (market) is defined by a variable called country. Each country’s data file had a specific weighting variable, so we now have four different weighting variables in the merged data file (weight_UK, weight_FR, weight_AU, weight_JP). When we do our analysis, we only want to deal with one weighting variable. So let’s combine them using JavaScript:

 
if (country == 1) weight_UK; 
else if (country == 3) weight_FR;  
else if (country == 7) weight_AU;
else if (country == 4) weight_JP; 
else NaN;

You’ll notice that there are two consecutive equals signs in the above expression. You need both of them because a single equals sign has a different role in JavaScript. In short, a single equals sign is used for assignment while the double-equals sign tests for equality.

When setting a variable as something (also called assigning a value), as in the example above with the variable called x, use the single equals sign. When testing for equality, as in the example above with the expression country == 3, use the double-equals sign.
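The difference can be seen in a few lines of plain JavaScript. This is only an illustrative sketch; the variable names isFrance and isJapan are hypothetical.

```javascript
// Single equals assigns: x now holds the value 3.
var x = 3;

// Double equals compares and yields true or false without changing x.
var isFrance = (x == 3); // true
var isJapan = (x == 4);  // false

console.log(x, isFrance, isJapan); // 3 true false
```

Writing if (x = 4) by mistake would assign 4 to x instead of testing it, which is why the doubled sign matters in conditions.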

You may also notice that the values of the country variable are not sequential. That’s because, in our hypothetical example, the variable country has values of 1 = UK, 2 = Germany, 3 = France, 4 = Japan, etc.
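For readers who want to test the weight logic outside Displayr, here is a minimal sketch that uses a lookup table instead of the if-else chain. The respondent rows and weight values are invented, the names weightVariable and combinedWeight are hypothetical, and the code 7 = Australia is taken from the expression above.

```javascript
// Maps each country code to the name of its weight variable.
// Codes follow the article: 1 = UK, 3 = France, 7 = Australia, 4 = Japan.
const weightVariable = { 1: "weight_UK", 3: "weight_FR", 7: "weight_AU", 4: "weight_JP" };

function combinedWeight(row) {
  const name = weightVariable[row.country];
  // Unknown or missing country codes give a missing weight.
  return name === undefined ? NaN : row[name];
}

// Hypothetical merged data: each row only has a weight for its own country.
const respondents = [
  { country: 1, weight_UK: 1.2, weight_FR: NaN, weight_AU: NaN, weight_JP: NaN },
  { country: 7, weight_UK: NaN, weight_FR: NaN, weight_AU: 0.8, weight_JP: NaN },
  { country: 2, weight_UK: NaN, weight_FR: NaN, weight_AU: NaN, weight_JP: NaN },
];

console.log(respondents.map(combinedWeight)); // [ 1.2, 0.8, NaN ]
```

The third row has country code 2 (Germany), which has no weight variable in this example, so it ends up with a missing weight, just as the final else NaN does in the Displayr expression.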

Making a date variable from a categorical variable

In Displayr, JavaScript (like R) has a suite of ready-made functions that can help you do great things. One such function is Q.EncodeDate(). It will enable you to return a number in a format that Displayr can, in turn, recognize as a date question. When used in conjunction with conditional statements, you can turn a categorical variable (e.g. period or wave variable) into a date question. Date questions have numerous benefits in Displayr, such as automatic aggregation of time, testing (significance) against the previous period, specific filter functions, and more.

Suppose you have a variable in your data file called week that encodes the week of the survey in categorical format. As a result, it just encodes values: 1 = week one, 2 = week two, 3 = week three, and so on. You can turn this into a date question, with the first week starting on 1 January 2018, using an expression like the one below. Notice that Q.EncodeDate takes its arguments in the order year, month, day (YYYY,MM,DD).

 
if (week == 1) Q.EncodeDate(2018,01,01); 
else if (week == 2) Q.EncodeDate(2018,01,08); 
else if (week == 3) Q.EncodeDate(2018,01,15); 
else if (week == 4) Q.EncodeDate(2018,01,22); 
else if (week == 5) Q.EncodeDate(2018,01,29); 
else if (week == 6) Q.EncodeDate(2018,02,05); 
else NaN;

Then change the variable structure to Date/Time to use it as a date variable set.
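The weekly start dates in the expression follow a simple pattern: each week begins seven days after the previous one. The sketch below illustrates that arithmetic with the standard JavaScript Date object. It does not use Q.EncodeDate, which exists only inside Displayr, and the function name weekStart is hypothetical.

```javascript
// Returns the start date of a survey week, counting week 1 as
// 1 January 2018. Months are 0-based in the standard Date API,
// so 0 means January.
function weekStart(week) {
  if (!Number.isInteger(week) || week < 1) return null; // treat as missing
  // Date.UTC normalises out-of-range days, so "day 36 of January"
  // rolls over into February.
  return new Date(Date.UTC(2018, 0, 1 + 7 * (week - 1)));
}

const starts = [1, 2, 6].map(w => weekStart(w).toISOString().slice(0, 10));
console.log(starts); // [ '2018-01-01', '2018-01-08', '2018-02-05' ]
```

The results for weeks 1, 2, and 6 match the dates written out by hand in the expression above.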

Working with JavaScript text variables

JavaScript text variables are useful if you want to join the results of different open-ended variables. The following expression joins three spontaneous brand awareness questions into a single text variable and inserts punctuation. It is akin to the concatenate function in Excel.

q1a_1 + ", " + q1a_b + ", " + q1a_c + "."
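One thing to watch with a plain + expression is that blank answers still get their commas, giving results like "Coke, , .". The sketch below shows one possible way to skip blanks. It assumes that unanswered questions arrive as empty strings, and the function name joinBrands and the brand answers are invented for illustration.

```javascript
// Joins the non-blank answers with ", " and ends with a full stop.
// Returns an empty string when nothing was answered.
function joinBrands(answers) {
  const given = answers.map(a => a.trim()).filter(a => a !== "");
  return given.length === 0 ? "" : given.join(", ") + ".";
}

console.log(joinBrands(["Coke", "Pepsi", "Fanta"])); // Coke, Pepsi, Fanta.
console.log(joinBrands(["Coke", "", ""]));           // Coke.
```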

If you aim to create a categorical variable, as in the first example with Light-Heavy drinkers, then you can use a text JavaScript variable to generate the labels. Displayr automatically sets up value labels when converting a text variable to a categorical variable.

If you create a numeric variable, you may need to manually assign labels to the values (e.g., 1 = Segment A, 2 = Segment B) as per the first example in this post. The banding example (Light, Medium, and Heavy cola drinkers) can be rewritten as a text variable with the following:

if (sumCola < 5) "Light Drinker";
else if (sumCola >= 5 && sumCola <= 10) "Medium Drinker";
else if (sumCola > 10) "Heavy Drinker";
else "";

Tips when working with code

Writing in computer code needs to be exact – you need to pay attention to:

The presence or absence of semi-colons: they signal that you’re ready for the next bit of the expression.
The capitalization of letters: case-sensitivity means that “if” does not mean the same as “If” or “IF”.
The right number of brackets (and where they open/close)
Whether brackets are (round) or {curly} or [square].
When symbols are doubled: double ampersands (&&) for “AND”, double-equal signs (==) for equality, and double pipes (||) for "OR".
Missing values in your input variables: they can lead to missing results if not handled correctly. There is more elaborate code you can use to make conditional statements around missing values.
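As an illustration of the last tip, the sketch below sums a set of values while skipping missing ones. It is one possible rule rather than Displayr's built-in behaviour, the function name sumIgnoringMissing is hypothetical, and whether skipping missing values is appropriate depends on the survey.

```javascript
// Sums the values, skipping missing ones (NaN). Returns NaN only when
// every input is missing.
function sumIgnoringMissing(values) {
  const present = values.filter(v => !isNaN(v));
  if (present.length === 0) return NaN;
  return present.reduce((total, v) => total + v, 0);
}

console.log(sumIgnoringMissing([2, NaN, 3])); // 5
console.log(sumIgnoringMissing([NaN, NaN]));  // NaN
```

By contrast, 2 + NaN + 3 evaluates to NaN, which is how a single missing input turns into a missing result.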

How to Band Numeric Variables in Displayr

Matt Steele — Tue, 25 Jun 2019 02:13:52 +0000

Let's say you are asking survey respondents for an absolute number (eg: how many colas have you consumed in the past week?) or a point on a set scale (eg: what proportion of staff are female? Please type in a number from 0 to 100). It’s not uncommon to want to band up the range of potential inputs into categories (eg: 0-5, 6-10, 11+) for analysis purposes.

The purpose of this article is to show you the options you have for creating a banded (categorical) version of a variable, using both drag-and-drop and code methods (R and JavaScript).

Checking the Variables: Structure and Values

These variables I just described are normally read into Displayr as numeric variables. That is what you would expect of a good data collection platform that only accepts a numeric input. For numeric data, the values and labels are one and the same. A variable’s structure is indicated by the icon next to the variable in the Data tree, but also in its Object Inspector under INPUTS > Structure.

Displayr reads these variables as nominal or ordinal if there is text involved with the value label (eg: 0 – Not at all satisfied and 10 – Extremely Satisfied are the endpoints of your scale). If that is the case, it is prudent to check the Values so that they align correctly with the labels. The Values button is just under the Structure dropdown in the Object Inspector. You don’t want a value of 1 ascribed to 0 – Not at all satisfied, and so forth (it should be a value of 0). You should change it so that the correct value aligns with the label.

Displayr will interpret these variables as text if there are spaces or other characters involved in the data. This is one key reason why Excel/CSV files are a poor file format for survey data. If you have it as a text variable, you can change the variable structure to numeric, but it can’t be guaranteed that all non-numeric information will be correctly converted into numeric values. So you may need to manually format and clean your text variable in the Excel/CSV file (ie: remove all non-numeric characters that could be ‘polluting’ the variable).

Banding by drag-and-drop

Banding via drag-and-drop is the easiest way to band your variable and makes the most sense if you are unlikely to update your data file with new data.

If your variable is numeric, use Home > Duplicate to create a copy of it, and then change the structure of the copied variable set to nominal (or ordinal). You change the structure in Object Inspector > INPUTS > Structure.

Drag your categorical variable on to the page to make a table. Then select all the categories you want to band together (using Ctrl or Shift) and use Data Manipulation > Merge to merge them into a category. At this point, it’s prudent to use Data Manipulation > Rename to give the banding a correct label.

And that’s it! The main drawback to using this method is that if you update your data file with fresh data (eg: more respondents) then you may end up with a category that is not in a band. For example, suppose you band up 1, 2, 3, 4, 6, 7, 9, and 10 into a band "1-10" using drag-and-drop. If you then update your data file, you may have new cases which provide a score of 5 or 8. These new values have not been included in the "1-10" band, and you’ll have to manually merge them in (by repeating the process above). To get around this, use one of the code-based methods below.

Banding via R variable

Banding with code is a sure-fire way of ensuring that all potential values within a range end up in the correct bands. The code is very simple to implement and doesn't require extensive R knowledge (just copy the template below). You can flexibly change the bands later by tweaking the code.

Suppose your variable has the label Q2 - No. of Coca-Cola consumed. It will also have a variable name (Q2_a). In R code in Displayr, you can refer to the variable by either its label or its name. The variable name is revealed in the Object Inspector > Properties > General and also by hovering over the variable in the Data Set tree.

Insert > R Variable

Over in the R CODE window in the Object Inspector you can write a simple IF and ELSE IF statement. What I like about R CODE is that you can drag a variable from the Data Sets tree directly into the R Code box, and give it the convenient label ‘x’ (or whatever). So the first line of code looks like: x = `Q2 - No. of Coca-Cola consumed`

Then you can easily set up your bands by referring to ‘x’, as below:

x = `Q2 - No. of Coca-Cola consumed`
x[x == 0] = 0
x[x >= 1 & x <= 5] = 1
x[x >= 6 & x <= 10] = 2
x[x > 10] = 3
x

You can augment and adjust the code above to suit your banding needs, adding as many lines as you like. See here for a guide to IF and ELSE IF statements in R.

In the above, I've given the new bands values of 0,1,2, and 3 respectively. You could make these whatever you want. Once you've made the variable you will need to adjust the label for each Value, which you can do by going to the Values Attributes window using the Values button in the Object Inspector.

Banding via JavaScript

For those of you who prefer to use JavaScript, the process is very similar to the R variable, except that the code is a little different. It uses if and else if statements. In the R example above, you can drag and drop the variable and/or use the variable label in the code. With JavaScript, you need to use the variable name, as per the first line of the code below. The variable name is revealed in the Object Inspector of the variable (under Properties) and by hovering your mouse over the variable in the Data Set tree. So your code could look like this:

var x = q2a_1;
if (x == 0) 1;
else if (x >= 1 && x <= 5) 2;
else if (x >= 6 && x <= 10) 3;
else if (x > 10) 4;
else NaN;
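If you expect to change the bands often, the cut-offs can be kept in a list rather than spread across several else if lines. The following is a minimal sketch in plain JavaScript, not a Displayr feature: the function name band and the array upperBounds are hypothetical, and it assumes the input is a non-negative count.

```javascript
// upperBounds[i] is the largest value that falls in band i + 1.
// Values above the last bound fall in the final band.
function band(x, upperBounds) {
  if (typeof x !== "number" || isNaN(x)) return NaN; // missing stays missing
  for (let i = 0; i < upperBounds.length; i++) {
    if (x <= upperBounds[i]) return i + 1;
  }
  return upperBounds.length + 1;
}

// Bands from the article: 0, 1-5, 6-10, and more than 10.
const bounds = [0, 5, 10];
console.log([0, 1, 5, 6, 10, 11].map(v => band(v, bounds))); // [ 1, 2, 2, 3, 3, 4 ]
```

One difference from the if-else chain is that a non-integer such as 5.5 lands in band 3 here, whereas the chain above would leave it unbanded because it falls between 5 and 6.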

Try for yourself

The examples in this post are in this Displayr document. The variables are at the top of the Data tree.

How To Convert Text Dates To Numeric

Mathew McLean — Tue, 12 Jun 2018 14:52:49 +0000

Formatting and cleaning data is a crucial and often time-consuming step in any data analysis. One frequent step in this process involves converting text dates to numeric values. In this article, I’ll be discussing at a high level how to do this conversion and why it is important.

We’ve all found ourselves needing to share data between systems only for it to fail because dates were formatted incorrectly in one program and couldn’t be read by the other. One tool may output dates that are in American format with the month coming before the day while your intended analysis tool expects dates to be in international format with day before month.

Given messy date text in a data file, we need to be able to read it in and work with it in some later step in our analysis. Why is it important to not simply leave the text dates as text? Because we frequently need to perform operations with the dates. Another reason it is important to convert the text dates is to ensure our code is as safe and efficient as possible. Each program will have certain features that can be used with text values and other features that work specifically with dates. For example, attempting to add “1” to “2018-10-31” does not make much sense to a program that only knows “2018-10-31” is a text value. However, if it treats that value as a date, it may know to add one day and output “2018-11-01”.
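The "add 1" example can be tried directly in JavaScript. This is an illustrative sketch using only the standard Date object; the variable names are arbitrary.

```javascript
const text = "2018-10-31";

// As text, + joins characters rather than adding a day.
const asText = text + 1; // "2018-10-311"

// As a date, one day can be added and the month rolls over.
const date = new Date(text + "T00:00:00Z"); // parsed as midnight UTC
date.setUTCDate(date.getUTCDate() + 1);
const nextDay = date.toISOString().slice(0, 10); // "2018-11-01"

console.log(asText, nextDay);
```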

Common tasks we would like to perform with dates include displaying dates in a particular format for charts and tables, extracting specific parts of a date, aggregating data by date, or filtering data that falls into a certain time period. Fortunately, every popular data analysis software offers some features to simplify these tasks. However, it can be a hassle depending on the software and how messy the data file is. Consider the following table of data.

Most programs will have routines that recognize the dates in the column “Purchase date”, but the “Ship date” column, with its inconsistent and less common formats, will cause more difficulties. It is easy for different programs to get tripped up by these differing formats. The program reading the dates in will have to worry about things like whether the first day of the month is written “1” or “01”, whether months and days are separated by “/” or “ ”, leap years, etc. Some of the most common formats include DD/MM/YYYY (e.g. 31/01/2018), DD/Mon/YYYY (e.g. 31/Jan/2018), and YYYY-MM-DD (e.g. 2018-01-31).
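To give a feel for what such a routine has to do, here is a small sketch that recognizes only the three formats named above and returns a standard JavaScript Date. The function name parseDate is hypothetical, English three-letter month abbreviations are assumed, and real data would need many more cases.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Returns a UTC Date, or null if the text matches none of the three
// formats or is not a real calendar date.
function parseDate(text) {
  let y, m, d, match;
  if ((match = /^(\d{4})-(\d{1,2})-(\d{1,2})$/.exec(text))) {          // YYYY-MM-DD
    y = Number(match[1]); m = Number(match[2]); d = Number(match[3]);
  } else if ((match = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text))) {  // DD/MM/YYYY
    d = Number(match[1]); m = Number(match[2]); y = Number(match[3]);
  } else if ((match = /^(\d{1,2})\/([A-Za-z]{3})\/(\d{4})$/.exec(text))) { // DD/Mon/YYYY
    d = Number(match[1]);
    m = MONTHS.indexOf(match[2].toLowerCase()) + 1; // 0 if the month is unknown
    y = Number(match[3]);
  } else {
    return null;
  }
  const date = new Date(Date.UTC(y, m - 1, d));
  // Reject impossible dates such as 31/02/2018, which Date.UTC would roll over.
  const valid = date.getUTCFullYear() === y &&
                date.getUTCMonth() === m - 1 &&
                date.getUTCDate() === d;
  return valid ? date : null;
}

console.log(parseDate("31/Jan/2018").toISOString().slice(0, 10)); // 2018-01-31
console.log(parseDate("31/02/2018"));                             // null
```

Note that the sketch reads 01/02/2018 as 1 February. A file in American format would need the day and month swapped, which is exactly the ambiguity described earlier.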

The presence of time stamps causes considerable additional headaches. With time stamps, a program needs to worry about time zones, whether times are in 12- or 24-hour format, whether seconds or fractions of seconds are present, leap seconds, plus many more hassles. Some examples of common formats for the timestamps are “H:M:S” (e.g. 18:10:34), “H:M pm/am” (e.g. 12:34 pm), and “H:M:S timezone” (e.g. 02:54:30 AEST).
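Even the 12-hour format on its own has edge cases, as the following sketch suggests. It converts “H:M pm/am” text to minutes after midnight; the function name minutesSinceMidnight is hypothetical, and time zones and seconds are deliberately left out.

```javascript
// Converts "H:M am/pm" text to minutes after midnight.
// Returns null if the text does not match that format.
function minutesSinceMidnight(text) {
  const match = /^(\d{1,2}):(\d{2})\s*(am|pm)$/i.exec(text.trim());
  if (!match) return null;
  let hour = Number(match[1]);
  const minute = Number(match[2]);
  if (hour < 1 || hour > 12 || minute > 59) return null;
  const isPm = match[3].toLowerCase() === "pm";
  hour = (hour % 12) + (isPm ? 12 : 0); // 12 am becomes 0, 12 pm stays 12
  return hour * 60 + minute;
}

console.log(minutesSinceMidnight("12:34 pm")); // 754
console.log(minutesSinceMidnight("12:05 am")); // 5
```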

We don’t want to have to worry about all these issues ourselves with each analysis, so we rely on features of the data analysis tool to handle this for us. Examples in popular programs include the DATEVALUE function in Excel, strptime in R, the Date object and its methods in JavaScript, and the datetime module in Python.