Data Transformation with OpenRefine

Data, Data transformation, Tutorial

Data-Transformation-Open-Refine-Web-Scraper-Blog-Image

With cleaner data, we can begin to transform it. Data transformation can manifest in different forms. It can be clustering, merging, adding information, replacing strings, and so on. OpenRefine covers them all. 


To watch the video-tutorial visit our YouTube channel.

OpenRefine supports clustering. A useful feature when it comes to data transformation of misspelled records. It takes data from a column and clusters it allowing the user to bundle possibly-similar groups together. For example, if there are many entries as “Central Park”, and a bunch of entries as “central park”. Data under each would be bundled as different groups; however it is clear that the only difference between both is the lower and upper cases of the titles. For easier data analysis we would need to combine the two groups by clustering.

To check if there are any misspelled words or phrases in your data, you simply select the column -> Facet -> Text facet.

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Clustering

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Clustering

As in can be observed from the facet log, various “product-color” entries have spelling errors, such as lower case letters, additional spaces before/after/in-between words, misplaced period points, etc., for the groups “Midnight blue”, “ Grass green”, and “Red”.

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Misspelled-Records-Clustering

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Misspelled-Records-Clustering

To cluster them together click on the “Cluster” button in the top right corner of the facet log and select which groups should be clustered together, choose the default “New Cell Value” title or customize how you prefer, then apply the changes by clicking “Merge Selected & Re-cluster”.

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Clustering-Records

Data-Transformation-With-Open-Refine-Web-Scraper-Blog-Clustering-Records

The misspelled entries are now corrected to match the main group’s title.

This data transformation can be very useful and change a lot for your data analysis.

Data-Transformation-With-Open-Refine-Clustering-Done-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Clustering-Done-Web-Scraper-Blog

Now, we are going to dive deeper into modifying data more specifically. Frequently, some unnecessary strings or symbols can ruin the work of analysis with the data. For example, the “$” in the “product-price” column is an obstacle, because with the symbol the cell values will not be recognized as numbers for many spreadsheet softwares; therefore, we have to eliminate the symbols. The term “time-consuming” is an underestimation of how much time it would take to manually go over 8000 cells to delete a single symbol. In OpenRefine it can be done in only a minute with the help of GREL (General Refine Expression Language). 

GREL obtains such possibilities as:

  • creating a custom text or numeric facet;
  • adding a column based on another column;
  • Create a new column by fetching a URL;
  • Transforming cells in a column.

To delete the “$” from the “product-price” column select the column -> Edit cells -> Transform.

Data-Transformation-With-Open-Refine-Transform-Cells-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Transform-Cells-Web-Scraper-Blog

In the “Expression” part a formula is needed that will only remove the unnecessary symbols.

This can be done by two functions:

  • replace(value, "$", "");
  • value.split("$").join("").

Input the function, make sure that the formula works by checking the “Preview” log, then click “OK”.

Data-Transformation-With-Open-Refine-Deleting-Strings-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Deleting-Strings-Web-Scraper-Blog

As simple as that! Now the data transformation from a mixed data type column can be selected as solely a number column, making analysis and work with data easier. 

Now, we are going to look at how to import/merge data from a different project with OpenRefine. To begin, download the “Discounts-Data.csv” file here

Import the data into OpenRefine by creating a new project.

Data-Transformation-With-Open-Refine-Import-Second-Data-File-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Import-Second-Data-File-Web-Scraper-Blog

If we take a look at the additional file, it shows that there is a new column of discounts for two specific products.

Data-Transformation-With-Open-Refine-Extra-Column-Data-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Extra-Column-Data-Web-Scraper-Blog

We are going to merge “product-discount” into our primary data.

For this process, we are going to use another GREL formula.:

cell.cross('arg1','arg2').cells['arg3'].value[arg4]


  • arg1 = name of the project you are exporting data from = “Discounts Data csv”;
  • arg2 = name of the key column = “product-title”;
  • arg3 = name of the column you are importing = “product-discount”;
  • arg4 = indicate which value to import in the array (if multiple matches for the key) (recommended to use 0).

Since there will be cells that will not obtain an additional value, we add a little bit of logic to clean things up, and for the data not to show null values.

To begin the merging, select the “product-title” column -> Edit column -> Add column based on this column.

Data-Transformation-With-Open-Refine-Add-Column-Based-On-This-Column-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Add-Column-Based-On-This-Column-Web-Scraper-Blog

Following the formula layout, the formula we are going to input in the “Expression” log is as follows:

if(value!='null',cell.cross("Discounts Data csv", "product-title").cells["product-discount"].value[0],'')

Name the column as “product-discount” or as you wish to name it, and click “OK”.

Data-Transformation-With-Open-Refine-GREL-Merge-Data-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-GREL-Merge-Data-Web-Scraper-Blog

Now that the data has been merged together, for simpler layout we are going to reorder the columns so that the “product-discount” is placed after the “product price”.

Select column “All” -> Edit columns -> Re-order / remove columns….

Data-Transformation-With-Open-Refine-Re-Order-Remove-Columns-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Re-Order-Remove-Columns-Web-Scraper-Blog

Drag the “product-discount” under the “product-price” and click “OK”. 

In this log you can also remove various columns at once, if needed. 

Data-Transformation-With-Open-Refine-Reordering-Columns-Web-Scraper-Blog

Data-Transformation-With-Open-Refine-Reordering-Columns-Web-Scraper-Blog

It is as easy as that!

Previous parts of "Data Transfromation with OpenRefine" blog series:

  • Part #1 - Introduction to OpenRefine;
  • Part #2 - Data Cleaning with OpenRefine.

Video tutorial of the "Data Cleaning with OpenRefine".


























Go back to blog page