OpenRefine

UCI Libraries Digital Scholarship Services Fall 2019 Workshop on OpenRefine

View the Project on GitHub UCI-Libraries/OpenRefine

Friday October 18, 2019

1:00 PM - 4:00 PM

Instructor(s): Danielle Kane

Helper(s): Madelynn Dickerson

REGISTER HERE

Please Note: You must be a UCI affiliate to register. If you are not a member of the UCI community, please contact Danielle Kane (kaned@uci.edu) before registering to see if there is room.

General Information: OpenRefine is described by David Huynh as “a power tool for working with messy data”, but what does this mean? It is probably easiest to describe the kinds of data OpenRefine is good at working with and the sorts of problems it can help you solve.

OpenRefine is most useful where you have data in a simple tabular format, such as a spreadsheet, a comma-separated values (CSV) file or a tab-separated values (TSV) file, but with internal inconsistencies in data formats, in where data appears, or in the terminology used. OpenRefine can be used to standardize and clean data across your file. It can help you:

1. Get an overview of a data set
2. Resolve inconsistencies in a data set, for example standardizing date formatting
3. Split data up into more granular parts, for example splitting up cells with multiple authors into separate cells
4. Match local data up to other data sets, for example matching local subjects against the Library of Congress Subject Headings
5. Enhance a data set with data from other sources
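
To make item 3 concrete, here is a minimal sketch in plain Python of what splitting a multi-valued cell means. It is not OpenRefine code and does not use any OpenRefine API; the sample rows, the `authors` column name, the `|` separator and the helper name `split_multivalued` are all assumptions made up for illustration.

```python
def split_multivalued(rows, column, separator="|"):
    """Return new rows so that `column` holds exactly one value per row.

    Hypothetical helper: it mimics the idea of splitting a cell that
    contains several values, not OpenRefine's actual implementation.
    """
    result = []
    for row in rows:
        # Split the cell and trim stray whitespace around each value.
        parts = [part.strip() for part in row[column].split(separator)]
        for part in parts:
            if part:  # skip empty pieces such as a trailing separator
                new_row = dict(row)  # copy the other columns unchanged
                new_row[column] = part
                result.append(new_row)
    return result


# Made-up sample data: one cell holds two authors.
rows = [
    {"title": "Paper A", "authors": "Smith, J.|Lee, K."},
    {"title": "Paper B", "authors": "Garcia, M."},
]

expanded = split_multivalued(rows, "authors")
for row in expanded:
    print(row["title"], "->", row["authors"])
```

In the workshop the same result is reached through OpenRefine's menus rather than code; the sketch only shows the shape of the data before and after.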

Some common scenarios might be:

1. Where you want to know how many times a particular value (name, publisher, subject) appears in a column in your data
2. Where you want to know how values are distributed across your whole data set
3. Where you have a list of dates which are formatted in different ways, and want to change all the dates in the list to a single common date format
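
The scenarios above can be sketched in a few lines of plain Python. This is only an illustration of the ideas (counting how often each value appears, and rewriting mixed date formats into one format), not how OpenRefine works internally. The sample values, the list of accepted date formats in `KNOWN_FORMATS` and the function names are assumptions; in particular, treating `18/10/2019` as day/month/year is a choice made for this example.

```python
from collections import Counter
from datetime import datetime

# Assumed set of input formats; real data may need others.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"]


def value_counts(values):
    """Count how many times each value appears, ignoring outer whitespace."""
    return Counter(value.strip() for value in values)


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


# Made-up publisher column with inconsistent capitalization and spacing.
publishers = ["MIT Press", "mit press", "MIT Press ", "Elsevier"]
counts = value_counts(publishers)
for name, count in counts.most_common():
    print(name, count)

# Made-up date column with three different formats and one bad value.
dates = ["2019-10-18", "18/10/2019", "October 18, 2019", "sometime in fall"]
print([to_iso_date(d) for d in dates])
```

The counts make it easy to spot that "MIT Press" and "mit press" are probably the same publisher, and the `None` in the date output marks a value that still needs manual attention.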

Who: The course is aimed at graduate students and other researchers, including undergrads, faculty, staff and community members. You don’t need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Langson Library Rm 228. Get directions with OpenStreetMap.

When: October 18, 2019. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

The room is wheelchair / scooter accessible.
Accessible restrooms are available.

Materials will be provided at the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email Danielle Kane at kaned@uci.edu for more information.

Syllabus: OpenRefine

  1. Importing data
  2. Layout of OpenRefine, Rows vs Records
  3. Faceting and filtering
  4. Clustering
  5. Working with columns and sorting
  6. Transformations
  7. Advanced functions

Setup

Installing and running OpenRefine

You can download OpenRefine from http://openrefine.org/download.html. This lesson has been tested with versions of OpenRefine up to and including 3.2, the latest tested version.

If you are using an older version, it is recommended you upgrade to the latest tested version.

There are versions for Windows, macOS and Linux.

Please follow the installation instructions on the OpenRefine wiki: Installation Instructions

Notes:

When you download OpenRefine for Windows or Linux from the address above, you are downloading a zip file. To install OpenRefine, simply unzip the downloaded file wherever you want to install the program. This can be a personal directory or an applications or software directory; OpenRefine should run wherever you put the unzipped folder. The location has to be a “local” drive, as problems have been reported when trying to run OpenRefine from a network drive.

OpenRefine is a Java application, and you need to have a ‘Java Runtime Environment’ (JRE) installed on your computer to run OpenRefine. If you don’t already have one installed then you can download and install from http://java.com by going to the site and clicking “Free Java Download”.
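
If you are unsure whether Java is already installed, one rough way to check is to look for a `java` executable on your PATH. The sketch below does that from Python; it is an optional convenience, not part of the OpenRefine installation instructions, and the function names are made up for illustration. It assumes that a `java` command on the PATH means a usable Java runtime, which may not hold on every system (for example, some systems ship a placeholder `java` command that only prompts you to install Java).

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version_text(java_path):
    """Return whatever `java -version` reports (it writes to stderr)."""
    completed = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
    )
    return (completed.stderr or completed.stdout).strip()


java_path = find_java()
if java_path is None:
    print("No 'java' command found on PATH; you may need to install Java.")
else:
    print("Found java at:", java_path)
    print(java_version_text(java_path))
```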

OpenRefine does not support Internet Explorer or Edge. Please use Firefox, Chrome or Safari instead.