3 Principles of Good Spreadsheet Design
/I have run across countless spreadsheet articles that walk you through the steps for working with data, creating formulas, and making it all look good, etc. But, rarely do I run across articles that talk about the bigger picture of how to organize an entire workbook so that it is easy to use, modify, and analyze. Without this bigger picture in mind, it is so easy to create spreadsheets that are overly complex, convoluted, and ultimately produce data that is inaccurate. I recently worked with two organizations who were struggling with poorly designed spreadsheets that supported critical processes but clearly were not working for them. One was a food manufacturing company needing to track the many ingredients that were used in each of their products. The other was a transit company that needed to track ridership and vehicle usage for multiple routes.
Using my three principles of good spreadsheet design, I transformed their complex, convoluted, and error prone spreadsheets into organized and maintainable tools that are efficient for data entry and produce accurate reports.
Miles of Data to Track
The transit company is a great example of a spreadsheet designed with good intentions but that eventually spun out of control. The screenshot below illustrates how it combined data entry, multiple sets of data, and analysis into a single sheet. While this worked initially, data entry was slow and as routes changed and spreadsheet updates accumulated, errors piled up. Finally, the spreadsheet could not accurately produce the reports that were required by their funding sources. With all of the reporting functions intermingled with the data entry, each time they had to make a change, formulas had to be updated individually across all the rows or columns. As you can imagine, this was time consuming and impractical. Inevitably, some formulas were missed and the ridership totals and vehicle usage totals became inaccurate.
Principle 1: Separate Data Entry From Reporting/Analysis
Like most businesses, the transit company above used spreadsheets for two main purposes: data entry and reporting/analysis. Both are equally important but have conflicting requirements. When entering data, we want to be able to add new data points quickly, using the fewest mouse movements and keyboard clicks possible. The most efficient approach is to organize all of the data we are entering into adjacent rows and columns. On the other hand, when doing analysis we want to see the summaries of our data (totals, averages, etc.) all together so that we can interpret meaning and see trends. How to solve this conflict? Use separate tabs for the data entry and reporting/analysis parts of your spreadsheet.
In the case of the transit company, I rebuilt their spreadsheet to separate the data entry and analysis functions so that each could be managed separately. The revised sheet captures all the ridership and vehicle usage data on their own tabs, with formulas that are reused, while the reporting is done on its own separate tabs. Below is an example of the separated sheets.
Principle 2: Group Sets of Data into Separate Tables
Just like it's helpful to keep data and summaries separate, it is helpful to keep different sets of data separated. To improve the efficiency of the transit company spreadsheet, I created one table with the ridership information (see the Data Entry screenshot previously for how this looks) and a separate table to track the use of their vehicles. By separating this data it was much easier to enter each efficiently and then analyze ridership separately from vehicle usage.
They also wanted to be able to summarize their data by many different types of dates. For example, they wanted to be able to compare ridership on different days of the week, different months, and different quarters. I was able to accomplish this by adding a date dimension table, which contains details (e.g. day of the week, name of the month, etc.) about every date for a period of time. This table contains many years worth of dates so that it never needs to be updated. By defining a relationship between the ridership data and this date table, it becomes easy to view the data using different versions of the date without needing to add all of the date data to the ridership data table. Here's an example of the date dimension table and report table that uses the day of the week instead of the date.
Principle 3: Let Excel do the Formula Work For You
While there are endless guides for writing better formulas, the best formula is one that you don't have to write. Excel Tables and Pivot Tables allow you to analyze and calculate data while minimizing the number of formulas you need to write and manage. All of the reporting screenshots above are examples of Pivot Tables in use.
With Excel Tables you can turn a data range into a table and then, using one formula, calculate entire columns using the data in the table. If the formula needs to change later on, you can do it in one row and it will automatically update all the other rows. The transit company's vehicle usage data included starting and ending mileage for each vehicle. By turning this into a table, I was able to set up a formula that was used in a column to automatically calculate the total mileage driven for every entry. In the future, if the formula needs to be changed, they will only need to change it in one row and it will automatically update for all other rows.
Pivot Tables allow you to quickly and easily summarize and categorize your data without having to use any formulas. In our transit company example, they are now able to summarize their ridership data by stop, date, month, or bus as they choose by updating the pivot table. All without writing a single formula.