Programming
Article

Using Python to Parse Spreadsheet Data

By Shaumik Daityari

When you work on web applications for large organizations and enterprises, I am sure you have faced this unique problem. Organizations often store data in spreadsheets and you need to provide an interface for them to enter this data into your web application. The general idea is to upload the file, read its contents, and store it in either files or databases that your web application uses. You may also need to export data from your web applications. For instance, how would you export the marks of all the students in a class? Again, spreadsheets are the preferred medium.

In this post, we will discuss different ways to handle these files and parse them to get the required information using Python (2.7).

A Quick Spreadsheet Primer

Before parsing spreadsheets, you must understand how they are structured. A spreadsheet file is a collection of sheets and each sheet is a collection of data cells placed in a grid, similar to a table. In a sheet, a data cell is identified by two values — its row and column numbers.

A spreadsheet

For instance, in the given screenshot, the spreadsheet contains only one sheet, “Query1”. The cell “2A” corresponds to the 2nd row and 1st column. The value of cell 2A is “12510001”.

Although programs with a GUI assign letters to the names of columns, when we parse the data, we will start row and column numbers from 0. That means, cell 2A will correspond to (1, 0), 4B to (1,3), 3C to (2, 2), and so on.

A Quick Summary of CSV Files

CSV stands for “comma-separated values” (or sometimes character-separated if the delimiter used is some character other than a comma) and the name is pretty self explanatory. A typical CSV file looks like the following:

"ID", "Name", "Age"
"1", "John", "11"
"2", "Mary", "12"
"3", "Sherlock", "13"

You may convert spreadsheets to CSV files to ease the parsing. CSV files can be parsed easily using the csv module in Python.

Getting the required Python modules

To read and write XLS files, you would need to install two modules — xlrd and xlwt. You can do so through one of the Python installers: pip or easy_install.

pip install xlrd
pip install xlwt

You may use xlutils to perform the actions of both xlrd and xlwt.

To read and write CSV files, you need the csv module, which comes pre-installed with Python 2.7.

Reading Spreadsheets

If you have a file and you want to parse the data in it, you need to perform the following in this order:

  • Import the xlrd module
  • Open the spreadsheet file (or workbook)
  • Select a sheet
  • Extract the values of particular data cells

Open a spreadsheet file

Let’s first open a file in Python.

import xlrd
workbook = xlrd.open_workbook('my_file_name.xls')

If your file has non-ASCII characters, you should open it in the unicode format as follows:

workbook = xlrd.open_workbook('my_file_name.xls', encoding='cp1252')

If your spreadsheet is very large, you can add an argument on_demand to open_workbook, which loads only current sheets to memory:

workbook = xlrd.open_workbook('my_file_name.xls', on_demand = True)

Opening a sheet

If you know the name of the sheet, you can open it by running the following:

worksheet = workbook.sheet_by_name('My_Sheet_Name')

If you are not sure of the name of the sheet, you can open the first worksheet by its index:

worksheet = workbook.sheet_by_index(0)

workbook.nsheets wants to know the number of sheets. workbook.sheet_names() gives you a list of the names of the sheets present in the file, which helps you iterate over the sheets.

Getting data from cells

Once you have selected the worksheet, you can extract the value of a particular data cell as follows.

# Value of 1st row and 1st column
sheet.cell(0, 0).value

You can iterate it over a loop to extract data in the whole sheet.

You can detect an empty cell by using empty_cell in xlrd.

if sheet.cell(0, 0).value == xlrd.empty_cell.value:
    # Do something

Creating spreadsheets

The workflow for creating worksheets is similar to that of the previous section.

  • Import the module xlwt
  • Create a blank spreadsheet file (or workbook)
  • Create a sheet within the file
  • Put desired values in particular data cells
  • Save the workbook

Create a new file

You can create a new spreadsheet file using xlwt. A simple piece of code creates an empty file:

import xlwt

workbook = xlwt.Workbook()
workbook.save('my_file.xls')

You may open it in unicode just like in the case of xlrd.

Adding sheets

You can add sheets to your workbook by using add_sheet.

sheet = workbook.add_sheet('Sheet_1')

Working with cells in a sheet

Once you have created a sheet, you can add values to particular cells in the sheet as follows.

sheet.write(0, 0,'Inserting data in 1st Row and 1st Column')

In case you want to write data to rows, you can fix the row and add data.

row = sheet.row(1) # Selecting the second row
row.write(0,'2nd Row and 1st Column')
row.write(1,'1st Row and 2nd Column')

If you want to delete the data in a row, it can be done by a single command.

row.flush_row_data()

You can set the width of a column as well.

sheet.col(0).width = 625 # In pixels

Adding styles to cells

With xlwt, not only can you write values to cells, but you can add custom styles too! The following snippet shows how to do this:

style = xlwt.XFStyle()

font = xlwt.Font('Arial')
style.font = font

pattern = xlwt.Pattern()
pattern.pattern = xlwt.Pattern.SOLID_PATTERN
pattern.pattern_fore_colour = xlwt.Style.colour_map['red']

style.pattern = pattern

sheet.write(0, 0, "Some data", style)

You can also add custom colors to the xlwt palette.

Conclusion

As I mentioned earlier, creating and parsing spreadsheets is inevitable when you are working with huge web applications. Thus, familiarity with parsing libraries can only help you when the need arises.

What scripting language do you use to handle spreadsheets? Does Python have any other library for this purpose? Let us know in the discussion below.

Recommended

Learn Coding Online
Learn Web Development

Start learning web development and design for free with SitePoint Premium!

Get the latest in Front-end, once a week, for free.