When you work on web applications for large organizations and enterprises, I am sure you have faced this unique problem. Organizations often store data in spreadsheets and you need to provide an interface for them to enter this data into your web application. The general idea is to upload the file, read its contents, and store it in either files or databases that your web application uses. You may also need to export data from your web applications. For instance, how would you export the marks of all the students in a class? Again, spreadsheets are the preferred medium.
In this post, we will discuss different ways to handle these files and parse them to get the required information using Python (2.7).
A Quick Spreadsheet Primer
Before parsing spreadsheets, you must understand how they are structured. A spreadsheet file is a collection of sheets and each sheet is a collection of data cells placed in a grid, similar to a table. In a sheet, a data cell is identified by two values — its row and column numbers.
For instance, in the given screenshot, the spreadsheet contains only one sheet, “Query1”. The cell “2A” corresponds to the 2nd row and 1st column. The value of cell 2A is “12510001”.
Although programs with a GUI assign letters to the names of columns, when we parse the data, we will start row and column numbers from 0. That means, cell 2A will correspond to (1, 0), 4B to (1,3), 3C to (2, 2), and so on.
A Quick Summary of CSV Files
CSV stands for “comma-separated values” (or sometimes character-separated if the delimiter used is some character other than a comma) and the name is pretty self explanatory. A typical CSV file looks like the following:
"ID", "Name", "Age" "1", "John", "11" "2", "Mary", "12" "3", "Sherlock", "13"
You may convert spreadsheets to CSV files to ease the parsing. CSV files can be parsed easily using the
csv module in Python.
Getting the required Python modules
To read and write XLS files, you would need to install two modules —
xlwt. You can do so through one of the Python installers:
pip install xlrd pip install xlwt
You may use
xlutils to perform the actions of both
To read and write CSV files, you need the
csv module, which comes pre-installed with Python 2.7.
If you have a file and you want to parse the data in it, you need to perform the following in this order:
- Import the
- Open the spreadsheet file (or workbook)
- Select a sheet
- Extract the values of particular data cells
Open a spreadsheet file
Let’s first open a file in Python.
import xlrd workbook = xlrd.open_workbook('my_file_name.xls')
If your file has non-ASCII characters, you should open it in the unicode format as follows:
workbook = xlrd.open_workbook('my_file_name.xls', encoding='cp1252')
If your spreadsheet is very large, you can add an argument
open_workbook, which loads only current sheets to memory:
workbook = xlrd.open_workbook('my_file_name.xls', on_demand = True)
Opening a sheet
If you know the name of the sheet, you can open it by running the following:
worksheet = workbook.sheet_by_name('My_Sheet_Name')
If you are not sure of the name of the sheet, you can open the first worksheet by its index:
worksheet = workbook.sheet_by_index(0)
workbook.nsheets wants to know the number of sheets.
workbook.sheet_names() gives you a list of the names of the sheets present in the file, which helps you iterate over the sheets.
Getting data from cells
Once you have selected the worksheet, you can extract the value of a particular data cell as follows.
# Value of 1st row and 1st column sheet.cell(0, 0).value
You can iterate it over a loop to extract data in the whole sheet.
You can detect an empty cell by using
if sheet.cell(0, 0).value == xlrd.empty_cell.value: # Do something
The workflow for creating worksheets is similar to that of the previous section.
- Import the module
- Create a blank spreadsheet file (or workbook)
- Create a sheet within the file
- Put desired values in particular data cells
- Save the workbook
Create a new file
You can create a new spreadsheet file using
xlwt. A simple piece of code creates an empty file:
import xlwt workbook = xlwt.Workbook() workbook.save('my_file.xls')
You may open it in unicode just like in the case of
You can add sheets to your workbook by using
sheet = workbook.add_sheet('Sheet_1')
Working with cells in a sheet
Once you have created a sheet, you can add values to particular cells in the sheet as follows.
sheet.write(0, 0,'Inserting data in 1st Row and 1st Column')
In case you want to write data to rows, you can fix the row and add data.
row = sheet.row(1) # Selecting the second row row.write(0,'2nd Row and 1st Column') row.write(1,'1st Row and 2nd Column')
If you want to delete the data in a row, it can be done by a single command.
You can set the width of a column as well.
sheet.col(0).width = 625 # In pixels
Adding styles to cells
xlwt, not only can you write values to cells, but you can add custom styles too! The following snippet shows how to do this:
style = xlwt.XFStyle() font = xlwt.Font('Arial') style.font = font pattern = xlwt.Pattern() pattern.pattern = xlwt.Pattern.SOLID_PATTERN pattern.pattern_fore_colour = xlwt.Style.colour_map['red'] style.pattern = pattern sheet.write(0, 0, "Some data", style)
You can also add custom colors to the xlwt palette.
As I mentioned earlier, creating and parsing spreadsheets is inevitable when you are working with huge web applications. Thus, familiarity with parsing libraries can only help you when the need arises.
What scripting language do you use to handle spreadsheets? Does Python have any other library for this purpose? Let us know in the discussion below.