Introduction
The term "comma separated values" (or "csv") refers to a straightforward file format that uses certain formatting to arrange tabular data. It provides a standard format for data interchange and stores tabular data, such as from a spreadsheet or database, in plain text. The rows and columns of data in a csv file that opens in an excel sheet determine the format that should be used.
Reading CSV files
Python has many routines for reading csv files. We are outlining a few reading functions.
Using the csv.reader() function
The csv file is read using Python's csv.reader() function. It creates a list of all the columns for each row in the file.
The following information is in the text file python.txt, which by default uses the comma (,) as a delimiter:
name,department,birthday month Parker,Accounting,November Smith,IT,October |
Example
import csv with open('python.csv') as csv_file: csv_reader = csv.reader(csv_file, delimiter=',') line_count = 0 for row in csv_reader: if line_count == 0: print(f'Column names are {", ".join(row)}') line_count += 1 |
Output:
Column names are name, department, birthday month Parker works in the Accounting department, and was born in November. Smith works in the IT department, and was born in October. Processed 3 lines. |
Using the open() function, the file "python.csv" was opened in the code above. To read the file, we used the csv.reader() function, which produces an iterable reader object. The data were contained in the reader object, and we used a for loop to iteratively print the contents of each row.
Read a CSV into a Dictionary
Instead than dealing with a list of distinct text items, we can use the DictReader() function to read the data from the CSV file directly into a dictionary.
A reminder of the contents of our input file, python.txt
name,department,birthday month Parker,Accounting,November Smith,IT,October |
Example
import csv with open('python.txt', mode='r') as csv_file: csv_reader = csv.DictReader(csv_file) line_count = 0 for row in csv_reader: if line_count == 0: print(f'The Column names are as follows {", ".join(row)}') line_count += 1 print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.') line_count += 1 print(f'Processed {line_count} lines.') |
Output:
The Column names are as follows name, department, birthday month Parker works in the Accounting department, and was born in November. Smith works in the IT department, and was born in October. Processed 3 lines. |
Pandas reading csv files
The NumPy library serves as the foundation for the open-source Pandas library. It offers quick data preparation, data cleaning, and analysis for the user.
It takes only a few seconds and is simple to read the csv file into a pandas DataFrame. To open, examine, and read the csv file in pandas and have it save the data in a DataFrame, we don't need to write many lines of code.
Here, we're going to read a slightly more challenging file named hrdata.csv, which contains information about the company's employees.
Name,Hire Date,Salary,Leaves Remaining John Idle,08/15/14,50000.00,10 Smith Gilliam,04/07/15,65000.00,8 Parker Chapman,02/21/14,45000.00,10 Jones Palin,10/14/13,70000.00,3 Terry Gilliam,07/22/14,48000.00,7 Michael Palin,06/28/13,66000.00,8 |
Example
import pandas df = pandas.read_csv('hrdata.csv') print(df) |
Output:
Name Hire Date Salary Leaves Remaining 0 John Idle 03/15/14 50000.0 10 1 Smith Gilliam 06/01/15 65000.0 8 2 Parker Chapman 05/12/14 45000.0 10 3 Jones Palin 11/01/13 70000.0 3 4 Terry Gilliam 08/12/14 48000.0 7 5 Michael Palin 05/23/13 66000.0 8
|
Writing CSV Files
The csv.writer() function in Python can be used to write any new or existing CSV files. It contains two methods, the writer function or the Dict Writer class, and is similar to the csv.reader() module.
It provides the writerow() and writerows methods (). The writerows() function writes many rows, whereas the writerow() function only writes one row.
Dialects
It is described as a framework that enables the creation, storage, and reuse of different formatting settings. It supports a variety of characteristics; the most popular ones are:
- Dialect.delimiter: This attribute is used to denote a field's boundary. The comma is the default value (,).
- Dialect.quotechar: Fields with special characters are quoted using this property.
- Dialect.lineterminator : whose default value is "rn," is used to start new lines.
Let's create a CSV file with the data below.
data = [{'Rank': 'B', 'first_name': 'Parker', 'last_name': 'Brian'}, {'Rank': 'A', 'first_name': 'Smith', 'last_name': 'Rodriguez'}, {'Rank': 'C', 'first_name': 'Tom', 'last_name': 'smith'}, {'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Oscar'}, {'Rank': 'A', 'first_name': 'Alex', 'last_name': 'Tim'}] |
Example:
import csv with open('Python.csv', 'w') as csvfile: fieldnames = ['first_name', 'last_name', 'Rank'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerow({'Rank': 'B', 'first_name': 'Parker', 'last_name': 'Brian'}) writer.writerow({'Rank': 'A', 'first_name': 'Smith', 'last_name': 'Rodriguez'}) writer.writerow({'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Oscar'}) writer.writerow({'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Loive'}) print("Writing complete") |
Output:
It returns a file called "Python.csv" that has the following information in it:
first_name,last_name,Rank Parker,Brian,B Smith,Rodriguez,A Jane,Oscar,B Jane,Loive,B |
Create a dictionary from a CSV file
To write the CSV file directly into a dictionary, we may also utilize the class DictWriter.
The following information is located in a file called "python.csv":
Parker, Accounting, November Smith, IT, October |
Example:
import csv with open('python.csv', mode='w') as csv_file: fieldnames = ['emp_name', 'dept', 'birth_month'] writer = csv.DictWriter(csv_file, fieldnames=fieldnames) writer.writeheader() writer.writerow({'emp_name': 'Parker', 'dept': 'Accounting', 'birth_month': 'November'}) writer.writerow({'emp_name': 'Smith', 'dept': 'IT', 'birth_month': 'October'}) |
Output:
import csv with open('python.csv', mode='w') as csv_file: fieldnames = ['emp_name', 'dept', 'birth_month'] writer = csv.DictWriter(csv_file, fieldnames=fieldnames) writer.writeheader() writer.writerow({'emp_name': 'Parker', 'dept': 'Accounting', 'birth_month': 'November'}) writer.writerow({'emp_name': 'Smith', 'dept': 'IT', 'birth_month': 'October'}) |
Create CSV Files Utilizing Pandas
As an open source library built on top of the Numpy library, Pandas is described as such. It offers quick data preparation, data cleaning, and analysis for the user.
It is as simple as using pandas to read the CSV file. The DataFrame, a two-dimensional, heterogeneous tabular data structure with three primary parts (data, columns, and rows), must be created. Here, we'll read a slightly more challenging file named hrdata.csv that provides information about the company's personnel.
Name,Hire Date,Salary,Leaves Remaining John Idle,08/15/14,50000.00,10 Smith Gilliam,04/07/15,65000.00,8 Parker Chapman,02/21/14,45000.00,10 Jones Palin,10/14/13,70000.00,3 Terry Gilliam,07/22/14,48000.00,7 Michael Palin,06/28/13,66000.00,8 |
Example:
import pandas df = pandas.read_csv('hrdata.csv', index_col='Employee', parse_dates=['Hired'], header=0, names=['Employee', 'Hired', 'Salary', 'Sick Days']) df.to_csv('hrdata_modified.csv') |
Output:
Employee, Hired, Salary, Sick Days John Idle, 2014-03-15, 50000.0,10 Smith Gilliam, 2015-06-01, 65000.0,8 Parker Chapman, 2014-05-12, 45000.0,10 Jones Palin, 2013-11-01, 70000.0,3 Terry Gilliam, 2014-08-12 , 48000.0,7 Michael Palin, 2013-05-23, 66000.0,8
|