1 minute read

A simple tutorial to creating a SQLite database in R.

Motivation

In my current role at MNPS, I’ve dealt with massive flat files in various formats: CSV’s, text files, and a lot SAVs. After a few weeks of reading data into R via {readr}, data.table::fread, and {haven} (I personally prefer using {haven} over {foreign} because it’s part of the tidyverse). I started to lose patience waiting for files to load. Resaving data as .RData files definitely didn’t help, and I opted to saving smaller dataset as .RDS. However this still didn’t solve the issue for my larger data sets. And when I say large, these files are +200MB and in some cases close to 10 million rows. Much of these data have string variables, too. So, I saw this as an opportunity to learn something new in my free time that could improve my efficiency at work. There were plenty of online resources for maximizing programming efficiency in R (this includes timing code execution, etc.) and I landed on an elegant, simple solution of creating a SQLite database.

Here’s how

I found that it’s actually quite simple to create a database in R. There were some great examples (list examples here) that led me here. This process utilizes the {dbplyr} and {RSQLite} packages.

# ---- Load libraries
library(dbplyr)
## 
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
## 
##     ident, sql
library(RSQLite)

The actual creation of the local SQLite database is pretty easy.

db <- "my-db.sqlite"  # This will be my database name
conn <- dbConnect(drv = SQLite(), dbname = db)

Next up we can write tables to our database. In the dbWriteTable command you communicate with the database connection, add the name of the table, and the data to write to the DB.

# Write `mtcars` data to our DB named "cars"
dbWriteTable(conn, "cars", mtcars)

Once we have multiple tables in our database, I’ve found src_dbi(conn) to be helpful to see all the tables in the connection.

# View tables in connection
src_dbi(conn)
## src:  sqlite 3.30.1 [my-db.sqlite]
## tbls: cars

Finally, it’s always important to disconnect from the database when you’re done.

# # Disconnect from DB
DBI::dbDisconnect(conn)

Updated: