I was recently having a conversation with a fairly ‘tech-savvy’ acquaintance and was a bit surprised at just how little he knew about databases. I have to admit that, back in the 1990s, when I first began dabbling with them as a teenager, databases seemed awfully dry and not very compelling; I was far more interested in programming space invaders using Turbo Pascal. But the World Wide Web changed all that. When I began developing websites in PHP and other languages, I realised just how important data storage was for online content and the future of the Internet.
Today, we live in a world of big data. If you have a Google account, and surf while signed in, then all your activity across the entire Google spectrum is being recorded for analysis by the company. In a single day I’ve easily clocked up more than 200 searches and browsed myriad YouTube videos. This is all tracked in my account and if you extrapolate for the total number of Google subscribers in the world, you’re looking at a veritable cosmos of data. Giants like Amazon are utterly dependent upon vast and intricate database systems to control and orchestrate every aspect of their operations. Facebook is, in essence, one spectacularly large database, attuned to recognising links between people and exploiting them commercially.
Learning the basics of data storage is a fun and interesting thing to do, not only because it provides a sense of how these systems work, but because it enhances your logical reasoning abilities by forcing you to break complex data into pieces and link them up. Plus, you may have need to make your own databases someday. In this article I will introduce the Structured Query Language (SQL), which has been the most popular solution for database systems for some decades. My aim is to explore how to create a fairly simple database and to show how powerful its data retrieval functionality can be.
Since my girlfriend is mad about horses and knows nothing about databases, I decided to use her as a sounding board for this project. We’ll build a database for storing all the details of horse races and the animals, jockeys and trainers involved. I’ll introduce significant concepts as I build the database, thus keeping the focus on the practical side and (hopefully) not lapsing into dry theory.
SQL is a language for creating and working with relational databases. These are not the only kind of database out there, but they are the most popular and can solve most data storage problems. SQL has been around since the 1970s and there is a well-defined standard for the language. Nevertheless, over the years many companies have created SQL variations with their own proprietary features added. For instance, Microsoft produces SQL Server, designed to power huge databases spread over many individual server computers. Its unique flavour of SQL is called T-SQL, which adds extra bells and whistles that other versions of the language don’t have. Another popular choice is the MySQL system, which is open-source and free in most circumstances. It powers some pretty big concerns, like YouTube, Facebook and Twitter.
In this series of articles, I’ll be using one of the most lightweight SQL variations of all. It’s called SQLite and it implements most of the current SQL standard. It’s easy to use, and a free program called DB Browser can create, modify and query its databases, which are held in individual files. If you’re looking to get started with SQL, this is the approach I’d recommend. Begin by heading over to sqlitebrowser.org to grab your copy; I’m working on 64-bit Windows 10 so I chose the appropriate version. The installation is fast and easy. Upon running the program, you’ll see a window like this:
To get going, click New Database on the toolbar, pick a folder and enter the filename Equus. After clicking Save, a new file will be created with a type of Data Base File. At only 8 KB in size, these empty SQLite containers are rather svelte. A dialog box will open, inviting you to create a new table; for now, cancel it. The Database Structure tab shows the objects currently inside: Tables, Indices, Views and Triggers. But the database is currently devoid of these:
In SQL, data is stored in relations, better known as tables. You can think of these as being like any table you might lay out in Word or Excel; they have columns and rows which are also referred to as fields and records. Let’s sketch out our first table for storing the details about each horse that takes part in the races. I’m deliberately going to create a table that will cause some design problems down the road, so we can appreciate what they are and fix them when the time comes.
What data should we store about an individual horse? I settled on the following fields:
- Date of birth
- Trainer’s contact number
You can visualize a table with these as column headings and each row providing a record of data about an individual horse. When you’re building a database table in SQL, you must pay attention to the type of data that each field will store. I’m referring to how the computer sees the data—is it text, a number, a date? You must be specific so the database can act as nightclub bouncer and reject entries that are unsuitable for the field. For example, the user should not be able to enter ‘Prince of Persia’ into the date of birth field. Moreover, she should not be able to enter ‘2nd May 2009’ either. Although this makes sense to us humans, the database needs the date to be in a standard format in order to process it. For example, we may wish to calculate the horse’s age by finding the current date and subtracting his birth date.
The name field should hold text, a sequence of characters including letters and spaces. We can use the data type TEXT for such fields. We’ll make breed, colour, trainer and the trainer’s contact number TEXT fields too. You may wonder why the phone number should be stored as text. The rule of thumb is to ask, ‘does this field contain numeric data that will be used in calculations?’ Phone numbers are purely informational and are not numbers in any arithmetical sense; they are also likely to contain spaces, + signs and parentheses, which numeric fields cannot handle.
Date of birth should clearly be stored as a date, and SQLite provides the DATE type for just this purpose. Dates should be given in the format YYYY-MM-DD. Weight rather depends on how accurately we wish to store it. In our case, we’ll assume it to be a value in kilograms, rounded to the nearest whole. Such numbers, positive, negative, and zero are handled via the INTEGER data type. At this point, we know what fields are required and which data types they should use:
- name TEXT
- date_of_birth DATE
- breed TEXT
- colour TEXT
- weight: INTEGER
- trainer: TEXT
- trainer_phone: TEXT
When creating a table in SQL, you cannot use spaces in the table’s name, or in the names of its fields. A popular convention is to use lowercase lettering throughout and to separate words with underscores, as I’ve done here. We are ready to translate our design in SQL code and ask the database to create this table. Switch to the Execute SQL tab in DB Browser; it should contain an empty SQL 1 tab, ready to issue instructions to the system. In order to create a new, empty, table in SQL, we use the CREATE TABLE command:
CREATE TABLE horse ( name TEXT NOT NULL, date_of_birth DATE NOT NULL, breed TEXT NOT NULL, colour TEXT NOT NULL, weight INTEGER NOT NULL, trainer TEXT NOT NULL, trainer_phone TEXT NOT NULL );
Another SQL convention is to give tables singular names; hence why I called this horse and not horses. We store data about horses in the horse table, as you likely store socks in your sock drawer. The layout of the command, including the indentation and the splitting of lines, is up to you. The program won’t mind if you put the whole thing on one line, but you should take care to arrange the code to be easily readable by humans. Note how the list of fields is enclosed in parentheses and separated by commas, and that the whole command is terminated by a semi-colon.
The definition of each field includes its name, data type, and the NOT NULL designation. This is optional, but very often employed. It informs the database that, when a record is inserted into this table, the field in question must have a value and cannot be left empty. If a field needs to have a value in order for its record to make sense, designate it NOT NULL; otherwise, omit this designation. It would, for example, make no sense for the name field to be empty. In the table above, every field is required.
I have placed each SQL keyword, like CREATE and NOT, in uppercase. However, the language is not sensitive to case and you may use lowercase if you wish. Having typed in the code, click the Execute button above or press F5. Assuming the code has no errors, the table will be successfully created. Otherwise, errors will be reported in the pane underneath. If you encounter them, carefully check the code, making sure spaces, commas and brackets are in the right places.
At this point, return to the Database Structure tab and expand the Tables node. You should see the new table listed, and expanding it will reveal the names and data types of each field. It’s time to put some data into it. Switch back to the Execute SQL tab and erase the entire CREATE TABLE command you typed in before. To insert data, you’ll be surprised to learn, you issue an INSERT INTO command. Without further ado, here it is:
INSERT INTO horse (name, date_of_birth, breed, colour, weight, trainer, trainer_phone) VALUES ('Prince of Persia', '2009-05-02', 'Thoroughbred', 'Bay', 410, 'Ivor Winner', '02686 312389'), ('Grandad''s Latest Haircut', '2010-08-23', 'Thoroughbred', 'Roan', 449, 'Randy Allnyte', '03464 847623'), ('Napoleon The Great', '2012-12-18', 'Arabian', 'Chestnut', 393, 'Robyn Banks', '07838 887473');
First, we identify which table is to receive the data. Next, we list the fields into which data is to be inserted. Finally, after the VALUES keyword, we list the data for each record, taking care to match the sequence of fields given at the outset. Each record is comma-separated except the last, after which we terminate with a semi-colon. You can add as many records as you like in this fashion.
There are a few points to observe about this code. Text values are placed in single quotes, else the database will become confused as it tries to interpret words like ‘Prince’ as keywords in the SQL language. In the second record, the name field contains an apostrophe, which is the same character (a single quote) as delineates the whole text value; you must therefore double these to let the interpreter know you mean an apostrophe, not the end of the text. Dates are entered in single quotes as well, in the YYYY-MM-DD format. Numbers need no quotes.
Run this code to insert the three records. To confirm the entry of the data, click the Browse Data tab. You can choose the table to show via the drop-down list at the top; but since we have only one table it is chosen automatically. The contents are shown in a traditional table view, with column headers and rows:
To save the database, click Write Changes on the toolbar. This will update the file on disk to contain the new table and its data. Join me in my next post, as we dive into simple queries to get data out of the table and consider key fields and their importance.