A Very Gentle Introduction to SQL

SQL (pronounced “ess cue ell,” “sequel,” or “squeal”) stands for Structured Query Language, the international standard for getting data sets out of databases.

General Syntax

In general, SQL queries take the form of:

SELECT [list of fields separated by commas] FROM [table “” not found /]
WHERE [condition(s) you’re looking for]

For example, to get a list of the chart number, first name, and last name of all active patients who live in the 90210 zip code, we’d use the SQL query:

SELECT REGISTER.PATNO, REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE ( REGISTER.STATUSPAT = “ACTIVE”) AND ( REGISTER.ZIPCODE = “90210”)

To find out which patient(s) have the home phone number 936-273-3859, we’d write the query:

SELECT REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE REGISTER.HPHONE = “936-273-3859”

To find a list of Medicaid babies born since 1/1/2016, we’d write the query:

SELECT REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE
(REGISTER.SICKVNO = 1) AND (REGISTER.BIRTHDAT > “1/1/2016”)

You can use Boolean operators (OR, AND, NOT) in the WHERE conditions. For example, to find a list of active patients whose PCP is doctor CLR with primary, secondary, or tertiary insurance BLU:

SELECT REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE (REGISTER.STATUSPAT = “ACTIVE”) AND (REGISTER.DOCTOR = “CLR”) AND ((REGISTER.INS_CARRIER_CODE = “BLU”) OR (REGISTER.INS2_CARRIER_CODE = “BLU”) OR (REGISTER.INS_CARRIER_CODE_3 = “BLU”))

To see more examples of search criteria that can be used in OP Reports, see Module 4, Lesson 2, “SEARCH tab.”

It’s generally better to start with one or two conditions (WHERE clauses) and slowly add more in one at a time, rather than piling on all the WHERE conditions at first. Otherwise, it’s easy to put so many conditions on your search that one will be overly restrictive, and nothing will turn up in your results:

SELECT SINGLEGUY.NAME, SINGLEGUY.PHONE FROM SINGLEGUY WHERE
( SINGLEGUY.HEIGHT > 72) AND ( SINGLEGUY.INCOME > 100000) AND (SINGLEGUY.EYES = ‘BLUE’) AND (SINGLEGUY.MOM_DISTANCE > 200) AND (SINGLEGUY.HOME <> “BASEMENT”) AND (SINGLEGUY.ATHLETIC = ‘Y’) AND (SINGLEGUY.KIDS <= 3) AND (SINGLEGUY.PSYCHO_EX = ‘N’) AND (SINGLEGUY.WILLING_TO_COMMIT = ‘Y’) …

Sorting

When you get a long list, it’s often helpful to sort the rows. Adding an ORDER BY clause will sort the query results. For example, to sort our Beverly Hills query by last name then first name, we’d want…

SELECT REGISTER.PATNO, REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE ( STATUSPAT = ‘ACTIVE’ ) and ( ZIP = ‘90210’ )
ORDER BY REGISTER.LNAME, REGISTER.FNAME

But to sort it by patient chart number, from smallest to largest, we’d want…

SELECT REGISTER.PATNO, REGISTER.FNAME, REGISTER.LNAME FROM REGISTER WHERE ( STATUSPAT = ‘ACTIVE’ ) and ( ZIP = ‘90210’ )
ORDER BY REGISTER.PATNO

Joins

For all but the simplest reports, we will want information from more than one table – this procedure is called a join. In order to join two tables in a logical fashion, we have to find a common “unique index” – that is, a field common to both tables.

Let’s say we want to print a report that shows a patient’s chart number, first and last name, and the full name of their PCP. This will require data from both the REGISTER and the STAFF1 tables:

REGISTER
PATNO FNAME LNAME DOCTOR

101

Mary

Smith

MQ

102

Mickey

Maus

LVD

103

Daphne

Duq

OS

104

Polka

Hontas

LVD

STAFF1
STAFFID STAFFINIT STAFFNAME

102

LVD

Ludwig Von Drake

104

MQ

Michaela Quinn

204

OS

Otto Scratchandsniff

So we could Join these two tables to get something like:

PATNO FNAME LNAME STAFFNAME

101

Mary

Smith

Michaela Quinn

102

Mickey

Maus

Ludwig Von Drake

103

Daphne

Duq

Otto Scratchansniff

104

Polka

Hontas

Ludwig Von Drake

To do the join, we use an SQL command like:

SELECT REGISTER.PATNO, REGISTER.FNAME, REGISTER.LNAME, STAFF1.STAFFNAME FROM REGISTER
INNER JOIN STAFF1 ON ( STAFF1.STAFFINIT = REGISTER.DOCTOR)

The equal sign in the phrase STAFF1.STAFFINIT = REGISTER.DOCTOR shows that we’re looking for exact matches between the DOCTOR field in REGISTER and the STAFFINIT field in STAFF1. To be precise, the tables STAFF1 and REGISTER are now “inner joined,” and STAFFINIT and DOCTOR are the two “join fields” that the tables are “joined on.”

A join doesn’t permanently change the structure of the tables – it’s a “read-only” operation on the table.

Types of joins: inner, outer, left, right

There are 4 basic kinds of join: Inner Join, Left Outer Join, Right Outer Join, and Full Outer Join. To illustrate them, let’s look at two pretend tables:

DEMOGRAPHICS
PATNO NAME

101

Mary Smith

102

Mickey Mouse

103

Daffy Duck

104

Poca Hontas

105

DJ Berman

RECREATION

PATNO SPORT

101

soccer

102

baseball

104

tennis

106

jai-alai

An inner join matches only rows that appear in both tables. If we “inner join” the tables DEMOGRAPHICS and RECREATION on the fields DEMOGRAPHICS.PATNO = RECREATION.PATNO, we get:

INNER JOIN

PATNO NAME SPORT

101

Mary Smith

soccer

102

Mickey Mouse

baseball

104

Poca Hontas

tennis

The phrase “inner join” means that, only when an exact match is found in both fields, the two rows are joined and appear in the resulting table. If one row is blank in the join field (or no match is found in either other table), then neither row will appear in the resulting table. So people who don’t have a favorite sport declared (like 103 and 105), or records in which a sport has been entered but no name (106), will disappear out of the results. In a Venn diagram, inner join is like “A intersect B.”

Outer joins include at least one of the complete tables, plus whatever that table intersects in the other table.

For example, left outer join means you want all the rows in the first (left) table to show up, whether or not there’s a match for it in the second (right) table.

LEFT OUTER JOIN

PATNO NAME SPORT

101

Mary Smith

soccer

102

Mickey Mouse

baseball

103

Daffy Duck

104

Poca Hontas

tennis

105

DJ Berman

With this particular dataset, a left outer join means that all the names listed in DEMOGRAPHICS will appear in the results, with matching RECREATION results, if applicable. In a Venn diagram, a left outer join is like “A”.

A right outer join, like a left outer join, includes all of one table and whatever matches (if any) it has in the other table. A right outer join includes everything in the second (right) table, and whatever matches in the first (left) table:

RIGHT OUTER JOIN

PATNO NAME SPORT

101

Mary Smith

soccer

102

Mickey Mouse

baseball

104

Poca Hontas

tennis

106

jai-alai


Full Outer Join, the final possibility for a table join, means you want all the rows in both tables to appear, matching where applicable.

FULL OUTER JOIN

PATNO NAME SPORT

101

Mary Smith

soccer

102

Mickey Mouse

baseball

103

Daffy Duck

104

Poca Hontas

tennis

105

DJ Berman

106

jai-alai

In general, full outer joins may wreak havoc because they tend to be sloppy “total dumps” of two tables. Use them only when a narrower join won’t work.

Choosing join fields can be tricky

Be careful that you select the right fields to join on. If you didn’t look carefully, you might accidentally join two columns that look like they actually point to the same type of data, but don’t:

SELECT REGISTER.PATNO, REGISTER.FNAME, REGISTER.LNAME, STAFF1.STAFFNAME FROM REGISTER INNER JOIN STAFF1 ON ( REGISTER.PATNO = STAFF1.STAFFID)

This join looks for matches between the PATNO field and the STAFFID field. This would give the following result:

PATNO FNAME LNAME STAFFNAME

104

Polka

Hontas

Michaela Quinn

The one row that does show up is matched wrong, of course: It’s totally wrong to assume that patient #104 would have doctor #104 as her PCP.

However, in OP Reports, the Query Wizard (see the next module for TABLE) often does just that. The Query Wizard “suggests” table joins, based on what it perceives to be similar/identical fields. In this case, it would see that both PATNO and STAFFID are lists of numbers starting at 101, and would assume they’re the same data series. Sometimes the suggested joins are correct; other times, not so. Accept the suggested joins only after critical evaluation.

Joins sometimes result in one-to-one results, one-to-many results, or many-to-many results. In the latter two cases, you will have much longer reports, sometimes with multiple seemingly “duplicate” line items. You will need to either account for this in your report design, or leave the duplicates (which can be removed later using Excel’s Remove Duplicates tool.)

In a one-to-one relationship, we expect each row from the table on the left to have one, and only one, proper match to the table on the right, and vice versa. Each patient in REGISTER can only have one birth history, and each birth history recorded in MEDHIST is only ascribed to one patient in REGISTER:

REGISTER

PATNO FNAME LNAME DOCTOR

101

Mary

Smith

MQ

102

Mickey

Maus

LVD

103

Daphne

Duq

OS

104

Polka

Hontas

LVD

MEDHIST

PATNO DELIVERY FEEDING B_WEIGHTKG

101

NSVD BREAST

3.53

102

NSVD BOTTLE

1.99

103

C-SECTION

BOTH

3.78

104

NSVD

BREAST

3.12

In a one-to-many (also called a “many-to-one”) relationship, a row in one of the tables will have multiple matching rows in the other. In the case of associating PCPs with patients, any single (“one”) PCP might have hundreds (“many”) of patients on his panel. However, any given patient only has one PCP at a time:

REGISTER

PATNO

FNAME

LNAME

DOCTOR

101

Mary

Smith

MQ

102

Mickey

Maus

LVD

103

Daphne

Duq

OS

104

Polka

Hontas

LVD

STAFF1

STAFFID STAFFINIT STAFFNAME

102

LVD

Ludwig Von Drake

104

MQ

Michaela Quinn

204

OS

Otto Scratchandsniff

In a many-to-many relationship, multiple rows in one table will match to multiple rows in the other table. ENC_NOTE contains all the encounter note data in OP, where PATNO is the patient’s chart number, NOTE_ID is a unique progress note number, and NOTE_DATE is the date of service for the encounter note. It’s reasonable to expect that a patient may have several different entries in ENC_NOTE, one for each visit to the office, as well as many different billing entries in ARCHIVE_TRANSACTIONS.

ENC_NOTE

PATNO NOTE_ID NOTEDATE

101

4385

04/01/2014

101

4496

04/18/2014

101

4690

04/29/2014

ARCHIVE_TRANSACTIONS

PATNO DATE1 CPTCODE CHARGE

101

04/01/2014

99214

120.00

101

04/01/2014

96110

50.00

101

04/18/2014

99213

85.00

101

04/29/2014

99213

85.00

Making a join on PATNO alone will result in a “cross-multiplied’ table. That is, every PATNO = 101 row on the left table will match and join every PATNO = 101 on the right table. Since there are three such rows in the left table, and four such rows in the right table, the resulting join will have twelve rows (3 x 4):

ENC_NOTE. PATNO

ENC_NOTE.NOTE_ID

ENC_NOTE. NOTEDATE

ARCHIVE_ TRANS ACTIONS. DATE1

ARCHIVE_ TRANS ACTIONS. CPTCODE

ARCHIVE_ TRANS ACTIONS. CHARGE

101

4385

04/01/2014

04/01/2014

99214

120.00

101

4385

04/01/2014

04/01/2014

96110

50.00

101

4385

04/01/2014

04/18/2014

99213

85.00

101

4385

04/01/2014

04/29/2014

99213

85.00

101

4496

04/18/2014

04/01/2014

99214

120.00

101

4496

04/18/2014

04/01/2014

96110

50.00

101

4496

04/18/2014

04/18/2014

99213

85.00

101

4496

04/18/2014

04/29/2014

99213

85.00

101

4690

04/29/2014

04/01/2014

99214

120.00

101

4690

04/29/2014

04/01/2014

96110

50.00

101

4690

04/29/2014

04/18/2014

99213

85.00

101

4690

04/29/2014

04/29/2014

99213

85.00

This contains way too much duplication to have anything useful. It’s only meaningful if we match up and join on not only the patient numbers, but also the date of service. If re-define the inner join with both

as well as

ENC_NOTE.NOTEDATE = ARCHIVE_TRANSACTIONS.DATE1

…we will eliminate some of the redundant rows, and see only the CPT codes and charges that correspond to that encounter note.

ENC_NOTE.
PATNO
ENC_NOTE.
NOTE_ID
ENC_NOTE.
NOTEDATE
ARCHIVE_
TRANSACTIONS.
DATE1
ARCHIVE_
TRANSACTIONS.
CPTCODE
ARCHIVE_
TRANSACTIONS.
CHARGE

101

4385

04/01/2014

04/01/2014

99214

120.00

101

4385

04/01/2014

04/01/2014

96110

50.00

101

4496

04/18/2014

04/18/2014

99213

85.00

101

4690

04/29/2014

04/29/2014

99213

85.00

What may begin as a many-to-many relationship on a single join may become a one-to-many (or even a one-to-one) relationship as multiple join conditions are added.

In addition to the above-described inner join, another way to join tables is he outer join. The outer join allows one table to show all the applicable fields, whether or not they have matches in the other table.

For example, let’s say we want to look at the birth history of all patients in REGISTER. We could join the following two tables on REGISTER.PATNO = MEDHIST.PATNO:

REGISTER

PATNO FNAME LNAME DOCTOR

101

Mary

Smith

MQ

102

Mickey

Maus

LVD

103

Daphne

Duq

OS

104

Polka

Hontas

LVD

105

Dollie

Parson

MQ

MEDHIST

PATNO DELIVERY FEEDING B_WEIGHTKG

101

NSVD

BREAST

3.53

104

NSVD

BREAST

3.12

…but an inner join like

SELECT REGISTER.PATNO, REGISTER.LNAME, REGISTER.FNAME, MEDHIST.DELIVERY, MEDHIST.FEEDING, MEDHIST.B_WEIGHTKG
FROM REGISTER
INNER JOIN MEDHIST ON (MEDHIST.PATNO = REGISTER.PATNO)
ORDER BY REGISTER.PATNO

would give us the following result:

REGISTER.PATNO REGISTER.FNAME REGISTER.LNAME MEDHIST.DELIVERY MEDHIST.FEEDING MEDHIST.B_WEIGHTKG

101

Mary

Smith

NSVD

BREAST

3.53

104

Polka

Hontas

NSVD

BREAST

3.12

When patients in the REGISTER table don’t have a corresponding row in the MEDHIST table (maybe they’re cross- cover patients for whom we never obtained a birth history), they don’t appear in an inner join. However, an outer join like:

SELECT REGISTER.PATNO, REGISTER.LNAME, REGISTER.FNAME, MEDHIST.DELIVERY, MEDHIST.FEEDING, MEDHIST.B_WEIGHTKG
FROM REGISTER
LEFT OUTER JOIN MEDHIST ON (MEDHIST.PATNO = REGISTER.PATNO)
ORDER BY REGISTER.PATNO

will give us:

REGISTER.PATNO REGISTER.FNAME REGISTER.LNAME MEDHIST.DELIVERY MEDHIST.FEEDING MEDHIST.B_WEIGHTKG

101

Mary

Smith

NSVD

BREAST

3.53

102

Mickey

Maus

103

Daphne

Duq

104

Polka

Hontas

NSVD

BREAST

3.12

105

Dollie

Parson

The two most commonly used kinds of outer joins are left outer join (used above) and its complement, right outer join. “Left” and “right” refer to the table which should be included in its entirety, even if there is no match in the other table. The first table referred to in the code is by convention always the “left” table and the table referred to second is the “right” table…

SELECT REGISTER.BLA, ARCHIVE_TRANSACTIONS.STUFF FROM REGISTER LEFT OUTER JOIN ARCHIVE_TRANSACTIONS ON (REGISTER.PATNO = ARCHIVE_TRANSACTIONS.PATNO)

In this case, REGISTER is the “left” table and ARCHIVE_TRANSACTIONS is the “right” table. Thus in this query, rows in REGISTER will be returned even if they have no matches in ARCHIVE_TRANSACTIONS.

Switching the order of the tables will give the same result if the join is a Right Outer Join:

SELECT REGISTER.BLA, ARCHIVE_TRANSACTIONS.STUFF FROM ARCHIVE_TRANSACTIONS RIGHT OUTER JOIN REGISTER ON (REGISTER.PATNO = ARCHIVE_TRANSACTIONS.PATNO)

Note that when SELECTing fields from two tables you’re joining, you select fields from both tables in the list of fields. That is,

SELECT TABLE_A.FIELD1, TABLE_B.FIELD2 FROM TABLE_A INNER JOIN TABLE_B ON (TABLE_A.INDEX = TABLE_B.INDEX)

Special SQL Calculations and Transformations

Firebird SQL (the dialect of SQL that OP uses) allows special calculations on fields you’re selectin from the database. As we’ve seen, a patient’s age on a particular date of service is the difference (in years or months) between the DOS and the patient’s DOB.

Let’s say you want to know how old patient #101 was on the dates of service we’ve seen him.

SELECT ARCHIVE_TRANSACTIONS.DATE1, REGISTER.BIRTHDAT, DATEDIFF(day, REGISTER.BIRTHDAT, ARCHIVE_TRANSACTIONS.DATE1)/365.24 AS age_in_years FROM REGISTER INNER JOIN ARCHIVE_TRANSACTIONS ON (ARCHIVE_TRANSACTIONS.PATNO = REGISTER.PATNO) WHERE (REGISTER.PATNO = 101) AND (ARCHIVE_TRANSACTIONS.CPTCODE NOT IN (‘1’, ‘2’))

DATEDIFF(day, REGISTER.BIRTHDAT, ARCHIVE_TRANSACTIONS.DATE1) gives the difference in days between the child’s date of birth and a date of service in claims. To get age in years, we divide that value by the number of days in an average year (365.24).

ARCHIVE_TRANSACTIONS.CPTCODE NOT IN (‘1’, ‘2’) eliminates dates that payments were made on patient #101’s account. This leaves only charge dates (i.e. dates of service) in the list.

For a more complete example of DATEDIFF used in a full report, see the Example Reports tutorial.

For other kinds of calculations that can be used in OP Reports, see Module 4, Part 2, “CALCS”.

SQL References

There’s no need to reproduce a complete SQL tutorial here when there are plenty of good ones already out there:

SQL Dialects

There are different “dialects” of SQL that use slightly different commands. OP’s database uses Firebird SQL, which you can read about here: