Data Types and Built-in Functions in PostgreSQL
Jul 21, 2020 • 11 Minute Read
Introduction
PostgreSQL is one of the world's most advanced and most widely used open source databases. Over the years, numerous data types and functions, developed by a worldwide team of volunteers, have been added to PostgreSQL. In this guide, you will learn about essential data types and functions available in PostgreSQL and see a demonstration of their use cases, which will help you get better at database development and data analysis tasks.
Data Types in PostgreSQL
In PostgreSQL, there are multiple aliases for most data types in the form of short or long keywords, and optional attributes like length, precision, and scale can be mentioned.
Character
The most elementary data type used to store character values is known as CHAR or CHARACTER. A fixed length for the character value has to be defined along with the keyword as CHAR(n) or CHARACTER(n). In case a character value of smaller length is added, it is padded with white space for the remaining characters.
SELECT 'Art'::CHAR(7) AS "Padded";
--Output:"Art "
SELECT 'Artwork'::CHAR(7) AS "NotPadded";
--Output:"Artwork"
The :: operator is used for explicit type casting as per the syntax in PostgreSQL. It can also be achieved by using the Cast() function.
SELECT CAST('Art' AS CHAR(7)) AS "Padded";
--Output:"Art "
SELECT CAST('Artwork' AS CHAR(7)) AS "NotPadded";
--Output:"Artwork"
A good use case for fixed character data types like CHAR is columns where certain values or codes of fixed length are stored, such as Postal Codes. For character values of a varying length, the datatype VARCHAR(n) or CHARACTER VARYING(n) or CHAR VARYING(n) is used. The length can vary up to the maximum length defined by n. The character values smaller than the maximum length are not padded in this datatype.
SELECT 'Gill Bates'::VARCHAR(14) AS "SmallerNamesAreNotPadded";
--Output: "Gill Bates"
SELECT 'Zark Muckerberg'::VARCHAR(14) AS "LongerNameClipped";
--Output: "Zark Muckerber"
In case of casting, the longer values get clipped, but inserting longer data values than defined in the column definition will result in an error. For columns where the maximum length cannot be defined due to the extremely variable nature of character values, the maximum length can be omitted in the syntax along with the keyword. There is also a TEXT datatype that is specifically used in such cases.
SELECT 'The quick brown fox jumps over the lazy dog.'::VARCHAR AS "UsingVARCHAR";
--Output: "The quick brown fox jumps over the lazy dog.”
SELECT 'The quick brown fox jumps over the lazy dog.'::TEXT AS "UsingTEXT";
--Output: "The quick brown fox jumps over the lazy dog.”
Numeric
For storing numbers without fractional components, there are three data types available in PostgreSQL:
- SMALLINT or INT2
This is used if only small range integer values are to be stored and must be preferred only if there are constraints on disk space. It takes up two bytes of storage.
- INTEGER or INT or INT4
This is the most commonly used and preferred datatype since it provides the best balance between range, storage size, and performance. It takes up 4 bytes of storage.
- BIGINT or INT8
In cases where the range of the INTEGER datatype is not sufficient and larger integer values are to be stored, BIGINT is the preferred datatype. It takes up eight bytes of storage.
For numeric values with fixed precision, the NUMERIC or DECIMAL datatype can be used. The length and precision attributes are mentioned along with the NUMERIC or DECIMAL keyword.
SELECT 9.99::NUMERIC(8,3) AS "NumericValue";
--Output: 9.990
In case they are omitted, numeric values of any precision and scale can be stored in the column up to the implementation limit.
SELECT 999989999999.889999::NUMERIC AS "NumericValue";
--Output: 999989999999.889999
The value to be stored is rounded to the specified number of fractional digits in case the value is greater than the declared scale.
SELECT 99998.889999::NUMERIC(8,3) AS "NumericValue";
--Output: 99998.890
Apart from the ordinary numeric values, the numeric type allows the special value NaN (i.e. Not-a-Number).
SELECT 'NaN'::NUMERIC AS "Not-a-Number";
--Output: NaN
If the values to be stored may have variable precision, they are stored as approximations through the use of following floating-point data types:
- REAL (or FLOAT4)
SELECT 99998999.889999::REAL AS "RealValue";
--Output: 9.9999e+007
- DOUBLE PRECISION (or FLOAT8)
SELECT 99998999.889999::DOUBLE PRECISION AS "DoublePrecisionValue";
--Output: 99998999.889999
Note: Storing and retrieving such values might result in slight discrepancies depending on the underlying processor, operating system, and compiler.
For storing monetary values, the MONEY datatype can be used. It allows storage of currency symbols and commas.
SELECT '$100,000.00'::MONEY AS "CTC";
--Output: "$100,000.00"
Date/Time
- DATE This is used to store date values without a time of day.
SELECT '2020, 4 July'::DATE AS "US Independence Day";
--Output:"2020-04-07"
SELECT '20200407'::DATE AS " US Independence Day";
--Output:"2020-04-07"
- TIME This is used to store only time of day values, which can be with or without the time zone.
SELECT '12:00:00.12345678'::TIME(2) AS "Noon with 2 micro-second precision";
--Output: "12:00:00.12"
SELECT '00:00:00 IST'::TIME WITH TIME ZONE AS "Midnight as per Indian Standard Time";
--Output: "00:00:00+02:00"
- TIMESTAMP This is used to store the date along with the time of day.
SELECT '2020, 4 July 12:00:00'::TIMESTAMP AS "US Independence Day NOON";
--Output: "2020-07-04 12:00:00"
SELECT '2020, 4 July 12:00:00 EST'::TIMESTAMP WITH TIME ZONE AS "US Independence Day NOON with Time zone";
--Output: "2020-07-04 22:30:00+05:30"
- INTERVAL This is a quite versatile datatype that can accept date/time-based intervals in various formats.
SELECT '1 Y 5 Mon 2 H 5 M'::INTERVAL AS "1Year2Months2Hours5Minutes";
--Output:"1 year 5 mons 02:05:00"
select '11 hours, 30 minutes, 40 seconds ago'::INTERVAL AS "11 and a half hours, 40 seconds ago";
--Output: "-11:30:40"
Boolean
At times there could be a requirement to store TRUE, FALSE, or UNKOWN flags in a particular column. For such a case, the BOOLEAN datatype can be used. TRUE/FALSE values can be represented as Y/N, T/F, 1/0 and ON/OFF as well. UNKOWN values are represented by NULL.
SELECT 1::BOOLEAN AS "TrueValue", 0::BOOLEAN AS "FalseValue",NULL::BOOL AS "Unknown";
--Output: true false [null]
Array
If there is a requirement to store a collection of values belonging to an in-built or user-defined datatype, an ARRAY can be defined by simply adding square brackets along with the keyword for the datatype.
select '{ "Certification practice exams", "Interactive courses", "Technical Projects" }'::VARCHAR[] as "Pluralsight Premium"
--Output: "{"Certification practice exams","Interactive courses","Technical Projects"}"
SELECT ARRAY[55,66,77]::INTEGER[] AS "Integer Array"
--Output: {55,66,77}
Unlike programming languages, the array index in PostgreSQL begins with 1 rather than 0.
SELECT (ARRAY[55,66,77]::INTEGER[])[1] AS "First Array Element"
--Output: 55
Functions in PostgreSQL
In this section, you will learn about various functions that can prove quite useful in day-to-day tasks related to data analysis, database administration, and database development.
- NOW() Returns the current timestamp with timezone.
select NOW()
--Output:"2020-07-14 21:29:17.003693+05:30"
- EXTRACT() or DATE_PART() Returns part of the date mentioned.
select EXTRACT('Year' from NOW()),EXTRACT('Century' from NOW()), DATE_PART('MONTH',NOW())
--Output: 2020 21 7
- AGE() Returns the time period from the date value/column passed as a parameter.
SELECT AGE('July 4, 1776'::date) as "Age since US Independence"
--Output: "244 years 10 days"
- SUBSTRING() Returns the substring of certain length (passed as the third parameter) from the start position (passed as the second parameter). Note that in PostgreSQL, the string is considered to start from 0 index position.
SELECT SUBSTRING('Pluralsight',0,7);
--Output: "Plural"
*STRING_AGG() Converts a column of values into a delimited list. The delimiter can be specified as a parameter.
SELECT STRING_AGG(DISTINCT table_type, ',') AS "Distinct Table types in the Public schema" FROM information_schema.tables WHERE table_schema = 'public';
--Output:"BASE TABLE,VIEW"
- ARRAY_CAT() Used to concatenate two arrays.
SELECT ARRAY_CAT('{1,2,3}'::INT[],'{4,5,6}'::INT[]),
--Output:{1,2,3,4,5,6}
- ALL() Used to check if all values of a subquery/array match with the outer query/array.
SELECT 1 = ALL(ARRAY[1,1,1]) as "Do all elements match with 1?"
--Output: true
SELECT 'BASE TABLE,VIEW' = ALL (SELECT STRING_AGG(DISTINCT table_type, ',') AS "Distinct Table types in the Public schema" FROM information_schema.tables WHERE table_schema = 'public')
--Output: true
- IN() or ANY() or SOME() Used to check if any of the values of a subquery/array match with the outer query/array.
SELECT 3 = ANY(ARRAY[1,2,3]), 3 = SOME(ARRAY[1,2,3]), 3 IN (1,2,3)
--Output: true
- COALESCE() Accepts a number of parameters and return the first not null value it comes across from all parameters or NULL in case all are NULL.
SELECT COALESCE (null,null,null,'THIS IS NOT NULL','THIS IS ALSO NOT NULL')
--Output: "THIS IS NOT NULL"
- FILTER() This is a special function only available in PostgreSQL that can be used to further add a where clause in the SELECT part of your query to do aggregations across those filtered records.
SELECT count(*) filter (where is_insertable_into='YES') as "Count for records with is_insertable_into YES", count(*) filter (where table_type='VIEW') as "Count of VIEWS"
FROM information_schema.tables where table_schema not like 'pg%'
--Output:39 62 –Your output may vary as per the available tables
Functions for Database Administration
-
PG_SIZE_PRETTY This is used along with other functions like pg_table_size to obtain the output in a memory unit like KB, MB, GB etc. instead of the usual bigint number returned by those functions.
-
PG_DATABASE_SIZE Returns the size for the database name passed as a parameter.
SELECT PG_SIZE_PRETTY (PG_DATABASE_SIZE ('postgres'))
--Output: "10 MB"
- PG_TABLE_SIZE Returns the size for the database table name passed as a parameter.
SELECT PG_SIZE_PRETTY (PG_TABLE_SIZE ('postgres'))
--Output: "2 MB"
- VERSION Returns the PostgreSQL Database version.
SELECT VERSION()
--Output: "PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit"
Conclusion
Feel free to explore this course to learn a lot more about the data types and functions available in PostgreSQL.