How To Generate Random Data in PostgreSQL

It doesn’t happend every day, but there are occasionally reasons to use random data, or even random sequences of data. PostgreSQL, “The World’s Most Advanced Open Source Relational Database” allows to generate random data in a lot of different ways. While I was preparing the new version and sample data for our recommendation system RecommenderPro I was using PostgreSQL to create some (a lot) of anonimized data to demonstrate the functionality of the product, hence the idea of this post. Here’s some example of how to use our beloved PostgreSQL to generate random data:

random() function

If you just need a random float number the random function is enough. It return a floating point number in the range 0 <= x < 1.

select random() /* 0.0 <= value < 1 */

If you need a floating point number between 0 and, let’s say, 9, you can use the following select statement.

select random()*9  /* 0.0 <= value < 9 */

If you need an integer number between 0 and 9 use the following select statement.

select floor(random()*9)  /* 0 <= value < 9 */

And if you’d like to have some different offset you can simply subtract or add that. The following will return values between -20 and 20:

SELECT trunc(40*random() - 20);

PostgreSQL doesn a good job in returning random values however, there are chanches, that you need repetible values to make some kind of observation. How to force random to generate always the same random values? Simple! Just seed the random!

Try run the following query multiple times

SELECT setseed(.5), random();

It returns a random result as expected but it’s the same random result every time. Changing the seed value from 0.5 to some other value, the value returned by random will be different but will be still always the same for any query execution. In case you have multiple query that need the same sequence of randomic values, you can also set the seed in the first select.

SELECT setseed(.5);
select random();
select random()*10;
select floor(random()*10);

Sequences of random data using generate_series

One of the most important tool in the perfect PostgreSQL developer toolbox is the generate_series function.

The generate_series function generates a list of numbers (or timestamp) from a starting value to a stop value using a step (by default the step is 1).

select * from generate_series(1,10) 
generate_series
1
2
3
4
5
6
7
8
9
10

If you need to define a column name, use the following syntax.

select * from generate_series(1,10) as mytable(myvalue)
myvalue
1
2
3
4
5
6
7
8
9
10

To generate some random data with an id you can get some idea from the following select.

SELECT 
	generate_series(1,10) AS id, 
	md5(random()::text) AS random_text, 
	random() * 10 AS random_number,
	floor(random() * 10) AS random_integer,
	'2000-01-01'::date + trunc(random() * 366 * 10)::int random_date /* 10 year range starting from jan 1st, 2000 */ 
id random_text random_number random_integer random_date
1 8a17b7ed70b2b1835727682787f39097 0.4225312727089303 3.0 2000-09-08
2 f47324eb11a6416b66e53d775f1f4098 3.621055028581388 3.0 2009-01-16
3 8593ee40889c503387193cc771683e45 6.211348115097302 7.0 2004-05-02
4 0f8d782bed4d512d0e3e5c5d6dd0ca34 8.290160101204087 8.0 2010-01-03
5 58f2a1ff22f712cac6dfee48a5c97ba6 6.324909868740711 3.0 2005-10-11
6 f98823fd945e5b698dfeab2aa08e3df4 7.469451084566678 0.0 2003-07-01
7 5c5835cbf6b0b64c463ea7594139269b 8.83030099747625 9.0 2007-12-18
8 e810e029b46c9c548cd3cf3982c1ce34 0.4947420333170882 3.0 2000-05-06
9 8a8f2fd5ba9bf9cf3488ce68b82b718e 4.968150791127037 3.0 2004-07-28
10 5de881f247410b7ea409d5b41c10c284 1.0645231178098058 2.0 2008-05-13

To create a table with random data you can use the create table ... as select statement.

create table my_random_data as
select 
	generate_series(1,10) AS id, 
	md5(random()::text) AS random_text, 
	random() * 10 AS random_number,
	floor(random() * 10) AS random_integer,
	'2000-01-01'::date + trunc(random() * 366 * 10)::int random_date /* 10 year range starting from jan 1st, 2000 */ 
	

In this case the column data types are inferred by the select output, as shown below

TABLE my_random_data (
	id int4 NULL,
	random_text text NULL,
	random_number float8 NULL,
	random_integer float8 NULL,
	random_date date NULL
);

Generate random first names and last name

Working with random numbers and dates are quite simple. Dealing with random character data with a “potential” meaning isn’t. How to generate unique random first names and last names?

Here it is!

drop table if exists random_people;
create table random_people as
  with rnd_people as (
    select distinct  
	    ('[0:9]={"Daniele","Debora","Mattia","Jake","Amy","Henry","Neil","Tim","John","George"}'::text[])
		    [floor(random()*10)] first_name,
	    ('[0:9]={"Rossi","Verdi","Gialli","Caponi","Gallini","Gatti","Ford","Daniel","Harrison","Macdonald"}'::text[])
		    [floor(random()*10)] last_name
    from 
      generate_series(1, 10 * 10 * 50)
  )
  select 
    row_number() over() id, first_name, last_name 
  from 
    rnd_people;

After executing this query the random_people table contains (very probably) 100 records with data similar to the followings.

id first_name last_name
1 Henry Daniel
2 John Gallini
3 Daniele Gallini
4 Debora Caponi
5 Neil Gialli
6 George Harrison
7 John Gatti
8 Jake Gialli
9 Jake Harrison
10 Tim Rossi

Generate random data from a text file

Having the correct grants PostgreSQL allows to read files from the file system. The following query generates 10000 records with verses from the “DIVINA COMMEDIA - Canto 1° Inferno” in random order. The text is stored in a text file accessible by the PG server. There are 136 lines of text in the file (hence the random() * 136). Array returned by string_to_array starts by 1.

WITH versi_canto_1(verso) AS(
  SELECT (string_to_array(pg_read_file('C:\DEV\recommenderpro\data\inferno_canto_1.txt')::text,E'\r\n')) 
) 
SELECT versi_canto_1.verso[floor(random() * 136) + 1] FROM versi_canto_1, generate_series(1,10000);

The output is similar to the following

verso
Io non so ben ridir com’i’ v’intrai,
ripresi via per la piaggia diserta,
tu se’ solo colui da cu’ io tolsi
Vedi la bestia per cu’ io mi volsi:
Ed una lupa, che di tutte brame
che tu mi meni là dov’or dicesti,
Ma tu perché ritorni a tanta noia?
E qual è quei che volontieri acquista,
con lei ti lascerò nel mio partire;
con la paura ch’uscia di sua vista,
nel tempo de li dèi falsi e bugiardi.
rispuose poi che lagrimar mi vide,
mantoani per patria ambedui.

Conclusions

Hope you’ll find this post useful. Do you have any other advice about data generation in PostgreSQL? Let me know in the comments.

Comments

comments powered by Disqus