Created
May 8, 2021 15:55
-
-
Save alecco/9976dab8fda8256ed403054ed0a65d7b to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Note: example given is for SQLite but can be adapted to other SQL | |
-- This is a fast query to sample random rows | |
-- It gets a list of existing row identifiers (rowids), | |
-- materializes, sorts, and picks 10. | |
SELECT * FROM test | |
WHERE rowid IN | |
(SELECT rowid FROM test | |
ORDER BY random() LIMIT 10); | |
-- If the table has many, many rows, say hundreds of millions, | |
-- materializing the list and sorting it becomes expensive. | |
-- The size of the list of ids can be reduced. But don't | |
-- overdo it or you'll affect the sampling properties! | |
SELECT * FROM test JOIN | |
(SELECT rowid as rid FROM test | |
WHERE random() % 10 = 0 -- Reduce rowids by 10x | |
ORDER BY random() LIMIT 10) AS srid | |
ON test.rowid = srid.rid; | |
-- The magic reduction value can be calculated with count() of the table | |
-- and finding a reasonable number to reduce to a million (for example). | |
-- But you have to also be careful for tables with less than a million rows. | |
-- This is left as exercise. | |
-- To populate a table to play with this you can do | |
-- https://stackoverflow.com/a/28139886 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment