Last active
December 19, 2024 08:32
-
-
Save v20100v/c6a196ef364276b62429c0af90bad137 to your computer and use it in GitHub Desktop.
List all Confluence spaces with indicators in raw SQL
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT DISTINCT | |
s.spaceid as space_id, | |
s.spacename as space_name, | |
s.spacekey as space_key, | |
s.spacetype as space_type, | |
s.spacestatus as space_status, | |
sd.body as space_description, | |
s.creationdate as space_creation_date, | |
up.username as space_creator_username, | |
u.display_name as space_creator_displayname, | |
u.email_address as space_creator_email, | |
MAX(t.nb_of_page) as nb_of_page, | |
MAX(t.nb_of_blogpost) as nb_of_blogpost, | |
MAX(t.nb_of_attachment) as nb_of_attachment, | |
MAX(t.nb_of_custom) as nb_of_custom, | |
MAX(t2.nb_of_comment) as nb_of_comment, | |
MAX(t3.nb_of_like) as nb_of_like, | |
MAX(t4.total_attachments_size) as total_attachments_size, | |
last_creation.username as last_creator, | |
last_creation.creationdate as last_creation_date, | |
last_modification.username as last_modifier, | |
last_modification.lastmoddate as last_modification_date | |
FROM spaces s | |
-- Most recent date of content creation with the creator | |
LEFT JOIN ( | |
SELECT c.spaceid, up.username, c.creationdate | |
FROM content c | |
LEFT JOIN user_mapping up ON c.creator = up.user_key, | |
( | |
SELECT c2.spaceid, MAX(c2.creationdate) as last_creationdate | |
FROM content c2 | |
WHERE c2.content_status = 'current' | |
AND c2.creationdate IS NOT NULL | |
AND c2.contenttype != 'SPACEDESCRIPTION' | |
GROUP BY c2.spaceid | |
) last_creation | |
WHERE c.content_status = 'current' | |
AND c.spaceid = last_creation.spaceid | |
AND c.creationdate = last_creation.last_creationdate | |
) last_creation ON last_creation.spaceid = s.spaceid | |
-- Most recent date of content modification with the modifier | |
LEFT JOIN ( | |
SELECT c3.spaceid, up2.username, c3.lastmoddate | |
FROM content c3 | |
LEFT JOIN user_mapping up2 ON c3.lastmodifier = up2.user_key, | |
( | |
SELECT c4.spaceid, MAX(c4.lastmoddate) as last_modificationdate | |
FROM content c4 | |
WHERE c4.content_status = 'current' | |
AND c4.lastmoddate IS NOT NULL | |
GROUP BY c4.spaceid | |
) last_modification | |
WHERE c3.content_status = 'current' | |
AND c3.contenttype != 'SPACEDESCRIPTION' | |
AND c3.spaceid = last_modification.spaceid | |
AND c3.lastmoddate = last_modification.last_modificationdate | |
) last_modification ON last_modification.spaceid = s.spaceid | |
-- Number of page, blogpost, attachment and custom | |
LEFT JOIN( | |
SELECT | |
c5.spaceid, | |
c5.contenttype, | |
CASE WHEN c5.contenttype = 'PAGE' | |
THEN COUNT(c5.contenttype) END nb_of_page, | |
CASE WHEN c5.contenttype = 'BLOGPOST' | |
THEN COUNT(c5.contenttype) END nb_of_blogpost, | |
CASE WHEN c5.contenttype = 'ATTACHMENT' | |
THEN COUNT(c5.contenttype) END nb_of_attachment, | |
CASE WHEN c5.contenttype = 'CUSTOM' | |
THEN COUNT(c5.contenttype) END nb_of_custom | |
FROM content c5 | |
WHERE c5.content_status = 'current' | |
AND c5.contenttype != 'SPACEDESCRIPTION' | |
GROUP BY c5.contenttype, c5.spaceid | |
) t ON t.spaceid = s.spaceid | |
-- Number of comment | |
LEFT JOIN( | |
SELECT c7.spaceid, COUNT(c6.contenttype) as nb_of_comment | |
FROM content c6 | |
JOIN content c7 ON c6.pageid = c7.contentid | |
WHERE c6.prevver IS NULL | |
AND c6.content_status = 'current' | |
AND c6.contenttype = 'COMMENT' | |
GROUP BY c6.contenttype, c7.spaceid | |
) t2 ON t2.spaceid = s.spaceid | |
-- Number of like | |
LEFT JOIN( | |
SELECT c8.spaceid, COUNT(l.id) as nb_of_like | |
FROM likes l | |
JOIN content c8 ON l.contentid = c8.contentid | |
WHERE c8.prevver IS NULL | |
AND c8.content_status = 'current' | |
GROUP BY c8.spaceid | |
) t3 ON t3.spaceid = s.spaceid | |
-- Total attachments size | |
LEFT JOIN( | |
SELECT c9.spaceid, SUM(cp.LONGVAL) as total_attachments_size | |
FROM content c9 | |
JOIN contentproperties cp ON c9.contentid = cp.contentid | |
WHERE c9.contenttype = 'ATTACHMENT' | |
AND cp.propertyname = 'FILESIZE' | |
AND c9.content_status = 'current' | |
AND c9.prevver IS NULL | |
GROUP BY c9.spaceid | |
) t4 ON t4.spaceid = s.spaceid | |
LEFT JOIN user_mapping up ON up.user_key = s.creator | |
LEFT JOIN cwd_user u ON u.user_name = up.username | |
LEFT JOIN content ON content.spaceid = s.spaceid | |
LEFT JOIN bodycontent sd ON sd.contentid = content.contentid | |
WHERE content.contenttype = 'SPACEDESCRIPTION' | |
GROUP BY | |
s.spaceid, | |
s.spacename, | |
s.spacekey, | |
s.spacetype, | |
s.spacestatus, | |
sd.body, | |
up.username, | |
u.user_name, | |
u.display_name, | |
u.email_address, | |
last_creator, | |
last_creation_date, | |
last_modifier, | |
last_modification_date |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
List all spaces with indicators in raw SQL
Context
The goal is to build an SQL query to list all the Confluence spaces of an Atlassian server or Atlassian Datacenter instance (*). Here is the list of information and indicators retrieved for each space in this request:
We did not do any specific benchmark to measure query execution time. But as an example we obtain in our test Confluence instance an execution time of 9.8 seconds for 768 spaces.
(*) Tested only on Confluence server 7.13 with PostgreSQL. The query is written for PostgreSQL database without use exotic SQL function, so there shouldn't be a lof of adjustement to do for other DBMS.
Building this query step by step
To list all Confluence spaces
Let's add the description property of a space
To do this, we need to join two table. The description of a space is in the content table. Then just filter on content type condition (
contenttype=SPACEDESCRIPTION
) like this :Let's add the space's creator with the creation date
Still in the table space, we find the creator and creationdate fields that can be easily joined.
Be carefull to the interpretation of the lastmodifier and lastmoddate field! It is by no means the last contributor in the space who would have modified a page or deposited an attachment. In fact, this is the last user to have modified the description field of the space. We will see below how to identify the last contributor to a space, and more precisely the last creator of new content, and the last user to have modified content.
Let's add the last creator and the last creation date foreach space
Just use the
MAX()
aggregate function on the creationdate field of the content table, and group the result by space. In this example, we have excluded the description of a space from the contributions of a space. So if you want to focus only on pages or blogposts, it will suffice to filter on these types of content.Ok it's easy to fetch the last creation date, but to add the associated creator it's a bit more difficult. To do this, we use an SQL subquery like this.
Ok nice! We can now add to our starting query with a simple join on the space identifier.
Let's add the last modifier and the last modification date foreach space
We do the same thing by doing the following left join:
Let's count the number of page, blogpost, attachment file and custom content foreach space
Rather than separating this calculation into several queries with GROUP BY clause, we group them into a single sub-query for performance reasons. It is much faster to do a matrix transpose than to perform several joins.
And finally we add it to our starting query with a simple join on the space identifier, as we saw previously. But this time we add an additional GROUP BY taking the MAX in order to denormalize the result, i.e. do the transpose.
Let's count the number of comment
A comment is also contain into the content table. But we must do a reflexive join to find the associated space, because a comment does not have an associated spaceid. And so you have to find the page associated with the comment to find the associated space.
Let's count the number of likes
We use the likes table.
Let's add the total size of all attachments (only last version) foreach space
We use the contentproperties table.
If you want to get the total size of all attachments for all version, just remove the filter
c9.prevver IS NULL)
in the query.Here we go !
Bonus (trick)
What if I want to paginate the result?
To implement a server-side paging system (for a Confluence data mining API), the simplest idea is to add the usual statement
LIMIT $limit OFFSET $offset
at the end. But the performance gain is extremely limited with regard to the number of rows in each join to be processed. Indeed, in our test instance, extracting 10 spaces takes almost 9.1 seconds.The paging gain is minimal. In fact, for the gain to be significant, the dataset of each join should be filtered with the following WHERE condition :
By adding this WHERE condition on every join, extracting 10 spaces takes almost 518ms in our test instance. Harder, Better, Faster... Stronger !!! Below is an example of adding this pagination clause in the join that calculates the total size of the attachments. For the implementation with an application server (Java, PHP, nodeJs...), it is enough to replace the numerical values of this example by the variables: limit and offset.