Pentaho Enterprise Edition: Extract the BA schedule into a database

by Andrew Cave

September 3, 2018

Pentaho Enterprise and Community Editions offer the Scheduling tool to enable jobs to be set to run at certain times on a repeatable basis. At the moment, while the schedule is viewable, there is no built-in method for extracting the schedules for reporting or documentation purposes. A couple of years back I figured out the methods to achieve this and I’d like to share the ETL with the wider Pentaho community.

This ETL uses the Pentaho REST API to get the XML containing the scheduler information, including the emails, the output types, the messages and which reports/jobs/transformations are being used.

screen shot of the KTR

In section 1, we set the URL, password and HTTP Accept type header (application/xml) and return the xml in a single string variable.

In section 2, the result XML is sent to XML readers which will split it to gain different contexts of the data

a. ‘Get job header’ fetches the timing information
b. ‘get DayOfWeek Recurrences’ picks out the scheduling recurrences for days of the week (e.g. when set to run Mon, Wed, Fri)
c. ‘job data detail’ extracts the information stored in ‘key/value’ pairs

Section 3 is where ‘key/value’ pairs which are in an indefinite number of rows, are ‘pivoted’ so that the key is now the field name and the ‘value’ is the…value.

After those sections, the rows from each line are joined using the jobname, UNIX timestamps get converted to database timestamps and the data is prepared before finally being inserted to the table pentaho_schedule

Here’s a zip of the whole ktr. It has explanatory notes and the (postgres) DDL for creating the pentaho_schedule table. Converting that to another database should be easy.
Don’t forget to change the database connection to suit your setup.

read_schedule_ba

Andrew Cave

Andrew Cave is a senior data engineer with BizCubed. He has worked in network data, billing, telco credit and debt after a career in the welfare sector. He loves databases. Follow him on LinkedIn

Pentaho Enterprise Edition: Extract the BA schedule into a database

Andrew Cave

More blog posts

Top 10 Reasons Organisations are Spending Too Much on Data

Meet the women of BizCubed – Buddy (Mousaidna) Rosario

Meet the Women of BizCubed – Karina Liauw

Follow us