Workflow Team Web Tools¶

Welcome to the documentation for the new Workflow Team Web Tools.

Using the Web Tools as an Operator
Workflows Procedures
Running the Web Tools
Predicting Actions with SciKit-Learn
Troubleshooting
Maintaining the Python Backend
JavaScript and Mako User Interface

Using the Web Tools as an Operator ¶

Once you are pointed to a proper URL to access the webtools, you will come to the home page, with links to different views. For each of the examples below, I will use the base URL of https://localhost:8080/, since that is likely the URL you will use if you run the server on your own machine. If you are looking at a production server, the URL will of course be different.

For users familiar with CherryPy, each page is a function of a WorkflowTools object. To pass parameters to the function in a browser, the usual urlencoding of the parameters can be appended to the URL to call each function. Most users should be able to interact with the website exclusively through links though. From the URL root index, users will be able to directly access the following:

The Global Error View
Creating a New Account
Resetting Account Password
Viewing Workflow Logs
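
As a rough illustration of the URL encoding mentioned above, the sketch below builds page URLs with the Python 3 standard library. The helper name `page_url` is made up for this example. The page names and parameters are the ones documented in the sections that follow.

```python
from urllib.parse import urlencode

BASE_URL = "https://localhost:8080/"  # base URL used throughout this documentation


def page_url(page, **params):
    """Return the URL of a page, with keyword arguments as the query string."""
    url = BASE_URL + page
    if params:
        url += "?" + urlencode(params)
    return url


print(page_url("globalerror", pievar="sitename"))
print(page_url("getaction", days=30, acted=-1))
```

The first call gives `https://localhost:8080/globalerror?pievar=sitename`, which is the same address the form on the global error page would lead to.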

The Global Error View ¶

WorkflowTools.globalerror(pievar='errorcode')[source]¶

This page, located at https://localhost:8080/globalerror, attempts to give an overall view of the errors that occurred in each workflow at different sites. The resulting view is a table of piecharts. The rows and columns can be adjusted to contain two of the following:

Workflow step name
Site where error occurred
Exit code of the error

The third variable is used to split the pie charts. This variable inside the pie charts can be quickly changed by submitting the form in the upper left corner of the page.

The size of the piecharts depends on the total number of errors in a given cell. Each cell also has a tooltip, giving the total number of errors in the piechart. The colors of the piecharts show the splitting based on the pievar. Clicking on the pie chart will show the splitting explicitly using the List Workflows page.

If the steps make up the rows, the default view will show you the errors for each campaign. Clicking on the campaign name will cause the rows to expand to show the original workflow and all ACDCs (whether or not the ACDCs have errors). Following the link of the workflow will bring you to Detailed Workflow View. Clicking anywhere else in the workflow box will cause it to expand to show errors for each step.

Parameters:

pievar (str) –

The variable that the pie charts are split into. Valid values are:

errorcode
sitename
stepname

Returns: the global views of errors

Return type: str

List Workflows ¶

WorkflowTools.listpage(errorcode='', sitename='', workflow='')[source]¶

This returns a list of workflows, site names, or error codes that matches the values given for the other two variables. The page can be accessed directly by clicking on a corresponding pie chart on The Global Error View.

Parameters:

errorcode (int) – Error to match
sitename (str) – Site to match
workflow (str) – The workflow to match

Returns:	Page listing workflows, site names, or error codes
Return type:	str
Raises:	cherrypy.HTTPRedirect to 404 if all variables are filled
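
The matching described above can be pictured with the following sketch. It is not the server's implementation: the function name, the record layout, and the sample data are all hypothetical, and raising `ValueError` stands in for the redirect to 404. The behavior when fewer than two values are given is not documented here, so the sketch simply rejects that case as well.

```python
def list_matching(records, errorcode=None, sitename=None, workflow=None):
    """List the distinct values of the one field that was left unset.

    `records` is an iterable of (workflow, sitename, errorcode) tuples.
    """
    fields = ("workflow", "sitename", "errorcode")
    given = {"workflow": workflow, "sitename": sitename, "errorcode": errorcode}
    missing = [name for name in fields if given[name] is None]
    if len(missing) != 1:
        raise ValueError("exactly two of the three fields must be given")
    target = fields.index(missing[0])
    matches = set()
    for record in records:
        # A record matches when every given field equals the record's value
        if all(given[name] in (None, record[i]) for i, name in enumerate(fields)):
            matches.add(record[target])
    return sorted(matches)


# Made-up sample data
RECORDS = [
    ("wf_a", "site_A", 84),
    ("wf_a", "site_B", 84),
    ("wf_b", "site_A", 85),
]
print(list_matching(RECORDS, errorcode=84, workflow="wf_a"))
```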

Detailed Workflow View ¶

WorkflowTools.seeworkflow(workflow='', issuggested='')[source]¶

Located at https://localhost:8080/seeworkflow, this shows detailed tables of errors for each step in a workflow.

For the exit codes in each row, there is a link to view some of the output of the error message for jobs having the given exit code. This should help operators understand what the error means.

At the top of the page, there are links back for The Global Error View, Viewing Workflow Logs, related JIRA tickets, and ReqMgr2 information about the workflow and prep ID.

The main function of this page is to submit actions. Note that you will need to register in order to actually submit actions. See Creating a New Account for more details. Depending on which action is selected, a menu will appear for the operator to adjust parameters for the workflows.

Under the selection of the action and parameters, there is a button to show other workflows that are similar to the selected workflow, according to the Workflow Info. Each entry is a link to open a similar workflow view page in a new tab. The option to submit actions will not be on this page though (so that you can focus on the first workflow). If you think that a workflow in the cluster should have the same actions applied to it as the parent workflow, then check the box next to the workflow name before submitting the action.

Finally, before submitting, you can submit reasons for your action selection. Clicking the Add Reason button will give you an additional reason field. Reasons submitted are stored based on the short reason you give. You can then select past reasons from the drop down menu to save time in the future. If you do not want to store your reason, do not fill in the Short Reason field. The long reason will be used for logging and communicating with the workflow requester (eventually).

Parameters:

workflow (str) – is the name of the workflow to look at
issuggested (str) – is a string to tell if the page has been linked from another workflow page

Returns:	the error tables page for a given workflow
Return type:	str
Raises:	404 if a workflow doesn’t seem to be in assistance anymore. Resets personal cache in the meanwhile, just in case.

Getting the List of Actions ¶

WorkflowTools.getaction(days=0, acted=0)[source]¶

The page at https://localhost:8080/getaction returns a list of workflows to perform actions on. It may be useful to use this page to immediately check if your submission went through properly. This page will mostly be used by Unified though for acting on operator submissions.

Parameters:

days (int) – The number of past days to check. The default, 0, means to only check today.
acted (int) –
Used to determine which actions to return. The following values can be used:
- 0 - The default value selects actions that have not been run on
- 1 - Selects actions reported as submitted by Unified
- Negative integer - Selects all actions

Returns:

JSON-formatted information containing actions to act on. The top-level keys of the JSON are the workflow names. Each of these keys refers to a dictionary specifying:

"Action" - The action to take on the workflow
"Reasons" - A list of reasons for this action
"Parameters" - Changes to make for the resubmission
"user" - The account name that submitted that action

Return type:

JSON
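
To make the structure of this JSON more concrete, here is a minimal sketch that reads a response and lists one line per workflow. The sample response is invented for illustration (the workflow name, action value, reason text, and parameters are not real), and `summarize_actions` is a hypothetical helper, not part of WorkflowWebTools.

```python
import json

# Made-up response, shaped like the description above
SAMPLE = """
{
  "example_workflow_1": {
    "Action": "acdc",
    "Reasons": ["Example reason"],
    "Parameters": {"memory": "3000"},
    "user": "operator1"
  }
}
"""


def summarize_actions(text):
    """Return one (workflow, action, user, number of reasons) tuple per workflow."""
    actions = json.loads(text)
    return [
        (name, info["Action"], info["user"], len(info["Reasons"]))
        for name, info in sorted(actions.items())
    ]


for row in summarize_actions(SAMPLE):
    print(row)
```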

Reporting Completed Actions ¶

WorkflowTools.reportaction()[source]¶

A POST request to https://localhost:8080/reportaction tells the WorkflowWebTools that a set of workflows has been acted on by Unified. The body of the POST request must include a JSON with the passphrase under "key" and a list of workflows under "workflows".

An example of making this POST request is provided in the file WorkflowWebTools/test/report_action.py, which relies on WorkflowWebTools/test/key.json.

Returns:

A JSON summarizing results of the request. Keys and values include:

'valid_key' - If this is false, the passphrase passed was wrong, and no action is changed
'valid_format' - If false, the input format is unexpected
'success' - List of actions that have been marked as acted on
'already_reported' - List of workflows that were already removed
'does_not_exist' - List of workflows that the database does not know

Return type: JSON
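
The request and reply formats can be sketched without any network access, as below. This is only an illustration of the JSON shapes described above and does not replace WorkflowWebTools/test/report_action.py. Both function names are hypothetical, and treating a missing 'valid_key' or 'valid_format' entry as false is an assumption of this sketch.

```python
import json


def build_report_body(passphrase, workflows):
    """Serialize the POST body: passphrase under "key", names under "workflows"."""
    return json.dumps({"key": passphrase, "workflows": list(workflows)})


def report_problems(response_text):
    """Return a list of human-readable problems found in the server's reply."""
    reply = json.loads(response_text)
    problems = []
    if not reply.get("valid_key", False):
        problems.append("passphrase rejected, nothing was changed")
    if not reply.get("valid_format", False):
        problems.append("request format was not understood")
    for name in reply.get("does_not_exist", []):
        problems.append("unknown workflow: " + name)
    return problems


# Made-up reply for illustration
REPLY = ('{"valid_key": true, "valid_format": true, "success": ["wf_a"], '
         '"already_reported": [], "does_not_exist": ["wf_x"]}')
print(build_report_body("not-the-real-passphrase", ["wf_a", "wf_x"]))
print(report_problems(REPLY))
```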

Viewing Workflow Logs ¶

WorkflowTools.showlog(search='', module='', limit=50)[source]¶

This page, located at https://localhost:8080/showlog, returns logs that are stored in an elastic search server. More details can be found at Elastic Search Tools. If directed here from Detailed Workflow View, then the search will be for the relevant workflow.

Parameters:

search (str) – The search string
module (str) – The module to look at, if only interested in one
limit (int) – The limit of number of logs to show on a single page

Returns:	the logs from elastic search
Return type:	str

Creating a New Account ¶

WorkflowTools.newuser(email='', username='', password='')[source]¶

New users can register at https://localhost:8080/newuser. From this page, users can enter a username, email, and password. The username cannot be empty, must contain only alphanumeric characters, and must not already exist in the system. The email must match the domain names listed on the page or can be a specific whitelisted email. See Setting Up Server Configuration for more information on setting valid emails. Finally, the password must not be empty.

If the registration process is successful, the user will receive a confirmation page instructing them to check their email for a verification link. The user account will be activated when that link is followed, in order to ensure that the user possesses a valid email.
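
The registration rules above can be summarized in a short sketch. This is not the server's code: the function name and arguments are hypothetical, and the email check is simplified to an exact domain match, whereas the server is configured with email patterns. Note also that `str.isalnum` accepts non-ASCII letters and digits, which may be more permissive than the real check.

```python
def registration_problems(username, email, password,
                          existing_users, valid_domains, whitelist):
    """Return a list of reasons the registration would be refused."""
    problems = []
    if not username:
        problems.append("username is empty")
    elif not username.isalnum():
        problems.append("username is not alphanumeric")
    elif username in existing_users:
        problems.append("username already exists")

    # Accept the email if it is whitelisted or its domain is allowed
    local, at, domain = email.rpartition("@")
    domain_ok = bool(local) and at == "@" and domain in valid_domains
    if email not in whitelist and not domain_ok:
        problems.append("email is not allowed")

    if not password:
        problems.append("password is empty")
    return problems


print(registration_problems("alice1", "alice@example.org", "pw",
                            {"bob"}, {"example.org"}, set()))
```

An empty list means every rule passed. Only then would a verification email make sense.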

The following parameters are sent via POST from the registration page.

Parameters:

email (str) – The email of the new user
username (str) – The username of the new user
password (str) – The password of the new user

Returns:	a page to generate a new user or a confirmation page
Return type:	str
Raises:	cherrypy.HTTPRedirect back to the new user page without parameters if there was a problem entering the user into the database

Resetting Account Password ¶

WorkflowTools.resetpassword(email='', code='', password='')[source]¶

If a user forgets their username or password, navigating to https://localhost:8080/resetpassword will allow them to enter their email to reset their password. The email will contain the username and a link to reset the password.

This page is multifunctional, depending on which parameters are sent. The link actually redirects to this webpage with a secret code that will then allow you to submit a new password. The password is then submitted back here via POST.

Parameters:

email (str) – The email linked to the account
code (str) – confirmation code to activate the account
password (str) – the new password for a given code

Returns:	a webview depending on the inputs
Return type:	str
Raises:	404 if both email and code are filled

Manually Resetting Your Workflow Cache ¶

WorkflowTools.resetcache(prepid='', workflow='')[source]¶

The function is only accessible to someone with a verified account.

Navigating to https://localhost:8080/resetcache resets the error info for the user’s session. It also clears out cached JSON files on the server. Under normal operation, this cache is only refreshed every half hour.

Returns:	a confirmation page
Return type:	str

Redoing the Workflow Clusters ¶

WorkflowTools.cluster()[source]¶

The function is only accessible to someone with a verified account.

Navigating to https://localhost:8080/cluster causes the server to regenerate the clusters that it has stored. This is useful when the history database of past errors has been updated with relevant errors since the server has been started or this function has been called.

Returns:	a confirmation page
Return type:	str

Workflows Procedures ¶

Here is the list of recommended procedures for various exit codes:

Exit Code	Cause	Normal Procedure	Additional Procedure	Additional Match Expression
73	RootEmbeddedFileSequence: secondary file list seems to be missing. When overflow procedure is enabled, the agent should be able to see the secondary file set as present at those sites as well.	Open a GitHub issue
84	The file could not be found	ACDC	If a second round of ACDC and fails for same files, contact the transfer team and/or site support team.	root://[-/w.]+.root
85	If FileReadError, the file could not be read. If R__unzip: error, most likely a bad node with corrupted intermediate files.	ACDC	If getting unzip errors, most likely a bad node. Report immediately. If a second round of ACDC and fails for same files, contact the transfer team and/or site support team.	root://[-/w.]+.root R__unzip
92	File was not found and fallback procedure also failed.	ACDC	If a second round of ACDC and fails for same files, contact the transfer team and/or site support team.	root://[-/w.]+.root
134		Strong failure, report immediately
137	Likely an unrelated batch system kill. Sometimes a site-related problem.	Point SSP. Open GGUS ticket and put site in drain.
8001		Strong failure, report immediately
8002		Strong failure, report immediately
8004	BadAlloc - memory exhaustion	Report immediately to the Site Support Team.
11003	JobExtraction failures	ACDC
50513	Failure to run SCRAM setup scripts	ACDC
50660	Performance kill: “Job exceeding maxRSS”	More memory
61202	Can’t determine proxy filename. X509 user proxy required for job.	If temporary failure: ACDC, otherwise contact glideinWMS team
71104	Site containing only available copy of input datasets is in drain.	Copy input files from tape (?) or wait for site listed to come out of drain.		The job can run only at .(T[12].
71304	Job was killed by WMAgent for using too much wallclock time	ACDC
71305	Job was killed by WMAgent for using too much wallclock time	ACDC
71306	Job was killed by WMAgent for using too much wallclock time	ACDC
99109	Uncaught exception in WMAgent step executor (often staging out problems)	Check the Site Readiness and consult with the Site Support Team. If a temporary failure: ACDC. If a problem with the site: Make GGUS ticket, put the site in drain then kill and clone
99303	The agent has not found the JobReport.x.pkl file. Regardless of the status of the job, it will fail the job with 99303. Work is needed to make this error as rare as possible since there are very limited cases to make the file fail to appear.	Strong failure, report immediately

The module WorkflowWebTools.procedures, which is used by WorkflowWebTools.classifyerrors, contains only the PROCEDURES dictionary. The keys of this dictionary are the integer exit codes from the agent to act on. The values are dictionaries that contain 'normal' and 'additional' actions and a 'cause'. The 'additional' actions are dictionaries containing 're' and 'action'. The 're' field is a compiled regular expression to match to the logs. group(1) of this regular expression is taken to be a parameter to be displayed to the operator. The 'action' field gives additional instructions to the operator, who should inspect the parameters returned by the regular expression to determine the ultimate action to perform. The 'cause' adds some insight into the error codes in the documentation. It isn’t shown to the operator because they already see the different types of error codes and are provided with a link to look at the error logs directly.

An example procedure with just exit code 84 would look like the following:

import re

PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC and fails for same files, '
                       'contact the transfer team and/or site support team.')
            }
        }
    }

What this means is that error code 84 usually should be an ACDC. However, if the operator sees the same files appear from the regular expression, then the operator should contact the transfer team and/or the site support team.

These dictionaries are used to generate the procedures table above.
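
As a sketch of how such a dictionary might be consulted, the following looks up an exit code and collects group(1) of each regular expression match from a log. The function `suggest` and the log line are made up for illustration. How WorkflowWebTools.classifyerrors actually uses PROCEDURES is not shown here.

```python
import re

# Same shape as the example above, with exit code 84 only
PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC and fails for same files, '
                       'contact the transfer team and/or site support team.')
        }
    }
}


def suggest(exit_code, log_text):
    """Return (normal action, matched parameters, additional instructions)."""
    procedure = PROCEDURES.get(exit_code)
    if procedure is None:
        return None
    additional = procedure.get('additional', {})
    regex = additional.get('re')
    # group(1) of every match is the parameter shown to the operator
    params = sorted({m.group(1) for m in regex.finditer(log_text)}) if regex else []
    return procedure['normal'], params, additional.get('action', '')


LOG = "Failed to open root://host.example//store/file.root"  # made-up log line
print(suggest(84, LOG))
```

An operator who sees the same file names in `params` after a second ACDC would then follow the additional instructions.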

author:	Daniel Abercrombie <dabercro@mit.edu>

Running the Web Tools ¶

The webtools are usually operated behind a cherrypy server. Before running the script runserver/workflowtools.py, there are a few other things that you should set up first.

Setting Up Server Configuration ¶

The first script you should run is setup_server.sh. Run this from inside the runserver directory:

cd runserver
./setup_server.sh

This script sets up a basic server configuration.

First, it generates or links to certificates for HTTPS. It takes one argument for determining the openssl flag for the passphrase protocol. Valid options for generating passwords are:

-aes128
-aes192
-aes256

To link to an existing key instead, pass the option:

./setup_server.sh -link

The script performs the following steps:

Creates a keys directory
- Prompts for the locations of private key and certificates to create softlinks if the softlinks option is passed.
- Otherwise, a keys/privkey.pem with the password flag passed is created along with a cert.pem to use for SSL connections
Generates a random list of salts from generatesalt.py
Copies an example config.yml to the working directory

If the certificate is not linked, it is self-signed and valid for one year.

After completing these steps, the script reminds the user to edit config.yml.

Server Configuration ¶

The server configuration is set by a YAML file, config.yml. A number of parameters are set here, and read by the WorkflowWebTools.serverconfig:

The webmaster name and email determine who the registration emails are sent from.
The host name and port tell where to serve the cherrypy application if using the CherryPy server.
The data locations are split up into
- Historic information
- Location to fetch the current errors
- Location to fetch the current errors’ explanations
- If any of these files are behind Shibboleth, cookie_file, cookie_pem, and cookie_key should be listed under data too. These keys are used as kwargs for CMSToolBox.get_json()
Valid registration email domains and specific whitelisted emails
Actions stores the number of days to check the submission history for actions requested
Clustering parameters

See WorkflowWebTools.clusterworkflows for more details on the clustering parameters.

Updating the Error History ¶

update_history.py

A python script for updating the history database.

Can take arguments that are file names or urls. If a local file doesn’t exist, the url is assumed. The input file is then added to the central history database, as defined in your server config file. If no arguments are given, the central history database is just generated as empty.

It is recommended to set up a cron job to update your central history regularly. For example, adding the line:

0 * * * * <path/to>/update_history.py <URL to errors>

to your crontab will automatically update the central database every hour. Duplicate entries will not be added.
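
One common way to get this kind of duplicate protection in sqlite3 is a UNIQUE constraint combined with INSERT OR IGNORE, sketched below. The table name, the columns, and the helper `add_errors` are assumptions made for this example. The mechanism and schema that update_history.py really uses are not described in this documentation.

```python
import sqlite3


def add_errors(curs, rows):
    """Insert (workflow, stepname, sitename, errorcode, numbererrors) rows.

    Rows whose first four fields already exist are skipped, thanks to the
    UNIQUE constraint together with INSERT OR IGNORE.
    """
    curs.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "workflow TEXT, stepname TEXT, sitename TEXT, "
        "errorcode INTEGER, numbererrors INTEGER, "
        "UNIQUE (workflow, stepname, sitename, errorcode))"
    )
    curs.executemany("INSERT OR IGNORE INTO history VALUES (?, ?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
curs = conn.cursor()
rows = [("wf_a", "step1", "site_A", 84, 3)]  # made-up entry
add_errors(curs, rows)
add_errors(curs, rows)  # second run, for example the next cron invocation
curs.execute("SELECT COUNT(*) FROM history")
print(curs.fetchone()[0])  # prints 1
```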

author:	Daniel Abercrombie <dabercro@mit.edu>

Starting the CherryPy Server ¶

Finally, the service can be launched by running:

./workflowtools.py

If you need sudo privileges, to access a certain port for example, you can use the script:

./run.sh

You may need to adjust the values in runserver/setenv.sh first.

Running the Server Behind WSGI ¶

If running the site in a production environment, you will likely need to run behind WSGI to enable CERN’s SSO. Install mod_wsgi, which is not listed as a strict requirement for the package.

In your apache configuration, create a virtual host for the WSGI application. You will also need to allow access to the runserver/static directory. An example would be as follows:

Listen 443
<VirtualHost *:443>
    WSGIScriptAlias / <path_to_WorkflowWebTools>/runserver/workflowtools.py

    Alias /static <path_to_WorkflowWebTools>/runserver/static

    <Directory <path_to_WorkflowWebTools>/runserver>

        #
        # Add Shibboleth authentication here? (Not developed yet)
        #

        WSGIApplicationGroup %{GLOBAL}
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>

    </Directory>

    ServerName 127.0.0.1
    SSLEngine on
    SSLCertificateFile <path_to_cert>
    SSLCertificateKeyFile <path_to_private_key>

</VirtualHost>

Note

This is just what I have working on my laptop. This website has not been run behind WSGI in production yet. I would be grateful to learn about any errors in this section of the documentation.

Predicting Actions with SciKit-Learn ¶

There have been some preliminary attempts to create a model that predicts actions on a workflow based on the errors that it throws. The results of this preliminary training are presented in the slides located at WorkflowWebTools/docs/170705/dabercro_WorkflowWebTools_170705.pdf.

The following are instructions to run the training and test interactively:

First, create a JSON file that contains all of the features and actions of previous workflows. This can be done with the script located at WorkflowWebTools/runserver/create_actionshistorylink.py:
```
./create_actionshistorylink.py static/history.VERSION_NUMBER.json
```
This should be done from the production server. The argument is just the name of the output file.
Then download the output JSON file to the computer where you will be doing the training. The module WorkflowWebTools.paramsregression also contains a script for running the training interactively. This can be run from inside the WorkflowWebTools directory with the following:
```
python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json
```
By default, this will train for the 'action' field. To train a different parameter, give another argument to the script:
```
python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json memory
```

The easiest way to get the classifier from inside the server would be to call something like the following:

from WorkflowWebTools import actionshistorylink
from WorkflowWebTools import paramsregression

model = paramsregression.get_classifier(actionshistorylink.dump_json(), 'action')

The returned model then works like any other trained sklearn model.

Note

This may take a long time to return, so generating the classifier is only something you would want to do once in a while (such as when the server first turns on or when an approved user accesses some restricted address).

For more details on the functions used in the internals, see Machine Learning Functions (or the source code).

Troubleshooting ¶

This is just a quick guide for anyone trying to troubleshoot or restart the production server.

If the web service is not responding, first check that you are accessing the machine using CERN’s network (either by being physically at CERN or using a proxy). If you are certain that you are using the CERN network, then you will have to log into the server. The name of the server is not given due to potential security concerns. To complete all of the following steps, you need sudo access to the machine.

SSH into a CERN machine.
Try downloading actions from the machine:
wget -O test.json --no-check-certificate "https://vocms0113.cern.ch:80/getaction?days=30&acted=-1"
If the file test.json is not empty, then the service is likely still running properly. Go back to checking your own connection.
SSH into the server.

Check top for the process python2.7. If it is there, but the server is still not responding, kill the process.
Do the following to start the process again:
cd /home/dabercro/OpsSpace/WorkflowWebTools/runserver
./run.sh
Check the server status:
sudo tail -f /home/dabercro/OpsSpace/WorkflowWebTools/runserver/nohup.out
(You can exit tail with Ctrl + c.) Hopefully, the last line says something like:
ENGINE Bus STARTED

Maintaining the Python Backend ¶

For developers wishing to make adjustments to the modules or anyone else who wants to understand some of the backend of the server, all of the Python modules for this system are documented below.

Server Configuration ¶

Small module to get information from the server config.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.serverconfig.LOCATION = '/home/docs/checkouts/readthedocs.org/user_builds/cms-comp-ops-tools/checkouts/v0.7/WorkflowWebTools/test'¶

The string giving the location where serverconfig looks first for the configuration

WorkflowWebTools.serverconfig.all_errors_path()[source]¶

Returns:	the path to all errors for recent workflows
Return type:	str

WorkflowWebTools.serverconfig.config_dict()[source]¶

Returns:	the configuration in a dict
Return type:	dict
Raises:	Exception – when it cannot find the configuration file

WorkflowWebTools.serverconfig.get_cluster_settings()[source]¶

Returns:	dictionary containing the settings for clustering
Return type:	dict

WorkflowWebTools.serverconfig.get_history_length()[source]¶

Returns:	the number of days of history to check for workflows with an action submitted.
Return type:	int

WorkflowWebTools.serverconfig.get_valid_emails()[source]¶

Get iterator for valid email patterns for this instance. This is configurable by the webmaster.

Returns:	List of valid email patterns
Return type:	list

WorkflowWebTools.serverconfig.host_name()[source]¶

Returns:	the name of the host from the config
Return type:	str

WorkflowWebTools.serverconfig.host_port()[source]¶

Returns:	the port to host the site on
Return type:	str

WorkflowWebTools.serverconfig.wm_email()[source]¶

Returns:	the email of the webmaster
Return type:	str

WorkflowWebTools.serverconfig.wm_name()[source]¶

Returns:	the name of the webmaster
Return type:	str

WorkflowWebTools.serverconfig.workflow_history_path()[source]¶

Returns:	the path to the files of workflow error history
Return type:	str

Functions for Error Tracking ¶

Simple utils functions that do not rely on a session.

These are separated from globalerrors, since we do not need to generate an ErrorInfo instance in other applications.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.errorutils.add_to_database(curs, data_location)[source]¶

Add data from a file to a central database through the passed cursor

Parameters:

curs (sqlite3.Cursor) – is the cursor to the database
data_location (str or list) – If a string, this is the location of the file or url of data to add to the database. This should be in JSON format, and if a local file does not exist, a url will be assumed. If the url is invalid, an empty database will be returned. If a list, it’s a list of status to get workflows from wmstats.

WorkflowWebTools.errorutils.create_table(curs)[source]¶

Create the workflows error table with the proper format

Parameters:	curs (sqlite3.Cursor) – is the cursor to the database

WorkflowWebTools.errorutils.get_list_info(status_list)[source]¶

Get the list of workflows that match the statuses listed via workflowinfo.

Parameters:	status_list (list) – The list of workflow statuses to get the info for
Returns:	The workflow info dictionary, which matches the format of the unified all_errors.json.
Return type:	dict

WorkflowWebTools.errorutils.open_location(data_location)[source]¶

This function assumes that the contents of the location is in JSON format. It opens the data location and returns the dictionary.

Parameters:	data_location (str) – The location of the file or url
Returns:	information in the JSON file
Return type:	dict

Global Errors ¶

Generates the content for the errors pages

author:	Daniel Abercrombie <dabercro@mit.edu>

class WorkflowWebTools.globalerrors.ErrorInfo(data_location='')[source]¶

Holds the information for any errors for a session

connection_log(action)[source]¶

Logs actions on the sqlite3 connection

Parameters:	action (str) – is the action on the connection

execute(query, params=None)[source]¶

Locks the internal database and makes the query.

Parameters:

query (str) – The query, which can include ‘?’
params (tuple) – The parameters to pass into the query

Returns:	The results of the query
Return type:	list
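
A lock-guarded query of this kind might look like the sketch below. The class `LockedDatabase` is hypothetical and much smaller than ErrorInfo. It only illustrates the pattern of holding a `threading.Lock` around every use of a shared sqlite3 connection.

```python
import sqlite3
import threading


class LockedDatabase(object):
    """Serialize access to one sqlite3 connection shared between threads."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets other threads use the connection;
        # the lock below is what makes that safe
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()

    def execute(self, query, params=None):
        """Run the query under the lock and return all result rows as a list."""
        with self.lock:
            curs = self.conn.cursor()
            curs.execute(query, params or ())
            result = curs.fetchall()
            self.conn.commit()
            return result


db = LockedDatabase()
db.execute("CREATE TABLE t (x INTEGER)")
db.execute("INSERT INTO t VALUES (?)", (1,))
selected = db.execute("SELECT x FROM t WHERE x = ?", (1,))
print(selected)
```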

get_allmap()[source]¶

Returns:	A dictionary that maps ‘errorcode’, ‘stepname’, and ‘sitename’ to the lists of all the errors, steps, or sites
Return type:	dict

get_prepid(prep_id)[source]¶

Parameters:	prep_id (str) – The name of the Prep ID to check cache for
Returns:	Either cached PrepIDInfo, or a new one
Return type:	WorkflowWebTools.workflowinfo.PrepIDInfo

get_step_list(workflow)[source]¶

Gets the list of steps within a workflow

Parameters:	workflow (str) – Name of the workflow to gather information for
Returns:	list of steps within the workflow
Return type:	list

get_step_table(step, readymatch=None)[source]¶

Get the sparse representation of the step table. Fetches from an internal dictionary, so faster than database access

Parameters:

step (str) – The step name for the table
readymatch (list) – The list of site readiness statuses to match

Returns:	The list used to build the step table. Each element of the list is a tuple of `(number of errors, site name, exit code)`
Return type:	list of tuples
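
To show what the sparse representation means, the sketch below expands a list of `(number of errors, site name, exit code)` tuples into a dense 2-D table. The function name, the sample values, and the choice of exit codes as rows and sites as columns are assumptions of this example, not taken from the source.

```python
def dense_table(sparse, sites, codes):
    """Expand (number of errors, site name, exit code) tuples into a 2-D table.

    Rows follow `codes` and columns follow `sites`; cells without an entry are 0.
    """
    row_of = {code: i for i, code in enumerate(codes)}
    col_of = {site: j for j, site in enumerate(sites)}
    table = [[0] * len(sites) for _ in codes]
    for count, site, code in sparse:
        table[row_of[code]][col_of[site]] += count
    return table


SPARSE = [(3, "site_A", 84), (1, "site_B", 85)]  # made-up entries
print(dense_table(SPARSE, ["site_A", "site_B"], [84, 85, 92]))
```

The sparse list only stores the two non-zero cells, while the dense table has six cells, most of them zero.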

get_workflow(workflow)[source]¶

This should be used to get the workflow info so that there is no redundant fetching for a single session.

Parameters:	workflow (str) – The prep ID for a workflow
Returns:	Cached WorkflowInfo from the ToolBox.
Return type:	WorkflowWebTools.workflowinfo.WorkflowInfo

return_workflows()[source]¶

Returns:	the ordered list of all workflow prep IDs that need attention
Return type:	list

set_all_lists()[source]¶

Sets the list of all steps, sites, and errors for an ErrorInfo object. This should be called if data is added to the ErrorInfo cursor manually.

setup()[source]¶

Create an SQL database from the all_errors.json generated by production

teardown()[source]¶

Close the database when cache expires

WorkflowWebTools.globalerrors.TITLEMAP = {'errorcode': 'error code', 'sitename': 'site name', 'stepname': 'workflow'}¶

Dictionary that determines how a chosen pievar shows up in the pie chart titles

WorkflowWebTools.globalerrors.check_session(session, can_refresh=False)[source]¶

If session is None, fills it.

Parameters:

session (cherrypy.Session) – the current session
can_refresh (bool) – tells the function if it is safe to refresh and close the old database

Returns:	ErrorInfo of the session
Return type:	ErrorInfo

WorkflowWebTools.globalerrors.default_errors_format()[source]¶

Gives a defaultdict with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}

Returns:	A defaultdict for building errors
Return type:	collections.defaultdict
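
One way to build a defaultdict with this nesting is sketched below. It reads the format as group, then the 'errors' key, then two further levels ending in an integer count. That reading, the function name, and the sample keys are assumptions of this example. The real function may differ in detail.

```python
from collections import defaultdict


def errors_format():
    """Nested defaultdict: group -> 'errors' -> row -> column -> int count."""
    return defaultdict(
        lambda: {'errors': defaultdict(lambda: defaultdict(int))}
    )


errors = errors_format()
# Missing levels are created on first access, so counts can be added directly
errors['campaign_1']['errors']['step_1']['site_A'] += 2
errors['campaign_1']['errors']['step_1']['site_B'] += 1
print(errors['campaign_1']['errors']['step_1']['site_A'])  # prints 2
```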

WorkflowWebTools.globalerrors.get_errors(pievar, session=None)[source]¶

Gets the number of errors with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}

where each group is a different value for the variables that go into the row of the global errors table. That is, the groups will usually be the subtask list, unless pievar is "stepname". In that case, the grouping is by error code.

Parameters:

pievar (str) – The variable that each piechart is split into.
session (cherrypy.Session) – Stores the information for a session

Returns:	A dictionary of 2D list of errors.
Return type:	defaultdict

WorkflowWebTools.globalerrors.get_row_col_names(pievar)[source]¶

Get the column and row for the global table view, based on user input

Parameters:	pievar (str) – The variable to divide the piecharts by.
Returns:	The names of the global table rows, and the table columns
Return type:	(str, str)

WorkflowWebTools.globalerrors.get_step_table(step, session=None, allmap=None, readymatch=None, sparse=False)[source]¶

Gathers the errors for a step into a 2-D table of ints

Parameters:

step (str) – name of the step to get the table for
session (cherrypy.Session) – Stores the information for a session
allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
readymatch (tuple) – Match the readiness statuses in this tuple, if set
sparse (bool) – Determines whether or not a sparse matrix is returned

Returns:	A table (made of lists) of errors for the step or a sparse dictionary of entries
Return type:	list of lists or dict of dicts of ints

WorkflowWebTools.globalerrors.group_errors(input_errors, grouping_function, **kwargs)[source]¶

Takes inputs errors with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}

and sums the errors into a larger group. This second grouping is done by the output of the grouping_function.

Parameters:

input_errors (dict) – The input that will be grouped
grouping_function (function) – Takes an input, which is a key of `input_errors` and groups those keys by this function output.
kwargs – The keyword should point to a function. That keyword will be added to the dictionary of each group. Its value will be the function output with the group as an argument.

Returns:	A dictionary with the same format as the input, but with groupings.
Return type:	defaultdict
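
The grouping described above can be sketched as follows. This is only an illustration, not the module's implementation: it assumes each value of `input_errors` holds an 'errors' dictionary nested two levels deep, and the name `group_errors_sketch`, the sample keys, and the site names are hypothetical.

```python
from collections import defaultdict


def group_errors_sketch(input_errors, grouping_function, **kwargs):
    """Sum nested error counts into larger groups (illustrative only)."""
    output = defaultdict(
        lambda: {'errors': defaultdict(lambda: defaultdict(int))}
    )

    for key, value in input_errors.items():
        # The grouping function decides which larger group a key joins
        group = grouping_function(key)
        for outer, inner in value['errors'].items():
            for name, count in inner.items():
                output[group]['errors'][outer][name] += count

    # Each keyword becomes an extra entry whose value is func(group)
    for group in list(output):
        for keyword, func in kwargs.items():
            output[group][keyword] = func(group)

    return output


# Hypothetical input: two subtasks that share the same step name
sample = {
    '/wf_a/step1': {'errors': {'8001': {'T1_US_FNAL': 2, 'T2_CH_CERN': 1}}},
    '/wf_b/step1': {'errors': {'8001': {'T1_US_FNAL': 3}}},
}
grouped = group_errors_sketch(
    sample, lambda key: key.split('/')[-1], name_length=len
)
```

Here both subtasks collapse into the group 'step1', and the extra keyword `name_length` stores the result of calling `len` on the group name.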

WorkflowWebTools.globalerrors.list_matching_pievars(pievar, row, col, session=None)[source]¶

Return an iterator of variables in pievar, and number of errors for a given rowname and colname

Parameters:

pievar (str) – The variable to return an iterator of
row (str) – Name of the row to match
col (str) – Name of the column to match
session (cherrypy.Session) – stores the session information
Returns:	List of tuples containing name of pievar and number of errors
Return type:	list

WorkflowWebTools.globalerrors.see_workflow(workflow, session=None)[source]¶

Gathers the error information for a single workflow

Parameters:

workflow (str) – Name of the workflow to gather information for
session (cherrypy.Session) – Stores the information for a session
Returns:	Dictionary used to generate webpage for a requested workflow
Return type:	dict

Workflow Info ¶

Module containing and returning information about workflows.

authors:	Daniel Abercrombie <dabercro@mit.edu>

class WorkflowWebTools.workflowinfo.Info[source]¶

Implements shared operations on the cache

cache_filename(attribute)[source]¶

Return the name of the file for caching

Parameters:	attribute (str) – The information to store in the file
Returns:	The full file name to store the cache
Return type:	str

reset()[source]¶: Reset the cache for this object and clear out the files.

class WorkflowWebTools.workflowinfo.PrepIDInfo(prep_id, url='cmsweb.cern.ch')[source]¶

A class that just holds a small amount of information about a given PrepID.

get_requests(*args, **kwargs)[source]¶

Returns:	The requests for the Prep ID from ReqMgr2 API
Return type:	dict

get_workflows()[source]¶

Returns:	A list of tuples containing (workflow name, timestamp of request)
Return type:	list

get_workflows_requesttime()[source]¶

Returns:	A list of tuples containing (workflow name, timestamp of request)
Return type:	list

class WorkflowWebTools.workflowinfo.WorkflowInfo(workflow, url='cmsweb.cern.ch')[source]¶

Class that holds methods for accessing various information about a workflow.

get_errors(*args, **kwargs)[source]¶

A wrapper for errors_for_workflow() if you happen to have a WorkflowInfo object already.

Parameters:	get_unreported (bool) – Get the unreported errors from ACDC server
Returns:	a dictionary containing error codes in the following format: {step: {errorcode: {site: number_errors}}}
Return type:	dict

get_explanation(errorcode, step='')[source]¶

Gets a list of error logs for a given error code.

Parameters:

errorcode (str) – The error code to explain
step (str) – The full name of the step to return explanations from
Returns:	list of error logs
Return type:	list

get_prep_id()[source]¶

Returns:	the PrepID for this workflow
Return type:	str

get_recovery_info(*args, **kwargs)[source]¶

Get the recovery info for this workflow.

Returns:	a dictionary containing the information used in recovery. The keys in this dictionary are arranged like the following: { task: { 'sites_to_run': list(sites), 'missing_to_run': int() } }
Return type:	dict

get_workflow_parameters(*args, **kwargs)[source]¶

Gets the workflow parameters from ReqMgr2, or returns a cached value. See the ReqMgr 2 wiki for more details.

Returns:	Parameters for the workflow from ReqMgr2.
Return type:	dict

site_to_run(task)[source]¶

Gets a list of sites that a task in the workflow can run at

Parameters:	task (str) – The full name of the task to find sites for
Returns:	a list of sites to run at
Return type:	list

WorkflowWebTools.workflowinfo.cached_json(attribute, timeout=None)[source]¶

A decorator for caching dictionaries in local files.

Parameters:

attribute (str) – The key of the `WorkflowInfo` cache to set using the decorated function.
timeout (int) – The amount of time before refreshing the JSON file, in seconds.
Returns:	Function decorator
Return type:	func

WorkflowWebTools.workflowinfo.errors_for_workflow(workflow, url='cmsweb.cern.ch')[source]¶

Get the useful status information from a workflow

Parameters:

workflow (str) – the name of the workflow request
url (str) – the base url to find the information at

Returns:

a dictionary containing error codes in the following format:

{step: {errorcode: {site: number_errors}}}

Return type:

dict
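
To show how the nested return format can be walked, the sketch below totals the errors per error code and picks the most frequent one. The function name `max_errorcode_sketch` and the sample step, code, and site names are hypothetical; this is not the code behind `get_max_errorcode()`.

```python
from collections import Counter


def max_errorcode_sketch(errors):
    """Return the error code with the most errors, or None if empty.

    `errors` is assumed to follow {step: {errorcode: {site: number_errors}}}.
    """
    totals = Counter()
    for codes in errors.values():
        for errorcode, sites in codes.items():
            totals[errorcode] += sum(sites.values())

    if not totals:
        return None
    return totals.most_common(1)[0][0]


# Hypothetical errors for one workflow with two steps
sample_errors = {
    '/wf/step1': {
        '8001': {'T1_US_FNAL': 2},
        '50664': {'T2_CH_CERN': 5},
    },
    '/wf/step2': {
        '8001': {'T2_CH_CERN': 4},
    },
}
most_common_code = max_errorcode_sketch(sample_errors)
```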

WorkflowWebTools.workflowinfo.explain_errors(workflow, errorcode)[source]¶

Get example errors for a given workflow and errorcode

Parameters:

workflow (str) – is the workflow name
errorcode (str) – is the error code
Returns:	a dict of log snippets from different sites.
Return type:	list

WorkflowWebTools.workflowinfo.list_workflows(status)[source]¶

Get the list of workflows currently in a given status. For a list of valid requests, visit the Request Manager Interface.

Parameters:	status (str) – The status of the workflow lists being looked for
Returns:	A list of workflows matching the status
Return type:	list

Workflow Clustering ¶

Tools for clustering workflows based on their errors.

The errors are clustered based on a generated vector. The vector can be thought of as lying on two different hyper-sphere shells. One sphere is for the error codes that occur and the other sphere is for the sites where those errors occur. The overall distance between two workflows will be the sum in quadrature of the distances on these two hyper-spheres.

These distances are configurable in the server config.yml. Each sphere has a center radius, thickness, and number of errors to be at the midpoint. The equation to determine distance from the origin for a vector of errors is the following.

\[\mathrm{distance} = \frac{d}{\sqrt{2}} + 2 w \left(\frac{|\vec{v}|}{|\vec{v}| + m} - 0.5\right)\]

The following parameters are set in config.yml for the site name and the error code hyperspheres separately.

m is the ‘midpoint’ parameter: the number of errors at a given site or given error code that will place the workflow at the midpoint of the hypersphere shell.
d is the ‘distance’ parameter: the cartesian distance between the midpoints of two different sites or error codes.
w is the ‘width’ parameter: the total shell width that the workflow could possibly land on.

The direction is determined by the error code or site name distribution. Note that this always points in the upper quadrant for a given coordinate.

The equation is chosen so that two workflows that have completely different errors at the same site, and the error width is 0, will end up with a distance that is equal to the error distance when clustering. This way, site and errors can have different distance weights, and there can be some separation for the number of errors in a workflow (for non-zero width).
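
The equation and its motivation can be checked numerically with a short sketch. The function names here are hypothetical, and placing a workflow with no errors at the origin is an assumption made only so the example is well defined.

```python
import math


def shell_radius(counts, distance, width, midpoint):
    """distance = d / sqrt(2) + 2 w (|v| / (|v| + m) - 0.5)"""
    norm = math.sqrt(sum(count ** 2 for count in counts))
    return (distance / math.sqrt(2) +
            2 * width * (norm / (norm + midpoint) - 0.5))


def workflow_vector(counts, distance, width, midpoint):
    """Scale the error distribution so that it lies on the shell."""
    norm = math.sqrt(sum(count ** 2 for count in counts))
    if norm == 0:
        # Assumption: a workflow without errors sits at the origin
        return [0.0] * len(counts)
    radius = shell_radius(counts, distance, width, midpoint)
    return [radius * count / norm for count in counts]


def euclidean(vec_a, vec_b):
    """Cartesian distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def total_distance(error_dist, site_dist):
    """Sum in quadrature of the two hyper-sphere distances."""
    return math.sqrt(error_dist ** 2 + site_dist ** 2)


# Two workflows with completely different error codes and zero width
first = workflow_vector([3, 0], distance=1.0, width=0.0, midpoint=5.0)
second = workflow_vector([0, 7], distance=1.0, width=0.0, midpoint=5.0)
error_dist = euclidean(first, second)
```

With zero width, both vectors have length d / sqrt(2) along perpendicular axes, so `error_dist` equals the configured distance d. If both workflows fail at the same site, the site distance is 0 and `total_distance(error_dist, 0.0)` is again d.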

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.clusterworkflows.CLUSTER_LOCK = <thread.lock object>¶: Lock that should be acquired before running clustering functions in here

WorkflowWebTools.clusterworkflows.get_clustered_group(workflow, clusterer, session=None)[source]¶

Get the group for a given workflow in this session

Parameters:

workflow (str) – The workflow to get the group for.
clusterer (sklearn.cluster.KMeans) – is the clusterer fit with historic data.
session (cherrypy.Session) – Stores the information for a session
Returns:	Set of other workflows in the same group
Return type:	set

WorkflowWebTools.clusterworkflows.get_clusterer(history_path, errors_path='')[source]¶

Use this function to get the clusterer of workflows

Parameters:

history_path (str) – Path to the workflow historical data. This can be a local file path or a URL.
errors_path (str) – The errors for a given session to include in the clustering
Returns:	A dict of a clusterer that is fitted to historical data with its allmap. The keys are ‘clusterer’ and ‘allmap’.
Return type:	dict

WorkflowWebTools.clusterworkflows.get_workflow_groups(clusterer, session=None)[source]¶

Groups workflows together based on a fitted clusterer

Parameters:

clusterer (dict) – is a dictionary with the clusterer fit with historic data and the allmap to generate it. This matches the output of `get_clusterer()`.
session (cherrypy.Session) – Stores the information for a session
Returns:	A dictionary pointing workflows to a group
Return type:	dict

WorkflowWebTools.clusterworkflows.get_workflow_vectors(workflows, session=None, allmap=None)[source]¶

Gets the errors for workflows in a list of numpy arrays

Parameters:

workflows (str) – the workflows that vectors are returned for
session (cherrypy.Session) – Stores the information for a session
allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
Returns:	a list of numpy arrays of errors for the workflow
Return type:	list of numpy.array

Manage Users ¶

Module to manage users of WorkflowWebTools.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.manageusers.add_user(email, username, password, url)[source]¶

Adds the user to the users database and sends a verification email, if the parameters are valid.

Parameters:

email (str) – The user email to send verification to. Make sure to check for valid domains.
username (str) – The username of the account to add
password (str) – The password to be stored
url (str) – The base url to redirect the user to for validating their account
Returns:	Success code. 0 for adding user, 1 for not adding
Return type:	int

WorkflowWebTools.manageusers.confirmation(code, lookup='validator', return_curs=False)[source]¶

Determine the user from the confirmation code and activate the account

Parameters:

code (str) – the confirmation code for the user
lookup (str) – The field to look up the username with
return_curs (bool) – If false, closes the connection
Returns:	the user name if valid code, or ‘’ if not. Followed by conn, curs if return_curs is True
Return type:	str [, sqlite3.Connection, sqlite3.Cursor]

WorkflowWebTools.manageusers.do_salt_hash(to_hash)[source]¶

Salt and hash an object into a storable form.

Parameters:	to_hash (str) – Alternate salt and hashing to get stored variable
Returns:	A salty hash
Return type:	str
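
For readers unfamiliar with the idea, the sketch below shows one common way to salt and hash a password with the standard library. It swaps in PBKDF2 with a random per-password salt; this is not necessarily what `do_salt_hash()` does, and the function names and the stored `salt$digest` format are assumptions for illustration.

```python
import binascii
import hashlib
import hmac
import os


def salt_hash_sketch(to_hash, salt=None, iterations=100000):
    """Return 'salt$digest' (both hex) for the given string."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored value
    digest = hashlib.pbkdf2_hmac(
        'sha256', to_hash.encode('utf-8'), salt, iterations
    )
    return '%s$%s' % (binascii.hexlify(salt).decode('ascii'),
                      binascii.hexlify(digest).decode('ascii'))


def verify_sketch(attempt, stored, iterations=100000):
    """Re-hash the attempt with the stored salt and compare safely."""
    salt_hex = stored.split('$')[0]
    expected = salt_hash_sketch(
        attempt, binascii.unhexlify(salt_hex), iterations
    )
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(expected, stored)


stored_value = salt_hash_sketch('correct horse')
```

Only `stored_value` would be written to the database; a login attempt is then checked with `verify_sketch(attempt, stored_value)`.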

WorkflowWebTools.manageusers.get_user_db()[source]¶

Gets the users database in the local directory.

Returns:	the users connection, cursor
Return type:	sqlite3.Connection, sqlite3.Cursor

WorkflowWebTools.manageusers.resetpassword(code, password)[source]¶

Resets the password for a user.

Parameters:

code (str) – is the validation code for the account
password (str) – is the new password
Returns:	user from the password reset
Return type:	str

WorkflowWebTools.manageusers.send_reset_email(email, url)[source]¶

Generates a confirmation code for a given account and sends a reset email.

Parameters:

email (str) – the email linked to the account
url (str) – the url of the running instance

WorkflowWebTools.manageusers.validate_password(_, username, password)[source]¶

Verifies users’ logon attempts. This is called automatically by cherrypy, hence the unused parameter.

Parameters:

_ (str) – The hostname
username (str) – The attempted username
password (str) – The attempted password
Returns:	Whether or not the login attempt succeeds
Return type:	bool

Reasons Manipulation ¶

Module for manipulating the database that stores past operator reasons.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.reasonsmanip.get_reasons()[source]¶

Gets the reasons database in the local directory.

Returns:	the reasons connection, cursor
Return type:	(sqlite3.Connection, sqlite3.Cursor)

WorkflowWebTools.reasonsmanip.reasons_list()[source]¶

Get the full list of reasons

Returns:	all of the reasons in a dictionary with the short reasons being the key
Return type:	dict

WorkflowWebTools.reasonsmanip.short_reasons_list()[source]¶

Get the list of short reasons

Returns:	the list of short reasons
Return type:	list of strs

WorkflowWebTools.reasonsmanip.update_reasons(reasons)[source]¶

Gets the reasons for a given action and updates the reasons db

Parameters:	reasons (list of dicts) – is a list of dictionaries of short and long reasons, with keys ‘short’ and ‘long’
Raises:	TypeError – if the parameter is not a list KeyError – if the dictionaries in the list do not have the correct structure

Manage Actions ¶

Module to manage actions of WorkflowWebTools.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.manageactions.extract_reasons_params(action, **kwargs)[source]¶

Extracts the reasons and parameters for an action from kwargs

Parameters:

action (str) – The action being asked for by the operator
kwargs – Keywords that are submitted via POST to /submitaction
Returns:	reasons for action, and parameters for the action
Return type:	list, dict

WorkflowWebTools.manageactions.fix_sites(**kwargs)[source]¶

Fix the site lists for tasks that had zero sites.

Parameters:	kwargs – Keywords to be parsed by `extract_reasons_params()`

WorkflowWebTools.manageactions.get_acted_workflows(num_days)[source]¶

Get all of the workflows that have actions assigned

Parameters:	num_days (int) – is the number of past days to check for actions. This speeds up the check while not losing the ability to look farther back in time.
Returns:	a list of workflows acted on
Return type:	list

WorkflowWebTools.manageactions.get_actions(num_days=None, num_hours=24, acted=0)[source]¶

Get the recent actions to be acted on in dictionary form

Parameters:

num_days (int) – is the number of days to check for actions
num_hours (int) – can be used instead of num_days for finer granularity. If `num_days` is given, `num_hours` is ignored.
acted (int) – Flag (0, 1, or None) for whether to use acted or not
Returns:	A dictionary of actions, to be rendered as JSON
Return type:	dict

WorkflowWebTools.manageactions.get_actions_collection()[source]¶

Gets the actions collection from MongoDB.

Returns:	the actions collection
Return type:	pymongo.collection.Collection

WorkflowWebTools.manageactions.get_datetime_submitted(workflow)[source]¶

Get the datetime for a submitted workflow

Parameters:	workflow (str) – A workflow to get the time submitted
Returns:	A datetime object, or None if the workflow has never been submitted
Return type:	datetime.datetime or None

WorkflowWebTools.manageactions.report_actions(workflows, output=None)[source]¶

Mark actions as acted on

Parameters:

workflows (list) – is the list of workflows to no longer show
output (dict) – If set, a reference to a dictionary to update with details of what was and wasn’t set to acted

WorkflowWebTools.manageactions.submitaction(user, workflows, action, session=None, **kwargs)[source]¶

Writes the action to Unified and notifies the user that this happened

Parameters:

user (str) – is the user that submitted the action
workflows (str) – is the original workflow name or workflows
action (str) – is the suggested action for Unified to take
session (cherrypy.Session) – the current session
kwargs – can include various reasons and additional datasets
Returns:	a tuple of workflows, reasons, and params for the action
Return type:	list, str, list of dicts, dict

Show Log ¶

Generates content for the showlog webpage

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.showlog.give_logs(query, module='', limit=50)[source]¶

Takes output from CMSToolBox.elasticsearch.search_logs() and organizes it to pass to a mako template

Parameters:

query (str) – The query to pass to `CMSToolBox.elasticsearch.search_logs()`
module (str) – The module to show
limit (int) – The max number of logs to show on a single page
Returns:	dictionary with [‘logs’], which is a list of logs, and [‘meta’], which shows meta information of newest query result
Return type:	dict

Classifying Errors ¶

This module reads the explain error info for a session and tries to classify them in a way that can help recommend procedures to the operator. These procedures are gathered from WorkflowWebTools.procedures.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.classifyerrors.classifyerror(errorcode, workflow, session=None)[source]¶

Return the most relevant characteristics of an error code for this session. This will include things like:

Files not opening
StageOut errors

Note

More error types should be added to this function as needed

Parameters:

errorcode (int) – The error code that we want to classify
workflow (str) – the workflow that we want to get the errors from
session (cherrypy.Session) – Is the user’s cherrypy session
Returns:	A tuple of strings describing the key characteristics of the errorcode. These strings are good for printing directly in web browsers. The first string is the types of errors reported with this error code. The second string is the recommended normal actions. The third string contains special parameters useful for additional actions.
Return type:	str, str, str

WorkflowWebTools.classifyerrors.get_max_errorcode(workflow, session=None)[source]¶

Get the errorcode with the most errors for a session

Parameters:

workflow (str) – Is the primary name of the workflow
session (cherrypy.Session) – Is the user’s cherrypy session
Returns:	The error code that appears most often for this workflow
Return type:	int

List Workflows At Site ¶

This module provides lists of errors per pie variable

author:	Daniel Abercrombie

WorkflowWebTools.listpage.listworkflows(error_code, site_name, workflow, session=None)[source]¶

Gives back a list of tuples containing pie variables and the number of errors that matches a given error_code and site_name.

Parameters:

error_code (int) – The error code that we want errors for
site_name (str) – The site name that we want errors for
workflow (str) – The workflow that we want errors for
session (cherrypy.Session) – holds the session information
Returns:	List of tuples of pie variable and errors, sorted by number of errors
Return type:	list

Machine Learning Functions ¶

The actionshistorylink module creates files that link the action and the errors thrown for each task where the action and history are both stored on the server. The config.yml file is read to determine the locations of these history databases.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.actionshistorylink.dump_json(file_name=None)[source]¶

Dumps a list of pairs into a file and returns the dictionary. Each pair is a dictionary of errors and an action document. Each element in the list corresponds to a different subtask.

Parameters:	file_name (str) – The location to place the json file, if set
Returns:	The errors and actions for each subtask
Return type:	dict

The paramsregression module uses neural net classifiers to predict parameters for error handling workflows.

Note

Regression part is a work in progress. Need more data.

author:	Daniel Abercrombie <dabercro@mit.edu>

WorkflowWebTools.paramsregression.convert_to_dense(errors, keys=None, allerrors=None, allsites=None)[source]¶

Take a dictionary of sparse matrices, where the sparse matrices have keys of error code and then site names, and return the equivalent dense matrices.

Parameters:

errors (dict) – The container for sparse matrices.
keys (list) – Keys for each of the matrices. Defaults to ‘good_sites’ and ‘bad_sites’
allerrors (list) – An ordered list of all the errors to consider. If both this and allsites are blank, this function will just pull lists from the sparse matrices
allsites (list) – An ordered list of all the sites to consider
Returns:	Container for two dense matrices
Return type:	dict of lists of lists
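
The sparse-to-dense conversion can be sketched as below. This is an illustration under assumptions rather than the module's code: the name `convert_to_dense_sketch` and the sample codes and sites are hypothetical, and the sketch assumes that `allerrors` and `allsites` are either both given or both left blank.

```python
def convert_to_dense_sketch(errors, keys=None, allerrors=None, allsites=None):
    """Turn {key: {errorcode: {site: count}}} into dense lists of lists."""
    keys = keys or ['good_sites', 'bad_sites']

    if not allerrors and not allsites:
        # Pull the row and column labels from the sparse matrices
        allerrors = sorted({code for key in keys for code in errors[key]})
        allsites = sorted({
            site
            for key in keys
            for sites in errors[key].values()
            for site in sites
        })

    # Rows follow allerrors, columns follow allsites; gaps become 0
    return {
        key: [
            [errors[key].get(code, {}).get(site, 0) for site in allsites]
            for code in allerrors
        ]
        for key in keys
    }


# Hypothetical sparse input with one entry in each matrix
sparse = {
    'good_sites': {'8001': {'T1_US_FNAL': 2}},
    'bad_sites': {'50664': {'T2_CH_CERN': 1}},
}
dense = convert_to_dense_sketch(sparse)
```

Because the labels are sorted as strings, the rows are ordered '50664' then '8001', and the columns 'T1_US_FNAL' then 'T2_CH_CERN'. Both dense matrices share this ordering, which is what makes them usable as fixed-length classifier inputs.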

WorkflowWebTools.paramsregression.get_classifier(raw_data, parameter, **kwargs)[source]¶

Fit a classifier. If the module is run as a script, just print the training and test data output. Otherwise, return the classifier for further use.

Parameters:

raw_data (dict) – Raw data in the form of output from `actionshistorylink.dump_json()`.
parameter (str) – The parameter to classify.
kwargs – These are kwargs for the `sklearn.neural_network.MLPClassifier` that is running underneath.
Returns:	Trained classifier model
Return type:	sklearn.neural_network.MLPClassifier

WorkflowWebTools.paramsregression.main()[source]¶: This is for testing.

JavaScript and Mako User Interface ¶

Some documentation of the JavaScript used on the webpages is given below.

addreason.js ¶

This file contains the functions used to add reasons to the Detailed Workflow View. The fields added by this script are handled by the WorkflowWebTools.reasonsmanip module.

Todo

This is some of the messiest code in the package. Need to implement some JSLint check.

author:	Daniel Abercrombie <dabercro@mit.edu>

count¶: Keeps track of the number of reasons on a given page.

getDefaultReason(num)¶

This function is used to generate a new reason on the Detailed Workflow View. Each name and ID is unique so that the short and long reasons can be matched.

Arguments:	num (int) – is a unique number for each reason submitted on the current page.
Returns:	the innerHTML for a reason div when a new one is generated.
Return type:	str

addReason()¶: Creates a new reason division and fills it with the result from getDefaultReason(). It increments count. Finally, the division is appended to the “reasons” division.

removeReason(childId)¶

Removes a reason from the “reasons” division.

Arguments:	childId (str) – is the full ID of the child to remove.

fillLongReason(number)¶

If a short reason is selected from the select field, this function displays the long reason.

Arguments:	number (int) – is the ID number of the reason to fill.

checkReason(num)¶: Checks if a short reason entered by the user is already taken or not. If it is taken, the function gives a warning message and displays the corresponding long reason.

expand_children(arrow, this_level, this_name, only_hide)¶

This function expands or collapses the rows underneath a given header. The collapsing happens recursively.

Arguments:

this_level – Tells what level the row expansion is on. (0, 1, or 2, for example)
this_name – Give the name of the row doing the expansion.
only_hide – Set this to true for recursive collapsing. Prevents expansion when not desired.

makeparams.js ¶

This file contains the functions used to add parameters to the page when different actions are selected. The fields added by this script are handled by the WorkflowWebTools.manageactions module.

author:	Daniel Abercrombie <dabercro@mit.edu>

taskTable(opts, texts, taskNumber)¶

Creates a “table” for parameters to be filled in and returns the <div> these are contained in.

Arguments:

opts (dict) – Each key is the name of the parameter. Each value is the list of possible values for this parameter.
texts (list) – A list of parameters that should be set by a text field.
taskNumber (int) – Separates the different parameters for each task in ACDC.

Returns:	The <div> element that contains the table

printSiteLists(method)¶: Sets the contents of the site tables under the “acdc” action based on the method of site selection subsequently selected.

makeParamTable(action)¶: Offers the various options that are available for a given action.

checkuser.js ¶

JavaScript functions that are used when registering a new user or resetting a password. The password check is not done by any other part of the system, but all other checks are performed by the Python backend on submission for security purposes.

author:	Daniel Abercrombie <dabercro@mit.edu>

checkPassword()¶: Checks input fields with the ids #firstword and #secondword. If they match, sets the innerHTML of the div #confirmresult with a green positive message. Otherwise the div is filled with a red warning message.

validateForm()¶: Checks the form for registering a new user. The requirements for the form values are given in Creating a New Account.

piechart.js ¶

Contains the drawPies function for the global errors page.

author:	Daniel Abercrombie <dabercro@mit.edu>

drawPies()¶: This function finds multiple canvases and draws pie charts onto them. The number of errors for each canvas is given by a JSON string hidden in the table with id ‘errortable’. The piecharts are split with the largest contributor always having the same color. The radius of the piechart is a fraction of the canvas size, determined by the following equation:

\[\frac{r}{r_{canvas}} = \frac{n_{errors}}{n_{errors} + 10}\]
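
As a quick worked example of this equation, the sketch below evaluates the radius for a few error counts. It is written in Python for consistency with the backend examples, although the actual drawing is done in JavaScript; the function name `pie_radius` and the `scale` argument are hypothetical.

```python
def pie_radius(n_errors, canvas_radius, scale=10):
    """Radius r such that r / r_canvas = n_errors / (n_errors + scale)."""
    return canvas_radius * n_errors / float(n_errors + scale)


# Radii for a canvas radius of 50 pixels and growing error counts
radii = [pie_radius(n_errors, 50) for n_errors in (0, 10, 90)]
```

A cell with 10 errors gets a pie that is half the canvas radius, and the radius approaches the full canvas radius as the number of errors grows, so a single very large cell cannot overflow its canvas.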