Workflow Team Web Tools


Welcome to the documentation for the new Workflow Team Web Tools.

Using the Web Tools as an Operator

Once you have been given the URL for the webtools, you will arrive at the home page, which links to the different views. For each of the examples below, I will use the base URL of https://localhost:8080/, since that is likely the URL you will see if you run the server on your own machine. If you are looking at a production server, the URL will of course be different.

For users familiar with CherryPy, each page is a function of a WorkflowTools object. To pass parameters to the function in a browser, the usual urlencoding of the parameters can be appended to the URL to call each function. Most users should be able to interact with the website exclusively through links though. From the URL root index, users will be able to directly access the following:

The Global Error View

WorkflowTools.globalerror(pievar='errorcode')[source]

This page, located at https://localhost:8080/globalerror, attempts to give an overall view of the errors that occurred in each workflow at different sites. The resulting view is a table of piecharts. The rows and columns can be adjusted to contain two of the following:

  • Workflow step name
  • Site where error occurred
  • Exit code of the error

The third variable is used to split the pie charts. This variable inside the pie charts can be quickly changed by submitting the form in the upper left corner of the page.

The size of each piechart depends on the total number of errors in a given cell. Each cell also has a tooltip, giving the total number of errors in the piechart. The colors of the piecharts show the splitting based on the pievar. Clicking on the pie chart will show the splitting explicitly using the List Workflows page.

If the steps make up the rows, the default view will show you the errors for each campaign. Clicking on the campaign name will cause the rows to expand to show the original workflow and all ACDCs (whether or not the ACDCs have errors). Following the link of the workflow will bring you to Detailed Workflow View. Clicking anywhere else in the workflow box will cause it to expand to show errors for each step.

Parameters:pievar (str) –

The variable that the pie charts are split into. Valid values are:

  • errorcode
  • sitename
  • stepname
Returns:the global views of errors
Return type:str
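
As a rough sketch of how URL-encoded parameters reach a page function, the snippet below builds a link to the global error view with the Python 3 standard library. The helper name page_url is hypothetical and not part of the web tools; only the base URL, the page name, and the pievar parameter come from the text above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://localhost:8080/'  # base URL used in the examples above


def page_url(page, **params):
    """Return the URL that calls a page function with the given parameters.

    `page_url` is a hypothetical helper, not part of the web tools.
    """
    query = urlencode(params)
    return BASE_URL + page + ('?' + query if query else '')


# Split the pie charts by site name instead of the default error code
print(page_url('globalerror', pievar='sitename'))
```

The same pattern applies to the other pages described below, since each one takes its parameters from the query string.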

List Workflows

WorkflowTools.listpage(errorcode='', sitename='', workflow='')[source]

This returns a list of workflows, site names, or error codes that match the values given for the other two variables. The page can be accessed directly by clicking on a corresponding pie chart on The Global Error View.

Parameters:
  • errorcode (int) – Error to match
  • sitename (str) – Site to match
  • workflow (str) – The workflow to match
Returns:

Page listing workflows, site names or error codes

Return type:

str

Raises:

cherrypy.HTTPRedirect to 404 if all variables are filled

Detailed Workflow View

WorkflowTools.seeworkflow(workflow='', issuggested='')[source]

Located at https://localhost:8080/seeworkflow, this shows detailed tables of errors for each step in a workflow.

For the exit codes in each row, there is a link to view some of the output of the error message for jobs having the given exit code. This should help operators understand what the error means.

At the top of the page, there are links back for The Global Error View, Viewing Workflow Logs, related JIRA tickets, and ReqMgr2 information about the workflow and prep ID.

The main function of this page is to submit actions. Note that you will need to register in order to actually submit actions. See Creating a New Account for more details. Depending on which action is selected, a menu will appear for the operator to adjust parameters for the workflows.

Under the selection of the action and parameters, there is a button to show other workflows that are similar to the selected workflow, according to the Workflow Info. Each entry is a link to open a similar workflow view page in a new tab. The option to submit actions will not be on this page though (so that you can focus on the first workflow). If you think that a workflow in the cluster should have the same actions applied to it as the parent workflow, then check the box next to the workflow name before submitting the action.

Finally, before submitting, you can submit reasons for your action selection. Clicking the Add Reason button will give you an additional reason field. Reasons submitted are stored based on the short reason you give. You can then select past reasons from the drop down menu to save time in the future. If you do not want to store your reason, do not fill in the Short Reason field. The long reason will be used for logging and communicating with the workflow requester (eventually).

Parameters:
  • workflow (str) – is the name of the workflow to look at
  • issuggested (str) – is a string to tell if the page has been linked from another workflow page
Returns:

the error tables page for a given workflow

Return type:

str

Raises:

404 if a workflow doesn’t seem to be in assistance anymore. Resets personal cache in the meanwhile, just in case.

Getting the List of Actions

WorkflowTools.getaction(days=0, acted=0)[source]

The page at https://localhost:8080/getaction returns a list of workflows to perform actions on. It may be useful to use this page to immediately check if your submission went through properly. This page will mostly be used by Unified though for acting on operator submissions.

Parameters:
  • days (int) – The number of past days to check. The default, 0, means to only check today.
  • acted (int) –

    Used to determine which actions to return. The following values can be used:

    • 0 - The default value selects actions that have not been acted on
    • 1 - Selects actions reported as submitted by Unified
    • Negative integer - Selects all actions
Returns:

JSON-formatted information containing actions to act on. The top-level keys of the JSON are the workflow names. Each of these keys refers to a dictionary specifying:

  • "Action" - The action to take on the workflow
  • "Reasons" - A list of reasons for this action
  • "Parameters" - Changes to make for the resubmission
  • "user" - The account name that submitted that action

Return type:

JSON
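
To illustrate the structure described above, here is a minimal sketch that parses a response and summarizes it. The sample response is made up (the workflow name, action value, parameters, and user are placeholders), and summarize_actions is a hypothetical helper; only the four keys per workflow follow the description.

```python
import json

# Hypothetical response body; the workflow name and values are made up,
# only the four keys per workflow follow the description above.
SAMPLE_RESPONSE = '''
{
  "example_workflow_1": {
    "Action": "acdc",
    "Reasons": ["Temporary site problem"],
    "Parameters": {"memory": "4000"},
    "user": "operator1"
  }
}
'''


def summarize_actions(response_text):
    """Return one (workflow, action, user, number of reasons) tuple per workflow."""
    actions = json.loads(response_text)
    return [
        (workflow, info['Action'], info['user'], len(info['Reasons']))
        for workflow, info in sorted(actions.items())
    ]


for row in summarize_actions(SAMPLE_RESPONSE):
    print(row)
```

A real response would be fetched from the getaction page, for example with days=30 and acted=-1 to list every action from the past 30 days.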

Reporting Completed Actions

WorkflowTools.reportaction()[source]

A POST request to https://localhost:8080/reportaction tells the WorkflowWebTools that a set of workflows has been acted on by Unified. The body of the POST request must include a JSON with the passphrase under "key" and a list of workflows under "workflows".

An example of making this POST request is provided in the file WorkflowWebTools/test/report_action.py, which relies on WorkflowWebTools/test/key.json.

Returns:A JSON summarizing results of the request. Keys and values include:
  • 'valid_key' - If this is false, the passphrase passed was wrong,
    and no action is changed
  • 'valid_format' - If false, the input format is unexpected
  • 'success' - List of actions that have been marked as acted on
  • 'already_reported' - List of workflows that were already removed
  • 'does_not_exist' - List of workflows that the database does not know
Return type:JSON
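
The request body and the returned summary can be sketched as follows. This is not the contents of report_action.py; the helper names, the passphrase, and the sample response are placeholders, and no request is actually sent.

```python
import json


def build_report_body(passphrase, workflows):
    """Serialize the POST body: passphrase under "key", names under "workflows"."""
    return json.dumps({'key': passphrase, 'workflows': list(workflows)})


def report_succeeded(response_text):
    """Return True if the key and format were accepted (hypothetical helper)."""
    result = json.loads(response_text)
    return bool(result.get('valid_key')) and bool(result.get('valid_format'))


body = build_report_body('not-the-real-passphrase', ['example_workflow_1'])
print(body)

# Made-up response, shaped like the summary described above
sample = ('{"valid_key": true, "valid_format": true, '
          '"success": ["example_workflow_1"], '
          '"already_reported": [], "does_not_exist": []}')
print(report_succeeded(sample))
```

Checking 'valid_key' and 'valid_format' first is a design choice of this sketch: the three workflow lists are only meaningful once both are true.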

Viewing Workflow Logs

WorkflowTools.showlog(search='', module='', limit=50)[source]

This page, located at https://localhost:8080/showlog, returns logs that are stored in an elastic search server. More details can be found at Elastic Search Tools. If directed here from Detailed Workflow View, then the search will be for the relevant workflow.

Parameters:
  • search (str) – The search string
  • module (str) – The module to look at, if only interested in one
  • limit (int) – The limit of number of logs to show on a single page
Returns:

the logs from elastic search

Return type:

str

Creating a New Account

WorkflowTools.newuser(email='', username='', password='')[source]

New users can register at https://localhost:8080/newuser. From this page, users can enter a username, email, and password. The username cannot be empty, must contain only alphanumeric characters, and must not already exist in the system. The email must match the domain names listed on the page or can be a specific whitelisted email. See Setting Up Server Configuration for more information on setting valid emails. Finally, the password must also not be empty.

If the registration process is successful, the user will receive a confirmation page instructing them to check their email for a verification link. The user account will be activated when that link is followed, in order to ensure that the user possesses a valid email.

The following parameters are sent via POST from the registration page.

Parameters:
  • email (str) – The email of the new user
  • username (str) – The username of the new user
  • password (str) – The password of the new user
Returns:

a page to generate a new user or a confirmation page

Return type:

str

Raises:

cherrypy.HTTPRedirect back to the new user page without parameters if there was a problem entering the user into the database
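
The registration rules described above can be sketched as a single check. This is a simplified illustration and not the server's code: the function and argument names are hypothetical, and the email test compares plain domain strings, whereas the server may match patterns.

```python
def check_registration(email, username, password,
                       existing_users, valid_domains, whitelist):
    """Return a list of problems; an empty list means the input passes.

    All argument names here are hypothetical.
    """
    problems = []
    if not username or not username.isalnum():
        problems.append('username must be non-empty and alphanumeric')
    elif username in existing_users:
        problems.append('username already exists')
    # Everything after the last '@' is treated as the domain
    domain = email.rsplit('@', 1)[-1] if '@' in email else ''
    if email not in whitelist and domain not in valid_domains:
        problems.append('email is not from an accepted domain or whitelist')
    if not password:
        problems.append('password must not be empty')
    return problems


print(check_registration('someone@cern.ch', 'operator1', 'secret',
                         existing_users=set(), valid_domains={'cern.ch'},
                         whitelist=set()))
```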

Resetting Account Password

WorkflowTools.resetpassword(email='', code='', password='')[source]

If a user forgets his or her username or password, navigating to https://localhost:8080/resetpassword will allow them to enter their email to reset their password. The email will contain the username and a link to reset the password.

This page is multifunctional, depending on which parameters are sent. The link actually redirects to this webpage with a secret code that will then allow you to submit a new password. The password is then submitted back here via POST.

Parameters:
  • email (str) – The email linked to the account
  • code (str) – confirmation code to activate the account
  • password (str) – the new password for a given code
Returns:

a webview depending on the inputs

Return type:

str

Raises:

404 if both email and code are filled

Manually Resetting Your Workflow Cache

WorkflowTools.resetcache(prepid='', workflow='')[source]

The function is only accessible to someone with a verified account.

Navigating to https://localhost:8080/resetcache resets the error info for the user’s session. It also clears out cached JSON files on the server. Under normal operation, this cache is only refreshed every half hour.

Returns:a confirmation page
Return type:str

Redoing the Workflow Clusters

WorkflowTools.cluster()[source]

The function is only accessible to someone with a verified account.

Navigating to https://localhost:8080/cluster causes the server to regenerate the clusters that it has stored. This is useful when the history database of past errors has been updated with relevant errors since the server has been started or this function has been called.

Returns:a confirmation page
Return type:str

Workflows Procedures

Here is the list of recommended procedures for various exit codes:

| Exit Code | Cause | Normal Procedure | Additional Procedure | Additional Match Expression |
| --- | --- | --- | --- | --- |
| 73 | RootEmbeddedFileSequence: secondary file list seems to be missing. When overflow procedure is enabled, the agent should be able to see the secondary file set as present at those sites as well. | Open a GitHub issue | | |
| 84 | The file could not be found | ACDC | If a second round of ACDC and fails for same files, contact the transfer team and/or site support team. | root://[-/w.]+.root |
| 85 | If FileReadError, the file could not be read. If R__unzip: error, most likely a bad node with corrupted intermediate files. | ACDC | If getting unzip errors, most likely a bad node. Report immediately. If a second round of ACDC and fails for same files, contact the transfer team and/or site support team. | root://[-/w.]+.root R__unzip |
| 92 | File was not found and fallback procedure also failed. | ACDC | If a second round of ACDC and fails for same files, contact the transfer team and/or site support team. | root://[-/w.]+.root |
| 134 | | Strong failure to report immediately | | |
| 137 | Likely an unrelated batch system kill. Sometimes a site-related problem. | Point SSP. Open GGUS ticket and put site in drain. | | |
| 8001 | | Strong failure, report immediately | | |
| 8002 | | Strong failure, report immediately | | |
| 8004 | BadAlloc - memory exhaustion | Report immediately to the Site Support Team. | | |
| 11003 | JobExtraction failures | ACDC | | |
| 50513 | Failure to run SCRAM setup scripts | ACDC | | |
| 50660 | Performance kill: “Job exceeding maxRSS” | More memory | | |
| 61202 | Can’t determine proxy filename. X509 user proxy required for job. | If temporary failure: ACDC, otherwise contact glideinWMS team | | |
| 71104 | Site containing only available copy of input datasets is in drain. | Copy input files from tape (?) or wait for site listed to come out of drain. | | The job can run only at .*(T[12].* |
| 71304 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 71305 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 71306 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 99109 | Uncaught exception in WMAgent step executor (often staging out problems) | Check the Site Readiness and consult with the Site Support Team. If a temporary failure: ACDC. If a problem with the site: Make GGUS ticket, put the site in drain then kill and clone | | |
| 99303 | The agent has not found the JobReport.x.pkl file. Regardless of the status of the job, it will fail the job with 99303. Work is needed to make this error as rare as possible since there are very limited cases to make the file fail to appear. | Strong failure, report immediately | | |

The module WorkflowWebTools.procedures, which is used by WorkflowWebTools.classifyerrors, contains only the PROCEDURES dictionary. The keys of this dictionary are the integer exit codes from the agent to act on. The values are dictionaries that contain 'normal' and 'additional' actions and a 'cause'. The 'additional' actions are dictionaries containing 're' and 'action'. The 're' field is a compiled regular expression to match to the logs. group(1) of this regular expression is taken to be a parameter to be displayed to the operator. The 'action' field gives additional instructions to the operator, who should inspect the parameters returned by the regular expression to determine the ultimate action to perform. The 'cause' will add some insight into the error codes in the documentation. It isn’t shown to the operator because they already see the different types of error codes and are provided with a link to look at the error logs directly.

An example procedure with just exit code 84 would look like the following:

import re

PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC and fails for same files, '
                       'contact the transfer team and/or site support team.')
            }
        }
    }

What this means is that error code 84 usually should be an ACDC. However, if the operator sees the same files appear from the regular expression, then the operator should contact the transfer team and/or the site support team.

These dictionaries are used to generate the procedures table above.
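
As a sketch of how such a dictionary might be applied to a log, the snippet below looks up an exit code and collects group(1) of every regular expression match. The function suggest and the log line are hypothetical; this is not the code in WorkflowWebTools.classifyerrors.

```python
import re

PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC and fails for same files, '
                       'contact the transfer team and/or site support team.')
        }
    }
}


def suggest(exit_code, log_text):
    """Return (normal action, additional action, matched parameters)."""
    procedure = PROCEDURES.get(exit_code)
    if procedure is None:
        return None, None, []
    additional = procedure.get('additional', {})
    regex = additional.get('re')
    # group(1) of each match is the parameter shown to the operator
    params = [match.group(1) for match in regex.finditer(log_text)] if regex else []
    return procedure['normal'], additional.get('action'), params


# Made-up log line for illustration
log = 'Failed to open root://site.example.org//store/data/file.root'
print(suggest(84, log))
```

An operator who sees the same file names returned for two rounds of ACDC would then follow the 'action' text instead of the normal procedure.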

author:Daniel Abercrombie <dabercro@mit.edu>

Running the Web Tools

The webtools are usually operated behind a cherrypy server. Before running the script runserver/workflowtools.py, there are a few other things that you should set up first.

Setting Up Server Configuration

The first script you should run is setup_server.sh. Run this from inside the runserver directory:

cd runserver
./setup_server.sh

This script sets up a basic server configuration.

First, it generates or links to certificates for HTTPS. It takes one argument for determining the openssl flag for the passphrase protocol. Valid options for generating passwords are:

  • -aes128
  • -aes192
  • -aes256

To link to an existing key instead, pass the option:

./setup_server.sh -link

The script performs the following steps:

  1. Creates a keys directory
    • Prompts for the locations of private key and certificates to create softlinks if the softlinks option is passed.
    • Otherwise, a keys/privkey.pem with the password flag passed is created along with a cert.pem to use for SSL connections
  2. Generates a random list of salts from generatesalt.py
  3. Copies an example config.yml to the working directory

If the certificate is not linked, it is self-signed and valid for one year.

After completing these steps, the script reminds the user to edit config.yml.

Server Configuration

The server configuration is set by a YAML file, config.yml. A number of parameters are set here, and read by the WorkflowWebTools.serverconfig:

  • The webmaster name and email determine who the registration emails are sent from.
  • The host name and port tell where to serve the cherrypy application if using the CherryPy server.
  • The data locations are split up into
    • Historic information
    • Location to fetch the current errors
    • Location to fetch the current errors’ explanations
    • If any of these files are behind Shibboleth, cookie_file, cookie_pem, and cookie_key should be listed under data too. These keys are used as kwargs for CMSToolBox.get_json()
  • Valid registration email domains and specific whitelisted emails
  • Actions stores the number of days to check the submission history for actions requested
  • Clustering parameters

See WorkflowWebTools.clusterworkflows for more details on the clustering parameters.

Updating the Error History

update_history.py

A python script for updating the history database.

Can take arguments that are file names or urls. If a local file doesn’t exist, the url is assumed. The input file is then added to the central history database, as defined in your server config file. If no arguments are given, the central history database is just generated as empty.

It is recommended to set up a cron job to update your central history regularly. For example, adding the line:

0 * * * * <path/to>/update_history.py <URL to errors>

to your crontab will automatically update the central database every hour. Duplicate entries will not be added.

author:Daniel Abercrombie <dabercro@mit.edu>

Starting the CherryPy Server

Finally, the service can be launched by running:

./workflowtools.py

If you need sudo privileges, to access a certain port for example, you can use the script:

./run.sh

You may need to adjust the values in runserver/setenv.sh first.

Running the Server Behind WSGI

If running the site in a production environment, you will likely need to run behind WSGI to enable CERN’s SSO. Install mod_wsgi, which is not listed as a strict requirement for the package.

In your apache configuration, create a virtual host for the WSGI application. You will also need to allow access to the runserver/static directory. An example would be as follows:

Listen 443
<VirtualHost *:443>
    WSGIScriptAlias / <path_to_WorkflowWebTools>/runserver/workflowtools.py

    Alias /static <path_to_WorkflowWebTools>/runserver/static

    <Directory <path_to_WorkflowWebTools>/runserver>

        #
        # Add Shibboleth authentication here? (Not developed yet)
        #

        WSGIApplicationGroup %{GLOBAL}
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>

    </Directory>

    ServerName 127.0.0.1
    SSLEngine on
    SSLCertificateFile <path_to_cert>
    SSLCertificateKeyFile <path_to_private_key>

</VirtualHost>

Note

This is just what I have working on my laptop. This website has not been run behind WSGI in production yet. I would be grateful to learn about any errors in this section of the documentation.

Predicting Actions with SciKit-Learn

There have been some preliminary attempts to create a model that predicts actions on a workflow based on the errors that it throws. The results of this preliminary training are presented in the slides located at WorkflowWebTools/docs/170705/dabercro_WorkflowWebTools_170705.pdf.

The following are instructions to run the training and test interactively:

  • First, create a JSON file that contains all of the features and actions of previous workflows. This can be done with the script located at WorkflowWebTools/runserver/create_actionshistorylink.py:

    ./create_actionshistorylink.py static/history.VERSION_NUMBER.json
    

    This should be done from the production server. The argument is just the name of the output file.

  • Then download the output JSON file to the computer where you will be doing the training. The module WorkflowWebTools.paramsregression also contains a script for running the training interactively. This can be run from inside the WorkflowWebTools directory with the following:

    python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json
    

    By default, this will train for the 'action' field. To train a different parameter, give another argument to the script:

    python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json memory
    

The easiest way to get the classifier from inside the server would be to call something like the following:

from WorkflowWebTools import actionshistorylink
from WorkflowWebTools import paramsregression

model = paramsregression.get_classifier(actionshistorylink.dump_json(), 'action')

The classifier then works like any other sklearn trained model.

Note

This may take a long time to return, so generating the classifier is only something you would want to do once in a while (such as when the server first turns on or when an approved user accesses some restricted address).

For more details on the functions used in the internals, see Machine Learning Functions (or the source code).

Troubleshooting

This is just a quick guide for anyone trying to troubleshoot or restart the production server.

If the web service is not responding, first check that you are accessing the machine using CERN’s network (either by being physically at CERN or using a proxy). If you are certain that you are using the CERN network, then you will have to log into the server. The name of the server is not given due to potential security concerns. To complete all of the following steps, you need sudo access to the machine.

  1. SSH into a CERN machine.

  2. Try downloading actions from the machine:

    wget -O test.json --no-check-certificate "https://vocms0113.cern.ch:80/getaction?days=30&acted=-1"
    

    If the file test.json is not empty, then the service is likely still running properly. Go back to checking your own connection.

  3. SSH into the server.

  4. Check top for the process python2.7. If it is there, but the server is still not responding, kill the process.

  5. Do the following to start the process again:

    cd /home/dabercro/OpsSpace/WorkflowWebTools/runserver
    ./run.sh
    
  6. Check the server status:

    sudo tail -f /home/dabercro/OpsSpace/WorkflowWebTools/runserver/nohup.out
    

    (You can exit tail with Ctrl + c.) Hopefully, the last line says something like:

    ENGINE Bus STARTED
    

Maintaining the Python Backend

For developers wishing to make adjustments to the modules or anyone else who wants to understand some of the backend of the server, all of the Python modules for this system are documented below.

Server Configuration

Small module to get information from the server config.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.serverconfig.LOCATION = '/home/docs/checkouts/readthedocs.org/user_builds/cms-comp-ops-tools/checkouts/v0.7/WorkflowWebTools/test'

The string giving the location that serverconfig looks first for the configuration

WorkflowWebTools.serverconfig.all_errors_path()[source]
Returns:the path to all errors for recent workflows
Return type:str
WorkflowWebTools.serverconfig.config_dict()[source]
Returns:the configuration in a dict
Return type:dict
Raises:Exception – when it cannot find the configuration file
WorkflowWebTools.serverconfig.get_cluster_settings()[source]
Returns:dictionary containing the settings for clustering
Return type:dict
WorkflowWebTools.serverconfig.get_history_length()[source]
Returns:the number of days of history to check for workflows with an action submitted.
Return type:int
WorkflowWebTools.serverconfig.get_valid_emails()[source]

Get iterator for valid email patterns for this instance. This is configurable by the webmaster.

Returns:List of valid email patterns
Return type:list
WorkflowWebTools.serverconfig.host_name()[source]
Returns:the name of the host from the config
Return type:str
WorkflowWebTools.serverconfig.host_port()[source]
Returns:the port to host the site on
Return type:str
WorkflowWebTools.serverconfig.wm_email()[source]
Returns:the email of the webmaster
Return type:str
WorkflowWebTools.serverconfig.wm_name()[source]
Returns:the name of the webmaster
Return type:str
WorkflowWebTools.serverconfig.workflow_history_path()[source]
Returns:the path to the files of workflow error history
Return type:str

Functions for Error Tracking

Simple utils functions that do not rely on a session.

These are separated from globalerrors, since we do not need to generate an ErrorInfo instance in other applications.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.errorutils.add_to_database(curs, data_location)[source]

Add data from a file to a central database through the passed cursor

Parameters:
  • curs (sqlite3.Cursor) – is the cursor to the database
  • data_location (str or list) – If a string, this is the location of the file or url of data to add to the database. This should be in JSON format, and if a local file does not exist, a url will be assumed. If the url is invalid, an empty database will be returned. If a list, it’s a list of statuses to get workflows from wmstats.
WorkflowWebTools.errorutils.create_table(curs)[source]

Create the workflows error table with the proper format

Parameters:curs (sqlite3.Cursor) – is the cursor to the database

WorkflowWebTools.errorutils.get_list_info(status_list)[source]

Get the list of workflows that match the statuses listed via workflowinfo.

Parameters:status_list (list) – The list of workflow statuses to get the info for
Returns:The workflow info dictionary, which matches the format of the unified all_errors.json.
Return type:dict
WorkflowWebTools.errorutils.open_location(data_location)[source]

This function assumes that the contents of the location is in JSON format. It opens the data location and returns the dictionary.

Parameters:data_location (str) – The location of the file or url
Returns:information in the JSON file
Return type:dict
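
The file-or-URL behavior described above could look roughly like the following. This is a Python 3 sketch under the assumption that a plain existence check decides between the two cases; open_location_sketch is a hypothetical name and the real function may handle errors and authentication differently.

```python
import json
import os
import tempfile
from urllib.request import urlopen


def open_location_sketch(data_location):
    """Load JSON from a local file if it exists, otherwise treat it as a URL."""
    if os.path.isfile(data_location):
        with open(data_location, 'r') as input_file:
            return json.load(input_file)
    # No such local file, so the location is assumed to be a URL
    with urlopen(data_location) as response:
        return json.loads(response.read().decode('utf-8'))


# Demonstrate the local-file branch with a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump({'workflow': 'example'}, tmp)
loaded = open_location_sketch(tmp.name)
os.remove(tmp.name)
print(loaded)
```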

Global Errors

Generates the content for the errors pages

author:Daniel Abercrombie <dabercro@mit.edu>
class WorkflowWebTools.globalerrors.ErrorInfo(data_location='')[source]

Holds the information for any errors for a session

connection_log(action)[source]

Logs actions on the sqlite3 connection

Parameters:action (str) – is the action on the connection
execute(query, params=None)[source]

Locks the internal database and makes the query.

Parameters:
  • query (str) – The query, which can include ‘?’
  • params (tuple) – The parameters to pass into the query
Returns:

The results of the query

Return type:

list
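
A minimal sketch of a locked query method with '?' placeholders is shown below. The class name, the in-memory database, and the table layout are all assumptions made for illustration; they are not taken from ErrorInfo.

```python
import sqlite3
import threading


class LockedDatabase(object):
    """Hypothetical stand-in showing a lock around sqlite3 queries."""

    def __init__(self):
        # check_same_thread=False lets several threads share the connection;
        # the lock below serializes their access
        self.conn = sqlite3.connect(':memory:', check_same_thread=False)
        self.lock = threading.Lock()

    def execute(self, query, params=None):
        """Run the query while holding the lock and return all rows as a list."""
        with self.lock:
            curs = self.conn.cursor()
            curs.execute(query, params or ())
            rows = curs.fetchall()
            curs.close()
        return rows


database = LockedDatabase()
# Made-up table layout, for illustration only
database.execute('CREATE TABLE errors (stepname TEXT, errorcode INT, numbererrors INT)')
database.execute('INSERT INTO errors VALUES (?, ?, ?)', ('/step/a', 84, 3))
print(database.execute('SELECT numbererrors FROM errors WHERE errorcode = ?', (84,)))
```

Passing values through params rather than formatting them into the query string lets sqlite3 handle the quoting.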

get_allmap()[source]
Returns:A dictionary that maps ‘errorcode’, ‘stepname’, and ‘sitename’ to the lists of all the errors, steps, or sites
Return type:dict
get_prepid(prep_id)[source]
Parameters:prep_id (str) – The name of the Prep ID to check cache for
Returns:Either cached PrepIDInfo, or a new one
Return type:WorkflowWebTools.workflowinfo.PrepIDInfo
get_step_list(workflow)[source]

Gets the list of steps within a workflow

Parameters:workflow (str) – Name of the workflow to gather information for
Returns:list of steps within the workflow
Return type:list
get_step_table(step, readymatch=None)[source]

Get the sparse representation of the step table. Fetches from an internal dictionary, so faster than database access

Parameters:
  • step (str) – The step name for the table
  • readymatch (list) – The list of site readiness statuses to match
Returns:

The list used to build the step table. Each element of the list is a tuple of (number of errors, site name, exit code)

Return type:

list of tuples

get_workflow(workflow)[source]

This should be used to get the workflow info so that there is no redundant fetching for a single session.

Parameters:workflow (str) – The prep ID for a workflow
Returns:Cached WorkflowInfo from the ToolBox.
Return type:WorkflowWebTools.workflowinfo.WorkflowInfo
return_workflows()[source]
Returns:the ordered list of all workflow prep IDs that need attention
Return type:list
set_all_lists()[source]

Sets the lists of all steps, sites, and errors for an ErrorInfo object. This should be called if data is added to the ErrorInfo cursor manually.

setup()[source]

Create an SQL database from the all_errors.json generated by production

teardown()[source]

Close the database when cache expires

WorkflowWebTools.globalerrors.TITLEMAP = {'errorcode': 'error code', 'sitename': 'site name', 'stepname': 'workflow'}

Dictionary that determines how a chosen pievar shows up in the pie chart titles

WorkflowWebTools.globalerrors.check_session(session, can_refresh=False)[source]

If session is None, fills it.

Parameters:
  • session (cherrypy.Session) – the current session
  • can_refresh (bool) – tells the function if it is safe to refresh and close the old database
Returns:

ErrorInfo of the session

Return type:

ErrorInfo

WorkflowWebTools.globalerrors.default_errors_format()[source]

Gives a defaultdict with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}
Returns:A defaultdict for building errors
Return type:collections.defaultdict
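
One way to build a nested structure with that shape is sketched below. The function name and the example keys are hypothetical, and the real function may construct its defaults differently.

```python
from collections import defaultdict


def errors_format_sketch():
    """Build group -> {'errors': key -> key -> int} with automatic defaults."""
    return defaultdict(
        lambda: {'errors': defaultdict(lambda: defaultdict(int))}
    )


errors = errors_format_sketch()
# Missing groups and keys are created on first access, so counts can be
# added without checking whether an entry already exists
errors['/step/a']['errors']['T2_Example_Site'][84] += 3
errors['/step/a']['errors']['T2_Example_Site'][85] += 1
print(errors['/step/a']['errors']['T2_Example_Site'][84])
```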
WorkflowWebTools.globalerrors.get_errors(pievar, session=None)[source]

Gets the number of errors with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}

where each group is a different value for the variables that go into the row of the global errors table. That is, the groups will usually be the subtask list, unless pievar is "stepname". In that case, the grouping is by error code.

Parameters:
  • pievar (str) – The variable that each piechart is split into.
  • session (cherrypy.Session) – Stores the information for a session
Returns:

A dictionary of 2D list of errors.

Return type:

defaultdict

WorkflowWebTools.globalerrors.get_row_col_names(pievar)[source]

Get the column and row for the global table view, based on user input

Parameters:pievar (str) – The variable to divide the piecharts by.
Returns:The names of the global table rows, and the table columns
Return type:(str, str)
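
The choice of rows and columns can be sketched as picking the two variables that are not the pievar. The ordering used here is an assumption inferred from the description of get_errors (step names form the rows unless pievar is "stepname", in which case error codes do); the function name is hypothetical.

```python
# Order of preference for rows; this ordering is an assumption
VARIABLES = ('stepname', 'errorcode', 'sitename')


def row_col_names_sketch(pievar):
    """Return (row name, column name) for the chosen pie chart variable."""
    if pievar not in VARIABLES:
        raise ValueError('Unknown pievar: %s' % pievar)
    remaining = [var for var in VARIABLES if var != pievar]
    # The first remaining variable makes up the rows, the second the columns
    return remaining[0], remaining[1]


for name in VARIABLES:
    print(name, row_col_names_sketch(name))
```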
WorkflowWebTools.globalerrors.get_step_table(step, session=None, allmap=None, readymatch=None, sparse=False)[source]

Gathers the errors for a step into a 2-D table of ints

Parameters:
  • step (str) – name of the step to get the table for
  • session (cherrypy.Session) – Stores the information for a session
  • allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
  • readymatch (tuple) – Match the readiness statuses in this tuple, if set
  • sparse (bool) – Determines whether or not a sparse matrix is returned
Returns:

A table (made of lists) of errors for the step or a sparse dictionary of entries

Return type:

list of lists or dict of dicts of ints
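
To show how a sparse list of (number of errors, site name, exit code) tuples relates to a 2-D table of ints, here is a small conversion sketch. The function name and the orientation (exit codes as rows, sites as columns) are assumptions, as are the site names.

```python
def dense_step_table(sparse, errorcodes, sitenames):
    """Expand (number of errors, site name, exit code) tuples into a 2-D table.

    Assumption: rows follow `errorcodes` and columns follow `sitenames`.
    """
    row_index = {code: index for index, code in enumerate(errorcodes)}
    col_index = {site: index for index, site in enumerate(sitenames)}
    # Start from an all-zero table so missing entries count as no errors
    table = [[0] * len(sitenames) for _ in errorcodes]
    for number, site, code in sparse:
        table[row_index[code]][col_index[site]] += number
    return table


sparse = [(3, 'T2_Site_X', 84), (1, 'T1_Site_Y', 85)]
print(dense_step_table(sparse, [84, 85], ['T1_Site_Y', 'T2_Site_X']))
```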

WorkflowWebTools.globalerrors.group_errors(input_errors, grouping_function, **kwargs)[source]

Takes inputs errors with the format:

{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}

and sums the errors into a larger group. This second grouping is done by the output of the grouping_function.

Parameters:
  • input_errors (dict) – The input that will be grouped
  • grouping_function (function) – Takes a key of input_errors as its input. Keys that give the same function output are placed in the same group.
  • kwargs – Each keyword should point to a function. That keyword will be added to the dictionary of each group. Its value will be the function output with the group as an argument.
Returns:

A dictionary with the same format as the input, but with groupings.

Return type:

defaultdict
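
A minimal sketch of this kind of regrouping is given below. The function group_errors_sketch is hypothetical and written only from the description above, so it should not be read as the package's implementation. The subtask names and the label keyword are also made up.

```python
from collections import defaultdict


def group_errors_sketch(input_errors, grouping_function, **kwargs):
    """Sum errors into larger groups chosen by grouping_function (illustrative only)."""
    output = defaultdict(
        lambda: {"errors": defaultdict(lambda: defaultdict(int))}
    )
    for key, content in input_errors.items():
        group = grouping_function(key)
        for col, slices in content["errors"].items():
            for pie, count in slices.items():
                output[group]["errors"][col][pie] += count
    # Each keyword is stored in every group, with the function output as its value
    for group in output:
        for keyword, function in kwargs.items():
            output[group][keyword] = function(group)
    return output


subtasks = {
    "/campaign1_wf/stepA": {"errors": {"site1": {"8021": 2}}},
    "/campaign1_wf/stepB": {"errors": {"site1": {"8021": 1, "134": 5}}},
    "/campaign2_wf/stepA": {"errors": {"site2": {"134": 7}}},
}

# Group subtasks by the first path element and attach a made-up 'label' entry
grouped = group_errors_sketch(
    subtasks,
    lambda subtask: subtask.split("/")[1],
    label=lambda group: group.upper(),
)
```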

WorkflowWebTools.globalerrors.list_matching_pievars(pievar, row, col, session=None)[source]

Return an iterator of variables in pievar, and number of errors for a given rowname and colname

Parameters:
  • pievar (str) – The variable to return an iterator of
  • row (str) – Name of the row to match
  • col (str) – Name of the column to match
  • session (cherrypy.Session) – stores the session information
Returns:

List of tuples containing name of pievar and number of errors

Return type:

list

WorkflowWebTools.globalerrors.see_workflow(workflow, session=None)[source]

Gathers the error information for a single workflow

Parameters:
  • workflow (str) – Name of the workflow to gather information for
  • session (cherrypy.Session) – Stores the information for a session
Returns:

Dictionary used to generate webpage for a requested workflow

Return type:

dict

Workflow Info

Module containing and returning information about workflows.

authors:Daniel Abercrombie <dabercro@mit.edu>
class WorkflowWebTools.workflowinfo.Info[source]

Implements shared operations on the cache

cache_filename(attribute)[source]

Return the name of the file for caching

Parameters:attribute (str) – The information to store in the file
Returns:The full file name to store the cache
Return type:str
reset()[source]

Reset the cache for this object and clear out the files.

class WorkflowWebTools.workflowinfo.PrepIDInfo(prep_id, url='cmsweb.cern.ch')[source]

A class that just holds a small amount of information about a given PrepID.

get_requests(*args, **kwargs)[source]
Returns:The requests for the Prep ID from ReqMgr2 API
Return type:dict
get_workflows()[source]
Returns:A list of tuples containing (workflow name, timestamp of request)
Return type:list
get_workflows_requesttime()[source]
Returns:A list of tuples containing (workflow name, timestamp of request)
Return type:list
class WorkflowWebTools.workflowinfo.WorkflowInfo(workflow, url='cmsweb.cern.ch')[source]

Class that holds methods for accessing various information about a workflow.

get_errors(*args, **kwargs)[source]

A wrapper for errors_for_workflow() if you happen to have a WorkflowInfo object already.

Parameters:get_unreported (bool) – Get the unreported errors from ACDC server
Returns:a dictionary containing error codes in the following format:
{step: {errorcode: {site: number_errors}}}
Return type:dict
get_explanation(errorcode, step='')[source]

Gets a list of error logs for a given error code.

Parameters:
  • errorcode (str) – The error code to explain
  • step (str) – The full name of the step to return explanations from
Returns:

list of error logs

Return type:

list

get_prep_id()[source]
Returns:the PrepID for this workflow
Return type:str
get_recovery_info(*args, **kwargs)[source]

Get the recovery info for this workflow.

Returns:a dictionary containing the information used in recovery. The keys in this dictionary are arranged like the following:
{ task: { 'sites_to_run': list(sites), 'missing_to_run': int() } }
Return type:dict
get_workflow_parameters(*args, **kwargs)[source]

Gets the workflow parameters from ReqMgr2, or returns a cached value. See the ReqMgr 2 wiki for more details.

Returns:Parameters for the workflow from ReqMgr2.
Return type:dict
site_to_run(task)[source]

Gets a list of sites that a task in the workflow can run at

Parameters:task (str) – The full name of the task to find sites for
Returns:a list of sites to run at
Return type:list
WorkflowWebTools.workflowinfo.cached_json(attribute, timeout=None)[source]

A decorator for caching dictionaries in local files.

Parameters:
  • attribute (str) – The key of the WorkflowInfo cache to set using the decorated function.
  • timeout (int) – The amount of time before refreshing the JSON file, in seconds.
Returns:

Function decorator

Return type:

func
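
The general idea of a file-backed caching decorator with a timeout can be sketched as follows. This is not the package's code: cached_json_sketch, its cache_dir argument, and the file naming are assumptions for illustration. The real decorator works with the cache of an Info object, and this sketch ignores the arguments of the decorated function when looking up the cache.

```python
import functools
import json
import os
import tempfile
import time


def cached_json_sketch(attribute, timeout=None, cache_dir=None):
    """Cache a function's dictionary output in a JSON file named after attribute."""
    directory = cache_dir or tempfile.gettempdir()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            path = os.path.join(directory, attribute + ".cache.json")
            # The file is fresh if it exists and is younger than the timeout
            fresh = os.path.exists(path) and (
                timeout is None
                or time.time() - os.path.getmtime(path) < timeout
            )
            if fresh:
                with open(path) as cache_file:
                    return json.load(cache_file)
            result = func(*args, **kwargs)
            with open(path, "w") as cache_file:
                json.dump(result, cache_file)
            return result

        return wrapper

    return decorator


calls = []
demo_dir = tempfile.mkdtemp()


@cached_json_sketch("demo_params", timeout=3600, cache_dir=demo_dir)
def fetch_params():
    """Stand-in for a slow remote request."""
    calls.append(1)
    return {"example_key": "example_value"}


first = fetch_params()
second = fetch_params()  # read back from the JSON file, not recomputed
```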

WorkflowWebTools.workflowinfo.errors_for_workflow(workflow, url='cmsweb.cern.ch')[source]

Get the useful status information from a workflow

Parameters:
  • workflow (str) – the name of the workflow request
  • url (str) – the base url to find the information at
Returns:

a dictionary containing error codes in the following format:

{step: {errorcode: {site: number_errors}}}

Return type:

dict
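
To show how the returned format can be walked, the sketch below totals the errors per error code over all steps and sites. The sample dictionary is invented for the example, and totals_by_errorcode is a hypothetical helper rather than part of the module.

```python
def totals_by_errorcode(errors):
    """Sum errors over all steps and sites for each error code."""
    totals = {}
    for codes in errors.values():
        for errorcode, sites in codes.items():
            totals[errorcode] = totals.get(errorcode, 0) + sum(sites.values())
    return totals


# Made-up data in the {step: {errorcode: {site: number_errors}}} format
sample = {
    "/wf/step1": {"8021": {"siteA": 2, "siteB": 1}},
    "/wf/step1/step2": {"8021": {"siteA": 4}, "134": {"siteC": 6}},
}

code_totals = totals_by_errorcode(sample)
most_common = max(code_totals, key=code_totals.get)
```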

WorkflowWebTools.workflowinfo.explain_errors(workflow, errorcode)[source]

Get example errors for a given workflow and errorcode

Parameters:
  • workflow (str) – is the workflow name
  • errorcode (str) – is the error code
Returns:

a dict of log snippets from different sites.

Return type:

list

WorkflowWebTools.workflowinfo.list_workflows(status)[source]

Get the list of workflows currently in a given status. For a list of valid requests, visit the Request Manager Interface.

Parameters:status (str) – The status of the workflow lists being looked for
Returns:A list of workflows matching the status
Return type:list

Workflow Clustering

Tools for clustering workflows based on their errors.

The errors are clustered based on a generated vector. The vector can be thought of as lying on two different hyper-sphere shells. One sphere is for the error codes that occur and the other sphere is for the sites where those errors occur. The overall distance between two workflows will be the sum in quadrature of the distances on these two hyper-spheres.

These distances are configurable in the server config.yml. Each sphere has a center radius, thickness, and number of errors to be at the midpoint. The equation to determine distance from the origin for a vector of errors is the following.

\[\mathrm{distance} = \frac{d}{\sqrt{2}} + 2 w \left(\frac{|\vec{v}|}{|\vec{v}| + m} - 0.5\right)\]

The following parameters are set in config.yml for the site name and the error code hyperspheres separately.

  • m is the ‘midpoint’ parameter. It is the number of errors at a given site or given error code that will place the workflow at the midpoint of the hypersphere shell.
  • d is the ‘distance’ parameter. It is the cartesian distance between the midpoints of two different sites or error codes.
  • w is the ‘width’ parameter. It defines the total shell width that the workflow could possibly land on.

The direction is determined by the error code or site name distribution. Note that this always points in the upper quadrant for a given coordinate.

The equation is chosen so that, when the error width is 0, two workflows that have completely different errors at the same site end up with a distance equal to the error distance when clustering. This way, sites and errors can have different distance weights, and there can be some separation for the number of errors in a workflow (for non-zero width).
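
The equation above can be checked numerically with the short sketch below. The function names shell_distance and place are hypothetical, and the sketch only covers a single hypersphere with non-zero error vectors. It confirms that a vector with |v| = m lands at d/√2, and that two workflows with completely different errors end up a distance d apart when the width is 0.

```python
import math


def shell_distance(counts, d, w, m):
    """Distance from the origin for a vector of error counts."""
    norm = math.sqrt(sum(c * c for c in counts))
    return d / math.sqrt(2) + 2 * w * (norm / (norm + m) - 0.5)


def place(counts, d, w, m):
    """Scale the direction of a non-zero count vector to its shell distance."""
    norm = math.sqrt(sum(c * c for c in counts))
    radius = shell_distance(counts, d, w, m)
    return [radius * c / norm for c in counts]


# |v| = 5 equals the midpoint m, so the width term vanishes
midpoint_radius = shell_distance([3, 4], d=10, w=2, m=5)

# Two workflows with completely different error codes and zero width
first = place([5, 0], d=10, w=0, m=1)
second = place([0, 3], d=10, w=0, m=1)
separation = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)))
```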

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.clusterworkflows.CLUSTER_LOCK = <thread.lock object>

Lock that should be acquired before running clustering functions in here

WorkflowWebTools.clusterworkflows.get_clustered_group(workflow, clusterer, session=None)[source]

Get the group for a given workflow in this session

Parameters:
  • workflow (str) – The workflow to get the group for.
  • clusterer (sklearn.cluster.KMeans) – is the clusterer fit with historic data.
  • session (cherrypy.Session) – Stores the information for a session
Returns:

List of other workflows in the same group

Return type:

set

WorkflowWebTools.clusterworkflows.get_clusterer(history_path, errors_path='')[source]

Use this function to get the clusterer of workflows

Parameters:
  • history_path (str) – Path to the workflow historical data. This can be a local file path or a URL.
  • errors_path (str) – The errors for a given session to include in the clustering
Returns:

A dict of a clusterer that is fitted to historical data with its allmap. The keys are ‘clusterer’ and ‘allmap’.

Return type:

dict

WorkflowWebTools.clusterworkflows.get_workflow_groups(clusterer, session=None)[source]

Groups workflows together based on a fitted clusterer

Parameters:
  • clusterer (dict) – is a dictionary with the clusterer fit with historic data and the allmap to generate it. This matches the output of get_clusterer().
  • session (cherrypy.Session) – Stores the information for a session
Returns:

A dictionary pointing workflows to a group

Return type:

dict

WorkflowWebTools.clusterworkflows.get_workflow_vectors(workflows, session=None, allmap=None)[source]

Gets the errors for workflows in a list of numpy arrays

Parameters:
  • workflows (str) – the workflows that vectors are returned for
  • session (cherrypy.Session) – Stores the information for a session
  • allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
Returns:

a list of numpy arrays of errors for the workflow

Return type:

list of numpy.array

Manage Users

Module to manage users of WorkflowWebTools.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.manageusers.add_user(email, username, password, url)[source]

Adds the user to the users database and sends a verification email, if the parameters are valid.

Parameters:
  • email (str) – The user email to send verification to. Make sure to check for valid domains.
  • username (str) – The username of the account to add
  • password (str) – The password to be stored
  • url (str) – The base url to redirect the user to for validating their account
Returns:

Success code. 0 for adding user, 1 for not adding

Return type:

int

WorkflowWebTools.manageusers.confirmation(code, lookup='validator', return_curs=False)[source]

Determine the user from the confirmation code and activate the account

Parameters:
  • code (str) – the confirmation code for the user
  • lookup (str) – The field to look up the username with
  • return_curs (bool) – If false, closes the connection
Returns:

the user name if valid code, or ‘’ if not. Followed by conn, curs if return_curs is True

Return type:

str [, sqlite3.Connection, sqlite3.Cursor]

WorkflowWebTools.manageusers.do_salt_hash(to_hash)[source]

Salt and hash an object into a storable form.

Parameters:to_hash (str) – Alternate salt and hashing to get stored variable
Returns:A salty hash
Return type:str
WorkflowWebTools.manageusers.get_user_db()[source]

Gets the users database in the local directory.

Returns:the users connection, cursor
Return type:sqlite3.Connection, sqlite3.Cursor
WorkflowWebTools.manageusers.resetpassword(code, password)[source]

Resets the password for a user.

Parameters:
  • code (str) – is the validation code for the account
  • password (str) – is the new password
Returns:

user from the password reset

Return type:

str

WorkflowWebTools.manageusers.send_reset_email(email, url)[source]

Generates a confirmation code for a given account and sends a reset email.

Parameters:
  • email (str) – the email linked to the account
  • url (str) – the url of the running instance
WorkflowWebTools.manageusers.validate_password(_, username, password)[source]

Verifies users’ logon attempts. This is called automatically by cherrypy, hence the unused parameter.

Parameters:
  • _ (str) – The hostname
  • username (str) – The attempted username
  • password (str) – The attempted password
Returns:

Whether or not the login attempt succeeds

Return type:

bool

Reasons Manipulation

Module for manipulating the database that stores past operator reasons.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.reasonsmanip.get_reasons()[source]

Gets the reasons database in the local directory.

Returns:the reasons connection, cursor
Return type:(sqlite3.Connection, sqlite3.Cursor)
WorkflowWebTools.reasonsmanip.reasons_list()[source]

Get the full list of reasons

Returns:all of the reasons in a dictionary with the short reasons being the key
Return type:dict
WorkflowWebTools.reasonsmanip.short_reasons_list()[source]

Get the list of short reasons

Returns:the list of short reasons
Return type:list of strs
WorkflowWebTools.reasonsmanip.update_reasons(reasons)[source]

Gets the reasons for a given action and updates the reasons db

Parameters:

reasons (list of dicts) – is a list of dictionaries of short and long reasons, with keys ‘short’ and ‘long’

Raises:
  • TypeError – if the parameter is not a list
  • KeyError – if the dictionaries in the list do not have the correct structure
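
The expected input shape and the two exceptions can be pictured with the sketch below. check_reasons is a hypothetical helper that only validates the input and builds a short-to-long mapping; it does not touch any database and is not the module's own code.

```python
def check_reasons(reasons):
    """Validate a list of reason dictionaries and map short reasons to long ones."""
    if not isinstance(reasons, list):
        raise TypeError("reasons must be a list of dictionaries")
    # A missing 'short' or 'long' key raises KeyError here
    return {reason["short"]: reason["long"] for reason in reasons}


valid = check_reasons(
    [
        {"short": "site down", "long": "The site was in downtime during the job."},
        {"short": "bad file", "long": "An input file could not be opened."},
    ]
)
```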

Manage Actions

Module to manage actions of WorkflowWebTools.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.manageactions.extract_reasons_params(action, **kwargs)[source]

Extracts the reasons and parameters for an action from kwargs

Parameters:
  • action (str) – The action being asked for by the operator
  • kwargs – Keywords that are submitted via POST to /submitaction
Returns:

reasons for action, and parameters for the action

Return type:

list, dict

WorkflowWebTools.manageactions.fix_sites(**kwargs)[source]

Fix the site lists for tasks that had zero sites.

Parameters:kwargs – Keywords to be parsed by extract_reasons_params()
WorkflowWebTools.manageactions.get_acted_workflows(num_days)[source]

Get all of the workflows that have actions assigned

Parameters:num_days (int) – is the number of past days to check for actions. This speeds up the check while not losing the ability to look farther back in time.
Returns:a list of workflows acted on
Return type:list
WorkflowWebTools.manageactions.get_actions(num_days=None, num_hours=24, acted=0)[source]

Get the recent actions to be acted on in dictionary form

Parameters:
  • num_days (int) – is the number of days to check for actions
  • num_hours (int) – can be used instead of num_days for finer granularity. If num_days is given, num_hours is ignored.
  • acted (int) – Flag (0, 1, or None) for whether to use acted or not
Returns:

A dictionary of actions, to be rendered as JSON

Return type:

dict
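
The precedence between num_days and num_hours described above can be sketched as follows. cutoff_time is a hypothetical helper that only computes the start of the time window; how the module actually queries its actions is not shown here.

```python
import datetime


def cutoff_time(now, num_days=None, num_hours=24):
    """Return the earliest time to consider; num_days overrides num_hours."""
    if num_days is not None:
        window = datetime.timedelta(days=num_days)
    else:
        window = datetime.timedelta(hours=num_hours)
    return now - window


now = datetime.datetime(2020, 1, 10, 12, 0)
default_cutoff = cutoff_time(now)
days_cutoff = cutoff_time(now, num_days=2, num_hours=1)  # num_hours is ignored
```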

WorkflowWebTools.manageactions.get_actions_collection()[source]

Gets the actions collection from MongoDB.

Returns:the actions collection
Return type:pymongo.collection.Collection
WorkflowWebTools.manageactions.get_datetime_submitted(workflow)[source]

Get the datetime for a submitted workflow

Parameters:workflow (str) – A workflow to get the time submitted
Returns:A datetime object, or None if the workflow has never been submitted
Return type:datetime.datetime or None
WorkflowWebTools.manageactions.report_actions(workflows, output=None)[source]

Mark actions as acted on

Parameters:
  • workflows (list) – is the list of workflows to no longer show
  • output (dict) – If set, a reference to a dictionary to update with details of what was and wasn’t set to acted
WorkflowWebTools.manageactions.submitaction(user, workflows, action, session=None, **kwargs)[source]

Writes the action to Unified and notifies the user that this happened

Parameters:
  • user (str) – is the user that submitted the action
  • workflows (str) – is the original workflow name or workflows
  • action (str) – is the suggested action for Unified to take
  • session (cherrypy.Session) – the current session
  • kwargs – can include various reasons and additional datasets
Returns:

a tuple of workflows, reasons, and params for the action

Return type:

list, str, list of dicts, dict

Show Log

Generates content for the showlog webpage

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.showlog.give_logs(query, module='', limit=50)[source]

Takes output from CMSToolBox.elasticsearch.search_logs() and organizes it to pass to a mako template

Parameters:
Returns:

dictionary with [‘logs’], which is a list of logs, and [‘meta’], which shows meta information of newest query result

Return type:

dict

Classifying Errors

This module reads the explain error info for a session and tries to classify them in a way that can help recommend procedures to the operator. These procedures are gathered from WorkflowWebTools.procedures.

author:Daniel Abercrombie <dabercro@mit.edu>
WorkflowWebTools.classifyerrors.classifyerror(errorcode, workflow, session=None)[source]

Return the most relevant characteristics of an error code for this session. This will include things like:

  • Files not opening
  • StageOut errors

Note

More error types should be added to this function as needed

Parameters:
  • errorcode (int) – The error code that we want to classify
  • workflow (str) – the workflow that we want to get the errors from
  • session (cherrypy.Session) – Is the user’s cherrypy session
Returns:

A tuple of strings describing the key characteristics of the errorcode. These strings are good for printing directly in web browsers. The first string is the types of errors reported with this error code. The second string is the recommended normal actions. The third string contains special parameters useful for additional actions.

Return type:

str, str, str

WorkflowWebTools.classifyerrors.get_max_errorcode(workflow, session=None)[source]

Get the errorcode with the most errors for a session

Parameters:
  • workflow (str) – Is the primary name of the workflow
  • session (cherrypy.Session) – Is the user’s cherrypy session
Returns:

The error code that appears most often for this workflow

Return type:

int

List Workflows At Site

This module provides lists of errors per pie variable

author:Daniel Abercrombie
WorkflowWebTools.listpage.listworkflows(error_code, site_name, workflow, session=None)[source]

Gives back a list of tuples containing pie variables and the number of errors that match a given error_code and site_name.

Parameters:
  • error_code (int) – The error code that we want errors for
  • site_name (str) – The site name that we want errors for
  • workflow (str) – The workflow that we want errors for
  • session (cherrypy.Session) – holds the session information
Returns:

List of tuples of pie variable and errors, sorted by number of errors

Return type:

list

JavaScript and Mako User Interface

Some documentation of the JavaScript used on the webpages is given below.

addreason.js

This file contains the functions used to add reasons to the Detailed Workflow View. The fields added by this script are handled by the WorkflowWebTools.reasonsmanip module.

Todo

This is some of the messiest code in the package. Need to implement some JSLint check.

author:Daniel Abercrombie <dabercro@mit.edu>
count

Keeps track of the number of reasons on a given page.

getDefaultReason(num)

This function is used to generate a new reason on the Detailed Workflow View. Each name and ID is unique so that the short and long reasons can be matched.

Arguments:
  • num (int) – is a unique number for each reason submitted on the current page.
Returns:

the innerHTML for a reason div when a new one is generated.

Return type:

str

addReason()

Creates a new reason division and fills it with the result from getDefaultReason(). It increments count. Finally, the division is appended to the “reasons” division.

removeReason(childId)

Removes a reason from the “reasons” division.

Arguments:
  • childId (str) – is the full ID of the child to remove.
fillLongReason(number)

If a short reason is selected from the select field, this function displays the long reason.

Arguments:
  • number (int) – is the ID number of the reason to fill.
checkReason(num)

Checks if a short reason entered by the user is already taken or not. If it is taken, the function gives a warning message and displays the corresponding long reason.

expand_children(arrow, this_level, this_name, only_hide)

This function expands or collapses the rows underneath a given header. The collapsing happens recursively.

Arguments:
  • this_level – Tells what level the row expansion is on. (0, 1, or 2, for example)
  • this_name – Give the name of the row doing the expansion.
  • only_hide – Set this to true for recursive collapsing. Prevents expansion when not desired.

makeparams.js

This file contains the functions used to add parameters to the page when different actions are selected. The fields added by this script are handled by the WorkflowWebTools.manageactions module.

author:Daniel Abercrombie <dabercro@mit.edu>
taskTable(opts, texts, taskNumber)

Creates a “table” for parameters to be filled in and returns the <div> these are contained in.

Arguments:
  • opts (dict) – Each key is the name of the parameter. Each value is the list of possible values for this parameter.
  • texts (list) – A list of parameters that should be set by a text field.
  • taskNumber (int) – Separates the different parameters for each task in ACDC.
Returns:

The <div> element that contains the table
printSiteLists(method)

Sets the contents of the site tables under the “acdc” action based on the method of site selection subsequently selected.

makeParamTable(action)

Offers the various options that are available for a given action.

checkuser.js

JavaScript functions that are used when registering a new user or resetting a password. The password check is not done by any other part of the system, but all other checks are performed by the Python backend on submission for security purposes.

author:Daniel Abercrombie <dabercro@mit.edu>
checkPassword()

Checks input fields with the ids #firstword and #secondword. If they match, sets the innerHTML of the div #confirmresult with a green positive message. Otherwise the div is filled with a red warning message.

validateForm()

Checks the form for registering a new user. The requirements for the form values are given in Creating a New Account.

piechart.js

Contains the drawPieCharts function for the global errors page.

author:Daniel Abercrombie <dabercro@mit.edu>
drawPies()

This function finds multiple canvases and draws pie charts onto them. The number of errors for each canvas is given by a JSON string hidden in the table with id ‘errortable’. The piecharts are split with the largest contributor always having the same color. The radius of the piechart is a fraction of the canvas size, determined by the following equation:

\[\frac{r}{r_{canvas}} = \frac{n_{errors}}{n_{errors} + 10}\]
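
The radius equation is easy to evaluate for a few values. The sketch below is written in Python for consistency with the rest of this documentation, although the page itself computes the radius in JavaScript. pie_radius is a hypothetical name.

```python
def pie_radius(n_errors, canvas_radius):
    """Radius of a pie chart as a fraction of the canvas radius."""
    return canvas_radius * n_errors / (n_errors + 10)


# With 10 errors the pie fills half of the canvas radius
half = pie_radius(10, 100)
# The radius grows towards the canvas radius as the number of errors increases
large = pie_radius(90, 100)
```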