Workflow Team Web Tools¶
Welcome to the documentation for the new Workflow Team Web Tools.
Using the Web Tools as an Operator¶
Once you are pointed to a proper URL to access the webtools, you will
come to the home page, with links to different views.
For each of the examples below, I will use the base URL of https://localhost:8080/,
since that is likely the URL you can see if you run the server on your machine.
If you are looking at a production server, the URL will of course be different.
For users familiar with CherryPy, each page is a function of a WorkflowTools object.
To pass parameters to the function in a browser,
the usual urlencoding of the parameters can be appended to the URL to call each function.
Most users should be able to interact with the website exclusively through links though.
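As a rough illustration of the urlencoding mentioned above, the following sketch builds the URL for a page and its parameters. The helper name `page_url` is hypothetical and is not part of the web tools; the base URL is the local example used throughout this document.

```python
from urllib.parse import urlencode

BASE_URL = "https://localhost:8080/"  # local example URL from this document


def page_url(page, **params):
    """Build the URL for a page, appending urlencoded parameters if any.

    This helper is hypothetical and only illustrates the URL layout.
    """
    # Drop empty values so that the page falls back to its defaults
    query = urlencode({key: val for key, val in params.items() if val != ""})
    url = BASE_URL + page
    return url + "?" + query if query else url


print(page_url("globalerror", pievar="sitename"))
```

Pasting the printed URL into a browser should be equivalent to calling the page function with `pievar='sitename'`.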
From the URL root index, users will be able to directly access the following:
The Global Error View¶
-
WorkflowTools.globalerror(pievar='errorcode')[source]¶ This page, located at
https://localhost:8080/globalerror, attempts to give an overall view of the errors that occurred in each workflow at different sites. The resulting view is a table of pie charts. The rows and columns can be adjusted to contain two of the following:
- Workflow step name
- Site where error occurred
- Exit code of the error
The third variable is used to split the pie charts. This variable inside the pie charts can be quickly changed by submitting the form in the upper left corner of the page.
The size of each pie chart depends on the total number of errors in a given cell. Each cell also has a tooltip, giving the total number of errors in the pie chart. The colors of the pie charts show the splitting based on the
pievar. Clicking on a pie chart will show the splitting explicitly using the List Workflows page.
If the steps make up the rows, the default view will show you the errors for each campaign. Clicking on the campaign name will cause the rows to expand to show the original workflow and all ACDCs (whether or not the ACDCs have errors). Following the link of the workflow will bring you to Detailed Workflow View. Clicking anywhere else in the workflow box will cause it to expand to show errors for each step.
Parameters: pievar (str) – The variable that the pie charts are split into. Valid values are:
- errorcode
- sitename
- stepname
Returns: the global views of errors Return type: str
List Workflows¶
-
WorkflowTools.listpage(errorcode='', sitename='', workflow='')[source]¶ This returns a list of workflows, site names, or error codes that matches the values given for the other two variables. The page can be accessed directly by clicking on a corresponding pie chart on The Global Error View.
Parameters: - errorcode (int) – Error to match
- sitename (str) – Site to match
- workflow (str) – The workflow to match
Returns: Page listing workflows, site names, or error codes
Return type: str
Raises: cherrypy.HTTPRedirect to 404 if all variables are filled
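The rule that two variables are filled and the third is listed can be sketched as follows. This is only an illustration: the helper `listpage_url` and the site name in the usage line are hypothetical, and the sketch raises a `ValueError` locally where the real page redirects to a 404.

```python
from urllib.parse import urlencode


def listpage_url(errorcode="", sitename="", workflow="",
                 base="https://localhost:8080/"):
    """Build a listpage URL from the variables that are filled.

    Hypothetical helper; the page lists values of the variable left empty.
    """
    candidates = (("errorcode", errorcode), ("sitename", sitename),
                  ("workflow", workflow))
    filled = {key: val for key, val in candidates if val != ""}
    if len(filled) == 3:
        # Nothing is left to list; the server answers this case with a 404
        raise ValueError("Leave at least one variable empty")
    query = urlencode(filled)
    return base + "listpage" + ("?" + query if query else "")


# Made-up site name, used only to show the resulting URL
print(listpage_url(errorcode=84, sitename="T2_EXAMPLE_Site"))
```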
Detailed Workflow View¶
-
WorkflowTools.seeworkflow(workflow='', issuggested='')[source]¶ Located at
https://localhost:8080/seeworkflow, this shows detailed tables of errors for each step in a workflow.
For the exit codes in each row, there is a link to view some of the output of the error message for jobs having the given exit code. This should help operators understand what the error means.
At the top of the page, there are links back for The Global Error View, Viewing Workflow Logs, related JIRA tickets, and ReqMgr2 information about the workflow and prep ID.
The main function of this page is to submit actions. Note that you will need to register in order to actually submit actions. See Creating a New Account for more details. Depending on which action is selected, a menu will appear for the operator to adjust parameters for the workflows.
Under the selection of the action and parameters, there is a button to show other workflows that are similar to the selected workflow, according to the Workflow Info. Each entry is a link to open a similar workflow view page in a new tab. The option to submit actions will not be on this page though (so that you can focus on the first workflow). If you think that a workflow in the cluster should have the same actions applied to it as the parent workflow, then check the box next to the workflow name before submitting the action.
Finally, before submitting, you can submit reasons for your action selection. Clicking the Add Reason button will give you an additional reason field. Reasons submitted are stored based on the short reason you give. You can then select past reasons from the drop down menu to save time in the future. If you do not want to store your reason, do not fill in the Short Reason field. The long reason will be used for logging and communicating with the workflow requester (eventually).
Parameters: - workflow (str) – is the name of the workflow to look at
- issuggested (str) – is a string to tell if the page has been linked from another workflow page
Returns: the error tables page for a given workflow
Return type: str
Raises: 404 if a workflow doesn’t seem to be in assistance anymore. Resets the personal cache in the meantime, just in case.
Getting the List of Actions¶
-
WorkflowTools.getaction(days=0, acted=0)[source]¶ The page at
https://localhost:8080/getaction returns a list of workflows to perform actions on. It may be useful to use this page to immediately check if your submission went through properly. This page will mostly be used by Unified though for acting on operator submissions.
Parameters: - days (int) – The number of past days to check. The default, 0, means to only check today.
- acted (int) –
Used to determine which actions to return. The following values can be used:
- 0 - The default value selects actions that have not been acted on
- 1 - Selects actions reported as submitted by Unified
- Negative integer - Selects all actions
Returns: JSON-formatted information containing actions to act on. The top-level keys of the JSON are the workflow names. Each of these keys refers to a dictionary specifying:
- "Action" - The action to take on the workflow
- "Reasons" - A list of reasons for this action
- "Parameters" - Changes to make for the resubmission
- "user" - The account name that submitted that action
Return type: JSON
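A minimal sketch of reading this JSON is given below. The response text is invented for illustration: the workflow name, action value, reason, parameters, and user are all made up, and only the four keys come from the description above.

```python
import json

# Hypothetical response text; every value here is made up for illustration
response_text = json.dumps({
    "example_workflow_name": {
        "Action": "acdc",
        "Reasons": ["Files were temporarily unavailable"],
        "Parameters": {"memory": "3000"},
        "user": "operator1",
    }
})


def summarize_actions(text):
    """Return (workflow, action, user, number of reasons) for each entry."""
    actions = json.loads(text)
    return [(workflow, info["Action"], info["user"], len(info["Reasons"]))
            for workflow, info in sorted(actions.items())]


for row in summarize_actions(response_text):
    print(row)
```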
Reporting Completed Actions¶
-
WorkflowTools.reportaction()[source]¶ A POST request to
https://localhost:8080/reportaction tells the WorkflowWebTools that a set of workflows has been acted on by Unified. The body of the POST request must include a JSON with the passphrase under "key" and a list of workflows under "workflows".
An example of making this POST request is provided in the file
WorkflowWebTools/test/report_action.py, which relies on WorkflowWebTools/test/key.json.
Returns: A JSON summarizing results of the request. Keys and values include:
- 'valid_key' - If this is false, the passphrase passed was wrong, and no action is changed
- 'valid_format' - If false, the input format is unexpected
- 'success' - List of actions that have been marked as acted on
- 'already_reported' - List of workflows that were already removed
- 'does_not_exist' - List of workflows that the database does not know
Return type: JSON
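The request body and the reply keys described above might be handled as in the sketch below. This is not the code from `WorkflowWebTools/test/report_action.py`, which remains the authoritative example. The function names are hypothetical, and the `Content-Type` header is an assumption about what the server accepts. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_report_request(passphrase, workflows,
                         url="https://localhost:8080/reportaction"):
    """Create (but do not send) a POST request for reportaction."""
    body = json.dumps({"key": passphrase, "workflows": list(workflows)})
    return urllib.request.Request(
        url, data=body.encode("utf-8"),
        # Assumption: the server reads a JSON body with this content type
        headers={"Content-Type": "application/json"}, method="POST")


def check_report(result):
    """Return the 'success' list, or raise if the request was rejected."""
    if not result.get("valid_key"):
        raise ValueError("Wrong passphrase, no action was changed")
    if not result.get("valid_format"):
        raise ValueError("Unexpected input format")
    return result.get("success", [])
```

Passing the parsed reply through `check_report` makes a wrong passphrase fail loudly instead of silently leaving the actions unreported.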
Viewing Workflow Logs¶
-
WorkflowTools.showlog(search='', module='', limit=50)[source]¶ This page, located at
https://localhost:8080/showlog, returns logs that are stored in an elastic search server. More details can be found at Elastic Search Tools. If directed here from Detailed Workflow View, then the search will be for the relevant workflow.
Parameters: - search (str) – The search string
- module (str) – The module to look at, if only interested in one
- limit (int) – The limit of number of logs to show on a single page
Returns: the logs from elastic search
Return type: str
Creating a New Account¶
-
WorkflowTools.newuser(email='', username='', password='')[source]¶ New users can register at
https://localhost:8080/newuser. From this page, users can enter a username, email, and password. The username cannot be empty, must contain only alphanumeric characters, and must not already exist in the system. The email must match the domain names listed on the page or can be a specific whitelisted email. See Setting Up Server Configuration for more information on setting valid emails. Finally, the password must also not be empty.
If the registration process is successful, the user will receive a confirmation page instructing them to check their email for a verification link. The user account will be activated when that link is followed, in order to ensure that the user possesses a valid email.
The following parameters are sent via POST from the registration page.
Parameters: - email (str) – The email of the new user
- username (str) – The username of the new user
- password (str) – The password of the new user
Returns: a page to generate a new user or a confirmation page
Return type: str
Raises: cherrypy.HTTPRedirect back to the new user page without parameters if there was a problem entering the user into the database
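The registration rules above can be sketched as a small validation function. This is an illustration under stated assumptions, not the server's implementation: the function name, the domain list, and the whitelist are hypothetical, and the real server matches configurable email patterns rather than exact domains.

```python
import re

# Hypothetical configuration values, for illustration only
VALID_DOMAINS = ["example.edu"]
WHITELISTED_EMAILS = ["someone@elsewhere.example"]


def registration_problems(email, username, password, existing_users=()):
    """Return a list of reasons why a registration would be rejected."""
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9]+", username):
        problems.append("username must be non-empty and alphanumeric")
    elif username in existing_users:
        problems.append("username already exists")
    local, sep, domain = email.rpartition("@")
    domain_ok = bool(local) and bool(sep) and domain in VALID_DOMAINS
    if email not in WHITELISTED_EMAILS and not domain_ok:
        problems.append("email is not from a valid domain or whitelist")
    if not password:
        problems.append("password must not be empty")
    return problems
```

An empty list means that all of the checks described above pass.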
Resetting Account Password¶
-
WorkflowTools.resetpassword(email='', code='', password='')[source]¶ If a user forgets his or her username or password, navigating to
https://localhost:8080/resetpassword will allow them to enter their email to reset their password. The email will contain the username and a link to reset the password.
This page is multifunctional, depending on which parameters are sent. The link actually redirects to this webpage with a secret code that will then allow you to submit a new password. The password is then submitted back here via POST.
Parameters: - email (str) – The email linked to the account
- code (str) – confirmation code to activate the account
- password (str) – the new password for a given code
Returns: a webview depending on the inputs
Return type: str
Raises: 404 if both email and code are filled
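One way to picture the multifunctional behavior is as a dispatch on which parameters are filled. The sketch below is an interpretation of the description above, not the actual page code. The function name and the returned mode names are hypothetical, and a `LookupError` stands in for the 404.

```python
def resetpassword_mode(email="", code="", password=""):
    """Guess which view the page would show for a set of parameters.

    Hypothetical helper; the mode names are invented for illustration.
    """
    if email and code:
        # The documented error case: both email and code are filled
        raise LookupError("404: email and code cannot both be filled")
    if email:
        return "send_reset_email"
    if code and password:
        return "store_new_password"
    if code:
        return "new_password_form"
    return "email_form"
```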
Manually Resetting Your Workflow Cache¶
-
WorkflowTools.resetcache(prepid='', workflow='')[source]¶ The function is only accessible to someone with a verified account.
Navigating to
https://localhost:8080/resetcache resets the error info for the user’s session. It also clears out cached JSON files on the server. Under normal operation, this cache is only refreshed every half hour.
Returns: a confirmation page Return type: str
Redoing the Workflow Clusters¶
-
WorkflowTools.cluster()[source]¶ The function is only accessible to someone with a verified account.
Navigating to
https://localhost:8080/cluster causes the server to regenerate the clusters that it has stored. This is useful when the history database of past errors has been updated with relevant errors since the server has been started or this function has been called.
Returns: a confirmation page Return type: str
Workflows Procedures¶
Here is the list of recommended procedures for various exit codes:
| Exit Code | Cause | Normal Procedure | Additional Procedure | Additional Match Expression |
|---|---|---|---|---|
| 73 | RootEmbeddedFileSequence: secondary file list seems to be missing. When overflow procedure is enabled, the agent should be able to see the secondary file set as present at those sites as well. | Open a GitHub issue | | |
| 84 | The file could not be found | ACDC | If a second round of ACDC also fails for the same files, contact the transfer team and/or site support team. | root://[-/w.]+.root |
| 85 | If FileReadError, the file could not be read. If R__unzip: error, most likely a bad node with corrupted intermediate files. | ACDC | If getting unzip errors, most likely a bad node. Report immediately. If a second round of ACDC also fails for the same files, contact the transfer team and/or site support team. | root://[-/w.]+.root R__unzip |
| 92 | File was not found and fallback procedure also failed. | ACDC | If a second round of ACDC also fails for the same files, contact the transfer team and/or site support team. | root://[-/w.]+.root |
| 134 | Strong failure, report immediately | | | |
| 137 | Likely an unrelated batch system kill. Sometimes a site-related problem. | Point SSP. Open GGUS ticket and put site in drain. | | |
| 8001 | Strong failure, report immediately | | | |
| 8002 | Strong failure, report immediately | | | |
| 8004 | BadAlloc - memory exhaustion | Report immediately to the Site Support Team. | | |
| 11003 | JobExtraction failures | ACDC | | |
| 50513 | Failure to run SCRAM setup scripts | ACDC | | |
| 50660 | Performance kill: “Job exceeding maxRSS” | More memory | | |
| 61202 | Can’t determine proxy filename. X509 user proxy required for job. | If temporary failure: ACDC, otherwise contact glideinWMS team | | |
| 71104 | Site containing only available copy of input datasets is in drain. | Copy input files from tape (?) or wait for site listed to come out of drain. | The job can run only at .*(T[12].* | |
| 71304 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 71305 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 71306 | Job was killed by WMAgent for using too much wallclock time | ACDC | | |
| 99109 | Uncaught exception in WMAgent step executor (often staging out problems) | Check the Site Readiness and consult with the Site Support Team. If a temporary failure: ACDC. If a problem with the site: Make GGUS ticket, put the site in drain then kill and clone | | |
| 99303 | The agent has not found the JobReport.x.pkl file. Regardless of the status of the job, it will fail the job with 99303. Work is needed to make this error as rare as possible since there are very limited cases to make the file fail to appear. | Strong failure, report immediately | | |
The module WorkflowWebTools.procedures, which is used by
WorkflowWebTools.classifyerrors, contains only the PROCEDURES dictionary.
The keys of this dictionary are the integer exit codes from the agent to act on.
The values are dictionaries that contain 'normal' and
'additional' actions and a 'cause'.
The 'additional' actions are dictionaries containing 're' and 'action'.
The 're' field is a compiled regular expression to match to the logs.
group(1) of this regular expression is taken to be a parameter to be displayed to the operator.
The 'action' field gives additional instructions to the operator,
who should inspect the parameters returned by the regular expression to determine
the ultimate action to perform.
The 'cause' will add some insight into the error codes in the documentation.
It isn’t shown to the operator because they already see the different types of error codes
and are provided with a link to look at the error logs directly.
An example procedure with just exit code 84 would look like the following:
import re  # needed for the compiled regular expression below

PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            # group(1) of this expression is shown to the operator
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC also fails for the same files, '
                       'contact the transfer team and/or site support team.')
        }
    }
}
What this means is that error code 84 usually should be an ACDC. However, if the operator sees the same files appear from the regular expression, then the operator should contact the transfer team and/or the site support team.
These dictionaries are used to generate the procedures table above.
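To make the role of the 're' and 'action' fields more concrete, the sketch below applies a procedure entry to some log text. It is not the code in WorkflowWebTools.classifyerrors: the function `suggest` and the log line are hypothetical, and only the dictionary layout follows the description above.

```python
import re

PROCEDURES = {
    84: {
        'normal': 'ACDC',
        'cause': 'The file could not be found',
        'additional': {
            're': re.compile(r'(root://.+root)'),
            'action': ('If a second round of ACDC also fails for the same '
                       'files, contact the transfer team and/or site '
                       'support team.')
        }
    }
}


def suggest(exit_code, log_text):
    """Return the normal action plus any parameters matched in the logs."""
    procedure = PROCEDURES.get(exit_code)
    if procedure is None:
        return None
    result = {'normal': procedure['normal'], 'parameters': [], 'action': ''}
    additional = procedure.get('additional')
    if additional:
        # group(1) of each match is the parameter shown to the operator
        result['parameters'] = sorted(set(
            match.group(1) for match in additional['re'].finditer(log_text)))
        result['action'] = additional['action']
    return result


# Made-up log line, only to exercise the regular expression
log = "Could not open root://server.example.org//store/file1.root"
print(suggest(84, log))
```

The operator would compare the returned parameters, here the file names, between rounds of ACDC before deciding whether the additional action applies.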
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
Running the Web Tools¶
The webtools are usually operated behind a cherrypy server.
Before running the script runserver/workflowtools.py,
there are a few other things that you should set up first.
Setting Up Server Configuration¶
The first script you should run is setup_server.sh.
Run this from inside the runserver directory:
cd runserver
./setup_server.sh
This script sets up a basic server configuration.
First, it generates or links to certificates for HTTPS. It takes one argument for determining the openssl flag for the passphrase protocol. Valid options for generating passwords are:
- -aes128
- -aes192
- -aes256
To link to an existing key instead, pass the option:
./setup_server.sh -link
The script performs the following steps:
- Creates a keys directory
- Prompts for the locations of private key and certificates to create softlinks if the softlinks option is passed.
- Otherwise, a keys/privkey.pem with the password flag passed is created along with a cert.pem to use for SSL connections
- Generates a random list of salts from generatesalt.py
- Copies an example config.yml to the working directory
If the certificate is not linked, it is self-signed and valid for one year.
After completing these steps, the script reminds the user to edit config.yml.
Server Configuration¶
The server configuration is set by a YAML file, config.yml.
A number of parameters are set here, and read by the WorkflowWebTools.serverconfig:
- The webmaster name and email determines who the registration emails are sent from.
- The host name and port tell where to serve the cherrypy application if using the CherryPy server.
- The data locations are split up into
- Historic information
- Location to fetch the current errors
- Location to fetch the current errors’ explanations
- If any of these files are behind Shibboleth,
cookie_file, cookie_pem, and cookie_key should be listed under data too.
These keys are used as kwargs for
CMSToolBox.get_json()
- Valid registration email domains and specific whitelisted emails
- Actions stores the number of days to check the submission history for actions requested
- Clustering parameters
See WorkflowWebTools.clusterworkflows for more details on the clustering parameters.
Updating the Error History¶
-
update_history.py
A python script for updating the history database.
Can take arguments that are file names or urls. If a local file doesn’t exist, the url is assumed. The input file is then added to the central history database, as defined in your server config file. If no arguments are given, the central history database is just generated as empty.
It is recommended to set up a cron job to update your central history regularly. For example, adding the line:
0 * * * * <path/to>/update_history.py <URL to errors>
to your crontab will automatically update the central database every hour. Duplicate entries will not be added.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
Starting the CherryPy Server¶
Finally, the service can be launched by running:
./workflowtools.py
If you need sudo privileges, to access a certain port for example, you can use the script:
./run.sh
You may need to adjust the values in runserver/setenv.sh first.
Running the Server Behind WSGI¶
If running the site in a production environment,
you will likely need to run behind WSGI to enable CERN’s SSO.
Install mod_wsgi, which is not listed as a strict requirement for the package.
In your apache configuration, create a virtual host for the WSGI application.
You will also need to allow access to the runserver/static directory.
An example would be as follows:
Listen 443
<VirtualHost *:443>
    WSGIScriptAlias / <path_to_WorkflowWebTools>/runserver/workflowtools.py
    Alias /static <path_to_WorkflowWebTools>/runserver/static
    <Directory <path_to_WorkflowWebTools>/runserver>
        #
        # Add Shibboleth authentication here? (Not developed yet)
        #
        WSGIApplicationGroup %{GLOBAL}
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>
    </Directory>
    ServerName 127.0.0.1
    SSLEngine on
    SSLCertificateFile <path_to_cert>
    SSLCertificateKeyFile <path_to_private_key>
</VirtualHost>
Note
This is just what I have working on my laptop. This website has not been run behind WSGI in production yet. I would be grateful to learn about any errors in this section of the documentation.
Predicting Actions with SciKit-Learn¶
There have been some preliminary attempts to create a model that predicts actions
on a workflow based on the errors that it throws.
The results of this preliminary training are presented in the slides located at
WorkflowWebTools/docs/170705/dabercro_WorkflowWebTools_170705.pdf.
The following are instructions to run the training and test interactively:
First, create a JSON file that contains all of the features and actions of previous workflows. This can be done with the script located at
WorkflowWebTools/runserver/create_actionshistorylink.py:
./create_actionshistorylink.py static/history.VERSION_NUMBER.json
This should be done from the production server. The argument is just the name of the output file.
Then download the output JSON file to the computer where you will be doing the training. The module
WorkflowWebTools.paramsregression also contains a script for running the training interactively. This can be run from inside the WorkflowWebTools directory with the following:
python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json
By default, this will train for the
'action' field. To train a different parameter, give another argument to the script:
python2.7 paramsregression.py PATH/TO/history.VERSION_NUMBER.json memory
The easiest way to get the classifier from inside the server would be to call something like the following:
from WorkflowWebTools import actionshistorylink
from WorkflowWebTools import paramsregression
model = paramsregression.get_classifier(actionshistorylink.dump_json(), 'action')
The classifier then works like any other sklearn trained model.
Note
This may take a long time to return, so generating the classifier is only something you would want to do once in a while (such as when the server first turns on or when an approved user accesses some restricted address).
For more details on the functions used in the internals, see Machine Learning Functions (or the source code).
Troubleshooting¶
This is just a quick guide for anyone trying to troubleshoot or restart the production server.
If the web service is not responding, first check that you are accessing the machine using CERN’s network (either by being physically at CERN or using a proxy). If you are certain that you are using the CERN network, then you will have to log into the server. The name of the server is not given due to potential security concerns. To complete all of the following steps, you need sudo access to the machine.
SSH into a CERN machine.
Try downloading actions from the machine:
wget -O test.json --no-check-certificate "https://vocms0113.cern.ch:80/getaction?days=30&acted=-1"
If the file test.json is not empty, then the service is likely still running properly. Go back to checking your own connection.
SSH into the server.
Check top for the process python2.7. If it is there, but the server is still not responding, kill the process.
Do the following to start the process again:
cd /home/dabercro/OpsSpace/WorkflowWebTools/runserver
./run.sh
Check the server status:
sudo tail -f /home/dabercro/OpsSpace/WorkflowWebTools/runserver/nohup.out
(You can exit tail with Ctrl + c.) Hopefully, the last line says something like:
ENGINE Bus STARTED
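For anyone who prefers to script the download check, the sketch below builds the same getaction URL and judges a downloaded body. Both function names are hypothetical. Checking that the body parses as JSON is an extra assumption on top of the "not empty" rule given above, and fetching the URL is left to whichever tool is available on the machine.

```python
import json
from urllib.parse import urlencode


def getaction_url(host, port, days=30, acted=-1):
    """Build a getaction URL like the one used in the wget check."""
    return "https://{0}:{1}/getaction?{2}".format(
        host, port, urlencode({"days": days, "acted": acted}))


def looks_alive(body):
    """Return True if the downloaded body is non-empty, valid JSON."""
    if not body.strip():
        return False
    try:
        json.loads(body)
    except ValueError:
        # An error page or truncated output is treated as not alive
        return False
    return True


print(getaction_url("localhost", 8080))
```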
Maintaining the Python Backend¶
For developers wishing to make adjustments to the modules or anyone else who wants to understand some of the backend of the server, all of the Python modules for this system are documented below.
Server Configuration¶
Small module to get information from the server config.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.serverconfig.LOCATION= '/home/docs/checkouts/readthedocs.org/user_builds/cms-comp-ops-tools/checkouts/v0.7/WorkflowWebTools/test'¶ The string giving the location that serverconfig looks first for the configuration
-
WorkflowWebTools.serverconfig.all_errors_path()[source]¶ Returns: the path to all errors for recent workflows Return type: str
-
WorkflowWebTools.serverconfig.config_dict()[source]¶ Returns: the configuration in a dict Return type: dict Raises: Exception – when it cannot find the configuration file
-
WorkflowWebTools.serverconfig.get_cluster_settings()[source]¶ Returns: dictionary containing the settings for clustering Return type: dict
-
WorkflowWebTools.serverconfig.get_history_length()[source]¶ Returns: the number of days of history to check for workflows with an action submitted. Return type: int
-
WorkflowWebTools.serverconfig.get_valid_emails()[source]¶ Get iterator for valid email patterns for this instance. This is configurable by the webmaster.
Returns: List of valid email patterns Return type: list
-
WorkflowWebTools.serverconfig.host_name()[source]¶ Returns: the name of the host from the config Return type: str
-
WorkflowWebTools.serverconfig.host_port()[source]¶ Returns: the port to host the site on Return type: str
Functions for Error Tracking¶
Simple utils functions that do not rely on a session.
These are separated from globalerrors, since we do not need to generate an ErrorInfo instance in other applications.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.errorutils.add_to_database(curs, data_location)[source]¶ Add data from a file to a central database through the passed cursor
Parameters: - curs (sqlite3.Cursor) – is the cursor to the database
- data_location (str or list) – If a string, this is the location of the file or url of data to add to the database. This should be in JSON format, and if a local file does not exist, a url will be assumed. If the url is invalid, an empty database will be returned. If a list, it’s a list of status to get workflows from wmstats.
-
WorkflowWebTools.errorutils.create_table(curs)[source]¶ Create the workflows error table with the proper format
Parameters: curs (sqlite3.Cursor) – is the cursor to the database
-
WorkflowWebTools.errorutils.get_list_info(status_list)[source]¶ Get the list of workflows that match the statuses listed via
workflowinfo.
Parameters: status_list (list) – The list of workflow statuses to get the info for
Returns: The workflow info dictionary, which matches the format of the unified all_errors.json.
Return type: dict
-
WorkflowWebTools.errorutils.open_location(data_location)[source]¶ This function assumes that the contents of the location is in JSON format. It opens the data location and returns the dictionary.
Parameters: data_location (str) – The location of the file or url Returns: information in the JSON file Return type: dict
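The "local file first, otherwise a URL" behavior described for data locations could look roughly like the sketch below. It is not the module's source: the name `load_json_location` is hypothetical, and returning an empty dictionary on failure is an assumption based on the note that an invalid url yields an empty database.

```python
import json
import os
from urllib.request import urlopen


def load_json_location(data_location):
    """Load JSON from a local file if it exists, otherwise from a URL."""
    if os.path.isfile(data_location):
        with open(data_location, "r") as input_file:
            return json.load(input_file)
    # No such local file, so the location is assumed to be a URL
    try:
        with urlopen(data_location) as response:
            return json.loads(response.read().decode("utf-8"))
    except (ValueError, OSError):
        # Invalid URL, failed connection, or content that is not JSON
        return {}
```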
Global Errors¶
Generates the content for the errors pages
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
class
WorkflowWebTools.globalerrors.ErrorInfo(data_location='')[source]¶ Holds the information for any errors for a session
-
connection_log(action)[source]¶ Logs actions on the sqlite3 connection
Parameters: action (str) – is the action on the connection
-
execute(query, params=None)[source]¶ Locks the internal database and makes the query.
Parameters: - query (str) – The query, which can include ‘?’
- params (tuple) – The parameters to pass into the query
Returns: The results of the query
Return type: list
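The idea of locking the database around each query can be sketched with `sqlite3` and `threading.Lock`. The class below is a stand-in written for illustration, not the `ErrorInfo` implementation. The class name, table, and values are hypothetical.

```python
import sqlite3
import threading


class LockedDatabase(object):
    """Minimal stand-in showing a query method guarded by a lock."""

    def __init__(self, location=":memory:"):
        # check_same_thread=False lets other threads reuse the connection
        self.conn = sqlite3.connect(location, check_same_thread=False)
        self.lock = threading.Lock()

    def execute(self, query, params=None):
        """Run a query with optional '?' parameters and return all rows."""
        with self.lock:
            curs = self.conn.cursor()
            curs.execute(query, params or ())
            result = curs.fetchall()
            self.conn.commit()
            curs.close()
        return result


database = LockedDatabase()
database.execute("CREATE TABLE errors (code INT, site TEXT)")
database.execute("INSERT INTO errors VALUES (?, ?)", (84, "site_a"))
print(database.execute("SELECT code FROM errors WHERE site = ?", ("site_a",)))
```

Holding the lock for the whole fetch keeps two requests in the same session from interleaving their cursor operations.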
-
get_allmap()[source]¶ Returns: A dictionary that maps ‘errorcode’, ‘stepname’, and ‘sitename’ to the lists of all the errors, steps, or sites Return type: dict
-
get_prepid(prep_id)[source]¶ Parameters: prep_id (str) – The name of the Prep ID to check cache for Returns: Either cached PrepIDInfo, or a new one Return type: WorkflowWebTools.workflowinfo.PrepIDInfo
-
get_step_list(workflow)[source]¶ Gets the list of steps within a workflow
Parameters: workflow (str) – Name of the workflow to gather information for Returns: list of steps within the workflow Return type: list
-
get_step_table(step, readymatch=None)[source]¶ Get the sparse representation of the step table. Fetches from an internal dictionary, so faster than database access
Parameters: - step (str) – The step name for the table
- readymatch (list) – The list of site readiness statuses to match
Returns: The list used to build the step table. Each element of the list is a tuple of
(number of errors, site name, exit code)
Return type: list of tuples
-
get_workflow(workflow)[source]¶ This should be used to get the workflow info so that there is no redundant fetching for a single session.
Parameters: workflow (str) – The prep ID for a workflow Returns: Cached WorkflowInfo from the ToolBox. Return type: WorkflowWebTools.workflowinfo.WorkflowInfo
-
return_workflows()[source]¶ Returns: the ordered list of all workflow prep IDs that need attention Return type: list
-
-
WorkflowWebTools.globalerrors.TITLEMAP= {'errorcode': 'error code', 'sitename': 'site name', 'stepname': 'workflow'}¶ Dictionary that determines how a chosen pievar shows up in the pie chart titles
-
WorkflowWebTools.globalerrors.check_session(session, can_refresh=False)[source]¶ If session is None, fills it.
Parameters: - session (cherrypy.Session) – the current session
- can_refresh (bool) – tells the function if it is safe to refresh and close the old database
Returns: ErrorInfo of the session
Return type:
-
WorkflowWebTools.globalerrors.default_errors_format()[source]¶ Gives a defaultdict with the format:
{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}
Returns: A defaultdict for building errors Return type: collections.defaultdict
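A nested `defaultdict` with this shape can be built as in the sketch below. It is only meant to show the structure; the function name `errors_format` is hypothetical, and the example keys (a step name, an exit code, and a site name) are assumptions about what the groups hold.

```python
from collections import defaultdict


def errors_format():
    """Nested defaultdict: group -> 'errors' -> subgroup -> item -> count."""
    return defaultdict(
        lambda: {'errors': defaultdict(lambda: defaultdict(int))})


errors = errors_format()
# Counts can be added without creating the intermediate dictionaries first
errors['step_a']['errors'][84]['site_a'] += 2
errors['step_a']['errors'][84]['site_b'] += 1
print(dict(errors['step_a']['errors'][84]))
```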
-
WorkflowWebTools.globalerrors.get_errors(pievar, session=None)[source]¶ Gets the number of errors with the format:
{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}
where each group is a different value for the variables that go into the row of the global errors table. That is, the groups will usually be the subtask list, unless
pievar is "stepname". In that case, the grouping is by error code.
Parameters: - pievar (str) – The variable that each piechart is split into.
- session (cherrypy.Session) – Stores the information for a session
Returns: A dictionary of 2D list of errors.
Return type: defaultdict
-
WorkflowWebTools.globalerrors.get_row_col_names(pievar)[source]¶ Get the column and row for the global table view, based on user input
Parameters: pievar (str) – The variable to divide the piecharts by. Returns: The names of the global table rows, and the table columns Return type: (str, str)
-
WorkflowWebTools.globalerrors.get_step_table(step, session=None, allmap=None, readymatch=None, sparse=False)[source]¶ Gathers the errors for a step into a 2-D table of ints
Parameters: - step (str) – name of the step to get the table for
- session (cherrypy.Session) – Stores the information for a session
- allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
- readymatch (tuple) – Match the readiness statuses in this tuple, if set
- sparse (bool) – Determines whether or not a sparse matrix is returned
Returns: A table (made of lists) of errors for the step or a sparse dictionary of entries
Return type: list of lists or dict of dicts of ints
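The relation between the sparse and dense forms can be illustrated with the tuple layout `(number of errors, site name, exit code)` used by the sparse step table. The sketch below is an assumption-laden illustration: the function name is hypothetical, and placing exit codes in rows and sites in columns is a guess, not something stated here.

```python
def dense_table(sparse, allmap):
    """Expand sparse (count, site, code) tuples into a 2-D list of ints.

    Assumption: rows follow allmap['errorcode'] and columns follow
    allmap['sitename'].
    """
    codes = allmap['errorcode']
    sites = allmap['sitename']
    table = [[0] * len(sites) for _ in codes]
    for numerrors, sitename, errorcode in sparse:
        table[codes.index(errorcode)][sites.index(sitename)] += numerrors
    return table


# Made-up values for illustration
example_allmap = {'errorcode': [84, 85], 'sitename': ['site_a', 'site_b']}
example_sparse = [(3, 'site_b', 84), (1, 'site_a', 85)]
print(dense_table(example_sparse, example_allmap))
```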
-
WorkflowWebTools.globalerrors.group_errors(input_errors, grouping_function, **kwargs)[source]¶ Takes inputs errors with the format:
{group1: {'errors': {group1_1: {group1_1_1: errors, group1_1_2: errors}}, group2: ...}}
and sums the errors into a larger group. This second grouping is done by the output of the grouping_function.
Parameters: - input_errors (dict) – The input that will be grouped
- grouping_function (function) – Takes an input, which is a key of
input_errors and groups those keys by this function output.
- kwargs – The keyword should point to a function. That keyword will be added to the dictionary of each group. Its value will be the function output with the group as an argument.
Returns: A dictionary with the same format as the input, but with groupings.
Return type: defaultdict
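The grouping described above can be sketched as follows. This is a re-implementation for illustration under the stated format, not the module's source. The name `group_errors_sketch` and the example workflow names are hypothetical.

```python
from collections import defaultdict


def group_errors_sketch(input_errors, grouping_function, **kwargs):
    """Sum errors into larger groups chosen by grouping_function."""
    output = defaultdict(
        lambda: {'errors': defaultdict(lambda: defaultdict(int))})
    for subgroup, values in input_errors.items():
        group = grouping_function(subgroup)
        for row, columns in values['errors'].items():
            for col, count in columns.items():
                output[group]['errors'][row][col] += count
    for group in output:
        # Each keyword is stored with the function output for the group
        for keyword, function in kwargs.items():
            output[group][keyword] = function(group)
    return output


# Made-up input: two workflows that belong to the same campaign
example_errors = {
    '/camp1/wf_a': {'errors': {84: {'site_a': 2}}},
    '/camp1/wf_b': {'errors': {84: {'site_a': 1, 'site_b': 4}}},
}
grouped = group_errors_sketch(
    example_errors, lambda name: name.split('/')[1], label=str.upper)
print(dict(grouped['camp1']['errors'][84]), grouped['camp1']['label'])
```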
-
WorkflowWebTools.globalerrors.list_matching_pievars(pievar, row, col, session=None)[source]¶ Return an iterator of variables in pievar, and number of errors for a given rowname and colname
Parameters: - pievar (str) – The variable to return an iterator of
- row (str) – Name of the row to match
- col (str) – Name of the column to match
- session (cherrypy.Session) – stores the session information
Returns: List of tuples containing name of pievar and number of errors
Return type: list
-
WorkflowWebTools.globalerrors.see_workflow(workflow, session=None)[source]¶ Gathers the error information for a single workflow
Parameters: - workflow (str) – Name of the workflow to gather information for
- session (cherrypy.Session) – Stores the information for a session
Returns: Dictionary used to generate webpage for a requested workflow
Return type: dict
Workflow Info¶
Module containing and returning information about workflows.
| authors: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
class
WorkflowWebTools.workflowinfo.Info[source]¶ Implements shared operations on the cache
-
class
WorkflowWebTools.workflowinfo.PrepIDInfo(prep_id, url='cmsweb.cern.ch')[source]¶ A class that just holds a small amount of information about a given PrepID.
-
get_requests(*args, **kwargs)[source]¶ Returns: The requests for the Prep ID from ReqMgr2 API Return type: dict
-
-
class
WorkflowWebTools.workflowinfo.WorkflowInfo(workflow, url='cmsweb.cern.ch')[source]¶ Class that holds methods for accessing various information about a workflow.
-
get_errors(*args, **kwargs)[source]¶ A wrapper for errors_for_workflow() if you happen to have a WorkflowInfo object already.
Parameters: get_unreported (bool) – Get the unreported errors from ACDC server
Returns: a dictionary containing error codes in the following format: {step: {errorcode: {site: number_errors}}}
Return type: dict
-
get_explanation(errorcode, step='')[source]¶ Gets a list of error logs for a given error code.
Parameters: - errorcode (str) – The error code to explain
- step (str) – The full name of the step to return explanations from
Returns: list of error logs
Return type: list
-
get_recovery_info(*args, **kwargs)[source]¶ Get the recovery info for this workflow.
Returns: a dictionary containing the information used in recovery. The keys in this dictionary are arranged like the following: { task: { 'sites_to_run': list(sites), 'missing_to_run': int() } }
Return type: dict
-
get_workflow_parameters(*args, **kwargs)[source]¶ Get the workflow parameters from ReqMgr2, or returns a cached value. See the ReqMgr 2 wiki for more details.
Returns: Parameters for the workflow from ReqMgr2. Return type: dict
-
-
WorkflowWebTools.workflowinfo.cached_json(attribute, timeout=None)[source]¶ A decorator for caching dictionaries in local files.
Parameters: - attribute (str) – The key of the WorkflowInfo cache to set using the decorated function.
- timeout (int) – The amount of time before refreshing the JSON file, in seconds.
Returns: Function decorator
Return type: func
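For readers unfamiliar with file-backed caching decorators, here is a minimal sketch of the general idea. It is an assumption-laden stand-in, not the code behind cached_json(): the name cached_json_sketch, the cache_dir argument, and the file naming are hypothetical, and the real decorator stores its results in the WorkflowInfo cache.

```python
import json
import os
import tempfile
import time
from functools import wraps


def cached_json_sketch(attribute, timeout=None, cache_dir=None):
    """Cache a function's dictionary result in a local JSON file."""
    cache_dir = cache_dir or tempfile.gettempdir()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            file_name = os.path.join(cache_dir, '%s.cache.json' % attribute)
            if os.path.exists(file_name):
                age = time.time() - os.path.getmtime(file_name)
                # Reuse the file unless it is older than the timeout
                if timeout is None or age < timeout:
                    with open(file_name) as cache_file:
                        return json.load(cache_file)
            result = func(*args, **kwargs)
            with open(file_name, 'w') as cache_file:
                json.dump(result, cache_file)
            return result
        return wrapper
    return decorator


calls = []
demo_dir = tempfile.mkdtemp()


@cached_json_sketch('demo_params', timeout=60, cache_dir=demo_dir)
def fetch_params():
    calls.append(1)  # Stand-in for a slow remote request
    return {'example_key': 'example_value'}


first = fetch_params()
second = fetch_params()  # Served from the JSON file, not the function
```

The second call never reaches the decorated function, which is the behavior that makes a timeout in seconds useful for slow remote lookups.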
-
WorkflowWebTools.workflowinfo.errors_for_workflow(workflow, url='cmsweb.cern.ch')[source]¶ Get the useful status information from a workflow
Parameters: - workflow (str) – the name of the workflow request
- url (str) – the base url to find the information at
Returns: a dictionary containing error codes in the following format:
{step: {errorcode: {site: number_errors}}}
Return type: dict
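To make the nested return format concrete, the following sketch walks a dictionary shaped like {step: {errorcode: {site: number_errors}}} and totals the errors for each error code. The helper name total_by_errorcode and the sample step, site, and error code values are made up for illustration.

```python
from collections import Counter


def total_by_errorcode(errors):
    """Add up the errors for each error code over all steps and sites."""
    totals = Counter()
    for codes in errors.values():
        for errorcode, sites in codes.items():
            totals[errorcode] += sum(sites.values())
    return totals


# Sample data in the documented format; every value is hypothetical
sample = {
    '/wf/StepOne': {'8021': {'site_a': 3, 'site_b': 1}},
    '/wf/StepOne/StepTwo': {
        '8021': {'site_b': 2},
        '50660': {'site_a': 5},
    },
}
totals = total_by_errorcode(sample)
most_common_code, count = totals.most_common(1)[0]
```

The same traversal can be rearranged to total by step or by site instead, since each level of the dictionary is just another loop.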
-
WorkflowWebTools.workflowinfo.explain_errors(workflow, errorcode)[source]¶ Get example errors for a given workflow and errorcode
Parameters: - workflow (str) – is the workflow name
- errorcode (str) – is the error code
Returns: a dict of log snippets from different sites.
Return type: list
-
WorkflowWebTools.workflowinfo.list_workflows(status)[source]¶ Get the list of workflows currently in a given status. For a list of valid requests, visit the Request Manager Interface.
Parameters: status (str) – The status of the workflow lists being looked for Returns: A list of workflows matching the status Return type: list
Workflow Clustering¶
Tools for clustering workflows based on their errors.
The errors are clustered based on a generated vector. The vector can be thought of as lying on two different hyper-sphere shells. One sphere is for the error codes that occur and the other sphere is for the sites where those errors occur. The overall distance between two workflows will be the sum in quadrature of the distances on these two hyper-spheres.
These distances are configurable in the server config.yml.
Each sphere has a center radius, thickness, and number of errors to
be at the midpoint.
The equation to determine distance from the origin for a vector
of errors is the following.
The following parameters are set in config.yml for the
site name and the error code hyperspheres separately.
- m is the ‘midpoint’ parameter: the number of errors at a given site or given error code that will place the workflow at the midpoint of the hypersphere shell.
- d is the ‘distance’ parameter: the cartesian distance between the midpoints of two different sites or error codes.
- w is the ‘width’ parameter which defines the total shell width that the workflow could possibly land on.
The direction is determined by the error code or site name distribution. Note that this always points in the upper quadrant for a given coordinate.
The equation is chosen so that two workflows that have completely different errors at the same site, and the error width is 0, will end up with a distance that is equal to the error distance when clustering. This way, site and errors can have different distance weights, and there can be some separation for the number of errors in a workflow (for non-zero width).
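The sum in quadrature mentioned above can be sketched as follows. This example assumes the error code and site vectors have already been placed on their respective shells; it does not reproduce the placement equation or read config.yml, and the function names are hypothetical.

```python
import math


def euclidean(vec_a, vec_b):
    """Cartesian distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def workflow_distance(codes_a, sites_a, codes_b, sites_b):
    """Combine the error code and site distances in quadrature."""
    code_dist = euclidean(codes_a, codes_b)
    site_dist = euclidean(sites_a, sites_b)
    return math.sqrt(code_dist ** 2 + site_dist ** 2)


# Two made-up workflows: 3 apart in error code space, 4 apart in site space
distance = workflow_distance([3.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 4.0])
```

Adding in quadrature is the same as taking the Euclidean distance of the concatenated vectors, which is why a clusterer that uses Euclidean distance can be fit on a single combined vector per workflow.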
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.clusterworkflows.CLUSTER_LOCK= <thread.lock object>¶ Lock that should be acquired before running clustering functions in here
-
WorkflowWebTools.clusterworkflows.get_clustered_group(workflow, clusterer, session=None)[source]¶ Get the group for a given workflow in this session
Parameters: - workflow (str) – The workflow to get the group for.
- clusterer (sklearn.cluster.KMeans) – is the clusterer fit with historic data.
- session (cherrypy.Session) – Stores the information for a session
Returns: List of other workflows in the same group
Return type: set
-
WorkflowWebTools.clusterworkflows.get_clusterer(history_path, errors_path='')[source]¶ Use this function to get the clusterer of workflows
Parameters: - history_path (str) – Path to the workflow historical data. This can be a local file path or a URL.
- errors_path (str) – The errors for a given session to include in the clustering
Returns: A dict of a clusterer that is fitted to historical data with its allmap. The keys are ‘clusterer’ and ‘allmap’.
Return type: dict
-
WorkflowWebTools.clusterworkflows.get_workflow_groups(clusterer, session=None)[source]¶ Groups workflows together based on a fitted clusterer
Parameters: - clusterer (dict) – is a dictionary with the clusterer fit with historic data and the allmap to generate it. This matches the output of get_clusterer().
- session (cherrypy.Session) – Stores the information for a session
Returns: A dictionary pointing workflows to a group
Return type: dict
-
WorkflowWebTools.clusterworkflows.get_workflow_vectors(workflows, session=None, allmap=None)[source]¶ Gets the errors for workflows in a list of numpy arrays
Parameters: - workflows (str) – the workflows that vectors are returned for
- session (cherrypy.Session) – Stores the information for a session
- allmap (dict) – a globalerrors.ErrorInfo allmap to override the session’s allmap
Returns: a list of numpy arrays of errors for the workflow
Return type: list of numpy.array
Manage Users¶
Module to manage users of WorkflowWebTools.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.manageusers.add_user(email, username, password, url)[source]¶ Adds the user to the users database and sends a verification email, if the parameters are valid.
Parameters: - email (str) – The user email to send verification to. Make sure to check for valid domains.
- username (str) – The username of the account to add
- password (str) – The password to be stored
- url (str) – The base url to redirect the user to for validating their account
Returns: Success code. 0 for adding user, 1 for not adding
Return type: int
-
WorkflowWebTools.manageusers.confirmation(code, lookup='validator', return_curs=False)[source]¶ Determine the user from the confirmation code and activate the account
Parameters: - code (str) – the confirmation code for the user
- lookup (str) – The field to look up the username with
- return_curs (bool) – If false, closes the connection
Returns: the user name if valid code, or ‘’ if not. Followed by conn, curs if return_curs is True
Return type: str [, sqlite3.Connection, sqlite3.Cursor]
-
WorkflowWebTools.manageusers.do_salt_hash(to_hash)[source]¶ Salt and hash an object into a storable form.
Parameters: to_hash (str) – Alternate salt and hashing to get stored variable Returns: A salty hash Return type: str
-
WorkflowWebTools.manageusers.get_user_db()[source]¶ Gets the users database in the local directory.
Returns: the users connection, cursor Return type: sqlite3.Connection, sqlite3.Cursor
-
WorkflowWebTools.manageusers.resetpassword(code, password)[source]¶ Resets the password for a user.
Parameters: - code (str) – is the validation code for the account
- password (str) – is the new password
Returns: user from the password reset
Return type: str
-
WorkflowWebTools.manageusers.send_reset_email(email, url)[source]¶ Generates a confirmation code for a given account and sends a reset email.
Parameters: - email (str) – the email linked to the account
- url (str) – the url of the running instance
-
WorkflowWebTools.manageusers.validate_password(_, username, password)[source]¶ Verifies users’ logon attempts. This is called automatically by cherrypy, hence the unused parameter.
Parameters: - _ (str) – The hostname
- username (str) – The attempted username
- password (str) – The attempted password
Returns: Whether or not the login attempt succeeds
Return type: bool
Reasons Manipulation¶
Module for manipulating the database that stores past operator reasons.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.reasonsmanip.get_reasons()[source]¶ Gets the reasons database in the local directory.
Returns: the reasons connection, cursor Return type: (sqlite3.Connection, sqlite3.Cursor)
-
WorkflowWebTools.reasonsmanip.reasons_list()[source]¶ Get the full list of reasons
Returns: all of the reasons in a dictionary with the short reasons being the key Return type: dict
-
WorkflowWebTools.reasonsmanip.short_reasons_list()[source]¶ Get the list of short reasons
Returns: the list of short reasons Return type: list of strs
-
WorkflowWebTools.reasonsmanip.update_reasons(reasons)[source]¶ Gets the reasons for a given action and updates the reasons db
Parameters: reasons (list of dicts) – is a list of dictionaries of short and long reasons, with keys ‘short’ and ‘long’
Raises: - TypeError – if the parameter is not a list
- KeyError – if the dictionaries in the list do not have the correct structure
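The input checks implied by the exceptions above might look like the sketch below. It only validates the structure and does not touch the reasons database; check_reasons is a hypothetical helper, not part of reasonsmanip.

```python
def check_reasons(reasons):
    """Validate a list of reason dictionaries and return (short, long) pairs."""
    if not isinstance(reasons, list):
        raise TypeError('reasons must be a list of dictionaries')
    checked = []
    for reason in reasons:
        # Indexing raises KeyError when 'short' or 'long' is missing
        checked.append((reason['short'], reason['long']))
    return checked


pairs = check_reasons([
    {'short': 'site down', 'long': 'The site was in downtime during the run.'},
])
```

Passing a single dictionary instead of a list raises TypeError, and a dictionary without both keys raises KeyError, matching the documented failure modes.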
Manage Actions¶
Module to manage actions of WorkflowWebTools.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.manageactions.extract_reasons_params(action, **kwargs)[source]¶ Extracts the reasons and parameters for an action from kwargs
Parameters: - action (str) – The action being asked for by the operator
- kwargs – Keywords that are submitted via POST to /submitaction
Returns: reasons for action, and parameters for the action
Return type: list, dict
-
WorkflowWebTools.manageactions.fix_sites(**kwargs)[source]¶ Fix the site lists for tasks that had zero sites.
Parameters: kwargs – Keywords to be parsed by extract_reasons_params()
-
WorkflowWebTools.manageactions.get_acted_workflows(num_days)[source]¶ Get all of the workflows that have actions assigned
Parameters: num_days (int) – is the number of past days to check for actions. This speeds up the check while not losing the ability to look farther back in time. Returns: a list of workflows acted on Return type: list
-
WorkflowWebTools.manageactions.get_actions(num_days=None, num_hours=24, acted=0)[source]¶ Get the recent actions to be acted on in dictionary form
Parameters: - num_days (int) – is the number of days to check for actions
- num_hours (int) – can be used instead of num_days for finer granularity. If num_days is given, num_hours is ignored.
- acted (int) – Flag (0, 1, or None) for whether to use acted or not
Returns: A dictionary of actions, to be rendered as JSON
Return type: dict
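The precedence between num_days and num_hours can be illustrated with a small sketch. The helper window_start is hypothetical and only computes the start of the time window; it does not query the actions collection.

```python
import datetime


def window_start(now, num_days=None, num_hours=24):
    """Return the earliest time to include when looking for actions."""
    if num_days is not None:
        # When num_days is given, num_hours is ignored
        return now - datetime.timedelta(days=num_days)
    return now - datetime.timedelta(hours=num_hours)


now = datetime.datetime(2020, 1, 10, 12, 0)
default_start = window_start(now)
two_day_start = window_start(now, num_days=2, num_hours=1)
```

With the defaults, the window covers the previous 24 hours; passing num_days=2 widens it to two days regardless of num_hours.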
-
WorkflowWebTools.manageactions.get_actions_collection()[source]¶ Gets the actions collection from MongoDB.
Returns: the actions collection Return type: pymongo.collection.Collection
-
WorkflowWebTools.manageactions.get_datetime_submitted(workflow)[source]¶ Get the datetime for a submitted workflow
Parameters: workflow (str) – A workflow to get the time submitted Returns: A datetime object, or None if the workflow has never been submitted Return type: datetime.datetime or None
-
WorkflowWebTools.manageactions.report_actions(workflows, output=None)[source]¶ Mark actions as acted on
Parameters: - workflows (list) – is the list of workflows to no longer show
- output (dict) – If set, a reference to a dictionary to update with details of what was and wasn’t set to acted
-
WorkflowWebTools.manageactions.submitaction(user, workflows, action, session=None, **kwargs)[source]¶ Writes the action to Unified and notifies the user that this happened
Parameters: - user (str) – is the user that submitted the action
- workflows (str) – is the original workflow name or workflows
- action (str) – is the suggested action for Unified to take
- session (cherrypy.Session) – the current session
- kwargs – can include various reasons and additional datasets
Returns: a tuple of workflows, reasons, and params for the action
Return type: list, str, list of dicts, dict
Show Log¶
Generates content for the showlog webpage
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.showlog.give_logs(query, module='', limit=50)[source]¶ Takes output from CMSToolBox.elasticsearch.search_logs() and organizes it to pass to a mako template
Parameters: - query (str) – The query to pass to CMSToolBox.elasticsearch.search_logs()
- module (str) – The module to show
- limit (int) – The max number of logs to show on a single page
Returns: dictionary with [‘logs’], which is a list of logs, and [‘meta’], which shows meta information of newest query result
Return type: dict
Classifying Errors¶
This module reads the explain error info for a session and tries to
classify them in a way that can help recommend procedures to the operator.
These procedures are gathered from WorkflowWebTools.procedures.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.classifyerrors.classifyerror(errorcode, workflow, session=None)[source]¶ Return the most relevant characteristics of an error code for this session. This will include things like:
- Files not opening
- StageOut errors
Note
More error types should be added to this function as needed
Parameters: - errorcode (int) – The error code that we want to classify
- workflow (str) – the workflow that we want to get the errors from
- session (cherrypy.Session) – Is the user’s cherrypy session
Returns: A tuple of strings describing the key characteristics of the errorcode. These strings are good for printing directly in web browsers. The first string is the types of errors reported with this error code. The second string is the recommended normal actions. The third string contains special parameters useful for additional actions.
Return type: str, str, str
-
WorkflowWebTools.classifyerrors.get_max_errorcode(workflow, session=None)[source]¶ Get the errorcode with the most errors for a session
Parameters: - workflow (str) – Is the primary name of the workflow
- session (cherrypy.Session) – Is the user’s cherrypy session
Returns: The error code that appears most often for this workflow
Return type: int
List Workflows At Site¶
This module provides lists of errors per pie variable
| author: | Daniel Abercrombie |
|---|
-
WorkflowWebTools.listpage.listworkflows(error_code, site_name, workflow, session=None)[source]¶ Gives back a list of tuples containing pie variables and the number of errors that matches a given error_code and site_name.
Parameters: - error_code (int) – The error code that we want errors for
- site_name (str) – The site name that we want errors for
- workflow (str) – The workflow that we want errors for
- session (cherrypy.Session) – holds the session information
Returns: List of tuples of pie variable and errors, sorted by number of errors
Return type: list
Machine Learning Functions¶
The actionshistorylink module creates files that link
the action and the errors thrown for each task where the action and history
are both stored on the server.
The config.yml file is read to determine the locations of these history databases.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.actionshistorylink.dump_json(file_name=None)[source]¶ Dumps a list of pairs into a file and returns the dictionary. Each pair is a dictionary of errors and an action document. Each element in the list corresponds to a different subtask.
Parameters: file_name (str) – The location to place the json file, if set Returns: The errors and actions for each subtask Return type: dict
The paramsregression module uses neural net classifiers to predict parameters
for error handling workflows.
Note
Regression part is a work in progress. Need more data.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
WorkflowWebTools.paramsregression.convert_to_dense(errors, keys=None, allerrors=None, allsites=None)[source]¶ Take a dictionary of sparse matrices, where the sparse matrices have keys of error code and then site names, and return the equivalent dense matrices.
Parameters: - errors (dict) – The container for sparse matrices.
- keys (list) – Keys for each of the matrices. Defaults to ‘good_sites’ and ‘bad_sites’
- allerrors (list) – An ordered list of all the errors to consider. If both this and allsites are blank, this function will just pull lists from the sparse matrices
- allsites (list) – An ordered list of all the sites to consider
Returns: Container for two dense matrices
Return type: dict of lists of lists
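The sparse-to-dense conversion for a single matrix can be sketched as below. This is an illustration under assumptions rather than the module's code: to_dense is a hypothetical helper, it handles one matrix instead of the ‘good_sites’ and ‘bad_sites’ pair, and it falls back to sorted keys when no ordering is supplied.

```python
def to_dense(sparse, allerrors=None, allsites=None):
    """Convert {errorcode: {site: count}} into a list of rows of counts."""
    # Pull the orderings from the sparse matrix when none are given
    allerrors = allerrors or sorted(sparse)
    allsites = allsites or sorted(
        {site for sites in sparse.values() for site in sites})
    return [
        [sparse.get(error, {}).get(site, 0) for site in allsites]
        for error in allerrors
    ]


# Made-up sparse matrix keyed by error code, then site name
sparse = {
    '8021': {'site_b': 2},
    '50660': {'site_a': 5, 'site_b': 1},
}
dense = to_dense(sparse)
padded = to_dense(sparse, allerrors=['8021', '99999'],
                  allsites=['site_a', 'site_b', 'site_c'])
```

Supplying allerrors and allsites explicitly keeps every dense matrix the same shape, which matters when the matrices are later fed to a classifier with a fixed input size.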
-
WorkflowWebTools.paramsregression.get_classifier(raw_data, parameter, **kwargs)[source]¶ Fit a classifier. If the module is run as a script, just print the training and test data output. Otherwise, return the classifier for further use.
Parameters: - raw_data (dict) – Raw data in the form of output from actionshistorylink.dump_json().
- parameter (str) – The parameter to classify.
- kwargs – These are kwargs for the sklearn.neural_network.MLPClassifier that is running underneath.
Returns: Trained classifier model
Return type: sklearn.neural_network.MLPClassifier
JavaScript and Mako User Interface¶
Some documentation of the JavaScript used on the webpages is given below.
addreason.js¶
This file contains the functions used to add reasons to the Detailed Workflow View.
The fields added by this script are handled by the WorkflowWebTools.reasonsmanip module.
Todo
This is some of the messiest code in the package. Need to implement some JSLint check.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
count¶ Keeps track of the number of reasons on a given page.
-
getDefaultReason(num)¶ This function is used to generate a new reason on the Detailed Workflow View. Each name and ID is unique so that the short and long reasons can be matched.
Arguments: - num (int) – is a unique number for each reason submitted on the current page.
Returns: the innerHTML for a reason div when a new one is generated.
Return type: str
-
addReason()¶ Creates a new reason division and fills it with the result from getDefaultReason(). It increments count. Finally, the division is appended to the “reasons” division.
-
removeReason(childId)¶ Removes a reason from the “reasons” division.
Arguments: - childId (str) – is the full ID of the child to remove.
-
fillLongReason(number)¶ If a short reason is selected from the select field, this function displays the long reason.
Arguments: - number (int) – is the ID number of the reason to fill.
-
checkReason(num)¶ Checks if a short reason entered by the user is already taken or not. If it is taken, the function gives a warning message and displays the corresponding long reason.
-
expand_children(arrow, this_level, this_name, only_hide)¶ This function expands or collapses the rows underneath a given header. The collapsing happens recursively.
Arguments: - this_level – Tells what level the row expansion is on. (0, 1, or 2, for example)
- this_name – Give the name of the row doing the expansion.
- only_hide – Set this to true for recursive collapsing. Prevents expansion when not desired.
makeparams.js¶
This file contains the functions used to add parameters to the page when different actions are selected.
The fields added by this script are handled by the WorkflowWebTools.manageactions module.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
taskTable(opts, texts, taskNumber)¶
Creates a “table” for parameters to be filled in and returns the <div> these are contained in.
Arguments: - opts (dict) – Each key is the name of the parameter. Each value is the list of possible values for this parameter.
- texts (list) – A list of parameters that should be set by a text field.
- taskNumber (int) – Separates the different parameters for each task in ACDC.
Returns: The <div> element that contains the table
-
printSiteLists(method)¶ Sets the contents of the site tables under the “acdc” action based on the method of site selection subsequently selected.
-
makeParamTable(action)¶ Offers the various options that are available for a given action.
checkuser.js¶
JavaScript functions that are used when registering a new user or resetting a password. The password check is not done by any other part of the system, but all other checks are performed by the Python backend on submission for security purposes.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
checkPassword()¶ Checks input fields with the ids #firstword and #secondword. If they match, sets the innerHTML of the div #confirmresult with a green positive message. Otherwise the div is filled with a red warning message.
-
validateForm()¶ Checks the form for registering a new user. The requirements for the form values are given in Creating a New Account.
piechart.js¶
Contains the drawPieCharts function for the global errors page.
| author: | Daniel Abercrombie <dabercro@mit.edu> |
|---|
-
drawPies()¶ This function finds multiple canvases and draws pie charts onto them. The number of errors for each canvas is given by a JSON string hidden in the table with id ‘errortable’. The piecharts are split with the largest contributor always having the same color. The radius of the piechart is a fraction of the canvas size, determined by the following equation:
\[\frac{r}{r_{canvas}} = \frac{n_{errors}}{n_{errors} + 10}\]
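To get a feel for how the radius grows with the number of errors, the equation above can be evaluated directly. The sketch is written in Python for consistency with the rest of this documentation, although the page itself does this in JavaScript; pie_radius and canvas_radius are hypothetical names, with canvas_radius standing for the largest radius the canvas allows.

```python
def pie_radius(n_errors, canvas_radius):
    """Scale the pie radius by n_errors / (n_errors + 10)."""
    return canvas_radius * n_errors / (n_errors + 10.0)


small = pie_radius(10, 50.0)   # Half of the canvas radius
large = pie_radius(90, 50.0)   # Nine tenths of the canvas radius
```

A cell with 10 errors fills half of the available radius, and the pie approaches, but never reaches, the full canvas radius as the error count grows.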