Site Admin Toolkit
This package contains useful tools for Site Admins to maintain their site. These are generally scripts designed to be run from the directory in which they are stored, after some site-specific configuration.
Note
This documentation is new, so it does not describe all of the tools in the SiteAdminToolkit repository.
Anything that is frequently used should be added to the repository README to be useful to future admins.
Installation
The repository is maintained and tested assuming that admins are using Python 2.6. Some tools will not work with earlier versions of Python. Since the contents of this repository are currently scripts that do not depend on the centralized CMSToolBox modules, installation only requires cloning the repository:
git clone https://github.com/CMSCompOps/SiteAdminToolkit.git
However, if you wish to run the test suites at your site,
you should install SiteAdminToolkit through the OpsSpace installer.
Unmerged Cleaner
The Unmerged Cleaner tool is used to clean unprotected files from a site’s unmerged directory.
This is usually the LFN /store/unmerged.
Other possibilities exist, such as /store/dcachetests/unmerged,
so this can be configured between separate runs, as described under Configuration.
There are three main steps to the deletion, which are also described in more detail later:
- Create a configuration file. A default configuration file is created the very first time you run unmerged-cleaner/ListDeletable.py. The script tries to import a config.py. If it fails, config.py is generated. Make sure you do not have some existing config module that your Python installation will load instead. Then edit the variables described under Configuration.
- List the directories or files (configurable) inside the unmerged directory that can be deleted. Directories that can be considered for deletion are ones that are not listed as protected by Unified, are not too new (configurable), and are not in the logs or SAM directories (configurable). An optimized tool, ListDeletable.py, has been created to do this step.
- Delete the directories. After creating a list of directories or files that can be deleted, it is the admin’s responsibility to remove them. Available tools are described under Deletion Tools. Feel free to contribute scripts used for your own site by creating a pull request at the CMSCompOps repository.
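The three criteria in the listing step can be sketched as a single predicate. This is only an illustration, not the code in ListDeletable.py: the function name is_deletable, its arguments, the use of paths relative to the unmerged area, and the way a protected path is compared are all assumptions made for the example.

```python
import time

# Assumed values mirroring the documented defaults; not read from a real config.py.
DIRS_TO_AVOID = ['SAM', 'logs']
MIN_AGE = 14 * 24 * 60 * 60  # two weeks, in seconds


def is_deletable(rel_dir, mtime, protected, now=None):
    """Return True if a directory passes all three checks (hypothetical helper).

    rel_dir   -- directory path relative to the unmerged area, e.g. 'data/Run1'
    mtime     -- modification time in seconds since the epoch
    protected -- relative paths that must be kept (assumed format)
    """
    if now is None:
        now = time.time()
    # 1. Skip anything under an avoided top-level directory.
    if rel_dir.split('/')[0] in DIRS_TO_AVOID:
        return False
    # 2. Skip directories that are too new.
    if now - mtime < MIN_AGE:
        return False
    # 3. Skip directories that are, contain, or sit inside a protected path.
    for prot in protected:
        if (prot == rel_dir or prot.startswith(rel_dir + '/')
                or rel_dir.startswith(prot + '/')):
            return False
    return True
```

Treating a directory as protected when it merely contains a protected path is a deliberately conservative choice for this sketch; the real tool may resolve that case differently.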
ListDeletable.py
This script is located at SiteAdminToolkit/unmerged-cleaner/ListDeletable.py.
After cloning the repository, it can immediately be run by:
cd SiteAdminToolkit/unmerged-cleaner
./ListDeletable.py
It depends on the ConfigTools.py module too, so it must be run from
the unmerged-cleaner directory or the directory must be added to your $PYTHONPATH.
It is used to list directories that can be removed and stores this list in a simple text file.
The directories listed are in PFN format.
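Because the script imports a module named config, it can be worth checking which file Python would actually load under that name before the first run. The helper below is a small sketch for that check; module_origin is a hypothetical name and is not part of the toolkit. Note that it imports the module, which executes its top-level code, just as the script itself would.

```python
def module_origin(name):
    """Return the file a module is loaded from.

    Returns None if the module cannot be imported or has no file
    (for example, a built-in module).
    """
    try:
        module = __import__(name)
    except ImportError:
        return None
    return getattr(module, '__file__', None)
```

Calling module_origin('config') from the unmerged-cleaner directory before the first run returns None when no other config module is on the import path. A path outside unmerged-cleaner means some other config module would be picked up instead.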
Configuration
The first time you run ListDeletable.py, the file config.py
will be generated by the Config Tools Module.
This will attempt to determine the site it is being run at through the hostname of the node
and set SITE_NAME accordingly.
However, you should check the default values since there are other values that can be changed.
In particular, the STORAGE_TYPE may affect whether or not the script runs optimally at the site.
The various configuration options inside config.py are listed below.
- SITE_NAME - This is the site where the script is run. The only thing this affects is the LFN to PFN translation of the unmerged directory, which can be overridden directly using UNMERGED_DIR_LOCATION.
- LFN_TO_CLEAN - The Unmerged Cleaner tool cleans the directory matching this LFN. On most sites, this will not need to be changed, but it is possible for a /store/dcachetests/unmerged directory to exist, for example. The default is '/store/unmerged'.
- UNMERGED_DIR_LOCATION - The location, or PFN, of the unmerged directory. This can be retrieved from Phedex (default) or given explicitly.
- WHICH_LIST - Determines whether a list of directories or files will be generated. These lists will be in PFN format. Possible values are 'directories' or 'files'. The default is 'directories'.
- DELETION_FILE - The list of directory or file PFNs to delete is placed in this file. The default is '/tmp/<WHICH_LIST>_to_delete.txt'.
- SLEEP_TIME - This is the number of seconds between each deletion of a directory or file. The sleep avoids overloading the system and allows the operator to interrupt a deletion. The default is 0.5.
- DIRS_TO_AVOID - The directories in this list are left alone. Only the top level of directories within the unmerged location is checked against this if WHICH_LIST is 'directories'. The defaults are ['SAM', 'logs'].
- MIN_AGE - Directories with an age less than this, in seconds, will not be deleted. The default (1209600) corresponds to two weeks. Mathematical expressions here are evaluated.
- STORAGE_TYPE - This defines the storage type of the site. This may be necessary for the script to run correctly or optimally. Acceptable values are 'posix' and 'hadoop'. The default is 'posix'.
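For orientation, a config.py filled in with the documented defaults might look roughly like the sketch below. The generated file may be laid out differently. The SITE_NAME and UNMERGED_DIR_LOCATION values are placeholders invented for this example, and writing DELETION_FILE in terms of WHICH_LIST is an assumption about how the default is built.

```python
# Sketch of a config.py using the documented defaults.
# SITE_NAME and UNMERGED_DIR_LOCATION are hypothetical placeholder values.

SITE_NAME = 'T2_XX_Example'        # normally guessed from the hostname
LFN_TO_CLEAN = '/store/unmerged'   # LFN of the directory to clean

# Explicit PFN of the unmerged directory (placeholder path).
UNMERGED_DIR_LOCATION = '/example/pfn/store/unmerged'

WHICH_LIST = 'directories'         # or 'files'
DELETION_FILE = '/tmp/%s_to_delete.txt' % WHICH_LIST

SLEEP_TIME = 0.5                   # seconds between deletions
DIRS_TO_AVOID = ['SAM', 'logs']    # top-level directories left alone

# Two weeks in seconds, written as an expression (evaluates to 1209600).
MIN_AGE = 14 * 24 * 60 * 60

STORAGE_TYPE = 'posix'             # or 'hadoop'
```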
Running
After creating and checking the config.py, the ListDeletable.py script can be run again
to write a list of directory or file PFNs that can be removed.
We expect most site admins to have tools to correctly remove those directories.
However, available tools for removing directories or files in this list are given under
Deletion Tools.
Potential Optimization
This script was originally developed on a Hadoop system and unit tested on POSIX.
There are four different functions that interact with the file system which could potentially
be broken or unoptimized for other types of file systems.
These functions are list_folder(), get_file_size(),
do_delete(), and get_mtime().
Anyone who wants to contribute versions of these functions
optimized for a particular value of config.STORAGE_TYPE
(as it is referenced from within ListDeletable.py)
is welcome to open a pull request.
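As a starting point for such contributions, here is a minimal POSIX-only sketch of three of these functions, based on the signatures in the reference section. It is not the code in ListDeletable.py. The module-level STORAGE_TYPE stands in for config.STORAGE_TYPE, returning bare entry names from list_folder() is an assumption, and other storage types are left unimplemented.

```python
import os

STORAGE_TYPE = 'posix'  # stands in for config.STORAGE_TYPE in this sketch


def _require_posix():
    """Fail clearly for storage types this sketch does not cover."""
    if STORAGE_TYPE != 'posix':
        raise NotImplementedError('no sketch for storage type %r' % STORAGE_TYPE)


def list_folder(name, opt):
    """List subdirectories of name if opt is 'subdirs', otherwise other entries."""
    _require_posix()
    want_dirs = (opt == 'subdirs')
    return sorted(entry for entry in os.listdir(name)
                  if os.path.isdir(os.path.join(name, entry)) == want_dirs)


def get_file_size(name):
    """Return the size of a file in bytes."""
    _require_posix()
    return os.stat(name).st_size


def get_mtime(name):
    """Return the modification time as whole seconds since the epoch."""
    _require_posix()
    return int(os.stat(name).st_mtime)
```

A version for another storage type would replace the body after the _require_posix() check with calls suited to that file system, while keeping the same return types.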
Authors: Christoph Wissing <christoph.wissing@desy.de>, Max Goncharov <maxi@mit.edu>, Daniel Abercrombie <dabercro@mit.edu>
Deletion Tools
If you are going to use one of the available deletion tools on your site, it is recommended that you run the unmerged cleaner unit test.
test/test_unmerged_cleaner.py performs the unit tests for the Unmerged Cleaner.
The script takes two optional arguments for testing against the file system at your site.
- The first argument is the location of the directory to be tested. This should be a directory that does not exist, and it should be in a location managed by the filesystem to test.
- The second argument is the type of filesystem being tested. See STORAGE_TYPE under Configuration for the currently supported file system types.
Author: Daniel Abercrombie <dabercro@mit.edu>
Currently, there are two deletion tools available.
One is integrated into ListDeletable.py,
and the other is a Perl script that is used on Hadoop systems.
ListDeletable.py --delete
ListDeletable.py can be used to directly delete listed directories or files.
After creating and reviewing the list of directories or files to delete,
ListDeletable.py --delete reads your deletion file and removes the directories listed.
This is done via the ListDeletable.do_delete() function.
Please check the function documentation for any existing caveats.
This --delete flag will only work after the deletion file is created.
It is not possible to list and remove directories with an unmodified ListDeletable.py at the same time.
Any listed file or directory without /unmerged/ in the path will cause the script to quit.
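The safety check and pacing described above can be sketched as follows. This is not the body of ListDeletable.do_delete(): the names check_deletion_list and delete_all are hypothetical, the removal itself is left to a caller-supplied function, and validating the whole list before removing anything is an assumption about ordering. The sketch raises ValueError where the real script quits.

```python
import time

SLEEP_TIME = 0.5  # stands in for config.SLEEP_TIME in this sketch


def check_deletion_list(lines):
    """Return the non-empty paths, refusing any path without '/unmerged/'."""
    cleaned = [line.strip() for line in lines if line.strip()]
    for path in cleaned:
        if '/unmerged/' not in path:
            raise ValueError('suspicious path, refusing to continue: %s' % path)
    return cleaned


def delete_all(lines, remove, sleep_time=SLEEP_TIME):
    """Call remove(path) for each checked path, pausing between deletions."""
    for path in check_deletion_list(lines):
        remove(path)
        # The pause limits load and gives the operator time to interrupt.
        time.sleep(sleep_time)
```

Passing the lines of the deletion file together with a site-specific removal function keeps the sketch free of any file system calls of its own.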
HadoopDelete.pl
Warning
Currently untested. Serves as a template to be rewritten for a given site.
NAME
HadoopDelete.pl – Perform deletions on a Hadoop system with a directory list
DESCRIPTION
This script, located in SiteAdminToolkit/unmerged-cleaner/HadoopDelete.pl,
is used to delete files and directories inside of a list of directories.
In particular, it is used on the Hadoop site T2_US_MIT.
Before running it at your site, check that $file results in
the proper transformation between LFN and PFN.
To run the script, pass the location of the DELETION_FILE parameter
that is set in ListDeletable.py as the argument.
For example, if the directories to delete are listed in
/tmp/dirs_to_delete.txt, run this script as:
./HadoopDelete.pl /tmp/dirs_to_delete.txt
AUTHOR
Max Goncharov <maxi@mit.edu>
Unmerged Cleaner Reference
Here are documented members of the Unmerged Cleaner, mostly for people interested in adding some feature.
ListDeletable.py
This documentation for ListDeletable does not have all members
because the module docstring is used to generate ListDeletable.py.
This should be enough to get a contributor or advanced user started though.
- class ListDeletable.DataNode(path_name)
  An object that holds other DataNodes inside of it. If a single DataNode is removable, then all nodes under it are removable too. Removability is determined by the list of protected directories and the directory age.
- ListDeletable.do_delete()
  Does the deletion for a site based on the deletion file contents. If the deletion file does not exist, a message is printed to the user and the script exits.
Note
This can potentially be optimized for different filesystems.
Warning
For Hadoop sites: Currently, we assume your Hadoop instance is mounted at /mnt/hadoop and the checksums are located under /mnt/hadoop/cksums. If this is not the case, the cksums will not be deleted. Your LFN will still be properly propagated to delete the unmerged files themselves.
- ListDeletable.filter_protected(unmerged_files, protected)
  Lists unprotected files.
  Parameters:
  - unmerged_files (list) – the list of files to check and delete, if unprotected.
  - protected (list) – the list of protected LFNs.
  Raises: SuspiciousConditions – if the beginning of the file name does not match the configured location of /store/unmerged or if there is a partial match with a protected LFN.
- ListDeletable.get_file_size(name)
  Get the size of a file.
Note
This can potentially be optimized for different filesystems. There’s no real use for this function besides reporting back to the user, so go ahead and return 0 or something if you want to.
  Parameters: name (str) – Name of file
  Returns: File size, in bytes
  Return type: int
- ListDeletable.get_mtime(name)
  Get the modification time for a directory or file.
Note
This can potentially be optimized for different filesystems.
  Parameters: name (str) – Name of directory or file
  Returns: Modification time
  Return type: int
- ListDeletable.get_unmerged_files()
  Returns: the old files’ PFNs in the unmerged directory
  Return type: list
- ListDeletable.lfn_to_pfn(lfn)
  Parameters: lfn (str) – the LFN of a file
  Returns: the PFN
  Return type: str
- ListDeletable.list_folder(name, opt)
  Lists the directories or files in a parent directory.
Note
This can potentially be optimized for different filesystems.
  Parameters:
  - name (str) – the name of the directory to list.
  - opt (str) – determines what to list inside the directory. If ‘subdirs’, then only directories are listed. If any other value, only files inside directory name are listed.
  Returns: a list of directories or files in a directory.
  Return type: list
Config Tools Module
The unmerged cleaner also makes use of the local module defined in ConfigTools.py.
This module includes tools for identifying the location of the unmerged directory as well as generating the default configuration.
Author: Daniel Abercrombie <dabercro@mit.edu>
- ConfigTools.generate_default_config()
  This generates the file config.py, if it does not exist. Site admins should check this configuration and change to their desired values before running ListDeletable.py a second time.
- ConfigTools.get_default(key)
  Parameters: key (str) – This can be any of the keys in DEFAULTS in addition to SITE_NAME and UNMERGED_DIR_LOCATION
  Returns: a string to write to the default configuration module.
  Return type: str