Site Consistency

Compares files on site to files in the dynamo inventory. This tool requires dynamo and xrdfs to be installed separately.

Running the Tool

With an instance of dynamo installed, a simple consistency check on a site can be run as follows:

from dynamo_consistency import config, datatypes, getsitecontents, getinventorycontents

config.CONFIG_FILE = '/path/to/config.json'
site = 'T2_US_MIT'     # For example

inventory_listing = getinventorycontents.get_inventory_tree(site)
remote_listing = getsitecontents.get_site_tree(site)

datatypes.compare(inventory_listing, remote_listing, 'results')

In this example, the list of file LFNs in the inventory and not at the site will be in results_missing.txt. The list of file LFNs at the site and not in the inventory will be in results_orphan.txt.

The actual comparison done by the production instance of dynamo has a few more filters and steps, as outlined under Comparison Script.

Configuration

Create a configuration file before pointing config.CONFIG_FILE to it, as in the example above. The configuration file for Site Consistency is a JSON or YAML file with the following keys:

  • AccessMethod - A dictionary of access methods for sites. Sites default to XRootD, but setting a value of SRM causes the site to be listed by gfal-ls commands.
  • CacheLocation - The directory where all cached information is stored
  • DirectoryList - A list of directories inside of /store/ to check consistency.
  • GFALThreads - The number of threads used by the GFAL listers
  • GlobalRedirectors - The redirectors to start all locate calls from, unless looking for a site that is listed in the Redirectors configuration.
  • IgnoreAge - Ignore any files or directories with an age less than this, in days.
  • IgnoreDirectories - The check ignores any paths that contain any of the strings in this list.
  • InventoryAge - The age, in days, of how old the information from the inventory can be
  • ListAge - The age, in days, of how old the list of files directly from the site can be
  • LogLocation - The directory where all the logs are stored
  • MaxMissing - If more files than this number are missing, then there will be no automatic entry into the register
  • MaxOrphan - If more files than this number are orphan files at a site, then there will be no automatic entry into the register
  • NumThreads - The number of threads used by the XRootD listers
  • PathPrefix - A dictionary of prefixes to place before /store/ in the XRootD call. If the prefix is not set for a site, and it fails to list /store, it tries /cms/store (prefix '/cms') by default.
  • RedirectorAge - The age, in days, of how old the information on doors from redirectors can be. If this value is set to zero, the redirector information is never refreshed.
  • Redirectors - A dictionary with keys of sites with hard-coded redirector locations. If a site is not listed in this way, the redirector is found by matching domains from CMSToolBox.siteinfo.get_domain() to redirectors found in a generic xrdfs locate call.
  • Retries - Number of retries after timeouts to attempt
  • SaveCache - If set and evaluates to True, copies old cached directory trees instead of overwriting
  • Timeout - The amount of time, in seconds, that the listing may run on a single directory before it times out.
  • Unmerged - A list of sites to handle cleaning of /store/unmerged on. If the list is empty, all the sites are managed centrally
  • UnmergedLogsAge - The minimum age of the unmerged logs to be deleted, in days
  • UseLoadBalancer - A list of sites where the main redirector of the site is used
  • WebDir - The directory where text files and the sqlite3 database live
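As an illustration of how the IgnoreDirectories and IgnoreAge keys described above interact, here is a minimal sketch. The helper is_ignored and its arguments are hypothetical and are not part of dynamo_consistency; the sketch only assumes substring matching for IgnoreDirectories and an age threshold in days for IgnoreAge, as stated in the key descriptions.

```python
import time

SECONDS_PER_DAY = 24 * 3600


def is_ignored(path, mtime, ignore_directories, ignore_age_days, now=None):
    """Hypothetical helper: True if the check should skip this path."""
    now = time.time() if now is None else now
    # IgnoreDirectories: skip any path that contains one of the listed strings
    if any(pattern in path for pattern in ignore_directories):
        return True
    # IgnoreAge: skip anything younger than the given number of days
    return (now - mtime) < ignore_age_days * SECONDS_PER_DAY
```

With the production values ("/SAM" in IgnoreDirectories, IgnoreAge of 5), a path such as /store/mc/SAM/file.root would be skipped regardless of its age.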

Configuration parameters can also be quickly overwritten for a given run by setting an environment variable of the same name.
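The override behavior can be pictured with the sketch below. It is not the tool's implementation: apply_env_overrides is a hypothetical helper, and parsing each environment value as JSON (falling back to the raw string) is an assumption made here for illustration.

```python
import json
import os


def apply_env_overrides(config, environ=None):
    """Hypothetical helper: return a copy of config with overrides applied."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for key in config:
        if key in environ:
            raw = environ[key]
            try:
                # Assumption: values such as "0" or "[1, 2]" are read as JSON
                merged[key] = json.loads(raw)
            except ValueError:
                # Anything that is not valid JSON is kept as a plain string
                merged[key] = raw
    return merged
```

Under these assumptions, running with ListAge=0 in the environment would replace the ListAge value from the file for that run only, leaving the file itself untouched.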

Production Settings

The configuration in production is the following.

{
  "CacheLocation": "/local/dynamo/consistency/cache",
  "LogLocation": "/local/dynamo/consistency/logs",
  "WebDir": "/home/dynamo/consistency/web",
  "MaxMissing": 10000,
  "MaxOrphan": 10000,
  "Timeout": 300,
  "Retries": 3,
  "NumThreads": 2,
  "GFALThreads": 12,
  "InventoryAge": 0,
  "ListAge": 0,
  "IgnoreAge": 5,
  "RedirectorAge": 0,
  "Redirectors": {
    "T1_RU_JINR_Disk": "se-xrd01.jinr-t1.ru:1095",
    "T1_FR_CCIN2P3_Disk": "ccxrpli003.in2p3.fr:1095", 
    "T2_UK_London_Brunel": "dc2-grid-64.brunel.ac.uk",
    "T2_UK_London_IC":     "gfe02.grid.hep.ph.ic.ac.uk:1098",
    "T2_FR_IPHC": "sbgse1.in2p3.fr",
    "T2_CH_CERN": "eoscms.cern.ch:1094",
    "T3_CH_PSI": "t3se01.psi.ch",
    "T2_FR_CCIN2P3": "ccxrpli003.in2p3.fr:1094",
    "T2_FR_GRIF_LLR": "llrpp01.in2p3.fr:11001",
    "T2_FR_GRIF_IRFU": "node12.datagrid.cea.fr:11001",
    "T2_IT_Bari": "ss-01.recas.ba.infn.it:8080",
    "T2_IT_Rome": "cmsrm-xrootd02.roma1.infn.it:7070",
    "T2_PK_NCP" : "pcncp22.ncp.edu.pk:11001",
    "T2_RU_INR":  "grse001.inr.troitsk.ru:11000",
    "T2_RU_ITEP": "se3.itep.ru:1095",
    "T3_US_MIT": "t3serv006.mit.edu"
  },
  "PathPrefix": {
    "T2_UK_London_Brunel": "/cms",
    "T2_FR_IPHC": "/cms/phedex",
    "T2_KR_KISTI": "/pnfs/sdfarm.kr/data/cms"
  },
  "DirectoryList": [
    "mc",
    "data",
    "generator",
    "results",
    "hidata",
    "himc"
  ],
  "IgnoreDirectories": [
    "/SAM",
    "/HCTest",
    "/HCtest",
    "/GenericTTbar",
    "/EWKWPlus2Jets_WToLNu_M-50_14TeV-madgraph-pythia8",
    "/VBFToHHTo4B_SM_14TeV-madgraph-pythia8",
    "/WToLNu_1J_14TeV-madgraphMLM-pythia8",
    "/WZTo3LNu_1Jets_14TeV-madgraphMLM-pythia8",
    "/TTGamma_SingleLeptFromTbar_TuneCUETP8M1_14TeV-madgraph-pythia8",
    "/WGToLNuG_PtG-40_TuneCUETP8M1_14TeV-madgraphMLM-pythia8",
    "/TA_Tleptonic_kappa_aut_LO_madspin-pythia8",
    "/RAW"
  ],
  "AccessMethod": {
    "T1_UK_RAL_Disk": "SRM",
    "T2_ES_IFCA" : "SRM",
    "T2_FR_GRIF_LLR": "SRM",
    "T2_UK_SGrid_RALPP": "SRM",
    "T1_IT_CNAF_Disk": "SRM",
    "T1_US_FNAL_Disk": "directx"
  }, 
  "GlobalRedirectors": [
    "cms-xrd-global.cern.ch"
  ],
  "SaveCache": 1,
  "UseLoadBalancer": [
    "T2_US_Vanderbilt"
  ],
  "Unmerged": [
  "T1_DE_KIT_Disk",
  "T1_ES_PIC_Disk",
  "T1_FR_CCIN2P3_Disk",
  "T1_IT_CNAF_Disk",
  "T1_RU_JINR_Disk",
  "T1_UK_RAL_Disk",
  "xT1_US_FNAL_Disk",
  "T2_AT_Vienna",
  "T2_BE_IIHE",
  "T2_BE_UCL",
  "T2_BR_SPRACE",
  "T2_BR_UERJ",
  "T2_CH_CERN",
  "T2_CH_CSCS",
  "T2_CN_Beijing",
  "T2_DE_DESY",
  "T2_DE_RWTH",
  "T2_EE_Estonia",
  "T2_ES_CIEMAT",
  "T2_ES_IFCA",
  "T2_FI_HIP",
  "T2_FR_CCIN2P3",
  "T2_FR_GRIF_IRFU",
  "T2_FR_GRIF_LLR",
  "T2_FR_IPHC",
  "T2_GR_Ioannina",
  "T2_HU_Budapest",
  "T2_IN_TIFR",
  "T2_IT_Bari",
  "T2_IT_Legnaro",
  "T2_IT_Pisa",
  "T2_IT_Rome",
  "T2_KR_KISTI",
  "T2_KR_KNU",
  "T2_MY_UPM_BIRUNI",
  "T2_PK_NCP",
  "T2_PL_Swierk",
  "T2_PL_Warsaw",
  "T2_PT_NCG_Lisbon",
  "T2_RU_IHEP",
  "T2_RU_INR",
  "T2_RU_ITEP",
  "T2_RU_JINR",
  "T2_RU_PNPI",
  "T2_RU_SINP",
  "T2_TH_CUNSTDA",
  "T2_TR_METU",
  "T2_TW_NCHC",
  "T2_UA_KIPT",
  "T2_UK_London_Brunel",
  "T2_UK_London_IC",
  "T2_UK_SGrid_Bristol",
  "T2_UK_SGrid_RALPP",
  "T2_US_Caltech",
  "T2_US_Florida",
  "T2_US_MIT",
  "T2_US_Nebraska",
  "T2_US_Purdue",
  "T2_US_UCSD",
  "T2_US_Vanderbilt",
  "T2_US_Wisconsin"
  ],
  "UnmergedLogsAge": 60
}

Comparison Script

Note

The following script description was last updated on April 11, 2018.

The production script, located at dynamo_consistency/prod/compare.py at the time of writing, goes through the following steps for each site.

  1. Points config.py to the local consistency_config.json file
  2. Notes the time, and whether it is daylight saving time, for entry into the summary database
  3. Reads the list of previous missing files, since it requires a file to be missing on multiple runs before registering it to be copied
  4. It gathers the inventory tree by calling dynamo_consistency.getinventorycontents.get_db_listing().
  5. Creates a list of datasets to not report missing files in. This list consists of the following.
  6. It creates a list of datasets to not report orphans in. This list consists of the following.
    • Datasets that have any files on the site, as listed by the dynamo MySQL database
    • Deletion requests fetched from PhEDEx (same list as datasets to skip in missing)
    • Any datasets that have the status flag set to 'IGNORED' in the dynamo database
    • Merging datasets that are protected by Unified
  7. It gathers the site tree by calling dynamo_consistency.getsitecontents.get_site_tree(). The list of orphans is used during the running to filter out empty directories that are reported to the registry during the run.
  8. Does the comparison between the two trees made, using the configuration options listed under Configuration concerning file age.
  9. If the number of missing files is less than MaxMissing, the number of orphans is less than MaxOrphan, and the site is under the webpage’s “Debugged sites” tab, connects to a dynamo registry to report the following errors:
    • Every orphan file and every empty directory that is not too new nor should contain missing files is entered in the deletion queue.
    • For each missing file, every possible source site listed by the dynamo database (not counting the site where the file is missing) is entered in the transfer queue. Files that only exist elsewhere on tape are written to a separate text file.
  10. Creates a text file that contains the missing blocks and groups.
  11. .txt file lists and details of orphan and missing files are moved to the web space
  12. If the site is listed in the configuration under the Unmerged list, the unmerged cleaner is run over the site:
  13. The summary database is updated to show the last update on the website

author:Daniel Abercrombie <dabercro@mit.edu>
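Steps 3 and 9 of the list above can be sketched as follows. Both function names are hypothetical, and the sketch only encodes what the steps state: a file must be missing in more than one run before it is registered, and nothing is reported unless both counts are below their limits and the site is marked as debugged.

```python
def confirmed_missing(previous_missing, current_missing):
    """Hypothetical helper: keep files that were also missing in the last run."""
    previous = set(previous_missing)
    return [lfn for lfn in current_missing if lfn in previous]


def should_report(num_missing, num_orphan, config, is_debugged):
    """Hypothetical helper mirroring the three conditions of step 9."""
    return (num_missing < config['MaxMissing'] and
            num_orphan < config['MaxOrphan'] and
            bool(is_debugged))
```

With the production limits of 10000, a run that finds 12000 orphans would still write its text files but would make no entries in the registry.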

Automatic Site Selection

To automatically run prod/compare.py over a few well-deserving sites, use prod/run_checks.sh.

Usage

run_checks.sh <MAXNUMBER> <MATCH> [<FLAG>]

runs the Consistency Check for sites that match the name MATCH (using a MySQL “LIKE” expression), limited to MAXNUMBER. Sites that have not been run before will get priority. After that, priority is assigned by the sites that have gone the longest without getting a new summary entry in the summary webpage. Sites that are currently running are excluded.

If any value for FLAG is given, only sites with no files listed or more than 1000 missing or orphan files will be considered.
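The priority rules described above can be sketched in Python. This is an illustration only: select_sites and its inputs are hypothetical, and the real script works from the summary database through MySQL "LIKE" matching, which is not reproduced here.

```python
def select_sites(last_run, running, max_number):
    """Hypothetical helper that orders candidate sites by priority.

    last_run maps a site name to the timestamp of its last summary entry,
    or to None if the site has never been run.
    """
    # Sites that are currently running are excluded
    candidates = [site for site in last_run if site not in running]
    # Never-run sites first (False sorts before True), then oldest entry first
    candidates.sort(key=lambda site: (last_run[site] is not None,
                                      last_run[site] or 0,
                                      site))
    return candidates[:max_number]
```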

Examples

run_checks.sh 1 T2_US_MIT                           # If you want to run on a single site
ListAge=0 InventoryAge=0 run_checks.sh 1 T2_US_MIT  # To get a fresh cache, using environment variables to override configuration
run_checks.sh 10 T2_%                               # Run on 10 high priority tier-2 sites
run_checks.sh 2 T1_%_Disk                           # Start 2 tier-1 sites

Author

Daniel Abercrombie <dabercro@mit.edu>

Moving Sites To and From Debugged Tab

To mark sites as ready to be acted on, change the isgood value in the sites table in the summary database to 1. For example, if you are in the directory of your webpage, and want to mark T2_US_MIT as good, you could do the following:

echo "UPDATE sites SET isgood = 1 WHERE site = 'T2_US_MIT';" | sqlite3 stats.db

To mark a site as bad, set isgood to 0:

echo "UPDATE sites SET isgood = 0 WHERE site = 'T2_US_MIT';" | sqlite3 stats.db
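The same update can be made from Python with the standard sqlite3 module. The sketch below is a hedged equivalent of the shell commands above; set_site_status is a hypothetical helper, and only the sites table with its site and isgood columns is taken from this page.

```python
import sqlite3


def set_site_status(db_path, site, is_good):
    """Hypothetical helper: set isgood for one site, return rows changed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            cursor = conn.execute(
                'UPDATE sites SET isgood = ? WHERE site = ?',
                (1 if is_good else 0, site))
        return cursor.rowcount
    finally:
        conn.close()
```

A return value of 0 means that no row matched the site name, which is a quick way to catch a typo in the name.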

Checking PhEDEx for Dataset Presence

This simple script, located at dynamo_consistency/prod/check_phedex.py, uses the dynamo_consistency.checkphedex.check_for_datasets() check on orphan files. At the end of the check, datasets with more than one file at the site are printed again. This is for easy parsing by eye.

This script can be used as a check if orphan files are really not supposed to be at a site.

Note

Once we allow files to be deleted from a site that only has a partial dataset subscription (to be deleted when the files are in a different part of the dataset), then this script will be less useful.

author:Daniel Abercrombie <dabercro@mit.edu>

Manually Setting XRootD Doors

In addition to the Redirectors key in the configuration file, which sets the redirector for a site, there is also a mechanism for setting all the doors for a site. A list of possible doors can be found at <CacheLocation>/<SiteName>_redirector_list.txt. Any url in that list that matches the domain of the site will be used to make xrootd calls. To add or remove urls from this list, just add or remove lines from this file.

Note

If the RedirectorAge configuration parameter is not set to 0, then this redirector list will be overwritten once it becomes too old. To force the generation of a new list when the RedirectorAge is set to 0, simply delete the redirector list file for that site.

A list of redirectors found by the global redirectors is stored in <CacheLocation>/redirector_list.txt.
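Editing the per-site door list can also be scripted. The following sketch assumes the file holds one door URL per line, which matches the instruction above to add or remove lines; remove_door is a hypothetical helper, not a function of dynamo_consistency.

```python
import os


def remove_door(cache_location, site, door):
    """Hypothetical helper: drop one door from the site's redirector list."""
    list_file = os.path.join(cache_location, '%s_redirector_list.txt' % site)
    with open(list_file) as handle:
        doors = [line.strip() for line in handle if line.strip()]
    kept = [entry for entry in doors if entry != door]
    # Rewrite the file with one door per line
    with open(list_file, 'w') as handle:
        handle.write(''.join(entry + '\n' for entry in kept))
    return kept
```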

Reference

The following is a full reference to the submodules inside of the dynamo_consistency module.

checkphedex.py

A module that provides functions to check the comparison results against the list of files and deletions in PhEDEx.

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.checkphedex.check_for_datasets(site, orphan_list_file)

Checks PhEDEx exhaustively to see if a dataset should exist at a site, according to PhEDEx, but has files marked as orphans according to our check. This is done via the PhEDEx filereplicas API. The number of filereplicas for each dataset is printed to the terminal. Datasets that contain any filereplicas are returned by this function.

Parameters:
  • site (str) – The name of the site to check
  • orphan_list_file (list) – List of LFNs that are listed as orphans at the site
Returns:

A list of tuples, one for each dataset that is supposed to have at least one file at the site, giving the number of files and the dataset.

Return type:

list of tuples

dynamo_consistency.checkphedex.get_phedex_tree(site, callback=None, **kwargs)

Get the file list tree from PhEDEx. Uses the InventoryAge configuration to determine when to refresh cache.

Parameters:site (str) – The site to get information from PhEDEx for.
Returns:A tree containing file replicas that are supposed to be at the site
Return type:dynamo_consistency.datatypes.DirectoryInfo
dynamo_consistency.checkphedex.set_of_deletions(site)

Get a list of datasets with approved deletion requests at a given site that were created within the number of days matching the IgnoreAge configuration parameter. This request is done via the PhEDEx deleterequests API.

Parameters:site (str) – The site that we want the list of deletion requests for.
Returns:Datasets that are in deletion requests
Return type:set

config.py

Small module to get information from the config.

Warning

Must be used on a machine with xrdfs installed (for locate command).

author:Daniel Abercrombie <dabercro@mit.edu>
dynamo_consistency.config.CONFIG_FILE = 'consistency_config.json'

The string giving the location of the configuration JSON file. Generally, you want to set this value of the module before calling config_dict() to get your configuration.

dynamo_consistency.config.DIRECTORYLIST = None

If this is set to a list of directories, it overrides the DirectoryList set in the configuration file. This prevents the tool from attempting to list directories that are not there.

dynamo_consistency.config.LOADER = <module 'json' from '/usr/lib/python2.7/json/__init__.pyc'>

A module that uses the load function on a file descriptor to return a dictionary. (Examples are the json and yaml modules.) If your CONFIG_FILE is not a JSON file, you’ll want to change this also before calling config_dict().
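The loader interface described above, something that exposes a load function taking an open file, can be illustrated without the package itself. In this sketch read_config stands in for the way the configuration is read and CommentedJSONLoader is a made-up loader; both are hypothetical, and whether dynamo_consistency accepts an arbitrary object rather than a module as LOADER is an assumption.

```python
import io
import json


class CommentedJSONLoader(object):
    """Hypothetical loader: JSON that tolerates full-line '#' comments."""

    @staticmethod
    def load(file_obj):
        lines = [line for line in file_obj
                 if not line.lstrip().startswith('#')]
        return json.loads(''.join(lines))


def read_config(file_obj, loader=json):
    """Stand-in for the configuration reader: delegate to loader.load()."""
    return loader.load(file_obj)
```

Because only load is called, the json module and the custom class are interchangeable in read_config.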

dynamo_consistency.config.config_dict(make_dir=True)
Parameters:

make_dir (bool) – Create the cache directory if it’s missing

Returns:

the configuration file in a dictionary

Return type:

dict

Raises:
  • IOError – when it cannot find the configuration file
  • KeyError – when the CacheLocation key is not set in the configuration file
dynamo_consistency.config.get_redirector(site, banned_doors=None)

Get the redirector and xrootd door servers for a given site. An example valid site name is T2_US_MIT.

Parameters:
  • site (str) – The site we want to contact
  • banned_doors (list) – Give a list of doors to not return. These are usually ones that just timed out.
Returns:

Public hostname of the local redirector and a list of xrootd door servers

Return type:

str, list

dynamo_consistency.config.locate_file(file_name, redirs=None)

Find servers for a file.

Parameters:
  • file_name (str) – Name of the file to locate
  • redirs (list) – Global redirectors to start from. If blank, gets them from the configuration file.
Returns:

List of hostnames that hold the file

Return type:

list

datatypes.py

Module defines the datatypes that are used for storage and comparison. There is also a powerful create_dirinfo function that takes a filler function or object and uses the multiprocessing module to recursively list directories in parallel.

author:Daniel Abercrombie <dabercro@mit.edu>
exception dynamo_consistency.datatypes.BadPath

An exception for throwing when the path doesn’t make sense for various methods of a DirectoryInfo

class dynamo_consistency.datatypes.DirectoryInfo(name='', directories=None, files=None)

Stores all of the information of the contents of a directory

Parameters:
  • name (str) – The name of the directory
  • directories (list) – If this is set, the infos in the list are merged into a master DirectoryInfo.
  • files (list) – List of tuples containing information about files in the directory.
add_file_list(file_infos)

Add a list of tuples containing file_name, file_size to the node. This is most useful when you get a list of files from some other source and want to easily convert that list into a DirectoryInfo()

Parameters:file_infos (list) – The list of files (full path, size in bytes[, timestamp])
add_files(files)

Set the files for this DirectoryInfo node

Parameters:files (list) – The tuples of file information. Each element consists of file name, size, and mod time.
Returns:self for chaining calls
Return type:DirectoryInfo
compare(other, path='', check=None)

Does one way comparison with a different tree

Parameters:
  • other (DirectoryInfo) – The directory tree to compare this one to
  • path (str) – Is the path to get to this location so far
  • check (function) – An optional function that double checks a file name. If the checking function returns True for a file name, the file will not be included in the output.
Returns:

A tuple of the list of files and the list of directories that are present in this tree but not in the other tree, and the total size of those files

Return type:

list, list, long
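The idea of a one-way comparison with an optional check function can be sketched on flat dictionaries. This is a simplification under stated assumptions: the real method walks DirectoryInfo trees and also returns directories, while the hypothetical one_way_compare below only handles files given as a mapping of path to size.

```python
def one_way_compare(these_files, other_files, check=None):
    """Hypothetical flat version of a one-way tree comparison.

    these_files and other_files map file paths to sizes in bytes.
    Returns the paths only present in these_files and their total size.
    """
    other = set(other_files)
    only_here = []
    total_size = 0
    for path in sorted(these_files):
        if path in other:
            continue
        # A check that returns True keeps the file out of the output
        if check is not None and check(path):
            continue
        only_here.append(path)
        total_size += these_files[path]
    return only_here, total_size
```

Running the comparison once in each direction gives the two lists of the full check: files in the inventory but not at the site, and files at the site but not in the inventory.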

count_nodes(empty=False)
Parameters:empty (bool) – If True, only return the number of empty nodes
Returns:The total number of nodes in this DirectoryInfo. This corresponds approximately to the number of listing requests required to build the data.
Return type:int
display(path='')

Print out the contents of this DirectoryInfo

Parameters:path (str) – The full path to this DirectoryInfo instance
displays(path='')

Get the string to print out the contents of this DirectoryInfo.

Parameters:path (str) – The full path to this DirectoryInfo instance
Returns:The display string
Return type:str
empty_nodes_list()

This function should be used to get the nodes to delete in the proper order for non-recursive deletion

Returns:The list of empty directories to delete in the order to delete
Return type:list
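For non-recursive deletion, a directory can only be removed after everything inside it is gone, so children must come before their parents. One way to obtain such an order is sketched below; deletion_order is a hypothetical helper and is not necessarily how empty_nodes_list() builds its result.

```python
def deletion_order(empty_dirs):
    """Hypothetical helper: order directories so the deepest come first."""
    # A path with more separators is deeper in the tree than its parents
    return sorted(empty_dirs, key=lambda path: path.count('/'), reverse=True)
```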
empty_nodes_set()

This function recursively builds the entire list of empty directories that can be deleted

Returns:The set of empty directories to delete
Return type:set
get_directory_size()

Report the total size used by this directory and its subdirectories.

Returns:Size of files in directory, in bytes
Return type:int
get_file(file_name)

Get the file dictionary based off the name.

Parameters:file_name (str) – The LFN of the file
Returns:Dictionary of file information
Return type:dict
Raises:BadPath – if the file_name does not start with self.name
get_files(min_age=0, path='')

Get the list of files that are older than some age

Parameters:
  • min_age (int) – The minimum age, in seconds, of files to list
  • path (str) – The path to this file. Used for recursive calls
Returns:

List of full file paths

Return type:

list

get_node(path, make_new=True)

Get the node that corresponds to the path given. If the node does not exist yet, and make_new is True, the node is created.

Parameters:
  • path (str) – Path to the desired node from current node. If the path does not exist yet, empty nodes will be created.
  • make_new (bool) – Whether to create a new node if none exists at path
Returns:

A node with the proper path, unless make_new is False and the node doesn’t exist

Return type:

DirectoryInfo or None

get_num_files(unlisted=False, place_new=False)

Report the total number of files stored.

Parameters:
  • unlisted (bool) – If true, return the number of unlisted directories. Otherwise, return only the number of successfully listed files.
  • place_new (bool) – If true, pretend there’s one more file inside any new directory or if files is None. This prevents listing of empty directories to include directories that should not actually be deleted.
Returns:

The number of files in the directory tree structure

Return type:

int

get_unlisted(path='')
Parameters:path (str) – Path to prepend to the name, used in recursive calls
Returns:List of directories that were unlisted
Return type:list
listdir(*args, **kwargs)

Get the list of directory names within a DirectoryInfo. Adding an argument will display the contents of the next directory. For example, if dir.listdir() returns:

0: data
1: mc

dir.listdir(1) then lists the contents of mc and dir.listdir(1, 0) lists the contents of the first subdirectory in mc.

Parameters:
  • args – Is a list of indices to list the subdirectories
  • kwargs – Supports ‘printing’, which is set to a bool. Defaults to True.
Returns:

The DirectoryInfo that is being listed

Return type:

DirectoryInfo

remove_node(path_name)

Remove an empty node from the DirectoryInfo

Parameters:

path_name (str) – The path to the node, including the self.name at the beginning

Returns:

self for chaining

Return type:

DirectoryInfo

Raises:
  • NotEmpty – if the directory is not empty or self.files is None
  • BadPath – if the path_name does not start with the self.name
save(file_name)

Save this DirectoryInfo in a file.

Parameters:file_name (str) – is the location to save the file
setup_hash()

Set the hashes for this DirectoryInfo

dynamo_consistency.datatypes.IGNORE_AGE = 1.0

The maximum age, in days, of files and directories to ignore in this check. This variable should be reset once in a while by daemons that run while an operator might be adjusting the configuration.

exception dynamo_consistency.datatypes.NotEmpty

An exception for throwing when a non-empty directory is deleted from a DirectoryInfo

dynamo_consistency.datatypes.compare(inventory, listing, output_base=None, orphan_check=None, missing_check=None)

Compare two different trees and output the differences into an ASCII file

Parameters:
  • inventory (DirectoryInfo) – The tree of files that should be at a site
  • listing (DirectoryInfo) – The tree of files that are listed remotely
  • output_base (str) – The base from which the names of the ASCII report files are generated. For example, an output_base of 'results' gives results_missing.txt and results_orphan.txt.
  • orphan_check (function) – A function that double checks each expected orphan. The function takes an LFN as its input. If the function returns true, the LFN will not be listed as an orphan.
  • missing_check (function) – A function that double checks each expected missing file. The function takes an LFN as its input. If the function returns true, the LFN will not be listed as missing.
Returns:

The two lists, missing and orphan files

Return type:

tuple
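A check function for orphan_check or missing_check only has to accept an LFN and return a true value when the file should be left out of the report. A minimal sketch, in which make_dataset_check and the substring rule are illustrative assumptions rather than what the production script does:

```python
def make_dataset_check(protected_fragments):
    """Hypothetical factory for a check function.

    The returned function takes an LFN and returns True if the LFN contains
    any of the protected fragments, meaning it should not be reported.
    """
    fragments = tuple(protected_fragments)

    def check(lfn):
        return any(fragment in lfn for fragment in fragments)

    return check
```

Such a function could then be passed through the documented keyword, for example datatypes.compare(inventory_listing, remote_listing, 'results', orphan_check=make_dataset_check(['/GenericTTbar'])).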

dynamo_consistency.datatypes.create_dirinfo(location, first_dir, filler, object_params=None, callback=None)

Create the directory information

Parameters:
  • location (str) – This is the beginning of the path where we will find first_dir. For example, to find the first directory mc, we also have to say where it is. In most cases, using LFNs, location would be /store/ (where mc is inside). This is a path.
  • first_dir (str) – The name of the first directory that is inside the path of location. This should not be a path, but the name of the directory to list recursively.
  • filler (function or constructor) –

    This is either a function that lists the directory contents given just a path of os.path.join(location, first_dir), or it is a constructor that does the same thing with a member function called list. If filler is an object constructor, the parameters for the object creation must be passed through the parameter object_params. Both listings must return the following tuple:

    • A bool saying whether the listing was successful or not
    • A list of tuples of sub-directories and their mod times
    • A list of tuples of the files inside, their sizes, and their mod times
  • object_params (list) – This only needs to be set when filler is an object constructor. Each element in the list is a tuple of arguments to pass to the constructor.
  • callback (function) – A function that is called every time the master thread has finished checking the child threads. This can happen very many times at large sites. The function is called with the main DirectoryInfo tree as its argument
Returns:

A DirectoryInfo object, named first_dir, containing everything from the directory listings under os.path.join(location, first_dir).

Return type:

DirectoryInfo
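To make the expected return value of a filler concrete, here is a sketch of a filler function that lists a local directory and returns the three-part tuple described above. The name local_filler is hypothetical; it is meant for trying the interface out on a local disk, not for listing a remote site.

```python
import os


def local_filler(path):
    """Hypothetical filler: list one local directory without recursion."""
    directories = []
    files = []
    try:
        for name in sorted(os.listdir(path)):
            full_path = os.path.join(path, name)
            info = os.stat(full_path)
            if os.path.isdir(full_path):
                # (directory name, mod time in seconds since epoch)
                directories.append((name, int(info.st_mtime)))
            else:
                # (file name, size in bytes, mod time in seconds since epoch)
                files.append((name, info.st_size, int(info.st_mtime)))
    except OSError:
        # A failed listing is reported through the leading bool
        return False, [], []
    return True, directories, files
```

Since this is a plain function rather than a constructor, object_params would not need to be set when passing it as filler.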

dynamo_consistency.datatypes.get_info(file_name)

Get the DirectoryInfo from a file.

Parameters:file_name (str) – is the location of the saved information
Returns:Saved info
Return type:DirectoryInfo

getsitecontents.py

Tool to get the files located at a site.

Warning

Must be used on a machine with XRootD python module installed.

author:

Daniel Abercrombie <dabercro@mit.edu>

Max Goncharov <maxi@mit.edu>

class dynamo_consistency.getsitecontents.GFalLister(site, thread_num=None)

An object to list a site through gfal-ls calls

ls_directory(path)

Gets the contents of a path

Parameters:path (str) – The full path, starting with /store/, of the directory to list.
Returns:A bool indicating the success, a list of directories, and a list of files.
Return type:bool, list, list
class dynamo_consistency.getsitecontents.Lister(thread_num, site)

The prototype of the listing facility

Parameters:
  • thread_num (int) – This optional parameter is only used to create a separate logger for this object
  • site (str) – Used for reading the correct configuration
list(path, retries=0)

Return the directory contents at the given path. Every object passed to datatypes.create_dirinfo() as a filler is expected to have this list member.

Parameters:
  • path (str) – The full path, starting with /store/, of the directory to list.
  • retries (int) – Number of attempts so far
Returns:

A bool indicating the success, a list of directories, and a list of files. The list of directories consists of tuples of (directory name, mod time). The list of files consists of tuples of (file name, size, mod time). The modification times are in seconds from epoch and the file size is in bytes.

Return type:

bool, list, list

ls_directory(path)

Prototype function that lists the directories

Parameters:path (str) – The full path, starting with /store/, of the directory to list.
reconnect()

A prototype for reconnecting to the remote servers

class dynamo_consistency.getsitecontents.XRootDLister(site, door, thread_num=None)

A class that holds two XRootD connections. If the primary connection fails to list a directory, then a fallback connection is used. This keeps the load of listing from hitting more than half of a site’s doors at a time.

Parameters:
  • site (str) – The site that this connection is to.
  • door (str) – The URL of the door that will get the most load
  • thread_num (int) – This optional parameter is only used to create a separate logger for this object
ls_directory(**kwargs)

Gets the contents of the previously defined redirector at a given path

Parameters:path (str) – The full path, starting with /store/, of the directory to list.
Returns:A bool indicating the success, a list of directories, and a list of files.
Return type:bool, list, list
class dynamo_consistency.getsitecontents.XRootDSubShell(site, door, thread_num=None)

Very similar to the XRootDLister, but uses a subshell through pexpect.

ls_directory(**kwargs)

Prototype function that lists the directories

Parameters:path (str) – The full path, starting with /store/, of the directory to list.
dynamo_consistency.getsitecontents.ct_timestamp(line)

Takes a time string from gfal and extracts the time since epoch

Parameters:line (str) – The line from the gfal-ls call including month, day, and year in some format with lots of hyphens
Returns:Timestamp’s time since epoch
Return type:int
dynamo_consistency.getsitecontents.get_site_tree(site, callback=None, **kwargs)

Get the information for a site, from XRootD or a cache.

Parameters:
  • site (str) – The site name
  • callback (function) – The callback function to pass to datatypes.create_dirinfo()
Returns:

The site directory listing information

Return type:

dynamo_consistency.datatypes.DirectoryInfo

getinventorycontents.py

This module gets the information from the inventory about a site’s contents

author:Daniel Abercrombie <dabercro@mit.edu>
class dynamo_consistency.getinventorycontents.InvLoader

Creates an InventoryManager object, if needed, and stores it globally for the module. It also holds a list of datasets in the deletion queue.

get_inventory()

Make an inventory, if needed, and return it.

Returns:Any InventoryManager in the vicinity.
Return type:common.inventory.InventoryManager
dynamo_consistency.getinventorycontents.get_db_listing(site, callback=None, **kwargs)

Get the list of files from dynamo database directly from MySQL.

Parameters:site (str) – The name of the site to load
Returns:The file replicas that are supposed to be at a site
Return type:dynamo_consistency.datatypes.DirectoryInfo
dynamo_consistency.getinventorycontents.get_site_inventory(site, callback=None, **kwargs)

Loads the contents of a site, based on the dynamo inventory

Parameters:site (str) – The name of the site to load
Returns:The file replicas that are supposed to be at a site
Return type:dynamo_consistency.datatypes.DirectoryInfo
dynamo_consistency.getinventorycontents.set_of_ignored()

Get the full list of IGNORED datasets from the inventory